Previously we tolerated EOF as an alias for newline, but a file without
a newline at the end is a edge case primarily limited to contrived
examples in unit tests, and requiring it simplifies tasks such as code
generation in zclwrite, since we can always assume that every block item
comes with a built-in line terminator.
We consume both of these things primarily so that when we manipulate
the AST they will all stay together. For example, removing an attribute
from a body should take its comments and newline too.
The native parser's ranges don't include any surrounding comments, so we
need to do a little more work to pick them out of the surrounding token
sequences.
This just takes care of _lead_ comments, which are those that appear as
whole line comments above the item in question. Line comments, which
appear after the item on the same line, will follow in a later commit.
This is not yet complete, since it fails to capture the newline, line
comments, and variable references in expressions. However, it does
capture the broad structure of an attribute, along with gathering up
all of its _interior_ tokens.
The "writer parser" is a parser that produces a writer AST rather than
a zclsyntax AST. This can be used to produce a writer AST from existing
source in order to modify it before writing it out again.
It's implemented with the somewhat-unintuitive approach of running the
main zclsyntax parser and then mapping the source ranges it finds back
onto the token sequence to pull out the raw tokens for each object.
This allows us to avoid maintaining two parsers but also keeps all of
this raw-token-wrangling complexity out of the main parser.
The context where a string literal was found affects what sort of escaping
it can have, so we need to distinguish these cases so that we will only
look for and handle backslash escapes in quoted strings.
In zclwrite we throw away the absolute source position information and
instead just retain the number of spaces before each token. This different
model allows us to rewrite parts of the token sequence without needing
to re-adjust all of the positions, and it also allows us to do simple
indentation and spacing adjustments just by walking through the token
list and adjusting these numbers.