Rather than setting an "Unwrap" field on the existing TemplateExpr, we'll
instead use a separate, simpler AST node.
There is very little in common between these two cases, so overloading the
TemplateExpr node doesn't buy much. A separate node is needed here,
rather than just returning the wrapped node directly, to give us somewhere
to capture the full source range extent of the wrapping template, whereas
the inner expression only captures the range of itself. This is important
both for good diagnostics and for transforming zclsyntax AST into
zclwrite AST.
Previously it was !{, but in real examples this looked confusing since
an exclamation point after a word looks (to humans) like literal
punctuation rather than syntax.
% is not ideal either since it's also the marker traditionally used for
printf, but has the advantage that programmers are already primed for it
to be syntax.
Since the native syntax requires that attribute names and block types
be identifiers, there is no way for "//" to be a valid symbol in a body.
Therefore we can special-case it here as an ignored property name that
users can then use for comments.
This special handling intentionally does not apply to objects representing
zcl object expressions, since it _is_ valid to have non-identifier keys
there and so "//" may be a legitimate object key for some applications.
Previously we tried to do the whole template parse in one pass. This was
adequate for dealing with literals and interpolations because they
create a flat structure, but to parse the template control sequences we
need to be able to deal with nested template sequences.
As a building block towards this, we first do a pass of extracting the
template-level "tokens": literals, interpolations, control sequences.
We then pass over that sequence of tokens and parse it, which is then
simplified because the larger template atoms have already been produced.
Attribute-only splat expressions cannot have other splats nested inside,
since we're only interested in supporting how these behaved for HIL when
running inside Hashicorp Terraform. More complex cases should be dealt
with using either full splats (bracketed *) or "for" expressions.
Attribute-only splat expressions, indicated by using the star symbol as
if it were an attribute, are inherited from HIL and are a limited sort of
splat that only works for walking through attributes in each item of its
source.
Later zcl will also get full splat expressions, whose syntax is using
the star symbol as if it were an _index_, which will generalize to
supporting _all_ postfix traversal operations, including indexing and
nested splats.
If we can't find a function in the given EvalContext, we must traverse
the context chain until either we run out of contexts or we find a
matching function.
At present there is no reason for a non-root context to have any
functions, so this will always traverse to the root. This may change in
future if we introduce constructs that define local functions.
it's acceptable to have "Variables" set to nil in an EvalContext, if
a particular scope has no variables at all. If _no_ contexts in the
chain have non-nil variables, this is considered to mean that variables
are not allowed at all, which produces a different error message.
This syntax func(arg...) allows the final argument to be a sequence-typed
value that then expands to be one argument for each element of the
value.
This allows applications to define variadic functions where that's
user-friendly while still allowing users to pass tuples to those functions
in situations where the args are chosen dynamically.
The purpose of the Variables function is to tell a calling application
what symbols need to be present in the _root_ scope, so it would be
unhelpful to include child scope traversals. Child scopes are populated
by the nodes that create them, and are thus not interesting to the
calling application (for this purpose, at least).
The ForExpr is essentially a list/map comprehension, allowing projecting
one expression into another. From a syntactic standpoint it's the most
complex structure we've dealt with so far, with many separate parts.
The tests introduced here are not exhaustive but illustrate that the
basic mechanism is working.
Previously operations were an enum, but we've ended up needing to store
a collection of values against each, so an operation being a pointer to
a struct feels more natural.
This in turn allows us to more easily fix the return types of the
operations, so that we don't need to do any unusual work to understand
that (for example) arithmetic always returns a number.
Previously we were detecting the exactly-one-part case at parse time and
skipping the TemplateExpr node in the AST, but this was problematic
because we then misaligned the source range for the expression to the
_content_ of the quotes, rather than including the quotes themselves.
As well as producing confusing diagnostics, this also caused problems for
zclwrite since it relies on source ranges to map our AST back onto the
source tokens it came from.
The existing round trip test was testing that we can pass through a set
of tokens through the AST entirely unmodified.
This new round-trip test passes the source through the Format function
and then checks that it has the same semantics, by evaluating it both
before and after and expecting an identical set of values.
This tweaks the number of spaces before each token to create a consistent
layout. It doesn't (yet?) attempt to add or remove line breaks or
otherwise mess with the non-space characters in the code.
The main goal here is to get things to line up. zcl syntax is simple
enough that there's not much latitude for super-weird usage, and so if
someone does manage to write something weird (asymmetric breaking of
brackets between lines, etc) we won't try to fix it here.
Our approach here is to split the input into lines and then to split
each line into up to three cells which, when present, should have their
leftmost characters aligned (eventually) by the formatter.
Previously we tolerated EOF as an alias for newline, but a file without
a newline at the end is a edge case primarily limited to contrived
examples in unit tests, and requiring it simplifies tasks such as code
generation in zclwrite, since we can always assume that every block item
comes with a built-in line terminator.
We consume both of these things primarily so that when we manipulate
the AST they will all stay together. For example, removing an attribute
from a body should take its comments and newline too.
When we're skipping comments but retaining newlines, we need to do some
slight-of-hand because single-line comment tokens contain the newline
that terminates them (for simpler handling of lead doc comments) but our
parsing can be newline-sensitive.
To allow for this, as a special case we transform single-line comment
tokens into newlines when in this situation, thus allowing parser code
to just worry about the newlines and skip over the comments.
The native parser's ranges don't include any surrounding comments, so we
need to do a little more work to pick them out of the surrounding token
sequences.
This just takes care of _lead_ comments, which are those that appear as
whole line comments above the item in question. Line comments, which
appear after the item on the same line, will follow in a later commit.
This is mainly for symmetry with the API of zclsyntax, but also later
we'll probably have a ParseExpression function that can be used to make
edits to individual expressions.
This is not yet complete, since it fails to capture the newline, line
comments, and variable references in expressions. However, it does
capture the broad structure of an attribute, along with gathering up
all of its _interior_ tokens.
We'll use TokenSeq for everything, including sequences we expect to
contain only one token, mainly for consistency but also so that our local
"parser" doesn't need to care about whether the main parser is returning
one token or many.
This now allows for round-tripping some input from bytes to tokens and
then back to bytes again. With no other changes, we expect this to produce
an identical result.
Although building a flat array of tokens is _one_ use-case for TokenGen,
this new approach means we can also use the interface to write to an
io.Writer without needing to produce an intermediate buffer.
Previously we were allocating a separate heap object for each token, which
creates a lot of small objects for the GC to manage. Since we know that
we're always converting from a flat array of native tokens, we can produce
a flat array of writer tokens first and _then_ take pointers into that
array to achieve our goal of making a slice of pointers.
For the use-case of formatting a sequence of tokens by tweaking the
"SpacesBefore" value, this means we can get all of our memory allocation
done in a single chunk and then just tweak the allocated, contiguous
tokens in-place, which should reduce memory pressure for a task which
will likely be done frequently by a text editor integration doing "format
on save".
The "writer parser" is a parser that produces a writer AST rather than
a zclsyntax AST. This can be used to produce a writer AST from existing
source in order to modify it before writing it out again.
It's implemented with the somewhat-unintuitive approach of running the
main zclsyntax parser and then mapping the source ranges it finds back
onto the token sequence to pull out the raw tokens for each object.
This allows us to avoid maintaining two parsers but also keeps all of
this raw-token-wrangling complexity out of the main parser.