This introduces only some minor bugfixes compared to the commit we
selected before; the main goal here is to be on an actual tagged release
rather than an arbitrary commit.
Previously we were using the EndOfLine as a mandatory marker to end the
comment, but that meant that if a comment appeared immediately before EOF
without a newline on the end it would fail to match.
Now we use the :>> operator similarly to how we previously fixed
greediness in the multi-line comment case: it tells Ragel to end the
Comment production if the following pattern matches (if EndOfLine is found)
but also allows the point before EndOfLine to be a final state, in case
EOF shows up there.
The contract for our parser is that in the case of errors our result is
still valid, though possibly incomplete, so that development tools can
still do analysis of the parts of the result we _were_ able to parse.
However, we were previously failing to meet that contract in the presence
of certain syntax errors during block parsing, where we were producing
a nil body instead of a valid empty one.
Now we'll produce an empty placeholder body if for any reason we don't
have a real one before we return from block parsing, which then allows
analysis tools to see that the containing block was present but makes
its content appear totally empty. This is always done in conjunction with
returning an error, so a calling application will not be misled into
thinking it is a complete result even though parts are missing.
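As a rough illustration of the placeholder idea, here is a minimal Go
sketch. Body, Block, failingBody and parseBlock are hypothetical
stand-ins invented for this example and are not the parser's real types
or functions; the only point is that the body is never nil and the error
is always returned alongside it.

```go
package main

import (
	"errors"
	"fmt"
)

// Body and Block are hypothetical stand-ins for the parser's AST types.
type Body struct {
	Attributes map[string]string
}

type Block struct {
	Type string
	Body *Body
}

// failingBody simulates a body parser that hits a syntax error and gives up
// without producing any body at all.
func failingBody() (*Body, error) {
	return nil, errors.New("unexpected token in block body")
}

// parseBlock substitutes an empty placeholder whenever body parsing leaves
// us without a real body, and always passes the error through unchanged.
func parseBlock(blockType string, parseBody func() (*Body, error)) (*Block, error) {
	body, err := parseBody()
	if body == nil {
		// Valid but empty: analysis tools can still walk the block.
		body = &Body{Attributes: map[string]string{}}
	}
	return &Block{Type: blockType, Body: body}, err
}

func main() {
	blk, err := parseBlock("resource", failingBody)
	// The block is safe to inspect even though parsing failed.
	fmt.Println(blk.Type, len(blk.Body.Attributes), err != nil)
}
```

A caller that ignores the error would see an empty block, which is why
the error must always accompany the placeholder.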
The TemplateStringLiteral production was not quite right, causing a
literal $ or % immediately followed by " to consume the quotes and any
following characters on the line.
Now we match things more precisely, but at the expense of generating some
redundant extra tokens when escapes and literal dollar/percent signs are
present. Those extra tokens don't matter in practice because the resulting
strings get concatenated together anyway, which is proven by the fact
that this changeset includes changes only to the scanner and parser tests,
and not to any of the expression result tests.
While here, I also improved the error message for when the user attempts
to split a quoted string over multiple lines. Previously it was just using
the generic "invalid character" message, which isn't particularly
actionable. Now we'll give the user a couple of options of what to do
instead.
Previously our behavior for an unknown for_each was to produce a single
block whose content was the result of evaluating content with the iterator
set to cty.DynamicVal. That produced a reasonable idea of the content, but
the number of blocks in the result was still not accurate, and that can
present a problem for applications that use unknown values to predict
the overall shape of a not-yet-complete structure.
We can't return an unknown block via the HCL API, but to make that
situation easier to recognize by callers we'll now go a little further and
force _all_ of the leaf attributes in such a block to be unknown values,
even if they are constants in the configuration. This allows a calling
application that is making predictions to use a single object whose
leaves are all unknown as a heuristic to recognize what is effectively
an unknown set of blocks.
This is still not a perfect heuristic, but is the best we can do here
within the HCL API assumptions. A fundamental assumption of the HCL API
is that it's possible to walk the block structure without evaluating any
expressions and the dynamic block extension is intentionally subverting
that assumption, so some oddities are to be expected. Calling applications
that need a fully reliable sense of the final structure should not use
the dynamic block extension.
Fixes an issue where the name range may be incorrect in case there are
multiple attributes and one of them is wrong. Another attribute's name
range could overwrite the previous one as the attr variable is
overwritten in the for loop.
This situation is likely to arise if the user attempts to compute an index
using division while expecting HCL to do integer division rather than
float division.
This message is therefore intended to help the user see what fix is
needed (round the number), but because of layering it sadly cannot
suggest a specific remedy: HCL itself has no built-in rounding
functionality, so how exactly that is done is a matter for the calling
application.
We're accepting that compromise for now to see how common this turns out
to be in practice. My hypothesis is that it'll be seen more when an
application moves from HCL 1 to HCL 2 because HCL 1 would do integer
division in some cases, but then it will be less common once new patterns
are established in the language community. For example, any existing
examples of using integer division to index in Terraform will over time
be replaced with examples showing the use of Terraform's "floor" function.
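To make the failure mode concrete, here is a small Go sketch using
math/big. The helper names (divide, floor, indexFromNumber) and the
error text are assumptions made up for this example; they are not HCL's
API or its actual message, and floor stands in for whatever rounding
function the calling application provides.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// divide performs float division, so 5 / 2 yields 2.5 rather than 2.
func divide(a, b float64) *big.Float {
	return new(big.Float).Quo(big.NewFloat(a), big.NewFloat(b))
}

// floor is a hypothetical application-provided rounding function.
func floor(n *big.Float) *big.Float {
	i, acc := n.Int(nil) // truncates toward zero
	if acc == big.Above {
		// Truncation moved a negative fraction upward, so step back down.
		i.Sub(i, big.NewInt(1))
	}
	return new(big.Float).SetInt(i)
}

// indexFromNumber rejects fractional numbers instead of silently truncating.
func indexFromNumber(n *big.Float) (int, error) {
	if !n.IsInt() {
		return 0, errors.New("index must be a whole number; round it first")
	}
	i, _ := n.Int64()
	return int(i), nil
}

func main() {
	_, err := indexFromNumber(divide(5, 2))
	fmt.Println(err)

	idx, _ := indexFromNumber(floor(divide(5, 2)))
	fmt.Println(idx)
}
```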
This is a variant of NewRangeScanner that allows the caller to specify the
start position, which is appropriate when this utility is being used with
a sub-slice of a file, rather than the whole file.
Previously, the hclsyntax MissingItemRange() function returned a zero-length
range anchored at the end of the block in question. This commit changes
that to the beginning of the block. In practice, the end of a block is
generally just a "}" and not very useful in error messages.
In normal situations the block type name alone is enough to determine the
appropriate schema for a child, but when callers are otherwise doing
unusual pre-processing of bodies to dynamically generate schemas during
decoding they are likely to need to take similar steps while analyzing
for variables, to ensure that all of the references can be located in
spite of the not-yet-applied pre-processing.
Our API previously had a function only for retrieving the variables used
in the for_each and labels arguments used during an Expand call, and
expected callers to then interrogate the resulting expanded block to find
the other variables required to fully decode the content.
That approach is insufficient for any application that needs to know the
full set of required variables before any evaluation begins, such as when
a dependency graph will be constructed to allow a topological traversal
through blocks while evaluating.
Now we have WalkVariables, which finds both the variables used to expand
_and_ the variables within any blocks. This also renames
WalkForEachVariables to WalkExpandVariables since that name is more
accurate with the addition of the "label" argument into the expand-time
dependency set.
There is also a hcldec-based helper wrapper for each of those, allowing
single-shot analysis of blocks for applications that use hcldec.
This is a breaking change to the dynblock package API, because the old
WalkForEachVariables and ForEachVariablesHCLDec functions are no longer
present.
Evaluate JSON null values as cty null values, rather than as unknown
values. DynamicPseudoType serves as a placeholder for the type of the
null value; callers may convert the value against a schema to get the
concrete type.
The fact that object constructors are newline-sensitive while object for
expressions are not requires some special consideration in the parser. We
previously made a small fix here to delay turning on newline-sensitive
scanning before peeking ahead for a "for" keyword, but that was sufficient
only when the for expression was not already in a newline-sensitive
context.
Now we force newline-sensitive parsing off while we scan for the keyword,
and also again once we begin parsing the for expression, ensuring that
the for expression is always scanned properly regardless of what context
it appears in.
The range was incorrectly being reported as "Context", rather than
"Subject". The Context field has meaning only in conjunction with Subject.
While here, this also tweaks the summary to show the block type name in
quotes, since otherwise the sentence can read oddly for certain block type
names.
These make it easier for calling applications to get the same result as
operators within HCL expressions both for individual attribute accesses
and when processing whole cty.Paths.
(We previously had an Index function which did the same thing for
indexing, and ApplyPath is just a wrapper around calling GetAttr and Index
in a loop.)
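The "wrapper around a loop" shape can be sketched as follows. This is a
simplified model over plain Go maps and slices; Step, getAttr, index and
applyPath are hypothetical names for this example and deliberately do
not reproduce the real cty.Value or cty.Path types.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a hypothetical path step: either an attribute name or an index.
type Step struct {
	Attr    string
	Index   int
	IsIndex bool
}

// getAttr looks up one attribute on an object-like value.
func getAttr(v interface{}, name string) (interface{}, error) {
	obj, ok := v.(map[string]interface{})
	if !ok {
		return nil, errors.New("value does not have attributes")
	}
	val, ok := obj[name]
	if !ok {
		return nil, fmt.Errorf("no attribute named %q", name)
	}
	return val, nil
}

// index looks up one element of a list-like value.
func index(v interface{}, i int) (interface{}, error) {
	list, ok := v.([]interface{})
	if !ok {
		return nil, errors.New("value cannot be indexed")
	}
	if i < 0 || i >= len(list) {
		return nil, fmt.Errorf("index %d out of range", i)
	}
	return list[i], nil
}

// applyPath applies each step in turn, stopping at the first error.
func applyPath(v interface{}, path []Step) (interface{}, error) {
	var err error
	for _, step := range path {
		if step.IsIndex {
			v, err = index(v, step.Index)
		} else {
			v, err = getAttr(v, step.Attr)
		}
		if err != nil {
			return nil, err
		}
	}
	return v, nil
}

func main() {
	data := map[string]interface{}{
		"items": []interface{}{"x", "y"},
	}
	got, err := applyPath(data, []Step{{Attr: "items"}, {Index: 1, IsIndex: true}})
	fmt.Println(got, err)
}
```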
When dealing with numbers that have no finite representation in base 2, it
is important that all parsers agree on the expected maximum precision.
Previously we had agreement by convention, but for robustness here we'll
centralize the handling of number parsing to cty.ParseNumberVal, which
uses the same settings as we were previously using in the JSON parser and,
for the native syntax parser, is just a shorthand to the same parsing
we were previously doing with the cty/convert package.
This should cause no behavior change since all of these callers were
previously in agreement with the cty "standard", but this factoring helps
establish that there _is_ a standard here.
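The reason agreement on precision matters can be shown with math/big
alone. In this sketch the 512-bit precision and the helper names
parseNumber and parseNumberPrec are assumptions for illustration; it
does not call cty itself.

```go
package main

import (
	"fmt"
	"math/big"
)

// numberPrecision is the single shared precision every parser must use.
// The value 512 is an assumption made for this sketch.
const numberPrecision = 512

// parseNumberPrec parses a decimal string at an explicit precision.
func parseNumberPrec(s string, prec uint) (*big.Float, error) {
	f, _, err := big.ParseFloat(s, 10, prec, big.ToNearestEven)
	return f, err
}

// parseNumber is the one central entry point all callers should share.
func parseNumber(s string) (*big.Float, error) {
	return parseNumberPrec(s, numberPrecision)
}

func main() {
	a, _ := parseNumber("0.1")
	b, _ := parseNumber("0.1")
	c, _ := parseNumberPrec("0.1", 53)

	// Same settings: the two results compare as equal.
	fmt.Println(a.Cmp(b) == 0)
	// Different precision: 0.1 has no finite base 2 representation, so the
	// two roundings are different numbers.
	fmt.Println(a.Cmp(c) == 0)
}
```

Routing every caller through one function like parseNumber is what turns
the convention into something that cannot drift.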
This includes a new function cty.ParseNumberVal which centralizes the
standard way to produce a cty.Number from a string so we can be sure to
always get comparable numbers.
When marshalling, the current file index was not stored. Because of
this, a ';' was inserted multiple times for each file, even if the file
did not change.
When unmarshalling, the fileIdx determined by number of ';' was ignored.
Thus, if there was more than one file, all the positions would still
point to the first file.
Fixes setting the MissingItemRange on the remaining body when a
*hclpack.Body is partially decoded. Otherwise when the remaining body is
decoded with missing fields, the diagnostic cannot point to where they
should be set.
Fixes an issue where a nested block would be decoded incorrectly: the
body of the last decoded block overwrote the previously decoded ones.
This was caused by the block being assigned on the stack in the for
loop; when the block is converted to a *hcl.Block, the pointer to Body
will always point to the same block. This caused decoding a new block to
overwrite the bodies of any previously decoded blocks.
This allows using a splat expression to conveniently coerce a
possibly-null scalar into a zero- or one-item tuple, which is helpful
because in HCL we prefer "for each item in sequence" operations over pure
conditionals in many situations just because they compose better in our
declarative language.
For example, in a language that uses the "dynblock" extension we can
turn a possibly-null object into zero or one blocks using its for_each
argument with a splat operation:
    dynamic "thingy" {
      for_each = maybe_null.*
      content {
        name = thingy.value.name
      }
    }
This fixes #66.
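The coercion described above behaves roughly like the following Go
sketch. Go has no null scalars, so a nil pointer stands in for null
here; splat and strPtr are hypothetical helpers for this example only.

```go
package main

import "fmt"

// splat turns a possibly-null scalar into a zero- or one-item sequence, so
// that "for each item" logic can replace an explicit null check.
func splat(v *string) []string {
	if v == nil {
		return []string{}
	}
	return []string{*v}
}

// strPtr is a small helper for building non-null values in the example.
func strPtr(s string) *string { return &s }

func main() {
	for _, name := range splat(nil) {
		fmt.Println("never reached:", name)
	}
	for _, name := range splat(strPtr("thingy")) {
		fmt.Println("one block named", name)
	}
}
```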
Previously we were incorrectly passing the original forEachCtx down
to nested child blocks for recursive expansion. Instead, we must use the
iteration-specific constructed EvalContext, which then allows any nested
dynamic blocks to use the parent's iterator variable in their for_each or
labels expressions, and thus unpack nested data structures into
corresponding nested block structures:
    dynamic "parent" {
      for_each = [["a", "b"], []]
      content {
        dynamic "child" {
          for_each = parent.value
          content {}
        }
      }
    }
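The scoping rule can be modelled with a chain of scopes. This Go sketch
is only an analogy: Scope, expandChildren and expandParents are
hypothetical and much simpler than a real EvalContext, but they show why
the nested expansion has to receive the per-iteration scope.

```go
package main

import "fmt"

// Scope is a hypothetical stand-in for an evaluation context: a set of
// variables plus a link to the enclosing scope.
type Scope struct {
	vars   map[string][]string
	parent *Scope
}

// child creates a nested scope that defines one extra variable.
func (s *Scope) child(name string, val []string) *Scope {
	return &Scope{vars: map[string][]string{name: val}, parent: s}
}

// lookup searches this scope and then each enclosing scope in turn.
func (s *Scope) lookup(name string) ([]string, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if v, ok := cur.vars[name]; ok {
			return v, true
		}
	}
	return nil, false
}

// expandChildren models the nested block's for_each, which refers to the
// parent's iterator variable.
func expandChildren(scope *Scope) ([]string, bool) {
	return scope.lookup("parent")
}

// expandParents expands one parent per item and, for each, expands the
// nested children using the iteration-specific scope.
func expandParents(root *Scope, items [][]string) [][]string {
	result := make([][]string, 0, len(items))
	for _, item := range items {
		iterScope := root.child("parent", item)
		// Passing root here instead would make the lookup fail.
		children, _ := expandChildren(iterScope)
		result = append(result, children)
	}
	return result
}

func main() {
	root := &Scope{}
	for i, children := range expandParents(root, [][]string{{"a", "b"}, {}}) {
		fmt.Println("parent", i, "has", len(children), "child blocks")
	}
}
```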
A BOM is pointless in a UTF-8 file because it has a fixed encoding
agnostic of host byte ordering, but since Windows tends to use UTF-16
internally lots of Windows software will tend to generate redundant BOM
sequences at the start of UTF-8 files too.
By tolerating a leading BOM we can make life easier for those using such
Windows software, without any significant loss for normal use. This
slightly violates some of our normal assumptions about token positioning
since the BOM occupies bytes but not visible columns, but we'll just
accept that this may cause some slightly-odd behavior for use-cases such
as the diagnostic renderer and hclwrite.
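A minimal way to tolerate the BOM while keeping byte offsets honest is
to compute where real content starts rather than rewriting the buffer.
The sketch below is an illustration only; skipBOM is a hypothetical
helper and not how the scanner itself is implemented.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM returns the byte offset at which real content begins. The BOM
// still occupies bytes in the file, so later positions keep counting it.
func skipBOM(src []byte) int {
	if bytes.HasPrefix(src, utf8BOM) {
		return len(utf8BOM)
	}
	return 0
}

func main() {
	src := []byte("\xEF\xBB\xBFa = 1\n")
	start := skipBOM(src)
	fmt.Printf("content starts at byte %d: %q\n", start, src[start:])
}
```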
Template sequences are forbidden in block labels, but previously we were
handling them in a very severe way, by bailing out of block parsing early
and leaving the body in the AST as nil.
The rest of HCL doesn't expect to find a nil body, and in any case we can
safely keep parsing the rest of the block after recovering because the
closing quote gives us an unambiguous resume point. Therefore we'll now
process the rest of the block as normal, producing an AST that is complete
aside from having an invalid label string inside of it.
Skipping the template sequence in the returned label entirely creates a
risk that analysis code (which may try to inspect a partial AST on error)
will misinterpret the string as valid, so we generate a placeholder
"${ ... }" or "%{ ... }" sequence in the returned string to make it
clearer in any follow-up output that there was something there in the
original source. Normal callers won't be affected by this because they
don't process the AST when errors are present anyway.
The traversal returned from AbsTraversalForExpr may, for some expression
types, be referring to the same backing array as one stored inside the
node itself, and so previously this function may have inadvertently
corrupted the data associated with an AST node.
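The underlying hazard is ordinary Go slice aliasing. The sketch below
uses a plain []string in place of a traversal, and appendStep is a
hypothetical helper showing one way to avoid it: copy before appending.

```go
package main

import "fmt"

// appendStep returns a new slice with step appended. It copies first so
// that the backing array of the caller's slice is never written to.
func appendStep(trav []string, step string) []string {
	out := make([]string, len(trav), len(trav)+1)
	copy(out, trav)
	return append(out, step)
}

func main() {
	// stored shares its backing array with backing and has spare capacity,
	// so a plain append(stored, "x") would overwrite backing[2].
	backing := []string{"a", "b", "c"}
	stored := backing[:2]

	extended := appendStep(stored, "x")
	fmt.Println(backing, extended)
}
```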
The symmetrical spaces around colons in conditionals are important for
familiarity with C-like languages, so we'll instead accept spaces around
colons in our HCL2-unique "for expression" construct.
Leading whitespace is significant in heredocs, so we'll avoid making any
indentation adjustments for lines between OHeredoc and CHeredoc.
This fixes #31.
Our normal ruleset thinks that the "in" keyword here is a variable
reference and so writes it as "in[y]". Since there's never any reason for
a variable to appear immediately after another variable, we can check
for a preceding identifier as a heuristic to recognize whether in is
probably being used as a keyword rather than as a variable.
This is not exact, but the only time this should be a false positive is
if there were a syntax error in the input, and we don't make any
guarantees about the result in that case anyway.
This fixes #52.
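The heuristic amounts to a one-token lookbehind. In this Go sketch the
Token type, its string kinds and isKeywordIn are all hypothetical; the
real formatter works on its own token types.

```go
package main

import "fmt"

// Token is a hypothetical, simplified token for this example.
type Token struct {
	Kind string // "ident" or "punct"
	Text string
}

// isKeywordIn reports whether the token at i is probably the "in" keyword:
// an identifier spelled "in" that immediately follows another identifier.
func isKeywordIn(tokens []Token, i int) bool {
	if tokens[i].Kind != "ident" || tokens[i].Text != "in" {
		return false
	}
	return i > 0 && tokens[i-1].Kind == "ident"
}

func main() {
	// [for x in [y] ...
	forExpr := []Token{
		{"punct", "["}, {"ident", "for"}, {"ident", "x"},
		{"ident", "in"}, {"punct", "["}, {"ident", "y"}, {"punct", "]"},
	}
	// in[y], where "in" really is a variable being indexed
	indexExpr := []Token{
		{"ident", "in"}, {"punct", "["}, {"ident", "y"}, {"punct", "]"},
	}
	fmt.Println(isKeywordIn(forExpr, 3), isKeywordIn(indexExpr, 0))
}
```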
This relaxes our previous spec to include a special form from HCL 1:
    foo { bar = baz }
Although we normally require each argument to be on a line of its own, as
a special case we allow a block to be defined with a single nested
argument all on one line.
Only one nested argument definition is allowed, and a nested block
definition like "foo { bar {} }" is also disallowed in order to force the
more-readable split of bar {} onto a line of its own.
This is a pragmatic addition for broader compatibility with HCL 1-oriented
input. This single-line usage is not considered idiomatic HCL 2 and may
in future be undone by the formatter, though for now it is left as-is
aside from the spacing around the braces.
This also changes the behavior of the source code formatter to include
spaces on both sides of braces. This mimics the formatting behavior of
HCL 1 for this situation, and (subjectively) reads better even for other
one-line braced expressions like object constructors and object for
expressions.
We were taking a pointer to a for loop iterator variable and thus
capturing the final iteration value rather than each one separately. By
using the .Ptr() method instead, we force a copy of the range which we
then take a pointer to.
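The copy-forcing trick looks like this in isolation. Range here is a
hypothetical two-field struct rather than the real range type, and
collect is made up for the example. Note that Go versions before 1.22
reuse a single loop variable across iterations, which is what made
taking its address directly unsafe.

```go
package main

import "fmt"

// Range is a hypothetical, simplified source range.
type Range struct {
	Start, End int
}

// Ptr returns a pointer to a copy of the range. The receiver is passed by
// value, so every call yields a pointer to a distinct Range.
func (r Range) Ptr() *Range {
	return &r
}

// collect gathers a pointer per element without ever taking the address of
// the loop variable itself.
func collect(ranges []Range) []*Range {
	out := make([]*Range, 0, len(ranges))
	for _, rng := range ranges {
		out = append(out, rng.Ptr())
	}
	return out
}

func main() {
	ptrs := collect([]Range{{1, 2}, {3, 4}})
	fmt.Println(*ptrs[0], *ptrs[1])
}
```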
This was implemented a long time ago in the original template parser, but
it was missed in the rewrite of the template parser to make it use a
two-stage parsing strategy.
It's implemented as a post-processing step on the result of the first
stage of parsing, which produces a flat sequence of literal strings,
interpolation markers, and control markers, and prior to the second stage
which matches opening and closing control markers to produce an expression
AST.
It's important to do this at parse time rather than eval time since it is
the static layout of the source code that decides the indentation level,
and so an interpolation marker at the start of a line that itself produces
spaces does not affect the result.
The evaluation of this was there but the parsing was still a TODO comment
from early development. Whoops!
Fortunately the existing parser functionality makes this straightforward
since we can just have the traversal parser recursively call itself.
This fixes #63.
When a JSON object is representing an expression, template sequences are
permitted in the property names as well as the values. We must detect
the references here so that applications that do dynamic scope
construction or dependency analysis will get the right result.
Although our API had a place to provide a start position for scanning, it
didn't actually work in practice because the scanner wasn't aware of it
and so it would immediately undo the effect of that start offset when
making the first position adjustment.
Now we'll remember the byte offset we started at and offset the indices
the generated scanner produces so that they are treated as relative
to that start byte instead of byte zero.
Since we rarely start with a non-zero pos this doesn't affect much, but
one specific thing it affects is the positions of native syntax templates
inside JSON syntax strings.
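The offsetting can be illustrated with a toy scanner. Everything in
this Go sketch (Token, scanWords, the whitespace-splitting rule) is
hypothetical; it only demonstrates adding the start byte to indices that
are otherwise relative to the fragment being scanned.

```go
package main

import "fmt"

// Token is a hypothetical token whose Start is a byte offset into the
// whole file, not into the fragment that was scanned.
type Token struct {
	Text  string
	Start int
}

// scanWords splits fragment on spaces and newlines. Indices found while
// scanning are relative to fragment, so startByte is added to each one.
func scanWords(fragment []byte, startByte int) []Token {
	var toks []Token
	i := 0
	for i < len(fragment) {
		if fragment[i] == ' ' || fragment[i] == '\n' {
			i++
			continue
		}
		j := i
		for j < len(fragment) && fragment[j] != ' ' && fragment[j] != '\n' {
			j++
		}
		toks = append(toks, Token{Text: string(fragment[i:j]), Start: startByte + i})
		i = j
	}
	return toks
}

func main() {
	file := []byte("hello big world")
	// Scan only the part of the file that starts at byte 6.
	for _, tok := range scanWords(file[6:], 6) {
		fmt.Printf("%q at byte %d\n", tok.Text, tok.Start)
	}
}
```

With the offset applied, each reported position can be used to slice
the original file directly.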