This accompanies ExprList, ExprMap, and AbsTraversalForExpr to
complete the set of static analysis interfaces for digging down into the
expression syntax structures without evaluation.
The intent of this function is to be a little like AbsTraversalForExpr
but for function calls. However, it's also similar to ExprList in that
it gives access to the raw expression objects for the arguments, allowing
for recursive analysis.
We recognize and allow naked $ and % sequences by reading ahead one more
character to see if it's a "{" that would introduce an interpolation or
control sequence.
Unfortunately this is problematic in the end condition because it can
"eat" the terminating character and cause the scanner to continue parsing
a template when the user intended the template to end.
Handling this is a bit messy. For the quoted and heredoc situations we
can use Ragel's fhold statement to "backtrack" to before the character
we consumed, which does the trick. For bare templates this is insufficient
because there _is_ no following character and so the scanner detects this
as an error.
Rather than adding even more complexity to the state machine, we instead
special-case invalid bytes at the top level of a bare template, returning
them as a TokenStringLit instead of a TokenInvalid. This then gives the
parser what it needs.
The fhold approach causes some odd behavior where an escaped template
introducer character causes a token split and two tokens are emitted
instead of one. This is weird but harmless, since we'll ultimately just
concatenate all of these strings together anyway, and so we allow it
again to avoid making the scanner more complex when it's easy enough to
handle this in the parser where we have more context.
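The one-character lookahead can be sketched in plain Go, outside of Ragel.
This is only an illustration of the idea, not the scanner's actual state
machine: findIntroducer is a hypothetical helper, and escape sequences are
deliberately left out.

```go
package main

import "fmt"

// findIntroducer returns the byte offset of the first "${" or "%{" in s,
// or -1 if there is none. A '$' or '%' that is not followed by '{',
// including one at the very end of the input, is treated as literal text.
// Escaped introducers are not handled in this sketch.
func findIntroducer(s string) int {
	for i := 0; i < len(s); i++ {
		if s[i] != '$' && s[i] != '%' {
			continue
		}
		// Look ahead one byte, guarding against end of input so that a
		// trailing naked '$' or '%' is not treated as an error.
		if i+1 < len(s) && s[i+1] == '{' {
			return i
		}
	}
	return -1
}

func main() {
	inputs := []string{"cost: $5 and 100%", "hello ${name}", "50%{ if x }", "ends with $"}
	for _, s := range inputs {
		fmt.Printf("%q -> %d\n", s, findIntroducer(s))
	}
}
```

The bounds check before the lookahead is the part that corresponds to the
bare-template end condition described above.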
This was allowed in legacy HCL, and although it was never documented as
usable in the Terraform documentation it appears that some Terraform
configurations use this form anyway.
While it is non-ideal to have another edge-case to support/maintain, this
capability adds no ambiguity and doesn't add significant complexity, so
we'll allow it to be pragmatic for existing usage.
Terraform allowed indexing like foo.0.bar to work around HIL limitations,
and so we'll permit that as a pragmatic way to accept existing Terraform
configurations.
However, we can't support this fully because our parser thinks that
chained number indexes, like foo.0.0.bar, are single numbers. Since that
usage in Terraform is very rare (there are very few lists of lists) we
will mark that situation as an error with a helpful message suggesting
to use the modern index syntax instead.
This also turned up a similar bug in the existing legacy index handling
we were doing for splat expressions, which is now handled in the same
way.
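A rough sketch of the check, assuming the scanner hands us the text of the
number token that follows the dot. legacyIndex is a hypothetical name and
the error wording is illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// legacyIndex interprets the number literal that follows a dot in a
// legacy-style traversal such as foo.0.bar. When the scanner has read
// "0.0" as a single number token, the literal contains a dot, so we
// reject it with a hint rather than guessing what was meant.
func legacyIndex(numLit string) (int, error) {
	if strings.Contains(numLit, ".") {
		return 0, fmt.Errorf(
			"invalid legacy index %q: chained numeric indexes are not supported; use index syntax instead, like foo[0][0]",
			numLit,
		)
	}
	n, err := strconv.Atoi(numLit)
	if err != nil {
		return 0, fmt.Errorf("invalid legacy index %q: %w", numLit, err)
	}
	return n, nil
}

func main() {
	for _, lit := range []string{"0", "12", "0.0"} {
		n, err := legacyIndex(lit)
		fmt.Println(lit, n, err)
	}
}
```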
We are leaning on the unicode identifier definitions here, but the
specified ID_Start does not include the underscore character and users
seem to expect this to be allowed due to experience with other languages.
Since allowing a leading underscore introduces no ambiguity, we'll allow
it. Calling applications may choose to reject it if they'd rather not have
such weird names.
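As an approximation of the rule, using only the tables in Go's unicode
package. The real definitions come from the scanner's generated tables;
isIDStart, isIDContinue and validName are hypothetical names, and this
sketch covers no additions other than the underscore.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIDStart approximates Unicode ID_Start and additionally accepts '_'.
func isIDStart(r rune) bool {
	return r == '_' ||
		unicode.IsLetter(r) ||
		unicode.In(r, unicode.Nl, unicode.Other_ID_Start)
}

// isIDContinue approximates Unicode ID_Continue. The underscore is already
// covered here by the Pc (connector punctuation) category.
func isIDContinue(r rune) bool {
	return isIDStart(r) ||
		unicode.In(r, unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc, unicode.Other_ID_Continue)
}

// validName reports whether s starts with an ID_Start character (or '_')
// and continues with ID_Continue characters only.
func validName(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if !isIDStart(r) {
				return false
			}
			continue
		}
		if !isIDContinue(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"_foo", "foo_bar2", "2foo", ""} {
		fmt.Printf("%q valid=%v\n", s, validName(s))
	}
}
```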
Previously we missed the '%' character in our "SelfToken" production,
which meant that the modulo operator could not parse properly due to it
being represented as a TokenInvalid.
Due to some earlier limitations of the parser we required each attribute
and block to end with a newline, even if it appeared at the end of a
file. In effect, this required all files to end with a newline character.
This is no longer required and so we'll tolerate that missing newline for
pragmatic reasons.
Elsewhere we are using 512-bit precision as the standard for converting
from a string to a number, since the default precision is lower. This is just to
unify JSON parsing with the native syntax processing and the automatic
type conversions in the language, so we don't see different precision
behaviors depending on syntax.
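To see why the precision matters, here is a small sketch with math/big.
parseNumber and parseDefault are hypothetical helpers; only the 512-bit
figure comes from the text above.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseNumber converts a decimal string to a big.Float at 512-bit precision.
func parseNumber(s string) (*big.Float, error) {
	f, _, err := big.ParseFloat(s, 10, 512, big.ToNearestEven)
	return f, err
}

// parseDefault parses into a zero-value big.Float, which takes on the
// package's default precision of 64 bits.
func parseDefault(s string) (*big.Float, bool) {
	return new(big.Float).SetString(s)
}

func main() {
	wide, err := parseNumber("0.1")
	if err != nil {
		panic(err)
	}
	narrow, ok := parseDefault("0.1")
	if !ok {
		panic("could not parse number")
	}
	// The two results are different roundings of 0.1, so they do not
	// compare as equal even though they came from the same string.
	fmt.Println(wide.Prec(), narrow.Prec(), wide.Cmp(narrow) == 0)
}
```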
big.Float is not DeepEqual-friendly because it contains a precision value
that can make two numerically-equal values appear as non-equal.
Since the number decoding isn't the point of these tests, instead we just
swap out for cty.Bool values which _are_ compatible with
reflect.DeepEqual, since they are just wrappers around the native bool
type.
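The mismatch is easy to reproduce with the standard library alone. In this
sketch one, sameNumber and deepEqual are hypothetical helpers, and plain
bool values stand in for the cty.Bool wrappers.

```go
package main

import (
	"fmt"
	"math/big"
	"reflect"
)

// one returns the number 1 stored at the given precision.
func one(prec uint) *big.Float {
	return new(big.Float).SetPrec(prec).SetInt64(1)
}

// sameNumber compares numerically, ignoring precision.
func sameNumber(a, b *big.Float) bool {
	return a.Cmp(b) == 0
}

// deepEqual compares structurally, including the stored precision.
func deepEqual(a, b interface{}) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	a, b := one(512), one(64)
	fmt.Println("numerically equal:", sameNumber(a, b))
	fmt.Println("DeepEqual:", deepEqual(a, b))
	fmt.Println("DeepEqual on bools:", deepEqual(true, true))
}
```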
The contract for AbsTraversalForExpr calls for us to interpret an
expression as if it were traversal syntax. Traversal syntax does not have
the special keywords "null", "true" and "false", so we must interpret
these as TraverseRoot rather than as literal values.
Previously this wasn't working because the parser converted these to
literals too early. To make this work properly, we implement
AbsTraversalForExpr on literal expressions and effectively "undo" the
parser's re-interpretation of these keywords to back out to the original
keyword strings.
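The "undo" step amounts to mapping a literal value back to the keyword it
was parsed from. A toy sketch follows; literalExpr and rootName are
stand-ins invented for illustration and are not the real AST types.

```go
package main

import "fmt"

// literalExpr is a toy stand-in for a parsed literal expression.
type literalExpr struct {
	val interface{} // nil, a bool, or some other literal value
}

// rootName recovers the keyword a literal was parsed from so that it can
// be used as the root name of a traversal. It reports false for literals
// that did not come from one of the three keywords.
func (e literalExpr) rootName() (string, bool) {
	switch v := e.val.(type) {
	case nil:
		return "null", true
	case bool:
		if v {
			return "true", true
		}
		return "false", true
	default:
		return "", false
	}
}

func main() {
	for _, e := range []literalExpr{{val: nil}, {val: true}, {val: false}, {val: 12}} {
		name, ok := e.rootName()
		fmt.Printf("%v -> %q %v\n", e.val, name, ok)
	}
}
```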
We also rework how object keys are handled so that we wait until eval time
to decide whether to interpret the key expression as an unquoted literal
string. This allows us to properly support AbsTraversalForExpr on keys
in object constructors, bypassing the string-interpretation behavior in
that case.
This is similar to the ExprList function but for map-like constructs
(object constructors in the native syntax). It allows a more-advanced
calling application to analyze the physical structure of the configuration
directly, rather than analyzing the dynamic results of its expressions.
This is useful when creating what appear to be first-class language
constructs out of the language's grammar elements.
In the JSON syntax, a static map construct is expressed as a direct JSON
object. As with ExprList, this bypasses any dynamic expression evaluation
behavior and requires the user to provide a literal JSON object, though
the calling application is then free to evaluate the key/value expressions
inside in whatever way makes sense.
Previously this was handled in the parser, but the parser now permits
multiple properties with the same name and so we must handle this at the
decoder level instead.
Previously we required optional attributes to be specified as pointers so
that we could represent the empty vs. absent distinction.
For applications that don't need to make that distinction, representing
"optional" as a struct tag is more convenient.
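A sketch of how a decoder might read both forms via reflection. The tag
layout shown (name, then a comma-separated "optional" flag) and the
optionalAttrs helper are assumptions made for illustration, not the
decoder's actual code.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// config shows both ways of marking an attribute as optional.
type config struct {
	Name        string  `hcl:"name"`
	Description *string `hcl:"description"`   // optional: pointer, nil when absent
	Port        int     `hcl:"port,optional"` // optional: flagged in the tag
}

// optionalAttrs maps each tagged attribute name to whether it is optional.
func optionalAttrs(v interface{}) map[string]bool {
	out := map[string]bool{}
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return out
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("hcl")
		if !ok {
			continue
		}
		parts := strings.Split(tag, ",")
		optional := f.Type.Kind() == reflect.Ptr
		for _, flag := range parts[1:] {
			if flag == "optional" {
				optional = true
			}
		}
		out[parts[0]] = optional
	}
	return out
}

func main() {
	fmt.Println(optionalAttrs(config{}))
}
```

With the tag form, an absent attribute simply leaves the field at its zero
value, which is why it cannot distinguish empty from absent.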
Previously we allowed arrays only at the "leaf" of a set of objects
describing a block and its labels. This is not sufficient because it is
therefore impossible to preserve the relative ordering of a sequence
of blocks that have different block types or labels.
The spec now allows arrays of objects to be used in place of single
objects when that value is representing either an HCL body or a set of
labels on a nested block. This relaxing does not apply to JSON objects
interpreted as expressions or bodies interpreted in dynamic attributes
mode, since there is no requirement to preserve attribute ordering or
support duplicate property names in those scenarios.
This new model imposes additional constraints on the underlying JSON
parser used to interpret JSON HCL: it must now be able to retain the
relative ordering of object keys and accept multiple definitions of the
same key. This requirement is not imposed on _producers_, which are free
to use the allowance for arrays of objects to force ordering and duplicate
keys with JSON-producing libraries that are unable to make these
distinctions.
Since we are now requiring a specialized parser anyway, we also require
that it be able to represent numbers at full precision, whereas before
we made some allowances for implementations to not support this.
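To illustrate the parser requirement, the token-level API of Go's
encoding/json can read an object while keeping key order and duplicates,
which decoding into a map cannot do. This is a sketch for one flat object
only; property and orderedProperties are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// property is one name/value pair, in source order.
type property struct {
	Name  string
	Value json.RawMessage // raw text, so numbers keep their full precision
}

// orderedProperties reads a single JSON object, retaining the order of
// its properties and any repeated names.
func orderedProperties(src string) ([]property, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected a JSON object")
	}
	var props []property
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		name, ok := keyTok.(string)
		if !ok {
			return nil, fmt.Errorf("expected a property name")
		}
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return nil, err
		}
		props = append(props, property{Name: name, Value: raw})
	}
	// Consume the closing brace.
	if _, err := dec.Token(); err != nil {
		return nil, err
	}
	return props, nil
}

func main() {
	props, err := orderedProperties(`{"a": 1, "b": 2, "a": 3}`)
	if err != nil {
		panic(err)
	}
	for _, p := range props {
		fmt.Println(p.Name, string(p.Value))
	}
}
```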
The peeker has an "include newlines" stack which the parser manipulates
to switch between the newline-sensitive and non-sensitive scanning modes.
If the parser code fails to manage this stack correctly (for example,
due to a missed call to PopIncludeNewlines) then this causes very
confusing downstream errors that are otherwise difficult to debug.
As an extra debug tool for when errors _are_ detected, when this problem
is encountered during tests we are able to produce a visualization of the
pushes and pops to help the test developer see which pushes and pops
seem out of place.
This is a lot of ugly extra code but it's usually disabled and seems worth
it to allow us to quickly catch bugs that would otherwise be quite
difficult to diagnose.
Previously it was mismanaging the stack by first pushing on "false" and
then trying to undo that by pushing on "true". Instead, it should just
pop off the "false" to return to whatever the previous setting was, since
indexing brackets might already be inside a no-newlines context.
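The difference between the two behaviors shows up as soon as the brackets
are nested inside another no-newlines context. In this simplified sketch
only PopIncludeNewlines is named in the text above; the peeker type, the
other methods and parseIndexBrackets are assumptions.

```go
package main

import "fmt"

// peeker is a simplified stand-in holding only the "include newlines" stack.
type peeker struct {
	includeNewlines []bool
}

// newPeeker starts in the newline-sensitive mode.
func newPeeker() *peeker {
	return &peeker{includeNewlines: []bool{true}}
}

func (p *peeker) PushIncludeNewlines(include bool) {
	p.includeNewlines = append(p.includeNewlines, include)
}

func (p *peeker) PopIncludeNewlines() bool {
	last := len(p.includeNewlines) - 1
	v := p.includeNewlines[last]
	p.includeNewlines = p.includeNewlines[:last]
	return v
}

// IncludingNewlines reports the mode currently in effect.
func (p *peeker) IncludingNewlines() bool {
	return p.includeNewlines[len(p.includeNewlines)-1]
}

// parseIndexBrackets simulates parsing "[...]". With buggy set it
// reproduces the old behavior of pushing "true" instead of popping.
func parseIndexBrackets(p *peeker, buggy bool) {
	p.PushIncludeNewlines(false)
	// ... the index expression would be parsed here ...
	if buggy {
		p.PushIncludeNewlines(true) // wrong: forces newline-sensitive mode
		return
	}
	p.PopIncludeNewlines() // right: restores whatever was in effect before
}

func main() {
	for _, buggy := range []bool{true, false} {
		p := newPeeker()
		p.PushIncludeNewlines(false) // e.g. already inside parentheses
		parseIndexBrackets(p, buggy)
		fmt.Printf("buggy=%v includes newlines afterwards: %v\n", buggy, p.IncludingNewlines())
	}
}
```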
We were previously using an ugly combination of "pretty" and "spew" to
do this, which never really quite worked because of limitations in each
of those.
deep.Equal doesn't produce quite as much detailed information as the
others, but it has the advantage of showing exactly where a difference
exists rather than forcing us to hunt through a noisy diff to find it.
Fuzz testing revealed that there were a few different crashers in the
string literal decoder, which was previously a rather unwieldy
hand-written scanner with manually-implemented lookahead.
Rather than continuing to hand-tweak that code, here instead we use
ragel (which we were already using for the main scanner anyway) to
partition our string literals into tokens that are easier for our
decoder to wrangle.
As a bonus, this also makes our source ranges in our diagnostics more
accurate.
Now that we have the necessary functions to deal with this in the
low-level HCL API, it's more intuitive to use bare identifiers for these
parameter names. This reinforces the idea that they are symbols being
defined rather than arbitrary string expressions.
In a few specific portions of the spec format it's convenient to have
access to some of the functions defined in the cty stdlib. Here we allow
them to be used when constructing the value for a "literal" spec and in
the result expression for a "transform" spec.
This new spec type allows evaluating an arbitrary expression on the
result of a nested spec, for situations where a value must be
transformed in some way.
This is essentially a CLI wrapper around the hcldec package, accepting a
decoding specification via a HCL-based language and using it to translate
input HCL files into JSON values while performing basic structural and
type validation of the input files.
A common pattern is emerging in calling applications of using single-item
absolute traversals to give the impression of static language keywords.
This new function makes that explicitly possible and allows a convenient
pattern for doing so that should improve the readability of a calling
application making use of it.
Calling applications often need to validate strings provided by the user
that will eventually be variable or attribute names in the evaluation
scope, to ensure that they will be evaluable.
Rather than having each application specify its own different subset of
the full set we support (which is derived from Unicode specifications),
we provide a simple function to let callers easily check the validity
of a potential identifier using exactly the same scanning rules we use
within the expression scanner.
To achieve this we actually invoke the scanner and then assert on its
result, which is a pretty expensive way to just check one string but it's
easy to do with code we already have in place and we don't expect this
sort of validation to be going on in a tight loop.
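The "scan it and assert on the result" approach can be sketched with Go's
text/scanner standing in for our own scanner. Note that this stand-in
follows Go's identifier rules, which differ from ours, and validIdent is a
hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// validIdent runs a scanner over s and accepts it only if the result is
// exactly one identifier token that covers the whole string.
func validIdent(s string) bool {
	var sc scanner.Scanner
	sc.Init(strings.NewReader(s))
	sc.Mode = scanner.ScanIdents
	sc.Error = func(*scanner.Scanner, string) {} // stay quiet on bad input

	if sc.Scan() != scanner.Ident {
		return false
	}
	if sc.TokenText() != s {
		return false // surrounding whitespace or trailing characters
	}
	return sc.Scan() == scanner.EOF
}

func main() {
	for _, s := range []string{"foo", "_foo", "2foo", "foo bar", " foo", ""} {
		fmt.Printf("%q valid=%v\n", s, validIdent(s))
	}
}
```

As noted above, this does far more work than a direct character check, but
it guarantees the answer matches what the scanner itself would accept.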
The readme was previously unclear about the fact that HCL is not a configuration language in itself but rather a toolkit for defining and parsing configuration languages.
It may still not be totally clear, but it is hopefully clearer than it was.
These allow the inclusion of arbitrary unicode codepoints (always encoded
as UTF-8) using a hex representation.
\u expects four digits and can thus represent only characters in the basic
multilingual plane.
\U expects eight digits and can thus represent all unicode characters,
at the cost of being extra-verbose.
Since our parser properly accounts for unicode characters (including
combining sequences) it's recommended to include them literally (UTF-8
encoded) in source code, but these sequences are useful for explicitly
representing non-printable characters that could otherwise appear
invisible in source code, such as zero-width modifier characters.
This fixes #6.
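A minimal sketch of decoding the two escape forms into UTF-8.
decodeUnicodeEscapes is a hypothetical helper, not the string literal
decoder itself; it handles only \u and \U and passes every other byte
through unchanged, including other backslash escapes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// decodeUnicodeEscapes replaces \uXXXX and \UXXXXXXXX sequences in s with
// the UTF-8 encoding of the codepoint they name.
func decodeUnicodeEscapes(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); {
		if s[i] != '\\' || i+1 >= len(s) || (s[i+1] != 'u' && s[i+1] != 'U') {
			b.WriteByte(s[i])
			i++
			continue
		}
		digits := 4 // \u: basic multilingual plane only
		if s[i+1] == 'U' {
			digits = 8 // \U: any codepoint
		}
		start := i + 2
		if start+digits > len(s) {
			return "", fmt.Errorf("escape at offset %d needs %d hex digits", i, digits)
		}
		n, err := strconv.ParseUint(s[start:start+digits], 16, 32)
		if err != nil {
			return "", fmt.Errorf("escape at offset %d has invalid hex digits", i)
		}
		if n > utf8.MaxRune || !utf8.ValidRune(rune(n)) {
			return "", fmt.Errorf("escape at offset %d is not a valid codepoint", i)
		}
		b.WriteRune(rune(n))
		i = start + digits
	}
	return b.String(), nil
}

func main() {
	out, err := decodeUnicodeEscapes(`zero-width space: [\u200b], emoji: \U0001F600`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```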
We inherited a restriction from an early zcl prototype here, but it's
far too strict to prohibit tabs entirely and so we'll accept them and
just treat them as spaces for column-counting purposes.
Tabs are still not _advised_, since they add extra complexity for problems
like generating annotated source code snippets (can't necessarily know
how large the tab stop is going to be) or doing surgical updates to
existing source files. The canonical formatting applied by hclwrite's
Format function will still eliminate all tabs, imposing the canonical
style of two spaces per indent level.
This fixes #2.
An earlier iteration of this package was able to optionally use HIL as
its expression engine in place of the hclsyntax expression parser, but
this has since been removed and so this flag no longer has any effect.
Consequently, the public functions ParseWithHIL and ParseFileWithHIL were,
in fact, just using the hclsyntax parser and thus behaving identically to
the Parse and ParseFile functions.
A pattern has emerged of wrapping Expression instances with other
Expressions in order to subtly modify their behavior. A key example of
this is in ext/dynblock, where we wrap an expression in order to introduce
our additional iteration variable for expressions in dynamic blocks.
Rather than having each wrapper expression implement wrapping
implementations for our various syntax-level-analysis functions (like
ExprList and AbsTraversalForExpr), instead we define a standard mechanism
to unwrap expressions back to the lowest-level object -- usually an AST
node -- and then use this in all of our analyses that look at the
expression's structure rather than its value.
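The mechanism can be sketched with toy types. Every name below is a
stand-in chosen for illustration; the real Expression interface is larger
and the wrapper in ext/dynblock does more than this one.

```go
package main

import "fmt"

// Expression is a toy stand-in for the real expression interface.
type Expression interface {
	Describe() string
}

// unwrapper is implemented by expressions that wrap another expression.
type unwrapper interface {
	UnwrapExpression() Expression
}

// astNode plays the role of a lowest-level syntax node.
type astNode struct{ src string }

func (n astNode) Describe() string { return "ast:" + n.src }

// withIterator wraps an expression to add an iteration variable, in the
// spirit of the dynamic block wrapper.
type withIterator struct {
	wrapped  Expression
	iterName string
}

func (w withIterator) Describe() string             { return w.wrapped.Describe() }
func (w withIterator) UnwrapExpression() Expression { return w.wrapped }

// unwrapExpression peels off wrappers until it reaches an expression that
// does not wrap anything, so structural analysis sees the innermost node.
func unwrapExpression(e Expression) Expression {
	for {
		u, ok := e.(unwrapper)
		if !ok {
			return e
		}
		inner := u.UnwrapExpression()
		if inner == nil {
			return e
		}
		e = inner
	}
}

func main() {
	inner := astNode{src: "var.items"}
	wrapped := withIterator{
		wrapped:  withIterator{wrapped: inner, iterName: "a"},
		iterName: "b",
	}
	fmt.Printf("%T\n", unwrapExpression(wrapped))
}
```

Analysis functions can then call the unwrap step once up front instead of
each wrapper type reimplementing every analysis.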
Terraform is the prime use-case for the dynblock extension, so we'll
include this here currently as a proof-of-concept for Terraform's usage,
but eventually (once Terraform is actually using it) this'll give some
insurance that it doesn't get broken.
For applications already using hcldec, a decoder specification can be used
to automatically drive the recursive variable detection walk that begins
with WalkForEachVariables, allowing all "for_each" and "labels" variables
in a recursive block structure to be detected in a single call.