When a test file declares one or more expected diagnostics, we check those
instead of checking the result value. The severities and source ranges
must match.
We don't test the error messages themselves because they are not part of
the specification and may vary between implementations or, in future, be
translated into other languages.
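As a rough sketch of that comparison: the types below are hypothetical
stand-ins rather than the harness's real ones, and comparing the two lists
position by position is an assumption.

```go
package main

import "fmt"

// Pos and Range are hypothetical stand-ins for source positions; they are
// not the harness's real types.
type Pos struct{ Line, Column, Byte int }
type Range struct{ Start, End Pos }

// Diagnostic is a hypothetical, simplified diagnostic record.
type Diagnostic struct {
	Severity string // "error" or "warning"
	Summary  string // message text, deliberately ignored when comparing
	Range    Range
}

// diagsMatch reports whether got and want agree on severity and source
// range, position by position. Summary is never compared.
func diagsMatch(got, want []Diagnostic) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range want {
		if got[i].Severity != want[i].Severity || got[i].Range != want[i].Range {
			return false
		}
	}
	return true
}

func main() {
	rng := Range{Start: Pos{1, 1, 0}, End: Pos{1, 4, 3}}
	want := []Diagnostic{{Severity: "error", Range: rng}}
	got := []Diagnostic{{Severity: "error", Summary: "Unsupported argument", Range: rng}}
	fmt.Println(diagsMatch(got, want))
}
```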
The harness can now run tests that decode successfully and compare the
result with a given value. Further work is required in later commits to
deal with other cases, such as tests that intentionally produce errors.
By default we generate human-readable diagnostics on the assumption that
the caller is a simple program that is capturing stdout via a pipe and
letting stderr go to the terminal.
More sophisticated callers may wish to analyze the diagnostics themselves
and perhaps present them in a different way, such as via a GUI.
This option skips the usual decoding step and instead prints out a JSON-
formatted list of the variables that are referenced by the configuration.
In simple cases this is not required, but for more complex use-cases it
can be useful to first analyze the input to see which variables need to
be in the scope, then construct a suitable set of variables before finally
decoding the input. For example, some of the variable values may be
expensive to produce.
This is the hcldec interface to Body.JustAttributes, producing a map whose
keys are the child attribute names and whose values are the results of
evaluating those expressions.
We can't just expose a JustAttributes-style spec directly here because
it's not really compatible with how hcldec thinks about things, but we
can expose a spec that decodes a specific child block because that can
then compose properly with other specs at the same level without
interfering with their operation.
The primary use for this is to allow the use of the block syntax to define
a map:
    dynamic_stuff {
      foo = "bar"
    }
JustAttributes is normally used in static analysis situations such as
enumerating the contents of a block to decide what to include in the
final EvalContext. That's not really possible with the hcldec model
because both structural decoding and expression evaluation happen
together. Therefore the use of this is pretty limited: it's useful if you
want to be compatible with an existing format based on legacy HCL where a
map was conventionally defined using block syntax, relying on the fact
that HCL did not make a strong distinction between attribute and block
syntax.
For now, this is the only way to set an attribute, and so attributes can
only be set to literal values.
Later this will be generalized so that this method becomes a helper
wrapper around a "SetAttribute" method that accepts an arbitrary
expression: the helper will construct an expression from the given value
and then pass it along.
The original prototype of hclwrite tried to track both the tokens and
the AST as two parallel data structures. This quickly exploded in
complexity, leading to lots of messy code to manage keeping those two
structures in sync.
This new approach melds the two structures together, creating first a
physical token tree (made of "node" objects, and hidden from the caller)
and then attaching the AST nodes to that token tree as additional sidecar
data.
The result is much easier to work with, leading to less code in the parser
and considerably less complex data structures in the parser's tests.
This commit is enough to reach feature parity with the previous prototype,
but it remains a prototype. With a more usable foundation, we'll evolve
this into a more complete implementation in subsequent commits.
We previously weren't returning appropriate Expression and EvalContext
references inside many of the diagnostics for ForExpr.
First, it was using the top-level expression instead of one of the nested
expressions in many cases. Secondly, it was using the given context
rather than the child context when talking about expressions that get
evaluated once per iteration.
As a result of this reporting we must now produce a new EvalContext for
each iteration, rather than sharing and mutating as we did before, but
in retrospect that's less likely to cause other confusing bugs anyway,
since we don't generally expect EvalContexts to be mutated.
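A minimal sketch of the per-iteration child context idea follows. The
Context type and its methods are hypothetical simplifications for
illustration, not the real EvalContext API.

```go
package main

import "fmt"

// Context is a hypothetical, simplified stand-in for an evaluation context:
// a set of variables plus an optional parent scope.
type Context struct {
	Variables map[string]string
	parent    *Context
}

// NewChild returns a fresh context whose lookups fall back to c.
func (c *Context) NewChild() *Context {
	return &Context{Variables: map[string]string{}, parent: c}
}

// Lookup searches this context and then its ancestors.
func (c *Context) Lookup(name string) (string, bool) {
	for ctx := c; ctx != nil; ctx = ctx.parent {
		if v, ok := ctx.Variables[name]; ok {
			return v, true
		}
	}
	return "", false
}

// iterContexts builds one child context per element, so anything that
// retains a context (such as a diagnostic) keeps that iteration's values.
func iterContexts(parent *Context, name string, items []string) []*Context {
	ctxs := make([]*Context, 0, len(items))
	for _, item := range items {
		child := parent.NewChild()
		child.Variables[name] = item
		ctxs = append(ctxs, child)
	}
	return ctxs
}

func main() {
	root := &Context{Variables: map[string]string{"prefix": "p"}}
	for _, ctx := range iterContexts(root, "v", []string{"a", "b"}) {
		v, _ := ctx.Lookup("v")
		p, _ := ctx.Lookup("prefix")
		fmt.Println(p, v)
	}
}
```

Because each iteration gets its own child, the parent context is never
mutated and earlier iterations are not overwritten by later ones.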
In practice this should never arise because the index operator only works
for lists and maps, which use number and string keys respectively, but
we'll guard against this anyway and return a placeholder for other values
so that the output doesn't grow unreadably long in that case.
If a diagnostic has an associated Expression and EvalContext then we can
look up the values of any variables referenced in the expression and show
them in the diagnostics message as additional context.
This is particularly useful when dealing with situations where a given
expression is evaluated multiple times with different variables, such as
in a 'for' expression, since each evaluation may produce a different set
of diagnostics.
If a diagnostic occurs while we're evaluating an expression, we'll now
include a reference to that expression in the diagnostic object. We
previously added the corresponding EvalContext here too, and so with these
together it is now possible for a diagnostic renderer to see not only
what was in scope when the problem occurred but also what parts of that
scope the expression was relying on (via method Expression.Variables).
When building tools around HCL configuration files it is useful to be
able to ask what is present at a given position in a file. This set of
new helper functions provides a best-effort implementation of this for
the native syntax only.
It cannot be supported for JSON syntax with these signatures because the
JSON syntax is ambiguous and thus can't be interpreted without a schema
for each structural level. In practice this is not a big loss because
JSON files will usually be generated rather than hand-written anyway, and
so doing automatic analysis and transformation of them would not be
useful: the program that generated the file must be updated instead.
When we're evaluating expressions, we may end up evaluating the same
source-level expression a number of times in different contexts, such as
in a 'for' expression, where each one may produce a different set of
diagnostic messages.
Now we'll attach the EvalContext to each expression diagnostic so that
a diagnostic renderer can potentially show additional information to help
distinguish the different iterations in rendered diagnostics.
During implementation of HCL in other applications, it became clear that
the overloading of the word "attribute" to mean both a key/value pair in
a body and an element within an object value creates confusion.
It's too late to change that in the HCL Go API now, but here we at least
update the diagnostic messages. The new convention is that a key/value
pair within a block is now called an "argument", while an element of an
object is still called an "attribute".
It is unfortunate that the Go-facing API still uses the word "attribute"
for both, but the user experience is the most important thing and in
practice many applications will treat block arguments as one way to set
the attributes of some object anyway, and in that case arguments can be
thought of as the subset of attributes of an object whose values come
from that object's associated block.
This also includes a few other minor terminology tweaks in the diagnostic
messages that reflect how our lexicon has evolved during development and
authoring of user-facing documentation.
This will allow for use-cases such as renaming a variable (changing the
content of the first token) and replacing variable references with
constant values that they evaluate to for debug purposes.
This is for the core HCL syntax, so it doesn't include any
application-specific keyword highlighting, etc.
The structural, expression, and template languages are separated into
different grammar definitions so that they can be used independently, but
they embed each other as needed to complete the language.
This is just a first pass, really. There are probably some bugs here, and
also some missing features.
Previously this implementation was doing only one level of recursion in
its walk, which gave the appearance of working until the
transform/container-type specs (DefaultSpec, TransformSpec, ...) were
introduced, creating the possibility of "same body children" being more
than one level away from the initial spec.
It's still correct to only process the schema and content once, because
ImpliedSchema is already collecting all of the requirements from the
"same body children", and so our content object will include everything
that the nested specs should need to analyze needed variables.
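The difference between a one-level walk and a full walk can be sketched as
below. The Spec type here is a hypothetical simplification with an explicit
Children field; the real spec types are more varied than this.

```go
package main

import "fmt"

// Spec is a hypothetical, simplified spec node. Children holds the nested
// specs that decode from the same body as their parent.
type Spec struct {
	Name     string
	Children []*Spec
}

// walkSameBody visits s and every "same body child" at any depth, rather
// than stopping after the first level of nesting.
func walkSameBody(s *Spec, visit func(*Spec)) {
	visit(s)
	for _, child := range s.Children {
		walkSameBody(child, visit)
	}
}

func main() {
	// A wrapper around a wrapper around an attribute: the attribute is two
	// levels away from the initial spec.
	root := &Spec{Name: "default", Children: []*Spec{
		{Name: "transform", Children: []*Spec{{Name: "attr"}}},
	}}
	walkSameBody(root, func(s *Spec) { fmt.Println(s.Name) })
}
```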
This is another heuristic because the "[" syntax is also the tuple
constructor start marker, but this takes care of the common cases of
indexing keywords and bracketed expressions.
This fixes #29.
We automatically convert from set to list in many other situations, so for
consistency we should accept sets here too and just treat them as
unordered sequences.
This closes #30.
Due to the special handling of the anonymous symbol employed to evaluate
a splat expression, we need to employ a lock on that symbol so that it's
safe for concurrent evaluation.
As before, it's not safe to concurrently evaluate the same expression in
the same context, but it is now safe to do so as long as all concurrent
evaluations have a _distinct_ EvalContext.
This fixes #28.
Previously it was not implementing the two optional interfaces required
for this, and so decoding would fail for any AttrSpec or block spec nested
inside.
Now it passes through attribute requirements from both the primary and
default, and passes block requirements only from the primary, thus
allowing either fallback between two attributes, fallback from an
attribute to a constant, or fallback from a block to a constant. Other
permutations are also possible, but not very important.
Previously we were only counting a \n as starting a new line, so input
using \r\n endings would get treated as one long line for source-range
purposes.
Now we also consider \r\n to be a newline marker, resetting the column
count to zero and incrementing the line just as we would do for a single
\n. This is made easier because the Unicode definition of "grapheme
cluster" considers \r\n to be a single character, so we don't need to
do anything special in order to match it.
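The counting rule can be sketched as follows. This is a simplification that
walks bytes and assumes ASCII input, rather than segmenting grapheme
clusters as described above, and the Pos type is hypothetical.

```go
package main

import "fmt"

// Pos is a hypothetical position: Line is 1-based, and Column counts the
// characters already consumed on the current line (so it resets to zero).
type Pos struct{ Line, Column int }

// advance returns the position after consuming all of src. Both "\n" and
// "\r\n" count as a single newline. A lone "\r" is treated as an ordinary
// character.
func advance(src string) Pos {
	pos := Pos{Line: 1, Column: 0}
	for i := 0; i < len(src); i++ {
		switch {
		case src[i] == '\r' && i+1 < len(src) && src[i+1] == '\n':
			i++ // consume the "\n" half of the pair as well
			pos.Line++
			pos.Column = 0
		case src[i] == '\n':
			pos.Line++
			pos.Column = 0
		default:
			pos.Column++
		}
	}
	return pos
}

func main() {
	fmt.Println(advance("a = 1\nb = 2"))
	fmt.Println(advance("a = 1\r\nb = 2"))
}
```

Both calls in main end on the same line and column, which is the point of
the change.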
Previously, due to how heredoc scanning was implemented, the closing
marker for a heredoc would consume the newline that terminated it. This
was problematic in any context that is newline-sensitive, because it
would cause us to skip the TokenNewline that might terminate e.g. an
attribute definition:
    foo = <<EOT
    hello
    EOT
    bar = "hello"
Previously the "foo" attribute would fail to parse properly due to trying
to consume the "bar" definition as part of its expression.
Now we synthetically split the marker token into two parts: the marker
itself and the newline that follows it. This means that using a heredoc
in any newline-sensitive context will involuntarily introduce a newline,
but that seems consistent with user expectation based on how heredocs
seem to be used "in the wild".
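A sketch of the split, using a hypothetical Token type with string type
names. The name "TokenCHeredoc" for the closing marker is an assumption
made for this example; only TokenNewline is named in the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical, simplified token.
type Token struct {
	Type  string
	Bytes string
}

// splitHeredocClose splits a closing-marker token that swallowed its
// trailing newline into the marker and a separate newline token. Any other
// token is returned unchanged.
func splitHeredocClose(tok Token) []Token {
	if tok.Type != "TokenCHeredoc" || !strings.HasSuffix(tok.Bytes, "\n") {
		return []Token{tok}
	}
	nl := "\n"
	if strings.HasSuffix(tok.Bytes, "\r\n") {
		nl = "\r\n"
	}
	marker := strings.TrimSuffix(tok.Bytes, nl)
	return []Token{
		{Type: "TokenCHeredoc", Bytes: marker},
		{Type: "TokenNewline", Bytes: nl},
	}
}

func main() {
	for _, tok := range splitHeredocClose(Token{Type: "TokenCHeredoc", Bytes: "EOT\n"}) {
		fmt.Printf("%s %q\n", tok.Type, tok.Bytes)
	}
}
```

With the newline available as its own token, the parser can use it to end
the attribute definition in the usual way.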
This uses the expression static analysis features to interpret
a combination of static calls and static traversals as the description
of a type.
This is intended for situations where applications need to accept type
information from their end-users, providing a concise syntax for doing
so.
Since this is implemented using static analysis, the type vocabulary is
constrained only to keywords representing primitive types and type
construction functions for complex types. No other expression elements
are allowed.
A separate function is provided for parsing type constraints, which allows
the additional keyword "any" to represent the dynamic pseudo-type.
Finally, a helper function is provided to convert a type back into a
string representation resembling the original input, as an aid to
applications that need to produce error messages relating to user-entered
types.
Implementing the config loader for Terraform led to the addition of some
special static analysis operations for expressions, separate from the
usual action of evaluating an expression to produce a value.
These operations are useful for building application-specific language
constructs within HCL syntax, and so they are now included as part of the
specification in order to help developers of other applications understand
their behaviors and the implications of using them.
This accompanies ExprList, ExprMap, and AbsTraversalForExpr to
complete the set of static analysis interfaces for digging down into the
expression syntax structures without evaluation.
The intent of this function is to be a little like AbsTraversalForExpr
but for function calls. However, it's also similar to ExprList in that
it gives access to the raw expression objects for the arguments, allowing
for recursive analysis.
We recognize and allow naked $ and % sequences by reading ahead one more
character to see if it's a "{" that would introduce an interpolation or
control sequence.
Unfortunately this is problematic in the end condition because it can
"eat" the terminating character and cause the scanner to continue parsing
a template when the user intended the template to end.
Handling this is a bit messy. For the quoted and heredoc situations we
can use Ragel's fhold statement to "backtrack" to before the character
we consumed, which does the trick. For bare templates this is insufficient
because there _is_ no following character and so the scanner detects this
as an error.
Rather than adding even more complexity to the state machine, instead we
just handle as a special case invalid bytes at the top-level of a bare
template, returning them as a TokenStringLit instead of a TokenInvalid.
This then gives the parser what it needs.
The fhold approach causes some odd behavior where an escaped template
introducer character causes a token split and two tokens are emitted
instead of one. This is weird but harmless, since we'll ultimately just
concatenate all of these strings together anyway, and so we allow it,
again to avoid making the scanner more complex when it's easy enough to
handle this in the parser where we have more context.
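The one-character read-ahead for naked $ and % can be illustrated with a
toy scanner. This is not the Ragel state machine and it ignores escapes
and nesting; it only shows that a $ or % is treated as an introducer when
a "{" follows, and stays literal otherwise, including at the very end of
the input. The function name is hypothetical.

```go
package main

import "fmt"

// splitTemplate splits src into literal runs and "${" / "%{" introducers.
// A '$' or '%' that is not followed by '{' stays part of the literal text.
func splitTemplate(src string) []string {
	var parts []string
	start := 0
	for i := 0; i < len(src); i++ {
		isSigil := src[i] == '$' || src[i] == '%'
		if isSigil && i+1 < len(src) && src[i+1] == '{' {
			if i > start {
				parts = append(parts, src[start:i])
			}
			parts = append(parts, src[i:i+2])
			i++ // skip the '{' we peeked at
			start = i + 1
		}
	}
	if start < len(src) {
		parts = append(parts, src[start:])
	}
	return parts
}

func main() {
	fmt.Printf("%q\n", splitTemplate("cost: $5 ${"))
	fmt.Printf("%q\n", splitTemplate("100%"))
}
```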
This was allowed in legacy HCL, and although it was never documented as
usable in the Terraform documentation it appears that some Terraform
configurations use this form anyway.
While it is non-ideal to have another edge-case to support/maintain, this
capability adds no ambiguity and doesn't add significant complexity, so
we'll allow it to be pragmatic for existing usage.
Terraform allowed indexing like foo.0.bar to work around HIL limitations,
and so we'll permit that as a pragmatic way to accept existing Terraform
configurations.
However, we can't support this fully because our parser thinks that
chained number indexes, like foo.0.0.bar, are single numbers. Since that
usage in Terraform is very rare (there are very few lists of lists) we
will mark that situation as an error with a helpful message suggesting
to use the modern index syntax instead.
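A sketch of the check: given the text of the number literal that follows a
dot, accept a plain integer and reject anything containing a further dot.
The function name and the error wording are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// legacyIndex interprets the number literal that follows a "." in a legacy
// traversal such as foo.0.bar. A literal like "0.0" is what a scanner
// produces for the chained form foo.0.0.bar, so it is rejected with a hint.
func legacyIndex(lit string) (int, error) {
	if strings.Contains(lit, ".") {
		return 0, errors.New("chained legacy indexes are not supported; use index syntax like foo[0][0] instead")
	}
	n, err := strconv.Atoi(lit)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid legacy index %q", lit)
	}
	return n, nil
}

func main() {
	fmt.Println(legacyIndex("0"))
	fmt.Println(legacyIndex("0.0"))
}
```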
This also turned up a similar bug in the existing legacy index handling
we were doing for splat expressions, which is now handled in the same
way.
We are leaning on the Unicode identifier definitions here, but the
specified ID_Start does not include the underscore character and users
seem to expect this to be allowed due to experience with other languages.
Since allowing a leading underscore introduces no ambiguity, we'll allow
it. Calling applications may choose to reject it if they'd rather not have
such weird names.
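A rough sketch of the relaxed rule follows. It approximates ID_Start from
the tables available in Go's standard unicode package (letters, letter
numbers and Other_ID_Start, minus the pattern characters) instead of using
the scanner's real tables, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIDStart reports whether r may begin an identifier: an approximation of
// Unicode ID_Start, plus the underscore as an extra allowance.
func isIDStart(r rune) bool {
	if r == '_' {
		return true
	}
	if unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space) {
		return false
	}
	return unicode.In(r, unicode.L, unicode.Nl, unicode.Other_ID_Start)
}

func main() {
	for _, r := range "_aé1-" {
		fmt.Printf("%q %v\n", r, isIDStart(r))
	}
}
```

An application that would rather reject leading underscores can apply its
own check after parsing.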
Previously we missed the '%' character in our "SelfToken" production,
which meant that the modulo operator could not parse properly due to it
being represented as a TokenInvalid.
Due to some earlier limitations of the parser we required each attribute
and block to end with a newline, even if it appeared at the end of a
file. In effect, this required all files to end with a newline character.
This is no longer required and so we'll tolerate that missing newline for
pragmatic reasons.