HCL uses grapheme cluster segmentation to produce accurate "column"
indications in diagnostic messages and other human-oriented source
location information. Each new major version of Unicode introduces new
codepoints, some of which are defined to combine with other codepoints to
produce a single visible character (grapheme cluster).
We were previously using the rules from Unicode 9.0.0. This change
switches to using the segmentation rules from Unicode 12.0.0, which is
the latest version at the time of this commit and is also the version of
Unicode used for other purposes by the Go 1.14 runtime.
HCL does not use text segmentation results for any purpose that would
affect the meaning of decoded data extracted from HCL files, so this
change will only affect the human-oriented source positions generated for
files containing characters that were newly-introduced in Unicode 10, 11,
or 12. (Machine-oriented uses of source location information are based on
byte offsets and not affected by text segmentation.)
Some HCL callers make the (reasonable) assumption that the overall source
range of an expression will be a superset of all of the ranges of its
child expressions, for purposes such as extraction of source code
snippets, parse tree annotation in hclwrite, text editor analysis
functions like "go to reference", etc.
The IndexExpr type was not previously honoring that assumption, since its
source range was placed around only the bracket portion. That is a good
region to use when reporting errors relating to the index operation, but
it is not a faithful representation of the full extent of the expression.
In order to meet both of these requirements at once, IndexExpr now has
both SrcRange covering the entire expression and BracketRange covering
the index part delimited by brackets. We can then use BracketRange in
our error messages but return SrcRange as the result of the general
Range method that is common to all expression types.
We currently have functions for constructing new expressions from either
constant values or from traversals, but some surgical updates require
producing a more complex expression.
In the long run perhaps we'll have some mechanism for constructing valid
expressions via a high-level AST-like API, similar to what we already have
for structural constructs, but as a simpler first step here we add a
mechanism to just write raw tokens directly into an expression, with the
caller being responsible for making sure those tokens represent valid
HCL expression syntax.
Since this new API treats the given tokens as unstructured, the resulting
expression can't fully support the whole of the expression API, but it's
good enough for writing in complex expressions without disturbing existing
content elsewhere in the input file.
Previously we allowed adding both attributes and blocks, and we allowed
updating attributes, but we had no mechanism to surgically remove
attributes and blocks altogether.
The symmetrical spaces around colons in conditionals are important for
familiarity with C-like languages, so we'll instead accept spaces around
colons in our HCL2-unique "for expression" construct.
Leading whitespace is significant in heredocs, so we'll avoid making any
indentation adjustments for lines between OHeredoc and CHeredoc.
This fixes#31.
Our normal ruleset thinks that the "in" keyword here is a variable
reference and so writes it as "in[y]". Since there's never any reason for
a variable to appear immediately after another variable, we can check
for a preceding identifier as a heuristic to recognize whether in is
probably being used as a keyword rather than as a variable.
This is not exact, but the only time this should be a false positive is
if there were a syntax error in the input, and we don't make any
guarantees about the result in that case anyway.
This fixes#52.
This relaxes our previous spec to include a special form from HCL 1:
foo { bar = baz }
Although we normally require each argument to be on a line of its own, as
a special case we allow a block to be defined with a single nested
argument all on one line.
Only one nested argument definition is allowed, and a nested block
definition like "foo { bar {} }" is also disallowed in order to force the
more-readable split of bar {} onto a line of its own.
This is a pragmatic addition for broader compatibility with HCL 1-oriented
input. This single-line usage is not considered idiomatic HCL 2 and may
in future be undone by the formatter, though for now it is left as-is
aside from the spacing around the braces.
This also changes the behavior of the source code formatter to include
spaces on both sides of braces. This mimicks the formatting behavior of
HCL 1 for this situation, and (subjectively) reads better even for other
one-line braced expressions like object constructors and object for
expressions.
This completes the minimal functionality for creating a new file from
scratch, rather than modifying an existing one. This is illustrated by
a new test TestRoundupCreate that uses the API to create a new file in a
similar way to how a calling application might.
There isn't any strong reason for this -- they don't implement io.Reader
and so can't be used in places where a Reader+WriterTo is expected, like
io.Copy -- but go lint thinks that anything called WriteTo with an
io.Writer argument is an attempt to implement WriterTo and so this just
shuts up the linter.
Since this function implicitly creates a new body, this name is more
appropriate and leaves the name "AppendBlock" open for a later method to
append an _existing_ block, such as when moving a block from one file
to another.
This method allows a caller to generate a nested block within a body.
Since a nested block has its own content body, this now allows for deep
structures to be generated.
Although our underlying parse tree retains all of the token content, it
doesn't necessarily retain all of the spacing information under editing,
and so formatting on save ensures that we'll produce a canonical result
even if some edits have been applied that have changed the expected
alignment of objects, etc.
For now, this is the only way to set an attribute, and so attributes can
only be set to literal values.
Later this will be generalized so that this is just a helper wrapper
around a "SetAttribute" method that just uses a given expression, which
then helps by constructing the expression from the value first.
The original prototype of hclwrite tried to track both the tokens and
the AST as two parallel data structures. This quickly exploded in
complexity, leading to lots of messy code to manage keeping those two
structures in sync.
This new approach melds the two structures together, creating first a
physical token tree (made of "node" objects, and hidden from the caller)
and then attaching the AST nodes to that token tree as additional sidecar
data.
The result is much easier to work with, leading to less code in the parser
and considerably less complex data structures in the parser's tests.
This commit is enough to reach feature parity with the previous prototype,
but it remains a prototype. With a more usable foundation, we'll evolve
this into a more complete implementation in subsequent commits.
This will allow for use-cases such as renaming a variable (changing the
content of the first token) and replacing variable references with
constant values that they evaluate to for debug purposes.
This is another heuristic because the "[" syntax is also the tuple
constructor start marker, but this takes care of the common cases of
indexing keywords and bracketed expressions.
This fixes#29.
This is a super-invasive update since the "zcl" package in particular
is referenced all over.
There are probably still a few zcl references hanging around in comments,
etc but this takes care of most of it.
The main "zcl" package requires a bit more care because of how many
callers it has and because of its two subpackages, so we'll take care
of that one separately.