1058 Commits

Author SHA1 Message Date
Florian Forster
ddff2bcdd7 printer: Add another failing input to TestFormatParsable. 2018-04-03 19:39:12 +02:00
Mitchell Hashimoto
061bf373e4
Merge pull request #239 from octo/scanner
scanner: Don't call unread() after reading EOF.
2018-04-03 10:01:18 -07:00
Mitchell Hashimoto
c247bd0851
Merge pull request #245 from octo/cartridge-return
scanner: Improve regular expression in "scanner".scanHeredoc().
2018-04-03 10:00:26 -07:00
Florian Forster
25340db58d scanner: scanHeredoc(): Accept any number of CRs (\r) at end of line.
When there are multiple carriage returns at the end of the line, the regular expression will consider n-1 of them to be part of the string. Later, the last `\r` is removed. That may mean that a line that previously did *not* terminate a heredoc string may now terminate it, changing the meaning of the HCL file.
2018-04-03 16:23:33 +02:00
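A minimal sketch of the idea in the commit above, assuming a heredoc opened with `<<EOT`. The `terminator` pattern and the `terminates` helper are hypothetical and are not the expression used in scanner.go; they only illustrate anchoring at the beginning of the line and accepting any number of trailing `\r` characters.

```go
package main

import (
	"fmt"
	"regexp"
)

// terminator is a hypothetical pattern, not the one from scanner.go.
// It is anchored at the start of the line and accepts zero or more
// carriage returns between the marker and the end of the line.
var terminator = regexp.MustCompile(`^[[:space:]]*EOT\r*$`)

// terminates reports whether a single line (without its trailing \n)
// would close a heredoc that was opened with <<EOT.
func terminates(line string) bool {
	return terminator.MatchString(line)
}

func main() {
	for _, line := range []string{"EOT", "EOT\r", "EOT\r\r", "xEOT", "EOT x"} {
		fmt.Printf("%q -> %v\n", line, terminates(line))
	}
}
```

Because the pattern is anchored, a line such as `xEOT` does not close the heredoc, while `EOT` followed by one or several `\r` does.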
Florian Forster
6a21c5aa50 printer: Add another failing input to TestFormatParsable. 2018-04-03 16:18:04 +02:00
Florian Forster
13daa63726 scanner: Anchor heredoc-regexes at beginning of line. 2018-04-03 16:17:39 +02:00
Florian Forster
89240c3707 printer: Add another failing input to TestFormatParsable. 2018-04-03 16:16:34 +02:00
Florian Forster
23ed7ba25b scanner: Don't call unread() after reading EOF.
This fixes the TestScanDigitsUnread() unit test.
2018-03-20 21:24:50 +01:00
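The failure mode behind the commit above can be reproduced with the standard library alone. This is a sketch, not the scanner's code: `unreadAfterEOF` and `unreadAfterRead` are hypothetical helpers that show `bytes.Buffer.UnreadRune` returning an error when the previous operation was not a successful `ReadRune`, which is the condition the scanner turned into a panic.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// unreadAfterEOF drains src rune by rune and then calls UnreadRune once
// more after ReadRune has reported io.EOF. It returns that call's error.
func unreadAfterEOF(src string) error {
	buf := bytes.NewBufferString(src)
	for {
		if _, _, err := buf.ReadRune(); err == io.EOF {
			break
		}
	}
	return buf.UnreadRune()
}

// unreadAfterRead reads a single rune successfully and unreads it again.
func unreadAfterRead(src string) error {
	buf := bytes.NewBufferString(src)
	if _, _, err := buf.ReadRune(); err != nil {
		return err
	}
	return buf.UnreadRune()
}

func main() {
	fmt.Println("after EOF: ", unreadAfterEOF("00"))
	fmt.Println("after read:", unreadAfterRead("00"))
}
```

A caller that wants to push a character back therefore has to remember whether the last read actually succeeded, and skip the unread when it hit EOF.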
Florian Forster
cade852d47 scanner: Add unit test triggering a panic in unread().
For example, the (Go quoted) input "\"\\00" creates the following stack
trace:

```
panic: bytes.Buffer: UnreadRune: previous operation was not a successful ReadRune

goroutine 1 [running]:
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).unread(0xc420090270)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:112 +0x245
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).scanDigits(0xc420090270, 0x0, 0x8, 0x3, 0x5c2005b740)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:557 +0x1ba
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).scanEscape(0xc420090270, 0xc40000005c)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:520 +0x181
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).scanString(0xc420090270)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:504 +0x2c3
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).Scan(0xc420090270, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:172 +0x509
github.com/hashicorp/hcl/hcl/parser.(*Parser).scan(0xc42005bd18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:448 +0xf4
github.com/hashicorp/hcl/hcl/parser.(*Parser).objectKey(0xc42005bd18, 0x530aa8, 0xc42005bd18, 0xc42005bd18, 0x18, 0x50f980)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:224 +0xca
github.com/hashicorp/hcl/hcl/parser.(*Parser).objectItem(0xc42005bd18, 0x0, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:150 +0xbf
github.com/hashicorp/hcl/hcl/parser.(*Parser).objectList(0xc42005bd18, 0xc42000e000, 0x0, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:88 +0x139
github.com/hashicorp/hcl/hcl/parser.(*Parser).Parse(0xc42005bd18, 0xc420090270, 0x200000, 0xc42005bce0)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:59 +0xf3
github.com/hashicorp/hcl/hcl/parser.Parse(0x7fca1fdd9000, 0x4, 0x200000, 0x8, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/parser/parser.go:46 +0x294
github.com/hashicorp/hcl/hcl/printer.Format(0x7fca1fdd9000, 0x4, 0x200000, 0x0, 0xc42005bef0, 0x464307, 0x4, 0xc42005bed0)
        gopath/src/github.com/hashicorp/hcl/hcl/printer/printer.go:53 +0x5b
```
2018-03-20 21:24:50 +01:00
Mitchell Hashimoto
f40e974e75
Merge pull request #240 from octo/scanner-next
scanner: Update prevPos even when returning utf8.RuneError.
2018-03-20 13:20:55 -07:00
Mitchell Hashimoto
adef769457
Merge pull request #241 from octo/scanner-null
printer, scanner: Don't produce unparsable output.
2018-03-20 13:19:40 -07:00
Florian Forster
ec2ba18997 scanner: Fail if U+E123 is found in input.
This (invalid) Unicode codepoint is used by the printer package to fix up
the indentation of generated files. If this codepoint is present in the
input, the package gets confused and removes more than it should,
producing unparsable output.
2018-03-20 20:46:51 +01:00
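A sketch of the kind of check the commit above describes, assuming the check is a plain scan for the reserved code point. `reservedRune` and `checkInput` are hypothetical names; the real scanner reports the problem through its own error mechanism.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// reservedRune is the code point the commit message says the printer
// package uses internally to fix up indentation.
const reservedRune = '\uE123'

// checkInput is a hypothetical helper that rejects any input containing
// the reserved code point before it can reach the printer.
func checkInput(src string) error {
	if strings.ContainsRune(src, reservedRune) {
		return errors.New("input contains reserved code point U+E123")
	}
	return nil
}

func main() {
	fmt.Println(checkInput("foo = 1\n"))
	fmt.Println(checkInput("foo = \"\uE123\"\n"))
}
```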
Florian Forster
a5efd34964 scanner: Report null bytes as errors, even at the end of file.
The formatter will append a newline at the end of file, causing the output
of printer.Format() to be invalid.
2018-03-20 20:46:51 +01:00
Florian Forster
a81aa7b5dd printer: Add unit test of Format() producing unparsable output. 2018-03-20 20:46:51 +01:00
Florian Forster
fdaaf22252 scanner: Update prevPos even when returning utf8.RuneError.
The calling code will still call unread(), causing panics.
This fixes the TestScanHeredocRegexpCompile() unit test.
2018-03-20 20:46:20 +01:00
Florian Forster
73fde59edb scanner: Add unit test triggering a panic in scanHeredoc().
```
panic: regexp: Compile("[[:space:]]*<\xc8\\z"): error parsing regexp: invalid UTF-8: `�\z`

goroutine 32 [running]:
testing.tRunner.func1(0xc4200cae10)
        /usr/lib/google-golang/src/testing/testing.go:742 +0x29d
panic(0x507a00, 0xc420290690)
        /usr/lib/google-golang/src/runtime/panic.go:505 +0x229
regexp.MustCompile(0xc420289e10, 0x10, 0xc420087680)
        /usr/lib/google-golang/src/regexp/regexp.go:240 +0x171
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).scanHeredoc(0xc4200878c0)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:444 +0x3a9
github.com/hashicorp/hcl/hcl/scanner.(*Scanner).Scan(0xc4200878c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        gopath/src/github.com/hashicorp/hcl/hcl/scanner/scanner.go:186 +0x3e5
```
2018-03-20 20:46:20 +01:00
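The panic above comes from `regexp.MustCompile` receiving a pattern that contains invalid UTF-8. The sketch below shows one defensive alternative; it is not the fix the commits in this log applied (they changed how the scanner tracks its previous position). `compileTerminator` is a hypothetical helper that validates the marker and uses `regexp.Compile`, which returns an error instead of panicking.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// compileTerminator is a hypothetical helper. It builds an end-of-text
// pattern from an untrusted heredoc marker and reports bad input as an
// error rather than panicking.
func compileTerminator(marker string) (*regexp.Regexp, error) {
	if !utf8.ValidString(marker) {
		return nil, fmt.Errorf("heredoc marker %q is not valid UTF-8", marker)
	}
	// QuoteMeta keeps marker characters from being read as regexp syntax.
	return regexp.Compile(`[[:space:]]*` + regexp.QuoteMeta(marker) + `\z`)
}

func main() {
	if _, err := compileTerminator("\xc8"); err != nil {
		fmt.Println("rejected:", err)
	}
	re, err := compileTerminator("EOT")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("matches:", re.MatchString("  EOT"))
}
```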
Seth Vargo
b1738d9053 Fix broken formatting directives (#242)
It looks like Go 1.10 fails these. This makes the build green again.
2018-03-20 14:36:33 -04:00
Martin Atkins
5f8ed954ab hclsyntax: count \r\n line endings properly in source ranges
Previously we were only counting a \n as starting a new line, so input
using \r\n endings would get treated as one long line for source-range
purposes.

Now we also consider \r\n to be a newline marker, resetting the column
count to zero and incrementing the line just as we would do for a single
\n. This is made easier because the unicode definition of "grapheme
cluster" considers \r\n to be a single character, so we don't need to
do anything special in order to match it.
2018-03-08 08:30:58 -08:00
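The counting rule in the commit above can be sketched as follows. This is a simplification that walks bytes rather than grapheme clusters, so columns are byte counts here; `Pos` and `endPos` are hypothetical names, and the 1-based line with 0-based column is an assumption taken from the phrase "resetting the column count to zero".

```go
package main

import "fmt"

// Pos is a hypothetical source position: Line is 1-based and Column is
// a 0-based byte offset within the line.
type Pos struct {
	Line, Column int
}

// endPos returns the position reached after consuming all of src,
// treating both "\n" and "\r\n" as a single newline marker.
func endPos(src string) Pos {
	pos := Pos{Line: 1, Column: 0}
	for i := 0; i < len(src); i++ {
		switch {
		case src[i] == '\r' && i+1 < len(src) && src[i+1] == '\n':
			i++ // consume the \n as part of the same newline
			pos.Line++
			pos.Column = 0
		case src[i] == '\n':
			pos.Line++
			pos.Column = 0
		default:
			pos.Column++
		}
	}
	return pos
}

func main() {
	fmt.Printf("%+v\n", endPos("a = 1\nbc"))
	fmt.Printf("%+v\n", endPos("a = 1\r\nbc"))
}
```

Both inputs end at the same position, so source ranges no longer depend on which line-ending convention the file uses.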
Martin Atkins
7d6ed4d8f3 hclsyntax: emit Newline after a CHeredoc
Previously, due to how heredoc scanning was implemented, the closing
marker for a heredoc would consume the newline that terminated it. This
was problematic in any context that is newline-sensitive, because it
would cause us to skip the TokenNewline that might terminate e.g. an
attribute definition:

    foo = <<EOT
    hello
    EOT
    bar = "hello"

Previously the "foo" attribute would fail to parse properly due to trying
to consume the "bar" definition as part of its expression.

Now we synthetically split the marker token into two parts: the marker
itself and the newline that follows it. This means that using a heredoc
in any context where newlines are sensitive will involuntarily introduce
a newline, but that seems consistent with user expectation based on how
heredocs seem to be used "in the wild".
2018-03-08 08:22:32 -08:00
Martin Atkins
be66a72aa8 ext/typeexpr: HCL extension for "type expressions"
This uses the expression static analysis features to interpret
a combination of static calls and static traversals as the description
of a type.

This is intended for situations where applications need to accept type
information from their end-users, providing a concise syntax for doing
so.

Since this is implemented using static analysis, the type vocabulary is
constrained only to keywords representing primitive types and type
construction functions for complex types. No other expression elements
are allowed.

A separate function is provided for parsing type constraints, which allows
the additional keyword "any" to represent the dynamic pseudo-type.

Finally, a helper function is provided to convert a type back into a
string representation resembling the original input, as an aid to
applications that need to produce error messages relating to user-entered
types.
2018-03-04 14:45:25 -08:00
Martin Atkins
ab87bc9ded Update the various spec documents to include static analysis
Implementing the config loader for Terraform led to the addition of some
special static analysis operations for expressions, separate from the
usual action of evaluating an expression to produce a value.

These operations are useful for building application-specific language
constructs within HCL syntax, and so they are now included as part of the
specification in order to help developers of other applications understand
their behaviors and the implications of using them.
2018-03-04 14:35:16 -08:00
Martin Atkins
5956048526 hcl: ExprCall function
This accompanies ExprList, ExprMap, and AbsTraversalForExpr to
complete the set of static analysis interfaces for digging down into the
expression syntax structures without evaluation.

The intent of this function is to be a little like AbsTraversalForExpr
but for function calls. However, it's also similar to ExprList in that
it gives access to the raw expression objects for the arguments, allowing
for recursive analysis.
2018-03-04 14:04:54 -08:00
Martin Atkins
92456935b8 hclsyntax: fix end-of-string edge cases for $ and % escapes
We recognize and allow naked $ and % sequences by reading ahead one more
character to see if it's a "{" that would introduce an interpolation or
control sequence.

Unfortunately this is problematic in the end condition because it can
"eat" the terminating character and cause the scanner to continue parsing
a template when the user intended the template to end.

Handling this is a bit messy. For the quoted and heredoc situations we
can use Ragel's fhold statement to "backtrack" to before the character
we consumed, which does the trick. For bare templates this is insufficient
because there _is_ no following character and so the scanner detects this
as an error.

Rather than adding even more complexity to the state machine, instead we
just handle as a special case invalid bytes at the top-level of a bare
template, returning them as a TokenStringLit instead of a TokenInvalid.
This then gives the parser what it needs.

The fhold approach causes some odd behavior where an escaped template
introducer character causes a token split and two tokens are emitted
instead of one. This is weird but harmless, since we'll ultimately just
concatenate all of these strings together anyway, and so we allow it
again to avoid making the scanner more complex when it's easy enough to
handle this in the parser where we have more context.
2018-03-03 11:24:31 -08:00
Martin Atkins
d66303f45b hclsyntax: allow block labels to be naked identifiers
This was allowed in legacy HCL, and although it was never documented as
usable in the Terraform documentation it appears that some Terraform
configurations use this form anyway.

While it is non-ideal to have another edge-case to support/maintain, this
capability adds no ambiguity and doesn't add significant complexity, so
we'll allow it to be pragmatic for existing usage.
2018-03-03 10:09:10 -08:00
Martin Atkins
074b73b8b5 hclsyntax: Allow Terraform-style legacy index form
Terraform allowed indexing like foo.0.bar to work around HIL limitations,
and so we'll permit that as a pragmatic way to accept existing Terraform
configurations.

However, we can't support this fully because our parser thinks that
chained number indexes, like foo.0.0.bar, are single numbers. Since that
usage in Terraform is very rare (there are very few lists of lists) we
will mark that situation as an error with a helpful message suggesting
to use the modern index syntax instead.

This also turned up a similar bug in the existing legacy index handling
we were doing for splat expressions, which is now handled in the same
way.
2018-03-03 09:02:29 -08:00
Martin Atkins
061412b83a hclsyntax: allow underscore at the start of identifiers
We are leaning on the unicode identifier definitions here, but the
specified ID_Start does not include the underscore character and users
seem to expect this to be allowed due to experience with other languages.

Since allowing a leading underscore introduces no ambiguity, we'll allow
it. Calling applications may choose to reject it if they'd rather not have
such weird names.
2018-03-03 08:03:52 -08:00
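A rough sketch of the rule described above. The real scanner is generated from a Ragel grammar and uses the Unicode `ID_Start` and `ID_Continue` properties; here `unicode.IsLetter` and `unicode.IsDigit` stand in as approximations, and `isIDStart`, `isIDContinue`, and `validIdentifier` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIDStart approximates ID_Start with unicode.IsLetter and additionally
// accepts the underscore, which ID_Start itself does not include.
func isIDStart(r rune) bool {
	return r == '_' || unicode.IsLetter(r)
}

// isIDContinue approximates the characters allowed after the first one.
func isIDContinue(r rune) bool {
	return isIDStart(r) || unicode.IsDigit(r)
}

// validIdentifier reports whether s is a non-empty identifier under the
// approximated rules above.
func validIdentifier(s string) bool {
	for i, r := range s {
		if i == 0 && !isIDStart(r) {
			return false
		}
		if i > 0 && !isIDContinue(r) {
			return false
		}
	}
	return s != ""
}

func main() {
	for _, name := range []string{"foo", "_foo", "1foo", ""} {
		fmt.Printf("%q -> %v\n", name, validIdentifier(name))
	}
}
```

An application that would rather not accept names such as `_foo` can apply its own check on top of this.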
Martin Atkins
440debc6d4 zclsyntax: properly scan the modulo operator
Previously we missed the '%' character in our "SelfToken" production,
which meant that the modulo operator could not parse properly due to it
being represented as a TokenInvalid.
2018-03-03 07:56:54 -08:00
Martin Atkins
386ab3257c hclsyntax: allow missing newline at EOF
Due to some earlier limitations of the parser we required each attribute
and block to end with a newline, even if it appeared at the end of a
file. In effect, this required all files to end with a newline character.

This is no longer required and so we'll tolerate that missing newline for
pragmatic reasons.
2018-03-03 07:46:04 -08:00
Martin Atkins
998a3053e2 hcl/json: decode number literals at full precision
Elsewhere we are using 512-bit precision as the standard for converting
from a string to a number, since the default is shorter. This is just to
unify JSON parsing with the native syntax processing and the automatic
type conversions in the language, so we don't see different precision
behaviors depending on syntax.
2018-02-27 07:54:56 -08:00
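To illustrate why the precision matters, the sketch below parses the same literal with `math/big` at 512 bits and with `strconv.ParseFloat` at float64 precision. `parseFull` is a hypothetical helper, and the choice of `big.ToNearestEven` as rounding mode is an assumption; the commit only states the 512-bit precision.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// parseFull is a hypothetical helper that parses a base-10 literal into
// a big.Float carrying 512 bits of mantissa precision.
func parseFull(s string) (*big.Float, error) {
	f, _, err := big.ParseFloat(s, 10, 512, big.ToNearestEven)
	return f, err
}

func main() {
	const lit = "1234567890123456789012345678901234567890"

	full, err := parseFull(lit)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("512-bit:", full.Text('f', 0))

	// float64 has a 53-bit mantissa, so the low digits are not preserved.
	narrow, _ := strconv.ParseFloat(lit, 64)
	fmt.Println("float64:", big.NewFloat(narrow).Text('f', 0))
}
```

Using one precision for both the JSON and native syntaxes means a number compares the same way regardless of which syntax it was written in.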
Martin Atkins
75cceef4f0 gohcl: don't reflect.DeepEqual number values in tests
big.Float is not DeepEqual-friendly because it contains a precision value
that can make two numerically-equal values appear as non-equal.

Since the number decoding isn't the point of these tests, instead we just
swap out for cty.Bool values which _are_ compatible with
reflect.DeepEqual, since they are just wrappers around the native bool
type.
2018-02-27 07:53:20 -08:00
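The mismatch described above is easy to reproduce with the standard library. In this sketch, `compareOne` is a hypothetical helper that builds the value 1 at two different precisions and compares the results both ways.

```go
package main

import (
	"fmt"
	"math/big"
	"reflect"
)

// compareOne builds the number 1 at 64-bit and 512-bit precision and
// returns the reflect.DeepEqual result together with the numeric
// comparison from Cmp (0 means numerically equal).
func compareOne() (deepEqual bool, cmp int) {
	a := new(big.Float).SetPrec(64).SetInt64(1)
	b := new(big.Float).SetPrec(512).SetInt64(1)
	return reflect.DeepEqual(a, b), a.Cmp(b)
}

func main() {
	deepEqual, cmp := compareOne()
	fmt.Println("reflect.DeepEqual:", deepEqual)
	fmt.Println("Cmp:", cmp)
}
```

`reflect.DeepEqual` inspects the stored precision as well as the value, so tests that are not about number decoding are simpler when they use values without such hidden state.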
Martin Atkins
4719b76b52 hcl/json: update tokentype_string.go for latest version of stringer 2018-02-26 08:38:56 -08:00
Martin Atkins
cc8b14cf45 hclsyntax: "null", "true", "false" AbsTraversalForExpr
The contract for AbsTraversalForExpr calls for us to interpret an
expression as if it were traversal syntax. Traversal syntax does not have
the special keywords "null", "true" and "false", so we must interpret
these as TraverseRoot rather than as literal values.

Previously this wasn't working because the parser converted these to
literals too early. To make this work properly, we implement
AbsTraversalForExpr on literal expressions and effectively "undo" the
parser's re-interpretation of these keywords to back out to the original
keyword strings.

We also rework how object keys are handled so that we wait until eval time
to decide whether to interpret the key expression as an unquoted literal
string. This allows us to properly support AbsTraversalForExpr on keys
in object constructors, bypassing the string-interpretation behavior in
that case.
2018-02-26 08:38:35 -08:00
Martin Atkins
a42f1fdb23 hclsyntax: Tests for static expression analysis behaviors
We previously lacked tests for our implementations of
hcl.AbsTraversalForExpr, hcl.ExprList, and hcl.ExprMap. These are now
tested.
2018-02-23 08:43:18 -08:00
Martin Atkins
227ccafb01 hclsyntax: use deep.Equal for TestParseTraversalAbs
This makes failure messages much easier to understand.
2018-02-23 08:42:26 -08:00
Martin Atkins
397fa07dea hcl: ExprMap function
This is similar to the ExprList function but for map-like constructs
(object constructors in the native syntax). It allows a more-advanced
calling application to analyze the physical structure of the configuration
directly, rather than analyzing the dynamic results of its expressions.

This is useful when creating what appear to be first-class language
constructs out of the language's grammar elements.

In the JSON syntax, a static map construct is expressed as a direct JSON
object. As with ExprList, this bypasses any dynamic expression evaluation
behavior and requires the user to provide a literal JSON object, though
the calling application is then free to evaluate the key/value expressions
inside in whatever way makes sense.
2018-02-23 08:41:58 -08:00
Martin Atkins
8c3aa9a6d4 hcl/json: catch and reject duplicate attrs in JustAttributes
Previously this was handled in the parser, but the parser now permits
multiple properties with the same name and so we must handle this at the
decoder level instead.
2018-02-17 15:23:06 -08:00
Nicholas Jackson
23fc060132 gohcl: allow optional attributes to be specified via struct tag
Previously we required optional attributes to be specified as pointers so that we could represent the empty vs. absent distinction.

For applications that don't need to make that distinction, representing "optional" as a struct tag is more convenient.
2018-02-17 10:36:04 -08:00
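The two styles might look like this on a target struct. `ServiceConfig` and its fields are hypothetical, and the program deliberately does not import gohcl: it only prints the struct tags through `reflect`, so it shows the tag syntax rather than actual decoding.

```go
package main

import (
	"fmt"
	"reflect"
)

// ServiceConfig is a hypothetical decoding target.
type ServiceConfig struct {
	// Required attribute.
	Name string `hcl:"name"`
	// Pointer form: nil can represent "absent", while a pointer to ""
	// represents "present but empty".
	Description *string `hcl:"description"`
	// Tag form: an absent attribute leaves the zero value, so "empty"
	// and "absent" are not distinguished.
	Region string `hcl:"region,optional"`
}

// hclTag returns the raw hcl struct tag of the named field, or "" if
// the field does not exist.
func hclTag(field string) string {
	f, ok := reflect.TypeOf(ServiceConfig{}).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("hcl")
}

func main() {
	for _, name := range []string{"Name", "Description", "Region"} {
		fmt.Printf("%-12s %q\n", name, hclTag(name))
	}
}
```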
Martin Atkins
eea3a14a71 hcl/json: allow more flexible use of arrays when describing bodies
Previously we allowed arrays only at the "leaf" of a set of objects
describing a block and its labels. This is not sufficient because it is
therefore impossible to preserve the relative ordering of a sequence
of blocks that have different block types or labels.

The spec now allows arrays of objects to be used in place of single
objects when that value is representing either an HCL body or a set of
labels on a nested block. This relaxing does not apply to JSON objects
interpreted as expressions or bodies interpreted in dynamic attributes
mode, since there is no requirement to preserve attribute ordering or
support duplicate property names in those scenarios.

This new model imposes additional constraints on the underlying JSON
parser used to interpret JSON HCL: it must now be able to retain the
relative ordering of object keys and accept multiple definitions of the
same key. This requirement is not imposed on _producers_, which are free
to use the allowance for arrays of objects to force ordering and duplicate
keys with JSON-producing libraries that are unable to make these
distinctions.

Since we are now requiring a specialized parser anyway, we also require
that it be able to represent numbers at full precision, whereas before
we made some allowances for implementations to not support this.
2018-02-17 10:26:58 -08:00
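The parser requirement above rules out decoding into a Go map, which neither keeps key order nor retains duplicate keys. As a sketch of what such a parser has to do, the hypothetical `orderedKeys` helper below walks an object with the token API of `encoding/json`; it is not the parser HCL itself uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// orderedKeys returns the top-level keys of a JSON object in source
// order, keeping duplicates. It is a hypothetical helper for illustration.
func orderedKeys(src string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected a JSON object")
	}
	var keys []string
	for dec.More() {
		tok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := tok.(string)
		if !ok {
			return nil, fmt.Errorf("expected an object key")
		}
		keys = append(keys, key)
		// Skip over the value, whatever its type.
		var value json.RawMessage
		if err := dec.Decode(&value); err != nil {
			return nil, err
		}
	}
	return keys, nil
}

func main() {
	keys, err := orderedKeys(`{"resource": {}, "variable": {}, "resource": {}}`)
	fmt.Println(keys, err)
}
```

A producer that can only emit maps can get the same effect by writing an array of single-key objects instead, which is the allowance the commit describes.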
Martin Atkins
77dc2cba20 hcl/json: fuzzing utilities 2018-02-16 21:18:25 -08:00
Martin Atkins
f87a794800 hclsyntax: check for and report incorrect peeker stack discipline
The peeker has an "include newlines" stack which the parser manipulates
to switch between the newline-sensitive and non-sensitive scanning modes.
If the parser code fails to manage this stack correctly (for example,
due to a missed call to PopIncludeNewlines) then this causes very
confusing downstream errors that are otherwise difficult to debug.

As an extra debug tool for when errors _are_ detected, when this problem
is encountered during tests we are able to produce a visualization of the
pushes and pops to help the test developer see which pushes and pops
seem out of place.

This is a lot of ugly extra code but it's usually disabled and seems worth
it to allow us to quickly catch bugs that would otherwise be quite
difficult to diagnose.
2018-02-16 17:37:22 -08:00
Martin Atkins
9dfc220a4b hclsyntax: index expression parsing properly manages "include newlines"
Previously it was mismanaging the stack by first pushing on "false" and
then trying to undo that by pushing on "true". Instead, it should just
pop off the "false" to return to whatever the previous setting was, since
indexing brackets might already be inside a no-newlines context.
2018-02-16 16:45:42 -08:00
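The difference between the two approaches can be sketched with a small stack. `newlineStack`, `buggyIndex`, and `fixedIndex` are hypothetical stand-ins for the peeker and the index-parsing code; only the push-then-push versus push-then-pop pattern is taken from the commit message.

```go
package main

import "fmt"

// newlineStack is a hypothetical stand-in for the peeker's "include
// newlines" stack. The bottom entry is the default mode.
type newlineStack struct {
	modes []bool
}

func newNewlineStack() *newlineStack {
	return &newlineStack{modes: []bool{true}}
}

func (s *newlineStack) Push(include bool) { s.modes = append(s.modes, include) }

// Pop removes the most recent mode. Popping the default entry is a
// discipline error, which this sketch reports with a panic.
func (s *newlineStack) Pop() {
	if len(s.modes) <= 1 {
		panic("newlineStack: pop without matching push")
	}
	s.modes = s.modes[:len(s.modes)-1]
}

func (s *newlineStack) Current() bool { return s.modes[len(s.modes)-1] }
func (s *newlineStack) Depth() int    { return len(s.modes) }

// buggyIndex mimics the old behavior: push false for the brackets, then
// push true in an attempt to undo it.
func buggyIndex(s *newlineStack) {
	s.Push(false)
	s.Push(true)
}

// fixedIndex pushes false for the brackets and pops it afterwards.
func fixedIndex(s *newlineStack) {
	s.Push(false)
	s.Pop()
}

func main() {
	a := newNewlineStack()
	a.Push(false) // enclosing no-newlines context, such as parentheses
	buggyIndex(a)
	fmt.Println("buggy:", a.Current(), a.Depth())

	b := newNewlineStack()
	b.Push(false)
	fixedIndex(b)
	fmt.Println("fixed:", b.Current(), b.Depth())
}
```

With the pop, the enclosing context's setting and the stack depth are both restored; with the extra push, newlines are switched back on inside a context that had turned them off, and the stack keeps growing.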
Martin Atkins
9301cd2ad5 hclsyntax: use go-test/deep for comparing parse test results
We were previously using an ugly combination of "pretty" and "spew" to
do this, which never really quite worked because of limitations in each
of those.

deep.Equal doesn't produce quite as much detailed information as the
others, but it has the advantage of showing exactly where a difference
exists rather than forcing us to hunt through a noisy diff to find it.
2018-02-16 16:44:03 -08:00
Martin Atkins
5ca9713bf0 hclsyntax: prevent ragel line comments becoming package docs 2018-02-04 19:01:48 -08:00
Martin Atkins
cfd802163b hclsyntax: rewrite string literal decoder with ragel
Fuzz testing revealed that there were a few different crashers in the
string literal decoder, which was previously a rather unwieldy
hand-written scanner with manually-implemented lookahead.

Rather than continuing to hand-tweak that code, here instead we use
ragel (which we were already using for the main scanner anyway) to
partition our string literals into tokens that are easier for our
decoder to wrangle.

As a bonus, this also makes our source ranges in our diagnostics more
accurate.
2018-02-04 19:01:48 -08:00
Martin Atkins
93a7008e3d hclsyntax: helpers for fuzz testing with go-fuzz 2018-02-04 18:55:25 -08:00
Martin Atkins
18a92d222b ext/userfunc: use bare identifiers for param names
Now that we have the necessary functions to deal with this in the
low-level HCL API, it's more intuitive to use bare identifiers for these
parameter names. This reinforces the idea that they are symbols being
defined rather than arbitrary string expressions.
2018-02-04 11:20:42 -08:00
Martin Atkins
2ddf8b4b8c cmd/hcldec: allow spec file to define variables and functions
The spec file can now additionally define default variables and functions
for the eval context used to evaluate the input file.
2018-02-04 11:05:23 -08:00
Martin Atkins
6c3ae68a0e cmd/hcldec: make cty stdlib functions available to specs
In a few specific portions of the spec format it's convenient to have
access to some of the functions defined in the cty stdlib. Here we allow
them to be used when constructing the value for a "literal" spec and in
the result expression for a "transform" spec.
2018-02-04 10:33:35 -08:00
Martin Atkins
1ba92ee170 cmd/hcldec: "transform" spec type
This new spec type allows evaluating an arbitrary expression on the
result of a nested spec, for situations where a value must be
transformed in some way.
2018-02-04 09:59:20 -08:00
Martin Atkins
f65a097d17 cmd/hcldec: decode "array" blocks
These were missed on the previous pass, causing a disagreement with the
documentation.
2018-02-04 09:45:28 -08:00