8 Commits

Author SHA1 Message Date
Martin Atkins
2eaeb36cb3 Use Unicode 13 text segmentation rules
HCL uses a number of upstream libraries that implement algorithms defined
in Unicode. This commit updates all of those libraries to versions that
have Unicode 13 support.

The main implication of this for HCL directly is that when it returns
column numbers in source locations it will count characters using the
Unicode 13 definition of "character", which includes various new
multi-codeunit characters added in Unicode 13.

These new version dependencies will also make Unicode 13 support available
for other functionality that HCL callers might use, such as the stdlib
functions in upstream cty, even though HCL itself does not directly use
those.
2021-02-23 09:05:19 -08:00
wata_mac
636e660fac json: Add ParseExpression function 2020-09-04 14:12:01 -07:00
Kazuma Watanabe
e72341df8a json: Add json.ParseWithStartPos function
This is like json.Parse but allows specifying a non-default start position,
in case the caller is parsing a fragment from a larger JSON document.
2020-08-24 10:53:10 -07:00
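
The effect of a non-default start position can be sketched without the parser itself. The following is a minimal illustration, not the package's implementation: the `pos` type and the `advance` helper are hypothetical names, and columns are counted per byte, which is only correct for ASCII input. It shows how positions computed for a fragment stay meaningful in the enclosing document when counting begins at the fragment's real location rather than at line 1, column 1, byte 0.

```go
package main

import "fmt"

// pos is a hypothetical stand-in for a source position. It is not hcl.Pos.
type pos struct {
	Line, Column, Byte int
}

// advance returns the position reached after consuming fragment, counting
// from start. Assumption: one column per byte, so this is ASCII-only.
func advance(start pos, fragment []byte) pos {
	p := start
	for _, b := range fragment {
		p.Byte++
		if b == '\n' {
			p.Line++
			p.Column = 1
		} else {
			p.Column++
		}
	}
	return p
}

func main() {
	fragment := []byte("{\n  \"a\": 1\n}")

	// Counting from a default start describes the fragment in isolation.
	fmt.Printf("%+v\n", advance(pos{Line: 1, Column: 1, Byte: 0}, fragment))

	// Counting from the fragment's location in the larger document gives
	// positions a reader can look up in that document.
	fmt.Printf("%+v\n", advance(pos{Line: 3, Column: 10, Byte: 42}, fragment))
}
```

Only the starting value differs between the two calls; the counting logic is the same, which is why a start-position parameter is enough to make diagnostics line up with the larger document.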
Alisdair McDiarmid
e899db5b9f Update other fuzz docs for consistency 2020-05-14 15:03:29 -04:00
Alisdair McDiarmid
b265bbd046 json: Fix panic when parsing malformed JSON
When scanning JSON, upon encountering an invalid token, we immediately
return. Previously this return happened without inserting an EOF token.
Since other functions assume that a token sequence always ends in EOF,
this could cause a panic.

This commit adds a synthetic EOF token after the invalid token before
returning. While this does not match the real end-of-file of the source
JSON, it is marking the end of the scanned bytes, so it seems reasonable.

Fixes #339
2020-03-25 16:40:36 -04:00
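
The fix described above can be sketched with a toy scanner. This is an illustration under stated assumptions, not the package's real scanner: the token types, the `token` struct, and the `scan` function are hypothetical, and the scanner recognizes only digits and commas. The point is the invariant: every return path, including the early return on invalid input, ends the token sequence with EOF.

```go
package main

import "fmt"

type tokenType int

const (
	tokenNumber tokenType = iota
	tokenComma
	tokenInvalid
	tokenEOF
)

// token is a hypothetical token type used only for this sketch.
type token struct {
	Type  tokenType
	Bytes []byte
}

// scan is a toy scanner that understands only digits and commas.
func scan(src []byte) []token {
	var toks []token
	i := 0
	for i < len(src) {
		c := src[i]
		switch {
		case c >= '0' && c <= '9':
			start := i
			for i < len(src) && src[i] >= '0' && src[i] <= '9' {
				i++
			}
			toks = append(toks, token{Type: tokenNumber, Bytes: src[start:i]})
		case c == ',':
			toks = append(toks, token{Type: tokenComma, Bytes: src[i : i+1]})
			i++
		default:
			// Invalid input: stop scanning, but still append a synthetic
			// EOF so consumers can rely on the sequence ending in EOF.
			toks = append(toks, token{Type: tokenInvalid, Bytes: src[i : i+1]})
			toks = append(toks, token{Type: tokenEOF})
			return toks
		}
	}
	return append(toks, token{Type: tokenEOF})
}

func main() {
	toks := scan([]byte("12,?3"))
	last := toks[len(toks)-1]
	fmt.Println(len(toks), last.Type == tokenEOF) // prints: 4 true
}
```

A consumer that loops until it sees EOF can then never run off the end of the slice, whether or not the input was valid.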
Martin Atkins
fee90926da Use Unicode 12.0.0 grapheme cluster segmentation rules
HCL uses grapheme cluster segmentation to produce accurate "column"
indications in diagnostic messages and other human-oriented source
location information. Each new major version of Unicode introduces new
codepoints, some of which are defined to combine with other codepoints to
produce a single visible character (grapheme cluster).

We were previously using the rules from Unicode 9.0.0. This change
switches to using the segmentation rules from Unicode 12.0.0, which is
the latest version at the time of this commit and is also the version of
Unicode used for other purposes by the Go 1.14 runtime.

HCL does not use text segmentation results for any purpose that would
affect the meaning of decoded data extracted from HCL files, so this
change will only affect the human-oriented source positions generated for
files containing characters that were newly introduced in Unicode 10, 11,
or 12. (Machine-oriented uses of source location information are based on
byte offsets and not affected by text segmentation.)
2020-03-09 09:16:33 -07:00
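
The difference between byte offsets and human-oriented columns can be illustrated with the Go standard library alone. This sketch does not use HCL's segmentation library and does not implement the Unicode grapheme cluster rules. The `approxColumns` helper is a hypothetical, deliberately simplified rule that only treats nonspacing combining marks (category Mn) as part of the preceding character.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// approxColumns counts user-perceived characters in s using a simplified
// rule: a nonspacing combining mark (category Mn) joins the character before
// it. Real grapheme cluster segmentation has many more rules than this.
func approxColumns(s string) int {
	n := 0
	for i, r := range s {
		if i > 0 && unicode.Is(unicode.Mn, r) {
			continue // extends the previous character, no new column
		}
		n++
	}
	return n
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
	s := "e\u0301 = 1"

	fmt.Println("bytes:      ", len(s))                    // machine-oriented offset
	fmt.Println("code points:", utf8.RuneCountInString(s)) // neither bytes nor columns
	fmt.Println("columns:    ", approxColumns(s))          // human-oriented count
}
```

The byte count depends only on the encoding, so it is unaffected by which Unicode version defines the segmentation rules. The column count depends on which codepoints are considered to combine, which is the part that changes between Unicode versions.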
Martin Atkins
4755f8bf41 json: Clarify that this package is not interesting to import 2019-10-01 15:59:10 -07:00
Martin Atkins
6c4344623b Unfold the "hcl" directory up into the root
The main HCL package is more visible this way, and so it's easier than
having to pick it out from dozens of other package directories.
2019-09-09 16:08:19 -07:00