hcl/hclsyntax: Fix scanning of multi-line comments

The pattern here was being too greedy because by default the longest match
is taken, and "*/" followed by more content is always longer than just
the "*/" to terminate the match.

The :>> symbol is a "finish guard" which tells Ragel to prefer to exit
from any* as soon as "*/" matches, making the match ungreedy.

The result of this is that input files containing more than one multi-line
comment will now tokenize each one separately, whereas before we would
create one long comment including everything from the first /* to the
final */ in the file, effectively causing parts of the file to be
ignored entirely.
Martin Atkins 2018-12-03 16:25:56 -08:00
parent 4d82d52bfa
commit cd67ba1b25
3 changed files with 1724 additions and 1705 deletions

File diff suppressed because it is too large

@@ -60,7 +60,7 @@ func scanTokens(data []byte, filename string, start hcl.Pos, mode scanMode) []To
Comment = (
("#" (any - EndOfLine)* EndOfLine) |
("//" (any - EndOfLine)* EndOfLine) |
-("/*" any* "*/")
+("/*" any* :>> "*/")
);
# Note: hclwrite assumes that only ASCII spaces appear between tokens,

@@ -1526,6 +1526,43 @@ EOF
 				},
 			},
 		},
+		{
+			"/* hello */ howdy /* hey */",
+			[]Token{
+				{
+					Type:  TokenComment,
+					Bytes: []byte("/* hello */"),
+					Range: hcl.Range{
+						Start: hcl.Pos{Byte: 0, Line: 1, Column: 1},
+						End:   hcl.Pos{Byte: 11, Line: 1, Column: 12},
+					},
+				},
+				{
+					Type:  TokenIdent,
+					Bytes: []byte("howdy"),
+					Range: hcl.Range{
+						Start: hcl.Pos{Byte: 12, Line: 1, Column: 13},
+						End:   hcl.Pos{Byte: 17, Line: 1, Column: 18},
+					},
+				},
+				{
+					Type:  TokenComment,
+					Bytes: []byte("/* hey */"),
+					Range: hcl.Range{
+						Start: hcl.Pos{Byte: 18, Line: 1, Column: 19},
+						End:   hcl.Pos{Byte: 27, Line: 1, Column: 28},
+					},
+				},
+				{
+					Type:  TokenEOF,
+					Bytes: []byte{},
+					Range: hcl.Range{
+						Start: hcl.Pos{Byte: 27, Line: 1, Column: 28},
+						End:   hcl.Pos{Byte: 27, Line: 1, Column: 28},
+					},
+				},
+			},
+		},
 		// Invalid things
 		{