Commit Graph

25 Commits

Author SHA1 Message Date
Martin Atkins
cb768a591a zclwrite: parsing of blocks
There's something a little off here, as illustrated by the failing
round trip test. Will figure that out later.
2017-06-10 17:16:19 -07:00
Martin Atkins
948b2e0b7b zclwrite: populate EOLTokens when parsing attributes 2017-06-10 16:06:09 -07:00
Martin Atkins
1565d4f906 zclwrite: formatting round-trip test
The existing round trip test was testing that we can pass through a set
of tokens through the AST entirely unmodified.

This new round-trip test passes the source through the Format function
and then checks that it has the same semantics, by evaluating it both
before and after and expecting an identical set of values.
2017-06-10 14:14:01 -07:00
Martin Atkins
ac42b456f3 zclwrite: "format" implementation
This tweaks the number of spaces before each token to create a consistent
layout. It doesn't (yet?) attempt to add or remove line breaks or
otherwise mess with the non-space characters in the code.

The main goal here is to get things to line up. zcl syntax is simple
enough that there's not much latitude for super-weird usage, and so if
someone does manage to write something weird (asymmetric breaking of
brackets between lines, etc) we won't try to fix it here.
2017-06-10 14:11:47 -07:00
Martin Atkins
040160f3f9 zclwrite: line/cell splitting for "format"
Our approach here is to split the input into lines and then to split
each line into up to three cells which, when present, should have their
leftmost characters aligned (eventually) by the formatter.
2017-06-10 07:47:15 -07:00
Martin Atkins
a523846abd zclwrite: start to stub out the mutation API
These new stub methods will allow dealing with attributes within a body.
2017-06-09 08:31:14 -07:00
Martin Atkins
5477fecfad zclsyntax: require newlines after block items
Previously we tolerated EOF as an alias for newline, but a file without
a newline at the end is a edge case primarily limited to contrived
examples in unit tests, and requiring it simplifies tasks such as code
generation in zclwrite, since we can always assume that every block item
comes with a built-in line terminator.
2017-06-09 08:19:47 -07:00
Martin Atkins
1de72e146e zclwrite: consume trailing line comments and newlines in body items
We consume both of these things primarily so that when we manipulate
the AST they will all stay together. For example, removing an attribute
from a body should take its comments and newline too.
2017-06-09 08:00:38 -07:00
Martin Atkins
c88641b147 zclwrite: absorb lead comments into attributes
The native parser's ranges don't include any surrounding comments, so we
need to do a little more work to pick them out of the surrounding token
sequences.

This just takes care of _lead_ comments, which are those that appear as
whole line comments above the item in question. Line comments, which
appear after the item on the same line, will follow in a later commit.
2017-06-08 09:04:27 -07:00
Martin Atkins
7609327736 zclwrite: Rename "Parse" to "ParseConfig"
This is mainly for symmetry with the API of zclsyntax, but also later
we'll probably have a ParseExpression function that can be used to make
edits to individual expressions.
2017-06-07 08:28:43 -07:00
Martin Atkins
13c93e974f zclwrite: initial attribute and basic expression parsing
This is not yet complete, since it fails to capture the newline, line
comments, and variable references in expressions. However, it does
capture the broad structure of an attribute, along with gathering up
all of its _interior_ tokens.
2017-06-07 08:24:33 -07:00
Martin Atkins
15b38d8a48 zclwrite: correct silly misplaced colon in round_trip_test 2017-06-07 08:01:30 -07:00
Martin Atkins
33c3d6e5ad zclwrite: standardize on TokenSeq for all node parts
We'll use TokenSeq for everything, including sequences we expect to
contain only one token, mainly for consistency but also so that our local
"parser" doesn't need to care about whether the main parser is returning
one token or many.
2017-06-07 07:41:39 -07:00
Martin Atkins
ed2a739cdb zclwrite: Don't crash if EachToken called on a nil TokenSeq. 2017-06-07 07:39:01 -07:00
Martin Atkins
69a87c73b4 zclwrite: start to partition body items
So far they are still just all unstructured, but each item is a separate
node.
2017-06-07 07:38:44 -07:00
Martin Atkins
363d08ed0d zclwrite: File-level AllTokens
This captures any leftover tokens that aren't considered part of the
file's main body, such as the trailing EOF token.
2017-06-07 07:37:56 -07:00
Martin Atkins
fa8a707c7f zclwrite: begin to flesh out public interface 2017-06-07 07:24:10 -07:00
Martin Atkins
598740b638 zclwrite: method for writing tokens to a writer
This now allows for round-tripping some input from bytes to tokens and
then back to bytes again. With no other changes, we expect this to produce
an identical result.
2017-06-07 07:06:23 -07:00
Martin Atkins
efbcfd19b2 zclwrite: TokenGen interface has EachToken, not AppendToTokens
Although building a flat array of tokens is _one_ use-case for TokenGen,
this new approach means we can also use the interface to write to an
io.Writer without needing to produce an intermediate buffer.
2017-06-07 06:45:01 -07:00
Martin Atkins
c233270a9b zclwrite: use a single, flat writer token buffer
Previously we were allocating a separate heap object for each token, which
creates a lot of small objects for the GC to manage. Since we know that
we're always converting from a flat array of native tokens, we can produce
a flat array of writer tokens first and _then_ take pointers into that
array to achieve our goal of making a slice of pointers.

For the use-case of formatting a sequence of tokens by tweaking the
"SpacesBefore" value, this means we can get all of our memory allocation
done in a single chunk and then just tweak the allocated, contiguous
tokens in-place, which should reduce memory pressure for a task which
will likely be done frequently by a text editor integration doing "format
on save".
2017-06-07 06:38:41 -07:00
Martin Atkins
3c0dde2ae5 zclwrite: foundations of the writer parser
The "writer parser" is a parser that produces a writer AST rather than
a zclsyntax AST. This can be used to produce a writer AST from existing
source in order to modify it before writing it out again.

It's implemented with the somewhat-unintuitive approach of running the
main zclsyntax parser and then mapping the source ranges it finds back
onto the token sequence to pull out the raw tokens for each object.
This allows us to avoid maintaining two parsers but also keeps all of
this raw-token-wrangling complexity out of the main parser.
2017-06-06 08:53:13 -07:00
Martin Atkins
e100bf4723 zclsyntax: generate lexer diagnostics
There are certain tokens that are _never_ valid, so we might as well
catch them early in the Lex... functions rather than having to handle
them in many different contexts within the parser.

Unfortunately for now when such errors occur they tend to be echoed by
more confusing errors coming from the parser, but we'll accept that for
now.
2017-06-04 07:34:26 -07:00
Martin Atkins
f94c89a6a5 zclsyntax: differentiate quoted and unquoted string literals
The context where a string literal was found affects what sort of escaping
it can have, so we need to distinguish these cases so that we will only
look for and handle backslash escapes in quoted strings.
2017-05-30 19:03:25 -07:00
Martin Atkins
476e2c127e zclwrite: convert zclsyntax tokens into zclwrite tokens
In zclwrite we throw away the absolute source position information and
instead just retain the number of spaces before each token. This different
model allows us to rewrite parts of the token sequence without needing
to re-adjust all of the positions, and it also allows us to do simple
indentation and spacing adjustments just by walking through the token
list and adjusting these numbers.
2017-05-29 16:59:20 -07:00
Martin Atkins
3a567abb51 zclwrite: stub of zcl code generation package
This uses a separate, lower-level AST than the parser so that it can
retain the raw tokens and make surgical changes. As a consequence it
has much less semantic detail than the parser AST, and in particular
doesn't represent the structure of expressions except to retain variable
references to enable global rename operations.
2017-05-29 16:05:34 -07:00