guide: The "Configuration Language Design" section

Martin Atkins 2018-09-05 08:08:11 -07:00
parent 280771fe8a
commit 57c9a676d7
3 changed files with 286 additions and 0 deletions

@ -75,6 +75,8 @@ complex structures:
source_file = "${path.module}/foo.txt"

.. _go-expression-funcs:

Defining Functions
------------------

@ -8,6 +8,8 @@ some more complex situations that can benefit from some additional techniques.
This section lists a few of these situations and ways to use the HCL API to
accommodate them.

.. _go-interdep-blocks:

Interdependent Blocks
---------------------

@ -1,3 +1,285 @@
Configuration Language Design
=============================

In this section we will cover some conventions for HCL-based configuration
languages that can help make them feel consistent with other HCL-based
languages, and make the best use of HCL's building blocks.

HCL's native and JSON syntaxes both define a mapping from input bytes to a
higher-level information model. In designing a configuration language based on
HCL, your building blocks are the components in that information model:
blocks, arguments, and expressions.

Each calling application of HCL, then, effectively defines its own language.
Just as Atom and RSS are higher-level languages built on XML, HashiCorp
Terraform has a higher-level language built on HCL, while HashiCorp Nomad has
its own distinct language that is *also* built on HCL.

From an end-user perspective, these are distinct languages but have a common
underlying texture. Users of both are therefore likely to bring some
expectations from one to the other, and so this section is an attempt to
codify some of these shared expectations to reduce user surprise.

These are subjective guidelines, however, so applications may choose to
ignore them entirely or ignore them in certain specialized cases. An
application providing a configuration language for a pre-existing system, for
example, may choose to eschew the identifier naming conventions in this section
in order to exactly match the existing names in that underlying system.

Language Keywords and Identifiers
---------------------------------

Much of the work in defining an HCL-based language is in selecting good names
for arguments, block types, variables, and functions.

The standard for naming in HCL is to use all-lowercase identifiers with
underscores separating words, like ``service`` or ``io_mode``. HCL identifiers
do allow uppercase letters and dashes, but this is primarily for natural
interfacing with external systems that may have other identifier conventions,
so these should generally be avoided for the identifiers native to your
own language.
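
As a brief illustration of this convention, the fragment below contrasts
lowercase, underscore-separated names with names that use uppercase letters and
dashes. The ``service`` block type and its arguments are hypothetical and are not
taken from any particular application's schema:

.. code-block:: hcl

   # Hypothetical "service" block following the convention:
   # all-lowercase names with underscores separating words
   service "web_proxy" {
     io_mode     = "async"
     listen_addr = "127.0.0.1:8080"
   }

   # Also valid HCL, but uppercase letters and dashes are better reserved
   # for matching names defined by some external system
   service "legacy_proxy" {
     ioMode      = "async"
     listen-addr = "127.0.0.1:8080"
   }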

The distinction between "keywords" and other identifiers is really just a
convention. In your own language documentation, you may use the word "keyword"
to refer to names that are presented as an intrinsic part of your language,
such as important top-level block type names.

Block type names are usually singular, since each block defines a single
object. Use a plural block name only if the block is serving only as a
namespacing container for a number of other objects. A block with a plural
type name will generally contain only nested blocks, and no arguments of its
own.

Argument names are also singular unless they expect a collection value, in
which case they should be plural. For example, ``name = "foo"`` but
``subnet_ids = ["abc", "123"]``.
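
The following sketch combines these conventions. The ``network``, ``subnets``,
and ``subnet`` block types and their arguments are hypothetical. ``subnets`` is
plural because it only serves as a container for nested ``subnet`` blocks. Each
argument name is singular or plural depending on whether it expects a
collection value:

.. code-block:: hcl

   # Hypothetical singular block type: each block declares one network
   network "main" {
     name        = "main"                   # single value, singular name
     dns_servers = ["10.0.0.2", "10.0.0.3"] # collection, plural name

     # Hypothetical plural block type used only as a namespacing container:
     # it holds nested blocks and has no arguments of its own
     subnets {
       subnet "a" {
         cidr_block = "10.0.1.0/24"
       }
       subnet "b" {
         cidr_block = "10.0.2.0/24"
       }
     }
   }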

Function names will generally *not* use underscores and will instead just run
words together, as is common in the C standard library. This is because
several of the standard library functions offered in ``cty`` (covered in a
later section) have names that follow C library function names like
``substr``. This is not a strong rule, and applications that use longer
names may choose to use underscores for them to improve readability.

Blocks vs. Object Values
------------------------

HCL blocks and argument values of object type have quite a similar appearance
in the native syntax, and are identical in JSON syntax:

.. code-block:: hcl

   # block syntax
   block {
     foo = bar
   }

   # argument with object constructor expression
   argument = {
     foo = bar
   }

In spite of this superficial similarity, there are some important differences
between these two forms.

The most significant difference is that a child block can contain nested blocks
of its own, while an object constructor expression can define only attributes
of the object it is creating.
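
To make that difference concrete, the sketch below adds a nested block to the
block form. The object constructor form has no syntax for nested blocks. The
closest equivalent is an attribute whose value is another object constructor.
The ``nested`` and ``baz`` names are purely illustrative:

.. code-block:: hcl

   block {
     foo = bar

     # A nested block: possible only within block syntax
     nested {
       baz = "qux"
     }
   }

   argument = {
     foo = bar

     # An attribute whose value is another object, not a nested block
     nested = {
       baz = "qux"
     }
   }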

The user-facing model for blocks is that they generally form the more "rigid"
structure of the language itself, while argument values can be more free-form.
An application will generally define in its schema and documentation all of
the arguments that are valid for a particular block type, while arguments
accepting object constructors are more appropriate for situations where the
object's keys are freely selected by the user, such as when the
expression will be converted by the application to a map type.

As a less contrived example, consider the ``resource`` block type in Terraform
and its use with a particular resource type ``aws_instance``:

.. code-block:: hcl

   resource "aws_instance" "example" {
     ami           = "ami-abc123"
     instance_type = "t2.micro"

     tags = {
       Name = "example instance"
     }

     ebs_block_device {
       device_name = "hda1"
       volume_size = 8
       volume_type = "standard"
     }
   }

The top-level block type ``resource`` is fundamental to Terraform itself and
so an obvious candidate for block syntax: it maps directly onto an object in
Terraform's own domain model.

Within this block we see a mixture of arguments and nested blocks, all defined
as part of the schema of the ``aws_instance`` resource type. The ``tags``
map here is specified as an argument because its keys are free-form, chosen
by the user and mapped directly onto a map in the underlying system.
``ebs_block_device`` is specified as a nested block, because it is a separate
domain object within the remote system and has a rigid schema of its own.

As a special case, block syntax may sometimes be used with free-form keys if
those keys each serve as a separate declaration of some first-class object
in the language. For example, Terraform has a top-level block type ``locals``
which behaves in this way:

.. code-block:: hcl

   locals {
     instance_type = "t2.micro"
     instance_id   = aws_instance.example.id
   }

Although the argument names in this block are arbitrarily selected by the
user, each one defines a distinct top-level object. In other words, this
approach is used to create a more ergonomic syntax for defining these simple
single-expression objects, as a pragmatic alternative to more verbose and
redundant declarations using blocks:

.. code-block:: hcl

   local "instance_type" {
     value = "t2.micro"
   }

   local "instance_id" {
     value = aws_instance.example.id
   }

The distinction between domain objects, language constructs and user data will
always be subjective, so the final decision is up to you as the language
designer.

Standard Functions
------------------

HCL itself does not define a common set of functions available in all HCL-based
languages; the built-in language operators give a baseline of functionality
that is always available, but applications are free to define functions as they
see fit.

With that said, there are a number of generally-useful functions that don't
belong to the domain of any one application: string manipulation, sequence
manipulation, date formatting, JSON serialization and parsing, etc.

Given the general need such functions serve, it's helpful if a similar set of
functions is available with compatible behavior across multiple HCL-based
languages, assuming the language is for an application where function calls
make sense at all.

The Go implementation of HCL is built on an underlying type and function system
:go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That
library also has a package of "standard library" functions which we encourage
applications to offer with consistent names and compatible behavior, either by
using the standard implementations directly or offering compatible
implementations under the same name.

The "standard" functions that new configuration formats should consider
offering are:

* ``abs(number)`` - returns the absolute (non-negative) value of the given number.
* ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
* ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order.
* ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
* ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``.
* ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type.
* ``int(number)`` - returns the integer component of the given number, rounding towards zero.
* ``jsondecode(str)`` - interprets the given string as JSON and returns the corresponding decoded value.
* ``jsonencode(val)`` - encodes the given value as a JSON string.
* ``length(coll)`` - returns the length of the given collection.
* ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules.
* ``max(numbers...)`` - returns the highest of the given number values.
* ``min(numbers...)`` - returns the lowest of the given number values.
* ``sethas(set, val)`` - returns true only if the given set has the given value as an element.
* ``setintersection(sets...)`` - returns the intersection of the given sets.
* ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``.
* ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets.
* ``setunion(sets...)`` - returns the union of the given sets.
* ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters.
* ``substr(str, offset, length)`` - returns a substring of the given string, splitting it only between Unicode grapheme clusters.
* ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
* ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules.
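
As a rough illustration, here are some expressions that call a few of these
functions. This assumes an application that offers them under the names above,
with behavior matching the descriptions in the list. The ``defaults`` block type
and its argument names are hypothetical. Each comment shows the value the
expression would produce under that assumption:

.. code-block:: hcl

   # Hypothetical block; assumes the application exposes the standard functions
   defaults {
     magnitude  = abs(-3)                               # 3
     largest    = max(3, 7, 2)                          # 7
     whole      = int(-3.7)                             # -3 (rounds towards zero)
     greeting   = format("Hello, %s!", "world")         # "Hello, world!"
     prefix     = substr("hello world", 0, 5)           # "hello"
     quiet      = lower("HELLO")                        # "hello"
     fallback   = coalesce(null, "default")             # "default"
     item_count = length(["a", "b", "c"])               # 3
     later      = timeadd("2018-09-05T08:00:00Z", "1h") # "2018-09-05T09:00:00Z"
   }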

Not all of these functions will make sense in all applications. For example, an
application that doesn't use set types at all would have no reason to provide
the set-manipulation functions here.

Some languages will not provide functions at all, since they are primarily for
assigning values to arguments and thus neither need nor want any custom
computations of those values.

Block Results as Expression Variables
-------------------------------------

In some applications, top-level blocks serve also as declarations of variables
(or of attributes of object variables) available during expression evaluation,
as discussed in :ref:`go-interdep-blocks`.

In this case, it's most intuitive for the variables map in the evaluation
context to contain a value named after each valid top-level block
type and for these values to be object-typed or map-typed and reflect the
structure implied by block type labels.

For example, an application may have a top-level ``service`` block type
used like this:

.. code-block:: hcl

   service "http" "web_proxy" {
     listen_addr = "127.0.0.1:8080"

     process "main" {
       command = ["/usr/local/bin/awesome-app", "server"]
     }

     process "mgmt" {
       command = ["/usr/local/bin/awesome-app", "mgmt"]
     }
   }

If the result of decoding this block were available for use in expressions
elsewhere in configuration, the above convention would call for it to be
available to expressions as an object at ``service.http.web_proxy``.

If it is the contents of the block itself that are offered to evaluation (or
a superset object *derived* from the block contents) then the block arguments
can map directly to object attributes. However, it is up to the application to
decide which value type is most appropriate for each block type, since this
depends on how multiple blocks of the same type relate to one another, or if
multiple blocks of that type are even allowed.

In the above example, an application would probably expose the ``listen_addr``
argument value as ``service.http.web_proxy.listen_addr``, and may choose to
expose the ``process`` blocks as a map of objects using the labels as keys,
which would allow an expression like
``service.http.web_proxy.process["main"].command``.
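
Under those assumptions, another part of the same configuration could refer to
the decoded ``service`` block as sketched below. The ``health_check`` block type
and its arguments are hypothetical. The references only resolve if the
application actually exposes ``service`` values in the way just described:

.. code-block:: hcl

   # Hypothetical block; assumes "service" is exposed as described above
   health_check "web_proxy" {
     # An argument of the service block, exposed as an object attribute
     target = service.http.web_proxy.listen_addr

     # A nested process block, exposed as a map element keyed by its label
     command = service.http.web_proxy.process["main"].command
   }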

If multiple blocks of a given type do not have a significant order relative to
one another, as seems to be the case with these ``process`` blocks,
representation as a map is often the most intuitive. If the ordering of the
blocks *is* significant then a list may be more appropriate, allowing the use
of HCL's "splat operators" for convenient access to child arguments. However,
there is no one-size-fits-all solution here and language designers must
instead consider the likely usage patterns of each value and select the
value representation that best accommodates those patterns.
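
For comparison, suppose an application instead exposed the ``process`` blocks as
a list of objects in declaration order. This is an assumption made only for the
sketch below. Elements would then be accessed by index, and a splat expression
could collect the same argument from every block:

.. code-block:: hcl

   # Hypothetical block; assumes "process" is exposed as a list, not a map
   health_check "web_proxy" {
     # The first process block, selected by position rather than by label
     first_command = service.http.web_proxy.process[0].command

     # The command argument from every process block, as a list
     all_commands = service.http.web_proxy.process[*].command
   }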

Some applications may choose to offer variables with slightly different names
than the top-level blocks in order to allow for more concise references, such
as abbreviating ``service`` to ``svc`` in the above examples. This should be
done with care since it may make the relationship between the two less obvious,
but it may be a good tradeoff for frequently-accessed names that would
otherwise hurt the readability of the expressions they are embedded in.
Familiarity permits brevity.
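
For instance, an application that adopted the ``svc`` abbreviation would accept
references like the one below. The block itself would still be declared with the
full ``service`` type name. The ``health_check`` block type is hypothetical:

.. code-block:: hcl

   # Hypothetical block; assumes the application exposes "service" as "svc"
   health_check "web_proxy" {
     target = svc.http.web_proxy.listen_addr
   }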

Many applications will not make block results available for use in other
expressions at all, in which case they are free to select whichever variable
names make sense for what is being exposed. For example, a format may make
environment variable values available for use in expressions, and may do so
either as top-level variables (if no other variables are needed) or as an
object named ``env``, which can be used as in ``env.HOME``.
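
The two options in that example would look roughly like this in use. The
``workspace`` block type and ``data_dir`` argument are hypothetical. A given
application would offer only one of the two styles:

.. code-block:: hcl

   # Hypothetical block; environment variables exposed as an "env" object
   workspace {
     data_dir = "${env.HOME}/data"
   }

   # Hypothetical block; environment variables exposed as top-level variables
   workspace {
     data_dir = "${HOME}/data"
   }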