From 57c9a676d7c4540f9eb4cfc84d977323d3341460 Mon Sep 17 00:00:00 2001 From: Martin Atkins Date: Wed, 5 Sep 2018 08:08:11 -0700 Subject: [PATCH] guide: The "Configuration Language Design" section --- guide/go_expression_eval.rst | 2 + guide/go_patterns.rst | 2 + guide/language_design.rst | 282 +++++++++++++++++++++++++++++++++++ 3 files changed, 286 insertions(+) diff --git a/guide/go_expression_eval.rst b/guide/go_expression_eval.rst index a1165e1..f6ed0b6 100644 --- a/guide/go_expression_eval.rst +++ b/guide/go_expression_eval.rst @@ -75,6 +75,8 @@ complex structures: source_file = "${path.module}/foo.txt" +.. _go-expression-funcs: + Defining Functions ------------------ diff --git a/guide/go_patterns.rst b/guide/go_patterns.rst index 1c36ebe..0c70496 100644 --- a/guide/go_patterns.rst +++ b/guide/go_patterns.rst @@ -8,6 +8,8 @@ some more complex situations that can benefit from some additional techniques. This section lists a few of these situations and ways to use the HCL API to accommodate them. +.. _go-interdep-blocks: + Interdependent Blocks --------------------- diff --git a/guide/language_design.rst b/guide/language_design.rst index dc01a75..880ac7d 100644 --- a/guide/language_design.rst +++ b/guide/language_design.rst @@ -1,3 +1,285 @@ Configuration Language Design ============================= +In this section we will cover some conventions for HCL-based configuration +languages that can help make them feel consistent with other HCL-based +languages, and make the best use of HCL's building blocks. + +HCL's native and JSON syntaxes both define a mapping from input bytes to a +higher-level information model. In designing a configuration language based on +HCL, your building blocks are the components in that information model: +blocks, arguments, and expressions. + +Each calling application of HCL, then, effectively defines its own language. 
+Just as Atom and RSS are higher-level languages built on XML, HashiCorp +Terraform has a higher-level language built on HCL, while HashiCorp Nomad has +its own distinct language that is *also* built on HCL. + +From an end-user perspective, these are distinct languages but have a common +underlying texture. Users of both are therefore likely to bring some +expectations from one to the other, and so this section is an attempt to +codify some of these shared expectations to reduce user surprise. + +These are subjective guidelines, however, and so applications may choose to +ignore them entirely or ignore them in certain specialized cases. An +application providing a configuration language for a pre-existing system, for +example, may choose to eschew the identifier naming conventions in this section +in order to exactly match the existing names in that underlying system. + +Language Keywords and Identifiers +--------------------------------- + +Much of the work in defining an HCL-based language is in selecting good names +for arguments, block types, variables, and functions. + +The standard for naming in HCL is to use all-lowercase identifiers with +underscores separating words, like ``service`` or ``io_mode``. HCL identifiers +do allow uppercase letters and dashes, but this is primarily for natural +interfacing with external systems that may have other identifier conventions, +and so these should generally be avoided for the identifiers native to your +own language. + +The distinction between "keywords" and other identifiers is really just a +convention. In your own language documentation, you may use the word "keyword" +to refer to names that are presented as an intrinsic part of your language, +such as important top-level block type names. + +Block type names are usually singular, since each block defines a single +object. Use a plural block name only if the block is serving only as a +namespacing container for a number of other objects. 
A block with a plural +type name will generally contain only nested blocks, and no arguments of its +own. + +Argument names are also singular unless they expect a collection value, in +which case they should be plural. For example, ``name = "foo"`` but +``subnet_ids = ["abc", "123"]``. + +Function names will generally *not* use underscores and will instead just run +words together, as is common in the C standard library. This is a result of +the fact that several of the standard library functions offered in ``cty`` +(covered in a later section) have names that follow C library function names +like ``substr``. This is not a strong rule, and applications that use longer +names may choose to use underscores for them to improve readability. + +Blocks vs. Object Values +------------------------ + +HCL blocks and argument values of object type have quite a similar appearance +in the native syntax, and are identical in JSON syntax: + +.. code-block:: hcl + + block { + foo = bar + } + + # argument with object constructor expression + argument = { + foo = bar + } + +In spite of this superficial similarity, there are some important differences +between these two forms. + +The most significant difference is that a child block can contain nested blocks +of its own, while an object constructor expression can define only attributes +of the object it is creating. + +The user-facing model for blocks is that they generally form the more "rigid" +structure of the language itself, while argument values can be more free-form. +An application will generally define in its schema and documentation all of +the arguments that are valid for a particular block type, while arguments +accepting object constructors are more appropriate for situations where the +arguments themselves are freely selected by the user, such as when the +expression will be converted by the application to a map type. 
+ +As a less contrived example, consider the ``resource`` block type in Terraform +and its use with a particular resource type ``aws_instance``: + +.. code-block:: hcl + + resource "aws_instance" "example" { + ami = "ami-abc123" + instance_type = "t2.micro" + + tags = { + Name = "example instance" + } + + ebs_block_device { + device_name = "hda1" + volume_size = 8 + volume_type = "standard" + } + } + +The top-level block type ``resource`` is fundamental to Terraform itself and +so an obvious candidate for block syntax: it maps directly onto an object in +Terraform's own domain model. + +Within this block we see a mixture of arguments and nested blocks, all defined +as part of the schema of the ``aws_instance`` resource type. The ``tags`` +map here is specified as an argument because its keys are free-form, chosen +by the user and mapped directly onto a map in the underlying system. +``ebs_block_device`` is specified as a nested block, because it is a separate +domain object within the remote system and has a rigid schema of its own. + +As a special case, block syntax may sometimes be used with free-form keys if +those keys each serve as a separate declaration of some first-class object +in the language. For example, Terraform has a top-level block type ``locals`` +which behaves in this way: + +.. code-block:: hcl + + locals { + instance_type = "t2.micro" + instance_id = aws_instance.example.id + } + +Although the argument names in this block are arbitrarily selected by the +user, each one defines a distinct top-level object. In other words, this +approach is used to create a more ergonomic syntax for defining these simple +single-expression objects, as a pragmatic alternative to more verbose and +redundant declarations using blocks: + +.. 
code-block:: hcl + + local "instance_type" { + value = "t2.micro" + } + local "instance_id" { + value = aws_instance.example.id + } + +The distinction between domain objects, language constructs and user data will +always be subjective, so the final decision is up to you as the language +designer. + +Standard Functions +------------------ + +HCL itself does not define a common set of functions available in all HCL-based +languages; the built-in language operators give a baseline of functionality +that is always available, but applications are free to define functions as they +see fit. + +With that said, there are a number of generally-useful functions that don't +belong to the domain of any one application: string manipulation, sequence +manipulation, date formatting, JSON serialization and parsing, etc. + +Given the general need such functions serve, it's helpful if a similar set of +functions is available with compatible behavior across multiple HCL-based +languages, assuming the language is for an application where function calls +make sense at all. + +The Go implementation of HCL is built on an underlying type and function system +:go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That +library also has a package of "standard library" functions which we encourage +applications to offer with consistent names and compatible behavior, either by +using the standard implementations directly or offering compatible +implementations under the same name. + +The "standard" functions that new configuration formats should consider +offering are: + +* ``abs(number)`` - returns the absolute (positive) value of the given number. +* ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear. +* ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order. 
+* ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments. +* ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``. +* ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type. +* ``int(number)`` - returns the integer component of the given number, rounding towards zero. +* ``jsondecode(str)`` - interprets the given string as JSON format and returns the corresponding decoded value. +* ``jsonencode(val)`` - encodes the given value as a JSON string. +* ``length(coll)`` - returns the length of the given collection. +* ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules. +* ``max(numbers...)`` - returns the highest of the given number values. +* ``min(numbers...)`` - returns the lowest of the given number values. +* ``sethas(set, val)`` - returns true only if the given set has the given value as an element. +* ``setintersection(sets...)`` - returns the intersection of the given sets. +* ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``. +* ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets. +* ``setunion(sets...)`` - returns the union of the given sets. +* ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters. +* ``substr(str, offset, length)`` - returns a substring from the given string by splitting it between Unicode grapheme clusters. +* ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp. +* ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules. 
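As a rough illustration of how a few of the functions listed above might read in a configuration file, the following sketch assumes a hypothetical language that offers them under the standard names with behavior compatible with the ``cty`` implementations. The argument names are invented for this example, and the comments show the results one would expect under that assumption.

```hcl
# Hypothetical arguments in an imagined HCL-based language; only the
# function names and signatures come from the list above.
name        = upper("web")                  # expected: "WEB"
listen_port = coalesce(null, 8080)          # expected: 8080
label       = format("%s-%d", "web", 2)     # expected: "web-2"
prefix      = substr("hello world", 0, 5)   # expected: "hello"
all_ids     = concat(["a"], ["b", "c"])     # expected: ["a", "b", "c"]
largest     = max(1, 5, 3)                  # expected: 5
```

Note that these function names run words together rather than using underscores, in line with the naming convention described earlier in this section.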
+ +Not all of these functions will make sense in all applications. For example, an +application that doesn't use set types at all would have no reason to provide +the set-manipulation functions here. + +Some languages will not provide functions at all, since they are primarily for +assigning values to arguments and thus neither need nor want any custom +computations of those values. + +Block Results as Expression Variables +------------------------------------- + +In some applications, top-level blocks serve also as declarations of variables +(or of attributes of object variables) available during expression evaluation, +as discussed in :ref:`go-interdep-blocks`. + +In this case, it's most intuitive for the variables map in the evaluation +context to contain a value named after each valid top-level block +type and for these values to be object-typed or map-typed and reflect the +structure implied by block type labels. + +For example, an application may have a top-level ``service`` block type +used like this: + +.. code-block:: hcl + + service "http" "web_proxy" { + listen_addr = "127.0.0.1:8080" + + process "main" { + command = ["/usr/local/bin/awesome-app", "server"] + } + + process "mgmt" { + command = ["/usr/local/bin/awesome-app", "mgmt"] + } + } + +If the result of decoding this block were available for use in expressions +elsewhere in configuration, the above convention would call for it to be +available to expressions as an object at ``service.http.web_proxy``. + +If it is the contents of the block itself that are offered to evaluation -- or +a superset object *derived* from the block contents -- then the block arguments +can map directly to object attributes, but it is up to the application to +decide which value type is most appropriate for each block type, since this +depends on how multiple blocks of the same type relate to one another, or if +multiple blocks of that type are even allowed. 
+ +In the above example, an application would probably expose the ``listen_addr`` +argument value as ``service.http.web_proxy.listen_addr``, and may choose to +expose the ``process`` blocks as a map of objects using the labels as keys, +which would allow an expression like +``service.http.web_proxy.process["main"].command``. + +If multiple blocks of a given type do not have a significant order relative to +one another, as seems to be the case with these ``process`` blocks, +representation as a map is often the most intuitive. If the ordering of the +blocks *is* significant then a list may be more appropriate, allowing the use +of HCL's "splat operators" for convenient access to child arguments. However, +there is no one-size-fits-all solution here and language designers must +instead consider the likely usage patterns of each value and select the +value representation that best accommodates those patterns. + +Some applications may choose to offer variables with slightly different names +than the top-level blocks in order to allow for more concise references, such +as abbreviating ``service`` to ``svc`` in the above examples. This should be +done with care since it may make the relationship between the two less obvious, +but this may be a good tradeoff for names that are accessed frequently that +might otherwise hurt the readability of expressions they are embedded in. +Familiarity permits brevity. + +Many applications will not make block results available for use in other +expressions at all, in which case they are free to select whichever variable +names make sense for what is being exposed. For example, a format may make +environment variable values available for use in expressions, and may do so +either as top-level variables (if no other variables are needed) or as an +object named ``env``, which can be used as in ``env.HOME``. +
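To make the referencing conventions above more concrete, here is a minimal sketch of how another block might consume these values. The ``health_check`` block type and its argument names are hypothetical, invented only for this illustration. The sketch also assumes the application exposes ``process`` blocks as a map keyed by label and environment variables as an object named ``env``, both of which are choices left to the language designer.

```hcl
# Hypothetical block type; not part of any real application's schema.
health_check "proxy" {
  # Block argument exposed directly as an object attribute.
  target = service.http.web_proxy.listen_addr

  # Nested "process" blocks exposed as a map, using the block label as key.
  command = service.http.web_proxy.process["main"].command

  # Environment variables exposed through an object named "env".
  working_dir = env.HOME
}
```

If the ``process`` blocks were instead exposed as a list, the index expression would use a number rather than a label, which is one reason to settle on the representation before users begin writing references against it.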