guide: The "Configuration Language Design" section

2018-09-05 08:08:11 -07:00 · 2018-09-05 08:08:11 -07:00 · 57c9a676d7
commit 57c9a676d7
parent 280771fe8a
3 changed files with 286 additions and 0 deletions
--- a/guide/go_expression_eval.rst
+++ b/guide/go_expression_eval.rst
@@ -75,6 +75,8 @@ complex structures:

   source_file = "${path.module}/foo.txt"

+.. _go-expression-funcs:
+
 Defining Functions
 ------------------

--- a/guide/go_patterns.rst
+++ b/guide/go_patterns.rst
@@ -8,6 +8,8 @@ some more complex situations that can benefit from some additional techniques.
 This section lists a few of these situations and ways to use the HCL API to
 accommodate them.

+.. _go-interdep-blocks:
+
 Interdependent Blocks
 ---------------------

--- a/guide/language_design.rst
+++ b/guide/language_design.rst
@@ -1,3 +1,285 @@
 Configuration Language Design
 =============================

+In this section we will cover some conventions for HCL-based configuration
+languages that can help make them feel consistent with other HCL-based
+languages, and make the best use of HCL's building blocks.
+
+HCL's native and JSON syntaxes both define a mapping from input bytes to a
+higher-level information model. In designing a configuration language based on
+HCL, your building blocks are the components in that information model:
+blocks, arguments, and expressions.
+
+Each calling application of HCL, then, effectively defines its own language.
+Just as Atom and RSS are higher-level languages built on XML, HashiCorp
+Terraform has a higher-level language built on HCL, while HashiCorp Nomad has
+its own distinct language that is *also* built on HCL.
+
+From an end-user perspective, these are distinct languages but have a common
+underlying texture. Users of both are therefore likely to bring some
+expectations from one to the other, and so this section is an attempt to
+codify some of these shared expectations to reduce user surprise.
+
+These guidelines are subjective, however, and so applications may choose to
+ignore them entirely or ignore them in certain specialized cases. An
+application providing a configuration language for a pre-existing system, for
+example, may choose to eschew the identifier naming conventions in this section
+in order to exactly match the existing names in that underlying system.
+
+Language Keywords and Identifiers
+---------------------------------
+
+Much of the work in defining an HCL-based language is in selecting good names
+for arguments, block types, variables, and functions.
+
+The standard for naming in HCL is to use all-lowercase identifiers with
+underscores separating words, like ``service`` or ``io_mode``. HCL identifiers
+do allow uppercase letters and dashes, but this is primarily for natural
+interfacing with external systems that may have other identifier conventions,
+and so these should generally be avoided for the identifiers native to your
+own language.
+
+The distinction between "keywords" and other identifiers is really just a
+convention. In your own language documentation, you may use the word "keyword"
+to refer to names that are presented as an intrinsic part of your language,
+such as important top-level block type names.
+
+Block type names are usually singular, since each block defines a single
+object. Use a plural block name only if the block is serving only as a
+namespacing container for a number of other objects. A block with a plural
+type name will generally contain only nested blocks, and no arguments of its
+own.
+
+Argument names are also singular unless they expect a collection value, in
+which case they should be plural. For example, ``name = "foo"`` but
+``subnet_ids = ["abc", "123"]``.
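+
+As an illustration of these naming conventions, the following fragment uses
+hypothetical names (``listeners``, ``listener``, ``io_mode`` and
+``allowed_cidrs`` are invented for this sketch and do not belong to any
+particular application):
+
+.. code-block:: hcl
+
+   # Plural block type: serves only as a namespacing container, so it has
+   # nested blocks but no arguments of its own.
+   listeners {
+     # Singular block type: each block defines a single object.
+     listener "http" {
+       # Singular argument name for a single value
+       io_mode = "async"
+
+       # Plural argument name because a collection is expected
+       allowed_cidrs = ["10.0.0.0/8", "192.168.0.0/16"]
+     }
+   }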
+
+Function names will generally *not* use underscores and will instead just run
+words together, as is common in the C standard library. This is because
+several of the standard library functions offered in ``cty`` (covered in a
+later section) have names that follow C library function names like
+``substr``. This is not a strong rule, and applications that use longer
+names may choose to use underscores for them to improve readability.
+
+Blocks vs. Object Values
+------------------------
+
+HCL blocks and argument values of object type have quite a similar appearance
+in the native syntax, and are identical in JSON syntax:
+
+.. code-block:: hcl
+
+   block {
+     foo = bar
+   }
+
+   # argument with object constructor expression
+   argument = {
+     foo = bar
+   }
+
+In spite of this superficial similarity, there are some important differences
+between these two forms.
+
+The most significant difference is that a child block can contain nested blocks
+of its own, while an object constructor expression can define only attributes
+of the object it is creating.
+
+The user-facing model for blocks is that they generally form the more "rigid"
+structure of the language itself, while argument values can be more free-form.
+An application will generally define in its schema and documentation all of
+the arguments that are valid for a particular block type, while arguments
+accepting object constructors are more appropriate for situations where the
+arguments themselves are freely selected by the user, such as when the
+expression will be converted by the application to a map type.
+
+As a less contrived example, consider the ``resource`` block type in Terraform
+and its use with a particular resource type ``aws_instance``:
+
+.. code-block:: hcl
+
+   resource "aws_instance" "example" {
+     ami           = "ami-abc123"
+     instance_type = "t2.micro"
+
+     tags = {
+       Name = "example instance"
+     }
+
+     ebs_block_device {
+       device_name = "hda1"
+       volume_size = 8
+       volume_type = "standard"
+     }
+   }
+
+The top-level block type ``resource`` is fundamental to Terraform itself and
+so an obvious candidate for block syntax: it maps directly onto an object in
+Terraform's own domain model.
+
+Within this block we see a mixture of arguments and nested blocks, all defined
+as part of the schema of the ``aws_instance`` resource type. The ``tags``
+map here is specified as an argument because its keys are free-form, chosen
+by the user and mapped directly onto a map in the underlying system.
+``ebs_block_device`` is specified as a nested block, because it is a separate
+domain object within the remote system and has a rigid schema of its own.
+
+As a special case, block syntax may sometimes be used with free-form keys if
+those keys each serve as a separate declaration of some first-class object
+in the language. For example, Terraform has a top-level block type ``locals``
+which behaves in this way:
+
+.. code-block:: hcl
+
+   locals {
+     instance_type = "t2.micro"
+     instance_id   = aws_instance.example.id
+   }
+
+Although the argument names in this block are arbitrarily selected by the
+user, each one defines a distinct top-level object. In other words, this
+approach is used to create a more ergonomic syntax for defining these simple
+single-expression objects, as a pragmatic alternative to more verbose and
+redundant declarations using blocks:
+
+.. code-block:: hcl
+
+   local "instance_type" {
+     value = "t2.micro"
+   }
+   local "instance_id" {
+     value = aws_instance.example.id
+   }
+
+The distinction between domain objects, language constructs and user data will
+always be subjective, so the final decision is up to you as the language
+designer.
+
+Standard Functions
+------------------
+
+HCL itself does not define a common set of functions available in all HCL-based
+languages; the built-in language operators give a baseline of functionality
+that is always available, but applications are free to define functions as they
+see fit.
+
+With that said, there are a number of generally useful functions that don't
+belong to the domain of any one application: string manipulation, sequence
+manipulation, date formatting, JSON serialization and parsing, etc.
+
+Given the general need such functions serve, it's helpful if a similar set of
+functions is available with compatible behavior across multiple HCL-based
+languages, assuming the language is for an application where function calls
+make sense at all.
+
+The Go implementation of HCL is built on an underlying type and function system
+:go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That
+library also has a package of "standard library" functions which we encourage
+applications to offer with consistent names and compatible behavior, either by
+using the standard implementations directly or offering compatible
+implementations under the same name.
+
+The "standard" functions that new configuration formats should consider
+offering are:
+
+* ``abs(number)`` - returns the absolute (positive) value of the given number.
+* ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
+* ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order.
+* ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
+* ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``.
+* ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type.
+* ``int(number)`` - returns the integer component of the given number, rounding towards zero.
+* ``jsondecode(str)`` - interprets the given string as JSON format and returns the corresponding decoded value.
+* ``jsonencode(val)`` - encodes the given value as a JSON string.
+* ``length(coll)`` - returns the length of the given collection.
+* ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules.
+* ``max(numbers...)`` - returns the highest of the given number values.
+* ``min(numbers...)`` - returns the lowest of the given number values.
+* ``sethas(set, val)`` - returns true only if the given set has the given value as an element.
+* ``setintersection(sets...)`` - returns the intersection of the given sets.
+* ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``.
+* ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets.
+* ``setunion(sets...)`` - returns the union of the given sets.
+* ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters.
+* ``substr(str, offset, length)`` - returns a substring from the given string by splitting it between Unicode grapheme clusters.
+* ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
+* ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules.
+
+Not all of these functions will make sense in all applications. For example, an
+application that doesn't use set types at all would have no reason to provide
+the set-manipulation functions here.
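+
+As a rough sketch of how some of these functions might appear in
+expressions, assuming an application that chooses to expose them under the
+names above (the argument names on the left are hypothetical):
+
+.. code-block:: hcl
+
+   name_upper = upper("example")             # "EXAMPLE"
+   largest    = max(1, 5, 3)                 # 5
+   greeting   = substr("hello world", 0, 5)  # "hello"
+   item_count = length(["a", "b", "c"])      # 3
+
+   # Adds one hour to an RFC3339 timestamp
+   expires_at = timeadd("2018-09-05T08:00:00Z", "1h")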
+
+Some languages will not provide functions at all, since they are primarily for
+assigning values to arguments and thus do not need nor want any custom
+computations of those values.
+
+Block Results as Expression Variables
+-------------------------------------
+
+In some applications, top-level blocks serve also as declarations of variables
+(or of attributes of object variables) available during expression evaluation,
+as discussed in :ref:`go-interdep-blocks`.
+
+In this case, it's most intuitive for the variables map in the evaluation
+context to contain a value named after each valid top-level block
+type and for these values to be object-typed or map-typed and reflect the
+structure implied by block type labels.
+
+For example, an application may have a top-level ``service`` block type
+used like this:
+
+.. code-block:: hcl
+
+  service "http" "web_proxy" {
+    listen_addr = "127.0.0.1:8080"
+
+    process "main" {
+      command = ["/usr/local/bin/awesome-app", "server"]
+    }
+
+    process "mgmt" {
+      command = ["/usr/local/bin/awesome-app", "mgmt"]
+    }
+  }
+
+If the result of decoding this block were available for use in expressions
+elsewhere in configuration, the above convention would call for it to be
+available to expressions as an object at ``service.http.web_proxy``.
+
+If it is the contents of the block itself that are offered to evaluation -- or
+a superset object *derived* from the block contents -- then the block arguments
+can map directly to object attributes, but it is up to the application to
+decide which value type is most appropriate for each block type, since this
+depends on how multiple blocks of the same type relate to one another, or if
+multiple blocks of that type are even allowed.
+
+In the above example, an application would probably expose the ``listen_addr``
+argument value as ``service.http.web_proxy.listen_addr``, and may choose to
+expose the ``process`` blocks as a map of objects using the labels as keys,
+which would allow an expression like
+``service.http.web_proxy.process["main"].command``.
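+
+For illustration, a hypothetical ``healthcheck`` block type (not part of the
+example language above) might refer to these values as follows, assuming the
+application exposes the ``process`` blocks as a map named ``process``:
+
+.. code-block:: hcl
+
+   healthcheck "web_proxy" {
+     # Argument value taken from the labeled service block
+     target_addr = service.http.web_proxy.listen_addr
+
+     # Nested process block, looked up by its label as a map key
+     command = service.http.web_proxy.process["main"].command
+   }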
+
+If multiple blocks of a given type do not have a significant order relative to
+one another, as seems to be the case with these ``process`` blocks,
+representation as a map is often the most intuitive. If the ordering of the
+blocks *is* significant then a list may be more appropriate, allowing the use
+of HCL's "splat operators" for convenient access to child arguments. However,
+there is no one-size-fits-all solution here and language designers must
+instead consider the likely usage patterns of each value and select the
+value representation that best accommodates those patterns.
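+
+As a sketch of the list-based alternative, assuming the application instead
+exposed the ``process`` blocks as a list in declaration order, a full splat
+expression could collect one argument from every block (``all_commands`` is
+a hypothetical argument name):
+
+.. code-block:: hcl
+
+   # Collects the command value of each process block, in order
+   all_commands = service.http.web_proxy.process[*].command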
+
+Some applications may choose to offer variables with slightly different names
+than the top-level blocks in order to allow for more concise references, such
+as abbreviating ``service`` to ``svc`` in the above examples. This should be
+done with care since it may make the relationship between the two less obvious,
+but this may be a good tradeoff for frequently accessed names that might
+otherwise hurt the readability of the expressions they are embedded in.
+Familiarity permits brevity.
+
+Many applications will not make block results available for use in other
+expressions at all, in which case they are free to select whichever variable
+names make sense for what is being exposed. For example, a format may make
+environment variable values available for use in expressions, and may do so
+either as top-level variables (if no other variables are needed) or as an
+object named ``env``, which can be used as in ``env.HOME``.
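+
+A minimal sketch of the second approach, assuming the application populates
+``env`` from the process environment (``data_dir`` is a hypothetical
+argument name):
+
+.. code-block:: hcl
+
+   # Interpolates the HOME environment variable into a path string
+   data_dir = "${env.HOME}/data"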
+