guide: The "Configuration Language Design" section

2018-09-05 08:08:11 -07:00 · 2018-09-05 08:08:11 -07:00 · 57c9a676d7
commit 57c9a676d7
parent 280771fe8a
3 changed files with 286 additions and 0 deletions
--- a/guide/go_expression_eval.rst
+++ b/guide/go_expression_eval.rst
@@ -75,6 +75,8 @@ complex structures:
   source_file = "${path.module}/foo.txt"
 .. _go-expression-funcs:
 Defining Functions
 ------------------
--- a/guide/go_patterns.rst
+++ b/guide/go_patterns.rst
@@ -8,6 +8,8 @@ some more complex situations that can benefit from some additional techniques.
 This section lists a few of these situations and ways to use the HCL API to
 accommodate them.
 .. _go-interdep-blocks:
 Interdependent Blocks
 ---------------------
--- a/guide/language_design.rst
+++ b/guide/language_design.rst
@@ -1,3 +1,285 @@
 Configuration Language Design
 =============================
 In this section we will cover some conventions for HCL-based configuration
 languages that can help make them feel consistent with other HCL-based
 languages, and make the best use of HCL's building blocks.
 HCL's native and JSON syntaxes both define a mapping from input bytes to a
 higher-level information model. In designing a configuration language based on
 HCL, your building blocks are the components in that information model:
 blocks, arguments, and expressions.
 Each calling application of HCL, then, effectively defines its own language.
 Just as Atom and RSS are higher-level languages built on XML, HashiCorp
 Terraform has a higher-level language built on HCL, while HashiCorp Nomad has
 its own distinct language that is *also* built on HCL.
 From an end-user perspective, these are distinct languages but have a common
 underlying texture. Users of both are therefore likely to bring some
 expectations from one to the other, and so this section is an attempt to
 codify some of these shared expectations to reduce user surprise.
 These are subjective guidelines, however, and so applications may choose to
 ignore them entirely or ignore them in certain specialized cases. An
 application providing a configuration language for a pre-existing system, for
 example, may choose to eschew the identifier naming conventions in this section
 in order to exactly match the existing names in that underlying system.
 Language Keywords and Identifiers
 ---------------------------------
 Much of the work in defining an HCL-based language is in selecting good names
 for arguments, block types, variables, and functions.
 The standard for naming in HCL is to use all-lowercase identifiers with
 underscores separating words, like ``service`` or ``io_mode``. HCL identifiers
 do allow uppercase letters and dashes, but this is primarily for natural
 interfacing with external systems that may have other identifier conventions,
 and so these should generally be avoided for the identifiers native to your
 own language.
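 A language implementation could check its own schema names against this
 convention, for example in a unit test. The following is a minimal sketch in
 Go using only the standard library. The ``IsConventionalName`` helper and its
 pattern are illustrative assumptions, not part of the HCL API, and the pattern
 is deliberately stricter than what HCL itself accepts as an identifier.

```go
package main

import (
	"fmt"
	"regexp"
)

// conventionalName matches all-lowercase identifiers whose words are
// separated by single underscores, such as "service" or "io_mode".
// The pattern is an assumption made for illustration; HCL itself also
// accepts uppercase letters and dashes in identifiers.
var conventionalName = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// IsConventionalName reports whether name follows the lowercase,
// underscore-separated naming convention.
func IsConventionalName(name string) bool {
	return conventionalName.MatchString(name)
}

func main() {
	for _, name := range []string{"service", "io_mode", "ioMode", "io-mode"} {
		fmt.Printf("%-8s %v\n", name, IsConventionalName(name))
	}
}
```

 Names that must mirror an external system can simply be excluded from such a
 check.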
 The distinction between "keywords" and other identifiers is really just a
 convention. In your own language documentation, you may use the word "keyword"
 to refer to names that are presented as an intrinsic part of your language,
 such as important top-level block type names.
 Block type names are usually singular, since each block defines a single
 object. Use a plural block name only if the block is serving only as a
 namespacing container for a number of other objects. A block with a plural
 type name will generally contain only nested blocks, and no arguments of its
 own.
 Argument names are also singular unless they expect a collection value, in
 which case they should be plural. For example, ``name = "foo"`` but
 ``subnet_ids = ["abc", "123"]``.
 Function names will generally *not* use underscores and will instead just run
 words together, as is common in the C standard library. This is a result of
 the fact that several of the standard library functions offered in ``cty``
 (covered in a later section) have names that follow C library function names
 like ``substr``. This is not a strong rule, and applications that use longer
 names may choose to use underscores for them to improve readability.
 Blocks vs. Object Values
 ------------------------
 HCL blocks and argument values of object type have quite a similar appearance
 in the native syntax, and are identical in JSON syntax:
 .. code-block:: hcl
   block {
     foo = bar
   }
   # argument with object constructor expression
   argument = {
     foo = bar
   }
 In spite of this superficial similarity, there are some important differences
 between these two forms.
 The most significant difference is that a child block can contain nested blocks
 of its own, while an object constructor expression can define only attributes
 of the object it is creating.
 The user-facing model for blocks is that they generally form the more "rigid"
 structure of the language itself, while argument values can be more free-form.
 An application will generally define in its schema and documentation all of
 the arguments that are valid for a particular block type, while arguments
 accepting object constructors are more appropriate for situations where the
 arguments themselves are freely selected by the user, such as when the
 expression will be converted by the application to a map type.
 As a less contrived example, consider the ``resource`` block type in Terraform
 and its use with a particular resource type ``aws_instance``:
 .. code-block:: hcl
   resource "aws_instance" "example" {
     ami           = "ami-abc123"
     instance_type = "t2.micro"
     tags = {
       Name = "example instance"
     }
     ebs_block_device {
       device_name = "hda1"
       volume_size = 8
       volume_type = "standard"
     }
   }
 The top-level block type ``resource`` is fundamental to Terraform itself and
 so an obvious candidate for block syntax: it maps directly onto an object in
 Terraform's own domain model.
 Within this block we see a mixture of arguments and nested blocks, all defined
 as part of the schema of the ``aws_instance`` resource type. The ``tags``
 map here is specified as an argument because its keys are free-form, chosen
 by the user and mapped directly onto a map in the underlying system.
 ``ebs_block_device`` is specified as a nested block, because it is a separate
 domain object within the remote system and has a rigid schema of its own.
 As a special case, block syntax may sometimes be used with free-form keys if
 those keys each serve as a separate declaration of some first-class object
 in the language. For example, Terraform has a top-level block type ``locals``
 which behaves in this way:
 .. code-block:: hcl
   locals {
     instance_type = "t2.micro"
     instance_id   = aws_instance.example.id
   }
 Although the argument names in this block are arbitrarily selected by the
 user, each one defines a distinct top-level object. In other words, this
 approach is used to create a more ergonomic syntax for defining these simple
 single-expression objects, as a pragmatic alternative to more verbose and
 redundant declarations using blocks:
 .. code-block:: hcl
   local "instance_type" {
     value = "t2.micro"
   }
   local "instance_id" {
     value = aws_instance.example.id
   }
 The distinction between domain objects, language constructs and user data will
 always be subjective, so the final decision is up to you as the language
 designer.
 Standard Functions
 ------------------
 HCL itself does not define a common set of functions available in all HCL-based
 languages; the built-in language operators give a baseline of functionality
 that is always available, but applications are free to define functions as they
 see fit.
 With that said, there are a number of generally-useful functions that don't
 belong to the domain of any one application: string manipulation, sequence
 manipulation, date formatting, JSON serialization and parsing, etc.
 Given the general need such functions serve, it's helpful if a similar set of
 functions is available with compatible behavior across multiple HCL-based
 languages, assuming the language is for an application where function calls
 make sense at all.
 The Go implementation of HCL is built on an underlying type and function system
 :go:pkg:`cty`, whose usage was introduced in :ref:`go-expression-funcs`. That
 library also has a package of "standard library" functions which we encourage
 applications to offer with consistent names and compatible behavior, either by
 using the standard implementations directly or offering compatible
 implementations under the same name.
 The "standard" functions that new configuration formats should consider
 offering are:
 * ``abs(number)`` - returns the absolute (positive) value of the given number.
 * ``coalesce(vals...)`` - returns the value of the first argument that isn't null. Useful only in formats where null values may appear.
 * ``compact(vals...)`` - returns a new tuple with the non-null values given as arguments, preserving order.
 * ``concat(seqs...)`` - builds a tuple value by concatenating together all of the given sequence (list or tuple) arguments.
 * ``format(fmt, args...)`` - performs simple string formatting similar to the C library function ``printf``.
 * ``hasindex(coll, idx)`` - returns true if the given collection has the given index. ``coll`` may be of list, tuple, map, or object type.
 * ``int(number)`` - returns the integer component of the given number, rounding towards zero.
 * ``jsondecode(str)`` - interprets the given string as JSON format and returns the corresponding decoded value.
 * ``jsonencode(val)`` - encodes the given value as a JSON string.
 * ``length(coll)`` - returns the length of the given collection.
 * ``lower(str)`` - converts the letters in the given string to lowercase, using Unicode case folding rules.
 * ``max(numbers...)`` - returns the highest of the given number values.
 * ``min(numbers...)`` - returns the lowest of the given number values.
 * ``sethas(set, val)`` - returns true only if the given set has the given value as an element.
 * ``setintersection(sets...)`` - returns the intersection of the given sets.
 * ``setsubtract(set1, set2)`` - returns a set with the elements from ``set1`` that are not also in ``set2``.
 * ``setsymdiff(sets...)`` - returns the symmetric difference of the given sets.
 * ``setunion(sets...)`` - returns the union of the given sets.
 * ``strlen(str)`` - returns the length of the given string in Unicode grapheme clusters.
 * ``substr(str, offset, length)`` - returns a substring from the given string by splitting it between Unicode grapheme clusters.
 * ``timeadd(time, duration)`` - takes a timestamp in RFC3339 format and a possibly-negative duration given as a string like ``"1h"`` (for "one hour") and returns a new RFC3339 timestamp after adding the duration to the given timestamp.
 * ``upper(str)`` - converts the letters in the given string to uppercase, using Unicode case folding rules.
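
 For applications that offer compatible implementations rather than the
 standard ones, the sketch below shows what two of these functions might look
 like in plain Go. It is not the ``cty`` implementation: the names ``Coalesce``
 and ``TimeAdd`` are hypothetical, null is modelled as Go's ``nil``, returning
 an error when every argument is null is an assumption, and the duration syntax
 is whatever Go's ``time.ParseDuration`` accepts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Coalesce returns the first argument that is not nil. Null values are
// modelled as nil here, which is a simplification for illustration.
// Returning an error when all arguments are nil is an assumption.
func Coalesce(vals ...any) (any, error) {
	for _, v := range vals {
		if v != nil {
			return v, nil
		}
	}
	return nil, errors.New("no non-null arguments")
}

// TimeAdd parses an RFC 3339 timestamp and a possibly-negative duration
// string such as "1h", and returns the shifted timestamp in RFC 3339 format.
func TimeAdd(timestamp, duration string) (string, error) {
	t, err := time.Parse(time.RFC3339, timestamp)
	if err != nil {
		return "", err
	}
	d, err := time.ParseDuration(duration)
	if err != nil {
		return "", err
	}
	return t.Add(d).Format(time.RFC3339), nil
}

func main() {
	first, _ := Coalesce(nil, "fallback")
	fmt.Println(first)

	later, _ := TimeAdd("2018-09-05T08:08:11-07:00", "1h")
	fmt.Println(later)
}
```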
 Not all of these functions will make sense in all applications. For example, an
 application that doesn't use set types at all would have no reason to provide
 the set-manipulation functions here.
 Some languages will not provide functions at all, since they are primarily for
 assigning values to arguments and thus do not need nor want any custom
 computations of those values.
 Block Results as Expression Variables
 -------------------------------------
 In some applications, top-level blocks serve also as declarations of variables
 (or of attributes of object variables) available during expression evaluation,
 as discussed in :ref:`go-interdep-blocks`.
 In this case, it's most intuitive for the variables map in the evaluation
 context to contain a value named after each valid top-level block
 type and for these values to be object-typed or map-typed and reflect the
 structure implied by block type labels.
 For example, an application may have a top-level ``service`` block type
 used like this:
 .. code-block:: hcl
  service "http" "web_proxy" {
    listen_addr = "127.0.0.1:8080"
    process "main" {
      command = ["/usr/local/bin/awesome-app", "server"]
    }
    process "mgmt" {
      command = ["/usr/local/bin/awesome-app", "mgmt"]
    }
  }
 If the result of decoding this block were available for use in expressions
 elsewhere in configuration, the above convention would call for it to be
 available to expressions as an object at ``service.http.web_proxy``.
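
 The nesting implied by the block type and labels can be sketched with plain Go
 maps. This is only an illustration of the resulting shape: the ``Block`` type,
 ``BuildVariables``, and ``Lookup`` are hypothetical helpers, and a real
 application would build ``cty`` values for the evaluation context instead. The
 sketch also assumes that attribute names never collide with label values.

```go
package main

import "fmt"

// Block is a hypothetical, simplified stand-in for a decoded top-level block.
type Block struct {
	Type   string
	Labels []string
	Attrs  map[string]any
}

// BuildVariables nests each block's attributes under its block type name
// and then under each of its labels, in order.
func BuildVariables(blocks []Block) map[string]any {
	vars := map[string]any{}
	for _, b := range blocks {
		path := append([]string{b.Type}, b.Labels...)
		cur := vars
		for _, step := range path {
			next, ok := cur[step].(map[string]any)
			if !ok {
				next = map[string]any{}
				cur[step] = next
			}
			cur = next
		}
		for name, val := range b.Attrs {
			cur[name] = val
		}
	}
	return vars
}

// Lookup follows a sequence of attribute names through nested maps.
func Lookup(vars map[string]any, path ...string) (any, bool) {
	var cur any = vars
	for _, step := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[step]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

func main() {
	vars := BuildVariables([]Block{{
		Type:   "service",
		Labels: []string{"http", "web_proxy"},
		Attrs:  map[string]any{"listen_addr": "127.0.0.1:8080"},
	}})
	addr, _ := Lookup(vars, "service", "http", "web_proxy", "listen_addr")
	fmt.Println(addr)
}
```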
 If it is the contents of the block itself that are offered to evaluation -- or
 a superset object *derived* from the block contents -- then the block arguments
 can map directly to object attributes, but it is up to the application to
 decide which value type is most appropriate for each block type, since this
 depends on how multiple blocks of the same type relate to one another, or if
 multiple blocks of that type are even allowed.
 In the above example, an application would probably expose the ``listen_addr``
 argument value as ``service.http.web_proxy.listen_addr``, and may choose to
 expose the ``process`` blocks as a map of objects using the labels as keys,
 which would allow an expression like
 ``service.http.web_proxy.process["main"].command``.
 If multiple blocks of a given type do not have a significant order relative to
 one another, as seems to be the case with these ``process`` blocks,
 representation as a map is often the most intuitive. If the ordering of the
 blocks *is* significant then a list may be more appropriate, allowing the use
 of HCL's "splat operators" for convenient access to child arguments. However,
 there is no one-size-fits-all solution here and language designers must
 instead consider the likely usage patterns of each value and select the
 value representation that best accommodates those patterns.
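
 The difference between the two representations can be sketched in plain Go.
 The ``Process`` type and both helpers below are hypothetical and stand in for
 whatever values an application would place in its evaluation context. A map
 supports lookup by label, while a list preserves order and supports a
 projection similar in spirit to a splat expression such as
 ``process[*].command``.

```go
package main

import "fmt"

// Process is a hypothetical decoded form of a nested "process" block.
type Process struct {
	Label   string
	Command []string
}

// ProcessesByLabel suits blocks with no significant order: each block is
// reachable by its label. A later block with a duplicate label replaces
// the earlier one in this sketch.
func ProcessesByLabel(procs []Process) map[string]Process {
	m := make(map[string]Process, len(procs))
	for _, p := range procs {
		m[p.Label] = p
	}
	return m
}

// Commands suits ordered blocks: it collects one attribute from every
// element, preserving declaration order.
func Commands(procs []Process) [][]string {
	out := make([][]string, 0, len(procs))
	for _, p := range procs {
		out = append(out, p.Command)
	}
	return out
}

func main() {
	procs := []Process{
		{Label: "main", Command: []string{"/usr/local/bin/awesome-app", "server"}},
		{Label: "mgmt", Command: []string{"/usr/local/bin/awesome-app", "mgmt"}},
	}
	fmt.Println(ProcessesByLabel(procs)["main"].Command)
	fmt.Println(Commands(procs))
}
```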
 Some applications may choose to offer variables with slightly different names
 than the top-level blocks in order to allow for more concise references, such
 as abbreviating ``service`` to ``svc`` in the above examples. This should be
 done with care since it may make the relationship between the two less obvious,
 but this may be a good tradeoff for names that are accessed frequently that
 might otherwise hurt the readability of expressions they are embedded in.
 Familiarity permits brevity.
 Many applications will not make block results available for use in other
 expressions at all, in which case they are free to select whichever variable
 names make sense for what is being exposed. For example, a format may make
 environment variable values available for use in expressions, and may do so
 either as top-level variables (if no other variables are needed) or as an
 object named ``env``, which can be used as in ``env.HOME``.
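
 As a minimal sketch of the second option, the environment can be collected
 into a single map that is then exposed under the name ``env``. The
 ``EnvObject`` helper is hypothetical and uses plain Go maps; an application
 using the HCL API would convert the result into a ``cty`` value before placing
 it in the evaluation context.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// EnvObject converts "KEY=VALUE" entries, in the form returned by
// os.Environ, into a map from variable name to value. Entries without
// an "=" separator or with an empty name are skipped.
func EnvObject(environ []string) map[string]string {
	env := make(map[string]string, len(environ))
	for _, entry := range environ {
		key, value, ok := strings.Cut(entry, "=")
		if !ok || key == "" {
			continue
		}
		env[key] = value
	}
	return env
}

func main() {
	// The top-level variables map holds a single object named "env".
	vars := map[string]any{"env": EnvObject(os.Environ())}
	env := vars["env"].(map[string]string)
	fmt.Println(env["HOME"])
}
```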