From ef39087c4b85b884b13c9e4dc777a5dbe2b539e9 Mon Sep 17 00:00:00 2001 From: Martin Atkins Date: Wed, 21 Jun 2017 22:00:23 -0700 Subject: [PATCH] zcl: beginnings of the spec for the syntax-agnostic model --- zcl/spec.md | 646 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 646 insertions(+) create mode 100644 zcl/spec.md diff --git a/zcl/spec.md b/zcl/spec.md new file mode 100644 index 0000000..660b7c6 --- /dev/null +++ b/zcl/spec.md @@ -0,0 +1,646 @@ +# zcl Syntax-Agnostic Information Model + +This is the specification for the general information model (abstract types and +semantics) for zcl. zcl is a system for defining configuration languages for +applications. The zcl information model is designed to support multiple +concrete syntaxes for configuration, each with a mapping to the model defined +in this specification. + +The two primary syntaxes intended for use in conjunction with this model are +[the zcl native syntax](./zclsyntax/spec.md) and [the JSON syntax](./json/spec.md). +In principle other syntaxes are possible as long as either their language model +is sufficiently rich to express the concepts described in this specification +or the language targets a well-defined subset of the specification. + +## Structural Elements + +The primary structural element is the _body_, which is a container representing +a set of zero or more _attributes_ and a set of zero or more _blocks_. + +A _configuration file_ is the top-level object, and will usually be produced +by reading a file from disk and parsing it as a particular syntax. A +configuration file has its own _body_, representing the top-level attributes +and blocks. + +An _attribute_ is a name and value pair associated with a body. Attribute names +are unique within a given body. Attribute values are provided as _expressions_, +which are discussed in detail in a later section. + +A _block_ is a nested structure that has a _type name_, zero or more string +_labels_ (e.g. identifiers), and a nested body. + +Together the structural elements create a heirarchical data structure, with +attributes intended to represent the direct properties of a particular object +in the calling application, and blocks intended to represent child objects +of a particular object. + +## Body Content + +To support the expression of the zcl concepts in languages whose information +model is a subset of zcl's, such as JSON, a _body_ is an opaque container +whose content can only be accessed by providing information on the expected +structure of the content. + +The specification for each syntax must describe how its physical constructs +are mapped on to body content given a schema. For syntaxes that have +first-class syntax distinguishing attributes and bodies this can be relatively +straightforward, while more detailed mapping rules may be required in syntaxes +where the representation of attributes vs. blocks is ambiguous. + +### Schema-driven Processing + +Schema-driven processing is the primary way to access body content. +A _body schema_ is a description of what is expected within a particular body, +which can then be used to extract the _body content_, which then provides +access to the specific attributes and blocks requested. + +A _body schema_ consists of a list of _attribute schemata_ and +_block header schemata_: + +* An _attribute schema_ provides the name of an attribute and whether its + presence is required. + +* A _block header schema_ provides a block type name and the semantic names + assigned to each of the labels of that block type, if any. + +Within a schema, it is an error to request the same attribute name twice or +to request a block type whose name is also an attribute name. While this can +in principle be supported in some syntaxes, in other syntaxes the attribute +and block namespaces are combined and so an an attribute cannot coexist with +a block whose type name is identical to the attribute name. + +The result of applying a body schema to a body is _body content_, which +consists of an _attribute map_ and a _block sequence_: + +* The _attribute map_ is a map data structure whose keys are attribute names + and whose values are _expressions_ that represent the corresponding attribute + values. + +* The _block sequence_ is an ordered sequence of blocks, with each specifying + a block _type name_, the sequence of _labels_ specified for the block, + and the body object (not body _content_) representing the block's own body. + +After obtaining _body content_, the calling application may continue processing +by evaluating attribute expressions and/or recursively applying further +schema-driven processing to the child block bodies. + +**Note:** The _body schema_ is intentionally minimal, to reduce the set of +mapping rules that must be defined for each syntax. Higher-level utility +libraries may be provided to assist in the construction of a schema and +perform additional processing, such as automatically evaluating attribute +expressions and assigning their result values into a data structure, or +recursively applying a schema to child blocks. Such utilities are not part of +this core specification and will vary depending on the capabilities and idiom +of the implementation language. + +### _Dynamic Attributes_ Processing + +The _schema-driven_ processing model is useful when the expected structure +of a body is known a priori by the calling application. Some blocks are +instead more free-form, such as a user-provided set of arbitrary key/value +pairs. + +The alternative _dynamic attributes_ processing mode allows for this more +ad-hoc approach. Processing in this mode behaves as if a schema had been +constructed without any _block header schemata_ and with an attribute +schema for each distinct key provided within the physical representation +of the body. + +The means by which _distinct keys_ are identified is dependent on the +physical syntax; this processing mode assumes that the syntax has a way +to enumerate keys provided by the author and identify expressions that +correspond with those keys, but does not define the means by which this is +done. + +The result of _dynamic attributes_ processing is an _attribute map_ as +defined in the previous section. No _block sequence_ is produced in this +processing mode. + +### Partial Processing of Body Content + +Under _schema-driven processing_, by default the given schema is assumed +to be exhaustive, such that any attribute or block not matched by schema +elements is considered an error. This allows feedback about unsupported +attributes and blocks (such as typos) to be provided. + +An alternative is _partial processing_, where any additional elements within +the body are not considered an error. + +Under partial processing, the result is both body content as described +above _and_ a new body that represents any body elements that remain after +the schema has been processed. + +Specifically: + +* Any attribute whose name is specified in the schema is returned in body + content and elided from the new body. + +* Any block whose type is specified in the schema is returned in body content + and elided from the new body. + +* Any attribute or block _not_ meeting the above conditions is placed into + the new body, unmodified. + +The new body can then be recursively processed using any of the body +processing models. This facility allows different subsets of body content +to be processed by different parts of the calling application. + +Processing a body in two steps — first partial processing of a source body, +then exhaustive processing of the returned body — is equivalent to single-step +processing with a schema that is the union of the schemata used +across the two steps. + +## Expressions + +Attribute values are represented by _expressions_. Depending on the concrete +syntax in use, an expression may just be a literal value or it may describe +a computation in terms of literal values, variables, and functions. + +Each syntax defines its own representation of expressions. For syntaxes based +in languages that do not have any non-literal expression syntax, it is +recommended to embed the template language from +[the native syntax](./zclsyntax/spec.md) e.g. as a post-processing step on +string literals. + +### Expression Evaluation + +In order to obtain a concrete value, each expression must be _evaluated_. +Evaluation is performed in terms of an evaluation context, which +consists of the following: + +* An _evaluation mode_, which is defined below. +* A _variable scope_, which provides a set of named variables for use in + expressions. +* A _function table_, which provides a set of named functions for use in + expressions. + +The _evaluation mode_ allows for two different interpretations of an +expression: + +* In _literal-only mode_, variables and functions are not available and it + is assumed that the calling application's intent is to treat the attribute + value as a literal. + +* In _full expression mode_, variables and functions are defined and it is + assumed that the calling application wishes to provide a full expression + language for definition of the attribute value. + +The actual behavior of these two modes depends on the syntax in use. For +languages with first-class expression syntax, these two modes may be considered +equivalent, with _literal-only mode_ simply not defining any variables or +functions. For languages that embed arbitrary expressions via string templates, +_literal-only mode_ may disable such processing, allowing literal strings to +pass through without interpretation as templates. + +Since literal-only mode does not support variables and functions, it is an +error for the calling application to enable this mode and yet provide a +variable scope and/or function table. + +## Values and Value Types + +The result of expression evaluation is a _value_. Each value has a _type_, +which is dynamically determined during evaluation. The _variable scope_ in +the evaluation context is a map from variable name to value, using the same +definition of value. + +The type system for zcl values is intended to be of a level abstraction +suitable for configuration of various applications. A well-defined, +implementation-language-agnostic type system is defined to allow for +consistent processing of configuration across many implementation languages. +Concrete implementations may provide additional functionality to lower +zcl values and types to corresponding native language types, which may then +impose additional constraints on the values outside of the scope of this +specification. + +Two values are _equal_ if and only if they have identical types and their +values are equal according to the rules of their shared type. + +### Primitive Types + +The primitive types are _string_, _bool_, and _number_. + +A _string_ is a sequence of unicode characters. Two strings are equal if +NFC normalization ([UAX#15](http://unicode.org/reports/tr15/) +of each string produces two identical sequences of characters. +NFC normalization ensures that, for example, a precomposed combination of a +latin letter and a diacritic compares equal with the letter followed by +a combining diacritic. + +The _bool_ type has only two non-null values: _true_ and _false_. Two bool +values are equal if and only if they are either both true or both false. + +A _number_ is an arbitrary-precision floating point value. An implementation +_must_ make the full-precision values available to the calling application +for interpretation into any suitable number representation. An implementation +may in practice implement numbers with limited precision so long as the +following constraints are met: + +* Integers are represented with at least 256 bits. +* Non-integer numbers are represented as floating point values with a + mantissa of at least 256 bits and a signed binary exponent of at least + 16 bits. +* An error is produced if an integer value given in source cannot be + represented precisely. +* An error is produced if a non-integer value cannot be represented due to + overflow. +* A non-integer number is rounded to the nearest possible value when a + value is of too high a precision to be represented. + +The _number_ type also requires representation of both positive and negative +infinity. A "not a number" (NaN) value is _not_ provided nor used. + +Two number values are equal if they are numerically equal to the precision +associated with the number. Positive infinity and negative infinity are +equal to themselves but not to each other. Positive infinity is greater than +any other number value, and negative infinity is less than any other number +value. + +Some syntaxes may be unable to represent numeric literals of arbitrary +precision. This must be defined in the syntax specification as part of its +description of mapping numeric literals to zcl values. + +### Structural Types + +_Structural types_ are types that are constructed by combining other types. +Each distinct combination of other types is itself a distinct type. There +are two structural type _kinds_: + +* _Object types_ are constructed of a set of named attributes, each of which + has a type. Attribute names are always strings. (_Object_ attributes are a + distinct idea from _body_ attributes, though calling applications + may choose to blur the distinction by use of common naming schemes.) +* _Tuple tupes_ are constructed of a sequence of elements, each of which + has a type. + +Values of structural types are compared for equality in terms of their +attributes or elements. A structural type value is equal to another if and +only if all of the corresponding attributes or elements are equal. + +Two structural types are identical if they are of the same kind and +have attributes or elements with identical types. + +### Collection Types + +_Collection types_ are types that combine together an arbitrary number of +values of some other single type. There are three collection type _kinds_: + +* _List types_ represent ordered sequences of values of their element type. +* _Map types_ represent values of their element type accessed via string keys. +* _Set types_ represent unordered sets of distinct values of their element type. + +For each of these kinds and each distinct element type there is a distinct +collection type. For example, "list of string" is a distinct type from +"set of string", and "list of number" is a distinct type from "list of string". + +Values of collection types are compared for equality in terms of their +elements. A collection type value is equal to another if and only if both +have the same number of elements and their corresponding elements are equal. + +Two collection types are identical if they are of the same kind and have +the same element type. + +### Null values + +Each type has a null value. The null value of a type represents the absense +of a value, but with type information retained to allow for type checking. + +Null values are used primarily to represent the conditional absense of a +body attribute. In a syntax with a conditional operator, one of the result +values of that conditional may be null to indicate that the attribute should be +considered not present in that case. + +Calling applications _should_ consider an attribute with a null value as +equivalent to the value not being present at all. + +A null value of a particular type is equal to itself. + +### Unknown Values and the Dynamic Pseudo-type + +An _unknown value_ is a placeholder for a value that is not yet known. +Operations on unknown values themselves return unknown values that have a +type appropriate to the operation. For example, adding together two unknown +numbers yields an unknown number, while comparing two unknown values of any +type for equality yields an unknown bool. + +Each type has a distinct unknown value. For example, an unknown _number_ is +a distinct value from an unknown _string_. + +_The dynamic pseudo-type_ is a placeholder for a type that is not yet known. +The only values of this type are its null value and its unknown value. It is +referred to as a _pseudo-type_ because it should not be considered a type in +its own right, but rather as a placeholder for a type yet to be established. +The unknown value of the dynamic pseudo-type is referred to as _the dynamic +value_. + +Operations on values of the dynamic pseudo-type behave as if it is a value +of the expected type, optimistically assuming that once the value and type +are known they will be valid for the operation. For example, adding together +a number and the dynamic value produces an unknown number. + +Unknown values and the dynamic pseudo-type can be used as a mechanism for +partial type checking and semantic checking: by evaluating an expression with +all variables set to an unknown value, the expression can be evaluated to +produce an unknown value of a given type, or produce an error if any operation +is provably invalid with only type information. + +Unknown values and the dynamic pseudo-type must never be returned from +operations unless at least one operand is unknown or dynamic. Calling +applications are guaranteed that unless the global scope includes unknown +values, or the function table includes functions that return unknown values, +no expression will evaluate to an unknown value. The calling application is +thus in total control over the use and meaning of unknown values. + +The dynamic pseudo-type is identical only to itself. + +### Capsule Types + +A _capsule type_ is a custom type defined by the calling application. A value +of a capsule type is considered opaque to zcl, but may be accepted +by functions provided by the calling application. + +A particular capsule type is identical only to itself. The equality of two +values of the same capsule type is defined by the calling application. No +other operations are supported for values of capsule types. + +Support for capsule types in a zcl implementation is optional. Capsule types +are intended to allow calling applications to pass through values that are +not part of the standard type system. For example, an application that +deals with raw binary data may define a capsule type representing a byte +array, and provide functions that produce or operate on byte arrays. + +### Type Specifications + +In certain situations it is necessary to define expectations about the expected +type of a value. Whereas two _types_ have a commutative _identity_ relationship, +a type has a non-commutative _matches_ relationship with a _type specification_. +A type specification is, in practice, just a different interpretation of a +type such that: + +* Any type _matches_ any type that it is identical to. + +* Any type _matches_ the dynamic pseudo-type. + +For example, given a type specification "list of dynamic pseudo-type", the +concrete types "list of string" and "list of map" match, but the +type "set of string" does not. + +## Functions and Function Calls + +The evaluation context used to evaluate an expression includes a function +table, which represents an application-defined set of named functions +available for use in expressions. + +Each syntax defines whether function calls are supported and how they are +physically represented in source code, but the semantics of function calls are +defined here to ensure consistent results across syntaxes and to allow +applications to provide functions that are interoperable with all syntaxes. + +A _function_ is defined from the following elements: + +* Zero or more _positional parameters_, each with a name used for documentation, + a type specification for expected argument values, and a flag for whether + each of null values, unknown values, and values of the dynamic pseudo-type + are accepted. + +* Zero or one _variadic parameters_, with the same structure as the _positional_ + parameters, which if present collects any additional arguments provided at + the function call site. + +* A _result type definition_, which specifies the value type returned for each + valid sequence of argument values. + +* A _result value definition_, which specifies the value returned for each + valid sequence of argument values. + +A _function call_, regardless of source syntax, consists of a sequence of +argument values. The argument values are each mapped to a corresponding +parameter as follows: + +* For each of the function's positional parameters in sequence, take the next + argument. If there are no more arguments, the call is erroneous. + +* If the function has a variadic parameter, take all remaining arguments that + where not yet assigned to a positional parameter and collect them into + a sequence of variadic arguments that each correspond to the variadic + parameter. + +* If the function has _no_ variadic parameter, it is an error if any arguments + remain after taking one argument for each positional parameter. + +After mapping each argument to a parameter, semantic checking proceeds +for each argument: + +* If the argument value corresponding to a parameter does not match the + parameter's type specification, the call is erroneous. + +* If the argument value corresponding to a parameter is null and the parameter + is not specified as accepting nulls, the call is erroneous. + +* If the argument value corresponding to a parameter is the dynamic value + and the parameter is not specified as accepting values of the dynamic + pseudo-type, the call is valid but its _result type_ is forced to be the + dynamic pseudo type. + +* If neither of the above conditions holds for any argument, the call is + valid and the function's value type definition is used to determine the + call's _result type_. A function _may_ vary its result type depending on + the argument _values_ as well as the argument _types_; for example, a + function that decodes a JSON value will return a different result type + depending on the data structure described by the given JSON source code. + +If semantic checking succeeds without error, the call is _executed_: + +* For each argument, if its value is unknown and its corresponding parameter + is not specified as accepting unknowns, the _result value_ is forced to be an + unknown value of the result type. + +* If the previous condition does not apply, the function's result value + definition is used to determine the call's _result value_. + +The result of a function call expression is either an error, if one of the +erroenous conditions above applies, or the _result value_. + +## Type Conversions and Unification + +Values given in configuration may not always match the expectations of the +operations applied to them or to the calling application. In such situations, +automatic type conversion is attempted as a convenience to the user. + +Along with conversions to a _specified_ type, it is sometimes necessary to +ensure that a selection of values are all of the _same_ type, without any +constraint on which type that is. This is the process of _type unification_, +which attempts to find the most general type that all of the given types can +be converted to. + +Both type conversions and unification are defined in the syntax-agnostic +model to ensure consistency of behavior between syntaxes. + +Type conversions are broadly characterized into two categories: _safe_ and +_unsafe_. A conversion is "safe" if any distinct value of the source type +has a corresponding distinct value in the target type. A conversion is +"unsafe" if either the target type values are _not_ distinct (information +may be lost in conversion) or if some values of the source type do not have +any corresponding value in the target type. An unsafe conversion may result +in an error. + +A given type can always be converted to itself, which is a no-op. + +### Conversion of Null Values + +All null values are safely convertable to a null value of any other type, +regardless of other type-specific rules specified in the sections below. + +### Conversion to and from the Dynamic Pseudo-type + +Conversion _from_ the dynamic pseudo-type _to_ any other type always succeeds, +producing an unknown value of the target type. + +Conversion of any value _to_ the dynamic pseudo-type is a no-op. The result +is the input value, verbatim. This is the only situation where the conversion +result value is not of the the given target type. + +### Primitive Type Conversions + +Bidirectional conversions are available between the string and number types, +and between the string and boolean types. + +The bool value true corresponds to the string containing the characters "true", +while the bool value false corresponds to teh string containing the characters +"false". Conversion from bool to string is safe, while the converse is +unsafe. The strings "1" and "0" are alternative string representations +of true and false respectively. It is an error to convert a string other than +the four in this paragraph to type bool. + +A number value is converted to string by translating its integer portion +into a sequence of decimal digits (`0` through `9`), and then if it has a +non-zero fractional part, a period `.` followed by a sequence of decimal +digits representing its fractional part. No exponent portion is included. +The number is converted at its full precision. Conversion from number to +string is safe. + +A string is converted to a number value by reversing the above mapping. +No exponent portion is allowed. Conversion from string to number is unsafe. +It is an error to convert a string that does not comply with the expected +syntax to type number. + +No direct conversion is available between the bool and number types. + +### Collection and Structural Type Conversions + +Conversion from set types to list types is _safe_, as long as their +element types are safely convertable. If the element types are _unsafely_ +convertable, then the collection conversion is also unsafe. Each set element +becomes a corresponding list element, in an undefined order. Although no +particular ordering is required, implementations _should_ produce list +elements in a consistent order for a given input set, as a convenience +to calling applications. + +Conversion from list types to set types is _unsafe_, as long as their element +types are convertable. Each distinct list item becomes a distinct set item. +If two list items are equal, one of the two is lost in the conversion. + +Conversion from tuple types to list types permitted if all of the +tuple element types are convertable to the target list element type. +The safety of the conversion depends on the safety of each of the element +conversions. Each element in turn is converted to the list element type, +producing a list of identical length. + +Conversion from tuple types to set types is permitted, behaving as if the +tuple type was first converted to a list of the same element type and then +that list converted to the target set type. + +Conversion from object types to map types is permitted if all of the object +attribute types are convertable to the target map element type. The safety +of the conversion depends on the safety of each of the attribute conversions. +Each attribute in turn is converted to the map element type, and map element +keys are set to the name of each corresponding object attribute. + +Conversion from list and set types to tuple types is permitted, following +the opposite steps as the converse conversions. Such conversions are _unsafe_. +It is an error to convert a list or set to a tuple type whose number of +elements does not match the list or set length. + +Conversion from map types to object types is permitted if each map key +corresponds to an attribute in the target object type. It is an error to +convert from a map value whose set of keys does not exactly match the target +type's attributes. The conversion takes the opposite steps of the converse +conversion. + +Conversion from one object type to another is permitted as long as the +common attribute names have convertable types. Any attribute present in the +target type but not in the source type is populated with a null value of +the appropriate type. + +Conversion from one tuple type to another is permitted as long as the +tuples have the same length and the elements have convertable types. + +### Type Unification + +Type unification is an operation that takes a list of types and attempts +to find a single type to which they can all be converted. Since some +type pairs have bidirectional conversions, preference is given to _safe_ +conversions. In technical terms, all possible types are arranged into +a lattice, from which a most general supertype is selected where possible. + +The type resulting from type unification may be one of the input types, or +it may be an entirely new type produced by combination of two or more +input types. + +The following rules do not guarantee a valid result. In addition to these +rules, unification fails if any of the given types are not convertable +(per the above rules) to the selected result type. + +The following unification rules apply transitively. That is, if a rule is +defined from A to B, and one from B to C, then A can unify to C. + +Number and bool types both unify with string by preferring string. + +Two collection types of the same kind unify according to the unification +of their element types. + +List and set types unify by preferring the list type. + +Map and object types unify by preferring the object type. + +List, set and tuple types unify by preferring the tuple type. + +The dynamic pseudo-type unifies with any other type by selecting that other +type. The dynamic pseudo-type is the result type only if _all_ input types +are the dynamic pseudo-type. + +Two object types unify by constructing a new type whose attributes are +the union of those of the two input types. Any common attributes themselves +have their types unified. + +Two tuple types of the same length unify constructing a new type of the +same length whose elements are the unification of the corresponding elements +in the two input types. + +## Implementation Considerations + +Implementations of this specification are free to adopt any strategy that +produces behavior consistent with the specification. This non-normative +section describes some possible implementation strategies that are consistent +with the goals of this specification. + +### Language-agnosticism + +The language-agnosticism of this specification assumes that certain behaviors +are implemented separately for each syntax: + +* Matching of a body schema with the physical elements of a body in the + source language, to determine correspondance between physical constructs + and schema elements. + +* Implementing the _dynamic attributes_ body processing mode by either + interpreting all physical constructs as attributes or producing an error + if non-attribute constructs are present. + +* Providing an evaluation function for all possible expressions that produces + a value given an evaluation context. + +The suggested implementation strategy is to use an implementation language's +closest concept to an _abstract type_, _virtual type_ or _interface type_ +to represent both Body and Expression. Each language-specific implementation +can then provide an implementation of each of these types wrapping AST nodes +or other physical constructs from the language parser.