
slug
/1005/what-is-cue/

What is Cue?

CUE is a powerful configuration language created by Marcel van Lohuizen, who co-created the Borg Configuration Language (BCL), the language used to deploy all applications at Google. CUE is the result of years of experience writing configuration languages at Google, and seeks to improve the developer experience while avoiding some nasty pitfalls. It is a superset of JSON, with additional features to make declarative, data-driven programming as pleasant and productive as regular imperative programming.

The Need for a Configuration Language

For decades, developers, engineers, and system administrators alike have used some combination of INI, ENV, YAML, XML, and JSON (as well as custom formats such as those for Apache, Nginx, et al.) to describe configurations, resources, operations, variables, parameters, state, etc. While these formats work fine for storing data, they are merely data formats, not languages, and as such they each lack the ability to execute logic and operate on data directly.

Simple yet powerful things like if statements, for loops, comprehensions, and string interpolation, among others, are just not possible in these formats without a separate process for execution. As a result, variables or parameters must be injected, and any logic must be executed by a templating language (such as Jinja) or by a separate engine instructed by a DSL (domain-specific language). Templating languages and DSLs are often used in conjunction, and while this technically works, we end up with code bases, or even single files, that are overly verbose, that intersperse templating languages with various DSLs (sometimes multiple DSLs that feed output from one into the input of another!), and that create rigid structures without enforcing schemas (not without more effort). This makes the code challenging to reason about, difficult to maintain, brittle, and, perhaps worst of all, prone to side effects.

A configuration language such as CUE allows us to both specify data and act upon that data with any logic necessary to achieve the desired output. Furthermore, and perhaps most importantly, CUE allows us to specify not only concrete values, but also the types those values must have and any constraints on them, such as a min and max. It gives us the ability to define a schema, but unlike JSON Schema, which is merely a definition that requires some other process to enforce it, CUE can both define and enforce the schema.
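
To make this concrete, here is a minimal sketch that combines data, logic, and a constraint in one file. The field names (env, replicas, hostname, debug, servers) are hypothetical and chosen only for illustration; they are not part of CUE or dagger.

```cue
// Field names here are hypothetical, chosen only for illustration.
env: "staging"

// A constraint and a concrete value for the same field:
// replicas must be an integer from 1 through 10.
replicas: int & >=1 & <=10
replicas: 3

// String interpolation derives one value from another.
hostname: "app.\(env).example.com"

// A conditional field, present only when the condition holds.
if env == "staging" {
  debug: true
}

// A comprehension generates one list entry per input element.
servers: [for i in [1, 2, 3] {name: "app-\(i)"}]
```

Changing replicas to a value such as 11 should make CUE reject the configuration, with no separate validation step involved.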

For a deeper dive on the problems with configuration as code, check out The Configuration Complexity Curse and How Cue Wins.

Understanding Cue

We won't attempt to rewrite the CUE documentation or replicate some excellent CUE tutorials, but instead give you enough understanding of CUE to move through the dagger tutorials.

At least for our purposes here, you need to understand a few key concepts, each of which will be explained in more detail below:

  1. Cue is a superset of JSON
  2. Types are values
  3. Concrete values
  4. Constraints, definitions, and schemas
  5. Unification
  6. Default values and the nature of inheritance
  7. Packages

It would be most helpful to install CUE, but if you prefer you can also try these examples in the CUE playground.

Cue is a Superset of JSON

What you can express in JSON, you can express in CUE, but not everything in CUE can be expressed in JSON. CUE also supports a "lite" version of JSON where certain characters can be eliminated entirely. Take a look at the following code:

{
  "Bob": {
    "Name": "Bob Smith",
    "Age": 42
  }
}

Bob: Name: "Bob Smith"

In this example we have declared a top-level key Bob twice: once in a more verbose JSON style with braces, quotes, and commas, and again in a "lite" style without the extra characters. Notice also that CUE supports a shorthand: when you are targeting a single key within an object, you don't need the curly braces; you can write it as a colon-separated path. Try it in the CUE playground, and notice the output (you can choose different formats). The top-level Bob key is declared twice, but is only output once because CUE automatically unifies the two declarations. It's OK to declare the same field multiple times, so long as we provide the same value. See Default Values and the Nature of Inheritance below.
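
As a further sketch of the same idea, repeated declarations may also contribute different fields, which CUE merges into one struct. The Email field is borrowed from later examples in this document; the commented-out line shows the kind of conflict that would be rejected.

```cue
// JSON style: quoted keys, commas, and braces.
"Bob": {
  "Name": "Bob Smith",
  "Age":  42
}

// "Lite" style: the same struct, declared again.
Bob: {
  Name:  "Bob Smith"     // repeated with the same value: allowed
  Email: "bob@smith.com" // new field: merged into Bob
}

// Path shorthand for a single nested field.
// Bob: Age: 43 // would conflict with Age: 42 above and result in an error
```

The output contains a single Bob with Name, Age, and Email.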

Types are Values

In the previous example we defined the Name value as the string literal "Bob Smith" and the Age value as the integer literal 42, both of which are concrete values. Generally, the output of CUE will be used as input to some other system, be it an API, a CLI tool such as dagger, a CI/CD process, etc., and those systems will likely expect the data to conform to a schema where each field has a type and is potentially constrained by such functions as min, max, enums, regular expressions, and so on. With that in mind, we need to enforce types and constraints in order to prevent us from, for example, setting the Name value to an integer or the Age value to a string:

Bob: {
  Name: string // type as the value
  Age: int
}

Bob: {
  Name: "Bob Smith" // literals match the type
  Age: 42
}

Here we've defined the Name field as a string and the Age field as an int. Notice how string and int are not within quotes. This is what we mean when we say "types are values". This will be quite familiar to anyone who has written Go or some other strongly-typed language. With these types defined, CUE will now enforce them, so that any attempt to provide, say, an integer for the Name or a string for the Age will result in an error. It's worth noting here that the output from this example is the result of implicit unification; we'll talk about explicit unification later. Try it in the CUE playground.
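
To see the enforcement in action, here is the same example with two mismatched declarations left commented out, following the convention used elsewhere in this document. The specific bad values are only illustrative.

```cue
Bob: {
  Name: string
  Age:  int
}

Bob: {
  Name: "Bob Smith"
  Age:  42
}

// Uncommenting either line below conflicts with the types above
// and makes CUE report an error.
// Bob: Name: 7
// Bob: Age:  "forty-two"
```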

Concrete Values

CUE is ultimately used to export data, and is most useful when that data has been validated against a well-defined schema. In order for CUE to export anything, we must provide concrete values for all defined fields not marked as optional. If we leave a required field defined only as a type, without a concrete value, CUE will return an error.

Bob: {
  Name: string
  Age: int
}

Bob: {
  Name: "Bob Smith"
  //Age: is considered "incomplete" because no concrete value is defined
}

Try it in the CUE playground and see that CUE will complain of an incomplete value.

Definitions

In a real-world scenario we'd likely need to define more than one person, and ensure that each one satisfies the schema. That's where definitions come in handy.

#Person: {
  Name: string
  Email: string
  Age?: int
}

Bob: #Person & {
  Name: "Bob Smith"
  Email: "bob@smith.com"
  // Age is now optional
}

In this example we've declared that #Person is a definition, as denoted by the # sign. By doing so, we have constrained the Person object to a specific set of fields, each of a specific type. Definitions are closed by default, meaning that a #Person cannot contain any fields not specified in the definition. You will also notice that Age? now contains a ?, which marks this field as optional.

Definitions themselves are not exported to the final output. To get concrete output, we've declared that the field Bob is a #Person, and using the single & (not the same as logical AND via &&!) we unified the #Person definition with an object whose concrete values satisfy the constraints defined by that definition.

You can think of definitions as a logical set of related constraints and a schema as a larger collective of constraints, not all of which need to be definitions.

Try it in the CUE playground and experiment with making fields optional via ?, with values both defined and not defined, to see how the result changes.
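
The closed nature of definitions can be sketched as follows. The Phone field, the #OpenPerson definition, and Alice are hypothetical additions for illustration; the ellipsis (...) is CUE's syntax for leaving a struct open to additional fields.

```cue
#Person: {
  Name:  string
  Email: string
  Age?:  int
}

Bob: #Person & {
  Name:  "Bob Smith"
  Email: "bob@smith.com"
  // Phone: "555-0100" // not declared in #Person, so it would be rejected
}

// A hypothetical open variant: the ellipsis permits additional fields.
#OpenPerson: {
  Name: string
  ...
}

Alice: #OpenPerson & {
  Name:  "Alice Jones"
  Phone: "555-0100" // allowed because #OpenPerson is open
}
```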

Unification

Unification is really at the core of what makes CUE what it is. If values are the fuel, unification is the engine. It is through unification that we can both define constraints and compute concrete values. Let's take a look at some examples to see this idea in action:

import (
  "strings" // import builtin package
)           // more on packages later

#Person: {

  // further constrain to a min and max length
  Name: string & strings.MinRunes(3) & strings.MaxRunes(22)

  // we don't need string because the regex handles that
  Email: =~"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"

  // further constrain to realistic ages
  Age?: int & >0 & <140

}

Bob: #Person & {
  Name: "Bob Smith"
  Email: "bob@smith.com"
  Age: 42
}

// output in YAML:
Bob:
  Name: Bob Smith
  Email: bob@smith.com
  Age: 42

The output here is the product of unifying the #Person definition with an object that contains concrete values, each of which is in turn the product of unifying the concrete value with the types and constraints declared for that field in the definition. Try it in the CUE playground.
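
Stripped down to a single value, unification can be sketched like this. The names #Age, #AgeReordered, age, sameAge, and tooOld are hypothetical; the point is that the order of the operands does not matter and that a value outside the constraints fails to unify.

```cue
// Hypothetical definitions holding the same constraint, written in two orders.
#Age:          int & >0 & <140
#AgeReordered: <140 & >0 & int

// Unifying a constraint with a concrete value that satisfies it
// yields that value.
age:     #Age & 42
sameAge: 42 & #AgeReordered

// A value outside the bounds fails to unify.
// tooOld: #Age & 200 // would result in an error
```

Both age and sameAge evaluate to 42, and the definitions themselves do not appear in the output.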

Default Values and the Nature of Inheritance

When unifying objects, or structs as we like to call them, a form of merging happens where fields are unified recursively. But unlike, for example, merging JSON objects in JavaScript, differing values will not override one another; they result in an error. This is partially due to the commutative nature of CUE (if order doesn't matter, how would you choose one value over another?), but it is primarily because overrides too easily result in unwanted and difficult-to-debug side effects. Let's take a look at another example:

import (
  "strings" // a builtin package
)

#Person: {

  // further constrain to a min and max length
  Name: string & strings.MinRunes(3) & strings.MaxRunes(22)

  // we don't need string because the regex handles that
  Email: =~"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"

  // further constrain to realistic ages
  Age?: int & >0 & <140

  // Job is optional and a string
  Job?: string
}

#Engineer: #Person & {
  Job: "Engineer" // Job is further constrained to required and exactly this value
}


Bob: #Engineer & {
  Name: "Bob Smith"
  Email: "bob@smith.com"
  Age: 42
  // Job: "Carpenter" // would result in an error
}

// output in YAML:
Bob:
  Name: Bob Smith
  Email: bob@smith.com
  Age: 42
  Job: Engineer

While it's possible for the Bob object to inherit the Job value from #Engineer which in turn inherits constraints from #Person, it is not possible to override the Job value. Try it in the CUE playground and uncomment the Job field in Bob to see that CUE returns an error.

If we wanted the Bob object to have a different job, it would either need to be unified with a different definition, or the Job field in #Engineer would need a looser constraint with a default value. Try changing the Job field to the following:

#Engineer: #Person & {
  Job: string | *"Engineer" // can still be any string, but *defaults* to "Engineer" when no concrete value is explicitly defined
}

Bob inherits the default value but is now allowed to specify a different job.
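
Putting the default to work, a minimal sketch with a trimmed-down #Person (constraints omitted for brevity) and a hypothetical second person, Alice, looks like this:

```cue
#Person: {
  Name: string
  Job?: string
}

#Engineer: #Person & {
  Job: string | *"Engineer"
}

Bob: #Engineer & {
  Name: "Bob Smith"
  // no Job given, so the default "Engineer" is used
}

Alice: #Engineer & {
  Name: "Alice Jones"
  Job:  "Carpenter" // allowed: a concrete value replaces the default
}
```

Here Bob ends up with Job set to "Engineer" and Alice with "Carpenter". Note that this is not an override in the usual sense: "Carpenter" is simply one of the values that string | *"Engineer" permits.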

Packages

Packages in CUE allow us to write modular, reusable, and composable code. We can define schemas that are imported into various files and projects. If you've written Go, then CUE should feel quite familiar. Not only is it written in Go, much of its behavior and syntax are modeled after Go as well.

CUE has a number of builtin packages such as strings, regexp, math, and many more. These builtin packages are already available to CUE without the need to download or install anything else. Third-party packages are those that are placed within the cue.mod/pkg/ folder and start with a fully qualified domain such as alpha.dagger.io.

In the last few examples we've included an import statement to load the builtin strings package. In the tutorials and other examples you will notice that other packages from alpha.dagger.io will be imported and used.
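
As a small sketch of a file that declares its own package and uses a builtin one, consider the following. The package name people and the Upper field are hypothetical; strings.MinRunes, strings.MaxRunes, and strings.ToUpper come from the builtin strings package.

```cue
package people // hypothetical package name

import "strings"

#Person: {
  // builtin validators constrain the value
  Name: string & strings.MinRunes(3) & strings.MaxRunes(22)

  // a builtin function computes a value from another field
  Upper: strings.ToUpper(Name)
}

Bob: #Person & {
  Name: "Bob Smith"
}
```

Once Name is concrete, Upper resolves as well, so Bob's Upper field evaluates to "BOB SMITH". Note that CUE, like Go, reports an error for imports that are never used.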