Jeremy Siek: July 2012

Monday, July 30, 2012

The Semantics of a Familiar Language: Featherweight C

The example languages used to explain programming language semantics are often based on languages that are less familiar to many programmers, such as the lambda calculus or ML. This blog post is motivated by the need to have more example semantics for languages that are familiar to many programmers. One such semantics is that of Featherweight Java by Igarashi, Pierce, and Wadler. FJ does a nice job of capturing the essence of Java with a relatively simple small-step semantics.

In this blog post I begin to do something similar for a subset of the C language that I'm going to call Featherweight C, or FC for short. I should emphasize that the point of FC is educational, as an example in programming language semantics. The point is not to create a complete semantics for C. That has recently been accomplished by Chucky Ellison in his Ph.D. thesis A Formal Semantics of C with Applications. The C language, while often called "small", is actually a rather complex and subtle language once you look carefully at all the details. I suspect that the C language, in all its gory details, is not a language with which many programmers are familiar.

With Featherweight C, I'm going to try and capture a subset of C that is both familiar to many programmers and that captures the essence of C. By that I mean that it includes the following distinguishing characteristics of C:

functions with stack-based allocation and no lexical scoping,
pointers and heap allocation,
arbitrary control-flow via goto, and
character-based input and output.

Of course, one could select other characteristics of C as being the distinguishing ones. These just strike me as the important ones from a language design standpoint.

Featherweight C draws considerable inspiration from the C subset named Clight by Sandrine Blazy and Xavier Leroy. Featherweight C is a bit smaller syntactically than Clight but a bit closer to C because it includes goto. Blazy and Leroy give a big-step semantics to Clight whereas here I define an abstract machine for Featherweight C, which I think better corresponds to a programmer's mental model of an executing C program. However, there is no deep difference between big-step semantics and abstract machines, as Oliver Danvy and his students have taught us. (I'll blog about the equivalence between big-step semantics and abstract machines someday soon.)

Disclaimer: My knowledge of C is a tad rusty and I put this together in a rush, so don't consider it the gospel truth! If you see anything that is inaccurate with respect to the C standard, please post a comment!

Syntax

With the above four characteristics in mind and drawing inspiration from Clight, I've carved out the following subset of the C grammar to call Featherweight C, described in more detail below. (I'm still looking for ways to reduce the size of FC without removing any essential characteristics. If you notice something that could be removed, let me know!) $\begin{array}{lrcl} \text{types} & T & ::= & \mathtt{int} \mid T\mathtt{*} \mid T \mathtt{(}T_1,\ldots,T_n\mathtt{)} \\ \text{variables} & x,y \\ \text{labels} & \ell \\ \text{integers} & i \\ \text{unary ops} & \diamond & ::= & \mathtt{-} \mid \mathtt{!} \\ \text{binary ops} & \odot & ::= & \mathtt{+} \mid \mathtt{-} \mid \mathtt{\&\&} \mid \mathtt{||} \mid \mathtt{==} \\ \text{l-values} & l & ::= & x \mid \mathtt{*} e \\ \text{expressions} & e & ::= & l \mid i \mid \mathtt{\&}\,l \mid e \odot e \mid \diamond e \\ \text{statements} & s & ::= & l \,\mathtt{=}\, e \mathtt{;} \mid \mathtt{free(}e\mathtt{);} \mid l \,\mathtt{=}\, e \mathtt{(} \overline{e} \mathtt{)} \mathtt{;} \mid \\ & & & \mathtt{if}\,\mathtt{(} e \mathtt{)} \, \mathtt{goto}\, \ell \mathtt{;} \mid \ell \mathtt{:}\; s \mid \\ & & & \mathtt{return}\, e \mathtt{;} \\ \text{declarations} & d & ::= & T\, x(\overline{T\,x}) \, \{ \overline{T\,x\mathtt{;}} \; \overline{s} \} \mid T\,x \mathtt{;} \\ \text{programs} & p & ::= & \overline{d} \end{array}$

We have just three types in FC: integers, pointers, and function pointers. We keep pointers and function pointers separate because we're going to allow different actions on pointers and function pointers. Function pointers may be used in a function calls whereas (normal) pointers may be dereferenced.

The expressions include variables, integer literals, some arithmetic and logical operations, and pointer operations (address-of and dereference). The l-value subset of the expressions may be used on the left-hand side of an assignment. We do not include any possibly side-effecting expressions (such as assignment or function call) because then the semantics would need to consider sequence points and order of evaluation, which makes my head hurt.

Assignments and function call do appear as statements. However, we do not include void as a potential return type because void is a rather strange type. This choice forces us to make free be it's own statement (because it has a void return type). For control flow, we include an if-goto combination and labeled statements. Because FC has functions, we of course include return statements. In addition to programmer-defined functions, we include three functions from the C standard library: malloc, getchar, and putchar.

Programs in FC are sequences of declarations, which can either be function declarations or global variables. Our functions begin with a sequence of local variable declarations followed by a sequence of statements.

Operational Semantics via an Abstract Machine

As mentioned above, we'll create an abstract machine to specify the semantics of Featherweight C. An abstract machine is similar to a small-step semantics in that it takes small steps, one transition for each computation. An abstract machine differs from a small-step semantics in that instead of relying solely on textual rewriting of the program itself, an abstract machine includes auxiliary data structures that facilitate the description of what's going on during execution. (An abstract machine differs from a virtual machine in that it still works over program text, whereas a virtual machine works over instructions such as bytecodes.)

The two main auxiliary data structures that we'll need for the FC machine are a procedure call stack and memory. Our treatment of the stack and memory differs from that of a typical computer because we need to be careful to not define behavior that shouldn't be defined, such as accessing past the end of an array or jumping a pointer from one chunk of allocated memory to another using pointer arithmetic. We're trying to create a specification, whereas, in an implementation, you're free to provide behavior for things that are undefined. Also, in most computers the stack is just a region within memory. Here we'll put the control-flow aspects into a separate stack data structure while the values of stack-allocated variables will be in memory. So anything that can be thought of has having an address, including functions, is stored in memory.

Getting more specific, the memory will consist of blocks, one block for each call to malloc. Each block is a tuple of 4-byte values. (We're assuming a 32-bit machine.) An address is therefore a pair $(b,n)$ containing a block identifier $b$ and an offset $n$ into the block.

A stack is a list of frames and a frame is a 4-tuple containing the following information:

the address in which to store the return value,
the statements following the function call (think return address),
the address of the enclosing function, and
a mapping from variable names to their addresses.

The address of the enclosing function is needed because the goto statement needs to look at the entire body of the function to determine where to jump to.

Putting this altogether, the state of the machine will consist of the current frame (minus the return value address), the stack, and memory. The formal definition of the state is given below. $\begin{array}{lrcl} \text{block id's} & b \\ \text{natural numbers} & n \\ \text{addresses} & a & ::= & (b,n) \\ \text{values} & v & ::= & a \mid i \mid \mathsf{garbage} \\ \text{environments} & \rho,\gamma & ::= & \{ (x_1,a_1), \ldots \} \\ \text{stacks} & \kappa & ::= & \epsilon \mid (a,\overline{s}, a, \rho) {::} \kappa \\ \text{stored values} & h & ::= & v \mid T\, x(\overline{T\,x}) \, \{ \overline{T\,x\mathtt{;}} \; \overline{s} \} \\ \text{memory} & \mu & ::= & \{ (b_1,(h_{11},\ldots,h_{1k})), \ldots \} \\ \text{states} & \sigma & ::= & (\overline{s}, a, \rho, \kappa, \mu) \end{array}$ The garbage value is meant to represent an uninitialized value.

The operational semantics describes transitions (steps) between abstract machine states. However, because FC includes input and output, we'll optionally annotate each transition with an input or output action: $\begin{array}{lrcl} \text{actions} & \chi & ::= & \mathrm{in}(i) \mid \mathrm{out}(i) \end{array}$

The abstract machine uses three important auxiliary functions: addr computes the address associated with an l-value, val computes the value of an expression, and goto finds the statements that follow a particular label. $\begin{align*} \mathit{addr}(x,\rho,\gamma,\mu) &= \begin{cases} \rho(x) & \text{if } x \in \mathrm{dom}(\rho) \\ \gamma(x) & \text{if } x \notin \mathrm{dom}(\rho), x \in \mathrm{dom}(\gamma) \end{cases} \\ \mathit{addr}(*e,\rho,\gamma,\mu) &= \mathit{val}(e,\rho,\gamma,\mu) \end{align*}$ $\begin{align*} \mathit{val}(x,\rho,\gamma,\mu) &= \begin{cases} \mu(\rho(x)) & \text{if } x \in \mathrm{dom}(\rho) \\ \mu(\gamma(x)) & \text{if } x \notin \mathrm{dom}(\rho), x \in \mathrm{dom}(\gamma) \end{cases} \\ \mathit{val}(*e,\rho,\gamma,\mu) &= \mu(\mathit{val}(e,\rho,\gamma,\mu)) \\ \mathit{val}(\& l,\rho,\gamma,\mu) &= \mathit{val}(l,\rho,\gamma,\mu) \\ \mathit{val}(i,\rho,\gamma,\mu) &= i \\ \mathit{val}(e_1 \odot e_2,\rho,\gamma,\mu) &= \delta(\odot, \mathit{val}(e_1,\rho,\gamma,\mu), \mathit{val}(e_2,\rho,\gamma,\mu)) \\ \mathit{val}(\diamond e,\rho,\gamma,\mu) &= \delta(\diamond, \mathit{val}(e,\rho,\gamma,\mu)) \\ \end{align*}$ The $\delta$ function specifies the results for the built-in operators. I'm going to skip its definition for now, but come back later and fill it in. $\begin{align*} \mathit{goto}(\ell, (\ell: s_0) \, \overline{s}) & = s_0 \overline{s} \\ \mathit{goto}(\ell, (\ell': s_0) \, \overline{s}) & = \mathit{goto}(\ell, s_0\,\overline{s}) & \text{if } l \neq l'\\ \mathit{goto}(\ell, s_0 \, \overline{s}) & = \mathit{goto}(\ell, \overline{s}) & \text{if } s_0 \text{ is not a label statement} \end{align*}$

We also use the following auxiliary notations in the description of the transition relation. $\begin{align*} R \backslash x &= \{ (k,v) \mid (k,v) \in R \land k \neq x \} \\ R[k\mapsto v] & = (R \backslash x) \cup \{ (k,v) \} \\ \mu[a \mapsto v] & = \mu[b \mapsto \mu(b)[n \mapsto v] ] & \text{ where } a = (b,n) \\ \mu(a) & = \mu(b)_n & \text{ where } a = (b,n) \\ \left[\overline{k} \mapsto \overline{v}\right] &= \{ (k,v) \mid \exists i.\, k = \overline{k}_i, v = \overline{v}_i \} \\ \mathit{zip}(\overline{x},\overline{y}) & = (x_1,y_1),\ldots,(x_n,y_n) \end{align*}$

We are now ready to define the transition relation. It will have the form $\gamma \vdash \sigma \overset{\chi}{\longrightarrow} \sigma'$ where $\gamma$ is the global variable map.

Assignment

First we define the transition for an assignment statement. The machine computes the address of the left-hand side, the value of the right-hand side, stores the value at the address, and moves on to the next statement without emitting any actions. $\begin{align*} \gamma \vdash (l \mathtt{=} e\mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \longrightarrow (\overline{s}, a_f, \rho, \kappa, \mu[a \mapsto v]) \\ \text{where } & a = \mathit{addr}(l,\rho,\gamma,\mu), v = \mathit{val}(e,\rho,\gamma,\mu) \end{align*}$ Let's try out this rule on a couple examples, starting with a simple one.

Example 1. Suppose the state of the machine is $(\mathtt{x = 5; y = x;}, \ldots, \rho_1, \epsilon, \mu_1)$ where $\rho_1 = \{(\mathtt{x},(b_1,0))\}, \qquad \mu_1 = \{ (b_1,(\mathsf{garbage})) \}$ We compute the address of the left-hand side and the value of the right-hand side. $(b_1,0) = \mathit{addr}(\mathtt{x}, \rho_1, \ldots, \mu_1), \qquad 5 = \mathit{val}(5, \rho_1,\ldots,\mu_1)$ so we transition to the following state $(\mathtt{y = x;}, \ldots, \rho_1, \epsilon, \mu_2) \text{ where } \mu_2 = \{ (b_1,(5)) \}$

Example 2. Now for a more complex example involving pointers. Suppose $p$ is a pointer to pointer to a garbage integer and $q$ is a pointer to 42. Suppose the state of the machine is $(\mathtt{*\!*p = *q; x = putchar(*\!*p);}, \ldots, \rho_2, \epsilon, \mu_3)$ where $\begin{align*} \rho_2 &= \{(\mathtt{p},(b_p,0)), (\mathtt{q}, (b_q,0)), (\mathtt{x}, (b_x,0))\}\\ \mu_3&= \{ (b_p,((b_2,0))), (b_2,(b_3,0)), (b_3,(\mathsf{garbage})), \\ & \quad\; (b_q,((b_{42},0))), (b_{42},(42)), \\ & \quad\; (b_x,(\mathsf{garbage})) \} \end{align*}$ We compute the address of the left-hand side: $\begin{align*} & \mathit{addr}(\mathtt{*\!*p}, \rho_2, \ldots, \mu_3) = \mathit{val}(\mathtt{*p}, \rho_2, \ldots, \mu_3) \\ & = \mu_3(\mathit{val}(\mathtt{p}, \rho_2, \ldots, \mu_3)) = \mu_3(\mu_3((b_p,0))) = (b_3,0) \end{align*}$ and the value of the right-hand side: $\mathit{val}(\mathtt{*q}, \rho_2,\ldots,\mu_3) = \mu_3(\mathit{val}(\mathtt{q}, \rho_2,\ldots,\mu_3)) = \mu_3(\mu_3( (b_q,0) )) = 42$ So the machine transitions to the following state: $(\mathtt{x = putchar(*\!*p);}, \ldots, \rho_2, \epsilon, \mu_4)$ where $\mu_4$ is like $\mu_3$ except at address $(b_3,0)$ : $\mu_4= \{ \ldots (b_3,(42)) \ldots \}$

Input and Output

Next let's consider input and output. A call to getchar retrieves a character (represented by an integer) from standard input. We model this by annotating the transition with the input character i. The character is store at the address of the left-hand side. $\begin{align*} \gamma \vdash (l \mathtt{=} \mathtt{getchar}() \mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \overset{\mathrm{in}(i)}{\longrightarrow} (\overline{s}, a_f, \rho, \kappa, \mu[a \mapsto i]) \\ \text{where } & a = \mathit{addr}(l,\rho,\gamma,\mu) \end{align*}$ A call to putchar is similar, except that we annotate the transition with the output character given by the argument to putchar, which we evaluate using the val function. $\begin{align*} \gamma \vdash (l \mathtt{=} \mathtt{putchar}(e) \mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \overset{\mathrm{out}(i)}{\longrightarrow} (\overline{s}, a_f, \rho, \kappa, \mu[a \mapsto i]) \\ \text{where } & a = \mathit{addr}(l,\rho,\gamma,\mu), i = \mathit{val}(e,\rho,\gamma,\mu) \end{align*}$

Labels and Goto

The label statement is the most trivial of our statements. We simply move on to execute the statement that was labeled. $\begin{align*} \gamma \vdash ((\ell{:}\, s)\,\overline{s}, a_f, \rho, \kappa,\mu) & \longrightarrow (s\,\overline{s}, a_f, \rho, \kappa, \mu) \end{align*}$ For the conditional goto, we have two cases to consider, depending on whether the conditional expression evaluates to zero or not. First, the non-zero case in which we use the auxiliary goto function to find the statements that follow the given label. $\begin{align*} \gamma \vdash (\mathtt{if}\,(e)\,\mathtt{goto}\,\ell\mathtt{;}\,\overline{s}, a_f, \rho, \kappa,\mu) & \longrightarrow (\mathit{goto}(\ell, \overline{s}'), a_f, \rho, \kappa, \mu) \\ \text{where } & \mathit{val}(e,\rho,\gamma,\mu) \neq 0 \\ & \mu(a_f) = T\;x_f(\overline{T\,x}) \{ \overline{T'\,y;} \;\; \overline{s}' \} \end{align*}$ In the case of zero, the machine simply falls through to the next statement. $\begin{align*} \gamma \vdash (\mathtt{if}\,(e)\,\mathtt{goto}\,\ell\mathtt{;}\,\overline{s}, a_f, \rho, \kappa,\mu) & \longrightarrow (\overline{s}, a_f, \rho, \kappa, \mu) \\ \text{where } & \mathit{val}(e,\rho,\gamma,\mu) = 0 \end{align*}$

Example 3. Consider the following sequence of statements that implement a while loop. The comments assigning numbers to each statement:

  x = 1;                      // 1
start:                        // 2
  if (x == 0) goto end;       // 3
  x = x - 1;                  // 4
  if (1) goto start;          // 5
end:                          // 6
  x = putchar(x + 65);        // 7

The abstract machine executes these statements in the following order: $1, 2, 3, 4, 5, 3, 7$ and outputs the letter A (ASCII code 65).

To go into more detail regarding goto, consider the transition from statement 5 to 3. Let $s_i$ refer to the ith statement in the above program. The conditional expression of $s_5$ does not evaluate to 0, so the machine uses the goto auxiliary function to determine the target of the jump, looking for statement labeled start. The result is the sequence of statements following the start label, which is $s_3,s_4,s_5,s_6,s_7$ . So the machine transitions to a new state and begins executing at $s_3$ .

Malloc and Free

A call to malloc allocates a block of n bytes in memory and returns the address of the block, which is assigned into the left-hand side. Because our values are 4 bytes, the allocated block will contain a tuple of n/4 uninitialized values. We use the notation $\mathsf{garbage}^i$ for a tuple of $i$ garbage values. $\begin{align*} \gamma \vdash (l \mathtt{=} \mathtt{malloc}(e) \mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \longrightarrow (\overline{s}, a_f, \rho, \kappa, \mu') \\ \text{where } & a = \mathit{addr}(l,\rho,\gamma,\mu), n = \mathit{val}(e,\rho,\gamma,\mu) \\ & b \notin \mathsf{dom}(\mu) \\ & \mu' = \mu[b\mapsto\mathsf{garbage}^{n/4} ][a \mapsto (b,0)] \end{align*}$ To execute a call to free, we evaluate the argument to an address and then remove that address and its associated block from memory. $\begin{align*} \gamma \vdash (\mathtt{free}(e) \mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \longrightarrow (\overline{s}, a_f, \rho, \kappa, \mu \backslash a_1) \\ \text{where } & a = \mathit{val}(e,\rho,\gamma,\mu) \end{align*}$

Function Call and Return

The transition for a function call is the most involved of the transitions, so take a deep breadth and make sure you have some coffee at hand! Computing the address of the left-hand side of the assignment gives us the address in which the return value will be stored. We'll push it along with the current frame onto the stack. We'll compute the value of the expression in the function position to get the address $a_{f'}$ of the function we'll be calling, which we retrieve from memory, and we compute the values of the arguments $\overline{v}$ . Next, we allocate single-value blocks $\overline{b}$ in memory, one for each parameter and local variable, and record the mapping of variable names to these addresses in the environment $\rho'$ . We initialize the parameter's locations with the argument values and we leave the local parameters uninitialized. $\begin{align*} \gamma \vdash (l \mathtt{=} e_f(\overline{e}) \mathtt{;} \; \overline{s}, a_f, \rho, \kappa, \mu) & \longrightarrow (\overline{s}', a_{f'}, \rho', (a_r,\overline{s},a_f,\rho) {::} \kappa, \mu') \\ \text{where } & a_r = \mathit{addr}(l,\rho,\gamma,\mu) \\ & a_{f'} = \mathit{val}(e_f,\rho,\gamma,\mu), \overline{v} = \overline{\mathit{val}}(\overline{e},\rho,\gamma,\mu) \\ & \mu(a_{f'}) = T\;x_f(\overline{T\,x}) \{ \overline{T'\,y;} \;\; \overline{s}' \} \\ & \overline{b} \cap \mathrm{dom}(\mu) = \emptyset \\ & \rho' = [\overline{x},\overline{y} \mapsto \mathit{zip}(\overline{b},\overline{0})] \\ & \mu' = \mu[\rho'(\overline{x}) \mapsto \overline{v}] \end{align*}$ Thankfully, returning from a function is much easier. We simply pop the top frame from the stack, store the return value at the appropriate address, and deallocate the storage for the parameters and local variables. $\begin{align*} \gamma \vdash (\mathtt{return}\,e\mathtt{;}\,\overline{s}, \rho, a_f, (a_r,\overline{s}',a_{f'},\rho'){::}\kappa,\mu) & \longrightarrow (\overline{s}', a_{f'}, \rho', \kappa, \mu') \end{align*}$ $\begin{align*} \text{where } \mu' &= (\mu \backslash (\overline{x},\overline{y}))[a_r \mapsto \mathit{val}(e,\rho,\gamma,\mu)] \\ \mu(a_f) & = T\;x_f(\overline{T\,x}) \{ \overline{T'\,y;} \;\; \ldots \} \\ \end{align*}$

Multiple Transitions

We define the multi-transition relation below. The main idea is that we annotate a multi-transition by collecting up all the actions into a sequence of actions. $\frac{ }{ \gamma \vdash \sigma \overset{\epsilon}{\longrightarrow^{*}} \sigma } \qquad \frac{\gamma \vdash \sigma_1 \overset{\chi}{\longrightarrow} \sigma_2 \quad \gamma \vdash \sigma_2 \overset{\overline{\chi}}{\longrightarrow^{*}} \sigma_3 }{ \gamma \vdash \sigma_1 \overset{\chi,\overline{\chi}}{\longrightarrow^{*}} \sigma_3 }$

The Semantics of a Whole Program

To wrap up the semantics for FC, we give the meaning for a whole program in terms of the abstract machine transitions. The meaning is the return value from the main function together with the sequence of input and output actions that occurred during program execution. Nothing else is considered observable. The meaning is expressed as a relation because there can be many different action sequences depending on the input. $\begin{align*} \mathit{Eval} =& \{ (p, i, \overline{\chi}) \mid \gamma \vdash \sigma_0 \overset{\overline{\chi}}{\longrightarrow^{*}} (\epsilon, a_{init}, \rho_0, \epsilon, \mu'), i = \mathit{val}(\mathtt{ret},\rho_0,\gamma,\mu') \} \\ \text{where } & \sigma_0 = (\mathtt{ret=main();}, a_{init}, \rho_0, \epsilon,\mu_0), \rho_0 = \{(\mathtt{ret},a_{ret})\} \end{align*}$ The initial memory $\mu_0$ contains storage for the global variables and functions in program $p$ , a pointer to each function, and a block $a_{ret}$ for the return value from main. The global environment $\gamma$ maps the names of the global variables and functions to the addresses of the global variables and the addresses of the function pointers, respectively.

Type System

The type system for the expressions of FC is defined as follows. $\begin{gather*} \frac{(x,T) \in \Gamma}{\Gamma \vdash x : T} \qquad \frac{\Gamma \vdash l : T}{\Gamma \vdash \mathtt{*}l : T\mathtt{*}} \qquad \frac{}{\Gamma \vdash n : \mathtt{int}} \\[2ex] \frac{\Gamma \vdash e_1 : T_1 \quad \Gamma \vdash e_2 : T_2 \quad \Delta(\odot,(T_1,T_2),T_3)}{\Gamma \vdash e_1 \odot e_2 : T_3} \qquad \frac{\Gamma \vdash e : T_1 \quad \Delta(\diamond, (T_1), T_2)}{\Gamma \vdash \diamond e : T_2} \end{gather*}$ Here's an exerpt from the definition of $\Delta$ . $\begin{align*} \Delta(\mathtt{+},\mathtt{int},\mathtt{int}) &= \mathtt{int} \\ \Delta(\mathtt{+},T\mathtt{*},\mathtt{int}) &= T\mathtt{*} & \text{for any } T \\ \Delta(\mathtt{+},\mathtt{int},T\mathtt{*}) &= T\mathtt{*} & \text{for any } T \\ \Delta(\mathtt{==},\mathtt{int},\mathtt{int}) &= \mathtt{int} \\ \Delta(\mathtt{==},T\mathtt{*},T\mathtt{*}) &= \mathtt{int} \\ & \vdots \end{align*}$ The type system for the statements of FC is defined as follows. $\begin{gather*} \frac{\Gamma \vdash l : T \quad \Gamma \vdash e : T}{\Gamma;T_r \vdash l = e} \qquad \frac{ \Gamma; T_r \vdash e : T\mathtt{*} }{ \Gamma; T_r \vdash \mathtt{free(} e \mathtt{);} } \\[2ex] \frac{\Gamma \vdash l : T \quad \Gamma \vdash e_f : T(\overline{T}) \quad \Gamma \vdash \overline{e} : \overline{T}} {\Gamma;T_r \vdash l = e_f(\overline{e});} \\[2ex] \frac{\Gamma \vdash e : \mathtt{int}} {\Gamma;T_r \vdash \mathtt{if}\,(e)\,\mathtt{goto}\,\ell\mathtt{;}} \qquad \frac{\Gamma \vdash e : T_r} {\Gamma; T_r \vdash \mathtt{return}\,e\mathtt{;}} \end{gather*}$ The following is the type rule for function definitions. $\begin{gather*} \frac{ \Gamma[x_f \mapsto T(\overline{T})][\overline{x}\mapsto\overline{T}][\overline{y}\mapsto\overline{T}']; T \vdash \overline{s} }{ \Gamma \vdash T\;x_f(\overline{T\,x}) \{ \overline{T'\,y;} \;\; \overline{s} \} } \end{gather*}$ To type check the whole program, we first collect the types of all the top-level declarations into the global environment $\Gamma_0$ . Then we type check every top-level function using $\Gamma_0$ as the environment.

Conclusion

This post defined a subset of C, Featherweight C, that I hope is familiar to most programmers, and provides an example of using an abstract machine to define operational semantics. There's a few things left to be done: fill in the $\delta$ function, and add more concrete examples, which I'll do the next time I can grab an hour or two. If you see any places in which this semantics is inaccurate, please let me know!

Thursday, July 26, 2012

Crash Course on Notation in Programming Language Theory

This blog post is meant to help my friends get started in reading my other blog posts, that is, this post is a crash course on the notation used in programming language theory ("PL theory" for short). For a much more thorough introduction, I recommend Types and Programming Languages by Benjamin C. Pierce and Semantic Engineering with PLT Redex by Felleisen, Findler, and Flatt. I'll assume the reader is an experienced programmer but not an experienced mathematician or PL theorist. I'll start with the most basic definitions and try to build up quickly.

Sets, Tuples, Relations, and Definition by Rules

I suspect many readers will already be familiar with sets, tuples, and relations and thus may want to skip the next few subsections, but if you are not familiar with inductive definitions, then please make sure to read the subsection below titled Definition by Rules.

Sets

The main building block that we use in PL theory is the set, a collection of objects (also called elements), such as the set containing the first three natural numbers: $\{ 0, 1, 2 \}$ The only thing that matters is whether an object is in the set or not; it doesn't make sense to ask if there are duplicates of an object or in what order the objects appear in the set. For example, the set $\{ 0,2,1 \}$ is the same set as the set listed above. The notation $\in$ means "in", so $1 \in \{ 0,1,2\}$ is true and $3 \in \{ 0,1,2\}$ is false. Sets may have an infinite number of elements, such as the set of all natural numbers (non-negative integers), written $\mathbb{N}$ .

Tuples

Another building block is the tuple, which is an ordered collection of objects. So $(0,1,2)$ is a tuple of three elements and it is different from the tuple $(0,2,1)$ . The subscript notation $t_i$ retrieves the ith element of tuple $t$ . For example, if $t=(0,2,1)$ , then $t_1=0,t_2=2,t_3=1$ . Tuples contain only a finite number of elements and usually less than a handful. Sometimes angle brackets are used for tuples instead of parentheses, such as $\langle 0,2,1 \rangle$ .

Relations

Putting tuples and sets together we get relations. That is, a relation is a set of tuples. $\{ (0,1), (1,2), (2,3), (3,4), \ldots \}$ We often use relations to represent a mapping from input to output. For example, the above relation can be thought of as mapping a natural number to its successor, that is, to the next greater natural number. The above definition is rather imprecise because of the elipses ( $\ldots$ ). Fortunately, there are more precise notations for describing infinite sets and relations.

Definition by Rules

The main way that we define infinite sets in PL theory is by giving a list of rules for what is in the set. Let's use the name $R$ for the above relation. Then the following two rules give a precise definition of $R$ . Notice that the second rule is recursive in that it refers to itself. That's ok and quite common.

$(0,1) \in R$ .
For any natural numbers $n$ and $m$ , if $(n,m) \in R$ , then $(n+1,m+1) \in R$ .

When we use rules to define a set, we implicitly mean that an element is not in the set if there is no way to use the given rules to justify that the element should be in the set. So $(0,0)$ is not in $R$ because there is no way to use the above two rules to conclude that $(0,0)$ is in $R$ .
Some sets of rules are nonsensical and do not define a set. For example, rules should not be contradictory as in the following.

$(0,1) \in R$ .
$(0,1) \notin R$ .

A textbook on set theory will give the restrictions on what constitutes a "good" bunch of rules, but we won't go into that here, other than to point out that you need at least one non-recursive rule and that logical negation should be avoided.
A common notation for rules such as the above uses a horizontal line in place of "if" and "then". For example, an equivalent definition of $R$ is given by the following. $(1)\,\frac{}{(0,1) \in R} \qquad (2)\,\frac{(n,m) \in R}{(n+1,m+1) \in R}$ We have dropped the "For any natural numbers $n$ and $m$ " part of rule 2. The convention is that variables such as $n$ and $m$ that appear in a rule can be replaced by any object of the correct type, in this case, a natural number. Often the "correct type" is something you can deduce from the context of the discussion, in this case, the natural numbers.
Suppose that I claim that a particular element is in $R$ , say $(2,3) \in R$ . You might respond by saying that you don't believe me. To convince you, I need to show you how the rules justify that $(2,3) \in R$ , that is, I need to show you a derivation. A derivation is a chaining together of rules, replacing variables such as $n$ and $m$ with particular objects and replacing premises such as $(n,m) \in R$ with sub-derivations. $(2) \frac{ (2)\begin{array}{c} (1)\begin{array}{c} \\\hline (0,1) \in R\end{array} \\\hline (1,2) \in R \end{array} }{ (2,3) \in R }$ I've labelled each step in the derivation with the rule number.
The fancy name for what I'm calling Definition by Rules is inductive definition. (My daughter loves the Fancy Nancy series of books.)

Language Syntax and Grammars

It turns out that using rules to defines sets, as we did above, is how we define the syntax of a programming language. Suppose we'd like to define a simple language of integer arithmetic, call it $\mathit{Arith}$ , including expressions such as 1 + 3 and -(5 + 2). Recall that $\mathbb{Z}$ is the set of all integers. Then here's a bunch of rules that we might use to define $\mathit{Arith}$ :

For any $i\in \mathbb{Z}$ , $i \in \mathit{Arith}$ .
For any $e$ , if $e \in \mathit{Arith}$ , then - $e \in \mathit{Arith}$ .
For any $e_1$ and $e_2$ , if $e_1 \in \mathit{Arith}$ and $e_2 \in \mathit{Arith}$ , then $e_1\mathtt{+}e_2 \in \mathit{Arith}$ .
For any $e$ , if $e \in \mathit{Arith}$ , then $\mathtt{(}e\mathtt{)}\in \mathit{Arith}$ .

Backus-Naur Form (BNF) is another common notation for writing rules that define the syntax of a language, but the meaning is the same. (There are several variations on BNF; I forget which one I'm using here.) The bunch of rules is referred to as a grammar.

Arith ::= integer
Arith ::= "-" Arith
Arith ::= Arith "+" Arith
Arith ::= "(" Arith ")"

A vertical bar (meaning "or") is often used to make such syntax definitions more concise, as follows.

Arith ::= integer | "-" Arith | Arith "+" Arith | "(" Arith ")"

In PL theory, we use a peculiar variation on BNF that replaces the name of the language being defined, in this case $\mathit{Arith}$ , with the variable that is used to range over elements of $\mathit{Arith}$ . So suppose we are using the variable $i$ as a placeholder for any integer and $e$ as a placeholder for elements of $\mathit{Arith}$ . Then we would write the above grammar as $e ::= i \mid - e \mid e + e$ Note that I've dropped the parentheses. It's generally understood that parentheses are allowed in any language.
The notion of derivation coincides with that of a parse tree, they both demonstrate why a particular element is in a set.

Operational Semantics

A language is brought to life by defining what it means to run a program in the language, that is, the operational semantics of a language. In the case of $\mathit{Arith}$ , we just need to specify the integer output of each program. As discussed above, relations can be used to map inputs to outputs, and indeed, we typically use relations for this purpose in PL theory. There's several different styles of relations, the first we'll discuss is a big-step style of semantics that maps a program directly to its output.

Big-step Semantics

Let's define a relation $\mathit{Eval}$ that maps elements of $\mathit{Arith}$ to integers. For example, we'd like to have $(\mathtt{-(2 + -5)},3) \in \mathit{Eval}$ . This relation will be infinite (because there are an infinite number of programs in $\mathit{Arith}$ ), so again we'll use a bunch of rules to define $\mathit{Eval}$ . But before we start, it's common to introduce some shorthand: $e \Downarrow i$ means $(e,i) \in \mathit{Eval}$ . Below we state the rules that define $\mathit{Eval}$ using the horizontal notation. To make sure we don't leave out any programs, we create one rule for each syntactic rule of $\mathit{Arith}$ (there are three). We say that the rules are syntax-directed when there is one rule for each syntactic rule in the language. $\frac{}{i \Downarrow i} \qquad \frac{e \Downarrow i \quad j = -i}{\mathtt{-}e \Downarrow j} \qquad \frac{e_1 \Downarrow i \quad e_2 \Downarrow j \quad k = i + j} {e_1\mathtt{+}e_2 \Downarrow k}$ It may seem a little odd that I'm defining - in terms of $-$ , and similarly for +. Isn't that circular? No, the $-$ and $+$ are the usual arithmetic operators for integers that everyone learned in grade school. In this way, the $\mathit{Arith}$ language is rather odd in not using 32 or 64-bit arithmetic. An implementor of $\mathit{Arith}$ would have to use a big-integer package to properly handle the arithmetic.

Small-step Semantics

The second, and perhaps more common, style of operational semantics is small-step semantics. In this style, the relation doesn't map a program to its output, but instead it maps a program to a slightly simplified program in which one subexpression has been computed. This style of semantics can be thought of as textual rewriting. To give an example of this style, let's define a relation named $\mathit{Step}$ . We'll want this relation to have the following elements, among many others: $\begin{align*} (\mathtt{-(2 + -5)}, \mathtt{-(-3)}) & \in \mathit{Step} \\ (\mathtt{-(-3)}, \mathtt{3}) & \in \mathit{Step} \\ \end{align*}$ Again, we'll introduce shorthand: $e_1 \longrightarrow e_2$ means $(e_1,e_2) \in \mathit{Step}$ . Also, we'll chain together steps, so $e_1 \longrightarrow e_2 \longrightarrow e_3$ means $e_1 \longrightarrow e_2$ and $e_2 \longrightarrow e_3$ . The term reduce is a synonym for step. The above example of two steps can now be written as $\mathtt{-(2 + -5)} \;\longrightarrow\; \mathtt{-(-3)} \;\longrightarrow\; \mathtt{3}$ OK, on to the rules that define the $\mathit{Step}$ relation. There are five rules, which we explain below. $\begin{gather*} (1)\frac{j = -i}{\mathtt{-}i \longrightarrow j} \qquad (2)\frac{k = i + j}{i \mathtt{+} j \longrightarrow k} \\[3ex] (3)\frac{e \longrightarrow e'}{\mathtt{-}e \longrightarrow \mathtt{-}e'} \qquad (4)\frac{e_1 \longrightarrow e'_1}{ e_1 \mathtt{+} e_2 \longrightarrow e'_1 \mathtt{+} e_2} \qquad (5)\frac{e_2 \longrightarrow e'_2}{ i \mathtt{+} e_2 \longrightarrow i \mathtt{+} e'_2} \end{gather*}$ Rules (1) and (2) are the most interesting; they perform the arithmetic. We call them computational reduction rules. Rules (3-5) allow us to reach inside of sub-expressions to perform computation. They are often called congruence rules for reasons we won't go into. The use of the variable $i$ in rule (5) means that reduction proceeds from left to right. In particular, we're not allowed to reduce the right-hand expression of a plus until we've already reduced the left-hand side to an integer.
Aside: This left-to-right ordering is a choice that I made as the designer of this example language. I could have not specified an ordering, letting it be non-deterministic. However, this example language doesn't have side-effects, so the ordering doesn't matter! However, most language do have side-effects and they do specify an ordering (but not all!), so I thought to include an example of how ordering is typically specified.
Time for an example: let's see the derivation of the step $\mathtt{-(2 + -5)}\;\longrightarrow\; \mathtt{-(-3)}$ . $(3)\begin{array}{c} (2)\begin{array}{c}\\\hline \mathtt{2 + -5} \;\longrightarrow\; \mathtt{-3} \end{array} \\\hline \mathtt{-(2 + -5)}\;\longrightarrow\; \mathtt{-(-3)} \end{array}$
We've defined a single step of computation, the $\mathit{Step}$ relation, but we're not quite done. We still need to specify what it means to run a program to completion. We'll do this by defining a relation $\mathit{Eval}'$ in terms of the $\mathit{Step}$ relation as follows. In plain Engilish, the $\mathit{Eval}'$ relation will contain any pair $(e,i)$ if expression $e$ reduces to integer $i$ in zero or more steps. Some of the notation here is new and is explained below. $\mathit{Eval'} = \{ (e,i) \mid e \longrightarrow^{*} i \}$ The notation $\{ \ldots \mid \ldots \}$ is the set builder or set comprehension notation for defining a set. The stuff to the left of the vertical bar is a template for a typical element of the set and the stuff to the right of the vertical bar places restrictions on the elements in the set. The notation $\longrightarrow^{*}$ means zero or more steps. I like to define this multi-step relation with the following rules: $\frac{}{e \longrightarrow^{*} e} \qquad \frac{e_1 \longrightarrow e_2 \quad e_2 \longrightarrow^{*} e_3} {e_1 \longrightarrow^{*} e_3}$ (My brain is good at reasoning about Lisp-style lists, so I think of the first rule as nil and the second rule as cons.)

Type Systems (with the Lambda Calculus as an example)

Many programming languages are statically typed, that is, the compiler performs some sanity checking before proceeding with the actual work of compiling. The checking usually involves make sure that objects are only used as intended, for example, not trying to treat an integer as if it were a function. The way a programming language designer (PL theorist) specifies what kind of sanity checking should be performed is by defining a type system for the language. The $\mathit{Arith}$ language is so simple that there is no interesting type checking to be performed. Let's consider a slightly larger language that also happens to be used over and over again in PL theory, the lambda calculus (technically, the simply-typed lambda calculus). The lambda calculus just consists of first-class anonymous functions. Here we'll extend the lambda calculus to also include our arithmetic expressions. So now our example language is defined by the following grammar. $e ::= i \mid - e \mid e + e \mid x \mid e\,e \mid \lambda x{:}T.e$ The $x$ variable ranges over parameter names, such as foo and g. Two expressions right next to each other denote function application (i.e., function call). So if you're familiar with the C language, read $e_1 \, e_2$ as $e_1(e_2)$ . In the lambda calculus, functions only take one parameter, so function calls only require one argument. The syntax $\lambda x{:}T.e$ creates a function with one parameter named $x$ of type $T$ (types will be defined shortly) and whose body is the expression $e$ . (A common point of confusion is to think that $x$ is the name of the function. It instead is the parameter name. The function is anonymous, i.e. it doesn't have a name.) The return value of the function will be whatever the expression $e$ evaluates to.
Now let's consider what kind of objects will be alive when we run the program: there's integers and functions. We'll create a set of types to describe the kinds of objects, using $T$ to range over the set of types. $T ::= \mathtt{Int} \mid T \to T$ In a function type $T_1 \to T_2$ , the $T_1$ is the type of the parameter and $T_2$ is the return type.
The job of a type system is to predict which type of value will be produced by an expression. For example, the expression -(5 + 2) should have the type Int because the result of -(5 + 2) is -3, which is an integer. As with the syntax and operational semantics of a language, PL theorists use relations and rules to define a type system. We'll define a relation named $\mathit{WellTyped}$ that, as a first approximation, maps expressions to types, so for example, we'll have $(\mathtt{-(5 + 2)},\mathtt{Int}) \in \mathit{WellTyped}$ .
However, because the lambda calculus includes variables, we'll need something analogous to a symbol table, a relation called a type environment, to keep track of which variables have which types. The Greek letter $\Gamma$ (gamma) is traditionally used for this purpose. We'll need to be able to create new type environments from old ones, potentially overshadowing variable definitions from outer scopes. To set up the mathematical machinery for that, we define $\Gamma \backslash x$ to be the relation just like $\Gamma$ except that any tuple starting with $x$ is removed. (The way the type system will be defined, there may be 0 or 1 tuple that starts with $x$ , making the type environment a special kind of relation called a partial function.) We'll write $\Gamma,x{:}T$ for the operation of extending a type environment with variable x, possibly overriding a previous definition, and define it as follows: $\Gamma,x{:}T = (\Gamma \backslash x) \cup \{ (x, T) \}$ Suppose we have $\Gamma_1 = \{ (\mathtt{foo},\mathtt{Int}), (\mathtt{g},\mathtt{Int} \to \mathtt{Int}) \}$ Then $\begin{align*} \Gamma_1,\mathtt{g}{:}\mathtt{Int} &= \{ (\mathtt{foo},\mathtt{Int}), (\mathtt{g},\mathtt{Int}) \} \\ \Gamma_1,\mathtt{f}{:}(\mathtt{Int}{\to}\mathtt{Int}){\to}\mathtt{Int} &= \{ (\mathtt{foo},\mathtt{Int}), (\mathtt{g},\mathtt{Int} {\to} \mathtt{Int}), (\mathtt{f}, (\mathtt{Int}{\to}\mathtt{Int}){\to}\mathtt{Int}) \} \end{align*}$
One way in which type environments are different from the global symbol table in a compiler is that there isn't just one type environment, there will be lots of them, one for each scope. Further, we won't ever update a type environment in place, we'll keep creating new ones that differ a little bit from the old ones. From a programming perspective, the mathematical metalanguage we're using here is pure functional, that is, it doesn't use state or side effects. The reader might worry that this might lead to inefficiency, but remember, we're not writing a program here, we're writing a specification! Clarity is what matters most in this setting, and staying pure helps to make things clear.
Getting back to the $\mathit{WellTyped}$ relation, instead of containing 2-tuples (pairs) it will contain 3-tuples (triples) of the form $(\Gamma,e,T)$ , so we'll be assigning types to expressions in the context of a type environment. As yet more shorthand (PL theorists love shorthand!), we'll write $\Gamma \vdash e : T$ instead of $(\Gamma,e,T) \in \mathit{WellTyped}$ . We're now ready to write down the rules that define $\mathit{WellTyped}$ . $\begin{gather*} \frac{}{\Gamma \vdash i : \mathtt{Int}} \qquad \frac{\Gamma \vdash e : \mathtt{Int}} {\Gamma \vdash \mathtt{-}e : \mathtt{Int}} \qquad \frac{\Gamma \vdash e_1 : \mathtt{Int} \quad \Gamma \vdash e_2 : \mathtt{Int}} {\Gamma \vdash e_1\mathtt{+}e_2 : \mathtt{Int}} \\[3ex] \frac{(x, T) \in \Gamma}{\Gamma \vdash x : T} \qquad \frac{\Gamma,x{:}T_1 \vdash e : T_2 }{\Gamma \vdash (\lambda x{:}T_1. e) : (T_1 \to T_2)} \qquad \frac{\Gamma \vdash e_1 : T \to T' \quad \Gamma \vdash e_2 : T} {\Gamma \vdash e_1 \, e_2 : T'} \end{gather*}$ To sum up the above rules, the arithmetic operators work on integers, variables get their types from the environment, lambdas are given function types based on their parameter type and their deduced return type, the body of a lambda is checked using the environment from the point of creation (this is lexical scoping) extended with the lambda's parameter, and function application is sane so long as the argument's type is the same as the parameter type.

Conclusion

Thus ends this crash course on the notation used in programming language theory. This blog post only scratches the surface, but much of the additional notation that you'll need is variations on what's covered here. Happy reading! And P.S., feel free to ask questions in the form of comments to this blog.

Wednesday, July 25, 2012

Big-step, diverging or stuck?

In a naive big-step semantics, one can't tell the difference between the following two programs, even though their behavior is quite different according to single-step reduction rules. $\mathsf{fix}\,(\lambda f. \lambda y. f \, y)\,3$ $42 \;\, 3$ The first program diverges (runs forever) whereas the second program is stuck, that is, it results in a run-time type error because you can't perform function application with a number in the function position. However, a naive big-step semantics is undefined for both programs. That is, you can't build a derivation of the evaluation relation for either program.

The naive big-step semantics that I'm referring to is the following. The $\rho$ maps variables to values (parameters to their arguments), and is called the environment. $\rho \vdash x \Downarrow \rho(x) \qquad \rho \vdash \lambda x. e \Downarrow \langle \lambda x.e , \rho \rangle \\[2ex] \frac{ \begin{array}{c} \rho \vdash e_1 \Downarrow \langle \lambda x.e', \rho' \rangle \quad \rho \vdash e_2 \Downarrow v' \\ \rho'[x\mapsto v'] \vdash e' \Downarrow v \end{array} }{ \rho \vdash e_1 \, e_2 \Downarrow v } \\[3ex] \frac{ \rho \vdash e \Downarrow \langle \lambda f. e', \rho' \rangle \quad \rho'[f\mapsto \langle \lambda f. e', \rho' \rangle] \vdash e' \Downarrow v }{ \rho \vdash \mathsf{fix}\, e \Downarrow v }$

Trouble formulating type safety

This inability to distinguish between divergence and stuck makes it difficult to formulate type safety using a naive big-step semantics. The following statement is true but not very strong. $\text{If } \emptyset \vdash e : T \text{ and } \emptyset \vdash e \Downarrow v \text{ then } \emptyset \vdash v : T.$ It captures the notion of type preservation, but not progress. That is, the above does not rule out the possibility of type errors because the premise $\emptyset \vdash e \Downarrow v$ is false in such cases, making the entire statement vacuously true. On the other hand, the following statement is so strong that it is false. $\text{If } \emptyset \vdash e : T \text{ then } \emptyset \vdash e \Downarrow v \text{ and } \emptyset \vdash v : T \text{ for some } v.$ One counter-example to the above statement is the diverging program above. The program is well typed, but it does not terminate and return a value.

Several textbooks stop at this point and just say that the above is a problem with big-step semantics. Not true! There are several solutions to this problem in the literature. I've included papers that address the divergence/stuck issue at the bottom of this blog. If you know of others, please let me know! The solutions I've seen fall into three categories:

Introduce a separate co-inductive definition of divergence for the language in question.
Develop a notion of partial derivations.
Add a time counter that causes a time-out error when it gets to zero. That is, make the semantics step-indexed.

Of the three kinds of solutions, the counter-based solution is the simplest, and it gets the job done, so I won't discuss the other two. I first learned of this approach from the paper A Virtual Class Calculus by Ernst, Ostermann, and Cook.

Big-step Semantics with Counters

The first part of the solution is to augment the big-step semantics with a counter k, making sure to decrement the counter in the premises of each rule. $\rho; k \vdash x \Downarrow \rho(x) \qquad \rho; k \vdash \lambda x. e \Downarrow \langle \lambda x.e , \rho \rangle \\[2ex] \frac{ \begin{array}{c} \rho; k-1 \vdash e_1 \Downarrow \langle \lambda x.e', \rho' \rangle \quad \rho; k-1 \vdash e_2 \Downarrow v' \\ \rho'[x\mapsto v']; k-1 \vdash e' \Downarrow v \end{array} }{ \rho; k \vdash e_1 \, e_2 \Downarrow v } \\[3ex] \frac{ \rho; k-1 \vdash e \Downarrow \langle \lambda f. e', \rho' \rangle \quad \rho'[f\mapsto \langle \lambda f. e', \rho' \rangle]; k-1 \vdash e' \Downarrow v }{ \rho; k \vdash \mathsf{fix}\, e \Downarrow v }$ In addition, all of the above rules have a side condition requiring $k > 0$ . Next, we add a rule to handle the $k = 0$ case. $\rho; 0 \vdash e \Downarrow \mathsf{TimeOut}$ Finally, we add the usual rules for propagating errors such as this time out. $\frac{ \rho; k-1 \vdash e_1 \Downarrow \mathsf{TimeOut} }{ \rho; k \vdash e_1 \, e_2 \Downarrow \mathsf{TimeOut} } \\[3ex] \frac{ \rho; k-1 \vdash e_1 \Downarrow \langle \lambda x.e', \rho' \rangle \quad \rho; k-1 \vdash e_2 \Downarrow \mathsf{TimeOut} }{ \rho; k \vdash e_1 \, e_2 \Downarrow \mathsf{TimeOut} } \\[3ex] \frac{ \begin{array}{c} \rho; k-1 \vdash e_1 \Downarrow \langle \lambda x.e', \rho' \rangle \quad \rho; k-1 \vdash e_2 \Downarrow v' \\ \rho'[x\mapsto v']; k-1 \vdash e' \Downarrow \mathsf{TimeOut} \end{array} }{ \rho; k \vdash e_1 \, e_2 \Downarrow \mathsf{TimeOut} }\\[3ex] \frac{ \rho; k-1 \vdash e \Downarrow \mathsf{TimeOut} }{ \rho; k \vdash \mathsf{fix}\, e \Downarrow \mathsf{TimeOut} }$

With the addition of the counter, we can always build a derivation for a diverging program, with the result being a time out, no matter how high you set the counter to begin with. However, for a program that gets stuck, there will be some starting counter that is high enough so that you can't build a derivation anymore (you might have been able to get time outs with lower values).

The type safety theorem can now be stated in a way that is both strong enough and true! Let r range over both values and the time out. $\text{For any } k, \text{ if } \emptyset \vdash e : T \text{ then } \emptyset; k \vdash e \Downarrow r \text{ and } \emptyset \vdash r : T \text{ for some } r.$ (We let the time out take on any type.) At first glance this statement might seem weak. If the counter k is 0, then its trivial to build a derivation of $\emptyset; k \vdash e \Downarrow r$ . However, note that the statement says for any k. Thus, if the program get's stuck, there will be some value for k for which you can't build a derivation, making the above statement false. So this indeed is a good formulation of type safety. If the above holds, then no well-typed program will get stuck!

One parting comment. The addition of propagation rules was rather annoying. I'll discuss how to avoid propagation rules in big-step semantics in an upcoming blog post, summarizing a talk I gave at Scottish Programming Languages Seminar in 2010.

References

C. A. Gunter and D. Rémy. A Proof-Theoretic Assesment of Runtime Type Errors. AT&T Bell Laboratories Technical Memo 11261-921230-43TM, 1993.
A. Stoughton. An operational semantics framework supporting the incremental construction of derivation trees. Electr. Notes Theor. Comput. Sci., 10, 1997.
H. Ibraheem and D. A. Schmidt. Adapting big-step semantics to small-step style: Coinductive interpretations and “higher-order” derivations. In Second Workshop on Higher-Order Techniques in Operational Semantics (HOOTS2). Elsevier, 1997.
Kathleen Fisher and John Reppy. Statically typed traits. University of Chicago, technical report TR-2003-13, December 2003.
Erik Ernst, Klaus Ostermann, and William R. Cook. A Virtual Class Calculus. In POPL 2006.
Xavier Leroy and Herve Grall. Coinductive big-step operational semantics. Journal of Information and Computation. Vol. 207, Issue 2, February 2009.
Big-step Operational Semantics Revisited, by Jarosław Dominik, Mateusz Kuśmierek, and Viviana Bono. In Fundamenta Informaticae Volume 103, Issue 1-4, January 2010.

Tuesday, July 17, 2012

Linking isn't substitution: the separately-compiled lambda calculus

I've been reading some papers on separate compilation, such as Cardelli's Program fragments, linking, and modularization, and I'm seeing a trend in using substitution to model the linking of compilation units (units for short). While substitution does a fine job of conveying the semantics of the final executable, I think it falls down in conveying the semantics of a unit, especially from the point of view of a compiler writer.

What I'm interested in here is the situation in which one or more compilers are used to compile several units, and then these units are linked together into an executable. We want to allow the compilers to perform aggressive optimizations within a unit, while still requiring the compilers to respect the application binary interface (ABI), which says how different units must talk to each other.

To take a stab at such a semantics, let's consider a variant of the lambda calculus that supports separate compilation. Each compilation unit will have a name c and a function table $\varphi$ , which maps function names to lambdas. At run-time, each compilation unit will also need a stack $\kappa$ to store it's part of the procedure call stack. In addition to the usual expressions of the lambda calculus, we'll allow references to functions in other compilation units: the expression c.f refers to the function named f in unit c. Now, we want the observable behavior to be function calls and returns that cross from one unit to another. One way to specify this behavior is to create an abstract machine that logs the crossings (producing a trace). It would be preferable to map directly from a program to the trace, but I'm better at thinking in terms of abstract machines. Thus, the approach I'm using here is similar to that of the paper Java Jr.: Fully abstract trace semantics for a core Java language by Jeffrey and Rathke, which addresses the same problem that we're concerned with here, but for a subset of Java. More generally, the use of transition systems to generate traces goes back at least as far as A Theory of Communicating Sequential Processes by Brookes, Hoare, and Roscoe.

The lambda calculus includes first-class functions, so we need to address the question: what happens when a lambda is passed from one compilation unit to another? Drawing inspiration from the paper Temporal higher-order contracts, we don't literally pass lambdas, but instead the originating unit must give the lambda a name and then pass a reference to the lambda. We'll let $\chi$ range over the exportable values. $\begin{array}{rcl} n &\in & \mathbb{Z} \\ v & ::= & n \mid \lambda x.e \mid c{.}f \\ \chi &::= & n \mid c{.}f \end{array}$

The state of the machine is the actively executing expression e, the current compilation unit c, the previous compilation unit $c'$ (that called the current one), and the unit table U (mapping unit names to units). $\langle e, c, c', U \rangle$ We write $\varphi_c$ for accessing the function table of unit c from inside U and we write $\kappa_c$ for accessing it's stack. To allow internal computation within a unit, we define the following transition which single-steps the active expression according to reduction rules of the lambda calculus. $\begin{align*} & \langle e, c, c' , U \rangle \\ \longmapsto\; & \langle e', c, c' , U \rangle \qquad \text{if } e \longrightarrow e' \qquad \text{(Red)} \end{align*}$ A call to an external function looks up the function in the target unit $c'$ , installs the function body as the active expression, and records the continuation (evaluation context) $E$ and the previous unit $c''$ on the stack of the caller unit. The annotation on top of the $\longmapsto$ is how we log that an observable action occurred. $\begin{align*} & \langle E[c'{.}f(\chi)], c, c'' , U\rangle \\ \overset{c'{.}f(\chi)}{\longmapsto} \; & \langle [x:=\chi]e, c', c , U[c:=((E,c'') \cdot \kappa_c, \varphi_c)] \rangle \qquad \text{(Call)} \\ & \qquad \text{if } c \neq c' \text{ and } \varphi_{c'}(f) = \lambda x.e \end{align*}$ We also need to deal with the case where an internal function has been passed to another unit and then passed back. A call to such a function should not be observable. Thus, we have the following transition rule to convert from a reference to a function back to a lambda expression. $\begin{align*} & \langle E[c{.}f(v)], c, c' , U\rangle \\ \longmapsto \; & \langle E[(\lambda x.e) \, v], c, c' , U\rangle \qquad \text{(InFun)} \\ & \qquad \text{if } \varphi_{c}(f) = \lambda x.e \end{align*}$ When the active expression has been reduced to an exportable value, it's time to return to the caller $c'$ . We pop the activation record $(E,c'')$ from the top of the caller's stack and install $E[\chi]$ as the active expression. $\begin{align*} &\langle \chi,c,c', U \rangle \\ \overset{\chi}{\longmapsto} \;& \langle E[\chi], c', c'', U[c' := (\kappa',\varphi_{c'})] \rangle \qquad \text{(Ret)} \\ &\qquad \text{if } \kappa_{c'} = (E,c'') \cdot \kappa' \end{align*}$ The following transition rule converts a lambda into a reference that is suitable for export by registering it in the function table. $\begin{align*} &\langle R_c[\lambda x.e],c,c' , U \rangle \\ \longmapsto\;& \langle R_c[c{.}f], c, c' , U[c:= (\kappa_c, \varphi_c[f{:=} \lambda x.e]) ] \rangle \qquad \text{(Reg)} \\ &\qquad \text{if } f \notin \mathrm{dom}(\varphi_c) \end{align*}$ where the $R_c$ context is defined as follows $R_c ::= \Box \mid E[c'{.}f(\Box)] \qquad c \neq c'$ So this rule converts lambdas to references when they need to be returned to another unit or when they are to be passed as an argument to another unit.

We can now define the semantics, the eval function, for the separately-compiled lambda calculus. Let s be an initial state (all the stacks are empty, the current unit is I (for initial), which represents the operating system, and the active expression is a call to the main function in the first unit with argument 42). $\mathit{eval}(s) = \begin{cases} \overline{a} & \text{if } s \overset{\overline{a}}{\longmapsto^{*}} \langle \chi, I, I, U' \rangle \\ \bot & \text{otherwise} \end{cases}$

Exercise: Write down the machine transitions for the following initial state. $\begin{align*} s_0 &= \langle A{.}\mathit{main}(42), I, I, \\ &\qquad \{ I \mapsto (\epsilon,\emptyset), \\ &\qquad\; A \mapsto (\epsilon,\{ \mathit{main} \mapsto \lambda x. B{.}f(\lambda y. y) \}),\\ &\qquad\; B \mapsto (\epsilon, \{ f \mapsto \lambda h. h(3) \}) \} \rangle \end{align*}$ Then compute the result of $\mathit{eval}(s_0)$ . It should be $A{.}main(42), B{.}f(A{.}g), A{.}g(3), 3, 3, 3$ (Up to your choice of name for the lambda passed to $B{.}f$ . Here we've named it $g$ .)

The take-away for the compiler writer is that they can compile a unit any way they want, so long as the log comes out as specified by eval.

Sunday, July 15, 2012

Over-specification in operational semantics, or the case of the missing "eval"

For a long time, one aspect of operational semantics kept nagging at me: all the intermediate steps in the evaluation of the program are spelled out. This over-specifies the semantics in the sense that a compiler writer certainly doesn't need to make sure that the executing program really goes through each of those intermediate steps. So how should a compiler writer read the operational semantics of a language?

Fortunately, I asked Matthias Felleisen this question over dinner a couple years back, and he told me the simple solution. The semantics of a language should not be defined as the single-step reduction relation, or even the reflexive transitive closure of it, but instead, every semantics should define an eval function (technically, a partial function) that maps a program to its observable behavior. Readers familiar with denotational semantics might grin at this point; that's what a denotational semantics does! True, but it's also straightforward to define the eval function using a reduction relation.

Suppose the language in question is the Simply-Typed Lambda Calculus with constants (like numbers and Booleans) ranged over by c. Then the following eval function is one straightforward choice of semantics. $\mathit{eval}(e) = \begin{cases} c & \text{if } e \longrightarrow^{*} c \\ '\mathit{function}' & \text{if } e \longrightarrow^{*} \lambda x{:}T.e' \\ \bot & \text{otherwise} \end{cases}$ By stating that eval is the semantics of the STLC, we're telling a compiler writer that an implementation is correct so long as, after running a program, it prints out the correct constant or the string 'function', in accordance with the result given by eval. In general, programs don't always halt, which is the reason for the $\bot$ case. Of course, STLC is a somewhat odd language in that every STLC program halts, so the $\bot$ case is not technically needed here, even though for most languages it would be needed.

These days I try to remember to include the eval function is the semantics that I write. Yes, the eval function is usually pretty obvious, but so are many aspects of the semantics. Yes, it's pedantic, but in some sense writing down a formal semantics is all about being pedantic.

There aren't many examples of eval in the literature. Most textbooks on programming language theory don't include a single eval function. However, the book Semantic Engineering with PLT Redex and the notes that preceded the book, Programming Languages and Lambda Calculi, by Matthias and his coauthors, includes many example eval functions.

So the next time you're writing down an operational semantics, please include the eval!

What about big-step operational semantics?

If you use big-step semantics instead of small-step, part of the over-specification problem goes away, but not all. A big-step semantics doesn't specify every intermediate step of computation; it relates programs directly to the resulting values. However, there is still an issue regarding what counts as observable. The following is a typical big-step semantics for the STLC. $\begin{gather*} \rho \vdash c \Downarrow c \qquad \rho \vdash x \Downarrow \rho(x) \qquad \rho \vdash \lambda x{:}T.e \Downarrow \langle \lambda x{:}T. e, \rho \rangle \\[2ex] \frac{ \begin{array}{c} \rho \vdash e_1 \Downarrow \langle \lambda x{:}T. e_3, \rho' \rangle \quad \rho \vdash e_2 \Downarrow v_2 \\ \rho'\left[x{:=}v_2\right] \vdash e_3 \Downarrow v_3 \end{array} }{ \rho \vdash e_1 \, e_2 \Downarrow v_3 } \end{gather*}$

Would this big-step semantics serve as a good specification for the STLC? Not quite. Consider the semantics of the following program. $\emptyset \vdash (\lambda x{:}\mathtt{int}. \lambda y{:}\mathtt{int}. y)\,42 \Downarrow \langle \lambda y{:}\mathtt{int}. y, \{x \mapsto 42 \} \rangle$ The problem is that we don't want to tell compiler writers how to represent closures. For example, a high-quality implementation doesn't store the x because it doesn't occur in the body of the lambda.

To fix this over-specfication, we need to specify what's observable about a closure, which is traditionally nothing except that it is a closure. So we can define an eval function based on a big-step semantics as follows to obtain a good specification for the STLC. $\mathit{eval}(e) = \begin{cases} c & \text{if } \emptyset \vdash e \Downarrow c \\ '\mathit{function}' & \text{if } \emptyset \vdash e \Downarrow \langle \lambda x{:}T.e',\rho \rangle \\ \bot & \text{otherwise} \end{cases}$

Thursday, July 12, 2012

My new favorite abstract machine: ECD on ANF

A few years ago I blogged about my favorite abstraction machine, the ECD machine. It is a variant of Landin's classic SECD machine that uses evaluation contexts to handle control flow within an expression. Having received my Ph.D. from Indiana, evaluation contexts are very close to my heart, however, I'm having second thoughts.

The question that started these doubts was the following:

Should control flow within an expression be part of the static or dynamic semantics of a language?

The knee-jerk answer is: of course, control flow belongs in the dynamic semantics! But then I put on my compiler-writer hat and retort: but the control-flow within an expression can be compiled away. The standard conversion to A-normal form (ANF), which flattens expressions into statements, determines intra-expression control flow statically. So taking this answer to heart, I tried out several Isabelle formalizations using this approach. Low and behold, it worked quite well, removing the need for the machinery and tedious lemmas regarding evaluation contexts.

I'll show the main points of this idea in the setting of the Simply-Typed Lambda Calculus (STLC).

Conversion to A-Normal Form

First, we'll define the ANF variant of STLC, then give the conversion to ANF. The main changes are that function application becomes a statement and there is a statement for returning from a function. Also, note that the body of a lambda is now a statement instead of an expression. $\begin{array}{lrcl} \text{expressions} & e & ::= & c \mid x \mid \lambda x{:}T. s \\ \text{definitions} & d & ::= & x = e(e) \mid x = \mathit{op}(\overline{e}) \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\, e \end{array}$

Now for the conversion to ANF. $\mathit{ToANF(e)} = \mathrm{let}\,(e',\overline{d}) = \mathit{ANF}(e) \,\mathrm{in}\, (\overline{d}; \mathbf{return}\, e')$ $\begin{align*} \mathit{ANF}(c) &= (c,\epsilon) \\ \mathit{ANF}(\mathit{op}(\overline{e})) &= \mathrm{let}\, (\overline{e}',\overline{d}) = \overline{\mathit{ANF}}(\overline{e}),\\ &\qquad\,\, x = \mathit{fresh{-}variable}() \\ &\quad \mathrm{in}\, (x, (\overline{d}; x = \mathit{op}(\overline{e}'))) \\ \mathit{ANF}(x) &= (x,\epsilon) \\ \mathit{ANF}(\lambda x{:}T.e) &= \mathrm{let}\, (e',\overline{d}) = \mathit{ANF}(e)\\ &\quad \mathrm{in}\, (\lambda x{:}T. (\overline{d}; \mathbf{return}\, e'), \epsilon)\\ \mathit{ANF}(e_1\,e_2) &= \mathrm{let}\, (e'_1,\overline{d}_1) = \mathit{ANF}(e_1), \\ & \qquad\,\, (e'_2,\overline{d}_2) = \mathit{ANF}(e_2) \\ & \qquad\,\, x = \mathit{fresh{-}variable}() \\ & \quad \mathrm{in}\, (x, (\overline{d}_1;\overline{d}_2; x = e'_1(e'_2))) \end{align*}$ This conversion to ANF specifies that, within an expression, evaluation proceeds from left to right.

The ECD on ANF

The machine definition consists of two parts, a function for evaluating expressions to values and the transition rules for the machine itself, which handle the statements. We start with evaluating expressions. The evaluation function V takes two arguments, an expression and an environment mapping variables to values. $\begin{align*} V(c,\rho) &= c\\ V(x, \rho) &= \rho(x) \\ V(\lambda x{:}T.s, \rho) &= \langle \lambda x{:}T.s, \rho \rangle \end{align*}$

The machine state has the form $(s,\rho,\kappa)$ , where the stack $\kappa$ is a list of frames, which have the form $(x,s,\rho)$ . Now for the transition rules. There are three, one for each kind of statement. $\begin{align*} (x = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (x,s,\rho)::\kappa) \\ \text{where} & V(e_1,\rho) = \langle \lambda y{:}T. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where}& v = \delta(\mathit{op},\overline{V}(\overline{e},\rho)) \\ (\mathbf{return}\,e, \rho, (x,s,\rho')::\kappa) & \longmapsto (s, [x{:=}v]\rho', \kappa) \\ \text{where}& V(e,\rho) = v \end{align*}$

To finish up, we define the semantics of the Simply-Typed Lambda Calculus using ToANF and the ECD machine. $\mathit{eval}(e) = \begin{cases} c & (\mathit{ToANF}(e), \emptyset, \epsilon) \longmapsto^{*} (\mathbf{return}\,e', \rho, \epsilon) \text{ and } V(e',\rho) = c \\ \bot & \text{otherwise} \end{cases}$

Discussion

While the ECD on ANF is my new favorite, I don't claim it's the best for all purposes. I suspect the original ECD machine, with evaluation contexts, may be better for explaining the semantics of the STLC to a programmer. Though it's unclear which is more difficult to explain, ANF or evaluation contexts. Also, programmers coming from C or Java may be more use to a language with statements.

I do claim the ECD on ANF is great for the purposes of doing PL meta-theory. For example, proving type safety is extremely straightforward.

Proof of type safety in Isabelle

Here's the proof of type safety for the STLC in A-normal form. The whole development is only 369 lines!

ECDOnANF.thy

Monday, July 09, 2012

Closure Conversion, with Polymorphism

In the last post, we discussed closure conversion applied to the Simply-Typed Lambda Calculus (STLC), perhaps the simplest language with lexical scoping and first-class functions. Now we turn to closure conversion applied to a richer language, one that includes generics.

The Polymorphic Lambda Calculus (aka. System F)

System F extends the STLC with first-class generics: the $\Lambda$ expression and an expression for instantiating a generic. $\begin{array}{lrcl} \text{types} & T & ::= & \ldots \mid \alpha \mid \forall \alpha. T \\ \text{expressions} & e & ::= & \ldots \mid \Lambda \alpha. e \mid e[T] \\ \text{values} & v & ::= & \ldots \mid \Lambda \alpha. e \\ \text{frames} & F & ::= & \ldots \mid \Box[T] \end{array}$ The expression $\Lambda \alpha.e$ creates a generic expression parametrized by the type variable $\alpha$ . The type of such a generic expression is $\forall \alpha. T$ , where T is the type of the inner expression e. The expression $e[T]$ instantiates the generic e, replacing its type parameter with type T.

The typing rules for System F are those of the STLC plus the following: $\frac{ \Gamma,\alpha \vdash e : T }{ \Gamma \vdash \Lambda \alpha.e : \forall \alpha. T } \qquad \frac{ \Gamma \vdash e : \forall \alpha. T_2 }{ \Gamma \vdash e[T_1] : [\alpha := T_1] T_2 }$

Also, there is one new reduction rule: $(\Lambda \alpha. e)[T] \longrightarrow [\alpha {:=} T]e$

Updating the Intermediate Language: IL2

With the introduction of generics, we not only worry about lexically scoped term variables but we are also concerned with lexically scoped type variables. So the $\delta$ and $\gamma$ functions must be updated to include type variables and their bindings. Furthermore, the closure operator must be extended with type arguments to bind the new type variables. $\begin{array}{lrcl} \text{expressions} & e & ::= \ldots \mid \delta \overline{y{:}T},\overline{\alpha},x{:}T.e \mid \gamma x{:}T,\overline{y{=}v,\alpha{=}T}. e \mid e \{ \overline{e}, \overline{T} \} \end{array}$

The following are the reduction rules for the functions of IL2. $\begin{align*} (\delta \overline{y{:}T},\overline{\alpha},x{:}T.e)\{\overline{e},\overline{T}\} & \longrightarrow \gamma x{:}T,\overline{y{=}v,\alpha{=}T}. e \\ (\gamma x{:}T,\overline{y{=}v,\alpha{=}T}. e)\, v & \longrightarrow [\overline{\alpha{:=}T}]\{ \overline{y{:=}v}\} \{ x{:=}v \} e \end{align*}$

Next we turn to the $\Lambda$ . We'll need to translate $\Lambda$ to something in IL2, and in the spirit of closure conversion, it should be something that does not support lexical scoping. Inspired by the $\delta$ and $\gamma$ functions, we create analogous things for generics. $\begin{array}{lrcl} \text{expressions} & e & ::= \ldots \mid \Phi \overline{y{:}T},\overline{\alpha},\alpha.e \mid \Upsilon \alpha,\overline{y{=}v,\alpha{=}T}. e \end{array}$

The following are the reduction rules for the generics of IL2. $\begin{align*} (\Phi \overline{y{:}T},\overline{\alpha},\alpha.e)\{\overline{e},\overline{T}\} & \longrightarrow \Upsilon \alpha,\overline{y{=}v,\alpha{=}T}. e \\ (\Upsilon \alpha,\overline{y{=}v,\alpha{=}T}. e) [T] & \longrightarrow [\overline{\alpha{:=}T}]\{ \overline{y{:=}v} \} [\alpha{:=}T]e \end{align*}$

Closure conversion for System F

With IL2 in place, the definition of closure conversion for System F is relatively straightforward. The case for $\lambda$ computes the free type variables (FTV) and includes them in the $\delta$ function. The new case for $\Lambda$ is analogous to the case for $\lambda$ .

$\mathit{ClosureConversion}(e) = \mathrm{let}\;(e',\overline{f}) = C(e)\,\mathrm{in}\; (\mathbf{globals} \,\overline{f}\,\mathbf{in}\, e') \\$ $\begin{align*} C(x) &= (x,\epsilon) \\ C(c) &= (c,\epsilon) \\ C(\mathit{op}(\overline{e})) &= \mathrm{let}\;(\overline{e}',\overline{f}) = \overline{C}(\overline{e})\,\mathrm{in} \; (\mathit{op}(\overline{e}'), \overline{f}) \\ C(\lambda x{:}T.e) &= \mathrm{let}\; (e',\overline{f}) = C(e) \\ & \qquad\; \overline{y{:}T} = \mathrm{FV}(e) - \{x\} \\ & \qquad\; \overline{\alpha} = \mathrm{FTV}(e) \\ & \qquad\; g = \mathrm{freshname}() \\ & \quad \mathrm{in}\; (g\{\overline{y},\overline{\alpha}\}, (g = (\delta \overline{y{:}T,\alpha},x{:}T.e'), \overline{f}))\\ C(e_1\,e_2) &= \mathrm{let}\; (e'_1,\overline{f}_1) = C(e_1) \\ & \qquad\; (e'_2,\overline{f}_2) = C(e_2) \\ & \quad \mathrm{in}\; (e'_1\,e'_2, (\overline{f}_1 , \overline{f}_2)) \\ C(\Lambda \alpha.e) &= \mathrm{let}\; (e',\overline{f}) = C(e) \\ & \qquad\; \overline{y{:}T} = \mathrm{FV}(e) \\ & \qquad\; \overline{\alpha} = \mathrm{FTV}(e) - \{ \alpha \} \\ & \qquad\; g = \mathrm{freshname}() \\ & \quad \mathrm{in}\; (g\{\overline{y},\overline{\alpha}\}, (g = (\Phi \overline{y{:}T,\alpha},\alpha.e'), \overline{f}))\\ C(e[T]) &= \mathrm{let}\;(e',\overline{f}) = C(e) \,\mathrm{in} \; (e'[T], \overline{f}) \end{align*}$

Saturday, July 07, 2012

The Essence of Closure Conversion

The interaction between lexical scoping of variables and first-class functions, as found in all functional programming languages, is rather subtle and interesting, especially if you want to compile the language to assembly code. For some reason, computer architects have yet to include first-class functions with lexical scoping in modern ISAs! ;) In this blog post, I'll try to capture the essence of compiling from a language with lexical scoping and first-class functions to a language without lexical scoping (but still with first-class functions). This kind of translation is usually referred to as closure conversion for reasons that will become apparent.

We take the Simply-Typed Lambda Calculus (STLC) as our first example of a language with lexical scoping and first-class functions. Unlike most treatments of closure conversion, we create a custom intermediate language (IL1) to be the target of closure conversion. IL1 is similar to the STLC except that lexical scoping does not apply to functions, that is, occurrences of variables inside functions may not refer to binders outside of the function. However, functions in IL1 are still first class in the sense that they can be passed to and returned from functions. Most presentations of closure conversion use a lower-level intermediate language. We do not take that approach here because it clouds the essence of closure conversion by bringing in data representation issues.

Both the source and target languages of the translation are statically typed, so it will be important for the translation to map well-typed programs to well-typed programs. The type-preserving aspect of the translation is inspired by Morrisett et al.'s work on the TIL compiler. For reference, see From System F to Typed Assembly Language in ACM TOPLAS, 1999. However, we avoid some of the complexity of their approach by our use of a custom intermediate language. In particular, the use of existential types is replaced by an existential at the meta-level, in the typing rule for closures.

In a later blog post, I'll move to a slightly richer language, the Polymorphic Lambda Calculus (System F). I'll make some modifications to IL1 to obtain IL2 and then present a translation from System F to IL2.

Warming up with an example

Consider the following program in the STLC. $(\lambda x{:}\mathtt{int}.\lambda y{:}\mathtt{int}. x + y)\;40\;2$ The outer function is applied to 40, so x is bound to 40. The outer function immediately returns the inner function, which is then applied to 2, so y is bound to 2. The inner function computes $x + y$ , so it returns 42. Note how the inner function referred to x, the parameter of the outer function. This is an example of lexical scoping.

Now let's try to translate the above program into an equivalent program (produces the same answer) but that does not rely on lexical scoping (except for global variables). Note that if functions are not lexically scoped, then they might as well be defined at the top of the program, and not be nested. Our first attempt adds an extra parameter to the inner function. $\begin{array}{l} \mathbf{globals} \; f = \lambda x{:}\mathtt{int}, y{:}\mathtt{int}. x + y \\ \qquad\;\;\; g = \lambda x{:}\mathtt{int}. f \, x \\ \mathbf{in}\; g\;40\;2 \end{array}$ However, the above is ill-formed because inside g, f is applied to one argument but it requires two. And unfortunately, the second argument is not available inside g, it doesn't come along until we return from g (the argument 2). Furthermore, we don't want to call f inside of g, we just want to bind the value for x.

The way out of this problem is to get creative with the target language of closure conversion. After all, the target language is just our private intermediate language, so we can do anything we want with it, as long as it helps us get closer to assembly code.

In the above example, what we need is some kind of function that can bind some of its arguments before being called. Focusing on the function f, it needs to bind the value of x inside of g. Only later f will be actually called, with y bound to 2. Let's use $\delta$ as the notation for our new kind of function in which all of the parameters except for the last are the early parameters. The notation $e_0 \{ e_1,\ldots,e_n \}$ is the closure operator. It binds all the early parameters of function $e_0$ to the arguments $e_1,\ldots,e_n$ but it does not call the function.

With our new kind of $\delta$ function and closure operator, we can translate our example to the following. $\begin{array}{l} \mathbf{globals}\; f = \delta x{:}\mathtt{int}, y{:}\mathtt{int}. x + y \\ \qquad\;\;\; g = \delta x{:}\mathtt{int}. f \{ x \} \\ \mathbf{in}\;\; g\{\} \;40\;2 \end{array}$ The first thing that happens is that g is trivially closed. Then it is called with the normal argument 40. Inside g, we close f, binding the value of x (which is 40) to parameter x of f. This closure is returned from g and called with normal parameter 2. The body of f is then evaluated, and $x + y$ gives us 42 as expected.

Next we move on to spell out all of the details of how this works. We first review the definition of the STLC and then present the definition of IL1, complete with the new $\delta$ functions. We then present closure conversion, the translation from STLC to IL1.

The Simply-Typed Lambda Calculus

To review, the following is the definition of the Simply-Typed Lambda Calculus.

Syntax $\begin{array}{lrcl} \text{types} & T & ::= & B \mid T \to T \\ \text{expressions} &e & ::= & c \mid \mathit{op}(\overline{e}) \mid x \mid \lambda x{:}T. e \mid e\,e \\ \text{values} &v & ::= & c \mid \lambda x{:}T. e \\ \text{frames}&F& ::= & \mathit{op}(\overline{v},\Box,\overline{e}) \mid \Box\,e \mid v \, \Box \\ \end{array}$ The type B ranges over basic types such as Int and Bool. We restrict complete STLC programs to have a type in B. The expression c ranges over constants such as 0 and true. The operator op ranges over primitive operations such as arithmetic and logical operations. We use an overline to denote a sequence of things.

Type System $\begin{gather*} \frac{x{:}T \in \Gamma}{\Gamma \vdash x : T} \quad \frac{\Delta(\mathit{op}) = (\overline{T},T) \quad \Gamma \vdash \overline{e : T}} {\Gamma \vdash \mathit{op}(\overline{e}) : T} \quad \frac{\mathit{typeof}(c)=T}{\Gamma \vdash c : T} \\[2ex] \frac{ \Gamma,x{:}T \vdash e : T' }{ \Gamma \vdash \lambda x{:}T. e : T \to T' } \quad \frac{ \Gamma \vdash e_1 : T \to T' \quad \Gamma \vdash e_2 : T }{ \Gamma \vdash e_1\, e_2 : T' } \end{gather*}$ Note: The appearance of $\Gamma$ in the antecedent of the typing rule for functions is what allows occurrences of variables inside a function to refer to bindings outside the function.

Substitution $\begin{align*} \left[x{:=}v\right] c &= c \\ \left[x{:=}v\right] \mathit{op}(\overline{e}) &= \mathit{op}(\left[x{:=}v\right] \overline{e}) \\ \left[x{:=}v\right] x &= v \\ \left[x{:=}v\right] y &= y & \text{if } y \neq x \\ \left[x{:=}v\right] \lambda y{:}T. e &= \lambda y{:}T. \left[x{:=}v\right] e & \text{ if } y \notin \{x\} \cup \mathrm{FV}(\overline{v}) \\ \left[x{:=}v\right] e_1 \, e_2 &= (\left[x{:=}v\right] e_1) \, (\left[x{:=}v\right] e_2) \end{align*}$ Note: Substitution goes inside of functions because there are variable occurrences inside the functions that refer to variables from outside the function.

Reduction Rules and Evaluation $\begin{align*} \mathit{op}(\overline{v}) & \longrightarrow \delta(\mathit{op},\overline{v}) \\ (\lambda x{:}T.e)\,v & \longrightarrow [x{:=}v] e \\ F[e] & \longrightarrow F[e'] & \text{if } e \longrightarrow e' \end{align*}$ $\mathit{eval}(e) = \begin{cases} c & \text{if } e \longrightarrow^{*} c \\ \bot & \text{otherwise} \end{cases}$

An Intermediate Language (IL1) without Lexical Scoping

Now for the creative part of defining IL1. The type of a $\delta$ function will have the form $(\overline{T}_1,T_2) \Rightarrow T_3$ , where $\overline{T}_1$ are the types for the early parameters, $T_2$ is the type of the normal parameter, and $T_3$ is the return type. The result of applying the closure operator to a $\delta$ function is another kind of function, a $\gamma$ function (otherwise known as a closure) which has stored away the values for its early parameter. Given a $\delta$ function of type $(\overline{T}_1,T_2) \Rightarrow T_3$ , its closure will have type $T_2 \to T_3$ .

Syntax $\begin{array}{lrcl} \text{types}&T & ::= & B \mid (\overline{T},T) \Rightarrow T \mid T \to T \\ \text{expressions}&e & ::= & c \mid \mathit{op}(\overline{e})\mid x \mid \delta \overline{y{:}T},x{:}T.e \mid \gamma x{:}T,\overline{y{=}v}. e \mid e\{\overline{e}\} \mid e\,e \\ \text{programs}&p & ::= & \mathbf{globals}\, \overline{x=v}\,\mathbf{in}\, e \\ \text{values}&v & ::= & c \mid \delta x{:}T,\overline{y{:}T}.e \mid \gamma x{:}T,\overline{y=v}. e \\ \text{frames}&F& ::= & \mathit{op}(\overline{v},\Box,\overline{e}) \mid \Box \{ \overline{e} \} \mid v \{ \overline{v},\Box,\overline{e} \} \mid \Box\,e \mid v \, \Box \\ \end{array}$

Type System $\begin{gather*} \frac{ \overline{y{:}T},x{:}T \vdash e : T' }{ \Gamma \vdash \delta \overline{y{:}T},x{:}T.e : (\overline{T},T) \Rightarrow T' } \quad \frac{ \vdash \overline{v : T} \quad \overline{y{:}T},x{:}T \vdash e : T' }{ \Gamma \vdash \gamma x{:}T,\overline{y{=}v}. e : T \to T' } \\[2ex] \frac{ \Gamma \vdash e : (\overline{T},T) \Rightarrow T' \quad \Gamma \vdash \overline{e : T} }{ \Gamma \vdash e \{ \overline{e} \} : T \to T' } \quad \frac{ \Gamma \vdash e_1 : T \to T' \quad \Gamma \vdash e_2 : T }{ \Gamma \vdash e_1\, e_2 : T' } \\[2ex] \frac{ \vdash \overline{v : T} \quad \overline{x : T} \vdash e : B }{ \vdash \mathbf{globals}\,\overline{x=v}\,\mathbf{in}\,e : B } \end{gather*}$ Note: The antecedents of the typing rules for the $\delta$ and $\gamma$ functions do not include $\Gamma$ !

Shallow Substitution $\begin{align*} \{x{:=}v\} x &= v \\ \{x{:=}v\} y &= y & \text{if } y \neq x \\ \{x{:=}v\} v' &= v' & \text{Whoa!}\\ \{x{:=}v\} \mathit{op}(\overline{e}) &= \mathit{op}(\{x{:=}v\} \overline{e}) \\ \{x{:=}v\} e\{\overline{e}\} &= (\{x{:=}v\} e) \{ \{x{:=}v\}\overline{e} \} \\ \{x{:=}v\} e_1 \, e_2 &= (\{x{:=}v\} e_1) \, (\{x{:=}v\} e_2) \end{align*}$ Note: Shallow substitution does not go inside of values, particularly it leaves functions unchanged in contrast to substitution in the STLC.

We introduce the following notation for iterated substitution and shallow substitution. $\begin{align*} \left[ \overline{x {:=} v} \right] e &= \left[x_1{:=}v_1 \right] \ldots \left[x_n{:=}v_n\right] e \\ \{ \overline{x {:=} v} \} e &= \{ x_1{:=}v_1 \} \ldots \{ x_n {:=} v_n \} e \end{align*}$

Reduction Rules and Evaluation $\begin{align*} \mathit{op}(\overline{v}) & \longrightarrow \delta(\mathit{op},\overline{v}) \\ (\delta \overline{y{:}T},x{:}T.e)\{\overline{v}\} &\longrightarrow \gamma x{:}T,\overline{y{=}v}.e \\ (\gamma x{:}T,\overline{y{=}v}.e)\; v &\longrightarrow \{\overline{y{:=}v}\} \{x{:=}v\} e \\ F\left[e\right] & \longrightarrow F\left[e'\right] & \text{if } e \longrightarrow e' \end{align*}$ $\mathit{eval}(\mathbf{globals}\,\overline{x{=}v}\,\mathbf{in}\,e) = \begin{cases} c & \text{if } [\overline{x{=}v}] e \longrightarrow^{*} c \\ \bot & \text{otherwise} \end{cases}$ Note: Function application uses shallow substitution but for globals, we use normal (deep) substitution.

Translation from the Simply-Typed Lambda Calculus to IL1

With our source and target languages in place, we're ready to go into the details of closure conversion. The main ClosureConversion function takes as input a STLC program and outputs an IL1 program. It uses an auxiliary recursive function C that takes a STLC expression and returns two things, an IL1 expression and a list of functions, or more precisely, a list of bindings, where each binding consists of the function's name and the actual $\delta$ function. As we can see in the body of the ClosureConversion function, the functions returned from C become installed as global variables. $\mathit{ClosureConversion}(e) = \mathrm{let}\;(e',\overline{f}) = C(e)\,\mathrm{in}\; (\mathbf{globals} \,\overline{f}\,\mathbf{in}\, e') \\$

The way C processes expressions can be divided into three categories. For variables and constants, we just translate them to themselves and return an empty list of functions. For primitive and normal application, we apply C recursively to the subexpressions, create a new application from the resulting expressions and return the list of all the functions from the subexpressions. So that is it for the easy cases. All of the action is in the case for $\lambda$ . After the recursive call, we compute the free variables (FV) of the body of the function and remove the parameter x. This gives us all of the variables (and their types) that are used inside the function but defined outside. We generate a fresh name g for this function, then return two things. First, we apply the closure operator to g, with the free variables as arguments. At runtime, these variables will evaluate to values, which will then be captured in the closure. Second, we return the list of functions from the recursive call, plus a new $\delta$ function for this $\lambda$ , which we bind to g. $\begin{align*} C(x) &= (x,\epsilon) \\ C(c) &= (c,\epsilon) \\ C(\mathit{op}(\overline{e})) &= \mathrm{let}\;(\overline{e}',\overline{f}) = \overline{C}(\overline{e})\,\mathrm{in} \; (\mathit{op}(\overline{e}'), \overline{f}) \\ C(\lambda x{:}T.e) &= \mathrm{let}\; (e',\overline{f}) = C(e) \\ & \qquad\; \overline{y{:}T} = \mathrm{FV}(e) - \{x\} \\ & \qquad\; g = \mathrm{freshname}() \\ & \quad \mathrm{in}\; (g\{\overline{y}\}, (g = (\delta \overline{y{:}T},x{:}T.e'), \overline{f}))\\ C(e_1\,e_2) &= \mathrm{let}\; (e'_1,\overline{f}_1) = C(e_1) \\ & \qquad\; (e'_2,\overline{f}_2) = C(e_2) \\ & \quad \mathrm{in}\; (e'_1\,e'_2, (\overline{f}_1 , \overline{f}_2)) \end{align*}$

That's it! The essence of closure conversion is

Converting from lexically-scoped functions to two-phase functions that support a closure operator to bind early parameters and an application operator to bind the late parameter and execute the body.
Lifting all functions to become global variables.