Jeremy Siek: December 2009

There are many different styles of operational semantics, but my favorite is not very well known, hence this post. While in graduate school, I took a course on type systems from Amr Sabry in which we studied a miniature version of SML and used the style of operational semantics that I'm about to write about. Amr didn't give a name to this style, so I'm calling it the ECD abstract machine.

Why do I like the ECD machine? The ECD machine works a lot like a debugger. A debugger session has three components: a view of the source code for the currently executing procedure with the current position marked, a list of in-scope variables and their values, and a stack of the procedure calls. The ECD machine has the same three components.

Historical aside: the ECD machine is closely related to the SECD machine created by Peter Landin. The ECD machine drops the operand stack (the S in SECD) and uses evaluation contexts instead.

In the following I'm going to write down what an ECD machine looks like for the lambda calculus. The grammar for the lambda calculus is given below, with functions written using λ. Note that function application is just two expressions next to each other, where the first is the function and the second is the argument. The id terminal is for identifiers (variable names).

$\begin{array}{rcl} expr &::=& id \mid expr\, expr \mid \lambda id.\, expr \mid value \\ value &::=& \langle \lambda id.\, expr, env \rangle \end{array}$

The only kind of value (a possible result of running a program) in the lambda calculus is a closure, which is the result of evaluating a function (a lambda). A closure is just a pair containing a lambda and an environment. An environment (env) is a function from identifiers to values. Yes, this is a bit circular! Values appear in the grammar for expressions because the machine replaces sub-expressions with their values as it runs.
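For readers who like to experiment, here is a minimal Python sketch of one way to represent this grammar. Everything in it is an assumption made for illustration: the class names `Var`, `Lam`, `App`, and `Closure`, the use of a Python `dict` for environments, and the `show` function, which prints terms in the notation of this post. Closures are ordinary expression nodes, mirroring the `value` alternative of `expr`.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical encoding of the grammar: expr ::= id | expr expr | λid. expr | value
@dataclass
class Var:
    name: str              # id

@dataclass
class Lam:
    param: str             # λparam. body
    body: object

@dataclass
class App:
    fun: object            # function position
    arg: object            # argument position

@dataclass
class Closure:
    lam: Lam               # value ::= <λid. expr, env>
    env: dict              # env: identifier -> Closure (a dict, by assumption)

def show(e) -> str:
    """Render an expression in the post's notation; application is left-associative."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}. {show(e.body)}"
    if isinstance(e, Closure):
        bindings = ", ".join(f"{x}:={show(v)}" for x, v in e.env.items())
        env_text = "{" + bindings + "}" if bindings else "∅"
        return f"⟨{show(e.lam)}, {env_text}⟩"
    # App: parenthesize a lambda in function position, and a lambda or an
    # application in argument position, so application stays left-associative.
    fun = f"({show(e.fun)})" if isinstance(e.fun, Lam) else show(e.fun)
    arg = f"({show(e.arg)})" if isinstance(e.arg, (Lam, App)) else show(e.arg)
    return f"{fun} {arg}"

program = App(App(Lam("x", Lam("y", Var("x"))), Lam("z", Var("z"))), Lam("w", Var("w")))
print(show(program))                          # (λx. λy. x) (λz. z) (λw. w)
print(show(Closure(Lam("z", Var("z")), {})))  # ⟨λz. z, ∅⟩
```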

Unfortunately, the lambda calculus looks rather different from a typical imperative programming language, which may make this particular ECD machine more difficult to understand for a reader not familiar with the lambda calculus or functional programming.

First, a word about how to represent source code with a mark on the current position. Because we're dealing with an expression-oriented language, the current position is not a line number but a sub-expression. So the current position can be visualized as a circle drawn around the next sub-expression to be evaluated. The traditional way to represent this is with two pieces: the first is a data structure called an evaluation context, which represents the source code outside the circle, and the second is just the sub-expression inside the circle. The following is the grammar for evaluation contexts for the call-by-value version of the lambda calculus. The $\Box$ is the hole in the context, i.e., the location of the circle.

$\begin{array}{rcl} \mathit{EvalContext} &::=& \Box \mid \mathit{EvalContext}\, expr \mid value\, \mathit{EvalContext} \end{array}$

Because the hole can enter the argument of an application only when the function position already holds a value, this grammar fixes a left-to-right evaluation order.

The function fill takes an evaluation context and an expression and returns the result of plugging the expression into the hole, rebuilding the rest of the program around it. In the following we use lowercase e's for expressions, v's for values, and uppercase E's for evaluation contexts. We use the notation $E[e]$ as shorthand for $\mathit{fill}(E,e)$.

$\begin{align*} \Box[e] &= e \\ (E\, e_2)[e] &= E[e]\, e_2 \\ (v\, E)[e] &= v\, E[e] \end{align*}$
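Under a hypothetical Python encoding of the syntax, evaluation contexts and fill might be sketched as follows. The names `Hole` (for $\Box$), `AppL` (for $E\,e_2$), and `AppR` (for $v\,E$) are assumptions, and each case of `fill` corresponds to one of the equations above. Filling the context (□ (λz. z)) (λw. w) with λx. λy. x rebuilds the program (λx. λy. x) (λz. z) (λw. w).

```python
from __future__ import annotations
from dataclasses import dataclass

# Expressions (hypothetical encoding; Closure stands for the value <λx. e, r>).
@dataclass
class Var:
    name: str

@dataclass
class Lam:
    param: str
    body: object

@dataclass
class App:
    fun: object
    arg: object

@dataclass
class Closure:
    lam: Lam
    env: dict

# Evaluation contexts: EvalContext ::= □ | EvalContext expr | value EvalContext
@dataclass
class Hole:            # □
    pass

@dataclass
class AppL:            # E e2: the hole is in function position
    ctx: object
    arg: object

@dataclass
class AppR:            # v E: the function is already a value, the hole is in the argument
    fun: Closure
    ctx: object

def fill(E, e):
    """Compute E[e]: plug e into the hole of E and rebuild the surrounding program."""
    if isinstance(E, Hole):
        return e                                  # □[e] = e
    if isinstance(E, AppL):
        return App(fill(E.ctx, e), E.arg)         # (E e2)[e] = E[e] e2
    if isinstance(E, AppR):
        return App(E.fun, fill(E.ctx, e))         # (v E)[e] = v E[e]
    raise TypeError(f"not an evaluation context: {E!r}")

K = Lam("x", Lam("y", Var("x")))
I_z, I_w = Lam("z", Var("z")), Lam("w", Var("w"))
E = AppL(AppL(Hole(), I_z), I_w)                  # (□ (λz. z)) (λw. w)
print(fill(E, K) == App(App(K, I_z), I_w))        # True
```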

Next, let's describe the ECD abstract machine. As stated above, it has three components. The first is the Environment; the second is the Control, which we represent with an expression of the lambda calculus; and the third, the strangely named Dump, is the call stack. The following are the reduction rules for the ECD abstract machine. The variable x ranges over identifiers, v over values, s over stacks, and r over environments, and $r[x:=v]$ is the environment r extended to map x to v. Each reduction rule has its name given in parentheses on the right-hand side.

$\begin{align*}
(r, E[x], s) &\longrightarrow (r, E[r(x)], s) & \text{(VAR)} \\
(r, E[\lambda x.e], s) &\longrightarrow (r, E[\langle \lambda x. e, r\rangle], s) & \text{(LAM)} \\
(r, E[\langle \lambda x.e',r'\rangle \,v], s) &\longrightarrow (r'[x:=v], e', (E,r) s) & \text{(APP)} \\
(r, v, (E,r') s) &\longrightarrow (r', E[v], s) & \text{(RET)}
\end{align*}$

The VAR rule evaluates a variable by looking it up in the environment. The LAM rule evaluates a lambda into a closure, capturing the current environment in the second part of the closure. The APP rule starts a function call once both the function and the argument are values: it pushes the caller's evaluation context and environment onto the stack, then evaluates the body of the closure in the closure's environment extended with the parameter bound to the argument. The RET rule finishes a function call: when the control is a value, it pops the caller's context and environment off the stack and plugs the value into the hole. Each element of the call stack is thus a pair of an evaluation context and an environment. The machine halts when the control is a value and the stack is empty.

Let's finish with an example. Application associates to the left, so the program below applies λx. (λy. x) to λz. z and then applies the result to λw. w. The stack is written as a list, with [] for the empty stack.

$\begin{align*}
& (\emptyset, (\lambda x. (\lambda y. x))\, (\lambda z. z)\, (\lambda w. w), []) \\
\text{(LAM)} \longrightarrow\;\;& (\emptyset, \langle \lambda x. (\lambda y. x), \emptyset\rangle \, (\lambda z. z) \, (\lambda w. w), []) \\
\text{(LAM)} \longrightarrow\;\;& (\emptyset, \langle \lambda x. (\lambda y. x), \emptyset\rangle\, \langle \lambda z. z, \emptyset\rangle \, (\lambda w. w), []) \\
\text{(APP)} \longrightarrow\;\;& (\{x:=\langle \lambda z. z, \emptyset \rangle\}, (\lambda y. x), [ (\Box\, (\lambda w. w), \emptyset) ]) \\
\text{(LAM)} \longrightarrow\;\;& (\{x:=\langle \lambda z. z, \emptyset \rangle\}, \langle \lambda y. x, \{x:=\langle \lambda z. z, \emptyset \rangle\}\rangle, [ (\Box\, (\lambda w. w),\emptyset) ]) \\
\text{(RET)} \longrightarrow\;\;& (\emptyset, \langle \lambda y. x, \{x:=\langle \lambda z. z, \emptyset \rangle\}\rangle\, (\lambda w. w), []) \\
\text{(LAM)} \longrightarrow\;\;& (\emptyset, \langle \lambda y. x, \{x:=\langle \lambda z. z, \emptyset \rangle\}\rangle \,\langle \lambda w. w, \emptyset \rangle, []) \\
\text{(APP)} \longrightarrow\;\;& (\{x:=\langle \lambda z. z, \emptyset \rangle, y:=\langle \lambda w. w, \emptyset \rangle\}, x, [(\Box, \emptyset)]) \\
\text{(VAR)} \longrightarrow\;\;& (\{x:=\langle \lambda z. z, \emptyset \rangle, y:=\langle \lambda w. w, \emptyset \rangle\}, \langle \lambda z. z, \emptyset \rangle, [(\Box,\emptyset)]) \\
\text{(RET)} \longrightarrow\;\;& (\emptyset, \langle \lambda z. z, \emptyset \rangle, [])
\end{align*}$

No rule applies to the final state, because the control is a value and the stack is empty, so the result of the program is the closure $\langle \lambda z. z, \emptyset \rangle$.
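The rules can be checked mechanically with a small interpreter. The following is a hypothetical Python transcription, not code from the original course: `decompose` is an assumed helper that splits the control into an evaluation context and the next redex (following the context grammar, so the function position is evaluated before the argument), `step` applies VAR, LAM, APP, or RET, and `run` repeats until no rule applies. Stacks are Python tuples with the top frame first, and environments are dicts; both are assumptions. Running it on the example reproduces the sequence of rule names in the trace above and ends with $\langle \lambda z. z, \emptyset \rangle$.

```python
from __future__ import annotations
from dataclasses import dataclass

# Syntax (hypothetical encoding): Var, Lam, App, and Closure = <λx. e, r>.
@dataclass
class Var:
    name: str

@dataclass
class Lam:
    param: str
    body: object

@dataclass
class App:
    fun: object
    arg: object

@dataclass
class Closure:
    lam: Lam
    env: dict                                # identifier -> Closure

# Evaluation contexts: □, E e2, and v E.
@dataclass
class Hole:
    pass

@dataclass
class AppL:
    ctx: object
    arg: object

@dataclass
class AppR:
    fun: Closure
    ctx: object

def fill(E, e):
    if isinstance(E, Hole):
        return e
    if isinstance(E, AppL):
        return App(fill(E.ctx, e), E.arg)
    return App(E.fun, fill(E.ctx, e))

def decompose(e):
    """Return (E, redex) with fill(E, redex) == e, or None if e is already a value."""
    if isinstance(e, Closure):
        return None
    if isinstance(e, (Var, Lam)):
        return Hole(), e
    if not isinstance(e.fun, Closure):       # evaluate the function position first
        E, redex = decompose(e.fun)
        return AppL(E, e.arg), redex
    if not isinstance(e.arg, Closure):       # then the argument position
        E, redex = decompose(e.arg)
        return AppR(e.fun, E), redex
    return Hole(), e                         # a closure applied to a value

def step(state):
    """Apply one rule to (r, control, stack); return (rule, new_state), or None when halted."""
    r, c, s = state
    d = decompose(c)
    if d is None:                            # control is a value v
        if not s:
            return None                      # empty stack: v is the final answer
        (E, r_caller), rest = s[0], s[1:]
        return "RET", (r_caller, fill(E, c), rest)
    E, redex = d
    if isinstance(redex, Var):               # raises KeyError if the variable is unbound
        return "VAR", (r, fill(E, r[redex.name]), s)
    if isinstance(redex, Lam):
        return "LAM", (r, fill(E, Closure(redex, r)), s)
    clo, v = redex.fun, redex.arg
    callee_env = {**clo.env, clo.lam.param: v}                # r'[x:=v]
    return "APP", (callee_env, clo.lam.body, ((E, r),) + s)   # push (E, r)

def run(e):
    """Start from (∅, e, []) and step until halted; return (final value, rule names)."""
    state, trace = ({}, e, ()), []
    while (result := step(state)) is not None:
        rule, state = result
        trace.append(rule)
    return state[1], trace

program = App(App(Lam("x", Lam("y", Var("x"))), Lam("z", Var("z"))), Lam("w", Var("w")))
value, trace = run(program)
print(trace)   # ['LAM', 'LAM', 'APP', 'LAM', 'RET', 'LAM', 'APP', 'VAR', 'RET']
print(value == Closure(Lam("z", Var("z")), {}))   # True
```

Because `decompose` only looks inside applications, the body of a closure is never evaluated until the closure is applied, which matches the call-by-value context grammar.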

A parting question: is the ECD machine space efficient with regard to tail-recursive functions? If not, how would you modify it to be space efficient?

Euclid's algorithm for computing the greatest common divisor of two non-negative integers is beautiful because it is extremely simple and also captures an interesting property of linear equations: the equation ax + by = c has an integer solution (x, y) if and only if gcd(a,b) divides c, where gcd is Euclid's algorithm written below. Recall that x divides y means there exists some integer n such that xn = y.
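Before turning to the algorithm itself, the claim about ax + by = c can be spot-checked by brute force. The sketch below (the function name `has_integer_solution` is hypothetical) relies on the fact that a solution x can be shifted by multiples of b, adjusting y to compensate, so it suffices to try x in 0..b-1; it compares the answer with Python's `math.gcd` for small positive a and b. This is an empirical check, not a proof.

```python
import math

def has_integer_solution(a: int, b: int, c: int) -> bool:
    """Does a*x + b*y == c have a solution in integers x, y? Assumes a >= 1 and b >= 1.

    Replacing x by x + k*b and y by y - k*a leaves a*x + b*y unchanged, so it is
    enough to try x in 0..b-1 and ask whether b divides c - a*x.
    """
    return any((c - a * x) % b == 0 for x in range(b))

# Compare against "gcd(a, b) divides c" for small inputs (an empirical check, not a proof).
agree = all(
    has_integer_solution(a, b, c) == (c % math.gcd(a, b) == 0)
    for a in range(1, 13)
    for b in range(1, 13)
    for c in range(-30, 31)
)
print(agree)                           # True
print(has_integer_solution(4, 6, 5))   # False: gcd(4, 6) = 2 does not divide 5
print(has_integer_solution(4, 6, 10))  # True: for example x = 1, y = 1
```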

gcd(a,b) =
  if a == 0 then
     b
  else if b == 0 then
     a
  else if b < a then
     gcd(a - b, b)
  else
     gcd(a, b - a)
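For experimentation, here is a direct Python transcription of the pseudocode, assuming non-negative integer inputs (as written, the algorithm can fail to terminate on negative inputs, so the sketch rejects them). Because it subtracts rather than taking remainders, there is one recursive call per subtraction and the recursion depth grows with the inputs, so this version is only suitable for small numbers.

```python
import math

def gcd(a: int, b: int) -> int:
    """Subtraction-based Euclid, transcribed from the pseudocode above.

    Assumes a >= 0 and b >= 0. There is one call per subtraction, so the
    recursion depth grows with the size of the inputs; use small numbers.
    """
    if a < 0 or b < 0:
        raise ValueError("gcd expects non-negative integers")
    if a == 0:
        return b
    elif b == 0:
        return a
    elif b < a:
        return gcd(a - b, b)
    else:
        return gcd(a, b - a)

print(gcd(12, 18))  # 6
print(all(gcd(a, b) == math.gcd(a, b) for a in range(40) for b in range(40)))  # True
```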

Proving that Euclid's algorithm really works is a good exercise in applying strong induction. We are going to prove that gcd(a,b) is the greatest common divisor of a and b. To apply strong induction, we need to pick a number to do induction on. The numbers a and b are obvious candidates, but neither does the job. Consider the two recursive branches of the "if" expression in gcd. If we choose to do induction on a, then the final "else" branch will cause us trouble: a does not get smaller in gcd(a, b - a), so we can't apply the induction hypothesis. If we choose to do induction on b, then we'll have the same kind of trouble in the "b < a" branch, because b does not get smaller in gcd(a - b, b). We need some number that gets smaller in both branches. It turns out that a + b is such a number: in both recursive branches a and b are nonzero, so subtracting one from the other makes the sum strictly smaller.
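The choice of measure can be observed by recording the arguments of every call. The sketch below (the helper `gcd_calls` is hypothetical) runs the same subtraction steps as the pseudocode in a loop, one iteration per recursive call. For gcd(3, 7), neither a nor b decreases at every step, but a + b does.

```python
from __future__ import annotations

def gcd_calls(a: int, b: int) -> list[tuple[int, int]]:
    """Arguments (a, b) of each call made by the subtraction-based gcd, assuming a, b >= 0.

    A loop stands in for the recursion: each iteration corresponds to one
    call in the two branches that subtract.
    """
    calls = [(a, b)]
    while a != 0 and b != 0:
        if b < a:
            a = a - b          # gcd(a - b, b): b does not shrink
        else:
            b = b - a          # gcd(a, b - a): a does not shrink
        calls.append((a, b))
    return calls

calls = gcd_calls(3, 7)
print(calls)                       # [(3, 7), (3, 4), (3, 1), (2, 1), (1, 1), (1, 0)]
print([a for a, _ in calls])       # [3, 3, 3, 2, 1, 1]
print([b for _, b in calls])       # [7, 4, 1, 1, 1, 0]
print([a + b for a, b in calls])   # [10, 7, 4, 3, 2, 1]
```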

Theorem (Correctness of gcd).
gcd(a,b) is the greatest common divisor of a and b.
Proof.
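We proceed by strong induction on a + b. Here "greatest" is meant with respect to divisibility: a common divisor g of a and b is the greatest if every common divisor of a and b divides g. When trying to prove something about a function like gcd, it often helps to structure your proof in a way that mimics the definition of the function. That is, we'll do case analysis in the proof in a way that matches the cases in the definition of gcd.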
Case a = 0:
In this case gcd(a,b) = b. We know that b divides 0 and b divides b. Also, for any other divisor d of a and b, it is trivially true that d divides b. Thus, gcd(a,b) is the greatest common divisor of a and b.
Case not (a = 0) and b = 0:
The reasoning is the mirror image of the previous case and is left to the reader.
Case not (a = 0) and not (b = 0):
Without loss of generality, assume that b < a (the case where b is at least a is symmetric). Then gcd(a,b) = gcd(a - b, b). Because b is not 0, we have (a - b) + b < a + b, so by the induction hypothesis gcd(a - b, b) is the greatest common divisor of a - b and b, and so is its equal, gcd(a,b). Because gcd(a,b) divides both a - b and b, it divides their sum (a - b) + b = a, so gcd(a,b) is a common divisor of a and b. To finish, we need to show it is the greatest. Let d be an arbitrary common divisor of a and b. Then d divides the difference a - b, so d is a common divisor of a - b and b, and because gcd(a,b) is the greatest common divisor of a - b and b, we can conclude that d divides gcd(a,b). We have therefore proved that gcd(a,b) is the greatest common divisor of a and b.
QED.
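The theorem can also be spot-checked for small inputs using the definition of divisibility recalled earlier (so 0 divides only 0) and the sense of "greatest" used in the proof (every common divisor divides gcd(a,b)). The names `divides` and `is_greatest_common_divisor` are hypothetical, and an exhaustive test over a small range is evidence, not a substitute for the induction proof.

```python
def divides(x: int, y: int) -> bool:
    """x divides y iff x * n == y for some integer n; in particular 0 divides only 0."""
    return y == 0 if x == 0 else y % x == 0

def gcd(a: int, b: int) -> int:
    """Subtraction-based Euclid as a loop; assumes a >= 0 and b >= 0."""
    while a != 0 and b != 0:
        if b < a:
            a = a - b
        else:
            b = b - a
    return b if a == 0 else a

def is_greatest_common_divisor(g: int, a: int, b: int) -> bool:
    """g divides a and b, and every common divisor of a and b divides g."""
    if not (divides(g, a) and divides(g, b)):
        return False
    if a == 0 and b == 0:
        return g == 0          # every integer divides 0, so only 0 is divisible by all of them
    # Otherwise every positive common divisor is at most max(a, b), and a negative d
    # divides exactly what -d divides, so checking 1..max(a, b) covers all of them.
    return all(divides(d, g) for d in range(1, max(a, b) + 1)
               if divides(d, a) and divides(d, b))

print(all(is_greatest_common_divisor(gcd(a, b), a, b) for a in range(25) for b in range(25)))  # True
print(is_greatest_common_divisor(2, 12, 18))  # False: 6 divides 12 and 18 but not 2
```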

A proof of this theorem in Isabelle can be found here.

Jeremy Siek

Monday, December 21, 2009

The ECD Abstract Machine, A Programmer's Operational Semantics

Friday, December 04, 2009

Greatest Common Divisor