## Wednesday, August 29, 2012

### Rationale for "Type Safety in Five"

In my last post Type Safety in Five Easy Lemmas, I made a claim that the formulation of the operational semantics of a language makes a big difference regarding how many lemmas, and how tedious, the proof of type safety becomes. While I showed that the particular semantics that I used led to a simple proof, with just five easy lemmas, I didn't compare it to the alternatives. Perhaps there's an even better alternative! In this post I discuss the alternatives that I know about and why they lead to more lemmas and more tedium. This post is organized by design decision.

### Why A-normal form?

The syntax of the little language is in A-normal form, that is, it's flattened out so that expressions don't have sub-expressions, but instead, more complex computations have to be built up from several variable assignments. It's much more common to see type-safety proofs on languages that are not in A-normal form, that is, languages with arbitrary nesting of expressions. This nesting of expressions can be handled in one of two ways in the operational semantics: either with extra so-called congruence reduction rules or by using evaluation contexts. The congruence rule approach adds many more cases to the "$\longmapsto$ is safe" lemma (or equivalently, the progress and preservation lemmas). The evaluation contexts approach requires an extra lemma often called the unique decomposition lemma.

You might be thinking, real languages aren't in A-normal form, so one would really need to compile to A-normal form and prove that the compilation is type preserving (as we did in the Work Horse post). This is true, but I'd argue that it's nice to have a separation of concerns and take care of intra-expression control flow in one function and lemma instead of dealing with it throughout the operational semantics and proof of type safety.

### Why environments instead of substitution?

The operational semantics passed around environments that associated values with variables and used the $\mathit{lookup}$ function to find the value for a variable. The more common alternative is to perform substitution, that is, during a function call, to go through the body of the function replacing occurrences of the parameters with their arguments. I'll probably get flamed for saying this, but substitution has a lot of disadvantages compared to environments. First, the substitution function is difficult to define properly. Second, it requires that you prove that substitution is type preserving, which is rather tedious because it touches every kind of expression (or statement) in the language. In comparison, the lemma that we proved about the $\mathit{lookup}$ function didn't mention any expressions. Third, substitution is really slow. Technically, that doesn't effect the complexity of the proof, but it does affect your ability to test your theorems. As I mentioned in the Work Horse post, it's a good idea to implement an interpreter and test whether your theorems hold on lots of example programs before trying to prove your theorems. If your interpreter uses substitution, it will be very very slow. You might instead use environments, but keep substitution in your formal semantics, but then your interpreter and your formal semantics differ so that properties of one may not imply properties of the other.

### Why functions with names (for recursion) instead of fix?

The only way that I know of to implement fix, the fixpoint combinator, is with substitution, and we don't want substitution.

Also, I should highlight how we handle recursion in the operational semantics. First, the $\mathcal{V}$ function doesn't do much, it just captures the current environment. An alternative used by Milner and Tofte is to extend the environment with a binding for the function itself. This alternative makes the proof of type safety more challenging because, instead of using induction, you have to use coinduction. (See their paper Co-induction in relational semantics.) While coinduction is a beautiful concept, it's nice to be able to stick to good old induction. Now, because $\mathcal{V}$ doesn't do much, we need to make up for it in the transition relation. Notice that the transition for function call extends the environment with the function itself. I forget where I learned this trick. If you know of any references for this, please let me know!

### Conclusion

I think that's it for the important design decisions, but I may be forgetting something. If you have any questions regarding the rationale for anything else, please post a comment! So to sum up, if you want your type safety proof to be less tedious, I recommend using A-normal form and formulating your semantics as an environment-passing abstract machine. Oh, and if you want recursive functions, than extend your lambdas with a name and be lazy about extending the environment: wait until the function call to add the function itself to the environment.

## Friday, August 24, 2012

### Type Safety in Five Easy Lemmas

A language is type safe if running a program in the language cannot result in an untrapped error. A trapped error is something like an exception or a message-not-found error, that is, it's an error that is part of the defined behavior of the language. An untrapped error is something like a segmentation fault, that is, the program has run into an undefined state in which the language doesn't define what should happen. A segmentation fault is the underlying operating system catching the error, not the language itself. It is the untrapped errors that hackers take advantage of to break into computer systems. If you want to run untrusted code without getting into trouble, then it's a good idea to only run code that is in a type safe language!

Wright and Felleisen pioneered what has become the most flexible approach to proving that a language is type safe in their 1992 paper A Syntactic Approach to Type Soundness. The general idea is to define a small-step operational semantics for the language and show that if a program is well typed, then it is either done reducing or it can reduce to another well-typed program. The safety of an entire sequence of reductions can then be proved by induction. It is now common practice for language designers to prove type safety for new language designs, or at least for interesting subsets of the languages of interest. Proving type safety does not require advanced mathematics and isn't particularly challenging, except that it is often rather tedious, requiring many technical lemmas in which it is easy to make a mistake.

It turns out that the choice in formulation of the operational semantics can make a significant difference regarding how many lemmas, and how tedious, the proof of type safety becomes. In this blog post, I present an operational semantics that removes the need for many of the standard lemmas. The semantics is based on an abstract machine. The language will be the simply-typed lambda calculus in A-normal form, extended with recursive functions and a few primitive operators. Despite its small size, this language is Turing complete. This language is roughly equivalent to the target language discussed in my previous posts about Structural Induction and the ECD Machine. In the following I present the syntax, operational semantics, and type system of this language. Then I give the proof of type safety. The proof will rely on five lemmas, one lemma for each function or relation used in the operational semantics: the variable $\mathit{lookup}$ partial function, the $\delta$ partial function that defines the behavior of the primitive operators, the $\mathcal{V}$ partial function for evaluating expression, the transition relation $\longmapsto$ for the abstract machine, and the multi-transition relation $\longmapsto^{*}$. The lemma for the transition relation is the main event.

### Operational Semantics

We use the notation $\epsilon$ for the empty list. Given a list $L$, the notation $a \cdot L$ is a larger list with $a$ as the first element and the rest of the elements are the same as $L$. We use lists of key-value pairs (association lists) to represent mapping from variables to types (type environments) and variables to values (environments). The following lookup (partial) function finds the thing associated with a given key in an association list. Next we define the $\delta$ function, which gives meaning to the primitive operators. The function $\mathcal{V}$ maps expressions to values, using environment $\rho$. The values of this language are constants and closures, as defined below. The definition of $\mathcal{V}$ uses the $\mathit{lookup}$ function for variables, the $\delta$ function for primitive operations, and turns functions to closures. A stack $\kappa$ is a list of statement-environment pairs. The state $\sigma$ of the machine is simply a stack. The top of the stack contains the actively-executing statement and its environment. The relation $\longmapsto$ defines transitions between states. There are only three transition rules, for primitive operators, calling functions, and returning from functions. We define $\longmapsto^{*}$ in the usual way, as follows. The semantics of this language is given by the following $\mathit{eval}$ function. A state is final if it is of the form $(\mathbf{return}\,e,\rho) \cdot \epsilon$ and $\mathcal{V}(e,\rho) = v$ for some $v$.

### Type System

The types for the constants is given by the $\mathit{typeof}$ function. The $\Delta$ function maps a primitive operator and argument types to the return type. The following presents the type rules for expressions, definitions, and statements. Our proof of type safety will require that we define notions of well-typed values, well-typed environments, well-typed stacks, and well-typed states.

### Proof of Type Safety

The first lemma proves that when an operator is applied to values of the expected type, the result is a value whose type matches the return type of the operator.

Lemma ($\delta$ is safe)
If $\Delta(o,\overline{T}) = T$ and $\vdash v_i : T_i$ for $i \in \{1,\ldots,n\}$, then $\delta(o,\overline{v}) = v$ and $\vdash v : T$, for some $v$.

Proof. We proceed by cases on the operator $o$.

1. If the operator $o$ is $+$, then we have $T_1=T_2=\mathsf{Int}$ and $T=\mathsf{Int}$. Then because $\vdash v_i : \mathsf{Int}$, we know that $v_i = n_i$ for $i \in \{1,2\}$. Then $\delta(+,n_1,n_2) = n_1 + n_2$ and we have $\vdash (n_1 + n_2) : \mathsf{Int}$.
2. If the operator $o$ is $-$, then we have $T_1=\mathsf{Int}$ and $T=\mathsf{Int}$. Then because $\vdash v_1 : \mathsf{Int}$, we know that $v_1 = n$ for some $n$. Then $\delta(-,n) = -n$ and we have $\vdash -n : \mathsf{Int}$.
3. If the operator $o$ is $=$, then we have $T_1=T_2=\mathsf{Int}$ and $T=\mathsf{Bool}$. Then because $\vdash v_i : \mathsf{Int}$, we know that $v_i = n_i$ for $i \in \{1,2\}$. Then $\delta(=,n_1,n_2) = n_1 = n_2$ and we have $\vdash (n_1 = n_2) : \mathsf{Bool}$.
QED.

The second lemma says that if you have an environment that is well-typed with respect to the type environment, and if a variable x is associated with type T in the type environment, then looking up x in the environment produces a value that has type T.

Lemma ($\mathit{lookup}$ is safe)
If $\Gamma \vdash \rho$ and $\mathit{lookup}(x,\Gamma) = T$, then $\mathit{lookup}(x,\rho) = v$ and $\vdash v : T$ for some $v$.
Proof. We proceed by induction on $\Gamma \vdash \rho$.
1. Case $\epsilon \vdash \epsilon: \qquad (\Gamma=\epsilon, \rho = \epsilon)$
But then we have a contradition with the premise $\mathit{lookup}(x,\Gamma) = T$, so this case is vacuously true.
2. Case $\begin{array}{c}\vdash v : T' \quad \Gamma' \vdash \rho' \\ \hline (x',T') \cdot \Gamma' \vdash (x',v) \cdot \rho' \end{array}$:
Next we consider two cases, whether $x = x'$ or not.
1. Case $x = x'$: Then $\mathit{lookup}(x, \rho) = v$ and $T = T'$, so we conclude that $\vdash v : T$.
2. Case $x \neq x'$: Then $\mathit{lookup}(x,\rho) = \mathit{lookup}(x,\rho')$ and $\mathit{lookup}(x,\Gamma) = \mathit{lookup}(x,\Gamma') = T$. By the induction hypothesis, we have $\mathit{lookup}(x,\rho') = v$ and $\vdash v : T$ for some $v$, which completes this case.
QED.

The next lemma proves that a well-typed expression evaluates to a value of the expected type.

Lemma ($\mathcal{V}$ is safe)
If $\Gamma \vdash e : T$ and $\Gamma \vdash \rho$, then $\mathcal{V}(e,\rho) = v$ and $\vdash v : T$ for some $v$.
Proof. We proceed by cases on the expression $e$.
1. Case $e = c$:
From $\Gamma \vdash c : T$ we have $\mathit{typeof}(c) = T$ and therefore $\vdash c : T$. Also, we have $\mathcal{V}(c,\rho) = c$, which completes this case.
2. Case $e = x$:
From $\Gamma \vdash x : T$ we have $\mathit{lookup}(x,\Gamma) = T$. We then apply the lookup is safe lemma to obtain $\mathit{lookup}(x,\rho) = v$ and $\vdash v : T$ for some $v$. Thus, we have $\mathcal{V}(x,\rho) = v$ and this case is complete.
3. Case $e = \mathsf{fun}\,f(x{:}T_1) s$:
We have $\mathcal{V}(\mathsf{fun}\,f(x{:}T_1) s,\rho) = \langle\mathsf{fun}\,f(x{:}T_1) s, \rho \rangle$. From $\Gamma \vdash \mathsf{fun}\,f(x{:}T_1) s : T$ we have $(x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash s : T_2$, with $T = T_1 \to T_2$. Together with $\Gamma \vdash \rho$, we conclude that $\vdash \langle\mathsf{fun}\,f(x{:}T_1) s, \rho \rangle : T$.
QED.

Now for the fourth and most important lemma. This lemma states that if a state is well typed, then either the state is a final state or the state can transition to a new state of the same type. In the literature, this lemma is often split into two lemmas called progress and preservation. In the setting of an abstract machine, it's convenient to merge these two lemmas into one lemma. Note that the conclusion of this lemmas includes two alternatives: the state is final or it can take a step. The power of this lemma is that it rules out the third alternative, that the state is not final and can't take a step. Such a situation is referred to as "stuck" and corresponds to untrapped errors.

Lemma ($\longmapsto$ is safe)
If $\vdash \sigma : T$, then either $\sigma$ is a final state or $\sigma \longmapsto \sigma'$ and $\vdash \sigma' : T$.
Proof. Because $\vdash \sigma : T$ we know that $\sigma = (s,\rho) \cdot \kappa$, $\Gamma \vdash \rho$, $\Gamma \vdash s : T_1$, and $\vdash \kappa : T_1 \Rightarrow T$. We proceed by cases on $s$ because the transition rule that will apply depends primarily on $s$.
1. Case $s = (x = o(\overline{e}); s')$:
We have $\Gamma \vdash x = o(\overline{e}) : (x,T_1)$ and $(x,T_1) \cdot \Gamma \vdash s' : T_1$. So $\Delta(o,\overline{T}') = T_1$ and $\Gamma \vdash e_i : T'_i$ for $i \in \{ 1,\ldots,n\}$. Because $\mathcal{V}$ is safe, we have $\mathcal{V}(e_i,\rho) = v_i$ and $\vdash v_i : T'_i$ for $i \in \{ 1,\ldots,n\}$. Because $\delta$ is safe, we have $\delta(o,\overline{v}) = v$ and $\vdash v : T_1$ for some $v$. Thus, the current state takes the following transition. We have $(x,T_1) \cdot \Gamma \vdash (x,v) \cdot \rho$
and therefore $\vdash (s', (x,v) \cdot \rho) \cdot \kappa : T$
2. Case $s = (x = e_1(e_2); s')$:
From $\Gamma \vdash x = e_1(e_2); s': T_1$ we have $\Gamma \vdash x = e_1(e_2) : (x,T_2)$ and $(x,T_2) \cdot \Gamma \vdash s' : T_1$. Thus, we also have $\Gamma \vdash e_1 : T_3 \to T_2$ and $\Gamma \vdash e_2 : T_3$. Because $\mathcal{V}$ is safe, there exist $v_1$ and $v_2$ such that $\mathcal{V}(e_1,\rho) = v_1$, $\mathcal{V}(e_2,\rho) = v_2$, $\vdash v_1 : T_3 \to T_2$, and $\vdash v_2 : T_3$. The only way for $\vdash v_1 : T_3 \to T_2$ to be true is for $v_1$ to be a closure. That is, $v_1 = \langle \mathsf{fun}\,f(y{:}T_3) s'',\rho' \rangle$. This closure is well typed, so we have $\Gamma' \vdash \rho'$ and $(y:T_3) \cdot (f,T_3 \to T_2) \cdot \Gamma' \vdash s'' : T_2$. We have what we need to know that the current state $\sigma = (x = e_1(e_2); s', \rho) \cdot \kappa$ transitions as follows. We can deduce $\vdash \sigma : T_2 \Rightarrow T$
and $(y,T_3) \cdot (f,T_3\to T_2) \cdot \Gamma' \vdash (y,v_2)\cdot (f,v_1) \cdot \rho'$
so we have everything necessary to conclude
3. Case $s = (\mathbf{return}\,e)$:
If the stack $\kappa$ is empty, then $\sigma$ is a final state and this case is complete. If the stack is not empty, we have $\kappa = (s',\rho') \cdot \kappa'$. Then, because $\vdash \kappa : T_1 \Rightarrow T$, we have $s' = (x = e_1(e_2); s'')$, $\Gamma' \vdash \rho'$, $(x,T_1) \cdot \Gamma' \vdash s'' : T_2$, and $\vdash \kappa' : T_2 \Rightarrow T$. Because $\Gamma \vdash \mathbf{return}\,e : T_1$, we have $\Gamma \vdash e : T_1$ and therefore $\mathcal{V}(e,\rho) = v$ and $\vdash v : T_1$ for some $v$ (because $\mathcal{V}$ is safe). So the current state $\sigma$ takes the following transition: We have $(x,T_1) \cdot \Gamma' \vdash (x,v) \cdot \rho'$, which is the last thing we needed to conclude that $\vdash (s'', (x,v) \cdot \rho') \cdot \kappa' : T$.
QED.

Lemma ($\longmapsto^{*}$ is safe)
If $\vdash \sigma : T$ and $\sigma \longmapsto^{*} \sigma'$, then $\vdash \sigma' : T$.
Proof. The proof is by induction on $\sigma \longmapsto^{*} \sigma'$.
1. Case $\sigma \longmapsto^{*} \sigma$: We already have $\vdash \sigma : T$.
2. Case $\begin{array}{c} \sigma \longmapsto \sigma_1 \quad \sigma_1 \longmapsto^{*} \sigma' \\\hline \sigma \longmapsto^{*} \sigma'\end{array}$: Because $\longmapsto$ is safe and deterministic (the three transition rules obviously do not overlap), we have $\vdash \sigma_1 : T$. Then by the induction hypothesis, we conclude that $\vdash \sigma' : T$.
QED.

Theorem (Type Safety)
If $\epsilon \vdash s : T$, then either
1. $\mathit{eval}(s) = c$ and $\vdash c : T$, or
2. $\mathit{eval}(s) = \mathtt{fun}$ and $T = T_1 \to T_2$ for some $T_1,T_2$, or
3. $\mathit{eval}(s) = \bot$ (the program does not terminate).

Proof. Suppose that the program terminates. We have $(s,\epsilon) \cdot \epsilon \longmapsto^{*} \sigma$ for some $\sigma$ and $\neg \exists \sigma'.\, \sigma \longmapsto \sigma'$. Because $\longmapsto^{*}$ is safe, we have $\vdash \sigma : T$. Then because $\longmapsto$ is safe, we know that either $\sigma$ is final or it can take a step. But we know it can't take a step, so it must be final. So $\sigma = (\mathbf{return}\,e, \rho) \cdot \epsilon$ and $\Gamma \vdash e : T$. Then because $\mathcal{V}$ is safe, we have $\mathcal{V}(e,\rho) = v$ and $\vdash v : T$. If $v = c$, then we have $\mathit{eval}(s) = c$ and $\vdash c : T$. If $v = \langle \mathsf{fun}\,f(x{:}T) s',\rho'\rangle$, then $T = T_1 \to T_2$, $\mathit{eval}(s) = \mathtt{fun}$, and $\vdash \mathtt{fun}: T_1 \to T_2$.
QED.

That's it! Type safety in just five easy lemmas.

## Monday, August 13, 2012

### The Work Horse of PL Theory: Structural Induction

The main idea in the Crash Course post was that we can define infinite sets and relations by giving a list of rules (some of which are recursive), that is, by giving an inductive definition. Such inductive definitions are used everywhere. In the crash course we used them to define the syntax of the language, the operational semantics, and the type system.

Once you define a programming language, the next step is to implement an interpreter or compiler and write lots of programs. The point of this, from the design point of view, is to make sure that your definition actually matches your intuitions about how your language should behave. Once you're reasonably happy with the definition of the language, you might want to gain further confidence in the design by proving that the language has the properties that you think it should. For example, you might want reduction to be deterministic, so you would try prove that for each program there is one unique reduction sequence. Another property that you might want is type safety, which means that values aren't misused but also includes things like memory safety (no segmentation faults, buffer overruns, etc.).

It may seem rather daunting to consider proving something about all possible programs. After all, there's an infinite number of them! Fortunately, if we use inductive definitions to specify our languages, then there's a proof technique called structural induction that we can use to prove things about them.

### A Simple Example

Let's begin with an example of using structural induction on the relation $R$ that we defined in the Crash Course. Recall that this relation was defined by the following two rules.

• $(0,1) \in R$.
• For any natural numbers $n$ and $m$, if $(n,m) \in R$, then $(n+1,m+1) \in R$.
Also recall that we had informally described this relation as mapping each natural number to its successor. The following is the formal version of that statement. Now how would we go about proving this statement mathematically?
A proof by structural induction creates a recipe for proving the property of interest for any element of the set. The recipe includes a case for each rule in the inductive definition of the set. Because every element in the set is the root of a tree of rule applications (a derivation), the recipe can be applied to each node in the tree, starting at the leaves, to eventually build a proof for the element at the root.

But it's easy to get lost in generalities, so let's get back to our example and build a recipe for generating a proof that $y = x + 1$ for any pair $(x,y) \in R$.

1. We need to give a proof that $y = x + 1$ for the case where $x=0,y=1$, so by simple arithmetic we have $1 = 0 + 1$.
2. The second rule in the definition of $R$ is more interesting. We're going to give a recipe for creating a proof of $y = x + 1$ for the element in the conclusion of this rule, $(n+1,m+1) \in R$, out of the proof of $y = x + 1$ for the premise of this rule, $(n,m) \in R$. That is, we need to take a proof of $m=n+1$ and create a proof of $m+1=(n+1)+1$. Of course, that's really easy: we just add one to both sides!

So we've created a recipe that includes cases for all of the rules that defined the relation $R$. Furthermore, in the second case, we didn't assume anything about $n$ or $m$, so we can apply the second case in lots of different situations. Now let's see how this recipe can be used to create proofs for elements of $R$. In the Crash Course, we built a derivation that showed $(2,3) \in R$. The derivation started with $(0,1) \in R$ (by rule 1), then had $(1,2) \in R$ (by rule 2), and finally $(2,3) \in R$ (by rule 2). Now let's apply our proof recipe to each of these steps. We have $1 = 0 + 1$ by case 1 above, then $2 = 1 + 1$ by case 2, and finally $3 = 2 + 1$ again by case 2. Thus, we've seen that our property holds for one element of the relation, namely for $(2,3)$. However, this process would have worked for any element of $R$. Thus, the above two recipes can be taken as enough to prove that $y = x + 1$ for every pair $(x,y) \in R$.

### A More Interesting Example

Let's now move on to an example of a proof about programming languages. In the post My new favorite abstraction machine, I wrote down a translation that puts the simply-typed lambda calculus (STLC) into A-normal form (ANF) and then defined an abstract machine that operates on the ANF. The proof of type safety for this definition of the STLC includes two parts: the first shows that the translation of a well-typed program always produces a well-typed program in ANF and the second part proves that the ANF is type safe, that is, the abstract machine doesn't get stuck. Let's prove the first part by structural induction. Formally, what we'd like to prove is where $\emptyset \vdash e : T$ refers to the type system for STLC (see the definition here) and $\emptyset; T \models \mathit{ToANF}(e)$ is the type system for ANF statements. We have not yet given a definition of the type system for ANF statements, definitions, or expressions, so we must pause here to do so. We make one small change to the syntax of ANF statements to make the proof go through easier, allowing a sequence of declarations in a statement: replacing the statement $d; s$ with $\overline{d}; s$.

The ToANF function is defined in terms of a recursive function ANF. It typically helps to prove something separately for each function, so let's try to come up with a lemma about ANF that will help prove that ToANF preserves types. Looking at the definition of ToANF, what we'd like to be true is the following property. Let's see if we can prove this by structural induction. (We won't be able to but the failure will point out what minor modifications need to be made.)

#### A First Attempt

Our first choice is what to do structural induction on. There's several things in the above statement defined using inductive definitions. The expression $e$ and the typing derivation $\emptyset \vdash e : T$ are both good candidates because they appear to the left of the implication. Let's do induction on the typing derivation because that tends to give us more ammunition than induction on the structure of expressions.

There are five rules that define the typing relation for the STLC, so we'll need five cases.

1. Case The premise $(x,T) \in \emptyset$ is false, so this case is vacuously true. However, this should make us start to worry, something fishy is going on.
2. Case The proof recipe for this case needs to take a proof for the premise and create a proof for the conclusion. However, our property doesn't apply to the premise because the type environment $\emptyset,x{:}T_1$ isn't empty. At this point we're stuck and we must have the good sense to give up this proof attempt.

#### A Successful Structural Induction

But what have we learned? What we're trying to prove must not fix the type environment as being empty, but needs to work with any type environment. Let's reformulate the property with an arbitrary $\Gamma$ replacing the $\emptyset$. Let's try the proof by induction on $\Gamma \vdash e : T$ again.

1. Case We have $\mathit{ANF}(x) = (x,\epsilon)$, so we need to show that $\mathbf{return}\,x$ is well typed. From $(x,T) \in \Gamma$ we have $\Gamma \models x : T$ and therefore $\Gamma; T \models \mathbf{return}\,x$.
2. Case We have $\mathit{ANF}(\lambda x{:}T_1.\,e') = (F, \epsilon)$ where $F = \lambda x{:}T_1.\, (\overline{d}; \mathbf{return}\,e'')$ and $(e'',\overline{d}) = \mathit{ANF}(e')$. So we need to prove that $\Gamma; T_1 \to T_2 \models \mathbf{return}\,F$ and so it suffices to prove that $\Gamma \models F : T_1 \to T_2$. Our recipe for this case takes as input a proof for the premise, so we know that (The statement is called an induction hypothesis. It's always a good idea to write out an induction hypothesis carefully because it's easy to get it wrong. To get an induction hypothesis right, just instantiate the forall in the property you are trying to prove with the appropriate variables from the premise. Do this mechanically; thinking about it too much can lead to errors! Also, always tell your reader when you're using the induction hypothesis; think of it as the climax of the case!)

Because $\mathit{ANF}$ is a function (in the mathematical sense), we know that the $e''$ and $\overline{d}$ in the existential must be the same as the variables of the same name in our current proof. So our induction hypothesis tells us that $\Gamma,x{:}T_1; T_2 \models (\overline{d}; \mathbf{return}\, e'')$. We then just apply the typing rule for lambda's to prove our goal, that $\Gamma \models F : T_1 \to T_2$.

3. Case We have $\mathit{ANF}(e_1\,e_2) = (x,\overline{d}_1 \overline{d}_2; x=e'_1(e'_2))$, where $(e'_1,\overline{d}_1) = \mathit{ANF}(e_1)$, $(e'_2,\overline{d}_1) = \mathit{ANF}(e_2)$, and $x$ is a fresh variable. We need to prove that We have two induction hypotheses, one for each premise: and So we have $\Gamma \models \overline{d}_1 \Rightarrow \Gamma_1$ and $\Gamma \cup \Gamma_1 \models e'_1 : T_1 \to T_2$ from the first induction hypothesis (IH for short). We have $\Gamma \models \overline{d}_2 \Rightarrow \Gamma_2$ and $\Gamma \cup \Gamma_2 \models e'_2 : T_1$ from the second IH. Looking back at what we need to prove, there's a bit of a jump that needs to happen to get us from $\overline{d}_1$ and $\overline{d}_2$ being well-typed separately to their concatenation being well typed. Let's posit the following lemma: under the assumption that the variables on the left-hand sides are all distinct. Also, we need to show that Again there's a bit of a jump that needs to happen. This time, the problem is that we know $e'_1$ and $e'_2$ are well typed in smaller environments (missing $\Gamma_2$ in the former case and missing $\Gamma_1$ in the later). So we need a lemma that tells us it is ok to add more variables to the environment so long as those variables to don't interfere with variables that are already there. This lemma comes up so frequently it has a special name. It is know as the Weakening Lemma. Applying this lemma twice, we have and which gives us and then applying the concatenation lemma again, we have All that remains is to deal with the final $\mathbf{return}\,x$. But it's easy to see that so we can conclude this case, having shown that
So far we've handled three of the five cases, so the proof is not yet complete. I'd encourage the reader to try and fill in the two remaining cases (for constants and primitive operators) and the two lemmas that we used. (Hint: use structural induction to prove the lemmas.)

### Conclusion

If you define a set using an inductive definition, then you can prove stuff about every element of the set (even if the set is infinite!) by structural induction. To do this, create a recipe that includes a case for each rule in the inductive definition. Your recipe for each case takes as input the proof for the premises and it should output a proof for the conclusion of the rule. That's all it takes! Just make sure not to skip any cases and be extra careful with the induction hypotheses!

## Friday, August 10, 2012

### Compositional Separate Compilation, Take Three

In the post Separate Compilation, Take Two, Compositionally, I tried to write down a trace-based semantics for the lambda calculus that was compositional, that is, a semantics that could tell us the meaning of an individual compilation unit. The result was a semantics that wasn't quite right. A call to an external function could return a function masquerading as an internal function but that was never created by the compilation unit. Here I solve the problem in a fairly straightforward way, by threading an internal function map through the semantics.

### Syntax

The syntax is the same as before:

### Traces

We make a few changes to the definition of traces and behavior. Our representation of a function now has an extra parameter for the internal function map. We only have one kind of action, the function-call action, whose argument and result are now required to be exportables. A function map is just a map from function names to functions. Behavior is now a set of 3-tuples, whose parts include the trace, result value, and updated internal function map.

### Trace Semantics

To handle external function calls, we'll need to be able to convert from values to exportables and back again. If the value being exported is a function, we need to create a new name for it and add it to the internal function map. On the way back, if the function reference is to an internal function, we need to look it up in the internal function map. The $\mathit{export}$ and $\mathit{import}$ functions handle this logic. Both of these functions act like the identify function on integers and external functions. The meaning function for expressions now takes two more parameters, the name of the current compilation unit and the internal function map. As before, the result is the behavior of the expression. where $\mathcal{I}$ handles calls to internal functions and $\mathcal{E}$ handles calls to external functions To wrap things up, we define the meaning of a compilation unit as follows, with the internal function map not part of the observable behavior.

## Tuesday, August 07, 2012

### How to Prove It

One of the better books that introduces rigorous proof is How to Prove It by Daniel J. Velleman. Here I'm going to discuss one of the key skills from that book, which is recognizing what form of proof is appropriate for the kind of thing you want to prove. This skill is the easy part of doing proofs, but I find that many students are too slow in this regard. Analogous to a computer game, one needs to become fast with the button combinations. You shouldn't have to think consciously about them. The same is true for the basics of proof. There's a bunch of stuff that should be automatic, at the tips of your fingers. Once this is accomplished, your mind is then free to think about the truly hard aspects of more difficult proofs.

For today, what we're going to prove is formulas in propositional logic. These formulas form a little language, defined by the grammar below. We'll use lowercase letters to represent propositional variables. These variables represent statements such as "socrates is mortal". We'll use uppercase letters as placeholders for formulas. The kinds of formulas, working left to right, are variables, the trivial formulas $\mathit{True}$ and $\mathit{False}$, negation ($\neg$), logical and ($\land$), logical or ($\lor$), and implication ($\to$). We refer to the symbols $\land$, $\lor$, etc. as logical connectives. The following is an example of a formula:

### Identify the Main Connective

The first skill needed to recognize what form of proof is needed to prove a formula, such as the one above, is the ability to identify the main connective of a formula. The main connective is at the root of the parse tree for the formula. One can construct a parse tree by starting at the leaves and working your way towards the root. The leaves are the formulas that don't have any sub-parts. Constructing a parse tree is equivalent to adding parentheses to every sub-part of a formula, which is what we'll do here. In the above formula, the leaves are: $p,q,q,p$. Working from the first $p$ and $q$ we have $(p \to q)$. (We just used the grammar rule $P ::= P \to Q$.) Working our way from the last $p$ we have $(\neg p)$. For the second occurrence of $q$ we have two choices, we can either go to the left to form $(\neg q)$ or we can go to the right and form $(q \to (\neg p))$. Here we need to use the precedence of the connectives to decide what to do. The grammar above lists the formulas in the order of their precedence. So $\neg$ is higher precedence than $\to$, so we go to the left and form $(\neg q)$. So far, we have parses for Next, we need to decide whether to combine $(p \to q)$ and $(\neg q)$ with the connective $\land$, or whether to combine $(\neg q)$ and $(\neg p)$ with the connective $\to$. Again, we consider the precedences, and here $\land$ has higher precedence than $\to$. So we have the parses: The last step is to parse the last connective, $\to$, and complete the tree: The root of the tree is the last connective that we added to complete the parse, in this case, the right-most $\to$. So we have found the main connective. It's also important to know the two sub-trees of the main connective. In this case,

### Prove It!

Now that we've found the main connective, we just do a simple table look-up to figure out what form of proof we should use. Here's the table, which we call the prove-it table.
1. $\neg P$: assume $P$ and prove $\mathit{False}$.
2. $P \land Q$: prove $P$ and also prove $Q$.
3. $P \lor Q$: prove one of $P$ or $Q$.
4. $P \to Q$: assume $P$ and then prove $Q$.
Our main connective is $\to$, so we can begin our proof using prove-it rule 4:
Assume $(p \to q) \land (\neg q)$.
$\vdots$
Therefore $(\neg p)$.
Therefore $((p \to q) \land (\neg q)) \to (\neg p)$.
The $\;\vdots\;$ represents the part of the proof that we haven't finished yet. The table told use to assume $P$, meaning that we should assume the first sub-tree under the implication, which in this case was $((p \to q) \land (\neg q))$. We need to prove $Q$, the second sub-tree, which in this case is $(\neg p)$.

At this point we need to prove $(\neg p)$, so we repeat the above process. The main connective is $\neg$, so rule 1 of the prove-it table tells us to assume $p$ and prove $\mathit{False}$.

Assume $((p \to q) \land (\neg q))$.
Assume $p$.
$\vdots$
Therefore $\mathit{False}$.
Therefore $(\neg p)$.
Therefore $((p \to q) \land (\neg q)) \to (\neg p)$.
At this point we're a bit stuck, there are no more main connectives to find in the simple formula $\mathit{False}$. But that's ok, we have lots of assumptions to use!

Aside: the fancy name for a prove-it rule is an introduction rule.

### Use It!

Using assumptions and previously-proven facts is a lot like proving formulas: find the main connective and then do a table look-up. The only difference is that we use a different table, shown below, which we call the use-it table.

1. From $\neg P$ and $P$: have $Q$. (You get to choose any formula!)
2. From $P \land Q$: have $P$.
3. From $P \land Q$: have $Q$.
4. From $P \lor Q$, $P \to R$, and $Q \to R$: have $R$.
5. From $P \to Q$ and $P$: have $Q$.

We have the assumption $((p \to q) \land (\neg q))$, whose main connective is $\land$, so we apply the use-it rules 2 and 3.

Assume $((p \to q) \land (\neg q))$.
Have $p \to q$.
Have $\neg q$.
Assume $p$.
$\vdots$
Therefore $\mathit{False}$.
Therefore $(\neg p)$.
Therefore $((p \to q) \land (\neg q)) \to (\neg p)$.
Next we apply use-it rule 5 with $p \to q$ and $p$ to deduce $q$. (The fancy name for use-it rule 5 is modus ponens.)
Assume $((p \to q) \land (\neg q))$.
Have $p \to q$.
Have $\neg q$.
Assume $p$.
Have $q$.
$\vdots$
Therefore $\mathit{False}$.
Therefore $(\neg p)$.
Therefore $((p \to q) \land (\neg q)) \to (\neg p)$.
We now have $q$ and $\neg q$, so we can conclude anything at all. (We've reached a contradiction.) We want to prove $\mathit{False}$, so that's what we'll choose. Thus, we don't need any more steps in the proof, so we remove the vertical dots and our proof is complete.
Assume $((p \to q) \land (\neg q))$.
Have $p \to q$.
Have $\neg q$.
Assume $p$.
Have $q$.
Therefore $\mathit{False}$.
Therefore $(\neg p)$.
Therefore $((p \to q) \land (\neg q)) \to (\neg p)$.
Also, the name for the formula we just proved is modus tolens.

In general, the fancy name for a use-it rule is an elimination rule.

### More Practice

In the above proof, we didn't get a chance to apply the use-it rule 4 or prove-it rule 3, so let's look at another proof. Let's prove Skipping a couple steps that we already know how to do, we arrive at the following partial proof.

Assume $(p \lor q) \land (\neg p \lor r)$.
Have $p \lor q$.
Have $\neg p \lor r$.
$\vdots$
Therefore $q \lor r$.
Therefore $(p \lor q) \land (\neg p \lor r) \to q \lor r$.
We have two formulas whose main connective is a logical or, so we need to apply use-it rule 4. This rule is also know as reasoning by cases. We have to choose between using fact $p \lor q$ or the fact $\neg p \lor r$. Sometimes choices like this matter, and you might end up wasting time by making the wrong choice, but that's water under the bridge and you can always backtrack and try the other option. Also, as you gain in skill you'll be able to look a bit into the future, like a good chess player, and figure out which choice is better. Sometimes a choice like this doesn't matter because you're going to end up using both assumptions and it doesn't matter what order to use them in.

Here we're going to use the first fact first, $p \lor q$. Now, to apply use-it rule 4 to this assumption, we need two more things that we don't have. We need $p \to R$ and $q \to R$. We get to choose what $R$ stands for. Once we've applied rule 4, we'll know this $R$ is true, so let's choose what we're trying to prove, $q \lor r$. Our partial proof expands to the following.

Assume $(p \lor q) \land (\neg p \lor r)$.
Have $p \lor q$.
Have $\neg p \lor r$.
Assume $p$.
$\vdots$
Therefore $q \lor r$.
Therefore $p \to (q \lor r)$.
Assume $q$.
$\vdots$
Therefore $q \lor r$.
Therefore $q \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $(p \lor q) \land (\neg p \lor r) \to q \lor r$.
The second case, in which we assume $q$ is easy. We just apply prove-it rule 3. So we've connected those dots.
Assume $(p \lor q) \land (\neg p \lor r)$.
Have $p \lor q$.
Have $\neg p \lor r$.
Assume $p$.
$\vdots$
Therefore $q \lor r$.
Therefore $p \to (q \lor r)$.
Assume $q$.
Therefore $q \lor r$.
Therefore $q \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $(p \lor q) \land (\neg p \lor r) \to q \lor r$.
To deal with the other set of vertical dots, the way forward is not so obvious. But we do have one more fact to use: $\neg p \lor r$. To apply use-it rule 4, we'll need to prove a pair of implications. Again, let's choose $q \lor r$ as the $R$.
Assume $(p \lor q) \land (\neg p \lor r)$.
Have $p \lor q$.
Have $\neg p \lor r$.
Assume $p$.
Assume $\neg p$.
$\vdots$
Therefore $q \lor r$.
Therefore $\neg p \to (q \lor r)$.
Assume $r$.
$\vdots$
Therefore $q \lor r$.
Therefore $\neg p \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $p \to (q \lor r)$.
Assume $q$.
Therefore $q \lor r$.
Therefore $q \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $(p \lor q) \land (\neg p \lor r) \to q \lor r$.
The first set of dots can be eliminated by use-it rule 1 and the second set of dots can be eliminated by prove it rule 3. So our completed proof is as follows.
Assume $(p \lor q) \land (\neg p \lor r)$.
Have $p \lor q$.
Have $\neg p \lor r$.
Assume $p$.
Assume $\neg p$.
Therefore $q \lor r$.
Therefore $\neg p \to (q \lor r)$.
Assume $r$.
Therefore $q \lor r$.
Therefore $\neg p \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $p \to (q \lor r)$.
Assume $q$.
Therefore $q \lor r$.
Therefore $q \to (q \lor r)$.
Therefore $q \lor r$.
Therefore $(p \lor q) \land (\neg p \lor r) \to q \lor r$.
Aside: The formula we have just proved is known as the resolution rule.

### But this doesn't look like a real proof!

If you've read one or more proofs in textbooks, you might be thinking at this point that what you've seen above doesn't look like the proofs you've seen in the past. That's because seasoned mathematicians expect that you can do proofs such as the ones above in your head, and don't write down those parts of the proofs. That is, they skip lots of steps and only write down the really important stuff. So why is it a good idea to start by writing down proofs such as the ones above? It's great practice! It's the only way I know that you'll gain enough experience that you'll be able do these simple proofs in your head with confidence.

### Conclusion

The basics of proof is simple. Look at what you need to prove, find the main connective, and find what to do in the prove-it table. Repeat until you run into a dead end. Then look at what you know (assumptions and facts you've already proved), find the main connectives in those formulas, and apply the use-it rules. Hopefully that let's you connect all the dots! Happy propositional proving.

## Sunday, August 05, 2012

### Separate Compilation, Take Two, Compositionally

A couple weeks ago I wrote about what separate compilation means for the lambda calculus. To capture the observable behavior of the compilation units, I defined an abstract machine that produced a trace of the calls between them. The idea was that an optimizing compiler has free reign within a compilation unit, but that it must make and respond to the appropriate external function calls. I noted that while the abstract machine approach got the job done, it would be preferable to map directly from the syntax of the lambda calculus to a trace instead of indirectly via the abstract machine. Further, we would like to know what a a single compilation unit means in isolation, but the abstract machine only gives a semantics to whole programs.

A semantics that can give meaning to parts of a program is, roughly speaking, called a compositional semantics. A compositional semantics defines the meaning of each construct of the language in terms of the meaning of the sub-parts of the construct. For example, a conditional 'if' is defined in terms of the meaning of the conditional and the two branches. One of the main styles of semantics that is compositional is denotational semantics. Such a semantics maps from the syntax of the program to some mathematical constructs that specify the behavior of the program (typically a function from inputs to outputs). A big-step semantics also maps from syntax to mathematical constructs, but big-step semantics are often not compositional. For example, the meaning of a lambda term is not defined in terms of the meaning of its body. Instead, a big-step semantics maps a lambda term to a closure, which is still syntax. (For reference, see Big-step, diverging or stuck?.)

Here I'm going to give a denotational semantics for the separately compiled lambda calculus, mapping from the syntax of a compilation unit to a set of all possible execution traces.

### Syntax

The following will serve as the syntax for the separately-compiled lambda calculus. The main addition is the notation $c.f$ for referring to functions in other compilation units. Also, each compilation unit consists of a function table that maps function names to lambdas and a whole program is a mapping from compilation unit names to function tables.

### Traces

A trace is a sequence of actions. In our setting, actions are function calls or producing a result value. For a function call, we record the name of the function, the argument, and the return value. We'll need three kinds of values: integers, external functions, and functions. From within one compilation unit, we don't know what the external functions do, so we treat them symbolically. On the other hand, functions from within the current unit are known and we represent them directly in terms of their observable behavior, that is, as a mapping from values to a set of traces. The definitions of the sets $\mathsf{Val}$, $\mathsf{Fun}$, $\mathsf{Act}$, and $\mathsf{Beh}$ are mutually recursive. One doesn't do that with run-of-the-mill set theory. Here we must invoke the inverse limit construction to obtain sets that solve the above equations. Thanks Dana Scott! (If you want to read more about this, I recommend starting with Chapter 11 of David A. Schmidt's book on denotational semantics.)

We'll represent function tables with the following function maps, and programs with the following unit maps.

### Preview of the Semantics

Before giving the general definition of the semantics, let's consider a couple examples. We'll write $\mathcal{M}(e,\rho)$ for the behavior (set of traces) of expression $e$ in environment $\rho$. (Environments map variables to values.)

First, let's consider the semantics of the identify function. It's behavior is captured by a single trace which consists of the single action of producing a function $\varphi_{\mathit{id}}$. Of course, this function maps every value to itself.

Next let's consider a call to an external function. We don't know anything about the other compilation units, so we'll include all possible return values as possible behavior. For example, we'll have

### Semantics as Traces

The interesting part of the semantics is in function application. The result from $e_1$ may be either an external function or an internal function. The helper function $\mathcal{E}$ handles the external calls and $\mathcal{I}$ the internal calls. where We can now define the meaning of a compilation unit, that is, the meaning of a function table, as follows.

### Meaning of Whole Programs via Composition

That was the main event, but let's go a bit further and see how we can give the meaning of a program in terms of the meanings of the compilation units. To do this, we need a way of filtering out traces that include calls to external functions that couldn't happen, that is, calls whose argument and return values don't match what you'd actually get from the given function in the other compilation unit. To do this, we'll need to pass values back and forth between units. If the values are just numbers, this is easy, but for functions things are a bit more complex because we need to convert back and forth from the symbolic (hidden) representation of a function to the value to behavior mapping. First, we define a reg function for registering a function under a fresh name. We also need a way to convert from a symbolic function back to the value to behavior mapping, which we accomplish with the following up-arrow function. We now define the following predicates on actions and sequences of actions. Next we define a filter that takes a behavior (set of traces) and keeps only those traces that can really happen. Filtering extends naturally to functions and unit maps.

With filtering in place, we are now ready to give the meaning of a whole program in terms of the meaning of each compilation unit. where

### Critique

I'm not completely happy with the above semantics. The reason is that the behavior of an expression includes receiving internal functions that it never created. Hopefully I'll read about or discover a solution to this!

### Historical note: Algol 60 was an early metaprogramming lang.

Some literature browsing this week brought me back to Algo 60, and for some reason, instead of blowing it off as an old dead call-by-name language, I read a bit more this time. Perhaps what caught my eye was a statement by Reynolds that Algo 60 has first-class functions and stack allocation, two features that are difficult to mix in a safe way. So I read chapter 19 of Theories of Programming Languages by Reynolds, which then pointed me to On the Orthogonality of Assignments and Procedures in Algol by Weeks and Felleisen.

It turns out that Algo 60 was an early meta-programming language in that it provided two stages of computation. The first stage was essentially a lambda calculus (pure and functional) whereas the second stage is a simple imperative language. This reminds me of C++, which has a nearly pure functional language in its template system as the first stage and an imperative language in its second stage. Algol 60 is also similar to MetaML in that the type system helps enforce the distinction between the two stages, with expressions and statements of the second stage marked as phrase types, analogous to the code type of MetaML.

With regards to Algo 60's first-class functions and stack allocation, it's no wonder that the combination doesn't cause trouble because they don't exist at the same time in Algo 60: functions live in stage 1 and stack allocation occurs in stage 2.

On the one hand, I'm intrigued by Algo 60's 2-stage design, and think that it could handle a very large number of the programming situations that I care about. On the other hand, my default position regarding language features is that they should be first class, so I find it disturbing that functions are not first-class in Algo 60's second stage (only in it's first stage).