Jeremy Siek: The Work Horse of PL Theory: Structural Induction

The main idea in the Crash Course post was that we can define infinite sets and relations by giving a list of rules (some of which are recursive), that is, by giving an inductive definition. Such inductive definitions are used everywhere. In the crash course we used them to define the syntax of the language, the operational semantics, and the type system.

Once you define a programming language, the next step is to implement an interpreter or compiler and write lots of programs. The point of this, from the design point of view, is to make sure that your definition actually matches your intuitions about how your language should behave. Once you're reasonably happy with the definition of the language, you might want to gain further confidence in the design by proving that the language has the properties that you think it should. For example, you might want reduction to be deterministic, so you would try prove that for each program there is one unique reduction sequence. Another property that you might want is type safety, which means that values aren't misused but also includes things like memory safety (no segmentation faults, buffer overruns, etc.).

It may seem rather daunting to consider proving something about all possible programs. After all, there's an infinite number of them! Fortunately, if we use inductive definitions to specify our languages, then there's a proof technique called structural induction that we can use to prove things about them.

A Simple Example

Let's begin with an example of using structural induction on the relation $R$ that we defined in the Crash Course. Recall that this relation was defined by the following two rules.

$(0,1) \in R$ .
For any natural numbers $n$ and $m$ , if $(n,m) \in R$ , then $(n+1,m+1) \in R$ .

Also recall that we had informally described this relation as mapping each natural number to its successor. The following is the formal version of that statement. $\forall x\, y.\, (x,y) \in R \to y = x + 1$ Now how would we go about proving this statement mathematically?

A proof by structural induction creates a recipe for proving the property of interest for any element of the set. The recipe includes a case for each rule in the inductive definition of the set. Because every element in the set is the root of a tree of rule applications (a derivation), the recipe can be applied to each node in the tree, starting at the leaves, to eventually build a proof for the element at the root.

But it's easy to get lost in generalities, so let's get back to our example and build a recipe for generating a proof that $y = x + 1$ for any pair $(x,y) \in R$ .

We need to give a proof that $y = x + 1$ for the case where $x=0,y=1$ , so by simple arithmetic we have $1 = 0 + 1$ .
The second rule in the definition of $R$ is more interesting. We're going to give a recipe for creating a proof of $y = x + 1$ for the element in the conclusion of this rule, $(n+1,m+1) \in R$ , out of the proof of $y = x + 1$ for the premise of this rule, $(n,m) \in R$ . That is, we need to take a proof of $m=n+1$ and create a proof of $m+1=(n+1)+1$ . Of course, that's really easy: we just add one to both sides!

So we've created a recipe that includes cases for all of the rules that defined the relation $R$ . Furthermore, in the second case, we didn't assume anything about $n$ or $m$ , so we can apply the second case in lots of different situations. Now let's see how this recipe can be used to create proofs for elements of $R$ . In the Crash Course, we built a derivation that showed $(2,3) \in R$ . The derivation started with $(0,1) \in R$ (by rule 1), then had $(1,2) \in R$ (by rule 2), and finally $(2,3) \in R$ (by rule 2). Now let's apply our proof recipe to each of these steps. We have $1 = 0 + 1$ by case 1 above, then $2 = 1 + 1$ by case 2, and finally $3 = 2 + 1$ again by case 2. Thus, we've seen that our property holds for one element of the relation, namely for $(2,3)$ . However, this process would have worked for any element of $R$ . Thus, the above two recipes can be taken as enough to prove that $y = x + 1$ for every pair $(x,y) \in R$ .

A More Interesting Example

Let's now move on to an example of a proof about programming languages. In the post My new favorite abstraction machine, I wrote down a translation that puts the simply-typed lambda calculus (STLC) into A-normal form (ANF) and then defined an abstract machine that operates on the ANF. The proof of type safety for this definition of the STLC includes two parts: the first shows that the translation of a well-typed program always produces a well-typed program in ANF and the second part proves that the ANF is type safe, that is, the abstract machine doesn't get stuck. Let's prove the first part by structural induction. Formally, what we'd like to prove is $\forall e\,T. (\emptyset \vdash e : T) \to (\emptyset; T \models \mathit{ToANF}(e))$ where $\emptyset \vdash e : T$ refers to the type system for STLC (see the definition here) and $\emptyset; T \models \mathit{ToANF}(e)$ is the type system for ANF statements. We have not yet given a definition of the type system for ANF statements, definitions, or expressions, so we must pause here to do so. We make one small change to the syntax of ANF statements to make the proof go through easier, allowing a sequence of declarations in a statement: replacing the statement $d; s$ with $\overline{d}; s$ . $\begin{gather*} \Gamma \models c : \mathit{typeof}(c) \quad \frac{ (x,T) \in \Gamma }{ \Gamma \models x : T } \quad \frac{ \Gamma,x{:}T_1 ; T_2 \models s }{ \Gamma \models (\lambda x{:}T_1.\, s) : T_1 \to T_2 } \\[3ex] \frac{ \Gamma \models e_1 : T_1 \to T_2 \quad \Gamma \models e_2 : T_1 }{ \Gamma \models x = e_1(e_2) \Rightarrow \{ (x,T_2) \} } \\[2ex] \frac{ \Gamma \models \overline{e} : \overline{T} \quad \Delta(\mathit{op},\overline{T}) = T }{ \Gamma \models x = \mathit{op}(\overline{e}) \Rightarrow \{ (x,T) \} } \\[3ex] \Gamma \models \epsilon \Rightarrow \emptyset \quad \frac{ \Gamma \models d \Rightarrow \Gamma' \quad \Gamma \cup \Gamma' \models \overline{d} \Rightarrow \Gamma'' }{ \Gamma \models d \, \overline{d} \Rightarrow \Gamma' \cup \Gamma'' } \\[3ex] \frac{ \Gamma \models \overline{d} \Rightarrow \Gamma' \quad \Gamma \cup \Gamma'; T \models s }{ \Gamma; T \models \overline{d}; s } \quad \frac{ \Gamma \models e : T }{ \Gamma; T \models \mathbf{return}\,e } \end{gather*}$

The ToANF function is defined in terms of a recursive function ANF. It typically helps to prove something separately for each function, so let's try to come up with a lemma about ANF that will help prove that ToANF preserves types. Looking at the definition of ToANF, what we'd like to be true is the following property. $\forall e\,T. (\emptyset \vdash e : T) \to \exists e' \overline{d}.\, \mathit{ANF}(e) = (e',\overline{d}) \land (\emptyset; T \models \overline{d}; \mathbf{return}\,e')$ Let's see if we can prove this by structural induction. (We won't be able to but the failure will point out what minor modifications need to be made.)

A First Attempt

Our first choice is what to do structural induction on. There's several things in the above statement defined using inductive definitions. The expression $e$ and the typing derivation $\emptyset \vdash e : T$ are both good candidates because they appear to the left of the implication. Let's do induction on the typing derivation because that tends to give us more ammunition than induction on the structure of expressions.

There are five rules that define the typing relation for the STLC, so we'll need five cases.

Case $\frac{ (x,T) \in \emptyset }{ \emptyset \vdash x : T }$ The premise $(x,T) \in \emptyset$ is false, so this case is vacuously true. However, this should make us start to worry, something fishy is going on.
Case $\frac{ \emptyset,x{:}T_1 \vdash e' : T_2 }{ \emptyset \vdash \lambda x{:}T_1.\, e' : T_1 \to T_2 }$ The proof recipe for this case needs to take a proof for the premise and create a proof for the conclusion. However, our property doesn't apply to the premise because the type environment $\emptyset,x{:}T_1$ isn't empty. At this point we're stuck and we must have the good sense to give up this proof attempt.

A Successful Structural Induction

But what have we learned? What we're trying to prove must not fix the type environment as being empty, but needs to work with any type environment. Let's reformulate the property with an arbitrary $\Gamma$ replacing the $\emptyset$ . $\forall \Gamma\,e\,T. (\Gamma \vdash e : T) \to \exists e' \overline{d}.\, \mathit{ANF}(e) = (e',\overline{d}) \land (\Gamma; T \models \overline{d}; \mathbf{return}\,e')$ Let's try the proof by induction on $\Gamma \vdash e : T$ again.

Case $\frac{ (x,T) \in \Gamma }{ \Gamma \vdash x : T }$ We have $\mathit{ANF}(x) = (x,\epsilon)$ , so we need to show that $\mathbf{return}\,x$ is well typed. From $(x,T) \in \Gamma$ we have $\Gamma \models x : T$ and therefore $\Gamma; T \models \mathbf{return}\,x$ .
Case $\frac{ \Gamma,x{:}T_1 \vdash e' : T_2 }{ \Gamma \vdash \lambda x{:}T_1.\, e' : T_1 \to T_2 }$ We have $\mathit{ANF}(\lambda x{:}T_1.\,e') = (F, \epsilon)$ where $F = \lambda x{:}T_1.\, (\overline{d}; \mathbf{return}\,e'')$ and $(e'',\overline{d}) = \mathit{ANF}(e')$ . So we need to prove that $\Gamma; T_1 \to T_2 \models \mathbf{return}\,F$ and so it suffices to prove that $\Gamma \models F : T_1 \to T_2$ . Our recipe for this case takes as input a proof for the premise, so we know that $\begin{align*} &(\Gamma,x{:}T_1 \vdash e' : T_2) \to \\ & \exists e'' \overline{d}.\, \mathit{ANF}(e') = (e'',\overline{d}) \land (\Gamma,x{:}T_1; T_2 \models \overline{d}; \mathbf{return}\,e'') \end{align*}$ (The statement is called an induction hypothesis. It's always a good idea to write out an induction hypothesis carefully because it's easy to get it wrong. To get an induction hypothesis right, just instantiate the forall in the property you are trying to prove with the appropriate variables from the premise. Do this mechanically; thinking about it too much can lead to errors! Also, always tell your reader when you're using the induction hypothesis; think of it as the climax of the case!)

Because $\mathit{ANF}$ is a function (in the mathematical sense), we know that the $e''$ and $\overline{d}$ in the existential must be the same as the variables of the same name in our current proof. So our induction hypothesis tells us that $\Gamma,x{:}T_1; T_2 \models (\overline{d}; \mathbf{return}\, e'')$ . We then just apply the typing rule for lambda's to prove our goal, that $\Gamma \models F : T_1 \to T_2$ .
Case $\frac{ \Gamma \vdash e_1 : T_1 \to T_2 \quad \Gamma \vdash e_2 : T_1 }{ \Gamma \vdash e_1\,e_2 : T_2 }$ We have $\mathit{ANF}(e_1\,e_2) = (x,\overline{d}_1 \overline{d}_2; x=e'_1(e'_2))$ , where $(e'_1,\overline{d}_1) = \mathit{ANF}(e_1)$ , $(e'_2,\overline{d}_1) = \mathit{ANF}(e_2)$ , and $x$ is a fresh variable. We need to prove that $\Gamma; T_2 \models \overline{d}_1 \overline{d}_2; x=e'_1(e'_2); \mathbf{return}\,x$ We have two induction hypotheses, one for each premise: $\begin{align*} & (\Gamma \vdash e_1 : T_1\to T_2) \to \\ & \exists e'_1 \overline{d}_1.\, \mathit{ANF}(e_1) = (e'_1,\overline{d}_1) \land (\Gamma; T_1\to T_2 \models \overline{d}_1; \mathbf{return}\,e'_1) \end{align*}$ and $\begin{align*} & (\Gamma \vdash e_2 : T_1) \to \\ & \exists e'_2 \overline{d}_2.\, \mathit{ANF}(e_2) = (e'_2,\overline{d}_2) \land (\Gamma; T_1 \models \overline{d}_2; \mathbf{return}\,e'_2) \end{align*}$ So we have $\Gamma \models \overline{d}_1 \Rightarrow \Gamma_1$ and $\Gamma \cup \Gamma_1 \models e'_1 : T_1 \to T_2$ from the first induction hypothesis (IH for short). We have $\Gamma \models \overline{d}_2 \Rightarrow \Gamma_2$ and $\Gamma \cup \Gamma_2 \models e'_2 : T_1$ from the second IH. Looking back at what we need to prove, there's a bit of a jump that needs to happen to get us from $\overline{d}_1$ and $\overline{d}_2$ being well-typed separately to their concatenation being well typed. Let's posit the following lemma: $\Gamma \models \overline{d}_1 \Rightarrow \Gamma_1 \land \Gamma \models \overline{d}_2 \Rightarrow \Gamma_2 \to \Gamma \models \overline{d}_1 \overline{d}_2 \Rightarrow \Gamma_1 \cup \Gamma_2$ under the assumption that the variables on the left-hand sides are all distinct. Also, we need to show that $\Gamma \cup \Gamma_1 \cup \Gamma_2 \models x = e'_1(e'_2) \Rightarrow \{(x,T_2)\}$ Again there's a bit of a jump that needs to happen. This time, the problem is that we know $e'_1$ and $e'_2$ are well typed in smaller environments (missing $\Gamma_2$ in the former case and missing $\Gamma_1$ in the later). So we need a lemma that tells us it is ok to add more variables to the environment so long as those variables to don't interfere with variables that are already there. This lemma comes up so frequently it has a special name. It is know as the Weakening Lemma. $(\Gamma \models e : T) \land (\Gamma \subseteq \Gamma') \to (\Gamma' \models e :T)$ Applying this lemma twice, we have $\Gamma \cup \Gamma_1 \cup \Gamma_2 \models e'_1 : T_1 \to T_2$ and $\Gamma \cup \Gamma_1 \cup \Gamma_2 \models e'_2 : T_1$ which gives us $\Gamma \cup \Gamma_1 \cup \Gamma_2 \models x = e'_1(e'_2) \Rightarrow \{(x,T_2)\}$ and then applying the concatenation lemma again, we have $\Gamma \models \overline{d}_1 \overline{d}_2; x=e'_1(e'_2) \Rightarrow \Gamma_1 \cup \Gamma_2 \cup \{(x,T_2)\}$ All that remains is to deal with the final $\mathbf{return}\,x$ . But it's easy to see that $\Gamma \cup \Gamma_1 \cup \Gamma_2 \cup \{ (x,T_2) \} \models x : T_2$ so we can conclude this case, having shown that $\Gamma; T_2 \models \overline{d}_1 \overline{d}_2; x=e'_1(e'_2); \mathbf{return}\,x$

So far we've handled three of the five cases, so the proof is not yet complete. I'd encourage the reader to try and fill in the two remaining cases (for constants and primitive operators) and the two lemmas that we used. (Hint: use structural induction to prove the lemmas.)

Conclusion

If you define a set using an inductive definition, then you can prove stuff about every element of the set (even if the set is infinite!) by structural induction. To do this, create a recipe that includes a case for each rule in the inductive definition. Your recipe for each case takes as input the proof for the premises and it should output a proof for the conclusion of the rule. That's all it takes! Just make sure not to skip any cases and be extra careful with the induction hypotheses!

1 comment:

Kristopher Micinski12:07 AM
Great post! One of my favorite papers on this topic is "An introduction to (co)algebras and (co)induction," which gave a fun detour on the categorical background behind inductively defined structures (and functions which were defined over them!), along with the (co)universe you get when you stand your head.

Monday, August 13, 2012

The Work Horse of PL Theory: Structural Induction