In my last post Type Safety in Five Easy Lemmas, I made a claim that the formulation of the operational semantics of a language makes a big difference regarding how many lemmas, and how tedious, the proof of type safety becomes. While I showed that the particular semantics that I used led to a simple proof, with just five easy lemmas, I didn't compare it to the alternatives. Perhaps there's an even better alternative! In this post I discuss the alternatives that I know about and why they lead to more lemmas and more tedium. This post is organized by design decision.
Why A-normal form?
The syntax of the little language is in A-normal form, that is, it's flattened out so that expressions don't have sub-expressions, but instead, more complex computations have to be built up from several variable assignments. It's much more common to see type-safety proofs on languages that are not in A-normal form, that is, languages with arbitrary nesting of expressions. This nesting of expressions can be handled in one of two ways in the operational semantics: either with extra so-called congruence reduction rules or by using evaluation contexts. The congruence rule approach adds many more cases to the " is safe" lemma (or equivalently, the progress and preservation lemmas). The evaluation contexts approach requires an extra lemma often called the unique decomposition lemma.
You might be thinking, real languages aren't in A-normal form, so one would really need to compile to A-normal form and prove that the compilation is type preserving (as we did in the Work Horse post). This is true, but I'd argue that it's nice to have a separation of concerns and take care of intra-expression control flow in one function and lemma instead of dealing with it throughout the operational semantics and proof of type safety.
Why environments instead of substitution?
The operational semantics passed around environments that associated values with variables and used the function to find the value for a variable. The more common alternative is to perform substitution, that is, during a function call, to go through the body of the function replacing occurrences of the parameters with their arguments. I'll probably get flamed for saying this, but substitution has a lot of disadvantages compared to environments. First, the substitution function is difficult to define properly. Second, it requires that you prove that substitution is type preserving, which is rather tedious because it touches every kind of expression (or statement) in the language. In comparison, the lemma that we proved about the function didn't mention any expressions. Third, substitution is really slow. Technically, that doesn't effect the complexity of the proof, but it does affect your ability to test your theorems. As I mentioned in the Work Horse post, it's a good idea to implement an interpreter and test whether your theorems hold on lots of example programs before trying to prove your theorems. If your interpreter uses substitution, it will be very very slow. You might instead use environments, but keep substitution in your formal semantics, but then your interpreter and your formal semantics differ so that properties of one may not imply properties of the other.
Why functions with names (for recursion) instead of fix?
The only way that I know of to implement fix, the fixpoint combinator, is with substitution, and we don't want substitution.
Also, I should highlight how we handle recursion in the operational semantics. First, the function doesn't do much, it just captures the current environment. An alternative used by Milner and Tofte is to extend the environment with a binding for the function itself. This alternative makes the proof of type safety more challenging because, instead of using induction, you have to use coinduction. (See their paper Co-induction in relational semantics.) While coinduction is a beautiful concept, it's nice to be able to stick to good old induction. Now, because doesn't do much, we need to make up for it in the transition relation. Notice that the transition for function call extends the environment with the function itself. I forget where I learned this trick. If you know of any references for this, please let me know!
Conclusion
I think that's it for the important design decisions, but I may be forgetting something. If you have any questions regarding the rationale for anything else, please post a comment! So to sum up, if you want your type safety proof to be less tedious, I recommend using A-normal form and formulating your semantics as an environment-passing abstract machine. Oh, and if you want recursive functions, than extend your lambdas with a name and be lazy about extending the environment: wait until the function call to add the function itself to the environment.