Processing math: 100%

Monday, May 27, 2013

Type Safety in Three Easy Lemmas

Last year I wrote a blog post showing how proofs of type safety can be done with just five easy lemmas if one defines the programming language in a certain way, using a special abstract machine. In this blog post I'll improve on that post in two ways. I'll show that type safety can be proved with just three easy lemmas and I'll use a standard interpreter-based definition of the language instead of a special abstract machine.
For readers interested in mechanized metatheory, here's the mechanized version of this post in Isabelle.

Syntax

This time we do not need to put the language in A-normal form. To review, we study a simple but Turing-complete language with integers, Booleans, and first-class functions that may be recursive. typesT::=IntBoolTTvariablesx,y,zintegersnoperatorso::=+=Booleansb::=truefalseconstantsc::=nbexpressionse::=co(e,e)xfunf(x:T)ee(e)

Dynamic Semantics

As before, we use the notation ϵ for the empty list. Given a list L, the notation aL is a larger list with a as the first element and the rest of the elements are the same as L. We use lists of key-value pairs (association lists) to represent mapping from variables to types (type environments, written Γ) and variables to values (environments, written ρ). The following lookup function finds the thing associated with a given key in an association list. The return type of lookup and the use of stuck and return deserves a little explanation, which I give below. lookup:α×(α×β)listβresultlookup(K1,ϵ)=stucklookup(K1,(K2,V)L)={returnVif K1=K2lookup(K1,L)otherwise
One might normally consider lookup as a partial function, because it might fail to find a matching key in the association list. However, here we're going to make the partiality explicit by returning a special result called stuck. We'll also include timeout in the kinds of results, which we explain shortly. datatypeαresult=ResultαStuckTimeOut
We define the following auxiliary notation for dealing with the result datatype. The most import definition is the last line, which helps us avoid cluttering our code with lots of case's. returnaResultastuckStucktimeoutTimeOutXM1;M2caseM1ofStuckStuckTimeOutTimeOutResultR[X:=R]M2
The values of this language (the results of evaluation) are constants and closures. A closure pairs a function with an environment ρ. valuesv::=cf(x:T)e,ρ
In many places within the interpreter we're going to extract an integer, Boolean, or closure from a value. The extraction might fail because, for example, even though we may want to extract an integer the value may instead be a Boolean. toInt(n)=returnntoInt(b)=stucktoInt(f(x:T)e,ρ)=stucktoBool(n)=stucktoBool(b)=returnbtoBool(f(x:T)e,ρ)=stucktoClosure(n)=stucktoClosure(b)=stucktoClosure(f(x:T)e,ρ)=return(f,x,e,ρ)
Next we define the δ function, which gives meaning to the primitive operators. δ(+,v1,v2)=n1toInt(v1);n2toInt(v2);return(n1+n2)δ(,v1,v2)=n1toInt(v1);n2toInt(v2);return(n1n2)δ(=,n1,n2)=n1toInt(v1);n2toInt(v2);return(n1=n2)
The evaluation function V is a mostly-standard closure-based interpreter. The one thing that is a bit unusual is that we make sure that the interpreter is a total function by ensuring termination. The interpreter includes a third parameter that is a counter. If the counter gets to zero, the result is timeout. (This counter technique was described in the earlier blog post.) V(e,ρ,0)=timeoutV(x,ρ,1+k)=lookup(x,ρ)V(c,ρ,1+k)=returncV(funf(x:T)e,ρ,1+k)=returnf(x:T)e,ρV(o(e1,e2),ρ,1+k)=v1V(e1,ρ,k);v2V(e2,ρ,k);δ(o,v1,v2)V(e1e2,ρ,1+k)=v1V(e1,ρ,k);v2V(e2,ρ,k);(f,x,e,ρ)toClosure(v1);V(e,(x,v2)(f,v1)ρ,k)
To finish off the dynamic semantics we must define eval which specifies the behavior of whole programs. The behavior of a program is to either return a constant, return diverge which indicates that the program runs forever, or the behavior is undefined. eval(e)={cif V(e,ϵ,n)=Resultc for some ndivergeif n.V(e,ϵ,n)=TimeOut

Type System

The types for the constants is given by the typeof partial function. typeof(n)=Inttypeof(true)=Booltypeof(false)=Bool
The Δ partial function maps a primitive operator and argument types to the return type. Δ(+,Int,Int)=IntΔ(,Int,Int)=IntΔ(=,Int,Int)=Bool
The following presents the type rules for expressions.
Γe:T
lookup(x,Γ)=ResultTΓx:Ttypeof(c)=TΓc:T(x,T1)(f,T1T2)Γe:T2Γfunf(x:T1)e:T1T2Γe1:T1T2Γe2:T1Γe1(e2):T2Δ(o,T1,T2)=T3Γe1:T1Γe2:T2Γo(e1,e2):T3
Our proof of type safety will require that we define notions of well-typed values, results, and environments.
v:T typeof(c)=Tc:TΓρ(x,T1)(f,T1T2)Γe:T2f(x:T1)e,ρ:T1T2
r:T v:TResultv:TTimeOut:T
Γρ ϵϵv:TΓρ(x,T)Γ(x,v)ρ

Type Safety

The proof of type safety, as promised, includes just three easy lemmas. The first two lemmas are essentially identical to the corresponding lemmas in ``Type Safety in Five''. They establish that the primitive operators and lookup on environments are sound with respect to Δ and lookup on type environments, respectively. The third lemma is the main event of the proof, showing that V is sound with respect to the type system. Because we know that V terminates, we can do the proof by induction on the definition of V, which helps to streamline the proof. We cap off the proof with the Type Safety theorem, which follows almost immediately from the soundness of V.
Lemma (δ is safe)
If Δ(o,¯T)=T and vi:Ti for i{1,,n}, then δ(o,¯v)=Resultv and v:T, for some v.
Proof. We proceed by cases on the operator o.
  1. If the operator o is +, then we have T1=T2=Int and T=Int. Then because vi:Int, we know that vi=ni for i{1,2}. Then δ(+,n1,n2)=Result(n1+n2) and we have (n1+n2):Int.
  2. If the operator o is , then we have T1=T2=Int and T=Int. Then because vi:Int, we know that vi=ni for i{1,2}. Then δ(,n1,n2)=Result(n1n2) and we have (n1n2):Int.
  3. If the operator o is =, then we have T1=T2=Int and T=Bool. Then because vi:Int, we know that vi=ni for i{1,2}. Then δ(=,n1,n2)=Result(n1=n2) and we have (n1=n2):Bool.
QED.
Lemma (lookup is safe)
If Γρ and lookup(x,Γ)=ResultT, then lookup(x,ρ)=Resultv and v:T for some v.
Proof. We proceed by induction on Γρ.
  1. Case ϵϵ:(Γ=ϵ,ρ=ϵ)
    But then we have a contradition with the premise lookup(x,Γ)=ResultT, so this case is vacuously true.
  2. Case v:TΓρ(x,T)Γ(x,v)ρ:
    Next we consider two cases, whether x=x or not.
    1. Case x=x: Then lookup(x,ρ)=Resultv and T=T, so we conclude that v:T.
    2. Case xx: Then lookup(x,ρ)=lookup(x,ρ) and lookup(x,Γ)=lookup(x,Γ)=ResultT. By the induction hypothesis, we have lookup(x,ρ)=Resultv and v:T for some v, which completes this case.
QED.
Lemma (V is safe)
If Γe:T and Γρ, then V(e,ρ,k):T.
Proof. The proof is by induction on the definition of V.
  1. Case V(e,ρ,0): We have V(e,ρ,0)=TimeOut and TimeOut:T.
  2. Case V(x,ρ,1+k): We have V(x,ρ,1+k)=lookup(x,ρ). From Γx:T we have lookup(x,Γ)=ResultT. Then by the lookup is safe lemma, we have lookup(x,ρ)=Resultv and v:T for some v. Thus we have lookup(x,ρ):T and conclude V(x,ρ,1+k):T.
  3. Case V(c,ρ,1+k): From Γc:T we have typeof(c)=T and therefore c:T. We have V(c,ρ)=Resultc and therefore V(c,ρ,1+k):T.
  4. Case V(funf(x:T1)e1,ρ,1+k):
    We have V(funf(x:T1)e1,ρ,1+k)=Resultf(x:T1)e1,ρ. From Γfunf(x:T1)e1:T we have (x,T1)(f,T1T2)Γe1:T2, with T=T1T2. Together with Γρ, we have f(x:T1)e1,ρ:T and therefore V(funf(x:T1)e1,ρ,1+k):T.
  5. Case V(o(e1,e2),ρ,1+k)):
    We have Δ(o,T1,T2)=T and Γe1:T1 and Γe2:T2. By the induction hypothesis, we have V(e1,ρ,k)=Resultv1 and v1:T1 and V(e2,ρ,k)=Resultv2 and v2:T2. Because δ is safe, we have δ(o,v1,v2)=Resultv3 and v3:T for some v3. We have δ(o,v1,v2):T and therefore V(o(e1,e2),ρ,1+k)):T.
  6. Case V(e1e2,ρ,1+k): We have Γe1:T2T and Γe2:T2. By the induction hypothesis, we have V(e1,ρ,k)=Resultv1 and v1:T2T and V(e2,ρ,k)=Resultv2 and v2:T2. By cases on v1:T2T, we have that v1=f(x:T2)e3,ρ and Γρ and (x,T2)(f,T2T)Γe3:T for some f,x,e3,Γ,ρ. So we have V(e1e2,ρ,1+k)=V(e3,(x,v2)(f,v1)ρ,k). Applying the induction hypothesis for a third time, we have V(e3,(x,v2)(f,v1)ρ,k):T. Therefore V(e1e2,ρ,1+k):T.
QED.
Theorem (Type Safety)
If ϵe:T and T{Int,Bool}, then either eval(e)=c and c:T for some c or eval(e)=diverge .
Proof. We consider two cases: whether the program diverges or not.
  1. Suppose that V(e,ϵ,k)=TimeOut for all k. Then eval(e)=diverge.
  2. Suppose it is not the case that V(e,ϵ,k)=TimeOut for all k. So V(e,ϵ,k)TimeOut for some k. Then because V is safe, we have V(e,ϵ,k)=Resultv and v:T. Then from T{Int,Bool} and by cases on v:T, we have v=c for some c.
QED.

6 comments:

  1. I was curious to try how this proof would go through in Coq, and the result was - surprisingly easy. (The current Coq source is here: https://gist.github.com/dkrustev/5890291) As I used Coq's functional induction (for the first time) for the 3rd lemma, it required considering 16 cases, not 6 (including cases when some intermediate computation times out or gets stuck), but most of these cases were easy to deal with. One thing I realized only during the Coq formalization - though, in retrospect, it should have been obvious from the beginning - was that while all lemmas have constructive proofs, the main theorem does not. It appears this non-constructiveness follows from the way \mathit{eval} is defined. I still wonder if another definition of \mathit{eval} would be possible, giving both fully constructive proofs and a similar level of simplicity ... Another minor point: the \mathit{typeof} function is actually total in the Coq formalization, but it probably depends on the exact encoding of the language syntax.

    ReplyDelete
    Replies
    1. Great! Thanks for sharing!

      That would be interesting if you could find a constructive way to define eval. (Though I wouldn't hold my breadth.)

      I forget why I said typeof is partial... in my Isabelle version it is total. I've now linked to my Isabelle development at the beginning of the post.

      Delete
    2. Anonymous1:10 PM

      Hi Jeremy, awesome post thanks! The link to the Isabelle development is not working anymore. Is it still available somewhere? Thanks! Ranjit.

      Delete
    3. Yes, darn Dropbox change a while back regarding Public directory access. I've fixed the link. Thanks for pointing this out!

      Delete
  2. Anonymous6:28 PM

    Just a quick typo: Turing-complete is misspelled as Turning-complete.

    ReplyDelete