## Monday, September 17, 2012

### Interpretations of the Gradually-Typed Lambda Calculus, Part 1

I just got back from Copenhagen, where I gave a tutorial on gradual typing at the Workshop on Scheme and Functional Programming. I very much enjoyed the workshop and giving the tutorial. Thank you for the invitation Olivier!

For those of you who couldn't be there, this series of blog posts will include the material from my tutorial. For those of you who were there, this series will include some bonus material: an efficient machine for "Eager D" based on recent work by Ronald Garcia and myself.

When I first began working on gradual typing in 2005 and 2006, my focus was on the type system. The main pieces of the type system fell into place that first year, and ever since then I've been thinking about the dynamic semantics. It turns out there are many design choices and implementation challenges regarding the dynamic semantics. In this post I'll restrict my attention to the gradually-typed lambda calculus, as many issues already arise in that setting. I'll quickly review the syntax and type system, then move on to discuss the dynamic semantics.

The following defines the syntax for the gradually-typed lambda calculus. Here I'm writing the dynamic type as $\star$. Also, note that a lambda without a type annotation on its parameter is shorthand for a lambda whose parameter is annotated with $\star$.

### The Gradual Type System and Cast Insertion

The type system of the gradually-typed lambda calculus is quite similar to the simply-typed lambda calculus. The only differences are in the rules for application. Instead of requiring the argument's type to be identical to the function's parameter type, we only require that the types be consistent, written $\sim$ and defined below. We also allow the application of expressions of type $$\star$$. The lack of contra-variance in how function parameters are handled in the consistency relation not a mistake. Unlike subtyping, the consistency relation is symmetric, so it wouldn't matter if we wrote $T_3 \sim T_1$ instead of $T_1 \sim T_3$. Also, consistency is not transitive, which is why we don't use a separate subsumption rule, but instead use consistency in the rules for application.

The dynamic semantics of the gradually-typed lambda calculus is not defined in terms of the surface syntax, but instead it is defined on an intermediate language that extends the simply-typed lambda calculus with explicit casts. We use a non-standard notation for casts so that they are easier to read, so that they go left to right. The casts are annotated with blame labels, which we treat here as symbols, but in a real implementation would include the static location (line and character position) of the cast. We use a single blame label without any notion of polarity because these casts are really just casts, not contracts between two parties.

Cast insertion is a type-directed translation, so it is the same as the type system with the addition of an output term.

We often abbreviate a pair of casts to remove the duplication of the middle type as follows

### Design Choices Regarding the Dynamics

Consider the following example in which a function is cast to the dynamic type and then cast to a type that is inconsistent with the type of the function. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} A few questions immediately arise:

• Should a runtime cast error occur during the evaluation of the right-hand side of the let? Or should the runtime error occur later, when f is applied to \textsf{true}?
• When the runtime cast error occurs, which cast should be blamed, $\ell_0$ or $\ell_1$? More generally, we want to define a subtyping relation to characterize safe casts (casts that never fail), and the specifics of subtyping relation depend on the blame tracking strategy.

Ron, Walid, and I wrote a paper, Exploring the Design Space of Higher-Order Casts (ESOP 2009), that characterized the different answers to the above questions in terms of Henglein's Coercion Calculus. One can choose to check higher-order casts in either a lazy or eager fashion and one can assign blame to only downcasts (D) or the one can share blame between upcasts and downcasts (UD). The semantics of casts with lazy checking is straightforward whereas eager checking is not, so we'll first discuss lazy checking. Also, the semantics of the D approach is slightly simpler than UD, so we'll start with lazy D. We'll delay discussing the Coercion Calculus until we really need it.

### The Lazy D Semantics

We'll define an evaluation (partial) function $\mathcal{E}$ that maps a term and environment to a result. That is, we'll give Lazy D a denotational semantics. The values and results are defined as follows: $\begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : T \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array}$

To handle the short-circuiting of evaluation in the case of a cast error (signaled by $\mathbf{blame}\,\ell$), we use the following monadic operators: \begin{align*} \mathbf{return}\,v &= v \\ \mathbf{letB}\,X = M\,\mathbf{in}\, N &= \mathbf{case}\,M\,\mathbf{of}\,\\ & \quad\;\; \mathbf{blame}\,\ell \Rightarrow \mathbf{blame}\,\ell \\ & \quad \mid v \Rightarrow [X{:=}v]N \end{align*}

The primitive operators are given their semantics by the $\delta$ function. \begin{align*} \delta(\mathsf{inc},n) &= n + 1 \\ \delta(\mathsf{dec},n) &= n - 1 \\ \delta(\mathsf{zero?},n) &= (n = 0) \end{align*}

In lazy cast checking, when determining whether to signal a cast error, we only compare the heads of the types: \begin{align*} \mathit{hd}(B) &= B \\ \mathit{hd}(T_1 \to T_2) &= \star \to \star \end{align*}

The following auxiliary function, named cast, is the main event. It is defined by cases on the source and target types $T_1$ and $T_2$. The line for projecting from $\star$ to $T_2$ picks the $\ell$ blame label from the projection (the down-cast), which is what gives this semantics its "D". \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : T_3 \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',T_3,\ell,T_2) \\ \mathsf{cast}(v,T_1,\ell,\star) &= v : T_1 \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The apply auxiliary function performs function application, and is defined by induction on the first parameter. \begin{align*} \mathsf{apply}(F,v_2) &=F(v_2) \\ \mathsf{apply}(v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4,v_2) &= \mathbf{letB}\,X_3 = \mathsf{cast}(v_2,T_3,\ell,T_1)\,\mathbf{in} \\ & \quad \mathbf{letB}\,X_4 = \mathsf{apply}(v,X_3)\,\mathbf{in} \\ & \quad \mathsf{cast}(X_4, T_2, \ell, T_4) \end{align*}

With these auxiliary functions and monadic operators in hand, the definition of the evaluation function $\mathcal{E}$ is straightforward. \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X,T_1 ,\ell, T_2) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad \mathsf{apply}(X_1,X_2) \end{align*} The semantics for the Lazy D Gradually-Typed Lambda Calculus is defined by the following $\mathit{eval}$ partial function. $\mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases}$ where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : T_1\to T_2\Rightarrow^\ell T_3\to T_4) &= \mathit{function} \\ \mathit{observe}(v : T \Rightarrow \star) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*}

Exercise: Calculate the output of eval for the example program at the beginning of this post.

Similar to object-oriented languages, we can define a subtyping relation that characterizes when a cast is safe, that is, when a cast will never fail. The following is the subtyping relation for the D semantics. \begin{gather*} \frac{}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*} This subtyping relation is what I expected to see. The dynamic type plays the role of the top element of this ordering and the rule for function types has the usual contra-variance in the parameter type. The Subtyping Theorem connects the dynamic semantics with the subtyping relation.

Theorem (Subtyping) If the cast labeled with $$\ell$$ in program $$e$$ respects subtyping, then $$\mathit{eval}(e) \neq \mathbf{blame}\,\ell$$.

### The Lazy UD Semantics

One interpretation of the dynamic type $\star$ is to view it as the following recursive type: $\star \equiv \mu \, d.\, \mathsf{Int} + \mathsf{Bool} + (d \to d)$ (See, for example, the chapter on Dynamic Typing in Robert Harper's textbook Practical Foundations for Programming Languages.) In such an interpretation, one can directly convert from $\star \to \star$ to $\star$, but not, for example, from $\mathsf{Int}\to\mathsf{Int}$ to $\star$. Instead, one must first convert from $\mathsf{Int}\to\mathsf{Int}$ to $\star \to \star$ and then to $\star$.

Let I be the subset of types that can be directly injected into $\star$: $I ::= B \mid \star \to \star$ The definition of values for Lazy UD changes to use I instead of T for the values of type $\star$. $\begin{array}{lrcl} \text{values} & v \in V & ::= & k \mid F \mid v : I \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \end{array}$ This change in the definition of value necessitates some changes in the cast function. The second and third-to-last lines below contain most of the changes. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : I \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',I,\ell,T_2) \\ \mathsf{cast}(v,I,\ell,\star) &= v : I \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,\star) &= v : T_{11}\to T_{12} \Rightarrow^\ell \star \to \star \Rightarrow^\ell \star \\ & \text{if } T_{11} \neq \star, T_{12} \neq \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The rest of the definitions for Lazy UD are the same as those for Lazy D. The following is the subtyping relation for Lazy UD. With this subtyping relation, the type $$\star$$ does not play the role of the top element. Instead, a type $T$ is a subtype of $$\star$$ if it is a subtype of some injectable type $$I$$. \begin{gather*} \frac{}{\star <: \star} \qquad \frac{T <: I}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*}

Theorem (Subtyping) If the cast labeled with $$\ell$$ in program $$e$$ respects subtyping, then $$\mathit{eval}(e) \neq \mathbf{blame}\,\ell$$.

Exercise: Calculate the output of the Lazy UD eval for the example program at the beginning of this post.

In the next post I'll turn to the efficient implementation of Lazy D and UD.