Jeremy Siek: September 2012

Thursday, September 20, 2012

Interpretations of the GTLC, Part 4: Even Faster

Consider the following statically-typed function. (The type $\star$ does not occur anywhere in this function.) \[ \lambda f: \mathsf{Int}{\to}\mathsf{Int}. \; x{:}\mathsf{Int} = f(42); \mathbf{return}\,x \] We'd like the execution speed of this function to be the same as if the entire language were statically typed. That is, we don't want statically-typed parts of a gradually-typed program to pay overhead because other parts may be dynamically typed. Unfortunately, in the abstract machines that we've defined so far, there is an overhead. At the point of a function call, such as $f(42)$ above, the machine needs to check whether $f$ has evaluated to a closure or to a closure wrapped in a threesome. This act of checking constitutes some run-time overhead.

Taking a step back, there are two approaches that one sees in the literature regarding how a cast is applied to a function. One approach is to build a new function that casts the argument, applies the old function, and then casts the result. The reduction rule looks like this: \[ v : T_1 \to T_2 \Rightarrow T_3 \to T_4 \longrightarrow \lambda x{:}T_3. (v\,(x : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that there's only one kind of value of function type, functions! So when it comes to function application, we only need one reduction rule, good old beta: \[ (\lambda x{:}T.\,e)\, v \longrightarrow [x{:=}v]e \] The other approach is to leave the cast around the function and then add a second reduction rule for applications. \[ (v_1 : T_1 \to T_2 \Rightarrow T_3 \to T_4) \, v_2 \longrightarrow (v_1\, (v_2 : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that the cast around the function is easy to access and change, which we took advantage of to compress sequences of such casts. But as we've already pointed out, having two kinds of values at function type induces some run-time overhead, even in parts of the program that are statically typed.

Our solution to this conundrum is to use a hybrid representation and to take advantage of the indirection that is already present in a function call. Instead of having two kinds of values at function type, we have only one: a closure that includes an optional threesome: \[ \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \] When a closure is first created, there is no threesome. Later, when a closure is cast, the threesome is added. \[ V(\lambda x{:}T.\, s,\rho) = \langle \lambda x{:}T.\, s, \rho, \bot \rangle \] The one transition rule for function application passes the optional threesome as a special parameter, here named $c$, to the function. In the case of an un-casted closure, the function ignores the $c$ parameter. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \end{align*}

When an un-casted closure is cast, we build a wrapper function, similar to the first approach discussed above, but using the special variable $c$ to refer to the threesome instead of hard-coding the cast into the wrapper function. We add $\mathit{dom}$ and $\mathit{cod}$ operations for accessing the parts of a function threesome. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s', \rho', \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ & \text{and } \rho' = \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \} \end{align*} When a closure is cast for the second time, the casts are combined to save space. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*}

That's it. We now have a machine that doesn't perform extra dispatching at function calls. There is still a tiny bit of overhead in the form of passing the $c$ argument. This overhead can be removed by passing the entire closure to itself (instead of passing the array of free variables and the threesome separately), and from inside the function, access the threesome from the closure.

In the following I give the complete definitions for the new abstraction machine. In addition to $\mathit{dom}$ and $\mathit{cod}$, we add a tail call without a cast to avoid overhead when there is no cast. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \mid \mathit{dom}(e) \mid \mathit{cod}(e) \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) \mid \mathbf{return}\,e(e) : \tau \\ \text{optional threesomes} & \tau_\bot & ::= & \bot \mid \tau \\ \text{values}& v & ::= & k \mid k : \tau \mid \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \end{array} \] Here's the complete definition of the cast function. \begin{align*} \mathsf{cast}(v, \bot) &= v \\ \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s' , \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \}, \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*} Here are the updated evaluation rules. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho, \bot \rangle \\ V(\mathit{dom}(e),\rho) &= \tau_1 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \\ V(\mathit{cod}(e),\rho) &= \tau_2 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \end{align*} Lastly, here are the transition rules for the machine. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau, (x,s,\rho')::\kappa)) & \longmapsto (s, \rho'[x{:=}v'], \kappa) \\ \text{where }& V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2), \rho,\kappa) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot],\kappa) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho,(\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot], ((\tau_1; \tau_2), \sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\[2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*}

I like how there are fewer rules and the rules are somewhat simpler compared to the previous machine. There is one last bit of overhead in statically typed code: in a normal return we have to apply the pending threesome that's on the stack. If one doesn't care about making tail-calls space efficient in the presence of casts, then this wouldn't be necessary. But I care.

Wednesday, September 19, 2012

Interpretations of the GTLC: Part 3, Going Faster

The intuition for an efficient coercion composition function came from thinking about types, not coercions. We'll start with the UD blame tracking strategy and then later consider D. Also, for now we'll stick with lazy cast checking.

Consider the following sequence of casts: \[ e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} \cdots \Rightarrow^{\ell_{n-1}} T_n \] We'd like some way to summarize the sequence of types without loosing any important information. That is, we'd like to come up with something that can catch the same cast errors as the entire sequence, blaming the appropriate label, but using less space. Imagine the $n$ types as a line of differently colored trees on the side of a road. If you're next to the road, staring down the line of trees, you see what looks like one tree with branches of many colors. Some of the branches from further-away trees are hidden from view by closer trees, but some are visible. Now, suppose we wanted to maintain the same view from your standpoint, but save on water. We could replace the line of trees with a single multi-colored tree that includes all the branches visible to you. The figure below depicts three differently-colored trees getting merged into a single multi-color tree. The nodes without color should be considered transparent.

Mapping this idea back to types, the colors are blame labels and transparent nodes are the type $\star$. Because we need to color individual branches, we need blame labels on every internal node of a type. In particular, we need the notion of a labeled type: \[ \begin{array}{lrcl} \text{optional labels} & p,q & ::= & \epsilon \mid \ell \\ \text{labeled types} & P,Q & ::= & B^p \mid P \to^p Q \mid \star \mid (I^p; \bot^\ell) \end{array} \] (These labeled types are for the UD strategy. We'll discuss the labeled types for D later. Also, the labeled type $I^p; \bot^\ell$ deserves some explanation, which we'll get to soon.) The $\mathit{label}$ function returns the top-most label of a labeled type: \begin{align*} \mathit{label}(B^p) &= p \\ \mathit{label}(P \to^p Q) &= p \\ \mathit{label}(\star) &= \epsilon \\ \mathit{label}(I^p; \bot^\ell) &= p \end{align*}

We'll define a function for composing two labeled types $P$ and $Q$ to produce a new labeled type $P'$, using semicolon as the syntax for this composition function: \[ P ; Q = P' \] We replace each cast with a threesome, that is, a cast annotated with a labeled type. The labeled type is computed by a simple function $\mathcal{L}$ that we define below. \begin{align*} & e : T_1 \Rightarrow^\ell T_2 \text{ becomes } e : T_1 \overset{P}{\Longrightarrow} T_2 \\ & \text{ where } P = \mathcal{L}(T_1 \Rightarrow^\ell T_2) \end{align*} \begin{align*} \mathcal{L}(B \Rightarrow^\ell B) &= B \\ \mathcal{L}(\star \Rightarrow^\ell \star) &= \star \\ \mathcal{L}(B \Rightarrow^\ell \star) &= B \\ \mathcal{L}(\star \Rightarrow^\ell B) &= B^\ell \\ \mathcal{L}(T_1 \Rightarrow^\ell T_2) &= I ; \bot^\ell \qquad \text{where } I \sim T_1 \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell T_4) \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell \star) &= \mathcal{L}(\star \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell \star) \\ \mathcal{L}(\star \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell \star) \to^\ell \mathcal{L}(\star \Rightarrow^\ell T_4) \end{align*} A sequence of threesomes is compressed to a single threesome using the composition function: \begin{gather*} e : T_1 \overset{P_1}{\Longrightarrow} T_2 \overset{P_2}{\Longrightarrow} \cdots \overset{P_{n-1}}{\Longrightarrow} T_n \\ \text{becomes} \\ e : T_1 \overset{P}{\Longrightarrow} T_n \\ \text{where } P = P_1; P_2; \cdots; P_{n-1} \end{gather*}

Before we go into the details of the composition function, it helps to see how (well-formed) threesomes correspond to coercions in normal form. With this correspondence in place, we can use the coercion reduction rules to help guide the definition of threesome composition. The function $\mathit{TC}$ defined below maps threesomes to coercions in normal form. This function is an isomorphism, so it's inverse maps normal coercions back to threesomes. \begin{align*} \mathit{TC}(B \overset{B}{\Longrightarrow} B) &= \iota_B \\ \mathit{TC}(\star \overset{\star}{\Longrightarrow}\star) &=\iota_\star \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} B) &= B?^\ell \\ \mathit{TC}(B\overset{B}{\Longrightarrow} \star) &= B! \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} \star) &= B?^\ell; B! \\ \mathit{TC}(T_1 \overset{I; \bot^\ell}{\Longrightarrow} T_2) &= \mathsf{Fail}^\ell \\ \mathit{TC}(T_1 \overset{I^{\ell_1}; \bot^{\ell_2}}{\Longrightarrow} T_2) &= I?^{\ell_1} ; \mathsf{Fail}^{\ell_2} \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} T_3 \to T_4)&= \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} T_3 \to T_4)&= (\star \to \star)?^\ell ; \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} \star)&= \mathit{TC}(\star \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} \star)&= (\star \to \star)?^\ell ; \mathit{TC}(\star \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \end{align*}

We're ready to make precise how two labeled types can be composed to form a single labeled type. The two cases in which one of the labeled types is $\star$ are easy: just return the other type: \begin{align*} \star; Q &= Q \\ P; \star &= P \end{align*} Next, suppose we have $\mathsf{Int}^{\ell_1}$ followed by $\mathsf{Int}^{\ell_2}$. These should compose to $\mathsf{Int}^{\ell_1}$ because if the first cast succeeds, so will the second, making the blame label $\ell_2$ redundant. In general, for labeled basic types we have the following rule. \begin{equation} \label{eq:1} B^p; B^q = B^p \end{equation} Suppose instead that the basic types don't match. The right-hand side of the following rule is a bit tricky, so let's think about this in terms of coercions. \begin{equation} \label{eq:2} B_1^p ; B_2^q = B_1^p ; \bot^q \qquad \text{if } B_1 \neq B_2 \end{equation} Suppose $p=\ell_1, q = \ell_2$ and these two labeled types come from the threesomes \[ \star \overset{B_1^{\ell_1}}{\Longrightarrow} \star \overset{B_2^{\ell_2}}{\Longrightarrow} B_2 \] The corresponding coercion sequence is \[ B_1?^{\ell_1} ; B_1! ; B_2?^{\ell_2} \] which reduces to \[ B_1?^{\ell_1} ; \mathsf{Fail}^{\ell_2} \] and that corresponds to the labeled type for errors, $B_1^{\ell_1}; \bot^{\ell_2}$. We also need to consider a mismatch between basic types and function types: \begin{align} B^p; (P \to^q Q) &= B^q; \bot^q \\ (P \to^p Q); B^q &= (\star \to \star)^p; \bot^q \end{align} The rule for labeled function types takes the label $p$ for the label in the result and it recursively composes the domain and codomain types. The contra-variance in the parameter type is important for getting the right blame and coincides with the contra-variance in the reduction rule for composing function coercions. \begin{equation} \label{eq:4} (P_1 \to^p P_2) ; (Q_1 \to^q Q_2) = (Q_1; P_1) \to^p (P_2; Q_2) \end{equation} The following figure shows an example similar to the previous figure, but with function types instead of pair types. The analogy with real trees and line-of-sight breaks down because you have to flip to looking at the trees from back-to-front instead of front-to-back for negative positions within the type.

Lastly we need several rules to handle when the error type is on the left or right. \begin{align} (I^p; \bot^\ell); Q &= (I^p; \bot^\ell) \\ P; (I^q; \bot^\ell) &= I^p ; \bot^\ell \qquad \text{if } I \sim P \text{ and } \mathit{label}(P) = p \\ P; (I^q; \bot^\ell) &= I^p ; \bot^q \qquad \text{if } I \not\sim P \text{ and } \mathit{label}(P) = p \end{align}

What's extra cool about labeled types and their composition function is that each rule covers many different rules if you were to formulate them in terms of coercions. For example, the single rule $ B^p; B^q = B^p$ covers four situations when viewed as threesomes or coercions: \begin{align*} (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= B \overset{B}{\Longrightarrow} B \\ \iota_B; \iota_B &\longrightarrow \iota_B \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= \star \overset{B^\ell}{\Longrightarrow} B \\ B?^\ell ; \iota_B &\longrightarrow B?^\ell \\ (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= B \overset{B}{\Longrightarrow} \star \\ \iota_B; B! &\longrightarrow B! \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= \star \overset{B^\ell}{\Longrightarrow} \star \\ B?^\ell ; B! & \text{ is already in normal form} \end{align*}

We define a threesomes as a source type, middle labeled typed, and a target type. \[ \begin{array}{llcl} \text{threesomes} & \tau & ::= & T \overset{P}{\Longrightarrow} T \\ \end{array} \] We define the sequencing of threesomes as follows \[ (T_1 \overset{P}{\Longrightarrow} T_2); (T_2 \overset{Q}{\Longrightarrow}T_3) = T_1 \overset{P;Q}{\Longrightarrow} T_3 \] Similarly, we define the notation $\tau_1 \to \tau_2$ as \[ (T_3 \overset{P}{\Longrightarrow} T_1) \to (T_2 \overset{Q}{\Longrightarrow} T_4) = T_1\to T_2 \overset{P\to Q}{\Longrightarrow} T_3 \to T_4 \]

We can now go back to the ECD machine and replace the coercions with threesomes. Here's the syntax in A-normal form. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x : T = e(e) \mid x = e : \tau \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \tau \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \tau \end{array} \] The cast function, of course, needs to change. \begin{align*} \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = (B \overset{B}{\Longrightarrow} B) \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle, \tau) &= \langle \lambda x{:}T.s,\rho \rangle : \tau \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle : \tau_1, \tau_2) &= \begin{cases} \langle \lambda x{:}T.s,\rho \rangle : (\tau_1;\tau_2)& \text{if } \mathit{middle}(\tau_1;\tau_2) \neq (I^p;\bot^\ell) \\ \mathbf{blame}\,\ell & \text{if } \mathit{middle}(\tau_1;\tau_2) = (I^p;\bot^\ell) \end{cases} \end{align*} And last but not least, here's the transitions for the ECD machine, but with threesomes instead of coercions. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (T_1 \overset{T_1}{\Longrightarrow}T_1,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (\tau_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : \tau_1 \to \tau_2, \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2, \tau_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], ((\tau_1; \tau_2),\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\tau_5,\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_1. s, \rho' \rangle : \tau_3 \to \tau_4,\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, \tau_3) = v'_2, \text{ and} \\ & (\tau_4; \tau_1; \tau_2) = \tau_5 \\[2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*}

We now have an implementation of Lazy UD that is space efficient and relatively efficient in time as well. However, there is one nagging issue regarding the speed of statically-typed code. Notice how there are two transition rules for each kind of function call. The source of the problem is that there are two kinds of values that have function type, closures and closures wrapped in a threesome. In the next post I'll define a unified representation for closures and wrapped closures so that we don't need to dispatch at runtime.

Tuesday, September 18, 2012

Interpretations of the GTLC, Part 2: Space-Efficient Machines

I briefly mentioned in my previous post that there are implementation challenges regarding gradual typing. One of those challenges regards space efficiency. In their paper Space-Efficient Gradual Typing, Herman, Tomb, and Flanagan observed two circumstances in which function casts can lead to unbounded space consumption. First, some programs repeatedly apply casts to the same function, resulting in a build-up of casts around the function. In the following example, each time the function bound to k is passed between even and odd a cast is added, causing a space leak proportional to n.

let rec even(n : Int, k : Dyn->Bool) : Bool =
   if (n = 0) then k(True : Bool => *)
   else odd(n - 1, k : *->Bool => Bool->Bool)
and odd(n : Int, k : Bool->Bool) : Bool =
   if (n = 0) then k(False)
   else even(n - 1, k : Bool->Bool => *->Bool)

Second, some casts break proper tail recursion. Consider the following example in which the return type of even is $\star$ and odd is $\mathsf{Bool}$.

let rec even(n : Int) : Dyn =
  if (n = 0) then True else odd(n - 1) : Bool => *
and odd(n : Int) : Bool =
  if (n = 0) then False else even(n - 1) : * => Bool

Assuming tail call optimization, cast-free versions of the even and odd functions require only constant space, but because the call to even is no longer a tail call, the run-time stack grows with each call and space consumption is proportional to n.

Herman et al.'s solution to this space-efficiency problem relies on Henglein's Coercion Calculus. This calculus defines a set of combinators, called coercions, that can be used to express casts. The key advantage of coercions is that an arbitrarily long sequence of coercions reduces to a sequence of at most three coercions. The following defines the syntax and typing rules for coercions. The two most important coercions are the injection coercion $I!$ from $I$ to $\star$ and the projection coercion $I?^\ell$ from $\star$ to $I$. The definition of the injectable types $I$ depends on the blame tracking strategy: \begin{align*} I & ::= B \mid \star \to \star & (UD) \\ I & ::= B \mid T \to T & (D) \end{align*} \begin{gather*} \begin{array}{llcl} \text{coercions} & c & ::= & \iota_T \mid I! \mid I?^\ell \mid c\to c \mid c;c \mid \mathsf{Fail}^\ell \end{array} \\[2ex] \frac{}{\vdash \iota_T : T \Rightarrow T} \qquad \frac{}{\vdash T! : T \Rightarrow \star} \qquad \frac{}{\vdash T?^\ell : \star \Rightarrow T} \\[2ex] \frac{\vdash c_1 : T_{21} \Rightarrow T_{11} \quad \vdash c_2 : T_{12} \Rightarrow T_{22}} {\vdash c_1 \to c_2 : T_{11} \to T_{12} \Rightarrow T_{21} \to T_{22}} \\[2ex] \frac{\vdash c_1 : T_1 \Rightarrow T_2 \quad \vdash c_2 : T_2 \Rightarrow T_3} {\vdash c_1 ; c_2 : T_1 \Rightarrow T_3} \qquad \frac{}{\vdash \mathsf{Fail}^\ell : T_1 \Rightarrow T_2} \end{gather*} We sometimes drop the subscript on the $\iota_T$ when the type $T$ doesn't matter.

The way in which casts are compiled to coercions depends on whether you're using the D or UD blame tracking strategy. Here's the compilation function for D. \begin{align*} \mathcal{C}_D(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_D(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_D(\star \Rightarrow^\ell I) &= I?^\ell \\ \mathcal{C}_D(I \Rightarrow^\ell \star) &= I! \\ \mathcal{C}_D(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_D(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} The compilation function for UD is a bit more complicated because the definition of injectable type is more restrictive. \begin{align*} \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell B) &= B?^\ell \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_3 \to T_4) &= (\star \to \star)?^\ell; (c_1 \to c_2) \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell \star) \\ \text{and } & c_2 = \mathcal{C}_D(\star \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell \star) &= B! \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell \star) &= (c_1 \to c_2) ; (\star \to \star)! \\ \text{where } & c_1 = \mathcal{C}_D(\star \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell \star) \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*}

We identify coercion sequencing up to associativity and identity: \begin{align*} c_1 ; (c_2; c_3) &= (c_1 ; c_2); c_3 \\ (c ; \iota) &= (\iota ; c) = c \end{align*} The following are the reduction rules for coercions. The compilation function $\mathcal{C}$ depends on the choice of blame tracking strategy (D or UD). \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (c_{11} \to c_{12}); (c_{21} \to c_{22}) & \longrightarrow (c_{21};c_{11}) \to (c_{12}; c_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ \end{align*}

The last reduction rule refers to $\overline{c}$, a subset of the coercions that we refer to as wrapper coercions. We define wrapper coercions and coercions in normal form $\hat{c}$ as follows. \[ \begin{array}{llcl} \text{optional injections} & \mathit{inj} & ::= & \iota \mid I! \\ \text{optional function coercions} & \mathit{fun} & ::= & \iota \mid \hat{c} \to \hat{c} \\ \text{optional projections} & \mathit{proj} & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \mathit{fun}; \mathit{inj} \\ \text{normal coercions} & \hat{c} & ::= & \mathit{proj} ; \mathit{fun}; \mathit{inj} \mid \mathit{proj}; \mathsf{Fail}^\ell \end{array} \] Here we can easily see that coercions in normal form can always be represented by a coercion with a length of at most three.

Theorem (Strong Normalization for Coercions) For any coercion $c$, there exists a $\hat{c}$ such that $c \longrightarrow^{*} \hat{c}$.

With the Coercion Calculus in hand, we can define a space-efficient abstract machine. This machine is a variant of my favorite abstract machine for the lambda calculus, the ECD machine on terms in A-normal form. Herman et al. define an efficient reduction semantics that relies on mutually-recursive evaluation contexts to enable the simplification of coercions in tail position. Our choice of the ECD machine allows us to deal with coercions in tail position in a more straightforward way. Here's the syntax for our coercion-based calculus in A-normal form. The two main additions are the cast definition and the tail-call statement. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x:T = e(e) \mid x = e : \hat{c}\\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \hat{c} \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \overline{c} \end{array} \]

The evaluation function $V$ maps expressions and environments to values in the usual way. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho \rangle \end{align*} The following auxiliary function applies a coercion to a value. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota_B \\ \mathbf{blame}\,\ell & \text{if } \hat{c} = \mathsf{Fail}^\ell \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c_2}) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c_2})= \iota_B \\ \mathbf{blame}\,\ell & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \mathsf{Fail}^\ell \\ \tilde{v} : \overline{c_3} & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \overline{c_3} \text{ and } \overline{c_3} \neq \mathsf{Fail}^\ell \end{cases} \end{align*}

The machine state has the form $(s,\rho,\kappa)$, where the stack $\kappa$ is essentially a list of frames. The frames are somewhat unusual in that they include a pending coercion. We also need a pending coercion for the empty stack, so we define stacks as follows. \[ \begin{array}{llcl} & \sigma & ::= & [] \mid (x,s,\rho)::\kappa \\ \text{stacks} & \kappa & ::= & (c,\sigma) \end{array} \] The machine has seven transition rules to handle normal execution and two rules to handle cast errors. \begin{align*} (x:T = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (\iota_T,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (c_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : (c_1 \to c_2), \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2,c_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : c; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,e, \rho, (c,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], (\hat{c_3},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2\\ & (c_1; c_2) \longrightarrow^{*} \hat{c_3} \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\hat{c_5},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle : (c_3 \to c_4),\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, c_3) = v'_2, \text{ and} \\ & (c_4; c_1; c_2) \longrightarrow^{*} \hat{c_5} \\[2ex] (x = e : c; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (c,\sigma)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \end{align*}

The space-efficiency of this machine comes from two places. First, the values only ever include a single cast wrapped around a simple value. The cast function maintains this invariant by normalizing whenever a cast is applied to an already-casted value. The second place is the transition rule for tail calls. The coercion in tail position $c_1$ is sequenced with the pending coercion $c_2$ and normalized to $c_3$, which becomes the new pending coercion.

While this machine is space efficient, it is not efficient with respect to time. The reason is that naive coercion reduction is an expensive process: one must search for a redex, reduce it, and plug the contractum back in. Furthermore, the associativity of coercions makes searching for a redex more challenging. In the next post I'll discuss a recursive function that directly maps a sequence of two coercions in normal form to its normal form, using ideas from the paper Threesomes, With and Without Blame I coauthored with Philip Wadler.

Monday, September 17, 2012

Interpretations of the Gradually-Typed Lambda Calculus, Part 1

I just got back from Copenhagen, where I gave a tutorial on gradual typing at the Workshop on Scheme and Functional Programming. I very much enjoyed the workshop and giving the tutorial. Thank you for the invitation Olivier!

For those of you who couldn't be there, this series of blog posts will include the material from my tutorial. For those of you who were there, this series will include some bonus material: an efficient machine for "Eager D" based on recent work by Ronald Garcia and myself.

When I first began working on gradual typing in 2005 and 2006, my focus was on the type system. The main pieces of the type system fell into place that first year, and ever since then I've been thinking about the dynamic semantics. It turns out there are many design choices and implementation challenges regarding the dynamic semantics. In this post I'll restrict my attention to the gradually-typed lambda calculus, as many issues already arise in that setting. I'll quickly review the syntax and type system, then move on to discuss the dynamic semantics.

The following defines the syntax for the gradually-typed lambda calculus. Here I'm writing the dynamic type as $\star$ . Also, note that a lambda without a type annotation on its parameter is shorthand for a lambda whose parameter is annotated with $\star$ . $\begin{array}{lrcl} \text{basic types} & B & ::= & \mathsf{Int} \mid \mathsf{Bool} \\ \text{types} & T & ::= & B \mid T \to T \mid \star \\ \text{variables} & x,y,z \\ \text{integers} & n \\ \text{constants}& k & ::= & n \mid \mathsf{true} \mid \mathsf{false} \\ \text{operators}&\mathit{op} & ::= & \mathsf{inc} \mid \mathsf{dec} \mid \mathsf{zero?} \\ \text{expressions} & e & ::= & k \mid \mathit{op}(e) \mid x \mid \lambda x{:}T.\, e \mid e\,e \\ & & & \lambda x.\, e \equiv \lambda x{:}\star.\, e \end{array}$

The Gradual Type System and Cast Insertion

The type system of the gradually-typed lambda calculus is quite similar to the simply-typed lambda calculus. The only differences are in the rules for application. Instead of requiring the argument's type to be identical to the function's parameter type, we only require that the types be consistent, written $\sim$ and defined below. We also allow the application of expressions of type $\star$. $\begin{gather*} \frac{\mathrm{typeof}(k) = B}{\Gamma \vdash k : B} \quad \frac{\mathrm{typeof}(\mathit{op}) = B_1 \to B_2 \quad \Gamma \vdash e : T \quad B_1 \sim T}{ \Gamma \vdash \mathit{op}(e) : B_2} \\[2ex] \frac{x:T \in \Gamma}{\Gamma \vdash x : T} \quad \frac{\Gamma,x:T_1 \vdash e : T_2}{\Gamma \vdash \lambda x:T_1.e : T_1 \to T_2} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 : T_1 \to T_3 \\ \Gamma \vdash e_2 : T_2 \end{array} \quad T_1 \sim T_2}{\Gamma \vdash e_1\,e_2 : T_3} \quad \frac{\begin{array}{l}\Gamma \vdash e_1 : \star \\ \Gamma \vdash e_2 : T_2 \end{array}}{\Gamma \vdash e_1\,e_2 : \star} \\[4ex] B \sim B \qquad \star \sim T \qquad T \sim \star \qquad \frac{T_1 \sim T_3 \quad T_2 \sim T_4}{T_1 \to T_2 \sim T_3 \to T_4} \end{gather*}$ The lack of contra-variance in how function parameters are handled in the consistency relation not a mistake. Unlike subtyping, the consistency relation is symmetric, so it wouldn't matter if we wrote $T_3 \sim T_1$ instead of $T_1 \sim T_3$ . Also, consistency is not transitive, which is why we don't use a separate subsumption rule, but instead use consistency in the rules for application.

The dynamic semantics of the gradually-typed lambda calculus is not defined in terms of the surface syntax, but instead it is defined on an intermediate language that extends the simply-typed lambda calculus with explicit casts. $\begin{array}{lrcl} \text{blame labels} & \ell \\ \text{expressions} & e &::=& \ldots \mid e : T \Rightarrow^\ell T \end{array}$ We use a non-standard notation for casts so that they are easier to read, so that they go left to right. The casts are annotated with blame labels, which we treat here as symbols, but in a real implementation would include the static location (line and character position) of the cast. We use a single blame label without any notion of polarity because these casts are really just casts, not contracts between two parties.

Cast insertion is a type-directed translation, so it is the same as the type system with the addition of an output term. $\begin{gather*} \frac{\mathrm{typeof}(k) = B}{\Gamma \vdash k \leadsto k : B} \quad \frac{\begin{array}{l}\mathrm{typeof}(\mathit{op}) = B_1 \to B_2 \\ \Gamma \vdash e \leadsto e' : T \end{array} \quad B_1 \sim T}{ \Gamma \vdash \mathit{op}(e) \leadsto \mathit{op}(e' : T \Rightarrow^\ell B_1): B_2} \\[2ex] \frac{x:T \in \Gamma}{\Gamma \vdash x \leadsto x : T} \quad \frac{\Gamma,x:T_1 \vdash e \leadsto e' : T_2} {\Gamma \vdash (\lambda x:T_1.e) \leadsto (\lambda x:T_1.e') : T_1 \to T_2} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 \leadsto e'_1 : T_1 \to T_3 \\ \Gamma \vdash e_2 \leadsto e'_2 : T_2 \end{array} \quad T_1 \sim T_2}{\Gamma \vdash e_1\,e_2 \leadsto e'_1 \, (e'_2 : T_2 \Rightarrow^\ell T_1): T_3} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 \leadsto e'_1 : \star \\ \Gamma \vdash e_2 \leadsto e'_2 : T_2 \end{array}} {\Gamma \vdash e_1 \, e_2 \leadsto (e'_1: \star \Rightarrow^\ell T_2 \to \star )\,e'_2 : \star} \end{gather*}$

We often abbreviate a pair of casts $(e : T_1 \Rightarrow^{\ell_1} T_2) : T_2 \Rightarrow^{\ell_2} T_3$ to remove the duplication of the middle type as follows $e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} T_3$

Design Choices Regarding the Dynamics

Consider the following example in which a function is cast to the dynamic type and then cast to a type that is inconsistent with the type of the function. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} A few questions immediately arise:

Should a runtime cast error occur during the evaluation of the right-hand side of the let? Or should the runtime error occur later, when f is applied to \textsf{true}?
When the runtime cast error occurs, which cast should be blamed, $\ell_0$ or $\ell_1$ ? More generally, we want to define a subtyping relation to characterize safe casts (casts that never fail), and the specifics of subtyping relation depend on the blame tracking strategy.

Ron, Walid, and I wrote a paper, Exploring the Design Space of Higher-Order Casts (ESOP 2009), that characterized the different answers to the above questions in terms of Henglein's Coercion Calculus. One can choose to check higher-order casts in either a lazy or eager fashion and one can assign blame to only downcasts (D) or the one can share blame between upcasts and downcasts (UD). The semantics of casts with lazy checking is straightforward whereas eager checking is not, so we'll first discuss lazy checking. Also, the semantics of the D approach is slightly simpler than UD, so we'll start with lazy D. We'll delay discussing the Coercion Calculus until we really need it.

The Lazy D Semantics

We'll define an evaluation (partial) function $\mathcal{E}$ that maps a term and environment to a result. That is, we'll give Lazy D a denotational semantics. The values and results are defined as follows: \[ \begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : T \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array} \]

To handle the short-circuiting of evaluation in the case of a cast error (signaled by $\mathbf{blame}\,\ell$ ), we use the following monadic operators: \begin{align*} \mathbf{return}\,v &= v \\ \mathbf{letB}\,X = M\,\mathbf{in}\, N &= \mathbf{case}\,M\,\mathbf{of}\,\\ & \quad\;\; \mathbf{blame}\,\ell \Rightarrow \mathbf{blame}\,\ell \\ & \quad \mid v \Rightarrow [X{:=}v]N \end{align*}

The primitive operators are given their semantics by the $\delta$ function. \begin{align*} \delta(\mathsf{inc},n) &= n + 1 \\ \delta(\mathsf{dec},n) &= n - 1 \\ \delta(\mathsf{zero?},n) &= (n = 0) \end{align*}

In lazy cast checking, when determining whether to signal a cast error, we only compare the heads of the types: \begin{align*} \mathit{hd}(B) &= B \\ \mathit{hd}(T_1 \to T_2) &= \star \to \star \end{align*}

The following auxiliary function, named cast, is the main event. It is defined by cases on the source and target types $T_1$ and $T_2$ . The line for projecting from $\star$ to $T_2$ picks the $\ell$ blame label from the projection (the down-cast), which is what gives this semantics its "D". \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : T_3 \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',T_3,\ell,T_2) \\ \mathsf{cast}(v,T_1,\ell,\star) &= v : T_1 \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The apply auxiliary function performs function application, and is defined by induction on the first parameter. \begin{align*} \mathsf{apply}(F,v_2) &=F(v_2) \\ \mathsf{apply}(v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4,v_2) &= \mathbf{letB}\,X_3 = \mathsf{cast}(v_2,T_3,\ell,T_1)\,\mathbf{in} \\ & \quad \mathbf{letB}\,X_4 = \mathsf{apply}(v,X_3)\,\mathbf{in} \\ & \quad \mathsf{cast}(X_4, T_2, \ell, T_4) \end{align*}

With these auxiliary functions and monadic operators in hand, the definition of the evaluation function $\mathcal{E}$ is straightforward. \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X,T_1 ,\ell, T_2) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad \mathsf{apply}(X_1,X_2) \end{align*} The semantics for the Lazy D Gradually-Typed Lambda Calculus is defined by the following $\mathit{eval}$ partial function. \[ \mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases} \] where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : T_1\to T_2\Rightarrow^\ell T_3\to T_4) &= \mathit{function} \\ \mathit{observe}(v : T \Rightarrow \star) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*}

Exercise: Calculate the output of eval for the example program at the beginning of this post.

Similar to object-oriented languages, we can define a subtyping relation that characterizes when a cast is safe, that is, when a cast will never fail. The following is the subtyping relation for the D semantics. \begin{gather*} \frac{}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*} This subtyping relation is what I expected to see. The dynamic type plays the role of the top element of this ordering and the rule for function types has the usual contra-variance in the parameter type. The Subtyping Theorem connects the dynamic semantics with the subtyping relation.

Theorem (Subtyping) If the cast labeled with $\ell$ in program $e$ respects subtyping, then $\mathit{eval}(e) \neq \mathbf{blame}\,\ell$.

The Lazy UD Semantics

One interpretation of the dynamic type $\star$ is to view it as the following recursive type: \[ \star \equiv \mu \, d.\, \mathsf{Int} + \mathsf{Bool} + (d \to d) \] (See, for example, the chapter on Dynamic Typing in Robert Harper's textbook Practical Foundations for Programming Languages.) In such an interpretation, one can directly convert from $\star \to \star$ to $\star$ , but not, for example, from $\mathsf{Int}\to\mathsf{Int}$ to $\star$ . Instead, one must first convert from $\mathsf{Int}\to\mathsf{Int}$ to $\star \to \star$ and then to $\star$ .

Let I be the subset of types that can be directly injected into $\star$ : \[ I ::= B \mid \star \to \star \] The definition of values for Lazy UD changes to use I instead of T for the values of type $\star$ . \[ \begin{array}{lrcl} \text{values} & v \in V & ::= & k \mid F \mid v : I \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \end{array} \] This change in the definition of value necessitates some changes in the cast function. The second and third-to-last lines below contain most of the changes. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : I \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',I,\ell,T_2) \\ \mathsf{cast}(v,I,\ell,\star) &= v : I \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,\star) &= v : T_{11}\to T_{12} \Rightarrow^\ell \star \to \star \Rightarrow^\ell \star \\ & \text{if } T_{11} \neq \star, T_{12} \neq \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The rest of the definitions for Lazy UD are the same as those for Lazy D. The following is the subtyping relation for Lazy UD. With this subtyping relation, the type $\star$ does not play the role of the top element. Instead, a type $T$ is a subtype of $\star$ if it is a subtype of some injectable type $I$. \begin{gather*} \frac{}{\star <: \star} \qquad \frac{T <: I}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*}

Theorem (Subtyping) If the cast labeled with $\ell$ in program $e$ respects subtyping, then $\mathit{eval}(e) \neq \mathbf{blame}\,\ell$.

Exercise: Calculate the output of the Lazy UD eval for the example program at the beginning of this post.

In the next post I'll turn to the efficient implementation of Lazy D and UD.