Tuesday, September 18, 2012

Interpretations of the GTLC, Part 2: Space-Efficient Machines

I briefly mentioned in my previous post that gradual typing poses some implementation challenges. One of those challenges concerns space efficiency. In their paper Space-Efficient Gradual Typing, Herman, Tomb, and Flanagan observed two circumstances in which function casts can lead to unbounded space consumption. First, some programs repeatedly apply casts to the same function, resulting in a build-up of casts around the function. In the following example, each time the function bound to k is passed between even and odd, a cast is added, causing a space leak proportional to n.

let rec even(n : Int, k : Dyn->Bool) : Bool =
   if (n = 0) then k(True : Bool => *)
   else odd(n - 1, k : *->Bool => Bool->Bool)
and odd(n : Int, k : Bool->Bool) : Bool =
   if (n = 0) then k(False)
   else even(n - 1, k : Bool->Bool => *->Bool)

Second, some casts break proper tail recursion. Consider the following example, in which the return type of even is \(\star\) and that of odd is \(\mathsf{Bool}\).

let rec even(n : Int) : Dyn =
  if (n = 0) then True else odd(n - 1) : Bool => *
and odd(n : Int) : Bool =
  if (n = 0) then False else even(n - 1) : * => Bool

Assuming tail-call optimization, cast-free versions of the even and odd functions require only constant space, but the casts around the recursive calls prevent them from being tail calls, so the run-time stack grows with each call and space consumption is proportional to n.

Herman et al.'s solution to this space-efficiency problem relies on Henglein's Coercion Calculus. This calculus defines a set of combinators, called coercions, that can be used to express casts. The key advantage of coercions is that an arbitrarily long sequence of coercions reduces to a sequence of at most three coercions. The following defines the syntax and typing rules for coercions. The two most important coercions are the injection coercion \(I!\), from \(I\) to \(\star\), and the projection coercion \(I?^\ell\), from \(\star\) to \(I\). The definition of the injectable types \(I\) depends on the blame tracking strategy: \begin{align*} I & ::= B \mid \star \to \star & (UD) \\ I & ::= B \mid T \to T & (D) \end{align*} \begin{gather*} \begin{array}{llcl} \text{coercions} & c & ::= & \iota_T \mid I! \mid I?^\ell \mid c\to c \mid c;c \mid \mathsf{Fail}^\ell \end{array} \\[2ex] \frac{}{\vdash \iota_T : T \Rightarrow T} \qquad \frac{}{\vdash I! : I \Rightarrow \star} \qquad \frac{}{\vdash I?^\ell : \star \Rightarrow I} \\[2ex] \frac{\vdash c_1 : T_{21} \Rightarrow T_{11} \quad \vdash c_2 : T_{12} \Rightarrow T_{22}} {\vdash c_1 \to c_2 : T_{11} \to T_{12} \Rightarrow T_{21} \to T_{22}} \\[2ex] \frac{\vdash c_1 : T_1 \Rightarrow T_2 \quad \vdash c_2 : T_2 \Rightarrow T_3} {\vdash c_1 ; c_2 : T_1 \Rightarrow T_3} \qquad \frac{}{\vdash \mathsf{Fail}^\ell : T_1 \Rightarrow T_2} \end{gather*} We sometimes drop the subscript on \(\iota_T\) when the type \(T\) doesn't matter.
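
To make the calculus concrete, here's a minimal sketch of the coercion syntax as an OCaml datatype. The representation and constructor names are my own; the post doesn't fix any particular implementation.

type ty =
  | Base of string               (* base types B, e.g. Base "Bool" *)
  | Dyn                          (* the dynamic type * *)
  | Arrow of ty * ty             (* function types T -> T *)

type blame = string              (* blame labels l *)

type coercion =
  | Id of ty                     (* iota_T *)
  | Inj of ty                    (* I! : inject I into * *)
  | Proj of ty * blame           (* I?^l : project * to I *)
  | Fun of coercion * coercion   (* c -> c *)
  | Seq of coercion * coercion   (* c ; c *)
  | Fail of blame                (* Fail^l *)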

The way in which casts are compiled to coercions depends on whether you're using the D or UD blame tracking strategy. Here's the compilation function for D. \begin{align*} \mathcal{C}_D(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_D(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_D(\star \Rightarrow^\ell I) &= I?^\ell \\ \mathcal{C}_D(I \Rightarrow^\ell \star) &= I! \\ \mathcal{C}_D(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_D(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} The compilation function for UD is a bit more complicated because the definition of injectable types is more restrictive. \begin{align*} \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell B) &= B?^\ell \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_3 \to T_4) &= (\star \to \star)?^\ell; (c_1 \to c_2) \\ \text{where } & c_1 = \mathcal{C}_{\mathit{UD}}(T_3 \Rightarrow^\ell \star) \\ \text{and } & c_2 = \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell \star) &= B! \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell \star) &= (c_1 \to c_2) ; (\star \to \star)! \\ \text{where } & c_1 = \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_{\mathit{UD}}(T_2 \Rightarrow^\ell \star) \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_{\mathit{UD}}(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_{\mathit{UD}}(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*}
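
As a sketch, here is the D compilation function in OCaml, continuing the datatype above. The name compile_d is my own, and the match order makes the final case apply exactly when the head constructors are inconsistent.

let rec compile_d (t1 : ty) (t2 : ty) (l : blame) : coercion =
  match t1, t2 with
  | Dyn, Dyn -> Id Dyn
  | Base b1, Base b2 when b1 = b2 -> Id t1
  | Dyn, _ -> Proj (t2, l)       (* under D, every non-Dyn type is injectable *)
  | _, Dyn -> Inj t1
  | Arrow (t11, t12), Arrow (t21, t22) ->
      (* contravariant in the domain, covariant in the codomain *)
      Fun (compile_d t21 t11 l, compile_d t12 t22 l)
  | _, _ -> Fail l               (* hd(T1) inconsistent with hd(T2) *)

The UD version follows the same pattern, with the extra \((\star\to\star)\) injections and projections around function types.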

We identify coercion sequencing up to associativity and identity: \begin{align*} c_1 ; (c_2; c_3) &= (c_1 ; c_2); c_3 \\ (c ; \iota) &= (\iota ; c) = c \end{align*} The following are the reduction rules for coercions. The compilation function \(\mathcal{C}\) depends on the choice of blame tracking strategy (D or UD). \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (c_{11} \to c_{12}); (c_{21} \to c_{22}) & \longrightarrow (c_{21};c_{11}) \to (c_{12}; c_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*}
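
These four rules translate directly into a one-step contraction function. Here's a sketch that fires a rule at the root of a coercion, taking the compilation function and the wrapper test (a version is sketched after the normal-form grammar below) as parameters; a full normalizer would still have to search for redexes, modulo associativity.

let contract ~compile ~is_wrapper (c : coercion) : coercion option =
  match c with
  | Seq (Inj i1, Proj (i2, l)) -> Some (compile i1 i2 l)
  | Seq (Fun (c11, c12), Fun (c21, c22)) ->
      Some (Fun (Seq (c21, c11), Seq (c12, c22)))
  | Seq (Fail l, _) -> Some (Fail l)
  | Seq (cbar, Fail l) when is_wrapper cbar -> Some (Fail l)
  | _ -> None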

The last reduction rule refers to \(\overline{c}\), a subset of the coercions that we refer to as wrapper coercions. We define wrapper coercions and coercions in normal form \(\hat{c}\) as follows. \[ \begin{array}{llcl} \text{optional injections} & \mathit{inj} & ::= & \iota \mid I! \\ \text{optional function coercions} & \mathit{fun} & ::= & \iota \mid \hat{c} \to \hat{c} \\ \text{optional projections} & \mathit{proj} & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \mathit{fun}; \mathit{inj} \\ \text{normal coercions} & \hat{c} & ::= & \mathit{proj} ; \mathit{fun}; \mathit{inj} \mid \mathit{proj}; \mathsf{Fail}^\ell \end{array} \] From this grammar it is easy to see that a coercion in normal form is a sequence of at most three primitive coercions.
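
These grammars can be read off as predicates. Here's a sketch that assumes sequences are kept right-nested and that \(\iota\) pieces are simply dropped, as the associativity and identity laws allow.

let is_inj = function Id _ | Inj _ -> true | _ -> false
let is_proj = function Id _ | Proj _ -> true | _ -> false

let rec is_fun = function
  | Id _ -> true
  | Fun (c1, c2) -> is_normal c1 && is_normal c2
  | _ -> false

(* wrapper coercions: fun ; inj, with either piece possibly dropped *)
and is_wrapper = function
  | Seq (f, i) -> is_fun f && is_inj i
  | c -> is_fun c || is_inj c

(* normal coercions: proj ; fun ; inj  or  proj ; Fail *)
and is_normal = function
  | Seq (p, rest) when is_proj p ->
      (match rest with Fail _ -> true | _ -> is_wrapper rest)
  | Fail _ | Proj _ -> true
  | c -> is_wrapper c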

Theorem (Strong Normalization for Coercions) For any coercion \(c\), there exists a \(\hat{c}\) such that \(c \longrightarrow^{*} \hat{c}\).

With the Coercion Calculus in hand, we can define a space-efficient abstract machine. This machine is a variant of my favorite abstract machine for the lambda calculus, the ECD machine on terms in A-normal form. Herman et al. define an efficient reduction semantics that relies on mutually recursive evaluation contexts to enable the simplification of coercions in tail position. Our choice of the ECD machine allows us to deal with coercions in tail position in a more straightforward way. Here's the syntax for our coercion-based calculus in A-normal form. The two main additions are the cast definition and the tail-call statement. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x:T = e(e) \mid x = e : \hat{c}\\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \hat{c} \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \overline{c} \end{array} \]
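
Here's a sketch of this ANF syntax as OCaml datatypes, continuing the earlier sketches; the constructor names are my own.

type expr =
  | Const of int                        (* constants k (integers, say) *)
  | Var of string
  | Lam of string * ty * stmt           (* lambda x:T. s *)
and defn =
  | Op of string * string * expr list   (* x = op(e) *)
  | App of string * ty * expr * expr    (* x:T = e(e) *)
  | CastDef of string * expr * coercion (* x = e : c-hat *)
and stmt =
  | Def of defn * stmt                  (* d; s *)
  | Return of expr                      (* return e *)
  | TailCall of expr * expr * coercion  (* return e(e) : c-hat *)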

The evaluation function \(V\) maps expressions and environments to values in the usual way. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho \rangle \end{align*} The following auxiliary function applies a coercion to a value. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota_B \\ \mathbf{blame}\,\ell & \text{if } \hat{c} = \mathsf{Fail}^\ell \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c_2}) &= \begin{cases} \tilde{v} & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \iota_B \\ \mathbf{blame}\,\ell & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \mathsf{Fail}^\ell \\ \tilde{v} : \overline{c_3} & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \overline{c_3} \text{ and } \overline{c_3} \neq \mathsf{Fail}^\ell \end{cases} \end{align*}
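
Here's a sketch of the cast function over an OCaml value representation, continuing the datatypes above. Simple, Wrapped, and the Blame exception are my own names, and normalize stands for any function implementing the reduction rules (passed in as a parameter).

type env = (string * value) list        (* environments rho *)
and value =
  | Simple of simple_value              (* v~ *)
  | Wrapped of simple_value * coercion  (* v~ : c-bar *)
and simple_value =
  | ConstV of int
  | Closure of string * ty * stmt * env (* <lambda x:T. s, rho> *)

exception Blame of blame

let cast ~normalize (v : value) (c : coercion) : value =
  match v with
  | Simple sv ->
      (match c with
       | Id (Base _) -> v                    (* iota_B: nothing to do *)
       | Fail l -> raise (Blame l)
       | _ -> Wrapped (sv, c))
  | Wrapped (sv, cbar) ->
      (match normalize (Seq (cbar, c)) with  (* keep at most one wrapper *)
       | Id (Base _) -> Simple sv
       | Fail l -> raise (Blame l)
       | c3 -> Wrapped (sv, c3))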

The machine state has the form \((s,\rho,\kappa)\), where the stack \(\kappa\) is essentially a list of frames. The frames are somewhat unusual in that each comes paired with a pending coercion. Because we also need a pending coercion for the empty stack, we define stacks as follows. \[ \begin{array}{llcl} \text{frame lists} & \sigma & ::= & [] \mid (x,s,\rho)::\kappa \\ \text{stacks} & \kappa & ::= & (c,\sigma) \end{array} \] The machine has seven transition rules to handle normal execution and two rules to handle cast errors. \begin{align*} (x:T = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (\iota_T,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T.\, s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (c_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2.\, s', \rho' \rangle : (c_1 \to c_2), \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2,c_1) = v'_2\\ (x = \mathit{op}(e); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : c; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,e, \rho, (c,(x,s,\rho')::\kappa)) & \longmapsto (s, \rho'[x{:=}v'], \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,e_1(e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], (\hat{c_3},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T.\, s, \rho' \rangle, V(e_2,\rho) = v_2,\\ & \text{and } (c_1; c_2) \longrightarrow^{*} \hat{c_3} \\ (\mathbf{return}\,e_1(e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\hat{c_5},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T.\, s, \rho' \rangle : (c_3 \to c_4),\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, c_3) = v'_2, \text{ and} \\ & (c_4; c_1; c_2) \longrightarrow^{*} \hat{c_5} \\[2ex] (x = e : c; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (c,\sigma)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \end{align*}

The space efficiency of this machine comes from two sources. First, a value only ever includes a single wrapper coercion around a simple value; the cast function maintains this invariant by normalizing whenever a coercion is applied to an already-wrapped value. Second, in the transition rules for tail calls, the coercion \(c_1\) in tail position is sequenced with the pending coercion \(c_2\) and normalized, and the result becomes the new pending coercion, so no stack frame is pushed.
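
To make the tail-call bookkeeping concrete, here's a sketch of the stack representation and the tail-call transition on it, continuing the sketches above; normalize is again an assumed parameter.

(* A stack kappa = (c, sigma) pairs the coercion pending for the current
   call with the caller frames, each holding its own pending coercion. *)
type frame = { var : string; body : stmt; env : env; pending : coercion }
type stack = coercion * frame list

(* On a tail call `return e1(e2) : c1`, compose the tail coercion c1 with
   the pending coercion c2 and normalize; no frame is pushed. *)
let tail_call ~normalize (c1 : coercion) ((c2, frames) : stack) : stack =
  (normalize (Seq (c1, c2)), frames)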

While this machine is space efficient, it is not efficient with respect to time. The reason is that naive coercion reduction is an expensive process: one must search for a redex, reduce it, and plug the contractum back in. Furthermore, the associativity of coercions makes searching for a redex more challenging. In the next post I'll discuss a recursive function that directly maps a sequence of two coercions in normal form to its normal form, using ideas from the paper Threesomes, With and Without Blame, which I coauthored with Philip Wadler.
