I just got back from Copenhagen, where I gave a tutorial on gradual
typing at the Workshop on Scheme and Functional Programming.
I very much enjoyed the workshop and giving the tutorial.
Thank you for the invitation, Olivier!
For those of you who couldn't be there, this series of blog posts will
include the material from my tutorial. For those of you who were
there, this series will include some bonus material: an efficient
machine for "Eager D" based on recent work by Ronald Garcia and
myself.
When I first began working on gradual typing in 2005 and 2006, my
focus was on the type system. The main pieces of the type system fell
into place that first year, and ever since then I've been thinking
about the dynamic semantics. It turns out there are many design
choices and implementation challenges regarding the dynamic
semantics. In this post I'll restrict my attention to the
gradually-typed lambda calculus, as many issues already arise in that
setting. I'll quickly review the syntax and type system, then move on
to discuss the dynamic semantics.
The following defines the syntax for the gradually-typed lambda
calculus. Here I'm writing the dynamic type as \star. Also, note
that a lambda without a type annotation on its parameter is shorthand
for a lambda whose parameter is annotated with \star.
The Gradual Type System and Cast Insertion
The type system of the gradually-typed lambda calculus is quite
similar to the simply-typed lambda calculus. The only differences are
in the rules for application. Instead of requiring the argument's type
to be identical to the function's parameter type, we only require that
the types be consistent, written T_1 \sim T_2: \star is consistent
with every type, each base type is consistent with itself, and two
function types are consistent when their parameter types and their
return types are consistent. We also allow the application of
expressions of type \star.
The lack of contra-variance in how function parameters are handled in
the consistency relation is not a mistake. Unlike subtyping, the
consistency relation is symmetric, so it wouldn't matter if we
wrote T_2 \sim T_1 instead of T_1 \sim T_2. Also, consistency is
not transitive, which is why we don't use a separate subsumption rule,
but instead use consistency in the rules for application.
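Since consistency does the work that type equality does in the
simply-typed lambda calculus, it may help to see it as a decision
procedure. Here is a minimal Python sketch; the encoding of types as
strings and tuples is my own, not part of the calculus.

```python
# Types: "Int", "Bool" (base types), "star" (the dynamic type),
# and ("->", T1, T2) for function types. This encoding is an
# assumption of this sketch, not part of the calculus itself.

def consistent(t1, t2):
    """Decide T1 ~ T2."""
    if t1 == "star" or t2 == "star":
        return True                     # star is consistent with every type
    if t1 == t2 and not isinstance(t1, tuple):
        return True                     # Int ~ Int, Bool ~ Bool
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # covariant in BOTH positions: no contra-variance here
        return consistent(t1[1], t2[1]) and consistent(t1[2], t2[2])
    return False
```

Note that the non-transitivity is easy to observe with this function:
`consistent("Int", "star")` and `consistent("star", "Bool")` both hold,
yet `consistent("Int", "Bool")` is false.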
The dynamic semantics of the gradually-typed lambda calculus is not
defined in terms of the surface syntax, but instead it is defined on
an intermediate language that extends the simply-typed lambda calculus
with explicit casts.
We use a non-standard notation for casts so that they are easier to
read: a cast e : T_1 \Rightarrow^{\ell} T_2 goes left to right, from
source type to target type. The casts are annotated with
blame labels, which we treat here as symbols, but in a real
implementation would include the static location (line and character
position) of the cast. We use a single blame label without any notion
of polarity because these casts are really just casts, not contracts
between two parties.
Cast insertion is a type-directed translation, so it is the same as
the type system with the addition of an output term.
We often abbreviate a pair of casts by removing the duplication of the
middle type as follows:
\begin{align*}
e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} T_3
& \;\equiv\; (e : T_1 \Rightarrow^{\ell_1} T_2) : T_2 \Rightarrow^{\ell_2} T_3
\end{align*}
Design Choices Regarding the Dynamics
Consider the following example in which a function is cast to the
dynamic type and then cast to a type that is inconsistent with the
type of the function.
\begin{align*}
& \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x)
: \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star
\Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\
& \mathsf{in} \, f\, \mathsf{true}
\end{align*}
A few questions immediately arise:
- Should a runtime cast error occur during the evaluation of the
right-hand side of the let? Or should the runtime error
occur later, when f is applied to \textsf{true}?
- When the runtime cast error occurs, which cast should be
blamed, \ell_0 or \ell_1? More generally, we want to define a
subtyping relation to characterize safe casts (casts that never
fail), and the specifics of the subtyping relation depend on the blame
tracking strategy.
Ron, Walid, and I wrote a paper, Exploring the Design Space of
Higher-Order Casts (ESOP 2009), that characterized the different
answers to the above questions in terms of Henglein's Coercion
Calculus. One can choose to check higher-order casts in either a
lazy or eager fashion, and one can assign blame to only
downcasts (D) or share blame between upcasts and downcasts
(UD). The semantics of casts with lazy checking is straightforward
whereas eager checking is not, so we'll first discuss lazy checking.
Also, the semantics of the D approach is slightly simpler than UD, so
we'll start with lazy D. We'll delay discussing the Coercion Calculus
until we really need it.
The Lazy D Semantics
We'll define an evaluation (partial) function \mathcal{E} that maps
a term and environment to a result. That is, we'll give Lazy D a
denotational semantics. The values and results are defined as
follows:
\begin{array}{lrcl}
& F & \in & V \to_c R \\
\text{values} & v \in V & ::= & k \mid F \mid v : T \Rightarrow^\ell \star
\mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \\
\text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell
\end{array}
To handle the short-circuiting of evaluation in the case of a cast
error (signaled by \mathbf{blame}\,\ell),
we use the following monadic operators:
\begin{align*}
\mathbf{return}\,v &= v \\
\mathbf{letB}\,X = M\,\mathbf{in}\, N &=
\mathbf{case}\,M\,\mathbf{of}\,\\
& \quad\;\; \mathbf{blame}\,\ell \Rightarrow \mathbf{blame}\,\ell \\
& \quad \mid v \Rightarrow [X{:=}v]N
\end{align*}
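In Python, one might sketch these monadic operators as follows; the
names Blame, ret, and letB are mine, for this sketch only.

```python
# A sketch of the result monad: a result is either a plain value or a
# Blame(label) signaling a cast error; letB threads results through a
# computation, short-circuiting on blame.

class Blame:
    def __init__(self, label):
        self.label = label
    def __repr__(self):
        return f"blame {self.label}"

def ret(v):
    # return: a value is already a result
    return v

def letB(m, body):
    # letB X = M in N  is written  letB(M, lambda X: N)
    if isinstance(m, Blame):
        return m          # propagate the cast error, skipping N
    return body(m)
```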
The primitive operators are given their semantics by
the \delta function.
\begin{align*}
\delta(\mathsf{inc},n) &= n + 1 \\
\delta(\mathsf{dec},n) &= n - 1 \\
\delta(\mathsf{zero?},n) &= (n = 0)
\end{align*}
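As executable Python, representing the integers as plain Python ints:

```python
# The delta function: semantics of the three primitive operators.
def delta(op, n):
    if op == "inc":
        return n + 1
    if op == "dec":
        return n - 1
    if op == "zero?":
        return n == 0
    raise ValueError(f"unknown primitive operator: {op}")
```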
In lazy cast checking, when determining whether to signal a cast
error, we only compare the heads of the types:
\begin{align*}
\mathit{hd}(B) &= B \\
\mathit{hd}(T_1 \to T_2) &= \star \to \star
\end{align*}
The following auxiliary function, named cast, is the main
event. It is defined by cases on the source and target types T_1
and T_2. The line for projecting from \star to T_2 picks the
blame label from the projection (the down-cast), which is what gives
this semantics its "D".
\begin{align*}
\mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\
\mathsf{cast}(v,B,\ell,B) &= v \\
\mathsf{cast}(v,\star,\ell,\star) &= v \\
\mathsf{cast}(v,\star,\ell,T_2) &=
\mathbf{case}\,v\,\mathbf{of}\,
(v' : T_3 \Rightarrow^{\ell'} \star) \Rightarrow \\
& \qquad \mathsf{cast}(v',T_3,\ell,T_2) \\
\mathsf{cast}(v,T_1,\ell,\star) &= v : T_1 \Rightarrow^\ell \star \\
\mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &=
v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22}
\end{align*}
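The case analysis above can be sketched as a self-contained Python
function. The encoding is my own: base types are "Int"/"Bool", the
dynamic type is "star", function types are tuples ("->", T1, T2), and
a cast value v : T1 \Rightarrow^\ell T2 is the tuple
("vcast", v, T1, l, T2).

```python
# Lazy D cast as a Python sketch (encoding is an assumption of this post).

class Blame:
    def __init__(self, label): self.label = label
    def __repr__(self): return f"blame {self.label}"

def hd(t):
    # only the head of a type is inspected: lazy checking
    return ("->", "star", "star") if isinstance(t, tuple) else t

def consistent(t1, t2):
    if t1 == "star" or t2 == "star": return True
    if t1 == t2 and not isinstance(t1, tuple): return True
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return consistent(t1[1], t2[1]) and consistent(t1[2], t2[2])
    return False

def cast(v, t1, label, t2):
    if not consistent(hd(t1), hd(t2)):
        return Blame(label)               # shallow (lazy) check of the heads
    if t1 == t2 and not isinstance(t1, tuple):
        return v                          # B => B  and  star => star
    if t1 == "star":
        _, v2, t3, _, _ = v               # v is an injection v2 : T3 => star
        return cast(v2, t3, label, t2)    # "D": reuse the projection's label
    return ("vcast", v, t1, label, t2)    # injection or function proxy
```

For example, casting 5 from Int to \star and back to Int returns 5,
while projecting it out at Bool instead blames the projection's label.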
The apply auxiliary function performs function application,
and is defined by induction on the first parameter.
\begin{align*}
\mathsf{apply}(F,v_2) &=F(v_2) \\
\mathsf{apply}(v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4,v_2) &=
\mathbf{letB}\,X_3 = \mathsf{cast}(v_2,T_3,\ell,T_1)\,\mathbf{in} \\
& \quad \mathbf{letB}\,X_4 = \mathsf{apply}(v,X_3)\,\mathbf{in} \\
& \quad \mathsf{cast}(X_4, T_2, \ell, T_4)
\end{align*}
With these auxiliary functions and monadic operators in hand, the
definition of the evaluation function
is straightforward.
\begin{align*}
\mathcal{E}(k,\rho) &= \mathbf{return}\, k \\
\mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\
\mathcal{E}(\lambda x{:}T.\,e, \rho) &=
\mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\
\mathcal{E}(\mathit{op}(e),\rho) &=
\mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\
\mathcal{E}(e : T_1 \Rightarrow^\ell T_2,\rho) &=
\mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X,T_1 ,\ell, T_2) \\
\mathcal{E}(e_1\,e_2,\rho) &=
\mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\
& \quad \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\
& \quad \mathsf{apply}(X_1,X_2)
\end{align*}
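Putting the pieces together, here is the whole Lazy D evaluator as a
self-contained Python sketch. Terms of the cast-inserted intermediate
language are encoded as tuples ("const", k), ("var", x), ("lam", x, e),
("op", op, e), ("app", e1, e2), and ("cast", e, T1, l, T2); all of
these encodings and names are assumptions of this sketch, not part of
the calculus.

```python
# A self-contained sketch of the Lazy D evaluator E.
# Types: "Int"/"Bool", "star", ("->", T1, T2).
# Values: Python ints/bools, Python functions (closures),
# and cast values ("vcast", v, T1, label, T2).

class Blame:
    def __init__(self, label): self.label = label
    def __repr__(self): return f"blame {self.label}"

def letB(m, body):
    return m if isinstance(m, Blame) else body(m)

def delta(op, n):
    return {"inc": n + 1, "dec": n - 1, "zero?": n == 0}[op]

def hd(t):
    return ("->", "star", "star") if isinstance(t, tuple) else t

def consistent(t1, t2):
    if t1 == "star" or t2 == "star": return True
    if t1 == t2 and not isinstance(t1, tuple): return True
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return consistent(t1[1], t2[1]) and consistent(t1[2], t2[2])
    return False

def cast(v, t1, label, t2):
    if not consistent(hd(t1), hd(t2)):
        return Blame(label)
    if t1 == t2 and not isinstance(t1, tuple):
        return v
    if t1 == "star":
        _, v2, t3, _, _ = v
        return cast(v2, t3, label, t2)
    return ("vcast", v, t1, label, t2)

def apply(f, arg):
    if callable(f):
        return f(arg)
    # f = v : T1 -> T2 => T3 -> T4: cast the argument, apply, cast the result
    _, v, (_, t1, t2), label, (_, t3, t4) = f
    return letB(cast(arg, t3, label, t1), lambda x3:
           letB(apply(v, x3), lambda x4:
                cast(x4, t2, label, t4)))

def E(e, env):
    tag = e[0]
    if tag == "const":
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "lam":
        return lambda v: E(e[2], {**env, e[1]: v})
    if tag == "op":
        return letB(E(e[2], env), lambda x: delta(e[1], x))
    if tag == "cast":
        _, e1, t1, label, t2 = e
        return letB(E(e1, env), lambda v: cast(v, t1, label, t2))
    if tag == "app":
        return letB(E(e[1], env), lambda f:
               letB(E(e[2], env), lambda x: apply(f, x)))
    raise ValueError(f"unknown term: {e}")
```

Evaluating a cast-inserted program e is then just `E(e, {})`.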
The semantics for the Lazy D Gradually-Typed Lambda Calculus is
defined by the following
partial function.
\mathit{eval}(e) =
\begin{cases}
\mathit{observe(r)} &
\text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\
\bot & \text{otherwise}
\end{cases}
where
\begin{align*}
\mathit{observe}(k) &= k \\
\mathit{observe}(F) &= \mathit{function} \\
\mathit{observe}(v : T_1\to T_2\Rightarrow^\ell T_3\to T_4) &= \mathit{function} \\
\mathit{observe}(v : T \Rightarrow^\ell \star) &= \mathit{dynamic} \\
\mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell
\end{align*}
Exercise:
Calculate the output of eval for the example program
at the beginning of this post.
Similar to object-oriented languages, we can define a subtyping
relation that characterizes when a cast is safe, that is, when a cast
will never fail. The following is the subtyping relation for the D
semantics.
\begin{gather*}
\frac{}{T <: \star} \qquad
\frac{}{B <: B} \qquad
\frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4}
\end{gather*}
This subtyping relation is what I expected to see. The dynamic type plays the
role of the top element of this ordering and the rule for function
types has the usual contra-variance in the parameter type.
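The three rules can be read off directly as a decision procedure.
Here is a Python sketch, with types encoded (my assumption) as
"Int"/"Bool" strings, "star" for the dynamic type, and tuples
("->", T1, T2) for function types.

```python
# The D subtyping relation as a Python sketch.
def subtype(t1, t2):
    if t2 == "star":
        return True                      # T <: star: star is the top element
    if t1 == t2 and not isinstance(t1, tuple):
        return True                      # B <: B
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # contravariant in the parameter, covariant in the result
        return subtype(t2[1], t1[1]) and subtype(t1[2], t2[2])
    return False
```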
The Subtyping Theorem connects the dynamic semantics with
the subtyping relation.
Theorem (Subtyping)
If the cast labeled with
\ell in program e respects subtyping, then \mathit{eval}(e) \neq
\mathbf{blame}\,\ell.
The Lazy UD Semantics
One interpretation of the dynamic type is to
view it as the following recursive type:
\star \equiv \mu \, d.\, \mathsf{Int} + \mathsf{Bool} + (d \to d)
(See, for example, the chapter on Dynamic Typing in Robert Harper's
textbook Practical Foundations for Programming Languages.) In
such an interpretation, one can directly convert from \star\to\star
to \star, but not, for example, from \mathsf{Int}\to\mathsf{Int}
to \star.
Instead, one must first convert from \mathsf{Int}\to\mathsf{Int} to
\star\to\star and then to \star.
Let I be the subset of types that can be directly injected
into \star:
I ::= B \mid \star \to \star
The definition of values for Lazy UD changes to use
I instead of T for the values of type \star.
\begin{array}{lrcl}
\text{values} & v \in V & ::= & k \mid F \mid v : I \Rightarrow^\ell \star
\mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4
\end{array}
This change in the definition of value necessitates some changes in the
cast function. The second and third-to-last lines below
contain most of the changes.
\begin{align*}
\mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\
\mathsf{cast}(v,B,\ell,B) &= v \\
\mathsf{cast}(v,\star,\ell,\star) &= v \\
\mathsf{cast}(v,\star,\ell,T_2) &=
\mathbf{case}\,v\,\mathbf{of}\,
(v' : I \Rightarrow^{\ell'} \star) \Rightarrow \\
& \qquad \mathsf{cast}(v',I,\ell,T_2) \\
\mathsf{cast}(v,I,\ell,\star) &= v : I \Rightarrow^\ell \star \\
\mathsf{cast}(v,T_{11}\to T_{12},\ell,\star) &= v : T_{11}\to T_{12} \Rightarrow^\ell \star \to \star \Rightarrow^\ell \star \\
& \text{if } T_{11}\to T_{12} \neq \star \to \star \\
\mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &=
v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22}
\end{align*}
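The changed cases can be sketched in Python as follows. As in my
other sketches, the string/tuple encoding of types and the
("vcast", v, T1, l, T2) encoding of cast values are assumptions of
this post, not part of the calculus.

```python
# Lazy UD cast as a self-contained Python sketch. Types: "Int"/"Bool",
# "star", ("->", T1, T2). Injectable types I are the base types and
# ("->", "star", "star").

class Blame:
    def __init__(self, label): self.label = label
    def __repr__(self): return f"blame {self.label}"

def hd(t):
    return ("->", "star", "star") if isinstance(t, tuple) else t

def consistent(t1, t2):
    if t1 == "star" or t2 == "star": return True
    if t1 == t2 and not isinstance(t1, tuple): return True
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        return consistent(t1[1], t2[1]) and consistent(t1[2], t2[2])
    return False

DYN_FUN = ("->", "star", "star")

def cast_ud(v, t1, label, t2):
    if not consistent(hd(t1), hd(t2)):
        return Blame(label)
    if t1 == t2 and not isinstance(t1, tuple):
        return v
    if t1 == "star":
        _, v2, i, _, _ = v                    # v = v2 : I => star
        return cast_ud(v2, i, label, t2)
    if t2 == "star" and isinstance(t1, tuple) and t1 != DYN_FUN:
        # factor the injection through star -> star
        proxy = cast_ud(v, t1, label, DYN_FUN)
        return ("vcast", proxy, DYN_FUN, label, "star")
    return ("vcast", v, t1, label, t2)        # inject at I, or function proxy
```

Injecting an Int \to Int function now produces an injection whose
underlying value is a proxy at \star\to\star, so every later
projection at a function type goes through \star\to\star.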
The rest of the definitions for Lazy UD are the same
as those for Lazy D. The following is the subtyping relation for Lazy UD. With this
subtyping relation, the type \star does not play the role of the
top element. Instead, a type $T$ is a subtype of \star if
it is a subtype of some injectable type I.
\begin{gather*}
\frac{}{\star <: \star} \qquad
\frac{T <: I}{T <: \star} \qquad
\frac{}{B <: B} \qquad
\frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4}
\end{gather*}
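As with D, the UD rules read off as a decision procedure. A Python
sketch, with my usual string/tuple encoding of types:

```python
# The Lazy UD subtyping relation as a Python sketch. Injectable I is a
# base type or ("->", "star", "star").
def subtype_ud(t1, t2):
    if t1 == t2:
        return True                      # star <: star, B <: B, T <: T
    if t2 == "star":
        # T <: star iff T <: I for some injectable I
        if not isinstance(t1, tuple):
            return True                  # a base type is itself injectable
        return subtype_ud(t1, ("->", "star", "star"))
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # contravariant parameter, covariant result
        return subtype_ud(t2[1], t1[1]) and subtype_ud(t1[2], t2[2])
    return False
```

In particular, Int \to Int is not below \star here, because
Int \to Int is not a subtype of \star\to\star.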
Theorem (Subtyping)
If the cast labeled with
\ell in program e respects subtyping, then \mathit{eval}(e) \neq
\mathbf{blame}\,\ell.
Exercise:
Calculate the output of the Lazy UD eval for the example program
at the beginning of this post.
In the next post I'll turn to the efficient implementation of Lazy D
and UD.