Consider the following statically-typed function. (The type
⋆ does not occur anywhere in this function.)
λf:Int→Int. x:Int = f(42); return x
We'd like the execution speed of this function to be the same as if
the entire language were statically typed. That is, we don't want
statically-typed parts of a gradually-typed program to pay overhead
because other parts may be dynamically typed. Unfortunately, in the
abstract machines that we've defined so far, there is an overhead. At
the point of a function call, such as
f(42) above, the machine
needs to check whether
f has evaluated to a closure or to a
closure wrapped in a threesome. This act of checking constitutes some
run-time overhead.
Taking a step back, there are two approaches that one sees in the
literature regarding how a cast is applied to a function. One
approach is to build a new function that casts the argument, applies
the old function, and then casts the result. The reduction rule
looks like this:
v : T₁→T₂ ⇒ T₃→T₄   ⟶   λx:T₃. (v (x : T₃ ⇒ T₁)) : T₂ ⇒ T₄
The nice thing about this approach is that there's only one kind of
value of function type, functions! So when it comes to function
application, we only need one reduction rule, good old beta:
(λx:T. e) v ⟶ [x:=v]e
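In a higher-order host language, this wrapper-building approach might be sketched as follows. All the names below (`cast_value`, `cast_function`) are illustrative stand-ins, not from the machine itself, and the first-order casts are reduced to an identity check:

```python
# Sketch of the wrapper-based approach to function casts.
# `cast_value` is a hypothetical stand-in for first-order casts;
# here it only handles the identity case.

def cast_value(v, source, target):
    # Placeholder: a real cast would check or convert; identity here.
    assert source == target, "only identity casts in this sketch"
    return v

def cast_function(v, t1, t2, t3, t4):
    # Cast v : t1 -> t2 to t3 -> t4 by building a new function that
    # casts the argument, applies v, and then casts the result.
    def wrapper(x):
        return cast_value(v(cast_value(x, t3, t1)), t2, t4)
    return wrapper

inc = cast_function(lambda n: n + 1, "Int", "Int", "Int", "Int")
```

The wrapper is itself an ordinary function, which is exactly why only one application rule is needed.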
The other approach is to leave the cast around the function and then
add a second reduction rule for applications.
(v₁ : T₁→T₂ ⇒ T₃→T₄) v₂ ⟶ (v₁ (v₂ : T₃ ⇒ T₁)) : T₂ ⇒ T₄
The nice thing about this approach is that the cast around the
function is easy to access and change, which we took advantage of to
compress sequences of such casts. But as we've already pointed out,
having two kinds of values at function type induces some run-time
overhead, even in parts of the program that are statically typed.
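The per-call dispatch that this second approach forces, even on fully static code, can be pictured with a hypothetical closure representation (the class and function names are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Closure:
    fun: Callable                 # a plain closure

@dataclass
class CastedClosure:
    fun: Callable                 # a closure wrapped in a threesome
    threesome: Any

def apply(f, arg):
    # Every call site must branch on the representation of f,
    # even when the surrounding code is statically typed.
    if isinstance(f, Closure):
        return f.fun(arg)
    else:
        # cast the argument, call, cast the result (casts elided here)
        return f.fun(arg)
```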
Our solution to this conundrum is to use a hybrid representation and
to take advantage of the indirection that is already present in a
function call. Instead of having two kinds of values at function type,
we have only one: a closure that includes an optional threesome:
⟨λx:T. s, ρ, τ⊥⟩
When a closure is first created, there is no threesome. Later,
when a closure is cast, the threesome is added.
V(λx:T. s, ρ) = ⟨λx:T. s, ρ, ⊥⟩
The one transition rule for function application passes the optional
threesome as a special parameter, here named c, to the function. In
the case of an uncast closure, the function simply ignores the c
parameter.
(x:T₁ = e₁(e₂); s, ρ, κ) ⟼ (s′, ρ′[y:=v₂, c:=τ⊥], (T₁ ⟹^{T₁} T₁, (x,s,ρ)::κ))
    where V(e₁,ρ) = ⟨λy:T₂. s′, ρ′, τ⊥⟩ and V(e₂,ρ) = v₂
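As a concrete sketch of this calling convention (with illustrative names): the closure carries an optional threesome, and every call uniformly passes it as the extra c argument:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Closure:
    code: Callable                      # takes (arg, c)
    threesome: Optional[Any] = None     # None plays the role of ⊥

def apply(clo, arg):
    # One representation, one calling convention: always pass the
    # optional threesome; an uncast closure just ignores c.
    return clo.code(arg, clo.threesome)

inc = Closure(lambda n, c: n + 1)       # uncast: ignores c
```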
When an uncast closure is cast, we build a wrapper function,
similar to the first approach discussed above, but using the special
variable c to refer to the threesome instead of hard-coding
the cast into the wrapper function. We add dom and
cod operations for accessing the parts of a function
threesome.
cast(⟨λx:T. s, ρ, ⊥⟩, τ) =
    blame ℓ             if τ = T₁ ⟹^{I^p; ⊥^ℓ} T₂
    ⟨λx₁. s′, ρ′, τ⟩    otherwise
  where s′ = (x₂ = x₁ : dom(c); return f(x₂) : cod(c))
    and ρ′ = {f := ⟨λx:T. s, ρ, ⊥⟩}
When a closure is cast for the second time, the casts are combined to save space.
cast(⟨λx:T. s, ρ, τ₁⟩, τ₂) =
    blame ℓ                   if (τ₁; τ₂) = T₁ ⟹^{I^p; ⊥^ℓ} T₂
    ⟨λx:T. s, ρ, (τ₁; τ₂)⟩    otherwise
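Both closure cases of cast can be sketched concretely. Everything below is a drastically simplified stand-in: threesomes are (source, middle, target) triples, a failed middle is the string "⊥", and compose only approximates real threesome composition:

```python
class Blame(Exception):
    pass

class Closure:
    def __init__(self, code, threesome=None):
        self.code = code              # takes (arg, c)
        self.threesome = threesome    # None plays the role of ⊥

def apply(clo, arg):
    return clo.code(arg, clo.threesome)

def compose(t1, t2):
    # Crude stand-in for threesome composition (τ₁; τ₂): keep the
    # outer source and target; fail unless the middles agree.
    middle = t1[1] if t1[1] == t2[1] else "⊥"
    return (t1[0], middle, t2[2])

def cast_value(v, tau):
    # Base-type cast (the k : τ injection case is elided here).
    if tau[1] == "⊥":                 # failed middle: blame
        raise Blame(tau)
    return v

def cast_closure(clo, tau):
    # tau is a pair (dom threesome, cod threesome).
    if clo.threesome is None:
        # First cast: build a wrapper that reads dom(c) and cod(c)
        # from its own c parameter instead of hard-coding tau.
        f = clo
        def wrapper(x1, c):
            dom_c, cod_c = c
            x2 = cast_value(x1, dom_c)
            return cast_value(apply(f, x2), cod_c)
        return Closure(wrapper, tau)
    else:
        # Later casts: compose the threesomes (dom contravariantly),
        # reusing the same wrapper code.
        dom1, cod1 = clo.threesome
        dom2, cod2 = tau
        return Closure(clo.code, (compose(dom2, dom1), compose(cod1, cod2)))
```

Note how the second cast only touches the threesome field; the wrapper built by the first cast keeps working because it reads whatever c it is handed at call time.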
That's it. We now have a machine that doesn't perform extra
dispatching at function calls. There is still a tiny bit of overhead
in the form of passing the c argument. This overhead can be
removed by passing the entire closure to itself (instead of passing
the array of free variables and the threesome separately) and
accessing the threesome through the closure from inside the function
body.
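In the same illustrative style, that refinement amounts to a self-application convention:

```python
class Closure:
    def __init__(self, code, threesome=None):
        self.code = code              # takes (self_closure, arg)
        self.threesome = threesome

def apply(clo, arg):
    # Pass the whole closure to its own code; the body can fetch the
    # threesome from the closure, so no separate c argument is pushed.
    return clo.code(clo, arg)

inc = Closure(lambda self, n: n + 1)  # never looks at self.threesome
```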
In the following I give the complete definitions for the new
abstract machine. In addition to dom and
cod, we add a tail call without a cast to avoid overhead
when there is no cast.
expressions          e  ::= k ∣ x ∣ λx:T. s ∣ dom(e) ∣ cod(e)
statements           s  ::= d; s ∣ return e ∣ return e(e) ∣ return e(e) : τ
optional threesomes  τ⊥ ::= ⊥ ∣ τ
values               v  ::= k ∣ k : τ ∣ ⟨λx:T. s, ρ, τ⊥⟩
Here's the complete definition of the cast function.
cast(v, ⊥) = v

cast(k, τ) =
    k          if τ = B ⟹^B B
    blame ℓ    if τ = B ⟹^{B^p; ⊥^ℓ} T
    k : τ      otherwise

cast(k : τ₁, τ₂) =
    k               if (τ₁; τ₂) = B ⟹^B B
    blame ℓ         if (τ₁; τ₂) = B ⟹^{B^p; ⊥^ℓ} T
    k : (τ₁; τ₂)    otherwise

cast(⟨λx:T. s, ρ, ⊥⟩, τ) =
    blame ℓ                                 if τ = T₁ ⟹^{I^p; ⊥^ℓ} T₂
    ⟨λx₁. s′, {f := ⟨λx:T. s, ρ, ⊥⟩}, τ⟩    otherwise
  where s′ = (x₂ = x₁ : dom(c); return f(x₂) : cod(c))

cast(⟨λx:T. s, ρ, τ₁⟩, τ₂) =
    blame ℓ                   if (τ₁; τ₂) = T₁ ⟹^{I^p; ⊥^ℓ} T₂
    ⟨λx:T. s, ρ, (τ₁; τ₂)⟩    otherwise
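The base-value cases admit the same kind of simplified sketch: threesomes as (source, middle, target) triples with "⊥" marking a failed middle, k : τ represented as a pair, and a compose that only approximates real composition:

```python
class Blame(Exception):
    pass

def compose(t1, t2):
    # Crude stand-in for (τ₁; τ₂): fail unless the middles agree.
    middle = t1[1] if t1[1] == t2[1] else "⊥"
    return (t1[0], middle, t2[2])

def cast_base(v, tau):
    # v is a raw constant k or a pair (k, t1) standing for k : τ₁.
    if tau is None:                   # cast(v, ⊥) = v
        return v
    k, t = v if isinstance(v, tuple) else (v, None)
    t = tau if t is None else compose(t, tau)
    if t[0] == t[1] == t[2]:          # identity threesome B ⟹^B B
        return k
    if t[1] == "⊥":                   # failed middle: blame
        raise Blame(t)
    return (k, t)                     # the compressed k : (τ₁; τ₂)
```

Casting a constant to ⋆ and back through compatible threesomes collapses to the bare constant, which is the space-saving point of the compression.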
Here are the updated evaluation rules.
V(k, ρ) = k
V(x, ρ) = ρ(x)
V(λx:T. s, ρ) = ⟨λx:T. s, ρ, ⊥⟩
V(dom(e), ρ) = τ₁   if V(e, ρ) = τ₁ → τ₂
V(cod(e), ρ) = τ₂   if V(e, ρ) = τ₁ → τ₂
Lastly, here are the transition rules for the machine.
(x:T₁ = e₁(e₂); s, ρ, κ) ⟼ (s′, ρ′[y:=v₂, c:=τ⊥], (T₁ ⟹^{T₁} T₁, (x,s,ρ)::κ))
    where V(e₁,ρ) = ⟨λy:T₂. s′, ρ′, τ⊥⟩ and V(e₂,ρ) = v₂

(x = op(ē); s, ρ, κ) ⟼ (s, ρ[x:=v], κ)
    where v = δ(op, V(ē,ρ))

(x = e : τ; s, ρ, κ) ⟼ (s, ρ[x:=v′], κ)
    where V(e,ρ) = v and cast(v, τ) = v′

(return e, ρ, (τ, (x,s,ρ′)::κ)) ⟼ (s, ρ′[x:=v′], κ)
    where V(e,ρ) = v and cast(v, τ) = v′

(return e₁(e₂), ρ, κ) ⟼ (s, ρ′[y:=v₂, c:=τ⊥], κ)
    where V(e₁,ρ) = ⟨λy:T. s, ρ′, τ⊥⟩ and V(e₂,ρ) = v₂

(return e₁(e₂) : τ₁, ρ, (τ₂, σ)) ⟼ (s, ρ′[y:=v₂, c:=τ⊥], ((τ₁; τ₂), σ))
    where V(e₁,ρ) = ⟨λy:T. s, ρ′, τ⊥⟩ and V(e₂,ρ) = v₂

(x = e : τ; s, ρ, κ) ⟼ blame ℓ
    where V(e,ρ) = v and cast(v, τ) = blame ℓ

(return e, ρ, (τ, (x,s,ρ′)::κ)) ⟼ blame ℓ
    where V(e,ρ) = v and cast(v, τ) = blame ℓ
I like how there are fewer rules and the rules are somewhat simpler
compared to the previous machine. There is one last bit of overhead in
statically typed code: in a normal return we have to apply
the pending threesome that's on the stack. If one doesn't care about
making tail-calls space efficient in the presence of casts, then this
wouldn't be necessary. But I care.