The interaction between lexical scoping of variables and first-class functions,
as found in all functional programming languages, is rather subtle and
interesting, especially if you want to compile the language to
assembly code. For some reason, computer architects have yet to include
first-class functions with lexical scoping in modern ISAs! ;)
In this blog post, I'll try to capture the essence of compiling from a
language with lexical scoping and first-class functions to a language
*without* lexical scoping (but still with first-class functions).
This kind of translation is usually referred to as
*closure conversion* for reasons that will become apparent.

We take the Simply-Typed Lambda Calculus (STLC) as our first example of a language with lexical scoping and first-class functions. Unlike most treatments of closure conversion, we create a custom intermediate language (IL1) to be the target of closure conversion. IL1 is similar to the STLC except that lexical scoping does not apply to functions, that is, occurrences of variables inside functions may not refer to binders outside of the function. However, functions in IL1 are still first class in the sense that they can be passed to and returned from functions. Most presentations of closure conversion use a lower-level intermediate language. We do not take that approach here because it clouds the essence of closure conversion by bringing in data representation issues.

Both the source and target languages of the translation are statically
typed, so it will be important for the translation to map well-typed
programs to well-typed programs. The type-preserving aspect of the
translation is inspired by Morrisett et al.'s work on the TIL
compiler. For reference, see *From System F to Typed Assembly Language*
in ACM TOPLAS, 1999. However, we avoid some of the complexity of
their approach by our use of a custom intermediate language.
In particular, the use of existential types is replaced by an
existential at the meta-level, in the typing rule for closures.

In a later blog post, I'll move to a slightly richer language, the Polymorphic Lambda Calculus (System F). I'll make some modifications to IL1 to obtain IL2 and then present a translation from System F to IL2.

### Warming up with an example

Consider the following program in the STLC.
The outer function is applied to 40, so *x* is bound to 40.
The outer function immediately returns the inner
function, which is then applied to 2, so *y* is bound to 2.
The inner function computes ,
so it returns 42. Note how the inner function referred to *x*,
the parameter of the outer function. This is an example of lexical scoping.

Now let's try to translate the above program into an equivalent program
(produces the same answer) but that does not rely on lexical scoping
(except for global variables).
Note that if functions are not lexically scoped, then they might as well
be defined at the top of the program, and not be nested.
Our first attempt adds an extra parameter to the inner function.
However, the above is ill-formed because inside *g*, *f*
is applied to one argument but it requires two. And unfortunately,
the second argument is not available inside *g*, it doesn't
come along until we return from *g* (the argument 2).
Furthermore, we don't want to call *f* inside of *g*, we
just want to bind the value for *x*.

The way out of this problem is to get creative with the target language of closure conversion. After all, the target language is just our private intermediate language, so we can do anything we want with it, as long as it helps us get closer to assembly code.

In the above example, what we need is some kind of function that can
bind some of its arguments before being called.
Focusing on the function *f*, it needs to bind the value of *x*
inside of *g*. Only later *f* will be actually called, with *y* bound
to 2. Let's use as the notation for
our new kind of function in which all of the parameters except for the last
are the *early parameters*. The notation
is the *closure operator*. It binds all the early parameters of
function
to the arguments
but it does not call the function.

With our new kind of
function and closure operator, we can translate our example to the
following.
The first thing that happens is that *g* is trivially closed.
Then it is called with the normal argument 40. Inside *g*,
we close *f*, binding the value of x (which is 40) to parameter
*x* of *f*.
This closure is returned from *g* and called with normal
parameter 2. The body of *f* is then evaluated, and
gives us 42 as expected.

Next we move on to spell out all of the details of how this works. We first review the definition of the STLC and then present the definition of IL1, complete with the new functions. We then present closure conversion, the translation from STLC to IL1.

### The Simply-Typed Lambda Calculus

To review, the following is the definition of the Simply-Typed Lambda Calculus.

Syntax
The type *B* ranges over basic types such as `Int` and `Bool`.
We restrict complete STLC programs to have a type in *B*.
The expression *c* ranges over constants such as `0` and
`true`. The operator *op* ranges over primitive operations
such as arithmetic and logical operations. We use an overline to denote
a sequence of things.

Type System
**Note:** The appearance of
in the antecedent of the typing rule for functions is what allows
occurrences of variables inside a function to refer to bindings outside
the function.

Substitution
**Note:** Substitution goes inside of functions because there are variable
occurrences inside the functions that refer to variables from outside the
function.

Reduction Rules and Evaluation

### An Intermediate Language (IL1) without Lexical Scoping

Now for the creative part of defining IL1.
The type of a function will
have the form ,
where are the types for the
early parameters, is the type of
the normal parameter, and is the
return type. The result of applying the closure operator to
a function is another kind of
function, a function
(otherwise known as a *closure*) which has stored away the
values for its early parameter. Given a function of type ,
its closure will have type
.

Syntax

Type System
**Note:** The antecedents of the typing rules for the
and
functions do not
include !

Shallow Substitution
**Note:** Shallow substitution does not go inside of values, particularly
it leaves functions unchanged in contrast to substitution in the STLC.

We introduce the following notation for iterated substitution and shallow substitution.

Reduction Rules and Evaluation
**Note:** Function application uses shallow substitution but
for globals, we use normal (deep) substitution.

### Translation from the Simply-Typed Lambda Calculus to IL1

With our source and target languages in place, we're ready to
go into the details of closure conversion. The main
*ClosureConversion* function takes as input a STLC program
and outputs an IL1 program. It uses an auxiliary recursive function *C*
that takes a STLC expression and returns two things, an IL1 expression
and a list of functions, or more precisely, a list of
bindings, where each binding consists of the function's name
and the actual function. As we can see
in the body of the *ClosureConversion* function, the
functions returned from *C* become installed as global variables.

The way *C* processes expressions can be divided into three
categories. For variables and constants, we just translate them
to themselves and return an empty list of functions. For
primitive and normal application, we apply *C* recursively
to the subexpressions, create a new application from the resulting expressions
and return the list of all the functions from the subexpressions.
So that is it for the easy cases.
All of the action is in the case for .
After the recursive call, we compute the *free variables* (FV)
of the body of the function and remove the parameter *x*. This
gives us all of the variables (and their types) that are used inside the function but
defined outside. We generate a fresh name *g* for this function, then return
two things. First, we apply the closure operator to *g*, with the
free variables as arguments. At runtime, these variables will evaluate to
values, which will then be captured in the closure. Second, we return the
list of functions from the recursive call, plus a new
function for this
,
which we bind to *g*.

That's it! The essence of closure conversion is

- Converting from lexically-scoped functions to two-phase functions
that support a
*closure operator*to bind early parameters and an application operator to bind the late parameter and execute the body. - Lifting all functions to become global variables.

## No comments:

## Post a Comment