Sunday, July 15, 2012

Over-specification in operational semantics, or the case of the missing "eval"

For a long time, one aspect of operational semantics kept nagging at me: all the intermediate steps in the evaluation of the program are spelled out. This over-specifies the semantics in the sense that a compiler writer certainly doesn't need to make sure that the executing program really goes through each of those intermediate steps. So how should a compiler writer read the operational semantics of a language?

Fortunately, I asked Matthias Felleisen this question over dinner a couple years back, and he told me the simple solution. The semantics of a language should not be defined as the single-step reduction relation, or even the reflexive transitive closure of it, but instead, every semantics should define an eval function (technically, a partial function) that maps a program to its observable behavior. Readers familiar with denotational semantics might grin at this point; that's what a denotational semantics does! True, but it's also straightforward to define the eval function using a reduction relation.

Suppose the language in question is the Simply-Typed Lambda Calculus with constants (like numbers and Booleans) ranged over by c. Then the following eval function is one straightforward choice of semantics. By stating that eval is the semantics of the STLC, we're telling a compiler writer that an implementation is correct so long as, after running a program, it prints out the correct constant or the string 'function', in accordance with the result given by eval. In general, programs don't always halt, which is the reason for the case. Of course, STLC is a somewhat odd language in that every STLC program halts, so the case is not technically needed here, even though for most languages it would be needed.

These days I try to remember to include the eval function is the semantics that I write. Yes, the eval function is usually pretty obvious, but so are many aspects of the semantics. Yes, it's pedantic, but in some sense writing down a formal semantics is all about being pedantic.

There aren't many examples of eval in the literature. Most textbooks on programming language theory don't include a single eval function. However, the book Semantic Engineering with PLT Redex and the notes that preceded the book, Programming Languages and Lambda Calculi, by Matthias and his coauthors, includes many example eval functions.

So the next time you're writing down an operational semantics, please include the eval!

What about big-step operational semantics?

If you use big-step semantics instead of small-step, part of the over-specification problem goes away, but not all. A big-step semantics doesn't specify every intermediate step of computation; it relates programs directly to the resulting values. However, there is still an issue regarding what counts as observable. The following is a typical big-step semantics for the STLC.

Would this big-step semantics serve as a good specification for the STLC? Not quite. Consider the semantics of the following program. The problem is that we don't want to tell compiler writers how to represent closures. For example, a high-quality implementation doesn't store the x because it doesn't occur in the body of the lambda.

To fix this over-specfication, we need to specify what's observable about a closure, which is traditionally nothing except that it is a closure. So we can define an eval function based on a big-step semantics as follows to obtain a good specification for the STLC.


  1. Nice post! What you're really after, I think, is a notion of "test": programs can then be distinguished semantically by asking whether they pass some test. A test is a context paired with a value that is expected when a program is plugged into the context and run. This is common in, e.g., pi calculi, where may-testing semantics is defined in terms of whether a program, when run in parallel with a "testing environment," outputs on a particular channel. It should be possible to adapt this notion for lambda calculi.

  2. Yes, once we start talking about languages with input and output, then the notion of observation must be some form of trace. And once we start talking about separate compilation and ABIs (application binary interfaces), we need a more detailed kind of trace. I see several different formalisms out there that might do the job: the testing semantics from concurrent calculi (e.g. Algebraic Theory of Processes by Matthew Hennessy),
    game semantics (e.g. work by Samson Abramsky, Luke Ong, etc.), and decision trees for SPCF (Cartwright, Felleisen, Berry, and Curien). I think the real trick will be distilling the simplest form of such semantics and relating them to more familiar operational semantics, so that they can be easily applied by working language designers. One recent paper in this direction is Temporal higher-order contracts by Disney, Flanagan, and McCarthy.