Sunday, January 26, 2014

The Publication Process in Programming Languages

There was an interesting discussion at the ACM SIGPLAN Meeting at POPL 2014 regarding problems with, and potential solutions for, the publication process in programming languages. The discussion centered on the problem that we primarily publish in conferences and not journals, which makes the field of programming languages look bad compared to other disciplines. I agree that this is a problem, but it is not the only problem with our publication process. As we consider alternative publication models, we should aim to solve all of the problems (or as many as possible). The following are some of the other problems that I see with our current conference-oriented process. I conclude with a short outline of a possible solution inspired by code review processes in open-source software groups such as C++ Boost.

Soundness vs. Importance

When making the accept/reject decision for conference publications, we judge papers on both scientific soundness (are the claims in the paper true?) and some sort of feeling of importance. These two things should be decoupled. On the one hand, we don't give ourselves enough time to evaluate scientific soundness, and on the other, we don't need to, and should not try to, evaluate importance at the time of publication because we are not very good at doing so. A somewhat amusing/annoying side-effect of this evaluation model is that many papers dedicate several paragraphs to inflating the importance of the results, sometimes obscuring the actual results. Another rumored side-effect is authors cherry-picking data to make their results seem more important. Similarly, when was the last time you read about a negative result in a programming languages conference or journal?

Publication vs. Presentation

The conference system conflates the decision to accept for publication with the decision to accept for presentation. Despite many conferences going to multi-track formats, the number of available presentation slots (and their cost) plays a strong role in limiting the number of papers accepted for publication. On the other hand, the cost of publishing a paper is very low (and should be much lower than it currently is with the ACM). The only limit on publishing papers should be scientific soundness.

Publication Delay

One of the supposed benefits of the conference system is fast turn-around. However, I know of many papers that take over a year, sometimes multiple years, to be published because they receive multiple rejections before being accepted, and not because they are scientifically unsound. This phenomenon slows the scientific process because the dissemination of results is delayed. (Many researchers do not feel it is appropriate to post pre-prints on their web pages.) It also increases the reviewing load on the community because some papers end up with nine reviews when three would have sufficed.

Reviewer Expertise

Papers are not always reviewed by the people with the most expertise. On a fairly routine basis, I find minor inaccuracies and problems in published papers in the areas in which I am an expert. In such situations, I feel that I should have reviewed the paper but didn't get the chance because I wasn't on that particular program committee (I was on other PCs that year). One of the reasons this happens is that there is a large overlap in topics among the major programming language conferences.

Narrow Audience

Most papers in our area are written with a very narrow audience in mind: other experts in programming languages. The reason for this is obvious: program committees consist of experts in programming languages. This narrow writing style reduces the readability of our papers to people outside the area, including the people in industry who are in a position to apply the research.

Reviewer Load

Recently, program chairs have been discouraging PC members from getting help from their graduate students and friends. There are a couple of problems with this. First, reviewing is an important part of training graduate students. Second, with over 20 papers assigned to a PC member over a couple of months' time, doing high-quality reviews becomes a heroic effort. For academics, whose job description includes reviewing, this kind of load is manageable in a macho sort of way. For those in industry (not counting a few research laboratories), I imagine this kind of load is a non-starter.

Post-Publication Reviews and Corrections

The current publication process does not provide a sufficiently low-overhead mechanism for reviews and/or corrections to come along after a paper has been published. In particular, it would be great to have an easy way for someone who is trying to replicate an experiment, or apply a technique in industry, to post a review providing further data.

The Outline of a Solution

Instead of a mixture of conferences and journals, we should just have one big continual reviewing process for all research on programming languages. Anyone who wants to participate could submit papers (anonymously or not) and anyone could review any submission (anonymously or not). Both the paper submissions and the reviews would be public. Authors would likely revise their papers as reviews come in. If it turns out that some people submit lots of papers without doing any reviews, then some simple rules regarding the ratio of paper submissions to reviews could be created. A group of editors would evaluate reviews and skim papers to decide whether a paper is scientifically sound and therefore deserving of publication, that is, deserving of their seal of approval. The editor's meta-review and decision would also be public. Of course, reviews of papers that have already been published would be allowed and could lead to a revision or even retraction of a paper. Regarding presentations, conferences and workshops could select from the recently published papers as they see fit.

This outline of a solution has much in common with the Open Review model already used by several conferences (ICLR, ICML, and AKBC at openreview.net), though it is directly inspired by my positive experiences with the code review process in the C++ Boost open-source software group.

Wednesday, July 10, 2013

Indiana Univ. beats Univ. of Colorado, 14 to 12

As of August 1, 2013, I am starting a new job as an Associate Professor in the Computer Science Division of the School of Informatics and Computing (SOIC) at Indiana University (IU). If you're unfamiliar with Indiana University, you might wonder why I'm moving there from the University of Colorado (CU). After all, CU is on the doorstep of the Rocky Mountains and some of the best skiing in the world. (And I love skiing.)

There are three reasons why I was open to interviewing elsewhere this year. The first reason is that at CU, I was in the Electrical, Computer, and Energy Engineering (ECEE) Department, not the Computer Science Department. I was happy to start out as an Assistant Prof. in CU's ECEE department because it solved our two-body problem (my wife Katie is a prof. in the CS dept.) and got my foot in the door of academia. However, I am a Computer Scientist, so in the long term that was just not going to work, and an offer from another university would make the switch to CS possible. The second reason is that I come up for tenure at CU in AY 2013-2014, and although my case looks good, one can never be too careful. If the chances of getting tenure are 99%, then I'm looking for a way to change that to 100%. I figured that an outside offer would do just that. Finally, the third reason I was open to interviewing was the low or nonexistent raises at CU over the past six years. I realize that this is pretty common around the country, but then again, CU was giving enormous raises to some of the people in administration and raising tuition.

In the Fall of 2012, I found out that IU had several faculty positions open in the SOIC and that they wanted Katie and me to apply. Katie and I both got our Ph.D.s from IU and have fond memories of IU and Bloomington, IN. IU has a strong tradition in Programming Languages and the current PL group is excellent. So for me, IU is a better fit with respect to research collaboration than CU. Similarly, IU has a growing group in Health Informatics, so it is also a great fit for Katie.
Bloomington is a wonderful little college town in the rolling hills of southern Indiana, surrounded by state forest and lakes. It's a good place for bicycling but not downhill skiing.

To make a long story short (the academic hiring process only has one gear, first gear), we interviewed at IU and received an attractive offer that included tenure, a sizable startup package, and a significant raise. CU made a counter-offer that included a moderate package of research funds, a moderate raise, and a move to the CS dept. CU elected not to rush our tenure process.

On April 24, 2013, Katie and I had a complex and difficult decision to make. The picture below is the pros/cons list that we created that night to help us make the decision. We divided the items into things related to family (the top three), things related to career (the next six), and things related to lifestyle. We gave a 3x weighting to the family items, a 2x weighting to the career items, and a 1x weighting to the lifestyle items. To keep things simple, we just gave a + (for +1), - (for -1), or 0 to each item. As you can see written in red at the bottom, IU won out over CU by a score of 14 to 12. So we're off to Bloomington!
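
For the curious, the arithmetic behind the final score is just a weighted sum. Here is a minimal sketch in OCaml, with made-up items and scores standing in for the real list in the picture.

(* A sketch of the weighted tally; the (weight, score) pairs below are
   hypothetical, not the actual items from our list. *)
let tally items =
  List.fold_left (fun sum (weight, score) -> sum + weight * score) 0 items

let () =
  let iu = [ (3, 1); (3, 1); (3, 0);     (* family items, weighted 3x *)
             (2, 1); (2, -1); (2, 1);    (* career items, weighted 2x *)
             (1, 0); (1, 1) ] in         (* lifestyle items, weighted 1x *)
  Printf.printf "IU score: %d\n" (tally iu)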


Monday, May 27, 2013

Type Safety in Three Easy Lemmas

Last year I wrote a blog post showing how proofs of type safety can be done with just five easy lemmas if one defines the programming language in a certain way, using a special abstract machine. In this blog post I'll improve on that post in two ways. I'll show that type safety can be proved with just three easy lemmas and I'll use a standard interpreter-based definition of the language instead of a special abstract machine.
For readers interested in mechanized metatheory, here's the mechanized version of this post in Isabelle.

Syntax

This time we do not need to put the language in A-normal form. To review, we study a simple but Turing-complete language with integers, Booleans, and first-class functions that may be recursive. \[ \begin{array}{lrcl} \text{types} & T & ::= & \mathtt{Int} \mid \mathtt{Bool} \mid T \to T \\ \text{variables} & x,y,z \\ \text{integers} & n \\ \text{operators}&o & ::= & + \mid - \mid \,=\, \\ \text{Booleans}&b & ::= & \mathtt{true} \mid \mathtt{false}\\ \text{constants}&c & ::= & n \mid b \\ \text{expressions} & e & ::= & c \mid o(e,e) \mid x \mid \mathtt{fun}\,f(x{:}T)\, e \mid e(e) \end{array} \]
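
For readers who like to see definitions as code, here is a sketch of the syntax as OCaml datatypes (OCaml is close to the notation above and to the Isabelle mechanization linked earlier); the constructor names are my own.

(* A sketch of the abstract syntax; constructor names are my own. *)
type ty = TInt | TBool | TArrow of ty * ty        (* types T *)

type op = Add | Sub | Eq                           (* operators o *)

type const = Int of int | Bool of bool             (* constants c *)

type exp =                                         (* expressions e *)
  | Const of const
  | Prim of op * exp * exp
  | Var of string
  | Fun of string * string * ty * exp              (* fun f(x:T) e *)
  | App of exp * exp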

Dynamic Semantics

As before, we use the notation \(\epsilon\) for the empty list. Given a list \(L\), the notation \(a \cdot L\) is a larger list with \(a\) as the first element and the rest of the elements the same as those of \(L\). We use lists of key-value pairs (association lists) to represent mappings from variables to types (type environments, written \(\Gamma\)) and from variables to values (environments, written \(\rho\)). The following lookup function finds the thing associated with a given key in an association list. The return type of \(\mathit{lookup}\) and the use of \(\mathit{stuck}\) and \(\mathit{return}\) deserve a little explanation, which I give below. \begin{align*} \mathit{lookup} & : \alpha \times (\alpha \times \beta) \, \mathit{list}\to \beta \, \mathit{result}\\ \mathit{lookup}(K_1, \epsilon ) &= \mathit{stuck} \\ \mathit{lookup}(K_1, (K_2,V) \cdot L ) &= \begin{cases} \mathit{return}\,V & \text{if } K_1 = K_2 \\ \mathit{lookup}(K_1,L) & \text{otherwise} \end{cases} \end{align*}

One might normally consider \(\mathit{lookup}\) as a partial function, because it might fail to find a matching key in the association list. However, here we're going to make the partiality explicit by returning a special result called \(\mathit{stuck}\). We'll also include \(\mathit{timeout}\) in the kinds of results, which we explain shortly. \[ \mathit{datatype}\;\;\alpha\,\mathit{result} = \mathit{Result}\,\alpha \mid \mathit{Stuck} \mid \mathit{TimeOut} \]

We define the following auxiliary notation for dealing with the \(\mathit{result}\) datatype. The most important definition is the last line, which helps us avoid cluttering our code with lots of \(\mathit{case}\)'s. \begin{align*} \mathit{return}\,a &\equiv \mathit{Result}\;a \\ \mathit{stuck} &\equiv \mathit{Stuck} \\ \mathit{timeout} &\equiv \mathit{TimeOut} \\ X \gets M_1;\; M_2 &\equiv \mathit{case}\,M_1\,\mathit{of} \\ & \qquad \mathit{Stuck} \Rightarrow \mathit{Stuck} \\ & \quad\; \mid \mathit{TimeOut} \Rightarrow \mathit{TimeOut} \\ & \quad\; \mid \mathit{Result}\, R \Rightarrow [X := R]M_2 \end{align*}

The values of this language (the results of evaluation) are constants and closures. A closure pairs a function with an environment \(\rho\). \[ \begin{array}{lrcl} \text{values}& v & ::= & c \mid \langle f(x{:}T)\, e, \rho \rangle \end{array} \]

In many places within the interpreter we're going to extract an integer, Boolean, or closure from a value. The extraction might fail because, for example, even though we may want to extract an integer, the value may instead be a Boolean. \begin{align*} \mathit{toInt}(n) &= \mathit{return}\,n \\ \mathit{toInt}(b) &= \mathit{stuck} \\ \mathit{toInt}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{stuck} \\ \\ \mathit{toBool}(n) &= \mathit{stuck} \\ \mathit{toBool}(b) &= \mathit{return}\,b \\ \mathit{toBool}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{stuck} \\ \\ \mathit{toClosure}(n) &= \mathit{stuck} \\ \mathit{toClosure}(b) &= \mathit{stuck} \\ \mathit{toClosure}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{return}\,(f,x,e,\rho) \end{align*}
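
Continuing the OCaml sketch, here is one way to render the result datatype, the bind notation, lookup, values, and the extraction functions; I use OCaml's let* binding operator for the bind, and the names are again my own.

(* Results, the bind notation, lookup, values, and the extraction functions,
   continuing the sketch above. *)
type 'a result = Result of 'a | Stuck | TimeOut

let return a = Result a

(* the "X <- M1; M2" notation: propagate Stuck and TimeOut, else continue *)
let ( let* ) m f =
  match m with
  | Result r -> f r
  | Stuck -> Stuck
  | TimeOut -> TimeOut

(* lookup in an association list, with the partiality made explicit *)
let rec lookup key env =
  match env with
  | [] -> Stuck
  | (k, v) :: rest -> if key = k then Result v else lookup key rest

(* values: constants and closures *)
type value =
  | VConst of const
  | Closure of string * string * ty * exp * (string * value) list

let to_int v = match v with VConst (Int n) -> Result n | _ -> Stuck
let to_bool v = match v with VConst (Bool b) -> Result b | _ -> Stuck

let to_closure v =
  match v with
  | Closure (f, x, _, e, rho) -> Result (f, x, e, rho)
  | _ -> Stuck
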
Next we define the \(\delta\) function, which gives meaning to the primitive operators. \begin{align*} \delta(+, v_1, v_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2);\; \mathit{return}\; (n_1 + n_2) \\ \delta(-, v_1, v_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2); \; \mathit{return}\,(n_1 - n_2) \\ \delta(=, v_1, v_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2);\; \mathit{return}\; (n_1 = n_2) \end{align*}

The evaluation function \(\mathcal{V}\) is a mostly-standard closure-based interpreter. The one thing that is a bit unusual is that we make sure that the interpreter is a total function by ensuring termination. The interpreter includes a third parameter that is a counter. If the counter gets to zero, the result is \(\mathit{timeout}\). (This counter technique was described in the earlier blog post.) \begin{align*} \mathcal{V}(e,\rho,0) &= \mathit{timeout} \\ \mathcal{V}(x,\rho,1+k) &= \mathit{lookup}(x,\rho) \\ \mathcal{V}(c,\rho,1+k) &= \mathit{return}\,c \\ \mathcal{V}(\mathtt{fun}\,f(x{:}T)\, e, \rho, 1+k) &= \mathit{return}\,\langle f (x{:}T)\,e,\rho\rangle \\ \mathcal{V}(o(e_1,e_2),\rho,1+k) &= v_1 \gets \mathcal{V}(e_1,\rho,k);\; v_2 \gets \mathcal{V}(e_2,\rho,k); \; \delta(o, v_1, v_2) \\ \mathcal{V}(e_1\,e_2,\rho,1+k) &= v_1 \gets \mathcal{V}(e_1,\rho,k); \; v_2 \gets \mathcal{V}(e_2,\rho,k); \\ & \quad\; (f,x,e,\rho') \gets \mathit{toClosure}(v_1); \\ & \quad\; \mathcal{V}(e, (x,v_2) \cdot (f,v_1) \cdot \rho', k) \end{align*}

To finish off the dynamic semantics we must define \(\mathit{eval}\), which specifies the behavior of whole programs. The behavior of a program is either to return a constant, to return \(\mathit{diverge}\) (indicating that the program runs forever), or to be undefined. \[ \mathit{eval}(e) = \begin{cases} c & \text{if } \mathcal{V}(e,\epsilon,n) = \mathit{Result}\;c \text{ for some } n \\ \mathit{diverge} & \text{if } \forall n.\; \mathcal{V}(e,\epsilon,n) = \mathit{TimeOut} \end{cases} \]
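
Here is the corresponding sketch of \(\delta\) and the interpreter, again with names of my own choosing (interp for \(\mathcal{V}\)). The \(\mathit{eval}\) function quantifies over all counters, so it is a specification rather than something to run directly; in code one simply picks a counter.

(* delta and the counter-based interpreter, continuing the sketch above. *)
let delta o v1 v2 =
  let* n1 = to_int v1 in
  let* n2 = to_int v2 in
  match o with
  | Add -> return (VConst (Int (n1 + n2)))
  | Sub -> return (VConst (Int (n1 - n2)))
  | Eq  -> return (VConst (Bool (n1 = n2)))

(* interp e rho k: evaluate e in environment rho with counter k;
   when the counter reaches zero the result is TimeOut. *)
let rec interp e rho k =
  if k = 0 then TimeOut
  else
    match e with
    | Var x -> lookup x rho
    | Const c -> return (VConst c)
    | Fun (f, x, t, body) -> return (Closure (f, x, t, body, rho))
    | Prim (o, e1, e2) ->
        let* v1 = interp e1 rho (k - 1) in
        let* v2 = interp e2 rho (k - 1) in
        delta o v1 v2
    | App (e1, e2) ->
        let* v1 = interp e1 rho (k - 1) in
        let* v2 = interp e2 rho (k - 1) in
        let* (f, x, body, rho') = to_closure v1 in
        interp body ((x, v2) :: (f, v1) :: rho') (k - 1)

For example, interp (Prim (Add, Const (Int 1), Const (Int 2))) [] 10 yields Result (VConst (Int 3)).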

Type System

The types of the constants are given by the \(\mathit{typeof}\) partial function. \begin{align*} \mathit{typeof}(n) &= \mathsf{Int} \\ \mathit{typeof}(\mathsf{true}) &= \mathsf{Bool} \\ \mathit{typeof}(\mathsf{false}) &= \mathsf{Bool} \end{align*}

The \(\Delta\) partial function maps a primitive operator and argument types to the return type. \begin{align*} \Delta(+,\mathsf{Int},\mathsf{Int}) &= \mathsf{Int} \\ \Delta(-, \mathsf{Int},\mathsf{Int}) &= \mathsf{Int} \\ \Delta(=,\mathsf{Int},\mathsf{Int}) &= \mathsf{Bool} \end{align*}

The following presents the type rules for expressions.
\(\Gamma \vdash e : T\)
\begin{gather*} \frac{\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T}{\Gamma \vdash x : T} \qquad \frac{\mathit{typeof}(c) = T}{\Gamma \vdash c : T} \\[2ex] \frac{ (x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash e : T_2 }{ \Gamma \vdash \mathtt{fun}\,f(x{:}T_1)e : T_1 \to T_2 } \\[2ex] \frac{ \Gamma \vdash e_1 : T_1 \to T_2 \quad \Gamma \vdash e_2 : T_1 }{ \Gamma \vdash e_1(e_2) : T_2 } \qquad \frac{ \begin{array}{c} \Delta(o,T_1,T_2) = T_3 \\ \Gamma \vdash e_1 : T_1 \quad \Gamma \vdash e_2 : T_2 \end{array} }{ \Gamma \vdash o(e_1,e_2) : T_3 } \end{gather*} Our proof of type safety will require that we define notions of well-typed values, results, and environments.
\(\vdash v : T\) \begin{gather*} \frac{\mathit{typeof}(c) = T}{\vdash c : T} \quad \frac{ \Gamma \vdash \rho \quad (x,T_1) \cdot (f,T_1\to T_2) \cdot \Gamma \vdash e : T_2 }{ \vdash \langle f(x{:}T_1)e , \rho \rangle : T_1 \to T_2 } \end{gather*} \(\vdash r : T\) \begin{gather*} \frac{\vdash v : T}{\vdash \mathit{Result}\,v : T} \quad \frac{}{\vdash \mathit{TimeOut} : T} \end{gather*} \(\Gamma \vdash \rho\) \begin{gather*} \frac{}{\epsilon \vdash \epsilon} \qquad \frac{ \vdash v : T \quad \Gamma \vdash \rho }{ (x,T) \cdot \Gamma \vdash (x,v) \cdot \rho } \end{gather*}

Type Safety

The proof of type safety, as promised, includes just three easy lemmas. The first two lemmas are essentially identical to the corresponding lemmas in "Type Safety in Five". They establish that the primitive operators and lookup on environments are sound with respect to \(\Delta\) and lookup on type environments, respectively. The third lemma is the main event of the proof, showing that \(\mathcal{V}\) is sound with respect to the type system. Because we know that \(\mathcal{V}\) terminates, we can do the proof by induction on the definition of \(\mathcal{V}\), which helps to streamline the proof. We cap off the proof with the Type Safety theorem, which follows almost immediately from the soundness of \(\mathcal{V}\).
Lemma (\(\delta\) is safe)
If \(\Delta(o,\overline{T}) = T\) and \(\vdash v_i : T_i\) for \(i \in \{1,\ldots,n\}\), then \(\delta(o,\overline{v}) = \mathit{Result}\;v\) and \(\vdash v : T\), for some \(v\).
Proof. We proceed by cases on the operator \(o\).
  1. If the operator \(o\) is \(+\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Int}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(+,n_1,n_2) = \mathit{Result}\, (n_1 + n_2)\) and we have \(\vdash (n_1 + n_2) : \mathsf{Int}\).
  2. If the operator \(o\) is \(-\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Int}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(-,n_1,n_2) = \mathit{Result}\, (n_1 - n_2)\) and we have \(\vdash (n_1 - n_2) : \mathsf{Int}\).
  3. If the operator \(o\) is \(=\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Bool}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(=,n_1,n_2) = \mathit{Result}\;(n_1 = n_2)\) and we have \(\vdash (n_1 = n_2) : \mathsf{Bool}\).
QED.
Lemma (\(\mathit{lookup}\) is safe)
If \(\Gamma \vdash \rho\) and \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T\), then \(\mathit{lookup}(x,\rho) = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\).
Proof. We proceed by induction on \(\Gamma \vdash \rho\).
  1. Case \(\epsilon \vdash \epsilon: \qquad (\Gamma=\epsilon, \rho = \epsilon)\)
    But then we have a contradiction with the premise \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T\), so this case is vacuously true.
  2. Case \(\begin{array}{c}\vdash v : T' \quad \Gamma' \vdash \rho' \\ \hline (x',T') \cdot \Gamma' \vdash (x',v) \cdot \rho' \end{array}\):
    Next we consider two cases, whether \(x = x'\) or not.
    1. Case \(x = x'\): Then \(\mathit{lookup}(x, \rho) = \mathit{Result}\;v\) and \(T = T'\), so we conclude that \(\vdash v : T\).
    2. Case \(x \neq x'\): Then \(\mathit{lookup}(x,\rho) = \mathit{lookup}(x,\rho')\) and \(\mathit{lookup}(x,\Gamma) = \mathit{lookup}(x,\Gamma') = \mathit{Result}\;T\). By the induction hypothesis, we have \(\mathit{lookup}(x,\rho') = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\), which completes this case.
QED.
Lemma (\(\mathcal{V}\) is safe)
If \(\Gamma \vdash e : T\) and \(\Gamma \vdash \rho\), then \(\vdash \mathcal{V}(e,\rho,k) : T\).
Proof. The proof is by induction on the definition of \(\mathcal{V}\).
  1. Case \(\mathcal{V}(e,\rho,0)\): We have \(\mathcal{V}(e,\rho,0) = \mathit{TimeOut}\) and \(\vdash \mathit{TimeOut} : T\).
  2. Case \(\mathcal{V}(x,\rho,1+k)\): We have \(\mathcal{V}(x,\rho,1+k) = \mathit{lookup}(x,\rho)\). From \(\Gamma \vdash x : T\) we have \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\; T\). Then by the \(\mathit{lookup}\) is safe lemma, we have \(\mathit{lookup}(x,\rho) = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\). Thus we have \(\vdash \mathit{lookup}(x,\rho) : T\) and conclude \(\vdash \mathcal{V}(x,\rho,1+k) : T\).
  3. Case \(\mathcal{V}(c,\rho,1+k)\): From \(\Gamma \vdash c : T\) we have \(\mathit{typeof}(c) = T\) and therefore \(\vdash c : T\). We have \(\mathcal{V}(c,\rho,1+k) = \mathit{Result}\;c\) and therefore \(\vdash \mathcal{V}(c,\rho,1+k) : T\).
  4. Case \(\mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1, \rho,1+k)\):
    We have \(\mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1,\rho,1+k) = \mathit{Result}\;\langle f(x{:}T_1) e_1, \rho \rangle\). From \(\Gamma \vdash \mathtt{fun}\,f(x{:}T_1) e_1 : T\) we have \((x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash e_1 : T_2\), with \(T = T_1 \to T_2\). Together with \(\Gamma \vdash \rho\), we have \(\vdash \langle f(x{:}T_1) e_1, \rho \rangle : T\) and therefore \(\vdash \mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1, \rho,1+k) : T\).
  5. Case \(\mathcal{V}(o(e_1,e_2), \rho, 1+k)\):
    We have \(\Delta(o,T_1,T_2) = T\) and \(\Gamma \vdash e_1 : T_1\) and \(\Gamma \vdash e_2 : T_2\). By the induction hypothesis, we have \(\vdash \mathcal{V}(e_1,\rho,k) : T_1\) and \(\vdash \mathcal{V}(e_2,\rho,k) : T_2\), so each of these results is either \(\mathit{TimeOut}\) or a well-typed \(\mathit{Result}\). If either is \(\mathit{TimeOut}\), then \(\mathcal{V}(o(e_1,e_2), \rho, 1+k) = \mathit{TimeOut}\), which is well typed. Otherwise \(\mathcal{V}(e_1,\rho,k) = \mathit{Result}\;v_1\) with \(\vdash v_1 : T_1\) and \(\mathcal{V}(e_2,\rho,k) = \mathit{Result}\;v_2\) with \(\vdash v_2 : T_2\). Because \(\delta\) is safe, we have \(\delta(o,v_1,v_2) = \mathit{Result}\;v_3\) and \(\vdash v_3 : T\) for some \(v_3\). We have \(\vdash \delta(o,v_1,v_2) : T\) and therefore \(\vdash \mathcal{V}(o(e_1,e_2), \rho, 1+k) : T\).
  6. Case \(\mathcal{V}(e_1\,e_2, \rho, 1+k)\): We have \(\Gamma \vdash e_1 : T_2 \to T\) and \(\Gamma \vdash e_2 : T_2\). By the induction hypothesis, and reasoning about \(\mathit{TimeOut}\) as in the previous case, we have \(\mathcal{V}(e_1,\rho,k) = \mathit{Result}\;v_1\) and \(\vdash v_1 : T_2 \to T\) and \(\mathcal{V}(e_2,\rho,k) = \mathit{Result}\;v_2\) and \(\vdash v_2 : T_2\). By cases on \(\vdash v_1 : T_2 \to T\), we have that \(v_1 = \langle f(x{:}T_2)e_3, \rho' \rangle\) and \(\Gamma' \vdash \rho'\) and \((x,T_2) \cdot (f,T_2 \to T) \cdot \Gamma' \vdash e_3 : T\) for some \(f,x,e_3,\Gamma',\rho'\). So we have \(\mathcal{V}(e_1\,e_2, \rho, 1+k) = \mathcal{V}(e_3,(x,v_2) \cdot (f,v_1) \cdot \rho',k)\). Applying the induction hypothesis for a third time, we have \(\vdash \mathcal{V}(e_3,(x,v_2) \cdot (f,v_1) \cdot \rho',k) : T\). Therefore \(\vdash \mathcal{V}(e_1\,e_2, \rho, 1+k) : T\).
QED.
Theorem (Type Safety)
If \(\epsilon \vdash e : T\) and \(T \in \{ \mathtt{Int}, \mathtt{Bool}\} \), then either \(\mathit{eval}(e) = c\) and \(\vdash c : T\) for some \(c\) or \(\mathit{eval}(e) = \mathit{diverge}\) .
Proof. We consider two cases: whether the program diverges or not.
  1. Suppose that \(\mathcal{V}(e,\epsilon,k) = \mathit{TimeOut}\) for all \(k\). Then \(\mathit{eval}(e) = \mathit{diverge}\).
  2. Suppose it is not the case that \(\mathcal{V}(e,\epsilon,k) = \mathit{TimeOut}\) for all \(k\). So \(\mathcal{V}(e,\epsilon,k') \neq \mathit{TimeOut}\) for some \(k'\). Then because \(\mathcal{V}\) is safe, we have \(\mathcal{V}(e,\epsilon,k') = \mathit{Result}\,v\) and \(\vdash v : T\). Then from \(T \in \{ \mathtt{Int}, \mathtt{Bool} \}\) and by cases on \(\vdash v : T\), we have \(v = c\) for some \(c\).
QED.

Thursday, October 18, 2012

Interp. of the GTLC, Part 5: Eager Cast Checking

Back in Part 1 of this series, I mentioned that there is a design choice between eager and lazy cast checking. Recall the following example. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With eager cast checking, the cast labeled \(\ell_1\) fails at the moment it is applied to a value, whereas with lazy cast checking, the \(\ell_1\) cast initially succeeds and only fails later, when the function is applied at \(f\,\mathsf{true}\). I like eager cast checking because it tells the programmer as soon as possible that something is amiss. Further, it turns out that when using the space-efficient implementations, eager and lazy checking are about the same regarding run-time overhead. (Lazy can be faster if you don't care about space efficiency.)

We saw the specification for lazy cast checking in Part 1, but the specification for eager checking was postponed. The reason for the postponement was that specifying the semantics of eager cast checking requires more machinery than for lazy cast checking. (I'll expand on this claim in the next paragraph.) Thankfully, in the meantime we've acquired the necessary machinery: the Coercion Calculus. The Eager Coercion Calculus was first discussed in the paper Space-Efficient Gradual Typing and was extended to include blame labels in Exploring the Design Space of Higher-Order Casts. Here we'll discuss the version with blame labels and flesh out more of the theory, such as characterizing the coercion normal forms and defining an efficient method of composing coercions in normal form. This is based on an ongoing collaboration with Ronald Garcia.

Before getting into the eager coercion calculus, let me take some time to explain why the eager coercion calculus is needed for the semantics, not just for an efficient implementation. After all, there was no mention of coercions in the semantics of the lazy variants of the Gradually-Typed Lambda Calculus. Instead, those semantics just talked about casts, which consisted of a pair of types (source and target) and a blame label. The heart of those semantics was a \(\mathsf{cast}\) function that applies a cast to a value.

It's instructive to see where naive definitions of a \(\mathsf{cast}\) function for eager checking break down. The most obvious thing to try is to modify the \(\mathsf{cast}\) function to check for (deep) type consistency instead of only looking at the head of the type. So we change the first line of the \(\mathsf{cast}\) function from \[ \mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \] to \[ \mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } T_1 \not\sim T_2 \] Let's see what happens on an example just a tad different from the previous example. In this example we go through \(\star \to \star\) instead of \(\star\). \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With the naive cast function, both casts initially succeed, producing the value \[ (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool} \] However, that's not what eager cast checking should do. The above should not be a value; it should have already failed and blamed \(\ell_0\).

So the \(\mathsf{cast}\) function not only needs to check whether the target type is consistent with the source type; it also needs to check whether the target type is consistent with all of the casts that are wrapping the value. One way we could try to do this is to compute the greatest lower bound (with respect to naive subtyping) of all the types in the casts on the value, and then compare the target type to the greatest lower bound. The meet operator on types is defined as follows: \begin{align*} B \sqcap B &= B \\ \star \sqcap T &= T \\ T \sqcap \star &= T \\ (T_1 \to T_2) \sqcap (T_3 \to T_4) &= (T_1 \sqcap T_3) \to (T_2 \sqcap T_4) \\ & \text{if } (T_1 \to T_2) \sim (T_3 \to T_4)\\ T_1 \sqcap T_2 &= \bot & \text{if } T_1 \not\sim T_2 \end{align*} We introduce the bottom type \(\bot\) so that the meet operator can be a total function. Next we define a function that computes the meet of all the casts wrapping a value. \begin{align*} \sqcap (s : T_1 \Rightarrow^{\ell} T_2) &= T_1 \sqcap T_2 \\ \sqcap (v : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} T_3) & = (\sqcap (v : T_1 \Rightarrow^{\ell_1} T_2)) \sqcap T_3 \end{align*} Now we can replace the first line of the \(\mathsf{cast}\) function to use this meet operator. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \left(\sqcap v\right) \not\sim T_2 \end{align*} How does this version fare on our example? An error is now triggered when the value flows into the cast labeled \(\ell_1\), so that's good, but the blame goes to \(\ell_1\). Unfortunately, the prior work on eager checking based on coercions says that \(\ell_0\) should be blamed instead! The problem with this version of \(\mathsf{cast}\) is that the \(\sqcap\) operator forgets about all the blame labels that are in the casts wrapping the value. In this example, it's dropping the label \(\ell_0\), which really ought to be blamed.
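
For concreteness, here is a small OCaml sketch of the meet operator defined above; the type and constructor names are mine, with Dyn standing for \(\star\) and Bot for \(\bot\).

(* A sketch of the meet operator (greatest lower bound w.r.t. naive
   subtyping), made total by the added Bot type; names are my own. *)
type ty = TInt | TBool | Dyn | TArrow of ty * ty | Bot

let rec consistent t1 t2 =
  match t1, t2 with
  | Dyn, _ | _, Dyn -> true
  | TInt, TInt | TBool, TBool -> true
  | TArrow (s1, t1'), TArrow (s2, t2') -> consistent s1 s2 && consistent t1' t2'
  | _, _ -> false

let rec meet t1 t2 =
  match t1, t2 with
  | Dyn, t | t, Dyn -> t
  | TInt, TInt -> TInt
  | TBool, TBool -> TBool
  | TArrow (s1, t1'), TArrow (s2, t2') when consistent t1 t2 ->
      TArrow (meet s1 s2, meet t1' t2')
  | _, _ -> Bot                                  (* inconsistent types *)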

The Eager Coercion Calculus

In the context of the Coercion Calculus, one needs to add the following two reduction rules to obtain eager cast checking. What these rules do is make sure that failure coercions immediately bubble up to the top of the coercion where they can trigger a cast failure. \begin{align*} (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\hat{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*} In the second rule, we require that the domain coercion be in normal form, thereby imposing a left-to-right ordering for coercion failures.

To ensure confluence, we also need to make two changes to existing reduction rules. In the rule for composing function coercions, we need to require that the two coercions be in normal form. (The notation \(\tilde{c}\) is new and will be explained shortly.) \begin{align*} (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \end{align*} Here's the counter-example to confluence, thanks to Ron, if the above restriction is not made. \begin{align*} (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow \mathsf{Fail}^{\ell_1}; (\mathsf{Fail}^{\ell_2}\to c_2) \longrightarrow \mathsf{Fail}^{\ell_1} \\ (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow (\mathsf{Fail}^{\ell_2};\mathsf{Fail}^{\ell_1}) \to (c_1; c_2) \\ & \longrightarrow \mathsf{Fail}^{\ell_2} \to (c_1; c_2)\\ & \longrightarrow \mathsf{Fail}^{\ell_2} \end{align*} There is also a confluence problem regarding the following rule. \begin{align*} \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*} The counter-example, again thanks to Ron, is \begin{align*} (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota;\iota) \to (\mathsf{Bool}!; \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \iota \to (\mathsf{Fail}^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \mathsf{Fail}^{\ell_2} \\ (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota \to \mathsf{Bool}!); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow \mathsf{Fail}^{\ell_1} \end{align*} We fix this problem by making the reduction rule more specific, by only allowing injections to be consumed on the left of a failure. \begin{align*} I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*}

Here's the complete set of reduction rules for the Eager Coercion Calculus. \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\tilde{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*}
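
As a data point for implementers, here is a sketch of how one might represent coercions as an OCaml datatype, together with a smart constructor for function coercions that builds in the two new eager rules. The constructor names are mine, and the injectable types \(I\) are simplified to base types plus \(\star \to \star\).

(* A sketch of a coercion datatype; names are my own and injectable types I
   are simplified. *)
type blame = string                               (* blame labels *)

type ity = IInt | IBool | IFun                    (* injectable types I *)

type coercion =
  | Id                                            (* iota *)
  | Inj of ity                                    (* I! *)
  | Proj of ity * blame                           (* I?^l *)
  | Arrow of coercion * coercion                  (* c -> c *)
  | Seq of coercion * coercion                    (* c ; c *)
  | Fail of blame                                 (* Fail^l *)

(* Building a function coercion from already-normalized parts, with the two
   new eager rules applied: a failure in either part bubbles out, and a
   failing domain takes precedence over a failing codomain. *)
let eager_arrow dom cod =
  match dom, cod with
  | Fail l, _ -> Fail l
  | _, Fail l -> Fail l
  | _, _ -> Arrow (dom, cod)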

These additions and changes to the reduction rules cause changes in the normal forms for coercions. First, \(\mathsf{Fail}^\ell\) cannot appear under a function coercion. We therefore introduce another category, called "normal parts" and written \(\tilde{c}\), that excludes \(\mathsf{Fail}^\ell\) (but still includes \(I?^{\ell_1}; \mathsf{Fail}^{\ell_2}\) because the \(\ell_1\) projection could still fail and take precedence over \(\ell_2\)). Also, \( (\tilde{c}_1 \to \tilde{c}_2); \mathsf{Fail}^\ell\) is now a normal form. Further, to regularize the form that coercions can take, we always write them as having three parts. The following grammar defines the normal coercions for eager cast checking. \[ \begin{array}{llcl} \text{optional injections} & i & ::= & \iota \mid I! \\ & i_\bot & ::= & i \mid \mathsf{Fail}^\ell \\ \text{optional functions} & f & ::= & \iota \mid \tilde{c} \to \tilde{c} \\ & f_\bot & ::= & f \mid \mathsf{Fail}^\ell \\ \text{optional projections} & j & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \iota; f; i \qquad \dagger\\ \text{normal parts} & \tilde{c} & ::= & j ; f; i_\bot \qquad \ddagger \\ \text{normal coercions} & \hat{c} & ::= & \tilde{c} \mid \iota; \iota; \mathsf{Fail}^\ell \end{array} \] \(\dagger\) The coercion \((\iota ;\iota; \iota)\) is not a wrapper coercion.
\(\ddagger\) The coercion \((\iota; \iota; \mathsf{Fail}^\ell)\) is not a normal part.

The Eager Gradually-Typed Lambda Calculus

Taking a step back, recall that we gave the semantics of the Lazy Gradually-Typed Lambda Calculus in terms of a denotational semantics, based on an evaluation function \(\mathcal{E}\). We can do the same for the Eager variant but using coercions to give the meaning of casts. The following is the definition of values and results for the Eager variant. \[ \begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : \overline{c} \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array} \]

Most of the action in the \(\mathcal{E}\) function is in the \(\mathsf{cast}\) auxiliary function. We will give an alternative version of \(\mathsf{cast}\) for eager checking. To make \(\mathsf{cast}\) more succinct we make use of the following helper function regarding cast failure. \[ \mathsf{isfail}(c,\ell) \equiv (c = \mathsf{Fail}^\ell \text{ or } c = \mathsf{Fail}^\ell \circ (\tilde{c}_1 \to \tilde{c}_2) \text{ for some } \tilde{c}_1 \text{ and } \tilde{c}_2) \] Here's the updated definition of \(\mathsf{cast}\) for eager checking. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c}_2)= \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \overline{c}_3 \end{cases} \end{align*}

We can now give the definition of \(\mathcal{E}\), making use of the above \(\mathsf{cast}\) function as well as a function \(\mathcal{C}\) for compiling casts to coercions. (Use \(\mathcal{C}_{\mathit{D}}\) or \(\mathcal{C}_{\mathit{UD}}\) for \(\mathcal{C}\) to obtain the D or UD blame tracking strategy.) \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e),\rho) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2,\rho) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X, \mathcal{C}(T_1 \Rightarrow^\ell T_2)) \\ \mathcal{E}(e_1\,e_2,\rho) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad\; \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad\; \mathsf{apply}(X_1,X_2) \end{align*}

The semantics for the Eager Gradually-Typed Lambda Calculus is defined by the following \(\mathit{eval}\) partial function. \[ \mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases} \] where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : \iota \circ (\hat{c}_1 \to \hat{c}_2) \circ \iota) &= \mathit{function} \\ \mathit{observe}(v : I! \circ \iota \circ \iota) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*}

An Eager Space-Efficient Machine

To obtain a space efficient machine for the Eager variant, we just plug the eager version of \(\mathsf{cast}\) into the lazy space-efficient machine.

An Eager Time-Efficient Machine

Recall that the lazy time-efficient machine used threesomes instead of coercions because we could define an efficient function for composing threesomes, whereas reducing coercions is a complex process. The natural thing to do here is to try and come up with an eager variant of threesomes and the composition function. The lazy threesomes were isomorphic to lazy coercions in normal form, and we already have the normal forms for eager coercions, so it should be straightforward to come up with eager threesomes. It is straightforward, but in this case nothing is gained; we just end up with a slightly different notation. The reason is that the normal forms for eager coercions are more complex. So we might as well stick with using the eager coercions.

However, the essential lesson from the threesomes is that we don't need to implement reduction on coercions, instead we just need to define a composition function that takes coercions in normal form. After thinking about this for a long time, trying lots of variants, we've come up with the definition shown below. (Here we use \(\rhd\) for composition. I'd prefer to use the fatsemi latex symbol, but it seems that is not available in MathJax.)

Composition of Normal Coercions: \( \hat{c} \rhd \hat{c}\)
\begin{align*} (j; f; i_\bot) \rhd (j'; f'; i'_\bot) &= \mathbf{case}\;i_\bot \rhd j'\;\mathbf{of}\\ & \qquad I! \Rightarrow j; f; (I! \rhd i'_\bot) \\ & \quad \mid I?^\ell \Rightarrow I?^\ell; f'; i'_\bot \\ & \quad \mid \mathsf{Fail}^\ell \Rightarrow j; f; \mathsf{Fail}^\ell \\ & \quad \mid c \Rightarrow \mathbf{case}\;(f \rhd c) \rhd f' \;\mathbf{of}\\ & \qquad\qquad\quad \mathsf{Fail}^\ell \Rightarrow j; \iota; \mathsf{Fail}^\ell\\ & \qquad\quad\quad \mid c' \Rightarrow j; c'; i'_\bot \end{align*} \begin{align*} \iota \rhd c &= c \\ c \rhd \iota &= c \\ I_1! \rhd I_2?^\ell &= \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_1 \to \tilde{c}_2) \rhd (\tilde{c}_3 \to \tilde{c}_4) &= (\tilde{c_3}\rhd \tilde{c_1}) \overset{\bullet}{\to} (\tilde{c}_2 \rhd \tilde{c}_4) \\ \mathsf{Fail}^\ell \rhd c &= \mathsf{Fail}^\ell \\ I! \rhd \mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \\ \\ \tilde{c}_1 \overset{\bullet}{\to} \tilde{c}_2 &= \tilde{c}_1 \to \tilde{c}_2 \\ \mathsf{Fail}^\ell \overset{\bullet}{\to}\hat{c}_2 &= \mathsf{Fail}^\ell \\ \tilde{c}_1 \overset{\bullet}{\to}\mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \end{align*}

To obtain an eager, time-efficient machine, we just replace coercion reduction with coercion composition. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \overline{c}_3 \end{cases} \end{align*}

Monday, October 08, 2012

Is TypeScript gradually typed? Part 2

Consider the following TypeScript program, in which a number is stored in variable x of type any and then passed to the display function that expects a string. As we saw in the previous post, a gradual type system allows the implicit down-cast from any to string, so this is a well-typed program.

function display(y : string) {
   document.body.innerHTML = y.charAt(0);
}

var x : any = 3;
display(x);
But what happens at run-time? The answer for TypeScript is that this program hits an error at the y.charAt(0) method call because charAt is not supported by numbers like 3. But isn't y guaranteed to be a string? No, not in TypeScript. TypeScript does not guarantee that the run-time value in a variable is consistent with the static type of the variable. The reason for this is simple: TypeScript does not perform any run-time checks at down-casts to ensure that the incoming value is of the target type. In the above program, the call display(x) causes an implicit cast to string, but there's no run-time check to make sure that the value is in fact a string. TypeScript is implemented as a compiler to JavaScript, and the compiler simply ignores the implicit casts. Let's refer to this no-checking semantics as level 1 gradual typing. I briefly describe this approach in the paper Gradual Typing for Objects.

Level 2 Gradual Typing

A second alternative semantics for gradual typing is to perform run-time checking to ensure that values are consistent with the static types. For implicit casts concerning simple types like number and string, this run-time checking is straightforward. In the above example program, an error would be signaled just prior to the call to display, saying that display expects a string, not a number.

For implicit casts concerning complex types, such as function and object types, run-time checking is more subtle. Consider the following program that defines a deriv function that takes another function as a parameter.

function deriv(d:number, f:(number)=>number, x:number) {
 return (f(x + d) - f(x - d)) / (2.0 * d);
}

function fun(y):any {
 if (y > 0)
  return Math.pow(y,3) - y - 1;
 else
  return "yikes";
}

deriv(0.01, fun, 3.0);
deriv(0.01, fun, -3.0);
The function fun has type (any)=>any, and at each call to deriv, this function is implicitly cast to (number)=>number. The fundamental challenge in casting functions is that it's impossible to tell in general how a function will behave, and in particular, what the return value will be. Here we don't know whether fun will return a number or a string until we've actually called it.

The standard way to deal with function casts is to delay the checking until subsequent calls. One way to visualize this semantics is to imagine the compiler generating the following wrapper function, casted_fun, that applies casts to the argument and return value.

function casted_fun(z:number):number {
 return <number>fun(<any>z);
}

deriv(0.01, casted_fun, 3.0);
deriv(0.01, casted_fun, -3.0);

My first two papers on gradual typing, Gradual Typing for Functional Languages and Gradual Typing for Objects, both used level 2 gradual typing.

Level 3 Gradual Typing

The down-side of delayed checking of function casts is that when an error is finally caught, the location of the error can be far away from the cast that failed. In the above example, the error would occur during the call f(x + d), not at the call to deriv. Findler and Felleisen solved this problem by introducing the notion of blame tracking in their paper Contracts for higher-order functions. The idea is to associate source location information with each cast and then to carry along this information at run-time, in the wrapper functions, so that when the cast in a wrapper fails, it can emit an error that mentions the source location of the original cast, in this example, the call to deriv.

Implementing casts in a way that supports blame tracking while also keeping space overheads to a constant factor is challenging. My paper Threesomes, With and Without Blame shows how to do this.

Discussion

Each of the three levels comes with some advantages and disadvantages. Level 1 gradual typing is the easiest to implement, an important engineering concern, and it comes with no run-time overhead, as there is no run-time checking. On the other hand, level 1 gradual typing does not provide run-time support for catching broken invariants, such as display's expectation that its argument has type string. Thus, a TypeScript programmer who really wants to enforce such an invariant would need to add code to check the type of the argument, a common practice today in JavaScript.

Level 2 gradual typing ensures that the value stored in a variable is consistent with the variable's static type and it provides the run-time checking to catch when this invariant is about to be broken. Thus, level 2 gradual typing removes the need for hand-written type tests. Also, level 2 gradual typing opens up the possibility of compiling statically-typed regions of a program in a more efficient, type-specific manner. (This is an active area of my current research.) The disadvantages of level 2 gradual typing are the run-time overhead from cast checking and the increased implementation complexity.

Level 3 gradual typing improves on level 2 by adding blame tracking, thereby improving the diagnostic errors reported when a cast fails. The extra cost of blame tracking is not very significant, so I would always suggest level 3 over level 2.

Thursday, October 04, 2012

Is TypeScript gradually typed? Part 1

If you haven't heard already, there's a new language named TypeScript from Microsoft, designed by Anders Hejlsberg and several others, including a recent alumnus of my research group named Jonathan Turner. The TypeScript language extends JavaScript with features that are intended to help with large-scale programming, such as optional static type checking, classes, interfaces, and modules. In this post I'll try to characterize what optional static typing means for TypeScript. There are a large number of possible design decisions regarding optional static typing, so the characterization is non-trivial. When discussing types, it's often easy to fixate on the static semantics, that is, how the type checker should behave, but we'll also need to look at the dynamic semantics of TypeScript in Part 2 of this post. The punch line will be that TypeScript is a gradually-typed language, but only to level 1. (I'll define levels that go from 1 to 3 and discuss their pros and cons.)

Static Semantics (Type System)

TypeScript has an any type. Variables and fields of this type can store any type of value. TypeScript has function types that describe the types of the parameters and return types of a function and the way in which the any type and function types interact is closely related to the design I wrote about in Gradual Typing for Functional Languages, SFP 2006. Further, TypeScript has object types to describe the types of fields and methods within an object. The way in which the any type and the object types behave in TypeScript is closely related to the system I described in Gradual Typing for Objects, ECOOP 2007.

The basic feature of any is that you can implicitly convert from any type to any and you can implicitly convert from any to any other type. For example, the following is a well-typed program in TypeScript that demonstrates converting from type string to type any and back to string.

var answer : string = "42";
var a : any = answer;
var the_answer : string = a;
document.body.innerHTML = the_answer;
On the other hand, a gradual type system acts like a static type system when the any type is not involved. For example, the following program tries to implicitly convert from a number to a string, so the type system rejects this program.
var answer : number = 42;
var the_answer : string = answer;

Next, let's look at how any and function types interact. TypeScript uses structural typing for function types, which means that whether you can convert from one function type to another depends on the parts of the function type. The parts are the parameter and the return types. Consider the following example, in which a function of type (string)=>string is implicitly converted to (any)=>string and then to (any)=>any.

function f(x:string):string { return x; }
var g : (any)=>string = f;
var h : any = g;
document.body.innerHTML = h("42");
The first conversion is interesting because, if g is called with an argument of type any, then the argument needs to be implicitly converted to the string that f expects. This is an implicit down-cast, and doesn't follow the contra-variance rule for functions that one sees in the subtyping rules for object-oriented languages. Indeed, in a gradually typed system, assignment compatibility is co-variant in the parameter type of a function, at least, with respect to the any type. The second conversion, from (any)=>string to any is not so surprising, it's just up-casting from (any)=>string to any. Interestingly, there is a third implicit conversion in this program. Can you see it? It's in the call to h. The fact that we're calling h implies that h needs to be a function (or something callable), so there's essentially an implicit conversion here from any to (string)=>any.

Next let's look at implicit conversions involving object types. Like function types, object types are also structural. Consider the following well-typed program in TypeScript, in which an object of type {x:number; y:any} is implicitly converted to {x:any; y:string}, then {x:number}, and finally to any.

var o : {x:number; y:any;} = {x:1, y:"42"};
var p : {x:any; y:string;} = o;
var q : {x:number;} = p;
var r : any = p;
document.body.innerHTML = r.y;
The assignment of o to p shows structural changes within an object type, both to and from any. The next conversion, to {x:number}, shows that the type system allows implicit narrowing of object types. Thus, the rules governing implicit conversion are quite close to the consistent-subtyping relation described in Gradual Typing for Objects. This relation combines the consistency relation that governs the static behavior of any (sometimes called compatibility) with the traditional subtyping relation of structural type systems that allows the implicit narrowing of object types. Getting back to the above example, similar to the function call at type any, TypeScript allows member access on things of type any.

The next example is not well-typed in TypeScript.

var o : {x:number; y:any; } = {x:1, y:"42"};
var q : {x:number;} = o;
var r : {x:number; y:any;} = q;
document.body.innerHTML = r.y;
The tsc compiler complains that
example.ts(3,29): Cannot convert '{ x: number; }' 
    to '{ x: number; y: any; }':
Type '{ x: number; }' is missing property 'y'
    from type '{ x: number; y: any; }'
which shows that TypeScript doesn't allow implicit widening (again in line with the consistent-subtyping relation).

To wrap up the discussion of the static semantics, let's take a look at the interaction between function types (arrows) and object types. To quote John Reynolds by way of Olivier Danvy, "As usual, something funny happens at the left of the arrow". I'm curious to see whether object narrowing is contra-variant in the parameters of function types, which is what I'd expect based on traditional subtyping relations and based on wanting a consistent design with respect to not allowing implicit widening. Consider the following example.

function f(o: {x:number;}):string { return "42"; };
var g : (o: {x:number; y:number;})=>string = f;
var h : (o: {x:number;})=>string = g;
document.body.innerHTML = h({x:1,y:2});
The conversion from f to g should be OK, because it only requires an argument of type {x:number; y:number;} to be up-cast (narrowed) to {x:number;}. However, the conversion from g to h should not be OK because it requires an argument of type {x:number;} to be implicitly down-cast (widened) to {x:number; y:number;}. Surprisingly, the tsc compiler does not give a type error for the above example! So what I said above about TypeScript disallowing implicit widening is not quite true. In many cases it disallows widening, but here we see an exception to the rule. I don't like exceptions in language design because they increase the complexity of the language. So on this one point, TypeScript differs from the design in Gradual Typing for Objects. Perhaps Jonathan can comment on whether this difference was intentional or accidental.

Thursday, September 20, 2012

Interpretations of the GTLC, Part 4: Even Faster

Consider the following statically-typed function. (The type \(\star\) does not occur anywhere in this function.) \[ \lambda f: \mathsf{Int}{\to}\mathsf{Int}. \; x{:}\mathsf{Int} = f(42); \mathbf{return}\,x \] We'd like the execution speed of this function to be the same as if the entire language were statically typed. That is, we don't want statically-typed parts of a gradually-typed program to pay overhead because other parts may be dynamically typed. Unfortunately, in the abstract machines that we've defined so far, there is an overhead. At the point of a function call, such as \(f(42)\) above, the machine needs to check whether \(f\) has evaluated to a closure or to a closure wrapped in a threesome. This act of checking constitutes some run-time overhead.

Taking a step back, there are two approaches that one sees in the literature regarding how a cast is applied to a function. One approach is to build a new function that casts the argument, applies the old function, and then casts the result. The reduction rule looks like this: \[ v : T_1 \to T_2 \Rightarrow T_3 \to T_4 \longrightarrow \lambda x{:}T_3. (v\,(x : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that there's only one kind of value of function type, functions! So when it comes to function application, we only need one reduction rule, good old beta: \[ (\lambda x{:}T.\,e)\, v \longrightarrow [x{:=}v]e \] The other approach is to leave the cast around the function and then add a second reduction rule for applications. \[ (v_1 : T_1 \to T_2 \Rightarrow T_3 \to T_4) \, v_2 \longrightarrow (v_1\, (v_2 : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that the cast around the function is easy to access and change, which we took advantage of to compress sequences of such casts. But as we've already pointed out, having two kinds of values at function type induces some run-time overhead, even in parts of the program that are statically typed.

Our solution to this conundrum is to use a hybrid representation and to take advantage of the indirection that is already present in a function call. Instead of having two kinds of values at function type, we have only one: a closure that includes an optional threesome: \[ \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \] When a closure is first created, there is no threesome. Later, when a closure is cast, the threesome is added. \[ V(\lambda x{:}T.\, s,\rho) = \langle \lambda x{:}T.\, s, \rho, \bot \rangle \] The one transition rule for function application passes the optional threesome as a special parameter, here named \(c\), to the function. In the case of an un-casted closure, the function ignores the \(c\) parameter. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \end{align*}

When an un-casted closure is cast, we build a wrapper function, similar to the first approach discussed above, but using the special variable \(c\) to refer to the threesome instead of hard-coding the cast into the wrapper function. We add \(\mathit{dom}\) and \(\mathit{cod}\) operations for accessing the parts of a function threesome. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s', \rho', \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ & \text{and } \rho' = \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \} \end{align*} When a closure is cast for the second time, the casts are combined to save space. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*}

That's it. We now have a machine that doesn't perform extra dispatching at function calls. There is still a tiny bit of overhead in the form of passing the \(c\) argument. This overhead can be removed by passing the entire closure to itself (instead of passing the array of free variables and the threesome separately) and accessing the threesome from inside the function through the closure.
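
Here is a rough OCaml sketch of this representation, just to make the data layout and the single application path concrete; the threesome type and all the names here are placeholders of my own, not the actual machine.

(* A sketch of the hybrid representation: one kind of function value, a
   closure carrying an optional threesome.  The threesome type is a
   placeholder for the real definition from the earlier posts. *)
type threesome = Threesome                        (* placeholder *)

type value =
  | Const of int
  | Closure of closure
and closure = {
  code : closure -> value -> value;   (* compiled body: receives the closure
                                         itself (for env and cast) and the arg *)
  env : (string * value) list;        (* free variables *)
  cast : threesome option;            (* None until the closure is cast *)
}

(* Application has a single code path: there is no dispatch on whether the
   closure has been cast, because the body consults the cast field itself. *)
let apply fval arg =
  match fval with
  | Closure c -> c.code c arg
  | Const _ -> failwith "not a function"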

In the following I give the complete definitions for the new abstraction machine. In addition to \(\mathit{dom}\) and \(\mathit{cod}\), we add a tail call without a cast to avoid overhead when there is no cast. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \mid \mathit{dom}(e) \mid \mathit{cod}(e) \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) \mid \mathbf{return}\,e(e) : \tau \\ \text{optional threesomes} & \tau_\bot & ::= & \bot \mid \tau \\ \text{values}& v & ::= & k \mid k : \tau \mid \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \end{array} \] Here's the complete definition of the cast function. \begin{align*} \mathsf{cast}(v, \bot) &= v \\ \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s' , \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \}, \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*} Here are the updated evaluation rules. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho, \bot \rangle \\ V(\mathit{dom}(e),\rho) &= \tau_1 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \\ V(\mathit{cod}(e),\rho) &= \tau_2 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \end{align*} Lastly, here are the transition rules for the machine. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau, (x,s,\rho')::\kappa)) & \longmapsto (s, \rho'[x{:=}v'], \kappa) \\ \text{where }& V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2), \rho,\kappa) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot],\kappa) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho,(\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot], ((\tau_1; \tau_2), \sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. 
s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\[2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*}

I like how there are fewer rules and the rules are somewhat simpler compared to the previous machine. There is one last bit of overhead in statically typed code: in a normal return we have to apply the pending threesome that's on the stack. If one doesn't care about making tail-calls space efficient in the presence of casts, then this wouldn't be necessary. But I care.