Saturday, December 23, 2017

Putting the Function back in Lambda

Happy holidays! There’s nothing quite like curling up in a comfy chair on a rainy day and proving a theorem in your favorite proof assistant.

Lately I’ve been interested in graph models of the \(\lambda\)-calculus, that is, models that represent a \(\lambda\) with relations from inputs to outputs. The use of relations instead of functions is not a problem when reasoning about expressions that produce numbers, but it does introduce problems when reasoning about expressions that produce higher-order functions. Some of these expressions are contextually equivalent but not denotationally equivalent. For example, consider the following two expressions. \[{\lambda f.\,} (f {\;}0) + (f {\;}0) =_{\mathrm{ctx}} {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) \qquad\qquad (1)\] The expression on the left-hand side has two copies of a common subexpression \((f {\;}0)\). The expression on the right-hand side is optimized to have just a single copy of \((f {\;}0)\). The left and right-hand expressions in equation (1) are contextually equivalent because the \(\lambda\)-calculus is a pure language (no side effects), so whether we call \(f\) once or twice does not matter, and it always returns the same result given the same input. Unfortunately, the two expressions in equation (1) are not denotationally equivalent. \[{\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset \neq {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset \qquad\qquad (2)\] Recall that my semantics \(\mathcal{E}\) maps an expression and environment to a set of values. The “set” is not because an expression produces multiple conceptually-different values. Sets are needed because we represent a (infinite) function as an infinite set of finite relations. So to prove the above inequality (2) we simply need to find a value that is in the set on the left-hand side that is not in the set on the right-hand side. The idea is that we consider the behavior when parameter \(f\) is bound to a relation that is not a function. In particular, the relation \[R = \{ (0,1), (0,2) \}\] Now when we consider the application \((f {\;}0)\), the semantics of function application given by \(\mathcal{E}\) can choose the result to be either \(1\) or \(2\). Furthermore, for the left-hand side of equation (2), it could choose \(1\) for the first \((f {\;}0)\) and \(2\) for the second \((f {\;}0)\) . Thus, the result of the function can be \(3\). \[\{ (R,3) \} \in {\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset\] Of course, this function could never actually produce \(3\) because \(R\) does not correspond to any \(\lambda\)’s. In other words, garbage-in garbage-out. Turning to the right-hand side of equation (2), there is only one \((f{\;}0)\), which can either produce \(1\) or \(2\), so the result of the outer function can be \(2\) or \(4\), but not \(3\).

\[\begin{aligned} \{ (R,2) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,3) \} &\notin {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,4) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\end{aligned}\]

So we need to put the function back in \(\lambda\)! That is, we need to restrict the notion of values so that all the relations are also functions. Recall the definition: a function \(f\) is a relation on two sets \(A\) and \(B\) such that for all \(a \in A\) there exists a unique \(b \in B\) such that \((a,b) \in f\). In other words, if \((a,b) \in f\) and \((a,b') \in f\), then necessarily \(b = b'\). Can we simply add this restriction to our notion of value? Not quite. If we literally applied this definition, we could still get graphs such as the following one, which maps two different approximations of the add-one function to different outputs. This graph does not correspond to any \(\lambda\). \[\{ (\{(0,1)\}, 2), (\{(0,1),(5,6) \}, 3) \}\]

So we need to generalize the notion of function to allow for differing approximations. We shall do this by generalizing from equality to consistency, written \(\sim\). Two integers are consistent when they are equal. Two graphs as consistent when they map consistent inputs to consistent outputs. We are also forced to explicitly define inconsistency, which we explain below.

\[\begin{gathered} \frac{}{n \sim n} \qquad \frac{\begin{array}{l}\forall v_1 v'_1 v_2 v'_2, (v_1,v'_1) \in t_1 \land (v_2,v'_2) \in t_2 \\ \implies (v_1 \sim v_2 \land v'_1 \sim v'_2) \lor v_1 \not\sim v_2 \end{array}} {t_1 \sim t_2} \\[2ex] \frac{n_1 \neq n_2}{n_1 \not\sim n_2} \qquad \frac{(v_1,v'_1) \in t_1 \quad (v_2,v'_2) \in t_2 \quad v_1 \sim v_2 \quad v'_1 \not\sim v'_2} {t_1 \not\sim t_2} \\[2ex] \frac{}{n \not\sim t} \qquad \frac{}{t \not\sim n}\end{gathered}\]

The definition of consistency is made a bit more complicated than I expected because the rules of an inductive definition must be monotonic, so we can’t negate a recursive application or put it on the left of an implication. In the above definition of consistency for graphs \(t_1 \sim t_2\), it would have been more natural to say \(v_1 \sim v_2 \implies v'_1 \sim v'_2\) in the premise, but then \(v_1 \sim v_2\) is on the left of an implication. The above inductive definition works around this problem by mutually defining consistency and inconsistency. We then prove that inconsistency is the negation of consistency.

Proposition 1 (Inconsistency) \(v_1 \not\sim v_2 = \neg (v_1 \sim v_2)\)
Proof. We first establish by mutual induction that \(v_1 \sim v_2 \implies \neg (v_1 \not\sim v_2)\) and \(v_1 \not\sim v_2 \implies \neg (v_1 \sim v_2)\). We then show that \((v_1 \sim v_2) \lor (v_1 \not\sim v_2)\) by induction on \(v_1\) and case analysis on \(v_2\). Therefore \(\neg (v_1 \not\sim v_2) \implies v_1 \sim v_2\), so we have proved both directions of the desired equality. \(\Box\)

Armed with this definition of consistency, we can define a generalized notion of function, let’s call it \(\mathsf{is\_fun}\). \[\mathsf{is\_fun}\;t \equiv \forall v_1 v_2 v'_1 v'_2, (v_1,v'_1) \in t \land (v_2,v'_2) \in t \land v_1 \sim v_2 \implies v'_1 \sim v'_2\] Next we restrict the notion of value to require the graphs to satisfy \(\mathsf{is\_fun}\). Recall that we use to define values by the following grammar. \[\begin{array}{lrcl} \text{numbers} & n & \in & \mathbb{Z} \\ \text{graphs} & t & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \}\\ \text{values} & v & ::= & n \mid t \end{array}\] We keep this definition but add an induction definition of a more refined notion of value, namely \(\mathsf{is\_val}\). Numbers are values and graphs are values so long as they satisfy \(\mathsf{is\_fun}\) and only map values to values.

\[\begin{gathered} \frac{}{\mathsf{is\_val}\,n} \qquad \frac{\mathsf{is\_fun}\;t \quad \forall v v', (v,v') \in t \implies \mathsf{is\_val}\,v \land \mathsf{is\_val}\,v'} {\mathsf{is\_val}\,t}\end{gathered}\]

We are now ready to update our semantic function \(\mathcal{E}\). The one change that we make is to require that each graph \(t\) satisfies \(\mathsf{is\_val}\) in the meaning of a \(\lambda\). \[{\mathcal{E}[\![ {\lambda x.\,} e ]\!]}\rho = \{ t \mid \mathsf{is\_val}\;t \land \forall (v,v')\in t, v' \in {\mathcal{E}[\![ e ]\!]}\rho(x{:=}v) \}\] Hopefully this change to the semantics enables a proof that \(\mathcal{E}\) is deterministic. Indeed, we shall show that if \(v \in {\mathcal{E}[\![ e ]\!]}\rho\) and \(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\) for any suitably related \(\rho\) and \(\rho'\), then \(v \sim v'\).

To relate \(\rho\) and \(\rho'\), we extend the definitions of consistency and \(\mathsf{is\_val}\) to environments.

\[\begin{gathered} \emptyset \sim \emptyset \qquad \frac{v \sim v' \quad \rho \sim \rho'} {\rho(x{:=}v) \sim \rho'(x{:=}v')} \\[2ex] \mathsf{val\_env}\;\emptyset \qquad \frac{\mathsf{is\_val}\; v \quad \mathsf{val\_env}\;\rho} {\mathsf{val\_env}\;\rho(x{:=}v)}\end{gathered}\]

We will need a few small lemmas concerning these definitions and their relationship with the \(\sqsubseteq\) ordering on values.

Proposition 2 

  1. If \(\mathsf{val\_env}\;\rho\) and \(\rho(x) = v\), then \(\mathsf{is\_val}\; v\).

  2. If \(\rho \sim \rho'\), \(\rho(x) = v\), \(\rho'(x) = v'\), then \(v \sim v'\).

Proposition 3 

  1. If \(\mathsf{is\_val}\;v'\) and \(v \sqsubseteq v'\), then \(\mathsf{is\_val}\; v\).

  2. If \(v_1 \sqsubseteq v'_1\), \(v_2 \sqsubseteq v'_2\), and \(v'_1 \sim v'_2\), then \(v_1 \sim v_2\).

We now come to the main theorem, which is proved by induction on \(e\), using the above three propositions.

Theorem (Determinism of \(\mathcal{E}\)) If \(v \in {\mathcal{E}[\![ e ]\!]}\rho\), \(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\), \(\mathsf{val\_env}\;\rho\), \(\mathsf{val\_env}\;\rho'\), and \(\rho \sim \rho'\), then \(\mathsf{is\_val}\;v\), \(\mathsf{is\_val}\;v'\), and \(v \sim v'\).

Sunday, October 15, 2017

New revision of the semantics paper (POPL rejection, ESOP submission)

My submission about declarative semantics to POPL was rejected. It's been a few weeks now, so I'm not so angry about it anymore. I've revised the paper and will be submitting it to ESOP this week.

The main reason for rejection according to the reviewers was a lack of technical novelty, but I think the real reasons were that 1) the paper came across as too grandiose and as a result, it accidentally annoyed the reviewer who is an expert in denotational semantics, and 2) the paper did not do a good job of comparing to the related set-theoretic models of Plotkin and Engeler.

Regarding 1), in the paper I use the term "declarative semantics" to try and distance this new semantics from the standard lattice-based denotational semantics. However, the reviewer took it to claim that the new semantics is not a denotational semantics, which is clearly false. In the new version of the paper I've removed the term "declarative semantics" and instead refer to the new semantics as a denotational semantics of the "elementary" variety. Also, I've toned down the sales pitch to better acknowledge that this new semantics is not the first elementary denotational semantics.

Regarding 2), I've revised the paper to include a new section at the beginning that gives background on the elementary semantics of Plotkin, Engeler, and Coppo et al. This should help put the contributions of the paper in context.

Other than that, I've added a section with a counter example to full abstraction. A big thanks to the POPL reviewers for the counter example! (Also thanks to Max New, who sent me the counter example a couple months ago.)

Unfortunately, the ESOP page limit is a bit shorter, so I removed the relational version of the semantics and also the part about mutable references.

A draft of the revision is available on arXiv. Feedback is most welcome, especially from experts in denotational semantics! I really hope that this version is no longer annoying, but if it is, please tell me!

Tuesday, October 03, 2017

Comparing to Plotkin and Engeler's Set-theoretic Models of the Lambda Calculus

On the plane ride back from ICFP last month I had a chance to re-read and better understand Plotkin’s Set-theoretical and other elementary models of the \(\lambda\)-calculus (Technical Report 1972, Theoretical Computer Science 1993) and to read, for the first time, Engeler’s Algebras and combinators (Algebra Universalis 1981). As I wrote in my draft paper Declarative semantics for functional languages: compositional, extensional, and elementary, the main intuitions behind my simple semantics are present in these earlier papers, but until now I did not understand these other semantics deeply enough to give a crisp explanation of the similarities and differences. (The main intuitions are also present in the early work on intersection type systems, and my semantics is more closely related to those systems. A detailed explanation of that relationship is given in the draft paper.)

I should note that Engeler’s work was in the context of combinators (S and K), not the \(\lambda\)-calculus, but of course the \(\lambda\)-calculus can be encoded into combinators. I’ve ported his definitions to the \(\lambda\)-calculus, along the lines suggested by Plotkin (1993), to make for easier comparison. In addition, I’ll extend both Engeler and Plotkin’s semantics to include integers and integer arithmetic in addition to the \(\lambda\)-calculus. Here’s the syntax for the \(\lambda\)-calculus that we consider here: \[\begin{array}{rcl} && n \in \mathbb{Z} \qquad x \in \mathbb{X} \;\;\text{(program variables)}\\ \oplus & ::= & + \mid - \mid \times \mid \div \\ \mathbb{E} \ni e & ::= & n \mid e \oplus e \mid x \mid {\lambda x.\,} e \mid e \; e \mid {\textbf{if}\,}e {\,\textbf{then}\,}e {\,\textbf{else}\,}e \end{array}\]


Perhaps the best place to start the comparison is in the definition of what I’ll call values. All three semantics give an inductive definition of values and all three involve finite sets, but in different ways. I’ll write \(\mathbb{V}_S\) for my definition, \(\mathbb{V}_P\) for Plotkin’s, and \(\mathbb{V}_E\) for Engeler’s. \[\begin{aligned} \mathbb{V}_S &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_S \times \mathbb{V}_S) \\ \mathbb{V}_P &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_P) \times \mathcal{P}_f(\mathbb{V}_P) \\ \mathbb{V}_E &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_E) \times \mathbb{V}_E\end{aligned}\] In \(\mathbb{V}_S\), a function is represented as a finite graph, that is, a finite set of input-output pairs. For example, the graph \(\{ (0,1), (1,2), (2,3) \}\) is one of the meanings for the term \((\lambda x.\, x + 1)\).

Plotkin’s values \(\mathbb{V}_P\) include only a single input-output pair from a function’s graph. For example, \((\{0\}, \{1\})\) is one of the meanings for the term \((\lambda x.\, x + 1)\). Engeler’s values also include just a single entry. For example, \((\{0\}, 1)\) is one of the meanings for the term \((\lambda x.\, x + 1)\). In this example we have not made use of the finite sets in the input and output of Plotkin’s values. To do so, let us consider a higher-order example, such as the term \((\lambda f.\, f\,1 + f\,2)\). For Plotkin, the following value is one of its meanings: \[(\{ (\{1\}, \{3\}), (\{2\}, \{4\}) \}, \{7\})\] That is, in case \(f\) is the function that adds \(2\) to its input, the result is \(7\). We see that the presence of finite sets in the input is needed to accomodate functions-as-input. The corresponding value in \(\mathbb{V}_S\) is \[\{ (\{ (1, 3), (2, 4) \}, 7) \}\]

The difference between Plotkin and Engeler’s values can be seen in functions that return functions. Consider the \(K\) combinator \((\lambda x.\,\lambda y.\, x)\). For Plotkin, the following value is one of its meanings: \[(\{1\}, \{ (\{0\},\{1\}), (\{2\},\{1\}) \})\] That is, when applied to \(1\) it returns a function that returns \(1\) when applied to either \(0\) or \(2\). The corresponding value in \(\mathbb{V}_S\) is \[\{ (1, \{ (0,1), (2,1) \}) \}\] For Engeler, there is not a single value corresponding to the above value. Instead it requires two values to represent the same information. \[(\{1\}, (\{0\},1)) \quad\text{and}\quad (\{1\}, (\{2\},1))\] We’ll see later that it doesn’t matter that Engeler requires more values to represent the same information.

The Domains

The semantics of Plotkin, Engeler, and myself does not use values for the domain, but instead a set of values. That is \[\mathcal{P}(\mathbb{V}_S) \qquad \mathcal{P}(\mathbb{V}_P) \qquad \mathcal{P}(\mathbb{V}_E)\]

The role of the outer \(\mathcal{P}\) is intimately tied to the meaning of functions in Plotkin and Engeler’s semantics because the values themselves only record a single input-output pair. The outer \(\mathcal{P}\) is needed to represent all of the input-output pairs for a given function. While the \(\mathcal{P}\) is also necessary for functions in my semantics, one can view it generically as providing non-determinism and therefore somewhat orthogonal to the meaning of functions per se. Next let’s take a look at the semantics.

Comparing the Semantics

Here is Plotkin’s semantics \(\mathcal{E}_P\). Let \(V,V'\) range over finite sets of values. \[\begin{aligned} {\mathcal{E}_P[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_P[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_P[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_P[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,V') \mid V' \subseteq {\mathcal{E}_P[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_P[\![ e_1\;e_2 ]\!]}\rho &= \bigcup \left\{ V' \, \middle| \begin{array}{l} \exists V.\, (V,V') {\in} {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_P[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_P[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_P[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] For Plotkin, the environment \(\rho\) maps variables to finite sets of values. In the case for application, the input set \(V\) must be a subset of the meaning of the argument, which is critical for enabling self application and, using the \(Y\) combinator, general recursion. The \(\bigcup\) flattens the set-of-finite-sets into a set.

Next we consider Engeler’s semantics \(\mathcal{E}_E\). \[\begin{aligned} {\mathcal{E}_E[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_E[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_E[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_E[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,v') \mid v' \in {\mathcal{E}_E[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_E[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v' \, \middle| \begin{array}{l} \exists V.\, (V,v') {\in} {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_E[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_E[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_E[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] The semantics is quite similar to Plotkin’s, as again we see the use of \(\subseteq\) in the case for application. Because the output \(v'\) is just a value, and not a finite set of values as for Plotkin, there is no need for the \(\bigcup\).

Finally we review my semantics \(\mathcal{E}_S\). For it we need to define an ordering on values that is just equality for integers and \(\subseteq\) on function graphs. Let \(t\) range over \(\mathcal{P}_{f}(\mathbb{V} \times \mathbb{V})\). \[\frac{}{n \sqsubseteq n} \qquad \frac{t_1 \subseteq t_2}{t_1 \sqsubseteq t_2}\] Then we define \(\mathcal{E}_S\) as follows. \[\begin{aligned} {\mathcal{E}_S[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_S[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_S[\![ x ]\!]}\rho &= \{ v \mid v \sqsubseteq \rho(x) \} \\ {\mathcal{E}_S[\![ {\lambda x.\,} e ]\!]}\rho &= \{ t \mid \forall (v,v')\in t.\, v' \in {\mathcal{E}_S[\![ e ]\!]}\rho(x{:=}v) \} \\ {\mathcal{E}_S[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v \, \middle| \begin{array}{l} \exists t\, v_2\, v_3\, v_3'.\, t {\in} {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land v_2 {\in} {\mathcal{E}_S[\![ e_2 ]\!]}\rho \\ \land\, (v_3, v_3') \in t \land v_3 \sqsubseteq v_2 \land v \sqsubseteq v_3' \end{array} \right\} \\ {\mathcal{E}_S[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_S[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] In my semantics, \(\rho\) maps a variable to a single value. The \(v_3 \sqsubseteq v_2\) in my semantics corresponds to the uses of \(\subseteq\) in Plotkin and Engeler’s. One can view this as a kind of subsumption, allowing the use of a larger approximation of a function in places where a smaller approximation is needed. I’m not sure whether all the other uses of \(\sqsubseteq\) are necessary, but the semantics needs to be downward closed, and the above placement of \(\sqsubseteq\)’s makes this easy to prove.

Relational Semantics

For people like myself with a background in operational semantics, there is another view of the semantics that is helpful to look at. We can turn the above dentoational semantics into a relational semantics (like a big-step semantics) that hides the \(\mathcal{P}\) by making use of the following isomorphism (where \(\mathbb{V}\) is one of \(\mathbb{V}_S\), \(\mathbb{V}_P\), or \(\mathbb{V}_E\)). \[\mathbb{E} \to (\mathbb{X}\rightharpoonup \mathbb{V}) \to {\mathcal{P}(\mathbb{V})} \quad\cong\quad \mathbb{E} \times (\mathbb{X}\rightharpoonup \mathbb{V}) \times \mathbb{V}\] Let \(v\) range over \(\mathbb{V}\). We can define the semantic relation \(\rho \vdash_S e \Rightarrow v\) that corresponds to \(\mathcal{E}_S\) as follows. Note that in the rule for lambda abstraction, the table \(t\) comes out of thin air (it is existentially quantified), and that there is one premise in the rule per entry in the table, that is, we have the quantification \(\forall(v,v') \in t\). \[\begin{gathered} \frac{}{\rho \vdash_S n \Rightarrow n} \quad \frac {\rho \vdash_S e_1 \Rightarrow n_1 \quad \rho \vdash_S e_2 \Rightarrow n_2} {\rho \vdash_S e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \sqsubseteq \rho(x)} {\rho \vdash_S x \Rightarrow v} \\[3ex] \frac{\forall (v,v'){\in} t.\; \rho(x{:=}v) \vdash_S e \Rightarrow v'} {\rho \vdash_S {\lambda x.\,}e \Rightarrow t} \quad \frac{\begin{array}{c}\rho \vdash_S e_1 \Rightarrow t \quad \rho \vdash_S e_2 \Rightarrow v_2 \\ (v_3,v'_3) \in t \quad v_3 \sqsubseteq v_2 \quad v \sqsubseteq v'_3 \end{array} } {\rho \vdash_S (e_1{\;}e_2) \Rightarrow v} \\[3ex] \frac{\rho \vdash_S e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_S e_2 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_S e_1 \Rightarrow 0 \quad \rho \vdash_S e_3 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]

For comparison, let us also turn Plotkin’s semantics into a relation. \[\begin{gathered} \frac{}{\rho \vdash_P n \Rightarrow n} \quad \frac {\rho \vdash_P e_1 \Rightarrow n_1 \quad \rho \vdash_P e_2 \Rightarrow n_2} {\rho \vdash_P e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \in \rho(x)} {\rho \vdash_P x \Rightarrow v} \\[3ex] \frac{\forall v' \in V'.\, \rho(x{:=}V) \vdash_P e \Rightarrow v'} {\rho \vdash_P {\lambda x.\,}e \Rightarrow (V,V')} \quad \frac{\begin{array}{c}\rho \vdash_P e_1 \Rightarrow (V,V') \quad \forall v_2 \in V.\, \rho \vdash_P e_2 \Rightarrow v_2 \\ v' \in V' \end{array} } {\rho \vdash_P (e_1{\;}e_2) \Rightarrow v'} \\[3ex] \frac{\rho \vdash_P e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_P e_2 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_P e_1 \Rightarrow 0 \quad \rho \vdash_P e_3 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\] Recall that in Plotkin’s semantics, the environment maps variables to finite sets of values. The “set” is needed to handle the case of a function bound to a variable, but is just extra baggage when we have an integer bound to a variable. So in the variable rule we have \(v \in \rho(x)\), which either extracts a singleton integer from \(\rho(x)\), or extracts one input-output entry from a function’s graph. Moving on to the lambda rule, it only produces one input-output entry, but to handle the case when the output \(V'\) is representing a function, we must build it up one entry at a time with the quantification \(\forall v'\in V'\) and a finite but arbitrary number of premises. In the application rule we again have a finite number of premises, with \(\forall v_2\in V\), and also the premise \(v' \in V'\).

The relational version of Engeler’s semantics removes the need for quantification in the lambda rule, but the application rule still has \(\forall v_2 \in V\). \[\begin{gathered} \frac{\rho(x{:=}V) \vdash_E e \Rightarrow v'} {\rho \vdash_E {\lambda x.\,}e \Rightarrow (V,v')} \quad \frac{\begin{array}{c}\rho \vdash_E e_1 \Rightarrow (V,v') \quad \forall v_2 \in V.\, \rho \vdash_E e_2 \Rightarrow v_2 \end{array} } {\rho \vdash_E (e_1{\;}e_2) \Rightarrow v'}\end{gathered}\]


My semantics is similar to Plotkin and Engeler’s in that

  1. The domain is a set of values, and values are inductively defined and involve finite sets.

  2. Self application is enabled by allowing a kind of subsumption on functions.

The really nice thing about all three semantics is that they are simple; very little mathematics is necessary to understand them, which is important pedagogically, practically (easier for practitioners to apply such semantics), and aesthetically (Occam’s razor!).

My semantics is different to Plotkin and Engeler’s in that

  1. the definition of values places \(\mathcal{P}_f\) so that functions are literally represented by finite graphs, and

  2. environments map each variable to a single value, and

  3. \(\sqsubseteq\) is used instead of \(\subseteq\) to enable self application.

The upshot of these (relatively minor) differences is that my semantics may be easier to understand.

Thursday, July 13, 2017

POPL submission, pulling together these blog posts on semantics!

Last week I submitted a paper to POPL 2018 about the new kind of denotational semantics that I've been writing about in this blog, which I am now calling declarative semantics. I think this approach to semantics has the potential to replace operational semantics for the purposes of language specification. The declarative semantics has the advantage of being compositional and extensional while, like operational semantics, using only elementary mathematics. Thus, the declarative semantics should be better than operational semantics for reasoning about programs and for reasoning about the language as a whole (i.e. it's meta-theory). The paper pulls together many of the blog posts, updates them, and adds a semantics for mutable references. The paper is available now on arXiv and the Isabelle mechanization is available here. I hope you enjoy it and I welcome your feedback!

Wednesday, June 07, 2017

Revisiting "well-typed programs cannot go wrong"

Robin Milner proved that well-typed programs cannot go wrong in his 1978 paper A Theory of Type Polymorphism in Programming (Milner 1978). That is, he defined a type system and denotational semantics for the Exp language (a subset of ML) and then proved that the denotation of a well-typed program in Exp is not the “wrong” value. The “wrong” denotation signifies that a runtime type error occurred, so Milner’s theorem proves that the type system is strong enough to prevent all the runtime type errors that could occur in an Exp program. The denotational semantics used by Milner (1978) was based on the standard domain theory for an explicitly typed language with higher-order functions.

I have been exploring, over the last month, whether I can prove a similar theorem but using my new denotational semantics, and mechanize the proof in the Isabelle proof assistant. At first I tried to stay as close to Milner’s proof as possible, but in the process I learned that Milner’s proof is rather syntactic and largely consists of proving lemmas about how substitution interacts with the type system, which does not shed much light on the semantics of polymorphism.

Last week I decided to take a step back and try a more semantic approach and switch to a cleaner but more expressive setting, one with first-class polymorphism. So I wrote down a denotational semantics for System F (Reynolds 1974) extended with support for general recursion. The proof that well-typed programs cannot go wrong came together rather quickly. Today I finished the mechanization in Isabelle and it came in at just 539 lines for all the definitions, lemmas, and main proof. I’m excited to share the details of how it went! Spoiler: the heart of the proof turned out to be a lemma I call Compositionality because it looks a lot like the similarly-named lemma that shows up in proofs of parametricity.


The types in the language include natural numbers, function types, universal types, and type variables. Regarding the variables, after some experimentation with names and locally nameless, I settled on good old DeBruijn indices to represent both free and bound type variables. \[\begin{array}{rcl} i,j & \in & \mathbb{N} \\ \sigma,\tau & ::= & \mathtt{nat} \mid \tau \to \tau \mid \forall\,\tau \mid i \end{array}\] So the type of the polymorphic identity function, normaly written \(\forall \alpha.\, \alpha \to \alpha\), is instead written \(\forall \left(0 \to 0\right)\).

The syntax of expressions is as follows. I choose to use DeBruijn indices for term variables as well, and left off all type annotations, but I don’t think that matters for our purposes here. \[\begin{array}{rcl} n & \in & \mathbb{N} \\ e & ::= & n \mid i \mid \lambda e \mid e\; e \mid \Lambda e \mid e [\,] \mid \mathtt{fix}\, e \end{array}\]

Denotational Semantics

The values in this language, described by the below grammar, include natural numbers, functions represented by finite lookup tables, type abstractions, and \(\mathsf{wrong}\) to represent a runtime type error. \[\begin{array}{rcl} f & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \} \\ o & ::= & \mathsf{none} \mid \mathsf{some}(v) \\ v & ::= & n \mid \mathsf{fun}(f) \mid \mathsf{abs}(o) \mid \mathsf{wrong} \end{array}\] A type abstraction \(\mathsf{abs}(o)\) consists of an optional value, and not simply a value, because the body of a type abstraction might be a non-terminating computation.

We define the following information ordering on values so that we can reason about one lookup table being more or less-defined than another lookup table. We define \(v \sqsubseteq v'\) inductively as follows.

\[\begin{gathered} n \sqsubseteq n \quad \frac{f_1 \subseteq f_2} {\mathsf{fun}(f_1) \sqsubseteq \mathsf{fun}(f_2)} \quad \mathsf{wrong} \sqsubseteq\mathsf{wrong} \\ \mathsf{abs}(\mathsf{none}) \sqsubseteq\mathsf{abs}(\mathsf{none}) \quad \frac{v \sqsubseteq v'} {\mathsf{abs}(\mathsf{some}(v)) \sqsubseteq\mathsf{abs}(\mathsf{some}(v'))}\end{gathered}\]

The denotational semantics maps an expression to a set of values. Why a set and not just a single value? A single finite lookup table is not enough to capture the meaning of a lambda, but an infinite set of finite tables is. However, dealing with sets is somewhat inconvenient, so we mitigate this issue by working in a set monad. Also, to deal with \(\mathsf{wrong}\) we need an error monad, so we use a combined set-and-error monad.

\[\begin{aligned} X := E_1 ; E_2 &\equiv \{ v \mid \exists v'. \, v' \in E_1, v' \neq \mathsf{wrong}, v \in E_2[v'/X] \} \\ & \quad \cup \{ v \mid v = \mathsf{wrong}, \mathsf{wrong} \in E_1 \} \\ \mathsf{return}(E) & \equiv \{ v \mid v \sqsubseteq E \} \\ X \leftarrow E_1; E_2 & \equiv \{ v \mid \exists v'.\, v' \in E_1, v \in E_2[v'/X]\}\end{aligned}\]

The use of \(\sqsubseteq\) in \(\mathsf{return}\) is to help ensure that the meaning of an expression is downward-closed with respect to \(\sqsubseteq\). (The need for which is explained in prior blog posts.)

Our semantics will make use of a runtime environment \(\rho\) that includes two parts, \(\rho_1\) and \(\rho_2\). The first part gives meaning to the term variables, for which we use a list of values (indexed by their DeBruijn number). The second part, for the type variables, is a list containing sets of values, as the meaning of a type will be a set of values. We define the following notation for dealing with runtime environments.

\[\begin{aligned} v{::}\rho \equiv (v{::}\rho_1, \rho_2) \\ V{::}\rho \equiv (\rho_1, V{::}\rho_2)\end{aligned}\]

We write \(\rho[i]\) to mean either \(\rho_1[i]\) or \(\rho_2[i]\), which can be disambiguated based on the context of use.

To help define the meaning of \(\mathtt{fix}\,e\), we inductively define a predicate named \(\mathsf{iterate}\). Its first parameter is the meaning \(L\) of the expression \(e\), which is a mapping from an environment to a set of values. The second parameter is a runtime environment \(\rho\) and the third parameter is a value that is the result of iteration.

\[\begin{gathered} \mathsf{iterate}(L, \rho, \mathsf{fun}(\emptyset)) \quad \frac{\mathsf{iterate}(L, \rho, v) \quad v' \in E(v{::}\rho)} {\mathsf{iterate}(L, \rho, v')}\end{gathered}\]

To help define the meaning of function application, we define the following \(\mathsf{apply}\) functiion. \[\mathsf{apply}(V_1,V_2) \equiv \begin{array}{l} x_1 := V_1; \\ x_2 := V_2; \\ \mathsf{case}\,x_1\,\textsf{of}\\ \;\; \mathsf{fun}(f) \Rightarrow (x'_2,x'_3) \leftarrow f; \mathsf{if}\, x'_2 \sqsubseteq x_2 \, \mathsf{then}\, x'_3 \,\mathsf{else}\, \emptyset \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\]

The denotational semantics is given by the following function \(E\) that maps an expression and environment to a set of values.

\[\begin{aligned} E[ n ]\rho &= \mathsf{return}(n) \\[1ex] E[ i ]\rho &= \mathsf{return}(\rho[i]) \\[1ex] E[ \lambda e ]\rho &= \{ v \mid \exists f.\, v = \mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f \Rightarrow \\ & \qquad\qquad \exists v_2.\, v_2 \in E[ e ] (v_1{::}\rho), v'_2 \sqsubseteq v_2\} \\[1ex] E[ e_1\; e_2 ] \rho &= \mathsf{apply}(E[ e_1 ]\rho, E[ e_2 ]\rho) \\[1ex] E[ \mathtt{fix}\,e ] \rho &= \{ v \mid \mathsf{iterate}(E[ e ], \rho, v) \} \\[1ex] E[ \Lambda e ] \rho &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V. v' \in E[ e ] (V{::}\rho) \} \\ & \quad\; \cup \{ v \mid v = \mathsf{abs}(\mathsf{none}), \forall V. E[ e ](V{::}\rho) = \emptyset \} \\[1ex] E[ e [\,] ] \rho &= \begin{array}{l} x := E [ e ] \rho;\\ \mathsf{case}\,x\,\mathsf{of} \\ \;\; \mathsf{abs}(\mathsf{none}) \Rightarrow \emptyset \\ \mid \mathsf{abs}(\mathsf{some}(v')) \Rightarrow \mathsf{return}(v') \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\end{aligned}\]

We give meaning to types with the function \(T\), which maps a type and an environment to a set of values. For this purposes, we only need the second part of the runtime environment which gives meaning to type variables. Instead of writing \(\rho_2\) everywhere, we’ll use the letter \(\eta\). It is important to ensure that \(T\) is downward closed, which requires some care either in the definition of \(T[ \forall \tau ]\eta\) or in the definition of \(T[ i ]\eta\). We have chosen to do this work in the definition of \(T[ i ]\eta\), and let the definition of \(T[ \forall \tau ]\eta\) quantify over any set of values \(V\) to give meaning to it’s bound type variable.

\[\begin{aligned} T[ \mathtt{nat} ] \eta &= \mathbb{N} \\ T[ i ] \eta &= \begin{cases} \{ v \mid \exists v'.\, v' \in \eta[i], v \sqsubseteq v',v \neq \mathsf{wrong} \} &\text{if } i < |\eta| \\ \emptyset & \text{otherwise} \end{cases} \\ T[ \sigma\to\tau ] \eta &= \{ v\mid \exists f. \,v=\mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f, v_1 \in T[\sigma]\eta \\ & \hspace{1.5in} \Rightarrow \exists v_2.\, v_2 \in T[\tau]\eta, v'_2 \sqsubseteq v_2 \} \\ T[ \forall\tau ] \eta &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V.\, v' \in T[\tau ] (V{::}\eta) \} \cup \{ \mathsf{abs}(\mathsf{none}) \} \end{aligned}\]

Type System

Regarding the type system, it is standard except perhaps how we deal with the DeBruijn representation of type variables. We begin with the definition of well-formed types. A type is well formed if all the type variables in it are properly scoped, which is captured by their indices being below a given threshold (the number of enclosing type variable binders, that is, \(\Lambda\)’s and \(\forall\)’s). More formally, we write \(j \vdash \tau\) to say that type \(\tau\) is well-formed under threshold \(j\), and give the following inductive definition.

\[\begin{gathered} j \vdash \mathtt{nat} \quad \frac{j \vdash \sigma \quad j \vdash \tau}{j \vdash \sigma \to \tau} \quad \frac{j+1 \vdash \tau }{j \vdash \forall \tau} \quad \frac{i < j}{j \vdash i}\end{gathered}\]

Our representation of the type environment is somewhat unusual. Because term variables are just DeBruijn indices, we can use a list of types (instead of a mapping from names to types). However, to keep track of the type-variable scoping, we also include with each type the threshold from its point of definition. Also, we need to keep track of the current threshold, so when we write \(\Gamma\), we mean a pair where \(\Gamma_1\) is a list and \(\Gamma_2\) is a number. The list consists of pairs of types and numbers, so for example, \(\Gamma_1[i]_1\) is a type and \(\Gamma_1[i]_2\) is a number whenever \(i\) is less than the length of \(\Gamma_1\). We use the following notation for extending the type environment:

\[\begin{aligned} \tau :: \Gamma &\equiv ((\tau,\Gamma_2){::}\Gamma_1, \Gamma_2) \\ * :: \Gamma & \equiv (\Gamma_1, \Gamma_2 + 1)\end{aligned}\]

We write \(\vdash \rho : \Gamma\) to say that environment \(\rho\) is well-typed according to \(\Gamma\) and define it inductively as follows.

\[\begin{gathered} \vdash ([],[]) : ([], 0) \quad \frac{\vdash \rho : \Gamma \quad v \in T[ \tau ] \rho_2} {\vdash v{::}\rho : \tau{::}\Gamma} \quad \frac{\vdash \rho : \Gamma} {\vdash V{::}\rho : *{::}\Gamma}\end{gathered}\]

The primary operation that we perform on a type environment is looking up the type associated with a term variable, for which we define the following function \(\mathsf{lookup}\) that maps a type environment and DeBruijn index to a type. To make sure that the resulting type is well-formed in the current environment, we must increase all of its free type variables by the difference of the current threshold \(\Gamma_2\) and the threshold at its point of definition, \(\Gamma_1[i]_2\), which is accomplished by the shift operator \(\uparrow^k_c(\tau)\) (Pierce 2002). \[\mathsf{lookup}(\Gamma,i) \equiv \begin{cases} \mathsf{some}(\uparrow^{k}_{0}(\Gamma_1[i]_1) & \text{if } n < |\Gamma_1| \\ & \text{where } k = \Gamma_2 - \Gamma_1[i]_2 \\ \mathsf{none} & \text{otherwise} \end{cases}\]

To review, the shift operator is defined as follows.

\[\begin{aligned} \uparrow^{k}_{c}(\mathtt{nat}) &= \mathtt{nat} \\ \uparrow^{k}_{c}(i) &= \begin{cases} i + k & \text{if } c \leq i \\ i & \text{otherwise} \end{cases} \\ \uparrow^{k}_{c}(\sigma \to \tau) &= \uparrow^{k}_{c}(\sigma) \to \uparrow^{k}_{c}(\tau) \\ \uparrow^{k}_{c}(\forall \tau) &= \forall\, \uparrow^{k}_{c+1}(\tau)\end{aligned}\]

Last but not least, we need to define type substitution so that we can use it in the typing rule for instantiation (type application). We write \([j\mapsto \tau]\sigma\) for the substitution of type \(\tau\) for DeBruijn index \(j\) within type \(\sigma\) (Pierce 2002).

\[\begin{aligned} [j\mapsto \tau]\mathtt{nat} &= \mathtt{nat} \\ [j\mapsto\tau]i &= \begin{cases} \tau & \text{if } j = i \\ i - 1 & \text{if } j < i \\ i & \text{otherwise} \end{cases}\\ [j\mapsto\tau](\sigma\to\sigma') &= [j\mapsto\tau]\sigma \to [j\mapsto \tau]\sigma' \\ [j\mapsto \tau]\forall\sigma &= \forall\, [j+1 \mapsto \uparrow^{1}_{0}(\tau)]\sigma\end{aligned}\]

Here is the type system for System F extended with \(\mathtt{fix}\).

\[\begin{gathered} \Gamma \vdash n : \mathtt{nat} \qquad \frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau} \qquad \frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau} \\[2ex] \frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e :: \forall\tau} \qquad \frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}\end{gathered}\]

We say that a type environment \(\Gamma\) is well-formed if \(\Gamma_2\) is greater or equal to every threshold in \(\Gamma_1\), that is \(\Gamma_1[i]_2 \leq \Gamma_2\) for all \(i < |\Gamma_1|\).

Proof of well-typed programs cannot go wrong

The proof required 6 little lemmas and 4 big lemmas. (There were some itsy bitsy lemmas too that I’m not counting.)

Little Lemmas

Lemma [\(\sqsubseteq\) is a preorder]  

  • \(v \sqsubseteq v\)

  • If \(v_1 \sqsubseteq v_2\) and \(v_2 \sqsubseteq v_3\), then \(v_1 \sqsubseteq v_3\).

[lem:less-refl] [lem:less-trans]

I proved transitivity by induction on \(v_2\).

Lemma [\(T\) is downward closed] If \(v \in T [ \tau ] \eta\) and \(v' \sqsubseteq v\), then \(v' \in T [ \tau ] \eta\). [lem:T-down-closed]

The above is a straightforward induction on \(\tau\)

Lemma [\(\mathsf{wrong}\) not in \(T\)] For any \(\tau\) and \(\eta\), \(\mathsf{wrong} \notin T [ \tau ] \eta\). [lem:wrong-not-in-T]

The above is another straightforward induction on \(\tau\)

Lemma If \(\vdash \rho : \Gamma\), then \(\Gamma\) is a well-formed type environment. [lem:wfenv-good-ctx]

The above is proved by induction on the derivation of \(\vdash \rho : \Gamma\).

Lemma \[T [ \tau ] (\eta_1 \eta_3) = T [ \uparrow^{|\eta_2|}_{ |\eta_1|}(\tau) ] (\eta_1\eta_2\eta_3)\]

The above lemma is proved by induction on \(\tau\). It took me a little while to figure out the right strengthening of the statement of this lemma to get the induction to go through. The motivations for this lemma were the following corollaries.

Corollary [Lift/Append Preserves \(T\)] \[T [ \tau ](\eta_2) = T [ \uparrow^{|\eta_1|}_{0}(\tau) ] (\eta_1\eta_2)\] [lem:lift-append-preserves-T]

Corollary[Lift/Cons Preserves \(T\)] \[T [ \tau ] (\eta) = T [ \uparrow^{1}_{0}(\tau) ] (V{::}\eta)\] [lem:shift-cons-preserves-T]

Of course, two shifts can be composed into a single shift by adding the amounts.

Lemma [Compose Shift] \[\uparrow^{j+k}_{c}(\tau) = \uparrow^{j}_{c}( \uparrow^{k}_{c}(\tau))\] [lem:compose-shift]

The proof is a straightforward induction on \(\tau\).

Big Lemmas

There are one or two big lemmas for each of the “features” in this variant of System F.

The first lemma shows that well-typed occurrences of term variables cannot go wrong.

Lemma [Lookup in Well-typed Environment] 
If \(\vdash \rho : \Gamma\) and \(\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)\), then \(\exists v.\, \rho_1[i] = v\) and \(v \in T [ \tau ] \rho_2\). [lem:lookup-wfenv]

The proof is by induction on the derivation of \(\vdash \rho : \Gamma\). The first two cases were straightforward but the third case required some work and used lemmas [lem:wfenv-good-ctx], [lem:shift-cons-preserves-T], and [lem:compose-shift].

Lemma [Application cannot go wrong] If \(V \subseteq T [ \sigma \to \tau ] \eta\) and \(V' \subseteq T [ \sigma ] \eta\), then \(\mathsf{apply}(V,V') \subseteq T [ \tau ] \eta\). [lem:fun-app]

The proof of this lemma is direct and does not use induction. However, it does use lemmas [lem:wrong-not-in-T] and [lem:T-down-closed].

Lemma [Compositionality] Let \(V = T [ \sigma ] (\eta_1\eta_2)\). \[T [ \tau ] (\eta_1 V \eta_2) = T [ \tau[\sigma/|\eta_1|] ] (\eta_1 \eta_2)\] [lem:compositionality]

I proved the Compositionality lemma by induction on \(\tau\). All of the cases were straightforward except for \(\tau=\forall\tau'\). In that case I used the induction hypothesis to show that \[T [ \tau' ] (V \eta_1 S \eta_2) = T [ ([|V\eta_1|\mapsto \uparrow^1_0(\sigma)] \tau' ] (V\eta_1\eta_2) \text{ where } S = T [ \uparrow^1_0(\sigma) ] (V\eta_1\eta_2)\] and I used Lemma [lem:shift-cons-preserves-T].

Lemma [Iterate cannot go wrong] If

  • \(\mathsf{iterate}(L,\rho,v)\) and

  • for any \(v'\), \(v' \in T[ \sigma\to\tau ] \rho_2\) implies \(L(v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2\),

then \(v \in T [ \sigma \to \tau ] \rho_2\). [lem:iterate-sound]

This was straightfroward to prove by induction on the derivation of \(\mathsf{iterate}(L,\rho,v)\). The slightly difficult part was coming up with the definition of \(\mathsf{iterate}\) to begin with and in formulating the second premise.

The Theorem

Theorem [Well-typed programs cannot go wrong] 
If \(\Gamma \vdash e : \tau\) and \(\vdash \rho : \Gamma\), then \(E [ e ] \rho \subseteq T[ \tau ] \rho_2\). [thm:welltyped-dont-go-wrong]

The proof is by induction on the derivation of \(\Gamma \vdash e : \tau\).

  • \(\Gamma \vdash n : \mathtt{nat}\)

    This case is immediate.

  • \(\frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau}\)

    Lemma [lem:lookup-wfenv] tells us that \(\rho_1[i] = v\) and \(v \in T [ \tau ] \rho_2\) for some \(v\). We conclude by Lemma [lem:T-down-closed].

  • \(\frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau}\)

    After unraveling some definitions, for arbitrary \(f,v_1,v_2,v'_2\) we can assume \(v_1 \in T [ \sigma ] \rho_2\), \(v_2 \in E [ e ](v_1{::}\rho)\), and \(v'_2 \sqsubseteq v_2\). We need to prove that \(v_2 \in T [ \tau ] (v_1{::}\rho)_2\).

    We can show \(\vdash v_1{::}\rho : \sigma{::}\Gamma\) and therefore, by the induction hypothesis, \(E [ e ] (v_1{::}\rho) \subseteq T [ \tau ] (v_1{::}\rho)_2\). So we conclude that \(v_2 \in T [ \tau ] (v_1{::}\rho)_2\).

  • \(\frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau}\)

    By the induction hypothesis, we have \(E [ e ] \rho \subseteq T [ \sigma\to\tau ] \rho_2\) and \(E [ e' ] \rho \subseteq T [ \sigma ] \rho_2\). We conclude by Lemma [lem:fun-app].

  • \(\frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau}\)

    For an arbitrary \(v\), we may assume \(\mathsf{iterate}(E[ e ], \rho, v)\) and need to show that \(v \in T [ \sigma\to\tau ]\rho_2\).

    In preparation to apply Lemma [lem:iterate-sound], we first prove that for any \(v'\), \(v' \in T[ \sigma\to\tau ] \rho_2\) implies \(E[ e](v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2\). Assume \(v'' \in E[ e](v'{::}\rho)\). We need to show that \(v'' \in T[ \sigma\to\tau ] \rho_2\). We have \(\vdash v'{::}\rho : (\sigma\to\tau){::}\Gamma\), so by the induction hypothesis \(E [ e ](v'{::}\rho) \subseteq T[ \sigma\to\tau ](v'{::}\rho)\). From this we conclude that \(v'' \in T[ \sigma\to\tau ] \rho_2\).

    We then apply Lemma [lem:iterate-sound] to conclude this case.

  • \(\frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e :: \forall\tau}\)

    After unraveling some definitions, for an arbitrary \(v'\) and \(V\) we may assume that \(\forall V'.\, v' \in E [ e ](V{::}\rho)\). We need to show that \(v' \in T [ \tau ] (V{::}\rho_2)\). We have \(\vdash V{::}\rho : *{::}\Gamma\), so by the induction hypothesis \(E[ e ](V{::}\rho) \subseteq T [ \tau ] (V{::}\rho)_2\). Also, from the assumption we have \(v' \in E [ e ](V{::}\rho)\), so we can conclude.

  • \(\frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}\)

    Fix a \(v' \in E [ e ] \rho\). We have three cases to consider.

    1. \(v'=\mathsf{abs}(\mathsf{none})\). This case is immediate.

    2. \(v'=\mathsf{abs}(\mathsf{some}(v''))\) for some \(v''\). By the induction hypothesis, \(v' \in T [ \forall\tau ]\rho_2\). So we have \(v'' \in T [ \tau ](V{::}\rho_2)\) where \(V=T[\sigma]\rho_2\). Then by Compositionality (Lemma [lem:compositionality]) we conclude that \(v'' \in T [ [0\mapsto \sigma]\tau]\rho_2\).

    3. \(v'\) is some other kind of value. This can’t happen because, by the induction hypothesis, \(v' \in T [ \forall\tau ]\rho_2\).


Milner, Robin. 1978. “A Theory of Type Polymorphism in Programming.” Journal of Computer and System Sciences 17 (3): 348–75.

Pierce, Benjamin C. 2002. Types and Programming Languages. MIT Press.

Reynolds, John C. 1974. “Towards a Theory of Type Structure.” In Programming Symposium: Proceedings, Colloque Sur La Programmation, 19:408–25. LNCS. Springer-Verlag.

Friday, March 24, 2017

Consolidation of the Denotational Semantics and an Application to Compiler Correctness

This is a two part post. The second part depends on the first.

Part 1. Consolidation of the Denotational Semantics

As a matter of expediency, I've been working with two different versions of the intersection type system upon which the denotational semantics is based, one version with subsumption and one without. I had used the one with subsumption to prove completeness with respect to the reduction semantics whereas I had used the one without subsumption to prove soundness (for both whole programs and parts of programs, that is, contextual equivalence). The two versions of the intersection type system are equivalent. However, it would be nice to simplify the story and just have one version. Also, while the correspondence to intersection types has been enormously helpful in working out the theory, it would be nice to have a presentation of the semantics that doesn't talk about them and instead talks about functions as tables.

Towards these goals, I went back to the proof of completeness with respect to the reduction semantics and swapped in the "take 3" semantics. While working on that I realized that the subsumption rule was almost admissible in the "take 3" semantics, just the variable and application equations needed more uses of \(\sqsubseteq\). With those changes in place, the proof of completeness went through without a hitch. So here's the updated definition of the denotational semantics of the untyped lambda calculus.

The definition of values remains the same as last time: \[ \begin{array}{lrcl} \text{function tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] as does the \(\sqsubseteq\) operator. \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} For the denotation function \(E\), we add uses of \(\sqsubseteq\) to the equations for variables (\(v \sqsubseteq \rho(x)\)) and function application (\(v_3 \sqsubseteq v_3'\)). (I've also added the conditional expression \(\mathbf{if}\,e_1\,e_2\,e_3\) and primitive operations on numbers \(f(e_1,e_2)\), where \(f\) ranges over binary functions on numbers.) \begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ v \mid v \sqsubseteq \rho(x) \} \\ E[\!| \lambda x.\, e |\!](\rho) &= \left\{ T \middle| \begin{array}{l} \forall v_1 v_2'. \, v_1\mapsto v_2' \in T \Rightarrow\\ \exists v_2.\, v_2 \in E[\!| e |\!](\rho(x{:=}v_1)) \land v_2' \sqsubseteq v_2 \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists T v_2 v_2' v_3'.\, T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, v'_2\mapsto v_3' \in T \land v'_2 \sqsubseteq v_2 \land v_3 \sqsubseteq v_3' \end{array} \right\} \\ E[\!| f(e_1, e_2) |\!](\rho) &= \{ f(n_1,n_2) \mid \exists n_1 n_2.\, n_1 \in E[\!| e_1 |\!](\rho) \land n_2 \in E[\!| e_2 |\!](\rho) \} \\ E[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](\rho) &= \left\{ v \, \middle| \begin{array}{l} v \in E[\!| e_2 |\!](\rho) \quad \text{if } n \neq 0 \\ v \in E[\!| e_3 |\!](\rho) \quad \text{if } n = 0 \end{array} \right\} \end{align*}

Here are the highlights of the results for this definition.

Proposition (Admissibility of Subsumption)
If \(v \in E[\!| e |\!] \) and \(v' \sqsubseteq v\), then \(v' \in E[\!| e |\!] \).

Theorem (Reduction implies Denotational Equality)

  1. If \(e \longrightarrow e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\).
  2. If \(e \longrightarrow^{*} e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\).

Theorem (Whole-program Soundness and Completeness)

  1. If \(v' \in E[\!| e |\!](\emptyset)\), then \(e \longrightarrow^{*} v\) and \(v' \in E[\!| v |\!](\emptyset)\).
  2. If \(e \longrightarrow^{*} v\), then \(v' \in E[\!| e |\!](\emptyset) \) and \(v' \in E[\!| v |\!](\emptyset) \) for some \(v'\).

Proposition (Denotational Equality is a Congruence)
For any context \(C\), if \(E[\!| e |\!] = E[\!| e' |\!]\), then \(E[\!| C[e] |\!] = E[\!| C[e'] |\!]\).

Theorem (Soundness wrt. Contextual Equivalence)
If \(E[\!| e |\!] = E[\!| e' |\!]\), then \(e \simeq e'\).

Part 2. An Application to Compiler Correctness

Towards finding out how useful this denotational semantics is, I've begun looking at using it to prove compiler correctness. I'm not sure exactly which compiler I want to target yet, but as a first step, I wrote a simple source-to-source optimizer \(\mathcal{O}\) for the lambda calculus. It performs inlining and constant folding and simplifies conditionals. The optimizer is parameterized over the inlining depth to ensure termination. We perform optimization on the body of a function after inlining, so this is a polyvariant optimizer. Here's the definition. \begin{align*} \mathcal{O}[\!| x |\!](k) &= x \\ \mathcal{O}[\!| n |\!](k) &= n \\ \mathcal{O}[\!| \lambda x.\, e |\!](k) &= \lambda x.\, \mathcal{O}[\!| e |\!](k) \\ \mathcal{O}[\!| e_1\,e_2 |\!](k) &= \begin{array}{l} \begin{cases} \mathcal{O}[\!| [x{:=}e_2'] e |\!] (k{-}1) & \text{if } k \geq 1 \text{ and } e_1' = \lambda x.\, e \\ & \text{and } e_2' \text{ is a value} \\ e_1' \, e_2' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| f(e_1,e_2) |\!](k) &= \begin{array}{l} \begin{cases} f(n_1,n_2) & \text{if } e_1' = n_1 \text{ and } e_2' = n_2 \\ f(e_1',e_2') & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](k) &= \begin{array}{l} \begin{cases} e_2' & \text{if } e_1' = n \text{ and } n \neq 0 \\ e_3' & \text{if } e_1' = n \text{ and } n = 0 \\ \mathbf{if}\,e_1'\, e_2'\,e_3'|\!](k) & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k)\\ \text{ and } e_3' = \mathcal{O}[\!|e_3 |\!](k) \end{array} \end{align*}

I've proved that this optimizer is correct. The first step was proving that it preserves denotational equality.

Lemma (Optimizer Preserves Denotations)
\(E(\mathcal{O}[\!| e|\!](k)) = E[\!|e|\!] \)
The proof is by induction on the termination metric for \(\mathcal{O}\), which is the lexicographic ordering of \(k\) then the size of \(e\). All the cases are straightforward to prove because Reduction implies Denotational Equality and because Denotational Equality is a Congruence. QED

Theorem (Correctness of the Optimizer)
\(\mathcal{O}[\!| e|\!](k) \simeq e\)
The proof is a direct result of the above Lemma and Soundness wrt. Contextual Equivalence. QED

Of course, all of this is proved in Isabelle. Here is the tar ball. I was surprised that this proof of correctness for the optimizer was about the same length as the definition of the optimizer!

Friday, March 10, 2017

The Take 3 Semantics, Revisited

In my post about intersection types as denotations, I conjectured that the simple "take 3" denotational semantics is equivalent to an intersection type system. I haven't settled that question per se, but I've done something just as good, which is to show that everything that I've done with the intersection type system can also be done with the "take 3" semantics (with a minor modification).

Recall that the main difference between the "take 3" semantics and the intersection type system is how subsumption of functions is handled. The "take 3" semantics defined function application as follows, using the subset operator \(\sqsubseteq\) to require the argument \(v_2\) to include all the entries in the parameter \(v'_2\), while allowing \(v_2\) to have possibly more entries. \begin{align*} E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists v_1 v_2 v'_2.\, v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, \{ v'_2\mapsto v_3 \} \sqsubseteq v_1 \land v'_2 \sqsubseteq v_2 \end{array} \right\} \end{align*} Values are either numbers or functions. Functions are represented as a finite tables mapping values to values. \[ \begin{array}{lrcl} \text{tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] and \(\sqsubseteq\) is defined as equality on numbers and subset for function tables: \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} Recall that \(\subseteq\) is defined in terms of equality on elements.

In an intersection type system (without subsumption), function application uses subtyping. Here's one way to formulate the typing rule for application: \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A' \to B \quad A <: A'} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Types are defined as follows \[ \begin{array}{lrcl} \text{types} & A,B,C & ::= & n \mid A \to B \mid A \land B \mid \top \end{array} \] and the subtyping relation is given below. \begin{gather*} \frac{}{n <: n}(a) \quad \frac{}{\top <: \top}(b) \quad \frac{}{A \to B <: \top}(c) \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(d) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B}(e) \quad \frac{}{A \wedge B <: A}(f) \quad \frac{}{A \wedge B <: B}(g) \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)}(h) \end{gather*} Recall that values and types are isomorphic (and dual) to eachother in this setting. Here's the functions \(\mathcal{T}\) and \(\mathcal{V}\) that map back and forth between values and types. \begin{align*} \mathcal{T}(n) &= n \\ \mathcal{T}( \{ v_1 \mapsto v'_1, \ldots, v_n \mapsto v'_n \} ) &= \mathcal{T}(v_1) {\to} \mathcal{T}(v'_1) \land \cdots \land \mathcal{T}(v_n) {\to} \mathcal{T}(v'_n) \\[2ex] \mathcal{V}(n) &= n \\ \mathcal{V}(A \to B) &= \{ \mathcal{V}(A)\mapsto\mathcal{V}(B) \} \\ \mathcal{V}(A \land B) &= \mathcal{V}(A) \cup \mathcal{V}(B)\\ \mathcal{V}(\top) &= \emptyset \end{align*}

Given that values and types are really the same, the the typing rule for application is almost the same as the equation for the denotation of \(E[\!| e_1\;e_2 |\!](\rho)\). The only real difference is the use of \(<:\) versus \(\sqsubseteq\). However, subtyping is a larger relation than \(\sqsubseteq\), i.e., \(v_1 \sqsubseteq v_2\) implies \(\mathcal{T}(v_1) <: \mathcal{T}(v_2)\) but it is not the case that \(A <: B\) implies \(\mathcal{V}(A) \sqsubseteq \mathcal{V}(B)\). Subtyping is larger because of rules \((d)\) and \((h)\). The other rules just express the dual of \(\subseteq\).

So the natural question is whether subtyping needs to be bigger than \(\sqsubseteq\), or would we get by with just \(\sqsubseteq\)? In my last post, I mentioned that rule \((h)\) was not necessary. Indeed, I removed it from the Isabelle formalization without disturbing the proofs of whole-program soundness and completeness wrt. operational semantics, and was able to carry on and prove soundness wrt. contextual equivalence. This morning I also replaced rule \((d)\) with a rule that only allows equal function types to be subtypes. \[ \frac{}{A \to B <: A \to B}(d') \] The proofs went through again! Though I did have to make two minor changes in the type system without subsumption to ensure that it stays equivalent to the version of the type system with subsumption. I used the rule given above for function application instead of \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A \to B} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Also, I had to change the typing rule for \(\lambda\) to use subtyping to relate the body's type to the return type. \[ \frac{\Gamma,x:A \vdash e : B' \qquad B' <: B} {\Gamma \vdash \lambda x.\, e : A \to B} \] Transposing this back into the land of denotational semantics and values, we get the following equation for the meaning of \(\lambda\), in which everything in the return specification \(v_2\) must be contained in the value \(v'_2\) produced by the body. \[ E[\!| \lambda x.\; e |\!] (\rho) = \left\{ v \middle| \begin{array}{l}\forall v_1 v_2. \{v_1\mapsto v_2\} \sqsubseteq v \implies \\ \exists v_2'.\; v'_2 \in E[\!| e |\!] (\rho(x{:=}v_1)) \,\land\, v_2 \sqsubseteq v'_2 \end{array} \right\} \]

So with this little change, the "take 3" semantics is a great semantics for the call-by-value untyped lambda calculus! For whole programs, it's sound and complete with respect to the standard operational semantics, and it is also sound with respect to contextual equivalence.