Sunday, October 15, 2017

New revision of the semantics paper (POPL rejection, ESOP submission)

My submission about declarative semantics to POPL was rejected. It's been a few weeks now, so I'm not so angry about it anymore. I've revised the paper and will be submitting it to ESOP this week.

The main reason for rejection according to the reviewers was a lack of technical novelty, but I think the real reasons were that 1) the paper came across as too grandiose and as a result, it accidentally annoyed the reviewer who is an expert in denotational semantics, and 2) the paper did not do a good job of comparing to the related set-theoretic models of Plotkin and Engeler.

Regarding 1), in the paper I use the term "declarative semantics" to try and distance this new semantics from the standard lattice-based denotational semantics. However, the reviewer took it to claim that the new semantics is not a denotational semantics, which is clearly false. In the new version of the paper I've removed the term "declarative semantics" and instead refer to the new semantics as a denotational semantics of the "elementary" variety. Also, I've toned down the sales pitch to better acknowledge that this new semantics is not the first elementary denotational semantics.

Regarding 2), I've revised the paper to include a new section at the beginning that gives background on the elementary semantics of Plotkin, Engeler, and Coppo et al. This should help put the contributions of the paper in context.

Other than that, I've added a section with a counter example to full abstraction. A big thanks to the POPL reviewers for the counter example! (Also thanks to Max New, who sent me the counter example a couple months ago.)

Unfortunately, the ESOP page limit is a bit shorter, so I removed the relational version of the semantics and also the part about mutable references.

A draft of the revision is available on arXiv. Feedback is most welcome, especially from experts in denotational semantics! I really hope that this version is no longer annoying, but if it is, please tell me!

Tuesday, October 03, 2017

Comparing to Plotkin and Engeler's Set-theoretic Models of the Lambda Calculus

On the plane ride back from ICFP last month I had a chance to re-read and better understand Plotkin’s Set-theoretical and other elementary models of the \(\lambda\)-calculus (Technical Report 1972, Theoretical Computer Science 1993) and to read, for the first time, Engeler’s Algebras and combinators (Algebra Universalis 1981). As I wrote in my draft paper Declarative semantics for functional languages: compositional, extensional, and elementary, the main intuitions behind my simple semantics are present in these earlier papers, but until now I did not understand these other semantics deeply enough to give a crisp explanation of the similarities and differences. (The main intuitions are also present in the early work on intersection type systems, and my semantics is more closely related to those systems. A detailed explanation of that relationship is given in the draft paper.)

I should note that Engeler’s work was in the context of combinators (S and K), not the \(\lambda\)-calculus, but of course the \(\lambda\)-calculus can be encoded into combinators. I’ve ported his definitions to the \(\lambda\)-calculus, along the lines suggested by Plotkin (1993), to make for easier comparison. In addition, I’ll extend both Engeler and Plotkin’s semantics to include integers and integer arithmetic in addition to the \(\lambda\)-calculus. Here’s the syntax for the \(\lambda\)-calculus that we consider here: \[\begin{array}{rcl} && n \in \mathbb{Z} \qquad x \in \mathbb{X} \;\;\text{(program variables)}\\ \oplus & ::= & + \mid - \mid \times \mid \div \\ \mathbb{E} \ni e & ::= & n \mid e \oplus e \mid x \mid {\lambda x.\,} e \mid e \; e \mid {\textbf{if}\,}e {\,\textbf{then}\,}e {\,\textbf{else}\,}e \end{array}\]

Values

Perhaps the best place to start the comparison is in the definition of what I’ll call values. All three semantics give an inductive definition of values and all three involve finite sets, but in different ways. I’ll write \(\mathbb{V}_S\) for my definition, \(\mathbb{V}_P\) for Plotkin’s, and \(\mathbb{V}_E\) for Engeler’s. \[\begin{aligned} \mathbb{V}_S &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_S \times \mathbb{V}_S) \\ \mathbb{V}_P &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_P) \times \mathcal{P}_f(\mathbb{V}_P) \\ \mathbb{V}_E &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_E) \times \mathbb{V}_E\end{aligned}\] In \(\mathbb{V}_S\), a function is represented as a finite graph, that is, a finite set of input-output pairs. For example, the graph \(\{ (0,1), (1,2), (2,3) \}\) is one of the meanings for the term \((\lambda x.\, x + 1)\).

Plotkin’s values \(\mathbb{V}_P\) include only a single input-output pair from a function’s graph. For example, \((\{0\}, \{1\})\) is one of the meanings for the term \((\lambda x.\, x + 1)\). Engeler’s values also include just a single entry. For example, \((\{0\}, 1)\) is one of the meanings for the term \((\lambda x.\, x + 1)\). In this example we have not made use of the finite sets in the input and output of Plotkin’s values. To do so, let us consider a higher-order example, such as the term \((\lambda f.\, f\,1 + f\,2)\). For Plotkin, the following value is one of its meanings: \[(\{ (\{1\}, \{3\}), (\{2\}, \{4\}) \}, \{7\})\] That is, in case \(f\) is the function that adds \(2\) to its input, the result is \(7\). We see that the presence of finite sets in the input is needed to accomodate functions-as-input. The corresponding value in \(\mathbb{V}_S\) is \[\{ (\{ (1, 3), (2, 4) \}, 7) \}\]

The difference between Plotkin and Engeler’s values can be seen in functions that return functions. Consider the \(K\) combinator \((\lambda x.\,\lambda y.\, x)\). For Plotkin, the following value is one of its meanings: \[(\{1\}, \{ (\{0\},\{1\}), (\{2\},\{1\}) \})\] That is, when applied to \(1\) it returns a function that returns \(1\) when applied to either \(0\) or \(2\). The corresponding value in \(\mathbb{V}_S\) is \[\{ (1, \{ (0,1), (2,1) \}) \}\] For Engeler, there is not a single value corresponding to the above value. Instead it requires two values to represent the same information. \[(\{1\}, (\{0\},1)) \quad\text{and}\quad (\{1\}, (\{2\},1))\] We’ll see later that it doesn’t matter that Engeler requires more values to represent the same information.

The Domains

The semantics of Plotkin, Engeler, and myself does not use values for the domain, but instead a set of values. That is \[\mathcal{P}(\mathbb{V}_S) \qquad \mathcal{P}(\mathbb{V}_P) \qquad \mathcal{P}(\mathbb{V}_E)\]

The role of the outer \(\mathcal{P}\) is intimately tied to the meaning of functions in Plotkin and Engeler’s semantics because the values themselves only record a single input-output pair. The outer \(\mathcal{P}\) is needed to represent all of the input-output pairs for a given function. While the \(\mathcal{P}\) is also necessary for functions in my semantics, one can view it generically as providing non-determinism and therefore somewhat orthogonal to the meaning of functions per se. Next let’s take a look at the semantics.

Comparing the Semantics

Here is Plotkin’s semantics \(\mathcal{E}_P\). Let \(V,V'\) range over finite sets of values. \[\begin{aligned} {\mathcal{E}_P[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_P[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_P[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_P[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,V') \mid V' \subseteq {\mathcal{E}_P[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_P[\![ e_1\;e_2 ]\!]}\rho &= \bigcup \left\{ V' \, \middle| \begin{array}{l} \exists V.\, (V,V') {\in} {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_P[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_P[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_P[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] For Plotkin, the environment \(\rho\) maps variables to finite sets of values. In the case for application, the input set \(V\) must be a subset of the meaning of the argument, which is critical for enabling self application and, using the \(Y\) combinator, general recursion. The \(\bigcup\) flattens the set-of-finite-sets into a set.

Next we consider Engeler’s semantics \(\mathcal{E}_E\). \[\begin{aligned} {\mathcal{E}_E[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_E[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_E[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_E[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,v') \mid v' \in {\mathcal{E}_E[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_E[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v' \, \middle| \begin{array}{l} \exists V.\, (V,v') {\in} {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_E[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_E[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_E[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] The semantics is quite similar to Plotkin’s, as again we see the use of \(\subseteq\) in the case for application. Because the output \(v'\) is just a value, and not a finite set of values as for Plotkin, there is no need for the \(\bigcup\).

Finally we review my semantics \(\mathcal{E}_S\). For it we need to define an ordering on values that is just equality for integers and \(\subseteq\) on function graphs. Let \(t\) range over \(\mathcal{P}_{f}(\mathbb{V} \times \mathbb{V})\). \[\frac{}{n \sqsubseteq n} \qquad \frac{t_1 \subseteq t_2}{t_1 \sqsubseteq t_2}\] Then we define \(\mathcal{E}_S\) as follows. \[\begin{aligned} {\mathcal{E}_S[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_S[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_S[\![ x ]\!]}\rho &= \{ v \mid v \sqsubseteq \rho(x) \} \\ {\mathcal{E}_S[\![ {\lambda x.\,} e ]\!]}\rho &= \{ t \mid \forall (v,v')\in t.\, v' \in {\mathcal{E}_S[\![ e ]\!]}\rho(x{:=}v) \} \\ {\mathcal{E}_S[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v \, \middle| \begin{array}{l} \exists t\, v_2\, v_3\, v_3'.\, t {\in} {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land v_2 {\in} {\mathcal{E}_S[\![ e_2 ]\!]}\rho \\ \land\, (v_3, v_3') \in t \land v_3 \sqsubseteq v_2 \land v \sqsubseteq v_3' \end{array} \right\} \\ {\mathcal{E}_S[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_S[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] In my semantics, \(\rho\) maps a variable to a single value. The \(v_3 \sqsubseteq v_2\) in my semantics corresponds to the uses of \(\subseteq\) in Plotkin and Engeler’s. One can view this as a kind of subsumption, allowing the use of a larger approximation of a function in places where a smaller approximation is needed. I’m not sure whether all the other uses of \(\sqsubseteq\) are necessary, but the semantics needs to be downward closed, and the above placement of \(\sqsubseteq\)’s makes this easy to prove.

Relational Semantics

For people like myself with a background in operational semantics, there is another view of the semantics that is helpful to look at. We can turn the above dentoational semantics into a relational semantics (like a big-step semantics) that hides the \(\mathcal{P}\) by making use of the following isomorphism (where \(\mathbb{V}\) is one of \(\mathbb{V}_S\), \(\mathbb{V}_P\), or \(\mathbb{V}_E\)). \[\mathbb{E} \to (\mathbb{X}\rightharpoonup \mathbb{V}) \to {\mathcal{P}(\mathbb{V})} \quad\cong\quad \mathbb{E} \times (\mathbb{X}\rightharpoonup \mathbb{V}) \times \mathbb{V}\] Let \(v\) range over \(\mathbb{V}\). We can define the semantic relation \(\rho \vdash_S e \Rightarrow v\) that corresponds to \(\mathcal{E}_S\) as follows. Note that in the rule for lambda abstraction, the table \(t\) comes out of thin air (it is existentially quantified), and that there is one premise in the rule per entry in the table, that is, we have the quantification \(\forall(v,v') \in t\). \[\begin{gathered} \frac{}{\rho \vdash_S n \Rightarrow n} \quad \frac {\rho \vdash_S e_1 \Rightarrow n_1 \quad \rho \vdash_S e_2 \Rightarrow n_2} {\rho \vdash_S e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \sqsubseteq \rho(x)} {\rho \vdash_S x \Rightarrow v} \\[3ex] \frac{\forall (v,v'){\in} t.\; \rho(x{:=}v) \vdash_S e \Rightarrow v'} {\rho \vdash_S {\lambda x.\,}e \Rightarrow t} \quad \frac{\begin{array}{c}\rho \vdash_S e_1 \Rightarrow t \quad \rho \vdash_S e_2 \Rightarrow v_2 \\ (v_3,v'_3) \in t \quad v_3 \sqsubseteq v_2 \quad v \sqsubseteq v'_3 \end{array} } {\rho \vdash_S (e_1{\;}e_2) \Rightarrow v} \\[3ex] \frac{\rho \vdash_S e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_S e_2 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_S e_1 \Rightarrow 0 \quad \rho \vdash_S e_3 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]

For comparison, let us also turn Plotkin’s semantics into a relation. \[\begin{gathered} \frac{}{\rho \vdash_P n \Rightarrow n} \quad \frac {\rho \vdash_P e_1 \Rightarrow n_1 \quad \rho \vdash_P e_2 \Rightarrow n_2} {\rho \vdash_P e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \in \rho(x)} {\rho \vdash_P x \Rightarrow v} \\[3ex] \frac{\forall v' \in V'.\, \rho(x{:=}V) \vdash_P e \Rightarrow v'} {\rho \vdash_P {\lambda x.\,}e \Rightarrow (V,V')} \quad \frac{\begin{array}{c}\rho \vdash_P e_1 \Rightarrow (V,V') \quad \forall v_2 \in V.\, \rho \vdash_P e_2 \Rightarrow v_2 \\ v' \in V' \end{array} } {\rho \vdash_P (e_1{\;}e_2) \Rightarrow v'} \\[3ex] \frac{\rho \vdash_P e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_P e_2 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_P e_1 \Rightarrow 0 \quad \rho \vdash_P e_3 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\] Recall that in Plotkin’s semantics, the environment maps variables to finite sets of values. The “set” is needed to handle the case of a function bound to a variable, but is just extra baggage when we have an integer bound to a variable. So in the variable rule we have \(v \in \rho(x)\), which either extracts a singleton integer from \(\rho(x)\), or extracts one input-output entry from a function’s graph. Moving on to the lambda rule, it only produces one input-output entry, but to handle the case when the output \(V'\) is representing a function, we must build it up one entry at a time with the quantification \(\forall v'\in V'\) and a finite but arbitrary number of premises. In the application rule we again have a finite number of premises, with \(\forall v_2\in V\), and also the premise \(v' \in V'\).

The relational version of Engeler’s semantics removes the need for quantification in the lambda rule, but the application rule still has \(\forall v_2 \in V\). \[\begin{gathered} \frac{\rho(x{:=}V) \vdash_E e \Rightarrow v'} {\rho \vdash_E {\lambda x.\,}e \Rightarrow (V,v')} \quad \frac{\begin{array}{c}\rho \vdash_E e_1 \Rightarrow (V,v') \quad \forall v_2 \in V.\, \rho \vdash_E e_2 \Rightarrow v_2 \end{array} } {\rho \vdash_E (e_1{\;}e_2) \Rightarrow v'}\end{gathered}\]

Conclusion

My semantics is similar to Plotkin and Engeler’s in that

  1. The domain is a set of values, and values are inductively defined and involve finite sets.

  2. Self application is enabled by allowing a kind of subsumption on functions.

The really nice thing about all three semantics is that they are simple; very little mathematics is necessary to understand them, which is important pedagogically, practically (easier for practitioners to apply such semantics), and aesthetically (Occam’s razor!).

My semantics is different to Plotkin and Engeler’s in that

  1. the definition of values places \(\mathcal{P}_f\) so that functions are literally represented by finite graphs, and

  2. environments map each variable to a single value, and

  3. \(\sqsubseteq\) is used instead of \(\subseteq\) to enable self application.

The upshot of these (relatively minor) differences is that my semantics may be easier to understand.