Wednesday, September 12, 2018

Reading list for getting started on Gradual Typing

Which papers would I recommend for getting started on understanding the research on gradual typing? That's a hard question because there are a lot of papers to choose from and, as research papers, their primary goal was not to give a good introduction, but instead to describe some scientific contribution. I really ought to write a proper introduction, but in the meantime, here's my choice of a few papers to get started.
  1. Refined Criteria for Gradual Typing
    This paper does a decent job of surveying research related to gradual typing and situating it with respect to other areas of research in programming languages and type systems. The paper includes a modern and, what I would deem canonical, specification of the Gradually Typed Lambda Calculus (GTLC). Finally, the paper gives formal criteria for what it means for a language to be gradually typed, including the gradual guarantee.
  2. Blame and Coercion: Together Again for the First Time (alternative location)
    The runtime semantics of a gradually typed language is typically given in two parts: 1) a translation to a cast calculus and 2) an operational semantics for the cast calculus. Nowadays, I recommend using coercions to express casts because they help to constrain the design space in a good way, they are easily extended to handle blame tracking, and they can be compressed to ensure space efficiency (time too!). This paper defines an easy-to-understand coercion calculus \(\lambda C\) and a space-efficient calculus \(\lambda S\), proves that they are equivalent to the standard cast calculus \(\lambda B\), and also reviews the blame safety theorem.
  3. Abstracting Gradual Typing (alternative location)
    This paper presents a general framework based on abstract interpretation for understanding gradual typing and for extending gradual typing to handle languages that make use of other predicates on types, such as subtyping. The framework provides guidance for how to define the consistency relation and for how to derive an operational semantics.
After reading the above papers, there's plenty more to enjoy! See the bibliography maintained by Sam Tobin-Hochstadt.

Thursday, August 09, 2018

Intersection Types, Sub-formula Property, and the Functional Character of the Lambda Calculus

Last December I proved that my graph model of the lambda calculus, once suitably restricted, is deterministic. That is, I defined a notion of consistency between values, written \(v_1 \sim v_2\), and showed that any two outputs of the same program are consistent.
Theorem (Determinism)
If \(v \in {\mathcal{E}{[\![ e ]\!]}}\rho\), \(v' \in {\mathcal{E}{[\![ e ]\!]}}\rho'\), and \(\rho \sim \rho'\), then \(v \sim v'\).
Recall that values are integers or finite relations; consistency for integers is equality and consistency for relations means mapping consistent inputs to consistent outputs. I then restricted values to be well formed, meaning that they must be consistent with themselves (and similarly for their parts).

Having proved the Determinism Theorem, I thought it would be straightforward to prove the following related theorem about the join of two values.
Theorem (Join)
If \(v \in {\mathcal{E}{[\![ e ]\!]}}\rho\), \(v' \in {\mathcal{E}{[\![ e ]\!]}}\rho'\), \(\rho\) is well formed, \(\rho'\) is well formed, and \(\rho \sim \rho'\),
then \(v \sqcup v' \in {\mathcal{E}{[\![ e ]\!]}}(\rho\sqcup\rho')\).
I am particularly interested in this theorem because \(\beta\)-equality can be obtained as a corollary. \[{\mathcal{E}{[\![ ({\lambda x.\,}e){\;}e' ]\!]}}\rho = {\mathcal{E}{[\![ [x{:=}e']e ]\!]}}\rho\] This would enable the modeling of the call-by-name \(\lambda\)-calculus and it would also enable the use of \(\beta\)-equality in a call-by-value setting when \(e'\) is terminating (instead of restricting \(e'\) to be a syntactic value).

Recall that we have defined a partial order \(\sqsubseteq\) on values, and that, in most partial orders, there is a close connection between notions of consistency and least upper bounds (joins). One typically has that \(v \sim v'\) iff \(v \sqcup v'\) exists. So my thinking was that it should be easy to adapt my proof of the Determinism Theorem to prove the Join Theorem, and I set out hoping to finish in a couple weeks. Hah! Here we are 8 months later and the proof is complete; it was a long journey that ended up depending on a result of Olivier Laurent concerning intersection types, the sub-formula property, and cut elimination, published just this summer. In this blog post I’ll try to recount the journey and describe the proof, hopefully remembering the challenges and motivations. Here is a tar ball of the mechanization in Isabelle and in pdf form.

Many of the challenges revolved around the definitions of \(\sqsubseteq\), consistency, and \(\sqcup\). Given that I already had definitions for \(\sqsubseteq\) and consistency, the obvious thing to try was to define \(\sqcup\) such that it would be the least upper bound of \(\sqsubseteq\). So I arrived at this partial function: \[\begin{aligned} n \sqcup n &= n \\ f_1 \sqcup f_2 &= f_1 \cup f_2\end{aligned}\] Now suppose we prove the Join Theorem by induction on \(e\) and consider the case for application: \(e = (e_1 {\;}e_2)\). From \(v \in {\mathcal{E}{[\![ e_1 {\;}e_2 ]\!]}}\rho\) and \(v' \in {\mathcal{E}{[\![ e_1 {\;}e_2 ]\!]}}\rho'\) we have

  • \(f \in {\mathcal{E}{[\![ e_1 ]\!]}}\rho\), \(v_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\rho\), \(v_3 \mapsto v_4 \in f\), \(v_3 \sqsubseteq v_2\), and \(v \sqsubseteq v_4\) for some \(f, v_2, v_3, v_4\).

  • \(f' \in {\mathcal{E}{[\![ e_1 ]\!]}}\rho'\), \(v'_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\rho'\), \(v'_3 \mapsto v'_4 \in f'\), \(v'_3 \sqsubseteq v'_2\), and \(v' \sqsubseteq v'_4\) for some \(f', v'_2, v'_3, v'_4\).

By the induction hypothesis we have \(f \sqcup f' \in {\mathcal{E}{[\![ e_1 ]\!]}}(\rho\sqcup\rho')\) and \(v_2 \sqcup v'_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}(\rho\sqcup\rho')\). We need to show that, for some \(v''_3\) and \(v''_4\), \[v''_3 \mapsto v''_4 \in f \sqcup f' \qquad v''_3 \sqsubseteq v_2 \sqcup v'_2 \qquad v \sqcup v' \sqsubseteq v''_4\] But here we have a problem. Given our definition of \(\sqcup\) in terms of set union, there won’t necessarily be a single entry in \(f \sqcup f'\) that combines the information from both \(v_3 \mapsto v_4\) and \(v'_3 \mapsto v'_4\). After all, \(f \sqcup f'\) contains all the entries of \(f\) and all the entries of \(f'\), but the set union operation does not mix together information from entries in \(f\) and \(f'\) to form new entries.

Intersection Types to the Rescue

At this point I started thinking that my definitions of \(\sqsubseteq\), consistency, and \(\sqcup\) were too simple, and that I needed to incorporate ideas from the literature on filter models and intersection types. As I’ve written about previously, my graph model corresponds to a particular intersection type system, and perhaps a different intersection type system would do the job. Recall that the correspondence goes as follows: values correspond to types, \(\sqsubseteq\) corresponds to subtyping \(<:\) (in reverse), and \(\sqcup\) corresponds to intersection \(\sqcap\). The various intersection type systems primarily differ in their definitions of subtyping. Given the above proof attempt, I figured that I would need the usual co/contra-variant rule for function types and also the following rule for distributing intersections over function types. \[(A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)\] This distributivity rule enables the “mixing” of information from two different entries.

So I defined types as follows: \[A,B,C,D ::= n \mid A \to B \mid A \sqcap B\] and defined subtyping according to the BCD intersection type system (Lambda Calculus with Types, Barendregt et al. 2013). \[\begin{gathered} A <: A \qquad \frac{A <: B \quad B <: C}{A <: C} \\[2ex] A \sqcap B <: A \qquad A \sqcap B <: B \qquad \frac{C <: A \quad C <: B}{C <: A \sqcap B} \\[2ex] \frac{C <: A \quad B <: D}{A \to B <: C \to D} \qquad (A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)\end{gathered}\] I then adapted the definition of consistency to work over types. (Because this definition uses negation, it is easier to define consistency as a recursive function in Isabelle instead of as an inductively defined relation.) \[\begin{aligned} n \sim n' &= (n = n') \\ n \sim (C \to D) &= \mathit{false} \\ n \sim (C \sqcap D) &= n \sim C \text{ and } n \sim D \\ (A \to B) \sim n' &= \mathit{false} \\ (A \to B) \sim (C \to D) &= (A \sim C \text{ and } B \sim D) \text{ or } A \not\sim C \\ (A \to B) \sim (C \sqcap D) &= (A \to B) \sim C \text{ and } (A \to B) \sim D \\ (A \sqcap B) \sim n' &= A \sim n' \text{ and } B \sim n' \\ (A \sqcap B) \sim (C \to D) &= A \sim (C \to D) \text{ and } B \sim (C \to D) \\ (A \sqcap B) \sim (C \sqcap D) &= A \sim C \text{ and } A \sim D \text{ and } B \sim C \text{ and } B \sim D\end{aligned}\]
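In fact the recursive definition transcribes almost line for line into a functional language. Here is a small Haskell sketch (the Ty datatype and the name consistent are my own, not from the Isabelle development); a single catch-all clause for an intersection on the left covers the last three equations.

```haskell
-- Types A, B, C, D ::= n | A -> B | A ⊓ B.  (A sketch; names are mine.)
data Ty = TInt Integer | Arr Ty Ty | And Ty Ty
  deriving (Eq, Show)

-- The recursive consistency function, clause by clause as above.
consistent :: Ty -> Ty -> Bool
consistent (TInt n)  (TInt m)  = n == m
consistent (TInt _)  (Arr _ _) = False
consistent (TInt n)  (And c d) = consistent (TInt n) c && consistent (TInt n) d
consistent (Arr _ _) (TInt _)  = False
consistent (Arr a b) (Arr c d) =
  (consistent a c && consistent b d) || not (consistent a c)
consistent (Arr a b) (And c d) = consistent (Arr a b) c && consistent (Arr a b) d
consistent (And a b) t         = consistent a t && consistent b t
```

For example, consistent (Arr (TInt 0) (TInt 1)) (Arr (TInt 0) (TInt 2)) is False, while consistent (Arr (TInt 0) (TInt 1)) (Arr (TInt 2) (TInt 3)) is True.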

Turning back to the Join Theorem, I restated it in terms of the intersection type system and rebranded it the Meet Theorem. Instead of using the letter \(\rho\) for environments, we shall switch to \(\Gamma\) because they now contain types instead of values.
Theorem (Meet)
If \(\Gamma \vdash e : A\), \(\Gamma' \vdash e : B\), and \(\Gamma \sim \Gamma'\), then \(\Gamma\sqcap\Gamma' \vdash e : A \sqcap B\).
By restating the theorem in terms of intersection types, we have essentially arrived at the rule for intersection introduction. In other words, if we can prove this theorem we will have shown that the intersection introduction rule is admissible in our system.

While the switch to intersection types and subtyping enabled this top-level proof to go through, I got stuck on one of the lemmas that it requires, which is an adaptation of Proposition 3 of the prior blog post.
Lemma (Consistency and Subtyping)

  1. If \(A \sim B\), \(A <: C\), and \(B <: D\), then \(C \sim D\).

  2. If \(A \not\sim B\), \(C <: A\), \(D <: B\), then \(C \not\sim D\).

In particular, I got stuck in the cases where the subtyping \(A <: C\) or \(B <: D\) was derived using the transitivity rule.

Subtyping and the Sub-formula Property

For a long time I’ve disliked definitions of subtyping in which transitivity is given as a rule instead of proved as a theorem. There are several reasons for this: a subtyping algorithm can’t directly implement a transitivity rule (or any rule that is not syntax directed), reasoning by induction or cases (inversion) is more difficult, and the rule is redundant. Furthermore, the presence of the transitivity rule means that subtyping does not satisfy the sub-formula property. The term sub-formula property comes from logic: a derivation (proof) of a formula may only mention propositions that are part of the formula to be proved. The transitivity rule breaks this property because the type \(B\) comes out of nowhere; it is not part of \(A\) or \(C\), the types in the conclusion of the rule.

So I removed the transitivity rule and tried to prove transitivity. For most type systems, proving the transitivity of subtyping is straightforward. But I soon realized that the addition of the distributivity rule makes it significantly more difficult. After trying and failing to prove transitivity for some time, I resorted to reading the literature. Unfortunately, it turns out that none of the published intersection type systems satisfied the sub-formula property, and the vast majority of them included the transitivity rule. However, there was one paper that offered some hope. In a 2012 article in Fundamenta Informaticae titled Intersection Types with Subtyping by Means of Cut Elimination, Olivier Laurent defined subtyping without transitivity and instead proved it, but his system still did not satisfy the sub-formula property because of an additional rule for function types. Nevertheless, Olivier indicated that he was interested in finding a version of the system that did, writing

“it would be much nicer and much more natural to go through a sub-formula property”

A lot of progress can happen in six years, so I sent an email to Olivier. He replied,

“Indeed! I now have two different sequent-calculus systems which are equivalent to BCD subtyping and satisfy the sub-formula property. I am currently writting a paper on this but it is not ready yet.”

and he attached the paper draft and the Coq mechanization. What great timing! Furthermore, Olivier would be presenting the paper, titled Intersection Subtyping with Constructors, at the Workshop on Intersection Types and Related Systems in Oxford on July 8, part of the Federated Logic Conference (FLOC). I was planning to attend FLOC anyways, for the DOMAINS workshop to celebrate Dana Scott’s 85th birthday.

Olivier’s system makes two important changes compared to prior work: he combines the distributivity rule and the usual arrow rule into a single elegant rule, and to enable this, he generalizes the form of subtyping from \(A <: B\) to \(A_1,\ldots,A_n \vdash B\), which should be interpreted as meaning \(A_1 \sqcap \cdots \sqcap A_n <: B\). Having a sequence of formulas (types) on the left is characteristic of proof systems in logic, including both natural deduction systems and sequent calculi. (Sequent calculi, in addition, typically have a sequence on the right that means the disjunction of the formulas.) Here is one of Olivier’s systems, adapted to my setting, which I’ll describe below. Let \(\Gamma\) range over sequences of types. \[\begin{gathered} \frac{\Gamma_1, \Gamma_2 \vdash A} {\Gamma_1 , n, \Gamma_2 \vdash A} \qquad \frac{\Gamma_1, \Gamma_2 \vdash A} {\Gamma_1 , B \to C, \Gamma_2 \vdash A} \\[2ex] \frac{\Gamma \vdash A \quad \Gamma \vdash B}{\Gamma \vdash A \sqcap B} \qquad \frac{\Gamma_1,B,C,\Gamma_2 \vdash A}{\Gamma_1,B\sqcap C,\Gamma_2 \vdash A} \\[2ex] \frac{}{n \vdash n} \qquad \frac{A \vdash C_1 \quad \cdots \quad A \vdash C_n \quad D_1, \ldots, D_n \vdash B} {C_1\to D_1,\ldots, C_n\to D_n \vdash A \to B}\end{gathered}\] The first two rules are weakening rules for singleton integers and function types. There is no weakening rule for intersections. The third and fourth rules are introduction and elimination rules for intersection. The fifth rule is reflexivity for integers, and the last is the combined rule for function types.

The combined rule for function types says that the intersection of a sequence of function types \({\sqcap}_{i=1\ldots n} (C_i\to D_i)\) is a subtype of \(A \to B\) if \[A <: {\sqcap}_{i\in\{1\ldots n\}} C_i \qquad \text{and}\qquad {\sqcap}_{i\in\{1\ldots n\}} D_i <: B\] Interestingly, the inversion principle for this rule is the \(\beta\)-sound property described in Chapter 14 of Lambda Calculus with Types by Barendregt et al., and is the key to proving \(\beta\)-equality. In Olivier’s system, \(\beta\)-soundness falls out immediately, instead of by a somewhat involved proof.

The regular subtyping rule for function types is simply an instance of the combined rule in which the sequence on the left contains just one function type.
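Because every premise in these rules is strictly smaller than its conclusion, the rules also double as a (naively exponential) decision procedure. Below is a backward proof-search sketch in Haskell, reusing the Ty type from the earlier sketch; the function names are mine, and I only claim it follows the six rules as stated above, not that it matches Olivier's Coq development.

```haskell
-- Backward proof search for the judgment Γ ⊢ A.  Every premise is smaller
-- than its conclusion (counting type constructors), so this terminates.
entails :: [Ty] -> Ty -> Bool
entails g a = axiom || interR || interL || weaken || arrow
  where
    -- reflexivity for integers: n ⊢ n
    axiom = case (g, a) of
              ([TInt n], TInt m) -> n == m
              _                  -> False
    -- intersection introduction (on the right)
    interR = case a of
               And b c -> entails g b && entails g c
               _       -> False
    -- intersection elimination (on the left): B ⊓ C becomes B, C
    interL = or [ entails (pre ++ b : c : post) a
                | (pre, And b c : post) <- splits g ]
    -- weakening: drop an integer or function type from the context
    weaken = or [ entails (pre ++ post) a
                | (pre, t : post) <- splits g, weakenable t ]
    -- the combined rule for function types
    arrow = case a of
              Arr a1 a2 | not (null g) && all isArr g ->
                all (\(Arr c _) -> entails [a1] c) g
                  && entails [ d | Arr _ d <- g ] a2
              _ -> False
    weakenable (And _ _) = False
    weakenable _         = True
    isArr (Arr _ _) = True
    isArr _         = False
    splits xs = [ splitAt i xs | i <- [0 .. length xs - 1] ]
```

For instance, entails [And (Arr (TInt 0) (TInt 1)) (Arr (TInt 0) (TInt 2))] (Arr (TInt 0) (And (TInt 1) (TInt 2))) returns True, which is exactly the distributivity rule instantiated at sample types.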

The next step for me was to enter Olivier’s definitions into Isabelle and prove transitivity via cut elimination. That is, I needed to prove the following generalized statement via a sequence of lemmas laid out by Olivier in his draft.
Theorem (Cut Elimination)
If \(\Gamma_2 \vdash B\) and \(\Gamma_1,B,\Gamma_3 \vdash C\), then \(\Gamma_1,\Gamma_2,\Gamma_3 \vdash C\).
The transitivity rule is the instance of cut elimination where \(\Gamma_2 = A\) and both \(\Gamma_1\) and \(\Gamma_3\) are empty.

Unfortunately, I couldn’t resist making changes to Olivier’s subtyping system as I entered it into Isabelle, which cost me considerable time. Some of Olivier’s lemmas show that the collection of types on the left, that is, the \(A_i\) in \(A_1,\ldots, A_n \vdash B\), behave like a set instead of a sequence. I figured that if the left-hand side were represented as a set, then I would be able to bypass several lemmas and obtain a shorter proof. I got stuck in proving Lemma \(\cap L_e\), which states that \(\Gamma_1,A\sqcap B,\Gamma_2 \vdash C\) implies \(\Gamma_1,A, B,\Gamma_2 \vdash C\). Olivier’s subtyping rules are carefully designed to minimize the amount of overlap between the rules, and switching to a set representation increases the amount of overlap, making the proof of this lemma more difficult (perhaps impossible?).

So after struggling with the set representation for some time, I went back to sequences and was able to complete the proof of cut elimination, with a little help from Olivier at FLOC. I proved the required lemmas in the following order.
Lemma (Weakening)
If \(\Gamma_1,\Gamma_2 \vdash A\), then \(\Gamma_1,B,\Gamma_2 \vdash A\).
(Proved by induction on \(A\).)
Lemma (Axiom)
\(A \vdash A\)
(Proved by induction on \(A\).)
Lemma (Permutation)
If \(\Gamma_1 \vdash A\) and \(\Gamma_2\) is a permutation of \(\Gamma_1\), then \(\Gamma_2 \vdash A\).
(Proved by induction on the derivation of \(\Gamma_1 \vdash A\), using many lemmas about permutations.)
Lemma (\(\cap L_e\))
If \(\Gamma_1,A\sqcap B,\Gamma_2 \vdash C\), then \(\Gamma_1,A, B,\Gamma_2 \vdash C\).
(Proved by induction on the derivation of \(\Gamma_1,A\sqcap B,\Gamma_2 \vdash C\).)
Lemma (Collapse Duplicates)
If \(\Gamma_1,A,A,\Gamma_2 \vdash C\), then \(\Gamma_1,A,\Gamma_2 \vdash C\).
(This is proved by well-founded induction on the lexicographical ordering of the pair \((n,k)\) where \(n\) is the size of \(A\) and \(k\) is the depth of the derivation of \(\Gamma_1,A,A,\Gamma_2 \vdash C\). Proof assistants such as Isabelle and Coq do not directly provide the depth of a derivation, but the depth can be manually encoded as an extra argument of the relation, as in \(\Gamma_1,A,A,\Gamma_2 \vdash_k C\).)
The Cut Elimination Theorem is then proved by well-founded induction on the triple \((n,k_1,k_2)\) where \(n\) is the size of \(B\), \(k_1\) is the depth of the derivation of \(\Gamma_2 \vdash B\), and \(k_2\) is the depth of the derivation of \(\Gamma_1,B,\Gamma_3 \vdash C\).

We define subtyping as follows. \[A <: B \quad = \quad A \vdash B\]
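In the running Haskell sketch, the defined relation is just entailment from a singleton context (again, subtype is my name):

```haskell
-- A <: B  =  A ⊢ B
subtype :: Ty -> Ty -> Bool
subtype a b = entails [a] b

-- Spot-checking property 9 of the proposition below at sample types:
--   subtype (And (Arr (TInt 0) (TInt 2)) (Arr (TInt 1) (TInt 3)))
--           (Arr (And (TInt 0) (TInt 1)) (And (TInt 2) (TInt 3)))
-- evaluates to True.
```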

The BCD subtyping rules and other derived rules follow from the above lemmas.
Proposition (Properties of Subtyping)

  1. \(A <: A\).

  2. If \(A <: B\) and \(B <: C\), then \(A <: C\).

  3. If \(C <: A\) and \(B <: D\), then \(A \to B <: C \to D\).

  4. If \(A_1 <: B\), then \(A_1 \sqcap A_2 <: B\).

  5. If \(A_2 <: B\), then \(A_1 \sqcap A_2 <: B\).

  6. If \(B <: A_1\) and \(B <: A_2\), then \(B <: A_1 \sqcap A_2\).

  7. If \(A <: C\) and \(B <: D\), then \(A \sqcap B <: C \sqcap D\).

  8. \((A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)\).

  9. \((A \to C) \sqcap (B \to D) <: (A\sqcap B) \to (C \sqcap D)\)

Consistency and Subtyping, Resolved

Recall that my switch to intersection types was motivated by my failure to prove the Consistency and Subtyping Lemma. We now return to the proof of that Lemma. We start with a handful of lemmas that are needed for that proof.
Lemma (Consistency is Symmetric and Reflexive)

  1. If \(A \sim B\), then \(B \sim A\).

  2. If \({\mathsf{wf}(A)}\), then \(A \sim A\).

It will often be convenient to decompose a type into its set of atoms, defined as follows. \[\begin{aligned} {\mathit{atoms}(n)} &= \{ n \} \\ {\mathit{atoms}(A\to B)} &= \{ A \to B \} \\ {\mathit{atoms}(A \sqcap B)} &= {\mathit{atoms}(A)} \cup {\mathit{atoms}(B)}\end{aligned}\]

The consistency of two types is determined by the consistency of their atoms.
Lemma (Atomic Consistency)

  1. If \(A \sim B\), \(C \in {\mathit{atoms}(A)}\), and \(D \in {\mathit{atoms}(B)}\), then \(C \sim D\).

  2. If (for any \(C \in {\mathit{atoms}(A)}\) and \(D \in {\mathit{atoms}(B)}\), \(C \sim D\)), then \(A \sim B\).

  3. If \(A \not\sim B\), then \(C \not\sim D\) for some \(C \in {\mathit{atoms}(A)}\) and \(D \in {\mathit{atoms}(B)}\).

  4. If \(C \not\sim D\), \(C \in {\mathit{atoms}(A)}\), and \(D \in {\mathit{atoms}(B)}\), then \(A \not\sim B\).
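Both atoms and the first two parts of this lemma are directly executable in the running Haskell sketch, which makes for a convenient sanity check (the function names are mine):

```haskell
-- Decompose a type into its atoms.
atoms :: Ty -> [Ty]
atoms (And a b) = atoms a ++ atoms b
atoms a         = [a]

-- Parts 1 and 2 of Atomic Consistency say this test agrees
-- with the consistent function defined earlier.
consistentViaAtoms :: Ty -> Ty -> Bool
consistentViaAtoms a b = and [ consistent c d | c <- atoms a, d <- atoms b ]
```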

There are also several properties of subtyping and the atoms of a type.
Lemma (Atomic Subtyping)

  1. If \(A <: B\) and \(C \in {\mathit{atoms}(B)}\), then \(A <: C\).

  2. If \(A <: n\), then \(n \in {\mathit{atoms}(A)}\).

  3. \(n <: A\) if and only if \({\mathit{atoms}(A)} \subseteq \{ n \}\).

  4. If \(C <: A \to B\), then \(D\to E \in {\mathit{atoms}(C)}\) for some \(D,E\).

  5. If \(\Gamma \vdash A\) and every atom in \(\Gamma\) is a function type, then every atom of \(A\) is a function type.

And we have the following important inversion lemma for function types. We use the following abbreviations: \[\begin{aligned} \mathrm{dom}(\Gamma) &= \{ A \mid \exists B.\; A \to B \in \Gamma \}\\ \mathrm{cod}(\Gamma) &= \{ B \mid \exists A.\; A \to B \in \Gamma \}\end{aligned}\]

Lemma (Subtyping Inversion for Function Types)
If \(C <: A \to B\), then there is a sequence of function types \(\Gamma\) such that

  1. each element of \(\Gamma\) is an atom of \(C\),

  2. For every \(D\to E \in \Gamma\), we have \(A <: D\), and

  3. \({\sqcap}\mathrm{cod}(\Gamma) <: B\).

Note that item 2 above implies that \(A <: {\sqcap}\mathrm{dom}(\Gamma)\).

Lemma (Consistency and Subtyping)

  1. If \(A \sim B\), \(A <: C\), and \(B <: D\), then \(C \sim D\).

  2. If \(A \not\sim B\), \(C <: A\), \(D <: B\), then \(C \not\sim D\).

(1) The proof is by strong induction on the sum of the depths of \(A\), \(B\), \(C\), and \(D\). We define the depth of a type as follows. \[\begin{aligned} \mathit{depth}(n) &= 0 \\ \mathit{depth}(A \to B) &= 1 + \mathrm{max}(\mathit{depth}(A),\mathit{depth}(B)) \\ \mathit{depth}(A \sqcap B) &= \mathrm{max}(\mathit{depth}(A),\mathit{depth}(B)) \end{aligned}\] To show that \(C \sim D\) it suffices to show that all of their atoms are consistent. Suppose \(C' \in {\mathit{atoms}(C)}\) and \(D'\in{\mathit{atoms}(D)}\). So we need to show that \(C' \sim D'\). We proceed by cases on \(C'\).

  • Case \(C'=n_1\):
    We have \(A <: C'\) and therefore \(n_1 \in {\mathit{atoms}(A)}\). Then because \(A \sim B\), we have \({\mathit{atoms}(B)} \subseteq \{n_1\}\). We have \(B <: D'\), so we also have \({\mathit{atoms}(D')} \subseteq \{n_1\}\) and hence \(D' = n_1\). Therefore \(C' \sim D'\).

  • Case \(C'=C_1\to C_2\):
    We have \(A <: C_1 \to C_2\), so by inversion we have some sequence of function types \(\Gamma_1\) such that every element of \(\Gamma_1\) is an atom of \(A\), \(C_1 <: {\sqcap}\mathrm{dom}(\Gamma_1)\), and \({\sqcap}\mathrm{cod}(\Gamma_1) <: C_2\).

    We also know that \(D'\) is a function type, say \(D'=D_1 \to D_2\). (This is because we have \(A <: C'\), so we know that \(A_1\to A_2 \in {\mathit{atoms}(A)}\) for some \(A_1,A_2\). Then because \(A \sim B\), we know that all the atoms in \(B\) are function types. Then because \(B <: D\) and \(D' \in {\mathit{atoms}(D)}\), we have that \(D'\) is a function type.) So by inversion on \(B <: D_1 \to D_2\), we have some sequence of function types \(\Gamma_2\) such that every element of \(\Gamma_2\) is an atom of \(B\), \(D_1 <: {\sqcap}\mathrm{dom}(\Gamma_2)\), and \({\sqcap}\mathrm{cod}(\Gamma_2) <: D_2\).

    It’s the case that either \(C_1 \sim D_1\) or \(C_1 \not\sim D_1\).

    • Sub-case \(C_1 \sim D_1\).
      It suffices to show that \(C_2 \sim D_2\). By the induction hypothesis, we have \({\sqcap}\mathrm{dom}(\Gamma_1) \sim {\sqcap}\mathrm{dom}(\Gamma_2)\).

      As an intermediate step, we shall prove that \({\sqcap}\mathrm{cod}(\Gamma_1) \sim {\sqcap}\mathrm{cod}(\Gamma_2)\), which we shall do by showing that all their atoms are consistent. Suppose \(A' \in {\mathit{atoms}({\sqcap}\mathrm{cod}(\Gamma_1))}\) and \(B' \in {\mathit{atoms}({\sqcap}\mathrm{cod}(\Gamma_2))}\). There is some \(A_1\to A_2 \in \Gamma_1\) where \(A' \in {\mathit{atoms}(A_2)}\). Similarly, there is \(B_1 \to B_2 \in \Gamma_2\) where \(B' \in {\mathit{atoms}(B_2)}\). Also, we have \(A_1 \to A_2 \in {\mathit{atoms}(A)}\) and \(B_1 \to B_2 \in {\mathit{atoms}(B)}\). Then because \(A \sim B\), we have \(A_1 \to A_2 \sim B_1 \to B_2\). Furthermore, we have \(A_1 \sim B_1\) because \({\sqcap}\mathrm{dom}(\Gamma_1) \sim {\sqcap}\mathrm{dom}(\Gamma_2)\), so it must be the case that \(A_2 \sim B_2\). Then because \(A' \in {\mathit{atoms}(A_2)}\) and \(B' \in {\mathit{atoms}(B_2)}\), we have \(A' \sim B'\). Thus concludes this intermediate step.

      By another use of the induction hypothesis, we have \(C_2 \sim D_2\), and this case is finished.

    • Sub-case \(C_1 \not\sim D_1\).
      Then we immediately have \(C_1 \to C_2 \sim D_1 \to D_2\).

  • Case \(C'=C_1\sqcap C_2\):
    We already know that \(C'\) is an atom, so we have a contradiction and this case is vacuously true.

The next two lemmas follow from the Consistency and Subtyping Lemma and help prepare to prove the case for application in the Join Theorem.
Lemma (Application Consistency)
If \(A_1 \sim A_2\), \(B_1 \sim B_2\), \(A_1 <: B_1 \to C_1\), \(A_2 <: B_2 \to C_2\), and all these types are well formed, then \(C_1 \sim C_2\).
(This lemma is proved directly, without induction.)
Lemma (Application Intersection)
If \(A_1 <: B_1 \to C_1\), \(A_2 <: B_2 \to C_2\), \(A_1 \sim A_2\), \(B_1 \sim B_2\), and \(C_1 \sim C_2\), then \((A_1\sqcap A_2) <: (B_1 \sqcap B_2) \to (C_1 \sqcap C_2)\).
(This lemma is proved directly, without induction.)

Updating the Denotational Semantics

Armed with the Consistency and Subtyping Lemma, I turned back to the proof of the Join Theorem, but first I needed to update my denotational semantics to use intersection types instead of values. For this we’ll need the definition of well formed types that we alluded to earlier.

\[\begin{gathered} \frac{}{{\mathsf{wf}(n)}} \qquad \frac{{\mathsf{wf}(A)} \quad {\mathsf{wf}(B)}}{{\mathsf{wf}(A \to B)}} \qquad \frac{A \sim B \quad {\mathsf{wf}(A)} \quad {\mathsf{wf}(B)}}{{\mathsf{wf}(A \sqcap B)}}\end{gathered}\]

Here are some examples and non-examples of well-formed types. \[\begin{gathered} {\mathsf{wf}(4)} \qquad {\mathsf{wf}(3 \sqcap 3)} \qquad \neg {\mathsf{wf}(3 \sqcap 4)} \\ {\mathsf{wf}((0\to 1) \sqcap (2 \to 3))} \qquad \neg {\mathsf{wf}((0 \to 1) \sqcap (0 \to 2))}\end{gathered}\] It is sometimes helpful to think of well-formed types in terms of the equivalence classes determined by subtype equivalence: \[A \approx B \quad = \quad A <: B \text{ and } B <: A\] For example, we have \(3 \approx (3 \sqcap 3)\), so they are in the same equivalence class and \(3\) would be the representative.
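Well-formedness is equally direct to execute; continuing the Haskell sketch (wf is my name):

```haskell
-- Well-formed types: an intersection must join consistent parts.
wf :: Ty -> Bool
wf (TInt _)  = True
wf (Arr a b) = wf a && wf b
wf (And a b) = consistent a b && wf a && wf b

-- The examples above:
--   wf (And (TInt 3) (TInt 3))                                ==> True
--   wf (And (TInt 3) (TInt 4))                                ==> False
--   wf (And (Arr (TInt 0) (TInt 1)) (Arr (TInt 2) (TInt 3)))  ==> True
--   wf (And (Arr (TInt 0) (TInt 1)) (Arr (TInt 0) (TInt 2)))  ==> False
```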

We also introduce the following notation for all the well-formed types that are super-types of a given type. \[{\mathord{\uparrow} A} \quad = \quad \{ B\mid A <: B \text{ and } {\mathsf{wf}(B)} \}\]

We shall represent variables with de Bruijn indices, so an environment \(\Gamma\) is a sequence of types. The denotational semantics of the CBV \(\lambda\)-calculus is defined as follows. \[\begin{aligned} {\mathcal{E}{[\![ n ]\!]}}\Gamma &= {\mathord{\uparrow} n} \\ {\mathcal{E}{[\![ x ]\!]}}\Gamma &= {\mathrm{if}\;}x < |\Gamma| {\;\mathrm{then}\;}{\mathord{\uparrow} \Gamma[x]} {\;\mathrm{else}\;}\emptyset \\ {\mathcal{E}{[\![ \lambda e ]\!]}}\Gamma &= \{ A \mid {\mathsf{wf}(A)} \text{ and } {\mathcal{F}{[\![ A ]\!]}}e\Gamma \} \\ {\mathcal{E}{[\![ e_1{\;}e_2 ]\!]}}\Gamma &= \left\{ C\, \middle| \begin{array}{l} \exists A,B.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma,\\ A <: B \to C, \text{ and } {\mathsf{wf}(C)} \end{array} \right\} \\ {\mathcal{E}{[\![ f(e_1,e_2) ]\!]}}\Gamma &= \left\{ C\, \middle| \begin{array}{l} \exists A,B,n_1,n_2.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma,\\ A <: n_1, B <: n_2, {[\![ f ]\!]}(n_1,n_2) <: C, {\mathsf{wf}(C)} \end{array} \right\} \\ {\mathcal{E}{[\![ {\mathrm{if}\;}e_1 {\;\mathrm{then}\;}e_2 {\;\mathrm{else}\;}e_3 ]\!]}}\Gamma &= \left\{ B\, \middle| \begin{array}{l} \exists A, n.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, A <: n,\\ n = 0 \Rightarrow B \in {\mathcal{E}{[\![ e_3 ]\!]}}\Gamma,\\ n \neq 0 \Rightarrow B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma \end{array} \right\} \\[2ex] {\mathcal{F}{[\![ n ]\!]}}e\Gamma &= \mathit{false} \\ {\mathcal{F}{[\![ A \sqcap B ]\!]}}e \Gamma &= {\mathcal{F}{[\![ A ]\!]}}e\Gamma \text{ and } {\mathcal{F}{[\![ B ]\!]}}e\Gamma\\ {\mathcal{F}{[\![ A \to B ]\!]}}e \Gamma &= B \in {\mathcal{E}{[\![ e ]\!]}} (A, \Gamma)\end{aligned}\]

It is easy to show that swapping in a “super” environment does not change the semantics.

Lemma (Weakening)

  1. If \({\mathcal{F}{[\![ A ]\!]}}e \Gamma_1\), \(\Gamma_1 <: \Gamma_2\) and \((\forall B, \Gamma_1, \Gamma_2.\; B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1, \Gamma_2 <: \Gamma_1 \Rightarrow B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2)\), then \({\mathcal{F}{[\![ A ]\!]}}e \Gamma_2\).

  2. If \(A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1\) and \(\Gamma_2 <: \Gamma_1\), then \(A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2\).

(Part 1 is proved by induction on \(A\). Part 2 is proved by induction on \(e\) and uses part 1.)

The Home Stretch

Now for the main event, the proof of the Meet Theorem!
Theorem (Meet)
If \(A_1 \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1\), \(A_2 \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2\), both \(\Gamma_1\) and \(\Gamma_2\) are well formed, and \(\Gamma_1 \sim \Gamma_2\),
then \(A_1 \sqcap A_2 \in {\mathcal{E}{[\![ e ]\!]}}(\Gamma_1\sqcap\Gamma_2)\) and \({\mathsf{wf}(A_1 \sqcap A_2)}\).
Proof We proceed by induction on \(e\).

  • Case \(e=k\) (\(k\) is a de Bruijn index for a variable):
    We have \(\Gamma_1[k] <: A_1\) and \(\Gamma_2[k] <: A_2\), so \(\Gamma_1[k] \sqcap \Gamma_2[k] <: A_1 \sqcap A_2\). Also, because \(\Gamma_1 \sim \Gamma_2\) we have \(\Gamma_1[k] \sim \Gamma_2[k]\) and therefore \(A_1 \sim A_2\), by the Consistency and Subtyping Lemma. So we have \({\mathsf{wf}(A_1 \sqcap A_2)}\) and this case is finished.

  • Case \(e=n\):
    We have \(n <: A_1\) and \(n <: A_2\), so \(n <: A_1 \sqcap A_2\). Also, we have \(A_1 \sim A_2\) by the Consistency and Subtyping Lemma. So we have \({\mathsf{wf}(A_1 \sqcap A_2)}\) and this case is finished.

  • Case \(e=\lambda e'\):
    We need to show that \({\mathsf{wf}(A_1 \sqcap A_2)}\) and \({\mathcal{F}{[\![ A_1 \sqcap A_2 ]\!]}}e'(\Gamma_1\sqcap\Gamma_2)\). For the former, it suffices to show that \(A_1 \sim A_2\), which we shall do by showing that their atoms are consistent. Suppose \(A'_1 \in {\mathit{atoms}(A_1)}\) and \(A'_2 \in {\mathit{atoms}(A_2)}\). Because \({\mathcal{F}{[\![ A_1 ]\!]}}e'\Gamma_1\) we have \(A'_1 =A'_{11} \to A'_{12}\) and \(A'_{12} \in {\mathcal{E}{[\![ e' ]\!]}}(A'_{11},\Gamma_1)\). Similarly, from \({\mathcal{F}{[\![ A_2 ]\!]}}e'\Gamma_2\) we have \(A'_2 =A'_{21} \to A'_{22}\) and \(A'_{22} \in {\mathcal{E}{[\![ e' ]\!]}}(A'_{21},\Gamma_2)\). We proceed by cases on whether \(A'_{11} \sim A'_{21}\).

    • Sub-case \(A'_{11} \sim A'_{21}\):
      By the induction hypothesis, we have \({\mathsf{wf}(A'_{12} \sqcap A'_{22})}\) from which we have \(A'_{12} \sim A'_{22}\) and therefore \(A'_{11}\to A'_{12} \sim A'_{21} \to A'_{22}\).

    • Sub-case \(A'_{11} \not\sim A'_{21}\):
      It immediately follows that \(A'_{11}\to A'_{12} \sim A'_{21} \to A'_{22}\).

    It remains to show \({\mathcal{F}{[\![ A_1 \sqcap A_2 ]\!]}}e'(\Gamma_1\sqcap\Gamma_2)\). This follows from two uses of the Weakening Lemma to obtain \({\mathcal{F}{[\![ A_1 ]\!]}}e'(\Gamma_1\sqcap\Gamma_2)\) and \({\mathcal{F}{[\![ A_2 ]\!]}}e'(\Gamma_1\sqcap\Gamma_2)\).

  • Case \(e = (e_1 {\;}e_2)\):
    We have \[B_1 \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma_1 \quad C_1 \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma_1 \quad B_1 <: C_1 \to A_1 \quad {\mathsf{wf}(A_1)}\] and \[B_2 \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma_2 \quad C_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma_2 \quad B_2 <: C_2 \to A_2 \quad {\mathsf{wf}(A_2)}\] By the induction hypothesis, we have \[B_1 \sqcap B_2 \in {\mathcal{E}{[\![ e_1 ]\!]}}(\Gamma_1 \sqcap \Gamma_2) \quad {\mathsf{wf}(B_1 \sqcap B_2)}\] and \[C_1 \sqcap C_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}(\Gamma_1 \sqcap \Gamma_2) \quad {\mathsf{wf}(C_1 \sqcap C_2)}\] We obtain \(A_1 \sim A_2\) by the Application Consistency Lemma, and then by the Application Intersection Lemma we have \[B_1 \sqcap B_2 <: (C_1 \sqcap C_2) \to (A_1 \sqcap A_2)\] So we have \(A_1 \sqcap A_2 \in {\mathcal{E}{[\![ e ]\!]}}(\Gamma_1 \sqcap \Gamma_2)\).

    Also, from \(A_1 \sim A_2\), \({\mathsf{wf}(A_1)}\), and \({\mathsf{wf}(A_2)}\), we conclude that \({\mathsf{wf}(A_1 \sqcap A_2)}\).

  • Case \(e= f(e_1,e_2)\):
    (This case is not very interesting. See the Isabelle proof for the details.)

  • Case \(e= {\mathrm{if}\;}e_1 {\;\mathrm{then}\;}e_2 {\;\mathrm{else}\;}e_3\):
    (This case is not very interesting. See the Isabelle proof for the details.)

I thought that the following Subsumption Theorem would be needed to prove the Meet Theorem, but it turned out not to be necessary, which is especially nice because the proof of the Subsumption Theorem turned out to depend on the Meet Theorem!
Theorem (Subsumption)
If \(A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma\), \(A <: B\), and both \(B\) and \(\Gamma\) are well-formed, then \(B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma\).
The proof is by induction on \(e\) and all but the case \(e=\lambda e'\) are straightforward. For that case, we use the following lemmas.
Lemma (Distributivity for \(\mathcal{F}\))
If \({\mathcal{F}{[\![ (A \to B)\sqcap (C \to D) ]\!]}} e \Gamma\), \(A \sim C\), and everything is well formed, then \({\mathcal{F}{[\![ (A\sqcap C) \to (B\sqcap D) ]\!]}} e \Gamma\).
(The proof is direct, using the Meet Theorem and the Weakening Lemma.)
Lemma (\(\mathcal{F}\) and Intersections)
Suppose \(\Gamma_1\) is a non-empty sequence of well-formed and consistent function types. If \({\mathcal{F}{[\![ {\sqcap}\Gamma_1 ]\!]}} e \Gamma_2\), then \({\mathcal{F}{[\![ {\sqcap}\mathrm{dom}(\Gamma_1) \to {\sqcap}\mathrm{cod}(\Gamma_1) ]\!]}} e \Gamma_2\).
(The proof is by induction on \(\Gamma_1\) and uses the previous lemma.)

Conclusion

This result can be viewed in a couple of ways. As discussed at the beginning of this post, establishing the Meet Theorem means that this call-by-value denotational semantics respects \(\beta\)-equality for any terminating argument expression. This is useful in proving the correctness of a function inlining optimizer. Also, it would be straightforward to define a call-by-name (or need) version of the semantics that respects \(\beta\)-equality unconditionally.

Secondly, from the viewpoint of intersection type systems, this result shows that, once we require types to be well formed (i.e. self consistent), we no longer need the intersection introduction rule because it is a consequence of having the subtyping rule for distributing intersections through function types.

Tuesday, April 24, 2018

What do real numbers have in common with lambdas? and what does continuity have to do with it?

Continuous functions over the real numbers

As a high school student and undergraduate I learned in Calculus that
  1. real numbers involve infinity in precision, e.g. some have no finite decimal representation, and
  2. a continuous function forms an unbroken line, a necessary condition to be differentiable.
For an example, the decimal representation of \(\sqrt 2\) goes on forever: \[1.41421 \ldots\] Later on, in a course on Real Analysis, I learned that one way to define the real numbers is to declare them to be Cauchy sequences, that is, infinite sequences of rational numbers that get closer and closer together. So, for example, \(\sqrt 2\) is declared to be the sequence \(1, \frac{3}{2}, \frac{17}{12}, \frac{577}{408}, \ldots\) described by the following recursive formulas.
\[A_0 = 1 \qquad A_{n+1} = \frac{A_n}{2} + \frac{1}{A_n} \hspace{1in} (1)  \label{eq:caucy-sqrt-2}\]
Depending on how close an approximation to \(\sqrt 2\) you need, you can go further out in this sequence. (Alternatively, one can represent \(\sqrt 2\) by its sequence of continued fractions.)
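To make this concrete, here is the recurrence (1) computed with exact rationals in Haskell (a small sketch; sqrt2 is my name for the sequence):

```haskell
import Data.Ratio

-- The Cauchy sequence (1) converging to sqrt 2, as exact rationals.
sqrt2 :: [Rational]
sqrt2 = iterate (\a -> a / 2 + 1 / a) 1

-- take 4 sqrt2  ==>  [1 % 1, 3 % 2, 17 % 12, 577 % 408]
```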
For an example of a continuous function, Figure 1 depicts \(x^3 - x^2 - 4x\). On the other hand, Figures 2 and 3 depict functions that are not continuous. The function \(1/\mathrm{abs}(x-\sqrt 2)^{1/4}\) in Figure 2 is not continuous because it goes to infinity as it approaches \(\sqrt 2\). The function \((x+1)\,\mathrm{sign}(x)\) in Figure 3 is not continuous because it jumps from \(-1\) to \(1\) at \(0\).

Figure 1. The function \(x^3 - x^2 - 4x\) is continuous.
Figure 2. The function \(1/\mathrm{abs}(x-\sqrt 2)^{1/4}\) is not continuous at \(\sqrt 2\).
Figure 3. The function \((x+1)\,\mathrm{sign}(x)\) is not continuous at \(0\).

You may recall the \(\epsilon\)-\(\delta\) definition of continuity, stated below and depicted in Figure 4.
A function \(f\) is continuous at a point \(x\) if for any \(\epsilon > 0\) there exists a \(\delta > 0\) such that for any \(x'\) in the interval \((x - \delta,x+\delta)\), \(f(x')\) is in \((f(x) -\epsilon, f(x) + \epsilon)\).
In other words, when a function is continuous, if you want to determine its result with an accuracy of \(\epsilon\), you need to measure the input with an accuracy of \(\delta\).

Figure 4. The \(\epsilon\)-\(\delta\) definition of continuity.
One connection between the infinite nature of real numbers and continuity that only recently sank in is that continuous functions are the ones that can be reasonably approximated by applying them to approximate, finitely-represented inputs. For example, suppose you wish to compute \(f(\sqrt 2)\) for some continuous function \(f\). You can accomplish this by applying \(f\) to each rational number in the Cauchy sequence for \(\sqrt 2\) until two subsequent results are closer than your desired accuracy. On the other hand, consider trying to approximate the function from Figure 2 by applying it to rational numbers in the Cauchy sequence for \(\sqrt 2\). No matter how far down the sequence you go, you’ll still get a result that is wrong by an infinite margin!
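Continuing the sketch above, the following function carries out that recipe: apply \(f\) down the sequence until two successive results agree to within the desired accuracy. (It assumes an infinite input sequence, and it only terminates when the \(f\) you pass in is well behaved, e.g., continuous at the limit.)

```haskell
-- Approximate f at the limit of a Cauchy sequence by applying f to
-- successive elements until two results are within eps of each other.
approxAt :: (Rational -> Rational) -> [Rational] -> Rational -> Rational
approxAt f (x1 : x2 : rest) eps
  | abs (f x1 - f x2) < eps = f x2
  | otherwise               = approxAt f (x2 : rest) eps
approxAt _ _ _ = error "approxAt: expects an infinite sequence"

-- approxAt (\x -> x * x - 1) sqrt2 (1 % 1000000) is a good
-- approximation of (sqrt 2)^2 - 1 = 1.
```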

The \(\lambda\)-calculus and continuous functions

In graduate school I studied programming languages and learned that
  1. the \(\lambda\)-calculus is a little language for creating and applying functions, and
  2. Dana S. Scott’s semantics of the \(\lambda\)-calculus interprets \(\lambda\)’s as continuous functions.
For example, the \(\lambda\) expression \[\lambda x.\; x + 1\] creates an anonymous function that maps its input \(x\), say a natural number, to the next greatest one. The graph of this function is \[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\}\] which is infinite. So we have our first similarity between the real numbers and \(\lambda\)’s, both involve infinity.
A key characteristic of the \(\lambda\)-calculus is that functions can take functions as input. Thus, the semantics of the \(\lambda\)-calculus is also concerned with functions over infinite entities (just like functions over the real numbers). For example, here is a \(\lambda\) expression that takes a function \(f\) and produces a function that applies \(f\) twice in succession to its input \(x\). \[\lambda f.\; \lambda x.\; f(f(x))\] The graph of this function is especially difficult to write down. Not only does it have an infinite domain and range, but each element in the domain and range is an infinite entity. \[\left\{ \begin{array}{l} \{ 0\mapsto 1, 1\mapsto 2, 2\mapsto 3, \ldots \} \mapsto \{ 0\mapsto 2, 1\mapsto 3, 2\mapsto 4, \ldots \},\\ \{ 0\mapsto 0, 1\mapsto 2, 2\mapsto 4, \ldots \} \mapsto \{ 0\mapsto 0, 1\mapsto 4, 2\mapsto 8, \ldots \},\\ \ldots \end{array} \right\}\]
Denotational semantics for the \(\lambda\)-calculus interpret \(\lambda\)’s as continuous functions, so just based on the terminology there should be another similarity with real numbers! However, these continuous functions are over special sets called domains, not real numbers, and the definition of continuity in this setting bears little resemblance to the \(\epsilon\)-\(\delta\) definition. For example, in Dana S. Scott’s classic paper Data Types as Lattices, the domain is the powerset of the natural numbers, \(\mathcal{P}(\mathbb{N})\). This domain can be used to represent a function’s graph by creating bijections between pairs of naturals and naturals, and between finite sets of naturals and naturals. The following are the easier-to-specify directions of the two bijections, the mapping from pairs to naturals and the mapping from naturals to sets of naturals.
\[\begin{aligned} \langle n, m \rangle &= 2^n (2m+1) - 1 \\ \mathsf{set}(0) &= \emptyset \\ \mathsf{set}(1+k) &= \{ m \} \cup \mathsf{set}(n) & \text{if } \langle n, m \rangle = k\end{aligned}\]
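Here is a self-contained Haskell sketch of these directions, plus the inverse of the pairing function, which is what the set decoding actually needs (function names are mine):

```haskell
-- <n, m> = 2^n * (2m + 1) - 1, and its inverse.
pair :: Integer -> Integer -> Integer
pair n m = 2 ^ n * (2 * m + 1) - 1

unpair :: Integer -> (Integer, Integer)
unpair k = go 0 (k + 1)            -- factor k + 1 as 2^n * (2m + 1)
  where go n j | even j    = go (n + 1) (j `div` 2)
               | otherwise = (n, (j - 1) `div` 2)

-- set(0) = {},  set(1 + k) = {m} ∪ set(n)  where <n, m> = k.
decodeSet :: Integer -> [Integer]
decodeSet 0 = []
decodeSet j = let (n, m) = unpair (j - 1) in m : decodeSet n
```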
Scott defines the continuous functions on \(\mathcal{P}(\mathbb{N})\) as those functions \(h\) that satisfy
\[h(f) = \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \hspace{1in} (2) \label{eq:cont-pn}\]
In other words, the value of a continuous function \(h\) on some function \(f \in \mathcal{P}(\mathbb{N})\) must be the same as the union of applying \(h\) to all the finite subgraphs of \(f\). One immediately wonders, why are the \(\lambda\)-definable functions continuous in this sense? Consider some \(\lambda\) expression \(h\) that takes as input a function \(f\).
But \(f\) is a function; an infinite object. What does it mean to “compute” with an “infinite” argument? In this case it means most simply that \(h(f)\) is determined by asking of \(f\) finitely many questions: \(f(m_0), f(m_1), ..., f(m_{k-1})\). —Dana S. Scott, A type-theoretical alternative to ISWIM, CUCH, OWHY, 1969.
Put another way, if \(h\) terminates and returns a result, then it will only have had a chance to call \(f\) finitely many times. So it suffices to apply \(h\) instead to a finite subset of the graph of \(f\). However, we do not know up-front which subset of \(f\) to use, but it certainly suffices to try all of them!

Relating the two kinds of continuity

But what does equation (2) have to do with continuous functions over the real numbers? What does it have to do with the \(\epsilon\)-\(\delta\) definition? This question has been in the back of my mind for some time, but only recently have I had the opportunity to learn the answer.
To understand how these two kinds of continuity are related, it helps to focus on the way that infinite entities can be approximated with finite ones in the two settings. We can approximate a real number with a rational interval. For example, referring back to the Cauchy sequence for \(\sqrt 2\), equation (1), we have \[\sqrt 2 \in \left(1, \frac{3}{2}\right)\] Of course an approximation does not uniquely identify the thing it approximates. So there are other real numbers in this interval, such as \(\sqrt{2.1}\). \[\sqrt{2.1} \in \left(1, \frac{3}{2}\right)\]
Likewise we can approximate the infinite graph of a function with a finite part of its graph. For example, let \(G\) be a graph with just one input-output entry. \[G=\{ 1 \mapsto 2 \}\] Then we consider \(G\) to be an approximation of any function that agrees with \(G\) (maps \(1\) to \(2\)), which is to say its graph is a superset of \(G\). So the set of all functions that are approximated by \(G\) can be expressed with a set comprehension as follows: \(\{ f \mid G \subseteq f\}\). In particular, the function \(+1\) that adds one to its input is approximated by \(G\). \[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\] But also the function \(\times 2\) that doubles its input is approximated by \(G\). \[\left\{ \begin{array}{l} 0\mapsto 0, \\ 1\mapsto 2, \\ 2\mapsto 4, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\] Of course, a better approximation such as \(G'=\{1\mapsto 2, 2\mapsto 3\}\) is able to tell these two functions apart.
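The “agrees with \(G\)” check is immediate to write down (a Haskell sketch, where a function is just a Haskell function):

```haskell
-- Does the finite graph g approximate the function f?
approximates :: Eq b => [(a, b)] -> (a -> b) -> Bool
approximates g f = all (\(x, y) -> f x == y) g

-- approximates [(1, 2)] (+ 1)          ==> True
-- approximates [(1, 2)] (* 2)          ==> True
-- approximates [(1, 2), (2, 3)] (* 2)  ==> False  (G' tells them apart)
```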
The interval \(\left(1, \frac{3}{2}\right)\) and the set \(\{f\mid G \subseteq f\}\) are both examples of neighborhoods (a.k.a. base elements) in a topological space. The field of Topology was created to study the essence of continuous functions, capturing the similarities and abstracting away the differences regarding how such functions work in different settings. A topological space is just some set \(X\) together with a collection \(B\) of neighborhoods, called a base, that must satisfy a few conditions that we won’t get into. We’ve already seen two topological spaces.
  1. The real numbers form a topological space where each neighborhood consists of all the real numbers in a rational interval.
  2. The powerset \(\mathcal{P}(\mathbb{N})\) forms a topological space where each neighborhood consists of all the functions approximated by a finite graph.
The \(\epsilon\)-\(\delta\) definition of continuity generalizes to topological spaces: instead of talking about intervals, it talks generically about neighborhoods. In the following, the interval \((f(x) -\epsilon, f(x) + \epsilon)\) is replaced by neighborhood \(E\) and the interval \((x - \delta,x+\delta)\) is replaced by neighborhood \(D\).
A function \(f\) is continuous at a point \(x\) if for any neighborhood \(E\) that contains \(f(x)\), there exists a neighborhood \(D\) that contains \(x\) such that for any \(y\) in \(D\), \(f(y)\) is in \(E\).
Now let us instantiate this topological definition of continuity into \(\mathcal{P}(\mathbb{N})\).
A function \(f\) over \(\mathcal{P}(\mathbb{N})\) is continuous at \(X\) if for any finite set \(E\) such that \(E \subseteq f(X)\), there exists a finite set \(D\) with \(D \subseteq X\) such that for any \(Y\), \(D \subseteq Y\) implies \(E \subseteq f(Y)\).
Hmm, this still doesn’t match up with the definition of continuity in equation (2) but perhaps they are equivalent. Let us take the above as the definition and try to prove equation (2).
First we show that \[h(f) \subseteq \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}\] Let \(x'\) be an arbitrary element of \(h(f)\). To show that \(x'\) is in the right-hand side we need to identify some finite \(g\) such that \(g \subseteq f\) and \(x' \in h(g)\), that is, \(\{x'\} \subseteq h(g)\). But this is just what continuity gives us, instantiating the above definition with \(h\) in the role of \(f\), \(f\) in the role of \(X\), \(\{x'\}\) for \(E\), \(g\) for \(D\), and also \(g\) for \(Y\). Second, we need to show that \[\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \subseteq h(f)\] This time let \(x'\) be an element of \(\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}\). So we know there is some finite set \(g\) such that \(x' \in h(g)\) and \(g \subseteq f\). Of course \(\{x'\}\) is a finite set and \(\{x'\} \subseteq h(g)\), so we can apply the definition of continuity to obtain a finite set \(E\) such that \(E \subseteq g\) and for all \(Y\), \(E \subseteq Y\) implies \(\{x'\} \subseteq h(Y)\). From \(E \subseteq g\) and \(g \subseteq f\) we transitively have \(E \subseteq f\). So instantiating \(Y\) with \(f\) we have \(\{x'\} \subseteq h(f)\) and therefore \(x' \in h(f)\).
We have shown that the topologically-derived definition of continuity for \(\mathcal{P}(\mathbb{N})\) implies the definition used in the semantics of the \(\lambda\)-calculus, i.e., equation (2). It is also straightforward to prove the other direction, taking equation (2) as given and proving that the topologically-derived definition holds. Thus, continuity for functions over real numbers really is similar to continuity for \(\lambda\) functions: they are both instances of continuous functions in a topological space.

Continuous functions over partial orders

In the context of Denotational Semantics, domains are often viewed as partial orders where the ordering \(g \sqsubseteq f\) means that \(g\) approximates \(f\), or \(f\) is more informative than \(g\). The domain \(\mathcal{P}(\mathbb{N})\) with set containment \(\subseteq\) forms a partial order. Referring back to the examples in the first section, with \(G=\{ 1 \mapsto 2 \}\) and \(G'=\{1\mapsto 2, 2\mapsto 3\}\), we have \(G \sqsubseteq G'\), \(G' \sqsubseteq +1\), and \(G \sqsubseteq \times 2\). In a partial order, the join \(x \sqcup y\) of \(x\) and \(y\) is the least element that is greater than both \(x\) and \(y\). For the partial order on \(\mathcal{P}(\mathbb{N})\), join corresponds to set union.
In the context of partial orders, continuity is defined with respect to infinite sequences of ever-better approximations: \[f_0 \sqsubseteq f_1 \sqsubseteq f_2 \sqsubseteq \cdots\] A function \(h\) is continuous if applying it to the join of the sequence is the same as applying it to each element of the sequence and then taking the join.
\[h\left(\bigsqcup_{n\in\mathbb{N}} f_n\right) = \bigsqcup_{n\in\mathbb{N}} h(f_n) \hspace{1in} (3) \label{eq:cont-cpo}\]
But this equation is not so different from the equation (2) that expresses continuity on \(\mathcal{P}(\mathbb{N})\). For any function \(f\) (with infinite domain) we can find a sequence \((f_n)_{n=0}^{\infty}\) of ever-better but still finite approximations of \(f\) such that \[f = \bigsqcup_{n\in\mathbb{N}} f_n\] Then both equations (2) and (3) tell us that \(h(f)\) is equal to the union of applying \(h\) to each \(f_n\).

Further Reading

The following is the list of resources that I found helpful in trying to understand the relationship between real numbers, \(\lambda\)’s, and the role of continuity.
  • Data Types as Lattices by Dana S. Scott.
  • A type-theoretical alternative to ISWIM, CUCH, OWHY by Dana S. Scott.
  • The Formal Semantics of Programming Languages by Glynn Winskel.
  • Topology via Logic by Steven Vickers.
  • Topology (2nd Edition) by James R. Munkres.
  • Introduction to Lattices and Order by B.A. Davey and H.A. Priestley.
  • The Wikipedia articles on

Saturday, December 23, 2017

Putting the Function back in Lambda

Happy holidays! There’s nothing quite like curling up in a comfy chair on a rainy day and proving a theorem in your favorite proof assistant.

Lately I’ve been interested in graph models of the \(\lambda\)-calculus, that is, models that represent a \(\lambda\) with relations from inputs to outputs. The use of relations instead of functions is not a problem when reasoning about expressions that produce numbers, but it does introduce problems when reasoning about expressions that produce higher-order functions. Some of these expressions are contextually equivalent but not denotationally equivalent. For example, consider the following two expressions. \[{\lambda f.\,} (f {\;}0) + (f {\;}0) =_{\mathrm{ctx}} {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) \qquad\qquad (1)\] The expression on the left-hand side has two copies of a common subexpression \((f {\;}0)\). The expression on the right-hand side is optimized to have just a single copy of \((f {\;}0)\). The left and right-hand expressions in equation (1) are contextually equivalent because the \(\lambda\)-calculus is a pure language (no side effects), so whether we call \(f\) once or twice does not matter, and it always returns the same result given the same input. Unfortunately, the two expressions in equation (1) are not denotationally equivalent. \[{\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset \neq {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset \qquad\qquad (2)\] Recall that my semantics \(\mathcal{E}\) maps an expression and environment to a set of values. The “set” is not there because an expression produces multiple conceptually different values. Sets are needed because we represent an (infinite) function as an infinite set of finite relations. So to prove the above inequality (2) we simply need to find a value that is in the set on the left-hand side that is not in the set on the right-hand side. The idea is that we consider the behavior when parameter \(f\) is bound to a relation that is not a function. In particular, the relation \[R = \{ (0,1), (0,2) \}\] Now when we consider the application \((f {\;}0)\), the semantics of function application given by \(\mathcal{E}\) can choose the result to be either \(1\) or \(2\). Furthermore, for the left-hand side of equation (2), it could choose \(1\) for the first \((f {\;}0)\) and \(2\) for the second \((f {\;}0)\). Thus, the result of the function can be \(3\). \[\{ (R,3) \} \in {\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset\] Of course, this function could never actually produce \(3\) because \(R\) does not correspond to any \(\lambda\)’s. In other words, garbage-in garbage-out. Turning to the right-hand side of equation (2), there is only one \((f{\;}0)\), which can either produce \(1\) or \(2\), so the result of the outer function can be \(2\) or \(4\), but not \(3\).

\[\begin{aligned} \{ (R,2) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,3) \} &\notin {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,4) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\end{aligned}\]

So we need to put the function back in \(\lambda\)! That is, we need to restrict the notion of values so that all the relations are also functions. Recall the definition: a function \(f\) is a relation on two sets \(A\) and \(B\) such that for all \(a \in A\) there exists a unique \(b \in B\) such that \((a,b) \in f\). In other words, if \((a,b) \in f\) and \((a,b') \in f\), then necessarily \(b = b'\). Can we simply add this restriction to our notion of value? Not quite. If we literally applied this definition, we could still get graphs such as the following one, which maps two different approximations of the add-one function to different outputs. This graph does not correspond to any \(\lambda\). \[\{ (\{(0,1)\}, 2), (\{(0,1),(5,6) \}, 3) \}\]

So we need to generalize the notion of function to allow for differing approximations. We shall do this by generalizing from equality to consistency, written \(\sim\). Two integers are consistent when they are equal. Two graphs are consistent when they map consistent inputs to consistent outputs. We are also forced to explicitly define inconsistency, which we explain below.

\[\begin{gathered} \frac{}{n \sim n} \qquad \frac{\begin{array}{l}\forall v_1 v'_1 v_2 v'_2, (v_1,v'_1) \in t_1 \land (v_2,v'_2) \in t_2 \\ \implies (v_1 \sim v_2 \land v'_1 \sim v'_2) \lor v_1 \not\sim v_2 \end{array}} {t_1 \sim t_2} \\[2ex] \frac{n_1 \neq n_2}{n_1 \not\sim n_2} \qquad \frac{(v_1,v'_1) \in t_1 \quad (v_2,v'_2) \in t_2 \quad v_1 \sim v_2 \quad v'_1 \not\sim v'_2} {t_1 \not\sim t_2} \\[2ex] \frac{}{n \not\sim t} \qquad \frac{}{t \not\sim n}\end{gathered}\]

The definition of consistency is made a bit more complicated than I expected because the rules of an inductive definition must be monotonic, so we can’t negate a recursive application or put it on the left of an implication. In the above definition of consistency for graphs \(t_1 \sim t_2\), it would have been more natural to say \(v_1 \sim v_2 \implies v'_1 \sim v'_2\) in the premise, but then \(v_1 \sim v_2\) is on the left of an implication. The above inductive definition works around this problem by mutually defining consistency and inconsistency. We then prove that inconsistency is the negation of consistency.

Proposition 1 (Inconsistency) \(v_1 \not\sim v_2 = \neg (v_1 \sim v_2)\)
Proof. We first establish by mutual induction that \(v_1 \sim v_2 \implies \neg (v_1 \not\sim v_2)\) and \(v_1 \not\sim v_2 \implies \neg (v_1 \sim v_2)\). We then show that \((v_1 \sim v_2) \lor (v_1 \not\sim v_2)\) by induction on \(v_1\) and case analysis on \(v_2\). Therefore \(\neg (v_1 \not\sim v_2) \implies v_1 \sim v_2\), so we have proved both directions of the desired equality. \(\Box\)

Armed with this definition of consistency, we can define a generalized notion of function, let’s call it \(\mathsf{is\_fun}\). \[\mathsf{is\_fun}\;t \equiv \forall v_1 v_2 v'_1 v'_2, (v_1,v'_1) \in t \land (v_2,v'_2) \in t \land v_1 \sim v_2 \implies v'_1 \sim v'_2\] Next we restrict the notion of value to require the graphs to satisfy \(\mathsf{is\_fun}\). Recall that we defined values by the following grammar. \[\begin{array}{lrcl} \text{numbers} & n & \in & \mathbb{Z} \\ \text{graphs} & t & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \}\\ \text{values} & v & ::= & n \mid t \end{array}\] We keep this definition but add an inductive definition of a more refined notion of value, namely \(\mathsf{is\_val}\). Numbers are values and graphs are values so long as they satisfy \(\mathsf{is\_fun}\) and only map values to values.

\[\begin{gathered} \frac{}{\mathsf{is\_val}\,n} \qquad \frac{\mathsf{is\_fun}\;t \quad \forall v v', (v,v') \in t \implies \mathsf{is\_val}\,v \land \mathsf{is\_val}\,v'} {\mathsf{is\_val}\,t}\end{gathered}\]
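Putting the pieces together, here is a self-contained Haskell sketch of values, consistency, \(\mathsf{is\_fun}\), and \(\mathsf{is\_val}\) (the datatype and names are mine, not the mechanization's):

```haskell
-- Values: numbers and finite graphs.
data Value = Num Integer | Graph [(Value, Value)]

-- Consistency: equal numbers; graphs must map consistent inputs
-- to consistent outputs.
consistent :: Value -> Value -> Bool
consistent (Num n)    (Num m)    = n == m
consistent (Graph t1) (Graph t2) =
  and [ not (consistent v1 v2) || consistent w1 w2
      | (v1, w1) <- t1, (v2, w2) <- t2 ]
consistent _ _ = False

-- A graph is a (generalized) function when consistent inputs
-- are mapped to consistent outputs.
isFun :: [(Value, Value)] -> Bool
isFun t = and [ not (consistent v1 v2) || consistent w1 w2
              | (v1, w1) <- t, (v2, w2) <- t ]

-- Refined values: graphs must be functions, hereditarily.
isVal :: Value -> Bool
isVal (Num _)   = True
isVal (Graph t) = isFun t && and [ isVal v && isVal w | (v, w) <- t ]

-- The bad graph from above is rejected:
--   isVal (Graph [ (Graph [(Num 0, Num 1)],                Num 2)
--                , (Graph [(Num 0, Num 1), (Num 5, Num 6)], Num 3) ])
--     ==> False
```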

We are now ready to update our semantic function \(\mathcal{E}\). The one change that we make is to require that each graph \(t\) satisfies \(\mathsf{is\_val}\) in the meaning of a \(\lambda\). \[{\mathcal{E}[\![ {\lambda x.\,} e ]\!]}\rho = \{ t \mid \mathsf{is\_val}\;t \land \forall (v,v')\in t, v' \in {\mathcal{E}[\![ e ]\!]}\rho(x{:=}v) \}\] Hopefully this change to the semantics enables a proof that \(\mathcal{E}\) is deterministic. Indeed, we shall show that if \(v \in {\mathcal{E}[\![ e ]\!]}\rho\) and \(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\) for any suitably related \(\rho\) and \(\rho'\), then \(v \sim v'\).

To relate \(\rho\) and \(\rho'\), we extend the definitions of consistency and \(\mathsf{is\_val}\) to environments.

\[\begin{gathered} \emptyset \sim \emptyset \qquad \frac{v \sim v' \quad \rho \sim \rho'} {\rho(x{:=}v) \sim \rho'(x{:=}v')} \\[2ex] \mathsf{val\_env}\;\emptyset \qquad \frac{\mathsf{is\_val}\; v \quad \mathsf{val\_env}\;\rho} {\mathsf{val\_env}\;\rho(x{:=}v)}\end{gathered}\]

We will need a few small lemmas concerning these definitions and their relationship with the \(\sqsubseteq\) ordering on values.

Proposition 2 

  1. If \(\mathsf{val\_env}\;\rho\) and \(\rho(x) = v\), then \(\mathsf{is\_val}\; v\).

  2. If \(\rho \sim \rho'\), \(\rho(x) = v\), \(\rho'(x) = v'\), then \(v \sim v'\).

Proposition 3 

  1. If \(\mathsf{is\_val}\;v'\) and \(v \sqsubseteq v'\), then \(\mathsf{is\_val}\; v\).

  2. If \(v_1 \sqsubseteq v'_1\), \(v_2 \sqsubseteq v'_2\), and \(v'_1 \sim v'_2\), then \(v_1 \sim v_2\).

We now come to the main theorem, which is proved by induction on \(e\), using the above three propositions.

Theorem (Determinism of \(\mathcal{E}\)) If \(v \in {\mathcal{E}[\![ e ]\!]}\rho\), \(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\), \(\mathsf{val\_env}\;\rho\), \(\mathsf{val\_env}\;\rho'\), and \(\rho \sim \rho'\), then \(\mathsf{is\_val}\;v\), \(\mathsf{is\_val}\;v'\), and \(v \sim v'\).

Sunday, October 15, 2017

New revision of the semantics paper (POPL rejection, ESOP submission)

My submission about declarative semantics to POPL was rejected. It's been a few weeks now, so I'm not so angry about it anymore. I've revised the paper and will be submitting it to ESOP this week.

The main reason for rejection according to the reviewers was a lack of technical novelty, but I think the real reasons were that 1) the paper came across as too grandiose and, as a result, accidentally annoyed the reviewer who is an expert in denotational semantics, and 2) the paper did not do a good job of comparing to the related set-theoretic models of Plotkin and Engeler.

Regarding 1), in the paper I used the term "declarative semantics" to try to distance this new semantics from the standard lattice-based denotational semantics. However, the reviewer took the term to mean that the new semantics is not a denotational semantics, which is clearly false. In the new version of the paper I've removed the term "declarative semantics" and instead refer to the new semantics as a denotational semantics of the "elementary" variety. Also, I've toned down the sales pitch to better acknowledge that this new semantics is not the first elementary denotational semantics.

Regarding 2), I've revised the paper to include a new section at the beginning that gives background on the elementary semantics of Plotkin, Engeler, and Coppo et al. This should help put the contributions of the paper in context.

Other than that, I've added a section with a counterexample to full abstraction. A big thanks to the POPL reviewers for the counterexample! (Also thanks to Max New, who sent me the counterexample a couple of months ago.)

Unfortunately, the ESOP page limit is a bit shorter, so I removed the relational version of the semantics and also the part about mutable references.

A draft of the revision is available on arXiv. Feedback is most welcome, especially from experts in denotational semantics! I really hope that this version is no longer annoying, but if it is, please tell me!

Tuesday, October 03, 2017

Comparing to Plotkin and Engeler's Set-theoretic Models of the Lambda Calculus

On the plane ride back from ICFP last month I had a chance to re-read and better understand Plotkin’s Set-theoretical and other elementary models of the \(\lambda\)-calculus (Technical Report 1972, Theoretical Computer Science 1993) and to read, for the first time, Engeler’s Algebras and combinators (Algebra Universalis 1981). As I wrote in my draft paper Declarative semantics for functional languages: compositional, extensional, and elementary, the main intuitions behind my simple semantics are present in these earlier papers, but until now I did not understand these other semantics deeply enough to give a crisp explanation of the similarities and differences. (The main intuitions are also present in the early work on intersection type systems, and my semantics is more closely related to those systems. A detailed explanation of that relationship is given in the draft paper.)

I should note that Engeler’s work was in the context of combinators (S and K), not the \(\lambda\)-calculus, but of course the \(\lambda\)-calculus can be encoded into combinators. I’ve ported his definitions to the \(\lambda\)-calculus, along the lines suggested by Plotkin (1993), to make for easier comparison. In addition, I’ll extend both Engeler’s and Plotkin’s semantics to include integers and integer arithmetic. Here’s the syntax of the \(\lambda\)-calculus under consideration: \[\begin{array}{rcl} && n \in \mathbb{Z} \qquad x \in \mathbb{X} \;\;\text{(program variables)}\\ \oplus & ::= & + \mid - \mid \times \mid \div \\ \mathbb{E} \ni e & ::= & n \mid e \oplus e \mid x \mid {\lambda x.\,} e \mid e \; e \mid {\textbf{if}\,}e {\,\textbf{then}\,}e {\,\textbf{else}\,}e \end{array}\]
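For readers who prefer to see syntax as code, here is a hypothetical Haskell rendering of this grammar (the constructor names are mine):

```haskell
data Op = Add | Sub | Mul | Div
  deriving (Eq, Show)

data Exp
  = Num Int            -- n
  | Arith Op Exp Exp   -- e ⊕ e
  | Var String         -- x
  | Lam String Exp     -- λx. e
  | App Exp Exp        -- e e
  | If Exp Exp Exp     -- if e then e else e
  deriving (Eq, Show)
```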

Values

Perhaps the best place to start the comparison is in the definition of what I’ll call values. All three semantics give an inductive definition of values and all three involve finite sets, but in different ways. I’ll write \(\mathbb{V}_S\) for my definition, \(\mathbb{V}_P\) for Plotkin’s, and \(\mathbb{V}_E\) for Engeler’s. \[\begin{aligned} \mathbb{V}_S &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_S \times \mathbb{V}_S) \\ \mathbb{V}_P &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_P) \times \mathcal{P}_f(\mathbb{V}_P) \\ \mathbb{V}_E &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_E) \times \mathbb{V}_E\end{aligned}\] In \(\mathbb{V}_S\), a function is represented as a finite graph, that is, a finite set of input-output pairs. For example, the graph \(\{ (0,1), (1,2), (2,3) \}\) is one of the meanings for the term \((\lambda x.\, x + 1)\).

Plotkin’s values \(\mathbb{V}_P\) include only a single input-output pair from a function’s graph. For example, \((\{0\}, \{1\})\) is one of the meanings for the term \((\lambda x.\, x + 1)\). Engeler’s values also include just a single entry. For example, \((\{0\}, 1)\) is one of the meanings for the term \((\lambda x.\, x + 1)\). In this example we have not made use of the finite sets in the input and output of Plotkin’s values. To do so, let us consider a higher-order example, such as the term \((\lambda f.\, f\,1 + f\,2)\). For Plotkin, the following value is one of its meanings: \[(\{ (\{1\}, \{3\}), (\{2\}, \{4\}) \}, \{7\})\] That is, when \(f\) is the function that adds \(2\) to its input, the result is \(7\). We see that the presence of finite sets in the input is needed to accommodate functions-as-input. The corresponding value in \(\mathbb{V}_S\) is \[\{ (\{ (1, 3), (2, 4) \}, 7) \}\]

The difference between Plotkin’s and Engeler’s values can be seen in functions that return functions. Consider the \(K\) combinator \((\lambda x.\,\lambda y.\, x)\). For Plotkin, the following value is one of its meanings: \[(\{1\}, \{ (\{0\},\{1\}), (\{2\},\{1\}) \})\] That is, when applied to \(1\) it returns a function that returns \(1\) when applied to either \(0\) or \(2\). The corresponding value in \(\mathbb{V}_S\) is \[\{ (1, \{ (0,1), (2,1) \}) \}\] For Engeler, no single value corresponds to the above one; instead, two values are required to represent the same information. \[(\{1\}, (\{0\},1)) \quad\text{and}\quad (\{1\}, (\{2\},1))\] We’ll see later that it doesn’t matter that Engeler requires more values to represent the same information.
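To make the comparison concrete, here is a hedged sketch of the three value definitions as Haskell datatypes, together with the example meanings above; all constructor and value names are mine, purely for illustration.

```haskell
data VS = NS Int | TS [(VS, VS)] deriving (Eq, Show)  -- V_S: finite graphs
data VP = NP Int | FP [VP] [VP]  deriving (Eq, Show)  -- V_P: one entry, finite sets in and out
data VE = NE Int | FE [VE] VE    deriving (Eq, Show)  -- V_E: one entry, finite set in, one value out

-- One meaning of (λx. x + 1) in each definition:
incS :: VS
incS = TS [(NS 0, NS 1), (NS 1, NS 2), (NS 2, NS 3)]

incP :: VP
incP = FP [NP 0] [NP 1]

incE :: VE
incE = FE [NE 0] (NE 1)

-- One meaning of the K combinator for Plotkin, versus the two Engeler
-- values that carry the same information:
kP :: VP
kP = FP [NP 1] [FP [NP 0] [NP 1], FP [NP 2] [NP 1]]

kE1, kE2 :: VE
kE1 = FE [NE 1] (FE [NE 0] (NE 1))
kE2 = FE [NE 1] (FE [NE 2] (NE 1))
```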

The Domains

None of the three semantics (Plotkin’s, Engeler’s, or mine) uses values themselves as the domain; instead, each domain is a set of values. That is, \[\mathcal{P}(\mathbb{V}_S) \qquad \mathcal{P}(\mathbb{V}_P) \qquad \mathcal{P}(\mathbb{V}_E)\]

The role of the outer \(\mathcal{P}\) is intimately tied to the meaning of functions in Plotkin and Engeler’s semantics because the values themselves only record a single input-output pair. The outer \(\mathcal{P}\) is needed to represent all of the input-output pairs for a given function. While the \(\mathcal{P}\) is also necessary for functions in my semantics, one can view it generically as providing non-determinism, and therefore as somewhat orthogonal to the meaning of functions per se. Next let’s take a look at the semantics.

Comparing the Semantics

Here is Plotkin’s semantics \(\mathcal{E}_P\). Let \(V,V'\) range over finite sets of values. \[\begin{aligned} {\mathcal{E}_P[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_P[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_P[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_P[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,V') \mid V' \subseteq {\mathcal{E}_P[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_P[\![ e_1\;e_2 ]\!]}\rho &= \bigcup \left\{ V' \, \middle| \begin{array}{l} \exists V.\, (V,V') {\in} {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_P[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_P[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_P[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] For Plotkin, the environment \(\rho\) maps variables to finite sets of values. In the case for application, the input set \(V\) must be a subset of the meaning of the argument, which is critical for enabling self application and, using the \(Y\) combinator, general recursion. The \(\bigcup\) flattens the set-of-finite-sets into a set.
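As a small worked example of the application case, consider \((\lambda x.\, x + 1)\;2\). We have \((\{2\},\{3\}) \in {\mathcal{E}_P[\![ {\lambda x.\,} x+1 ]\!]}\rho\) because \(\{3\} \subseteq {\mathcal{E}_P[\![ x+1 ]\!]}\rho(x{:=}\{2\}) = \{3\}\). Then, taking \(V = \{2\} \subseteq \{2\} = {\mathcal{E}_P[\![ 2 ]\!]}\rho\) and \(V' = \{3\}\), the \(\bigcup\) yields \(3 \in {\mathcal{E}_P[\![ (\lambda x.\, x+1)\;2 ]\!]}\rho\).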

Next we consider Engeler’s semantics \(\mathcal{E}_E\). \[\begin{aligned} {\mathcal{E}_E[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_E[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_E[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_E[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,v') \mid v' \in {\mathcal{E}_E[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_E[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v' \, \middle| \begin{array}{l} \exists V.\, (V,v') {\in} {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_E[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_E[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_E[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] The semantics is quite similar to Plotkin’s, as again we see the use of \(\subseteq\) in the case for application. Because the output \(v'\) is just a value, and not a finite set of values as for Plotkin, there is no need for the \(\bigcup\).

Finally we review my semantics \(\mathcal{E}_S\). For it we need to define an ordering on values that is just equality for integers and \(\subseteq\) on function graphs. Let \(t\) range over \(\mathcal{P}_{f}(\mathbb{V} \times \mathbb{V})\). \[\frac{}{n \sqsubseteq n} \qquad \frac{t_1 \subseteq t_2}{t_1 \sqsubseteq t_2}\] Then we define \(\mathcal{E}_S\) as follows. \[\begin{aligned} {\mathcal{E}_S[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_S[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_S[\![ x ]\!]}\rho &= \{ v \mid v \sqsubseteq \rho(x) \} \\ {\mathcal{E}_S[\![ {\lambda x.\,} e ]\!]}\rho &= \{ t \mid \forall (v,v')\in t.\, v' \in {\mathcal{E}_S[\![ e ]\!]}\rho(x{:=}v) \} \\ {\mathcal{E}_S[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v \, \middle| \begin{array}{l} \exists t\, v_2\, v_3\, v_3'.\, t {\in} {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land v_2 {\in} {\mathcal{E}_S[\![ e_2 ]\!]}\rho \\ \land\, (v_3, v_3') \in t \land v_3 \sqsubseteq v_2 \land v \sqsubseteq v_3' \end{array} \right\} \\ {\mathcal{E}_S[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_S[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] In my semantics, \(\rho\) maps a variable to a single value. The \(v_3 \sqsubseteq v_2\) in my semantics corresponds to the uses of \(\subseteq\) in Plotkin and Engeler’s. One can view this as a kind of subsumption, allowing the use of a larger approximation of a function in places where a smaller approximation is needed. I’m not sure whether all the other uses of \(\sqsubseteq\) are necessary, but the semantics needs to be downward closed, and the above placement of \(\sqsubseteq\)’s makes this easy to prove.
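Here is the \(\sqsubseteq\) ordering in the Haskell sketch, reusing the VS datatype from earlier (names are mine):

```haskell
-- ⊑ is equality on integers and ⊆ on function graphs.
sqsubseteq :: VS -> VS -> Bool
sqsubseteq (NS n1) (NS n2) = n1 == n2
sqsubseteq (TS t1) (TS t2) = all (`elem` t2) t1
sqsubseteq _ _ = False

-- For example, the smaller graph {(0,1)} approximates incS from above:
--   sqsubseteq (TS [(NS 0, NS 1)]) incS  ==  True
```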

Relational Semantics

For people like myself with a background in operational semantics, there is another view of the semantics that is helpful to look at. We can turn the above denotational semantics into a relational semantics (like a big-step semantics) that hides the \(\mathcal{P}\) by making use of the following isomorphism (where \(\mathbb{V}\) is one of \(\mathbb{V}_S\), \(\mathbb{V}_P\), or \(\mathbb{V}_E\)). \[\mathbb{E} \to (\mathbb{X}\rightharpoonup \mathbb{V}) \to {\mathcal{P}(\mathbb{V})} \quad\cong\quad \mathcal{P}(\mathbb{E} \times (\mathbb{X}\rightharpoonup \mathbb{V}) \times \mathbb{V})\] Let \(v\) range over \(\mathbb{V}\). We can define the semantic relation \(\rho \vdash_S e \Rightarrow v\) that corresponds to \(\mathcal{E}_S\) as follows. Note that in the rule for lambda abstraction, the table \(t\) comes out of thin air (it is existentially quantified), and that there is one premise in the rule per entry in the table, that is, we have the quantification \(\forall(v,v') \in t\). \[\begin{gathered} \frac{}{\rho \vdash_S n \Rightarrow n} \quad \frac {\rho \vdash_S e_1 \Rightarrow n_1 \quad \rho \vdash_S e_2 \Rightarrow n_2} {\rho \vdash_S e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \sqsubseteq \rho(x)} {\rho \vdash_S x \Rightarrow v} \\[3ex] \frac{\forall (v,v'){\in} t.\; \rho(x{:=}v) \vdash_S e \Rightarrow v'} {\rho \vdash_S {\lambda x.\,}e \Rightarrow t} \quad \frac{\begin{array}{c}\rho \vdash_S e_1 \Rightarrow t \quad \rho \vdash_S e_2 \Rightarrow v_2 \\ (v_3,v'_3) \in t \quad v_3 \sqsubseteq v_2 \quad v \sqsubseteq v'_3 \end{array} } {\rho \vdash_S (e_1{\;}e_2) \Rightarrow v} \\[3ex] \frac{\rho \vdash_S e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_S e_2 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_S e_1 \Rightarrow 0 \quad \rho \vdash_S e_3 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]
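For instance, \(\rho \vdash_S (\lambda x.\, x+1)\;2 \Rightarrow 3\) is derivable by conjuring the singleton table \(t = \{(2,3)\}\). The lambda rule requires \(\rho(x{:=}2) \vdash_S x + 1 \Rightarrow 3\), which follows from \(\rho(x{:=}2) \vdash_S x \Rightarrow 2\) (since \(2 \sqsubseteq 2\)) and \(\rho(x{:=}2) \vdash_S 1 \Rightarrow 1\). The application rule then fires with \((v_3, v'_3) = (2,3) \in t\), \(2 \sqsubseteq 2\), and \(3 \sqsubseteq 3\).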

For comparison, let us also turn Plotkin’s semantics into a relation. \[\begin{gathered} \frac{}{\rho \vdash_P n \Rightarrow n} \quad \frac {\rho \vdash_P e_1 \Rightarrow n_1 \quad \rho \vdash_P e_2 \Rightarrow n_2} {\rho \vdash_P e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \in \rho(x)} {\rho \vdash_P x \Rightarrow v} \\[3ex] \frac{\forall v' \in V'.\, \rho(x{:=}V) \vdash_P e \Rightarrow v'} {\rho \vdash_P {\lambda x.\,}e \Rightarrow (V,V')} \quad \frac{\begin{array}{c}\rho \vdash_P e_1 \Rightarrow (V,V') \quad \forall v_2 \in V.\, \rho \vdash_P e_2 \Rightarrow v_2 \\ v' \in V' \end{array} } {\rho \vdash_P (e_1{\;}e_2) \Rightarrow v'} \\[3ex] \frac{\rho \vdash_P e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_P e_2 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_P e_1 \Rightarrow 0 \quad \rho \vdash_P e_3 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\] Recall that in Plotkin’s semantics, the environment maps variables to finite sets of values. The “set” is needed to handle the case of a function bound to a variable, but is just extra baggage when we have an integer bound to a variable. So in the variable rule we have \(v \in \rho(x)\), which either extracts a singleton integer from \(\rho(x)\), or extracts one input-output entry from a function’s graph. Moving on to the lambda rule, it only produces one input-output entry, but to handle the case when the output \(V'\) is representing a function, we must build it up one entry at a time with the quantification \(\forall v'\in V'\) and a finite but arbitrary number of premises. In the application rule we again have a finite number of premises, with \(\forall v_2\in V\), and also the premise \(v' \in V'\).

The relational version of Engeler’s semantics removes the need for quantification in the lambda rule, but the application rule still has \(\forall v_2 \in V\). \[\begin{gathered} \frac{\rho(x{:=}V) \vdash_E e \Rightarrow v'} {\rho \vdash_E {\lambda x.\,}e \Rightarrow (V,v')} \quad \frac{\begin{array}{c}\rho \vdash_E e_1 \Rightarrow (V,v') \quad \forall v_2 \in V.\, \rho \vdash_E e_2 \Rightarrow v_2 \end{array} } {\rho \vdash_E (e_1{\;}e_2) \Rightarrow v'}\end{gathered}\]

Conclusion

My semantics is similar to Plotkin and Engeler’s in that

  1. The domain is a set of values, and values are inductively defined and involve finite sets.

  2. Self application is enabled by allowing a kind of subsumption on functions.

The really nice thing about all three semantics is that they are simple; very little mathematics is necessary to understand them, which is important pedagogically, practically (easier for practitioners to apply such semantics), and aesthetically (Occam’s razor!).

My semantics is different from Plotkin and Engeler’s in that

  1. the definition of values places \(\mathcal{P}_f\) so that functions are literally represented by finite graphs, and

  2. environments map each variable to a single value, and

  3. \(\sqsubseteq\) is used instead of \(\subseteq\) to enable self application.

The upshot of these (relatively minor) differences is that my semantics may be easier to understand.