Friday, March 10, 2017

The Take 3 Semantics, Revisited

In my post about intersection types as denotations, I conjectured that the simple "take 3" denotational semantics is equivalent to an intersection type system. I haven't settled that question per se, but I've done something just as good, which is to show that everything that I've done with the intersection type system can also be done with the "take 3" semantics (with a minor modification).

Recall that the main difference between the "take 3" semantics and the intersection type system is how subsumption of functions is handled. The "take 3" semantics defined function application as follows, using the subset operator \(\sqsubseteq\) to require the argument \(v_2\) to include all the entries in the parameter \(v'_2\), while allowing \(v_2\) to have possibly more entries. \begin{align*} E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists v_1 v_2 v'_2.\, v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, \{ v'_2\mapsto v_3 \} \sqsubseteq v_1 \land v'_2 \sqsubseteq v_2 \end{array} \right\} \end{align*} Values are either numbers or functions. Functions are represented as a finite tables mapping values to values. \[ \begin{array}{lrcl} \text{tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] and \(\sqsubseteq\) is defined as equality on numbers and subset for function tables: \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} Recall that \(\subseteq\) is defined in terms of equality on elements.

In an intersection type system (without subsumption), function application uses subtyping. Here's one way to formulate the typing rule for application: \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A' \to B \quad A <: A'} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Types are defined as follows \[ \begin{array}{lrcl} \text{types} & A,B,C & ::= & n \mid A \to B \mid A \land B \mid \top \end{array} \] and the subtyping relation is given below. \begin{gather*} \frac{}{n <: n}(a) \quad \frac{}{\top <: \top}(b) \quad \frac{}{A \to B <: \top}(c) \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(d) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B}(e) \quad \frac{}{A \wedge B <: A}(f) \quad \frac{}{A \wedge B <: B}(g) \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)}(h) \end{gather*} Recall that values and types are isomorphic (and dual) to eachother in this setting. Here's the functions \(\mathcal{T}\) and \(\mathcal{V}\) that map back and forth between values and types. \begin{align*} \mathcal{T}(n) &= n \\ \mathcal{T}( \{ v_1 \mapsto v'_1, \ldots, v_n \mapsto v'_n \} ) &= \mathcal{T}(v_1) {\to} \mathcal{T}(v'_1) \land \cdots \land \mathcal{T}(v_n) {\to} \mathcal{T}(v'_n) \\[2ex] \mathcal{V}(n) &= n \\ \mathcal{V}(A \to B) &= \{ \mathcal{V}(A)\mapsto\mathcal{V}(B) \} \\ \mathcal{V}(A \land B) &= \mathcal{V}(A) \cup \mathcal{V}(B)\\ \mathcal{V}(\top) &= \emptyset \end{align*}

Given that values and types are really the same, the the typing rule for application is almost the same as the equation for the denotation of \(E[\!| e_1\;e_2 |\!](\rho)\). The only real difference is the use of \(<:\) versus \(\sqsubseteq\). However, subtyping is a larger relation than \(\sqsubseteq\), i.e., \(v_1 \sqsubseteq v_2\) implies \(\mathcal{T}(v_1) <: \mathcal{T}(v_2)\) but it is not the case that \(A <: B\) implies \(\mathcal{V}(A) \sqsubseteq \mathcal{V}(B)\). Subtyping is larger because of rules \((d)\) and \((h)\). The other rules just express the dual of \(\subseteq\).

So the natural question is whether subtyping needs to be bigger than \(\sqsubseteq\), or would we get by with just \(\sqsubseteq\)? In my last post, I mentioned that rule \((h)\) was not necessary. Indeed, I removed it from the Isabelle formalization without disturbing the proofs of whole-program soundness and completeness wrt. operational semantics, and was able to carry on and prove soundness wrt. contextual equivalence. This morning I also replaced rule \((d)\) with a rule that only allows equal function types to be subtypes. \[ \frac{}{A \to B <: A \to B}(d') \] The proofs went through again! Though I did have to make two minor changes in the type system without subsumption to ensure that it stays equivalent to the version of the type system with subsumption. I used the rule given above for function application instead of \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A \to B} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Also, I had to change the typing rule for \(\lambda\) to use subtyping to relate the body's type to the return type. \[ \frac{\Gamma,x:A \vdash e : B' \qquad B' <: B} {\Gamma \vdash \lambda x.\, e : A \to B} \] Transposing this back into the land of denotational semantics and values, we get the following equation for the meaning of \(\lambda\), in which everything in the return specification \(v_2\) must be contained in the value \(v'_2\) produced by the body. \[ E[\!| \lambda x.\; e |\!] (\rho) = \left\{ v \middle| \begin{array}{l}\forall v_1 v_2. \{v_1\mapsto v_2\} \sqsubseteq v \implies \\ \exists v_2'.\; v'_2 \in E[\!| e |\!] (\rho(x{:=}v_1)) \,\land\, v_2 \sqsubseteq v'_2 \end{array} \right\} \]

So with this little change, the "take 3" semantics is a great semantics for the call-by-value untyped lambda calculus! For whole programs, it's sound and complete with respect to the standard operational semantics, and it is also sound with respect to contextual equivalence.

Wednesday, March 08, 2017

Sound wrt. Contextual Equivalence

The ICFP paper submission deadline kept me busy for much of February, but now I'm back to thinking about the simple denotational semantics of the lambda calculus. In previous posts I showed that this semantics is equivalent to standard operational semantics when considering the behavior of whole programs. However, sometimes it is necessary to reason about the behavior of program fragments and we would like to use the denotational semantics for this as well. For example, an optimizing compiler might want to exchange one expression for another less-costly expression that does the same job.

The formal notion of two such ``exchangeable'' expressions is contextual equivalence (Morris 1968). It says that two expression are equivalent if plugging them into an arbitrary context produces programs that behave the same.

Definition (Contextual Equivalence)
Two expressions \(e_1\) and \(e_2\) are contextually equivalent, written \(e_1 \simeq e_2\), iff for any closing context \(C\), \[ \mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]). \]

We would like to know that when two expressions are denotationally equal, then they are also contextually equivalent.

Theorem (Sound wrt. Contextual Equivalence)
If \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\), then \(e_1 \simeq e_2\).

The rest of the blog post gives an overview of the proof (except for the discussion of related work at the very end). The details of the proof are in the Isabelle mechanization. But first we need to define the terms used in the above statements.


Recall that our denotational semantics is defined in terms of an intersection type system. The meaning of an expression is the set of all types assigned to it by the type system. \[ E[e]\Gamma \equiv \{ A \mid \Gamma \vdash_2 e : A \} \] Recall that the types include singletons, functions, intersections, and a top type: \[ A,B,C ::= n \mid A \to B \mid A \land B \mid \top \] I prefer to think of these types as values, where the function, intersection, and top types are used to represent finite tables that record the input-output values of a function.

The intersection type system that we use here differs from the one in the previous post in that we remove the subsumption rule and sprinkle uses of subtyping elsewhere in a standard fashion (Pierce 2002).

\begin{gather*} \frac{}{\Gamma \vdash_2 n : n} \\[2ex] \frac{} {\Gamma \vdash_2 \lambda x.\, e : \top} \quad \frac{\Gamma \vdash_2 \lambda x.\, e : A \quad \Gamma \vdash_2 \lambda x.\, e : B} {\Gamma \vdash_2 \lambda x.\, e : A \wedge B} \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash_2 x : A} \quad \frac{\Gamma,x:A \vdash_2 B} {\Gamma \vdash_2 \lambda x.\, e : A \to B} \\[2ex] \frac{\Gamma \vdash_2 e_1: C \quad C <: A \to B \quad \Gamma \vdash_2 e_2 : A} {\Gamma \vdash_2 e_1 \; e_2 : B} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash_2 e_1 : A \quad A <: n_1 \\ \Gamma \vdash_2 e_2 : B \quad B <: n_2 \end{array} \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash_2 \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: 0 \quad \Gamma \vdash_2 e_3 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: n \quad n \neq 0 \quad \Gamma \vdash_2 e_2 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \end{gather*}

Regarding subtyping, we make a minor change and leave out the rule \[ \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \] because I had a hunch that it wasn't needed to prove Completeness with respect to the small step semantics, and indeed it was not. So the subtyping relation is defined as follows.

\begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \end{gather*}

This type system is equivalent to the one with subsumption in the following sense.

Theorem (Equivalent Type Systems)

  1. If \(\Gamma \vdash e : A\), then \(\Gamma \vdash_2 e : A'\) and \(A' <: A\) for some \(A'\).
  2. If \(\Gamma \vdash_2 e : A\), then \(\Gamma \vdash e : A\).
The proofs of the two parts are straightforward inductions on the derivations of the typing judgments. QED

This type system satisfies the usual progress and preservation properties.

Theorem (Preservation)
If \(\Gamma \vdash_2 e : A\) and \(e \longrightarrow e'\), then \(\Gamma \vdash_e e' : A'\) and \(A' <: A\) for some \(A'\).
The proof of preservation is by induction on the derivation of the reduction. The case for \(\beta\) reduction relies on lemmas about substitution and type environments. QED

Theorem (Progress)
If \(\emptyset \vdash_2 e : A\) and \(\mathrm{FV}(e) = \emptyset\), then \(e\) is a value or \(e \longrightarrow e'\) for some \(e'\).
The proof of progress is by induction on the typing derivation. As usual it relies on a canonical forms lemma. QED

Lemma (Canonical forms)
Suppose \(\emptyset \vdash_2 v : A\).

  1. If \(A <: n\), then \(v = n\).
  2. If \(A <: B \to C\), then \(v = \lambda x.\, e\) for some \(x,e\).

Next we turn to the definition of \(\mathit{eval}\). As usual, we shall define the behavior of a program in terms of the operational (small-step) semantics and an \(\mathit{observe}\) function. \begin{align*} \mathit{eval}(e) &= \begin{cases} \mathit{observe}(v) & \text{if } e \longrightarrow^{*} v \\ \mathtt{bad} & \text{otherwise} \end{cases}\\ \mathit{observe}(n) &= n \\ \mathit{observe}(\lambda x.\, e) &= \mathtt{fun} \end{align*} In the above we categorize programs as \(\mathtt{bad}\) if they do not produce a value. Thus, we are glossing over the distinction between programs that diverge and programs that go wrong (e.g., segmentation fault). We do this because our denotational semantics does not make such a distinction. However, I plan to circle back to this issue in the future and develop a version of the semantics that does.

Soundness wrt. Contextual Equivalence

We assume that \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\) and need to show that \(e_1 \simeq e_2\). That is, we need to show that \(\mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]) \) for any closing context \(C\). We shall prove Congruence which lets us lift the denotational equality of \(e_1\) and \(e_2\) through any context, so we have \begin{equation} E[C[e_1]]\emptyset = E[C[e_2]]\emptyset \qquad\qquad (1) \end{equation} Now let us consider the cases for \(\mathsf{eval}(C[e_1])\).

  • Case \(\mathsf{eval}(C[e_1]) = \mathit{observe}(v)\) and \(C[e_1] \longrightarrow^{*} v\):
    By Completeness of the intersection type system we have \(\emptyset \vdash_2 C[e_1] : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\). Then with (1) we have \begin{equation} \emptyset \vdash_2 C[e_2] : A \qquad\qquad (2) \end{equation} The type system is sound wrt. the big-step semantics, so \(\emptyset \vdash C[e_2] \Downarrow v'\) for some \(v'\). Therefore \(C[e_2] \longrightarrow^{*} v''\) because the big-step semantics is sound wrt. the small-step semantics. It remains to show that \(\mathit{observe}(v'') = \mathit{observe}(v)\). From (2) we have \(\emptyset \vdash_2 v'' : A''\) for some \(A''\) where \(A'' <: A\), by Preservation. Noting that we already have \(\emptyset \vdash_2 v : A'\), \(\emptyset \vdash_2 v'' : A''\), \(A' <: A\), and \(A'' <: A\), we conclude that \(\mathit{observe}(v) = \mathit{observe}(v'')\) by the Lemma Observing values of subtypes.
  • Case \(\mathsf{eval}(C[e_1]) = \mathtt{bad}\):
    So \(C[e_1]\) either diverges or gets stuck. In either case, we have \(E[C[e_1]]\emptyset = \emptyset \) (Lemmas Diverging programs have no meaning and Programs that get stuck have no meaning). So by (1) we have \(E[C[e_2]]\emptyset = \emptyset\). We conclude that \(C[e_2]\) either diverges or gets stuck by Lemma (Programs with no meaning diverge or get stuck). Thus, \(\mathsf{eval}(C[e_2]) = \mathtt{bad}\).

Lemma (Congruence)
Let \(C\) be an arbitrary context. If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[C[e_1]]\Gamma = E[C[e_2]]\Gamma\).
We prove congruence by structural induction on the context \(C\), using the induction hypothesis and the appropriate Compatibility lemma for each kind of expression. QED

Most of the Compatibility lemmas are straightforward, though the one for abstraction is worth discussing.

Lemma (Compatibility for abstraction)
If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[\lambda x.\, e_1]\Gamma = E[\lambda x.\, e_2]\Gamma\).
To prove compatibility for abstractions, we first prove that

If \(\Gamma' \vdash_2 e_1 : B\) implies \(\Gamma' \vdash_2 e_2 : B\) for any \(\Gamma',B\), then \(\Gamma \vdash_2 \lambda x.\, e_1 : C\) implies \(\Gamma \vdash_2 \lambda x.\, e_2 : C\).
This is a straightforward induction on the type \(C\). Compatibility follows by two uses this fact. QED

Theorem (Completeness wrt. small-step semantics) If \(e \longrightarrow^{*} v\) then \(\emptyset \vdash_2 e : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\).
We have \(\emptyset \vdash e : B\) and \(\emptyset \vdash v : B\) by Completeness of the type system with subsumption. Therefore \(\emptyset \vdash_2 e : A\) and \(A <: B\) by Theorem Equivalent Type Systems. By preservation we conclude that \(\emptyset \vdash_2 v : A'\) and \(A' <: A\). QED

In a previous blog post, we proved soundness with respect to big-step semantics for a slightly different denotational semantics. So we update that proof for the denotational semantics defined above. We shall make use of the following logical relation \(\mathcal{G}\) in this proof. \begin{align*} G[n] &= \{ n \} \\ G[A \to B] &= \{ \langle \lambda x.\, e, \rho \rangle \mid \forall v \in G[A]. \; \rho(x{:=}v) \vdash e \Downarrow v' \text{ and } v' \in G[B] \} \\ G[A \land B] &= G[A] \cap G[B] \\ G[\top] &= \{ v \mid v \in \mathrm{Values} \} \\ \\ G[\emptyset] &= \{ \emptyset \} \\ G[\Gamma,x:A] &= \{ \rho(x{:=}v) \mid v \in G[A] \text{ and } \rho \in G[\Gamma] \} \end{align*}

We shall need two lemmas about this logical relation.

Lemma (Lookup in \(\mathcal{G}\))
If \(x:A \in \Gamma\) and \(\rho \in G[\Gamma]\), then \(\rho(x) = v\) and \(v \in G[A]\).

Lemma (\(\mathcal{G}\) preserves subtyping )
If \(A <: B\) and \(v \in G[A]\), then \(v \in G[B]\).

Theorem (Soundness wrt. big-step semantics)
If \(\Gamma \vdash_2 e : A\) and \(\rho \in G[\Gamma]\), then \(\rho \vdash e \Downarrow v\) and \(v \in G[A]\).
The proof is by induction on the typing derivation. The case for variables uses the Lookup Lemma and all of the elimination forms use the above Subtyping Lemma (because their typing rules use subtyping). QED

Lemma (Observing values of subtypes)
If \(\emptyset \vdash_2 v : A\), \(\emptyset \vdash_2 v' : B\), \(A <: C\), and \(B <: C\), then \(\mathit{observe}(v) = \mathit{observe}(v')\).
The proof is by cases of \(v\) and \(v'\). We use Lemmas about the symmetry of subtyping for singletons, an inversion lemma for functions, and that subtyping preserves function types. QED

Lemma (Subtyping symmetry for singletons) If \(n <: A\), then \(A <: n\).

For the next lemma we need to characterize the types for functions. \begin{gather*} \frac{}{\mathit{fun}(A \to B)} \quad \frac{\mathit{fun}(A) \qquad \mathit{fun}(B)} {\mathit{fun}(A \land B)} \quad \frac{}{\mathit{fun}(\top)} \end{gather*}

Lemma (Inversion on Functions)
If \(\Gamma \vdash_2 \lambda x.\, e : A\), then \(\mathit{fun}(A)\).

Lemma (Subtyping preserves functions)
If \(A <: B\) and \(\mathit{fun}(A)\), then \(\mathit{fun}(B)\).

Lemma (Diverging Programs have no meaning)
If \(e\) diverges, then \(E[e]\emptyset = \emptyset\).
Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Then by soundness wrt. big-step semantics, we have \(\emptyset \vdash e \Downarrow v\) and so also \(e \longrightarrow^{*} v'\). But this contradicts the premise that \(e\) diverges. QED

Lemma (Programs that get stuck have no meaning)
Suppose that \(e \longrightarrow^{*} e'\) and \(e'\) is stuck (and not a value). Then \(E[e]\emptyset = \emptyset\).
Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Therefore \(\emptyset \vdash_2 e' : A'\) for some \(A' <: A\). By Progress, either \(e'\) is a value or it can take a step. But that contradicts the premise. QED

Lemma (Programs with no meaning diverge or gets stuck)
If \(E[e]\emptyset = \emptyset\), then \(e\) diverges or reduces to a stuck non-value.
Towards a contradiction, suppose that \(e\) does not diverge and does not reduce to a stuck non-value. So \(e \longrightarrow^{*} v\) for some \(v\). But then by Completeness wrt. the small-step semantics, we have \(\emptyset \vdash_2 e : A\) for some \(A\), which contradicts the premise \(E[e]\emptyset = \emptyset\). QED

Related Work

The proof method used here, of proving Compatibility and Congruence lemmas to show soundness wrt. contextual equivalence, is adapted from Gunter's book (1992), where he proves that the standard model for PCF (CPO's and continuous functions) is sound. This approach is also commonly used to show that logical relations are sound wrt. contextual equivalence (Pitts 2005).

The problem of full abstraction is to show that denotational equivalence is both sound (aka. correct): \[ E[e_1] = E[e_2] \qquad \text{implies} \qquad e_1 \simeq e_2 \] and complete: \[ e_1 \simeq e_2 \qquad \text{implies} \qquad E[e_1] = E[e_2] \] with respect to contextual equivalence (Milner 1975). Here we showed that the simple denotational semantics is sound. I do not know whether it is complete wrt. contextual equivalence.

There are famous examples of denotational semantics that are not complete. For example, the standard model for PCF is not complete. There are two expressions in PCF that are contextually equivalent but not denotationally equivalent (Plotkin 1977). The idea behind the counter-example is that parallel-or cannot be defined in PCF, but it can be expressed in the standard model. The two expressions are higher-order functions constructed to behave differently only when applied to parallel-or.

Rocca and Paolini (2004) define a filter model \(\mathcal{V}\) for the call-by-value lambda calculus, similar to our simple denotational semantics, and prove that it is sound wrt. contextual equivalence (Theorem 12.1.18). Their type system and subtyping relation differs from ours in several ways. Their \(\land\,\mathrm{intro}\) rule is not restricted to \(\lambda\), they include subsumption, their \(\top\) type is a super-type of all types (not just function types), they include the distributivity rule discussed at the beginning of this post, and they include a couple other rules (labeled \((g)\) and \((v)\) in Fig. 12.1). I'm not sure whether any of these differences really matter; the two systems might be equivalent. Their proof is quite different from ours and more involved; it is based on the notion of approximants. They also show that \(\mathcal{V}\) is incomplete wrt. contextual equivalence, but go on to create another model based on \(\mathcal{V}\) that is. The fact that \(\mathcal{V}\) is incomplete leads me suspect that \(\mathcal{E}\) is also incomplete. This is certainly worth looking into.

Abramsky (1990) introduced a domain logic whose formulas are intersetion types: \[ \phi ::= \top \mid \phi \land \phi \mid \phi \to \phi \] and whose proof theory is an intersection type system designed to capture the semantics of the lazy lambda calculus. Abramsky proves that it is sound with respect to contextual equivalence. As far as I can tell, the proof is different than the approach used here, as it shows that the domain logic is sound with respect to a denotational semantics that solves the domain equation \(D = (D \to D)_\bot\), then shows that this denotational semantics is sound wrt. contextual equivalence. (See also Alan Jeffrey (1994).)