tag:blogger.com,1999:blog-111622302018-06-13T01:39:33.030-07:00Jeremy SiekMusing about programming languages and computer science.Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.comBlogger44125tag:blogger.com,1999:blog-11162230.post-84932132937730475072018-04-24T20:31:00.000-07:002018-04-29T13:11:24.887-07:00What do real numbers have in common with lambdas? and what does continuity have to do with it?<h1 id="continuous-functions-over-the-real-numbers">Continuous functions over the real numbers</h1>As a high school student and undergraduate I learned in Calculus that<br /><ol><li>real numbers involve infinity in precision, e.g. some have no finite decimal representation, and</li><li>a continuous function forms an unbroken line, a necessary condition to be differentiable.</li></ol>For an example, the decimal representation of <span class="math">\(\sqrt 2\)</span> goes on forever: <span class="math">\[1.41421 \ldots\]</span> Later on, in a course on Real Analysis, I learned that one way to define the real numbers is to declare them to be Cauchy sequences, that is, infinite sequences of rational numbers that get closer and closer together. So, for example, <span class="math">\(\sqrt 2\)</span> is declared to be the sequence <span class="math">\(1, \frac{3}{2}, \frac{17}{12}, \frac{577}{408}, \ldots\)</span> described by the following recursive formulas.<br /><span class="math">\[A_0 = 1 \qquad A_{n+1} = \frac{A_n}{2} + \frac{1}{A_n} \hspace{1in} (1) \label{eq:caucy-sqrt-2}\]</span><br />Depending on how close an approximation to <span class="math">\(\sqrt 2\)</span> you need, you can go further out in this sequence. (Alternatively, one can represent \(\sqrt 2\) by its sequence of continued fractions.)<br />For an example of a continuous function, Figure 1 depicts <span class="math">\(x^3 - x^2 - 4x\)</span>. On the other hand, Figures 2 and 3 depict functions that are not continuous. The function <span class="math">\(1/\mathrm{abs}(x-\sqrt 2)^{1/4}\)</span> in Figure 2 is not continuous because it goes to infinity as it approaches <span class="math">\(\sqrt 2\)</span>. The function <span class="math">\((x+1)\,\mathrm{sign}(x)\)</span> in Figure 3 is not continuous because it jumps from <span class="math">\(-1\)</span> to <span class="math">\(1\)</span> at <span class="math">\(0\)</span>.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://4.bp.blogspot.com/-RxcwPUmZd5Q/Wt_wvojTXnI/AAAAAAAAAqc/L-SrNhUvdZYXtmvGnBkeTzqiOx1ElA22wCLcBGAs/s1600/graph-poly.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="413" data-original-width="421" height="313" src="https://4.bp.blogspot.com/-RxcwPUmZd5Q/Wt_wvojTXnI/AAAAAAAAAqc/L-SrNhUvdZYXtmvGnBkeTzqiOx1ElA22wCLcBGAs/s320/graph-poly.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 1. The function <span class="math">\(x^3 - x^2 - 4x\)</span> is continuous.</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/-5tGSVRZAfXA/Wt_xF_TtSHI/AAAAAAAAAqk/U-Gqkx95LBcxJc3yCDgVJAYnUfAW9qUPgCLcBGAs/s1600/graph-inv-abs-exp.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="417" data-original-width="415" height="320" src="https://2.bp.blogspot.com/-5tGSVRZAfXA/Wt_xF_TtSHI/AAAAAAAAAqk/U-Gqkx95LBcxJc3yCDgVJAYnUfAW9qUPgCLcBGAs/s320/graph-inv-abs-exp.png" width="318" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2. The function <span class="math">\(1/\mathrm{abs}(x-\sqrt 2)^{1/4}\)</span> is not continuous at <span class="math">\(\sqrt 2\)</span>.</td></tr></tbody></table><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://3.bp.blogspot.com/-aCX0P8Pc4lY/Wt_xWsPmQ7I/AAAAAAAAAqs/u7DKqP2OrWc2K_vItsfS7v6P9rcPZiuSQCLcBGAs/s1600/graph-xp1-sign-x.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="418" data-original-width="416" height="320" src="https://3.bp.blogspot.com/-aCX0P8Pc4lY/Wt_xWsPmQ7I/AAAAAAAAAqs/u7DKqP2OrWc2K_vItsfS7v6P9rcPZiuSQCLcBGAs/s320/graph-xp1-sign-x.png" width="318" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 3. The function <span class="math">\((x+1)\,\mathrm{sign}(x)\)</span> is not continuous at <span class="math">\(0\)</span>.</td></tr></tbody></table><br />You may recall the <span class="math">\(\epsilon\)</span>-<span class="math">\(\delta\)</span> definition of continuity, stated below and depicted in Figure 4.<br /><blockquote>A function <span class="math">\(f\)</span> is continuous at a point <span class="math">\(x\)</span> if for any <span class="math">\(\epsilon > 0\)</span> there exists a <span class="math">\(\delta > 0\)</span> such that for any <span class="math">\(x'\)</span> in the interval <span class="math">\((x - \delta,x+\delta)\)</span>, <span class="math">\(f(x')\)</span> is in <span class="math">\((f(x) -\epsilon, f(x) + \epsilon)\)</span>.</blockquote>In other words, when a function is continuous, if you want to determine its result with an accuracy of <span class="math">\(\epsilon\)</span>, you need to measure the input with an accuracy of <span class="math">\(\delta\)</span>.<br /><br /><div class="caption"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://4.bp.blogspot.com/-_o3YNt3v-Pk/Wt_yAu7IE0I/AAAAAAAAAq0/bj4RHK1HXhQu87nojl2Li1Zz1mrd-8x1ACLcBGAs/s1600/epsilon-delta.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="382" data-original-width="415" height="293" src="https://4.bp.blogspot.com/-_o3YNt3v-Pk/Wt_yAu7IE0I/AAAAAAAAAq0/bj4RHK1HXhQu87nojl2Li1Zz1mrd-8x1ACLcBGAs/s320/epsilon-delta.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 4. The <span class="math">\(\epsilon\)</span>-<span class="math">\(\delta\)</span> definition of continuity.</td></tr></tbody></table></div>One connection between the infinite nature of real numbers and continuity that only recently sunk-in is that continuous functions are the ones that can be reasonably approximated by applying them to approximate, finitely-represented inputs. For example, suppose you wish to compute <span class="math">\(f(\sqrt 2)\)</span> for some continuous function <span class="math">\(f\)</span>. You can accomplish this by applying <span class="math">\(f\)</span> to each rational number in the Cauchy sequence for <span class="math">\(\sqrt 2\)</span> until two subsequent results are closer than your desired accuracy. On the other hand, consider trying to approximate the function from Figure 2 by applying it to rational numbers in the Cauchy sequence for <span class="math">\(\sqrt 2\)</span>. No matter how far down the sequence you go, you’ll still get a result that is wrong by an infinite margin!<br /><br /><h1 id="the-lambda-calculus-and-continuous-functions">The <span class="math">\(\lambda\)</span>-calculus and continuous functions</h1>In graduate school I studied programming languages and learned that<br /><ol><li>the <span class="math">\(\lambda\)</span>-calculus is a little language for creating and applying functions, and</li><li>Dana S. Scott’s semantics of the <span class="math">\(\lambda\)</span>-calculus interprets <span class="math">\(\lambda\)</span>’s as continuous functions.</li></ol>For example, the <span class="math">\(\lambda\)</span> expression <span class="math">\[\lambda x.\; x + 1\]</span> creates an anonymous function that maps its input <span class="math">\(x\)</span>, say a natural number, to the next greatest one. The graph of this function is <span class="math">\[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\}\]</span> which is infinite. So we have our first similarity between the real numbers and <span class="math">\(\lambda\)</span>’s, both involve infinity.<br />A key characteristic of the <span class="math">\(\lambda\)</span>-calculus is that functions can take functions as input. Thus, the semantics of the <span class="math">\(\lambda\)</span>-calculus is also concerned with functions over infinite entities (just like functions over the real numbers). For example, here is a <span class="math">\(\lambda\)</span> expression that takes a function <span class="math">\(f\)</span> and produces a function that applies <span class="math">\(f\)</span> twice in succession to its input <span class="math">\(x\)</span>. <span class="math">\[\lambda f.\; \lambda x.\; f(f(x))\]</span> The graph of this function is especially difficult to write down. Not only does it have an infinite domain and range, but each element in the domain and range is an infinite entity. <span class="math">\[\left\{ \begin{array}{l} \{ 0\mapsto 1, 1\mapsto 2, 2\mapsto 3, \ldots \} \mapsto \{ 0\mapsto 2, 1\mapsto 3, 2\mapsto 4, \ldots \},\\ \{ 0\mapsto 0, 1\mapsto 2, 2\mapsto 4, \ldots \} \mapsto \{ 0\mapsto 0, 1\mapsto 4, 2\mapsto 8, \ldots \},\\ \ldots \end{array} \right\}\]</span><br />Denotational semantics for the <span class="math">\(\lambda\)</span>-calculus interpret <span class="math">\(\lambda\)</span>’s as continuous functions, so just based on the terminology there should be another similarity with real numbers! However, these continuous functions are over special sets called domains, not real numbers, and the definition of continuity in this setting bears little resemblance to the <span class="math">\(\epsilon\)</span>-<span class="math">\(\delta\)</span> definition. For example, in Dana S. Scott’s classic paper <i>Data Types as Lattices</i>, the domain is the powerset of the natural numbers, <span class="math">\(\mathcal{P}(\mathbb{N})\)</span>. This domain can be used to represent a function's graph by encoding (create a bijection) between pairs and natural numbers, and between sets and naturals. The following are the easier-to-specify directions of the two bijections, the mapping from pairs to naturals and the mapping from naturals to sets of naturals.<br /><span class="math">\[\begin{aligned} \langle n, m \rangle &= 2^n (2m+1) - 1 \\ \mathsf{set}(0) &= \emptyset \\ \mathsf{set}(1+k) &= \{ m \} \cup \mathsf{set}(n) & \text{if } \langle n, m \rangle = k\end{aligned}\]</span><br />Scott defines the continuous functions on <span class="math">\(\mathcal{P}(\mathbb{N})\)</span> as those functions <span class="math">\(h\)</span> that satisfy<br /><span class="math">\[h(f) = \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \hspace{1in} (2) \label{eq:cont-pn}\]</span><br />In other words, the value of a continuous function <span class="math">\(h\)</span> on some function <span class="math">\(f \in \mathcal{P}(\mathbb{N})\)</span> must be the same as the union of applying <span class="math">\(h\)</span> to all the finite subgraphs of <span class="math">\(f\)</span>. One immediately wonders, why are the <span class="math">\(\lambda\)</span>-definable functions continuous in this sense? Consider some <span class="math">\(\lambda\)</span> expression <span class="math">\(h\)</span> that takes as input a function <span class="math">\(f\)</span>.<br /><blockquote>But <span class="math">\(f\)</span> is a <i>function</i>; an infinite object. What does it mean to “compute” with an “infinite” argument? In this case it means most simply that <span class="math">\(h(f)\)</span> is determined by asking of <span class="math">\(f\)</span> finitely many questions: <span class="math">\(f(m_0), f(m_1), ..., f(m_{k-1})\)</span>. —Dana S. Scott, <i>A type-theoretical alternative to ISWIM, CUCH, OWHY</i>, 1969.</blockquote>Put another way, if <span class="math">\(h\)</span> terminates and returns a result, then it will only have had a chance to call <span class="math">\(f\)</span> finitely many times. So it suffices to apply <span class="math">\(h\)</span> instead to a finite subset of the graph of <span class="math">\(f\)</span>. However, we do not know up-front which subset of <span class="math">\(f\)</span> to use, but it certainly suffices to try all of them!<br /><br /><h1 id="sec:relating-cont">Relating the two kinds of continuity</h1>But what does equation (2) have to do with continuous functions over the real numbers? What does it have to do with the <span class="math">\(\epsilon\)-\(\delta\)</span> definition? This question has been in the back of my mind for some time, but only recently have I had the opportunity to learn the answer.<br />To understand how these two kinds of continuity are related, it helps to focus on the way that infinite entities can be approximated with finite ones in the two settings. We can approximate a real number with a rational interval. For example, refering back to the Cauchy sequence for <span class="math">\(\sqrt 2\)</span>, equation (1), we have <span class="math">\[\sqrt 2 \in \left(\frac{17}{12}, \frac{3}{2}\right)\]</span> Of course an approximation does not uniquely identify the thing it approximates. So there are other real numbers in this interval, such as <span class="math">\(\sqrt{2.1}\)</span>. <span class="math">\[\sqrt{2.1} \in \left(\frac{17}{12}, \frac{3}{2}\right)\]</span><br />Likewise we can approximate the infinite graph of a function with a finite part of its graph. For example, let <span class="math">\(G\)</span> be the a graph with just one input-output entry. <span class="math">\[G=\{ 1 \mapsto 2 \}\]</span> Then we consider <span class="math">\(G\)</span> to be an approximation of any function that agrees with <span class="math">\(G\)</span> (maps <span class="math">\(1\)</span> to <span class="math">\(2\)</span>), which is to say its graph is a superset of <span class="math">\(G\)</span>. So the set of all functions that are approximated by <span class="math">\(G\)</span> can be expressed with a set comprehension as follows: <span class="math">\(\{ f \mid G \subseteq f\}\)</span>. In particular, the function <span class="math">\(+1\)</span> that adds one to its input is approximated by <span class="math">\(G\)</span>. <span class="math">\[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\]</span> But also the function <span class="math">\(\times 2\)</span> that doubles its input is approximated by <span class="math">\(G\)</span>. <span class="math">\[\left\{ \begin{array}{l} 0\mapsto 0, \\ 1\mapsto 2, \\ 2\mapsto 4, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\]</span> Of course, a better approximation such as <span class="math">\(G'=\{1\mapsto 2, 2\mapsto 3\}\)</span> is able to tell these two functions apart.<br />The interval <span class="math">\((17/12, 3/2)\)</span> and the set <span class="math">\(\{f\mid G \subseteq f\}\)</span> are both examples of <i>neighborhoods</i> (aka. base elements) in a topological space. The field of Topology was created to study the essence of continuous functions, capturing the similarities and abstracting away the differences regarding how such functions work in different settings. A <i>topological space</i> is just some set <span class="math">\(X\)</span> together with a collection <span class="math">\(B\)</span> of neighborhoods, called a <i>base</i>, that must satisfy a few conditions that we won’t get into. We’ve already seen two topological spaces.<br /><ol><li>The real numbers form a topological space where each neighborhood consists of all the real numbers in a rational interval.</li><li>The powerset \(\mathcal{P}(\mathbb{N})\) forms a topological space where each neighborhood consists of all the functions approximated by a finite graph.</li></ol>The <span class="math">\(\epsilon\)</span>-<span class="math">\(\delta\)</span> definition of continuity generalizes to topological spaces: instead of talking about intervals, it talks generically about neighborhoods. In the following, the interval <span class="math">\((f(x) -\epsilon, f(x) + \epsilon)\)</span> is replaced by neighborhood <span class="math">\(E\)</span> and the interval <span class="math">\((x - \delta,x+\delta)\)</span> is replaced by neighborhood <span class="math">\(D\)</span>.<br /><blockquote>A function <span class="math">\(f\)</span> is continuous at a point <span class="math">\(x\)</span> if for any neighborhood <span class="math">\(E\)</span> that contains <span class="math">\(f(x)\)</span>, there exists a neighborhood <span class="math">\(D\)</span> that contains <span class="math">\(x\)</span> such that for any <span class="math">\(y\)</span> in <span class="math">\(D\)</span>, <span class="math">\(f(y)\)</span> is in <span class="math">\(E\)</span>.</blockquote>Now let us instantiate this topological definition of continuity into <span class="math">\(\mathcal{P}(\mathbb{N})\)</span>.<br /><blockquote>A function <span class="math">\(f\)</span> over <span class="math">\(\mathcal{P}(\mathbb{N})\)</span> is continuous at <span class="math">\(X\)</span> if for any finite set <span class="math">\(E\)</span> such that <span class="math">\(E \subseteq f(X)\)</span>, there exists a finite set <span class="math">\(D\)</span> with <span class="math">\(D \subseteq X\)</span> such that for any <span class="math">\(Y\)</span>, <span class="math">\(D \subseteq Y\)</span> implies <span class="math">\(E \subseteq f(Y)\)</span>.</blockquote>Hmm, this still doesn’t match up with the definition of continuity in equation (2) but perhaps they are equivalent. Let us take the above as the definition and try to prove equation (2).<br />First we show that <span class="math">\[h(f) \subseteq \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}\]</span> Let <span class="math">\(x'\)</span> be an arbitrary element of <span class="math">\(h(f)\)</span>. To show that <span class="math">\(x'\)</span> is in the right-hand side we need to identify some finite <span class="math">\(g\)</span> such that <span class="math">\(g \subseteq f\)</span> and <span class="math">\(x' \in h(g)\)</span>, that is, <span class="math">\(\{x'\} \subseteq h(g)\)</span>. But this is just what continuity gives us, taking <span class="math">\(h\)</span> as <span class="math">\(f\)</span>, <span class="math">\(f\)</span> as <span class="math">\(X\)</span>, <span class="math">\(\{x'\}\)</span> as <span class="math">\(E\)</span>, <span class="math">\(g\)</span> as <span class="math">\(D\)</span>, and also <span class="math">\(g\)</span> as <span class="math">\(Y\)</span>. Second we need show that <span class="math">\[\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \subseteq h(f)\]</span> This time let <span class="math">\(x'\)</span> be an element of <span class="math">\(\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}\)</span>. So we known there is some finite set <span class="math">\(g\)</span> such that <span class="math">\(x' \in h(g)\)</span> and <span class="math">\(g \subseteq f\)</span>. Of course <span class="math">\(\{x'\}\)</span> is a finite set and <span class="math">\(\{x'\} \subseteq h(g)\)</span>, so we can apply the definition of continuity to obtain a finite set <span class="math">\(E\)</span> such that <span class="math">\(E \subseteq g\)</span> and for all <span class="math">\(Y\)</span>, <span class="math">\(E \subseteq Y\)</span> implies <span class="math">\(\{x'\} \subseteq h(Y)\)</span>. From <span class="math">\(E \subseteq g\)</span> and <span class="math">\(g \subseteq f\)</span> we transitively have <span class="math">\(E \subseteq f\)</span>. So instantiating <span class="math">\(Y\)</span> with <span class="math">\(f\)</span> we have <span class="math">\(\{x'\} \subseteq h(f)\)</span> and therefore <span class="math">\(x' \in h(f)\)</span>.<br />We have shown that the topologically-derived definition of continuity for <span class="math">\(\mathcal{P}(\mathbb{N})\)</span> implies the definition used in the semantics of the <span class="math">\(\lambda\)</span>-calculus, i.e., equation (2). It is also straightforward to prove the other direction, taking equation (2) as given and proving that the topologically-derived definition holds. Thus, continuity for functions over real numbers really is similar to continuity for <span class="math">\(\lambda\)</span> functions, they are both instances of continuous functions in a topological space.<br /><br /><h1 id="continuous-functions-over-partial-orders">Continuous functions over partial orders</h1>In the context of Denotational Semantics, domains are often viewed as partial orders where the ordering <span class="math">\(g \sqsubseteq f\)</span> means that <span class="math">\(g\)</span> approximates <span class="math">\(f\)</span>, or <span class="math">\(f\)</span> is more informative than <span class="math">\(g\)</span>. The domain <span class="math">\(\mathcal{P}(\mathbb{N})\)</span> with set containment <span class="math">\(\subseteq\)</span> forms a partial order. Refering back to the examples in the first section, with <span class="math">\(G=\{ 1 \mapsto 2 \}\)</span> and <span class="math">\(G'=\{1\mapsto 2, 2\mapsto 3\}\)</span>, we have <span class="math">\(G \sqsubseteq G'\)</span>, <span class="math">\(G' \sqsubseteq +1\)</span>, and <span class="math">\(G \sqsubseteq \times 2\)</span>. In a partial order, the join <span class="math">\(x \sqcup y\)</span> of <span class="math">\(x\)</span> and <span class="math">\(y\)</span> is the least element that is greater than both <span class="math">\(x\)</span> and <span class="math">\(y\)</span>. For the partial order on <span class="math">\(\mathcal{P}(\mathbb{N})\)</span>, join corresponds to set union.<br />In the context of partial orders, continuity is defined with respect to infinite sequences of ever-better approximations: <span class="math">\[f_0 \sqsubseteq f_1 \sqsubseteq f_2 \sqsubseteq \cdots\]</span> A function <span class="math">\(h\)</span> is continuous if applying it to the join of the sequence is the same as applying it to each element of the sequence and then taking the join.<br /><span class="math">\[h\left(\bigsqcup_{n\in\mathbb{N}} f_n\right) = \bigsqcup_{n\in\mathbb{N}} h(f_n) \hspace{1in} (3) \label{eq:cont-cpo}\]</span><br />But this equation is not so different from the equation (2) that expresses continuity on <span class="math">\(\mathcal{P}(\mathbb{N})\)</span>. For any function <span class="math">\(f\)</span> (with infinite domain) we can find an sequence <span class="math">\((f_n)_{n=0}^{\infty}\)</span> of ever-better but still finite approximations of <span class="math">\(f\)</span> such that <span class="math">\[f = \bigsqcup_{n\in\mathbb{N}} f_n\]</span> Then both equation (2) and (3) tell us that <span class="math">\(h(f)\)</span> is equal to the union of applying <span class="math">\(h\)</span> to each <span class="math">\(f_n\)</span>.<br /><br /><h1 id="further-reading">Further Reading</h1>The following is the list of resources that I found helpful in trying to understand the relationship between real numbers, <span class="math">\(\lambda\)</span>’s, and the role of continuity.<br /><ul><li><i>Data Types as Lattices</i> by Dana S. Scott.</li><li><i>A type-theoretical alternative to ISWIM, CUCH, OWHY</i> by Dana S. Scott.</li><li><i>The Formal Semantics of Programming Languages</i> by Glynn Winskel.</li><li><i>Topology via Logic</i> by Steven Vickers.</li><li><i>Topology (2nd Edition)</i> by James R. Munkres.</li><li><i>Introduction to Lattices and Order</i> by B.A. Davey and H.A. Priestley.</li><li>The Wikipedia articles on</li><ul><li><a href="https://en.wikipedia.org/wiki/Continuous_function">continuous functions</a></li><li><a href="https://en.wikipedia.org/wiki/Computable_number">computable numbers</a></li><li><a href="https://en.wikipedia.org/wiki/Continued_fraction">continued fractions</a></li></ul></ul>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com4tag:blogger.com,1999:blog-11162230.post-4512500290471528332017-12-23T20:04:00.000-08:002017-12-23T20:04:31.310-08:00Putting the Function back in Lambda<p>Happy holidays! There’s nothing quite like curling up in a comfy chair on a rainy day and proving a theorem in your favorite proof assistant.</p><p>Lately I’ve been interested in graph models of the <span class="math">\(\lambda\)</span>-calculus, that is, models that represent a <span class="math">\(\lambda\)</span> with <em>relations</em> from inputs to outputs. The use of relations instead of functions is not a problem when reasoning about expressions that produce numbers, but it does introduce problems when reasoning about expressions that produce higher-order functions. Some of these expressions are contextually equivalent but not denotationally equivalent. For example, consider the following two expressions. <span class="math">\[{\lambda f.\,} (f {\;}0) + (f {\;}0) =_{\mathrm{ctx}} {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) \qquad\qquad (1)\]</span> The expression on the left-hand side has two copies of a common subexpression <span class="math">\((f {\;}0)\)</span>. The expression on the right-hand side is optimized to have just a single copy of <span class="math">\((f {\;}0)\)</span>. The left and right-hand expressions in equation (1) are contextually equivalent because the <span class="math">\(\lambda\)</span>-calculus is a pure language (no side effects), so whether we call <span class="math">\(f\)</span> once or twice does not matter, and it always returns the same result given the same input. Unfortunately, the two expressions in equation (1) are not denotationally equivalent. <span class="math">\[{\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset \neq {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset \qquad\qquad (2)\]</span> Recall that my semantics <span class="math">\(\mathcal{E}\)</span> maps an expression and environment to a set of values. The “set” is not because an expression produces multiple conceptually-different values. Sets are needed because we represent a (infinite) function as an infinite set of finite relations. So to prove the above inequality (2) we simply need to find a value that is in the set on the left-hand side that is not in the set on the right-hand side. The idea is that we consider the behavior when parameter <span class="math">\(f\)</span> is bound to a relation that is not a function. In particular, the relation <span class="math">\[R = \{ (0,1), (0,2) \}\]</span> Now when we consider the application <span class="math">\((f {\;}0)\)</span>, the semantics of function application given by <span class="math">\(\mathcal{E}\)</span> can choose the result to be either <span class="math">\(1\)</span> or <span class="math">\(2\)</span>. Furthermore, for the left-hand side of equation (2), it could choose <span class="math">\(1\)</span> for the first <span class="math">\((f {\;}0)\)</span> and <span class="math">\(2\)</span> for the second <span class="math">\((f {\;}0)\)</span> . Thus, the result of the function can be <span class="math">\(3\)</span>. <span class="math">\[\{ (R,3) \} \in {\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset\]</span> Of course, this function could never actually produce <span class="math">\(3\)</span> because <span class="math">\(R\)</span> does not correspond to any <span class="math">\(\lambda\)</span>’s. In other words, garbage-in garbage-out. Turning to the right-hand side of equation (2), there is only one <span class="math">\((f{\;}0)\)</span>, which can either produce <span class="math">\(1\)</span> or <span class="math">\(2\)</span>, so the result of the outer function can be <span class="math">\(2\)</span> or <span class="math">\(4\)</span>, but not <span class="math">\(3\)</span>.</p><p><span class="math">\[\begin{aligned} \{ (R,2) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,3) \} &\notin {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,4) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\end{aligned}\]</span></p><p>So we need to put the function back in <span class="math">\(\lambda\)</span>! That is, we need to restrict the notion of values so that all the relations are also functions. Recall the definition: a <em>function</em> <span class="math">\(f\)</span> is a relation on two sets <span class="math">\(A\)</span> and <span class="math">\(B\)</span> such that for all <span class="math">\(a \in A\)</span> there exists a unique <span class="math">\(b \in B\)</span> such that <span class="math">\((a,b) \in f\)</span>. In other words, if <span class="math">\((a,b) \in f\)</span> and <span class="math">\((a,b') \in f\)</span>, then necessarily <span class="math">\(b = b'\)</span>. Can we simply add this restriction to our notion of value? Not quite. If we literally applied this definition, we could still get graphs such as the following one, which maps two different approximations of the add-one function to different outputs. This graph does not correspond to any <span class="math">\(\lambda\)</span>. <span class="math">\[\{ (\{(0,1)\}, 2), (\{(0,1),(5,6) \}, 3) \}\]</span></p><p>So we need to generalize the notion of function to allow for differing approximations. We shall do this by generalizing from equality to consistency, written <span class="math">\(\sim\)</span>. Two integers are consistent when they are equal. Two graphs as consistent when they map consistent inputs to consistent outputs. We are also forced to explicitly define inconsistency, which we explain below.</p><p><span class="math">\[\begin{gathered} \frac{}{n \sim n} \qquad \frac{\begin{array}{l}\forall v_1 v'_1 v_2 v'_2, (v_1,v'_1) \in t_1 \land (v_2,v'_2) \in t_2 \\ \implies (v_1 \sim v_2 \land v'_1 \sim v'_2) \lor v_1 \not\sim v_2 \end{array}} {t_1 \sim t_2} \\[2ex] \frac{n_1 \neq n_2}{n_1 \not\sim n_2} \qquad \frac{(v_1,v'_1) \in t_1 \quad (v_2,v'_2) \in t_2 \quad v_1 \sim v_2 \quad v'_1 \not\sim v'_2} {t_1 \not\sim t_2} \\[2ex] \frac{}{n \not\sim t} \qquad \frac{}{t \not\sim n}\end{gathered}\]</span></p><p>The definition of consistency is made a bit more complicated than I expected because the rules of an inductive definition must be monotonic, so we can’t negate a recursive application or put it on the left of an implication. In the above definition of consistency for graphs <span class="math">\(t_1 \sim t_2\)</span>, it would have been more natural to say <span class="math">\(v_1 \sim v_2 \implies v'_1 \sim v'_2\)</span> in the premise, but then <span class="math">\(v_1 \sim v_2\)</span> is on the left of an implication. The above inductive definition works around this problem by mutually defining consistency and inconsistency. We then prove that inconsistency is the negation of consistency.<br /></p><p><strong>Proposition 1</strong> (Inconsistency) <span class="math">\(v_1 \not\sim v_2 = \neg (v_1 \sim v_2)\)</span><br /><em>Proof.</em> We first establish by mutual induction that <span class="math">\(v_1 \sim v_2 \implies \neg (v_1 \not\sim v_2)\)</span> and <span class="math">\(v_1 \not\sim v_2 \implies \neg (v_1 \sim v_2)\)</span>. We then show that <span class="math">\((v_1 \sim v_2) \lor (v_1 \not\sim v_2)\)</span> by induction on <span class="math">\(v_1\)</span> and case analysis on <span class="math">\(v_2\)</span>. Therefore <span class="math">\(\neg (v_1 \not\sim v_2) \implies v_1 \sim v_2\)</span>, so we have proved both directions of the desired equality. <span class="math">\(\Box\)</span></p><p>Armed with this definition of consistency, we can define a generalized notion of function, let’s call it <span class="math">\(\mathsf{is\_fun}\)</span>. <span class="math">\[\mathsf{is\_fun}\;t \equiv \forall v_1 v_2 v'_1 v'_2, (v_1,v'_1) \in t \land (v_2,v'_2) \in t \land v_1 \sim v_2 \implies v'_1 \sim v'_2\]</span> Next we restrict the notion of value to require the graphs to satisfy <span class="math">\(\mathsf{is\_fun}\)</span>. Recall that we use to define values by the following grammar. <span class="math">\[\begin{array}{lrcl} \text{numbers} & n & \in & \mathbb{Z} \\ \text{graphs} & t & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \}\\ \text{values} & v & ::= & n \mid t \end{array}\]</span> We keep this definition but add an induction definition of a more refined notion of value, namely <span class="math">\(\mathsf{is\_val}\)</span>. Numbers are values and graphs are values so long as they satisfy <span class="math">\(\mathsf{is\_fun}\)</span> and only map values to values.</p><p><span class="math">\[\begin{gathered} \frac{}{\mathsf{is\_val}\,n} \qquad \frac{\mathsf{is\_fun}\;t \quad \forall v v', (v,v') \in t \implies \mathsf{is\_val}\,v \land \mathsf{is\_val}\,v'} {\mathsf{is\_val}\,t}\end{gathered}\]</span></p><p>We are now ready to update our semantic function <span class="math">\(\mathcal{E}\)</span>. The one change that we make is to require that each graph <span class="math">\(t\)</span> satisfies <span class="math">\(\mathsf{is\_val}\)</span> in the meaning of a <span class="math">\(\lambda\)</span>. <span class="math">\[{\mathcal{E}[\![ {\lambda x.\,} e ]\!]}\rho = \{ t \mid \mathsf{is\_val}\;t \land \forall (v,v')\in t, v' \in {\mathcal{E}[\![ e ]\!]}\rho(x{:=}v) \}\]</span> Hopefully this change to the semantics enables a proof that <span class="math">\(\mathcal{E}\)</span> is deterministic. Indeed, we shall show that if <span class="math">\(v \in {\mathcal{E}[\![ e ]\!]}\rho\)</span> and <span class="math">\(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\)</span> for any suitably related <span class="math">\(\rho\)</span> and <span class="math">\(\rho'\)</span>, then <span class="math">\(v \sim v'\)</span>.</p><p>To relate <span class="math">\(\rho\)</span> and <span class="math">\(\rho'\)</span>, we extend the definitions of consistency and <span class="math">\(\mathsf{is\_val}\)</span> to environments.</p><p><span class="math">\[\begin{gathered} \emptyset \sim \emptyset \qquad \frac{v \sim v' \quad \rho \sim \rho'} {\rho(x{:=}v) \sim \rho'(x{:=}v')} \\[2ex] \mathsf{val\_env}\;\emptyset \qquad \frac{\mathsf{is\_val}\; v \quad \mathsf{val\_env}\;\rho} {\mathsf{val\_env}\;\rho(x{:=}v)}\end{gathered}\]</span></p><p>We will need a few small lemmas concerning these definitions and their relationship with the <span class="math">\(\sqsubseteq\)</span> ordering on values.<br /></p><p><strong>Proposition 2</strong> </p><ol><li><p>If <span class="math">\(\mathsf{val\_env}\;\rho\)</span> and <span class="math">\(\rho(x) = v\)</span>, then <span class="math">\(\mathsf{is\_val}\; v\)</span>.</p></li><li><p>If <span class="math">\(\rho \sim \rho'\)</span>, <span class="math">\(\rho(x) = v\)</span>, <span class="math">\(\rho'(x) = v'\)</span>, then <span class="math">\(v \sim v'\)</span>.</p></li></ol><p><strong>Proposition 3</strong> </p><ol><li><p>If <span class="math">\(\mathsf{is\_val}\;v'\)</span> and <span class="math">\(v \sqsubseteq v'\)</span>, then <span class="math">\(\mathsf{is\_val}\; v\)</span>.</p></li><li><p>If <span class="math">\(v_1 \sqsubseteq v'_1\)</span>, <span class="math">\(v_2 \sqsubseteq v'_2\)</span>, and <span class="math">\(v'_1 \sim v'_2\)</span>, then <span class="math">\(v_1 \sim v_2\)</span>.</p></li></ol><p>We now come to the main theorem, which is proved by induction on <span class="math">\(e\)</span>, using the above three propositions.<br /></p><p><strong>Theorem</strong> (Determinism of <span class="math">\(\mathcal{E}\)</span>) If <span class="math">\(v \in {\mathcal{E}[\![ e ]\!]}\rho\)</span>, <span class="math">\(v' \in {\mathcal{E}[\![ e ]\!]}\rho'\)</span>, <span class="math">\(\mathsf{val\_env}\;\rho\)</span>, <span class="math">\(\mathsf{val\_env}\;\rho'\)</span>, and <span class="math">\(\rho \sim \rho'\)</span>, then <span class="math">\(\mathsf{is\_val}\;v\)</span>, <span class="math">\(\mathsf{is\_val}\;v'\)</span>, and <span class="math">\(v \sim v'\)</span>.</p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-64844811525397102682017-10-15T14:54:00.000-07:002017-10-17T20:27:52.073-07:00New revision of the semantics paper (POPL rejection, ESOP submission)My submission about declarative semantics to POPL was rejected. It's been a few weeks now, so I'm not so angry about it anymore. I've revised the paper and will be submitting it to ESOP this week.<br /><br />The main reason for rejection according to the reviewers was a lack of technical novelty, but I think the real reasons were that 1) the paper came across as too grandiose and as a result, it accidentally annoyed the reviewer who is an expert in denotational semantics, and 2) the paper did not do a good job of comparing to the related set-theoretic models of Plotkin and Engeler.<br /><br />Regarding 1), in the paper I use the term "declarative semantics" to try and distance this new semantics from the standard lattice-based denotational semantics. However, the reviewer took it to claim that the new semantics is not a denotational semantics, which is clearly false. In the new version of the paper I've removed the term "declarative semantics" and instead refer to the new semantics as a denotational semantics of the "elementary" variety. Also, I've toned down the sales pitch to better acknowledge that this new semantics is not the first elementary denotational semantics.<br /><br />Regarding 2), I've revised the paper to include a new section at the beginning that gives background on the elementary semantics of Plotkin, Engeler, and Coppo et al. This should help put the contributions of the paper in context.<br /><br />Other than that, I've added a section with a counter example to full abstraction. A big thanks to the POPL reviewers for the counter example! (Also thanks to Max New, who sent me the counter example a couple months ago.)<br /><br />Unfortunately, the ESOP page limit is a bit shorter, so I removed the relational version of the semantics and also the part about mutable references.<br /><br />A draft of the revision is available <a href="https://arxiv.org/abs/1707.03762" target="_blank">on arXiv</a>. Feedback is most welcome, especially from experts in denotational semantics! I really hope that this version is no longer annoying, but if it is, please tell me!Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-3063786702940087242017-10-03T17:43:00.000-07:002017-10-06T07:06:53.240-07:00Comparing to Plotkin and Engeler's Set-theoretic Models of the Lambda Calculus<p>On the plane ride back from ICFP last month I had a chance to re-read and better understand Plotkin’s <em>Set-theoretical and other elementary models of the <span class="math inline">\(\lambda\)</span>-calculus</em> (Technical Report 1972, Theoretical Computer Science 1993) and to read, for the first time, Engeler’s <em>Algebras and combinators</em> (Algebra Universalis 1981). As I wrote in my draft paper <a href="https://arxiv.org/abs/1707.03762"><em>Declarative semantics for functional languages: compositional, extensional, and elementary</em></a>, the main intuitions behind my simple semantics are present in these earlier papers, but until now I did not understand these other semantics deeply enough to give a crisp explanation of the similarities and differences. (The main intuitions are also present in the early work on intersection type systems, and my semantics is more closely related to those systems. A detailed explanation of that relationship is given in the draft paper.)</p><p>I should note that Engeler’s work was in the context of combinators (S and K), not the <span class="math inline">\(\lambda\)</span>-calculus, but of course the <span class="math inline">\(\lambda\)</span>-calculus can be encoded into combinators. I’ve ported his definitions to the <span class="math inline">\(\lambda\)</span>-calculus, along the lines suggested by Plotkin (1993), to make for easier comparison. In addition, I’ll extend both Engeler and Plotkin’s semantics to include integers and integer arithmetic in addition to the <span class="math inline">\(\lambda\)</span>-calculus. Here’s the syntax for the <span class="math inline">\(\lambda\)</span>-calculus that we consider here: <span class="math display">\[\begin{array}{rcl} && n \in \mathbb{Z} \qquad x \in \mathbb{X} \;\;\text{(program variables)}\\ \oplus & ::= & + \mid - \mid \times \mid \div \\ \mathbb{E} \ni e & ::= & n \mid e \oplus e \mid x \mid {\lambda x.\,} e \mid e \; e \mid {\textbf{if}\,}e {\,\textbf{then}\,}e {\,\textbf{else}\,}e \end{array}\]</span></p><h1 id="values">Values</h1><p>Perhaps the best place to start the comparison is in the definition of what I’ll call values. All three semantics give an inductive definition of values and all three involve finite sets, but in different ways. I’ll write <span class="math inline">\(\mathbb{V}_S\)</span> for my definition, <span class="math inline">\(\mathbb{V}_P\)</span> for Plotkin’s, and <span class="math inline">\(\mathbb{V}_E\)</span> for Engeler’s. <span class="math display">\[\begin{aligned} \mathbb{V}_S &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_S \times \mathbb{V}_S) \\ \mathbb{V}_P &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_P) \times \mathcal{P}_f(\mathbb{V}_P) \\ \mathbb{V}_E &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_E) \times \mathbb{V}_E\end{aligned}\]</span> In <span class="math inline">\(\mathbb{V}_S\)</span>, a function is represented as a finite graph, that is, a finite set of input-output pairs. For example, the graph <span class="math inline">\(\{ (0,1), (1,2), (2,3) \}\)</span> is one of the meanings for the term <span class="math inline">\((\lambda x.\, x + 1)\)</span>.</p><p>Plotkin’s values <span class="math inline">\(\mathbb{V}_P\)</span> include only a single input-output pair from a function’s graph. For example, <span class="math inline">\((\{0\}, \{1\})\)</span> is one of the meanings for the term <span class="math inline">\((\lambda x.\, x + 1)\)</span>. Engeler’s values also include just a single entry. For example, <span class="math inline">\((\{0\}, 1)\)</span> is one of the meanings for the term <span class="math inline">\((\lambda x.\, x + 1)\)</span>. In this example we have not made use of the finite sets in the input and output of Plotkin’s values. To do so, let us consider a higher-order example, such as the term <span class="math inline">\((\lambda f.\, f\,1 + f\,2)\)</span>. For Plotkin, the following value is one of its meanings: <span class="math display">\[(\{ (\{1\}, \{3\}), (\{2\}, \{4\}) \}, \{7\})\]</span> That is, in case <span class="math inline">\(f\)</span> is the function that adds <span class="math inline">\(2\)</span> to its input, the result is <span class="math inline">\(7\)</span>. We see that the presence of finite sets in the input is needed to accomodate functions-as-input. The corresponding value in <span class="math inline">\(\mathbb{V}_S\)</span> is <span class="math display">\[\{ (\{ (1, 3), (2, 4) \}, 7) \}\]</span></p><p>The difference between Plotkin and Engeler’s values can be seen in functions that return functions. Consider the <span class="math inline">\(K\)</span> combinator <span class="math inline">\((\lambda x.\,\lambda y.\, x)\)</span>. For Plotkin, the following value is one of its meanings: <span class="math display">\[(\{1\}, \{ (\{0\},\{1\}), (\{2\},\{1\}) \})\]</span> That is, when applied to <span class="math inline">\(1\)</span> it returns a function that returns <span class="math inline">\(1\)</span> when applied to either <span class="math inline">\(0\)</span> or <span class="math inline">\(2\)</span>. The corresponding value in <span class="math inline">\(\mathbb{V}_S\)</span> is <span class="math display">\[\{ (1, \{ (0,1), (2,1) \}) \}\]</span> For Engeler, there is not a single value corresponding to the above value. Instead it requires two values to represent the same information. <span class="math display">\[(\{1\}, (\{0\},1)) \quad\text{and}\quad (\{1\}, (\{2\},1))\]</span> We’ll see later that it doesn’t matter that Engeler requires more values to represent the same information.</p><h1 id="the-domains">The Domains</h1><p>The semantics of Plotkin, Engeler, and myself does not use values for the domain, but instead a set of values. That is <span class="math display">\[\mathcal{P}(\mathbb{V}_S) \qquad \mathcal{P}(\mathbb{V}_P) \qquad \mathcal{P}(\mathbb{V}_E)\]</span></p><p>The role of the outer <span class="math inline">\(\mathcal{P}\)</span> is intimately tied to the meaning of functions in Plotkin and Engeler’s semantics because the values themselves only record a single input-output pair. The outer <span class="math inline">\(\mathcal{P}\)</span> is needed to represent all of the input-output pairs for a given function. While the <span class="math inline">\(\mathcal{P}\)</span> is also necessary for functions in my semantics, one can view it generically as providing non-determinism and therefore somewhat orthogonal to the meaning of functions per se. Next let’s take a look at the semantics.</p><h1 id="comparing-the-semantics">Comparing the Semantics</h1><p>Here is Plotkin’s semantics <span class="math inline">\(\mathcal{E}_P\)</span>. Let <span class="math inline">\(V,V'\)</span> range over finite sets of values. <span class="math display">\[\begin{aligned} {\mathcal{E}_P[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_P[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_P[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_P[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,V') \mid V' \subseteq {\mathcal{E}_P[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_P[\![ e_1\;e_2 ]\!]}\rho &= \bigcup \left\{ V' \, \middle| \begin{array}{l} \exists V.\, (V,V') {\in} {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_P[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_P[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_P[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\]</span> For Plotkin, the environment <span class="math inline">\(\rho\)</span> maps variables to finite sets of values. In the case for application, the input set <span class="math inline">\(V\)</span> must be a subset of the meaning of the argument, which is critical for enabling self application and, using the <span class="math inline">\(Y\)</span> combinator, general recursion. The <span class="math inline">\(\bigcup\)</span> flattens the set-of-finite-sets into a set.</p><p>Next we consider Engeler’s semantics <span class="math inline">\(\mathcal{E}_E\)</span>. <span class="math display">\[\begin{aligned} {\mathcal{E}_E[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_E[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_E[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_E[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,v') \mid v' \in {\mathcal{E}_E[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_E[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v' \, \middle| \begin{array}{l} \exists V.\, (V,v') {\in} {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_E[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_E[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_E[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\]</span> The semantics is quite similar to Plotkin’s, as again we see the use of <span class="math inline">\(\subseteq\)</span> in the case for application. Because the output <span class="math inline">\(v'\)</span> is just a value, and not a finite set of values as for Plotkin, there is no need for the <span class="math inline">\(\bigcup\)</span>.</p><p>Finally we review my semantics <span class="math inline">\(\mathcal{E}_S\)</span>. For it we need to define an ordering on values that is just equality for integers and <span class="math inline">\(\subseteq\)</span> on function graphs. Let <span class="math inline">\(t\)</span> range over <span class="math inline">\(\mathcal{P}_{f}(\mathbb{V} \times \mathbb{V})\)</span>. <span class="math display">\[\frac{}{n \sqsubseteq n} \qquad \frac{t_1 \subseteq t_2}{t_1 \sqsubseteq t_2}\]</span> Then we define <span class="math inline">\(\mathcal{E}_S\)</span> as follows. <span class="math display">\[\begin{aligned} {\mathcal{E}_S[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_S[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_S[\![ x ]\!]}\rho &= \{ v \mid v \sqsubseteq \rho(x) \} \\ {\mathcal{E}_S[\![ {\lambda x.\,} e ]\!]}\rho &= \{ t \mid \forall (v,v')\in t.\, v' \in {\mathcal{E}_S[\![ e ]\!]}\rho(x{:=}v) \} \\ {\mathcal{E}_S[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v \, \middle| \begin{array}{l} \exists t\, v_2\, v_3\, v_3'.\, t {\in} {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land v_2 {\in} {\mathcal{E}_S[\![ e_2 ]\!]}\rho \\ \land\, (v_3, v_3') \in t \land v_3 \sqsubseteq v_2 \land v \sqsubseteq v_3' \end{array} \right\} \\ {\mathcal{E}_S[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_S[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\]</span> In my semantics, <span class="math inline">\(\rho\)</span> maps a variable to a single value. The <span class="math inline">\(v_3 \sqsubseteq v_2\)</span> in my semantics corresponds to the uses of <span class="math inline">\(\subseteq\)</span> in Plotkin and Engeler’s. One can view this as a kind of subsumption, allowing the use of a larger approximation of a function in places where a smaller approximation is needed. I’m not sure whether all the other uses of <span class="math inline">\(\sqsubseteq\)</span> are necessary, but the semantics needs to be downward closed, and the above placement of <span class="math inline">\(\sqsubseteq\)</span>’s makes this easy to prove.</p><h1 id="relational-semantics">Relational Semantics</h1><p>For people like myself with a background in operational semantics, there is another view of the semantics that is helpful to look at. We can turn the above dentoational semantics into a relational semantics (like a big-step semantics) that hides the <span class="math inline">\(\mathcal{P}\)</span> by making use of the following isomorphism (where <span class="math inline">\(\mathbb{V}\)</span> is one of <span class="math inline">\(\mathbb{V}_S\)</span>, <span class="math inline">\(\mathbb{V}_P\)</span>, or <span class="math inline">\(\mathbb{V}_E\)</span>). <span class="math display">\[\mathbb{E} \to (\mathbb{X}\rightharpoonup \mathbb{V}) \to {\mathcal{P}(\mathbb{V})} \quad\cong\quad \mathbb{E} \times (\mathbb{X}\rightharpoonup \mathbb{V}) \times \mathbb{V}\]</span> Let <span class="math inline">\(v\)</span> range over <span class="math inline">\(\mathbb{V}\)</span>. We can define the semantic relation <span class="math inline">\(\rho \vdash_S e \Rightarrow v\)</span> that corresponds to <span class="math inline">\(\mathcal{E}_S\)</span> as follows. Note that in the rule for lambda abstraction, the table <span class="math inline">\(t\)</span> comes out of thin air (it is existentially quantified), and that there is one premise in the rule per entry in the table, that is, we have the quantification <span class="math inline">\(\forall(v,v') \in t\)</span>. <span class="math display">\[\begin{gathered} \frac{}{\rho \vdash_S n \Rightarrow n} \quad \frac {\rho \vdash_S e_1 \Rightarrow n_1 \quad \rho \vdash_S e_2 \Rightarrow n_2} {\rho \vdash_S e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \sqsubseteq \rho(x)} {\rho \vdash_S x \Rightarrow v} \\[3ex] \frac{\forall (v,v'){\in} t.\; \rho(x{:=}v) \vdash_S e \Rightarrow v'} {\rho \vdash_S {\lambda x.\,}e \Rightarrow t} \quad \frac{\begin{array}{c}\rho \vdash_S e_1 \Rightarrow t \quad \rho \vdash_S e_2 \Rightarrow v_2 \\ (v_3,v'_3) \in t \quad v_3 \sqsubseteq v_2 \quad v \sqsubseteq v'_3 \end{array} } {\rho \vdash_S (e_1{\;}e_2) \Rightarrow v} \\[3ex] \frac{\rho \vdash_S e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_S e_2 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_S e_1 \Rightarrow 0 \quad \rho \vdash_S e_3 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]</span></p><p>For comparison, let us also turn Plotkin’s semantics into a relation. <span class="math display">\[\begin{gathered} \frac{}{\rho \vdash_P n \Rightarrow n} \quad \frac {\rho \vdash_P e_1 \Rightarrow n_1 \quad \rho \vdash_P e_2 \Rightarrow n_2} {\rho \vdash_P e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \in \rho(x)} {\rho \vdash_P x \Rightarrow v} \\[3ex] \frac{\forall v' \in V'.\, \rho(x{:=}V) \vdash_P e \Rightarrow v'} {\rho \vdash_P {\lambda x.\,}e \Rightarrow (V,V')} \quad \frac{\begin{array}{c}\rho \vdash_P e_1 \Rightarrow (V,V') \quad \forall v_2 \in V.\, \rho \vdash_P e_2 \Rightarrow v_2 \\ v' \in V' \end{array} } {\rho \vdash_P (e_1{\;}e_2) \Rightarrow v'} \\[3ex] \frac{\rho \vdash_P e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_P e_2 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_P e_1 \Rightarrow 0 \quad \rho \vdash_P e_3 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]</span> Recall that in Plotkin’s semantics, the environment maps variables to finite sets of values. The “set” is needed to handle the case of a function bound to a variable, but is just extra baggage when we have an integer bound to a variable. So in the variable rule we have <span class="math inline">\(v \in \rho(x)\)</span>, which either extracts a singleton integer from <span class="math inline">\(\rho(x)\)</span>, or extracts one input-output entry from a function’s graph. Moving on to the lambda rule, it only produces one input-output entry, but to handle the case when the output <span class="math inline">\(V'\)</span> is representing a function, we must build it up one entry at a time with the quantification <span class="math inline">\(\forall v'\in V'\)</span> and a finite but arbitrary number of premises. In the application rule we again have a finite number of premises, with <span class="math inline">\(\forall v_2\in V\)</span>, and also the premise <span class="math inline">\(v' \in V'\)</span>.</p><p>The relational version of Engeler’s semantics removes the need for quantification in the lambda rule, but the application rule still has <span class="math inline">\(\forall v_2 \in V\)</span>. <span class="math display">\[\begin{gathered} \frac{\rho(x{:=}V) \vdash_E e \Rightarrow v'} {\rho \vdash_E {\lambda x.\,}e \Rightarrow (V,v')} \quad \frac{\begin{array}{c}\rho \vdash_E e_1 \Rightarrow (V,v') \quad \forall v_2 \in V.\, \rho \vdash_E e_2 \Rightarrow v_2 \end{array} } {\rho \vdash_E (e_1{\;}e_2) \Rightarrow v'}\end{gathered}\]</span></p><h1 id="conclusion">Conclusion</h1><p>My semantics is similar to Plotkin and Engeler’s in that</p><ol><li><p>The domain is a set of values, and values are inductively defined and involve finite sets.</p></li><li><p>Self application is enabled by allowing a kind of subsumption on functions.</p></li></ol><p>The really nice thing about all three semantics is that they are simple; very little mathematics is necessary to understand them, which is important pedagogically, practically (easier for practitioners to apply such semantics), and aesthetically (Occam’s razor!).</p><p>My semantics is different to Plotkin and Engeler’s in that</p><ol><li><p>the definition of values places <span class="math inline">\(\mathcal{P}_f\)</span> so that functions are literally represented by finite graphs, and</p></li><li><p>environments map each variable to a single value, and</p></li><li><p><span class="math inline">\(\sqsubseteq\)</span> is used instead of <span class="math inline">\(\subseteq\)</span> to enable self application.</p></li></ol><p>The upshot of these (relatively minor) differences is that my semantics may be easier to understand.</p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-37059006205672825812017-07-13T08:44:00.002-07:002017-07-13T08:46:45.445-07:00POPL submission, pulling together these blog posts on semantics!Last week I submitted a paper to POPL 2018 about the new kind of denotational semantics that I've been writing about in this blog, which I am now calling <i>declarative semantics</i>. I think this approach to semantics has the potential to replace operational semantics for the purposes of language specification. The declarative semantics has the advantage of being compositional and extensional while, like operational semantics, using only elementary mathematics. Thus, the declarative semantics should be better than operational semantics for reasoning about programs and for reasoning about the language as a whole (i.e. it's meta-theory). The paper pulls together many of the blog posts, updates them, and adds a semantics for mutable references. The paper is available now on <a href="https://arxiv.org/abs/1707.03762">arXiv</a> and the Isabelle mechanization is available <a href="https://www.dropbox.com/s/qslz1q6193qumsw/DeclSem.zip?dl=0">here</a>. I hope you enjoy it and I welcome your feedback!Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-61146457211099770662017-06-07T21:04:00.000-07:002017-06-08T05:06:09.504-07:00Revisiting "well-typed programs cannot go wrong"<p>Robin Milner proved that well-typed programs cannot go wrong in his 1978 paper <em>A Theory of Type Polymorphism in Programming</em> <span class="citation">(Milner 1978)</span>. That is, he defined a type system and denotational semantics for the Exp language (a subset of ML) and then proved that the denotation of a well-typed program in Exp is not the “wrong” value. The “wrong” denotation signifies that a runtime type error occurred, so Milner’s theorem proves that the type system is strong enough to prevent all the runtime type errors that could occur in an Exp program. The denotational semantics used by <span class="citation">Milner (1978)</span> was based on the standard domain theory for an explicitly typed language with higher-order functions.</p><p>I have been exploring, over the last month, whether I can prove a similar theorem but using my new denotational semantics, and mechanize the proof in the Isabelle proof assistant. At first I tried to stay as close to Milner’s proof as possible, but in the process I learned that Milner’s proof is rather syntactic and largely consists of proving lemmas about how substitution interacts with the type system, which does not shed much light on the semantics of polymorphism.</p><p>Last week I decided to take a step back and try a more semantic approach and switch to a cleaner but more expressive setting, one with first-class polymorphism. So I wrote down a denotational semantics for System F <span class="citation">(Reynolds 1974)</span> extended with support for general recursion. The proof that well-typed programs cannot go wrong came together rather quickly. Today I finished the <a href="https://www.dropbox.com/s/1481epexw53togy/SystemF.thy?dl=0">mechanization in Isabelle</a> and it came in at just 539 lines for all the definitions, lemmas, and main proof. I’m excited to share the details of how it went! Spoiler: the heart of the proof turned out to be a lemma I call Compositionality because it looks a lot like the similarly-named lemma that shows up in proofs of parametricity.</p><h1 id="syntax">Syntax</h1><p>The types in the language include natural numbers, function types, universal types, and type variables. Regarding the variables, after some experimentation with names and locally nameless, I settled on good old DeBruijn indices to represent both free and bound type variables. <span class="math">\[\begin{array}{rcl} i,j & \in & \mathbb{N} \\ \sigma,\tau & ::= & \mathtt{nat} \mid \tau \to \tau \mid \forall\,\tau \mid i \end{array}\]</span> So the type of the polymorphic identity function, normaly written <span class="math">\(\forall \alpha.\, \alpha \to \alpha\)</span>, is instead written <span class="math">\(\forall \left(0 \to 0\right)\)</span>.</p><p>The syntax of expressions is as follows. I choose to use DeBruijn indices for term variables as well, and left off all type annotations, but I don’t think that matters for our purposes here. <span class="math">\[\begin{array}{rcl} n & \in & \mathbb{N} \\ e & ::= & n \mid i \mid \lambda e \mid e\; e \mid \Lambda e \mid e [\,] \mid \mathtt{fix}\, e \end{array}\]</span></p><h1 id="denotational-semantics">Denotational Semantics</h1><p>The values in this language, described by the below grammar, include natural numbers, functions represented by finite lookup tables, type abstractions, and <span class="math">\(\mathsf{wrong}\)</span> to represent a runtime type error. <span class="math">\[\begin{array}{rcl} f & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \} \\ o & ::= & \mathsf{none} \mid \mathsf{some}(v) \\ v & ::= & n \mid \mathsf{fun}(f) \mid \mathsf{abs}(o) \mid \mathsf{wrong} \end{array}\]</span> A type abstraction <span class="math">\(\mathsf{abs}(o)\)</span> consists of an optional value, and not simply a value, because the body of a type abstraction might be a non-terminating computation.</p><p>We define the following information ordering on values so that we can reason about one lookup table being more or less-defined than another lookup table. We define <span class="math">\(v \sqsubseteq v'\)</span> inductively as follows.</p><p><span class="math">\[\begin{gathered} n \sqsubseteq n \quad \frac{f_1 \subseteq f_2} {\mathsf{fun}(f_1) \sqsubseteq \mathsf{fun}(f_2)} \quad \mathsf{wrong} \sqsubseteq\mathsf{wrong} \\ \mathsf{abs}(\mathsf{none}) \sqsubseteq\mathsf{abs}(\mathsf{none}) \quad \frac{v \sqsubseteq v'} {\mathsf{abs}(\mathsf{some}(v)) \sqsubseteq\mathsf{abs}(\mathsf{some}(v'))}\end{gathered}\]</span></p><p>The denotational semantics maps an expression to a set of values. Why a set and not just a single value? A single finite lookup table is not enough to capture the meaning of a lambda, but an infinite set of finite tables is. However, dealing with sets is somewhat inconvenient, so we mitigate this issue by working in a set monad. Also, to deal with <span class="math">\(\mathsf{wrong}\)</span> we need an error monad, so we use a combined set-and-error monad.</p><p><span class="math">\[\begin{aligned} X := E_1 ; E_2 &\equiv \{ v \mid \exists v'. \, v' \in E_1, v' \neq \mathsf{wrong}, v \in E_2[v'/X] \} \\ & \quad \cup \{ v \mid v = \mathsf{wrong}, \mathsf{wrong} \in E_1 \} \\ \mathsf{return}(E) & \equiv \{ v \mid v \sqsubseteq E \} \\ X \leftarrow E_1; E_2 & \equiv \{ v \mid \exists v'.\, v' \in E_1, v \in E_2[v'/X]\}\end{aligned}\]</span></p><p>The use of <span class="math">\(\sqsubseteq\)</span> in <span class="math">\(\mathsf{return}\)</span> is to help ensure that the meaning of an expression is downward-closed with respect to <span class="math">\(\sqsubseteq\)</span>. (The need for which is explained in prior blog posts.)</p><p>Our semantics will make use of a runtime environment <span class="math">\(\rho\)</span> that includes two parts, <span class="math">\(\rho_1\)</span> and <span class="math">\(\rho_2\)</span>. The first part gives meaning to the term variables, for which we use a list of values (indexed by their DeBruijn number). The second part, for the type variables, is a list containing sets of values, as the meaning of a type will be a set of values. We define the following notation for dealing with runtime environments.</p><p><span class="math">\[\begin{aligned} v{::}\rho \equiv (v{::}\rho_1, \rho_2) \\ V{::}\rho \equiv (\rho_1, V{::}\rho_2)\end{aligned}\]</span></p><p>We write <span class="math">\(\rho[i]\)</span> to mean either <span class="math">\(\rho_1[i]\)</span> or <span class="math">\(\rho_2[i]\)</span>, which can be disambiguated based on the context of use.</p><p>To help define the meaning of <span class="math">\(\mathtt{fix}\,e\)</span>, we inductively define a predicate named <span class="math">\(\mathsf{iterate}\)</span>. Its first parameter is the meaning <span class="math">\(L\)</span> of the expression <span class="math">\(e\)</span>, which is a mapping from an environment to a set of values. The second parameter is a runtime environment <span class="math">\(\rho\)</span> and the third parameter is a value that is the result of iteration.</p><p><span class="math">\[\begin{gathered} \mathsf{iterate}(L, \rho, \mathsf{fun}(\emptyset)) \quad \frac{\mathsf{iterate}(L, \rho, v) \quad v' \in E(v{::}\rho)} {\mathsf{iterate}(L, \rho, v')}\end{gathered}\]</span></p><p>To help define the meaning of function application, we define the following <span class="math">\(\mathsf{apply}\)</span> functiion. <span class="math">\[\mathsf{apply}(V_1,V_2) \equiv \begin{array}{l} x_1 := V_1; \\ x_2 := V_2; \\ \mathsf{case}\,x_1\,\textsf{of}\\ \;\; \mathsf{fun}(f) \Rightarrow (x'_2,x'_3) \leftarrow f; \mathsf{if}\, x'_2 \sqsubseteq x_2 \, \mathsf{then}\, x'_3 \,\mathsf{else}\, \emptyset \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\]</span></p><p>The denotational semantics is given by the following function <span class="math">\(E\)</span> that maps an expression and environment to a set of values.</p><p><span class="math">\[\begin{aligned} E[ n ]\rho &= \mathsf{return}(n) \\[1ex] E[ i ]\rho &= \mathsf{return}(\rho[i]) \\[1ex] E[ \lambda e ]\rho &= \{ v \mid \exists f.\, v = \mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f \Rightarrow \\ & \qquad\qquad \exists v_2.\, v_2 \in E[ e ] (v_1{::}\rho), v'_2 \sqsubseteq v_2\} \\[1ex] E[ e_1\; e_2 ] \rho &= \mathsf{apply}(E[ e_1 ]\rho, E[ e_2 ]\rho) \\[1ex] E[ \mathtt{fix}\,e ] \rho &= \{ v \mid \mathsf{iterate}(E[ e ], \rho, v) \} \\[1ex] E[ \Lambda e ] \rho &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V. v' \in E[ e ] (V{::}\rho) \} \\ & \quad\; \cup \{ v \mid v = \mathsf{abs}(\mathsf{none}), \forall V. E[ e ](V{::}\rho) = \emptyset \} \\[1ex] E[ e [\,] ] \rho &= \begin{array}{l} x := E [ e ] \rho;\\ \mathsf{case}\,x\,\mathsf{of} \\ \;\; \mathsf{abs}(\mathsf{none}) \Rightarrow \emptyset \\ \mid \mathsf{abs}(\mathsf{some}(v')) \Rightarrow \mathsf{return}(v') \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\end{aligned}\]</span></p><p>We give meaning to types with the function <span class="math">\(T\)</span>, which maps a type and an environment to a set of values. For this purposes, we only need the second part of the runtime environment which gives meaning to type variables. Instead of writing <span class="math">\(\rho_2\)</span> everywhere, we’ll use the letter <span class="math">\(\eta\)</span>. It is important to ensure that <span class="math">\(T\)</span> is downward closed, which requires some care either in the definition of <span class="math">\(T[ \forall \tau ]\eta\)</span> or in the definition of <span class="math">\(T[ i ]\eta\)</span>. We have chosen to do this work in the definition of <span class="math">\(T[ i ]\eta\)</span>, and let the definition of <span class="math">\(T[ \forall \tau ]\eta\)</span> quantify over any set of values <span class="math">\(V\)</span> to give meaning to it’s bound type variable.</p><p><span class="math">\[\begin{aligned} T[ \mathtt{nat} ] \eta &= \mathbb{N} \\ T[ i ] \eta &= \begin{cases} \{ v \mid \exists v'.\, v' \in \eta[i], v \sqsubseteq v',v \neq \mathsf{wrong} \} &\text{if } i < |\eta| \\ \emptyset & \text{otherwise} \end{cases} \\ T[ \sigma\to\tau ] \eta &= \{ v\mid \exists f. \,v=\mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f, v_1 \in T[\sigma]\eta \\ & \hspace{1.5in} \Rightarrow \exists v_2.\, v_2 \in T[\tau]\eta, v'_2 \sqsubseteq v_2 \} \\ T[ \forall\tau ] \eta &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V.\, v' \in T[\tau ] (V{::}\eta) \} \cup \{ \mathsf{abs}(\mathsf{none}) \} \end{aligned}\]</span></p><h1 id="type-system">Type System</h1><p>Regarding the type system, it is standard except perhaps how we deal with the DeBruijn representation of type variables. We begin with the definition of well-formed types. A type is well formed if all the type variables in it are properly scoped, which is captured by their indices being below a given threshold (the number of enclosing type variable binders, that is, <span class="math">\(\Lambda\)</span>’s and <span class="math">\(\forall\)</span>’s). More formally, we write <span class="math">\(j \vdash \tau\)</span> to say that type <span class="math">\(\tau\)</span> is well-formed under threshold <span class="math">\(j\)</span>, and give the following inductive definition.</p><p><span class="math">\[\begin{gathered} j \vdash \mathtt{nat} \quad \frac{j \vdash \sigma \quad j \vdash \tau}{j \vdash \sigma \to \tau} \quad \frac{j+1 \vdash \tau }{j \vdash \forall \tau} \quad \frac{i < j}{j \vdash i}\end{gathered}\]</span></p><p>Our representation of the type environment is somewhat unusual. Because term variables are just DeBruijn indices, we can use a list of types (instead of a mapping from names to types). However, to keep track of the type-variable scoping, we also include with each type the threshold from its point of definition. Also, we need to keep track of the current threshold, so when we write <span class="math">\(\Gamma\)</span>, we mean a pair where <span class="math">\(\Gamma_1\)</span> is a list and <span class="math">\(\Gamma_2\)</span> is a number. The list consists of pairs of types and numbers, so for example, <span class="math">\(\Gamma_1[i]_1\)</span> is a type and <span class="math">\(\Gamma_1[i]_2\)</span> is a number whenever <span class="math">\(i\)</span> is less than the length of <span class="math">\(\Gamma_1\)</span>. We use the following notation for extending the type environment:</p><p><span class="math">\[\begin{aligned} \tau :: \Gamma &\equiv ((\tau,\Gamma_2){::}\Gamma_1, \Gamma_2) \\ * :: \Gamma & \equiv (\Gamma_1, \Gamma_2 + 1)\end{aligned}\]</span></p><p>We write <span class="math">\(\vdash \rho : \Gamma\)</span> to say that environment <span class="math">\(\rho\)</span> is well-typed according to <span class="math">\(\Gamma\)</span> and define it inductively as follows.</p><p><span class="math">\[\begin{gathered} \vdash ([],[]) : ([], 0) \quad \frac{\vdash \rho : \Gamma \quad v \in T[ \tau ] \rho_2} {\vdash v{::}\rho : \tau{::}\Gamma} \quad \frac{\vdash \rho : \Gamma} {\vdash V{::}\rho : *{::}\Gamma}\end{gathered}\]</span></p><p>The primary operation that we perform on a type environment is looking up the type associated with a term variable, for which we define the following function <span class="math">\(\mathsf{lookup}\)</span> that maps a type environment and DeBruijn index to a type. To make sure that the resulting type is well-formed in the current environment, we must increase all of its free type variables by the difference of the current threshold <span class="math">\(\Gamma_2\)</span> and the threshold at its point of definition, <span class="math">\(\Gamma_1[i]_2\)</span>, which is accomplished by the shift operator <span class="math">\(\uparrow^k_c(\tau)\)</span> <span class="citation">(Pierce 2002)</span>. <span class="math">\[\mathsf{lookup}(\Gamma,i) \equiv \begin{cases} \mathsf{some}(\uparrow^{k}_{0}(\Gamma_1[i]_1) & \text{if } n < |\Gamma_1| \\ & \text{where } k = \Gamma_2 - \Gamma_1[i]_2 \\ \mathsf{none} & \text{otherwise} \end{cases}\]</span></p><p>To review, the shift operator is defined as follows.</p><p><span class="math">\[\begin{aligned} \uparrow^{k}_{c}(\mathtt{nat}) &= \mathtt{nat} \\ \uparrow^{k}_{c}(i) &= \begin{cases} i + k & \text{if } c \leq i \\ i & \text{otherwise} \end{cases} \\ \uparrow^{k}_{c}(\sigma \to \tau) &= \uparrow^{k}_{c}(\sigma) \to \uparrow^{k}_{c}(\tau) \\ \uparrow^{k}_{c}(\forall \tau) &= \forall\, \uparrow^{k}_{c+1}(\tau)\end{aligned}\]</span></p><p>Last but not least, we need to define type substitution so that we can use it in the typing rule for instantiation (type application). We write <span class="math">\([j\mapsto \tau]\sigma\)</span> for the substitution of type <span class="math">\(\tau\)</span> for DeBruijn index <span class="math">\(j\)</span> within type <span class="math">\(\sigma\)</span> <span class="citation">(Pierce 2002)</span>.</p><p><span class="math">\[\begin{aligned} [j\mapsto \tau]\mathtt{nat} &= \mathtt{nat} \\ [j\mapsto\tau]i &= \begin{cases} \tau & \text{if } j = i \\ i - 1 & \text{if } j < i \\ i & \text{otherwise} \end{cases}\\ [j\mapsto\tau](\sigma\to\sigma') &= [j\mapsto\tau]\sigma \to [j\mapsto \tau]\sigma' \\ [j\mapsto \tau]\forall\sigma &= \forall\, [j+1 \mapsto \uparrow^{1}_{0}(\tau)]\sigma\end{aligned}\]</span></p><p>Here is the type system for System F extended with <span class="math">\(\mathtt{fix}\)</span>.</p><p><span class="math">\[\begin{gathered} \Gamma \vdash n : \mathtt{nat} \qquad \frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau} \qquad \frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau} \\[2ex] \frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e :: \forall\tau} \qquad \frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}\end{gathered}\]</span></p><p>We say that a type environment <span class="math">\(\Gamma\)</span> is well-formed if <span class="math">\(\Gamma_2\)</span> is greater or equal to every threshold in <span class="math">\(\Gamma_1\)</span>, that is <span class="math">\(\Gamma_1[i]_2 \leq \Gamma_2\)</span> for all <span class="math">\(i < |\Gamma_1|\)</span>.</p><h1 id="proof-of-well-typed-programs-cannot-go-wrong">Proof of well-typed programs cannot go wrong</h1><p>The proof required 6 little lemmas and 4 big lemmas. (There were some itsy bitsy lemmas too that I’m not counting.)</p><h2 id="little-lemmas">Little Lemmas</h2><p><b>Lemma</b> [<span class="math">\(\sqsubseteq\)</span> is a preorder] </p><ul><li><p><span class="math">\(v \sqsubseteq v\)</span></p></li><li><p>If <span class="math">\(v_1 \sqsubseteq v_2\)</span> and <span class="math">\(v_2 \sqsubseteq v_3\)</span>, then <span class="math">\(v_1 \sqsubseteq v_3\)</span>.</p></li></ul><p>[lem:less-refl] [lem:less-trans]</p><p>I proved transitivity by induction on <span class="math">\(v_2\)</span>.</p><p><b>Lemma</b> [<span class="math">\(T\)</span> is downward closed] If <span class="math">\(v \in T [ \tau ] \eta\)</span> and <span class="math">\(v' \sqsubseteq v\)</span>, then <span class="math">\(v' \in T [ \tau ] \eta\)</span>. [lem:T-down-closed]</p><p>The above is a straightforward induction on <span class="math">\(\tau\)</span></p><p><b>Lemma</b> [<span class="math">\(\mathsf{wrong}\)</span> not in <span class="math">\(T\)</span>] For any <span class="math">\(\tau\)</span> and <span class="math">\(\eta\)</span>, <span class="math">\(\mathsf{wrong} \notin T [ \tau ] \eta\)</span>. [lem:wrong-not-in-T]</p><p>The above is another straightforward induction on <span class="math">\(\tau\)</span></p><p><b>Lemma</b> If <span class="math">\(\vdash \rho : \Gamma\)</span>, then <span class="math">\(\Gamma\)</span> is a well-formed type environment. [lem:wfenv-good-ctx]</p><p>The above is proved by induction on the derivation of <span class="math">\(\vdash \rho : \Gamma\)</span>.</p><p><b>Lemma</b> <span class="math">\[T [ \tau ] (\eta_1 \eta_3) = T [ \uparrow^{|\eta_2|}_{ |\eta_1|}(\tau) ] (\eta_1\eta_2\eta_3)\]</span></p><p>The above lemma is proved by induction on <span class="math">\(\tau\)</span>. It took me a little while to figure out the right strengthening of the statement of this lemma to get the induction to go through. The motivations for this lemma were the following corollaries.</p><p><b>Corollary</b> [Lift/Append Preserves <span class="math">\(T\)</span>] <span class="math">\[T [ \tau ](\eta_2) = T [ \uparrow^{|\eta_1|}_{0}(\tau) ] (\eta_1\eta_2)\]</span> [lem:lift-append-preserves-T]</p><p><b>Corollary</b>[Lift/Cons Preserves <span class="math">\(T\)</span>] <span class="math">\[T [ \tau ] (\eta) = T [ \uparrow^{1}_{0}(\tau) ] (V{::}\eta)\]</span> [lem:shift-cons-preserves-T]</p><p>Of course, two shifts can be composed into a single shift by adding the amounts.</p><p><b>Lemma</b> [Compose Shift] <span class="math">\[\uparrow^{j+k}_{c}(\tau) = \uparrow^{j}_{c}( \uparrow^{k}_{c}(\tau))\]</span> [lem:compose-shift]</p><p>The proof is a straightforward induction on <span class="math">\(\tau\)</span>.</p><h2 id="big-lemmas">Big Lemmas</h2><p>There are one or two big lemmas for each of the “features” in this variant of System F.</p><p>The first lemma shows that well-typed occurrences of term variables cannot go wrong.</p><p><b>Lemma</b> [Lookup in Well-typed Environment] <br />If <span class="math">\(\vdash \rho : \Gamma\)</span> and <span class="math">\(\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)\)</span>, then <span class="math">\(\exists v.\, \rho_1[i] = v\)</span> and <span class="math">\(v \in T [ \tau ] \rho_2\)</span>. [lem:lookup-wfenv]</p><p>The proof is by induction on the derivation of <span class="math">\(\vdash \rho : \Gamma\)</span>. The first two cases were straightforward but the third case required some work and used lemmas [lem:wfenv-good-ctx], [lem:shift-cons-preserves-T], and [lem:compose-shift].</p><p><b>Lemma</b> [Application cannot go wrong] If <span class="math">\(V \subseteq T [ \sigma \to \tau ] \eta\)</span> and <span class="math">\(V' \subseteq T [ \sigma ] \eta\)</span>, then <span class="math">\(\mathsf{apply}(V,V') \subseteq T [ \tau ] \eta\)</span>. [lem:fun-app]</p><p>The proof of this lemma is direct and does not use induction. However, it does use lemmas [lem:wrong-not-in-T] and [lem:T-down-closed].</p><p><b>Lemma</b> [Compositionality] Let <span class="math">\(V = T [ \sigma ] (\eta_1\eta_2)\)</span>. <span class="math">\[T [ \tau ] (\eta_1 V \eta_2) = T [ \tau[\sigma/|\eta_1|] ] (\eta_1 \eta_2)\]</span> [lem:compositionality]</p><p>I proved the Compositionality lemma by induction on <span class="math">\(\tau\)</span>. All of the cases were straightforward except for <span class="math">\(\tau=\forall\tau'\)</span>. In that case I used the induction hypothesis to show that <span class="math">\[T [ \tau' ] (V \eta_1 S \eta_2) = T [ ([|V\eta_1|\mapsto \uparrow^1_0(\sigma)] \tau' ] (V\eta_1\eta_2) \text{ where } S = T [ \uparrow^1_0(\sigma) ] (V\eta_1\eta_2)\]</span> and I used Lemma [lem:shift-cons-preserves-T].</p><p><b>Lemma</b> [Iterate cannot go wrong] If</p><ul><li><p><span class="math">\(\mathsf{iterate}(L,\rho,v)\)</span> and</p></li><li><p>for any <span class="math">\(v'\)</span>, <span class="math">\(v' \in T[ \sigma\to\tau ] \rho_2\)</span> implies <span class="math">\(L(v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2\)</span>,</p></li></ul><p>then <span class="math">\(v \in T [ \sigma \to \tau ] \rho_2\)</span>. [lem:iterate-sound]</p><p>This was straightfroward to prove by induction on the derivation of <span class="math">\(\mathsf{iterate}(L,\rho,v)\)</span>. The slightly difficult part was coming up with the definition of <span class="math">\(\mathsf{iterate}\)</span> to begin with and in formulating the second premise.</p><h2 id="the-theorem">The Theorem</h2><p><b>Theorem</b> [Well-typed programs cannot go wrong] <br />If <span class="math">\(\Gamma \vdash e : \tau\)</span> and <span class="math">\(\vdash \rho : \Gamma\)</span>, then <span class="math">\(E [ e ] \rho \subseteq T[ \tau ] \rho_2\)</span>. [thm:welltyped-dont-go-wrong]</p><p>The proof is by induction on the derivation of <span class="math">\(\Gamma \vdash e : \tau\)</span>.</p><ul><li><p><span class="math">\(\Gamma \vdash n : \mathtt{nat}\)</span></p><p>This case is immediate.</p></li><li><p><span class="math">\(\frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau}\)</span></p><p>Lemma [lem:lookup-wfenv] tells us that <span class="math">\(\rho_1[i] = v\)</span> and <span class="math">\(v \in T [ \tau ] \rho_2\)</span> for some <span class="math">\(v\)</span>. We conclude by Lemma [lem:T-down-closed].</p></li><li><p><span class="math">\(\frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau}\)</span></p><p>After unraveling some definitions, for arbitrary <span class="math">\(f,v_1,v_2,v'_2\)</span> we can assume <span class="math">\(v_1 \in T [ \sigma ] \rho_2\)</span>, <span class="math">\(v_2 \in E [ e ](v_1{::}\rho)\)</span>, and <span class="math">\(v'_2 \sqsubseteq v_2\)</span>. We need to prove that <span class="math">\(v_2 \in T [ \tau ] (v_1{::}\rho)_2\)</span>.</p><p>We can show <span class="math">\(\vdash v_1{::}\rho : \sigma{::}\Gamma\)</span> and therefore, by the induction hypothesis, <span class="math">\(E [ e ] (v_1{::}\rho) \subseteq T [ \tau ] (v_1{::}\rho)_2\)</span>. So we conclude that <span class="math">\(v_2 \in T [ \tau ] (v_1{::}\rho)_2\)</span>.</p></li><li><p><span class="math">\(\frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau}\)</span></p><p>By the induction hypothesis, we have <span class="math">\(E [ e ] \rho \subseteq T [ \sigma\to\tau ] \rho_2\)</span> and <span class="math">\(E [ e' ] \rho \subseteq T [ \sigma ] \rho_2\)</span>. We conclude by Lemma [lem:fun-app].</p></li><li><p><span class="math">\(\frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau}\)</span></p><p>For an arbitrary <span class="math">\(v\)</span>, we may assume <span class="math">\(\mathsf{iterate}(E[ e ], \rho, v)\)</span> and need to show that <span class="math">\(v \in T [ \sigma\to\tau ]\rho_2\)</span>.</p><p>In preparation to apply Lemma [lem:iterate-sound], we first prove that for any <span class="math">\(v'\)</span>, <span class="math">\(v' \in T[ \sigma\to\tau ] \rho_2\)</span> implies <span class="math">\(E[ e](v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2\)</span>. Assume <span class="math">\(v'' \in E[ e](v'{::}\rho)\)</span>. We need to show that <span class="math">\(v'' \in T[ \sigma\to\tau ] \rho_2\)</span>. We have <span class="math">\(\vdash v'{::}\rho : (\sigma\to\tau){::}\Gamma\)</span>, so by the induction hypothesis <span class="math">\(E [ e ](v'{::}\rho) \subseteq T[ \sigma\to\tau ](v'{::}\rho)\)</span>. From this we conclude that <span class="math">\(v'' \in T[ \sigma\to\tau ] \rho_2\)</span>.</p><p>We then apply Lemma [lem:iterate-sound] to conclude this case.</p></li><li><p><span class="math">\(\frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e :: \forall\tau}\)</span></p><p>After unraveling some definitions, for an arbitrary <span class="math">\(v'\)</span> and <span class="math">\(V\)</span> we may assume that <span class="math">\(\forall V'.\, v' \in E [ e ](V{::}\rho)\)</span>. We need to show that <span class="math">\(v' \in T [ \tau ] (V{::}\rho_2)\)</span>. We have <span class="math">\(\vdash V{::}\rho : *{::}\Gamma\)</span>, so by the induction hypothesis <span class="math">\(E[ e ](V{::}\rho) \subseteq T [ \tau ] (V{::}\rho)_2\)</span>. Also, from the assumption we have <span class="math">\(v' \in E [ e ](V{::}\rho)\)</span>, so we can conclude.</p></li><li><p><span class="math">\(\frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}\)</span></p><p>Fix a <span class="math">\(v' \in E [ e ] \rho\)</span>. We have three cases to consider.</p><ol><li><p><span class="math">\(v'=\mathsf{abs}(\mathsf{none})\)</span>. This case is immediate.</p></li><li><p><span class="math">\(v'=\mathsf{abs}(\mathsf{some}(v''))\)</span> for some <span class="math">\(v''\)</span>. By the induction hypothesis, <span class="math">\(v' \in T [ \forall\tau ]\rho_2\)</span>. So we have <span class="math">\(v'' \in T [ \tau ](V{::}\rho_2)\)</span> where <span class="math">\(V=T[\sigma]\rho_2\)</span>. Then by Compositionality (Lemma [lem:compositionality]) we conclude that <span class="math">\(v'' \in T [ [0\mapsto \sigma]\tau]\rho_2\)</span>.</p></li><li><p><span class="math">\(v'\)</span> is some other kind of value. This can’t happen because, by the induction hypothesis, <span class="math">\(v' \in T [ \forall\tau ]\rho_2\)</span>.</p></li></ol></li></ul><div class="references"><h1>References</h1><p>Milner, Robin. 1978. “A Theory of Type Polymorphism in Programming.” <em>Journal of Computer and System Sciences</em> 17 (3): 348–75.</p><p>Pierce, Benjamin C. 2002. <em>Types and Programming Languages</em>. MIT Press.</p><p>Reynolds, John C. 1974. “Towards a Theory of Type Structure.” In <em>Programming Symposium: Proceedings, Colloque Sur La Programmation</em>, 19:408–25. LNCS. Springer-Verlag.</p></div>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com1tag:blogger.com,1999:blog-11162230.post-36638143877930090162017-03-24T11:20:00.000-07:002017-03-24T21:35:05.337-07:00Consolidation of the Denotational Semantics and an Application to Compiler Correctness<p>This is a two part post. The second part depends on the first. </p> <h3>Part 1. Consolidation of the Denotational Semantics</h3> <p>As a matter of expediency, I've been working with two different versions of the intersection type system upon which the denotational semantics is based, one version <a href="http://siek.blogspot.com/2017/01/completeness-of-intersection-types-wrt.html">with subsumption</a> and <a href="http://siek.blogspot.com/2017/03/sound-wrt-contextual-equivalence.html">one without</a>. I had used the one with subsumption to prove completeness with respect to the reduction semantics whereas I had used the one without subsumption to prove soundness (for both whole programs and parts of programs, that is, contextual equivalence). The two versions of the intersection type system are equivalent. However, it would be nice to simplify the story and just have one version. Also, while the correspondence to intersection types has been enormously helpful in working out the theory, it would be nice to have a presentation of the semantics that doesn't talk about them and instead talks about functions as tables. </p> <p>Towards these goals, I went back to the proof of completeness with respect to the reduction semantics and swapped in <a href="http://siek.blogspot.com/2017/03/the-take-3-semantics-revisited.html">the "take 3" semantics</a>. While working on that I realized that the subsumption rule was almost admissible in the "take 3" semantics, just the variable and application equations needed more uses of \(\sqsubseteq\). With those changes in place, the proof of completeness went through without a hitch. So here's the updated definition of the denotational semantics of the untyped lambda calculus. </p> <p>The definition of values remains the same as last time: \[ \begin{array}{lrcl} \text{function tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] as does the \(\sqsubseteq\) operator. \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} For the denotation function \(E\), we add uses of \(\sqsubseteq\) to the equations for variables (\(v \sqsubseteq \rho(x)\)) and function application (\(v_3 \sqsubseteq v_3'\)). (I've also added the conditional expression \(\mathbf{if}\,e_1\,e_2\,e_3\) and primitive operations on numbers \(f(e_1,e_2)\), where \(f\) ranges over binary functions on numbers.) \begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ v \mid v \sqsubseteq \rho(x) \} \\ E[\!| \lambda x.\, e |\!](\rho) &= \left\{ T \middle| \begin{array}{l} \forall v_1 v_2'. \, v_1\mapsto v_2' \in T \Rightarrow\\ \exists v_2.\, v_2 \in E[\!| e |\!](\rho(x{:=}v_1)) \land v_2' \sqsubseteq v_2 \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists T v_2 v_2' v_3'.\, T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, v'_2\mapsto v_3' \in T \land v'_2 \sqsubseteq v_2 \land v_3 \sqsubseteq v_3' \end{array} \right\} \\ E[\!| f(e_1, e_2) |\!](\rho) &= \{ f(n_1,n_2) \mid \exists n_1 n_2.\, n_1 \in E[\!| e_1 |\!](\rho) \land n_2 \in E[\!| e_2 |\!](\rho) \} \\ E[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](\rho) &= \left\{ v \, \middle| \begin{array}{l} v \in E[\!| e_2 |\!](\rho) \quad \text{if } n \neq 0 \\ v \in E[\!| e_3 |\!](\rho) \quad \text{if } n = 0 \end{array} \right\} \end{align*} </p> <p>Here are the highlights of the results for this definition. </p> <p><b>Proposition</b> (Admissibility of Subsumption)<br>If \(v \in E[\!| e |\!] \) and \(v' \sqsubseteq v\), then \(v' \in E[\!| e |\!] \). </p> <p><b>Theorem</b> (Reduction implies Denotational Equality)<br><ol><li>If \(e \longrightarrow e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\). <li> If \(e \longrightarrow^{*} e'\), then \(E[\!| e |\!] = E[\!| e' |\!]\). </ol></p> <p><b>Theorem</b> (Whole-program Soundness and Completeness)<br><ol><li> If \(v' \in E[\!| e |\!](\emptyset)\), then \(e \longrightarrow^{*} v\) and \(v' \in E[\!| v |\!](\emptyset)\). <li> If \(e \longrightarrow^{*} v\), then \(v' \in E[\!| e |\!](\emptyset) \) and \(v' \in E[\!| v |\!](\emptyset) \) for some \(v'\). </ol></p> <p><b>Proposition</b> (Denotational Equality is a Congruence)<br> For any context \(C\), if \(E[\!| e |\!] = E[\!| e' |\!]\), then \(E[\!| C[e] |\!] = E[\!| C[e'] |\!]\). </p> <p><b>Theorem</b> (Soundness wrt. Contextual Equivalence)<br> If \(E[\!| e |\!] = E[\!| e' |\!]\), then \(e \simeq e'\). </p> <h3>Part 2. An Application to Compiler Correctness</h3> <p>Towards finding out how useful this denotational semantics is, I've begun looking at using it to prove compiler correctness. I'm not sure exactly which compiler I want to target yet, but as a first step, I wrote a simple source-to-source optimizer \(\mathcal{O}\) for the lambda calculus. It performs inlining and constant folding and simplifies conditionals. The optimizer is parameterized over the inlining depth to ensure termination. We perform optimization on the body of a function after inlining, so this is a polyvariant optimizer. Here's the definition. \begin{align*} \mathcal{O}[\!| x |\!](k) &= x \\ \mathcal{O}[\!| n |\!](k) &= n \\ \mathcal{O}[\!| \lambda x.\, e |\!](k) &= \lambda x.\, \mathcal{O}[\!| e |\!](k) \\ \mathcal{O}[\!| e_1\,e_2 |\!](k) &= \begin{array}{l} \begin{cases} \mathcal{O}[\!| [x{:=}e_2'] e |\!] (k{-}1) & \text{if } k \geq 1 \text{ and } e_1' = \lambda x.\, e \\ & \text{and } e_2' \text{ is a value} \\ e_1' \, e_2' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| f(e_1,e_2) |\!](k) &= \begin{array}{l} \begin{cases} f(n_1,n_2) & \text{if } e_1' = n_1 \text{ and } e_2' = n_2 \\ f(e_1',e_2') & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](k) &= \begin{array}{l} \begin{cases} e_2' & \text{if } e_1' = n \text{ and } n \neq 0 \\ e_3' & \text{if } e_1' = n \text{ and } n = 0 \\ \mathbf{if}\,e_1'\, e_2'\,e_3'|\!](k) & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k)\\ \text{ and } e_3' = \mathcal{O}[\!|e_3 |\!](k) \end{array} \end{align*} </p> <p>I've proved that this optimizer is correct. The first step was proving that it preserves denotational equality. </p> <p><b>Lemma</b> (Optimizer Preserves Denotations) <br> \(E(\mathcal{O}[\!| e|\!](k)) = E[\!|e|\!] \) <br><b>Proof</b><br> The proof is by induction on the termination metric for \(\mathcal{O}\), which is the lexicographic ordering of \(k\) then the size of \(e\). All the cases are straightforward to prove because Reduction implies Denotational Equality and because Denotational Equality is a Congruence. <b>QED</b></p> <p><b>Theorem</b> (Correctness of the Optimizer)<br> \(\mathcal{O}[\!| e|\!](k) \simeq e\) <br><b>Proof</b><br> The proof is a direct result of the above Lemma and Soundness wrt. Contextual Equivalence. <b>QED</b></p> <p>Of course, all of this is proved in Isabelle. Here is the <a href="https://www.dropbox.com/s/98uhhrwhcc942d2/opt.tar.gz?dl=0">tar ball</a>. I was surprised that this proof of correctness for the optimizer was about the same length as the definition of the optimizer! </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-64616246801641788812017-03-10T20:31:00.000-08:002017-03-10T20:31:34.086-08:00The Take 3 Semantics, Revisited<p>In my post about <a href="http://siek.blogspot.com/2017/01/intersection-types-as-denotations.html">intersection types as denotations</a>, I conjectured that the simple <a href="http://siek.blogspot.com/2016/12/take-3-application-with-subsumption-for.html">"take 3" denotational semantics</a> is equivalent to an intersection type system. I haven't settled that question per se, but I've done something just as good, which is to show that everything that I've done with the intersection type system can also be done with the "take 3" semantics (with a minor modification). </p> <p>Recall that the main difference between the "take 3" semantics and the intersection type system is how subsumption of functions is handled. The "take 3" semantics defined function application as follows, using the subset operator \(\sqsubseteq\) to require the argument \(v_2\) to include all the entries in the parameter \(v'_2\), while allowing \(v_2\) to have possibly more entries. \begin{align*} E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists v_1 v_2 v'_2.\, v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, \{ v'_2\mapsto v_3 \} \sqsubseteq v_1 \land v'_2 \sqsubseteq v_2 \end{array} \right\} \end{align*} Values are either numbers or functions. Functions are represented as a finite tables mapping values to values. \[ \begin{array}{lrcl} \text{tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] and \(\sqsubseteq\) is defined as equality on numbers and subset for function tables: \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} Recall that \(\subseteq\) is defined in terms of equality on elements. </p> <p>In an intersection type system (without subsumption), function application uses subtyping. Here's one way to formulate the typing rule for application: \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A' \to B \quad A <: A'} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Types are defined as follows \[ \begin{array}{lrcl} \text{types} & A,B,C & ::= & n \mid A \to B \mid A \land B \mid \top \end{array} \] and the subtyping relation is given below. \begin{gather*} \frac{}{n <: n}(a) \quad \frac{}{\top <: \top}(b) \quad \frac{}{A \to B <: \top}(c) \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(d) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B}(e) \quad \frac{}{A \wedge B <: A}(f) \quad \frac{}{A \wedge B <: B}(g) \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)}(h) \end{gather*} Recall that values and types are isomorphic (and dual) to eachother in this setting. Here's the functions \(\mathcal{T}\) and \(\mathcal{V}\) that map back and forth between values and types. \begin{align*} \mathcal{T}(n) &= n \\ \mathcal{T}( \{ v_1 \mapsto v'_1, \ldots, v_n \mapsto v'_n \} ) &= \mathcal{T}(v_1) {\to} \mathcal{T}(v'_1) \land \cdots \land \mathcal{T}(v_n) {\to} \mathcal{T}(v'_n) \\[2ex] \mathcal{V}(n) &= n \\ \mathcal{V}(A \to B) &= \{ \mathcal{V}(A)\mapsto\mathcal{V}(B) \} \\ \mathcal{V}(A \land B) &= \mathcal{V}(A) \cup \mathcal{V}(B)\\ \mathcal{V}(\top) &= \emptyset \end{align*} </p> <p>Given that values and types are really the same, the the typing rule for application is almost the same as the equation for the denotation of \(E[\!| e_1\;e_2 |\!](\rho)\). The only real difference is the use of \(<:\) versus \(\sqsubseteq\). However, subtyping is a larger relation than \(\sqsubseteq\), i.e., \(v_1 \sqsubseteq v_2\) implies \(\mathcal{T}(v_1) <: \mathcal{T}(v_2)\) but it is not the case that \(A <: B\) implies \(\mathcal{V}(A) \sqsubseteq \mathcal{V}(B)\). Subtyping is larger because of rules \((d)\) and \((h)\). The other rules just express the dual of \(\subseteq\). </p> <p>So the natural question is whether subtyping needs to be bigger than \(\sqsubseteq\), or would we get by with just \(\sqsubseteq\)? In my <a href="http://siek.blogspot.com/2017/03/sound-wrt-contextual-equivalence.html">last post</a>, I mentioned that rule \((h)\) was not necessary. Indeed, I removed it from the Isabelle formalization without disturbing the proofs of whole-program soundness and completeness wrt. operational semantics, and was able to carry on and prove soundness wrt. contextual equivalence. This morning I also replaced rule \((d)\) with a rule that only allows equal function types to be subtypes. \[ \frac{}{A \to B <: A \to B}(d') \] The proofs went through again! Though I did have to make two minor changes in the type system without subsumption to ensure that it stays equivalent to the version of the type system with subsumption. I used the rule given above for function application instead of \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A \to B} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Also, I had to change the typing rule for \(\lambda\) to use subtyping to relate the body's type to the return type. \[ \frac{\Gamma,x:A \vdash e : B' \qquad B' <: B} {\Gamma \vdash \lambda x.\, e : A \to B} \] Transposing this back into the land of denotational semantics and values, we get the following equation for the meaning of \(\lambda\), in which everything in the return specification \(v_2\) must be contained in the value \(v'_2\) produced by the body. \[ E[\!| \lambda x.\; e |\!] (\rho) = \left\{ v \middle| \begin{array}{l}\forall v_1 v_2. \{v_1\mapsto v_2\} \sqsubseteq v \implies \\ \exists v_2'.\; v'_2 \in E[\!| e |\!] (\rho(x{:=}v_1)) \,\land\, v_2 \sqsubseteq v'_2 \end{array} \right\} \] </p> <p>So with this little change, the "take 3" semantics is a great semantics for the call-by-value untyped lambda calculus! For whole programs, it's sound and complete with respect to the standard operational semantics, and it is also sound with respect to contextual equivalence. </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-39490555625202145832017-03-08T08:59:00.002-08:002017-03-09T06:48:51.218-08:00Sound wrt. Contextual Equivalence<p>The ICFP paper submission deadline kept me busy for much of February, but now I'm back to thinking about the <a href="http://siek.blogspot.com/2016/12/simple-denotational-semantics-for.html">simple denotational semantics</a>of the lambda calculus. In previous posts I showed that this <a href="http://siek.blogspot.com/2017/01/completeness-of-intersection-types-wrt.html">semantics is equivalent to standard operational semantics</a> when considering the behavior of whole programs. However, sometimes it is necessary to reason about the behavior of program fragments and we would like to use the denotational semantics for this as well. For example, an optimizing compiler might want to exchange one expression for another less-costly expression that does the same job. </p> <p>The formal notion of two such ``exchangeable'' expressions is <b>contextual equivalence</b> (Morris 1968). It says that two expression are equivalent if plugging them into an arbitrary context produces programs that behave the same. </p> <p><b>Definition</b> (Contextual Equivalence)<br> Two expressions \(e_1\) and \(e_2\) are contextually equivalent, written \(e_1 \simeq e_2\), iff for any closing context \(C\), \[ \mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]). \] </p> <p>We would like to know that when two expressions are denotationally equal, then they are also contextually equivalent. </p><p><b>Theorem</b> (Sound wrt. Contextual Equivalence)<br>If \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\), then \(e_1 \simeq e_2\). <br> <p>The rest of the blog post gives an overview of the proof (except for the discussion of related work at the very end). The details of the proof are in the Isabelle <a href="https://www.dropbox.com/s/yzrx8fgw4y9lvg4/SoundWRTContext.tar?dl=0">mechanization</a>. But first we need to define the terms used in the above statements. </p> <h3>Definitions</h3> <p>Recall that our denotational semantics is defined in terms of an intersection type system. The meaning of an expression is the set of all types assigned to it by the type system. \[ E[e]\Gamma \equiv \{ A \mid \Gamma \vdash_2 e : A \} \] Recall that the types include singletons, functions, intersections, and a top type: \[ A,B,C ::= n \mid A \to B \mid A \land B \mid \top \] I prefer to think of these types as values, where the function, intersection, and top types are used to represent finite tables that record the input-output values of a function. </p> <p>The intersection type system that we use here differs from the one in the <a href="http://siek.blogspot.com/2017/01/completeness-of-intersection-types-wrt.html">previous post</a> in that we remove the subsumption rule and sprinkle uses of subtyping elsewhere in a standard fashion (Pierce 2002). </p> \begin{gather*} \frac{}{\Gamma \vdash_2 n : n} \\[2ex] \frac{} {\Gamma \vdash_2 \lambda x.\, e : \top} \quad \frac{\Gamma \vdash_2 \lambda x.\, e : A \quad \Gamma \vdash_2 \lambda x.\, e : B} {\Gamma \vdash_2 \lambda x.\, e : A \wedge B} \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash_2 x : A} \quad \frac{\Gamma,x:A \vdash_2 B} {\Gamma \vdash_2 \lambda x.\, e : A \to B} \\[2ex] \frac{\Gamma \vdash_2 e_1: C \quad C <: A \to B \quad \Gamma \vdash_2 e_2 : A} {\Gamma \vdash_2 e_1 \; e_2 : B} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash_2 e_1 : A \quad A <: n_1 \\ \Gamma \vdash_2 e_2 : B \quad B <: n_2 \end{array} \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash_2 \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: 0 \quad \Gamma \vdash_2 e_3 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: n \quad n \neq 0 \quad \Gamma \vdash_2 e_2 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \end{gather*} <p>Regarding subtyping, we make a minor change and leave out the rule \[ \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \] because I had a hunch that it wasn't needed to prove Completeness with respect to the small step semantics, and indeed it was not. So the subtyping relation is defined as follows. </p> \begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \end{gather*} <p>This type system is equivalent to the one with subsumption in the following sense. </p> <p><b>Theorem</b> (Equivalent Type Systems) <br><ol><li> If \(\Gamma \vdash e : A\), then \(\Gamma \vdash_2 e : A'\) and \(A' <: A\) for some \(A'\). <li> If \(\Gamma \vdash_2 e : A\), then \(\Gamma \vdash e : A\). </ol><b>Proof</b> <br>The proofs of the two parts are straightforward inductions on the derivations of the typing judgments. <b>QED</b></p> <p>This type system satisfies the usual progress and preservation properties. </p> <p><b>Theorem</b> (Preservation)<br>If \(\Gamma \vdash_2 e : A\) and \(e \longrightarrow e'\), then \(\Gamma \vdash_e e' : A'\) and \(A' <: A\) for some \(A'\). <br><b>Proof</b><br>The proof of preservation is by induction on the derivation of the reduction. The case for \(\beta\) reduction relies on lemmas about substitution and type environments. <b>QED</b></p> <p><b>Theorem</b> (Progress)<br>If \(\emptyset \vdash_2 e : A\) and \(\mathrm{FV}(e) = \emptyset\), then \(e\) is a value or \(e \longrightarrow e'\) for some \(e'\). <br><b>Proof</b><br>The proof of progress is by induction on the typing derivation. As usual it relies on a canonical forms lemma. <b>QED</b></p> <p><b>Lemma</b> (Canonical forms) <br> Suppose \(\emptyset \vdash_2 v : A\). <ol><li> If \(A <: n\), then \(v = n\). <li> If \(A <: B \to C\), then \(v = \lambda x.\, e\) for some \(x,e\). </ol></p> <p>Next we turn to the definition of \(\mathit{eval}\). As usual, we shall define the behavior of a program in terms of the operational (small-step) semantics and an \(\mathit{observe}\) function. \begin{align*} \mathit{eval}(e) &= \begin{cases} \mathit{observe}(v) & \text{if } e \longrightarrow^{*} v \\ \mathtt{bad} & \text{otherwise} \end{cases}\\ \mathit{observe}(n) &= n \\ \mathit{observe}(\lambda x.\, e) &= \mathtt{fun} \end{align*} In the above we categorize programs as \(\mathtt{bad}\) if they do not produce a value. Thus, we are glossing over the distinction between programs that diverge and programs that go wrong (e.g., segmentation fault). We do this because our denotational semantics does not make such a distinction. However, I plan to circle back to this issue in the future and develop a version of the semantics that does. </p> <h3>Soundness wrt. Contextual Equivalence</h3> <p>We assume that \(E[e_1]\Gamma = E[e_2]\Gamma\) for any \(\Gamma\) and need to show that \(e_1 \simeq e_2\). That is, we need to show that \(\mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]) \) for any closing context \(C\). We shall prove Congruence which lets us lift the denotational equality of \(e_1\) and \(e_2\) through any context, so we have \begin{equation} E[C[e_1]]\emptyset = E[C[e_2]]\emptyset \qquad\qquad (1) \end{equation} Now let us consider the cases for \(\mathsf{eval}(C[e_1])\). <ul><li>Case \(\mathsf{eval}(C[e_1]) = \mathit{observe}(v)\) and \(C[e_1] \longrightarrow^{*} v\): <br> By Completeness of the intersection type system we have \(\emptyset \vdash_2 C[e_1] : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\). Then with (1) we have \begin{equation} \emptyset \vdash_2 C[e_2] : A \qquad\qquad (2) \end{equation} The type system is sound wrt. the big-step semantics, so \(\emptyset \vdash C[e_2] \Downarrow v'\) for some \(v'\). Therefore \(C[e_2] \longrightarrow^{*} v''\) because the big-step semantics is sound wrt. the small-step semantics. It remains to show that \(\mathit{observe}(v'') = \mathit{observe}(v)\). From (2) we have \(\emptyset \vdash_2 v'' : A''\) for some \(A''\) where \(A'' <: A\), by Preservation. Noting that we already have \(\emptyset \vdash_2 v : A'\), \(\emptyset \vdash_2 v'' : A''\), \(A' <: A\), and \(A'' <: A\), we conclude that \(\mathit{observe}(v) = \mathit{observe}(v'')\) by the Lemma Observing values of subtypes. <li>Case \(\mathsf{eval}(C[e_1]) = \mathtt{bad}\):<br> So \(C[e_1]\) either diverges or gets stuck. In either case, we have \(E[C[e_1]]\emptyset = \emptyset \) (Lemmas Diverging programs have no meaning and Programs that get stuck have no meaning). So by (1) we have \(E[C[e_2]]\emptyset = \emptyset\). We conclude that \(C[e_2]\) either diverges or gets stuck by Lemma (Programs with no meaning diverge or get stuck). Thus, \(\mathsf{eval}(C[e_2]) = \mathtt{bad}\). </ul><b>QED</b></p> <p><b>Lemma</b> (Congruence) <br>Let \(C\) be an arbitrary context. If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[C[e_1]]\Gamma = E[C[e_2]]\Gamma\). <br><b>Proof</b><br>We prove congruence by structural induction on the context \(C\), using the induction hypothesis and the appropriate Compatibility lemma for each kind of expression. <b>QED</b></p> <p>Most of the Compatibility lemmas are straightforward, though the one for abstraction is worth discussing. </p> <p><b>Lemma</b> (Compatibility for abstraction) <br>If \(E[e_1]\Gamma' = E[e_2]\Gamma'\) for any \(\Gamma'\), then \(E[\lambda x.\, e_1]\Gamma = E[\lambda x.\, e_2]\Gamma\). <br><b>Proof</b><br>To prove compatibility for abstractions, we first prove that <blockquote>If \(\Gamma' \vdash_2 e_1 : B\) implies \(\Gamma' \vdash_2 e_2 : B\) for any \(\Gamma',B\), then \(\Gamma \vdash_2 \lambda x.\, e_1 : C\) implies \(\Gamma \vdash_2 \lambda x.\, e_2 : C\). </blockquote>This is a straightforward induction on the type \(C\). Compatibility follows by two uses this fact. <b>QED</b></p> <p><b>Theorem</b> (Completeness wrt. small-step semantics) If \(e \longrightarrow^{*} v\) then \(\emptyset \vdash_2 e : A\) and \(\emptyset \vdash_2 v : A'\) for some \(A,A'\) such that \(A' <: A\). <br><b>Proof</b><br>We have \(\emptyset \vdash e : B\) and \(\emptyset \vdash v : B\) by Completeness of the type system with subsumption. Therefore \(\emptyset \vdash_2 e : A\) and \(A <: B\) by Theorem Equivalent Type Systems. By preservation we conclude that \(\emptyset \vdash_2 v : A'\) and \(A' <: A\). <b>QED</b></p> <p>In a <a href="http://siek.blogspot.com/2016/12/simple-denotational-semantics-for.html">previous blog post</a>, we proved soundness with respect to big-step semantics for a slightly different denotational semantics. So we update that proof for the denotational semantics defined above. We shall make use of the following logical relation \(\mathcal{G}\) in this proof. \begin{align*} G[n] &= \{ n \} \\ G[A \to B] &= \{ \langle \lambda x.\, e, \rho \rangle \mid \forall v \in G[A]. \; \rho(x{:=}v) \vdash e \Downarrow v' \text{ and } v' \in G[B] \} \\ G[A \land B] &= G[A] \cap G[B] \\ G[\top] &= \{ v \mid v \in \mathrm{Values} \} \\ \\ G[\emptyset] &= \{ \emptyset \} \\ G[\Gamma,x:A] &= \{ \rho(x{:=}v) \mid v \in G[A] \text{ and } \rho \in G[\Gamma] \} \end{align*} </p> <p>We shall need two lemmas about this logical relation. </p> <p><b>Lemma</b> (Lookup in \(\mathcal{G}\)) <br>If \(x:A \in \Gamma\) and \(\rho \in G[\Gamma]\), then \(\rho(x) = v\) and \(v \in G[A]\). <br></p> <p><b>Lemma</b> (\(\mathcal{G}\) preserves subtyping ) <br>If \(A <: B\) and \(v \in G[A]\), then \(v \in G[B]\). </p> <p><b>Theorem</b> (Soundness wrt. big-step semantics) <br>If \(\Gamma \vdash_2 e : A\) and \(\rho \in G[\Gamma]\), then \(\rho \vdash e \Downarrow v\) and \(v \in G[A]\). <br><b>Proof</b><br>The proof is by induction on the typing derivation. The case for variables uses the Lookup Lemma and all of the elimination forms use the above Subtyping Lemma (because their typing rules use subtyping). <b>QED</b></p> <p><b>Lemma</b> (Observing values of subtypes) <br>If \(\emptyset \vdash_2 v : A\), \(\emptyset \vdash_2 v' : B\), \(A <: C\), and \(B <: C\), then \(\mathit{observe}(v) = \mathit{observe}(v')\). <br><b>Proof</b><br>The proof is by cases of \(v\) and \(v'\). We use Lemmas about the symmetry of subtyping for singletons, an inversion lemma for functions, and that subtyping preserves function types. <b>QED</b></p> <p><b>Lemma</b> (Subtyping symmetry for singletons) If \(n <: A\), then \(A <: n\). </p> <p>For the next lemma we need to characterize the types for functions. \begin{gather*} \frac{}{\mathit{fun}(A \to B)} \quad \frac{\mathit{fun}(A) \qquad \mathit{fun}(B)} {\mathit{fun}(A \land B)} \quad \frac{}{\mathit{fun}(\top)} \end{gather*} </p> <p><b>Lemma</b> (Inversion on Functions) <br>If \(\Gamma \vdash_2 \lambda x.\, e : A\), then \(\mathit{fun}(A)\). </p> <p><b>Lemma</b> (Subtyping preserves functions)<br>If \(A <: B\) and \(\mathit{fun}(A)\), then \(\mathit{fun}(B)\). </p> <p><b>Lemma</b> (Diverging Programs have no meaning) <br>If \(e\) diverges, then \(E[e]\emptyset = \emptyset\). <br><b>Proof</b><br>Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Then by soundness wrt. big-step semantics, we have \(\emptyset \vdash e \Downarrow v\) and so also \(e \longrightarrow^{*} v'\). But this contradicts the premise that \(e\) diverges. <b>QED</b></p> <p><b>Lemma</b> (Programs that get stuck have no meaning) <br>Suppose that \(e \longrightarrow^{*} e'\) and \(e'\) is stuck (and not a value). Then \(E[e]\emptyset = \emptyset\). <br><b>Proof</b><br>Towards a contradiction, suppose \(E[e]\emptyset \neq \emptyset\). Then we have \(\emptyset \vdash_2 e : A\) for some \(A\). Therefore \(\emptyset \vdash_2 e' : A'\) for some \(A' <: A\). By Progress, either \(e'\) is a value or it can take a step. But that contradicts the premise. <b>QED</b></p> <p><b>Lemma</b> (Programs with no meaning diverge or gets stuck) <br>If \(E[e]\emptyset = \emptyset\), then \(e\) diverges or reduces to a stuck non-value. <br><b>Proof</b><br>Towards a contradiction, suppose that \(e\) does not diverge and does not reduce to a stuck non-value. So \(e \longrightarrow^{*} v\) for some \(v\). But then by Completeness wrt. the small-step semantics, we have \(\emptyset \vdash_2 e : A\) for some \(A\), which contradicts the premise \(E[e]\emptyset = \emptyset\). <b>QED</b></p> <h3>Related Work</h3> <p>The proof method used here, of proving Compatibility and Congruence lemmas to show soundness wrt. contextual equivalence, is adapted from Gunter's book (1992), where he proves that the standard model for PCF (CPO's and continuous functions) is sound. This approach is also commonly used to show that logical relations are sound wrt. contextual equivalence (Pitts 2005). </p> <p>The problem of <b>full abstraction</b> is to show that denotational equivalence is both sound (aka. correct): \[ E[e_1] = E[e_2] \qquad \text{implies} \qquad e_1 \simeq e_2 \] and complete: \[ e_1 \simeq e_2 \qquad \text{implies} \qquad E[e_1] = E[e_2] \] with respect to contextual equivalence (Milner 1975). Here we showed that the simple denotational semantics is sound. I do not know whether it is complete wrt. contextual equivalence. </p> <p>There are famous examples of denotational semantics that are not complete. For example, the standard model for PCF is not complete. There are two expressions in PCF that are contextually equivalent but not denotationally equivalent (Plotkin 1977). The idea behind the counter-example is that parallel-or cannot be defined in PCF, but it can be expressed in the standard model. The two expressions are higher-order functions constructed to behave differently only when applied to parallel-or. </p> <p>Rocca and Paolini (2004) define a filter model \(\mathcal{V}\) for the call-by-value lambda calculus, similar to our simple denotational semantics, and prove that it is sound wrt. contextual equivalence (Theorem 12.1.18). Their type system and subtyping relation differs from ours in several ways. Their \(\land\,\mathrm{intro}\) rule is not restricted to \(\lambda\), they include subsumption, their \(\top\) type is a super-type of all types (not just function types), they include the distributivity rule discussed at the beginning of this post, and they include a couple other rules (labeled \((g)\) and \((v)\) in Fig. 12.1). I'm not sure whether any of these differences really matter; the two systems might be equivalent. Their proof is quite different from ours and more involved; it is based on the notion of approximants. They also show that \(\mathcal{V}\) is incomplete wrt. contextual equivalence, but go on to create another model based on \(\mathcal{V}\) that is. The fact that \(\mathcal{V}\) is incomplete leads me suspect that \(\mathcal{E}\) is also incomplete. This is certainly worth looking into. </p> <p>Abramsky (1990) introduced a <i>domain logic</i> whose formulas are intersetion types: \[ \phi ::= \top \mid \phi \land \phi \mid \phi \to \phi \] and whose proof theory is an intersection type system designed to capture the semantics of the lazy lambda calculus. Abramsky proves that it is sound with respect to contextual equivalence. As far as I can tell, the proof is different than the approach used here, as it shows that the domain logic is sound with respect to a denotational semantics that solves the domain equation \(D = (D \to D)_\bot\), then shows that this denotational semantics is sound wrt. contextual equivalence. (See also Alan Jeffrey (1994).) </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-16068822707943596252017-02-05T12:39:00.000-08:002017-03-09T06:19:22.896-08:00On the Meaning of Casts and Blame for Gradual Typing<p>Gradually typed languages enable programmers to choose which parts of their programs are statically typed and which parts are dynamically typed. Thus, gradually typed languages perform some type checking at compile time and some type checking at run time. When specifying the semantics of a gradually typed language, we usually express the run time checking in terms of <em>casts</em>. Thus, the semantics of a gradually typed language depends crucially on the semantics of casts. This blog post tries to answer the question: "What is a cast?" </p> <h2>Syntax and Static Semantics of Casts</h2> <p>Syntactically, a cast is an expression of the form \[ e : A \Rightarrow^{\ell} B \] where \(e\) is a subexpression; \(A\) and \(B\) are the source and target types, respectively. The \(\ell\) is what we call a blame label, which records the location of the cast (e.g. line number and character position). </p> <p>Regarding the static semantics (compile-time type checking), a cast enables the sub-expression \(e\) of static type \(A\) to be used in a context expecting a different type \(B\). \[ \frac{\Gamma \vdash e : A} {\Gamma \vdash (e : A \Rightarrow^{\ell} B) : B} \] In gradual typing, \(A\) and \(B\) typically differ in how "dynamic" they are but are otherwise similar to each other. So we often restrict the typing rule for casts to only allow source and target types that have some values in common, that is, when \(A\) and \(B\) are consistent. \[ \frac{\Gamma \vdash e : A \quad A \sim B} {\Gamma \vdash (e : A \Rightarrow^{\ell} B) : B} \] For example, if we let \(\star\) be the unknown type (aka. \(\mathtt{dynamic}\)), then we have \(\mathtt{Int} \sim \star\) and \(\star \sim \mathtt{Int}\) but \(\mathtt{Int} \not\sim \mathtt{Int}\to\mathtt{Int}\). Here are the rules for consistency with integers, functions, and the dynamic type. \begin{gather*} \mathtt{Int} \sim \mathtt{Int} \qquad \frac{A \sim B \qquad A' \sim B'} {A \to A' \sim B \to B'} \qquad A \sim \star \qquad \star \sim B \end{gather*} </p> <h2>Dynamic Semantics of Casts</h2> <p>The dynamic semantics of a cast is to check whether the value produced by subexpression \(e\) is of the target type \(B\) and if so, return the value; otherwise signal an error. The following is a strawman denotational semantics that expresses this basic intuition about casts. Suppose we have already defined the meaning of types, so \(\mathcal{T}[\!| A |\!]\) is the set of values of type \(A\). The meaning function \(\mathcal{E}[\!| e |\!]\) maps an expression to a result (either a value \(v\) or error \(\mathsf{blame}\,\ell\)). \begin{align*} \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!] &= \begin{cases} v & \text{if } v \in \mathcal{T}[\!| B |\!] \\ \mathsf{blame}\,\ell & \text{if } v \notin \mathcal{T}[\!| B |\!] \end{cases} \\ & \text{where } v = \mathcal{E} [\!| e |\!] \end{align*} </p> <p>If we restrict ourselves to first-order types such as \(\mathtt{Int}\), it is straightforward to define \(\mathcal{T}\) and check whether a value is in the set. \begin{align*} \mathcal{T}[\!| \mathtt{Int} |\!] &= \mathbb{Z} \end{align*} The story for function types, that is, for \(A \to B\), is more complicated. In a denotational setting, it traditionally takes sophisticated mathematics to come up with mathematical entities that can serve as function values when the \(\star\) type is involved (Scott 1970, 1976). The primary challenge is that one cannot simply use the usual notion of a mathematical function to represent function values because of a cardinality problem. Suppose that \(D\) is the set of all values. The set of mathematical functions whose domain and codomain is \(D\) is necessarily larger than \(D\), so the mathematical functions cannot fit into the set of all values. There is nothing wrong with sophisticated mathematics per se, but when it comes to using a specification for communication (e.g. between language designers and compiler writers), it is less desirable to require readers of the specification to fully understand a large number of auxiliary definitions and decide whether those definitions match their intuitions. </p> <h3>Competing Operational Semantics for Casts</h3> <p>We'll come back to denotational semantics in a bit, but first let's turn to operational semantics, in particular reduction semantics, which is what the recent literature uses to explains casts and the type \(\star\) (Gronski 2006, Siek 2007, Wadler 2009). In a reduction semantics, we give rewrite rules to say what happens when a syntactic value flows into a cast, that is, we say what expression the cast reduces to. Recall that a syntactic value is just an expression that cannot be further reduced. We can proceed by cases on the consistency of the source type \(A\) and target type \(B\). </p><ul><li>Case \((v : \mathtt{Int} \Rightarrow^{\ell} \mathtt{Int})\). This one is easy, the static type system ensures that \(v\) has type \(\mathtt{Int}\), so there is nothing to check and we can rewrite to \(v\). \[ v : \mathtt{Int} \Rightarrow^{\ell} \mathtt{Int} \longrightarrow v \] </li><li>Case \((v : \star \Rightarrow^{\ell} \star)\). This one is also easy. \[ v : \star \Rightarrow^{\ell} \star \longrightarrow v \] </li><li>Case \((v : A \to A' \Rightarrow^{\ell} B \to B')\). This one is more complicated. We'd like to check that the function \(v\) has type \(B \to B'\). Suppose \(B'=\mathtt{Int}\). How can we determine whether a function returns an integer? In general, that's just as hard as the halting problem, which is undecidable. So instead of checking now, we'll delay the checking until when the function is called. We can accomplish this by rewriting to a lambda expression that casts the input, calls \(v\), and then casts the output. \[ v : A \to A' \Rightarrow^{\ell} B \to B' \longrightarrow \lambda x{:}B. (v \; (x : B \Rightarrow^{\ell} A)) : A' \Rightarrow^{\ell} B' \] Here we see the importance of attaching blame labels to casts. Because of the delayed checking, the point of error can be far removed from the original source code location, but thanks to the blame label we can point back to the source location of the cast that ultimately failed (Findler and Felleisen 2002). </li><li>Case \((v : A \Rightarrow^{\ell} \star)\). For this one there are multiple options in the literature. One option is declare this as a syntactic value (Siek 2009), so no rewrite rule is necessary. Another option is to factor all casts to \(\star\) through the ground types \(G\): \[ G ::= \mathtt{Int} \mid \star \to \star \] Then we expand the cast from \(A\) to \(\star\) into two casts that go through the unique ground type for \(A\). \begin{align*} v : A \Rightarrow^{\ell} \star &\longrightarrow (v : A \Rightarrow^{\ell} G) : G \Rightarrow^{\ell} \star\\ & \text{where } A \sim G, A \neq G, A \neq \star \end{align*} and then declare that expressions of the form \((v : G \Rightarrow^{\ell} \star)\) are values (Wadler 2009). </li><li>Case \((v : \star \Rightarrow^{\ell} B)\). There are multiple options here as well, but the choice is linked to the above choice regarding casting from \(A\) to \(\star\). If \(v = (v' : A \Rightarrow^{\ell'} \star)\), then we need the following rewrite rules \begin{align*} (v' : A \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} B &\longrightarrow v' : A \Rightarrow^{\ell} B & \text{if } A \sim B \\[2ex] (v' : A \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} B &\longrightarrow \mathsf{blame}\,\ell & \text{if } A \not\sim B \end{align*} On the other hand, if we want to factor through the ground types, we have the following reduction rules. \begin{align*} v : \star \Rightarrow^{\ell} B &\longrightarrow v : \star \Rightarrow^{\ell} G \Rightarrow^{\ell} B \\ & \text{if } B \sim G, B \neq G, B \neq \star \\[2ex] (v : G \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} G &\longrightarrow v \\[2ex] (v : G \Rightarrow^{\ell'} \star) : \star \Rightarrow^{\ell} G' &\longrightarrow \mathsf{blame}\,\ell\\ & \text{if } G \neq G' \end{align*} </li></ul> <p>Given that we have multiple options regarding the reduction semantics, an immediate question is whether it matters, that is, can we actually observe different behaviors for some program? Yes, in the following example we cast the identity function on integers to an incorrect type. \begin{equation} \begin{array}{l} \mathtt{let}\, id = (\lambda x{:}\mathtt{Int}. x)\, \mathtt{in}\\ \mathtt{let}\, f = (id : \mathtt{Int}\to \mathtt{Int} \Rightarrow^{\ell_1} \star) \, \mathtt{in} \\ \mathtt{let}\, g = (f : \star \Rightarrow^{\ell_2} (\mathtt{Int}\to \mathtt{Int}) \to \mathtt{Int})\,\mathtt{in} \\ \quad g \; id \end{array} \tag{P0}\label{P0} \end{equation} If we choose the semantics that factors through ground types, the above program reduces to \(\mathsf{blame}\, \ell_1\). If we choose the other semantics, the above program reduces to \(\mathsf{blame}\, \ell_2\). Ever since around 2008 I've been wondering which of these is correct, though for the purposes of full disclosure, I've always felt that \(\mathsf{blame}\,\ell_2\) was the better choice for this program. I've also been thinking for a long time that it would be nice to have some alternative, hopefully more intuitive, way to specify the semantics of casts, with which we could then compare the above two alternatives. </p> <h3>A Denotational Semantics of Functions and Casts</h3> <p>I've recently found out that there is a simple way to represent function values in a denotational semantics. The intuition is that, although a function may be able to deal with an infinite number of different inputs, the function only has to deal with a finite number of inputs on any one execution of the program. Thus, we can represent functions with finite tables of input-output pairs. An empty table is written \(\emptyset\), a single-entry table has the form \(v \mapsto v'\) where \(v\) is the input and \(v'\) is the corresponding output. We build a larger table out of two smaller tables \(v_1\) and \(v_2\) with the notation \(v_1 \sqcup v_2\). So, with the addition of integer values \(n \in \mathbb{Z}\), the following grammar specifies the values. \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \] </p> <p>Of course, we can't use just one fixed-size table as the denotation of a lambda expression. Depending on the context of the lambda, we may need a bigger table that handles more inputs. Therefore we map each lambda expression to the set of all finite tables that jive with that lambda. To be more precise, we shall define a meaning function \(\mathcal{E}\) that maps an expression and an environment to a set of values, and an auxiliary function \(\mathcal{F}\) that determines whether a table jives with a lambda expression in a given environment. Here's a first try at defining \(\mathcal{F}\). \begin{align*} \mathcal{F}(n, \lambda x{:}A. e, \rho) &= \mathsf{false} \\ \mathcal{F}(\emptyset, \lambda x{:}A. e, \rho) &= \mathsf{true} \\ \mathcal{F}(v \mapsto v', \lambda x{:}A. e, \rho) &= \mathcal{T}(A,v) \text{ and } v' \in \mathcal{E}[\!| e |\!]\rho(x{:=}v) \\ \mathcal{F}(v_1 \sqcup v_2, \lambda x{:}A. e, \rho) &= \mathcal{F}(v_1, \lambda x{:}A. e, \rho) \text{ and } \mathcal{F}(v_2, \lambda x{:}A. e, \rho) \end{align*} (We shall define \(\mathcal{T}(A,v)\) shortly.) We then define the semantics of a lambda-expression in terms of \(\mathcal{F}\). \[ \mathcal{E}[\!| \lambda x{:}A.\, e|\!]\rho = \{ v \mid \mathcal{F}(v, \lambda x{:}A. e, \rho) \} \] The semantics of function application is essentially that of table lookup. We write \((v_2 \mapsto v) \sqsubseteq v_1\) to say, roughly, that \(v_2 \mapsto v\) is an entry in the table \(v_1\). (We give the full definition of \(\sqsubseteq\) in the Appendix.) \[ \mathcal{E}[\!| e_1 \, e_2 |\!]\rho = \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \text{ and } v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \text{ and } (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \] Finally, to give meaning to lambda-bound variables, we simply look them up in the environment. \[ \mathcal{E}[\!| x |\!]\rho = \{ \rho(x) \} \] </p> <p>Now that we have a good representation for function values, we can talk about giving meaning to higher-order casts, that is, casts from one function type to another. Recall that in our strawman semantics, we got stuck when trying to define the meaning of types in the form of map \(\mathcal{T}\) from a type to a set of values. Now we can proceed based on the above definition of values \(v\). (To make the termination of \(\mathcal{T}\) more obvious, we'll instead define \(\mathcal{T}\) has a map from a type and a value to a Boolean. The measure is a lexicographic ordering on the size of the type and then the size of the value.) \begin{align*} \mathcal{T}(\mathtt{Int}, v) &= (\exists n. \; v = n) \\ \mathcal{T}(\star, v) &= \mathsf{true} \\ \mathcal{T}(A \to B, n) &= \mathsf{false} \\ \mathcal{T}(A \to B, \emptyset) &= \mathsf{true} \\ \mathcal{T}(A \to B, v \mapsto v') &= \mathcal{T}(A, v) \text{ and } \mathcal{T}(B, v') \\ \mathcal{T}(A \to B, v_1 \sqcup v_2) &= \mathcal{T}(A \to B, v_1) \text{ and } \mathcal{T}(A \to B, v_2) \end{align*} With \(\mathcal{T}\) defined, we define the meaning to casts as follows. \begin{align*} \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!]\rho &= \{ v \mid v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \mathcal{T}(B, v) \}\\ & \quad\; \cup \left\{ \mathsf{blame}\,\ell \middle| \begin{array}{l} \exists v.\; v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \neg \mathcal{T}(B, v)\\ \text{and } (\forall l'. v \neq \mathsf{blame}\,l') \end{array}\right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in \mathcal{E} [\!| e |\!]\rho \} \end{align*} This version says that the result of the cast should only be those values of \(e\) that also have type \(B\). It also says that we signal an error when a value of \(e\) does not have type \(B\). Also, if there was an error in \(e\) then we propagate it. The really interesting thing about this semantics is that, unlike the reduction semantics, we actually check functions at the moment they go through the cast, instead of delaying the check to when they are called. We immediately determine whether the function is of the target type. If the function is not of the target type, we can immediately attribute blame to this cast, so there is no need for complex blame tracking rules. </p> <p>Of course, we need to extend values to include blame: \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \mid \mathsf{blame}\,\ell \] and augment \(\mathcal{T}\) and \(\mathcal{F}\) to handle \(\mathsf{blame}\,\ell\). \begin{align*} \mathcal{T}(A\to B, \mathsf{blame}\,\ell) &= \mathsf{false} \\ \mathcal{F}(\mathsf{blame}\,\ell, \lambda x{:}A.e, \rho) &= \mathsf{false} \end{align*} To propagate errors to the meaning of the entire program, we augment the meaning of other language forms, such as function application to pass along blame. \begin{align*} \mathcal{E}[\!| e_1 \, e_2 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \text{ and } v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \text{and } (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\, \ell \mid \mathsf{blame}\, \ell \in \mathcal{E}[\!| e_1 |\!]\rho \text{ or } \mathsf{blame}\,\ell \in \mathcal{E}[\!| e_2 |\!]\rho\} \end{align*} </p> <h3>Two Examples</h3> <p>Let us consider the ramifications of this semantics. The following example program creates a function \(f\) that returns \(1\) on non-zero input and returns the identity function when applied to \(0\). We cast this function to the type \(\mathtt{Int}\to\mathtt{Int}\) on two separate occasions, cast \(\ell_3\) and cast \(\ell_4\), to create \(g\) and \(h\). We apply \(g\) to \(1\) and \(h\) to its result. \[ \begin{array}{l} \mathtt{let}\,f = \left(\lambda x:\mathtt{Int}.\; \begin{array}{l} \mathtt{if}\, x \,\mathtt{then}\, (0: \mathtt{Int}\Rightarrow^{\ell_1}\,\star)\\ \mathtt{else}\, ((\lambda y:\mathtt{Int}.\; y) : \mathtt{Int}\to\mathtt{Int}\Rightarrow{\ell_2} \, \star) \end{array} \right) \; \mathtt{in} \\ \mathtt{let}\,g = (f : \mathtt{Int}\to\star \Rightarrow^{\ell_3} \mathtt{Int}\to\mathtt{Int})\, \mathtt{in} \\ \mathtt{let}\,h = (f : \mathtt{Int}\to\star \Rightarrow^{\ell_4} \mathtt{Int}\to\mathtt{Int})\, \mathtt{in} \\ \mathtt{let}\,z = (g \; 1)\, \mathtt{in} \\ \quad (h\; z) \end{array} \] The meaning of this program is \(\{ \mathsf{blame}\,\ell_3, \mathsf{blame}\,\ell_4\}\). To understand this outcome, we can analyze the meaning of the various parts of the program. (The semantics is compositional!) Toward writing down the denotation of \(f\), let's define auxiliary functions \(id\) and \(F\). \begin{align*} id(n) &= \mathsf{false} \\ id(\emptyset) &= \mathsf{true} \\ id(v \mapsto v') &= (v = v') \\ id(v_1 \sqcup v_2) &= id(v_1) \text{ and } id(v_2) \\ id(\mathsf{blame}\,\ell) &= \mathsf{false} \\ \\ F(n) &= \textsf{false} \\ F(\emptyset) &= \textsf{true} \\ F(0 \mapsto v) &= \mathit{id}(v) \\ F(n \mapsto 0) &= (n \neq 0)\\ F(v_1 \sqcup v_2) &= F(v_1) \text{ and } F(v_2) \\ F(\mathsf{blame}\,\ell) &= \mathsf{false} \end{align*} The denotation of \(f\) is \[ \mathcal{E}[\!| f |\!] = \{ v \mid F(v) \} \] To express the denotation of \(g\), we define \(G\) \begin{align*} G(n) &= \textsf{false} \\ G(\emptyset) &= \textsf{true} \\ G(n \mapsto 0) &= (n \neq 0) \\ G(v_1 \sqcup v_2) &= G(v_1) \text{ and } G(v_2) \\ G(\mathsf{blame}\,\ell) &= \mathsf{false} \end{align*} The meaning of \(g\) is all the values that satisfy \(G\) and also \(\mathsf{blame}\,\ell_3\). \[ \mathcal{E}[\!| g |\!] = \{ v \mid G(v) \} \cup \{ \mathsf{blame}\, \ell_3 \} \] The meaning of \(h\) is similar, but with different blame. \[ \mathcal{E}[\!| h |\!] = \{ v \mid G(v) \} \cup \{ \mathsf{blame}\, \ell_4 \} \] The function \(g\) applied to \(1\) produces \(\{ 0, \mathsf{blame}\, \ell_3\}\), whereas \(h\) applied to \(0\) produces \(\{ \mathsf{blame}\, \ell_4\}\). Thus, the meaning of the whole program is \(\{ \mathsf{blame}\,\ell_3, \mathsf{blame}\,\ell_4\}\). </p> <p>Because cast \(\ell_3\) signals an error, one might be tempted to have the meaning of \(g\) be just \(\{ \mathsf{blame}\,\ell_3\}\). However, we want to allow implementations of this language that do not blame \(\ell_3\) (\(g\) is never applied to \(0\) after all, so its guilt was not directly observable) and instead blame \(\ell_4\), who was caught red handed. So it is important for the meaning of \(g\) to include the subset of values from \(f\) that have type \(\mathtt{Int}\to\mathtt{Int}\) so that we can carry on and find other errors as well. We shall expect implementations of this language to be sound with respect to blame, that is, if execution results in blame, it should blame one of the labels that is in the denotation of the program (and not some other innocent cast). </p> <p>Let us return to the example (P0). The denotation of that program is \(\{\mathsf{blame}\,\ell_2\}\) because the cast at \(\ell_2\) is a cast to \((\mathtt{Int}\to \mathtt{Int}) \to \mathtt{Int}\) and the identity function is not of that type. The other case at \(\ell_1\) is innocent because it is a cast to \(\star\) and all values are of that type, including the identity cast. </p> <h2>Discussion</h2> <p>By giving the cast calculus a denotational semantics in terms of finite function tables, it became straightforward to define whether a function value is of a given type. This in turn made it easy to define the meaning of casts, even casts at function type. A cast succeeds if the input value is of the target type and it fails otherwise. With this semantics we assign blame to a cast in an eager fashion, without the need for the blame tracking machinery that is present in the operational semantics. </p> <p>We saw an example program where the reduction semantics that factors through ground types attributes blame to a cast that the denotational semantics says is innocent. This lends some evidence to that semantics being less desirable. </p> <p>I plan to investigate whether the alternative reduction semantics is sound with respect to the denotational semantics in the sense that the reduction semantics only blames a cast if the denotational semantics says it is guilty. </p> <h2>Appendix</h2> <p>We give the full definition of the cast calculus here in the appendix. The relation \(\sqsubseteq\) that we used to define table lookup is the dual of the subtyping relation for intersection types. The denotational semantics is a mild reformation of the intersection type system that I discussed in previous blog posts. </p> Syntax \[ \begin{array}{lcl} A &::=& \mathtt{Int} \mid A \to B \mid \star \\ e &::= &n \mid \mathit{op}(e,e) \mid \mathtt{if}\, e\, \mathtt{then}\, e \,\mathtt{else}\, e \mid x \mid \lambda x{:}A \mid e \; e \mid e : A \Rightarrow^\ell B \end{array} \] Consistency \begin{gather*} \mathtt{Int} \sim \mathtt{Int} \qquad \frac{A \sim B \qquad A' \sim B'} {A \to A' \sim B \to B'} \qquad A \sim \star \qquad \star \sim B \end{gather*} Type System \begin{gather*} \frac{}{\Gamma \vdash n : \mathtt{Int}} \quad \frac{\Gamma \vdash e_1 : \mathtt{Int} \quad \Gamma \vdash e_2 : \mathtt{Int}} {\Gamma \vdash \mathit{op}(e_1,e_2) : \mathtt{Int}} \\[2ex] \frac{\Gamma \vdash e_1 : \mathtt{Int} \quad \Gamma \vdash e_2 : A \quad \Gamma \vdash e_3 : A} {\Gamma \vdash \mathtt{if}\, e_1\, \mathtt{then}\, e_2 \,\mathtt{else}\, e_3 : A} \\[2ex] \frac{x{:}A \in \Gamma}{\Gamma \vdash x : A} \quad \frac{\Gamma,x{:}A \vdash e : B}{\Gamma \vdash \lambda x{:}A.\; e : A \to B} \quad \frac{\Gamma e_1 : A \to B \quad \Gamma e_2 : A} {\Gamma \vdash e_1 \; e_2 : B} \\[2ex] \frac{\Gamma \vdash e : A \quad A \sim B} {\Gamma \vdash (e : A \Rightarrow^\ell B) : B} \end{gather*} Values \[ v ::= n \mid \emptyset \mid v \mapsto v \mid v \sqcup v \mid \mathsf{blame}\,\ell \] Table Lookup (Value Information Ordering) \begin{gather*} \frac{}{n \sqsubseteq n} \quad \frac{v'_1 \sqsubseteq v_1 \quad v_2 \sqsubseteq v'_2} {v_1 \mapsto v_2 \sqsubseteq v'_1 \mapsto v'_2} \quad \frac{}{\mathsf{blame}\,\ell \sqsubseteq \mathsf{blame}\,\ell} \\[2ex] \frac{}{v_1 \sqsubseteq v_1 \sqcap v_2} \quad \frac{}{v_2 \sqsubseteq v_1 \sqcap v_2} \quad \frac{v_1 \sqsubseteq v_3 \quad v_2 \sqsubseteq v_3} {v_1 \sqcup v_2 \sqsubseteq v_3} \\[2ex] \frac{}{v_1 \mapsto (v_2 \sqcup v_3) \sqsubseteq (v_1 \mapsto v_2) \sqcup (v_1 \mapsto v_3)} \quad \frac{}{\emptyset \sqsubseteq v_1 \mapsto v_2} \quad \frac{}{\emptyset \sqsubseteq \emptyset} \end{gather*} \noindent Semantics of Types \begin{align*} \mathcal{T}(\mathtt{Int}, v) &= (\exists n. \; v = n) \\ \mathcal{T}(\star, v) &= \mathsf{true} \\ \mathcal{T}(A \to B, n) &= \mathsf{false} \\ \mathcal{T}(A \to B, \emptyset) &= \mathsf{true} \\ \mathcal{T}(A \to B, v \mapsto v') &= \mathcal{T}(A, v) \text{ and } \mathcal{T}(B, v') \\ \mathcal{T}(A \to B, v_1 \sqcup v_2) &= \mathcal{T}(A \to B, v_1) \text{ and } \mathcal{T}(A \to B, v_2) \\ \mathcal{T}(A\to B, \mathsf{blame}\,\ell) &= \mathsf{false} \end{align*} Denotational Semantics \begin{align*} \mathcal{E}[\!| n |\!]\rho &= \{ n \}\\ \mathcal{E}[\!| \mathit{op}(e_1,e_2) |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2 n_1 n_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \land\; n_1 \sqsubseteq v_1 \land n_2 \sqsubseteq v_2 \land v = [\!| \mathit{op} |\!](n_1,n_2) \end{array} \right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho) \} \\ \mathcal{E}[\!| \mathtt{if}\, e_1\, \mathtt{then}\, e_2 \,\mathtt{else}\, e_3 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 n. v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land n \sqsubseteq v_1 \\ \land\; (n = 0 \Longrightarrow v \in \mathcal{E}[\!| e_3 |\!]\rho) \\ \land\; (n \neq 0 \Longrightarrow v \in \mathcal{E}[\!| e_2 |\!]\rho) \end{array} \right\}\\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho \cup \mathcal{E} [\!| e_3 |\!]\rho ) \} \\ \mathcal{E}[\!| x |\!]\rho &= \{ \rho(x) \}\\ \mathcal{E}[\!| \lambda x{:}A.\, e|\!]\rho &= \{ v \mid \mathcal{F}(v, \lambda x{:}A.\, e, \rho) \} \\ \mathcal{E}[\!| e_1 \, e_2 |\!]\rho &= \left\{ v \middle| \begin{array}{l} \exists v_1 v_2.\; v_1 \in \mathcal{E}[\!| e_1 |\!]\rho \land v_2 \in \mathcal{E}[\!| e_2 |\!]\rho \\ \land\; (v_2 \mapsto v) \sqsubseteq v_1 \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in (\mathcal{E} [\!| e_1 |\!]\rho \cup \mathcal{E} [\!| e_2 |\!]\rho) \} \\ \mathcal{E} [\!| e : A \Rightarrow^{\ell} B |\!]\rho &= \{ v \mid v \in \mathcal{E} [\!| e |\!]\rho \text{ and } \mathcal{T}(B, v) \}\\ & \quad\; \cup \left\{ \mathsf{blame}\,\ell \middle| \begin{array}{l}\exists v.\; v \in \mathcal{E} [\!| e |\!] \rho \text{ and } \neg \mathcal{T}(B, v)\\ \text{and } (\forall \ell'. v \neq \mathsf{blame}\,\ell') \end{array} \right\} \\ & \quad\; \cup \{ \mathsf{blame}\,\ell' \mid \mathsf{blame}\,\ell' \in \mathcal{E} [\!| e |\!]\rho \} \\ \mathcal{F}(n, \lambda x{:}A.\, e, \rho) &= \mathsf{false} \\ \mathcal{F}(\emptyset, \lambda x{:}A.\, e, \rho) &= \mathsf{true} \\ \mathcal{F}(v \mapsto v', \lambda x{:}A.\, e, \rho) &= \mathcal{T}(A,v) \text{ and } v' \in \mathcal{E}[\!| e |\!]\rho(x{:=}v) \\ \mathcal{F}(v_1 \sqcup v_2, \lambda x{:}A.\, e, \rho) &= \mathcal{F}(v_1, \lambda x{:}A.\, e, \rho) \text{ and } \mathcal{F}(v_2, \lambda x{:}A.\, e, \rho) \\ \mathcal{F}(\mathsf{blame}\,\ell, \lambda x{:}A.e, \rho) &= \mathsf{false} \end{align*} <h2>References</h2> <ul><li>(Findler 2002) Contracts for higher-order functions. R. B. Findler and M. Felleisen. International Conference on Functional Programming. 2002. <li>(Gronski 2006) Sage: Hybrid Checking for Flexible Specifications. Jessica Gronski and Kenneth Knowles and Aaron Tomb and Stephen N. Freund and Cormac Flanagan. Scheme and Functional Programming Workshop, 2006. <li>(Scott 1970) Outline of a Mathematical Theory of Computation. Dana Scott. Oxford University. 1970. Technical report PRG-2. <li>(Scott 1976) Data Types as Lattices. Dana Scott. SIAM Journal on Computing. 1976. Volume 5, Number 3. <li>(Siek 2009) Exploring the Design Space of Higher-Order Casts. Jeremy G. Siek and Ronald Garcia and Walid Taha. European Symposium on Programming. 2009. <li>(Wadler 2009) Well-typed programs can't be blamed. Philip Wadler and Robert Bruce Findler. European Symposium on Programming. 2009. </ul>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com1tag:blogger.com,1999:blog-11162230.post-80718992028141086812017-01-30T20:33:00.000-08:002017-02-01T07:19:10.545-08:00Completeness of Intersection Types wrt. an Applied CBV Lambda Calculus<p>I'm still quite excited about the <a href="http://siek.blogspot.com/2016/12/simple-denotational-semantics-for.html">simple denotational semantics</a> and looking forward to applying it to the semantics of gradually typed languages. However, before building on it I'd like to make sure it's correct. Recall that I proved soundness of the simple semantics with respect to a standard big-step operational semantics, but I did not prove completeness. Completeness says that if the operational semantics says that the program reduces to a particular value, then the denotational semantics does too. Recall that the first version of the simple semantics that I gave was not complete! It couldn't handle applying a function to itself, which is needed for the \(Y\) combinator and recursion. I've <a href="http://siek.blogspot.com/2016/12/take-3-application-with-subsumption-for.html">written down a fix</a>, but the question remains whether the fix is good enough, that is, can we prove completeness? In the mean time, I learned that <a href="http://siek.blogspot.com/2017/01/intersection-types-as-denotations.html">the simple semantics is closely related to filter models based on type systems with intersection types</a>. This is quite helpful because that literature includes many completeness results for pure lambda calculi, see for example <i>Intersection Types and Computational Rules</i> by Alessi, Barbanera, and Dezani-Ciancaglini (2003). </p> <p>In this blog post I prove completeness for an intersection type system with respect to a call-by-value lambda calculus augmented with numbers, primitive operators (addition, multiplication, etc.), and a conditional if-expression. The main outline of the proof is adapted from the above-cited paper, in which completeness is proved with respect to small-step operational semantics, though you'll find more details (i.e. lemmas) here because I've mechanized the proof in Isabelle and can't help but share in my suffering ;) (<a href="https://www.dropbox.com/s/288le14ru19rgjb/Lambda.thy?dl=0">Lambda.thy</a>, <a href="https://www.dropbox.com/s/00907wtk3r0jy4p/SmallStepLam.thy?dl=0">SmallStepLam.thy</a>, <a href="https://www.dropbox.com/s/638e42rf0szp5by/IntersectComplete.thy?dl=0">IntersectComplete.thy</a>) Ultimately I would like to prove completeness for the simple denotational semantics, but a good first step is doing the proof for a system that is in between the simple semantics and the intersection type systems in the literature. </p> <p>The intersection type system I use here differs from ones in the literature in that I restrict the \(\wedge\) introduction rule to \(\lambda\)'s instead of applying it to any expression, as shown below. I recently realized that this change does not disturb the proof of Completeness because we're dealing with a call-by-value language. \[ \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\, e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B}(\wedge\,\mathrm{intro}) \] I would like to remove the subsumption rule \[ \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \] but doing so was increasing the complexity of the proof of Completeness. Instead I plan to separately prove that the version without subsumption is equivalent to the version with subsumption. One might also consider doing the same regarding our above change to the \(\wedge\) introduction rule. I have also been working on that approach, but proving the admissibility of the standard \(\wedge\) introduction rule has turned out to be rather difficult (but interesting!). </p> <h3>Definition of an Applied CBV Lambda Calculus</h3> <p>Let us dive into the formalities and define the language that we're interested in. Here's the types, which include function types, intersection types, the top function type (written \(\top\)), and singleton numbers. Our \(\top\) corresponds to the type \(\nu\) from Egidi, Honsell, and Rocca (1992). See also Alessi et al. (2003). \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \mid n \] and here's the expressions: \[ e ::= n \mid \mathit{op}(e,e) \mid \mathrm{if}\,e\,\mathrm{then}\,e\,\mathrm{else}\,e \mid x \mid \lambda x.\, e \mid e\,e \] where \(n\) ranges over numbers and \(\mathit{op}\) ranges over arithmetic operators such as addition. </p> <p>We define type environments as an association list mapping variables to types. \[ \Gamma ::= \emptyset \mid \Gamma,x:A \] </p> <p>The type system, defined below, is unusual in that it is highly precise. Note that the rule for arithmetic operators produces a precise singleton result and that the rules for if-expressions require the condition to be a singleton number (zero or non-zero) so that it knows which branch is taken. Thus, this type system is really a kind of dynamic semantics. </p> \begin{gather*} \frac{}{\Gamma \vdash n : n} \\[2ex] \frac{} {\Gamma \vdash \lambda x.\, e : \top}(\top\,\mathrm{intro}) \quad \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\, e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B}(\wedge\,\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash x : A} \quad \frac{\Gamma,x:A \vdash B} {\Gamma \vdash \lambda x.\, e : A \to B} \quad \frac{\Gamma \vdash e_1: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B}(\to\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e_1 : n_1 \quad \Gamma \vdash e_2 : n_2 \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash e_1 : 0 \quad \Gamma \vdash e_3 : B} {\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \quad \frac{\Gamma \vdash e_1 : n \quad n \neq 0 \quad \Gamma \vdash e_2 : A} {\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : A} \end{gather*} <p>The rules for subtyping come from the literature. </p>\begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(<:\to) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \end{gather*} <p>We shall be working with values that are well typed in an empty type environment. This usually implies that the values have no free variables. However, that is not true of the current type system because of the \(\top\) introduction rule. So we add a side condition for \(\lambda\) in our definition of values. (In retrospect, I should have instead included a statement about free variables in the main Completeness theorem and then propagated that information to where it is needed.) \[ v ::= n \mid \lambda x.\, e \quad \text{where } FV(e) \subseteq \{x\} \] </p> <p>We use a naive notion of substitution (not capture avoiding) because the \(v\)'s have no free variables to capture. \begin{align*} [x:=v] y &= \begin{cases} v & \text{if } x = y \\ y & \text{if } x \neq y \end{cases} \\ [x:=v] n &= n \\ [x:=v] (\lambda y.\, e) &= \begin{cases} \lambda y.\, e & \text{if } x = y \\ \lambda y.\, [x:=v] e & \text{if } x \neq y \end{cases} \\ [x:=v](e_1\, e_2) &= ([x:=v]e_1\, [x:=v]e_2) \\ [x:=v]\mathit{op}(e_1, e_2) &= \mathit{op}([x:=v]e_1, [x:=v]e_2) \\ [x:=v](\mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3) &= \mathrm{if}\,[x:=v]e_1\,\mathrm{then}\,[x:=v]e_2\,\mathrm{else}\,[x:=v]e_3 \end{align*} </p> <p>The small-step operational semantics is defined by the following reduction rules. I'm not sure why I chose to use SOS-style rules instead of evaluation contexts. \begin{gather*} \frac{}{(\lambda x.\,e) \; v \longrightarrow [x:=v]e} \quad \frac{e_1 \longrightarrow e'_1}{e_1\,e_2 \longrightarrow e'_1 \, e_2} \quad \frac{e_2 \longrightarrow e'_2}{e_1\,e_2 \longrightarrow e_1 \, e'_2} \\[2ex] \frac{}{\mathit{op}(n_1,n_2) \longrightarrow [\!|\mathit{op}|\!](n_1,n_2)} \quad \frac{e_1 \longrightarrow e'_1} {\mathit{op}(e_1,e_2) \longrightarrow \mathit{op}(e'_1,e_2)} \quad \frac{e_2 \longrightarrow e'_2} {\mathit{op}(e_1,e_2) \longrightarrow \mathit{op}(e_1,e'_2)} \\[2ex] \frac{}{\mathrm{if}\,0\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow e_3} \quad \frac{n \neq 0} {\mathrm{if}\,n\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow e_2} \\[2ex] \frac{e_1 \longrightarrow e'_1} {\mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 \longrightarrow \mathrm{if}\,e'_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3} \end{gather*} \[ \frac{}{e \longrightarrow^{*} e} \qquad \frac{e_1 \longrightarrow e_2 \quad e_2 \longrightarrow^{*} e_3} {e_1 \longrightarrow^{*} e_3} \] </p> <h3>Proof of Completeness</h3> <p>The theorem that we aim to prove is that if the operational semantics says that a program reduces to a value, then the program is typable in the intersection type system and that the result type precisely describes the result value. I'm going to present the proof in a top-down style, so the proof of each lemma that I use is found further along in this blog post. </p> <p><b>Theorem</b> (Completeness)<br>If \(e \longmapsto^{*} v\), then \(\emptyset \vdash e : A\) and \(\emptyset \vdash v : A\) for some type \(A\). <br><b>Proof</b><br> Every value is typable (use the \(\top\) introduction rule for \(\lambda\)), so we have some \(A\) such that \(\emptyset \vdash v : A\). We shall show that typing is preserved by reverse reduction, which will give us \(\emptyset \vdash e : A\). <b>QED</b></p> <p><b>Lemma</b> (Reverse Multi-Step Preserves Types) <br>If \(e \longrightarrow^{*} e'\) and \(\emptyset \vdash e' : A\), then \(\emptyset \vdash e : A\). <br><b>Poof</b><br> The proof is by induction on the derivation of \(e \longrightarrow^{*} e'\). The base case is trivial. The induction case requires that typing be preserved for a single-step of reduction, which we prove next. <b>QED</b></p> <p><b>Lemma</b> (Reverse Single-Step Preserves Types) <br>If \(e \longrightarrow e'\) and \(\emptyset \vdash e' : A\), then \(\emptyset \vdash e : A\). <br><b>Proof</b><br>The proof is by induction on the derivation of \(e \longrightarrow e'\). The most important case is for function application: \[ (\lambda x.\,e) \; v \longrightarrow [x:=v]e \] We have that \(\emptyset \vdash [x:=v]e : A\) and need to show that \(\emptyset \vdash (\lambda x.\,e) \; v : A\). That is, we need to show that call-by-value \(\beta\)-expansion preserves types. So we need \(x:B \vdash e : A\) and \(\emptyset \vdash v : B\) for some type \(B\). The proof of this was the crux and required some generalization; I found it difficult to find the right statement of the lemma. It is proved below under the name Reverse Substitution Preserves Types. The other cases of this proof are straightforward except for one hiccup. They all require inversion lemmas (aka. generation lemmas) to unpack the information from \(\emptyset \vdash e' : A\). However, as is usual for languages with subsumption, the inversion lemmas are not simply proved by case analysis on typing rules, but must instead be proved by induction on the typing derivations. <b>QED</b></p> <p><b>Lemma</b> (Inversion)<br><ol><li>If \(\Gamma \vdash n : A \), then \(n <: A\). </li><li>If \(\Gamma \vdash e_1\,e_2 : A \), then \( \Gamma \vdash e_1 : B \to A' \), \( A' <: A \), and \( \Gamma \vdash e_2 : B \) for some \(A'\) and \(B\). </li><li>If \(\Gamma \vdash \mathit{op}(e_1,e_2) : A\), then \(\Gamma \vdash e_1 : n_1\), \(\Gamma \vdash e_2 : n_2 \), \(\Gamma \vdash \mathit{op}(e_1,e_2) : [\!|\mathit{op}|\!](n_1,n_2) \), and \([\!|\mathit{op}|\!](n_1,n_2) <: A\) for some \(n_1\) and \(n_2\). </li><li>If \(\Gamma \vdash \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : A\), then either <ul> <li>\(\Gamma \vdash e_1 : 0\), \(\Gamma\vdash e_3 : B\), and \(B <: A\), for some \(B\).</li> <li> \(\Gamma \vdash e_1 : n\), \(n \neq 0\), \(\Gamma\vdash e_2 : A'\), and \(A' <: A\), for some \(A'\).</li> </ul></li></ol><b>Proof</b> The proofs are by induction on the derivation of typing. <b>QED</b></p> <p>To state the reverse substitution lemma in a way that provides a useful induction hypothesis in the case for \(\lambda\), we introduce a notion of equivalence of type environments: \[ \Gamma \approx \Gamma' = (x : A \in \Gamma \text{ iff } x : A \in \Gamma') \] The reverse substitution lemma will show that if \([y:=v]e\) is well typed, then so is \(e\) in the environment extended with \(y:B\), for some appropriate choice of \(B\). Now, the value \(v\) may appear in multiple places within \([y:=v]e\) and in each place, \(v\) may have been assigned a <em>different</em> type. For example, \(v\) could be \(\lambda x.\, {+}(x,1)\) and it could have the type \(0\to 1\) in one place and \(1\to 2\) in another place. However, we must choose a single type \(B\) for \(y\). But thanks to intersection types, we can choose \(B\) to be the intersection of all the types assigned to \(v\). </p> <p><b>Lemma</b> (Reverse Substitution Preserves Types)<br>If \(\Gamma \vdash [y:=v]e : A\) and \(y \notin \mathrm{dom}(\Gamma)\), then \( \emptyset \vdash v : B \), \( \Gamma' \vdash e : A\), and \(\Gamma' \approx \Gamma,y:B\) for some \(\Gamma'\) and \(B\). <br><b>Proof</b>The proof is by induction on the derivation of \(\Gamma \vdash [y:=v]e : A\). (I wonder if the proof would have been easier if done by induction on \(e\).) The proof is rather long, so I'll just highlight the lemmas that were needed here. The full details are in the Isabelle mechanization. <ul><li> The cases for variables and numbers are relatively straightforward. </li><li> The case for \(\lambda\) requires lemmas regarding Environment Strengthening and Environment Lowering and their corollaries.</li><li> The case for subsumption is relatively easy.</li><li> The case for function application is interesting. We have \( (e_1 \, e_2) = [y:=v]e \), so \(e = e'_1 \, e'_2\) where \(e_1 = [y:=v]e'_1\) and \(e_2 = [y:=v]e'_2\). From the induction hypotheses for \(e_1\) and \(e_2\), we have \(\emptyset \vdash v : B_1\) and \(\emptyset \vdash v : B_2\). The lemma Combine Values gives us some \(B_3\) such that \(\emptyset \vdash v : B_3\) and \(B_3 <: B_1\) and \(B_3 <: B_2\). We choose \(\Gamma' = \Gamma,y:B_3\). To show \(\Gamma' \vdash e'_1\, e'_2 : A\) we use the induction hypotheses for \(e_1\) and \(e_2\), along with the lemmas Equivalent Environments and Environment Lowering. </li><li> The case for \(\top\) introduction is straightforward. </li><li> The case for \(\wedge\) introduction uses the lemmas Well-typed with No Free Variables, Environment Strengthening, Combine Values, Equivalent Environments, and Environment Lowering. </li><li> The cases for arithmetic operators and if-expressions follow a pattern similar to that of function application. </li></ul><b>QED</b></p> <p><b>Lemma</b> (Environment Strengthening)<br>If \(\Gamma \vdash e : A\) and for every free variable \(x\) in \(e\), \( x:A \in \Gamma \text{ iff } x:A \in \Gamma' \), then \(\Gamma' \vdash e : A\). <br><b>Proof</b>The proof is by induction on the derivation of \(\Gamma \vdash e : A\). <b>QED</b></p> <p><b>Corollary</b> (Well-typed with No Free Variables)<br>If \(\Gamma \vdash e : A\) and \(\mathit{FV}(e) = \emptyset\), then \(\emptyset \vdash e : A\). </p> <p>We define the following ordering relation on environments: \[ \Gamma \sqsupseteq \Gamma' = (x:A \in \Gamma \Longrightarrow x:A' \in \Gamma \text{ and } A' <: A) \] </p> <p><b>Lemma</b> (Environment Lowering) <br>If \(\Gamma \vdash e : A\) and \(\Gamma \sqsupseteq \Gamma'\), then \(\Gamma' \vdash e : A\). <br><b>Proof</b> The proof is by induction on the derivation of \(\Gamma \vdash e : A\). <b>QED</b></p> <p><b>Corollary</b> (Equivalent Environments) <br>If \(\Gamma \vdash e : A\) and \(\Gamma \approx \Gamma'\), then \(\Gamma' \vdash e : A\). <br><b>Proof</b> If \(\Gamma \approx \Gamma'\) then we also have \(\Gamma \sqsupseteq \Gamma'\), so we conclude by applying Environment Lowering. <b>QED</b></p> <p><b>Lemma</b> (Combine Values) <br> If \(\Gamma \vdash v : B_1\) and \(\Gamma \vdash v : B_2\), then \(\Gamma \vdash v : B_3\), \(B_3 <: B_1 \wedge B_2\), and \(B_1 \wedge B_2 <: B_3\) for some \(B_3\). <br><b>Proof</b> The proof is by cases on \(v\). It uses the Inversion lemma for numbers and the \(\wedge\) introduction rule for \(\lambda\)'s. <b>QED</b></p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-75942928316127722342017-01-14T15:40:00.001-08:002017-03-10T20:03:42.827-08:00Intersection Types as Denotations<p>In my previous post I described a simple denotational semantics for the CBV lambda calculus in which the meaning of a \(\lambda\) function is a set of tables. For example, here is a glimpse at some of the tables in the meaning of \(\lambda x. x+2\). </p>\[ E[\!| (\lambda x. x+2) |\!](\emptyset) = \left\{ \begin{array}{l} \emptyset, \\ \{ 5\mapsto 7 \},\\ \{ 0\mapsto 2, 1 \mapsto 3 \},\\ \{ 0\mapsto 2, 1\mapsto 3, 5 \mapsto 7 \}, \\ \vdots \end{array} \right\} \] <p>Since then I've been reading the literature starting from an observation by Alan Jeffrey that this semantics seems similar to the domain logic in Abramsky's Ph.D. thesis (1987). That in turn pointed me to the early literature on intersection types, which were invented in the late 1970's by Coppo, Dezani-Ciancaglini, Salle, and Pottinger. It turns out that one of the motivations for intersection types was to create a denotational semantics for the lambda calculus. Furthermore, it seems that intersection types are closely related to my simple denotational semantics! </p> <p>The intersection types for the pure lambda calculus included function types, intersections, and a top type: \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \] For our purposes we shall also add singleton types for numbers. \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \mid n \] So the number \(2\) has the singleton type \(2\) and any function that maps \(0\) to \(2\) will have the type \(0 \to 2\). Any function that maps \(0\) to \(2\) and also maps \(1\) to \(3\) has the intersection type \[ (0 \to 2) \wedge (1 \to 3) \] These types are starting to look a lot like the tables above! Indeed, even the empty table \(\emptyset\) corresponds to the top type \(\top\), they both can be associated with any \(\lambda\) function. </p> <p>The addition of the singleton number types introduces a choice regarding the top type \(\top\). Does it include the numbers and functions or just functions? We shall go with the later, which corresponds to the \(\nu\) type in the literature (Egidi, Honsell, Rocca 1992). </p> <p>Now that we have glimpsed the correspondence between tables and intersection types, let's review the typing rules for the implicitly typed lambda calculus with singletons, intersections, and \(\top\). </p>\begin{gather*} \frac{}{\Gamma \vdash n : n} \\[2ex] \frac{}{\Gamma \vdash \lambda x.\,e : \top}(\top\,\mathrm{intro}) \quad \frac{\Gamma \vdash e : A \quad \Gamma \vdash e : B} {\Gamma \vdash e : A \wedge B}(\wedge\,\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \quad \frac{x:A \in \Gamma}{\Gamma \vdash x : A} \\[2ex] \frac{\Gamma,x:A \vdash e : B} {\Gamma \vdash \lambda x.\, e : A \to B} \quad \frac{\Gamma \vdash e_1: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B}(\to\mathrm{elim}) \end{gather*} where subtyping is defined as follows \begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \end{gather*} <p>With intersection types, one can write the same type in many different ways. For example, the type \(5\) is the same as \(5 \wedge 5\). One common way to define such equalities is in terms of subtyping: \(A = B\) iff \(A <: B\) and \(B <: A\). </p> <p>So how does one define a semantics using intersection types? Barendregt, Coppo, Dezani-Ciancaglini (1983) (BCD) define the meaning of an expression \(e\) to be the set of types for which it is typable, something like \[ [\!| e |\!](\Gamma) = \{ A \mid \Gamma \vdash e : A \} \] For a simple type system (without intersection), such as semantics would not be useful. Any term with self application (needed for recursion) would not type check and therefore its meaning would be the empty set. But with intersection types, the semantics gives a non-empty meaning to all terminating programs! </p> <p>The next question is, how does the BCD semantics relate to my simple table-based semantics? One difference is that the intersection type system has two rules that are not syntax directed: \((\wedge\,\mathrm{intro})\) and (Sub). However, we can get rid of these rules. The \((\wedge\,\mathrm{intro})\) rule is not needed for numbers, only for functions. So one should be able to move all uses of the \((\wedge\,\mathrm{intro})\) rules to \(\lambda\)'s. \[ \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\; e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B} \] To get rid of (Sub), we need to modify \((\to\mathrm{elim})\) to allow for the possibility that \(e_1\) is not literally of function type. \[ \frac{\Gamma \vdash e_1 : C \quad C <: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B} \] </p> <p>All of the rules are now syntax directed, though we now have three rules for \(\lambda\), but those rules handle the three different possible types for a \(\lambda\) function: \(A \to B\), \(A \wedge B\), and \(\top\). Next we observe that a relation is isomorphic to a function that produces a set. So we change from \(\Gamma \vdash e : A\) to \(E[\!| e |\!](\Gamma) = \mathcal{A}\) where \(\mathcal{A}\) ranges over sets of types, i.e., \(\mathcal{A} \in \mathcal{P}(A)\). We make use of an auxiliary function \(F\) to define the meaning of \(\lambda\) functions. \begin{align*} E[\!| n |\!](\Gamma) & = \{ n \} \\ E[\!| x |\!](\Gamma) & = \{ \Gamma(x) \} \\ E[\!| \lambda x.\, e |\!](\Gamma) & = \{ A \mid F(A,x,e,\Gamma) \} \\ E[\!| e_1 \; e_2 |\!](\Gamma) & = \left\{ B \middle| \begin{array}{l} C \in E[\!| e_1 |\!](\Gamma) \\ \land\; A \in E[\!| e_2 |\!](\Gamma) \\ \land\; C <: A \to B \end{array} \right\} \\ \\ F(A \to B, x,e,\Gamma) &= B \in E[\!| e |\!](\Gamma(x:=A)) \\ F(A \wedge B, x,e,\Gamma) &= F(A, x,e,\Gamma) \land F(B, x,e,\Gamma) \\ F(\top, x,e,\Gamma) &= \mathrm{true} \end{align*} </p> <p>I conjecture that this semantics is equivalent to the "take 3" semantics. There are a couple remaining differences and here's why I don't think they matter. Regarding the case for \(\lambda\) in \(E\), the type \(A\) can be viewed as an alternative representation for a table. The function \(F\) essentially checks that all entries in the table jive with the meaning of the \(\lambda\)'s body, which is what the clause for \(\lambda\) does in the ``take 3'' semantics. Regarding the case for application in \(E\), the \(C\) is a table and \(C <: A \to B\) means that there is some entry \(A' \to B'\) in the table \(C\) such that \(A' \to B' <: A \to B\), which means \(A <: A'\) and \(B' <: B\). The \(A <: A'\) corresponds to our use of \(\sqsubseteq\) in the "take 3" semantics. The \(B' <: B\) doesn't matter. </p> <p>There's an interesting duality and change of viewpoint going on here between the table-based semantics and the intersection types. The table-based semantics is concerned with what values are produced by a program whereas the intersection type system is concerned with specifying what kind of values are allowed, but the types are so precise that it becomes dual in a strong sense to the values themselves. To make this precise, we can talk about tables in terms of their finite graphs (sets of pairs), and create them using \(\emptyset\), union, and a singleton input-output pair \(\{(v_1,v_2)\}\). With this formulation, tables are literally dual to types, with \(\{(v_1,v_2)\}\) corresponding to \(v_1 \to v_2\), union corresponding to intersection, empty set corresponding to \(\top\), and \(T_1 \subseteq T_2\) corresponding to \(T_2 <: T_1\). </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-19270050164232580602016-12-21T13:48:00.000-08:002017-01-07T05:09:15.968-08:00Take 3: Application with Subsumption for Den. Semantics of Lambda Calculus <p>Alan Jeffrey tweeted the following in reply to the previous post: </p> <blockquote>@jeremysiek wouldn't it be easier to change the defn of application to be <br>⟦MN⟧σ = { W | T ∈ ⟦M⟧σ, V ∈ ⟦N⟧σ, (V′,W) ∈ T, V′ ⊆ V }? </blockquote> <p>The idea is that, for higher order functions, if the function \(M\) is expecting to ask all the questions in the table \(V'\), then it is OK to apply \(M\) to a table \(V\) that answers <b>more</b> questions than \(V'\). This idea is quite natural, it is like Liskov's subsumption principle but for functions instead of objects. If this change can help us with the self application problem, then it will be preferable to the graph-o-tables approach described in the previous post because it retains the simple inductive definition of values. So let's see where this takes us! </p> <p>We have the original definition of values </p>\[ \begin{array}{lrcl} \text{values} & v & ::= & n \mid \{ (v_1,v'_1),\ldots,(v_n,v'_n) \} \end{array} \] <p>and here is the denotational semantics, updated with Alan's suggestion to include the clause \(v'_2 \sqsubseteq v_2\) in the case for application. </p>\begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \{ T \mid \forall v v'. (v,v') \in T \Rightarrow v' \in E[\!|e|\!](\rho(x:=v)) \} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v \middle| \begin{array}{l} \exists T v_2 v'_2. T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land v'_2 \sqsubseteq v_2 \land (v'_2,v) {\in} T \end{array} \right\} \end{align*} <p>The ordering on values \(\sqsubseteq\) used above is just equality on numbers and subset on function tables. </p> <p>The first thing to check is whether this semantics can handle self application at all, such as \[ (\lambda f. f \; f) \; (\lambda g. \; 42) \] </p><p><b>Example 1.</b> \( 42 \in E[\!| (\lambda f. f \; f) \; (\lambda g. \; 42) |\!](\emptyset) \) <br> The main work is figuring out witnesses for the function tables. We're going to need the following tables: \begin{align*} T_0 &= \emptyset \\ T_1 &= \{ (\emptyset, 42)\} \\ T_2 &= \{ (T_1, 42) \} \end{align*} Here's the proof, working top-down, or goal driven. The important use of subsumption is the \( \emptyset \sqsubseteq T_1 \) below. <ul><li> \( T_2 \in E[\!| (\lambda f. f \; f)|\!](\emptyset)\)<br> So we need to show: \( 42 \in E[\!| f \; f|\!](f:=T_1) \) <ul> <li> \( T_1 \in E[\!| f |\!](f:=T_1) \)</li> <li> \( T_1 \in E[\!| f |\!](f:=T_1) \)</li> <li> \( \emptyset \sqsubseteq T_1 \) </li> <li> \( (\emptyset, 42) \in T_1 \) </li> </ul></li><li> \( T_1 \in E[\!| (\lambda g. \; 42) |\!](\emptyset)\) <br> So we need to show \( 42 \in E[\!| 42 |\!](g:=\emptyset)\), which is immediate. <li> \( T_1 \sqsubseteq T_1 \) <li> \( (T_1,42) \in T_2 \) </ul></p> <p>Good, so this semantics can handle a simple use of self application. How about factorial? Instead of considering factorial of 3, as in the previous post, we'll go further this time and consider factorial of an arbitrary number \(n\). </p> <p><b>Example 2.</b> We shall compute the factorial of \(n\) using the strict version of the Y combinator, that is, the Z combinator. \begin{align*} M & \equiv \lambda x. f \; (\lambda v. (x\; x) \; v) \\ Z & \equiv \lambda f. M \; M \\ F & \equiv \lambda n. \mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)\\ H & \equiv \lambda r. F \\ \mathit{fact} & \equiv Z\, H \end{align*} We shall show that \[ n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \] For this example we need very many tables, but fortunately there are just a few patterns. To capture these patterns, be define the following table-producing functions. \begin{align*} T_F(n) &= \{ (n,n!) \} \\ T_H(n) &= \{ (\emptyset,T_F(0)), (T_F(0), T_F(1)), \ldots ,(T_F(n-1), T_F(n)) \} \\ T_M(n) &= \begin{cases} \emptyset & \text{if } n = 0 \\ \{ (T_M(n'), T_F(n')) \} \cup T_M(n') & \text{if } n = 1+n' \end{cases} \\ T_Z(n) &= \{ (T_H(n), T_F(n) )\} \end{align*} \(T_F(n)\) is a fragment of the factorial function, for the one input \(n\). \(T_H(n)\) maps each \(T_F(i)\) to \(T_F(i+1) \) for up to \(i+1 = n\). \(T_M(n)\) is the heart of the matter, and what makes the self application work. It maps successively larger versions of itself to fragments of the factorial function, that is \[ T_M(n) = \left\{ \begin{array}{l} T_M(0) \mapsto T_F(0) \\ T_M(1) \mapsto T_F(1) \\ \vdots & \\ T_M(n-1) \mapsto T_F(n-1) \end{array} \right\} \] For example, here is \(T_M(4)\): <div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-uukwPMtr5gw/WFsngo6m9LI/AAAAAAAAAgA/ug6-U1_3SwgtLkCbpuR2OrsV_hZ_EqKbQCLcB/s1600/TM4.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-uukwPMtr5gw/WFsngo6m9LI/AAAAAAAAAgA/ug6-U1_3SwgtLkCbpuR2OrsV_hZ_EqKbQCLcB/s640/TM4.jpg" width="640" height="490" /></a></div> The tables \( T_M \) enable self application because we have the following two crucial properties: <ol><li> \( T_M(n) \sqsubseteq T_M(1+n) \) </li><li> \( (T_M(n), T_F(n)) \in T_M(1+n) \) </li></ol> The main lemma's that we prove are </p> <p><b>Lemma 1.</b> If \(n \le k\), then \(T_M(1+n) \in E[\!| M |\!](f:=T_H(k)) \). </p><p><b>Lemma 2.</b> \( T_Z(n) \in E[\!| Z |\!](\emptyset) \) <br></p> <p>If you're curious about the details for the complete proof of \( n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \) you can take a look at the proof in Isabelle that I've written <a href="https://www.dropbox.com/s/qxrnrdihelrzlna/DenotLam3.thy?dl=0">here</a>. </p> <p>This is all quite promising! Next we look at the proof of soundness with respect to the big step semantics. </p> <h3>Soundness with Respect to the Big-Step Semantics</h3> <p>The proof of soundness is quite similar to that of the <a href="http://siek.blogspot.com/2016/12/simple-denotational-semantics-for.html">first version</a>, as the relation \(\approx\) between the denotational and big-step values remains the same. However, the following two technical lemmas are needed to handle subsumption. </p> <p><b>Lemma</b> (Related Table Subsumption) If \(T' \subseteq T\) and \(T \approx \langle \lambda x.e, \rho \rangle\), then \(T' \approx \langle \lambda x.e, \rho \rangle\). <br>The proof is by induction on \(T'\). </p> <p><b>Lemma</b> (Related Value Subsumption) If \(v_1 \approx v'\) and \(v_2 \sqsubseteq v'\), then \(v_2 \approx v'\). <br>The proof is by case analysis, using the previous lemma when the values are function tables. </p> <p><b>Theorem</b> (Soundness). <br>If \(v \in E[\!| e |\!](\rho) \) and \( \rho \approx \rho' \), then \( \rho' \vdash e \Downarrow v' \) and \(v \approx v'\) for some \(v'\). </p> <p>The mechanization of soundness in Isabelle is <a href="https://www.dropbox.com/s/tjkg09bgp62y1ce/BigStepLam.thy?dl=0">here</a>. </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-49512711962868040412016-12-19T04:57:00.000-08:002016-12-19T06:01:33.090-08:00Take 2: Graph of Tables for the Denotational Semantics of the Lambda Calculus<p>In the previous post, the denotational semantics I gave for the lambda calculus could not deal with self application, such as the program \[ (\lambda f. f\;f)\;(\lambda g. 42) \] whose result should be \(42\). The problem was that I defined function values to be tables of pairs of values, using a datatype definition, which rules out the possibility of cycles. In the above program, the table for \( \lambda g. 42 \) needs to include itself so that the application \(f \; f \) makes sense. </p> <p>A straightforward way to solve this problem is to allow cycles by representing all the functions created by the program as a graph of tables. A function value will just contain an index, i.e. it will have the form \(\mathsf{fun}[i]\) with index \(i\), and the graph will map the index to a table. So we define values as follows. \[ \begin{array}{lccl} \text{numbers} & n \in \mathbb{N} \\ \text{values} & v \in \mathbb{V} & ::= & n \mid \mathsf{fun}[i] \\ & \mathit{graph} & = & \mathbb{N} \to (\mathbb{V} \times \mathbb{V})\,\mathit{list} \end{array} \] Given a function value \(v\) and graph \(G\), we write \(\mathit{tab}(v,G)\) for the table \(t\) when \(v= \mathsf{fun}[i]\) and \(G(i)=t\) for some index \(i\). So we modify the semantic function \(E\) to be defined as follows. </p> \begin{align*} E & : \mathit{exp} \to \mathit{env} \to (\mathbb{V} \times \mathit{graph})\,\mathit{set} \\ E[\!| n |\!](\rho) &= \{ (v,G) \mid v = n \} \\ E[\!| x |\!](\rho) &= \{ (v,G) \mid v = \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \left\{ (v,G) \middle| \begin{array}{l} \forall v_1 v_2. (v_1,v_2) \in \mathit{tab}(v,G) \\ \Rightarrow v_2 \in E[\!|e|\!](\rho(x:=v_1)) \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ (v,G) \middle| \begin{array}{l}\exists v_1 v_2. v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land (v_2,v) {\in} \mathit{tab}(v_1,G) \end{array} \right\} \end{align*} <p><b>Example 1.</b> Let's consider again the program \[ (\lambda f. f\;f)\;(\lambda g. 42) \] We'll define a graph \(G\) as follows. \[ G(0) = [(\mathsf{fun}[1], 42)] \qquad G(1) = [(\mathsf{fun}[1], 42)] \] To show \[ (42,G) \in E[\!|(\lambda f. f\;f)\;(\lambda g. 42)|\!]\emptyset \] we need to show <ol><li> \( (\mathsf{fun}[0],G) \in E[\!|(\lambda f. f\;f)|\!]\emptyset \). So we need to show \[ (42,G) \in E[\!|f\;f|\!](f:=\mathsf{fun}[1]) \] which we have because \((\mathsf{fun}[1], 42) \in \mathsf{fun}[1]\). <li> \( (\mathsf{fun}[1],G) \in E[\!|(\lambda g. 42)|\!]\emptyset \). We need \[ (42,G) \in E[\!|42|\!](f:=\mathsf{fun}[1]) \] which is immediate. <li> \( (\mathsf{fun}[1], 42) \in \mathit{tab}(\mathsf{fun}[0]) \). This is immediate. </ol> </p> <p><b>Example 2.</b> We shall compute the factorial of 3 using the strict version of the Y combinator, that is, the Z combinator. \begin{align*} R & \equiv \lambda v. (x\; x) \; v \\ M & \equiv \lambda x. f \; R \\ Z & \equiv \lambda f. M \; M \\ F & \equiv \lambda n. \mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)\\ H & \equiv \lambda r. F \\ \mathit{fact} & \equiv Z\, H \end{align*} We shall show that \[ (6,G) \in E[\!|\mathit{fact}\;3|\!]\emptyset \] for some graph \(G\) that we need to construct. By instrumenting an interpreter for the lambda calculus and running \(\mathit{fact}\,3\), we observe the following graph. \begin{align*} G(0) &= [(\mathsf{fun}[1],\mathsf{fun}[4])] & \text{for } Z \\ G(1) &= [(\mathsf{fun}[3],\mathsf{fun}[4])] & \text{for } H \\ G(2) &= [(\mathsf{fun}[2],\mathsf{fun}[4])]& \text{for } M \\ G(3) &= [(0,1),(1,1),(2,2)] & \text{for } R \\ G(4) &= [(0,1),(1,1),(2,2),(3,6)] & \text{for } F \end{align*} We check all of the following (Tedious! I'm going to write a program to do this next time.): <ul><li> \((\mathsf{fun}[4],G) \in E[Z\,H]\emptyset \) <li> \((3,6) \in G(4) \) <li> \((\mathsf{fun}[0],G) \in E[\!|Z|\!]\emptyset\) <li> \( (\mathsf{fun}[4],G) \in E[\!|M\;M|\!](f:=\mathsf{fun}[1])\) <li> \( (\mathsf{fun}[2],G) \in E[\!|M|\!](f:=\mathsf{fun}[1]) \) <li> \( (\mathsf{fun}[4],G) \in E[\!|f \; R|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2]) \) <li> \( (\mathsf{fun}[3],G) \in E[\!|R|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2]) \) <li> \( (1,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=0) \) and \( (0,1) \in G(4) \) <li> \( (1,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=1) \) and \( (1,1) \in G(4) \) <li> \( (2,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=2) \) and \( (2,2) \in G(4) \) <li> \( (\mathsf{fun}[2], \mathsf{fun}[4]) \in G(2) \) <li> \((\mathsf{fun}[1],G) \in E[\!|H|\!]\emptyset\) <li> \((\mathsf{fun}[4],G) \in E[\!|F|\!](r:=\mathsf{fun}[3])\) <li> \((1,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=0)\) <li> \((1,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=1)\) <li> \((2,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=2)\) <li> \((6,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=3)\) </ul></p> <p>The next step is to update the proof of soundness wrt. the big-step semantics. The graphs will make it a bit more challenging. But hopefully they will make it possible to also prove completeness! </p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-47875705508742373182016-12-15T21:56:00.000-08:002017-01-14T08:36:25.051-08:00Simple Denotational Semantics for the Lambda Calculus, Pω Revisited? <p>I've been trying to understand Dana Scott's \(P_{\omega}\) and \(D_{\infty}\) models of the lambda calculus, as well as a couple large Coq formalizations of domain theory, and in the process have come up with an extraordinarily simple denotational semantics for the call-by-value lambda calculus. It borrows some of the main ideas from \(P_{\omega}\) but doesn't encode everything into numbers and puts infinity in a different place. (More discussion about this near the bottom of the post.) That being said, I still don't 100% understand \(P_{\omega}\), so there may be other subtle differences. In any event, what follows is so simple that it's either wrong or amazing that it's not already a well-known semantics. </p> <p><b>UPDATE:</b> It's wrong. See the section titled Counterexample to Completeness below. Now I need to go and read more literature. <br><b>UPDATE:</b> It seems that there is an easy fix to the problem! See the subsequent <a href="http://siek.blogspot.com/2016/12/take-3-application-with-subsumption-for.html">post</a>. </p> <p>To get started, here's the syntax of the lambda calculus. </p> \[ \begin{array}{lrcl} \text{variables} & x \\ \text{numbers} & n & \in & \mathbb{N} \\ \text{expressions} & e & ::= & n \mid x \mid \lambda x.e \mid e\;e \end{array} \] <p>So we've got a language with numbers and first-class functions. I'm going to represent functions as data in the most simple way possible, as a lookup table, i.e., a list of pairs, each pair is an input value \(v_i\) and it's corresponding output \(v'_i\). Of course, functions can have an infinite number of inputs, so how will a finite-length list suffice? </p> \[ \begin{array}{lrcl} \text{values} & v & ::= & n \mid [(v_1,v'_1),\ldots,(v_n,v'_n)] \end{array} \] <p>The answer is that we're going to write the denotational semantics as a logic program, that is, as a relation instead of a function. (We'll turn it back into a function later.) Instead of producing the result value, the semantics will ask for a result value and then say whether it is correct or not. So when it comes to a program that produces a function, the semantics will ask for a list of input-output values and say whether they agree with the function. A finite list suffices because, after all, you can't actually build an infinite list to pass into the semantics. </p> <p>So here we go, our first version of the semantics, in relational style, or equivalently, as a function with three parameters that returns a Boolean. We name the function <tt>denoto</tt>, with an <tt>o</tt> at the end as a nod to <i>The Reasoned Schemer</i>. The meaning of a \(\lambda x.e\) is any table \(T\) that agrees with the semantics of the body \(e\). (This allows the table to be empty!) The meaning of an application \(e_1 e_2\) is simply to do a lookup in a table for \(e_1\). (Requiring a non-empty table.) We write \((v,v') \in T\) when \((v,v')\) is one of the pairs in the table \(T\). </p>\begin{align*} \mathit{denoto}(n, \rho, n) &= \mathit{true} \\ \mathit{denoto}(x, \rho, \rho(x)) &= \mathit{true} \\ \mathit{denoto}(\lambda x. e, \rho, T) &= \forall v v'.\, (v,v') \in T \Rightarrow \mathit{denoto}(e,\rho(x:=v),v') \\ \mathit{denoto}(e_1 \; e_2, \rho, v) &= \left(\begin{array}{l} \exists T v_2.\; \mathit{denoto}(e_1, \rho, T) \land \mathit{denoto}(e_2, \rho, v_2) \\ \qquad \land (v_2,v) \in T \end{array} \right) \end{align*} <p>The existential quantifier in the line for application is powerful. It enables the semantics to guess a sufficiently large table \(T\), so long as the table agrees with the semantics of \(e_1\) and the later uses of the result value. Because the execution of a terminating program can only call the function with a finite number of arguments, and the execution can only use the results in a finite number of ways, there is a sufficiently large table \(T\) to cover all its uses in the execution of the whole program. Also, note that \(T\) can be large in a couple dimensions: it may handle a large number of inputs, but also, it can specify large outputs (in the case when the outputs are functions). </p> <p>Denotational semantics are usually written as functions, not relations, so we remove the third parameter and instead return a set of values. This will bring the semantics more in line with \(P_{\omega}\). Also, for this version we'll use the name \(E\) for the semantic function instead of <tt>denoto</tt>. </p>\begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \{ T \mid \forall v v'. (v,v') \in T \Rightarrow v' \in E[\!|e|\!](\rho(x:=v)) \} \\ E[\!| e_1\;e_2 |\!](\rho) &= \{ v \mid \exists T v_2. T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \land (v_2,v) {\in} T \} \end{align*} <p>With the semantics written this way, it is clear that the meaning of a \(\lambda\) is not just a finite table, but instead it is typically an infinite set of tables, each of which is an approximation of the actual infinite graph of the function. </p> <p>Is this semantics correct? I'm not entirely sure yet, but I have proved that it is sound with respect to the big-step semantics. </p> <p><b>Theorem</b> (Soundness). <br>If \(v \in E[\!| e |\!](\rho) \) and \( \rho \approx \rho' \), then \( \rho' \vdash e \Downarrow v' \) and \(v \approx v'\) for some \(v'\). </p> <p>The two semantics have different notions of values, so the relation \(\approx\) is defined to bridge the two worlds. </p>\begin{gather*} n \approx n \\[2ex] \frac{\forall v_1 v_2 v'_1. (v_1,v_2) \in T \land v_1 {\approx} v'_1 \Rightarrow \exists v'_2. \rho(x:=v'_1) \vdash e \Downarrow v'_2 \land v_2 {\approx} v'_2} {T \approx \langle \lambda x.e, \rho \rangle} \end{gather*} The definition of \(\approx\) extends to environments in the natural way. </p> <!-- <p>The other direction, Completeness, will be more difficult. I'm planning to instrument the big-step semantics to log all of the input-output values that were seen with each function and then use the log to produce the appropriate tables needed for the denotational semantics. </p> <p>After that, the next question is whether denotational equality coincides with observational equivalence. </p>--> <p>The semantics and proof of soundness in Isabelle is <a href="https://www.dropbox.com/s/x9wdm7gdyb5yo1s/DenotLam.thy?dl=0">here</a>. </p> <h3>Counterexample to Completeness</h3> <p>This semantics does not handle self application properly. Consider the program \[ (\lambda f. f \; f) \; (\lambda g. 1) \] The operational semantics says the answer is \(1\). The denotational semantics requires us to find some tables \(T \in [\!|\lambda f. f\;f|\!]\) and \(T' \in [\!|\lambda g.1|\!]\). We need \( (T',1) \in T\), so we need \(1 \in [\!|f\; f|\!](f:=T') \). That requires \((T', 1) \in T'\), but that's impossible given that we've defined the values and tables in a way that does not allow cycles. </p><h3>Relationship with \(P_{\omega}\) </h3> <p>The idea of representing functions as data, and as a lookup table, comes from \(P_{\omega}\), as does having the denotation's result be a set of values. As mentioned above, one (minor) difference is that \(P_{\omega}\) encodes everything into numbers, whereas here we've used a datatype definition for the values. However, the most important difference (if I understand \(P_{\omega}\) correctly) is that its functions are infinite in a first-class sense. That is, \(P_{\omega}\) is a solution to \[ D = \mathbb{N} + (D \to D) \] and the things in \(D \to D\) are functions with potentially infinite graphs. In contrast, I've taken a stratified approach in which I've defined the values \(V\) to include only finite representations of functions \[ V = \mathbb{N} + (V \times V) \, \mathit{list} \] and then, only at the top level, I've allowed for infinity by making the denotation of an expression be a (potentially infinite) set of values. \[ E : \mathit{exp} \to \mathit{env} \to V\, \mathit{set} \] </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-67110132097793918992016-12-08T19:55:00.001-08:002016-12-15T20:18:06.907-08:00Denotational Semantics of IMP without the Least Fixed Point <p>It has been too long since I wrote a blog post! Needless to say, parenthood, teaching, research, service, and my wonderful graduate students and post-docs have been keeping me busy. But with the Fall 2016 semester winding down I can sneak away for a bit and think. </p> <p>But first, I have something to admit. My education in the theory of programming languages has a glaring hole in it. I only have a very basic understanding of Denotational Semantics despite having taking a course on Domain Theory in graduate school. I figured it would be fun to remedy this situation, and it might even be useful in the future. So my first step was to understand the denotational semantics of the IMP language, which is the starter language for most textbooks on denotational semantics. IMP is simply an imperative language with assignment statements, while loops, if statements, and arithmetic on integers. The IMP language has two syntactic categories, expressions and commands. The following is the syntax of IMP. </p> \[ \begin{array}{lrcl} \text{variables} & x \\ \text{numbers} & n & \in & \mathbb{N} \\ \text{unary operators}&u & ::= & \neg \\ \text{binary operators}&b & ::= & + \mid - \mid \times \mid \,=\, \\ \text{expressions} & e & ::= & n \mid x \mid u(e) \mid b(e,e)\\ \text{commands} & c & ::= & \mathtt{skip} \mid x := e \mid c ; c \mid \mathtt{if}\;e\;\mathtt{then}\;c\;\mathtt{else}\;c \\ & & \mid & \mathtt{while}\;e\;\mathtt{do}\;c \end{array} \] <p>As far as I can tell, for a semantics to be a <i>denotational semantics</i>it has to satisfy two properties. <ul><li>It is a mapping from abstract syntax (the program) to a mathematical object, which is just to say some precisely defined entity, that describes the observable behavior of the program. For example, the mathematical object could be a relation between a program's inputs and outputs.</li><li>It is compositional, which means that the denotation of a particular language construct is defined in terms of the denotation of the syntactic sub-parts of the construct. For example, the meaning of a while loop is defined in terms of the meaning of its conditional and the meaning of its body. </li></ul> For the expressions of IMP, it is straightforward to write down a denotational semantics, the following function \(E\). This \(E\) is no different from a recursively defined interpreter. In the following, we map expressions to natural numbers. Following custom, we use the brackets \([\!| e |\!]\) ask a kind of quote to distinguish abstract syntax from the surrounding mathematics. To handle variables, we also pass in a function \(\sigma\) from variables to numbers, which we call a <i>state</i>. </p> \begin{align*} E[\!| n |\!](\sigma) &= n \\ E[\!| x_i |\!](\sigma) &= \sigma(x) \\ E[\!| u(e) |\!](\sigma) &= [\!|u|\!]( E[\!|e|\!] ) \\ E[\!| b(e_1,e_2) |\!](\sigma) &= [\!|b|\!]( E[\!|e_1|\!], E[\!|e_2|\!]) \\ \\ E[\!| \neg |\!](n) &= \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \neq 0\end{cases} \\ E[\!| + |\!](n_1,n_2) &= n_1 + n_2 \\ E[\!| - |\!](n_1,n_2) &= n_1 - n_2 \\ E[\!| \times |\!](n_1,n_2) &= n_1 \times n_2 \\ E[\!| = |\!](n_1,n_2) &= \begin{cases} 1 & \text{if } n_1 = n_2 \\ 0 & \text{if } n_1 \neq n_2 \end{cases} \end{align*} <p>What do the commands of IMP have in common regarding what they do? They change a state to a new state. For example, if \(\sigma\) is the incoming state, then the assignment command \( x := 42 \) outputs a new state \(\sigma'\) which is the same as \(\sigma\) except that \(\sigma'(x) = 42\). In general, the denotation of a command, \(C[\!|c|\!]\), will be a relation on states, that is, a set of pairs that match up input states with their corresponding output states. We shall give the denotational semantics of commands, one construct as a time. </p> <p>The meaning of the <tt>skip</tt> command is that it it doesn't change the state, so it relates each state to itself. </p> \[ C[\!| \mathtt{skip} |\!] = \{ (\sigma,\sigma) \mid \sigma \in \mathit{state} \} \] <p>The meaning of the assignment statement is to update the state to map the left-hand side variable to the result of the right-hand side expression. So the new state is a function that takes in a variable named \(y\) and returns \( [\!|e|\!](\sigma) \) if \(y=x\) and otherwise returns the same thing as \(\sigma\). </p> \begin{align*} C[\!| x := e |\!] &= \{ (\sigma, \sigma') \mid \sigma \in \mathit{state} \} \\ & \text{where } \sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases} \end{align*} <p>The meaning of two commands in sequence is just the meaning of the first followed by the meaning of the second. </p> \[ C[\!| c_1; c_2 |\!] = \{ (\sigma,\sigma'') \mid \exists \sigma'. (\sigma,\sigma') \in C[\!| c_1 |\!] \land (\sigma',\sigma'') \in C[\!| c_2 |\!] \} \] <p>The meaning of an <tt>if</tt> command depends on the conditional expression \(e\). If the \(e\) evaluates to 0, then the meaning of <tt>if</tt> is given by the else branch \(c_2\). Otherwise, the meaning of <tt>if</tt> is given by the then branch \(c_1\). </p> \[ C[\!| \mathtt{if}\, e \,\mathtt{then}\, c_1 \,\mathtt{else}\, c_2 |\!] = \left\{ (\sigma,\sigma') \middle| \begin{array}{l} (\sigma,\sigma') \in C[\!| c_2 |\!] & \text{if}\, E[\!|e|\!](\sigma) = 0 \\ (\sigma,\sigma') \in C[\!| c_1 |\!] & \text{if}\, E[\!|e|\!](\sigma) \neq 0 \end{array} \right\} \] <p>The meaning of the <tt>while</tt> command is the crux of the matter. This is normally where a textbook includes several pages of beautiful mathematics about monotone and continuous functions, complete partial orders, and least fixed points. We're going to bypass all of that. </p> <p>The meaning of an <tt>while</tt> command is to map each starting state \(\sigma_0\) to an ending state \(\sigma_n\) obtained by iterating it's body command so long as the condition is non-zero. Pictorially, we have the following: <div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-UeolbIPYoXw/WEnVr14APPI/AAAAAAAAAfg/pXmYbRJf7DwKNXMXL6apuc5xPnqE_VxIACLcB/s1600/loop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-UeolbIPYoXw/WEnVr14APPI/AAAAAAAAAfg/pXmYbRJf7DwKNXMXL6apuc5xPnqE_VxIACLcB/s640/loop.png" width="640" height="147" /></a></div> We define an auxiliary function named \(L\) to express the iterating of the loop. It takes as input the number of iterations of the loop, the denotation of the condition expression \(de\), and the denotation of the command \(dc\). </p>\begin{align*} L(0, de, dc) &= \{ (\sigma,\sigma) \mid de(\sigma) = 0 \} \\ L(1+n, de, dc) &= \{ (\sigma,\sigma'') \mid de(\sigma) \neq 0 \land \exists \sigma'. (\sigma,\sigma') \in dc \land (\sigma',\sigma'') \in L(n, de, dc) \} \end{align*} <p>The meaning of the <tt>while</tt> command is to relate any state \(\sigma\) to state \(\sigma'\) if \(L(n,[\!|e|\!],[\!|c|\!])\) relates \(\sigma\) to \(\sigma'\) for some \(n\). </p>\[ C[\!| \mathtt{while}\, e \,\mathtt{do}\, c |\!] = \{ (\sigma,\sigma') \mid \exists n.\, (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \} \] <p>At this point I'm worried that this is so simple that it couldn't possibly be correct. A good way to check is to prove that it is equivalent to the standard big-step semantics for IMP, which we shall do now. </p> <h3>Equivalence to the Standard Big-Step Semantics</h3> <p>The big-step semantics for the IMP language is a three-place relation on a command, a starting state, and the final state, which we shall write \(c\mid\sigma\Downarrow\sigma'\). It is defined inductively by the following rules. </p> \begin{gather*} \frac{}{\mathtt{skip} \mid \sigma \Downarrow \sigma} \\[2ex] \frac{\sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases}}{ x := e \mid \sigma \Downarrow \sigma'} \qquad \frac{c_1 \mid \sigma \Downarrow \sigma' \quad c_2 \mid \sigma' \Downarrow \sigma''} {c_1 ; c_2 \mid \sigma \Downarrow \sigma''} \\[2ex] \frac{E[\!|e|\!](\sigma) = 0 \quad c_2 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\qquad \frac{E[\!|e|\!](\sigma) \neq 0 \quad c_1 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\\[2ex] \frac{E[\!|e|\!](\sigma) = 0} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma} \qquad \frac{E[\!|e|\!](\sigma) \neq 0 \quad c \mid \sigma \Downarrow \sigma' \quad \mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma' \Downarrow \sigma''} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma''} \end{gather*} <p>(The big-step semantics is not denotational because the second rule for <tt>while</tt> is not compositional: the recursion is not on a proper sub-part but instead on the entire <tt>while</tt> command.) </p> <p>We shall prove that the denotational semantics is equivalent to the big-step semantics in two steps. <ol><li>The big-step semantics implies the denotational semantics. (completeness)</li><li>The denotation semantics implies the big-step semantics. (soundness)</li></ol></p> <p><b>Theorem</b> (Completeness). If \(c \mid \sigma \Downarrow \sigma'\), then \((\sigma,\sigma') \in [\!|c|\!]\). <br><b>Proof</b>. We proceed by induction on the derivation of \(c \mid \sigma \Downarrow \sigma'\). We have one case to consider per rule in the big-step semantics. (For the reader in a hurry: the case for <tt>while</tt> at the end is the only interesting one.) <br><b>Case:</b>\[ \frac{}{\mathtt{skip} \mid \sigma \Downarrow \sigma} \] We need to show that \((\sigma,\sigma) \in [\!|\mathtt{skip}|\!]\), which is immediate from the definition of the denotational semantics. <br><b>Case:</b> \[ \frac{\sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases}}{ x := e \mid \sigma \Downarrow \sigma'} \] We need to show that \((\sigma,\sigma') \in [\!|x := e|\!]\). Again this is immediate. <br><b>Case:</b>\[ \frac{c_1 \mid \sigma \Downarrow \sigma' \quad c_2 \mid \sigma' \Downarrow \sigma''} {c_1 ; c_2 \mid \sigma \Downarrow \sigma''} \] We have two induction hypotheses: \((\sigma,\sigma') \in C[\!|c_1|\!]\) and \((\sigma',\sigma'') \in C[\!|c_2|\!]\). It follows (by definition) that \((\sigma,\sigma'') \in C[\!|c_1 ; c_2|\!]\). <br><b>Case:</b>\[ \frac{E[\!|e|\!](\sigma) = 0 \quad c_2 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\qquad \] We have the induction hypothesis \((\sigma,\sigma') \in [\!|c_2|\!]\). Together with the condition expression evaluating to 0, we have \((\sigma,\sigma') \in C[\!|\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2|\!]\). <br><b>Case:</b>\[ \frac{E[\!|e|\!](\sigma) \neq 0 \quad c_1 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\\[2ex] \] We have the induction hypothesis \((\sigma,\sigma') \in C[\!|c_1|\!]\). Together with the condition is non-zero, we have \((\sigma,\sigma') \in C[\!|\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2|\!]\). <br><b>Case:</b>\[ \frac{E[\!|e|\!](\sigma) = 0} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma} \qquad \] From \(E[\!|e|\!](\sigma) = 0\) we have \((\sigma,\sigma) \in L(0,E[\!|e|\!],E[\!|c|\!]) \). Therefore \((\sigma,\sigma) \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\). <br><b>Case:</b>\[ \frac{E[\!|e|\!](\sigma) \neq 0 \quad c \mid \sigma \Downarrow \sigma' \quad \mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma' \Downarrow \sigma''} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma''} \] We have the induction hypotheses \((\sigma,\sigma') \in C[\!|c|\!]\) and \((\sigma',\sigma'') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\). Unpacking the definition of the later, we have \( (\sigma',\sigma'') \in L(n,E[\!|e|\!],C[\!|c|\!]) \) for some \(n\). Therefore we have \( (\sigma,\sigma'') \in L(1+n,E[\!|e|\!],C[\!|c|\!]) \). So we conclude that \((\sigma,\sigma'') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\). <br><b>QED.</b></p> <p>The other direction, that if the denotation of \(c\) relates \(\sigma\) to \(\sigma'\), then so does the big-step semantics, takes a bit more work. The proof will be by induction on the structure of \(c\). In the case for <tt>while</tt> we need to reason about the \(L\) function. We get to assume that \((\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!])\) for some \(n\) and we have the induction hypothesis that \(\forall \sigma \sigma'.\, (\sigma,\sigma') \in C[\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \). Because \(L\) is recursive, we going to need a lemma about \(L\) and prove it by induction on the number of iterations. </p> <p><b>Lemma</b>If \((\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!])\) and \(\forall \sigma \sigma'.\, (\sigma,\sigma') \in [\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \), then \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma' \). <br><b>Proof.</b>The proof is by induction on \(n\). <br><b>Case \(n=0\)</b>. We have \((\sigma,\sigma') \in L(0,E[\!|e|\!],C[\!|c|\!])\), so \(\sigma = \sigma'\) and \(E[\!|e|\!](\sigma) = 0\). Therefore we can conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\). <br><b>Case \(n=1 + n'\)</b>. We have \((\sigma,\sigma') \in L(1+n',E[\!|e|\!],C[\!|c|\!])\), so \(E[\!|e|\!](\sigma) \neq 0\) and \( (\sigma,\sigma_1) \in C[\!|c|\!] \) and \( (\sigma_1,\sigma') \in L(n',E[\!|e|\!],C[\!|c|\!]) \) for some \(\sigma_1\). From the premise about \(c\), we have \(c \mid \sigma \Downarrow \sigma_1\). From the induction hypothesis, we have \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma_1 \Downarrow \sigma'\). Putting all of these pieces together, we conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\). <br><b>QED.</b><p><b>Theorem</b> (Soundness). For any \(\sigma\) and \(\sigma'\), if \((\sigma,\sigma') \in C[\!|c|\!]\), then \(c \mid \sigma \Downarrow \sigma'\). <br><b>Proof</b>. The proof is by induction on the structure of \(c\). <br><b>Case \(\mathtt{skip}\).</b>From \((\sigma,\sigma') \in C[\!|\mathtt{skip}|\!]\) we have \(\sigma = \sigma'\) and therefore \(\mathtt{skip} \mid \sigma \Downarrow \sigma'\). <br><b>Case \(x:=e\).</b>We have \[ \sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases} \] and therefore \(x := e \mid \sigma \Downarrow \sigma'\). <br><b>Case \(c_1 ; c_2\).</b>We have \( (\sigma, \sigma_1) \in C[\!|c_1|\!]\) and \( (\sigma_1, \sigma') \in C[\!|c_2|\!]\) for some \(\sigma_1\). So by the induction hypothesis, we have \(c_1 \mid \sigma \Downarrow \sigma_1\) and \(c_2 \mid \sigma_1 \Downarrow \sigma'\), from which we conclude that \( c_1 ; c_2 \mid \sigma \Downarrow \sigma'\). <br><b>Case \(\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2\).</b>We have two cases to consider, whether \(E[\!|e|\!](\sigma) = 0\) or not. <ul><li> Suppose \(E[\!|e|\!](\sigma) = 0\). Then \( (\sigma,\sigma') \in C[\!|c_2|\!] \) and by the induction hypothesis, \( c_2 \mid \sigma \Downarrow \sigma' \). We conclude that \( \mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2 \mid \sigma \Downarrow \sigma' \). </li><li> Suppose \(E[\!|e|\!](\sigma) \neq 0\). Then \( (\sigma,\sigma') \in C[\!|c_1|\!] \) and by the induction hypothesis, \( c_1 \mid \sigma \Downarrow \sigma' \). We conclude that \( \mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2 \mid \sigma \Downarrow \sigma' \). </li></ul><br><b>Case \(\mathtt{while}\, e \,\mathtt{do}\, c\).</b> From \((\sigma,\sigma') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\) we have \( (\sigma,\sigma') \in L(n, E[\!|e|\!], C[\!|c|\!]) \). Also, by the induction hypothesis, we have that \( \forall \sigma \sigma'. \; (\sigma,\sigma') \in C[\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \). By the Lemma about \(L\), we conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\). <br><b>QED.</b></p> <p>Wow, the simple denotational semantics of IMP is correct! </p> <p>The mechanization of all this in Coq is available <a href="https://dl.dropboxusercontent.com/u/10275252/DenotIMP.v">here</a>. </p> <h3>What about infinite loops?</h3> <p>Does this denotational semantics give meaning to programs with infinite loops, such as \[ \mathtt{while}\, 1 \,\mathtt{do}\, \mathtt{skip} \] The answer is yes, the semantics defines a total function from commands to relations, so every program gets a meaning. So the next question is which relation is the denotation of an infinite loop? Just like the fixed-point semantics, the answer is the empty relation. \[ C[\!|\mathtt{while}\, 1 \,\mathtt{do}\, \mathtt{skip} |\!] = \{ (\sigma,\sigma') \mid \exists n.\; (\sigma,\sigma') \in L(n, E[\!|1|\!], C[\!|\mathtt{skip}|\!]) \} = \emptyset \] </p> <h3>Comparison to using the least fixed point semantics</h3> <p>The standard denotational semantics for IMP defines the meaning of <tt>while</tt> in terms of the least fixed point of the following functional. </p> \[ W(de, dc)(dw) = \left\{ (\sigma,\sigma') \middle| \begin{array}{l} \sigma = \sigma' & \text{if } de(\sigma) = 0 \\ \exists \sigma_1, (\sigma,\sigma_1) \in dc \land (\sigma_1,\sigma') \in dw & \text{if } de(\sigma) \neq 0 \end{array} \right\} \] <p>One of the standard ways to compute the least fixed point of a functional \(F\) is from Kleene's fixpoint theorem, which says that the least fixed point of \(F\) is \[ \bigcup_{k=0}^{\infty} F^k(\emptyset) \] where \begin{align*} F^0(x) & = x \\ F^{k+1}(x) & = F (F^k(x)) \end{align*} So the traditional denotation of <tt>while</tt> is: \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \bigcup_{k=0}^{\infty} W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \] Applying the definition of infinitary union, we have \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists k.\; (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \} \] which starts to look similar to our definition. But they are not trivially equivalent. </p> <p>Consider the following loop that counts down to zero. \[ \mathtt{while}\,\neg (x=0)\, \mathtt{do}\, x := x - 1 \] To talk about the semantics of this loop, we create the following abbreviations for some relations on states. \begin{align*} R_0 &= \{ (\sigma, \sigma) \mid \sigma(x) = 0 \} \\ R_1 &= \{ (\sigma, \sigma') \mid \sigma(x) = 1 \land \sigma'(x) = 0 \} \\ R_2 &= \{ (\sigma, \sigma') \mid \sigma(x) = 2 \land \sigma'(x) = 0 \} \\ R_3 &= \{ (\sigma, \sigma') \mid \sigma(x) = 3 \land \sigma'(x) = 0 \} \\ & \vdots \end{align*} <ul><li>If \(x=0\) in the initial state, the loop immediately terminates, so the final state is the same as the input state. This is \(R_0\).</li><li>If \(x=1\) in the initial state, the loop executes one iteration, so the final state has \(x=0\). This is \(R_1\).</li><li>If \(x=2\) in the initial state, the loop executes one iteration, so the final state has \(x=0\). This is \(R_2\).</li><li> and so on.</li></ul>The \(L\) function computes exactly these \(R\)'s. \begin{align*} L(0, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_0 \\ L(1, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_1 \\ L(2, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_2 \\ L(3, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_3\\ & \vdots \end{align*} The semantics of <tt>while</tt> given by \(L\) says that an initial state is related to a final state if it is possible to guess the iteration count \(n\) to pick out the appropriate line of \(L\) that relates the two states. \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists n.\, (\sigma,\sigma') \in L(n, [\!|e|\!],[\!|c|\!]) \} \] In contrast, Kleene's iteration incrementally builds up the union of all the \(R\)'s: \begin{align*} W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^0(\emptyset) &= \emptyset \\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^1(\emptyset) &= R_0\\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^2(\emptyset) &= R_0 \cup R_1 \\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^3(\emptyset) &= R_0 \cup R_1 \cup R_2 \\ & \vdots \end{align*} The semantics of <tt>while</tt> given by the least fixed point of \(W\) says that an initial state is related to a final state if, after a sufficient number of applications of \(W\), say \(k\), the two states are in the resulting union \(R_0 \cup \cdots \cup R_{k-1}\). \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists k.\; (\sigma,\sigma') \in W([\!|e|\!],[\!|c|\!])^k(\emptyset) \} \] </p> <p>In general, \( L(n,E[\!|e|\!],C[\!|c|\!]) \) gives the denotation of the loop exactly when \(n\) is the number of iterations executed by the loop in a given initial state. In contrast, \( W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \) produces the \(k\)th approximation of the loop's meaning, providing the appropriate initial/final states for up to \(k-1\) iterations of the loop. </p> <p>However, these two algorithms are equivalent in the following sense. </p> <p><b>Theorem (Equivalence to LFP semantics)</b><ol><li> If \( (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \), then \( \exists k.\; (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \). </li><li> If \( (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \), then \( \exists n.\; (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \). </li></ol>The first part is proved by induction on n. The second part is proved by induction on k. The full proofs are in the Coq development linked to above. </p> <h3>Parting Thoughts</h3> <ul><li>The traditional least fixed-point semantics is overkill for IMP.</li><li>What makes this simple version work is the combination of choosing the denotation of commands to be relations on states (instead of functions on states) and the use of an existential for the number of loop iterations in the meaning of <tt>while</tt>. </li><li>The IMP language is just a toy. As we consider more powerful language constructs such as functions, pointers, etc., at what point will we be forced to use the heavier-duty math of least fixed-points, continuous functions, etc.? I'm looking forward to finding out!</li></ul> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com2tag:blogger.com,1999:blog-11162230.post-45916503853620104982014-01-26T20:10:00.001-08:002014-04-22T11:00:02.342-07:00The Publication Process in Programming LanguagesThere was an interesting discussion at the ACM SIGPLAN Meeting at POPL 2014 regarding the problems and potential solutions with the publication process within programming languages. The discussion centered around the problem that we primarily publish in conferences and not journals, which makes the field of programming languages look bad compared to other disciplines. I agree that this is a problem, but it is not the only problem with our publication process. As we consider alternative publication models, we should aim to solve all of the problems (or as many as possible). The following are some of the other problems that I see with our current conference-oriented process. I conclude with a short outline of a possible solution inspired by code review processes in open-source software groups such as C++ Boost.<br /><h3>Soundness vs. Importance </h3>When making the accept/reject decision for conference publications, we judge papers on both scientific soundness (are the claims in the paper true?) and some sort of feeling of importance. These two things should be decoupled. On the one hand, we don't give ourselves enough time to evaluate scientific soundness, and on the other, we don't need to, and should not try to evaluate importance at the time of publication because <a href="http://arnetminer.org/conferencebestpapers">we are not very good at doing so</a>. A somewhat amusing/annoying side-effect of this evaluation model is that many papers dedicate several paragraphs to inflating the importance of the results, sometimes obscuring the actual results. Another rumored side-effect is authors cherry-picking data to make their results seem more important. Similarly, when is the last time you read about a negative result in a programming languages conference or journal?<br /><h3>Publication vs. Presentation </h3>The conference system conflates the decision to accept for publication with the decision to accept for presentation. Despite many conferences going to multi-track formats, the number of available presentation slots (and their cost) plays a strong role in limiting the number of papers accepted for publication. On the other hand, the cost of publishing a paper is very low (and should be much lower than what it is for the ACM). The only limit on publishing papers should be with regards to scientific soundness.<br /><h3>Publication Delay</h3>One of the supposed benefits of the conference system is fast turn-around. However, I know of many papers that take over a year, sometimes multiple years, to be published because they receive multiple rejections before being accepted for publication, and not because the paper is scientifically unsound. This phenomenon slows the scientific process because the dissemination of scientific results is delayed. (Many researchers do not feel it is appropriate to post pre-prints on their web pages.) This phenomenon also increases the reviewing load on the community because some papers end up with nine reviews when three would have sufficed.<br /><h3> Reviewer Expertise</h3>Papers are not always reviewed by the reviewers with the most expertise. On a fairly routine basis, I find minor inaccuracies and problems in published papers in the areas that I'm expert. In such situations, I feel that I should have reviewed the paper but didn't get a chance to because I wasn't on that particular program committee. I was on other PC's that year. One of the reasons this happens is that there is large overlap in topics among the major programming language conferences.<br /><h3>Narrow Audience</h3>Most papers in our area are written with a very narrow audience in mind: other experts in programming languages. The reason for this is obvious: program committees consist of experts in programming languages. This narrow writing style reduces the readability of our papers to people outside the area, including the people in industry who are in a position to apply the research.<br /><h3>Reviewer Load </h3>Recently, program chairs have been discouraging PC members from getting help from their graduate students and friends. There are a couple problems with this. First, reviewing is an important part of training graduate students. Second, with over 20 papers assigned to a PC member over a couple months time, doing high-quality reviews becomes a heroic effort. For academics whose job description includes reviewing, this kind of load is manageable in a macho sort of way. For those in industry (not counting a few research laboratories), I imagine this kind of load is a non-starter.<br /><h3>Post-Publication Reviews and Corrections</h3>The current publication process does not provide a sufficiently low-overhead mechanism for reviews and/or corrections to come along after a paper has been published. In particular, it would be great to have an easy way for someone that is trying to replicate an experiment, or apply a technique in industry, to post a review providing further data.<br /><h3>The Outline of a Solution</h3>Instead of a mixture of conferences and journals, we should just have one big continual reviewing process for all research on programming languages. Anyone who wants to participate could submit papers (anonymously or not) and anyone could review any submission (anonymously or not). Both the paper submissions and the reviews would be public. Authors would likely revise their papers as reviews come in. If it turns out that some people submit lots of papers without doing any reviews, then some simple rules regarding the ratio of paper submissions to reviews could be created. A group of editors would evaluate reviews and skim papers to decide whether a paper is scientifically sound and therefore deserving of publication, that is, deserving of their seal of approval. The editor's meta-review and decision would also be public. Of course, reviews of papers that have already been published would be allowed and could lead to a revision or even retraction of a paper. Regarding presentations, conferences and workshops could select from the recently published papers as they see fit.<br /><br />This outline of a solution has much in common with the Open Review model already used by several conferences (ICLR, ICML, and AKBC at openreview.net), though it is directly inspired by my positive experiences with the code review process in the C++ Boost open-source software group.Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com2tag:blogger.com,1999:blog-11162230.post-27961055517502504472013-05-27T05:52:00.001-07:002018-05-11T14:48:38.484-07:00Type Safety in Three Easy LemmasLast year I wrote a blog post showing how proofs of type safety can be done with just <a href="http://siek.blogspot.com/2012/08/type-safety-in-five-easy-lemmas.html">five easy lemmas</a> if one defines the programming language in a certain way, using a special abstract machine. In this blog post I'll improve on that post in two ways. I'll show that type safety can be proved with just three easy lemmas and I'll use a standard interpreter-based definition of the language instead of a special abstract machine. <br />For readers interested in mechanized metatheory, here's the <a href="https://www.dropbox.com/s/e3sws9a3raakleh/EvalSTLC.thy?dl=1">mechanized version</a> of this post in Isabelle. <br /><h3>Syntax</h3>This time we do not need to put the language in A-normal form. To review, we study a simple but Turing-complete language with integers, Booleans, and first-class functions that may be recursive. \[ \begin{array}{lrcl} \text{types} & T & ::= & \mathtt{Int} \mid \mathtt{Bool} \mid T \to T \\ \text{variables} & x,y,z \\ \text{integers} & n \\ \text{operators}&o & ::= & + \mid - \mid \,=\, \\ \text{Booleans}&b & ::= & \mathtt{true} \mid \mathtt{false}\\ \text{constants}&c & ::= & n \mid b \\ \text{expressions} & e & ::= & c \mid o(e,e) \mid x \mid \mathtt{fun}\,f(x{:}T)\, e \mid e(e) \end{array} \] <br /><h3>Dynamic Semantics</h3>As before, we use the notation \(\epsilon\) for the empty list. Given a list \(L\), the notation \(a \cdot L\) is a larger list with \(a\) as the first element and the rest of the elements are the same as \(L\). We use lists of key-value pairs (<i>association lists</i>) to represent mapping from variables to types (type environments, written \(\Gamma\)) and variables to values (environments, written \(\rho\)). The following lookup function finds the thing associated with a given key in an association list. The return type of \(\mathit{lookup}\) and the use of \(\mathit{stuck}\) and \(\mathit{return}\) deserves a little explanation, which I give below. \begin{align*} \mathit{lookup} & : \alpha \times (\alpha \times \beta) \, \mathit{list}\to \beta \, \mathit{result}\\ \mathit{lookup}(K_1, \epsilon ) &= \mathit{stuck} \\ \mathit{lookup}(K_1, (K_2,V) \cdot L ) &= \begin{cases} \mathit{return}\,V & \text{if } K_1 = K_2 \\ \mathit{lookup}(K_1,L) & \text{otherwise} \end{cases} \end{align*} One might normally consider \(\mathit{lookup}\) as a partial function, because it might fail to find a matching key in the association list. However, here we're going to make the partiality explicit by returning a special result called \(\mathit{stuck}\). We'll also include \(\mathit{timeout}\) in the kinds of results, which we explain shortly. \[ \mathit{datatype}\;\;\alpha\,\mathit{result} = \mathit{Result}\,\alpha \mid \mathit{Stuck} \mid \mathit{TimeOut} \] We define the following auxiliary notation for dealing with the \(\mathit{result}\) datatype. The most import definition is the last line, which helps us avoid cluttering our code with lots of \(\mathit{case}\)'s. \begin{align*} \mathit{return}\,a &\equiv \mathit{Result}\;a \\ \mathit{stuck} &\equiv \mathit{Stuck} \\ \mathit{timeout} &\equiv \mathit{TimeOut} \\ X \gets M_1;\; M_2 &\equiv \mathit{case}\,M_1\,\mathit{of} \\ & \qquad \mathit{Stuck} \Rightarrow \mathit{Stuck} \\ & \quad\; \mid \mathit{TimeOut} \Rightarrow \mathit{TimeOut} \\ & \quad\; \mid \mathit{Result}\, R \Rightarrow [X := R]M_2 \end{align*} The values of this language (the results of evaluation) are constants and closures. A closure pairs a function with an environment \(\rho\). \[ \begin{array}{lrcl} \text{values}& v & ::= & c \mid \langle f(x{:}T)\, e, \rho \rangle \end{array} \] In many places within the interpreter we're going to extract an integer, Boolean, or closure from a value. The extraction might fail because, for example, even though we may want to extract an integer the value may instead be a Boolean. \begin{align*} \mathit{toInt}(n) &= \mathit{return}\,n \\ \mathit{toInt}(b) &= \mathit{stuck} \\ \mathit{toInt}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{stuck} \\ \\ \mathit{toBool}(n) &= \mathit{stuck} \\ \mathit{toBool}(b) &= \mathit{return}\,b \\ \mathit{toBool}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{stuck} \\ \\ \mathit{toClosure}(n) &= \mathit{stuck} \\ \mathit{toClosure}(b) &= \mathit{stuck} \\ \mathit{toClosure}(\langle f(x{:}T)\, e, \rho \rangle ) &= \mathit{return}\,(f,x,e,\rho) \end{align*} Next we define the \(\delta\) function, which gives meaning to the primitive operators. \begin{align*} \delta(+, v_1, v_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2);\; \mathit{return}\; (n_1 + n_2) \\ \delta(-, v_1, v_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2); \; \mathit{return}\,(n_1 - n_2) \\ \delta(=, n_1, n_2) &= n_1 \gets \mathit{toInt}(v_1);\; n_2 \gets \mathit{toInt}(v_2);\; \mathit{return}\; (n_1 = n_2) \end{align*} The evaluation function \(\mathcal{V}\) is a mostly-standard closure-based interpreter. The one thing that is a bit unusual is that we make sure that the interpreter is a total function by ensuring termination. The interpreter includes a third parameter that is a counter. If the counter gets to zero, the result is \(\mathit{timeout}\). (This counter technique was described in the earlier <a href="http://siek.blogspot.com/2012/07/big-step-diverging-or-stuck.html">blog post</a>.) \begin{align*} \mathcal{V}(e,\rho,0) &= \mathit{timeout} \\ \mathcal{V}(x,\rho,1+k) &= \mathit{lookup}(x,\rho) \\ \mathcal{V}(c,\rho,1+k) &= \mathit{return}\,c \\ \mathcal{V}(\mathtt{fun}\,f(x{:}T)\, e, \rho, 1+k) &= \mathit{return}\,\langle f (x{:}T)\,e,\rho\rangle \\ \mathcal{V}(o(e_1,e_2),\rho,1+k) &= v_1 \gets \mathcal{V}(e_1,\rho,k);\; v_2 \gets \mathcal{V}(e_2,\rho,k); \; \delta(o, v_1, v_2) \\ \mathcal{V}(e_1\,e_2,\rho,1+k) &= v_1 \gets \mathcal{V}(e_1,\rho,k); \; v_2 \gets \mathcal{V}(e_2,\rho,k); \\ & \quad\; (f,x,e,\rho') \gets \mathit{toClosure}(v_1); \\ & \quad\; \mathcal{V}(e, (x,v_2) \cdot (f,v_1) \cdot \rho', k) \end{align*} To finish off the dynamic semantics we must define \(\mathit{eval}\) which specifies the behavior of whole programs. The behavior of a program is to either return a constant, return \(\mathit{diverge}\) which indicates that the program runs forever, or the behavior is undefined. \[ \mathit{eval}(e) = \begin{cases} c & \text{if } \mathcal{V}(e,\epsilon,n) = \mathit{Result}\;c \text{ for some } n \\ \mathit{diverge} & \text{if } \forall n.\; \mathcal{V}(e,\epsilon,n) = \mathit{TimeOut} \end{cases} \] <br /><h3>Type System</h3>The types for the constants is given by the \(\mathit{typeof}\) partial function. \begin{align*} \mathit{typeof}(n) &= \mathsf{Int} \\ \mathit{typeof}(\mathsf{true}) &= \mathsf{Bool} \\ \mathit{typeof}(\mathsf{false}) &= \mathsf{Bool} \end{align*} The \(\Delta\) partial function maps a primitive operator and argument types to the return type. \begin{align*} \Delta(+,\mathsf{Int},\mathsf{Int}) &= \mathsf{Int} \\ \Delta(-, \mathsf{Int},\mathsf{Int}) &= \mathsf{Int} \\ \Delta(=,\mathsf{Int},\mathsf{Int}) &= \mathsf{Bool} \end{align*} The following presents the type rules for expressions. <br />\(\Gamma \vdash e : T\)<br />\begin{gather*} \frac{\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T}{\Gamma \vdash x : T} \qquad \frac{\mathit{typeof}(c) = T}{\Gamma \vdash c : T} \\[2ex] \frac{ (x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash e : T_2 }{ \Gamma \vdash \mathtt{fun}\,f(x{:}T_1)e : T_1 \to T_2 } \\[2ex] \frac{ \Gamma \vdash e_1 : T_1 \to T_2 \quad \Gamma \vdash e_2 : T_1 }{ \Gamma \vdash e_1(e_2) : T_2 } \qquad \frac{ \begin{array}{c} \Delta(o,T_1,T_2) = T_3 \\ \Gamma \vdash e_1 : T_1 \quad \Gamma \vdash e_2 : T_2 \end{array} }{ \Gamma \vdash o(e_1,e_2) : T_3 } \end{gather*} Our proof of type safety will require that we define notions of well-typed values, results, and environments. <br />\(\vdash v : T\) \begin{gather*} \frac{\mathit{typeof}(c) = T}{\vdash c : T} \quad \frac{ \Gamma \vdash \rho \quad (x,T_1) \cdot (f,T_1\to T_2) \cdot \Gamma \vdash e : T_2 }{ \vdash \langle f(x{:}T_1)e , \rho \rangle : T_1 \to T_2 } \end{gather*} \(\vdash r : T\) \begin{gather*} \frac{\vdash v : T}{\vdash \mathit{Result}\,v : T} \quad \frac{}{\vdash \mathit{TimeOut} : T} \end{gather*} \(\Gamma \vdash \rho\) \begin{gather*} \frac{}{\epsilon \vdash \epsilon} \qquad \frac{ \vdash v : T \quad \Gamma \vdash \rho }{ (x,T) \cdot \Gamma \vdash (x,v) \cdot \rho } \end{gather*} <br /><h3>Type Safety</h3>The proof of type safety, as promised, includes just three easy lemmas. The first two lemmas are essentially identical to the corresponding lemmas in ``Type Safety in Five''. They establish that the primitive operators and lookup on environments are sound with respect to \(\Delta\) and lookup on type environments, respectively. The third lemma is the main event of the proof, showing that \(\mathcal{V}\) is sound with respect to the type system. Because we know that \(\mathcal{V}\) terminates, we can do the proof by induction on the definition of \(\mathcal{V}\), which helps to streamline the proof. We cap off the proof with the Type Safety theorem, which follows almost immediately from the soundness of \(\mathcal{V}\).<br /><b>Lemma</b> (\(\delta\) is safe) <br />If \(\Delta(o,\overline{T}) = T\) and \(\vdash v_i : T_i\) for \(i \in \{1,\ldots,n\}\), then \(\delta(o,\overline{v}) = \mathit{Result}\;v\) and \(\vdash v : T\), for some \(v\).<br /><i>Proof.</i> We proceed by cases on the operator \(o\). <br /><ol><li> If the operator \(o\) is \(+\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Int}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(+,n_1,n_2) = \mathit{Result}\, (n_1 + n_2)\) and we have \(\vdash (n_1 + n_2) : \mathsf{Int}\). </li><li> If the operator \(o\) is \(-\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Int}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(-,n_1,n_2) = \mathit{Result}\, (n_1 - n_2)\) and we have \(\vdash (n_1 - n_2) : \mathsf{Int}\). </li><li> If the operator \(o\) is \(=\), then we have \(T_1=T_2=\mathsf{Int}\) and \(T=\mathsf{Bool}\). Then because \(\vdash v_i : \mathsf{Int}\), we know that \(v_i = n_i\) for \(i \in \{1,2\}\). Then \(\delta(=,n_1,n_2) = \mathit{Result}\;(n_1 = n_2)\) and we have \(\vdash (n_1 = n_2) : \mathsf{Bool}\). </li></ol>QED.<br /><b>Lemma</b> (\(\mathit{lookup}\) is safe) <br />If \(\Gamma \vdash \rho\) and \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T\), then \(\mathit{lookup}(x,\rho) = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\).<br /><i>Proof.</i>We proceed by induction on \(\Gamma \vdash \rho\). <br /><ol><li> Case \(\epsilon \vdash \epsilon: \qquad (\Gamma=\epsilon, \rho = \epsilon)\)<br /> But then we have a contradition with the premise \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\;T\), so this case is vacuously true. </li><li> Case \(\begin{array}{c}\vdash v : T' \quad \Gamma' \vdash \rho' \\ \hline (x',T') \cdot \Gamma' \vdash (x',v) \cdot \rho' \end{array}\): <br /> Next we consider two cases, whether \(x = x'\) or not. <ol><li> Case \(x = x'\): Then \(\mathit{lookup}(x, \rho) = \mathit{Result}\;v\) and \(T = T'\), so we conclude that \(\vdash v : T\). </li><li> Case \(x \neq x'\): Then \(\mathit{lookup}(x,\rho) = \mathit{lookup}(x,\rho')\) and \(\mathit{lookup}(x,\Gamma) = \mathit{lookup}(x,\Gamma') = \mathit{Result}\;T\). By the induction hypothesis, we have \(\mathit{lookup}(x,\rho') = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\), which completes this case. </li></ol></li></ol>QED.<br /><b>Lemma</b> (\(\mathcal{V}\) is safe) <br />If \(\Gamma \vdash e : T\) and \(\Gamma \vdash \rho\), then \(\vdash \mathcal{V}(e,\rho,k) : T\).<br /><i>Proof.</i>The proof is by induction on the definition of \(\mathcal{V}\). <br /><ol><li> Case \(\mathcal{V}(e,\rho,0)\): We have \(\mathcal{V}(e,\rho,0) = \mathit{TimeOut}\) and \(\vdash \mathit{TimeOut} : T\). </li><li> Case \(\mathcal{V}(x,\rho,1+k)\): We have \(\mathcal{V}(x,\rho,1+k) = \mathit{lookup}(x,\rho)\). From \(\Gamma \vdash x : T\) we have \(\mathit{lookup}(x,\Gamma) = \mathit{Result}\; T\). Then by the \(\mathit{lookup}\) is safe lemma, we have \(\mathit{lookup}(x,\rho) = \mathit{Result}\;v\) and \(\vdash v : T\) for some \(v\). Thus we have \(\vdash \mathit{lookup}(x,\rho) : T\) and conclude \(\vdash \mathcal{V}(x,\rho,1+k) : T\). </li><li> Case \(\mathcal{V}(c,\rho,1+k)\): From \(\Gamma \vdash c : T\) we have \(\mathit{typeof}(c) = T\) and therefore \(\vdash c : T\). We have \(\mathcal{V}(c,\rho) = \mathit{Result}\;c\) and therefore \(\vdash \mathcal{V}(c,\rho,1+k) : T\). </li><li> Case \(\mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1, \rho,1+k)\): <br /> We have \(\mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1,\rho,1+k) = \mathit{Result}\;\langle f(x{:}T_1) e_1, \rho \rangle\). From \(\Gamma \vdash \mathtt{fun}\,f(x{:}T_1) e_1 : T\) we have \((x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash e_1 : T_2\), with \(T = T_1 \to T_2\). Together with \(\Gamma \vdash \rho\), we have \(\vdash \langle f(x{:}T_1) e_1, \rho \rangle : T\) and therefore \(\vdash \mathcal{V}(\mathtt{fun}\,f(x{:}T_1) e_1, \rho,1+k) : T\). </li><li> Case \(\mathcal{V}(o(e_1,e_2), \rho, 1+k))\): <br /> We have \(\Delta(o,T_1,T_2) = T\) and \(\Gamma \vdash e_1 : T_1\) and \(\Gamma \vdash e_2 : T_2\). By the induction hypothesis, we have \(\mathcal{V}(e_1,\rho,k) = \mathit{Result}\;v_1\) and \(\vdash v_1 : T_1\) and \(\mathcal{V}(e_2,\rho,k) = \mathit{Result}\;v_2\) and \(\vdash v_2 : T_2\). Because \(\delta\) is safe, we have \(\delta(o,v_1,v_2) = \mathit{Result}\;v_3\) and \(\vdash v_3 : T\) for some \(v_3\). We have \(\vdash \delta(o,v_1,v_2) : T\) and therefore \(\vdash \mathcal{V}(o(e_1,e_2), \rho, 1+k)) : T\). </li><li> Case \(\mathcal{V}(e_1\,e_2, \rho, 1+k)\): We have \(\Gamma \vdash e_1 : T_2 \to T\) and \(\Gamma \vdash e_2 : T_2\). By the induction hypothesis, we have \(\mathcal{V}(e_1,\rho,k) = \mathit{Result}\;v_1\) and \(\vdash v_1 : T_2 \to T\) and \(\mathcal{V}(e_2,\rho,k) = \mathit{Result}\;v_2\) and \(\vdash v_2 : T_2\). By cases on \(\vdash v_1 : T_2 \to T\), we have that \(v_1 = \langle f(x{:}T_2)e_3, \rho' \rangle\) and \(\Gamma' \vdash \rho'\) and \((x,T_2) \cdot (f,T_2 \to T) \cdot \Gamma' \vdash e_3 : T\) for some \(f,x,e_3,\Gamma',\rho'\). So we have \(\mathcal{V}(e_1\,e_2, \rho, 1+k) = \mathcal{V}(e_3,(x,v_2) \cdot (f,v_1) \cdot \rho',k)\). Applying the induction hypothesis for a third time, we have \(\vdash \mathcal{V}(e_3,(x,v_2) \cdot (f,v_1) \cdot \rho',k) : T\). Therefore \(\vdash \mathcal{V}(e_1\,e_2, \rho, 1+k) : T\). </li></ol>QED.<br /><b>Theorem</b> (Type Safety) <br />If \(\epsilon \vdash e : T\) and \(T \in \{ \mathtt{Int}, \mathtt{Bool}\} \), then either \(\mathit{eval}(e) = c\) and \(\vdash c : T\) for some \(c\) or \(\mathit{eval}(e) = \mathit{diverge}\) .<br /><i>Proof.</i>We consider two cases: whether the program diverges or not. <br /><ol><li> Suppose that \(\mathcal{V}(e,\epsilon,k) = \mathit{TimeOut}\) for all \(k\). Then \(\mathit{eval}(e) = \mathit{diverge}\). </li><li> Suppose it is not the case that \(\mathcal{V}(e,\epsilon,k) = \mathit{TimeOut}\) for all \(k\). So \(\mathcal{V}(e,\epsilon,k') \neq \mathit{TimeOut}\) for some \(k'\). Then because \(\mathcal{V}\) is safe, we have \(\mathcal{V}(e,\epsilon,k') = \mathit{Result}\,v\) and \(\vdash v : T\). Then from \(T \in \{ \mathtt{Int}, \mathtt{Bool} \}\) and by cases on \(\vdash v : T\), we have \(v = c\) for some \(c\). </li></ol>QED. Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com6tag:blogger.com,1999:blog-11162230.post-34543293987135261862012-10-18T21:50:00.001-07:002012-10-20T22:45:57.675-07:00Interp. of the GTLC, Part 5: Eager Cast Checking<p>Back in Part 1 of this series, I mentioned that there is a design choice between eager and lazy cast checking. Recall the following example. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With eager cast checking, the cast labeled \(\ell_1\) fails at the moment when it is applied to a value. Whereas with lazy cast checking, the \(\ell_1\) cast initially succeeds, but then later, when the function is applied at \(f\,\mathsf{true}\), the cast fails. I like eager cast checking because it tells the programmer as soon as possible that something is amiss. Further, it turns out that when using the space-efficient implementations, eager and lazy checking are about the same regarding run-time overhead. (Lazy can be faster if you don't care about space efficiency.) </p> <p>We saw the specification for lazy cast checking in Part 1, but the specification for eager checking was postponed. The reason for the postponement was that specifying the semantics of eager cast checking requires more machinery than for lazy cast checking. (I'll expand on this claim in the next paragraph.) Thankfully, in the meantime we've acquired the necessary machinery: the Coercion Calculus. The Eager Coercion Calculus was first discussed in the paper <i>Space-Efficient Gradual Typing</i> and was extended to include blame labels in <i>Exploring the Design Space of Higher-Order Casts</i>. Here we'll discuss the version with blame labels and flesh out more of the theory, such as characterizing the coercion normal forms and defining an efficient method of composing coercions in normal form. This is based on an ongoing collaboration with Ronald Garcia. </p> <p>Before getting into the eager coercion calculus, let me take some time to explain why the eager coercion calculus is needed for the semantics, not just for an efficient implementation. After all, there was no mention of coercions in the semantics of the lazy variants of the Gradually-Typed Lambda Calculus. Instead, those semantics just talked about casts, which consisted of a pair of types (source and target) and a blame label. The heart of those semantics was a \(\mathsf{cast}\) function that applies a cast to a value. </p> <p>It's instructive to see where naive definitions of a \(\mathsf{cast}\) function for eager checking break down. The most obvious thing to try is to modify the \(\mathsf{cast}\) function to check for (deep) type consistency instead of only looking at the head of the type. So we change the first line of the \(\mathsf{cast}\) function from \[ \mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \] to \[ \mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } T_1 \not\sim T_2 \] Let's see what happens on an example just a tad different from the previous example. In this example we go through \(\star \to \star\) instead of \(\star\). \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With the naive cast function, both casts initially succeed, producing the value \[ (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool} \] However, that's not what an eager cast checking should do. The above should not be a value, it should have already failed and blamed \(\ell_0\). </p> <p>So the \(\mathsf{cast}\) function not only needs to check whether the target type is consistent with the source type, it also needs to check whether the target type is consistent with all of the casts that are wrapping the value. One way we could try to do this is to compute the greatest lower bound (with respect to naive subtyping) of all the types in the casts on the value, and then compare the target type to the greatest lower bound. The meet operator on types is defined as follows: \begin{align*} B \sqcap B &= B \\ \star \sqcap T &= T \\ T \sqcap \star &= T \\ (T_1 \to T_2) \sqcap (T_3 \to T_4) &= (T_1 \sqcap T_3) \to (T_2 \sqcap T_4) \\ & \text{if } (T_1 \to T_2) \sim (T_3 \to T_4)\\ T_1 \sqcap T_2 &= \bot & \text{if } T_1 \not\sim T_2 \end{align*} We introduce the bottom type \(\bot\) so that the meet operator can be a total function. Next we define a function that computes the meet of all the casts wrapping a value. \begin{align*} \sqcap (s : T_1 \Rightarrow^{\ell} T_2) &= T_1 \sqcap T_2 \\ \sqcap (v : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_1} T_3) & = (\sqcap (v : T_1 \Rightarrow^{\ell_1} T_2)) \sqcap T_3 \end{align*} Now we can replace the first line of the \(\mathsf{cast}\) function to use this meet operator. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \left(\sqcap v\right) \not\sim T_2 \end{align*} How does this version fare on our example? An error is now triggered when the value flows into the cast labeled \(\ell_1\), so that's good, but the blame goes to \(\ell_1\). Unfortunately, the prior work on eager checking based on coercions says that \(\ell_0\) should be blamed instead! The problem with this version of \(\mathsf{cast}\) is that the \(\sqcap\) operator forgets about all the blame labels that are in the casts wrapping the value. In this example, it's dropping the label \(\ell_0\) which really ought to be blamed. </p> <h3>The Eager Coercion Calculus</h3> <p>In the context of the Coercion Calculus, one needs to add the following two reduction rules to obtain eager cast checking. What these rules do is make sure that failure coercions immediately bubble up to the top of the coercion where they can trigger a cast failure. \begin{align*} (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\hat{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*} In the second rule, we require that the domain coercion be in normal form, thereby imposing a left-to-right ordering for coercion failures. </p> <p>To ensure confluence, we also need to make two changes to existing reduction rules. In the rule for composing function coercions, we need to require that the two coercions be in normal form. (The notation \(\tilde{c}\) is new and will be explained shortly.) \begin{align*} (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \end{align*} Here's the counter-example to confluence, thanks to Ron, if the above restriction is not made. \begin{align*} (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow \mathsf{Fail}^{\ell_1}; (\mathsf{Fail}^{\ell_2}\to c_2) \longrightarrow \mathsf{Fail}^{\ell_1} \\ (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow (\mathsf{Fail}^{\ell_2};\mathsf{Fail}^{\ell_1}) \to (c_1; c_2) \\ & \longrightarrow \mathsf{Fail}^{\ell_2} \to (c_1; c_2)\\ & \longrightarrow \mathsf{Fail}^{\ell_2} \end{align*} There is also a confluence problem regarding the following rule. \begin{align*} \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*} The counter-example, again thanks to Ron, is \begin{align*} (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota;\iota) \to (\mathsf{Bool}!; \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \iota \to (\mathsf{Fail}^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \mathsf{Fail}^{\ell_2} \\ (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota \to \mathsf{Bool}!); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow \mathsf{Fail}^{\ell_1} \end{align*} We fix this problem by making the reduction rule more specific, by only allowing injections to be consumed on the left of a failure. \begin{align*} I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*} </p> <p>Here's the complete set of reduction rules for the Eager Coercion Calculus. \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\tilde{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*} </p> <p>These additions and changes to the reduction rules cause changes in the normal forms for coercions. First, \(\mathsf{Fail}^\ell\) cannot appear under a function coercion We therefore introduce another category, called ``normal parts'' and written \(\tilde{c}\), that excludes \(\mathsf{Fail}^\ell\) (but still includes \(I?^{\ell_1}; \mathsf{Fail}^{\ell_2}\) because the \(\ell_1\) projection could still fail and take precedence over \(\ell_2\)). Also, \( (\tilde{c}_1 \to \tilde{c}_2); \mathsf{Fail}^\ell\) is now a normal form. Further, to regularize the form that coercions can take, we always write them as having three parts. The following grammar defines the normal coercions for eager cast checking. \[ \begin{array}{llcl} \text{optional injections} & i & ::= & \iota \mid I! \\ & i_\bot & ::= & i \mid \mathsf{Fail}^\ell \\ \text{optional functions} & f & ::= & \iota \mid \tilde{c} \to \tilde{c} \\ & f_\bot & ::= & f \mid \mathsf{Fail}^\ell \\ \text{optional projections} & j & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \iota; f; i \qquad \dagger\\ \text{normal parts} & \tilde{c} & ::= & j ; f; i_\bot \qquad \ddagger \\ \text{normal coercions} & \hat{c} & ::= & \tilde{c} \mid \iota; \iota; \mathsf{Fail}^\ell \end{array} \] \(\dagger\) The coercion \((\iota ;\iota; \iota)\) is not a wrapper coercion. <br>\(\ddagger\) The coercion \((\iota; \iota; \mathsf{Fail}^\ell)\) is not a normal part. <br></p> <h3>The Eager Gradually-Typed Lambda Calculus</h3> <p>Taking a step back, recall that we gave the <em>semantics</em> of the Lazy Gradually-Typed Lambda Calculus in terms of a denotational semantics, based on an evaluation function \(\mathcal{E}\). We can do the same for the Eager variant but using coercions to give the meaning of casts. The following is the definition of values and results for the Eager variant. \[ \begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : \overline{c} \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array} \] </p> <p>Most of the action in the \(\mathcal{E}\) function is in the \(\mathsf{cast}\) auxiliary function. We will give an alternative version of \(\mathsf{cast}\) for eager checking. To make \(\mathsf{cast}\) more succinct we make use of the following helper function regarding cast failure. \[ \mathsf{isfail}(c,\ell) \equiv (c = \mathsf{Fail}^\ell \text{ or } c = \mathsf{Fail}^\ell \circ (\tilde{c}_1 \to \tilde{c}_2) \text{ for some } \tilde{c}_1 \text{ and } \tilde{c}_2) \] Here's the updated definition of \(\mathsf{cast}\) for eager checking. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c}_2)= \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \overline{c}_3 \end{cases} \end{align*} </p> <p>We can now give the definition of \(\mathcal{E}\), making use of the above \(\mathsf{cast}\) function as well as a function \(\mathcal{C}\) for compiling casts to coercions. (Use \(\mathcal{C}_{\mathit{D}}\) or \(\mathcal{C}_{\mathit{UD}}\) for \(\mathcal{C}\) to obtain the D or UD blame tracking strategy.) \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X, \mathcal{C}(T_1 \Rightarrow^\ell T_2)) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad\; \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad\; \mathsf{apply}(X_1,X_2) \end{align*} </p> <p>The semantics for the Eager Gradually-Typed Lambda Calculus is defined by the following \(\mathit{eval}\) partial function. \[ \mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases} \] where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : \iota \circ (\hat{c}_1 \to \hat{c}_2) \circ \iota) &= \mathit{function} \\ \mathit{observe}(v : I! \circ \iota \circ \iota) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*} </p> <h3>An Eager Space-Efficient Machine</h3> <p>To obtain a space efficient machine for the Eager variant, we just plug the eager version of \(\mathsf{cast}\) into the lazy space-efficient machine. </p> <h3>An Eager Time-Efficient Machine</h3> <p>Recall that the lazy time-efficient machine used threesomes instead of coercions because we could define an efficient function for composing threesomes, whereas reducing coercions is a complex process. The natural thing to do here is to try and come up with an eager variant of threesomes and the composition function. The lazy threesomes were isomorphic to lazy coercions in normal form, and we already have the normal forms for eager coercions, so it should be straightforward to come up with eager threesomes. It is straightforward, but in this case nothing is gained; we just end up with a slightly different notation. The reason is that the normal forms for eager coercions are more complex. So we might as well stick with using the eager coercions. </p> <p>However, the essential lesson from the threesomes is that we don't need to implement reduction on coercions, instead we just need to define a composition function that takes coercions in normal form. After thinking about this for a long time, trying lots of variants, we've come up with the definition shown below. (Here we use \(\rhd\) for composition. I'd prefer to use the fatsemi latex symbol, but it seems that is not available in MathJax.) </p> <p>Composition of Normal Coercions: \( \hat{c} \rhd \hat{c}\)<br>\begin{align*} (j; f; i_\bot) \rhd (j'; f'; i'_\bot) &= \mathbf{case}\;i_\bot \rhd j'\;\mathbf{of}\\ & \qquad I! \Rightarrow j; f; (I! \rhd i'_\bot) \\ & \quad \mid I?^\ell \Rightarrow I?^\ell; f'; i'_\bot \\ & \quad \mid \mathsf{Fail}^\ell \Rightarrow j; f; \mathsf{Fail}^\ell \\ & \quad \mid c \Rightarrow \mathbf{case}\;(f \rhd c) \rhd f' \;\mathbf{of}\\ & \qquad\qquad\quad \mathsf{Fail}^\ell \Rightarrow j; \iota; \mathsf{Fail}^\ell\\ & \qquad\quad\quad \mid c' \Rightarrow j; c'; i'_\bot \end{align*} \begin{align*} \iota \rhd c &= c \\ c \rhd \iota &= c \\ I_1! \rhd I_2?^\ell &= \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_1 \to \tilde{c}_2) \rhd (\tilde{c}_3 \to \tilde{c}_4) &= (\tilde{c_3}\rhd \tilde{c_1}) \overset{\bullet}{\to} (\tilde{c}_2 \rhd \tilde{c}_4) \\ \mathsf{Fail}^\ell \rhd c &= \mathsf{Fail}^\ell \\ I! \rhd \mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \\ \\ \tilde{c}_1 \overset{\bullet}{\to} \tilde{c}_2 &= \tilde{c}_1 \to \tilde{c}_2 \\ \mathsf{Fail}^\ell \overset{\bullet}{\to}\hat{c}_2 &= \mathsf{Fail}^\ell \\ \tilde{c}_1 \overset{\bullet}{\to}\mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \end{align*} </p> <p>To obtain an eager, time-efficient machine, we just replace coercion reduction with coercion composition. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c}_2)= \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \overline{c}_3 \end{cases} \end{align*} </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-89607921316346233962012-10-08T22:25:00.001-07:002012-10-08T22:25:35.646-07:00Is TypeScript gradually typed? Part 2<p>Consider the following TypeScript program, in which a number is stored in variable <tt>x</tt> of type <tt>any</tt> and then passed to the <tt>display</tt> function that expects a <tt>string</tt>. As we saw in the previous post, a gradual type system allows the implicit down-cast from <tt>any</tt> to <tt>string</tt>, so this is a well-typed program. <pre><br />function display(y : string) {<br /> document.body.innerHTML = y.charAt(0);<br />}<br /><br />var x : any = 3;<br />display(x);<br /></pre> But what happens at run-time? The answer for TypeScript is that this program hits an error at the <tt>y.charAt(0)</tt> method call because <tt>charAt</tt> is not supported by numbers like <tt>3</tt>. But isn't <tt>y</tt> guaranteed to be a <tt>string</tt>? No, not in TypeScript. TypeScript does not guarantee that the run-time value in a variable is consistent with the static type of the variable. The reason for this is simple, TypeScript does not perform any run-time checks at down-casts to ensure that the incoming value is of the target type. In the above program, the call <tt>display(x)</tt> causes an implicit cast to <tt>string</tt>, but there's no run-time check to make sure that the value is in fact a string. TypeScript is implemented as a compiler to JavaScript, and the compiler simply ignores the implicit casts. Let's refer to this no-checking semantics as <b>level 1 gradual typing</b>. I briefly describe this approach in the paper <i>Gradual Typing for Objects</i>. </p> <h3>Level 2 Gradual Typing</h3> <p>A second alternative semantics for gradual typing is to perform run-time checking to ensure that values are consistent with the static types. For implicit casts concerning simple types like <tt>number</tt> and <tt>string</tt>, this run-time checking is straightforward. In the above example program, an error would be signaled just prior to the call to <tt>display</tt>, saying that <tt>display</tt> expects a <tt>string</tt>, not a <tt>number</tt>. </p> <p>For implicit casts concerning complex types, such as function and object types, run-time checking is more subtle. Consider the following program that defines a <tt>deriv</tt>function that takes another function as a parameter. <pre><br />function deriv(d:number, f:(number)=>number, x:number) {<br /> return (f(x + d) - f(x - d)) / (2.0 * d);<br />}<br /><br />function fun(y):any {<br /> if (y > 0)<br /> return Math.pow(y,3) - y - 1;<br /> else<br /> return "yikes";<br />}<br /><br />deriv(0.01, fun, 3.0);<br />deriv(0.01, fun, -3.0);<br /></pre> The function <tt>fun</tt> has type <tt>(any)=>any</tt>, and at each call to <tt>deriv</tt>, this function is implicitly cast to <tt>(number)=>number</tt>. The fundamental challenge in casting functions is that it's impossible to tell in general how a function will behave, and in particular, what the return value will be. Here we don't know whether <tt>fun</tt> will return a <tt>number</tt> or a <tt>string</tt>until we've actually called it. </p> <p>The standard way to deal with function casts is to delay the checking until subsequent calls. One way to visualize this semantics is to imagine the compiler generating the following wrapper function, <tt>casted_fun</tt>, that applies casts to the argument and return value. <pre><br />function casted_fun(z:number):number {<br /> return <number>fun(<any>z);<br />}<br /><br />deriv(0.01, casted_fun, 3.0);<br />deriv(0.01, casted_fun, -3.0);<br /></pre></p> <p>My first two papers on gradual typing, <i>Gradual Typing for Functional Languages</i>and <i>Gradual Typing for Objects</i>, both used level 2 gradual typing. </p> <h3>Level 3 Gradual Typing</h3> <p>The down-side of delayed checking of function casts is that when an error is finally caught, the location of the error can be far away from the cast that failed. In the above example, the error would occur during the call <tt>f(x + d)</tt>, not at the call to <tt>deriv</tt>. Findler and Felleisen solved this problem by introducing the notion of blame tracking in their paper <i>Contracts for higher-order functions</i>. The idea is to associate source location information with each cast and then to carry along this information at run-time, in the wrapper functions, so that when the cast in a wrapper fails, it can emit an error that mentions the source location of the original cast, in this example, the call to <tt>deriv</tt>. </p> <p>Implementing casts in a way that supports blame tracking while also keeping space overheads to a constant factor is challenging. My paper <i>Threesomes, With and Without Blame</i> shows how to do this. </p> <h3>Discussion</h3> <p>Each of the three levels comes with some advantages and disadvantages. Level 1 gradual typing is the easiest to implement, an important engineering concern, and it comes with no run-time overhead, as there is no run-time checking. On the other hand, level 1 gradual typing does not provide run-time support for catching broken invariants, such as <tt>deriv</tt>'s expectation that its arguments have type <tt>string</tt>. Thus, a TypeScript programmer that really wants to enforce such an invariant would need to add code to check the type of the argument, a common practice today in JavaScript. </p> <p>Level 2 gradual typing ensures that the value stored in a variable is consistent with the variable's static type and it provides the run-time checking to catch when this invariant is about to be broken. Thus, level 2 gradual typing removes the need for hand-written type tests. Also, level 2 gradual typing opens up the possibility of compiling statically-typed regions of a program in a more efficient, type-specific manner. (This is an active area of my current research.) The disadvantages of level 2 gradual typing are the run-time overhead from cast checking and the increased implementation complexity.</p> <p>Level 3 gradual typing improves on level 2 by adding blame tracking, thereby improving the diagnostic errors reported when a cast fails. The extra cost of blame tracking is not very significant, so I would always suggest level 3 over level 2. </p> </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com6tag:blogger.com,1999:blog-11162230.post-90850965798442131842012-10-04T10:25:00.000-07:002012-10-26T21:35:53.287-07:00Is TypeScript gradually typed? Part 1<p>If you haven't heard already, there's a new language named TypeScript from Microsoft, designed by Anders Hejlsberg and several others, including a recent alumni from my research group named Jonathan Turner. The TypeScript language extends JavaScript with features that are intended to help with large-scale programming such as optional static type checking, classes, interfaces, and modules. In this post I'll try to characterize what optional static typing means for TypeScript. There are a large number of possible design decisions regarding optional static typing, so the characterization is non-trivial. When discussing types, it's often easy to fixate on the static semantics, that is, how the type checker should behave, but we'll also need to look at the dynamic semantics of TypeScript in Part 2 of this post. The punch line will be that TypeScript is a gradually-typed language, but only to level 1. (I'll define levels that go from 1 to 3 and discuss their pros and cons.) </p> <h3>Static Semantics (Type System)</h3> <p>TypeScript has an <tt>any</tt> type. Variables and fields of this type can store any type of value. TypeScript has function types that describe the types of the parameters and return types of a function and the way in which the <tt>any</tt> type and function types interact is closely related to the design I wrote about in <i>Gradual Typing for Functional Languages</i>, SFP 2006. Further, TypeScript has object types to describe the types of fields and methods within an object. The way in which the <tt>any</tt> type and the object types behave in TypeScript is closely related to the system I described in <i>Gradual Typing for Objects</i>, ECOOP 2007. </p> <p>The basic feature of <tt>any</tt> is that you can implicitly convert from any type to <tt>any</tt> and you can implicitly convert from <tt>any</tt> to any other type. For example, the following is a well-typed program in TypeScript that demonstrates converting from type <tt>string</tt> to type <tt>any</tt>and back to <tt>string</tt>. <pre><br />var answer : string = "42";<br />var a : any = answer;<br />var the_answer : string = a;<br />document.body.innerHTML = the_answer;<br /></pre> On the other hand, a gradual type system acts like a static type system when the <tt>any</tt> type is not involved. For example, the following program tries to implicitly convert from a <tt>number</tt>to a <tt>string</tt>, so the type system rejects this program. <pre><br />var answer : number = 42;<br />var the_answer : string = answer;<br /></pre> </p> <p>Next, let's look at how <tt>any</tt> and function types interact. TypeScript uses structural typing for function types, which means that whether you can convert from one function type to another depends on the parts of the function type. The parts are the parameter and the return types. Consider the following example, in which a function of type <tt>(string)=>string</tt> is implicitly converted to <tt>(any)=>string</tt> and then to <tt>(any)=>any</tt>. <pre><br />function f(x:string):string { return x; }<br />var g : (any)=>string = f;<br />var h : any = g;<br />document.body.innerHTML = h("42");<br /></pre> The first conversion is interesting because, if <tt>g</tt> is called with an argument of type <tt>any</tt>, then the argument needs to be implicitly converted to the <tt>string</tt> that <tt>f</tt> expects. This is an implicit down-cast, and doesn't follow the contra-variance rule for functions that one sees in the subtyping rules for object-oriented languages. Indeed, in a gradually typed system, assignment compatibility is co-variant in the parameter type of a function, at least, with respect to the <tt>any</tt> type. The second conversion, from <tt>(any)=>string</tt> to <tt>any</tt> is not so surprising, it's just up-casting from <tt>(any)=>string</tt> to <tt>any</tt>. Interestingly, there is a third implicit conversion in this program. Can you see it? It's in the call to <tt>h</tt>. The fact that we're calling <tt>h</tt> implies that <tt>h</tt> needs to be a function (or something callable), so there's essentially an implicit conversion here from <tt>any</tt> to <tt>(string)=>any</tt>. </p> <p>Next let's look at implicit conversions involving object types. Like function types, object types are also structural. Consider the following well-typed program in TypeScript, in which an object of type <tt>{x:number; y:any}</tt>is implicitly converted to <tt>{x:any; y:string}</tt>, then <tt>{x:number}</tt>, and finally to <tt>any</tt>. <pre><br />var o : {x:number; y:any;} = {x:1, y:"42"};<br />var p : {x:any; y:string;} = o;<br />var q : {x:number;} = p;<br />var r : any = p;<br />document.body.innerHTML = r.y;<br /></pre> The assignment of <tt>o</tt> to <tt>p</tt> shows structural changes within an object type, both to and from <tt>any</tt>. The next conversion, to <tt>{x:number}</tt>, shows that the type system allows implicit narrowing of object types. Thus, the rules governing implicit conversion are quite close to the <i>consistent-subtyping</i>relation described in <i>Gradual Typing for Objects</i>. This relation combines the <i>consistency</i> relation that governs the static behavior of <tt>any</tt>(sometimes called compatibility) with the traditional subtyping relation of structural type systems that allows the implicit narrowing of object types. Getting back to the above example, similar to the function call at type <tt>any</tt>, TypeScript allows member access on things of type <tt>any</tt>. </p> <p>The next example is not well-typed in TypeScript. <pre><br />var o : {x:number; y:any; } = {x:1, y:"42"};<br />var q : {x:number;} = o;<br />var r : {x:number; y:any;} = q;<br />document.body.innerHTML = r.y;<br /></pre>The <tt>tsc</tt> compiler complains that <pre><br />example.ts(3,29): Cannot convert '{ x: number; }' <br /> to '{ x: number; y: any; }':<br />Type '{ x: number; }' is missing property 'y'<br /> from type '{ x: number; y: any; }'<br /></pre>which shows the TypeScript doesn't allow implicit widening (again in line with the consistent-subtyping relation). </p> <p>To wrap up the discussion of the static semantics, let's take a look at the interaction between function types (arrows) and object types. To quote John Reynolds by way of Olivier Danvy, "As usual, something funny happens at the left of the arrow". I'm curious to see whether object narrowing is contra-variant in the parameters of function types, which is what I'd expect based on traditional subtyping relations and based on wanting a consistent design with respect to not allowing implicit widening. Consider the following example. <pre><br />function f(o: {x:number;}):string { return "42"; };<br />var g : (o: {x:number; y:number;})=>string = f;<br />var h : (o: {x:number;})=>string = g;<br />document.body.innerHTML = h({x:1,y:2});<br /></pre>The conversion from <tt>f</tt> to <tt>g</tt> should be OK, because it only requires an argument of type <tt>{x:number; y:number;}</tt>to be up-cast (narrowed) to <tt>{x:number;}</tt>. However, the conversion from <tt>g</tt> to <tt>h</tt>should not be OK because it requires an argument of type <tt>{x:number;}</tt> to be implicitly down-cast (widened) to <tt>{x:number; y:number;}</tt>. Surprisingly, the <tt>tsc</tt> compiler does not give a type error for the above example! So what I said above about TypeScript disallowing implicit widening is not quite true. In many cases it disallows widening, but here we see an exception to the rule. I don't like exceptions in language design because they increase the complexity of the language. So on this one point, TypeScript differs from the design in <i>Gradual Typing for Objects</i>. Perhaps Jonathan can comment on whether this difference was intentional or accidental. </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com7tag:blogger.com,1999:blog-11162230.post-8670564545025985052012-09-20T10:53:00.000-07:002012-09-20T10:53:03.364-07:00Interpretations of the GTLC, Part 4: Even Faster<p>Consider the following statically-typed function. (The type \(\star\) does not occur anywhere in this function.) \[ \lambda f: \mathsf{Int}{\to}\mathsf{Int}. \; x{:}\mathsf{Int} = f(42); \mathbf{return}\,x \] We'd like the execution speed of this function to be the same as if the entire language were statically typed. That is, we don't want statically-typed parts of a gradually-typed program to pay overhead because other parts may be dynamically typed. Unfortunately, in the abstract machines that we've defined so far, there is an overhead. At the point of a function call, such as \(f(42)\) above, the machine needs to check whether \(f\) has evaluated to a closure or to a closure wrapped in a threesome. This act of checking constitutes some run-time overhead. </p> <p>Taking a step back, there are two approaches that one sees in the literature regarding how a cast is applied to a function. One approach is to build a new function that casts the argument, applies the old function, and then casts the result. The reduction rule looks like this: \[ v : T_1 \to T_2 \Rightarrow T_3 \to T_4 \longrightarrow \lambda x{:}T_3. (v\,(x : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that there's only one kind of value of function type, functions! So when it comes to function application, we only need one reduction rule, good old beta: \[ (\lambda x{:}T.\,e)\, v \longrightarrow [x{:=}v]e \] The other approach is to leave the cast around the function and then add a second reduction rule for applications. \[ (v_1 : T_1 \to T_2 \Rightarrow T_3 \to T_4) \, v_2 \longrightarrow (v_1\, (v_2 : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4 \] The nice thing about this approach is that the cast around the function is easy to access and change, which we took advantage of to compress sequences of such casts. But as we've already pointed out, having two kinds of values at function type induces some run-time overhead, even in parts of the program that are statically typed. </p> <p>Our solution to this conundrum is to use a hybrid representation and to take advantage of the indirection that is already present in a function call. Instead of having two kinds of values at function type, we have only one: a closure that includes an optional threesome: \[ \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \] When a closure is first created, there is no threesome. Later, when a closure is cast, the threesome is added. \[ V(\lambda x{:}T.\, s,\rho) = \langle \lambda x{:}T.\, s, \rho, \bot \rangle \] The one transition rule for function application passes the optional threesome as a special parameter, here named \(c\), to the function. In the case of an un-casted closure, the function ignores the \(c\) parameter. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \end{align*} </p> <p>When an un-casted closure is cast, we build a wrapper function, similar to the first approach discussed above, but using the special variable \(c\) to refer to the threesome instead of hard-coding the cast into the wrapper function. We add \(\mathit{dom}\) and \(\mathit{cod}\) operations for accessing the parts of a function threesome. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s', \rho', \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ & \text{and } \rho' = \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \} \end{align*} When a closure is cast for the second time, the casts are combined to save space. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*} </p> <p>That's it. We now have a machine that doesn't perform extra dispatching at function calls. There is still a tiny bit of overhead in the form of passing the \(c\) argument. This overhead can be removed by passing the entire closure to itself (instead of passing the array of free variables and the threesome separately), and from inside the function, access the threesome from the closure. </p> <p>In the following I give the complete definitions for the new abstraction machine. In addition to \(\mathit{dom}\) and \(\mathit{cod}\), we add a tail call without a cast to avoid overhead when there is no cast. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \mid \mathit{dom}(e) \mid \mathit{cod}(e) \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) \mid \mathbf{return}\,e(e) : \tau \\ \text{optional threesomes} & \tau_\bot & ::= & \bot \mid \tau \\ \text{values}& v & ::= & k \mid k : \tau \mid \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \end{array} \] Here's the complete definition of the cast function. \begin{align*} \mathsf{cast}(v, \bot) &= v \\ \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s' , \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \}, \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*} Here are the updated evaluation rules. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho, \bot \rangle \\ V(\mathit{dom}(e),\rho) &= \tau_1 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \\ V(\mathit{cod}(e),\rho) &= \tau_2 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \end{align*} Lastly, here are the transition rules for the machine. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau, (x,s,\rho')::\kappa)) & \longmapsto (s, \rho'[x{:=}v'], \kappa) \\ \text{where }& V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2), \rho,\kappa) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot],\kappa) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho,(\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot], ((\tau_1; \tau_2), \sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\[2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*} </p> <p>I like how there are fewer rules and the rules are somewhat simpler compared to the previous machine. There is one last bit of overhead in statically typed code: in a normal return we have to apply the pending threesome that's on the stack. If one doesn't care about making tail-calls space efficient in the presence of casts, then this wouldn't be necessary. But I care. </p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-37795884960807843822012-09-19T15:40:00.000-07:002012-09-20T09:34:59.489-07:00Interpretations of the GTLC: Part 3, Going Faster<p>The intuition for an efficient coercion composition function came from thinking about types, not coercions. We'll start with the UD blame tracking strategy and then later consider D. Also, for now we'll stick with lazy cast checking. </p> <p>Consider the following sequence of casts: \[ e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} \cdots \Rightarrow^{\ell_{n-1}} T_n \] We'd like some way to summarize the sequence of types without loosing any important information. That is, we'd like to come up with something that can catch the same cast errors as the entire sequence, blaming the appropriate label, but using less space. Imagine the \(n\) types as a line of differently colored trees on the side of a road. If you're next to the road, staring down the line of trees, you see what looks like one tree with branches of many colors. Some of the branches from further-away trees are hidden from view by closer trees, but some are visible. Now, suppose we wanted to maintain the same view from your standpoint, but save on water. We could replace the line of trees with a single multi-colored tree that includes all the branches visible to you. The figure below depicts three differently-colored trees getting merged into a single multi-color tree. The nodes without color should be considered transparent. <br> <div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-TWjdKe_mcZA/UFqGjRsWgWI/AAAAAAAAAD4/tDwj1hJzZy0/s1600/tree-merge-pair.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="121" width="320" src="http://4.bp.blogspot.com/-TWjdKe_mcZA/UFqGjRsWgWI/AAAAAAAAAD4/tDwj1hJzZy0/s320/tree-merge-pair.png" /></a></div> </p> <p>Mapping this idea back to types, the colors are blame labels and transparent nodes are the type \(\star\). Because we need to color individual branches, we need blame labels on every internal node of a type. In particular, we need the notion of a labeled type: \[ \begin{array}{lrcl} \text{optional labels} & p,q & ::= & \epsilon \mid \ell \\ \text{labeled types} & P,Q & ::= & B^p \mid P \to^p Q \mid \star \mid (I^p; \bot^\ell) \end{array} \] (These labeled types are for the UD strategy. We'll discuss the labeled types for D later. Also, the labeled type \(I^p; \bot^\ell\) deserves some explanation, which we'll get to soon.) The \(\mathit{label}\) function returns the top-most label of a labeled type: \begin{align*} \mathit{label}(B^p) &= p \\ \mathit{label}(P \to^p Q) &= p \\ \mathit{label}(\star) &= \epsilon \\ \mathit{label}(I^p; \bot^\ell) &= p \end{align*} </p> <p>We'll define a function for composing two labeled types \(P\) and \(Q\) to produce a new labeled type \(P'\), using semicolon as the syntax for this composition function: \[ P ; Q = P' \] We replace each cast with a threesome, that is, a cast annotated with a labeled type. The labeled type is computed by a simple function \(\mathcal{L}\) that we define below. \begin{align*} & e : T_1 \Rightarrow^\ell T_2 \text{ becomes } e : T_1 \overset{P}{\Longrightarrow} T_2 \\ & \text{ where } P = \mathcal{L}(T_1 \Rightarrow^\ell T_2) \end{align*} \begin{align*} \mathcal{L}(B \Rightarrow^\ell B) &= B \\ \mathcal{L}(\star \Rightarrow^\ell \star) &= \star \\ \mathcal{L}(B \Rightarrow^\ell \star) &= B \\ \mathcal{L}(\star \Rightarrow^\ell B) &= B^\ell \\ \mathcal{L}(T_1 \Rightarrow^\ell T_2) &= I ; \bot^\ell \qquad \text{where } I \sim T_1 \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell T_4) \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell \star) &= \mathcal{L}(\star \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell \star) \\ \mathcal{L}(\star \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell \star) \to^\ell \mathcal{L}(\star \Rightarrow^\ell T_4) \end{align*} A sequence of threesomes is compressed to a single threesome using the composition function: \begin{gather*} e : T_1 \overset{P_1}{\Longrightarrow} T_2 \overset{P_2}{\Longrightarrow} \cdots \overset{P_{n-1}}{\Longrightarrow} T_n \\ \text{becomes} \\ e : T_1 \overset{P}{\Longrightarrow} T_n \\ \text{where } P = P_1; P_2; \cdots; P_{n-1} \end{gather*} </p> <p>Before we go into the details of the composition function, it helps to see how (well-formed) threesomes correspond to coercions in normal form. With this correspondence in place, we can use the coercion reduction rules to help guide the definition of threesome composition. The function \(\mathit{TC}\) defined below maps threesomes to coercions in normal form. This function is an isomorphism, so it's inverse maps normal coercions back to threesomes. \begin{align*} \mathit{TC}(B \overset{B}{\Longrightarrow} B) &= \iota_B \\ \mathit{TC}(\star \overset{\star}{\Longrightarrow}\star) &=\iota_\star \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} B) &= B?^\ell \\ \mathit{TC}(B\overset{B}{\Longrightarrow} \star) &= B! \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} \star) &= B?^\ell; B! \\ \mathit{TC}(T_1 \overset{I; \bot^\ell}{\Longrightarrow} T_2) &= \mathsf{Fail}^\ell \\ \mathit{TC}(T_1 \overset{I^{\ell_1}; \bot^{\ell_2}}{\Longrightarrow} T_2) &= I?^{\ell_1} ; \mathsf{Fail}^{\ell_2} \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} T_3 \to T_4)&= \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} T_3 \to T_4)&= (\star \to \star)?^\ell ; \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} \star)&= \mathit{TC}(\star \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} \star)&= (\star \to \star)?^\ell ; \mathit{TC}(\star \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \end{align*} </p> <p>We're ready to make precise how two labeled types can be composed to form a single labeled type. The two cases in which one of the labeled types is \(\star\) are easy: just return the other type: \begin{align*} \star; Q &= Q \\ P; \star &= P \end{align*} Next, suppose we have \(\mathsf{Int}^{\ell_1}\) followed by \(\mathsf{Int}^{\ell_2}\). These should compose to \(\mathsf{Int}^{\ell_1}\) because if the first cast succeeds, so will the second, making the blame label \(\ell_2\) redundant. In general, for labeled basic types we have the following rule. \begin{equation} \label{eq:1} B^p; B^q = B^p \end{equation} Suppose instead that the basic types don't match. The right-hand side of the following rule is a bit tricky, so let's think about this in terms of coercions. \begin{equation} \label{eq:2} B_1^p ; B_2^q = B_1^p ; \bot^q \qquad \text{if } B_1 \neq B_2 \end{equation} Suppose \(p=\ell_1, q = \ell_2\) and these two labeled types come from the threesomes \[ \star \overset{B_1^{\ell_1}}{\Longrightarrow} \star \overset{B_2^{\ell_2}}{\Longrightarrow} B_2 \] The corresponding coercion sequence is \[ B_1?^{\ell_1} ; B_1! ; B_2?^{\ell_2} \] which reduces to \[ B_1?^{\ell_1} ; \mathsf{Fail}^{\ell_2} \] and that corresponds to the labeled type for errors, \(B_1^{\ell_1}; \bot^{\ell_2}\). We also need to consider a mismatch between basic types and function types: \begin{align} B^p; (P \to^q Q) &= B^q; \bot^q \\ (P \to^p Q); B^q &= (\star \to \star)^p; \bot^q \end{align} The rule for labeled function types takes the label \(p\) for the label in the result and it recursively composes the domain and codomain types. The contra-variance in the parameter type is important for getting the right blame and coincides with the contra-variance in the reduction rule for composing function coercions. \begin{equation} \label{eq:4} (P_1 \to^p P_2) ; (Q_1 \to^q Q_2) = (Q_1; P_1) \to^p (P_2; Q_2) \end{equation} The following figure shows an example similar to the previous figure, but with function types instead of pair types. The analogy with real trees and line-of-sight breaks down because you have to flip to looking at the trees from back-to-front instead of front-to-back for negative positions within the type. <div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/-_ZDxfJaY95w/UFqG8dofbuI/AAAAAAAAAEE/Q9tIW9xhmns/s1600/tree-merge.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="121" width="320" src="http://1.bp.blogspot.com/-_ZDxfJaY95w/UFqG8dofbuI/AAAAAAAAAEE/Q9tIW9xhmns/s320/tree-merge.png" /></a></div></p> <p>Lastly we need several rules to handle when the error type is on the left or right. \begin{align} (I^p; \bot^\ell); Q &= (I^p; \bot^\ell) \\ P; (I^q; \bot^\ell) &= I^p ; \bot^\ell \qquad \text{if } I \sim P \text{ and } \mathit{label}(P) = p \\ P; (I^q; \bot^\ell) &= I^p ; \bot^q \qquad \text{if } I \not\sim P \text{ and } \mathit{label}(P) = p \end{align} </p> <p>What's extra cool about labeled types and their composition function is that each rule covers many different rules if you were to formulate them in terms of coercions. For example, the single rule \( B^p; B^q = B^p\) covers four situations when viewed as threesomes or coercions: \begin{align*} (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= B \overset{B}{\Longrightarrow} B \\ \iota_B; \iota_B &\longrightarrow \iota_B \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= \star \overset{B^\ell}{\Longrightarrow} B \\ B?^\ell ; \iota_B &\longrightarrow B?^\ell \\ (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= B \overset{B}{\Longrightarrow} \star \\ \iota_B; B! &\longrightarrow B! \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= \star \overset{B^\ell}{\Longrightarrow} \star \\ B?^\ell ; B! & \text{ is already in normal form} \end{align*} </p> <p>We define a threesomes as a source type, middle labeled typed, and a target type. \[ \begin{array}{llcl} \text{threesomes} & \tau & ::= & T \overset{P}{\Longrightarrow} T \\ \end{array} \] We define the sequencing of threesomes as follows \[ (T_1 \overset{P}{\Longrightarrow} T_2); (T_2 \overset{Q}{\Longrightarrow}T_3) = T_1 \overset{P;Q}{\Longrightarrow} T_3 \] Similarly, we define the notation \(\tau_1 \to \tau_2\) as \[ (T_3 \overset{P}{\Longrightarrow} T_1) \to (T_2 \overset{Q}{\Longrightarrow} T_4) = T_1\to T_2 \overset{P\to Q}{\Longrightarrow} T_3 \to T_4 \] </p> <p>We can now go back to the ECD machine and replace the coercions with threesomes. Here's the syntax in A-normal form. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x : T = e(e) \mid x = e : \tau \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \tau \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \tau \end{array} \] The cast function, of course, needs to change. \begin{align*} \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = (B \overset{B}{\Longrightarrow} B) \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle, \tau) &= \langle \lambda x{:}T.s,\rho \rangle : \tau \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle : \tau_1, \tau_2) &= \begin{cases} \langle \lambda x{:}T.s,\rho \rangle : (\tau_1;\tau_2)& \text{if } \mathit{middle}(\tau_1;\tau_2) \neq (I^p;\bot^\ell) \\ \mathbf{blame}\,\ell & \text{if } \mathit{middle}(\tau_1;\tau_2) = (I^p;\bot^\ell) \end{cases} \end{align*} And last but not least, here's the transitions for the ECD machine, but with threesomes instead of coercions. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (T_1 \overset{T_1}{\Longrightarrow}T_1,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (\tau_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : \tau_1 \to \tau_2, \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2, \tau_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], ((\tau_1; \tau_2),\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\tau_5,\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_1. s, \rho' \rangle : \tau_3 \to \tau_4,\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, \tau_3) = v'_2, \text{ and} \\ & (\tau_4; \tau_1; \tau_2) = \tau_5 \\[2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*} </p> <p>We now have an implementation of Lazy UD that is space efficient and relatively efficient in time as well. However, there is one nagging issue regarding the speed of statically-typed code. Notice how there are two transition rules for each kind of function call. The source of the problem is that there are two kinds of values that have function type, closures and closures wrapped in a threesome. In the next post I'll define a unified representation for closures and wrapped closures so that we don't need to dispatch at runtime. </p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-59574990847947974532012-09-18T11:48:00.001-07:002012-09-20T08:58:57.735-07:00Interpretations of the GTLC, Part 2: Space-Efficient Machines<p>I briefly mentioned in my previous post that there are implementation challenges regarding gradual typing. One of those challenges regards space efficiency. In their paper <i>Space-Efficient Gradual Typing</i>, Herman, Tomb, and Flanagan observed two circumstances in which function casts can lead to unbounded space consumption. First, some programs repeatedly apply casts to the same function, resulting in a build-up of casts around the function. In the following example, each time the function bound to k is passed between even and odd a cast is added, causing a space leak proportional to n. <pre><br />let rec even(n : Int, k : Dyn->Bool) : Bool =<br /> if (n = 0) then k(True : Bool => *)<br /> else odd(n - 1, k : *->Bool => Bool->Bool)<br />and odd(n : Int, k : Bool->Bool) : Bool =<br /> if (n = 0) then k(False)<br /> else even(n - 1, k : Bool->Bool => *->Bool)<br /></pre></p> <p>Second, some casts break proper tail recursion. Consider the following example in which the return type of even is \(\star\) and odd is \(\mathsf{Bool}\). <pre><br />let rec even(n : Int) : Dyn =<br /> if (n = 0) then True else odd(n - 1) : Bool => *<br />and odd(n : Int) : Bool =<br /> if (n = 0) then False else even(n - 1) : * => Bool<br /></pre>Assuming tail call optimization, cast-free versions of the even and odd functions require only constant space, but because the call to even is no longer a tail call, the run-time stack grows with each call and space consumption is proportional to n. </p> <p>Herman et al.'s solution to this space-efficiency problem relies on Henglein's Coercion Calculus. This calculus defines a set of combinators, called <i>coercions</i>, that can be used to express casts. The key advantage of coercions is that an arbitrarily long sequence of coercions reduces to a sequence of at most three coercions. The following defines the syntax and typing rules for coercions. The two most important coercions are the injection coercion \(I!\) from \(I\) to \(\star\) and the projection coercion \(I?^\ell\) from \(\star\) to \(I\). The definition of the injectable types \(I\) depends on the blame tracking strategy: \begin{align*} I & ::= B \mid \star \to \star & (UD) \\ I & ::= B \mid T \to T & (D) \end{align*} \begin{gather*} \begin{array}{llcl} \text{coercions} & c & ::= & \iota_T \mid I! \mid I?^\ell \mid c\to c \mid c;c \mid \mathsf{Fail}^\ell \end{array} \\[2ex] \frac{}{\vdash \iota_T : T \Rightarrow T} \qquad \frac{}{\vdash T! : T \Rightarrow \star} \qquad \frac{}{\vdash T?^\ell : \star \Rightarrow T} \\[2ex] \frac{\vdash c_1 : T_{21} \Rightarrow T_{11} \quad \vdash c_2 : T_{12} \Rightarrow T_{22}} {\vdash c_1 \to c_2 : T_{11} \to T_{12} \Rightarrow T_{21} \to T_{22}} \\[2ex] \frac{\vdash c_1 : T_1 \Rightarrow T_2 \quad \vdash c_2 : T_2 \Rightarrow T_3} {\vdash c_1 ; c_2 : T_1 \Rightarrow T_3} \qquad \frac{}{\vdash \mathsf{Fail}^\ell : T_1 \Rightarrow T_2} \end{gather*} We sometimes drop the subscript on the \(\iota_T\) when the type \(T\) doesn't matter. </p> <p>The way in which casts are compiled to coercions depends on whether you're using the D or UD blame tracking strategy. Here's the compilation function for D. \begin{align*} \mathcal{C}_D(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_D(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_D(\star \Rightarrow^\ell I) &= I?^\ell \\ \mathcal{C}_D(I \Rightarrow^\ell \star) &= I! \\ \mathcal{C}_D(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_D(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} The compilation function for UD is a bit more complicated because the definition of injectable type is more restrictive. \begin{align*} \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell B) &= B?^\ell \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_3 \to T_4) &= (\star \to \star)?^\ell; (c_1 \to c_2) \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell \star) \\ \text{and } & c_2 = \mathcal{C}_D(\star \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell \star) &= B! \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell \star) &= (c_1 \to c_2) ; (\star \to \star)! \\ \text{where } & c_1 = \mathcal{C}_D(\star \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell \star) \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} </p> <p>We identify coercion sequencing up to associativity and identity: \begin{align*} c_1 ; (c_2; c_3) &= (c_1 ; c_2); c_3 \\ (c ; \iota) &= (\iota ; c) = c \end{align*} The following are the reduction rules for coercions. The compilation function \(\mathcal{C}\) depends on the choice of blame tracking strategy (D or UD). \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (c_{11} \to c_{12}); (c_{21} \to c_{22}) & \longrightarrow (c_{21};c_{11}) \to (c_{12}; c_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ \end{align*} </p> <p>The last reduction rule refers to \(\overline{c}\), a subset of the coercions that we refer to as wrapper coercions. We define wrapper coercions and coercions in normal form \(\hat{c}\) as follows. \[ \begin{array}{llcl} \text{optional injections} & \mathit{inj} & ::= & \iota \mid I! \\ \text{optional function coercions} & \mathit{fun} & ::= & \iota \mid \hat{c} \to \hat{c} \\ \text{optional projections} & \mathit{proj} & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \mathit{fun}; \mathit{inj} \\ \text{normal coercions} & \hat{c} & ::= & \mathit{proj} ; \mathit{fun}; \mathit{inj} \mid \mathit{proj}; \mathsf{Fail}^\ell \end{array} \] Here we can easily see that coercions in normal form can always be represented by a coercion with a length of at most three. </p> <p><b>Theorem</b> (Strong Normalization for Coercions) <i>For any coercion \(c\), there exists a \(\hat{c}\) such that \(c \longrightarrow^{*} \hat{c}\). </i></p> <p>With the Coercion Calculus in hand, we can define a space-efficient abstract machine. This machine is a variant of my favorite abstract machine for the lambda calculus, the ECD machine on terms in A-normal form. Herman et al. define an efficient reduction semantics that relies on mutually-recursive evaluation contexts to enable the simplification of coercions in tail position. Our choice of the ECD machine allows us to deal with coercions in tail position in a more straightforward way. Here's the syntax for our coercion-based calculus in A-normal form. The two main additions are the cast definition and the tail-call statement. \[ \begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x:T = e(e) \mid x = e : \hat{c}\\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \hat{c} \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \overline{c} \end{array} \] </p> <p>The evaluation function \(V\) maps expressions and environments to values in the usual way. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho \rangle \end{align*} The following auxiliary function applies a coercion to a value. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota_B \\ \mathbf{blame}\,\ell & \text{if } \hat{c} = \mathsf{Fail}^\ell \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c_2}) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c_2})= \iota_B \\ \mathbf{blame}\,\ell & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \mathsf{Fail}^\ell \\ \tilde{v} : \overline{c_3} & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \overline{c_3} \text{ and } \overline{c_3} \neq \mathsf{Fail}^\ell \end{cases} \end{align*} </p> <p>The machine state has the form \((s,\rho,\kappa)\), where the stack \(\kappa\) is essentially a list of frames. The frames are somewhat unusual in that they include a pending coercion. We also need a pending coercion for the empty stack, so we define stacks as follows. \[ \begin{array}{llcl} & \sigma & ::= & [] \mid (x,s,\rho)::\kappa \\ \text{stacks} & \kappa & ::= & (c,\sigma) \end{array} \] The machine has seven transition rules to handle normal execution and two rules to handle cast errors. \begin{align*} (x:T = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (\iota_T,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (c_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : (c_1 \to c_2), \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2,c_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : c; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,e, \rho, (c,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], (\hat{c_3},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2\\ & (c_1; c_2) \longrightarrow^{*} \hat{c_3} \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\hat{c_5},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle : (c_3 \to c_4),\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, c_3) = v'_2, \text{ and} \\ & (c_4; c_1; c_2) \longrightarrow^{*} \hat{c_5} \\[2ex] (x = e : c; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (c,\sigma)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \end{align*} </p> <p>The space-efficiency of this machine comes from two places. First, the values only ever include a single cast wrapped around a simple value. The cast function maintains this invariant by normalizing whenever a cast is applied to an already-casted value. The second place is the transition rule for tail calls. The coercion in tail position \(c_1\) is sequenced with the pending coercion \(c_2\) and normalized to \(c_3\), which becomes the new pending coercion. </p> <p>While this machine is space efficient, it is not efficient with respect to time. The reason is that naive coercion reduction is an expensive process: one must search for a redex, reduce it, and plug the contractum back in. Furthermore, the associativity of coercions makes searching for a redex more challenging. In the next post I'll discuss a recursive function that directly maps a sequence of two coercions in normal form to its normal form, using ideas from the paper <i>Threesomes, With and Without Blame</i> I coauthored with Philip Wadler. </p> Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0tag:blogger.com,1999:blog-11162230.post-13362786728811779302012-09-17T21:55:00.000-07:002012-09-18T08:46:18.670-07:00Interpretations of the Gradually-Typed Lambda Calculus, Part 1<p>I just got back from Copenhagen, where I gave a tutorial on gradual typing at the Workshop on Scheme and Functional Programming. I very much enjoyed the workshop and giving the tutorial. Thank you for the invitation Olivier! </p> <p>For those of you who couldn't be there, this series of blog posts will include the material from my tutorial. For those of you who were there, this series will include some bonus material: an efficient machine for "Eager D" based on recent work by Ronald Garcia and myself. </p> <p>When I first began working on gradual typing in 2005 and 2006, my focus was on the type system. The main pieces of the type system fell into place that first year, and ever since then I've been thinking about the dynamic semantics. It turns out there are many design choices and implementation challenges regarding the dynamic semantics. In this post I'll restrict my attention to the gradually-typed lambda calculus, as many issues already arise in that setting. I'll quickly review the syntax and type system, then move on to discuss the dynamic semantics. </p> <p>The following defines the syntax for the gradually-typed lambda calculus. Here I'm writing the dynamic type as <script type="math/tex">\star</script>. Also, note that a lambda without a type annotation on its parameter is shorthand for a lambda whose parameter is annotated with <script type="math/tex">\star</script>. <script type="math/tex; mode=display">\begin{array}{lrcl} \text{basic types} & B & ::= & \mathsf{Int} \mid \mathsf{Bool} \\ \text{types} & T & ::= & B \mid T \to T \mid \star \\ \text{variables} & x,y,z \\ \text{integers} & n \\ \text{constants}& k & ::= & n \mid \mathsf{true} \mid \mathsf{false} \\ \text{operators}&\mathit{op} & ::= & \mathsf{inc} \mid \mathsf{dec} \mid \mathsf{zero?} \\ \text{expressions} & e & ::= & k \mid \mathit{op}(e) \mid x \mid \lambda x{:}T.\, e \mid e\,e \\ & & & \lambda x.\, e \equiv \lambda x{:}\star.\, e \end{array} </script></p> <h3>The Gradual Type System and Cast Insertion</h3> <p>The type system of the gradually-typed lambda calculus is quite similar to the simply-typed lambda calculus. The only differences are in the rules for application. Instead of requiring the argument's type to be identical to the function's parameter type, we only require that the types be <i>consistent</i>, written <script type="math/tex">\sim</script>and defined below. We also allow the application of expressions of type \(\star\). <script type="math/tex; mode=display">\begin{gather*} \frac{\mathrm{typeof}(k) = B}{\Gamma \vdash k : B} \quad \frac{\mathrm{typeof}(\mathit{op}) = B_1 \to B_2 \quad \Gamma \vdash e : T \quad B_1 \sim T}{ \Gamma \vdash \mathit{op}(e) : B_2} \\[2ex] \frac{x:T \in \Gamma}{\Gamma \vdash x : T} \quad \frac{\Gamma,x:T_1 \vdash e : T_2}{\Gamma \vdash \lambda x:T_1.e : T_1 \to T_2} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 : T_1 \to T_3 \\ \Gamma \vdash e_2 : T_2 \end{array} \quad T_1 \sim T_2}{\Gamma \vdash e_1\,e_2 : T_3} \quad \frac{\begin{array}{l}\Gamma \vdash e_1 : \star \\ \Gamma \vdash e_2 : T_2 \end{array}}{\Gamma \vdash e_1\,e_2 : \star} \\[4ex] B \sim B \qquad \star \sim T \qquad T \sim \star \qquad \frac{T_1 \sim T_3 \quad T_2 \sim T_4}{T_1 \to T_2 \sim T_3 \to T_4} \end{gather*} </script> The lack of contra-variance in how function parameters are handled in the consistency relation not a mistake. Unlike subtyping, the consistency relation is <i>symmetric</i>, so it wouldn't matter if we wrote <script type="math/tex">T_3 \sim T_1</script> instead of <script type="math/tex">T_1 \sim T_3</script>. Also, consistency is not transitive, which is why we don't use a separate subsumption rule, but instead use consistency in the rules for application. </p> <p>The dynamic semantics of the gradually-typed lambda calculus is not defined in terms of the surface syntax, but instead it is defined on an intermediate language that extends the simply-typed lambda calculus with explicit casts. <script type="math/tex; mode=display">\begin{array}{lrcl} \text{blame labels} & \ell \\ \text{expressions} & e &::=& \ldots \mid e : T \Rightarrow^\ell T \end{array} </script>We use a non-standard notation for casts so that they are easier to read, so that they go left to right. The casts are annotated with blame labels, which we treat here as symbols, but in a real implementation would include the static location (line and character position) of the cast. We use a single blame label without any notion of polarity because these casts are really just casts, not contracts between two parties. </p> <p>Cast insertion is a type-directed translation, so it is the same as the type system with the addition of an output term. <script type="math/tex; mode=display">\begin{gather*} \frac{\mathrm{typeof}(k) = B}{\Gamma \vdash k \leadsto k : B} \quad \frac{\begin{array}{l}\mathrm{typeof}(\mathit{op}) = B_1 \to B_2 \\ \Gamma \vdash e \leadsto e' : T \end{array} \quad B_1 \sim T}{ \Gamma \vdash \mathit{op}(e) \leadsto \mathit{op}(e' : T \Rightarrow^\ell B_1): B_2} \\[2ex] \frac{x:T \in \Gamma}{\Gamma \vdash x \leadsto x : T} \quad \frac{\Gamma,x:T_1 \vdash e \leadsto e' : T_2} {\Gamma \vdash (\lambda x:T_1.e) \leadsto (\lambda x:T_1.e') : T_1 \to T_2} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 \leadsto e'_1 : T_1 \to T_3 \\ \Gamma \vdash e_2 \leadsto e'_2 : T_2 \end{array} \quad T_1 \sim T_2}{\Gamma \vdash e_1\,e_2 \leadsto e'_1 \, (e'_2 : T_2 \Rightarrow^\ell T_1): T_3} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash e_1 \leadsto e'_1 : \star \\ \Gamma \vdash e_2 \leadsto e'_2 : T_2 \end{array}} {\Gamma \vdash e_1 \, e_2 \leadsto (e'_1: \star \Rightarrow^\ell T_2 \to \star )\,e'_2 : \star} \end{gather*} </script></p> <p>We often abbreviate a pair of casts <script type="math/tex; mode=display"> (e : T_1 \Rightarrow^{\ell_1} T_2) : T_2 \Rightarrow^{\ell_2} T_3 </script>to remove the duplication of the middle type as follows <script type="math/tex; mode=display"> e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} T_3 </script></p> <h3>Design Choices Regarding the Dynamics</h3> <p>Consider the following example in which a function is cast to the dynamic type and then cast to a type that is inconsistent with the type of the function. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} A few questions immediately arise: <ul><li> Should a runtime cast error occur during the evaluation of the right-hand side of the let? Or should the runtime error occur later, when f is applied to \textsf{true}? <li> When the runtime cast error occurs, which cast should be blamed, <script type="math/tex">\ell_0</script> or <script type="math/tex">\ell_1</script>? More generally, we want to define a subtyping relation to characterize safe casts (casts that never fail), and the specifics of subtyping relation depend on the blame tracking strategy. </ul></p> <p>Ron, Walid, and I wrote a paper, <i>Exploring the Design Space of Higher-Order Casts</i> (ESOP 2009), that characterized the different answers to the above questions in terms of Henglein's Coercion Calculus. One can choose to check higher-order casts in either a <i>lazy</i> or <i>eager</i> fashion and one can assign blame to only downcasts (D) or the one can share blame between upcasts and downcasts (UD). The semantics of casts with lazy checking is straightforward whereas eager checking is not, so we'll first discuss lazy checking. Also, the semantics of the D approach is slightly simpler than UD, so we'll start with lazy D. We'll delay discussing the Coercion Calculus until we really need it. </p> <h3>The Lazy D Semantics</h3> <p>We'll define an evaluation (partial) function <script type="math/tex">\mathcal{E}</script> that maps a term and environment to a result. That is, we'll give Lazy D a denotational semantics. The values and results are defined as follows: \[ \begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : T \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array} \] </p> <p>To handle the short-circuiting of evaluation in the case of a cast error (signaled by <script type="math/tex">\mathbf{blame}\,\ell</script>), we use the following monadic operators: \begin{align*} \mathbf{return}\,v &= v \\ \mathbf{letB}\,X = M\,\mathbf{in}\, N &= \mathbf{case}\,M\,\mathbf{of}\,\\ & \quad\;\; \mathbf{blame}\,\ell \Rightarrow \mathbf{blame}\,\ell \\ & \quad \mid v \Rightarrow [X{:=}v]N \end{align*} </p> <p>The primitive operators are given their semantics by the <script type="math/tex">\delta</script> function. \begin{align*} \delta(\mathsf{inc},n) &= n + 1 \\ \delta(\mathsf{dec},n) &= n - 1 \\ \delta(\mathsf{zero?},n) &= (n = 0) \end{align*} </p> <p>In lazy cast checking, when determining whether to signal a cast error, we only compare the heads of the types: \begin{align*} \mathit{hd}(B) &= B \\ \mathit{hd}(T_1 \to T_2) &= \star \to \star \end{align*} </p> <p>The following auxiliary function, named cast, is the main event. It is defined by cases on the source and target types <script type="math/tex">T_1</script> and <script type="math/tex">T_2</script>. The line for projecting from <script type="math/tex">\star</script> to <script type="math/tex">T_2</script> picks the <script type="math/tex">\ell</script>blame label from the projection (the down-cast), which is what gives this semantics its "D". \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : T_3 \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',T_3,\ell,T_2) \\ \mathsf{cast}(v,T_1,\ell,\star) &= v : T_1 \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*} </p> <p>The apply auxiliary function performs function application, and is defined by induction on the first parameter. \begin{align*} \mathsf{apply}(F,v_2) &=F(v_2) \\ \mathsf{apply}(v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4,v_2) &= \mathbf{letB}\,X_3 = \mathsf{cast}(v_2,T_3,\ell,T_1)\,\mathbf{in} \\ & \quad \mathbf{letB}\,X_4 = \mathsf{apply}(v,X_3)\,\mathbf{in} \\ & \quad \mathsf{cast}(X_4, T_2, \ell, T_4) \end{align*} </p> <p>With these auxiliary functions and monadic operators in hand, the definition of the evaluation function <script type="math/tex">\mathcal{E}</script>is straightforward. \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X,T_1 ,\ell, T_2) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad \mathsf{apply}(X_1,X_2) \end{align*} The semantics for the Lazy D Gradually-Typed Lambda Calculus is defined by the following <script type="math/tex">\mathit{eval}</script>partial function. \[ \mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases} \] where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : T_1\to T_2\Rightarrow^\ell T_3\to T_4) &= \mathit{function} \\ \mathit{observe}(v : T \Rightarrow \star) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*} </p> <p><b>Exercise:</b> Calculate the output of <i>eval</i> for the example program at the beginning of this post. </p> <p>Similar to object-oriented languages, we can define a subtyping relation that characterizes when a cast is safe, that is, when a cast will never fail. The following is the subtyping relation for the D semantics. \begin{gather*} \frac{}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*} This subtyping relation is what I expected to see. The dynamic type plays the role of the top element of this ordering and the rule for function types has the usual contra-variance in the parameter type. The Subtyping Theorem connects the dynamic semantics with the subtyping relation. </p> <p><b>Theorem</b> (Subtyping) <i>If the cast labeled with \(\ell\) in program \(e\) respects subtyping, then \(\mathit{eval}(e) \neq \mathbf{blame}\,\ell\).</i></p> <h3>The Lazy UD Semantics</h3> <p>One interpretation of the dynamic type <script type="math/tex">\star</script> is to view it as the following recursive type: \[ \star \equiv \mu \, d.\, \mathsf{Int} + \mathsf{Bool} + (d \to d) \] (See, for example, the chapter on Dynamic Typing in Robert Harper's textbook <i>Practical Foundations for Programming Languages</i>.) In such an interpretation, one can directly convert from <script type="math/tex">\star \to \star</script> to <script type="math/tex">\star</script>, but not, for example, from <script type="math/tex">\mathsf{Int}\to\mathsf{Int}</script> to <script type="math/tex">\star</script>. Instead, one must first convert from <script type="math/tex">\mathsf{Int}\to\mathsf{Int}</script> to <script type="math/tex">\star \to \star</script> and then to <script type="math/tex">\star</script>. </p> <p>Let I be the subset of types that can be directly injected into <script type="math/tex">\star</script>: \[ I ::= B \mid \star \to \star \] The definition of values for Lazy UD changes to use I instead of T for the values of type <script type="math/tex">\star</script>. \[ \begin{array}{lrcl} \text{values} & v \in V & ::= & k \mid F \mid v : I \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \end{array} \] This change in the definition of value necessitates some changes in the cast function. The second and third-to-last lines below contain most of the changes. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : I \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',I,\ell,T_2) \\ \mathsf{cast}(v,I,\ell,\star) &= v : I \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,\star) &= v : T_{11}\to T_{12} \Rightarrow^\ell \star \to \star \Rightarrow^\ell \star \\ & \text{if } T_{11} \neq \star, T_{12} \neq \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*} </p> <p>The rest of the definitions for Lazy UD are the same as those for Lazy D. The following is the subtyping relation for Lazy UD. With this subtyping relation, the type \(\star\) does not play the role of the top element. Instead, a type $T$ is a subtype of \(\star\) if it is a subtype of some injectable type \(I\). \begin{gather*} \frac{}{\star <: \star} \qquad \frac{T <: I}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*} </p> <p><b>Theorem</b> (Subtyping) <i>If the cast labeled with \(\ell\) in program \(e\) respects subtyping, then \(\mathit{eval}(e) \neq \mathbf{blame}\,\ell\).</i></p> <p><b>Exercise:</b> Calculate the output of the Lazy UD <i>eval</i> for the example program at the beginning of this post. </p> <p>In the next post I'll turn to the efficient implementation of Lazy D and UD. </p>Jeremy Siekhttps://plus.google.com/115359947417709201623noreply@blogger.com0