Jeremy Siek

Take-aways from using Deduce in the classroom

2024-10-09T13:58:00.000-07:00

During two weeks of September 2024 I experimented with using the Deduce proof assistant in an honors undergraduate data structures course at Indiana University. The rest of the semester is taught using the Java programming language.

The primary goal of the experiment was to find out whether it would be feasible to teach our undergraduate students how to use a proof assistant and prove correctness of some basic algorithms on linked lists. I gave three lectures about Deduce: (1) how to define functions and data types, (2) how to write proofs, and (3) an application of these ideas to defining insertion sort and proving its correctness. The teaching assistants conducted a lab each week. The first lab helped the students install Deduce and write some functions on linked lists, such as reverse, sum, and cumulative sum. The second lab asked the students to define a tail-recursive variant of insertion sort and prove its correctness, reusing the main lemma from the proof given in lecture. During the second week we also assigned some proof exercises as homework. A week after the second lab, I asked the students to complete a survey regarding the use of Deduce, and the results of the survey are shown below.

The students were able to complete the two labs and the exercises, so the main take-away from the experiment is, yes, it is feasible to teach undergraduate students at IU how to use a proof assistant in a relatively short amount of time! However, I think the survey results tell us that it was just barely feasible and the degree of frustration for the students was still too high. Fortunately, the survey also points to many things that can be improved about Deduce that will help to lower the frustration for beginners.

The number one challenge for students learning Deduce was learning the syntax and dealing with syntax errors. This comes up over and over in the survey. This was surprising to me because my number one goal when designing Deduce was to make the syntax as unsurprising as possible. However, at the end of the day, Deduce is still a computer language, which means it takes time for students to learn the grammar. Unfortunately, I built the parser for Deduce using a parser generator, and the default error message is just “unexpected token” and the position in the file, which typically pointed to the token after the incorrect one. So an important take-away is that the syntax error messages need to be top notch. This past week I rewrote the parser by hand (recursive descent) and I’ve greatly improved the error messages.

Another aspect of the syntax challenge was a bit more subtle. The students had difficulty learning which keywords to use in different parts of a proof. In Deduce, there is typically one keyword (and syntactic form) for each logical rule. For example, if pq is a proof of (if P then Q) and p is a proof of P, then apply pq to p is a proof of Q (modus ponens in Deduce). However, in English mathematical prose, there are no specific words that correspond to using modus ponens (AFAIK). Something that I’m now experimenting with is allowing the proof rule to be inferred, so one could just write pq, p (comma is conjunction introduction) and Deduce figures out that it can use modus ponens to conclude Q. However, in situations when that fails, it may become more difficult to provide an informative error message because the syntax does not capture as much of the user’s intent.

On a related note, several students mentioned that they would like a reference manual for Deduce. At the time of the course, the documentation for Deduce consisted of two tutorials, one for functional programming in Deduce and the other for writing proofs. However, those documents are not ideal for answering questions like, “What is the have keyword for and how does it work?” I’m now in the process of adding a reference manual.

Deduce allows students to write incomplete proofs, using ? as a placeholder for parts that still need to be filled in. For each ?, Deduce reports what formula needs to be proved and it gives advice regarding what the next step of the proof could be, based on the top logical connective of the formula. For example, if the formula is an all, then Deduce suggests arbitrary as the next step. Of course, induction is also an alternative choice, and one student would have liked Deduce to also suggest that alternative. An undergraduate research assistant has already added that to Deduce. Inspired by how much this advice helped the students, I’ve added advice for elimination rules. For example, if pq is a proof that P implies Q, then writing help pq will cause Deduce to suggest that you use the apply-to form.

Helping students during office hours was also a great way to learn more about student expectations and challenges using Deduce. One thing I noticed is that a student would sometimes refer to an already-proved formula, i.e. a given, by writing the formula itself instead of using the label for the formula. For example, writing apply (if P then Q) to P instead of apply pq to p. I’ve now added the ability to refer to givens by formula. In general, I expect users to prefer the formula when it’s small and prefer the label when the formula is large.

Finally, it looks like it would be better to spend 4 weeks introducing students to Deduce instead of 2 weeks. This would allow for a more gentle introduction to Deduce and it would enable more applications of Deduce to proofs of correctness. I’ll give this a try with the 100-student regular section of data structures this Spring!

I’ll be going back over the survey results from time to time, looking for more lessons to be learned and ways to improve Deduce.

Survey Results

There were 7 questions in the survey that were completed by 21 students in an honors undergraduate data structures course. 4 of the questions were multiple choice and 3 were open-ended.

4 Multiple-choice Questions

3 Open-ended Questions

How did you feel when you completed a proof of a theorem using Deduce?

Extremely satisfied

In all honesty, it felt like I was just trying all possible combinations
of applying the logic rather than figuring out how to prove the questions. 
More time spent on learning what the syntax actually means might help.

I had a hard time proving theorems in Deduce, mostly because of the syntax. 
So, when I finally saw "xyz.pf is valid," the joy I felt was indescribable. 
I could sleep at night peacefully knowing I didn't prove anything wrong.

It felt satisfying when it finally completed, however it was because of consistent
frustration in figuring out how to phrase the proofs in deduce.

I typically felt confused and frustrated.

Seeing the words: "proof is valid", would always bring me satisfaction. However, 
I would frequently feel sad when I struggled more with the implementation of a 
proof in Deduce rather than the proof itself.
Also, I would like to add a comment about question 2. While I think doing proofs 
without a program like Deduce is sufficient for students in C241, it could make 
grading proofs for UIs/TAs easier.

Liberated (I struggled with the syntax of Deduce so it was satisfying when the proof finally ran).

I felt accomplished afterwards, like I had completed a journey of smaller goals to
achieve proving the final theorem.

Completing a proof felt rewarding. Pf. Siek mentioned that it would eventually feel 
like a video game, and I agree. It felt like putting together pieces of a puzzle
until the puzzle was complete as opposed to proof-writing as I'm used to it.

Very satisfied once I understood why it was correct.

It feels a little nice having confirmation of my proof correctness, but I can still
feel confused after completing a proof. Sometimes after completing a proof in Deduce, 
I have trouble reading the proof I just wrote. 

It did feel good because you know what you’ve done is correct.

relief, like a weight has been lifted off my shoulders. 
During the proof: I hate deduce
After completing the proof: I love deduce I did it I can do anything the world is mine

I felt really accomplished because the syntax in Deduce was hard to pick up in the 2 weeks
we learned it. I still think Deduce was helpful in a way even though I struggled, so maybe
spending more time with Deduce would help with syntax knowledge.

Relieved, the process was challenging, but typically rewarding in the end. However, 
I sometimes felt frustrated with Deduce since I sometimes didn't understand the syntax, 
which made it hard to get Deduce to give helpful feedback.

I think proofs that I wrote in deduce were super thorough, your logic needed to be super
strong and for a more difficult proof, you needed to map out the definitions and the 
equations you are trying to prove. I felt more confident about my logic skills and code 
after the proofs. It was definitely a lot of hard work but I think it was worth it to get 
an understanding of the correctness of your programs.

I felt good and rewarded when the valid statement came out, especially after trial and error
with the syntax

Every time I completed a proof of a theorem using Deduce, I felt mostly relieved.  
It was frustrating trying to understand Deduce's syntax or how to apply certain concepts
to complete a proof, but at least when Deduce says that the proof is valid, I know it's 
correct and I do not have to go looking for someone to look over my work. I think, to make 
it easier to learn Deduce, it would be helpful to have more completed step-by-step examples
of proved theorems available to students. I'd also argue that when Deduce did offer advice, 
the advice that I got was usually very helpful and assisted greatly in pushing me towards 
correctness. However, often, Deduce would not give advice at all when I had an error in my 
proof, so I would suggest having a handbook or guide available to students to help decipher
what errors lead to certain, unhelpful, error messages. 

I felt really happy that the theorem worked and was marked as valid since this means 
my logic is actually correct.

I found completing the theorems in Deduce satisfying. The immediate feedback was nice
compared to traditional informal proofs that had delayed feedback.

I felt good. Especially because it was much harder than normal proofs. Nevertheless, 
I felt like I understood the logic behind the functions I programmed.

What changes or improvements to Deduce would you find most helpful? Please be as specific as you can.

The biggest thing that would help would be more diverse/descriptive error messages. 
Most of the error messages I got were something like "unexpected symbol ____". 
Clearly this meant I wasn't using the syntax correctly, but I didn't get any feedback
on what I was getting wrong. So I think if the error messages were somehow able to have a 
bit more information that would be good. Also if you could make the ? work while using the
equations statement, that would help a lot.

Better error messages would probably be the most helpful, followed by an easier process of
applying facts, definitions, and rewrites to the goal.

I'm not sure if we already have it, but a documentation explaining how all the functions work
would be really helpful. For example, how "suffices," "apply," "definition," and others function.

Sometimes the ? would not work as expected - where it would give me an error because it was
thinking that ? was part of the proof rather than to be used as like a "fill in the blank".

Some of the syntax just feels very unintuitive, it seems unreasonable to me to try to teach us the
language in just a few weeks and expect us to be able to use it well. 

In class, we were taught how to use suffices to change the goal when proving equivalence. 
Similarly, when proving things in C241, we only changed the goal when we did logical equivalence
proofs. However, in the Insertion Sort lab, when we had to use suffices in our proofs, the novelty
of suffices/Deduce compounded onto the fact that we had not practiced changing the goal in proofs
that were not about equivalence. This led to me feeling very sad after taking a long time to complete
only the base case of the isort proof.
The addition of syntax highlighting could also be cool, but I do not think it is too important.

Maybe an index for all the syntax (eg. apply, suffice) that was used in the examples on the Github page
(I had to use ctrl+f to find the terms on the page).

I think making the error statements a little more specific in more areas would be helpful for 
troubleshooting when writing the proof.

Having tools to support it such as intelliSense and autocomplete or even a VSCode extension would
improve the learning curve. A lot of times you know where the proof needs to go but you don't know 
the syntax that deduce wants, which is frustrating. Also a frequent error message I would get was
"unexpected token". I think it would be useful if there was a way for deduce to (if possible) 
partially parse the line the unexpected token was on to see what type of command I was trying to
use and give some advice on it.

In the question mark advice section, include induction as a possibility.

A more complete guide to using deduce

I wish we covered it more in class because I struggled with it a lot outside of class.

The "evidence" part of each line could be improved. When we had to reference a function, 
or theorem, or a previously proved fact, we had to use a large variety of different keywords. 
Definition, apply, definition in, rewrite, rewrite in, and maybe more had me spending a lot 
of time figuring out which one I was supposed to use even though I already knew the next step
of the proof. I just didn't know the format for the justification deduce was looking for. 
Maybe it is possible to combine some of these?

I think listing out the proper ways to approach proofs would help. Moreover, there should be
a reference listing when to use "suffices ... with definition ...", "by definition", "by rewrite",
or "by apply". I looked for guidance in GitHub and think that the new syntax learned up to a certain
point should be accessible in an efficient way.

Here are a few things I would suggest:
Clearer feedback on errors: A lot of the time I would get errors/responses from Deduce where 
I had a difficult time interpreting what I was supposed to be doing to fix my issues.
More detailed examples: It would be nice to have a reference for proofs in the homework, 
especially if there were a line-by-line breakdown of how the proofs work.
Better documentation or tutorials: Learning the system was difficult due to a lack of resources,
as the only materials were the two .md files on proofs and Functional Programming.

While the feedback from deduce definitely helped me identify some minor errors, I feel the
error messages sometimes lack clarity. And its more unclear when dealing with type inference
or mismatch errors or issues with multisets. I don't know to what extent it could be possible
in the context of a class, but more detailed error messages with specific suggestions or more
context can help debug proofs more easily. 

Sometimes it got difficult to know when to use which words like "with", "by", "apply." 
If there is some way to be able to use most of the words we used in 241, I think that would 
be very helpful.

I think Deduce would benefit most from better error messages and more forgiving syntax. 
First, it seems that when Deduce recognizes an error in a proof, but doesn't know specifically
how to explain the error, it just defaults to some sort of "parsing error" or reports an
"unexpected token" which I think is sometimes unhelpful. Additionally, I believe that when trying
to use a theorem, if it is used incorrectly, then Deduce should mention every parameter the theorem 
needs in some sort of explicit detail, so I don't have to look through another document to decipher
what the theorem expects. I will also say that it seems like Deduce is only willing to give advice
when I put the "?" symbol in the exact right place in the proof. For example, I've had issues where
the "?" symbol was not formatted properly (such as indentation or location in a line) and I received
one of the two errors I mentioned above. It would also be great to have more overall documentation
about specifically the syntax of Deduce in addition to the existing documentation about how to prove
theorems using features of Deduce.

It would be nice to have better error messages that were a bit clearer or better explain what the
problem was that occurred. For example, I had an issue where I was trying to use a "definition" 
instead of using "apply," and the error messages would just say something like "no proofbinding."

Allow the use of question marks in the equations statements. I also think adding a little statement
when solving for equalities v.s. other goals (ex. Insertion sort theorem) would be nice.

Better IntelliSense, and to make it less confusing, and so many keywords mean the same thing, which
makes this very confusing. So, looking at it at first glance, you won't know that they have different
functionalities. I am trying to make it easier for a first-timer to learn the language without depending
on the TAs or professors to tell them what everything does. Also, when someone hovers over an in-built
function or keyword, it should show a brief definition of what that does.

What about Deduce did you dislike or find frustrating? Please be as specific as you can.

I got better about this as we went on, but at the beginning and even somewhat at the end,
it felt somewhat ambiguous to me when to use which keyword ("suffices," "definition," 
"rewrite," etc.). I'm not sure how to fix this and a large part of this is probably just
because we didn't spend long on Deduce but that was the biggest learning curve for me.

It was often very difficult to apply facts, definitions, or rewrites in the way that I would
expect them to. Maybe this was just because I didn't understand the syntax too well.

Sometimes, even though my proof was logically correct, I had to restructure it because Deduce 
didn’t recognize it as valid.

The syntax really frustrated me. I felt that it would be so much easier to phrase in english
and then I had to work backwards to try to figure out how deduce wanted the proofs to be.
I put 4 weeks as the right amount of time because I felt if I had more time to get used to
deduce I wouldn't be as upset with it, but I was between putting that or "Not using deduce". 

Occasionally, I would receive feedback that was not very helpful. Unfortunately, I do not
remember specific examples of that happening. 
It is also slightly frustrating that after taking the time to learn it, we are not continuing
the use of Deduce. Although I am relieved, I feel like I finally have a grasp on Deduce.

I'm not sure if it's because I have such a low understanding of Deduce, but it was hard to figure
out what syntax to use for which situations (eg. suffice, rewrite, apply).

Honestly, the only thing I can think of is I would occasionally get confused with the question mark
and when or where I would need to put it, but that might have been user error as well.

Basically what I mentioned above. Not knowing the exact syntax to get where you want to go
(usually a trivial theorem or rule) and unexpected token errors not showing much help. 

Learning syntax and translating things I knew to actual proofs.

What I find most frustrating about Deduce is knowing how to do a proof but not being able to figure
out how to write it in Deduce. I also feel like Deduce is a bit restrictive. But I understand both
of these are because it's a computer program. If I had to prove something through a proof 
programming language, I guess Deduce is as good as any. 

I found the syntax very difficult. During lab I didn’t know how to solve a problem in deduce
because I didn’t know how to word it properly.

1. sometimes to see what to do next you can use ?, but other times ? will create an error
2. I know the next step of the proof but I don't know how deduce wants me to prove it

Coding the functions wasn't frustrating since I had taken H211, but despite having previous
experience with similar syntax, I still struggled with Deduce syntax, which was pretty frustrating.

One of the main frustrations I encountered while using Deduce was the feedback system. The error
messages were often unclear, and it was difficult to pinpoint exactly what went wrong in my proofs. 
For example, when I was trying to rewrite an expression using insertion_sort2(xs) = isort(xs, empty) 
by definition insertion_sort2 to prove sorted(insertion_sort2(xs)), Deduce gave me an error message
stating, “no matches found for rewrite with insertion_sort2(xs) = isort(xs, empty) in
sorted(isort(xs, empty)).” This feedback was vague and didn’t help me understand why the rewrite failed
or how to fix it.
Additionally, the learning curve for Deduce felt steep, especially compared to programming in more 
familiar languages like Java. The formal proof system required precise logic, and I often struggled
with applying theorems correctly due to the complex syntax and lack of intuitive error handling.
I also found that Deduce lacked detailed examples or a more comprehensive tutorial system. When I was
trying to apply the isort_sorted theorem, for instance, there wasn’t a clear guide on how to handle
such proofs step-by-step, leaving me unsure whether I was using the right approach.

When I took C241, the way we proved theorems was slightly different. In deduce, the most frustrating
thing was the syntax and flow. More often than not, I had all the logic and steps down, but there 
would be issues with matching the right statements. I usually use have statements a lot and that 
confuses me. And then I tried using the equations feature which was far simpler to complete the same proof. 

The most frustrating part of using Deduce was knowing how to prove a theorem to a person, 
but not knowing how to prove a theorem to Deduce. The problem of trying to prove a specific
theorem to Deduce becomes even more apparent when error messages can be unhelpful. I think some 
Deduce features such as "?" and "sorry" are great steps in the right direction, but I still think
more should be done to help point students to the correct solution when they make a mistake in their proofs.

I found it frustrating how specific you have to be with deduce as compared to real life.
I would understand the logic of a proof but I would not be able to prove it in deduce because
I did not know exactly how to format my reasoning.

I found using the equations implementation an annoyance because you couldn’t get feedback after
every line when a question mark was placed. I would write out everything in steps and that would 
take up a lot of space on the line and become hard to read.

Some syntax was very confusing. Like defining functions. It felt like it was not meant for 
UG students. especially the use of Lambda functions. I don't know how you can improve that 
given that it is a vital part of the language but that is how I felt.

Binary Search Trees, Correctly!

2024-08-11T12:04:00.000-07:00

This is the seventh blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

This post continues on the theme of binary trees, that is, trees in which each node has at most two children. The focus of this post is to implement the Search interface, described next, using binary trees.

The Search Interface

The Search interface includes operations to (1) create an empty data structure, (2) search for a value based on its associated key, and (3) insert a new key-value association.

Function Implementation of Search

The Search interface can also be implemented in a simple but less efficient way, using a function to map keys to values. With this approach, the operation to search for a value is just a function call. The Maps.pf file defines the empty_map operation, which returns a function that maps every input to none.

assert @empty_map<Nat,Nat>(5) = none

The Maps.pf file also defines the update(f, k, v) operation, which returns a function that associates the key k with v but otherwise behaves like the given function f. Here is an example use of update.

define m2 = update(@empty_map<Nat,Nat>, 4, just(99))
assert m2(4) = just(99)
assert m2(5) = none
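To make the idea of a map-as-function concrete for readers who have not seen it before, here is a minimal sketch in Python rather than Deduce. It is only a model of the behavior described above, not the Maps.pf code: Python's None stands in for none, a plain integer stands in for just(v), and the names Map and updated are assumptions introduced for this illustration.

```python
from typing import Callable, Optional

# A map is modeled as a function from a key to an optional value.
# Assumption for this sketch: None models Deduce's `none`,
# and a plain int models `just(v)`.
Map = Callable[[int], Optional[int]]

def empty_map(key: int) -> Optional[int]:
    """The map with no associations: every key maps to None."""
    return None

def update(f: Map, k: int, v: Optional[int]) -> Map:
    """Return a map that sends k to v and otherwise behaves like f."""
    def updated(key: int) -> Optional[int]:
        return v if key == k else f(key)
    return updated

# Mirrors the m2 example above: key 4 is associated with 99.
m2 = update(empty_map, 4, 99)
```

Searching is then just a call such as m2(4). Each call to update wraps the previous function in one more layer, so a lookup may have to pass through every earlier update, which is why this representation is simple but less efficient.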

We will use this function implementation of the Search interface to specify the correctness of the binary tree implementation of Search.

Binary Tree Implementation of Search

We will store the keys and their values in a binary tree and implement BST_search and BST_insert operations. These operations are efficient (logarithmic time) when the binary tree is balanced, but we will save how to balance trees for a later blog post.

The main idea of a binary search tree comes from the notion of binary search on a sequence, that is, keep the sequence in sorted order and when searching for a key, start in the middle and go left by half of the subsequence if the key you’re looking for is less than the one at the current position; go right by half of the subsequence if the key is greater than the one at the current position. Of course, if they are equal, then you’ve found what you’re looking for. Thus, binary search is just like looking up the definition of a word in a dictionary. The word is your key and the dictionary is sorted alphabetically. You can start in the middle and compare your word to those on the current page, then flip to the left or right depending on whether your word is lower or higher in the alphabet.
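The description of binary search on a sequence can be sketched as follows. This is an illustrative Python version, not code from the Deduce library; the function name binary_search and the sample list sorted_keys are assumptions made for this example.

```python
from typing import Optional, Sequence

def binary_search(keys: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence keys, or None if absent."""
    lo, hi = 0, len(keys)          # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2       # start in the middle of the remaining range
        if target == keys[mid]:
            return mid             # found what we are looking for
        elif target < keys[mid]:
            hi = mid               # target is smaller: continue in the left half
        else:
            lo = mid + 1           # target is larger: continue in the right half
    return None

# Hypothetical sample data: a sequence of keys kept in sorted order.
sorted_keys = [1, 6, 9, 10, 11, 13, 14, 19]
```

Each iteration discards half of the remaining range, so the number of comparisons grows logarithmically with the length of the sequence.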

The binary search tree adapts the idea of binary search from a sequence to a tree. Each node in the tree stores a key and its value. The left subtree of the node contains keys that are less than the node’s key and the right subtree contains keys that are greater than the node’s key. Thus, when searching for a key, one can compare it to the key of the current node and then go either left or right depending on whether the key is less than or greater than the current node’s key.

Consider the following diagram of a binary search tree. For simplicity, we will use numbers for both the keys and the values. In this diagram the key is listed before the colon and the value is after the colon. For example, this tree contains

key 10 associated with value 32,
key 13 associated with value 69,
etc.

The following code builds this binary search tree using the Tree union type defined in the Binary Tree blog post and the Pair type from the Pair.pf file.

define mt = @EmptyTree<Pair<Nat,Nat>>
define BST_1 = TreeNode(mt, pair(1, 53), mt)
define BST_9 = TreeNode(mt, pair(9, 42), mt)
define BST_6 = TreeNode(BST_1, pair(6, 85), BST_9)
define BST_11 = TreeNode(mt, pair(11, 99), mt)
define BST_13 = TreeNode(BST_11, pair(13, 69), mt)
define BST_19 = TreeNode(mt, pair(19, 74), mt)
define BST_14 = TreeNode(BST_13, pair(14, 27), BST_19)
define BST_10 = TreeNode(BST_6, pair(10, 32), BST_14)

There are three operations in the binary search tree interface and here are their specifications.

The EmptyTree constructor from the Tree union type, which builds a binary search tree that does not contain any key-value associations.

BST_search : fn Tree<Pair<Nat,Nat>> -> (fn Nat -> Option<Nat>)

The operation BST_search(T) returns a function that maps each key to its associated value.

BST_insert : fn Tree<Pair<Nat,Nat>>, Nat, Nat -> Tree<Pair<Nat,Nat>>

The operation BST_insert(T, k, v) produces a new tree that associates value v with key k and, for all other keys, associates keys with values according to tree T. In other words, BST_search(BST_insert(T, k, v)) = update(BST_search(T), k, just(v)).

Write the BST_search and BST_insert functions

The BST_search function is recursive over the Tree parameter. If the tree is empty, the result is none. Otherwise, we compare the key k with the key in the current node x. If they are equal, return the value in the current node; if k is less-than, recursively search the left subtree; if k is greater-than, recursively search the right subtree.

function BST_search(Tree<Pair<Nat,Nat>>) -> fn Nat -> Option<Nat> {
  BST_search(EmptyTree) = λk{ none }
  BST_search(TreeNode(L, x, R)) = λk{
    if k = first(x) then
      just(second(x))
    else if k < first(x) then
      BST_search(L)(k)
    else
      BST_search(R)(k)
  }
}

The BST_insert function follows a similar control structure: recursive over the Tree parameter followed by an if-then-else based on the key k and the key of the current node. However, BST_insert returns a new tree that contains the specified key and value. When the key k is already in the tree, BST_insert overrides the current value with the new value, as implied by the specification above.

function BST_insert(Tree<Pair<Nat,Nat>>, Nat, Nat) -> Tree<Pair<Nat,Nat>> {
  BST_insert(EmptyTree, k, v) = TreeNode(EmptyTree, pair(k, v), EmptyTree)
  BST_insert(TreeNode(L, x, R), k, v) =
    if k = first(x) then
      TreeNode(L, pair(k, v), R)
    else if k < first(x) then
      TreeNode(BST_insert(L, k, v), x, R)
    else
      TreeNode(L, x, BST_insert(R, k, v))
}
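For readers more familiar with mainstream languages, the two Deduce functions above can be mirrored in Python. This is a rough model under stated assumptions, not a translation endorsed by the Deduce library: a tree is None (for EmptyTree) or a tuple (left, (key, value), right) (for TreeNode), None also models none, a plain integer models just(v), and bst_search takes the key as a second argument instead of returning a function.

```python
from typing import Optional

# Assumed encoding: None is EmptyTree; (left, (key, value), right) is TreeNode.
Tree = Optional[tuple]

def bst_search(tree: Tree, k: int) -> Optional[int]:
    """Return the value associated with k, or None if k is not in the tree."""
    if tree is None:
        return None
    left, (key, value), right = tree
    if k == key:
        return value
    elif k < key:
        return bst_search(left, k)     # smaller keys live in the left subtree
    else:
        return bst_search(right, k)    # larger keys live in the right subtree

def bst_insert(tree: Tree, k: int, v: int) -> Tree:
    """Return a new tree that associates k with v; the input tree is not modified."""
    if tree is None:
        return (None, (k, v), None)
    left, (key, value), right = tree
    if k == key:
        return (left, (k, v), right)   # override the existing value
    elif k < key:
        return (bst_insert(left, k, v), (key, value), right)
    else:
        return (left, (key, value), bst_insert(right, k, v))

bst_a = bst_insert(None, 10, 32)
bst_b = bst_insert(bst_a, 6, 85)
bst_c = bst_insert(bst_b, 6, 59)       # same key, new value
```

Because tuples are immutable, bst_b still maps 6 to 85 after bst_c is built, which matches the functional style of the Deduce definition: an insert produces a new tree and leaves the old one intact.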

Test

We test the correctness of the EmptyTree, BST_search, and BST_insert operations by making sure they behave according to their specification. Starting with EmptyTree, the result of BST_search with any key should be none.

assert BST_search(EmptyTree)(5) = none

After inserting key 10 with value 32, the result of BST_search on 10 should be just(32). For other keys, such as 5, the result should be the same as for EmptyTree.

define BST_a = BST_insert(EmptyTree, 10, 32)
assert BST_search(BST_a)(10) = just(32)
assert BST_search(BST_a)(5) = none

The story is similar for inserting key 6 with value 85.

define BST_b = BST_insert(BST_a, 6, 85)
assert BST_search(BST_b)(6) = just(85)
assert BST_search(BST_b)(10) = just(32)
assert BST_search(BST_b)(5) = none

If we insert with the same key 6 but a different value 59, the result of BST_search for 6 should be the new value, just(59). For other keys, the result of BST_search remains the same.

define BST_c = BST_insert(BST_b, 6, 59)
assert BST_search(BST_c)(6) = just(59)
assert BST_search(BST_c)(10) = just(32)
assert BST_search(BST_c)(5) = none

Prove

Starting with EmptyTree, we prove that applying BST_search produces an empty map.

theorem BST_search_EmptyTree: 
  BST_search(EmptyTree) = λk{none}
proof
  extensionality
  arbitrary k:Nat
  conclude BST_search(EmptyTree)(k) = none
      by definition BST_search
end

The main correctness theorem is to show that BST_insert behaves the same as update. That is to say, applying BST_insert to a tree followed by BST_search is the same as first applying BST_search and then applying update.

theorem BST_search_insert_udpate: all T:Tree<Pair<Nat,Nat>>. all k:Nat, v:Nat.
  BST_search(BST_insert(T, k, v)) = update(BST_search(T), k, just(v))
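Before reading the proof, it may help to see what the theorem claims by checking it on a handful of inputs. The sketch below does this in Python. It is testing, not proof, and it is not part of the Deduce development. It assumes the same encoding as a Python model would naturally use: None for EmptyTree and for none, a tuple (left, (key, value), right) for TreeNode, and a plain integer for just(v). The helper names agrees_with_update and random_tree are hypothetical.

```python
import random
from typing import Callable, Optional

Tree = Optional[tuple]  # None is EmptyTree; (left, (key, value), right) is TreeNode

def bst_search(tree: Tree, k: int) -> Optional[int]:
    if tree is None:
        return None
    left, (key, value), right = tree
    if k == key:
        return value
    return bst_search(left, k) if k < key else bst_search(right, k)

def bst_insert(tree: Tree, k: int, v: int) -> Tree:
    if tree is None:
        return (None, (k, v), None)
    left, (key, value), right = tree
    if k == key:
        return (left, (k, v), right)
    if k < key:
        return (bst_insert(left, k, v), (key, value), right)
    return (left, (key, value), bst_insert(right, k, v))

def update(f: Callable[[int], Optional[int]], k: int, v: Optional[int]):
    """Function-based map update: send k to v, otherwise behave like f."""
    return lambda i: v if i == k else f(i)

def agrees_with_update(tree: Tree, k: int, v: int, probe_keys) -> bool:
    """Compare both sides of the theorem pointwise on the given probe keys."""
    lhs = lambda i: bst_search(bst_insert(tree, k, v), i)
    rhs = update(lambda i: bst_search(tree, i), k, v)
    return all(lhs(i) == rhs(i) for i in probe_keys)

def random_tree(rng: random.Random, size: int) -> Tree:
    """Build a tree by inserting `size` random key-value pairs."""
    tree = None
    for _ in range(size):
        tree = bst_insert(tree, rng.randrange(20), rng.randrange(100))
    return tree

rng = random.Random(0)  # fixed seed so the check is repeatable
all_ok = all(
    agrees_with_update(random_tree(rng, rng.randrange(8)), k, v, range(20))
    for k in range(20) for v in (0, 7)
)
```

Comparing the two sides on each probe key i is the same move that extensionality makes in the proof: two functions are equal when they agree on every argument. The difference is that the test only samples some trees and keys, whereas the proof covers all of them.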

The proof is by induction on the tree. For the case T = EmptyTree, we start by using extensionality to apply both sides of the equation to an arbitrary number i. We then expand BST_insert(EmptyTree, k, v).

    // <<BST_search_insert_empty_ext>> =
    arbitrary k:Nat, v:Nat
    extensionality
    arbitrary i:Nat
    suffices BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
           = update(BST_search(EmptyTree), k, just(v))(i)   with definition BST_insert

Looking at the definition of BST_search, the left-hand side will either be none or just(v) depending on whether i is less than, equal to, or greater than k. So we proceed with case analysis using the trichotomy theorem from Nat.pf.

    // <<BST_search_insert_empty_tri>> =
    cases trichotomy[i][k]
    case i_less_k: i < k {
      <<BST_search_insert_empty_less>>
    }
    case i_eq_k: i = k {
      <<BST_search_insert_empty_equal>>
    }
    case i_greater_k: k < i {
      <<BST_search_insert_empty_greater>>
    }

Indeed, when i is less than k, both the left-hand side and the right-hand side are equal to none.

    // <<BST_search_insert_empty_less>> =
    have not_i_eq_k: not (i = k)   by apply less_not_equal to i_less_k
    equations
        BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
         = @none<Nat>
            by definition {BST_search, BST_search, first, second}
               and rewrite not_i_eq_k | i_less_k
     ... = update(BST_search(EmptyTree), k, just(v))(i)
            by definition {BST_search, update} and rewrite not_i_eq_k

When i is equal to k, both sides are equal to just(v).

    // <<BST_search_insert_empty_equal>> =
    equations
        BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
         = just(v)
            by definition {BST_search, first, second} and rewrite i_eq_k
     ... = update(BST_search(EmptyTree), k, just(v))(i)
            by definition {BST_search, update} and rewrite i_eq_k

When i is greater than k, both sides are equal to none.

    // <<BST_search_insert_empty_greater>> =
    have not_k_eq_i: not (k = i)  by apply less_not_equal to i_greater_k
    have not_i_eq_k: not (i = k)  by suppose ik apply not_k_eq_i to symmetric ik
    have not_i_less_k: not (i < k) 
        by apply less_implies_not_greater to i_greater_k
    equations
        BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
         = @none<Nat>
            by definition {BST_search, BST_search, first, second}
               and rewrite not_i_eq_k | not_i_less_k
     ... = update(BST_search(EmptyTree), k, just(v))(i)
            by definition {BST_search, update}
               and rewrite not_i_eq_k
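The three sub-cases for the empty tree can be replayed on concrete numbers. This is a small Python sketch with both sides of the equation unfolded by hand; the function names and the sample key and value are hypothetical, and None stands for none while a plain integer v stands for just(v).

```python
def search_singleton(k, v, i):
    # Models BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i).
    if i == k:
        return v
    # Both children are EmptyTree, and searching EmptyTree yields none for every key.
    return None

def update_empty(k, v, i):
    # Models update(BST_search(EmptyTree), k, just(v))(i).
    return v if i == k else None

k, v = 6, 59
# One probe for each trichotomy case: i < k, i = k, and k < i.
cases = {i: (search_singleton(k, v, i), update_empty(k, v, i)) for i in (2, 6, 9)}
```

In each entry of cases the two components agree, which is what the three equational proofs above establish in general.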

Next we consider the case where T = TreeNode(L, x, R). Again we begin with extensionality.

    // <<BST_search_insert_node_ext>> =
    arbitrary k:Nat, v:Nat
    extensionality
    arbitrary i:Nat
    suffices BST_search(BST_insert(TreeNode(L, x, R), k, v))(i) 
           = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)   by .

Looking at BST_insert(TreeNode(L, x, R), k, v), its result will depend on whether k is less than, equal to, or greater than first(x). So we proceed by cases, using the trichotomy theorem from Nat.pf.

    // <<BST_search_insert_node_tri>> =
    cases trichotomy[k][first(x)]
    case k_less_fx: k < first(x) {
      <<BST_search_insert_node_k_less_fx>>
    }
    case k_eq_fx: k = first(x) {
      <<BST_search_insert_node_k_equal_fx>>
    }
    case k_greater_fx: first(x) < k {
      <<BST_search_insert_node_k_greater_fx>>
    }

For the case k < first(x), we have

  BST_insert(TreeNode(L, x, R), k, v) 
= TreeNode(BST_insert(L, k, v), x, R)

so it suffices to prove the following.

    // <<BST_search_insert_node_k_less_fx_suffices>> =
    have not_k_eq_fx: not (k = first(x))   by apply less_not_equal to k_less_fx
    suffices BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
           = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
                with definition {BST_insert} and rewrite not_k_eq_fx | k_less_fx

The result of BST_search(TreeNode(BST_insert(L, k, v), x, R))(i) depends on the relationship between i and first(x), so we again proceed by cases using the trichotomy theorem. There sure are a lot of cases in this proof!

    // <<BST_search_insert_node_k_less_fx_tri>> =
    cases trichotomy[i][first(x)]
    case i_less_fx: i < first(x) {
      <<BST_search_insert_node_k_less_fx_i_less_fx>>
    }
    case i_eq_fx: i = first(x) {
      <<BST_search_insert_node_k_less_fx_i_eq_fx>>
    }
    case fx_less_i: first(x) < i {
      <<BST_search_insert_node_k_less_fx_i_greater_fx>>
    }

For the case i < first(x), we proceed by several steps of equational reasoning. The key step is applying the induction hypothesis for the left subtree L.

    // <<BST_search_insert_node_k_less_fx_i_less_fx>> =
    have not_i_eq_fx: not (i = first(x)) by apply less_not_equal to i_less_fx
    equations
          BST_search(TreeNode(BST_insert(L, k, v), x, R))(i) 
        = BST_search(BST_insert(L, k, v))(i)
            by definition{BST_search} and rewrite not_i_eq_fx | i_less_fx
    ... = update(BST_search(L), k, just(v))(i)
            by rewrite IH_L[k,v]
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i) by
            switch i = k {
              case true suppose ik_true {
                definition {BST_search,update} and rewrite ik_true
              }
              case false suppose ik_false {
                definition {BST_search,update}
                and rewrite ik_false | not_i_eq_fx | i_less_fx
              }
            }

For the case i = first(x), both sides simplify to just(second(x)).

    // <<BST_search_insert_node_k_less_fx_i_eq_fx>> =
    have not_fx_eq_k: not (first(x) = k)
      by suppose fx_eq_k
         conclude false by rewrite not_k_eq_fx in symmetric fx_eq_k 
    equations
          BST_search(TreeNode(BST_insert(L, k, v), x, R))(i) 
        = just(second(x))
            by definition {BST_search} and rewrite i_eq_fx
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
            by definition {BST_search,update} and rewrite i_eq_fx | not_fx_eq_k

For the case first(x) < i, both sides are equal to BST_search(R)(i), because we know that i ≠ k.

    // <<BST_search_insert_node_k_less_fx_i_greater_fx>> =
    have not_i_eq_fx: not (i = first(x))
      by suppose i_eq_fx
         apply (apply less_not_equal to fx_less_i) to symmetric i_eq_fx
    have not_i_less_fx: not (i < first(x))
      by apply less_implies_not_greater to fx_less_i
    have not_i_eq_k: not (i = k)
      by suppose i_eq_k
         have fx_less_k: first(x) < k   by rewrite i_eq_k in fx_less_i
         have not_k_less_fx: not (k < first(x)) 
             by apply less_implies_not_greater to fx_less_k
         conclude false by apply not_k_less_fx to rewrite k_less_fx
    equations
          BST_search(TreeNode(BST_insert(L, k, v), x, R))(i) 
        = BST_search(R)(i)
            by definition BST_search and rewrite not_i_eq_fx | not_i_less_fx
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
            by definition {BST_search, update}
               and rewrite not_i_eq_k | not_i_eq_fx | not_i_less_fx

This completes the proof of the case for k < first(x).

    <<BST_search_insert_node_k_less_fx_suffices>>
    <<BST_search_insert_node_k_less_fx_tri>>
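To make the case k < first(x) concrete, here is a Python sketch on one hypothetical node. The definitions are reconstructed from the proof steps above and are assumptions, not the Deduce source. A tree is None or (L, (key, value), R), and an option is None or a plain integer.

```python
def bst_search(tree, i):
    # Assumed model of BST_search(tree)(i).
    if tree is None:
        return None
    left, (key, value), right = tree
    if i == key:
        return value
    return bst_search(left if i < key else right, i)

def bst_insert(tree, k, v):
    # Assumed model of BST_insert(tree, k, v).
    if tree is None:
        return (None, (k, v), None)
    left, (key, value), right = tree
    if k == key:
        return (left, (k, v), right)
    if k < key:
        return (bst_insert(left, k, v), (key, value), right)
    return (left, (key, value), bst_insert(right, k, v))

def update(f, k, v):
    return lambda i: v if i == k else f(i)

# Hypothetical node with first(x) = 5, and an insertion key k = 3 < first(x).
left = (None, (2, 20), None)
right = (None, (9, 90), None)
x = (5, 50)
node = (left, x, right)
k, v = 3, 30

inserted = bst_insert(node, k, v)
# The insertion only rebuilds the left subtree, as in the `suffices` step.
only_left_changed = inserted == (bst_insert(left, k, v), x, right)

# One probe per sub-case: i < first(x), i = first(x), first(x) < i.
expected = update(lambda j: bst_search(node, j), k, v)
results = {i: (bst_search(inserted, i), expected(i)) for i in (3, 5, 9)}
```

The probe i = 3 exercises the induction hypothesis on the left subtree, while i = 5 and i = 9 never look at the part of the tree that changed.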

Next consider the case for k = first(x). We have

  BST_insert(TreeNode(L, x, R), k, v) 
= TreeNode(L, pair(k, v), R)

so it suffices to prove the following.

    // <<BST_search_insert_node_k_equal_fx_suffices>> =
    suffices BST_search(TreeNode(L, pair(k, v), R))(i) 
           = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
                by definition BST_insert and rewrite k_eq_fx

Looking at the definition of BST_search, the result of BST_search(TreeNode(L, pair(k, v), R))(i) will depend on the relationship between i and k. So we proceed by cases, using the trichotomy theorem.

    // <<BST_search_insert_node_k_equal_fx_tri>> =
    cases trichotomy[i][k]
    case i_less_k: i < k {
      <<BST_search_insert_node_k_equal_fx_i_less_k>>
    }
    case i_eq_k: i = k {
      <<BST_search_insert_node_k_equal_fx_i_eq_k>>
    }
    case k_less_i: k < i {
      <<BST_search_insert_node_k_equal_fx_i_greater_k>>
    }

When i < k, both sides of the equation are equal to BST_search(L)(i).

    // <<BST_search_insert_node_k_equal_fx_i_less_k>> =
    have not_i_eq_k: not (i = k)   by apply less_not_equal to i_less_k
    equations
          BST_search(TreeNode(L, pair(k, v), R))(i) 
        = BST_search(L)(i)
              by definition {BST_search, first}
                 and rewrite not_i_eq_k | i_less_k
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
              by definition {update,BST_search}
                 and rewrite symmetric k_eq_fx | not_i_eq_k | i_less_k

When i = k, both sides are equal to just(v).

    // <<BST_search_insert_node_k_equal_fx_i_eq_k>> =
    suffices BST_search(TreeNode(L, pair(k, v), R))(k)
            = update(BST_search(TreeNode(L, x, R)), k, just(v))(k)
            with rewrite i_eq_k
    equations
      BST_search(TreeNode(L, pair(k, v), R))(k)
        = just(v)          by definition {BST_search, first, second}
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(k)
                           by definition {BST_search, update}

When i > k, both sides are equal to BST_search(R)(i).

    // <<BST_search_insert_node_k_equal_fx_i_greater_k>> =
    have not_i_eq_k: not (i = k)
      by have nki: not (k = i) by apply less_not_equal to k_less_i
         suppose i_eq_k apply nki to symmetric i_eq_k
    have not_i_less_k: not (i < k) 
        by apply less_implies_not_greater to k_less_i
    equations
          BST_search(TreeNode(L, pair(k, v), R))(i) 
        = BST_search(R)(i)
            by definition {BST_search, first, second}
               and rewrite not_i_eq_k | not_i_less_k
    ... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
            by definition {update, BST_search}
               and rewrite symmetric k_eq_fx | not_i_eq_k | not_i_less_k

This concludes the proof of the case for k = first(x).

    <<BST_search_insert_node_k_equal_fx_suffices>>
    <<BST_search_insert_node_k_equal_fx_tri>>

The last case to prove is for k > first(x). We leave this as an exercise.

The following puts together the pieces of the proof for BST_search_insert_udpate.

theorem BST_search_insert_udpate: all T:Tree<Pair<Nat,Nat>>. all k:Nat, v:Nat.
  BST_search(BST_insert(T, k, v)) = update(BST_search(T), k, just(v))
proof
  induction Tree<Pair<Nat,Nat>>
  case EmptyTree {
    <<BST_search_insert_empty_ext>>
    <<BST_search_insert_empty_tri>>
  }
  case TreeNode(L, x, R) suppose IH_L, IH_R {
    <<BST_search_insert_node_ext>>
    <<BST_search_insert_node_tri>>
  }
end

Binary Trees with In-order Iterators (Part 2)

2024-07-20T13:57:00.000-07:00

This is the sixth blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

This post continues where we left off in the previous post, in which we implemented binary trees and in-order tree iterators.

Our goal in this post is to prove that we correctly implemented the iterator operations:

ti2tree : < E > fn TreeIter<E> -> Tree<E>
ti_first : < E > fn Tree<E>,E,Tree<E> -> TreeIter<E>
ti_get : < E > fn TreeIter<E> -> E
ti_next : < E > fn TreeIter<E> -> TreeIter<E>
ti_index : < E > fn(TreeIter<E>) -> Nat

The first operation, ti2tree, requires us to first obtain a tree iterator, for example, with ti_first. So ti2tree does not have a correctness criterion of its own; instead, the proof of its correctness will be part of the correctness of the other operations.

So we skip to the proof of correctness for ti_first.

Correctness of `ti_first`

Let us make explicit the specification of ti_first:

Specification: The ti_first(A, x, B) function returns an iterator pointing to the first node, with respect to in-order traversal, of the tree TreeNode(A, x, B).

Also, recall that we said the following about ti2tree and ti_first: creating an iterator from a tree using ti_first and then applying ti2tree produces the original tree.

So we have two properties to prove about ti_first. For the first property, we need a way to formalize "the first node with respect to in-order traversal". This is where the ti_index operation comes in. If ti_first returns the first node, then its index should be 0. (One might worry that if ti_index is incorrect, then this property would not force ti_first to be correct. Not to worry, we will prove that ti_index is correct!) So we have the following theorem:

theorem ti_first_index: all E:type, A:Tree<E>, x:E, B:Tree<E>.
  ti_index(ti_first(A, x, B)) = 0
proof
  arbitrary E:type, A:Tree<E>, x:E, B:Tree<E>
  definition ti_first
  ?
end

After expanding the definition of ti_first, we are left with the following goal. So we need to prove a lemma about the first_path auxiliary function.

    ti_index(first_path(A,x,B,empty)) = 0

Here is a first attempt to formulate the lemma.

lemma first_path_index: all E:type. all A:Tree<E>. all y:E, B:Tree<E>.
  ti_index(first_path(A,y,B, empty)) = 0

However, because first_path is recursive, we will need to prove this by induction on A. But looking at the second clause in the definition of first_path, the path argument grows, so our induction hypothesis, which requires the path argument to be empty, will not be applicable. As is often the case, we need to generalize the lemma. Let’s replace empty with an arbitrary path as follows.

lemma first_path_index: all E:type. all A:Tree<E>. all y:E, B:Tree<E>, path:List<Direction<E>>.
  ti_index(first_path(A,y,B, path)) = 0

But now this lemma is false. Consider the following situation in which the current node y is 5 and the path is L,R (going from node 5 up to node 3).

The index of node 5 is not 0, it is 5! Instead, the index of node 5 is equal to the number of nodes that come before 5 according to in-order traversal. We can obtain that portion of the tree using functions that we have already defined, in particular take_path followed by plug_tree. So we can formulate the lemma as follows.

lemma first_path_index: all E:type. all A:Tree<E>. all y:E, B:Tree<E>, path:List<Direction<E>>.
  ti_index(first_path(A,y,B, path)) = num_nodes(plug_tree(take_path(path), EmptyTree))
proof
  arbitrary E:type
  induction Tree<E>
  case EmptyTree {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    ?
  }
  case TreeNode(L, x, R) suppose IH {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    ?
  }
end

For the case A = EmptyTree, the goal simply follows from the definitions of first_path, ti_index, and ti_take.

    conclude ti_index(first_path(EmptyTree,y,B,path))
           = num_nodes(plug_tree(take_path(path),EmptyTree))
                by definition {first_path, ti_index, ti_take}.

For the case A = TreeNode(L, x, R), after expanding the definition of first_path, we need to prove:

  ti_index(first_path(L,x,R,node(LeftD(y,B),path)))
= num_nodes(plug_tree(take_path(path),EmptyTree))

But that follows from the induction hypothesis and the definition of take_path.

    definition {first_path}
    equations
          ti_index(first_path(L,x,R,node(LeftD(y,B),path)))
        = num_nodes(plug_tree(take_path(node(LeftD(y,B),path)),EmptyTree))
                by IH[x, R, node(LeftD(y,B), path)]
    ... = num_nodes(plug_tree(take_path(path),EmptyTree))
                by definition take_path.

Here is the completed proof of the first_path_index lemma.

lemma first_path_index: all E:type. all A:Tree<E>. all y:E, B:Tree<E>, path:List<Direction<E>>.
  ti_index(first_path(A,y,B, path)) = num_nodes(plug_tree(take_path(path), EmptyTree))
proof
  arbitrary E:type
  induction Tree<E>
  case EmptyTree {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    conclude ti_index(first_path(EmptyTree,y,B,path))
           = num_nodes(plug_tree(take_path(path),EmptyTree))
                by definition {first_path, ti_index, ti_take}.
  }
  case TreeNode(L, x, R) suppose IH {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    definition {first_path}
    equations
          ti_index(first_path(L,x,R,node(LeftD(y,B),path)))
        = num_nodes(plug_tree(take_path(node(LeftD(y,B),path)),EmptyTree))
                by IH[x, R, node(LeftD(y,B), path)]
    ... = num_nodes(plug_tree(take_path(path),EmptyTree))
                by definition take_path.
  }
end
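The lemma can also be tried out on a concrete tree. Below is a Python sketch of the iterator machinery. The definitions of plug_tree, take_path, first_path, and ti_index are reconstructed from how the proofs in this post unfold them, so they are assumptions rather than the Deduce source. A tree is None or (L, x, R); a path is a tuple of directions with the nearest ancestor first; LeftD(y, B) is ("LeftD", y, B); RightD(L, y) is ("RightD", L, y); and TrItr(path, L, x, R) is a 4-tuple. The eight-node tree is hypothetical, with each node labeled by its in-order position.

```python
def num_nodes(t):
    return 0 if t is None else num_nodes(t[0]) + 1 + num_nodes(t[2])

def plug_tree(path, t):
    # Rebuild the enclosing tree by walking up the path from t.
    for d in path:
        if d[0] == "LeftD":
            _, y, right = d
            t = (t, y, right)
        else:
            _, left, y = d
            t = (left, y, t)
    return t

def take_path(path):
    # Keep only the ancestors that come before the current node in-order.
    return tuple(d for d in path if d[0] == "RightD")

def first_path(a, y, b, path):
    # Descend along left children, recording a LeftD for each step.
    while a is not None:
        left, x, right = a
        path = (("LeftD", y, b),) + path
        a, y, b = left, x, right
    return (path, None, y, b)

def ti_index(it):
    path, left, _, _ = it
    return num_nodes(plug_tree(take_path(path), left))

def leaf(x):
    return (None, x, None)

t1 = (leaf(0), 1, leaf(2))
t5 = (leaf(4), 5, None)
t6 = (t5, 6, leaf(7))
root = (t1, 3, t6)

# Path from node 5 up to the root: left child of 6, which is the right child of 3.
path_to_5 = (("LeftD", 6, leaf(7)), ("RightD", t1, 3))

lemma_lhs = ti_index(first_path(leaf(4), 5, None, path_to_5))
lemma_rhs = num_nodes(plug_tree(take_path(path_to_5), None))
index_of_first = ti_index(first_path(t1, 3, t6, ()))
```

Here lemma_lhs and lemma_rhs are both 4, not 0, which is why the version of the lemma with an arbitrary path and a right-hand side of 0 fails. With the empty path, the index is 0.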

Returning to the proof of ti_first_index, we need to prove that ti_index(first_path(A,x,B,empty)) = 0. So we apply the first_path_index lemma and then the definitions of take_path, plug_tree, and num_nodes. Here is the completed proof of ti_first_index.

theorem ti_first_index: all E:type, A:Tree<E>, x:E, B:Tree<E>.
  ti_index(ti_first(A, x, B)) = 0
proof
  arbitrary E:type, A:Tree<E>, x:E, B:Tree<E>
  definition ti_first
  equations  ti_index(first_path(A,x,B,empty))
           = num_nodes(plug_tree(take_path(empty),EmptyTree))
                       by first_path_index[E][A][x,B,empty]
       ... = 0      by definition {take_path, plug_tree, num_nodes}.
end

Our next task is to prove that creating an iterator from a tree using ti_first and then applying ti2tree produces the original tree.

theorem ti_first_stable: all E:type, A:Tree<E>, x:E, B:Tree<E>.
  ti2tree(ti_first(A, x, B)) = TreeNode(A, x, B)
proof
  arbitrary E:type, A:Tree<E>, x:E, B:Tree<E>
  definition ti_first
  ?
end

After expanding the definition of ti_first, we are left to prove that

ti2tree(first_path(A,x,B,empty)) = TreeNode(A,x,B)

So we need to prove another lemma about first_path and again we need to generalize the empty path to an arbitrary path. Let us consider again the situation where the current node x is 5.

The result of first_path(A,x,B,path) will be the path to node 4, and the result of ti2tree will be the whole tree, not just TreeNode(A,x,B) as in the above equation. However, we can construct the whole tree from the path and TreeNode(A,x,B) using the plug_tree function. So we have the following lemma to prove.

lemma first_path_stable:
  all E:type. all A:Tree<E>. all y:E, B:Tree<E>, path:List<Direction<E>>.
  ti2tree(first_path(A, y, B, path)) = plug_tree(path, TreeNode(A, y, B))
proof
  arbitrary E:type
  induction Tree<E>
  case EmptyTree {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    ?
  }
  case TreeNode(L, x, R) suppose IH_L, IH_R {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    ?
  }
end

In the case A = EmptyTree, we prove the equation using the definitions of first_path and ti2tree.

    equations  ti2tree(first_path(EmptyTree,y,B,path))
             = ti2tree(TrItr(path,EmptyTree,y,B))       by definition first_path.
         ... = plug_tree(path,TreeNode(EmptyTree,y,B))  by definition ti2tree.

In the case A = TreeNode(L, x, R), we need to prove that

  ti2tree(first_path(TreeNode(L,x,R),y,B,path))
= plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))

We probably need to expand the definition of first_path, but doing so in your head is hard. So we can instead ask Deduce to do it. We start by constructing an equation with a bogus right-hand side and apply the definition of first_path.

    equations
          ti2tree(first_path(TreeNode(L,x,R),y,B,path))
        = EmptyTree
             by definition first_path ?
    ... = plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))
             by ?

Deduce responds with

incomplete proof
Goal:
    ti2tree(first_path(L,x,R,node(LeftD(y,B),path))) = EmptyTree

in which the left-hand side has expanded the definition of first_path. So we cut and paste that into our proof and move on to the next step.

    equations
          ti2tree(first_path(TreeNode(L,x,R),y,B,path))
        = ti2tree(first_path(L,x,R,node(LeftD(y,B),path)))
             by definition first_path.
    ... = plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))
             by ?

We now have something that matches the induction hypothesis, so we instantiate it and ask Deduce to tell us the new right-hand side.

    equations
          ti2tree(first_path(TreeNode(L,x,R),y,B,path))
        = ti2tree(first_path(L,x,R,node(LeftD(y,B),path)))
             by definition first_path.
    ... = EmptyTree
             by IH_L[x,R,node(LeftD(y,B),path)]
    ... = plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))
             by ?

Deduce responds with

expected
ti2tree(first_path(L,x,R,node(LeftD(y,B),path))) = EmptyTree
but only have
ti2tree(first_path(L,x,R,node(LeftD(y,B),path))) = plug_tree(node(LeftD(y,B),path),TreeNode(L,x,R))

So we cut and paste the right-hand side of the induction hypothesis to replace EmptyTree.

    equations
          ti2tree(first_path(TreeNode(L,x,R),y,B,path))
        = ti2tree(first_path(L,x,R,node(LeftD(y,B),path)))
             by definition first_path.
    ... = plug_tree(node(LeftD(y,B),path),TreeNode(L,x,R))
             by IH_L[x,R,node(LeftD(y,B),path)]
    ... = plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))
             by ?

The final step of the proof is easy; we just apply the definition of plug_tree. Here is the completed proof of first_path_stable.

lemma first_path_stable:
  all E:type. all A:Tree<E>. all y:E, B:Tree<E>, path:List<Direction<E>>.
  ti2tree(first_path(A, y, B, path)) = plug_tree(path, TreeNode(A, y, B))
proof
  arbitrary E:type
  induction Tree<E>
  case EmptyTree {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    equations  ti2tree(first_path(EmptyTree,y,B,path))
             = ti2tree(TrItr(path,EmptyTree,y,B))       by definition first_path.
         ... = plug_tree(path,TreeNode(EmptyTree,y,B))  by definition ti2tree.
  }
  case TreeNode(L, x, R) suppose IH_L, IH_R {
    arbitrary y:E, B:Tree<E>, path:List<Direction<E>>
    equations
          ti2tree(first_path(TreeNode(L,x,R),y,B,path))
        = ti2tree(first_path(L,x,R,node(LeftD(y,B),path)))
             by definition first_path.
    ... = plug_tree(node(LeftD(y,B),path),TreeNode(L,x,R))
             by IH_L[x,R,node(LeftD(y,B),path)]
    ... = plug_tree(path,TreeNode(TreeNode(L,x,R),y,B))
             by definition plug_tree.
  }
end

Returning to the ti_first_stable theorem, the equation follows from our first_path_stable lemma and the definition of plug_tree.

theorem ti_first_stable: all E:type, A:Tree<E>, x:E, B:Tree<E>.
  ti2tree(ti_first(A, x, B)) = TreeNode(A, x, B)
proof
  arbitrary E:type, A:Tree<E>, x:E, B:Tree<E>
  definition ti_first
  equations  ti2tree(first_path(A,x,B,empty))
           = plug_tree(empty,TreeNode(A,x,B))  by first_path_stable[E][A][x,B,empty]
       ... = TreeNode(A,x,B)                   by definition plug_tree.
end
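Both first_path_stable and ti_first_stable can be checked on a small example. This Python sketch uses definitions reconstructed from the proof steps above, so they are assumptions: a tree is None or (L, x, R), LeftD(y, B) is ("LeftD", y, B), RightD(L, y) is ("RightD", L, y), and TrItr(path, L, x, R) is a 4-tuple. The tree is hypothetical, with nodes labeled by in-order position.

```python
def plug_tree(path, t):
    # Rebuild the enclosing tree by walking up the path from t.
    for d in path:
        if d[0] == "LeftD":
            _, y, right = d
            t = (t, y, right)
        else:
            _, left, y = d
            t = (left, y, t)
    return t

def first_path(a, y, b, path):
    # Descend along left children, recording a LeftD for each step.
    while a is not None:
        left, x, right = a
        path = (("LeftD", y, b),) + path
        a, y, b = left, x, right
    return (path, None, y, b)

def ti2tree(it):
    path, left, x, right = it
    return plug_tree(path, (left, x, right))

def ti_first(a, x, b):
    return first_path(a, x, b, ())

def leaf(x):
    return (None, x, None)

t1 = (leaf(0), 1, leaf(2))
t5 = (leaf(4), 5, None)
t6 = (t5, 6, leaf(7))
root = (t1, 3, t6)

# Current node 5, reached as the left child of 6, which is the right child of 3.
path_to_5 = (("LeftD", 6, leaf(7)), ("RightD", t1, 3))

# Lemma: ti2tree(first_path(A, y, B, path)) = plug_tree(path, TreeNode(A, y, B))
lemma_lhs = ti2tree(first_path(leaf(4), 5, None, path_to_5))
lemma_rhs = plug_tree(path_to_5, t5)

# Theorem: ti2tree(ti_first(A, x, B)) = TreeNode(A, x, B)
theorem_lhs = ti2tree(ti_first(t1, 3, t6))
```

In the lemma instance both sides are the whole tree rooted at 3, not just the subtree rooted at 5, which is the reason the right-hand side needs plug_tree.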

Correctness of `ti_next`

We start by writing down a more careful specification of ti_next.

Specification: The ti_next(iter) operation returns an iterator whose position is one more than the position of iter with respect to in-order traversal, assuming iter is not at the end of the in-order traversal.

To make this specification formal, we can again use ti_index to talk about the position of the iterator. So we begin to prove the following theorem ti_next_index, taking the usual initial steps in the proof as guided by the formula to be proved and the definition of ti_next, which performs a switch on the right child R of the current node.

theorem ti_next_index: all E:type, iter : TreeIter<E>.
  if suc(ti_index(iter)) < num_nodes(ti2tree(iter))
  then ti_index(ti_next(iter)) = suc(ti_index(iter))
proof
  arbitrary E:type, iter : TreeIter<E>
  suppose prem: suc(ti_index(iter)) < num_nodes(ti2tree(iter))
  switch iter {
    case TrItr(path, L, x, R) suppose iter_eq {
      definition ti_next
      switch R {
        case EmptyTree suppose R_eq {
          ?
        }
        case TreeNode(RL, y, RR) suppose R_eq {
          ?
        }
      }
    }
  }
end
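To see what the theorem claims, one can step an iterator through a small tree and watch the index. The Python sketch below is a model under assumptions: next_up and ti_next are reconstructed from the proof steps in this post, the non-empty case of ti_next and the empty-path case of next_up are guesses at the definitions from the previous post, and the tree is hypothetical. A tree is None or (L, x, R), LeftD(y, B) is ("LeftD", y, B), RightD(L, y) is ("RightD", L, y), and TrItr(path, L, x, R) is a 4-tuple.

```python
def num_nodes(t):
    return 0 if t is None else num_nodes(t[0]) + 1 + num_nodes(t[2])

def plug_tree(path, t):
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def take_path(path):
    return tuple(d for d in path if d[0] == "RightD")

def first_path(a, y, b, path):
    while a is not None:
        left, x, right = a
        path = (("LeftD", y, b),) + path
        a, y, b = left, x, right
    return (path, None, y, b)

def next_up(path, a, z, b):
    # Climb until we leave a left subtree; that parent is the next node.
    while path:
        d, rest = path[0], path[1:]
        if d[0] == "LeftD":
            _, y, right = d
            return (rest, (a, z, b), y, right)
        _, left, y = d
        path, a, z, b = rest, left, y, (a, z, b)
    return (path, a, z, b)  # assumption: at the end, stay on the last node

def ti_next(it):
    path, left, x, right = it
    if right is None:
        return next_up(path, left, x, right)
    rl, y, rr = right
    # Assumption: step into the right child, then go to its first node.
    return first_path(rl, y, rr, (("RightD", left, x),) + path)

def ti_index(it):
    path, left, _, _ = it
    return num_nodes(plug_tree(take_path(path), left))

def leaf(x):
    return (None, x, None)

t1 = (leaf(0), 1, leaf(2))
t6 = ((leaf(4), 5, None), 6, leaf(7))
root = (t1, 3, t6)

it = first_path(t1, 3, t6, ())          # ti_first(t1, 3, t6)
visited = [(ti_index(it), it[2])]        # (index, element) pairs
for _ in range(num_nodes(root) - 1):
    it = ti_next(it)
    visited.append((ti_index(it), it[2]))
```

Because the nodes are labeled by in-order position, each pair in visited has equal components, and the index goes up by one at each step. The loop stops one step short of the end, matching the premise suc(ti_index(iter)) < num_nodes(ti2tree(iter)).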

In the case R = EmptyTree, ti_next calls the auxiliary function next_up and we need to prove the following.

ti_index(next_up(path,L,x,EmptyTree)) = suc(ti_index(TrItr(path,L,x,EmptyTree)))

As usual, we must create a lemma that generalizes this equation.

Proving the `next_up_index` lemma

Looking at the definition of next_up, we see that the recursive call grows the fourth argument, so we must replace the EmptyTree in the needed equation with an arbitrary tree R:

ti_index(next_up(path,L,x,R)) = suc(ti_index(TrItr(path,L,x,R)))

But this equation is not true in general. Consider the situation below where the current node x is node 1 in our example tree. The index of the next_up from node 1 is 3, but the index of node 1 is 1 and of course, adding one to that is 2, not 3!

So we need to change this equation to account for the situation where R is not empty, but instead an arbitrary subtree. The solution is to add the number of nodes in R to the right-hand side:

ti_index(next_up(path,L,x,R)) = suc(ti_index(TrItr(path,L,x,R))) + num_nodes(R)
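The failure of the first equation and the repair can be replayed in a Python sketch. The definitions are reconstructed from the proof steps in this post and are assumptions, as is the small tree, in which node 1 has children 0 and 2 and is the left child of node 3. A tree is None or (L, x, R), LeftD(y, B) is ("LeftD", y, B), RightD(L, y) is ("RightD", L, y), and TrItr(path, L, x, R) is a 4-tuple.

```python
def num_nodes(t):
    return 0 if t is None else num_nodes(t[0]) + 1 + num_nodes(t[2])

def plug_tree(path, t):
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def take_path(path):
    return tuple(d for d in path if d[0] == "RightD")

def ti_index(it):
    path, left, _, _ = it
    return num_nodes(plug_tree(take_path(path), left))

def next_up(path, a, z, b):
    # Climb until we leave a left subtree; that parent is the next node.
    while path:
        d, rest = path[0], path[1:]
        if d[0] == "LeftD":
            _, y, right = d
            return (rest, (a, z, b), y, right)
        _, left, y = d
        path, a, z, b = rest, left, y, (a, z, b)
    return (path, a, z, b)  # assumption: behavior on the empty path

def leaf(x):
    return (None, x, None)

t6 = ((leaf(4), 5, None), 6, leaf(7))

# Current node 1 with L = leaf(0) and a non-empty R = leaf(2), under parent 3.
path = (("LeftD", 3, t6),)
L, x, R = leaf(0), 1, leaf(2)

lhs = ti_index(next_up(path, L, x, R))
naive_rhs = ti_index((path, L, x, R)) + 1                     # suc(index)
fixed_rhs = ti_index((path, L, x, R)) + 1 + num_nodes(R)      # suc(index) + num_nodes(R)
```

With these values lhs is 3, naive_rhs is 2, and fixed_rhs is 3, matching the discussion of node 1 above.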

One more addition is necessary to formulate the lemma. The above equation is only meaningful when the index on the right-hand side is in bounds. That is, it must be smaller than the number of nodes in the tree. So we formulate the lemma next_up_index as follows and take a few obvious steps into the proof.

lemma next_up_index: all E:type. all path:List<Direction<E>>. all A:Tree<E>, x:E, B:Tree<E>.
  if suc(ti_index(TrItr(path, A, x, B)) + num_nodes(B)) < num_nodes(ti2tree(TrItr(path, A, x, B)))
  then ti_index(next_up(path, A, x, B)) = suc(ti_index(TrItr(path, A,x,B)) + num_nodes(B))
proof
  arbitrary E:type
  induction List<Direction<E>>
  case empty {
    arbitrary A:Tree<E>, x:E, B:Tree<E>
    suppose prem: suc(ti_index(TrItr(empty,A,x,B)) + num_nodes(B)) 
                  < num_nodes(ti2tree(TrItr(empty,A,x,B)))
    ?
  }
  case node(f, path') suppose IH {
    arbitrary A:Tree<E>, x:E, B:Tree<E>
    suppose prem
    switch f {
      case LeftD(y, R) {
        ?
      }
      case RightD(L, y) suppose f_eq {
        ?
      }
    }
  }
end

In the case path = empty, the premise is false because there are no nodes that come afterwards in the in-order traversal. In particular, the premise implies the following contradictory inequality.

    have AB_l_AB: suc(num_nodes(A) + num_nodes(B)) < suc(num_nodes(A) + num_nodes(B))
      by definition {ti_index, ti_take, take_path, plug_tree, ti2tree, num_nodes} 
         in prem
    conclude false  by apply less_irreflexive to AB_l_AB

Next consider the case path = node(LeftD(y, R), path'). After expanding all the relevant definitions, we need to prove that

  num_nodes(plug_tree(take_path(path'), TreeNode(A,x,B))) 
= suc(num_nodes(plug_tree(take_path(path'), A)) + num_nodes(B))

We need a lemma that relates num_nodes and plug_tree. So we pause the current proof for the following exercise.

Exercise: prove the `num_nodes_plug` lemma

lemma num_nodes_plug: all E:type. all path:List<Direction<E>>. all t:Tree<E>.
  num_nodes(plug_tree(path, t)) = num_nodes(plug_tree(path, EmptyTree)) + num_nodes(t)
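Before attempting the proof, the statement can be sanity-checked on examples. This Python sketch tests the equation on one hypothetical path and a few trees; it is a check, not a proof, and plug_tree is reconstructed from the proof steps in this post, so it is an assumption. A tree is None or (L, x, R), LeftD(y, B) is ("LeftD", y, B), and RightD(L, y) is ("RightD", L, y).

```python
def num_nodes(t):
    return 0 if t is None else num_nodes(t[0]) + 1 + num_nodes(t[2])

def plug_tree(path, t):
    # Rebuild the enclosing tree by walking up the path from t.
    for d in path:
        if d[0] == "LeftD":
            _, y, right = d
            t = (t, y, right)
        else:
            _, left, y = d
            t = (left, y, t)
    return t

def leaf(x):
    return (None, x, None)

t1 = (leaf(0), 1, leaf(2))
t5 = (leaf(4), 5, None)
path = (("LeftD", 6, leaf(7)), ("RightD", t1, 3))

# num_nodes(plug_tree(path, t)) = num_nodes(plug_tree(path, EmptyTree)) + num_nodes(t)
hole_size = num_nodes(plug_tree(path, None))
checks = [num_nodes(plug_tree(path, t)) == hole_size + num_nodes(t)
          for t in (None, leaf(4), t5, t1)]
```

The intuition the check supports is that plugging t into a path adds exactly the nodes of t to the nodes already stored along the path.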

Back to the `next_up_index` lemma

We use num_nodes_plug on both the left and right-hand sides of the equation, and apply the definition of num_nodes.

    rewrite num_nodes_plug[E][take_path(path')][TreeNode(A,x,B)]
    rewrite num_nodes_plug[E][take_path(path')][A]
    definition num_nodes

After that it suffices to prove the following.

  num_nodes(plug_tree(take_path(path'),EmptyTree)) + suc(num_nodes(A) + num_nodes(B)) 
= suc((num_nodes(plug_tree(take_path(path'),EmptyTree)) + num_nodes(A)) + num_nodes(B))

This equation is rather big, so let’s squint at it by giving names to its parts. (This is a new version of define that I’m experimenting with.)

    define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
    define_ Y = num_nodes(A)
    define_ Z = num_nodes(B)

Now it’s easy to see that our goal is true using some simple arithmetic.

    conclude X + suc(Y + Z) = suc((X + Y) + Z)
        by rewrite add_suc[X][Y+Z] | add_assoc[X][Y,Z].

Finally, consider the case path = node(RightD(L, y), path'). After expanding the definition of next_up, we need to prove

  ti_index(next_up(path',L,y,TreeNode(A,x,B))) 
= suc(ti_index(TrItr(node(RightD(L,y),path'),A,x,B)) + num_nodes(B))

The left-hand side matches the induction hypothesis, so we have

    equations
      ti_index(next_up(path',L,y,TreeNode(A,x,B))) 
        = suc(ti_index(TrItr(path',L,y,TreeNode(A,x,B))) + num_nodes(TreeNode(A,x,B)))
            by apply IH[L,y,TreeNode(A,x,B)] 
               to definition {ti_index, ti_take, num_nodes, ti2tree} ?
    ... = suc(ti_index(TrItr(node(RightD(L,y),path'),A,x,B)) + num_nodes(B))
            by ?

But we need to prove the premise of the induction hypothesis. We can do that as follows, with many uses of num_nodes_plug and some arithmetic that we package up into lemma XYZW_equal.

    have IH_prem: suc(num_nodes(plug_tree(take_path(path'),L)) 
                      + suc(num_nodes(A) + num_nodes(B))) 
                  < num_nodes(plug_tree(path',TreeNode(L,y,TreeNode(A,x,B))))
      by rewrite num_nodes_plug[E][take_path(path')][L]
          | num_nodes_plug[E][path'][TreeNode(L,y,TreeNode(A,x,B))]
         definition {num_nodes, num_nodes}
         define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
         define_ Y = num_nodes(L) define_ Z = num_nodes(A) define_ W = num_nodes(B)
         define_ P = num_nodes(plug_tree(path',EmptyTree))
         suffices suc((X + Y) + suc(Z + W)) < P + suc(Y + suc(Z + W))
         have prem2: suc((X + suc(Y + Z)) + W) < P + suc(Y + suc(Z + W))
           by enable {X,Y,Z,W,P}
              definition {num_nodes, num_nodes} in
              rewrite num_nodes_plug[E][take_path(path')][TreeNode(L,y,A)]
                    | num_nodes_plug[E][path'][TreeNode(L,y,TreeNode(A,x,B))] in
              definition {ti_index, ti_take, take_path, ti2tree, plug_tree} in
              rewrite f_eq in prem
         rewrite XYZW_equal[X,Y,Z,W]
         prem2

Here is the proof of XYZW_equal.

lemma XYZW_equal: all X:Nat, Y:Nat, Z:Nat, W:Nat.
  suc((X + Y) + suc(Z + W)) = suc((X + suc(Y + Z)) + W)
proof
  arbitrary X:Nat, Y:Nat, Z:Nat, W:Nat
  enable {operator+}
  equations
        suc((X + Y) + suc(Z + W))
      = suc(suc(X + Y) + (Z + W))      by rewrite add_suc[X+Y][Z+W].
  ... = suc(suc(((X + Y) + Z) + W))    by rewrite add_assoc[X+Y][Z,W].
  ... = suc(suc((X + (Y + Z)) + W))    by rewrite add_assoc[X][Y,Z].
  ... = suc((X + suc(Y + Z)) + W)      by rewrite add_suc[X][Y+Z].
end
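Since XYZW_equal is plain arithmetic on natural numbers, it is cheap to test exhaustively on small values. The following Python sketch models suc as adding one and checks the equation for all X, Y, Z, W below 5; the bound is arbitrary and the check complements, rather than replaces, the equational proof.

```python
from itertools import product

def suc(n):
    # Successor on natural numbers, modeled with Python integers.
    return n + 1

# suc((X + Y) + suc(Z + W)) = suc((X + suc(Y + Z)) + W)
holds = all(
    suc((x + y) + suc(z + w)) == suc((x + suc(y + z)) + w)
    for x, y, z, w in product(range(5), repeat=4)
)
```

Both sides count to X + Y + Z + W + 2, which is what the add_suc and add_assoc steps rearrange.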

Getting back to the equational proof, it remains to prove that

  suc(ti_index(TrItr(path',L,y,TreeNode(A,x,B))) + num_nodes(TreeNode(A,x,B)))
= suc(ti_index(TrItr(node(RightD(L,y),path'),A,x,B)) + num_nodes(B))

which we can do with yet more uses of num_nodes_plug and XYZW_equal.

    ... = suc(num_nodes(plug_tree(take_path(path'),L)) + suc(num_nodes(A) + num_nodes(B)))
          by definition {ti_index, ti_take, num_nodes}.
    ... = suc((num_nodes(plug_tree(take_path(path'),EmptyTree)) + num_nodes(L))
              + suc(num_nodes(A) + num_nodes(B)))
          by rewrite num_nodes_plug[E][take_path(path')][L].
    ... = suc((num_nodes(plug_tree(take_path(path'),EmptyTree)) 
              + suc(num_nodes(L) + num_nodes(A))) + num_nodes(B))
          by define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
             define_ Y = num_nodes(L) define_ Z = num_nodes(A) define_ W = num_nodes(B)
             define_ P = num_nodes(plug_tree(path',EmptyTree))
             conclude suc((X + Y) + suc(Z + W)) = suc((X + suc(Y + Z)) + W)
                 by XYZW_equal[X,Y,Z,W]
    ... = suc(num_nodes(plug_tree(take_path(path'),TreeNode(L,y,A))) + num_nodes(B))
          by rewrite num_nodes_plug[E][take_path(path')][TreeNode(L,y,A)]
             definition {num_nodes, num_nodes}.
    ... = suc(ti_index(TrItr(node(RightD(L,y),path'),A,x,B)) + num_nodes(B))
          by definition {ti_index, ti_take, take_path, plug_tree}.

That completes the last case of the proof of next_up_index. Here’s the completed proof.

lemma next_up_index: all E:type. all path:List<Direction<E>>. all A:Tree<E>, x:E, B:Tree<E>.
  if suc(ti_index(TrItr(path, A, x, B)) + num_nodes(B)) < num_nodes(ti2tree(TrItr(path, A, x, B)))
  then ti_index(next_up(path, A, x, B)) = suc(ti_index(TrItr(path, A,x,B)) + num_nodes(B))
proof
  arbitrary E:type
  induction List<Direction<E>>
  case empty {
    arbitrary A:Tree<E>, x:E, B:Tree<E>
    suppose prem: suc(ti_index(TrItr(empty,A,x,B)) + num_nodes(B)) 
                  < num_nodes(ti2tree(TrItr(empty,A,x,B)))
    have AB_l_AB: suc(num_nodes(A) + num_nodes(B)) < suc(num_nodes(A) + num_nodes(B))
      by definition {ti_index, ti_take, take_path, plug_tree, ti2tree, num_nodes} 
         in prem
    conclude false  by apply less_irreflexive to AB_l_AB
  }
  case node(f, path') suppose IH {
    arbitrary A:Tree<E>, x:E, B:Tree<E>
    suppose prem
    switch f {
      case LeftD(y, R) {
        definition {next_up, ti_index, ti_take, take_path}
        rewrite num_nodes_plug[E][take_path(path')][TreeNode(A,x,B)]
        rewrite num_nodes_plug[E][take_path(path')][A]
        definition num_nodes
        define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
        define_ Y = num_nodes(A)
        define_ Z = num_nodes(B)
        conclude X + suc(Y + Z) = suc((X + Y) + Z)
            by rewrite add_suc[X][Y+Z] | add_assoc[X][Y,Z].
      }
      case RightD(L, y) suppose f_eq {
        definition {next_up}
        have IH_prem: suc(num_nodes(plug_tree(take_path(path'),L)) 
                          + suc(num_nodes(A) + num_nodes(B))) 
                      < num_nodes(plug_tree(path',TreeNode(L,y,TreeNode(A,x,B))))
          by rewrite num_nodes_plug[E][take_path(path')][L]
              | num_nodes_plug[E][path'][TreeNode(L,y,TreeNode(A,x,B))]
             definition {num_nodes, num_nodes}
             define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
             define_ Y = num_nodes(L) define_ Z = num_nodes(A) define_ W = num_nodes(B)
             define_ P = num_nodes(plug_tree(path',EmptyTree))
             suffices suc((X + Y) + suc(Z + W)) < P + suc(Y + suc(Z + W))
             have prem2: suc((X + suc(Y + Z)) + W) < P + suc(Y + suc(Z + W))
               by enable {X,Y,Z,W,P}
                  definition {num_nodes, num_nodes} in
                  rewrite num_nodes_plug[E][take_path(path')][TreeNode(L,y,A)]
                        | num_nodes_plug[E][path'][TreeNode(L,y,TreeNode(A,x,B))] in
                  definition {ti_index, ti_take, take_path, ti2tree, plug_tree} in
                  rewrite f_eq in prem
             rewrite XYZW_equal[X,Y,Z,W]
             prem2
        equations
              ti_index(next_up(path',L,y,TreeNode(A,x,B))) 
            = suc(ti_index(TrItr(path',L,y,TreeNode(A,x,B))) + num_nodes(TreeNode(A,x,B)))
                by apply IH[L,y,TreeNode(A,x,B)] 
                   to definition {ti_index, ti_take, num_nodes, ti2tree} IH_prem
        ... = suc(num_nodes(plug_tree(take_path(path'),L)) + suc(num_nodes(A) + num_nodes(B)))
              by definition {ti_index, ti_take, num_nodes}.
        ... = suc((num_nodes(plug_tree(take_path(path'),EmptyTree)) + num_nodes(L))
                  + suc(num_nodes(A) + num_nodes(B)))
              by rewrite num_nodes_plug[E][take_path(path')][L].
        ... = suc((num_nodes(plug_tree(take_path(path'),EmptyTree)) 
                  + suc(num_nodes(L) + num_nodes(A))) + num_nodes(B))
              by define_ X = num_nodes(plug_tree(take_path(path'),EmptyTree))
                 define_ Y = num_nodes(L) define_ Z = num_nodes(A) define_ W = num_nodes(B)
                 define_ P = num_nodes(plug_tree(path',EmptyTree))
                 conclude suc((X + Y) + suc(Z + W)) = suc((X + suc(Y + Z)) + W)
                     by XYZW_equal[X,Y,Z,W]
        ... = suc(num_nodes(plug_tree(take_path(path'),TreeNode(L,y,A))) + num_nodes(B))
              by rewrite num_nodes_plug[E][take_path(path')][TreeNode(L,y,A)]
                 definition {num_nodes, num_nodes}.
        ... = suc(ti_index(TrItr(node(RightD(L,y),path'),A,x,B)) + num_nodes(B))
              by definition {ti_index, ti_take, take_path, plug_tree}.
      }
    }
  }
end

Back to the proof of `ti_next_index`

With the next_up_index lemma complete, we can get back to proving the ti_next_index theorem. Recall that we were in the case R = EmptyTree and needed to prove the following.

ti_index(next_up(path,L,x,EmptyTree)) = suc(ti_index(TrItr(path,L,x,EmptyTree)))

To use the next_up_index lemma, we need to prove its premise:

    have next_up_index_prem:
        suc(ti_index(TrItr(path,L,x,EmptyTree)) + num_nodes(EmptyTree))
        < num_nodes(ti2tree(TrItr(path,L,x,EmptyTree)))
      by enable num_nodes
         rewrite add_zero[ti_index(TrItr(path,L,x,EmptyTree))]
         rewrite iter_eq | R_eq in prem

We can finish the proof of the equation using the definition of num_nodes and the add_zero property.

    equations
          ti_index(next_up(path,L,x,EmptyTree))
        = suc(ti_index(TrItr(path,L,x,EmptyTree)) + num_nodes(EmptyTree))
          by apply next_up_index[E][path][L, x, EmptyTree] to next_up_index_prem
    ... = suc(ti_index(TrItr(path,L,x,EmptyTree)))
          by definition num_nodes
             rewrite add_zero[ti_index(TrItr(path,L,x,EmptyTree))].

The next case in the proof of ti_next_index is for R = TreeNode(RL, y, RR). We need to prove

  ti_index(first_path(RL,y,RR,node(RightD(L,x),path))) 
= suc(ti_index(TrItr(path,L,x,TreeNode(RL,y,RR))))

We can start by applying the first_path_index lemma, which gives us

equations
      ti_index(first_path(RL,y,RR,node(RightD(L,x),path))) 
    = num_nodes(plug_tree(take_path(node(RightD(L,x),path)),EmptyTree))

We have opportunities to expand take_path and then plug_tree.

... = num_nodes(plug_tree(take_path(path),TreeNode(L,x,EmptyTree)))
        by definition {take_path,plug_tree}.

We can separate out the TreeNode(L,x,EmptyTree) using num_nodes_plug.

... = num_nodes(plug_tree(take_path(path),EmptyTree)) + suc(num_nodes(L))
        by rewrite num_nodes_plug[E][take_path(path)][TreeNode(L,x,EmptyTree)]
           definition {num_nodes, num_nodes}
           rewrite add_zero[num_nodes(L)].

Then we can move the L back into the plug_tree with num_nodes_plug.

... = suc(num_nodes(plug_tree(take_path(path),L)))
       by rewrite add_suc[num_nodes(plug_tree(take_path(path),EmptyTree))][num_nodes(L)]
          rewrite num_nodes_plug[E][take_path(path)][L].

We conclude the equational reasoning with the definition of ti_index and ti_take.

... = suc(ti_index(TrItr(path,L,x,TreeNode(RL,y,RR))))
        by definition {ti_index, ti_take}.

Here is the complete proof of ti_next_index.

theorem ti_next_index: all E:type, iter : TreeIter<E>.
  if suc(ti_index(iter)) < num_nodes(ti2tree(iter))
  then ti_index(ti_next(iter)) = suc(ti_index(iter))
proof
  arbitrary E:type, iter : TreeIter<E>
  suppose prem: suc(ti_index(iter)) < num_nodes(ti2tree(iter))
  switch iter {
    case TrItr(path, L, x, R) suppose iter_eq {
      definition ti_next
      switch R {
        case EmptyTree suppose R_eq {
          have next_up_index_prem:
              suc(ti_index(TrItr(path,L,x,EmptyTree)) + num_nodes(EmptyTree))
              < num_nodes(ti2tree(TrItr(path,L,x,EmptyTree)))
            by enable num_nodes
               rewrite add_zero[ti_index(TrItr(path,L,x,EmptyTree))]
               rewrite iter_eq | R_eq in prem
          equations
                ti_index(next_up(path,L,x,EmptyTree))
              = suc(ti_index(TrItr(path,L,x,EmptyTree)) + num_nodes(EmptyTree))
                by apply next_up_index[E][path][L, x, EmptyTree] to next_up_index_prem
          ... = suc(ti_index(TrItr(path,L,x,EmptyTree)))
                by definition num_nodes
                   rewrite add_zero[ti_index(TrItr(path,L,x,EmptyTree))].
        }
        case TreeNode(RL, y, RR) suppose R_eq {
          equations
                ti_index(first_path(RL,y,RR,node(RightD(L,x),path))) 
              = num_nodes(plug_tree(take_path(node(RightD(L,x),path)),EmptyTree))
                  by first_path_index[E][RL][y,RR,node(RightD(L,x),path)]
          ... = num_nodes(plug_tree(take_path(path),TreeNode(L,x,EmptyTree)))
                  by definition {take_path,plug_tree}.
          ... = num_nodes(plug_tree(take_path(path),EmptyTree)) + suc(num_nodes(L))
                  by rewrite num_nodes_plug[E][take_path(path)][TreeNode(L,x,EmptyTree)]
                     definition {num_nodes, num_nodes}
                     rewrite add_zero[num_nodes(L)].
          ... = suc(num_nodes(plug_tree(take_path(path),L)))
                 by rewrite add_suc[num_nodes(plug_tree(take_path(path),EmptyTree))][num_nodes(L)]
                    rewrite num_nodes_plug[E][take_path(path)][L].
          ... = suc(ti_index(TrItr(path,L,x,TreeNode(RL,y,RR))))
                  by definition {ti_index, ti_take}.

        }
      }
   }
  }
end
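A theorem like ti_next_index can also be tested before or after proving it. The following is a rough Python model of the iterator operations, not Deduce code. It assumes that None stands for EmptyTree, a tuple (L, x, R) for TreeNode(L, x, R), and tagged tuples for LeftD and RightD. The helpers all_trees and check_ti_next_index are made up for this sketch. It walks every tree shape with up to five nodes and checks the conclusion of the theorem wherever the premise holds.

```python
def num_nodes(t):
    if t is None:
        return 0
    L, _, R = t
    return 1 + num_nodes(L) + num_nodes(R)

def plug_tree(path, t):
    # Rebuild the tree by walking up the path, one direction at a time.
    # ("LeftD", x, R) models LeftD(x, R); ("RightD", L, x) models RightD(L, x).
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def take_path(path):
    # Keep only the RightD steps, i.e. the parts of the tree before the position.
    return [d for d in path if d[0] == "RightD"]

def first_path(L, x, R, path):
    while L is not None:
        LL, y, LR = L
        path = [("LeftD", x, R)] + path
        L, x, R = LL, y, LR
    return (path, None, x, R)

def next_up(path, A, z, B):
    if not path:
        return (path, A, z, B)
    d, rest = path[0], path[1:]
    if d[0] == "RightD":
        return next_up(rest, d[1], d[2], (A, z, B))
    return (rest, (A, z, B), d[1], d[2])

def ti_next(it):
    path, L, x, R = it
    if R is None:
        return next_up(path, L, x, R)
    RL, y, RR = R
    return first_path(RL, y, RR, [("RightD", L, x)] + path)

def ti2tree(it):
    path, L, x, R = it
    return plug_tree(path, (L, x, R))

def ti_index(it):
    path, L, x, R = it
    return num_nodes(plug_tree(take_path(path), L))

def all_trees(n):
    """All binary tree shapes with n nodes (hypothetical helper; labels are all 0)."""
    if n == 0:
        return [None]
    return [(L, 0, R)
            for k in range(n)
            for L in all_trees(k)
            for R in all_trees(n - 1 - k)]

def check_ti_next_index(max_nodes):
    """Count the positions at which the premise held and the conclusion was checked."""
    checked = 0
    for n in range(1, max_nodes + 1):
        for (L, x, R) in all_trees(n):
            it = first_path(L, x, R, [])
            while ti_index(it) + 1 < num_nodes(ti2tree(it)):
                nxt = ti_next(it)
                assert ti_index(nxt) == ti_index(it) + 1
                it = nxt
                checked += 1
    return checked

print(check_ti_next_index(5))
```

A test like this only covers small trees, so it complements the proof rather than replacing it.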

Proof of `ti_next_stable`

The second correctness condition for ti_next(iter) is that it is stable with respect to ti2tree. Following the definition of ti_next, we switch on the iterator and then on the right child of the current node.

theorem ti_next_stable: all E:type, iter:TreeIter<E>.
  ti2tree(ti_next(iter)) = ti2tree(iter)
proof
  arbitrary E:type, iter:TreeIter<E>
  switch iter {
    case TrItr(path, L, x, R) {
      switch R {
        case EmptyTree {
          definition {ti2tree, ti_next}
          ?
        }
        case TreeNode(RL, y, RR) {
          definition {ti2tree, ti_next}
          ?
        }
      }
    }
  }
end

For the case R = EmptyTree, we need to prove the following, which amounts to proving that next_up is stable.

ti2tree(next_up(path,L,x,EmptyTree)) = plug_tree(path,TreeNode(L,x,EmptyTree))

We’ll pause the current proof to prove the next_up_stable lemma.

Exercise: `next_up_stable` lemma

lemma next_up_stable: all E:type. all path:List<Direction<E>>. all A:Tree<E>, y:E, B:Tree<E>.
  ti2tree(next_up(path, A, y, B)) = plug_tree(path, TreeNode(A,y,B))
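Before starting on the proof, it may help to convince ourselves that the lemma is plausible. Here is a small randomized test in Python (a model of the Deduce definitions, not Deduce itself). It assumes None for EmptyTree, tuples (L, x, R) for tree nodes, and tagged tuples for directions. The generators random_tree and random_path and the function next_up_stable_holds are hypothetical helpers introduced only for this sketch.

```python
import random

def plug_tree(path, t):
    # ("LeftD", x, R) models LeftD(x, R); ("RightD", L, x) models RightD(L, x).
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def next_up(path, A, z, B):
    if not path:
        return (path, A, z, B)
    d, rest = path[0], path[1:]
    if d[0] == "RightD":
        return next_up(rest, d[1], d[2], (A, z, B))
    return (rest, (A, z, B), d[1], d[2])

def ti2tree(it):
    path, L, x, R = it
    return plug_tree(path, (L, x, R))

def random_tree(rng, depth):
    # A small random tree whose height is at most `depth`.
    if depth == 0 or rng.random() < 0.3:
        return None
    return (random_tree(rng, depth - 1), rng.randrange(100),
            random_tree(rng, depth - 1))

def random_path(rng, length):
    # A random list of LeftD/RightD steps carrying small random subtrees.
    path = []
    for _ in range(length):
        if rng.random() < 0.5:
            path.append(("LeftD", rng.randrange(100), random_tree(rng, 2)))
        else:
            path.append(("RightD", random_tree(rng, 2), rng.randrange(100)))
    return path

def next_up_stable_holds(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        path = random_path(rng, rng.randrange(6))
        A, y, B = random_tree(rng, 3), rng.randrange(100), random_tree(rng, 3)
        if ti2tree(next_up(path, A, y, B)) != plug_tree(path, (A, y, B)):
            return False
    return True

print(next_up_stable_holds())
```

The case split inside next_up (empty path, RightD, LeftD) suggests the shape that an induction on the path would take.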

Back to `ti_next_stable`

Now we conclude the R = EmptyTree case of the ti_next_stable theorem.

    conclude ti2tree(next_up(path,L,x,EmptyTree))
       = plug_tree(path,TreeNode(L,x,EmptyTree))
      by next_up_stable[E][path][L,x,EmptyTree]

In the case R = TreeNode(RL, y, RR), we need to prove the following, which is to say that first_path is stable. Thankfully we already proved that lemma!

    conclude ti2tree(first_path(RL,y,RR,node(RightD(L,x),path))) 
           = plug_tree(path,TreeNode(L,x,TreeNode(RL,y,RR)))
      by rewrite first_path_stable[E][RL][y,RR,node(RightD(L,x),path)]
         definition {plug_tree}.

Here is the completed proof of ti_next_stable.

theorem ti_next_stable: all E:type, iter:TreeIter<E>.
  ti2tree(ti_next(iter)) = ti2tree(iter)
proof
  arbitrary E:type, iter:TreeIter<E>
  switch iter {
    case TrItr(path, L, x, R) {
      switch R {
        case EmptyTree {
          definition {ti2tree, ti_next}
          conclude ti2tree(next_up(path,L,x,EmptyTree))
             = plug_tree(path,TreeNode(L,x,EmptyTree))
            by next_up_stable[E][path][L,x,EmptyTree]
        }
        case TreeNode(RL, y, RR) {
          definition {ti2tree, ti_next}
          conclude ti2tree(first_path(RL,y,RR,node(RightD(L,x),path))) 
                 = plug_tree(path,TreeNode(L,x,TreeNode(RL,y,RR)))
            by rewrite first_path_stable[E][RL][y,RR,node(RightD(L,x),path)]
               definition {plug_tree}.
        }
      }
    }
  }
end

Correctness of `ti_get` and `ti_index`

Recall that ti_get(iter) should return the data in the current node of iter and ti_index should return the position of iter as a natural number with respect to in-order traversal. Thus, if we apply in_order to the tree, the element at position ti_index(iter) should be the same as ti_get(iter). So we have the following theorem to prove.

theorem ti_index_get_in_order: all E:type, iter:TreeIter<E>, a:E.
  ti_get(iter) = nth(in_order(ti2tree(iter)), a)(ti_index(iter))
proof
  arbitrary E:type, iter:TreeIter<E>, a:E
  switch iter {
    case TrItr(path, L, x, R) {
      definition {ti2tree, ti_get, ti_index, ti_take}
      ?
    }
  }
end

After expanding with some definitions, we are left to prove

x = nth(in_order(plug_tree(path,TreeNode(L,x,R))),a)
       (num_nodes(plug_tree(take_path(path),L)))

We see num_nodes applied to plug_tree, so we can use the num_nodes_plug lemma

      rewrite num_nodes_plug[E][take_path(path)][L]

The goal now is to prove

x = nth(in_order(plug_tree(path, TreeNode(L,x,R))),a)
       (num_nodes(plug_tree(take_path(path), EmptyTree)) + num_nodes(L))

The next step to take is not so obvious. Perhaps one hint is that we have the following theorem about nth from List.pf that also involves addition in the index argument of nth.

theorem nth_append_back: all T:type. all xs:List<T>. all ys:List<T>, i:Nat, d:T.
  nth(append(xs, ys), d)(length(xs) + i) = nth(ys, d)(i)
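To get a feel for what nth_append_back says, here is a tiny Python illustration. It assumes that nth(xs, d)(i) returns the element at position i, or the default d when i is out of range; that is our reading of nth, so treat the out-of-range behavior as an assumption.

```python
def nth(xs, default):
    # Curried like the Deduce version: nth(xs, d) returns a function of the index.
    def at(i):
        return xs[i] if 0 <= i < len(xs) else default
    return at

xs, ys, d = [10, 20, 30], [40, 50], -1

# Indexing past xs in the appended list is the same as indexing into ys.
for i in range(len(ys) + 2):
    assert nth(xs + ys, d)(len(xs) + i) == nth(ys, d)(i)

# With i = 0 the index is just len(xs), which picks out the head of ys.
print(nth(xs + ys, d)(len(xs)))
```

The i = 0 instance is the one that matters here: if the index is exactly the length of the front list, we land on the first element of the back list.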

So we would need to prove a lemma that relates in_order and plug_tree to append. Now the take_path function returns the part of the tree before the path, so perhaps it can be used to create the xs in nth_append_back. But what about ys? It seems like we need a function that returns the part of the tree after the path. Let us call this function drop_path.

function drop_path<E>(List<Direction<E>>) -> List<Direction<E>> {
  drop_path(empty) = empty
  drop_path(node(f, path')) =
    switch f {
      case RightD(L, x) {
        drop_path(path')
      }
      case LeftD(x, R) {
        node(LeftD(x, R), drop_path(path'))
      }
    }
}

So using take_path and drop_path, we should be able to come up with an equation for in_order(plug_tree(path, TreeNode(A, x, B))). The part of the tree before x should be take_path(path) followed by the subtree A. The part of the tree after x should be the subtree B followed by drop_path(path).

lemma in_order_plug_take_drop: all E:type. all path:List<Direction<E>>. all A:Tree<E>, x:E, B:Tree<E>.
  in_order(plug_tree(path, TreeNode(A, x, B)))
  = append(in_order(plug_tree(take_path(path), A)), 
           node(x, in_order(plug_tree(drop_path(path), B))))

It turns out that to prove this, we will also need a lemma about the combination of plug_tree and take_path:

lemma in_order_plug_take: all E:type. all path:List<Direction<E>>. all t:Tree<E>.
  in_order(plug_tree(take_path(path), t)) 
  = append( in_order(plug_tree(take_path(path),EmptyTree)), in_order(t))

and a lemma about the combination of plug_tree and drop_path:

lemma in_order_plug_drop: all E:type. all path:List<Direction<E>>. all t:Tree<E>.
  in_order(plug_tree(drop_path(path), t)) = append( in_order(t), in_order(plug_tree(drop_path(path),EmptyTree)))

Exercise: prove the `in_order_plug...` lemmas

Prove the three lemmas in_order_plug_take_drop, in_order_plug_take, and in_order_plug_drop.
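As a sanity check before proving them, we can try the three lemmas on one concrete position. The Python sketch below is a model, not Deduce. It assumes None for EmptyTree, tuples for tree nodes, and tagged tuples for directions, and it uses the position of node 4 in the example tree T3 from Part 1.

```python
def in_order(t):
    if t is None:
        return []
    L, x, R = t
    return in_order(L) + [x] + in_order(R)

def plug_tree(path, t):
    # ("LeftD", x, R) models LeftD(x, R); ("RightD", L, x) models RightD(L, x).
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def take_path(path):
    return [d for d in path if d[0] == "RightD"]

def drop_path(path):
    return [d for d in path if d[0] == "LeftD"]

T0 = (None, 0, None)
T2 = (None, 2, None)
T1 = (T0, 1, T2)
T7 = (None, 7, None)

# Path from node 4 up to the root of T3, nearest step first.
path = [("LeftD", 5, None), ("LeftD", 6, T7), ("RightD", T1, 3)]
A, x, B = None, 4, None

# in_order_plug_take_drop
before = in_order(plug_tree(take_path(path), A))
after = in_order(plug_tree(drop_path(path), B))
assert in_order(plug_tree(path, (A, x, B))) == before + [x] + after

# in_order_plug_take and in_order_plug_drop, with t a one-node tree
t = (None, 4, None)
assert (in_order(plug_tree(take_path(path), t))
        == in_order(plug_tree(take_path(path), None)) + in_order(t))
assert (in_order(plug_tree(drop_path(path), t))
        == in_order(t) + in_order(plug_tree(drop_path(path), None)))

print(before, [x], after)
```

One example proves nothing in general, but it does confirm that take_path and drop_path split the tree on the correct sides of x.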

Back to the proof of `ti_index_get_in_order`

Our goal was to prove

x = nth(in_order(plug_tree(path,TreeNode(L,x,R))), a)
       (num_nodes(plug_tree(take_path(path),EmptyTree)) + num_nodes(L))

So we use lemma in_order_plug_take_drop to get the following

  in_order(plug_tree(path,TreeNode(L,x,R)))
= append(in_order(plug_tree(take_path(path),L)), node(x, in_order(plug_tree(drop_path(path),R))))

and then lemma in_order_plug_take separates out the L.

  in_order(plug_tree(take_path(path), L))
= append(in_order(plug_tree(take_path(path),EmptyTree)), in_order(L))

So rewriting with the above equations

    rewrite in_order_plug_take_drop[E][path][L,x,R]
    rewrite in_order_plug_take[E][path][L]

transforms our goal to

x = nth(append(append(in_order(plug_tree(take_path(path),EmptyTree)), in_order(L)),
               node(x,in_order(plug_tree(drop_path(path),R)))),a)
       (num_nodes(plug_tree(take_path(path),EmptyTree)) + num_nodes(L))

Recall that our plan is to use the nth_append_back theorem, in which the index argument to nth is expressed in terms of length(xs), but in the above we have the index expressed in terms of num_nodes. The following exercise proves a theorem that relates length and in_order to num_nodes.

Exercise: prove the `length_in_order` theorem

theorem length_in_order: all E:type. all t:Tree<E>.
  length(in_order(t)) = num_nodes(t)

Back to `ti_index_get_in_order`

Now we rewrite with the length_in_order theorem a couple of times, give some short names to these big expressions, and apply length_append from List.pf.

      rewrite symmetric length_in_order[E][L]
            | symmetric length_in_order[E][plug_tree(take_path(path),EmptyTree)]
      define_ X = in_order(plug_tree(take_path(path),EmptyTree))
      define_ Y = in_order(L)
      define_ Z = in_order(plug_tree(drop_path(path),R))
      rewrite symmetric length_append[E][X][Y]

Now we’re in a position to use nth_append_back.

x = nth(append(append(X,Y), node(x, Z)), a)
       (length(append(X,Y)))

In particular, nth_append_back[E][append(X,Y)][node(x,Z), 0, a] gives us

  nth(append(append(X,Y), node(x,Z)),a)(length(append(X,Y)) + 0) 
= nth(node(x,Z),a)(0)

With that we prove the goal using add_zero and the definition of nth.

  conclude x = nth(append(append(X,Y), node(x,Z)), a)(length(append(X,Y)))
    by rewrite (rewrite add_zero[length(append(X,Y))] in
                nth_append_back[E][append(X,Y)][node(x,Z), 0, a])
       definition nth.

Here is the complete proof of ti_index_get_in_order.

theorem ti_index_get_in_order: all E:type, z:TreeIter<E>, a:E.
  ti_get(z) = nth(in_order(ti2tree(z)), a)(ti_index(z))
proof
  arbitrary E:type, z:TreeIter<E>, a:E
  switch z {
    case TrItr(path, L, x, R) {
      definition {ti2tree, ti_get, ti_index, ti_take}
      rewrite num_nodes_plug[E][take_path(path)][L]
      
      suffices x = nth(in_order(plug_tree(path,TreeNode(L,x,R))),a)
                      (num_nodes(plug_tree(take_path(path),EmptyTree)) + num_nodes(L))
      rewrite in_order_plug_take_drop[E][path][L,x,R]
      rewrite in_order_plug_take[E][path][L]
      
      suffices x = nth(append(append(in_order(plug_tree(take_path(path),EmptyTree)),
                                     in_order(L)),
                              node(x,in_order(plug_tree(drop_path(path),R)))),a)
                      (num_nodes(plug_tree(take_path(path),EmptyTree)) + num_nodes(L))
      rewrite symmetric length_in_order[E][L]
            | symmetric length_in_order[E][plug_tree(take_path(path),EmptyTree)]
      define_ X = in_order(plug_tree(take_path(path),EmptyTree))
      define_ Y = in_order(L)
      define_ Z = in_order(plug_tree(drop_path(path),R))
      rewrite symmetric length_append[E][X][Y]
      
      conclude x = nth(append(append(X,Y), node(x,Z)), a)(length(append(X,Y)))
        by rewrite (rewrite add_zero[length(append(X,Y))] in
                    nth_append_back[E][append(X,Y)][node(x,Z), 0, a])
           definition nth.
    }
  }
end

This concludes the proofs of correctness for the in-order iterator and the five operations ti2tree, ti_first, ti_get, ti_next, and ti_index.

Exercise: Prove that `ti_prev` is correct

In the previous post there was an exercise to implement ti_prev, which moves the iterator backwards one position with respect to in-order traversal. This exercise is to prove that your implementation of ti_prev is correct. There are two theorems to prove. The first one makes sure that ti_prev reduces the index of the iterator by one.

theorem ti_prev_index: all E:type, iter : TreeIter<E>.
  if 0 < ti_index(iter)
  then ti_index(ti_prev(iter)) = pred(ti_index(iter))

The second theorem makes sure that the resulting iterator is still an iterator for the same tree.

theorem ti_prev_stable: all E:type, iter:TreeIter<E>.
  ti2tree(ti_prev(iter)) = ti2tree(iter)

Binary Trees with In-order Iterators (Part 1)

2024-07-18T12:28:00.000-07:00

This is the fifth blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

In this blog post we study binary trees, that is, trees in which each node has at most two children. We study the in-order tree traversal, as that will become important when we study binary search trees. Furthermore, we implement tree iterators that keep track of a location within the tree and can move forward with respect to the in-order traversal. We shall prove that our implementation of tree iterators is correct in Part 2 of this blog post.

Binary Trees

We begin by defining a union for binary trees:

union Tree<E> {
  EmptyTree
  TreeNode(Tree<E>, E, Tree<E>)
}

For example, we can represent the following binary tree

with a bunch of tree nodes like so:

define T0 = TreeNode(EmptyTree, 0, EmptyTree)
define T2 = TreeNode(EmptyTree, 2, EmptyTree)
define T1 = TreeNode(T0, 1, T2)
define T4 = TreeNode(EmptyTree, 4, EmptyTree)
define T5 = TreeNode(T4, 5, EmptyTree)
define T7 = TreeNode(EmptyTree, 7, EmptyTree)
define T6 = TreeNode(T5, 6, T7)
define T3 = TreeNode(T1, 3, T6)

We define the height of a tree with the following recursive function.

function height<E>(Tree<E>) -> Nat {
  height(EmptyTree) = 0
  height(TreeNode(L, x, R)) = suc(max(height(L), height(R)))
}

The example tree has height 4.

assert height(T3) = 4

We count the number of nodes in a binary tree with the num_nodes function.

function num_nodes<E>(Tree<E>) -> Nat {
  num_nodes(EmptyTree) = 0
  num_nodes(TreeNode(L, x, R)) = suc(num_nodes(L) + num_nodes(R))
}

The example tree has 8 nodes.

assert num_nodes(T3) = 8

In-order Tree Traversal

Now for the main event of this blog post, the in-order tree traversal. The idea of this traversal is that for each node in the tree, we follow this recipe:

process the left subtree
process the current node
process the right subtree

What it means to process a node can be different for different instantiations of the in-order traversal. But to make things concrete, we study an in-order traversal that produces a list. So here is our definition of the in_order function.

function in_order<E>(Tree<E>) -> List<E> {
  in_order(EmptyTree) = empty
  in_order(TreeNode(L, x, R)) = append(in_order(L), node(x, in_order(R)))
}

The result of in_order for T3 is the list 0,1,2,3,4,5,6,7. As you can see, we chose the data values in T3 to match their position within the in-order traversal.

assert in_order(T3) = interval(8, 0)
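For readers who would like to experiment outside of Deduce, here is a rough Python counterpart of the three functions so far. It is only a sketch: it assumes None plays the role of EmptyTree and a tuple (L, x, R) plays the role of TreeNode(L, x, R).

```python
def height(t):
    if t is None:
        return 0
    L, x, R = t
    return 1 + max(height(L), height(R))

def num_nodes(t):
    if t is None:
        return 0
    L, x, R = t
    return 1 + num_nodes(L) + num_nodes(R)

def in_order(t):
    # Left subtree, then the current node, then the right subtree.
    if t is None:
        return []
    L, x, R = t
    return in_order(L) + [x] + in_order(R)

# The example tree, built bottom-up as in the Deduce definitions.
T0 = (None, 0, None)
T2 = (None, 2, None)
T1 = (T0, 1, T2)
T4 = (None, 4, None)
T5 = (T4, 5, None)
T7 = (None, 7, None)
T6 = (T5, 6, T7)
T3 = (T1, 3, T6)

print(height(T3), num_nodes(T3), in_order(T3))
# prints: 4 8 [0, 1, 2, 3, 4, 5, 6, 7]
```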

In-order Tree Iterators

A tree iterator keeps track of a position within a tree. Our goal is to create a data structure to represent a tree iterator and also to implement the following operations on iterators, which we describe just after their signatures.

ti2tree : < E > fn TreeIter<E> -> Tree<E>
ti_first : < E > fn Tree<E>,E,Tree<E> -> TreeIter<E>
ti_get : < E > fn TreeIter<E> -> E
ti_next : < E > fn TreeIter<E> -> TreeIter<E>
ti_index : < E > fn(TreeIter<E>) -> Nat

The ti2tree operator returns the tree that the iterator is traversing.
The ti_first operator returns an iterator pointing to the first node (with respect to the in-order traversal) of a non-empty tree. We represent non-empty trees with three things: the left subtree, the data in the root node, and the right subtree.
The ti_get operator returns the data of the node at the current position.
The ti_next operator moves the iterator forward by one position.
The ti_index operator returns the position of the iterator as a natural number.

Here is an example of creating an iterator for T3 and moving it forward.

define iter0 = ti_first(T1, 3, T6)
assert ti_get(iter0) = 0
assert ti_index(iter0) = 0

define iter3 = ti_next(ti_next(ti_next(iter0)))
assert ti_get(iter3) = 3
assert ti_index(iter3) = 3

define iter7 = ti_next(ti_next(ti_next(ti_next(iter3))))
assert ti_get(iter7) = 7
assert ti_index(iter7) = 7

Iterator Representation

We represent a position in the tree by recording a path of left-or-right decisions. For example, to represent the position of node 4 of the example tree, we record the path R,L,L (R for right and L for left).

When we come to implement the ti_next operation, we will sometimes need to climb the tree, for example, to get from 4 to 5. To make that easier, we will store the path in reverse. So the path to node 4 will be stored as L,L,R.

It would seem natural to store an iterator’s path separately from the tree, but doing so would complicate many of the upcoming proofs because only certain paths make sense for certain trees. Instead, we combine the path and the tree into a single data structure called a zipper (Huet, The Zipper, Journal of Functional Programming, Vol 7. Issue 5, 1997). The idea is to attach extra data to the left and right decisions and to store the subtree at the current position. So we define a union named Direction with constructors for left and right, and we define a union named TreeIter that contains a path and the non-empty tree at the current position.

union Direction<E> {
  LeftD(E, Tree<E>)
  RightD(Tree<E>, E)
}

union TreeIter<E> {
  TrItr(List<Direction<E>>, Tree<E>, E, Tree<E>)
}

The `ti2tree` Operation

Of the tree iterator operations, we will first implement ti2tree because it will help to explain this zipper-style representation. We start by defining the auxiliary function plug_tree, which reconstructs a tree from a path and the subtree at the specified position. The plug_tree function is defined by recursion on the path, so it moves upward in the tree with each recursive call. Consider the case for LeftD(x, R) below. To plug tree t into the path node(LeftD(x, R), path'), we use the extra data stored in LeftD(x, R) to create TreeNode(t, x, R), which we then pass to the recursive call to plug the new tree node into the rest of the path.

function plug_tree<E>(List<Direction<E>>, Tree<E>) -> Tree<E> {
  plug_tree(empty, t) = t
  plug_tree(node(f, path'), t) =
    switch f {
      case LeftD(x, R) {
        plug_tree(path', TreeNode(t, x, R))
      }
      case RightD(L, x) {
        plug_tree(path', TreeNode(L, x, t))
      }
    }
}

The ti2tree operator simply invokes plug_tree.

function ti2tree<E>(TreeIter<E>) -> Tree<E> {
  ti2tree(TrItr(path, L, x, R)) = plug_tree(path, TreeNode(L, x, R))
}

Creating an iterator from a tree using ti_first and then applying ti2tree produces the original tree. Furthermore, moving an iterator does not change the tree that it is traversing, so ti2tree returns T3 for iterators iter0, iter3, and iter7.

assert ti2tree(iter0) = T3
assert ti2tree(iter3) = T3
assert ti2tree(iter7) = T3
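To see the zipper representation concretely, here is a Python sketch of plug_tree and ti2tree applied to a hand-built iterator for node 4. This is a model under assumptions, not Deduce: None stands for EmptyTree, (L, x, R) for TreeNode(L, x, R), ("LeftD", x, R) for LeftD(x, R), ("RightD", L, x) for RightD(L, x), and a 4-tuple for TrItr. The name iter4 here is just a local variable of the sketch.

```python
def plug_tree(path, t):
    # Each step up the path wraps t in a new node built from the stored data.
    for d in path:
        if d[0] == "LeftD":
            _, x, R = d
            t = (t, x, R)      # t was the left child of x
        else:
            _, L, x = d
            t = (L, x, t)      # t was the right child of x
    return t

def ti2tree(it):
    path, L, x, R = it
    return plug_tree(path, (L, x, R))

T0 = (None, 0, None)
T2 = (None, 2, None)
T1 = (T0, 1, T2)
T4 = (None, 4, None)
T5 = (T4, 5, None)
T7 = (None, 7, None)
T6 = (T5, 6, T7)
T3 = (T1, 3, T6)

# Node 4 is reached from the root by R,L,L, so the stored path is L,L,R:
# 4 is the left child of 5, 5 is the left child of 6, 6 is the right child of 3.
iter4 = ([("LeftD", 5, None), ("LeftD", 6, T7), ("RightD", T1, 3)],
         None, 4, None)

print(ti2tree(iter4) == T3)   # prints: True
```

Notice that every subtree of T3 that is not on the path appears exactly once inside some direction, which is why plugging can rebuild the whole tree.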

The `ti_first` Operation

Recall that the ti_first operation returns an iterator pointing to the first node (with respect to the in-order traversal) of a non-empty tree. For example, applying ti_first to T3 should give us node 0. The idea to implement ti_first is simple: we walk down the tree going left at each step, until we reach a node that has no left child.

To implement ti_first we define the auxiliary function first_path that takes a non-empty tree and the path-so-far and proceeds going to the left down the tree. (The first_path function will also come in handy when implementing ti_next.)

function first_path<E>(Tree<E>, E, Tree<E>, List<Direction<E>>) -> TreeIter<E> {
  first_path(EmptyTree, x, R, path) = TrItr(path, EmptyTree, x, R)
  first_path(TreeNode(LL, y, LR), x, R, path) = first_path(LL, y, LR, node(LeftD(x, R), path))
}

We implement ti_first simply as a call to first_path where the path-so-far is empty.

define ti_first : < E > fn Tree<E>,E,Tree<E> -> TreeIter<E>
    = λ L,x,R { first_path(L, x, R, empty) }

As promised above, applying ti_first to T3 gives us node 0.

assert ti_get(ti_first(T1, 3, T6)) = 0

The `ti_get` Operation

Recall that the ti_get operator should return the data of the node at the current position. This is straightforward to implement because that data is stored directly in the tree iterator.

function ti_get<E>(TreeIter<E>) -> E {
  ti_get(TrItr(path, L, x, R)) = x
}

The `ti_next` Operation

Recall that the ti_next operator moves the iterator forward by one position with respect to the in-order traversal. This operation is non-trivial to implement. Consider again our example tree.

Suppose the current node is 2. Then the next node is 3, which requires climbing a fair ways up the tree. On the other hand, if the current node is 3, then the next node is 4, way back down the tree. So there are two different scenarios that we need to handle.

(1) If the current node has a right child, then the next node is the first node of the right child’s subtree (with respect to in-order traversal). For example, node 3 has right child 6, and the first node of that subtree is 4.
(2) If the current node does not have a right child, then the next node is the ancestor after the first left branch. For example, node 2 does not have a right child, so we go up the tree. We go up to 1 via a right branch and then up to 3 via a left branch, so 3 is the next node of 2.

For (1) we already have first_path, so we just need an auxiliary function for (2), which we call next_up. This function takes a path and the current non-empty subtree and returns the iterator for the next position. If the direction is RightD, we keep going up the tree. If the direction is LeftD(x, R), we stop and return an iterator for the parent node x.

function next_up<E>(List<Direction<E>>, Tree<E>, E, Tree<E>) -> TreeIter<E> {
  next_up(empty, A, z, B) = TrItr(empty, A, z, B)
  next_up(node(f, path'), A, z, B) =
    switch f {
      case RightD(L, x) {
        next_up(path', L, x, TreeNode(A, z, B))
      }
      case LeftD(x, R) {
        TrItr(path', TreeNode(A, z, B), x, R)
      }
    }
}

Now that we have both next_up and first_path, we implement ti_next by checking whether the right child R is empty. If it is, we invoke next_up, and if not, we invoke first_path.

function ti_next<E>(TreeIter<E>) -> TreeIter<E> {
  ti_next(TrItr(path, L, x, R)) =
    switch R {
      case EmptyTree {
        next_up(path, L, x, R)
      }
      case TreeNode(RL, y, RR) {
        first_path(RL, y, RR, node(RightD(L, x), path))
      }
    }
}

To see ti_next in action, in the following we go from position 2 up to position 3 and then back down to position 4.

define iter2 = ti_next(ti_next(iter0))
assert ti_get(iter2) = 2

define iter3_ = ti_next(iter2)
assert ti_get(iter3_) = 3

define iter4 = ti_next(iter3_)
assert ti_get(iter4) = 4
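The two scenarios can also be traced in a Python model of first_path, next_up, and ti_next. As before, this is a sketch under assumptions rather than Deduce code: None is EmptyTree, (L, x, R) is a tree node, tagged tuples are directions, and a 4-tuple (path, L, x, R) is TrItr(path, L, x, R). Stepping seven times from the first position should visit the nodes of T3 in order.

```python
def first_path(L, x, R, path):
    # Keep going left, recording a LeftD step for every node we pass.
    while L is not None:
        LL, y, LR = L
        path = [("LeftD", x, R)] + path
        L, x, R = LL, y, LR
    return (path, None, x, R)

def next_up(path, A, z, B):
    if not path:
        return (path, A, z, B)
    d, rest = path[0], path[1:]
    if d[0] == "RightD":
        _, L, x = d
        return next_up(rest, L, x, (A, z, B))   # keep climbing
    _, x, R = d
    return (rest, (A, z, B), x, R)              # stop at the parent x

def ti_next(it):
    path, L, x, R = it
    if R is None:
        return next_up(path, L, x, R)
    RL, y, RR = R
    return first_path(RL, y, RR, [("RightD", L, x)] + path)

def ti_get(it):
    return it[2]

T0 = (None, 0, None)
T2 = (None, 2, None)
T1 = (T0, 1, T2)
T5 = ((None, 4, None), 5, None)
T6 = (T5, 6, (None, 7, None))

it = first_path(T1, 3, T6, [])   # plays the role of ti_first(T1, 3, T6)
visited = [ti_get(it)]
for _ in range(7):
    it = ti_next(it)
    visited.append(ti_get(it))

print(visited)   # prints: [0, 1, 2, 3, 4, 5, 6, 7]
```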

The `ti_index` Operation

Recall that the ti_index operator returns the position of the iterator as a natural number. More specifically, ti_index returns the position of the current node with respect to the in-order traversal. The following demonstrates this invariant on iter0 and iter7.

define L0 = in_order(ti2tree(iter0))
define i0 = ti_index(iter0)
assert ti_get(iter0) = nth(L0, 42)(i0)

define L7 = in_order(ti2tree(iter7))
define i7 = ti_index(iter7)
assert ti_get(iter7) = nth(L7, 42)(i7)

The idea for implementing ti_index is that we’ll count how many nodes are in the portion of the tree that comes before the current position. We define an auxiliary function that constructs this portion of the tree, calling it ti_take because it is reminiscent of the take(n, ls) function in List.pf, which returns the prefix of list ls of length n. Furthermore, we use a second auxiliary function named take_path that applies this idea to the path of the iterator. So to implement the take_path function, we throw away the subtrees to the right of the path (by removing LeftD(x, R)) and we keep the subtrees to the left of the path (by keeping RightD(L, x)).

function take_path<E>(List<Direction<E>>) -> List<Direction<E>> {
  take_path(empty) = empty
  take_path(node(f, path')) =
    switch f {
      case RightD(L, x) {
        node(RightD(L,x), take_path(path'))
      }
      case LeftD(x, R) {
        take_path(path')
      }
    }
}

We implement ti_take by applying take_path to the path of the iterator and then plugging the left subtree L into the result. (Neither the node x itself nor the subtree R comes before node x in the in-order traversal, so they are left out.)

function ti_take<E>(TreeIter<E>) -> Tree<E> {
  ti_take(TrItr(path, L, x, R)) = plug_tree(take_path(path), L)
}

Finally, we implement ti_index by counting the number of nodes in the tree returned by ti_take.

define ti_index : < E > fn(TreeIter<E>) -> Nat = λ iter { num_nodes(ti_take(iter))}
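Here is the same idea as a Python sketch, again a model under assumptions (None for EmptyTree, tuples for tree nodes and iterators, tagged tuples for directions) rather than Deduce itself. It uses two hand-built iterators for T3, one at the root node 3 and one at node 4, and checks that the element at position ti_index in the in-order list is the current element.

```python
def num_nodes(t):
    if t is None:
        return 0
    L, _, R = t
    return 1 + num_nodes(L) + num_nodes(R)

def in_order(t):
    if t is None:
        return []
    L, x, R = t
    return in_order(L) + [x] + in_order(R)

def plug_tree(path, t):
    # ("LeftD", x, R) models LeftD(x, R); ("RightD", L, x) models RightD(L, x).
    for d in path:
        t = (t, d[1], d[2]) if d[0] == "LeftD" else (d[1], d[2], t)
    return t

def take_path(path):
    # Drop the LeftD steps (things after us), keep the RightD steps (things before us).
    return [d for d in path if d[0] == "RightD"]

def ti_take(it):
    path, L, x, R = it
    return plug_tree(take_path(path), L)

def ti_index(it):
    return num_nodes(ti_take(it))

def ti2tree(it):
    path, L, x, R = it
    return plug_tree(path, (L, x, R))

T1 = ((None, 0, None), 1, (None, 2, None))
T7 = (None, 7, None)
T6 = (((None, 4, None), 5, None), 6, T7)

iter3 = ([], T1, 3, T6)
iter4 = ([("LeftD", 5, None), ("LeftD", 6, T7), ("RightD", T1, 3)],
         None, 4, None)

for it in (iter3, iter4):
    path, L, x, R = it
    assert in_order(ti2tree(it))[ti_index(it)] == x

print(ti_index(iter3), ti_index(iter4))   # prints: 3 4
```

For iter4, ti_take produces the tree TreeNode(T1, 3, EmptyTree), whose four nodes are exactly the ones that precede node 4.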

Exercise: Implement and test the `ti_prev` Operation

The ti_prev operation (for previous) moves the iterator backward by one position with respect to in-order traversal.

ti_prev : < E > fn TreeIter<E> -> TreeIter<E>

Implement and test the ti_prev operation.

Conclusion

This completes the implementation of the 5 tree iterator operations. In Part 2 of this blog post, we will prove that these operations are correct.

Merge Sort with Leftovers, Correctly

2024-06-30T07:45:00.000-07:00

Merge Sort with Leftovers

This is the fourth blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

In this blog post we study a fast sorting algorithm, Merge Sort. This classic algorithm splits the input list in half, recursively sorts each half, and then merges the two results back into a single sorted list.

The specification of Merge Sort is the same as that of Insertion Sort.

Specification: The merge_sort(xs) function returns a list that contains the same elements as xs but the elements in the result are in sorted order.

We follow the write-test-prove approach to develop a correct implementation of merge_sort.

Write the `merge_sort` function

The classic implementation of merge_sort would be something like the following.

function merge_sort(List<Nat>) -> List<Nat> {
  merge_sort(empty) = empty
  merge_sort(node(x,xs')) =
    define p = split(node(x,xs'))
    merge(merge_sort(first(p)), merge_sort(second(p)))
}

Unfortunately, Deduce rejects the above function definition because Deduce uses a very simple restriction to ensure the termination of recursive functions, which is that a recursive call may only be made on a part of the input. In this case, the recursive call may only be applied to the sublist xs', not first(p) or second(p).

How can we work around this restriction? There’s an old trick that goes by many names (gas, fuel, etc.), which is to add another parameter of type Nat and use that for termination. Let us use the name msort for the following, and then we define merge_sort in terms of msort.

function msort(Nat, List<Nat>) -> List<Nat> {
  msort(0, xs) = xs
  msort(suc(n'), xs) =
    define p = split(xs)
    merge(msort(n', first(p)), msort(n', second(p)))
}

define merge_sort : fn List<Nat> -> List<Nat>
  = λxs{ msort(log(length(xs)), xs) }

In the above definition of merge_sort, we need to supply enough gas so that msort won’t prematurely run out. Here we use the logarithm (base 2, rounding up) defined in Log.pf.
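
To see what "enough gas" means, here is a small Python sketch of the gas-based version. It is only an approximation of the Deduce code: the split and two-argument merge helpers are hypothetical stand-ins written for this example, and the gas is computed with Python's bit_length as a ceiling of the base-2 logarithm.

```python
def split(xs):
    """Hypothetical stand-in for split: front half and back half of xs."""
    mid = len(xs) // 2
    return xs[:mid], xs[mid:]

def merge(xs, ys):
    """Classic two-argument merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]

def msort(gas, xs):
    # Out of gas: give the list back untouched, sorted or not.
    if gas == 0:
        return xs
    front, back = split(xs)
    return merge(msort(gas - 1, front), msort(gas - 1, back))

def merge_sort(xs):
    # ceil(log2(n)) for n >= 1, and 0 for the empty list.
    gas = max(len(xs) - 1, 0).bit_length()
    return msort(gas, xs)

print(merge_sort([5, 2, 4, 1, 3]))  # prints: [1, 2, 3, 4, 5]
print(msort(1, [4, 3, 2, 1]))       # too little gas, prints: [2, 1, 4, 3]
```

With only one unit of gas, the two halves [4, 3] and [2, 1] are merged without having been sorted first, which is why the second result is wrong.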

This definition of merge_sort and msort is fine: it has O(n log(n)) time complexity, so it is efficient. However, the use of split rubs me the wrong way because it requires traversing half of the input list. The use of split would be necessary if one wanted to use parallelism to speed up the code, performing the two recursive calls in parallel. However, we are currently only interested in a single-threaded implementation.

Suppose you just finished baking a pie and intend to eat half now and half tomorrow night. One approach would be to split it in half and then eat one of the halves. Another approach is to just start eating the pie and stop when half of it is gone. That’s the approach that we will take with the next version of msort.

Specification: The msort(n,xs) function sorts the first min(2ⁿ,length(xs)) many elements of xs and returns a pair containing (1) the sorted list and (2) the leftovers that were not yet sorted.

function msort(Nat, List<Nat>) -> Pair< List<Nat>, List<Nat> > {
  msort(0, xs) =
    switch xs {
      case empty { pair(empty, empty) }
      case node(x, xs') { pair(node(x, empty), xs') }
    }
  msort(suc(n'), xs) =
    define p1 = msort(n', xs)
    define p2 = msort(n', second(p1))
    define ys = first(p1)
    define zs = first(p2)
    pair(merge(length(ys) + length(zs), ys, zs), second(p2))
}

In the above case for suc(n'), the first recursive call to msort produces the pair p1 that includes a sorted list and the leftovers. We sort the leftovers with the second recursive call to msort. We return (1) the merge of the two sorted sublists and (2) the leftovers from the second recursive call to msort.

With the code for msort complete, we can turn to merge_sort. Similar to the previous version, we invoke msort with the input list xs and use the logarithm of the list length for the gas. This msort returns a pair, with the sorted results in the first component. The second component of the pair is an empty list because we supplied enough gas.

define merge_sort : fn List<Nat> -> List<Nat>
    = λxs{ first(msort(log(length(xs)), xs)) }
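
The leftovers idea can be tried out in Python as well. The sketch below is an approximation rather than a translation: Python lists and tuples stand in for Deduce's List and Pair, a gas-bounded merge helper is included so that the block runs on its own, and the gas is computed with bit_length as a ceiling of the base-2 logarithm.

```python
def merge(gas, xs, ys):
    """Gas-bounded merge of two sorted lists (helper assumed for this sketch)."""
    if gas == 0:
        return []
    if not xs:
        return ys
    if not ys:
        return xs
    if xs[0] <= ys[0]:
        return [xs[0]] + merge(gas - 1, xs[1:], ys)
    return [ys[0]] + merge(gas - 1, xs, ys[1:])

def msort(gas, xs):
    """Sort the first min(2**gas, len(xs)) elements; return (sorted, leftovers)."""
    if gas == 0:
        return xs[:1], xs[1:]
    ys, rest = msort(gas - 1, xs)          # sort the first chunk
    zs, leftovers = msort(gas - 1, rest)   # sort the next chunk from the leftovers
    return merge(len(ys) + len(zs), ys, zs), leftovers

def merge_sort(xs):
    gas = max(len(xs) - 1, 0).bit_length()  # ceil(log2(n)), 0 for short lists
    return msort(gas, xs)[0]

print(msort(1, [3, 1, 2, 0]))    # prints: ([1, 3], [2, 0])
print(merge_sort([3, 1, 2, 0]))  # prints: [0, 1, 2, 3]
```

Note that no call ever walks the list just to find its midpoint: each recursive call simply consumes elements from the front and hands back whatever it did not use.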

So far, we have neglected the implementation of merge. Here’s its specification.

Specification: The merge(xs,ys) function takes two sorted lists and returns a sorted list that contains just the elements from the two input lists.

Here’s the classic implementation of merge. The idea is to compare the two elements at the front of each list and use the lower of the two as the first element of the output. Then do the recursive call with the two lists, minus the element that was chosen. Again, we use an extra gas parameter to ensure termination. To ensure that we have enough gas, we will choose the sum of the lengths of the two input lists.

function merge(Nat, List<Nat>, List<Nat>) -> List<Nat> {
  merge(0, xs, ys) = empty
  merge(suc(n), xs, ys) =
    switch xs {
      case empty { ys }
      case node(x, xs') {
        switch ys {
          case empty {
            node(x, xs')
          }
          case node(y, ys') {
            if x ≤ y then
              node(x, merge(n, xs', node(y, ys')))
            else
              node(y, merge(n, node(x, xs'), ys'))
          }
        }
      }
    }
}
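
As a quick illustration of how the gas parameter behaves, here is a Python sketch of the same merge (Python lists in place of Deduce lists, otherwise following the clauses above). It shows that the sum of the two lengths is enough gas, and that supplying less truncates the output.

```python
def merge(gas, xs, ys):
    """Merge two sorted lists, spending one unit of gas per comparison step."""
    if gas == 0:
        return []          # out of gas: stop, even if elements remain
    if not xs:
        return ys
    if not ys:
        return xs
    if xs[0] <= ys[0]:
        return [xs[0]] + merge(gas - 1, xs[1:], ys)
    return [ys[0]] + merge(gas - 1, xs, ys[1:])

xs, ys = [1, 3, 3, 7], [2, 3, 4, 8]
full = merge(len(xs) + len(ys), xs, ys)
short = merge(3, xs, ys)
print(full)   # prints: [1, 2, 3, 3, 3, 4, 7, 8]
print(short)  # prints: [1, 2, 3]
```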

Test

We have three functions to test: merge, msort, and merge_sort.

Test `merge`

We test that the result of merge is sorted and that it contains all the elements from the two input lists, which we check using count.

define L_1337 = node(1, node(3, node(3, node(7, empty))))
define L_2348 = node(2, node(3, node(4, node(8, empty))))
define L_12333478 = merge(length(L_1337) + length(L_2348), L_1337, L_2348)
assert sorted(L_12333478)
assert all_elements(L_1337 ++ L_2348,
  λx{count(L_1337)(x) + count(L_2348)(x) = count(L_12333478)(x) })

Test `msort`

In the following tests, we vary the gas from 0 to 3, varying how much of the input list L18 gets sorted in the call to msort. The take(n,xs) function returns the first n elements of xs and drop(n,xs) drops the first n elements of xs and returns the remaining portion of xs.

define L18 = L_1337 ++ L_2348

define p0 = msort(0, L18)
define t0 = take(pow2(0), L18)
define d0 = drop(pow2(0), L18)
assert sorted(first(p0))
assert all_elements(t0, λx{count(t0)(x) = count(first(p0))(x) })
assert all_elements(d0, λx{count(d0)(x) = count(second(p0))(x) })

define p1 = msort(1, L18)
define t1 = take(pow2(1), L18)
define d1 = drop(pow2(1), L18)
assert sorted(first(p1))
assert all_elements(t1, λx{count(t1)(x) = count(first(p1))(x) })
assert all_elements(d1, λx{count(d1)(x) = count(second(p1))(x) })

define p2 = msort(2, L18)
define t2 = take(pow2(2), L18)
define d2 = drop(pow2(2), L18)
assert sorted(first(p2))
assert all_elements(t2, λx{count(t2)(x) = count(first(p2))(x) })
assert all_elements(d2, λx{count(d2)(x) = count(second(p2))(x) })

define p3 = msort(3, L18)
define t3 = take(pow2(3), L18)
define d3 = drop(pow2(3), L18)
assert sorted(first(p3))
assert all_elements(t3, λx{count(t3)(x) = count(first(p3))(x) })
assert all_elements(d3, λx{count(d3)(x) = count(second(p3))(x) })
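
The pattern in these tests can be stated once as a reusable check. The Python sketch below is an illustration only: check_msort and spec_msort are hypothetical names, Counter plays the role of count, and slicing plays the role of take and drop. The reference model spec_msort simply restates the specification using Python's built-in sorted.

```python
from collections import Counter

def spec_msort(n, xs):
    """Reference model of the specification: sort the first 2**n elements."""
    k = 2 ** n
    return sorted(xs[:k]), xs[k:]

def check_msort(msort, gas, xs):
    """The three assertions used for each gas value, bundled into one predicate."""
    done, leftovers = msort(gas, xs)
    taken, dropped = xs[:2 ** gas], xs[2 ** gas:]
    is_sorted = all(a <= b for a, b in zip(done, done[1:]))
    return (is_sorted
            and Counter(done) == Counter(taken)
            and Counter(leftovers) == Counter(dropped))

L18 = [1, 3, 3, 7, 2, 3, 4, 8]
print(all(check_msort(spec_msort, gas, L18) for gas in range(4)))  # prints: True

# A deliberately broken candidate that forgets to sort fails once the
# prefix is long enough to contain an out-of-order pair.
def unsorted_msort(n, xs):
    return xs[:2 ** n], xs[2 ** n:]

print(check_msort(unsorted_msort, 3, L18))  # prints: False
```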

Test `merge_sort`

Next we test that merge_sort returns a sorted list that contains the same elements as the input list. For input, we reuse the list L18 from above.

define s_L18 = merge_sort(L18)
assert sorted(s_L18)
assert all_elements(L18, λx{count(L18)(x) = count(s_L18)(x) })

We can bundle several tests, with varying-length inputs, into one assert by using all_elements and interval.

assert all_elements(interval(3, 0),
    λn{ define xs = reverse(interval(n, 0))
        define ls = merge_sort(xs)
        sorted(ls) and
        all_elements(xs, λx{count(xs)(x) = count(ls)(x)})
    })

Prove

Compared to the proof of correctness for insertion_sort, we have considerably more work to do for merge_sort. Instead of two functions, we have three functions to consider: merge, msort, and merge_sort. Furthermore, these functions are more complex than insert and insertion_sort. Nevertheless, we are up to the challenge!

Prove correctness of `merge`

The specification of merge has two parts, one part saying that the elements of the output must be the elements of the two input lists, and the other part saying that the output must be sorted, provided the two input lists are sorted.

Here is how we state the theorem for the first part.

theorem mset_of_merge: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if length(xs) + length(ys) = n
  then mset_of(merge(n, xs, ys)) = mset_of(xs) ⨄ mset_of(ys)

Here is the theorem stating that the output of merge is sorted.

theorem merge_sorted: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if sorted(xs) and sorted(ys) and
     length(xs) + length(ys) = n
  then sorted(merge(n, xs, ys))
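
Before attempting the proofs, it can be reassuring to check both statements exhaustively on small inputs. The following Python sketch does that for a Python rendering of merge; it is a sanity check under the assumption that the rendering matches the Deduce function, not a replacement for the proofs. Counter stands in for mset_of and Counter addition for ⨄.

```python
from collections import Counter
from itertools import combinations_with_replacement

def merge(gas, xs, ys):
    if gas == 0:
        return []
    if not xs:
        return ys
    if not ys:
        return xs
    if xs[0] <= ys[0]:
        return [xs[0]] + merge(gas - 1, xs[1:], ys)
    return [ys[0]] + merge(gas - 1, xs, ys[1:])

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def sorted_lists(max_len, values):
    """All sorted lists of length <= max_len drawn from values (given in order)."""
    for n in range(max_len + 1):
        for combo in combinations_with_replacement(values, n):
            yield list(combo)

def check_merge_properties():
    checked = 0
    for xs in sorted_lists(3, range(3)):
        for ys in sorted_lists(3, range(3)):
            out = merge(len(xs) + len(ys), xs, ys)
            assert Counter(out) == Counter(xs) + Counter(ys)  # mset_of_merge
            assert is_sorted(out)                             # merge_sorted
            checked += 1
    return checked

print(check_merge_properties())  # prints: 400
```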

Prove the `mset_of_merge` theorem

We begin with the proof of mset_of_merge. Because merge(n, xs, ys) is recursive on the natural number n, we proceed by induction on Nat.

  induction Nat
  case 0 {
    arbitrary xs:List<Nat>, ys:List<Nat>
    suppose prem: length(xs) + length(ys) = 0
    ?
  }
  case suc(n') suppose IH {
    ?
  }

In the case for n = 0, we need to prove

  mset_of(merge(0,xs,ys)) = mset_of(xs) ⨄ mset_of(ys)

and merge(0,xs,ys) returns empty, so we need to show that mset_of(xs) ⨄ mset_of(ys) is the empty multiset. From the premise prem, both xs and ys must be empty.

  // <<mset_of_merge_case_zero_xs_ys_empty>> =
  have lxs_lys_z: length(xs) = 0 and length(ys) = 0
    by apply add_to_zero[length(xs)][length(ys)] to prem
  have xs_mt: xs = empty
    by apply length_zero_empty[Nat,xs] to lxs_lys_z
  have ys_mt: ys = empty
    by apply length_zero_empty[Nat,ys] to lxs_lys_z

After rewriting with those equalities and applying the definition of merge and mset_of:

  // <<mset_of_merge_case_zero_suffices>> =
  suffices mset_of(merge(0, empty, empty)) = mset_of(empty) ⨄ mset_of(empty)
      with rewrite xs_mt | ys_mt
  suffices m_fun[Nat](λx{0}) = m_fun[Nat](λx{0}) ⨄ m_fun[Nat](λx{0})
      with definition {merge, mset_of}

it remains to prove m_fun(λx{0}) = m_fun(λx{0}) ⨄ m_fun(λx{0}) (the sum of two empty multisets is the empty multiset), which we prove with the theorem m_sum_empty from MultiSet.pf.

  // <<mset_of_merge_case_zero_conclusion>> =
  symmetric m_sum_empty[Nat, m_fun[Nat](λx{0})]

In the case for n = suc(n'), we need to prove

  mset_of(merge(suc(n'),xs,ys)) = mset_of(xs) ⨄ mset_of(ys)

Looking at the suc clause of merge, there is a switch on xs and then on ys. So our proof will be structured analogously.

  switch xs for merge {
    case empty {
      ?
    }
    case node(x, xs') suppose xs_xxs {
      ?
    }
  }

In the case for xs = empty, we conclude by the definition of mset_of and the fact that combining mset_of(ys) with the empty multiset produces mset_of(ys).

    // <<mset_of_merge_case_suc_empty>> =
    suffices mset_of(ys) = m_fun(λx{0}) ⨄ mset_of(ys)
        with definition {mset_of}
    symmetric empty_m_sum[Nat, mset_of(ys)]

In the case for xs = node(x, xs'), merge performs a switch on ys, so our proof does too.

  switch ys for merge {
    case empty {
      ?
    }
    case node(y, ys') suppose ys_yys {
      ?
    }
  }

The case for ys = empty is similar to the case for xs = empty. We conclude by use of the definitions of merge and mset_of and the fact that combining mset_of(node(x, xs')) with the empty multiset produces mset_of(node(x, xs')).

    // <<mset_of_merge_case_suc_node_empty>> =
    suffices m_one(x) ⨄ mset_of(xs')
           = (m_one(x) ⨄ mset_of(xs')) ⨄ m_fun(λx{0})
        with definition {mset_of}
    rewrite m_sum_empty[Nat, m_one(x) ⨄ mset_of(xs')]

In the case for ys = node(y, ys'), we continue to follow the structure of merge and switch on x ≤ y.

  switch x ≤ y {
    case true suppose xy_true {
      ?
    }
    case false suppose xy_false {
      ?
    }
  }

In the case for (x ≤ y) = true, the goal becomes

  mset_of(node(x, merge(n', xs', node(y, ys')))) 
= mset_of(node(x, xs')) ⨄ mset_of(node(y, ys'))

This follows from the conclusion of the induction hypothesis (instantiated with xs' and node(y,ys')):

  mset_of(merge(n',xs',node(y,ys')))
= mset_of(xs') ⨄ mset_of(node(y, ys'))

The induction hypothesis is a conditional, so we first must prove its premise as follows.

    // <<mset_of_merge_x_le_y_IH_prem>> =
    have IH_prem: length(xs') + length(node(y,ys')) = n'
      by enable {operator +, operator +,length}
         have suc_len: suc(length(xs')) + suc(length(ys')) = suc(n')
                by rewrite xs_xxs | ys_yys in prem
         injective suc suc_len

We conclude this case with the following equational reasoning, using the induction hypothesis in the second step.

    // <<mset_of_merge_x_le_y_equations>> =
    equations
          mset_of(node(x, merge(n', xs', node(y, ys')))) 
        = m_one(x) ⨄ mset_of(merge(n',xs',node(y,ys')))
            by definition mset_of
    ... = m_one(x) ⨄ (mset_of(xs') ⨄ mset_of(node(y, ys')))
            by rewrite (apply IH[xs', node(y, ys')] to IH_prem)
    ... = m_one(x) ⨄ (mset_of(xs') ⨄ (m_one(y) ⨄ mset_of(ys')))
            by definition mset_of
    ... = (m_one(x) ⨄ mset_of(xs')) ⨄ (m_one(y) ⨄ mset_of(ys'))
            by rewrite m_sum_assoc[Nat, m_one(x), mset_of(xs'),
                                  (m_one(y) ⨄ mset_of(ys'))]
    ... = mset_of(node(x, xs')) ⨄ mset_of(node(y, ys'))
            by definition mset_of

In the case for (x ≤ y) = false, the goal becomes

  mset_of(node(y, merge(n', node(x, xs'), ys'))) 
= mset_of(node(x, xs')) ⨄ mset_of(node(y, ys'))

The conclusion of the induction hypothesis (instantiated with node(x,xs') and ys') is

  mset_of(merge(n',node(x,xs'),ys'))
= mset_of(node(x,xs')) ⨄ mset_of(ys')

so the goal will follow from the fact that multiset sum is associative and commutative.

We first prove the premise of the induction hypothesis.

    // <<mset_of_merge_x_g_y_IH_prem>> =
    have IH_prem: length(node(x,xs')) + length(ys') = n'
      by enable {operator +, operator +, length}
         have suc_len: suc(length(xs')) + suc(length(ys')) = suc(n')
              by rewrite xs_xxs | ys_yys in prem
         injective suc
         rewrite add_suc[length(xs')][length(ys')] in suc_len

Then we proceed with applying the induction hypothesis in the second step, followed by the equational reasoning about multiset sum.

    // <<mset_of_merge_x_g_y_equations>> =
    equations
            mset_of(node(y, merge(n', node(x, xs'), ys')))
          = m_one(y) ⨄ mset_of(merge(n',node(x,xs'),ys'))
              by definition mset_of
      ... = m_one(y) ⨄ (mset_of(node(x,xs')) ⨄ mset_of(ys'))
              by rewrite (apply IH[node(x,xs'), ys'] to IH_prem)
      ... = m_one(y) ⨄ ((m_one(x) ⨄ mset_of(xs')) ⨄ mset_of(ys'))
              by definition mset_of
      ... = ((m_one(x) ⨄ mset_of(xs')) ⨄ mset_of(ys')) ⨄ m_one(y)
              by m_sum_commutes[Nat, m_one(y), (m_one(x) ⨄ mset_of(xs')) ⨄ mset_of(ys')]
      ... = (m_one(x) ⨄ mset_of(xs')) ⨄ (mset_of(ys') ⨄ m_one(y))
              by m_sum_assoc[Nat, m_one(x) ⨄ mset_of(xs'), mset_of(ys'), m_one(y)]
      ... = (m_one(x) ⨄ mset_of(xs')) ⨄ (m_one(y) ⨄ mset_of(ys'))
              by rewrite m_sum_commutes[Nat, mset_of(ys'), m_one(y)]
      ... = mset_of(node(x, xs')) ⨄ mset_of(node(y, ys'))
              by definition mset_of
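
The chain above only rearranges a multiset sum, which is easy to spot-check on concrete values. In this Python sketch, Counter is an assumed stand-in for Deduce's MultiSet, Counter addition stands in for ⨄, and the sample values are arbitrary (chosen with x > y to match this case).

```python
from collections import Counter

def m_one(x):
    """Singleton multiset containing x once."""
    return Counter([x])

def mset_of(xs):
    """Multiset of the elements of a list."""
    return Counter(xs)

x, xs_tail = 5, [6, 9]   # plays the role of node(x, xs')
y, ys_tail = 2, [4, 7]   # plays the role of node(y, ys')

# Right-hand side supplied by the induction hypothesis for merge(n', node(x,xs'), ys').
ih_rhs = mset_of([x] + xs_tail) + mset_of(ys_tail)

lhs = m_one(y) + ih_rhs                                # first lines of the chain
rhs = mset_of([x] + xs_tail) + mset_of([y] + ys_tail)  # last line of the chain

print(lhs == rhs)  # prints: True
```

Counter addition is commutative and associative, so the intermediate regroupings in the Deduce proof all denote the same multiset here.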

Here is the completed proof of mset_of_merge.

theorem mset_of_merge: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if length(xs) + length(ys) = n
  then mset_of(merge(n, xs, ys)) = mset_of(xs) ⨄ mset_of(ys)
proof
  induction Nat
  case 0 {
    arbitrary xs:List<Nat>, ys:List<Nat>
    suppose prem: length(xs) + length(ys) = 0
    <<mset_of_merge_case_zero_xs_ys_empty>>
    <<mset_of_merge_case_zero_suffices>>
    <<mset_of_merge_case_zero_conclusion>>
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>, ys:List<Nat>
    suppose prem: length(xs) + length(ys) = suc(n')
    switch xs for merge {
      case empty {
        <<mset_of_merge_case_suc_empty>>
      }
      case node(x, xs') suppose xs_xxs {
        switch ys for merge {
          case empty {
            <<mset_of_merge_case_suc_node_empty>>
          }
          case node(y, ys') suppose ys_yys {
            switch x ≤ y {
              case true suppose xy_true {
                <<mset_of_merge_x_le_y_IH_prem>>
                <<mset_of_merge_x_le_y_equations>>
              }
              case false suppose xy_false {
                <<mset_of_merge_x_g_y_IH_prem>>
                <<mset_of_merge_x_g_y_equations>>
              }
            }
          }
        }
      }
    }
  }
end

The mset_of_merge theorem also holds for sets, using the set_of function. We prove the following set_of_merge theorem as a corollary of mset_of_merge.

theorem set_of_merge: all xs:List<Nat>, ys:List<Nat>.
  set_of(merge(length(xs) + length(ys), xs, ys)) = set_of(xs) ∪ set_of(ys)
proof
  arbitrary xs:List<Nat>, ys:List<Nat>
  equations
    set_of(merge(length(xs) + length(ys), xs, ys))
        = set_of_mset(mset_of(merge(length(xs) + length(ys), xs, ys)))
            by symmetric som_mset_eq_set[Nat]
                             [merge(length(xs) + length(ys), xs, ys)]
    ... = set_of_mset(mset_of(xs) ⨄ mset_of(ys))
            by rewrite mset_of_merge[length(xs) + length(ys)][xs, ys]
    ... = set_of_mset(mset_of(xs)) ∪ set_of_mset(mset_of(ys))
            by som_union[Nat, mset_of(xs), mset_of(ys)]
    ... = set_of(xs) ∪ set_of(ys)
            by rewrite som_mset_eq_set[Nat][xs] | som_mset_eq_set[Nat][ys]
end

Prove the `merge_sorted` theorem

Next up is the merge_sorted theorem.

theorem merge_sorted: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if sorted(xs) and sorted(ys) and length(xs) + length(ys) = n
  then sorted(merge(n, xs, ys))

The structure of the proof will be similar to the one for mset_of_merge, because they both follow the structure of merge. So we begin with induction on Nat.

In the case for n = 0, we need to prove sorted(merge(0, xs, ys)). But merge(0, xs, ys) = empty, and sorted(empty) is trivially true. So we conclude the case for n = 0 as follows.

    // <<merge_sorted_case_zero>> =
    arbitrary xs:List<Nat>, ys:List<Nat>
    suppose _
    suffices sorted(empty)  with definition merge
    definition sorted

We move on to the case for n = suc(n') and xs = empty. Here merge returns ys, and we already know that ys is sorted from the premise.

    // <<merge_sorted_case_suc_empty>> =
    conclude sorted(ys) by prem

In the case for xs = node(x, xs') and ys = empty, the merge function returns node(x, xs') (that is, xs), and we already know that xs is sorted from the premise.

    // <<merge_sorted_case_suc_node_empty>> =
    conclude sorted(node(x,xs'))  by rewrite xs_xxs in prem

In the case for ys = node(y, ys') and (x ≤ y) = true, the merge function returns node(x, merge(n',xs',node(y,ys'))). So we need to prove the following.

    // <<merge_sorted_less_equal_suffices>> =
    suffices sorted(merge(n',xs',node(y,ys'))) and
             all_elements(merge(n',xs',node(y,ys')), λb{x ≤ b}) 
                 with definition sorted

To prove the first, we invoke the induction hypothesis instantiated to xs' and node(y,ys') as follows.

    // <<merge_sorted_IH_xs_yys>> =
    have s_xs: sorted(xs')
      by definition sorted in rewrite xs_xxs in prem
    have s_yys: sorted(node(y,ys'))
      by rewrite ys_yys in prem
    have len_xs_yys: length(xs') + length(node(y,ys')) = n'
      by enable {operator +, operator +, length}
         have sxs: suc(length(xs')) + suc(length(ys')) = suc(n')
            by rewrite xs_xxs | ys_yys in prem
         injective suc sxs
    have IH_xs_yys: sorted(merge(n',xs',node(y,ys')))
      by apply IH[xs',node(y,ys')] to s_xs, s_yys, len_xs_yys

It remains to prove that x is less-or-equal to all the elements in the rest of the output list:

  all_elements(merge(n',xs',node(y,ys')),λb{x ≤ b})

The theorem all_elements_eq_member in List.pf says

  all_elements(xs,P) = (all x:T. if x ∈ set_of(xs) then P(x))

which, combined with the set_of_merge corollary above, simplifies our goal as follows.

    // <<x_le_merge_suffices>> =
    suffices (all z:Nat. (if z ∈ set_of(xs') ∪ set_of(node(y, ys')) then x ≤ z))
        with rewrite all_elements_eq_member[Nat, merge(n', xs', node(y,ys')),
                                            λb{x ≤ b}]
                   | symmetric len_xs_yys | set_of_merge[xs',node(y,ys')]
    arbitrary z:Nat
    suppose z_in_xs_yys: z ∈ set_of(xs') ∪ set_of(node(y,ys'))

So we have a few cases to consider and need to prove x ≤ z in each one. Consider the case where z ∈ set_of(xs'). Because node(x, xs') is sorted, we know x is less-or-equal to every element of xs':

  // <<merge_sorted_x_le_xs>> =
  have x_le_xs: all_elements(xs', λb{x ≤ b})
    by definition sorted in rewrite xs_xxs in prem

so x is less-or-equal to z, being one of the elements in xs'.

  // <<merge_sorted_z_in_xs>> =
  conclude x ≤ z by
    apply all_elements_member[Nat][xs'][z, λb{x ≤ b}]
    to x_le_xs, z_in_xs

Next, consider the case where z ∈ single(y) and therefore y = z. Then we can immediately conclude because x ≤ y.

    // <<merge_sorted_z_in_y>> =
    have y_z: y = z   by definition {operator ∈, single, rep} in z_sy
    conclude x ≤ z    by rewrite symmetric y_z | xy_true

Finally, consider when z ∈ set_of(ys'). Because node(y,ys') is sorted, we know that y is less-or-equal to all elements of ys'.

    // <<merge_sorted_y_le_ys>> =
    have y_le_ys: all_elements(ys', λb{y ≤ b})
      by definition sorted in rewrite ys_yys in prem

Therefore we have y ≤ z. Combined with x ≤ y, we conclude that x ≤ z by transitivity.

    // <<merge_sorted_z_in_ys>> =
    have y_z: y ≤ z
      by apply all_elements_member[Nat][ys'][z,λb{y ≤ b}]
         to y_le_ys, z_in_ys
    have x_y: x ≤ y by rewrite xy_true
    conclude x ≤ z
        by apply less_equal_trans[x][y,z] to x_y, y_z

The last case to consider is for ys = node(y, ys') and (x ≤ y) = false. The reasoning is similar to the case for (x ≤ y) = true, so we skip the detailed explanation and refer the reader to the proof below.

Here’s the completed proof of merge_sorted.

theorem merge_sorted: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if sorted(xs) and sorted(ys) and length(xs) + length(ys) = n
  then sorted(merge(n, xs, ys))
proof
  induction Nat
  case 0 {
    <<merge_sorted_case_zero>>
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>, ys:List<Nat>
    suppose prem
    switch xs for merge {
      case empty {
        <<merge_sorted_case_suc_empty>>
      }
      case node(x, xs') suppose xs_xxs {
        switch ys for merge {
          case empty {
            <<merge_sorted_case_suc_node_empty>>
          }
          case node(y, ys') suppose ys_yys {
            <<merge_sorted_IH_xs_yys>>
            <<merge_sorted_x_le_xs>>
            <<merge_sorted_y_le_ys>>
            switch x ≤ y {
              case true suppose xy_true {
                <<merge_sorted_less_equal_suffices>>
                have x_le_merge: all_elements(merge(n',xs',node(y,ys')),λb{x ≤ b}) by
                    <<x_le_merge_suffices>>
                    suffices x ≤ z  by .
                    cases apply member_union[Nat] to z_in_xs_yys
                    case z_in_xs: z ∈ set_of(xs') {
                      <<merge_sorted_z_in_xs>>
                    }
                    case z_in_ys: z ∈ set_of(node(y,ys')) {
                      cases apply member_union[Nat] to definition set_of in z_in_ys
                      case z_sy: z ∈ single(y) {
                        <<merge_sorted_z_in_y>>
                      }
                      case z_in_ys: z ∈ set_of(ys') {
                        <<merge_sorted_z_in_ys>>
                      }
                    }
                IH_xs_yys, x_le_merge
              }
              case false suppose xy_false {
              
                /* Apply the induction hypothesis
                 * to prove sorted(merge(n',node(x,xs'),ys'))
                 */
                have len_xxs_ys: length(node(x,xs')) + length(ys') = n'
                  by enable {operator +, operator +, length}
                     have suc_len: suc(length(xs') + suc(length(ys'))) = suc(n')
                       by rewrite xs_xxs | ys_yys in prem
                     injective suc
                     rewrite add_suc[length(xs')][length(ys')] in suc_len
                have s_xxs: sorted(node(x, xs'))
                  by enable sorted rewrite xs_xxs in prem
                have s_ys: sorted(ys')
                  by definition sorted in rewrite ys_yys in prem
                have IH_xxs_ys: sorted(merge(n',node(x,xs'),ys'))
                  by apply IH[node(x,xs'),ys'] to s_xxs, s_ys, len_xxs_ys

                have not_x_y: not (x ≤ y)
                  by suppose xs rewrite xy_false in xs
                have y_x: y ≤ x
                  by apply less_implies_less_equal[y][x] to
                     (apply not_less_equal_greater[x,y] to not_x_y)
                suffices sorted(merge(n',node(x,xs'),ys')) and
                         all_elements(merge(n',node(x,xs'),ys'),λb{y ≤ b}) 
                    with definition sorted
                have y_le_merge: all_elements(merge(n',node(x,xs'),ys'),λb{y ≤ b}) by
                    suffices (all z:Nat. (if z ∈ set_of(node(x, xs')) ∪ set_of(ys') then y ≤ z))
                        with rewrite all_elements_eq_member[Nat,merge(n',node(x,xs'),ys'),λb{y ≤ b}]
                                   | symmetric len_xxs_ys | set_of_merge[node(x,xs'),ys']
                    arbitrary z:Nat
                    suppose z_in_xxs_ys: z ∈ set_of(node(x,xs')) ∪ set_of(ys')
                    suffices y ≤ z  by.
                    cases apply member_union[Nat] to z_in_xxs_ys
                    case z_in_xxs: z ∈ set_of(node(x,xs')) {
                      have z_in_sx_or_xs: z ∈ single(x) or z ∈ set_of(xs')
                        by apply member_union[Nat] to definition set_of in z_in_xxs
                      cases z_in_sx_or_xs
                      case z_in_sx: z ∈ single(x) {
                        have x_z: x = z  by definition {operator ∈, single, rep} in z_in_sx
                        conclude y ≤ z  by rewrite x_z in y_x
                      }
                      case z_in_xs: z ∈ set_of(xs') {
                        have x_z: x ≤ z
                          by apply all_elements_member[Nat][xs'][z,λb{x ≤ b}]
                             to x_le_xs, z_in_xs
                        conclude y ≤ z 
                           by apply less_equal_trans[y][x,z] to y_x, x_z
                      }
                    }
                    case z_in_ys: z ∈ set_of(ys') {
                      conclude y ≤ z by
                        apply all_elements_member[Nat][ys'][z,λb{y ≤ b}]
                        to y_le_ys, z_in_ys
                    }
                IH_xxs_ys, y_le_merge
              }
            }
          }
        }
      }
    }
  }
end

Prove correctness of `msort`

First we show that the two lists produced by msort contain the same elements as the input list.

theorem mset_of_msort: all n:Nat. all xs:List<Nat>.
  mset_of(first(msort(n, xs)))  ⨄  mset_of(second(msort(n, xs))) = mset_of(xs)
proof
  induction Nat
  case 0 {
    arbitrary xs:List<Nat>
    switch xs for msort {
      case empty {
        suffices m_fun[Nat](λx{0}) ⨄ m_fun[Nat](λx{0}) = m_fun[Nat](λx{0})
            with definition {first, second, mset_of}
        rewrite m_sum_empty[Nat,m_fun(λx{0})]
      }
      case node(x, xs') {
        suffices (m_one(x) ⨄ m_fun[Nat](λx{0})) ⨄ mset_of(xs')
               = m_one(x) ⨄ mset_of(xs')
            with definition {first, second, mset_of, mset_of}
        rewrite m_sum_empty[Nat,m_one(x)]
      }
    }
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>
    suffices mset_of(merge(length(first(msort(n', xs))) 
                           + length(first(msort(n', second(msort(n', xs))))),
                           first(msort(n', xs)),
                           first(msort(n', second(msort(n', xs)))))) 
             ⨄ mset_of(second(msort(n', second(msort(n', xs))))) 
             = mset_of(xs)
        with definition {msort, first, second}
    define ys = first(msort(n',xs))
    define ls = second(msort(n',xs))
    define zs = first(msort(n', ls))
    define ms = second(msort(n', ls))
    equations
          mset_of(merge(length(ys) + length(zs),ys,zs)) ⨄ mset_of(ms)
        = (mset_of(ys) ⨄ mset_of(zs)) ⨄ mset_of(ms)
          by rewrite (mset_of_merge[length(ys) + length(zs)][ys,zs])
    ... = mset_of(ys) ⨄ (mset_of(zs) ⨄ mset_of(ms))
          by rewrite m_sum_assoc[Nat, mset_of(ys), mset_of(zs), mset_of(ms)]
    ... = mset_of(ys) ⨄ mset_of(ls)
          by rewrite conclude mset_of(zs) ⨄ mset_of(ms) = mset_of(ls)
                     by enable {zs, ms} IH[ls]
    ... = mset_of(xs)
          by enable {ys, ls} IH[xs]
  }
end

Next, we prove that the first output list is sorted. We make use of the merge_sorted theorem in this proof.

theorem msort_sorted: all n:Nat. all xs:List<Nat>. 
  sorted(first(msort(n, xs)))
proof
  induction Nat
  case 0 {
    arbitrary xs:List<Nat>
    switch xs {
      case empty {
        suffices sorted(empty)  with definition {msort, first}
        definition sorted
      }
      case node(x, xs') {
        suffices sorted(node(x,empty))
            with definition {msort, first}
        definition {sorted, sorted, all_elements}
      }
    }
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>
    suffices sorted(merge(length(first(msort(n', xs))) 
                          + length(first(msort(n', second(msort(n', xs))))), 
                          first(msort(n', xs)), 
                          first(msort(n', second(msort(n', xs))))))
        with definition {msort, first}
    define ys = first(msort(n', xs))
    define ls = second(msort(n', xs))
    define zs = first(msort(n', ls))
    have IH1: sorted(ys)  by enable {ys}  IH[xs]
    have IH2: sorted(zs)  by enable {zs}  IH[ls]
    conclude sorted(merge(length(ys) + length(zs), ys, zs))
      by apply merge_sorted[length(ys) + length(zs)][ys, zs] to IH1, IH2
  }
end

It remains to show that the first output of msort is of length min(2ⁿ,length(xs)). Instead of using min, I separated the proof into a couple of cases depending on whether 2ⁿ ≤ length(xs). However, I first needed to prove that the lengths of the two output lists add up to the length of the input list.

theorem msort_length: all n:Nat. all xs:List<Nat>.
  length(first(msort(n, xs)))  +  length(second(msort(n, xs))) = length(xs)

The proof of msort_length required a theorem that the length of the output of merge is the sum of the lengths of the inputs.

theorem merge_length: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if length(xs) + length(ys) = n
  then length(merge(n, xs, ys)) = n

So in the case when the length of the input list is at least 2ⁿ, the first output of msort is of length 2ⁿ.

theorem msort_length_less_equal: all n:Nat. all xs:List<Nat>.
  if pow2(n) ≤ length(xs)
  then length(first( msort(n, xs) )) = pow2(n)
proof
  induction Nat
  case 0 {
    arbitrary xs:List<Nat>
    suppose prem
    switch xs {
      case empty suppose xs_mt {
        conclude false
            by definition {pow2, length, operator≤} in
               rewrite xs_mt in prem
      }
      case node(x, xs') suppose xs_xxs {
        suffices length(node(x,empty)) = pow2(0)
            with definition {msort,first}
        definition {length, length, pow2, operator+, operator+}
      }
    }
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>
    suppose prem
    have len_xs: pow2(n') + pow2(n') ≤ length(xs)
      by rewrite add_zero[pow2(n')] in
         definition {pow2, operator*, operator*,operator*} in prem
    suffices length(merge(length(first(msort(n', xs))) 
                            + length(first(msort(n', second(msort(n', xs))))), 
                          first(msort(n', xs)), 
                          first(msort(n', second(msort(n', xs))))))
             = 2 * pow2(n')
        with definition {pow2, msort, first}
    define ys = first(msort(n',xs))
    define ls = second(msort(n',xs))
    define zs = first(msort(n', ls))
    define ms = second(msort(n', ls))
    have len_ys: length(ys) = pow2(n') by {
         have p2n_le_xs: pow2(n') ≤ length(xs) by
             have p2n_le_2p2n: pow2(n') ≤ pow2(n') + pow2(n') by
                 less_equal_add[pow2(n')][pow2(n')]
             apply less_equal_trans[pow2(n')][pow2(n') + pow2(n'), length(xs)]
             to p2n_le_2p2n, len_xs
         enable {ys} 
         apply IH[xs] to p2n_le_xs
    }
    have len_zs: length(zs) = pow2(n') by {
         have len_ys_ls_eq_xs: length(ys) + length(ls) = length(xs)
           by enable {ys, ls} msort_length[n'][xs]
         have p2n_le_ls: pow2(n') ≤ length(ls)
           by have pp_pl: pow2(n') + pow2(n') ≤ pow2(n') + length(ls)
                by rewrite symmetric len_ys_ls_eq_xs | len_ys in len_xs
              apply less_equal_left_cancel[pow2(n')][pow2(n'), length(ls)] to pp_pl
         enable {zs} 
         apply IH[ls] to p2n_le_ls
    }
    have len_ys_zs: length(ys) + length(zs) = 2 * pow2(n') by {
      equations
        length(ys) + length(zs) 
          = pow2(n') + pow2(n')    by rewrite len_ys | len_zs
      ... = 2 * pow2(n')           by symmetric two_mult[pow2(n')]
    }
    equations
          length(merge(length(ys) + length(zs), ys, zs))
        = length(merge(2 * pow2(n'), ys, zs))   by rewrite len_ys_zs
    ... = 2 * pow2(n')                          by apply merge_length[2 * pow2(n')][ys, zs] to len_ys_zs
  }
end

When the length of the input list is less than 2ⁿ, the length of the first output is the same as the length of the input.

theorem msort_length_less: all n:Nat. all xs:List<Nat>.
  if length(xs) < pow2(n)
  then length(first( msort(n, xs) )) = length(xs)
proof
  induction Nat
  case 0 {
    arbitrary xs:List<Nat>
    suppose prem
    switch xs {
      case empty suppose xs_mt {
        definition {msort, length, first}
      }
      case node(x, xs') suppose xs_xxs {
        suffices 1 + 0 = 1 + length(xs')
            with definition {msort, first, length, length}
        have xs_0: length(xs') = 0
            by definition {operator ≤, length, operator+, operator+, operator<, 
                           pow2, operator ≤, operator ≤} in 
               rewrite xs_xxs in prem
        rewrite xs_0
      }
    }
  }
  case suc(n') suppose IH {
    arbitrary xs:List<Nat>
    suppose prem
    suffices length(merge(length(first(msort(n', xs))) 
                          + length(first(msort(n', second(msort(n', xs))))), 
                          first(msort(n', xs)), 
                          first(msort(n', second(msort(n', xs))))))
             = length(xs)
        with definition{msort, first}
    define ys = first(msort(n',xs))
    define ls = second(msort(n',xs))
    define zs = first(msort(n', ls))
    define ms = second(msort(n', ls))

    have xs_le_two_p2n: length(xs) < pow2(n') + pow2(n')
      by rewrite add_zero[pow2(n')] in
         definition {pow2, operator*,operator*,operator*} in prem

    have ys_ls_eq_xs: length(ys) + length(ls) = length(xs)
      by enable {ys,ls} msort_length[n'][xs]

    have pn_xs_or_xs_pn: pow2(n') ≤ length(xs) or length(xs) < pow2(n')
      by dichotomy[pow2(n'), length(xs)]
    cases pn_xs_or_xs_pn
    case pn_xs: pow2(n') ≤ length(xs) {
    
      have ys_pn: length(ys) = pow2(n')
          by enable {ys} apply msort_length_less_equal[n'][xs] to pn_xs

      have ls_l_pn: length(ls) < pow2(n')
          by have pn_ls_l_2pn: pow2(n') + length(ls) < pow2(n') + pow2(n')
               by rewrite symmetric ys_ls_eq_xs | ys_pn in xs_le_two_p2n
             apply less_left_cancel[pow2(n'), length(ls), pow2(n')] to pn_ls_l_2pn

      have len_zs: length(zs) = length(ls)
          by enable {zs} apply IH[ls] to ls_l_pn

      equations
        length(merge(length(ys) + length(zs),ys,zs))
            = length(ys) + length(zs)
              by merge_length[length(ys) + length(zs)][ys,zs]
        ... = length(ys) + length(ls)
              by rewrite len_zs
        ... = length(xs)
              by ys_ls_eq_xs
    }
    case xs_pn: length(xs) < pow2(n') {
    
      have len_ys: length(ys) = length(xs)
        by enable {ys} apply IH[xs] to xs_pn

      have len_ls: length(ls) = 0
        by apply left_cancel[length(ys)][length(ls), 0] to
           suffices length(ys) + length(ls) = length(ys)
               by rewrite add_zero[length(ys)]
           rewrite symmetric len_ys in ys_ls_eq_xs

      have ls_l_pn: length(ls) < pow2(n')
        by rewrite symmetric len_ls in pow_positive[n'] 
      
      have len_zs: length(zs) = 0
        by enable {zs} rewrite len_ls in apply IH[ls] to ls_l_pn

      equations
        length(merge(length(ys) + length(zs),ys,zs))
          = length(ys) + length(zs)
            by merge_length[length(ys) + length(zs)][ys, zs]
      ... = length(xs)
            by rewrite len_zs | add_zero[length(ys)] | len_ys
    }
  }
end
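The length theorems can also be spot-checked outside of Deduce. Here is a minimal Python sketch that models msort on Python lists. The model is an assumption reconstructed from the proof goals above (the Deduce definition of msort is not repeated in this part of the post), and the Python merge drops the fuel parameter n that the Deduce merge takes. The helper name check_lengths is made up for this illustration.

```python
def merge(xs, ys):
    """Merge two sorted Python lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]

def msort(n, xs):
    """Assumed model: return (sorted prefix of up to 2**n elements, leftovers)."""
    if n == 0:
        return (xs[:1], xs[1:])
    ys, ls = msort(n - 1, xs)   # sort the first half of the prefix
    zs, ms = msort(n - 1, ls)   # sort the second half from the leftovers
    return (merge(ys, zs), ms)

def check_lengths(n, xs):
    """Check the three length properties for one input."""
    first, second = msort(n, xs)
    assert len(first) + len(second) == len(xs)   # msort_length
    if 2 ** n <= len(xs):
        assert len(first) == 2 ** n              # msort_length_less_equal
    else:
        assert len(first) == len(xs)             # msort_length_less
    return True

# Try every combination of small n and small list sizes.
for n in range(5):
    for size in range(20):
        check_lengths(n, list(range(size, 0, -1)))
```

Passing checks like these are not a proof, of course, but they are a cheap way to catch a misstated theorem before investing in the induction.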

Prove correctness of `merge_sort`

The proof that merge_sort produces a sorted list is a straightforward corollary of the msort_sorted theorem.

theorem merge_sort_sorted: all xs:List<Nat>.
  sorted(merge_sort(xs))
proof
  arbitrary xs:List<Nat>
  suffices sorted(first(msort(log(length(xs)), xs)))
      with definition merge_sort
  msort_sorted[log(length(xs))][xs]
end

The proof that the contents of the output of merge_sort are the same as the input is a bit more involved. If we use the definition of merge_sort, we then need to show that

mset_of(first(msort(log(length(xs)),xs))) = mset_of(xs)

which means we need to show that all the elements in xs end up in the first output and that there are not any leftovers. Let ys be the first output of msort and ls be the leftovers. The theorem less_equal_pow_log in Log.pf tells us that length(xs) ≤ pow2(log(length(xs))). So in the case where they are equal, we can use the msort_length_less_equal theorem to show that length(ys) = length(xs). In the case where length(xs) is strictly smaller, we use the msort_length_less theorem to prove that length(ys) = length(xs). Finally, we show that the length of ls is zero by use of msort_length and some properties of arithmetic like left_cancel (in Nat.pf).

Here is the proof of mset_of_merge_sort in full.

theorem mset_of_merge_sort: all xs:List<Nat>.
  mset_of(merge_sort(xs)) = mset_of(xs)
proof
  arbitrary xs:List<Nat>
  suffices mset_of(first(msort(log(length(xs)), xs))) = mset_of(xs)
      with definition merge_sort
  define n = log(length(xs))
  define ys = first(msort(n,xs))
  define ls = second(msort(n,xs))

  have len_xs: length(xs) ≤ pow2(n)
    by enable {n} less_equal_pow_log[length(xs)]
  have len_ys: length(ys) = length(xs)
    by cases apply less_equal_implies_less_or_equal[length(xs)][pow2(n)]
             to len_xs
       case len_xs_less: length(xs) < pow2(n) {
         enable {ys} apply msort_length_less[n][xs] to len_xs_less
       }
       case len_xs_equal: length(xs) = pow2(n) {
         have pn_le_xs: pow2(n) ≤ length(xs)
           by apply equal_implies_less_equal to symmetric len_xs_equal
         have len_ys_pow2: length(ys) = pow2(n)
           by enable {ys} apply msort_length_less_equal[n][xs] to pn_le_xs
         transitive len_ys_pow2 (symmetric len_xs_equal)
       }
  have len_ys_ls_eq_xs: length(ys) + length(ls) = length(xs)
    by enable {ys, ls} msort_length[n][xs]
  have len_ls: length(ls) = 0
    by apply left_cancel[length(ys)][length(ls), 0] to
       suffices length(ys) + length(ls) = length(ys)
           with rewrite add_zero[length(ys)]
       rewrite symmetric len_ys in len_ys_ls_eq_xs
  have ls_mt: ls = empty
    by apply length_zero_empty[Nat, ls] to len_ls

  have ys_ls_eq_xs: mset_of(ys)  ⨄  mset_of(ls) = mset_of(xs)
    by enable {ys, ls} mset_of_msort[n][xs]

  equations
          mset_of(ys)
        = mset_of(ys)  ⨄  m_fun(λx{0})   by rewrite m_sum_empty[Nat,mset_of(ys)]
    ... = mset_of(ys)  ⨄  mset_of(empty) by definition mset_of
    ... = mset_of(ys)  ⨄  mset_of(ls)    by rewrite ls_mt 
    ... = mset_of(xs)                    by ys_ls_eq_xs
end

Exercise: `merge_length` and `msort_length`

Prove the following theorems.

theorem merge_length: all n:Nat. all xs:List<Nat>, ys:List<Nat>.
  if length(xs) + length(ys) = n
  then length(merge(n, xs, ys)) = n

theorem msort_length: all n:Nat. all xs:List<Nat>.
  length(first(msort(n, xs)))  +  length(second(msort(n, xs))) = length(xs)

Exercise: classic Merge Sort

Test and prove the correctness of the classic definition of merge_sort, which we repeat here.

function msort(Nat, List<Nat>) -> List<Nat> {
  msort(0, xs) = xs
  msort(suc(n'), xs) =
    define p = split(xs)
    merge(msort(n', first(p)), msort(n', second(p)))
}

define merge_sort : fn List<Nat> -> List<Nat>
  = λxs{ msort(log(length(xs)), xs) }

You will need to define the split function.

Insertion Sort, Correctly

2024-06-17T10:29:00.000-07:00

Insertion Sort

This is the third blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

In this blog post we study a simple but slow sorting algorithm, Insertion Sort. (We will study the faster Merge Sort in the next blog post.) Insertion Sort is, roughly speaking, how many people sort the cards in their hand when playing a card game. The basic idea is to take one card at a time and place it into the correct location amongst the cards that you’ve already sorted.

Specification: The insertion_sort(xs) function returns a list that contains the same elements as xs but the elements in the result are in sorted order.

Of course, to make this specification precise, we need to define "sorted". There are several ways to go with this formal definition. Here is one that works well for me. It requires each element in the list to be less-or-equal to all the elements that come after it.

function sorted(List<Nat>) -> bool {
  sorted(empty) = true
  sorted(node(x, xs)) =
    sorted(xs) and all_elements(xs, λy{ x ≤ y })
}
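To get a feel for this definition, here is a rough Python analogue. It is only a sketch: Python lists stand in for List<Nat>, and the names all_elements and is_sorted are chosen for this illustration rather than taken from the Deduce library.

```python
def all_elements(xs, pred):
    """True when pred holds for every element of xs."""
    return all(pred(y) for y in xs)

def is_sorted(xs):
    """Mirror of the Deduce definition: the head is less-or-equal to
    every later element, and the tail is itself sorted."""
    if not xs:
        return True
    x, rest = xs[0], xs[1:]
    return is_sorted(rest) and all_elements(rest, lambda y: x <= y)
```

Note that this definition compares each element against all later elements, not just its neighbor. That does more work when executed, but it tends to be convenient in proofs because the fact about "all later elements" is available directly.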

We follow the write-test-prove approach to develop a correct implementation of insertion_sort. We then propose an exercise for the reader.

Write the `insertion_sort` function

Because insertion_sort operates on the recursive type List, we’ll try to implement insertion_sort as a recursive function.

function insertion_sort(List<Nat>) -> List<Nat> {
  insertion_sort(empty) = ?
  insertion_sort(node(x, xs')) = ?
}

In the case for the empty list, we need to return a list with the same contents, so we had better return empty.

function insertion_sort(List<Nat>) -> List<Nat> {
  insertion_sort(empty) = empty
  insertion_sort(node(x, xs')) = ?
}

In the case for node(x, xs'), we can make the recursive call insertion_sort(xs') to sort the rest of the list.

function insertion_sort(List<Nat>) -> List<Nat> {
  insertion_sort(empty) = empty
  insertion_sort(node(x, xs')) = ... insertion_sort(xs') ...
}

But what do we do with the element x? This is where we need to define an auxiliary function that inserts x into the appropriate location within the result of sorting the rest of the list. We’ll choose the name insert for this auxiliary function. Here is the completed code for insertion_sort.

function insertion_sort(List<Nat>) -> List<Nat> {
  insertion_sort(empty) = empty
  insertion_sort(node(x, xs')) = insert(insertion_sort(xs'), x)
}

Of course, we’ll follow the write-test-prove approach to develop the insert function. The first thing we need to do is write down the specification. The specification of insert will play an important role in the proof of correctness of insertion_sort, because we’ll use the correctness theorems about insert in the proof of the correctness theorems about insertion_sort. With this in mind, here’s a specification for insert.

Specification: The insert(xs, y) function takes a sorted list xs and value y as input and returns a sorted list that contains y and the elements of xs.

Next we write the code for insert. This function also has a List as input, so we define yet another recursive function.

function insert(List<Nat>,Nat) -> List<Nat> {
  insert(empty, y) = ?
  insert(node(x, xs), y) = ?
}

For the case empty we must return a list that contains y, so it must be node(y, empty).

function insert(List<Nat>,Nat) -> List<Nat> {
  insert(empty, y) = node(y, empty)
  insert(node(x, xs), y) = ?
}

In the case for node(x, xs'), we need to check whether y should come before x. So we test y ≤ x and if that’s the case, we return node(y, node(x, xs')). Otherwise, y belongs somewhere later in the sequence, so we make the recursive call and return node(x, insert(xs', y)).

function insert(List<Nat>,Nat) -> List<Nat> {
  insert(empty, y) = node(y, empty)
  insert(node(x, xs'), y) =
    if y ≤ x then
      node(y, node(x, xs'))
    else
      node(x, insert(xs', y))
}
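For readers who want to experiment with the algorithm outside of Deduce, the two functions can be modeled in Python as follows. This is a sketch under the assumption that Python lists stand in for List<Nat>; it follows the same case structure as the Deduce code.

```python
def insert(xs, y):
    """Insert y into the sorted list xs, keeping the result sorted."""
    if not xs:
        return [y]                    # insert(empty, y) = node(y, empty)
    x, rest = xs[0], xs[1:]
    if y <= x:
        return [y, x] + rest          # y belongs in front of x
    return [x] + insert(rest, y)      # y belongs somewhere later

def insertion_sort(xs):
    """Sort xs by inserting its head into the sorted tail."""
    if not xs:
        return []
    return insert(insertion_sort(xs[1:]), xs[0])
```

For example, insertion_sort([8, 3, 7, 3]) first sorts [3, 7, 3] and then inserts 8 into the result.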

Test

This time we have 2 functions to test, insert and insertion_sort. We start with insert because if there are bugs in insert, then it will be confusing to find out about them when testing insertion_sort.

Test `insert`

Looking at the specification for insert, we need to check whether the resulting list is sorted and we need to check that the resulting list contains the elements from the input and the inserted element. We could use the search function that we developed in the previous blog post to check whether the elements from the input list are in the output list. However, doing that would ignore a subtle issue, which is that there can be one or more occurrences of the same element in the input list, and the output list should have the same number of occurrences. To take this into account, we need a new function to count the number of occurrences of an element.

function count<T>(List<T>) -> fn T->Nat {
  count(empty) = λy{ 0 }
  count(node(x, xs')) = λy{
    if y = x then 
      suc(count(xs')(y))
    else
      count(xs')(y) 
  }
}

Here’s a test that checks whether insert produces a sorted list with the correct count for every element in the input list as well as the inserted element.

define list_1234 = node(1, node(2, node(3, node(4, empty))))
define list_12334 = insert(list_1234, 3)
assert sorted(list_12334)
assert all_elements(node(3, list_1234), λx{
  if x = 3 then
    count(list_12334)(x) = suc(count(list_1234)(x))
  else
    count(list_12334)(x) = count(list_1234)(x)
  })

It’s a good idea to test corner cases, that is, inputs that trigger different code paths through the insert function. As there is a case for the empty list in the code, that’s a good test case to consider.

define list_3 = insert(empty, 3)
assert sorted(list_3)
assert all_elements(node(3, empty), λx{
  if x = 3 then
    count(list_3)(x) = suc(count(empty : List<Nat>)(x))
  else
    count(list_3)(x) = count(empty : List<Nat>)(x)
  })

Ideally we would also test with hundreds of randomly-generated lists. Adding support for random number generation is high on the TODO list for Deduce.
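In the meantime, here is a sketch of what such randomized testing could look like in Python, using a Python model of insert. Everything here is illustrative: the helper names random_sorted_list and check_insert, the list sizes, and the value range are assumptions, and collections.Counter plays the role of the count function.

```python
import random
from collections import Counter

def insert(xs, y):
    """Python model of the Deduce insert function."""
    if not xs:
        return [y]
    x, rest = xs[0], xs[1:]
    if y <= x:
        return [y, x] + rest
    return [x] + insert(rest, y)

def random_sorted_list(rng, max_len=20, max_val=50):
    """Generate a sorted list, since insert requires a sorted input."""
    length = rng.randint(0, max_len)
    return sorted(rng.randint(0, max_val) for _ in range(length))

def check_insert(trials=300, seed=0):
    """Run the two checks from the specification on random inputs."""
    rng = random.Random(seed)   # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = random_sorted_list(rng)
        y = rng.randint(0, 50)
        out = insert(xs, y)
        # The output is sorted.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Every element occurs the right number of times.
        assert Counter(out) == Counter(xs) + Counter([y])
    return trials

check_insert()
```

Generating sorted inputs matters here: the specification of insert only promises a sorted output when the input list is sorted.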

Test `insertion_sort`

If we refer back to the specification of insertion_sort, we need to check that the output list is sorted and that it contains the same elements as the input list.

define list_8373 = node(8, node(3, node(7, node(3, empty))))
define list_3378 = insertion_sort(list_8373)
assert sorted( list_3378 )
assert all_elements(list_8373, λx{count(list_3378)(x) = count(list_8373)(x)})

Prove

The next step in the process is to prove the correctness of insert and insertion_sort with respect to their specifications.

Prove correctness of `insert`

Our first task is to prove that insert(xs, y) produces a list that contains y and the elements of xs. In our tests, we used the count function to accomplish this. Note that count returns a function fn T->Nat, which is the same thing as a multiset. The file MultiSet.pf defines the MultiSet<T> type and operations on it, such as m_one(x) for creating a multiset that only contains x and the operator A ⨄ B for combining two multisets. The file List.pf defines a function mset_of that converts a list into a multiset.

function mset_of<T>(List<T>) -> MultiSet<T> {
  mset_of(empty) = m_fun(λx{0})
  mset_of(node(x, xs)) = m_one(x) ⨄ mset_of(xs)
}

So we express the requirements on the contents of insert(xs, y) as follows: converting insert(xs, y) into a multiset is the same as converting xs into a multiset and then adding y. The proof is relatively straightforward, making use of several theorems about multisets from MultiSet.pf.

theorem insert_contents: all xs:List<Nat>. all y:Nat.
  mset_of(insert(xs,y)) = m_one(y) ⨄ mset_of(xs)
proof
  induction List<Nat>
  case empty {
    arbitrary y:Nat
    conclude mset_of(insert(empty,y)) = m_one(y) ⨄ mset_of(empty)
        by definition {insert, mset_of, mset_of}
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat
    switch y ≤ x for insert {
      case true suppose yx_true {
        conclude mset_of(node(y,node(x,xs'))) = m_one(y) ⨄ mset_of(node(x,xs'))
            by definition {mset_of, mset_of}
      }
      case false suppose yx_false {
        equations
              mset_of(node(x,insert(xs',y))) 
            = m_one(x) ⨄ mset_of(insert(xs',y))
              by definition mset_of
        ... = m_one(x) ⨄ (m_one(y) ⨄ mset_of(xs'))
              by rewrite IH[y]
        ... = (m_one(x) ⨄ m_one(y)) ⨄ mset_of(xs')
              by rewrite m_sum_assoc<Nat>[m_one(x),m_one(y),mset_of(xs')]
        ... = (m_one(y) ⨄ m_one(x)) ⨄ mset_of(xs')
              by rewrite m_sum_commutes<Nat>[ m_one(x), m_one(y)]
        ... = m_one(y) ⨄ (m_one(x) ⨄ mset_of(xs'))
              by rewrite m_sum_assoc<Nat>[m_one(y),m_one(x),mset_of(xs')]
        ... = m_one(y) ⨄ mset_of(node(x,xs'))
              by definition mset_of
      }
    }
  }
end

Our second task is to prove that insert produces a sorted list, assuming the input list is sorted.

theorem insert_sorted: all xs:List<Nat>. all y:Nat.
  if sorted(xs) then sorted(insert(xs, y))
proof
  ?
end

Because insert is a recursive function, we proceed by induction on its first argument xs.

  induction List<Nat>

The case for xs = empty is a straightforward use of definitions.

    // <<insert_sorted_case_empty>> =
    arbitrary y:Nat
    suppose _
    conclude sorted(insert(empty,y))
        by definition {insert, sorted, sorted, all_elements}

Here’s the beginning of the case for xs = node(x, xs').

  case node(x, xs') suppose IH {
    arbitrary y:Nat
    suppose s_xxs: sorted(node(x,xs'))
    suffices sorted(insert(node(x,xs'),y))  by .
    ?
  }

In the goal we see an opportunity to use the definition of insert. However, insert branches on whether y ≤ x, so we use a switch-for statement to do the same in our proof.

  switch y ≤ x for insert {
    case true suppose yx_true {
      ?
    }
    case false suppose yx_false {
      ?
    }
  }

In the case when y ≤ x is true, we apply the relevant definitions to arrive at the following four subgoals.

    // <<insert_sorted_case_node_less_defs>> =
    suffices sorted(xs') 
         and all_elements(xs', λb{x ≤ b}) 
         and y ≤ x
         and all_elements(xs', λb{y ≤ b})
             with definition {sorted, sorted, all_elements}

The first two of these follow from the premise sorted(node(x,xs')).

  // <<insert_sorted_case_node_s_xs__x_le_xs>> =
  have s_xs: sorted(xs') by definition sorted in s_xxs
  have x_le_xs': all_elements(xs',λb{(x ≤ b)}) by definition sorted in s_xxs

The third is true in the current case.

  // <<insert_sorted_y_le_x>> =
  have y_le_x: y ≤ x by rewrite yx_true

The fourth, which states that y is less-or-equal to all the elements in xs', follows transitively from y ≤ x and the fact that x is less-or-equal to all the elements in xs' (x_le_xs'), using the theorem all_elements_implies (in List.pf):

theorem all_elements_implies: 
  all T:type. all xs:List<T>. all P: fn T -> bool, Q: fn T -> bool.
  if all_elements(xs,P) and (all z:T. if P(z) then Q(z)) 
  then all_elements(xs,Q)

To satisfy the second premise of all_elements_implies, we use y ≤ x to prove that if x is less-or-equal to some element z, then so is y.

  // <<insert_sorted_x_le_implies_y_le>> =
  have x_le_implies_y_le: all z:Nat. (if x ≤ z then y ≤ z)
    by arbitrary z:Nat  suppose x_le_z: x ≤ z
       conclude y ≤ z by apply less_equal_trans[y][x,z] to y_le_x , x_le_z

Now we apply all_elements_implies to prove all_elements(xs',λb{(y ≤ b)}).

  // <<insert_sorted_y_le_xs>> =
  have y_le_xs': all_elements(xs',λb{(y ≤ b)})
    by apply all_elements_implies<Nat>[xs']
             [λb{(x ≤ b)} : fn Nat->bool, λb{(y ≤ b)} : fn Nat->bool]
       to x_le_xs', x_le_implies_y_le

and then conclude this case for when y ≤ x.

  // <<insert_sorted_case_node_le_conclusion>> =
  s_xs, x_le_xs', y_le_x, y_le_xs'

Next we turn our attention to the case for when y ≤ x is false. After applying the definition of insert, Deduce tells us that we need to prove the following.

    // <<insert_sorted_case_node_g_def>> =
    suffices sorted(insert(xs',y)) 
         and all_elements(insert(xs',y),λb{x ≤ b})
             with definition sorted

The first follows from the induction hypothesis. (Though we need to move the proof of s_xs out of the y ≤ x case so that we can use it here.)

  // <<insert_sorted_s_xs_y>> =
  have s_xs'_y: sorted(insert(xs',y)) by apply IH[y] to s_xs

The second requires more thinking. We know that x ≤ y in this case by the following reasoning.

  // <<insert_sorted_x_le_y>> =
  have x_le_y: x ≤ y
      by have not_yx: not (y ≤ x)  by suppose yx rewrite yx_false in yx
         apply not_less_equal_less_equal to not_yx

We have already proved that x is less-or-equal to all the elements in xs'. So we know that x is less-or-equal to all the elements in node(y, xs') by the definition of all_elements.

  // <<insert_sorted_x_le_y_xs>> =
  have x_le_y_xs': all_elements(node(y, xs'),λb{(x ≤ b)})
      by suffices x ≤ y and all_elements(xs', λb{x ≤ b}) 
              with definition all_elements
         x_le_y, x_le_xs'

However, what we need to prove is that x is less-or-equal to all the elements in insert(xs', y). But the all_elements function shouldn’t care about the ordering of the elements in the list, and indeed there is the following theorem in List.pf:

theorem all_elements_set_of:
  all T:type, xs:List<T>, ys:List<T>, P:fn T -> bool.
  if set_of(xs) = set_of(ys)
  then all_elements(xs, P) = all_elements(ys, P)

So we need to show that set_of(insert(xs',y)) = set_of(node(y,xs')). Thankfully, we already showed that this is true for mset_of in the insert_contents theorem, and multiset equality implies set equality (also from List.pf):

theorem mset_equal_implies_set_equal: 
  all T:type, xs:List<T>, ys:List<T>.
  if mset_of(xs) = mset_of(ys)
  then set_of(xs) = set_of(ys)

So we use these three theorems to prove the following.

theorem all_elements_insert_node:
  all xs:List<Nat>, x:Nat, P:fn Nat->bool.
  all_elements(insert(xs,x), P) = all_elements(node(x,xs), P)
proof
  arbitrary xs:List<Nat>, x:Nat, P:fn Nat->bool
  have m_xs_x: mset_of(insert(xs, x)) = mset_of(node(x, xs))
      by suffices mset_of(insert(xs, x)) = m_one(x) ⨄ mset_of(xs)
             with definition mset_of
         insert_contents[xs][x]
  have ixsx_xxs: set_of(insert(xs, x)) = set_of(node(x, xs))
     by apply mset_equal_implies_set_equal<Nat>[insert(xs, x), node(x, xs)] 
        to m_xs_x
  apply all_elements_set_of<Nat>[ insert(xs,x), node(x, xs), P]
  to ixsx_xxs
end
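The intuition behind all_elements_insert_node is that a "for all elements" check depends only on which elements are present, not on their order. Here is a small Python sketch of that idea, again with Python lists standing in for List<Nat>. The helper name same_verdict is hypothetical.

```python
def insert(xs, y):
    """Python model of the Deduce insert function."""
    if not xs:
        return [y]
    x, rest = xs[0], xs[1:]
    if y <= x:
        return [y, x] + rest
    return [x] + insert(rest, y)

def all_elements(xs, pred):
    """True when pred holds for every element of xs."""
    return all(pred(b) for b in xs)

def same_verdict(xs, x, pred):
    """all_elements gives the same answer on insert(xs, x) and node(x, xs)."""
    same_set = set(insert(xs, x)) == set([x] + xs)
    same_all = all_elements(insert(xs, x), pred) == all_elements([x] + xs, pred)
    return same_set and same_all

# A predicate that holds for every element, and one that fails for some.
assert same_verdict([1, 3, 5], 4, lambda b: 1 <= b)
assert same_verdict([1, 3, 5], 4, lambda b: 4 <= b)
```

The second assertion is the more interesting one: the predicate fails on both lists, so the two sides agree on False rather than on True.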

We apply this theorem to prove that x is less-or-equal all the elements in insert(xs',y).

  // <<insert_sorted_x_le_xs_y>> =
  have x_le_xs'_y: all_elements(insert(xs',y), λb{x ≤ b})
      by _rewrite all_elements_insert_node[xs',y,λb{x≤b}:fn Nat->bool]
         x_le_y_xs'

Now we have the two facts we need to conclude this final case of the proof of insert_sorted.

  // <<insert_sorted_case_node_g_conclusion>> =
  conclude sorted(insert(xs',y)) and
           all_elements(insert(xs',y),λb{x ≤ b})
      by s_xs'_y, x_le_xs'_y

Here is the complete proof of insert_sorted.

theorem insert_sorted: all xs:List<Nat>. all y:Nat.
  if sorted(xs) then sorted(insert(xs, y))
proof
  induction List<Nat>
  case empty {
    <<insert_sorted_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat
    suppose s_xxs: sorted(node(x,xs'))
    suffices sorted(insert(node(x,xs'),y))  by .
    <<insert_sorted_case_node_s_xs__x_le_xs>>
    switch y ≤ x for insert {
      case true suppose yx_true {
        suffices sorted(node(y,node(x,xs')))  by .
        <<insert_sorted_case_node_less_defs>>
        <<insert_sorted_y_le_x>>
        <<insert_sorted_x_le_implies_y_le>>
        <<insert_sorted_y_le_xs>>
        <<insert_sorted_case_node_le_conclusion>>
      }
      case false suppose yx_false {
        <<insert_sorted_case_node_g_def>>
        <<insert_sorted_s_xs_y>>
        <<insert_sorted_x_le_y>>
        <<insert_sorted_x_le_y_xs>>
        <<insert_sorted_x_le_xs_y>>
        <<insert_sorted_case_node_g_conclusion>>
      }
    }
  }
end

Prove the correctness of `insertion_sort`

Referring back to the specification of insertion_sort(xs), we need to prove that (1) it outputs a list that contains the same elements as xs, and (2) the output is sorted.

As we did for insert, we use multisets and mset_of to express the requirement on the contents of the output of insertion_sort.

theorem insertion_sort_contents: all xs:List<Nat>.
  mset_of(insertion_sort(xs)) = mset_of(xs)

The insertion_sort(xs) function is recursive, so we proceed by induction on xs. In the case for xs = empty, we conclude the following using the definitions of insertion_sort and mset_of.

    // <<insertion_sort_contents_case_empty>> =
    conclude mset_of(insertion_sort(empty)) = mset_of(empty)
      by definition {insertion_sort, mset_of}

In the case for xs = node(x, xs'), after applying the definitions of insertion_sort and mset_of, it suffices to show that

    // <<insertion_sort_contents_case_node_defs>> =
    suffices mset_of(insert(insertion_sort(xs'),x)) 
           = m_one(x) ⨄ mset_of(xs')
        with definition {insertion_sort, mset_of}

The goal follows from the insert_contents theorem and the induction hypothesis as follows.

  // <<insertion_sort_contents_case_node_equations>> =
  equations
          mset_of(insert(insertion_sort(xs'),x)) 
        = m_one(x) ⨄ mset_of(insertion_sort(xs'))
          by insert_contents[insertion_sort(xs')][x]
    ... = m_one(x) ⨄ mset_of(xs')
          by rewrite IH

Here is the complete proof of insertion_sort_contents.

theorem insertion_sort_contents: all xs:List<Nat>.
  mset_of(insertion_sort(xs)) = mset_of(xs)
proof
  induction List<Nat>
  case empty {
    <<insertion_sort_contents_case_empty>>
  }
  case node(x, xs') suppose IH {
    <<insertion_sort_contents_case_node_defs>>
    <<insertion_sort_contents_case_node_equations>>
  }
end

Finally, we prove that insertion_sort(xs) produces a sorted list. Of course the proof is by induction on xs. The case for empty follows from the relevant definitions. The case for node(x, xs') follows from the insert_sorted theorem and the induction hypothesis.

theorem insertion_sort_sorted: all xs:List<Nat>. 
  sorted( insertion_sort(xs) )
proof
  induction List<Nat>
  case empty {
    conclude sorted(insertion_sort(empty))
        by definition {insertion_sort, sorted}
  }
  case node(x, xs') suppose IH: sorted( insertion_sort(xs') ) {
    suffices sorted(insert(insertion_sort(xs'),x))
        with definition {insertion_sort, sorted}
    apply insert_sorted[insertion_sort(xs')][x] to IH
  }
end
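As a sanity check that complements the proofs, both theorems can be checked exhaustively on all small inputs in a Python model of the algorithm. This is a sketch, not part of the Deduce development: the helper name check_all and the bounds on list length and element values are assumptions chosen to keep the search space small.

```python
from collections import Counter
from itertools import product

def insert(xs, y):
    """Python model of the Deduce insert function."""
    if not xs:
        return [y]
    x, rest = xs[0], xs[1:]
    if y <= x:
        return [y, x] + rest
    return [x] + insert(rest, y)

def insertion_sort(xs):
    """Python model of the Deduce insertion_sort function."""
    if not xs:
        return []
    return insert(insertion_sort(xs[1:]), xs[0])

def check_all(max_len=4, values=(0, 1, 2)):
    """Check both correctness properties on every list up to max_len."""
    checked = 0
    for n in range(max_len + 1):
        for xs in product(values, repeat=n):
            out = insertion_sort(list(xs))
            # Analogue of insertion_sort_sorted.
            assert all(a <= b for a, b in zip(out, out[1:]))
            # Analogue of insertion_sort_contents (multiset equality).
            assert Counter(out) == Counter(xs)
            checked += 1
    return checked

check_all()
```

With three values and lengths 0 through 4, this covers 1 + 3 + 9 + 27 + 81 = 121 lists. The Deduce theorems cover all lists of every length, which is what testing alone cannot provide.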

Exercise: tail-recursive variant of `insertion_sort`

The insertion_sort function uses more computer memory than necessary because it uses one frame on the procedure call stack for every element in the input list. This can be avoided if we instead implement Insertion Sort using a tail-recursive function, that is, a function that immediately returns after the recursive call. For this exercise, formulate a tail-recursive version of insertion_sort, test it, and prove that it is correct.

As a hint, define an auxiliary function isort(xs,ys) that takes a list xs and an already sorted list ys and returns a sorted list that includes the contents of both xs and ys.

function isort(List<Nat>, List<Nat>) -> List<Nat> {
  FILL IN HERE
}

Once you have defined isort, you can implement Insertion Sort as follows.

define insertion_sort2 : fn List<Nat> -> List<Nat>
    = λxs{ isort(xs, empty) }

To prove the correctness of insertion_sort2, prove that the result is sorted

theorem insertion_sort2_sorted: all xs:List<Nat>. 
  sorted( insertion_sort2(xs) )
proof
  ?
end

and prove that the output includes all of the same elements as in the input (the correct number of times).

theorem insertion_sort2_contents: all xs:List<Nat>. 
  mset_of( insertion_sort2(xs) ) = mset_of(xs)
proof
  ?
end

Sequential Search, Correctly

2024-06-14T13:20:00.000-07:00

Sequential Search

This is the second blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

In this blog post we’ll study a classic and simple algorithm known as Sequential Search (a.k.a. Linear Search). The basic idea of the algorithm is to look for the location of a particular item within a linked list, traversing the list front to back. Here is the specification of this search function.

Specification: The search(xs, y) function returns a natural number i such that i ≤ length(xs). If i < length(xs), then i is the index of the first occurrence of y in the list xs. If i = length(xs), y is not in the list xs.

We follow the write-test-prove approach to develop a correct implementation of search. We then propose two exercises for the reader.

Write the `search` function

Before diving into the code for search, let us look again at the definition of the List type.

union List<T> {
  empty
  node(T, List<T>)
}

We say that List is a recursive union because one of its constructors has a parameter that is also of the List type (namely, the second parameter of the node constructor).

In general, when defining a function with a parameter that is a recursive union, first consider making that function a recursive function that pattern-matches on that parameter.

For example, with search, we choose for the List<Nat> to be the first parameter so that we can pattern-match on it as follows.

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = ?
  search(node(x, xs'), y) = ?
}

Let us consider the case for the empty list. Looking at the specification of search, we need to return 0, the length of the empty list, because y is not in the empty list.

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = 0
  search(node(x, xs'), y) = ?
}

In the case for node(x, xs'), we can check whether x = y. If so, then we should return 0 because y is at index 0 of node(x, xs') and that is certainly the first occurrence of y in node(x, xs').

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = 0
  search(node(x, xs'), y) =
    if x = y then
      0
    else
      ?
}

If x ≠ y, then we need to search the rest of the list xs' for y. We can make the recursive call search(xs', y), but then we need to decide how to adapt its result to produce a result that makes sense for node(x, xs'). The only way to reason about the result of a recursive call is to use the specification of the function. The specification of search splits into two cases on the result: (1) search(xs', y) < length(xs') and (2) length(xs') ≤ search(xs', y). In case (1), search(xs',y) is returning the index of the first y inside xs'. Because x ≠ y, that location will also be the first y inside node(x, xs'). However, we need to add one to the index to take into account that we’re adding a node to the front of the list. So for case (1), the result should be suc(search(xs', y)). In case (2), search(xs',y) did not find y in xs', so it is returning length(xs'). Because x ≠ y, we need to indicate that y is also not found in node(x, xs'), so we need to return length(node(x, xs')). Thus, we need to add one to the index, so the result should again be suc(search(xs', y)).

Here is the completed code for search.

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = 0
  search(node(x, xs'), y) =
    if x = y then
      0
    else
      suc(search(xs', y))
}

Test the `search` function

Focusing on the specification of search, there are several things that we should test. First, we should test whether search always returns a number that is less-or-equal to the length of the list. We can use all_elements and interval to automate the testing over a bunch of values, some of which are in the list and some are not.

define list_1223 = node(1, node(2, node(2, node(3, empty))))

assert all_elements(interval(5, 0),
  λx{ search(list_1223, x) ≤ length(list_1223) })

Most importantly, we should test whether search finds the correct index of the elements in the list. To do that we can make use of nth to look up the element at a given index.

assert all_elements(list_1223,
  λx{ nth(list_1223, 0)( search(list_1223, x) ) = x })

Next, we should test whether search finds the first occurrence. We can do this by iterating over all the indexes and checking that what search returns is an index that is less-than or equal to the current index.

assert all_elements(interval(length(list_1223), 0),
   λi{ search(list_1223, nth(list_1223, 0)(i)) ≤ i })

Finally, we check that search fails gracefully when the value being searched for is not present in the list.

assert search(list_1223, 0) = length(list_1223)
assert search(list_1223, 4) = length(list_1223)
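The four kinds of checks above can be bundled into one reusable helper. The following Python sketch models that idea under the same assumptions as before: Python lists stand in for Deduce lists, and the names check_search_spec and extra_values are hypothetical, chosen only for this illustration.

```python
def search(xs, y):
    """Index of the first y in xs, or len(xs) if y is absent."""
    if not xs:
        return 0
    return 0 if xs[0] == y else 1 + search(xs[1:], y)

def nth(xs, default):
    """Curried lookup with a default value, modeled on the Deduce nth."""
    return lambda i: xs[i] if i < len(xs) else default

def check_search_spec(xs, extra_values):
    # The result never exceeds the length of the list.
    assert all(search(xs, v) <= len(xs) for v in extra_values)
    # For every element of the list, the result is an index of that element.
    assert all(nth(xs, 0)(search(xs, v)) == v for v in xs)
    # The result is no later than any position holding the same value.
    assert all(search(xs, nth(xs, 0)(i)) <= i for i in range(len(xs)))
    # Values that are absent from the list yield the length.
    assert all(search(xs, v) == len(xs) for v in extra_values if v not in xs)
    return True

check_search_spec([1, 2, 2, 3], range(0, 5))
```

Writing the checks against the specification, rather than against hand-computed outputs, means the same helper can be run on any other list without changes.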

Prove `search` Correct

We break down the specification of search into four parts and prove four theorems.

Prove `search` is less-or-equal `length`

The first part of the specification of search says that the search(xs, y) function returns a natural number i such that i ≤ length(xs). Because search is recursive, we’re going to prove this by induction on its first parameter xs.

theorem search_length: all xs:List<Nat>. all y:Nat.
  search(xs, y) ≤ length(xs)
proof
  induction List<Nat>
  case empty {
    ?
  }
  case node(x, xs') 
    suppose IH: all y:Nat. search(xs',y) ≤ length(xs') 
  {
    ?
  }
end

In the case for xs = empty, Deduce tells us that we need to prove

Goal:
    all y:Nat. search(empty,y) ≤ length(empty)

So we start with arbitrary y:Nat and then conclude using the definitions of search, length, and operator ≤.

    // <<search_length_case_empty>> =
    arbitrary y:Nat
    conclude search(empty,y) ≤ length[Nat](empty)
        by definition {search, length, operator ≤}

In these blog posts we use a literate programming tool named Entangled to translate the markdown files into Deduce proof files. Entangled lets us label chunks of proof and then paste them into larger proofs. So that you can see the label names, we include them in comments, as in the <<search_length_case_empty>> label above.

In the case for xs = node(x, xs'), Deduce tells us that we need to prove

Goal:
    all y:Nat. search(node(x,xs'),y) ≤ length(node(x,xs'))

So we start with arbitrary y:Nat and note that the definition of search has an if-then-else, so we proceed with a switch-for statement.

    arbitrary y:Nat
    switch x = y for search {
      case true {
        ?
      }
      case false {
        ?
      }
    }

In the case for x = y, the goal becomes

0 ≤ length(node(x, xs'))

so we need to use the definition of length and then we can complete the proof using the definition of ≤.

    // <<search_length_case_node_eq>> =
    suffices 0 ≤ 1 + length(xs')  with definition length
    definition operator ≤

In the case for x ≠ y, after applying the definitions of length, ≤, and +, it remains to prove that search(xs', y) ≤ length(xs'). But that is just the induction hypothesis.

    // <<search_length_case_node_not_eq>> =
    suffices search(xs', y) ≤ length(xs')
        with definition {length, operator ≤, operator+, operator+}
    IH[y]

Putting all of the pieces together, we have a complete proof of search_length.

theorem search_length: all xs:List<Nat>. all y:Nat.
  search(xs, y) ≤ length(xs)
proof
  induction List<Nat>
  case empty {
    <<search_length_case_empty>>
  }
  case node(x, xs') 
    suppose IH: all y:Nat. search(xs',y) ≤ length(xs') 
  {
    arbitrary y:Nat
    switch x = y for search {
      case true {
        <<search_length_case_node_eq>>
      }
      case false {
        <<search_length_case_node_not_eq>>
      }
    }
  }
end

Prove `search(xs, y)` finds an occurrence of `y`

The specification of search(xs, y) says that if the result is less-than length(xs), then the result is the index of the first occurrence of y in xs. First off, this means that search(xs, y) is indeed an index for y, which we can express using nth as follows.

nth(xs, 0)( search(xs, y) ) = y

So we can formulate the following theorem, which we’ll prove by induction on xs.

theorem search_present: all xs:List<Nat>. all y:Nat.
  if search(xs, y) < length(xs)
  then nth(xs, 0)( search(xs, y) ) = y
proof
  induction List<Nat>
  case empty {
    ?
  }
  case node(x, xs') suppose IH {
    ?
  }
end

In the case for xs = empty, we proceed in a goal-directed way using arbitrary for the all y and then suppose for the if.

    arbitrary y:Nat
    suppose prem: search(empty,y) < length[Nat](empty)
    ?

Then we need to prove

nth(empty, 0)(search(empty, y)) = y

but that looks impossible! So hopefully the premise is also false, which will let us finish this case using the principle of explosion. Indeed, applying all of the relevant definitions to the premise yields false.

    // <<search_present_case_empty>> =
    arbitrary y:Nat
    suppose prem: search(empty,y) < length[Nat](empty)
    conclude false by definition {search, length, operator <, operator ≤} 
                      in prem

Moving on to the case for xs = node(x, xs'), we again begin with arbitrary and suppose.

    arbitrary y:Nat
    suppose sxxs_len: search(node(x,xs'),y) < length(node(x,xs'))
    ?

Deduce tells us that we need to prove

Goal:
    nth(node(x,xs'),0)(search(node(x,xs'),y)) = y

We see search applied to a node argument and note again that the body of search contains an if-then-else, so we proceed with a switch-for statement.

    switch x = y for search {
      case true suppose xy_true {
        ?
      }
      case false suppose xy_false {
        ?
      }
    }

In the case where x = y, Deduce tells us that we need to prove

Goal:
    nth(node(x,xs'),0)(0) = y

We conclude using the definition of nth and the fact that x = y.

    // <<search_present_case_node_eq>> =
    suffices x = y with definition nth
    rewrite xy_true

In the case where x ≠ y, we need to prove

Goal:
    nth(node(x,xs'),0)(suc(search(xs',y))) = y

Now if we apply the definitions of nth and pred, the goal becomes:

    // <<search_present_case_node_nth_pred>> =
    suffices nth(xs', 0)(search(xs', y)) = y
        with definition {nth, pred}

This looks a lot like the conclusion of our induction hypothesis:

Givens:
    ...
    IH: all y:Nat. (if search(xs',y) < length(xs') 
                    then nth(xs',0)(search(xs',y)) = y)

So we just need to prove the premise of the IH, that search(xs',y) < length(xs'). Thankfully, that can be proved from the premise search(node(x,xs'),y) < length(node(x,xs')).

  // <<search_present_IH_premise>> =
    have sxs_len: search(xs',y) < length(xs')
      by enable {search, length, operator <, operator ≤, 
                 operator+, operator+}
         rewrite xy_false in sxxs_len

We conclude by applying the induction hypothesis.

  // <<search_present_apply_IH>> =
  conclude nth(xs',0)(search(xs',y)) = y
    by apply IH[y] to sxs_len

Here is the complete proof of search_present.

theorem search_present: all xs:List<Nat>. all y:Nat.
  if search(xs, y) < length(xs)
  then nth(xs, 0)( search(xs, y) ) = y
proof
  induction List<Nat>
  case empty {
    <<search_present_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat
    suppose sxxs_len: search(node(x,xs'),y) < length(node(x,xs'))
    switch x = y for search {
      case true suppose xy_true {
        <<search_present_case_node_eq>>
      }
      case false suppose xy_false {
        <<search_present_case_node_nth_pred>>
        <<search_present_IH_premise>>
        <<search_present_apply_IH>>
      }
    }
  }
end

Prove `search(xs, y)` finds the first occurrence of `y`

Going back to the specification of search(xs, y), it says that if the result is less-than length(xs), then the result is the index of the first occurrence of y in xs. To be the first means that the result is no larger than the index of any other occurrence of y. We express that in the following theorem.

theorem search_first: all xs:List<Nat>. all y:Nat, i:Nat.
  if search(xs, y) < length(xs) and nth(xs, 0)(i) = y
  then search(xs, y) ≤ i

We proceed by induction on xs. We can handle the case for xs = empty in the same way as in search_present; the premise is false.

    // <<search_first_case_empty>> =
    arbitrary y:Nat, i:Nat
    suppose prem: search(empty,y) < length[Nat](empty) and nth(empty,0)(i) = y
    conclude false by definition {search, length, operator <, operator ≤} 
                      in prem

In the case for xs = node(x, xs'), we proceed in a goal-directed fashion with an arbitrary and suppose.

  case node(x, xs') suppose IH {
    arbitrary y:Nat, i:Nat
    suppose prem: search(node(x,xs'),y) < length(node(x,xs')) 
                  and nth(node(x,xs'),0)(i) = y
    ?
  }

Deduce responds with

Goal:
    search(node(x,xs'),y) ≤ i

We apply the definition of search and switch on x = y with a switch-for statement.

  switch x = y for search {
    case true {
      ?
    }
    case false suppose xy_false {
      ?
    }
  }

In the case where x = y, the result of search is 0, so we just need to prove that 0 ≤ i, which follows from the definition of ≤.

    // <<search_first_case_node_true>> =
    conclude 0 ≤ i   by definition operator ≤

In the case where x ≠ y, we need to prove

Goal:
    suc(search(xs',y)) ≤ i

What do we know about i? The premise nth(node(x,xs'),0)(i) = y, together with x ≠ y, tells us that i ≠ 0 (otherwise nth would return x), which means that i is the successor of some other number i'.

    // <<search_first_case_node_false_1>> =
    have not_iz: not (i = 0)
      by suppose i_z 
         conclude false by rewrite i_z | xy_false in 
                           definition nth in prem
    obtain i' where i_si: i = suc(i') from apply not_zero_suc to not_iz
    suffices suc(search(xs', y)) ≤ suc(i')  with rewrite i_si

Now we can further simplify the goal with the definition of ≤.

    // <<search_first_case_node_false_2>> =
    suffices search(xs', y) ≤ i'   with definition operator≤

The goal looks like the conclusion of the induction hypothesis instantiated at i'.

Givens:
    ...
    IH: all y:Nat, i:Nat. (if search(xs',y) < length(xs') and nth(xs',0)(i) = y 
                           then search(xs',y) ≤ i)

So we need to prove the two premises of the IH. They follow from the given prem:

Givens:
    prem: search(node(x,xs'),y) < length(node(x,xs')) 
          and nth(node(x,xs'),0)(i) = y

In particular, the first premise of IH follows from the first conjunct of prem.

    // <<search_first_IH_prem_1>> =
    have IH_prem_1: search(xs',y) < length(xs')
      by enable {search, length, operator <, operator ≤, 
                 operator+, operator+}
         rewrite xy_false in (conjunct 0 of prem)

The second premise of the IH follows from the second conjunct of prem.

    // <<search_first_IH_prem_2>> =
    have IH_prem_2: nth(xs',0)(i') = y
      by enable {nth, pred} rewrite i_si in (conjunct 1 of prem)

We conclude the case for i = suc(i') by applying the induction hypothesis.

    // <<search_first_apply_IH>> =
    apply IH[y,i'] to IH_prem_1, IH_prem_2

Here is the complete proof of search_first.

theorem search_first: all xs:List<Nat>. all y:Nat, i:Nat.
  if search(xs, y) < length(xs) and nth(xs, 0)(i) = y
  then search(xs, y) ≤ i
proof
  induction List<Nat>
  case empty {
    <<search_first_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat, i:Nat
    suppose prem: search(node(x,xs'),y) < length(node(x,xs')) 
                  and nth(node(x,xs'),0)(i) = y
    switch x = y for search {
      case true {
        <<search_first_case_node_true>>
      }
      case false suppose xy_false {
        <<search_first_case_node_false_1>>
        <<search_first_case_node_false_2>>
        <<search_first_IH_prem_1>>
        <<search_first_IH_prem_2>>
        <<search_first_apply_IH>>
      }
    }
  }
end

Prove that search fails only when it should

The last sentence in the specification for search(xs, y) says that if i = length(xs), y is not in the list xs. How do we express that y is not in the list? In some sense, that is what search is for, but it would be vacuous to prove a theorem that says search returns length(xs) if search returns length(xs). Instead we need an alternative and intuitive way to express membership in a list.

One approach to expressing list membership that works well is to convert the list to a set and then use set membership. The file Set.pf defines the Set type and operations on sets such as membership, union, and intersection. The Set.pf file also proves many theorems about these operations. The following set_of function converts a list into a set.

function set_of<T>(List<T>) -> Set<T> {
  set_of(empty) = ∅
  set_of(node(x, xs)) = single(x) ∪ set_of(xs)
}
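As a rough illustration of what set_of computes, here is a Python model in which Python lists stand in for Deduce lists and frozenset stands in for the Set type from Set.pf. The correspondence is an assumption made for this sketch, not a description of how Set.pf is implemented.

```python
def set_of(xs):
    """Python model of set_of: collect the elements of a list into a set."""
    if not xs:                                    # set_of(empty) = ∅
        return frozenset()
    return frozenset([xs[0]]) | set_of(xs[1:])    # single(x) ∪ set_of(xs)

def search(xs, y):
    """Index of the first y in xs, or len(xs) if y is absent."""
    if not xs:
        return 0
    return 0 if xs[0] == y else 1 + search(xs[1:], y)

xs = [1, 2, 2, 3]
print(set_of(xs) == {1, 2, 3})   # True: duplicates collapse in a set

# Whenever search reports "not found", y is not a member of set_of(xs).
for y in range(5):
    if search(xs, y) == len(xs):
        assert y not in set_of(xs)
```

The point of routing membership through a set is that set_of is defined independently of search, so a theorem relating the two is not vacuous.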

We can now express our last correctness theorem for search as follows.

theorem search_absent: all xs:List<Nat>. all y:Nat, d:Nat.
  if search(xs, y) = length(xs)
  then not (y ∈ set_of(xs))

We proceed by induction on xs. In the case for xs = empty, we take the following goal-directed steps

  case empty {
    arbitrary y:Nat, d:Nat
    suppose _
    ?
  }

and Deduce responds with

Goal:
    not y ∈ set_of(empty)

which we prove using the definition of set_of and the empty_no_members theorem from Set.pf.

    // <<search_absent_case_empty>> =
    arbitrary y:Nat, d:Nat
    suppose _
    suffices not (y ∈ ∅) with definition set_of
    empty_no_members[Nat,y]

Turning to the case for xs = node(x, xs'), we take several goal-directed steps.

  case node(x, xs') suppose IH {
    arbitrary y:Nat, d:Nat
    suppose s_xxs_len_xxs: search(node(x,xs'),y) = length(node(x,xs'))
    suffices not (y ∈ single(x) ∪ set_of(xs'))  with definition set_of
    ?
  }

Now we need to prove a not formula:

Goal:
    not (y ∈ single(x) ∪ set_of(xs'))

So we assume y ∈ single(x) ∪ set_of(xs') and then prove false (a contradiction).

  suppose y_in_x_union_xs: y ∈ single(x) ∪ set_of(xs')

The main information we have to work with is the premise s_xxs_len_xxs above, concerning search(node(x,xs'), y). Thinking about the code for search, we know it will branch on whether x = y, so we had better switch on that.

  switch x = y {
    case true suppose xy_true {
      ?
    }
    case false suppose xy_false {
      ?
    }
  }

In the case where x = y, we have search(node(x,xs'),y) = 0 but length(node(x,xs')) is 1 + length(xs'), so we have a contradiction.

    // <<search_absent_case_node_equal>> =
    have xy: x = y by rewrite xy_true
    have s_yxs_len_yxs: search(node(y,xs'),y) = length(node(y,xs'))
        by rewrite xy in s_xxs_len_xxs
    have zero_1_plus: 0 = 1 + length(xs')
        by definition {search, length} in s_yxs_len_yxs
    conclude false  by definition {operator+} in zero_1_plus

In the case where x ≠ y, we can show that y ∈ set_of(xs') and then invoke the induction hypothesis to obtain the contradiction. In particular, the premise y_in_x_union_xs gives us y ∈ single(x) or y ∈ set_of(xs'). But x ≠ y implies not (y ∈ single(x)). So it must be that y ∈ set_of(xs') (using or_not from Base.pf).

  // <<search_absent_case_node_notequal_y_in_xs>> =
  have ysx_or_y_xs: y ∈ single(x) or y ∈ set_of(xs')
      by apply member_union[Nat] to y_in_x_union_xs
  have not_ysx: not (y ∈ single(x))
    by suppose ysx
       rewrite xy_false in
       apply single_equal[Nat] to ysx
  have y_xs: y ∈ set_of(xs')
    by apply or_not[y ∈ single(x), y ∈ set_of(xs')] 
       to ysx_or_y_xs, not_ysx

To satisfy the premise of the induction hypothesis, we prove the following.

    // <<search_absent_IH_prem>> =
    have sxs_lxs: search(xs',y) = length(xs')
      by injective suc
         rewrite xy_false in
         definition {search,length,operator+,operator+} in
         s_xxs_len_xxs

So we apply the induction hypothesis to get y ∉ set_of(xs'), which contradicts y ∈ set_of(xs').

  // <<search_absent_apply_IH>> =
  have y_not_xs: not (y ∈ set_of(xs'))
    by apply IH[y,d] to sxs_lxs
  conclude false  by apply y_not_xs to y_xs

Here is the complete proof of search_absent.

theorem search_absent: all xs:List<Nat>. all y:Nat, d:Nat.
  if search(xs, y) = length(xs)
  then not (y ∈ set_of(xs))
proof
  induction List<Nat>
  case empty {
    <<search_absent_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat, d:Nat
    suppose s_xxs_len_xxs: search(node(x,xs'),y) = length(node(x,xs'))
    suffices not (y ∈ single(x) ∪ set_of(xs'))  with definition set_of
    suppose y_in_x_union_xs: y ∈ single(x) ∪ set_of(xs')
    switch x = y {
      case true suppose xy_true {
        <<search_absent_case_node_equal>>
      }
      case false suppose xy_false {
        <<search_absent_case_node_notequal_y_in_xs>>
        <<search_absent_IH_prem>>
        <<search_absent_apply_IH>>
      }
    }
  }
end
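Before or after investing in proofs like these, it can be reassuring to check the four theorems exhaustively on small inputs. The Python sketch below does that for a model of search. It is a sanity check over finitely many lists, not a proof, and the function name check_theorems and its parameters are hypothetical.

```python
from itertools import product

def search(xs, y):
    """Index of the first y in xs, or len(xs) if y is absent."""
    if not xs:
        return 0
    return 0 if xs[0] == y else 1 + search(xs[1:], y)

def nth(xs, default):
    """Curried lookup with a default value, modeled on the Deduce nth."""
    return lambda i: xs[i] if i < len(xs) else default

def check_theorems(max_len=4, values=(0, 1, 2)):
    """Check the four search theorems on every list up to max_len."""
    checked = 0
    for n in range(max_len + 1):
        for tup in product(values, repeat=n):
            xs = list(tup)
            # Also try one value that can never be in the list.
            for y in (*values, max(values) + 1):
                r = search(xs, y)
                assert r <= len(xs)                    # search_length
                if r < len(xs):
                    assert nth(xs, 0)(r) == y          # search_present
                    for i in range(len(xs) + 1):
                        if nth(xs, 0)(i) == y:
                            assert r <= i              # search_first
                if r == len(xs):
                    assert y not in set(xs)            # search_absent
                checked += 1
    return checked

print(check_theorems())   # number of (list, y) pairs that were checked
```

A check like this only covers the inputs it enumerates, which is exactly the gap that the proofs by induction close.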

Exercise `search_last`

Apply the write-test-prove approach to develop a correct implementation of the search_last(xs, y) function, which is like search(xs, y) except that it finds the last occurrence of y in xs instead of the first.

In particular, you need to

write a specification for search_last,
write the code for search_last,
test search_last on diverse inputs, and
prove that search_last is correct.

function search_last(List<Nat>, Nat) -> Nat {
    FILL IN HERE
}

Exercise `search_if`

The search_if(xs, P) function is a generalization of search(xs, y). Instead of searching for the first occurrence of element y, the search_if function searches for the location of the first element that satisfies predicate P (i.e. an element y in xs such that P(y) is true). Apply the write-test-prove approach to develop a correct implementation of search_if.

In particular, you need to

write a specification for search_if,
write the code for search_if,
test search_if on diverse inputs, and
prove that search_if is correct.

function search_if<T>(List<T>, fn T->bool) -> Nat {
    FILL IN HERE
}

Data Structures and Algorithms, Correctly

2024-06-12T08:44:00.000-07:00

Prelude

This is the first in what I hope to be a sequence of blog posts about (1) data structures and algorithms, (2) an approach to constructing correct code, and (3) achieving a deeper understanding of testing, logic, and proof, all of which are needed for constructing correct code. These blog posts take a functional programming approach to data structures and algorithms because, in that setting, there are software tools that make sure that our proofs about correctness are themselves correct! In particular, these posts will use the Deduce language for writing programs, testing them, and proving theorems. Unlike most functional languages and proof assistants, the syntax of Deduce is meant to be easy to learn for people familiar with languages such as Java or Python. The README.md file in the Deduce GitHub repository provides an introduction to Deduce. We recommend reading that first.

https://github.com/jsiek/deduce/tree/main

These blog posts will cover a limited number of the data structures and algorithms, as the pace will be slower due to the increased focus on correctness. The rough plan is to cover the following topics.

Linked Lists (this post)
Sequential Search
Insertion Sort
Merge Sort
Binary Trees (Part 1)
Binary Trees (Part 2)
Binary Search Trees
Balanced Binary Search Trees
Heaps and Priority Queues

Introduction to Linked Lists

A linked list is a data structure that represents a sequence of elements. Each element is stored inside a node and each node also stores a link to the next node, or to the special empty value that signifies the end of the list. In Deduce we can implement linked lists with the following union type.

union List<T> {
  empty
  node(T, List<T>)
}

For example, the sequence of numbers 1, 2, 3 is represented by the following linked list.

define list_123 : List<Nat> = node(1, node(2, node(3, empty)))

Next we introduce two fundamental operations on linked lists. The first operation is length, which returns the number of elements in a given list. The length of an empty list is 0 and the length of a list that starts with a node is one more than the length of the list starting at the next node.

function length<E>(List<E>) -> Nat {
  length(empty) = 0
  length(node(n, next)) = 1 + length(next)
}

Of course, the length of list_123 is 3. We can ask Deduce to check this fact using the assert statement.

assert length(list_123) = 3

The return type of length is Nat which stands for natural number (that is, the non-negative integers).

import Nat

The second fundamental operation on linked lists is nth(xs,d)(i), which retrieves the element at position i in the list xs. However, if i is greater or equal to the length of xs, then nth returns the default value d. The pred(n) function is short for predecessor and computes n - 1, except that pred(0) = 0.

function nth<T>(List<T>, T) -> (fn Nat -> T) {
  nth(empty, default) = λi{default}
  nth(node(x, xs), default) = λi{
    if i = 0 then
      x
    else
      nth(xs, default)(pred(i))
  }
}

Here are examples of applying nth to the list 1, 2, 3, using 0 as the default value.

assert nth(list_123, 0)(0) = 1
assert nth(list_123, 0)(1) = 2
assert nth(list_123, 0)(2) = 3
assert nth(list_123, 0)(3) = 0

We have formulated the nth operation in an unusual way. It has two parameters and returns a function of one parameter that returns an element T. We could have instead made nth take three parameters and directly return an element T. We made this design choice because it means we can use nth with several other functions and theorems that work with functions of the type fn Nat -> T.
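To make the curried shape of nth concrete, here is a small Python model of the list type, length, pred, and nth. The Node class and the use of None for the empty list are assumptions of this sketch. Deduce itself uses the union type shown earlier.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Node:
    head: Any
    tail: Optional["Node"]   # None plays the role of `empty`

def length(xs):
    return 0 if xs is None else 1 + length(xs.tail)

def pred(n):
    """Predecessor on natural numbers, with pred(0) = 0."""
    return 0 if n == 0 else n - 1

def nth(xs, default):
    """Takes a list and a default, returns a function from index to element."""
    if xs is None:
        return lambda i: default
    return lambda i: xs.head if i == 0 else nth(xs.tail, default)(pred(i))

list_123 = Node(1, Node(2, Node(3, None)))
lookup = nth(list_123, 0)               # a function of one parameter
print(length(list_123))                 # 3
print([lookup(i) for i in range(4)])    # [1, 2, 3, 0]
```

Because nth(list_123, 0) is itself a function from index to element, it can be handed directly to anything that expects such a function, which is the convenience the two-stage formulation buys.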

Correct Software via Write, Test, and Prove

We recommend a three-step process for constructing correct software.

Write down the specification and the code for a subcomponent, such as a function.
Test the function on a diverse choice of inputs. If all the tests pass, proceed to step 3, otherwise return to step 1.
Prove that the function is correct with respect to its specification.

We recognize that once step 3 is complete, step 2 is obsolete because a proof of correctness supersedes any amount of testing. However there is a good reason to perform testing even when you are planning to do a proof of correctness. More often than not, your code will have one or more bugs. Testing is a fast way to detect most of the bugs. When you detect a bug, you’ll need to revise the code and then re-run the tests. On the other hand, proving correctness is a much slower way to detect bugs. You will spend a relatively long time to get part-way through a proof and realize that there is no way to finish. You’ll then need to revise the code. But because of the changes in the code, much of the proof will need to change. So you’ll spend a significant amount of time refactoring the parts of the proof that you’ve already completed before continuing on to the new parts. Therefore, to reduce the number of relatively-costly proof attempts, it is a good idea to first spend a relatively short amount of time to test and fix the code.

Example: Intervals

As an example of the write-test-prove approach, we consider the interval function.

Specification: interval(count, start) returns a list of natural numbers of length count, where the element at position i is i + start.

For example, interval(3,5) produces the list 5, 6, 7:

assert interval(3, 5) = node(5, node(6, node(7, empty)))

Write `interval`

A straightforward way to implement interval in Deduce is to define it as a function that pattern-matches on the count. The suc(n) constructor for natural numbers represents 1 + n and is short for successor.

function interval(Nat, Nat) -> List<Nat> {
  interval(0, n) = ?
  interval(suc(k), n) = ?
}

For the clause where count = 0, we must return a list of length 0. So our only choice is the empty list.

  interval(0, n) = empty

For the clause where count = suc(k), we must return a list of length suc(k). So it has at least one node.

  interval(suc(k), n) = node(?, ?)

The specification tells us that the element at position 0 of the return value is n + 0 or simply n.

  interval(suc(k), n) = node(n, ?)

The next of this node should be a list of length k that starts with the element n + 1. Thankfully we can construct such a list with a recursive call to interval.

  interval(suc(k), n) = node(n, interval(k, suc(n)))

Putting these pieces together, we have the following complete definition of interval.

function interval(Nat, Nat) -> List<Nat> {
  interval(0, n) = empty
  interval(suc(k), n) = node(n, interval(k, suc(n)))
}
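For readers who want to trace the recursion, here is a Python model of interval, again with Python lists standing in for Deduce lists. It mirrors the two clauses above one for one, but it is an illustration rather than Deduce code.

```python
def interval(count, start):
    """Python model of interval: the list start, start+1, ..., of length count."""
    if count == 0:                      # interval(0, n) = empty
        return []
    # interval(suc(k), n) = node(n, interval(k, suc(n)))
    return [start] + interval(count - 1, start + 1)

print(interval(3, 5))   # [5, 6, 7]
```

Each recursive call decreases count by one and increases start by one, so the call interval(3, 5) unfolds to [5] + interval(2, 6), then [5, 6] + interval(1, 7), and so on until count reaches 0.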

Test `interval`

Let us test that our definition of interval is behaving the way we expect it to. In general, one should test many variations of each input to a function. Here we test with the values 0, 1 and 2 for the first parameter and 0 and 3 for the second parameter.

assert length(interval(0, 0)) = 0

assert length(interval(1, 0)) = 1
assert nth(interval(1, 0), 7)(0) = 0 + 0

assert length(interval(2, 0)) = 2
assert nth(interval(2, 0), 7)(0) = 0 + 0
assert nth(interval(2, 0), 7)(1) = 1 + 0

assert length(interval(0, 3)) = 0

assert length(interval(1, 3)) = 1
assert nth(interval(1, 3), 7)(0) = 0 + 3

assert length(interval(2, 3)) = 2
assert nth(interval(2, 3), 7)(0) = 0 + 3
assert nth(interval(2, 3), 7)(1) = 1 + 3

Yeah! All of these assert statements execute without error.

We have formulated these assert statements in a subtly different way than above. When we tested the length and nth functions, we wrote assert statements that compared the results to our expected output. Here we have instead written the assert statements based on the specification of interval(count, start). The specification says that the length of the output should be the same as the count parameter. So in the above we wrote assert statements that check whether the length is the same as the count. Furthermore, the specification says that the element at position i of the output is i + start. So we have used the nth function to check, for every position i in the output list, whether the element is i + start.

The benefit of writing tests based on the specification is that it reduces the possibility of discrepancies between the specification and the tests. After all, what it means for a function to be correct is that it behaves according to its specification, not that it passes some ad-hoc tests based on a loose interpretation of the specification.
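One way to keep tests tied to the specification is to write the specification once as a checking function and run it over many inputs. The Python sketch below does this for a model of interval. The helper name check_interval_spec is hypothetical.

```python
def interval(count, start):
    """Python model of interval(count, start)."""
    if count == 0:
        return []
    return [start] + interval(count - 1, start + 1)

def check_interval_spec(count, start):
    """Check both parts of the specification for one choice of inputs."""
    out = interval(count, start)
    assert len(out) == count            # the output has length count
    for i in range(count):
        assert out[i] == i + start      # the element at position i is i + start
    return True

# The same combinations as the assert statements above: counts 0, 1, 2 and starts 0, 3.
for count in range(3):
    for start in (0, 3):
        check_interval_spec(count, start)
```

If the specification later changes, only check_interval_spec needs to be updated, and every test follows along.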

In general, when a test fails, it often means that either the implementation of the function-under-test is incorrect, or the test itself is incorrect. A careful reading of the function’s specification will help you figure out which is at fault. Unfortunately, it is also possible for the specification to be incorrect! The good thing about the testing approach described here is that it helps to reveal inconsistencies between the specification, the tests, and the implementation.

Prove `interval` Correct

Once we have finished testing interval we can move on to proving that interval is correct for all inputs. Looking back at the specification of interval, there are two parts. We will prove each part with a separate theorem.

Prove the `interval_length` theorem

The first part of the specification says that interval(count, start) returns a list of length count. We want to prove that this is true for all possible choices of count and start, so we shall use Deduce’s all formula. Recall that there are two ways to prove an all formula in Deduce: 1) using arbitrary or 2) using induction. When proving a theorem about a recursive function, one typically needs to use induction for the first parameter of the function, in this case count. So our initial plan is to use induction for count and arbitrary for start. Because we are going to use different proof methods for each variable, we need to use a separate all formula for each one, as follows.

theorem interval_length:
  all count:Nat. all start:Nat. length(interval(count, start)) = count
proof
  ?
end

There is also the question of whether all count:Nat should come before or after all start:Nat. It is always safe to first choose the variable for which you’re using induction. If you make the other choice, the induction hypothesis will be weaker, which sometimes is convenient but other times prevents the proof from going through.

Now let us start the proof. We proceed by induction on the count.

theorem interval_length:
  all count:Nat. all start:Nat. length(interval(count, start)) = count
proof
  induction Nat
  case 0 {
    ?
  }
  case suc(count') suppose IH {
    ?
  }
end

In the case for count = 0, Deduce tells us that we need to prove

  all start:Nat. length(interval(0,start)) = 0

As mentioned earlier, we’ll use arbitrary for start.

  case 0 {
    arbitrary start:Nat
    ?
  }

So now we need to prove

  length(interval(0,start)) = 0

Of course, by definition we have interval(0,start) = empty and length(empty) = 0, so we can conclude using those definitions.

  case 0 {
    arbitrary start:Nat
    conclude length(interval(0, start)) = 0
        by definition {interval, length}
  }

Turning to the case count = suc(count'), Deduce tells us the goal for this case and the induction hypothesis.

incomplete proof
Goal:
    all start:Nat. length(interval(suc(count'),start)) = suc(count')
Givens:
    IH: all start:Nat. length(interval(count',start)) = count'

To improve readability of the proof, I often like to copy the formula for the induction hypothesis and paste it into the suppose as shown below.

  case suc(count') 
    suppose IH: all start:Nat. length(interval(count', start)) = count' 
  {
    ?
  }

For the proof of this case, we again start with arbitrary to handle all start then use the definitions of interval and length in a suffices statement. For that we need to write out the new goal, but to avoid figuring that out ourselves, we can start out by just using true (incorrectly) and then Deduce will give us the new goal in the error message.

  case suc(count')
    suppose IH: all start:Nat. length(interval(count', start)) = count'
  {
    arbitrary start:Nat
    suffices true
        with definition {interval, length}
    ?
  }

Deduce replies with

expected
1 + length(interval(count', suc(start))) = suc(count')
but only have
true

So we copy that into the suffices statement.

  case suc(count')
    suppose IH: all start:Nat. length(interval(count', start)) = count'
  {
    arbitrary start:Nat
    suffices 1 + length(interval(count', suc(start))) = suc(count')
        with definition {interval, length}
    ?
  }

Here is where the induction hypothesis IH comes to the rescue. If we instantiate the all start with suc(start), we get

length(interval(count',suc(start))) = count'

which is just what we need to conclude.

  case suc(count') 
    suppose IH: all start:Nat. length(interval(count', start)) = count' 
  {
    arbitrary start:Nat
    suffices 1 + length(interval(count', suc(start))) = suc(count')
        with definition {interval, length}
    rewrite suc_one_add[count'] | IH[suc(start)]
  }

Putting the two cases together, we have the following completed proof that the output of interval has the appropriate length.

theorem interval_length:
  all count:Nat. all start:Nat. length(interval(count, start)) = count
proof
  induction Nat
  case 0 {
    arbitrary start:Nat
    conclude length(interval(0, start)) = 0
        by definition {interval, length}
  }
  case suc(count')
    suppose IH: all start:Nat. length(interval(count', start)) = count' 
  {
    arbitrary start:Nat
    suffices 1 + length(interval(count', suc(start))) = suc(count')
        with definition {interval, length}
    rewrite suc_one_add[count'] | IH[suc(start)]
  }
end
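The final rewrite step in the suc(count') case can be read as a short chain of equations. The following is a sketch of that reasoning, assuming that suc_one_add states suc(n) = 1 + n, which is how it is used in the proof above.

```latex
\begin{align*}
1 + \mathrm{length}(\mathrm{interval}(count', \mathrm{suc}(start)))
  &= 1 + count' && \text{by IH instantiated at } \mathrm{suc}(start) \\
  &= \mathrm{suc}(count') && \text{by suc\_one\_add at } count'
\end{align*}
```

Instantiating the induction hypothesis at suc(start) rather than start is only possible because all start:Nat comes after all count:Nat in the theorem statement, so the hypothesis holds for every start.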

Prove the `interval_nth` theorem

The second part of the specification of interval says that the element at position i of the output is i + start. Of course, there is no element at position i if i is too big, so our theorem needs to be conditional, with the premise i < count.

theorem interval_nth: all count:Nat. all start:Nat, d:Nat, i:Nat.
  if i < count
  then nth(interval(count, start), d)(i) = i + start
proof
   ?
end

Because this proof is about a recursive function whose first parameter is of type Nat, we proceed by induction on Nat.

  induction Nat
  case 0 {
    ?
  }
  case suc(count') suppose IH {
    ?
  }

In the case count = 0, Deduce tells us that we need to prove

all start:Nat, d:Nat, i:Nat.
    if i < 0 then nth(interval(0,start),d)(i) = i + start

So we can start the proof of this case with arbitrary and suppose, then use the definitions of interval and nth.

  case 0 {
    arbitrary start:Nat, d:Nat, i:Nat
    suppose i_l_z: i < 0
    suffices d = i + start  with definition {interval, nth}
    ?
  }

Now we are in a strange situation. The goal d = i + start seems rather difficult to prove because we don’t know anything about start and d. The givens (a.k.a. assumptions) are also strange. How can the natural number i be less than 0? Of course it cannot. Thus, i < 0 implies false and then we can use the principle of explosion, which states that false implies anything, to prove that d = i + start.

  case 0 {
    arbitrary start:Nat, d:Nat, i:Nat
    suppose i_l_z: i < 0
    suffices d = i + start  with definition {interval, nth}
    conclude false  by definition {operator <, operator ≤} in i_l_z
  }

Next we turn to the case for count = suc(count'). Deduce tells us the formula for the induction hypothesis, so we paste that into the suppose IH. Directed by the goal formula, we begin the proof with arbitrary, suppose.

  case suc(count') 
    suppose IH: all start:Nat, d:Nat, i:Nat. 
        if i < count' then nth(interval(count',start),d)(i) = i + start
  {
    arbitrary start:Nat, d:Nat, i:Nat
    suppose i_l_sc: i < suc(count')
    ?
  }

So the goal becomes:

nth(interval(suc(count'), start), d)(i) = i + start

We spot the opportunity to expand the definition of interval.

    suffices nth(node(start, interval(count', suc(start))), d)(i) = i + start
        with definition interval
    ?

Next we would like to expand nth, but note that to resolve the if-then-else inside nth, we need to know whether i is 0 or not. So we switch on i and expand the definition of nth at the same time, using a switch-for statement.

  switch i for nth {
    case 0 {
      ?
    }
    case suc(i') suppose i_sc: i = suc(i') {
      ?
    }
  }

Let us proceed with the case for i = 0. Deduce responds with

Goal:
    start = 0 + start

which follows directly from the definition of addition.

    conclude start = 0 + start  by definition operator +

In the case for i = suc(i'), Deduce tells us that we need to prove

  nth(interval(count',suc(start)),d)(pred(suc(i'))) = suc(i') + start

This looks quite similar to the induction hypothesis instantiated with suc(start), d, and i':

  if i' < count' 
  then nth(interval(count',suc(start)),d)(i') = i' + suc(start)

One difference is pred(suc(i')) versus i', but they are equal by the definition of pred.

  case suc(i') suppose i_sc: i = suc(i') {
    suffices nth(interval(count',suc(start)),d)(i') = suc(i') + start
        with definition {pred}
    ?
  }

So if we use the induction hypothesis, then we will just need to prove that i' + suc(start) = suc(i') + start, which is certainly true and will just require a little reasoning about addition. But to use the induction hypothesis, we need to prove that i' < count'. This follows from the givens i_l_sc: i < suc(count') and i_sc: i = suc(i') and the definitions of < and ≤.

  case suc(i') suppose i_sc: i = suc(i') {
    suffices nth(interval(count',suc(start)),d)(i') = suc(i') + start
        with definition {pred}
    have i_l_cnt: i' < count'  by enable {operator <, operator ≤}
                                  rewrite i_sc in i_l_sc
    ?
  }

Now we can complete the proof of this case by linking together a few equations, starting with the induction hypothesis, then using the add_suc theorem from Nat.pf (which states that m + suc(n) = suc(m + n)), and finally using the definition of addition (which states that suc(n) + m = suc(n + m)).

  equations
    nth(interval(count',suc(start)),d)(i') 
        = i' + suc(start)        by apply IH[suc(start), d, i'] to i_l_cnt
    ... = suc(i' + start)        by add_suc[i'][start]
    ... = suc(i') + start        by definition operator +

Putting together all these pieces, we have the following complete proof of the interval_nth theorem. At this point we know that the interval function is 100% correct!

theorem interval_nth: all count:Nat. all start:Nat, d:Nat, i:Nat.
  if i < count
  then nth(interval(count, start), d)(i) = i + start
proof
  induction Nat
  case 0 {
    arbitrary start:Nat, d:Nat, i:Nat
    suppose i_l_z: i < 0
    suffices d = i + start  with definition {interval, nth}
    conclude false by definition {operator <, operator ≤} in i_l_z
  }
  case suc(count') 
    suppose IH: all start:Nat, d:Nat, i:Nat. 
        if i < count' then nth(interval(count',start),d)(i) = i + start
  {
    arbitrary start:Nat, d:Nat, i:Nat
    suppose i_l_sc: i < suc(count')
    suffices nth(node(start, interval(count', suc(start))), d)(i) = i + start
        with definition interval
    switch i for nth {
      case 0 {
        conclude start = 0 + start  by definition operator +
      }
      case suc(i') suppose i_sc: i = suc(i') {
        suffices nth(interval(count',suc(start)),d)(i') = suc(i') + start
            with definition {pred}
        have i_l_cnt: i' < count'  by enable {operator <, operator ≤}
                                      rewrite i_sc in i_l_sc
        equations
          nth(interval(count',suc(start)),d)(i') 
              = i' + suc(start)    by apply IH[suc(start), d, i'] to i_l_cnt
          ... = suc(i' + start)    by add_suc[i'][start]
          ... = suc(i') + start    by definition operator +
      }
    }
  }
end
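As a sanity check on the statement (not a replacement for the proof), here is a Python sketch of interval and nth on nested tuples. It assumes that nth returns the default d on the empty list and otherwise tests whether the index is 0, as the proof steps above suggest. The default value -1 and the test bounds are arbitrary choices for this illustration.

```python
EMPTY = None

def interval(count, start):
    """Model of Deduce's interval: count consecutive numbers from start."""
    if count == 0:
        return EMPTY
    return (start, interval(count - 1, start + 1))

def nth(xs, d):
    """Curried like Deduce's nth(xs, d)(i); yields d when i is out of range."""
    def at(i):
        if xs is EMPTY:
            return d
        head, tail = xs
        return head if i == 0 else nth(tail, d)(i - 1)
    return at

for count in range(15):
    for start in range(5):
        # The premise i < count holds for every i in range(count).
        for i in range(count):
            assert nth(interval(count, start), -1)(i) == i + start
        # Without the premise, the default value is returned instead.
        assert nth(interval(count, start), -1)(count) == -1
```

The last assertion shows why the premise i < count is needed: at position count the result is the default d, not count + start.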

Exercise: Define Append

Create a function named append that satisfies the following specification.

Specification: append combines two lists into a single list. The elements of the output list must be ordered in a way that 1) the elements from the first input list come before the elements of the second list, and 2) the ordering of elements must preserve the internal ordering of each input.

function append<E>(List<E>, List<E>) -> List<E> {
  FILL IN HERE
}

Exercise: Test Append

Write assert statements to test the append function that you have defined. Formulate the assertions to closely match the specification above. Refer to the assertions that we wrote earlier to test interval for an example of how to write the tests.

More Automation in Tests

An added benefit of formulating the assertions based on the specification is that it enables us to automate our testing. In the following code we append the list 1, 2, 3 with 4, 5 and then check the resulting list using only two assert statements. The first assert checks whether the front part of the result matches the first input list and the second assert checks whether the back part of the result matches the second input list. We make use of another function named all_elements that we describe next.

define list_45 : List<Nat> = node(4, node(5, empty))
define list_1_5 = append(list_123, list_45)
assert all_elements(interval(3, 0),
                    λi{ nth(list_1_5, 0)(i) = nth(list_123,0)(i) })
assert all_elements(interval(2, 0),
                    λi{ nth(list_1_5, 0)(3 + i) = nth(list_45,0)(i) })

The all_elements function takes a list and a function and checks whether applying the function to every element of the list always produces true.

function all_elements<T>(List<T>, fn (T) -> bool) -> bool {
  all_elements(empty, P) = true
  all_elements(node(x, xs'), P) = P(x) and all_elements(xs', P)
}
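For readers more at home in a mainstream language, the same testing pattern can be sketched in Python. This is a model under assumptions, not Deduce: lists are nested tuples, from_items is a hypothetical helper that does not exist in Deduce, and list_1_5 is written out by hand to stand in for the result of append, so the exercise above is not given away.

```python
EMPTY = None

def from_items(items):
    """Hypothetical helper: build a nested-tuple list from a Python list."""
    result = EMPTY
    for x in reversed(items):
        result = (x, result)
    return result

def interval(count, start):
    if count == 0:
        return EMPTY
    return (start, interval(count - 1, start + 1))

def nth(xs, d):
    def at(i):
        if xs is EMPTY:
            return d
        head, tail = xs
        return head if i == 0 else nth(tail, d)(i - 1)
    return at

def all_elements(xs, pred):
    """True when pred holds for every element of xs (vacuously for EMPTY)."""
    if xs is EMPTY:
        return True
    head, tail = xs
    return pred(head) and all_elements(tail, pred)

list_123 = from_items([1, 2, 3])
list_45 = from_items([4, 5])
list_1_5 = from_items([1, 2, 3, 4, 5])  # stands in for append(list_123, list_45)

# Front part of the result matches the first input list.
assert all_elements(interval(3, 0),
                    lambda i: nth(list_1_5, 0)(i) == nth(list_123, 0)(i))
# Back part of the result matches the second input list.
assert all_elements(interval(2, 0),
                    lambda i: nth(list_1_5, 0)(3 + i) == nth(list_45, 0)(i))
```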

Going a step further, we can adapt the tests to apply to longer lists by automating the creation of the input lists. Here we increase the combined size to 20 elements. We could go with longer lists, but Deduce currently has a slow interpreter, so the assertions would take a long time (e.g., a minute for 100 elements).

define num_elts = 20
define first_elts = 12
define second_elts = 8
define first_list = interval(first_elts,1)
define second_list = interval(second_elts, first_elts + 1)
define output_list = append(first_list, second_list)
assert all_elements(interval(first_elts, 0), 
          λi{ nth(output_list, 0)(i) = nth(first_list,0)(i) })
assert all_elements(interval(second_elts, 0),
          λi{ nth(output_list, 0)(first_elts + i) = nth(second_list,0)(i) })

Exercise: Prove that Append is Correct

Prove that append satisfies its specification on all possible inputs. First, we need to translate the specification into a Deduce formula. We can do this by generalizing the above assertions. Instead of using specific lists and specific indices, we use all formulas to talk about all possible lists and indices. Also, for convenience, we split up correctness into two theorems, one about the first input list xs and the other about the second input list ys. We recommend that your proofs use induction on List<T>.

theorem nth_append_front:
  all T:type. all xs:List<T>. all ys:List<T>, i:Nat, d:T.
  if i < length(xs)
  then nth(append(xs, ys), d)(i) = nth(xs, d)(i)
proof
  FILL IN HERE
end

theorem nth_append_back: 
  all T:type. all xs:List<T>. all ys:List<T>, i:Nat, d:T.
  nth(append(xs, ys), d)(length(xs) + i) = nth(ys, d)(i)
proof
  FILL IN HERE
end
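Before attempting the proofs, it can help to convince yourself that the two formulas say what you intend. The sketch below is Python, not Deduce, and makes several assumptions for illustration: it uses built-in Python lists, a list-based nth that returns the default d when the index is out of range, and a hypothetical checker check_append_spec that tries every pair of short lists over a two-element alphabet. Python's own list concatenation stands in for a correct append.

```python
from itertools import product

def nth(xs, d):
    """Curried lookup with default, modeled on Deduce's nth(xs, d)(i)."""
    def at(i):
        return xs[i] if 0 <= i < len(xs) else d
    return at

def check_append_spec(append, max_len=3, alphabet=(0, 1)):
    """Test both theorem statements on all lists up to max_len."""
    lists = [list(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n)]
    d = -1  # default value; deliberately outside the alphabet
    for xs, ys in product(lists, repeat=2):
        out = append(xs, ys)
        # nth_append_front: only required when i < length(xs).
        for i in range(len(xs)):
            if nth(out, d)(i) != nth(xs, d)(i):
                return False
        # nth_append_back: no premise, so also probe past the end of ys.
        for i in range(len(ys) + 2):
            if nth(out, d)(len(xs) + i) != nth(ys, d)(i):
                return False
    return True

assert check_append_spec(lambda xs, ys: xs + ys)      # correct append
assert not check_append_spec(lambda xs, ys: ys + xs)  # wrong order is caught
```

Note that nth_append_back needs no premise: when i is past the end of ys, both sides fall back to the default d.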

Help! We're Failing to Prove Correctness of Closure Conversion using Denotational Semantics (Graph Models)

2023-06-08T08:25:00.003-07:00

Recall that closure conversion lowers lexically-scoped functions into a flat-closure representation, which pairs a function pointer with a tuple of values for the function’s free variables. The crux of this pass is a transformation we call “delay” (D) because it postpones the point at which the function is applied to the above-mentioned tuple, from the point of definition of the function to the points of application. Let ⟦-⟧ₛ be the denotational semantics for the source language of “delay” and ⟦-⟧ₜ be the semantics for its target. (Both languages are variants of the untyped lambda calculus.) We tried to prove something like:

⟦ M ⟧ₛ ρ ≈ ⟦ D(M) ⟧ₜ ρ

where much of the difficulty was in finding an appropriate definition for the ≈ relation. In a denotational semantics based on the graph model, the semantics of a term is an infinite set of finite descriptions of the term’s behavior. So a straightforward way to define ≈ is

S ≈ S' iff     ∀ f. f ∈ S implies ∃f'. f' ∈ S' and f ~ f' (forward)
           and ∀ f'. f' ∈ S' implies ∃f. f ∈ S and f ~ f' (backward).

with some suitable definition of equivalence ~ for finite descriptions.

Consider the following example. The first transformation changes the lambda abstraction to make explicit the creation of a tuple for the free variables. The second transformation, the above-mentioned “delay”, does two things: it (1) replaces the application of (λ fv ...) to ⟨ y , z ⟩ with the creation of another tuple that contains those two items and (2) replaces the application add(3) with the application add[0](add[1], 3).

let y = 4 in 
let z = 5 in 
let add = λ x. x + y + z in
add(3)
===>
let y = 4 in 
let z = 5 in 
let add = (λ fv. λ x. x + fv[0] + fv[1]) ⟨ y , z ⟩ in
add(3)
===> "delay"
let y = 4 in 
let z = 5 in 
let add = ⟨(λ (fv, x). x + fv[0] + fv[1]) , ⟨ y , z ⟩ ⟩ in
add[0](add[1], 3)
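The three programs can be transcribed into Python to confirm that they compute the same result. This is only an operational sketch: Python tuples stand in for ⟨ , ⟩, Python lambdas for λ, and the function names original, explicit_tuple, and delayed are made up for this illustration. It says nothing about the denotational semantics discussed below.

```python
def original():
    y = 4
    z = 5
    add = lambda x: x + y + z          # captures y and z lexically
    return add(3)

def explicit_tuple():
    y = 4
    z = 5
    # The free variables are passed in as a tuple fv at the definition site.
    add = (lambda fv: lambda x: x + fv[0] + fv[1])((y, z))
    return add(3)

def delayed():
    y = 4
    z = 5
    # Flat closure: a pair of closed code and the tuple of free variables.
    add = (lambda fv, x: x + fv[0] + fv[1], (y, z))
    # The tuple is supplied only at the point of application.
    return add[0](add[1], 3)

assert original() == explicit_tuple() == delayed() == 12
```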

Focusing on the “delay” transformation of the lambda abstractions and the backward direction of the equivalence, we need to show that

∀f'. f' ∈ ⟦ ⟨(λ fv x. x + fv[0] + fv[1]) , ⟨ y , z ⟩ ⟩ ⟧ₜ(y={4},z={5})
implies
∃f. f ∈ ⟦ (λ fv. λ x. x + fv[0] + fv[1]) ⟨ y , z ⟩ ⟧ₛ(y={4},z={5}) 
and f ~ f'

Consider

f' = ⟨ {⟨0,0⟩ ↦ (3 ↦ 3)} , ⟨ 4 , 5 ⟩ ⟩

where {⟨0,0⟩ ↦ (3 ↦ 3)} is one entry in the input-output table for the lambda abstraction:

{⟨0,0⟩ ↦ (3 ↦ 3)} ∈ ⟦ λ fv. λ x. x + fv[0] + fv[1] ⟧ₜ

This entry says that if the pair ⟨0,0⟩ is bound to fv, and 3 is bound to x, then the result is 3. (Note that there are many other elements of ⟦ λ fv. λ x. ... ⟧ₜ, such as {⟨4,5⟩ ↦ (3 ↦ 12)}, {⟨4,5⟩ ↦ (6 ↦ 15)}, and {⟨0,0⟩ ↦ (6 ↦ 6)}.)

Given this f', we need to find an element f of

⟦ (λ fv. λ x. x + fv[0] + fv[1]) ⟨ y , z ⟩ ⟧ₛ(y={4},z={5})

such that f corresponds to f', i.e., f ~ f'. However, the elements of this partially-applied lambda all have y and z fixed at 4 and 5 respectively so this partially-applied lambda is the “plus nine” function:

{0 ↦ 9}, {1 ↦ 10}, {2 ↦ 11}, {3 ↦ 12}, {6 ↦ 15}, ...

So there is no f in it that corresponds to {⟨0,0⟩ ↦ (3 ↦ 3)}, an entry that behaves like the “identity” function.
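The mismatch can be observed concretely by running both sides on sample inputs. The Python sketch below is an informal model, not the graph-model semantics: code is the closed function stored in the flat closure, and plus_nine is the partially applied lambda on the source side (both names are chosen for this illustration).

```python
# Target side: the code component of the flat closure, not yet applied to ⟨4,5⟩.
code = lambda fv, x: x + fv[0] + fv[1]

# Source side: the lambda already applied to the tuple ⟨4,5⟩.
plus_nine = (lambda fv: lambda x: x + fv[0] + fv[1])((4, 5))

# The table of `code` contains the entry ⟨0,0⟩ ↦ (3 ↦ 3) ...
assert code((0, 0), 3) == 3
# ... but every entry of the source-side function adds nine.
assert all(plus_nine(x) == x + 9 for x in range(100))
# In particular, 3 is mapped to 12 and never to 3.
assert plus_nine(3) == 12
```

The target-side description records behavior of the code on environments that never occur at run time, which is what the source side cannot match.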

We have tried several approaches to solving this problem, but ran into road blocks with each one of them. If you know of a technique for solving this problem, please let us know!

Gradual Guarantee via Step-indexed Logical Relations

2023-05-16T16:28:00.003-07:00

{-# OPTIONS --rewriting #-}
module LogRel.BlogGradualGuaranteeLogRel where

open import Data.Empty using (⊥; ⊥-elim)
open import Data.List using (List; []; _∷_; map; length)
open import Data.Nat
open import Data.Nat.Properties
open import Data.Product using (_,_;_×_; proj₁; proj₂; Σ-syntax; ∃-syntax)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Data.Unit using (⊤; tt)
open import Data.Unit.Polymorphic renaming (⊤ to topᵖ; tt to ttᵖ)
open import Relation.Binary.PropositionalEquality as Eq
  using (_≡_; _≢_; refl; sym; cong; subst; trans)
open import Relation.Nullary using (¬_; Dec; yes; no)

open import Var
open import InjProj.CastCalculus
open import InjProj.CastDeterministic

One of the defining characteristics of a gradually typed language is captured by the gradual guarantee, which governs how the behavior of a program can change when the programmer changes some of the type annotations in the program to be more or less precise. It says that when changed to be more precise, the program will behave the same except that it may error more often. A change in the other direction, to be less precise, yields a program with exactly the same behavior.

In this blog post I prove in Agda the gradual guarantee for the gradually typed lambda calculus using the logical relations proof technique. In the past I’ve proved the gradual guarantee using a simulation argument, but I was curious to see whether the proof would be easier/harder using logical relations. The approach I use here is a synthesis of techniques from Dreyer, Ahmed, and Birkedal (LMCS 2011) regarding step-indexing using a modal logic and Max New (Ph.D. thesis 2020) regarding logical relations for gradual typing.

This Agda development lives on GitHub in the following repository:

https://github.com/jsiek/gradual-typing-in-agda

The files corresponding to this blog post are in the LogRel directory, which also imports files from the InjProj directory (for the definition of the cast calculus). Also, this Agda code makes use of the abstract binding tree library, which is in the following repository:

https://github.com/jsiek/abstract-binding-trees

Precision and the Gradual Guarantee

To talk about the gradual guarantee, we first define when one type is less precise than another one. The following definition says that the unknown type ★ is less precise than any other type.

infixr 6 _⊑_
data _⊑_ : Type → Type → Set where

  unk⊑unk : ★ ⊑ ★
  
  unk⊑ : ∀{G}{B}
     → gnd⇒ty G ⊑ B
       -------------
     → ★ ⊑ B
  
  base⊑ : ∀{ι}
        ----------
      → $ₜ ι ⊑ $ₜ ι

  fun⊑ : ∀{A B C D}
     → A ⊑ C  →  B ⊑ D
       ---------------
     → A ⇒ B ⊑ C ⇒ D

The first two rules for precision are usually presented as a single rule:

unk⊑any : ∀{B} → ★ ⊑ B

Instead we have separated out the case for when both types are ★ from the case when only the less-precise type is ★. Also, for the rule unk⊑, instead of writing B ≢ ★ we have written gnd⇒ty G ⊑ B, which turns out to be important later when we define the logical relation and use recursion on the precision relation.
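To see how the four rules work together, here is a Python sketch of a decision procedure for precision. It is an illustration, not part of the Agda development: types are encoded as the string "★", tuples ("base", name), and tuples ("fun", A, B), and the helper candidate_grounds is a made-up name that returns the only ground type that could possibly sit below a given type.

```python
STAR = "★"

def base(name):
    return ("base", name)

def fun(a, b):
    return ("fun", a, b)

def candidate_grounds(b):
    """Ground types (as ordinary types) that might be less precise than b."""
    if b == STAR:
        return []                      # no ground type is below ★
    if b[0] == "base":
        return [b]                     # the ground type for base type ι
    return [fun(STAR, STAR)]           # the ground type ★⇒★

def less_precise(a, b):
    """Decide a ⊑ b by following the rules unk⊑unk, unk⊑, base⊑, fun⊑."""
    if a == STAR:
        if b == STAR:
            return True                                   # unk⊑unk
        return any(less_precise(g, b)                     # unk⊑
                   for g in candidate_grounds(b))
    if a[0] == "base":
        return a == b                                     # base⊑
    # a is a function type: fun⊑ requires b to be a function type too.
    return (b != STAR and b[0] == "fun"
            and less_precise(a[1], b[1])
            and less_precise(a[2], b[2]))

nat, boolean = base("Nat"), base("Bool")
assert less_precise(STAR, fun(nat, boolean))
assert less_precise(fun(STAR, nat), fun(boolean, nat))
assert not less_precise(nat, STAR)
```

In this model ★ is below every type, which matches the single-rule presentation unk⊑any; the split into two rules changes the shape of derivations, not the set of related types.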

Of course, the precision relation is reflexive.

Refl⊑ : ∀{A} → A ⊑ A
Refl⊑ {★} = unk⊑unk
Refl⊑ {$ₜ ι} = base⊑
Refl⊑ {A ⇒ B} = fun⊑ Refl⊑ Refl⊑

If c is a derivation of ★ ⊑ gnd⇒ty G, then it must be an instance of the unk⊑ rule.

unk⊑gnd-inv : ∀{G}
   → (c : ★ ⊑ gnd⇒ty G)
   → ∃[ d ] c ≡ unk⊑{G}{gnd⇒ty G} d
unk⊑gnd-inv {$ᵍ ι} (unk⊑ {$ᵍ .ι} base⊑) = base⊑ , refl
unk⊑gnd-inv {★⇒★} (unk⊑ {★⇒★} (fun⊑ c d)) = fun⊑ c d , refl

If c and d are both derivations of ★ ⊑ A, then they are equal.

dyn-prec-unique : ∀{A}
  → (c : ★ ⊑ A)
  → (d : ★ ⊑ A)
  → c ≡ d
dyn-prec-unique {★} unk⊑unk unk⊑unk = refl
dyn-prec-unique {★} unk⊑unk (unk⊑ {$ᵍ ι} ())
dyn-prec-unique {★} unk⊑unk (unk⊑ {★⇒★} ())
dyn-prec-unique {★} (unk⊑ {$ᵍ ι} ()) d
dyn-prec-unique {★} (unk⊑ {★⇒★} ()) d
dyn-prec-unique {$ₜ ι} (unk⊑ {$ᵍ .ι} base⊑) (unk⊑ {$ᵍ .ι} base⊑) = refl
dyn-prec-unique {A ⇒ A₁} (unk⊑ {★⇒★} (fun⊑ c c₁)) (unk⊑ {★⇒★} (fun⊑ d d₁))
    with dyn-prec-unique c d | dyn-prec-unique c₁ d₁
... | refl | refl = refl

If c and d are both derivations of gnd⇒ty G ⊑ A, then they are equal.

gnd-prec-unique : ∀{G A}
   → (c : gnd⇒ty G ⊑ A)
   → (d : gnd⇒ty G ⊑ A)
   → c ≡ d
gnd-prec-unique {$ᵍ ι} {.($ₜ ι)} base⊑ base⊑ = refl
gnd-prec-unique {★⇒★} {.(_ ⇒ _)} (fun⊑ c c₁) (fun⊑ d d₁)
    with dyn-prec-unique c d | dyn-prec-unique c₁ d₁
... | refl | refl = refl

Next we define a precision relation on terms. I’m going to skip the normal steps of first defining the precision relation for the surface language and proving that compiling from the surface language to a cast calculus preserves precision. That is relatively easy, so I’ll jump to defining precision on terms of the cast calculus.

infix 3 _⊩_⊑_⦂_

Prec : Set
Prec = (∃[ A ] ∃[ B ] A ⊑ B)

data _⊩_⊑_⦂_ : List Prec → Term → Term → ∀{A B : Type} → A ⊑ B → Set 

data _⊩_⊑_⦂_ where

  ⊑-var : ∀ {Γ x A⊑B}
     → Γ ∋ x ⦂ A⊑B
       -------------------------------------
     → Γ ⊩ (` x) ⊑ (` x) ⦂ proj₂ (proj₂ A⊑B)

  ⊑-lit : ∀ {Γ c}
       -----------------------------------
     → Γ ⊩ ($ c) ⊑ ($ c) ⦂ base⊑{typeof c}

  ⊑-app : ∀{Γ L M L′ M′ A B C D}{c : A ⊑ C}{d : B ⊑ D}
     → Γ ⊩ L ⊑ L′ ⦂ fun⊑ c d
     → Γ ⊩ M ⊑ M′ ⦂ c
       -----------------------
     → Γ ⊩ L · M ⊑ L′ · M′ ⦂ d

  ⊑-lam : ∀{Γ N N′ A B C D}{c : A ⊑ C}{d : B ⊑ D}
     → (A , C , c) ∷ Γ ⊩ N ⊑ N′ ⦂ d
       ----------------------------
     → Γ ⊩ ƛ N ⊑ ƛ N′ ⦂ fun⊑ c d

  ⊑-inj-L : ∀{Γ M M′}{G B}{c : (gnd⇒ty G) ⊑ B}
     → Γ ⊩ M ⊑ M′ ⦂ c
       --------------------------------
     → Γ ⊩ M ⟨ G !⟩ ⊑ M′ ⦂ unk⊑{G}{B} c

  ⊑-inj-R : ∀{Γ M M′}{G}{c : ★ ⊑ (gnd⇒ty G)}
     → Γ ⊩ M ⊑ M′ ⦂ c
       ---------------------------
     → Γ ⊩ M ⊑ M′ ⟨ G !⟩ ⦂ unk⊑unk

  ⊑-proj-L : ∀{Γ M M′ H B}{c : (gnd⇒ty H) ⊑ B}
     → Γ ⊩ M ⊑ M′ ⦂ unk⊑ c
       ---------------------
     → Γ ⊩ M ⟨ H ?⟩ ⊑ M′ ⦂ c

  ⊑-proj-R : ∀{Γ M M′ H}{c : ★ ⊑ (gnd⇒ty H)}
     → Γ ⊩ M ⊑ M′ ⦂ unk⊑unk
       ---------------------
     → Γ ⊩ M ⊑ M′ ⟨ H ?⟩  ⦂ c

  ⊑-blame : ∀{Γ M A}
     → map proj₁ Γ ⊢ M ⦂ A
       ------------------------
     → Γ ⊩ M ⊑ blame ⦂ Refl⊑{A}

To write down the gradual guarantee, we also need some notation for expressing whether a program halts with a value, diverges, or encounters an error. So we write ⇓ for halting with a result value, ⇑ for diverging, and ⇑⊎blame for diverging or producing an error.

_⇓ : Term → Set
M ⇓ = ∃[ V ] (M —↠ V) × Value V

_⇑ : Term → Set
M ⇑ = ∀ k → ∃[ N ] Σ[ r ∈ M —↠ N ] k ≡ len r

_⇑⊎blame : Term → Set
M ⇑⊎blame = ∀ k → ∃[ N ] Σ[ r ∈ M —↠ N ] ((k ≡ len r) ⊎ (N ≡ blame))

We can now state the gradual guarantee. Suppose program M is less precise than, or equally precise as, program M′. Then M and M′ should behave the same except that M′ may result in an error more often. More specifically, if M′ results in a value or diverges, so does M. On the other hand, if M results in a value, then M′ results in a value or errors. If M diverges, then M′ diverges or errors. If M errors, then so does M′.

gradual-guarantee : ∀ {A}{A′}{A⊑A′ : A ⊑ A′} → (M M′ : Term)
   → [] ⊩ M ⊑ M′ ⦂ A⊑A′
    -----------------------------------
   → (M′ ⇓ → M ⇓)
   × (M′ ⇑ → M ⇑)
   × (M ⇓ → M′ ⇓ ⊎ M′ —↠ blame)
   × (M ⇑ → M′ ⇑⊎blame)
   × (M —↠ blame → M′ —↠ blame)

One might wonder if the gradual guarantee could be simply proved by induction on the derivation of its premise [] ⊩ M ⊑ M′ ⦂ A⊑A′. Such a proof attempt runs into trouble in the case for function application, where one needs more information about how the bodies of related lambda abstractions evaluate when given related arguments, but does not have it. The main idea of a logical relation is to add that extra information, effectively strengthening the theorem statement to get the induction to go through.

However, before diving into the logical relation, we have one more item to cover regarding the gradual guarantee.

Semantic Approximation

We separate the gradual guarantee into two properties, one that observes the less precise term M for k steps of reduction and the other that observes the more precise term M′ for k steps of reduction. After those k steps, the term being observed may have reduced to a value or an error, or it might still be reducing. If it reduced to a value, then the relation requires the other term to also reduce to a value, except of course that M′ may error. We define these two properties with one relation, written dir ⊨ M ⊑ M′ for k and called semantic approximation, that is parameterized over a direction dir. The direction ≼ observes the less precise term M and the ≽ direction observes the more precise term M′.

data Dir : Set where
  ≼ : Dir
  ≽ : Dir

_⊨_⊑_for_ : Dir → Term → Term → ℕ → Set

≼ ⊨ M ⊑ M′ for k = (M ⇓ × M′ ⇓)
                    ⊎ (M′ —↠ blame)
                    ⊎ (∃[ N ] Σ[ r ∈ M —↠ N ] len r ≡ k)
                    
≽ ⊨ M ⊑ M′ for k = (M ⇓ × M′ ⇓)
                    ⊎ (M′ —↠ blame)
                    ⊎ (∃[ N′ ] Σ[ r ∈ M′ —↠ N′ ] len r ≡ k)

We write ⊨ M ⊑ M′ for k for the conjunction of semantic approximation in both directions.

⊨_⊑_for_ : Term → Term → ℕ → Set
⊨ M ⊑ M′ for k = (≼ ⊨ M ⊑ M′ for k) × (≽ ⊨ M ⊑ M′ for k)

The following verbose but easy proof confirms that semantic approximation implies the gradual guarantee.

sem-approx⇒GG : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{M}{M′}
   → (∀ k → ⊨ M ⊑ M′ for k)
   → (M′ ⇓ → M ⇓)
   × (M′ ⇑ → M ⇑)
   × (M ⇓ → M′ ⇓ ⊎ M′ —↠ blame)
   × (M ⇑ → M′ ⇑⊎blame)
   × (M —↠ blame → M′ —↠ blame)
sem-approx⇒GG {A}{A′}{A⊑A′}{M}{M′} ⊨M⊑M′ =
  to-value-right , diverge-right , to-value-left , diverge-left , blame-blame
  where
  to-value-right : M′ ⇓ → M ⇓
  to-value-right (V′ , M′→V′ , v′)
      with proj₂ (⊨M⊑M′ (suc (len M′→V′)))
  ... | inj₁ ((V , M→V , v) , _) = V , M→V , v
  ... | inj₂ (inj₁ M′→blame) =
        ⊥-elim (cant-reduce-value-and-blame v′ M′→V′ M′→blame)
  ... | inj₂ (inj₂ (N′ , M′→N′ , eq)) =
        ⊥-elim (step-value-plus-one M′→N′ M′→V′ v′ eq)
        
  diverge-right : M′ ⇑ → M ⇑
  diverge-right divM′ k
      with proj₁ (⊨M⊑M′ k)
  ... | inj₁ ((V , M→V , v) , (V′ , M′→V′ , v′)) =
        ⊥-elim (diverge-not-halt divM′ (inj₂ (V′ , M′→V′ , v′)))
  ... | inj₂ (inj₁ M′→blame) =
        ⊥-elim (diverge-not-halt divM′ (inj₁ M′→blame))
  ... | inj₂ (inj₂ (N , M→N , eq)) = N , M→N , sym eq

  to-value-left : M ⇓ → M′ ⇓ ⊎ M′ —↠ blame
  to-value-left (V , M→V , v)
      with proj₁ (⊨M⊑M′ (suc (len M→V)))
  ... | inj₁ ((V , M→V , v) , (V′ , M′→V′ , v′)) = inj₁ (V′ , M′→V′ , v′)
  ... | inj₂ (inj₁ M′→blame) = inj₂ M′→blame
  ... | inj₂ (inj₂ (N , M→N , eq)) =
        ⊥-elim (step-value-plus-one M→N M→V v eq)

  diverge-left : M ⇑ → M′ ⇑⊎blame
  diverge-left divM k 
      with proj₂ (⊨M⊑M′ k)
  ... | inj₁ ((V , M→V , v) , _) =
        ⊥-elim (diverge-not-halt divM (inj₂ (V , M→V , v)))
  ... | inj₂ (inj₁ M′→blame) = blame , (M′→blame , (inj₂ refl))
  ... | inj₂ (inj₂ (N′ , M′→N′ , eq)) = N′ , (M′→N′ , (inj₁ (sym eq))) 

  blame-blame : (M —↠ blame → M′ —↠ blame)
  blame-blame M→blame
      with proj₁ (⊨M⊑M′ (suc (len M→blame)))
  ... | inj₁ ((V , M→V , v) , (V′ , M′→V′ , v′)) =
        ⊥-elim (cant-reduce-value-and-blame v M→V M→blame)
  ... | inj₂ (inj₁ M′→blame) = M′→blame
  ... | inj₂ (inj₂ (N , M→N , eq)) =
        ⊥-elim (step-blame-plus-one M→N M→blame eq)

Definition of the Logical Relation

The logical relation acts as a bridge between term precision and semantic approximation. As alluded to above, it packs away extra information when relating two lambda abstractions. However, while this idea is straightforward, especially in the context of the simply-typed lambda calculus (STLC), the definition of the logical relation for the cast calculus is rather more involved. We start by reviewing how one would define a logical relation for the STLC, then introduce the complications needed for the cast calculus.

For the STLC, the logical relation would consist of two relations, one for terms and another for values, and it would be indexed by their type A.

M ≼ᴸᴿₜ M′ ⦂ A
V ≼ᴸᴿᵥ V′ ⦂ A

The relation for values would be defined as an Agda function by recursion on the type A. At base type we relate literals if they are identical.

($ c) ≼ᴸᴿᵥ ($ c′) ⦂ ι   =   c ≡ c′

At function type, two lambda abstractions are related if substituting related arguments into their bodies yields related terms.

(ƛ N) ≼ᴸᴿᵥ (ƛ N′) ⦂ A ⇒ B = 
    ∀ W W′ → W ≼ᴸᴿᵥ W′ ⦂ A → N [ W ] ≼ᴸᴿₜ N′ [ W′ ] ⦂ B

The recursive uses of ≼ᴸᴿᵥ and ≼ᴸᴿₜ at type A and B in the above are okay because those types are part of the function type A ⇒ B.

The definition of the relation on terms would have the following form.

M ≼ᴸᴿₜ M′ ⦂ A =  M —↠ V → ∃[ V′ ] M′ —↠ V′ × V ≼ᴸᴿᵥ V′ ⦂ A

The first challenge regarding the Cast Calculus is handling the unknown type ★ and its value form, the injection V ⟨ G !⟩ that casts value V from the ground type G to ★. One might try to define the case for injection as follows

V ⟨ G !⟩ ≼ᴸᴿᵥ V′ ⟨ H !⟩ ⦂ ★
    with G ≡ H
... | yes refl = V ≼ᴸᴿᵥ V′ ⦂ G
... | no neq = ⊥

but Agda rejects the recursion on type G because that type is not a subpart of ★.

At this point one might think to try defining the logical relation using a data declaration in Agda, but then one gets stuck in the case for function type because the recursion W ≼ᴸᴿᵥ W′ ⦂ A appears to the left of an implication.

This is where step indexing comes into play. We add an extra parameter to the relation, a natural number, and decrement that number in the recursive calls. Here’s a first attempt. We’ll define the following two functions, parameterized on the step index k and the direction dir (just like in the semantic approximation above.)

dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ A⊑A′ for k
dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′ for k

When the step-index is at zero, we relate all values.

dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ A⊑A′ for zero = ⊤

For suc k, we proceed by cases on precision A ⊑ A′. In the case for unk⊑unk, where we need to relate injections to ★ on both sides, the recursion uses step index k to relate the underlying values.

dir ∣ V ⟨ G !⟩ ⊑ᴸᴿᵥ V′ ⟨ H !⟩ ⦂ unk⊑unk for (suc k)
    with G ≡ᵍ H
... | yes refl = Value V × Value V′ × (dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ Refl⊑ for k)
... | no neq = ⊥

In the case for relating function types, we could try the following

dir ∣ ƛ N ⊑ᴸᴿᵥ ƛ N′ ⦂ (fun⊑ A⊑A′ B⊑B′) for (suc k) =
  ∀ W W′ → (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ A⊑A′ for k)
         → (dir ∣ (N [ W ]) ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ B⊑B′ for k)

which again is okay regarding termination because the recursion is at the smaller step index k. Unfortunately, we run into another problem. Our proofs will depend on the logical relation being downward closed. In general, a step-indexed property S is downward closed if, whenever it is true at a given step index n, it remains true at smaller step indices.

downClosed : (ℕ → Set) → Set
downClosed S = ∀ n → S n → ∀ k → k ≤ n → S k
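To make the notion concrete, here is a Python sketch that checks downward closure of a predicate on step indices up to a finite bound. It is an illustration only: down_closed and close_downward are names invented here, and a finite check cannot establish the property for all natural numbers. The second function shows the general idea of quantifying over all indices j ≤ k, which always yields a downward closed predicate.

```python
def down_closed(S, bound):
    """Check downClosed S, restricted to step indices n ≤ bound."""
    return all(S(k)
               for n in range(bound + 1) if S(n)
               for k in range(n + 1))

def close_downward(P):
    """Q(k) holds iff P(j) holds for every j ≤ k."""
    return lambda k: all(P(j) for j in range(k + 1))

is_even = lambda k: k % 2 == 0

assert down_closed(lambda k: k < 5, 20)          # true up to 4, false after
assert not down_closed(is_even, 20)              # true at 2 but false at 1
assert down_closed(close_downward(is_even), 20)  # quantifying over j ≤ k fixes it
```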

The above definition of the relation for function types is not downward closed. The fix is to allow the recursion at any number j that is less-than-or-equal to k.

dir ∣ ƛ N ⊑ᴸᴿᵥ ƛ N′ ⦂ (fun⊑ A⊑A′ B⊑B′) for (suc k) =
  ∀ W W′ j → j ≤ k → (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ A⊑A′ for j)
         → (dir ∣ (N [ W ]) ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ B⊑B′ for j)

But now Agda rejects this definition because it is not structurally recursive, i.e., j is not a subpart of suc k. One could instead define the relation by strong recursion and then proceed to prove that it is downward closed. I’ve tried that approach and it works. However, using strong recursion in Agda is somewhat annoying, as is the proof of downward closedness. We instead use the StepIndexedLogic library to define the logical relation, which enables the definition of recursive predicates and proves downward closedness for us. However, there is some overhead involved in using the StepIndexedLogic library.

open import StepIndexedLogic

Recall that the StepIndexedLogic library provides an operator μᵒ that takes a non-recursive predicate (with an extra parameter) and turns it into a recursive predicate where the extra parameter is bound to itself. However, the library does not directly support mutually recursive predicates, so we must merge the two into a single predicate whose input is a disjoint union (a.k.a. sum type), and then dispatch back out to separate predicates, which we name LRᵥ (for values) and LRₜ (for terms). The predicates are indexed not only by the two terms and the direction (≼ or ≽), but also by the precision relation between the types of the two terms.

LR-type : Set
LR-type = (Prec × Dir × Term × Term) ⊎ (Prec × Dir × Term × Term)

LR-ctx : Context
LR-ctx = LR-type ∷ []

LRᵥ : Prec → Dir → Term → Term → Setˢ LR-ctx (cons Later ∅)
LRₜ : Prec → Dir → Term → Term → Setˢ LR-ctx (cons Later ∅)

_∣_ˢ⊑ᴸᴿₜ_⦂_ : Dir → Term → Term → ∀{A}{A′} (A⊑A′ : A ⊑ A′)
   → Setˢ LR-ctx (cons Now ∅)
dir ∣ M ˢ⊑ᴸᴿₜ M′ ⦂ A⊑A′ = (inj₂ ((_ , _ , A⊑A′) , dir , M , M′)) ∈ zeroˢ

_∣_ˢ⊑ᴸᴿᵥ_⦂_ : Dir → Term → Term → ∀{A}{A′} (A⊑A′ : A ⊑ A′)
   → Setˢ LR-ctx (cons Now ∅)
dir ∣ V ˢ⊑ᴸᴿᵥ V′ ⦂ A⊑A′ = (inj₁ ((_ , _ , A⊑A′) , dir , V , V′)) ∈ zeroˢ

instance
  TermInhabited : Inhabited Term
  TermInhabited = record { elt = ` 0 }
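The encoding of two mutually recursive predicates as one predicate over a sum type is a general trick, independent of StepIndexedLogic. The following Python sketch shows it on a deliberately simple pair, is_even and is_odd, which have nothing to do with the logical relation. The string tags "even" and "odd" play the role of inj₁ and inj₂, and the name even_or_odd is made up for this example.

```python
# Two mutually recursive predicates.
def is_even(n):
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

# The same pair merged into a single self-recursive predicate whose
# input is a tagged value, i.e. an element of a disjoint union.
def even_or_odd(tagged):
    tag, n = tagged
    if tag == "even":                      # plays the role of inj₁
        return True if n == 0 else even_or_odd(("odd", n - 1))
    else:                                  # plays the role of inj₂
        return False if n == 0 else even_or_odd(("even", n - 1))

# Dispatching back out recovers the original predicates.
for n in range(50):
    assert even_or_odd(("even", n)) == is_even(n)
    assert even_or_odd(("odd", n)) == is_odd(n)
```

In the Agda code, the functions that build inj₁ and inj₂ values and test membership in the recursive predicate play the same dispatching role.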

The definition of the logical relation for terms is a reorganized version of semantic approximation that only talks about one step at a time of the term that is being observed. Let us consider the ≼ direction, which observes the less-precise term M. The first clause says that M takes a step to N and that N is related to M′ at one tick later in time. The third clause says that M is already a value, and requires M′ to reduce to a value that is related to M. Finally, the second clause allows M′ to produce an error.

LRₜ (A , A′ , c) ≼ M M′ =
   (∃ˢ[ N ] (M —→ N)ˢ ×ˢ ▷ˢ (≼ ∣ N ˢ⊑ᴸᴿₜ M′ ⦂ c))
   ⊎ˢ (M′ —↠ blame)ˢ
   ⊎ˢ ((Value M)ˢ ×ˢ (∃ˢ[ V′ ] (M′ —↠ V′)ˢ ×ˢ (Value V′)ˢ
                       ×ˢ (LRᵥ (_ , _ , c) ≼ M V′)))

The other direction, ≽, is defined in a symmetric way, observing the reduction of the more-precise M′ instead of M.

LRₜ (A , A′ , c) ≽ M M′ =
   (∃ˢ[ N′ ] (M′ —→ N′)ˢ ×ˢ ▷ˢ (≽ ∣ M ˢ⊑ᴸᴿₜ N′ ⦂ c))
   ⊎ˢ (Blame M′)ˢ
   ⊎ˢ ((Value M′)ˢ ×ˢ (∃ˢ[ V ] (M —↠ V)ˢ ×ˢ (Value V)ˢ
                                ×ˢ (LRᵥ (_ , _ , c) ≽ V M′)))

Next we proceed to define the logical relation for values, the predicate LRᵥ. In the case of precision for base types base⊑, we only relate identical constants.

LRᵥ (.($ₜ ι) , .($ₜ ι) , base⊑{ι}) dir ($ c) ($ c′) = (c ≡ c′) ˢ
LRᵥ (.($ₜ ι) , .($ₜ ι) , base⊑{ι}) dir V V′ = ⊥ ˢ

In the case for related function types, two lambda abstractions are related if, for any two arguments that are related later, substituting the arguments into the bodies produces terms that are related later.

LRᵥ (.(A ⇒ B) , .(A′ ⇒ B′) , fun⊑{A}{B}{A′}{B′} A⊑A′ B⊑B′) dir (ƛ N)(ƛ N′) =
    ∀ˢ[ W ] ∀ˢ[ W′ ] ▷ˢ (dir ∣ W ˢ⊑ᴸᴿᵥ W′ ⦂ A⊑A′)
                  →ˢ ▷ˢ (dir ∣ (N [ W ]) ˢ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ B⊑B′) 
LRᵥ (.(A ⇒ B) , .(A′ ⇒ B′) , fun⊑{A}{B}{A′}{B′} A⊑A′ B⊑B′) dir V V′ = ⊥ ˢ

Notice how in the above definition, we no longer need to quantify over the extra j where j ≤ k. The implication operator →ˢ of the StepIndexedLogic instead takes care of that complication, ensuring that our logical relation is downward closed.

In the case for relating two values of the unknown type ★, two injections are related if they are injections from the same ground type and if the underlying values are related later.

LRᵥ (.★ , .★ , unk⊑unk) dir (V ⟨ G !⟩) (V′ ⟨ H !⟩)
    with G ≡ᵍ H
... | yes refl = (Value V)ˢ ×ˢ (Value V′)ˢ
                 ×ˢ (▷ˢ (dir ∣ V ˢ⊑ᴸᴿᵥ V′ ⦂ Refl⊑{gnd⇒ty G}))
... | no neq = ⊥ ˢ
LRᵥ (.★ , .★ , unk⊑unk) dir V V′ = ⊥ ˢ

In the case for relating two values where the less-precise value is of unknown type but the more-precise value is not, our definition depends on the direction (≼ or ≽). For the ≼ direction, the underlying values must be related later. Alternatively, we could relate them now, by using recursion on the precision derivation d, but the proof of the compatibility lemma for a projection on the more-precise side depends on only requiring the two underlying values to be related later.

LRᵥ (.★ , .A′ , unk⊑{H}{A′} d) ≼ (V ⟨ G !⟩) V′
    with G ≡ᵍ H
... | yes refl = (Value V)ˢ ×ˢ (Value V′)ˢ ×ˢ ▷ˢ (≼ ∣ V ˢ⊑ᴸᴿᵥ V′ ⦂ d)
... | no neq = ⊥ ˢ

For the ≽ direction, the underlying values must be related now. Alternatively, we could relate them later, but the proof of the compatibility lemma for a projection on the less-precise side depends on the underlying values being related now.

LRᵥ (.★ , .A′ , unk⊑{H}{A′} d) ≽ (V ⟨ G !⟩) V′
    with G ≡ᵍ H
... | yes refl = (Value V)ˢ ×ˢ (Value V′)ˢ ×ˢ (LRᵥ (gnd⇒ty G , A′ , d) ≽ V V′)
... | no neq = ⊥ ˢ
LRᵥ (★ , .A′ , unk⊑{H}{A′} d) dir V V′ = ⊥ ˢ

With LRₜ and LRᵥ in hand, we can define the combined predicate pre-LRₜ⊎LRᵥ and then use the fixpoint operator μᵒ from the StepIndexedLogic to define the combined logical relation.

pre-LRₜ⊎LRᵥ : LR-type → Setˢ LR-ctx (cons Later ∅)
pre-LRₜ⊎LRᵥ (inj₁ (c , dir , V , V′)) = LRᵥ c dir V V′
pre-LRₜ⊎LRᵥ (inj₂ (c , dir , M , M′)) = LRₜ c dir M M′

LRₜ⊎LRᵥ : LR-type → Setᵒ
LRₜ⊎LRᵥ X = μᵒ pre-LRₜ⊎LRᵥ X

We now give the main definitions for the logical relation: ⊑ᴸᴿᵥ for values and ⊑ᴸᴿₜ for terms.

_∣_⊑ᴸᴿᵥ_⦂_ : Dir → Term → Term → ∀{A A′} → A ⊑ A′ → Setᵒ
dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ A⊑A′ = LRₜ⊎LRᵥ (inj₁ ((_ , _ , A⊑A′) , dir , V , V′))

_∣_⊑ᴸᴿₜ_⦂_ : Dir → Term → Term → ∀{A A′} → A ⊑ A′ → Setᵒ
dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′ = LRₜ⊎LRᵥ (inj₂ ((_ , _ , A⊑A′) , dir , M , M′))

The following notation is for the conjunction of both directions.

_⊑ᴸᴿₜ_⦂_ : Term → Term → ∀{A A′} → A ⊑ A′ → Setᵒ
M ⊑ᴸᴿₜ M′ ⦂ A⊑A′ = (≼ ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′) ×ᵒ (≽ ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′)

Relating open terms

The relations that we have defined so far, ⊑ᴸᴿᵥ and ⊑ᴸᴿₜ, only apply to closed terms, that is, terms with no free variables. We also need to relate open terms. The standard way to do that is to apply two substitutions to the two terms, replacing each free variable with related values.

So we relate a pair of substitutions γ and γ′ with this definition of Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′, which says that the substitutions must be pointwise related using the logical relation for values.

_∣_⊨_⊑ᴸᴿ_ : (Γ : List Prec) → Dir → Subst → Subst → List Setᵒ
[] ∣ dir ⊨ γ ⊑ᴸᴿ γ′ = []
((_ , _ , A⊑A′) ∷ Γ) ∣ dir ⊨ γ ⊑ᴸᴿ γ′ = (dir ∣ (γ 0) ⊑ᴸᴿᵥ (γ′ 0) ⦂ A⊑A′)
                     ∷ (Γ ∣ dir ⊨ (λ x → γ (suc x)) ⊑ᴸᴿ (λ x → γ′ (suc x)))
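
The idea of pointwise-related substitutions can be sketched on its own, independent of the gradual typing development. The following is a simplified model under stated assumptions: Val, Ty, and Rel are abstract module parameters standing in for values, precision entries, and the value relation; the direction and the step index are omitted; and the names Related, _∋_⦂_, and lookup-related are hypothetical. Unlike the definition above, which produces a list of step-indexed propositions, this sketch produces an ordinary nested product.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.List using (List; []; _∷_)
open import Data.Unit using (⊤)
open import Data.Product using (_×_; _,_)

module _ (Val Ty : Set) (Rel : Ty → Val → Val → Set) where

  -- Substitutions map de Bruijn indices to values.
  Subst : Set
  Subst = ℕ → Val

  -- Two substitutions are related when they are related at every
  -- position of the context.
  Related : List Ty → Subst → Subst → Set
  Related []      γ γ′ = ⊤
  Related (A ∷ Γ) γ γ′ =
    Rel A (γ zero) (γ′ zero) × Related Γ (λ x → γ (suc x)) (λ x → γ′ (suc x))

  -- Context membership for de Bruijn indices.
  data _∋_⦂_ : List Ty → ℕ → Ty → Set where
    here  : ∀ {Γ A} → (A ∷ Γ) ∋ zero ⦂ A
    there : ∀ {Γ A B x} → Γ ∋ x ⦂ A → (B ∷ Γ) ∋ suc x ⦂ A

  -- Looking up the same variable in related substitutions gives related values.
  lookup-related : ∀ {Γ x A} → Γ ∋ x ⦂ A → (γ γ′ : Subst)
    → Related Γ γ γ′ → Rel A (γ x) (γ′ x)
  lookup-related here      γ γ′ (r , _)  = r
  lookup-related (there i) γ γ′ (_ , rs) =
    lookup-related i (λ x → γ (suc x)) (λ x → γ′ (suc x)) rs
```

The lookup function has the same shape as the lemma needed later for the variable case of the fundamental lemma.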

We then define two open terms M and M′ to be logically related if, for every pair of related substitutions γ and γ′, applying them to M and M′ produces related terms.

_∣_⊨_⊑ᴸᴿ_⦂_ : List Prec → Dir → Term → Term → Prec → Set
Γ ∣ dir ⊨ M ⊑ᴸᴿ M′ ⦂ (_ , _ , A⊑A′) = ∀ (γ γ′ : Subst)
   → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′) ⊢ᵒ dir ∣ (⟪ γ ⟫ M) ⊑ᴸᴿₜ (⟪ γ′ ⟫ M′) ⦂ A⊑A′

We use the following notation for the conjunction of the two directions and define the proj function for accessing each direction.

_⊨_⊑ᴸᴿ_⦂_ : List Prec → Term → Term → Prec → Set
Γ ⊨ M ⊑ᴸᴿ M′ ⦂ c = (Γ ∣ ≼ ⊨ M ⊑ᴸᴿ M′ ⦂ c) × (Γ ∣ ≽ ⊨ M ⊑ᴸᴿ M′ ⦂ c)

proj : ∀ {Γ}{c}
  → (dir : Dir)
  → (M M′ : Term)
  → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ c
  → Γ ∣ dir ⊨ M ⊑ᴸᴿ M′ ⦂ c
proj {Γ} {c} ≼ M M′ M⊑M′ = proj₁ M⊑M′
proj {Γ} {c} ≽ M M′ M⊑M′ = proj₂ M⊑M′

Reasoning about the logical relation

Unfortunately, there is some overhead to using the StepIndexedLogic to define the logical relation. One needs to use the fixpointᵒ theorem to obtain usable definitions.

The following states what we would like the ⊑ᴸᴿₜ relation to look like.

LRₜ-def : ∀{A}{A′} → (A⊑A′ : A ⊑ A′) → Dir → Term → Term → Setᵒ
LRₜ-def A⊑A′ ≼ M M′ =
   (∃ᵒ[ N ] (M —→ N)ᵒ ×ᵒ ▷ᵒ (≼ ∣ N ⊑ᴸᴿₜ M′ ⦂ A⊑A′))
   ⊎ᵒ (M′ —↠ blame)ᵒ
   ⊎ᵒ ((Value M)ᵒ ×ᵒ 
              (∃ᵒ[ V′ ] (M′ —↠ V′)ᵒ ×ᵒ (Value V′)ᵒ ×ᵒ (≼ ∣ M ⊑ᴸᴿᵥ V′ ⦂ A⊑A′)))
LRₜ-def A⊑A′ ≽ M M′ =
   (∃ᵒ[ N′ ] (M′ —→ N′)ᵒ ×ᵒ ▷ᵒ (≽ ∣ M ⊑ᴸᴿₜ N′ ⦂ A⊑A′))
   ⊎ᵒ (Blame M′)ᵒ
   ⊎ᵒ ((Value M′)ᵒ ×ᵒ (∃ᵒ[ V ] (M —↠ V)ᵒ ×ᵒ (Value V)ᵒ
                               ×ᵒ (≽ ∣ V ⊑ᴸᴿᵥ M′ ⦂ A⊑A′)))

We prove that the above is equivalent to ⊑ᴸᴿₜ with the following lemma, using the fixpointᵒ theorem in several places.

LRₜ-stmt : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{dir}{M}{M′}
   → dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′ ≡ᵒ LRₜ-def A⊑A′ dir M M′
LRₜ-stmt {A}{A′}{A⊑A′}{dir}{M}{M′} =
  dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′
                 ⩦⟨ ≡ᵒ-refl refl ⟩
  μᵒ pre-LRₜ⊎LRᵥ (X₂ dir)
                 ⩦⟨ fixpointᵒ pre-LRₜ⊎LRᵥ (X₂ dir) ⟩
  # (pre-LRₜ⊎LRᵥ (X₂ dir)) (LRₜ⊎LRᵥ , ttᵖ)
                 ⩦⟨ EQ{dir} ⟩
  LRₜ-def A⊑A′ dir M M′
  ∎
  where
  c = (A , A′ , A⊑A′)
  X₁ : Dir → LR-type
  X₁ = λ dir → inj₁ (c , dir , M , M′)
  X₂ = λ dir → inj₂ (c , dir , M , M′)
  EQ : ∀{dir} → # (pre-LRₜ⊎LRᵥ (X₂ dir)) (LRₜ⊎LRᵥ , ttᵖ)
                ≡ᵒ LRₜ-def A⊑A′ dir M M′
  EQ {≼} = cong-⊎ᵒ (≡ᵒ-refl refl)
           (cong-⊎ᵒ (≡ᵒ-refl refl)
            (cong-×ᵒ (≡ᵒ-refl refl) 
             (cong-∃ λ V′ → cong-×ᵒ (≡ᵒ-refl refl) (cong-×ᵒ (≡ᵒ-refl refl)
              ((≡ᵒ-sym (fixpointᵒ pre-LRₜ⊎LRᵥ (inj₁ (c , ≼ , M , V′)))))))))
  EQ {≽} = cong-⊎ᵒ (≡ᵒ-refl refl) (cong-⊎ᵒ (≡ᵒ-refl refl)
            (cong-×ᵒ (≡ᵒ-refl refl) (cong-∃ λ V → cong-×ᵒ (≡ᵒ-refl refl)
              (cong-×ᵒ (≡ᵒ-refl refl)
               (≡ᵒ-sym (fixpointᵒ pre-LRₜ⊎LRᵥ (inj₁ (c , ≽ , V , M′))))))))

In situations where we need to reason with an explicit step index k, we use the following corollary.

LRₜ-suc : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{dir}{M}{M′}{k}
  → #(dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′) (suc k) ⇔ #(LRₜ-def A⊑A′ dir M M′) (suc k)
LRₜ-suc {A}{A′}{A⊑A′}{dir}{M}{M′}{k} =
   ≡ᵒ⇒⇔{k = suc k} (LRₜ-stmt{A}{A′}{A⊑A′}{dir}{M}{M′})

The logical relation implies semantic approximation

Before getting too much further, it's good to check whether the logical relation is strong enough, i.e., it should imply semantic approximation. Indeed, the following somewhat verbose but easy lemma proves that it does so.

LR⇒sem-approx : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{M}{M′}{k}{dir}
  → #(dir ∣ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′) (suc k)
  → dir ⊨ M ⊑ M′ for k
LR⇒sem-approx {A} {A′} {A⊑A′} {M} {M′} {zero} {≼} M⊑M′sk =
    inj₂ (inj₂ (M , (M END) , refl))
LR⇒sem-approx {A} {A′} {A⊑A′} {M} {M′} {suc k} {≼} M⊑M′sk
    with ⇔-to (LRₜ-suc{dir = ≼}) M⊑M′sk
... | inj₂ (inj₁ M′→blame) =
      inj₂ (inj₁ M′→blame)
... | inj₂ (inj₂ (m , (V′ , M′→V′ , v′ , 𝒱≼V′M))) =
      inj₁ ((M , (M END) , m) , (V′ , M′→V′ , v′))
... | inj₁ (N , M→N , ▷N⊑M′)
    with LR⇒sem-approx{dir = ≼} ▷N⊑M′
... | inj₁ ((V , M→V , v) , (V′ , M′→V′ , v′)) =
      inj₁ ((V , (M —→⟨ M→N ⟩ M→V) , v) , (V′ , M′→V′ , v′))
... | inj₂ (inj₁ M′→blame) =
      inj₂ (inj₁ M′→blame)
... | inj₂ (inj₂ (L , N→L , eq)) =
      inj₂ (inj₂ (L , (M —→⟨ M→N ⟩ N→L) , cong suc eq))
LR⇒sem-approx {A} {A′} {A⊑A′} {M} {M′} {zero} {≽} M⊑M′sk =
    inj₂ (inj₂ (M′ , (M′ END) , refl))
LR⇒sem-approx {A} {A′} {A⊑A′} {M} {M′} {suc k} {≽} M⊑M′sk
    with ⇔-to (LRₜ-suc{dir = ≽}) M⊑M′sk
... | inj₂ (inj₁ isBlame) =
      inj₂ (inj₁ (blame END))
... | inj₂ (inj₂ (m′ , V , M→V , v , 𝒱≽VM′)) =
      inj₁ ((V , M→V , v) , M′ , (M′ END) , m′)
... | inj₁ (N′ , M′→N′ , ▷M⊑N′)
    with LR⇒sem-approx{dir = ≽} ▷M⊑N′
... | inj₁ ((V , M→V , v) , (V′ , N′→V′ , v′)) =
      inj₁ ((V , M→V , v) , V′ , (M′ —→⟨ M′→N′ ⟩ N′→V′) , v′)
... | inj₂ (inj₁ N′→blame) = inj₂ (inj₁ (M′ —→⟨ M′→N′ ⟩ N′→blame))
... | inj₂ (inj₂ (L′ , N′→L′ , eq)) =
      inj₂ (inj₂ (L′ , (M′ —→⟨ M′→N′ ⟩ N′→L′) , cong suc eq))

The logical relation implies the gradual guarantee

Putting together the above lemma with sem-approx⇒GG, we know that the logical relation implies the gradual guarantee.

LR⇒GG : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{M}{M′}
   → [] ⊢ᵒ M ⊑ᴸᴿₜ M′ ⦂ A⊑A′
   → (M′ ⇓ → M ⇓)
   × (M′ ⇑ → M ⇑)
   × (M ⇓ → M′ ⇓ ⊎ M′ —↠ blame)
   × (M ⇑ → M′ ⇑⊎blame)
   × (M —↠ blame → M′ —↠ blame)
LR⇒GG {A}{A′}{A⊑A′}{M}{M′} ⊨M⊑M′ =
  sem-approx⇒GG{A⊑A′ = A⊑A′} (λ k → ≼⊨M⊑M′ , ≽⊨M⊑M′)
  where
  ≼⊨M⊑M′ : ∀{k} → ≼ ⊨ M ⊑ M′ for k
  ≼⊨M⊑M′ {k} = LR⇒sem-approx {k = k}{dir = ≼}
                   (⊢ᵒ-elim (proj₁ᵒ ⊨M⊑M′) (suc k) tt) 
  ≽⊨M⊑M′ : ∀{k} → ≽ ⊨ M ⊑ M′ for k
  ≽⊨M⊑M′ {k} = LR⇒sem-approx {k = k}{dir = ≽}
                   (⊢ᵒ-elim (proj₂ᵒ ⊨M⊑M′) (suc k) tt)

Looking forward to the fundamental lemma

The fundamental lemma is the last, but largest, piece of the puzzle. It states that if M and M′ are related by term precision, then they are also logically related.

fundamental : ∀ {Γ}{A}{A′}{A⊑A′ : A ⊑ A′} → (M M′ : Term)
  → Γ ⊩ M ⊑ M′ ⦂ A⊑A′
    ----------------------------
  → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (A , A′ , A⊑A′)

The proof of the fundamental lemma is by induction on the term precision relation, with each case proved as a separate lemma. By tradition, we refer to these lemmas as the compatibility lemmas. The proofs of the compatibility lemmas rely on a considerable number of technical lemmas regarding the logical relation, which we prove next.

The logical relation is preserved by anti-reduction (a.k.a. expansion)

If two terms are related, then taking a step backwards with either or both of the terms yields related terms. For example, if ≼ ∣ N ⊑ᴸᴿₜ M′ and we step N backwards to M, then we have ≼ ∣ M ⊑ᴸᴿₜ M′.

anti-reduction-≼-L-one : ∀{A}{A′}{c : A ⊑ A′}{M}{N}{M′}{i}
  → #(≼ ∣ N ⊑ᴸᴿₜ M′ ⦂ c) i
  → (M→N : M —→ N)
    ----------------------------
  → #(≼ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) (suc i)
anti-reduction-≼-L-one {c = c} {M} {N} {M′} {i} ℰ≼NM′i M→N =
  inj₁ (N , M→N , ℰ≼NM′i)

Because the ≼ direction observes the reduction steps of the less-precise term, and the above lemma is about taking a backward step with the less-precise term, the step index increases by one, i.e., note the i in the premise and suc i in the conclusion above.
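
The way the step index grows can be seen in a stripped-down model that does not depend on the StepIndexedLogic. This is only a sketch under assumptions: Term, _—→_, and Done are abstract module parameters (Done stands in for the blame and value clauses), and ℰ≼ and anti-reduction-L are hypothetical names defined by direct recursion on the index.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Unit using (⊤)
open import Data.Sum using (_⊎_; inj₁)
open import Data.Product using (∃-syntax; _×_; _,_)

module _ (Term : Set)
         (_—→_ : Term → Term → Set)
         (Done : Term → Term → Set) where

  -- A simplified ≼-style relation: always true at zero; at suc k either
  -- the left term steps and the results are related at k, or we are done.
  ℰ≼ : ℕ → Term → Term → Set
  ℰ≼ zero    M M′ = ⊤
  ℰ≼ (suc k) M M′ = (∃[ N ] (M —→ N × ℰ≼ k N M′)) ⊎ Done M M′

  -- A backward step on the left consumes one unit of the index:
  -- related at k after the step means related at suc k before it.
  anti-reduction-L : ∀ {k M N M′} → ℰ≼ k N M′ → M —→ N → ℰ≼ (suc k) M M′
  anti-reduction-L ℰNM′ M→N = inj₁ (_ , M→N , ℰNM′)
```

The proof is just an injection into the first clause, which mirrors the one-line proof of anti-reduction-≼-L-one.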

If instead the backward step is taken by the more-precise term, then the step index does not change, as in the following lemma.

anti-reduction-≼-R-one : ∀{A}{A′}{c : A ⊑ A′}{M}{M′}{N′}{i}
  → #(≼ ∣ M ⊑ᴸᴿₜ N′ ⦂ c) i
  → (M′→N′ : M′ —→ N′)
  → #(≼ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) i
anti-reduction-≼-R-one {c = c}{M}{M′}{N′}{zero} ℰMN′ M′→N′ =
  tz (≼ ∣ M ⊑ᴸᴿₜ M′ ⦂ c)
anti-reduction-≼-R-one {c = c}{M}{M′}{N′}{suc i} ℰMN′ M′→N′
    with ℰMN′
... | inj₁ (N , M→N , ▷ℰNN′) =
         let ℰNM′si = anti-reduction-≼-R-one ▷ℰNN′ M′→N′ in
         inj₁ (N , M→N , ℰNM′si)
... | inj₂ (inj₁ N′→blame) = inj₂ (inj₁ (unit M′→N′ ++ N′→blame))
... | inj₂ (inj₂ (m , (V′ , N′→V′ , v′ , 𝒱MV′))) =
      inj₂ (inj₂ (m , (V′ , (unit M′→N′ ++ N′→V′) , v′ , 𝒱MV′)))

Here are the anti-reduction lemmas for the ≽ direction.

anti-reduction-≽-L-one : ∀{A}{A′}{c : A ⊑ A′}{M}{N}{M′}{i}
  → #(≽ ∣ N ⊑ᴸᴿₜ M′ ⦂ c) i
  → (M→N : M —→ N)
  → #(≽ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) i
anti-reduction-≽-L-one {c = c}{M} {N}{M′} {zero} ℰNM′ M→N =
    tz (≽ ∣ M ⊑ᴸᴿₜ M′ ⦂ c)
anti-reduction-≽-L-one {M = M} {N}{M′}  {suc i} ℰNM′ M→N
    with ℰNM′
... | inj₁ (N′ , M′→N′ , ▷ℰMN′) =
      inj₁ (N′ , (M′→N′ , (anti-reduction-≽-L-one ▷ℰMN′ M→N)))
... | inj₂ (inj₁ isBlame) = inj₂ (inj₁ isBlame)
... | inj₂ (inj₂ (m′ , V , N→V , v , 𝒱VM′)) =
      inj₂ (inj₂ (m′ , V , (unit M→N ++ N→V) , v , 𝒱VM′))

anti-reduction-≽-R-one : ∀{A}{A′}{c : A ⊑ A′}{M}{M′}{N′}{i}
  → #(≽ ∣ M ⊑ᴸᴿₜ N′ ⦂ c) i
  → (M′→N′ : M′ —→ N′)
  → #(≽ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) (suc i)
anti-reduction-≽-R-one {c = c} {M} {M′}{N′} {i} ℰ≽MN′ M′→N′ =
  inj₁ (N′ , M′→N′ , ℰ≽MN′)

Putting together the above lemmas, we show that taking a step backwards on both sides yields terms that are related.

anti-reduction : ∀{A}{A′}{c : A ⊑ A′}{M}{N}{M′}{N′}{i}{dir}
  → #(dir ∣ N ⊑ᴸᴿₜ N′ ⦂ c) i
  → (M→N : M —→ N)
  → (M′→N′ : M′ —→ N′)
  → #(dir ∣ M ⊑ᴸᴿₜ M′ ⦂ c) (suc i)
anti-reduction {c = c} {M} {N} {M′} {N′} {i} {≼} ℰNN′i M→N M′→N′ =
  let ℰMN′si = anti-reduction-≼-L-one ℰNN′i M→N in
  let ℰM′N′si = anti-reduction-≼-R-one ℰMN′si M′→N′ in
  ℰM′N′si
anti-reduction {c = c} {M} {N} {M′} {N′} {i} {≽} ℰNN′i M→N M′→N′ =
  let ℰM′Nsi = anti-reduction-≽-R-one ℰNN′i M′→N′ in
  let ℰM′N′si = anti-reduction-≽-L-one ℰM′Nsi M→N in
  ℰM′N′si

We shall also need to know that the logical relation is preserved by taking multiple steps backwards. For the ≼ direction, we need this for taking backward steps with the more-precise term.

anti-reduction-≼-R : ∀{A}{A′}{c : A ⊑ A′}{M}{M′}{N′}{i}
  → #(≼ ∣ M ⊑ᴸᴿₜ N′ ⦂ c) i
  → (M′→N′ : M′ —↠ N′)
  → #(≼ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) i
anti-reduction-≼-R {M′ = M′} ℰMN′ (.M′ END) = ℰMN′
anti-reduction-≼-R {M′ = M′} {N′} {i} ℰMN′ (.M′ —→⟨ M′→L′ ⟩ L′→*N′) =
  anti-reduction-≼-R-one (anti-reduction-≼-R ℰMN′ L′→*N′) M′→L′

For the ≽ direction, we need this for taking backward steps with the less-precise term.

anti-reduction-≽-L : ∀{A}{A′}{c : A ⊑ A′}{M}{N}{M′}{i}
  → #(≽ ∣ N ⊑ᴸᴿₜ M′ ⦂ c) i
  → (M→N : M —↠ N)
  → #(≽ ∣ M ⊑ᴸᴿₜ M′ ⦂ c) i
anti-reduction-≽-L {c = c} {M} {.M} {N′} {i} ℰNM′ (.M END) = ℰNM′
anti-reduction-≽-L {c = c} {M} {M′} {N′} {i} ℰNM′ (.M —→⟨ M→L ⟩ L→*N) =
  anti-reduction-≽-L-one (anti-reduction-≽-L ℰNM′ L→*N) M→L

Blame is more precise

The blame term immediately errors, so it is logically related to any term on the less-precise side.

LRₜ-blame-step : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{dir}{M}{k}
   → #(dir ∣ M ⊑ᴸᴿₜ blame ⦂ A⊑A′) k
LRₜ-blame-step {A}{A′}{A⊑A′}{dir} {M} {zero} = tz (dir ∣ M ⊑ᴸᴿₜ blame ⦂ A⊑A′)
LRₜ-blame-step {A}{A′}{A⊑A′}{≼} {M} {suc k} = inj₂ (inj₁ (blame END))
LRₜ-blame-step {A}{A′}{A⊑A′}{≽} {M} {suc k} = inj₂ (inj₁ isBlame)

LRₜ-blame : ∀{𝒫}{A}{A′}{A⊑A′ : A ⊑ A′}{M}{dir}
   → 𝒫 ⊢ᵒ dir ∣ M ⊑ᴸᴿₜ blame ⦂ A⊑A′
LRₜ-blame {𝒫}{A}{A′}{A⊑A′}{M}{dir} = ⊢ᵒ-intro λ n x → LRₜ-blame-step{dir = dir}

Next we turn to proving lemmas regarding the logical relation for values.

The definition of ⊑ᴸᴿᵥ included several clauses that ensured that the related values are indeed syntactic values. Here we make use of that to prove that logically related values are syntactic values.

LRᵥ⇒Value : ∀ {k}{dir}{A}{A′} (A⊑A′ : A ⊑ A′) M M′
   → # (dir ∣ M ⊑ᴸᴿᵥ M′ ⦂ A⊑A′) (suc k)
     ----------------------------
   → Value M × Value M′
LRᵥ⇒Value {k}{dir} unk⊑unk (V ⟨ G !⟩) (V′ ⟨ H !⟩) 𝒱MM′
    with G ≡ᵍ H
... | no neq = ⊥-elim 𝒱MM′
... | yes refl
    with 𝒱MM′
... | v , v′ , _ = (v 〈 G 〉) , (v′ 〈 G 〉)
LRᵥ⇒Value {k}{≼} (unk⊑{H}{A′} d) (V ⟨ G !⟩) V′ 𝒱VGV′
    with G ≡ᵍ H
... | yes refl
    with 𝒱VGV′
... | v , v′ , _ = (v 〈 _ 〉) , v′
LRᵥ⇒Value {k}{≽} (unk⊑{H}{A′} d) (V ⟨ G !⟩) V′ 𝒱VGV′
    with G ≡ᵍ H
... | yes refl
    with 𝒱VGV′
... | v , v′ , _ = (v 〈 _ 〉) , v′
LRᵥ⇒Value {k}{dir} (unk⊑{H}{A′} d) (V ⟨ G !⟩) V′ 𝒱VGV′
    | no neq = ⊥-elim 𝒱VGV′
LRᵥ⇒Value {k}{dir} (base⊑{ι}) ($ c) ($ c′) refl = ($̬ c) , ($̬ c)
LRᵥ⇒Value {k}{dir} (fun⊑ A⊑A′ B⊑B′) (ƛ N) (ƛ N′) 𝒱VV′ =
    (ƛ̬ N) , (ƛ̬ N′)

If two values are related via ⊑ᴸᴿᵥ, then they are also related via ⊑ᴸᴿₜ at the same step index.

LRᵥ⇒LRₜ-step : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{V V′}{dir}{k}
   → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ A⊑A′) k
     ---------------------------
   → #(dir ∣ V ⊑ᴸᴿₜ V′ ⦂ A⊑A′) k
LRᵥ⇒LRₜ-step {A}{A′}{A⊑A′}{V} {V′} {dir} {zero} 𝒱VV′k =
   tz (dir ∣ V ⊑ᴸᴿₜ V′ ⦂ A⊑A′)
LRᵥ⇒LRₜ-step {A}{A′}{A⊑A′}{V} {V′} {≼} {suc k} 𝒱VV′sk =
  ⇔-fro (LRₜ-suc{dir = ≼})
  (let (v , v′) = LRᵥ⇒Value A⊑A′ V V′ 𝒱VV′sk in
  (inj₂ (inj₂ (v , (V′ , (V′ END) , v′ , 𝒱VV′sk)))))
LRᵥ⇒LRₜ-step {A}{A′}{A⊑A′}{V} {V′} {≽} {suc k} 𝒱VV′sk =
  ⇔-fro (LRₜ-suc{dir = ≽})
  (let (v , v′) = LRᵥ⇒Value A⊑A′ V V′ 𝒱VV′sk in
  inj₂ (inj₂ (v′ , V , (V END) , v , 𝒱VV′sk)))

As a corollary, this holds for all step indices, i.e., it holds in the logic.

LRᵥ⇒LRₜ : ∀{A}{A′}{A⊑A′ : A ⊑ A′}{𝒫}{V V′}{dir}
   → 𝒫 ⊢ᵒ dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ A⊑A′
     ---------------------------
   → 𝒫 ⊢ᵒ dir ∣ V ⊑ᴸᴿₜ V′ ⦂ A⊑A′
LRᵥ⇒LRₜ {A}{A′}{A⊑A′}{𝒫}{V}{V′}{dir} ⊢𝒱VV′ = ⊢ᵒ-intro λ k 𝒫k →
  LRᵥ⇒LRₜ-step{V = V}{V′}{dir}{k} (⊢ᵒ-elim ⊢𝒱VV′ k 𝒫k)
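
The pattern of proving a fact at an arbitrary step index and then lifting it into the logic can be sketched in isolation. This is a simplified model and not the StepIndexedLogic API: StepProp, _⊢ˢ_, and lift-step are hypothetical names, and the context 𝒫 is a single step-indexed proposition here rather than a list of them.

```agda
open import Data.Nat using (ℕ)

-- A step-indexed proposition, modelled as a family of types over step counts.
StepProp : Set₁
StepProp = ℕ → Set

-- Entailment: Q holds at every index at which the context 𝒫 holds.
_⊢ˢ_ : StepProp → StepProp → Set
𝒫 ⊢ˢ Q = ∀ k → 𝒫 k → Q k

-- A lemma proved index by index lifts to a lemma about entailment.
lift-step : ∀ {𝒫 P Q : StepProp} → (∀ k → P k → Q k) → 𝒫 ⊢ˢ P → 𝒫 ⊢ˢ Q
lift-step step ⊢P k 𝒫k = step k (⊢P k 𝒫k)
```

In this model, unfolding the entailment plays the role of ⊢ᵒ-elim and building it plays the role of ⊢ᵒ-intro.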

Equations regarding `⊑ᴸᴿᵥ`

We apply the fixpointᵒ theorem to fold or unfold the definition of related lambda abstractions.

LRᵥ-fun : ∀{A B A′ B′}{A⊑A′ : A ⊑ A′}{B⊑B′ : B ⊑ B′}{N}{N′}{dir}
   → (dir ∣ (ƛ N) ⊑ᴸᴿᵥ (ƛ N′) ⦂ fun⊑ A⊑A′ B⊑B′)
      ≡ᵒ (∀ᵒ[ W ] ∀ᵒ[ W′ ] ((▷ᵒ (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ A⊑A′))
                →ᵒ (▷ᵒ (dir ∣ (N [ W ]) ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ B⊑B′))))
LRᵥ-fun {A}{B}{A′}{B′}{A⊑A′}{B⊑B′}{N}{N′}{dir} =
   let X = inj₁ ((A ⇒ B , A′ ⇒ B′ , fun⊑ A⊑A′ B⊑B′) , dir , ƛ N , ƛ N′) in
   (dir ∣ (ƛ N) ⊑ᴸᴿᵥ (ƛ N′) ⦂ fun⊑ A⊑A′ B⊑B′)  ⩦⟨ ≡ᵒ-refl refl ⟩
   LRₜ⊎LRᵥ X                                       ⩦⟨ fixpointᵒ pre-LRₜ⊎LRᵥ X ⟩
   # (pre-LRₜ⊎LRᵥ X) (LRₜ⊎LRᵥ , ttᵖ)                          ⩦⟨ ≡ᵒ-refl refl ⟩
   (∀ᵒ[ W ] ∀ᵒ[ W′ ] ((▷ᵒ (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ A⊑A′))
                   →ᵒ (▷ᵒ (dir ∣ (N [ W ]) ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ B⊑B′)))) ∎

Elimination rules for `⊑ᴸᴿᵥ`

If we are given that two values are logically related at two types related by a particular precision rule, then we can deduce something about the shape of the values.

If the two types are base types, then the values are identical literals.

LRᵥ-base-elim-step : ∀{ι}{ι′}{c : $ₜ ι ⊑ $ₜ ι′}{V}{V′}{dir}{k}
  → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) (suc k)
  → ∃[ c ] ι ≡ ι′ × V ≡ $ c × V′ ≡ $ c
LRᵥ-base-elim-step {ι} {.ι} {base⊑} {$ c} {$ c′} {dir} {k} refl =
  c , refl , refl , refl

If the two types are function types related by fun⊑, then the values are lambda expressions and their bodies are related as follows.

LRᵥ-fun-elim-step : ∀{A}{B}{A′}{B′}{c : A ⊑ A′}{d : B ⊑ B′}{V}{V′}{dir}{k}{j}
  → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ fun⊑ c d) (suc k)
  → j ≤ k
  → ∃[ N ] ∃[ N′ ] V ≡ ƛ N × V′ ≡ ƛ N′ 
      × (∀{W W′} → # (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ c) j
                 → # (dir ∣ (N [ W ]) ⊑ᴸᴿₜ (N′ [ W′ ]) ⦂ d) j)
LRᵥ-fun-elim-step {A}{B}{A′}{B′}{c}{d}{ƛ N}{ƛ N′}{dir}{k}{j} 𝒱VV′ j≤k =
  N , N′ , refl , refl , λ {W}{W′} 𝒱WW′ →
    let 𝒱λNλN′sj = down (dir ∣ (ƛ N) ⊑ᴸᴿᵥ (ƛ N′) ⦂ fun⊑ c d)
                        (suc k) 𝒱VV′ (suc j) (s≤s j≤k) in
    let ℰNWN′W′j = 𝒱λNλN′sj W W′ (suc j) ≤-refl 𝒱WW′ in
    ℰNWN′W′j

For the ≼ direction, if the two types are related by unk⊑, so the less-precise side has type ★, then the value on the less-precise side is an injection and its underlying value is related later.

LRᵥ-dyn-any-elim-≼ : ∀{V}{V′}{k}{H}{A′}{c : gnd⇒ty H ⊑ A′}
   → #(≼ ∣ V ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) (suc k)
   → ∃[ V₁ ] V ≡ V₁ ⟨ H !⟩ × Value V₁ × Value V′
             × #(≼ ∣ V₁ ⊑ᴸᴿᵥ V′ ⦂ c) k
LRᵥ-dyn-any-elim-≼ {V ⟨ G !⟩}{V′}{k}{H}{A′}{c} 𝒱VGV′
    with G ≡ᵍ H
... | no neq = ⊥-elim 𝒱VGV′
... | yes refl
    with 𝒱VGV′
... | v , v′ , 𝒱VV′ = V , refl , v , v′ , 𝒱VV′

For the ≽ direction, if the two types are related by unk⊑, so the less-precise side has type ★, then the value on the less-precise side is an injection and its underlying value is related now, i.e., at the same step-index.

LRᵥ-dyn-any-elim-≽ : ∀{V}{V′}{k}{H}{A′}{c : gnd⇒ty H ⊑ A′}
   → #(≽ ∣ V ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) (suc k)
   → ∃[ V₁ ] V ≡ V₁ ⟨ H !⟩ × Value V₁ × Value V′
             × #(≽ ∣ V₁ ⊑ᴸᴿᵥ V′ ⦂ c) (suc k)
LRᵥ-dyn-any-elim-≽ {V ⟨ G !⟩}{V′}{k}{H}{A′}{c} 𝒱VGV′
    with G ≡ᵍ H
... | no neq = ⊥-elim 𝒱VGV′
... | yes refl
    with 𝒱VGV′
... | v , v′ , 𝒱VV′ = V , refl , v , v′ , 𝒱VV′

Introduction rules for `⊑ᴸᴿᵥ`

In the proofs of the compatibility lemmas we will often need to prove that values of a particular form are related by ⊑ᴸᴿᵥ. The following lemmas do this. We shall need lemmas to handle injections on both the less-precise and more-precise sides, and in both directions ≼ and ≽.

We start with the introduction rule for relating literals at base type.

LRᵥ-base-intro-step : ∀{ι}{dir}{c}{k} → # (dir ∣ ($ c) ⊑ᴸᴿᵥ ($ c) ⦂ base⊑{ι}) k
LRᵥ-base-intro-step {ι} {dir} {c} {zero} = tt
LRᵥ-base-intro-step {ι} {dir} {c} {suc k} = refl

LRᵥ-base-intro : ∀{𝒫}{ι}{c}{dir}
   → 𝒫 ⊢ᵒ dir ∣ ($ c) ⊑ᴸᴿᵥ ($ c) ⦂ base⊑{ι}
LRᵥ-base-intro{𝒫}{ι}{c}{dir} = ⊢ᵒ-intro λ k 𝒫k →
  LRᵥ-base-intro-step{ι}{dir}{c}{k}

In the ≽ direction, an injection on the more-precise side is related if its underlying value is related at the same step index.

LRᵥ-inject-R-intro-≽ : ∀{G}{c : ★ ⊑ gnd⇒ty G}{V}{V′}{k}
   → #(≽ ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(≽ ∣ V ⊑ᴸᴿᵥ (V′ ⟨ G !⟩) ⦂ unk⊑unk) k
LRᵥ-inject-R-intro-≽ {G} {c} {V} {V′} {zero} 𝒱VV′ =
     tz (≽ ∣ V ⊑ᴸᴿᵥ (V′ ⟨ G !⟩) ⦂ unk⊑unk)
LRᵥ-inject-R-intro-≽ {G} {c} {V} {V′} {suc k} 𝒱VV′sk
    with unk⊑gnd-inv c
... | d , refl
    with LRᵥ-dyn-any-elim-≽ {V}{V′}{k}{G}{_}{d} 𝒱VV′sk
... | V₁ , refl , v₁ , v′ , 𝒱V₁V′sk
    with G ≡ᵍ G
... | no neq = ⊥-elim 𝒱VV′sk
... | yes refl
    with gnd-prec-unique d Refl⊑
... | refl =
    let 𝒱V₁V′k = down (≽ ∣ V₁ ⊑ᴸᴿᵥ V′ ⦂ d) (suc k) 𝒱V₁V′sk k (n≤1+n k) in
    v₁ , v′ , 𝒱V₁V′k

The same is true for the ≼ direction.

LRᵥ-inject-R-intro-≼ : ∀{G}{c : ★ ⊑ gnd⇒ty G}{V}{V′}{k}
   → #(≼ ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(≼ ∣ V ⊑ᴸᴿᵥ (V′ ⟨ G !⟩) ⦂ unk⊑unk) k
LRᵥ-inject-R-intro-≼ {G} {c} {V} {V′} {zero} 𝒱VV′ =
     tz (≼ ∣ V ⊑ᴸᴿᵥ (V′ ⟨ G !⟩) ⦂ unk⊑unk)
LRᵥ-inject-R-intro-≼ {G} {c} {V} {V′} {suc k} 𝒱VV′sk
    with unk⊑gnd-inv c
... | d , refl
    with LRᵥ-dyn-any-elim-≼ {V}{V′}{k}{G}{_}{d} 𝒱VV′sk
... | V₁ , refl , v₁ , v′ , 𝒱V₁V′k
    with G ≡ᵍ G
... | no neq = ⊥-elim 𝒱VV′sk
... | yes refl
    with gnd-prec-unique d Refl⊑
... | refl = v₁ , v′ , 𝒱V₁V′k

We combine both directions into the following lemma.

LRᵥ-inject-R-intro : ∀{G}{c : ★ ⊑ gnd⇒ty G}{V}{V′}{k}{dir}
   → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(dir ∣ V ⊑ᴸᴿᵥ (V′ ⟨ G !⟩) ⦂ unk⊑unk) k
LRᵥ-inject-R-intro {G} {c} {V} {V′} {k} {≼} 𝒱VV′ =
   LRᵥ-inject-R-intro-≼{G} {c} {V} {V′} {k} 𝒱VV′ 
LRᵥ-inject-R-intro {G} {c} {V} {V′} {k} {≽} 𝒱VV′ =
   LRᵥ-inject-R-intro-≽{G} {c} {V} {V′} {k} 𝒱VV′

In the ≼ direction, an injection on the less-precise side is related if its underlying value is related one step earlier.

LRᵥ-inject-L-intro-≼ : ∀{G}{A′}{c : gnd⇒ty G ⊑ A′}{V}{V′}{k}
   → Value V
   → Value V′
   → #(≼ ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(≼ ∣ (V ⟨ G !⟩) ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) (suc k)
LRᵥ-inject-L-intro-≼ {G} {A′} {c} {V} {V′} {k} v v′ 𝒱VV′k
    with G ≡ᵍ G
... | no neq = ⊥-elim (neq refl)
... | yes refl =
    v , v′ , 𝒱VV′k

In the ≽ direction, an injection on the less-precise side is related if its underlying value is related now, i.e., at the same step index.

LRᵥ-inject-L-intro-≽ : ∀{G}{A′}{c : gnd⇒ty G ⊑ A′}{V}{V′}{k}
   → #(≽ ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(≽ ∣ (V ⟨ G !⟩) ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) k
LRᵥ-inject-L-intro-≽ {G}{A′}{c}{V}{V′}{zero} 𝒱VV′k =
    tz (≽ ∣ (V ⟨ G !⟩) ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c)
LRᵥ-inject-L-intro-≽ {G} {A′} {c} {V} {V′} {suc k} 𝒱VV′sk
    with G ≡ᵍ G
... | no neq = ⊥-elim (neq refl)
... | yes refl =
      let (v , v′) = LRᵥ⇒Value c V V′ 𝒱VV′sk in
      v , v′ , 𝒱VV′sk

We can combine the two directions into the following lemma, which states that an injection on the less-precise side is related if its underlying value is related at the same step index. The proof uses downward closedness in the ≼ direction.

LRᵥ-inject-L-intro : ∀{G}{A′}{c : gnd⇒ty G ⊑ A′}{V}{V′}{dir}{k}
   → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) k
   → #(dir ∣ (V ⟨ G !⟩) ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) k
LRᵥ-inject-L-intro {G} {A′} {c} {V} {V′} {≼} {zero} 𝒱VV′k =
    tz (≼ ∣ V ⟨ G !⟩ ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c)
LRᵥ-inject-L-intro {G} {A′} {c} {V} {V′} {≼} {suc k} 𝒱VV′sk
    with G ≡ᵍ G
... | no neq = ⊥-elim (neq refl)
... | yes refl =
    let (v , v′) = LRᵥ⇒Value c V V′ 𝒱VV′sk in
    let 𝒱VV′k = down (≼ ∣ V ⊑ᴸᴿᵥ V′ ⦂ c) (suc k) 𝒱VV′sk k (n≤1+n k) in
    v , v′ , 𝒱VV′k 
LRᵥ-inject-L-intro {G} {A′} {c} {V} {V′} {≽} {k} 𝒱VV′k =
   LRᵥ-inject-L-intro-≽{G} {A′} {c} {V} {V′} 𝒱VV′k

The Bind Lemma

The last technical lemma before we get to the compatibility lemmas is the gnarly Bind Lemma.

Let F and F′ be possibly empty frames and recall that the _⦉_⦊ notation is for plugging a term into a frame.
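
As a reminder of what plugging means, here is a minimal self-contained sketch. It assumes a toy term language with only literals and application, and the datatype Frame below is a hypothetical simplification: in the development discussed here the frame to the right of a function carries a proof that the function is a value, and possibly empty frames are a separate type.

```agda
open import Data.Nat using (ℕ)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- A toy term language: literals and application.
data Term : Set where
  lit : ℕ → Term
  _·_ : Term → Term → Term

-- A frame is a term with one hole; □ is the empty frame.
data Frame : Set where
  □   : Frame
  □·_ : Term → Frame   -- hole in function position
  _·□ : Term → Frame   -- hole in argument position

-- Plugging a term into the hole of a frame.
_⦉_⦊ : Frame → Term → Term
□      ⦉ M ⦊ = M
(□· M) ⦉ L ⦊ = L · M
(V ·□) ⦉ M ⦊ = V · M

-- Plugging lit 0 into the function position of an application to lit 1.
_ : (□· lit 1) ⦉ lit 0 ⦊ ≡ (lit 0 · lit 1)
_ = refl
```

The empty frame makes plugging the identity, which is why a lemma stated for possibly empty frames also covers the case where there is no surrounding context.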

Roughly speaking, the Bind Lemma shows that if you are trying to prove

F ⦉ M ⦊ ⊑ᴸᴿₜ F′ ⦉ M′ ⦊

for arbitrary terms M and M′, then it suffices to prove that

F ⦉ V ⦊ ⊑ᴸᴿₜ F′ ⦉ V′ ⦊

for some values V and V′ under the assumptions

M —↠ V
M′ —↠ V′
V ⊑ᴸᴿᵥ V′

The Bind Lemma is used in all of the compatibility lemmas concerning terms that may have reducible sub-terms, i.e., application, injection, and projection.

Here is the statement of the Bind lemma with all the gory details.

LRₜ-bind : ∀{B}{B′}{c : B ⊑ B′}{A}{A′}{d : A ⊑ A′}
                 {F}{F′}{M}{M′}{i}{dir}
   → #(dir ∣ M ⊑ᴸᴿₜ M′ ⦂ d) i
   → (∀ j V V′ → j ≤ i → M —↠ V → Value V → M′ —↠ V′ → Value V′
         → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ d) j
         → #(dir ∣ (F ⦉ V ⦊) ⊑ᴸᴿₜ (F′ ⦉ V′ ⦊) ⦂ c) j)
   → #(dir ∣ (F ⦉ M ⦊) ⊑ᴸᴿₜ (F′ ⦉ M′ ⦊) ⦂ c) i

We define the following abbreviation for the (∀ j V V′ ...) premise of the Bind Lemma.

bind-premise : Dir → PEFrame → PEFrame → Term → Term → ℕ
   → ∀ {B}{B′}(c : B ⊑ B′) → ∀ {A}{A′} (d : A ⊑ A′) → Set
bind-premise dir F F′ M M′ i c d =
    (∀ j V V′ → j ≤ i → M —↠ V → Value V → M′ —↠ V′ → Value V′
     → # (dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ d) j
     → # (dir ∣ (F ⦉ V ⦊) ⊑ᴸᴿₜ (F′ ⦉ V′ ⦊) ⦂ c) j)

The premise is preserved with respect to M reducing to N and also M′ reducing to N′, with the step index decreasing by one, which we show in the following two lemmas.

LRᵥ→LRₜ-down-one-≼ : ∀{B}{B′}{c : B ⊑ B′}{A}{A′}{d : A ⊑ A′}
                      {F}{F′}{i}{M}{N}{M′}
   → M —→ N
   → (bind-premise ≼ F F′ M M′ (suc i) c d)
   → (bind-premise ≼ F F′ N M′ i c d)
LRᵥ→LRₜ-down-one-≼ {B}{B′}{c}{A}{A′}{d}{F}{F′}{i}{M}{N}{M′} M→N LRᵥ→LRₜsi
   j V V′ j≤i M→V v M′→V′ v′ 𝒱j =
   LRᵥ→LRₜsi j V V′ (≤-trans j≤i (n≤1+n i)) (M —→⟨ M→N ⟩ M→V) v M′→V′ v′ 𝒱j

LRᵥ→LRₜ-down-one-≽ : ∀{B}{B′}{c : B ⊑ B′}{A}{A′}{d : A ⊑ A′}
                       {F}{F′}{i}{M}{M′}{N′}
   → M′ —→ N′
   → (bind-premise ≽ F F′ M M′ (suc i) c d)
   → (bind-premise ≽ F F′ M N′ i c d)
LRᵥ→LRₜ-down-one-≽ {B}{B′}{c}{A}{A′}{d}{F}{F′}{i}{M}{N}{M′} M′→N′ LRᵥ→LRₜsi
   j V V′ j≤i M→V v M′→V′ v′ 𝒱j =
   LRᵥ→LRₜsi j V V′ (≤-trans j≤i (n≤1+n i)) M→V v (N —→⟨ M′→N′ ⟩ M′→V′) v′ 𝒱j

The Bind Lemma is proved by induction on the step index i. The base case is trivially true because the logical relation is always true at zero. For the inductive step, we reason separately about the two directions ≼ and ≽, and then reason by cases on the premise that M ⊑ᴸᴿₜ M′. If M or M′ take a single step to related terms, we use the induction hypothesis, applying the above lemmas to obtain the premise of the induction hypothesis. If M or M′ are values, then we use the anti-reduction lemmas. Otherwise, if M′ is blame, then F′ ⦉ blame ⦊ reduces to blame.

LRₜ-bind : ∀{B}{B′}{c : B ⊑ B′}{A}{A′}{d : A ⊑ A′}
                 {F}{F′}{M}{M′}{i}{dir}
   → #(dir ∣ M ⊑ᴸᴿₜ M′ ⦂ d) i
   → (∀ j V V′ → j ≤ i → M —↠ V → Value V → M′ —↠ V′ → Value V′
         → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ d) j
         → #(dir ∣ (F ⦉ V ⦊) ⊑ᴸᴿₜ (F′ ⦉ V′ ⦊) ⦂ c) j)
   → #(dir ∣ (F ⦉ M ⦊) ⊑ᴸᴿₜ (F′ ⦉ M′ ⦊) ⦂ c) i
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F} {F′} {M} {M′} {zero} {dir} ℰMM′sz LRᵥ→LRₜj =
    tz (dir ∣ (F ⦉ M ⦊) ⊑ᴸᴿₜ (F′ ⦉ M′ ⦊) ⦂ c)
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F}{F′}{M}{M′}{suc i}{≼} ℰMM′si LRᵥ→LRₜj
    with ⇔-to (LRₜ-suc{dir = ≼}) ℰMM′si
... | inj₁ (N , M→N , ▷ℰNM′) =
     let IH = LRₜ-bind{c = c}{d = d}{F}{F′}{N}{M′}{i}{≼} ▷ℰNM′
                (LRᵥ→LRₜ-down-one-≼{c = c}{d = d}{F}{F′}{i}{M}{N}{M′}
                     M→N LRᵥ→LRₜj) in
      ⇔-fro (LRₜ-suc{dir = ≼}) (inj₁ ((F ⦉ N ⦊) , ξ′ F refl refl M→N , IH))
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F}{F′}{M}{M′}{suc i}{≼} ℰMM′si LRᵥ→LRₜj 
    | inj₂ (inj₂ (m , (V′ , M′→V′ , v′ , 𝒱MV′))) =
      let ℰFMF′V′ = LRᵥ→LRₜj (suc i) M V′ ≤-refl (M END) m M′→V′ v′ 𝒱MV′ in
      anti-reduction-≼-R ℰFMF′V′ (ξ′* F′ M′→V′)
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F}{F′}{M}{M′}{suc i}{≼} ℰMM′si LRᵥ→LRₜj 
    | inj₂ (inj₁ M′→blame) = inj₂ (inj₁ (ξ-blame₃ F′ M′→blame refl))
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F}{F′}{M}{M′}{suc i}{≽} ℰMM′si LRᵥ→LRₜj 
    with ⇔-to (LRₜ-suc{dir = ≽}) ℰMM′si
... | inj₁ (N′ , M′→N′ , ▷ℰMN′) =
      let ℰFMFN′ : # (≽ ∣ (F ⦉ M ⦊) ⊑ᴸᴿₜ (F′ ⦉ N′ ⦊) ⦂ c) i
          ℰFMFN′ = LRₜ-bind{c = c}{d = d}{F}{F′}{M}{N′}{i}{≽} ▷ℰMN′ 
                   (LRᵥ→LRₜ-down-one-≽{c = c}{d = d}{F}{F′} M′→N′ LRᵥ→LRₜj) in
      inj₁ ((F′ ⦉ N′ ⦊) , (ξ′ F′ refl refl M′→N′) , ℰFMFN′)
... | inj₂ (inj₁ isBlame)
    with F′
... | □ = inj₂ (inj₁ isBlame)
... | ` F″ = inj₁ (blame , ξ-blame F″ , LRₜ-blame-step{dir = ≽})
LRₜ-bind {B}{B′}{c}{A}{A′}{d}{F}{F′}{M}{M′}{suc i}{≽} ℰMM′si LRᵥ→LRₜj 
    | inj₂ (inj₂ (m′ , V , M→V , v , 𝒱VM′)) =
    let xx = LRᵥ→LRₜj (suc i) V M′ ≤-refl M→V v (M′ END) m′ 𝒱VM′ in
    anti-reduction-≽-L xx (ξ′* F M→V)

Compatibility Lemmas

The end is in sight! We just have to prove nine compatibility lemmas. The first few are easy. The ones about projection are the most interesting.

A literal expression $ c is related to itself, via the LRᵥ-base-intro and LRᵥ⇒LRₜ lemmas.

compatible-literal : ∀{Γ}{c}{ι}
   → Γ ⊨ $ c ⊑ᴸᴿ $ c ⦂ ($ₜ ι , $ₜ ι , base⊑)
compatible-literal {Γ}{c}{ι} =
  (λ γ γ′ → LRᵥ⇒LRₜ LRᵥ-base-intro) , (λ γ γ′ → LRᵥ⇒LRₜ LRᵥ-base-intro)

The blame term on the right-hand side is logically related to any term on the left (less-precise) side.

compatible-blame : ∀{Γ}{A}{M}
   → map proj₁ Γ ⊢ M ⦂ A
     -------------------------------
   → Γ ⊨ M ⊑ᴸᴿ blame ⦂ (A , A , Refl⊑)
compatible-blame{Γ}{A}{M} ⊢M = (λ γ γ′ → LRₜ-blame) , (λ γ γ′ → LRₜ-blame)

Next we prove the compatibility lemmas for variables. For that we need to know that given two related substitutions γ ⊑ᴸᴿ γ′, applying them to the same variable yields related values: γ x ⊑ᴸᴿᵥ γ′ x.

lookup-⊑ᴸᴿ : ∀{dir} (Γ : List Prec) → (γ γ′ : Subst)
  → ∀ {A}{A′}{A⊑A′}{x} → Γ ∋ x ⦂ (A , A′ , A⊑A′)
  → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′) ⊢ᵒ dir ∣ γ x ⊑ᴸᴿᵥ γ′ x ⦂ A⊑A′
lookup-⊑ᴸᴿ {dir} (.(A , A′ , A⊑A′) ∷ Γ) γ γ′ {A} {A′} {A⊑A′} {zero} refl = Zᵒ
lookup-⊑ᴸᴿ {dir} (B ∷ Γ) γ γ′ {A} {A′} {A⊑A′} {suc x} ∋x =
   Sᵒ (lookup-⊑ᴸᴿ Γ (λ z → γ (suc z)) (λ z → γ′ (suc z)) ∋x)

We then use LRᵥ⇒LRₜ to show that γ x ⊑ᴸᴿₜ γ′ x. (The sub-var lemma just says that ⟪ γ ⟫ (` x) ≡ γ x.)

compatibility-var : ∀ {Γ A A′ A⊑A′ x}
  → Γ ∋ x ⦂ (A , A′ , A⊑A′)
    -------------------------------
  → Γ ⊨ ` x ⊑ᴸᴿ ` x ⦂ (A , A′ , A⊑A′)
compatibility-var {Γ}{A}{A′}{A⊑A′}{x} ∋x = LT , GT
  where
  LT : Γ ∣ ≼ ⊨ ` x ⊑ᴸᴿ ` x ⦂ (A , A′ , A⊑A′)
  LT γ γ′ rewrite sub-var γ x | sub-var γ′ x = LRᵥ⇒LRₜ (lookup-⊑ᴸᴿ Γ γ γ′ ∋x)

  GT : Γ ∣ ≽ ⊨ ` x ⊑ᴸᴿ ` x ⦂ (A , A′ , A⊑A′)
  GT γ γ′ rewrite sub-var γ x | sub-var γ′ x = LRᵥ⇒LRₜ (lookup-⊑ᴸᴿ Γ γ γ′ ∋x)

The compatibility lemma for lambda is easy but important. Roughly speaking, it takes the premise N ⊑ᴸᴿ N′ and stores it in the logical relation for the lambda values, ƛ N ⊑ᴸᴿₜ ƛ N′, which is needed to prove the compatibility lemma for function application.

compatible-lambda : ∀{Γ : List Prec}{A}{B}{C}{D}{N N′ : Term}
     {c : A ⊑ C}{d : B ⊑ D}
   → ((A , C , c) ∷ Γ) ⊨ N ⊑ᴸᴿ N′ ⦂ (B , D , d)
     ------------------------------------------------
   → Γ ⊨ (ƛ N) ⊑ᴸᴿ (ƛ N′) ⦂ (A ⇒ B , C ⇒ D , fun⊑ c d)
compatible-lambda{Γ}{A}{B}{C}{D}{N}{N′}{c}{d} ⊨N⊑N′ =
  (λ γ γ′ → ⊢ℰλNλN′) , (λ γ γ′ → ⊢ℰλNλN′)
 where
 ⊢ℰλNλN′ : ∀{dir}{γ}{γ′} → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′)
            ⊢ᵒ (dir ∣ (⟪ γ ⟫ (ƛ N)) ⊑ᴸᴿₜ (⟪ γ′ ⟫ (ƛ N′)) ⦂ fun⊑ c d)
 ⊢ℰλNλN′ {dir}{γ}{γ′} =
     LRᵥ⇒LRₜ (substᵒ (≡ᵒ-sym LRᵥ-fun)
          (Λᵒ[ W ] Λᵒ[ W′ ] →ᵒI {P = ▷ᵒ (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ c)}
            (appᵒ (Sᵒ (▷→ (monoᵒ (→ᵒI ((proj dir N N′ ⊨N⊑N′)
                                            (W • γ) (W′ • γ′))))))
                  Zᵒ)))

The compatibility lemma for function application shows that two applications are logically related

L · M ⊑ᴸᴿ L′ · M′

if their operator and operand terms are logically related

L ⊑ᴸᴿ L′
M ⊑ᴸᴿ M′

The proof starts with two uses of the Bind Lemma, after which it remains to prove

V · W ⊑ᴸᴿₜ V′ · W′

for some V, W, V′, and W′ where

L —↠ V, L′ —↠ V′, V ⊑ᴸᴿᵥ V′
M —↠ W, M′ —↠ W′, W ⊑ᴸᴿᵥ W′

We apply the elimination lemma for function types, LRᵥ-fun-elim-step, to V ⊑ᴸᴿᵥ V′, so V and V′ are related lambda expressions:

ƛ N ⊑ᴸᴿᵥ ƛ N′

Thanks to the definition of ⊑ᴸᴿᵥ, we therefore know that

N [ W ] ⊑ᴸᴿₜ N′ [ W′ ]

Of course, via β reduction

(ƛ N) · W —→ N [ W ]
(ƛ N′) · W′ —→ N′ [ W′ ]

so we can apply anti-reduction to conclude that

(ƛ N) · W ⊑ᴸᴿₜ (ƛ N′) · W′

Now here’s the proof in Agda.

compatible-app : ∀{Γ}{A A′ B B′}{c : A ⊑ A′}{d : B ⊑ B′}{L L′ M M′}
   → Γ ⊨ L ⊑ᴸᴿ L′ ⦂ (A ⇒ B , A′ ⇒ B′ , fun⊑ c d)
   → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (A , A′ , c)
     ----------------------------------
   → Γ ⊨ L · M ⊑ᴸᴿ L′ · M′ ⦂ (B , B′ , d)
compatible-app {Γ}{A}{A′}{B}{B′}{c}{d}{L}{L′}{M}{M′} ⊨L⊑L′ ⊨M⊑M′ =
 (λ γ γ′ → ⊢ℰLM⊑LM′) , λ γ γ′ → ⊢ℰLM⊑LM′
 where
 ⊢ℰLM⊑LM′ : ∀{dir}{γ}{γ′} → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′)
                             ⊢ᵒ dir ∣ ⟪ γ ⟫ (L · M) ⊑ᴸᴿₜ ⟪ γ′ ⟫ (L′ · M′) ⦂ d
 ⊢ℰLM⊑LM′ {dir}{γ}{γ′} = ⊢ᵒ-intro λ n 𝒫n →
  LRₜ-bind{c = d}{d = fun⊑ c d}
               {F = ` (□· (⟪ γ ⟫ M))}{F′ = ` (□· (⟪ γ′ ⟫ M′))}
  (⊢ᵒ-elim ((proj dir L L′ ⊨L⊑L′) γ γ′) n 𝒫n)
  λ j V V′ j≤n L→V v L′→V′ v′ 𝒱VV′j →
  LRₜ-bind{c = d}{d = c}{F = ` (v ·□)}{F′ = ` (v′ ·□)}
   (⊢ᵒ-elim ((proj dir M M′ ⊨M⊑M′) γ γ′) j
   (down (Πᵒ (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′)) n 𝒫n j j≤n))
   λ i W W′ i≤j M→W w M′→W′ w′ 𝒱WW′i →
     Goal{v = v}{v′}{w = w}{w′} i≤j 𝒱VV′j 𝒱WW′i
   where
   Goal : ∀{V}{V′}{v : Value V}{v′ : Value V′}
           {W}{W′}{w : Value W}{w′ : Value W′}{i}{j}
     → i ≤ j
     → # (dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ fun⊑ c d) j
     → # (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ c) i
     → # (dir ∣ ((` (v ·□)) ⦉ W ⦊) ⊑ᴸᴿₜ ((` (v′ ·□)) ⦉ W′ ⦊) ⦂ d) i
   Goal {V} {V′} {v} {v′} {W} {W′} {w}{w′}{zero} {j} i≤j 𝒱VV′j 𝒱WW′i =
     tz (dir ∣ (value v · W) ⊑ᴸᴿₜ (value v′ · W′) ⦂ d)
   Goal {V} {V′} {v} {v′} {W} {W′} {w}{w′}{suc i} {suc j}
       (s≤s i≤j) 𝒱VV′sj 𝒱WW′si
       with LRᵥ-fun-elim-step{A}{B}{A′}{B′}{c}{d}{V}{V′}{dir}{j}{i} 𝒱VV′sj i≤j
   ... | N , N′ , refl , refl , body =
       let 𝒱WW′i = down (dir ∣ W ⊑ᴸᴿᵥ W′ ⦂ c)(suc i)𝒱WW′si i (n≤1+n i) in
       let ℰNWNW′i = body{W}{W′} 𝒱WW′i in
       anti-reduction{c = d}{i = i}{dir = dir} ℰNWNW′i (β w) (β w′)

We have four more compatibility lemmas to prove, regarding injections and projections on the left and right-hand side.

For an injection on the left, we apply the Bind Lemma, so it remains to prove that

V ⟨ G !⟩ ⊑ᴸᴿₜ V′

for some values V and V′ where

M —↠ V, M′ —↠ V′, V ⊑ᴸᴿᵥ V′

We apply LRᵥ-inject-L-intro to obtain

V ⟨ G !⟩ ⊑ᴸᴿᵥ V′

and then conclude via LRᵥ⇒LRₜ-step.

compatible-inj-L : ∀{Γ}{G A′}{c : gnd⇒ty G ⊑ A′}{M M′}
   → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (gnd⇒ty G , A′ , c)
     ---------------------------------------------
   → Γ ⊨ M ⟨ G !⟩ ⊑ᴸᴿ M′ ⦂ (★ , A′ , unk⊑{G}{A′} c)
compatible-inj-L{Γ}{G}{A′}{c}{M}{M′} ⊨M⊑M′ =
  (λ γ γ′ → ℰMGM′) , (λ γ γ′ → ℰMGM′)
  where
  ℰMGM′ : ∀ {γ}{γ′}{dir}
   → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′) ⊢ᵒ (dir ∣ (⟪ γ ⟫ M ⟨ G !⟩) ⊑ᴸᴿₜ (⟪ γ′ ⟫ M′) ⦂ unk⊑ c)
  ℰMGM′{γ}{γ′}{dir} = ⊢ᵒ-intro λ n 𝒫n →
   LRₜ-bind{c = unk⊑ c}{d = c}{F = ` (□⟨ G !⟩)}{F′ = □}
              {⟪ γ ⟫ M}{⟪ γ′ ⟫ M′}{n}{dir}
   (⊢ᵒ-elim ((proj dir M M′ ⊨M⊑M′) γ γ′) n 𝒫n)
   λ j V V′ j≤n M→V v M′→V′ v′ 𝒱VV′j →
   LRᵥ⇒LRₜ-step{★}{A′}{unk⊑ c}{V ⟨ G !⟩}{V′}{dir}{j}
   (LRᵥ-inject-L-intro{G}{A′}{c}{V}{V′}{dir}{j} 𝒱VV′j)

For an injection on the right, the proof is similar but uses the LRᵥ-inject-R-intro lemma.

compatible-inj-R : ∀{Γ}{G}{c : ★ ⊑ gnd⇒ty G }{M M′}
   → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (★ , gnd⇒ty G , c)
   → Γ ⊨ M ⊑ᴸᴿ M′ ⟨ G !⟩ ⦂ (★ , ★ , unk⊑unk)
compatible-inj-R{Γ}{G}{c}{M}{M′} ⊨M⊑M′
    with unk⊑gnd-inv c
... | d , refl = (λ γ γ′ → ℰMM′G) , λ γ γ′ → ℰMM′G
  where
  ℰMM′G : ∀{γ}{γ′}{dir}
    → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′) ⊢ᵒ dir ∣ (⟪ γ ⟫ M) ⊑ᴸᴿₜ (⟪ γ′ ⟫ M′ ⟨ G !⟩) ⦂ unk⊑unk
  ℰMM′G {γ}{γ′}{dir} = ⊢ᵒ-intro λ n 𝒫n →
   LRₜ-bind{c = unk⊑unk}{d = unk⊑ d}{F = □}{F′ = ` (□⟨ G !⟩)}
              {⟪ γ ⟫ M}{⟪ γ′ ⟫ M′}{n}{dir}
   (⊢ᵒ-elim ((proj dir M M′ ⊨M⊑M′) γ γ′) n 𝒫n)
   λ j V V′ j≤n M→V v M′→V′ v′ 𝒱VV′j →
   LRᵥ⇒LRₜ-step{★}{★}{unk⊑unk}{V}{V′ ⟨ G !⟩}{dir}{j}
   (LRᵥ-inject-R-intro{G}{unk⊑ d}{V}{V′}{j} 𝒱VV′j )

For projection on the left, we again start with an application of the Bind Lemma. So we need to show that

V ⟨ H ?⟩ ⊑ᴸᴿₜ V′

for some values V and V′ where

M —↠ V, M′ —↠ V′, V ⊑ᴸᴿᵥ V′

The proof is by cases on the step index j. The case for zero is trivially true because the logical relation is always true at zero. For the case suc j, we need to prove

#(V ⟨ H ?⟩ ⊑ᴸᴿₜ V′) (suc j)

We proceed by cases on the two directions ≼ and ≽.

For the ≼ case, we use lemma LRᵥ-dyn-any-elim-≼ with #(V ⊑ᴸᴿᵥ V′) (suc j) to obtain

V ≡ V₁ ⟨ H !⟩
#(V₁ ⊑ᴸᴿᵥ V′) j

We use LRᵥ⇒LRₜ-step to obtain

#(V₁ ⊑ᴸᴿₜ V′) j

and then because

V₁ ⟨ H !⟩ ⟨ H ?⟩ —→ V₁

the anti-reduction-≼-L-one lemma allows us to conclude that

#(V₁ ⟨ H !⟩ ⟨ H ?⟩ ⊑ᴸᴿₜ V′) (suc j)

For the ≽ case, we use lemma LRᵥ-dyn-any-elim-≽ with #(V ⊑ᴸᴿᵥ V′) (suc j) to obtain

V ≡ V₁ ⟨ H !⟩
#(V₁ ⊑ᴸᴿᵥ V′) (suc j)

(Recall that in the definition of ⊑ᴸᴿᵥ for unk⊑ and ≽, we chose to relate the underlying value now, i.e., at suc j.) By definition, to prove #(V₁ ⟨ H !⟩ ⟨ H ?⟩ ⊑ᴸᴿₜ V′) (suc j), it suffices to show that the left-hand side reduces to a related value at suc j (because the right-hand side is a value), which we have already proved.

compatible-proj-L : ∀{Γ}{H}{A′}{c : gnd⇒ty H ⊑ A′}{M}{M′}
   → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (★ , A′ ,  unk⊑ c)
   → Γ ⊨ M ⟨ H ?⟩ ⊑ᴸᴿ M′ ⦂ (gnd⇒ty H , A′ , c)
compatible-proj-L {Γ}{H}{A′}{c}{M}{M′} ⊨M⊑M′ =
  (λ γ γ′ → ℰMHM′) , λ γ γ′ → ℰMHM′
  where
  ℰMHM′ : ∀{γ}{γ′}{dir} → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′)
       ⊢ᵒ dir ∣ (⟪ γ ⟫ M ⟨ H ?⟩) ⊑ᴸᴿₜ (⟪ γ′ ⟫ M′) ⦂ c
  ℰMHM′ {γ}{γ′}{dir} = ⊢ᵒ-intro λ n 𝒫n →
   LRₜ-bind{c = c}{d = unk⊑ c}{F = ` (□⟨ H ?⟩)}{F′ = □}
              {⟪ γ ⟫ M}{⟪ γ′ ⟫ M′}{n}{dir}
   (⊢ᵒ-elim ((proj dir M M′ ⊨M⊑M′) γ γ′) n 𝒫n)
   λ j V V′ j≤n M→V v M′→V′ v′ 𝒱VV′j → Goal{j}{V}{V′}{dir} 𝒱VV′j 
   where
   Goal : ∀{j}{V}{V′}{dir}
       → #(dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ unk⊑ c) j
       → #(dir ∣ (V ⟨ H ?⟩) ⊑ᴸᴿₜ V′ ⦂ c) j
   Goal {zero} {V} {V′}{dir} 𝒱VV′j =
       tz (dir ∣ (V ⟨ H ?⟩) ⊑ᴸᴿₜ V′ ⦂ c)
   Goal {suc j} {V} {V′}{≼} 𝒱VV′sj
       with LRᵥ-dyn-any-elim-≼{V}{V′}{j}{H}{A′}{c} 𝒱VV′sj
   ... | V₁ , refl , v₁ , v′ , 𝒱V₁V′j =
       let V₁HH→V₁ = collapse{H}{V = V₁} v₁ refl in
       let ℰV₁V′j = LRᵥ⇒LRₜ-step{gnd⇒ty H}{A′}{c}{V₁}{V′}{≼}{j} 𝒱V₁V′j in
       anti-reduction-≼-L-one ℰV₁V′j V₁HH→V₁
   Goal {suc j} {V} {V′}{≽} 𝒱VV′sj
       with LRᵥ-dyn-any-elim-≽{V}{V′}{j}{H}{A′}{c} 𝒱VV′sj
   ... | V₁ , refl , v₁ , v′ , 𝒱V₁V′sj =
       let V₁HH→V₁ = collapse{H}{V = V₁} v₁ refl in
       inj₂ (inj₂ (v′ , V₁ , unit V₁HH→V₁ , v₁ , 𝒱V₁V′sj))

The last compatibility lemma is for projection on the right. As usual we start with the Bind Lemma, so our goal is to prove that

V ⊑ᴸᴿₜ V′ ⟨ H ?⟩

for some values V and V′ where

M —↠ V, M′ —↠ V′, V ⊑ᴸᴿᵥ V′

The proof is by cases on the step index j. The case for zero is trivially true because the logical relation is always true at zero. In the case for suc j, we need to prove

#(V ⊑ᴸᴿₜ V′ ⟨ H ?⟩) (suc j)

Note that V and V′ are both of type ★, so by definition #(V ⊑ᴸᴿᵥ V′) (suc j) gives us

V ≡ V₁ ⟨ G !⟩
V′ ≡ V₁′ ⟨ G !⟩
#(V₁ ⊑ᴸᴿᵥ V₁′) j

We proceed by cases on whether or not G ≡ H.

Suppose G ≢ H. Then we have

V′₁ ⟨ G !⟩⟨ H ?⟩ —→ blame

We proceed by cases on the direction. For the ≼ direction we can immediately conclude by the definition of ⊑ᴸᴿₜ because the right-hand side reduces to blame.

#(V₁ ⟨ G !⟩ ⊑ᴸᴿₜ V′₁ ⟨ G !⟩⟨ H ?⟩) (suc j)

For the ≽ direction, we apply anti-reduction-≽-R-one, so it suffices to show

V₁ ⟨ G !⟩ ⊑ᴸᴿₜ blame

which we obtain by LRₜ-blame-step.

Next suppose G ≡ H. Then we have

V′₁ ⟨ G !⟩⟨ H ?⟩ —→ V′₁

For the ≼ direction, since we have a value on the left-hand side, we need the right-hand side to reduce to a related value. So it remains to show that

#(V₁⟨ G !⟩ ⊑ᴸᴿᵥ V′₁) (suc j)

which we have from #(V₁ ⊑ᴸᴿᵥ V₁′) j and the definition of ⊑ᴸᴿᵥ for unk⊑ and ≼. (Recall that we chose to use the later operator in that case of ⊑ᴸᴿᵥ.)

For the ≽ direction, we apply anti-reduction-≽-R-one, so it remains to prove that

#(V₁⟨ G !⟩ ⊑ᴸᴿₜ V′₁) j

Next we apply LRᵥ⇒LRₜ-step, so our goal reduces to

#(V₁⟨ G !⟩ ⊑ᴸᴿᵥ V′₁) j

which we prove by LRᵥ-inject-L-intro-≽ using #(V₁ ⊑ᴸᴿᵥ V₁′) j.

compatible-proj-R : ∀{Γ}{H}{c : ★ ⊑ gnd⇒ty H}{M}{M′}
   → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (★ , ★ , unk⊑unk)
   → Γ ⊨ M ⊑ᴸᴿ M′ ⟨ H ?⟩ ⦂ (★ , gnd⇒ty H , c)
compatible-proj-R {Γ}{H}{c}{M}{M′} ⊨M⊑M′
    with unk⊑gnd-inv c
... | d , refl = (λ γ γ′ → ℰMM′H) , λ γ γ′ → ℰMM′H
    where
    ℰMM′H : ∀{γ}{γ′}{dir} → (Γ ∣ dir ⊨ γ ⊑ᴸᴿ γ′)
             ⊢ᵒ dir ∣ (⟪ γ ⟫ M) ⊑ᴸᴿₜ (⟪ γ′ ⟫ M′ ⟨ H ?⟩) ⦂ unk⊑ d
    ℰMM′H {γ}{γ′}{dir} = ⊢ᵒ-intro λ n 𝒫n →
     LRₜ-bind{c = c}{d = unk⊑unk}{F = □}{F′ = ` □⟨ H ?⟩}
                {⟪ γ ⟫ M}{⟪ γ′ ⟫ M′}{n}{dir}
     (⊢ᵒ-elim ((proj dir M M′ ⊨M⊑M′) γ γ′) n 𝒫n)
     λ j V V′ j≤n M→V v M′→V′ v′ 𝒱VV′j →
     Goal {j}{V}{V′}{dir} 𝒱VV′j 
     where
     Goal : ∀{j}{V}{V′}{dir}
        → # (dir ∣ V ⊑ᴸᴿᵥ V′ ⦂ unk⊑unk) j
        → # (dir ∣ V ⊑ᴸᴿₜ (V′ ⟨ H ?⟩) ⦂ unk⊑ d) j
     Goal {zero} {V} {V′}{dir} 𝒱VV′j =
         tz (dir ∣ V ⊑ᴸᴿₜ (V′ ⟨ H ?⟩) ⦂ unk⊑ d)
     Goal {suc j} {V₁ ⟨ G !⟩} {V′₁ ⟨ H₂ !⟩}{dir} 𝒱VV′sj
         with G ≡ᵍ H₂ | 𝒱VV′sj
     ... | no neq | ()
     ... | yes refl | v₁ , v′ , 𝒱V₁V′₁j
         with G ≡ᵍ G
     ... | no neq = ⊥-elim (neq refl)
     ... | yes refl
         with G ≡ᵍ H
         {-------- Case G ≢ H ---------}
     ... | no neq
         with dir
         {-------- Subcase ≼ ---------}
     ... | ≼ = inj₂ (inj₁ (unit (collide v′ neq refl)))
         {-------- Subcase ≽ ---------}
     ... | ≽ = anti-reduction-≽-R-one (LRₜ-blame-step{★}{gnd⇒ty H}{unk⊑ d}{≽})
                                      (collide v′ neq refl)
     Goal {suc j} {V₁ ⟨ G !⟩} {V′₁ ⟨ H₂ !⟩}{dir} 𝒱VV′sj
         | yes refl | v₁ , v′ , 𝒱V₁V′₁j | yes refl
         {-------- Case G ≡ H ---------}
         | yes refl 
         with dir
         {-------- Subcase ≼ ---------}
     ... | ≼
         with G ≡ᵍ G
     ... | no neq = ⊥-elim (neq refl)
     ... | yes refl 
         with gnd-prec-unique d Refl⊑
     ... | refl =
           let V₁G⊑V′₁sj = v₁ , v′ , 𝒱V₁V′₁j in
           inj₂ (inj₂ (v₁ 〈 G 〉 ,
                       (V′₁ , unit (collapse v′ refl) , v′ , V₁G⊑V′₁sj)))
     Goal {suc j} {V₁ ⟨ G !⟩} {V′₁ ⟨ H₂ !⟩}{dir} 𝒱VV′sj
         | yes refl | v₁ , v′ , 𝒱V₁V′₁j | yes refl
         | yes refl 
         {-------- Subcase ≽ ---------}
         | ≽
         with gnd-prec-unique d Refl⊑
     ... | refl =
         let 𝒱VGV′j = LRᵥ-inject-L-intro-≽ {G}{gnd⇒ty G}{d} 𝒱V₁V′₁j in
         let ℰVGV′j = LRᵥ⇒LRₜ-step{V = V₁ ⟨ G !⟩}{V′₁}{≽} 𝒱VGV′j in
         anti-reduction-≽-R-one ℰVGV′j (collapse v′ refl)

Proof of the Fundamental Lemma

With the compatibility lemmas finished, the difficulty is behind us. We prove the Fundamental Lemma by induction on term precision, using the appropriate compatibility lemma for each case.

fundamental : ∀ {Γ}{A}{A′}{A⊑A′ : A ⊑ A′} → (M M′ : Term)
  → Γ ⊩ M ⊑ M′ ⦂ A⊑A′
    ----------------------------
  → Γ ⊨ M ⊑ᴸᴿ M′ ⦂ (A , A′ , A⊑A′)
fundamental {Γ} {A} {A′} {A⊑A′} .(` _) .(` _) (⊑-var ∋x) =
   compatibility-var ∋x
fundamental {Γ} {_} {_} {base⊑} ($ c) ($ c) ⊑-lit =
   compatible-literal
fundamental {Γ} {A} {A′} {A⊑A′} (L · M) (L′ · M′) (⊑-app ⊢L⊑L′ ⊢M⊑M′) =
    compatible-app{L = L}{L′}{M}{M′} (fundamental L L′ ⊢L⊑L′)
                                     (fundamental M M′ ⊢M⊑M′)
fundamental {Γ} {.(_ ⇒ _)} {.(_ ⇒ _)} {.(fun⊑ _ _)} (ƛ N)(ƛ N′) (⊑-lam ⊢N⊑N′) =
    compatible-lambda{N = N}{N′} (fundamental N N′ ⊢N⊑N′)
fundamental {Γ} {★} {A′} {unk⊑ c} (M ⟨ G !⟩) M′ (⊑-inj-L ⊢M⊑M′) =
    compatible-inj-L{G =  G}{M = M}{M′} (fundamental M M′ ⊢M⊑M′)
fundamental {Γ} {★} {★} {.unk⊑unk} M (M′ ⟨ G !⟩) (⊑-inj-R ⊢M⊑M′) =
    compatible-inj-R{Γ}{G = G}{M = M}{M′} (fundamental M M′ ⊢M⊑M′)
fundamental {Γ} {_} {A′} {A⊑A′} (M ⟨ H ?⟩) M′ (⊑-proj-L ⊢M⊑M′) =
    compatible-proj-L{Γ}{H}{A′}{M = M}{M′} (fundamental M M′ ⊢M⊑M′)
fundamental {Γ} {A} {.(gnd⇒ty _)} {A⊑A′} M (M′ ⟨ H′ ?⟩) (⊑-proj-R ⊢M⊑M′) =
    compatible-proj-R{M = M}{M′} (fundamental M M′ ⊢M⊑M′)
fundamental {Γ} {A} {.A} {.Refl⊑} M .blame (⊑-blame ⊢M∶A) =
   compatible-blame ⊢M∶A

Proof of the Gradual Guarantee

The gradual guarantee is proved by putting together the fundamental lemma with the LR⇒GG lemma.

gradual-guarantee : ∀ {A}{A′}{A⊑A′ : A ⊑ A′} → (M M′ : Term)
   → [] ⊩ M ⊑ M′ ⦂ A⊑A′
    ---------------------------
   → (M′ ⇓ → M ⇓)
   × (M′ ⇑ → M ⇑)
   × (M ⇓ → M′ ⇓ ⊎ M′ —↠ blame)
   × (M ⇑ → M′ ⇑⊎blame)
   × (M —↠ blame → M′ —↠ blame)
gradual-guarantee {A}{A′}{A⊑A′} M M′ M⊑M′ =
  let (⊨≼M⊑ᴸᴿM′ , ⊨≽M⊑ᴸᴿM′) = fundamental M M′ M⊑M′ in
  LR⇒GG (⊨≼M⊑ᴸᴿM′ id id ,ᵒ ⊨≽M⊑ᴸᴿM′ id id)

Type Safety in 10 Easy, 4 Medium, and 1 Hard Lemma using Step-indexed Logical Relations

2023-04-17T09:15:00.013-07:00

```
{-# OPTIONS --rewriting #-}
module rewriting.examples.BlogTypeSafety10Easy4Med1Hard where

open import Data.Bool using (true; false) renaming (Bool to 𝔹)
open import Data.Empty using (⊥; ⊥-elim)
open import Data.Nat
open import Data.Nat.Properties using (≤-refl)
open import Data.List using (List; []; _∷_)
open import Data.Product using (_,_;_×_; proj₁; proj₂; Σ-syntax; ∃-syntax)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Data.Unit using (⊤; tt)
open import Data.Unit.Polymorphic renaming (⊤ to topᵖ; tt to ttᵖ)
open import Relation.Binary.PropositionalEquality as Eq
  using (_≡_; _≢_; refl; sym; cong; subst; trans)
open import Relation.Nullary using (¬_; yes; no)
```

Ok, so logical relations are overkill for proving type safety. The
proof technique is better suited to proving more interesting
properties such as parametricity, program equivalence, and the gradual
guarantee.  Nevertheless, understanding a proof of type safety via
logical relations is a helpful stepping stone to understanding these
more complex use cases, especially when the logical relations employ
more advanced techniques, such as step indexing.  In this blog post I
prove type safety of a cast calculus (an intermediate language of the
gradually typed lambda calculus).  The proof is in Agda and the proof
uses step-indexed logical relations because the presence of the
unknown type (a.k.a. the dynamic type) prevents the use of logical relations
that are only indexed by types. To reduce the clutter of reasoning
about step indexing, we conduct the proof using a temporal logic, in
the spirit of the LSLR logic of Dreyer, Ahmed, and Birkedal (LMCS 2011),
that we embed in Agda.

This is a literate Agda file, so most of the details are here, but it
imports several modules whose root on GitHub is here:

    https://github.com/jsiek/abstract-binding-trees/tree/master/src

This post is based on work with Philip Wadler and Peter Thiemann.

## Review of the Cast Calculus

```
open import Var
open import rewriting.examples.Cast
```

We review the syntax and reduction rules of this cast calculus.  Just
like the lambda calculus, types include base types (Booleans and
natural numbers), and function types. To support gradual typing, we
include the unknown type ★.

    ι ::= 𝔹 | ℕ
    A,B,C,G,H ::= ι | A ⇒ B | ★

The ground types are 

    G,H ::= ι | ★⇒★

Just like the lambda calculus, there are variables (de Bruijn
indices), lambdas, and application. We throw in literals
(Booleans and natural numbers).  Also, to support gradual typing, we
include a term `M ⟨ G !⟩` for injecting from a ground type `G` to the
unknown type, and a term `M ⟨ H ?⟩` for projecting from the unknown
type back out to a ground type.  Finally, we include the `blame` term
to represent trapped runtime errors.

    L,M,N ::= ` x | ƛ N | L · M | $ k | M ⟨ G !⟩ | M ⟨ H ?⟩ | blame

This cast calculus is somewhat unusual in that it only includes
injections and projections but not the other kinds of casts that one
typically has in a cast calculus, such as a cast from one function type
`★ ⇒ ℕ` to another function type `ℕ ⇒ ℕ`. That is OK because those
other casts can still be expressed in this cast calculus.

The values include lambdas, literals, and injected values.

    V,W ::= ƛ N | $ c | V ⟨ G !⟩

The reduction rules make use of frames, which are defined as follows.

    F ::= □· M | V ·□ | □⟨ G !⟩ | □⟨ H ?⟩

The operation `F ⟦ M ⟧` plugs a term into a frame.

The reduction rules of the cast calculus are as follows:

    (ξ)        If M —→ N, then F ⟦ M ⟧ —→ F ⟦ N ⟧
    (ξ-blame)  F ⟦ blame ⟧ —→ blame
    (β)        (ƛ N) · W —→ N [ W ]
    (collapse) V ⟨ G !⟩ ⟨ G ?⟩ —→ V
    (collide)  If G ≢ H, then V ⟨ G !⟩ ⟨ H ?⟩ —→ blame.


## A First Attempt at a Logical Relation for Type Safety

The following is a first attempt to define a logical relation for type
safety for the cast calculus. The predicate ℰ expresses the semantic
notion of a term being well typed at a given type A. Here "semantic"
means "runtime behavior". We define that a term M is semantically well
typed at type A if it satisfies "progress" and "preservation". The
progress part says that M is either (1) a semantic value at type `A`,
(2) reducible, or (3) an error. The preservation part says that if M
reduces to N, then N is also semantically well typed at A.

    ℰ⟦_⟧ : (A : Type) → Term → Set
    ℰ⟦ A ⟧ M = (𝒱⟦ A ⟧ M ⊎ reducible M ⊎ Blame M)
                × (∀ N → (M —→ N) → ℰ⟦ A ⟧ N)

The predicate 𝒱 expresses the semantic notion of a value being well
typed at some type A. For a base type `ι` (𝔹 or ℕ), the value must be
the appropriate kind of literal (Boolean or natural number). For a
function type `A ⇒ B`, the value must be a lambda expression `ƛ N`,
and furthermore, substituting any value `W` that is semantically well
typed at `A` into the body `N` produces a term that is semantically
well typed at `B`. For the unknown type `★`, the value must be
an injection of a value `V` from some ground type `G`, and `V`
must be semantically well typed at `G`.

    𝒱⟦_⟧ : (A : Type) → Term → Set
    𝒱⟦ ι ⟧ ($ c) = ι ≡ typeof c
    𝒱⟦ A ⇒ B ⟧ (ƛ N) = ∀ W → 𝒱⟦ A ⟧ W → ℰ⟦ B ⟧ (N [ W ])
    𝒱⟦ ★ ⟧ (V ⟨ G !⟩) = Value V × 𝒱⟦ gnd⇒ty G ⟧ V
    𝒱⟦ _ ⟧ _ = ⊥

Note that the definitions of ℰ and 𝒱 are recursive. Unfortunately they
are not proper definitions of (total) functions because there is no
guarantee of their termination. For simple languages, like the Simply
Typed Lambda Calculus, 𝒱 can be defined by recursion on the type
`A`. However, here we have the unknown type `★` and the recursion in that
clause invokes `𝒱⟦ gnd⇒ty G ⟧ V`, but `gnd⇒ty G` is
not a structural part of `★` (nothing is).
(The definition of ℰ above is also problematic, but one could
reformulate ℰ to remove the recursion in ℰ.)

## An Explicitly Step-indexed Logical Relation for Type Safety

We can force the definitions of ℰ and 𝒱 to terminate using
step-indexing (a.k.a. the "gasoline" technique), which was first applied
to logical relations by Appel and McAllester (TOPLAS 2001). We add a
parameter k (a natural number) to ℰ and 𝒱, and decrement k on each
recursive call. When k is zero, ℰ and 𝒱 accept all terms. Thus, the
meaning of `ℰ⟦ A ⟧ M k` is that term `M` is guaranteed to behave
according to type `A` for `k` reduction steps, but after that there
are no guarantees.

    ℰ⟦_⟧ : (A : Type) → Term → ℕ → Set
    ℰ⟦ A ⟧ M 0 = ⊤
    ℰ⟦ A ⟧ M (suc k) = (𝒱⟦ A ⟧ M k ⊎ reducible M ⊎ Blame M)
                        × (∀ N → (M —→ N) → ℰ⟦ A ⟧ N k)

    𝒱⟦_⟧ : (A : Type) → Term → ℕ → Set
    𝒱⟦ A ⟧ M 0 = ⊤
    𝒱⟦ ι ⟧ ($ ι′ c) (suc k) = ι ≡ ι′
    𝒱⟦ A ⇒ B ⟧ (ƛ N) (suc k) = ∀ W → 𝒱⟦ A ⟧ W k → ℰ⟦ B ⟧ (N [ W ]) k
    𝒱⟦ ★ ⟧ (V ⟨ G !⟩) (suc k) = Value V × 𝒱⟦ gnd⇒ty G ⟧ V k
    𝒱⟦ _ ⟧ _ (suc k) = ⊥

We now have proper definitions of ℰ and 𝒱 but proving theorems about
these definitions involves a fair bit of reasoning about the step
indices, which is tedious, especially in Agda because its support for
automating proofs about arithmetic is cumbersome to use.  To
streamline the definitions and proofs that involve step indexing,
Dreyer, Ahmed, and Birkedal (2011) propose the use of a temporal logic
that hides the step indexing. Next we discuss the embedding of such a
logic in Agda.


## Step-indexed Logic

```
open import rewriting.examples.StepIndexedLogic2
```

Our Step-indexed Logic (SIL) includes first-order logic (i.e., a logic
with "and", "or", "implies", "for all", etc.). To distinguish its
connectives from Agda's, we add a superscript "o". So "and" is written
`×ᵒ`, "implies" is written `→ᵒ`, and so on.  SIL also includes a
notion of time in which there is a clock counting down. The logic is
designed in such a way that if a formula `P` is true at some time then
`P` stays true in the future (at lower counts). So formulas are
downward closed.  When the clock reaches zero, every formula becomes
true.  Furthermore, the logic includes a "later" operator, written `▷ᵒ
P`, meaning that `P` is true one clock tick in the future. When we use
SIL to reason about the cast calculus, one clock tick will correspond
to one reduction step.

Just as `Set` is the type of true/false formulas in Agda, `Setᵒ` is
the type of true/false formulas in SIL. It is a record that bundles
the formula itself, represented with a function of type `ℕ → Set`,
with proofs that the formula is downward closed and true at zero.

    record Setᵒ : Set₁ where
      field
        # : ℕ → Set
        down : downClosed #
        tz : # 0                -- tz short for true at zero
    open Setᵒ public

For example, the "false" proposition is false at every time except zero.

    ⊥ᵒ : Setᵒ
    ⊥ᵒ = record { # = λ { zero → ⊤ ; (suc k) → ⊥ }
                ; down = ... ; tz = ... }
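
The `down` and `tz` fields are elided above, and the definition of
`downClosed` is not shown in this post. As a rough sketch, assuming
`downClosed` means that truth at `n` implies truth at every `k ≤ n`,
the "false" proposition could be completed as follows in a standalone
module (not part of this literate file). The names `SetoSketch`, `⊥-at`,
and `⊥-down` are hypothetical, and the library's actual definitions may
differ.

```agda
module SetoSketch where

open import Data.Empty using (⊥)
open import Data.Nat using (ℕ; zero; suc; _≤_; z≤n)
open import Data.Unit using (⊤; tt)

-- Assumed definition: if S holds at n, then it holds at every k ≤ n.
downClosed : (ℕ → Set) → Set
downClosed S = ∀ n → S n → ∀ k → k ≤ n → S k

record Setᵒ : Set₁ where
  field
    # : ℕ → Set
    down : downClosed #
    tz : # 0
open Setᵒ public

-- Hypothetical helper: "false" holds only at time zero.
⊥-at : ℕ → Set
⊥-at zero = ⊤
⊥-at (suc k) = ⊥

-- Downward closure: ⊥-at only holds at zero, and the only k ≤ zero
-- is zero itself. At a successor there is no proof to start from.
⊥-down : downClosed ⊥-at
⊥-down zero    _  _ z≤n = tt
⊥-down (suc n) () _ _

⊥ᵒ : Setᵒ
⊥ᵒ = record { # = ⊥-at ; down = ⊥-down ; tz = tt }
```

The same recipe (a function of type `ℕ → Set` plus two small proofs)
should apply to the other connectives below.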

The "and" proposition `P ×ᵒ Q` is true at a given time `k` if both `P`
and `Q` are true at time `k`.

    _×ᵒ_ : Setᵒ → Setᵒ → Setᵒ
    P ×ᵒ Q = record { # = λ k → # P k × # Q k
                    ; down = ... ; tz = ... }

The "for all" proposition `∀ᵒ[ a ] P` is true at a given time `k` if
the predicate `P` is true for all `a` at time `k`.

    ∀ᵒ : ∀{A : Set} → (A → Setᵒ) → Setᵒ
    ∀ᵒ{A} P = record { # = λ k → ∀ (a : A) → # (P a) k
                     ; down = ... ; tz = ... }

The "exists" proposition `∃ᵒ[ a ] P` is true at a given time `k` if
the predicate `P` is true for some `a` at time `k`. However, we
must require that the type `A` is inhabited so that this proposition
is true at time zero.

    ∃ᵒ : ∀{A : Set}{{_ : Inhabited A}} → (A → Setᵒ) → Setᵒ
    ∃ᵒ{A} P = record { # = λ k → Σ[ a ∈ A ] # (P a) k
                         ; down = ... ; tz = ... }

We embed arbitrary Agda formulas into the step-indexed logic with the
following constant operator, written `S ᵒ`, which is true if and only
if `S` is true, except at time zero, when `S ᵒ` has to be true.

    _ᵒ  : Set → Setᵒ
    S ᵒ = record { # = λ { zero → ⊤ ; (suc k) → S }
                 ; down = ... ; tz = ... }

Next we discuss the most important and interesting of the propositions,
the one for defining a recursive predicate. The following is a first
attempt at writing down the type of this proposition. The idea is that
this constructor of recursive predicates works like the Y-combinator
in that it turns a non-recursive predicate into a recursive one.

    μᵒ : ∀{A}
       → (A → (A → Setᵒ) → Setᵒ)
         -----------------------
       → A → Setᵒ

The non-recursive predicate has type `A → (A → Setᵒ) → Setᵒ`. It has
an extra parameter `(A → Setᵒ)` that will be bound to the
recursive predicate itself. To clarify, let's look at an example.
Suppose we want to define multi-step reduction according to
the following rules:

                M —→ L    L —→* N
    -------     ------------------
    M —→* M     M —→* N

We would first define a non-recursive predicate that has an extra
parameter, let us name it `R` for recursion. Inside the definition of
`mreduce`, we use `R` in the place where we would recursively use
`mreduce`, as follows.

    mreduce : Term × Term → (Term × Term → Setᵒ) → Setᵒ
    mreduce (M , N) R = (M ≡ N)ᵒ ⊎ᵒ (∃ᵒ[ L ] (M —→ L)ᵒ ×ᵒ R (L , N))

Because we use `∃ᵒ` with a Term, we need to prove that Term is inhabited.

```
instance
  TermInhabited : Inhabited Term
  TermInhabited = record { elt = ` 0 }

```
We then apply the `μᵒ` proposition to `mreduce` to
obtain the desired recursive predicate `—→*`.

    _—→*_ : Term → Term → Setᵒ
    M —→* N = μᵒ mreduce (M , N)

The problem with the above story is that it's not possible in Agda (to
my knowledge) to construct a recursive predicate from an arbitrary
function of type `A → (A → Setᵒ) → Setᵒ`. Instead, we need to place
restrictions on the function. In particular, if we make sure that the
recursion never happens "now", but only "later", then it becomes
possible to construct `μᵒ`. We define the `Setˢ` type in Agda to
capture this restriction. (The superscript "s" stands for step
indexed.) Furthermore, to allow the nesting of recursive definitions,
we must generalize from a single predicate parameter to an environment
of predicates. The type of the environment is given by a `Context`:

    Context : Set₁
    Context = List Set

We represent an environment of recursive predicates with a tuple of
the following type.

    RecEnv : Context → Set₁
    RecEnv [] = topᵖ 
    RecEnv (A ∷ Γ) = (A → Setᵒ) × RecEnv Γ

We use de Bruijn indices to represent the variables that refer to the
recursive predicates, which we define as follows.

    data _∋_ : Context → Set → Set₁ where
      zeroˢ : ∀{Γ}{A} → (A ∷ Γ) ∋ A
      sucˢ : ∀{Γ}{A}{B} → Γ ∋ B → (A ∷ Γ) ∋ B

For each variable, we track whether it has been used "now" or not. So
we define `Time` as follows. The `Later` constructor does double duty
to mean a predicate has either been used "later" or not at all.

    data Time : Set where
      Now : Time
      Later : Time

The following defines a list of times, one for each variable in `Γ`.

    data Times : Context → Set₁ where
      ∅ : Times []
      cons : ∀{Γ}{A} → Time → Times Γ → Times (A ∷ Γ)

The `Setˢ` type is a record indexed by the type of the environment and
by the time for each variable. The representation of `Setˢ` (the `#`
field) is a function that maps an environment of predicates
(one predicate for each in-scope μ) to a `Setᵒ`.

    record Setˢ (Γ : Context) (ts : Times Γ) : Set₁ where
      field
        # : RecEnv Γ → Setᵒ 
        ...
    open Setˢ public

We define variants of all the propositional connectives to work on
Setˢ.

The "later" operator `▷ˢ` asserts that `P` is true in the future, so the
predicate `▷ˢ P` can safely say that any use of a recursive predicate in
`P` happens `Later`.

    laters : ∀ (Γ : Context) → Times Γ
    laters [] = ∅
    laters (A ∷ Γ) = cons Later (laters Γ)

    ▷ˢ : ∀{Γ}{ts : Times Γ}
       → Setˢ Γ ts
         -----------------
       → Setˢ Γ (laters Γ)

The "and" operator, `P ×ˢ Q` is categorized as `Later` for a variable
only if both `P` and `Q` are `Later` for that variable. Otherwise it
is `Now`.  We use the following function to make this choice:

    choose : Time → Time → Time
    choose Now Now = Now
    choose Now Later = Now
    choose Later Now = Now
    choose Later Later = Later

We define `combine` to apply `choose` to a list of times.

    combine : ∀{Γ} (ts₁ ts₂ : Times Γ) → Times Γ
    combine {[]} ts₁ ts₂ = ∅
    combine {A ∷ Γ} (cons x ts₁) (cons y ts₂) =
        cons (choose x y) (combine ts₁ ts₂)
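
To see `combine` at work, here is a small standalone sketch (not part
of this literate file) that repeats the definitions above, with `choose`
typed over `Time`, and checks one instance by computation. The module
name `CombineSketch`, the example context `ℕ ∷ Bool ∷ []`, and the name
`combine-example` are made up for illustration.

```agda
module CombineSketch where

open import Data.Bool using (Bool)
open import Data.List using (List; []; _∷_)
open import Data.Nat using (ℕ)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

Context : Set₁
Context = List Set

data Time : Set where
  Now : Time
  Later : Time

data Times : Context → Set₁ where
  ∅ : Times []
  cons : ∀{Γ}{A} → Time → Times Γ → Times (A ∷ Γ)

choose : Time → Time → Time
choose Now Now = Now
choose Now Later = Now
choose Later Now = Now
choose Later Later = Later

combine : ∀{Γ} (ts₁ ts₂ : Times Γ) → Times Γ
combine {[]} ts₁ ts₂ = ∅
combine {A ∷ Γ} (cons x ts₁) (cons y ts₂) =
    cons (choose x y) (combine ts₁ ts₂)

-- The first variable is used Now on one side, so the result is Now;
-- the second variable is Later on both sides, so it stays Later.
combine-example :
    combine {ℕ ∷ Bool ∷ []} (cons Now (cons Later ∅)) (cons Later (cons Later ∅))
  ≡ cons Now (cons Later ∅)
combine-example = refl
```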

Here's the type of the "and" operator:

    _×ˢ_ : ∀{Γ}{ts₁ ts₂ : Times Γ} → Setˢ Γ ts₁ → Setˢ Γ ts₂
       → Setˢ Γ (combine ts₁ ts₂)

The other propositions follow a similar pattern.

The membership formula `a ∈ x` is true when `a` is in the predicate
bound to variable `x` in the environment. The time for `x` is required
to be `Now`.

    var-now : ∀ (Γ : Context) → ∀{A} → (x : Γ ∋ A) → Times Γ
    var-now (B ∷ Γ) zeroˢ = cons Now (laters Γ)
    var-now (B ∷ Γ) (sucˢ x) = cons Later (var-now Γ x)

    _∈_ : ∀{Γ}{A}
       → A
       → (x : Γ ∋ A)
       → Setˢ Γ (var-now Γ x)
    a ∈ x =
      record { # = λ δ → (lookup x δ) a
             ; ... }
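
The `lookup` function used in the membership formula is not shown in
this post. A plausible definition, given here only as a standalone
sketch under that assumption, walks down the environment tuple in step
with the de Bruijn index. To keep the sketch self-contained, `Setᵒ` is
taken as a module parameter; the module name `LookupSketch` is
hypothetical.

```agda
module LookupSketch (Setᵒ : Set₁) where

open import Data.List using (List; []; _∷_)
open import Data.Product using (_×_; _,_)
open import Data.Unit.Polymorphic using (⊤)

Context : Set₁
Context = List Set

RecEnv : Context → Set₁
RecEnv [] = ⊤
RecEnv (A ∷ Γ) = (A → Setᵒ) × RecEnv Γ

data _∋_ : Context → Set → Set₁ where
  zeroˢ : ∀{Γ}{A} → (A ∷ Γ) ∋ A
  sucˢ : ∀{Γ}{A}{B} → Γ ∋ B → (A ∷ Γ) ∋ B

-- Assumed definition: index zero selects the head predicate,
-- a successor index skips the head and continues in the tail.
lookup : ∀{Γ}{A} → Γ ∋ A → RecEnv Γ → (A → Setᵒ)
lookup zeroˢ (P , δ) = P
lookup (sucˢ x) (P , δ) = lookup x δ
```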

The `μˢ` formula defines a (possibly nested) recursive predicate.

    μˢ : ∀{Γ}{ts : Times Γ}{A}
       → (A → Setˢ (A ∷ Γ) (cons Later ts))
         ----------------------------------
       → (A → Setˢ Γ ts)

It takes a non-recursive predicate from `A` to `Setˢ` and produces a
recursive predicate in `A`. Note that the variable `zeroˢ`, the
one introduced by this `μˢ`, is required to have time `Later`.

If the recursive predicate is not nested inside other recursive
predicates, then you can directly use the following `μᵒ` operator.

    μᵒ : ∀{A}
       → (A → Setˢ (A ∷ []) (cons Later ∅))
         ----------------------------------
       → (A → Setᵒ)

Let's revisit the example of defining multi-step reduction.  The
non-recursive `mreduce` predicate is defined as follows.

```
mreduce : Term × Term → Setˢ ((Term × Term) ∷ []) (cons Later ∅)
mreduce (M , N) = (M ≡ N)ˢ ⊎ˢ (∃ˢ[ L ] (M —→ L)ˢ ×ˢ ▷ˢ (((L , N) ∈ zeroˢ)))
```

Note that the `R` parameter has become implicit; it has moved into the
environment. Also the application `R (L , N)` is replaced by
`▷ˢ ((L , N) ∈ zeroˢ)`, where the de Bruijn index `zeroˢ` refers to
the predicate `R` in the environment.

We define the recursive predicate `M —→* N` by applying `μᵒ`
to `mreduce`.

```
infix 2 _—→*_
_—→*_ : Term → Term → Setᵒ
M —→* N = μᵒ mreduce (M , N)
```

Here are a couple of uses of the multi-step reduction relation.

```
X₀ : #($ (Num 0) —→* $ (Num 0)) 1
X₀ = inj₁ refl

X₁ : #((ƛ ($ (Num 1))) · $ (Num 0) —→* $ (Num 1)) 2
X₁ = inj₂ (_ , (β ($̬ _) , inj₁ refl))
```

## Proofs in Step-indexed Logic

Just like first-order logic, SIL comes with rules of deduction for
carrying out proofs. The judgement form is `𝒫 ⊢ᵒ P`, where `𝒫` is a
list of assumptions and `P` is a formula.  The judgement `𝒫 ⊢ᵒ P` is
true iff for every time `k`, all of `𝒫` are true at `k` implies that `P`
is true at `k`. So in Agda we have the following definition.

    Πᵒ : List Setᵒ → Setᵒ
    Πᵒ [] = ⊤ᵒ
    Πᵒ (P ∷ 𝒫) = P ×ᵒ Πᵒ 𝒫 

    _⊢ᵒ_ : List Setᵒ → Setᵒ → Set
    𝒫 ⊢ᵒ P = ∀ k → # (Πᵒ 𝒫) k → # P k

Many of the deduction rules are the same as in first-order logic.
For example, here are the introduction and elimination rules
for conjunction. We use the same notation as Agda, but with
a superscript "o".

    _,ᵒ_ : ∀{𝒫 : List Setᵒ }{P Q : Setᵒ}
      → 𝒫 ⊢ᵒ P
      → 𝒫 ⊢ᵒ Q
        ------------
      → 𝒫 ⊢ᵒ P ×ᵒ Q

    proj₁ᵒ : ∀{𝒫 : List Setᵒ }{P Q : Setᵒ}
      → 𝒫 ⊢ᵒ P ×ᵒ Q
        ------------
      → 𝒫 ⊢ᵒ P

    proj₂ᵒ : ∀{𝒫 : List Setᵒ }{P Q : Setᵒ}
      → 𝒫 ⊢ᵒ P ×ᵒ Q
        ------------
      → 𝒫 ⊢ᵒ Q
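
To make the reading of the judgement concrete, here is a simplified
standalone sketch of the conjunction rules. It is not the library's
implementation: formulas are modeled as bare functions `ℕ → Set`
(dropping the `down` and `tz` fields), the list of assumptions is
collapsed into a single formula, and the names `EntailSketch`, `_⊢_`,
`pair`, `fst`, and `snd` are hypothetical.

```agda
module EntailSketch where

open import Data.Nat using (ℕ)
open import Data.Product using (_×_; _,_; proj₁; proj₂)

-- Simplified entailment: at every time k, the assumption implies the goal.
_⊢_ : (ℕ → Set) → (ℕ → Set) → Set
𝒫 ⊢ P = ∀ k → 𝒫 k → P k

-- Introduction: prove both conjuncts at the same time k.
pair : (𝒫 P Q : ℕ → Set) → 𝒫 ⊢ P → 𝒫 ⊢ Q → 𝒫 ⊢ (λ k → P k × Q k)
pair 𝒫 P Q p q k 𝒫k = p k 𝒫k , q k 𝒫k

-- Elimination: project out one conjunct at time k.
fst : (𝒫 P Q : ℕ → Set) → 𝒫 ⊢ (λ k → P k × Q k) → 𝒫 ⊢ P
fst 𝒫 P Q pq k 𝒫k = proj₁ (pq k 𝒫k)

snd : (𝒫 P Q : ℕ → Set) → 𝒫 ⊢ (λ k → P k × Q k) → 𝒫 ⊢ Q
snd 𝒫 P Q pq k 𝒫k = proj₂ (pq k 𝒫k)
```

The point of the SIL rules is that users never see the index `k`; it
only appears inside proofs like these.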

The introduction rule for a constant formula `S ᵒ` is straightforward.
A proof of `S` in regular Agda is sufficient to build a proof of `S ᵒ`
in SIL.

    constᵒI : ∀{𝒫}{S : Set}
       → S
       → 𝒫 ⊢ᵒ S ᵒ

On the other hand, given a proof of `S ᵒ` in SIL, one cannot obtain a
proof of `S` directly in Agda. That is, the following rule is invalid
because `𝒫` could be false at every positive index, and at index zero
`S ᵒ` holds regardless of `S`.

    bogus-constᵒE : ∀ {𝒫}{S : Set}{R : Setᵒ}
       → 𝒫 ⊢ᵒ S ᵒ
       → S

Instead, we have an elimination rule in continuation-passing style.
That is, if we have a proof of `S ᵒ` and need to prove some arbitrary
goal `R`, then it suffices to prove `R` under the assumption that `S`
is true.

    constᵒE : ∀ {𝒫}{S : Set}{R : Setᵒ}
       → 𝒫 ⊢ᵒ S ᵒ
       → (S → 𝒫 ⊢ᵒ R)
       → 𝒫 ⊢ᵒ R

Analogous to `subst` in Agda's standard library, SIL has `substᵒ`
which says that if `P` and `Q` are equivalent, then a proof of `P` gives
a proof of `Q`.

    substᵒ : ∀{𝒫}{P Q : Setᵒ}
      → P ≡ᵒ Q
        -------------------
      → 𝒫 ⊢ᵒ P  →  𝒫 ⊢ᵒ Q

The deduction rules also include ones for the "later" operator.  As we
mentioned earlier, if a proposition is true now it will also be true
later.

    monoᵒ : ∀ {𝒫}{P}
       → 𝒫 ⊢ᵒ P
         -----------
       → 𝒫 ⊢ᵒ  ▷ᵒ P

One can transport induction on natural numbers into SIL to obtain the
following Löb rule, which states that when proving any property `P`,
one is allowed to assume that `P` is true later.

    lobᵒ : ∀ {𝒫}{P}
       → (▷ᵒ P) ∷ 𝒫 ⊢ᵒ P
         -----------------------
       → 𝒫 ⊢ᵒ P

For comparison, here's induction on natural numbers

      P 0
    → (∀ k → P k → P (suc k))
    → ∀ n → P n

In the world of SIL, propositions are always true at zero, so the base
case `P 0` is not necessary. The induction step `(∀ k → P k → P (suc k))`
is similar to the premise `(▷ᵒ P) ∷ 𝒫 ⊢ᵒ P` because `▷ᵒ` subtracts one.
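
The correspondence can be sketched in plain Agda. The following
standalone module is a simplification, not the library's `lobᵒ`:
formulas are bare functions `ℕ → Set`, there are no extra assumptions
`𝒫`, and "later" is modeled by a hypothetical function `▷` that is
trivially true at zero and otherwise looks one tick back.

```agda
module LobSketch where

open import Data.Nat using (ℕ; zero; suc)
open import Data.Unit using (⊤; tt)

-- Simplified "later": true at zero, otherwise P at the previous index.
▷ : (ℕ → Set) → ℕ → Set
▷ P zero = ⊤
▷ P (suc k) = P k

-- Löb-style rule: if P follows from "P later" at every time,
-- then P holds at every time. The proof is induction on k.
lob : (P : ℕ → Set) → (∀ k → ▷ P k → P k) → ∀ k → P k
lob P step zero = step zero tt
lob P step (suc k) = step (suc k) (lob P step k)
```

Note that no separate base case is supplied by the caller: at zero the
premise `▷ P zero` is just `⊤`, so `step` itself yields `P zero`.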

The following is a handy proof rule that turns a proof of `P` in SIL
into an assumption in Agda that `P` is true for some positive natural
number.

    ⊢ᵒ-sucP : ∀{𝒫}{P Q : Setᵒ}
       → 𝒫 ⊢ᵒ P
       → (∀{n} → # P (suc n) → 𝒫 ⊢ᵒ Q)
       → 𝒫 ⊢ᵒ Q

As usual for temporal logics (or more generally, for modal logics),
there are distribution rules that push "later" through the other
logical connectives. For example, the following rule distributes
"later" through conjunction.

    ▷× : ∀{𝒫} {P Q : Setᵒ}
       → 𝒫 ⊢ᵒ (▷ᵒ (P ×ᵒ Q))
         ----------------------
       → 𝒫 ⊢ᵒ (▷ᵒ P) ×ᵒ (▷ᵒ Q)

This project was the first time I conducted nontrivial proofs in
a modal logic, and it took some getting used to!


## Defining a Logical Relation for Type Safety

With the Step-indexed Logic in hand, we are ready to define a logical
relation for type safety. The two predicates ℰ and 𝒱 are mutually
recursive, so we combine them into a single recursive predicate named
`ℰ⊎𝒱` that takes a sum type, where the left side is for 𝒱 and the
right side is for ℰ. We shall define `ℰ⊎𝒱` by an application of
`μᵒ`, so we first need to define the non-recursive version of
`ℰ⊎𝒱`, which we call `pre-ℰ⊎𝒱`, defined below. It simply dispatches to
the non-recursive `pre-𝒱` and `pre-ℰ`, which we define next.

```
ℰ⊎𝒱-type : Set
ℰ⊎𝒱-type = (Type × Term) ⊎ (Type × Term)

ℰ⊎𝒱-ctx : Context
ℰ⊎𝒱-ctx = ℰ⊎𝒱-type ∷ []

pre-ℰ : Type → Term → Setˢ ℰ⊎𝒱-ctx (cons Later ∅)
pre-𝒱 : Type → Term → Setˢ ℰ⊎𝒱-ctx (cons Later ∅)

pre-ℰ⊎𝒱 : ℰ⊎𝒱-type → Setˢ ℰ⊎𝒱-ctx (cons Later ∅)
pre-ℰ⊎𝒱 (inj₁ (A , V)) = pre-𝒱 A V
pre-ℰ⊎𝒱 (inj₂ (A , M)) = pre-ℰ A M
```
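
Folding two mutually recursive predicates into one predicate over a tagged input is a general trick, not something specific to SIL. As a rough analogy only, here is the same move applied to the textbook even/odd pair in Python, where the tag plays the role of `inj₁`/`inj₂`. The names `even_or_odd`, `EVEN`, and `ODD` are invented for this illustration and are unrelated to the Agda development.

```python
EVEN, ODD = "even", "odd"

def even_or_odd(tagged):
    """One recursive predicate standing in for two mutually recursive ones."""
    tag, n = tagged
    if tag == EVEN:
        # n is even if it is zero or its predecessor is odd.
        return n == 0 or even_or_odd((ODD, n - 1))
    # n is odd if it is nonzero and its predecessor is even.
    return n != 0 and even_or_odd((EVEN, n - 1))

def is_even(n):
    return even_or_odd((EVEN, n))

def is_odd(n):
    return even_or_odd((ODD, n))

print(is_even(10), is_odd(10))
```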

To improve the readability of our definitions, we define the following
notation for recursive applications of the ℰ and 𝒱 predicates.

```
ℰˢ⟦_⟧ : Type → Term → Setˢ ℰ⊎𝒱-ctx (cons Now ∅)
ℰˢ⟦ A ⟧ M = (inj₂ (A , M)) ∈ zeroˢ

𝒱ˢ⟦_⟧ : Type → Term → Setˢ ℰ⊎𝒱-ctx (cons Now ∅)
𝒱ˢ⟦ A ⟧ V = (inj₁ (A , V)) ∈ zeroˢ
```

The definitions of `pre-ℰ` and `pre-𝒱` below are similar in form to the
explicitly step-indexed definitions of ℰ and 𝒱 above. However, the
parameter `k` is gone and all of the logical connectives have a
superscript `s`, indicating that we're building a `Setˢ`.  Also,
note that all the uses of `ℰˢ` and `𝒱ˢ` are guarded by the later
operator `▷ˢ`. Finally, in the definition of `pre-ℰ`, we do not use
`▷ˢ (𝒱ˢ⟦ A ⟧ M)` but instead use `pre-𝒱 A M` because we need to say in that
spot that `M` is a semantic value now, not later.

```
pre-ℰ A M = (pre-𝒱 A M ⊎ˢ (reducible M)ˢ ⊎ˢ (Blame M)ˢ)
             ×ˢ (∀ˢ[ N ] (M —→ N)ˢ →ˢ ▷ˢ (ℰˢ⟦ A ⟧ N))

pre-𝒱 ★ (V ⟨ G !⟩ )      = (Value V)ˢ ×ˢ ▷ˢ (𝒱ˢ⟦ gnd⇒ty G ⟧ V)
pre-𝒱 ($ₜ ι) ($ c)        = (ι ≡ typeof c)ˢ
pre-𝒱 (A ⇒ B) (ƛ N)      = ∀ˢ[ W ] ▷ˢ (𝒱ˢ⟦ A ⟧ W) →ˢ ▷ˢ (ℰˢ⟦ B ⟧ (N [ W ]))
pre-𝒱 A M                = ⊥ ˢ
```

We define ℰ and 𝒱 by creating a recursive predicate (applying `μᵒ` to
`pre-ℰ⊎𝒱`) and then applying it to an argument injected with either `inj₁`
for 𝒱 or `inj₂` for ℰ.

```
ℰ⊎𝒱 : ℰ⊎𝒱-type → Setᵒ
ℰ⊎𝒱 X = μᵒ pre-ℰ⊎𝒱 X

ℰ⟦_⟧ : Type → Term → Setᵒ
ℰ⟦ A ⟧ M = ℰ⊎𝒱 (inj₂ (A , M))

𝒱⟦_⟧ : Type → Term → Setᵒ
𝒱⟦ A ⟧ V = ℰ⊎𝒱 (inj₁ (A , V))
```

To succinctly talk about the two aspects of ℰ, we define semantic
`progress` and `preservation` as follows.

```
progress : Type → Term → Setᵒ
progress A M = 𝒱⟦ A ⟧ M ⊎ᵒ (reducible M)ᵒ ⊎ᵒ (Blame M)ᵒ

preservation : Type → Term → Setᵒ
preservation A M = ∀ᵒ[ N ] ((M —→ N)ᵒ →ᵒ ▷ᵒ (ℰ⟦ A ⟧ N))
```

We can prove that ℰ is indeed equivalent to progress and preservation
by use of the `fixpointᵒ` theorem in SIL.

```
ℰ-stmt : ∀{A}{M}
  → ℰ⟦ A ⟧ M ≡ᵒ progress A M ×ᵒ preservation A M
ℰ-stmt {A}{M} =
  ℰ⟦ A ⟧ M                                                    ⩦⟨ ≡ᵒ-refl refl ⟩
  μᵒ pre-ℰ⊎𝒱 (inj₂ (A , M))              ⩦⟨ fixpointᵒ pre-ℰ⊎𝒱 (inj₂ (A , M)) ⟩
  # (pre-ℰ⊎𝒱 (inj₂ (A , M))) (ℰ⊎𝒱 , ttᵖ)
             ⩦⟨ cong-×ᵒ (cong-⊎ᵒ (≡ᵒ-sym (fixpointᵒ pre-ℰ⊎𝒱 (inj₁ (A , M))))
                                      (≡ᵒ-refl refl)) (≡ᵒ-refl refl) ⟩
  progress A M ×ᵒ preservation A M
  ∎
```

For convenience, we define introduction and elimination rules for ℰ.

```
ℰ-intro : ∀ {𝒫}{A}{M}
  → 𝒫 ⊢ᵒ progress A M
  → 𝒫 ⊢ᵒ preservation A M
    ----------------------
  → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ M
ℰ-intro 𝒫⊢prog 𝒫⊢pres = substᵒ (≡ᵒ-sym ℰ-stmt) (𝒫⊢prog ,ᵒ 𝒫⊢pres)

ℰ-progress : ∀ {𝒫}{A}{M}
  → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ M
  → 𝒫 ⊢ᵒ progress A M
ℰ-progress 𝒫⊢ℰM = proj₁ᵒ (substᵒ ℰ-stmt 𝒫⊢ℰM )

ℰ-preservation : ∀ {𝒫}{A}{M}
  → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ M
  → 𝒫 ⊢ᵒ preservation A M
ℰ-preservation 𝒫⊢ℰM = proj₂ᵒ (substᵒ ℰ-stmt 𝒫⊢ℰM )
```

Similarly, we can derive the expected equations for 𝒱.

```
𝒱-base : ∀{ι}{c : Lit} → (𝒱⟦ $ₜ ι ⟧ ($ c)) ≡ᵒ (ι ≡ typeof c)ᵒ
𝒱-base = ≡ᵒ-intro λ k → (λ x → x) , (λ x → x)

𝒱-dyn : ∀{G}{V} → 𝒱⟦ ★ ⟧ (V ⟨ G !⟩) ≡ᵒ ((Value V)ᵒ ×ᵒ ▷ᵒ (𝒱⟦ gnd⇒ty G ⟧ V))
𝒱-dyn {G}{V} =
   let X = (inj₁ (★ , V ⟨ G !⟩)) in
   𝒱⟦ ★ ⟧ (V ⟨ G !⟩)                              ⩦⟨ ≡ᵒ-refl refl ⟩
   ℰ⊎𝒱 X                                          ⩦⟨ fixpointᵒ pre-ℰ⊎𝒱 X ⟩
   # (pre-ℰ⊎𝒱 X) (ℰ⊎𝒱 , ttᵖ)                     ⩦⟨ ≡ᵒ-refl refl ⟩ 
   (Value V)ᵒ ×ᵒ ▷ᵒ (𝒱⟦ gnd⇒ty G ⟧ V)             ∎

𝒱-fun : ∀{A B}{N}
   → 𝒱⟦ A ⇒ B ⟧ (ƛ N)
      ≡ᵒ (∀ᵒ[ W ] ((▷ᵒ (𝒱⟦ A ⟧ W)) →ᵒ (▷ᵒ (ℰ⟦ B ⟧ (N [ W ])))))
𝒱-fun {A}{B}{N} =
   let X = (inj₁ (A ⇒ B , ƛ N)) in
   𝒱⟦ A ⇒ B ⟧ (ƛ N)                                         ⩦⟨ ≡ᵒ-refl refl ⟩
   ℰ⊎𝒱 X                                            ⩦⟨ fixpointᵒ pre-ℰ⊎𝒱 X ⟩
   # (pre-ℰ⊎𝒱 X) (ℰ⊎𝒱 , ttᵖ)                               ⩦⟨ ≡ᵒ-refl refl ⟩ 
   (∀ᵒ[ W ] ((▷ᵒ (𝒱⟦ A ⟧ W)) →ᵒ (▷ᵒ (ℰ⟦ B ⟧ (N [ W ])))))   ∎
```

We have defined `𝒱` such that it only accepts terms that are syntactic
values. (We included `Value V` in the case for `★` of `pre-𝒱`.)

```
𝒱⇒Value : ∀ {k} A M
   → # (𝒱⟦ A ⟧ M) (suc k)
     ---------------------
   → Value M
𝒱⇒Value ★ (M ⟨ G !⟩) (v , _) = v 〈 G 〉
𝒱⇒Value ($ₜ ι) ($ c) 𝒱M = $̬ c
𝒱⇒Value (A ⇒ B) (ƛ N) 𝒱M = ƛ̬ N
```

A value `V` in 𝒱 is also in ℰ. The definition of `progress` includes
values, and to prove preservation we note that a value is irreducible.

```
𝒱⇒ℰ : ∀{A}{𝒫}{V}
   → 𝒫 ⊢ᵒ 𝒱⟦ A ⟧ V
     ---------------
   → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ V
𝒱⇒ℰ {A}{𝒫}{V} 𝒫⊢𝒱V = ℰ-intro prog pres
    where
    prog = inj₁ᵒ 𝒫⊢𝒱V
    pres = Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ V—→N →
             ⊢ᵒ-sucP (⊢ᵒ-weaken 𝒫⊢𝒱V) λ 𝒱V →
                ⊥-elim (value-irreducible (𝒱⇒Value A V 𝒱V ) V—→N))
```

## Semantic Type Safety for Open Terms

The `ℰ` predicate applies to closed terms, that is, terms without any
free variables, such as a whole program. However, we'll need a notion
of semantic type safety that also includes open terms. The standard
way to define safety for an open term `M` is to substitute values for
the free variables and then use `ℰ`. That is, we apply a
substitution `γ` to `M` where all the values in `γ` must be
semantically well typed. The following `𝓖` expresses this constraint on
`γ`.

```
𝓖⟦_⟧ : (Γ : List Type) → Subst → List Setᵒ
𝓖⟦ [] ⟧ σ = []
𝓖⟦ A ∷ Γ ⟧ σ = (𝒱⟦ A ⟧ (σ 0)) ∷ 𝓖⟦ Γ ⟧ (λ x → σ (suc x))
```

A term `M` is semantically well typed at `A` in context `Γ` if, 
for any well-typed substitution `γ`, we have `ℰ⟦ A ⟧ (⟪ γ ⟫ M)`.

```
_⊨_⦂_ : List Type → Term → Type → Set
Γ ⊨ M ⦂ A = ∀ (γ : Subst) → 𝓖⟦ Γ ⟧ γ ⊢ᵒ ℰ⟦ A ⟧ (⟪ γ ⟫ M)
```

## The Fundamental Lemma via Compatibility Lemmas

The main lemma on our way to proving type safety is the Fundamental
Lemma, which states that well-typed programs are semantically type
safe. That is, well-typed programs behave as expected according to
their types.

    fundamental : ∀ {Γ A} → (M : Term)
      → Γ ⊢ M ⦂ A
        ----------
      → Γ ⊨ M ⦂ A

The proof of `fundamental` is by induction on the typing derivation,
with each case dispatching to a compatibility lemma.

The compatibility lemma for number literals is proved by showing that
`$ (Num n)` is in `𝒱⟦ $ₜ ′ℕ ⟧` via the definition of `𝒱` and then
applying the `𝒱⇒ℰ` lemma.

```
compatible-nat : ∀{Γ}{n : ℕ}
     -----------------------
   → Γ ⊨ $ (Num n) ⦂ ($ₜ ′ℕ)
compatible-nat {Γ}{n} γ = 𝒱⇒ℰ (substᵒ (≡ᵒ-sym 𝒱-base) (constᵒI refl))
```

The compatibility lemma for Boolean literals is the same.

```
compatible-bool : ∀{Γ}{b : 𝔹}
     --------------------------
   → Γ ⊨ ($ (Bool b)) ⦂ ($ₜ ′𝔹)
compatible-bool {Γ}{b} γ = 𝒱⇒ℰ (substᵒ (≡ᵒ-sym 𝒱-base) (constᵒI refl))
```

The compatibility lemma for the `blame` term is similar to the `𝒱⇒ℰ`
lemma in that `blame` is one of the alternatives allowed in `progress`
and `blame` is irreducible.

```
ℰ-blame : ∀{𝒫}{A} → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ blame
ℰ-blame {𝒫}{A} = ℰ-intro prog pres
    where
    prog = inj₂ᵒ (inj₂ᵒ (constᵒI isBlame))
    pres = Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ blame→ → ⊥-elim (blame-irreducible blame→))

compatible-blame : ∀{Γ}{A}
     -------------
   → Γ ⊨ blame ⦂ A
compatible-blame {Γ}{A} γ = ℰ-blame
```

The compatibility lemma for variables makes use of the premise that
the values in the environment are semantically well typed.
The following lemma proves that for any variable `y` of type `A` in `Γ`,
`γ` in `𝓖⟦ Γ ⟧` implies that `γ y` is in `𝒱⟦ A ⟧`.

```
lookup-𝓖 : (Γ : List Type) → (γ : Subst)
  → ∀ {A}{y} → (Γ ∋ y ⦂ A)
  → 𝓖⟦ Γ ⟧ γ ⊢ᵒ 𝒱⟦ A ⟧ (γ y)
lookup-𝓖 (B ∷ Γ) γ {A} {zero} refl = Zᵒ
lookup-𝓖 (B ∷ Γ) γ {A} {suc y} ∋y =
    Sᵒ (lookup-𝓖 Γ (λ x → γ (suc x)) ∋y) 
```

Once we have `γ y` in `𝒱⟦ A ⟧`, we conclude by applying the `𝒱⇒ℰ`
lemma. (The `sub-var` lemma just says that `` ⟪ γ ⟫ (` x) ≡ γ x ``.)

```
compatibility-var : ∀ {Γ A x}
  → Γ ∋ x ⦂ A
    -----------
  → Γ ⊨ ` x ⦂ A
compatibility-var {Γ}{A}{x} ∋x γ rewrite sub-var γ x = 𝒱⇒ℰ (lookup-𝓖 Γ γ ∋x)
```

The next compatibility lemma is for lambda abstraction.  To show that
`ƛ N` is in `ℰ⟦A ⇒ B⟧` we show that `ƛ N` is in `𝒱⟦A ⇒ B⟧`.  According
to that definition, we need to show that for any argument value `W` in
`𝒱⟦ A ⟧` (later), we have `(⟪ ext γ ⟫ N) [ W ]` in `ℰ⟦ B ⟧` (also later).  But
that follows almost directly from the premise that `N` is semantically
type safe. From that premise we have

    ▷ᵒ ℰ ⟦ B ⟧ (⟪ W • γ ⟫ N)

and the Abstract Binding Tree library provides rewrites for the
following equation

    ⟪ W • γ ⟫ N = (⟪ ext γ ⟫ N) [ W ]

which gives us what we need:

    ▷ᵒ ℰ ⟦ B ⟧ (⟪ ext γ ⟫ N) [ W ]

Here are all the details in Agda:
```
compatible-lambda : ∀{Γ}{A}{B}{N}
   → (A ∷ Γ) ⊨ N ⦂ B
     -------------------
   → Γ ⊨ (ƛ N) ⦂ (A ⇒ B)
compatible-lambda {Γ}{A}{B}{N} ⊨N γ = 𝒱⇒ℰ ⊢𝒱λN
 where
 ⊢𝒱λN : 𝓖⟦ Γ ⟧ γ ⊢ᵒ 𝒱⟦ A ⇒ B ⟧ (ƛ (⟪ ext γ ⟫ N))
 ⊢𝒱λN = (substᵒ (≡ᵒ-sym 𝒱-fun) (Λᵒ[ W ] →ᵒI ▷𝓔N[W]))
  where
  ▷𝓔N[W] : ∀{W} → ▷ᵒ 𝒱⟦ A ⟧ W ∷ 𝓖⟦ Γ ⟧ γ  ⊢ᵒ  ▷ᵒ ℰ⟦ B ⟧ ((⟪ ext γ ⟫ N) [ W ])
  ▷𝓔N[W] {W} = appᵒ (Sᵒ (▷→ (monoᵒ (→ᵒI (⊨N (W • γ)))))) Zᵒ
```

The next few compatibility lemmas, for application, injection, and
projection, all involve reasoning about the reduction of one or two
subexpressions.  Instead of duplicating this reasoning, the standard
approach is to put that reasoning in the "bind" lemma, which we
discuss next.

## Interlude: the "Bind" Lemma

The bind lemma concerns an expression `N` with a
subexpression `M`, so `N` is equal to plugging `M` into
an appropriate frame `F`, i.e. `N = F ⟦ M ⟧`. It says that if
`M` is semantically safe, then to prove `ℰ⟦ A ⟧ (F ⟦ M ⟧)`
it suffices to prove `ℰ⟦ A ⟧ (F ⟦ V ⟧)`
for every semantically safe value `V` that `M` reduces to.

    ℰ-bind : ∀{𝒫}{A}{B}{F}{M}
       → 𝒫 ⊢ᵒ ℰ⟦ B ⟧ M
       → 𝒫 ⊢ᵒ (∀ᵒ[ V ] (M —↠ V)ᵒ →ᵒ 𝒱⟦ B ⟧ V →ᵒ ℰ⟦ A ⟧ (F ⟦ V ⟧))
         ----------------------------------------------------------
       → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)

In the title of this blog post I alluded to one hard lemma. This is
the one!

We begin by creating some names for parts of the statement of this
lemma. First we have a name for the second premise.

```
𝒱V→ℰF[V] : Type → Type → Frame → Term → Setᵒ
𝒱V→ℰF[V] A B F M = ∀ᵒ[ V ] (M —↠ V)ᵒ →ᵒ 𝒱⟦ B ⟧ V →ᵒ ℰ⟦ A ⟧ (F ⟦ V ⟧)
```

Then we have a name for the two premises and the conclusion, with the
implications expressed in SIL.

```
ℰ-bind-M : Type → Type → Frame → Term → Setᵒ
ℰ-bind-M A B F M = ℰ⟦ B ⟧ M →ᵒ 𝒱V→ℰF[V] A B F M →ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
```

The following adds universal quantification (in SIL) over the term `M`.

```
ℰ-bind-prop : Type → Type → Frame → Setᵒ
ℰ-bind-prop A B F = ∀ᵒ[ M ] ℰ-bind-M A B F M
```

We shall need the `𝒱V→ℰF[V]` property to carry over from `M` to `M′`
whenever `M —→ M′`. The proof is as follows. We need to show
that `ℰ⟦ A ⟧ (F ⟦ V ⟧)` under the assumption that `M′ —↠ V` and
`𝒱⟦ B ⟧ V`. Prepending the first premise `M —→ M′` to `M′ —↠ V`, we
obtain `M —↠ V`. Then we
apply the second premise to conclude that `ℰ⟦ A ⟧ (F ⟦ V ⟧)`.

```
𝒱V→ℰF[V]-expansion : ∀{𝒫}{A}{B}{F}{M}{M′}
   → M —→ M′
   → 𝒫 ⊢ᵒ 𝒱V→ℰF[V] A B F M
     -----------------------
   → 𝒫 ⊢ᵒ 𝒱V→ℰF[V] A B F M′
𝒱V→ℰF[V]-expansion {𝒫}{A}{B}{F}{M}{M′} M→M′ 𝒱V→ℰF[V][M] =
   Λᵒ[ V ]
    let M′→V→ℰFV : 𝒱⟦ B ⟧ V ∷ (M′ —↠ V)ᵒ ∷ 𝒫 ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ V ⟧)
        M′→V→ℰFV = ⊢ᵒ-sucP (Sᵒ Zᵒ) λ M′→V → 
                     let M—↠V = constᵒI (M —→⟨ M→M′ ⟩ M′→V) in
                     let M→V→ℰFV = ⊢ᵒ-weaken(⊢ᵒ-weaken(instᵒ 𝒱V→ℰF[V][M] V)) in
                     appᵒ (appᵒ M→V→ℰFV M—↠V) Zᵒ in
    →ᵒI (→ᵒI M′→V→ℰFV)
```

We now proceed to prove the `ℰ-bind` lemma by way of an auxiliary
lemma `ℰ-bind-aux` that restates the lemma so that the term `M` is
universally quantified in SIL (instead of Agda), so that we can do the
proof by Löb induction, that is, by use of the `lobᵒ` rule of SIL.
After the use of `lobᵒ`, it remains to prove that `ℰ⟦ A ⟧ (F ⟦ M ⟧)`,
but now we have the additional assumption that we can apply the
bind lemma in the future to any term, i.e., we have `▷ᵒ ℰ-bind-prop A B F`.
From the premise `ℰ⟦ B ⟧ M` we have that `M` satisfies progress,
so either (1) it is a semantic value, (2) it can reduce, or (3) it is blame.
We proceed by reasoning about each of these three cases.

* `M` is already a value, so it can multi-step reduce to itself in
  zero steps, and then we apply the `𝒱V→ℰF[V]` premise to immediately
  conclude.

* `M` is reducible.
  Now to prove `ℰ⟦ A ⟧ (F ⟦ M ⟧)` we need to prove progress and preservation.
  The progress part is immediate, because by rule `ξ` we have
  `F ⟦ M ⟧ —→ F ⟦ M′ ⟧` because `M —→ M′` for some `M′`.
  The preservation part is more involved.
  We are given that `F ⟦ M ⟧ —→ N` and need to prove that `▷ᵒ (ℰ⟦ A ⟧ N)`.
  By the `frame-inv2` lemma, we obtain an `M′` such that `M —→ M′`
  and `N ≡ F ⟦ M′ ⟧`. So we need to prove that `▷ᵒ (ℰ⟦ A ⟧ (F ⟦ M′ ⟧))`.
  We shall obtain this via the induction hypothesis, and for that we
  need to prove (1) `▷ᵒ ℰ⟦ B ⟧ M′` and (2) `▷ᵒ (𝒱V→ℰF[V] A B F M′)`.
  We obtain (1) from the preservation part of `ℰ⟦ B ⟧ M`.
  We obtain (2) by the `𝒱V→ℰF[V]-expansion` lemma and shift it to later
  using `monoᵒ`.

* `M` is blame. We need to show `ℰ⟦ A ⟧ (F ⟦ blame ⟧)`.
   For the progress part, we have the reduction `F ⟦ blame ⟧ —→ blame`
   by rule `ξ-blame`. For preservation, we have `F ⟦ blame ⟧ —→ N`
   and need to prove that `▷ᵒ (ℰ⟦ A ⟧ N)`. The `blame-frame`
   lemma tells us that `N ≡ blame`, so we conclude by use of
   `ℰ-blame` and then `monoᵒ`.

```
open import rewriting.examples.CastDeterministic
  using (frame-inv2; deterministic)

ℰ-bind-aux : ∀{𝒫}{A}{B}{F} → 𝒫 ⊢ᵒ ℰ-bind-prop A B F
ℰ-bind-aux {𝒫}{A}{B}{F} = lobᵒ (Λᵒ[ M ] →ᵒI (→ᵒI Goal))
  where
  Goal : ∀{M} → (𝒱V→ℰF[V] A B F M) ∷ ℰ⟦ B ⟧ M ∷ ▷ᵒ ℰ-bind-prop A B F ∷ 𝒫
                 ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
  Goal{M} =
   case3ᵒ (ℰ-progress (Sᵒ Zᵒ)) Mval Mred Mblame
   where
   𝒫′ = (𝒱V→ℰF[V] A B F M) ∷ ℰ⟦ B ⟧ M ∷ ▷ᵒ ℰ-bind-prop A B F ∷ 𝒫

   Mval : 𝒱⟦ B ⟧ M ∷ 𝒫′ ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
   Mval =
     let 𝒱V→ℰF[V][M] = λ V → (M —↠ V)ᵒ →ᵒ 𝒱⟦ B ⟧ V →ᵒ ℰ⟦ A ⟧ (F ⟦ V ⟧) in
     appᵒ (appᵒ (instᵒ{P = 𝒱V→ℰF[V][M]} (Sᵒ Zᵒ) M) (constᵒI (M END))) Zᵒ

   Mred : (reducible M)ᵒ ∷ 𝒫′ ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
   Mred = ℰ-intro progressMred preservationMred
    where
    progressMred : (reducible M)ᵒ ∷ 𝒫′ ⊢ᵒ progress A (F ⟦ M ⟧)
    progressMred = inj₂ᵒ (inj₁ᵒ (constᵒE Zᵒ λ {(M′ , M→M′) →
                                            constᵒI (_ , (ξ F M→M′))}))

    preservationMred : (reducible M)ᵒ ∷ 𝒫′ ⊢ᵒ preservation A (F ⟦ M ⟧)
    preservationMred = (constᵒE Zᵒ λ redM →
                ⊢ᵒ-weaken (Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ FM→N →
                                          ⊢ᵒ-weaken (redM⇒▷ℰN redM FM→N))))
     where
     redM⇒▷ℰN : ∀{N} → reducible M → (F ⟦ M ⟧ —→ N) → 𝒫′ ⊢ᵒ ▷ᵒ (ℰ⟦ A ⟧ N)
     redM⇒▷ℰN {N} rM FM→N =
      let finv = frame-inv2{M}{N}{F} rM FM→N in
      let M′ = proj₁ finv in
      let M→M′ = proj₁ (proj₂ finv) in
      let N≡ = proj₂ (proj₂ finv) in
      let ▷ℰM′ : 𝒫′ ⊢ᵒ ▷ᵒ ℰ⟦ B ⟧ M′
          ▷ℰM′ = appᵒ (instᵒ{P = λ N → (M —→ N)ᵒ →ᵒ ▷ᵒ (ℰ⟦ B ⟧ N)}
                        (ℰ-preservation (Sᵒ Zᵒ)) M′)
                      (constᵒI M→M′) in
      let ▷M′→V→𝒱V→ℰFV : 𝒫′ ⊢ᵒ ▷ᵒ (𝒱V→ℰF[V] A B F M′)
          ▷M′→V→𝒱V→ℰFV = monoᵒ (𝒱V→ℰF[V]-expansion{𝒫′}{A}{B} M→M′ Zᵒ) in
      let IH : 𝒫′ ⊢ᵒ ▷ᵒ ℰ-bind-prop A B F
          IH = Sᵒ (Sᵒ Zᵒ) in
      let ▷ℰFM′ : 𝒫′ ⊢ᵒ ▷ᵒ (ℰ⟦ A ⟧ (F ⟦ M′ ⟧))
          ▷ℰFM′ = frame-prop-lemma IH ▷ℰM′ ▷M′→V→𝒱V→ℰFV in
      subst (λ N → 𝒫′ ⊢ᵒ ▷ᵒ ℰ⟦ A ⟧ N) (sym N≡) ▷ℰFM′
      where
      frame-prop-lemma : ∀{𝒫}{A}{B}{M}{F}
         → 𝒫 ⊢ᵒ ▷ᵒ ℰ-bind-prop A B F  →  𝒫 ⊢ᵒ ▷ᵒ ℰ⟦ B ⟧ M
         → 𝒫 ⊢ᵒ ▷ᵒ 𝒱V→ℰF[V] A B F M   →  𝒫 ⊢ᵒ ▷ᵒ (ℰ⟦ A ⟧ (F ⟦ M ⟧))
      frame-prop-lemma{𝒫}{A}{B}{M}{F} IH ℰM V→FV =
       appᵒ(▷→ (appᵒ(▷→ (instᵒ(▷∀{P = λ M → ℰ-bind-M A B F M} IH) M)) ℰM)) V→FV

   Mblame : (Blame M)ᵒ ∷ 𝒫′ ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
   Mblame = ℰ-intro progressMblame
            (constᵒE Zᵒ λ blameM →
               ⊢ᵒ-weaken (Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ FM→N →
                                        ⊢ᵒ-weaken (blameM⇒▷ℰN blameM FM→N))))
    where
    progressMblame : (Blame M)ᵒ ∷ 𝒫′ ⊢ᵒ progress A (F ⟦ M ⟧)
    progressMblame =
       inj₂ᵒ (inj₁ᵒ (constᵒE Zᵒ λ {isBlame → constᵒI (_ , (ξ-blame F))}))

    blameM⇒▷ℰN : ∀{N} → Blame M → (F ⟦ M ⟧ —→ N)
       → 𝒫′ ⊢ᵒ ▷ᵒ (ℰ⟦ A ⟧ N)
    blameM⇒▷ℰN {N} isBlame FM→N =
        let eq = blame-frame FM→N in
        subst (λ N → 𝒫′ ⊢ᵒ ▷ᵒ ℰ⟦ A ⟧ N) (sym eq) (monoᵒ ℰ-blame)
```

The `ℰ-bind` lemma follows as a corollary of `ℰ-bind-aux`.

```
ℰ-bind : ∀{𝒫}{A}{B}{F}{M}
   → 𝒫 ⊢ᵒ ℰ⟦ B ⟧ M
   → 𝒫 ⊢ᵒ (∀ᵒ[ V ] (M —↠ V)ᵒ →ᵒ 𝒱⟦ B ⟧ V →ᵒ ℰ⟦ A ⟧ (F ⟦ V ⟧))
     ----------------------------------------------------------
   → 𝒫 ⊢ᵒ ℰ⟦ A ⟧ (F ⟦ M ⟧)
ℰ-bind {𝒫}{A}{B}{F}{M} ⊢ℰM ⊢𝒱V→ℰFV =
  appᵒ (appᵒ (instᵒ{𝒫}{P = λ M → ℰ-bind-M A B F M} ℰ-bind-aux M) ⊢ℰM) ⊢𝒱V→ℰFV
```

## More Compatibility Lemmas

The next compatibility lemma to prove is the one for function
application.  For that we'll need the following elimination lemma for
a value `V` in `𝒱⟦ A ⇒ B ⟧`.

```
safe-body : List Setᵒ → Term → Type → Type → Set
safe-body 𝒫 N A B = ∀{W} → 𝒫 ⊢ᵒ (▷ᵒ (𝒱⟦ A ⟧ W)) →ᵒ (▷ᵒ (ℰ⟦ B ⟧ (N [ W ])))

𝒱-fun-elim : ∀{𝒫}{A}{B}{V}{R}
   → 𝒫 ⊢ᵒ 𝒱⟦ A ⇒ B ⟧ V
   → (∀ N → V ≡ ƛ N → safe-body 𝒫 N A B → 𝒫 ⊢ᵒ R)
    ------------------------------------------------
   → 𝒫 ⊢ᵒ R
𝒱-fun-elim {𝒫}{A}{B}{V}{R} ⊢𝒱V cont =
  ⊢ᵒ-sucP ⊢𝒱V λ { 𝒱Vsn → G {V} 𝒱Vsn ⊢𝒱V cont}
  where
  G : ∀{V}{n}
     → # (𝒱⟦ A ⇒ B ⟧ V) (suc n)
     → 𝒫 ⊢ᵒ 𝒱⟦ A ⇒ B ⟧ V
     → (∀ N → V ≡ ƛ N → safe-body 𝒫 N A B → 𝒫 ⊢ᵒ R)
     → 𝒫 ⊢ᵒ R
  G{ƛ N}{n} 𝒱V ⊢𝒱V cont = cont N refl λ {W} →
      instᵒ{P = λ W → (▷ᵒ (𝒱⟦ A ⟧ W)) →ᵒ (▷ᵒ (ℰ⟦ B ⟧ (N [ W ])))}
                 (substᵒ 𝒱-fun ⊢𝒱V) W
```

The proof of compatibility for application, given below, starts with
two uses of the `ℰ-bind` lemma, once for subexpression `L` and again
for `M`.  So we obtain that `L` reduces to value `V` and `M` reduces
to `W` and that `𝒱⟦ A ⇒ B ⟧ V` and `𝒱⟦ A ⟧ W`.  At this point, our
goal is to show that `ℰ⟦ B ⟧ (V · W)`.  Next we use the elimination
lemma on `𝒱⟦ A ⇒ B ⟧ V` which tells us that `V` is a lambda
abstraction `ƛ N` with a semantically safe body `N`.  We thus obtain
the `progress` part of `ℰ⟦ B ⟧ (V · W)` because `(ƛ N) · W —→ N [ W ]`.
For the preservation part, we need to show that `▷ᵒ ℰ⟦ B ⟧ (N [ W ])`,
but that follows from `𝒱⟦ A ⟧ W` and that `N` is a semantically safe
body.

```
compatible-app : ∀{Γ}{A}{B}{L}{M}
   → Γ ⊨ L ⦂ (A ⇒ B)
   → Γ ⊨ M ⦂ A
     -------------------
   → Γ ⊨ L · M ⦂ B
compatible-app {Γ}{A}{B}{L}{M} ⊨L ⊨M γ = ⊢ℰLM
 where
 ⊢ℰLM : 𝓖⟦ Γ ⟧ γ ⊢ᵒ ℰ⟦ B ⟧ (⟪ γ ⟫ (L · M))
 ⊢ℰLM = ℰ-bind {F = □· (⟪ γ ⟫ M)} (⊨L γ) (Λᵒ[ V ] →ᵒI (→ᵒI ⊢ℰVM))
  where
  𝒫₁ = λ V → 𝒱⟦ A ⇒ B ⟧ V ∷ (⟪ γ ⟫ L —↠ V)ᵒ ∷ 𝓖⟦ Γ ⟧ γ
  ⊢ℰVM : ∀{V} → 𝒫₁ V ⊢ᵒ ℰ⟦ B ⟧ (V · ⟪ γ ⟫ M)
  ⊢ℰVM {V} = ⊢ᵒ-sucP Zᵒ λ 𝒱Vsn →
       let v = 𝒱⇒Value (A ⇒ B) V 𝒱Vsn in
       let 𝒫₁⊢ℰM : 𝒫₁ V ⊢ᵒ ℰ⟦ A ⟧ (⟪ γ ⟫ M)
           𝒫₁⊢ℰM = Sᵒ (Sᵒ (⊨M γ)) in
       ℰ-bind {F = v ·□} 𝒫₁⊢ℰM (Λᵒ[ V ] →ᵒI (→ᵒI ⊢ℰVW))
   where
   𝒫₂ = λ V W → 𝒱⟦ A ⟧ W ∷ (⟪ γ ⟫ M —↠ W)ᵒ ∷ 𝒱⟦ A ⇒ B ⟧ V ∷ (⟪ γ ⟫ L —↠ V)ᵒ
                 ∷ 𝓖⟦ Γ ⟧ γ
   ⊢ℰVW : ∀{V W} → 𝒫₂ V W ⊢ᵒ ℰ⟦ B ⟧ (V · W)
   ⊢ℰVW {V}{W} =
     let ⊢𝒱V : 𝒫₂ V W ⊢ᵒ 𝒱⟦ A ⇒ B ⟧ V
         ⊢𝒱V = Sᵒ (Sᵒ Zᵒ) in
     let ⊢𝒱W : 𝒫₂ V W ⊢ᵒ 𝒱⟦ A ⟧ W
         ⊢𝒱W = Zᵒ in
     ⊢ᵒ-sucP ⊢𝒱W λ 𝒱Wsn →
     let w = 𝒱⇒Value A W 𝒱Wsn in
     𝒱-fun-elim ⊢𝒱V λ {N′ refl 𝒱W→ℰNW →
     let prog : 𝒫₂ (ƛ N′) W ⊢ᵒ progress B (ƛ N′ · W)
         prog = (inj₂ᵒ (inj₁ᵒ (constᵒI (_ , (β w))))) in
     let pres : 𝒫₂ (ƛ N′) W ⊢ᵒ preservation B (ƛ N′ · W)
         pres = Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ {r →
                let ⊢▷ℰN′W = appᵒ 𝒱W→ℰNW (monoᵒ ⊢𝒱W) in
                let eq = deterministic r (β w) in
                ⊢ᵒ-weaken (subst (λ N → 𝒫₂ (ƛ N′) W ⊢ᵒ ▷ᵒ ℰ⟦ B ⟧ N)
                                 (sym eq) ⊢▷ℰN′W)}) in
     ℰ-intro prog pres
     }
```

The compatibility lemma for an injection cast also begins with applying
the bind lemma to subexpression `M`, taking us from `ℰ⟦ gnd⇒ty G ⟧ M`
to `𝒱⟦ gnd⇒ty G ⟧ V`. This also gives us that `V` is a syntactic
value via `𝒱⇒Value`. So we have `𝒱⟦ ★ ⟧ (V ⟨ G !⟩)` and then
conclude using `𝒱⇒ℰ`.

```
compatible-inject : ∀{Γ}{G}{M}
  → Γ ⊨ M ⦂ gnd⇒ty G
    --------------------
  → Γ ⊨ M ⟨ G !⟩ ⦂ ★
compatible-inject {Γ}{G}{M} ⊨M γ = ℰMg!
 where
 ℰMg! : 𝓖⟦ Γ ⟧ γ ⊢ᵒ ℰ⟦ ★ ⟧ ((⟪ γ ⟫ M) ⟨ G !⟩)
 ℰMg! = ℰ-bind {F = □⟨ G !⟩} (⊨M γ) (Λᵒ[ V ] →ᵒI (→ᵒI ⊢ℰVg!))
  where
  𝒫₁ = λ V → 𝒱⟦ gnd⇒ty G ⟧ V ∷ (⟪ γ ⟫ M —↠ V)ᵒ ∷ 𝓖⟦ Γ ⟧ γ
  ⊢ℰVg! : ∀{V} → 𝒫₁ V ⊢ᵒ ℰ⟦ ★ ⟧ (V ⟨ G !⟩)
  ⊢ℰVg!{V} =
   ⊢ᵒ-sucP Zᵒ λ 𝒱Vsn →
   let v = 𝒱⇒Value (gnd⇒ty G) V 𝒱Vsn in
   𝒱⇒ℰ (substᵒ (≡ᵒ-sym 𝒱-dyn) (constᵒI v ,ᵒ monoᵒ Zᵒ))
```

The last compatibility lemma is for a projection cast.
Here we also need an elimination lemma, this time for
a value `V` of type `★`.

```
𝒱-dyn-elim : ∀{𝒫}{V}{R}
   → 𝒫 ⊢ᵒ 𝒱⟦ ★ ⟧ V
   → (∀ W G → V ≡ W ⟨ G !⟩
             → 𝒫 ⊢ᵒ ((Value W)ᵒ ×ᵒ ▷ᵒ (𝒱⟦ gnd⇒ty G ⟧ W))
             → 𝒫 ⊢ᵒ R)
     ----------------------------------------------
   → 𝒫 ⊢ᵒ R
𝒱-dyn-elim {𝒫}{V}{R} ⊢𝒱V cont =
  ⊢ᵒ-sucP ⊢𝒱V λ { 𝒱Vsn → G 𝒱Vsn ⊢𝒱V cont }
  where
  G : ∀{V}{n}
      → # (𝒱⟦ ★ ⟧ V) (suc n)
      → 𝒫 ⊢ᵒ 𝒱⟦ ★ ⟧ V
      → (∀ W G → V ≡ W ⟨ G !⟩
               → 𝒫 ⊢ᵒ ((Value W)ᵒ ×ᵒ ▷ᵒ (𝒱⟦ gnd⇒ty G ⟧ W))
               → 𝒫 ⊢ᵒ R)
      → 𝒫 ⊢ᵒ R
  G {W ⟨ G !⟩}{n} 𝒱Vsn ⊢𝒱V cont
      with 𝒱⇒Value ★ (W ⟨ G !⟩) 𝒱Vsn
  ... | w 〈 _ 〉 =
      let ⊢▷𝒱W = proj₂ᵒ (substᵒ (𝒱-dyn{V = W}) ⊢𝒱V) in
      cont W _ refl (constᵒI w ,ᵒ ⊢▷𝒱W)
```

The compatibility lemma for a projection `M ⟨ H ?⟩` begins by using
`ℰ-bind` on the subexpression `M` to obtain a value `V` where
`⟪ γ ⟫ M —↠ V` and `𝒱⟦ ★ ⟧ V`. We then apply lemma `𝒱-dyn-elim`
to decompose `V` into an injection `W ⟨ G !⟩` of a value `W`
where `▷ᵒ 𝒱⟦ gnd⇒ty G ⟧ W`. We need to show `ℰ⟦ gnd⇒ty H ⟧ (W ⟨ G !⟩ ⟨ H ?⟩)`.
The progress part comes from showing that it reduces to `W`
(if `G ≡ H`) or to `blame`. The preservation part is from
`▷ᵒ 𝒱⟦ gnd⇒ty G ⟧ W` (in the `G ≡ H` case) or because `ℰ⟦ gnd⇒ty H ⟧ blame`.

```
compatible-project : ∀{Γ}{H}{M}
  → Γ ⊨ M ⦂ ★
    -----------------------------
  → Γ ⊨ M ⟨ H ?⟩ ⦂ gnd⇒ty H
compatible-project {Γ}{H}{M} ⊨M γ = ℰMh?
 where
 ℰMh? : 𝓖⟦ Γ ⟧ γ ⊢ᵒ ℰ⟦ gnd⇒ty H ⟧ ((⟪ γ ⟫ M) ⟨ H ?⟩)
 ℰMh? = ℰ-bind {F = □⟨ H ?⟩} (⊨M γ) (Λᵒ[ V ] →ᵒI (→ᵒI ⊢ℰVh?))
  where
  𝒫₁ = λ V → 𝒱⟦ ★ ⟧ V ∷ (⟪ γ ⟫ M —↠ V)ᵒ ∷ 𝓖⟦ Γ ⟧ γ
  ⊢ℰVh? : ∀{V} → 𝒫₁ V ⊢ᵒ ℰ⟦ gnd⇒ty H ⟧ (V ⟨ H ?⟩)
  ⊢ℰVh?{V} =
   let ⊢𝒱V : 𝒫₁ V ⊢ᵒ 𝒱⟦ ★ ⟧ V
       ⊢𝒱V = Zᵒ in
   𝒱-dyn-elim ⊢𝒱V λ { W G refl ⊢w×▷𝒱W →
   let ⊢w = proj₁ᵒ ⊢w×▷𝒱W in
   let ▷𝒱W = proj₂ᵒ ⊢w×▷𝒱W in
   ⊢ᵒ-sucP ⊢w λ{n} w →
   let prog : 𝒫₁ (W ⟨ G !⟩) ⊢ᵒ progress (gnd⇒ty H) ((W ⟨ G !⟩) ⟨ H ?⟩)
       prog = inj₂ᵒ (inj₁ᵒ (constᵒI (reduce-inj-proj w))) in
   let pres : 𝒫₁ (W ⟨ G !⟩) ⊢ᵒ preservation (gnd⇒ty H)((W ⟨ G !⟩) ⟨ H ?⟩)
       pres = Λᵒ[ N ] →ᵒI (constᵒE Zᵒ λ r → ⊢ᵒ-weaken (Goal r w ▷𝒱W)) in
   ℰ-intro prog pres
   }
    where
    reduce-inj-proj : ∀{G}{H}{W} → Value W → reducible ((W ⟨ G !⟩) ⟨ H ?⟩)
    reduce-inj-proj {G} {H} {W} w
        with G ≡ᵍ H
    ... | yes refl = W , (collapse w  refl)
    ... | no neq = blame , (collide w neq refl)
    
    Goal : ∀{W}{G}{H}{N}
       → (W ⟨ G !⟩ ⟨ H ?⟩) —→ N
       → Value W
       → 𝒫₁ (W ⟨ G !⟩) ⊢ᵒ ▷ᵒ 𝒱⟦ gnd⇒ty G ⟧ W
       → 𝒫₁ (W ⟨ G !⟩) ⊢ᵒ ▷ᵒ ℰ⟦ gnd⇒ty H ⟧ N
    Goal (ξξ □⟨ H ?⟩ refl refl r) w ▷𝒱W =
        ⊥-elim (value-irreducible (w 〈 _ 〉) r)
    Goal {W} (ξξ-blame □⟨ H ?⟩ ())
    Goal {W}{G}{G}{W} (collapse{H} w′ refl) w ▷𝒱W =
       ▷→▷ ▷𝒱W (𝒱⇒ℰ Zᵒ)
    Goal {W} (collide x x₁ x₂) w ▷𝒱W = monoᵒ ℰ-blame
```

## Fundamental Lemma

The Fundamental Lemma states that a syntactically well-typed term is
also a semantically well-typed term. Or given how we have defined the
logical relations, it means that a well-typed term satisfies progress
and preservation.

```
fundamental : ∀ {Γ A} → (M : Term)
  → Γ ⊢ M ⦂ A
    ----------
  → Γ ⊨ M ⦂ A
fundamental {Γ} {A} .(` _) (⊢` ∋x) =
    compatibility-var ∋x
fundamental {Γ} {.($ₜ ′ℕ)} .($ (Num _)) (⊢$ (Num n)) =
    compatible-nat
fundamental {Γ} {.($ₜ ′𝔹)} .($ (Bool _)) (⊢$ (Bool b)) =
    compatible-bool
fundamental {Γ} {A} (L · M) (⊢· ⊢L ⊢M) =
    compatible-app{L = L}{M} (fundamental L ⊢L) (fundamental M ⊢M)
fundamental {Γ} {.(_ ⇒ _)} (ƛ N) (⊢ƛ ⊢N) =
    compatible-lambda {N = N} (fundamental N ⊢N)
fundamental {Γ} {.★} (M ⟨ G !⟩) (⊢⟨!⟩ ⊢M) =
    compatible-inject {M = M} (fundamental M ⊢M)
fundamental {Γ} {A} (M ⟨ H ?⟩) (⊢⟨?⟩ ⊢M H) =
    compatible-project {M = M} (fundamental M ⊢M)
fundamental {Γ} {A} .blame ⊢blame = compatible-blame
```

## Proof of Type Safety

For the Type Safety theorem, we need to consider multi-step reduction.
So we first prove the following lemma which states that if
`M —↠ N` and `M` is in `ℰ⟦ A ⟧`, then `N` satisfies progress.
The proof is by induction on the multi-step reduction, using
the preservation part of `ℰ⟦ A ⟧` at each step.

```
sem-type-safety : ∀ {A} → (M N : Term)
  → (r : M —↠ N)
  → # (ℰ⟦ A ⟧ M) (suc (len r))
    ---------------------------------------------
  → Value N  ⊎  (∃[ N′ ] (N —→ N′))  ⊎  N ≡ blame   
sem-type-safety {A} M .M (.M END) (inj₁ 𝒱M , presM) =
    inj₁ (𝒱⇒Value A M 𝒱M)
sem-type-safety {A} M .M (.M END) (inj₂ (inj₁ r) , presM) =
    inj₂ (inj₁ r)
sem-type-safety {A} M .M (.M END) (inj₂ (inj₂ isBlame) , presM) =
    inj₂ (inj₂ refl)
sem-type-safety {A} M N (_—→⟨_⟩_ .M {M′} M→M′ M′→N) (_ , presM) =
    let ℰM′ : # (ℰ⟦ A ⟧ M′) (suc (len M′→N))
        ℰM′ = presM M′ (suc (suc (len M′→N))) ≤-refl M→M′ in
    sem-type-safety M′ N M′→N ℰM′
```

The Type Safety theorem is then a corollary of the Fundamental Lemma
together with the above lemma regarding multi-step reduction.

```
type-safety : ∀ {A} → (M N : Term)
  → [] ⊢ M ⦂ A
  → M —↠ N
    ---------------------------------------------
  → Value N  ⊎  (∃[ N′ ] (N —→ N′))  ⊎  N ≡ blame   
type-safety M N ⊢M M→N =
  let ℰM = ⊢ᵒ-elim ((fundamental M ⊢M) id) (suc (len M→N)) tt in
  sem-type-safety M N M→N ℰM 
```

Using Agda's Induction/Recursion Library

2022-04-07T07:56:00.016-07:00

FastExp

```
module FastExp where
```
# Imports
```
open import Data.Nat
open import Data.Nat.Properties
open import Data.Nat.Induction hiding (rec)
open import Data.Product using (_×_; _,_; Σ; Σ-syntax; ∃; ∃-syntax; proj₁; proj₂)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Induction
open import Relation.Binary.PropositionalEquality

open import Exponents
open import Parity
```


I've been putting off learning how to use Agda's `Induction` library
for some time because I knew it would take a serious effort. (The
documentation is incomplete.) However, I ran into yet another
situation that calls for it, so I finally decided to dive in!

The purpose of the `Induction` library is to provide alternate forms
of induction and recursion, such as complete induction, and it helps
you build your own forms of induction. (Induction and recursion are
the same thing in Agda.) Recall that Agda provides built-in support
for structural recursion, but sometimes you want to define a function
that doesn't fit into that mold. For example, suppose I wanted to
write down the fast exponentiation function. That is, I want to define
a function `fast-exp` such that

    fast-exp n x ≡ x ^ n

Here's a naive attempt to write the function in Agda. I'm using an
auxiliary function `parity` that determines whether a number `n` is
even (`n ≡ 2 * k`) or odd (`n ≡ 1 + 2 * k`).

    fast-exp : ℕ → ℕ → ℕ
    fast-exp zero x = 1
    fast-exp (suc n) x
        with parity n
    ... | inj₁ (k , refl) =  x * fast-exp k (x * x)
    fast-exp (suc n) x
        | inj₂ (k , refl) =  x * x * (fast-exp k (x * x))

Agda's termination checker rejects this program because it can't tell
that the argument `k` in the recursive call to `fast-exp` is smaller
than the input parameter `suc n`. We'll use the `Induction` library to
work around this problem.
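
The recursion itself is sound even though Agda cannot verify it structurally. As a sanity check outside of Agda, here is a sketch of the same algorithm in Python. It is not part of the Agda development, and `divmod(n - 1, 2)` stands in for `parity`, which is an assumption about how `parity` behaves based on the description above. The recursive call is on `k`, which is at most half of `n - 1` and therefore strictly smaller than `n`.

```python
def fast_exp(n: int, x: int) -> int:
    """Compute x ** n, mirroring the naive Agda definition."""
    if n == 0:
        return 1
    # n is the successor of m = n - 1; split m as 2 * k + bit.
    k, bit = divmod(n - 1, 2)
    if bit == 0:
        # m = 2 * k, so n = 1 + 2 * k and x^n = x * (x*x)^k.
        return x * fast_exp(k, x * x)
    # m = 1 + 2 * k, so n = 2 + 2 * k and x^n = x * x * (x*x)^k.
    return x * x * fast_exp(k, x * x)

print(fast_exp(10, 2))
```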

There are two layers to the `Induction` library: a lower-level
layer that resides in the `src/Induction.agda` file of the Agda
standard library and a higher-level layer that is specific to
induction on natural numbers in the `src/Data/Nat/Induction.agda`
file. We'll start by using the higher-level layer to define `fast-exp`
and then we'll take a look at the lower-level layer and build a custom
induction principle for natural numbers and use it to redo our
definition of `fast-exp`.

The `Nat.Induction` library provides support for complete induction
(a.k.a. strong induction), which allows you to make a recursive call
with any natural number smaller than the current one. The main
ingredient that you need to define is a "step" function. This function
looks a lot like the recursive function that you're trying to define,
but it takes an extra parameter, let's name it `rec`, that will give
you access to the recursive call. Here's a first (erroneous) attempt
to define such a step function for `fast-exp`. Notice how the
recursive calls to `fast-exp` above have been replaced by calls to `rec`.

    fe-step zero rec x = 1
    fe-step (suc n) rec x
        with parity n
    ... | inj₁ (k , refl) =  x * rec k (x * x)
    fe-step (suc n) rec x
        | inj₂ (k , refl) =  x * x * (rec k (x * x))

## The `CRec` Type Operator

The above doesn't quite work because the `rec` given to us by the
`Nat.Induction` library is not a function, it is a big tuple with one
element for every natural number smaller than the current one. The
type of this tuple is given by `CRec` in the `Nat.Induction`
library. The `CRec` type operator takes three parameters: a universe
level parameter (ignore that for now), a function that produces the
type for each element in the tuple (given its index from the back),
and the size of the tuple.

For the purposes of defining `fast-exp`, we want each element in the
tuple to be a function, in particular, the fast exponentiation
function that's been partially applied to its first parameter. So the
type of each element should be `ℕ → ℕ`. We define the following
`FERec` abbreviation for the use of `CRec` that matches our needs.

```
FERec : ℕ → Set
FERec n = CRec _ (λ i → ℕ → ℕ) n
```

The next thing we'll need is a way to access the nth element of the
tuple. The Agda standard library has a `projₙ` function for this
purpose, but to use it we'd need to prove that the `CRec` type
operator produces a `Product` type. Instead we'll roll our own `projₙ`
function for `CRec`. It takes a tuple of length `suc k` (so that it's
non-empty), an index `n`, and a proof that `n` is less than `suc k`.

```
projₙ : ∀{ℓ P k} → CRec ℓ P (suc k) → (n : ℕ) → n ≤′ k → P n
projₙ {l} rec n ≤′-refl = proj₁ rec
projₙ {l} rec n (≤′-step n≤k) = projₙ (proj₂ rec) n n≤k
```

## A Step Function for Fast Exponentiation

Next we define the step function for fast exponentiation.  The type
for the `rec` parameter is `FERec n`.  To make the recursive call, we
use `projₙ` to access the appropriate partially-applied version of
fast exponentiation (for a smaller natural number) from the `rec`
tuple. However, to do so, we have to prove that `k` is less than `n`,
the length of the tuple.  (More about this below.)

```
fe-step : (n : ℕ) → FERec n → ℕ → ℕ
fe-step zero rec x = 1
fe-step (suc n′) rec x
    with parity n′
... | inj₁ (k , refl) =  x * projₙ rec k lt (x * x)
      where lt : k ≤′ 2 * k
            lt = ≤⇒≤′ (m≤m+n k _)
fe-step (suc n′) rec x
    | inj₂ (k , refl) =  x * (x * (projₙ rec k lt (x * x)))
      where lt : k ≤′ 1 + (2 * k)
            lt = ≤⇒≤′ (≤-step (m≤m+n k _))
```

Regarding `fe-step`, the case for `n ≡ zero` is straightforward.  In
the case for `n ≡ suc n′`, we have two subcases to consider, when `n′`
is even (`n′ ≡ 2 * k`) and when `n′` is odd (`n′ ≡ 1 + 2 * k`).
For the even subcase, to call `projₙ` we need to show that
`k ≤ 2 * k`, which we do with the theorem `m≤m+n` from `Nat.Properties`.
For the odd subcase, to call `projₙ` we need to show that
`k ≤ 1 + 2 * k`, which we do using `≤-step` and then `m≤m+n`.


## Use `cRec` to Define Fast Exponentiation

The final step to defining `fast-exp` is to apply the `cRec` function
from `Nat.Induction` to our step function, `fe-step`. Similar to the
`CRec` type operator, the `cRec` function also needs to know the type
of the elements in the `rec` tuple, which is `ℕ → ℕ`.

```
fast-exp : ℕ → ℕ → ℕ
fast-exp = cRec (λ _ → (ℕ → ℕ)) fe-step
```
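
The tuple-based style has a loose analogue in Java as a table of closures, one per smaller exponent. The sketch below rests on several assumptions: the names are hypothetical, the table is indexed by exponent (so `rec.get(j)` is the function for exponent `j`, whereas Agda's `CRec` tuple stores the largest index first), and the proofs that an index is in range are replaced by the ordinary bounds check of `List.get`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongUnaryOperator;

class CourseOfValuesSketch {
    // Analogue of fe-step: `rec` holds one function per exponent below n.
    static LongUnaryOperator feStep(int n, List<LongUnaryOperator> rec) {
        if (n == 0) return x -> 1;
        int m = n - 1;
        int k = m / 2;
        LongUnaryOperator smaller = rec.get(k);   // plays the role of the proj_n lookup
        if (m % 2 == 0) return x -> x * smaller.applyAsLong(x * x);
        return x -> x * (x * smaller.applyAsLong(x * x));
    }

    // Analogue of cRecBuilder: the functions for all exponents below n.
    static List<LongUnaryOperator> builder(int n) {
        List<LongUnaryOperator> rec = new ArrayList<>();
        for (int i = 0; i < n; ++i) {
            // At this point rec has exactly i entries, one per exponent below i.
            rec.add(feStep(i, rec));
        }
        return rec;
    }

    // Analogue of cRec: build the table for n, then apply the step once more.
    static long fastExp(int n, long x) {
        return feStep(n, builder(n)).applyAsLong(x);
    }
}
```

Unlike the Agda version, this table is built eagerly, so it allocates a closure for every exponent below `n` even though only a logarithmic number of them are ever called.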

## Proof that Fast Exponentiation is Correct

Of course, the whole point of programming in a proof assistant like
Agda is to prove the correctness of our programs. Let's prove that

    fast-exp n x ≡ x ^ n

which will give us an opportunity to 1) use `Nat.Induction` in an
inductive proof and 2) reason about a function that was defined using
`Nat.Induction`.

We're going to refer to the correctness condition many times, so we
define the following abbreviation for it.

```
fe-ok : ℕ → Set
fe-ok n = ∀ x → fast-exp n x ≡ x ^ n
```

When reasoning about `fast-exp`, we'll need to reason about the tuple
that gets passed to the `rec` parameter of `fe-step`.  It turns out
that the `cRec` function builds that tuple using an auxiliary
function named `cRecBuilder`. So we define the following abbreviation
named `fe-rec` for applying the `cRecBuilder` function to our step
function.

```
fe-rec : (n : ℕ) → CRec _ (λ _ → (ℕ → ℕ)) n
fe-rec n = cRecBuilder (λ _ → (ℕ → ℕ)) fe-step n
```

The nth function in the tuple produced by `fe-rec` is the `fast-exp n`
function.

```
projₙ-fe : ∀ n k x (lt : n ≤′ k) → projₙ (fe-rec (suc k)) n lt x ≡ fast-exp n x
projₙ-fe n n x ≤′-refl = refl
projₙ-fe n (suc k) x (≤′-step lt) = projₙ-fe n k x lt
```

(This proof goes through easily because we do induction on `n ≤′ k`,
which uses the alternate form of less-than-or-equal.  If we had
instead used the standard `≤` in the definition of `projₙ`, this proof
would be more difficult.)

We prove that `fast-exp` is correct using complete induction. Since
induction and recursion are the same thing in Agda, this means the
proof is a (dependently typed) recursive function defined using `cRec`.

Recall that the first argument to `cRec` is the type for the elements
of the `rec` tuple. However, because we are now doing induction, we
should instead think of the `rec` tuple as the induction hypothesis.
In our step function for the proof we will use the parameter name `IH`
instead of `rec`.  Furthermore, because this tuple serves as the
induction hypothesis, its elements will need to be proofs that
`fast-exp` is correct for particular (smaller) exponents. So the type
of the element at position `n` should be the proposition `fe-ok n`.
Also, recall that the `CRec` type operator produces the type of
the tuple. So for our current purposes, `CRec _ fe-ok n` should be the
type of `IH`.

The second argument to `cRec` is a step function, which in this case
needs to construct a proof that `fast-exp n x ≡ x ^ n` for an
arbitrary `n`, given the induction hypothesis `IH`. We define the
`step` function in the `where` clause of our theorem below. The `step`
function mimics the structure of the `fe-step` function, doing case
analysis on the result of `parity n`.  We then do the appropriate
equational reasoning.  The one unusual step is the first one, which
uses the `projₙ-fe` lemma to replace the "raw" recursive call via
`projₙ` (as it appears in the body of `fe-step`) with a call to
`fast-exp`.

```
fast-exp-is-correct : ∀ n x → fast-exp n x ≡ x ^ n
fast-exp-is-correct = cRec fe-ok step
  where
  step : (n : ℕ) → CRec _ fe-ok n → fe-ok n
  step zero IH x = refl
  step (suc n) IH x
      with parity n
  ... | inj₁ (k , refl) =
        begin
          x * projₙ (fe-rec (1 + 2 * k)) k lt (x * x)   ≡⟨ cong (λ X → x * X) (projₙ-fe k (k + (k + zero)) (x * x) lt) ⟩
          x * fast-exp k (x * x)                        ≡⟨ cong (λ X → x * X) (projₙ IH k (≤⇒≤′ (m≤m+n _ _)) (x * x)) ⟩
          x * ((x * x) ^ k)                             ≡⟨ cong (λ X → x * X) (*-distribˡ-^ x x k) ⟩
          x * (x ^ k * x ^ k)                           ≡⟨ cong (λ X → x * (x ^ k * x ^ X)) (sym (+-identityʳ k)) ⟩
          x * (x ^ k * x ^ (k + 0))                     ≡⟨ cong (λ X → x * X) (sym (^-distribˡ-+-* x k (k + zero))) ⟩
          x * (x ^ (2 * k))
        ∎
        where
        open ≡-Reasoning
        lt = ≤⇒≤′ (m≤m+n k (k + zero))
  ... | inj₂ (k , refl) =
        begin
          x * (x * (projₙ (fe-rec (2 + (2 * k))) k lt (x * x))) ≡⟨ cong (λ X → x * (x * X)) (projₙ-fe k (1 + (2 * k)) (x * x) lt) ⟩
          x * (x * (fast-exp k (x * x)))                        ≡⟨ cong (λ X → x * (x * X)) (projₙ IH k (≤⇒≤′ (≤-step (m≤m+n _ _))) (x * x)) ⟩ 
          x * (x * (x * x) ^ k)                                 ≡⟨ cong (λ X → x * (x * X)) (*-distribˡ-^ x x k) ⟩
          x * (x * ((x ^ k) * (x ^ k)))                         ≡⟨ cong (λ X → x * (x * ((x ^ k) * (x ^ X))))(sym(+-identityʳ k)) ⟩
          x * (x * (x ^ k * x ^ (k + 0)))                       ≡⟨ cong (λ X → x * (x * X)) (sym (^-distribˡ-+-* x k (k + zero))) ⟩
          x * (x * (x ^ (2 * k)))
        ∎
        where
        open ≡-Reasoning
        lt = ≤⇒≤′ (≤-step (m≤m+n k (k + zero)))
```
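
Stripped of the `cong` plumbing, the even branch of the proof (where `n′ ≡ 2 * k`) is the following informal chain. This is a paper-style summary rather than something Agda checks. The extra `+-identityʳ` step in the formal proof is there because `2 * k` unfolds to `k + (k + 0)`.

$$
\begin{aligned}
\mathrm{fastexp}(1 + 2k,\, x)
  &= x \cdot \mathrm{fastexp}(k,\, x \cdot x) && \text{(definition of the step function)} \\
  &= x \cdot (x \cdot x)^{k} && \text{(induction hypothesis at } k\text{)} \\
  &= x \cdot (x^{k} \cdot x^{k}) && \text{(power of a product)} \\
  &= x \cdot x^{k + k} && \text{(product of powers)} \\
  &= x \cdot x^{2k} = x^{1 + 2k}
\end{aligned}
$$

The odd branch is the same chain with one more factor of $x$ in front: $x \cdot (x \cdot \mathrm{fastexp}(k,\, x \cdot x)) = x \cdot (x \cdot x^{2k}) = x^{2 + 2k}$.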

## Diving Deeper, Defining Induction using the `Induction` Library

We have seen how to use the facilities in `Nat.Induction` for complete
induction. Next we explore how to use the lower-level library in
`src/Induction.agda` to build our own recursion/induction principle.
In particular, we'll build an alternative form of complete induction
that should be more familiar to everyone, one in which the `rec`
parameter is simply a function, not a tuple of functions.

## The `RecStruct` Type Operator and `SRec`

The `RecStruct` type operator in `src/Induction.agda` produces the
type for the operators like `CRec`. The parameter `A` of `RecStruct`
is for the things you're doing induction on (e.g. `ℕ`) and the
universe levels `ℓ₁` and `ℓ₂` can be ignored for now.

    RecStruct : ∀ {a} → Set a → (ℓ₁ ℓ₂ : Level) → Set _
    RecStruct A ℓ₁ ℓ₂ = Pred A ℓ₁ → Pred A ℓ₂

Recall that `Pred A ℓ₁` is a function from an element of `A` to a type
(an instance of `Set`). In `FERec` above, `(λ i → ℕ → ℕ)` is an
example of something of type `Pred ℕ _`, and we applied `CRec` to this
predicate.

Let us take a look at the definition of `CRec`. It generates a nested
tuple type with `n` elements, one of type `P i` for each `i < n`,
terminated by the unit type `⊤` (so `CRec ℓ P zero` is just unit).

    CRec : ∀ ℓ → RecStruct ℕ ℓ ℓ
    CRec ℓ P zero    = ⊤
    CRec ℓ P (suc n) = P n × CRec ℓ P n

For our alternative form of complete induction, we replace the tuple
with a function. In particular, a function that takes a number `k`, a
proof that `k` is smaller than the current `n`, and produces the
function's result for `k`. So the type of our `rec` parameter is given
by the following `SRec` operator.

```
SRec : ∀ ℓ → RecStruct ℕ ℓ ℓ
SRec ℓ P = λ n → ∀ k → k <′ n → P k
```

(We use `<′` instead of `<` to make it easier to define `sRecBuilder`
and prove `fe-rec-fast-exp₂` in the following.)


## Building the Arguments for the `rec` Parameter

The next step is to build a value of type `SRec` for every natural
number, so that we can pass these values into the `rec` parameter of
the client's step function. Thus, we have to define a function
analogous to the `cRecBuilder` we discussed above. Let us name our
builder function `sRecBuilder`. As a guide, the `Induction` library
defines the `RecursorBuilder` type operator that specifies the type
for builder functions. Its input parameter `Rec` has type `RecStruct`
(e.g. `SRec` is a valid argument to `RecursorBuilder`).

    RecursorBuilder : ∀ {a ℓ₁ ℓ₂} {A : Set a} → RecStruct A ℓ₁ ℓ₂ → Set _
    RecursorBuilder Rec = ∀ P → (Rec P ⊆′ P) → Universal (Rec P)

A builder function takes 1) a predicate `P` that specifies the result
type of the recursive function (just like the `P` in `SRec`), and 2) a
step function that produces a `P` given a `rec` parameter of type `Rec
P`.  The result of the builder function is a value of type `Rec P`,
that is, a value that can be passed into the `rec` parameter of the
client's step function.

So our `sRecBuilder` function has type `RecursorBuilder (SRec ℓ)`.
It takes a predicate `P` and a step function, and its output is of
type `SRec ℓ P`, that is, a function with parameters `n` and `k` and a
proof of `k <′ n` that produces an element of `P k`. We define
`sRecBuilder` using induction on `k <′ n`. We discuss the two cases
below.

```
sRecBuilder : ∀ {ℓ} → RecursorBuilder (SRec ℓ)
sRecBuilder P step .(suc k) k ≤′-refl = step k rec
  where rec = sRecBuilder P step k
sRecBuilder P step (suc n) k (≤′-step lt) = sRecBuilder P step n k lt
```

* In the case of `≤′-refl`, we have `n = suc k`. We need to produce `P k`,
  which we can do with the call `step k rec`, but we need to fill in
  the second argument `rec`. This we do by the recursive call to
  `sRecBuilder`.

* The case for `≤′-step` is even easier. We simply call `sRecBuilder`
  recursively.


## The `build` Function to Finish `sRec` 

The final step in creating our custom induction/recursion principle is
to invoke the `build` function in `src/Induction.agda` to produce
`sRec`. The `build` function takes a builder function, such as
`sRecBuilder`, and produces a `Recursor`, which is a function that,
given a step function, produces a recursive function.

```
sRec : ∀{ℓ} → Recursor (SRec ℓ)
sRec = build sRecBuilder
```

## Revisiting Fast Exponentiation with Strong Recursion/Induction

Analogous to `FERec`, we define the type for the `rec` parameter of
our step function with the below `FERec₂`, but this time use `SRec`
instead of `CRec`.

```
FERec₂ : ℕ → Set
FERec₂ n = SRec _ (λ i → ℕ → ℕ) n
```

The step function `fe-step₂` is similar to `fe-step`, but this time
the `rec` parameter is easier to work with. It's a function that we
can call. It just requires an extra argument with the proof that the
argument `k` is less than `n`.

```
fe-step₂ : (n : ℕ) → FERec₂ n → (ℕ → ℕ)
fe-step₂ zero rec x = 1
fe-step₂ (suc n′) rec x
    with parity n′
... | inj₁ (k , refl) =  x * rec k lt (x * x)
      where lt : k <′ 1 + (2 * k)
            lt = ≤⇒≤′ (s≤s (m≤m+n k _))
... | inj₂ (k , refl) =  x * x * rec k lt (x * x)
      where lt : k <′ 2 + (2 * k)
            lt = ≤⇒≤′ (s≤s (≤-step (m≤m+n k _)))
```

We define our second fast exponentiation using `sRec` and our new step
function `fe-step₂`.

```
fast-exp₂ : ℕ → ℕ → ℕ
fast-exp₂ = sRec (λ _ → (ℕ → ℕ)) fe-step₂
```
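
For comparison, here is a rough Java analogue of this function-based style. It is a sketch under stated assumptions: `Step`, `sRec`, `feStep2`, and `fastExp2` are hypothetical Java names chosen to echo the Agda ones, and the proof of `k <′ n` is replaced by a runtime check that throws when the step function asks for an exponent that is not smaller.

```java
import java.util.function.IntFunction;
import java.util.function.LongUnaryOperator;

class StrongRecursionSketch {
    // A step function receives n and a `rec` that may only be used below n.
    interface Step<R> {
        R apply(int n, IntFunction<R> rec);
    }

    // Analogue of sRec: `rec` is an ordinary function of k, guarded by a check.
    static <R> R sRec(Step<R> step, int n) {
        IntFunction<R> rec = k -> {
            if (k < 0 || k >= n) {
                throw new IllegalArgumentException(k + " is not below " + n);
            }
            return sRec(step, k);
        };
        return step.apply(n, rec);
    }

    // Analogue of fe-step2: the recursive call is just a function call.
    static LongUnaryOperator feStep2(int n, IntFunction<LongUnaryOperator> rec) {
        if (n == 0) return x -> 1;
        int m = n - 1;
        int k = m / 2;
        if (m % 2 == 0) return x -> x * rec.apply(k).applyAsLong(x * x);
        return x -> x * x * rec.apply(k).applyAsLong(x * x);
    }

    static long fastExp2(int n, long x) {
        return StrongRecursionSketch.<LongUnaryOperator>sRec(StrongRecursionSketch::feStep2, n)
                .applyAsLong(x);
    }
}
```

Because `rec` is computed on demand here, only the exponents that are actually requested get evaluated.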

## Revisiting the Proof that Fast Exponentiation is Correct

Let us see how this alternate definition of fast exponentiation
affects our proof of correctness. (TLDR: it only changes one lemma.)
We define an abbreviation for the correctness condition as follows.

```
fe-ok₂ : ℕ → Set
fe-ok₂ n = ∀ x → fast-exp₂ n x ≡ x ^ n
```

And another abbreviation for calling `sRecBuilder` with the new step
function `fe-step₂`.

```
fe-rec₂ : (n : ℕ) → SRec _ (λ _ → (ℕ → ℕ)) n
fe-rec₂ n = sRecBuilder (λ _ → (ℕ → ℕ)) fe-step₂ n
```

Last time we proved the lemma `projₙ-fe` to relate the "raw" recursive
call to a call to `fast-exp`. Here we need a similar lemma, but it is
somewhat simpler because we no longer need to use `projₙ`.  So we just
need to relate `fe-rec₂` to `fast-exp₂`.

```
fe-rec-fast-exp₂ : ∀ k n x (lt : k <′ suc n) → fe-rec₂ (suc n) k lt x ≡ fast-exp₂ k x
fe-rec-fast-exp₂ k .k x ≤′-refl = refl
fe-rec-fast-exp₂ k (suc n′) x (≤′-step lt) = fe-rec-fast-exp₂ k n′ x lt
```

We prove that `fast-exp₂` is correct, again with a proof by complete
induction. The only difference is that the first step in the equational
reasoning is to use `fe-rec-fast-exp₂` instead of `projₙ-fe`.

```
fast-exp₂-is-correct : ∀ n x → fast-exp₂ n x ≡ x ^ n
fast-exp₂-is-correct = cRec fe-ok₂ step
  where
  step : (n : ℕ) → CRec _ fe-ok₂ n → fe-ok₂ n
  step zero IH x = refl
  step (suc n′) IH x
      with parity n′
  ... | inj₁ (k , refl) =
      begin
        x * fe-rec₂ (suc n′) k lt (x * x) ≡⟨ cong(λ X → x * X) (fe-rec-fast-exp₂ k n′ (x * x) lt) ⟩
        x * fast-exp₂ k (x * x)           ≡⟨ cong(λ X → x * X) (projₙ IH k lt₂ (x * x)) ⟩
        x * (x * x) ^ k                   ≡⟨ cong(λ X → x * X) (*-distribˡ-^ x x k) ⟩
        x * ((x ^ k) * (x ^ k))           ≡⟨ cong(λ X → x * (x ^ k * x ^ X)) (sym(+-identityʳ k)) ⟩
        x * (x ^ k * x ^ (k + 0))         ≡⟨ cong(λ X → x * X) (sym (^-distribˡ-+-* x k (k + zero))) ⟩
        x * x ^ (2 * k)
      ∎
      where
      open ≡-Reasoning
      lt = s≤′s (≤⇒≤′ (m≤m+n k (k + zero)))
      lt₂ = ≤⇒≤′ (m≤m+n k (k + zero))
  ... | inj₂ (k , refl) =
      begin
      (x * x) * fe-rec₂ (2 + 2 * k) k lt (x * x) ≡⟨ cong(λ X → (x * x) * X)(fe-rec-fast-exp₂ k (1 + (2 * k)) (x * x) lt) ⟩
      (x * x) * fast-exp₂ k (x * x)              ≡⟨ cong(λ X → (x * x) * X)(projₙ IH k lt₂ (x * x)) ⟩
      (x * x) * (x * x) ^ k                      ≡⟨ cong(λ X → (x * x) * X) (*-distribˡ-^ x x k) ⟩
      (x * x) * (x ^ k  *  x ^ k)                ≡⟨ cong(λ X → (x * x) * (x ^ k  *  x ^ X)) (sym(+-identityʳ k)) ⟩
      (x * x) * (x ^ k  *  x ^ (k + 0))          ≡⟨ cong(λ X → (x * x) * X) (sym (^-distribˡ-+-* x k (k + zero))) ⟩
      (x * x) * (x ^ (2 * k))                    ≡⟨ *-assoc x _ _ ⟩
      x * (x * (x ^ (2 * k)))
      ∎
      where
      open ≡-Reasoning
      lt = s≤′s (≤⇒≤′ (≤-step (m≤m+n k (k + zero))))
      lt₂ = (≤⇒≤′ (≤-step (m≤m+n k (k + zero))))
```

## A Parting Thought

As I was fumbling around and bumping into dead ends on my way to
writing the above definitions and proofs, perhaps the trickiest part
of using the `Induction` library was stating and proving the lemmas
that relate the "raw" recursive call to the recursive function, e.g.,
`projₙ-fe` and `fe-rec-fast-exp₂`. I wonder whether the `Induction`
library could somehow also provide a general lemma for that, perhaps
with help from the client.

Strongly Connected Components and Kosaraju's Algorithm

2021-03-27T13:43:00.007-07:00

Some presentations of Kosaraju’s Algorithm don’t provide a detailed explanation of why the algorithm works. Here’s my attempt to explain it.

The story begins with depth-first search (DFS). To review, DFS goes deeper at each step, following an out-edge from the current vertex to a never-before-seen vertex. If there are no out-edges to never-before-seen vertices, then the search backtracks to the last visited vertex with out-edges to never-before-seen vertices and continues from there.

The following graph shows the result of DFS on a small graph. The edges traversed by the DFS are marked in green and form a depth-first tree.

Example of Depth First Search.

We can categorize the edges of the graph with respect to the depth-first tree in the following way:

tree edge: an edge on the tree, e.g., g → c in the graph above.
back edge: an edge that connects a descendant to an ancestor with respect to the tree, e.g., f → g.
forward edge: an edge that connects an ancestor to a descendant with respect to the tree, e.g., f → e.
cross edge: all other edges, e.g., k → l.

Graph with edges categorized by a Depth-First Search. The tree edges are in green, back edges in red, forward edges in blue, and cross edges in black.

Theorem A graph has a cycle if and only if there is a back edge.

As we shall see, it is useful to record timestamps on a vertex when it is first discovered during a DFS and when it is finished (after visiting all of its descendants).

A graph with discover and finish times from a Depth-First Search.

Theorem

If u → v is a tree, forward, or cross edge, then finish_time[v] < finish_time[u].
If u → v is a back edge, then finish_time[u] < finish_time[v].
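
The classification and the finish-time theorem can be checked mechanically on small graphs. The following is a sketch that is separate from the DFS code in this post: the class and method names are hypothetical, it assumes a graph without parallel edges (so that `parent[v] == u` identifies the tree edge into `v`), and it classifies the edges after the search from the discover and finish times rather than during it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EdgeKindsSketch {
    // Records discover/finish times and tree parents for everything reachable from u.
    static void visit(List<List<Integer>> G, int u, int[] discover, int[] finish,
                      int[] parent, int[] clock) {
        discover[u] = ++clock[0];
        for (int v : G.get(u)) {
            if (discover[v] == 0) {          // 0 means "not yet discovered"
                parent[v] = u;
                visit(G, v, discover, finish, parent, clock);
            }
        }
        finish[u] = ++clock[0];
    }

    // Returns one string "u->v:kind" per edge, in adjacency-list order.
    static List<String> classify(List<List<Integer>> G) {
        int n = G.size();
        int[] discover = new int[n], finish = new int[n], parent = new int[n];
        int[] clock = {0};
        Arrays.fill(parent, -1);             // -1 marks the root of a DFS tree
        for (int u = 0; u != n; ++u) {
            if (discover[u] == 0) visit(G, u, discover, finish, parent, clock);
        }
        List<String> kinds = new ArrayList<>();
        for (int u = 0; u != n; ++u) {
            for (int v : G.get(u)) {
                String kind;
                if (parent[v] == u) {
                    kind = "tree";
                } else if (discover[v] <= discover[u] && finish[u] <= finish[v]) {
                    kind = "back";           // v is an ancestor of u (or u itself)
                } else if (discover[u] < discover[v] && finish[v] < finish[u]) {
                    kind = "forward";        // v is a descendant of u, reached another way
                } else {
                    kind = "cross";
                }
                kinds.add(u + "->" + v + ":" + kind);
            }
        }
        return kinds;
    }
}
```

On the graph with edges 0 → 1, 0 → 2, 1 → 2, 2 → 0, and 3 → 2, this labels 2 → 0 as a back edge, 0 → 2 as a forward edge, and 3 → 2 as a cross edge. By the first theorem above, a graph has a cycle exactly when some edge comes out as `back`.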

For reference, here is the code for DFS and its auxiliary function DFS_visit.

static void DFS_visit(List<List<Integer>> G, Integer u, 
                      ArrayList<Integer> parent,
                      ArrayList<Boolean> visited, List<Integer> finish) {

    visited.set(u, true);
    for (Integer v : G.get(u)) {
        if (! visited.get(v)) {
            parent.set(v, u);
            DFS_visit(G, v, parent, visited, finish);
        }
    }
    finish.add(u);
}

static List<Integer> DFS(List<List<Integer>> G) {
    ArrayList<Integer> parent = new ArrayList<>();
    ArrayList<Boolean> visited = new ArrayList<>();
    for (int u = 0; u != G.size(); ++u) {
      parent.add(u);
      visited.add(false);
    }
    ArrayList<Integer> finish = new ArrayList<>();
    for (int u = 0; u != G.size(); ++u) {
        visited.set(u, false);
    }
    for (int u = 0; u != G.size(); ++u) {
        if (! visited.get(u))
            DFS_visit(G, u, parent, visited, finish);
    }
    return finish;
}

Now we turn to discussing the problem of computing strongly connected components.

Definition A strongly connected component is a maximal subset of the vertices in a graph such that every vertex in the subset is reachable from all the other vertices in the subset.

For example, the following graph

has these strongly connected components:

Definition The component graph C of another digraph G has 1) a vertex for each SCC in G. For each vertex u in C, we write SCC(u) for its SCC in G. 2) an edge from u to v if there is an edge from any vertex in SCC(u) to any vertex in SCC(v).

Here is the component graph of the example.

Theorem A component graph is acyclic.

Otherwise, the vertices on the cycle would represent strongly connected components that are not maximal: they could have been combined into a larger SCC.

Kosaraju’s Algorithm for SCC

Suppose we do a DFS_visit from a random node in the graph G. We’ll visit all of the other nodes in its SCC (that’s good) but we may also visit nodes in other SCCs (that’s bad).

How can we cause DFS_visit to stop before visiting nodes in other SCCs?

If we run DFS_visit on a node in an SCC that has no out-edges to other SCCs, then we’d just visit the nodes in that SCC and no other. We could then remove those nodes from the graph and repeat.

That’s a lot like a topological ordering on the component graph C, but with out-edges instead of in-edges. So what we need is a topological ordering on the transposed component graph C^T.

Definition The transpose of a graph G, written G^T, has the same vertices as G but the edges are reversed.
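
The SCC function at the end of this post calls a `transpose` helper that is not shown. One possible definition over the same adjacency-list representation is sketched below; the class name is hypothetical and the body is an assumption about what such a helper would do, based only on the definition above.

```java
import java.util.ArrayList;
import java.util.List;

class TransposeSketch {
    // Builds G^T: same vertices, and an edge v -> u for every edge u -> v of G.
    static List<List<Integer>> transpose(List<List<Integer>> G) {
        List<List<Integer>> GT = new ArrayList<>();
        for (int u = 0; u != G.size(); ++u) {
            GT.add(new ArrayList<>());
        }
        for (int u = 0; u != G.size(); ++u) {
            for (Integer v : G.get(u)) {
                GT.get(v).add(u);
            }
        }
        return GT;
    }
}
```

Transposing twice gives back the original edge set, though the order of the entries within each adjacency list may differ.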

For example,

D, B, A, E

is a topological ordering of C^T.

But we don’t have C yet… that’s what we’re trying to compute!

Recall that DFS finish times are related to topological ordering. We can apply DFS to G^T to obtain finish times. Here’s the transposed graph of the example with the root and edges of each DFS tree highlighted in green.

and here are the vertices ordered by finish time:

3, 0, 1, 5, 4, 2

The vertex that finished last (vertex 2) must be in a SCC (D) that does not have any in-edges in C^T. Why is that? If there were an in-edge from another SCC, then the source of that in-edge would have finished later, but that contradicts vertex 2 being the last to finish. The only way the source of the in-edge could finish earlier would be if the in-edge were a back edge, but then the two vertices would be in a cycle and in the same SCC, which contradicts them being in different SCCs.

Since the SCC (D) does not have any in-edges in C^T, it doesn’t have any out-edges in C.

So running DFS_visit on vertex 2 in the original graph will only reach other vertices in its SCC (D). DFS_visit will mark all of those vertices as visited, so later runs of DFS_visit will ignore them.

We continue running DFS_visit on each vertex according to the reverse order of finish time (i.e. 4,5,1,0,3). Each tree in the resulting DFS forest is a SCC.

So here’s the algorithm:

Transpose the graph G to obtain G^T.
Apply DFS to G^T to obtain the order in which the vertices finished.
For each vertex u in the reversed finish list, apply DFS_visit to u in G.
Each of the resulting DFS trees is a SCC. (The trees are encoded in the parent array.)

(The above differs from the standard presentation of Kosaraju’s algorithm, which instead applies DFS to G to get an ordering, and then applies DFS_visit to G^T repeatedly to get the DFS forest in which each tree is an SCC.)

static ArrayList<Integer> SCC(List<List<Integer>> G) {
  List<List<Integer>> GT = transpose(G);
  List<Integer> finished = DFS(GT);
  Collections.reverse(finished);

  ArrayList<Integer> parent = new ArrayList<>();
  ArrayList<Boolean> visited = new ArrayList<>();
  for (int u = 0; u != G.size(); ++u) {
    parent.add(u);
    visited.add(false);
  }
  ArrayList<Integer> ignore = new ArrayList<>();
  for (Integer u : finished) {
    if (! visited.get(u))
      DFS_visit(G, u, parent, visited, ignore);
  }
  return parent;
}
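
To turn the returned parent array into explicit components, one option is to follow the parent pointers up to the root of each DFS tree. The sketch below assumes the convention of the code above, in which a root is its own parent; `ComponentsSketch` and `roots` are hypothetical names.

```java
import java.util.List;

class ComponentsSketch {
    // For each vertex, walk up the DFS tree until reaching a vertex that is its own parent.
    static int[] roots(List<Integer> parent) {
        int[] root = new int[parent.size()];
        for (int u = 0; u != parent.size(); ++u) {
            int r = u;
            while (parent.get(r) != r) {
                r = parent.get(r);
            }
            root[u] = r;
        }
        return root;
    }
}
```

Two vertices are then in the same SCC exactly when they have the same root.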

Type Safety in Two Easy Lemmas

2020-07-10T13:11:00.000-07:00

Wow, it's been seven years already since I blogged about Type Safety in Three Easy Lemmas. Time flies! In that blog post I showed how to prove type safety of a simple language whose semantics was specified by a definitional interpreter. I still like that approach, and it has proved useful to other researchers on much larger projects such as the verified CakeML compiler.

In the meantime, I've learned about the Agda proof assistant thanks to the book Programming Language Foundations in Agda (PLFA) and I've become excited by Agda's abstraction mechanisms that enable proof reuse. I'm working on an Agda library for reusable programming language metatheory, called abstract-binding-trees. As the name suggests, it represents abstract syntax trees using Robert Harper's notion of abstract binding trees (ABT), that is, trees that are enhanced to know about variable bindings and variable occurrences (See the book Practical Foundations for Programming Languages). My library provides a suite of useful functions on abstract binding trees, such as substitution, and theorems about those functions. The neat thing about these theorems is that they automatically apply to any language whose grammar is built using abstract binding trees!

In this blog post I'll prove type safety of the simply-typed lambda calculus (STLC) with respect to a semantics specified in the way that is standard for PL theory, using a reduction semantics. The proof includes just two easy lemmas: progress and preservation. Normally a proof via progress and preservation also requires quite a few technical lemmas about substitution, but in this case we get those lemmas for free thanks to the abstract-binding-trees library.

This blog post is a literate Agda file, so the text will be interspersed with the Agda code that defines the STLC and proves type safety.

module examples.BlogTypeSafetyTwoEasy where

We'll be making use of the following items from the Agda standard library.

open import Data.List using (List; []; _∷_; length)
open import Data.Nat using (ℕ; zero; suc)
open import Data.Product using (_×_; proj₁; proj₂) renaming (_,_ to ⟨_,_⟩ )
open import Data.Unit.Polymorphic using (⊤; tt)
open import Data.Vec using (Vec) renaming ([] to []̌; _∷_ to _∷̌_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; sym)

Syntax of the STLC

The abstract-binding-trees library provides a module named Syntax that provides facilities for creating abstract binding trees.

open import Syntax

An abstract binding tree ABT consists of two kinds of nodes:

Variables: A variable node is a leaf (no children) and stores the de Bruijn index for the variable.
Operators: An operator node is tagged with the kind of operator and it has zero or more children, depending on the kind of operator.

The ABT data type is parameterized by the kinds of operators and their signatures, which specifies things like the number of child nodes for each kind of operator. To specify the operators, you create a data type definition with one constructor for each kind of operator. For the STLC the operators are lambda abstraction and application.

data Op : Set where
  op-lam : Op
  op-app : Op

To specify the operator signatures, write a function that maps the operators to a list of the Sig data type. The length of the list says the number of children nodes and the Sig controls changes in variable scoping for the child. The Sig data type is defined by the abstract-binding-trees library as follows:

data Sig : Set where
  ■ : Sig
  ν : Sig → Sig
  ∁ : Sig → Sig

The ν brings a variable into scope. The ∁ clears the scope of the child, so that the child does not have access to the surrounding lexical scope. The ■ terminates the changes in scope.

For the STLC, the signature function is defined as follows.

sig : Op → List Sig
sig op-lam = (ν ■) ∷ []
sig op-app = ■ ∷ ■ ∷ []

With Op and sig defined, we can import the abstract binding tree data type ABT from the Syntax library. We choose to rename it to Term.

open Syntax.OpSig Op sig renaming (ABT to Term)

The raw abstract binding trees are verbose to deal with, so we use Agda pattern synonyms to obtain syntax that is closer to the pen-and-paper STLC. We write ƛ N for a lambda abstraction with body N and we write L · M for the application of the function produced by L to the argument produced by M.

pattern ƛ N  = op-lam ⦅ cons (bind (ast N)) nil ⦆

infixl 7  _·_
pattern _·_ L M = op-app ⦅ cons (ast L) (cons (ast M) nil) ⦆

Reduction Semantics

We define the reduction semantics for the STLC in the usual way, with several congruence rules (the ξ's) and the β rule for function application. In the β rule, we use the substitution function defined in the abstract-binding-trees library, writing N [ M ] for replacing all the occurrences of de Bruijn index 0 inside N with the term M.

infix 2 _—→_

data _—→_ : Term → Term → Set where

  ξ-·₁ : ∀ {L L′ M : Term}
    → L —→ L′
      ---------------
    → L · M —→ L′ · M

  ξ-·₂ : ∀ {L M M′ : Term}
    → M —→ M′
      ---------------
    → L · M —→ L · M′

  ξ-ƛ : ∀ {N N′ : Term}
    → N —→ N′
      ---------------
    → (ƛ N) —→ (ƛ N′)

  β-ƛ : ∀ {N M : Term}
      --------------------
    → (ƛ N) · M —→ N [ M ]
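
To make the substitution N [ M ] in the β rule concrete, here is a small sketch of de Bruijn-indexed terms in Java. It uses the textbook shift-and-substitute formulation and is not the implementation in the abstract-binding-trees library; all class and method names are hypothetical.

```java
class DeBruijnSketch {
    static abstract class Term {}
    static final class Var extends Term {
        final int index;
        Var(int index) { this.index = index; }
    }
    static final class Lam extends Term {
        final Term body;
        Lam(Term body) { this.body = body; }
    }
    static final class App extends Term {
        final Term fun, arg;
        App(Term fun, Term arg) { this.fun = fun; this.arg = arg; }
    }

    // Adds d to every variable of t whose index is at least cutoff (the free ones).
    static Term shift(int d, int cutoff, Term t) {
        if (t instanceof Var) {
            int k = ((Var) t).index;
            return new Var(k >= cutoff ? k + d : k);
        }
        if (t instanceof Lam) return new Lam(shift(d, cutoff + 1, ((Lam) t).body));
        App a = (App) t;
        return new App(shift(d, cutoff, a.fun), shift(d, cutoff, a.arg));
    }

    // Replaces variable j in t by s, shifting s each time we pass under a binder.
    static Term subst(int j, Term s, Term t) {
        if (t instanceof Var) return ((Var) t).index == j ? s : t;
        if (t instanceof Lam) return new Lam(subst(j + 1, shift(1, 0, s), ((Lam) t).body));
        App a = (App) t;
        return new App(subst(j, s, a.fun), subst(j, s, a.arg));
    }

    // N [ M ]: substitute M for index 0 in N, then lower the remaining free indices.
    static Term substZero(Term n, Term m) {
        return shift(-1, 0, subst(0, shift(1, 0, m), n));
    }

    // One beta step at the root of the term, or null if the root is not a redex.
    static Term beta(Term t) {
        if (t instanceof App && ((App) t).fun instanceof Lam) {
            App a = (App) t;
            return substZero(((Lam) a.fun).body, a.arg);
        }
        return null;
    }

    static String show(Term t) {
        if (t instanceof Var) return String.valueOf(((Var) t).index);
        if (t instanceof Lam) return "(lam " + show(((Lam) t).body) + ")";
        App a = (App) t;
        return "(" + show(a.fun) + " " + show(a.arg) + ")";
    }
}
```

For instance, applying the identity function to itself steps to the identity function, and applying `(lam (lam 1))` to the free variable `0` steps to `(lam 1)`, because the free variable's index goes up by one under the remaining binder.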

Type System

To make use of the theorems in the abstract-binding-trees library, we need to use its approach to defining type systems. Instead of defining the whole type system ourselves using an Agda data type, we instead specify 1) the types and 2) the side-conditions for each typing rule.

For STLC, we have function types, written A ⇒ B, and the bottom type Bot.

data Type : Set where
  Bot   : Type
  _⇒_   : Type → Type → Type

The library asks that we specify a side condition for the variable rule that mediates the variable's type A in the environment with the expected type B, for which we define the following predicate 𝑉. For the STLC we simply require that A ≡ B.

𝑉 : List Type → Var → Type → Type → Set
𝑉 Γ x A B = A ≡ B

Next we define the predicate 𝑃 that specifies the side conditions for all the other syntax nodes. The definition of 𝑃 includes one line for each operator. The Vec parameter contains the types of the child nodes. The BTypes parameter contains the types of the bound variables. The last Type parameter is the type assigned to the current node. So for lambda abstractions (op-lam), the body has type B, the lambda's bound variable has type A, and we require that the type C of the lambda is a function type from A to B, that is, C ≡ A ⇒ B. For application (op-app), the function has type C, the argument has type A, and the result type is B provided that C is a function type from A to B, that is, C ≡ A ⇒ B.

𝑃 : (op : Op) → Vec Type (length (sig op)) → BTypes Type (sig op) → Type → Set
𝑃 op-lam (B ∷̌ []̌) ⟨ ⟨ A , tt ⟩ , tt ⟩ C = C ≡ A ⇒ B
𝑃 op-app (C ∷̌ A ∷̌ []̌) ⟨ tt , ⟨ tt , tt ⟩ ⟩ B = C ≡ A ⇒ B

We import the ABTPredicate module, using our definitions of 𝑉 and 𝑃, to obtain the type system for the STLC.

open import ABTPredicate Op sig 𝑉 𝑃

The raw typing rules are verbose, so we again use Agda's pattern synonyms to create abbreviations to match the rule names in PLFA.

pattern ⊢` ∋x = var-p ∋x refl
pattern ⊢ƛ ⊢N eq = op-p {op = op-lam} (cons-p (bind-p (ast-p ⊢N)) nil-p) eq
pattern ⊢· ⊢L ⊢M eq = op-p {op = op-app}
                           (cons-p (ast-p ⊢L) (cons-p (ast-p ⊢M) nil-p)) eq

Proof of Type Safety

We prove type safety with two lemmas: progress and preservation.

Proof of Progress

The progress lemma states that every closed, well-typed term is either a value (so it's finished computing) or it can reduce.

In the STLC, lambda abstractions are values.

data Value : Term → Set where

  V-ƛ : ∀ {N : Term}
      --------------
    → Value (ƛ N)

Following PLFA, we define an auxiliary data type to express the conclusion of the progress lemma.

data Progress (M : Term) : Set where

  done :
      Value M
      ----------
    → Progress M

  step : ∀ {N}
    → M —→ N
      ----------
    → Progress M

The proof of progress is by induction on the typing derivation. The variable case is vacuous because M is closed (well typed in an empty environment). In the lambda case, we're done. Regarding an application L · M, the induction hypothesis tells us that term L either takes a step or is already a lambda abstraction. In the former case, the whole application reduces using the congruence rule ξ-·₁. In the latter case, the whole application reduces using β reduction.

progress : ∀ {M A}
  → [] ⊢ M ⦂ A
    ----------
  → Progress M
progress (⊢` ())
progress (⊢ƛ ⊢N _)                          =  done V-ƛ
progress (⊢· ⊢L ⊢M _)
    with progress ⊢L
... | step L—→L′                            =  step (ξ-·₁ L—→L′)
... | done V-ƛ                              =  step β-ƛ

As you can see, to prove progress we didn't need help from the abstract-binding-trees library.

Proof of Preservation

The preservation lemma says that if a well-typed term reduces to another term, then that term is also well typed. The proof is by induction on the derivation of the reduction. The only interesting case is the one for β reduction:

(ƛ N) · M —→ N [ M ]

We know that

(A ∷ Γ) ⊢ N ⦂ B
Γ ⊢ M ⦂ A

and we need to prove that

Γ ⊢ N [ M ] ⦂ B

This requires the lemma that substitution preserves typing, which is provided in the SubstPreserve module of the abstract-binding-trees library. This module places four restrictions on 𝑉, for which we provide the proofs (λ x → refl), etc.

open import SubstPreserve Op sig Type 𝑉 𝑃 (λ x → refl) (λ { refl refl → refl })
    (λ x → x) (λ { refl ⊢M → ⊢M }) using (preserve-substitution)

So here is the proof of preservation.

preserve : ∀ {Γ M N A}
  → Γ ⊢ M ⦂ A
  → M —→ N
    ----------
  → Γ ⊢ N ⦂ A
preserve (⊢· ⊢L ⊢M refl) (ξ-·₁ L—→L′) = ⊢· (preserve ⊢L L—→L′) ⊢M refl
preserve (⊢· ⊢L ⊢M refl) (ξ-·₂ M—→M′) = ⊢· ⊢L (preserve ⊢M M—→M′) refl
preserve (⊢ƛ ⊢M refl) (ξ-ƛ M—→N) = ⊢ƛ (preserve ⊢M M—→N) refl
preserve {M = (ƛ N) · M} (⊢· (⊢ƛ ⊢N refl) ⊢M refl) β-ƛ =
    preserve-substitution N M ⊢N ⊢M

Thus we conclude the proof of type safety, having only needed to prove two lemmas, progress and preservation. Thanks to the abstract-binding-trees library, we did not need to prove that substitution preserves types nor any of the many technical lemmas that it depends on.

Reading list for getting started on Gradual Typing

2018-09-12T05:51:00.001-07:00

Which papers would I recommend for getting started on understanding the research on gradual typing? That's a hard question because there are a lot of papers to choose from and, as research papers, their primary goal was not to give a good introduction, but instead to describe some scientific contribution. I really ought to write a proper introduction, but in the meantime, here's my choice of a few papers to get started.

Refined Criteria for Gradual Typing
This paper does a decent job of surveying research related to gradual typing and situating it with respect to other areas of research in programming languages and type systems. The paper includes a modern and, what I would deem canonical, specification of the Gradually Typed Lambda Calculus (GTLC). Finally, the paper gives formal criteria for what it means for a language to be gradually typed, including the gradual guarantee.
Blame and Coercion: Together Again for the First Time (alternative location)
The runtime semantics of a gradually typed language is typically given in two parts: 1) a translation to a cast calculus and 2) an operational semantics for the cast calculus. Nowadays, I recommend using coercions to express casts because they help to constrain the design space in a good way, they are easily extended to handle blame tracking, and they can be compressed to ensure space efficiency (time too!). This paper defines an easy-to-understand coercion calculus $\lambda C$ and a space-efficient calculus $\lambda S$, proves that they are equivalent to the standard cast calculus $\lambda B$, and also reviews the blame safety theorem.
Abstracting Gradual Typing (alternative location)
This paper presents a general framework based on abstract interpretation for understanding gradual typing and for extending gradual typing to handle languages that make use of other predicates on types, such as subtyping. The framework provides guidance for how to define the consistency relation and for how to derive an operational semantics.

After reading the above papers, there's plenty more to enjoy! See the bibliography maintained by Sam Tobin-Hochstadt.

Intersection Types, Sub-formula Property, and the Functional Character of the Lambda Calculus

2018-08-09T07:32:00.000-07:00

Last December I proved that my graph model of the lambda calculus, once suitably restricted, is deterministic. That is, I defined a notion of consistency between values, written $v_1 \sim v_2$, and showed that any two outputs of the same program are consistent.
Theorem (Determinism)
If $v \in {\mathcal{E}{[\![ e ]\!]}}\rho$, $v' \in {\mathcal{E}{[\![ e ]\!]}}\rho'$, and $\rho \sim \rho'$, then $v \sim v'$.
Recall that values are integers or finite relations; consistency for integers is equality and consistency for relations means mapping consistent inputs to consistent outputs. I then restricted values to be well formed, meaning that they must be consistent with themselves (and similarly for their parts).

Having proved the Determinism Theorem, I thought it would be straightforward to prove the following related theorem about the join of two values.
Theorem (Join)
If $v \in {\mathcal{E}{[\![ e ]\!]}}\rho$, $v' \in {\mathcal{E}{[\![ e ]\!]}}\rho'$, $\rho$ is well formed, $\rho'$ is well formed, and $\rho \sim \rho'$,
then $v \sqcup v' \in {\mathcal{E}{[\![ e ]\!]}}(\rho\sqcup\rho')$.
I am particularly interested in this theorem because $\beta$-equality can be obtained as a corollary. \[{\mathcal{E}{[\![ ({\lambda x.\,}e){\;}e' ]\!]}}\rho = {\mathcal{E}{[\![ [x{:=}e']e ]\!]}}\rho\] This would enable the modeling of the call-by-name $\lambda$-calculus and it would also enable the use of $\beta$-equality in a call-by-value setting when $e'$ is terminating (instead of restricting $e'$ to be a syntactic value).

Recall that we have defined a partial order $\sqsubseteq$ on values, and that, in most partial orders, there is a close connection between notions of consistency and least upper bounds (joins). One typically has that $v \sim v'$ iff $v \sqcup v'$ exists. So my thinking was that it should be easy to adapt my proof of the Determinism Theorem to prove the Join Theorem, and I set out hoping to finish in a couple of weeks. Hah! Here we are 8 months later and the proof is complete; it was a long journey that ended up depending on a result by Olivier Laurent that was published just this summer, concerning intersection types, the sub-formula property, and cut elimination. In this blog post I’ll try to recount the journey and describe the proof, hopefully remembering the challenges and motivations. Here is a tarball of the mechanization in Isabelle and in pdf form.

Many of the challenges revolved around the definitions of $\sqsubseteq$, consistency, and $\sqcup$. Given that I already had definitions for $\sqsubseteq$ and consistency, the obvious thing to try was to define $\sqcup$ such that it would be the least upper bound with respect to $\sqsubseteq$. So I arrived at this partial function: \[\begin{aligned} n \sqcup n &= n \\ f_1 \sqcup f_2 &= f_1 \cup f_2\end{aligned}\] Now suppose we prove the Join Theorem by induction on $e$ and consider the case for application: $e = (e_1 {\;}e_2)$. From $v \in {\mathcal{E}{[\![ e_1 {\;}e_2 ]\!]}}\rho$ and $v' \in {\mathcal{E}{[\![ e_1 {\;}e_2 ]\!]}}\rho'$ we have

$f \in {\mathcal{E}{[\![ e_1 ]\!]}}\rho$, $v_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\rho$, $v_3 \mapsto v_4 \in f$, $v_3 \sqsubseteq v_2$, and $v \sqsubseteq v_4$ for some $f, v_2, v_3, v_4$.
$f' \in {\mathcal{E}{[\![ e_1 ]\!]}}\rho'$, $v'_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\rho'$, $v'_3 \mapsto v'_4 \in f'$, $v'_3 \sqsubseteq v'_2$, and $v' \sqsubseteq v'_4$ for some $f', v'_2, v'_3, v'_4$.

By the induction hypothesis we have $f \sqcup f' \in {\mathcal{E}{[\![ e_1 ]\!]}}(\rho\sqcup\rho')$ and $v_2 \sqcup v'_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}(\rho\sqcup\rho')$. We need to show that, for some $v''_3$ and $v''_4$, \[v''_3 \mapsto v''_4 \in f \sqcup f' \qquad v''_3 \sqsubseteq v_2 \sqcup v'_2 \qquad v \sqcup v' \sqsubseteq v''_4\] But here we have a problem. Given our definition of $\sqcup$ in terms of set union, there won’t necessarily be a single entry in $f \sqcup f'$ that combines the information from both $v_3 \mapsto v_4$ and $v'_3 \mapsto v'_4$. After all, $f \sqcup f'$ contains all the entries of $f$ and all the entries of $f'$, but the set union operation does not mix together information from entries in $f$ and $f'$ to form new entries.
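
To make the obstacle concrete, here is a small Python sketch (not taken from the Isabelle mechanization). It assumes an encoding in which integers are Python `int`s and finite relations are `frozenset`s of (input, output) pairs; the names `join`, `f1`, `f2`, and `mixed_entry` are made up for illustration.

```python
def join(v1, v2):
    """Partial join: n ⊔ n = n on integers, set union on finite relations.

    Returns None where the join is undefined.
    """
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 if v1 == v2 else None
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    return None

# Four small function values used as inputs and outputs of table entries.
a = frozenset({(0, 1)})
b = frozenset({(2, 3)})
c = frozenset({(4, 5)})
d = frozenset({(6, 7)})

f1 = frozenset({(a, b)})  # plays the role of f,  with the entry a ↦ b
f2 = frozenset({(c, d)})  # plays the role of f', with the entry c ↦ d

joined = join(f1, f2)

# The entry the application case would like to find: (a ⊔ c) ↦ (b ⊔ d).
mixed_entry = (join(a, c), join(b, d))

print(mixed_entry in joined)  # False
```

The union keeps the two original entries side by side, but the combined entry is simply not there, which is exactly where the proof attempt stalls.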

Intersection Types to the Rescue

At this point I started thinking that my definitions of $\sqsubseteq$, consistency, and $\sqcup$ were too simple, and that I needed to incorporate ideas from the literature on filter models and intersection types. As I’ve written about previously, my graph model corresponds to a particular intersection type system, and perhaps a different intersection type system would do the job. Recall that the correspondence goes as follows: values correspond to types, $\sqsubseteq$ corresponds to subtyping $<:$ (in reverse), and $\sqcup$ corresponds to intersection $\sqcap$. The various intersection type systems primarily differ in their definitions of subtyping. Given the above proof attempt, I figured that I would need the usual co/contra-variant rule for function types and also the following rule for distributing intersections over function types. \[(A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)\] This distributivity rule enables the “mixing” of information from two different entries.

So I defined types as follows: \[A,B,C,D ::= n \mid A \to B \mid A \sqcap B\] and defined subtyping according to the BCD intersection type system (Lambda Calculus with Types, Barendregt et al. 2013). \[\begin{gathered} A <: A \qquad \frac{A <: B \quad B <: C}{A <: C} \\[2ex] A \sqcap B <: A \qquad A \sqcap B <: B \qquad \frac{C <: A \quad C <: B}{C <: A \sqcap B} \\[2ex] \frac{C <: A \quad B <: D}{A \to B <: C \to D} \qquad (A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)\end{gathered}\] I then adapted the definition of consistency to work over types. (Because this definition uses negation, it is easier to define consistency as a recursive function in Isabelle instead of as an inductively defined relation.) \[\begin{aligned} n \sim n' &= (n = n') \\ n \sim (C \to D) &= \mathit{false} \\ n \sim (C \sqcap D) &= n \sim C \text{ and } n \sim D \\ (A \to B) \sim n' &= \mathit{false} \\ (A \to B) \sim (C \to D) &= (A \sim C \text{ and } B \sim D) \text{ or } A \not\sim C \\ (A \to B) \sim (C \sqcap D) &= (A \to B) \sim C \text{ and } (A \to B) \sim D \\ (A \sqcap B) \sim n' &= A \sim n' \text{ and } B \sim n' \\ (A \sqcap B) \sim (C \sqcap D) &= A \sim C \text{ and } A \sim D \text{ and } B \sim C \text{ and } B \sim D\end{aligned}\]
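
The recursive definition of consistency can be transcribed into a short Python sketch. This is only an illustration, not the Isabelle function: the class names `Nat`, `Fun`, and `Inter` are assumptions, and the case $(A \sqcap B) \sim (C \to D)$, which the equations above do not list, is filled in by symmetry with the case $(A \to B) \sim (C \sqcap D)$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nat:
    n: int


@dataclass(frozen=True)
class Fun:
    dom: "Ty"
    cod: "Ty"


@dataclass(frozen=True)
class Inter:
    left: "Ty"
    right: "Ty"


def consistent(a, b):
    """A ∼ B, following the equations for consistency over types."""
    # An intersection on either side is consistent with the other type
    # exactly when both of its parts are.
    if isinstance(a, Inter):
        return consistent(a.left, b) and consistent(a.right, b)
    if isinstance(b, Inter):
        return consistent(a, b.left) and consistent(a, b.right)
    if isinstance(a, Nat) and isinstance(b, Nat):
        return a.n == b.n
    if isinstance(a, Fun) and isinstance(b, Fun):
        doms = consistent(a.dom, b.dom)
        # Consistent inputs must give consistent outputs;
        # inconsistent inputs impose no constraint.
        return (doms and consistent(a.cod, b.cod)) or not doms
    # An integer is never consistent with a function type.
    return False
```

For example, `consistent(Fun(Nat(0), Nat(1)), Fun(Nat(2), Nat(3)))` holds because the domains are inconsistent, whereas `consistent(Fun(Nat(0), Nat(1)), Fun(Nat(0), Nat(2)))` fails because equal inputs are mapped to different outputs.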

Turning back to the Join Theorem, I restated it in terms of the intersection type system and rebranded it the Meet Theorem. Instead of using the letter $\rho$ for environments, we shall switch to $\Gamma$ because they now contain types instead of values.
Theorem (Meet)
If $\Gamma \vdash e : A$, $\Gamma' \vdash e : B$, and $\Gamma \sim \Gamma'$, then $\Gamma\sqcap\Gamma' \vdash e : A \sqcap B$.
By restating the theorem in terms of intersection types, we have essentially arrived at the rule for intersection introduction. In other words, if we can prove this theorem we will have shown that the intersection introduction rule is admissible in our system.

While the switch to intersection types and subtyping enabled this top-level proof to go through, I got stuck on one of the lemmas that it requires, which is an adaptation of Proposition 3 of the prior blog post.
Lemma (Consistency and Subtyping)

If $A \sim B$, $A <: C$, and $B <: D$, then $C \sim D$.
If $A \not\sim B$, $C <: A$, $D <: B$, then $C \not\sim D$.

In particular, I got stuck in the cases where the subtyping $A <: C$ or $B <: D$ was derived using the transitivity rule.

Subtyping and the Sub-formula Property

For a long time I’ve disliked definitions of subtyping in which transitivity is given as a rule instead of proved as a theorem. There are several reasons for this: a subtyping algorithm can’t directly implement a transitivity rule (or any rule that is not syntax directed), reasoning by induction or cases (inversion) is more difficult, and it is redundant. Furthermore, the presence of the transitivity rule means that subtyping does not satisfy the sub-formula property. The term sub-formula property comes from logic, and means that a derivation (proof) of a formula only mentions propositions that are a part of the formula to be proved. The transitivity rule breaks this property because the type $B$ comes out of nowhere: it is not part of $A$ or $C$, the types in the conclusion of the rule.

So I removed the transitivity rule and tried to prove transitivity. For most type systems, proving the transitivity of subtyping is straightforward. But I soon realized that the addition of the distributivity rule makes it significantly more difficult. After trying and failing to prove transitivity for some time, I resorted to reading the literature. Unfortunately, it turns out that none of the published intersection type systems satisfied the sub-formula property and the vast majority of them included the transitivity rule. However, there was one paper that offered some hope. In a 2012 article in Fundamenta Informaticae titled Intersection Types with Subtyping by Means of Cut Elimination, Olivier Laurent defined subtyping without transitivity and instead proved it, but his system still did not satisfy the sub-formula property because of an additional rule for function types. Nevertheless, Olivier indicated that he was interested in finding a version of the system that did, writing

“it would be much nicer and much more natural to go through a sub-formula property”

A lot of progress can happen in six years, so I sent an email to Olivier. He replied,

“Indeed! I now have two different sequent-calculus systems which are equivalent to BCD subtyping and satisfy the sub-formula property. I am currently writting a paper on this but it is not ready yet.”

and he attached the paper draft and the Coq mechanization. What great timing! Furthermore, Olivier would be presenting the paper, titled Intersection Subtyping with Constructors, at the Workshop on Intersection Types and Related Systems in Oxford on July 8, part of the Federated Logic Conference (FLOC). I was planning to attend FLOC anyways, for the DOMAINS workshop to celebrate Dana Scott’s 85th birthday.

Olivier’s system makes two important changes compared to prior work: he combines the distributivity rule and the usual arrow rule into a single elegant rule, and to enable this, he generalizes the form of subtyping from $A <: B$ to $A_1,\ldots,A_n \vdash B$, which should be interpreted as meaning $A_1 \sqcap \cdots \sqcap A_n <: B$. Having a sequence of formulas (types) on the left is characteristic of proof systems in logic, including both natural deduction systems and sequent calculi. (Sequent calculi, in addition, typically have a sequence on the right that means the disjunction of the formulas.) Here is one of Olivier’s systems, adapted to my setting, which I’ll describe below. Let $\Gamma$ range over sequences of types. \[\begin{gathered} \frac{\Gamma_1, \Gamma_2 \vdash A} {\Gamma_1 , n, \Gamma_2 \vdash A} \qquad \frac{\Gamma_1, \Gamma_2 \vdash A} {\Gamma_1 , B \to C, \Gamma_2 \vdash A} \\[2ex] \frac{\Gamma \vdash A \quad \Gamma \vdash B}{\Gamma \vdash A \sqcap B} \qquad \frac{\Gamma_1,B,C,\Gamma_2 \vdash A}{\Gamma_1,B\sqcap C,\Gamma_2 \vdash A} \\[2ex] \frac{}{n \vdash n} \qquad \frac{A \vdash C_1, \ldots, C_n \quad D_1, \ldots, D_n \vdash B} {C_1\to D_1,\ldots, C_n\to D_n \vdash A \to B}\end{gathered}\] The first two rules are weakening rules for singleton integers and function types. There is no weakening rule for intersections. The third and fourth rules are introduction and elimination rules for intersection. The fifth rule is reflexivity for integers, and the last is the combined rule for function types.

The combined rule for function types says that the intersection of a sequence of function types ${\sqcap}_{i=1\ldots n} (C_i\to D_i)$ is a subtype of $A \to B$ if \[A <: {\sqcap}_{i\in\{1\ldots n\}} C_i \qquad \text{and}\qquad {\sqcap}_{i\in\{1\ldots n\}} D_i <: B\] Interestingly, the inversion principle for this rule is the $\beta$-sound property described in Chapter 14 of Lambda Calculus with Types by Barendregt et al., and is the key to proving $\beta$-equality. In Olivier’s system, $\beta$-soundness falls out immediately, instead of by a somewhat involved proof.

The regular subtyping rule for function types is simply an instance of the combined rule in which the sequence on the left contains just one function type.

The next step for me was to enter Olivier’s definitions into Isabelle and prove transitivity via cut elimination. That is, I needed to prove the following generalized statement via a sequence of lemmas laid out by Olivier in his draft.
Theorem (Cut Elimination)
If $\Gamma_2 \vdash B$ and $\Gamma_1,B,\Gamma_3 \vdash C$, then $\Gamma_1,\Gamma_2,\Gamma_3 \vdash C$.
The transitivity rule is the instance of cut elimination where $\Gamma_2 = A$ and both $\Gamma_1$ and $\Gamma_3$ are empty.
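
Spelled out, instantiating the theorem with $\Gamma_1$ and $\Gamma_3$ empty and $\Gamma_2 = A$ turns its two premises into $A \vdash B$ and $B \vdash C$ and its conclusion into $A \vdash C$: \[\frac{A \vdash B \qquad B \vdash C}{A \vdash C}\] Reading each single-type sequent $A \vdash B$ as $A <: B$, this is exactly the transitivity rule that was removed from the subtyping relation.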

Unfortunately, I couldn’t resist making changes to Olivier’s subtyping system as I entered it into Isabelle, which cost me considerable time. Some of Olivier’s lemmas show that the collection of types on the left, that is, the $A's$ in $A_1,\ldots, A_n \vdash B$, behave like a set instead of a sequence. I figured that if the left-hand-side was represented as a set, then I would be able to bypass several lemmas and obtain a shorter proof. I got stuck in proving Lemma $\cap L_e$ which states that $\Gamma_1,A\sqcap B,\Gamma_2 \vdash C$ implies $\Gamma_1,A, B,\Gamma_2 \vdash C$. Olivier’s subtyping rules are carefully designed to minimize the amount of overlap between the rules, and switching to a set representation increases the amount of overlap, making the proof of this lemma more difficult (perhaps impossible?).

So after struggling with the set representation for some time, I went back to sequences and was able to complete the proof of cut elimination, with a little help from Olivier at FLOC. I proved the required lemmas in the following order.
Lemma (Weakening)
If $\Gamma_1,\Gamma_2 \vdash A$, then $\Gamma_1,B,\Gamma_2 \vdash A$.
(Proved by induction on $A$.)
Lemma (Axiom)
$A \vdash A$
(Proved by induction on $A$.)
Lemma (Permutation)
If $\Gamma_1 \vdash A$ and $\Gamma_2$ is a permutation of $\Gamma_1$, then $\Gamma_2 \vdash A$.
(Proved by induction on the derivation of $\Gamma_1 \vdash A$, using many lemmas about permutations.)
Lemma ($\cap L_e$)
If $\Gamma_1,A\sqcap B,\Gamma_2 \vdash C$, then $\Gamma_1,A, B,\Gamma_2 \vdash C$.
(Proved by induction on the derivation of $\Gamma_1,A\sqcap B,\Gamma_2 \vdash C$.)
Lemma (Collapse Duplicates)
If $\Gamma_1,A,A,\Gamma_2 \vdash C$, then $\Gamma_1,A,\Gamma_2 \vdash C$.
(This is proved by well-founded induction on the lexicographical ordering of the pair $(n,k)$ where $n$ is the size of $A$ and $k$ is the depth of the derivation of $\Gamma_1,A,A,\Gamma_2 \vdash C$. Proof assistants such as Isabelle and Coq do not directly provide the depth of a derivation, but the depth can be manually encoded as an extra argument of the relation, as in $\Gamma_1,A,A,\Gamma_2 \vdash_k C$.)
The Cut Elimination Theorem is then proved by well-founded induction on the triple $(n,k_1,k_2)$ where $n$ is the size of $B$, $k_1$ is the depth of the derivation of $\Gamma_2 \vdash B$, and $k_2$ is the depth of the derivation of $\Gamma_1,B,\Gamma_3 \vdash C$.

We define subtyping as follows. \[A <: B \quad = \quad A \vdash B\]

The BCD subtyping rules and other derived rules follow from the above lemmas.
Proposition (Properties of Subtyping)

$A <: A$.
If $A <: B$ and $B <: C$, then $A <: C$.
If $C <: A$ and $B <: D$, then $A \to B <: C \to D$.
If $A_1 <: B$, then $A_1 \sqcap A_2 <: B$.
If $A_2 <: B$, then $A_1 \sqcap A_2 <: B$.
If $B <: A_1$ and $B <: A_2$, then $B <: A_1 \sqcap A_2$.
If $A <: C$ and $B <: D$, then $A \sqcap B <: C \sqcap D$.
$(A\to B) \sqcap (A \to C) <: A \to (B \sqcap C)$.
$(A \to C) \sqcap (B \to D) <: (A\sqcap B) \to (C \sqcap D)$
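
These properties can be spot-checked with a small executable sketch of the sequent rules. This is not Olivier's Coq development nor the Isabelle mechanization; it is a hypothetical Python checker that leans on the Weakening and $\cap L_e$ lemmas stated above: the left-hand side is flattened into atoms, and for a function type on the right every left-hand function type whose domain is usable is kept, since by weakening extra entries cannot hurt. The names `Nat`, `Fun`, `Inter`, `atoms`, `derivable`, and `subtype` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nat:
    n: int


@dataclass(frozen=True)
class Fun:
    dom: "Ty"
    cod: "Ty"


@dataclass(frozen=True)
class Inter:
    left: "Ty"
    right: "Ty"


def atoms(t):
    """Flatten nested intersections into a list of integers and function types."""
    if isinstance(t, Inter):
        return atoms(t.left) + atoms(t.right)
    return [t]


def derivable(gamma, a):
    """Decide the sequent gamma ⊢ a, where gamma is a list of types."""
    # Intersection on the right: prove both parts.
    if isinstance(a, Inter):
        return derivable(gamma, a.left) and derivable(gamma, a.right)
    # Intersection on the left: split every intersection into its atoms.
    left = [x for t in gamma for x in atoms(t)]
    if isinstance(a, Nat):
        # Weaken everything else away and finish with n ⊢ n.
        return a in left
    # Combined rule for function types: keep each C_i → D_i with a.dom ⊢ C_i,
    # then check that the kept codomains together entail a.cod.
    usable = [f for f in left
              if isinstance(f, Fun) and derivable([a.dom], f.dom)]
    return bool(usable) and derivable([f.cod for f in usable], a.cod)


def subtype(a, b):
    """A <: B is defined as the single-type sequent A ⊢ B."""
    return derivable([a], b)


A, B, C, D = Nat(0), Nat(1), Nat(2), Nat(3)
# Distributivity of intersection over function types.
print(subtype(Inter(Fun(A, B), Fun(A, C)), Fun(A, Inter(B, C))))  # True
# The last property in the proposition.
print(subtype(Inter(Fun(A, C), Fun(B, D)), Fun(Inter(A, B), Inter(C, D))))  # True
# Codomains are covariant, so differing integer results are unrelated.
print(subtype(Fun(A, B), Fun(A, C)))  # False
```

Because each recursive call is on strictly smaller types, the checker terminates. It only tests instances, of course; the general statements are what the lemmas above establish.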

Consistency and Subtyping, Resolved

Recall that my switch to intersection types was motivated by my failure to prove the Consistency and Subtyping Lemma. We now return to the proof of that Lemma. We start with a handful of lemmas that are needed for that proof.
Lemma (Consistency is Symmetric and Reflexive)

If $A \sim B$, then $B \sim A$.
If ${\mathsf{wf}(A)}$, then $A \sim A$.

It will often be convenient to decompose a type into its set of atoms, defined as follows. \[\begin{aligned} {\mathit{atoms}(n)} &= \{ n \} \\ {\mathit{atoms}(A\to B)} &= \{ A \to B \} \\ {\mathit{atoms}(A \sqcap B)} &= {\mathit{atoms}(A)} \cup {\mathit{atoms}(B)}\end{aligned}\]

The consistency of two types is determined by the consistency of their atoms.
Lemma (Atomic Consistency)

If $A \sim B$, $C \in {\mathit{atoms}(A)}$, and $D \in {\mathit{atoms}(B)}$, then $C \sim D$.
If (for any $C \in {\mathit{atoms}(A)}$ and $D \in {\mathit{atoms}(B)}$, $C \sim D$), then $A \sim B$.
If $A \not\sim B$, then $C \not\sim D$ for some $C \in {\mathit{atoms}(A)}$ and $D \in {\mathit{atoms}(B)}$.
If $C \not\sim D$, $C \in {\mathit{atoms}(A)}$, and $D \in {\mathit{atoms}(B)}$, then $A \not\sim B$.
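
As a sanity check on the first two parts of this lemma, the following Python sketch computes atoms and compares direct consistency with pairwise consistency of atoms on a few sample types. It is an illustration under the same hypothetical encoding as before (`Nat`, `Fun`, `Inter` are made-up names, and the unlisted case of an intersection against a function type is handled by symmetry); it tests instances and proves nothing.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Nat:
    n: int


@dataclass(frozen=True)
class Fun:
    dom: "Ty"
    cod: "Ty"


@dataclass(frozen=True)
class Inter:
    left: "Ty"
    right: "Ty"


def consistent(a, b):
    """A ∼ B, following the recursive definition over types."""
    if isinstance(a, Inter):
        return consistent(a.left, b) and consistent(a.right, b)
    if isinstance(b, Inter):
        return consistent(a, b.left) and consistent(a, b.right)
    if isinstance(a, Nat) and isinstance(b, Nat):
        return a.n == b.n
    if isinstance(a, Fun) and isinstance(b, Fun):
        doms = consistent(a.dom, b.dom)
        return (doms and consistent(a.cod, b.cod)) or not doms
    return False


def atoms(t):
    """atoms(n) = {n}, atoms(A → B) = {A → B}, atoms(A ⊓ B) = atoms(A) ∪ atoms(B)."""
    if isinstance(t, Inter):
        return atoms(t.left) | atoms(t.right)
    return {t}


def atomwise_consistent(a, b):
    """Every atom of a is consistent with every atom of b."""
    return all(consistent(c, d) for c, d in product(atoms(a), atoms(b)))


samples = [
    Nat(3),
    Inter(Nat(3), Nat(3)),
    Inter(Nat(3), Nat(4)),
    Fun(Nat(0), Nat(1)),
    Fun(Nat(0), Nat(2)),
    Inter(Fun(Nat(0), Nat(1)), Fun(Nat(2), Nat(3))),
]

# On these samples the two notions agree, as parts 1 and 2 of the lemma predict.
agree = all(consistent(a, b) == atomwise_consistent(a, b)
            for a, b in product(samples, samples))
print(agree)  # True
```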

There are also several properties of subtyping and the atoms of a type.
Lemma (Atomic Subtyping)

If $A <: B$ and $C \in {\mathit{atoms}(B)}$, then $A <: C$.
If $A <: n$, then $n \in {\mathit{atoms}(A)}$.
$n <: A$ if and only if ${\mathit{atoms}(A)} \subseteq \{ n \}$.
If $C <: A \to B$, then $D\to E \in {\mathit{atoms}(C)}$ for some $D,E$.
If $\Gamma \vdash A$ and every atom in $\Gamma$ is a function type, then every atom of $A$ is a function type.

And we have the following important inversion lemma for function types. We use the following abbreviations: \[\begin{aligned} \mathrm{dom}(\Gamma) &= \{ A \mid \exists B.\; A \to B \in \Gamma \}\\ \mathrm{cod}(\Gamma) &= \{ B \mid \exists A.\; A \to B \in \Gamma \}\end{aligned}\]

Lemma (Subtyping Inversion for Function Types)
If $C <: A \to B$, then there is a sequence of function types $\Gamma$ such that

each element of $\Gamma$ is an atom of $C$,
for every $D\to E \in \Gamma$, we have $A <: D$, and
${\sqcap}\mathrm{cod}(\Gamma) <: B$.

Note that item 2 above implies that $A <: {\sqcap}\mathrm{dom}(\Gamma)$.

Lemma (Consistency and Subtyping)

If $A \sim B$, $A <: C$, and $B <: D$, then $C \sim D$.
If $A \not\sim B$, $C <: A$, $D <: B$, then $C \not\sim D$.

(1) The proof is by strong induction on the sum of the depths of $A$, $B$, $C$, and $D$. We define the depth of a type as follows. \[\begin{aligned} \mathit{depth}(n) &= 0 \\ \mathit{depth}(A \to B) &= 1 + \mathrm{max}(\mathit{depth}(A),\mathit{depth}(B)) \\ \mathit{depth}(A \sqcap B) &= \mathrm{max}(\mathit{depth}(A),\mathit{depth}(B)) \end{aligned}\] To show that $C \sim D$ it suffices to show that all of their atoms are consistent. Suppose $C' \in {\mathit{atoms}(C)}$ and $D'\in{\mathit{atoms}(D)}$. So we need to show that $C' \sim D'$. We proceed by cases on $C'$.

Case $C'=n_1$:
We have $A <: C'$ and therefore $n_1 \in {\mathit{atoms}(A)}$. Then because $A \sim B$, we have ${\mathit{atoms}(B)} \subseteq \{n_1\}$. We have $B <: D'$, so we also have ${\mathit{atoms}(D)} \subseteq \{n_1\}$. Therefore $C' \sim D'$.
Case $C'=C_1\to C_2$:
We have $A <: C_1 \to C_2$, so by inversion we have some sequence of function types $\Gamma_1$ such that every element of $\Gamma_1$ is an atom of $A$, $C_1 <: {\sqcap}\mathrm{dom}(\Gamma_1)$, and ${\sqcap}\mathrm{cod}(\Gamma_1) <: C_2$.

We also know that $D'$ is a function type, say $D'=D_1 \to D_2$. (This is because we have $A <: C'$, so we know that $A_1\to A_2 \in {\mathit{atoms}(A)}$ for some $A_1,A_2$. Then because $A \sim B$, we know that all the atoms in $B$ are function types. Then because $B <: D$ and $D' \in {\mathit{atoms}(D)}$, we have that $D'$ is a function type.) So by inversion on $B <: D_1 \to D_2$, we have some sequence of function types $\Gamma_2$ such that every element of $\Gamma_2$ is an atom of $B$, $D_1 <: {\sqcap}\mathrm{dom}(\Gamma_2)$, and ${\sqcap}\mathrm{cod}(\Gamma_2) <: D_2$.

It’s the case that either $C_1 \sim D_1$ or $C_1 \not\sim D_1$.
- Sub-case $C_1 \sim D_1$.
  It suffices to show that $C_2 \sim D_2$. By the induction hypothesis, we have ${\sqcap}\mathrm{dom}(\Gamma_1) \sim {\sqcap}\mathrm{dom}(\Gamma_2)$.
  
  As an intermediate step, we shall prove that ${\sqcap}\mathrm{cod}(\Gamma_1) \sim {\sqcap}\mathrm{cod}(\Gamma_2)$, which we shall do by showing that all their atoms are consistent. Suppose $A' \in {\mathit{atoms}({\sqcap}\mathrm{cod}(\Gamma_1))}$ and $B' \in {\mathit{atoms}({\sqcap}\mathrm{cod}(\Gamma_2))}$. There is some $A_1\to A_2 \in \Gamma_1$ where $A' \in {\mathit{atoms}(A_2)}$. Similarly, there is $B_1 \to B_2 \in \Gamma_2$ where $B' \in {\mathit{atoms}(B_2)}$. Also, we have $A_1 \to A_2 \in {\mathit{atoms}(A)}$ and $B_1 \to B_2 \in {\mathit{atoms}(B)}$. Then because $A \sim B$, we have $A_1 \to A_2 \sim B_1 \to B_2$. Furthermore, we have $A_1 \sim B_1$ because ${\sqcap}\mathrm{dom}(\Gamma_1) \sim {\sqcap}\mathrm{dom}(\Gamma_2)$, so it must be the case that $A_2 \sim B_2$. Then because $A' \in {\mathit{atoms}(A_2)}$ and $B' \in {\mathit{atoms}(B_2)}$, we have $A' \sim B'$. Thus concludes this intermediate step.
  
  By another use of the induction hypothesis, we have $C_2 \sim D_2$, and this case is finished.
- Sub-case $C_1 \not\sim D_1$.
  Then we immediately have $C_1 \to C_2 \sim D_1 \to D_2$.
Case $C'=C_1\sqcap C_2$:
We already know that $C'$ is an atom, so we have a contradiction and this case is vacuously true.

The next two lemmas follow from the Consistency and Subtyping Lemma and help prepare to prove the case for application in the Join Theorem.
Lemma (Application Consistency)
If $A_1 \sim A_2$, $B_1 \sim B_2$, $A_1 <: B_1 \to C_1$, $A_2 <: B_2 \to C_2$, and all these types are well formed, then $C_1 \sim C_2$.
(This lemma is proved directly, without induction.)
Lemma (Application Intersection)
If $A_1 <: B_1 \to C_1$, $A_2 <: B_2 \to C_2$, $A_1 \sim A_2$, $B_1 \sim B_2$, and $C_1 \sim C_2$, then $(A_1\sqcap A_2) <: (B_1 \sqcap B_2) \to (C_1 \sqcap C_2)$.
(This lemma is proved directly, without induction.)

Updating the Denotational Semantics

Armed with the Consistency and Subtyping Lemma, I turned back to the proof of the Join Theorem, but first I needed to update my denotational semantics to use intersection types instead of values. For this we’ll need the definition of well formed types that we alluded to earlier.

\[\begin{gathered} \frac{}{{\mathsf{wf}(n)}} \qquad \frac{{\mathsf{wf}(A)} \quad {\mathsf{wf}(B)}}{{\mathsf{wf}(A \to B)}} \qquad \frac{A \sim B \quad {\mathsf{wf}(A)} \quad {\mathsf{wf}(B)}}{{\mathsf{wf}(A \sqcap B)}}\end{gathered}\]

Here are some examples and non-examples of well-formed types. \[\begin{gathered} {\mathsf{wf}(4)} \qquad {\mathsf{wf}(3 \sqcap 3)} \qquad \neg {\mathsf{wf}(3 \sqcap 4)} \\ {\mathsf{wf}((0\to 1) \sqcap (2 \to 3))} \qquad \neg {\mathsf{wf}((0 \to 1) \sqcap (0 \to 2))}\end{gathered}\] It is sometimes helpful to think of well-formed types in terms of the equivalence classes determined by subtype equivalence: \[A \approx B \quad = \quad A <: B \text{ and } B <: A\] For example, we have $3 \approx (3 \sqcap 3)$, so they are in the same equivalence class and $3$ would be the representative.
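
The examples and non-examples above can be replayed with a short Python sketch of the well-formedness rules. As before, this is a hypothetical encoding for illustration (the names `Nat`, `Fun`, `Inter`, `consistent`, and `wf` are assumptions, not the Isabelle definitions), with the consistency case for an intersection against a function type filled in by symmetry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nat:
    n: int


@dataclass(frozen=True)
class Fun:
    dom: "Ty"
    cod: "Ty"


@dataclass(frozen=True)
class Inter:
    left: "Ty"
    right: "Ty"


def consistent(a, b):
    """A ∼ B, following the recursive definition over types."""
    if isinstance(a, Inter):
        return consistent(a.left, b) and consistent(a.right, b)
    if isinstance(b, Inter):
        return consistent(a, b.left) and consistent(a, b.right)
    if isinstance(a, Nat) and isinstance(b, Nat):
        return a.n == b.n
    if isinstance(a, Fun) and isinstance(b, Fun):
        doms = consistent(a.dom, b.dom)
        return (doms and consistent(a.cod, b.cod)) or not doms
    return False


def wf(t):
    """Well-formedness: integers always, functions componentwise,
    intersections only when the two parts are consistent and well formed."""
    if isinstance(t, Nat):
        return True
    if isinstance(t, Fun):
        return wf(t.dom) and wf(t.cod)
    return consistent(t.left, t.right) and wf(t.left) and wf(t.right)


print(wf(Nat(4)))                                             # True
print(wf(Inter(Nat(3), Nat(3))))                              # True
print(wf(Inter(Nat(3), Nat(4))))                              # False
print(wf(Inter(Fun(Nat(0), Nat(1)), Fun(Nat(2), Nat(3)))))    # True
print(wf(Inter(Fun(Nat(0), Nat(1)), Fun(Nat(0), Nat(2)))))    # False
```

The last example fails because the two function types agree on the input $0$ but disagree on the output, so the intersection would describe a relation that is not a function.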

We also introduce the following notation for all the well-formed types that are super-types of a given type. \[{\mathord{\uparrow} A} \quad = \quad \{ B\mid A <: B \text{ and } {\mathsf{wf}(B)} \}\]

We shall represent variables with de Bruijn indices, so an environment $\Gamma$ is a sequence of types. The denotational semantics of the CBV $\lambda$-calculus is defined as follows. \[\begin{aligned} {\mathcal{E}{[\![ n ]\!]}}\Gamma &= {\mathord{\uparrow} n} \\ {\mathcal{E}{[\![ x ]\!]}}\Gamma &= {\mathrm{if}\;}x < |\Gamma| {\;\mathrm{then}\;}{\mathord{\uparrow} \Gamma[x]} {\;\mathrm{else}\;}\emptyset \\ {\mathcal{E}{[\![ \lambda e ]\!]}}\Gamma &= \{ A \mid {\mathsf{wf}(A)} \text{ and } {\mathcal{F}{[\![ A ]\!]}}e\Gamma \} \\ {\mathcal{E}{[\![ e_1{\;}e_2 ]\!]}}\Gamma &= \left\{ C\, \middle| \begin{array}{l} \exists A,B.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma,\\ A <: B \to C, \text{ and } {\mathsf{wf}(C)} \end{array} \right\} \\ {\mathcal{E}{[\![ f(e_1,e_2) ]\!]}}\Gamma &= \left\{ C\, \middle| \begin{array}{l} \exists A,B,n_1,n_2.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma,\\ A <: n_1, B <: n_2, {[\![ f ]\!]}(n_1,n_2) <: C, {\mathsf{wf}(C)} \end{array} \right\} \\ {\mathcal{E}{[\![ {\mathrm{if}\;}e_1 {\;\mathrm{then}\;}e_2 {\;\mathrm{else}\;}e_3 ]\!]}}\Gamma &= \left\{ B\, \middle| \begin{array}{l} \exists A, n.\; A \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma, A <: n,\\ n = 0 \Rightarrow B \in {\mathcal{E}{[\![ e_3 ]\!]}}\Gamma,\\ n \neq 0 \Rightarrow B \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma \end{array} \right\} \\[2ex] {\mathcal{F}{[\![ n ]\!]}}e\Gamma &= \mathit{false} \\ {\mathcal{F}{[\![ A \sqcap B ]\!]}}e \Gamma &= {\mathcal{F}{[\![ A ]\!]}}e\Gamma \text{ and } {\mathcal{F}{[\![ B ]\!]}}e\Gamma\\ {\mathcal{F}{[\![ A \to B ]\!]}}e \Gamma &= B \in {\mathcal{E}{[\![ e ]\!]}} (A, \Gamma)\end{aligned}\]

It is easy to show that swapping in a “super” environment does not change the semantics.

Lemma (Weakening)

If ${\mathcal{F}{[\![ A ]\!]}}e \Gamma_1$, $\Gamma_1 <: \Gamma_2$ and $(\forall B, \Gamma_1, \Gamma_2.\; B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1, \Gamma_2 <: \Gamma_1 \Rightarrow B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2)$, then ${\mathcal{F}{[\![ A ]\!]}}e \Gamma_2$.
If $A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1$ and $\Gamma_2 <: \Gamma_1$, then $A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2$.

(Part 1 is proved by induction on $A$. Part 2 is proved by induction on $e$ and uses part 1.)

The Home Stretch

Now for the main event, the proof of the Meet Theorem!
Theorem (Meet)
If $A_1 \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_1$, $A_2 \in {\mathcal{E}{[\![ e ]\!]}}\Gamma_2$, both $\Gamma_1$ and $\Gamma_2$ are well formed, and $\Gamma_1 \sim \Gamma_2$,
then $A_1 \sqcap A_2 \in {\mathcal{E}{[\![ e ]\!]}}(\Gamma_1\sqcap\Gamma_2)$ and ${\mathsf{wf}(A_1 \sqcap A_2)}$.
Proof We proceed by induction on $e$.

Case $e=k$ ($k$ is a de Bruijn index for a variable):
We have $\Gamma_1[k] <: A_1$ and $\Gamma_2[k] <: A_2$, so $\Gamma_1[k] \sqcap \Gamma_2[k] <: A_1 \sqcap A_2$. Also, because $\Gamma_1 \sim \Gamma_2$ we have $\Gamma_1[k] \sim \Gamma_2[k]$ and therefore $A_1 \sim A_2$, by the Consistency and Subtyping Lemma. So we have ${\mathsf{wf}(A_1 \sqcap A_2)}$ and this case is finished.
Case $e=n$:
We have $n <: A_1$ and $n <: A_2$, so $n <: A_1 \sqcap A_2$. Also, we have $A_1 \sim A_2$ by the Consistency and Subtyping Lemma. So we have ${\mathsf{wf}(A_1 \sqcap A_2)}$ and this case is finished.
Case $e=\lambda e$:
We need to show that ${\mathsf{wf}(A_1 \sqcap A_2)}$ and ${\mathcal{F}{[\![ A_1 \sqcap A_2 ]\!]}}e(\Gamma_1\sqcap\Gamma_2)$. For the former, it suffices to show that $A_1 \sim A_2$, which we shall do by showing that their atoms are consistent. Suppose $A'_1 \in {\mathit{atoms}(A_1)}$ and $A'_2 \in {\mathit{atoms}(A_2)}$. Because ${\mathcal{F}{[\![ A_1 ]\!]}}e\Gamma_1$ we have $A'_1 =A'_{11} \to A'_{12}$ and $A'_{12} \in {\mathcal{E}{[\![ e ]\!]}}(A'_{11},\Gamma_1)$. Similarly, from ${\mathcal{F}{[\![ A_2 ]\!]}}e\Gamma_2$ we have $A'_2 =A'_{21} \to A'_{22}$ and $A'_{22} \in {\mathcal{E}{[\![ e ]\!]}}(A'_{21},\Gamma_2)$. We proceed by cases on whether $A'_{11} \sim A'_{21}$.
- Sub-case $A'_{11} \sim A'_{21}$:
  By the induction hypothesis, we have ${\mathsf{wf}(A'_{12} \sqcap A'_{22})}$ from which we have $A'_{12} \sim A'_{22}$ and therefore $A'_{11}\to A'_{12} \sim A'_{21} \to A'_{22}$.
- Sub-case $A'_{11} \not\sim A'_{21}$:
  It immediately follows that $A'_{11}\to A'_{12} \sim A'_{21} \to A'_{22}$.
It remains to show ${\mathcal{F}{[\![ A_1 \sqcap A_2 ]\!]}}e(\Gamma_1\sqcap\Gamma_2)$. This follows from two uses of the Weakening Lemma to obtain ${\mathcal{F}{[\![ A_1 ]\!]}}e(\Gamma_1\sqcap\Gamma_2)$ and ${\mathcal{F}{[\![ A_2 ]\!]}}e(\Gamma_1\sqcap\Gamma_2)$.
Case $e = (e_1 {\;}e_2)$:
We have \[B_1 \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma_1 \quad C_1 \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma_1 \quad B_1 <: C_1 \to A_1 \quad {\mathsf{wf}(A_1)}\] and \[B_2 \in {\mathcal{E}{[\![ e_1 ]\!]}}\Gamma_2 \quad C_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}\Gamma_2 \quad B_2 <: C_2 \to A_2 \quad {\mathsf{wf}(A_2)}\] By the induction hypothesis, we have \[B_1 \sqcap B_2 \in {\mathcal{E}{[\![ e_1 ]\!]}}(\Gamma_1 \sqcap \Gamma_2) \quad {\mathsf{wf}(B_1 \sqcap B_2)}\] and \[C_1 \sqcap C_2 \in {\mathcal{E}{[\![ e_2 ]\!]}}(\Gamma_1 \sqcap \Gamma_2) \quad {\mathsf{wf}(C_1 \sqcap C_2)}\] We obtain $A_1 \sim A_2$ by the Application Consistency Lemma, and then by the Application Intersection Lemma we have \[B_1 \sqcap B_2 <: (C_1 \sqcap C_2) \to (A_1 \sqcap A_2)\] So we have $A_1 \sqcap A_2 \in {\mathcal{E}{[\![ e ]\!]}}(\Gamma_1 \sqcap \Gamma_2)$.

Also, from $A_1 \sim A_2$, ${\mathsf{wf}(A_1)}$, and ${\mathsf{wf}(A_2)}$, we conclude that ${\mathsf{wf}(A_1 \sqcap A_2)}$.
Case $e= f(e_1,e_2)$:
(This case is not very interesting. See the Isabelle proof for the details.)
Case $e= {\mathrm{if}\;}e_1 {\;\mathrm{then}\;}e_2 {\;\mathrm{else}\;}e_3$:
(This case is not very interesting. See the Isabelle proof for the details.)

I thought that the following Subsumption Theorem would be needed to prove the Meet Theorem, but it turned out not to be necessary, which is especially nice because the proof of the Subsumption Theorem turned out to depend on the Meet Theorem!
Theorem (Subsumption)
If $A \in {\mathcal{E}{[\![ e ]\!]}}\Gamma$, $A <: B$, and both $B$ and $\Gamma$ are well-formed, then $B \in {\mathcal{E}{[\![ e ]\!]}}\Gamma$.
The proof is by induction on $e$ and all but the case $e=\lambda e'$ are straightforward. For that case, we use the following lemmas.
Lemma (Distributivity for $\mathcal{F}$)
If ${\mathcal{F}{[\![ (A \to B)\sqcap (C \to D) ]\!]}} e \Gamma$, $A \sim C$, and everything is well formed, then ${\mathcal{F}{[\![ (A\sqcap C) \to (B\sqcap D) ]\!]}} e \Gamma$.
(The proof is direct, using the Meet Theorem and the Weakening Lemma.)
Lemma ($\mathcal{F}$ and Intersections)
Suppose $\Gamma_1$ is a non-empty sequence of well-formed and consistent function types. If ${\mathcal{F}{[\![ {\sqcap}\Gamma_1 ]\!]}} e \Gamma_2$, then ${\mathcal{F}{[\![ {\sqcap}\mathrm{dom}(\Gamma_1) \to {\sqcap}\mathrm{cod}(\Gamma_1) ]\!]}} e \Gamma_2$.
(The proof is by induction on $\Gamma_1$ and uses the previous lemma.)

Conclusion

This result can be viewed a couple of ways. First, as discussed at the beginning of this post, establishing the Meet Theorem means that this call-by-value denotational semantics respects $\beta$-equality for any terminating argument expression. This is useful in proving the correctness of a function inlining optimizer. Also, it would be straightforward to define a call-by-name (or need) version of the semantics that respects $\beta$-equality unconditionally.

Secondly, from the viewpoint of intersection type systems, this result shows that, once we require types to be well formed (i.e. self consistent), we no longer need the intersection introduction rule because it is a consequence of having the subtyping rule for distributing intersections through function types.

What do real numbers have in common with lambdas? and what does continuity have to do with it?

2018-04-24T20:31:00.000-07:00

Continuous functions over the real numbers

As a high school student and undergraduate I learned in Calculus that

- real numbers involve infinity in precision, e.g. some have no finite decimal representation, and
- a continuous function forms an unbroken line, a necessary condition to be differentiable.

For an example, the decimal representation of $\sqrt 2$ goes on forever: \[1.41421 \ldots\] Later on, in a course on Real Analysis, I learned that one way to define the real numbers is to declare them to be Cauchy sequences, that is, infinite sequences of rational numbers that get closer and closer together. So, for example, $\sqrt 2$ is declared to be the sequence $1, \frac{3}{2}, \frac{17}{12}, \frac{577}{408}, \ldots$ described by the following recursive formulas.
\[A_0 = 1 \qquad A_{n+1} = \frac{A_n}{2} + \frac{1}{A_n} \hspace{1in} (1) \label{eq:caucy-sqrt-2}\]
Depending on how close an approximation to $\sqrt 2$ you need, you can go further out in this sequence. (Alternatively, one can represent $\sqrt 2$ by its sequence of continued fractions.)
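
As a small illustration of equation (1), the following Python sketch generates the sequence with exact rational arithmetic. The function name `sqrt2_sequence` is made up for this example.

```python
from fractions import Fraction
from itertools import islice

def sqrt2_sequence():
    """Yield A_0, A_1, ... where A_0 = 1 and A_{n+1} = A_n/2 + 1/A_n."""
    a = Fraction(1)
    while True:
        yield a
        a = a / 2 + 1 / a

# The first four elements of the sequence, as exact fractions.
terms = list(islice(sqrt2_sequence(), 4))
print(terms)
```

Going further out in the sequence is just a matter of asking `islice` for more elements.
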
For an example of a continuous function, Figure 1 depicts $x^3 - x^2 - 4x$. On the other hand, Figures 2 and 3 depict functions that are not continuous. The function $1/\mathrm{abs}(x-\sqrt 2)^{1/4}$ in Figure 2 is not continuous because it goes to infinity as it approaches $\sqrt 2$. The function $(x+1)\,\mathrm{sign}(x)$ in Figure 3 is not continuous because it jumps from $-1$ to $1$ at $0$.

Figure 1. The function $x^3 - x^2 - 4x$ is continuous.

Figure 2. The function $1/\mathrm{abs}(x-\sqrt 2)^{1/4}$ is not continuous at $\sqrt 2$.

Figure 3. The function $(x+1)\,\mathrm{sign}(x)$ is not continuous at $0$.

You may recall the $\epsilon$-$\delta$ definition of continuity, stated below and depicted in Figure 4.

A function $f$ is continuous at a point $x$ if for any $\epsilon > 0$ there exists a $\delta > 0$ such that for any $x'$ in the interval $(x - \delta,x+\delta)$, $f(x')$ is in $(f(x) -\epsilon, f(x) + \epsilon)$.

In other words, when a function is continuous, if you want to determine its result with an accuracy of $\epsilon$, you need to measure the input with an accuracy of $\delta$.

Figure 4. The $\epsilon$-$\delta$ definition of continuity.

One connection between the infinite nature of real numbers and continuity that only recently sank in for me is that continuous functions are the ones that can be reasonably approximated by applying them to approximate, finitely-represented inputs. For example, suppose you wish to compute $f(\sqrt 2)$ for some continuous function $f$. You can accomplish this by applying $f$ to each rational number in the Cauchy sequence for $\sqrt 2$ until two subsequent results are closer than your desired accuracy. On the other hand, consider trying to approximate the function from Figure 2 by applying it to rational numbers in the Cauchy sequence for $\sqrt 2$. No matter how far down the sequence you go, you’ll still get a result that is wrong by an infinite margin!
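
Here is a minimal Python sketch of that procedure, applied to the continuous function from Figure 1. The helper names (`sqrt2_sequence`, `approximate_at_sqrt2`) are invented for this example, and the stopping rule is exactly the informal one described above (stop when two subsequent results are close), not a rigorous error bound.

```python
from fractions import Fraction

def sqrt2_sequence():
    """Rational approximations of sqrt(2) from equation (1)."""
    a = Fraction(1)
    while True:
        yield a
        a = a / 2 + 1 / a

def approximate_at_sqrt2(f, accuracy):
    """Apply f along the sequence until two subsequent results
    differ by less than the requested accuracy."""
    previous = None
    for a in sqrt2_sequence():
        current = f(a)
        if previous is not None and abs(current - previous) < accuracy:
            return current
        previous = current

def cubic(x):
    """The continuous function of Figure 1."""
    return x**3 - x**2 - 4*x

result = approximate_at_sqrt2(cubic, Fraction(1, 1000))
print(float(result))
```
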

The $\lambda$-calculus and continuous functions

In graduate school I studied programming languages and learned that

- the $\lambda$-calculus is a little language for creating and applying functions, and
- Dana S. Scott’s semantics of the $\lambda$-calculus interprets $\lambda$’s as continuous functions.

For example, the $\lambda$ expression \[\lambda x.\; x + 1\] creates an anonymous function that maps its input $x$, say a natural number, to the next greater one. The graph of this function is \[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\}\] which is infinite. So we have our first similarity between the real numbers and $\lambda$’s: both involve infinity.
A key characteristic of the $\lambda$-calculus is that functions can take functions as input. Thus, the semantics of the $\lambda$-calculus is also concerned with functions over infinite entities (just like functions over the real numbers). For example, here is a $\lambda$ expression that takes a function $f$ and produces a function that applies $f$ twice in succession to its input $x$. \[\lambda f.\; \lambda x.\; f(f(x))\] The graph of this function is especially difficult to write down. Not only does it have an infinite domain and range, but each element in the domain and range is an infinite entity. \[\left\{ \begin{array}{l} \{ 0\mapsto 1, 1\mapsto 2, 2\mapsto 3, \ldots \} \mapsto \{ 0\mapsto 2, 1\mapsto 3, 2\mapsto 4, \ldots \},\\ \{ 0\mapsto 0, 1\mapsto 2, 2\mapsto 4, \ldots \} \mapsto \{ 0\mapsto 0, 1\mapsto 4, 2\mapsto 8, \ldots \},\\ \ldots \end{array} \right\}\]
Denotational semantics for the $\lambda$-calculus interpret $\lambda$’s as continuous functions, so just based on the terminology there should be another similarity with real numbers! However, these continuous functions are over special sets called domains, not real numbers, and the definition of continuity in this setting bears little resemblance to the $\epsilon$-$\delta$ definition. For example, in Dana S. Scott’s classic paper Data Types as Lattices, the domain is the powerset of the natural numbers, $\mathcal{P}(\mathbb{N})$. This domain can be used to represent a function's graph by creating two bijections, one between pairs and natural numbers and one between sets and naturals. The following are the easier-to-specify directions of the two bijections, the mapping from pairs to naturals and the mapping from naturals to sets of naturals.
\[\begin{aligned} \langle n, m \rangle &= 2^n (2m+1) - 1 \\ \mathsf{set}(0) &= \emptyset \\ \mathsf{set}(1+k) &= \{ m \} \cup \mathsf{set}(n) & \text{if } \langle n, m \rangle = k\end{aligned}\]
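
To make the pairing concrete, here is a small Python sketch of the first equation, $\langle n, m \rangle = 2^n (2m+1) - 1$, together with its inverse. The names `pair`, `unpair`, and `encode_graph` are hypothetical helpers for this example, and the sketch covers only the pairing, not the $\mathsf{set}$ mapping.

```python
def pair(n, m):
    """Encode the pair (n, m) as the natural number 2^n (2m+1) - 1."""
    return 2**n * (2 * m + 1) - 1

def unpair(k):
    """Invert pair: write k+1 as 2^n times an odd number 2m+1."""
    k += 1
    n = 0
    while k % 2 == 0:
        k //= 2
        n += 1
    return n, (k - 1) // 2

def encode_graph(graph):
    """Turn a finite graph (a set of input-output pairs) into a set of naturals."""
    return {pair(x, y) for (x, y) in graph}

print(encode_graph({(0, 1), (1, 2), (2, 3)}))
```

Every positive integer factors uniquely as a power of two times an odd number, which is why `unpair` recovers exactly one pair for each natural number.
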
Scott defines the continuous functions on $\mathcal{P}(\mathbb{N})$ as those functions $h$ that satisfy
\[h(f) = \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \hspace{1in} (2) \label{eq:cont-pn}\]
In other words, the value of a continuous function $h$ on some function $f \in \mathcal{P}(\mathbb{N})$ must be the same as the union of applying $h$ to all the finite subgraphs of $f$. One immediately wonders, why are the $\lambda$-definable functions continuous in this sense? Consider some $\lambda$ expression $h$ that takes as input a function $f$.

But $f$ is a function; an infinite object. What does it mean to “compute” with an “infinite” argument? In this case it means most simply that $h(f)$ is determined by asking of $f$ finitely many questions: $f(m_0), f(m_1), ..., f(m_{k-1})$. —Dana S. Scott, A type-theoretical alternative to ISWIM, CUCH, OWHY, 1969.

Put another way, if $h$ terminates and returns a result, then it will only have had a chance to call $f$ finitely many times. So it suffices to apply $h$ instead to a finite subset of the graph of $f$. However, we do not know up-front which subset of $f$ to use, but it certainly suffices to try all of them!
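
The following Python sketch checks equation (2) on a tiny example. It assumes a particular $h$, here called `twice_at_zero`, which computes the possible results of $f(f(0))$ from a graph, and it uses a finite $f$ so that all of its subgraphs can be enumerated; both choices are made only for illustration.

```python
from itertools import combinations

def finite_subsets(graph):
    """Enumerate every subset of a finite graph."""
    items = list(graph)
    return (frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r))

def twice_at_zero(g):
    """h(g): every c such that g maps 0 to some b and b to c."""
    return {c for (a, b) in g if a == 0 for (b2, c) in g if b2 == b}

# A finite piece of the graph of the add-one function.
f = frozenset({(0, 1), (1, 2), (2, 3)})

direct = twice_at_zero(f)
via_finite_parts = set().union(*(twice_at_zero(g) for g in finite_subsets(f)))
print(direct, via_finite_parts)
```

Only the subgraphs that contain both $0 \mapsto 1$ and $1 \mapsto 2$ contribute anything, which mirrors the point that $h$ asks only finitely many questions of $f$.
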

Relating the two kinds of continuity

But what does equation (2) have to do with continuous functions over the real numbers? What does it have to do with the $\epsilon$-$\delta$ definition? This question has been in the back of my mind for some time, but only recently have I had the opportunity to learn the answer.
To understand how these two kinds of continuity are related, it helps to focus on the way that infinite entities can be approximated with finite ones in the two settings. We can approximate a real number with a rational interval. For example, referring back to the Cauchy sequence for $\sqrt 2$, equation (1), its first two elements bracket $\sqrt 2$. (The later elements, such as $\frac{17}{12} \approx 1.4167$, all lie slightly above $\sqrt 2 \approx 1.4142$.) \[\sqrt 2 \in \left(1, \frac{3}{2}\right)\] Of course an approximation does not uniquely identify the thing it approximates. So there are other real numbers in this interval, such as $\sqrt{2.1}$. \[\sqrt{2.1} \in \left(1, \frac{3}{2}\right)\]
Likewise we can approximate the infinite graph of a function with a finite part of its graph. For example, let $G$ be a graph with just one input-output entry. \[G=\{ 1 \mapsto 2 \}\] Then we consider $G$ to be an approximation of any function that agrees with $G$ (maps $1$ to $2$), which is to say its graph is a superset of $G$. So the set of all functions that are approximated by $G$ can be expressed with a set comprehension as follows: $\{ f \mid G \subseteq f\}$. In particular, the function $+1$ that adds one to its input is approximated by $G$. \[\left\{ \begin{array}{l} 0\mapsto 1, \\ 1\mapsto 2, \\ 2\mapsto 3, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\] But also the function $\times 2$ that doubles its input is approximated by $G$. \[\left\{ \begin{array}{l} 0\mapsto 0, \\ 1\mapsto 2, \\ 2\mapsto 4, \\ \quad\,\vdots \end{array} \right\} \in \{ f \mid G \subseteq f\}\] Of course, a better approximation such as $G'=\{1\mapsto 2, 2\mapsto 3\}$ is able to tell these two functions apart.
The interval $(1, 3/2)$ and the set $\{f\mid G \subseteq f\}$ are both examples of neighborhoods (a.k.a. base elements) in a topological space. The field of Topology was created to study the essence of continuous functions, capturing the similarities and abstracting away the differences regarding how such functions work in different settings. A topological space is just some set $X$ together with a collection $B$ of neighborhoods, called a base, that must satisfy a few conditions that we won’t get into. We’ve already seen two topological spaces.

- The real numbers form a topological space where each neighborhood consists of all the real numbers in a rational interval.
- The powerset $\mathcal{P}(\mathbb{N})$ forms a topological space where each neighborhood consists of all the functions approximated by a finite graph.
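
Membership in a neighborhood of the second kind is easy to test: a function belongs to the neighborhood of a finite graph exactly when it agrees with every entry of that graph. Here is a minimal Python sketch; the name `approximates` and the use of ordinary Python functions to stand in for infinite graphs are assumptions of this example.

```python
def approximates(finite_graph, f):
    """True when f agrees with every input-output entry of the finite graph."""
    return all(f(x) == y for (x, y) in finite_graph)

G = {(1, 2)}
G_prime = {(1, 2), (2, 3)}

def add_one(x):
    return x + 1

def double(x):
    return 2 * x

print(approximates(G, add_one), approximates(G, double))
print(approximates(G_prime, add_one), approximates(G_prime, double))
```

The one-entry graph cannot distinguish the two functions, whereas the two-entry graph rules out doubling.
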

The $\epsilon$-$\delta$ definition of continuity generalizes to topological spaces: instead of talking about intervals, it talks generically about neighborhoods. In the following, the interval $(f(x) -\epsilon, f(x) + \epsilon)$ is replaced by neighborhood $E$ and the interval $(x - \delta,x+\delta)$ is replaced by neighborhood $D$.

A function $f$ is continuous at a point $x$ if for any neighborhood $E$ that contains $f(x)$, there exists a neighborhood $D$ that contains $x$ such that for any $y$ in $D$, $f(y)$ is in $E$.

Now let us instantiate this topological definition of continuity into $\mathcal{P}(\mathbb{N})$.

A function $f$ over $\mathcal{P}(\mathbb{N})$ is continuous at $X$ if for any finite set $E$ such that $E \subseteq f(X)$, there exists a finite set $D$ with $D \subseteq X$ such that for any $Y$, $D \subseteq Y$ implies $E \subseteq f(Y)$.

Hmm, this still doesn’t match up with the definition of continuity in equation (2) but perhaps they are equivalent. Let us take the above as the definition and try to prove equation (2).
First we show that \[h(f) \subseteq \bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}\] Let $x'$ be an arbitrary element of $h(f)$. To show that $x'$ is in the right-hand side we need to identify some finite $g$ such that $g \subseteq f$ and $x' \in h(g)$, that is, $\{x'\} \subseteq h(g)$. But this is just what continuity gives us, taking $h$ as $f$, $f$ as $X$, $\{x'\}$ as $E$, $g$ as $D$, and also $g$ as $Y$. Second we need to show that \[\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \} \subseteq h(f)\] This time let $x'$ be an element of $\bigcup \{ h(g) \mid g \subseteq_{\mathit{fin}} f \}$. So we know there is some finite set $g$ such that $x' \in h(g)$ and $g \subseteq f$. Of course $\{x'\}$ is a finite set and $\{x'\} \subseteq h(g)$, so we can apply the definition of continuity at $g$ to obtain a finite set $D$ such that $D \subseteq g$ and for all $Y$, $D \subseteq Y$ implies $\{x'\} \subseteq h(Y)$. From $D \subseteq g$ and $g \subseteq f$ we transitively have $D \subseteq f$. So instantiating $Y$ with $f$ we have $\{x'\} \subseteq h(f)$ and therefore $x' \in h(f)$.
We have shown that the topologically-derived definition of continuity for $\mathcal{P}(\mathbb{N})$ implies the definition used in the semantics of the $\lambda$-calculus, i.e., equation (2). It is also straightforward to prove the other direction, taking equation (2) as given and proving that the topologically-derived definition holds. Thus, continuity for functions over real numbers really is similar to continuity for $\lambda$ functions: they are both instances of continuous functions in a topological space.

Continuous functions over partial orders

In the context of Denotational Semantics, domains are often viewed as partial orders where the ordering $g \sqsubseteq f$ means that $g$ approximates $f$, or $f$ is more informative than $g$. The domain $\mathcal{P}(\mathbb{N})$ with set containment $\subseteq$ forms a partial order. Referring back to the examples in the previous section, with $G=\{ 1 \mapsto 2 \}$ and $G'=\{1\mapsto 2, 2\mapsto 3\}$, we have $G \sqsubseteq G'$, $G' \sqsubseteq +1$, and $G \sqsubseteq \times 2$. In a partial order, the join $x \sqcup y$ of $x$ and $y$ is the least element that is greater than or equal to both $x$ and $y$. For the partial order on $\mathcal{P}(\mathbb{N})$, join corresponds to set union.
In the context of partial orders, continuity is defined with respect to infinite sequences of ever-better approximations: \[f_0 \sqsubseteq f_1 \sqsubseteq f_2 \sqsubseteq \cdots\] A function $h$ is continuous if applying it to the join of the sequence is the same as applying it to each element of the sequence and then taking the join.
\[h\left(\bigsqcup_{n\in\mathbb{N}} f_n\right) = \bigsqcup_{n\in\mathbb{N}} h(f_n) \hspace{1in} (3) \label{eq:cont-cpo}\]
But this equation is not so different from the equation (2) that expresses continuity on $\mathcal{P}(\mathbb{N})$. For any function $f$ (with infinite domain) we can find a sequence $(f_n)_{n=0}^{\infty}$ of ever-better but still finite approximations of $f$ such that \[f = \bigsqcup_{n\in\mathbb{N}} f_n\] Then both equation (2) and (3) tell us that $h(f)$ is equal to the union of applying $h$ to each $f_n$.
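
Equation (3) can be checked on a finite prefix of such a chain. In the Python sketch below, $f_n$ is taken to be the add-one graph restricted to the inputs $0, \ldots, n-1$, and $h$ is a hypothetical function `twice_at_zero` that collects the possible results of $f(f(0))$. A real chain is infinite, so this only illustrates the shape of the equation.

```python
def restrict_add_one(n):
    """f_n: the graph of the add-one function restricted to inputs 0..n-1."""
    return frozenset((x, x + 1) for x in range(n))

def twice_at_zero(g):
    """h(g): every c such that g maps 0 to some b and b to c."""
    return {c for (a, b) in g if a == 0 for (b2, c) in g if b2 == b}

# A finite prefix f_0, f_1, ..., f_5 of the chain of approximations.
chain = [restrict_add_one(n) for n in range(6)]

# On graphs, join is set union.
join_of_chain = frozenset().union(*chain)

lhs = twice_at_zero(join_of_chain)                          # h applied to the join
rhs = set().union(*(twice_at_zero(f_n) for f_n in chain))   # join of the h(f_n)
print(lhs, rhs)
```
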

Putting the Function back in Lambda

2017-12-23T20:04:00.000-08:00

Happy holidays! There’s nothing quite like curling up in a comfy chair on a rainy day and proving a theorem in your favorite proof assistant.

Lately I’ve been interested in graph models of the $\lambda$-calculus, that is, models that represent a $\lambda$ with relations from inputs to outputs. The use of relations instead of functions is not a problem when reasoning about expressions that produce numbers, but it does introduce problems when reasoning about expressions that produce higher-order functions. Some of these expressions are contextually equivalent but not denotationally equivalent. For example, consider the following two expressions. \[{\lambda f.\,} (f {\;}0) + (f {\;}0) =_{\mathrm{ctx}} {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) \qquad\qquad (1)\] The expression on the left-hand side has two copies of a common subexpression $(f {\;}0)$. The expression on the right-hand side is optimized to have just a single copy of $(f {\;}0)$. The left and right-hand expressions in equation (1) are contextually equivalent because the $\lambda$-calculus is a pure language (no side effects), so whether we call $f$ once or twice does not matter, and it always returns the same result given the same input. Unfortunately, the two expressions in equation (1) are not denotationally equivalent. \[{\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset \neq {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset \qquad\qquad (2)\] Recall that my semantics $\mathcal{E}$ maps an expression and environment to a set of values. The “set” is not because an expression produces multiple conceptually-different values. Sets are needed because we represent an (infinite) function as an infinite set of finite relations. So to prove the above inequality (2) we simply need to find a value that is in the set on the left-hand side that is not in the set on the right-hand side. The idea is that we consider the behavior when parameter $f$ is bound to a relation that is not a function.
In particular, consider the relation \[R = \{ (0,1), (0,2) \}\] Now when we consider the application $(f {\;}0)$, the semantics of function application given by $\mathcal{E}$ can choose the result to be either $1$ or $2$. Furthermore, for the left-hand side of equation (2), it could choose $1$ for the first $(f {\;}0)$ and $2$ for the second $(f {\;}0)$. Thus, the result of the function can be $3$. \[\{ (R,3) \} \in {\mathcal{E}[\![ {\lambda f.\,} (f {\;}0) + (f {\;}0) ]\!]}\emptyset\] Of course, this function could never actually produce $3$ because $R$ does not correspond to any $\lambda$. In other words, garbage-in garbage-out. Turning to the right-hand side of equation (2), there is only one $(f{\;}0)$, which can either produce $1$ or $2$, so the result of the outer function can be $2$ or $4$, but not $3$.

\[\begin{aligned} \{ (R,2) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,3) \} &\notin {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\\ \{ (R,4) \} &\in {\mathcal{E}[\![ {\lambda f.\,} ({\lambda x.\,} x + x) {\;}(f {\;}0) ]\!]}\emptyset\end{aligned}\]
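
The difference is easy to reproduce in a few lines of Python. This is only a toy model of the two function bodies, not the semantics $\mathcal{E}$ itself; the helper `apply_relation` is made up for the example and returns every output that a relation associates with an input.

```python
R = {(0, 1), (0, 2)}

def apply_relation(rel, arg):
    """All outputs that the relation associates with the given input."""
    return {out for (inp, out) in rel if inp == arg}

# Left-hand side: (f 0) + (f 0), where each call may choose independently.
left = {a + b for a in apply_relation(R, 0) for b in apply_relation(R, 0)}

# Right-hand side: (lambda x. x + x) (f 0), where the single call is shared.
right = {x + x for x in apply_relation(R, 0)}

print(sorted(left), sorted(right))
```
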

So we need to put the function back in $\lambda$! That is, we need to restrict the notion of values so that all the relations are also functions. Recall the definition: a function $f$ is a relation on two sets $A$ and $B$ such that for all $a \in A$ there exists a unique $b \in B$ such that $(a,b) \in f$. In other words, if $(a,b) \in f$ and $(a,b') \in f$, then necessarily $b = b'$. Can we simply add this restriction to our notion of value? Not quite. If we literally applied this definition, we could still get graphs such as the following one, which maps two different approximations of the add-one function to different outputs. This graph does not correspond to any $\lambda$. \[\{ (\{(0,1)\}, 2), (\{(0,1),(5,6) \}, 3) \}\]

So we need to generalize the notion of function to allow for differing approximations. We shall do this by generalizing from equality to consistency, written $\sim$. Two integers are consistent when they are equal. Two graphs are consistent when they map consistent inputs to consistent outputs. We are also forced to explicitly define inconsistency, which we explain below.

\[\begin{gathered} \frac{}{n \sim n} \qquad \frac{\begin{array}{l}\forall v_1 v'_1 v_2 v'_2, (v_1,v'_1) \in t_1 \land (v_2,v'_2) \in t_2 \\ \implies (v_1 \sim v_2 \land v'_1 \sim v'_2) \lor v_1 \not\sim v_2 \end{array}} {t_1 \sim t_2} \\[2ex] \frac{n_1 \neq n_2}{n_1 \not\sim n_2} \qquad \frac{(v_1,v'_1) \in t_1 \quad (v_2,v'_2) \in t_2 \quad v_1 \sim v_2 \quad v'_1 \not\sim v'_2} {t_1 \not\sim t_2} \\[2ex] \frac{}{n \not\sim t} \qquad \frac{}{t \not\sim n}\end{gathered}\]

The definition of consistency is made a bit more complicated than I expected because the rules of an inductive definition must be monotonic, so we can’t negate a recursive application or put it on the left of an implication. In the above definition of consistency for graphs $t_1 \sim t_2$, it would have been more natural to say $v_1 \sim v_2 \implies v'_1 \sim v'_2$ in the premise, but then $v_1 \sim v_2$ is on the left of an implication. The above inductive definition works around this problem by mutually defining consistency and inconsistency. We then prove that inconsistency is the negation of consistency.

Proposition 1 (Inconsistency) $v_1 \not\sim v_2 = \neg (v_1 \sim v_2)$
Proof. We first establish by mutual induction that $v_1 \sim v_2 \implies \neg (v_1 \not\sim v_2)$ and $v_1 \not\sim v_2 \implies \neg (v_1 \sim v_2)$. We then show that $(v_1 \sim v_2) \lor (v_1 \not\sim v_2)$ by induction on $v_1$ and case analysis on $v_2$. Therefore $\neg (v_1 \not\sim v_2) \implies v_1 \sim v_2$, so we have proved both directions of the desired equality. $\Box$
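
Outside of a proof assistant there is no monotonicity restriction, so consistency can be written as a single recursive function that uses the more natural implication directly (justified by Proposition 1). The Python sketch below assumes an encoding chosen for this example: integers are Python `int`s and graphs are `frozenset`s of pairs.

```python
def consistent(v1, v2):
    """Consistency of values: ints must be equal; graphs must map
    consistent inputs to consistent outputs; an int is never
    consistent with a graph."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 == v2
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return all((not consistent(a1, a2)) or consistent(b1, b2)
                   for (a1, b1) in v1
                   for (a2, b2) in v2)
    return False

g1 = frozenset({(0, 1)})
g2 = frozenset({(0, 1), (5, 6)})
g3 = frozenset({(0, 2)})

print(consistent(g1, g2), consistent(g1, g3), consistent(1, g1))
```
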

Armed with this definition of consistency, we can define a generalized notion of function, let’s call it $\mathsf{is\_fun}$. \[\mathsf{is\_fun}\;t \equiv \forall v_1 v_2 v'_1 v'_2, (v_1,v'_1) \in t \land (v_2,v'_2) \in t \land v_1 \sim v_2 \implies v'_1 \sim v'_2\] Next we restrict the notion of value to require the graphs to satisfy $\mathsf{is\_fun}$. Recall that we used to define values by the following grammar. \[\begin{array}{lrcl} \text{numbers} & n & \in & \mathbb{Z} \\ \text{graphs} & t & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \}\\ \text{values} & v & ::= & n \mid t \end{array}\] We keep this definition but add an inductive definition of a more refined notion of value, namely $\mathsf{is\_val}$. Numbers are values and graphs are values so long as they satisfy $\mathsf{is\_fun}$ and only map values to values.

\[\begin{gathered} \frac{}{\mathsf{is\_val}\,n} \qquad \frac{\mathsf{is\_fun}\;t \quad \forall v v', (v,v') \in t \implies \mathsf{is\_val}\,v \land \mathsf{is\_val}\,v'} {\mathsf{is\_val}\,t}\end{gathered}\]
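
As a sanity check of these definitions, here is a Python sketch of $\mathsf{is\_fun}$ and $\mathsf{is\_val}$. It assumes the same illustrative encoding as before (integers as `int`, graphs as `frozenset`s of pairs) and repeats a compact `consistent` so that the block is self-contained.

```python
def consistent(v1, v2):
    """Ints are consistent when equal; graphs when they map
    consistent inputs to consistent outputs."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 == v2
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return all((not consistent(a1, a2)) or consistent(b1, b2)
                   for (a1, b1) in v1 for (a2, b2) in v2)
    return False

def is_fun(t):
    """Entries of t with consistent inputs must have consistent outputs."""
    return all((not consistent(a1, a2)) or consistent(b1, b2)
               for (a1, b1) in t for (a2, b2) in t)

def is_val(v):
    """Numbers are values; a graph is a value when it satisfies is_fun
    and only maps values to values."""
    if isinstance(v, int):
        return True
    return is_fun(v) and all(is_val(a) and is_val(b) for (a, b) in v)

R = frozenset({(0, 1), (0, 2)})
bad = frozenset({(frozenset({(0, 1)}), 2),
                 (frozenset({(0, 1), (5, 6)}), 3)})
good = frozenset({(frozenset({(0, 1)}), 2),
                  (frozenset({(0, 1), (5, 6)}), 2)})

print(is_val(R), is_val(bad), is_val(good))
```

Both the relation $R$ and the graph that sends two approximations of add-one to different outputs are rejected, while a graph that sends them to the same output is accepted.
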

We are now ready to update our semantic function $\mathcal{E}$. The one change that we make is to require that each graph $t$ satisfies $\mathsf{is\_val}$ in the meaning of a $\lambda$. \[{\mathcal{E}[\![ {\lambda x.\,} e ]\!]}\rho = \{ t \mid \mathsf{is\_val}\;t \land \forall (v,v')\in t, v' \in {\mathcal{E}[\![ e ]\!]}\rho(x{:=}v) \}\] Hopefully this change to the semantics enables a proof that $\mathcal{E}$ is deterministic. Indeed, we shall show that if $v \in {\mathcal{E}[\![ e ]\!]}\rho$ and $v' \in {\mathcal{E}[\![ e ]\!]}\rho'$ for any suitably related $\rho$ and $\rho'$, then $v \sim v'$.

To relate $\rho$ and $\rho'$, we extend the definitions of consistency and $\mathsf{is\_val}$ to environments.

\[\begin{gathered} \emptyset \sim \emptyset \qquad \frac{v \sim v' \quad \rho \sim \rho'} {\rho(x{:=}v) \sim \rho'(x{:=}v')} \\[2ex] \mathsf{val\_env}\;\emptyset \qquad \frac{\mathsf{is\_val}\; v \quad \mathsf{val\_env}\;\rho} {\mathsf{val\_env}\;\rho(x{:=}v)}\end{gathered}\]

We will need a few small lemmas concerning these definitions and their relationship with the $\sqsubseteq$ ordering on values.

Proposition 2

If $\mathsf{val\_env}\;\rho$ and $\rho(x) = v$, then $\mathsf{is\_val}\; v$.
If $\rho \sim \rho'$, $\rho(x) = v$, $\rho'(x) = v'$, then $v \sim v'$.

Proposition 3

If $\mathsf{is\_val}\;v'$ and $v \sqsubseteq v'$, then $\mathsf{is\_val}\; v$.
If $v_1 \sqsubseteq v'_1$, $v_2 \sqsubseteq v'_2$, and $v'_1 \sim v'_2$, then $v_1 \sim v_2$.

We now come to the main theorem, which is proved by induction on $e$, using the above three propositions.

Theorem (Determinism of $\mathcal{E}$) If $v \in {\mathcal{E}[\![ e ]\!]}\rho$, $v' \in {\mathcal{E}[\![ e ]\!]}\rho'$, $\mathsf{val\_env}\;\rho$, $\mathsf{val\_env}\;\rho'$, and $\rho \sim \rho'$, then $\mathsf{is\_val}\;v$, $\mathsf{is\_val}\;v'$, and $v \sim v'$.

New revision of the semantics paper (POPL rejection, ESOP submission)

2017-10-15T14:54:00.000-07:00

My submission about declarative semantics to POPL was rejected. It's been a few weeks now, so I'm not so angry about it anymore. I've revised the paper and will be submitting it to ESOP this week.

The main reason for rejection according to the reviewers was a lack of technical novelty, but I think the real reasons were that 1) the paper came across as too grandiose and as a result, it accidentally annoyed the reviewer who is an expert in denotational semantics, and 2) the paper did not do a good job of comparing to the related set-theoretic models of Plotkin and Engeler.

Regarding 1), in the paper I use the term "declarative semantics" to try and distance this new semantics from the standard lattice-based denotational semantics. However, the reviewer took it to claim that the new semantics is not a denotational semantics, which is clearly false. In the new version of the paper I've removed the term "declarative semantics" and instead refer to the new semantics as a denotational semantics of the "elementary" variety. Also, I've toned down the sales pitch to better acknowledge that this new semantics is not the first elementary denotational semantics.

Regarding 2), I've revised the paper to include a new section at the beginning that gives background on the elementary semantics of Plotkin, Engeler, and Coppo et al. This should help put the contributions of the paper in context.

Other than that, I've added a section with a counter example to full abstraction. A big thanks to the POPL reviewers for the counter example! (Also thanks to Max New, who sent me the counter example a couple months ago.)

Unfortunately, the ESOP page limit is a bit shorter, so I removed the relational version of the semantics and also the part about mutable references.

A draft of the revision is available on arXiv. Feedback is most welcome, especially from experts in denotational semantics! I really hope that this version is no longer annoying, but if it is, please tell me!

Comparing to Plotkin and Engeler's Set-theoretic Models of the Lambda Calculus

2017-10-03T17:43:00.000-07:00

On the plane ride back from ICFP last month I had a chance to re-read and better understand Plotkin’s Set-theoretical and other elementary models of the $\lambda$-calculus (Technical Report 1972, Theoretical Computer Science 1993) and to read, for the first time, Engeler’s Algebras and combinators (Algebra Universalis 1981). As I wrote in my draft paper Declarative semantics for functional languages: compositional, extensional, and elementary, the main intuitions behind my simple semantics are present in these earlier papers, but until now I did not understand these other semantics deeply enough to give a crisp explanation of the similarities and differences. (The main intuitions are also present in the early work on intersection type systems, and my semantics is more closely related to those systems. A detailed explanation of that relationship is given in the draft paper.)

I should note that Engeler’s work was in the context of combinators (S and K), not the $\lambda$-calculus, but of course the $\lambda$-calculus can be encoded into combinators. I’ve ported his definitions to the $\lambda$-calculus, along the lines suggested by Plotkin (1993), to make for easier comparison. In addition, I’ll extend both Engeler and Plotkin’s semantics to include integers and integer arithmetic in addition to the $\lambda$-calculus. Here’s the syntax for the $\lambda$-calculus that we consider here: \[\begin{array}{rcl} && n \in \mathbb{Z} \qquad x \in \mathbb{X} \;\;\text{(program variables)}\\ \oplus & ::= & + \mid - \mid \times \mid \div \\ \mathbb{E} \ni e & ::= & n \mid e \oplus e \mid x \mid {\lambda x.\,} e \mid e \; e \mid {\textbf{if}\,}e {\,\textbf{then}\,}e {\,\textbf{else}\,}e \end{array}\]

Values

Perhaps the best place to start the comparison is in the definition of what I’ll call values. All three semantics give an inductive definition of values and all three involve finite sets, but in different ways. I’ll write $\mathbb{V}_S$ for my definition, $\mathbb{V}_P$ for Plotkin’s, and $\mathbb{V}_E$ for Engeler’s. \[\begin{aligned} \mathbb{V}_S &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_S \times \mathbb{V}_S) \\ \mathbb{V}_P &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_P) \times \mathcal{P}_f(\mathbb{V}_P) \\ \mathbb{V}_E &= \mathbb{Z} + \mathcal{P}_f(\mathbb{V}_E) \times \mathbb{V}_E\end{aligned}\] In $\mathbb{V}_S$, a function is represented as a finite graph, that is, a finite set of input-output pairs. For example, the graph $\{ (0,1), (1,2), (2,3) \}$ is one of the meanings for the term $(\lambda x.\, x + 1)$.

Plotkin’s values $\mathbb{V}_P$ include only a single input-output pair from a function’s graph. For example, $(\{0\}, \{1\})$ is one of the meanings for the term $(\lambda x.\, x + 1)$. Engeler’s values also include just a single entry. For example, $(\{0\}, 1)$ is one of the meanings for the term $(\lambda x.\, x + 1)$. In this example we have not made use of the finite sets in the input and output of Plotkin’s values. To do so, let us consider a higher-order example, such as the term $(\lambda f.\, f\,1 + f\,2)$. For Plotkin, the following value is one of its meanings: \[(\{ (\{1\}, \{3\}), (\{2\}, \{4\}) \}, \{7\})\] That is, in case $f$ is the function that adds $2$ to its input, the result is $7$. We see that the presence of finite sets in the input is needed to accommodate functions-as-input. The corresponding value in $\mathbb{V}_S$ is \[\{ (\{ (1, 3), (2, 4) \}, 7) \}\]

The difference between Plotkin and Engeler’s values can be seen in functions that return functions. Consider the $K$ combinator $(\lambda x.\,\lambda y.\, x)$. For Plotkin, the following value is one of its meanings: \[(\{1\}, \{ (\{0\},\{1\}), (\{2\},\{1\}) \})\] That is, when applied to $1$ it returns a function that returns $1$ when applied to either $0$ or $2$. The corresponding value in $\mathbb{V}_S$ is \[\{ (1, \{ (0,1), (2,1) \}) \}\] For Engeler, there is not a single value corresponding to the above value. Instead it requires two values to represent the same information. \[(\{1\}, (\{0\},1)) \quad\text{and}\quad (\{1\}, (\{2\},1))\] We’ll see later that it doesn’t matter that Engeler requires more values to represent the same information.

The Domains

The semantics of Plotkin, Engeler, and myself do not use single values for the domain, but instead sets of values. That is \[\mathcal{P}(\mathbb{V}_S) \qquad \mathcal{P}(\mathbb{V}_P) \qquad \mathcal{P}(\mathbb{V}_E)\]

The role of the outer $\mathcal{P}$ is intimately tied to the meaning of functions in Plotkin and Engeler’s semantics because the values themselves only record a single input-output pair. The outer $\mathcal{P}$ is needed to represent all of the input-output pairs for a given function. While the $\mathcal{P}$ is also necessary for functions in my semantics, one can view it generically as providing non-determinism and therefore somewhat orthogonal to the meaning of functions per se. Next let’s take a look at the semantics.

Comparing the Semantics

Here is Plotkin’s semantics $\mathcal{E}_P$. Let $V,V'$ range over finite sets of values. \[\begin{aligned} {\mathcal{E}_P[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_P[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_P[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_P[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,V') \mid V' \subseteq {\mathcal{E}_P[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_P[\![ e_1\;e_2 ]\!]}\rho &= \bigcup \left\{ V' \, \middle| \begin{array}{l} \exists V.\, (V,V') {\in} {\mathcal{E}_P[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_P[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_P[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_P[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_P[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_P[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] For Plotkin, the environment $\rho$ maps variables to finite sets of values. In the case for application, the input set $V$ must be a subset of the meaning of the argument, which is critical for enabling self application and, using the $Y$ combinator, general recursion. The $\bigcup$ flattens the set-of-finite-sets into a set.

Next we consider Engeler’s semantics $\mathcal{E}_E$. \[\begin{aligned} {\mathcal{E}_E[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_E[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_E[\![ x ]\!]}\rho &= \rho(x) \\ {\mathcal{E}_E[\![ {\lambda x.\,} e ]\!]}\rho &= \{ (V,v') \mid v' \in {\mathcal{E}_E[\![ e ]\!]}\rho(x{:=}V) \} \\ {\mathcal{E}_E[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v' \, \middle| \begin{array}{l} \exists V.\, (V,v') {\in} {\mathcal{E}_E[\![ e_1 ]\!]}\rho \land V {\subseteq} {\mathcal{E}_E[\![ e_2 ]\!]}\rho \end{array} \right\} \\ {\mathcal{E}_E[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_E[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_E[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_E[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] The semantics is quite similar to Plotkin’s, as again we see the use of $\subseteq$ in the case for application. Because the output $v'$ is just a value, and not a finite set of values as for Plotkin, there is no need for the $\bigcup$.
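To make the difference between the two application rules concrete, here is a small Python sketch that runs both on finite approximations of a denotation. The encoding is an assumption made for illustration: numbers are Python ints, a Plotkin entry is a pair `(V, V2)` of frozensets, an Engeler entry is a pair `(V, v2)` with a single output value, and the names `apply_plotkin` and `apply_engeler` are hypothetical.

```python
def apply_plotkin(d1, d2):
    """Union of all V2 such that (V, V2) is in d1 and V is a subset of d2."""
    out = set()
    for entry in d1:
        if isinstance(entry, tuple):      # integers are not function entries
            V, V2 = entry
            if V <= d2:
                out |= V2                 # the big union flattens the outputs
    return out


def apply_engeler(d1, d2):
    """All v2 such that (V, v2) is in d1 and V is a subset of d2."""
    out = set()
    for entry in d1:
        if isinstance(entry, tuple):
            V, v2 = entry
            if V <= d2:
                out.add(v2)               # a single value, so no flattening
    return out


# A finite approximation of the successor function in each style.
succ_p = {(frozenset({1}), frozenset({2})), (frozenset({2}), frozenset({3}))}
succ_e = {(frozenset({1}), 2), (frozenset({2}), 3)}
```

Applying either table to an argument whose meaning is `{1}` fires only the first entry, which is the subset test $V \subseteq \mathcal{E}[\![ e_2 ]\!]\rho$ at work.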

Finally we review my semantics $\mathcal{E}_S$. For it we need to define an ordering on values that is just equality for integers and $\subseteq$ on function graphs. Let $t$ range over $\mathcal{P}_{f}(\mathbb{V} \times \mathbb{V})$. \[\frac{}{n \sqsubseteq n} \qquad \frac{t_1 \subseteq t_2}{t_1 \sqsubseteq t_2}\] Then we define $\mathcal{E}_S$ as follows. \[\begin{aligned} {\mathcal{E}_S[\![ n ]\!]}\rho &= \{ n \} \\ {\mathcal{E}_S[\![ e_1 \oplus e_2 ]\!]}\rho &= \{ n_1 \oplus n_2 \mid n_1 \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land n_2 \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho \} \\ {\mathcal{E}_S[\![ x ]\!]}\rho &= \{ v \mid v \sqsubseteq \rho(x) \} \\ {\mathcal{E}_S[\![ {\lambda x.\,} e ]\!]}\rho &= \{ t \mid \forall (v,v')\in t.\, v' \in {\mathcal{E}_S[\![ e ]\!]}\rho(x{:=}v) \} \\ {\mathcal{E}_S[\![ e_1\;e_2 ]\!]}\rho &= \left\{ v \, \middle| \begin{array}{l} \exists t\, v_2\, v_3\, v_3'.\, t {\in} {\mathcal{E}_S[\![ e_1 ]\!]}\rho \land v_2 {\in} {\mathcal{E}_S[\![ e_2 ]\!]}\rho \\ \land\, (v_3, v_3') \in t \land v_3 \sqsubseteq v_2 \land v \sqsubseteq v_3' \end{array} \right\} \\ {\mathcal{E}_S[\![ {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 ]\!]}\rho &= \left\{ v\, \middle|\, \begin{array}{l} \exists n.\, n \in {\mathcal{E}_S[\![ e_1 ]\!]}\rho \\ \land\, (n\neq 0 \implies v \in {\mathcal{E}_S[\![ e_2 ]\!]}\rho)\\ \land\, (n=0 \implies v \in {\mathcal{E}_S[\![ e_3 ]\!]}\rho) \end{array} \right\}\end{aligned}\] In my semantics, $\rho$ maps a variable to a single value. The $v_3 \sqsubseteq v_2$ in my semantics corresponds to the uses of $\subseteq$ in Plotkin and Engeler’s. One can view this as a kind of subsumption, allowing the use of a larger approximation of a function in places where a smaller approximation is needed. I’m not sure whether all the other uses of $\sqsubseteq$ are necessary, but the semantics needs to be downward closed, and the above placement of $\sqsubseteq$’s makes this easy to prove.

Relational Semantics

For people like myself with a background in operational semantics, there is another view of the semantics that is helpful to look at. We can turn the above denotational semantics into a relational semantics (like a big-step semantics) that hides the $\mathcal{P}$ by making use of the following isomorphism (where $\mathbb{V}$ is one of $\mathbb{V}_S$, $\mathbb{V}_P$, or $\mathbb{V}_E$). \[\mathbb{E} \to (\mathbb{X}\rightharpoonup \mathbb{V}) \to {\mathcal{P}(\mathbb{V})} \quad\cong\quad \mathcal{P}(\mathbb{E} \times (\mathbb{X}\rightharpoonup \mathbb{V}) \times \mathbb{V})\] Let $v$ range over $\mathbb{V}$. We can define the semantic relation $\rho \vdash_S e \Rightarrow v$ that corresponds to $\mathcal{E}_S$ as follows. Note that in the rule for lambda abstraction, the table $t$ comes out of thin air (it is existentially quantified), and that there is one premise in the rule per entry in the table, that is, we have the quantification $\forall(v,v') \in t$. \[\begin{gathered} \frac{}{\rho \vdash_S n \Rightarrow n} \quad \frac {\rho \vdash_S e_1 \Rightarrow n_1 \quad \rho \vdash_S e_2 \Rightarrow n_2} {\rho \vdash_S e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \sqsubseteq \rho(x)} {\rho \vdash_S x \Rightarrow v} \\[3ex] \frac{\forall (v,v'){\in} t.\; \rho(x{:=}v) \vdash_S e \Rightarrow v'} {\rho \vdash_S {\lambda x.\,}e \Rightarrow t} \quad \frac{\begin{array}{c}\rho \vdash_S e_1 \Rightarrow t \quad \rho \vdash_S e_2 \Rightarrow v_2 \\ (v_3,v'_3) \in t \quad v_3 \sqsubseteq v_2 \quad v \sqsubseteq v'_3 \end{array} } {\rho \vdash_S (e_1{\;}e_2) \Rightarrow v} \\[3ex] \frac{\rho \vdash_S e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_S e_2 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_S e_1 \Rightarrow 0 \quad \rho \vdash_S e_3 \Rightarrow v} {\rho \vdash_S {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\]

For comparison, let us also turn Plotkin’s semantics into a relation. \[\begin{gathered} \frac{}{\rho \vdash_P n \Rightarrow n} \quad \frac {\rho \vdash_P e_1 \Rightarrow n_1 \quad \rho \vdash_P e_2 \Rightarrow n_2} {\rho \vdash_P e_1 \oplus e_2 \Rightarrow n_1 \oplus n_2} \quad \frac {v \in \rho(x)} {\rho \vdash_P x \Rightarrow v} \\[3ex] \frac{\forall v' \in V'.\, \rho(x{:=}V) \vdash_P e \Rightarrow v'} {\rho \vdash_P {\lambda x.\,}e \Rightarrow (V,V')} \quad \frac{\begin{array}{c}\rho \vdash_P e_1 \Rightarrow (V,V') \quad \forall v_2 \in V.\, \rho \vdash_P e_2 \Rightarrow v_2 \\ v' \in V' \end{array} } {\rho \vdash_P (e_1{\;}e_2) \Rightarrow v'} \\[3ex] \frac{\rho \vdash_P e_1 \Rightarrow n \quad n \neq 0 \quad \rho \vdash_P e_2 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v} \quad \frac{\rho \vdash_P e_1 \Rightarrow 0 \quad \rho \vdash_P e_3 \Rightarrow v} {\rho \vdash_P {\textbf{if}\,}e_1 {\,\textbf{then}\,}e_2 {\,\textbf{else}\,}e_3 \Rightarrow v}\end{gathered}\] Recall that in Plotkin’s semantics, the environment maps variables to finite sets of values. The “set” is needed to handle the case of a function bound to a variable, but is just extra baggage when we have an integer bound to a variable. So in the variable rule we have $v \in \rho(x)$, which either extracts a singleton integer from $\rho(x)$, or extracts one input-output entry from a function’s graph. Moving on to the lambda rule, it only produces one input-output entry, but to handle the case when the output $V'$ is representing a function, we must build it up one entry at a time with the quantification $\forall v'\in V'$ and a finite but arbitrary number of premises. In the application rule we again have a finite number of premises, with $\forall v_2\in V$, and also the premise $v' \in V'$.

The relational version of Engeler’s semantics removes the need for quantification in the lambda rule, but the application rule still has $\forall v_2 \in V$. \[\begin{gathered} \frac{\rho(x{:=}V) \vdash_E e \Rightarrow v'} {\rho \vdash_E {\lambda x.\,}e \Rightarrow (V,v')} \quad \frac{\begin{array}{c}\rho \vdash_E e_1 \Rightarrow (V,v') \quad \forall v_2 \in V.\, \rho \vdash_E e_2 \Rightarrow v_2 \end{array} } {\rho \vdash_E (e_1{\;}e_2) \Rightarrow v'}\end{gathered}\]

Conclusion

My semantics is similar to Plotkin and Engeler’s in that

The domain is a set of values, and values are inductively defined and involve finite sets.
Self application is enabled by allowing a kind of subsumption on functions.

The really nice thing about all three semantics is that they are simple; very little mathematics is necessary to understand them, which is important pedagogically, practically (easier for practitioners to apply such semantics), and aesthetically (Occam’s razor!).

My semantics is different to Plotkin and Engeler’s in that

the definition of values places $\mathcal{P}_f$ so that functions are literally represented by finite graphs, and
environments map each variable to a single value, and
$\sqsubseteq$ is used instead of $\subseteq$ to enable self application.

The upshot of these (relatively minor) differences is that my semantics may be easier to understand.

POPL submission, pulling together these blog posts on semantics!

2017-07-13T08:44:00.002-07:00

Last week I submitted a paper to POPL 2018 about the new kind of denotational semantics that I've been writing about in this blog, which I am now calling declarative semantics. I think this approach to semantics has the potential to replace operational semantics for the purposes of language specification. The declarative semantics has the advantage of being compositional and extensional while, like operational semantics, using only elementary mathematics. Thus, the declarative semantics should be better than operational semantics for reasoning about programs and for reasoning about the language as a whole (i.e., its metatheory). The paper pulls together many of the blog posts, updates them, and adds a semantics for mutable references. The paper is available now on arXiv and the Isabelle mechanization is available here. I hope you enjoy it and I welcome your feedback!

Revisiting "well-typed programs cannot go wrong"

2017-06-07T21:04:00.000-07:00

Robin Milner proved that well-typed programs cannot go wrong in his 1978 paper A Theory of Type Polymorphism in Programming (Milner 1978). That is, he defined a type system and denotational semantics for the Exp language (a subset of ML) and then proved that the denotation of a well-typed program in Exp is not the “wrong” value. The “wrong” denotation signifies that a runtime type error occurred, so Milner’s theorem proves that the type system is strong enough to prevent all the runtime type errors that could occur in an Exp program. The denotational semantics used by Milner (1978) was based on the standard domain theory for an explicitly typed language with higher-order functions.

I have been exploring, over the last month, whether I can prove a similar theorem but using my new denotational semantics, and mechanize the proof in the Isabelle proof assistant. At first I tried to stay as close to Milner’s proof as possible, but in the process I learned that Milner’s proof is rather syntactic and largely consists of proving lemmas about how substitution interacts with the type system, which does not shed much light on the semantics of polymorphism.

Last week I decided to take a step back and try a more semantic approach and switch to a cleaner but more expressive setting, one with first-class polymorphism. So I wrote down a denotational semantics for System F (Reynolds 1974) extended with support for general recursion. The proof that well-typed programs cannot go wrong came together rather quickly. Today I finished the mechanization in Isabelle and it came in at just 539 lines for all the definitions, lemmas, and main proof. I’m excited to share the details of how it went! Spoiler: the heart of the proof turned out to be a lemma I call Compositionality because it looks a lot like the similarly-named lemma that shows up in proofs of parametricity.

Syntax

The types in the language include natural numbers, function types, universal types, and type variables. Regarding the variables, after some experimentation with names and locally nameless, I settled on good old DeBruijn indices to represent both free and bound type variables. \[\begin{array}{rcl} i,j & \in & \mathbb{N} \\ \sigma,\tau & ::= & \mathtt{nat} \mid \tau \to \tau \mid \forall\,\tau \mid i \end{array}\] So the type of the polymorphic identity function, normally written $\forall \alpha.\, \alpha \to \alpha$, is instead written $\forall \left(0 \to 0\right)$.

The syntax of expressions is as follows. I chose to use DeBruijn indices for term variables as well and left off all type annotations, but I don’t think that matters for our purposes here. \[\begin{array}{rcl} n & \in & \mathbb{N} \\ e & ::= & n \mid i \mid \lambda e \mid e\; e \mid \Lambda e \mid e [\,] \mid \mathtt{fix}\, e \end{array}\]

Denotational Semantics

The values in this language, described by the below grammar, include natural numbers, functions represented by finite lookup tables, type abstractions, and $\mathsf{wrong}$ to represent a runtime type error. \[\begin{array}{rcl} f & ::= & \{ (v_1,v'_1), \ldots, (v_n,v'_n) \} \\ o & ::= & \mathsf{none} \mid \mathsf{some}(v) \\ v & ::= & n \mid \mathsf{fun}(f) \mid \mathsf{abs}(o) \mid \mathsf{wrong} \end{array}\] A type abstraction $\mathsf{abs}(o)$ consists of an optional value, and not simply a value, because the body of a type abstraction might be a non-terminating computation.

We define the following information ordering on values so that we can reason about one lookup table being more or less-defined than another lookup table. We define $v \sqsubseteq v'$ inductively as follows.

\[\begin{gathered} n \sqsubseteq n \quad \frac{f_1 \subseteq f_2} {\mathsf{fun}(f_1) \sqsubseteq \mathsf{fun}(f_2)} \quad \mathsf{wrong} \sqsubseteq\mathsf{wrong} \\ \mathsf{abs}(\mathsf{none}) \sqsubseteq\mathsf{abs}(\mathsf{none}) \quad \frac{v \sqsubseteq v'} {\mathsf{abs}(\mathsf{some}(v)) \sqsubseteq\mathsf{abs}(\mathsf{some}(v'))}\end{gathered}\]
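The five rules translate directly into a recursive check. Below is a minimal Python sketch under an assumed encoding that is not part of the Isabelle mechanization: a number is an `int`, $\mathsf{fun}(f)$ is `("fun", f)` with `f` a frozenset of pairs, $\mathsf{abs}(\mathsf{none})$ is `("abs", None)`, $\mathsf{abs}(\mathsf{some}(v))$ is `("abs", v)`, and $\mathsf{wrong}$ is the string `"wrong"`. The name `below` is hypothetical.

```python
WRONG = "wrong"


def below(v, w):
    """Return True when v is below w according to the five rules above."""
    if isinstance(v, int) and isinstance(w, int):
        return v == w                      # numbers are only below themselves
    if v == WRONG and w == WRONG:
        return True
    if isinstance(v, tuple) and isinstance(w, tuple) and v[0] == w[0]:
        if v[0] == "fun":
            return v[1] <= w[1]            # table inclusion
        if v[0] == "abs":
            if v[1] is None or w[1] is None:
                return v[1] is None and w[1] is None
            return below(v[1], w[1])       # compare the bodies
    return False                           # different kinds are unrelated


small = ("fun", frozenset({(1, 2)}))
big = ("fun", frozenset({(1, 2), (2, 3)}))
```

With these two tables, `below(small, big)` holds but `below(big, small)` does not, and the same relationship carries over to `("abs", small)` and `("abs", big)`.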

The denotational semantics maps an expression to a set of values. Why a set and not just a single value? A single finite lookup table is not enough to capture the meaning of a lambda, but an infinite set of finite tables is. However, dealing with sets is somewhat inconvenient, so we mitigate this issue by working in a set monad. Also, to deal with $\mathsf{wrong}$ we need an error monad, so we use a combined set-and-error monad.

\[\begin{aligned} X := E_1 ; E_2 &\equiv \{ v \mid \exists v'. \, v' \in E_1, v' \neq \mathsf{wrong}, v \in E_2[v'/X] \} \\ & \quad \cup \{ v \mid v = \mathsf{wrong}, \mathsf{wrong} \in E_1 \} \\ \mathsf{return}(E) & \equiv \{ v \mid v \sqsubseteq E \} \\ X \leftarrow E_1; E_2 & \equiv \{ v \mid \exists v'.\, v' \in E_1, v \in E_2[v'/X]\}\end{aligned}\]

The use of $\sqsubseteq$ in $\mathsf{return}$ is to help ensure that the meaning of an expression is downward-closed with respect to $\sqsubseteq$. (The need for which is explained in prior blog posts.)

Our semantics will make use of a runtime environment $\rho$ that includes two parts, $\rho_1$ and $\rho_2$. The first part gives meaning to the term variables, for which we use a list of values (indexed by their DeBruijn number). The second part, for the type variables, is a list containing sets of values, as the meaning of a type will be a set of values. We define the following notation for dealing with runtime environments.

\[\begin{aligned} v{::}\rho \equiv (v{::}\rho_1, \rho_2) \\ V{::}\rho \equiv (\rho_1, V{::}\rho_2)\end{aligned}\]

We write $\rho[i]$ to mean either $\rho_1[i]$ or $\rho_2[i]$, which can be disambiguated based on the context of use.

To help define the meaning of $\mathtt{fix}\,e$, we inductively define a predicate named $\mathsf{iterate}$. Its first parameter is the meaning $L$ of the expression $e$, which is a mapping from an environment to a set of values. The second parameter is a runtime environment $\rho$ and the third parameter is a value that is the result of iteration.

\[\begin{gathered} \mathsf{iterate}(L, \rho, \mathsf{fun}(\emptyset)) \quad \frac{\mathsf{iterate}(L, \rho, v) \quad v' \in L(v{::}\rho)} {\mathsf{iterate}(L, \rho, v')}\end{gathered}\]

To help define the meaning of function application, we define the following $\mathsf{apply}$ function. \[\mathsf{apply}(V_1,V_2) \equiv \begin{array}{l} x_1 := V_1; \\ x_2 := V_2; \\ \mathsf{case}\,x_1\,\textsf{of}\\ \;\; \mathsf{fun}(f) \Rightarrow (x'_2,x'_3) \leftarrow f; \mathsf{if}\, x'_2 \sqsubseteq x_2 \, \mathsf{then}\, x'_3 \,\mathsf{else}\, \emptyset \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\]
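The monad operations and $\mathsf{apply}$ can be tried out on finite sets. The following Python sketch is an illustration only, under several assumptions: values are encoded as `int`, `("fun", frozenset_of_pairs)`, `("abs", None)` or `("abs", v)`, and the string `"wrong"`; the names `down`, `ret`, `bind_strict`, and `bind_set` are hypothetical; and the branch "$\mathsf{then}\, x'_3$" is read as $\mathsf{return}(x'_3)$.

```python
from itertools import combinations

WRONG = "wrong"


def down(v):
    """Finite set of all values below v in the information ordering."""
    if isinstance(v, tuple) and v[0] == "fun":
        entries = list(v[1])
        return {("fun", frozenset(c))
                for r in range(len(entries) + 1)
                for c in combinations(entries, r)}
    if isinstance(v, tuple) and v[0] == "abs" and v[1] is not None:
        return {("abs", u) for u in down(v[1])}
    return {v}  # numbers, wrong, and abs(none) are only below themselves


def ret(v):
    """return(v): the downward closure of a single value."""
    return down(v)


def bind_strict(e1, k):
    """X := E1; E2. Propagates wrong from E1, otherwise runs k on each value."""
    out = set()
    for v in e1:
        out |= {WRONG} if v == WRONG else k(v)
    return out


def bind_set(e1, k):
    """X <- E1; E2. Plain set bind with no special case for wrong."""
    out = set()
    for v in e1:
        out |= k(v)
    return out


def apply(V1, V2):
    def with_fun(x1):
        def with_arg(x2):
            if isinstance(x1, tuple) and x1[0] == "fun":
                # Fire every table entry whose input is below the argument.
                return bind_set(x1[1], lambda entry:
                                ret(entry[1]) if entry[0] in down(x2) else set())
            return ret(WRONG)              # applying a non-function goes wrong
        return bind_strict(V2, with_arg)
    return bind_strict(V1, with_fun)


succ = ("fun", frozenset({(1, 2), (2, 3)}))
```

For instance, `apply({succ}, {1})` yields `{2}`, applying the number `5` yields `{"wrong"}`, and an argument with empty meaning (a diverging argument) yields the empty set.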

The denotational semantics is given by the following function $E$ that maps an expression and environment to a set of values.

\[\begin{aligned} E[ n ]\rho &= \mathsf{return}(n) \\[1ex] E[ i ]\rho &= \mathsf{return}(\rho[i]) \\[1ex] E[ \lambda e ]\rho &= \{ v \mid \exists f.\, v = \mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f \Rightarrow \\ & \qquad\qquad \exists v_2.\, v_2 \in E[ e ] (v_1{::}\rho), v'_2 \sqsubseteq v_2\} \\[1ex] E[ e_1\; e_2 ] \rho &= \mathsf{apply}(E[ e_1 ]\rho, E[ e_2 ]\rho) \\[1ex] E[ \mathtt{fix}\,e ] \rho &= \{ v \mid \mathsf{iterate}(E[ e ], \rho, v) \} \\[1ex] E[ \Lambda e ] \rho &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V. v' \in E[ e ] (V{::}\rho) \} \\ & \quad\; \cup \{ v \mid v = \mathsf{abs}(\mathsf{none}), \forall V. E[ e ](V{::}\rho) = \emptyset \} \\[1ex] E[ e [\,] ] \rho &= \begin{array}{l} x := E [ e ] \rho;\\ \mathsf{case}\,x\,\mathsf{of} \\ \;\; \mathsf{abs}(\mathsf{none}) \Rightarrow \emptyset \\ \mid \mathsf{abs}(\mathsf{some}(v')) \Rightarrow \mathsf{return}(v') \\ \mid \_ \Rightarrow \mathsf{return}(\mathsf{wrong}) \end{array}\end{aligned}\]

We give meaning to types with the function $T$, which maps a type and an environment to a set of values. For this purpose, we only need the second part of the runtime environment, which gives meaning to type variables. Instead of writing $\rho_2$ everywhere, we’ll use the letter $\eta$. It is important to ensure that $T$ is downward closed, which requires some care either in the definition of $T[ \forall \tau ]\eta$ or in the definition of $T[ i ]\eta$. We have chosen to do this work in the definition of $T[ i ]\eta$, and let the definition of $T[ \forall \tau ]\eta$ quantify over any set of values $V$ to give meaning to its bound type variable.

\[\begin{aligned} T[ \mathtt{nat} ] \eta &= \mathbb{N} \\ T[ i ] \eta &= \begin{cases} \{ v \mid \exists v'.\, v' \in \eta[i], v \sqsubseteq v',v \neq \mathsf{wrong} \} &\text{if } i < |\eta| \\ \emptyset & \text{otherwise} \end{cases} \\ T[ \sigma\to\tau ] \eta &= \{ v\mid \exists f. \,v=\mathsf{fun}(f), \forall v_1 v'_2.\, (v_1,v'_2) \in f, v_1 \in T[\sigma]\eta \\ & \hspace{1.5in} \Rightarrow \exists v_2.\, v_2 \in T[\tau]\eta, v'_2 \sqsubseteq v_2 \} \\ T[ \forall\tau ] \eta &= \{ v \mid \exists v'.\, v = \mathsf{abs}(\mathsf{some}(v')), \forall V.\, v' \in T[\tau ] (V{::}\eta) \} \cup \{ \mathsf{abs}(\mathsf{none}) \} \end{aligned}\]

Type System

Regarding the type system, it is standard except perhaps how we deal with the DeBruijn representation of type variables. We begin with the definition of well-formed types. A type is well formed if all the type variables in it are properly scoped, which is captured by their indices being below a given threshold (the number of enclosing type variable binders, that is, $\Lambda$’s and $\forall$’s). More formally, we write $j \vdash \tau$ to say that type $\tau$ is well-formed under threshold $j$, and give the following inductive definition.

\[\begin{gathered} j \vdash \mathtt{nat} \quad \frac{j \vdash \sigma \quad j \vdash \tau}{j \vdash \sigma \to \tau} \quad \frac{j+1 \vdash \tau }{j \vdash \forall \tau} \quad \frac{i < j}{j \vdash i}\end{gathered}\]
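As a quick illustration, the four rules can be read as a recursive function on types. This Python sketch assumes a hypothetical encoding: `"nat"` for $\mathtt{nat}$, `("fun", s, t)` for $\sigma \to \tau$, `("all", t)` for $\forall\tau$, and a plain `int` for a type variable.

```python
def well_formed(j, ty):
    """Check the judgment j |- ty for the tuple encoding of types."""
    if ty == "nat":
        return True
    if isinstance(ty, int):
        return ty < j                      # the variable must be in scope
    if ty[0] == "fun":
        return well_formed(j, ty[1]) and well_formed(j, ty[2])
    if ty[0] == "all":
        return well_formed(j + 1, ty[1])   # the binder adds one variable
    raise ValueError(f"not a type: {ty!r}")


poly_id = ("all", ("fun", 0, 0))           # the type of the polymorphic identity
```

Here `well_formed(0, poly_id)` holds, whereas the body `("fun", 0, 0)` on its own is only well-formed under a threshold of at least 1.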

Our representation of the type environment is somewhat unusual. Because term variables are just DeBruijn indices, we can use a list of types (instead of a mapping from names to types). However, to keep track of the type-variable scoping, we also include with each type the threshold from its point of definition. Also, we need to keep track of the current threshold, so when we write $\Gamma$, we mean a pair where $\Gamma_1$ is a list and $\Gamma_2$ is a number. The list consists of pairs of types and numbers, so for example, $\Gamma_1[i]_1$ is a type and $\Gamma_1[i]_2$ is a number whenever $i$ is less than the length of $\Gamma_1$. We use the following notation for extending the type environment:

\[\begin{aligned} \tau :: \Gamma &\equiv ((\tau,\Gamma_2){::}\Gamma_1, \Gamma_2) \\ * :: \Gamma & \equiv (\Gamma_1, \Gamma_2 + 1)\end{aligned}\]

We write $\vdash \rho : \Gamma$ to say that environment $\rho$ is well-typed according to $\Gamma$ and define it inductively as follows.

\[\begin{gathered} \vdash ([],[]) : ([], 0) \quad \frac{\vdash \rho : \Gamma \quad v \in T[ \tau ] \rho_2} {\vdash v{::}\rho : \tau{::}\Gamma} \quad \frac{\vdash \rho : \Gamma} {\vdash V{::}\rho : *{::}\Gamma}\end{gathered}\]

The primary operation that we perform on a type environment is looking up the type associated with a term variable, for which we define the following function $\mathsf{lookup}$ that maps a type environment and DeBruijn index to a type. To make sure that the resulting type is well-formed in the current environment, we must increase all of its free type variables by the difference of the current threshold $\Gamma_2$ and the threshold at its point of definition, $\Gamma_1[i]_2$, which is accomplished by the shift operator $\uparrow^k_c(\tau)$ (Pierce 2002). \[\mathsf{lookup}(\Gamma,i) \equiv \begin{cases} \mathsf{some}(\uparrow^{k}_{0}(\Gamma_1[i]_1)) & \text{if } i < |\Gamma_1| \\ & \text{where } k = \Gamma_2 - \Gamma_1[i]_2 \\ \mathsf{none} & \text{otherwise} \end{cases}\]

To review, the shift operator is defined as follows.

\[\begin{aligned} \uparrow^{k}_{c}(\mathtt{nat}) &= \mathtt{nat} \\ \uparrow^{k}_{c}(i) &= \begin{cases} i + k & \text{if } c \leq i \\ i & \text{otherwise} \end{cases} \\ \uparrow^{k}_{c}(\sigma \to \tau) &= \uparrow^{k}_{c}(\sigma) \to \uparrow^{k}_{c}(\tau) \\ \uparrow^{k}_{c}(\forall \tau) &= \forall\, \uparrow^{k}_{c+1}(\tau)\end{aligned}\]
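To see how the shift operator and $\mathsf{lookup}$ work together, here is a small Python sketch. The encoding is an assumption for illustration: types are `"nat"`, `("fun", s, t)`, `("all", t)`, or an `int` index; a type environment is a pair `(entries, threshold)` where `entries` is a list of `(type, threshold_at_definition)` pairs; `None` stands in for $\mathsf{none}$; and `extend_term` and `extend_type` are hypothetical names for $\tau{::}\Gamma$ and $*{::}\Gamma$.

```python
def shift(k, c, ty):
    """Add k to every type variable in ty whose index is at least the cutoff c."""
    if ty == "nat":
        return ty
    if isinstance(ty, int):
        return ty + k if c <= ty else ty
    if ty[0] == "fun":
        return ("fun", shift(k, c, ty[1]), shift(k, c, ty[2]))
    if ty[0] == "all":
        return ("all", shift(k, c + 1, ty[1]))  # one more binder, raise the cutoff
    raise ValueError(f"not a type: {ty!r}")


def extend_term(ty, gamma):
    """tau :: Gamma. Record the type together with the current threshold."""
    entries, threshold = gamma
    return ([(ty, threshold)] + entries, threshold)


def extend_type(gamma):
    """* :: Gamma. Entering a type abstraction bumps the threshold."""
    entries, threshold = gamma
    return (entries, threshold + 1)


def lookup(gamma, i):
    """Type of term variable i, shifted into the current scope, or None."""
    entries, threshold = gamma
    if i < len(entries):
        ty, defined_at = entries[i]
        return shift(threshold - defined_at, 0, ty)
    return None


# Bind a term variable of type "type variable 0" under one type binder,
# then enter a second type binder.
gamma = extend_type(extend_term(0, extend_type(([], 0))))
```

In `gamma` the term variable was bound when the threshold was 1 and the current threshold is 2, so `lookup(gamma, 0)` shifts its type by one and returns the index `1`, which still refers to the outer type binder.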

Last but not least, we need to define type substitution so that we can use it in the typing rule for instantiation (type application). We write $[j\mapsto \tau]\sigma$ for the substitution of type $\tau$ for DeBruijn index $j$ within type $\sigma$ (Pierce 2002).

\[\begin{aligned} [j\mapsto \tau]\mathtt{nat} &= \mathtt{nat} \\ [j\mapsto\tau]i &= \begin{cases} \tau & \text{if } j = i \\ i - 1 & \text{if } j < i \\ i & \text{otherwise} \end{cases}\\ [j\mapsto\tau](\sigma\to\sigma') &= [j\mapsto\tau]\sigma \to [j\mapsto \tau]\sigma' \\ [j\mapsto \tau]\forall\sigma &= \forall\, [j+1 \mapsto \uparrow^{1}_{0}(\tau)]\sigma\end{aligned}\]
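Substitution can be sketched the same way. The Python below assumes the same hypothetical tuple encoding of types (`"nat"`, `("fun", s, t)`, `("all", t)`, `int` indices) and repeats `shift` so that the block is self-contained.

```python
def shift(k, c, ty):
    """Add k to every type variable in ty whose index is at least the cutoff c."""
    if ty == "nat":
        return ty
    if isinstance(ty, int):
        return ty + k if c <= ty else ty
    if ty[0] == "fun":
        return ("fun", shift(k, c, ty[1]), shift(k, c, ty[2]))
    return ("all", shift(k, c + 1, ty[1]))


def subst(j, tau, sigma):
    """[j |-> tau] sigma: replace index j by tau and lower the indices above j."""
    if sigma == "nat":
        return sigma
    if isinstance(sigma, int):
        if j == sigma:
            return tau
        if j < sigma:
            return sigma - 1               # the binder for j has been removed
        return sigma
    if sigma[0] == "fun":
        return ("fun", subst(j, tau, sigma[1]), subst(j, tau, sigma[2]))
    # Under a binder, the target index and the free variables of tau move up by one.
    return ("all", subst(j + 1, shift(1, 0, tau), sigma[1]))
```

For example, instantiating $\forall (0 \to 0)$ at $\mathtt{nat}$ corresponds to `subst(0, "nat", ("fun", 0, 0))`, which gives `("fun", "nat", "nat")`.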

Here is the type system for System F extended with $\mathtt{fix}$.

\[\begin{gathered} \Gamma \vdash n : \mathtt{nat} \qquad \frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau} \qquad \frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau} \\[2ex] \frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau} \\[2ex] \frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e : \forall\tau} \qquad \frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}\end{gathered}\]

We say that a type environment $\Gamma$ is well-formed if $\Gamma_2$ is greater than or equal to every threshold in $\Gamma_1$, that is, $\Gamma_1[i]_2 \leq \Gamma_2$ for all $i < |\Gamma_1|$.

Proof of well-typed programs cannot go wrong

The proof required 6 little lemmas and 4 big lemmas. (There were some itsy bitsy lemmas too that I’m not counting.)

Little Lemmas

Lemma [$\sqsubseteq$ is a preorder]

$v \sqsubseteq v$
If $v_1 \sqsubseteq v_2$ and $v_2 \sqsubseteq v_3$, then $v_1 \sqsubseteq v_3$.

[lem:less-refl] [lem:less-trans]

I proved transitivity by induction on $v_2$.

Lemma [$T$ is downward closed] If $v \in T [ \tau ] \eta$ and $v' \sqsubseteq v$, then $v' \in T [ \tau ] \eta$. [lem:T-down-closed]

The above is a straightforward induction on $\tau$.

Lemma [$\mathsf{wrong}$ not in $T$] For any $\tau$ and $\eta$, $\mathsf{wrong} \notin T [ \tau ] \eta$. [lem:wrong-not-in-T]

The above is another straightforward induction on $\tau$.

Lemma If $\vdash \rho : \Gamma$, then $\Gamma$ is a well-formed type environment. [lem:wfenv-good-ctx]

The above is proved by induction on the derivation of $\vdash \rho : \Gamma$.

Lemma \[T [ \tau ] (\eta_1 \eta_3) = T [ \uparrow^{|\eta_2|}_{ |\eta_1|}(\tau) ] (\eta_1\eta_2\eta_3)\]

The above lemma is proved by induction on $\tau$. It took me a little while to figure out the right strengthening of the statement of this lemma to get the induction to go through. The motivations for this lemma were the following corollaries.

Corollary [Lift/Append Preserves $T$] \[T [ \tau ](\eta_2) = T [ \uparrow^{|\eta_1|}_{0}(\tau) ] (\eta_1\eta_2)\] [lem:lift-append-preserves-T]

Corollary [Lift/Cons Preserves $T$] \[T [ \tau ] (\eta) = T [ \uparrow^{1}_{0}(\tau) ] (V{::}\eta)\] [lem:shift-cons-preserves-T]

Of course, two shifts can be composed into a single shift by adding the amounts.

Lemma [Compose Shift] \[\uparrow^{j+k}_{c}(\tau) = \uparrow^{j}_{c}( \uparrow^{k}_{c}(\tau))\] [lem:compose-shift]

The proof is a straightforward induction on $\tau$.

Big Lemmas

There are one or two big lemmas for each of the “features” in this variant of System F.

The first lemma shows that well-typed occurrences of term variables cannot go wrong.

Lemma [Lookup in Well-typed Environment]
If $\vdash \rho : \Gamma$ and $\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)$, then $\exists v.\, \rho_1[i] = v$ and $v \in T [ \tau ] \rho_2$. [lem:lookup-wfenv]

The proof is by induction on the derivation of $\vdash \rho : \Gamma$. The first two cases were straightforward but the third case required some work and used lemmas [lem:wfenv-good-ctx], [lem:shift-cons-preserves-T], and [lem:compose-shift].

Lemma [Application cannot go wrong] If $V \subseteq T [ \sigma \to \tau ] \eta$ and $V' \subseteq T [ \sigma ] \eta$, then $\mathsf{apply}(V,V') \subseteq T [ \tau ] \eta$. [lem:fun-app]

The proof of this lemma is direct and does not use induction. However, it does use lemmas [lem:wrong-not-in-T] and [lem:T-down-closed].

Lemma [Compositionality] Let $V = T [ \sigma ] (\eta_1\eta_2)$. \[T [ \tau ] (\eta_1 V \eta_2) = T [ [|\eta_1| \mapsto \sigma]\tau ] (\eta_1 \eta_2)\] [lem:compositionality]

I proved the Compositionality lemma by induction on $\tau$. All of the cases were straightforward except for $\tau=\forall\tau'$. In that case I used the induction hypothesis to show that \[T [ \tau' ] (V \eta_1 S \eta_2) = T [ [|V\eta_1|\mapsto \uparrow^1_0(\sigma)] \tau' ] (V\eta_1\eta_2) \text{ where } S = T [ \uparrow^1_0(\sigma) ] (V\eta_1\eta_2)\] and I used Lemma [lem:shift-cons-preserves-T].

Lemma [Iterate cannot go wrong] If

$\mathsf{iterate}(L,\rho,v)$ and
for any $v'$, $v' \in T[ \sigma\to\tau ] \rho_2$ implies $L(v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2$,

then $v \in T [ \sigma \to \tau ] \rho_2$. [lem:iterate-sound]

This was straightforward to prove by induction on the derivation of $\mathsf{iterate}(L,\rho,v)$. The slightly difficult part was coming up with the definition of $\mathsf{iterate}$ to begin with and formulating the second premise.

The Theorem

Theorem [Well-typed programs cannot go wrong]
If $\Gamma \vdash e : \tau$ and $\vdash \rho : \Gamma$, then $E [ e ] \rho \subseteq T[ \tau ] \rho_2$. [thm:welltyped-dont-go-wrong]

The proof is by induction on the derivation of $\Gamma \vdash e : \tau$.

$\Gamma \vdash n : \mathtt{nat}$

This case is immediate.
$\frac{\mathsf{lookup}(\Gamma,i) = \mathsf{some}(\tau)} {\Gamma \vdash i : \tau}$

Lemma [lem:lookup-wfenv] tells us that $\rho_1[i] = v$ and $v \in T [ \tau ] \rho_2$ for some $v$. We conclude by Lemma [lem:T-down-closed].
$\frac{\Gamma_2 \vdash \sigma \quad \sigma{::}\Gamma \vdash e : \tau} {\Gamma \vdash \lambda e : \sigma \to \tau}$

After unraveling some definitions, for arbitrary $f,v_1,v_2,v'_2$ we can assume $v_1 \in T [ \sigma ] \rho_2$, $v_2 \in E [ e ](v_1{::}\rho)$, and $v'_2 \sqsubseteq v_2$. We need to prove that $v_2 \in T [ \tau ] (v_1{::}\rho)_2$.

We can show $\vdash v_1{::}\rho : \sigma{::}\Gamma$ and therefore, by the induction hypothesis, $E [ e ] (v_1{::}\rho) \subseteq T [ \tau ] (v_1{::}\rho)_2$. So we conclude that $v_2 \in T [ \tau ] (v_1{::}\rho)_2$.
$\frac{\Gamma \vdash e : \sigma \to \tau \quad \Gamma \vdash e' : \sigma} {\Gamma \vdash e \; e' : \tau}$

By the induction hypothesis, we have $E [ e ] \rho \subseteq T [ \sigma\to\tau ] \rho_2$ and $E [ e' ] \rho \subseteq T [ \sigma ] \rho_2$. We conclude by Lemma [lem:fun-app].
$\frac{\Gamma_2 \vdash \sigma \to \tau \quad (\sigma\to \tau){::}\Gamma \vdash e : \sigma \to \tau } {\Gamma \vdash \mathtt{fix}\,e : \sigma \to \tau}$

For an arbitrary $v$, we may assume $\mathsf{iterate}(E[ e ], \rho, v)$ and need to show that $v \in T [ \sigma\to\tau ]\rho_2$.

In preparation to apply Lemma [lem:iterate-sound], we first prove that for any $v'$, $v' \in T[ \sigma\to\tau ] \rho_2$ implies $E[ e](v'{::}\rho) \subseteq T[ \sigma\to\tau ] \rho_2$. Assume $v'' \in E[ e](v'{::}\rho)$. We need to show that $v'' \in T[ \sigma\to\tau ] \rho_2$. We have $\vdash v'{::}\rho : (\sigma\to\tau){::}\Gamma$, so by the induction hypothesis $E [ e ](v'{::}\rho) \subseteq T[ \sigma\to\tau ](v'{::}\rho)_2$. Because $(v'{::}\rho)_2 = \rho_2$, we conclude that $v'' \in T[ \sigma\to\tau ] \rho_2$.

We then apply Lemma [lem:iterate-sound] to conclude this case.
$\frac{*::\Gamma \vdash e : \tau} {\Gamma \vdash \Lambda e : \forall\tau}$

After unraveling some definitions, for an arbitrary $v'$ and $V$ we may assume that $\forall V'.\, v' \in E [ e ](V'{::}\rho)$. We need to show that $v' \in T [ \tau ] (V{::}\rho_2)$. We have $\vdash V{::}\rho : *{::}\Gamma$, so by the induction hypothesis $E[ e ](V{::}\rho) \subseteq T [ \tau ] (V{::}\rho)_2$. Also, instantiating the assumption with $V' = V$, we have $v' \in E [ e ](V{::}\rho)$, so we can conclude.
$\frac{\Gamma \vdash e : \forall \tau} {\Gamma \vdash e[\,] : [0\mapsto\sigma]\tau}$

Fix a $v' \in E [ e ] \rho$. We have three cases to consider.
1. $v'=\mathsf{abs}(\mathsf{none})$. This case is immediate.
2. $v'=\mathsf{abs}(\mathsf{some}(v''))$ for some $v''$. By the induction hypothesis, $v' \in T [ \forall\tau ]\rho_2$. So we have $v'' \in T [ \tau ](V{::}\rho_2)$ where $V=T[\sigma]\rho_2$. Then by Compositionality (Lemma [lem:compositionality]) we conclude that $v'' \in T [ [0\mapsto \sigma]\tau]\rho_2$.
3. $v'$ is some other kind of value. This can’t happen because, by the induction hypothesis, $v' \in T [ \forall\tau ]\rho_2$.

References

Milner, Robin. 1978. “A Theory of Type Polymorphism in Programming.” Journal of Computer and System Sciences 17 (3): 348–75.

Pierce, Benjamin C. 2002. Types and Programming Languages. MIT Press.

Reynolds, John C. 1974. “Towards a Theory of Type Structure.” In Programming Symposium: Proceedings, Colloque Sur La Programmation, 19:408–25. LNCS. Springer-Verlag.

Consolidation of the Denotational Semantics and an Application to Compiler Correctness

2017-03-24T11:20:00.000-07:00

This is a two-part post. The second part depends on the first.

Part 1. Consolidation of the Denotational Semantics

As a matter of expediency, I've been working with two different versions of the intersection type system upon which the denotational semantics is based, one version with subsumption and one without. I had used the one with subsumption to prove completeness with respect to the reduction semantics whereas I had used the one without subsumption to prove soundness (for both whole programs and parts of programs, that is, contextual equivalence). The two versions of the intersection type system are equivalent. However, it would be nice to simplify the story and just have one version. Also, while the correspondence to intersection types has been enormously helpful in working out the theory, it would be nice to have a presentation of the semantics that doesn't talk about them and instead talks about functions as tables.

Towards these goals, I went back to the proof of completeness with respect to the reduction semantics and swapped in the "take 3" semantics. While working on that I realized that the subsumption rule was almost admissible in the "take 3" semantics; only the variable and application equations needed more uses of $\sqsubseteq$. With those changes in place, the proof of completeness went through without a hitch. So here's the updated definition of the denotational semantics of the untyped lambda calculus.

The definition of values remains the same as last time: \[ \begin{array}{lrcl} \text{function tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] as does the $\sqsubseteq$ operator. \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} For the denotation function $E$, we add uses of $\sqsubseteq$ to the equations for variables ($v \sqsubseteq \rho(x)$) and function application ($v_3 \sqsubseteq v_3'$). (I've also added the conditional expression $\mathbf{if}\,e_1\,e_2\,e_3$ and primitive operations on numbers $f(e_1,e_2)$, where $f$ ranges over binary functions on numbers.) \begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ v \mid v \sqsubseteq \rho(x) \} \\ E[\!| \lambda x.\, e |\!](\rho) &= \left\{ T \middle| \begin{array}{l} \forall v_1 v_2'. \, v_1\mapsto v_2' \in T \Rightarrow\\ \exists v_2.\, v_2 \in E[\!| e |\!](\rho(x{:=}v_1)) \land v_2' \sqsubseteq v_2 \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists T v_2 v_2' v_3'.\, T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, v'_2\mapsto v_3' \in T \land v'_2 \sqsubseteq v_2 \land v_3 \sqsubseteq v_3' \end{array} \right\} \\ E[\!| f(e_1, e_2) |\!](\rho) &= \{ f(n_1,n_2) \mid \exists n_1 n_2.\, n_1 \in E[\!| e_1 |\!](\rho) \land n_2 \in E[\!| e_2 |\!](\rho) \} \\ E[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](\rho) &= \left\{ v \, \middle| \begin{array}{l} \exists n.\, n \in E[\!| e_1 |\!](\rho) \text{ and} \\ v \in E[\!| e_2 |\!](\rho) \quad \text{if } n \neq 0 \\ v \in E[\!| e_3 |\!](\rho) \quad \text{if } n = 0 \end{array} \right\} \end{align*}
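To make the value representation concrete, here is a minimal Python sketch of values, the $\sqsubseteq$ operator, and the table lookup at the heart of the application equation. The names `table`, `below`, and `apply_table` are hypothetical (they do not come from the Isabelle development), and encoding tables as frozensets of pairs is just one convenient assumption.

```python
from typing import FrozenSet, Tuple, Union

# A value is a number or a function table, i.e. a finite set of
# (input, output) pairs. This encoding is an assumption of the sketch.
Value = Union[int, FrozenSet[Tuple["Value", "Value"]]]


def table(*entries):
    """Build a function table from (input, output) pairs."""
    return frozenset(entries)


def below(v1, v2) -> bool:
    """v1 ⊑ v2: equality on numbers, subset on function tables."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 == v2
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 <= v2
    return False


def apply_table(t, v2):
    """Collect every output v3' of an entry v2' ↦ v3' in t with v2' ⊑ v2.

    By the application equation, any v3 ⊑ v3' for such a v3' is then a
    possible result of applying t to v2.
    """
    return {out for (inp, out) in t if below(inp, v2)}


t = table((1, 1), (table(), 5))
print(apply_table(t, 1))                 # entries whose input is below 1
print(apply_table(t, table((3, 4))))     # the empty table is below any table
```

Note that the entry with the empty table as its input fires for any table argument, which is how a finite table can describe a higher-order function.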

Here are the highlights of the results for this definition.

Proposition (Admissibility of Subsumption)
If $v \in E[\!| e |\!] $ and $v' \sqsubseteq v$, then $v' \in E[\!| e |\!] $.

Theorem (Reduction implies Denotational Equality)

If $e \longrightarrow e'$, then $E[\!| e |\!] = E[\!| e' |\!]$.
If $e \longrightarrow^{*} e'$, then $E[\!| e |\!] = E[\!| e' |\!]$.

Theorem (Whole-program Soundness and Completeness)

If $v' \in E[\!| e |\!](\emptyset)$, then $e \longrightarrow^{*} v$ and $v' \in E[\!| v |\!](\emptyset)$ for some $v$.
If $e \longrightarrow^{*} v$, then $v' \in E[\!| e |\!](\emptyset) $ and $v' \in E[\!| v |\!](\emptyset) $ for some $v'$.

Proposition (Denotational Equality is a Congruence)
For any context $C$, if $E[\!| e |\!] = E[\!| e' |\!]$, then $E[\!| C[e] |\!] = E[\!| C[e'] |\!]$.

Theorem (Soundness wrt. Contextual Equivalence)
If $E[\!| e |\!] = E[\!| e' |\!]$, then $e \simeq e'$.

Part 2. An Application to Compiler Correctness

Towards finding out how useful this denotational semantics is, I've begun looking at using it to prove compiler correctness. I'm not sure exactly which compiler I want to target yet, but as a first step, I wrote a simple source-to-source optimizer $\mathcal{O}$ for the lambda calculus. It performs inlining and constant folding and simplifies conditionals. The optimizer is parameterized over the inlining depth to ensure termination. We perform optimization on the body of a function after inlining, so this is a polyvariant optimizer. Here's the definition. \begin{align*} \mathcal{O}[\!| x |\!](k) &= x \\ \mathcal{O}[\!| n |\!](k) &= n \\ \mathcal{O}[\!| \lambda x.\, e |\!](k) &= \lambda x.\, \mathcal{O}[\!| e |\!](k) \\ \mathcal{O}[\!| e_1\,e_2 |\!](k) &= \begin{array}{l} \begin{cases} \mathcal{O}[\!| [x{:=}e_2'] e |\!] (k{-}1) & \text{if } k \geq 1 \text{ and } e_1' = \lambda x.\, e \\ & \text{and } e_2' \text{ is a value} \\ e_1' \, e_2' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| f(e_1,e_2) |\!](k) &= \begin{array}{l} \begin{cases} f(n_1,n_2) & \text{if } e_1' = n_1 \text{ and } e_2' = n_2 \\ f(e_1',e_2') & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k) \end{array} \\ \mathcal{O}[\!| \mathbf{if}\,e_1\,e_2\,e_3 |\!](k) &= \begin{array}{l} \begin{cases} e_2' & \text{if } e_1' = n \text{ and } n \neq 0 \\ e_3' & \text{if } e_1' = n \text{ and } n = 0 \\ \mathbf{if}\,e_1'\, e_2'\,e_3' & \text{otherwise} \end{cases}\\ \text{where } e_1' = \mathcal{O}[\!|e_1 |\!](k) \text{ and } e_2' = \mathcal{O}[\!|e_2 |\!](k)\\ \text{ and } e_3' = \mathcal{O}[\!|e_3 |\!](k) \end{array} \end{align*}
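The definition of $\mathcal{O}$ translates fairly directly into code. The following Python sketch is only an illustration, not the Isabelle definition: the AST classes, `free_vars`, `subst`, and `optimize` are hypothetical names, "value" is assumed to mean a number or a $\lambda$, and the substitution renames binders with a generated suffix on the assumption that source programs never use such names.

```python
from dataclasses import dataclass
from itertools import count
from operator import add
from typing import Callable

@dataclass(frozen=True)
class Num: value: int
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Lam: param: str; body: object
@dataclass(frozen=True)
class App: fn: object; arg: object
@dataclass(frozen=True)
class Prim: op: Callable[[int, int], int]; left: object; right: object
@dataclass(frozen=True)
class If: cond: object; then: object; other: object

_fresh = count()  # supplies suffixes for renamed binders

def free_vars(e):
    if isinstance(e, Num): return set()
    if isinstance(e, Var): return {e.name}
    if isinstance(e, Lam): return free_vars(e.body) - {e.param}
    if isinstance(e, App): return free_vars(e.fn) | free_vars(e.arg)
    if isinstance(e, Prim): return free_vars(e.left) | free_vars(e.right)
    return free_vars(e.cond) | free_vars(e.then) | free_vars(e.other)

def subst(x, v, e):
    """Capture-avoiding substitution [x := v] e."""
    if isinstance(e, Num): return e
    if isinstance(e, Var): return v if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:
            return e  # x is shadowed, nothing to substitute
        if e.param in free_vars(v):
            fresh = f"{e.param}_{next(_fresh)}"  # rename to avoid capture
            body = subst(e.param, Var(fresh), e.body)
            return Lam(fresh, subst(x, v, body))
        return Lam(e.param, subst(x, v, e.body))
    if isinstance(e, App):
        return App(subst(x, v, e.fn), subst(x, v, e.arg))
    if isinstance(e, Prim):
        return Prim(e.op, subst(x, v, e.left), subst(x, v, e.right))
    return If(subst(x, v, e.cond), subst(x, v, e.then), subst(x, v, e.other))

def is_value(e):
    return isinstance(e, (Num, Lam))  # assumption: numbers and lambdas

def optimize(e, k):
    """Inline to depth k, fold constants, and simplify conditionals."""
    if isinstance(e, (Var, Num)):
        return e
    if isinstance(e, Lam):
        return Lam(e.param, optimize(e.body, k))
    if isinstance(e, App):
        f, a = optimize(e.fn, k), optimize(e.arg, k)
        if k >= 1 and isinstance(f, Lam) and is_value(a):
            # Inline, then re-optimize the body with a smaller depth budget.
            return optimize(subst(f.param, a, f.body), k - 1)
        return App(f, a)
    if isinstance(e, Prim):
        l, r = optimize(e.left, k), optimize(e.right, k)
        if isinstance(l, Num) and isinstance(r, Num):
            return Num(e.op(l.value, r.value))
        return Prim(e.op, l, r)
    c, t, o = optimize(e.cond, k), optimize(e.then, k), optimize(e.other, k)
    if isinstance(c, Num):
        return t if c.value != 0 else o
    return If(c, t, o)

# (λx. x + 1) 2 is inlined and folded when the depth budget allows it.
print(optimize(App(Lam("x", Prim(add, Var("x"), Num(1))), Num(2)), 1))
```

With `k = 0` the same application is left in place, which mirrors the "otherwise" branch of the application case.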

I've proved that this optimizer is correct. The first step was proving that it preserves denotational equality.

Lemma (Optimizer Preserves Denotations)
$E[\!| \mathcal{O}[\!| e |\!](k) |\!] = E[\!| e |\!]$
Proof
The proof is by induction on the termination metric for $\mathcal{O}$, which is the lexicographic ordering of $k$ then the size of $e$. All the cases are straightforward to prove because Reduction implies Denotational Equality and because Denotational Equality is a Congruence. QED

Theorem (Correctness of the Optimizer)
$\mathcal{O}[\!| e|\!](k) \simeq e$
Proof
The proof is a direct result of the above Lemma and Soundness wrt. Contextual Equivalence. QED

Of course, all of this is proved in Isabelle. Here is the tar ball. I was surprised that this proof of correctness for the optimizer was about the same length as the definition of the optimizer!

The Take 3 Semantics, Revisited

2017-03-10T20:31:00.000-08:00

In my post about intersection types as denotations, I conjectured that the simple "take 3" denotational semantics is equivalent to an intersection type system. I haven't settled that question per se, but I've done something just as good, which is to show that everything that I've done with the intersection type system can also be done with the "take 3" semantics (with a minor modification).

Recall that the main difference between the "take 3" semantics and the intersection type system is how subsumption of functions is handled. The "take 3" semantics defined function application as follows, using the subset operator $\sqsubseteq$ to require the argument $v_2$ to include all the entries in the parameter $v'_2$, while allowing $v_2$ to have possibly more entries. \begin{align*} E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v_3 \middle| \begin{array}{l} \exists v_1 v_2 v'_2.\, v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land\, \{ v'_2\mapsto v_3 \} \sqsubseteq v_1 \land v'_2 \sqsubseteq v_2 \end{array} \right\} \end{align*} Values are either numbers or functions. Functions are represented as finite tables mapping values to values. \[ \begin{array}{lrcl} \text{tables} & T & ::= & \{ v_1\mapsto v'_1,\ldots,v_n\mapsto v'_n \} \\ \text{values} & v & ::= & n \mid T \end{array} \] and $\sqsubseteq$ is defined as equality on numbers and subset for function tables: \begin{gather*} \frac{}{n \sqsubseteq n} \qquad \frac{T_1 \subseteq T_2}{T_1 \sqsubseteq T_2} \end{gather*} Recall that $\subseteq$ is defined in terms of equality on elements.

In an intersection type system (without subsumption), function application uses subtyping. Here's one way to formulate the typing rule for application: \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A' \to B \quad A <: A'} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Types are defined as follows \[ \begin{array}{lrcl} \text{types} & A,B,C & ::= & n \mid A \to B \mid A \land B \mid \top \end{array} \] and the subtyping relation is given below. \begin{gather*} \frac{}{n <: n}(a) \quad \frac{}{\top <: \top}(b) \quad \frac{}{A \to B <: \top}(c) \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'}(d) \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B}(e) \quad \frac{}{A \wedge B <: A}(f) \quad \frac{}{A \wedge B <: B}(g) \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)}(h) \end{gather*} Recall that values and types are isomorphic (and dual) to each other in this setting. Here are the functions $\mathcal{T}$ and $\mathcal{V}$ that map back and forth between values and types. \begin{align*} \mathcal{T}(n) &= n \\ \mathcal{T}( \{ v_1 \mapsto v'_1, \ldots, v_n \mapsto v'_n \} ) &= \mathcal{T}(v_1) {\to} \mathcal{T}(v'_1) \land \cdots \land \mathcal{T}(v_n) {\to} \mathcal{T}(v'_n) \\[2ex] \mathcal{V}(n) &= n \\ \mathcal{V}(A \to B) &= \{ \mathcal{V}(A)\mapsto\mathcal{V}(B) \} \\ \mathcal{V}(A \land B) &= \mathcal{V}(A) \cup \mathcal{V}(B)\\ \mathcal{V}(\top) &= \emptyset \end{align*}
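The two maps can be sketched in Python as follows. This is only an illustration under some assumptions: the class names and the functions `to_type` and `to_value` are hypothetical, tables are encoded as frozensets of pairs, the empty table is sent to $\top$, and an intersection involving a number type is rejected because $\mathcal{V}$ takes a union of tables there.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class TNum:
    n: int

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Top:
    pass


def to_type(v):
    """T: map a value (int or frozenset of pairs) to a type."""
    if isinstance(v, int):
        return TNum(v)
    arrows = [Arrow(to_type(a), to_type(b)) for (a, b) in v]
    if not arrows:
        return Top()  # assumption: the empty table corresponds to ⊤
    return reduce(And, arrows)  # intersection of one arrow per entry


def to_value(t):
    """V: map a type back to a value."""
    if isinstance(t, TNum):
        return t.n
    if isinstance(t, Arrow):
        return frozenset({(to_value(t.dom), to_value(t.cod))})
    if isinstance(t, And):
        left, right = to_value(t.left), to_value(t.right)
        if isinstance(left, int) or isinstance(right, int):
            raise ValueError("V is only defined on intersections of tables")
        return left | right
    return frozenset()  # V(⊤) is the empty table


v = frozenset({(1, 2), (frozenset(), 3)})
print(to_value(to_type(v)) == v)
```

The order in which table entries appear in the resulting intersection is arbitrary in this sketch, so only the round trip through `to_value` is canonical.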

Given that values and types are really the same, the typing rule for application is almost the same as the equation for the denotation of $E[\!| e_1\;e_2 |\!](\rho)$. The only real difference is the use of $<:$ versus $\sqsubseteq$. However, subtyping is a larger relation than $\sqsubseteq$, i.e., $v_1 \sqsubseteq v_2$ implies $\mathcal{T}(v_1) <: \mathcal{T}(v_2)$ but it is not the case that $A <: B$ implies $\mathcal{V}(A) \sqsubseteq \mathcal{V}(B)$. Subtyping is larger because of rules $(d)$ and $(h)$. The other rules just express the dual of $\subseteq$.

So the natural question is whether subtyping needs to be bigger than $\sqsubseteq$, or could we get by with just $\sqsubseteq$? In my last post, I mentioned that rule $(h)$ was not necessary. Indeed, I removed it from the Isabelle formalization without disturbing the proofs of whole-program soundness and completeness wrt. operational semantics, and was able to carry on and prove soundness wrt. contextual equivalence. This morning I also replaced rule $(d)$ with a rule that only allows equal function types to be subtypes. \[ \frac{}{A \to B <: A \to B}(d') \] The proofs went through again! Though I did have to make two minor changes in the type system without subsumption to ensure that it stays equivalent to the version of the type system with subsumption. I used the rule given above for function application instead of \[ \frac{\Gamma \vdash_2 e_1: C \quad \Gamma \vdash_2 e_2 : A \quad \quad C <: A \to B} {\Gamma \vdash_2 e_1 \; e_2 : B} \] Also, I had to change the typing rule for $\lambda$ to use subtyping to relate the body's type to the return type. \[ \frac{\Gamma,x:A \vdash e : B' \qquad B' <: B} {\Gamma \vdash \lambda x.\, e : A \to B} \] Transposing this back into the land of denotational semantics and values, we get the following equation for the meaning of $\lambda$, in which everything in the return specification $v_2$ must be contained in the value $v'_2$ produced by the body. \[ E[\!| \lambda x.\; e |\!] (\rho) = \left\{ v \middle| \begin{array}{l}\forall v_1 v_2. \{v_1\mapsto v_2\} \sqsubseteq v \implies \\ \exists v_2'.\; v'_2 \in E[\!| e |\!] (\rho(x{:=}v_1)) \,\land\, v_2 \sqsubseteq v'_2 \end{array} \right\} \]

So with this little change, the "take 3" semantics is a great semantics for the call-by-value untyped lambda calculus! For whole programs, it's sound and complete with respect to the standard operational semantics, and it is also sound with respect to contextual equivalence.

Sound wrt. Contextual Equivalence

2017-03-08T08:59:00.002-08:00

The ICFP paper submission deadline kept me busy for much of February, but now I'm back to thinking about the simple denotational semantics of the lambda calculus. In previous posts I showed that this semantics is equivalent to standard operational semantics when considering the behavior of whole programs. However, sometimes it is necessary to reason about the behavior of program fragments and we would like to use the denotational semantics for this as well. For example, an optimizing compiler might want to exchange one expression for another less-costly expression that does the same job.

The formal notion of two such "exchangeable" expressions is contextual equivalence (Morris 1968). It says that two expressions are equivalent if plugging them into an arbitrary context produces programs that behave the same.

Definition (Contextual Equivalence)
Two expressions $e_1$ and $e_2$ are contextually equivalent, written $e_1 \simeq e_2$, iff for any closing context $C$, \[ \mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]). \]

We would like to know that when two expressions are denotationally equal, then they are also contextually equivalent.

Theorem (Sound wrt. Contextual Equivalence)
If $E[e_1]\Gamma = E[e_2]\Gamma$ for any $\Gamma$, then $e_1 \simeq e_2$.

The rest of the blog post gives an overview of the proof (except for the discussion of related work at the very end). The details of the proof are in the Isabelle mechanization. But first we need to define the terms used in the above statements.

Definitions

Recall that our denotational semantics is defined in terms of an intersection type system. The meaning of an expression is the set of all types assigned to it by the type system. \[ E[e]\Gamma \equiv \{ A \mid \Gamma \vdash_2 e : A \} \] Recall that the types include singletons, functions, intersections, and a top type: \[ A,B,C ::= n \mid A \to B \mid A \land B \mid \top \] I prefer to think of these types as values, where the function, intersection, and top types are used to represent finite tables that record the input-output values of a function.

The intersection type system that we use here differs from the one in the previous post in that we remove the subsumption rule and sprinkle uses of subtyping elsewhere in a standard fashion (Pierce 2002).

\begin{gather*} \frac{}{\Gamma \vdash_2 n : n} \\[2ex] \frac{} {\Gamma \vdash_2 \lambda x.\, e : \top} \quad \frac{\Gamma \vdash_2 \lambda x.\, e : A \quad \Gamma \vdash_2 \lambda x.\, e : B} {\Gamma \vdash_2 \lambda x.\, e : A \wedge B} \\[2ex] \frac{x:A \in \Gamma}{\Gamma \vdash_2 x : A} \quad \frac{\Gamma,x:A \vdash_2 e : B} {\Gamma \vdash_2 \lambda x.\, e : A \to B} \\[2ex] \frac{\Gamma \vdash_2 e_1: C \quad C <: A \to B \quad \Gamma \vdash_2 e_2 : A} {\Gamma \vdash_2 e_1 \; e_2 : B} \\[2ex] \frac{\begin{array}{l}\Gamma \vdash_2 e_1 : A \quad A <: n_1 \\ \Gamma \vdash_2 e_2 : B \quad B <: n_2 \end{array} \quad [\!|\mathit{op}|\!](n_1,n_2) = n_3} {\Gamma \vdash_2 \mathit{op}(e_1,e_2) : n_3} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: 0 \quad \Gamma \vdash_2 e_3 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \\[2ex] \frac{\Gamma \vdash_2 e_1 : A \quad A <: n \quad n \neq 0 \quad \Gamma \vdash_2 e_2 : B} {\Gamma \vdash_2 \mathrm{if}\,e_1\,\mathrm{then}\,e_2\,\mathrm{else}\,e_3 : B} \end{gather*}

Regarding subtyping, we make a minor change and leave out the rule \[ \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \] because I had a hunch that it wasn't needed to prove Completeness with respect to the small step semantics, and indeed it was not. So the subtyping relation is defined as follows.

\begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \end{gather*}

This type system is equivalent to the one with subsumption in the following sense.

Theorem (Equivalent Type Systems)

If $\Gamma \vdash e : A$, then $\Gamma \vdash_2 e : A'$ and $A' <: A$ for some $A'$.
If $\Gamma \vdash_2 e : A$, then $\Gamma \vdash e : A$.

Proof
The proofs of the two parts are straightforward inductions on the derivations of the typing judgments. QED

This type system satisfies the usual progress and preservation properties.

Theorem (Preservation)
If $\Gamma \vdash_2 e : A$ and $e \longrightarrow e'$, then $\Gamma \vdash_2 e' : A'$ and $A' <: A$ for some $A'$.
Proof
The proof of preservation is by induction on the derivation of the reduction. The case for $\beta$ reduction relies on lemmas about substitution and type environments. QED

Theorem (Progress)
If $\emptyset \vdash_2 e : A$ and $\mathrm{FV}(e) = \emptyset$, then $e$ is a value or $e \longrightarrow e'$ for some $e'$.
Proof
The proof of progress is by induction on the typing derivation. As usual it relies on a canonical forms lemma. QED

Lemma (Canonical forms)
Suppose $\emptyset \vdash_2 v : A$.

If $A <: n$, then $v = n$.
If $A <: B \to C$, then $v = \lambda x.\, e$ for some $x,e$.

Next we turn to the definition of $\mathit{eval}$. As usual, we shall define the behavior of a program in terms of the operational (small-step) semantics and an $\mathit{observe}$ function. \begin{align*} \mathit{eval}(e) &= \begin{cases} \mathit{observe}(v) & \text{if } e \longrightarrow^{*} v \\ \mathtt{bad} & \text{otherwise} \end{cases}\\ \mathit{observe}(n) &= n \\ \mathit{observe}(\lambda x.\, e) &= \mathtt{fun} \end{align*} In the above we categorize programs as $\mathtt{bad}$ if they do not produce a value. Thus, we are glossing over the distinction between programs that diverge and programs that go wrong (e.g., segmentation fault). We do this because our denotational semantics does not make such a distinction. However, I plan to circle back to this issue in the future and develop a version of the semantics that does.
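As a rough illustration of $\mathit{eval}$ and $\mathit{observe}$, here is a minimal Python sketch for the core calculus with numbers. It makes several assumptions that are not in the definition above: terms are encoded as tagged tuples, the names `step` and `evaluate` are hypothetical, substitution is naive (which is only safe because closed programs substitute closed values), and divergence is approximated by a step budget `fuel`.

```python
def is_value(e):
    return e[0] in ("num", "lam")


def subst(x, v, e):
    """Naive substitution [x := v] e; adequate when v is closed."""
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(x, v, e[2]))
    return ("app", subst(x, v, e[1]), subst(x, v, e[2]))


def step(e):
    """One call-by-value reduction step, or None if e cannot step."""
    if e[0] != "app":
        return None
    f, a = e[1], e[2]
    if not is_value(f):
        f2 = step(f)
        return None if f2 is None else ("app", f2, a)
    if not is_value(a):
        a2 = step(a)
        return None if a2 is None else ("app", f, a2)
    if f[0] == "lam":
        return subst(f[1], a, f[2])  # beta reduction
    return None  # stuck: applying a number


def observe(v):
    return v[1] if v[0] == "num" else "fun"


def evaluate(e, fuel=1000):
    """eval(e): observe the value of e, or 'bad' if none is reached."""
    for _ in range(fuel):
        if is_value(e):
            return observe(e)
        e = step(e)
        if e is None:
            return "bad"  # stuck on a non-value
    # Out of fuel: treated as divergence unless we just reached a value.
    return observe(e) if is_value(e) else "bad"


identity = ("lam", "x", ("var", "x"))
print(evaluate(("app", identity, ("num", 7))))
```

Both a stuck program such as `("app", ("num", 1), ("num", 2))` and a looping one such as $(\lambda x.\, x\,x)\,(\lambda x.\, x\,x)$ come out as `"bad"`, matching the decision above to gloss over that distinction.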

Soundness wrt. Contextual Equivalence

We assume that $E[e_1]\Gamma = E[e_2]\Gamma$ for any $\Gamma$ and need to show that $e_1 \simeq e_2$. That is, we need to show that $\mathsf{eval}(C[e_1]) = \mathsf{eval}(C[e_2]) $ for any closing context $C$. We shall prove Congruence which lets us lift the denotational equality of $e_1$ and $e_2$ through any context, so we have \begin{equation} E[C[e_1]]\emptyset = E[C[e_2]]\emptyset \qquad\qquad (1) \end{equation} Now let us consider the cases for $\mathsf{eval}(C[e_1])$.

Case $\mathsf{eval}(C[e_1]) = \mathit{observe}(v)$ and $C[e_1] \longrightarrow^{*} v$:
By Completeness of the intersection type system we have $\emptyset \vdash_2 C[e_1] : A$ and $\emptyset \vdash_2 v : A'$ for some $A,A'$ such that $A' <: A$. Then with (1) we have \begin{equation} \emptyset \vdash_2 C[e_2] : A \qquad\qquad (2) \end{equation} The type system is sound wrt. the big-step semantics, so $\emptyset \vdash C[e_2] \Downarrow v'$ for some $v'$. Therefore $C[e_2] \longrightarrow^{*} v''$ because the big-step semantics is sound wrt. the small-step semantics. It remains to show that $\mathit{observe}(v'') = \mathit{observe}(v)$. From (2) we have $\emptyset \vdash_2 v'' : A''$ for some $A''$ where $A'' <: A$, by Preservation. Noting that we already have $\emptyset \vdash_2 v : A'$, $\emptyset \vdash_2 v'' : A''$, $A' <: A$, and $A'' <: A$, we conclude that $\mathit{observe}(v) = \mathit{observe}(v'')$ by the Lemma Observing values of subtypes.
Case $\mathsf{eval}(C[e_1]) = \mathtt{bad}$:
So $C[e_1]$ either diverges or gets stuck. In either case, we have $E[C[e_1]]\emptyset = \emptyset $ (Lemmas Diverging programs have no meaning and Programs that get stuck have no meaning). So by (1) we have $E[C[e_2]]\emptyset = \emptyset$. We conclude that $C[e_2]$ either diverges or gets stuck by Lemma (Programs with no meaning diverge or get stuck). Thus, $\mathsf{eval}(C[e_2]) = \mathtt{bad}$.

QED

Lemma (Congruence)
Let $C$ be an arbitrary context. If $E[e_1]\Gamma' = E[e_2]\Gamma'$ for any $\Gamma'$, then $E[C[e_1]]\Gamma = E[C[e_2]]\Gamma$.
Proof
We prove congruence by structural induction on the context $C$, using the induction hypothesis and the appropriate Compatibility lemma for each kind of expression. QED

Most of the Compatibility lemmas are straightforward, though the one for abstraction is worth discussing.

Lemma (Compatibility for abstraction)
If $E[e_1]\Gamma' = E[e_2]\Gamma'$ for any $\Gamma'$, then $E[\lambda x.\, e_1]\Gamma = E[\lambda x.\, e_2]\Gamma$.
Proof
To prove compatibility for abstractions, we first prove that

If $\Gamma' \vdash_2 e_1 : B$ implies $\Gamma' \vdash_2 e_2 : B$ for any $\Gamma',B$, then $\Gamma \vdash_2 \lambda x.\, e_1 : C$ implies $\Gamma \vdash_2 \lambda x.\, e_2 : C$.

This is a straightforward induction on the type $C$. Compatibility follows by two uses of this fact. QED

Theorem (Completeness wrt. small-step semantics)
If $e \longrightarrow^{*} v$ then $\emptyset \vdash_2 e : A$ and $\emptyset \vdash_2 v : A'$ for some $A,A'$ such that $A' <: A$.
Proof
We have $\emptyset \vdash e : B$ and $\emptyset \vdash v : B$ by Completeness of the type system with subsumption. Therefore $\emptyset \vdash_2 e : A$ and $A <: B$ by Theorem Equivalent Type Systems. By preservation we conclude that $\emptyset \vdash_2 v : A'$ and $A' <: A$. QED

In a previous blog post, we proved soundness with respect to big-step semantics for a slightly different denotational semantics. So we update that proof for the denotational semantics defined above. We shall make use of the following logical relation $\mathcal{G}$ in this proof. \begin{align*} G[n] &= \{ n \} \\ G[A \to B] &= \{ \langle \lambda x.\, e, \rho \rangle \mid \forall v \in G[A]. \; \rho(x{:=}v) \vdash e \Downarrow v' \text{ and } v' \in G[B] \} \\ G[A \land B] &= G[A] \cap G[B] \\ G[\top] &= \{ v \mid v \in \mathrm{Values} \} \\ \\ G[\emptyset] &= \{ \emptyset \} \\ G[\Gamma,x:A] &= \{ \rho(x{:=}v) \mid v \in G[A] \text{ and } \rho \in G[\Gamma] \} \end{align*}

We shall need two lemmas about this logical relation.

Lemma (Lookup in $\mathcal{G}$)
If $x:A \in \Gamma$ and $\rho \in G[\Gamma]$, then $\rho(x) = v$ and $v \in G[A]$.

Lemma ($\mathcal{G}$ preserves subtyping)
If $A <: B$ and $v \in G[A]$, then $v \in G[B]$.

Theorem (Soundness wrt. big-step semantics)
If $\Gamma \vdash_2 e : A$ and $\rho \in G[\Gamma]$, then $\rho \vdash e \Downarrow v$ and $v \in G[A]$.
Proof
The proof is by induction on the typing derivation. The case for variables uses the Lookup Lemma and all of the elimination forms use the above Subtyping Lemma (because their typing rules use subtyping). QED

Lemma (Observing values of subtypes)
If $\emptyset \vdash_2 v : A$, $\emptyset \vdash_2 v' : B$, $A <: C$, and $B <: C$, then $\mathit{observe}(v) = \mathit{observe}(v')$.
Proof
The proof is by cases of $v$ and $v'$. We use Lemmas about the symmetry of subtyping for singletons, an inversion lemma for functions, and that subtyping preserves function types. QED

Lemma (Subtyping symmetry for singletons)
If $n <: A$, then $A <: n$.

For the next lemma we need to characterize the types for functions. \begin{gather*} \frac{}{\mathit{fun}(A \to B)} \quad \frac{\mathit{fun}(A) \qquad \mathit{fun}(B)} {\mathit{fun}(A \land B)} \quad \frac{}{\mathit{fun}(\top)} \end{gather*}
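The $\mathit{fun}$ predicate is a simple structural recursion on types. A minimal Python sketch, in which the type classes and the name `is_fun` are hypothetical choices for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TNum:
    n: int

@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Top:
    pass


def is_fun(t) -> bool:
    """fun(A): A is built from arrows and ⊤ using intersections only."""
    if isinstance(t, (Arrow, Top)):
        return True
    if isinstance(t, And):
        return is_fun(t.left) and is_fun(t.right)
    return False  # singleton number types are not function types


print(is_fun(And(Arrow(TNum(1), TNum(2)), Top())))
print(is_fun(And(TNum(1), Top())))
```

Only the top-level shape matters: the domain and codomain of an arrow are not inspected, so `Arrow(TNum(1), TNum(2))` counts as a function type.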

Lemma (Inversion on Functions)
If $\Gamma \vdash_2 \lambda x.\, e : A$, then $\mathit{fun}(A)$.

Lemma (Subtyping preserves functions)
If $A <: B$ and $\mathit{fun}(A)$, then $\mathit{fun}(B)$.

Lemma (Diverging Programs have no meaning)
If $e$ diverges, then $E[e]\emptyset = \emptyset$.
Proof
Towards a contradiction, suppose $E[e]\emptyset \neq \emptyset$. Then we have $\emptyset \vdash_2 e : A$ for some $A$. Then by soundness wrt. big-step semantics, we have $\emptyset \vdash e \Downarrow v$ and so also $e \longrightarrow^{*} v'$. But this contradicts the premise that $e$ diverges. QED

Lemma (Programs that get stuck have no meaning)
Suppose that $e \longrightarrow^{*} e'$ and $e'$ is stuck (and not a value). Then $E[e]\emptyset = \emptyset$.
Proof
Towards a contradiction, suppose $E[e]\emptyset \neq \emptyset$. Then we have $\emptyset \vdash_2 e : A$ for some $A$. Therefore $\emptyset \vdash_2 e' : A'$ for some $A' <: A$. By Progress, either $e'$ is a value or it can take a step. But that contradicts the premise. QED

Lemma (Programs with no meaning diverge or get stuck)
If $E[e]\emptyset = \emptyset$, then $e$ diverges or reduces to a stuck non-value.
Proof
Towards a contradiction, suppose that $e$ does not diverge and does not reduce to a stuck non-value. So $e \longrightarrow^{*} v$ for some $v$. But then by Completeness wrt. the small-step semantics, we have $\emptyset \vdash_2 e : A$ for some $A$, which contradicts the premise $E[e]\emptyset = \emptyset$. QED

Related Work

The proof method used here, of proving Compatibility and Congruence lemmas to show soundness wrt. contextual equivalence, is adapted from Gunter's book (1992), where he proves that the standard model for PCF (CPO's and continuous functions) is sound. This approach is also commonly used to show that logical relations are sound wrt. contextual equivalence (Pitts 2005).

The problem of full abstraction is to show that denotational equivalence is both sound (aka. correct): \[ E[e_1] = E[e_2] \qquad \text{implies} \qquad e_1 \simeq e_2 \] and complete: \[ e_1 \simeq e_2 \qquad \text{implies} \qquad E[e_1] = E[e_2] \] with respect to contextual equivalence (Milner 1975). Here we showed that the simple denotational semantics is sound. I do not know whether it is complete wrt. contextual equivalence.

There are famous examples of denotational semantics that are not complete. For example, the standard model for PCF is not complete. There are two expressions in PCF that are contextually equivalent but not denotationally equivalent (Plotkin 1977). The idea behind the counter-example is that parallel-or cannot be defined in PCF, but it can be expressed in the standard model. The two expressions are higher-order functions constructed to behave differently only when applied to parallel-or.

Rocca and Paolini (2004) define a filter model $\mathcal{V}$ for the call-by-value lambda calculus, similar to our simple denotational semantics, and prove that it is sound wrt. contextual equivalence (Theorem 12.1.18). Their type system and subtyping relation differ from ours in several ways. Their $\land\,\mathrm{intro}$ rule is not restricted to $\lambda$, they include subsumption, their $\top$ type is a super-type of all types (not just function types), they include the distributivity rule discussed at the beginning of this post, and they include a couple other rules (labeled $(g)$ and $(v)$ in Fig. 12.1). I'm not sure whether any of these differences really matter; the two systems might be equivalent. Their proof is quite different from ours and more involved; it is based on the notion of approximants. They also show that $\mathcal{V}$ is incomplete wrt. contextual equivalence, but go on to create another model based on $\mathcal{V}$ that is. The fact that $\mathcal{V}$ is incomplete leads me to suspect that $\mathcal{E}$ is also incomplete. This is certainly worth looking into.

Abramsky (1990) introduced a domain logic whose formulas are intersection types: \[ \phi ::= \top \mid \phi \land \phi \mid \phi \to \phi \] and whose proof theory is an intersection type system designed to capture the semantics of the lazy lambda calculus. Abramsky proves that it is sound with respect to contextual equivalence. As far as I can tell, the proof is different than the approach used here, as it shows that the domain logic is sound with respect to a denotational semantics that solves the domain equation $D = (D \to D)_\bot$, then shows that this denotational semantics is sound wrt. contextual equivalence. (See also Alan Jeffrey (1994).)

