This is the seventh blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.
This post continues on the theme of binary trees, that is, trees in which each node has at most two children. The focus of this post is to implement the Search interface, described next, using binary trees.
The Search Interface
The Search interface includes operations to (1) create an empty data structure, (2) search for a value based on its associated key, and (3) insert a new key-value association.
Function Implmentation of Search
The Search interface can also be implemented in a simple but less efficient way, using a function to map keys to values. With this approach, the operation to search for a value is just function call. The Maps.pf
file defines the empty_map
operation, which returns a function that maps every input to none
.
assert @empty_map<Nat,Nat>(5) = none
The Maps.pf
file also defined the update(f, k, v)
operation, which returns a function that associates the key k
with v
but otherwise behaves like the given function f
. Here is an example use of update
.
define m2 = update(@empty_map<Nat,Nat>, 4, just(99))
assert m2(4) = just(99)
assert m2(5) = none
We will use this function implementation of the Search interface to specify the correctness of the binary tree implementation of Search.
Binary Tree Implementation of Search
We will store the keys and their values in a binary tree and implement BST_search
and BST_insert
operations. These operations are efficient (logarithmic time) when the binary tree is balanced, but we will save how to balance trees for a later blog post.
The main idea of a binary search tree comes from the notion of binary search on a sequence, that is, keep the sequennce in sorted-order and when searching for a key, start in the middle and go left by half of the subsequence if the key you’re looking for is less than the one at the current position; go right by half of the subsequence if the key is greater than the one at the current position. Of course, if they are equal, then you’ve found what you’re looking for. Thus, binary search is just like looking up the definition of a word in a dictionary. The word is your key and the dictionary is sorted alphabetically. You can start in the middle and compare your word to those on the current page, then flip to the left or right depending on whether your word is lower or higher in the alphabet.
The binary search tree adapts the idea of binary search from a sequence to a tree. Each node in the tree stores a key and its value. The left subtree of the node contain keys that are less than the node and the right subtree contain keys that are greater than the node. Thus, when searching for a key, one can compare it to the current node and then either go left or right depending on whether the key is less-than or greater-than the current node.
Consider the following diagram of a binary search tree. For simplicity, we will use numbers for both the keys and the values. In this diagram the key is listed before the colon and the value is after the colon. For example, this tree contains
- key
10
associated with value32
, - key
13
associated with value63
, - etc.
The following code builds this binary search tree using the Tree
union type defined in the Binary Tree blog post and the Pair
type from the Pair.pf
file.
define mt = @EmptyTree<Pair<Nat,Nat>>
define BST_1 = TreeNode(mt, pair(1, 53), mt)
define BST_9 = TreeNode(mt, pair(9, 42), mt)
define BST_6 = TreeNode(BST_1, pair(6, 85), BST_9)
define BST_11 = TreeNode(mt, pair(11, 99), mt)
define BST_13 = TreeNode(BST_11, pair(13, 69), mt)
define BST_19 = TreeNode(mt, pair(19, 74), mt)
define BST_14 = TreeNode(BST_13, pair(14, 27), BST_19)
define BST_10 = TreeNode(BST_6, pair(10, 32), BST_14)
There are three operations in the binary search tree interface and here are their specifications.
The
EmptyTree
constructor from theTree
union type, which builds a binary search tree that does not contain any key-value associations.BST_search : fn Tree<Pair<Nat,Nat>> -> (fn Nat -> Option<Nat>)
The operation
BST_search(T)
returns a function that maps each key to its associated value.BST_insert : fn Tree<Pair<Nat,Nat>>, Nat, Nat -> Tree<Pair<Nat,Nat>>
The operation
BST_insert(T, k, v)
produces a new tree that associates valuev
with keyk
and for all other keys, associates keys with the values according to treeT
. In other words,BST_insert(T, k, v) = update(BST_search(T), k, v)
.
Write the BST_search
and BST_insert
functions
The BST_search
function is recursive over the Tree
parameter. If the tree is empty, the result is none
. Otherwise, we compare the key k
with the key in the current node x
. If they are equal, return the value in the current node; if k
is less-than, recursively search the left subtree; if k
is greater-than, recursively search the right subtree.
function BST_search(Tree<Pair<Nat,Nat>>) -> fn Nat -> Option<Nat> {
BST_search(EmptyTree) = λk{ none }
BST_search(TreeNode(L, x, R)) = λk{
if k = first(x) then
just(second(x))
else if k < first(x) then
BST_search(L)(k)
else
BST_search(R)(k)
}
}
The BST_insert
function follows a similar control structure: recursive over the Tree
parameter followed by an if
-then
-else
based on the key k
and the key of the current node. However, BST_insert
returns a new tree that contains the specified key and value. When the key k
is already in the tree, BST_insert
overrides the current value with the new value, as implied by the specification above.
function BST_insert(Tree<Pair<Nat,Nat>>, Nat, Nat) -> Tree<Pair<Nat,Nat>> {
BST_insert(EmptyTree, k, v) = TreeNode(EmptyTree, pair(k, v), EmptyTree)
BST_insert(TreeNode(L, x, R), k, v) =
if k = first(x) then
TreeNode(L, pair(k, v), R)
else if k < first(x) then
TreeNode(BST_insert(L, k, v), x, R)
else
TreeNode(L, x, BST_insert(R, k, v))
}
Test
We test the correctness of the EmptyTree
, BST_search
, and BST_insert
operations by making sure they behave according to their specification. Starting with EmptyTree
, the result of BST_search
with any key should be none
.
assert BST_search(EmptyTree)(5) = none
After inserting key 10
with value 32
, the result of BST_search
on 10
should be 32
. For other keys, such as 5
, the result should be the same as for EmptyTree
.
define BST_a = BST_insert(EmptyTree, 10, 32)
assert BST_search(BST_a)(10) = just(32)
assert BST_search(BST_a)(5) = none
The story is similar for inserting key 6
with value 85
.
define BST_b = BST_insert(BST_a, 6, 85)
assert BST_search(BST_b)(6) = just(85)
assert BST_search(BST_b)(10) = just(32)
assert BST_search(BST_b)(5) = none
If we insert with the same key 6
but a different value 59
, the result of BST_search
for 6
should be the new value 59
. For other keys, the result of BST_search
remains the same.
define BST_c = BST_insert(BST_b, 6, 59)
assert BST_search(BST_c)(6) = just(59)
assert BST_search(BST_c)(10) = just(32)
assert BST_search(BST_c)(5) = none
Prove
Starting with EmptyTree
, we prove that applying BST_search
produces an empty map.
theorem BST_search_EmptyTree:
BST_search(EmptyTree) = λk{none}
proof
extensionality
arbitrary k:Nat
conclude BST_search(EmptyTree)(k) = none
by definition BST_search
end
The main correctness theorem is to show that BST_insert
behaves the same as update
. That is to say, applying BST_insert
to a tree followed by BST_search
is the same as first applying BST_search
and then applying update
.
theorem BST_search_insert_udpate: all T:Tree<Pair<Nat,Nat>>. all k:Nat, v:Nat.
BST_search(BST_insert(T, k, v)) = update(BST_search(T), k, just(v))
The proof is by induction on the tree. For the case T = EmptyTree
, we start by using extensionality
to apply both sides of the equation to an arbitrary number i
. We then expand BST_insert(EmptyTree, k, v)
.
// <<BST_search_insert_empty_ext>> =
arbitrary k:Nat, v:Nat
extensionality
arbitrary i:Nat
suffices BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
= update(BST_search(EmptyTree), k, just(v))(i) with definition BST_insert
Looking at the definition of BST_search
, the left-hand side will either be none
or just(v)
depending on whether i
is less-than, equal to, or greater than k
. So we proceed with case analysis using the trichotomy
theorem from Nat.pf
.
// <<BST_search_insert_empty_tri>> =
cases trichotomy[i][k]
case i_less_k: i < k {
<<BST_search_insert_empty_less>>
}
case i_eq_k: i = k {
<<BST_search_insert_empty_equal>>
}
case i_greater_k: k < i {
<<BST_search_insert_empty_greater>>
}
Indeed, when i
is less than k
, both the left-hand side and the right-hand side are equal to none
.
// <<BST_search_insert_empty_less>> =
have not_i_eq_k: not (i = k) by apply less_not_equal to i_less_k
equations
BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
= @none<Nat>
by definition {BST_search, BST_search, first, second}
and rewrite not_i_eq_k | i_less_k
... = update(BST_search(EmptyTree), k, just(v))(i)
by definition {BST_search, update} and rewrite not_i_eq_k
When i
is equal to k
, both sides are equal to just(v)
.
// <<BST_search_insert_empty_equal>> =
equations
BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
= just(v)
by definition {BST_search, first, second} and rewrite i_eq_k
... = update(BST_search(EmptyTree), k, just(v))(i)
by definition {BST_search, update} and rewrite i_eq_k
When i
is greater than k
, both side are equal to none
.
// <<BST_search_insert_empty_greater>> =
have not_k_eq_i: not (k = i) by apply less_not_equal to i_greater_k
have not_i_eq_k: not (i = k) by suppose ik apply not_k_eq_i to symmetric ik
have not_i_less_k: not (i < k)
by apply less_implies_not_greater to i_greater_k
equations
BST_search(TreeNode(EmptyTree, pair(k, v), EmptyTree))(i)
= @none<Nat>
by definition {BST_search, BST_search, first, second}
and rewrite not_i_eq_k | not_i_less_k
... = update(BST_search(EmptyTree), k, just(v))(i)
by definition {BST_search, update}
and rewrite not_i_eq_k
Next we consider the case where T = TreeNode(L, x, R)
. Again we begin with extensionality
.
// <<BST_search_insert_node_ext>> =
arbitrary k:Nat, v:Nat
extensionality
arbitrary i:Nat
suffices BST_search(BST_insert(TreeNode(L, x, R), k, v))(i)
= update(BST_search(TreeNode(L, x, R)), k, just(v))(i) by .
Looking at BST_insert(TreeNode(L, x, R), k, v)
, its result will depend on whether k
is less than, equal to, or greater than first(x)
. So we proceed by cases, using the trichotomy
theorem from Nat.py
.
// <<BST_search_insert_node_tri>> =
cases trichotomy[k][first(x)]
case k_less_fx: k < first(x) {
<<BST_search_insert_node_k_less_fx>>
}
case k_eq_fx: k = first(x) {
<<BST_search_insert_node_k_equal_fx>>
}
case k_greater_fx: first(x) < k {
<<BST_search_insert_node_k_greater_fx>>
}
For the case k < first(x)
, we have
BST_insert(TreeNode(L, x, R), k, v)
= BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
so it suffices to prove the following.
// <<BST_search_insert_node_k_less_fx_suffices>> =
have not_k_eq_fx: not (k = first(x)) by apply less_not_equal to k_less_fx
suffices BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
= update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
with definition {BST_insert} and rewrite not_k_eq_fx | k_less_fx
The result of BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
depends on the relationship between i
and first(x)
, so we again proceed by cases using the trichotomy
theorem. There sure are a lot of cases in this proof!
// <<BST_search_insert_node_k_less_fx_tri>> =
cases trichotomy[i][first(x)]
case i_less_fx: i < first(x) {
<<BST_search_insert_node_k_less_fx_i_less_fx>>
}
case i_eq_fx: i = first(x) {
<<BST_search_insert_node_k_less_fx_i_eq_fx>>
}
case fx_less_i: first(x) < i {
<<BST_search_insert_node_k_less_fx_i_greater_fx>>
}
For the case i < first(x)
, we proceed by the following several steps of equational reasoning, shown below. The key step is applying the induction hypothesis for the left subtree L
.
// <<BST_search_insert_node_k_less_fx_i_less_fx>> =
have not_i_eq_fx: not (i = first(x)) by apply less_not_equal to i_less_fx
equations
BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
= BST_search(BST_insert(L, k, v))(i)
by definition{BST_search} and rewrite not_i_eq_fx | i_less_fx
... = update(BST_search(L), k, just(v))(i)
by rewrite IH_L[k,v]
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i) by
switch i = k {
case true suppose ik_true {
definition {BST_search,update} and rewrite ik_true
}
case false suppose ik_false {
definition {BST_search,update}
and rewrite ik_false | not_i_eq_fx | i_less_fx
}
}
For the case i = first(x)
, both sides simplify to just(second(x))
.
// <<BST_search_insert_node_k_less_fx_i_eq_fx>> =
have not_fx_eq_k: not (first(x) = k)
by suppose fx_eq_k
conclude false by rewrite not_k_eq_fx in symmetric fx_eq_k
equations
BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
= just(second(x))
by definition {BST_search} and rewrite i_eq_fx
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
by definition {BST_search,update} and rewrite i_eq_fx | not_fx_eq_k
For the case first(x) < i
, both side are equal to BST_search(R)(i)
. because we know that i ≠ k
.
// <<BST_search_insert_node_k_less_fx_i_greater_fx>> =
have not_i_eq_fx: not (i = first(x))
by suppose i_eq_fx
apply (apply less_not_equal to fx_less_i) to symmetric i_eq_fx
have not_i_less_fx: not (i < first(x))
by apply less_implies_not_greater to fx_less_i
have not_i_eq_k: not (i = k)
by suppose i_eq_k
have fx_less_k: first(x) < k by rewrite i_eq_k in fx_less_i
have not_k_less_fx: not (k < first(x))
by apply less_implies_not_greater to fx_less_k
conclude false by apply not_k_less_fx to rewrite k_less_fx
equations
BST_search(TreeNode(BST_insert(L, k, v), x, R))(i)
= BST_search(R)(i)
by definition BST_search and rewrite not_i_eq_fx | not_i_less_fx
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
by definition {BST_search, update}
and rewrite not_i_eq_k | not_i_eq_fx | not_i_less_fx
This completes the proof of the case for k < first(x)
.
<<BST_search_insert_node_k_less_fx_suffices>>
<<BST_search_insert_node_k_less_fx_tri>>
Next consider the case for k = first(x)
. We have
BST_insert(TreeNode(L, x, R), k, v)
= TreeNode(L, pair(k, v), R)
so it suffices to prove the following
// <<BST_search_insert_node_k_equal_fx_suffices>> =
suffices BST_search(TreeNode(L, pair(k, v), R))(i)
= update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
by definition BST_insert and rewrite k_eq_fx
Looking at the definition of BST_search
, the result of BST_search(TreeNode(L, pair(k, v), R))(i)
will depend on the relationship between i
and k
. So we proceed by cases, using the trichotomy
theorem.
// <<BST_search_insert_node_k_equal_fx_tri>> =
cases trichotomy[i][k]
case i_less_k: i < k {
<<BST_search_insert_node_k_equal_fx_i_less_k>>
}
case i_eq_k: i = k {
<<BST_search_insert_node_k_equal_fx_i_eq_k>>
}
case k_less_i: k < i {
<<BST_search_insert_node_k_equal_fx_i_greater_k>>
}
When i < k
, both sides of the equation are equal to BST_search(L)(i)
.
// <<BST_search_insert_node_k_equal_fx_i_less_k>> =
have not_i_eq_k: not (i = k) by apply less_not_equal to i_less_k
equations
BST_search(TreeNode(L, pair(k, v), R))(i)
= BST_search(L)(i)
by definition {BST_search, first}
and rewrite not_i_eq_k | i_less_k
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
by definition {update,BST_search}
and rewrite symmetric k_eq_fx | not_i_eq_k | i_less_k
When i = k
, both sides are equal to just(v)
.
// <<BST_search_insert_node_k_equal_fx_i_eq_k>> =
suffices BST_search(TreeNode(L, pair(k, v), R))(k)
= update(BST_search(TreeNode(L, x, R)), k, just(v))(k)
with rewrite i_eq_k
equations
BST_search(TreeNode(L, pair(k, v), R))(k)
= just(v) by definition {BST_search, first, second}
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(k)
by definition {BST_search, update}
When i > k
, both sides are equal to BST_search(R)(i)
.
have not_i_eq_k: not (i = k)
by have nki: not (k = i) by apply less_not_equal to k_less_i
suppose i_eq_k apply nki to symmetric i_eq_k
have not_i_less_k: not (i < k)
by apply less_implies_not_greater to k_less_i
equations
BST_search(TreeNode(L, pair(k, v), R))(i)
= BST_search(R)(i)
by definition {BST_search, first, second}
and rewrite not_i_eq_k | not_i_less_k
... = update(BST_search(TreeNode(L, x, R)), k, just(v))(i)
by definition {update, BST_search}
and rewrite symmetric k_eq_fx | not_i_eq_k | not_i_less_k
This concludes the proof of the case for k = first(x)
.
<<BST_search_insert_node_k_equal_fx_suffices>>
<<BST_search_insert_node_k_equal_fx_tri>>
The last case to prove is for k > first(x)
. We leave this as an exercise.
The following puts together the pieces of the proof for BST_search_insert_udpate
.
theorem BST_search_insert_udpate: all T:Tree<Pair<Nat,Nat>>. all k:Nat, v:Nat.
BST_search(BST_insert(T, k, v)) = update(BST_search(T), k, just(v))
proof
induction Tree<Pair<Nat,Nat>>
case EmptyTree {
<<BST_search_insert_empty_ext>>
<<BST_search_insert_empty_tri>>
}
case TreeNode(L, x, R) suppose IH_L, IH_R {
<<BST_search_insert_node_ext>>
<<BST_search_insert_node_tri>>
}
end