This is the fifth blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.
In this blog post we study binary trees, that is, trees in which each node has at most two children. We study the in-order tree traversal, as that will become important when we study binary search trees. Furthermore, we implement tree iterators that keep track of a location within the tree and can move forward with respect to the in-order traversal. We shall prove that our implementation of tree iterators is correct in Part 2 of this blog post.
Binary Trees
We begin by defining a union
for binary trees:
union Tree<E> {
EmptyTree
TreeNode(Tree<E>, E, Tree<E>)
}
For example, we can represent the following binary tree
with a bunch of tree nodes like so:
define T0 = TreeNode(EmptyTree, 0, EmptyTree)
define T2 = TreeNode(EmptyTree, 2, EmptyTree)
define T1 = TreeNode(T0, 1, T2)
define T4 = TreeNode(EmptyTree, 4, EmptyTree)
define T5 = TreeNode(T4, 5, EmptyTree)
define T7 = TreeNode(EmptyTree, 7, EmptyTree)
define T6 = TreeNode(T5, 6, T7)
define T3 = TreeNode(T1, 3, T6)
We define the height of a tree with the following recursive function.
function height<E>(Tree<E>) -> Nat {
height(EmptyTree) = 0
height(TreeNode(L, x, R)) = suc(max(height(L), height(R)))
}
The example tree has height 4
.
assert height(T3) = 4
We count the number of nodes in a binary tree with the num_nodes
function.
function num_nodes<E>(Tree<E>) -> Nat {
num_nodes(EmptyTree) = 0
num_nodes(TreeNode(L, x, R)) = suc(num_nodes(L) + num_nodes(R))
}
The example tree has 8
nodes.
assert num_nodes(T3) = 8
In-order Tree Traversal
Now for the main event of this blog post, the in-order tree traversal. The idea of this traversal is that for each node in the tree, we follow this recipe:
- process the left subtree
- process the current node
- process the right subtree
What it means to process a node can be different for different instantiations of the in-order traversal. But to make things concrete, we study an in-order traversal that produces a list. So here is our definition of the in_order
function.
function in_order<E>(Tree<E>) -> List<E> {
in_order(EmptyTree) = empty
in_order(TreeNode(L, x, R)) = append(in_order(L), node(x, in_order(R)))
}
The result of in_order
for T3
is the list 0,1,2,3,4,5,6,7
. As you can see, we chose the data values in T3
to match their position within the in-order traversal.
assert in_order(T3) = interval(8, 0)
In-order Tree Iterators
A tree iterator keeps track of a position with a tree. Our goal is to create a data structure to represent a tree iterator and also to implement the following operations on iterators, which we describe in the following paragraph.
ti2tree : < E > fn TreeIter<E> -> Tree<E>
ti_first : < E > fn Tree<E>,E,Tree<E> -> TreeIter<E>
ti_get : < E > fn TreeIter<E> -> E
ti_next : < E > fn TreeIter<E> -> TreeIter<E>
ti_index : < E > fn(TreeIter<E>) -> Nat
The
ti2tree
operator returns the tree that the iterator is traversing.The
ti_first
operator returns an iterator pointing to the first node (with respect to the in-order traversal) of a non-empty tree. We represent non-empty trees with three things: the left subtree, the data in the root node, and the right subtree.The
ti_get
operator returns the data of the node at the current position.The
ti_next
operator moves the iterator forward by one position.The
ti_index
operator returns the position of the iterator as a natural number.
Here is an example of creating an iterator for T3
and moving it forward.
define iter0 = ti_first(T1, 3, T6)
assert ti_get(iter0) = 0
assert ti_index(iter0) = 0
define iter3 = ti_next(ti_next(ti_next(iter0)))
assert ti_get(iter3) = 3
assert ti_index(iter3) = 3
define iter7 = ti_next(ti_next(ti_next(ti_next(iter3))))
assert ti_get(iter7) = 7
assert ti_index(iter7) = 7
Iterator Representation
We represent a position in the tree by recording a path of left-or-right decisions. For example, to represent the position of node 4
of the example tree, we record the path R,L,L
(R
for right and L
for left).
When we come to implement the ti_next
operation, we will sometimes need to climb the tree. For example, to get from 4
to 5
. To make that easier, we will store the path in reverse. So the path to node 4
will be stored as L,L,R
.
It would seem natural to store an iterator’s path separately from the tree, but doing so would complicate many of the upcoming proofs because only certain paths make sense for certain trees. Instead, we combine the path and the tree into a single data structure called a zipper (Huet, The Zipper, Journal of Functional Programming, Vol 7. Issue 5, 1997). The idea is to attach extra data to the left and right decisions and to store the subtree at the current position. So we define a union
named Direction
with constructors for left and right, and we define a union named TreeIter
that contains a path and the non-empty tree at the current position.
union Direction<E> {
LeftD(E, Tree<E>)
RightD(Tree<E>, E)
}
union TreeIter<E> {
TrItr(List<Direction<E>>, Tree<E>, E, Tree<E>)
}
The ti2tree
Operation
Of the tree iterator operations, we will first implement ti2tree
because it will help to explain this zipper-style representation. We start by defining the auxiliary function plug_tree
, which reconstructs a tree from a path and the subtree at the specified position. The plug_tree
function is defined by recursion on the path, so it moves upward in the tree with each recursive call. Consider the case for LeftD(x, R)
below. To plug tree t
into the path node(LeftD(x, R), path')
, we used the extra data stored in LeftD(x, R)
to create TreeNode(t, x, R)
which we then pass to the recursive call, to plug the new tree node into the rest of the path.
function plug_tree<E>(List<Direction<E>>, Tree<E>) -> Tree<E> {
plug_tree(empty, t) = t
plug_tree(node(f, path'), t) =
switch f {
case LeftD(x, R) {
plug_tree(path', TreeNode(t, x, R))
}
case RightD(L, x) {
plug_tree(path', TreeNode(L, x, t))
}
}
}
The ti2tree
operator simply invokes plug_tree
.
function ti2tree<E>(TreeIter<E>) -> Tree<E> {
ti2tree(TrItr(path, L, x, R)) = plug_tree(path, TreeNode(L, x, R))
}
Creating an iterator from a tree using ti_first
and then applying ti2tree
produces the original tree. Furthermore, moving an iterator does not change the tree that it is traversing, so ti2tree
returns T3
for iterators iter0
, iter3
, and iter7
.
assert ti2tree(iter0) = T3
assert ti2tree(iter3) = T3
assert ti2tree(iter7) = T3
The ti_first
Operation
Recall that the ti_first
operation returns an iterator pointing to the first node (with respect to the in-order traversal) of a non-empty tree. For example, applying ti_first
to T3
should give us node 0
. The idea to implement ti_first
is simple: we walk down the tree going left at each step, until we get to a leaf.
To implement ti_first
we define the auxiliary function first_path
that takes a non-empty tree and the path-so-far and proceeds going to the left down the tree. (The first_path
function will also come in handy when implementing ti_next
.)
function first_path<E>(Tree<E>, E, Tree<E>, List<Direction<E>>) -> TreeIter<E> {
first_path(EmptyTree, x, R, path) = TrItr(path, EmptyTree, x, R)
first_path(TreeNode(LL, y, LR), x, R, path) = first_path(LL, y, LR, node(LeftD(x, R), path))
}
We implement ti_first
simply as a call to first_path
where the path-so-far is empty
.
define ti_first : < E > fn Tree<E>,E,Tree<E> -> TreeIter<E>
= λ L,x,R { first_path(L, x, R, empty) }
As promised above, applying ti_first
to T3
gives us node 0
.
assert ti_get(ti_first(T1, 3, T6)) = 0
The ti_get
Operation
Recall that the ti_get
operator should return the data of the node at the current position. This is straightforward to implement because that data is stored directly in the tree iterator.
function ti_get<E>(TreeIter<E>) -> E {
ti_get(TrItr(path, L, x, R)) = x
}
The ti_next
Operation
Recall that the ti_next
operator moves the iterator forward by one position with respect to the in-order traversal. This operation is non-trivial to implement. Consider again our example tree.
Suppose the current node is 2
. Then the next node is 3
, which requires climbing a fair ways up the tree. On the other hand, if the current node is 3
, then the next node is 4
, way back down the tree. So there are two different scenarios that we need to handle.
If the current node has a right child, then the next node is the first node of the right child’s subtree (with respect to in-order traversal). For example, node
3
has right child6
, and the first node of that subtree is4
.If the current node does not have a right child, then the next node is the ancestor after the first left branch. For example, node
2
does not have a right child, so we go up the tree. We go up to1
via a right branch and then up to3
via a left branch, so3
is the next node of2
.
For (1) we already have first_path
, so we just need an auxiliary function for (2), which we call next_up
. This function takes a path and the current non-empty subtree and returns the iterator for the next position. If the direction is RightD
, we keep going up the tree. If the direction is LeftD(x, R)
, we stop and return an iterator for the parent node x
.
function next_up<E>(List<Direction<E>>, Tree<E>, E, Tree<E>) -> TreeIter<E> {
next_up(empty, A, z, B) = TrItr(empty, A, z, B)
next_up(node(f, path'), A, z, B) =
switch f {
case RightD(L, x) {
next_up(path', L, x, TreeNode(A, z, B))
}
case LeftD(x, R) {
TrItr(path', TreeNode(A, z, B), x, R)
}
}
}
Now that we have both next_up
and first_path
, we implement ti_next
by checking whether the right child R
is empty. If it is, we invoke next_up
, and if not, we invoke first_path
.
function ti_next<E>(TreeIter<E>) -> TreeIter<E> {
ti_next(TrItr(path, L, x, R)) =
switch R {
case EmptyTree {
next_up(path, L, x, R)
}
case TreeNode(RL, y, RR) {
first_path(RL, y, RR, node(RightD(L, x), path))
}
}
}
To see ti_next
in action, in the following we go from position 2
up to position 3
and then back down to position 4.
define iter2 = ti_next(ti_next(iter0))
assert ti_get(iter2) = 2
define iter3_ = ti_next(iter2)
assert ti_get(iter3_) = 3
define iter4 = ti_next(iter3_)
assert ti_get(iter4) = 4
The ti_index
Operation
Recall that the ti_index
operator returns the position of the iterator as a natural number. More specifically, ti_index
returns the position of the current node with respect to the in the in-order traversal. The following demonstrates this invariant on iter0
and iter7.
define L0 = in_order(ti2tree(iter0))
define i0 = ti_index(iter0)
assert ti_get(iter0) = nth(L0, 42)(i0)
define L7 = in_order(ti2tree(iter7))
define i7 = ti_index(iter7)
assert ti_get(iter7) = nth(L7, 42)(i7)
The idea for implementing ti_index
is that we’ll count how many nodes are in the portion of the tree that comes before the current position. We define an auxiliary function that constructs this portion of the tree, calling it ti_take
because it is reminiscent of the take(n, ls)
function in List.pf
, which returns the prefix of list ls
of length n
. Furthermore, we use a second auxiliary function named take_path
that applies this idea to the path of the iterator. So to implement the take_path
function, we throw away the subtrees to the right of the path (by removing LeftD(x, R)
) and we keep the subtrees to the left of the path (by keeping Right(L, x)
).
function take_path<E>(List<Direction<E>>) -> List<Direction<E>> {
take_path(empty) = empty
take_path(node(f, path')) =
switch f {
case RightD(L, x) {
node(RightD(L,x), take_path(path'))
}
case LeftD(x, R) {
take_path(path')
}
}
}
We implement ti_take
by applying take_path
to the path of the iterator, and then plug the left subtree L
into the result. (The node x
and subtree R
are not before node x
with respect to in-order traversal.)
function ti_take<E>(TreeIter<E>) -> Tree<E> {
ti_take(TrItr(path, L, x, R)) = plug_tree(take_path(path), L)
}
Finally, we implement ti_index
by counting the number of nodes in the tree returned by ti_take
.
define ti_index : < E > fn(TreeIter<E>) -> Nat = λ iter { num_nodes(ti_take(iter))}
Exercise: Implement and test the ti_prev
Operation
The ti_prev
operation (for previous) moves the iterator backward by one position with respect to in-order traversal.
ti_prev : < E > fn TreeIter<E> -> TreeIter<E>
Implement and test the ti_prev
operation.
Conclusion
This completes the implementation of the 5 tree iterator operations. In Part 2 of this blog post, we will prove that these operations are correct.
No comments:
Post a Comment