Friday, June 14, 2024

Sequential Search, Correctly

Sequential Search

This is the second blog post in a series about developing correct implementations of basic data structures and algorithms using the Deduce language and proof checker.

In this blog post we’ll study a classic and simple algorithm known as Sequential Search (aka. Linear Search). The basic idea of the algorithm is to look for the location of a particular item within a linked list, traversing the list front to back. Here is the specification of this search function.

Specification: The search(xs, y) function returns a natural number i such that i ≤ length(xs). If i < length(xs), then i is the index of the first occurence of y in the list xs. If i = length(xs), y is not in the list xs.

We follow the write-test-prove approach to develop a correct implementation of search. We then propose two exercises for the reader.

Write the search function

Before diving into the code for search, let us look again at the definition of the List type.

union List<T> {
  empty
  node(T, List<T>)
}

We say that List is a recursive union because one of its constructors has a parameter that is also of the List type (e.g. the second parameter of the node constructor).

In general, when defining a function with a parameter that is a recursive union, first consider making that function a recursive function that pattern-matches on that parameter.

For example, with search, we choose for the List<Nat> to be the first parameter so that we can pattern-match on it as follows.

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = ?
  search(node(x, xs'), y) = ?
}

Let us consider the case for the empty list. Looking at the specification of search, we need to return 0, the length of the empty list, because y is not in the empty list.

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = 0
  search(node(x, xs'), y) = ?
}

In the case for node(x, xs'), we can check whether x = y. If so, then we should return 0 because y is at index 0 of node(x, xs') and that is certainly the first occurence of y in node(x, xs').

function search(List<Nat>, Nat) -> Nat {
  search(empty, y) = 0
  search(node(x, xs'), y) =
    if x = y then
      0
    else
      ?
}

If x ≠ y, then we need to search the rest of the list xs' for y. We can make the recursive call search(xs', y), but then we need to decide how to adapt its result to produce a result that makes sense for node(x, xs'). The only way to reason about the result of a recursive call is to use the specification of the function. The specification of search splits into two cases on the result: (1) search(xs', y) < length(xs') and (2) length(xs) ≤ search(xs', y). In case (1), search(xs',y) is returning the index of the first y inside xs'. Because x ≠ y, that location will also be the first y inside node(x, xs'). However, we need to add one to the index to take into account that we’re adding a node to the front of the list. So for case (1), the result should be suc(search(xs', y)). In case (2), search(xs',y) did not find y in xs', so it is returning length(xs'). Because x ≠ y, we need to indicate that y is also not found in node(x, xs'), so we need to return length(node(x, xs')). Thus, we need to add one to the index, so the result should again be suc(search(xs', y)).

Here is the completed code for search.

Test the search function

Focusing on the specification of search, there are several things that we should test. First, we should test whether search always returns a number that is less-or-equal to the length of the list. We can use all_elements and interval to automate the testing over a bunch of values, some of which are in the list and some are not.

define list_1223 = node(1, node(2, node(2, node(3, empty))))

assert all_elements(interval(0, 5),
  λx{ search(list_1223, x) ≤ length(list_1223) })

Most importantly, we should test whether search finds the correct index of the elements in the list. To do that we can make use of nth to lookup the element at a given index.

assert all_elements(list_1223,
  λx{ nth(list_1223, 0)( search(list_1223, x) ) = x })

Next, we should test whether search finds the first occurence. We can do this by iterating over all the indexes and checking that what search returns is an index that is less-than or equal to the current index.

assert all_elements(interval(0, length(list_1223)),
   λi{ search(list_1223, nth(list_1223, 0)(i)) ≤ i })

Finally, we check that search fails gracefully when the value being searched for is not present in the list.

assert search(list_1223, 0) = length(list_1223)
assert search(list_1223, 4) = length(list_1223)

Prove search Correct

We break down the specification of search into four parts and prove four theorems.

Prove search is less-or-equal length

The first part of the specification of search says that the search(xs, y) function returns a natural number i such that i ≤ length(xs). Because search is recursive, we’re going to prove this by induction on its first parameter xs.

theorem search_length: all xs:List<Nat>. all y:Nat.
  search(xs, y) ≤ length(xs)
proof
  induction List<Nat>
  case empty {
    ?
  }
  case node(x, xs') 
    suppose IH: all y:Nat. search(xs',y) ≤ length(xs') 
  {
    ?
  }
end

In the case for xs = empty, Deduce tells us that we need to prove

Goal:
    all y:Nat. search(empty,y) ≤ length(empty)

So we start with arbitrary y:Nat and then conclude using the definitions of search, length, and operator ≤.

    // <<search_length_case_empty>> =
    arbitrary y:Nat
    conclude search(empty,y) ≤ length[Nat](empty)
        by definition {search, length, operator ≤}

In these blog post we use a literate programming tool named Entangled to translate the markdown files into Deduce proof files. Entangled lets us label chunks of proof and then paste them into larger proofs. So that you can see the label names, we include them in comments, as in the <<search_length_case_empty>> label above.

In the case for xs = node(x, xs'), Deduce tells us that we need to prove

Goal:
    all y:Nat. search(node(x,xs'),y) ≤ length(node(x,xs'))

So we start with arbitrary y:Nat and note that the definitions of search has an if-then-else, so we proceed with a switch-for statement.

    arbitrary y:Nat
    switch x = y for search {
      case true {
        ?
      }
      case false {
        ?
      }
    }

In the case for x = y, the goal becomes

0 ≤ length(node(x, xs'))

so we need to use the definition of length and then we can complete the proof using the definition of .

    // <<search_length_case_node_eq>> =
    suffices 0 ≤ 1 + length(xs')  with definition length
    definition operator ≤

In the case for x ≠ y, after applying the definitions of length, , and +, it remains to prove that search(xs', y) ≤ length(xs'). But that is just the induction hypothesis

    // <<search_length_case_node_not_eq>> =
    suffices search(xs', y) ≤ length(xs')
        with definition {length, operator ≤, operator+, operator+}
    IH[y]

Putting all of the pieces together, we have a complete proof of search_length.

theorem search_length: all xs:List<Nat>. all y:Nat.
  search(xs, y) ≤ length(xs)
proof
  induction List<Nat>
  case empty {
    <<search_length_case_empty>>
  }
  case node(x, xs') 
    suppose IH: all y:Nat. search(xs',y) ≤ length(xs') 
  {
    arbitrary y:Nat
    switch x = y for search {
      case true {
        <<search_length_case_node_eq>>
      }
      case false {
        <<search_length_case_node_not_eq>>
      }
    }
  }
end

Prove search(xs, y) finds an occurence of y

The specification of search(xs, y) says that if the result is less-than length(xs), then the result is the index of the first occurence of y in xs. First off, this means that search(xs, y) is indeed an index for y, which we can express using nth as follows.

nth(xs, 0)( search(xs, y) ) = y

So we can formulate the following theorem, which we’ll prove by induction on xs.

theorem search_present: all xs:List<Nat>. all y:Nat.
  if search(xs, y) < length(xs)
  then nth(xs, 0)( search(xs, y) ) = y
proof
  induction List<Nat>
  case empty {
    ?
  }
  case node(x, xs') suppose IH {
    ?
  }
end

In the case for xs = empty, we proceed in a goal-directed way using arbitrary for the all y and then suppose for the if.

    arbitrary y:Nat
    suppose prem: search(empty,y) < length[Nat](empty)
    ?

Then we need to prove

nth(empty, 0)(search(empty, y)) = y

but that looks impossible! So hopefully the premise is also false, which will let us finish this case using the principle of explosion. Indeed, applying all of the relevant definitions to the premise yields false.

    arbitrary y:Nat
    suppose prem: search(empty,y) < length[Nat](empty)
    conclude false by definition {search, length, operator <, operator ≤} 
                      in prem

Moving on to the case for xs = node(x, xs'), we again begin with arbitrary and suppose.

    arbitrary y:Nat
    suppose sxxs_len: search(node(x,xs'),y) < length(node(x,xs'))
    ?

Deduce tells us that we need to prove

Goal:
    nth(node(x,xs'),0)(search(node(x,xs'),y)) = y

We see search applied to a node argument and note that again that the body of search contains an if-then-else, so we proceed with a switch-for statement.

    switch x = y for search {
      case true suppose xy_true {
        ?
      }
      case false suppose xy_false {
        ?
      }
    }

In the case where x = y, Deduce tells us that we need to prove

Goal:
    nth(node(x,xs'),0)(0) = y

We conclude using the definition of nth and the fact that x = y.

    suffices x = y with definition nth
    rewrite xy_true

In the case where x ≠ y, we need to prove

Goal:
    nth(node(x,xs'),0)(suc(search(xs',y))) = y

Now if we apply the definitions of nth and pred, the goal becomes:

    // <<search_present_case_node_nth_pred>> =
    suffices nth(xs', 0)(search(xs', y)) = y
        with definition {nth, pred}

This looks a lot like the conclusion of our induction hypothesis:

Givens:
    ...
    IH: all y:Nat. (if search(xs',y) < length(xs') 
                    then nth(xs',0)(search(xs',y)) = y)

So we just need to prove the premise of the IH, that search(xs',y) < length(xs'). Thankfully, that can be proved from the premise search(node(x,xs'),y) < length(node(x,xs')).

  // <<search_present_IH_premise>> =
    have sxs_len: search(xs',y) < length(xs')
      by enable {search, length, operator <, operator ≤, 
                 operator+, operator+}
         rewrite xy_false in sxxs_len

We conclude by applying the induction hypothesis.

  conclude nth(xs',0)(search(xs',y)) = y
    by apply IH[y] to sxs_len

Here is the the complete proof of search_present.

theorem search_present: all xs:List<Nat>. all y:Nat.
  if search(xs, y) < length(xs)
  then nth(xs, 0)( search(xs, y) ) = y
proof
  induction List<Nat>
  case empty {
    <<search_present_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat
    suppose sxxs_len: search(node(x,xs'),y) < length(node(x,xs'))
    switch x = y for search {
      case true suppose xy_true {
        <<search_present_case_node_eq>>
      }
      case false suppose xy_false {
        <<search_present_case_node_nth_pred>>
        <<search_present_IH_premise>>
        <<search_present_apply_IH>>
      }
    }
  }
end

Prove search(xs, y) finds the first occurence of y

Going back to the specification of search(xs, y), it says that if the result is less-than length(xs), then the result is the index of the first occurence of y in xs. To be the first means that the result is smaller than the index of any other occurence of y. We express that in the following theorem.

theorem search_first: all xs:List<Nat>. all y:Nat, i:Nat.
  if search(xs, y) < length(xs) and nth(xs, 0)(i) = y
  then search(xs, y) ≤ i

We proceed by induction on xs. We can handle the case for xs = empty in the same way as in search_present; the premise is false.

    // <<search_first_case_empty>> =
    arbitrary y:Nat, i:Nat
    suppose prem: search(empty,y) < length[Nat](empty) and nth(empty,0)(i) = y
    conclude false by definition {search, length, operator <, operator ≤} 
                      in prem

In the case for xs = node(x, xs'), we proceed in a goal-directed fashion with an arbitrary and suppose.

  case node(x, xs') suppose IH {
    arbitrary y:Nat, i:Nat
    suppose prem: search(node(x,xs'),y) < length(node(x,xs')) 
                  and nth(node(x,xs'),0)(i) = y,
    ?
  }

Deduce response with

Goal:
    search(node(x,xs'),y) ≤ i

We apply the definition of search and switch on x = y with a switch-for statement.

  switch x = y for search {
    case true {
      ?
    }
    case false suppose xs_false {
      ?
    }
  }

In the case where x = y, the result of search is 0, so just need to prove that 0 ≤ i, which follows from the definition of .

    conclude 0 ≤ i   by definition operator ≤

In the case where x ≠ y, we need to prove

Goal:
    suc(search(xs',y)) ≤ i

What do we now about i? The premise nth(node(x,xs'),0)(i) = y tells us that i ≠ 0, which means that i is the successor of some other number i′.

    // <<search_first_case_node_false_1>> =
    have not_iz: not (i = 0)
      by suppose i_z 
         conclude false by rewrite i_z | xy_false in 
                           definition nth in prem
    obtain i' where i_si: i = suc(i') from apply not_zero_suc to not_iz
    suffices suc(search(xs', y)) ≤ suc(i')  with rewrite i_si

Now we can further simplify the goal with the definition of .

    // <<search_first_case_node_false_2>> =
    suffices search(xs', y) ≤ i'   with definition operator≤ 

The goal looks like the conclusion of the induction hypothesis instantiated at i'.

Givens:
    ...
    IH: all y:Nat, i:Nat. (if search(xs',y) < length(xs') and nth(xs',0)(i) = y 
                           then search(xs',y) ≤ i)

So we need to prove the two premises of the IH. They follow from the given prem:

Givens:
    prem: search(node(x,xs'),y) < length(node(x,xs')) 
          and nth(node(x,xs'),0)(i) = y

In particular, the first premise of IH follows from the first conjunct of prem.

    // <<search_first_IH_prem_1>> =
    have IH_prem_1: search(xs',y) < length(xs')
      by enable {search, length, operator <, operator ≤, 
                 operator+, operator+}
         rewrite xy_false in (conjunct 0 of prem)

The second premise of the IH follows from the second conjunct of prem.

    // <<search_first_IH_prem_2>> =
    have IH_prem_2: nth(xs',0)(i') = y
      by enable {nth, pred} rewrite i_si in (conjunct 1 of prem)

We conclude the case for i = suc(i') by applying the induction hypothesis.

    // <<search_first_apply_IH>> =
    apply IH[y,i'] to IH_prem_1, IH_prem_2

Here is the complete proof of search_first.

theorem search_first: all xs:List<Nat>. all y:Nat, i:Nat.
  if search(xs, y) < length(xs) and nth(xs, 0)(i) = y
  then search(xs, y) ≤ i
proof
  induction List<Nat>
  case empty {
    <<search_first_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat, i:Nat
    suppose prem: search(node(x,xs'),y) < length(node(x,xs')) 
                  and nth(node(x,xs'),0)(i) = y
    switch x = y for search {
      case true {
        <<search_first_case_node_true>>
      }
      case false suppose xy_false {
        <<search_first_case_node_false_1>>
        <<search_first_case_node_false_2>>
        <<search_first_IH_prem_1>>
        <<search_first_IH_prem_2>>
        <<search_first_apply_IH>>
      }
    }
  }
end

Prove that search fails only when it should

The last sentence in the specification for search(xs, y) says that if i = length(xs), y is not in the list xs. How do we express that y is not in the list? In some sense, that is what search is for, but it would be vacuous to prove a theorem that says search returns length(xs) if search returns lengt(xs). Instead we need an alternative and intuitive way to express membership in a list.

One approach to expressing list membership that works well is to convert the list to a set and then use set membership. The file Set.pf defines the Set type, operations on sets such as memberhsip, union, and intersection. The Set.pf files also proves many theorems about these operations. The following set_of function converts a list into a set.

function set_of<T>(List<T>) -> Set<T> {
  set_of(empty) = ∅
  set_of(node(x, xs)) = single(x) ∪ set_of(xs)
}

We can now express our last correctness theorem for search as follows.

theorem search_absent: all xs:List<Nat>. all y:Nat, d:Nat.
  if search(xs, y) = length(xs)
  then not (y ∈ set_of(xs))

We proceed by induction on xs. In the case for xs = empty, we take the following goal-directed steps

  case empty {
    arbitrary y:Nat, d:Nat
    suppose _
    ?
  }

and Deduce responds with

Goal:
    not y ∈ set_of(empty)

which we prove using the definition of set_of and the empty_no_members theorem from Set.pf.

    // <<search_absent_case_empty>> =
    arbitrary y:Nat, d:Nat
    suppose _
    suffices not (y ∈ ∅) with definition set_of
    empty_no_members[Nat,y]

Turning to the case for xs = node(x, xs'), we take several goal-directed steps.

  case node(x, xs') suppose IH {
    arbitrary y:Nat, d:Nat
    suppose s_xxs_len_xxs: search(node(x,xs'),y) = length(node(x,xs'))
    suffices not (y ∈ single(x) ∪ set_of(xs'))  with definition set_of
    ?
  }

Now we need to prove a not formula:

Goal:
    not (y ∈ single(x) ∪ set_of(xs'))

So we assume y ∈ single(x) ∪ set_of(xs') and then prove false (a contradiction).

  suppose y_in_x_union_xs: y ∈ single(x) ∪ set_of(xs')

The main information we have to work with is the premise s_xxs_len_xxs above, concerning search(node(x,xs'), y). Thinking about the code for search, we know it will branch on whether x = y, so we better switch on that.

  switch x = y {
    case true suppose xy {
      ?
    }
    case false suppose not_xy {
      ?
    }
  }

In the case where x = y, we have search(node(x,xs'),y) = 0 but length(node(x,xs')) is 1 + length(xs'), so we have a contradiction.

    // <<search_absent_case_node_equal>> =
    have xy: x = y by rewrite xy_true
    have s_yxs_len_yxs: search(node(y,xs'),y) = length(node(y,xs'))
        by rewrite xy in s_xxs_len_xxs
    have zero_1_plus: 0 = 1 + length(xs')
        by definition {search, length} in s_yxs_len_yxs
    conclude false  by definition {operator+} in zero_1_plus

In the case where x ≠ y, we can show that y ∈ set_of(xs') and then invoke the induction hypothesis to obtain the contradition. In particular, the premise y_in_x_union_xs gives us y ∈ single(x) or y ∈ set_of(xs'). But x ≠ y implies not (y ∈ single(x)). So it must be that y ∈ set_of(xs') (using or_not from Base.pf).

  // <<search_absent_case_node_notequal_y_in_xs>> =
  have ysx_or_y_xs: y ∈ single(x) or y ∈ set_of(xs')
      by apply member_union[Nat] to y_in_x_union_xs
  have not_ysx: not (y ∈ single(x))
    by suppose ysx
       rewrite xy_false in
       apply single_equal[Nat] to ysx
  have y_xs: y ∈ set_of(xs')
    by apply or_not[y ∈ single(x), y ∈ set_of(xs')] 
       to ysx_or_y_xs, not_ysx

To satisfy the premise of the induction hypothesis, we prove the following.

    // <<search_absent_IH_prem>> =
    have sxs_lxs: search(xs',y) = length(xs')
      by injective suc
         rewrite xy_false in
         definition {search,length,operator+,operator+} in
         s_xxs_len_xxs

So we apply the induction hypothesis to get y ∉ set_of(xs'), which contradicts y ∈ set_of(xs).

  // <<search_absent_apply_IH>> =
  have y_not_xs: not (y ∈ set_of(xs'))
    by apply IH[y,d] to sxs_lxs
  conclude false  by apply y_not_xs to y_xs

Here is the complete proof of search_absent.

theorem search_absent: all xs:List<Nat>. all y:Nat, d:Nat.
  if search(xs, y) = length(xs)
  then not (y ∈ set_of(xs))
proof
  induction List<Nat>
  case empty {
    <<search_absent_case_empty>>
  }
  case node(x, xs') suppose IH {
    arbitrary y:Nat, d:Nat
    suppose s_xxs_len_xxs: search(node(x,xs'),y) = length(node(x,xs'))
    suffices not (y ∈ single(x) ∪ set_of(xs'))  with definition set_of
    suppose y_in_x_union_xs: y ∈ single(x) ∪ set_of(xs')
    switch x = y {
      case true suppose xy_true {
        <<search_absent_case_node_equal>>
      }
      case false suppose xy_false {
        <<search_absent_case_node_notequal_y_in_xs>>
        <<search_absent_IH_prem>>
        <<search_absent_apply_IH>>
      }
    }
  }
end

Exercise search_last

Apply the write-test-prove approach to develop a correct implementation of the search_last(xs, y) function, which is like search(xs, y) except that it finds the last occurence of y in xs instead of the first.

In particular, you need to

  • write a specification for search_last,
  • write the code for search_last,
  • test search_last on diverse inputs, and
  • prove that search_last is correct.
function search_last(List<Nat>, Nat) -> Nat {
    FILL IN HERE
}

Exercise search_if

The search_if(xs, P) function is a generalization of search(xs, y). Instead of searching for the first occurence of element y, the search_if function searches for the location of the first element that satisfied predicate P (i.e. an element y in xs such that P(y) is true). Apply the write-test-prove approach to develop a correct implementation of search_if.

In particular, you need to

  • write a specification for search_if,
  • write the code for search_if,
  • test search_if on diverse inputs, and
  • prove that search_if is correct.
function search_if<T>(List<T>, fn T->bool) -> Nat {
    FILL IN HERE
}

2 comments:

  1. Anonymous1:56 AM

    What specification language is this? Lean?

    ReplyDelete
    Replies
    1. The language is named Deduce.
      https://github.com/jsiek/deduce

      Delete