Wednesday, October 09, 2024

Take-aways from using Deduce in the classroom

During two weeks of September 2024 I experimented with using the Deduce proof assistant in an honors undergraduate data structures course at Indiana University. The rest of the semester is taught using the Java programming language.

The primary goal of the experiment was to find out whether it would be feasible to teach our undergraduate students how to use a proof assistant and prove correctness of some basic algorithms on linked lists. I gave three lectures about Deduce: (1) how to define functions and data types, (2) how to write proofs, and (3) an application of these ideas to defining insertion sort and proving its correctness. The teaching assistants conducted a lab each week. The first lab helped the students install Deduce and write some functions on linked lists, such as reverse, sum, and cumulative sum. The second lab asked the students to define a tail-recursive variant of insertion sort and prove its correctness, reusing the main lemma from the proof given in lecture. During the second week we also assigned some proof exercises as homework. A week after the second lab, I asked the students to complete a survey regarding the use of Deduce, and the results of the survey are shown below.

The students were able to complete the two labs and the exercises, so the main take-away from the experiment is, yes, it is feasible to teach undergraduate students at IU how to use a proof assistant in a relatively short amount of time! However, I think the survey results tell us that it was just barely feasible and the degree of frustration for the students was still too high. Fortunately, the survey also points to many things that can be improved about Deduce that will help to lower the frustration for beginners.

The number one challenge for students learning Deduce was learning the syntax and dealing with syntax errors. This comes up over and over in the survey. This was surprising to me because the number one goal for me when designing Deduce was to make the syntax as unsurprising as possible. However, at the end-of-the-day, Deduce is still a computer language, which means it takes time for students to learn the grammar. Unfortunately, I built the parser for Deduce using a parser generator and the default error message is just “unexpected token” and the position in the file, which typically pointed to the token after the incorrect one. So an important take-away is that the syntax error messages need to be top notch. This past week I rewrote the parser by hand (recursive descent) and I’ve greatly improved the error messages.

Another aspect of the syntax challenge was a bit more subtle. The students had difficulty learning which keywords to use in different parts of a proof. In Deduce, there is typically one keyword (and syntactic form) for each logical rule. For example, if pq is a proof of (if P then Q) and p is a proof of P, then apply pq to p a proof of Q (modus ponens in Deduce). However, in English mathematical prose, there are no specific words that correspond to using modus ponens (AFAIK). Something that I’m now experimenting with is allowing the proof rule to be inferred, so one could just write pq, p (comma is conjunction introduction) and Deduce figures out that it can use modus ponens to conclude Q. However, in situations when that fails, it may become more difficult to provide an informative error message because the syntax does not capture as much of the user’s intent.

On a related note, several students mentioned that they would like a reference manual for Deduce. At the time of the course, the documentation for Deduce consisted of two tutorials, one for functional programming in Deduce and the other writing proofs. However, those documents are not ideal for answering questions like, “What is the have keyword for and how does it work?” I’m now in the process of adding a reference manual.

Deduce allows students to write incomplete proofs, using ? as a placeholder for parts that still need to be filled in. For each ?, Deduce reports what formula needs to be proved and it gives advice regarding what the next step of the proof could be, based on the top logical connective of the formula. For example, if the formula is an all, then Deduce suggests arbitrary as the next step. Of course, induction is also an alternative choice, and one student would have liked Deduce to also suggest that alternative. An undergraduate research assistant has already added that to Deduce. Inspired by how much this advice helped the students, I’ve added advice for elimination rules. For example, if pq is a proof that P implies Q, then writing help pq will cause Deduce to suggest that you use the apply-to form.

Helping students during office hours was also a great way learn more about student expectations and challenges using Deduce. One thing I noticed is that a student would sometimes refer to an already-proved formula, i.e. a given, by writing the formula itself instead of using the label for the formula. For example, writing apply (if P then Q) to P instead of apply pq to p. I’ve now added the ability to refer to givens by formula. In general, I expect users to prefer the formula when its small and the prefer the label when the formula is large.

Finally, it looks like it would be better to spend 4 weeks introducing students to Deduce instead of 2 weeks. This would allow for a more gentle introduction to Deduce and it would enable more applications of Deduce to proofs of correctness. I’ll give this a try with the 100-student regular section of data structures this Spring!

I’ll be going back over the survey results from time to time, looking for more lessons to be learned and ways to improve Deduce.

Survey Results

There were 7 questions in the survey that were completed by 21 students in an honors undergraduate data structures course. 4 of the questions were multiple choice and 3 were open-ended.

4 Multiple-choice Questions

DeduceLength
DeduceDiscreteMath
DeduceJava
DeduceCorrectProofs

3 Open-ended Questions

How did you feel when you completed a proof of a theorem using Deduce?

Extremely satisfied

In all honesty, it felt like I was just trying all possible combinations
of applying the logic rather than figuring out how to prove the questions. 
More time spent on learning what the syntax actually means might help.

I had a hard time proving theorems in Deduce, mostly because of the syntax. 
So, when I finally saw "xyz.pf is valid," the joy I felt was indescribable. 
I could sleep at night peacefully knowing I didn't prove anything wrong.

It felt satisfying when it finally completed, however it was because of consistent
frustration in figuring out how to phrase the proofs in deduce.

I typically felt confused and frustrated.

Seeing the words: "proof is valid", would always bring me satisfaction. However, 
I would frequently feel sad when I struggled more with the implementation of a 
proof in Deduce rather than the proof itself.
Also, I would like to add a comment about question 2. While I think doing proofs 
without a program like Deduce is sufficient for students in C241, it could make 
grading proofs for UIs/TAs easier.

Liberated (I struggled with the syntax of Deduce so it was satisfying when the proof finally ran).

I felt accomplished afterwards, like I had completed a journey of smaller goals to
achieve proving the final theorem.

Completing a proof felt rewarding. Pf. Siek mentioned that it would eventually feel 
like a video game, and I agree. It felt like putting together pieces of a puzzle
until the puzzle was complete as opposed to proof-writing as I'm used to it.

Very satisfied once I understood why it was correct.

It feels a little nice having confirmation of my proof correctness, but I can still
feel confused after completing a proof. Sometimes after completing a proof in Deduce, 
I have trouble reading the proof I just wrote. 

It did feel good because you know what you’ve done is correct.

relief, like a weight has been lifted off my shoulders. 
During the proof: I hate deduce
After completing the proof: I love deduce I did it I can do anything the world is mine

I felt really accomplished because the syntax in Deduce was hard to pick up in the 2 weeks
we learned it. I still think Deduce was helpful in a way even though I struggled, so maybe
spending more time with Deduce would help with syntax knowledge.

Relieved, the process was challenging, but typically rewarding in the end. However, 
I sometimes felt frustrated with Deduce since I sometimes didn't understand the syntax, 
which made it hard to get Deduce to give helpful feedback.

I think proofs that I wrote in deduce were super thorough, your logic needed to be super
strong and for a more difficult proof, you needed to map out the definitions and the 
equations you are trying to prove. I felt more confident about my logic skills and code 
after the proofs. It was definitely a lot of hard work but I think it was worth it to get 
an understanding of the correctness of your programs.

I felt good and rewarded when the valid statement came out, especially after trial and error
with the syntax

Every time I completed a proof of a theorem using Deduce, I felt mostly relieved.  
It was frustrating trying to understand Deduce's syntax or how to apply certain concepts
to complete a proof, but at least when Deduce says that the proof is valid, I know it's 
correct and I do not have to go looking for someone to look over my work. I think, to make 
it easier to learn Deduce, it would be helpful to have more completed step-by-step examples
of proved theorems available to students. I'd also argue that when Deduce did offer advice, 
the advice that I got was usually very helpful and assisted greatly in pushing me towards 
correctness. However, often, Deduce would not give advice at all when I had an error in my 
proof, so I would suggest having a handbook or guide available to students to help decipher
what errors lead to certain, unhelpful, error messages. 

I felt really happy that the theorem worked and was marked as valid since this means 
my logic is actually correct.

I found completing the theorems in Deduce satisfying. The immediate feedback was nice
compared to traditional informal proofs that had delayed feedback.

I felt good. Especially because it was much harder than normal proofs. Nevertheless, 
I felt like I understood the logic behind the functions I programmed.

What changes or improvements to Deduce would you find most helpful? Please be as specific as you can.

The biggest thing that would help would be more diverse/descriptive error messages. 
Most of the error messages I got were something like "unexpected symbol ____". 
Clearly this meant I wasn't using the syntax correctly, but I didn't get any feedback
on what I was getting wrong. So I think if the error messages were somehow able to have a 
bit more information that would be good. Also if you could make the ? work while using the
equations statement, that would help a lot.

Better error messages would probably be the most helpful, followed by an easier process of
applying facts, definitions, and rewrites to the goal.

I'm not sure if we already have it, but a documentation explaining how all the functions work
would be really helpful. For example, how "suffices," "apply," "definition," and others function.

Sometimes the ? would not work as expected - where it would give me an error because it was
thinking that ? was part of the proof rather than to be used as like a "fill in the blank".

Some of the syntax just feels very unintuitive, it seems unreasonable to me to try to teach us the
language in just a few weeks and expect us to be able to use it well. 

In class, we were taught how to use suffices to change the goal when proving equivalence. 
Similarly, when proving things in C241, we only changed the goal when we did logical equivalence
proofs. However, in the Insertion Sort lab, when we had to use suffices in our proofs, the novelty
of suffices/Deduce compounded onto the fact that we had not practiced changing the goal in proofs
that were not about equivalence. This led to me feeling very sad after taking a long time to complete
only the base case of the isort proof.
The addition of syntax highlighting could also be cool, but I do not think it is too important.

Maybe an index for all the syntax (eg. apply, suffice) that was used in the examples on the Github page
(I had to use ctrl+f to find the terms on the page).

I think making the error statements a little more specific in more areas would be helpful for 
troubleshooting when writing the proof.

Having tools to support it such as intelliSense and autocomplete or even a VSCode extension would
improve the learning curve. A lot of times you know where the proof needs to go but you don't know 
the syntax that deduce wants, which is frustrating. Also a frequent error message I would get was
"unexpected token". I think it would be useful if there was a way for deduce to (if possible) 
partially parse the line the unexpected token was on to see what type of command I was trying to
use and give some advice on it.

In the question mark advice section, include induction as a possibility.

A more complete guide to using deduce

I wish we covered it more in class because I struggled with it a lot outside of class.

The "evidence" part of each line could be improved. When we had to reference a function, 
or theorem, or a previously proved fact, we had to use a large variety of different keywords. 
Definition, apply, definition in, rewrite, rewrite in, and maybe more had me spending a lot 
of time figuring out which one I was supposed to use even though I already knew the next step
of the proof. I just didn't know the format for the justification deduce was looking for. 
Maybe it is possible to combine some of these?

I think listing out the proper ways to approach proofs would help. Moreover, there should be
a reference listing when to use "suffices ... with definition ...", "by definition", "by rewrite",
or "by apply". I looked for guidance in GitHub and think that the new syntax learned up to a certain
point should be accessible in an efficient way.

Here are a few things I would suggest:
Clearer feedback on errors: A lot of the time I would get errors/responses from Deduce where 
I had a difficult time interpreting what I was supposed to be doing to fix my issues.
More detailed examples: It would be nice to have a reference for proofs in the homework, 
especially if there were a line-by-line breakdown of how the proofs work.
Better documentation or tutorials: Learning the system was difficult due to a lack of resources,
as the only materials were the two .md files on proofs and Functional Programming.

While the feedback from deduce definitely helped me identify some minor errors, I feel the
error messages sometimes lack clarity. And its more unclear when dealing with type inference
or mismatch errors or issues with multisets. I don't know to what extent it could be possible
in the context of a class, but more detailed error messages with specific suggestions or more
context can help debug proofs more easily. 

Sometimes it got difficult to know when to use which words like "with", "by", "apply." 
If there is some way to be able to use most of the words we used in 241, I think that would 
be very helpful.

I think Deduce would benefit most from better error messages and more forgiving syntax. 
First, it seems that when Deduce recognizes an error in a proof, but doesn't know specifically
how to explain the error, it just defaults to some sort of "parsing error" or reports an
"unexpected token" which I think is sometimes unhelpful. Additionally, I believe that when trying
to use a theorem, if it is used incorrectly, then Deduce should mention every parameter the theorem 
needs in some sort of explicit detail, so I don't have to look through another document to decipher
what the theorem expects. I will also say that it seems like Deduce is only willing to give advice
when I put the "?" symbol in the exact right place in the proof. For example, I've had issues where
the "?" symbol was not formatted properly (such as indentation or location in a line) and I received
one of the two errors I mentioned above. It would also be great to have more overall documentation
about specifically the syntax of Deduce in addition to the existing documentation about how to prove
theorems using features of Deduce.

It would be nice to have better error messages that were a bit clearer or better explain what the
problem was that occurred. For example, I had an issue where I was trying to use a "definition" 
instead of using "apply," and the error messages would just say something like "no proofbinding."

Allow the use of question marks in the equations statements. I also think adding a little statement
when solving for equalities v.s. other goals (ex. Insertion sort theorem) would be nice.

Better IntelliSense, and to make it less confusing, and so many keywords mean the same thing, which
makes this very confusing. So, looking at it at first glance, you won't know that they have different
functionalities. I am trying to make it easier for a first-timer to learn the language without depending
on the TAs or professors to tell them what everything does. Also, when someone hovers over an in-built
function or keyword, it should show a brief definition of what that does.

What about Deduce did you dislike or find frustrating? Please be as specific as you can.

I got better about this as we went on, but at the beginning and even somewhat at the end,
it felt somewhat ambiguous to me when to use which keyword ("suffices," "definition," 
"rewrite," etc.). I'm not sure how to fix this and a large part of this is probably just
because we didn't spend long on Deduce but that was the biggest learning curve for me.

It was often very difficult to apply facts, definitions, or rewrites in the way that I would
expect them to. Maybe this was just because I didn't understand the syntax too well.

Sometimes, even though my proof was logically correct, I had to restructure it because Deduce 
didn’t recognize it as valid.

The syntax really frustrated me. I felt that it would be so much easier to phrase in english
and then I had to work backwards to try to figure out how deduce wanted the proofs to be.
I put 4 weeks as the right amount of time because I felt if I had more time to get used to
deduce I wouldn't be as upset with it, but I was between putting that or "Not using deduce". 

Occasionally, I would receive feedback that was not very helpful. Unfortunately, I do not
remember specific examples of that happening. 
It is also slightly frustrating that after taking the time to learn it, we are not continuing
the use of Deduce. Although I am relieved, I feel like I finally have a grasp on Deduce.

I'm not sure if it's because I have such a low understanding of Deduce, but it was hard to figure
out what syntax to use for which situations (eg. suffice, rewrite, apply).

Honestly, the only thing I can think of is I would occasionally get confused with the question mark
and when or where I would need to put it, but that might have been user error as well.

Basically what I mentioned above. Not knowing the exact syntax to get where you want to go
(usually a trivial theorem or rule) and unexpected token errors not showing much help. 

Learning syntax and translating things I knew to actual proofs.

What I find most frustrating about Deduce is knowing how to do a proof but not being able to figure
out how to write it in Deduce. I also feel like Deduce is a bit restrictive. But I understand both
of these are because it's a computer program. If I had to prove something through a proof 
programming language, I guess Deduce is as good as any. 

I found the syntax very difficult. During lab I didn’t know how to solve a problem in deduce
because I didn’t know how to word it properly.

1. sometimes to see what to do next you can use ?, but other times ? will create an error
2. I know the next step of the proof but I don't know how deduce wants me to prove it

Coding the functions wasn't frustrating since I had taken H211, but despite having previous
experience with similar syntax, I still struggled with Deduce syntax, which was pretty frustrating.

One of the main frustrations I encountered while using Deduce was the feedback system. The error
messages were often unclear, and it was difficult to pinpoint exactly what went wrong in my proofs. 
For example, when I was trying to rewrite an expression using insertion_sort2(xs) = isort(xs, empty) 
by definition insertion_sort2 to prove sorted(insertion_sort2(xs)), Deduce gave me an error message
stating, “no matches found for rewrite with insertion_sort2(xs) = isort(xs, empty) in
sorted(isort(xs, empty)).” This feedback was vague and didn’t help me understand why the rewrite failed
or how to fix it.
Additionally, the learning curve for Deduce felt steep, especially compared to programming in more 
familiar languages like Java. The formal proof system required precise logic, and I often struggled
with applying theorems correctly due to the complex syntax and lack of intuitive error handling.
I also found that Deduce lacked detailed examples or a more comprehensive tutorial system. When I was
trying to apply the isort_sorted theorem, for instance, there wasn’t a clear guide on how to handle
such proofs step-by-step, leaving me unsure whether I was using the right approach.

When I took C241, the way we proved theorems was slightly different. In deduce, the most frustrating
thing was the syntax and flow. More often than not, I had all the logic and steps down, but there 
would be issues with matching the right statements. I usually use have statements a lot and that 
confuses me. And then I tried using the equations feature which was far simpler to complete the same proof. 

The most frustrating part of using Deduce was knowing how to prove a theorem to a person, 
but not knowing how to prove a theorem to Deduce. The problem of trying to prove a specific
theorem to Deduce becomes even more apparent when error messages can be unhelpful. I think some 
Deduce features such as "?" and "sorry" are great steps in the right direction, but I still think
more should be done to help point students to the correct solution when they make a mistake in their proofs.

I found it frustrating how specific you have to be with deduce as compared to real life.
I would understand the logic of a proof but I would not be able to prove it in deduce because
I did not know exactly how to format my reasoning.

I found using the equations implementation an annoyance because you couldn’t get feedback after
every line when a question mark was placed. I would write out everything in steps and that would 
take up a lot of space on the line and become hard to read.

Some syntax was very confusing. Like defining functions. It felt like it was not meant for 
UG students. especially the use of Lambda functions. I don't know how you can improve that 
given that it is a vital part of the language but that is how I felt.