
Automatically Generalizing Proofs and Statements

Anshula Gandhi, University of Cambridge, UK
Anand Rao Tadipatri, University of Cambridge, UK
Timothy Gowers, Collège de France, Paris, France and University of Cambridge, UK
Abstract

We present an algorithm, developed in the Lean programming language, to automatically generalize mathematical proofs. The algorithm, which builds on work by Olivier Pons, advances state-of-the-art proof generalization by robustly generalizing repeated and related constants, as well as abstracting out hypotheses implicitly concerning them. We also discuss the role of proof generalization in conjecturing, learning from failure, and other aspects of mathematical proof discovery.

Keywords and phrases:
automated reasoning, automated theorem proving, interactive theorem proving, formalization of mathematics, generalization, Lean theorem prover, Lean tactic
Copyright and License:
© Anshula Gandhi, Anand Rao Tadipatri, and Timothy Gowers; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Logic and verification; Theory of computation → Automated reasoning
Supplementary Material:
 Software  (Source Code): https://github.com/Human-Oriented-ATP/automatic-proof-generalization/tree/v1-3
Editors:
Yannick Forster and Chantal Keller

1 Introduction

In this paper, we present an algorithm that takes as its input a theorem, a proof of the theorem, and some aspect of the theorem that can potentially be generalized, and outputs a correct generalization of the theorem and proof. Our algorithm, which builds on work of Olivier Pons [12], is implemented as a tactic in the Lean theorem prover, and can be extended to any dependently-typed theorem proving language.

To give an example, consider the following theorem and proof.

Theorem.

√17 is irrational.

Proof.

Suppose not. Then we can find coprime positive integers a and b such that √17 = a/b, which implies that a² = 17b². From this it follows that a² is a multiple of 17. Since 17 is prime, it follows that a is a multiple of 17. Writing a = 17c we deduce that 17²c² = 17b², and hence that 17c² = b². It follows that b² is a multiple of 17, and therefore so is b, contradicting the fact that a and b are coprime. This contradiction proves the theorem. The Lean formalization of the theorem is below:

def irrat_sqrt : Irrational (Real.sqrt 17) :=
by -- proof omitted, full proof available in code supplement

If our tactic is presented with the above theorem and proof, formalized in Lean, and asked to generalize the number 17 by replacing it with an unknown natural number n (via the tactic invocation autogeneralize 17 in irrat_sqrt), it will determine that the only property it used of the number 17 was that it was prime. So, its output will be the theorem that if n is prime, then √n is irrational, together with a correct proof of that theorem.

def irrat_sqrt.Gen : ∀ (n : ℕ), Nat.Prime n → Irrational (Real.sqrt n) :=
by -- proof omitted, full proof available in code supplement
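Once a generalized lemma of this shape is in scope, it can be instantiated at any other prime with no further proof effort. The following is a hypothetical usage sketch (the statement is taken as a hypothesis here so the example is self-contained; the exact coercions in the code supplement may differ):

```lean
import Mathlib

-- Hypothetical usage sketch: specializing the generalized statement
-- to a new prime yields a new irrationality result immediately.
example (gen : ∀ n : ℕ, Nat.Prime n → Irrational (Real.sqrt n)) :
    Irrational (Real.sqrt (5 : ℕ)) :=
  gen 5 (by norm_num)  -- `norm_num` discharges `Nat.Prime 5`
```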

Pons’s algorithm does something very similar, but in Rocq.

In this case, finding a generalized proof is particularly simple: one just needs to replace each occurrence of 17 in the original proof above by n, and then identify the properties of n that must hold for the proof to “go through.” However, for most examples of proof generalization, such a naive approach fails. We mention three important complications that often occur.

  1. If the same constant appears in more than one place in a theorem and proof, an algorithm needs to detect whether the fact that the constants in those places are equal is used in the proof.

  2. If different constants appear in a theorem and proof, an algorithm needs to determine whether there are dependencies between them that are used in the proof.

  3. Often we use a fact about a constant that is specific to that constant, or to a very small set of constants, with the result that the fact may not conveniently generalize even if the proof as a whole generalizes.

The following standard proof that √2 is irrational illustrates all three complications (which is why we began by discussing the irrationality of √17 instead).

Theorem.

√2 is irrational.

Proof.

Suppose not. Then we can find coprime positive integers a and b such that √2 = a/b, which implies that a² = 2b². From this it follows that a² is even. It follows from there that a is even. Indeed, if not, then we can write a as 2r − 1 for a positive integer r, in which case a² = 4r² − 4r + 1 = 2(2r² − 2r) + 1, which shows that a² is odd, contradicting our assumption.

Writing a = 2c we deduce that 4c² = 2b², and hence that 2c² = b². It follows that b² is even, and therefore so is b, contradicting the fact that a and b are coprime. This contradiction proves the theorem.

If we wish to generalize the statement to the statement that √n is irrational, subject to suitable hypotheses, then we do not want to replace the equation a² = 2b² by aⁿ = nbⁿ, since the equality of the coefficient 2 and the exponent 2 is not relevant to the proof. Thus, we have an example of the first complication. We also need to recognise that the constant 4 is related to the 2 that we are generalizing, and that in the more general argument it should be replaced by n², which illustrates the second complication. And finally, there are various places in the proof where the argument is specific to 2: the use of the words “even” and “odd” is an obvious example, but even if we were to replace “is even” by “is a multiple of 2” and “is odd” by “is not a multiple of 2” we would still use the fact that every positive integer that is not a multiple of 2 can be written in the form 2r − 1 for some positive integer r.

This last point is a problem, because a naive proof generalizer will simply note that it does not have a proof that a non-multiple of n can be written in the form nr − 1, and will therefore add that as a condition, ending up with a theorem that says the following.

Theorem.

Let n > 1 be a positive integer with the property that every non-multiple of n can be written in the form nr − 1 for a positive integer r. Then √n is irrational.

This theorem is true, and the generalized argument would constitute a proof of it, but the theorem is not a generalization, because the only integer n>1 that satisfies the condition is 2.

The natural reaction of a human mathematician after observing this would be to ask, “What did I use this condition for?” to which the answer is that it was needed in the proof that if a² is a multiple of n, then so is a. So one possible response to the situation is to isolate this statement as the hypothesis that we place on n, which gives us the following true theorem, together with a proof.

Theorem.

Let n > 1 be a positive integer with the property that for every a, if a² is a multiple of n, then so is a. Then √n is irrational.

Having proved that, we can investigate which integers have the given property.

It turns out that n has the property if and only if it is square free, but discovering that fact is outside the scope of our algorithm.

The algorithm we currently have implemented is capable of some interesting proof generalizations: we give more details about its capabilities in Section 6. However, we believe that it should be possible to bring several more proofs within its scope in the not too distant future, as the difficulties in doing so appear to be more technical than fundamental. In this paper, when we refer to “the algorithm” or “the tactic” we shall mean the algorithm as currently implemented. When we wish to discuss what a future development of the algorithm will probably be able to do, we shall refer to “an ideal algorithm”.

The rest of this paper is organized as follows. In Section 2 we discuss related work and previous state-of-the-art. In Section 3, we outline the ways in which our proof generalizer goes beyond the previous state-of-the-art. In Section 4 we describe in more detail how our algorithm works. In Section 5 we give two examples of proofs with non-numerical constants that the algorithm can generalize. In Section 6 we clarify what a proof generalizer should and should not be expected to do. In Section 7 we talk about the role of proof generalization in mathematics and about potential applications of automatic proof generalizers. We make some concluding remarks in Section 8.

2 Related work

For the purposes of the discussion in this section, it will be useful to draw a distinction between two kinds of generalizers.

Let us call a generalization algorithm a risky generalizer if it generates statements that may or may not be true. Risky generalizers are often coupled with tools for searching for simple counterexamples, so that obviously false generalizations can be dismissed and only reasonable conjectures are kept. They can be very useful in the process of searching for proofs, when one does not yet know which of the various facts one has at one’s disposal will be used, or how.

By contrast, a safe generalizer is one that produces generalizations that are guaranteed to be true. Since the guarantee needs to come in the form of a proof, they will typically be proof-based generalizers such as the algorithm presented here: that is, they take a proof of the more specialized statement, generalize it, and output the statement that is given by the more general proof.

Some examples of risky generalizers include Boyer and Moore’s generalization heuristics [3] or Ireland and Bundy’s generalization critic [6]. Both generalizers involve generating a set of conjectures, throwing away the conjectures that can be easily falsified with a counterexample, and setting about trying to prove the others. Modern generalization tools that rely on large language models are also often risky – the models can hallucinate, so there is no guarantee that the statement they output is true.

Safe generalizers, on the other hand, have their roots in foundational proof generalization frameworks in the 1990s. Ringer et al.’s survey in [13] provides a valuable historical overview of safe generalizers, including the work of Hasker and Reddy [5] in ’92, Felty and Howe [4] in ’94, and Kolbe and Walther [8] in ’98. Pons’s type-theoretic generalization algorithm [12], the foundation for our tactic, followed soon after in 2000. Pons’s algorithm generalizes all occurrences of a constant within a proof (to the same variable) and adds hypotheses as necessary to allow the initial proof to be reused. Since then, some safe generalizers have focused on adapting proofs to accept more general datatypes: Magaud and Bertot [9] and Ringer et al. [14] both present tools which adapt Rocq proofs to work across varying inductive types; Best’s typeclass generalizer [1] generalizes Lean proofs by replacing any unnecessarily specific typeclasses with the most general typeclass required by the proof. Proof generalization tools have also been applied outside the realm of dependent type theory, including within higher-order logics by Johnsen and Lüth [7], and within constrained sequent calculi by Boy de la Tour and Peltier [2].

Our work aims to advance the capabilities of safe generalizers by increasing their robustness and flexibility – in particular, our work builds most directly on Pons’s, extending it to generalize repeated constants independently and to generalize other constants that depend on a given constant.

3 Novel contributions

As already mentioned, our algorithm builds on the type-theoretic generalization algorithm described in Pons [12]. The main new contribution of our algorithm is that it is better able to deal with the complications mentioned in the introduction. That is, it has a greater agility at generalizing statements when the expression to be generalized appears multiple times, or when two expressions are related by some function that is implicit in the proof. Whereas Pons’s algorithm generalizes all instances of the specified constant to a single variable, our algorithm can detect any other constants related to the specified constant, and then generalize all related constants to as many variables as are required. Additionally, while Pons’s algorithm generates hypotheses out of lemmas quoted in the proof, our algorithm can generate hypotheses by abstracting out larger sections of the proof.

Another difference is that while Pons’s algorithm was implemented in Rocq, we have implemented ours in Lean, which allows us to take advantage of the large mathematical library available in Lean and the rapidly growing Lean community.

We now describe some examples where this additional agility – to generalize repeated and related constants – makes a difference.

3.1 Generalizing repeated instances of a constant

Often, generalizing all instances of a constant in a theorem to the same variable yields unnatural generalizations. Consider, for example, the following result.

Theorem.

√17 + 17 is irrational.

Proof.

The square root of any prime number is irrational, so √17 is irrational. And 17 is an integer, so it is rational. Since the sum of an irrational number and a rational number is irrational, it follows that √17 + 17 is irrational.

The Lean formalization of the initial theorem is:

def irrat_sum_sqrt : Irrational (Real.sqrt 17 + 17) :=
by -- proof omitted, full proof available in code supplement

The previous state-of-the-art in proof generalization in dependent type theory, Pons’s algorithm, would generalize the above statement as follows.

Theorem.

For all primes p, √p + p is irrational.

However, the proof we are given makes no use of the fact that the two occurrences of 17 are equal: all it uses is that the first 17 is a prime number and the second is an integer. Our algorithm detects this and therefore arrives at the following stronger, more natural generalization.

Theorem.

Let n, p ∈ ℕ and let p be prime. Then √p + n is irrational.

That is, we can generalize all instances of 17 in the above statement and proof via the following call to our Lean tactic: autogeneralize 17 in irrat_sum_sqrt. The output is the following generalized statement and proof:

def irrat_sum_sqrt.Gen : ∀ (n : ℕ), Nat.Prime n → ∀ (m : ℕ), Irrational (Real.sqrt n + m) :=
by -- proof omitted, full proof available in code supplement
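Because the two occurrences of 17 have been generalized independently, the prime and the integer summand can now be instantiated separately. A hypothetical usage sketch (the generalized statement is taken as a hypothesis so the example is self-contained; coercions in the code supplement may differ):

```lean
import Mathlib

-- Hypothetical usage sketch: the prime under the square root and the
-- integer summand are chosen independently of each other.
example (gen : ∀ n : ℕ, Nat.Prime n → ∀ m : ℕ, Irrational (Real.sqrt n + m)) :
    Irrational (Real.sqrt (3 : ℕ) + (42 : ℕ)) :=
  gen 3 (by norm_num) 42
```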

3.2 Generalizing other constants that depend on a constant

Sometimes when mathematicians say that they wish to generalize a constant c, they really need to generalize not just occurrences of c, but also occurrences of other constants in the proof that depend on c. For example, consider the following theorem, which bounds the size of the union of two sets.

Theorem.

For all sets X and Y,

|X| = 2 ∧ |Y| = 2 → |X ∪ Y| ≤ 4.
Proof.

For any sets X and Y, we know that |X ∪ Y| = |X| + |Y| − |X ∩ Y|. Plugging in the size of each of the sets, we have |X ∪ Y| = 2 + 2 − |X ∩ Y|, and so |X ∪ Y| ≤ 4.

The Lean formalization of the initial theorem is as follows:

def union_of_finsets
    {α : Type} [Fintype α] [DecidableEq α]
    (A B : Finset α) (hA : A.card = 2) (hB : B.card = 2) : (A ∪ B).card ≤ 4 :=
by -- proof omitted, full proof available in code supplement

If we wish to generalize the constant “2”, it is not enough just to generalize the instances of “2” itself, since we must also recognise that the constant “4” depends on the two “2”s in an important way. Thus, any algorithm that generalizes the “2”s in the proof will need to generalize the “4” as well.

Pons’s algorithm would attempt to produce a “generalized” proof term with the “2”s replaced by variables but the “4” remaining; the resulting term would therefore not typecheck. (Claims about the performance of Pons’s algorithm are based on a close reading of his paper and the implementation described there.)

However, the algorithm we present recognizes from a proof of the theorem that the two “2”s are “independent” and that the “4” depends on both of them, yielding the following natural generalization.

Theorem.

Let m, n ∈ ℕ. Then for all sets X and Y,

|X| = m ∧ |Y| = n → |X ∪ Y| ≤ m + n.

That is, when we call our Lean tactic, specifying that we want to generalize the 2 in the above theorem via the command autogeneralize 2 in union_of_finsets, the result is the following generalized statement and proof:

def union_of_finsets.Gen : ∀ (n m : ℕ)
    {α : Type} [Fintype α] [DecidableEq α]
    (A B : Finset α), A.card = n → B.card = m → (A ∪ B).card ≤ n + m :=
by -- proof omitted, full proof available in code supplement

3.3 Abstracting out composite hypotheses involving the constant

It often happens that proofs contain deductions about a constant that are specific to that constant and therefore need to be abstracted away as hypotheses in the generalization. Pons’s algorithm is able to identify such hypotheses when they are present in the form of lemmas, but fails when a fact about a constant is encoded in an arbitrary sub-portion of the proof.

Consider the following proof.

Theorem.

For any natural number n ≠ 0, 1 < 3ⁿ.

Proof.

We know that if a < b, then aⁿ < bⁿ for any non-zero natural number n. Since 1 < 3, it follows that 1ⁿ < 3ⁿ. But 1ⁿ = 1 for every n, so we obtain the desired inequality 1 < 3ⁿ.

This is formalized in Lean as the following lemma:

lemma one_lt_three_pow {n : ℕ} (hn : n ≠ 0) : 1 < 3 ^ n := by
  have hpow_lt : 1 ^ n < 3 ^ n := Nat.pow_lt_pow_left (a := 1) (b := 3) ?_ hn
  rwa [one_pow] at hpow_lt
  · exact Nat.one_lt_succ_succ 1 -- 1 < 3

The constant 3 does not appear anywhere within the type of the lemma given by Nat.one_lt_succ_succ (which has type ∀ a : ℕ, 1 < a + 2). The constant 3 does, however, appear in the composite sub-portion of the proof given by Nat.one_lt_succ_succ 1 (which has type 1 < 1 + 2). So, our algorithm is able to abstract the fact into a hypothesis of the form 1 < ?m.

def one_lt_three_pow.Gen : ∀ (m : ℕ), 1 < m → ∀ {n : ℕ}, n ≠ 0 → 1 < m ^ n :=
by -- proof omitted, full proof available in code supplement

4 Description of the algorithm

4.1 Overview

The working of the algorithm can be summarized as follows. The algorithm takes as input a proof and a constant to generalize in it and begins by traversing the proof, replacing occurrences of that constant with metavariables (which can be thought of as placeholders or holes). (Note that the default generalization produced is not minimal, since the procedure here abstracts all occurrences of the constant being generalized, unless the “occurrences” flag is passed to the tactic, specifying which occurrences are to be generalized. In particular, this means that if two occurrences of a constant are coincidentally equal, then the default output is different from what it would be if they were not equal.) There may be portions of the proof which use specific properties and facts about this constant that are not generalized by this process; they must be separately abstracted out as hypotheses.

The algorithm then outputs a generalized version of the theorem together with a proof. Note that since the algorithm operates on and modifies the proof term, the tactical structure of the proof is not preserved.

A diagram summarizing the algorithm is below.

Each stage in the above diagram is described in detail in the following section.

4.2 Algorithm Details

We now turn to describing the algorithm in more detail. For clarity, we present the algorithm in various stages, some of which actually take place simultaneously in a single pass through the proof to be generalized.

Stage 1: Abstract occurrences of the constant

During its traversal of the proof term, the algorithm looks for occurrences of the constant c being generalized, replacing each one with a different metavariable. That is, each instance of c : t is replaced by a fresh metavariable ?m : t.

By abstracting each occurrence of the constant differently, instead of uniformly as in Pons’s original algorithm, it becomes possible to generalize distinct occurrences of the constant in the theorem statement separately according to the roles they play in the proof. Initially, this abstraction process creates a proliferation of different metavariables in the proof, but a consolidation process at a later stage (Stage 4) of the algorithm via type checking ensures that occurrences that in fact play the same role in the proof will be unified.
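In Lean 4 metaprogramming terms, the core of this stage can be sketched roughly as follows. This is a simplified, hypothetical rendering, not the actual implementation: among other things, the real tactic must also traverse under binders and handle implicit arguments.

```lean
import Lean

open Lean Meta

/-- Sketch: replace each occurrence of the constant `c` in `e` with a
    *fresh* metavariable of the same type, so that distinct occurrences
    can later be generalized independently (or re-unified in Stage 4). -/
partial def abstractOccurrences (c e : Expr) : MetaM Expr := do
  if e == c then
    mkFreshExprMVar (some (← inferType c))  -- a new hole per occurrence
  else
    match e with
    | .app f a => return .app (← abstractOccurrences c f) (← abstractOccurrences c a)
    | _        => return e
```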

Stage 2: Abstract facts about the constant

After Stage 1 of the algorithm, all occurrences of the constant c have been abstracted from the proof term. However, it may be possible for the constant c still to be present in the type of some of the sub-expressions in the proof. A sub-portion P of the proof is considered a relevant hypothesis for the constant c to be generalized when P’s type contains c, despite c being absent in the term P itself.

More precisely, after the occurrences of the constant c have been abstracted from a subterm P of the proof in Stage 1, the algorithm checks whether its inferred type T_P involves the constant c, and if it does, this subterm P : T_P is abstracted away into a hypothesis of the generalized type: that is, P : T_P(c : t) is abstracted to ?h : T_P(?m : t), where ?m : t is the metavariable replacing c : t. A similar procedure was proposed in Pons’s paper [12] as a potential remedy to certain type-correctness issues that the algorithm is prone to.

This criterion, in particular, ensures that relevant lemmas, which appear in the proof term by their names, are abstracted as generalized hypotheses. For example, if the proof uses the lemma that 2 is a prime number, it appears in the proof term as .const Nat.prime_two [] : Nat.Prime 2 and is abstracted away to a metavariable ?h : Nat.Prime ?p when generalizing the constant 2. The metavariable hypothesis ?h will at a later stage of the algorithm (Stage 5) be extracted to be a hypothesis of the generalized theorem.

Abstracting only lemmas in the proof (which contain the constant c explicitly in the lemma’s type) as generalized hypotheses was the primary mechanism of generalization in Pons’s algorithm. However, the procedure described here strictly extends Pons’s algorithm, since our algorithm is capable of abstracting facts about a constant even when the fact is not encapsulated in a lemma, but rather, occurs in any sub-portion of the proof (see section 3.3 for an example).

In summary, Stage 1 of the algorithm generalizes the occurrences of the constant in the proof, while Stage 2 generalizes facts about the constant that are used in the proof. (The algorithm actually performs Stage 1 and Stage 2 in the same pass over the proof for the sake of efficiency, but we present them as distinct stages here for clarity.)

Stage 3: Detect other constants that depend on the abstracted constant

It may turn out that other constants in the proof are functions of the constant that has just been abstracted. If so, the generalization process may introduce type mismatches within the proof term. These mismatches typically occur in expressions constructed as function applications, where the type expected by the function no longer matches that of the argument. In these situations, the mismatch may be due to some parts of the proof being over-specialized, and generalizing additional constants in the proof may result in a type-correct term.

The algorithm identifies potential new constants to generalize through antiunification [11, 5]. The antiunifier of two expressions (assumed to have the same type) is their least common generalizer, i.e., the most specific expression with metavariables that can be instantiated to either expression. For example, if one started with the expressions 2 + (3 + 7) and ?m + (3 + 4), the least common generalizer would be ?m + (3 + ?n), where ?m and ?n are metavariables representing natural numbers.

The antiunification procedure used in the algorithm operates by traversing the expressions together and inserting metavariables at the locations where they do not match. The pairs of mismatching terms collected along the way give candidate terms to generalize next. Although more sophisticated treatments of antiunification exist in the literature, for example [5], the procedure above seems to be sufficient because the expressions being compared here are ones that have diverged from terms that were previously definitionally equal in the original proof, and are therefore likely to be quite similar.
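The traversal above can be sketched on a toy expression language as follows (hypothetical names and a deliberately minimal datatype; the real procedure operates on Lean’s `Expr` and handles general function applications):

```lean
/-- A toy expression language: numeric constants, metavariables, and sums. -/
inductive E where
  | const (n : Nat)
  | mvar  (i : Nat)
  | add   (a b : E)
deriving DecidableEq, Repr, Inhabited

/-- Antiunify `a` and `b`: descend through matching structure in parallel,
    and wherever the two sides disagree, insert a fresh metavariable and
    record the conflicting pair as a candidate for further generalization.
    Returns the generalizer, the conflicts, and the next fresh index. -/
partial def antiunify (a b : E) (fresh : Nat) : E × List (E × E) × Nat :=
  match a, b with
  | .add x y, .add u v =>
    let (l, cs₁, f₁) := antiunify x u fresh
    let (r, cs₂, f₂) := antiunify y v f₁
    (.add l r, cs₁ ++ cs₂, f₂)
  | x, y =>
    if x = y then (x, [], fresh)
    else (.mvar fresh, [(x, y)], fresh + 1)

-- On the paper's example, antiunifying 2 + (3 + 7) with ?m + (3 + 4)
-- yields ?a + (3 + ?b), with conflicting pairs (2, ?m) and (7, 4).
#eval antiunify (.add (.const 2) (.add (.const 3) (.const 7)))
                (.add (.mvar 0)  (.add (.const 3) (.const 4))) 1
```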

For example, when generalizing the constant 4 in a proof about graphs with four vertices, there may be a sub-portion of the proof that is expected to prove a statement like degree(v) = 3, but which actually proves degree(v) = ?n − 1 after generalization (one such proof is available in the code supplement). Computing the antiunification of degree(v) = 3 and degree(v) = ?n − 1 produces the conflicting pair 3 and ?n − 1. We then choose to generalize 3 in the proof based on the heuristic that it contains fewer metavariables than ?n − 1.

Not all type mismatches in the generalized proof are fixed this way, and there may be situations where the mismatch is not due to the proof being over-specialized, but instead because of failure of definitional equality after generalization. This is described in more detail in subsection 4.3 below.

Stage 4: Detect and unify related occurrences of the constant

It may turn out that multiple different occurrences of the constant in the proof are required to be equal to each other for the proof to work. In such cases, the corresponding metavariables must be matched up in the generalized proof term. This can be done by traversing through the proof and for each portion of the proof, unifying the inferred type of each subterm with its expected type. This kind of consolidation can be conveniently done in a dependently typed language by running type checking on the generalized proof term.

Consider, for example, a proof (available in the code supplement) that the number of functions from a set of size 3 to a set of size 3 is 3³. After Stage 1, each “3” in the proof would be abstracted to a different metavariable (i.e. the “proof” would say that the number of functions from a set of size ?m to a set of size ?n is ?p^?q). By the end of this stage, however, the algorithm would recognize that some of these metavariables must be equal for the proof to go through, and that the number of functions from a set of size ?m to a set of size ?n is ?n^?m.
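In Lean 4 this consolidation comes essentially for free from the elaborator: unifying an inferred type against an expected type with `isDefEq` assigns the corresponding metavariables as a side effect. A minimal, hypothetical sketch of the mechanism (not the tactic's actual code):

```lean
import Lean

open Lean Meta

/-- Sketch: `isDefEq` assigns metavariables as a side effect of
    unification, which is how separately-abstracted occurrences get
    consolidated during type checking. -/
def demoUnify : MetaM Unit := do
  let m ← mkFreshExprMVar (some (mkConst ``Nat))
  let n ← mkFreshExprMVar (some (mkConst ``Nat))
  -- Requiring `?m + ?n` to match `?n + ?n` forces the assignment `?m := ?n`.
  let e₁ ← mkAppM ``Nat.add #[m, n]
  let e₂ ← mkAppM ``Nat.add #[n, n]
  let ok ← isDefEq e₁ e₂
  logInfo m!"unified: {ok}"
```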

Stage 5: Extract metavariables as hypotheses

The entire procedure creates two kinds of metavariables – those corresponding to the original constant and constants related to it, and those corresponding to portions of the proof that have been abstracted out as generalized hypotheses. Metavariable extraction involves turning these from placeholders in the proof into actual hypotheses and variables that the proof is allowed to depend on.

For example, before this stage, the generalized proof that √17 is irrational would include both a metavariable ?p : ℕ representing the abstracted constant, and a metavariable ?h : Nat.Prime ?p representing a hypothesis about the abstracted constant. After this stage, the proof term would begin with ‘fun (p : ℕ) (h : Nat.Prime p) => ...’, and the proof would contain no metavariables.

Stage 6: Deduce the generalized statement

Once a formalized proof has been created through this generalization process, we can run type inference to determine the generalized theorem statement that it proves.

For example, consider the generalization of the proof that √17 is irrational (available in the code supplement), which begins “fun (p : ℕ) (h : Nat.Prime p) => ...”. Running type inference on the proof term yields the statement “∀ p : ℕ, Nat.Prime p → Irrational (√p)”.

The success of this final typecheck shows that the generalization is, indeed, correct. The generalized theorem and its proof can now be used in other proofs.

4.3 A limitation of the algorithm

If a proof relies on two types being definitionally equal, it will fail to be type-correct after generalization if the definitional equality no longer holds, as shown in the following example.

theorem dvd_left_of_dvd_prod (a b c : ℤ) (h : a ∣ b) : a ∣ (b * c) := by
  rcases h with ⟨d, hd⟩; exists (d * c); rw [hd, mul_assoc]

The first step of the proof relies on the fact that a ∣ b is defined as ∃ c, b = a⋅c over the integers. That is, Lean considers a ∣ b to be definitionally equal to ∃ c, b = a⋅c, and therefore can switch between the two without explicit conversion in the proof. When the integers are generalized to a different type, the above proof (which doesn’t explicitly expand the definition of a ∣ b) breaks, since the divisibility symbol “∣” is not necessarily defined the same way for an arbitrary type carrying a divisibility operation.

So, when the tactic attempts to operate on the proof term to generalize the theorem, the generalized proof cannot make use of any particular definition of divisibility, and therefore does not typecheck. Again, as mentioned earlier, since the tactic never produces incorrect generalizations, “failure” in this case means that the tactic produces a typechecking error and returns no generalized proof.

Explicitly expanding the definition of divisibility with a rewrite fixes this issue, adding an extra hypothesis ∀ a, b ∈ T, (a ∣ b) = (∃ c, b = a⋅c) to the generalization (where T is the type that ℤ is being generalized to). (Note that the tactic unfold does not change the proof term at all – it simply changes the goal visible to the user to a definitionally equal goal.)

theorem dvd_left_of_dvd_prod (a b c : ℤ) (h : a ∣ b) : a ∣ (b * c) := by
  rewrite [Int.dvd_def] at * -- expand a ∣ b to ∃ c, b = a * c everywhere
  rcases h with ⟨d, hd⟩; exists (d * c); rw [hd, mul_assoc]

Future work on the algorithm will include building in automatic detection of when such definitional equalities are used in the proof, and the insertion of these as hypotheses, to facilitate generalization.

5 Generalizing non-numerical constants

The constants to be generalized in the examples we have discussed so far have been integers. However, numerical constants are by no means the only constants one might wish to generalize, and indeed, Pons’s algorithm can generalize non-numerical constants such as × (multiplication) as well. Our algorithm also has this capability. Consider, for example, the following result.

Theorem.

For any three integers a, b and c, if a + b = a + c, then b = c.

theorem cancellation : ∀ a b c : ℤ, a + b = a + c → b = c := by
  intros a b c h
  replace h : -a + (a + b) = -a + (a + c) := by rw [h]
  rw [← Int.add_assoc, Int.add_left_neg, ← Int.add_assoc, Int.add_left_neg,
    Int.zero_add, Int.zero_add] at h
  exact h

This can be proved by adding −a to both sides (on the left), applying associativity, noting that −a + a = 0, and using the fact that 0 is an additive identity.
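Written out, the chain of equalities behind this argument is:

```latex
\begin{align*}
  -a + (a + b) &= -a + (a + c) && \text{add $-a$ on the left}\\
  (-a + a) + b &= (-a + a) + c && \text{associativity}\\
  0 + b &= 0 + c && \text{left inverse: } -a + a = 0\\
  b &= c && \text{left identity: } 0 + x = x
\end{align*}
```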

Each step of the argument works just as well in an arbitrary semigroup with a left identity and left inverses, so in fact the proof gives the following result.

Theorem.

Let X be a set with an associative binary operation ∘. Suppose that X contains a left identity e for ∘ and that for every x ∈ X there exists y ∈ X such that y ∘ x = e. Then for every a, b, c ∈ X, if a ∘ b = a ∘ c, then b = c.

theorem cancellation.Gen :
  ∀ (T : Type)                   -- If you have an arbitrary type
    [inverse : Neg T]            -- with a symbol representing the inverse,
    (e : T)                      -- and a symbol representing the identity,
    (f : T → T → T),             -- and a binary operation
  (∀ (a : T), f e a = a) →       -- s.t. the identity element is left-neutral,
  (∀ (a : T), f (-a) a = e) →    -- and every element has a left inverse,
  (∀ (a b c : T),
    f (f a b) c = f a (f b c)) → -- and the operation is associative,
  ∀ (a b c : T),
    f a b = f a c → b = c        -- then the operation is left-cancellative.
  := by -- proof omitted, full proof available in code supplement

Note that the generalized result output by our algorithm above uses f as the binary operation, rather than infix notation with a symbol such as ∘.

What the algorithm has done here is generalize the constants +, 0, and −, thereby proving that the conclusion holds in a semigroup with an identity and left inverses. Importantly, the algorithm does not need to have prior knowledge of these more abstract concepts: it discovers the abstractions for itself.

A more complicated example of this kind is provided by Bézout’s theorem from elementary number theory. Our algorithm is capable of generalizing the non-numerical constants in a standard proof of this theorem, which states the following.

Theorem.

Let x and y be integers, where y ≠ 0. Then there exist integers h and k such that hx + ky = gcd(x, y).

theorem bezout_identity : ∀ (x y : ℤ), y ≠ 0 →
  ∃ (h k : ℤ), isGCD (h * x + k * y) x y
:= by -- proof omitted, full proof available in code supplement

In a typical university mathematics course, students will later have it pointed out that the proof of this result works just as well in a Euclidean domain, which, roughly speaking, means an integral domain with a measure of size that allows one to run the Euclidean algorithm and guarantees that it will terminate. As with the semigroup example above, what our algorithm discovers is not actually the definition of a Euclidean domain, but rather the definition of a weaker and somewhat more artificial structure that is just enough for the proof of Bézout’s theorem (that is, the particular proof we have formalized in Lean, which is based on one of the standard pen-and-paper proofs) to carry over. More specifically, it produces, among other hypotheses, the following conditions under which the theorem holds for a set R.

  • The set has the appropriate additive and multiplicative structure (in particular, it must be an Abelian group under addition and multiplication must distribute over addition). Note that the commutative law appears in an unconventional way, in the form of a hypothesis that a + (b + c) = b + (a + c) for elements a, b and c, which, in the context of the other group axioms, is easily seen to be equivalent to commutativity.

  • The set comes equipped with a function R → ℕ (the “size function”), which generalizes the absolute value function on the integers.

  • The only element in R of size 0 is the additive identity 0.

  • There are functions q, r : R × R → R, generalizations of the quotient and remainder, respectively, that satisfy the property a = q(a, b)b + r(a, b) for all elements a, b ∈ R.

  • Additionally, for any a, b ∈ R such that b ≠ 0, the size of r(a, b) is strictly less than the size of b.

The full generalization in Lean outputted by our algorithm is below. The list of hypotheses produced is quite long and not 100% natural: future work will include pairing our generalization tactic with typeclass search, to produce more succinct and readable theorem statements.

theorem bezout_identity.Gen :
  ∀ (T : Type) (gen_instOfNat : {n : ℕ} → OfNat T n) [AddGroup T] [Mul T] [Dvd T],
  (∀ (a b c : T), a + b + c = a + (b + c)) →
  (∀ (a b c : T), a + (b + c) = b + (a + c)) →
  (∀ (a b c : T), (a + b) * c = a * c + b * c) →
  (∀ (a b c : T), a * (b + c) = a * b + a * c) →
  (∀ (a b c : T), a * b * c = a * (b * c)) →
  ∀ (gen_natAbs : T → ℕ),
  (∀ (a : T), 0 * a = 0) →
  (∀ (a : T), 1 * a = a) →
  (∀ (a : T), 0 + a = a) →
  (∀ {a : T}, gen_natAbs a = 0 → a = 0) →
  ∀ [Div T] [Mod T],
  (∀ (a b : T), a / b * b + a % b = a) →
  ∀ [Neg T],
  (∀ (a b : T), -(a * b) = -a * b) →
  (∀ (a : T) {b : T}, gen_natAbs b ≠ 0 → gen_natAbs (a % b) < gen_natAbs b) →
  (∀ (a : T), a + 0 = a) →
  (∀ (a b : T), b ∣ a * b) →
  (∀ (a b : T), (a ∣ b) = ∃ c, b = a * c) →
  (∀ (a b c : T), a * (b * c) = b * (a * c)) →
  ∀ (x y : T), y ≠ 0 → ∃ h k, isGCD (h * x + k * y) x y
:= by -- proof omitted, full proof available in code supplement

Nevertheless, the above output already demonstrates that our algorithm has the ability to discover new mathematical abstractions, which is one of the important uses of proof generalization by human mathematicians.

6 What to expect from proof generalization

We turn now to what an ideal proof generalization algorithm can and cannot do.

An ideal proof generalization algorithm, at least as we conceive it, is very much tied to the proofs it is given to generalize, which is why we talk about proof-based generalization. This constraint has consequences for what such an algorithm should be expected to be able to do. Broadly speaking, it should be capable of generalizing a proof when the generalization is implicitly present in the proof, but if there are steps that generalize in ways that cannot be read out of the proof, then discovering the generalizations of those steps involves solving new mathematical problems, which is outside the scope of an ideal proof generalizer and should instead be carried out by another reasoner.

In the rest of this section, we discuss two particular consequences of this constraint. The discussion is not tied to any particular implementation of a proof generalization algorithm.

6.1 Different proofs yield different generalizations

There are often multiple ways to prove the same theorem, and often each proof will generalize in a different way. This is a phenomenon also encountered by mathematicians when they generalize pen-and-paper theorems.

For example, suppose we wish to generalize the statement that √2 is irrational with a view to showing that √m is irrational under suitable conditions on m. If the proof we choose when m = 2 appeals to the fact that every odd number can be written in the form 2k + 1, then the natural generalization will appeal to the fact that every non-multiple of m can be written in the form mq + r with 0 < r < m. If instead it observes that 2 is prime and uses that to prove that if a^2 is a multiple of 2, then so is a, then the natural generalization of the proof will yield the statement that the square root of every prime number is irrational. And if the proof instead observes that 2 occurs an even number of times in the prime factorization of a^2 and an odd number of times in the prime factorization of 2b^2, then the natural generalization will yield the statement that √m is irrational if m is not a perfect square.
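The second and third of these generalizations might be rendered in Lean roughly as follows. This is a sketch of the statements only: the theorem names are ours, the proofs are omitted, and we assume Mathlib’s `Irrational` and `Real.sqrt`.

```lean
import Mathlib

-- From the proof using the primality of 2:
-- the square root of any prime is irrational.
theorem sqrt_prime_irrational (p : ℕ) (hp : p.Prime) :
    Irrational (Real.sqrt p) := by
  sorry -- proof omitted; generalizes the primality-based argument

-- From the proof counting prime factors:
-- √m is irrational unless m is a perfect square.
theorem sqrt_nonsquare_irrational (m : ℕ) (hm : ¬ ∃ k, k * k = m) :
    Irrational (Real.sqrt m) := by
  sorry -- proof omitted; generalizes the factorization-counting argument
```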

When generalizing a statement for which one has a proof, it is often desirable to modify the proof to make it more suitable for further generalization, by trying as far as possible not to use specific facts about the constants one is generalizing. Our algorithm does not perform pre-processing of this kind: we envisage it as a component of a more sophisticated generalizer that might be capable of doing so, but that is for future work.

6.2 Proofs with black boxes cannot be directly generalized

Because proof generalization relies on a thorough examination of the proof, it benefits the generalization if every step of the proof is evident in the proof term. A limitation of any proof-generalization tool, and therefore of this algorithm in particular, is that black-box steps can obscure some parts of the proof and therefore yield suboptimal generalizations.

Often such steps take the form of small computations. Consider, for example, the following problem, which appeared in Round 2 of the British Mathematical Olympiad in 1993.

Problem.

Let p be a prime greater than 3 and let m = (4^p − 1)/3. Prove that 2^(m−1) leaves a remainder of 1 when divided by m.

It is natural to begin by looking at the simplest case, namely p = 5. In this case m = (2^10 − 1)/3 = 341, so the problem requires us to show that 2^340 leaves a remainder of 1 when divided by 341, or equivalently that 2^340 ≡ 1 (mod 341).

A phenomenon that seems relevant is that 2^10 ≡ 1 (mod 2^10 − 1), and indeed that quickly implies that 2^10 ≡ 1 (mod 341). And that gives us what we want, since 10 divides 340.

If we now try to generalize this argument by replacing 5 with a general prime p > 3, everything works smoothly apart from at the very end, where in place of the observation that 10 divides 340 we need to prove that 2p divides (4^p − 1)/3 − 1. When p = 5 we proved this computationally: we just noted that 340 = 34 × 10. That option is not available to us for general p, and we have to supply an argument. This turns out not to be too hard, but it is outside the scope of our algorithm. What we can hope for in this case is a proof of the theorem that 2^(m−1) ≡ 1 (mod m) if 2p ∣ (4^p − 1)/3 − 1.
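That hoped-for theorem might be stated in Lean along the following lines. This is a sketch of the statement only: the name and exact formulation are our own choices, and the proof, which requires the separate divisibility argument just described, is omitted.

```lean
-- Sketch: the generalization obtainable from the p = 5 argument, with the
-- computational fact 10 ∣ 340 abstracted into the hypothesis `hdvd`.
theorem bmo_generalized (p m : ℕ) (hm : m = (4 ^ p - 1) / 3)
    (hdvd : 2 * p ∣ m - 1) : 2 ^ (m - 1) % m = 1 := by
  sorry -- proof omitted; requires the separate argument described above
```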

A related, more technical issue arises when computation rules are used in Lean. Suppose that, within the context of a larger proof, it is useful to prove that 2×3 is even. We can prove this either by using deduction rules or by using computation rules. A deductive approach might be to argue that 2 times any number is even, by definition, and thus that 2×3 is even. A computational approach would be to calculate 2×3, obtaining the answer 6 (in Lean, this computation could be carried out with the procedure reduceMul), and then to use the fact that 6 is even. When given the first proof, our generalization algorithm can generalize the 3 to an arbitrary integer c. When given the second proof, our algorithm is unable to find an occurrence of 3 to generalize, because the Lean proof term does not record that 2×3 was the operation that was done, just that 6 is the result of the computation. Thus, we can no longer determine which properties of 3 were used in the proof. And if another part of the proof relies on unification with that 3, a type error may arise during the generalization.
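The difference between the two routes can be illustrated with a minimal sketch (our own toy example, not the exact proof terms discussed above; we phrase evenness with Mathlib’s `Even`, and we assume its decidability instance for ℕ so that `decide` applies).

```lean
import Mathlib

-- Deductive route: 2 * n is even for every n, so in particular 2 * 3 is.
-- The `3` appears explicitly in the final proof term, so a generalizer
-- can replace it with an arbitrary c.
theorem two_mul_even (n : ℕ) : Even (2 * n) := ⟨n, two_mul n⟩
example : Even (2 * 3) := two_mul_even 3

-- Computational route: `decide` evaluates 2 * 3 to 6 inside the kernel and
-- checks that 6 is even; the resulting proof term records no occurrence of
-- the 3 that a generalizer would need to find.
example : Even (2 * 3) := by decide
```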

For our purposes, this is a downside of a language like Lean, which does not show computations in the proof term. But the larger issue (that some programs, not necessarily just theorem proving programs, contain black boxes) is tackled by other languages such as Pientka’s functional programming language Beluga [10], which prioritizes creating only programs that can be transparently reasoned about. Theorems proved in these less opaque languages may be particularly amenable to proof-based generalization.

7 Applications of proof-based generalization

Why should we care about automatic proof generalization? An overarching reason is simply that mathematicians often like to try to generalize statements that they have proved, so if one is trying to develop tools to facilitate mathematical reasoning, or even to carry it out entirely, then it makes sense in particular to try to develop tools that can carry out proof generalization.

Our main motivation for developing the algorithm was the use of generalization during proof discovery: in the course of finding a proof, mathematicians often prove special cases of statements and then generalize them, and we are interested in automating that process. Just before we discuss specific potential applications of this kind, we briefly mention a non-application: it is unlikely that any proof-generalization algorithm would be useful when applied to results in Lean’s mathlib, since proofs there are already designed to be as general as possible.

7.1 Reusing lemmas and theorems

Imagine that one is developing the theory of real numbers and considering the question of whether the least-upper-bound axiom, together with the axioms for an ordered field, is enough to yield that the number 2 has a square root. It is natural to consider the supremum s of the set {x : x^2 < 2}, and it turns out not to be too hard to prove that s^2 = 2.

Having done this, one can consider replacing 2 by a more general real number c and the function x ↦ x^2 by a more general function f. Performing a proof generalization, one discovers that the argument works provided that f and c obey certain conditions, namely the standard hypotheses for the intermediate value theorem: f should be continuous and there should exist x and y such that f(x) < c and f(y) > c.

After performing this generalization, we have a much more general and flexible theorem, which allows us to argue that all sorts of other equations can be solved. For example, if we have defined the exponential function and proved that it is continuous, we can show that for every positive real number c there exists x such that e^x = c, which gives us a definition of the logarithm function.
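In Lean, the special case and the generalization the proof supports might be stated roughly as follows. These are statements only, in our own notation, with proofs omitted; `Continuous` is Mathlib’s.

```lean
import Mathlib

-- The special case: 2 has a square root, obtained as the supremum
-- of the set {x | x ^ 2 < 2}.
theorem sqrt_two_exists : ∃ s : ℝ, s ^ 2 = 2 := by
  sorry -- proof omitted; via the least-upper-bound property

-- The generalization the same supremum argument supports: the
-- intermediate value theorem, with x ↦ x ^ 2 replaced by f and 2 by c.
theorem sqrt_two_exists.Gen (f : ℝ → ℝ) (c : ℝ) (hf : Continuous f)
    (x y : ℝ) (hx : f x < c) (hy : c < f y) : ∃ s, f s = c := by
  sorry -- proof omitted
```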

This is a very common phenomenon in mathematics: a lemma proved for a narrow purpose can be significantly generalized (with a proof-based generalization), and becomes useful well beyond its original context.

7.2 Generating conjectures from special cases

Consider the following problem.

Problem.

Let p be a prime and let x and y be integers. Prove that (x + y)^p ≡ x^p + y^p (mod p).

It is not obvious why such a statement should be true, so one way of gaining some intuition is to look at some special cases. When p = 2, we observe that

(x + y)^2 = x^2 + 2xy + y^2 ≡ x^2 + y^2 (mod 2),

since 2 ≡ 0 (mod 2). When p = 3 we observe similarly that

(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3 ≡ x^3 + y^3 (mod 3),

since 3 ≡ 0 (mod 3). And when p = 5, we find that

(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5 ≡ x^5 + y^5 (mod 5),

since both 5 and 10 are multiples of 5.

These simple proofs rely on the fact that all the coefficients we obtain when we expand out (x+y)p are multiples of p. However, this is shown computationally, so this fact does not obviously generalize beyond these simple examples, and will therefore not be obtained by a proof-generalization algorithm. Instead, an ideal proof generalizer would output the following conjecture.

Conjecture.

Let p be a prime and let 1 ≤ k < p. Then the binomial coefficient C(p, k) is divisible by p.

It would of course be possible to arrive at this conjecture by simply noting that (x + y)^p can be expanded using the binomial formula, that two of the terms are x^p and y^p, and that the remaining terms all have coefficients C(p, k) for some k between 1 and p − 1. However, our confidence that it is correct is greatly increased by the fact that we have seen (i) that it holds in a few special cases and (ii) that it played an important role in the proof of those cases. In more complicated examples, proof generalization can help to identify conjectures that would be significantly harder to guess than this one.
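The conjecture can be stated in Lean as follows. This is the statement only: the theorem name is ours, `Nat.choose` is Mathlib’s binomial coefficient, and we omit the (standard) proof.

```lean
import Mathlib

-- Conjecture: p divides (p choose k) whenever p is prime and 1 ≤ k < p.
theorem prime_dvd_choose (p k : ℕ) (hp : p.Prime)
    (hk : 1 ≤ k) (hkp : k < p) : p ∣ p.choose k := by
  sorry -- proof omitted; a standard fact about binomial coefficients
```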

7.3 Discovering inductive proofs

When mathematicians prove a theorem by induction, the idea behind how to prove the inductive step P(n) ⇒ P(n+1) in the general case can often be discovered by proving the first few cases, for instance P(1), P(2) and P(3), and using proof generalization. Of course, if that is one’s strategy, then in the course of proving P(3) one will try to make use of P(2), so that the proof has a good chance of generalizing appropriately.
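As a toy illustration (our own example, not output of the algorithm), consider proving that the sum of the first n odd numbers is n^2. If each small case is proved using the previous one, the common pattern of those proofs is exactly the inductive step.

```lean
-- P(2) for “the sum of the first n odd numbers is n ^ 2”.
theorem P2 : 1 + 3 = 2 ^ 2 := by decide

-- P(3) is proved *via* P(2), so the proof exhibits the pattern of the
-- inductive step: replace the sum so far by a square, then add the next odd.
theorem P3 : (1 + 3) + 5 = 3 ^ 2 := by
  calc (1 + 3) + 5 = 2 ^ 2 + 5 := by rw [P2]
    _ = 3 ^ 2 := by decide

-- Generalizing the proof of P3 suggests the step: if the sum of the first
-- n odds is n ^ 2, then adding the next odd number 2 * n + 1 gives
-- n ^ 2 + 2 * n + 1 = (n + 1) ^ 2.
```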

7.4 Finding new abstractions

As we saw in the generalization of Bézout’s theorem in Section 5, one of the uses of proof generalization is to formulate useful new abstractions, by examining proofs about concrete objects and identifying the properties of those objects that are essential to the proofs.

In practice, the most useful abstraction of a proof is not always the most general. An interesting avenue that we have not yet explored would be to start with two proofs and aim to find something like a least common generalization. For example, one could begin with Bézout’s theorem for integers and for polynomials: the weakest natural simultaneous generalization of those two results might well yield precisely the definition of a Euclidean domain, rather than the more general structure that our algorithm provides (which, for instance, is not necessarily an integral domain).

7.5 Learning from failure

A fundamental part of the process of solving mathematical problems is conjecturing potentially useful intermediate lemmas, and refining one’s conjectures if they turn out to be false. Proof generalization is an important component of this kind of “learning from failure.” Suppose, for example, that one wishes to prove a statement of the form ∀x ∈ X, P(x) ⇒ Q(x). Sometimes, one may identify a property R that is easily seen to be stronger than Q, and conjecture that in fact ∀x ∈ X, P(x) ⇒ R(x). If we then discover some z ∈ X such that P(z) is true but not R(z), then instead of simply abandoning the conjecture, we can attempt to generalize the proof of the statement P(z) ∧ ¬R(z), obtaining a class of solutions. This we can express as a statement of the form

∀w ∈ X, S(w) ⇒ P(w) ∧ ¬R(w).

If now we would like to have another try at finding a property T such that P ⇒ T and T ⇒ Q, we know in particular that S should imply T, which is a more stringent condition than the condition that z should satisfy T. Thus, the proof generalization has helped us to reduce the size of our search space.

Details, illustrated with several examples, will be given in a future paper by the authors. (The phrase “learning from failure” is often used differently in the automated theorem proving community, with the word “failure” referring to an invalid proof of some statement, and “learning from failure” meaning patching that proof. Tools like Ireland and Bundy’s [6] have been used to perform this patching and to learn from failure in this alternative sense. For our purposes, “failure” is a valid disproof of a conjecture, and we can “learn from failure” by generalizing that disproof.)

7.6 Generating interesting questions

An important challenge in automatic theorem proving is designing systems that do not just answer questions but ask them, with the important caveat that the questions they ask should be natural and interesting. Current methods tend to work in a rather “computery” way, for example asking a large number of questions in some domain and filtering out the ones that have easy answers. This is quite unlike the approach that human mathematicians take. In particular, human mathematicians often come up with interesting questions as a result of refining conjectures that arise naturally when they are thinking about other questions. As several of the examples discussed in this paper illustrate, proof generalization is one of the ways in which one question can lead to another that is recognisably “motivated”, and therefore interesting.

We have already seen an indication of how this process can work: when trying to generalize the proof that √2 is irrational, an ideal algorithm outputs the condition “For every a, if n ∣ a^2, then n ∣ a,” which can be converted into the problem, “For which positive integers n is it the case that for every a, if n ∣ a^2, then n ∣ a?”, which can in turn lead on to many other interesting questions that did not have an immediately obvious relationship to the irrationality of √2.

For this reason it seems likely that proof generalization will be at the heart of any system that is good at generating interesting mathematical questions.

8 Conclusion

Proof generalization is an important part of the process of searching for mathematical proofs, and more generally for advancing mathematics. Therefore, if we are to create useful proof-finding tools, it is highly desirable that they should have the ability to generalize existing proofs.

Further improvements to the design and implementation of proof-generalization algorithms have the potential to enhance the ability of automated theorem provers to reuse proofs, generate conjectures, and refine conjectures when they are discovered to be false, and in that way serve as better tools for mathematicians for solving and posing mathematical problems.

References

  • [1] Alexander Best. Automatically generalizing theorems using typeclasses. In Conference on Intelligent Computer Mathematics, 2021. URL: https://ceur-ws.org/Vol-3377/fmm12.pdf.
  • [2] Thierry Boy De La Tour and Nicolas Peltier. Proof generalization in LK by second order unifier minimization. Journal of Automated Reasoning, 57(3):245–280, 2016.
  • [3] Robert Boyer and J. Strother Moore. A Computational Logic, chapter 12: Generalization, pages 163–170. Academic Press, 1979.
  • [4] Amy Felty and Douglas Howe. Generalization and reuse of tactic proofs. In Logic Programming and Automated Reasoning: 5th International Conference, LPAR’94 Kiev, Ukraine, July 16–22, 1994 Proceedings 5, pages 1–15. Springer, 1994. doi:10.1007/3-540-58216-9_25.
  • [5] Robert W. Hasker and Uday S. Reddy. Generalization at higher types. In Proceedings of the Workshop on the λProlog Programming Language, pages 257–271, 1992.
  • [6] Andrew Ireland and Alan Bundy. Productive use of failure in inductive proof. Journal of Automated Reasoning, 16(1-2):79–111, 1996. doi:10.1007/BF00244460.
  • [7] Einar Broch Johnsen and Christoph Lüth. Theorem reuse by proof term transformation. In International Conference on Theorem Proving in Higher Order Logics, pages 152–167. Springer, 2004. doi:10.1007/978-3-540-30142-4_12.
  • [8] Thomas Kolbe and Christoph Walther. Proof analysis, generalization and reuse. In Automated Deduction—A Basis for Applications: Volume II: Systems and Implementation Techniques, pages 189–219. Springer, 1998.
  • [9] Nicolas Magaud and Yves Bertot. Changing data structures in type theory: A study of natural numbers. In International Workshop on Types for Proofs and Programs, pages 181–196. Springer, 2000. doi:10.1007/3-540-45842-5_12.
  • [10] Brigitte Pientka. Mechanizing meta-theory in beluga (invited talk). In Yuki Chiba, Santiago Escobar, Naoki Nishida, David Sabel, and Manfred Schmidt-Schauß, editors, 2nd International Workshop on Rewriting Techniques for Program Transformations and Evaluation, WPTE@RDP 2015, July 2, 2015, Warsaw, Poland, volume 46 of OASIcs, pages 1–1. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2015. doi:10.4230/OASICS.WPTE.2015.1.
  • [11] Gordon D. Plotkin. A note on inductive generalization. Machine Intelligence, 5(1):153–163, 1970.
  • [12] Olivier Pons. Generalization in type theory based proof assistants. In Paul Callaghan, Zhaohui Luo, James McKinna, and Robert Pollack, editors, Types for Proofs and Programs, International Workshop, TYPES 2000, Durham, UK, December 8-12, 2000, Selected Papers, volume 2277 of Lecture Notes in Computer Science, pages 217–232. Springer, 2000. doi:10.1007/3-540-45842-5_14.
  • [13] Talia Ringer, Karl Palmskog, Ilya Sergey, Milos Gligoric, Zachary Tatlock, et al. QED at large: A survey of engineering of formally verified software. Foundations and Trends® in Programming Languages, 5(2-3):102–281, 2019. doi:10.1561/2500000045.
  • [14] Talia Ringer, Nathaniel Yazdani, John Leo, and Dan Grossman. Ornaments for proof reuse in Coq. In 10th International Conference on Interactive Theorem Proving (ITP 2019), pages 26:1–26:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019.