
A Formal Analysis of Algorithms for Matroids and Greedoids

Mohammad Abdulaziz (King’s College London, UK), Thomas Ammer (King’s College London, UK), Shriya Meenakshisundaram (King’s College London, UK), Adem Rimpapa (Technische Universität München, Germany)
Abstract

We present a formal analysis, in Isabelle/HOL [30], of optimisation algorithms for matroids, which are useful generalisations of combinatorial structures that occur in optimisation, and greedoids, which are a generalisation of matroids. Although some formalisation work has been done earlier on matroids, our work here presents the first formalisation of results on greedoids, and many results we formalise in relation to matroids are also formalised for the first time in this work. We formalise the analysis of a number of optimisation algorithms for matroids and greedoids. We also derive from those algorithms executable implementations of Kruskal’s algorithm for computing optimal spanning trees, an algorithm for maximum cardinality matching for bi-partite graphs, and Prim’s algorithm for computing minimum weight spanning trees.

Keywords and phrases:
Matroids, Greedoids, Combinatorial Optimisation, Graph Algorithms, Isabelle/HOL, Formal Verification
Copyright and License:
© Mohammad Abdulaziz, Thomas Ammer, Shriya Meenakshisundaram, and Adem Rimpapa; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Discrete optimization; Theory of computation → Invariants; Theory of computation → Program verification
Related Version:
Full Version: https://arxiv.org/abs/2505.19816
Supplementary Material:
Software: https://doi.org/10.5281/zenodo.15758230
Editors:
Yannick Forster and Chantal Keller

1 Introduction

Matroids are algebraic structures that generalise the concepts of graphs and matrices, where a matroid is a pair of sets (E, ℱ) satisfying a number of conditions, e.g. E is a finite set and ℱ ⊆ 2^E. Although they have a generally rich mathematical structure which inherently justifies studying them, there are two main motivations for studying them in the context of combinatorial optimisation. First, optimisation problems defined on matroids generalise, albeit not always in a practically useful way, many standard combinatorial optimisation problems: e.g. the minimisation problem for matroids over modular objective functions generalises the travelling salesman problem, the shortest-path problem, minimum spanning trees, and Steiner trees, all of which are important standard optimisation problems. This means that verifying an algorithm for the minimisation problem, for instance, allows for deriving algorithms to solve many other problems (modulo some conditions that we will discuss later), and in many cases those derived algorithms are practical, albeit sometimes asymptotically slower than the fastest possible algorithms. The second reason to study matroids is that, since they are an abstraction of other concrete structures, like graphs, reasoning about (algorithms that process) them avoids a lot of the combinatorial reasoning and the case analyses encountered when reasoning about (algorithms processing) the more concrete structures.

In this paper we formalise, in Isabelle/HOL [30], a number of results about matroids, and greedoids, which are a generalisation of matroids. We formally analyse greedy algorithms for the maximisation problems for both structures and prove guarantees on the optimality of the solutions those greedy algorithms find. We also formally analyse an algorithm for the matroid intersection problem, where one aims to find the largest member of ℱ₁ ∩ ℱ₂, for two matroids (E, ℱ₁) and (E, ℱ₂). We then instantiate the verified algorithms with concrete instances to obtain executable algorithms for standard graph problems: we obtain an 𝒪(n² log² n) implementation of Kruskal’s algorithm from the greedy algorithm for matroids (we verify a variant that computes maximum spanning trees), an 𝒪(nm(log n + log m)) algorithm for maximum cardinality bi-partite matching from the matroid intersection algorithm, and an 𝒪(nm log n) implementation of Prim’s algorithm for minimum weight spanning trees from the greedy algorithm for greedoids, where n and m are the number of vertices and edges of the input graph, respectively (we did not verify the running times of these algorithms in Isabelle/HOL; all comments on running times are an informal analysis only). In addition to verifying these algorithms, we formalise a number of results from the theory of matroids and greedoids, e.g. the fact that matroids can be fully characterised by the greedy algorithm, and the relationship between greedoids and antimatroids. With the exception of deriving Kruskal’s algorithm from the greedy algorithm for maximisation over matroids [15] and Edmonds’ Intersection Theorem (Lemma 5) [28], none of the above results had been formalised before in any theorem prover. In our formalisation, we follow Korte and Vygen’s textbook [21] and Schrijver’s textbook [33], with the former being our main reference.

2 Background

Matroids.

A set system (E, ℱ) consists of a carrier set E and a collection of independent sets ℱ ⊆ 2^E. We only consider finite carriers. We call (E, ℱ) an independence system iff (M1) ∅ ∈ ℱ and (M2) X ⊆ Y and Y ∈ ℱ implies X ∈ ℱ. Members of ℱ are called independent. Sets F ⊆ E with F ∉ ℱ are called dependent. A matroid is an independence system satisfying the augmentation property (M3): for X, Y ∈ ℱ and |X| > |Y|, there is x ∈ X ∖ Y with Y ∪ {x} ∈ ℱ.
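To make the axioms concrete, the following is a minimal Python sketch (illustrative only, not part of our Isabelle/HOL formalisation) that checks (M1)-(M3) by brute-force enumeration; it is only feasible for tiny carriers, and the example family is the one used for the rank quotient below.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))]

def is_independence_system(E, F):
    """(M1): the empty set is independent; (M2): F is downward closed."""
    m1 = frozenset() in F
    m2 = all(X in F for Y in F for X in subsets(Y))
    return m1 and m2

def is_matroid(E, F):
    """(M3) augmentation: a larger independent set can always extend a smaller one."""
    m3 = all(any(Y | {x} in F for x in X - Y)
             for X in F for Y in F if len(X) > len(Y))
    return is_independence_system(E, F) and m3

E = {1, 2, 3}
F = {frozenset(), frozenset({1}), frozenset({2}),
     frozenset({3}), frozenset({2, 3})}
print(is_independence_system(E, F))  # True
print(is_matroid(E, F))              # False: no element of {2, 3} can augment {1}
```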

A basis of a set X ⊆ E is an independent subset of X that is maximal w.r.t. set inclusion. In contrast, a circuit C ⊆ E is a minimal dependent set; e.g. there are no circuits if ℱ = 2^E. For X ⊆ E, the lower rank ρ(X) and (upper) rank r(X) are the cardinalities of the smallest and largest bases of X, respectively. The rank quotient q(E, ℱ) is defined as the minimum of ρ(X)/r(X) over all X ⊆ E. For example, if ℱ = {∅, {1}, {2}, {3}, {2,3}} for E = {1,2,3}, then {1} is a smallest and {2,3} is a largest basis for X = E, and ρ(X)/r(X) = 1 for all other X ⊆ E, leading to q(E, ℱ) = 1/2. We have 0 < q(E, ℱ) ≤ 1 in general, and q(E, ℱ) = 1 iff (E, ℱ) is a matroid. Intuitively, the quotient expresses how close an independence system is to being a matroid.
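The rank quotient of the example above can be checked by brute force; a minimal Python sketch (illustrative only), assuming the family of independent sets is given explicitly as a set of frozensets:

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, k) for k in range(len(s) + 1))]

def bases_of(X, F):
    """Independent subsets of X that are maximal w.r.t. inclusion."""
    indep = [I for I in subsets(X) if I in F]
    return [B for B in indep if not any(B < I for I in indep)]

def rank_quotient(E, F):
    """Minimum of lower rank / upper rank over all subsets of E."""
    q = Fraction(1)
    for X in subsets(E):
        sizes = [len(B) for B in bases_of(X, F)]
        lo, hi = min(sizes), max(sizes)
        if lo != hi:              # the argument of the minimum is 1 otherwise
            q = min(q, Fraction(lo, hi))
    return q

E = {1, 2, 3}
F = {frozenset(), frozenset({1}), frozenset({2}),
     frozenset({3}), frozenset({2, 3})}
print(rank_quotient(E, F))  # 1/2, as in the example above
```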

Generalising planar graph duality (see details of the instantiation to directed graphs in the next section), we call (E, ℱ*) the dual of an independence system (E, ℱ), where a set F ∈ ℱ* iff there is a basis B of (E, ℱ) s.t. F ∩ B = ∅. As a basic property, we have that the dual of the dual (E, ℱ**) is the independence system (E, ℱ) itself. Furthermore, (E, ℱ) is a matroid iff (E, ℱ*) is a matroid. For a matroid (E, ℱ) and F ⊆ E, r*(F) = |F| + r(E ∖ F) − r(E), where r* is the rank of the dual.

Formalisation.

Keinholz already formalised independence systems, matroids, bases, circuits, lower ranks and basic properties of these in Isabelle/HOL [17] using locales. In Isabelle/HOL, locales [6] are named contexts allowing the fixing of constants and the assertion of assumptions on their properties, which will then be available within the locale for definitions and proofs. Keinholz gives locales indep_system and matroid shown in Listing 2, each of which fixes a carrier set carrier and an independence predicate indep specifying which subsets of carrier should be independent. We extend this formalisation with another equivalent definition of the upper rank. For the rank quotient, we explicitly define the argument of the minimum to be 1 when ρ(X)=r(X) to treat division by zero and empty carriers.

Listing 1 Definition of the rank quotient in Isabelle/HOL.
Listing 2 Two locales with assumptions characterising independence systems and matroids.

Verification Methodology.

As mentioned earlier, a goal of this work is to formalise some of the theory of matroids and greedoids and then obtain executable algorithms to process them. We follow Nipkow’s [29] approach of using locales to model abstract data types (ADTs) and to implement data structures such as finite sets. The functions and assumptions of an ADT can be encapsulated into a locale. In that approach, one specifies invariants, which give the conditions for the data structure to be well-formed, and abstraction functions, which convert an instance of the data structure into a value of the abstract mathematical type which it represents (see Listing 3, for instance). Implementing an algorithm defined using ADTs can be done by instantiating the ADTs with executable implementations. The approach thus allows correctness guarantees and a lot of the proofs to be done at an abstract mathematical level, while executable implementations are obtained.

To model iteration, we follow Abdulaziz’s [29] approach whereby algorithms involving for- or while-loops are specified in Isabelle/HOL as recursive functions manipulating values of algorithm states, which are modelled as records with fields corresponding to the variables which change as the algorithm runs. The verification of the correctness properties of an algorithm is done using loop invariants, which are shown to hold for the initial state and to be preserved across every distinct execution path of the algorithm. With a custom-defined induction principle and by providing a termination proof for the function, we can deduce the relevant correctness theorems for the function using the invariant properties.

Although our approach has limitations when it comes to the efficiency of executable code, as it cannot generate imperative code, it strikes a good balance: we obtain practical verified implementations for algorithms with a deep underlying mathematical theory.

3 Greedy Algorithm for Matroids

In order to implement executable algorithms on independence systems and matroids in Isabelle/HOL, we first provide a specification of these structures that allows for executable implementations. Since the locales indep_system and matroid work with ’mathematical’ sets of the type ’a set, we specify as a locale an ADT for sets which can be abstracted to a set of type ’a set, which would allow for executability later on after instantiation. For this, we specify an ADT using a locale extending an existing ADT (Set2) for binary set operations with the function cardinality.

Listing 3 An ADT for binary set operations: a set implementation is assumed to be of type ’s.

We then define the locale Indep_System_Specs, which includes the specification for Card_Set2. Within this locale, an independence system will be represented by a carrier set carrier of type ’s, and an independence function indep_fn of type ’s ⇒ bool. indep_fn is an independence oracle, that is, it takes a set X as input and returns a Boolean value indicating whether X is independent w.r.t. the matroid.

Indep_System_Specs defines a predicate indep_system_axioms which gives the independence system properties for carrier and indep_fn, in terms of the well-formedness invariant and the set operations from Card_Set2. We also define two abstraction functions carrier_abs and indep_abs which convert carrier and indep_fn to their abstract counterparts of type ’a set and ’a set ⇒ bool, respectively. We prove that if the invariants hold, then indep_system_axioms carrier indep_fn is equivalent to the abstract independence system properties for the abstractions of carrier and indep_fn, which allows us to reuse the theory on abstract independence systems for the implementation. We also assume an invariant that states, among other things, that the independence function must return the same value for any two implementation sets that have the same abstraction, which is non-vacuous as one abstract set can have multiple different implementations representing it. This is needed to show that the implementation and abstraction of the independence functions are equivalent. Finally, we define a locale Matroid_Specs which has the same specification as Indep_System_Specs and contains the additional, analogous predicates and lemmas for matroids.

Listing 4 A locale specifying an ADT for independence systems.

3.1 Specification of the Best-In-Greedy Algorithm

Algorithm 1 BestInGreedy(E,,c).

The Best-In-Greedy algorithm is used to solve the maximisation problem on independence systems. Here, we consider an independence system (E, ℱ) and a nonnegative cost function c : E → ℝ₊, and want to find a set X ∈ ℱ which maximises the total cost c(X) := ∑_{e∈X} c(e). For the greedy algorithm, we assume the existence of an independence oracle, which given a set F ⊆ E, decides whether F ∈ ℱ or not. The pseudocode of the Best-In-Greedy algorithm is shown in Algorithm 1.
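To illustrate the abstract algorithm, the following is a minimal Python sketch (not the Isabelle/HOL formalisation), assuming an independence oracle indep over frozensets and a cost function c on single elements:

```python
def best_in_greedy(E, indep, c):
    """Greedy maximisation over an independence system (E, F).

    E     : list of carrier elements (the initial ordering)
    indep : oracle, indep(X) == True iff the frozenset X is independent
    c     : nonnegative cost of a single element
    """
    F = frozenset()                            # current solution, starts empty
    for e in sorted(E, key=c, reverse=True):   # scan by non-increasing cost
        if indep(F | {e}):                     # keep e only if independence survives
            F = F | {e}
    return F

# The independence system from Section 2 with unit costs: the greedy result
# {1} has cost 1, the optimum {2, 3} has cost 2, matching q(E, F) = 1/2.
family = {frozenset(), frozenset({1}), frozenset({2}),
          frozenset({3}), frozenset({2, 3})}
print(best_in_greedy([1, 2, 3], lambda X: X in family, lambda e: 1))
```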

In order to implement and verify the Best-In-Greedy algorithm, we define a locale Best_In_Greedy. It assumes that its input matroid satisfies Matroid_Specs. The elements of the carrier are assumed to be of type ’a. Furthermore, the locale fixes a carrier set carrier of type ’s, an independence function indep_fn of type ’s ⇒ bool and a sorting function sort_desc of type (’a ⇒ rat) ⇒ ’a list ⇒ ’a list which sorts the input list in descending order using the input function as a key.

We define the state for the greedy algorithm as a record type, where carrier_list is a list of type ’a list consisting of the elements of the carrier set (corresponding to E) and result is the result set of type ’s which is constructed over the course of the algorithm (corresponding to F). With this state type, we can specify the algorithm as a function in Isabelle/HOL as shown in Listing 5.

The recursive function BestInGreedy implements the loop in the pseudocode. It goes recursively through the sorted carrier list, adding elements to the constructed solution as appropriate. The algorithm requires a cost function c and a list order containing the elements of the carrier set in an initial arbitrary ordering. These two parameters are not fixed in the locale since we need to quantify over them in the subsequent correctness theorems (both universally and existentially). We require an explicit initial ordering of the elements for the proof of one of the correctness theorems. The initial state of the Best-In-Greedy algorithm is defined by setting carrier_list to sort_desc c order and result to the empty set.

3.2 Verification of the Best-In-Greedy Algorithm

We now describe some of the important aspects of the verification of the Best-In-Greedy algorithm in Isabelle/HOL. Several of the definitions used in the verification of the Best-In-Greedy algorithm are parametrised by c and order. Whenever these terms appear in a theorem, we assume that c is nonnegative and that the elements of order correspond exactly to those in the carrier set. Additionally, for all the correctness theorems on the greedy algorithm, we assume the predicates BestInGreedy_axioms, sort_desc_axioms and matroid.indep_system_axioms carrier indep_fn.

Listing 5 Best-In-Greedy algorithm in Isabelle/HOL and its initial state. Note: for a record r, r⦇x := v⦈ is the same as r, except with the value of x set to v.

BestInGreedy_axioms contains the invariants from the locale Matroid_Specs, whereas sort_desc_axioms states that the function sort_desc sorts the input list in non-increasing order according to the input cost function, and that the sort is stable.

The first important correctness theorem for the Best-In-Greedy algorithm is Lemma 1. It states that for any nonnegative cost function c and any X which is a valid solution to the maximisation problem, the cost of the Best-In-Greedy solution is greater than or equal to the rank quotient of the independence system times the cost of solution X.

Lemma 1 (Best-In-Greedy cost bound [16, 19]).

Let (E, ℱ) be an independence system, with c : E → ℝ₊. Let G be the output of BestInGreedy. Then c(G) ≥ q(E, ℱ) · c(X) for all X ∈ ℱ.

Proof.

Let E := {e₁, e₂, …, e_n}, c : E → ℝ₊, and c(e₁) ≥ c(e₂) ≥ … ≥ c(e_n). Let G_n be the final solution found by BestInGreedy assuming that the elements are sorted in the given order, and let X_n ∈ ℱ be an arbitrary solution. Define E_j := {e₁, …, e_j}, G_j := G_n ∩ E_j and X_j := X_n ∩ E_j for j = 0, …, n. Set d_n := c(e_n) and d_j := c(e_j) − c(e_{j+1}) for j = 1, …, n−1.

Since X_j ∈ ℱ, we have |X_j| ≤ r(E_j). Since G_j is a basis of E_j, we have |G_j| ≥ ρ(E_j). Together with the definition of the rank quotient, we can conclude that

c(G_n) = ∑_{j=1}^{n} (|G_j| − |G_{j−1}|) · c(e_j) = ∑_{j=1}^{n} |G_j| · d_j
≥ ∑_{j=1}^{n} ρ(E_j) · d_j ≥ q(E, ℱ) · ∑_{j=1}^{n} r(E_j) · d_j ≥ q(E, ℱ) · ∑_{j=1}^{n} |X_j| · d_j
= q(E, ℱ) · ∑_{j=1}^{n} (|X_j| − |X_{j−1}|) · c(e_j) = q(E, ℱ) · c(X_n).

Within the appropriate context in the locale Best_In_Greedy (which assumes the three axiom predicates described above and that c and order are valid), the lemma can be formulated as in Listing 6 in Isabelle/HOL. Here, c_set c S denotes the sum of the costs of the elements in a set S, where the cost function is c.

Listing 6 The first correctness lemma for the Best-In-Greedy algorithm. to_set is the abstraction function for the set ADT.

In our formalisation, the proof of this lemma followed the informal proof, with the main argument of the proof consisting of the chain of inequalities on sums. Since we do not explicitly store the current iteration number j in the state, we provide the definition num_iter, which extracts the iteration number from a given state using the length of the carrier list. The definitions carrier_pref and pref are used to represent the prefix sets Ej, Gj and Xj.

A main statement that is often given without further explanation in informal proofs is that for all j ∈ {1, …, n}, G_j is a basis of E_j. Intuitively, G_j can be seen to be a maximal independent subset of E_j since it is independent by construction, and since no potential candidate elements are skipped during the algorithm. However, in the context of the formalisation, proving this statement requires some more work. In order to be able to use this fact in the proof, we formulate invariant invar_4, which expresses the desired property.

Listing 7 The main invariant for the Best-In-Greedy algorithm.

For the invariant preservation proofs for invar_4, defining some more auxiliary invariants and proving they are preserved across the algorithm is necessary. These are not part of the informal proof, since they are either fairly trivial (e.g. showing that result is always a subset of the current carrier prefix) or aspects specific to the formal proof (e.g. showing that result always satisfies the set ADT well-formedness invariant). Proving that invar_4 is preserved across the different execution paths of BestInGreedy then boils down to proving that if result is a basis of the current carrier prefix, the result set after one state update will still be a basis of the carrier prefix in the next step. We consider the two recursive cases of BestInGreedy, in which the current element is either added to result if it preserves independence, or left out otherwise. The two lemmas required for the invariant proofs in these two cases are formulated and proven in the context of the abstract indep_system locale. Through the theorems connecting the implementation of independence systems to the abstract theory of independence systems, we are able to use these theorems to complete the invariant preservation proofs of invar_4.

The second important correctness theorem on the Best-In-Greedy algorithm is Lemma 2, which states that there exist a nonnegative cost function c and a valid solution X for which the bound from Lemma 1 holds with equality, i.e. the bound is tight.

Lemma 2 (Best-In-Greedy cost bound tightness [16, 19]).

Let (E, ℱ) be an independence system. There exist a cost function c : E → ℝ₊ and X ∈ ℱ s.t. for the output G of BestInGreedy, c(G) = q(E, ℱ) · c(X).

Proof.

Choose F ⊆ E and bases B₁, B₂ of F such that |B₁|/|B₂| = q(E, ℱ). Define c(e) := 1 for e ∈ F, c(e) := 0 for e ∈ E ∖ F, and sort e₁, …, e_n such that c(e₁) ≥ c(e₂) ≥ … ≥ c(e_n) and B₁ = {e₁, …, e_{|B₁|}}. Then c(G) = |B₁| and c(X) = |B₂| for X := B₂, which finishes the proof.

In our formalisation, this lemma is stated with an additional existential quantifier for the order parameter. The proof in our formalisation proceeds similarly to the informal proof we follow, except that we explicitly construct the initial order list and use the stability of the sorting function to show that the sorting of the elements produces the ordering necessary for the proof. Additionally, proving that the greedy result is B₁ requires defining some new loop invariants (for example that no element of F ∖ B₁ is ever in the greedy result) and showing that they are preserved across the algorithm.

Listing 8 The second correctness lemma for the Best-in-Greedy algorithm.

The final correctness theorem on the Best-In-Greedy algorithm concerns the relationship between the performance of the algorithm and matroids:

Theorem 3 (Characterisation of matroids, Edmonds-Rado [32, 11]).

An independence system (E, ℱ) is a matroid if and only if BestInGreedy finds an optimal solution for the maximisation problem for (E, ℱ, c) for all cost functions c : E → ℝ₊.

Proof.

By Lemma 1 and Lemma 2 we have q(E, ℱ) < 1 if and only if there exists a cost function c : E → ℝ₊ for which BestInGreedy does not find an optimum solution. Using the fact that q(E, ℱ) < 1 if and only if (E, ℱ) is not a matroid completes the proof.

Instantiation for Spanning Forests and Oracles.

For an undirected graph G with a set of edges E, a spanning tree T is an acyclic subgraph connecting all vertices of G to one another. A spanning forest is acyclic and connects all vertices that are connected via E.

The set of edges E in undirected graphs forms a matroid under acyclicity, i.e. where those sets of edges that form acyclic subgraphs are independent. This is a so-called graphic matroid and it is a widely cited example of matroids [21, 33]. Acyclicity of sets X ⊆ E indeed satisfies the matroid axioms and a circuit would be a cycle in the ordinary sense. We formalise this in Isabelle/HOL, including the equivalence of being a basis w.r.t. acyclicity and being a spanning forest, making the greedy algorithm suitable to solve the maximum spanning forest problem correctly. Note that minimum spanning trees can be easily computed using this algorithm by adapting the input.

The instantiation of the Best-In-Greedy algorithm for graphic matroids is also known as Kruskal’s Algorithm [22]. An independence oracle could check for the absence of cycles by a modified Depth-First Search (DFS). Since the independence of the current solution X is maintained as an invariant, it is enough to check that a new element x does not lead to circuits in X ∪ {x}, i.e. that a new edge e does not add a cycle in the case of the graphic matroid. We call this a weak oracle. It is simpler to implement and to verify, and it might even allow for running time improvements, e.g. by using a union-find structure, if available, to store the connected components of X.
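To make the weak-oracle idea concrete, the following is a minimal Python sketch (illustrative only, not the verified DFS oracle of our formalisation), assuming the invariant that the current solution X is acyclic and that its connected components are tracked by a union-find structure:

```python
class UnionFind:
    """Connected components of the current acyclic edge set X."""
    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:                        # path halving
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, u, v):
        self.parent[self.find(u)] = self.find(v)

def weak_oracle_acyclic(uf, edge):
    """Weak oracle for the graphic matroid: assuming X is acyclic (tracked by
    uf), adding edge = (u, v) keeps X acyclic iff u, v are in different components."""
    u, v = edge
    return uf.find(u) != uf.find(v)

# Kruskal-style use: accept an edge only if the weak oracle says yes.
uf = UnionFind()
for u, v in [(1, 2), (2, 3), (1, 3)]:
    if weak_oracle_acyclic(uf, (u, v)):
        uf.union(u, v)          # the edge joins two components, keep it
        print("keep", (u, v))
    else:
        print("skip", (u, v))   # (1, 3) would close a cycle
```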

We therefore have an extended locale that also specifies the behaviour of a weak oracle, which is that if (a) X is independent, (b) suitable data structure and auxiliary invariants are satisfied and (c) x ∉ X, then the oracle determines whether X ∪ {x} is independent. Subsequently, we verify the greedy algorithm with a weak oracle by proving it equivalent to the first version when called on the empty set. As a result, correctness can be lifted. In the end, we obtain an executable algorithm for maximum spanning forests using a simple DFS as the weak oracle. A simplified version of the correctness theorem for Kruskal’s Algorithm is stated in Listing 9. An informal analysis yields that this implementation has a running time of 𝒪(n² log² n), where 𝒪(|E| · ι) is the running time of the abstract algorithm and ι is the running time of the oracle.

Listing 9 Kruskal’s algorithm’s correctness (simplified).

3.3 Greedy Algorithm for Greedoids

Greedoids are a generalisation of matroids. Their definition (conditions (M1) and (M3)) is obtained by dropping the condition (M2) from the definition of matroids. Similar to matroids, we use a locale to define greedoids, fixing the carrier set and its family of independent subsets, and specifying the axioms of greedoids. An intuition for greedoids is given by the edge sets of undirected trees containing a fixed vertex r of an undirected graph, together with the graph’s total set of edges as the carrier. Due to space limits, we briefly survey results on greedoids in our library without expanding on proofs. Interested readers may refer to the formalisation for details.

We introduce the notions of accessible set systems, which are set systems in which every non-empty independent set contains an element whose removal leaves the set independent, and antimatroids, which are accessible set systems that are closed under union. We formalise fundamental results [20, 21] that are necessary to verify the greedy algorithm (Theorem 4), which is the main result in our library on greedoids. Firstly, given an accessible set system (E, ℱ), for X ∈ ℱ with |X| = k, there exists an ordering x₁, x₂, …, x_k of the elements of X such that {x₁, …, x_i} ∈ ℱ for all i ≤ k. Secondly, a greedoid (E, ℱ) is accessible, as stated in Section 14.1 of [21]. Thirdly, we prove that every antimatroid is a greedoid [21].

We also consider the optimisation of modular weight functions c : 2^E → ℝ on greedoids. Modular weight functions satisfy c(A ∪ B) = c(A) + c(B) − c(A ∩ B) for all A, B ⊆ E. The algorithm GreedoidGreedy (Listing 10) keeps track of a current solution X, initially ∅. In every iteration, it searches E in the order e₁, …, e_n for a candidate x s.t. x ∉ X and X ∪ {x} ∈ ℱ. It takes the first candidate in e₁, …, e_n such that c({x}) ≥ c({y}) for all other candidates y. If there is no candidate x, the procedure terminates. Otherwise the first best candidate is inserted into X, followed by the next iteration. Similar to Theorem 3, one can characterise certain greedoids with that algorithm.
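A minimal Python sketch of the procedure just described (illustrative only; the Isabelle/HOL version is shown in Listing 10), assuming an oracle indep over frozensets and the values c(x) of a modular weight function on singletons:

```python
def greedoid_greedy(order, indep, c):
    """Greedy for greedoids.

    order : the list e_1, ..., e_n fixing the search order over E
    indep : oracle, indep(X) == True iff the frozenset X is in the greedoid
    c     : c(x) stands for c({x}) of the modular weight function
    """
    X = frozenset()
    while True:
        # candidates that feasibly extend the current solution
        cand = [e for e in order if e not in X and indep(X | {e})]
        if not cand:
            return X
        best = max(c(e) for e in cand)
        # take the FIRST candidate (in the given order) attaining the best cost
        x = next(e for e in cand if c(e) == best)
        X = X | {x}
```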

Theorem 4 (Characterisation of Strong-Exchange Greedoids [20]).

We say that a greedoid (E, ℱ) has the strong exchange property (SEP) iff for all A, B ∈ ℱ with B a basis w.r.t. ℱ, A ⊆ B and x ∈ E ∖ B with A ∪ {x} ∈ ℱ, there is y ∈ B ∖ A with A ∪ {y} ∈ ℱ and (B ∖ {y}) ∪ {x} ∈ ℱ. GreedoidGreedy computes a maximum-weight basis in ℱ for any order of iteration e₁, …, e_n and any modular cost function c : 2^E → ℝ iff (E, ℱ) has the SEP.

For the second direction, the usual proof is by constructing a counterexample to show the contrapositive. This heavily depends on the order e₁, …, e_n used for the candidate search, making use of the fact that greedoids are accessible and of the ordering property of accessible set systems. In Isabelle/HOL (Listing 10), both the costs and the order (formalised as a list) are fixed by a context in which the algorithm is modelled, making e₁, …, e_n an input to the algorithm as well, similar to the order parameter for BestInGreedy.

We formalise set systems anew because the existing formalisation of independence systems [17] uses a finite carrier set along with the independence predicate and assumes (M2). Thus, a formalisation of greedoids, which has to drop (M2), cannot be obtained from that. Note that, although we prove that a matroid according to Keinholz’s formalisation is a greedoid, we formalise a distinct greedy algorithm for matroids, discussed in the last section, rather than using GreedoidGreedy. This is because, even for a matroid with the strong exchange property stated in Theorem 4, i.e. one for which GreedoidGreedy works, the worst-case running time of GreedoidGreedy is much worse than that of BestInGreedy: GreedoidGreedy has a worst-case running time of 𝒪(|E|² · ι), where ι is the running time of the independence oracle, while BestInGreedy has 𝒪(|E| · ι).

Executability.

We give an executable function that is equivalent to the one in Listing 10. However, the candidate search is not performed on all e ∈ E since an element already in X can never be a next best candidate. For the current solution X, it only checks those e where e ∈ E ∖ X.

Instantiation.

An arborescence T around a vertex r in a graph with edges E is an acyclic, connected subgraph of E that contains r. If an edge in E is incident on r, the set of arborescences around r forms a greedoid. Bases are spanning trees of the connected component of r in E. The arborescence greedoid has the strong exchange property, making the instantiation of GreedoidGreedy, namely the Jarnik-Prim algorithm [31], optimal. We obtain an 𝒪(n · |E| · log n) implementation to compute maximum weight bases, where n is the number of vertices in the component around r. Prim’s and Kruskal’s algorithms demonstrate that bounds tighter than those that can be deduced from the running times of the abstract algorithms, which are 𝒪(|E|² · ι) and 𝒪(|E| · ι), respectively, are possible for concrete problems.
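To make the instantiation concrete, the following is a minimal Python sketch (illustrative only) of the arborescence independence predicate with which GreedoidGreedy would be instantiated; edges are assumed to be undirected and represented as two-element frozensets:

```python
def is_arborescence(X, r):
    """X is independent iff it is empty or its edges form a tree containing r."""
    if not X:
        return True
    verts = set().union(*X)
    if r not in verts:
        return False
    if len(verts) != len(X) + 1:      # a connected graph is a tree iff |V| = |E| + 1
        return False
    seen, stack = {r}, [r]            # check connectivity by a search from r
    while stack:
        u = stack.pop()
        for e in X:
            if u in e:
                (v,) = e - {u}
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == verts

print(is_arborescence({frozenset({"r", "a"}), frozenset({"a", "b"})}, "r"))  # True
print(is_arborescence({frozenset({"a", "b"})}, "r"))                         # False: r not covered
```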

Listing 10 Formalisation of greedoids, GreedoidGreedy, and the characterisation theorem.

4 Matroid Intersection

The maximum cardinality matroid intersection problem asks for an X of maximum cardinality in ℱ₁ ∩ ℱ₂ for two matroids (E, ℱ₁) and (E, ℱ₂). A formal definition reusing the matroid locale [17] is given in Listing 11. BestInGreedy from the last section adds an element if the solution remains independent. For intersection, adding an element might preserve independence w.r.t. ℱ₁ but not ℱ₂. The insight is thus to add an element, then potentially remove an element to maintain independence w.r.t. ℱ₂, then insert another element. This three-step process is repeated until an element is added that preserves independence w.r.t. both ℱ₁ and ℱ₂. The repetition of this process to improve the solution is called augmentation.

The following is an optimality criterion for matroid intersection that involves ranks.

Lemma 5 (Rank Criterion, Edmonds’ Intersection Theorem [12]).

For two matroids (E, ℱ₁) and (E, ℱ₂) over the same ground set E with rank functions r₁ and r₂, respectively, X ∈ ℱ₁ ∩ ℱ₂, and Q ⊆ E it holds that |X| ≤ r₁(Q) + r₂(E ∖ Q). Therefore, |X| ≤ r₁(X) + r₂(E ∖ X), for any X ∈ ℱ₁ ∩ ℱ₂.

Proof.

Both X ∩ Q and X ∖ Q are independent in both matroids. This implies |X ∩ Q| ≤ r₁(Q) and |X ∖ Q| ≤ r₂(E ∖ Q). Both inequalities follow from the fact that |Y| ≤ r(Z) if Y ⊆ Z and Y is independent, which essentially follows from the definition of the rank r in a matroid. Of course, |X ∩ Q| + |X ∖ Q| = |X|.

Listing 11 Augmentation and Matroid Intersection.

The rank criterion cannot be exploited immediately for an algorithm since it does not have an immediate computational interpretation. It can be turned into a criterion that is computationally useful using augmentation, which we do in the next section.

4.1 Augmentation

Fix a single matroid (E, ℱ). For X ∈ ℱ, x ∉ X and X ∪ {x} ∉ ℱ there is a unique circuit in X ∪ {x}. If there were two of these C₁, C₂, there would be a third one C₃ ⊆ (C₁ ∪ C₂) ∖ {x} ⊆ X, since both C₁ and C₂ need to contain x. Let 𝒞(X, x) be this unique circuit if X ∪ {x} ∉ ℱ, or ∅ otherwise. We observe that if x ∈ 𝒞(X, y) for y ∉ X and x ∈ X, then x can be replaced by y, i.e. (X ∖ {x}) ∪ {y} is independent. Under certain conditions, this replacement can be repeated as shown by the following lemma:
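This characterisation yields a brute-force circuit computation; a minimal Python sketch (illustrative only), assuming an independence oracle indep and an independent set X:

```python
def fundamental_circuit(indep, X, y):
    """C(X, y): the unique circuit of X ∪ {y}, or the empty set if X ∪ {y} is
    independent. For independent X, an element e of X ∪ {y} lies on the
    circuit iff (X ∪ {y}) − {e} is independent."""
    Xy = frozenset(X) | {y}
    if indep(Xy):
        return frozenset()
    return frozenset(e for e in Xy if indep(Xy - {e}))

# Example: the uniform matroid U_{2,3}, independent iff at most 2 elements.
indep = lambda S: len(S) <= 2
print(fundamental_circuit(indep, {1, 2}, 3))   # frozenset({1, 2, 3})
```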

Lemma 6 (Replacement Lemma [13]).

For a matroid (E, ℱ) and X ∈ ℱ, assume (1) x₁, …, x_s ∈ X, (2) y₁, …, y_s ∈ E ∖ X, (3) x_k ∈ 𝒞(X, y_k) for all 1 ≤ k ≤ s, and (4) x_j ∉ 𝒞(X, y_k) for all 1 ≤ j < k ≤ s. If (1)-(4) hold, then (X ∖ {x₁, …, x_s}) ∪ {y₁, …, y_s} ∈ ℱ.

Proof.

We show independence of X_l := (X ∖ {x₁, …, x_l}) ∪ {y₁, …, y_l} for l ≤ s by induction. The claim trivially holds for l = 0. We assume the claim for X_l where 0 ≤ l < s: It might be that X_l ∪ {y_{l+1}} is independent, implying independence of X_{l+1} because of (M2). If X_l ∪ {y_{l+1}} is dependent, however, it contains a unique circuit 𝒞(X_l, y_{l+1}) containing y_{l+1} due to X_l’s independence. None of the deleted x₁, …, x_l can have been part of 𝒞(X, y_{l+1}) because of (4), and neither can any of the inserted y₁, …, y_l because of (2); y_{l+1} could, however. This implies 𝒞(X, y_{l+1}) ⊆ X_l ∪ {y_{l+1}}. Therefore 𝒞(X_l, y_{l+1}) = 𝒞(X, y_{l+1}). Due to (3), x_{l+1} ∈ 𝒞(X, y_{l+1}) = 𝒞(X_l, y_{l+1}), implying independence of (X_l ∖ {x_{l+1}}) ∪ {y_{l+1}} = X_{l+1}.

The formalisation of the above lemma contains an inductive proof on s within the context of the locale matroid. The x_i’s and y_i’s are formalised as a list of pairs.

From now on, we assume two matroids (E, ℱ₁) and (E, ℱ₂) over the same ground set E, or formally, we work in the double_matroid locale (Listing 11). Following Korte and Vygen [21], for a set X ∈ ℱ₁ ∩ ℱ₂, we define an auxiliary graph G_X with vertices E and edges A_{X,1} ∪ A_{X,2}, where A_{X,1} = {(x, y). y ∈ E ∖ X ∧ x ∈ 𝒞₁(X, y) ∖ {y}} and A_{X,2} = {(y, x). y ∈ E ∖ X ∧ x ∈ 𝒞₂(X, y) ∖ {y}}. G_X is obviously bipartite between X and E ∖ X. We define two sets S_X = {y. y ∈ E ∖ X ∧ X ∪ {y} ∈ ℱ₁} and T_X = {y. y ∈ E ∖ X ∧ X ∪ {y} ∈ ℱ₂}. A path between a vertex from S_X and another from T_X indicates an alternating sequence of insertions and deletions. Due to bipartiteness, the length will be odd, allowing for an augmentation. We make this precise as follows.
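Given two independence oracles, the auxiliary graph and the sets S_X and T_X can be computed directly; a minimal Python sketch (illustrative only, not our formalisation), reusing the circuit characterisation from above:

```python
def exchange_graph(E, X, indep1, indep2):
    """Edge sets A_{X,1}, A_{X,2} and the sets S_X, T_X for a set X that is
    independent w.r.t. both oracles (brute-force sketch)."""
    X = frozenset(X)
    A1, A2, S, T = set(), set(), set(), set()
    for y in set(E) - X:
        if indep1(X | {y}):
            S.add(y)
        else:
            # x in X lies on C_1(X, y) iff (X ∪ {y}) − {x} is independent
            A1 |= {(x, y) for x in X if indep1((X | {y}) - {x})}
        if indep2(X | {y}):
            T.add(y)
        else:
            A2 |= {(y, x) for x in X if indep2((X | {y}) - {x})}
    return A1, A2, S, T
```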

Lemma 7 (Augmentation Lemma [21]).

Let X ∈ ℱ₁ ∩ ℱ₂ be independent in both matroids and p = y₀ x₁ y₁ x₂ y₂ … x_s y_s a shortest path from y₀ ∈ S_X to y_s ∈ T_X. Then X′ = (X ∪ {y₀, …, y_s}) ∖ {x₁, …, x_s} is independent in both matroids, i.e. X′ ∈ ℱ₁ ∩ ℱ₂.

Proof.

We apply Lemma 6 to X ∪ {y₀}, x₁, …, x_s and y₁, …, y_s to show X′ ∈ ℱ₁: For all 0 < i ≤ s, x_i is the second vertex of the (2i−1)th edge of p. Because the path alternates between E ∖ X and X, the (2i−1)th edge is in A_{X,2} and x_i ∈ X ∪ {y₀} for all x_i, yielding (1). For all 0 < i ≤ s, y_i is the second vertex of the (2i)th edge in p. Because of alternation and pairwise distinctness of the y’s (p is a shortest path), y_i ∈ E ∖ (X ∪ {y₀}), implying (2). Also, for all 0 < i ≤ s, x_i is the first vertex of the (2i)th edge, which is part of A_{X,1} because of alternation. Therefore, x_k ∈ 𝒞₁(X ∪ {y₀}, y_k) for all 1 ≤ k ≤ s (3). For (4), assume an x_i and x_j where 0 < j < i ≤ s and x_j ∈ 𝒞₁(X ∪ {y₀}, y_i). x_i and x_j are the first vertices of the 2i-th and 2j-th edge of p, respectively; y_i and y_j are the respective second vertices. Both of those edges are part of A_{X,1}. As y_i ∈ E ∖ X, and therefore x_j ≠ y_i, the edge (x_j, y_i) ∈ A_{X,1}. We could then delete y_j, …, x_i, giving a shorter S_X-T_X path. Hence (4) is satisfied and Lemma 6 can be applied.

Analogously, we take X ∪ {y_s}, x_s, …, x₁ and y_{s−1}, …, y₀ to show X′ ∈ ℱ₂. The statement is in Listing 11. We show that the edges of the paths used by Lemma 7 alternate between A_{X,1} and A_{X,2} using an existing library on alternating lists for matchings [1]. From this we can easily deduce (1)-(3) to apply Lemma 6. The re-applicability of matching theory is no coincidence since bipartite matching is an instance of matroid intersection, as we show later.

On top of improvement by augmentation, the absence of an augmenting path characterises maximality of |X| for X ∈ ℱ₁ ∩ ℱ₂:

Theorem 8 (Optimality Criterion [21]).

X is a set of maximum cardinality in ℱ₁ ∩ ℱ₂ iff G_X does not contain a path from some s ∈ S_X to some t ∈ T_X (henceforth, an S_X-T_X path). Edmonds’ max-min equality [12] is a corollary: max{|X|. X ∈ ℱ₁ ∩ ℱ₂} = min{r₁(Q) + r₂(E ∖ Q). Q ⊆ E}.

Proof.

If there is an S_X-T_X path, there is a shortest one as well, which could be used for an augmentation according to Lemma 7, leading to an increase in the cardinality.

Now assume that there is no S_X-T_X path. We define R as the set of vertices in G_X that are reachable from S_X. Obviously S_X ⊆ R and R ∩ T_X = ∅. Let r₁ and r₂ be the rank functions w.r.t. ℱ₁ and ℱ₂, respectively.

We prove r₂(R) = |X ∩ R| by contradiction: Since X ∩ R has to be independent w.r.t. the second matroid and therefore |X ∩ R| = r₂(X ∩ R), we would have r₂(X ∩ R) < r₂(R) since r₂ is monotone. Because of the strict inequality, there is y ∈ R ∖ X where (X ∩ R) ∪ {y} ∈ ℱ₂. Otherwise, X ∩ R would be a basis of R, implying equal ranks. As R ∩ T_X = ∅, X ∪ {y} ∉ ℱ₂. There is x ∈ X ∖ R with x ∈ 𝒞₂(X, y) (otherwise (X ∩ R) ∪ {y} ∉ ℱ₂). By definition, (y, x) ∈ A_{X,2}. That makes x part of R, which is a contradiction.

We prove r₁(E ∖ R) = |X ∖ R| by contradiction: Since X ∖ R has to be independent w.r.t. the first matroid and therefore |X ∖ R| = r₁(X ∖ R), we would have r₁(X ∖ R) < r₁(E ∖ R) since r₁ is monotone. Because of the strict inequality, there is y ∈ (E ∖ R) ∖ X where (X ∖ R) ∪ {y} ∈ ℱ₁. Otherwise, X ∖ R would be a basis of E ∖ R, implying equal ranks. As S_X ⊆ R, X ∪ {y} ∉ ℱ₁. There is x ∈ X ∩ R with x ∈ 𝒞₁(X, y) (otherwise (X ∖ R) ∪ {y} ∉ ℱ₁). By definition, (x, y) ∈ A_{X,1}. That makes y part of R, which is a contradiction.

Therefore, |X| = r₁(E ∖ R) + r₂(R). Because of the rank criterion from Lemma 5, X satisfies |X| ≤ r₁(Q) + r₂(E ∖ Q) with equality for Q = E ∖ R. This gives the max-min equality and makes X a set of maximum cardinality that is independent w.r.t. both matroids.

The formal proof of Theorem 8 is split into the two directions. The second direction also shows r₂(R) = |X ∩ R|, r₁(E ∖ R) = |X ∖ R| and |X| = r₁(E ∖ R) + r₂(R) as separate statements in order to prove the max-min equality. The final formalisation can be seen in Listing 11.

4.2 Intersection Algorithm

We exploit the previous subsection’s results for an algorithm [5, 24] that iteratively applies augmenting paths to solve maximum cardinality matroid intersection. The main invariant of the algorithm MaxMatroidIntersection is X ∈ ℱ₁ ∩ ℱ₂ for the current solution X (preservation proof by Lemma 7). Algorithm 2 shows the pseudocode. In our main reference [21], the circuits 𝒞₁/₂(X, y), i.e. minimal dependent sets, are computed as explicit sets by iterating over all x ∈ X ∪ {y} and taking those where (X ∪ {y}) ∖ {x} is independent. G_X is then computed by using the definition from above, which requires further iterations over the circuits. Our pseudocode does not need the circuits as an intermediate step and adds an edge (x, y) to A_{X,1} (resp. (y, x) to A_{X,2}) if X ∪ {y} is dependent and (X ∖ {x}) ∪ {y} is independent w.r.t. ℱ₁ (resp. ℱ₂).

Algorithm 2 MaxMatroidIntersection(E, 1, 2).
Listing 12 Formalisation of the Intersection Algorithm.
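Putting the pieces together, the following Python sketch mirrors the structure of Algorithm 2 (illustrative only; it reuses the exchange_graph sketch from above and a plain BFS for shortest-path selection, whereas the formalisation relies on locale-provided oracles and path selection, as described below):

```python
from collections import deque

def shortest_path(adj, sources, targets):
    """BFS over the directed edges in adj; returns a shortest path from some
    source to some target as a list of vertices, or None if there is none."""
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if u in targets:
            path = [u]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return list(reversed(path))
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def max_matroid_intersection(E, indep1, indep2):
    """Iteratively augment along shortest S_X-T_X paths (sketch; exchange_graph
    is the sketch from Section 4.1)."""
    X = frozenset()
    while True:
        A1, A2, S, T = exchange_graph(E, X, indep1, indep2)
        adj = {}
        for u, v in A1 | A2:
            adj.setdefault(u, set()).add(v)
        p = shortest_path(adj, S, T)
        if p is None:
            return X                                 # optimal by Theorem 8
        # p = y0 x1 y1 ... xs ys: insert the y's, delete the x's (Lemma 7)
        X = (X | set(p[0::2])) - set(p[1::2])
```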

MaxMatroidIntersection is again formalised within a locale that assumes the necessary subprocedures, most notably a function find_path for path selection. This will take the graph G_X as an adjacency map. We also have two weak independence oracles orcl1 and orcl2, one for each matroid. We assume a Haskell-style outer_fold and inner_fold to simulate for-loops over sets in the functional language of Isabelle/HOL. We furthermore assume a function to compute set complements as required in Line 4 of Algorithm 2.

Listing 12 shows the formalisation of the for-loops. treat1 and treat2 simulate Line 6 and Line 8, respectively. The outer for-loop between Lines 4 to 8 is realised by compute_graph. We also have a simple function augment performing the augmentation, i.e. alternating deletion and insertion to a set. The function matroid_intersection (while loop) calls the graph computation, gives A_{X,1} ∪ A_{X,2}, S_X and T_X to the path selection function and terminates or calls augmentation according to the result of path selection. matroid_intersection maps the solution X before entering the loop (intended for Line 1 of Algorithm 2) to the X after finishing the loop. We use a record with a single variable for the current solution.

Again, we use invariants for loop verification. Independence of X is the major invariant and data structure well-formedness invariants are minor ones. As for BestInGreedy, we prove single-step preservation for execution paths, which is then lifted to preservation for the whole function by a computational induction. The first lemma in Listing 13 says that by executing one step (a) X remains in ℱ₁ ∩ ℱ₂ (by Lemma 7), (b) X continues to satisfy its data structure invariant, (c) X ⊆ E, (d) X’s cardinality increases by 1 (by Lemma 7) and (e) neither S_X nor T_X is empty. The Listing also contains the lemma certifying optimality if the terminating branch of matroid_intersection is taken (by Theorem 8). The second lemma together with (a), (b) and (c) can be combined by the induction principle into correctness for general inputs X. Using (d) and (e), we obtain termination. Total correctness is proven when matroid_intersection is called on initial_state.

Listing 13 Invariants, Termination and Total Correctness of the Intersection Algorithm. Note that f_dom input for a function f means that f terminates on input.

Running Time and Circuit Oracles.

The number of iterations to build G_X is 𝒪(|E|²): we might have to check |X| · (|E| − |X|) pairs of vertices. Each matroid comes with an oracle whose running time is assumed to be ι₁ or ι₂, respectively. This results in an overall running time of 𝒪(|E|³ · (ι₁ + ι₂)). If, depending on the problem, the size of the circuits σ₁ and σ₂ is assumed to be small, e.g. 𝒪(log |E|) or even 𝒪(1), most of the inner for-loops’ iterations are wasted. If the circuits can be computed in time κ₁, κ₂ ∈ o(|E|), the running time improves to 𝒪(|E|² · (ι₁ + ι₂ + κ₁ + κ₂ + σ₁ + σ₂)).

In Isabelle/HOL, weak circuit oracles are assumed to return data structures storing 𝒞₁/₂(X, y) ∖ {y}, provided that X is independent and X ∪ {y} is dependent. These are added to the locale for MaxMatroidIntersection as well. A variant of the algorithm using circuit oracles is verified using the same methodology as before.

Selecting an S_X-T_X Path.

There is a verified implementation of Breadth-First Search (BFS) that works on adjacency maps by Abdulaziz [3]. It takes a set of sources and builds another adjacency map modelling a DAG of those edges that are actually explored by the search. If a vertex v is reachable from a source s, there is a path p in this DAG leading from a source s to v such that p is a shortest path in the actual graph leading from s to v. find_path runs BFS for the sources S_X and uses a DFS to obtain a path in the DAG and, hence, a shortest S_X-T_X path. For a concrete intersection problem and together with appropriate oracles, find_path can be used to instantiate the locale of the intersection algorithm.

Bipartite Matching.

An example of matroid intersection is maximum cardinality bipartite matching. Consider a bipartite graph with edges E between two disjoint sets of vertices L and R. For e ∈ E, we call the endpoint in L the left and the one in R the right endpoint. We say that M ⊆ E is independent w.r.t. L iff no edges in M share a left endpoint. Independence w.r.t. R is defined analogously. Valid matchings are sets of vertex-disjoint edges in E and are thus exactly those M that are independent w.r.t. both L and R. The two independence predicates satisfy the matroid axioms, making this a matroid intersection problem. Circuits w.r.t. L or R would be pairs of edges e, d ∈ E sharing a left or right endpoint, respectively.

For M ⊆ E, we maintain a set data structure for M and two maps ML and MR, where ML associates every x ∈ L with the e ∈ M for which x is the left endpoint, and analogously for MR. Weak independence and circuit oracles are very simple: If M is independent w.r.t. L (R), and e ∉ M, we check in ML (MR) that e’s left (right) endpoint is not associated with any other edge. If M ∪ {e} would be dependent w.r.t. L or R, we return the edge associated with e’s left or right endpoint as 𝒞_L(M, e) ∖ {e} or 𝒞_R(M, e) ∖ {e}, respectively. We used these oracles to instantiate MaxMatroidIntersection to compute maximum matchings in bipartite graphs. The running time is 𝒪(min{|L|, |R|} · m · (log |R| + log |L| + log m)), where the logarithms are due to using tree data structures, which is dominated by 𝒪(nm(log n + log m)). Without circuit oracles, we would have another multiplicative factor of min{|L|, |R|}.
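To illustrate how simple these oracles are, the following is a minimal Python sketch (illustrative only; the formalisation uses verified map ADTs rather than Python dictionaries), where left_of and right_of are assumed to return an edge's endpoint in L and R, respectively:

```python
def make_bipartite_oracles(left_of, right_of):
    """Weak independence and circuit oracles for the bipartite instance.
    ML and MR mirror the current matching M and are kept in sync via add/remove."""
    ML, MR = {}, {}     # left endpoint -> matching edge, right endpoint -> matching edge

    def indep_L(e):     # is M plus e independent w.r.t. L? (M independent, e not in M)
        return left_of(e) not in ML

    def indep_R(e):
        return right_of(e) not in MR

    def circuit_L(e):   # C_L(M, e) minus {e}, assuming M plus e is dependent w.r.t. L
        return {ML[left_of(e)]}

    def circuit_R(e):
        return {MR[right_of(e)]}

    def add(e):         # M := M with e added
        ML[left_of(e)] = e
        MR[right_of(e)] = e

    def remove(e):      # M := M with e removed
        del ML[left_of(e)]
        del MR[right_of(e)]

    return indep_L, indep_R, circuit_L, circuit_R, add, remove

# Example with edges represented as (l, r) pairs:
indep_L, indep_R, circ_L, circ_R, add, remove = make_bipartite_oracles(
    lambda e: e[0], lambda e: e[1])
add(("l1", "r1"))
print(indep_L(("l1", "r2")))   # False: l1 is already matched
print(circ_L(("l1", "r2")))    # {('l1', 'r1')}
```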

5 Discussion

We presented a formal analysis of algorithms to solve two optimisation problems for matroids: (a) computing the largest (in terms of accumulated weight) independent set in a given matroid and (b) computing the largest (in terms of cardinality) set that is independent w.r.t. two matroids. The two algorithms are of great importance to practitioners and are implemented in a number of computer algebra systems, e.g. in Sage [9] and Macaulay [14]. Additionally, we briefly (due to space constraints) presented a formalisation of the analysis of an algorithm to solve an optimisation problem for greedoids, which are a generalisation of matroids. In addition to formalising significant parts of the theory of matroids, greedoids, and the analysis of the algorithms, we showed that our approach can also be used to obtain practical executable verified implementations of important graph algorithms. This demonstrates the potential practical role of matroids and greedoids in designing well-factored formal mathematical libraries, whereby the proofs of multiple algorithms are done in one go that captures their mathematical essence and is later instantiated for different concrete examples. The formalisation overall totalled 17.4K lines of proof script, with 11K lines dedicated to the theory of matroids and greedoids, 2.9K lines connecting graph theoretic problems to matroids and/or greedoids, and 3.5K lines on defining and verifying Kruskal’s, Prim’s, and the bi-partite matching algorithm. Our formalisation is available at https://doi.org/10.5281/zenodo.15758230; it builds on and is part of an ongoing effort to formalise combinatorial optimisation in Isabelle/HOL [2].

Related Work.

Most relevant to us is the formalisation of matroids in Isabelle/HOL by Keinholz [17]. Haslbeck et al. [15] formalised a classic proof of the correctness of the Best-In-Greedy Algorithm and obtained a verified, imperative implementation of Kruskal’s algorithm. They used a weak oracle based on a verified imperative union-find implementation by Löwenberg [25] to store the tree’s connected components leading to their implementation having the best possible running time of 𝒪(mα(n)), where α is the inverse Ackermann function. Löwenberg also gives a functional version that we could use for our work, which would, however, not pay off in terms of running time.

In Lean, there is an ongoing effort by Nelson et al. to formalise matroid theory [28], including more advanced results about e.g. minors, isomorphism, Tutte’s excluded minor theorem for finitary binary matroids, or Edmonds’ Intersection Theorem (Lemma 5), but (as of the time of writing) there seem to be no verified practical algorithms in that library.

Matroids were also formalised in Coq [26, 8] and Mizar [7]; however, neither formalisation goes very far. Set systems were formalised in Isabelle/HOL by Edmonds and Paulson [10], who did that in the context of verifying combinatorial designs.

Additionally, there are formalisations of matching algorithms [1, 4] and Prim’s algorithm [23], in Isabelle/HOL, and of Dijkstra’s, Kruskal’s, and Prim’s algorithms [27], in Coq, that are not based on matroids or greedoids, where all reasoning is done at the graph-theoretic level. The Isabelle/HOL formalisation of Prim’s algorithm [23] uses a priority queue to find edges to extend the spanning tree. We cannot use this more efficient (and standard) way due to conditions required to prove Theorem 4. Readers interested in comparing the style of formal reasoning when using matroids or greedoids, which is algebraic, to the one performed directly on graphs, which is more intuitive but combinatorial, should refer to those formalisations.

Future Work.

There is one “main” algorithm missing from our library, which would be the most imminent piece of future work: the weighted version of the matroid intersection problem, whereby, instead of finding the largest cardinality set that is independent w.r.t. two matroids, one is to find the largest such set in terms of its weight, for a modular cost function.

Routine functions that check whether a given set system indeed satisfies the matroid axioms, compute the dual of a matroid, etc., which are implemented in most computer algebra packages, are missing from our library and are an important next step for our work. Naive implementations of those functions have a worst-case exponential running time, so utilising sophisticated enumeration algorithms [18] would help with making that more practical.

References

  • [1] Mohammad Abdulaziz. A Formal Correctness Proof of Edmonds’ Blossom Shrinking Algorithm, 2024.
  • [2] Mohammad Abdulaziz. Isabelle-Graph-Library, 2024. Last accessed on 27/05/2025. URL: https://github.com/mabdula/Isabelle-Graph-Library.
  • [3] Mohammad Abdulaziz. Graph Algorithms. In Functional Data Structures and Algorithms: A Proof Assistant Approach. Editor: Tobias Nipkow. ACM Books, 2025.
  • [4] Mohammad Abdulaziz and Christoph Madlener. A Formal Analysis of RANKING. In The 14th Conference on Interactive Theorem Proving (ITP), 2023. doi:10.48550/arXiv.2302.13747.
  • [5] Martin Aigner and Thomas A. Dowling. Matching Theory for Combinatorial Geometries. Transactions of the American Mathematical Society, 1971. doi:10.2307/1995784.
  • [6] Clemens Ballarin. Locales: A Module System for Mathematical Theories. J. Autom. Reason., 2014. doi:10.1007/s10817-013-9284-7.
  • [7] Grzegorz Bancerek and Yasunari Shidama. Introduction to Matroids. Formalized Mathematics, 2008. doi:10.2478/v10037-008-0040-0.
  • [8] David Braun, Nicolas Magaud, and Pascal Schreck. A Matroid-Based Automatic Prover and Coq Proof Generator for Projective Incidence Geometry. Journal of Automated Reasoning, 2024. doi:10.1007/s10817-023-09690-2.
  • [9] The Sage Developers. The Abstract Matroid Class - Matroid Theory. Last accessed on 25/03/2025. URL: https://doc.sagemath.org/html/en/reference/matroids/sage/matroids/matroid.html.
  • [10] Chelsea Edmonds and Lawrence C. Paulson. A Modular First Formalisation of Combinatorial Design Theory. In Intelligent Computer Mathematics, 2021. doi:10.1007/978-3-030-81097-9_1.
  • [11] Jack Edmonds. Matroids and the Greedy Algorithm. Mathematical Programming, 1971. doi:10.1007/BF01584082.
  • [12] Jack Edmonds. Submodular Functions, Matroids, and Certain Polyhedra. In 5th International Workshop on Combinatorial Optimization – Eureka, You Shrink!: Papers Dedicated to Jack Edmonds. Springer Berlin Heidelberg, 2003. doi:10.1007/3-540-36478-1_2.
  • [13] András Frank. A Weighted Matroid Intersection Algorithm. Journal of Algorithms, 1981. doi:10.1016/0196-6774(81)90032-8.
  • [14] Daniel R. Grayson and Michael E. Stillman. Macaulay2, a software system for research in algebraic geometry. URL: http://www2.macaulay2.com.
  • [15] Maximilian P. L. Haslbeck, Peter Lammich, and Julian Biendarra. Kruskal’s Algorithm for Minimum Spanning Forest. Archive of Formal Proofs, 2019.
  • [16] T. A. Jenkyns. The Efficacy of the "Greedy" Algorithm. In Proceedings of the 7th Southeastern Conference on Combinatorics, Graph Theory and Computing, 1976. URL: https://cir.nii.ac.jp/crid/1570572699713216768.
  • [17] Jonas Keinholz. Matroids. Archive of Formal Proofs, 2018.
  • [18] Donald E. Knuth. The Art of Computer Programming: Combinatorial Algorithms, Part 1. Addison-Wesley Professional, 2011.
  • [19] Bernhard Korte and Dirk Hausmann. An Analysis of the Greedy Heuristic for Independence Systems. In Algorithmic Aspects of Combinatorics. Elsevier, 1978. doi:10.1016/S0167-5060(08)70322-4.
  • [20] Bernhard Korte and László Lovász. Greedoids and Linear Objective Functions. SIAM J. Algebraic Discrete Methods, 1984. doi:10.1137/0605024.
  • [21] Bernhard Korte and Jens Vygen. Combinatorial Optimization. Springer Berlin Heidelberg, 2012. doi:10.1007/978-3-642-24488-9.
  • [22] Joseph B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 1956. doi:10.1090/S0002-9939-1956-0078686-7.
  • [23] Peter Lammich and Tobias Nipkow. Proof Pearl: Purely Functional, Simple and Efficient Priority Search Trees and Applications to Prim and Dijkstra. In 10th International Conference on Interactive Theorem Proving (ITP), 2019. doi:10.4230/LIPICS.ITP.2019.23.
  • [24] Eugene L. Lawler. Matroid Intersection Algorithms. Mathematical Programming, 1975. doi:10.1007/BF01681329.
  • [25] Adrian Löwenberg and Maximilian Haslbeck. Proof of the Amortized Time Complexity of an Efficient Implementation of the Union-Find Data Structure in Isabelle/HOL. Last accessed on 27/05/2025. URL: https://github.com/adrilow/Proof-of-the-amortized-time-complexity-of-the-Union-Find-data-structure-in-Isabelle-HOL/tree/master.
  • [26] Nicolas Magaud, Julien Narboux, and Pascal Schreck. A Case Study in Formalizing Projective Geometry in Coq: Desargues theorem. Computational Geometry, 2012. doi:10.1016/j.comgeo.2010.06.004.
  • [27] Anshuman Mohan, Wei Xiang Leow, and Aquinas Hobor. Functional Correctness of C Implementations of Dijkstra’s, Kruskal’s, and Prim’s algorithms. In 33rd International Conference on Computer Aided Verification (CAV), 2021. doi:10.1007/978-3-030-81688-9_37.
  • [28] Peter Nelson, Mattias Ehatamm, Tom Iagovet, fair8 (pseudonym), Elan2004 (pseudonym), Max Jiang, and Kim Morrison. Matroids. Last accessed on 27/05/2025. URL: https://github.com/apnelson1/Matroid.
  • [29] Tobias Nipkow, Mohammad Abdulaziz, Jasmin Blanchette, Manuel Eberl, Alejandro Gomez-Londono, Peter Lammich, Lawrence Paulson, Christian Sternagel, Simon Wimmer, and Bohua Zhan. Functional Data Structures and Algorithms. A Proof Assistant Approach, 2025.
  • [30] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic. Springer, 2002. doi:10.1007/3-540-45949-9.
  • [31] R. C. Prim. Shortest Connection Networks and Some Generalizations. The Bell System Technical Journal, 1957. doi:10.1002/j.1538-7305.1957.tb01515.x.
  • [32] R. Rado. Note on Independence Functions. Proceedings of the London Mathematical Society, 1957. doi:10.1112/plms/s3-7.1.300.
  • [33] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2003.