A Certified Proof Checker for Deep Neural Network Verification in Imandra
Abstract
Recent advances in the verification of deep neural networks (DNNs) have opened the way for a broader usage of DNN verification technology in many application areas, including safety-critical ones. However, DNN verifiers are themselves complex programs that have been shown to be susceptible to errors and numerical imprecision; this, in turn, has raised the question of trust in DNN verifiers. One prominent attempt to address this issue is enhancing DNN verifiers with the capability of producing certificates of their results that are subject to independent algorithmic checking. While a certificate checker already exists for the state-of-the-art DNN verifier Marabou, it is implemented in C++, and that code itself raises the question of trust (e.g., in the precision of floating point calculations or guarantees for implementation soundness). Here, we present an alternative implementation of the Marabou certificate checker in Imandra – an industrial functional programming language and an interactive theorem prover (ITP) – that allows us to obtain a full proof of certificate correctness. The significance of the result is two-fold. Firstly, it gives stronger independent guarantees for Marabou proofs. Secondly, it opens the way for the wider adoption of DNN verifiers in interactive theorem proving, in the same way as many ITPs already incorporate SMT solvers.
Keywords and phrases:
Neural Network Verification, Farkas Lemma, Proof Certification
Copyright and License:
Guy Katz; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Software and its engineering → Functional languages; Software and its engineering → Formal software verification; Computing methodologies → Neural networks; Theory of computation → Logic and verification
Supplementary Material:
Software (Source Code): https://github.com/rdesmartin/imandra-marabou-proof-checking
Funding:
The work of Komendantskaya was partially supported by the EPSRC grant AISEC: AI Secure and Explainable by Construction (EP/T026960/1) and the ARIA grant “Safeguarded AI”. Desmartin acknowledges Imandra sponsorship for his PhD studies. The work of Isac and Katz was partially funded by the European Union (ERC, VeriDeL, 101112713). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Editors:
Yannick Forster and Chantal Keller
Series and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
As part of the AI revolution in computer science, deep neural networks (DNNs) are becoming the state-of-the-art solution to many computational problems, including in safety-critical applications, e.g., in medicine [41], aviation [24] and autonomous vehicles [10]. DNNs are complex functions, with parameters optimised (or trained) to fit a large set of input-output examples. Therefore, DNNs are intrinsically opaque for humans and are known for their vulnerability to errors in the presence of distribution shifts [42]. This raises the question of the reliability of DNNs in safety-critical domains and calls for tools that guarantee their safety [15].
One family of such tools are DNN verifiers, which make it possible either to mathematically prove that a DNN complies with certain properties or to provide a counterexample for a violation [14]. They are typically based on SMT-solving with bound tightening (Marabou [45]), interval bound propagation (β-CROWN [44]), or abstract interpretation (AI2 [25]). But implementing these verifiers is only a first step to full safety guarantees. Even complete DNN verifiers are prone to implementation bugs and numerical imprecision that, in turn, might compromise their soundness and can be maliciously exploited, as was shown in e.g. [15, 29, 46].
One could consider verifying DNN verifiers directly. However, as Wu et al. exemplify, a mature DNN verifier is a complex multi-platform library; its direct verification would be close to infeasible [45]. A similar problem was faced by the SMT solvers community [34, 6, 5, 18]. A solution was found: to produce (and then check) a proof certificate for each automatically generated proof, instead of checking the verifier in its entirety [33]. The software that checks the proof certificate is called the proof checker. Ideally, the proof checker should: a) be significantly simpler than the original verifier and b) yield strong guarantees of code correctness (i.e. be formally verified). For Marabou, the first half of this research agenda was accomplished by Isac et al. [27] with proof checking for Marabou being reduced to the application of the Farkas lemma [43], a well-known solvability theorem for a finite system of linear equations, along with a tree structure that reflects Marabou’s verification procedure. However, the proof checker in [27] came with no formal guarantees of its own correct implementation.
In this paper, we address this problem by deploying the well-established principles of proof-carrying code [35, 31] and self-certifying code [33]. Namely, we implement formal proof production from Marabou certificates in the interactive theorem prover Imandra [37]. Imandra can implement – and prove properties of – programs written in a subset of OCaml, and thus avoids a potential discrepancy between implementation and formalisation. In Imandra, we obtain a certificate checker implementation that can be safely executed and proven correct in the same language. Imandra supports arbitrary precision real arithmetic, and thus does not suffer from floating point imprecision that haunts DNN verifiers. Finally, it strikes a good balance between interactive and automated proving: i.e., it features strong automation while admitting user interaction with the prover via tactics. We exhibit the role of the Imandra checker within the verification process in Figure 1.
Contributions.
Reconstruction of full correctness proofs from Marabou certificates rests on two major results:
-
Firstly, the Imandra implementation of an algorithm that reconstructs a proof of validity of Marabou certificates. The latter are given by Marabou proof trees with a DNN verification query at the root, nodes corresponding to partitions of the search space, and Farkas vectors at the leaves. As an immediate benefit, our implementation of the algorithm does not suffer from the floating point instability of the C++ version in [27].
-
Secondly, a formal proof, in Imandra, of the soundness of this algorithm, guaranteeing that if the algorithm returns UNSAT then the given set of polynomial constraints at the root of the tree indeed has no solution. This proof relies on a new formal proof of the DNN variant of the famous Farkas lemma in Imandra.
Finally, we demonstrate that our verified certificate checker can be executed in practice. We analyze the code complexity, run it on two verification benchmarks, and evaluate its performance speed against the original certificate checker and verifier [27]. Our results suggest an expected trade-off between the reliability and scalability of the proof checking process, with Imandra code taking about 4.56-4.76 times as long as the original verification and checking time, on average.
The paper proceeds as follows. We start with explaining the essence of Marabou certificate production in Section 2 and introducing the Imandra proof of the Farkas lemma in Section 3. Section 4 introduces the new certificate checking algorithm in Imandra, and Section 5 proves its soundness. Finally, Section 6 is devoted to practical evaluation of the verified checker, and Section 7 concludes the paper.
2 Background
2.1 Deep Neural Networks as Graphs
A DNN is usually defined as a function $f : \mathbb{R}^m \to \mathbb{R}^n$, with $m$ inputs and $n$ outputs. We represent a DNN by a directed, acyclic, connected and weighted graph $G = (V, E)$, where $V$ is a finite set of vertices and $E \subseteq V \times V$ is a set of edges of the form $(u, v)$, equipped with a weight function $w : E \to \mathbb{R}$ and an activation function $act_v$ for each node $v$. A vertex $v$ is called an input vertex if there is no $u$ s.t. $(u, v) \in E$, and an output vertex if there is no $u$ s.t. $(v, u) \in E$. The following picture shows an example of a DNN represented as a graph:
We assume that all input vertices are assigned a value. For all other vertices, we define the weighted sum and neuron functions as follows:

$$ws(v) = \sum_{(u,v) \in E} w(u, v) \cdot nr(u), \qquad nr(v) = act_v(ws(v)) \qquad (1)$$

where $nr(u)$ denotes the assigned value when $u$ is an input vertex.
Note that in this paper, we will use only two activation functions: the rectified linear unit (ReLU), defined as $\mathrm{ReLU}(x) = \max(0, x)$, and the identity function $\mathrm{id}(x) = x$. However, in principle, this work can be extended to support DNNs with any piecewise linear activation function.
To cohere with the existing literature, we will use $x$ and $y$ (with indices) to refer to input and output vertices, and $\bar{x}$ and $\bar{y}$ to lists of input and output vertices. For brevity, we will call them simply inputs and outputs.
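To make the graph view concrete, the following is a minimal sketch (our own naming, not the paper's code) of a DNN as a weighted graph together with the weighted sum and neuron functions of equation (1); OCaml floats stand in for the arbitrary-precision reals used in the Imandra development.

(* A sketch of a DNN as a weighted, acyclic graph.  Vertices are integers,
   edges carry weights, and every vertex has an activation function. *)
type activation = Relu | Id

type dnn = {
  edges : (int * int * float) list;   (* (u, v, weight of edge u -> v) *)
  act   : int -> activation;          (* activation function of each vertex *)
}

let apply_act = function Relu -> (fun x -> max 0. x) | Id -> (fun x -> x)

(* Neuron value of vertex v, as in equation (1); [input] gives the value
   assigned to input vertices (those with no incoming edges). *)
let rec neuron (net : dnn) (input : int -> float) (v : int) : float =
  match List.filter (fun (_, t, _) -> t = v) net.edges with
  | [] -> input v                      (* input vertex *)
  | incoming ->
    let ws =
      List.fold_left
        (fun acc (u, _, w) -> acc +. w *. neuron net input u) 0. incoming
    in
    apply_act (net.act v) ws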
2.2 DNN Verification Queries
Given a variable vector $\bar{v}$ of size $n$ and bound vectors $u, l \in \mathbb{R}^n$, a bound property is defined by the inequalities $l_i \le \bar{v}_i \le u_i$ for each element of $\bar{v}$. Given a DNN $N$, an input bound property $P$ and a linear arithmetic output property $Q$, the DNN verification problem is to decide whether there exists an input $\bar{x}$ such that $P(\bar{x})$ and $Q(N(\bar{x}))$ hold. If such an $\bar{x}$ exists, the verification problem is satisfiable (SAT); otherwise, it is unsatisfiable (UNSAT). Typically, $Q$ represents an erroneous behavior; thus an input and output that satisfy the query serve as a counterexample, and UNSAT implies safe behavior. Although our approach supports any linear property for the output, we focus on bound properties for simplicity. This definition can be extended to more sophisticated properties of DNNs that express constraints on a DNN's internal variables [26].
Recall that all DNNs we consider (1) come with affine (i.e., linear polynomial) weight functions and (2) piecewise linear activation functions. As a consequence, any DNN verification problem can be reduced to piecewise linear constraints that represent the activation functions, accompanied by a Linear Programming (LP) [17] instance, which represents the affine functions and the input/output properties. One prominent approach for solving the DNN verification constraints is using algorithms for solving LP, coupled with a case-splitting approach for handling the piecewise linear constraints [7, 30].
One example of such a DNN verification algorithm is the Reluplex [30] algorithm, which extends the Simplex algorithm [17, 21] for solving LP problems. Based on a DNN verification problem, the algorithm initiates:
-
a variable vector $\mathcal{V}$ containing the variables $\bar{x}$ and $\bar{y}$, and fresh variables $b_v$ and $f_v$ whose values represent the weighted sum and neuron functions $ws(v)$ and $nr(v)$ for all $v$ which are neither an input nor an output. For each node $v$ with the ReLU activation function, the algorithm initiates an additional fresh auxiliary variable $a_v$, to represent the non-negative difference $f_v - b_v$.
-
two bound vectors $u, l$, giving upper and lower bounds to each element in $\mathcal{V}$. The values for $u$ and $l$ are generated as follows: the bounds of $\bar{x}$ and $\bar{y}$ are given directly by $P$ and $Q$. The remaining values of $u$ and $l$ are computed by propagating forward the lower and upper bounds of $\bar{x}$ through $N$ using equations (1) and (2). For a variable $\mathcal{V}_i$, we denote its lower and upper bounds $l(\mathcal{V}_i)$ and $u(\mathcal{V}_i)$, respectively.
-
a matrix $T$, called the tableau, which represents the equations given in (1), with additional auxiliary equations. It is computed by defining a system of equations of the form:
$$b_v - \sum_{(u,v) \in E} w(u, v) \cdot f_u = 0 \qquad (2)$$ one for each non-input node $v$. In $T$, we record the coefficients of these equations as follows. Let $T \in \mathbb{R}^{m \times n}$, where $n$ is the size of $\mathcal{V}$ and $m$ is the number of equations we generated as in (2). Each entry $T[i][j]$ contains the coefficient of the variable $\mathcal{V}_j$ in the $i$-th equation. We then obtain $T \cdot \mathcal{V} = \bar{0}$, where $\cdot$ is the dot product, and $\bar{0}$ is the vector of zeros.
-
a set $R$ of ReLU constraints for variables in $\mathcal{V}$, given by $f_v = \mathrm{ReLU}(b_v)$ for all non-input nodes $v$ such that $act_v = \mathrm{ReLU}$ (as per equation (2)). We call the variables $b_v$, $f_v$ and $a_v$ the participating variables of a constraint. We can join all constraints of $R$ in a conjunction, and denote the resulting formula by $R(\mathcal{V})$. Intuitively, $R(\mathcal{V})$ holds if the variables in $\mathcal{V}$ satisfy the constraints in $R$.
The tuple $\langle \mathcal{V}, u, l, T, R \rangle$ is a DNN verification query, and the tuple $\langle \mathcal{V}, u, l, T \rangle$ is a linear query. We use the notation $u[i \leftarrow c]$ for substituting the $i$-th element in a vector $u$ by the value $c$.
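For reference, the data carried by such a query can be sketched as follows (the record and field names are ours, and OCaml floats again stand in for Imandra's exact reals):

(* A sketch of a DNN verification query <V, u, l, T, R>: bound vectors, the
   tableau T (one row per equation of form (2)), and the ReLU constraints,
   each given by the indices of its participating variables. *)
type relu_constraint = { b : int; f : int; aux : int }

type query = {
  num_vars : int;               (* size of the variable vector V *)
  upper    : float list;        (* upper bounds u *)
  lower    : float list;        (* lower bounds l *)
  tableau  : float list list;   (* tableau T, with T . V = 0 *)
  relus    : relu_constraint list;
}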
Example 1 (DNN Verification Query).
Consider the DNN in Section 2.1, an input bound property $P$ given by bounds on the inputs, and an output bound property $Q$ given by a bound on the output. We first obtain $\mathcal{V}$:
Next, $u$ and $l$ are computed as follows. Firstly, $P$ and $Q$ already give the bounds for the input and output variables. For the rest, we propagate the input bounds forward through the network and assemble the resulting values into the vectors $u$ and $l$.
To obtain $T$, we first write down all equations as in (2), one for each non-input node. The first equation gives non-zero coefficients for its three participating variables; because it does not feature any other variables in $\mathcal{V}$, we record zero coefficients for all other variables, which gives the first row of the matrix $T$. We continue in the same manner for the remaining equations.
Finally, the set of ReLU constraints $R$ contains one constraint $f_v = \mathrm{ReLU}(b_v)$ per ReLU node of the network.
Definition 2 (Solution to DNN verification query).
Given a vector $\bar{v} \in \mathbb{R}^n$, the variable assignment $\mathcal{V} := \bar{v}$ is a solution of the DNN verification query $\langle \mathcal{V}, u, l, T, R \rangle$ if:

$$l \le \bar{v} \le u \;\wedge\; T \cdot \bar{v} = \bar{0} \;\wedge\; R(\bar{v}) \qquad (4)$$

We define a satisfaction predicate that returns true if $\bar{v}$ is a solution to the query. In case $R$ is empty, we write the corresponding predicate for the linear query.
2.3 DNN Verification in Marabou
In order to find a solution for a DNN verification query, or conclude there is none, DNN verifiers such as Marabou [45] rely on algorithms such as Simplex [17] for solving the linear part of the query (i.e. $\langle \mathcal{V}, u, l, T \rangle$), and enhance them with a splitting approach for handling the piecewise linear constraints (i.e. ReLU) in $R$. Conceptually, the DNN verifier invokes the linear solver, which either returns a solution to the linear part of the query or concludes there is none. If no solution exists, the DNN verifier concludes the whole query is UNSAT. If a solution is provided, then the verifier checks whether it satisfies the remaining ReLU constraints. If so, the verifier concludes SAT. If not, case splitting is applied, and the process repeats for every case. For constraints of the form $f = \mathrm{ReLU}(b)$, a case-split divides the query into two subqueries: one enhances the linear part of the query with $b \le 0 \wedge f \le 0$ and the other with $b \ge 0 \wedge f = b$. Adding $f = b$ is equivalent to adding $a \le 0$ for the auxiliary variable $a = f - b$; thus, by adding the auxiliary variable, case splitting is performed by updating the bounds ($u$, $l$) without adding new equations. Besides splitting over a piecewise linear constraint, splitting can be performed over a single variable whose value can either be less or equal than some constant or greater or equal than the constant; i.e., one subquery is enhanced with $\mathcal{V}_i \le c$ and the other with $\mathcal{V}_i \ge c$. This induces a tree structure, with nodes corresponding to the splits. If the verifier answers a subquery, then its node represents a leaf, as no further splits are performed. A tree in which all leaf subqueries are UNSAT corresponds to an UNSAT query; conversely, a tree with a single SAT leaf corresponds to a SAT query. Note that in every such UNSAT leaf, unsatisfiability is deduced using the linear part of the query.
Also, note that such a DNN verification scheme heavily relies on a solver for linear equations, which often manipulates the matrix $T$, as well as on many optimizations and heuristics for the splitting strategy. A proof checker avoids implementing any of these operations.
3 Farkas Lemma for DNN Verification
Recall that the idea of a proof checker assumes that we can obtain a proof witness that the checker can certify independently. Since a solution to a DNN verification query is a satisfiability problem, a satisfying assignment serves as a straightforward (and independently checkable) proof witness of SAT. However, providing a proof witness for the UNSAT case was an open question, based on the NP-hardness of the problem [40]. The novelty of Isac et al.’s result [27] was in showing that the Farkas lemma [43] can be reformulated constructively in terms of the DNN verification problem, and that the vectors in its construction can be then used to witness the UNSAT proof. We will call this specialized constructive form of the Farkas lemma the DNN Farkas lemma (cf. § 3.1).
To use the DNN Farkas lemma as part of the verified checker, we need to prove its correctness in Imandra. Indeed, there exist Mathematical Components [1] proofs of the lemma in its original form [2, 39]. However, those formalisations rely strongly on comprehensive MathComp matrix libraries. Other ITP attempts [11, 8, 36] have suggested ways to prove the Farkas lemma directly in terms of systems of linear polynomial equations (systems of l.p.e. for short), obtaining the original Farkas lemma as a corollary (cf. Figure 2). We will call this form of the Farkas lemma the polynomial Farkas lemma.
We follow the polynomial approach and prove the polynomial Farkas lemma in Imandra (UNSAT case only, cf. Theorem 5), obtaining its specialization to the case that refers to the parameters of the DNN verification problem, referred to as the DNN Polynomial Farkas lemma (cf. Theorem 6). In this shape, the lemma relates the existence of witnesses of a certain form to the unsatisfiability of a system of l.p.e. constructed from a DNN verification query. To ensure that the checker is sound, it remains to link this unsatisfiability to the unsatisfiability of the DNN verification query (Theorem 7). These results are shown in red double lines in Figure 2.
3.1 The DNN Farkas Lemma
To produce proofs of UNSAT for linear queries of the form $\langle \mathcal{V}, u, l, T \rangle$, Isac et al. [27] prove a constructive variant of the Farkas lemma that defines proof witnesses of UNSAT:
Theorem 3 (DNN Farkas lemma [27]).
Let $T \in \mathbb{R}^{m \times n}$, $\mathcal{V}$ a vector of $n$ variables, and $u, l \in \mathbb{R}^n$, such that $T \cdot \mathcal{V} = \bar{0}$ and $l \le \mathcal{V} \le u$. Then exactly one of these two options holds:
-
1.
SAT: There exists a solution $\bar{v}$ such that $T \cdot \bar{v} = \bar{0} \wedge l \le \bar{v} \le u$ is true.
-
2.
UNSAT: There exists a contradiction vector $\bar{w} \in \mathbb{R}^m$ such that for all $\bar{v}$, if $l \le \bar{v} \le u$ we have $\bar{w} \cdot T \cdot \bar{v} < 0$. As $T \cdot \bar{v} = \bar{0}$ implies $\bar{w} \cdot T \cdot \bar{v} = 0$, $\bar{w}$ is a proof of the constraints' unsatisfiability.
Moreover, in the UNSAT case, the contradiction vector can be constructed during the execution of the Simplex algorithm.
Note that, given a vector $\bar{v}$ or $\bar{w}$, a proof checker can immediately conclude the satisfiability or unsatisfiability of the query, while constructing $\bar{v}$ or $\bar{w}$ often requires the DNN verifier to apply more sophisticated procedures [27]. We will prove a polynomial version of this result in Imandra.
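As a sketch of how such a contradiction vector can be checked in practice (under the assumptions of Theorem 3, and with illustrative names of our own), one can form the linear combination of the tableau rows given by $\bar{w}$ and verify that its maximum over the bound box is negative:

(* Combine the tableau rows with coefficients w: c_j = sum_i w_i * T[i][j].
   Assumes a non-empty tableau whose rows all have the same length. *)
let combine (w : float list) (tableau : float list list) : float list =
  List.fold_left2
    (fun acc wi row -> List.map2 (fun a t -> a +. wi *. t) acc row)
    (List.map (fun _ -> 0.) (List.hd tableau))
    w tableau

(* Maximum of c . v subject to l <= v <= u, taken coordinate-wise. *)
let box_max (c : float list) (u : float list) (l : float list) : float =
  List.fold_left2
    (fun acc ci (ui, li) -> acc +. (if ci >= 0. then ci *. ui else ci *. li))
    0. c (List.combine u l)

(* w is a contradiction vector if w^T . T . v < 0 for every bounded v. *)
let is_contradiction w tableau u l = box_max (combine w tableau) u l < 0.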
3.2 Polynomial Farkas Lemma
Given coefficients $c_0, c_1, \ldots, c_n \in \mathbb{R}$ and a vector of variables $\bar{x} = (x_1, \ldots, x_n)$, a real linear polynomial of size $n$ is defined as follows:

$$p(\bar{x}) = c_0 + c_1 x_1 + \cdots + c_n x_n$$

Given $\bar{x}$, one can form different polynomials $p_1, p_2, \ldots$ To simplify the notation, we write $p$ (possibly with indices) to denote arbitrary polynomials when the concrete choice of coefficients is unimportant. We will use $q$ (possibly with indices) to denote a potentially different list of polynomials.
A linear polynomial equation is an equation of the form $p(\bar{x}) \bowtie 0$, where $\bowtie$ is either $=$ or $\ge$.
The following formula defines a system of linear polynomial equations $S$:

$$S(\bar{x}) := \bigwedge_{i=1}^{m} \big(p_i(\bar{x}) = 0\big) \;\wedge\; \bigwedge_{j=1}^{k} \big(q_j(\bar{x}) \ge 0\big),$$

where $m$ and $k$ are arbitrary natural numbers. We say that $S$ is satisfied by a vector $\bar{v} \in \mathbb{R}^n$ if $S(\bar{v})$ is true. In Imandra, we use a predicate which returns true when the vector $\bar{v}$ is a solution to $S$. We say the system of l.p.e. $S$ has a solution, denoted SAT($S$), if there exists a $\bar{v}$ such that $S(\bar{v})$ is true. Otherwise, $S$ is UNSAT.
Operations on Polynomials.
A polynomial $p$ with coefficients $c_0, \ldots, c_n$ scaled by a real constant $a$ is defined as the polynomial over the same variables with coefficients $a \cdot c_0, \ldots, a \cdot c_n$, and noted $a \cdot p$. The addition of two polynomials $p$ and $p'$ with coefficients $c_0, \ldots, c_n$ and $c'_0, \ldots, c'_n$ respectively, denoted $p + p'$, is a polynomial with coefficients $c_0 + c'_0, \ldots, c_n + c'_n$. The linear combination of polynomials $p_1, \ldots, p_m$ with coefficient vector $\bar{a} = (a_1, \ldots, a_m)$ is defined as the sum of the polynomials scaled by the coefficients in $\bar{a}$: $\sum_{i=1}^{m} a_i \cdot p_i$.
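These operations have a direct executable counterpart when a polynomial is represented by its coefficient list (constant first); the following is a sketch with our own naming, using floats in place of Imandra's exact reals:

(* Scaling, addition, linear combination and evaluation of linear polynomials
   represented as coefficient lists [c0; c1; ...; cn]. *)
let scale (a : float) (p : float list) : float list =
  List.map (fun c -> a *. c) p

let add (p : float list) (p' : float list) : float list =
  List.map2 ( +. ) p p'

(* Linear combination sum_i a_i * p_i of the polynomials ps. *)
let linear_combination (a : float list) (ps : float list list) : float list =
  match ps with
  | [] -> []
  | p0 :: _ ->
    List.fold_left2
      (fun acc ai pi -> add acc (scale ai pi))
      (List.map (fun _ -> 0.) p0) a ps

(* Evaluate p at a point x (x lists the values of x1 ... xn). *)
let eval_poly (p : float list) (x : float list) : float =
  match p with
  | [] -> 0.
  | c0 :: cs -> List.fold_left2 (fun acc c v -> acc +. c *. v) c0 cs x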
We define the conversion function from the DNN verification query form to linear polynomial equations and inequalities.
Definition 4.
Tableau-to-polynomial conversion. Given a DNN verification query $\langle \mathcal{V}, u, l, T, R \rangle$ and variable vector $\mathcal{V}$ of size $n$, the conversion function $conv$ is defined as follows:

$$conv(\langle \mathcal{V}, u, l, T \rangle) := \bigwedge_{i=1}^{m} \big(t_i(\mathcal{V}) = 0\big) \;\wedge\; \bigwedge_{i=1}^{n} \big(u_i - \mathcal{V}_i \ge 0\big) \;\wedge\; \bigwedge_{i=1}^{n} \big(\mathcal{V}_i - l_i \ge 0\big) \qquad (5)$$

where:
-
$t_i$ is a polynomial whose coefficients are the values in the $i$-th row of the matrix $T$ (and whose constant element is $0$). It encodes the linear polynomial equations as defined in equation (2).
-
For all $1 \le i \le n$, $u_i - \mathcal{V}_i$ is the polynomial encoding the upper bound $\mathcal{V}_i \le u_i$.
-
For all $1 \le i \le n$, $\mathcal{V}_i - l_i$ is the polynomial encoding the lower bound $\mathcal{V}_i \ge l_i$.

Note that the coefficients for the bound polynomials are sparse, with value $0$ at all indices except for the constant and the index of $\mathcal{V}_i$.
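A possible executable reading of Definition 4, in the coefficient-list representation of the previous sketch (names are ours):

(* Each tableau row T_i becomes the equality polynomial 0 + T_i . V = 0. *)
let tableau_to_equalities (tableau : float list list) : float list list =
  List.map (fun row -> 0. :: row) tableau

(* The bound u_i yields the sparse inequality polynomial u_i - V_i >= 0,
   and the bound l_i yields V_i - l_i >= 0. *)
let bounds_to_inequalities (upper : float list) (lower : float list)
  : float list list =
  let n = List.length upper in
  let unit_at i v = List.init n (fun j -> if j = i then v else 0.) in
  List.mapi (fun i ui -> ui :: unit_at i (-1.)) upper
  @ List.mapi (fun i li -> (-. li) :: unit_at i 1.) lower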
Theorem 5 (Polynomial Farkas Lemma).
A system of l.p.e. $S$ has a solution iff there does not exist: 1. a linear combination $c$ of the equality polynomials of $S$ and 2. a linear combination $d$ of the inequality polynomials of $S$ with non-negative coefficients, such that $c + d$ is a polynomial whose variable coefficients are all $0$ and whose constant is negative. We call such $c, d$ witnesses of UNSAT for $S$.
Proof.
We prove that witnesses and a solution cannot exist simultaneously.
-
1.
Assume there exist a linear combination $c$ of the equality polynomials and a linear combination $d$ of the inequality polynomials, where the coefficients of $d$ are non-negative, such that all variable coefficients of $c + d$ are $0$ and its constant is negative.
-
2.
Furthermore, assume that the system of l.p.e. has a solution $\bar{v}$. Then $p_i(\bar{v}) = 0$ for all equality polynomials, so we have $c(\bar{v}) = 0$. Similarly, $q_j(\bar{v}) \ge 0$ for all inequality polynomials and the coefficients of $d$ are non-negative, so we have $d(\bar{v}) \ge 0$. This means that $(c + d)(\bar{v}) \ge 0$.
However, since all variable coefficients of $c + d$ are $0$, evaluating it at $\bar{v}$ yields its constant, which we postulated to be negative; this leads to a contradiction.
The Imandra code covering one direction of Theorem 5 is given in Figure 3. It is the only direction needed to prove soundness of our implementation, and corresponds to the top double box in Figure 2. In Imandra, we define a valid certificate as a linear combination of the tableau rows which gives a polynomial with all coefficients equal to $0$ and a negative constant. The proof then proceeds by applying two lemmas: cert_is_neg (“a valid certificate always evaluates to a negative value”) and solution_is_not_neg (“evaluating a certificate with a solution to the system evaluates to a non-negative value”). Table 1 shows the formal statements of these lemmas. The tactic auto resolves the contradiction in the theorem's conclusion. We require that the system and variable vectors have matching dimensions, and use the predicate well_formed to assert that. Figure 4 gives the interested reader a glimpse of the completed proof produced by Imandra in response to the user prompt given in Figure 3.
Generally, the user interacts with Imandra by supplying a set of automation tactics and possibly providing missing auxiliary lemmas. The keyword waterfall refers to the famous Boyer-Moore inductive proof automation approach [12] that is deployed in ACL2 and Imandra [37]. It dynamically composes tactics that generate induction schemes and perform simplification, rewriting, generalization and forward-chaining reasoning during proof search. Together with the proofs of auxiliary lemmas, the proof of Theorem 5 is nearly 200 lines long.
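To give a flavour of what these statements express, the following sketch (ours, not the paper's verbatim code) phrases the certificate validity check and the content of cert_is_neg as executable predicates over the coefficient-list representation used above; solution_is_not_neg analogously states that the same combination evaluates non-negatively at any solution of the system.

(* A valid certificate: every variable coefficient is 0 and the constant is
   negative.  [eval_poly] is the polynomial evaluation from the earlier sketch. *)
let is_valid_certificate (p : float list) : bool =
  match p with
  | c0 :: cs -> c0 < 0. && List.for_all (fun c -> c = 0.) cs
  | [] -> false

(* cert_is_neg, as an executable property: a valid certificate evaluates to a
   negative value at every point x. *)
let cert_is_neg (p : float list) (x : float list) : bool =
  not (is_valid_certificate p) || eval_poly p x < 0.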
3.3 Farkas Lemma for DNN Proof Checking
Marabou produces proofs of UNSAT in the form given in Theorem 3, but the Imandra implementation uses the formulation of the Farkas lemma given in Theorem 5. We need to prove that Theorem 5 specializes to systems of l.p.e that encode DNN verification queries:
Theorem 6 (DNN Polynomial Farkas Lemma).
If there exist witnesses of UNSAT (in the sense of Theorem 5) for the system of l.p.e. $conv(\langle \mathcal{V}, u, l, T \rangle)$, then $conv(\langle \mathcal{V}, u, l, T \rangle)$ is UNSAT.
To deduce the UNSAT of a DNN verification query from the UNSAT of its equivalent system of l.p.e., we prove the following Lemma (represented by the lower red double arrow in Figure 2).
Theorem 7 (Sound Application of DNN Polynomial Farkas Lemma).
If $conv(\langle \mathcal{V}, u, l, T \rangle)$ is UNSAT then the DNN verification query $\langle \mathcal{V}, u, l, T, R \rangle$ is UNSAT.
It is straightforward to see that, by Definition 4, polynomials in $conv(\langle \mathcal{V}, u, l, T \rangle)$ have a one-to-one correspondence to constraints in the linear query. However, proving this in Imandra was not trivial. We consider the equations from the tableau rows and the inequalities from the bounds separately. For the equations, tableau_reduction is proven by Imandra with minimal guidance, by induction on the tableau length. On the other hand, bound_reduction necessitated linking index-based computations (the bound polynomial for the $i$-th bound is constructed as a polynomial with a single non-zero coefficient at index $i$) and recursion-based computations (e.g. for checking whether a vector is bounded by $u$ and $l$).
4 A DNN Certificate Checker in Imandra
Recall that Marabou search induces tree-like structures. The proof production in [27] constructs a proof tree based on the search trace of UNSAT queries, serving as a witness. We now introduce an alternative algorithm that checks these proof trees and certifies that the corresponding DNN verification query is UNSAT. If an error is detected, it can identify the specific parts of the proof that contributed to the failure. These correspond to search states where Marabou failed to produce a correct proof.
Example 8 (Marabou Proof Tree Construction).
Below is a graphical representation of a proof tree proving that the query presented in Example 1 is UNSAT and witnessing that, during execution, Marabou performed a single split over a ReLU constraint of the query. The proof contains the tableau $T$, the bound vectors $u, l$, and the ReLU constraints $R$:
Each leaf contains a contradiction vector for its respective subquery (cf. Theorem 3).
A proof tree t, represented by an inductive type proofTree, can be either a leaf Leaf w or a non-leaf node Node s tl tr. A leaf Leaf w corresponds to an UNSAT result of the linear part of a subquery; leaves hence always contain a contradiction vector w : real list. A non-leaf node Node s tl tr corresponds to a split s : Split and corresponding sub-proof-trees tl and tr. We consider two kinds of splits: s : Split can be either a case split over a ReLU constraint, Relu b f aux, where b, f, and aux are the indices of the participating variables in the corresponding ReLU constraint (i.e. the DNN verification query includes the constraint $f = \mathrm{ReLU}(b)$ with auxiliary variable aux), or a single-variable split SingleVar xi k, performed on the variable with index xi with a split on value k.
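In OCaml-style syntax, the datatype described above can be sketched as follows (constructor names follow the text; float in place of Imandra's real is our simplification):

type split =
  | Relu of int * int * int     (* indices of b, f and aux in the variable vector *)
  | SingleVar of int * float    (* variable index and split value *)

type proof_tree =
  | Leaf of float list          (* contradiction vector for the leaf's subquery *)
  | Node of split * proof_tree * proof_tree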
This type guarantees some structural properties but not the full certification of the proof trees. For example, for Leaf w, w has to be checked to be a contradiction vector for the corresponding subquery to construct a witness of UNSAT (Theorem 5). During the parsing of the Marabou proof, we hence check that each ReLU split corresponds to one of the constraints in the original DNN verification query via the recursive check_tree algorithm (Algorithm 1).
Marabou uses splits to compute tighter upper and lower bound vectors, later used to prove UNSAT in the proof tree's leaves. The function update_bounds u l s performs this bound tightening and yields updated lower and upper bounds for both subtrees:
In the case of a split on a single variable $\mathcal{V}_i$ with value $k$, the updated bounds are $u[i \leftarrow k]$ for one child and $l[i \leftarrow k]$ for the other. In the case of a split on a ReLU constraint of the form $f = \mathrm{ReLU}(b)$ with auxiliary variable $a$, the updated bounds are $u[f \leftarrow 0]$ and $u[b \leftarrow 0]$ for one child, and $l[b \leftarrow 0]$ and $u[a \leftarrow 0]$ for the other. This corresponds to the case analysis of lemma relu_split (see Table 1). We call the first case the inactive phase and the second case the active phase of the ReLU constraint. Note that the order of the phases is fixed during the tree construction; the fixed order is followed in the bound updating implementation (see lines 4, 12).
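A sketch of this bound tightening, building on the split type above (the helper set and the pairing of child bounds are our own presentation):

(* [set v i x] replaces the i-th element of v by x. *)
let set (v : float list) (i : int) (x : float) : float list =
  List.mapi (fun j y -> if j = i then x else y) v

(* Returns ((u, l) for the first child, (u, l) for the second child). *)
let update_bounds (u : float list) (l : float list) (s : split) =
  match s with
  | SingleVar (i, k) ->
    (* one child gets V_i <= k, the other V_i >= k *)
    ((set u i k, l), (u, set l i k))
  | Relu (b, f, aux) ->
    (* inactive phase: b <= 0 and f <= 0; active phase: b >= 0 and aux <= 0 *)
    ((set (set u f 0.) b 0., l), (set u aux 0., set l b 0.))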
The function check_tree traverses a proof tree t and, given a DNN verification query, ensures that for all the leaves it is able to build witnesses of UNSAT in the sense of Theorem 6. It is given in pseudocode (Algorithm 1) and Imandra code (Figure 6). For a leaf node containing a list of reals $\bar{w}$, the procedure mk_certificate computes witness candidates as in Theorem 6, and the algorithm checks whether they form a valid certificate (mk_certificate in Figure 6, l. 6). If this check passes, then by Theorem 5 they are witnesses of UNSAT for the corresponding system of l.p.e. By Theorem 7, the corresponding subquery is UNSAT as well. For a non-leaf node with a split $s$ and sub-trees $t_l$ and $t_r$:
-
the procedure update_bounds computes updated versions of the bounds $u$ and $l$ corresponding to the two phases of the split, as described above (update_bounds, lines 10-11).
-
recursive calls to check_tree, using the new bounds that correspond to each sub-tree, check the sub-trees rooted in $t_l$ and $t_r$ (lines 13-14).
A tree is certified if all checks pass. The Imandra function implements Algorithm 1 almost verbatim, with the notable difference of the use of the check_split function (Figure 6, line 9), which re-iterates the check that ReLU splits indeed correspond to a known ReLU constraint of the verification query.
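Putting the earlier sketches together, the traversal itself can be written as follows; the leaf check here reuses the contradiction-vector test from the Theorem 3 sketch, where the paper's code instead builds an explicit certificate via mk_certificate (Figure 6).

(* Checks that a ReLU split refers to a known constraint of the query, and
   that a single-variable split uses a valid variable index. *)
let check_split (q : query) (s : split) : bool =
  match s with
  | SingleVar (i, _) -> 0 <= i && i < q.num_vars
  | Relu (b, f, aux) ->
    List.exists (fun r -> r.b = b && r.f = f && r.aux = aux) q.relus

(* Recursive certificate checking, as in Algorithm 1. *)
let rec check_tree (q : query) (u : float list) (l : float list)
    (t : proof_tree) : bool =
  match t with
  | Leaf w -> is_contradiction w q.tableau u l
  | Node (s, tl, tr) ->
    check_split q s
    && (let (u1, l1), (u2, l2) = update_bounds u l s in
        check_tree q u1 l1 tl && check_tree q u2 l2 tr)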
The following example shows the execution of this algorithm.
Example 9 (Checking the Proof Tree of Example 8).
To certify the proof tree, the algorithm begins by checking the root node, certifying that the bound vectors of the two children correspond to the two phases of the split ReLU constraint. Then, it recursively checks the leaves. It starts by updating the bounds and certifying the contradiction vector of the left leaf: it constructs the system of l.p.e. with equations representing the tableau $T$ and inequalities representing the updated bounds, then constructs the certificate from the leaf's contradiction vector as in Theorem 6. Lastly, it checks that the certificate is indeed valid, i.e. that all its variable coefficients are $0$ and its constant is negative.
The right leaf is checked similarly. Since all nodes pass the checks, the algorithm returns true, and the certification of this proof tree is complete.
5 Proof of Soundness
This section presents the proof of soundness of the algorithm introduced in the previous section: given a DNN verification query and a corresponding proof tree witnessing UNSAT, if check_tree returns true, then there exists no satisfying assignment for the query.
The proof follows a custom induction scheme on the structure of check_tree. We will first prove the base case, when the proof tree is a leaf (Lemma 10), then the induction step, when it is a node with children (Lemma 14). The proof for leaves is straightforward thanks to the Farkas lemma. The case for non-leaf nodes is trickier, as it involves proving that splits fully cover the DNN verification query solution space. This can be done by proving that both single variable splits and ReLU splits are covering (Lemmas 12 and 13 respectively).
We first prove that check_tree is correct for proof trees leaves:
Lemma 10 (Leaf checking).
If $t$ is a leaf and check_tree returns true on $t$, then the corresponding subquery is UNSAT.
Proof.
By the definition of check_tree and Theorems 5 and 7. For the inductive case, we first prove that tightening bounds according to splits is covering. We state the definition of boundedness as a conjunction instead of a universally quantified index to avoid instantiating an index and to allow better automation in Imandra.
Definition 11 (Bound vectors).
Let $\bar{v}, u, l \in \mathbb{R}^n$. We say that $\bar{v}$ is bounded by $u$ and $l$ (and we write $l \le \bar{v} \le u$) if $\bigwedge_{i=1}^{n} (l_i \le \bar{v}_i \le u_i)$.
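Phrased recursively over the three lists (matching the conjunction-style formulation chosen for better automation), a sketch of this predicate is:

(* v is bounded by u and l when every component satisfies l_i <= v_i <= u_i. *)
let rec is_bounded (v : float list) (u : float list) (l : float list) : bool =
  match v, u, l with
  | [], [], [] -> true
  | x :: vs, ux :: us, lx :: ls -> lx <= x && x <= ux && is_bounded vs us ls
  | _ -> false                       (* dimension mismatch *)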
For single variable splits, the following lemma is proven (we omit the proof as it is simple):
Lemma 12 (Single variable splits are covering).
Let $\bar{v}, u, l \in \mathbb{R}^n$; $i$ a variable index; $k \in \mathbb{R}$; $u' = u[i \leftarrow k]$, $l' = l[i \leftarrow k]$.
If $l \le \bar{v} \le u$ then $l \le \bar{v} \le u'$ or $l' \le \bar{v} \le u$.
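As an executable reading of Lemma 12 (using the set and is_bounded sketches above), the property states that every vector within the original bounds falls within the bounds of at least one child:

(* Single-variable splits are covering: tightening the upper or the lower
   bound of variable i to the split value k keeps v bounded in one branch. *)
let single_var_split_covering
    (v : float list) (u : float list) (l : float list) (i : int) (k : float)
  : bool =
  not (is_bounded v u l)
  || is_bounded v (set u i k) l
  || is_bounded v u (set l i k)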
For ReLU splits, recall that for each split with indices $b, f, aux$, the DNN verification query includes the constraint $\mathcal{V}_f = \mathrm{ReLU}(\mathcal{V}_b)$ with $\mathcal{V}_{aux} = \mathcal{V}_f - \mathcal{V}_b$.
Lemma 13 (ReLU splits are covering).
Let $\bar{v}, u, l \in \mathbb{R}^n$; $b, f, aux$ indices such that $\bar{v}_f = \mathrm{ReLU}(\bar{v}_b)$ and $\bar{v}_{aux} = \bar{v}_f - \bar{v}_b$; $u_1 = u[f \leftarrow 0][b \leftarrow 0]$, $l_1 = l$, $u_2 = u[aux \leftarrow 0]$, $l_2 = l[b \leftarrow 0]$.
If $l \le \bar{v} \le u$ then $l_1 \le \bar{v} \le u_1$ or $l_2 \le \bar{v} \le u_2$.
Proof.
Assuming that the hypothesis holds, we consider the cases given by relu_split; see Table 1 for statements of the auxiliary lemmas mentioned below:
-
1.
Case 1: $\bar{v}_b \le 0$ (inactive phase). We prove that for all indices $i$, $l_1(i) \le \bar{v}_i \le u_1(i)$, by considering cases according to the value of $i$.
-
(a)
if $i = b$: the upper bound is tightened to $0$, and $\bar{v}_b \le 0$ holds by the case assumption;
-
(b)
if $i = f$: the upper bound is tightened to $0$, and $\bar{v}_f = \mathrm{ReLU}(\bar{v}_b) = 0$ since $\bar{v}_b \le 0$;
-
(c)
otherwise: the bounds are unchanged, so the claim follows from the hypothesis.
-
2.
Case 2: $\bar{v}_b \ge 0$ (active phase). We prove that for all indices $i$, $l_2(i) \le \bar{v}_i \le u_2(i)$, by a similar case analysis on the value of $i$. The proof proceeds similarly to Case 1.
Lemma 14 states that if the query corresponding to a parent node in the proof tree has a satisfying assignment, then this assignment will satisfy one of the child node’s queries.
Lemma 14 (Splits are covering).
Let $Q$ be a DNN verification query with bound vectors $u, l$; $s$ a split; $u_1, l_1, u_2, l_2$ the bounds computed by update_bounds u l s.
If there exists $\bar{v}$ that satisfies $Q$, then $\bar{v}$ satisfies the subquery with bounds $u_1, l_1$ or $\bar{v}$ satisfies the subquery with bounds $u_2, l_2$.
Proof.
Since the bounds are the only parts of the children's queries that differ from their parent's query, proving Lemma 14 only requires proving that the updated bounds are covering. This follows from Lemmas 12 and 13.
Theorem 15 (Algorithm 1 is sound).
If check_tree returns true on a proof tree for the DNN verification query $\langle \mathcal{V}, u, l, T, R \rangle$, then the query is UNSAT.
Proof.
We proceed by a functional induction scheme based on check_tree’s definition.
Base case.
is a leaf. By Lemma 10.
Induction step.
Let $s$ be a split, $t_l$ and $t_r$ be two proof trees, and $u_1, l_1, u_2, l_2$ be the bounds obtained from update_bounds. Our induction hypothesis states that check_tree is sound for $t_l$ and $t_r$. We now need to prove that check_tree is sound for $t$, where $t$ is the proof tree with children $t_l$ and $t_r$ and split $s$.
Assuming that check_tree returns true on $t$, by definition it also returns true on $t_l$ and $t_r$. By the induction hypothesis, this means that the subqueries with bounds $u_1, l_1$ and $u_2, l_2$ are both UNSAT.
By Lemma 14, we conclude that the query itself is UNSAT. Note that the proof assumes that all the query elements are well-formed w.r.t. the variable vector, i.e. the tableau, bounds and variable vector dimensions match, and the proof-tree splits correspond to the query constraints. In the implementation, we need to prove that these properties are preserved throughout the inductive steps, with lemmas such as well_formed_preservation (see Table 1). Even though we need to give such hints, the custom induction scheme is derived automatically by Imandra: the user's tactic interaction with Imandra is shown in Figure 7.
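Schematically, and with predicate names that are our assumptions rather than the paper's verbatim code, the resulting top-level goal has the following Imandra-style shape: whenever the checker accepts a well-formed query and proof tree, no bounded assignment satisfies the query.

(* Imandra-style statement of Theorem 15 (schematic): free variables are
   implicitly universally quantified, and [@@auto] invokes the waterfall. *)
theorem check_tree_sound q u l t v =
  (well_formed q u l && check_tree q u l t && is_bounded v u l)
  ==> not (satisfies q v)
[@@auto]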
6 Evaluation
In this section, we evaluate the new Imandra implementation of the proof checker across two orthogonal axes: first, the code complexity of its main modules; and second, its performance speed, compared to the Marabou C++ implementation.
Code Complexity.
The overview of the entire formalisation is given in Table 2. In addition to counting L.O.C., we also count the number of auxiliary lemmas per proof, as they are the main mode of user interaction with Imandra proof search. We note that, due to the clever proof production offered by the waterfall method in Imandra [12, 37], the size of the human-written code (as exemplified in Figures 3 and 7) is much smaller than the actual length of the corresponding proof (as shown in Figure 4).
Performance Speed.
As scalability of DNN verifiers is a major factor within the DNN verification community [13, 14], any implementation of such algorithms should be considered with respect to its performance. We used the proof-producing version of Marabou to solve queries from two families of benchmarks: (1) collision avoidance (coav), which verifies a DNN with 137 ReLU neurons attempting to predict collisions of two vehicles that follow curved paths at different speeds [22]; and (2) robotics navigation [3], with properties of a neural robot controller with 32 ReLU neurons. We chose these benchmarks because they provide a large dataset of UNSAT DNN verification queries that are solvable in a short time. We have disabled some optimizations within Marabou – proofs of bound tightenings that are derived based on the ReLU constraints, and thus are not proven directly by using the Farkas lemma [27].
Overall, Marabou solved all queries and successfully generated UNSAT proofs for 293 coav and 213 robotics queries. For some queries, proofs were not generated due to early UNSAT deduction during preprocessing. We evaluated the results of 276 coav and 180 robotics queries, which have proof sizes smaller than 5MB. All UNSAT proofs were checked by the native Marabou checker and by Imandra, and both checkers certified all queries. For each query, we measured the time it took Marabou to deduce UNSAT (Verification time) and the time it took the native Marabou checker and Imandra to check the proofs (Native and Imandra checking time, respectively). Our evaluation is depicted in Figure 8. For the coav and robotics benchmarks, on average, Marabou solved the queries in 5.40 and 5.62 seconds, respectively, while its native checker checked the proofs in an average of 0.003 and 0.03 seconds. Imandra required on average 24.64 and 26.72 seconds, suggesting that, regardless of network size, Imandra requires time proportionate to the verification time of Marabou. This hypothesis needs to be further investigated. Furthermore, even though Imandra's checking time is considerably slower than Marabou's, its effect on the overall verification process is a slowdown by a factor of about 4.56-4.76.
Although performance differences are expected due to the use of more precise arithmetic, our results suggest a crucial trade-off between scalability and reliability. We believe that some performance difference is due to the use of verification-oriented data structures such as lists, and thus using optimized structures such as maps may improve speed significantly as shown in [19], especially in the presence of sparse vectors in DNN Verification queries.
7 Conclusions, Future and Related Work
The nascent field of DNN verifiers faces several challenges, and certification of DNN verifiers themselves is widely recognized as one of them [15, 45]. Recent work of [27] laid theoretical foundations for the certification of the DNN verifier Marabou, by connecting its certificate producing version with the well-known Farkas lemma. In this paper, we take inspiration from the proof-carrying code [35, 31] and self-certifying code [33] tradition and propose a framework that implements, executes, and verifies a checker for certificates produced by Marabou, within the same programming language, thus obtaining high assurance levels. Moreover, thanks to Imandra’s native use of infinite precision reals, we avoid the problem of floating point imprecision in the checker. Evaluating our checker suggests a trade-off between reliability and scalability. We hypothesize that the differences originate primarily due to the use of infinite-precision arithmetic and the choice of verification-oriented data structures. We leave a more detailed analysis for future work.
7.1 Related Work
We hope to encourage further collaboration between verification and programming language communities. Notable is the recent success of industrial ITPs, such as Imandra [38] and F* [4, 32]. Such languages are particularly suitable for AI verification, thanks to their automation and other modern features [20]. Our work opens the opportunity for future integration of DNN verifiers in these provers and ITPs more generally.
Imandra vs other Provers.
In principle, this work could be replicated in other proof assistants such as Isabelle/HOL, ACL2, Lean, PVS or Rocq, and it could be interesting to do so. Anecdotally, we find that Imandra’s high level of proof automation, lemma discovery features, integrated bounded and unbounded verification, and efficient model execution make it an ideal environment for developing verified tools such as our verified proof checker.
Proof-evidence production for SMT solvers.
Existing formalisations of the Farkas Lemma.
Although our proof of the Farkas lemma takes into consideration the previous experience in other ITPs [8, 11, 36], it had to include substantial modifications to cover the DNN case; in particular, Theorem 7 and its Imandra proof are original relative to the cited papers. Moreover, we prove the Farkas lemma in a polynomial form (cf. Section 3). As opposed to the original matrix-based Farkas lemma [43], the polynomial version yields easier automation in Imandra.
7.2 Future Work
There are a number of directions in which we plan to extend this work.
From Farkas vectors to specifications.
We plan to lift the soundness of the checker (Theorem 15) to the level of DNN verification queries, and, via a certified compilation procedure, to higher-order specification languages such as Vehicle [16]. This would extend formal verification to encompass the left-hand-side blocks of Figure 1. Thus, this paper can be seen as a first step towards developing methods of proof-carrying neuro-symbolic code [31].
Optimisation of Proof Checking.
One of Marabou's key techniques for scaling to larger verification problems is the use of theory lemmas for dynamic bound tightening. These lemmas characterize the connections between variables that participate in a ReLU constraint, and are not proven directly by using the Farkas lemma. This feature is supported by the proof production in [27]. Although it is possible to run DNN verification queries – and certify their proofs – without support for these theory lemmas, they are key to scaling to larger verification queries. We implemented them in Imandra, but certifying their soundness would be the next logical step to fully support proofs generated by Marabou. While supporting DNNs with other piecewise-linear activations, such as max-pool and sign, is relatively straightforward, non-linear activations are more challenging: their verification relies on over-approximations, for which Marabou does not currently generate proofs. This is also grounded in theoretical results [28].
Verification of cyber-physical systems with DNN components.
is another possible application of the presented results. For this, DNN verifiers need to interface with languages that can express the system dynamics and/or probabilistic safety properties [16]. The presented proof production will ensure that any such integration is sound.
References
- [1] Reynald Affeldt, Yves Bertot, Cyril Cohen, Marie Kerjean, Assia Mahboubi, Damien Rouhling, Pierre Roux, Kazuhiko Sakaguchi, Zachary Stone, Pierre-Yves Strub, et al. MathComp-Analysis: Mathematical Components Compliant Analysis Library. https://math-comp.github.io/, 2022.
- [2] Xavier Allamigeon and Ricardo D. Katz. A Formalization of Convex Polyhedra Based on the Simplex Method. Journal of Automated Reasoning, 63(2):323–345, 2019. doi:10.1007/S10817-018-9477-1.
- [3] Guy Amir, Davide Corsi, Raz Yerushalmi, Luca Marzari, David Harel, Alessandro Farinelli, and Guy Katz. Verifying Learning-Based Robotic Navigation Systems. In Proc. 29th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 607–627, 2023. doi:10.1007/978-3-031-30823-9_31.
- [4] Cezar-Constantin Andrici, Stefan Ciobaca, Catalin Hritcu, Guido Martínez, Exequiel Rivas, Éric Tanter, and Théo Winterhalter. Securing verified IO programs against unverified code in F*. Proc. 51st Symposium on Principles of Programming Languages (POPL), pages 2226–2259, 2024. doi:10.1145/3632916.
- [5] Haniel Barbosa, Andrew Reynolds, Gereon Kremer, Hanna Lachnitt, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Arjun Viswanathan, Scott Viteri, Yoni Zohar, Cesare Tinelli, and Clark Barrett. Flexible Proof Production in an Industrial-Strength SMT Solver. In Proc. 11th Int. Joint Conference on Automated Reasoning (IJCAR), pages 15–35, 2022. doi:10.1007/978-3-031-10769-6_3.
- [6] Clark Barrett, Leonardo de Moura, and Pascal Fontaine. Proofs in Satisfiability Modulo Theories. All about Proofs, Proofs for All, 55(1):23–44, 2015.
- [7] O Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring Neural Net Robustness with Constraints. In Proc. 30th Conf. on Neural Information Processing Systems (NeurIPS), 2016.
- [8] Frédéric Besson, Pierre-Emmanuel Cornilleau, and David Pichardie. Modular SMT Proofs for Fast Reflexive Checking inside Coq. In Proc. 38th Symposium on Principles of Programming Languages (POPL), pages 151–166, 2011. doi:10.1007/978-3-642-25379-9_13.
- [9] Sascha Böhme and Tjark Weber. Fast LCF-Style Proof Reconstruction for Z3. In Proc. 1st Int. Conf. on Interactive Theorem Proving (ITP), pages 179–194, 2010. doi:10.1007/978-3-642-14052-5_14.
- [10] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba. End to End Learning for Self-Driving Cars, 2016. Technical Report. http://arxiv.org/abs/1604.07316. arXiv:1604.07316.
- [11] Ralph Bottesch, Max W. Haslbeck, and René Thiemann. Farkas’ Lemma and Motzkin’s Transposition Theorem. Archive of Formal Proofs, 2019.
- [12] Robert S. Boyer and J. Strother Moore. A Computational Logic. ACM Monograph Series. Academic Press, New York, 1979.
- [13] Christopher Brix, Stanley Bak, Taylor T Johnson, and Haoze Wu. The Fifth International Verification of Neural Networks Competition (VNN-COMP 2024): Summary and Results, 2024. Technical report. http://arxiv.org/abs/2412.19985. doi:10.48550/arXiv.2412.19985.
- [14] Christopher Brix, Mark Niklas Müller, Stanley Bak, Taylor T Johnson, and Changliu Liu. First Three Years of the International Verification of Neural Networks Competition (VNN-COMP), 2023. Technical report. http://arxiv.org/abs/2301.05815. doi:10.48550/arXiv.2301.05815.
- [15] L. Cordeiro, M. Daggitt, J. Girard, O. Isac, T. Johnson, G. Katz, E. Komendantskaya, E. Manino, A. Sinkarovs, and H. Wu. Neural Network Verification is a Programming Language Challenge. In Proc. 34th European Symposium on Programming (ESOP), 2025.
- [16] Matthew L. Daggitt, Wen Kokke, Robert Atkey, Ekaterina Komendantskaya, Natalia Slusarz, and Luca Arnaboldi. Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs. In Proc. 10th Int. Conf. on Formal Structures for Computation and Deductionn, (FSCD), 2025.
- [17] G. Dantzig. Linear Programming and Extensions. Princeton University Press, 1963.
- [18] L. de Moura and N. Bjørner. Satisfiability Modulo Theories: Introduction and Applications. Communications of the ACM, 54(9):69–77, 2011. doi:10.1145/1995376.1995394.
- [19] Remi Desmartin, Grant O. Passmore, and Ekaterina Komendantskaya. Neural Networks in Imandra: Matrix Representation as a Verification Choice. In Proc. 5th Int. Workshop of Software Verification and Formal Methods for ML-Enabled Autonomous Systems (FoMLAS) and 15th Int. Workshop on Numerical Software Verification (NSV), pages 78–95, 2022. doi:10.1007/978-3-031-21222-2_6.
- [20] Remi Desmartin, Grant O. Passmore, Ekaterina Komendantskaya, and Matthew Daggit. CheckINN: Wide Range Neural Network Verification in Imandra. In Proc. 24th Int. Symposium on Principles and Practice of Declarative Programming (PPDP), pages 3:1–3:14, 2022. doi:10.1145/3551357.3551372.
- [21] B. Dutertre and L. de Moura. A Fast Linear-Arithmetic Solver for DPLL(T). In Proc. 18th Int. Conf. on Computer Aided Verification (CAV), pages 81–94, 2006.
- [22] R. Ehlers. Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks. In Proc. 15th Int. Symp. on Automated Technology for Verification and Analysis (ATVA), pages 269–286, 2017.
- [23] Burak Ekici, Alain Mebsout, Cesare Tinelli, Chantal Keller, Guy Katz, Andrew Reynolds, and Clark Barrett. SMTCoq: A Plug-in for Integrating SMT Solvers into Coq. In Proc. 29th Int. Conf. Computer Aided Verification (CAV 2017), pages 126–133, 2017. doi:10.1007/978-3-319-63390-9_7.
- [24] Yizhak Elboher, Raya Elsaleh, Omri Isac, Mélanie Ducoffe, Audrey Galametz, Guillaume Povéda, Ryma Boumazouza, Noémie Cohen, and Guy Katz. Robustness Assessment of a Runway Object Classifier for Safe Aircraft Taxiing. In Proc. 43rd Int Digital Avionics Systems Conf. (DASC), pages 1–6, 2024.
- [25] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation. In Proc. 39th IEEE Symposium on Security and Privacy (SP), pages 3–18, 2018. doi:10.1109/SP.2018.00058.
- [26] Divya Gopinath, Luca Lungeanu, Ravi Mangal, Corina Pasareanu, Siqi Xie, and Huafeng Yu. Feature-Guided Analysis of Neural Networks. In Proc. 26th Int. Conf. on Fundamental Approaches to Software Engineering (FASE), pages 133–142, 2023. doi:10.1007/978-3-031-30826-0_7.
- [27] O. Isac, C. Barrett, M. Zhang, and G. Katz. Neural Network Verification with Proof Production. In Proc. 22nd Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 38–48, 2022.
- [28] Omri Isac, Yoni Zohar, Clark Barrett, and Guy Katz. DNN Verification, Reachability, and the Exponential Function Problem. In Proc. 34th Int. Conf. on Concurrency Theory (CONCUR), 2023.
- [29] K. Jia and M. Rinard. Exploiting Verified Neural Networks via Floating Point Numerical Error. In Proc. 28th Int. Static Analysis Symposium (SAS), pages 191–205, 2021.
- [30] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: a Calculus for Reasoning about Deep Neural Networks. Formal Methods in System Design (FMSD), 2021.
- [31] Ekaterina Komendantskaya. Proof-Carrying Neuro-Symbolic Code. In Computability in Europe (CiE), 2025.
- [32] Denis Merigoux, Nicolas Chataing, and Jonathan Protzenko. Catala: A Programming Language for the Law. Proceedings of the ACM on Programming Languages, 5:77:1–77:29, 2021. doi:10.1145/3473582.
- [33] Kedar S. Namjoshi and Lenore D. Zuck. Program Correctness through Self-Certification. Communications of the ACM, pages 74–84, 2025. doi:10.1145/3689624.
- [34] G. Necula. Compiling with Proofs. Carnegie Mellon University, 1998.
- [35] George Necula. Proof-carrying code. In Proc. 24th Symposium on Principles of Programming Languages (POPL), pages 106–119, 1997. doi:10.1145/263699.263712.
- [36] Grant Passmore. ACL2 Proofs of Nonlinear Inequalities with Imandra. Electronic Proceedings in Theoretical Computer Science, 393:151–160, 2023. doi:10.4204/EPTCS.393.12.
- [37] Grant Passmore, Simon Cruanes, Denis Ignatovich, Dave Aitken, Matt Bray, Elijah Kagan, Kostya Kanishev, Ewen Maclean, and Nicola Mometto. The Imandra Automated Reasoning System (System Description). In Proc. 10th Int. Joint Conf. Automated Reasoning (IJCAR), pages 464–471, 2020. doi:10.1007/978-3-030-51054-1_30.
- [38] Grant Olney Passmore. Some Lessons Learned in the Industrialization of Formal Methods for Financial Algorithms. In Proc. 24th Int. Symposium on Formal Methods (FM), pages 717–721, 2021. doi:10.1007/978-3-030-90870-6_39.
- [39] Kazuhiko Sakaguchi. vass. https://github.com/pi8027/vass, 2025.
- [40] Marco Sälzer and Martin Lange. Reachability Is NP-Complete Even for the Simplest Neural Networks. In Proc. 15th Int. Conf. on Reachability Problems (RP), pages 149–164, 2021. doi:10.1007/978-3-030-89716-1_10.
- [41] Kenji Suzuki. Overview of Deep Learning in Medical Imaging. Radiological Physics and Technology, 10(3):257–273, 2017.
- [42] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing Properties of Neural Networks, 2013. Technical report. http://arxiv.org/abs/1312.6199.
- [43] R. Vanderbei. Linear Programming: Foundations and Extensions. Journal of the Operational Research Society, 1996.
- [44] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Neural Network Robustness Verification. Advances in Neural Information Processing Systems, 34:29909–29921, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/fac7fead96dafceaf80c1daffeae82a4-Abstract.html.
- [45] H. Wu, O. Isac, A. Zeljić, T. Tagomori, M. Daggitt, W. Kokke, I. Refaeli, G. Amir, K. Julian, S. Bassan, P. Huang, O. Lahav, M. Wu, M. Zhang, E. Komendantskaya, G. Katz, and C. Barrett. Marabou 2.0: A Versatile Formal Analyzer of Neural Networks. In Proc. 36th Int. Conf. on Computer Aided Verification (CAV), pages 249–264, 2024.
- [46] Dániel Zombori, Balázs Bánhelyi, Tibor Csendes, István Megyeri, and Márk Jelasity. Fooling a Complete Neural Network Verifier. In Proc. 9th Int. Conf. on Learning Representations (ICLR), 2021.
