Resolution with Counting: Dag-Like Lower Bounds and Different Moduli

Resolution over linear equations is a natural extension of the popular resolution refutation system, augmented with the ability to carry out basic counting. Denoted $\mathrm{Res}(\mathrm{lin}_R)$, this refutation system operates with disjunctions of linear equations with Boolean variables over a ring $R$, to refute unsatisfiable sets of such disjunctions. Beginning in the work of Raz & Tzameret (2008), through the work of Itsykson & Sokolov (2020) which focused on tree-like lower bounds, this refutation system was shown to be fairly strong. Subsequent work (cf. Garlik & Kołodziejczyk 2018; Itsykson & Sokolov 2020; Krajíček 2017; Krajíček & Oliveira 2018) made it evident that establishing lower bounds against general $\mathrm{Res}(\mathrm{lin}_R)$ refutations is a challenging and interesting task since the system captures a "minimal" extension of resolution with counting gates for which no super-polynomial lower bounds are known to date. We provide the first super-polynomial size lower bounds against general (dag-like) resolution over linear equations refutations in the large characteristic regime.
In particular, we prove that the subset-sum principle $1+\sum_{i=1}^{n}2^i x_i = 0$ requires refutations of exponential size over $\mathbb{Q}$. We use a novel lower bound technique: We show that under certain conditions every refutation of a subset-sum instance $f=0$ must pass through a fat clause consisting of the equation $f=\alpha$ for every $\alpha$ in the image of $f$ under Boolean assignments, or can be efficiently reduced to a proof containing such a clause. We then modify this approach to prove exponential lower bounds against tree-like refutations of any subset-sum instance that depends on $n$ variables, hence also separating tree-like from dag-like refutations over the rationals.
We then turn to the finite fields regime, showing that the work of Itsykson & Sokolov (2020), where tree-like lower bounds over $\mathbb{F}_2$ were obtained, can be carried over and extended to every finite field. We establish new lower bounds and separations as follows: (i) For every pair of distinct primes $p,q$, there exist CNF formulas with short tree-like refutations in $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_p})$ that require exponential-size tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_q})$ refutations; (ii) random $k$-CNF formulas require exponential-size tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_p})$ refutations, for every prime $p$ and constant $k$; and (iii) exponential-size lower bounds for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutations of the pigeonhole principle, for every field $\mathbb{F}$.


Introduction
The resolution refutation system is among the most prominent and well-studied propositional proof systems, and for good reasons: It is a natural and simple refutation system, that, at least in practice, is capable of being easily automatized. Furthermore, while being non-trivial, it is simple enough to succumb to many lower bound techniques.
Formally, a resolution refutation of an unsatisfiable CNF formula is a sequence of clauses $D_1, \dots, D_l = \emptyset$, where $\emptyset$ is the empty clause, such that each $D_i$ is either a clause of the CNF or is derived from previous clauses $D_j, D_k$, $j \le k < i$, by applying the following resolution rule: from the clauses $C \lor x$ and $D \lor \neg x$ derive $C \lor D$.
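The resolution rule just defined can be illustrated with a minimal Python sketch. The encoding of literals as signed integers and the helper name `resolve` are assumptions made only for this illustration; they are not part of the formal definition.

```python
from typing import FrozenSet, Optional

# Assumed encoding: the literal x_i is the integer +i, its negation is -i.
Clause = FrozenSet[int]

def resolve(c1: Clause, c2: Clause, var: int) -> Optional[Clause]:
    """From c1 = C v x and c2 = D v (not x) derive C v D; None if the rule does not apply."""
    if var not in c1 or -var not in c2:
        return None
    return (c1 - {var}) | (c2 - {-var})

# A small unsatisfiable CNF: (x1 v x2), (not x1 v x2), (not x2).
d1 = frozenset({1, 2})
d2 = frozenset({-1, 2})
d3 = frozenset({-2})
d4 = resolve(d1, d2, 1)   # resolve on x1
d5 = resolve(d4, d3, 2)   # resolve on x2, reaching the empty clause
```

The sequence d1, ..., d5 is a resolution refutation in the sense above: the last clause is the empty clause.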
The tree-like version of resolution, where every occurrence of a clause in the refutation is used at most once as a premise of a rule, is of particular importance, since it helps us to understand certain kinds of satisfiability algorithms known as DPLL algorithms (cf. Nordström 2015). DPLL algorithms are simple recursive algorithms for solving SAT that are the basis of successful contemporary SAT solvers. The transcript of a run of DPLL on an unsatisfiable formula is a decision tree, which can be interpreted as a tree-like resolution refutation. Thus, lower bounds on the size of tree-like resolution refutations imply lower bounds on the runtime of DPLL algorithms (though it is important to clarify that contemporary SAT solvers utilize more than the strength of tree-like resolution).
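To make the connection between DPLL runs and decision trees concrete, here is a bare-bones DPLL sketch in Python that also counts the nodes of its recursion tree. It is a simplified illustration (no unit propagation or clause learning, and a naive branching heuristic chosen here for determinism), not a description of any particular solver; the function names are hypothetical.

```python
def simplify(clauses, lit):
    """Set literal `lit` to true: drop satisfied clauses, delete the falsified literal."""
    return [c - {-lit} for c in clauses if lit not in c]

def dpll(clauses):
    """Return (satisfiable, number_of_nodes_in_the_recursion_tree).

    Clauses are frozensets of nonzero integers (+i for x_i, -i for its negation).
    """
    if not clauses:
        return True, 1                      # every clause is satisfied
    if any(len(c) == 0 for c in clauses):
        return False, 1                     # some clause is falsified: a leaf
    var = min(abs(l) for l in clauses[0])   # naive deterministic branching choice
    nodes = 1
    for lit in (var, -var):
        sat, sub = dpll(simplify(clauses, lit))
        nodes += sub
        if sat:
            return True, nodes
    return False, nodes

cnf = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
result = dpll(cnf)
```

On an unsatisfiable input every leaf of the recursion tree corresponds to a falsified clause, which is what allows the tree to be read as a tree-like resolution refutation.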
In contrast to the apparent practical success of SAT solvers, a variety of hard instances that require exponential-size refutations have been found for resolution over the years. Many classes of such hard instances are based on principles expressing some sort of counting. One famous example is the pigeonhole principle, denoted $\mathrm{PHP}^m_n$, expressing that there is no (total) injective map from a set with cardinality $m$ to a set with cardinality $n$ if $m > n$ (Haken 1985). Another important example is Tseitin tautologies, denoted $\mathrm{TS}_G$, expressing that the sum of the degrees of vertices in a graph $G$ must be even (Tseitin 1968).
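For concreteness, the following Python sketch generates one common CNF encoding of the negated pigeonhole principle and checks unsatisfiability by brute force for a tiny case. The choice of encoding (pigeon clauses plus hole clauses) and the function names are assumptions of this sketch; other encodings exist.

```python
from itertools import combinations, product

def php_cnf(m, n):
    """Clauses stating that m pigeons fit injectively into n holes.

    The variable (i, j) means "pigeon i sits in hole j"; a literal is a pair
    (variable, polarity).
    """
    clauses = []
    for i in range(m):                         # every pigeon gets some hole
        clauses.append([((i, j), True) for j in range(n)])
    for j in range(n):                         # no two pigeons share a hole
        for i1, i2 in combinations(range(m), 2):
            clauses.append([((i1, j), False), ((i2, j), False)])
    return clauses

def satisfiable(clauses, m, n):
    """Brute-force satisfiability test (exponential; only for tiny m, n)."""
    variables = [(i, j) for i in range(m) for j in range(n)]
    for bits in product([False, True], repeat=len(variables)):
        rho = dict(zip(variables, bits))
        if all(any(rho[v] == pol for v, pol in c) for c in clauses):
            return True
    return False
```

For example, `satisfiable(php_cnf(3, 2), 3, 2)` evaluates to `False`, while the same test with two pigeons and two holes succeeds.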
Since such counting tautologies are a source of hard instances for resolution, it is useful to study extensions of resolution that can efficiently count, so to speak. This is important firstly, because such systems may become the basis of more efficient SAT solvers and secondly, in order to extend the frontiers of lower bound techniques against stronger and stronger propositional proof systems. Indeed, there are many works dedicated to the study of weak systems operating with De Morgan formulas with counting connectives; these are variations of resolution that operate with disjunctions of certain arithmetic expressions.
One such extension of resolution was introduced in Raz & Tzameret (2008) under the name resolution over linear equations in which literals are replaced by linear equations. Specifically, the system R(lin), which operates with disjunctions of linear equations over Z was studied in Raz & Tzameret (2008). This work demonstrated the power of resolution with counting over the integers, and specifically provided polynomial upper bounds for the pigeonhole principle and the Tseitin formulas, as well as other basic counting formulas. It also established exponential lower bounds for a subsystem of R(lin), denoted R 0 (lin). Subsequently, Itsykson & Sokolov (2020) studied resolution over linear equations over F 2 , denoted Res(⊕). They demonstrated the power of resolution with counting mod 2 as well as its limitations by means of several upper and tree-like lower bounds. Moreover, Itsykson & Sokolov (2020) introduced DPLL algorithms, which can "branch" on arbitrary linear forms over F 2 , as well as parity decision trees, and showed a correspondence between parity decision trees and tree-like Res(⊕) refutations. In both Raz & Tzameret (2008) and Itsykson & Sokolov (2020) the dag-like lower bound question for resolution over linear equations remained open.
Apart from being a very natural refutation system, understanding the proof complexity of resolution over linear equations is important for the following reason: proving super-polynomial dag-like lower bounds against resolution over linear equations for prime fields and for the integers can be viewed as a first step towards the long-standing open problems of AC 0 [p]-Frege and TC 0 -Frege lower bounds, respectively. We explain this in what follows.
Resolution operates with clauses, which are De Morgan formulas (¬, unbounded fan-in ∨ and ∧) of a particular kind, namely, of depth 1. Thus, from the perspective of proof complexity, resolution is a fairly weak version of the propositional calculus, where the latter operates with arbitrary De Morgan formulas. Under a natural and general definition, propositional calculus systems go under the name Frege systems: they can be (axiomatic) Hilbert-style systems or sequent calculus style systems. The particular choice of formalism is not important: a classical result by Reckhow (1976) assures us that all Frege systems are polynomially equivalent. The task of proving lower bounds for general Frege systems is notoriously hard: no nontrivial lower bounds are known to date. Basically, the strongest fragments of Frege systems for which lower bounds are known are $\mathrm{AC}^0$-Frege systems, which are Frege proofs operating with constant-depth formulas. For example, neither $\mathrm{PHP}^m_n$ nor $\mathrm{TS}_G$ admits sub-exponential proofs in $\mathrm{AC}^0$-Frege (Ajtai 1988; Ben-Sasson 2002; Håstad 2017; Krajíček et al. 1995; Pitassi et al. 1993). However, if we extend the De Morgan language with counting connectives such as unbounded fan-in mod $p$ ($\mathrm{AC}^0[p]$-Frege) or threshold gates ($\mathrm{TC}^0$-Frege), then we step again into the darkness: proving super-polynomial lower bounds for these systems is a long-standing open problem on what can be characterized as the "frontiers" of proof complexity. Recent works by Krajíček (2017), Garlik & Kołodziejczyk (2018) and Krajíček & Oliveira (2018) have suggested possible approaches to attack dag-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_2})$ lower bounds (though this problem remains open to date).

Our results and techniques.
In this work, we prove a host of new lower bounds, separations and upper bounds for resolution over linear equations. Our first technical contribution is a dag-like refutation lower bound over large characteristic fields. Conceptually, the proof idea exploits two main properties that recently have been found useful in proof complexity:
(i) Single axiom: the hard instance consists of a single axiom that is unsatisfiable over Boolean assignments (unlike, for instance, a set of clauses):
$$1 + \sum_{i=1}^{n} 2^{i} x_i = 0. \tag{1.1}$$
(ii) Large coefficients: the hard instance uses coefficients of exponential magnitude (though their bit size is polynomial, and we consider coefficients to be written in binary in proofs).
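Both properties of instance (1.1) are easy to check numerically for small n. The sketch below is only a sanity check; the helper names `f` and `image` are introduced here for illustration. It confirms that the left-hand side is always odd (hence never zero), that distinct 0-1 assignments give distinct values, and that the largest coefficient needs only n + 1 bits.

```python
from itertools import product

def f(x):
    """Evaluate 1 + sum_{i=1}^n 2^i * x_i at a 0-1 vector x = (x_1, ..., x_n)."""
    return 1 + sum(2 ** i * xi for i, xi in enumerate(x, start=1))

def image(n):
    """The set of values of f over all 0-1 assignments to n variables."""
    return {f(x) for x in product((0, 1), repeat=n)}

n = 4
values = image(n)
unsatisfiable = 0 not in values              # f = 0 has no Boolean solution
all_odd = all(v % 2 == 1 for v in values)    # the reason: f is always odd
injective = len(values) == 2 ** n            # 2^n distinct values
coeff_bits = (2 ** n).bit_length()           # bit size of the largest coefficient
```

So the image of the linear form has exponential size even though every coefficient has polynomial bit size.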
Although employing different approaches, both of these properties played a recent role in proof complexity lower bounds. Forbes et al. (2016) used subset-sum variants (that is, unsatisfiable linear equations with Boolean variables) to establish lower bounds on subsystems of the ideal proof system (IPS) over large characteristic fields, where IPS is the strong proof system introduced in Grochow & Pitassi (2018). It is essential in both Forbes et al. (2016) and our work that the hard instance takes the form of a single unsatisfiable axiom. Subsequently, in a recent work, Alekseev et al. (2020) established conditional exponential-size lower bounds on full IPS refutations over the rationals of the same subset-sum instance (1.1), where the use of big coefficients is again essential to the lower bound. We explain our dag-like lower bound in Section 1.1.2.
The next contribution we make is a systematic development of new kinds of lower bound techniques against tree-like resolution over linear equations, both over the rationals and over finite fields. To this end we develop new combinatorial techniques and extend existing ones, such as the Prover-Delayer game method as originated in Pudlák & Impagliazzo (2000) for resolution, and developed further in Itsykson & Sokolov (2020). Moreover, we provide new applications in proof complexity of different combinatorial results; these include bounds on the size of essential coverings of the hypercube from Linial & Radhakrishnan (2005), a result about the hyperplane coverings of the hypercube from Alon & Füredi (1993) and the notion of immunity from Alekhnovich & Razborov (2001). We further non-trivially extend the well-established principle of size-width trade-offs in resolution (Ben-Sasson & Wigderson 2001) to the setting of $\mathrm{Res}(\mathrm{lin}_R)$ (though it is important to note that most of our lower bounds do not follow from this trade-off result).

Background.
For a ring $R$, the refutation system $\mathrm{Res}(\mathrm{lin}_R)$ is defined as an extension of the resolution refutation system as follows (see Raz & Tzameret 2008). The proof-lines of $\mathrm{Res}(\mathrm{lin}_R)$ are called linear clauses (sometimes simply clauses), which are defined as disjunctions of linear equations with duplicate linear equations contracted. More formally, they are disjunctions of the form:
$$\Big(\sum_{i=1}^{n} a_{1i} x_i + b_1 = 0\Big) \lor \dots \lor \Big(\sum_{i=1}^{n} a_{ki} x_i + b_k = 0\Big),$$
where $k$ is some number (the width of the clause), and $a_{ji}, b_j \in R$. The resolution rule is the following: from $C \lor (f = 0)$ and $D \lor (g = 0)$ derive $C \lor D \lor (\alpha f + \beta g = 0)$, where $\alpha, \beta \in R$, and where $C, D$ are linear clauses. A $\mathrm{Res}(\mathrm{lin}_R)$ refutation of a set of linear clauses $C_1, \dots, C_m$ that is unsatisfiable over 0-1 assignments is a sequence of proof-lines, where each proof-line is either $C_i$, for $i \in [m]$, a Boolean axiom $(x_i = 0 \lor x_i = 1)$ for some variable $x_i$, or was derived from previous proof-lines by the above resolution rule, or by the weakening rule that allows extending clauses with arbitrary disjuncts, or by a simplification rule that allows discarding false constant linear equations (e.g., $1 = 0$) from a linear clause (duplicate linear equations are contracted automatically whenever an inference rule produces them). The last proof-line in a refutation is the empty clause (standing for the truth value false).
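The rules above can be exercised on a toy example. The sketch below works over the rationals and assumes that the resolution rule has the form "from C ∨ (f = 0) and D ∨ (g = 0) derive C ∨ D ∨ (αf + βg = 0)"; the representation of equations as coefficient tuples and all function names are choices made for this illustration only. It refutes the single axiom 2x_1 + 1 = 0 using one Boolean axiom.

```python
from fractions import Fraction

def lin(coeffs, const):
    """The linear equation sum_i coeffs[i]*x_i + const = 0 over Q."""
    return (tuple(Fraction(a) for a in coeffs), Fraction(const))

def combine(f, g, alpha, beta):
    """The linear equation alpha*f + beta*g = 0."""
    (a, b), (c, d) = f, g
    return (tuple(alpha * u + beta * v for u, v in zip(a, c)), alpha * b + beta * d)

def resolve(C, f, D, g, alpha, beta):
    """Assumed rule: from C v (f=0) and D v (g=0) derive C v D v (alpha*f + beta*g = 0)."""
    new_eq = combine(f, g, Fraction(alpha), Fraction(beta))
    return frozenset(C) | frozenset(D) | {new_eq}

def simplify(clause):
    """Simplification rule: discard false constant equations such as 1 = 0."""
    return frozenset(e for e in clause if any(e[0]) or e[1] == 0)

axiom = lin([2], 1)      # 2*x1 + 1 = 0, unsatisfiable over 0-1
x_is_0 = lin([1], 0)     # x1 = 0
x_is_1 = lin([1], -1)    # x1 - 1 = 0

# Resolve the axiom against the Boolean axiom (x1 = 0) v (x1 = 1) on x1 = 0.
step1 = simplify(resolve([], axiom, [x_is_1], x_is_0, 1, -2))
# Resolve the axiom against the remaining disjunct x1 = 1.
step2 = simplify(resolve([], axiom, [], x_is_1, 1, -2))
```

Here `step1` is the one-equation clause x_1 = 1 and `step2` is the empty clause, so the three derived lines form a (tiny) refutation under the assumed rule.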
The size of a $\mathrm{Res}(\mathrm{lin}_R)$ refutation is the total size of all the clauses in the derivation, where the size of a clause is defined to be the total number of occurrences of variables in it plus the total size of all the coefficients occurring in the clause. The size of a coefficient when using integers (or integers embedded in characteristic zero rings) is the standard size of the binary representation of integers. Nevertheless, when we talk about "big" or "exponential" coefficients, "polynomially bounded" coefficients, and so on, we refer to the magnitude of the coefficients rather than to their bit size.
We are generally interested in the following questions: (Q1) For a given ring R, what kind of counting can be efficiently performed in Res(lin R ) and tree-like Res(lin R )?
(Q2) Can dag-like Res(lin R ) be separated from tree-like Res(lin R )?
(Q3) Can tree-like systems for different rings R be separated?
Tree-like Res(lin R ) with semantic weakening. In order to be able to do some non-trivial counting in tree-like versions of resolution over linear equations, we define a semantic version of the system as follows.
The system $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ is obtained from $\mathrm{Res}(\mathrm{lin}_R)$ by replacing the weakening and the simplification rules, as well as the Boolean axioms, with the semantic weakening rule (the symbol $\models$ will denote in this work semantic implication with respect to 0-1 assignments; see Footnote 1):
$$\frac{C}{D} \quad (C \models D).$$
The reason for studying $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ is mainly the following: In case $\mathbb{F}$ is a field of characteristic 0, the possibility to do counting in tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ is quite limited. For instance, we show that $2x_1 + \dots + 2x_n = 1$ requires refutations of size exponential in $n$ (Theorem 8.6). On the other hand, such contradictions do admit short tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutations in the presence of the following generalized Boolean axioms (each of which is a tautological linear clause over 0-1 assignments):
$$\mathrm{Im}(f) := \bigvee_{A \in \mathrm{im}_2(f)} (f = A),$$
where $\mathrm{im}_2(f)$ is the image of a linear polynomial $f$ under 0-1 assignments. (Similar to the way the Boolean axioms $(x_i = 0) \lor (x_i = 1)$ state that the possible value of a variable is either zero or one, the $\mathrm{Im}(f)$ axiom states all the possible values that the linear polynomial $f$ can have.) Let $\Gamma$ be an arbitrary set of tautological $R$-linear clauses. Then, it is possible to show that lower bounds for tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ imply lower bounds for tree-like $\mathrm{Res}(\mathrm{lin}_R)$ with formulas in $\Gamma$ as axioms. If a lower bound holds for tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}})$ it also holds, in particular, for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ with the axioms $\mathrm{Im}(f)$, and this makes tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}})$ a useful system, lower bounds against which are sufficiently interesting.
Footnote 1: Let $k = \mathrm{char}(R)$ be the characteristic of the ring $R$. In case $k \notin \{1, 2, 3\}$, deciding whether an $R$-linear clause $D$ is a tautology (that is, holds for every 0-1 assignment to its variables) is at least as hard as deciding whether a 3-DNF is a tautology (because over characteristic $k \notin \{1, 2, 3\}$ linear equations can express the conjunction of three conjuncts). For this reason $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ proofs cannot be checked in polynomial time and thus $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ is not a Cook-Reckhow proof system unless P = coNP (namely, the correctness of proofs in the system cannot necessarily be checked in polynomial time, as required by a Cook-Reckhow propositional proof system (Cook & Reckhow 1979); see Section 2.2).

Characteristic zero lower bounds.
For characteristic zero fields, we will use mainly the rational number field $\mathbb{Q}$ (though many of the results hold over any characteristic zero ring). First, we show that over $\mathbb{Q}$, whenever $\alpha_1 x_1 + \dots + \alpha_n x_n + \beta = 0$ is unsatisfiable (over 0-1 assignments), it has polynomial dag-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations if the coefficients are polynomially bounded in magnitude, while it requires exponential dag-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations for some subset-sum instances with exponential magnitude coefficients. Note that $\alpha_1 x_1 + \dots + \alpha_n x_n + \beta = 0$ expresses the subset-sum principle: $\alpha_1 x_1 + \dots + \alpha_n x_n = -\beta$ is satisfiable iff there is a subset of the integral coefficients $\alpha_i$ whose sum is precisely $-\beta$. The lower bound is stated in the following theorem:
Theorem (Theorem 4.6; Main dag-like lower bound). Any $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutation of $2x_1 + 4x_2 + \dots + 2^n x_n + 1 = 0$ requires size $2^{\Omega(n)}$.
The proof of this theorem introduces a new lower bound technique. We show that every (dag- or tree-like) refutation $\pi$ of $2x_1 + 4x_2 + \dots + 2^n x_n + 1 = 0$ can be transformed without much increase in size into a derivation of a certain "fat" (exponential-size) clause $C_\pi$ from Boolean axioms only (see Footnote 2). In order to prove that $C_\pi$ is fat, we ensure that every disjunct $g = 0$ in $C_\pi$ has at most $2^{cn}$ satisfying Boolean assignments, for some constant $c < 1$. Because $C_\pi$ is derived from Boolean axioms alone, it must be a Boolean tautology, that is, it must have $2^n$ satisfying assignments. Since every disjunct in $C_\pi$ is satisfied by at most $2^{cn}$ assignments, the number of disjuncts in the clause is at least $2^{(1-c)n}$. Since our constructed derivation is not much larger than the original refutation, the size of the original refutation must be $2^{\Omega(n)}$. This proof relies in an essential way on the fact that the coefficients of the linear form have exponential magnitude. Indeed, every contradiction of the form $f = 0$ can be shown to admit polynomial-size dag-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations whenever the coefficients of $f$ are polynomially bounded. A natural question is whether in the case of bounded coefficients, $f = 0$ can be efficiently refuted already by tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations. The question turns out to be non-trivial, and we provide a negative answer:
Theorem (Theorem 8.6; Subset-sum tree-like lower bounds). Let $f$ be any linear polynomial over $\mathbb{Q}$, which depends on $n$ variables. Then every tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutation of $f = 0$ has size $2^{\Omega(\sqrt{n})}$.
Footnote 2: The notion of showing that a refutation must go through a fat (i.e., wide) clause is well established in resolution lower bounds. However, we note that our lower bound is completely different from the known size-width-based resolution lower bounds (as formulated in a generic way in the work of Ben-Sasson & Wigderson 2001).
The proof is in two stages. First, we use a transformation analogous to the one used for the dag-like lower bound to reduce the lower bound problem for refutations of $f = 0$ to a lower bound problem for derivations of clauses of a certain kind. Namely, we transform any tree-like refutation $\pi$ of $f = 0$ to a tree-like derivation of $C_\pi$ from Boolean axioms without much increase in size. The only difference is that this time we ensure that in every disjunct $g = 0$ of $C_\pi$, the linear polynomial $g$ depends on at least $n/2$ variables.
Second, we prove that tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ derivations of such a $C_\pi$ are large:
Theorem (Theorem 8.4). Any tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ derivation of any tautology of the form $\bigvee_{j \in [N]} g_j = 0$, for some positive $N$, where each $g_j$ is linear over $\mathbb{Q}$ and depends on at least $n/2$ variables, is of size $2^{\Omega(\sqrt{n})}$.
To prove this, as well as some other lower bounds, we extend the Prover-Delayer game technique as originated in Pudlák & Impagliazzo (2000) for resolution, and developed further in Itsykson & Sokolov (2020) for $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_2})$, to general rings, including characteristic zero rings (see Section 8.1). We define a non-trivial strategy for Delayer in the corresponding game and prove that it guarantees $\sqrt{n}$ coins using a bound on the size of essential coverings of the hypercube from Linial & Radhakrishnan (2005). The relation between Prover-Delayer games and tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations allows us to conclude that the size of tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ refutations must be $2^{\Omega(\sqrt{n})}$. Moreover, as a corollary of Theorem 8.4, we obtain a lower bound on tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ derivations (in contrast to refutations) of $\mathrm{Im}(f)$.
We also use Prover-Delayer games to prove an exponential-size $2^{\Omega(n)}$ lower bound on tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}})$ refutations of the pigeonhole principle $\mathrm{PHP}^m_n$ for every field $\mathbb{F}$ (including finite fields). This extends a previous result in Itsykson & Sokolov (2020) for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_2})$.
Theorem (Theorem 8.9; Pigeonhole principle lower bounds). Let $\mathbb{F}$ be any (possibly finite) field. Then every tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}})$ refutation of $\neg\mathrm{PHP}^m_n$ has size $2^{\Omega(\frac{n-1}{2})}$.
Together with the polynomial upper bounds for PHP m n refutations in dag-like Res(lin F ) for fields F of characteristic zero demonstrated in Raz & Tzameret (2008), Theorem 8.9 establishes a separation between dag-like Res(lin F ) and tree-like Res sw (lin F ) for characteristic zero fields, for the language of unsatisfiable formulas in CNF: Corollary. Over fields of characteristic zero F, Res(lin F ) has an exponential speed-up over tree-like Res(lin F ) as refutation systems for unsatisfiable formulas in CNF.
To prove Theorem 8.9 we need to prove that Delayer's strategy from Itsykson & Sokolov (2020) is successful over any field. This argument is new and uses a result from Alon & Füredi (1993) about the hyperplane coverings of the hypercube.
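We prove another separation between dag-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ and tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{Q}})$, as follows. For any ring $R$ we define the image avoidance principle to be:
$$\mathrm{ImAv}(x_1 + \dots + x_n) := \{\langle x_1 + \dots + x_n \neq k \rangle : k \in \{0, \dots, n\}\},$$
where $\langle x_1 + \dots + x_n \neq k \rangle := \bigvee_{k' \in \{0, \dots, n\},\, k' \neq k} (x_1 + \dots + x_n = k')$. In words, the image avoidance principle expresses the contradictory statement that for every $0 \le i \le n$, $x_1 + \dots + x_n$ equals some element in $\{0, \dots, n\} \setminus \{i\}$. In more generality, let $f$ be a linear form over $\mathbb{Q}$ and let $\mathrm{im}_2(f)$ be the image of $f$ under 0-1 assignments to its variables. Define $\langle f \neq A \rangle := \bigvee_{B \in \mathrm{im}_2(f),\, B \neq A} (f = B)$ and $\mathrm{ImAv}(f) := \{\langle f \neq A \rangle : A \in \mathrm{im}_2(f)\}$.
Corollary (Corollary 3.13). For every ring $R$ and every linear form $f$ the contradiction $\mathrm{ImAv}(f)$ admits polynomial-size $\mathrm{Res}(\mathrm{lin}_R)$ refutations.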
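The image avoidance principle is easy to experiment with for small linear forms. The following Python sketch assumes integer coefficients and represents the clause "f avoids A" simply by the set of values f is still allowed to take; the function names are hypothetical. It builds the clauses and confirms by brute force that no 0-1 assignment satisfies all of them.

```python
from itertools import product

def image(coeffs):
    """im_2(f): the values of f = sum_i coeffs[i]*x_i over all 0-1 assignments."""
    return {sum(a * x for a, x in zip(coeffs, xs))
            for xs in product((0, 1), repeat=len(coeffs))}

def im_av(coeffs):
    """One clause per A in im_2(f), stored as the set of allowed values B != A."""
    im = image(coeffs)
    return {A: im - {A} for A in im}

def satisfiable(coeffs):
    """Is there a 0-1 assignment satisfying every clause of ImAv(f)?"""
    clauses = im_av(coeffs)
    for xs in product((0, 1), repeat=len(coeffs)):
        value = sum(a * x for a, x in zip(coeffs, xs))
        if all(value in allowed for allowed in clauses.values()):
            return True
    return False
```

Every assignment gives f some value A in its image, and that assignment falsifies exactly the clause indexed by A, which is why `satisfiable` returns `False` for any choice of coefficients.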
The lower bound in Theorem 8.8 is one more novel application of the Prover-Delayer game argument, combined with the notion of immunity from Alekhnovich & Razborov (2001), as we now briefly explain.
Let $f$ be a linear form as in Theorem 8.8. We consider an instance of the Prover-Delayer game for $\mathrm{ImAv}(f)$. A position in the game is determined by a set $\Phi$ of linear non-equalities of the form $g \neq 0$, which we think of as the set of non-equalities learned up to this point by Prover. In the beginning $\Phi$ is empty. We define Delayer's strategy in such a way that for $\Phi$ an end-game position, there is a satisfiable subset $\Phi' = \{g_1 \neq 0, \dots, g_m \neq 0\} \subseteq \Phi$ such that $\Phi' \models f = A$ for some $A \in \mathbb{Q}$, and Delayer earns at least $|\Phi'| = m$ coins. Because $\mathbb{Q}$ is of characteristic zero, it follows that $f \equiv A + 1 \pmod 2 \models f \neq A \models g_1 \cdots g_m = 0$, and thus the $\frac{n}{4}$-immunity of $f \equiv A + 1 \pmod 2$ (Alekhnovich & Razborov 2001) implies $m \ge \frac{n}{4}$. To conclude, by a standard argument, if Delayer always earns $\frac{n}{4}$ coins, then the shortest proof is of size at least $2^{n/4}$. Table 1.1 sums up our knowledge up to this point with respect to $\mathbb{Q}$ (and for some cases any characteristic 0 field):

Finite fields lower bounds.
We now turn to resolution over linear equations in finite fields. We obtain many new tree-like lower bounds (see Table 1.2). We already discussed above lower bounds for the pigeonhole principle, which hold both for positive and zero characteristic. We furthermore prove a separation between tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_{p^k}})$ (resp. tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}_{p^k}})$) and tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_{q^l}})$ (resp. tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}_{q^l}})$) for every pair of distinct primes $p \neq q$ and every $k, l \in \mathbb{N} \setminus \{0\}$. The separating instances are mod $p$ Tseitin formulas $\mathrm{TS}^{(p)}_{G,\sigma}$ (written as CNFs), which are reformulations of the standard Tseitin graph formulas $\mathrm{TS}_G$ for counting mod $p$. Furthermore, we establish an exponential lower bound for tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}_{p^c}})$ on random $k$-CNFs.
The lower bounds for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ for finite fields $\mathbb{F}$ are obtained via a variant of the size-width relation for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ together with a translation to polynomial calculus over the field $\mathbb{F}$, denoted $\mathrm{PC}_{\mathbb{F}}$ (Clegg et al. 1996), such that $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ proofs of width $\omega$ are translated to $\mathrm{PC}_{\mathbb{F}}$ proofs of degree $\omega$ (the width $\omega$ of a clause is defined to be the total number of disjuncts in a clause). This establishes the lower bounds for the size of tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ proofs via known lower bounds on $\mathrm{PC}_{\mathbb{F}}$ degrees (Alekhnovich & Razborov 2001). We show that
$$S_{\text{t-l}\,\mathrm{Res}(\mathrm{lin}_R)}(\phi \vdash \bot) = 2^{\Omega(\omega_0(\phi \vdash \bot) - \omega_0(\phi))},$$
where $\omega_0$ is what we call the principal width, which counts the number of linear equations in clauses when we treat as identical those defining parallel hyperplanes, and $S_{\text{t-l}\,\mathrm{Res}(\mathrm{lin}_R)}(\phi \vdash \bot)$ denotes the minimal size of a tree-like $\mathrm{Res}(\mathrm{lin}_R)$ refutation of $\phi$. Specifically, over finite fields the following upper and lower bounds provide exponential separations:
Theorem (Theorem 9.3; Size-width relation). Let $\phi$ be an unsatisfiable set of linear clauses over a field $\mathbb{F}$. The following relation between principal width and size holds for both tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ and tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_{\mathbb{F}})$: $S(\phi \vdash \bot) = 2^{\Omega(\omega_0(\phi \vdash \bot) - \omega_0(\phi))}$.
If F is a finite field, then the same relation holds for the (standard) width of a clause ω.
This extends to every field a result (Garlik & Kołodziejczyk 2018, Theorem 14) where a size-width relation was shown for a system denoted tree-like $\mathrm{PK}^{id}_{O(1)}(\oplus)$, which is a system extending tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_2})$ by allowing arbitrary constant-depth De Morgan formulas as inputs to $\oplus$ (XOR gates) (though note that our result does not deal with arbitrary constant-depth formulas).
Theorem (Theorem 9.5). Let $\mathbb{F}$ be a field and $\pi$ be a $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutation of an unsatisfiable CNF formula $\phi$. Then, there exists a $\mathrm{PC}_{\mathbb{F}}$ refutation $\pi'$ of (the arithmetization of) $\phi$ of degree $\omega(\pi)$.
Corollary (Corollary 9.6; Tseitin mod p lower bounds). For any fixed prime $p$ there exists a constant $d_0 = d_0(p)$ such that the following holds. If $d \ge d_0$, $G$ is a $d$-regular directed graph satisfying certain expansion properties, and $\mathbb{F}$ is a finite field such that $\mathrm{char}(\mathbb{F}) \neq p$, then every tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutation of the Tseitin mod $p$ formula $\neg\mathrm{TS}^{(p)}_{G,\sigma}$ has size $2^{\Omega(dn)}$.
[...] and let $\mathbb{F}$ be a finite field. Then, every tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutation of $\phi$ has [...]
Remark 1.4. We stress that the size-width relation of Theorem 9.3 cannot be used for transferring $\mathrm{PC}_{\mathbb{F}}$ degree lower bounds to tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ size lower bounds in case $\mathrm{char}(\mathbb{F}) = 0$. This is due to the essential difference between principal width and width in this case. Thus, all the lower bounds that we prove using Prover-Delayer games techniques in case $\mathrm{char}(\mathbb{F}) = 0$ do not follow from lower bounds for $\mathrm{PC}_{\mathbb{F}}$.
Table 1.2 (caption, partially recovered): [...] and $\Delta$ is the clause density (number of clauses divided by the number of variables), $Ax = b$ stands for a linear system over $\mathbb{F}_{p^k}$ that has no 0-1 solutions in the first and the third rows, and in the second row the linear system $Ax = b$ is over $\mathbb{F}_2$. The notation TS [...] in the first and the third rows and for $\mathrm{TS}^{(2)}_{G,\sigma}$ in the second row. t-l $\mathrm{Res}(\mathrm{lin}_R)$ stands for tree-like $\mathrm{Res}(\mathrm{lin}_R)$, and $p \neq q$ are primes (in the second row and third column we assume q = 2).
Proposition (Proposition 3.14; Upper bounds on unsatisfiable linear systems). Let $\mathbb{F}$ be a field and assume that the linear system $Ax = b$ has no solutions over $\mathbb{F}$. Let $\phi$ be a CNF formula encoding the linear system $Ax = b$. Then, there exist tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutations of $\phi$ of size polynomial in the sum of sizes of encodings of all coefficients in $A$.
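A linear system that has no solution over the whole field always has a simple algebraic certificate: a linear combination of its equations that collapses to "0 = c" with c nonzero. The Python sketch below computes such a combination over a prime field by Gaussian elimination. It is meant only to illustrate why unsatisfiability over the field is an easy case; it is not the construction used in the proof of Proposition 3.14, and the function name and the restriction to prime fields are assumptions of this sketch.

```python
def refuting_combination(A, b, p):
    """Return y with y*A = 0 and y*b != 0 (mod p), or None if A x = b is solvable over F_p."""
    m, n = len(A), len(A[0])
    # Each row stores [row of A | entry of b | unit vector tracking the row operations].
    rows = [[a % p for a in A[i]] + [b[i] % p] + [int(i == k) for k in range(m)]
            for i in range(m)]
    pivot_row = 0
    for col in range(n):
        pr = next((r for r in range(pivot_row, m) if rows[r][col]), None)
        if pr is None:
            continue
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        inv = pow(rows[pivot_row][col], -1, p)   # modular inverse (Python 3.8+)
        rows[pivot_row] = [(inv * v) % p for v in rows[pivot_row]]
        for r in range(m):
            if r != pivot_row and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(v - factor * w) % p
                           for v, w in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    for row in rows:
        if not any(row[:n]) and row[n]:          # the row reads 0 = c with c != 0
            return row[n + 1:]
    return None

A, b, p = [[1, 1], [1, 1]], [0, 1], 5            # x1 + x2 = 0 and x1 + x2 = 1 over F_5
y = refuting_combination(A, b, p)
```

Systems that are solvable over the field, even if they have no 0-1 solution (for example x_1 + x_2 = 3 over F_5), yield `None`, so this kind of certificate says nothing about them.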
The upper bound in Proposition 3.14 applies only to linear systems that are unsatisfiable over the whole field F. But does any system A x = b over F that has a satisfying assignment over F, but not over 0-1 assignments, admit polynomial-size Res(lin F ) refutations?
For fields F with char(F) ≥ 5 or char(F) = 0 it is known that 0-1 satisfiability of A x = b is NP-complete (see Section 2.4). This means that unless coNP = NP there exist 0-1 unsatisfiable linear systems that require superpolynomial dag-like Res(lin F ) refutations. Moreover, the reduction R from k-UNSAT is such that φ ∈ k-UNSAT has Res(lin F ) refutations of size S iff the system R(φ) has Res(lin F ) refutations of size O(S). Thus, in general, proving lower bounds for linear systems can be as hard as proving lower bounds for CNFs: lower bounds for some linear systems imply lower bounds for CNFs.
A substantial part of this paper is devoted to the study of Res(lin F ) complexity of linear systems, for which we obtain both lower and upper bounds. This includes the lower bounds for Subset Sum principles: as our dag-like Res(lin F ) lower bounds described above (Theorem 4.6) show, in case char(F) = 0 there exists a 0-1 unsatisfiable family of linear systems, each consisting of a single equation f = 0 with coefficients growing exponentially in the number of variables n, that requires dag-like Res(lin F ) refutations of size exponential in n. But what if F is finite of fixed cardinality q? In this case it is easy to show that the simplest one-equation instance f = 0 is always 0-1 satisfiable (unless f depends on O(|F|) variables). Thus, a hard linear system $f_1 = 0, \dots, f_m = 0$ over a finite field F must contain several equations. Moreover, to obtain super-polynomial lower bounds the number of equations m must satisfy m = ω(log n), as implied by the following upper bound: […]
In case of finite fields, an unconditional explicit tree-like Res(lin F ) lower bound for linear systems can be obtained via P C F using the size-width relation for finite fields (Theorem 9.3) and Proposition 2.7. In particular, hard instances of the form A x = b can be constructed by applying the reduction R from k-SAT in the proof of NP-completeness of 0-1 satisfiability of linear systems to, say, mod 2 Tseitin formulas. Our work implies an exponential lower bound for the size of tree-like Res(lin F ) refutations of these systems (for large enough, but constant, characteristic), and we conjecture that they are hard for dag-like Res(lin F ) as well.

Nondeterministic linear decision trees.
There is a well-known size-preserving (up to a constant factor) correspondence between tree-like resolution refutations for unsatisfiable formulas φ and decision trees, which solve the following problem: given an assignment ρ for the variables of φ, determine which clause C ∈ φ is falsified by querying values of the variables under the assignment ρ. In Itsykson & Sokolov (2020) this correspondence was generalized to tree-like Res(⊕) refutations and parity decision trees. In Beame et al. (2018) an analogous correspondence was shown for tree-like R(CP) refutations and decision trees that branch on linear inequalities. In the current work, we initiate the study of linear decision trees and their properties over different characteristics, extending the correspondence of Itsykson & Sokolov (2020) to a correspondence between tree-like Res(lin R ) (and tree-like Res sw (lin R )) derivations and what we call nondeterministic linear decision trees (NLDT).
NLDTs for an unsatisfiable set of linear clauses φ are binary rooted trees, where every edge is labeled with a non-equality f ≠ 0 for a linear form f and every leaf is labeled with a linear clause C ∈ φ, which is violated by the non-equalities on the path from the root to the leaf. (Note that in the same manner that in a (Boolean) decision tree (which corresponds to a tree-like resolution refutation) we go along a path from the root to a leaf, choosing those edges that violate a literal $x_i$ or $\neg x_i$, in an NLDT we branch along a path that violates equalities f = 0, or equivalently, certifies non-equalities of the form f ≠ 0.)
Theorem (Theorem 6.3). If φ is an unsatisfiable CNF formula, then every tree-like Res(lin R ) or tree-like Res sw (lin R ) refutation can be transformed into a corresponding NLDT for φ of the same size up to a constant factor, and vice versa (note that the NLDTs for the two types of refutations are different).

Preliminaries
2.1. Notation. Denote by [n] the set {1, . . . , n}. We use $x_1, x_2, \dots$ to denote variables, both propositional and algebraic. Let f be a linear polynomial (equivalently, an affine function) over a ring R, that is, a function of the form $\sum_{i=1}^{n} a_i x_i + a_0$ with $a_i \in R$. We sometimes refer to a linear form as a hyperplane, since the assignments that nullify a linear form determine a hyperplane. We denote by $\mathrm{im}_2(f)$ the image of f under 0-1 assignments to its variables. […] $x_1, \dots, x_n$ variables, and $a_{ij}$, $b_i$'s ring elements (when the ring is specified in advance). We sometimes abuse notation by writing a linear equation as $\sum_{i=0}^{n} a_{1i} x_i = -b_1$ instead of $\sum_{i=0}^{n} a_{1i} x_i + b_1 = 0$. We assume that all the disjuncts in a linear clause are distinct.
For φ a set of clauses or linear clauses, vars(φ) denotes the set of variables occurring in φ and let Vars denote the set of all variables.
Let A be a matrix over a ring. We introduce the notation Ax b for a system of linear non-equalities, where a non-equality means ≠ (note the difference between Ax b, which stands for […]
If f is a linear polynomial over R and A is a matrix over R, denote by |f | the sum of sizes of encodings of coefficients in f and by |A| the sum of sizes of encodings of elements in A.
If φ is a set of linear clauses over a ring R and D is a linear clause over R, denote by $\bigwedge_{C\in\varphi} C \models D$ and $\bigwedge_{C\in\varphi} C \models_R D$ semantic entailment over 0-1 and R-valued assignments, respectively.
Let l be a linear polynomial not containing the variable x. If C is a linear clause, denote by $C_{x\leftarrow l}$ the linear clause, which is obtained from C by substituting l for x everywhere in C.
We define a linear substitution ρ to be a sequence $(x_1 \leftarrow l_1, \dots, x_n \leftarrow l_n)$ such that each linear polynomial $l_i$ does not depend on $x_i$. For a clause or a set of clauses φ, we define $\varphi_\rho := (\dots((\varphi_{x_1\leftarrow l_1})_{x_2\leftarrow l_2})\dots)_{x_n\leftarrow l_n}$.

Propositional proof systems.
A clause is an expression of the form l 1 ∨ · · · ∨ l k , where l i is a literal, where a literal is a propositional variable x or its negation ¬x. A formula is in Conjunctive Normal Form (CNF) if it is a conjunction of clauses. A CNF can thus be defined simply as a set of clauses. The choice of a reasonable binary encoding of sets of clauses allows us to define the language UNSAT ⊂ {0, 1} * of unsatisfiable propositional formulas in CNF. We sometimes interpret an element in UNSAT as a formula and sometimes as a set of clauses. Dually, a formula is in Disjunctive Normal Form (DNF) if it is a disjunction of conjunctions of literals and TAUT is the language of tautological propositional formulas in DNF. There is a bijection between TAUT and UNSAT, which preserves the size of the formula, given by negation.
A formula is in k-CNF (resp. k-DNF) if it is in CNF (resp. DNF) and every clause (resp. conjunct) has at most k literals. k-UNSAT (resp. k-TAUT) is the language of unsatisfiable (resp. tautological) formulas in k-CNF (resp. k-DNF).
In particular, a refutation system Π is a proof system for UNSAT. Post-composition with negation turns a propositional proof system into a refutation system and vice versa. Denote by S(π), and alternatively by |π|, the size of the binary encoding of a proof π in a proof system Π. For φ ∈ UNSAT and a refutation system Π denote by $S_\Pi(\varphi \vdash \bot)$ (we sometimes omit the subscript Π when it is clear from the context) the minimal size of a Π-refutation of φ.
The resolution system (which we denote also by Res) is a refutation system, based on the following rule, which allows one to derive new clauses from given ones: from $C \lor x$ and $D \lor \neg x$ derive $C \lor D$. A resolution derivation of a clause D from a set of clauses φ is a sequence of clauses $(D_1, \dots, D_s \equiv D)$ such that for every $1 \le i \le s$ either $D_i \in \varphi$ or $D_i$ is obtained from previous clauses by applying the resolution rule. A resolution refutation of φ ∈ UNSAT is a resolution derivation of the empty clause from φ, which stands for the truth value False.
A resolution derivation is tree-like if every clause in it is used at most once as a premise of a rule. Accordingly, tree-like resolution is the resolution system allowing only tree-like refutations.
Let F be a field. A polynomial calculus (Clegg et al. 1996) derivation of a polynomial $q \in \mathbb{F}[x_1, \dots, x_n]$ from a set of polynomials […] j − x j , $p_i \in P$ or $p_i$ is obtained from previous polynomials by applying one of the following rules: […] x n ] is a derivation of 1. The degree d(π) of a polynomial calculus derivation π is the maximal total degree of a polynomial appearing in it. This defines the proof system P C F for the language of unsatisfiable systems of polynomial equations over F. It can be turned into a proof system for k-UNSAT via arithmetization of clauses as follows:

Hard instances
G,σ are the CNF encoding of the following equations for all u ∈ V : […] Note that we use the standard encoding of Boolean functions as CNF formulas and the number of clauses, required to encode these […] To see this, note that if we sum (2.2) over all nodes u ∈ V we obtain precisely $\sum_{u\in V}\sigma(u)$, which is different from 0 mod p; but on the other hand, in this sum over all nodes u ∈ V each edge (u, v) ∈ E appears once with a positive sign as an outgoing edge from u and once with a negative sign as an incoming edge to v, meaning that the total sum is 0, which is a contradiction.
In particular, ¬TS G,σ are the classical Tseitin formulas (Tseitin 1968) and TS , expresses the fact that the sum of total degrees (incoming + outgoing) of the vertices is even.
The proof complexity of Tseitin tautologies depends on the properties of the graph G. For example, if G is just a union of K d+1 (the complete graphs on d + 1 vertices), then they are easy to prove. On the other hand, they are known to be hard for some proof systems if G satisfies certain expansion properties. Let Consider the following measure of expansion for r ≥ 1:

Random k-CNFs.
with n variables that is generated by picking randomly and independently Δ · n clauses from the set of all $\binom{n}{k}\cdot 2^k$ clauses.

Complexity of linear systems.
It is a well-known fact that deciding 0-1 satisfiability of linear systems over $\mathbb{F}_p$, p ≥ 5, or of linear systems over Q (even if coefficients are small) are NP-complete problems. Indeed, for example, the 3-clause $(x_1 \lor \neg x_2 \lor x_3)$ can be represented as the linear equation with additional Boolean variables $y_1, y_2$: $x_1 + (1 - x_2) + x_3 = 1 + y_1 + y_2$. In this way k-SAT reduces to 0-1 satisfiability of linear systems over a field of characteristic 0 or p > k.
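The encoding of the example clause can be checked exhaustively. The following is a minimal sketch, not part of the original development; the function names are ours, and the arithmetic is done over the integers (characteristic 0), which agrees with characteristic p > 3 here since both sides of the equation stay in {0, ..., 3}.

```python
from itertools import product

def clause_holds(x1, x2, x3):
    # The 3-clause (x1 OR NOT x2 OR x3) from the text.
    return bool(x1 or (not x2) or x3)

def equation_solvable(x1, x2, x3):
    # Is there a 0-1 choice of y1, y2 with x1 + (1 - x2) + x3 = 1 + y1 + y2 ?
    lhs = x1 + (1 - x2) + x3
    return any(lhs == 1 + y1 + y2 for y1, y2 in product((0, 1), repeat=2))

def encoding_is_faithful():
    # The clause is true exactly when the equation has a 0-1 extension.
    return all(
        clause_holds(*x) == equation_solvable(*x)
        for x in product((0, 1), repeat=3)
    )
```

The only falsifying assignment of the clause, $(x_1, x_2, x_3) = (0, 1, 0)$, makes the left-hand side 0, which no choice of $y_1, y_2$ can match.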
Theorem 2.6. The problem of deciding 0-1 satisfiability of linear systems over a field of characteristic 0 or p ≥ 5 is NP-complete. In case of characteristic 0 this also holds if the size of coefficients is required to be bounded by a constant.
The mapping R of k-CNFs to linear systems described above can be used to translate lower bounds on degree of P C F refutations from k-CNFs to linear systems.
Proof. Denote by σ the mapping from literals to linear polynomials such that: σ(x) := x and σ(¬x) := 1 − x. Let τ be the following mapping from clauses to linear polynomials: τ (l 1 ∨ · · · ∨ l s ) := σ(l 1 ) + · · · + σ(l s ) − 1 − y (1) […]
Assume L has a P C F refutation π of degree d. If $x_1, \dots, x_n$ are variables of φ, then all the auxiliary variables y […] . . . , x n ) of degree at most k such that $C_j \models (\tau(C_j)_{\rho_v}) = 0$, where $\rho_v$ stands for the substitution […] and the entailment is over 0-1 assignments. It is easy to see that π can be extended to the proof $\pi_{\rho_v}$ of degree at most k · d, where all the auxiliary variables are substituted with the corresponding polynomials. Due to implicational completeness of P C F , there are P C F derivations $\pi_j : C_j \vdash (\tau(C_j)_{\rho_v}) = 0$ of degree at most k. Composition of $\{\pi_j\}_{j\in[m]}$ with $\pi_{\rho_v}$ gives a P C F refutation of degree at most k · d.
Conversely, if π is a P C F refutation of φ of degree d, then the composition of derivations $\tau(C_j) = 0 \vdash C_j$ with π gives a refutation of L of degree at most max(k, d).
Remark. Note that the sizes of P C F proofs of φ and R(φ) in the construction in the proof of Proposition 2.7 are the same up to a factor depending on k.

Resolution over linear equations for general rings
In this section, we define and outline some basic properties of systems that are extensions of resolution, where clauses are disjunctions of linear equations over a ring R: […] Recall that disjunctions of this form are called linear clauses, and that we assume that all disjuncts are distinct, and hence we automatically contract duplicate linear equations in case a duplicate turns up once an inference rule is applied. We sometimes abuse notation by writing a linear equation […] are linear forms over R and C, D are linear clauses. Note that contraction of duplicate disjuncts is done automatically when applying the resolution and the weakening rules. The Boolean axioms are defined as follows: $x_i = 0 \lor x_i = 1$, for every variable $x_i$.
A Res(lin R ) derivation of a linear clause D from a set of linear clauses φ is a sequence of linear clauses $(D_1, \dots, D_s \equiv D)$ such that for every $1 \le i \le s$ either $D_i \in \varphi$, or $D_i$ is a Boolean axiom, or $D_i$ is obtained from previous clauses by applying one of the rules above. A Res(lin R ) refutation of an unsatisfiable set of linear clauses φ is a Res(lin R ) derivation of the empty clause (which stands for false) from φ. The size of a Res(lin R ) derivation is the total size of all the clauses in the derivation, where the size of a clause is defined to be the total number of occurrences of variables in it plus the total size of all the coefficients occurring in the clause. The size of a coefficient when using integers (or integers embedded in characteristic zero rings) will be the standard size of the binary representation of integers.
In this definition, we assume that R is a non-trivial (R ≠ 0) ring such that there are polynomial-time algorithms for addition, multiplication and taking additive inverses.
Along with size, we will be dealing with two complexity measures of derivations: width and principal width. Note that in the definition above the constant disjuncts a = 0, a ∈ R count toward the width and principal width.
Proposition 3.2. Res(lin R ) is sound and complete as a propositional proof system.
Proof. The soundness can be checked by inspecting that each rule of Res(lin R ) is sound. Completeness follows from a simple observation that Res(lin R ) simulates resolution: from C ∨ x i = 0 and D ∨ 1 − x i = 0 derive C ∨ D ∨ 1 = 0 and apply simplification.
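The simulation step in this proof can be sketched in a few lines. This is an illustration under our own assumptions, not the paper's formalism: a linear equation $a_0 + a_1 x_1 + \dots + a_n x_n = 0$ is represented as the tuple $(a_0, a_1, \dots, a_n)$, a linear clause as a frozenset of such tuples, and the resolution rule is taken in the form "from $C \lor f = 0$ and $D \lor g = 0$ derive $C \lor D \lor \alpha f + \beta g = 0$". The helper names `resolve` and `simplify` are hypothetical.

```python
def lin_comb(alpha, f, beta, g):
    # Coefficient-wise alpha*f + beta*g for two equations of equal length.
    return tuple(alpha * a + beta * b for a, b in zip(f, g))

def resolve(clause1, f, clause2, g, alpha=1, beta=1):
    # From (C v f=0) and (D v g=0) derive C v D v (alpha*f + beta*g = 0).
    assert f in clause1 and g in clause2
    rest = (clause1 - {f}) | (clause2 - {g})
    return rest | {lin_comb(alpha, f, beta, g)}

def simplify(clause):
    # Drop disjuncts "a = 0" with a a nonzero constant: they are never satisfied.
    return frozenset(
        e for e in clause
        if not (e[0] != 0 and all(c == 0 for c in e[1:]))
    )

# Variables (x1, y); tuples are (constant, coeff of x1, coeff of y).
c1 = frozenset({(0, 1, 0), (0, 0, 1)})     # x1 = 0  v  y = 0
c2 = frozenset({(1, -1, 0), (-1, 0, 1)})   # 1 - x1 = 0  v  y - 1 = 0
resolvent = simplify(resolve(c1, (0, 1, 0), c2, (1, -1, 0)))
```

Adding $x_1 = 0$ and $1 - x_1 = 0$ produces the constant disjunct $1 = 0$, which simplification removes, leaving only the side disjuncts of the two premises.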
In Section 6 (Corollary 6.5), we also prove implicational completeness of Res(lin R ) with respect to linear clauses.
We now define two systems of resolution with linear equations over a ring, where some of the rules are semantic: Res sw (lin R ) and Sem-Res(lin R ). Res sw (lin R ) is obtained from Res(lin R ) by replacing the Boolean axioms with 0 = 0, discarding the simplification rule and replacing the weakening rule with the following semantic weakening rule: […] The system Sem-Res(lin R ) has no axioms except for 0 = 0 and has only the following semantic resolution rule: […] It is easy to see that Res(lin R ) $\le_p$ Res sw (lin R ) $\le_p$ Sem-Res(lin R ), where $P \le_p Q$ denotes that Q polynomially simulates P .
In contrast to the case $R = \mathbb{F}_2$ (see Itsykson & Sokolov 2020), for rings R with char(R) ∉ {1, 2, 3} both Res sw (lin R ) and Sem-Res(lin R ) are not Cook-Reckhow proof systems, unless P = NP: […]
Proof. Consider a 3-DNF φ and encode every conjunct ( […] Then φ is tautological if and only if the disjunction of these linear equations is tautological (that is, for every 0-1 assignment to the variables at least one of the equations holds, when the equations are computed over a ring with characteristic zero or finite characteristic bigger than 3).
We leave it as an open question to determine the complexity of verifying a correct application of the semantic weakening in case char(R) = 3 or in case char(R) = 2 and $R \ne \mathbb{F}_2$. In the case $R = \mathbb{F}_2$ the negation of a clause is a system of linear equations and thus the existence of solutions for it can be checked in polynomial time. Therefore, Res sw (lin F 2 ) is a Cook-Reckhow propositional proof system. The definitions of Res(lin F 2 ), Res sw (lin F 2 ) and Sem-Res(lin F 2 ) coincide with the definitions of syntactic Res(⊕), Res(⊕) and Res sem (⊕) from Itsykson & Sokolov (2020), respectively. As shown in Itsykson & Sokolov (2020), Res(lin F 2 ), Res sw (lin F 2 ) and Sem-Res(lin F 2 ) are polynomially equivalent.
The linear clause $\mathrm{lin}(\neg\varphi) := \bigvee_{i\in[m]} \mathrm{lin}(\neg C_i) = 0$ is a tautology (under 0-1 assignments) and thus can be derived in Res sw (lin R ) in a single step as a weakening of 0 = 0 or resolving 0 = 0 with 0 = 0 in tree-like Sem-Res(lin R ).
In tree-like Sem-Res(lin R ) the disjunct lin(¬C i ) = 0 can be eliminated from lin(¬φ) by a single resolution with C i , and thus the empty clause is derived by a sequence of m resolutions of lin(¬φ) with C 1 , . . . , C m .
The following proposition is straightforward, but useful as it allows, for example, to transfer results about Res(lin Q ) to Res(lin Z ).

Proposition 3.5. If R is an integral domain and Frac(R) is its field of fractions, then Res(lin R ) is equivalent to Res(lin Frac(R) ) and tree-like Res(lin R ) is equivalent to tree-like Res(lin Frac(R) ).
Proof. Every proof in Res(lin R ) is also a proof in Res(lin Frac(R) ).
To get the converse, we get rid of fractions by multiplying Res(lin Frac(R) ) proof-lines by elements in R as follows. Assume the Res(lin Frac(R) ) proof π starts with linear clauses $\{C_i\}$ over R. By induction on the number of steps in π we construct a Res(lin R ) refutation π′ such that if $C := (\bigvee_i g_i = 0)$ is a line in π, then $C' := (\bigvee_i A_i g_i = 0)$ is a line in π′. Moreover, we ensure that if $\{b_1, \dots, b_k\}$ is the multiset of denominators of coefficients appearing in the derivation of C, then all coefficients $A_i \in R$ divide the product $\prod_{j=1}^{k} b_j$. And finally we ensure that for all i the coefficients in $A_i g_i$ are all from R.
The base case is immediate. Consider a resolution step in π: […] for some a, b, c, d ∈ R. By induction hypothesis we have derivations of ( C ∨ Af = 0) and ( C ∨ Bg = 0) in π′. We now derive: ((lcm(A, B) […]
If $a_1, \dots, a_N \in R$ is the list of denominators of all the coefficients in a Res(lin Frac(R) ) proof π, then under a reasonable encoding of R: $|\prod_{i=1}^{N} a_i| \le |a_1| + \dots + |a_N| \le |\pi|$. Therefore the corresponding Res(lin R ) proof is of size at most $O(|\pi|^2)$.
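For the special case R = Z and Frac(R) = Q, the basic operation behind this proof, multiplying a proof-line by a ring element so that all coefficients land in R, can be illustrated as follows. This is only a sketch for a single equation under our own naming (`clear_denominators` is a hypothetical helper); it does not track multipliers through a whole derivation as the proof does. It assumes Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def clear_denominators(coeffs):
    # Scale a linear equation with rational coefficients to an integer one.
    # Returns (multiplier, integer coefficients); the 0-1 solutions of the
    # equation are unchanged because the multiplier is nonzero.
    fracs = [Fraction(c) for c in coeffs]
    scale = lcm(*(q.denominator for q in fracs))
    return scale, [int(q * scale) for q in fracs]

# (1/2) - (2/3) x1 + x2 = 0  becomes  3 - 4 x1 + 6 x2 = 0
multiplier, integer_eq = clear_denominators([Fraction(1, 2), Fraction(-2, 3), 1])
```

The multiplier divides the product of the denominators that occur, which is the invariant the proof maintains for the coefficients $A_i$.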

Basic counting in Res(lin R ) and Res sw (lin R ).
Here we introduce several unsatisfiable sets of linear clauses that express some counting principles, and serve to exemplify the ability of dag-like Res(lin R ), tree-like Res(lin R ) and tree-like Res sw (lin R ) to reason about counting, for a ring R. We then summarize what we know about refutations of these instances in our different systems, proving along the way some upper bounds and stating some lower bounds proved in the sequel.
Our unsatisfiable instances are the following:
Linear systems: If A = (B|b) is an m × (n + 1) matrix over R, where the B sub-matrix consists of the first n columns, such that Bx = b has no 0-1 solutions, then ($B_i$ is the ith row in B): […]
Subset Sum: Let f be a linear form over R such that $0 \notin \mathrm{im}_2(f)$. Then, […]
Image avoidance: Let f be a linear form over R and recall the notation f = A from Section 2.1. We define […]
We also consider the following (tautological) generalization of the Boolean axiom x = 0 ∨ x = 1. […]

Proof. We construct derivations of
Base case: k = 0. In this case Im(b) is just the axiom b = b and thus derived in one step.
Induction step: Let $f_k := \sum_{i=1}^{k} a_i x_i + b$ and assume $\mathrm{Im}(f_k)$ was already derived. Derive $C_0 := \bigvee_{A\in \mathrm{im}_2(f_k)} (f_k + a_{k+1}x_{k+1} = A) \lor (x_{k+1} = 1)$ from $\mathrm{Im}(f_k)$ by $|\mathrm{im}_2(f_k)|$ many resolution applications with $x_{k+1} = 0 \lor x_{k+1} = 1$. Similarly derive $C_1 :=$ […] $\mathrm{Im}(f_{k+1})$ by resolving $C_0$ with $C_1$ on $x_{k+1}$. The size of the derivation is n · |Im(f )| and, since no clause contains more than 3 equations that determine non-parallel hyperplanes, the principal width of the derivation is at most 3.
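The induction mirrors a simple way of computing the Boolean image $\mathrm{im}_2(f)$ one variable at a time: each new variable either leaves a value unchanged ($x_{k+1} = 0$) or shifts it by $a_{k+1}$ ($x_{k+1} = 1$). A minimal sketch over the integers, with a function name of our own choosing:

```python
def boolean_image(coeffs, b):
    # Image of f = b + sum(coeffs[i] * x_i) over all 0-1 assignments.
    image = {b}                      # base case: the constant polynomial b
    for a in coeffs:                 # induction step: add one more variable
        image = image | {v + a for v in image}   # x = 0 keeps v, x = 1 gives v + a
    return image
```

For instance, `boolean_image([2, 4, 8], 1)` yields the odd numbers from 1 to 15, while `boolean_image([1, 1, 1], 0)` yields only four values, which is why small coefficients give small clauses Im(f ).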
Proof. First construct the shortest derivation of Im(f ), and then by a sequence of $|\mathrm{im}_2(f)|$ many applications of the resolution rule with f = 0 derive the empty clause. By Proposition 3.10 the resulting refutation is of size polynomial in |Im(f )|.
Proof. Let $A_1, \dots, A_N = a$ be an enumeration of all the elements in $\mathrm{im}_2(f)$. By Proposition 3.10 there exists a derivation of $\bigvee_{i\ge 1} f = A_i$ of principal width at most 3. For $1 < k < N$, we derive C : […] We thus obtain a derivation of principal width $\omega_0 \le 3$ and of size $(1 + \dots + (N-2))|f| = \frac{(N-1)(N-2)}{2}|f|$.
Proof. Pick some $a \in \mathrm{im}_2(f)$. By Proposition 3.12 there is a derivation of f = a from ImAv (f ) of polynomial size. This derivation can be extended to a refutation of ImAv (f ) by a sequence of resolution rule applications of f = a with the clause of ImAv (f ) that expresses f ≠ a.
In Section 5.1, we prove an upper bound for LinSys(A) in terms of the size of the image of the affine map, corresponding to A (Theorem 5.1). All other Res(lin R ) upper bounds for LinSys(A) are tree-like. So for more LinSys(A) upper bounds we refer the reader to the tree-like Res(lin R ) upper bounds further in this section.
Lower bounds. In Section 4, we prove an exponential lower bound for SubSum(f ) in case f is a linear polynomial with large coefficients (Theorem 4.6).
Tree-like Res(lin R ). Upper bounds. In case R is a finite ring, in Section 6, we prove that the clauses in Im(f ) admit derivations of polynomial size (Proposition 6.6). Obviously, in that case (R is finite) any unsatisfiable R-linear equation f = 0 has at most |R| variables and SubSum(f ) are always refutable in constant size. In contrast, in case R = Q, we prove a lower bound for Im(f ), SubSum(f ) and ImAv (f ) for a specific f with small coefficients (see the lower bounds below).
In case a matrix A = (B|b) with entries in a field F defines a system of equations Bx = b, that is unsatisfiable under arbitrary F-valued assignments (not just under 0-1 assignments), we prove a polynomial upper bound for tree-like Res(lin F ) refutations of LinSys(A).
Proposition 3.14. If an m × (n + 1) matrix A = (B|b) with entries in a field F is such that Bx = b has no F-valued solutions, then there exists a tree-like Res(lin F ) refutation of LinSys(A) of linear size.
Proof. It is a well-known fact from linear algebra that Bx = b has no F-valued solutions iff there exists $\alpha \in \mathbb{F}^m$ such that $\alpha^T B = 0$ and $\alpha^T b = 1$. Therefore, by m − 1 resolutions of […]
Lower bounds. In Section 8 we prove tree-like Res(lin Q ) exponential-size lower bounds for derivations of Im(f ) and refutations of SubSum(f ) for any f (Corollary 8.5 and Theorem 8.6).
For ImAv (f ) whenever f is of the form $f = \epsilon_1 x_1 + \dots + \epsilon_n x_n - A$ for some $\epsilon_i \in \{-1, 1\}$, A ∈ F, the lower bound holds even for the stronger system tree-like Res sw (lin F ) (see below).
Lower bounds. In case F is a field of characteristic zero, ImAv (f ) are hard even for tree-like Res sw (lin F ) whenever f is of the form $f = \epsilon_1 x_1 + \dots + \epsilon_n x_n - A$ for some $\epsilon_i \in \{-1, 1\}$, A ∈ F (Theorem 8.8).

Dag-like lower bounds
In this section, we prove an exponential lower bound on the size of dag-like Res(lin Q ) refutations of SubSum(f ), where $f = 1 + 2x_1 + 4x_2 + \dots + 2^n x_n$. The argument proceeds by considering refutations of f = 0, and then roughly mimicking the same refutation while "ignoring" all the resolution rule applications with the (only non-Boolean) axiom f = 0. In this way we obtain a derivation of a clause from only the Boolean axioms. We introduce the concept of stability for Res(lin Q ), which is a property of linear forms that is maintained under such refutation-to-derivation transformations: If a linear form f possesses a property that is stable for Res(lin Q ), then a refutation of f = 0 can be transformed with not much increase in size to a derivation of a linear clause $\bigvee_i (g_i = 0)$ in which every $g_i$ has that property. The property that we use, together with the observation that $\bigvee_i (g_i = 0)$ must be a tautology since it is derivable from only the Boolean axioms, will immediately imply that $\bigvee_i (g_i = 0)$ is big, hence concluding the lower bound.
More precisely, our refutation-to-derivation transformation is defined as a mapping that sends every refutation π of f = 0 to a derivation π′ from the Boolean axioms of some clause $C_\pi$, in such a way that π′ satisfies two properties: 1. π′ is at most polynomially larger than π; 2. $C_\pi$ is exponentially large. We ensure that the second property holds by defining the property of linear polynomials that is stable for Res(lin Q ) to hold iff a linear form has at most $2^{cn}$ satisfying assignments, for some constant c < 1. This, together with the observation that $C_\pi$ must be a Boolean tautology, because it is derivable from the Boolean axioms only, implies that $C_\pi$ must be of exponential size (since $C_\pi$ has $2^n$ satisfying assignments and each disjunct contributes at most $2^{cn}$ satisfying assignments). Therefore, by the first property, the original refutation π must be of exponential size.
The fact that f has exponentially large coefficients is essential in our proof that $C_\pi$ is of exponential size. All contradictions of the form f = 0, where f has polynomially bounded coefficients, have polynomial-size dag-like Res(lin Q ) refutations and, thus, there is no hope to prove strong bounds for dag-like refutations in this case. However, in Section 8.2 we prove that every tree-like Res(lin Q ) refutation of any such f = 0, as long as f depends on n variables, has size at least $2^{\Omega(\sqrt{n})}$. The argument relies on a similar transformation from refutations π of f = 0 to derivations of some $C_\pi$ and in this way reduces the problem to proving size lower bounds against tree-like Res(lin Q ) derivations of $C_\pi$ from the Boolean axioms. The difference is in the stable property that we use: for the tree-like lower bounds, we define the stable property to hold iff a linear form depends on at least $\frac{n}{2}$ variables (that is, there are at least $\frac{n}{2}$ variables with nonzero coefficients in it).
Denote by $\mathbb{F}[x_1, \dots, x_n]_{\le 1}$ the set of linear polynomials over the field F.
Definition 4.1 (Stable property for Res(lin F )). Let f be a linear polynomial over a field F with n variables and let $P : \mathbb{F}[x_1, \dots, x_n]_{\le 1} \to \{0, 1\}$ be a property of linear polynomials over F that is closed under nonzero scalar products, namely if P(g) (that is, P(g) = 1) then also P(αg) for all nonzero α ∈ F and linear polynomials g. We say that P is stable for Res(lin F ) with respect to f whenever the following both hold: (i) […]; (ii) for all linear polynomials g and for all but at most one a ∈ F: P(g + af ) = 1.
We now show that stable properties are preserved under refutation-to-derivation transformations:
Lemma 4.2. Let f be a linear polynomial with n variables over a field F and let the property P be stable for Res(lin F ) with respect to f . Then, if there exists a Res(lin F ) (resp. tree-like Res(lin F )) refutation of f = 0 of size S, then there exists a Res(lin F ) (resp. tree-like Res(lin F )) derivation of size $O(n \cdot S^3)$ from the Boolean axioms only of a linear clause $\bigvee_{j\in[N]} g_j = 0$ (for some positive N ), where $P(g_j) = 1$ for every j ∈ [N ].
Proof. We first sketch the plan of the proof. Assume that π is a Res(lin F ) refutation of f = 0. By refraining from performing the resolution rule application with f = 0 in π, we transform the refutation into a derivation π′ of some clause C, such that P(g) = 1 for every disjunct g = 0 in C. We do this in such a way that π′ is not much larger than π: $|\pi'| = O(n \cdot |\pi|^3)$.
Denote by $\pi_{\le k}$ the fragment of π that consists of the first k lines of π. By induction on k we define a derivation $\pi_k$ of some clauses $D_k$ from Boolean axioms alone. The derivations $\pi_k$ are defined together with a surjective function $M_k$ from lines of $\pi_{\le k}$ to lines of $\pi_k$ such that if $D = \bigvee_{t\in[m]} g_t = 0$ is a line in $\pi_{\le k}$, then
$M_k(D) = \bigvee_{t\in[m]} (g_t + a_t f = 0) \lor \bigvee_{s} (h_s = 0)$ (4.3)
is a line in $\pi_k$, where $a_t \in \mathbb{F}$ and each $h_s$ is a linear polynomial. $M_k$ is a surjection because it is a mapping between lines in the original refutation to lines in a corresponding proof in which we do not perform the resolution rule applications with f = 0, and hence some lines in $\pi_{\le k}$ will not have corresponding lines in the new derivation. We show that $M_k(D)$ satisfies the following properties:
1. For each $h_s = 0$, $P(h_s) = 1$.
2. […] such disjuncts summing over all refutation lines in $\pi_k$ is not too large: […]
3. The numbers $a_t$ and coefficients of $h_s$ are not too large: Their bit-size does not exceed the maximal bit-size of coefficients in π.
Before we proceed to the inductive definition of $\pi_k$, we finish the proof assuming that $\pi_k$ described above exists. If l is the length of π, then $\pi' := \pi_l$ contains a derivation of $M_l(\emptyset)$, where ∅ denotes the empty clause. Hence, all the disjuncts in $M_l(\emptyset)$ are $h_s$ in the notation of (4.3), and by condition 1 above they all have $P(h_s) = 1$. For the size upper bound we need to show that $|\pi_l| = O(n \cdot S^3)$ where $S = |\pi_{\le l}|$. By condition 2 above, we have $|\pi_l| \le 2S \cdot (\text{max size of } h_s) + S \cdot (\text{max size of } \bigvee_{t\in[m]} (g_t + a_t f = 0) \text{ as in (4.3)})$. By condition 3 above the maximal size of $h_s$ is at most $n \cdot (\text{max bit-size in } \pi_l) \le n \cdot S$, and the maximal size of $\bigvee_{t\in[m]} (g_t + a_t f = 0)$ as in (4.3) is at most $S + m \cdot n \cdot (\text{max size of } a_t) \le O(n \cdot S^2)$, and we are done.
We now turn to the inductive construction of π k .
Base case: Define $\pi_0$ to be the empty derivation.
Induction step: Assume $\pi_k$ and $M_k$ satisfy the properties above and k is smaller than the length of π. If D is the last line of $\pi_{\le k+1}$, then $M_{k+1}$ extends $M_k$ to D and $\pi_{k+1}$ either extends $\pi_k$ with $M_{k+1}(D)$ or coincides with $\pi_k$. Consider the possible cases in which the last line D of $\pi_{\le k+1}$ is derived:
Case 1: Boolean axiom: $D = (x_i = 0 \lor x_i = 1)$. Then $\pi_{k+1}$ extends $\pi_k$ with D and $M_{k+1}(D) = D$.
Case 2: D = (f = 0). Then $\pi_{k+1}$ extends $\pi_k$ with the axiom 0 = 0 and $M_{k+1}(D) = (f - f = 0)$.
Case 3: D is derived by resolution: from some previous lines in $\pi_{\le k}$ […] is of the form ($i = 1, 2$; $A_i \in \mathbb{F}$): […] The derivation $\pi_{k+1}$ extends $\pi_k$ with $M_{k+1}(D)$. It remains to be shown that $M_{k+1}(D)$ is of the required form and it satisfies properties 1-3 above. If we consider the clause $D = (\alpha G_1 + \beta G_2 = 0 \lor C_1 \lor C_2)$ as a multiset of disjuncts and $C_1$, $C_2$, as usual, as sets of disjuncts, there can be up to three identical copies of some disjunct g = 0 (from $C_1$, from $C_2$ and from $\{\alpha G_1 + \beta G_2 = 0\}$) that are contracted to a single disjunct in the clause D. In $M_{k+1}(D)$ these copies can be different because of different "+af " terms and, thus, can be non-contractible.
For every disjunct g = 0 in D, denote by $F_g$ the set of up to three disjuncts in $M_{k+1}(D)$ that correspond to g, namely, (g j […] and $(\alpha G_1 + \beta G_2 + (\alpha A_1 + \beta A_2)f = 0) \in F_g$ if $g = \alpha G_1 + \beta G_2$. For every g = 0 ∈ D, pick one element $g + af = 0 \in F_g$, which minimizes P(g + af ) (minimizes in the sense of P as a function ranging over {0, 1}), and denote by X the set of these elements. Denote Y : […] $h_s = 0$ in the notation of (4.3), that is, it is considered a part of $H_D$. We now show that $M_{k+1}$ satisfies all the desired properties 1-3 above:
1. For every $h^{(i)}_s$, $P(h^{(i)}_s) = 1$ holds by induction hypothesis. For every g + af = 0 ∈ Y , P(g + af ) = 1 holds by definition of Y and the fact that P is stable for Res(lin F ) with respect to f (and hence there can be at most one a′ ∈ F such that P(g + a′f ) = 0).
3. The absolute values of coefficients in $\pi_{k+1}$ do not exceed the maximal absolute value of coefficients in π.
Lemma 4.4. Let g : Z n → Z be a linear function and let I = im 2 (g) be the image of g under Boolean assignments and K = g −1 (0) ∩ {0, 1} n be the "Boolean kernel" of g. Then, |I| · |K| ≤ 3 n .
Proof. Intuitively as the image of g under Boolean assignments becomes bigger it must be the Boolean kernel of g, which sends all points to the same point 0 in the image, becomes smaller.
For every element a ∈ I choose some v a ∈ {0, 1} n such that g(v a ) = a. Consider the set X := {v a + u} a∈I,u∈K ⊂ {0, 1, 2} n , where v a + u means the vector resulting from coordinate-wise addition of {0, 1} n values (with 1 + 1 = 2, namely the addition is over Z). Notice that |X| ≤ 3 n . Hence it suffices to prove that |X| = |I| · |K|. For this we show that each different pair a, u with a ∈ I and u ∈ K induces a distinct vector v a + u in X.
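Indeed, suppose $v_a + u = v_{a'} + u'$ for some $a, a' \in I$ and $u, u' \in K$. By linearity of $g$ we get $g(v_a) + g(u) = g(v_{a'}) + g(u')$. Since $u, u' \in K$ we have $g(u) = g(u') = 0$, and hence $g(v_a) = g(v_{a'})$. Hence, by definition of $v_a$, $a = a'$ and $v_a = v_{a'}$, and since $v_a + u = v_{a'} + u'$, we conclude also that $u = u'$.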
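The inequality of Lemma 4.4 can be checked by brute force on small instances. The following is a minimal sketch, assuming integer coefficients and no constant term; the helper names `image_and_kernel` and `lemma_4_4_holds` are hypothetical and introduced only for this illustration.

```python
import itertools
import random

def image_and_kernel(coeffs):
    """Return (Boolean image, Boolean kernel) of g(x) = sum_i coeffs[i] * x_i."""
    image, kernel = set(), []
    for x in itertools.product((0, 1), repeat=len(coeffs)):
        value = sum(c * xi for c, xi in zip(coeffs, x))
        image.add(value)
        if value == 0:
            kernel.append(x)
    return image, kernel

def lemma_4_4_holds(coeffs):
    """Check |I| * |K| <= 3^n for the linear form given by coeffs."""
    image, kernel = image_and_kernel(coeffs)
    return len(image) * len(kernel) <= 3 ** len(coeffs)

# Random small instances (illustrative only, not a proof).
random.seed(0)
samples = [[random.randint(-3, 3) for _ in range(n)]
           for n in range(1, 9) for _ in range(50)]
all_ok = all(lemma_4_4_holds(c) for c in samples)
```

For instance, for $g = x_1 - x_2$ the image is $\{-1, 0, 1\}$ and the Boolean kernel is $\{(0,0), (1,1)\}$, so $|I| \cdot |K| = 6 \le 9$.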
Proof. First, note that $P$ is closed under scalar products: $g = 0$ and $\alpha g = 0$ have the same number of satisfying assignments, for any nonzero $\alpha \in \mathbb{Q}$. Second, $P(b + f) = 1$ for all $b \in \mathbb{Q}$, because $f : \{0, 1\}^n \to [1, 2^{n+1} - 1]$ is injective, and hence $b + f = 0$ always has at most one solution.
It thus remains to show that for every linear polynomial $g$, $P(g + af) = 1$ holds for all $a \in \mathbb{F}$ but at most one. We use the following claim:

Claim. Let $g : \mathbb{Z}^n \to \mathbb{Z}$ be a linear function. For any $a \in \mathbb{Z} \setminus \{0\}$ one of the following holds: (i) $g = 0$ has at most $3^{n/2}$ 0-1 solutions; (ii) $g + af = 0$ has at most $3^{n/2}$ 0-1 solutions.
Proof of claim: For every $b \in \mathbb{Z}$, there exists at most one Boolean assignment that satisfies both $g = b$ and $b + af = 0$. Therefore the number of 0-1 solutions of $g + af = 0$ is at most the size of the Boolean image $\mathrm{im}_2(g)$ of $g$. By Lemma 4.4 either $|\mathrm{im}_2(g)| \le 3^{n/2}$ or $|g^{-1}(0) \cap \{0,1\}^n| \le 3^{n/2}$, which proves the claim.

Let $h$ be any linear function and let $b \ne b'$. If we take $g := h + bf$ and $a := b' - b$, the claim implies that either $h + bf = 0$ or $h + b'f = 0$ has at most $3^{n/2}$ solutions. Therefore, for every $h$ there exists at most one $b \in \mathbb{Z}$ such that $h + bf = 0$ has more than $3^{n/2}$ solutions, which means that $P$ is stable as required.
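The claim used in this proof can be tested exhaustively for small $n$. The sketch below assumes $f = 1 + \sum_{i=1}^{n} 2^i x_i$ as in the abstract and integer coefficients for $g$; the function names are hypothetical, and the comparison is done on squares so that no floating point is needed.

```python
import itertools
import random

def f_value(x):
    """Subset-sum form f(x) = 1 + sum_{i=1}^{n} 2^i * x_i."""
    return 1 + sum((2 ** (i + 1)) * xi for i, xi in enumerate(x))

def count_solutions(form, n):
    """Number of 0-1 assignments x with form(x) == 0."""
    return sum(1 for x in itertools.product((0, 1), repeat=n) if form(x) == 0)

def claim_holds(coeffs, const, a):
    """Check that g = 0 or g + a*f = 0 has at most 3^(n/2) Boolean solutions."""
    n = len(coeffs)
    g = lambda x: sum(c * xi for c, xi in zip(coeffs, x)) + const
    shifted = lambda x: g(x) + a * f_value(x)
    smaller = min(count_solutions(g, n), count_solutions(shifted, n))
    return smaller ** 2 <= 3 ** n  # equivalent to smaller <= 3^(n/2)

random.seed(1)
checks = []
for n in range(1, 7):
    for _ in range(40):
        coeffs = [random.randint(-4, 4) for _ in range(n)]
        const = random.randint(-4, 4)
        a = random.choice([-3, -2, -1, 1, 2, 3])  # a must be nonzero
        checks.append(claim_holds(coeffs, const, a))
all_claims_ok = all(checks)
```

The extreme case $g = -f$, $a = 1$ shows why the claim is a disjunction: $g + af = 0$ is then satisfied by all $2^n$ assignments, while $g = 0$ has none.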
Proof. By Lemma 4.5 the property of linear polynomials $g$ over $\mathbb{Q}$ that holds iff $g = 0$ has at most $2^{(0.5 \cdot \log 3) n}$ 0-1 solutions is stable for $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ with respect to $f$. Thus, by Lemma 4.2, if $\pi$ is a refutation of $f = 0$, then there exists a derivation $\pi'$ of some clause $C = \bigvee_{j \in [N]} (g_j = 0)$ from the Boolean axioms, where each $g_j = 0$ has at most $2^{(0.5 \cdot \log 3) n}$ 0-1 solutions. Moreover $|\pi'| = O(n \cdot |\pi|^3)$. Since $C$ must be a Boolean tautology (which is satisfied by $2^n$ assignments), it must contain at least $2^{(1 - 0.5 \cdot \log 3) n}$ disjuncts (because every disjunct contributes at most $2^{(0.5 \cdot \log 3) n}$ satisfying assignments). Therefore $|\pi| = 2^{\Omega(n)}$.
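The counting step at the end of this proof is simple arithmetic: $2^n$ assignments must be covered by disjuncts that each cover at most $3^{n/2}$ of them. A small sketch of that computation, with a hypothetical helper name and the logarithm taken to base 2:

```python
import math

def disjunct_lower_bound(n):
    """Smallest integer N with N * 3^(n/2) >= 2^n, computed exactly.

    Squaring both sides gives N^2 * 3^n >= 4^n, so integers suffice.
    """
    q = -(-(4 ** n) // (3 ** n))   # ceil(4^n / 3^n)
    return math.isqrt(q - 1) + 1   # smallest N with N^2 >= q

# Exponent in the bound 2^{(1 - 0.5 * log 3) n}; roughly 0.2075.
exponent = 1 - 0.5 * math.log2(3)
```

Since the exponent is a positive constant, the number of disjuncts, and hence the size of $\pi'$, grows exponentially in $n$.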

Linear systems with small coefficients
In this short section, we study 0-1 unsatisfiable linear systems over finite fields (in contrast to CNF formulas, for example).
We prove an upper bound, which is polynomial in . . , f m (x)). In contrast to the case of a single equation f = 0, the size of the image |im 2 (A x)| does not fully characterize the size of the shortest Res(lin F ) refutation of f 1 = 0, . . . , f m = 0: there is an example, where |im 2 (A x)| is large, but the size for refuting f 1 = 0, . . . , f m = 0 is small.

Nondeterministic linear decision trees
In this section, we extend the classical correspondence between tree-like resolution refutations and decision trees (cf. Beame et al. 2004) to tree-like Res(lin R ) and tree-like Res sw (lin R ). We establish some upper bounds on such decision trees which in turn imply short tree-like Res(lin R ) refutations.
We define nondeterministic linear decision trees (NLDT), which generalize parity decision trees, proposed in Itsykson & Sokolov (2020) for R = F 2 , to arbitrary rings. We shall use these trees in the sequel to establish some of our upper and lower bounds (though not for our dag-like lower bounds).
Let $\varphi$ be a set of linear clauses (that we wish to refute) and $\Phi$ a set of linear non-equalities over $R$ (that we take as assumptions). Consider the following two decision problems:

DP1 Assume $\Phi \models \neg\varphi$. Given a satisfying Boolean assignment $\rho$ to $\Phi$, determine which clause $C \in \varphi$ is violated by $\rho$ by making queries of the form: which of $f|_\rho \ne 0$ or $g|_\rho \ne 0$ holds for linear forms $f, g$ in case $f|_\rho + g|_\rho \ne 0$.

DP2 Similar to DP1, only that we assume $\Phi \models_R \neg\varphi$, and given an $R$-valued assignment $\rho$ satisfying $\Phi$, we ask to find a clause $C \in \varphi$ falsified by $\rho$.
Below we define NLDTs of types $\mathrm{DT}_{sw}(R)$ and $\mathrm{DT}(R)$, which provide solutions to DP1 and DP2, respectively. The root of a tree is labeled with a system $\Phi$, the edges in a tree are labeled with linear non-equalities of the form $f \ne 0$ and the leaves are labeled with clauses $C \in \varphi$. Informally, at every node $v$ there is a set $\Phi_v$ of all learned non-equalities, which is the union of $\Phi$ and the set of non-equalities along the path from the root to the node. If $v$ is an internal node, its two outgoing edges $f \ne 0$ and $g \ne 0$ define a query to be made at $v$. Starting from the root, based on the assignment $\rho$, we go along a path from the root to a leaf, by choosing in each node to go along the left edge $f \ne 0$ or the right edge $g \ne 0$, depending on whether $f|_\rho \ne 0$ or $g|_\rho \ne 0$. Note that $f|_\rho \ne 0$ and $g|_\rho \ne 0$ may not be mutually exclusive, and this is why the decision made in each node may be nondeterministic.
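The informal description of how an assignment is routed through such a tree can be sketched as follows. This is a minimal illustration over the integers, not the formal definition: the classes `Leaf` and `Internal`, the encoding of linear forms as (coefficients, constant) pairs, and the function `reachable_leaves` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# A linear form is (coefficient tuple, constant term); illustrative encoding.
LinearForm = Tuple[Tuple[int, ...], int]

def evaluate(form, rho):
    """Value of the linear form under the assignment rho."""
    coeffs, const = form
    return sum(c * v for c, v in zip(coeffs, rho)) + const

@dataclass(frozen=True)
class Leaf:
    clause: str  # label of the clause reported as violated

@dataclass(frozen=True)
class Internal:
    left_form: LinearForm   # edge label "left_form != 0"
    left: "Node"
    right_form: LinearForm  # edge label "right_form != 0"
    right: "Node"

Node = Union[Leaf, Internal]

def reachable_leaves(node, rho):
    """All leaf labels reachable under rho; more than one means the
    walk is genuinely nondeterministic for this assignment."""
    if isinstance(node, Leaf):
        return {node.clause}
    found = set()
    if evaluate(node.left_form, rho) != 0:
        found |= reachable_leaves(node.left, rho)
    if evaluate(node.right_form, rho) != 0:
        found |= reachable_leaves(node.right, rho)
    return found

# Branching on a single variable x via (1 - x) + x != 0.
tree = Internal(((-1,), 1), Leaf("x = 1"), ((1,), 0), Leaf("x = 0"))
```

On Boolean inputs exactly one edge is enabled in this toy tree, whereas the non-Boolean value $x = 2$ enables both edges.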
Definition 6.1. (Nondeterministic linear decision tree NLDT; $\mathrm{DT}(R)$, $\mathrm{DT}_{sw}(R)$) Let $\varphi$ be a set of linear clauses and $\Phi$ be a set of linear non-equalities over a ring $R$. A nondeterministic linear decision tree $T$ of type $\mathrm{DT}(R)$ or of type $\mathrm{DT}_{sw}(R)$ for $(\varphi, \Phi)$ is a binary rooted tree, where every edge is labeled with some linear non-equality $f \ne 0$, in such a way that the conditions below hold. In what follows, for a node $v$, we denote by $\Phi_{r;v}$ the set of non-equalities along the path from the root $r$ to $v$ and by $\Phi_v$ the set $\Phi_{r;v} \cup \Phi$. We say that $\Phi_v$ is the set of learned non-equalities at $v$.
(i) Let $v$ be an internal node. Then $v$ has two outgoing edges labeled by linear non-equalities $f_v \ne 0$ and $g_v \ne 0$, such that: which is violated by $\Phi_v$ in the following sense:

In case $\Phi$ is empty, we sometimes simply write that the NLDT is for $\varphi$ instead of $(\varphi, \emptyset)$.
Below we give several examples (and basic properties) of NLDTs.
Example 1 Let $\varphi$ be a set of clauses representing an unsatisfiable CNF. Then any standard decision tree on Boolean variables is an NLDT for $\varphi$, where a branching on the value of a variable $x$ is realized by branching on $(1 - x) + x \ne 0$ to either $1 - x \ne 0$ or $x \ne 0$. This is illustrated by (the proof of) the following proposition:

Proposition 6.2. If $\Phi$ is a set of linear non-equalities and $\varphi$ is a set of linear clauses over $R$ such that $\Phi \models \neg\varphi$, then there exists a $\mathrm{DT}(R)$ tree for $(\varphi, \Phi)$.

Proof. Let $\mathrm{vars}(\varphi \cup \{\neg\Phi\}) = \{x_1, \dots, x_n\}$ and fix an ordering on these variables. Construct a tree $T_0$ with $2^n$ nodes, that branches on $x_1, \dots, x_n$, in this order. Thus, in every leaf $v$ of $T_0$ a total assignment to the variables is determined (i.e., $\{x_1 \ne \nu_1, \dots, x_n \ne \nu_n\} \subseteq \Phi_v$ for some $\nu \in \{0, 1\}^n$, so that $x_i = 1 - \nu_i$ for every $i \in [n]$). Since $\Phi \models \neg\varphi$, this assignment violates either some clause $C = (f_1 = 0 \vee \dots \vee f_m = 0)$ in $\varphi$ or some non-equality $g \ne 0$ in $\Phi$. We augment $T_0$ to $T$ by attaching a subtree to every leaf $v$ of $T_0$ depending on whether the former or latter condition holds for $v$, as follows:

We attach a subtree to $v$ that makes $m$ sequences of branches as follows. If $f_i = a_1 x_1 + \dots + a_n x_n + b$ then $a_1(1 - \nu_1) + \dots + a_n(1 - \nu_n) + b \ne 0$ holds and the $i$th sequence is the following sequence of "substitutions": $(a_1 x_1 + a_2(1 - \nu_2) + \dots + a_n(1 - \nu_n) + b) + (a_1(1 - \nu_1) - a_1 x_1) \ne 0$ to $a_1 x_1 + a_2(1 - \nu_2) + \dots + a_n(1 - \nu_n) + b \ne 0$ and $(1 - \nu_1) - x_1 \ne 0$, $\dots$, $(a_1 x_1 + \dots + a_{n-1} x_{n-1} + a_n x_n + b) + (a_n(1 - \nu_n) - a_n x_n) \ne 0$ to $f_i \ne 0$ and $(1 - \nu_n) - x_n \ne 0$. All the right branches lead to nodes $u$ such that $\{x_i \ne 0, x_i \ne 1\} \subseteq \Phi_u$ for some $i \in [n]$ and thus they satisfy the $\mathrm{DT}(R)$ leaf condition in Definition 6.1. Such a sequence indeed performs substitutions: the edge to the leftmost node is $f_i \ne 0$ and as we go upward, we apply the substitutions $x_n \leftarrow 1 - \nu_n, \dots, x_1 \leftarrow 1 - \nu_1$ to this non-equality.
In the leftmost node $w$ at the end of the $m$th sequence, $\{f_1 \ne 0, \dots, f_m \ne 0\} \subseteq \Phi_w$ holds and thus again $C$ is violated at $w$ in the sense of Definition 6.1 and therefore $w$ is a legal $\mathrm{DT}(R)$-leaf.
The tree T is a DT(R) tree for (φ, Φ).
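The sequence of "substitutions" in this proof relies on the fact that, at each step, the two branch forms add up to the form learned one step earlier. The sketch below checks this identity for a concrete $f_i$ over the integers; the encoding of forms as (coefficients, constant) pairs and the names `substitution_chain` and `add_forms` are assumptions made for the illustration, and the right branch is kept in its scaled form $a_j(1 - \nu_j) - a_j x_j$.

```python
def add_forms(p, q):
    """Sum of two linear forms given as (coefficient tuple, constant)."""
    return tuple(a + b for a, b in zip(p[0], q[0])), p[1] + q[1]

def substitution_chain(coeffs, const, nu):
    """Return (left, right, parent) triples for steps j = 1..n.

    left   : a_1 x_1 + ... + a_j x_j + sum_{i>j} a_i (1 - nu_i) + b
    right  : a_j (1 - nu_j) - a_j x_j
    parent : the 'left' form of step j - 1 (all variables substituted for j = 1)
    """
    n = len(coeffs)

    def partially_substituted(j):
        # x_1..x_j stay variables, x_{j+1}..x_n are replaced by 1 - nu_i.
        kept = tuple(coeffs[i] if i < j else 0 for i in range(n))
        c = const + sum(coeffs[i] * (1 - nu[i]) for i in range(j, n))
        return kept, c

    steps = []
    for j in range(1, n + 1):
        left = partially_substituted(j)
        right = (tuple(-coeffs[i] if i == j - 1 else 0 for i in range(n)),
                 coeffs[j - 1] * (1 - nu[j - 1]))
        steps.append((left, right, partially_substituted(j - 1)))
    return steps

# f_i = 2 x_1 - x_2 + 3 x_3 + 1 with nu = (0, 1, 1), i.e. x = (1, 0, 0).
chain = substitution_chain((2, -1, 3), 1, (0, 1, 1))
chain_ok = all(add_forms(left, right) == parent for left, right, parent in chain)
```

The first parent is the constant obtained by substituting all variables, and the last left form is $f_i$ itself.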
Example 2 Let $\varphi$ be as in Example 1. Parity decision trees, as defined in Itsykson & Sokolov (2020), are NLDTs for $\varphi$ of type $\mathrm{DT}_{sw}(\mathbb{F}_2)$: branching on the value of an $\mathbb{F}_2$-linear form $f$ is realized by branching from $(1 - f) + f \ne 0$ to $1 - f \ne 0$ and $f \ne 0$. The converse also holds: a branching of $f + g \ne 0$ to $f \ne 0$ and $g \ne 0$, where, say, $f$ is a non-constant $\mathbb{F}_2$-linear form, is equivalent to branching on the value of $f$.
We now show the equivalence between NLDTs and tree-like Res(lin R ) proofs. Theorem 6.3. Let φ be a set of linear clauses over a ring R and Φ be a set of linear non-equalities over R. Then, there exist decision trees DT(R) (resp. DT sw (R)) for Proof. (⇒) Let T φ be an NLDT of type DT(R) or DT sw (R) for φ. We construct a tree-like Res(lin R ) or tree-like Res sw (lin R ) derivation from T φ , respectively, as follows. Consider the tree of clauses π 0 , obtained from T φ by replacing every vertex u with the clause ¬Φ u . This tree is not a valid tree-like derivation yet. We augment it to a valid derivation π by appropriate insertions of applications of weakening and simplification rules.
Case 1: If $\neg\Phi_u \in \pi_0$ is a leaf, then $\Phi_u$ violates a clause $D \in \varphi \cup \{0 = 0\}$. By condition (ii) of Definition 6.1, $\neg\Phi_u$ must be a weakening of $D$ (syntactic for $T_\varphi \in \mathrm{DT}(R)$ and semantic for $T_\varphi \in \mathrm{DT}_{sw}(R)$) and we add $D$ as the only child of this node.
Case 2: Let $\neg\Phi_u \in \pi_0$ be an internal node with two outgoing edges labeled with $f_u \ne 0$ and $g_u \ne 0$.
(⇐) Conversely, assume π is a tree-like Res(lin R ) or a tree-like Res sw (lin R ) derivation of a (possibly empty) clause C from φ. In what follows, when we say weakening we mean syntactic or semantic weakening depending on π being a tree-like Res(lin R ) or a tree-like Res sw (lin R ) derivation, respectively.
Let the edges in the proof-tree of $\pi$ be directed from conclusion to premises. We turn this proof-tree into a decision tree $T_\pi$ for $(\varphi, \neg C)$ as follows. Every node of outgoing degree 2 in the proof-tree $\pi$ is a clause obtained from its children by a resolution rule. For each such node $C \vee D \vee (\alpha f + \beta g = 0)$, we label its outgoing edges to $C \vee f = 0$ and $D \vee g = 0$ with $f \ne 0$ and $g \ne 0$, respectively. We contract all unlabeled edges, which are precisely those corresponding to applications of weakening and simplification rules. If $C_1, \dots, C_k$ is a maximal (with respect to inclusion) sequence of weakening and simplification rule applications (the latter occur only in $\mathrm{Res}(\mathrm{lin}_R)$ derivations), then we contract it to $C_k$. In this way we obtain the tree $T_\pi$, where every edge is labeled with a linear non-equality and every node $u$ is labeled with a clause $C_u$ such that if $f \ne 0$ and $g \ne 0$ are labels of edges to the left child $l(u)$ and to the right child $r(u)$, respectively, then $C_u$ is a weakening and a simplification (the latter again in case of $\mathrm{Res}(\mathrm{lin}_R)$) of the clause $C \vee D \vee \alpha f + \beta g = 0$ for some $\alpha, \beta \in R$, such that . We now prove that $T_\pi$ is a valid decision tree of type $\mathrm{DT}(R)$ (respectively, $\mathrm{DT}_{sw}(R)$) if $\pi$ is a tree-like $\mathrm{Res}(\mathrm{lin}_R)$ derivation (respectively, tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ derivation).
Case 1: Assume $\pi$ is a tree-like $\mathrm{Res}(\mathrm{lin}_R)$ derivation. We prove inductively that for every node $u$ in $T_\pi$ we have $\neg C_u \subseteq \Phi_u$. Base case: $u$ is the root $r$. We have $\Phi_r = \neg C = \neg C_r$. Induction step: For any other node $u$ assume $\neg C_p \subseteq \Phi_p \cup \{a \ne 0 \mid a \in R \setminus 0\}$ holds for its parent node $p$. Let $f \ne 0$ be the label on the edge from $p$ to $u$. Then $C_u = (C \vee f = 0)$ for some clause $C$ and $C_p$ must be of the form $(C \vee D)$ for some clause $D$, and hence

Now we show that $T_\pi$ satisfies the conditions of Definition 6.1 for $\mathrm{DT}(R)$ trees.
• (Internal nodes) Let $u$ be an internal node of $T_\pi$ with outgoing edges labeled with $f \ne 0$ and $g \ne 0$. $C_u$ must be both a weakening and a simplification of $(C \vee \alpha f + \beta g = 0)$ for some $\alpha, \beta \in R$ and a linear clause $C$. If $\alpha f + \beta g \ne 0 \in \{a \ne 0 \mid a \in R \setminus 0\}$, then the condition trivially holds. Otherwise $\alpha f + \beta g = 0$ cannot be eliminated via simplification, and thus $\alpha f + \beta g \ne 0 \in \neg C_u$ and $\neg C_u \subseteq \Phi_u$ imply $\alpha f + \beta g \ne 0 \in \Phi_u$, and the condition for internal nodes in Definition 6.1 is satisfied.
• (Leaves) Let $u$ be a leaf of $T_\pi$. Then $C_u$ must be both a weakening and a simplification of some clause

Case 2: Assume $\pi$ is a tree-like $\mathrm{Res}_{sw}(\mathrm{lin}_R)$ derivation. We prove inductively that for every node $u$ in $T_\pi$, $C_u \models \neg\Phi_u$ holds. Base case: $u$ is the root $r$ and we have $\neg\Phi_r = C = C_r$. Induction step: $u$ is a node which is not the root. If $C_p \models \neg\Phi_p$ holds for its parent $p$ and $f \ne 0$ is the label on the edge from $p$ to $u$, then $(C \vee D \vee \alpha f + \beta g = 0) \models C_p$ and $C_u = (C \vee f = 0)$ for some $\alpha, \beta \in R$, a linear form $g$ and some linear clauses $C, D$. Therefore,

We now show that $T_\pi$ satisfies the conditions of Definition 6.1 for $\mathrm{DT}_{sw}(R)$ trees.
• (Leaves) Let u be a leaf of T π . Then C u must be a weakening of some clause C in φ ∪ {0 = 0}, that is, C u = (C ∨ D) for some clause D. Therefore C u |= ¬Φ u implies that C is falsified by Φ u .
An immediate corollary is the following:

We construct an NLDT to prove the following upper bound:

Proposition 6.6. Let $R$ be a finite ring, $f = a_1 x_1 + \dots + a_n x_n$ a linear form over $R$, $s_f$ the size of $\mathrm{Im}(f)$ (i.e., the size of its encoding) and $d_f = |\mathrm{im}_2(f)|$. Then, there exists a tree-like

Proof. We construct a decision tree of type $\mathrm{DT}(R)$ of size $O(s_f n^{2 d_f})$ with the system $\Phi_r = \{f \ne A\}_{A \in \mathrm{im}_2(f)}$ at its root $r$. By Theorem 6.3 this implies the existence of a tree-like $\mathrm{Res}(\mathrm{lin}_R)$ proof of $\mathrm{Im}(f)$ of the same size. Let $f^{(1)} := a_1 x_1 + \dots + a_{n/2} x_{n/2}$ and $f^{(2)} := a_{n/2+1} x_{n/2+1} + \dots + a_n x_n$. The decision tree for $\mathrm{Im}(f)$ is constructed recursively as a tree of height $2 d_f$, where a subtree for $\mathrm{Im}(f^{(1)})$ or for $\mathrm{Im}(f^{(2)})$ is hung from each leaf. At every node $u$ of depth $d$ the system of non-equalities is of the form: . The branching at an internal node $u$ is made by the non-equality The size $s_n$ of this tree can be upper bounded as follows:

CNF upper bounds for Res(lin R )
In this section, we outline two basic polynomial upper bounds, which we use to establish our separations in subsequent sections: short tree-like $\mathrm{Res}(\mathrm{lin}_R)$ refutations for CNF encodings of linear systems over a ring $R$, and short $\mathrm{Res}(\mathrm{lin}_R)$ refutations for $\neg\mathrm{PHP}^m_n$. Together with our lower bounds, these imply the separation between tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ and tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}'})$, where $\mathbb{F}, \mathbb{F}'$ are fields of positive characteristic such that $\mathrm{char}(\mathbb{F}) \ne \mathrm{char}(\mathbb{F}')$.
The short refutation of the pigeonhole principle will imply a separation between dag-like and tree-like Res(lin F ) for fields F of characteristic 0.
In what follows, we consider standard CNF encodings of linear equations $f = 0$ where the linear equations are considered as Boolean functions (i.e., functions from 0-1 assignments to $\{0, 1\}$); we do not use extension variables in these encodings.
Proposition 7.1. Let $\mathbb{F}$ be a field and $Ax = b$ be a system of linear equations that has no solution over $\mathbb{F}$, where $A$ is a $k \times n$ matrix with entries in $\mathbb{F}$, and $A_i$ denotes the $i$th row in $A$. Assume

Proof. The idea is to derive the actual linear system of equations from their CNF encoding and then refute the linear system using a previous upper bound (Proposition 3.14).
If $n_i$ is the number of variables in $A_i \cdot x - b_i = 0$, then $|\varphi_i| = \Theta(2^{n_i})$. By Proposition 6.4 proved in the sequel there exists a tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ derivation of By Proposition 3.14 there exists a tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}})$ refutation

As a corollary we get the polynomial upper bound for the Tseitin formulas (see Section 2.3.2 for the definition):

Theorem 7.2. Let $G = (V, E)$ be a $d$-regular directed graph, $p$ a prime number, $\sigma : V \to \mathbb{F}_p$ such that $\sum_{u \in V} \sigma(u) \not\equiv 0 \pmod{p}$, then $\neg\mathrm{TS}$

Proof. $\neg\mathrm{TS}^{(p)}_{G,\sigma}$ is an unsatisfiable system of linear equations over $\mathbb{F}_p$ (note that no assignment of $\mathbb{F}$-elements to the variables in $\neg\mathrm{TS}^{(p)}_{G,\sigma}$ is satisfying, and so we do not need to use the (nonlinear) Boolean axioms to get the unsatisfiability of the system of equations). Therefore, by Proposition 7.1 there exists a tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{F}_p})$ refutation of $\neg\mathrm{TS}$

Proof. This follows from the upper bound of Raz & Tzameret (2008) for $\mathrm{Res}(\mathrm{lin}_{\mathbb{Z}})$ and the fact that any $\mathrm{Res}(\mathrm{lin}_{\mathbb{Z}})$ proof can be interpreted as a $\mathrm{Res}(\mathrm{lin}_R)$ proof if $R$ is of characteristic 0.

Tree-like lower bounds
In this section we deal with tree-like refutations (over large and small fields). In Section 8.1 we present the Prover-Delayer game for our model. In Section 8.2 we establish lower bounds on tree-like refutations of subset sum instances with small coefficients and derivations of related instances. In Section 8.3, we establish tree-like lower bounds for the pigeonhole principle.

Prover-Delayer games.
The Prover-Delayer game is an approach to obtaining lower bounds on resolution refutations introduced in Pudlák & Impagliazzo (2000). The idea is that the nonexistence of small decision trees, and hence of small tree-like resolution refutations for an unsatisfiable formula, can be phrased in terms of the existence of a certain strategy for Delayer in a game against Prover, associated with the unsatisfiable formula. Prover wants to conclude the proof as quickly as possible while Delayer wishes to delay the conclusion and earn as many coins as possible. We define such games $G^R$ and $G^R_{sw}$ for decision trees $\mathrm{DT}(R)$ and $\mathrm{DT}_{sw}(R)$, respectively. Below we show (Lemma 8.1) that the existence of certain strategies for Delayer in $G^R$ and $G^R_{sw}$ implies lower bounds on the size of $\mathrm{DT}(R)$ and $\mathrm{DT}_{sw}(R)$ trees, respectively.
The game. Let $\varphi$ be a set of linear clauses and $\Phi_s$ be a set of linear non-equalities. Consider the following game between two parties called Prover and Delayer. The game goes in rounds, each consisting of one move of Prover followed by one move of Delayer. The position in the game is determined by a system of linear non-equalities $\Phi$, which is extended by one non-equality after every round. The starting position is $\Phi_s$.
In each round, Prover presents to Delayer a possible branching $f \ne 0$ and $g \ne 0$ over a linear non-equality $f + g \ne 0$, such that $f + g \ne 0 \in \Phi \cup \{a \ne 0 \mid a \in R \setminus 0\}$ in game $G^R$ or $\Phi \models f + g \ne 0$ in game $G^R_{sw}$. After that, Delayer chooses either $f \ne 0$ or $g \ne 0$ to be added to $\Phi$, or leaves the choice to Prover and thus earns a coin. The game $G^R$ finishes when $\neg C \subseteq \Phi$ for some $C \in \varphi \cup \{0 = 0\}$, and $G^R_{sw}$ finishes when $\Phi \models \neg C$ for some clause $C \in \varphi \cup \{0 = 0\}$. We use the term strategy for Delayer to refer to a set of branching choices made based on the position in the game and the branching options presented.
Lemma 8.1. If there exists a strategy with a starting position $\Phi_s$ for Delayer in the game $G^R$ (respectively, $G^R_{sw}$) that guarantees at least $c$ coins on a set of linear clauses $\varphi$, then the size of a $\mathrm{DT}(R)$ (respectively, $\mathrm{DT}_{sw}(R)$) tree for $\varphi$, with the system $\Phi_s$ at the root, must be at least $2^c$.
Proof. Assume that T is a tree of type DT(R) (respectively, DT sw (R)) for φ. We define an embedding of the full binary tree B c of height c to T inductively as follows. We simulate Prover in the game G R (respectively, G R sw ) by choosing branchings from T and following to a subtree chosen by the Delayer until Delayer decides to earn a coin and leaves the choice to the Prover or until the game finishes. In case we are at a position where Delayer earns a coin, and which corresponds to a vertex u in T , we map the root of B c to u and proceed inductively by embedding two trees B c−1 to the left and right subtrees of u, corresponding to two choices of the Prover.
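The counting behind Lemma 8.1 can be illustrated on an abstract tree in which each internal node records Delayer's reaction to the branching offered there. This is only a sketch: the class `TreeNode`, the field `delayer_pick`, and the two functions are hypothetical names, and the linear non-equalities labeling the edges are abstracted away.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    # "left" or "right": Delayer picks that branch; None: Delayer takes a coin.
    delayer_pick: Optional[str] = None

def is_leaf(node):
    return node.left is None and node.right is None

def count_leaves(node):
    if is_leaf(node):
        return 1
    return count_leaves(node.left) + count_leaves(node.right)

def guaranteed_coins(node):
    """Coins Delayer earns against the worst-case Prover in this tree."""
    if is_leaf(node):
        return 0
    if node.delayer_pick == "left":
        return guaranteed_coins(node.left)
    if node.delayer_pick == "right":
        return guaranteed_coins(node.right)
    # Coin node: Prover chooses the branch that is worse for Delayer.
    return 1 + min(guaranteed_coins(node.left), guaranteed_coins(node.right))

def full_tree(depth):
    """Full binary tree in which Delayer takes a coin at every internal node."""
    if depth == 0:
        return TreeNode()
    return TreeNode(full_tree(depth - 1), full_tree(depth - 1))

mixed = TreeNode(full_tree(2), TreeNode(), delayer_pick="left")
```

In every such tree the number of leaves is at least $2^c$ for $c$ the guaranteed number of coins, since at a coin node both subtrees contain a full binary tree of height $c - 1$.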

Lower bounds for the subset sum with small coefficients.
In this section, we prove tree-like Res(lin Q ) lower bounds for SubSum(f ) (namely, f = 0) including instances where the coefficients of f have small magnitude, as well as tree-like Res sw (lin Q ) lower bounds for ImAv (±x 1 ± · · · ± x n ) (for all possible +/− signs for each of the variables x i ).
The proof of the tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ lower bounds for $\mathrm{SubSum}(f)$ proceeds in two steps. Assume that $f$ depends on $n$ variables (namely, $n$ variables appear in the linear polynomial $f$). First, as in the proof of the dag-like lower bounds in Section 4, we use Lemma 4.2 to transform refutations $\pi$ of $f = 0$ to derivations $\pi'$ of a clause $C_{\pi'}$ from only the Boolean axioms. We ensure that $\pi'$ is not much larger than $\pi$ and $C_{\pi'}$ possesses the following property, which makes it hard to derive in tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$: for every disjunct $g = 0$ in $C_{\pi'}$ the linear polynomial $g$ depends on at least $n/2$ variables. Second, we use Prover-Delayer games to prove the lower bound for tree-like $\mathrm{Res}(\mathrm{lin}_{\mathbb{Q}})$ derivations of any clause with this property. That Delayer's strategy succeeds in earning sufficiently many coins is guaranteed by a bound on the size of essential coverings of hypercubes by Linial & Radhakrishnan (2005).
Definition 8.2. Let $H$ be a set of hyperplanes in $\mathbb{Q}^n$. We say that $H$ forms an essential cover of the Boolean cube $B_n = \{0, 1\}^n$ if: • Every point of $B_n$ is covered by some hyperplane in $H$ (a hyperplane determined by a linear equation $h = 0$ is said to cover a point $b \in B_n$ iff $b$ satisfies the equation).

We use Prover-Delayer games to prove the lower bound below.

Proof. According to the definitions in Section 8.1 the corresponding Prover-Delayer game is on 0 = 0 and starts with the position The game finishes at a position $\Phi$, where $\{x_i \ne 0, x_i \ne 1\} \subseteq \Phi$ for some $i \in [n]$ or $0 \ne 0 \in \Phi$. We now define a Delayer's strategy that guarantees $\Omega(\sqrt{n})$ coins and by Lemma 8.1 obtain the lower bound.
If Φ is a position in the game, denote by Φ c ⊂ Φ the subset of so-called "coin" non-equalities, that is, non-equalities that were chosen by Prover when Delayer decided to leave the choice to Prover and earn a coin. The number |Φ c | is then precisely the number of coins earned by Delayer at Φ.
Let us first informally explain the idea behind the strategy of Delayer. Throughout the game Delayer ignores the original non-equalities from $\Phi_r$ and ensures that $\Phi \setminus \Phi_r$ is 0-1 satisfiable while trying to earn as many coins as possible. Roughly speaking, Delayer wants to let Prover make a choice between $h_1 \ne 0$ and $h_2 \ne 0$ whenever $\Phi \setminus \Phi_r$ does not semantically imply $h_i = 0$ over 0-1 assignments. But Delayer replaces this semantic test with a syntactic test by keeping a partial assignment $\rho_I$ for variables in $I \subseteq [n]$ such that $(\Phi \setminus \Phi_r)|_{\rho_I}$ does not imply $h = 0$ for any $h$: in that case Delayer can check whether either $h_1|_{\rho_I}$ or $h_2|_{\rho_I}$ is the zero polynomial and, if not, leave the choice to Prover. Having such a partial assignment $\rho_I$ constructed guarantees a lower bound for the number of coins: $|\Phi_c| = \Omega(\sqrt{|I|})$ by Theorem 8.3. Moreover, at the end of the game $g_j|_{\rho_I}$ must be the zero polynomial for some $j \in [N]$ and thus $|I| \ge n/2 - 1$, which implies the required lower bound on the number of coins.
We now turn to the formal proof. Throughout the game Delayer constructs a partial assignment $\rho_I$ for variables in $I \subseteq [n]$ and a set of non-equalities $\Phi_I \subseteq \Phi_c$, such that:

Proof. Let $a_1, \dots, a_k$ be the rows of the matrix $A$. The Boolean solutions to the system $Ax \ne b$ are all the points of the $n$-dimensional Boolean hypercube $B_n := \{0, 1\}^n \subset \mathbb{F}^n$ that are not covered by the hyperplanes $H := \{a_1 x - b_1 = 0, \dots, a_k x - b_k = 0\}$. We need to show that if $k < n$ and $0 \in B_n$ is not covered by $H$, then some other point in $B_n$ is not covered by $H$ as well. This follows from ( Thus, if $k < n$ hyperplanes do not cover $B_n$ completely, then they do not cover at least $M(n + 1)$ points. The set $Y(n + 1)$ in the Corollary above consists of all tuples $(y_1, \dots, y_n)$, where $y_i = 2$ for some $i \in [n]$ and $y_j = 1$ for $j \in [n]$, $j \ne i$. Therefore $M(n + 1) = 2$.
Lemma 8.11. Let $Ax \ne b$ be a system of $k$ linear non-equalities over a field $\mathbb{F}$ with $n > k$ variables and let $\alpha \in \{0, 1\}^n$ be a solution to the system. Then, for every choice $I$ of $k + 1$ bits in $\alpha$, there exists a nonempty set of bits in $I$ such that flipping all of them in $\alpha$ results in a new solution to $Ax \ne b$. In other words, if $I \subseteq [n]$ is such that $|I| = k + 1$, then there exists a Boolean assignment $\beta \ne 0$ such that $\{i \mid \beta_i = 1\} \subseteq I$ and $A(\alpha \oplus \beta) \ne b$.
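The formulation of Lemma 8.11 in terms of a nonzero $\beta$ supported in $I$ can be explored by brute force. The sketch below works over the integers (as a subring of $\mathbb{Q}$) with 0-based indices; the function names are hypothetical. It also exhibits a small system where no single bit flip works but flipping two bits does.

```python
import itertools

def satisfies(A, b, x):
    """True iff every row i gives A[i] . x != b[i]."""
    return all(sum(a * xi for a, xi in zip(row, x)) != bi
               for row, bi in zip(A, b))

def flip_sets(A, b, alpha, I):
    """Nonempty subsets S of I whose simultaneous flip keeps alpha a solution."""
    good = []
    for r in range(1, len(I) + 1):
        for S in itertools.combinations(I, r):
            x = tuple(1 - v if i in S else v for i, v in enumerate(alpha))
            if satisfies(A, b, x):
                good.append(S)
    return good

def lemma_8_11_holds(A, b, alpha):
    """Check the lemma for every index set I of size k + 1."""
    k, n = len(A), len(alpha)
    assert n > k and satisfies(A, b, alpha)
    return all(flip_sets(A, b, alpha, I)
               for I in itertools.combinations(range(n), k + 1))

# One non-equality x_1 + x_2 != 1 with solution alpha = (0, 0).
example = flip_sets([[1, 1]], [1], (0, 0), (0, 1))
```

In the example, flipping either bit alone gives $x_1 + x_2 = 1$, so only the set containing both indices is returned.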
Proof. Let $I \subseteq [n]$ with $|I| = k + 1$. Denote by $A_I$ the matrix with columns $\{(1 - 2\alpha_i) a_i \mid i \in I\}$, where $a_i$ is the $i$th column of $A$. That is, $A_I$ is the matrix $A$ restricted to columns $i$ with $i \in I$ and where column $i$ flips its sign iff $\alpha_i$ is 1.
Assume that β ∈ {0, 1} n is nonzero and all its 1's must appear in the indices in I, that is, {i | β i = 1} ⊆ I. Given a set of