Optimal Sparsification for Some Binary CSPs Using Low-degree Polynomials

This paper analyzes to what extent it is possible to efficiently reduce the number of clauses in NP-hard satisfiability problems, without changing the answer. Upper and lower bounds are established using the concept of kernelization. Existing results show that if NP is not contained in coNP/poly, no efficient preprocessing algorithm can reduce n-variable instances of CNF-SAT with d literals per clause, to equivalent instances with O(n^{d-e}) bits for any e>0. For the Not-All-Equal SAT problem, a compression to size Õ(n^{d-1}) exists. We put these results in a common framework by analyzing the compressibility of binary CSPs. We characterize constraint types based on the minimum degree of multivariate polynomials whose roots correspond to the satisfying assignments, obtaining (nearly) matching upper and lower bounds in several settings. Our lower bounds show that not just the number of constraints, but also the encoding size of individual constraints plays an important role. For example, for Exact Satisfiability with unbounded clause length it is possible to efficiently reduce the number of constraints to n+1, yet no polynomial-time algorithm can reduce to an equivalent instance with O(n^{2-e}) bits for any e>0, unless NP is a subset of coNP/poly.


Introduction
The goal of sparsification is to make an object such as a graph or logical structure less dense, without changing the outcome of a computational task of interest. Sparsification can be used to speed up the solution of NP-hard problems, by sparsifying a problem instance before solving it. The notion of kernelization, originating in the field of parameterized complexity [8,12,13], facilitates a rigorous study of polynomial-time preprocessing for NP-hard problems and can be used to reason about (the impossibility of) sparsification. Over the last few years, our understanding of the power of polynomial-time data reduction has increased tremendously, as documented in recent surveys [4,16,23,26]. By studying the kernelization complexity of a graph problem parameterized by the number of vertices, or of a logic problem parameterized by the number of variables, we can analyze its potential for sparsification.

The vast majority of the currently known results in this direction are negative [10,18,19,20], stating that no nontrivial sparsification is possible under plausible complexity-theoretic assumptions. For example, Dell and van Melkebeek [10] obtained such a result for CNF-Satisfiability with clauses of size at most d (d-cnf-sat), for each fixed d ≥ 3. Assuming NP ⊄ coNP/poly, there is no polynomial-time algorithm that compresses any n-variable instance of d-cnf-sat to an equivalent instance with O(n^{d−ε}) bits for any ε > 0. Since there are O(n^d) possible clauses of size at most d over n variables, the trivial compression scheme that outputs a bitstring of length O(n^d), denoting for each possible clause whether it occurs in the instance or not, is optimal up to n^{o(1)} factors.
A problem for which nontrivial polynomial-time sparsification is possible was recently discovered by the current authors [20]. Any n-variable instance of the Not-All-Equal CNF-Satisfiability problem with clauses of size at most d (d-nae-sat) can efficiently be compressed to an equivalent instance with O(n^{d−1}) clauses, which can be encoded in O(n^{d−1} log n) bits. The preprocessing algorithm is based on a linear-algebraic lemma by Lovász [27] to identify clauses that are implied by others, allowing a reduction from Θ(n^d) clauses to O(n^{d−1}). This sparsification for d-nae-sat forms the starting point for this work. Since d-cnf-sat and d-nae-sat can both be seen as constraint satisfaction problems (CSPs) with a binary domain, it is natural to ask whether the positive results for d-nae-sat extend to other binary CSPs. The difference between d-cnf-sat and d-nae-sat shows that the type of constraints one allows affects the compressibility of the resulting CSP. The goal of this paper is to understand how the optimal compression size for a binary CSP depends on the type of legal constraints, with the aim of obtaining matching upper and lower bounds.
Before presenting our results, we give an example to illustrate our methods. Consider the NP-complete Exact d-CNF-Satisfiability (Exact d-sat) problem, which asks whether there is a truth assignment that satisfies exactly one literal in each clause; the clauses have size at most d. While there are Θ(n^d) different clauses that can occur in an instance with n variables, the exact nature of the problem makes it possible to reduce any instance to an equivalent one with n + 1 clauses. A clause such as x_1 ∨ x_3 ∨ ¬x_5 naturally corresponds to an equality constraint of the form x_1 + x_3 + (1 − x_5) = 1, since a 0/1-assignment to the variables satisfies exactly one literal of the clause if and only if it satisfies the equality. To find redundant clauses, transform each of the m clauses into an equality to obtain a system of equalities Ax = b where A is an m × n matrix, x is the column vector (x_1, …, x_n), and b is an integer column vector. Using Gaussian elimination, one can efficiently compute a basis B for the row space of the extended matrix (A|b): a set of equalities such that every equality can be written as a linear combination of equalities in B. Since (A|b) has n + 1 columns, its rank is at most n + 1 and the basis B contains at most n + 1 equalities. To perform data reduction, remove all clauses from the Exact d-sat instance whose corresponding equalities do not occur in B. If an assignment satisfies f_1(x) = b_1 and f_2(x) = b_2, then it also satisfies their sum f_1(x) + f_2(x) = b_1 + b_2, and any linear combination of the satisfied equalities. Since any equality not in B can be written as a linear combination of equalities in B, a truth assignment satisfying all clauses from B must necessarily also satisfy the remaining clauses, which shows the correctness of the data reduction procedure. The resulting instance can be encoded in O(n log n) bits, as each of the remaining n + 1 clauses has d ∈ O(1) literals.
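The data reduction for Exact d-sat described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: clauses are encoded as lists of signed 1-based integers (a hypothetical encoding), the helper names are invented for this example, and exact rational arithmetic stands in for Gaussian elimination over Q.

```python
from fractions import Fraction
from itertools import product

def clause_to_row(clause, n):
    """Turn a clause (signed 1-based literals) into a row of the matrix (A|b).

    A positive literal x_i contributes +x_i. A negated literal contributes
    (1 - x_i): coefficient -1, with the constant 1 moved to the right-hand side.
    """
    row = [Fraction(0)] * (n + 1)
    row[n] = Fraction(1)          # right-hand side: exactly one satisfied literal
    for lit in clause:
        i = abs(lit) - 1
        if lit > 0:
            row[i] += 1
        else:
            row[i] -= 1
            row[n] -= 1
    return row

def row_basis_indices(rows):
    """Indices of input rows that form a basis of the row space."""
    pivots, kept = [], []         # pivots holds (pivot column, reduced row)
    for idx, row in enumerate(rows):
        r = list(row)
        for col, prow in pivots:  # eliminate every earlier pivot column
            if r[col] != 0:
                factor = r[col] / prow[col]
                r = [a - factor * b for a, b in zip(r, prow)]
        lead = next((c for c, v in enumerate(r) if v != 0), None)
        if lead is not None:      # row is independent of the rows kept so far
            pivots.append((lead, r))
            kept.append(idx)
    return kept

def exactly_one(clause, assignment):
    """True if exactly one literal of the clause is satisfied."""
    return sum(assignment[abs(l) - 1] == (l > 0) for l in clause) == 1

# x1 v x2,  not-x2 v x3,  x1 v x3: the third equality is the sum of the first two.
clauses = [[1, 2], [-2, 3], [1, 3]]
n = 3
kept = row_basis_indices([clause_to_row(c, n) for c in clauses])
reduced = [clauses[i] for i in kept]
```

In this small instance the third clause is dropped, and a brute-force check over all eight assignments confirms that the reduced and original instances have the same satisfying assignments.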

Our results
Our positive results are generalizations of the linear-algebraic data reduction tool for binary CSPs presented above. They reveal that the Õ(n)-bit compression for Exact d-sat, the Õ(n^{d−1})-bit compression for d-nae-sat, and the O(n^d)-bit compression for d-cnf-sat are samples of a gliding scale of problem complexity: more tightly constrained problems can be compressed better. We formalize this idea by considering a generic CSP whose constraints are of the form f(x) = 0, where f is a bounded-degree polynomial and the constraint demands that x is a root of f. The example given earlier shows that Exact d-sat can be expressed using degree-1 polynomials. We show that d-nae-sat and d-cnf-sat can be expressed using equalities of polynomial expressions of degree d − 1 and d, respectively. We study the following problem, d-Polynomial root CSP: given a list L of equalities f(x) = 0 over a set V of n binary variables, where each f is a polynomial of degree at most d, decide whether some 0/1-assignment to V satisfies all equalities in L.

Using a generalization of the argument presented above, the number of constraints in an instance of d-Polynomial root CSP can efficiently be reduced to O(n^d), even when the number of variables that occur in a constraint is not restricted. The latter implies, for example, that using degree-1 polynomials one can express the Exact sat problem with clauses of arbitrary size. When the number of variable occurrences in a constraint can be as large as n, it may take Ω(n) bits to encode a single constraint. After reducing the number of clauses in an Exact sat instance to n + 1, one may therefore still require Θ(n^2) bits to encode the instance. This turns out to be unavoidable: we prove that Exact sat has no sparsification of size O(n^{2−ε}) for any ε > 0, unless NP ⊆ coNP/poly. In general, we compress instances of d-Polynomial root CSP to bitsize O(n^{d+1}) when each constraint can be encoded in O(n) bits. We prove that no compression to size O(n^{d+1−ε}) is possible unless NP ⊆ coNP/poly. When each constraint can be encoded in O(1) bits, the constraint reduction scheme reduces the size of an instance to O(n^d).
As we will show that d-nae-sat can be modeled using polynomials of degree d − 1, this method strictly generalizes our earlier results [20] for d-nae-sat. The linear-algebraic data reduction tool described above works over arbitrary fields F, allowing us to capture constraints such as "the number of satisfied literals in the clause is exactly two, when evaluated modulo 3". We therefore extend our study to the d-Polynomial root CSP problem over arbitrary fields F, and obtain similar positive and negative results.
Finally, we consider binary CSPs whose constraints are formed by inequalities, rather than equalities, of degree-d polynomials. This leads to the following generic problem, d-Polynomial non-root CSP: given a list L of inequalities f(x) ≠ 0 over a set V of n binary variables, where each f is a polynomial of degree at most d, decide whether some 0/1-assignment to V satisfies all inequalities in L. We present upper and lower bounds for problems of this type. When the polynomials are evaluated over a structure that is not a field, the situation changes significantly. For example, CSPs with constraints of the type "the number of satisfied literals in the clause is 1 or 2, when evaluated modulo 6" behave differently than the corresponding problem modulo 5, or modulo 7, because the integers modulo 6 do not form a field. Both our upper and lower bound techniques fail when defining constraints with respect to composite moduli. We present connections to different areas of theoretical computer science where the distinction between prime and composite moduli plays a big role. More concretely, we show that obtaining polynomial sparsification upper bounds for d-Polynomial non-root CSP over the integers modulo a composite would resolve a long-standing problem concerning the representation of the or-function using low-degree polynomials (cf. [2,3,29]).

Related work
Schaefer's Theorem [28] is a classic result relating the complexity of a binary CSP to the type of allowed constraints, separating the NP-complete from the polynomial-time solvable cases. A characterization of the kernelization complexity of min-ones CSPs parameterized by the number of variables was presented by Kratsch and Wahlström [25]. There are several parameterized complexity results for CSPs [7,9,24].

Preliminaries
A parameterized problem Q is a subset of Σ* × N, where Σ is a finite alphabet. Let Q, Q′ ⊆ Σ* × N be parameterized problems and let h : N → N be a computable function. A generalized kernel for Q into Q′ of size h(k) is an algorithm that, on input (x, k) ∈ Σ* × N, takes time polynomial in |x| + k and outputs an instance (x′, k′) such that: 1. |x′| and k′ are bounded by h(k), and 2. (x′, k′) ∈ Q′ if and only if (x, k) ∈ Q. A generalized kernel is polynomial if h(k) is a polynomial. Since a polynomial-time reduction to an equivalent sparse instance yields a generalized kernel, we use lower bounds for the sizes of generalized kernels to prove the non-existence of sparsification algorithms.
A linear-parameter transformation from a parameterized problem Q to a parameterized problem Q′ is a polynomial-time algorithm that transforms any instance (x, k) of Q into an equivalent instance (x′, k′) of Q′ such that k′ ∈ O(k). It is easy to see (cf. [6]) that the existence of a linear-parameter transformation from Q to Q′, together with a (generalized) kernel of size O(k^d) for Q′, yields a generalized kernel of size O(k^d) for Q. By contraposition, the existence of such a transformation implies that when Q does not have generalized kernels of size O(k^{d−ε}), then Q′ does not have generalized kernels of size O(k^{d−ε}) either.
We use the framework of cross-composition [5] to establish kernelization lower bounds, requiring the definitions of polynomial equivalence relations [5, Def. 3.1] and or-cross-compositions [5].

Theorem 1 ([5, Theorem 6]). Let L ⊆ Σ* be a language, let Q ⊆ Σ* × N be a parameterized problem, and let d, ε be positive reals. If L is NP-hard under Karp reductions, has an or-cross-composition into Q with cost f(t) = t^{1/d+o(1)}, where t denotes the number of instances, and Q has a polynomial (generalized) kernelization with size bound O(k^{d−ε}), then NP ⊆ coNP/poly.
For d ∈ N we will refer to an or-cross-composition of cost f(t) = t^{1/d} log(t) as a degree-d cross-composition. By Theorem 1, a degree-d cross-composition can be used to rule out generalized kernels of size O(k^{d−ε}). Note that when studying sparsification, we use the number of vertices or variables in the instance (which is usually denoted by n) as the parameter value (which is usually denoted by k).
When interpreting truth assignments as elements of a field, we equate the value true with the 1 element in the field (multiplicative identity), and the value false with the 0 element (additive identity). Consequently, for a boolean variable x its negation ¬x corresponds to (1 − x). We let Z/mZ denote the integers modulo m, which form a field if m is a prime number. The degree of a multivariate polynomial is the maximum degree of its monomials. Let f(x_1, …, x_d) be a d-variate polynomial over a field F. The root set of f is the algebraic variety {(e_1, …, e_d) ∈ F^d | f(e_1, …, e_d) = 0}. For a field F and a finite set S ⊆ F of elements, the univariate polynomial f(x) := ∏_{s∈S} (x − s) over F of degree |S| has root set exactly S. We say that a field F is efficient if the field operations and Gaussian elimination can be done in polynomial time in the size of a reasonable input encoding. The field of rational numbers Q, and all finite fields, are efficient. We use [n] to denote {1, …, n}.
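As a small illustration of the univariate polynomial with a prescribed root set, the following sketch builds ∏_{s∈S} (x − s) over Z/pZ and checks its roots by exhaustive evaluation. The function names and the coefficient-list representation (lowest degree first) are assumptions made for this example only.

```python
def poly_with_root_set(S, p):
    """Coefficients (lowest degree first) of prod_{s in S} (x - s) over Z/pZ."""
    coeffs = [1]
    for s in S:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] = (new[i + 1] + c) % p   # contribution of c * x
            new[i] = (new[i] - s * c) % p       # contribution of c * (-s)
        coeffs = new
    return coeffs

def evaluate(coeffs, x, p):
    """Evaluate the polynomial at x modulo p using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def root_set(coeffs, p):
    """All residues modulo p at which the polynomial vanishes."""
    return {x for x in range(p) if evaluate(coeffs, x, p) == 0}

# (x - 1)(x - 2) = x^2 - 3x + 2, which is x^2 + 2x + 2 modulo 5.
coeffs_mod_5 = poly_with_root_set([1, 2], 5)
roots_mod_5 = root_set(coeffs_mod_5, 5)
```

Because p is prime here, the product has no roots beyond those in S; the degree of the resulting polynomial is |S|.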
The Õ-notation suppresses polylogarithmic factors: Õ(n) = O(n log^c n) for a constant c. For statements marked with a ( ), the proof can be found in the full version [21].

3 Kernel upper bounds

Polynomial root CSP
We start by showing how to reduce the number of constraints in instances of d-Polynomial root CSP, by extending the argument presented in the introduction.

Theorem 2.
There is a polynomial-time algorithm that, given an instance (L, V) of d-Polynomial root CSP over an efficient field F, outputs an equivalent instance (L′, V) with at most n^d + 1 constraints such that L′ ⊆ L.
Proof. Given a list L of polynomial equalities over variables V for d-Polynomial root CSP, we use linear algebra to find redundant constraints. Observe that x_i^c = x_i for all 0/1-assignments and c ∈ N^+. As constraints are evaluated over 0/1-assignments, we may assume without loss of generality that the monomials in each of the polynomials are multilinear: each monomial consists of a coefficient from F multiplied by distinct variables.
Create a matrix A with |L| rows and a column for every multilinear monomial of degree at most d over variables from V. Let position a_{i,j} in A be the coefficient of the monomial corresponding to column j in the polynomial equality corresponding to row i.
Compute a basis B of the row space of matrix A, for example using Gaussian elimination [17], and let L′ consist of the equalities in L whose corresponding row appears in the basis. Since L′ ⊆ L, it follows that if the original instance has a satisfying assignment, the reduced instance has a satisfying assignment as well. The crucial part of the correctness proof is to establish the converse: any 0/1-assignment τ that satisfies all equalities in L′ also satisfies all equalities in L.

Proof. Consider any equality (f(x) = 0) ∈ L \ L′, since equalities in L′ are satisfied by assumption, and assume it corresponds to the i'th matrix row. Let f_j(x) be the polynomial represented in the j'th row of matrix A for j ∈ [|L|]. Without loss of generality, let the basis of A correspond to its first m rows a_1, …, a_m. We then have i > m, and by the definition of basis there exist β_1, …, β_m ∈ F such that a_i = Σ_{j=1}^{m} β_j a_j. Let t be the column vector containing, for each multilinear monomial of degree ≤ d in variables x_1, …, x_n, the evaluation under τ. For example, for monomial x_1 x_3 it contains τ(x_1) · τ(x_3). By using the same order of monomials as in the construction of A, we obtain for all j ∈ [|L|] that f_j(τ(x_1), …, τ(x_n)) = a_j t, the inner product of a_j and t. It follows that a_j t = 0 for all j ∈ [m], since satisfying L′ implies f_j(τ(x_1), …, τ(x_n)) = 0. To conclude the proof, note that f_i(τ(x_1), …, τ(x_n)) = a_i t = (Σ_{j=1}^{m} β_j a_j) t = Σ_{j=1}^{m} β_j (a_j t) = 0.

The reduced instance has at most n^d + 1 constraints. Proof. The size of a basis of any matrix over a field equals its rank, which is bounded by the number of columns. As there is a column for each multilinear monomial of degree at most d, there are at most Σ_{i=0}^{d} C(n, i) ≤ n^d + 1 columns, where C(n, i) denotes the binomial coefficient.

Let F(x) be a polynomial with root set S_j (mod p) of degree at most |S_j|. We obtain F(f(x)) ≡ 0 (mod p) if and only if x satisfies the clause. Note that the degree of F(f(x)) is at most |S_j| ≤ d.
Applying Theorem 2 to the resulting instance of d-Polynomial root CSP identifies a subset of at most n^d + 1 constraints which preserve the answer to the Sat problem. Each clause contains at most 2n literals, which can be encoded in O(log n) bits each. Additionally, for each clause we need to store the set S_i of at most d integers, which have value at most 2n in relevant inputs. As d is a constant, the instance can be encoded in O(n^{d+1} log n) bits.
Corollary 5 yields a new way to get a nontrivial compression for d-nae-sat, which is conceptually simpler than the existing approach which requires an unintuitive lemma by Lovász [27]. The new approach gives the same size bound as given earlier [20].
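The constraint reduction behind Theorem 2 can be sketched as follows for the field Z/pZ. This is an illustrative sketch under assumptions rather than the authors' code: polynomials are represented as dictionaries from sets of variable indices (multilinear monomials) to coefficients, the function names are hypothetical, and the modular inverse via `pow(a, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations, product

def monomials(n, d):
    """All multilinear monomials of degree at most d over variables 0..n-1."""
    return [frozenset(c) for k in range(d + 1) for c in combinations(range(n), k)]

def reduce_constraints(polys, n, d, p):
    """Indices of constraints whose coefficient rows form a row basis modulo p.

    Each polynomial is a dict {frozenset of variables: coefficient}; it is
    assumed to be multilinear already and to have degree at most d.
    """
    cols = monomials(n, d)
    pivots, kept = [], []               # pivots holds (pivot column, reduced row)
    for idx, f in enumerate(polys):
        row = [f.get(mono, 0) % p for mono in cols]
        for col, prow in pivots:        # eliminate every earlier pivot column
            if row[col]:
                factor = row[col] * pow(prow[col], -1, p) % p
                row = [(a - factor * b) % p for a, b in zip(row, prow)]
        lead = next((c for c, v in enumerate(row) if v), None)
        if lead is not None:
            pivots.append((lead, row))
            kept.append(idx)
    return kept

def evaluate(f, assignment, p):
    """Value of a multilinear polynomial under a 0/1-assignment, modulo p."""
    return sum(c for mono, c in f.items() if all(assignment[v] for v in mono)) % p

# Example over Z/5Z with n = 3 variables and degree d = 2.
p, n, d = 5, 3, 2
f1 = {frozenset({0, 1}): 1, frozenset({2}): 4}       # x0*x1 - x2
f2 = {frozenset({0}): 1, frozenset(): 4}             # x0 - 1
f3 = {frozenset({0, 1}): 2, frozenset({2}): 3,
      frozenset({0}): 1, frozenset(): 4}             # 2*f1 + f2
polys = [f1, f2, f3]
kept = reduce_constraints(polys, n, d, p)
```

In this example the third constraint is a linear combination of the first two, so only the first two are kept. The number of columns is Σ_{i=0}^{d} C(n, i), here 7, which is at most n^d + 1 = 10.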

Polynomial non-root CSP
In this section we consider d-Polynomial non-root CSP. In Section 4.2 we will show that, over the field of rational numbers, the problem cannot be compressed to size polynomial in n, unless NP ⊆ coNP/poly. We therefore consider the field Z/pZ of integers modulo a prime p.

Let F : Z/pZ → Z/pZ be a polynomial of degree p − 1 with root set {1, …, p − 1} modulo p, which exists since Z/pZ is a field. Then f(x) ≢ 0 (mod p) can equivalently be stated as F(f(x)) ≡ 0 (mod p). It is easy to see that F(f(x)) is a polynomial of degree at most d(p − 1). Therefore, L can be written as an instance of d(p − 1)-Polynomial root CSP by replacing every polynomial f by F ∘ f. By Theorem 2, this instance can be reduced to at most n^{d(p−1)} + 1 constraints, and the proof follows.
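One concrete choice for the polynomial F, given here as an assumption for illustration since the text only asserts that F exists, is F(y) = y^{p−1} − 1: by Fermat's little theorem, y^{p−1} ≡ 1 (mod p) exactly when y ≢ 0 (mod p). The sketch below checks the root set for a few primes and verifies the equivalence between a non-root constraint and its root form on a hypothetical degree-1 polynomial.

```python
from itertools import product

def F(y, p):
    """F(y) = y^(p-1) - 1 over Z/pZ, for a prime p."""
    return (pow(y, p - 1, p) - 1) % p

def root_set_of_F(p):
    """Residues modulo p at which F vanishes."""
    return {y for y in range(p) if F(y, p) == 0}

def f(x, p):
    """Hypothetical example polynomial of degree 1: f(x1, x2) = x1 + x2 (mod p)."""
    return sum(x) % p

p = 3
# The non-root constraint f(x) != 0 holds exactly when F(f(x)) == 0.
equivalent = all((f(x, p) != 0) == (F(f(x, p), p) == 0)
                 for x in product([0, 1], repeat=2))
```

For p = 3 the composed polynomial F(f(x)) = (x_1 + x_2)^2 − 1 has degree 2 = d(p − 1), matching the degree bound stated above.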
In Section 4.2 we will establish a nearly-matching lower bound counterpart to Theorem 7.

4 Kernel lower bounds

Polynomial root CSP
We now turn our attention to lower bounds, starting with d-Polynomial root CSP over Q.
We start by proving that Exact Red-Blue Dominating Set does not have generalized kernels of bitsize O(n^{2−ε}) for any ε > 0, unless NP ⊆ coNP/poly. The same lower bound for 1-Polynomial root CSP will follow by a linear-parameter transformation. We then show how to generalize this result to d-Polynomial root CSP. As a starting problem for the cross-composition we will use the NP-hard Red-Blue Dominating Set (rbds) [11,22]: given a bipartite graph with red vertices R and blue vertices B and an integer k, decide whether there is a set D ⊆ R of size at most k such that every vertex in B has at least one neighbor in D. Exact Red-Blue Dominating Set (erbds) is defined similarly, except that every vertex in B must have exactly one neighbor in D. Furthermore we will not bound the size of such a set, but merely ask for the existence of any erbds.

Theorem 8. Exact Red-Blue Dominating Set parameterized by the number of vertices n does not have a generalized kernel of size O(n^{2−ε}) for any ε > 0, unless NP ⊆ coNP/poly.

Proof. We will prove this result by giving a degree-2 cross-composition from rbds to erbds. We start by giving a polynomial equivalence relation R on inputs of rbds. Let two instances of rbds be equivalent under R if they have the same number of red vertices, the same number of blue vertices, and the same maximum size of a rbds. It is easy to check that R is a polynomial equivalence relation.
Assume we are given t instances of rbds, labeled X_{i,j} for i, j ∈ [√t], from the same equivalence class of R. If the number of instances given is not a square, we duplicate one of the input instances until a square number is reached. Since this changes the number of inputs by at most a factor four, this does not influence the cross-composition. Instance X_{i,j} consists of graph G_{i,j} with a set of red vertices R_{i,j} and blue vertices B_{i,j}. Call the number of red vertices in every instance m_R, the number of blue vertices m_B, and the required size of the dominating set k. For each instance enumerate the red vertices as r_1, …, r_{m_R} and the blue vertices as b_1, …, b_{m_B}, arbitrarily. Create instance G′ for erbds by the following steps. Figure 1 shows a sketch of G′.

For each i ∈ [k] add the edge from
]. By steps 1 to 3, the graph induced by the vertices in U ∪ V consists of k vertex-disjoint copies of G , . The next steps are used to ensure that there are exactly k vertices from U in any erbds, which must all belong to the same set U .  + 1 red vertices labeled a 1 , . . . , a k+1 that are all connected to a blue vertex b that is private to the gadget. Furthermore, in gadget c i,j , the vertex

Create k blue vertices
. By this construction an erbds uses at most one red vertex from each gadget, to dominate one vertex from V .
It follows that exactly one vertex in this set is in E for all i. By the previous argument the vertex cannot be from U^x for x ≠ ℓ, hence it is from U^ℓ.

Claim 12.
If G′ has an erbds, then some input X_{i,j} has a rbds of size at most k.
Proof. Assume G′ has an erbds, say E. By Claim 11, there exists ℓ_2 ∈ [√t], such that for every j ∈ [m_B] at least one of the vertices in {v_{i,j} | i ∈ [k]} has a neighbor in E ∩ U. By Claim 9, there exists ℓ_1 ∈ [√t] with U^i ∩ E = ∅ for all i ≠ ℓ_1, so these neighbors lie in U^{ℓ_1}. We now construct a rbds E′ for instance X_{ℓ_1,ℓ_2}. For each j ∈ [m_R], add r_j to E′ if E ∩ {u^{ℓ_1}_{i,j} | i ∈ [k]} ≠ ∅. By Claim 9, it follows that E′ has size at most k, as required. It remains to show that every vertex in B_{ℓ_1,ℓ_2} has a neighbor in E′. If some vertex b_j from B_{ℓ_1,ℓ_2} does not have a neighbor in E′, then none of the vertices {v^{ℓ_2}_{i,j} | i ∈ [k]} have a neighbor in E ∩ U^{ℓ_1}. This contradicts Claim 11. Hence E′ is an rbds of size at most k for B_{ℓ_1,ℓ_2}.
Claim 13. If some input instance has a rbds of size at most k, then G′ has an erbds.
Proof. Suppose instance X_{ℓ_1,ℓ_2} has a rbds E′ of size k consisting of vertices {r_{i_1}, …, r_{i_k}} ⊆ R_{ℓ_1,ℓ_2}. We construct an erbds E for G′. Start by choosing vertices u^{ℓ_1}_{x,i_x} for x ∈ [k], so for every vertex in E′ we pick one vertex in the erbds for G′. Furthermore we choose the red vertices z_{ℓ_1} and y_{ℓ_2} to be in E. To exactly dominate the blue vertices in V, we use the gadgets in C as follows. For ℓ ≠ ℓ_2 ∈ [√t], add red vertex a_x of gadget c_{x,j} if vertex v_{x,j} does not yet have a neighbor in E, for j ∈ [m_R]. Else, add vertex a_{k+1} of gadget c_{x,j} to E, in order to exactly dominate the blue vertex of this gadget.
To exactly dominate the vertices in V^{ℓ_2} we apply a similar procedure, except that gadget c_{1,j} cannot be used since its blue vertex b is already dominated by y_{ℓ_2}. Since E′ is a rbds of instance X_{ℓ_1,ℓ_2}, for each j ∈ [m_B] at least one vertex from set {v^{ℓ_2}_{i,j} | i ∈ [k]} has a neighbor in E ∩ U. As such, the k − 1 remaining gadgets can be used to each dominate one of the k − 1 remaining vertices in this set, if they do not already have a neighbor in E ∩ U. If no red vertex of a gadget is needed to dominate, we choose vertex a_{k+1} of the gadget in E to dominate the blue vertex in the gadget. It is straightforward to verify that this results in an erbds for G′.
From Claims 12 and 13 it follows that graph G′ has an erbds if and only if at least one of the input instances has a rbds of size at most k. The graph G′ has O(√t · (m_R + m_B)^3) vertices, which is suitably bounded for a cross-composition. By Theorem 1, it follows that erbds parameterized by the number of vertices n does not have a generalized kernel of size O(n^{2−ε}) for any ε > 0, unless NP ⊆ coNP/poly.
Using Theorem 8 we provide lower bounds for constraint satisfaction problems.

Proof. The case d = 1 is covered by Corollary 14; we consider d ≥ 2 and give a degree-(d + 1) cross-composition from rbds, re-using some parts of the proof of Theorem 8. Suppose we are given t = r^{d+1} instances of rbds from the same equivalence class of R, all having m_R red vertices and m_B blue vertices. By a similar padding argument as before, we may assume r is an integer. Split the inputs into groups of size r^2 and apply the cross-composition of Theorem 8 to each group, followed by the linear-parameter transformation in Corollary 14. We obtain r^{d−1} instances of 1-Polynomial root CSP with O(r · poly(m_R + m_B)) variables each, such that the answer to each composed instance is the logical or of the answers to the rbds instances in its group. Label the instances resulting from the group compositions X_{i_1,…,i_{d−1}} with i_1, …, i_{d−1} ∈ [r]. They all use the same number of variables; label the variables in each instance as x_1, …, x_q. Create an instance L of d-Polynomial root CSP as follows:

Then from instance X_{i_1,…,i_{d−1}}, all polynomial equations are copied to L and multiplied by 1 on both sides. Hence they are satisfied by the assignment to x.
(⇐) Suppose instance X_{i_1,…,i_{d−1}} of 1-Polynomial root CSP has a satisfying assignment. Set the x-variables according to this assignment. Furthermore, set variables y^z_{i_z} for z ∈ [d − 1] to 1, and set all other variables to 0. Thereby the sum of variables in each set Y_i is 1, as required. Furthermore, any equation added in Step 2 of the construction is satisfied in the following way. If it belongs to instance X_{i_1,…,i_{d−1}}, it is satisfied by definition. Equations belonging to any other instance are trivially satisfied since both sides are multiplied by zero.
Observe that the polynomials constructed in Theorem 15 have a simple form: each polynomial is a product of (d − 1) Y-variables multiplied by a sum of distinct variables from x. Each polynomial can therefore be encoded in O(n) bits, where n is the number of variables in the constructed CSP. The sparsification of Theorem 2 therefore encodes such instances in O(n^{d+1}) bits. The lower bound shows that this is optimal up to n^{o(1)} factors.
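The selector mechanism described in the proof can be illustrated with a small brute-force sketch. It is a reading of the construction under assumptions, not a reproduction of it: each group of selector variables is required to sum to exactly 1, and each linear equation of instance X_{i_1,…,i_{d−1}} is multiplied on both sides by the product of the selectors y^1_{i_1}, …, y^{d−1}_{i_{d−1}}. All function names and the toy instances are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import prod

def composed_ok(instances, y, x):
    """Check the composed instance under selectors y and 0/1-assignment x.

    instances maps index tuples (i_1, ..., i_{d-1}) to lists of linear
    equations (coeffs, rhs); y[z][i] is selector variable i of group z.
    """
    if any(sum(group) != 1 for group in y):   # each group selects exactly one index
        return False
    for index, equations in instances.items():
        selector = prod(y[z][i] for z, i in enumerate(index))
        for coeffs, rhs in equations:
            lhs = sum(c * v for c, v in zip(coeffs, x))
            if selector * lhs != selector * rhs:   # both sides multiplied by selector
                return False
    return True

def composed_satisfiable(instances, r, groups, q):
    """Brute force over all selector and x assignments."""
    for bits in product([0, 1], repeat=groups * r + q):
        y = [list(bits[z * r:(z + 1) * r]) for z in range(groups)]
        x = bits[groups * r:]
        if composed_ok(instances, y, x):
            return True
    return False

def instance_satisfiable(equations, q):
    """Brute force for a single system of linear equations over 0/1 values."""
    return any(all(sum(c * v for c, v in zip(coeffs, x)) == rhs
                   for coeffs, rhs in equations)
               for x in product([0, 1], repeat=q))

# d = 3, so d - 1 = 2 selector groups of r = 2 variables; q = 2 x-variables.
instances = {
    (0, 0): [([1, 1], 3)],                 # unsatisfiable over 0/1 values
    (0, 1): [([1, 1], 1), ([1, 0], 1)],    # satisfied by x = (1, 0)
    (1, 0): [([1, 0], 2)],                 # unsatisfiable
    (1, 1): [([1, 1], 5)],                 # unsatisfiable
}
all_unsat = dict(instances)
all_unsat[(0, 1)] = [([1, 1], 3)]
```

Equations of unselected instances reduce to 0 = 0, so the composed system behaves as the logical or of the individual instances in this toy example.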
We expect the lower bound of Theorem 15 to extend to arbitrary finite fields of prime order, except for the case d = 1 over Z/2Z, which is polynomial-time solvable [28].

Polynomial non-root CSP
We start our lower bound discussion for d-Polynomial non-root CSP by considering polynomials over Q. 1-Polynomial non-root CSP over Q does not have a generalized kernel of size bounded by any polynomial in n, unless NP ⊆ coNP/poly. This follows from the fact that CNF-Satisfiability parameterized by the number of variables does not have a kernel of size polynomial in n unless NP ⊆ coNP/poly [10,14], together with the fact that a clause such as (x_1 ∨ ¬x_3 ∨ x_4) is satisfied by a 0/1-assignment if and only if x_1 + (1 − x_3) + x_4 ≠ 0 over Q. In the remainder of the section we investigate the behavior over finite fields.
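The correspondence between a CNF clause and a degree-1 non-root constraint over Q can be checked by brute force. The sketch below uses a hypothetical encoding of clauses as lists of signed 1-based integers; the helper names are assumptions for this example.

```python
from itertools import product

def clause_satisfied(clause, a):
    """True if at least one literal of the clause is true under assignment a."""
    return any(a[abs(l) - 1] == (l > 0) for l in clause)

def literal_sum(clause, a):
    """Sum with x_i for a positive literal and (1 - x_i) for a negated one."""
    return sum(a[abs(l) - 1] if l > 0 else 1 - a[abs(l) - 1] for l in clause)

# The clause (x1 or not-x3 or x4) over four variables.
clause = [1, -3, 4]
agree = all(clause_satisfied(clause, a) == (literal_sum(clause, a) != 0)
            for a in product([0, 1], repeat=4))
```

Every term of the sum is 0 or 1, so over Q the sum is zero exactly when every literal is false.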
In Theorem 7 we provided a kernel for d-Polynomial non-root CSP over Z/pZ for primes p. It is natural to ask whether similar results can be obtained when working with polynomials modulo an arbitrary integer m. When m is composite, our kernelization fails. We can show that this is not a shortcoming of our proof strategy, but a necessity due to the fact that constraints expressed by equalities of degree-d polynomials modulo composite numbers can model more complex constraints than degree-d polynomials modulo a prime. For example, it is known (cf. [1, §2]) that there is a degree-3 polynomial f over the integers modulo 6 which represents a logical or of size 27 in the following way:

f(x_1, …, x_27) ≡ 0 (mod 6) if and only if x_1 = … = x_27 = 0. (1)

By this expressibility of a size-27 or by a polynomial of degree 3 over Z/6Z using the same variables, it is easy to give a linear-parameter transformation from 27-cnf-sat to 3-Polynomial non-root CSP (mod 6). Using known lower bounds for d-cnf-sat [10, Theorem 1], this implies the latter problem has no kernel of O(n^{27−ε}) bits, unless NP ⊆ coNP/poly. Plugging in the degree of 3 and modulus 6 into the bound of Theorem 7 would give a reduction to O(n^{3·(6−1)}) = O(n^{15}) constraints and would contradict the lower bound. The example therefore shows that the problem is more complex for composite moduli. For more general non-primes, we can prove a lower bound using a general construction by Bhowmick et al. [3] of low-degree polynomials representing or in the sense of Equation 1. In case m does not have a prime factorization in which all primes are distinct, it is possible to obtain a weaker lower bound using a result by Barrington et al. [2], which proves that there exists a polynomial of degree O(ℓ N^{1/r}) that represents a logical or when taken modulo m. Here ℓ is the largest prime factor of m. For prime moduli, we provide a lower bound almost matching the upper bound in Section 3.2.

Conclusion
We have given upper and lower bounds on the kernelization complexity of binary CSPs that can be represented by polynomial (in)equalities, obtaining tight sparsification bounds in several cases. Our main conceptual contribution is to analyze constraints on binary variables based on the minimum degree of multivariate polynomials whose roots, or non-roots, capture the satisfying assignments. The ultimate goal of this line of research is to characterize the optimal sparsification size of a binary CSP based on easily accessible properties of the constraint language. To reach this goal, several significant hurdles have to be overcome. For d-Polynomial non-root CSP (mod 6), we do not know of any way to reduce the number of constraints to polynomial in n. This difficulty is connected to longstanding questions regarding the minimum degree of a multivariate polynomial modulo 6 that represents the or-function of n variables in the sense of Equation 1. As exploited in the construction of Theorem 16, if the or-function with g(d) inputs can be represented by polynomials of degree d, then d-Polynomial non-root CSP cannot be compressed to size O(n^{g(d)−ε}) unless NP ⊆ coNP/poly. By contraposition, a kernelization with size bound O(n^{h(d)}) implies a lower bound of h^{−1}(d) on the degree of a polynomial representing an or of arity h(d), assuming NP ⊄ coNP/poly. Kernel bounds where h(d) is polynomially bounded in d would therefore establish inverse polynomial lower bounds on the degree of polynomials representing an n-variable or modulo 6. However, the current-best degree lower bound [29] is only Ω(log n), which has not been improved in nearly two decades (cf. [3, §1.4]).
When it comes to CSPs whose constraints are of the form "the number of satisfied literals in the clause belongs to set S", many cases remain unsolved. We can prove ( ) using the Green-Tao theorem [15] that for constraints of the form "the number of satisfied literals is a prime number", no generalized kernel of size polynomial in n exists unless NP ⊆ coNP/poly. On the other hand, Corollary 5 gives good compressions for problems of the type "the number of satisfied literals in the clause is a multiple of three". Is sparsification possible when a constraint requires the number of satisfied literals to be a square, for example?
A simple example of a CSP whose kernelization complexity is currently unclear has constraints of the form "the number of satisfied literals is one or two, modulo six". The approach of Theorem 2 fails, since there is no polynomial modulo six with root set {1, 2}.
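The claim that no polynomial modulo six has root set {1, 2} can be checked exhaustively. The sketch below enumerates all polynomials of degree at most 2 with coefficients in Z/6Z and collects their root sets; this suffices because x^3 − x = (x − 1)x(x + 1) is divisible by 6 for every integer x, so any polynomial modulo 6 computes the same function as one of degree at most 2. The function name is an assumption for this example.

```python
from itertools import product

def root_sets(m, max_degree):
    """Root sets (frozensets of residues) of all polynomials modulo m
    with degree at most max_degree and coefficients in 0..m-1."""
    sets = set()
    for coeffs in product(range(m), repeat=max_degree + 1):
        roots = frozenset(
            x for x in range(m)
            if sum(c * x ** k for k, c in enumerate(coeffs)) % m == 0
        )
        sets.add(roots)
    return sets

sets_mod_6 = root_sets(6, 2)
has_12_mod_6 = frozenset({1, 2}) in sets_mod_6          # composite modulus
has_12_mod_5 = frozenset({1, 2}) in root_sets(5, 2)     # prime modulus
has_12_mod_7 = frozenset({1, 2}) in root_sets(7, 2)     # prime modulus
```

Modulo 5 and modulo 7 the polynomial (x − 1)(x − 2) has root set {1, 2}, whereas modulo 6 any polynomial vanishing at 1 and 2 also vanishes at 4 and 5, for example 2x^2 − 2.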
Finally, we mention that all our results extend to the setting of min-ones and max-ones CSPs, in which one has to find a satisfying assignment that sets at least, or at most, a given number of variables to true. For example, our results easily imply that Exact Hitting Set parameterized by the number of variables n has a sparsification of size O(n^2), which cannot be improved to O(n^{2−ε}) unless NP ⊆ coNP/poly.