Tight Size-Degree Bounds for Sums-of-Squares Proofs

We exhibit families of $4$-CNF formulas over $n$ variables that have sums-of-squares (SOS) proofs of unsatisfiability of degree (a.k.a. rank) $d$ but require SOS proofs of size $n^{\Omega(d)}$ for values of $d = d(n)$ from constant all the way up to $n^{\delta}$ for some universal constant $\delta$. This shows that the $n^{O(d)}$ running time obtained by using the Lasserre semidefinite programming relaxations to find degree-$d$ SOS proofs is optimal up to constant factors in the exponent. We establish this result by combining $\mathsf{NP}$-reductions expressible as low-degree SOS derivations with the idea of relativizing CNF formulas in [Kraj\'i\v{c}ek '04] and [Dantchev and Riis '03], and then applying a restriction argument as in [Atserias, M\"uller, and Oliva '13] and [Atserias, Lauria, and Nordstr\"om '14]. This yields a generic method of amplifying SOS degree lower bounds to size lower bounds, and also generalizes the approach in [ALN14] to obtain size lower bounds for the proof systems resolution, polynomial calculus, and Sherali-Adams from lower bounds on width, degree, and rank, respectively.


Introduction
Let $f_1, \ldots, f_s \in \mathbb{R}[x_1, \ldots, x_n]$ be real, multivariate polynomials. Then the Positivstellensatz proven in [Kri64, Ste73] says (as a special case) that the system of equations
$$f_1 = 0, \ \ldots, \ f_s = 0 \qquad (1.1)$$
has no solution over $\mathbb{R}^n$ if and only if there exist polynomials $g_j, q_\ell \in \mathbb{R}[x_1, \ldots, x_n]$ such that
$$\sum_{j=1}^{s} g_j f_j = -1 - \sum_{\ell} q_\ell^2 . \qquad (1.2)$$
That there can exist no solution given an expression of the form (1.2) is clear, but what is more interesting is that there always exists such an expression to certify unsatisfiability. We refer to (1.2) as a Positivstellensatz proof or sums-of-squares (SOS) proof of unsatisfiability, or as an SOS refutation, of (1.1). We remark that the Positivstellensatz also applies if we add inequalities $h_1 \geq 0, \ldots, h_t \geq 0$ to the system of equations and allow terms $-h_j \sum_{\ell} q_{j,\ell}^2$ on the right-hand side in (1.2). The degree of an SOS refutation is the maximal degree of any $g_j f_j$.

The search for proofs of constant degree $d$ is automatizable as shown in a sequence of works by Shor [Sho87], Nesterov [Nes00], Lasserre [Las01], and Parrilo [Par00]. What this means is that if there exists a degree-$d$ SOS refutation for a system of polynomial equalities (and inequalities) over $n$ variables, then such a refutation can be found in polynomial time $n^{O(d)}$. Briefly, one can view (1.2) as a linear system of equations in the coefficients of $g_j$ and $u = \sum_{\ell} q_\ell^2$ with the added constraint that $u$ is a sum of squares, and such a system can be solved by semidefinite programming in $d/2$ rounds of the Lasserre SDP hierarchy.
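As a toy illustration that is not taken from the paper, consider the single equation $x^2 + 1 = 0$ in one real variable, and assume that a refutation is an identity of the shape $\sum_j g_j f_j = -1 - \sum_\ell q_\ell^2$. The choices $g_1 = -1$ and $q_1 = x$ below are ours and give a degree-2 certificate.

```latex
\[
  f_1 = x^2 + 1, \qquad g_1 = -1, \qquad q_1 = x
  \quad\Longrightarrow\quad
  g_1 f_1 \;=\; -(x^2 + 1) \;=\; -1 - x^2 \;=\; -1 - q_1^2 .
\]
```

Any real solution of $f_1 = 0$ would make the left-hand side equal to 0 while the right-hand side is at most $-1$, so no solution can exist.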
In the last few years there has been renewed interest in sums-of-squares in the context of constraint satisfaction problems (CSPs) and hardness of approximation, as witnessed by, for instance, [BBH+12, OZ13, Tul09]. These works have highlighted the importance of SOS degree upper bounds for CSP approximability, and this is currently a very active area of study.
Our focus in this paper is not on algorithmic questions, however, but more on sums-of-squares viewed as a proof system (also referred to in the literature as Positivstellensatz or Lasserre). This proof system was introduced by Grigoriev and Vorobjov [GV01] as an extension of the Nullstellensatz proof system studied by Beame et al. [BIK+94], and Grigoriev established SOS degree lower bounds for unsatisfiable $\mathbb{F}_2$-linear equations [Gri01b] (also referred to as the 3-XOR problem when each equation involves at most 3 variables) and for the knapsack problem [Gri01a].
Given the connections to semidefinite programming and the Lasserre SDP hierarchy, it is perhaps not surprising that most works on SOS lower bounds have focused on the degree measure. However, from a proof complexity point of view it is also natural to ask about the minimal size of SOS proofs, measured as the number of monomials when all polynomials in each term in (1.2) are expanded out as linear combinations of monomials. Such SOS size lower bounds were proven for knapsack in [GHP02] and for $\mathbb{F}_2$-linear systems of equations in [KI06], and tree-like size lower bounds for other formulas were also obtained in [PS12].
A wider interest in this area of research was awakened when Schoenebeck [Sch08] (essentially) rediscovered Grigoriev's result [Gri01b], which together with further work by Tulsiani [Tul09] led to integrality gaps for a number of constraint satisfaction problems. There have also been papers such as [BPS07] and [GP14] focusing on semantic versions of the proof system, with less attention to the actual syntactic derivation rules used. We refer the reader to, for instance, the introductory section of [OZ13] for more background on sums-of-squares and connections to hardness of approximation, and to the survey [BS14] for an in-depth discussion of SOS as an approximation algorithm and the intriguing connections to the so-called Unique Games Conjecture [Kho02].

Our Contribution
As discussed above, if a system of polynomial equalities and inequalities over $n$ variables can be shown inconsistent by SOS in degree $d$, then by using semidefinite programming one can find an SOS refutation of the system in time $n^{O(d)}$. It is natural to ask whether this is optimal, or whether there might exist "shortcuts" that could lead to SOS refutations more quickly.
We prove that there are no such shortcuts in general, but that the running time obtained by using the Lasserre semidefinite programming relaxations to find SOS proofs is optimal up to the constant in the exponent. We show this by constructing formulas on $n$ variables (which can be translated to systems of polynomial equalities in a canonical way) that have SOS refutations of degree $d$ but require refutations of size $n^{\Omega(d)}$. Our lower bound proof works for $d$ from constant all the way up to $n^{\delta}$ for some constant $\delta$.
Then there is a family of $4$-CNF formulas $\{F_n\}_{n \in \mathbb{N}^+}$ with $O(n^2)$ clauses over $O(n)$ variables such that $F_n$ is refutable in sums-of-squares in degree $\Theta(d)$ but any SOS refutation of $F_n$ requires size $n^{\Omega(d)}$.
This theorem extends an analogous result obtained by the two authors jointly with Atserias in [ALN14] for the proof systems resolution, polynomial calculus, and Sherali-Adams, where upper bounds on refutation size in terms of width, degree, and rank, respectively, were shown to be tight up to the multiplicative constant in the exponent. Theorem 1.1 works for all of these proof systems, since the upper bound is in fact on resolution width (i.e., the size of a largest clause in a resolution refutation), not just SOS degree, and in this sense the theorem subsumes the results in [ALN14]. The concrete bound we obtain for the exponent inside the asymptotic notation in the $n^{\Omega(d)}$ size lower bound is very much worse, however, and therefore the gap between upper and lower bounds is very much larger than in [ALN14].
We want to emphasize that the size lower bound in Theorem 1.1 holds for SOS proofs of arbitrary degree. Thus, going to higher degree (i.e., higher levels of the Lasserre SDP hierarchy) does not help, since even arbitrarily large degree cannot yield shorter proofs. This is an interesting parallel to the paper [LRST14] exhibiting problems for which a (symmetric) SDP relaxation of arbitrary degree but bounded size $n^d$ does not do much better than the systematic relaxation of degree $d$.

Techniques
We obtain the result in Theorem 1.1 as a special case of a more general method of amplifying lower bounds on width (in resolution), degree (in polynomial calculus) and rank/degree (in Sherali-Adams and Lasserre/SOS) to size lower bounds in the corresponding proof systems. This method is in some sense already implicit in [ALN14], which in turn relies heavily on an earlier paper by Atserias et al. [AMO13], but it turns out that extracting the essential ingredients and making them explicit is helpful for extending the results in [ALN14] to an analogue for sums-of-squares. We give a brief, informal description of the three main ingredients of the method below.
(i) Find base CNF formulas that are hard with respect to width/degree/rank. To start, we need to find a base problem, encoded as an unsatisfiable CNF formula, that is "moderately hard" for the proof system at hand. What this means is that we should be able to prove asymptotically tight bounds on width if we are dealing with resolution, on degree for polynomial calculus, and on degree/rank for Sherali-Adams and sums-of-squares. It then follows by a generic argument (as discussed briefly above for SOS) that a bound $O(d)$ on width/degree/rank implies an upper bound $n^{O(d)}$ on proof size.
In [AMO13, ALN14] the pigeonhole principle served as the base problem. This principle, which has been extensively studied in proof complexity, is encoded in CNF as pigeonhole principle (PHP) formulas saying that there is a one-to-one mapping of $m$ pigeons into $n$ pigeonholes for $m > n$. For sums-of-squares we cannot use PHP formulas, however, since they are not hard with respect to SOS degree. Instead we construct an SOS reduction in low degree from inconsistent systems of $\mathbb{F}_2$-linear equations to the clique problem, and then appeal to the result in [Gri01b, Sch08] briefly discussed above to obtain the following degree lower bound.

Theorem 1.2 (informal). Given $k \in \mathbb{N}^+$, there is a graph $G$ and a 3-CNF formula $k$-Clique$(G)$ of size polynomial in $k$ with the following properties:
1. The graph $G$ does not contain a $k$-clique, but the formula $k$-Clique$(G)$ claims that it does.
(ii) Relativize the CNF formulas. The second step is to take the formulas for which we have established width/degree/rank lower bounds and relativize them. Relativization is an idea that seems to have been considered for the first time in the context of proof complexity by Krajíček [Kra04] and that was further developed by Dantchev and Riis [DR03]. Very loosely, it can be described as follows.
Suppose that we have a CNF formula encoding (the negation of) a combinatorial principle saying that some set S has a property. For instance, the CNF formula could encode the pigeonhole principle discussed above, or could claim the existence of a totally ordered set of n elements where no element in the set is minimal with respect to the ordering (these latter CNF formulas are known as ordering principle formulas, least number principle formulas, or graph tautologies in the literature).
The formula at hand is then relativized by constructing another formula encoding that there is a (potentially much larger) set T containing a subset S ⊆ T for which the same combinatorial principle holds.
For the ordering principle, we can encode that there exists a non-empty ordered subset $S \subseteq T$ of arbitrary size such that it is possible for all elements in $S$ to find a smaller element inside $S$. This relativization step transforms the previously very easy ordering principle formulas into relativized versions that are exponentially hard for resolution [Dan06, DM14]. For the PHP formulas, we specify that we have a set of $M \gg m$ pigeons mapped into $n < m$ holes such that there exists a subset of $m$ pigeons that are mapped injectively.
In our setting, it will be important that the relativization does not make the formulas too hard. We do not want the hardness to blow up exponentially and instead would like the upper bound obtained in the first step above to scale nicely with the size of the relativization. For our general approach to work, we therefore need formulas talking about some domain being mapped to some range, where we can enlarge the domain while keeping the range fixed, and where in addition the mapping is symmetric in the sense that permuting the domain does not change the formula.
For this reason, relativizing the ordering principle formulas does not work for our purposes. Pigeonhole principle formulas have this structure, however, which is exactly why the proofs in [ALN14] go through. As already mentioned, PHP formulas will not work for sums-of-squares, but we can relativize the formulas in Theorem 1.2 by saying that there is a large subset of vertices such that there is a k-clique hiding inside such a subset.
(iii) Apply random restrictions to show proof size lower bounds. In the final step, we use random restrictions to establish lower bounds on proof size for the relativized CNF formulas obtained in the second step. This part of the proof is relatively standard, except for a crucial twist in the restriction argument introduced in [AMO13].
Assume that there is a small refutation in sums-of-squares (or whatever proof system we are studying) of the relativized formula claiming the existence of a subset of size m ≪ M with the given combinatorial property. Now hit the formula (and the refutation) with a random restriction that in effect chooses a subset of size m, and hence gives us back the original, non-relativized formula. This restriction will be fairly aggressive in terms of the number of variables set to fixed truth values, and hence it will hold with high probability that the restricted refutation has no monomials of high degree (or, for resolution, no clauses of high width), since all such monomials will either have been killed by the restriction or at least have shrunk significantly. (We remark that making use of this shrinking in the analysis is the crucial extra feature added in [AMO13].) But this means that we have a refutation of the original formula in degree smaller than the lower bound established in the first step. Hence, no small refutation can exist, and the lower bound on proof size follows.
This concludes the overview of our method to amplify lower bounds on width/degree/rank to size. It is our hope that developing such a systematic approach for deriving lower bounds of this kind, and making explicit what conditions are needed for this approach to work, can also be useful in other contexts.

Organization of This Paper
The rest of this paper is organized as follows. We start in Section 2 by reviewing the definitions and notation used, and also stating some basic facts that we will need. In Section 3, we prove a degree lower bound for CNF formulas encoding a version of the clique problem. We then present in Section 4 a general method for obtaining SOS size lower bounds from degree lower bounds (or from width, degree, and rank, respectively, for proof systems such as resolution, polynomial calculus, and Sherali-Adams). We conclude with a brief discussion of some possible directions for future research in Section 5.

Preliminaries
For a positive integer $n$, we use the standard notation $[n] = \{1, 2, \ldots, n\}$. All logarithms in this paper are to base 2. A CNF formula $F$ is a conjunction of clauses, denoted $F = \bigwedge_j C_j$, where each clause $C$ is a disjunction of literals, denoted $C = \bigvee_i a_i$. Each literal $a$ is either a propositional variable $x$ (a positive literal) or its negation $\overline{x}$ (a negative literal). We think of formulas and clauses as sets, so that there is no repetition and order does not matter.

We consider polynomials on the same propositional variables, with the convention that, as an algebraic variable, $x$ evaluates to 1 when it is true and to 0 when it is false. All polynomials in this paper are evaluated on 0/1-assignments, and live in the ring of real multilinear polynomials, which is the ring of real polynomials modulo the ideal generated by the polynomials $x_i^2 - x_i$ for all variables $x_i$. In other words, all variables in all monomials have degree at most one, and monomial multiplication is defined by $\prod_{i \in A} x_i \cdot \prod_{i \in B} x_i = \prod_{i \in A \cup B} x_i$.

Since sums-of-squares derivations operate with polynomial equations and inequalities, in order to reason about CNF formulas we need to encode them in this language. For a clause $C = C^+ \lor C^-$, where we write $C^+$ and $C^-$ to denote the subsets of positive and negative literals, respectively, we define
$$S(C) = \sum_{x \in C^+} x + \sum_{\overline{x} \in C^-} (1 - x) \qquad (2.1)$$
and encode $C$ as the inequality
$$S(C) \geq 1 . \qquad (2.2)$$
Clearly, a clause $C$ is satisfied by a 0/1-assignment if and only if the same assignment satisfies the inequality $S(C) \geq 1$. For a variable $x$ and a bit $\beta \in \{0, 1\}$, we define
$$\delta_{x=\beta} = \beta x + (1 - \beta)(1 - x) , \qquad (2.3)$$
and for a sequence of variables $\vec{x} = (x_{i_1}, \ldots, x_{i_w})$ and a binary string $\vec{\beta} = (\beta_1, \ldots, \beta_w)$, we define the indicator polynomial
$$\delta_{\vec{x}=\vec{\beta}} = \prod_{j=1}^{w} \delta_{x_{i_j}=\beta_j} \qquad (2.4)$$
expanded out as a linear combination of monomials. That is, $\delta_{\vec{x}=\vec{\beta}}$ is the polynomial that evaluates to 1 for 0/1-assignments satisfying the equalities $x_{i_j} = \beta_j$ for $j = 1, \ldots, w$ and to 0 for all other 0/1-assignments. We have the following useful fact.

Fact 2.1. For every sequence of variables $\vec{x}$ of length $w$, the identity $\sum_{\vec{\beta} \in \{0,1\}^w} \delta_{\vec{x}=\vec{\beta}} = 1$ holds syntactically.
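To make these conventions concrete, the following minimal Python sketch (ours, not part of the paper) represents a multilinear polynomial as a dictionary from sets of variable names to coefficients. The representation and the helper names `add`, `mul`, `literal`, `indicator`, `clause_sum`, and `evaluate` are illustrative assumptions. The sketch multiplies monomials by taking unions of variable sets, builds indicator polynomials as products of $x$ or $1 - x$, and encodes a clause as the sum of $x$ over positive literals plus $1 - x$ over negative literals.

```python
from itertools import product

# A multilinear polynomial is a dict {frozenset of variable names: coefficient}.
# The empty frozenset stands for the constant monomial 1.

def add(p, q):
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    # Multilinear product: x * x = x, so monomials multiply by set union.
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def literal(x, beta):
    # The polynomial x if beta == 1, and 1 - x if beta == 0.
    if beta == 1:
        return {frozenset([x]): 1}
    return {frozenset(): 1, frozenset([x]): -1}

def indicator(variables, bits):
    # Product of literal(x, b); equals 1 exactly on the assignment x = b.
    p = {frozenset(): 1}
    for x, b in zip(variables, bits):
        p = mul(p, literal(x, b))
    return p

def clause_sum(positive, negative):
    # Sum of x over positive literals plus (1 - x) over negative literals.
    p = {}
    for x in positive:
        p = add(p, literal(x, 1))
    for x in negative:
        p = add(p, literal(x, 0))
    return p

def evaluate(p, assignment):
    return sum(c * all(assignment[x] for x in mono) for mono, c in p.items())

# The indicators over all bit strings sum to the constant polynomial 1.
total = {}
for bits in product((0, 1), repeat=2):
    total = add(total, indicator(["x1", "x2"], bits))
```

In this sketch an indicator polynomial is syntactically equal to its own square, and a clause is falsified by an assignment exactly when its encoding evaluates to 0.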
Let $F$ be a CNF formula over some set of variables denoted as $\mathrm{Vars}(F)$, and let $\rho$ be a partial assignment on $\mathrm{Vars}(F)$. We write $F{\upharpoonright}_\rho$ to denote the formula $F$ restricted by $\rho$, where all clauses $C \in F$ satisfied by $\rho$ are removed and all literals falsified by $\rho$ in other clauses are removed. For a polynomial $p$ over variables $\mathrm{Vars}(F)$ (written, as always, as a linear combination of distinct monomials), we let $p{\upharpoonright}_\rho$ denote the polynomial obtained by substituting values for assigned variables and removing monomials that evaluate to 0. We extend this definition to sets of formulas or polynomials in the obvious way by taking unions.
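The two restriction operations can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: a literal is a pair `(variable, sign)`, a polynomial is a dictionary from sets of variables to coefficients, and the function names `restrict_cnf` and `restrict_poly` are hypothetical.

```python
def restrict_cnf(clauses, rho):
    """Restrict a CNF formula by a partial assignment.

    clauses: iterable of frozensets of literals (variable, sign),
             where sign True means a positive literal.
    rho:     dict mapping some variables to True/False.
    """
    restricted = set()
    for clause in clauses:
        # A clause containing a literal satisfied by rho is removed.
        if any(var in rho and rho[var] == sign for var, sign in clause):
            continue
        # Otherwise the literals falsified by rho are removed from it.
        restricted.add(frozenset((v, s) for v, s in clause if v not in rho))
    return restricted

def restrict_poly(p, rho):
    """Restrict a multilinear polynomial {frozenset(vars): coeff} by rho."""
    out = {}
    for mono, c in p.items():
        # A monomial containing a variable set to 0 evaluates to 0.
        if any(x in rho and not rho[x] for x in mono):
            continue
        # Variables set to 1 simply disappear from the monomial.
        m = frozenset(x for x in mono if x not in rho)
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}
```

For example, restricting the formula with clauses $x \lor y$ and $\overline{x} \lor z$ by setting $x$ to true leaves the single clause $z$, and restricting $1 - x + xy$ by the same assignment leaves $y$.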

Definition 2.2 (Sums-of-squares proof system).
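A sums-of-squares derivation, or SOS derivation for short, of the polynomial inequality $p \geq 0$ from the system of polynomial constraints
$$f_1 = 0, \ \ldots, \ f_s = 0, \qquad h_1 \geq 0, \ \ldots, \ h_t \geq 0 \qquad (2.5)$$
is an expression of the form
$$p = \sum_{j=1}^{s} g_j f_j + \sum_{j=1}^{t} u_j h_j + u_0 , \qquad (2.6)$$
where $g_1, \ldots, g_s$ are arbitrary polynomials and each $u_j$ is expressible as a sum of squares $\sum_{\ell} q_{j,\ell}^2$. A derivation of the equation $p = 0$ is a pair of derivations of $p \geq 0$ and $-p \geq 0$. A sums-of-squares refutation of (2.5) is a derivation of the inequality $-1 \geq 0$ from (2.5).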
The degree of an SOS derivation is the maximum degree among all the polynomials $g_j f_j$, $u_j h_j$, and $u_0$ in (2.6). The size of an SOS derivation is the total number of monomials (counted with repetition) in all polynomials $g_j f_j$, $u_j h_j$, and $u_0$ (all expanded out as linear combinations of distinct monomials). The size and degree of refuting an unsatisfiable system of polynomial constraints are defined by taking the minimum over all SOS refutations of the system with respect to the corresponding measure.
Remark 2.3. Readers more familiar with the usual definition of Positivstellensatz/sums-of-squares in the literature might be a bit puzzled by the use of multilinearity in Definition 2.2, and might also wonder where the axioms
$$x_i^2 - x_i = 0, \qquad x_i \geq 0, \qquad 1 - x_i \geq 0$$
have gone. It is important to note that we have these axioms in our multilinear setting as well, although they are not explicitly mentioned. Equations of the form $x_i^2 - x_i = 0$ are tautological due to multilinearity, and the inequalities $x_i \geq 0$ and $1 - x_i \geq 0$ are derivable by the squaring rule since in the multilinear setting we have $x_i = x_i^2$ and $1 - x_i = (1 - x_i)^2$.

Our choice of the multilinear setting is without any loss of generality and only serves to simplify the technical arguments slightly. It is easy to see that applying the multilinearization operator mapping $x_i^{\ell}$ to $x_i$ for every $\ell \geq 1$ to any SOS derivation over real polynomials yields a legal SOS derivation over multilinear real polynomials in at most the same size and degree. Thus, working in the multilinear setting can only make our lower bounds stronger. As to the upper bounds in this paper, we prove them in the resolution proof system discussed below, and the simulation of resolution by sums-of-squares in Lemma 2.6 below works also in the standard setting without multilinearization.
Let us state some useful basic properties of multilinear polynomials for later reference (and also provide a proof just for completeness).

Proposition 2.4 (Unique multilinear representation). Every function $f : \{0, 1\}^n \to \mathbb{R}$ has a unique representation as a multilinear polynomial. In particular, if $p$ is a multilinear polynomial such that $p(\alpha) \in \{0, 1\}$ for all $\alpha \in \{0, 1\}^n$, then for every positive integer $\ell$ the equality $p^{\ell} = p$ holds (where this is a syntactic equality of multilinear polynomials expanded out as linear combinations of distinct monomials).
Proof. The set of functions from $\{0, 1\}^n$ to $\mathbb{R}$ is a vector space of dimension $2^n$. Any function $f(\vec{x})$ in this space can be represented as a linear combination $\sum_{\vec{\beta} \in \{0,1\}^n} f(\vec{\beta}) \cdot \delta_{\vec{x}=\vec{\beta}}(\vec{x})$. Since each $\delta_{\vec{x}=\vec{\beta}}$ is a multilinear polynomial, the $2^n$ multilinear monomials on $n$ variables generate the vector space. A generating set whose size equals the dimension is linearly independent, so these monomials form a basis, and hence the representation of a function as a linear combination of multilinear monomials is unique. The second part of the proposition now follows immediately since $p^{\ell}$ and $p$ compute the same function.
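The unique representation can also be computed explicitly. The sketch below is ours and is not the method used in the paper: it recovers the coefficients by inclusion-exclusion over subsets, using the standard formula that the coefficient of $\prod_{i \in A} x_i$ equals $\sum_{B \subseteq A} (-1)^{|A| - |B|} f(1_B)$, where $1_B$ is the 0/1 point that is 1 exactly on $B$. The function names are hypothetical.

```python
from itertools import combinations

def multilinear_coefficients(f, n):
    """Coefficients of the multilinear polynomial computing f on {0,1}^n.

    f takes a tuple of n bits. The result maps frozensets of variable
    indices (monomials) to their nonzero coefficients.
    """
    coeffs = {}
    for size in range(n + 1):
        for A in combinations(range(n), size):
            c = 0
            # Inclusion-exclusion over all subsets B of A.
            for r in range(size + 1):
                for B in combinations(A, r):
                    point = tuple(1 if i in B else 0 for i in range(n))
                    c += (-1) ** (size - r) * f(point)
            if c != 0:
                coeffs[frozenset(A)] = c
    return coeffs

def evaluate(coeffs, point):
    # A monomial contributes its coefficient when all its variables are 1.
    return sum(c for A, c in coeffs.items() if all(point[i] for i in A))

# Example: the OR of two bits is x0 + x1 - x0*x1.
def or2(point):
    return 1 if (point[0] or point[1]) else 0

or_coeffs = multilinear_coefficients(or2, 2)
```

Since a 0/1-valued function and its cube agree on every point, running the same procedure on the cube returns the same coefficients, which mirrors the second part of the proposition.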
The upper bounds in this paper are shown in the weaker proof system resolution, which is defined as follows. A resolution derivation of a clause $D$ from a CNF formula $F$ is a sequence of clauses $(D_1, D_2, \ldots, D_\tau)$ such that $D_\tau = D$ and every clause $D_i$ is either a clause of $F$ (an axiom), or is obtained by weakening from some $D_j \subseteq D_i$ for $j < i$, or can be inferred from two clauses $D_\ell, D_j$, $\ell < j < i$, by the resolution rule, which allows deriving the clause $A \lor B$ from two clauses $A \lor x$ and $B \lor \overline{x}$ (where we say that $A \lor x$ and $B \lor \overline{x}$ are resolved on $x$ to yield the resolvent $A \lor B$). If in a resolution derivation $(D_1, D_2, \ldots, D_\tau)$ each clause $D_j$ is only used once in a weakening or resolution step to derive some $D_i$ for $i > j$, we say that the derivation is tree-like (such derivations may contain multiple copies of the same clause). A resolution refutation of $F$, or resolution proof for $F$, is a derivation of the empty clause (the clause containing no literals) from $F$.
The width of a clause is the number of literals in it, and the width of a CNF formula or resolution derivation is the maximal width of any clause in the formula or derivation. The size of a resolution derivation is the total number of clauses in it (counted with repetitions). The size and width of refuting an unsatisfiable CNF formula $F$ are defined by taking the minimum over all resolution refutations of $F$ with respect to the corresponding measure.
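As a small illustration of these definitions, here is a Python sketch (ours; the clause representation and the names `resolve` and `width` are assumptions) of the resolution rule and the width measure, together with a three-clause formula that is refuted in two resolution steps.

```python
def resolve(c1, c2, var):
    """Resolve c1 (containing var positively) with c2 (containing it negatively).

    Clauses are frozensets of literals (variable, sign), sign True = positive.
    """
    pos, neg = (var, True), (var, False)
    if pos not in c1 or neg not in c2:
        raise ValueError("clauses cannot be resolved on this variable")
    return (c1 - {pos}) | (c2 - {neg})

def width(clauses):
    # Maximal number of literals in any clause of the list.
    return max((len(c) for c in clauses), default=0)

# Refutation of the formula { x, (not x) or y, not y }.
axioms = [
    frozenset({("x", True)}),
    frozenset({("x", False), ("y", True)}),
    frozenset({("y", False)}),
]
step1 = resolve(axioms[0], axioms[1], "x")   # the clause y
step2 = resolve(step1, axioms[2], "y")       # the empty clause
refutation = axioms + [step1, step2]
```

The refutation has size 5 (three axioms and two resolvents) and width 2, the width of the widest axiom.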
The following standard fact is easy to establish by forward induction over resolution derivations. We omit the proof.

Fact 2.5. Consider a partial assignment $\rho$ which assigns $\ell$ variables. Let $A$ be the unique clause of width $\ell$ such that $A$ evaluates to false under $\rho$. If resolution can derive $C$ in width $w$ and size $S$ from $F{\upharpoonright}_\rho$, then resolution can derive $A \lor C$ in width at most $w + \ell$ and size at most $S + 1$ from $F$.

Let us also state for the record the formal claim that SOS is more powerful than resolution in terms of degree (and for constant degree also in terms of size). The next lemma is essentially Lemma 4.6 in [ALN14], except that there the lemma is stated for the Sherali-Adams proof system. Since SOS simulates Sherali-Adams efficiently with respect to both size and degree, however, the same bounds apply also for SOS. Referring to the discussion in Remark 2.3, it should also be pointed out that the lemma in [ALN14] is proven in the more common non-multilinear setting with explicit axioms.

Lemma 2.6 (SOS simulation of resolution). If a CNF formula $F = \bigwedge_{j=1}^{t} C_j$ has a resolution refutation of size $S$ and width $w$, then the constraints $\{S(C_j) \geq 1\}_{j=1}^{t}$ as defined in (2.1) and (2.2) have an SOS refutation of size $O(w 2^w S)$ and degree at most $w + 1$.
The next lemma will be useful as a subroutine when we prove upper bounds in resolution.
Lemma 2.7. Let $k$ and $m_1, m_2, \ldots, m_k$ be positive integers. Then the CNF formula consisting of the clauses

$\overline{x}_{1,j_1} \lor \overline{x}_{2,j_2} \lor \cdots \lor \overline{x}_{k,j_k}$ for $(j_1, \ldots, j_k) \in [m_1] \times \cdots \times [m_k]$,
$y_{i,0}$ for $i \in [k]$,
$\overline{y}_{i,j-1} \lor x_{i,j} \lor y_{i,j}$ for $i \in [k]$ and $j \in [m_i]$,
$\overline{y}_{i,m_i}$ for $i \in [k]$

has a resolution refutation of width $k + 1$ and size exactly twice the number of axiom clauses minus one.

Proof. We prove the lemma by backwards induction over $k$. Consider any clause $A$ of the form $\overline{x}_{1,j_1} \lor \cdots \lor \overline{x}_{i-1,j_{i-1}}$ for $1 \leq i \leq k$ (and note that for $i = 1$ this is the empty clause). We will show how to derive $A$ in width $i + 1$ given clauses $A \lor \overline{x}_{i,j}$ for $j \in [m_i]$. We start by resolving the axioms $y_{i,0}$ and $\overline{y}_{i,0} \lor x_{i,1} \lor y_{i,1}$, and then we apply the resolution rule again on this resolvent and the clause $A \lor \overline{x}_{i,1}$ (available by the induction hypothesis) to get $A \lor y_{i,1}$. We now deduce $A \lor y_{i,j}$ for increasing $j$. Suppose we have already obtained $A \lor y_{i,j-1}$. Using the inductively derived clause $A \lor \overline{x}_{i,j}$ and the axiom $\overline{y}_{i,j-1} \lor x_{i,j} \lor y_{i,j}$, we can resolve on variables $y_{i,j-1}$ and $x_{i,j}$ to obtain $A \lor y_{i,j}$. Once $A \lor y_{i,m_i}$ has been derived, we resolve it with the axiom $\overline{y}_{i,m_i}$ to get $A$. By backward induction we reach the empty clause for $i = 1$, which concludes the resolution refutation. Since $i \leq k$, the refutation has width $k + 1$. It is easy to verify that all axioms and intermediate clauses in the refutation are used exactly once. Thus, the refutation is tree-like, and has size exactly twice the number of axiom clauses minus one.

When we construct formulas to be relativized as described in Section 1.2, it is convenient to use variables $x_{i,\vec{\jmath}}$, where $i$ ranges over some specific domain $D$ and $\vec{\jmath}$ is a collection of other indices. We say that the variable $x_{i,\vec{\jmath}}$ mentions the element $i \in D$. The domain-width of a clause is the number of distinct elements of $D$ mentioned by its variables. The domain-width of a CNF formula or resolution proof is defined by taking the maximum domain-width over all its clauses, and the domain-width of refuting a CNF formula $F$ is the minimal domain-width of any resolution refutation of $F$.
Similarly, the domain-degree of a monomial is the number of distinct elements in D mentioned by its variables, the domain-degree of a polynomial or SOS proof is the maximal domain-degree of any monomial in it, and the domain-degree of refuting an unsatisfiable system of polynomial constraints is defined by taking the minimum over all refutations.
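The difference between the ordinary and the domain-based measures can be seen in a few lines of Python. This sketch is ours: a variable is assumed to be a pair whose first component is the domain element it mentions, and the names `domain_width` and `domain_degree` are hypothetical.

```python
def domain_width(clause):
    """Number of distinct domain elements mentioned by a clause.

    A literal is a pair (variable, sign), and a variable is a pair
    (i, rest) where i is the domain element and rest the other indices.
    """
    return len({variable[0] for variable, _sign in clause})

def domain_degree(monomial):
    """Number of distinct domain elements mentioned by a monomial,
    given as a set of variables (i, rest)."""
    return len({variable[0] for variable in monomial})

# A clause of width 3 that mentions only the domain elements 1 and 2.
clause = frozenset({((1, "a"), True), ((1, "b"), False), ((2, "a"), True)})

# A monomial of degree 3 that mentions only the domain elements 1 and 3.
monomial = frozenset({(1, "a"), (1, "b"), (3, "c")})
```

Both measures never exceed the ordinary width or degree, since several variables may mention the same domain element.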

A Degree Lower Bound for Clique Formulas
In this section we state and prove the formal version of Theorem 1.2, namely a lower bound for the domain-degree needed in SOS to prove that a graph G has no k-clique. Let us start by describing how we encode the k-clique problem as a CNF formula.
Definition 3.1 ($k$-clique formula). Let $k$ be a positive integer, $G = (V, E)$ be an undirected graph on $N$ vertices, and $(v_1, v_2, \ldots, v_N)$ be an enumeration of $V(G) = V$. Then the formula $k$-Clique$(G)$ consists of the clauses (3.1e) The formula $k$-Clique$(G)$ encodes the claim that $G$ has a clique of size $k$. The intended meaning of the variable

Proof. We first use the weakening rule to derive all clauses of the form
$$\overline{x}_{1,u_1} \lor \overline{x}_{2,u_2} \lor \cdots \lor \overline{x}_{k,u_k} \qquad (3.2)$$
for every sequence of vertices $(u_1, u_2, \ldots, u_k)$. This is possible since either the sequence contains a repetition or it includes two vertices with no edge between them, and in both cases this means that the clause (3.2) is a superclause of some clause of the form (3.1a). Then we derive the empty clause by applying Lemma 2.7 to the clauses (3.1c)-(3.1e) and (3.2).
In order to obtain suitably hard instances of $k$-Clique$(G)$ we construct a reduction from 3-XORs to $k$-partite graphs. It is convenient for us to describe the special case of $k$-clique on $k$-partite graphs directly as an encoding into polynomial equations and inequalities, as follows.

Given a $k$-partite graph $G$ with vertex set $V = V_1 \cup \cdots \cup V_k$, we let $k$-Block$(G)$ denote the following collection of polynomial constraints:
$$\sum_{v \in V_i} x_v = 1 \quad \text{for } i \in [k], \qquad (3.3a)$$
$$x_u + x_v \leq 1 \quad \text{for } u \in V_i,\ v \in V_{i'},\ i \neq i',\ \{u, v\} \notin E(G). \qquad (3.3b)$$
It is straightforward to verify that these constraints encode the claim that $G$ has a clique with one element in each block $V_i$, since exactly one element is chosen from each block by (3.3a) and all the chosen elements have to be pairwise connected by (3.3b).
Any lower bound on degree that we establish for k-Block(G) will hold also for k-Clique(G) as stated in the following proposition.
Let $G$ be a $k$-partite graph with vertex set $V = V_1 \cup \cdots \cup V_k$. If $k$-Clique$(G)$ has an SOS refutation in domain-degree $d$, then $k$-Block$(G)$ has an SOS refutation in domain-degree $d$.
Proof. The proof is by transforming a refutation of $k$-Clique$(G)$ into a refutation of $k$-Block$(G)$ of the same domain-degree. To give an overview, we start with a refutation of $k$-Clique$(G)$ of domain-degree $d$ and replace its variables with polynomials of degree at most 1 mentioning only variables from $k$-Block$(G)$. In this way we get an SOS refutation of domain-degree at most $d$ from the substituted axioms of $k$-Clique$(G)$. The latter polynomials are not necessarily axioms of $k$-Block$(G)$, but we show that they have SOS derivations of domain-degree 1 from the axioms of $k$-Block$(G)$. This concludes the proof.
The variable substitution has two steps: first we substitute every variable $z_{i,j}$ with the linear form $\sum_{t=j+1}^{N} x_{i,v_t}$, where $\{v_j\}_{j=1}^{N}$ is the enumeration of $V(G)$ in Definition 3.1, and then we set

As mentioned above, we now need to give SOS derivations of domain-degree 1 of all transformed axioms in $k$-Clique$(G)$ from $k$-Block$(G)$. For the axioms (3.1c)-(3.1e), the SOS encoding is given by the inequalities (3.4a)-(3.4c). After the first step of the substitution the inequalities (3.4a), (3.4b) and (3.4c) become, respectively, the inequality $\sum_{j=1}^{N} x_{i,v_j} \geq 1$, and two occurrences of the tautology $1 \geq 1$. Furthermore, after the second step of the substitution the inequality (3.4a) becomes $\sum_{v \in V_i} x_{i,v} \geq 1$, which is subsumed by Equation (3.3a).

Each of the axioms (3.1a) and (3.1b) is encoded as an inequality (3.5) for some pair of indices $i, i'$ and vertices $u, v$. We assume that $u \in V_i$ and $v \in V_{i'}$, because otherwise the variable substitution turns the inequality into either a tautology or into $1 - x_{i,u} \geq 0$, where the latter follows from $(1 - x_{i,u})^2 \geq 0$ by multilinearity. If $i \neq i'$ then the inequality (3.5) is an axiom of $k$-Block$(G)$. If that is not the case, then we can obtain $1 - x_{i,u} - x_{i,v}$ in domain-degree 1 using a derivation in which the first identity holds by multilinearity. The proposition follows.
What we want to do now is to prove a domain-degree lower bound for instances of $k$-Block$(G)$ where the graph $G$ is obtained by a reduction from (unsatisfiable) sets of $\mathbb{F}_2$-linear equations. We rely on the version of Grigoriev's degree lower bound [Gri01b] shown by Schoenebeck [Sch08], which is conveniently stated for random 3-XOR formulas as encoded next.

Definition 3.5 (Polynomial encoding of random 3-XOR).
A random 3-XOR formula $\varphi$ represents a system of $\Delta n$ linear equations modulo 2 defined over $n$ variables. Each equation is sampled at random among all equations of the form $x \oplus y \oplus z = b$ as follows: $x, y, z$ are sampled uniformly without replacement from the set of $n$ variables and $b$ is sampled uniformly in $\{0, 1\}$. The polynomial encoding of any such linear equation modulo 2 is

Fixing $\delta = 1/4$ and $\Delta = 8$ in [Sch08] we have the following theorem.
Theorem 3.6 ([Sch08]). There exists an $\alpha$, $0 < \alpha < 1$, such that for every $\varepsilon > 0$ there exists an $n_\varepsilon \in \mathbb{N}$ such that a random 3-XOR formula $\varphi$ in $n \geq n_\varepsilon$ variables and $8n$ constraints has the following properties with probability at least $1 - \varepsilon$.
1. At most 6n parity constraints of φ can be simultaneously satisfied.

2. Any sums-of-squares refutation of φ requires degree αn.
Now we are ready to describe how to transform a 3-XOR formula $\varphi$ into a $k$-partite graph $G^k_\varphi$ that has a clique of size $k$ if and only if $\varphi$ is satisfiable.

Definition 3.7 (3-XOR graph).
Given $k \in \mathbb{N}$ and a 3-XOR formula $\varphi$ with $8n$ constraints over $n$ variables, where we assume for simplicity that $k$ divides $8n$, we construct a 3-XOR graph $G^k_\varphi$ as follows. We arbitrarily split the formula $\varphi$ into $k$ linear systems with $8n/k$ constraints each, denoted as $\varphi_1, \varphi_2, \ldots, \varphi_k$. For each $\varphi_i$ we let $V_i$ be a set of at most $N \leq 2^{24n/k}$ vertices labelled by all possible assignments to the at most $24n/k$ variables appearing in $\varphi_i$. For two distinct vertices $u \in V_i$ and $v \in V_{i'}$ there is an edge between $u$ and $v$ in $G^k_\varphi$ if the two assignments corresponding to $u$ and $v$ are compatible, i.e., when they assign the same values to the common variables, and also the union of the two assignments does not violate any constraint in $\varphi$. (In particular, each $V_i$ is an independent set, since two distinct assignments to the same set of variables are not compatible.)

The key property of the reduction in Definition 3.7 is that it allows small domain-degree refutations of $k$-Block$(G^k_\varphi)$ to be converted into small degree refutations of $\varphi$.
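The construction can be sketched for tiny instances in Python. This is an illustrative brute-force sketch under our own assumptions, not code from the paper: constraints are given as pairs `((x, y, z), b)`, the split into $k$ parts takes consecutive chunks (one arbitrary choice), a constraint counts as violated only when all three of its variables are assigned, and the names `xor_graph` and `has_block_clique` are hypothetical. The clique search enumerates one vertex per block and is only meant for $k \geq 2$ and very small inputs.

```python
from itertools import combinations, product

def xor_graph(constraints, k):
    """Build the blocks and edges of a 3-XOR graph.

    constraints: list of ((x, y, z), b) meaning x XOR y XOR z == b;
                 its length is assumed to be divisible by k.
    Returns (blocks, edges). A vertex is (block index, assignment), with
    the assignment stored as a tuple of (variable, value) pairs.
    """
    size = len(constraints) // k
    parts = [constraints[i * size:(i + 1) * size] for i in range(k)]
    blocks = []
    for i, part in enumerate(parts):
        variables = sorted({v for vs, _ in part for v in vs})
        blocks.append([(i, tuple(zip(variables, bits)))
                       for bits in product((0, 1), repeat=len(variables))])

    def adjacent(u, v):
        a, b = dict(u[1]), dict(v[1])
        # Compatible: same values on the common variables.
        if any(a[x] != b[x] for x in a.keys() & b.keys()):
            return False
        # The union must not violate any fully assigned constraint.
        union = {**a, **b}
        for (x, y, z), bit in constraints:
            if x in union and y in union and z in union:
                if union[x] ^ union[y] ^ union[z] != bit:
                    return False
        return True

    edges = set()
    for i, j in combinations(range(k), 2):
        for u in blocks[i]:
            for v in blocks[j]:
                if adjacent(u, v):
                    edges.add(frozenset((u, v)))
    return blocks, edges

def has_block_clique(blocks, edges):
    # Brute force: pick one vertex per block and test all pairs.
    return any(all(frozenset(pair) in edges for pair in combinations(choice, 2))
               for choice in product(*blocks))

# A satisfiable system (e.g. x0 = x1 = x2 = 0, x3 = 1) and an unsatisfiable one.
sat = [((0, 1, 2), 0), ((1, 2, 3), 1)]
unsat = [((0, 1, 2), 0), ((0, 1, 2), 1)]
```

With $k = 2$, the satisfiable system yields a graph with a clique that has one vertex in each block, while the unsatisfiable system yields a graph with no edges at all.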
Lemma 3.8. If $k$-Block$(G^k_\varphi)$ has an SOS refutation of domain-degree $d$, then $\varphi$ has an SOS refutation of degree $24dn/k$.
Proof. Again we start by giving an overview of the proof, which works by transforming a refutation of $k$-Block$(G^k_\varphi)$ of domain-degree $d$ into a refutation of $\varphi$ of degree $24dn/k$. Given a refutation of $k$-Block$(G^k_\varphi)$ of domain-degree $d$, we replace every variable $x_v$ with a polynomial over the variables of $\varphi$. In this way we get an SOS refutation from the polynomials corresponding to the substituted axioms of $k$-Block$(G^k_\varphi)$. The latter polynomials need not be axioms of $\varphi$, but we show that they can be efficiently derived in SOS from $\varphi$. We thus obtain an SOS refutation of $\varphi$, the degree of which is easily verified to be as in the statement of the lemma.
We now describe the substitution in detail. Consider a block V i and suppose that the corresponding 3-XOR formula φ i mentions t variables. Let us write x to denote this set of variables. Then every vertex v ∈ V i represents an assignment β ∈ {0, 1} t to x. In what follows, we denote the indicator polynomial δ x=β in (2.4) by δ v for brevity, and we substitute for each variable x v the polynomial δ v of degree t ≤ 24n/k.
Before the substitution each monomial in the original refutation has domain-degree at most d by assumption. Two important observations are that (δ v ) 2 = δ v for every v ∈ V i and that δ u δ v = 0 for every two distinct u, v in the same block V i . Therefore, after the substitution each monomial is either identically zero or the product of at most d indicator polynomials, and hence its degree is at most 24dn/k.
To verify these observations, note that the identity (δ v ) 2 = δ v holds by Proposition 2.4. The equality δ u δ v = 0 holds because δ u and δ v are the indicator polynomials of two incompatible assignments, and so their product always evaluates to zero. Applying Proposition 2.4 again, we conclude that the (multilinear) polynomial δ u δ v is identically zero.
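The two observations can be sanity-checked numerically on small cases. The sketch below assumes that the indicator polynomial of an assignment β is the usual product of $x_j$ for $\beta_j = 1$ and $(1 - x_j)$ for $\beta_j = 0$; the exact form of (2.4) is not reproduced in this section, so this is an assumption. The helper names `delta` and `check_indicator_properties` are hypothetical. The check evaluates the polynomials on all Boolean points, which suffices for multilinear polynomials in the sense in which Proposition 2.4 is used above.

```python
from itertools import product

def delta(beta):
    """Indicator of assignment beta, viewed as a function on {0,1}^t
    (assumed form: product of x_j or (1 - x_j) according to beta_j)."""
    def evaluate(x):
        value = 1
        for xj, bj in zip(x, beta):
            value *= xj if bj else 1 - xj
        return value
    return evaluate

def check_indicator_properties(t):
    """Check idempotence, pairwise orthogonality, and that the indicators
    of all assignments sum to 1, at every point of the Boolean cube."""
    cube = list(product((0, 1), repeat=t))
    deltas = {beta: delta(beta) for beta in cube}
    for x in cube:
        assert sum(d(x) for d in deltas.values()) == 1
        for u in cube:
            assert deltas[u](x) ** 2 == deltas[u](x)
            for v in cube:
                if u != v:
                    assert deltas[u](x) * deltas[v](x) == 0
    return True
```

Calling `check_indicator_properties(3)` runs through all 8 assignments to three variables. This is only a numerical check of the evaluations, not an SOS derivation.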
In order to complete the proof outline above, we now need to present SOS derivations, starting from the 3-XOR constraints of φ, of all polynomial constraints resulting from the substitutions in the axioms of k-Block G k φ described above, and to do so in degree at most 24n/k. Let us first look at the axioms (3.3a). By Fact 2.1, the identity $\sum_{v \in V_i} \delta_v = 1$ holds syntactically, so substitutions in axioms of the form (3.3a) result in tautologies 1 = 1.
The remaining axioms of k-Block G k φ in (3.3b) have the form x u + x v ≤ 1 for non-edges (u, v) between vertices in different blocks. By construction of G k φ the reason u and v are not connected is either that the partial assignments corresponding to the two vertices are incompatible, or that their union violates some constraint in φ.
In the first case, $1 - \delta_u - \delta_v \ge 0$ is an SOS axiom because of the identity

$$1 - \delta_u - \delta_v = (1 - \delta_u - \delta_v)^2, \tag{3.9}$$

which follows from the observation that $\delta_u$ and $\delta_v$ are the indicator polynomials of two incompatible assignments and cannot evaluate to 1 simultaneously, and so $(1 - \delta_u - \delta_v)$ evaluates to either 0 or 1 and is identical to its square by Proposition 2.4. The degree of (3.9) is $24n/k$.

In the second case, the two assignments corresponding to $u$ and $v$ are compatible but their union violates some initial equation $f = 0$ of the form (3.7a)-(3.7h). Any such $f$ is a degree-3 indicator polynomial which evaluates to 1 whenever the assignment satisfies the equation $\delta_u \delta_v = 1$. This means that $\delta_u \delta_v$ contains $f$ as a factor. We factorize $f$ as $f_u f_v$ so that $\delta_u = f_u \delta'_u$ and $\delta_v = f_v \delta'_v$. Given this notation, we can derive $0 \le 1 - \delta_u - \delta_v$ using the identity (3.10), which has degree at most $24n/k$. To verify (3.10), observe that the left-hand side is the sum of some squared polynomials and $-2 f_u f_v = -2f = 0$. Expanding the squared polynomials and using Proposition 2.4 repeatedly establishes that (3.10) holds. The lemma follows.

Now we can put together all the material in this section to prove a formal version of Theorem 1.2 as stated next.
Theorem 3.9. There are universal constants $N_0 \in \mathbb{N}^+$ and $\alpha_0$, $0 < \alpha_0 < 1$, such that for every $k \ge 1$ there exists a graph $G_k$ with at most $k N_0 = O(k)$ vertices and a 3-CNF formula k-Clique($G_k$) of size polynomial in $k$ with the following properties:
1. Resolution can refute k-Clique($G_k$) in size $2^{O(k \log k)}$ and width $k + 1$.

Any SOS refutation of
Proof. Fix any positive $\epsilon < 1$ and let $N_0 = 2^{24 n_\epsilon}$, $\alpha_0 = \alpha/24$, and $n = k n_\epsilon$, where $n_\epsilon$ and $\alpha$ are the universal constants from Theorem 3.6. To build the graph $G_k$ we take a 3-XOR formula $\varphi$ on $n$ variables and $8n$ equations from the distribution in Definition 3.5. Since $n \ge n_\epsilon$, Theorem 3.6 implies that there is a formula in the support of the distribution that is unsatisfiable and that requires degree $\alpha n$ to be refuted in SOS. We fix $\varphi$ to be that formula and let $G_k$ be the graph $G^k_\varphi$ constructed as in Definition 3.7. Then $G^k_\varphi$ is $k$-partite, with each part having at most $2^{24n/k} = N_0$ vertices, and the graph has no $k$-clique because otherwise $\varphi$ would be satisfiable.

Suppose that there is an SOS refutation of k-Clique($G^k_\varphi$) of domain-degree $d$. We want to argue that $d \ge \alpha_0 k$. Since $G^k_\varphi$ is $k$-partite, by Proposition 3.4 the formula k-Block($G^k_\varphi$) also has an SOS refutation in domain-degree $d$. By Lemma 3.8, this in turn yields an SOS refutation of $\varphi$ in degree $24dn/k$. Now Theorem 3.6 implies that $24dn/k \ge \alpha n$, and hence $d \ge (\alpha/24) k = \alpha_0 k$. To conclude the proof, we observe that the resolution width and size upper bounds are a direct application of Proposition 3.2.

Size Lower Bounds from Relativization
Using the material developed in Section 3, we can now describe how to relativize formulas in order to amplify degree lower bounds to size lower bounds in SOS. This method works for formulas that are "symmetric" in a certain sense, and so we start by explaining exactly what is meant by this.

Definition 4.1 (Symmetric formula).
Consider a CNF formula $F$ on variables $x_{i,\vec{\jmath}}$, where $i$ is an index in some domain $D$ and $\vec{\jmath}$ denotes a collection of other indices. For every subset of indices $\vec{\imath} = \{i_1, i_2, \ldots, i_s\} \subseteq D$ we identify the subformula $F_{\vec{\imath}}$ of $F$ such that each clause $C \in F_{\vec{\imath}}$ mentions exactly the indices in $\vec{\imath}$, so that a formula $F$ of domain-width $d$ can be written as

$$F = \bigwedge_{\vec{\imath} \subseteq D,\ |\vec{\imath}| \le d} F_{\vec{\imath}}. \tag{4.1}$$

We say that $F$ is symmetric with respect to $D$ if it is invariant with respect to permutations of $D$, i.e., if for every $F_{\vec{\imath}} \subseteq F$ it also holds that $F_{\pi(\vec{\imath})} \subseteq F$, where $\pi$ is any permutation on $D$ and $\pi(\vec{\imath})$ is the set of images of the indices in $\vec{\imath}$. Phrased differently, $F$ is symmetric with respect to $D$ if for any permutation $\pi$ on $D$ the syntactic equality $F = \bigwedge_{\vec{\imath} \subseteq D} F_{\pi(\vec{\imath})}$ holds (where we recall that we treat CNF formulas as sets of clauses). We apply this terminology for systems of polynomial equations and inequalities in the same way.
Let us illustrate Definition 4.1 by giving perhaps the most canonical example of a formula that is symmetric in this sense.

Example 4.2.
Recall that the CNF encoding of the pigeonhole principle with a set of pigeons $D$ and holes $[n]$ claims that there is a mapping from pigeons in $D$ to holes such that no hole gets two pigeons. For every pigeon $i \in D$ there is a clause $\bigvee_{j \in [n]} x_{i,j}$ and for every two distinct pigeons $i, i'$ and hole $j$ there is a clause $\neg x_{i,j} \lor \neg x_{i',j}$. Since any permutation of the set of pigeons $D$ gives us back exactly the same set of clauses (only listed in a different order), the pigeonhole principle formula is symmetric with respect to $D$.
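The symmetry claim in Example 4.2 can be checked mechanically for small parameters. The sketch below represents a clause as a frozenset of literals `(sign, pigeon, hole)` and a formula as a set of clauses, matching the convention that CNF formulas are sets of clauses. The function names `php_clauses`, `rename`, and `is_symmetric` are hypothetical and chosen for this sketch.

```python
from itertools import combinations, permutations

def php_clauses(pigeons, holes):
    """Pigeonhole clauses: each pigeon gets a hole, no hole gets two pigeons."""
    clauses = set()
    for i in pigeons:
        clauses.add(frozenset((True, i, j) for j in holes))
    for i, i2 in combinations(pigeons, 2):
        for j in holes:
            clauses.add(frozenset({(False, i, j), (False, i2, j)}))
    return clauses

def rename(clauses, pi):
    """Apply the permutation pi (a dict on pigeons) to the domain index."""
    return {frozenset((sign, pi[i], j) for sign, i, j in c) for c in clauses}

def is_symmetric(pigeons, holes):
    """True if every permutation of the pigeons maps the formula to itself."""
    formula = php_clauses(pigeons, holes)
    return all(rename(formula, dict(zip(pigeons, image))) == formula
               for image in permutations(pigeons))
```

For three pigeons and two holes, `is_symmetric([0, 1, 2], [0, 1])` enumerates all six permutations of the pigeons. Deleting a single pigeon clause from the formula breaks the invariance, which shows that the check is not vacuous.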
By now, the reader will already have guessed that another example of a symmetric formula, which will be more interesting to us in the current context, is the k-clique formula discussed in Section 3.

Starting with any formula $F$ symmetric with respect to a domain $D$, we can build a family of similar formulas by varying the size of the domain. If $F$ has domain-width $d$, then for each $s$, $0 \le s \le d$, the subformulas $F_{\vec{\imath}}$ with $|\vec{\imath}| = s$ in (4.1) are the same up to renaming of the domain indices in $\vec{\imath}$. Hence, we can arbitrarily pick one such subformula to represent them all, and denote it as $F_s$. The formulas $\{F_s\}_{s=0}^{d}$ are completely determined by $F$, and together with $D$ they in turn completely determine $F$. Using this observation, we can generalize the formula $F$ over domain $D$ to any domain $D'$ with $|D'| \ge d$ by defining $F[D']$ to be the formula

$$F[D'] = \bigwedge_{\vec{\imath} \subseteq D',\ |\vec{\imath}| \le d} F_{\vec{\imath}}, \tag{4.2}$$

where each $F_{\vec{\imath}}$ for $|\vec{\imath}| = s$ is an isomorphic copy of $F_s$ with its domain indices renamed according to $\vec{\imath}$. Let us state some simple but useful facts that can be read off directly from (4.

When we want to emphasize the domain $D$ of a formula $F$ in what follows, we will denote the formula $F$ as $F[D]$. When the domain is $D = [t]$, we abuse notation slightly and write $F[t]$ instead of $F[[t]]$. As discussed above, from a symmetric formula $F$ of domain-width $d$ we can obtain a well-defined sequence of formulas $F[t]$ for all $t \ge d$. We say that the unsatisfiability threshold of such a sequence of formulas is the least $t$ such that $F[t]$ is unsatisfiable. For instance, the pigeonhole principle formula in Example 4.2 has unsatisfiability threshold $n + 1$.

Relativization of Symmetric Formulas
Given a formula $F = F[m]$ symmetric with respect to $[m]$ and a parameter $k < m$, we now want to define the k-relativization of $F[m]$, which is intended to encode the claim that there exists a subset $D \subseteq [m]$ of size $|D| \ge k$ such that the subformula $F[D] \subseteq F[m]$ is satisfiable. We remark that a CNF formula encoding such a claim will be unsatisfiable when $k$ is at least the unsatisfiability threshold of $F$.
In order to express the existence of the subset $D$ we use selectors $s_1, s_2, \ldots, s_m$ as indicators of membership in the subset and encode the constraint on the subset size $|D| = \sum_{i=1}^{m} s_i \ge k$ as described in the next definition.
To see that $\mathrm{Thr}_k(\vec{s})$ indeed enforces a cardinality constraint, note that the variables $p_{\ell,i}$ encode a mapping between $[k]$ and $[m]$ (with $p_{\ell,i}$ being true if and only if $\ell$ maps to $i$). The clauses (4.3a)-(4.3c) force every $\ell \in [k]$ to have an image in $[m]$, since they form the 3-CNF representation of the clauses $\bigvee_{i \in [m]} p_{\ell,i}$. The clauses (4.3d) forbid two distinct elements of $[k]$ to have the same image, so there must be at least $k$ elements in the range of the map, and for each of them the corresponding selector must be true because of the clauses (4.3e).

We will need the following properties of the threshold formula.
2. For any partial assignment to $\vec{s}$ with at least $k$ ones there is an assignment to the extension variables that satisfies $\mathrm{Thr}_k(\vec{s})$.
3. There is a resolution refutation of the set of clauses $\mathrm{Thr}_k(\vec{s}) \cup \bigl\{ \bigvee_{i \in D} \neg s_i : D \subseteq [m],\ |D| = k \bigr\}$ of size $O(k m^k)$ and width $k + 1$.
Proof. The first two items are immediate. In order to show the third item we can first derive each clause $\neg p_{1,i_1} \lor \cdots \lor \neg p_{k,i_k}$ by resolving $\neg s_{i_1} \lor \cdots \lor \neg s_{i_k}$ with clauses of the form (4.3e), and then apply Lemma 2.7.
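The cardinality behaviour of the threshold formula can be checked by brute force on tiny parameters. The sketch below is a simplification under stated assumptions: it uses the wide clauses $\bigvee_i p_{\ell,i}$ directly instead of their 3-CNF representation (4.3a)-(4.3c), so it has no extension variables beyond the $p_{\ell,i}$, and it takes the clauses corresponding to (4.3d) and (4.3e) in the form described in the paragraph above. The names `threshold_clauses` and `satisfiable_given_selectors` are hypothetical.

```python
from itertools import product

def threshold_clauses(k, m):
    """Wide-clause variant of the threshold formula (an assumption of this
    sketch). Literals are (sign, name); names are ('p', l, i) or ('s', i)."""
    clauses = []
    for l in range(k):                      # every l in [k] has an image
        clauses.append([(True, ('p', l, i)) for i in range(m)])
    for l in range(k):                      # distinct l, l2 have distinct images
        for l2 in range(l + 1, k):
            for i in range(m):
                clauses.append([(False, ('p', l, i)), (False, ('p', l2, i))])
    for l in range(k):                      # an image must be selected
        for i in range(m):
            clauses.append([(False, ('p', l, i)), (True, ('s', i))])
    return clauses

def satisfiable_given_selectors(k, m, selectors):
    """Search all assignments to the p-variables for fixed selector values."""
    clauses = threshold_clauses(k, m)
    p_vars = [('p', l, i) for l in range(k) for i in range(m)]
    for bits in product((False, True), repeat=len(p_vars)):
        assignment = dict(zip(p_vars, bits))
        assignment.update({('s', i): bool(selectors[i]) for i in range(m)})
        if all(any(assignment[name] == sign for sign, name in c)
               for c in clauses):
            return True
    return False
```

For `k = 2` and `m = 3` the search covers 64 assignments to the mapping variables for each of the 8 selector settings, and the formula is satisfiable exactly when at least two selectors are set. The enumeration is exponential in `k * m` and serves only as an illustration.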
Using the formula in Definition 4.4 to encode cardinality constraints on subsets, we can now define formally what we mean by the relativization of a symmetric formula. Since we are dealing with refutations of unsatisfiable formulas, it will always be the case that the parameter k in Definition 4.6 is at least the unsatisfiability threshold of F . An important property of relativized formulas is that the hardness of F [k; m] scales nicely with m. In particular, if F [k] is not too hard, then the relativization F [k; m] also is not too hard.

Random Restrictions and Size Lower Bounds
To prove size lower bounds on refutations of relativized formulas F [k; m] we use random restrictions sampled as follows.
Definition 4.8 (Random restrictions for relativized formulas). Given a relativized formula F [k; m], we define a distribution R of partial assignments over the variables of this formula by the following process.
1. Pick uniformly at random a set D ⊆ [m] of size k.
2. Fix s i to 1 if i ∈ D and to 0 otherwise.
3. Extend this to any assignment to the remaining variables of the formula Thr k ( s) that satisfies this threshold formula.
4. For every variable $x_{i,\vec{\jmath}}$ that has index $i \notin D$, fix $x_{i,\vec{\jmath}}$ to 0 or 1 uniformly and independently at random.
5. All remaining variables x i,  for the indices i ∈ D are left unset.
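The sampling process of Definition 4.8 can be sketched in a few lines of Python. The sketch rests on some assumptions: the variables of the formula are given as index pairs `(i, j)`, the threshold formula is represented only by mapping variables `('p', l, i)` (any further extension variables of the 3-CNF encoding are omitted), and the function name `sample_restriction` is hypothetical. Variables whose first index lies outside the chosen set `D` are fixed at random, and those with first index in `D` are left unset, so that a copy of the formula over `k` indices survives.

```python
import random

def sample_restriction(k, m, x_vars, rng=random):
    """Sample a partial assignment rho for a relativized formula.

    x_vars is an iterable of (i, j) pairs naming variables x_{i,j},
    with i ranging over range(m)."""
    # Step 1: choose the surviving index set D of size k.
    D = sorted(rng.sample(range(m), k))
    # Step 2: selectors indicate membership in D.
    rho = {('s', i): int(i in D) for i in range(m)}
    # Step 3: one satisfying choice for the mapping variables is to send
    # the l-th element of [k] to the l-th element of D.
    for l, target in enumerate(D):
        for i in range(m):
            rho[('p', l, i)] = int(i == target)
    # Step 4: variables with index outside D get independent random bits.
    for i, j in x_vars:
        if i not in D:
            rho[('x', i, j)] = rng.randint(0, 1)
    # Step 5: variables with index in D are left unset (absent from rho).
    return D, rho
```

Passing a seeded `random.Random` instance as `rng` makes the sample reproducible, which is convenient when experimenting with how the domain-degree of a fixed monomial behaves under many restrictions.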
It is straightforward to verify that the distribution R is constructed in such a way as to give us back the formula F [k] after restriction: as recorded in Observation 4.9, the formula F [k; m]↾ ρ is equal to F [k] up to renaming of variables.

The key technical ingredient in the size lower bound on sums-of-squares proofs is the following property of the distribution R, which was proven in [AMO13, ALN14] but is rephrased below using the notation and terminology in this paper. We also provide a brief proof sketch just to give the reader a sense of how the argument goes.

Proof sketch. Let ℓ ′ be the domain-degree of M . The restriction ρ will set independently and uniformly at random at least ℓ ′ − k of its variables, so if ℓ ′ − k is larger than ℓ log m, the restricted monomial M↾ ρ is nonzero with probability at most $1/m^{\ell}$. Otherwise we upper bound the probability that M↾ ρ has domain-degree ℓ by the probability that the ℓ ′ indices in M contain ℓ of the k surviving indices. By a union bound this probability is at most $(4k \log m)^{k} / m^{\ell}$.
Using Lemma 4.10, it is now straightforward to show that relativization amplifies degree lower bounds to size lower bounds.

Proof. Suppose that there is a sums-of-squares refutation of F [k; m] in size S, i.e., containing S monomials. For ρ sampled from R, the probability that some monomial in the refutation restricted by ρ has domain-degree at least ℓ is bounded from above as in (4.5), by appealing to Lemma 4.10 and taking a union bound over the S monomials. As noted in Observation 4.9, the formula F [k; m]↾ ρ is equal to F [k] up to renaming of variables, and so it cannot have a refutation of domain-degree ℓ or less. This implies that the bound on the probability (4.5) is greater than one, and thus we obtain the claimed lower bound on S, which proves the theorem.
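The calculation behind this argument can be written out as follows. This is a sketch under the assumption that Lemma 4.10 bounds the probability for a single monomial by $(4k \log m)^{k} / m^{\ell}$, which is the quantity appearing at the end of the proof sketch above; the exact constants in the statements of the lemma and the theorem may differ.

```latex
\begin{align*}
\Pr_{\rho \sim \mathcal{R}}\bigl[\exists\, M \text{ in the refutation}:
    M{\restriction_\rho} \text{ has domain-degree} \ge \ell \bigr]
  &\le S \cdot \frac{(4k \log m)^{k}}{m^{\ell}}, \\
S \cdot \frac{(4k \log m)^{k}}{m^{\ell}} \ge 1
  \quad &\Longrightarrow \quad
  S \ge \frac{m^{\ell}}{(4k \log m)^{k}} .
\end{align*}
```

The first line is the union bound over the $S$ monomials. The second line uses that the right-hand side cannot be smaller than one: otherwise some restriction $\rho$ would leave every monomial with domain-degree below $\ell$, and restricting the refutation by $\rho$ would give a refutation of a copy of $F[k]$ of too small domain-degree.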

Statement of Main Result and Discussion of Possible Improvements
Putting everything together, we can establish the formal version of our main results in Theorem 1.1 as follows.

We remark that straightforward calculations show that when $k(m) = O(m^{\delta})$ for $\delta < \alpha_0$ the upper bound in Theorem 4.12 is $m^{O(k)}$ and the lower bound is $m^{\Omega(k)}$.
Let us now discuss a couple of the parameters in Theorem 4.12 and how they could be improved slightly. We stated our main theorem for 4-CNF formulas, since that is the clause size that results naturally from our construction. However, if one wants to minimize the clause width and obtain an analogous result for 3-CNF formulas this is also possible to achieve, just as was done in [ALN14] for other proof systems. To prove a version of Theorem 4.12 for 3-CNF formulas we need a simple but rather ad-hoc variation of the relativization argument presented above. Let us briefly describe what modifications are needed.
The way we presented the construction above, we started with the 3-CNF formula k-Clique(G) and then applied relativization, which turned the clauses (3.1c)-(3.1e) into the 4-CNF formula (4.7c). An alternative approach would be to first encode k-Clique(G) with wide clauses $\bigvee_{j=1}^{N} x_{i,v_j}$ instead of clauses of the form (3.1c)-(3.1e), relativize this new, wide formula, and then convert the relativized formula into 3-CNF using extension variables. Instead of clauses (4.7c)-(4.7c), this would yield the collection of clauses (4.8c). This causes a small technical problem in that some of these clauses mention $i \in [m]$ but lack the literal $s_i$, and so a random restriction sampled as in Definition 4.8 may actually falsify these clauses. The solution to this is to change the random assignment so that when $s_i = 0$, we fix each $x_{i,v_j}$ uniformly at random in $\{0, 1\}$, set each $z_{i,(j-1)}$ equal to the value assigned to $x_{i,v_j}$, and finally fix $z_{i,N}$ to 0. The new restriction satisfies all clauses (4.8a)-(4.8c), and the proof of Lemma 4.10 still goes through.
Another parameter in Theorem 4.12 that could be improved is the value of α 0 , which determines how tightly the size lower bound matches the upper bound implied by width/degree and also how high we can push k(m). In our reduction from a 3-XOR formula φ to the clique formula k-Clique G k φ we start by splitting the 8n constraints into k blocks. The vertices in each block correspond to assignments to 24n/k variables, and because of this an SOS refutation in domain-degree d of k-Clique G k φ can be converted to a refutation in degree 24dn/k of φ.
If we want to obtain a more efficient reduction, we could instead split the n variables, rather than the 8n constraints, into k parts. In this way each vertex in G k φ would correspond to an assignment to n/k variables, and an SOS refutation in domain-degree d would translate to a refutation of φ in degree dn/k. But now we cannot reduce to the clique problem anymore. Splitting with respect to constraints allows us to enforce pairwise consistency between vertices in different blocks referring to common variables. When splitting with respect to variables, the vertices in different blocks correspond to partial assignments on disjoint domains and so are always pairwise compatible. However, we must still require that these partial assignments are consistent with the constraints in φ. Each such constraint refers to up to three blocks. Thus, any satisfying assignment to φ corresponds to k vertices such that no triple of vertices violates a 3-XOR constraint. This reduces to the problem of finding a k-hyperclique in a 3-uniform hypergraph. The rest of the reduction can be made to work as in Lemma 3.8. In the end we get a result analogous to that in Theorem 3.9 but with α 0 equal to α instead of α/24, which also improves Theorem 4.12. In this paper we instead presented a reduction to the k-clique problem for standard graphs, partly because we believe that a degree lower bound for this problem can be considered to be of independent interest.

Concluding Remarks
In this paper, we show that using Lasserre semidefinite programming relaxations to find degree-d sums-of-squares proofs is optimal up to constant factors in the exponent of the running time. More precisely, we show that there are constant-width CNF formulas on n variables that are refutable in sums-of-squares in degree d but require proofs of size $n^{\Omega(d)}$.
As for so many other results for the sums-of-squares proof system, in the end our proof boils down to a reduction from 3-XOR using Schoenebeck's version [Sch08] of Grigoriev's degree lower bound [Gri01b]. It would be very interesting to obtain other SOS degree lower bounds by different means than by reducing from Grigoriev's results for 3-XOR and knapsack.
Another interesting problem would be to prove average-case SOS degree lower bounds for k-clique formulas over Erdős-Rényi random graphs, or size lower bounds for (non-relativized) k-clique formulas over any graphs. In this context, it might be worth pointing out that the problem of establishing proof size lower bounds for k-clique formulas for constant k, which has been discussed, for instance, in [BGLR12], still remains open even for the resolution proof system (although lower bounds have been shown for tree-like resolution in [BGL13] and for full resolution for a version of clique formulas using a different encoding more amenable to lower bound techniques in [LPRT13]).