A #SAT Algorithm for Small Constant-Depth Circuits with PTF gates

We show that there is a randomized algorithm that, when given a small constant-depth Boolean circuit $C$ made up of gates that compute constant-degree Polynomial Threshold functions or PTFs (i.e., Boolean functions that compute signs of constant-degree polynomials), counts the number of satisfying assignments to $C$ in significantly better than brute-force time. Formally, for any constants $d,k$, there is an $\epsilon>0$ such that the algorithm counts the number of satisfying assignments to a given depth-$d$ circuit $C$ made up of $k$-PTF gates such that $C$ has size at most $n^{1+\epsilon}$. The algorithm runs in time $2^{n-n^{\Omega(\epsilon)}}$. Before our result, no algorithm for beating brute-force search was known even for a single degree-$2$ PTF (which is a depth-$1$ circuit of linear size). The main new tool is the use of a learning algorithm for learning degree-$1$ PTFs (or Linear Threshold Functions) using comparison queries due to Kane, Lovett, Moran and Zhang (FOCS 2017). We show that their ideas fit nicely into a memoization approach that yields the #SAT algorithms.


Introduction
This paper adds to the growing line of work on circuit-analysis algorithms, where we are given as input a Boolean circuit $C$ from a fixed class $\mathcal{C}$ computing a function $f : \{-1,1\}^n \to \{-1,1\}$¹ and we are required to compute some parameter of the function $f$. A typical example of this is the question of satisfiability, i.e. whether $f$ is the constant function $1$ or not. In this paper, we are interested in computing #SAT($f$), which is the number of satisfying assignments of $f$ (i.e. $|\{a \in \{-1,1\}^n \mid f(a) = -1\}|$).
Problems of this form can always be solved by "brute-force" in time $\mathrm{poly}(|C|) \cdot 2^n$ by trying all assignments to $C$. The question is whether this brute-force algorithm can be significantly improved, say to time $2^n/n^{\omega(1)}$ when $C$ is small, say $|C| \le n^{O(1)}$. Such algorithms are, intuitively, able to distinguish a small circuit $C \in \mathcal{C}$ from a "black-box" and hence find some structure in $C$. This structure, in turn, is useful in answering other questions about $C$, such as proving lower bounds against the class $\mathcal{C}$.² There has been a large body of work in this area, a small sample of which can be found in [PPZ97,PPSZ05,Wil10,Wil11]. A striking result of this type was proved by Williams [Wil10] who showed that for many circuit classes $\mathcal{C}$, even co-non-deterministic satisfiability algorithms running in better than brute-force time yield lower bounds against $\mathcal{C}$.
Recently, researchers have also uncovered tight connections between many combinatorial problems and circuit-analysis algorithms, showing that even modest improvements over brute-force search can be used to improve long-standing bounds for these combinatorial problems (see, e.g., [Wil15,AHWW16,AR18,AB18]). This yields further impetus in improving known circuit-analysis algorithms.
This paper is concerned with #SAT algorithms for constant-depth threshold circuits, denoted $\mathrm{TC}^0$, which are Boolean circuits where each gate computes a linear threshold function (LTF); an LTF computes a Boolean function which accepts or rejects based on the sign of a (real-valued) linear polynomial evaluated on its input. Such circuits are surprisingly powerful: for example, they can perform all integer arithmetic efficiently [BCH86,HAB02], and are at the frontier of our current lower bound techniques [KW16,Che18].
It is natural, therefore, to try to come up with circuit-analysis algorithms for threshold circuits. Indeed, there has been a large body of work in the area (reviewed below), but some extremely simple questions remain open.
An example of such a question is the existence of a better-than-brute-force algorithm for satisfiability of degree-2 PTFs. Informally, the question is the following: we are given a quadratic polynomial $Q(x_1, \dots, x_n)$ in $n$ Boolean variables and we ask if there is any Boolean assignment $a \in \{-1,1\}^n$ to $x_1, \dots, x_n$ such that $Q(a) < 0$. (Note that for a linear polynomial instead of a quadratic polynomial, this problem is trivial.) Surprisingly, no algorithm is known for this problem that is significantly better than $2^n$ time.³ In this paper, we solve the stronger counting variant of this problem for any constant-degree PTFs. We start with some definitions and then describe this result.
Definition 1 (Polynomial Threshold Functions). A Polynomial Threshold Function (PTF) on $n$ variables of degree $k$ is a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ such that there is a degree-$k$ multilinear polynomial $P(x_1, \dots, x_n) \in \mathbb{R}[x_1, \dots, x_n]$ that, for all $a \in \{-1,1\}^n$, satisfies $f(a) = \mathrm{sgn}(P(a))$. (We assume that $P(a) \neq 0$ for every $a \in \{-1,1\}^n$.) In such a scenario, we call $f$ a k-PTF. In the special case that $k = 1$, we call $f$ a Linear Threshold Function (LTF). We also say that the polynomial $P$ sign-represents $f$.
We define the weight of $P$, denoted $w(P)$, to be the bit-complexity of the sum of absolute values of all the coefficients of $P$.
The #SAT problem for k-PTFs is the problem of counting the number of assignments that satisfy a given k-PTF $f$. Formally,
Definition 2 (#SAT problem for k-PTFs). The problem is defined as follows.
Input: A k-PTF $f$, specified by a degree-$k$ polynomial $P(x_1, \dots, x_n)$ with integer coefficients.⁴
Output: The number of satisfying assignments to $f$. That is, the number of $a \in \{-1,1\}^n$ such that $P(a) < 0$.
We use #SAT(f ) to denote this output. We say that the input instance has parameters (n, M ) if n is the number of input variables and w(P ) ≤ M .
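To make Definition 2 concrete, the following is a minimal brute-force sketch of the quantity #SAT(f). It is the $2^n$-time baseline, not the algorithm of this paper. The dictionary encoding of $P$ (monomials as tuples of variable indices) and the example polynomial are illustrative assumptions.

```python
from itertools import product
from math import prod


def evaluate(poly, a):
    """Evaluate a multilinear polynomial at a point a in {-1, 1}^n.

    poly maps a monomial (a tuple of variable indices) to its integer
    coefficient; the empty tuple () is the constant term.
    """
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())


def count_sat_bruteforce(poly, n):
    """Count the points a in {-1, 1}^n with P(a) < 0 by trying all 2^n of them."""
    return sum(1 for a in product((-1, 1), repeat=n) if evaluate(poly, a) < 0)


# Hypothetical 2-PTF on 3 variables: P(x) = 4*x0*x1 + 2*x2 + 1.
# P is never zero on {-1, 1}^3, as Definition 1 requires.
P = {(0, 1): 4, (2,): 2, (): 1}
print(count_sat_bruteforce(P, 3))  # prints 4
```

Here $P(a) < 0$ exactly when $x_0 x_1 = -1$, which holds on 4 of the 8 points.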
Remark 3. An interesting setting of $M$ is $\mathrm{poly}(n)$, since any k-PTF can be represented by an integer polynomial with coefficients of bit-complexity at most $\tilde{O}(n^k)$ [Mur].
We give a better-than-brute-force algorithm for #SAT(k-PTF). Formally we prove the following theorem.
Theorem 4. Fix any constant $k$. There is a zero-error randomized algorithm that solves the #SAT problem for k-PTFs in time $\mathrm{poly}(n, M) \cdot 2^{n-S}$ where $S = \tilde{\Omega}(n^{1/(k+1)})$ and $(n, M)$ are the parameters of the input k-PTF $f$. (The $\tilde{\Omega}(\cdot)$ hides factors that are inverse polylogarithmic in $n$.)
We then extend this result to a powerful model of circuits called k-PTF circuits, where each gate computes a k-PTF. This model was first studied by Kane, Kabanets and Lu [KKL17] who proved strong average-case lower bounds for slightly superlinear-size constant-depth k-PTF circuits. Using these ideas, Kabanets and Lu [KL18] were able to give a #SAT algorithm for a restricted class of k-PTF circuits, where each gate computes a PTF with subquadratically many, say $n^{1.99}$, monomials (while the size remains the same, i.e. slightly superlinear).⁵ A reason for this restriction on the PTFs was that they did not have an algorithm to handle even a single degree-2 PTF (which can have $\Omega(n^2)$ many monomials).
Building on our #SAT algorithm for k-PTFs and the ideas of [KL18], we are able to handle general k-PTF circuits of slightly superlinear size. We state these results formally below.
We first define k-PTF circuits formally.
Definition 5 (k-PTF circuits). A k-PTF circuit on $n$ variables is a Boolean circuit on $n$ variables where each gate $g$ of fan-in $m$ computes a fixed k-PTF of its $m$ inputs. The size of the circuit is the number of wires in the circuit, and the depth of the circuit is the length of the longest path from an input to the output gate.⁶
The problem we consider is the #SAT problem for k-PTF circuits, defined as follows.
Definition 6 (#SAT problem for k-PTF circuits). The problem is defined as follows.
Input: A k-PTF circuit $C$, where each gate $g$ is labelled by an integer polynomial that sign-represents the function that is computed by $g$.
Output: The number of satisfying assignments to $C$.
⁴ It is known [Mur] that such a representation always exists.
⁵ Their result also works for the slightly larger class of PTFs that are subquadratically sparse in the $\{0, 1\}$-basis with no restriction on degree. Our result can also be stated for the larger class of polynomially sparse PTFs, but for the sake of simplicity, we stick to constant-degree PTFs.
⁶ Note, crucially, that only the fan-in of a gate counts towards its size. So any gate computing a k-PTF on $m$ variables only adds $m$ to the size of the circuit, though of course the polynomial representing this PTF may have $\approx m^k$ monomials.
We use #SAT(C) to denote this output. We say that the input instance has parameters (n, s, d, M ) where n is the number of input variables, s is the size of C, d is the depth of C and M is the maximum over the weights of the degree-k polynomials specifying the k-PTFs in C. We will say that M is the weight of C, denoted by w(C).
We now state our result on #SAT for k-PTF circuits. The following result implies Theorem 4, but we prove them separately.
Theorem 7. Fix any constants $k, d$. Then the following holds for some constant $\varepsilon_{k,d} > 0$ depending on $k, d$. There is a zero-error randomized algorithm that solves the #SAT problem for k-PTF circuits of size at most $s = n^{1+\varepsilon_{k,d}}$ with probability at least $1/4$ and outputs ? otherwise. The algorithm runs in time $\mathrm{poly}(n, M) \cdot 2^{n-S}$, where $S = n^{\varepsilon_{k,d}}$ and $(n, s, d, M)$ are the parameters of the input k-PTF circuit.
Previous work. Satisfiability algorithms for $\mathrm{TC}^0$ have been widely investigated. Impagliazzo, Lovett, Paturi and Schneider [IPS13,ILPS14] give algorithms for checking satisfiability of depth-2 threshold circuits with $O(n)$ gates. An incomparable result was proved by Williams [Wil14] who obtained algorithms for subexponential-sized circuits from the class $\mathrm{ACC}^0 \circ \mathrm{LTF}$, which is a subclass of subexponential $\mathrm{TC}^0$.⁷ For the special case of k-PTFs (and generalizations to sparse PTFs over the $\{0, 1\}$ basis) with small weights, a #SAT algorithm follows from the result of Sakai et al. [SSTT16].
For general constant-depth threshold circuits, the first satisfiability algorithm was given by Chen, Santhanam and Srinivasan [CSS18]. In their paper, Chen et al. gave the first average-case lower bound for $\mathrm{TC}^0$ circuits of slightly superlinear size $n^{1+\varepsilon_d}$, where $\varepsilon_d$ depends on the depth of the circuit. (These are roughly the strongest size lower bounds we know for general $\mathrm{TC}^0$ circuits even in the worst case [IPS97].) Using their ideas, they gave the first (zero-error randomized) improvement over brute-force search for satisfiability algorithms (and indeed even #SAT algorithms) for constant-depth $\mathrm{TC}^0$ circuits of size at most $n^{1+\varepsilon_d}$.
The lower bound results of [CSS18] were extended to the much more powerful class of k-PTF circuits (of roughly the same size as [CSS18]) by Kane, Kabanets and Lu [KKL17]. In a follow-up paper, Kabanets and Lu [KL18] considered the satisfiability question for k-PTF circuits, and could resolve this question in the special case that each PTF is subquadratically sparse, i.e. has $n^{2-\Omega(1)}$ monomials. One of the reasons for this sparsity restriction is that their strategy does not seem to yield a SAT algorithm for a single degree-2 PTF (which is a depth-1 2-PTF circuit of linear size).

Proof outline.
For simplicity we discuss SAT algorithms instead of #SAT algorithms.

Satisfiability algorithm for k-PTFs.
⁷ $\mathrm{ACC}^0 \circ \mathrm{LTF}$ is a subclass of $\mathrm{TC}^0$ where general threshold gates are allowed only just above the variables. All computations above these gates are one of AND, OR or Modular gates (that count the number of inputs modulo a constant). It is suspected (but not proved) that subexponential-sized $\mathrm{ACC}^0$ circuits cannot simulate even a single general threshold gate. Hence, it is not clear if the class of subexponential-sized $\mathrm{ACC}^0 \circ \mathrm{LTF}$ circuits contains even depth-2 $\mathrm{TC}^0$ circuits of linear size.
At a high level, the algorithm uses Memoization, which is a standard and very useful strategy for satisfiability algorithms (see, e.g. [San10]). Let $\mathcal{C}$ be a circuit class and $\mathcal{C}_n$ be the subclass of circuits from $\mathcal{C}$ that have $n$ variables. Memoization algorithms for $\mathcal{C}$-SAT fit into the following two-step template.
• Step 1: Solve by brute-force all instances of $\mathcal{C}$-SAT where the input circuit $C' \in \mathcal{C}_m$ for some suitable $m \ll n$. (Typically, $m = n^{\varepsilon}$ for some constant $\varepsilon$.) Usually this takes $\exp(m^{O(1)}) \ll 2^n$ time.
• Step 2: On the input $C \in \mathcal{C}_n$, set all input variables $x_{m+1}, \dots, x_n$ to Boolean values and for each such setting, obtain $C'' \in \mathcal{C}_m$ on $m$ variables. Typically $C''$ is a circuit for which we have solved satisfiability in Step 1 and hence by a simple table lookup, we should be able to check if $C''$ is satisfiable in $\mathrm{poly}(|C|)$ time. Overall, this takes time $O^*(2^{n-m}) \ll 2^n$.
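The two-step template can be sketched on a toy case: a single LTF with small integer weights, where the restricted function on the first $m$ variables is determined by its new constant term alone. This is only an illustration under that assumption; the function names are hypothetical, and the table is filled lazily on first use rather than fully precomputed as in Step 1.

```python
from itertools import product


def count_ltf_bruteforce(w, theta):
    """Baseline: count a in {-1,1}^n with sum(w[i]*a[i]) + theta < 0."""
    return sum(1 for a in product((-1, 1), repeat=len(w))
               if sum(wi * ai for wi, ai in zip(w, a)) + theta < 0)


def count_ltf_memo(w, theta, m):
    """Same count, organised as 'solve small instances, then look them up'."""
    n = len(w)
    head, tail = w[:m], w[m:]

    # Step 1 (small instances): histogram of the values taken by the linear
    # form on the first m variables, computed once in 2^m time.
    head_sums = {}
    for a in product((-1, 1), repeat=m):
        s = sum(wi * ai for wi, ai in zip(head, a))
        head_sums[s] = head_sums.get(s, 0) + 1

    # table[c] = number of assignments to the first m variables that satisfy
    # the restricted LTF whose constant term is c.
    table = {}

    # Step 2: enumerate the 2^(n-m) settings of the remaining variables and
    # answer each restricted instance by a table lookup.
    total = 0
    for sigma in product((-1, 1), repeat=n - m):
        c = theta + sum(wi * si for wi, si in zip(tail, sigma))
        if c not in table:
            table[c] = sum(cnt for s, cnt in head_sums.items() if s + c < 0)
        total += table[c]
    return total


w, theta = [3, -2, 5, 1, -4, 2], 1
assert count_ltf_memo(w, theta, 3) == count_ltf_bruteforce(w, theta)
```

With large coefficients the constant term no longer serves as a compact lookup key, which is the obstacle discussed next.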
At first sight, this seems perfect for k-PTFs, since it is a standard result that the number of k-PTFs on $m$ variables is at most $2^{O(m^{k+1})}$ [Cho61]. Thus, Step 1 can be done in $2^{O(m^{k+1})} \ll 2^n$ time.
For implementing Step 2, we need to ensure that the lookup (for satisfiability for k-PTFs on m variables) can be done quickly. Unfortunately how to do this is unclear. The following two ways suggest themselves.
• Store all polynomials $P' \in \mathbb{Z}[x_1, \dots, x_m]$ with small coefficients. Since every k-PTF $f$ can be sign-represented by an integer polynomial with coefficients of size $2^{\mathrm{poly}(m)}$ [Mur], this can be done with a table of size $2^{\mathrm{poly}(m)}$ and in time $2^{\mathrm{poly}(m)}$. When the coefficients are small (say of bit-complexity $\le n^{o(1)}$), then this strategy already yields a #SAT algorithm, as observed by Sakai et al. [SSTT16]. Unfortunately, in general, given a restriction $P'' \in \mathbb{Z}[x_1, \dots, x_m]$ of a polynomial $P \in \mathbb{Z}[x_1, \dots, x_n]$, its coefficients can be much larger (say $2^{\mathrm{poly}(n)}$) and it is not clear how to efficiently find a polynomial with small coefficients that sign-represents the same function.
• It is also known that every k-PTF on $m$ variables can be uniquely identified by $\mathrm{poly}(m)$ numbers of bit-complexity $O(m)$ each [Cho61]: these are called the "Chow parameters" of $f$. Again for this representation, it is unclear how to compute efficiently the Chow parameters of the function represented by the restricted polynomial $P''$. (Even for an LTF, computing the Chow parameters is as hard as Subset-sum [OS11].)
The way we solve this problem is by using a beautiful recent result of Kane, Lovett, Moran and Zhang [KLMZ17], who show that there is a simple decision tree that, when given as input the coefficients of any degree-$k$ polynomial $P' \in \mathbb{Z}[x_1, \dots, x_m]$, can determine the sign of the polynomial $P'$ at all points in $\{-1,1\}^m$ using only $\mathrm{poly}(m)$ queries to the coefficients of $P'$. Here, each query is a linear inequality on the coefficients of $P'$; such a decision tree is called a linear decision tree.
Our strategy is to replace Step 1 with the construction of this linear decision tree (which can be done in $\exp(m^{O(1)})$ time). At each leaf of the linear decision tree, we replace the truth table of the input polynomial $P'$ by a single bit that indicates whether $f' = \mathrm{sgn}(P')$ is satisfiable or not. In Step 2, we simply run this decision tree on our restricted polynomial $P''$ and obtain the answer to the corresponding satisfiability query in $\mathrm{poly}(m, w(P''))$ time. Note, crucially, that the height of the linear decision tree implied by the construction of [KLMZ17] is $\mathrm{poly}(m)$ and independent of the bit-complexity of the coefficients of the polynomial $P''$ (which may be as big as $\mathrm{poly}(n)$ in our algorithm). This concludes the description of the algorithm for k-PTFs.

Satisfiability algorithm for k-PTF circuits.
For k-PTF circuits, we follow a template set up by the result of Kabanets and Lu [KL18] on sparse-PTF circuits. We start by describing this template and then describe what is new in our algorithm.
The Kabanets-Lu algorithm is an induction on the depth $d$ of the circuit (which is a fixed constant). Given as input a depth-$d$ k-PTF circuit $C$ on $n$ variables, Kabanets and Lu do the following.
Depth-reduction: In [KL18], it is shown that on a random restriction that sets all but $n^{1-2\beta}$ variables (here, think of $\beta$ as a small constant, say 0.01) to random Boolean values, the bottom layer of $C$ simplifies in the following sense.
All but $t \le n^{\beta}$ gates at the bottom layer become exponentially biased, i.e. on all but a $\delta = \exp(-n^{\Omega(1)})$ fraction of inputs they are equal to a fixed $b \in \{-1,1\}$. Now, for each such biased gate $g$, there is a minority value $b_g \in \{-1,1\}$ that it takes on very few inputs. [KL18] show how to enumerate this small number of inputs in $\delta \cdot 2^n$ time and check if there is a satisfying assignment among these inputs. Having ascertained that there is no such assignment, we replace these gates by their majority value and there are only $t$ gates at the bottom layer. At this point, we "guess" the output of these $t$ "unbiased" gates and for each such guess $\sigma \in \{-1,1\}^t$, we check if there is an assignment that simultaneously satisfies:
(a) the depth $d - 1$ circuit $C'$, obtained by setting the unbiased gates to the guess $\sigma$, is satisfied.
(b) each unbiased gate $g_i$ evaluates to the corresponding value $\sigma_i$.
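The guessing step relies on the fact that the guesses $\sigma$ partition the input space, so counts over different guesses can simply be added. A minimal sketch of this identity on a toy depth-2 circuit of LTFs follows; it assumes the biased gates have already been replaced, uses brute force for each guess, and all names and the example circuit are hypothetical.

```python
from itertools import product


def sgn(v):
    """Sign with the convention of the paper: negative means 'satisfied' (-1)."""
    return -1 if v < 0 else 1


def ltf(weights, theta):
    """Return the LTF x -> sgn(<weights, x> + theta)."""
    return lambda x: sgn(sum(w * xi for w, xi in zip(weights, x)) + theta)


def count_direct(bottom, top, n):
    """Count inputs x with top(g_1(x), ..., g_t(x)) = -1."""
    return sum(1 for x in product((-1, 1), repeat=n)
               if top([g(x) for g in bottom]) == -1)


def count_by_guessing(bottom, top, n):
    """Sum, over guesses sigma for the bottom gates, the inputs consistent with sigma."""
    total = 0
    for guess in product((-1, 1), repeat=len(bottom)):
        if top(guess) != -1:  # condition (a): the reduced circuit is satisfied
            continue
        # condition (b): every bottom gate takes its guessed value
        total += sum(1 for x in product((-1, 1), repeat=n)
                     if all(g(x) == s for g, s in zip(bottom, guess)))
    return total


bottom = [ltf([1, 2, -3, 1], 0.5), ltf([-2, 1, 1, 4], -0.5), ltf([3, -1, 2, -2], 0.5)]
top = ltf([1, 1, 1], 0.5)
assert count_direct(bottom, top, 4) == count_by_guessing(bottom, top, 4)
```

In the actual algorithm the inner count for each guess is an AND of PTFs together with a circuit of smaller depth, which is handled recursively rather than by enumeration.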
Base case: Continuing this way, we eventually get to a base case which is an AND of sparse PTFs for which there is a satisfiability algorithm using the polynomial method.
In the above algorithm, there are two steps where subquadratic sparsity is crucially used. The first is the minority assignment enumeration algorithm for PTFs, which uses ideas of Chen and Santhanam [CS15] to reduce the problem to enumerating biased LTFs, which is easy [CSS18]. The second is the base case, which uses a non-trivial polynomial approximation for LTFs [Sri13]. Neither of these results holds for even degree-2 PTFs in general. To overcome this, we do the following.
Enumerating minority assignments. Given a k-PTF on $m$ variables that is $\delta = \exp(-n^{\Omega(1)})$-close to $b \in \{-1,1\}$, we enumerate its minority assignments as follows. First, we set up a linear decision tree as in the k-PTF satisfiability algorithm. Then we set all but $q \approx \log \frac{1}{\delta}$ variables of the PTF. On most such settings, the resulting PTF becomes the constant function and we can check this using the linear decision tree we created earlier. In this setting, there is nothing to do. Otherwise, we brute-force over the remaining variables to find the minority assignments. Setting parameters suitably, this yields an $O(\sqrt{\delta} \cdot 2^m)$ time algorithm to find the minority assignments of a $\delta$-biased k-PTF on $m$ variables.
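The control flow of this enumeration can be sketched as follows. This is an illustration only: the constancy check below is a brute-force stand-in for the linear-decision-tree lookup (so it does not reproduce the running time), and the polynomial encoding, function names and example are assumptions.

```python
from itertools import product
from math import prod


def sgn(v):
    return -1 if v < 0 else 1


def evaluate(poly, a):
    """poly maps monomials (tuples of variable indices) to integer coefficients."""
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())


def restrict(poly, rho, q):
    """Fix variables x_q, ..., x_{m-1} to rho; return a polynomial in x_0..x_{q-1}."""
    out = {}
    for mono, c in poly.items():
        kept = tuple(i for i in mono if i < q)
        factor = prod(rho[i - q] for i in mono if i >= q)
        out[kept] = out.get(kept, 0) + c * factor
    return out


def is_constant(poly, q, b):
    """Stand-in for the decision-tree lookup: is sgn(poly) identically b?"""
    return all(sgn(evaluate(poly, chi)) == b for chi in product((-1, 1), repeat=q))


def minority_assignments(poly, m, q, b):
    """List all a in {-1,1}^m with sgn(P(a)) != b, where b is the majority value."""
    found = []
    for rho in product((-1, 1), repeat=m - q):
        p_rho = restrict(poly, rho, q)
        if is_constant(p_rho, q, b):
            continue  # the common case for a biased PTF: nothing to enumerate
        for chi in product((-1, 1), repeat=q):
            if sgn(evaluate(p_rho, chi)) != b:
                found.append(chi + rho)
    return found


# Hypothetical biased 2-PTF on 5 variables with majority value +1.
P = {(): 7, (0,): 2, (1,): 2, (2,): 2, (3, 4): 2}
print(minority_assignments(P, m=5, q=2, b=1))
```

For this example only the two points with $x_0 = x_1 = x_2 = -1$ and $x_3 x_4 = -1$ make $P$ negative, so all but two of the eight outer settings are skipped.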
Base case: Here, we make the additional observation (which [KL18] do not need) that the AND of PTFs that is obtained further is small in that it only has slightly superlinear size. Hence, we can apply another random restriction in the style of [KL18] and, using the minority assignment enumeration ideas, reduce it to an AND of a small (say $n^{0.1}$) number of PTFs on $n^{0.01}$ (say) variables. At this point, we can again run the linear decision tree (in a slightly more generalized form) to check satisfiability.

A result of Kane, Lovett, Moran, and Zhang [KLMZ17]
Definition 8 (Coefficient vectors). Fix any $k, m \ge 1$. There are exactly $r = \sum_{i=0}^{k} \binom{m}{i}$ many multilinear monomials of degree at most $k$ on $m$ variables. Any multilinear polynomial $P(x_1, \dots, x_m)$ of degree at most $k$ can be identified with a list of the coefficients of its monomials in lexicographic order (say) and hence with some vector $w \in \mathbb{R}^r$. We call $w$ the coefficient vector of $P$ and use $\mathrm{coeff}_{m,k}(P)$ to denote this vector. When $m, k$ are clear from context, we will simply use $\mathrm{coeff}(P)$ instead of $\mathrm{coeff}_{m,k}(P)$.
Definition 9 (Linear Decision Trees). A Linear Decision Tree for a function $f : \mathbb{R}^r \to S$ (for some set $S$) is a decision tree where each internal node is labelled by a linear inequality of the form $\sum_{i=1}^{r} w_i z_i \ge \theta$ (here $z_1, \dots, z_r$ denote the input variables). Depending on the answer to this linear inequality, computation proceeds to the left or right child of this node, and this process continues until a leaf is reached, which is labelled with an element of $S$ that is the output of $f$ on the given input.
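As an illustration of Definition 9, here is a small sketch of a linear decision tree as a data structure together with its evaluation procedure. The class names and the hand-built example tree are assumptions made for illustration; this is not the construction of [KLMZ17].

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: object  # an element of the output set S


@dataclass
class Node:
    w: Sequence[float]   # query: is sum_i w[i] * z[i] >= theta ?
    theta: float
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def run(tree, z):
    """Follow the linear queries from the root; return (leaf label, number of queries)."""
    queries = 0
    while isinstance(tree, Node):
        holds = sum(wi * zi for wi, zi in zip(tree.w, z)) >= tree.theta
        tree = tree.if_true if holds else tree.if_false
        queries += 1
    return tree.label, queries


def sign_pair_tree():
    """Tree on R^2 that outputs the truth table of x -> sgn(z0 + z1*x) on x = +1, -1."""
    def second(first_sign):
        # second query: z0 - z1 >= 0 decides the sign at x = -1
        return Node([1, -1], 0, Leaf((first_sign, 1)), Leaf((first_sign, -1)))
    # first query: z0 + z1 >= 0 decides the sign at x = +1
    return Node([1, 1], 0, second(1), second(-1))


print(run(sign_pair_tree(), [3, -5]))  # prints ((-1, 1), 2)
```

The input $z = (3, -5)$ is the coefficient vector of $3 - 5x$, which is negative at $x = 1$ and positive at $x = -1$; the tree recovers this truth table using two linear queries on the coefficients.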
The following construction of linear decision trees due to Kane, Lovett, Moran and Zhang [KLMZ17] will be crucial for us.
Theorem 10. There is a randomized algorithm, which on input a positive integer $r$, a subset $H \subseteq \{-1,1\}^r$, and an error parameter $\varepsilon$, produces a (random) linear decision tree $T$ of depth $\Delta = O(r \log r \cdot \log(|H|/\varepsilon))$ that computes a (random) function $F : \mathbb{R}^r \to \{-1,1\}^{|H|} \cup \{?\}$ that has the following properties.

Given as input any
The randomized algorithm runs in time $2^{O(\Delta)}$.
Remark 11. The last statement in the above theorem is not formally stated in [KLMZ17] but can easily be inferred from their proof and a remark [KLMZ17, Page 363] on the "Computational Complexity" of their procedure.⁸
We will need a generalization of this theorem for evaluating (tuples of) k-PTFs. However, it is a simple corollary of this theorem.
Corollary 12. Fix positive constants $k$ and $c$. Let $r = \sum_{i=0}^{k} \binom{m}{i} = \Theta(m^k)$ denote the number of coefficients in a degree-$k$ multilinear polynomial in $m$ variables. There is a randomized algorithm which on input positive integers $m$ and $\ell \le m^c$ produces a (random) linear decision tree $T$ of depth $\Delta = O(\ell \cdot m^{k+1} \log m)$ that computes a (random) function $F : \mathbb{R}^{r \cdot \ell} \to \mathbb{N} \cup \{?\}$ that has the following properties.

Given as input any ℓ-tuple of coefficient vectors
is either the number of common satisfying assignments to all the k-PTFs on {−1, 1} m sign-represented by P 1 , . . . , P ℓ , or is equal to ?.
The randomized algorithm runs in time $2^{O(\Delta)}$.
Proof. For each $b \in \{-1,1\}^m$, define $\mathrm{eval}_b \in \{-1,1\}^r$ to be the vector of all evaluations of multilinear monomials of degree at most $k$, taken in lexicographic order, on the input $b$. Let $H = \{\mathrm{eval}_b \mid b \in \{-1,1\}^m\}$.
Clearly, $|H| \le 2^m$. Further, note that given any polynomial $P(x_1, \dots, x_m)$ of degree at most $k$, the truth table of the k-PTF sign-represented by $P$ is given by the evaluation of the LTF represented by $\mathrm{coeff}(P)$ at the points in $H$. Our aim, therefore, is to evaluate the LTFs corresponding to $\mathrm{coeff}(P_1), \dots, \mathrm{coeff}(P_\ell)$ at all the points in $H$.
For each $i$, we use the randomized algorithm from Theorem 10 to produce a decision tree $T_i$ that evaluates the Boolean function $f_i : \{-1,1\}^m \to \{-1,1\}$ sign-represented by $P_i$ (or equivalently, evaluates the LTF corresponding to $\mathrm{coeff}(P_i)$ at all points in $H$) with error $\varepsilon = 1/(2\ell)$. Note that, since $r = \Theta(m^k)$, $|H| \le 2^m$ and $\ell \le m^c$, each $T_i$ has depth $O(r \log r \cdot \log(|H|/\varepsilon)) = O(m^{k+1} \log m)$. The final tree $T$ is obtained by simply running $T_1, \dots, T_\ell$ in order, which is of depth $O(\ell m^{k+1} \log m)$. The tree $T$ outputs the number of common satisfying assignments to all the $f_i$ if all the $T_i$ succeed and ? otherwise. Since each $T_i$ outputs ? with probability at most $1/(2\ell)$, the tree $T$ outputs ? with probability at most $(1/(2\ell)) \cdot \ell = 1/2$.
The claim about the running time follows from the analogous claim in Theorem 10 and the fact that the number of common satisfying assignments to all the $f_i$ can be computed from the truth tables in $2^{O(m)}$ time. This completes the proof.
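The linearization used in this proof can be checked directly on small cases: the value $P(b)$ is the inner product of $\mathrm{coeff}(P)$ with $\mathrm{eval}_b$, so a k-PTF in $m$ variables is an LTF in $r$ variables restricted to the points of $H$. The sketch below is illustrative only; it orders monomials by degree and then lexicographically (an assumption, any fixed order works), and the polynomial encoding is hypothetical.

```python
from itertools import combinations, product
from math import comb, prod


def monomials(m, k):
    """All multilinear monomials of degree <= k on m variables, in a fixed order."""
    return [mono for d in range(k + 1) for mono in combinations(range(m), d)]


def coeff(poly, m, k):
    """Coefficient vector of poly (a dict monomial -> coefficient) in R^r."""
    return [poly.get(mono, 0) for mono in monomials(m, k)]


def eval_vector(b, k):
    """eval_b: the values of all monomials of degree <= k at the point b."""
    return [prod(b[i] for i in mono) for mono in monomials(len(b), k)]


def truth_table(poly, m, k):
    """Truth table of sgn(P), computed as an LTF on the points eval_b of H."""
    w = coeff(poly, m, k)
    table = []
    for b in product((-1, 1), repeat=m):
        value = sum(wi * ei for wi, ei in zip(w, eval_vector(b, k)))
        table.append(-1 if value < 0 else 1)
    return table


# r = sum_{i <= k} binom(m, i); for m = 4, k = 2 this is 1 + 4 + 6 = 11.
assert len(monomials(4, 2)) == comb(4, 0) + comb(4, 1) + comb(4, 2)

# Hypothetical 2-PTF: P(x) = 4*x0*x1 + 2*x2 + 1 on 3 variables.
P = {(0, 1): 4, (2,): 2, (): 1}
print(truth_table(P, 3, 2).count(-1))  # number of satisfying assignments: 4
```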

The PTF-SAT algorithm
We are now ready to prove Theorem 4. We first state the algorithm, which follows a standard memoization idea (see, e.g. [San10]). We assume that the input is a polynomial $P \in \mathbb{Z}[x_1, \dots, x_n]$ of degree at most $k$ that sign-represents a Boolean function $f$ on $n$ variables. The parameters of the instance are assumed to be $(n, M)$. Set $m = n^{1/(k+1)}/\log n$.
Algorithm A
1. Use $n_1 = 10n$ independent runs of the algorithm from Corollary 12 with $\ell = 1$ to construct independent random linear decision trees $T_1, \dots, T_{n_1}$ such that on any input polynomial $Q(x_1, \dots, x_m)$ (or more precisely $\mathrm{coeff}_{m,k}(Q)$) of degree at most $k$ that sign-represents a k-PTF $g$ on $m$ variables, each $T_i$ computes the number of satisfying assignments to $g$ with error at most $1/2$.
2. Set $N = 0$. ($N$ will ultimately be the number of satisfying assignments to $f$.)
3. For each setting $\sigma \in \{-1,1\}^{n-m}$ to the variables $x_{m+1}, \dots, x_n$, do the following:
(a) Compute the polynomial $P_\sigma$ obtained by substituting the variables $x_{m+1}, \dots, x_n$ accordingly in $P$.
(b) Run the decision trees $T_1, \dots, T_{n_1}$ on $\mathrm{coeff}(P_\sigma)$ and compute their outputs. If all the outputs are ?, output ?. Otherwise, some $T_i$ outputs $N_\sigma$, the number of satisfying assignments to $P_\sigma$. Add this to the current estimate of $N$.
4. Output $N$.
Correctness. It is clear from Corollary 12 and step 3b that algorithm A outputs either ? or the correct number of satisfying assignments to $f$. Further, we claim that with probability at least $1 - 1/2^{\Omega(n)}$, the output is indeed the number of satisfying assignments to $f$. To see this, observe that it follows from Corollary 12 that for each setting $\sigma \in \{-1,1\}^{n-m}$ to the variables $x_{m+1}, \dots, x_n$, each linear decision tree $T_i$ produced in step 1 errs on $\mathrm{coeff}(P_\sigma)$ (i.e. outputs ?) with probability at most $1/2$. The probability that all the $T_i$ do so is thus at most $1/2^{n_1}$, as they are constructed independently. So the probability that the algorithm fails to determine $N_\sigma$ is at most $1/2^{n_1}$. Finally, taking a union bound over all $\sigma$, which are $2^{n-m}$ in number, we conclude that the probability of algorithm A outputting ? is at most $2^{n-m}/2^{n_1} \le 1/2^{\Omega(n)}$.
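The outer loop of algorithm A (steps 2 to 4) can be sketched as follows. This is a minimal illustration under stated assumptions: the random linear decision trees are replaced by a brute-force oracle on $m$ variables that never outputs ?, so the sketch shows only the restriction and summation logic and not the running time. The polynomial encoding and all function names are hypothetical.

```python
from itertools import product
from math import prod


def evaluate(poly, a):
    """poly maps monomials (tuples of variable indices) to integer coefficients."""
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())


def restrict(poly, sigma, m):
    """Step 3(a): substitute sigma for x_m, ..., x_{n-1}; keep x_0..x_{m-1} free."""
    out = {}
    for mono, c in poly.items():
        kept = tuple(i for i in mono if i < m)
        factor = prod(sigma[i - m] for i in mono if i >= m)
        out[kept] = out.get(kept, 0) + c * factor
    return out


def small_oracle(poly, m):
    """Stand-in for running T_1, ..., T_{n_1}: count P(a) < 0 over {-1,1}^m."""
    return sum(1 for a in product((-1, 1), repeat=m) if evaluate(poly, a) < 0)


def count_sat_by_restriction(poly, n, m):
    total = 0                      # step 2: N = 0
    for sigma in product((-1, 1), repeat=n - m):   # step 3
        p_sigma = restrict(poly, sigma, m)         # step 3(a)
        total += small_oracle(p_sigma, m)          # step 3(b): add N_sigma
    return total                   # step 4


# Hypothetical degree-2 polynomial on 5 variables with odd values everywhere.
P = {(): 1, (0,): 2, (1, 3): -4, (2, 4): 6, (4,): 2, (0, 2): -2}
direct = sum(1 for a in product((-1, 1), repeat=5) if evaluate(P, a) < 0)
assert count_sat_by_restriction(P, n=5, m=2) == direct
```

The counts $N_\sigma$ add up correctly because the settings $\sigma$ partition $\{-1,1\}^n$ into $2^{n-m}$ subcubes of dimension $m$.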

Constant-depth circuits with PTF gates
In this section we give an algorithm for counting the number of satisfying assignments for a k-PTF circuit of constant depth and slightly superlinear size. We begin with some definitions. For a Boolean function $f$ with majority value $b$, an assignment $x \in \{-1,1\}^n$ is said to be a majority assignment if $f(x) = b$ and a minority assignment otherwise.
Definition 15. Given a k-PTF $f$ on $n$ variables specified by a polynomial $P$, a parameter $m \le n$ and a partial assignment $\sigma \in \{-1,1\}^{n-m}$ on $n - m$ variables, let $P_\sigma$ be the polynomial obtained by substituting the variables in $P$ according to $\sigma$. If $P$ has parameters $(n, M)$ then $P_\sigma$ has parameters $(m, M)$. For a k-PTF circuit $C$, $C_\sigma$ is defined similarly. If $C$ has parameters $(n, s, d, M)$ then $C_\sigma$ has parameters $(m, s, d, M)$.
Outline of the #SAT procedure. For designing a #SAT algorithm for k-PTF circuits, we use the generic framework developed by Kabanets and Lu [KL18] with some crucial modifications.
Given a k-PTF circuit $C$ on $n$ variables of depth $d$, we want to count the number of satisfying assignments $a \in \{-1,1\}^n$ such that $C(a) = -1$. We in fact solve a slightly more general problem. Given $(C, \mathcal{P})$, where $C$ is a small k-PTF circuit of depth $d$ and $\mathcal{P}$ is a set of k-PTF functions such that $\sum_{f \in \mathcal{P}} \text{fan-in}(f)$ is small, we count the number of assignments that simultaneously satisfy $C$ and all the functions in $\mathcal{P}$.
At the core of the algorithm that solves this problem, Algorithm B, is a recursive procedure $A_5$, which works as follows: on inputs $(C, \mathcal{P})$ it first applies a simplification step that outputs $\ll 2^n$ instances of the form $(C', \mathcal{P}')$ such that
• Both $C'$ and the functions in $\mathcal{P}'$ are on $m \ll n$ variables.
• The sets of satisfying assignments of these instances "almost" partition the set of satisfying assignments of $(C, \mathcal{P})$.
• With all but very small probability the bottom layer of $C'$ has the following nice structure.
  - At most $n$ gates are $\delta$-biased. We denote this set of gates by $B$ (as we will simplify them by setting them to the values they are biased towards).
  - At most $n^{\beta_d}$ gates are not $\delta$-biased. We denote these gates by $G$ (as we will simplify them by "guessing" their values).
• There is a small set of satisfying assignments that are not covered by the satisfying assignments of $(C', \mathcal{P}')$ but we can count these assignments with a brute-force algorithm that does not take too much time.
For each $C'$ with this nice structure, we then try to use this structure to create $C''$ which has depth $d - 1$. Suppose we reduce the depth as follows:
• Set all the gates in $B$ to the values that they are biased towards.
• Try all the settings of the values that the gates in $G$ can take, thereby creating from $C'$ possibly $2^{n^{\beta_d}}$ instances $(C'', \mathcal{P}')$.
$(C'', \mathcal{P}')$ now is an instance where $C''$ has depth $d - 1$. Unfortunately, by simply setting biased gates to the values they are biased towards, we may miss out on the minority assignments to these gates which could eventually satisfy $C'$. We design a subroutine $A_3$ to precisely handle this issue, i.e. to keep track of the number of minority assignments, say $N_{C'}$. This part of our algorithm is completely different from that of [KL18], which only works for subquadratically sparse PTFs.
Once $A_3$ has computed $N_{C'}$, i.e. the number of satisfying assignments among the minority assignments, we now need to only count the number of satisfying assignments among the rest of the assignments.
To achieve this we use an idea similar to that in [CSS18,KL18], which involves appending $\mathcal{P}'$ with a few more k-PTFs (this forces the biased gates to their majority values). This gives, say, a set $\tilde{\mathcal{P}}'$. Similarly, while setting gates in $G$ to their guessed values, we again use the same idea to ensure that we are counting satisfying assignments consistent with the guessed values, once again updating $\tilde{\mathcal{P}}'$ to a new set $\mathcal{P}''$. This creates instances of the form $(C'', \mathcal{P}'')$, where the depth of $C''$ is $d - 1$.
This way, we iteratively decrease the depth of the circuit by 1. Finally, we have instances $(C'', \mathcal{P}'')$ such that the depth of $C''$ is 1, i.e. it is a single k-PTF, say $h$. At this stage we solve #SAT($\tilde{C}$), where $\tilde{C} = h \wedge \bigwedge_{f \in \mathcal{P}''} f$. This is handled in a subroutine $A_4$. Here too our algorithm differs significantly from [KL18].
In what follows we will prove Theorem 7. In order to do so, we will set up various subroutines $A_1, A_2, A_3, A_4, A_5$ designed to accomplish certain tasks and combine them together at the end to finally design algorithm B for the #SAT problem for k-PTF circuits.
$A_1$ will be an oracle, used in other routines, which will compute the number of common satisfying assignments for a small AND of PTFs on few variables (using the same idea as in the algorithm for #SAT for k-PTFs). $A_2$ will be a simplification step, which will allow us to argue about some structure in the circuit (this algorithm is from [KL18]). It will make many gates at the bottom of the circuit $\delta$-close to a constant, thus simplifying it. $A_3$ will be used to count minority satisfying assignments for a bunch of $\delta$-biased PTFs, i.e. assignments which cause at least one of the PTFs to evaluate to its minority value. $A_4$ will be a general base case of our algorithm, which will count satisfying assignments for an AND of superlinearly many PTFs, by first using $A_2$ to simplify the circuit, then reducing it to the case of a small AND of PTFs and then using $A_1$. $A_5$ will be a recursive procedure, which will use $A_2$ to first simplify the circuit, and then convert it into a circuit of lower depth, finally making a recursive call on the simplified circuit.

Oracle access to a subroutine: Let
Input: AND of k-PTFs, say $f_1, \dots, f_s$ specified by polynomials $P_1, \dots, P_s$ respectively, such that $s \le n^{0.1}$ and for each $i \in [s]$, $f_i$ is defined over $n' \le n^{1/(2(k+1))}$ variables and $w(P_i) \le M$.
In what follows, we will assume that we have access to the above subroutine $A_1$. We will set up such an oracle and show that it answers any call to it in time $\mathrm{poly}(n, M)$ in Section 4.5.

Simplification of a k-PTF circuit
For any $1 > \varepsilon \gg (\log n)^{-1}$, let $\beta = A\varepsilon$ and $\delta = \exp(-n^{\beta/(B \cdot k^2)})$, where $A$ and $B$ are constants. Note that it is these constants $A, B$ we use in the parameter settings paragraph above. Let $A_2(C, d, n, M)$ be the following subroutine.
Input: k-PTF circuit $C$ of depth $d$ on $n$ variables with size $n^{1+\varepsilon}$ and weight $M$.
Output: A decision tree $T_{DT}$ of depth $n - n^{1-2\beta}$ such that for a uniformly random leaf $\sigma \in \{-1,1\}^{n-n^{1-2\beta}}$ it outputs a good circuit $C_\sigma$ with probability $1 - \exp(-n^{\varepsilon})$, where $C_\sigma$ is called good if its bottom layer has the following structure:
• there are at most $n$ gates which are $\delta$-close to an explicit constant. Let $B_\sigma$ denote this set of gates.
• there are at most $n^{\beta}$ gates that are not $\delta$-close to an explicit constant. Let us denote this set of gates by $G_\sigma$.
In [KL18], such a subroutine A 2 (C, d, n, M ) was designed. Specifically, they proved the following theorem.
Theorem 16 (Kabanets and Lu [KL18]). There is a zero-error randomized algorithm A 2 (C, d, n, M ) that runs in time poly(n, M ) · O(2 n−n 1−2β ) and outputs a decision tree as described above with probability at least 1 − 1/2 10n (and outputs ? otherwise). Moreover, given a good C σ , there is a deterministic algorithm that runs in time poly(n, M ) which computes B σ and G σ .
Remark 17. In [KL18], it is easy to see that the probability of outputting ? is at most 1/2. To bring this probability down to $1/2^{10n}$, we run their procedure $10n$ times in parallel and output the first tree that any of the runs produces. The probability that no run produces a tree is at most $1/2^{10n}$.
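The amplification in Remark 17 can be sketched as follows. This is a minimal illustration, not the procedure of [KL18]: `make_toy_procedure` is a hypothetical stand-in for a zero-error routine that fails with probability 1/2, and the runs are performed sequentially rather than in parallel.

```python
import random

FAIL = "?"

def amplify(procedure, repetitions):
    """Run a zero-error procedure up to `repetitions` times and return the
    first output that is not FAIL; return FAIL only if every run fails."""
    for _ in range(repetitions):
        out = procedure()
        if out != FAIL:
            return out
    return FAIL

def make_toy_procedure(answer, rng):
    """Hypothetical zero-error procedure: returns `answer` with probability
    1/2 and FAIL otherwise. It never returns a wrong answer."""
    def run():
        return answer if rng.random() < 0.5 else FAIL
    return run
```

Because the procedure never returns a wrong answer, taking the first non-failing output is safe, and with $10n$ independent runs all of them fail with probability at most $(1/2)^{10n}$.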
Remark 18. In designing the above subroutine in [KL18], they consider a more general class of polynomially sparse-PTF circuits (i.e. each gate computes a PTF with polynomially many monomials) as opposed to the $k$-PTF circuits we consider here. Under this weaker assumption, they get that $\delta = \exp(-n^{\Omega(\beta^3)})$. However, by redoing their analysis for degree-$k$ PTFs, it is easy to see that $\delta$ could be set to $\exp(-n^{\beta/(B \cdot k^2)})$ for some constant $B$. Under this setting of $\delta$, we get exactly the same guarantees. In this sense, the above theorem statement is a slight restatement of [KL18, Theorem 3.11].
Oracle access to: A 1 .

Lemma 19. There is a deterministic algorithm
Proof. We start with the description of the algorithm.
(b) Using oracle $A_1(q, 1, -g_{i,\rho})$, check for each $i \in [\ell]$ if $g_{i,\rho}$ is the constant function $-1$ by checking if the output of the oracle on the input $-g_{i,\rho}$ is zero.
(c) If there is an $i \in [\ell]$ such that $g_{i,\rho}$ is not the constant function $-1$, try all possible assignments $\chi$ to the remaining $q$ variables $x_1, \ldots, x_q$. This way, enumerate all assignments $b = (\chi, \rho)$ to $x_1, \ldots, x_m$ for which there is an $i \in [\ell]$ such that $P_i(b) > 0$. Add each such assignment to the collection $N$.
Correctness. If $a \in \{-1, 1\}^m$ is a minority assignment (i.e. $\exists i_0 \in [\ell]$ so that $P_{i_0}(a) > 0$) and if $a = (\chi, \rho)$ where $\rho$ is an assignment to the last $m - q$ variables and $\chi$ to the first $q$, then $a$ will get added to $N$ in the iteration of step 2 corresponding to $\rho$ and that of step 2c corresponding to $\chi$, with $i_0$ as a witness. Conversely, observe that we only add to the collection $N$ when we encounter a minority assignment.
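Running time. Let $T$ denote the set of restrictions $\rho \in \{-1, 1\}^{m-q}$ for which step 2c is executed, i.e. for which some $g_{i,\rho}$ is not the constant function $-1$, and let $T^c$ denote its complement. Also note that for a $\rho \in T$, enumeration of minority assignments in step 2c takes $2^q \cdot \ell \cdot \text{poly}(m, M)$ time. Therefore, we can bound the total running time by $\text{poly}(m, M)\,(2^q \cdot |T| + |T^c|)$.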
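Next, we claim that the size of $T$ is small:
$$|T| \le \ell \cdot \sqrt{\delta} \cdot 2^{m-q}.$$
To see this, fix $i \in [\ell]$. Since $g_i$ is $\delta$-close to $-1$, we have $\Pr_{a \sim D_m}[g_i(a) = 1] \le \delta$, where $D_m$ is the uniform distribution on $\{-1, 1\}^m$. Writing the LHS in the following way, we have
$$\mathbb{E}_{\rho \sim D_{m-q}}\left[\Pr_{\chi \sim D_q}[g_{i,\rho}(\chi) = 1]\right] \le \delta,$$
where $D_{m-q}$ and $D_q$ denote uniform distributions on assignments to the last $m - q$ variables and the first $q$ variables respectively. By Markov's inequality,
$$\Pr_{\rho \sim D_{m-q}}\left[\Pr_{\chi \sim D_q}[g_{i,\rho}(\chi) = 1] \ge \sqrt{\delta}\right] \le \sqrt{\delta}.$$
Consider a $\rho$ for which this event does not occur, i.e. for which $\Pr_{\chi \sim D_q}[g_{i,\rho}(\chi) = 1] < \sqrt{\delta}$. For such a $\rho$, $g_{i,\rho}$ has only $2^q = 1/\sqrt{\delta}$ many inputs and therefore, $g_{i,\rho}$ must be the constant function $-1$. Thus, we conclude that
$$\Pr_{\rho \sim D_{m-q}}[g_{i,\rho} \text{ is not the constant function } -1] \le \sqrt{\delta},$$
and a union bound over $i \in [\ell]$ gives the claim. Finally, by using the trivial bound $|T^c| \le 2^{m-q}$ and the above claim, we obtain a total running time of $\text{poly}(m, M) \cdot \sqrt{\delta} \cdot 2^m$ and this concludes the proof of the lemma.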
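The enumeration procedure analysed in this proof can be sketched in a few lines. This is a minimal sketch under stated assumptions: polynomials are given as Python callables, a minority assignment is one with $P_i(a) > 0$ for some $i$, and the oracle $A_1$ is replaced by a hypothetical brute-force stand-in `is_constant_minus_one`, so the sketch shows only the control flow and not the claimed running time.

```python
from itertools import product

def enumerate_minority(polys, m, q, is_constant_minus_one=None):
    """Return all a in {-1,1}^m with P_i(a) > 0 for some i.

    polys: callables taking a length-m tuple over {-1, 1}.
    q: number of leading variables left free after fixing the last m - q.
    is_constant_minus_one(P, rho): stand-in for the oracle call; it should
        report whether P restricted by rho is negative on all 2^q inputs.
    """
    if is_constant_minus_one is None:
        # Assumption: brute force replaces the oracle A_1 in this sketch.
        def is_constant_minus_one(P, rho):
            return all(P(chi + rho) < 0
                       for chi in product((-1, 1), repeat=q))

    minority = []
    for rho in product((-1, 1), repeat=m - q):
        # Step (b): skip rho if every restricted PTF is the constant -1.
        if all(is_constant_minus_one(P, rho) for P in polys):
            continue
        # Step (c): otherwise try every setting of the first q variables.
        for chi in product((-1, 1), repeat=q):
            a = chi + rho
            if any(P(a) > 0 for P in polys):
                minority.append(a)
    return minority
```

The inner loop runs only for restrictions in $T$, which is what the bound on $|T|$ above is used for.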

#SAT for AND of k-PTFs
We design an algorithm A 4 (n, M, g 1 , . . . , g τ ) with the following functionality.

Run $A_2(C, 2, n, M)$ to obtain the decision tree $T_{DT}$. Initialize $N$ to 0.
3. For each leaf $\sigma$ of $T_{DT}$, do the following:
(A) If $C_\sigma$ is not good, count the number of satisfying assignments for $C_\sigma$ by brute-force and add to $N$.
(B) If $C_\sigma$ is good, do the following:
(i) $C_\sigma$ is now an AND of PTFs in $B_\sigma$ and $G_\sigma$, over $n' = n^{1-2\beta_1}$ variables, where all PTFs in $B_\sigma$ are $\delta$-close to an explicit constant, where $\delta = \exp(-n^{\beta_1/(B \cdot k^2)})$. Moreover, σ exists, then count the number of satisfying assignments for $C_{\sigma\rho}$ by brute-force and add to $N$.
(c) If the above does not hold, we have established that for each $h_i \in B_\sigma$, $h_{i,\rho}$ is the constant function $a_i$. If $\exists i \in [\ell]$ such that $a_i = 1$, it means $C_{\sigma\rho}$ is also the constant 1. Then simply halt. Else set each $h_i$ to $a_i$. Thus, $C_{\sigma\rho}$ has been reduced to an AND of $n^{\beta_1}$ many PTFs over $m$ variables. Call this set $G'_{\sigma\rho}$, use $A_1(m, n^{\beta_1}, G'_{\sigma\rho})$ to calculate the number of satisfying assignments and add the output to $N$.
4. Finally, output $N$.

The correctness argument and running time analysis
Lemma 21. $A_4$ is a zero-error randomized algorithm that counts the number of satisfying assignments correctly. Further, $A_4$ runs in time $\text{poly}(n, M) \cdot 2^{n-n^{\alpha}}$ and outputs the right answer with probability at least 1/2 (and outputs ? otherwise).
Proof. Correctness. For a leaf $\sigma$ of $T_{DT}$, when $C_\sigma$ is not good, we simply use brute-force, which is guaranteed to be correct. Otherwise:
• If $h'_\rho$ is not the constant function $-1$ for some $h' \in B'_\sigma$, then we again use brute-force, which is guaranteed to work correctly.
• Otherwise, we only need to consider the satisfying assignments for the gates in $G_{\sigma\rho}$. For this we use $A_1$, which works correctly by assumption.
Further, we need to ensure that the parameters on which we call $A_1$ are valid. To see this, observe that $m = n^{\alpha} \le n^{1/(2(k+1))}$ because of the setting of $\alpha$ and further, we have $n^{\beta_1} \le n^{0.1}$.
Finally, the claim about the error probability follows from the error probability of A 2 (Theorem 16).
Running Time. The time taken for constructing $T_{DT}$ is $O^*(2^{n-n^{1-2\beta_1}})$, by Theorem 16. For a leaf $\sigma$ of $T_{DT}$, we know that step (A) is executed with probability at most $2^{-n^{\varepsilon_1}}$. The total time for running step (A) is thus $O^*(2^{n-n^{\varepsilon_1}})$.
We know that the oracle $A_1$ answers calls in $\text{poly}(n, M)$ time. Hence, the total time for running step (a) is $O^*(2^{n-n^{\alpha}})$. Next, note that if step (b) is executed, then all PTFs in $B_\sigma$ are $\delta$-close to $-1$. So, the number of times it runs is at most $\delta \cdot 2^{n'}$. Therefore, the total time for running step (b) is $O^*(2^{n+n^{\alpha}-n^{\beta_1/(Bk^2)}})$. Similar to the analysis of step (a), the total time for running step (c) is also $O^*(2^{n-n^{\alpha}})$.
Summing them up, we conclude that the total running time is $O^*(2^{n-n^{\alpha}})$, as due to our choice of various parameters, $n - n^{\alpha}$ is the dominating power of 2. This completes the proof.
a set $P$ of $k$-PTFs $g_1, \ldots, g_\tau$ on $n$ variables, which are specified by polynomials $P_1, \ldots, P_\tau$ such that $\sum_{i=1}^{\tau} \text{fan-in}(g_i) \le n^{1+\varepsilon_d}$ and for each $i \in [\tau]$, $w(P_i) \le M$.
Oracle access to: $A_1$, $A_4$.
We start by describing the algorithm.

The details of the algorithm.
Let count be a global counter initialized to 0 before the execution of the algorithm. (a) For each i ∈ [τ ] compute P i,σ , the polynomial obtained by substituting σ in its variables.

The correctness argument and running time analysis
Lemma 22. The algorithm $A_5$ described above is a zero-error randomized algorithm which, on input $(C, P)$ as described above, correctly computes #SAT$(C, P)$. Moreover, the algorithm outputs the correct answer (and not ?) with probability at least 1/2. Finally, $A_5(n, d, M, n^{1+\varepsilon_d}, C, \emptyset)$ runs in time $\text{poly}(n, M) \cdot 2^{n-n^{\zeta\varepsilon_d/(2(k+1))}}$, where the parameters $\varepsilon_d, \zeta$ are as defined at the beginning of Section 4.
Proof. We argue correctness by induction on the depth d of the circuit C. Clearly, if d = 1, correctness follows from the correctness of algorithm A 4 . This takes care of the base case.
If d ≥ 2, we argue first that if the algorithm does not output ?, then it does output #SAT(C, P) correctly. Assume that the algorithm A 2 outputs a decision tree T DT as required (otherwise, the algorithm outputs ? and we are done). Now, it is sufficient to argue that for each σ, the number of satisfying assignments to (C σ , P σ ) is computed correctly (if the algorithm does not output ?).
Fix any σ. If C σ is not a good circuit, then the algorithm uses brute-force to compute #SAT(C σ , P σ ) which yields the right answer. So we may assume that C σ is indeed good. Now, the satisfying assignments to (C σ , P σ ) break into two kinds: those that are minority assignments to the set B σ and those that are majority assignments to B σ . The former set is enumerated in Step 3e (correctly by our analysis of A 3 ) and hence we count all these assignments in this step.
Finally, we claim that the satisfying assignments to $(C_\sigma, P_\sigma)$ that are majority assignments of all gates in $B_\sigma$ are counted in Step 3f. To see this, note that each such assignment $a \in \{-1, 1\}^{n^{1-2\beta_d}}$ forces the gates in $G_\sigma$ to some values $b_1, \ldots, b_t \in \{-1, 1\}$. Note that for each such $b \in \{-1, 1\}^t$, these assignments are exactly the satisfying assignments of the pair $(C_{\sigma,b}, P_{\sigma,b})$ as defined in the algorithm. In particular, the number of satisfying assignments to $(C_\sigma, P_\sigma)$ that are majority assignments of all gates in $B_\sigma$ can be written as $\sum_{b \in \{-1,1\}^t} \#\text{SAT}(C_{\sigma,b}, P_{\sigma,b})$.
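The counting identity in the preceding paragraph can be checked on toy instances with the sketch below. It is only an illustration under simplifying assumptions: gates are Python callables, `top(a, g_vals, h_vals)` is a hypothetical stand-in for the part of the circuit above the bottom layer, and the value $-1$ denotes "satisfied" as in the rest of the paper. Both functions use brute force; the actual algorithm evaluates each term of the sum recursively.

```python
from itertools import product

def direct_count(top, g_gates, b_gates, majority, n):
    """Count assignments that satisfy the circuit and on which every gate
    in b_gates takes its majority value."""
    majority = tuple(majority)
    total = 0
    for a in product((-1, 1), repeat=n):
        g_vals = tuple(g(a) for g in g_gates)
        h_vals = tuple(h(a) for h in b_gates)
        if h_vals == majority and top(a, g_vals, h_vals) == -1:
            total += 1
    return total

def count_by_guessing(top, g_gates, b_gates, majority, n):
    """Same count, organised as a sum over guesses b for the gates in
    g_gates: for each b, the gates are replaced by constants and the
    guesses become extra constraints on the assignment."""
    majority = tuple(majority)
    total = 0
    for b in product((-1, 1), repeat=len(g_gates)):
        for a in product((-1, 1), repeat=n):
            consistent = (tuple(g(a) for g in g_gates) == b
                          and tuple(h(a) for h in b_gates) == majority)
            if consistent and top(a, b, majority) == -1:
                total += 1
    return total
```

Each assignment is consistent with exactly one guess $b$, which is why the sum counts every majority assignment once.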
We now want to apply the induction hypothesis to argue that all the terms in the sum are computed correctly. To do this, we need to argue that the size of $C_{\sigma,b}$ and the total fan-in of the gates in $P_{\sigma,b}$ are bounded as required (note that the total size of $C$ remains the same, while the total fan-in of $P$ increases by the total fan-in of the gates in $B'_\sigma \cup G'_{\sigma,b}$, which is at most $n^{1+\varepsilon_d}$). It can be checked that this boils down to the following two inequalities both of which are easily verified for our choice of parameters (for large enough $n$). Thus, by the induction hypothesis, all the terms in the sum are computed correctly (unless we get ?). Hence, the output of the algorithm is correct by induction.
Now, we analyze the probability of error. If $d = 1$, the probability of error is at most 1/2 by the analysis of $A_4$. If $d \ge 2$, we get an error if either $A_2$ outputs ? or there is some $\sigma$ such that the corresponding runs of $A_5$ or $A_4$ output ?. The probability of each is at most $1/2^{10n}$. Taking a union bound over at most $2^n$ many $\sigma$, we see that the probability of error is at most $1/2^{\Omega(n)} \le 1/2$.
Finally, we analyze the running time. Define $T(n, d, M)$ to be the running time of the algorithm on a pair $(C, P)$ as specified in the input description above. We need the following claim: $T(n, d, M) \le \text{poly}(n, M) \cdot 2^{n-n^{\zeta\varepsilon_d/(2(k+1))}}$. To see the above, we argue again by induction. The case $d = 1$ follows from the running time of $A_4$. Further, from the description of the algorithm, we get the following inequality for $d \ge 2$.
$$T(n, d, M) \le \text{poly}(n, M) \cdot \left(2^{n-n^{1-2\beta_d}} + 2^{n-n^{\varepsilon_d}} + 2^{n-\frac{1}{2} \cdot n^{\beta_d/(Bk^2)}} + 2^{n-n^{(1-2\beta_d)\zeta\varepsilon_{d-1}/(2(k+1))}}\right) \qquad (1)$$
The first term above accounts for the running time of $A_2$ and all steps other than Steps 3b, 3e and 3f. The second term accounts for the brute-force search in Step 3b, since it is performed for only a $2^{-n^{\varepsilon_d}}$ fraction of the leaves $\sigma$. The third term accounts for the minority enumeration algorithm in Step 3e (the bound follows from the running time of that algorithm). The last term is the running time of Step 3f and follows from the induction hypothesis. It suffices to argue that each term in the RHS of (1) can be bounded by $2^{n-n^{\zeta\varepsilon_d/(2(k+1))}}$. This is an easy verification from our choice of parameters and is left to the reader. This concludes the proof.

Putting it together
In this subsection, we complete the proof of Theorem 7 using the aforementioned subroutines. We also need to describe the subroutine $A_1$, which is critical for all the other subroutines. We shall do so inside our final algorithm for the #SAT problem for $k$-PTF circuits, algorithm $B$. Recall that $A_1$ has the following specifications:
Input: AND of $k$-PTFs, say $f_1, \ldots, f_s$ specified by polynomials $P_1, \ldots, P_s$ respectively, such that $s \le n^{0.1}$ and for each $i \in [s]$, $f_i$ is defined over $n' \le n^{1/(2(k+1))}$ variables and $w(P_i) \le M$.
We are now ready to complete the proof of Theorem 7. Suppose $C$ is the input $k$-PTF circuit with parameters $(n, n^{1+\varepsilon_d}, d, M)$. On these input parameters $(C, n, n^{1+\varepsilon_d}, d, k, M)$, we finally have the following algorithm for the #SAT problem for $k$-PTF circuits:
$B(C, n, n^{1+\varepsilon_d}, d, k, M)$

1. (Oracle Construction Step) Construct the oracle $A_1$ as follows. Use $n_1 = 10n$ independent runs of the algorithm from Corollary 12, with $\ell$ chosen to be $n^{0.1}$ and $m$ to be $n^{1/(2(k+1))}$, to construct independent random linear decision trees $T_1, \ldots, T_{n_1}$ such that on any input $w = (\text{coeff}_{m,k}(Q_1), \ldots, \text{coeff}_{m,k}(Q_\ell)) \in \mathbb{R}^{r \cdot \ell}$ (where the $Q_i$ are polynomials of degree at most $k$ that sign-represent $k$-PTFs $g_i$, each on $m$ variables), each $T_i$ computes the number of common satisfying assignments to $g_1, \ldots, g_\ell$ with error at most 1/2.

2. Run $A_5(n, d, M, n^{1+\varepsilon_d}, C, \emptyset)$. For an internal call to $A_1$, say on parameters $(n', s, f_1, \ldots, f_s)$ where $n' \le m$ and $s \le \ell$, do the following:
(a) Run each $T_i$ on the input $w = (\text{coeff}_{n',k}(P_1), \ldots, \text{coeff}_{n',k}(P_s)) \in \mathbb{R}^{r \cdot s}$. (We expand out the coefficient vectors with dummy variables so that they depend on exactly $m$ variables. Similarly, using some dummy polynomials, we can assume that there are exactly $\ell$ polynomials.)
(b) If some $T_i$ outputs the number of common satisfying assignments to $f_1, \ldots, f_s$, then output that. Otherwise, if all $T_i$ output ?, then output ?.
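The padding in step (a) can be illustrated with the following sketch. It assumes a hypothetical representation that is not taken from the paper: a multilinear polynomial is a dictionary from sorted index tuples to coefficients, and its coefficient vector lists the coefficients of all monomials of degree at most $k$ in a fixed order. The brute-force counter replaces the linear decision trees $T_i$. The sketch also makes visible a detail the text leaves implicit: padding with $m - n'$ dummy variables multiplies the count by $2^{m-n'}$, so the answer has to be rescaled.

```python
from itertools import combinations, product

def monomials(m, k):
    """All multilinear monomials of degree <= k over m variables, fixed order."""
    return [s for d in range(k + 1) for s in combinations(range(m), d)]

def coeff_vector(poly, m, k):
    """Coefficient vector over m variables; monomials that involve dummy
    variables (absent from poly) simply receive coefficient 0."""
    return [poly.get(s, 0.0) for s in monomials(m, k)]

def evaluate(vec, m, k, x):
    """Evaluate the polynomial with coefficient vector vec at x in {-1,1}^m."""
    total = 0.0
    for c, s in zip(vec, monomials(m, k)):
        term = c
        for i in s:
            term *= x[i]
        total += term
    return total

def count_common_sat(vectors, m, k):
    """Brute-force stand-in for the oracle: number of x in {-1,1}^m on which
    every polynomial is negative (PTF value -1, i.e. satisfied)."""
    return sum(1 for x in product((-1, 1), repeat=m)
               if all(evaluate(v, m, k, x) < 0 for v in vectors))

# Hypothetical dummy polynomial: constantly negative, hence always satisfied.
DUMMY_POLY = {(): -1.0}
```

For example, the polynomial `{(): -0.5, (0, 1): 1.0}` on 2 variables has 2 satisfying assignments; padded to 4 variables and joined with `DUMMY_POLY`, the count becomes $2 \cdot 2^{2} = 8$.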
Lemma 24. The construction of the zero-error randomized oracle $A_1$ in the above algorithm takes $2^{O(n^{0.6})}$ time. Once constructed, the oracle $A_1$ answers any call (with the correct parameters) in $\text{poly}(n, M)$ time with error at most $1/2^{10n}$.
Proof. Correctness. It is clear from Corollary 12 that algorithm $A_1$ outputs either ? or the correct number of common satisfying assignments to $f_1, \ldots, f_s$. Further, as the $T_i$ in step 1 are constructed independently, it follows that with probability at least $1 - 1/2^{10n}$, the algorithm indeed outputs the number of common satisfying assignments to $f_1, \ldots, f_s$.
Running Time. Substituting the parameters $\ell = n^{0.1}$ and $m = n^{1/(2(k+1))}$ in Corollary 12, we see that the construction of $A_1$ (step 1) takes $n_1 \cdot 2^{n^{0.6}}$ time. Also, the claimed running time of answering a call follows upon observing that steps 2a and 2b combined take only $\text{poly}(n, M)$ time to execute.
With the correctness of A 1 now firmly established, we finally argue the correctness and running time of algorithm B.
Correctness. The correctness of $B$ follows from that of $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ (see Lemma 24, Theorem 16, Lemmas 19, 21, and 22 respectively). Furthermore, if the algorithm $A_1$ is assumed to have no error at all, then from the analysis of $A_5$, we see that the probability of error in $B$ is at most 1/2. However, as algorithm $A_1$ is itself randomized, we still need to bound the probability that any of the calls made to $A_1$ produce an undesirable output (i.e. an output of ?). To this end, first note that as the running time of $A_5$ is bounded by $2^n$, the number of calls to $A_1$ is also bounded by $2^n$. But by Theorem 16 and Lemma 24, the probability of $A_1$ outputting ? is bounded by $1/2^{10n}$. Therefore, by the union bound, algorithm $B$ correctly outputs the number of satisfying assignments to the input circuit $C$ with probability at least $1/2 - 1/2^{\Omega(n)} \ge 1/4$.
Running Time. By Lemmas 22 and 24, the running time of $B$ will be $2^{O(n^{0.6})} + \text{poly}(n, M) \cdot 2^{n-n^{\zeta\varepsilon_d/(2(k+1))}}$. Thus, the final running time is $\text{poly}(n, M) \cdot 2^{n-S}$ where $S = n^{\zeta\varepsilon_d/(2(k+1))}$ and where $\varepsilon_d > 0$ is a constant depending only on $k$ and $d$. Setting $\varepsilon_{k,d} = \zeta\varepsilon_d/(2(k+1))$ gives the statement of Theorem 7.
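For concreteness, the union bound used in the correctness argument above can be written out with explicit constants. This is only a worked instance of the stated bounds (at most $2^n$ oracle calls, each failing with probability at most $2^{-10n}$), not an additional claim.

```latex
\Pr[\text{some call to } A_1 \text{ outputs } ?]
  \;\le\; 2^{n} \cdot 2^{-10n} \;=\; 2^{-9n},
\qquad
\Pr[B \text{ outputs the correct count}]
  \;\ge\; \tfrac{1}{2} - 2^{-9n} \;\ge\; \tfrac{1}{4}
  \quad \text{for all } n \ge 1 .
```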