
Direct Sums for Parity Decision Trees

Tyler Besselman ORCID NYU Shanghai, China Mika Göös EPFL, Lausanne, Switzerland Siyao Guo ORCID NYU Shanghai, China Gilbert Maystre ORCID EPFL, Lausanne, Switzerland Weiqiang Yuan ORCID EPFL, Lausanne, Switzerland
Abstract

Direct sum theorems state that the cost of solving k instances of a problem is at least Ω(k) times the cost of solving a single instance. We prove the first such results in the randomised parity decision tree model. We show that a direct sum theorem holds whenever (1) the lower bound for parity decision trees is proved using the discrepancy method; or (2) the lower bound is proved relative to a product distribution.

Keywords and phrases:
direct sum, parity decision trees, query complexity
Funding:
Tyler Besselman: Supported by the National Natural Science Foundation of China Grant No.62102260, NYTP Grant No.20121201, and NYU Shanghai Boost Fund.
Mika Göös: Supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00026.
Siyao Guo: Supported by the National Natural Science Foundation of China Grant No.62102260, NYTP Grant No.20121201, and NYU Shanghai Boost Fund.
Gilbert Maystre: Supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00026.
Weiqiang Yuan: Supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number MB22.00026.
Copyright and License:
© Tyler Besselman, Mika Göös, Siyao Guo, Gilbert Maystre, and
Weiqiang Yuan; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Oracles and decision trees
Related Version:
Full Version: https://eccc.weizmann.ac.il/report/2024/203/
Acknowledgements:
We thank Farzan Byramji for useful comments on an earlier version of this paper.
Editors:
Srikanth Srinivasan

1 Introduction

One of the most basic questions that can be asked for any model of computation is:

How does the cost of computing k independent instances scale with k?

A direct sum theorem states that if the cost of solving a single copy is C, then solving k copies has cost at least Ω(kC), which matches the trivial algorithm that solves the k copies separately. Direct sums have been studied exhaustively for randomised query complexity Rdt, randomised communication complexity Rcc, and other concrete models of computation; see Section 1.3 for prior work. In this work, we initiate the study of direct sum problems for randomised parity decision tree complexity Rpt, a computational model sandwiched between the widely-studied Rdt and Rcc.

Parity decision trees

Parity decision trees generalise the usual notion of decision trees by allowing parity queries. To compute a function f : {0,1}^n → {0,1} on input x ∈ {0,1}^n, a deterministic parity decision tree T performs queries of the form “what is ⟨a,x⟩?” where a ∈ {0,1}^n and ⟨a,x⟩ ≜ ∑_i a_i x_i mod 2. Once enough queries have been made, T outputs f(x). Parity decision trees are more powerful than ordinary decision trees: We have D^pt(f) ≤ D^dt(f) where D^dt(f) (resp. D^pt(f)) denotes the (parity) decision tree complexity of f, defined as the least depth of a deterministic (parity) decision tree computing f. On the other hand, the n-bit XOR function is an example where D^dt(XOR) = n while D^pt(XOR) = 1. We define a randomised parity decision tree 𝒯 as a distribution over deterministic parity trees T ∼ 𝒯. Then R^pt_ε(f) is defined as the worst-case depth (over both input and randomness of the tree) of the best randomised parity tree 𝒯 computing f with error ε, that is, Pr[𝒯(x) ≠ f(x)] ≤ ε for all x. As usual, we let R^pt ≜ R^pt_{1/3}. To simplify notation, we drop the superscript pt and write D = D^pt and R = R^pt for short.
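To make the model concrete, here is a minimal Python sketch (the tree encoding is an illustrative choice, not from the paper) of evaluating a deterministic parity decision tree; it checks that the n-bit XOR function is computed by a single parity query.

    import itertools

    # A node is either ('leaf', value) or ('query', a, child_if_0, child_if_1),
    # where a in {0,1}^n encodes the parity query <a, x>.
    def parity_query(a, x):
        return sum(ai & xi for ai, xi in zip(a, x)) % 2

    def run_tree(tree, x):
        while tree[0] == 'query':
            _, a, child0, child1 = tree
            tree = child1 if parity_query(a, x) else child0
        return tree[1]

    n = 4
    # XOR on n bits: one parity query <1^n, x> suffices (D^pt(XOR) = 1),
    # whereas an ordinary decision tree must read all n bits (D^dt(XOR) = n).
    xor_tree = ('query', [1] * n, ('leaf', 0), ('leaf', 1))
    assert all(run_tree(xor_tree, x) == sum(x) % 2
               for x in itertools.product([0, 1], repeat=n))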

Our main research question is now formulated as follows. Let f^k : ({0,1}^n)^k → {0,1}^k denote the direct sum function that takes k instances x = (x_1,…,x_k) and returns the value of f on each of them, f^k(x) ≜ (f(x_1),…,f(x_k)). We study the following question.

Question 1.

Do we have R(f^k) ≥ Ω(k)·R(f) for every function f?

We show two (incomparable) main results: We answer Question 1 affirmatively when the randomised parity decision tree lower bound is proved using the discrepancy method (Section 1.1), or when the lower bound is proved relative to a product distribution (Section 1.2).

1.1 First result: Direct sum for discrepancy

Discrepancy is one of the oldest-known methods for proving randomised communication lower bounds [56, 3]. Let us tailor its definition to the setting of randomised parity trees. Thinking of {0,1}^n as the vector space 𝔽_2^n, consider some affine subspace S ⊆ {0,1}^n and a probability distribution μ over the inputs {0,1}^n. The discrepancy of S measures how biased f is on S. Namely, let C_S^b ≜ Pr_{𝒙∼μ}[f(𝒙) = b ∧ 𝒙 ∈ S]. The difference Δ_S ≜ |C_S^0 − C_S^1| is called the bias of S under μ. We define bias(f) as the minimum over μ of the maximum bias Δ_S that an affine subspace can attain. Finally, the discrepancy bound disc(f) is defined as log(1/bias(f)). As in communication complexity, it is not hard to see that R(f) ≥ Ω(disc(f)); see Section 3 for details.
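For a fixed distribution μ, the relevant maximum bias can be computed by brute force on small examples. The Python sketch below (an illustrative choice of function and distribution, not from the paper) only enumerates affine subspaces of codimension 1; as discussed in Section 3, this is within a factor 2 of the maximum over all affine subspaces.

    import itertools, math

    def max_bias_codim1(f, mu, n):
        """Max over codimension-1 affine subspaces S = {x : <a,x> = b} of
        |sum_{x in S} (-1)^f(x) mu(x)|."""
        best = 0.0
        for a in itertools.product([0, 1], repeat=n):
            if not any(a):
                continue                      # skip the trivial constraint a = 0
            for b in (0, 1):
                s = sum((-1) ** f[x] * mu[x]
                        for x in itertools.product([0, 1], repeat=n)
                        if sum(ai * xi for ai, xi in zip(a, x)) % 2 == b)
                best = max(best, abs(s))
        return best

    n = 3
    inputs = list(itertools.product([0, 1], repeat=n))
    f = {x: sum(x) % 2 for x in inputs}       # XOR on 3 bits (example choice)
    mu = {x: 1 / 2 ** n for x in inputs}      # uniform distribution
    print(math.log2(1 / max_bias_codim1(f, mu, n)))  # 1.0, consistent with disc(XOR) = 1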

Theorem 1.

We have R(f^k) ≥ Ω(k)·disc(f) for any function f.

In particular, if we have a function f whose randomised parity decision tree complexity is equal to its discrepancy, R(f) = Θ(disc(f)), then Theorem 1 shows R(f^k) ≥ Ω(k)·R(f), answering Question 1 for that function. To prove Theorem 1, we first establish a particularly simple characterisation of disc(f) that relies on affine subspaces defined by a single constraint. We then prove a perfect direct sum (and even an XOR lemma) for discrepancy using Fourier analysis.

1.2 Second result: Direct sum for product distributions

The standard approach for proving randomised lower bounds is to use Yao’s principle [55], which states that R(f) = max_μ D_{1/3}(f,μ). Here D_ε(f,μ) is the distributional ε-error complexity of f, defined as the least depth of a (deterministic) parity tree T such that Pr_{𝒙∼μ}[T(𝒙) ≠ f(𝒙)] ≤ ε. We say that a distribution μ over {0,1}^n is product if it can be written as the product of n independent Bernoulli distributions. We define the best lower bound provable using a product distribution as

D^×_ε(f) ≜ max_{μ product} D_ε(f,μ)   and   D^× ≜ D^×_{1/3}.

Our second result answers Question 1 affirmatively (modulo logarithmic factors) whenever the randomised parity decision tree lower bound is proved relative to a product distribution.

Theorem 2.

We have R(f^k) ≥ Ω(k/log n)·D^×(f) for any n-bit function f.

We show moreover that the O(logn)-factor loss in Theorem 2 can be avoided when μ is the uniform distribution (or more generally any bounded-bias distribution). One should compare this to the state-of-the-art in communication complexity, where the quantitatively best distributional direct sum results are also for product distributions and suffer logarithmic-factor losses [37, 4].

To prove Theorem 2, we introduce a new complexity measure tailored for product distributions, which we call skew complexity S(f) and which we define precisely in Section 4. We prove that this new measure admits a perfect direct sum theorem, S(f^k) = Ω(k)·S(f), and that it characterises the measure D^× up to an O(log n) factor. (We also show that the logarithmic loss is necessary for our approach: there is a function f such that S(f) = O(1), even though D^×(f) = Θ(log n).) We give a more in-depth technical overview in Section 2.

Comparison of main results

We also show that our two main results (Theorems 1 and 2) are incomparable: For some functions f, our first result gives a much stronger lower bound for fk than the second result – and vice versa. See Section 7 for the proof.

Lemma 3.

The complexity measures disc and D^× are incomparable:

  1. There is an n-bit function f such that disc(f) = O(log n) while D^×(f) = Θ(n).

  2. There is an n-bit function f such that disc(f) = Θ(n) while D^×(f) = O(1).

1.3 Related work

Parity decision trees

Even though the direct sum problem for parity decision trees has not been studied before, the model has been studied extensively. Parity decision trees were first defined by Kushilevitz and Mansour [40] in the context of learning theory. Several prior works have studied their basic combinatorial properties [57, 46] as well as Fourier-analytic properties [28, 27], often with connections to the log-rank conjecture [54, 53, 48, 20, 32, 43]; see also the survey [39]. There are various lifting theorems involving parity decision trees: lifting from Dpt to Dcc [31], from Ddt to Dpt [19, 5, 1], and from Rdt to Rpt [52, 17]. These lifting theorems have played a central role in proving lower bounds for proof systems that can reason using parities [33, 22, 25, 11, 18, 2].

Decision trees

In the decision tree model with classical queries, a deterministic direct sum theorem, D^dt(f^k) = k·D^dt(f), and even the stronger composition theorem, D^dt(g∘f^k) = D^dt(g)·D^dt(f), are easy to show by combining adversary strategies [50]. In the randomised case, an optimal direct sum result, R^dt(f^k) ≥ Ω(k)·R^dt(f), is known [38, 36, 21]. Whether a composition theorem holds for randomised query complexity, R^dt(g∘f^k) ≥ Ω(R^dt(g)·R^dt(f)) (for total g and f), is a major open problem [10, 6, 8, 7, 49]. In the randomised setting, it is possible that the direct sum problem f^k requires strictly more than Θ(k)·R^dt(f) queries: if one wants to succeed in computing all k copies with probability 2/3, then a naive application of the union bound would require each copy to have error 1/k. Results stating that one sometimes has R^dt(f^k) ≥ ω(k)·R^dt(f) are called “strong” direct sum theorems [12, 13] and they sometimes hold even for composed functions [9, 16, 29].

Communication complexity

The direct sum question for deterministic communication complexity was posed in [23] and it remains a notoriously difficult open problem [34]. By contrast, in the randomised setting, the direct sum problem is characterised by information complexity [15], which has inspired a line of works too numerous to cite here; see [35, §1.1] for an up-to-date overview. One of the key findings is that a direct sum for communication protocols is false in full generality in the distributional setting [26, 47]. We leave open the intriguing possibility that the information complexity approach can be adapted to parity decision trees. Historically, one of the first direct sum theorems proved for randomised communication was for the discrepancy bound [51, 41] (analogously to our Theorem 1). Here, discrepancy is known to be equivalent to the γ2-norm [42]. We also mention that a near-optimal direct sum theorem holds for product distributions [4] (analogously to our Theorem 2).

1.4 Open question: Deterministic direct sum

The main question left open by our work is Question 1, namely, whether R=Rpt admits a direct sum theorem for all functions f. However, we would also like to highlight the analogous question in the deterministic case D=Dpt. As discussed above, this is a long-standing open problem in the case of deterministic communication complexity Dcc. The best results so far are:

  1. D^cc(f^k) ≥ Ω̃(k)·D^cc(f)^{1/2}, as proved in [23].

  2. D^cc(f^k) ≥ Ω̃(k)·D^cc(f)/log rank(f), as proved in [34].

We observe in Section A.1 that both approaches have analogues in the parity setting.

Theorem 4.

For any function f and k1,

  1. D(f^k) ≥ k·D(f)^{1/2},

  2. D(f^k) ≥ k·D(f)/log spar(f).

We leave it as an open question whether a perfect direct sum theorem holds for deterministic parity decision trees. We think one should attack this problem before addressing the (presumably much harder) problem for deterministic communication complexity.

2 Technical overview

We focus here on our second main result, Theorem 2, stating that R(f^k) ≥ Ω(k/log n)·D^×(f), which is technically the much more involved theorem. Our main technical result is the following direct sum result for distributional complexity. Here μ^k ≜ μ × ⋯ × μ (k times).

Theorem 5.

There exists a universal constant C such that the following holds. For any f : {0,1}^n → {0,1}, product distribution μ over {0,1}^n, and k ≥ 1,

D_ε(f^k, μ^k) ≥ Ω(kδ/log(n/δ)) · (D_{ε+δ}(f,μ) − C·log(n/δ))   for all ε, δ ≥ 0.

When D^×(f) ≥ 6C·log n, Theorem 2 follows by taking ε = δ = 1/6. Indeed, let μ be the distribution achieving the maximum for D^×. Using the easy direction of the minimax principle:

R(f^k) ≥ Ω(1)·D_{1/6}(f^k, μ^k) ≥ Ω(k/log n)·D_{1/3}(f,μ) = Ω(k/log n)·D^×(f).

The remaining case D^×(f) ≤ 6C·log n is handled separately using ad-hoc methods in Lemma 38. We now give an overview of the proof of Theorem 5.

Warm-up: Uniform distribution

We showcase the basic proof technique by sketching the proof in the simple case where μ is the uniform distribution. Fix an n-bit function f and let 𝒰 be the uniform distribution over {0,1}^n. In the uniform (and more generally in the bounded-bias) case, we are actually able to avoid the log n additive and multiplicative losses and obtain, for all k ≥ 1,

D_ε(f^k, 𝒰^k) ≥ Ω(kδ)·D_{ε+δ}(f, 𝒰)   for all δ ≥ 0.   (1)

Fix a decision tree T of depth d computing k copies of f with error at most ε when 𝒙 ∼ 𝒰^k. We show how to extract a tree T′ that computes a single copy 𝒚 ∼ 𝒰 with error at most ε+δ and depth O(d/kδ). Leaves of T correspond to affine subspaces of ({0,1}^n)^k of codimension d. More generally, one can associate with any node v of T the set C_v = {w_1,…,w_{d(v)}} of linear constraints that led to the node (d(v) is the depth of the node v; the root is at level 0) and the vector b ∈ {0,1}^{d(v)} of desired values. The set of inputs S_v that reach node v is then given by S_v ≜ {x ∈ ({0,1}^n)^k : ⟨w_j, x⟩ = b_j, ∀j ∈ [d(v)]}.

Of relevance here are the pure constraints one can extract from C_v. A pure constraint for copy i ∈ [k] is some w ∈ ({0,1}^n)^k such that w_j ≠ 0^n if and only if j = i. To be more precise, the number of pure queries that can be extracted for copy i at node v is defined with:

pure_i(C_v) ≜ dim(span(C_v) ∩ W_i)   where   W_i ≜ {w ∈ ({0,1}^n)^k : w_j = 0^n, ∀j ≠ i}.

We describe next two illustrative examples when there are k=2 copies.

  1. Node v corresponds to constraints “x_1^{(1)} + x_1^{(2)} = 0” and “x_1^{(2)} = 1”, where x^{(i)} denotes the i-th copy. Then, pure_1(C_v) = 1 as it is possible to extract the pure parity constraint x_1^{(1)} = 1 by adding the two constraints. In the same vein, pure_2(C_v) = 1.

  2. Node v corresponds to constraints “x_1^{(1)} + x_1^{(2)} = 0” and “x_1^{(2)} + x_2^{(2)} = 1”. Then, pure_1(C_v) = 0 as it is not possible to extract a pure constraint for the first copy.

Observation 6.

For any node v, we have d(v) ≥ ∑_{i∈[k]} pure_i(C_v).

As the second example highlights, it is possible for the inequality to be strict. This is a notable difference with classical decision trees: for any subcube C ⊆ ({0,1}^n)^k, the sum over the copies of the number of bits fixed in each copy equals the total number of fixed bits in C.
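The quantity pure_i(C_v) is a rank computation over 𝔽_2. The following Python sketch (the coordinate layout of the two copies is an assumption made for the example) computes it via dim(U∩V) = dim U + dim V − dim(U+V) and reproduces the two examples above as well as Observation 6.

    def rank_f2(rows):
        """Rank over GF(2) of a list of 0/1 vectors (Gaussian elimination)."""
        rows = [list(r) for r in rows]
        rank, m = 0, len(rows[0]) if rows else 0
        for col in range(m):
            pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
            if pivot is None:
                continue
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            for i in range(len(rows)):
                if i != rank and rows[i][col]:
                    rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
            rank += 1
        return rank

    def pure(constraints, i, n, k):
        """dim(span(constraints) ∩ W_i), where copy i occupies coordinates i*n .. i*n+n-1."""
        w_basis = [[1 if pos == i * n + j else 0 for pos in range(n * k)] for j in range(n)]
        return rank_f2(constraints) + n - rank_f2(constraints + w_basis)

    # The two examples from the text (k = 2 copies of n = 2 bits; copy 1 = coords 0-1).
    ex1 = [[1, 0, 1, 0], [0, 0, 1, 0]]   # x1(copy1)+x1(copy2)=0 and x1(copy2)=1
    ex2 = [[1, 0, 1, 0], [0, 0, 1, 1]]   # x1(copy1)+x1(copy2)=0 and x1(copy2)+x2(copy2)=1
    print(pure(ex1, 0, 2, 2), pure(ex1, 1, 2, 2))  # 1 1
    print(pure(ex2, 0, 2, 2), pure(ex2, 1, 2, 2))  # 0 1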

Where to plant 𝒚?

The overarching idea of our result is that under the uniform distribution, queries that increase the pure rank for a copy are the only ones that bring usable information. It is thus enough to find a copy with low expected pure rank in T and plant the real instance y there. To make this precise, taking the expectation over the leaves of T when 𝒙 ∼ 𝒰^k and using Observation 6 implies the existence of some copy i ∈ [k] with low expected pure rank:

𝔼_{𝒙∼𝒰^k}[pure_i(C_{ℓ(𝒙)})] ≤ O(d/k),

where ℓ(𝒙) denotes the leaf reached by T on input 𝒙.

Let us fix this advantageous copy to be i = 1. On input y ∈ {0,1}^n we run the tree T with y planted as x_1 and delay actual querying of bits of y as much as possible. Suppose that the process has reached node v with constraint set C_v and there is a new parity query w to be answered. If w ∈ span(C_v), the answer to that query can be inferred (an optimised tree would not make such a query). If w ∉ span(C_v), we say that w is critical for C_v if it would increase the pure rank for the first copy: pure_1(C_v ∪ {w}) > pure_1(C_v). If w is critical, there is no way to avoid making a parity query to the real input y and our algorithm does it. If w is not critical, it is however enough to answer with a uniform random bit (that is, move to a random child of v in T) without querying y at all.

To see this, further split w = w_1 ∘ w_{−1}, where w_1 ∈ {0,1}^n is the constraint for the first copy and w_{−1} ∈ ({0,1}^n)^{k−1} is the constraint for the rest of the copies. If w satisfies pure_1(C_v ∪ {w}) = pure_1(C_v) and w ∉ span(C_v), it must be that 0^n ∘ w_{−1} ∉ span(C_v). Since 𝒙_{−1} is drawn from the uniform distribution we thus have for any fixed y consistent with S_v:

Pr_{𝒙_{−1}}[⟨w, y∘𝒙_{−1}⟩ = 0 | (y,𝒙_{−1}) ∈ S_v] = Pr_{𝒙_{−1}}[⟨w_{−1}, 𝒙_{−1}⟩ = ⟨w_1, y⟩ | (y,𝒙_{−1}) ∈ S_v] = 1/2.   (2)
Correctness and efficiency

Let us call 𝒯 the randomised tree described above that solves one copy. Correctness can be argued by showing that the distribution of leaves reached in the process for 𝒚 ∼ 𝒰 is the same as the distribution of leaves reached by 𝒙 ∼ 𝒰^k in T. On the other hand, 𝒯 has expected depth O(d/k) as a real query to 𝒚 is only ever made pure_1(C_ℓ) times for each leaf ℓ. In conclusion, 𝒯 has the following guarantees:

  1. Pr_{𝒚∼𝒰, 𝑻∼𝒯}[𝑻(𝒚) ≠ f(𝒚)] ≤ ε.

  2. 𝔼_{𝒚∼𝒰, 𝑻∼𝒯}[#queries(𝑻,𝒚)] ≤ d/k.

Using Markov inequality, it is possible to derandomise 𝒯 to get a deterministic parity tree T solving f with a worst-case guarantee instead of an average-case one. This step introduces a parameter δ controlling a trade-off between cost and error and yields the desired result (1).

2.1 Beyond uniform: The skew measure

Observe that (2) can fail badly for non-uniform μ. As an illustrative example, suppose that two random bits 𝒂, 𝒃 are generated with 𝒂 ∼ 𝖡𝖾𝗋(1/2) and 𝒃 ∼ 𝖡𝖾𝗋(1/8). The constraint 𝒂 ⊕ 𝒃 = 1 is not pure from the point of view of 𝒂. However, since 𝒃 is skewed towards being 0, the realisation of the constraint gives information about 𝒂: Pr[𝒂 = 0 | 𝒂 + 𝒃 = 1] = 1/8 ≠ 1/2. Thus, it seems one needs to query 𝒂 to answer the query 𝒂 + 𝒃 even though the query is not critical for 𝒂!

To circumvent this, we introduce the skew measure. This new measure is built around the observation that each bit of an input 𝒙 ∼ μ can be sampled independently in two steps. Indeed, the following process is equivalent to 𝖡𝖾𝗋(1/8):

  1. Let 𝝆 ∈ {0, ∗} be “0” with probability 3/4 and “∗” with probability 1/4.

  2. If 𝝆 = 0, return “0”; else return a sample from 𝖡𝖾𝗋(1/2).

Note that if we are “lucky” and 𝝆 = ∗, we are back in the uniform case and (2) holds again. If not, we have somehow pre-emptively fixed the returned bit to the value 0. The skew measure explicitly splits product distributions into a random partial fixing 𝝆 followed by a uniform distribution over the unfixed bits of 𝝆. A tree computing in this model gets help from 𝝆 because 𝝆 reduces the complexity of the function. When bits are unfixed, it is on the other hand easier to analyse the behaviour of the tree as it is the uniform case again.
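For concreteness, the two-step sampling of 𝖡𝖾𝗋(1/8) can be checked empirically; the following short Python sketch is illustrative only (the sample size is an arbitrary choice).

    import random

    # Step 1: decide whether the bit is fixed to 0 or left free ('*');
    # Step 2: draw a uniform bit only in the free case. This reproduces Ber(1/8).
    def two_step_bit():
        rho = '*' if random.random() < 1 / 4 else 0      # '*' with probability 1/4
        return random.randint(0, 1) if rho == '*' else 0

    samples = [two_step_bit() for _ in range(200_000)]
    print(sum(samples) / len(samples))                   # ≈ 0.125 = Pr[Ber(1/8) = 1]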

In Sections 5 and 6, we show a perfect direct sum for the skew measure and that perhaps surprisingly, this new measure is only a logn-factor away from D×.

3 Direct sum for disc

The goal of this section is to prove Theorem 1, which we restate here for convenience.

Theorem 1 (restated). We have R(f^k) ≥ Ω(k)·disc(f) for any function f.

Let us start by defining discrepancy formally. We denote by 𝒮_n the set of all affine subspaces of {0,1}^n and by 𝒪_n ⊆ 𝒮_n the set of affine subspaces of codimension 1. Note that all spaces S ∈ 𝒪_n can be written as S = {x ∈ {0,1}^n : ⟨a,x⟩ = b} for some a ∈ {0,1}^n and b ∈ {0,1}.

Definition 7.

Let f : {0,1}^n → {0,1} be a boolean function and μ be a distribution over {0,1}^n. The (parity) discrepancy of f with respect to μ is defined as:

disc(f,μ) ≜ log(1/max_{S∈𝒮_n} bias(f,μ,S))   where   bias(f,μ,S) ≜ |∑_{x∈S} (−1)^{f(x)} μ(x)|.

The (parity) discrepancy of f is disc(f) ≜ max_μ disc(f,μ) where μ ranges over all distributions.

Observe that disc(f) ≥ 1 for all non-constant f and, by standard arguments, R(f) ≥ Ω(disc(f)) (see Lemma 42). Using the latter, the only thing left to obtain Theorem 1 is to prove a direct sum result for discrepancy. We do this in a very strong way by actually establishing an XOR lemma for disc. Let f^{⊕k} denote the function that takes k instances and aggregates their results under f using XOR, so that f^{⊕k}(x_1,…,x_k) ≜ f(x_1) ⊕ ⋯ ⊕ f(x_k).

Lemma 8.

For any function f, distribution μ and k ≥ 1,

k·disc(f,μ) ≥ disc(f^{⊕k}, μ^k) ≥ k·(disc(f,μ) − 1).

This result is the strongest possible. Indeed, we cannot omit the “−1” on the right because of the counterexample f ≜ XOR: we have disc(f^{⊕k}, μ^k) ≤ 1 for any distribution μ. In Section A.3 we revisit this XOR lemma and show that a version of it also holds in the distribution-free setting, relating disc(f^{⊕k}) to k·disc(f). As a final comment, we note that it is easier to work with f^{⊕k} instead of f^k in the discrepancy setting, as it is somewhat tedious to define discrepancy for multi-valued functions. Before formally proving Lemma 8, we show how it is used to prove the main result, Theorem 1.

Proof of Theorem 1.

Any decision tree computing f^k can be converted to a decision tree computing f^{⊕k}. This is achieved by replacing the label y ∈ {0,1}^k of each leaf by its parity ⟨y, 1^k⟩. This operation does not increase the error probability or the cost and so, using the easy direction of Yao’s principle:

R(f^k) ≥ max_μ D_{1/3}(f^k, μ^k)   (Lemma 41)
       ≥ max_μ D_{1/3}(f^{⊕k}, μ^k)
       ≥ max_μ disc(f^{⊕k}, μ^k) − log₂(3)   (Lemma 42)
       ≥ k·max_μ (disc(f,μ) − 1) − log₂(3)   (Lemma 8)
       = k·(disc(f) − 1) − log₂(3).

If disc(f) ≥ 10, then the string of inequalities yields k·(disc(f) − 1) − log₂(3) ≥ k·disc(f)/10. If f is constant, the claim is vacuously true. Finally, we show that for any non-constant f, R(f^k) ≥ k·log(3/2), which completes the claim. Indeed, if disc(f) ≤ 10, then k·log(3/2) ≥ k·disc(f)/100.

To this end, let f be a non-constant function and μ a distribution over {0,1}^n which is balanced over 0-inputs and 1-inputs, i.e. μ(f^{−1}(0)) = μ(f^{−1}(1)) = 1/2. Let T be the best deterministic parity decision tree for D_{1/3}(f^k, μ^k) and suppose toward contradiction that it has strictly fewer than L ≜ 2^k·(2/3) leaves. Let G ⊆ {0,1}^k be the set of solutions which appear as a label on a leaf of T. We have |G| < L and since μ is balanced, any solution y ∈ {0,1}^k is equally likely, so that:

Pr_{𝒙∼μ^k}[T(𝒙) = f^k(𝒙)] ≤ Pr_{𝒙∼μ^k}[f^k(𝒙) ∈ G] ≤ |G|/2^k < 2/3.

Thus, T errs with probability >1/3: a contradiction. We now proceed to prove Lemma 8 in three steps.

3.1 Step 1: Characterisation of discrepancy

Much like discrepancy for communication protocols can be characterised by the γ_2-norm of the communication matrix [51, 42], we show that the parity discrepancy of f on μ is characterised by the L_∞-norm of the Fourier transform of a related function F_μ. This characterisation has two purposes. First, proving an XOR lemma requires exploring all the possible ways for the k copies to sum to 1. This kind of convolution operation is greatly simplified in the Fourier domain, where it simply corresponds to standard multiplication. Second, the characterisation is also quite convenient for proving lower bounds on disc(f,μ) (which we do in Sections 7 and 8): it shows that the maximum bias is (almost) attained already by affine subspaces of codimension 1.

The function 𝑭𝝁

We relate a real-valued boolean function F : {0,1}^n → ℝ with its Fourier transform F̂ : {0,1}^n → ℝ using the usual basis:

∀z ∈ {0,1}^n,  F̂(z) ≜ ∑_{x∈{0,1}^n} F(x)·(−1)^{⟨x,z⟩} / 2^n;   [Fourier transform]
∀x ∈ {0,1}^n,  F(x) = ∑_{z∈{0,1}^n} F̂(z)·(−1)^{⟨z,x⟩}.   [Inverse Fourier transform]

See also [45] for more background on Fourier analysis. We use ‖F̂‖_∞ to denote the maximum absolute value of a Fourier coefficient of F. To analyse disc(f,μ), we introduce an associated function F_μ : {0,1}^n → ℝ defined by F_μ(x) ≜ (−1)^{f(x)} μ(x)·2^n and prove the following characterisation.
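As a sanity check of these definitions, the following Python sketch computes ‖F̂_μ‖_∞ and the maximum codimension-1 bias by brute force for a small example (3-bit majority under the uniform distribution, an arbitrary illustrative choice); the two quantities obey the sandwich proved next in Lemma 9.

    import itertools

    n = 3
    X = list(itertools.product([0, 1], repeat=n))
    f = {x: 1 if sum(x) >= 2 else 0 for x in X}      # majority on 3 bits (example choice)
    mu = {x: 1 / 2 ** n for x in X}                  # uniform distribution

    def F_hat(z):
        # F_mu(x) = (-1)^f(x) mu(x) 2^n, hence F_hat(z) = sum_x (-1)^(f(x)+<x,z>) mu(x)
        return sum((-1) ** (f[x] + sum(a * b for a, b in zip(x, z)) % 2) * mu[x] for x in X)

    def bias_codim1():
        best = 0.0
        for a in X:
            if not any(a):
                continue
            for b in (0, 1):
                s = sum((-1) ** f[x] * mu[x] for x in X
                        if sum(ai * xi for ai, xi in zip(a, x)) % 2 == b)
                best = max(best, abs(s))
        return best

    max_fourier = max(abs(F_hat(z)) for z in X)
    print(bias_codim1(), max_fourier)   # 0.25 0.5: bias ≤ ||F_hat||_inf ≤ 2 · bias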

Lemma 9.

For every function f : {0,1}^n → {0,1} and distribution μ over {0,1}^n:

max_{S∈𝒪_n} bias(f,μ,S) ≤ max_{S∈𝒮_n} bias(f,μ,S) ≤ ‖F̂_μ‖_∞ ≤ 2·max_{S∈𝒪_n} bias(f,μ,S).
Proof.

The first inequality holds immediately because 𝒪_n ⊆ 𝒮_n. For the second, fix a maximizing S ∈ 𝒮_n. Suppose that codim(S) = d and fix its constraints a_j ∈ {0,1}^n and b_j ∈ {0,1} for j ∈ [d] so that S = {x ∈ {0,1}^n : ⟨a_j, x⟩ = b_j ∀j ∈ [d]}. Observe that the vectors {a_j}_{j∈[d]} are linearly independent. Let Φ ≜ ∑_{x∈S} (−1)^{f(x)} μ(x) so that bias(f,μ,S) = |Φ| and observe that

Φ = 2^{−n} ∑_{x∈S} F_μ(x) = 2^{−n} ∑_{x∈S} ∑_{z∈{0,1}^n} F̂_μ(z)·(−1)^{⟨z,x⟩} = 2^{−n} ∑_{z∈{0,1}^n} F̂_μ(z) ∑_{x∈S} (−1)^{⟨z,x⟩}.

We focus on analysing the terms T_z ≜ ∑_{x∈S} (−1)^{⟨z,x⟩}. Let V ≜ span{a_1,…,a_d} and observe that whenever z ∈ V, |T_z| = |S|. Indeed, if β_1,…,β_d ∈ {0,1} gives a linear combination of z in V:

T_z = ∑_{x∈S} (−1)^{⟨z,x⟩} = ∑_{x∈S} ∏_{j∈[d]} (−1)^{β_j ⟨a_j,x⟩} = ∑_{x∈S} (−1)^{∑_j β_j b_j} = |S|·(−1)^{∑_j β_j b_j}.

On the other hand, T_z = 0 for all z ∉ V. Indeed, letting S_b ≜ S ∩ {x ∈ {0,1}^n : ⟨x,z⟩ = b} we have T_z = |S_0| − |S_1|. Because z ∉ V, the constraint ⟨x,z⟩ = b splits S in half and thus |S_0| = |S_1| = |S|/2. Factoring in those observations, we get:

|Φ| = 2^{−n} |∑_{z∈{0,1}^n} F̂_μ(z)·T_z| ≤ 2^{−n}·|S|·∑_{z∈V} |F̂_μ(z)| ≤ 2^{−n}·|S|·|V|·‖F̂_μ‖_∞.

Recall that S has codimension d and as such |S| = 2^{n−d} and |V| = 2^d, implying the desired inequality bias(f,μ,S) ≤ ‖F̂_μ‖_∞. We now prove the third inequality of the lemma. Fix any maximum Fourier coefficient y ∈ {0,1}^n and observe:

|F̂_μ(y)| = |∑_{x∈{0,1}^n} F_μ(x)·(−1)^{⟨x,y⟩}| / 2^n ≤ 2·max_{b∈{0,1}} |∑_{x : ⟨x,y⟩=b} (−1)^{f(x)} μ(x)|.

Fix the maximizing argument to b and define S ≜ {x ∈ {0,1}^n : ⟨x,y⟩ = b}. Note that S ∈ 𝒪_n and as such:

‖F̂_μ‖_∞ = |F̂_μ(y)| ≤ 2·|∑_{x∈S} (−1)^{f(x)} μ(x)| ≤ 2·max_{S∈𝒪_n} bias(f,μ,S).

3.2 Step 2: Direct sum for the maximum Fourier coefficient

The outer-product of functions F, G : {0,1}^n → ℝ is defined as the function F⊗G : {0,1}^{2n} → ℝ with (F⊗G)(x_1,x_2) ≜ F(x_1)·G(x_2). Next is a direct sum result for its maximum Fourier coefficient.

Claim 10.

For any F, G : {0,1}^n → ℝ, the function H ≜ F⊗G satisfies ‖Ĥ‖_∞ = ‖F̂‖_∞ · ‖Ĝ‖_∞.

Proof.

Let H = F⊗G; for any z_1, z_2 ∈ {0,1}^n, the definition of the Fourier transform implies

Ĥ(z_1,z_2) = 2^{−2n} ∑_{x_1,x_2∈{0,1}^n} H(x_1,x_2)·(−1)^{⟨x_1∘x_2, z_1∘z_2⟩}
           = 2^{−2n} ∑_{x_1,x_2∈{0,1}^n} F(x_1)·G(x_2)·(−1)^{⟨x_1,z_1⟩}·(−1)^{⟨x_2,z_2⟩}
           = F̂(z_1)·Ĝ(z_2).

From this, the claim is immediate:

‖Ĥ‖_∞ = max_{z_1,z_2} |Ĥ(z_1,z_2)| = max_{z_1,z_2} |F̂(z_1)|·|Ĝ(z_2)| = ‖F̂‖_∞·‖Ĝ‖_∞.

3.3 Step 3: Conclusion

We tie together Lemmas 9 and 10 and prove Lemma 8.

Proof of Lemma 8.

Let H : ({0,1}^n)^k → ℝ be the function associated with f^{⊕k} and μ^k in Lemma 9. It is possible to express H as the k-fold outer-product of F_μ: H = F_μ ⊗ ⋯ ⊗ F_μ. Indeed, for x ∈ ({0,1}^n)^k, we have:

H(x) = 2^{kn}·(−1)^{f^{⊕k}(x)}·μ^k(x) = ∏_{i∈[k]} 2^n·(−1)^{f(x_i)}·μ(x_i) = ∏_{i∈[k]} F_μ(x_i).

Thus, using the characterisation of Lemma 9 and applying Claim 10 repeatedly:

max_{S∈𝒮_{kn}} bias(f^{⊕k}, μ^k, S) ≤ ‖Ĥ‖_∞ = (‖F̂_μ‖_∞)^k ≤ 2^k·(max_{S∈𝒮_n} bias(f,μ,S))^k.

The XOR lemma disc(f^{⊕k}, μ^k) ≥ k·(disc(f,μ) − 1) follows directly. We now show the other direction, disc(f^{⊕k}, μ^k) ≤ k·disc(f,μ). To do so, fix some S ∈ 𝒮_n maximizing bias(f,μ,S) and define T ∈ 𝒮_{kn} as the concatenation of k copies of S. Formally:

T = {x ∈ ({0,1}^n)^k : x_i ∈ S ∀i ∈ [k]}.

Now, it is easy to check that bias(f^{⊕k}, μ^k, T) = bias(f,μ,S)^k and the claim follows.

4 Direct sum for D× part I: proof organisation

The goal of this section is to prepare the ground for a proof of our main technical contribution: a direct sum for parity trees in the distributional setting (restated below).

Theorem 5 (restated). There exists a universal constant C such that for any f : {0,1}^n → {0,1}, product distribution μ over {0,1}^n, and k ≥ 1, we have D_ε(f^k, μ^k) ≥ Ω(kδ/log(n/δ))·(D_{ε+δ}(f,μ) − C·log(n/δ)) for all ε, δ ≥ 0.

As stated in Section 2, this is sufficient to prove Theorem 2 whenever D^×(f) ≥ 6C·log n. The remaining case D^×(f) ≤ 6C·log n is proved in Lemma 38 in Section A.2. We thus focus on proving Theorem 5 in the next two sections (this section is devoted to introducing the necessary definitions and technical lemmas).

4.1 Two strengthenings of Theorem 5

For technical convenience, we study distributional complexity for randomised trees. For a deterministic parity tree T we let q(T,x) be the number of queries made by T on input x. If 𝒯 is a randomised tree and μ is a distribution, we define q̄(𝒯,μ) and err_f(𝒯,μ) in the natural way with:

q̄(𝒯,μ) ≜ 𝔼_{𝑻∼𝒯, 𝒙∼μ}[q(𝑻,𝒙)]   and   err_f(𝒯,μ) ≜ Pr_{𝑻∼𝒯, 𝒙∼μ}[𝑻(𝒙) ≠ f(𝒙)].

Finally, we define D̄_ε(f,μ) ≜ min_𝒯 {q̄(𝒯,μ) : err_f(𝒯,μ) ≤ ε}. It is clear that D̄_ε(f,μ) ≤ D_ε(f,μ), but a converse result is more complicated, as the derandomisation can increase both the error and the depth simultaneously.

Claim 11.

For any f : {0,1}^n → {0,1}, μ over {0,1}^n and ε, δ ≥ 0, we have D_{ε+δ}(f,μ) ≤ D̄_ε(f,μ)/δ.

We delay a proof of this folklore fact to Section A.4. We also refer readers to [36] which proves the analogue for ordinary decision trees. With this tool in hand, we can reduce Theorem 5 to the following theorem.

Theorem 12.

There exists a universal constant C such that the following holds. For any f : {0,1}^n → {0,1}, product distribution μ, and k ≥ 1,

D̄_ε(f^k, μ^k) ≥ Ω(k/log(n/γ))·(D̄_{ε+γ}(f,μ) − C·log(n/γ))   for all γ ∈ (0, 1/n).
Definition 13.

We say that a product distribution μ over {0,1}^n is λ-bounded for some λ ∈ (0,1] if Pr_{𝒙∼μ}[𝒙_i = 1] ∈ [λ/2, 1 − λ/2] for every i ∈ [n].

In the next sections, we also show the following qualitative improvement over Theorem 12 for bounded distributions.

Theorem 14.

For any f : {0,1}^n → {0,1}, λ-bounded distribution μ and k ≥ 1,

D̄_ε(f^k, μ^k) ≥ Ω(kλ)·D̄_ε(f,μ).

Let us highlight the difference between Theorem 12 and Theorem 14: the latter is free from both the logn factor and the extra error γ. This theorem is especially interesting when the hard distribution for the function at hand (e.g. MAJ) is close to the uniform one.

4.2 The Skew measure

For the rest of this paper, we let 𝒰 be the uniform distribution. Let μ be a distribution over {0,1}^n and S ⊆ {0,1}^n. We use μ(S) ≜ ∑_{s∈S} μ(s) to denote the mass of S with respect to μ. When μ(S) > 0, we let μ|_S be the distribution of μ conditioned on S. Let ρ ∈ {0,∗}^n be a partial assignment corresponding to the sub-cube C_ρ = {x ∈ {0,1}^n : ρ_i = 0 ⟹ x_i = 0, ∀i ∈ [n]}. We use μ|_ρ to denote μ|_{C_ρ}.

4.2.1 Random partial fixings

Let μ be a product distribution over {0,1}^n. We say that μ is 0-biased if Pr_{𝒙∼μ}[𝒙_i = 0] ≥ 1/2 for every i ∈ [n]. For the rest of the paper, we will assume without loss of generality that any encountered input distribution is 0-biased. Indeed, should μ not be 0-biased, we can apply the following iterative transformation. Let f_0 ≜ f and μ_0 ≜ μ. For every i ∈ [n], if Pr_{𝒙∼μ}[𝒙_i = 1] ≤ 1/2 – the coordinate is already biased in the right direction – we simply let f_i ≜ f_{i−1} and μ_i ≜ μ_{i−1}. Otherwise, let:

f_i(x_1,…,x_{i−1}, x_i, x_{i+1},…,x_n) ≜ f_{i−1}(x_1,…,x_{i−1}, 1−x_i, x_{i+1},…,x_n);
μ_i(x_1,…,x_{i−1}, x_i, x_{i+1},…,x_n) ≜ μ_{i−1}(x_1,…,x_{i−1}, 1−x_i, x_{i+1},…,x_n).

Observe that μ_n is 0-biased and D̄_ε(f_n^k, μ_n^k) = D̄_ε(f^k, μ^k) for every ε ≥ 0 and k ≥ 1. Now that we are certain that μ is 0-biased, let δ_i ≜ 2·Pr_{𝒙∼μ}[𝒙_i = 1] ∈ [0,1]. We define next the random partial fixing distribution with respect to μ. The intuition comes from the observation that each bit of μ can be written as a convex combination of the fixed bit “0” and a uniform bit.

Definition 15 (Random Partial Fixing).

The random partial fixing with respect to μ, denoted R_μ, is a distribution over partial assignments 𝝆 ∈ {0,∗}^n sampled as follows: For each i ∈ [n], we set independently

𝝆_i = 0 with probability 1 − δ_i, and 𝝆_i = ∗ with probability δ_i.

Observe that the following alternative two-step process is equivalent to sampling an input directly from μ: first, sample 𝝆 ∼ R_μ and then sample and return 𝒙 ∼ 𝒰|_𝝆.
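The equivalence can be verified exactly for small n; the following Python sketch (with an illustrative choice of a 0-biased product distribution, not from the paper) enumerates the partial fixings 𝝆 and checks that the two-step process reproduces μ.

    import itertools
    from math import prod

    p = [0.1, 0.25, 0.5]                       # Pr[x_i = 1] <= 1/2 for each i (0-biased)
    n = len(p)
    delta = [2 * q for q in p]                 # delta_i = 2 * Pr[x_i = 1]

    def mu(x):                                 # the target product distribution
        return prod(q if b else 1 - q for q, b in zip(p, x))

    def two_step(x):                           # Pr[x] under "rho ~ R_mu, then x ~ U|rho"
        total = 0.0
        for rho in itertools.product([0, '*'], repeat=n):
            pr_rho = prod(d if r == '*' else 1 - d for d, r in zip(delta, rho))
            if any(r == 0 and b == 1 for r, b in zip(rho, x)):
                continue                       # x is inconsistent with the fixing rho
            stars = sum(1 for r in rho if r == '*')
            total += pr_rho * 2 ** -stars      # uniform over the unfixed bits
        return total

    assert all(abs(mu(x) - two_step(x)) < 1e-12
               for x in itertools.product([0, 1], repeat=n))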

4.2.2 The new measure

Given a parity decision tree T and a partial assignment ρ over the input string, let T|_ρ denote the tree obtained by pruning T as follows:

  1. fixing all the variables in the support of ρ,

  2. removing redundant queries (those that can be written as a linear combination of previous queries).

For a randomised parity decision tree 𝒯, we define 𝒯|_ρ as the distribution of 𝑻|_ρ, where 𝑻 ∼ 𝒯.

Definition 16.

For every randomised parity decision tree 𝒯 and product distribution μ, define the skew average cost sq̄(𝒯,μ) ≜ 𝔼_{𝝆∼R_μ}[q̄(𝒯|_𝝆, 𝒰|_𝝆)]. Let f : {0,1}^n → {0,1} be a function. For ε ≥ 0, we define the skew measure S_ε(f,μ) with:

S_ε(f,μ) ≜ min_𝒯 {sq̄(𝒯,μ) : err_f(𝒯,μ) ≤ ε}.
Claim 17.

For any f : {0,1}^n → {0,1}, product distribution μ, and ε ≥ 0, we have D̄_ε(f,μ) ≥ S_ε(f,μ). Furthermore, equality holds if μ = 𝒰.

Proof.

The claim is immediate as sq¯(𝒯,μ)q¯(𝒯,μ) for every randomised parity tree 𝒯 and product distribution μ.

4.3 Proof plan

The proofs of Theorems 12 and 14 are carried out in two steps. First, we prove a perfect direct sum for the skew measure in Section 5.

Theorem 18.

We have S_ε(f^k, μ^k) ≥ k·S_ε(f,μ) for any function f, product distribution μ and ε ≥ 0.

As a second step, we demonstrate in Section 6 that D̄_ε(f,μ) is upper bounded in terms of S_ε(f,μ). We first prove a lossless conversion for product distributions which are constant-bounded. We then extend this to general product distributions, for which we lose a log n factor. Let us recall here that the log n loss for general (unbounded) product distributions is inherent to the skew measure. Indeed, we show in Section 8 the existence of some f and μ for which D̄_{1/3}(f,μ) = Θ(log n) but S_0(f,μ) = Θ(1).

Theorem 19.

For any f : {0,1}^n → {0,1}, product distribution μ, and γ ∈ (0, 1/n), we have

D̄_{ε+γ}(f,μ) ≤ O(log(n/γ))·(S_ε(f,μ) + 1)   for all ε ≥ 0.
Theorem 20.

For any f : {0,1}^n → {0,1} and λ-bounded product distribution μ, we have

D̄_ε(f,μ) ≤ O(1/λ)·S_ε(f,μ)   for all ε ≥ 0.

Combining the results above it is now straightforward to conclude and prove Theorems 12 and 14. For instance, the proof of the former goes as follows.

Proof of Theorem 12.
D̄_ε(f^k, μ^k) ≥ S_ε(f^k, μ^k)   (Claim 17)
             ≥ k·S_ε(f,μ)   (Theorem 18)
             ≥ Ω(k/log(n/γ))·(D̄_{ε+γ}(f,μ) − C·log(n/γ)).   (Theorem 19)

4.4 Some notation

Let us finish this section by defining some notation which will be useful for the rest of the paper. Let T be a parity decision tree on inputs from {0,1}^n. We define 𝒩(T) as the set of nodes of T and ℒ(T) as the set of leaves of T. For each node v ∈ 𝒩(T), we define the following (items marked with ∗ are only defined for non-leaf nodes):

  • path(v): the set of nodes on the root-to-v path (including the root, excluding v)

  • d(v) ≜ |path(v)|: the depth of v

  • Q_v ∈ {0,1}^n: the query made at node v (∗)

  • child(v,b): the child of v corresponding to the query outcome ⟨x, Q_v⟩ = b, where b ∈ {0,1} (∗)

  • Q^v: an n×d(v) boolean matrix with column vectors {Q_u}_{u∈path(v)}

  • Q^{v+} ≜ [Q^v Q_v] of dimension n×(d(v)+1) (∗)

  • b^v ∈ {0,1}^{d(v)}: the labels on the root-to-v path

For every boolean matrix A ∈ {0,1}^{n×m}, we use rank(A) to denote the rank of A (understood as a matrix over 𝔽_2) and let col(A) ⊆ {0,1}^n be the column space of A. For every S ⊆ [n], let A_S ∈ {0,1}^{|S|×m} stand for the sub-matrix of A consisting of the rows with indices in S. For every x, y ∈ {0,1}^n and S ⊆ [n], we denote ⟨x_S, y_S⟩ = ∑_{i∈S} x_i y_i by ⟨x,y⟩_S.

Let μ and ν be two distributions over a set 𝒮. We use d_TV(μ,ν) ≜ sup_{S⊆𝒮} |μ(S) − ν(S)| to denote the total variation distance between μ and ν and write μ ≡ ν if d_TV(μ,ν) = 0.

5 Direct sum for D× part II: direct sum for S

In this section, we prove a perfect direct sum for S (restated below). A direct consequence of this fact is a perfect direct sum for distributional parity query complexity under the uniform distribution.

Theorem 18 (restated). We have S_ε(f^k, μ^k) ≥ k·S_ε(f,μ) for any function f, product distribution μ and ε ≥ 0.

Corollary 21.

We have D̄_ε(f^k, 𝒰^k) ≥ k·D̄_ε(f, 𝒰) for any function f and ε ≥ 0.

Proof.

Combine Claim 17 with Theorem 18.

To prove Theorem 18, our overall strategy is to take a tree achieving S_ε(f^k, μ^k) and extract a tree computing a single copy of f under μ to within error ε while having cost bounded by S_ε(f^k, μ^k)/k. To do so, we employ the extraction strategy hinted at in Section 2. The extractor works as long as the input distributions are uniform, which is the case after the random partial fixing step of S.

5.1 Extracting a single instance under uniform distributions

Let T be a deterministic parity tree taking inputs x ∈ 𝒳 ≜ {0,1}^{m_1}×⋯×{0,1}^{m_k} and returning labels in {0,1}^k. We assume without loss of generality that the queries along any root-to-leaf path are linearly independent. Let L(ℓ) ∈ {0,1}^k be the label associated with the leaf ℓ ∈ ℒ(T). For i ∈ [k], we define the linear subspace W_i ⊆ 𝒳 of query vectors that are zero everywhere except for copy i:

W_i ≜ {w ∈ 𝒳 : w_j = 0^{m_j} ∀j ≠ i}.

We say a node v ∈ 𝒩(T) is critical with respect to i if col(Q^{v+}) ∩ W_i ≠ col(Q^v) ∩ W_i and denote the set of critical indices at node v by I_v ≜ {i ∈ [k] : v is critical w.r.t. i}. Finally, we let d_i(v) ≜ ∑_{u∈path(v)} 𝟙[i ∈ I_u] be the relative depth of v with respect to instance i and highlight that d_i(v) = dim(col(Q^v) ∩ W_i). The algorithm 𝖤𝗑𝗍_i(T), which extracts a tree for the i-th instance out of T, is described in Algorithm 1.

Algorithm 1 𝖤𝗑𝗍i(T).

Observe that it is indeed possible to compute the value of ⟨y, Q_v⟩ from b^v and ⟨y, w⟩ on line 5: Since w ∉ col(Q^v), we have rank([Q^v w]) = rank(Q^v) + 1. On the other hand, as w ∈ col(Q^{v+}), we have rank([Q^{v+} w]) = rank(Q^{v+}) = rank(Q^v) + 1. Thus Q_v ∈ col([Q^v w]), which means that Q_v can be written as a linear combination of the columns of [Q^v w]: Q_v = Q_{u_1} + ⋯ + Q_{u_t} + w where u_1,…,u_t are some ancestors of v. This in turn implies that ⟨y, Q_v⟩ = ∑_{i∈[t]} ⟨y, Q_{u_i}⟩ + ⟨y, w⟩.
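This linear-algebra step is a small Gaussian-elimination computation over 𝔽_2. The following Python sketch (the concrete vectors and recorded answers are made up for illustration, and a single copy is used to keep it short) expresses Q_v as a combination of the ancestor columns and w, and reads off ⟨y, Q_v⟩ from the recorded answers without querying y again.

    def in_span_combo(vectors, target):
        """Over GF(2): return a set I with XOR of vectors[i] for i in I == target,
        or None if target is outside the span (plain Gaussian elimination)."""
        m = len(target)
        basis = []                                    # pairs [reduced_vector, index_set]
        for i, v in enumerate(vectors):
            vec, idx = list(v), {i}
            for bvec, bidx in basis:
                lead = next(c for c in range(m) if bvec[c])
                if vec[lead]:
                    vec = [a ^ b for a, b in zip(vec, bvec)]
                    idx = idx ^ bidx
            if any(vec):
                basis.append([vec, idx])
        t, tidx = list(target), set()
        for bvec, bidx in basis:
            lead = next(c for c in range(m) if bvec[c])
            if t[lead]:
                t = [a ^ b for a, b in zip(t, bvec)]
                tidx = tidx ^ bidx
        return tidx if not any(t) else None

    # Ancestor queries with their recorded answers b_u = <y, Q_u>, plus the freshly
    # answered query w; the node's query Q_v lies in their span.
    ancestors = [[1, 1, 0, 0], [0, 1, 1, 0]]
    answers   = [1, 0]
    w, answer_w = [0, 0, 1, 1], 1
    Q_v = [1, 0, 0, 1]

    combo = in_span_combo(ancestors + [w], Q_v)
    assert combo is not None and len(ancestors) in combo   # w appears in the combination
    value = 0
    for i in combo:
        value ^= (answers + [answer_w])[i]
    print(value)                                            # = <y, Q_v>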

We stress that although T is a deterministic tree, 𝖤𝗑𝗍_i(T) is a randomised decision tree with internal randomness inherited from the bits 𝝃. Our main technical claim is that for any fixed y ∈ {0,1}^{m_i}, the algorithm 𝖤𝗑𝗍_i(T) perfectly simulates a run of T on a random input 𝒙 = (𝒙_1,…,𝒙_{i−1}, y, 𝒙_{i+1},…,𝒙_k) where 𝒙_j ∼ 𝒰({0,1}^{m_j}). In a nutshell, the randomness of the other k−1 instances can be substituted with the internal randomness 𝝃. To make this precise, we let X_v = {x ∈ 𝒳 : x^T Q^v = b^v} be the set of inputs leading to the node v ∈ 𝒩(T).

Claim 22.

For any y ∈ {0,1}^{m_i}, Pr_𝝃[𝖤𝗑𝗍_i(T) reaches node v in its execution on y] = Pr_𝒙[𝒙 ∈ X_v].

Proof.

Let us fix i = 1 and d ≜ d(v) for simplicity. We establish an alternative description of X_v that puts pure constraints on instance 1 first. Pick t ≜ d_1(v) independent vectors Q′_1,…,Q′_t ∈ col(Q^v) ∩ W_1 and extend them arbitrarily to a basis {Q′_j}_{j∈[d]} of col(Q^v). As each vector of this basis can be expressed as a linear combination of {Q_u}_{u∈path(v)}, it is possible to apply those linear combinations to b^v and obtain values {b′_j}_{j∈[d]} such that X_v = {x ∈ 𝒳 : ⟨x, Q′_j⟩ = b′_j ∀j ∈ [d]}. The set Y_v ⊆ {0,1}^{m_1} of inputs that can reach node v in a run of 𝖤𝗑𝗍_1(T) thus corresponds to

Y_v ≜ {y ∈ {0,1}^{m_1} : ⟨y, (Q′_j)_1⟩ = b′_j ∀j ∈ [t]}.

If y ∉ Y_v, the statement follows directly as both probabilities are zero. However, if y ∈ Y_v,

Pr_𝝃[𝖤𝗑𝗍_1(T) reaches node v in its execution] = 2^{−(d−t)}.

This is so because node v can only be reached by having the “right” d−t coin tosses of 𝝃 (provided that y ∈ Y_v). Thus, it remains to show that Pr_𝒙[𝒙 ∈ X_v] = 2^{−(d−t)} if y ∈ Y_v.

Let m = ∑_{i∈[k]} m_i and S = {m_1+1,…,m} be the indices of the bits of every copy but the first one. Fix the m×(d−t) boolean matrix A = [Q′_{t+1} ⋯ Q′_d] and observe that rank(A) = d−t by construction. We show that rank(A_S) = d−t too. If rank(A_S) < rank(A), we can find a non-empty set J ⊆ {t+1,…,d} such that ∑_{j∈J} (Q′_j)_S = 0. This implies that Q′ ≜ ∑_{j∈J} Q′_j ∈ W_1 ∩ col(Q^v). But Q′ is linearly independent of {Q′_1,…,Q′_t} – this contradicts dim(col(Q^v) ∩ W_1) = t. Therefore, if y ∈ Y_v, we use this observation to conclude:

Pr_𝒙[𝒙 ∈ X_v] = Pr_𝒙[⟨𝒙, Q′_j⟩ = b′_j ∀j ∈ [d]]
             = Pr_𝒙[𝒙^T A = (b′_j)_{t+1≤j≤d}]
             = Pr_{𝒛=(𝒙_2,…,𝒙_k)}[𝒛^T A_S = (b′_j + ⟨y, (Q′_j)_1⟩)_{t+1≤j≤d}]
             = 2^{−rank(A_S)}
             = 2^{−(d−t)}.

5.2 Proof of Theorem 18

We are now ready to show Theorem 18. Let 𝒯 be a randomised parity decision tree which witnesses C ≜ S_ε(f^k, μ^k). For each i ∈ [k], define the randomised decision tree 𝒯_i : {0,1}^n → {0,1} as follows:

  1. Sample 𝑻 ∼ 𝒯.

  2. Sample 𝝆_1,…,𝝆_{i−1},𝝆_{i+1},…,𝝆_k ∼ R_μ.

  3. Let 𝝆̃ ≜ (𝝆_1,…,𝝆_{i−1}, ∗^n, 𝝆_{i+1},…,𝝆_k).

  4. Return 𝖤𝗑𝗍_i(𝑻|_𝝆̃).

We show in Lemma 23 that err_f(𝒯_i, μ) ≤ ε simultaneously for all i ∈ [k]. On the other hand, we show in Lemma 24 that ∑_{i∈[k]} sq̄(𝒯_i, μ) ≤ C. By an averaging argument, this shows the existence of a copy i ∈ [k] with cost at most C/k and therefore S_ε(f,μ) ≤ C/k. The remainder of this section is devoted to proving both claims.

Lemma 23.

For every i ∈ [k], err_f(𝒯_i, μ) ≤ err_{f^k}(𝒯, μ^k).

Proof.

It is enough to prove the statement assuming 𝒯 is a deterministic parity tree T and i=1. Let be the distribution of 𝝆~ in the step 3 of generating 𝒯1. Fix some ρsupp() and note that ρ1=n. We also define 𝒰1𝒰ρ2××𝒰ρk. Using Claim 22 on a leaf (Tρ) yields:

Pr𝒚,𝝃[𝖤𝗑𝗍1(Tρ) reaches  on 𝒚L1()f(𝒚)]
= 𝔼𝒚[Pr𝝃[𝖤𝗑𝗍1(Tρ) reaches  on 𝒚]𝟙[L1()f(𝒚)]]
= 𝔼𝒚[Pr𝒙1𝒰1[(𝒚,𝒙1)X]𝟙[L1()f(𝒚)]]
= Pr𝒙μ×𝒰1[𝒙XL1()f(𝒙1)].

Thus:

errf(𝒯1,μ) =𝔼𝝆~[Pr𝒚μ,𝝃[𝖤𝗑𝗍1(T𝝆~)(𝒚)f(𝒚)]]
=𝔼𝝆~[(T𝝆~)Pr𝒙μ×𝒰1[𝒙XL1()f(𝒙1)]]
𝔼𝝆~[(T𝝆~)Pr𝒙μ×𝒰1[𝒙XL()f(𝒙)]]
=𝔼𝝆~[errfk(T𝝆~,μ×𝒰1)].

Observe now that for any xsupp(μ×𝒰1), we have T𝝆~(x)=T(x). Using the definition of μ thus yields:

errf(𝒯1,μ)𝔼𝝆~[errfk(T𝝆1,μ×𝒰1)]=errfk(T,μk).

Lemma 24.

∑_{i∈[k]} sq̄(𝒯_i, μ) ≤ sq̄(𝒯, μ^k).

Proof.

It is sufficient to prove this for the case where 𝒯 is a deterministic tree T. We have:

i[k]sq¯(𝒯i,μ)= i[k]𝔼𝝆iμ[q¯((𝒯i)𝝆i,𝒰𝝆i)]
= i[k]𝔼𝝆iμ𝝆~[q¯((𝖤𝗑𝗍i(T𝝆~))𝝆i,𝒰𝝆i)]
= i[k]𝔼𝝆μk[q¯(𝖤𝗑𝗍i(T𝝆),𝒰𝝆i)]
= 𝔼𝝆μk[i[k]q¯(𝖤𝗑𝗍i(T𝝆),𝒰𝝆i)].

where the third equality is due to the fact that the operations of applying 𝖤𝗑𝗍 and fixing variables are commutable. Let ρ({0,}n)k be a partial fixing and (Tρ). The probability that node is visited during the process 𝖤𝗑𝗍i(Tρ) when the input is 𝒙i𝒰ρi is 2d(). Observe that 𝖤𝗑𝗍i(Tρ) only makes di() queries to 𝒙1 to reach . As such, we have:

i[k]q¯(𝖤𝗑𝗍i(T𝝆),𝒰𝝆i) =i[k](T)2d()di()
(T)2d()d()
=q¯(Tρ,𝒰ρ).

The inequality is due to the fact that ∑_{i∈[k]} d_i(v) ≤ d(v). This is because dim(W_i ∩ W_j) = 0 for each i ≠ j and so

∑_{i∈[k]} d_i(v) = ∑_{i∈[k]} dim(col(Q^v) ∩ W_i) ≤ dim(col(Q^v)) = d(v).

To conclude, we have

i[k]sq¯(𝒯i,μ)=𝔼𝝆μk[i[k]q¯(𝖤𝗑𝗍i(T𝝆),𝒰𝝆i)]𝔼𝝆μk[q¯(T𝝆,𝒰𝝆)]=sq¯(T,μk).

6 Direct sum for D× part III: from S to D×

In this section, we show how to convert parity trees of the S_ε model to the more common D̄_ε model and prove Theorems 19 and 20. Let us fix for this section a boolean function f : {0,1}^n → {0,1} together with some 0-biased product distribution μ over {0,1}^n. Let T be a deterministic parity tree trying to solve f against μ. We begin by establishing an alternative view of the quantity sq̄(T,μ). For any fixed x ∈ {0,1}^n, define the product distribution R_μ^x over {0,∗}^n with:

Pr_{𝝆∼R_μ^x}[𝝆_i = ∗] = δ_i/(2−δ_i) if x_i = 0, and 1 if x_i = 1,   where δ_i ≜ 2·Pr_{𝒙∼μ}[𝒙_i = 1] ∈ [0,1].   (3)

Sampling 𝝆 ∼ R_μ, 𝒙 ∼ 𝒰|_𝝆 and completing 𝒙_j = 0 for all j with 𝝆_j = 0 is equivalent to first sampling 𝒙 ∼ μ and then some 𝝆 ∼ R_μ^𝒙. One can therefore see the process of sq̄(T,μ) as follows:

  1. Sample 𝒙 ∼ μ and 𝝆 ∼ R_μ^𝒙.

  2. Run T on 𝒙.

  3. Every time T attempts to make a query, check whether 𝝆 simplifies the query, using the fact that 𝝆_i = 0 ⟹ 𝒙_i = 0.

We describe this alternative view in detail in Algorithm 2. With this new interpretation, we can recast the quantity sq̄(T,μ) as

sq̄(T,μ) = 𝔼_{𝒙∼μ, 𝝆∼R_μ^𝒙}[number of times line 4 is executed in Algorithm 2].   (4)

The idea to convert S_ε algorithms to D̄_ε ones is to simulate the process of Algorithm 2 by maintaining an incomplete but consistent view p ∈ {0,∗,?}^n of ρ. Initially, p = ?^n – i.e. nothing is known about ρ – and we gradually update p based on the queries we get. For instance, if x_i = 1, then (3) asserts ρ_i = ∗. This scheme helps to relate the cost of the converted D̄_ε algorithm with sq̄(T,μ). The description of the converted algorithm is given in Algorithm 3.

Algorithm 2 an alternative view of sq¯(T,μ).
Algorithm 3 converts an algorithm T for Sε to D¯ε.
Definition 25.

Let p ∈ {0,∗,?}^n be a fixing. The following are subsets of indices:

S_∗^p = {j ∈ [n] : p_j = ∗},   S_0^p = {j ∈ [n] : p_j = 0},   S_?^p = {j ∈ [n] : p_j = ?},   S_{≠0}^p = S_∗^p ∪ S_?^p.

We also write S(p,∗) to mean S_∗^p and likewise for the other sets.

Let P_v ⊆ {0,∗,?}^n be the set of all possible p that could arise at the start of an iteration of Algorithm 3 at node v. We now prove an invariant of Algorithm 3 and then its correctness.

Lemma 26.

For any state v ∈ 𝒩(T) and p ∈ P_v that Algorithm 3 could be in at the start of a while iteration (line 3), it holds that:

rank(Q^v_{S(p,≠0)}) = rank(Q^v_{S(p,∗)}) = |S(p,∗)|.
Proof.

We prove the claim by induction on T. The statement is true when v is the root because both Qv and Sp are empty. Let us now assume that the statement is true for some v and pPv and prove that the invariant carries over to the next iteration regardless of the query outcomes and the randomness 𝜼 of the process. If p is the updated value of p at line 19, this amounts to showing that rank(QS(p,0)v)=rank(QS(p,)v)=|S(p,)|. We consider three cases.

Case 𝑫𝒗,𝒑=.

Then, there is no update for p and p=p. Since rank(QS(p,)v)=rank(QS(p,)+jv) for all jS(p,0), we have rank(QS(p,0)v)=rank(QS(p,)v)=|S(p,)|, as desired.

Case 𝑫𝒗,𝒑 and 𝒑𝒋=𝟎 for all 𝒋𝑫𝒗,𝒑.

Then, Sp=Sp and S(p,0)=S(p,0)Dv,p. By definition of Dv,p, we still have rank(QS(p,)v)=rank(QS(p,)+jv) for all jS(p,0), so rank(QS(p,0)v)=rank(QS(p,)v)=|S(p,)|.

Case 𝑫𝒗,𝒑 and 𝒑𝒋= for some 𝒋𝑫𝒗,𝒑.

Then Sp=Sp+j and it must hold that rank(QS(p,)v)=|S(p,)|. On the other hand,

rank(QS(p,0)v)rank(QS(p,0)v)+1=|S(p,)|+1=|S(p,)|.

Where the inequality follows from the fact that S(p,0)S(p,0). Finally, this implies rank(QS(p,0)v)=rank(QS(p,)v)=|S(p,)|.

Lemma 27.

For any x ∈ {0,1}^n, Pr_𝜼[Algorithm 3 outputs 1] = 𝟙[T(x) = 1].

Proof.

It is not hard to see that if Algorithm 3 gets the correct value of x,Qv at each iteration of the while loop, it perfectly simulates T. Thus, it suffices to show that whenever Dv,p=, the algorithm can compute the value of x,Qv from the previous query outcomes. Lemma 26 and its proof implies that if Dv,p=, then rank(QS(p,0)v)=rank(QS(p,0)v)=|S(p,)|. Thus QS(p,0)v can be written as a linear combination of column vectors of QS(p,0)v. Namely, QS(p,0)v=j[t]QS(p,0)vj, where v1,,vt are some ancestors of v. On the other hand, we know that xj=pj=0 for all jS0p. Consequently, we have

x,Qv=x,QvS(p,0)=j[t]x,QvjS(p,0)=j[t]bvj.

Thus, Algorithm 3 follows the same path of vertices as T, irrespective of the randomness 𝜼. Consequently, its outputs correspond to the one of T. We now turn our attention to the efficiency of Algorithm 3. We shall start with the special case of μ being a constant-bounded distribution. In this particular case, we obtain a lossless conversion. We then turn our attention to general product distributions, for which Algorithm 3 suffers a log(n) factor. This loss factor is inherent to reducing Sε to D¯ as Section 8 shows.

6.1 Conversion for constant-bounded distribution

We now prove a strong efficiency result for Algorithm 3 in the special case where μ is λ-bounded (see Definition 13). A proof of our goal (Theorem 20) then follows easily.

Lemma 28.

We have q̄(Algorithm 3 on T, μ) ≤ (2/λ)·sq̄(T,μ).

Before proving this, we need an alternative view of the randomness used in the for-loop of Algorithm 3 (line 8 to 16). At the start of the process, a random partial fixing 𝝆μx is generated. The algorithm is then deterministic: whenever some xj is queried in the for-loop, this is replaced by a query to 𝝆j. The algorithm updates pj with 𝝆j and exits the loop if 𝝆j=. This process is given in detail in Algorithm 4.

Algorithm 4 an alternative view of Algorithm 3 where the randomness is fixed at the start.

Note that as Rμx is a product distribution, one can actually implement Algorithm 4 without querying all of x at the start. Indeed, it is enough to query xj whenever one needs the value of 𝝆j, similarly to Algorithm 3. This implies that both processes are equivalent.

Suppose one runs Algorithm 4 on 𝒙 ∼ μ and 𝝆 ∼ R_μ^𝒙. Fix some state (v,p) the algorithm could be in at the start of the while loop (line 5). We let 𝒳_{v,p} be the distribution of 𝒙 conditioned on reaching state (v,p). Furthermore, for a fixed x ∈ {0,1}^n and (v,p) reachable with x, we let ℛ_{v,p,x} be the marginal distribution of 𝝆 conditioned on reaching state (v,p) and 𝒙 = x. We now develop explicit formulations for those distributions.

Explicit definition of 𝒳̂_{v,p}

Let 𝒳̂_{v,p} be the distribution over {0,1}^n defined as follows:

  1. For all j ∈ S_0^p, fix 𝒙_j = 0.

  2. For all j ∈ S_?^p, sample 𝒙_j ∼ 𝖡𝖾𝗋(δ_j/2).

  3. Determine {𝒙_j : j ∈ S_∗^p} by solving the linear system {⟨x, Q_u⟩_{S(p,∗)} = ⟨x, Q_u⟩_{[n]∖S(p,∗)} + b_u}_{u∈path(v)}.

Explicit definition of ℛ̂_{p,x}

Let ℛ̂_{p,x} be the product distribution over {0,∗}^n defined as follows:

  1. For all j ∈ S_?^p such that x_j = 0, let 𝝆_j = ∗ with probability δ_j/(2−δ_j) and 𝝆_j = 0 otherwise.

  2. For all j ∈ S_?^p such that x_j = 1, fix 𝝆_j = ∗.

  3. For all j ∉ S(p,?), fix 𝝆_j = p_j.

Claim 29.

For every reachable state (v,p) and x ∈ supp(𝒳_{v,p}) in Algorithm 4, we have

  1. ℛ_{v,p,x} ≡ ℛ̂_{p,x};

  2. 𝒳_{v,p} ≡ 𝒳̂_{v,p}.

We delay the proof of this technical lemma to Section A.5. We can now prove the efficiency of our algorithm for λ-bounded distributions.

Proof of Lemma 28.

To relate Algorithm 2 with Algorithm 4, it is helpful to insert the book-keeping of p from Algorithm 3 in between the corresponding lines of Algorithm 2. This does not change the number of queries or the guarantees of Algorithm 2, but now both processes share the same state space over pairs (v,p). For x ∈ {0,1}^n and ρ ∈ {0,∗}^n, define A(x,ρ) and B(x,ρ) as the number of queries each process makes:

A(x,ρ) ≜ number of times line 4 is executed in Algorithm 2 on input (x,ρ);
B(x,ρ) ≜ number of real queries made by Algorithm 4 on input (x,ρ).

Using (4), it is thus enough to prove that 𝔼_{𝒙,𝝆}[A(𝒙,𝝆)] ≥ Ω(λ)·𝔼_{𝒙,𝝆}[B(𝒙,𝝆)] when 𝒙 ∼ μ and 𝝆 ∼ R_μ^𝒙. We have:

𝔼𝒙,𝝆[A(𝒙,𝝆)]=(v,p)Pr𝒙,𝝆[(v,p) is reached]Pr𝒙𝒳v,p𝝆v,p,𝒙[rank(QS(𝝆,0)v)=rank(QS(𝝆,0)v)+1].

As both algorithms follow the same path in the state space, this expectation can be computed with respect to the code of Algorithm 4. Fix some state (v,p) and observe that if there exists some jDv,p such that 𝝆j=, then by Lemma 26,

rank(QS(𝝆,)v)=rank(QS(p,)+jv)=rank(QS(p,)v)+1=rank(QS(𝝆,0)v)+1.

Therefore, for 𝒙𝒳v,p and 𝝆v,p,𝒙, we have

Pr𝒙,𝝆[rank(QS(𝝆,0)v)=rank(QS(𝝆,0)v)+1] Pr𝒙,𝝆[jDv,p:𝝆j=]
=1Pr𝒙,𝝆[jDv,p:𝝆j=𝒙j=0].

The last equality is due to the fact that for all jDv,p, if 𝝆j=0 then 𝒙j=0. Let DDv,p. We can now substitute 𝒳^v,p for 𝒳v,p and ^p,𝒙 for v,p,𝒙 using Claim 29:

Pr𝒙,𝝆[jD:𝝆j=𝒙j=0]
= Pr𝒙,𝝆[jD:𝒙j=0]Pr𝒙,𝝆[jD:𝝆j=jD:𝒙j=0]
= jD(1δj/2)jD22δj2δj
= jD(1δj)
(1λ)|D|.

Thus, if 𝒙μ and 𝝆Rμ𝒙, we have

𝔼𝒙,𝝆[A(𝒙,𝝆)](v,p)Pr𝒙,𝝆[state (v,p) is reached](1(1λ)|Dv,p|).

We now bound the expected number of queries made by 𝒯. When Dv,p=, 𝒯 skips making a query at v. On the other hand, when Dv,p, the algorithm goes over jDv,p and stops making queries as soon as it hits some ρj=. This probability is independent for each jDv,p and can be computed explicitly using Claim 29. For 𝒙𝒳v,p and 𝝆v,p,x:

Pr𝒙,𝝆[𝝆j=]=Pr𝒙[𝒙j=0]Pr𝒙,𝝆[ρj=𝒙j=0]+Pr𝒙[𝒙j=1]Pr𝒙[𝝆j=𝒙j=1]=δjλ.

Therefore, if 𝒙μ and 𝝆Rμ𝒙,

𝔼𝒙,𝝆[B(𝒙,𝝆)]
(v,p)Pr𝒙,𝝆[state (v,p) is reached](𝟙[Dv,p]+j=0|Dv,p|1(1λ)j)
(v,p)Pr𝒙,𝝆[state (v,p) is reached](𝟙[Dv,p]+(1(1λ)|Dv,p|)/λ)
(v,p)Pr𝒙,𝝆[state (v,p) is reached](2/λ)(1(1λ)|Dv,p|).

With this in hand, we can now prove Theorem 20, which we restate below for convenience.

Theorem 20 (restated). For any f : {0,1}^n → {0,1} and λ-bounded product distribution μ, we have D̄_ε(f,μ) ≤ O(1/λ)·S_ε(f,μ) for all ε ≥ 0.

Proof.

Let 𝒯 be a randomised parity tree such that sq̄(𝒯,μ) = S_ε(f,μ) and err_f(𝒯,μ) ≤ ε. Define 𝒯′ to be the randomised algorithm obtained by sampling 𝑻 ∼ 𝒯 and returning Algorithm 3 applied to 𝑻. Using Lemma 27, we immediately obtain that err_f(𝒯′,μ) ≤ ε. On the other hand:

q̄(𝒯′,μ) = 𝔼_𝑻[q̄(Algorithm 3 on 𝑻, μ)] ≤ (2/λ)·𝔼_𝑻[sq̄(𝑻,μ)] = (2/λ)·S_ε(f,μ).

Thus, D̄_ε(f,μ) ≤ O(1/λ)·S_ε(f,μ), as desired.

6.2 Conversion for general product distribution

Algorithm 3 is not efficient for arbitrary product distributions since queries can be very biased, so that ∏_{j∈D_{v,p}} (1−δ_j) = 1 − o(1). In such cases, we cannot even afford to pay one query as the corresponding expected increment of sq̄ is o(1).

To overcome this obstacle, we introduce the following idea. Run the algorithm as if every query x_j returned 0, i.e. assuming x_j = 𝝆_j = 0 for all j ∈ S(p,?) (this is likely to happen for very biased distributions). This generates a list of indices for which we assume x_j = 0. Upon reaching a leaf, we check efficiently whether one of those x_j is actually 1. If no such j exists, we’re done – at the cost of no real queries! On the other hand, if a 1 is found, we backtrack to this state and restart the procedure. Since we’ve found x_j = 1, it must be that ρ_j = ∗ and the S_ε algorithm has to pay one query there.

The process 𝖡𝗎𝗂𝗅𝖽𝖫𝗂𝗌𝗍 that “runs assuming xj=0” and produces a list of indices to check is described in Algorithm 6. Then, the updated algorithm for converting an Sε algorithm to a D¯ε one is formulated in Algorithm 5.

Algorithm 5 converts an algorithm for Sε to D¯ε for general product distributions.
Algorithm 6 the subroutine 𝖡𝗎𝗂𝗅𝖽𝖫𝗂𝗌𝗍.
How to run line 4?

This problem can be formulated as follows. Let FFO_n : {0,1}^n → [n] ∪ {⊥} be the search problem that asks for the index of the first (running from left to right) ‘1’ in x, or ⊥ if x = 0^n. Even though a simple adversary argument shows that one cannot perfectly compute FFO_n by making fewer than n parity queries, a folklore result [24, 44, 30] proves that there is a randomised protocol making O(log n) queries that computes FFO_n with some small error.
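For intuition, here is a simple randomised protocol for FFO_n in Python. It is only a sketch: it performs a binary search in which each “is this block all-zero?” test uses a few random subset-parity queries, giving O(log n · log(log n / α)) queries rather than the sharper O(log n + log(1/α)) bound of the folklore result cited above; the parameter choices are illustrative.

    import random
    from math import ceil, log2

    def parity(x, coords):
        return sum(x[i] for i in coords) % 2

    def block_nonzero(x, lo, hi, t, counter):
        """Test whether x[lo:hi] contains a 1 with t random subset-parity queries.
        One-sided error: an all-zero block is never misclassified; a nonzero block
        is missed with probability at most 2^-t."""
        for _ in range(t):
            coords = [i for i in range(lo, hi) if random.random() < 0.5]
            counter[0] += 1
            if parity(x, coords) == 1:
                return True
        return False

    def first_one(x, alpha=0.01):
        """Return (index of the first 1, #parity queries used), or (None, ...) if x = 0^n,
        erring with probability at most alpha (n >= 2 assumed)."""
        n = len(x)
        t = ceil(log2((log2(n) + 2) / alpha))
        counter = [0]
        if not block_nonzero(x, 0, n, t, counter):
            return None, counter[0]
        lo, hi = 0, n
        while hi - lo > 1:                       # invariant: x[lo:hi] contains a 1
            mid = (lo + hi) // 2
            if block_nonzero(x, lo, mid, t, counter):
                hi = mid
            else:
                lo = mid
        return lo, counter[0]

    x = [0] * 50 + [1] + [0] * 13
    print(first_one(x))                          # (50, number of parity queries used)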

Lemma 30.

For any α > 0, R_α(FFO_n) ≤ O(log n + log(1/α)).

Proof.

This folklore fact is discussed for the parity context in Section A.4.

We let 𝒯_γ be the parity tree obtained by running Algorithm 5 with error parameter α ≜ γ/n on line 4. Given two indices i, j ∈ J, we say i ≺_J j if i appears strictly earlier than j in J, and i ⪯_J j if i ≺_J j or i = j. Fix any x ∈ supp(𝒳_{v,p}). Let i^∗ denote the first index i in J such that x_i = 1 and suppose that i^∗ is added to J when u = u^∗. Observe that if such an i^∗ exists, then x_j = 0 for all j ≺_J i^∗. As a consequence, we know that u^∗ must be reached. Moreover, we can immediately get the values of 𝝆_j by flipping biased coins for all j ≺_J i^∗. Therefore, given i^∗, one can perfectly simulate Algorithm 3 by going over J and updating p, until finding the first index j ⪯_J i^∗ such that 𝝆_j = ∗. We are now ready to prove the correctness and efficiency of 𝒯_γ.

Lemma 31.

For any fixed x ∈ {0,1}^n, Pr[𝒯_γ(x) = 1] = 𝟙[T(x) = 1] ± γ.

Proof.

The randomness of 𝒯_γ stems from 𝜼 and the randomness involved in running the FFO algorithm at line 4. To analyse the latter, observe that line 4 is called at most n times and each call fails with probability at most α = γ/n, hence:

d_TV(𝒯_0(x), 𝒯_γ(x)) ≤ Pr[at least one oracle call at line 4 gives a wrong index] ≤ n·γ/n = γ.

If no call fails, the discussion above implies that Algorithm 5 behaves identically to the earlier Algorithm 3. Hence, the correctness of Algorithm 3 (Lemma 27) implies Pr[𝒯_0(x) = 1] = 𝟙[T(x) = 1].

Lemma 32.

We have q̄(𝒯_γ, μ) ≤ O(log(n/γ))·(sq̄(T,μ) + 1) + γn.

Proof.

We first prove that the expected number of iterations of the outer while-loop is low assuming that the algorithm always gets the correct index i at line 4. Similar to what we did in Section 6.1, we view the randomness used in the for-loop (line 6 to 15) in Algorithm 5 as a pre-generated partial assignment 𝝆μx. Note that the bits of 𝝆 are independent. If i is the first index in J with xi=1, we know that xi=1 and xj=0 for all jJi. At the same time, 𝝆j for all jJi are revealed to the algorithm one by one. As soon as some 𝝆j= is found, the algorithm quits the loop.

For each x{0,1}n and ρsupp(μx), consider running 𝒯γ on input x using randomness ρ. Define K(x,ρ) as the number of iterations of the outer while loop when 𝒯γ always gets the correct i on line 4. Let p denote the final state of p. Since in each iteration except for the last one, we update some pj as , we have K(x,ρ)|S(p,)|+1. By Lemma 26, we further have

K(x,ρ)rank(QS(p,)(x))+1=rank(QS(p,0)(x))+1,

where (x)(T) is the unique leaf at which T terminates given x. Since for all pj?, pj=ρj, we have SpSρS0p, hence K(x,ρ)rank(QS(ρ,)(x))+1. On the other hand, by definition we have

sq¯(T,μ)=𝔼𝒙μ𝝆μ𝒙[rank(QS(𝝆,)(𝒙))]𝔼𝒙μ𝝆μ𝒙[K(𝒙,𝝆)]sq¯(T,μ)+1.

Lemma 30 asserts that line line 4 can be implemented to error γ/n using O(log(n/γ)) parity queries. Since all those calls are completed successfully with probability γ, we finally have:

q¯(𝒯γ,μ)(1γ)𝔼𝒙μ𝝆μx[K(𝒙,𝝆)]O(log(n/γ))+γnO(log(n/γ))(sq¯(T,μ)+1)+γn.

Theorem 19 (restated). For any f : {0,1}^n → {0,1}, product distribution μ, and γ ∈ (0, 1/n), we have D̄_{ε+γ}(f,μ) ≤ O(log(n/γ))·(S_ε(f,μ) + 1) for all ε ≥ 0.

Proof.

Let 𝒯 be a randomised parity decision tree such that sq̄(𝒯,μ) = S_ε(f,μ) and err_f(𝒯,μ) ≤ ε. Define 𝒯′ to be the randomised algorithm obtained by sampling 𝑻 ∼ 𝒯 and returning the corresponding 𝒯_γ. Using Lemma 31, we immediately obtain that err_f(𝒯′,μ) ≤ ε + γ. By Lemma 32 and the range of parameters allowed for γ, we get

q̄(𝒯′,μ) = 𝔼_𝑻[q̄(𝒯_γ,μ)] ≤ O(log(n/γ))·𝔼_𝑻[sq̄(𝑻,μ) + 1] = O(log(n/γ))·(sq̄(𝒯,μ) + 1).

7 Separations I: disc vs. D×

In this section we prove Lemma 3, restated here for convenience.

Lemma 3 (restated). The complexity measures disc and D^× are incomparable: there is an n-bit function f such that disc(f) = O(log n) while D^×(f) = Θ(n), and there is an n-bit function f such that disc(f) = Θ(n) while D^×(f) = O(1).

Proof.

For the first item, we can consider the n-bit majority function f ≜ MAJ_n. It follows from [14, Theorem 1.2] that D^×(MAJ_n) ≥ Ω(n), where the hard distribution is uniform. By contrast, it is not hard to see that disc(MAJ_n) ≤ O(log n) (if we query x_i for a random i ∈ [n], it will have bias Ω(1/n) toward predicting MAJ_n(x)). We prove the second item by a probabilistic argument. Consider a random function 𝒇, which is set with 𝒇(x) ∼ 𝖡𝖾𝗋(2^{−0.9n}) independently for each x ∈ {0,1}^n. In Claim 33, we show that disc(𝒇) = Θ(n) and in Claim 34 that D^×(𝒇) = O(1), both with high probability.

Claim 33.

With probability 1 − 2^{−2^{Ω(n)}}, disc(𝒇) ≥ 0.01n.

Proof.

For each non-constant function f : {0,1}^n → {0,1}, we define the “hard” distribution μ_f as

μ_f(x) ≜ 1/(2|f^{−1}(0)|) if f(x) = 0, and μ_f(x) ≜ 1/(2|f^{−1}(1)|) if f(x) = 1.

To prove the claim, it suffices to show Pr𝒇[disc(𝒇,μ𝒇)0.01n]122Ω(n). Using Lemma 9, this can be further simplified to prove:

Pr𝒇[maxS𝒪nbias(𝒇,μ𝒇,S)20.01n1]122Ω(n).

To that end, fix any S𝒪n, note that |S|=|{0,1}S|=2n1 and observe that by a Chernoff bound,

Pr𝒇[|μ(𝒇1(1)S)1/4|20.02n]] Pr𝒇[|𝒇1(1)|<20.1n1]
+Pr𝒇[||𝒇1(1)S|20.1n1|>20.07n]
+Pr𝒇[||𝒇1(1)S|20.1n1|>20.07n]
3e20.03n.

Using a similar argument, we can also show Pr𝒇[|μ(𝒇1(0)S)1/4|20.02n]3e20.03n. By definition, bias(𝒇,μ𝒇,S)=|μ(𝒇1(0)S)|μ(𝒇1(1)S)|, we thus have Pr[bias(𝒇,μ𝒇,S)20.01n1]6e20.03n. Finally, observe that |𝒪n|2n and so using a union bound,

Pr𝒇[disc(𝒇)0.01n] Pr𝒇[maxS𝒪nbias(𝒇,μ𝒇,S)20.01n1]
12nPr[bias(𝒇,μ𝒇,S)20.01n1]
122Ω(n).

Claim 34.

With probability 1 − 2^{−Ω(n)}, D^×(𝒇) ≤ 20000.

Proof.

Let 𝒟×{𝖡𝖾𝗋(p1,,pn)p1,pn[0,1/2]} denote the set of 0-biased product distributions, where 𝖡𝖾𝗋(p1,,pn)𝖡𝖾𝗋(p1)××𝖡𝖾𝗋(pn). As observed in Section 4, it suffices to show Pr𝒇[maxμ𝒟×D1/3(𝒇,μ)20000]12Ω(n).

As a first attempt, one might want to prove that D1/3(f,μ)=O(1) with sufficiently high probability for any fixed μ and then apply union bound over all μ𝒟×. However, this cannot be done directly since 𝒟× is infinite. Luckily, we can circumvent this barrier by discretizing 𝒟×. Let us define 𝒟×{𝖡𝖾𝗋(a1/10n,,an/10n)a1,,an{0,,5n}}. For every μ=𝖡𝖾𝗋(p1,,pn)𝒟× and f:{0,1}{0,1}, consider the following two cases:

  • If ipi10, then Mmaxx{0,1}nμ(x)eipi1000ipi. Observe that

    Pr𝒇[x{0,1}f(x)μ(x)1/5]2M(20.9n)M/52150ipin,

    thus Pr𝒇[D1/4(𝒇,μ)=0]12150ipin.

  • Otherwise, we devise the following protocol: Sort μ(x1)μ(x2n). Pick the top 1000 inputs X={x1,,x1000}, then we check if our input x is in X. If yes, we output f(x), otherwise we output 0. Formally, we define the function g:{0,1}n{0,1} where

    g(x){f(x)if xX0if xX.

    Since testing whether x=xi can be done with m queries with success probability 12m, by choosing m=20 and running the testing for every i[1000], one can show 0.01(g)20000. It remains to prove that Pr𝒇[𝒇(x)=𝒈(x)]4/5 with high probability. Observe that for each xX, μ(x)1/1000. Therefore:

    Pr𝒇[xX[μ(x)𝒇(x)]1/5]121000(20.9n)20012150n.

    For those 𝒇, we have Pr𝒇[𝒇(x)=𝒈(x)]4/5, which implies that D0.22(𝒇,μ)20000.

By a union bound over μ∈𝒟×′, we can deduce that

Pr𝒇[maxμ𝒟×D0.22(𝒇,μ)>20000]
μ𝒟×Pr𝒇[D0.22(𝒇,μ)>20000]
a1=05nan=05n𝟙[iai100n]e150iai
+a1=05nan=05n𝟙[iai<100n]2150n
a1=05nan=05ne100(a1+5)e100(an+5)+2101n2150n
(a1=05ne100(a1+5))n+249n
2Ω(n).

Consider now any product distribution μ=Ber(p_1,…,p_n)∈𝒟× and define its rounded version

μ′ ≔ Ber(⌈10n·p_1⌉/10n, …, ⌈10n·p_n⌉/10n) ∈ 𝒟×′.

Observe that d_TV(μ,μ′) ≤ 1−(1−1/10n)^n ≤ 0.1, thus we have err_f(T,μ) ≤ err_f(T,μ′)+0.1 for any parity tree T and f:{0,1}^n→{0,1}. Together with the string of inequalities developed above, we conclude that with probability at least 1−2^{−Ω(n)},

max_{μ∈𝒟×} D_{1/3}(𝒇,μ) ≤ max_{μ′∈𝒟×′} D_{1/3−0.1}(𝒇,μ′) ≤ max_{μ′∈𝒟×′} D_{0.22}(𝒇,μ′) ≤ 20000.
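The discretisation step can be illustrated with a short numerical sketch (Python; all names are ours): round each p_i up to the grid of 𝒟×′ and bound the total-variation distance between the two product distributions by the standard coupling bound 1−∏_i(1−|p_i−p′_i|), which here is at most 1−(1−1/10n)^n ≤ 0.1.

```python
import numpy as np

def round_up(p, n):
    # Round each p_i in [0, 1/2] up to the grid {0, 1/(10n), ..., 5n/(10n)} of D_x'.
    return np.ceil(10 * n * np.asarray(p)) / (10 * n)

def tv_product_bound(p, q):
    # Coupling bound on d_TV between two product Bernoulli distributions:
    # d_TV <= 1 - prod_i (1 - |p_i - q_i|).
    return 1 - np.prod(1 - np.abs(np.asarray(p) - np.asarray(q)))

n = 50
p = np.random.default_rng(2).uniform(0, 0.5, n)
print(tv_product_bound(p, round_up(p, n)))     # at most 1 - (1 - 1/(10n))^n <= 0.1
```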

8 Separations II: S vs. D×

The goal of this section is to provide the following example of a function.

Theorem 35.

There exists a function f:{0,1}n{0,1} and a product distribution μ such that D×(f)=Θ(disc(f,μ))=Θ(logn) and S0(f,μ)=Θ(1).

Recall that by Theorem 19, this is the largest possible gap between S and D×. To prove the separation, we use the function FPE:{0,1}^{2n}→{0,1} which takes two inputs x,y∈{0,1}^n and returns the value y_i associated with the location i of the first “1” in x. More precisely, we let FO(x)∈[n] be the location (from left to right) of the first “1” in x, with the convention FO(x)=1 if x=0^n, and let FPE(x,y)=y_{FO(x)}. We choose as hard distribution the product distribution μ ≔ 𝒳×𝒴 where for each i∈[n]:

𝒳_i ∼ Ber(1/√n)   and   𝒴_i ∼ Ber(1/2).

Let us note that the choice of 1/√n in the distribution 𝒳 is arbitrary: any p=n^a for a constant a∈(−1,0) is enough to guarantee that x≠0^n with high probability and get the Ω(log n) lower bound.

Proof of Theorem 35.

We first prove that S_0(FPE,μ)=Θ(1). Consider the following simple brute-force query algorithm T that computes FPE: query the bits of x one by one from left to right until finding the first index i such that x_i=1; if such an i exists, query y_i and return it. Otherwise (x=0^n), query y_1 and return it (recall that FO(0^n)=1).

Observe that err_FPE(T,μ)=0. Thus we only need to show sq̄(T,μ)=O(1). Let X_i ≔ {x : x_i=1 and x_j=0 for all j<i} denote the set of x∈{0,1}^n for which FO(x)=i. Note that X_1∪⋯∪X_n∪{0^n}={0,1}^n forms a partition of {0,1}^n. By the definition of μ, we have μ(X_i)=(1−1/√n)^{i−1}·(1/√n). For all x∈X_i, T queries the same set of variables {x_1,…,x_{i−1},x_i,y_i}. Moreover, sampling 𝝆∼μ_x, for each 1≤j<i we have Pr[𝝆_j=∗]=1/(√n−1) since x_j=0. Therefore,

h(x) ≔ 𝔼_{𝝆∼μ_x}[q(T_𝝆,x)] ≤ (i−1)/(√n−1) + 2.

We conclude that

sq̄(T,μ) = 𝔼_{𝒙∼μ}[h(𝒙)]
 ≤ ∑_{i=1}^{n} μ(X_i)·𝔼_{𝒙∼μ|X_i}[h(𝒙)] + (n+1)·μ(0^n)
 ≤ (1/(√n(√n−1)))·∑_{i=1}^{n} (i−1)(1−1/√n)^{i−1} + (n+1)(1−1/√n)^n + 2
 < (2/n)·∑_{i≥0} i·(1−1/√n)^i + 3
 = Θ(1).
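As a sanity check of the computation above (under the reading 𝒳_i∼Ber(1/√n)), the following short Python snippet evaluates the derived upper bound on sq̄(T,μ) for increasing n and shows that it stays below a small constant; all names in the snippet are ours.

```python
import numpy as np

def sq_bound(n):
    # The bound derived above, with p = 1/sqrt(n):
    #   sum_i mu(X_i) * ((i-1)/(sqrt(n)-1) + 2)  +  (n+1) * mu(0^n).
    p = 1 / np.sqrt(n)
    i = np.arange(1, n + 1)
    mu_Xi = (1 - p) ** (i - 1) * p
    return float(np.sum(mu_Xi * ((i - 1) / (np.sqrt(n) - 1) + 2)) + (n + 1) * (1 - p) ** n)

for n in [16, 256, 4096, 65536]:
    print(n, round(sq_bound(n), 3))            # stays below a small constant as n grows
```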

Let us now turn our attention to disc(FPE,μ). The lower bound disc(FPE,μ) ≥ Ω(log n) is covered in Claim 36. The upper bound disc(FPE,μ) ≤ O(log n) is a direct consequence of bias(FPE,μ,S) ≥ Ω(n^{−1/2}) for S={(x,y)∈{0,1}^{2n} : y_1=1}. More interestingly, one can actually show the stronger statement D_{1/3}(FPE,μ) ≤ O(log n). Indeed, 𝒙∼𝒳 has exactly one “1” among its first √n bits with probability at least e^{−1.01} ≥ 1/3 for n large enough. In that case, a simple binary search amongst the first √n bits of x using parity queries is enough to find that location and return the corresponding bit of y (on the remaining inputs the returned bit of y is still correct with probability 1/2, so the overall error is below 1/3).
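The binary-search strategy can be sketched as follows (Python; the function names and query interface are ours, and the snippet assumes the promise that the first ⌊√n⌋ bits of x contain exactly one 1): the parity of any prefix of those bits reveals whether the unique 1 lies inside it, so O(log n) parity queries locate it, and one further query reads the relevant bit of y.

```python
import math
import numpy as np

def locate_unique_one(prefix, parity_query):
    # Binary search under the promise that prefix contains exactly one 1:
    # the parity of prefix[lo:mid] is 1 iff that unique 1 lies in [lo, mid).
    lo, hi = 0, len(prefix)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parity_query(lo, mid):
            hi = mid
        else:
            lo = mid
    return lo

def fpe_strategy(x, y):
    # O(log n) parity queries: assume the first ~sqrt(n) bits of x contain exactly one 1
    # (which holds with probability about 1/e under the distribution above) and locate it;
    # if the promise fails, the returned bit of y is still correct with probability 1/2.
    m = math.isqrt(len(x))
    prefix = np.asarray(x[:m])
    parity_query = lambda lo, hi: int(prefix[lo:hi].sum() % 2)
    return y[locate_unique_one(prefix, parity_query)]
```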

Claim 36.

disc(FPE,μ) ≥ Ω(log n)

Proof.

Using the characterisation of the bias via codimension-1 subspaces (Lemma 9), it is enough to show:

max_{S∈𝒪_{2n}} bias(FPE,μ,S) ≤ n^{−1/3}.

Fix an affine space S of codimension 1 that maximises the above expression, i.e. fix α,β∈{0,1}^n and γ∈{0,1} such that S={(x,y)∈{0,1}^{2n} : ⟨α,x⟩+⟨β,y⟩=γ}. To simplify notation, we assume in what follows that γ=0; the proof is similar for the case γ=1. Let us partition S into two sets:

S_0 ≔ {(x,y)∈{0,1}^{2n} : ⟨α,x⟩=0 and ⟨β,y⟩=0};
S_1 ≔ {(x,y)∈{0,1}^{2n} : ⟨α,x⟩=1 and ⟨β,y⟩=1}.

We have:

max_{S′∈𝒪_{2n}} bias(FPE,μ,S′) = bias(FPE,μ,S) ≤ bias(FPE,μ,S_0) + bias(FPE,μ,S_1).

Let us suppose without loss of generality that bias(FPE,μ,S_0) ≥ bias(FPE,μ,S_1), so that it is enough to show bias(FPE,μ,S_0) ≤ 2n^{−1/2}. Note that if Pr_{𝒙,𝒚∼μ}[(𝒙,𝒚)∈S_0]=0, we are done. If not, we can re-express the bias in the language of probability:

bias(FPE,μ,S_0) = |∑_{(x,y)∈S_0} (−1)^{FPE(x,y)}·μ(x,y)|
 = |∑_{b∈{0,1}} (−1)^b·Pr_{𝒙,𝒚∼μ}[FPE(𝒙,𝒚)=b and (𝒙,𝒚)∈S_0]|
 = Pr_{𝒙,𝒚}[(𝒙,𝒚)∈S_0] · |∑_{b∈{0,1}} (−1)^b·Pr_{𝒙,𝒚}[FPE(𝒙,𝒚)=b | (𝒙,𝒚)∈S_0]|.

Let us denote the quantity within the absolute value by Φ and the event (𝒙,𝒚)∈S_0 by E. Observe that S_0 can be conveniently decomposed as S_0=S_X×S_Y where S_X ≔ {x∈{0,1}^n : ⟨α,x⟩=0} and S_Y ≔ {y∈{0,1}^n : ⟨β,y⟩=0}. With this, we have:

Φ = ∑_{b∈{0,1}} (−1)^b·Pr_{𝒙,𝒚}[FPE(𝒙,𝒚)=b | E]
 = ∑_{i∈[n]} ∑_{b∈{0,1}} (−1)^b·Pr_{𝒙,𝒚}[FO(𝒙)=i | E]·Pr_{𝒙,𝒚}[FPE(𝒙,𝒚)=b | E and FO(𝒙)=i]
 = ∑_{i∈[n]} Pr_{𝒙∼𝒳}[FO(𝒙)=i | 𝒙∈S_X] · ∑_{b∈{0,1}} (−1)^b·Pr_{𝒚∼𝒴}[𝒚_i=b | 𝒚∈S_Y],

where we write p_i^b ≔ Pr_{𝒚∼𝒴}[𝒚_i=b | 𝒚∈S_Y].

Recall that S_Y is an affine space of codimension at most 1 and 𝒴 is the uniform distribution over {0,1}^n. Thus, if |β| (the number of non-zero entries of β) is zero or at least two, it must be that p_i^b=1/2 for all i∈[n] and b∈{0,1}. In that case, the claim is proven because Φ=0 and so bias(FPE,μ,S_0)=0. We can thus assume that |β|=1 and fix i*∈[n] to be the unique coordinate such that β_{i*}=1. Now, observe that p_i^b=1/2 for all i≠i* and b∈{0,1}, while p_{i*}^0=1 and p_{i*}^1=0, so that:

Φ = ∑_{i∈[n]} Pr_{𝒙∼𝒳}[FO(𝒙)=i | 𝒙∈S_X]·(p_i^0−p_i^1) = Pr_{𝒙∼𝒳}[FO(𝒙)=i* | 𝒙∈S_X].

Finally, we use the fact that the event FO(𝒙)=i* with 𝒙∼𝒳 is unlikely to happen, even conditioned on 𝒙∈S_X, whenever S_X has large mass under 𝒳. In any case, the probability is maximised for i*=1 and hence:

bias(FPE,μ,S_0) = Pr_{𝒙∼𝒳}[𝒙∈S_X]·Pr_{𝒚∼𝒴}[𝒚∈S_Y]·|Φ|
 ≤ Pr_𝒙[FO(𝒙)=i* and 𝒙∈S_X]
 ≤ Pr_𝒙[FO(𝒙)=1].

The event FO(𝒙)=1 can happen only if 𝒙_1=1 or 𝒙=0^n, thus we bound the bias with

Pr_{𝒙∼𝒳}[FO(𝒙)=1] ≤ Pr_𝒙[𝒙_1=1] + Pr_𝒙[𝒙=0^n] ≤ n^{−1/2} + e^{−√n} ≤ 2n^{−1/2}.

References

  • [1] Yaroslav Alekseev, Yuval Filmus, and Alexander Smal. Lifting Dichotomies. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1–9:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.9.
  • [2] Yaroslav Alekseev and Dmitry Itsykson. Lifting to bounded-depth and regular resolutions over parities via games. Technical Report TR24-128, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/128/.
  • [3] Laszlo Babai, Peter Frankl, and Janos Simon. Complexity classes in communication complexity theory. In 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pages 337–347, 1986. doi:10.1109/SFCS.1986.15.
  • [4] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. SIAM Journal on Computing, 42(3):1327–1363, 2013. doi:10.1137/100811969.
  • [5] Paul Beame and Sajin Koroth. On Disperser/Lifting Properties of the Index and Inner-Product Functions. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ITCS.2023.14.
  • [6] Shalev Ben-David and Eric Blais. A tight composition theorem for the randomized query complexity of partial functions: Extended abstract. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 240–246, 2020. doi:10.1109/FOCS46700.2020.00031.
  • [7] Shalev Ben-David and Eric Blais. A new minimax theorem for randomized algorithms. J. ACM, 70(6), 2023. doi:10.1145/3626514.
  • [8] Shalev Ben-David, Eric Blais, Mika Göös, and Gilbert Maystre. Randomised Composition and Small-Bias Minimax . In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 624–635. IEEE Computer Society, 2022. doi:10.1109/FOCS54457.2022.00065.
  • [9] Shalev Ben-David, Mika Göös, Robin Kothari, and Thomas Watson. When Is Amplification Necessary for Composition in Randomized Query Complexity? In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020), volume 176 of Leibniz International Proceedings in Informatics (LIPIcs), pages 28:1–28:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.APPROX/RANDOM.2020.28.
  • [10] Shalev Ben-David and Robin Kothari. Randomized query complexity of sabotaged and composed functions. Theory of Computing, 14(5):1–27, 2018. doi:10.4086/toc.2018.v014a005.
  • [11] Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, and Pavel Dvořák. Exponential Separation Between Powers of Regular and General Resolution over Parities. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:32. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.23.
  • [12] Eric Blais and Joshua Brody. Optimal separation and strong direct sum for randomized query complexity. In Proceedings of the 34th Computational Complexity Conference, CCC ’19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.CCC.2019.29.
  • [13] Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan. A Strong Direct Sum Theorem for Distributional Query Complexity. In 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1–16:30. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CCC.2024.16.
  • [14] Mark Braverman, Ankit Garg, Denis Pankratov, and Omri Weinstein. Information lower bounds via self-reducibility. Theory of Computing Systems, 59(2):377–396, 2015. doi:10.1007/s00224-015-9655-z.
  • [15] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Transactions on Information Theory, 60(10):6058–6069, 2014. doi:10.1109/TIT.2014.2347282.
  • [16] Joshua Brody, Jae Tak Kim, Peem Lerdputtipongporn, and Hariharan Srinivasulu. A strong XOR lemma for randomized query complexity. Theory of Computing, 19(11):1–14, 2023. doi:10.4086/toc.2023.v019a011.
  • [17] Farzan Byramji and Russell Impagliazzo. Lifting to randomized parity decision trees. Technical Report TR24-202, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/202/.
  • [18] Arkadev Chattopadhyay and Pavel Dvorak. Super-critical trade-offs in resolution over parities via lifting. Technical Report TR24-132, ECCC, 2024. URL: https://eccc.weizmann.ac.il/report/2024/132/.
  • [19] Arkadev Chattopadhyay, Nikhil Mande, Swagato Sanyal, and Suhail Sherif. Lifting to Parity Decision Trees via Stifling. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 33:1–33:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ITCS.2023.33.
  • [20] Tsun Ming Cheung, Hamed Hatami, Rosie Zhao, and Itai Zilberstein. Boolean functions with small approximate spectral norm. Discrete Analysis, 2024. doi:10.19086/da.122971.
  • [21] Andrew Drucker. Improved direct product theorems for randomized query complexity. Comput. Complex., 21(2):197–244, 2012. doi:10.1007/s00037-012-0043-7.
  • [22] Klim Efremenko, Michal Garlík, and Dmitry Itsykson. Lower bounds for regular resolution over parities. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, volume 41 of STOC ’24, pages 640–651. ACM, 2024. doi:10.1145/3618260.3649652.
  • [23] Tomás Feder, Eyal Kushilevitz, Moni Naor, and Noam Nisan. Amortized communication complexity. SIAM Journal on Computing, 24(4):736–750, 1995. doi:10.1137/S0097539792235864.
  • [24] U. Feige, D. Peleg, P. Raghavan, and E. Upfal. Computing with unreliable information. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC ’90, pages 128–137. Association for Computing Machinery, 1990. doi:10.1145/100216.100230.
  • [25] Yuval Filmus, Edward Hirsch, Artur Riazanov, Alexander Smal, and Marc Vinyals. Proving Unsatisfiability with Hitting Formulas. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), volume 287 of Leibniz International Proceedings in Informatics (LIPIcs), pages 48:1–48:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ITCS.2024.48.
  • [26] Anat Ganor, Gillat Kol, and Ran Raz. Exponential separation of information and communication for boolean functions. J. ACM, 63(5), 2016. doi:10.1145/2907939.
  • [27] Uma Girish, Makrand Sinha, Avishay Tal, and Kewen Wu. Fourier growth of communication protocols for XOR functions. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 721–732, 2023. doi:10.1109/FOCS57990.2023.00047.
  • [28] Uma Girish, Avishay Tal, and Kewen Wu. Fourier growth of parity decision trees. In Proceedings of the 36th Computational Complexity Conference, CCC ’21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CCC.2021.39.
  • [29] Mika Göös and Gilbert Maystre. A majority lemma for randomised query complexity. In Proceedings of the 36th Computational Complexity Conference, CCC ’21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CCC.2021.18.
  • [30] Nathaniel Harms and Artur Riazanov. Better Boosting of Communication Oracles, or Not. In Siddharth Barman and Sławomir Lasota, editors, 44th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2024), volume 323 of Leibniz International Proceedings in Informatics (LIPIcs). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.FSTTCS.2024.25.
  • [31] Hamed Hatami, Kaave Hosseini, and Shachar Lovett. Structure of protocols for XOR functions. SIAM Journal on Computing, 47(1):208–217, 2018. doi:10.1137/17M1136869.
  • [32] Hamed Hatami, Kaave Hosseini, Shachar Lovett, and Anthony Ostuni. Refuting Approaches to the Log-Rank Conjecture for XOR Functions. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), volume 297 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:11. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.ICALP.2024.82.
  • [33] Dmitry Itsykson and Dmitry Sokolov. Resolution over linear equations modulo two. Annals of Pure and Applied Logic, 171(1):102722, 2020. doi:10.1016/j.apal.2019.102722.
  • [34] Siddharth Iyer and Anup Rao. An XOR lemma for deterministic communication complexity, 2024. doi:10.48550/arXiv.2407.01802.
  • [35] Siddharth Iyer and Anup Rao. XOR lemmas for communication via marginal information. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 652–658. Association for Computing Machinery, 2024. doi:10.1145/3618260.3649726.
  • [36] Rahul Jain, Hartmut Klauck, and Miklos Santha. Optimal direct sum results for deterministic and randomized decision tree complexity. Information Processing Letters, 110(20):893–897, 2010. doi:10.1016/j.ipl.2010.07.020.
  • [37] Rahul Jain, Jaikumar Radhakrishnan, and Pranab Sen. A direct sum theorem in communication complexity via message compression. In Jos C. M. Baeten, Jan Karel Lenstra, Joachim Parrow, and Gerhard J. Woeginger, editors, Automata, Languages and Programming, pages 300–315. Springer Berlin Heidelberg, 2003. doi:10.1007/3-540-45061-0_26.
  • [38] Hartmut Klauck, Robert Špalek, and Ronald de Wolf. Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM Journal on Computing, 36(5):1472–1493, 2007. doi:10.1137/05063235X.
  • [39] A. Knop, S. Lovett, S. McGuire, and W. Yuan. Guest column: Models of computation between decision trees and communication. SIGACT News, 52(2):46–70, 2021. doi:10.1145/3471469.3471479.
  • [40] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993. doi:10.1137/0222080.
  • [41] Troy Lee, Adi Shraibman, and Robert Špalek. A direct product theorem for discrepancy. In 2008 23rd Annual IEEE Conference on Computational Complexity, pages 71–80, 2008. doi:10.1109/CCC.2008.25.
  • [42] Nati Linial and Adi Shraibman. Learning complexity vs. communication complexity. In 2008 23rd Annual IEEE Conference on Computational Complexity, pages 53–63, 2008. doi:10.1109/CCC.2008.28.
  • [43] Nikhil Mande and Swagato Sanyal. On parity decision trees for fourier-sparse boolean functions. ACM Trans. Comput. Theory, 16(2), 2024. doi:10.1145/3647629.
  • [44] Noam Nisan. The communication complexity of threshold gates. Proc. of Combinatorics, Paul Erdős is Eighty, 1993.
  • [45] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. doi:10.1017/CBO9781139814782.
  • [46] Ryan O'Donnell, John Wright, Yu Zhao, Xiaorui Sun, and Li-Yang Tan. A composition theorem for parity kill number. In 2014 IEEE Conference on Computational Complexity (CCC), pages 144–154. IEEE Computer Society, 2014. doi:10.1109/CCC.2014.22.
  • [47] Anup Rao and Makrand Sinha. Simplified separation of information and communication. Theory of Computing, 14(20):1–29, 2018. doi:10.4086/toc.2018.v014a020.
  • [48] Swagato Sanyal. Fourier sparsity and dimension. Theory of Computing, 15(11):1–13, 2019. doi:10.4086/toc.2019.v015a011.
  • [49] Swagato Sanyal. Randomized query composition and product distributions. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS), volume 289 of LIPIcs, pages 56:1–56:19. Schloss Dagstuhl, 2024. doi:10.4230/LIPIcs.STACS.2024.56.
  • [50] Petr Savický. On determinism versus unambiquous nondeterminism for decision trees. Technical Report TR02-009, Electronic Colloquium on Computational Complexity (ECCC), 2002. URL: http://eccc.hpi-web.de/report/2002/009/.
  • [51] Ronen Shaltiel. Towards proving strong direct product theorems. computational complexity, 12(1):1–22, 2003. doi:10.1007/s00037-003-0175-x.
  • [52] Alexander Shekhovtsov and Vladimir Podolskii. Randomized lifting to semi-structured communication complexity via linear diversity. In 16th Innovations in Theoretical Computer Science Conference (ITCS), LIPIcs, pages 78:1–78:21. Schloss Dagstuhl, 2025. doi:10.4230/LIPIcs.ITCS.2025.78.
  • [53] Amir Shpilka, Avishay Tal, and Ben Volk. On the structure of boolean functions with small spectral norm. computational complexity, 26(1):229–273, 2017. doi:10.1007/s00037-015-0110-y.
  • [54] Hing Yin Tsang, Chung Hoi Wong, Ning Xie, and Shengyu Zhang. Fourier sparsity, spectral norm, and the log-rank conjecture. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 658–667, 2013. doi:10.1109/FOCS.2013.76.
  • [55] Andrew Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, SFCS ’77, pages 222–227. IEEE Computer Society, 1977. doi:10.1109/SFCS.1977.24.
  • [56] Andrew Yao. Lower bounds by probabilistic arguments. In Proceedings of the 24th Annual Symposium on Foundations of Computer Science, SFCS ’83, pages 420–428. IEEE Computer Society, 1983. doi:10.1109/SFCS.1983.30.
  • [57] Zhiqiang Zhang and Yaoyun Shi. On the parity complexity measures of boolean functions. Theoretical Computer Science, 411(26-28):2612–2618, 2010. doi:10.1016/j.tcs.2010.03.027.

Appendix A Appendix

A.1 Direct sums for D

In this appendix, we prove that the best-known direct sum results in the context of deterministic communication complexity can also be obtained in the parity decision tree setting. We restate our theorem for convenience below. See 4

Let us first introduce a couple of definitions. Fix a function f:{0,1}^n→{0,1}. A parity certificate for f(x) is an affine space S⊆{0,1}^n such that x∈S and f(x′)=f(x) for every x′∈S. As in the classical case, the parity certificate complexity C(f) is the smallest codimension of a space certifying the value f(x), where the hardest possible x∈{0,1}^n is taken. We also define spar(f) ≔ ‖f̂‖_0 = |{z : f̂(z)≠0}| as the number of non-zero Fourier coefficients of f. To prove Theorem 4, it is enough to prove a direct sum for parity certificate complexity (Lemma 37 below), combine it with the fact that C(g) ≤ D(g) for every g, and employ the following two results:

  1. C(f) ≥ D(f)^{1/2} [57]
  2. C(f) ≥ D(f)/log spar(f) [54]

Lemma 37.

For any f:{0,1}^n→{0,1} and k ≥ 1, C(f^k) ≥ k·C(f).

Proof of Lemma 37.

Fix an input x∈{0,1}^n attaining d ≔ C(f) and suppose towards contradiction that C(f^k)<dk. This implies in particular that there exists an affine space S⊆({0,1}^n)^k, described by m<dk equations Q^⊤x=b (where Q∈{0,1}^{nk×m} and b∈{0,1}^m), that certifies the value of the input y∈({0,1}^n)^k composed of k copies of x. Define d_i for i∈[k] by:

d_i ≔ dim(col(Q)∩W_i)   where   W_i ≔ {w∈({0,1}^n)^k : w_j=0^n for all j≠i}.

Observe that ∑_{i∈[k]} d_i ≤ m < dk and as such there must be some i with d_i<d. Fix for simplicity i=1. Using Gaussian elimination, one can re-express S=S_1∩S_2 where

  1. the constraints in S_1 are exclusively on bits of the first copy, and
  2. any constraint in S_2 involves at least one bit of a copy other than the first.

Since S_1 constrains the first copy only, it can be identified in a natural way with a single-copy affine space S′⊆{0,1}^n where codim(S′)=d_1<d. Observe that x∈S′ as y∈S. Because the codimension of S′ is strictly less than d=C(f), there must be some x′∈S′ with f(x′)≠f(x). Note that fixing the first copy to x′ leaves the system of linear constraints S_2 feasible and as such there exist x_2,…,x_k∈{0,1}^n such that y′ ≔ (x′,x_2,…,x_k)∈S: a contradiction since f^k(y′)≠f^k(y).
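For concreteness, here is a sketch of the Gaussian-elimination step over GF(2) used in the proof (Python; we represent the system with one row per equation, so the matrix below is the transpose of Q in the text, and all names are ours): after eliminating pivots on the columns of copies 2,…,k, the rows without such a pivot are exactly the constraints supported on the first copy.

```python
import numpy as np

def split_constraints(Q, b, n, k):
    # Gaussian elimination over GF(2) on the system "Q x = b" for x in ({0,1}^n)^k,
    # with one row of Q per equation and the first copy occupying columns 0..n-1.
    # Returns (S1, S2): S1 are constraints supported only on the first copy,
    # S2 are constraints that each involve some bit of another copy.
    A = np.concatenate([Q, b.reshape(-1, 1)], axis=1) % 2     # augmented matrix
    rows, cols = A.shape[0], n * k
    r = 0
    for c in range(n, cols):                  # eliminate pivots on copies 2..k first
        pivot = next((i for i in range(r, rows) if A[i, c]), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]         # swap the pivot row into place
        for i in range(rows):
            if i != r and A[i, c]:
                A[i] ^= A[r]                  # row operation over GF(2)
        r += 1
    S2 = A[:r]                                # rows with a pivot outside the first copy
    S1 = A[r:]                                # remaining rows: zero on copies 2..k
    return S1[S1.any(axis=1)], S2             # drop identically-zero rows
```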

A.2 Omitted case of Theorem 2

Lemma 38.

If D×(f) ≤ 6C·log(n), we have R(f^k) ≥ Ω(k/log n)·D×(f).

Proof.

Fix a hard product distribution μ for D×(f). If D_{1/3}(f,μ)=0, the claim follows trivially. Else, we have D_{1/3}(f,μ)>0 and, using Claim 39 with ε ≔ 1/6, it must be that S_{1/6}(f,μ) ≥ 1/6. Using Claim 17 and Theorem 18, we thus have:

R(f^k) ≥ D_{1/6}(f^k,μ^k) ≥ S_{1/6}(f^k,μ^k) ≥ k·S_{1/6}(f,μ) ≥ k/6 ≥ Ω(k/log n)·D×(f)

Claim 39.

For any f, product distribution μ and ε ≥ 0, we have D_{ε+S_ε(f,μ)}(f,μ)=0.

Proof.

Fix a deterministic parity decision tree T and consider the zero-query decision tree T′ that comes out of applying Algorithm 7 to T. To relate T and T′, we go through Algorithm 5. Again, let 𝒯_0 be the tree obtained by applying Algorithm 5 to T with error zero on line 4. We stress that 𝒯_0 is a randomised decision tree depending on 𝜼. On the other hand, T′ can be seen as 𝒯_0 with fewer instructions executed. Using Lemma 31, we have:

Pr_{𝒙∼μ}[T′(𝒙)≠T(𝒙)] = Pr_{𝒙,𝜼}[𝒯_0(𝒙)≠T′(𝒙)]
 ≤ Pr_{𝒙,𝜼}[line 9 is executed while running 𝒯_0(𝒙)]
 = Pr_{𝒙,𝝆∼μ_𝒙}[T_𝝆(𝒙) makes a query]
 ≤ 𝔼_{𝒙,𝝆}[q(T_𝝆,𝒙)]
 = sq̄(T,μ).

Now, let 𝒯 be a randomised parity tree such that sq̄(𝒯,μ)=S_ε(f,μ) and err_f(𝒯,μ) ≤ ε. Let 𝒯′ be the randomised parity tree obtained as follows:

  1. Sample 𝑻∼𝒯.
  2. Return the tree 𝑻′ obtained by applying Algorithm 7 to 𝑻.

With the analysis above, we obtain Pr_{𝒙,𝑻}[𝑻′(𝒙)≠𝑻(𝒙)] ≤ 𝔼_{𝑻∼𝒯}[sq̄(𝑻,μ)] = sq̄(𝒯,μ). We remark that 𝒯′ makes no queries and has the following error probability:

err_f(𝒯′,μ) ≤ err_f(𝒯,μ) + Pr_{𝒙,𝑻}[𝑻′(𝒙)≠𝑻(𝒙)] ≤ ε + sq̄(𝒯,μ) = ε + S_ε(f,μ).

Algorithm 7 converts an algorithm for Sε<1 to a zero-query algorithm.

A.3 Direct sum for distribution-free discrepancy

Theorem 40.

For every function f:{0,1}^n→{0,1} and k ≥ 1,

k·disc(f)+1 ≥ disc(f^k) ≥ k·(disc(f)−1).
Proof.

The lower bound is a simple consequence of Lemma 8: fix μ to be a distribution such that disc(f)=disc(f,μ) and observe that disc(f^k) ≥ disc(f^k,μ^k). The other direction is more interesting, as it says that the hardest distribution for f^k is basically the k-fold product of the hardest distribution for a single copy f. Let f̂ ≔ min_μ ‖F̂_μ‖_∞, where μ ranges over all distributions. Using Lemma 9, we obtain the following relation between disc(f) and f̂:

−log f̂ + 1 ≥ disc(f) ≥ −log f̂.

Therefore, to prove the upper bound, it is enough to show a perfect direct product for f̂ and apply it k times. To this end, fix some other function g:{0,1}^n→{0,1} and let us show that

ĥ ≥ f̂·ĝ,

where h ≔ f⊕g : {0,1}^{2n}→{0,1} denotes the function (x_1,x_2) ↦ f(x_1)⊕g(x_2). We can write f̂ as the value of the following linear program, where the variables describe a distribution μ:

min. c                                                              (5)
s.t. |∑_{x∈{0,1}^n} (−1)^{f(x)}·μ_x·(−1)^{⟨x,z⟩}| ≤ c      for all z∈{0,1}^n
     ∑_{x∈{0,1}^n} μ_x = 1
     μ_x ≥ 0                                                for all x∈{0,1}^n

The dual of (5) is:

max. d                                                              (6)
s.t. ∑_{z∈{0,1}^n} (−1)^{f(x)}·β_z·(−1)^{⟨x,z⟩} ≥ d        for all x∈{0,1}^n
     ∑_{z∈{0,1}^n} |β_z| = 1

Let (β^f,d_f) and (β^g,d_g) be optimal solutions to (6) with respect to f and g. By the strong duality theorem, it holds that f̂=d_f and ĝ=d_g. We now extract a feasible solution for (6) with respect to the function h=f⊕g. Let β be indexed by {0,1}^{2n} and defined by β_{(z_1,z_2)} ≔ β^f_{z_1}·β^g_{z_2}, and observe that (β,d_f·d_g) is a feasible solution for the dual defining ĥ. By applying the strong duality theorem again, we have ĥ ≥ d_f·d_g = f̂·ĝ, as desired.
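The primal and dual argument can be checked numerically on toy functions with the following sketch (Python with scipy; the function names, the choice of test functions and the encoding of LP (5) are ours): it computes f̂ by solving (5) directly and verifies the product inequality on a small example.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def f_hat(f, n):
    # LP (5): minimise, over distributions mu on {0,1}^n, the largest character bias
    #   max_z | sum_x (-1)^{f(x)} mu_x (-1)^{<x,z>} |.
    N = 2 ** n
    xs = list(itertools.product([0, 1], repeat=n))
    A = np.array([[(-1) ** ((f(x) + sum(zi * xi for zi, xi in zip(z, x))) % 2)
                   for x in xs] for z in xs])
    obj = np.zeros(N + 1); obj[-1] = 1.0                      # variables: mu_1..mu_N, c
    A_ub = np.vstack([np.hstack([A, -np.ones((N, 1))]),       #  A mu - c <= 0
                      np.hstack([-A, -np.ones((N, 1))])])     # -A mu - c <= 0
    A_eq = np.hstack([np.ones((1, N)), np.zeros((1, 1))])     # sum_x mu_x = 1
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (N + 1))
    return res.fun

AND2 = lambda x: x[0] & x[1]
OR2  = lambda x: x[0] | x[1]
h    = lambda x: AND2(x[:2]) ^ OR2(x[2:])                     # "f plus g" on {0,1}^4
print(f_hat(AND2, 2), f_hat(OR2, 2), f_hat(h, 4))             # expect the last >= product of the first two
```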

A.4 Some facts about parity decision trees

Yao’s minimax principle is a powerful technique to analyse randomised algorithms – we adapt here the statement to parity trees, but the proof is exactly the same as the original one [55].

Lemma 41.

For any f:{0,1}^n→{0,1} and distribution μ over {0,1}^n, R_ε(f) ≥ D_ε(f,μ).

The following is a folklore fact relating randomised parity tree complexity and discrepancy [56, 3] which we re-prove in the parity context.

Lemma 42.

D_ε(f,μ) ≥ disc(f,μ) + log(1−2ε) for any ε∈[0,1/2).

Proof.

Fix a parity decision tree T of depth d ≔ D_ε(f,μ) which makes error err_f(T,μ) ≤ ε, and note that

1−2ε ≤ Pr_{𝒙∼μ}[T(𝒙)=f(𝒙)] − Pr_{𝒙∼μ}[T(𝒙)≠f(𝒙)]
 = ∑_{S} Pr_{𝒙∼μ}[T(𝒙)=f(𝒙) and 𝒙∈S] − Pr_{𝒙∼μ}[T(𝒙)≠f(𝒙) and 𝒙∈S],

where the sum ranges over the leaves S of T.

As T has at most 2^d leaves, there exists some leaf S – an affine subspace – with large correlation:

bias(f,μ,S) = |Pr_{𝒙∼μ}[T(𝒙)=f(𝒙) and 𝒙∈S] − Pr_{𝒙∼μ}[T(𝒙)≠f(𝒙) and 𝒙∈S]| ≥ (1−2ε)·2^{−d}.

See 30

Proof.

Let NORn:{0,1}n{0,1} be the function that checks whether the input is 0n and rejects otherwise. Observe that one iteration of the sumcheck protocol can be performed in one parity query. More precisely for any x{0,1}n, if 𝒔U({0,1}n) then:

Pr_𝒔[⟨x,𝒔⟩=1] = 1/2 if x≠0^n,   and   Pr_𝒔[⟨x,𝒔⟩=1] = 0 if x=0^n.

Performing two such random checks independently shows that R(NOR_n,1/4)=O(1). It is a folklore result that a (classical) randomised decision tree can solve FFO_n with error ε using O(log n + log(1/ε)) oracle NOR-queries, even if each oracle call fails with probability 1/3 [24, 44]. We highlight that this is an improvement over the naive method that boosts the noisy NOR queries individually and yields complexity O(log(n)^2·log(1/ε)). Recent work [30, §3] revisits this trick in depth for communication complexity, and their result can be re-interpreted in the context of parity decision trees as follows:

For every f:  R(f,ε) ≤ O(D^{NOR}(f) + log(1/ε)).

Thus, plugging in f=FFO_n and noting that D^{NOR}(FFO_n) ≤ log n (with binary search), we get the desired result. See 11

Proof.

Let 𝒯 be a randomised PDT satisfying d ≔ q̄(𝒯,μ) = D̄_ε(f,μ) and err_f(𝒯,μ) ≤ ε. To prove the lemma, it suffices to construct a deterministic parity tree T of depth at most d/γ with err_f(T,μ) ≤ ε+γ. Sample 𝑻∼𝒯. We construct a new tree 𝑻′ by pruning 𝑻 as follows: we remove all the nodes of 𝑻 of depth greater than d/γ and, if any node of depth d/γ becomes a leaf, we label it with an arbitrary bit. Note that 𝑻′ has depth at most d/γ. Finally, let 𝒯′ denote the distribution over 𝑻′ inherited from 𝒯.

We observe that, for each x∈{0,1}^n, 𝑻(x)=f(x) and 𝑻′(x)≠f(x) can happen simultaneously only if q(𝑻,x)>d/γ. Moreover, by Markov's inequality,

Pr_{𝑻∼𝒯, 𝒙∼μ}[q(𝑻,𝒙)>d/γ] ≤ q̄(𝒯,μ)/(d/γ) = γ.

Therefore, err_f(𝒯′,μ) ≤ err_f(𝒯,μ)+γ ≤ ε+γ. By an averaging argument, there exists some T∈supp(𝒯′) of depth at most d/γ that computes f with error err_f(T,μ) ≤ ε+γ, as desired.

A.5 Omitted proofs of Section 6

In this appendix, we prove Claim 29, an alternative description for the distributions of Section 6.1. Let p^1,p^2∈{0,∗,?}^n. We write p^1 ≍ p^2 if p^1 and p^2 are consistent over their non-? entries; that is, p^1 ≍ p^2 if for all j∈[n], whenever p^1_j≠? and p^2_j≠?, we have p^1_j=p^2_j. Claim 29 follows from Claims 43 and 44.

Claim 43.

For every reachable state (v,p), consistent x∈{0,1}^n and ρ∈{0,∗}^n, ℛ_{v,p,x}(ρ) = ℛ̂_{p,x}(ρ).

Proof.

Upon inspection of ℛ̂_{p,x}, it is enough to prove that for all x∈{0,1}^n and ρ∈{0,∗}^n:

Pr_{𝝆∼ℛ_{v,p,x}}[𝝆=ρ] = ∏_{j∉S_?^p} 𝟙[ρ_j=p_j] × ∏_{j∈S_?^p, x_j=1} 𝟙[ρ_j=∗] × ∏_{j∈S_?^p, x_j=0} { δ_j/(2−δ_j) if ρ_j=∗ ;  1−δ_j/(2−δ_j) if ρ_j=0 }.

Fix x∈{0,1}^n. We prove this by induction over the states (v,p) consistent with x. The entry-point of the state space is (root(T),?^n), for which the statement holds by definition. Suppose now that the statement is true for a state (v,p). Depending on the value of ρ, several next states (v′,p′) are possible. Observe however that the next vertex of T to be visited does not depend on ρ, as it is fixed to be v′ ≔ child(v,⟨x,Q_v⟩). For any fixed ρ∈{0,∗}^n, we have:

Pr_{𝝆∼ℛ_{v′,p′,x}}[𝝆=ρ] = Pr_{𝒙∼μ, 𝝆∼μ_𝒙}[𝝆=ρ | (v′,p′) is reached and 𝒙=x]
 = Pr_{𝒙,𝝆}[𝝆=ρ and (v′,p′) is reached and 𝒙=x] / Pr_{𝒙,𝝆}[(v′,p′) is reached and 𝒙=x].

Note that there can be only one state from which (v′,p′) can be reached, namely (v,p). Indeed, suppose that there is another state (v″,p″) from which (v′,p′) can be reached. Then (v,p) and (v″,p″) have a common ancestor (u,q). Since the paths diverged after (u,q), it must be that p and p″ are inconsistent, and thus p′ is inconsistent with at least one of them: a contradiction. Thus, we have the following equivalence:

(v′,p′) is reached  ⟺  (v,p) is reached and 𝝆 ≍ p′.

Therefore, we have:

Pr_{𝝆∼ℛ_{v′,p′,x}}[𝝆=ρ] = Pr_{𝝆∼ℛ_{v,p,x}}[𝝆=ρ]·𝟙[ρ≍p′] / Pr_{𝝆∼ℛ_{v,p,x}}[𝝆≍p′].  (7)

We can now use the inductive hypothesis on (v,p). Since ρ≍p′ implies ρ≍p, the numerator of (7) simplifies to:

∏_{j∉S_?^{p′}} 𝟙[ρ_j=p′_j] × ∏_{j∈S_?^p, x_j=1} 𝟙[ρ_j=∗] × ∏_{j∈S_?^p, x_j=0} { δ_j/(2−δ_j) if ρ_j=∗ ;  1−δ_j/(2−δ_j) if ρ_j=0 }.

Let Δ ≔ S_?^p∖S_?^{p′} and observe that the denominator of (7) is equal to:

∏_{j∈Δ, x_j=1} 𝟙[p′_j=∗] × ∏_{j∈Δ, x_j=0} { δ_j/(2−δ_j) if p′_j=∗ ;  1−δ_j/(2−δ_j) if p′_j=0 }.

Dividing the numerator by the denominator, the factors over Δ cancel (recall that the numerator forces ρ_j=p′_j on Δ), which leaves exactly the claimed expression for (v′,p′).

Claim 44.

For every reachable state (v,p) and x∈{0,1}^n, 𝒳_{v,p} = 𝒳̂_{v,p}.

Proof.

Fix some (v,p) and x∈{0,1}^n. Upon inspection of 𝒳̂_{v,p}, it is enough to prove that

Pr_{𝒙∼𝒳_{v,p}}[𝒙=x] = M(x,v,p)·∏_{j∈S_?^p} (δ_j/2)^{x_j}·(1−δ_j/2)^{1−x_j},

where M(x,v,p) is an indicator set to 1 if and only if, for all j∈[n], p_j=0 implies x_j=0, and ⟨x,Q_u⟩=b_u for all u∈path(v). By Bayes' rule we have:

Pr_{𝒙∼𝒳_{v,p}}[𝒙=x] = Pr_{𝒙∼μ, 𝝆∼μ_𝒙}[𝒙=x | (v,p) is reached on (𝒙,𝝆)] = p(x) / ∑_{x′∈{0,1}^n} p(x′),

where

p(x) ≔ Pr_{𝒙,𝝆}[𝒙=x]·Pr_{𝒙,𝝆}[(v,p) is reached on (𝒙,𝝆) | 𝒙=x].

To analyse p(x), we have:

Pr_{𝒙∼μ, 𝝆∼μ_𝒙}[𝒙=x] = Pr_{𝒙∼μ}[𝒙=x] = ∏_{j∈[n]} Pr_{𝒙∼μ}[𝒙_j=x_j] = ∏_{j∈[n]} (δ_j/2)^{x_j}·(1−δ_j/2)^{1−x_j}.

On the other hand, the second component of p(x) is clearly zero if M(x,v,p)=0. For instance, v cannot be reached if x does not satisfy all equations on the path to v. Thus, we have:

Pr_{𝒙∼μ, 𝝆∼μ_𝒙}[(v,p) is reached on (𝒙,𝝆) | 𝒙=x]
 = Pr_{𝝆∼μ_x}[(v,p) is reached on (x,𝝆)]
 = M(x,v,p)·Pr_{𝝆∼μ_x}[𝝆≍p]
 = M(x,v,p)·∏_{j∈S_0^p} (2−2δ_j)/(2−δ_j) × ∏_{j∈S_∗^p} (δ_j/(2−δ_j))^{1−x_j}.

Combining those two observations, we get:

p(x) = M(x,v,p)·∏_{j∈S_?^p} (δ_j/2)^{x_j}·(1−δ_j/2)^{1−x_j} × ∏_{j∈S_0^p} (1−δ_j) × ∏_{j∈S_∗^p} δ_j/2.

Observe that the last two products do not involve x at all and can thus be cancelled in the initial expression:

Pr_{𝒙∼𝒳_{v,p}}[𝒙=x] = p̃(x) / ∑_{x′} p̃(x′),   where   p̃(x) ≔ M(x,v,p)·∏_{j∈S_?^p} (δ_j/2)^{x_j}·(1−δ_j/2)^{1−x_j}.

Finally, observe that M(x,v,p) fixes the value of all the bits of x except for S?p. Thus, the summation in the denominator equals 1 and the claim follows.