
Lifting to Randomized Parity Decision Trees

Farzan Byramji, University of California, San Diego, CA, USA
Russell Impagliazzo, University of California, San Diego, CA, USA
Abstract

We prove a lifting theorem from randomized decision tree depth to randomized parity decision tree (PDT) size. We use the same property of the gadget, stifling, which was introduced by Chattopadhyay, Mande, Sanyal and Sherif [ITCS 23] to prove a lifting theorem for deterministic PDTs. Moreover, even the milder condition that the gadget has minimum parity certificate complexity at least 2 suffices for lifting to randomized PDT size.

To improve the dependence on the gadget g in the lower bounds for composed functions, we consider a related problem g⋆ whose inputs are certificates of g. It is implicit in the work of Chattopadhyay et al. that for any function f, lower bounds for the ⋆-depth of f⋆ give lower bounds for the PDT size of f. We make this connection explicit in the deterministic case and show that it also holds for randomized PDTs. We then combine this with composition theorems for ⋆-depth, which follow by adapting known composition theorems for decision trees. As a corollary, we get tight lifting theorems when the gadget is Indexing, Inner Product or Disjointness.

Keywords and phrases:
Parity decision trees, composition
Category:
RANDOM
Funding:
Farzan Byramji: Supported by NSF Award AF: Medium 2212136.
Russell Impagliazzo: Supported by NSF Award AF: Medium 2212136.
Copyright and License:
© Farzan Byramji and Russell Impagliazzo; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Oracles and decision trees
Related Version:
Full Version: https://eccc.weizmann.ac.il/report/2024/202/
Acknowledgements:
FB thanks Sreejata Kishor Bhattacharya, Eric Blais, Zachary Chase, Arkadev Chattopadhyay, Jyun-Jie Liao, Shachar Lovett, Jackson Morris and Anthony Ostuni for discussions and feedback at various stages of this work. The authors would like to thank the anonymous reviewers for helpful comments.
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

Lifting theorems provide a way to convert lower bounds for a function f in a weak model of computation to lower bounds in a stronger model of computation by composing with a function g, typically called a gadget. Given functions f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1}, their composition f∘g : ({0,1}^m)^n → {0,1} is defined by

(f∘g)(x_1, x_2, …, x_n) = f(g(x_1), g(x_2), …, g(x_n)).

Typically such lifting theorems show that for certain choices of the gadget g, the two-party communication complexity of f∘g in some model is lower bounded by the complexity of f in a related query model [36, 27, 25, 28, 14, 31]. These have several applications and have led to the resolution of some long-standing problems [36, 26, 22, 15, 33]. An important challenge in the area is to decrease the gadget size to a constant. Current proofs require the gadget size to be logarithmic in the input length of the outer function.

As a stepping stone towards query-to-communication lifting theorems with improved gadget size, we may consider the problem of lifting to models which lie between communication protocols and decision trees. One such natural model is that of parity decision trees (PDTs). A parity decision tree is a decision tree where each node queries a parity ⊕_{i∈S} x_i for some S ⊆ [n], where the sum is over 𝔽₂. While being interesting on its own, another motivation for proving PDT lower bounds comes from proof complexity. The minimum size of a refutation of an unsatisfiable CNF formula ϕ in the proof system tree-like Resolution over parities (Res(⊕)) is (essentially) equal to the minimum size of a deterministic parity decision tree solving the related false clause search problem for ϕ [29]. Lifting theorems for deterministic parity decision trees using constant size gadgets were recently proved by Chattopadhyay, Mande, Sanyal and Sherif [16] and independently by Beame and Koroth [6], which gave a direct way to transform tree-like Resolution lower bounds to tree-like Res(⊕) lower bounds.

As a next step, we may ask for lifting theorems for randomized parity decision trees. While lower bounds for randomized PDTs do not seem to directly imply lower bounds for stronger proof systems, lower bound techniques against randomized PDTs (along with several other ideas) have been recently used to prove lower bounds against certain subsystems of (dag-like) Res(⊕) [20, 11, 3]. (More precisely, they use distributional lower bounds against deterministic PDTs.)

In this work, we prove a lifting theorem from randomized decision tree (DT) depth to randomized parity decision tree (PDT) size with constant size gadgets. For a function f, we use D^{dt}(f) to denote the deterministic DT depth of f and R^{dt}(f) to denote the 1/3-error randomized DT depth of f. Similarly, we use DSize^{dt}(f) and RSize^{dt}(f) to denote the corresponding size measures. We use ⊕ in the superscript to denote the analogous PDT measures. For example, RSize^{⊕-dt}(f) denotes the minimum size of any randomized PDT computing f to error 1/3.

To prove the lifting theorem for randomized PDTs, we use the same property of the gadget, stifling, which was introduced by [16] to prove their lifting theorem for deterministic PDTs. A function g : {0,1}^m → {0,1} is said to be k-stifled if for all S ⊆ [m] of size k and b ∈ {0,1}, there is some way to fix all bits other than those in S so as to force the function g to output b. A function g which is 1-stifled is also simply called stifled. Some examples of stifled functions include Inner Product, Indexing and Majority.
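To make the definition concrete, the following is a minimal brute-force check of k-stifledness (a Python sketch; the encoding of g as a function on bit tuples and all helper names are ours, not from the paper):

```python
from itertools import combinations, product

def fill(x, idxs, vals):
    """Return a copy of x with x[i] = v for (i, v) in zip(idxs, vals)."""
    y = list(x)
    for i, v in zip(idxs, vals):
        y[i] = v
    return y

def is_k_stifled(g, m, k):
    """Check k-stifledness by brute force: for every size-k set S and
    every target b, some fixing of the bits outside S must force g to
    output b on all 2^k completions of the bits in S."""
    for S in combinations(range(m), k):
        rest = [i for i in range(m) if i not in S]
        for b in (0, 1):
            if not any(
                all(g(tuple(fill(fill([0] * m, rest, fix), S, comp))) == b
                    for comp in product((0, 1), repeat=k))
                for fix in product((0, 1), repeat=len(rest))):
                return False
    return True

# Inner Product on 4 bits, IP(x) = x1*x2 + x3*x4 (mod 2), is 1-stifled:
# setting x2 = 0 kills the first term whatever x1 is, and x3, x4 can
# then be chosen to force either output value.
ip4 = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])
assert is_k_stifled(ip4, 4, 1)
```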

Theorem 1.

For any function f : {0,1}^n → {0,1} and any stifled function g : {0,1}^m → {0,1},

log RSize^{⊕-dt}(f∘g) = Ω_m(R^{dt}(f))

where the implicit constant depends on the gadget size m.

Our results also hold for relations f (like most other lifting theorems) and partial g, but we mostly focus on total functions in this section for simplicity.

Let us mention two applications of the above lifting theorem and its underlying ideas to regular Res(⊕) (see the full version for details). In their proof of an exponential separation between regular Res(⊕) and general Resolution, Bhattacharya, Chattopadhyay and Dvořák [11] required a randomized PDT lifting theorem, and so they used a lifting theorem for randomized communication complexity [14]. This lifting theorem requires a logarithmic size gadget with a fairly large multiplicative constant, which implies that the number of clauses in the resulting formula (and thereby the upper bound) is a large polynomial. By instead using the above lifting theorem for randomized PDTs with a constant size gadget, the lifted formula has the same number of clauses as the base formula (up to a constant factor), thereby improving the separation. We also show that ideas similar to those in the simulation can be used to improve the known lower bound for the bit pigeonhole principle in regular Res(⊕) from exp(Ω̃(n^{1/3})) [20] to exp(Ω̃(n)).

As a warm-up to proving the lifting theorem for randomized PDTs, we also consider the simpler question of which gadgets allow lifting from randomized DT depth to randomized DT size, a question that does not seem to have been considered before. In the deterministic case, Alekseev, Filmus and Smal [2] (see also [19]) showed that a resistant gadget allows lifting DT depth to DT size, where a gadget is resistant if fixing a single bit of the input cannot fix the value of the function. We observe that their ideas can also be used to prove the analogous statement for randomized decision trees.

Theorem 2.

For any function f : {0,1}^n → {0,1} and any resistant function g : {0,1}^m → {0,1},

log RSize^{dt}(f∘g) = Ω_m(R^{dt}(f)).

Theorems 1 and 2 are in fact slightly stronger than stated since they lift to rank, a measure which lower bounds log size. This is also true of the deterministic PDT lifting theorem [16, 6] and the simulation-based proof of the deterministic DT size lifting theorem [2], though they do not explicitly mention it. PDT rank also lower bounds depth in subspace decision trees. A subspace decision tree is a decision tree where each internal node can query the indicator of an affine subspace.

Let us recall the definition of rank, introduced by Ehrenfeucht and Haussler [21]. It will be convenient to work with the following alternative way of looking at rank. Consider decision trees where at each node, one of the two outgoing edges is marked, and define the cost of such a marked decision tree to be the maximum number of marked edges along any root-to-leaf path. The rank of a function g, DRank^{dt}(g), is the minimum cost of such a marked decision tree computing g. Compared to size, rank is closer in spirit to depth, which better motivates some of the ideas discussed later.
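For intuition, the marked-edge definition agrees with the standard bottom-up recurrence for rank; the equivalence is the one proved in [17]. A minimal sketch with a tree encoding of our own choosing:

```python
def rank(tree):
    """Bottom-up rank of a decision tree given as ('leaf', label) or
    ('node', i, left, right): equal subtree ranks force one extra
    marked edge on some path, otherwise the larger rank dominates."""
    if tree[0] == 'leaf':
        return 0
    r0, r1 = rank(tree[2]), rank(tree[3])
    return r0 + 1 if r0 == r1 else max(r0, r1)

# The complete depth-2 tree computing x1 XOR x2 has rank 2,
# matching log(size) = log 4.
leaf0, leaf1 = ('leaf', 0), ('leaf', 1)
xor_tree = ('node', 1, ('node', 2, leaf0, leaf1), ('node', 2, leaf1, leaf0))
assert rank(xor_tree) == 2
```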

The general question of whether randomized DT size satisfies a composition theorem, 'Does log RSize^{dt}(f∘g) = Ω(R^{dt}(f) · log RSize^{dt}(g)) hold for all f and g up to polylog factors?', was recently asked by Dahiya [18]. The corresponding question in the deterministic case has a positive answer, as shown by Dahiya and Mahajan [19]. (Here we require DSize^{dt}(g) > m+1, where m is the input length of g. Considering f = ∧_n and g = ∧_m shows that some such condition is necessary [2]. But this still leaves open the possibility that there are other functions with DSize^{dt}(g) ≤ m+1 which allow lifting; we discuss this in more detail in the full version.) They actually show that deterministic DT rank satisfies a composition theorem, which implies the composition theorem for size.

The composition question is interesting even in the most basic setting of decision tree depth. While a composition theorem for deterministic depth has long been known [38, 40, 32], the case of randomized depth is more subtle. It is still unknown whether R^{dt}(f∘g) = Ω(R^{dt}(f) · R^{dt}(g)) holds for all total functions f and g. In fact, it is known that the statement is false in its most general form for the composition of a relation and a partial function [23], and even for the composition of partial functions [7]. There is a long line of work [1, 9, 4, 24, 23, 5, 7, 8, 13, 37] studying this question and proving lower bounds on R^{dt}(f∘g) of the form Ω(R^{dt}(f) · M(g)) or Ω(M(f) · R^{dt}(g)), where M(·) is some complexity measure.

Theorem 2 can be used to show that a composition theorem for rank implies one for depth, or, taking the contrapositive, counterexamples for depth also provide counterexamples for rank. Still, some of these composition theorems [9, 23, 8] can be adapted to give analogous composition theorems for randomized DT rank (see Appendix B in the full version).

Motivated by the work on the composition question for ordinary decision trees, we try to better understand the dependence on the inner function in the lower bounds on PDT rank for composed functions. [16] proved that for any k-stifled g, DRank^{⊕-dt}(f∘g) ≥ k · D^{dt}(f), and the dependence on stifling in this lower bound cannot be improved, since for some functions g, such as Indexing, g is k-stifled and DRank^{⊕-dt}(g) = k+1.

We observe that the ideas in [16] also work with an adaptive version of stifling. To state this notion precisely, we consider the following task. Let g : {0,1}^m → {0,1} be a Boolean function. Given query access to a certificate z ∈ {0,1,⋆}^m of g, recognize whether z is a 0-certificate or a 1-certificate. Let g⋆ denote this problem. Now instead of counting all queries in the cost, only include the queries which evaluate to ⋆. The minimum number of ⋆'s required to solve g⋆ is denoted by D^{⋆-dt}(g⋆). Using this notion, we show the following.

Theorem 3.

For all functions f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1},

DRank^{⊕-dt}(f∘g) ≥ D^{dt}(f) · (D^{⋆-dt}(g⋆) − 1).

This is inspired by discussion at the end of the talk [39], where it is shown that for the Inner Product gadget we can get linear dependence on the gadget size even though Inner Product is not 2-stifled. Observe that if g is k-stifled, then D^{⋆-dt}(g⋆) > k, since the first k queries made by an algorithm could all be ⋆ and it has still not determined whether z is a 0-certificate or a 1-certificate. Thus, the above lower bound is always at least as good as the one obtained from stifling. When g is Inner Product or Disjointness on 2m bits, g is not 2-stifled but D^{⋆-dt}(g⋆) = m, which shows that this bound can sometimes be better.

We also prove a randomized analogue of Theorem 3.

Theorem 4.

For all functions f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1},

RRank^{⊕-dt}(f∘g) = Ω(R^{dt}(f) · (LR⋆(g⋆) − O(1))).

Here LR⋆ is the natural analogue of the linearized complexity measure LR introduced in [8], where an inner composition theorem R^{dt}(f∘g) = Ω(R^{dt}(f) · LR(g)) was proved. For a function h : {0,1,⋆}^m → {0,1,⋆},

LR⋆(h) = inf_𝒯 max_x cost⋆(𝒯, x) / bias(𝒯, x),

where 𝒯 varies over randomized decision trees on {0,1,⋆}^m, x varies over inputs in the domain of h, cost⋆(𝒯, x) denotes the expected number of ⋆'s seen when running 𝒯 on x, and bias(𝒯, x) = max{0, 2·Pr[𝒯(x) = h(x)] − 1}.

Using this, we can show that when the inner function is Inner Product, Disjointness or Indexing, the upper bound RRank^{⊕-dt}(f∘g) ≤ O(R^{dt}(f) · RRank_ε^{⊕-dt}(g)) for ε = 1/R^{dt}(f) is the best one can do. For these inner functions, the deterministic PDT rank is equal to the randomized PDT rank (up to constants), so the upper bound is simply O(R^{dt}(f) · RRank^{⊕-dt}(g)).

Corollary 5.

For m ≥ 2 and all functions f : {0,1}^n → {0,1},

RRank^{⊕-dt}(f∘DISJ_{2m}) = Θ(R^{dt}(f) · RRank^{⊕-dt}(DISJ_{2m})),
RRank^{⊕-dt}(f∘IP_{2m}) = Θ(R^{dt}(f) · RRank^{⊕-dt}(IP_{2m})),
RRank^{⊕-dt}(f∘IND_{m+2^m}) = Θ(R^{dt}(f) · RRank^{⊕-dt}(IND_{m+2^m})).

While Theorems 3 and 4 do imply the lifting theorems mentioned earlier (strictly speaking, Theorem 4 does not seem to directly imply Theorem 1 because of the additive constant loss in LR⋆; however, one can use other measures for which an inner composition theorem holds, like sabotage complexity, in place of LR⋆ to recover Theorem 1), they still work in the standard basis. It is natural to consider analogues of the above measures which work with parity certificates instead of ordinary certificates, and indeed we can prove analogues of the above results using such measures. These lifting theorems can be found in the full version.

Theorems 3 and 4 can be viewed as improving the quantitative dependence on the gadget in the lifting theorems obtained via stifling. We can also consider the qualitative question of whether a more general property than stifling allows lifting to parity decision trees. In this direction, Alekseev, Filmus and Smal [2] completely classified gadgets according to whether they allow polynomial lifting, log DSize^{⊕-dt}(f∘g) = Ω_g(D^{dt}(f)^ε) (for some ε > 0). However, this does not answer the question of which gadgets allow linear lifting, log DSize^{⊕-dt}(f∘g) = Ω_g(D^{dt}(f)). We observe, by considering a mild generalization of stifling, that for any gadget g whose minimum parity certificate complexity is at least 2, DRank^{⊕-dt}(f∘g) ≥ D^{dt}(f). This can be seen as the natural parity analogue of the statement that if g has minimum certificate complexity at least 2, then for all f, DRank^{dt}(f∘g) ≥ D^{dt}(f) [2, 19].

The similarities go further. The class of gadgets g which are not already captured by the above condition (possibly after first moving to a subfunction) are the ones which satisfy DRank^{⊕-dt}(g) = 1. If g is a total function which cannot be computed by a single parity query and satisfies DRank^{⊕-dt}(g) = 1, we show that there is some gadget h such that understanding whether g allows lifting to PDT rank is equivalent to understanding whether h allows lifting to DT rank. More precisely, we have the following.

Proposition 6.

Let g : {0,1}^m → {0,1} be a total function which is not a parity. Then one of the following holds:

  • DRank^{⊕-dt}(g) ≥ 2, and for all functions f we have DRank^{⊕-dt}(f∘g) ≥ D^{dt}(f) and RRank^{⊕-dt}(f∘g) = Ω_m(R^{dt}(f)).

  • DRank^{⊕-dt}(g) = 1, and there exists some h : {0,1}^k → {0,1} such that for all functions f we have DRank^{⊕-dt}(f∘g) = DRank^{dt}(f∘h) and RRank^{⊕-dt}(f∘g) = Θ(RRank^{dt}(f∘h)).

1.1 Techniques

The simulation for proving the lifting theorem for randomized PDTs with stifled gadgets builds on the ideas of [16]. Let us start by recalling their main idea for simulating a parity query efficiently. On an input x for f, we will simulate a parity decision tree T for f∘g. For concreteness, suppose the parity at the root of T is z_{1,1} + z_{2,1} + z_{2,2} + z_{3,2}. While this parity depends on multiple blocks, we would like to simulate it while making just one query to x. To do this, we localize the parity query to a single bit, say z_{1,1}, which is informally thought of as being responsible for the value of the whole parity. Specifically, we view z_{1,1} + z_{2,1} + z_{2,2} + z_{3,2} = b for some b ∈ 𝔽₂ as fixing z_{1,1} = z_{2,1} + z_{2,2} + z_{3,2} + b, where we think of z_{2,1}, z_{2,2}, z_{3,2} as still being free. At this point, we no longer have control over z_{1,1}, but since g is stifled, we can fix the remaining bits in block z_1 to force g(z_1) = x_1. So we only need to make one actual query to simulate a parity query.

Now suppose we wish to simulate a randomized PDT. To ensure correctness, following the usual simulation framework for randomized communication protocols and decision trees, we now require suitable hard distributions μ_0 and μ_1 on the 0-inputs and 1-inputs respectively of the gadget g. On an input x for f, we simulate a parity decision tree T for f∘g on the distribution μ_x := μ_{x_1} × μ_{x_2} × ⋯ × μ_{x_n}. Continuing with the above example, suppose we wish to simulate the parity z_{1,1} + z_{2,1} + z_{2,2} + z_{3,2} on the distribution μ_x where x is unknown. In general, simulating this parity by only querying one bit of x seems hard, but consider the following very special case. On querying x_1, suppose we find that the corresponding distribution μ of block z_1 is such that z_{1,1} is a uniform random bit which is independent of all the other bits, i.e. μ = 𝒰_1 × μ′ for some distribution μ′ on {0,1}^{m−1}. Then irrespective of the distributions of z_2 and z_3, we know that this parity is equally likely to be 0 or 1, and so we can just move to a uniform random child. The remaining bits in z_1 are set according to μ′. So at least in this special case, we could simulate the parity with just one query.

To actually use the above observation in our simulation, we rely on the following idea of Bhattacharya, Chattopadhyay and Dvořák [11]. If g is a stifling gadget of constant size m, the uniform distributions on g^{-1}(0) and g^{-1}(1) have the following useful property. For any i ∈ [m], with constant probability, the bits other than i form a certificate of g, in which case the ith bit is uniformly distributed. Using this property, we can now simulate a parity query across multiple blocks by a constant number of queries in expectation, since each time we make an actual query, with good probability we will be able to simulate the current parity without making any more queries.

Next we briefly describe the ideas behind the PDT simulation theorems with improved dependence on the gadget. We focus here on the deterministic case; the randomized case follows a similar proof outline. For the deterministic PDT lifting theorem, we proceed in two steps. First, we observe that the proof in [16] implicitly uses a general reduction showing that for all functions f, DRank^{⊕-dt}(f) ≥ D^{⋆-dt}(f⋆). In particular, f does not need to be of composed form. This lower bound even works for relations ℛ, once we suitably define ℛ⋆. In Appendix A of the full version, we use this lemma and other ideas in this work to give simple proofs of some known lower bounds for tree-like Res(⊕).

With the lower bound DRank^{⊕-dt}(f∘g) ≥ D^{⋆-dt}((f∘g)⋆) in hand, the next step is to simply note that D^{⋆-dt} satisfies a composition theorem D^{⋆-dt}(f⋆∘g⋆) ≥ D^{dt}(f) · (D^{⋆-dt}(g⋆) − 1), analogous to usual decision tree complexity. Combining these gives Theorem 3. The simulation in [16] can be understood as instead using the relation D^{⋆-dt}(f⋆∘g⋆) ≥ k · D^{dt}(f) whenever g is k-stifled, which can be seen as an analogue of D^{dt}(f∘g) ≥ D^{dt}(f) · C(g).

1.2 Related work

In independent work, Podolskii and Shekhovtsov [34] have also proved a lifting theorem for randomized PDTs. In fact, they lift to the stronger model of semistructured communication protocols where one party is restricted to sending parities and the other can send arbitrary messages. The gadgets they allow are certain generalizations of Indexing where each index points to a distinct parity and their lifting theorem has the right dependence on the gadget size for such gadgets. This class naturally captures gadgets like Indexing and Inner Product, and by considering suitable reductions, their lifting theorem also applies to other gadgets. Some of the underlying ideas for simulating parities in their work are similar to ours, though it seems that our techniques do not directly imply their result for PDTs with the correct dependence on the gadget size and vice-versa.

Since we focus on the simpler model of PDTs, our proof of the lifting theorem using stifled gadgets is quite short, and we also provide a refined classification of gadgets for when lifting to PDTs is possible. It would be interesting to give a simulation unifying the lower bounds from [34] and our lower bounds for composed problems via ⋆-depth. We suspect that by combining techniques useful for composition in ordinary decision trees with techniques used in query-to-communication lifting, it may be possible to find broader classes of gadgets which allow lifting to semi-structured protocols.

Besselman et al. [10] have recently obtained direct sum theorems for randomized PDT depth. They show that a direct sum theorem holds when the lower bound is proved via discrepancy or against product distributions. The direct sum question can be seen as the special case of composition where the outer function is the identity function. Our results using ⋆-depth (and its parity generalization) also give direct sum results for PDTs, where direct sum theorems hold for ⋆-depth by adapting the proofs for ordinary decision trees [30].

These direct sum results are incomparable to those of [10]. In one direction, for the Majority function, we have R^{⋆-dt}(MAJ⋆) = O(√n), while any PDT solving MAJ on the uniform distribution requires depth Ω(n). On the other hand, there are functions for which randomized ⋆-depth is polynomially larger than the PDT depth on product distributions. Such functions can be obtained by lifting such a separation for ordinary decision trees using a stifled gadget. The NAND tree provides such a separation for ordinary decision trees [37]. Similarly, in the deterministic case, the NAND tree also shows that deterministic ⋆-depth can sometimes be quadratically larger than (parity) certificate complexity.

1.3 Organization

In Section 2, we state definitions for the query complexity measures used. In Section 3, we prove Theorems 1 and 2. In Section 4, we prove Theorems 3 and 4.

2 Preliminaries

For a positive integer n, we use [n] to denote the set {1, 2, …, n}. All logs are to the base 2. We use |x| to denote the Hamming weight of a string x ∈ {0,1}^n. Let ℛ ⊆ {0,1}^n × 𝒪 be a relation. Let g : M → {0,1,⋆} be a partial function on some domain M (typically M = {0,1}^m). Then the composed relation ℛ∘g ⊆ M^n × 𝒪 is defined as follows. If for x ∈ M^n there is some i ∈ [n] such that g(x_i) = ⋆, then (x, o) ∈ ℛ∘g for all o ∈ 𝒪 (in this case, we think of x as lying outside the domain). Otherwise (x, o) ∈ ℛ∘g if and only if (g^n(x), o) ∈ ℛ. For a relation ℛ and input x ∈ {0,1}^n, we sometimes use ℛ(x) := {o ∈ 𝒪 | (x, o) ∈ ℛ} to denote the set of legal outputs on x. A partial function g : {0,1}^m → {0,1,⋆} is also sometimes interpreted as the relation where x is related to g(x) if x is in the domain of g, and otherwise x is related to all possible outputs {0,1}.

We use standard Ω(·), O(·) notation in most places to only represent universal constants, and when required, we will explicitly note which parameters need to be large for the inequalities to hold. Additionally, if the constant depends on some parameter or function, this will be indicated by a subscript. We now define the query complexity measures used in this work. Refer to [12] for a survey of some of these measures.

Decision trees.

A deterministic decision tree T on {0,1}^n is a binary rooted tree with leaves labeled from some set 𝒪 and internal nodes labeled by i ∈ [n], each with two outgoing edges labeled 0 and 1. On an input x ∈ {0,1}^n, starting at the root, we repeatedly follow the edge corresponding to the value of x_i, where i is the label of the current node, until we reach a leaf. The label of the leaf is the output of T on x, which we denote by T(x).

The cost of T on x, depth(T, x) (or sometimes cost(T, x)), is the number of queries made by T on x. The depth of T is defined by depth(T) = max_{x∈{0,1}^n} depth(T, x). The size of T, denoted size(T), is the number of leaves of T.

The rank of T is defined in the following way. Consider markings of the edges of T such that one of the outgoing edges from each internal node is marked. For such a marked tree, the cost associated with this marking is the maximum number of marked edges on any root-to-leaf path. The rank of a tree T is the minimum cost of any marking of T. This definition of rank is equivalent to the more common bottom-up definition and the equivalence is proved in [17] but the idea also appears implicitly in prior work. For such a marked tree T, we will use cost(T,x) to denote the number of marked edges on the root-to-leaf path taken by x.

For a relation ℛ ⊆ {0,1}^n × 𝒪, a decision tree T is said to compute ℛ if for all x ∈ {0,1}^n, (x, T(x)) ∈ ℛ. Define D^{dt}(ℛ) to be the minimum depth of a deterministic decision tree computing ℛ. Define DSize^{dt}(ℛ) and DRank^{dt}(ℛ) to be the minimum size and rank, respectively, of a decision tree computing ℛ.

A randomized decision tree 𝒯 is a probability distribution over deterministic decision trees. We will use the same notation as in the deterministic case to denote the worst-case depth, size or rank of a randomized decision tree. We also use the corresponding expected measures described next. On input x ∈ {0,1}^n, we define cost(𝒯, x) = 𝔼_{T∼𝒯}[cost(T, x)]. Define cost(𝒯) = max_x cost(𝒯, x). Similarly, rank̄(𝒯) = 𝔼_{T∼𝒯}[rank(T)] and sizē(𝒯) = 𝔼_{T∼𝒯}[size(T)].

For a relation ℛ, a randomized decision tree 𝒯 is said to compute ℛ to error ε if for all x ∈ {0,1}^n, Pr_{T∼𝒯}[(x, T(x)) ∈ ℛ] ≥ 1 − ε. We use R_ε^{dt}(ℛ), RSize_ε^{dt}(ℛ), RRank_ε^{dt}(ℛ) to denote the worst-case analogues of D^{dt}(ℛ), DSize^{dt}(ℛ), DRank^{dt}(ℛ) for randomized decision trees that compute ℛ to error ε. The corresponding expected measures are denoted by R̄_ε^{dt}(ℛ), R̄Size_ε^{dt}(ℛ), R̄Rank_ε^{dt}(ℛ). We omit the subscript when dealing with error ε = 1/3.

For any decision tree T, rank(T) ≤ log size(T). This directly implies that for a randomized decision tree 𝒯, rank(𝒯) ≤ log size(𝒯), by applying the above relation to each tree in the support of 𝒯. To get the corresponding relation for the expected complexity measures, we use Jensen's inequality to get 𝔼_{T∼𝒯}[rank(T)] ≤ 𝔼_{T∼𝒯}[log size(T)] ≤ log 𝔼_{T∼𝒯}[size(T)], which in our notation is rank̄(𝒯) ≤ log sizē(𝒯). This inequality does not depend on what queries are allowed and also holds for other kinds of decision trees (like parity decision trees).

A parity decision tree (PDT) T is like a decision tree, but the internal nodes can now query parities. We will denote a parity in different ways: ⟨α, x⟩ for some α ∈ 𝔽₂ⁿ, or ⊕_{i∈S} x_i where α is the indicator vector of S. The notation for parity decision trees is similar to that for (ordinary) decision trees. We will use ⊕ in the superscript to denote the parity analogue of an ordinary query complexity measure. For example, R̄Rank^{⊕-dt}(ℛ) denotes the minimum expected rank of a randomized parity decision tree computing ℛ to error 1/3. When dealing with a parity v on inputs with an underlying block structure ({0,1}^m)^n, for i ∈ [n] we use v|_i to denote the projection of v onto the ith block.

We now define 0-depth and 1-depth. The 0-depth of an ordinary decision tree T is the maximum number of edges labeled 0 on any root-to-leaf path in T. We use D^{0-dt}(ℛ) to denote the minimum 0-depth of a deterministic decision tree for ℛ, and similar notation with 0 in the superscript for other 0-query complexity measures. The measures related to 1-depth are defined similarly.

We next mention two standard techniques for proving lower bounds on deterministic decision tree depth and rank. To prove lower bounds on D^{dt}(ℛ) for a relation ℛ, it suffices to give an Adversary strategy in the Querier-Adversary game for the relation ℛ. In this game, Adversary has a hidden string x, and Querier's goal is to find some o related to x according to ℛ while making as few queries as possible. This technique is complete in the sense that if D^{dt}(ℛ) = d, then there is an Adversary strategy scoring d points. This game also works for other deterministic query complexity measures by changing the kinds of queries Querier is allowed to make.

A similar game can be used to characterize the rank of a relation. In the Prover-Delayer game [35] for a relation ℛ, similar to the Querier-Adversary game, Prover makes queries to a hidden string x and Delayer responds by revealing the corresponding bits of x, except for the following change. Delayer may instead choose to respond with ∗, which is interpreted as Prover getting to decide how to fix the queried bit. Delayer gets to know which bit Prover picks in this case. The game continues in this manner until Prover can correctly output an o which is related to x. Delayer's score is the number of ∗'s announced during the game. The maximum score guaranteed by a Delayer strategy for ℛ is equal to the rank of ℛ (see [19] for a proof).

The Prover-Delayer game can be equivalently described in a way closer to the Querier-Adversary game in the following way. Now instead of just picking some x_i to query, Querier also picks a bit b ∈ {0,1}, and Adversary gets a point only if the announced value is equal to b. The best score achievable by an Adversary strategy in this game is equal to the best score of a Delayer strategy. This equivalent view can be seen as the natural game corresponding to the description of rank using marked decision trees.

By changing the allowed queries, the Prover-Delayer game can capture rank in other query models. For instance, rank in PDTs is captured by the parity Prover-Delayer game [29].

Certificate complexity.

A partial assignment C is a string in {0,1,⋆}^n. Say that C is consistent with x ∈ {0,1}^n if for all i ∈ [n], C_i = x_i or C_i = ⋆. We will sometimes interpret a partial assignment as the subcube it defines, {x ∈ {0,1}^n | x is consistent with C}. So we write x ∈ C to express that x is consistent with C.

C is said to be a certificate for a relation ℛ ⊆ {0,1}^n × 𝒪 if there exists some o ∈ 𝒪 such that for all x ∈ C, (x, o) ∈ ℛ. For a partial function g : {0,1}^m → {0,1,⋆} and b ∈ {0,1}, a partial assignment C is said to be a b-certificate if for all x ∈ C, (x, b) ∈ g (interpreting g as a relation). In other words, we require that for all x ∈ C, g(x) ∈ {b, ⋆}. The size of a certificate C is the number of non-⋆'s in it. The certificate complexity of a relation ℛ at x ∈ {0,1}^n, denoted C(ℛ, x), is defined as the minimum size of a certificate for ℛ which is consistent with x. The certificate complexity of ℛ, C(ℛ), is the maximum certificate complexity over all inputs. The minimum certificate complexity of ℛ, C_min(ℛ), is the minimum size of a certificate for ℛ.
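For small total functions, the quantity C(ℛ, x) can be computed by brute force, which may help fix the definition in mind. A minimal sketch (the encoding is ours, not from the paper):

```python
from itertools import combinations, product

def certificate_complexity_at(g, m, x):
    """C(g, x) by brute force for a total g on m bits: the least number
    of coordinates of x whose values already force g to a constant."""
    for size in range(m + 1):
        for S in combinations(range(m), size):
            free = [i for i in range(m) if i not in S]
            outputs = set()
            for comp in product((0, 1), repeat=len(free)):
                y = list(x)
                for i, v in zip(free, comp):
                    y[i] = v
                outputs.add(g(tuple(y)))
            if len(outputs) == 1:  # the subcube fixing S is a certificate
                return size

# Majority on 3 bits: two agreeing bits certify the value, one does not.
maj3 = lambda y: int(sum(y) >= 2)
assert certificate_complexity_at(maj3, 3, (1, 1, 0)) == 2
```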

When working with a partial function g, we will sometimes only be interested in certificates whose corresponding subcubes are completely contained in the domain of g. We will call these domain certificates. We will drop the word domain when it is clear from context or when working with total functions.

A parity certificate for ℛ ⊆ {0,1}^n × 𝒪 is given by a collection of 𝔽₂-linear equations on {0,1}^n, S = {⟨α_1, x⟩ = b_1, ⟨α_2, x⟩ = b_2, …, ⟨α_k, x⟩ = b_k}, such that there exists o ∈ 𝒪 for which, for all x ∈ 𝔽₂ⁿ satisfying the equations in S, we have (x, o) ∈ ℛ. The size of a parity certificate is the number of equations in it. We may always assume that the linear forms involved in a parity certificate are linearly independent, since we can remove redundant equations without changing the defined affine subspace. The minimum parity certificate complexity of ℛ, C^⊕_min(ℛ), is the minimum size of a parity certificate for ℛ. A domain parity certificate of a partial function g is a parity certificate for g whose corresponding affine subspace is completely contained in the domain of g.

3 Lifting theorems for randomized rank

In this section, we describe simple simulation theorems which lift randomized decision tree depth to rank in randomized decision trees and randomized parity decision trees.

3.1 Lifting to randomized ordinary decision tree rank

Alekseev, Filmus and Smal [2] showed that resistant gadgets suffice for lifting decision tree depth to size by generalizing Urquhart's argument for the XOR gadget [41]. A function g is said to be k-resistant if C_min(g) ≥ k+1, and resistant if it is 1-resistant.

Theorem 7 ([2, 19]).

For any k-resistant function g and any relation ,

log DSize^{dt}(ℛ∘g) ≥ DRank^{dt}(ℛ∘g) ≥ k · D^{dt}(ℛ).

This also follows from results of Dahiya and Mahajan [19], who show more generally that DRank^{dt}(ℛ∘g) ≥ (DRank^{dt}(g) − 1) · D^{dt}(ℛ) and DRank^{dt}(g) ≥ C_min(g).

We prove the following for randomized decision trees.

Theorem 8.

Suppose g : {0,1}^m → {0,1,⋆} is k-resistant. For any relation ℛ ⊆ {0,1}^n × 𝒪,

log R̄Size_ε^{dt}(ℛ∘g) ≥ R̄Rank_ε^{dt}(ℛ∘g) ≥ (k/2m) · R̄_ε^{dt}(ℛ).

By standard arguments, we get lifting in the worst case if we incur an additive loss in the error. This additive loss can be removed by standard amplification when the outer relation is a function and the error is a constant to get Theorem 2.

For the simulation, it will be convenient to work with an equivalent distributional description of resistant functions.

Definition 9 (balanced function).

A function g : {0,1}^m → {0,1,⋆} is p-balanced for p ∈ (0, 1/2] if for every b ∈ {0,1}, there is a distribution μ_b supported on g^{-1}(b) such that for each i ∈ [m] and each c ∈ {0,1}, Pr_{x∼μ_b}[x_i = c] ≥ p. A function is balanced if it is p-balanced for some p ∈ (0, 1/2].

Note that a balanced function is necessarily resistant. It is also easy to see that being resistant is a sufficient condition for being balanced, as shown below.

Observation 10.

If g : {0,1}^m → {0,1} is k-resistant, then it is k/(2m)-balanced.

Proof.

For b ∈ {0,1}, the distribution μ_b witnessing that g is k/(2m)-balanced is defined in the following way. Select a subset S of [m] of size k uniformly at random. For each i ∈ S, pick x_i uniformly at random (independently of the other bits). Finally, set all remaining bits so that the resulting string is a b-input for g. This last step can be performed by the assumption that g is k-resistant. Each i ∈ [m] is included in S with probability k/m, and conditioned on being included in the first step, it is fixed to c ∈ {0,1} with probability 1/2.
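This construction can be read directly as a sampling procedure. Below is a minimal sketch, with the completion step done by exhaustive search (the encoding and names are ours; for a k-resistant g the search always succeeds):

```python
import random
from itertools import product

def sample_mu_b(g, m, k, b):
    """One draw from the distribution mu_b in the proof of Observation 10,
    assuming g (total, on m bits) is k-resistant. The free set S and its
    bits are uniform; the remaining bits are completed, here by brute
    force, to make the whole string a b-input of g."""
    S = random.sample(range(m), k)
    free_vals = {i: random.randint(0, 1) for i in S}
    rest = [i for i in range(m) if i not in S]
    for fix in product((0, 1), repeat=len(rest)):
        y = [0] * m
        for i, v in zip(rest, fix):
            y[i] = v
        for i, v in free_vals.items():
            y[i] = v
        if g(tuple(y)) == b:
            return tuple(y)
    raise ValueError("g is not k-resistant")
```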

We now show that any balanced gadget can be used for lifting to randomized decision tree rank. Using the distributions coming from the above observation, the simulation below is equivalent to applying a suitable random projection, as in the second proof of the depth-to-size lifting theorem in [2]. However, it is analyzed differently, and for this purpose it is more convenient to present it as a simulation.

Proposition 11.

Let g : {0,1}^m → {0,1,⋆} be p-balanced for some p ∈ (0, 1/2]. Then for all relations ℛ ⊆ {0,1}^n × 𝒪,

R̄Rank_ε^{dt}(ℛ∘g) ≥ p · R̄_ε^{dt}(ℛ).

Proof.

We will show how to simulate a randomized decision tree 𝒯 computing ℛ∘g with error probability ε by a randomized decision tree 𝒯′ computing ℛ with the same error probability. The expected depth of 𝒯′ on any input will be at most rank̄(𝒯)/p.

Let μ_0 and μ_1 be the distributions on g^{-1}(0) and g^{-1}(1) respectively showing that g is p-balanced. For each x ∈ {0,1}^n, let μ_x be the distribution on ({0,1}^m)^n defined by independently sampling, for each i ∈ [n], the block z_i from μ_{x_i}. The simulation essentially samples z ∼ μ_x, where x is the input on which we wish to compute ℛ, and executes 𝒯 on z. Since x is unknown, the individual blocks of z are sampled as they are queried by the decision tree 𝒯. Since 𝒯 computes ℛ∘g correctly on each z with probability 1 − ε, the probability that 𝒯′ outputs incorrectly on input x is at most 𝔼_{z∼μ_x}[Pr[(z, 𝒯(z)) ∉ ℛ∘g]] ≤ ε, where the inner probability is over the randomness of 𝒯. So 𝒯′ computes ℛ to error ε.

We now describe 𝒯′ in more detail. Since a randomized decision tree 𝒯 is a distribution over deterministic decision trees T, it is enough to show how to simulate a deterministic decision tree T while making at most rank(T)/p queries in expectation. Suppose T queries z_{i,j} at the root. Then we query x_i and sample z_i ∼ μ_{x_i}. Now that z_i is known, we move to the appropriate child. In the future, if T makes any queries to bits of z_i, we move according to the already sampled z_i. We repeat this process until we reach a leaf, at which point we output the label of the leaf reached. Note that this procedure also generates T(z) where z ∼ μ_x, since μ_x = ∏_{i∈[n]} μ_{x_i}.

To estimate the number of queries made to x, we will show next that starting at any node in T during the simulation, the number of queries made until we cross a marked edge is at most 1/p in expectation. At any node, one of the outgoing edges is marked. By the assumption on μ0,μ1, whenever a query is made, we follow the marked edge with probability at least p. Thus, it takes at most 1/p queries in expectation to cross a marked edge. (During the simulation, we may reach a node where we directly move to the unmarked child with probability 1 because the bit there had already been sampled earlier, but in this case no new query is made at that node.)

To bound the total number of queries made in expectation when simulating T, define for each i ∈ [rank(T)] the random variable X_i counting the number of queries made between crossing the (i−1)st marked edge and crossing the ith marked edge during the simulation. If we reach a leaf of T without crossing i marked edges, then X_i = 0. Then we have 𝔼[X_i] ≤ 1/p for all i ∈ [rank(T)] by what was argued above.

Now the total number of queries made is at most ∑_{i=1}^{rank(T)} X_i, since we must reach a leaf in T after crossing at most rank(T) marked edges. By linearity of expectation, 𝔼[∑_{i=1}^{rank(T)} X_i] = ∑_{i=1}^{rank(T)} 𝔼[X_i] ≤ rank(T)/p. The expected total number of queries made when simulating 𝒯 is then at most 𝔼_{T∼𝒯}[rank(T)/p] = rank̄(𝒯)/p.
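The whole simulation for a single deterministic tree fits in a few lines. The sketch below uses a tree encoding of our own choosing and a hypothetical sampler sample_block(b) for μ_b; the returned query count is exactly the quantity bounded by rank(T)/p above:

```python
def simulate_tree(T, x, sample_block):
    """The simulation in the proof of Proposition 11, for one
    deterministic tree T over composed inputs: internal nodes are
    ('node', (i, j), left, right), querying bit j of block i, and
    leaves are ('leaf', label). sample_block(b) is a hypothetical
    sampler for mu_b. Returns the output and the number of queries
    made to the hidden input x."""
    blocks, queries, node = {}, 0, T
    while node[0] == 'node':
        _, (i, j), left, right = node
        if i not in blocks:                 # first time block i is touched:
            queries += 1                    # one actual query to x_i,
            blocks[i] = sample_block(x[i])  # then sample the whole block
        node = left if blocks[i][j] == 0 else right
    return node[1], queries
```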

Theorem 8 now follows from combining Observation 10 and Proposition 11.

3.2 Lifting to randomized parity decision tree rank

We now prove that stifled gadgets allow lifting to randomized parity decision tree rank.

Definition 12 (stifled functions).

A function g : {0,1}^m → {0,1,⋆} is k-stifled if for every subset S of [m] of size k and each b ∈ {0,1}, there is a domain b-certificate C ∈ {0,1,⋆}^m of g which leaves S free, i.e. C_i = ⋆ for all i ∈ S.

Similar to the case of ordinary decision trees in the previous subsection, it will be convenient to work with an equivalent property arising from certain distributions on the 0-inputs and 1-inputs of the gadget. The following definition is a slight generalization of balanced functions considered in [11].

Definition 13 (affine balanced functions).

A function g : {0,1}^m → {0,1,⋆} is p-affine balanced if for each b ∈ {0,1}, there is a distribution μ_b supported on g^{-1}(b) such that for each i ∈ [m], there exist distributions A_b^i on {0,1}^m and B_b^i on {0,1}^{m−1} such that μ_b can be written as the mixture μ_b = (1 − 2p) · A_b^i + 2p · (B_b^i × 𝒰_1). Here 𝒰_1 is a uniform random bit independent of B_b^i, and we think of 𝒰_1 as the bit z_i and B_b^i as assigning the bits z_{[m]∖{i}}.

The above definition says we can sample z from μb in the following way:

  • With probability 1 − 2p, sample z ∼ A_b^i.

  • With probability 2p, set z_i uniformly at random, and independently sample z_{[m]∖{i}} ∼ B_b^i.

Observation 14.

If g : {0,1}^m → {0,1,⋆} is k-stifled, then it is k/(2m)-affine balanced.

Proof.

The distribution μ_b witnessing that g is k/(2m)-affine balanced is defined in the following way. Select a subset S of [m] of size k uniformly at random. Fix the bits outside S according to a domain b-certificate for g (which can be done by the assumption that g is k-stifled). Finally, for each i ∈ S, pick x_i independently and uniformly at random. Each i ∈ [m] is included in S with probability k/m, and conditioned on being included in the first step, it is fixed to each c ∈ {0,1} with probability 1/2, independently of the other bits.

We now prove the lifting theorem for randomized PDTs. In the proof below, we do not give a truly online simulation but after each query, we simplify the PDT being simulated. This is primarily done to make it easy to verify correctness and analyze the number of queries made. We could alternatively have given a simulation closer to [16, 6] by keeping a list of parity queries made during the simulation.

Proposition 15.

Let g : {0,1}^m → {0,1,⋆} be a p-affine balanced function. For any relation ℛ ⊆ {0,1}^n × 𝒪,

R̄Rank_ε^{⊕-dt}(ℛ∘g) ≥ p · R̄_ε^{dt}(ℛ).

Proof.

Let 𝒯 be a randomized parity decision tree computing ℛ∘g. For the induction below, it will be convenient to allow each parity query to also involve a constant term. Let μ_0 and μ_1 be distributions on g^{-1}(0) and g^{-1}(1) respectively showing that g is p-affine balanced. For each x ∈ {0,1}^n, define μ_x to be the distribution on ({0,1}^m)^n given by independently sampling, for each i ∈ [n], the block z_i from μ_{x_i}. We will define a randomized decision tree 𝒯′ computing ℛ by simulating 𝒯 on the distribution μ_x. The correctness of 𝒯′ follows from the correctness of 𝒯.

𝒯′ is defined in the following way. First sample a deterministic parity decision tree T from 𝒯. We simulate it by a randomized decision tree in the following way. Suppose the query ⊕_{(i′,j′)∈S} z_{i′,j′} + c at the root of T involves a variable z_{i,j}. Query the variable x_i and suppose its value is b ∈ {0,1}. We will now set the variables in the block z_i according to the distribution μ_b in the following way. Recall that μ_b can be written as a mixture (1 − 2p) · A_b^j + 2p · (B_b^j × 𝒰_1), where A_b^j is a distribution on strings in g^{-1}(b) and B_b^j is a distribution on domain b-certificates that leave the jth bit free. We set z_i as follows:

  1. With probability 1 − 2p, set z_i according to A_b^j.

  2. With the remaining probability 2p, set the bits z_{i,j′} for j′ ≠ j according to the distribution B_b^j and independently "set" ⊕_{(i′,j′)∈S} z_{i′,j′} = c′ for a uniform random bit c′.

In the second case, even though the parity may depend on variables from blocks other than i, we may informally think of it equivalently as fixing z_{i,j} = ⊕_{(i′,j′)∈S∖{(i,j)}} z_{i′,j′} + c + c′. Note that irrespective of the distribution of the blocks other than i, it is indeed true that the above parity is equally likely to be 0 or 1 if we are in the second case, since z_{i,j} is a uniform random bit (even after conditioning on all the other blocks) which appears in the parity. Additionally, note that after conditioning on z_{i,j} = ⊕_{(i′,j′)∈S∖{(i,j)}} z_{i′,j′} + c + c′ and the other bits of z_i, the distribution on all blocks other than i is still ∏_{i′∈[n]∖{i}} μ_{x_{i′}}, since block i is independent of the rest.

Once we have set z_i as above (where possibly z_{i,j} is a linear form depending on other blocks), we substitute these values in the tree T and simplify appropriately. Specifically, if any query node becomes a constant, we remove it and directly attach the appropriate child to its parent. In particular, if we are in the second case, then the query at the root is set to a random c′ and we move to that child. Note that this simplification preserves the action of the tree T on the distribution μ_x when conditioned on the revealed z_i.

Since the distribution on the other blocks stays ∏_{i′∈[n]∖{i}} μ_{x_{i′}}, we can now repeat this process with the query at the new root (which may be the same as the previous one if we were in case 1, with all variables from block z_i removed). This is done until T has been reduced to a leaf, and we give the same output in 𝒯′.
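A single round of this process can be summarized as follows (a sketch; sample_A and sample_B are hypothetical stand-ins for the samplers of A_b^j and B_b^j, and the caller remains responsible for the substitution and simplification of T described above):

```python
import random

def set_block(b, p, sample_A, sample_B):
    """One round of the simulation in Proposition 15. We have just
    queried x_i = b because bit j of block i appears in the current
    parity query. In case 1 the block is fully fixed and the parity
    still has to be evaluated later; in case 2 bit j stays free, so
    the queried parity becomes a fresh uniform coin c'."""
    if random.random() < 1 - 2 * p:
        return sample_A(b), None        # case 1: z_i ~ A_b^j
    z_rest = sample_B(b)                # case 2: z_{i,j'} for j' != j
    c_prime = random.randint(0, 1)      # the parity's uniform value
    return z_rest, c_prime
```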

We now analyze the expected number of queries made by 𝒯′ in simulating T and show that it is at most rank(T)/p on any input x. We will show by induction that a PDT of rank at most r which only depends on variables from at most ℓ blocks is simulated using at most Q(r, ℓ) ≤ r/p queries in expectation.

Since a PDT which has rank 0 or does not depend on any block is just a leaf, the statement holds whenever r = 0 or ℓ = 0. Suppose the statement holds for all pairs (r′, ℓ′) with ℓ′ < ℓ. Let T be a PDT of rank at most r depending on at most ℓ blocks. Suppose x_i is the first variable queried by the above simulation because of some z_{i,j} appearing in the query at the root.

Consider what happens after we simplify the tree T based on the sampled z_i. In all cases, the rank of the resulting tree, say T_1, is at most r, since the rank cannot increase by removing parts of the tree, and the number of blocks on which the tree depends has decreased by 1. Since case 2 while sampling z_i happens with probability 2p, with probability at least p we go down the marked edge, in which case the rank of T_1 is at most r − 1. Thus, the expected number of queries made in simulating T is at most

Q(r, ℓ) ≤ 1 + (1−p)·Q(r, ℓ−1) + p·Q(r−1, ℓ−1) ≤ 1 + (1−p)·r/p + p·(r−1)/p = r/p (by induction).

Since the simulation of a randomized PDT 𝒯 corresponds to a distribution over trees simulating the deterministic PDTs T, we get that on any input x, the expected number of queries made is at most 𝔼_{T∼𝒯}[rank(T)/p], which is R̄Rank_ε^{⊕-dt}(ℛ∘g)/p if we take an optimal randomized PDT 𝒯 for ℛ∘g.

 Remark 16.

As pointed out by a reviewer, we can alternatively get a bound on the number of queries made during the simulation in Proposition 15 in the following way. Each time an actual query is made, with probability at least p, we go down the marked edge in the PDT being simulated, thereby contributing to the number of marked edges crossed. This implies that the expected number of marked edges crossed in the PDT when simulating it on the distribution μx is at least p times the expected number of queries made which gives what we wanted. This argument also shows that instead of considering expected rank of the PDT we could have considered the expected cost on the worst-case input.

Combining Observation 14 and Proposition 15, we get the following lifting theorem for randomized PDTs.

Theorem 17.

Suppose g : {0,1}^m → {0,1,⋆} is k-stifled. For any relation ℛ ⊆ {0,1}^n × 𝒪,

log R̄Size_ε^{⊕-dt}(ℛ∘g) ≥ R̄Rank_ε^{⊕-dt}(ℛ∘g) ≥ (k/2m) · R̄_ε^{dt}(ℛ).
 Remark 18.

The factor-m loss in the lower bound in Theorem 17 is necessary, at least when we allow g to be a partial function, as the following example shows. We will take the outer function to be the parity ⊕_n and the inner function to be the approximate majority ApproxMAJ_{m,k}, which is defined as follows: ApproxMAJ(y) is 0 if |y| ≤ k, 1 if |y| ≥ m − k, and ⋆ otherwise. Note that k here denotes the ends rather than the gap.

When kn ≤ m/4 and ε = 1/4, the lower bound from Theorem 17 is just a constant. For this regime of parameters, there is a PDT computing ⊕_n ∘ ApproxMAJ_{m,k} which makes 1 parity query. For each block i ∈ [n], pick j_i ∈ [m] uniformly. For each i ∈ [n], except with probability at most k/m, we have x_{i,j_i} = ApproxMAJ_{m,k}(x_i). The PDT simply outputs the parity of (x_{1,j_1}, x_{2,j_2}, …, x_{n,j_n}). The error probability is at most nk/m ≤ 1/4 by the union bound.
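The error calculation in this remark is easy to check empirically. The sketch below simulates the single-parity-query PDT on the natural worst-case input under the promise, where every block is a 1-input with exactly k zeros (all names here are ours):

```python
import random

def one_query_pdt_error(n, m, k, trials=100000):
    """Monte Carlo estimate of the error of the 1-parity-query PDT in
    Remark 18. Each sampled bit disagrees with its block's value with
    probability k/m, and the PDT errs iff an odd number of the n
    sampled bits disagree; the union bound gives error at most nk/m."""
    errors = 0
    for _ in range(trials):
        disagreements = sum(random.randrange(m) < k for _ in range(n))
        errors += disagreements % 2
    return errors / trials

# n = 8, m = 256, k = 8 satisfies kn <= m/4, so the error stays below 1/4.
print(one_query_pdt_error(8, 256, 8))
```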

4 Parity decision tree lower bounds via ⋆-depth

In this section, we prove the lifting theorems using ⋆-depth, Theorems 3 and 4.

4.1 Reduction to deterministic ⋆-depth and the Blocker-Certifier game

The proof of the lifting theorem for deterministic PDT size [16] implicitly contains a claim which reduces the task of proving lower bounds on PDT rank to the simpler task of proving lower bounds in a certain query model where one can only query one coordinate at a time but the input is a partial assignment instead of a binary string. This reduction works for all relations and, in particular, does not need the problem to be of composed form.

To describe the reduction, we need some definitions. Let ℛ ⊆ {0,1}^n × 𝒪 be a relation. Define the relation ℛ⋆ ⊆ {0,1,⋆}^n × 𝒪 as follows. For every y ∈ {0,1,⋆}^n,

ℛ⋆(y) = ⋃_{x∈{0,1}^n : x∈y} ℛ(x).

In words, o ∈ 𝒪 is a correct output on y if there is some x consistent with y for which o is a correct output according to the relation ℛ.

For a Boolean function f : {0,1}^n → {0,1,⋆}, f⋆ has the following simple description. The input to the partial function f⋆ : {0,1,⋆}^n → {0,1,⋆} is promised to be a domain certificate of f, and the goal is to output whether it is a 0-certificate or a 1-certificate. Recall that for a partial function f, for ρ ∈ {0,1,⋆}^n to be a domain certificate, we require that the subcube corresponding to ρ is completely contained in the domain of f.

Similar to 0-depth and 1-depth for ordinary decision trees, we may define ⋆-depth for decision trees on {0,1,⋆}^n. For a relation ℛ ⊆ {0,1,⋆}^n × 𝒪, let D^{⋆-dt}(ℛ) be the smallest number of ⋆'s that any deterministic decision tree (which is only allowed to query one index at a time) computing ℛ must see in the worst case. Similarly, let R_ε^{⋆-dt}(ℛ) and R̄_ε^{⋆-dt}(ℛ) be the analogous randomized query complexity measures when computing ℛ to error ε.

Since we mainly care about the ⋆-depth of relations of the form ℛ⋆, we now introduce a game capturing D^{⋆-dt}(ℛ⋆), called the Blocker-Certifier game. This is essentially obtained by specializing the usual Querier-Adversary game corresponding to decision tree depth to our setting. However, since the score only depends on the number of ⋆'s, we may allow the Adversary to fix positions to 0 or 1 before they are queried (similar to some Delayer strategies in the Prover-Delayer game), and then Querier only picks a coordinate to be fixed to ⋆. In the game below, Blocker's role is similar to that of Querier (or Prover) and Certifier corresponds to an Adversary (or Delayer).

Let ℛ̄ denote ({0,1}^n × 𝒪) ∖ ℛ, the complement of ℛ. The Blocker-Certifier game for ℛ is played on a string s ∈ {0,1,⋆,∗}^n which is initially ∗^n. The game is played in rounds. In a round,

  1. Certifier picks a subset S ⊆ {i ∈ [n] : s_i = ∗} (possibly empty) and, for each i ∈ S, sets s_i = b_i for some b_i ∈ {0,1}.

  2. Blocker picks an i ∈ [n] such that s_i = ∗ and sets s_i = ⋆.

The game ends when the following situation arises. There is some o ∈ 𝒪 such that for every way of fixing the remaining ∗'s in s to bits in {0,1}, there is a way to fix the ⋆'s in s to bits such that the resulting string x satisfies (x, o) ∈ ℛ. In other words, the game has not ended if for every o ∈ 𝒪, Certifier can fix the remaining ∗'s to bits to get an o-certificate for ℛ̄. Certifier's score is the number of ⋆'s in s at the end of the game. The Blocker-Certifier value BCval(ℛ) is the maximum score guaranteed by a Certifier strategy for the Blocker-Certifier game on ℛ. The equivalence between the Blocker-Certifier game and the usual Querier-Adversary game in this setting (or equivalently the definition of ⋆-depth) is proved in the full version.
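When the relation is a total function on few bits, the end condition can be tested by brute force, which makes its quantifier structure explicit. A minimal sketch (the encoding is ours; 'blocked' plays the role of ⋆ and 'unset' the role of ∗):

```python
from itertools import product

def game_over(s, f, m):
    """End test for the Blocker-Certifier game when the relation is a
    total function f on m bits. s is a list over {0, 1, 'blocked',
    'unset'}: the game has ended iff some output o works, i.e. however
    the unset positions are filled with bits, the blocked positions can
    be completed to an o-input of f."""
    unset = [i for i in range(m) if s[i] == 'unset']
    blocked = [i for i in range(m) if s[i] == 'blocked']

    def completions(base, idxs):
        for vals in product((0, 1), repeat=len(idxs)):
            y = list(base)
            for i, v in zip(idxs, vals):
                y[i] = v
            yield y

    for o in (0, 1):
        if all(any(f(tuple(z)) == o for z in completions(y, blocked))
               for y in completions(s, unset)):
            return True
    return False
```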

We can now relate the PDT rank of a relation ℛ and the ⋆-depth of ℛ⋆. In the proof below, we use a Certifier strategy in the Blocker-Certifier game to give a parity Delayer strategy, but the argument can also be expressed as a simulation.

Lemma 19 (implicit in [16]).

For any relation ℛ,

DRank^{⊕-dt}(ℛ) ≥ BCval(ℛ) = D^{⋆-dt}(ℛ⋆).

Proof.

Suppose we have a Certifier strategy for the Blocker-Certifier game on ℛ scoring r points. We will use this to give a Delayer strategy for the parity Prover-Delayer game on ℛ achieving the same score.

Delayer essentially imitates the Certifier strategy by localizing the parity queries of Prover to the input z, so that they may be treated as positions that have been touched by Blocker in the Blocker-Certifier game. Delayer will have fixed some bits in z to 0 or 1, while some positions will have been marked (denoted ⋆) based on the queries made by Prover. Each such marked position corresponds to a linear equation z_i = b + ⊕_{i′∈S} z_{i′} coming from a parity query, where z_i is marked and none of the positions in S were marked at the time of the query.

We now explain this in detail. We will use x to denote the string in the Blocker-Certifier game. L will denote a collection of linear equations as explained above, which starts off empty. In the beginning, Delayer fixes bits in z exactly according to the move made by Certifier at the start of the Blocker-Certifier game. On a parity query ⊕_{i∈S} z_i, Delayer first simplifies this parity query according to the previously fixed bits to get a parity b + ⊕_{i∈S′} z_i, where b ∈ 𝔽₂ and all variables in S′ are still free. If S′ = ∅, then Delayer simply responds with b. Otherwise, Delayer arbitrarily marks some i ∈ S′ and responds with ∗. Suppose Prover responds with c ∈ 𝔽₂. Then the equation z_i = b + c + ⊕_{i′∈S′ : i′≠i} z_{i′} is added to L.

Next, in the Blocker-Certifier game, Blocker sets x_i to ⋆, to which Certifier responds by (possibly) fixing some other bits of x. As before, Delayer fixes the corresponding variables in z in the same way as in x. Later queries of Prover are handled in essentially the same way as before, but the simplification now also has to remove any marked variables appearing in the query by substituting suitable parities using the appropriate equations in L.

We claim the Prover-Delayer game cannot end unless the corresponding Blocker-Certifier game is over. Suppose fewer than r variables have been marked so far. For every o ∈ 𝒪, we will exhibit an input z consistent with all the parity queries made so far such that (z, o) ∉ ℛ. Since fewer than r variables are set to ⋆ in the Blocker-Certifier game, there is a way to fix the remaining free variables in x such that for all inputs y consistent with the obtained partial assignment x, (y, o) ∉ ℛ. Now consider the string z obtained by fixing all the marked bits according to the equations in L. Since the marked variables are the pivots of these equations, such an extension indeed exists. By construction, this z satisfies all the parity queries made so far and is consistent with the partial assignment x. Thus Delayer can always score at least r points.

To get a lifting theorem for PDT rank, the above lemma can now be combined with a lower bound on the Blocker-Certifier value for composed problems. First, note that D^{⋆-dt}((ℛ∘g)⋆) ≥ D^{⋆-dt}(ℛ⋆∘g⋆), since in the problem ℛ⋆∘g⋆ we are only required to be correct when each block lies in the domain of g⋆, i.e. is a domain certificate of g. To get the lifting theorem for any k-stifled gadget g, [16] use that D^{⋆-dt}(ℛ⋆∘g⋆) ≥ k · D^{dt}(ℛ). This inequality is in the same spirit as D^{dt}(ℛ∘g) ≥ D^{dt}(ℛ) · C(g) or DRank^{dt}(ℛ∘g) ≥ D^{dt}(ℛ) · (C_min(g) − 1). Similar to the case of decision tree depth or rank for composed problems, we can get an essentially tight lower bound on the ⋆-depth of composed problems.

Lemma 20.

For any relation ℛ ⊆ {0,1}^n × 𝒪 and any function g : {0,1,⋆}^m → {0,1,⋆},

D^{⋆-dt}(ℛ∘g) ≥ D^{dt}(ℛ) · (D^{⋆-dt}(g) − 1).

This is proved in the same way as the usual composition theorem for deterministic (ordinary) decision tree depth [38, 40, 32]. For completeness, we sketch a proof of a more general composition theorem in Appendix B of the full version which implies Lemma 20.

Combining Lemmas 19 and 20, we obtain the following.

Theorem 21.

For any relation ℛ ⊆ {0,1}^n × 𝒪 and any function g : {0,1}^m → {0,1,⋆},

DRank^{⊕-dt}(ℛ∘g) ≥ D^{dt}(ℛ) · (D^{⋆-dt}(g⋆) − 1).

The unique disjointness function UDISJ_{2m} is an example of a function for which the Blocker-Certifier value is much larger than how stifled it is. Recall that UDISJ_{2m} : ({0,1}²)^m → {0,1,⋆} is the partial function such that UDISJ(x_1, x_2, …, x_m) = ⋁_{i∈[m]} (x_i^1 ∧ x_i^2), where we are promised that there is at most one i ∈ [m] such that x_i^1 ∧ x_i^2 = 1. This is a subfunction of both Inner Product and Disjointness. It is easy to see that UDISJ is not 2-stifled, since there is no 0-certificate which leaves both x_1^1 and x_1^2 free.

On the other hand, there is a simple Certifier strategy in the Blocker-Certifier game for UDISJ_{2m} which achieves score m. (This strategy is essentially from discussion after the talk [39], but we have been unable to recognize who suggested it.) Initially, Certifier does not fix any bits. Suppose Blocker sets x_i^1 = ⋆ (the case x_i^2 = ⋆ is analogous). Then Certifier responds by setting x_i^2 = 0 to ensure that x_i^1 ∧ x_i^2 = 0. Certifier follows this strategy until the end of the game, ensuring that for each i, either both x_i^1, x_i^2 are unset or at least one of them is fixed to 0. We claim that the game cannot end before m rounds. Indeed, if fewer than m rounds have taken place, then there is some i ∈ [m] such that x_i^1 = x_i^2 = ∗. Moreover, Certifier's strategy ensures that wherever this is not the case, we have x_i^1 = 0 or x_i^2 = 0. Therefore, for any b ∈ {0,1}, by setting x_i^1 = x_i^2 = b and all other unset bits to 0, we obtain a domain b-certificate for UDISJ which is consistent with all the moves made so far.
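This strategy is simple enough to state as code. A minimal sketch of Certifier's response in one round (the dictionary encoding of the game string is ours):

```python
def certifier_move(s, i, side):
    """The Certifier strategy for UDISJ_{2m} described above. s maps
    pairs (i, side) with side in {1, 2} to 0, 1, 'blocked' or 'unset'.
    Blocker has just blocked position (i, side); Certifier zeroes the
    partner bit, so every touched pair still ANDs to 0."""
    s[(i, side)] = 'blocked'
    partner = (i, 3 - side)
    if s[partner] == 'unset':
        s[partner] = 0
    return s

# After any r < m rounds some pair (i, *) is fully unset, and setting
# both its bits to b and all other unset bits to 0 gives a domain
# b-certificate consistent with the play so far, so Certifier scores m.
```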

4.2 Reduction to randomized ⋆-depth

The following lemma is the randomized analogue of Lemma 19.

Lemma 22.

For any relation ℛ ⊆ {0,1}^n × 𝒪,

R̄Rank_ε^{⊕-dt}(ℛ) ≥ (1/2) · R̄_ε^{⋆-dt}(ℛ⋆).

Proof.

Let 𝒯 be a randomized PDT computing ℛ to error ε. We will give a randomized decision tree 𝒯′ for computing ℛ⋆ with expected ⋆-depth at most 2·rank̄(𝒯). To do this, on any input z ∈ {0,1,⋆}^n, we will simulate 𝒯 on the distribution μ_z, which is the uniform distribution over all strings in the subcube defined by z.

By the definition of ℛ⋆, 𝒯′ makes an error only when 𝒯 makes an error on the input from μ_z being simulated. Therefore,

Pr[𝒯′ makes an error on input z] = 𝔼_{x∼μ_z}[Pr_{T∼𝒯}[T makes an error on x]] ≤ ε.

We now describe how 𝒯′ simulates 𝒯. First sample a deterministic PDT T ∼ 𝒯. The tree will keep track of a list L of linear equations x_i = ⟨α_i, x⟩ + c_i, one for each x_i that has already been queried. The linear form on the right-hand side of any such equation does not depend on any of the variables that have previously been queried. From this description, it is clear that these equations are linearly independent. Moreover, for each such i, if z_i ∈ {0,1}, the corresponding equation in L is exactly x_i = z_i. We will additionally maintain the invariant that the system L is equivalent to the system defined by all the parities from the root to the current node, combined with the equations x_i = z_i for all z_i which have already been queried and are not ⋆.

Starting at the root of T, the tree performs the following steps until a leaf is reached in T.

1. Let $\oplus_{i \in S} x_i$ (for some $S \subseteq [n]$) be the query at the current node $v$ of $T$. Iteratively perform substitutions using the equations in $L$ until the query has been simplified to $c + \oplus_{i \in S'} x_i$, where $c \in \mathbb{F}_2$ and $S'$ contains no variables that have already been queried. Set $U = \emptyset$; it will later store which $i \in S'$ have already been queried.

2. Repeat the following

   • Pick any $i \in S' \setminus U$ and query $z_i$. If $z_i = \star$, go to step 3(a). Otherwise add $i$ to $U$, and $x_i = z_i$ to $L$.

   until all $x_i$, $i \in S'$, have been queried. When this happens, go to step 3(b).

3. (a) ($z_i = \star$) Pick $b \in \{0,1\}$ uniformly at random. Move to the child of $v$ corresponding to $c + \oplus_{j \in S'} x_j = b$. Add to $L$ the equation $x_i = c + b + \sum_{j \in U} z_j + \sum_{j \in S' \setminus (U \cup \{i\})} x_j$.

   (b) (none of $z_i$, $i \in S'$, is $\star$) Since all $z_i$'s in the parity have been determined, move to the child corresponding to $c + \oplus_{j \in S'} x_j = c + \oplus_{j \in S'} z_j$.

The output is the same as the label of the leaf reached in T.
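To make the bookkeeping concrete, here is a short Python sketch of a single round of the simulation (our illustration, not code from the paper; the representation of $L$ as a map from a queried variable to a pair of a variable set and a constant, and the names simulate_round, S, z, are ours, with '*' standing for $\star$).

import random

def simulate_round(S, L, z):
    """S: indices of the parity queried at the current node of T.
    L: dict mapping an already-queried variable i to (J, c), encoding
       x_i = c + sum_{j in J} x_j over F_2.
    z: the input, a sequence over {0, 1, '*'}.
    Returns the bit labelling the child of the current node to move to."""
    # Step 1: substitute equations of L until the parity contains no
    # previously queried variable; c tracks the constant over F_2.
    c, Sp, stack = 0, set(), list(S)
    while stack:
        j = stack.pop()
        if j in L:                  # expand x_j using its stored equation
            J, cj = L[j]
            c ^= cj
            stack.extend(J)
        elif j in Sp:
            Sp.discard(j)           # x_j + x_j = 0 over F_2
        else:
            Sp.add(j)
    # Step 2: query the surviving variables one by one.
    U = set()
    for i in sorted(Sp):
        if z[i] == '*':
            # Step 3(a): a free coordinate; answer the parity with a coin
            # flip and record the induced equation for x_i.
            b = random.randint(0, 1)
            L[i] = (Sp - U - {i}, (c + b + sum(z[j] for j in U)) % 2)
            return b
        U.add(i)
        L[i] = (set(), z[i])        # record the equation x_i = z_i
    # Step 3(b): all z_i, i in S', are determined, hence so is the parity.
    return (c + sum(z[j] for j in U)) % 2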

Lemma 23 (proved below) shows that each leaf of $T$ is reached with the correct probability according to the distribution $\mu_z$. As argued earlier, this implies the correctness of $\mathcal{T}'$.

We now analyze the expected $\star$-depth of the above randomized decision tree simulating $T$. Note that in each round, the tree sees one $\star$ if we reach step 3(a) and otherwise no $\star$'s. We will keep track of the number of marked edges seen as a measure of progress. In step 3(b), the number of $\star$'s seen does not change in that round. On the other hand, in step 3(a), since we move to a random child, with probability at least $1/2$ we move down the marked edge in $T$. Thus, each $\star$-query yields a marked edge with probability at least $1/2$, so in expectation we see at most 2 $\star$'s per marked edge traversed. Since the maximum number of marked edges on any root-to-leaf path is $\mathrm{rank}(T)$, by linearity of expectation a leaf is reached after at most $2\,\mathrm{rank}(T)$ $\star$-queries in expectation.

Lemma 23.

Let $T$ be a deterministic PDT on $\{0,1\}^n$. Let $z \in \{0,1,\star\}^n$. Let $W_z(v)$ be the event that node $v$ of $T$ is visited by the randomized procedure described in the proof of Lemma 22 when run on input $z$. Let $V_z(v)$ be the event that for a random $x \sim \mu_z$, running $T$ on $x$ reaches $v$. Then for every $z \in \{0,1,\star\}^n$ and every node $v$ of $T$, we have $\Pr[W_z(v)] = \Pr[V_z(v)]$.

Proof.

Fix $z \in \{0,1,\star\}^n$. The proof is by induction on the depth of $v$. The statement holds when $v$ is the root since in this case, $\Pr[W_z(v)] = \Pr[V_z(v)] = 1$.

Now suppose $v$ has depth at least 1. Let $w$ be its parent. If $\Pr[W_z(w)] = 0$, then by induction $\Pr[V_z(w)] = 0$; since $v$ can only be reached through $w$, both $\Pr[W_z(v)]$ and $\Pr[V_z(v)]$ are 0 and therefore equal. Hence, we may assume that $\Pr[W_z(w)] = \Pr[V_z(w)] > 0$. We can write $\Pr[V_z(v)] = \Pr[V_z(w)] \cdot \Pr[V_z(v) \mid V_z(w)]$ and $\Pr[W_z(v)] = \Pr[W_z(w)] \cdot \Pr[W_z(v) \mid W_z(w)]$. So it suffices to prove that $\Pr[V_z(v) \mid V_z(w)] = \Pr[W_z(v) \mid W_z(w)]$.

Let $L_w$ be the list $L$ of equations at the beginning of the round where the current node is $w$ during the execution of the decision tree simulation with $z$ as the input string. Let $Q_w$ be the set of all $z_i$ that were queried before reaching $w$ and which are not $\star$. Let $\langle \alpha, x \rangle = \oplus_{j \in S} x_j$ be the parity query at $w$ and let $b$ be such that $v$ is the child corresponding to $\oplus_{j \in S} x_j = b$. So $\Pr[V_z(v) \mid V_z(w)]$ is the probability that for a random $x \sim \mu_z$, $\oplus_{j \in S} x_j = b$ conditioned on all the equations from the root to $w$ being satisfied. Note that since $x_i = z_i$ whenever $z_i \in \{0,1\}$, we may additionally condition on any subset of these fixed $x_i$'s being the corresponding $z_i$'s. By the invariants, the system of equations $L_w$ is equivalent to the system containing the equations describing the parities from the root to $w$ together with the equations $x_i = z_i$ for the $z_i \in Q_w$. Therefore, by abuse of notation, we may express $\Pr[V_z(v) \mid V_z(w)]$ as $\Pr[\oplus_{j \in S} x_j = b \mid L_w]$, where we view $L_w$ as the event that all equations in $L_w$ hold. Moreover, under $L_w$, the parity $\oplus_{j \in S} x_j$ is equal to $c + \oplus_{j \in S'} x_j$ for some $S'$ as in step 1 of the round. So we have $\Pr[V_z(v) \mid V_z(w)] = \Pr_{x \sim \mu_z}[c + \oplus_{j \in S'} x_j = b \mid L_w]$.

Now we only need to verify that, when simulating the query at node $w$, we go to $v$ with the correct probability $\Pr_{x \sim \mu_z}[c + \oplus_{j \in S'} x_j = b \mid L_w]$. There are two cases to consider:

1. For all $i \in S'$, we have $z_i \in \{0,1\}$. In this case, all $z_i$ are queried and step 3(b) is executed in the round starting at $w$. So we move to the correct child with probability $1 = \Pr_{x \sim \mu_z}[c + \oplus_{j \in S'} x_j = c + \oplus_{j \in S'} z_j \mid L_w]$.

2. Step 3(a) is executed in the round starting at $w$. In this case, there is some $i \in S'$ such that $z_i = \star$. Since $c + \oplus_{j \in S'} x_j$ is independent of $L_w$ by construction, $\Pr_{x \sim \mu_z}[c + \oplus_{j \in S'} x_j = b \mid L_w] = 1/2$, which is exactly the probability with which the simulation moves to each child.

This finishes the proof.

We now combine Lemma 22 with a composition theorem for randomized $\star$-depth. Recall that a tight composition theorem does not hold in general for ordinary decision trees when composing a relation with a partial function, so we cannot have such a statement for $\star$-depth either (by lifting with, say, $\widehat{\mathrm{MAJ}_3}$). However, we can still adapt known randomized composition theorems to the setting of $\star$-depth. In Appendix B of the full version, we adapt the composition theorem of [8] to prove a composition theorem for a general class of decision trees. This composition theorem provides the best dependence on the inner function, up to a constant multiplicative factor and an additive constant.

Lemma 24 (following [8]).

For any relation $\mathcal{R} \subseteq \{0,1\}^n \times \mathcal{O}$ and any function $g : \{0,1,\star\}^m \to \{0,1,\star\}$,

$\overline{R}^{\star\text{-dt}}_{\epsilon}(\mathcal{R} \circ g) \ge \Omega\big(\overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot (\mathrm{LR}_{\star}(g) - O(1))\big).$

Combining Lemmas 22 and 24, we get the following.

Theorem 25.

For any relation $\mathcal{R} \subseteq \{0,1\}^n \times \mathcal{O}$ and any function $g : \{0,1\}^m \to \{0,1,\star\}$,

$\overline{\mathrm{RRank}}^{\oplus\text{-dt}}_{\epsilon}(\mathcal{R} \circ g) \ge \Omega\big(\overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot (\mathrm{LR}_{\star}(\hat{g}) - O(1))\big).$

Using Theorem 25 or some other related composition theorem, we can show that, for instance, when the inner function is UDISJ or IND, the obvious upper bound on the PDT rank of $\mathcal{R} \circ g$ is optimal. Note that both $\mathrm{IP}_{2m}$ and $\mathrm{DISJ}_{2m}$ are total functions extending $\mathrm{UDISJ}_{2m}$, so the lower bound for UDISJ also implies the lower bounds for inner product and disjointness in Corollary 5.

Corollary 26.

For any relation $\mathcal{R} \subseteq \{0,1\}^n \times \mathcal{O}$ and for any $m \ge 2$,

$\overline{\mathrm{RRank}}^{\oplus\text{-dt}}(\mathcal{R} \circ \mathrm{UDISJ}_{2m}) = \Theta(\overline{R}^{\mathrm{dt}}(\mathcal{R}) \cdot m),$
$\overline{\mathrm{RRank}}^{\oplus\text{-dt}}(\mathcal{R} \circ \mathrm{IND}_{m+2^m}) = \Theta(\overline{R}^{\mathrm{dt}}(\mathcal{R}) \cdot m).$

Proving the lower bound for UDISJ will suffice to prove it also for IND, since $\mathrm{IND}_{2m+2^{2m}}$ contains $\mathrm{UDISJ}_{2m}$ as a subfunction. By Theorem 25, it is sufficient to prove $\mathrm{LR}_{\star}(\widehat{\mathrm{UDISJ}_{2m}}) \ge \Omega(m)$. Instead of showing this directly, we will show that the simpler quantity, the sabotage $\star$-complexity of $\widehat{\mathrm{UDISJ}_{2m}}$, is $\Omega(m)$.

Sabotage complexity was first defined by Ben-David and Kothari [9] for ordinary decision trees, and here we consider its natural $\star$-analogue. Sabotage $\star$-complexity $R^{\star}_{\mathrm{sab}}(g)$ is the expected zero-error query complexity of the following task. Let $g : \{0,1,\star\}^m \to \{0,1,\star\}$ be a partial function. Given a string $z \in \{0,1,\star,\dagger\}^m$, where we interpret $\dagger$ as representing that the coordinate is free, find a $\dagger$ in $z$ under the promise that $z$ is consistent with some 0-input $x$ and some 1-input $y$. It can alternatively be characterized as $R^{\star}_{\mathrm{sab}}(g) = \max_{\mu} \min_{T} \mathbb{E}_{(x,y) \sim \mu}[\mathrm{sep}_T(x,y)]$ [23, Theorem B.1], where $\mu$ varies over distributions on pairs in $g^{-1}(0) \times g^{-1}(1)$, $T$ varies over deterministic decision trees solving $g$, and $\mathrm{sep}_T(x,y)$ denotes the number of marked edges ($\star$-queries) on the path from the root to the node $v$ where $x$ and $y$ separate. More precisely, $v$ is the unique node in $T$ such that both $x$ and $y$ reach $v$ but they disagree on the query made at node $v$.
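For concreteness, $\mathrm{sep}_T$ is easy to compute given an explicit tree; the following Python sketch is ours, with an assumed representation of decision trees over the alphabet $\{0,1,\star\}$ (a node is a pair of a queried index and a dict of children keyed by 0, 1 and '*'; a leaf is None).

def sep(tree, x, y):
    """Number of marked (star) edges on the common path of x and y."""
    marked, node = 0, tree
    while node is not None:
        i, children = node
        if x[i] != y[i]:
            return marked       # the node where x and y separate
        if x[i] == '*':
            marked += 1         # both inputs traverse a marked edge
        node = children[x[i]]
    return marked               # x and y reach the same leaf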

The composition theorem using sabotage complexity [9] is straightforward to adapt to $\star$-decision trees, giving $\overline{R}^{\star\text{-dt}}_{\epsilon}(\mathcal{R} \circ g) \ge \overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot R^{\star}_{\mathrm{sab}}(g)$, so we omit it.

Lemma 27.

$R^{\star}_{\mathrm{sab}}(\widehat{\mathrm{UDISJ}_{2m}}) \ge \frac{m-1}{4}.$

Proof.

For brevity, let $h_m$ denote $\widehat{\mathrm{UDISJ}_{2m}}$. The hard distribution $\mu_m$ is generated as follows. First sample $z \in (\{0,1,\star,\dagger\}^2)^m$ in the following way. Pick $i \in [m]$ uniformly. Set $z_i$ to $(1,\dagger)$ or $(\dagger,1)$ uniformly. For each $j \ne i$, independently set $z_j$ to $(0,\star)$ or $(\star,0)$ uniformly. Finally, obtain $x$ by replacing the $\dagger$ in $z$ by 0 and $y$ by replacing the $\dagger$ by 1. Observe that after conditioning on $i \ne m$, the distribution on $z_1 \cdots z_{m-1}$ is exactly what we would get if we performed the above procedure for $m-1$. This will let us use induction.
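The sampling procedure can be written out directly; in this sketch (ours), '*' is the free value $\star$ in the domain of $h_m$ and None plays the role of $\dagger$.

import random

def sample_mu(m):
    """Sample (x, y) ~ mu_m: x is a 0-input and y a 1-input of h_m,
    differing only in the daggered coordinate of the chosen block i."""
    i = random.randrange(m)
    x, y = [], []
    for j in range(m):
        if j == i:
            block = random.choice([(1, None), (None, 1)])   # None = dagger
        else:
            block = random.choice([(0, '*'), ('*', 0)])
        x.extend(0 if v is None else v for v in block)      # dagger -> 0
        y.extend(1 if v is None else v for v in block)      # dagger -> 1
    return x, y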

Let $\ell(m) = \min_T \mathbb{E}_{\mu_m}[\mathrm{sep}_T(x,y)]$. We will show that $\ell(m) \ge \frac{m-1}{4}$ by induction. The base case $m = 1$ is clear. For the induction step, we start by making some simplifying assumptions about $T$, since we only care about the cost of $T$ on the distribution $\mu_m$. Since our distribution is invariant under permuting blocks and permuting bits within a block, we may assume that the query at the root of $T$ is $w_m^1$ (we use $w$ to denote the queries in $T$ to avoid confusion with $x, y, z$). In the subtree where $w_m^1 = 0$, for any query to $w_m^2$, we remove it and directly attach its parent to the subtree where $w_m^2 = \star$. We do the same with the roles of 0 and $\star$ interchanged. Note that this does not affect correctness of $T$ on $\mu_m$ since in any pair $(x,y)$ in the support of $\mu_m$, if $x_m^1 = y_m^1 = 0$, then also $x_m^2 = y_m^2 = \star$, and similarly the other way around. Also, the cost of $T$ does not increase by performing this simplification.

By the observation above, since the distribution $\mu_m$ conditioned on $i \ne m$ is identical to $\mu_{m-1}$, the subtrees where $x_m^1 = 0$ and $x_m^1 = \star$ give trees solving the separation task on the distribution $\mu_{m-1}$. Moreover, when $i \ne m$, the root query $w_m^1$ returns $\star$ with probability $1/2$, contributing one marked edge. Therefore, we have the recurrence

$\ell(m) \ge \frac{m-1}{m}\left(\frac{1}{2} + \ell(m-1)\right),$

which by induction gives $\ell(m) \ge \frac{m-1}{m}\left(\frac{1}{2} + \frac{m-2}{4}\right) = \frac{m-1}{m} \cdot \frac{m}{4} = \frac{m-1}{4}$.

Proof of Corollary 26.

The upper bounds follow from simulating a randomized decision tree $T$ for $\mathcal{R}$ by using a deterministic tree for the inner function at each node of $T$.

For the lower bounds, as stated earlier, a lower bound for $\mathrm{UDISJ}_{2m}$ also implies the same lower bound for $\mathrm{IND}_{2m+2^{2m}}$. So we only need to show the lower bound for $\mathrm{UDISJ}_{2m}$. By combining $\overline{R}^{\star\text{-dt}}_{\epsilon}(\mathcal{R} \circ g) \ge \overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot R^{\star}_{\mathrm{sab}}(g)$ with Lemma 27, we get $\overline{R}^{\star\text{-dt}}_{\epsilon}(\mathcal{R} \circ \widehat{\mathrm{UDISJ}_{2m}}) = \Omega(\overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot m)$. Now using this (together with the observation that $\overline{R}^{\star\text{-dt}}_{\epsilon}(\widehat{\mathcal{R} \circ \mathrm{UDISJ}_{2m}}) \ge \overline{R}^{\star\text{-dt}}_{\epsilon}(\mathcal{R} \circ \widehat{\mathrm{UDISJ}_{2m}})$) with Lemma 22, we get $\overline{\mathrm{RRank}}^{\oplus\text{-dt}}_{\epsilon}(\mathcal{R} \circ \mathrm{UDISJ}_{2m}) = \Omega(\overline{R}^{\mathrm{dt}}_{\epsilon}(\mathcal{R}) \cdot m)$.

Remark 28.

The simulation using stifling gadgets, Theorem 17, can be understood as using the fact that for a $k$-stifled function $g$ on $m$ bits, $R^{\star}_{\mathrm{sab}}(\hat{g}) \ge k/m$. Indeed, if we couple the distributions of certificates underlying $\mu_0$ and $\mu_1$ used in that proof according to the set of coordinates that are fixed, any decision tree correctly computing $\hat{g}$ must see a $\star$ on the first query with probability $k/m$.

References

  • [1] Scott Aaronson, Shalev Ben-David, and Robin Kothari. Separations in query complexity using cheat sheets. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 863–876, 2016. doi:10.1145/2897518.2897644.
  • [2] Yaroslav Alekseev, Yuval Filmus, and Alexander Smal. Lifting Dichotomies. In Rahul Santhanam, editor, 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1–9:18, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2024.9.
  • [3] Yaroslav Alekseev and Dmitry Itsykson. Lifting to bounded-depth and regular resolutions over parities via games. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC ’25, pages 584–595, New York, NY, USA, 2025. Association for Computing Machinery. doi:10.1145/3717823.3718150.
  • [4] Anurag Anshu, Dmitry Gavinsky, Rahul Jain, Srijita Kundu, Troy Lee, Priyanka Mukhopadhyay, Miklos Santha, and Swagato Sanyal. A composition theorem for randomized query complexity. In 37th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2017). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.FSTTCS.2017.10.
  • [5] Andrew Bassilakis, Andrew Drucker, Mika Göös, Lunjia Hu, Weiyun Ma, and Li-Yang Tan. The power of many samples in query complexity. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.ICALP.2020.9.
  • [6] Paul Beame and Sajin Koroth. On Disperser/Lifting Properties of the Index and Inner-Product Functions. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1–14:17, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2023.14.
  • [7] Shalev Ben-David and Eric Blais. A new minimax theorem for randomized algorithms. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 403–411. IEEE, 2020.
  • [8] Shalev Ben-David, Eric Blais, Mika Göös, and Gilbert Maystre. Randomised composition and small-bias minimax. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 624–635. IEEE, 2022. doi:10.1109/FOCS54457.2022.00065.
  • [9] Shalev Ben-David and Robin Kothari. Randomized query complexity of sabotaged and composed functions. Theory of Computing, 14(5):1–27, 2018. doi:10.4086/toc.2018.v014a005.
  • [10] Tyler Besselman, Mika Göös, Siyao Guo, Gilbert Maystre, and Weiqiang Yuan. Direct sums for parity decision trees. In 40th Computational Complexity Conference (CCC 2025), Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2025.16.
  • [11] Sreejata Kishor Bhattacharya, Arkadev Chattopadhyay, and Pavel Dvořák. Exponential Separation Between Powers of Regular and General Resolution over Parities. In Rahul Santhanam, editor, 39th Computational Complexity Conference (CCC 2024), volume 300 of Leibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:32, Dagstuhl, Germany, 2024. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.CCC.2024.23.
  • [12] Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288(1):21–43, 2002. doi:10.1016/S0304-3975(01)00144-X.
  • [13] Sourav Chakraborty, Chandrima Kayal, Rajat Mittal, Manaswi Paraashar, Swagato Sanyal, and Nitin Saurabh. On the composition of randomized query complexity and approximate degree. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.APPROX/RANDOM.2023.63.
  • [14] Arkadev Chattopadhyay, Yuval Filmus, Sajin Koroth, Or Meir, and Toniann Pitassi. Query-to-communication lifting using low-discrepancy gadgets. SIAM Journal on Computing, 50(1):171–210, 2021. doi:10.1137/19M1310153.
  • [15] Arkadev Chattopadhyay, Michal Kouckỳ, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. computational complexity, 28:617–659, 2019. doi:10.1007/S00037-019-00190-7.
  • [16] Arkadev Chattopadhyay, Nikhil S Mande, Swagato Sanyal, and Suhail Sherif. Lifting to parity decision trees via stifling. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.ITCS.2023.33.
  • [17] Arjan Cornelissen, Nikhil S Mande, and Subhasree Patro. Improved quantum query upper bounds based on classical decision trees. In 42nd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2022). Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.FSTTCS.2022.15.
  • [18] Yogesh Dahiya. Exploring Size Complexity and Randomness in the Query Model. PhD thesis, HBNI, 2024. URL: https://www.imsc.res.in/xmlui/handle/123456789/881.
  • [19] Yogesh Dahiya and Meena Mahajan. On (simple) decision tree rank. Theoretical Computer Science, 978:114177, 2023. doi:10.1016/J.TCS.2023.114177.
  • [20] Klim Efremenko, Michal Garlík, and Dmitry Itsykson. Lower bounds for regular resolution over parities. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 640–651, 2024. doi:10.1145/3618260.3649652.
  • [21] Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231–246, 1989. doi:10.1016/0890-5401(89)90001-1.
  • [22] Ankit Garg, Mika Göös, Pritish Kamath, and Dmitry Sokolov. Monotone circuit lower bounds from resolution. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 902–911, 2018. doi:10.1145/3188745.3188838.
  • [23] Dmytro Gavinsky, Troy Lee, Miklos Santha, and Swagato Sanyal. Optimal composition theorem for randomized query complexity. Theory of Computing, 19(1):1–35, 2023. doi:10.4086/TOC.2023.V019A009.
  • [24] Mika Göös, TS Jayram, Toniann Pitassi, and Thomas Watson. Randomized communication versus partition number. ACM Transactions on Computation Theory (TOCT), 10(1):1–20, 2018. doi:10.1145/3170711.
  • [25] Mika Göös, Pritish Kamath, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for P^NP. computational complexity, 28:113–144, 2019. doi:10.1007/S00037-018-0175-5.
  • [26] Mika Göös and Toniann Pitassi. Communication lower bounds via critical block sensitivity. SIAM Journal on Computing, 47(5):1778–1806, 2018. doi:10.1137/16M1082007.
  • [27] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. SIAM Journal on Computing, 47(6):2435–2450, 2018. doi:10.1137/16M1059369.
  • [28] Mika Göös, Toniann Pitassi, and Thomas Watson. Query-to-communication lifting for BPP. SIAM Journal on Computing, 49(4):FOCS17–441, 2020. doi:10.1137/17M115339X.
  • [29] Dmitry Itsykson and Dmitry Sokolov. Resolution over linear equations modulo two. Annals of Pure and Applied Logic, 171(1):102722, 2020. doi:10.1016/J.APAL.2019.102722.
  • [30] Rahul Jain, Hartmut Klauck, and Miklos Santha. Optimal direct sum results for deterministic and randomized decision tree complexity. Information Processing Letters, 110(20):893–897, 2010. doi:10.1016/J.IPL.2010.07.020.
  • [31] Shachar Lovett, Raghu Meka, Ian Mertz, Toniann Pitassi, and Jiapeng Zhang. Lifting with Sunflowers. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022), volume 215 of Leibniz International Proceedings in Informatics (LIPIcs), pages 104:1–104:24, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2022.104.
  • [32] Ashley Montanaro. A composition theorem for decision tree complexity. Chicago Journal of Theoretical Computer Science, 2014(6), 2014. doi:10.4086/cjtcs.2014.006.
  • [33] Toniann Pitassi and Robert Robere. Lifting nullstellensatz to monotone span programs over any field. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1207–1219, 2018. doi:10.1145/3188745.3188914.
  • [34] Vladimir Podolskii and Alexander Shekhovtsov. Randomized lifting to semi-structured communication complexity via linear diversity. In 16th Innovations in Theoretical Computer Science Conference (ITCS), LIPIcs. Schloss Dagstuhl, 2025. doi:10.4230/LIPIcs.ITCS.2025.78.
  • [35] Pavel Pudlák and Russell Impagliazzo. A lower bound for DLL algorithms for k-SAT (preliminary version). In Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 128–136, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338244.
  • [36] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pages 234–243. IEEE, 1997. doi:10.1109/SFCS.1997.646112.
  • [37] Swagato Sanyal. Randomized query composition and product distributions. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.STACS.2024.56.
  • [38] Petr Savický. On determinism versus unambiquous nondeterminism for decision trees. Technical Report TR02-009, Electronic Colloquium on Computational Complexity (ECCC), 2002. URL: https://eccc.weizmann.ac.il//report/2002/009/.
  • [39] Suhail Sherif. Lifting to parity decision trees via stifling (with applications to proof complexity). URL: https://www.youtube.com/watch?v=PeZVs6WUf-4.
  • [40] Avishay Tal. Properties and applications of boolean function composition. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science (ITCS), pages 441–454, 2013. doi:10.1145/2422436.2422485.
  • [41] Alasdair Urquhart. The depth of resolution proofs. Studia Logica, 99:349–364, 2011. doi:10.1007/S11225-011-9356-9.