An Adaptivity Hierarchy Theorem for Property Testing

Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of \emph{adaptive} testing algorithms, wherein each query may be determined by the answers received to prior queries, and their \emph{non-adaptive} counterparts, in which all queries are independent of answers obtained from previous queries. In this work, we investigate the role of adaptivity in property testing at a finer level. We first quantify the degree of adaptivity of a testing algorithm by considering the number of "rounds of adaptivity" it uses. More accurately, we say that a tester is $k$-(round) adaptive if it makes queries in $k+1$ rounds, where the queries in the $i$'th round may depend on the answers obtained in the previous $i-1$ rounds. Then, we ask the following question: Does the power of testing algorithms smoothly grow with the number of rounds of adaptivity? We provide a positive answer to the foregoing question by proving an adaptivity hierarchy theorem for property testing. Specifically, our main result shows that for every $n\in \mathbb{N}$ and $0 \le k \le n^{0.99}$ there exists a property $\mathcal{P}_{n,k}$ of functions for which (1) there exists a $k$-adaptive tester for $\mathcal{P}_{n,k}$ with query complexity $\tilde{O}(k)$, yet (2) any $(k-1)$-adaptive tester for $\mathcal{P}_{n,k}$ must make $\Omega(n)$ queries. In addition, we show that such a qualitative adaptivity hierarchy can be witnessed for testing natural properties of graphs.


Introduction
The study of property testing, initiated by Rubinfeld and Sudan [RS96] and Goldreich, Goldwasser and Ron [GGR98], has attracted significant attention in the last two decades (see, e.g., recent books [Gol10,Gol17,BY17] and surveys [Ron08,Ron09,Can15]). Loosely speaking, property testers are highly efficient randomized algorithms (typically running in sublinear time) that solve approximate decision problems, while only inspecting a tiny fraction of their inputs. More accurately, an ε-tester T for property P is a randomized algorithm that, given query access to an input x, decides whether x ∈ P or x is ε-far (say, in Hamming distance) from P. The query complexity of T is then the number of queries it makes to x.
In general, a testing algorithm may select its queries adaptively such that the i'th query is determined by the answers to the previous i − 1 queries, in which case it is said to be an adaptive tester. However, in many natural cases, testers may actually determine their queries solely based on their randomness (and input length), without any dependency on answers to previous queries; a tester that satisfies this condition is called a non-adaptive tester. A natural question, which commonly arises in query-based models, is whether the ability to make adaptive queries can significantly affect the query complexity.
Adaptive queries can be easily emulated at the cost of a large blowup in query complexity (exponential in the number of queries). More accurately, any q-query adaptive tester for a property of objects represented by functions f : D → R can be emulated by an $|R|^q$-query non-adaptive tester (see, e.g., [Gol17, Section 1.5]). While for certain types of properties and models (e.g., linear properties [BHR05] and properties in the dense graph model [GT03]) one has better emulations which come with little or no overhead, such efficient emulations cannot exist for all properties. As was shown by Raskhodnikova and Smith [RS06], in the bounded-degree graph model [GR00] there is a large chasm between the adaptive and non-adaptive query complexities of testing many natural graph properties. In particular, any property over bounded-degree graphs with n vertices, which is not determined by the vertex degree distribution, requires Ω(√n) queries to test non-adaptively, whereas many such properties (e.g., triangle-freeness and connectivity) have ε-testers with query complexity poly(1/ε).
In this work, we investigate the role of adaptivity in property testing at a finer level. Rather than considering the extreme cases of fully adaptive testers versus completely non-adaptive testers, we consider testers with various levels of restricted adaptivity and ask the following question: Can the power of testers gradually grow with the "amount" of adaptivity they are allowed to use?
Besides the sheer theoretical interest of understanding the role of adaptivity in property testing, a motivation for this question comes from the constraints that come with adaptive algorithms, which may counterbalance the apparent gain in efficiency. Indeed, non-adaptive algorithms (or at least those which only use a small number of adaptive "stages") may be preferred in practice to their adaptive counterparts, in spite of the larger number of queries they make. The reason for this preference is the significant gains obtained by being able to make many queries in parallel: when each query is an experiment which, while relatively cheap by itself, may take several hours, assessing the trade-off between rounds of adaptivity and total number of queries becomes crucial.
An archetypal example where such considerations prevail is the (different) setting of group testing (see e.g. [DH00,Section 1.2]).
To answer the foregoing question, we shall first need to give a precise definition for the "amount" of adaptivity that a tester uses. To this end, it is natural to consider the number of "rounds of adaptivity" used by a tester. More precisely, we say that a tester is k-round-adaptive if it generates and makes queries in k + 1 rounds, where in the i'th round the tester queries a set of locations $Q_i$ that may depend on the answers to queries in $Q_0, \ldots, Q_{i-1}$, obtained in previous rounds. We will quantify the "amount" of adaptivity that a tester uses by the number of rounds of adaptivity that it uses. Equipped with the notion of round adaptivity, we can proceed to present our results.

Our Results
Our main result provides a positive answer to the foregoing question by showing an adaptivity hierarchy theorem for property testing; that is, we show a family of properties $\{P_k\}_k$ such that for every k, the property $P_k$ is "easy" for k-adaptive testers and "hard" for (k − 1)-adaptive testers.
Theorem 1.1 (informally stated; see Theorem 4.1). For every n ∈ N and $0 \le k \le n^{0.99}$ there is a property $P_{n,k}$ of strings over $\mathbb{F}_n$ such that: 1. there exists a k-round-adaptive tester for $P_{n,k}$ with query complexity Õ(k), yet 2. any (k − 1)-round-adaptive tester for $P_{n,k}$ must make Ω(n) queries.
The above theorem relies on an arguably contrived family of properties, which was specifically tailored towards maximizing the separations; hence, one may wonder whether such strong separations also hold for more natural properties. As we show below, this is indeed the case: namely, we establish another adaptivity hierarchy theorem that, albeit weaker than Theorem 1.1, applies to the well-studied natural problem of testing k-cycle freeness in the bounded-degree graph model (see Section 5.1 for definitions).
Theorem 1.2. Let k ∈ N be a constant. Then, (i) there exists a k-round-adaptive tester with query complexity O(1/ε) for (2k + 1)-cycle freeness in the bounded-degree graph model; yet (ii) any (k − 1)-round-adaptive tester for (2k + 1)-cycle freeness in the bounded-degree graph model must make Ω(√n) queries, where n is the number of vertices in the graph.
We conclude this section by posing two open problems that naturally arise from our work.
Open Problem 1 (One property to rule them all). Does there exist an adaptivity hierarchy with respect to a single property? That is, for any m and all sufficiently large n, is there a property P of elements of size n, and $q_1 > \cdots > q_m$ (m "levels" of hierarchy) such that for every k ∈ [m] there exists a k-adaptive tester for P with query complexity $q_k$, yet every (k − 1)-adaptive tester must make $\omega(q_k)$ queries to test P?
Open Problem 2 (Au naturel is just as good). Does there exist a family of natural properties which exhibits an adaptivity hierarchy with separations as strong as in Theorem 1.1?

Previous Work
As previously mentioned, the role of adaptivity in property testing has been the focus of several works before. It is well known that for any property of Boolean functions, there exists at most an exponential gap between adaptive and non-adaptive testers: any (adaptive) q-query testing algorithm for a property P of n-variate Boolean functions can be simulated by a non-adaptive tester with query complexity $2^q - 1$. Further, such gaps are known to exist for some natural properties, such as read-once width-2 OBDDs [RT12,BMW11] and signed majorities [MORS09,RS13] (importantly, there also exist cases where adaptivity is known not to help [BLR93,BHR05]). Another prominent example of a class of Boolean functions where adaptivity is known to help is that of k-juntas [Bla09,Bla08,STW15,CST+17], which can be tested adaptively with Õ(k) queries, yet for which the non-adaptive query complexity is $\tilde{\Theta}(k^{3/2})$. Of course, the Boolean function setting is not the only one: in the dense graph model, it is known that while adaptivity can help [GR11], it will be at most by a quadratic factor [AFKS00,GT03]: that is, every graph property testable (adaptively) with q queries has an $O(q^2)$-query non-adaptive tester. This is no longer the case in the bounded-degree model, however, where Raskhodnikova and Smith showed that there exist many properties which can be tested adaptively with a constant number of queries, but for which any non-adaptive tester must have query complexity Ω(√n) [RS06]. However, all these results, even when they establish cases where adaptivity does help, leave open the question of how much adaptivity is needed for this to happen. In particular, for the case of properties of Boolean functions, many known adaptive testers which outperform their non-adaptive counterparts do so, at some level, by conducting a binary search of some sort (see, e.g., [Bla09,RT12,RS13]) and thus inherently come with a logarithmic number of "adaptive rounds."
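The exponential simulation bound mentioned above can be illustrated by querying, up front, every location that the adaptive strategy could ever ask for. The following is only a minimal sketch under stated assumptions: the adaptive tester is modeled as a deterministic function `next_query(history)` (for a randomized tester, one would first fix its coins), and all names here are hypothetical rather than taken from the paper.

```python
from itertools import product

def all_possible_queries(next_query, q, answers=(0, 1)):
    """Collect every location the strategy could query within q adaptive steps.

    next_query maps the tuple of answers seen so far to the next location.
    """
    locations = set()
    for depth in range(q):
        for history in product(answers, repeat=depth):
            locations.add(next_query(history))
    return locations

def emulate_non_adaptively(next_query, q, f, answers=(0, 1)):
    """Make all candidate queries in one batch, then replay the adaptive run."""
    table = {x: f(x) for x in all_possible_queries(next_query, q, answers)}
    history = ()
    for _ in range(q):
        history += (table[next_query(history)],)
    return history

def binary_search_strategy(history, n=8):
    """Hypothetical adaptive strategy: binary search in a sorted 0/1 array."""
    lo, hi = 0, n - 1
    for bit in history:
        mid = (lo + hi) // 2
        if bit:
            hi = mid
        else:
            lo = mid + 1
    return (lo + hi) // 2

# With binary answers there are 2^0 + ... + 2^(q-1) = 2^q - 1 histories,
# hence at most 2^q - 1 distinct locations to query in the single batch.
print(len(all_possible_queries(binary_search_strategy, 3)))
```

The sketch makes the cost of removing adaptivity visible: the batch size grows with the number of possible answer histories, not with the number of queries a single run makes.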
Our proof of Theorem 1.1 relies on a connection between the property testing and linear decision tree models. Although many of the ingredients we use are new, the connection itself is not and was first observed in [Tel14] (see also [BCK14] for a slightly different connection between property testing and parity decision trees).
Adaptivity in other settings. We remark that the notion of round complexity in communication complexity and interactive proof systems is somewhat analogous to that of round adaptivity, since in those models each round of communication or interaction allows the parties to adapt their strategies. Moreover, a round complexity hierarchy is known for communication complexity [NW93] and interactive proofs of proximity [GR17]. Finally, we also mention that the role of the number of adaptive measurements used by sparse recovery algorithms was shown to be very significant [IPW11].

Organization
In Section 2 we provide the preliminaries required for the technical sections. In Section 3 we provide a precise definition for testers with bounded adaptivity. In Section 4 we prove our main result, which is a strong adaptivity hierarchy theorem for a property of functions. In Section 5 we prove an adaptivity hierarchy theorem with respect to a natural property of graphs. Finally, in Section 6 we discuss adaptivity round reductions, as well as a connection to communication complexity, and the relation between round and tail adaptivity.

Preliminaries
We begin with standard notations:
• We denote the relative Hamming distance, over alphabet Σ, between two vectors $x \in \Sigma^n$ and $y \in \Sigma^n$ by $\mathrm{dist}(x, y) := |\{x_i \neq y_i : i \in [n]\}| / n$. If dist(x, y) ≤ ε, we say that x is ε-close to y, and otherwise we say that x is ε-far from y. Similarly, we denote the relative distance of x from a non-empty set $S \subseteq \Sigma^n$ by $\mathrm{dist}(x, S) := \min_{y \in S} \mathrm{dist}(x, y)$. If dist(x, S) ≤ ε, we say that x is ε-close to S, and otherwise we say that x is ε-far from S.
• We denote by $A^x(y)$ the output of algorithm A given direct access to input y and oracle access to string x. Given two interactive machines A and B, we denote by $(A^x, B(y))(z)$ the output of A when interacting with B, where A (respectively, B) is given oracle access to x (respectively, direct access to y) and both parties have direct access to z.
Throughout this work, probabilistic expressions that involve a randomized algorithm A are taken over the inner randomness of A (e.g., when we write $\Pr[A^x(y) = z]$, the probability is taken over the coin tosses of A).
Integrality. For simplicity of notation, we hereafter use the convention that all (relevant) integer parameters that are stated as real numbers are implicitly rounded to the closest integer.
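As a concrete reading of the distance notation above, the following minimal sketch computes the relative Hamming distance between two vectors and from a vector to a finite set. The function names are illustrative only, and the brute-force minimum is meant for toy examples rather than as an efficient procedure.

```python
def dist(x, y):
    """Relative Hamming distance: fraction of coordinates where x and y differ."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)

def dist_to_set(x, S):
    """Relative distance from x to a non-empty set S of equal-length vectors."""
    return min(dist(x, y) for y in S)

def is_eps_far(x, S, eps):
    """True if x is eps-far from S, i.e. dist(x, S) > eps."""
    return dist_to_set(x, S) > eps

print(dist("0110", "0011"))                    # two of four positions differ
print(dist_to_set("0000", {"1111", "0001"}))   # closest element differs in one position
```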
Uniformity. To facilitate notation, throughout this work we define all algorithms non-uniformly; that is, we fix an integer n ∈ N and restrict the algorithms to inputs of length n. Despite fixing n, we view it as a generic parameter and allow ourselves to write asymptotic expressions such as O(n). We remark that while our results are proved in terms of non-uniform algorithms, they can be extended to the uniform setting in a straightforward manner.

The Definition of Testers with Bounded Adaptivity
In this section, we provide a formal abstraction that captures the notion of bounded adaptivity within the framework of property testing. We define two notions of bounded adaptivity: (1) round-adaptivity, which refers to algorithms that are allowed to make a bounded number of "batches" of queries, where the queries in each batch may depend on the answers to previous batches; (2) tail-adaptivity, which refers to algorithms that first make a large number of non-adaptive queries and subsequently make a bounded number of adaptive queries. We remark that while tail-adaptivity can be easily emulated via round-adaptivity, the converse does not hold. Indeed, in Section 6.3 we show that round-adaptive testers can be much more powerful than tail-adaptive testers. Nonetheless, our lower bounds hold for the stronger round-adaptivity notion, whereas our upper bounds hold for the more restrictive tail-adaptivity.
Definition 3.1 (Round-Adaptive Testing Algorithms). Let Ω be a domain of cardinality n, and let k, q ≤ n. A randomized algorithm is said to be a (k, q)-round-adaptive tester for a property $P \subseteq 2^{\Omega}$, if, on proximity parameter ε ∈ (0, 1] and granted query access to a function f : Ω → {0, 1}, the following holds.
(i) Query Generation: The algorithm proceeds in k + 1 rounds, such that at round ℓ ≥ 0, it produces a set of queries $Q_\ell := \{x^{(\ell),1}, \ldots, x^{(\ell),|Q_\ell|}\} \subseteq \Omega$ (possibly empty), based on its own internal randomness and the answers to the previous sets of queries $Q_0, \ldots, Q_{\ell-1}$, and receives $f(Q_\ell) = \{f(x^{(\ell),1}), \ldots, f(x^{(\ell),|Q_\ell|})\}$;
(ii) Completeness: If f ∈ P, then the algorithm outputs accept with probability at least 2/3;
(iii) Soundness: If dist(f, P) > ε, then the algorithm outputs reject with probability at least 2/3.
The query complexity q of the tester is the total number of queries made to f, i.e., $q = \sum_{\ell=0}^{k} |Q_\ell|$. If the algorithm returns accept with probability one whenever f ∈ P, it is said to have one-sided error (otherwise, it has two-sided error). We will sometimes refer to a tester with respect to proximity parameter ε as an ε-tester.
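The bookkeeping in Definition 3.1 can be made concrete with a small query oracle that answers one batch per round and tracks both parameters. This is only an illustrative sketch: the class `RoundOracle` and its attribute names are assumptions introduced here, not part of the definition.

```python
class RoundOracle:
    """Query access to f that records query rounds and total queries (illustrative)."""

    def __init__(self, f):
        self.f = f
        self.rounds = 0    # number of batches Q_0, Q_1, ... issued so far
        self.queries = 0   # total number of queries, i.e. the sum of |Q_l|

    def query_round(self, Q):
        """Answer one batch Q; the whole batch is fixed before any answer is seen."""
        Q = list(Q)
        self.rounds += 1
        self.queries += len(Q)
        return {x: self.f(x) for x in Q}

    @property
    def k(self):
        """Rounds of adaptivity: one less than the number of query rounds."""
        return max(self.rounds - 1, 0)

oracle = RoundOracle(lambda x: [1, 0, 0, 1][x])
first = oracle.query_round([0])               # round 0 depends only on randomness
second = oracle.query_round([first[0] + 2])   # round 1 depends on round 0's answer
print(oracle.k, oracle.queries)
```

A non-adaptive tester corresponds to a single call to `query_round` (k = 0), while a fully adaptive q-query tester issues q batches of size one.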
Remark 3.2 (On amplification). We note that, as usual in property testing, the probability of success can be amplified by repetition to any 1 − δ, at the price of an O(log(1/δ)) factor in the query complexity. Crucially, this can be done with no increase in the number of adaptive rounds: while repetition would naïvely multiply both q and k by this factor, one can avoid the latter by running the O(log(1/δ)) independent copies of the algorithm in parallel, instead of sequentially.
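The parallel repetition in Remark 3.2 can be sketched by advancing independent copies of a tester in lockstep, so that the copies share each query round. The sketch below assumes a tester is written as a Python generator that yields one batch of queries per round and finally returns its verdict, and that every copy makes at least one round of queries; `run_in_parallel` and `toy_tester` are hypothetical names used only for illustration.

```python
import random

def run_in_parallel(make_tester, f, copies, rng):
    """Run independent copies of a round-based tester in lockstep; majority vote.

    make_tester(rng) returns a generator that yields a list of queries per round,
    receives a dict of answers, and finally returns True (accept) or False.
    """
    testers = [make_tester(rng) for _ in range(copies)]
    # Round 0: every copy announces its first batch (assumes at least one round each).
    pending = {i: next(t) for i, t in enumerate(testers)}
    verdicts, rounds, total = [], 0, 0
    while pending:
        batch = {x for Q in pending.values() for x in Q}   # one shared round
        answers = {x: f(x) for x in batch}
        rounds += 1
        total += sum(len(Q) for Q in pending.values())
        still_running = {}
        for i, Q in pending.items():
            try:
                still_running[i] = testers[i].send({x: answers[x] for x in Q})
            except StopIteration as done:
                verdicts.append(bool(done.value))
        pending = still_running
    return 2 * sum(verdicts) > copies, rounds, total

def toy_tester(rng, n=100):
    """Hypothetical 1-round-adaptive procedure: sample i, then follow the pointer f(i)."""
    i = rng.randrange(n)
    a = yield [i]
    j = a[i] % n
    b = yield [j]
    return b[j] % 2 == 0

accept, rounds, total = run_in_parallel(
    toy_tester, lambda x: (2 * x) % 100, copies=5, rng=random.Random(0))
print(accept, rounds, total)
```

The number of rounds equals that of a single copy, while the total query count is multiplied by the number of copies, which matches the trade-off described in the remark.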
The query complexity q of the tester is the total number of queries made to f , i.e., q = |Q| + k.
If the algorithm returns accept with probability one whenever f ∈ P, it is said to be one-sided (otherwise, it is two-sided).
Remark 3.4 (On (lack of) amplification). Unlike the round-adaptive algorithms, tail-adaptive testing algorithms do not enjoy a simple success amplification procedure which would leave unchanged the adaptivity parameter, only affecting the query complexity. This is the reason why the success probability δ is explicitly mentioned in Definition 3.3.

A Strong Adaptivity Hierarchy
In this section we prove the adaptivity hierarchy theorem, which shows that, loosely speaking, up to a nearly linear threshold, each additional round of adaptivity can significantly augment the power of testing algorithms.
We remark that, in fact, the algorithm shown in the first item of Theorem 4.1 also gives an upper bound for the more restricted model of tail adaptivity. Specifically, for every k there also exists an (O(k), Õ(k))-tail-adaptive (one-sided) tester for $P_k$. Since a (k − 1, q)-round-adaptive lower bound implies a (k − 1, q)-tail-adaptive lower bound (see discussion in Section 3), this implies an adaptivity hierarchy (albeit slightly weaker than in Theorem 4.1) with respect to tail-adaptive testers.
Hereafter we assume, without loss of generality, that n is a prime number, and consider $\mathbb{F}_n$, the field of order n. We will consider the following sequence of "k-iterated address" functions $(f_k)_{k \ge 0}$ from $\mathbb{F}_n^n$ to {0, 1}, which will in turn lead to the definition of the properties $(P_k)_{k \ge 0}$ that we use to show the hierarchy theorem. Loosely speaking, $f_k$ receives a vector x of n pointers (indices in [n]) and indicates whether when jumping from pointer to pointer k times, starting from an arbitrarily predetermined pointer, we reach a location in which x takes an even value.
To formally define the foregoing functions, first consider $g : \mathbb{F}_n^n \times \mathbb{F}_n \to \mathbb{F}_n$ given by $g(x, a) = x_{a+1}$; that is, g returns the coordinate of $x \in \mathbb{F}_n^n$ "pointed to" by a ∈ {0, . . . , n − 1}. Based on this, we define the iterated versions of g, $g_0, \ldots, g_n, \ldots : \mathbb{F}_n^n \to \mathbb{F}_n$, as
$g_0(x) = g(x, 0) = x_1$ and $g_{k+1}(x) = g(x, g_k(x))$ for every k ≥ 0,
and set $f_k(x) = 1$ if $g_k(x)$ is even, and $f_k(x) = 0$ otherwise.
(For instance, $f_0(x) = 1$ if and only if $x_1$ is even; and $f_1(x) = 1$ if and only if the coordinate of x pointed to by $x_1$, that is $x_{x_1+1}$, is even.)
We proceed to describe the outline of the proof of Theorem 4.1.
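The iterated address functions can be sketched directly in code. The sketch reads the definitions as $g_0(x) = x_1$ and $g_j(x) = g(x, g_{j-1}(x))$, which is the reading consistent with the two examples given for $f_0$ and $f_1$; vectors are 0-indexed Python lists, so `x[a]` plays the role of $x_{a+1}$. The helper names are illustrative.

```python
def g(x, a):
    """Coordinate of x pointed to by a in {0, ..., n-1}; x[a] is x_{a+1} in the text."""
    return x[a]

def g_iter(x, k):
    """g_k(x): read the coordinate at pointer 0, then follow k further pointers."""
    value = g(x, 0)             # g_0(x) = x_1
    for _ in range(k):
        value = g(x, value)     # g_j(x) = g(x, g_{j-1}(x))
    return value

def f_k(x, k):
    """1 if the value reached after k jumps is even, and 0 otherwise."""
    return 1 if g_iter(x, k) % 2 == 0 else 0

x = [3, 0, 4, 2, 1]   # a vector over F_5, so every entry is a valid pointer
print([g_iter(x, k) for k in range(4)])   # values visited: 3, 2, 4, 1
print([f_k(x, k) for k in range(4)])
```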

High-Level Overview
Broadly speaking, our roadmap for proving Theorem 4.1 consists of two main steps:
1. We first consider the adaptivity hierarchy question in the setting of randomized decision tree (DT) complexity (see Section 4.2). We can view a randomized DT for computing a function f as a probabilistic algorithm that is given query access to an input x and is required to output f(x) with high probability. Adapting the definition of round adaptivity (Definition 3.1) in the natural way to decision trees, we will prove the randomized DT analogue of our adaptivity hierarchy theorem, using the foregoing family of address functions $(f_k)_{k \ge 0}$. Namely, we prove that for any k ≥ 0 with k = o(n), it holds that (i) $f_k$ can be computed by an algorithm making k + 1 queries, in k adaptive rounds; but (ii) any algorithm using only k − 1 rounds of adaptivity must make Ω(n) queries.
2. We then show a bidirectional connection between adaptivity-bounded randomized DT and property testers, which extends the connection observed by Tell [Tel14]. This allows us to "lift" the DT adaptivity hierarchy theorem to property testing. Specifically, we provide two black-box reductions between the DT problem of computing function f and property testing for a related property $P_f$, which preserve both the number of adaptive rounds and (roughly) the number of queries. We remark that these reductions strongly rely on high-rate codes that exhibit both strong local testability and relaxed local decodability.
The caveat with the above is that to "lift" DT lower bounds to testing algorithms via our methodology, we actually need to show lower bounds on a stronger model of DT (this stems from the reductions of the second item, in which we will encode the input via linear codes, requiring the DT algorithm to compute coordinates of this encoding).
Hence, we will actually work in the linear decision tree (LDT) model, wherein the algorithm is allowed to query any linear combination (over $\mathbb{F}_n$) of the coordinates, instead of only querying individual coordinates. (We note that in the case of $\mathbb{F}_2$, this corresponds to the parity decision tree model.) That is, we will proceed as follows:
1. (L)DT hierarchy: show that for any k ≥ 0, the function $f_k$ (i) can be computed by an efficient (k, O(k))-round-adaptive (deterministic) DT algorithm, but (ii) does not admit any (k − 1, o(n))-round-adaptive (randomized, two-sided) LDT algorithm;
2. Transference lemmas: show that for any function $f : \mathbb{F}_n^n \to \mathbb{F}_n$, there exists a property $C_f \subseteq \mathbb{F}_n^{m(n)}$ such that, for any k ≥ 0, (a) a (k, q)-round-adaptive testing algorithm for $C_f$ implies a (k, q)-round-adaptive LDT algorithm for f (Lemma 4.10).
Combining the items above will directly imply our hierarchy theorem for property testing (Theorem 4.1): Proof of Theorem 4.1. The upper bound (i) follows immediately from Claim 4.3 and Lemma 4.11, while combining Lemma 4.4 and Lemma 4.10 establishes the lower bound (ii).
Organization for the rest of the section. In Section 4.2, we define the decision tree models and complexities that we shall need. Then, in Section 4.3, we prove the adaptivity hierarchy theorem for randomized (linear) decision trees. Finally, in Section 4.4 we prove the transference lemmas that allow us to lift the foregoing hierarchy theorem to the property testing framework.

Decision Tree Zoo
We shall need to extend the definitions of several different types of decision tree algorithms (see [BdW02] for an extensive survey of decision tree complexity) to the setting of bounded adaptivity.
Recall that a deterministic decision tree is a model of computation for computing a function $f : \Omega^n \to \Omega$. The decision tree is a rooted ordered |Ω|-ary tree. Each internal vertex of the tree is labeled with a value i ∈ {1, . . . , n} and the leaves of the tree are labeled with the elements in Ω. Given an input $x \in \Omega^n$, the decision tree is recursively evaluated by choosing to recurse on the i'th subtree in the j'th level if and only if $x_j = i$. Once a leaf is reached, we output the label of that leaf and halt.
Equivalently, we can view deterministic decision trees as algorithms that get oracle access to an input $x \in \Omega^n$, then adaptively make queries to x, to the end of computing f(x). (Note that the j'th query corresponds to the j'th layer of the corresponding decision tree, and that the different vertices in the j'th layer represent the choices of the next queries, with respect to the answers obtained for previous queries.) We define the deterministic decision tree complexity of a function f to be the minimal number of queries a deterministic decision tree algorithm needs to make to compute f in the worst case. Taking the algorithmic perspective, we define k-round-adaptive deterministic decision tree algorithms as algorithms that generate their queries in k rounds, where queries in each round may depend on the answers to queries from previous rounds. The extension of the foregoing definition to randomized decision tree algorithms is done in the natural way, by allowing the algorithm to toss random coins and succeed with high probability (say, 2/3) in computing f(x). Finally, we shall also extend the definition to linear decision trees, which are decision tree algorithms wherein each query is a linear combination of the elements of the domain. We remark that linear decision trees can be thought of as generalizing both parity decision trees and algebraic query complexity algorithms [AW08].
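To make the evaluation rule concrete, here is a minimal sketch of a deterministic decision tree evaluator. The nested-tuple representation (`("leaf", value)` and `("query", j, children)`) is an assumption chosen for brevity, with coordinates 0-indexed, and the example tree computes the AND of two bits.

```python
def evaluate(tree, x):
    """Evaluate a decision tree on input x.

    A leaf is ("leaf", value). An internal vertex is ("query", j, children),
    where children maps each possible value of x[j] to a subtree.
    Returns (output, number_of_queries_made).
    """
    queries = 0
    while tree[0] == "query":
        _, j, children = tree
        tree = children[x[j]]   # recurse on the subtree selected by the answer
        queries += 1
    return tree[1], queries

# Toy tree over {0, 1}: AND of x[0] and x[1], stopping early when x[0] = 0.
and_tree = ("query", 0, {
    0: ("leaf", 0),
    1: ("query", 1, {0: ("leaf", 0), 1: ("leaf", 1)}),
})
print(evaluate(and_tree, [1, 1]))
print(evaluate(and_tree, [0, 1]))
```

The depth of the tree is the worst-case number of queries, which is the quantity the deterministic decision tree complexity minimizes.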
More accurately, the aforementioned notions are defined below. We provide the definition of the most general model and derive the more restricted models as special cases.
Definition 4.2 (Round-Adaptive Decision Tree Algorithms). Let $\mathbb{F}$ be a finite field of cardinality n, and let k, q ≤ n. A (randomized) algorithm D is said to be a (k, q)-round-adaptive (linear) decision tree algorithm for computing a function $f : \mathbb{F}^n \to \mathbb{F}$ if, granted query access to a string $x \in \mathbb{F}^n$, the following holds.
(ii) Computation: The algorithm computes f(x) with high probability using the answers it received in all k rounds; that is, $\Pr[D^x = f(x)] \ge 2/3$. The query complexity q of the algorithm is the total number of (linear) queries made to x, i.e., $q = \sum_{\ell=0}^{k} |Q_\ell|$. The randomized (k, q)-round-adaptive linear decision tree complexity of a function f, denoted $R^{\oplus}_k(f)$, is the minimal query complexity for a (k, q)-round-adaptive randomized linear decision tree algorithm that computes f.
If for all ℓ ∈ [k + 1] and $j \in [|Q_\ell|]$ the linear combination $L_{\ell,j}$ only includes a single element (i.e., $L_{\ell,j}$ only has a single non-zero entry), we say that D is a randomized (k, q)-round-adaptive decision tree algorithm, and denote its corresponding complexity by $R_k(f)$. If, in addition, the algorithm does not toss any random coins and succeeds with probability 1, we say that D is a deterministic (k, q)-round-adaptive decision tree algorithm, and denote its corresponding complexity by $D_k(f)$.

Decision Tree Hierarchy: Some Things Only Adaptivity Can Address
We first establish the upper bound part of our adaptivity hierarchy theorem for DT, which follows immediately from the construction.
Proof. The algorithm is straightforward: on input $x \in \mathbb{F}_n^n$, it sequentially queries $x_1 = g_0(x)$, $x_{g_0(x)+1} = g_1(x)$, . . . , $x_{g_{k-1}(x)+1} = g_k(x)$; and returns 1 if $g_k(x)$ is even, and 0 otherwise. By definition of $f_k$, this always correctly computes the function, is deterministic, and clearly satisfies the definition of a (k, k + 1)-round-adaptive DT algorithm.
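The algorithm in the proof can be sketched against a query oracle, which makes the accounting explicit: one single-coordinate query per round, for k + 1 rounds in total. The function `compute_f_k` and its oracle argument `query` are hypothetical names, and coordinates are 0-indexed in the sketch.

```python
def compute_f_k(query, k):
    """Compute f_k with k + 1 single-coordinate queries, one query per round.

    query(a) returns the coordinate of x pointed to by a.
    Returns (f_k(x), number_of_queries_made).
    """
    pointer, used = 0, 0
    for _ in range(k + 1):
        pointer = query(pointer)   # each query depends on the previous answer
        used += 1
    return (1 if pointer % 2 == 0 else 0), used

x = [3, 0, 4, 2, 1]
print(compute_f_k(x.__getitem__, 2))   # visits values 3, 2, 4 using 3 queries
```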
We proceed to show the lower bound part of our adaptivity hierarchy theorem for DT, which is proven via a reduction from communication complexity.
Lemma 4.4. There exists an absolute constant c > 0 such that the following holds. For every
Proof. We will reduce the computation of $f_{k+1}$ (in k rounds of adaptivity) to a related k-round two-party randomized communication complexity problem, the "pointer-following" problem introduced by Papadimitriou and Sipser [PS82], and conclude by invoking the lower bound of Nisan and Wigderson [NW93] on this problem. This communication complexity problem between two computationally unbounded players, Alice and Bob, is defined as follows. Let $V_A$ and $V_B$ be two disjoint sets of cardinality n/2, and let $v_0 \in V_A$ be a fixed element known to both players. The input is a pair of functions $(\chi_A, \chi_B)$, where $\chi_A : V_A \to V_B$ and $\chi_B : V_B \to V_A$. Alice and Bob are given $\chi_A$ and $\chi_B$ respectively, as well as a common random string, and their goal is to compute $\pi_k(\chi_A, \chi_B)$, the vertex reached from $v_0$ after following k pointers. (In other terms, one can see the communication problem as Alice and Bob sharing the edges of a bipartite directed graph where each node has out-degree exactly one, and the goal is to find at which vertex the path of length k starting at a prespecified vertex $v_0$, on Alice's side, ends.) We will rely on the following lower bound on the k-round, randomized (public-coin) version of this problem.
Theorem 4.5 ([NW93], rephrased). Any k-round randomized communication protocol for the "pointer-following" problem, in which Bob sends the first message, must have total communication complexity $\Omega\left(\frac{n}{k^2} - k \log n\right)$, even to only compute a single bit of $\pi_k(\chi_A, \chi_B)$ with probability at least 2/3.
Note that as long as $k \ll (n/\log n)^{1/3}$, this lower bound is $\Omega(n/k^2)$. We remark that the fact that the lower bound still holds even when only a single bit of the answer is to be computed will be crucial for us, as our goal is to reduce the communication complexity problem of "pointer-following" to computing the Boolean function $f_{k+1}$ in the randomized decision tree model.
Alice and Bob can then simulate the execution of A as follows. Without loss of generality, assume it is Alice's turn to speak. To answer a query of the form $\varphi_S(x) = \sum_{i \in S} x_i$, she computes $\sum_{i \in S \cap V_A} x_i$ and sends it to Bob; on his side, Bob computes $\sum_{i \in S \cap V_B} x_i$, and receiving Alice's message can then recover the value $\varphi_S(x)$ and feed it to the algorithm. (In the next round, when sending his side of the (new) queries to Alice, Bob will also send this value $\varphi_S(x)$, to make sure that both sides know the answers to all queries so far.) Since all queries of a given adaptive round of A can be prepared and sent in parallel (costing O(log n) bits of communication per query), this simulation can be performed in k + 1 rounds (as many as A takes) with communication complexity O(q log n). At the end, whichever of Alice and Bob received the latest message holds the answer (to "is $\pi_{k+1}(\chi_A, \chi_B)$ an even node?"), which by assumption on A is correct with probability at least 2/3. Alice and Bob then use an extra round of communication to broadcast the answer to the other party, bringing the total number of rounds to k + 2.
But by Theorem 4.5, computing this bit of $\pi_{k+2}(\chi_A, \chi_B)$ with only k + 2 rounds of communication (Bob speaking first) requires $\Omega(n/k^2)$ bits of communication, and so we must have $q = \Omega\left(\frac{n}{k^2 \log n}\right)$.
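The key step of the simulation, answering one linear query when the coordinates of x are split between the two players, can be sketched as follows. The sketch assumes that Alice holds the coordinates indexed by her vertex set and Bob holds the rest; the function name and the toy values are illustrative only.

```python
def split_linear_query(S, x, alice_coords, n):
    """Answer one linear query sum_{i in S} x_i over F_n held jointly by two players.

    alice_coords is the set of coordinates Alice holds; Bob holds the others.
    Returns (alice_message, bob_partial_sum, query_answer).
    """
    # Alice's message is a single field element, i.e. O(log n) bits.
    alice_msg = sum(x[i] for i in S if i in alice_coords) % n
    bob_part = sum(x[i] for i in S if i not in alice_coords) % n
    return alice_msg, bob_part, (alice_msg + bob_part) % n

n = 7
x = [3, 5, 1, 6, 2, 4, 0]
alice_msg, bob_part, answer = split_linear_query({1, 2, 4, 5}, x, {0, 1, 2}, n)
print(alice_msg, bob_part, answer)
```

Since each query costs one field element of communication, q queries cost O(q log n) bits overall, which is the accounting used in the proof.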

Adaptivity Bounded Testers and Decision Trees: There and Back Again
In this section we show how to reduce problems in the adaptivity bounded property testing model to problems in the adaptivity bounded (linear) decision tree model, and vice versa. We begin in Section 4.4.1, by presenting the required preliminaries regarding error-correcting codes. Then, in Section 4.4.2, we prove the "transference lemmas" between these models.

Preliminaries: Locally Testable and Decodable Codes
Let k, n ∈ N. A code over alphabet Σ with distance d is a function $C : \Sigma^k \to \Sigma^n$ that maps messages to codewords such that the distance between any two codewords is at least d = d(n). If d = Ω(n), C is said to have linear distance. If Σ = {0, 1}, we say that C is a binary code. If C is a linear map, we say that it is a linear code. The relative distance of C, denoted by δ(C), is d/n, and its rate is k/n. When it is clear from the context, we shall sometimes abuse notation and refer to the code C as the set of all codewords $\{C(x)\}_{x \in \Sigma^k}$. Following the discussion in the introduction, we define locally testable codes and locally decodable codes as follows.
Definition 4.7 (Locally Decodable Codes). A code $C : \Sigma^k \to \Sigma^n$ is a locally decodable code (LDC) if there exists a constant $\delta_{\mathrm{radius}} \in (0, \delta(C)/2)$ and a probabilistic algorithm (decoder) D that, given oracle access to $w \in \Sigma^n$ and direct access to index i ∈ [k], satisfies the following condition: For any i ∈ [k] and $w \in \Sigma^n$ that is $\delta_{\mathrm{radius}}$-close to a codeword C(x) it holds that $\Pr[D^w(i) = x_i] \ge 2/3$. The query complexity of an LDC is the number of queries made by its decoder.
We shall also need the notion of relaxed-LDCs (introduced in [BGH+06]). Similarly to LDCs, these codes have decoders that make few queries to an input in an attempt to decode a given location in the message. However, unlike LDCs, the relaxed decoders are allowed to output a special symbol that indicates that the decoder detected a corruption in the codeword and is unable to decode this location. Note that the decoder must still avoid errors (with high probability); see footnote 5.
Definition 4.8 (Relaxed-LDC). A code $C : \Sigma^k \to \Sigma^n$ is a relaxed-LDC if there exists a constant $\delta_{\mathrm{radius}} \in (0, \delta(C)/2)$ such that the following holds.
1. (Perfect) Completeness: For any i ∈ [k] and $x \in \Sigma^k$ it holds that $D^{C(x)}(i) = x_i$.
2. Relaxed Soundness: For any i ∈ [k] and any $w \in \Sigma^n$ that is $\delta_{\mathrm{radius}}$-close to a (unique) codeword C(x), it holds that $\Pr[D^w(i) \in \{x_i, \bot\}] \ge 2/3$, where ⊥ denotes the special symbol indicating that a corruption was detected.
There are a couple of efficient constructions of codes that are both relaxed-LDCs and LTCs (see [BGH+06, GGK15]). We shall need the construction in [GGK15], which has the best parameters for our setting (see footnote 6).
Theorem 4.9 (e.g., [GGK15, Theorem 1.1]). For every k ∈ N, α > 0, and finite field $\mathbb{F}$ there exists an $\mathbb{F}$-linear code $C : \mathbb{F}^k \to \mathbb{F}^{k^{1+\alpha}}$ with linear distance, which is both a relaxed-LDC and a (one-sided error) LTC with query complexity poly(1/ε); furthermore, both testing and (relaxed) decoding procedures are non-adaptive.

Transference Lemmas
Fix any $\alpha > 0$. Let $C\colon \mathbb{F}_n^n \to \mathbb{F}_n^m$ be a code with constant relative distance $\delta(C) > 0$, with the following properties:
• linearity: for all $i \in [m]$, there exists a set $S_i \subseteq [n]$ such that $C(x)_i = \sum_{j \in S_i} x_j$ for all $x \in \mathbb{F}_n^n$;
• rate: $m \le n^{1+\alpha}$;
• testability: $C$ is a strong-LTC with one-sided error and non-adaptive tester;
• decodability: $C$ is a relaxed-LDC.
We will rely on Theorem 4.9 for the existence of such codes. Before delving into the details, we briefly explain the reason for each of the points above. The linearity will be crucial to reduce to and from the LDT model: indeed, any coordinate of a codeword corresponds to a fixed linear combination of the coordinates of the message, which corresponds to a single LDT query on that particular linear combination. The rate bound is required since our lower bounds are in terms of the dimension $n$ and upper bounds in terms of the block-length $m$. Ideally, we would like $m = O(n)$, to have a direct correspondence between the LDT and the property testing query complexities; however, this nearly-linear rate is the best known achievable for constant-query LTCs and relaxed-LDCs [GGK15].

⁵ The full definition of relaxed-LDCs, as defined in [BGH+06], includes an additional condition on the success rate of the decoder. Namely, for every $w \in \{0, 1\}^n$ that is $\delta_{\rm radius}$-close to a codeword $C(x)$, and for at least a $\rho$ fraction of the indices $i \in [k]$, with probability at least $2/3$ the decoder $D$ outputs the $i$'th bit of $x$. That is, there exists a set $I_w \subseteq [k]$ of size at least $\rho k$ such that for every $i \in I_w$ it holds that $\Pr[D^w(i) = x_i] \ge 2/3$. We omit this condition since it is irrelevant to our application, and remark that every relaxed-LDC that satisfies the first two conditions can also be modified to satisfy the third condition (see [BGH+06, Lemmas 4.9 and 4.10]).

⁶ Specifically, the codes in [GGK15] are meaningful for every value of the proximity parameter, whereas the codes in [BGH+06] require $\varepsilon > 1/\mathrm{polylog}(k)$.
The LTC property will be useful to us in the reduction from property testing to DT query complexity (where we will need to first check that our input is close to a codeword, in view of decoding the closest message during the reduction), where the strong testability (i.e., rejection with probability proportional to the distance from a valid codeword) will allow us to deal with arbitrarily small values of the proximity parameter. Similarly, we will rely on the (relaxed) LDC property in that same reduction, in order to obtain individual coordinates of the message, given query access to an input close to a codeword.

We proceed to show the framework for reducing property testing to decision tree complexity and vice-versa. For a fixed function $f\colon \mathbb{F}_n^n \to \{0, 1\}$, consider the subset $f^{-1}(1) \subseteq \mathbb{F}_n^n$, and define the set of codewords $C_f := C(f^{-1}(1)) = \{ C(x) : x \in \mathbb{F}_n^n,\ f(x) = 1 \}$. Consider now testing the property $C_f$: we will reduce the LDT computation of $f$ to the testing of $C_f$. Specifically, we prove the following.

Proof. Suppose there exists a $(k, q)$-round-adaptive tester $T$ for $C_f$. On input $x \in \mathbb{F}_n^n$, we emulate the invocation of $T$, with respect to proximity parameter $\varepsilon = \delta(C)$, on the encoded input $y := C(x) \in \mathbb{F}_n^m$ and output $1$ if and only if $T$ returns accept. To see why this is correct, observe that by definition, if $f(x) = 1$ then $y \in C_f$. However, if $f(x) = 0$, then for any $y' \in C_f$ such that $y' \ne C(x)$ we must have $\mathrm{dist}(y, y') > \varepsilon$, by the distance of our code.
It remains to show that this simulation can be achieved efficiently, as claimed. To do so, we will rely on the fact that $C$ is a linear code: whenever $T$ queries $y_i$, we can compute the set $S_i \subseteq [n]$ (which only depends on $C$, and not on $x$), and perform the LDT query $\sum_{j \in S_i} x_j$. The simulation clearly preserves the number of adaptive rounds as well, concluding the proof.
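The simulation in this proof can be illustrated in a few lines. The following is only a minimal sketch, assuming a toy linear code over a prime field given by hypothetical index sets `S` (one per codeword coordinate); the names `make_codeword_oracle`, `ldt_query` and `query_y` are ours and do not come from the paper.

```python
def make_codeword_oracle(x, S, p):
    """Answer queries to y = C(x) using one linear query on x per coordinate.

    x : message, a list of integers modulo the prime p (assumed toy field)
    S : list of index sets; coordinate i of C(x) is sum(x[j] for j in S[i]) mod p
    """
    ldt_log = []  # record of the linear queries issued to x

    def ldt_query(index_set):
        # One LDT query: a fixed linear combination of message coordinates.
        ldt_log.append(frozenset(index_set))
        return sum(x[j] for j in index_set) % p

    def query_y(i):
        # S[i] depends only on the code, never on x, so a batch of tester
        # queries translates into a batch of LDT queries in the same round.
        return ldt_query(S[i])

    return query_y, ldt_log


# Toy instance, chosen only for illustration: p = 5, n = 3, m = 4.
S = [{0}, {1}, {2}, {0, 1, 2}]
x = [1, 2, 4]
query_y, ldt_log = make_codeword_oracle(x, S, p=5)
answers = [query_y(i) for i in [3, 0]]  # one round of tester queries
```

Each tester query costs exactly one LDT query, which is why both the query count and the number of rounds carry over unchanged.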
In our next lemma, we give a partial converse relating property testing and decision tree complexity, with some logarithmic overhead in the resulting query complexity.

Proof. Fix $k \ge 0$, and suppose there exists such a $(k, q)$-round-adaptive DT algorithm $A$ for $f$. On input $y \in \mathbb{F}_n^m$ and proximity parameter $\varepsilon \in (0, 1]$, we would like to decode $y$ to a message $x \in \mathbb{F}_n^n$ and invoke the algorithm on $x$ to determine if $f(x) = 1$; more precisely, we wish to invoke the DT algorithm while simulating each query to $x$ by locally decoding $y$ using $O(1)$ queries. The issue, however, is that the success of the local decoder is only guaranteed for inputs that are sufficiently close to a valid codeword, and we have no such guarantee on $y$ a priori. However, recalling that $C$ is a strong-LTC, we can handle this as follows. Letting $\delta_{\rm radius} > 0$ be the decodability radius of the relaxed-LDC $C$, we set $\delta^* := \min(\delta_{\rm radius}, \varepsilon)$.

(1) Run independently $O(\mathrm{poly}(1/\delta^*))$ times the local tester for the strong-LTC $C$ on $y$, and output reject if any of these rejected. Since every invocation of the local tester makes $O(1)$ queries to $y$, this has query complexity $O(\mathrm{poly}(1/\delta^*)) = O(\mathrm{poly}(1/\varepsilon))$; and if $\mathrm{dist}(y, C) > \delta^*$ then this step outputs reject with probability at least $9/10$.

(2) Invoke $A$ on the message $x := \arg\min \{ \mathrm{dist}(C(x), y) : x \in \mathbb{F}_n^n \}$, answering each query $x_i$ by calling the local decoder for the relaxed-LDC $C$. This is done so that the decoder is correct with probability at least $1 - 1/(10q)$, by standard repetition (taking the plurality value); with the subtlety that we output reject immediately whenever the decoder returns $\bot$. Since each query can be simulated by $O(\log q)$ queries (repeating the $O(1)$ queries of the decoder $O(\log q)$ times), this step has query complexity $O(q \log q)$; and at the end, we output accept if, and only if, $A$ returns the value $1$ for $f(x)$.

Importantly, Step (1) can be run in parallel to Step (2), and in particular can be executed during the first "batch" of queries $A$ makes. This guarantees that the whole simulation above uses the same number of adaptive rounds as $A$, as claimed. It remains to argue correctness.
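The repetition used in Step (2) can be sketched as follows. This is an illustrative sketch only: `decoder` stands for an arbitrary relaxed local decoder passed in by the caller, `BOT` is our stand-in for the symbol $\bot$, and the number of repetitions needed to reach error $1/(10q)$ is left to the caller rather than derived here.

```python
from collections import Counter

BOT = None  # stand-in for the special symbol returned on detected corruption


def amplified_decode(decoder, i, repetitions):
    """Repeat a relaxed local decoder on index i and take the plurality value.

    Returns BOT as soon as any run returns BOT (the tester then rejects).
    """
    outcomes = []
    for _ in range(repetitions):
        value = decoder(i)
        if value is BOT:
            return BOT
        outcomes.append(value)
    return Counter(outcomes).most_common(1)[0][0]


def scripted_decoder(script):
    """Hypothetical test double: replays a fixed list of decoder outputs."""
    replies = iter(script)
    return lambda i: next(replies)


mostly_right = amplified_decode(scripted_decoder([1, 0, 1, 1, 0]), i=0, repetitions=5)
saw_corruption = amplified_decode(scripted_decoder([1, BOT, 1]), i=0, repetitions=3)
```

Since the repetitions for all queries of a round are fixed before any answer is read, they can be issued together and do not add rounds of adaptivity.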

Completeness. Assume $y \in C_f$. In particular, $y$ is a codeword of $C$, and the (one-sided) local tester returns accept with probability one in (1). Then, since by definition there is a unique $x \in \mathbb{F}_n^n$ such that $C(x) = y$, the local decoder of Step (2) will output the correct answer for each query with probability $1$, and therefore $A$ will correctly output $f(x)$ with probability at least $2/3$, so that the tester returns accept with probability at least $2/3$ overall. (Moreover, if the DT algorithm $A$ always correctly computes $f$, then the tester returns accept with probability one.)

Soundness. Assume $\mathrm{dist}(y, C_f) > \varepsilon$. If $\mathrm{dist}(y, C) > \delta^*$, then the local tester returns reject with probability at least $9/10$ in Step (1). Therefore, we can continue assuming that $\mathrm{dist}(y, C) \le \delta^*$, which satisfies the precondition of the relaxed-LDC decoder in Step (2). By a union bound over all $q$ queries, with probability at least $9/10$ we have that the decodings performed in Step (2) are all correct; in which case we answer the queries of the algorithm according to $x := \arg\min \{ \mathrm{dist}(C(x), y) : x \in \mathbb{F}_n^n \}$ (or possibly answered by $\bot$, in which case the tester immediately outputs reject and we are done). Since $\mathrm{dist}(y, C(x)) \le \delta^* \le \varepsilon$, we must have $C(x) \notin C_f$, which implies that $A$ correctly returns $f(x) = 0$ with probability at least $2/3$, in which case the tester outputs reject. Overall, this happens with probability at least $9/10 \cdot 9/10 \cdot 2/3 = 27/50$. Thus, in both cases the tester is correct with probability at least $27/50$; repeating a constant number of times (as explained in Remark 3.2) and taking the majority vote allows us to amplify the probability of success to $2/3$.

An Adaptivity Hierarchy with respect to a Natural Property
In this section we show a natural property of graphs for which, broadly speaking, more adaptivity implies more power. More specifically, we prove the following adaptivity hierarchy theorem with respect to the property of $k$-cycle freeness in the bounded-degree graph model (see definitions in Section 5.1). We stress that although Theorem 4.1 establishes an adaptivity hierarchy with stronger separations, the merit of Theorem 5.1 is in showing that an adaptivity hierarchy also holds for a natural well-studied property. We further observe that the choice of the bounded-degree graph model is not insignificant: one cannot hope to establish such a striking gap in other settings such as the dense graph model or in the Boolean function testing setting. Indeed, as discussed in Section 1.2, it is well known that in these two models, any adaptive tester can be made (fully) non-adaptive at the price of only a quadratic and exponential blowup in the query complexity, respectively (see [AFKS00,GT03] for the former; the latter is folklore). We remark that in Section 6.1 we discuss emulating testers with $k$ rounds of adaptivity by testers with $k' < k$ rounds.

Cycle Freeness in the Bounded Degree Graph Model
In this subsection we provide the necessary definitions and establish a basic upper bound on the complexity of $k$-adaptive testing of cycle freeness in the bounded degree graph model. We begin with a definition of the model.
Let $G = (V, E)$ be a graph with constant degree bound $d < |V|$, represented by its adjacency list; that is, represented by a function $g\colon V \times [d] \to V \cup \{\bot\}$ such that $g(v, i)$ is the $i$'th neighbor of $v$, or $\bot$ if $v$ has fewer than $i$ neighbors. A bounded degree graph property $\mathcal{P}$ is a subset of graphs (represented by their adjacency list) that is closed under isomorphism; that is, for every permutation $\pi$ it holds that $G \in \mathcal{P}$ if and only if $\pi(G) \in \mathcal{P}$. The distance of graph $G$ from property $\mathcal{P}$ is the minimal fraction of entries in $g$ one has to change to reach an element of $\mathcal{P}$.
We extend the definition of functional round-adaptive testing algorithms to the bounded degree graph model in the natural way.
Definition 5.2 (Round-Adaptive Testing in the Bounded Degree Graph Model). Let $G = (V, E)$ be a graph with constant degree bound $d < |V|$, represented by its adjacency list $g\colon V \times [d] \to V$, and let $k, q \le n$. A randomized algorithm is said to be a $(k, q)$-round-adaptive tester for a (bounded degree) graph property $\mathcal{P}$, if, on proximity parameter $\varepsilon \in (0, 1]$ and granted query access to $g$, the following holds.
(i) Query Generation: The algorithm proceeds in $k + 1$ rounds, such that at round $\ell \ge 0$, it produces a set of queries $Q_\ell := \{x^{(\ell),1}, \dots, x^{(\ell),|Q_\ell|}\} \subseteq V \times [d]$ (possibly empty), based on its own internal randomness and the answers to the previous sets of queries $Q_0, \dots, Q_{\ell-1}$, and receives $g(Q_\ell) = \{g(x^{(\ell),1}), \dots, g(x^{(\ell),|Q_\ell|})\}$;
(ii) Completeness: If $G \in \mathcal{P}$, then the algorithm outputs accept with probability at least $2/3$;
(iii) Soundness: If $\mathrm{dist}(G, \mathcal{P}) > \varepsilon$, then the algorithm outputs reject with probability at least $2/3$.
The query complexity $q$ of the tester is the total number of queries made to $g$, i.e., $q = \sum_{\ell=0}^{k} |Q_\ell|$. If the algorithm returns accept with probability one whenever $G \in \mathcal{P}$, it is said to have one-sided error (otherwise, it has two-sided error). As before, we will sometimes refer to a tester with respect to proximity parameter $\varepsilon$ as an $\varepsilon$-tester.
Next, we define the (bounded degree) graph property of k-cycle freeness.  ≤ k and v 1 , . .
Finally, we make the following observation, which roughly speaking implies that when surpassing a certain threshold of round adaptivity, testing cycle freeness in the bounded degree graph model becomes "easy."⁷

Observation 5.4. For every $k \in \mathbb{N}$ there exists a $(k, q)$-round-adaptive testing algorithm for $(2k + 1)$-cycle freeness and $(2k + 2)$-cycle freeness in the bounded-degree graph model with query complexity $q = O(d^{k+1}/\varepsilon)$.

Proof. The algorithm explores the graph in the most natural way: starting from $O(1/\varepsilon)$ "source vertices" selected uniformly at random, it adaptively explores their neighborhoods by querying at each round the neighbors of the previously reached vertices, in a breadth-first-search fashion. If any $(2k + 1)$-cycle (resp. $(2k + 2)$-cycle) is detected, the algorithm rejects, and accepts otherwise. (Clearly, this tester is one-sided.) It is easy to see that if any of the source vertices belongs to a $(2k + 1)$- or $(2k + 2)$-cycle, then this bounded-depth BFS will detect it; thus, we only need to argue that if the graph is $\varepsilon$-far from cycle freeness, with constant probability, one of the source vertices will participate in such a cycle. But this is the case, as any such graph must have at least $\varepsilon n$ vertices participating in a cycle (indeed, otherwise one could "correct" the graph by removing fewer than $\varepsilon d n$ edges, contradicting the distance).

Finally, for each source vertex, after $k$ rounds of adaptivity the number of nodes visited is at most $O(d^{k+1})$, hence the claimed query complexity.
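The exploration in this proof can be sketched as below. This is a minimal illustration under our own assumptions: a query is a vertex and returns its whole neighbor list, a "$t$-cycle" is read as a simple cycle on exactly $t$ vertices, and the helper names `explore` and `on_cycle_of_length` are hypothetical.

```python
def explore(neighbors, sources, k):
    """Breadth-first exploration using k + 1 rounds of vertex queries.

    neighbors : function v -> list of neighbors of v (one call = one query)
    Returns (known adjacency lists, number of queries made in each round).
    """
    known, per_round = {}, []
    frontier = list(dict.fromkeys(sources))
    for _ in range(k + 1):
        batch = [v for v in frontier if v not in known]
        for v in batch:  # the whole batch is fixed before any answer is used
            known[v] = list(neighbors(v))
        per_round.append(len(batch))
        frontier = list(dict.fromkeys(
            u for v in batch for u in known[v] if u not in known))
    return known, per_round


def on_cycle_of_length(known, s, t):
    """Does s lie on a simple cycle with exactly t vertices, using seen edges only?"""
    adj = {}
    for v, nbrs in known.items():
        for u in nbrs:
            adj.setdefault(v, set()).add(u)
            adj.setdefault(u, set()).add(v)

    def extend(path):
        last = path[-1]
        if len(path) == t:
            return s in adj.get(last, ())
        return any(extend(path + [u]) for u in adj.get(last, ()) if u not in path)

    return t >= 3 and extend([s])


k = 2
six_cycle = lambda v: [(v - 1) % 6, (v + 1) % 6]      # a (2k+2)-cycle
seven_cycle = lambda v: [(v - 1) % 7, (v + 1) % 7]    # a (2k+3)-cycle
known6, rounds6 = explore(six_cycle, [0], k)
known7, rounds7 = explore(seven_cycle, [0], k)
found6 = on_cycle_of_length(known6, 0, 6)
found7 = on_cycle_of_length(known7, 0, 7)
```

On these two toy inputs the $(2k+2)$-cycle through the source is fully visible after $k + 1$ rounds, whereas one edge of the $(2k+3)$-cycle is never seen.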

Lower Bounds for Round-Adaptive Testers
In this subsection, we prove the following lemma, which roughly speaking shows that testing $(2k+3)$-cycle freeness is hard for $k$-round-adaptive testing algorithms.
In stark contrast, recall that Observation 5.4 shows that testing (2k + 2)-cycle freeness is easy for k-round-adaptive testing algorithms. Indeed, the proof of Theorem 5.1 follows by combining Observation 5.4 and Lemma 5.5 together.
Proof of Lemma 5.5. We will show a distribution of (2k + 3)-cycle free graphs, denoted Y, and a distribution of graphs that are "far" from being (2k + 3)-cycle free, denoted N , and prove that no (k, q)-round-adaptive testing algorithm can distinguish, with high probability, between Y and N . Loosely speaking, Y consists of all graphs whose vertices are covered via disjoint (2k + 4)-cycles, and N consists of all graphs whose vertices are covered via disjoint (2k + 3)-cycles.
More accurately, denote by $\mathcal{P}_{t,n,d}$ the subset of $n$-node graphs with maximum degree at most $d$ that are $t$-cycle-free. Let $\Sigma_{t,s}$ be the 2-regular graph on $st$ vertices made of $s$ disjoint $t$-cycles, $(v_1, \dots, v_t), (v_{t+1}, \dots, v_{2t}), \dots, (v_{(s-1)t+1}, \dots, v_{st})$. Denote also by $\mathrm{Is}_r$ the independent set on $r$ vertices. For two graphs $G, G'$ on respectively $m$ and $m'$ vertices and with $e$ and $e'$ edges, we write $G \sqcup G'$ for the graph on $m + m'$ vertices and with $e + e'$ edges obtained by concatenating disjoint copies of $G, G'$.
Proof. The first part is obvious, as the only cycles in $G^{\rm yes}_k$ are $(2k + 4)$-cycles. As for the second, it immediately follows from observing that $G^{\rm no}_k$ contains $\ell'$ disjoint $(2k + 3)$-cycles, and thus at least $\ell'$ edges have to be removed to make it $(2k + 3)$-cycle free. Thus, $\mathrm{dist}\left(G^{\rm no}_k, \mathcal{P}_{(2k+3),n,d}\right) \ge \frac{\ell'}{dn/2} = \Omega\!\left(\frac{1}{dk}\right) = \Omega_d(1)$.
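For concreteness, the graph $\Sigma_{t,s} \sqcup \mathrm{Is}_r$ and the distance bound above can be written out as follows. This is a sketch under our own assumptions: vertices are labeled $0, \dots, st + r - 1$, the function names are hypothetical, and the numbers in the example (including the choice of four cycles) are picked only for illustration.

```python
from fractions import Fraction


def disjoint_cycles_with_isolated(t, s, r):
    """Adjacency lists of s disjoint t-cycles followed by r isolated vertices."""
    adj = {v: [] for v in range(s * t + r)}
    for c in range(s):
        base = c * t
        for j in range(t):
            u, v = base + j, base + (j + 1) % t
            adj[u].append(v)
            adj[v].append(u)
    return adj


def distance_lower_bound(num_cycles, n, d):
    """One edge per disjoint cycle must change, out of at most d*n/2 edges."""
    return Fraction(num_cycles) / Fraction(d * n, 2)


# Illustrative numbers: k = 1, so cycles of length 2k + 3 = 5; n = 23 and d = 2.
no_instance = disjoint_cycles_with_isolated(t=5, s=4, r=3)
bound = distance_lower_bound(num_cycles=4, n=23, d=2)
```

With the number of cycles proportional to $n/k$, the ratio stays bounded below by a quantity of order $1/(dk)$, matching the displayed bound.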
Let $T$ be a deterministic testing algorithm with $k$ rounds of adaptivity and query complexity $q' = o(\sqrt{n})$. The following lemma concludes the proof of Lemma 5.5 by showing that $T$ cannot distinguish, with high probability, between graphs in $\mathcal{Y}$ and graphs in $\mathcal{N}$. Denote $T$'s (disjoint) query sets, per round, by $Q_0, \dots, Q_k \subseteq V$, where a query is a vertex $v$. Denote the corresponding sets of answers by $A_0, \dots, A_k$, where the answer to a query $v$ consists of the labels of all neighbors of $v$ (i.e., either two or zero vertices). Since $k = O(1)$, without loss of generality, we can assume (by padding) that all query sets have the same size $q := |Q_i| = \frac{q'}{k+1} = \Theta(q')$ for every $i \in \{0, \dots, k\}$. Moreover, we can also assume that no vertex is queried twice, i.e., that all $Q_i$'s are disjoint.
Proof. For $j \in \{0, \dots, k\}$, define by $\mathcal{Y}_j$ and $\mathcal{N}_j$ the distribution of $(A_0, \dots, A_j)$ when $G \sim \mathcal{Y}$ and when $G \sim \mathcal{N}$, respectively. We shall prove that $d_{\rm TV}(\mathcal{Y}_k, \mathcal{N}_k) \le \frac{1}{10}$, which by the data processing inequality will imply the claim of Lemma 5.7.
The high-level idea is that in each round, the tester can either query "fresh" vertices, of which it has no prior information, or query the boundaries (i.e., the direct neighbors) of previously queried vertices. Then, loosely speaking, we can argue that, on the one hand, if the total number of queries is $o(\sqrt{n})$, then both for graphs in $\mathcal{Y}$ and $\mathcal{N}$ all queries of "fresh" vertices (obtained during all rounds) with high probability would only fall into previously unattained disjoint cycles, in which case the answer would be a uniform sequence of "fresh" labels. On the other hand, the local view obtained by querying the boundary, using at most $k$ rounds of adaptive queries, of each vertex previously obtained via a "fresh" query (which by the above lies in a cycle wherein the tester has no information of the labels of the other vertices participating in this cycle) is isomorphic to the tail graph over fresh labels, both for instances taken from $\mathcal{Y}$ and $\mathcal{N}$ (that is, we do not have enough adaptive queries to observe a full cycle). The foregoing intuition is formalized below.

For $i \in \{0, \dots, k\}$, define $S^f_i$ and $S^b_i$ to be, respectively, the set of "entirely fresh" nodes queried at round $i$ (that is, nodes that are not neighbors of any previously queried node), and the set of "boundary nodes" (which are the not-yet-queried nodes that are neighbors of a previously queried node).
First, we bound the probability that any of the $q'$ queries made "hits" the set of disconnected nodes:

Proof. This follows by induction: at step $i$, conditioned on no isolated node having been queried yet, the algorithm has degree information about at most $3q \cdot i$ nodes, so there remain at least $n - 3kq$ nodes on which the algorithm has no degree information at all. Among these, there are $n - (2k + 4)\ell \le (2k + 4)$ (or $n - (2k + 3)\ell' \le (2k + 3)$, in the no-case) isolated nodes. By symmetry, this means that in the new batch of $q$ queries, the algorithm will query one of these isolated nodes with probability at most $1 - \left(1 - \frac{2k+4}{n - 3kq}\right)^{q} = o(1)$. Therefore, overall there will be an isolated node queried with probability at most $k \cdot o(1) = o(1)$.
Next, we argue that at each step, with overwhelming probability all the "fresh nodes" queried fall in distinct cycles, which have not been attained yet.
Claim 5.9. Let $E_2(G)$ denote the event that at some round $i$, one of the queries in $S^f_i$ belongs to the same cycle (either a $(2k + 4)$- or a $(2k + 3)$-cycle, depending on whether the graph is drawn from $\mathcal{Y}$ or $\mathcal{N}$) as one of the previous queries.

Proof. We will show that $\Pr_{G \sim \mathcal{Y}}[E_2(G)] = o(1)$; the no-case is similar. For $i \in \{1, \dots, k\}$, let $E^{(i)}_2(G)$ denote the event that at round $i$, one of the queries in $S^f_i$ belongs to the same cycle as a previous query, so that $E_2(G) = \bigcup_{i=1}^{k} E^{(i)}_2(G)$.

To conclude the proof, note that by the above, with probability $1 - o(1)$ neither $E_1$ nor $E_2$ occurs; that is, none of the isolated vertices was queried, and all the "fresh" queries (during all rounds) fell in previously unattained distinct cycles. In this case, at each round of adaptivity the algorithm can at most discover two new nodes out of every cycle it reached before (by including the one or two end nodes of the current "discovered portion" into $S^b_i$). Therefore, on any cycle ever reached, the $(k, q)$-round-adaptive testing algorithm can observe at most $2k + 2$ nodes (which then form a consecutive path). We show that this implies that the algorithm cannot distinguish between a no-instance and a yes-instance, as loosely speaking, in both cases its local view is of a tail graph over uniformly distributed fresh labels, and so it is unable to determine whether it belongs to a cycle of length $2k + 3$ or $2k + 4$.
To make the argument more precise, we will actually show a stronger statement; namely, we show that, conditioning on neither $E_1$ nor $E_2$ occurring, a simulator with no access to the graph can answer the queries of the testing algorithm in a way that is indistinguishable from the tuple of answers obtained from querying a graph distributed according to either $\mathcal{Y}$ or $\mathcal{N}$. This simulator operates as follows: at round $i$,
1. Order (arbitrarily) all the nodes of $Q_i$: $v_1, \dots, v_q$, and initialize the set $U_1$ of available-to-sample nodes.
2. Do sequentially the following, for $s = 1, \dots, q$:
• if $v_s \in S^f_i$ (fresh node: no previous neighbors known), pick uniformly at random two distinct nodes $u, u'$ in $U_s$ and return them as answers (i.e., declare them as neighbors of $v_s$);
• otherwise, $v_s \in S^b_i$ (boundary node: exactly one already known neighbor, call it $u$): pick uniformly at random one other node $u'$ in $U_s$, and return $(u, u')$ as answers.

It is straightforward to verify that, since we conditioned on neither $E_1$ nor $E_2$ occurring, this simulates exactly the same distribution over nodes (over the choice of $G$); since this is the same both for $\mathcal{Y}$ and $\mathcal{N}$, we get that $d_{\rm TV}\big((\mathcal{Y}_k \mid \overline{E_1 \cup E_2}), (\mathcal{N}_k \mid \overline{E_1 \cup E_2})\big) = 0$, which combined with Claim 5.8 and Claim 5.9 finishes the proof.
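One round of such a simulator can be sketched as follows. This is only an illustration: since the precise definition of the sets $U_s$ is not recoverable here, we assume they consist of the labels that have not yet appeared in any query or answer, and the names `simulate_round`, `unseen` and `pending` are ours.

```python
import random


def simulate_round(queries, state, rng):
    """Answer one round of vertex queries with fresh labels, without any graph.

    state["unseen"]  : labels not yet used in any query or answer (assumed U_s)
    state["pending"] : boundary node -> its single already-known neighbor
    """
    unseen, pending = state["unseen"], state["pending"]
    unseen.difference_update(queries)  # queried labels can no longer be sampled
    answers = {}
    for v in queries:
        if v in pending:  # boundary node: one neighbor is already known
            known_neighbor = pending.pop(v)
            (fresh,) = rng.sample(sorted(unseen), 1)
            answers[v], new_labels = (known_neighbor, fresh), [fresh]
        else:  # fresh node: both neighbors are new labels
            a, b = rng.sample(sorted(unseen), 2)
            answers[v], new_labels = (a, b), [a, b]
        for label in new_labels:
            unseen.discard(label)
            pending[label] = v
    return answers


rng = random.Random(0)
state = {"unseen": set(range(100)), "pending": {}}
first = simulate_round([0, 50], state, rng)   # two fresh queries
boundary = first[0][0]                        # a neighbor revealed for node 0
second = simulate_round([boundary], state, rng)
```

The simulator never needs to know whether the underlying cycles have length $2k+3$ or $2k+4$, which is the point of the argument.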
This concludes the proof of Lemma 5.5.

On Simulating k Rounds With Fewer
As mentioned in the beginning of Section 5, in the Boolean setting any adaptive property testing algorithm can be simulated non-adaptively with only an exponential blowup in the query complexity. Phrased differently, this implies that any property of Boolean functions which admits a $(k, q)$-round-adaptive tester also has a $(0, 2^q - 1)$-round-adaptive tester.
This raises the following more general question: let $\mathcal{P} = \bigcup_n \mathcal{P}_n$ be a property of Boolean functions, such that there exists a $(k, q)$-round-adaptive tester for $\mathcal{P}$. For $\ell < k$, what upper bound can we obtain on the query complexity $q'$ of the best $(\ell, q')$-round-adaptive tester for $\mathcal{P}$?
Denoting by $q_\ell$ this query complexity, the above discussion immediately implies:

Fact 6.1. For any $0 \le \ell \le k$, one has $q_k \le q_\ell \le 2^{q_k} - 1$.

In what follows, we provide an example of a more fine-grained version of this fact, in the case when $\ell = k - 1$ (that is, one wishes to reduce the number of rounds of adaptivity by one).
Proof. Let $T_k$ be a $(k, q)$-round-adaptive tester for $\mathcal{P}$, which can be viewed as a distribution over deterministic algorithms. Thus, it is sufficient to explain how to simulate any deterministic algorithm with $k$ rounds of adaptivity by one with $\ell$ rounds. Fix such a $(k, q)$-round deterministic algorithm: this can be seen equivalently as a depth-$(k + 1)$ binary tree, where each internal node $v$ is labeled by the set of queries $Q_v$ made at that stage, and the leaves are either accept or reject. By assumption, we have that on each path $(v_0, v_1, \dots, v_k, v^*)$ from the root to a leaf, $\sum_{j=0}^{k} |Q_{v_j}| \le q$; moreover, one can assume without loss of generality that this is an equality.
The idea is then to contract, on any path, two consecutive nodes as follows: instead of querying $Q_{v_j}$, receiving the answers, and then querying the (adaptively chosen) set $Q_{v_{j+1}}$, one can instead query simultaneously $Q_{v_j}$ and the union of all possible sets $Q_{v_{j+1}}$: since the latter depends only on the previous queries, and the only unknown answers are those to the queries in $Q_{v_j}$, there are at most $2^{|Q_{v_j}|}$ possibilities for $Q_{v_{j+1}}$. As clearly no matter what $Q_{v_{j+1}}$ would be, its size is at most $q$, the set $Q'_j = Q_{v_j} \cup \bigcup_{Q \,:\, \text{possible } Q_{v_{j+1}}} Q$ queried has size at most $|Q_{v_j}| + q 2^{|Q_{v_j}|}$. Thus, by contracting the two rounds $j$ and $j + 1$, one incurs an additional number of queries upper bounded by $q 2^{|Q_{v_j}|} - |Q_{v_{j+1}}| \le q 2^{|Q_{v_j}|}$.

By an averaging argument, since on every such path we have $\sum_{j=0}^{k} |Q_{v_j}| = q$, there must exist an index $j^*$ such that $|Q_{v_{j^*}}| \le \frac{q}{k+1}$. Since we would like to "contract" rounds $j^*$ and $j^* + 1$ into a single round, we additionally want to ensure $j^* < k$. But similarly, as $\sum_{j=0}^{k-1} |Q_{v_j}| \le q$, averaging over the first $k$ rounds only yields such an index with $|Q_{v_{j^*}}| \le \frac{q}{k}$. We then get an index $i^* < k$ (which depends on the path taken down the tree) to which we can apply the above transformation. That is, whenever the deterministic algorithm is executed it will reach an index $i^* < k$ where it should make $|Q_{v_{i^*}}| \le \frac{q}{k}$ queries. At that point, it makes instead these queries, along with all queries this should have triggered at the next round, and thus is able to skip round $i^* + 1$ at the price of an additional (at most) $q 2^{q/k}$ queries.
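The contraction of two consecutive rounds can be illustrated by the following sketch, which assumes Boolean answers and represents the adaptive choice of the next round by a hypothetical callback `next_queries`; neither name comes from the paper.

```python
from itertools import product


def contract_round(first_queries, next_queries):
    """Merge rounds j and j + 1 of a deterministic Boolean-query algorithm.

    first_queries : list of queries made in round j
    next_queries  : function mapping a tuple of answer bits (one per query in
                    first_queries) to the set of queries made in round j + 1
    Returns one query set covering every possible continuation.
    """
    merged = set(first_queries)
    for answers in product((0, 1), repeat=len(first_queries)):
        merged |= set(next_queries(answers))
    return merged


# Toy example: two queries in round j, then a single query that depends on both answers.
first = ["a", "b"]
merged = contract_round(first, lambda ans: {"c%d%d" % ans})
```

In this toy case the merged round has $2 + 2^2 = 6$ queries, in line with the bound $|Q_{v_j}| + q 2^{|Q_{v_j}|}$.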
Remark 6.3. Note that in the above proof, while one can assume without loss of generality that the algorithm always makes exactly $q$ queries, one cannot however assume that for any two such paths the sizes $|Q_{v_j}|$ agree at every round $j$. That is, the number of queries made in round $j$ may not be the same depending on the path followed down by the algorithm, but instead depend adaptively on the previous queries made.
The above remark shows the difficulty in extending the proof of Proposition 6.2 further than a single round. If one is willing to assume that the number of queries at each round is non-adaptive, it becomes possible to obtain a more general statement for $0 \le \ell < k$; however, it is unclear how to proceed without this extra assumption, leading to the following question:

Open Problem 3. Can one obtain a general round-reduction upper bound for $0 \le \ell < k$ of the form $q_\ell \le \varphi(q_k, \ell, k)$, improving on Fact 6.1 for $\ell > 0$?

On the Connection with Communication Complexity
As exemplified in the proof of Lemma 4.4, there exists a striking parallel between the notion of k-round-adaptive testing algorithms, and that of k-round protocols in communication complexity. In this section, we make this parallel rigorous, and give a blackbox reduction between the two that one can leverage to establish lower bounds on k-round-adaptive testing.
In more detail, we build on the communication complexity methodology for proving property testing lower bounds due to [BBM12] (more precisely, to the general formulation of this methodology as laid out in [Gol13]). Although the results stated there hold for non-adaptive lower bounds (in the case of one-way communication or simultaneous message passing) or fully adaptive lower bounds in property testing (in the case of two-way communication), it is easy to obtain their counterpart for $k$-round-adaptive testers, given in Theorem 6.4 below. But first, we need to recall some notations.
In what follows, for a property $\mathcal{P}$, integer $k$, and parameters $\varepsilon, \delta \in [0, 1]$, we write $Q^{(k)}_\delta(\varepsilon, \mathcal{P})$ for the minimum query complexity of any $k$-round-adaptive tester for $\mathcal{P}$ with error probability $\delta$ and distance parameter $\varepsilon$. Given a communication complexity predicate $F$, we let $\mathrm{CC}^{(k)}_\delta(F)$, $\overrightarrow{\mathrm{CC}}_\delta(F)$, and $\overleftarrow{\mathrm{CC}}_\delta(F)$ denote the minimum communication complexity of a public-coin protocol for $F$ with error $\delta$ in (i) $k$ rounds, (ii) one-way from Alice to Bob, and (iii) one-way from Bob to Alice, respectively (note that the case $\delta = 0$ then corresponds to protocols with perfect completeness).
Proof. The proof will be identical to that of [Gol13, Theorem 3.1], where we only need to check that Alice and Bob can each simulate the execution of the property testing algorithm (using their public random coins), answering the queries made to F (x, y) while preserving the number of rounds.
Running the testing algorithm, Alice first sends the bits allowing Bob to compute the answers to the first $q_0$ queries, using her input $x$ and the one-way protocols for the relevant $F_i$'s. Bob then answers with the $q_0$ bits corresponding to the answers he computed, as well as the bits allowing Alice to compute the answers to the next $q_1$ queries made by the tester, using now his input $y$ and the one-way protocols for the relevant $F_i$'s. They do so for $k + 1$ rounds of communication in total, until the last player to receive a message gets from the other player both the answers to the queries in $Q_{k-1}$ as well as the bits needed to compute (given their own input) the answers to the last $q_k$ queries. At that point, it only remains to use a last round of communication (the $(k + 2)$'nd) to communicate to the other player the answers to these last $q_k$ queries, so that both Alice and Bob can finish running their copy of the testing algorithm and know the answer. Note that the number of bits communicated at round $1 \le i \le k + 2$ is by definition of $B$ (resp. $B'$) at most $B \cdot q_{i-1} + q_{i-2}$ (resp. $B' \cdot q_{i-1} + q_{i-2}$), so that at most $(B + 1)q$ (resp. $(B' + 1)q$) bits are communicated in total. This concludes the proof.
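The final count can be checked by unrolling the sum. As a convention introduced here only for the boundary rounds (it is not notation from the text), set $q_{-1} = q_{k+1} = 0$ and recall that $q = \sum_{j=0}^{k} q_j$; the case of $B'$ is identical.

```latex
\sum_{i=1}^{k+2} \bigl( B \cdot q_{i-1} + q_{i-2} \bigr)
  = B \sum_{j=0}^{k+1} q_j + \sum_{j=-1}^{k} q_j
  = B \sum_{j=0}^{k} q_j + \sum_{j=0}^{k} q_j
  = (B + 1)\, q .
```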
To illustrate the above methodology, we show how it can be leveraged to prove a hierarchy of lower bounds on the power of $k$-adaptive testers for testing a very fundamental class of Boolean functions, that of $m$-linear functions.⁸

Proposition 6.5. Let $\mathrm{PAR}^n_s \subseteq 2^{2^n}$ denote the class of parities of size $s$ (over $n$ variables), and fix $m := \frac{\sqrt{n}}{2}$. Then, for any $0 \le k \le \log^* m - 2$, any $(k, q)$-round-adaptive tester for $\mathrm{PAR}^n_{2m}$ must satisfy $q = \Omega\!\left(m \log^{(k+2)} m\right)$.
Proof. We will rely on a result of Sağlam and Tardos [ST13], which implies the following (tight) lower bound on the communication complexity of sparse set-disjointness ($\mathrm{DISJ}^n_m$, where both inputs $x, y \in \{0, 1\}^n$ are promised to have Hamming weight $m$):

Theorem (Corollary of [ST13, Theorem 4]). For any $1 \le k \le \log^* m$, any $k$-round probabilistic protocol for $\mathrm{DISJ}^{4m^2}_m$ with error probability at most $1/3$ must have communication $\Omega\!\left(m \log^{(k)} m\right)$.

On the Relative Power of Round- and Tail-Adaptive Testers
In this section, we show that the two notions of round- and tail-adaptive testers we introduced are not equivalent. As mentioned in Section 3, while round-adaptive testers are at least as powerful as tail-adaptive ones, there exist properties for which the separation is strict:

Theorem 6.6. Fix any $\alpha \in (0, 1)$. There exists a constant $\beta \in (0, 1)$ such that, for every $n \in \mathbb{N}$, the following holds. For every integer $0 \le k \le n^\beta$, there exists a property $\mathcal{P}_k \subseteq \mathbb{F}_n^{n^{1+\alpha}}$ such that, for any constant $\varepsilon \in (0, 1]$, (i) there exists a $(k, \tilde{O}(k))$-round-adaptive (one-sided) tester for $\mathcal{P}_k$; yet (ii) any $(k, q)$-tail-adaptive (two-sided) tester for $\mathcal{P}_k$ must satisfy $q = \Omega(n)$.
Proof sketch. The argument is very similar to that of Theorem 4.1, and follows the same overall structure. Namely, we slightly modify the $k$-iterated function $f_k$ of Section 4 (which was computable by a $(k, k + 1)$-tail-adaptive algorithm) to rule out tail-adaptive algorithms but not round-adaptive ones: that is, we define the function $f'_k\colon \mathbb{F}_n^n \to \mathbb{F}_n$ by
$$f'_k(x) = \begin{cases} 1 & \text{if } x_{g_{k-1}(x)} = x_{g_{k-1}(x) + 1 \bmod n} \\ 0 & \text{otherwise.} \end{cases}$$
(Perhaps more clearly, $f'_k$ is computed by iterating the pointer function $k$ times, and then checking if the value $x_i$ at the final coordinate $i \in [n]$ reached, and the value $x_{i+1}$ at the adjacent coordinate $i + 1$, are equal.) It is not hard to see that the counterparts of Claim 4.3 and Lemma 4.4 still hold for $f'_k$: first, the function is still easy to compute by $(k, k + 2)$-round-adaptive algorithms. However, because the very last round requires 2 queries and not one (to query $x_i$ and $x_{i+1}$, once the value of $i = g_{k-1}(x)$ has been obtained), tail-adaptive algorithms are no longer able to leverage this, and analogously to Lemma 4.4 we can conclude that there is no $(k, o(n/(k^2 \log n)))$-tail-adaptive (randomized) LDT algorithm which computes $f'_k$. It then only remains to lift this DT separation to property testing: we can do this as before (noting, in the case of lifting the lower bound, that the reduction of Lemma 4.10 preserves the number of queries per round, and thus the "tailness" of the algorithm).

⁸ [...] to Open Problem 1, although one rather weak quantitatively. It also, as a special case, would separate adaptive and non-adaptive testing of $m$-linearity for $m = o(n)$, a longstanding open question [BK12,BCK14].
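The round structure of the natural algorithm for $f'_k$ can be made explicit with a small sketch. The indexing conventions are assumptions on our part: we start the pointer walk at coordinate $0$, follow it $k$ times, and use 0-based coordinates; the class `CountingOracle` and the function `f_prime` are hypothetical names.

```python
class CountingOracle:
    """Query access to x that records how many queries each round makes."""

    def __init__(self, x):
        self._x = x
        self.rounds = []  # rounds[r] = number of queries issued in round r

    def query_round(self, indices):
        self.rounds.append(len(indices))
        return [self._x[i] for i in indices]


def f_prime(oracle, n, k, start=0):
    """Follow the pointer k times (one query per round), then compare two adjacent entries."""
    i = start
    for _ in range(k):
        (i,) = oracle.query_round([i])
    a, b = oracle.query_round([i, (i + 1) % n])  # the last round needs two queries
    return 1 if a == b else 0


oracle = CountingOracle([2, 0, 3, 3])
value = f_prime(oracle, n=4, k=2)          # walk: 0 -> 2 -> 3, compare x[3] and x[0]
other = f_prime(CountingOracle([2, 0, 3, 2]), n=4, k=2)
```

The run uses $k + 1$ rounds and $k + 2$ queries in total, with two queries in the final round, which is exactly the feature that the tail-adaptive model does not allow.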