Communication Complexity with Small Advantage

We study problems in randomized communication complexity when the protocol is only required to attain some small advantage over purely random guessing, i.e., it produces the correct output with probability at least ϵ greater than one over the codomain size of the function. Previously, Braverman and Moitra (in: Proceedings of the 45th symposium on theory of computing (STOC), ACM, pp 161–170, 2013) showed that the set-intersection function requires Θ(ϵn) communication to achieve advantage ϵ. Building on this, we prove the same bound for several variants of set-intersection: (1) the classic "tribes" function obtained by composing with And (provided 1/ϵ is at most the width of the And), and (2) the variant where the sets are uniquely intersecting and the goal is to determine partial information about (say, certain bits of the index of) the intersecting coordinate.


Introduction
In randomized communication complexity, protocols are commonly required to succeed with probability at least some constant less than 1, such as 3/4. Achieving success probability one over the codomain size of the function is trivial by outputting a uniformly random guess. There is a spectrum of complexities between these extremes, where we require a protocol to achieve success probability greater than one over the codomain size, i.e., advantage ϵ. We study the fine-grained question "How does the communication complexity of achieving advantage ϵ depend on ϵ?" Formally, for a two-party function F, let R_p(F) denote the minimum worst-case communication cost of any randomized protocol (with both public and private coins) that is p-correct in the sense that for each input (X, Y) in the domain of F, it outputs F(X, Y) with probability at least p.
(We provide a proof of the Gap-Hamming upper bound in Section A since it does not appear in the above references.) Hence, it is naturally interesting to study the dependence of the complexity on ϵ for different important functions, in order to build a more complete understanding of randomized communication. For functions with codomain size greater than 2, small-advantage protocols are not even amenable to amplification, so no lower bounds for them follow a priori from lower bounds for higher-advantage protocols.
The functions we study are defined using composition. Letting g : X × Y → {0, 1} be a two-party total function (usually called a gadget), and f : {0, 1}^n → {0, 1} be a (possibly partial) function, the two-party composed (possibly partial) function f ∘ g^n : X^n × Y^n → {0, 1} is defined by (f ∘ g^n)(x, y) := f(g(x_1, y_1), ..., g(x_n, y_n)). In particular, Set-Inter_n := Or_n ∘ And_2^n is the set-intersection function, and Tribes_{ℓ,m} := And_ℓ ∘ Or_m ∘ And_2^{ℓm} is the tribes function, whose input we view as an ℓ × m table M of And_2 gadgets. Theorem 1.1 states that R_{1/2+ϵ}(Tribes_{ℓ,m}) = Θ(ϵℓm) provided ϵℓ ≥ 1. For the upper bound, the protocol samples 4ϵℓ uniformly random rows of M and, using O(m) communication per row, rejects iff some sampled row fails to contain at least one 1. For a 1-input, this rejects with probability 0, and for a 0-input it finds an all-0 row (and hence rejects) with probability at least 4ϵ. Now if we modify the above protocol so it rejects automatically with probability 1/2 − ϵ and otherwise proceeds as before, then it rejects 1-inputs with probability 1/2 − ϵ and 0-inputs with probability at least (1/2 − ϵ) + (1/2 + ϵ) · 4ϵ ≥ 1/2 + ϵ. The provision ϵℓ ≥ 1 was stated cleanly to ensure that we can round 4ϵℓ up to an integer without affecting the asymptotic complexity. (If ϵ ≤ o(1/ℓ) then just evaluating a single row of M takes ω(ϵℓm) communication.) The lower bound, which we prove in Section 2, does not require this provision.
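The rejection-probability calculation for this row-sampling protocol can be checked with exact arithmetic. The following sketch is our own illustration (the parameter choices ϵ = 1/8, ℓ = 8 and the helper names are ours, not from the paper); it samples rows without replacement and computes the blended protocol's exact rejection probability:

```python
from fractions import Fraction
from math import ceil, comb

def find_prob(num_rows, num_all_zero_rows, sample_size):
    """Probability that a uniform subset of sample_size rows contains
    at least one of the all-0 rows (sampling without replacement)."""
    miss = Fraction(comb(num_rows - num_all_zero_rows, sample_size),
                    comb(num_rows, sample_size))
    return 1 - miss

def reject_prob(eps, num_rows, num_all_zero_rows):
    """Rejection probability of the blended protocol: reject outright with
    probability 1/2 - eps, otherwise reject iff a sampled row is all-0."""
    t = ceil(4 * eps * num_rows)  # number of sampled rows; eps*num_rows >= 1 lets us round up
    p = find_prob(num_rows, num_all_zero_rows, t)
    return (Fraction(1, 2) - eps) + (Fraction(1, 2) + eps) * p

eps, ell = Fraction(1, 8), 8
# 1-input (no all-0 row): rejects with probability exactly 1/2 - eps
assert reject_prob(eps, ell, 0) == Fraction(1, 2) - eps
# 0-input (at least one all-0 row): rejects with probability >= 1/2 + eps
assert reject_prob(eps, ell, 1) >= Fraction(1, 2) + eps
```

With these parameters a 0-input is rejected with probability 11/16, comfortably above 1/2 + ϵ = 5/8.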
Our basic approach to prove the lower bound in Theorem 1.1 is to combine the information complexity techniques of Braverman & Moitra (2013) (developed for the ϵ-advantage lower bound for Set-Inter) with the information complexity techniques of Jayram et al. (2003) (developed for the constant-advantage lower bound for Tribes). However, in trying to combine these techniques, there are a variety of technical hurdles, which require several new ideas to overcome. In Section 1.4 we discuss why other approaches fail to prove Theorem 1.1.

What if ϵℓ ≤ o(1)?
As mentioned above, when ϵℓ ≤ o(1), our proof of the O(ϵℓm) upper bound for Tribes_{ℓ,m} breaks down. So what upper bound can we give in this case? Let us restrict our attention to ℓ = 2 (and let ϵ > 0 be arbitrary).
First of all, notice that the communication protocol in Section 1.1 is actually a query complexity (a.k.a. decision tree complexity) upper bound for the outer function. A communication protocol for any composed function (with constant-size gadget) can simulate a decision tree for the outer function, using constant communication to evaluate the output of each gadget when queried by the decision tree. In the next paragraph, we describe an O(√ϵ · m)-query ϵ-advantage randomized decision tree for And_2 ∘ Or_m^2 (thus showing that R_{1/2+ϵ}(Tribes_{2,m}) ≤ O(√ϵ · m) provided √ϵ · m ≥ 1). Say the input is z = (z_1, z_2) ∈ {0, 1}^m × {0, 1}^m. Consider the following randomized decision tree: pick S_1, S_2 ⊆ [m] both of size 2√ϵ · m, independently uniformly at random, and accept iff z_1|_{S_1} and z_2|_{S_2} each contain at least one 1. For a 1-input, each of these two events happens with probability at least 2√ϵ, so they happen simultaneously with probability at least 4ϵ. For a 0-input, one of the two events never happens, and hence, this accepts with probability 0. Now if we modify the above randomized decision tree so it accepts automatically with probability 1/2 − ϵ and otherwise proceeds as before, then it accepts 0-inputs with probability 1/2 − ϵ and 1-inputs with probability at least (1/2 − ϵ) + (1/2 + ϵ) · 4ϵ ≥ 1/2 + ϵ, and queries at most O(√ϵ · m) bits. We conjecture that this communication upper bound is tight, i.e., R_{1/2+ϵ}(Tribes_{2,m}) ≥ Ω(√ϵ · m). This remains open, but we at least prove the query complexity version of this conjecture, which can be construed as evidence for the communication version. (The query complexity measure R^dt_p(f) is defined in the natural way.) We prove the lower bound of Theorem 1.2 in Section 3. There are some known powerful "simulation theorems" (e.g., Göös et al. 2017) for converting query lower bounds for an outer function into matching communication lower bounds for a composed function; however, we lack a simulation theorem powerful enough to convert Theorem 1.2 into a communication lower bound. Furthermore, we have not found a way to emulate the query lower bound proof with information complexity tools to get a communication lower bound.
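The advantage calculation for this randomized decision tree can likewise be verified exactly. This is our own illustrative check (the concrete parameters m = 64, ϵ = 1/64, |S_i| = 16 are hypothetical); it evaluates the blended tree's acceptance probability on the hardest 1-inputs, where each half z_i contains exactly one 1:

```python
from fractions import Fraction

def blended_accept(eps, hit1, hit2):
    """Acceptance probability of the tree that accepts outright with
    probability 1/2 - eps and otherwise accepts iff both sampled
    restrictions contain a 1 (hit probabilities hit1, hit2, independent
    since S_1 and S_2 are chosen independently)."""
    return (Fraction(1, 2) - eps) + (Fraction(1, 2) + eps) * hit1 * hit2

# hypothetical concrete parameters: m = 64, eps = 1/64, so |S_i| = 2*sqrt(eps)*m = 16
m, eps, size = 64, Fraction(1, 64), 16
hit = Fraction(size, m)            # P[the unique 1 of z_i is queried] = |S_i|/m = 2*sqrt(eps)
assert hit ** 2 >= 4 * eps         # the two events together have probability >= 4*eps
assert blended_accept(eps, hit, hit) >= Fraction(1, 2) + eps   # 1-inputs
assert blended_accept(eps, 0, hit) == Fraction(1, 2) - eps     # 0-inputs: one event never happens
```

Here the 1-input acceptance probability is 529/1024, just above 1/2 + ϵ = 528/1024, matching the tightness of the calculation in the text.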

Which part contains the intersecting coordinate?
We now turn our attention away from Tribes.
Suppose Alice and Bob are given uniquely intersecting subsets X and Y from a universe of size n that is partitioned into ℓ ≥ 2 equal-size parts, and they wish to identify which part contains the intersection. Of course, they can succeed with probability 1/ℓ by random guessing without communicating about their sets. To do better, they can use, e.g., the following protocol: pick 2ϵℓ of the parts uniformly at random and exchange their sets restricted to those parts; if this reveals the intersection, output its part, and otherwise output a uniformly random part. This costs O(ϵℓ · (n/ℓ)) = O(ϵn) communication and succeeds with probability at least 1/ℓ + 2ϵ(1 − 1/ℓ) ≥ 1/ℓ + ϵ.
We state this using the following notation. Define the partial function Which_ℓ : {0, 1}^ℓ → [ℓ] that takes a string of Hamming weight 1 and outputs the coordinate of the only 1. Define the "unambiguous-or" function Unambig-Or_m as Or_m restricted to the domain of strings of Hamming weight 0 or 1. Define the "unambiguous-set-intersection" function Unambig-Inter_m := Unambig-Or_m ∘ And_2^m.
Theorem 1.3. R_{1/ℓ+ϵ}(Which_ℓ ∘ Unambig-Inter_m^ℓ) = Θ(ϵℓm) provided ϵm ≥ 1.
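These definitions transcribe directly into code. The following is our own literal rendering (function and variable names are ours), with the promise conditions enforced as assertions:

```python
def which(bits):
    """Which_l: defined only on 0/1 strings of Hamming weight exactly 1;
    returns the (1-based) coordinate of the unique 1."""
    assert bits.count(1) == 1          # promise: Hamming weight exactly 1
    return bits.index(1) + 1

def unambig_or(bits):
    """Or_m restricted to strings of Hamming weight 0 or 1."""
    assert bits.count(1) <= 1          # promise: Hamming weight 0 or 1
    return int(any(bits))

def unambig_inter(x, y):
    """Unambig-Inter_m = Unambig-Or_m composed with And_2 gadgets: the sets
    x, y (characteristic vectors) are promised to intersect in <= 1 place."""
    ands = [a & b for a, b in zip(x, y)]
    return unambig_or(ands)

assert which([0, 0, 1, 0]) == 3
assert unambig_inter([1, 0, 1, 0], [0, 1, 1, 0]) == 1  # unique intersection at coordinate 3
assert unambig_inter([1, 0, 0, 0], [0, 1, 0, 0]) == 0  # disjoint sets
```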
We prove the lower bound in Section 4, where we also describe some ways to reinterpret Theorem 1.3. This problem is motivated partly as a simple variant of set-intersection whose communication complexity was not fully understood before, and partly because of the corollaries presented in Section 4, which show the maximum possible gap between randomized small-advantage complexity and so-called SV-nondeterminism, and between sampling from distributions with/without the ability to condition on an event.
The key to the proof is in relating the complexity of Which_ℓ ∘ F^ℓ to the complexity of F (for an arbitrary two-party F with Boolean output). It is natural to conjecture that the complexity goes up by roughly a factor of ℓ after composition with Which_ℓ; this is an alternative form of direct sum problem. In the standard direct sum setting, the goal is to evaluate F on each of ℓ independent inputs; our form is equivalent but under the promise that one of the inputs evaluates to 1 and the rest to 0. Thus, proving the direct sum conjecture (factor ℓ increase in complexity) appears qualitatively harder in our setting than in the standard setting. We show an information complexity version of the conjecture, and we combine this with Braverman & Moitra (2013) to derive Theorem 1.3.
For worst-case communication, we at least show that the complexity does not go down after composition with Which_ℓ. In particular, this yields a simple proof of a communication lower bound due to Klauck (2003) which implies the communication complexity class separation UP ∩ coUP ⊄ BPP. The proof in Klauck (2003) is technically somewhat involved, exploiting a "fine-tuned" version of Razborov's corruption lemma (Razborov 1992); our simple proof of the same lower bound is by a black-box reduction to the standard (constant-advantage) lower bound for Unambig-Inter.

Related work.
We now describe why the Ω(ϵℓm) lower bound in Theorem 1.1 does not follow straightforwardly from known results. First of all, applying standard majority-amplification to the known Ω(ℓm) lower bound for constant advantage only yields an Ω(ϵ²ℓm) lower bound. What about the "and-amplification" technique used to give a simplified proof of the tight ϵ-advantage lower bound for Set-Inter? Let us summarize this technique as applied to the complement function Set-Disj: running an ϵ-advantage protocol O(1/ϵ) times, and accepting iff all runs accept, yields a so-called SBP-type protocol, for which the complexity is characterized by the corruption bound. Hence, the ϵ-advantage complexity is always at least Ω(ϵ) times the corruption bound (which is Ω(n) for Set-Disj_n by Razborov (1992)). Applied to Tribes_{ℓ,m} (or its complement), the and-amplification technique can only yield an essentially Ω(ϵ · max(ℓ, m)) lower bound, since Tribes_{ℓ,m} has an O(ℓ log m)-communication nondeterministic (in particular, SBP-type) protocol and an O(m + log ℓ)-communication conondeterministic (in particular, coSBP-type) protocol. What about the smooth rectangle bound (Harsha & Jain 2013)? The smooth rectangle bound in general characterizes the complexity of so-called WAPP-type protocols (Jain & Klauck 2010). Thus, if we could "amplify" an ϵ-advantage protocol into a (sufficiently-large-constant-advantage) WAPP-type protocol with o(1/ϵ²) factor overhead, then for Tribes_{√n,√n} we would get a nontrivial ϵ-advantage lower bound. However, the smooth rectangle lower bound for Gap-Hamming (Chakrabarti & Regev 2012) shows that this cannot always be done, i.e., an Ω(1/ϵ²) overhead is sometimes necessary (at least for general partial functions). In summary, the known "rectangle-based" lower bound techniques fail to yield Theorem 1.1, so we use an information complexity approach instead.
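The reason majority-amplification loses a factor of ϵ² (not ϵ) is that the advantage of a majority vote over n runs grows only like √n · ϵ, so reaching constant advantage needs Θ(1/ϵ²) repetitions. This can be seen numerically; the following is our own exact-arithmetic illustration (not from the paper):

```python
from fractions import Fraction
from math import comb

def majority_success(p, n):
    """P[majority of n independent p-correct runs is correct] (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

eps = Fraction(1, 50)
p = Fraction(1, 2) + eps
assert majority_success(p, 1) == p          # sanity check: one run has advantage eps
# After 9 repetitions the advantage has grown only ~sqrt(9) = 3 times ...
adv9 = majority_success(p, 9) - Fraction(1, 2)
assert 2 * eps < adv9 < 3 * eps
# ... so boosting advantage eps to a constant costs Theta(1/eps^2) repetitions,
# which is why this route only recovers an Omega(eps^2 * l * m) lower bound.
```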
There is some other work related to the Tribes lower bound. The paper Razborov & Sherstov (2010) proved that if the definition of R is changed to allow only private coins (no public coins) then R_{1/2+ϵ}(Tribes_{ℓ,ℓ²}) ≥ Ω(ℓ) for all ϵ > 0 (no matter how small). The original constant-advantage lower bound for Tribes (Jayram et al. 2003) spawned a line of research on the communication complexity of read-once formulas (Göös & Jayram 2016; Jain et al. 2010; Jayram et al. 2009; Leonardos & Saks 2010). A multi-party version of Tribes has also been studied in the message-passing model (Chattopadhyay & Mukhopadhyay 2015).
Regarding the problem of finding which part contains the unique intersecting coordinate, we mention that there is some prior work studying a peripherally related topic: the randomized complexity of "finding the exact intersection" (Braverman et al. 2013; Brody et al. 2014, 2016), albeit not restricting the size of the intersection. One of our corollaries of Theorem 1.3 concerns the communication complexity of problems where the goal is to sample from a distribution. Not much is known about this topic, and the existing works focus on problems where there is no input and the goal is to sample random input-output pairs of a function (Ambainis et al. 2003; Jain et al. 2013). The complexity of "small-advantage sampling" was studied in the context of time-bounded computation in Watson (2014).

For our proofs it is convenient to have the And_2 gadget replaced by a different two-party gadget, namely the equality function on trits 3Eq : {0, 1, 2} × {0, 1, 2} → {0, 1}. This is because 3Eq reduces to Unambig-Or_3 ∘ And_2^3 (with Alice and Bob both mapping their trit to its characteristic bit vector of Hamming weight 1), and thus Unambig-Or_m ∘ 3Eq^m reduces to Unambig-Or_{3m} ∘ And_2^{3m}, and Or_m ∘ 3Eq^m reduces to Or_{3m} ∘ And_2^{3m}.

We now mention some notational conventions. We use P for probability, E for expectation, H for Shannon entropy, I for mutual information, D for relative entropy, and Δ for statistical (total variation) distance. We use bold letters to denote random variables, and non-bold letters for particular outcomes. We use ∈_u to denote that a random variable is distributed uniformly over some set.
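The reduction from 3Eq to Unambig-Or_3 ∘ And_2^3 described above can be checked exhaustively over all nine trit pairs (this check, with our own helper names, is purely illustrative):

```python
from itertools import product

def and2(a, b):
    return a & b

def unambig_or3(bits):
    assert bits.count(1) <= 1          # promise: Hamming weight 0 or 1
    return int(any(bits))

def eq3(x, y):
    """3Eq: equality of two trits."""
    return int(x == y)

def char_vec(t):
    """Characteristic 0/1 vector of a trit: length 3, Hamming weight 1."""
    return [int(t == i) for i in range(3)]

# Alice and Bob each map their trit to its characteristic vector; coordinatewise
# And_2 then has Hamming weight <= 1, so the Unambig-Or_3 promise holds:
for x, y in product(range(3), repeat=2):
    gadget_outputs = [and2(a, b) for a, b in zip(char_vec(x), char_vec(y))]
    assert unambig_or3(gadget_outputs) == eq3(x, y)
```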
All protocols Π are randomized and have both public and private coins, unless otherwise stated, and we use CC (Π) to denote the worst-case communication cost. When we speak of an arbitrary F , by default it is assumed to be a two-party partial function. Also, complexity class names (such as BPP) refer to classes of (families of) two-party partial functions with polylogarithmic communication protocols of the relevant type.

Communication lower bound for tribes
The upper bound for Theorem 1.1 was shown in Section 1.1. In this section, we give the proof of the lower bound, which is broken into four steps corresponding to the four subsections. In Section 2.1, we use standard arguments (following Jayram et al. 2003) to show that it suffices to prove a certain information complexity lower bound for a constant-size function; there are no substantially new ideas in this step. Then in Section 2.2, we further reduce to a problem involving just nine inputs at a time. In Section 2.3 and Section 2.4, we finish the proof of the lower bound by showing how to tightly relate probabilities (coming from the protocol's correctness) to the corresponding contributions to the protocol's information cost.

2.1.
Step 1: conditioning and direct sum. As noted in Section 1.5, it suffices to prove the lower bound for Tribes_{ℓ,m} := And_ℓ ∘ Or_m ∘ 3Eq^{ℓm} (redefined with the And_2 gadget replaced by 3Eq). As a technicality, we assume Π has been converted into a private-coin-only protocol, where Alice first privately samples the public coins (if any) and sends them to Bob. (This could blow up the communication, but we will only use the fact that the "original communication" part of the transcript has bounded length, not the "public coins" part.) We can think of the input to Tribes_{ℓ,m} as an ℓ × m table where each cell has two trits, one for Alice and one for Bob. As is standard in information complexity lower bounds, we define a distribution over inputs, equipped with a "conditioning scheme" that decomposes the distribution into a mixture of product distributions (where Alice's and Bob's parts of the input are independent of each other). We do this by placing a uniformly random 1-input to 3Eq at a uniformly random cell in each row, and for each of the remaining cells choosing at random a rectangular "window" of 0-inputs to 3Eq, from which the input to that cell is drawn.
Formally, let us define W_1 := {{00}, {11}, {22}} as the set of "1-windows" of 3Eq, and define W_0 := {{01, 02}, {10, 12}, {20, 21}, {10, 20}, {01, 21}, {02, 12}} (the six 1 × 2 and 2 × 1 rectangles of 0-inputs) as the set of "0-windows" of 3Eq. We define a probability space with random variables X, Y (Alice's and Bob's halves of the input table), J (with J_i the column of the special cell in row i), and W (the tuple of windows), generated according to the scheme described above. Note that XY is supported on 1-inputs of Tribes_{ℓ,m}, and that X and Y are independent conditioned on W. Finally, let τ be the random transcript on input (X, Y). Define X_{−J} := (X_{i,j})_{j ≠ J_i} (and Y_{−J} similarly), and let τ_C denote the "original communication" part of τ, and τ_R denote the "public coins" part of τ. We obtain an information cost bound, inequality (2.1), for some fixed row i* and pair of columns h*. Note that given this i*, h*, W*_{−i*,h*}, the remaining conditioning variables W_{i*,j}, W_{i*,k} (together with the choice of k) have 36 possible outcomes: 2 choices for k (it could be either element of h*, and j is the other), 3 choices for W_{i*,j}, and 6 choices for W_{i*,k}.
We rephrase the situation by considering a protocol Π* that interprets its input as (X_{i*,h*}, Y_{i*,h*}) and uses private coins to sample the rest of the table itself (a 1-input at a random cell in each of the non-i* rows, and 0-inputs in the non-h* columns of the i* row). Here, we now think of the two coordinates in {i*} × h* as being labeled 1 and 2.
For convenience, we henceforth recycle notation by letting Π denote the new protocol Π* and letting (j, k) range over the two labels {1, 2}. With respect to this recycled notation, the inequality (2.1) becomes inequality (2.2). The following lemma, whose proof occupies the remaining three subsections, provides the contradiction, completing the proof of Theorem 1.1.
Lemma 2.3. If (2.2) holds then Π is not a (1/2 + ϵ)-correct protocol for Or_2 ∘ 3Eq^2.

2.2.
Step 2: uniformly covering a pair of gadgets. Let us set up some notation (all in reference to the private-coin protocol Π). If x is an Alice input and y is a Bob input, let π_{x,y} denote the probability Π accepts on input (x, y). For a 1 × 2 rectangle of inputs {u} × {v, w} let ι_{u,vw} denote the mutual information between the random transcript of Π and a uniformly random input from {(u, v), (u, w)}. Similarly, for a 2 × 1 rectangle of inputs {v, w} × {u} let ι_{vw,u} denote the mutual information between the random transcript of Π and a uniformly random input from {(v, u), (w, u)}. We write u = u_1u_2 ∈ {0, 1, 2}^2 and similarly for v and w.
Since in the inequality (2.2) there are only a constant number of possible outcomes for W and k, the o(ϵ) bound holds conditioned on some particular outcome of these variables. The following lemma (illustrated in Figure 2.1) is proved in the remaining two subsections.
Lemma 2.5. For any Alice inputs a, b, c and Bob inputs d, e, f, we have the inequality illustrated in Figure 2.1.
We now show how to use Lemma 2.5 to prove Lemma 2.3.
where the second line is by the first four key properties of our map, the third line is by Lemma 2.5, and the fourth line is by the last key property. Hence, Π cannot be (1/2 + ϵ)-correct for Or_2 ∘ 3Eq^2 since otherwise the first line would be at least 36 · (1/2 + ϵ) − 36 · (1/2 − ϵ) = 72ϵ.
The following fact was also used in Braverman & Moitra (2013); we provide a proof for completeness.
Proof. Assume the random variable Y ∈_u {v, w} is jointly distributed with τ (the random variable representing the transcript). Note that P[τ = τ] = (1/2) τ_u(τ_v + τ_w), and bound the statistical distance Δ by a calculation in which the second line is by Fact 1.4(ii) and the third line is by Fact 1.4(iii).
Intuitively, Lemma 2.6 means I(τ_u, τ_v, τ_w) lower bounds the "contribution" of τ to the information cost. Now that we have related the information costs to the contributions, we need to relate the contributions to the probabilities of observing individual transcripts. The following two lemmas allow us to do this.
Lemma 2.7. For any four numbers q, r, s, t ∈ [0, 1], we have −qs + qt + rs − rt ≤ 2(I(q, s, t) + I(s, q, r)).
Lemma 2.7 is from Braverman & Moitra (2013). Lemma 2.8 (illustrated in Figure 2.2) is more involved and constitutes one of the key technical novelties in our proof of Theorem 1.1. For example, one insight is in finding the proper list of coefficients on the left side of the inequality in Lemma 2.8, to simultaneously make the lemma true and enable it to be used in our proof approach for Lemma 2.5.
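A useful observation about Lemma 2.7 (used in the case analysis of its proof) is that its left side factors as −qs + qt + rs − rt = (q − r)(t − s), so it is nonpositive whenever q ≥ r and s ≥ t, or r ≥ q and t ≥ s. This algebraic identity is easy to spot-check with exact rationals (our own illustrative snippet):

```python
from fractions import Fraction
from random import randint, seed

seed(0)
# The left side of Lemma 2.7 factors: -qs + qt + rs - rt = (q - r)(t - s).
for _ in range(1000):
    q, r, s, t = (Fraction(randint(0, 100), 100) for _ in range(4))
    assert -q*s + q*t + r*s - r*t == (q - r) * (t - s)
```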
The proof of Lemma 2.7 in Braverman & Moitra (2013) proceeds by clearing denominators and then decomposing the difference between the right and left sides into a sum of parts, such that the (weighted) AM-GM inequality implies each part is nonnegative. A priori, it is conceivable the same approach could work for Lemma 2.8; however, the problem of finding an appropriate decomposition can be expressed as a linear program feasibility question, and with the help of an LP solver we found that this approach actually does not work for Lemma 2.8 (even with 32 replaced by other constants). To get around this, we begin by giving a significantly different proof of Lemma 2.7, which we are able to generalize to prove Lemma 2.8. We provide our proofs of both lemmas in the remaining subsection, where we also give some intuition. For now, we complete the proof of Lemma 2.5. Here, we employ another key idea (beyond the proof structure of Braverman & Moitra (2013)): the corresponding part of the argument in Braverman & Moitra (2013) finishes by simply summing Lemma 2.7 over accepting transcripts, but this approach does not work in our context. We also need to take into account the rejecting transcripts and the fact that the acceptance and rejection probabilities sum to 1, in order to orchestrate all the necessary cancellations.

2.4.
Step 4: relating information and probabilities for transcripts. We first give some intuition for why the inequality in Lemma 2.8 is true. Suppose for some small δ, ϵ > 0 we have a = 1/2 + δ, e = 1/2 + ϵ, and b = c = d = f = 1/2, as illustrated in Figure 2.3. (Although this is just a specific example, the phenomenon it illustrates turns out to hold in general.) The left side of the inequality is the linear combination of the areas of the 9 rectangles, with coefficients as indicated in the figure. The purple regions are congruent and hence cancel out since the coefficients sum to 0. The red regions are congruent and hence cancel out since the coefficients in the top row sum to 0. The blue regions are congruent and hence cancel out since the coefficients in the middle column sum to 0. Thus, the left side is 2δϵ since only the green region contributes.
Regarding the four terms on the right side of the inequality, the first and third are Θ(ϵ²), the second is Θ(δ²), and the fourth is 0. Hence, left side = Θ(δϵ) ≤ Θ(ϵ² + δ²) = right side. The point is that the right side only has terms that are quadratic in δ, ϵ, while the left side has "higher-order" terms (at least linear in δ, ϵ) but those higher-order terms miraculously cancel out leaving only quadratic terms. The key property for the cancellation is that in every row and every column, the coefficients sum to 0. We proceed to our formal proofs of Lemma 2.7 and Lemma 2.8. To avoid division-by-0 technicalities, we assume the relevant quantities are infinitesimally perturbed so none are 0.
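The cancellation principle just described is a general fact about bilinear forms: if every row and every column of the coefficient matrix sums to 0, then shifting all of the x's (or all of the y's) by a constant leaves the form unchanged, so only the "quadratic" fluctuations survive. The following check is our own illustration; the matrix C is a generic example with zero row and column sums, not the actual coefficients from Figure 2.3:

```python
from fractions import Fraction
from random import randint, seed

seed(1)

def bilinear(coeffs, xs, ys):
    """Evaluate sum_{i,j} coeffs[i][j] * xs[i] * ys[j]."""
    return sum(c * x * y for row, x in zip(coeffs, xs) for c, y in zip(row, ys))

# An illustrative 3x3 coefficient matrix in which every row and every
# column sums to 0 (the key property of the coefficients in Lemma 2.8):
C = [[1, -2, 1],
     [-2, 4, -2],
     [1, -2, 1]]
assert all(sum(row) == 0 for row in C)
assert all(sum(col) == 0 for col in zip(*C))

for _ in range(200):
    xs = [Fraction(randint(0, 10), 10) for _ in range(3)]
    ys = [Fraction(randint(0, 10), 10) for _ in range(3)]
    c1, c2 = Fraction(randint(-5, 5)), Fraction(randint(-5, 5))
    # Shifting all inputs by constants leaves the form unchanged: the
    # linear and constant terms cancel thanks to the zero row/column sums.
    assert bilinear(C, [x + c1 for x in xs], [y + c2 for y in ys]) == bilinear(C, xs, ys)
```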
Proof (of Lemma 2.7). Define L := −qs + qt + rs − rt to be the left side of the inequality, and define R := I(q, s, t) + I(s, q, r) to be the right side except for the factor of 2. The goal is to show that R ≥ L/2. If q ≥ r and s ≥ t, or if r ≥ q and t ≥ s, then L ≤ 0 ≤ R, so we are done in these cases. Now consider the case that q ≥ r and t ≥ s. (The remaining case, that r ≥ q and s ≥ t, is symmetric.) If t ≤ 3s (so s/(t + s) ≥ 1/4) then, since q/(q + r) ≥ 1/2, the product of the two terms of R is ≥ (q − r)²(t − s)²/8, so by AM-GM, R ≥ 2(q − r)(t − s)/√8 ≥ L/2. If t ≥ 3s then t + s ≤ 2(t − s), so the first term of R is ≥ (q/(2(t − s)))(t − s)² = q(t − s)/2 ≥ L/2.
Proof (of Lemma 2.8). Define L to be the left side of the inequality in the statement of Lemma 2.8, and define R to be the right side except for the factor of 32. The goal is to show that R ≥ L/32. If a + c ≥ 2b and d + f ≥ 2e, or if a + c ≤ 2b and d + f ≤ 2e, then L ≤ 0 ≤ R, so we are done in these cases. Now consider the case that a + c ≥ 2b and d + f ≤ 2e. (The remaining case, that a + c ≤ 2b and d + f ≥ 2e, is symmetric.) We consider four subcases; the first two are just like our argument for Lemma 2.7, but the other two are a bit more complicated.
Subcase c ≤ a and d ≤ f: Then L ≤ 4(a − b)(e − d). If e ≤ 3d (so d/(e + d) ≥ 1/4) then, since a/(a + b) ≥ 1/2 (because b ≤ a follows from a + c ≥ 2b and c ≤ a), the product of the first two terms of R is ≥ (a − b)²(e − d)²/8, so by AM-GM, the sum of these two terms is ≥ 2(a − b)(e − d)/√8 ≥ L/32. Subcase a ≤ c and f ≤ d: symmetrically, since c/(c + b) ≥ 1/2 (because b ≤ c follows from a + c ≥ 2b and a ≤ c), the product of the last two terms of R is ≥ (c − b)²(e − f)²/8, so by AM-GM, the sum of these two terms is ≥ 2(c − b)(e − f)/√8 ≥ L/32. Subcase c ≤ a and f ≤ d:

Query lower bound for tribes
The upper bound for Theorem 1.2 was shown in Section 1.2; we now prove the matching lower bound.
Suppose for contradiction there is a randomized decision tree, which is a distribution T over deterministic decision trees that always make at most √ϵ · m/2 queries, and which accepts 0-inputs with probability at most 1/2 − ϵ and 1-inputs with probability at least 1/2 + ϵ. Consider the following pair of distributions (D_0, D_1) over 0-inputs and 1-inputs, respectively: to sample from D_0, pick i ∈_u {1, 2}, j ∈_u [m], k ∈_u [m] independently and set z_{i,j} = z_{i,k} = 1 (and the rest of the bits to 0). To sample from D_1, pick j ∈_u [m], k ∈_u [m] independently and set z_{1,j} = z_{2,k} = 1 (and the rest of the bits to 0).
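As a quick sanity check, D_0 indeed produces only 0-inputs of And_2 ∘ Or_m^2 (both 1's land in the same half, leaving the other half all-0) and D_1 produces only 1-inputs. The following sampler is our own illustration of the two distributions:

```python
from random import randrange, seed

def tribes2(z1, z2):
    """And_2 of Or_m: the outer function of Tribes_{2,m}."""
    return int(any(z1)) & int(any(z2))

def sample(m, one_input):
    """Sample from D_1 (one_input=True) or D_0 (one_input=False)."""
    z = [[0] * m, [0] * m]
    j, k = randrange(m), randrange(m)
    if one_input:
        z[0][j] = z[1][k] = 1     # one 1 in each half: And of Ors is 1
    else:
        i = randrange(2)
        z[i][j] = z[i][k] = 1     # both 1's in the same half: the other half is all-0
    return z

seed(2)
m = 16
for _ in range(100):
    assert tribes2(*sample(m, True)) == 1
    assert tribes2(*sample(m, False)) == 0
```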
We claim that for an arbitrary T in the support of T, for each r ∈ {0, 1, 2}, letting A_r be the set of z's such that T(z) accepts after having read exactly r 1's, the probability of A_r under D_1 exceeds its probability under D_0 by less than 2ϵ/3. Summing over r ∈ {0, 1, 2} and averaging over T then yields a contradiction with the assumed advantage (where the dependence of A_r on T is implicit). To prove the claim, we first set up some notation. Consider the execution of T when it reads only 0's until it halts. Let S_i ⊆ [m] (i ∈ {1, 2}) be the coordinates of z_i queried on this execution, and let δ_i := |S_i|/m; note that δ_1 + δ_2 ≤ √ϵ/2. For each q ∈ [|S_1| + |S_2|], let: B_q be the set of z's that cause T to read q − 1 0's then a 1; i_q ∈ {1, 2} and h_q ∈ [m] be such that z_{i_q,h_q} is the location of that 1; C_q ⊆ B_q be the set of z's that cause T to read q − 1 0's, then a 1, then only 0's until it halts; S^q_i ⊆ [m] (i ∈ {1, 2}) be the coordinates of z_i queried on the execution corresponding to C_q; and δ^q_i := |S^q_i|/m (i ∈ {1, 2}); note that δ^q_1 + δ^q_2 ≤ √ϵ/2.
Case r = 0: If the execution that reads only 0's rejects then A_0 = ∅. Otherwise P_{z∼D_1}[z ∈ A_0] = (1 − δ_1)(1 − δ_2) ≤ (1/2)((1 − δ_1)² + (1 − δ_2)²) = P_{z∼D_0}[z ∈ A_0].
Case r = 1: For each q, assuming for convenience that i_q = 1, we bound the probabilities of C_q under D_0 and D_1. Letting Q ⊆ [|S_1| + |S_2|] be those q's for which the execution corresponding to C_q accepts, and noting that A_1 = ∪_{q∈Q} C_q, summing over q ∈ Q gives the claimed bound.
Case r = 2: We have P_{z∼D_1}[T(z) reads at least one 1] = P[j ∈ S_1 or k ∈ S_2] ≤ δ_1 + δ_2. For each q, assuming for convenience that i_q = 1, we bound the probability that T reads a second 1 given z ∈ B_q (the middle inequality may not be an equality, since prior to reading the first 1, T may have read some 0's in z_2). Hence P_{z∼D_1}[z ∈ A_2] ≤ P_{z∼D_1}[T(z) reads two 1's] = P_{z∼D_1}[T(z) reads at least one 1] · P_{z∼D_1}[T(z) reads two 1's | T(z) reads at least one 1].

Which one is the 1-input?
We prove Theorem 1.3 and related results in this section. We state and apply the key lemmas in Section 4.1, and we prove them in Section 4.2. We describe some ways to reinterpret Theorem 1.3 in Section 4.3. We discuss some related questions in Section 4.4.

Overview.
Let us first review some definitions.
We provide the (very similar) proofs of these two lemmas in Section 4.2. The key idea is that if we embed a random 1-input of F into a random coordinate and fill the other ℓ − 1 coordinates with random 0-inputs of F, then the protocol for Which_ℓ ∘ F^ℓ will find the embedded 1-input with advantage ϵ, whereas if we embed a random 0-input in the same way then the protocol cannot achieve any advantage since the coordinate of the embedding becomes independent of the ℓ-tuple of 0-inputs given to the protocol. For Lemma 4.1, we use a direct sum property for information to get the factor ℓ decrease in cost; for Lemma 4.2 we do not get a decrease since there is no available analogous direct sum property for communication.
Proof (of Theorem 1.3). Braverman & Moitra (2013) in effect showed that there is a distribution D over the domain of Unambig-Inter_m with R_{1/2+ϵ,D}(Unambig-Inter_m) ≥ Ω(ϵm); the result was not stated in this way in that paper, but careful inspection of the proof yields it. Then R_{1/ℓ+ϵ}(Which_ℓ ∘ Unambig-Inter_m^ℓ) ≥ Ω(ϵℓm) follows immediately from this and Lemma 4.1.
Note that for any communication complexity class C, if F ∈ C then Which_2 ∘ F^2 ∈ C ∩ coC. Hence for ℓ = 2 and ϵ a positive constant, Lemma 4.2 implies that if C ⊄ BPP then C ∩ coC ⊄ BPP. In particular, taking F = Unambig-Inter (and C = UP), we have a simple proof of a result of Klauck (2003, Theorem 2 of the arXiv version), using as a black box the fact that F ∉ BPP.

Proofs.
Proof (of Lemma 4.1). Consider an arbitrary (1/ℓ + ϵ)-correct protocol Π for Which_ℓ ∘ F^ℓ. Define a probability space with the following random variables: i ∈_u [ℓ]; XY is an input to Π such that X_iY_i ∼ D and X_jY_j ∼ D_0 for j ∈ [ℓ] \ {i} (with the coordinates independent conditioned on i); τ is the communication transcript of Π; and R^pub, R^priv_A, R^priv_B are the public, Alice's private, and Bob's private coins, respectively. Let Π′ be the following protocol with input interpreted as X_iY_i.
Proof (of Lemma 4.2). By the minimax theorem, it suffices to show that for every distribution D over the domain of F, R_{1/2+ϵ/4,D}(F) ≤ R_{1/ℓ+ϵ}(Which_ℓ ∘ F^ℓ). If either F^{-1}(0) or F^{-1}(1) has probability at least 1/2 + ϵ/4 under D, then a protocol that outputs a constant witnesses R_{1/2+ϵ/4,D}(F) = 0, so we may assume otherwise. For a bit b, let D_b be the distribution D conditioned on F^{-1}(b).
Consider an arbitrary (1/ℓ + ϵ)-correct protocol Π for Which_ℓ ∘ F^ℓ. Define a probability space with the following random variables: i ∈_u [ℓ]; XY is an input to Π such that X_iY_i ∼ D and X_jY_j ∼ D_0 for j ∈ [ℓ] \ {i} (with the coordinates independent conditioned on i); and R^pub, R^priv_A, R^priv_B are the public, Alice's private, and Bob's private coins, respectively. Let X_{−i}Y_{−i} denote XY restricted to coordinates in [ℓ] \ {i}. Let Π′ be the following protocol with input interpreted as X_iY_i.

Corollaries.
We now describe how Theorem 1.3 implies two other results, which give different perspectives. One result concerns so-called SV-nondeterminism, and the other concerns protocols whose output is a sample from a distribution. Generally speaking, for a function with codomain [ℓ], an SV-nondeterministic algorithm can make a nondeterministic guess and output a value from [ℓ] ∪ {⊥}, and on every input it must (1) output the correct value for at least one guess, and (2) for each guess output either the correct value or ⊥. (For ℓ = 2, this corresponds to an NP ∩ coNP type of computation.) This definition makes sense for communication complexity, where it turns out an SV-nondeterministic protocol can be equivalently defined as follows: there is a collection of rectangles each labeled with a value from [ℓ], such that the union of the rectangles labeled v ∈ [ℓ] exactly covers the set of all v-inputs. We let SV(F) denote the minimum cost, i.e., log of the number of rectangles, of an SV-nondeterministic protocol for F.
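The rectangle characterization is easy to see on a toy example. The following is our own illustration (not from the paper): an SV-nondeterministic protocol with four labeled rectangles for the 1-bit equality function, viewed with codomain [2] (1 = "equal", 2 = "not equal"); the code checks the exact-cover condition for each label:

```python
from itertools import product

def f(x, y):
    """Toy function with codomain [2]: 1 iff the two bits are equal."""
    return 1 if x == y else 2

# Each rectangle is (A, B, label); the rectangles labeled v must
# exactly cover the set of all v-inputs.
rectangles = [({0}, {0}, 1), ({1}, {1}, 1), ({0}, {1}, 2), ({1}, {0}, 2)]

for v in (1, 2):
    covered = {(x, y) for A, B, lab in rectangles if lab == v for x in A for y in B}
    assert covered == {(x, y) for x, y in product((0, 1), repeat=2) if f(x, y) == v}
```

With 4 rectangles the cost is log 4 = 2 bits, matching the definition of SV(F) as the log of the number of rectangles.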

Corollary 4.3. There exists an F with codomain [ℓ] such that R_{1/ℓ+ϵ}(F) ≥ Ω(ϵ · 2^{SV(F)}); moreover, for every F with codomain [ℓ], R_{1/ℓ+ϵ}(F) ≤ O(ϵ · 2^{SV(F)}).
Proof. By Theorem 1.3, the first part is witnessed by F := Which_ℓ ∘ Unambig-Inter_m^ℓ (for any ℓ and m with ϵm ≥ 1) since SV(F) ≤ log(ℓm). As for the second part, given a cost-c SV-nondeterministic protocol for F, Alice and Bob can publicly sample a subset of 2ϵ · 2^c of the 2^c rectangles, and if the input lies in any of them (which can be checked with O(ϵ · 2^c) bits of communication) they output the label of such a rectangle, and otherwise they output a uniformly random value from [ℓ].

Let D_ℓ denote the set of all probability distributions over [ℓ]. A function F with codomain D_ℓ can be viewed as a sampling problem, where given input (X, Y) the goal is to output a sample from (or close to) the distribution F(X, Y). We define S_p(F) as the minimum worst-case communication cost of any protocol Π that, for each input (X, Y), outputs a sample from a distribution Π(X, Y) ∈ D_ℓ such that Δ(Π(X, Y), F(X, Y)) ≤ 1 − p. Note that the uniform distribution over [ℓ] is within distance 1 − 1/ℓ of every distribution in D_ℓ, so S_{1/ℓ}(F) = 0 for all F. Thus, it makes sense to consider the complexity of achieving advantage ϵ, i.e., S_{1/ℓ+ϵ}(F).
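The claim that the uniform distribution is within statistical distance 1 − 1/ℓ of every distribution on [ℓ] (with the extreme case being a point mass) is simple to verify; the following check is our own illustration with ℓ = 4:

```python
from fractions import Fraction
from random import randint, seed

def stat_dist(p, q):
    """Statistical (total variation) distance between distributions on [l]."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

l = 4
uniform = [Fraction(1, l)] * l
# The extreme case is a point mass: distance exactly 1 - 1/l.
point = [Fraction(1)] + [Fraction(0)] * (l - 1)
assert stat_dist(uniform, point) == 1 - Fraction(1, l)

# Random distributions never exceed that distance from uniform,
# which is why S_{1/l}(F) = 0 for all F.
seed(3)
for _ in range(100):
    w = [Fraction(randint(1, 10)) for _ in range(l)]
    p = [x / sum(w) for x in w]
    assert stat_dist(uniform, p) <= 1 - Fraction(1, l)
```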
A natural nondeterministic analogue of sampling is sampling with postselection: a protocol may output ⊥ with probability < 1, and conditioned on not outputting ⊥, the output should be a sample from (or close to) F(X, Y). An issue is that if we do not restrict the probability of outputting ⊥, then every F can be sampled with postselection with constant communication (by using public coins to guess what the joint input is). Hence, we define PS_p(F) as the minimum CC(Π) + log(1/α) of any protocol Π that, for each input (X, Y), conditioned on not outputting ⊥, outputs a sample from a distribution Π(X, Y) ∈ D_k such that Δ(Π(X, Y), F(X, Y)) ≤ 1 − p, where α > 0 is defined as the minimum over inputs of the probability of not outputting ⊥. (Such logarithmic terms appear in the cost measures for several other communication models.) We note that a protocol with communication cost c and associated value α can be modified to have communication cost 2 and associated value α′ := α/2^c: assuming w.l.o.g. that for each outcome of the public coins, the corresponding deterministic protocol has exactly 2^c possible transcripts, Alice and Bob can sample all the coins as usual as well as publicly sample a uniformly random transcript; they can then check (with one bit each) whether the guessed transcript would have been the real one, and if so output the same value and if not output ⊥.
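The bookkeeping behind this transformation is simple enough to verify mechanically; the following sketch (ours) checks that replacing (c, α) by (2, α/2^c) changes the measure CC(Π) + log(1/α) by only an additive 2.

```python
# Bookkeeping check (ours) for the transcript-guessing transformation:
# cost c and value alpha become cost 2 and value alpha / 2**c, so the
# measure CC + log2(1/alpha) grows by exactly 2.
from math import log2

def ps_measure(cc, alpha):
    return cc + log2(1 / alpha)

for c in [1, 5, 10, 20]:
    for alpha in [1.0, 0.5, 0.01]:
        before = ps_measure(c, alpha)
        after = ps_measure(2, alpha / 2**c)
        assert abs(after - (before + 2)) < 1e-9
```

So up to an additive constant, one may always assume the communication cost of a postselection protocol is 2, pushing all the cost into the log(1/α) term.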
Corollary 4.4. There exists an F with codomain D_k such that PS_1(F) ≤ log(km) + O(1) whereas S_{1/k+ε}(F) ≥ Ω(εkm). Moreover, every F with codomain D_k satisfies S_{1/k+ε}(F) ≤ O(ε · 2^{PS_1(F)}).

Proof. By Theorem 1.3, the first part is witnessed by F := Which_k • Unambig-Inter_m (for any k and m), where we identify the output of F (a value from [k]) with the distribution completely concentrated on that value, in which case we have S_{1/k+ε}(F) = R_{1/k+ε}(F) ≥ Ω(εkm) and PS_1(F) ≤ O(1) + log(km). As for the second part, given a PS_1 protocol for F with communication cost 2 (which is w.l.o.g. as noted above) and associated value α, Alice and Bob can run that protocol O(ε/α) times; if it ever produces a non-⊥ output (which happens with probability ≥ 2ε) then they output the same value, otherwise they output a uniformly random value from [k]. The statistical distance of this output distribution from F(X, Y) is at most (1 − 2ε)(1 − 1/k) ≤ 1 − 1/k − ε, and the communication cost is O(ε/α) ≤ O(ε · 2^{PS_1(F)}).
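The arithmetic in the second part can be sanity-checked numerically. The sketch below (ours, with illustrative parameter choices and our own choice of the constant in the O(ε/α) repetition count) verifies that t = ⌈4ε/α⌉ runs produce some non-⊥ output with probability q ≥ 2ε for ε ≤ 1/4, and that mixing with a uniform guess then leaves statistical distance at most 1 − 1/k − ε for k ≥ 2.

```python
# Numerical check (ours) of the repetition argument: each run avoids ⊥
# with probability >= alpha, so t = ceil(4*eps/alpha) runs succeed with
# probability q >= 2*eps, giving distance (1-q)(1-1/k) <= 1 - 1/k - eps.
from math import ceil

for eps in [0.01, 0.05, 0.1, 0.25]:
    for alpha in [0.001, 0.01, 0.1, 0.5]:
        t = ceil(4 * eps / alpha)
        q = 1 - (1 - alpha) ** t  # prob. of at least one non-⊥ output
        assert q >= 2 * eps
        for k in range(2, 6):
            dist = (1 - q) * (1 - 1 / k)  # residual statistical distance
            assert dist <= 1 - 1 / k - eps + 1e-12
```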

Related questions.
One question is how strong of a converse there is to Lemma 4.2, i.e., how well R_{1/k+ε}(Which_k • F) can be upper bounded in terms of R_{1/2+δ}(F). Doing so as a black-box reduction (which would also work for query complexity) can be phrased as the following problem: supposing there are k coins, one of which is good (having heads probability ≥ 1/2 + δ) and the rest of which are bad (having heads probability ≤ 1/2 − δ), identify the good coin with probability ≥ 1/k + ε (over the randomness of both the algorithm and the coin flips). This has somewhat of a multi-armed bandit flavor and fits in the framework of "noisy decision trees." As far as we know, it is open to determine an optimal strategy for arbitrary k, ε, δ, but here are some observations. (In conjunction with Lemma 4.2, these show that F and Which_k • F are at least qualitatively equivalent in complexity for small ε.)
• R_{3/4}(Which_k • F) ≤ k · R_{1−1/(4k)}(F) since we can just flip each coin once, and by a union bound, with probability 3/4 all k coins will have the "right" outcomes. (This does not exploit any properties of Which_k.) Of course, R_{1−1/(4k)}(F) can be further upper bounded in terms of smaller-advantage complexities by majority-amplification.
• R_{1/k+ε}(Which_k • F) ≤ R_{1/2+εk/2}(F) (provided εk ≤ 1) since we can pick a coin uniformly at random and flip it; if it comes up heads then output the index of that coin; otherwise output a uniformly random one of the other k − 1 indices. This implies that R_{1/k+ε}(Which_k • F) ≤ O(k²) · R_{1/2+ε}(F) (provided εk ≤ 1) since by Lemma A.1 we can boost ε advantage to εk/2 advantage with O(k²)-repetition majority-amplification.
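The second bullet's success probability can be computed exactly. The following sketch (ours) takes the boundary case where the good coin has heads probability exactly 1/2 + d and each bad coin exactly 1/2 − d, and verifies that the pick-one-coin strategy succeeds with probability exactly 1/k + 2d/k, so advantage ε corresponds to bias d = εk/2.

```python
# Exact computation (ours) of the pick-one-coin strategy's success
# probability: pick a uniform coin, flip it; output its index on heads,
# otherwise output a uniform one of the other k-1 indices.
from fractions import Fraction

def success_probability(k, d):
    half = Fraction(1, 2)
    total = Fraction(0)
    for picked in range(k):  # coin 0 is the good one
        p_pick = Fraction(1, k)
        p_heads = half + d if picked == 0 else half - d
        if picked == 0:
            # heads on the good coin -> we name it correctly
            total += p_pick * p_heads
        else:
            # tails on a bad coin -> uniform guess over the other k-1 coins
            total += p_pick * (1 - p_heads) * Fraction(1, k - 1)
    return total

for k in range(2, 8):
    for d in [Fraction(1, 10), Fraction(1, 4)]:
        assert success_probability(k, d) == Fraction(1, k) + 2 * d / k
```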
We also remark that R^dt_{1/2+ε}(f) ≤ O(R^dt_{1/k+ε}(Which_k • f)/(εk)) follows by combining the idea behind Lemma 4.2 with the idea behind the "And-composition lemma" in Göös et al. (2018a) (namely, halting and outputting 1 if the number of queries exceeds O(1/(εk)) times the height of the randomized decision tree for Which_k • f). We omit the details of the simple analysis.
Finally, we remark that by combining Lemma 4.1 with the "one-sided vs. two-sided information complexity" equivalence of Göös et al. (2018a) and the "worst-case vs. average-case information complexity" equivalence of Braverman (2015), it is possible to derive a version of Lemma 4.1 with worst-case information complexity on the left side of the inequality.

A. A delicate concentration bound
Lemma A.1. Suppose a coin with heads probability 1/2 + δ is tossed N times (where δ ≥ 0 and N is odd). Then, the probability of getting a majority of heads is at least 1/2 + Ω(δ√N), provided δ√N ≤ 1.
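As a quick numerical illustration (ours, with an illustrative choice of N and of the constant hidden in the Ω), the bound can be verified exactly from the binomial distribution:

```python
# Exact binomial check (ours) of Lemma A.1 at N = 101: the probability of
# a heads-majority is at least 1/2 + c*delta*sqrt(N) throughout the range
# delta*sqrt(N) <= 1, with c = 0.3 sufficing at this N.
from math import comb, sqrt

def majority_prob(N, delta):
    p = 0.5 + delta
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i)
               for i in range(N // 2 + 1, N + 1))

N = 101
for j in range(1, 21):
    delta = j / 20 / sqrt(N)  # delta*sqrt(N) ranges over (0, 1]
    assert majority_prob(N, delta) >= 0.5 + 0.3 * delta * sqrt(N)
```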
As mentioned at the beginning of this paper, Lemma A.1 implies the following bound.
Corollary A.2. R_{1/2+ε}(GHD_n) ≤ O(ε²n), where GHD_n denotes the gap-Hamming problem of distinguishing X, Y ∈ {0,1}^n at Hamming distance ≥ n/2 + √n from those at distance ≤ n/2 − √n.

Proof (of Corollary A.2). Alice and Bob publicly sample a uniformly random one of the n coordinates and accept iff their bits are unequal there. This can be viewed as a coin toss with heads probability at least 1/2 + 1/√n (where heads represents the output being correct). Repeating the experiment Θ(ε²n) times and taking the majority outcome boosts the success probability to 1/2 + ε.
Lemma A.1 follows without difficulty from a Chernoff bound when N = Θ(1/δ²), and from the Berry–Esseen theorem when ω(1/δ) ≤ N ≤ O(1/δ²), but the general case seems to require a direct proof, and we provide one below. We could not find a proof in the literature. After this paper was written, other proofs of Lemma A.1 were discovered and presented on Stack Exchange by Yury Makarychev and Clément Canonne.
Proof (of Lemma A.1). We think of N as a fixed, sufficiently large number, and δ as varying in the range [0, 1/√N]. In fact, since the probability in question is a monotonically increasing function of δ, it suffices to consider δ ∈ [0, 0.01/√N]. Letting p_{i,δ} := (N choose i) · (1/2 + δ)^i · (1/2 − δ)^{N−i}, the probability is ∑_{i=⌈N/2⌉}^{N} p_{i,δ}. When δ = 0 this equals 1/2 since N is odd, so it suffices to show that the derivative of the probability with respect to δ is Ω(√N) for all δ ∈ [0, 0.01/√N]. We introduce the shorthand γ := i/N − 1/2 (and hence i = N · (1/2 + γ)), keeping in mind that γ is a function of i even though we suppress this dependence in the notation.
Putting these things together, we conclude that the derivative factors into terms each of which is nonnegative for γ ∈ [0, 1/2], which proves the key claim.
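The key claim that the derivative of the majority probability is Ω(√N) on [0, 0.01/√N] can also be confirmed numerically; the following sketch (ours, with an illustrative N and a lower-bound constant of 0.5) does so by finite differences.

```python
# Numerical check (ours) of the key claim in the proof: the derivative of
# the heads-majority probability with respect to delta is Omega(sqrt(N))
# throughout [0, 0.01/sqrt(N)]; verified at N = 101 with constant 0.5.
from math import comb, sqrt

def majority_prob(N, delta):
    p = 0.5 + delta
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i)
               for i in range(N // 2 + 1, N + 1))

N, h = 101, 1e-7
for j in range(11):
    delta = j * 0.001 / sqrt(N)  # grid over [0, 0.01/sqrt(N)]
    deriv = (majority_prob(N, delta + h) - majority_prob(N, delta)) / h
    assert deriv >= 0.5 * sqrt(N)
```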