Optimal Communication Complexity of Chained Index
Abstract
We study the CHAIN communication problem introduced by Cormode et al. [ICALP 2019]. For $k \ge 1$, in the $\mathsf{CHAIN}_{n,k}$ problem, there are $k$ string and index pairs $(x^i, \sigma^i)$ for $i \in [k]$ such that the value at position $\sigma^i$ in string $x^i$ is the same bit $b$ for all $k$ pairs. The input is shared between $k+1$ players as follows. Player 1 has the first string $x^1$, player 2 has the first index $\sigma^1$ and the second string $x^2$, player 3 has the second index $\sigma^2$ along with the third string $x^3$, and so on. Player $k+1$ has the last index $\sigma^k$. The communication is one way from each player to the next, starting from player 1 to player 2, then from player 2 to player 3 and so on. Player $k+1$, after receiving the message from player $k$, has to output a single bit which is the value at position $\sigma^i$ in $x^i$ for any $i \in [k]$. It is a generalization of the well-studied INDEX problem, which is equivalent to $\mathsf{CHAIN}_{n,1}$.
Cormode et al. proved that the $\mathsf{CHAIN}_{n,k}$ problem requires $\Omega(n/k^2)$ communication, and they used it to prove streaming lower bounds for the approximation of maximum independent sets. Subsequently, Feldman et al. [STOC 2020] used it to prove lower bounds for streaming submodular maximization. However, it is not known whether the $\Omega(n/k^2)$ lower bound used in these works is optimal for the $\mathsf{CHAIN}_{n,k}$ problem, and in fact, it was conjectured by Cormode et al. that $\Omega(n)$ bits are necessary.
We prove the optimal lower bound of $\Omega(n)$ for $\mathsf{CHAIN}_{n,k}$ when $k = O(n/\log n)$ as our main result. This settles the open conjecture of Cormode et al., barring the range of $k = \omega(n/\log n)$. The main technique is a reduction to a non-standard index problem where the input to the players is such that the answer is biased away from uniform. This biased version of index is analyzed using tools from information theory. As a corollary, we get an improved lower bound for approximation of maximum independent set in vertex arrival streams via a reduction from $\mathsf{CHAIN}_{n,k}$ directly.
Keywords and phrases: communication complexity, index communication problem
Funding: Janani Sundaresan: Supported in part by Sepehr Assadi’s Sloan Research Fellowship and startup grant from University of Waterloo.
2012 ACM Subject Classification: Theory of computation → Communication complexity
Acknowledgements: The author is thankful to Sepehr Assadi for introducing them to the problem and for insightful discussions on the proof. The author would also like to thank Parth Mittal for useful comments, Christian Konrad for introducing them to the Augmented Chain problem, and the anonymous reviewers of ITCS 2025 for helpful comments and suggestions. The author is very grateful to Mi-Ying Huang, Xinyu Mao, Guangxu Yang and Jiapeng Zhang for an illuminating discussion about the problem. They pointed out an important flaw in an earlier version of this work, and the discussion was instrumental for the new proofs in the current version.
Editors: Raghu Meka
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The index problem is one of the foundational problems in communication complexity. For $n \ge 1$, in the $\mathsf{INDEX}_n$ problem, there are two players Alice and Bob. Alice has a string $x \in \{0,1\}^n$ and Bob has an index $\sigma \in [n]$, and Bob has to output the value of $x$ at position $\sigma$. If the communication is one-way from Alice to Bob, it is easy to show that Alice needs to send $\Omega(n)$ bits for Bob to get any constant advantage over random guessing [1, 30]. This problem has been well-studied in multiple settings, and we know tight trade-offs in the two-party communication model for communication complexity [34], information complexity [24], and quantum communication complexity [8, 24].
Among the numerous applications of communication complexity, one that is of interest to us is proving lower bounds for streaming algorithms. $\mathsf{INDEX}$ and its variants, in particular, have been quite useful in this context, for example, in [23, 18, 20, 21, 16, 10]. This is by no means an exhaustive list.
In this paper, we study a natural generalization of $\mathsf{INDEX}_n$, called chained index ($\mathsf{CHAIN}_{n,k}$ for $k \ge 1$), introduced by [12]. There are $k$ different instances of $\mathsf{INDEX}_n$, correlated so that they have the same answer. They are “chained” together, where each player holds the index to the previous instance, and also the string for the next instance.
Definition 1 (Informal).
In $\mathsf{CHAIN}_{n,k}$, there are $k$ instances of $\mathsf{INDEX}_n$, all with the same answer. Players 1 and 2 take on the role of Alice and Bob respectively in the first instance, players 2 and 3 take on the role of Alice and Bob respectively for the second instance, and so on, all the way till players $k$ and $k+1$ for the last instance.
Communication is one-way from each player to the next in ascending order. The last player has to output the answer. The communication cost is the total number of bits in all the messages sent by the players. See Figure 1 for an illustration.
In [12], a reduction from $\mathsf{CHAIN}_{n,k}$ was employed to get a lower bound for approximation of maximum independent sets in vertex arrival streams. Before its introduction, [28] used the problem implicitly to show that no algorithm can achieve an approximation factor better than $1 - 1/e$ for maximum matching in $\widetilde{O}(n)$ space in vertex arrival streams.
The problem has been used by the breakthrough result of [19] to study the multi-party communication complexity of submodular maximization. They proved that any randomized $(k+1)$-party protocol which maximizes a monotone submodular function $f$, subject to a cardinality constraint of at most $k$ and an approximation factor of at least $2 - \sqrt{2} + \varepsilon$, uses $\Omega(n/k^2)$ communication. This also gave a lower bound for streaming submodular maximization. The chain problem was used by [17] also for similar purposes, but subject to stronger matroid constraints. [7] used a reduction from $\mathsf{CHAIN}$ to prove lower bounds for interval independent set selection in streams of split intervals.
We do not know tight bounds for the communication complexity of the chain problem, despite its varied applications. There is a trivial protocol of $O(n)$ bits, where any player can send its entire string to the next player, who holds the corresponding index and can forward the answer bit onward. Another simple protocol is for each player to send $n/k$ bits randomly sampled from their strings using public randomness; with constant probability, in at least one of the $k$ instances, we send the special bit to the player holding the index. However, this still takes $\Theta(n)$ total bits of communication.
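To make the baselines concrete, here is a minimal Python simulation of the sampling protocol (our own illustrative sketch, not from [12]; the instance sampler, the parameter values, and the convention that a player who spots its planted index forwards the answer in $O(1)$ extra bits are all assumptions of this sketch):

```python
import random

def sample_chain_instance(n, k):
    """Sample a CHAIN_{n,k} input: k strings and indices with a common answer bit b."""
    b = random.randint(0, 1)
    strings, indices = [], []
    for _ in range(k):
        x = [random.randint(0, 1) for _ in range(n)]
        sigma = random.randrange(n)
        x[sigma] = b  # plant the common answer bit
        strings.append(x)
        indices.append(sigma)
    return b, strings, indices

def sampling_protocol(n, k, strings, indices):
    """Each player i publishes x^i restricted to a public random sample of n/k
    positions; player i+1, who holds sigma^i, checks whether it was sampled."""
    for i in range(k):
        sample = random.sample(range(n), n // k)  # shared (public) randomness
        if indices[i] in sample:                  # player i+1 learns the answer
            return strings[i][indices[i]]         # and forwards it in O(1) bits
    return random.randint(0, 1)                   # no instance hit: guess

random.seed(0)
n, k, trials = 400, 10, 2000
wins = 0
for _ in range(trials):
    b, xs, sigmas = sample_chain_instance(n, k)
    wins += (sampling_protocol(n, k, xs, sigmas) == b)
print(f"success rate ~ {wins / trials:.2f} (well above 1/2; cost is k * (n/k) = n bits)")
```

Each of the $k$ players pays $n/k$ bits, so the total cost is $\Theta(n)$, matching the trivial protocol up to constants.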
In [12], they prove a lower bound of $\Omega(n/k^2)$ for any $k \ge 1$ through a reduction from the conservative multi-party pointer jumping problem, introduced by [14]. They state without proof that a stronger lower bound of $\Omega(n/k)$ can be obtained for a restricted range of $k$. They posed the following conjecture on the optimal communication lower bound.
Conjecture 2 ([12]).
Any protocol that solves $\mathsf{CHAIN}_{n,k}$ requires $\Omega(n)$ bits of communication.
[19] made some progress on Conjecture 2 by showing that among all the messages sent by the players, there is at least one message with $\Omega(n/k^2)$ bits, for every $k$. But the original conjecture is still open, and this is the focus of our work.
1.1 Our Results
We settle Conjecture 2 almost fully by proving the optimal lower bound of $\Omega(n)$, barring the corner case where $k$ is too large ($k = \omega(n/\log n)$). As far as we know, this corner case is not a focus for existing reductions from $\mathsf{CHAIN}_{n,k}$.
Theorem 3.
For any $k \ge 1$, any protocol for $\mathsf{CHAIN}_{n,k}$ with probability of success at least 2/3 requires $\Omega(n - k \log n)$ total bits of communication.
Therefore, as long as $k = O(n/\log n)$ with a sufficiently small constant, we get the optimal lower bound of $\Omega(n)$ from Theorem 3.
The proof of Theorem 3 can be found in Section 3. We prove the lower bound in the more general blackboard model of communication, instead of private messages between players (see Section 2.1 for details). The main idea is to analyze instances of $\mathsf{INDEX}_n$ where Alice and Bob already have some prior advantage in guessing the answer. We elaborate on our techniques in Section 1.2.
As a direct corollary of Theorem 3, we get improvements in the streaming lower bounds of [12, 19] through reductions from $\mathsf{CHAIN}_{n,k}$ immediately. In particular, we get that any algorithm which $\alpha$-approximates the size of a maximum independent set in vertex arrival streams requires $\Omega(n^2/\alpha^3)$ space, while the previous bound was $\Omega(n^2/\alpha^5)$ in [12]. We present the implications of our result in Section 4.
A further generalization of $\mathsf{CHAIN}_{n,k}$, called Augmented Chain, was defined in [15]. Here, $k$ instances of Augmented Index are chained together instead. Our lower bound of $\Omega(n - k \log n)$ can be extended to Augmented Chain also, and the details are covered in Section 3.4.
1.2 Our Techniques
In this subsection, we give an overview of the challenges in proving the lower bound and a summary of our techniques. We start by going over the prior techniques.
Prior Techniques
We will briefly talk about the technique used in [19] to prove that there is at least one player who sends $\Omega(n/k^2)$ bits, for any $k$. We will argue that these techniques can be extended to proving a lower bound of $\Omega(n/k)$ for the total number of bits, but not all the way to $\Omega(n)$ bits.
The first step in proving a lower bound for $\mathsf{CHAIN}_{n,k}$ in [19] is a decorrelation step – the $k$ instances of $\mathsf{INDEX}_n$ have the same answer, and they remove this correlation with a hybrid argument. These arguments have been used extensively in the literature (see e.g., [26, 3, 27, 6]). Intuitively, any protocol that tries to solve $\mathsf{CHAIN}_{n,k}$ may attempt to solve “many” of the $k$ instances of $\mathsf{INDEX}_n$, albeit each with a “small” advantage over 1/2, in the hope that the “small” advantages may accrue to get a constant probability of success overall (taking advantage of the fact that all the instances of $\mathsf{INDEX}_n$ have the same answer). The hybrid argument is a way to reduce proving a lower bound on the overall problem, to proving a lower bound for $k$ different $\mathsf{INDEX}_n$ problems against these low advantages.
Let us assume, for simplicity, that the protocol tries to get an advantage of $1/k$ in each instance of $\mathsf{INDEX}_n$, to get a constant total advantage. We can prove that any protocol that gets an advantage of $\varepsilon$ for $\mathsf{INDEX}_n$ uses $\Omega(\varepsilon^2 n)$ bits of communication using basic tools from information theory [9] (this is quite standard, see e.g., [2] for a direct proof), and this is known to be tight. Therefore, for $k$ instances, we get a lower bound of $k \cdot \Omega(n/k^2) = \Omega(n/k)$ bits in total. Now, we will argue why this is not the optimal lower bound.
On one hand, for $\mathsf{INDEX}_n$, it is known that for any $c \ge 1$, there is a protocol that uses $c$ bits of communication to get a probability of success $\frac{1}{2} + \Theta(\sqrt{c/n})$. This means that $\mathsf{INDEX}_n$ can be solved with advantage $1/k$ in $O(n/k^2)$ bits; in other words, each “hybrid step” of the previous lower bound argument is optimal. So, then, why can we not get a good protocol for $\mathsf{CHAIN}_{n,k}$ by running the protocol for $\mathsf{INDEX}_n$ with $c = n/k^2$ on all $k$ instances? This protocol would have $O(n/k)$ bits of communication in total. The reason is that the small advantages do not add up as we would like them to. To illustrate this, we will briefly talk about the protocol that gets advantage $\Theta(\sqrt{c/n})$ in $c$ bits for $\mathsf{INDEX}_n$.
Protocol for $\mathsf{INDEX}_n$
First, we will sketch a protocol for $\mathsf{INDEX}_n$ that uses one bit of communication. Let us imagine that the input string is chosen uniformly at random from $\{0,1\}^n$ and the index is chosen uniformly at random from $[n]$. Then, Alice finds the majority bit of her string and sends it to Bob. Bob just outputs the bit sent by Alice. We know, from simple anti-concentration bounds on the binomial distribution, that the number of indices holding the majority bit in $x$ is at least $n/2 + c_0\sqrt{n}$ with constant probability, for some appropriate constant $c_0 > 0$. The protocol succeeds if Bob holds the index to a majority bit, and this happens with probability at least $\frac{1}{2} + \Omega(1/\sqrt{n})$.
The assumption that the input is chosen uniformly at random from $\{0,1\}^n$ can be removed using public randomness. Alice and Bob collectively sample a random string $r \in \{0,1\}^n$, and Alice changes her input to $x \oplus r$ so that each bit is 0 or 1 with equal probability. Similarly, the assumption that the index is chosen uniformly at random from $[n]$ can be removed by Alice and Bob sampling a random permutation of $[n]$. Alice permutes string $x$ according to this permutation, and Bob changes his input to the index that the permutation maps $\sigma$ to. This is termed the self-reducibility property of $\mathsf{INDEX}_n$.
We can also extend the protocol to any $c \ge 1$ bits by partitioning the string into $c$ blocks at random using public randomness and sending the majority bit of each block. Bob knows the block that his input index belongs to, as public randomness is used, and outputs the majority bit of that block; each block has $n/c$ indices, giving advantage $\Theta(\sqrt{c/n})$.
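The following Python sketch (ours; parameter values illustrative) simulates the block-majority protocol and measures its empirical advantage, which tracks the $\Theta(\sqrt{c/n})$ prediction up to constants:

```python
import random

def block_majority_protocol(x, sigma, c, rng):
    """One-way protocol with c bits of communication. A public random permutation
    splits positions into c blocks; Alice sends the majority bit of each block;
    Bob outputs the majority bit of the block containing sigma."""
    n = len(x)
    perm = list(range(n))
    rng.shuffle(perm)                                     # public randomness
    block_of = {pos: i * c // n for i, pos in enumerate(perm)}
    counts = [[0, 0] for _ in range(c)]
    for pos, bit in enumerate(x):
        counts[block_of[pos]][bit] += 1
    majorities = [0 if z >= o else 1 for z, o in counts]  # Alice's c-bit message
    return majorities[block_of[sigma]]                    # Bob's output

rng = random.Random(1)
n, c, trials = 1024, 16, 20000
wins = 0
for _ in range(trials):
    x = [rng.randint(0, 1) for _ in range(n)]
    sigma = rng.randrange(n)
    wins += (block_majority_protocol(x, sigma, c, rng) == x[sigma])
adv = wins / trials - 0.5
print(f"empirical advantage ~ {adv:.3f} (same order as sqrt(c/n) = {(c/n)**0.5:.3f})")
```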
Challenges for $\mathsf{CHAIN}_{n,k}$
If we use the protocol we described for $\mathsf{INDEX}_n$ with $c = n/k^2$ bits per instance, we are left with one estimate bit from each instance of $\mathsf{INDEX}_n$, which may be the correct answer to the problem with probability $\frac{1}{2} + \Theta(1/k)$; but the variance of each of these bits is constant. We have a protocol for $\mathsf{CHAIN}_{n,k}$, where, out of the $k$ instances of $\mathsf{INDEX}_n$, it finds the right answer for $k/2 + \Theta(1)$ instances in expectation. However, the standard deviation of the number of right answers is $\Theta(\sqrt{k})$, which is enough to mask the improvement over $k/2$ we get in expectation. If we use a hybrid argument over the total variation distance, each of the smaller “hybrid steps” is optimal, whereas the overall lower bound is not optimal. We cannot achieve a lower bound stronger than $\Omega(n/k)$ this way.
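A stylized Monte Carlo illustration of this masking effect (assuming, for simplicity only, independent votes aggregated by majority, which is not the actual structure of a chain protocol):

```python
import random

def majority_of_biased_votes(k, eps, rng):
    """k votes, each independently correct with probability 1/2 + eps.
    With eps = 1/k, the O(1) expected surplus of correct votes is drowned
    by the Theta(sqrt(k)) standard deviation."""
    correct = sum(rng.random() < 0.5 + eps for _ in range(k))
    return correct > k / 2

rng = random.Random(2)
for k in (16, 64, 256, 1024):
    trials = 20000
    wins = sum(majority_of_biased_votes(k, 1.0 / k, rng) for _ in range(trials))
    print(f"k={k:5d}: aggregated success ~ {wins/trials:.3f}")  # drifts toward 1/2
```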
Our Solution
Instead of keeping track of progress in terms of advantage gained in guessing the answer, we directly keep track of the “information” the message reveals about the answer. Formally, this translates to the change in entropy of the answer after each successive message. Initially, the players have no information about the answer, and it is uniform over $\{0,1\}$ (the entropy is 1). Any protocol with a large enough probability of success must reduce the entropy of the answer by a constant amount (see Fano’s inequality in Proposition 6). We prove that each message of length $\ell$ reduces the entropy only by an additive $O((\ell + \log n)/n)$, by a reduction to the index problem. This will give us a lower bound on the total length of the messages.
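Schematically, writing $L$ for the total message length and $c$ for an unspecified absolute constant, the bookkeeping we have in mind is the following telescoping (a sketch in which we suppress the revealed indices in the conditioning for readability; the precise per-message bound is the content of Lemma 12):
\begin{align*}
1 - H(\mathsf{b} \mid \mathsf{M}_1, \ldots, \mathsf{M}_k)
 \;=\; \sum_{i=1}^{k} \Big( H(\mathsf{b} \mid \mathsf{M}_{<i}) - H(\mathsf{b} \mid \mathsf{M}_{\le i}) \Big)
 \;\le\; \sum_{i=1}^{k} \frac{c \cdot \big( |\mathsf{M}_i| + \log n \big)}{n}
 \;=\; \frac{c \cdot \big( L + k \log n \big)}{n},
\end{align*}
so, by Fano’s inequality, any protocol with constant success probability must have $L = \Omega(n - k \log n)$.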
For $\mathsf{INDEX}_n$, in any protocol with probability of success at least $1 - \delta$, the entropy of the answer conditioned on the message is at most $H_2(\delta)$, where $H_2$ is the binary entropy function. In the standard version where the initial entropy is 1, the reduction in entropy is $1 - H_2(\delta)$, and it is known that the protocol requires $\Omega(n \cdot (1 - H_2(\delta)))$ communication [24]. This is not sufficient for our application due to the following reason: after the message of $P_i$, when $P_{i+1}$ and $P_{i+2}$ attempt to solve the $(i+1)$-th instance of $\mathsf{INDEX}_n$, they already have some prior advantage that the message of $P_i$ gives them. Therefore, we need to analyze $\mathsf{INDEX}_n$ when the answer is not uniform over $\{0,1\}$.
Biased Index
We define the biased index problem, parametrized by $\delta \in (0,1)$. Alice and Bob receive input $x \in \{0,1\}^n$ and $\sigma \in [n]$ respectively, such that the value at position $\sigma$ in $x$ (denoted by $x_\sigma$) is 1 with probability $\delta$ and 0 otherwise. Alice sends a message $\mathsf{M}$ to Bob and Bob has to output $x_\sigma$. The initial entropy of the answer is $H_2(\delta)$. We prove that the entropy of $x_\sigma$ conditioned on $(\mathsf{M}, \sigma)$ is smaller by at most $O((|\mathsf{M}| + \log n)/n)$ compared to the initial $H_2(\delta)$, which is our main contribution (see Lemma 12).
We can show that the entropy of the input to Alice, i.e. the random string $\mathsf{x}$, in such a distribution is at least $n - O(\log n)$. Hence, after a message of length $|\mathsf{M}|$ from Alice, the entropy of string $\mathsf{x}$ reduces to at least $n - O(\log n) - |\mathsf{M}|$. For a randomly chosen position in $\mathsf{x}$, after conditioning on the message, the entropy is at least $1 - O((|\mathsf{M}| + \log n)/n)$ on average, which would give a lower bound on $H(\mathsf{x}_\sigma \mid \mathsf{M}, \sigma)$.
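In symbols, this standard counting argument (a sketch for the case where the string is uniform and the index is uniform and independent of it) is:
\begin{align*}
n - |\mathsf{M}| \;\le\; H(\mathsf{x} \mid \mathsf{M})
  \;=\; \sum_{j=1}^{n} H(\mathsf{x}_j \mid \mathsf{M}, \mathsf{x}_{<j})
  \;\le\; \sum_{j=1}^{n} H(\mathsf{x}_j \mid \mathsf{M}),
\end{align*}
so for an index $\sigma$ chosen uniformly and independently of $(\mathsf{x}, \mathsf{M})$, we get $H(\mathsf{x}_\sigma \mid \mathsf{M}, \sigma) \ge 1 - |\mathsf{M}|/n$ on average.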
In the distribution given to Alice and Bob, however, $\sigma$ is not chosen uniformly at random independently of $x$, and in fact, is correlated with the distribution of $\mathsf{x}$. Such versions of index where the distributions of Alice’s and Bob’s inputs are correlated have been studied before (see Sparse Indexing in Appendix A of [5] and Section 3.3 of [36]). This correlation is the main issue in analyzing biased index with information theoretic tools.
Adapting the approach in Appendix A of [5], we restrict the randomness in $\mathsf{x}$ to a fixed set of indices of a carefully chosen size (based on $\delta$), and break this correlation. The loss in entropy of $\mathsf{x}$ is not significant enough to hinder us, and the restriction then allows us to use standard information theoretic tools to analyze biased index (see Section 3.3 for more details).
Independent and Concurrent Work
Independently and concurrently of this work, [33] made progress on Conjecture 2. They showed a lower bound of $\Omega(n)$ for oblivious protocols (where the length of the message sent by each player does not depend on the input), and a lower bound of $\Omega(n/k)$ for general protocols. We show a lower bound of $\Omega(n)$ for all protocols. (The authors of [33] pointed out an important flaw in the arguments of an earlier version of this work, which was posted at around the same time as [33]. This flaw was subsequently fixed in the current version using a global change to the original argument, which now recovers the optimal result for $k = O(n/\log n)$.)
Quantitatively, our lower bounds are a factor of almost $k$ stronger, and are optimal for $k = O(n/\log n)$; moreover, our lower bound also holds for the Augmented Chain problem of [15]. In terms of techniques, however, the two works are entirely disjoint: their proof is based on a new method of analysis through min-entropy, and we use information theoretic approaches.
2 Preliminaries
In this section, we will present the required notation and definitions for our proof.
Notation
For any tuple of items $(a_1, \ldots, a_t)$, we use $a_{<i}$ to denote the tuple $(a_1, \ldots, a_{i-1})$ for all $i \in [t]$. We use sans-serif font to denote random variables. For any random variable $\mathsf{A}$, we use $A \sim \mathsf{A}$ to denote any $A$ sampled from the distribution of the random variable $\mathsf{A}$.
For any string $x \in \{0,1\}^n$, we use $x_j$ to denote the bit at position $j$ in $x$ for $j \in [n]$. We use $x_{<j}$ to denote the string of bits preceding position $j$ in $x$. We use $x_A$ for any $A \subseteq [n]$ to denote the bits at positions in set $A$.
For any $\delta \in [0,1]$, we use $H_2(\delta) = \delta \log \frac{1}{\delta} + (1-\delta) \log \frac{1}{1-\delta}$ to denote the binary entropy function.
We need the following standard approximation of binomial coefficients (see Lemma 7 of Chapter 10 in [32]).
Fact 3 (c.f. [32]).
For any $m \ge 1$ and any $0 < \gamma < 1$ such that $\gamma m$ is an integer, we have,
$$\frac{2^{m \cdot H_2(\gamma)}}{m+1} \;\le\; \binom{m}{\gamma m} \;\le\; 2^{m \cdot H_2(\gamma)}.$$
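As a quick numeric sanity check of the bound’s form (our snippet, not part of [32]):

```python
from math import comb, log2

def H2(g):
    """Binary entropy function."""
    return -g * log2(g) - (1 - g) * log2(1 - g)

# Check: m*H2(gamma) - log2(m+1) <= log2 C(m, gamma*m) <= m*H2(gamma)
for m, gm in [(100, 50), (100, 10), (1000, 300)]:
    gamma = gm / m
    lhs = m * H2(gamma) - log2(m + 1)
    mid = log2(comb(m, gm))
    rhs = m * H2(gamma)
    assert lhs <= mid <= rhs
    print(f"m={m}, gamma={gamma:.2f}: {lhs:.1f} <= {mid:.1f} <= {rhs:.1f}")
```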
2.1 Communication Complexity Model
We use the standard number-in-hand multi-party model of communication. Only the basic definitions are given in this subsection. More details can be found in textbooks on communication complexity [35, 31].
For any $t \ge 2$, let $f$ be a function from $\mathcal{X}_1 \times \cdots \times \mathcal{X}_t$ to $\{0,1\}$. There are $t$ players where player $P_i$ gets an input $X_i \in \mathcal{X}_i$ for $i \in [t]$. There is a shared blackboard visible to all the players. The players have access to a shared tape of random bits, along with their own private randomness. In any protocol $\pi$ for $f$, the players send a message to the blackboard in increasing order ($P_1$ sends a message followed by $P_2$, and so on till $P_t$). The last player $P_t$, after all the messages are sent, outputs a single bit denoted by $\pi(X_1, \ldots, X_t)$. Protocol $\pi$ is said to solve $f$ with probability of success at least $1 - \varepsilon$ if, for any choice of inputs $(X_1, \ldots, X_t)$, we have,
$$\Pr\big[\pi(X_1, \ldots, X_t) = f(X_1, \ldots, X_t)\big] \;\ge\; 1 - \varepsilon,$$
where the probability is over the randomness of $\pi$.
The communication cost of a protocol is defined as the worst case total communication of all the players on the blackboard at the end of the protocol.
Definition 4.
The randomized communication complexity of $f$, with probability of error $\varepsilon$, is defined as the minimum communication cost of any protocol which solves $f$ with probability of success at least $1 - \varepsilon$.
2.2 Information Theoretic Tools
Our proof relies on tools from information theory, and we state the basic definitions and the inequalities we need in this section. Proofs of the statements and more details can be found in Chapter 2 of a textbook on information theory by Cover and Thomas [13].
Definition 5 (Shannon Entropy).
For any random variable $\mathsf{A}$ over support $\Omega$, the Shannon entropy of $\mathsf{A}$, denoted by $H(\mathsf{A})$, is defined as,
$$H(\mathsf{A}) \;=\; \sum_{a \in \Omega} \Pr[\mathsf{A} = a] \cdot \log \frac{1}{\Pr[\mathsf{A} = a]}.$$
For any event $E$, we define $H(\mathsf{A} \mid E)$ in the same way, as the entropy of the distribution of $\mathsf{A}$ conditioned on the event $E$. For any two random variables $\mathsf{A}$ and $\mathsf{B}$, the entropy of $\mathsf{A}$ conditioned on $\mathsf{B}$, denoted by $H(\mathsf{A} \mid \mathsf{B})$, is defined as,
$$H(\mathsf{A} \mid \mathsf{B}) \;=\; \mathop{\mathbb{E}}_{B \sim \mathsf{B}} \big[\, H(\mathsf{A} \mid \mathsf{B} = B) \,\big].$$
Fact 5.
We know the following about entropy and mutual information:
1. For any random variable $\mathsf{A}$, the entropy obeys the bound $0 \le H(\mathsf{A}) \le \log |\Omega_{\mathsf{A}}|$, where $\Omega_{\mathsf{A}}$ is the support of $\mathsf{A}$.
2. For any two random variables $\mathsf{A}, \mathsf{B}$, we have $H(\mathsf{A} \mid \mathsf{B}) \le H(\mathsf{A})$, with equality holding iff $\mathsf{A}$ and $\mathsf{B}$ are independent.
3. Chain rule of entropy: For $t \ge 2$ and any tuple of random variables $(\mathsf{A}_1, \ldots, \mathsf{A}_t)$, $H(\mathsf{A}_1, \ldots, \mathsf{A}_t) = \sum_{i=1}^{t} H(\mathsf{A}_i \mid \mathsf{A}_{<i})$.
4. Subadditivity of entropy: For $t \ge 2$ and any tuple of random variables $(\mathsf{A}_1, \ldots, \mathsf{A}_t)$, $H(\mathsf{A}_1, \ldots, \mathsf{A}_t) \le \sum_{i=1}^{t} H(\mathsf{A}_i)$.
We also need the following proposition, which relates entropy to the probability of correctness while estimating a random variable.
Proposition 6 (Fano’s inequality).
Given a binary random variable $\mathsf{A}$, an estimator random variable $\mathsf{B}$, and a function $g$ such that $g(\mathsf{B}) = \mathsf{A}$ with probability at least $1 - \delta$ for $\delta \le 1/2$,
$$H(\mathsf{A} \mid \mathsf{B}) \;\le\; H_2(\delta).$$
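For concreteness (this instantiation is ours), taking error $\delta = 1/3$, the setting used in Claim 9 later: a protocol whose output $g(\mathsf{T})$ equals the answer bit $\mathsf{b}$ with probability at least $2/3$ satisfies
$$H(\mathsf{b} \mid \mathsf{T}) \;\le\; H_2(1/3) \approx 0.918 < 1,$$
so a successful protocol’s transcript must remove a constant fraction of the answer’s initial one bit of entropy.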
This concludes our preliminaries section.
3 The Lower Bound
In this section, we will prove our lower bound of $\Omega(n - k \log n)$ on the communication complexity of the $\mathsf{CHAIN}_{n,k}$ problem for any $k \ge 1$. Let us formally define the communication problem first.
Definition 7.
The $\mathsf{CHAIN}_{n,k}$ communication problem is defined as follows. Given $k+1$ players $P_1, \ldots, P_{k+1}$ where,
- $P_i$ has a string $x^i \in \{0,1\}^n$ for each $i \in [k]$, and,
- $P_{i+1}$ for $i \in [k]$ has an index $\sigma^i \in [n]$,
such that,
$$x^1_{\sigma^1} = x^2_{\sigma^2} = \cdots = x^k_{\sigma^k} = b$$
for some bit $b \in \{0,1\}$. The players have a blackboard visible to all the parties. For $i \in [k]$ in ascending order, $P_i$ sends a single message $M_i$, after which the index $\sigma^i$ is revealed to the blackboard at no cost. $P_{k+1}$ has to output whether $b$ is 0 or 1. Refer to Figure 2 for an illustration.
Let us recall the statement of our main result.
Theorem 3 (restated).
For any $k \ge 1$, any protocol for $\mathsf{CHAIN}_{n,k}$ with probability of success at least 2/3 requires $\Omega(n - k \log n)$ total bits of communication.
We give our hard distribution for $\mathsf{CHAIN}_{n,k}$ in Section 3.1 and give the proof of Theorem 3 in Section 3.2, except for the analysis of biased index, which is given in Section 3.3. Lastly, we extend the arguments to Augmented Chain in Section 3.4.
3.1 Setting Up the Problem
In this subsection, we start by defining the notation for our proof, and we describe the input distribution to $\mathsf{CHAIN}_{n,k}$.
The hard input distribution $\mathcal{D}$ is as follows. Let $T \subseteq \{0,1\}^n$ be the subset of strings where the number of ones is exactly equal to $n/2$ (we assume $n$ is even).
Distribution $\mathcal{D}$ for $\mathsf{CHAIN}_{n,k}$:
1. Pick a bit $b$ uniformly at random from $\{0,1\}$.
2. For each $i \in [k]$, sample $(x^i, \sigma^i)$ uniformly at random from $T \times [n]$, independently across the $k$ instances, conditioned on $x^i_{\sigma^i} = b$.
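A small Python sketch of sampling from $\mathcal{D}$ (ours, for illustration; it uses the fact that every balanced string has exactly $n/2$ positions holding each bit, so drawing $\sigma^i$ uniformly from the positions holding $b$ matches the conditional distribution):

```python
import random

def sample_hard_distribution(n, k, rng):
    """Sample (b, [(x^i, sigma^i)]) from the hard distribution D for CHAIN_{n,k}."""
    assert n % 2 == 0
    b = rng.randint(0, 1)
    instances = []
    for _ in range(k):
        x = [1] * (n // 2) + [0] * (n // 2)
        rng.shuffle(x)                                   # x uniform over balanced strings T
        positions_with_b = [j for j in range(n) if x[j] == b]
        sigma = rng.choice(positions_with_b)             # uniform over the n/2 positions with bit b
        instances.append((x, sigma))
    return b, instances

rng = random.Random(0)
b, instances = sample_hard_distribution(8, 3, rng)
assert all(x[s] == b for x, s in instances)
print(b, instances)
```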
Notation
We use $\mathsf{x}^i$ to denote the random variable corresponding to string $x^i$ and $\sigma^i$ to denote the random variable corresponding to index $\sigma^i$ for $i \in [k]$ (we use the same symbol for an index and its random variable when there is no ambiguity). To denote the random variables corresponding to the first $i$ strings and indices, we use $\mathsf{x}^{\le i}$ and $\sigma^{\le i}$ respectively.
We use $\mathsf{M}_i$ to denote the random variable corresponding to the message sent by $P_i$ to the blackboard. We use $M$ to denote the tuple containing the messages of all the players, and $\mathsf{M}$ to denote the random variable corresponding to $M$.
Let $\pi$ be a deterministic protocol for $\mathsf{CHAIN}_{n,k}$ with probability of success at least 2/3 when the input is distributed according to $\mathcal{D}$. We use $\mathsf{T}$ to denote the random variable corresponding to the contents of the blackboard (referred to as a transcript), and we use $T$ to also denote transcripts sampled from $\mathsf{T}$. We use $\mathsf{T}_i$ to denote the random variable of the tuple $(\mathsf{M}_1, \sigma^1, \ldots, \mathsf{M}_i, \sigma^i)$ for $i \in [k]$. We use $\pi(T)$ to denote the output of $\pi$ when the contents of the blackboard are $T$ and the input is $\sigma^k$ for $P_{k+1}$.
Let $L$ be the total length of all the messages sent by the players in $\pi$. We assume that the total length of the messages is exactly $L$ by padding. For any random variable $\mathsf{A}$, we use $\mathcal{D}(\mathsf{A})$ to denote the distribution of the random variable $\mathsf{A}$, as the input is distributed according to $\mathcal{D}$. We replace $\mathcal{D}(\mathsf{A})$ with $\mathsf{A}$ for ease of readability whenever it is clear from context.
Let $\mathsf{b}$ denote the random variable corresponding to bit $b$. The bit $b$ corresponds to the answer to $\mathsf{CHAIN}_{n,k}$. We show the lower bound of $\Omega(n - k \log n)$ for distinguishing between the case when $b = 0$ and $b = 1$ based on the contents of the blackboard.
We need one important observation about the distribution $\mathcal{D}$.
Observation 8.
For any $i \in [k]$, the random variable $(\mathsf{x}^i, \sigma^i)$ is independent of $(\mathsf{x}^{<i}, \sigma^{<i})$ conditioned on $\mathsf{b}$.
Proof.
For any fixed value of $b$, $(x^i, \sigma^i)$ is chosen uniformly at random from $T \times [n]$ such that $x^i_{\sigma^i} = b$. This choice is independent of any $(x^j, \sigma^j)$ with $j \neq i$, and thus independent of $(\mathsf{x}^{<i}, \sigma^{<i})$.
We are ready to proceed with the proof of our main theorem.
3.2 Proof of Lower Bound
We start by showing that in any successful protocol, the entropy of the distribution of $\mathsf{b}$ conditioned on the transcript must be small.
Claim 9 (The transcript reveals information about $\mathsf{b}$.).
$$H(\mathsf{b} \mid \mathsf{T}) \;\le\; H_2(1/3).$$
Proof.
We know that $\pi$ successfully finds the value of $b$ with probability of success at least 2/3 using the transcript. Thus, using Fano’s inequality in Proposition 6, we have,
$$H(\mathsf{b} \mid \mathsf{T}) \;\le\; H_2(1/3).$$
The main part of the proof is to show a lower bound on the entropy of $\mathsf{b}$ conditioned on the transcript in terms of the total length of the messages.
Lemma 10.
For any deterministic protocol $\pi$ for $\mathsf{CHAIN}_{n,k}$ over the distribution $\mathcal{D}$, there is an absolute constant $c > 0$ such that
$$H(\mathsf{b} \mid \mathsf{T}) \;\ge\; 1 - \frac{c \cdot (L + k \log n)}{n}.$$
Proof of Theorem 3.
Combining Lemma 10 with Claim 9, we get,
$$1 - \frac{c \cdot (L + k \log n)}{n} \;\le\; H(\mathsf{b} \mid \mathsf{T}) \;\le\; H_2(1/3).$$
This gives that,
$$L \;\ge\; \frac{1 - H_2(1/3)}{c} \cdot n \;-\; k \log n.$$
Since $H_2(1/3) < 1$ is an absolute constant bounded away from one, this proves that the total number of bits is $\Omega(n - k \log n)$ for any deterministic protocol over $\mathcal{D}$. By Yao’s minimax principle, we get a lower bound of $\Omega(n - k \log n)$ on the randomized communication complexity of $\mathsf{CHAIN}_{n,k}$.
The proof of Lemma 10 employs a reduction to the two player $\mathsf{INDEX}_n$. However, in these instances of $\mathsf{INDEX}_n$, Alice and Bob already have some partial information about the answer. We call this problem the biased index problem, and it is defined based on a parameter $\delta$, which is the initial bias known about the answer.
Definition 11.
The biased index distributional communication problem, denoted by $\mathsf{INDEX}^{\delta}_{n}$ for $\delta \in (0,1)$, is defined as follows.
Sample $b \in \{0,1\}$ such that $b = 1$ with probability $\delta$, and $b = 0$ otherwise. Sample $(x, \sigma)$ uniformly at random from $T \times [n]$ conditioned on $x_\sigma = b$. Give string $x$ to Alice, and index $\sigma$ to Bob. Bob has to output $x_\sigma$ after a single message from Alice.
Let $\pi_{\mathrm{B}}$ be a deterministic protocol for $\mathsf{INDEX}^{\delta}_{n}$. Let $\mathsf{x}$, $\sigma$ and $\mathsf{M}$ denote the random variables corresponding to $x$, $\sigma$ and the message sent by Alice respectively. Let $\mu_\delta$ denote the joint distribution of $\mathsf{x}$ and $\sigma$ in $\mathsf{INDEX}^{\delta}_{n}$.
We prove the following lemma about $\mathsf{INDEX}^{\delta}_{n}$ in Section 3.3.
Lemma 12 (Biased Index).
There is an absolute constant $c > 0$ such that for any protocol $\pi_{\mathrm{B}}$ for $\mathsf{INDEX}^{\delta}_{n}$ with messages of length $|\mathsf{M}|$,
$$H(\mathsf{x}_\sigma \mid \mathsf{M}, \sigma) \;\ge\; H_2(\delta) - \frac{c \cdot (|\mathsf{M}| + \log n)}{n}.$$
3.3 Biased Index
In this subsection, we will prove Lemma 12. Let us first recall the input distribution to $\mathsf{INDEX}^{\delta}_{n}$.
Distribution $\mu_\delta$: Sample $b = 1$ with probability $\delta$ and set $b = 0$ otherwise. Sample $(x, \sigma)$ uniformly at random from $T \times [n]$ conditioned on $x_\sigma = b$.
In $\mu_\delta$, the distributions of $\mathsf{x}$ and $\sigma$ are highly correlated. We give an alternate way of sampling so that this correlation is removed partially.
Distribution $\mathcal{D}'_{\delta}$ (with a size parameter $m \le n/2$ that is fixed as a function of $\delta$; as shown in the sketch after this list, the two samplers coincide):
For $b = 0$:
1. Sample a set $S \subseteq [n]$ of size $m$ uniformly at random.
2. Sample a set of $n/2$ indices from $[n] \setminus S$ uniformly at random and set them to 1, and set all remaining indices (including all of $S$) to 0, to get $x$.
3. Sample $\sigma$ by sampling an index uniformly at random from $S$.
For $b = 1$:
1. Sample a set $S \subseteq [n]$ of size $m$ uniformly at random.
2. Sample a set of $n/2$ indices from $[n] \setminus S$ uniformly at random and set them to 0, and set all remaining indices (including all of $S$) to 1, to get $x$.
3. Sample $\sigma$ by sampling an index uniformly at random from $S$.
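The following Python snippet (ours) verifies, by exhaustive enumeration on a toy size, that the sampler above produces the same joint distribution of $(x, \sigma)$ as $\mu_\delta$ conditioned on each value of $b$, in the spirit of Claims 13 and 14; the tiny parameters $n = 4$, $m = 1$ are illustrative assumptions:

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

n, m = 4, 1  # toy size; m = |S| (the careful delta-based choice of m is not modeled here)

def mu(b):
    """mu_delta conditioned on b: x uniform over balanced strings,
    sigma uniform over the positions j with x_j = b."""
    out = Counter()
    strings = [frozenset(c) for c in combinations(range(n), n // 2)]  # sets of one-positions
    for ones in strings:
        pos = ones if b == 1 else frozenset(range(n)) - ones
        for j in pos:
            out[ones, j] += Fraction(1, len(strings) * len(pos))
    return out

def alt(b):
    """D'_delta conditioned on b: x equals b on a random m-set S,
    the rest is filled in uniformly; sigma is uniform in S."""
    out = Counter()
    sets_S = [frozenset(c) for c in combinations(range(n), m)]
    for S in sets_S:
        rest = sorted(frozenset(range(n)) - S)
        ones_outside = n // 2 if b == 0 else n // 2 - m
        fills = [frozenset(c) for c in combinations(rest, ones_outside)]
        for fill in fills:
            ones = fill if b == 0 else fill | S
            for j in S:
                out[ones, j] += Fraction(1, len(sets_S) * len(fills) * len(S))
    return out

for b in (0, 1):
    assert mu(b) == alt(b), f"distributions differ for b = {b}"
print("exhaustive check passed: the two samplers agree for n=4, m=1")
```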
In this section, we assume that $\delta \le 1/2$ and present the arguments conditioned on $b = 0$. For the case when $\delta > 1/2$, the proof follows in the same vein, and is not presented. Let $\mathsf{S}$ and $\mathsf{R}$ denote the random variables corresponding to set $S$ and the set of indices sampled in the second step respectively. We show that distributions $\mathcal{D}'_{\delta}$ and $\mu_\delta$ are in fact identical, and the proofs can be found in the full version.
Claim 13.
In distribution $\mathcal{D}'_{\delta}$, for any $j \in [n]$,
$$\Pr[\mathsf{b} = 1 \mid \sigma = j] = \delta.$$
Claim 14.
Distribution $\mathcal{D}'_{\delta}$ is the same as $\mu_\delta$.
Using this alternate way of sampling, it is easy to see that the random variables $\sigma$ and $\mathsf{x}$ are independent of each other conditioned on $(\mathsf{S}, \mathsf{b})$. It can also be extended to include the random variable $\mathsf{M}$, as it is only a function of $\mathsf{x}$.
Observation 15.
In distribution $\mathcal{D}'_{\delta}$, conditioned on $\mathsf{b} = 0$ and $\mathsf{S} = S$, for any $j \in S$, the distribution of the random variables $(\mathsf{x}, \mathsf{M})$ is independent of the event $\{\sigma = j\}$.
Proof.
Conditioned on $\mathsf{b} = 0$ and $\mathsf{S} = S$, string $x$ is chosen by picking an $(n/2)$-size set uniformly at random from $[n] \setminus S$, and setting these indices to 1, and this choice also fixes $M$ as the protocol is deterministic. And index $\sigma$ is chosen uniformly at random from $S$, independently of the choice of $x$, by definition of $\mathcal{D}'_{\delta}$. Thus, the choice of $\sigma$ is independent of the random variables $(\mathsf{x}, \mathsf{M})$.
Next, we will show that even conditioned on $(\mathsf{S}, \mathsf{b})$, the entropy of $\mathsf{x}$ remains large.
Claim 16.
$$H(\mathsf{x} \mid \mathsf{S}, \mathsf{b} = 0) \;\ge\; n - 2m - \log(n+1).$$
Proof.
We assume that $2m + \log(n+1) \le n$, as otherwise the statement is vacuously true: the right hand side is negative by definition, and entropy is always non-negative by Fact 5-(1).
Conditioned on $\mathsf{b} = 0$ and $\mathsf{S} = S$, we know that $x$ is fixed by choosing the set of its $n/2$ one-positions uniformly at random from $[n] \setminus S$. Thus,
\begin{align*}
H(\mathsf{x} \mid \mathsf{S}, \mathsf{b} = 0) &= \log \binom{n-m}{n/2} && \text{(by Fact 5-(1), as the choice is uniform over its support)}\\
&\ge (n-m) \cdot H_2\!\left(\frac{n}{2(n-m)}\right) - \log(n+1) && \text{(by Fact 3, applied with } \gamma = \tfrac{n}{2(n-m)})\\
&\ge n - 2m - \log(n+1) && \text{(as } m \le n/2, \text{ using } H_2(\tfrac12 + z) \ge 1 - 4z^2)
\end{align*}
We are ready to prove Lemma 12.
Proof of Lemma 12.
We can lower bound the entropy of $\mathsf{x}$ conditioned on the message and the index as,
\begin{align*}
H(\mathsf{x} \mid \mathsf{M}, \sigma, \mathsf{b} = 0) &\ge H(\mathsf{x} \mid \mathsf{M}, \sigma, \mathsf{S}, \mathsf{b} = 0) && \text{(as conditioning reduces entropy, Fact 5-(2))}\\
&= H(\mathsf{x} \mid \mathsf{M}, \mathsf{S}, \mathsf{b} = 0) && \text{(as $\sigma$ is independent of $(\mathsf{x}, \mathsf{M})$ given $(\mathsf{S}, \mathsf{b})$, by Observation 15)}\\
&= H(\mathsf{x}, \mathsf{M} \mid \mathsf{S}, \mathsf{b} = 0) - H(\mathsf{M} \mid \mathsf{S}, \mathsf{b} = 0) && \text{(by chain rule of entropy, Fact 5-(3))}\\
&\ge H(\mathsf{x} \mid \mathsf{S}, \mathsf{b} = 0) - |\mathsf{M}| && \text{(as $\mathsf{M}$ is a function of $\mathsf{x}$, and $H(\mathsf{M}) \le |\mathsf{M}|$ by Fact 5-(1))}
\end{align*}
Thus it follows that,
$$H(\mathsf{x} \mid \mathsf{M}, \sigma, \mathsf{b} = 0) \;\ge\; H(\mathsf{x} \mid \mathsf{S}, \mathsf{b} = 0) - |\mathsf{M}|. \qquad (1)$$
We also have,
$$H(\mathsf{x}_\sigma \mid \mathsf{M}, \sigma) \;=\; H(\mathsf{x} \mid \mathsf{M}, \sigma) - H(\mathsf{x} \mid \mathsf{M}, \sigma, \mathsf{x}_\sigma),$$
as $\mathsf{x}_\sigma$ is fixed by $(\mathsf{x}, \sigma)$, using the chain rule of entropy (Fact 5-(3)). The first term is lower bounded through Eq (1) and its symmetric analogue for $\mathsf{b} = 1$, since conditioning reduces entropy (Fact 5-(2)); the subtracted term is the only place where the message can create an advantage, and the restricted sampling ensures it cannot fall much below its initial value.
Combining with Claim 16, we get,
$$H(\mathsf{x} \mid \mathsf{M}, \sigma, \mathsf{b} = 0) \;\ge\; n - 2m - \log(n+1) - |\mathsf{M}|,$$
by Eq (1), so even after seeing the message, the string retains all but $|\mathsf{M}| + 2m + \log(n+1)$ bits of its entropy. Rearranging the terms, and distributing this entropy deficit over the $n$ symmetric positions of the string (dividing both sides by $n$), we get,
$$H(\mathsf{x}_\sigma \mid \mathsf{M}, \sigma) \;\ge\; H_2(\delta) - \frac{c \cdot (|\mathsf{M}| + m + \log n)}{n},$$
for an absolute constant $c > 0$. As the choice of $m$ made based on $\delta$ satisfies $m = O(\log n)$ in the regime we need, this finishes the proof.
3.4 Extension to Augmented Chain
In this subsection, we will extend our lower bound to the Augmented Chain problem introduced by [15]. We begin by defining Augmented Index.
Augmented Index is a close variant of the index problem. Here, in addition to having the index $\sigma$, Bob also has the prefix $x_{<\sigma}$ of Alice’s string. We know that this generalization also requires $\Omega(n)$ communication when Alice sends a single message to Bob [34]. This problem is particularly useful for proving lower bounds for turnstile streams (see e.g., [11, 25, 16], and references therein). A tight information cost trade-off for this variant in the two-way communication model was proved by [9].
The formal definition of Augmented Chain follows.
Definition 17 (Augmented Chain).
The $\mathsf{AugChain}_{n,k}$ communication problem is defined as follows. Given $k+1$ players $P_1, \ldots, P_{k+1}$ where,
- $P_i$ has a string $x^i \in \{0,1\}^n$ for each $i \in [k]$,
- $P_{i+1}$ for $i \in [k]$ has an index $\sigma^i \in [n]$, and the string $x^i_{<\sigma^i}$,
such that,
$$x^1_{\sigma^1} = x^2_{\sigma^2} = \cdots = x^k_{\sigma^k} = b$$
for some bit $b \in \{0,1\}$. The players have a blackboard visible to all the parties. For $i \in [k]$ in ascending order, $P_i$ sends a single message $M_i$, after which the index $\sigma^i$ and the string $x^i_{<\sigma^i}$ are revealed to the blackboard at no cost. $P_{k+1}$ has to output whether $b$ is 0 or 1. Refer to Figure 3 for an illustration.
For the chained version of Augmented Index, the lower bound that at least one player sends $\Omega(n/k^2)$ bits holds true, and the proof follows with minimal changes. Using this, [15] proved lower bounds for interval selection (independent sets of intervals) in turnstile streams with weighted intervals.
We prove the following result about $\mathsf{AugChain}_{n,k}$.
Theorem 18.
For any $k \ge 1$, any protocol for $\mathsf{AugChain}_{n,k}$ with probability of success at least 2/3 requires communication $\Omega(n - k \log n)$.
A proof sketch detailing the changes needed to prove Theorem 18 is given in the full version. Most parts of the proof are similar to the proof of Theorem 3.
4 Applications to Streaming
In this section we give applications of our main result to independent sets in vertex arrival streams and streaming submodular maximization.
4.1 Independent Sets
In edge arrival streams, for any graph $G = (V, E)$, the vertex set $V$ with $n$ vertices is given in advance, and the edges arrive one by one in an arbitrary order. We are required to process the graph in limited space.
In vertex arrival streams, for any graph $G = (V, E)$, the edges are grouped by their incident vertices. Vertices from $V$ arrive one by one (in any arbitrary order), and when a vertex arrives, all the edges connecting it to any previously arrived vertices are revealed. This makes the vertex arrival stream a strictly easier model than the edge arrival stream, as the order of edges is restricted.
Indeed, for the maximal independent set problem, we know that finding algorithms in vertex arrival streams is easier; the greedy algorithm produces a maximal independent set in $O(n)$ space, whereas, in edge arrival streams, any algorithm which finds a maximal independent set requires $\Omega(n^2)$ space [4, 12].
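For concreteness, a minimal sketch (ours) of the vertex-arrival greedy; it stores only the current independent set:

```python
def greedy_maximal_independent_set(vertex_stream):
    """Vertex-arrival greedy: keep v iff it has no neighbor already kept.
    Stores only the current independent set, i.e. O(n) space.
    vertex_stream yields (v, neighbors_among_previously_arrived_vertices)."""
    independent = set()
    for v, back_neighbors in vertex_stream:
        if independent.isdisjoint(back_neighbors):
            independent.add(v)
    return independent

# Toy stream: the path 1-2-3-4 arriving in order (each vertex lists earlier neighbors).
stream = [(1, set()), (2, {1}), (3, {2}), (4, {3})]
print(greedy_maximal_independent_set(stream))  # {1, 3}
```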
Maximum independent set (MIS), however, is provably hard in both vertex arrival streams and edge arrival streams. It is known that any algorithm which computes an $\alpha$-approximation of MIS in edge arrival streams requires $\Omega(n^2/\alpha^2)$ space from [22]. In vertex arrival streams, [12] proved a lower bound of $\Omega(n^2/\alpha^5)$. They also gave the following connection between the $\mathsf{CHAIN}$ problem and MIS in the proof of Theorem 9 of their paper.
Proposition 19 (Rephrased from [12]).
For any $\alpha \ge 1$, any algorithm that gives an $\alpha$-approximation of maximum independent sets in vertex arrival streams for $n$-vertex graphs using space at most $s$ and probability of success at least 2/3 can be used to solve $\mathsf{CHAIN}_{m,k}$ with $k = \Theta(\alpha)$ and $m = \Theta(n^2/\alpha^2)$, with communication at most $O(k \cdot s)$ bits and success probability at least 2/3.
Our lower bound Theorem 3, along with Proposition 19 directly gives the following corollary.
Corollary 20.
For any $\alpha \ge 1$, any algorithm that computes an $\alpha$-approximation of maximum independent sets in $n$-vertex graphs in vertex arrival streams uses $\Omega(n^2/\alpha^3)$ space.
This further reduces the gap between the lower bounds for $\alpha$-approximation of MIS in vertex arrival streams and edge arrival streams by an $\alpha^2$ factor.
4.2 Submodular Maximization
In this subsection, we will summarize our slight improvements to lower bounds for streaming submodular maximization.
A function $f : 2^{E} \to \mathbb{R}_{\ge 0}$ over ground set $E$ is submodular if and only if, for any two sets $A \subseteq B \subseteq E$ and for any element $e \in E \setminus B$,
$$f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B).$$
This captures the diminishing returns property of any submodular function.
We are interested in maximization of a monotone submodular function subject to a cardinality constraint. That is, for a given $k \ge 1$, we want to find a subset $S \subseteq E$ with $|S| \le k$ such that for any other set $S'$ with $|S'| \le k$, we have $f(S) \ge f(S')$.
We are given oracle access to function $f$; however, we do not have access to the entirety of the ground set. The elements of the ground set arrive one by one, and the algorithm has space to either store the incoming element or to discard it. The algorithm can query the oracle with any subset of the elements currently in storage. We want the storage to be roughly the same as the output size, which is $k$.
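To make the model concrete, here is a minimal sketch (ours, not the algorithm of [29] or the construction of [19]) of the classical threshold idea, assuming an estimate opt_est of the optimal value is known in advance; the standard sieve technique removes this assumption by running $O((\log k)/\varepsilon)$ thresholds in parallel:

```python
def threshold_streaming(stream, f, k, opt_est):
    """Keep an arriving element iff its marginal value is at least opt_est / (2k).
    With a correct opt_est and a monotone submodular f, the final set S
    satisfies f(S) >= opt_est / 2, i.e. a 1/2-approximation."""
    S = []
    for e in stream:
        if len(S) == k:
            break
        if f(S + [e]) - f(S) >= opt_est / (2 * k):
            S.append(e)
    return S

# Toy instance: a coverage function (monotone submodular); elements cover sets.
universe_sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3, 4}}
def coverage(S):
    return len(set().union(*[universe_sets[e] for e in S])) if S else 0

picked = threshold_streaming(iter(["a", "b", "c", "d"]), coverage, k=2, opt_est=4)
print(picked, coverage(picked))  # ['a', 'b'] with coverage 3 >= 4/2
```

If the loop fills all $k$ slots, then $f(S) \ge k \cdot \mathrm{opt\_est}/(2k) = \mathrm{opt\_est}/2$; otherwise every discarded element had marginal value below $\mathrm{opt\_est}/(2k)$ with respect to a subset of the final $S$, and monotonicity together with submodularity again give $f(S) \ge \mathrm{opt\_est}/2$.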
In this model, [29] gave an algorithm which finds a $(1/2 - \varepsilon)$-approximation in $O(k/\varepsilon)$ space. [19] showed that a much better approximation was not possible. They proved that any algorithm which gets a $(2 - \sqrt{2} + \varepsilon)$-approximation uses $\Omega(n/k^3)$ space. They give the following connection to the $\mathsf{CHAIN}$ problem in Theorem 1.3 and Theorem 1.4 of their paper.
Proposition 21 (Rephrased from [19]).
For any $\varepsilon > 0$, there exists a constant $c_\varepsilon > 0$ such that for any $k \ge c_\varepsilon$, any randomized streaming algorithm which maximizes a monotone submodular function subject to a cardinality constraint of at most $k$, using space at most $s$ and with approximation factor at least $2 - \sqrt{2} + \varepsilon$ in expectation, can be used to solve $\mathsf{CHAIN}_{m,k}$ for $m = \Theta_\varepsilon(n/k)$ with probability of success at least 2/3 and communication at most $O(k \cdot s)$.
We get an improvement of a factor of $k$ over the current state-of-the-art lower bound in [19] as a corollary of Theorem 3.
Corollary 22.
For any $\varepsilon > 0$, there exists a constant $c_\varepsilon > 0$ such that for any $k \ge c_\varepsilon$, any streaming algorithm that maximizes a monotone submodular function subject to a cardinality constraint of at most $k$, with an approximation factor at least $2 - \sqrt{2} + \varepsilon$, requires $\Omega(n/k^2)$ space.
References
- [1] Farid Ablayev. Lower bounds for one-way probabilistic communication complexity and their application to space complexity. Theoretical Computer Science, 157(2):139–159, 1996. doi:10.1016/0304-3975(95)00157-3.
- [2] S. Assadi. Lecture notes on sublinear algorithms. https://sepehr.assadi.info/courses/cs514-s20/lec8.pdf, 2020.
- [3] S. Assadi, G. Kol, R. R. Saxena, and H. Yu. Multi-pass graph streaming lower bounds for cycle counting, max-cut, matching size, and other problems. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 354–364, Los Alamitos, CA, USA, November 2020. IEEE Computer Society. doi:10.1109/FOCS46700.2020.00041.
- [4] Sepehr Assadi, Yu Chen, and Sanjeev Khanna. Sublinear algorithms for (Δ + 1) vertex coloring. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 767–786, 2019. doi:10.1137/1.9781611975482.48.
- [5] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’16, pages 698–711, New York, NY, USA, 2016. Association for Computing Machinery. doi:10.1145/2897518.2897576.
- [6] Sepehr Assadi and Janani Sundaresan. (noisy) gap cycle counting strikes back: Random order streaming lower bounds for connected components and beyond. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, pages 183–195, New York, NY, USA, 2023. Association for Computing Machinery. doi:10.1145/3564246.3585192.
- [7] Sujoy Bhore, Fabian Klute, and Jelle J. Oostveen. On streaming algorithms for geometric independent set and clique. In Parinya Chalermsook and Bundit Laekhanukit, editors, Approximation and Online Algorithms, pages 211–224, Cham, 2022. Springer International Publishing. doi:10.1007/978-3-031-18367-6_11.
- [8] Harry Buhrman and Ronald de Wolf. Communication complexity lower bounds by polynomials. In Proceedings of the Annual IEEE Conference on Computational Complexity, pages 120–130, February 2001. doi:10.1109/CCC.2001.933879.
- [9] Amit Chakrabarti, Graham Cormode, Ranganath Kondapally, and Andrew McGregor. Information cost tradeoffs for augmented index and streaming language recognition. SIAM Journal on Computing, 42(1):61–83, 2013. doi:10.1137/100816481.
- [10] Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu. Near-optimal two-pass streaming algorithm for sampling random walks over directed graphs. In International Colloquium on Automata, Languages and Programming, 2021. URL: https://api.semanticscholar.org/CorpusID:232014583.
- [11] Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, pages 205–214, New York, NY, USA, 2009. Association for Computing Machinery. doi:10.1145/1536414.1536445.
- [12] Graham Cormode, Jacques Dark, and Christian Konrad. Independent Sets in Vertex-Arrival Streams. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 45:1–45:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ICALP.2019.45.
- [13] Thomas M. Cover and Joy A. Thomas. Elements of information theory (2. ed.). Wiley, 2006.
- [14] Carsten Damm, Stasys Jukna, and Jirí Sgall. Some bounds on multiparty communication complexity of pointer jumping. In Claude Puech and Rüdiger Reischuk, editors, STACS 96, 13th Annual Symposium on Theoretical Aspects of Computer Science, Grenoble, France, February 22-24, 1996, Proceedings, volume 1046 of Lecture Notes in Computer Science, pages 643–654. Springer, 1996. doi:10.1007/3-540-60922-9_52.
- [15] Jacques Dark, Adithya Diddapur, and Christian Konrad. Interval Selection in Data Streams: Weighted Intervals and the Insertion-Deletion Setting. In Patricia Bouyer and Srikanth Srinivasan, editors, 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023), volume 284 of Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1–24:17, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.FSTTCS.2023.24.
- [16] Jacques Dark and Christian Konrad. Optimal lower bounds for matching and vertex cover in dynamic graph streams. In Shubhangi Saraf, editor, 35th Computational Complexity Conference, CCC 2020, July 28-31, 2020, Saarbrücken, Germany (Virtual Conference), volume 169 of LIPIcs, pages 30:1–30:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.CCC.2020.30.
- [17] Ashkan Norouzi Fard, Moran Feldman, Ola Svensson, and Rico Zenklusen. Submodular maximization subject to matroid intersection on the fly. In 30th Annual European Symposium on Algorithms, ESA 2022, September 5-9, 2022, Berlin/Potsdam, Germany, pages 52:1–52:14, 2022. doi:10.4230/LIPICS.ESA.2022.52.
- [18] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. Graph distances in the data-stream model. SIAM J. Comput., 38(5):1709–1727, 2008. doi:10.1137/070683155.
- [19] Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 1363–1374, New York, NY, USA, 2020. Association for Computing Machinery. doi:10.1145/3357713.3384286.
- [20] Sudipto Guha and Andrew McGregor. Stream order and order statistics: Quantile estimation in random-order streams. SIAM J. Comput., 38(5):2044–2059, 2009. doi:10.1137/07069328X.
- [21] Venkatesan Guruswami and Krzysztof Onak. Superlinear lower bounds for multipass graph processing. In Proceedings of the 28th Conference on Computational Complexity, CCC 2013, Palo Alto, California, USA, 5-7 June, 2013, pages 287–298, 2013. doi:10.1109/CCC.2013.37.
- [22] Magnús M. Halldórsson, Xiaoming Sun, Mario Szegedy, and Chengu Wang. Streaming and communication complexity of clique approximation. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, volume 7391 of Lecture Notes in Computer Science, pages 449–460. Springer, 2012. doi:10.1007/978-3-642-31594-7_38.
- [23] P. Indyk and D. Woodruff. Tight lower bounds for the distinct elements problem. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 283–288, 2003. doi:10.1109/SFCS.2003.1238202.
- [24] Rahul Jain, Jaikumar Radhakrishnan, and Pranab Sen. A property of quantum relative entropy with an application to privacy in quantum communication. J. ACM, 56(6), September 2009. doi:10.1145/1568318.1568323.
- [25] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’10, pages 1161–1178, USA, 2010. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611973075.93.
- [26] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for approximating MAX-CUT. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1263–1282. SIAM, 2015. doi:10.1137/1.9781611973730.84.
- [27] Michael Kapralov, Amulya Musipatla, Jakab Tardos, David P. Woodruff, and Samson Zhou. Noisy Boolean Hidden Matching with Applications. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022), volume 215 of Leibniz International Proceedings in Informatics (LIPIcs), pages 91:1–91:19, Dagstuhl, Germany, 2022. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2022.91.
- [28] Mikhail Kapralov. Better bounds for matchings in the streaming model. In ACM-SIAM Symposium on Discrete Algorithms, 2012. URL: https://api.semanticscholar.org/CorpusID:448251.
- [29] Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3311–3320. PMLR, 09–15 June 2019. URL: https://proceedings.mlr.press/v97/kazemi19a.html.
- [30] Ilan Kremer, Noam Nisan, and Dana Ron. On randomized one-round communication complexity. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’95, pages 596–605, New York, NY, USA, 1995. Association for Computing Machinery. doi:10.1145/225058.225277.
- [31] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, 1997.
- [32] F. J. (Florence Jessie) MacWilliams and N. J. A. (Neil James Alexander) Sloane. The theory of error-correcting codes. North-Holland mathematical library; v. 16. North-Holland Pub. Co., Amsterdam, 1977.
- [33] Mi-Ying Huang, Xinyu Mao, Guangxu Yang, and Jiapeng Zhang. Breaking square-root loss barriers via min-entropy. Electron. Colloquium Comput. Complex., pages TR24–067, 2024. URL: https://eccc.weizmann.ac.il/report/2024/067/.
- [34] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asymmetric communication complexity. Journal of Computer and System Sciences, 57(1):37–49, 1998. doi:10.1006/jcss.1998.1577.
- [35] Anup Rao and Amir Yehudayoff. Communication Complexity and Applications. Cambridge University Press, 2020. doi:10.1017/9781108671644.
- [36] Mert Saglam. Tight bounds for data stream algorithms and communication problems. Master’s thesis, Simon Fraser University, 2011.
