Better Streaming Algorithms for the Maximum Coverage Problem

We study the classic NP-hard problem of finding a maximum $k$-set coverage in the data stream model: given a set system of $m$ sets that are subsets of a universe $\{1,\ldots,n\}$, find the $k$ sets whose union covers the largest number of distinct elements. The problem can be approximated up to a factor $1-1/e$ in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to $1-1/e$, that use sublinear space $o(mn)$. Our main results are two $(1-1/e-\epsilon)$ approximation algorithms: one uses $O(\epsilon^{-1})$ passes and $\tilde{O}(\epsilon^{-2} k)$ space, whereas the other uses only a single pass but $\tilde{O}(\epsilon^{-2} m)$ space. We show that achieving any approximation factor better than $(1-(1-1/k)^k)$ in constant passes requires $\Omega(m)$ space for constant $k$, even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, $(1-\epsilon)$ approximation algorithm using $\tilde{O}(\epsilon^{-2} m \cdot \min(k,\epsilon^{-1}))$ space. We also study the maximum $k$-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on $N$ vertices, and the goal is to find $k$ vertices that cover the largest number of distinct edges. We show that any constant approximation in constant passes requires $\Omega(N)$ space for constant $k$, whereas $\tilde{O}(\epsilon^{-2}N)$ space is sufficient for a $(1-\epsilon)$ approximation and arbitrary $k$ in a single pass. For regular graphs, we show that $\tilde{O}(\epsilon^{-3}k)$ space is sufficient for a $(1-\epsilon)$ approximation in a single pass. We generalize this to a $(\kappa-\epsilon)$ approximation when the ratio between the minimum and maximum degree is bounded below by $\kappa$.


Introduction
The maximum set coverage problem is a classic NP-hard problem with a wide range of applications including facility and sensor allocation [30], information retrieval [6], influence maximization in marketing strategy design [26], and the blog monitoring problem where we want to choose a small number of blogs that cover a wide range of topics [37]. In this problem, we are given a set system of m sets that are subsets of a universe [n] := {1, . . . , n}. The goal is to find the k sets whose union covers the largest number of distinct elements. For example, in the application considered by Saha and Getoor [37], the universe corresponds to n topics of interest to a reader, each subset corresponds to a blog that covers some of these topics, and the goal is to maximize the number of topics that the reader learns about if she can only choose k blogs. It is well known that the greedy algorithm, which iteratively picks the set covering the largest number of uncovered elements, is a (1 − 1/e) approximation. Furthermore, unless P = NP, this approximation factor is the best possible [20].
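To make the baseline concrete, the classic offline greedy algorithm can be sketched as follows (function and variable names are ours, for illustration):

```python
def greedy_max_coverage(sets, k):
    """Offline greedy for maximum k-set coverage: repeatedly pick the set
    covering the most uncovered elements. Gives a (1 - 1/e) approximation."""
    covered = set()
    chosen = []
    for _ in range(k):
        # Index of the set with the largest marginal coverage.
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not (sets[best] - covered):
            break  # no set adds new elements; stop early
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

Each of the k iterations scans all m sets, so the offline algorithm takes O(km) set operations and, crucially for this paper, requires random access to the whole instance.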
The maximum vertex coverage problem is a special case of this problem in which the universe corresponds to the edges of a given graph and there is a set corresponding to each node of the graph that contains the subset of edges incident to that node. For this problem, algorithms based on linear programming are known to achieve a 3/4 approximation for general graphs [1] and an 8/9 approximation for bipartite graphs [13]. Assuming P ≠ NP, there does not exist a polynomial-time approximation scheme. Recent work has focused on finding purely combinatorial algorithms for this problem [12].
Streaming Algorithms. Unfortunately, for both problems, the aforementioned greedy and linear programming algorithms do not scale well to massive data sets. This has motivated a significant research effort in designing algorithms that could handle large data in modern computation models such as the data stream model and the MapReduce model [10,31]. In the data stream model, the k-set coverage problem and the related set cover problem have received a lot of attention in recent research [7,9,16,19,22,41].
Two variants of the data stream model are relevant to our work. In the streaming-set model [19,21,28,36,37,40], the stream consists of m sets S_1, . . . , S_m and each S_i is encoded as the list of elements in that set along with a unique ID for the set. For simplicity, we assume that ID(S_i) = i. In the dynamic graph stream model [2-5, 8, 11, 17, 21, 24, 25, 29, 33, 34], relevant to the maximum vertex coverage problem, the stream consists of insertions and deletions of edges of the underlying graph. For a recent survey of research in graph streaming, see [32]. Note that any algorithm for the dynamic graph stream model can also be used in the streaming-set model; the streaming-set model is simply a special case in which there are no deletions and edges are grouped by endpoint.

Related Work
Maximum Set Coverage. Saha and Getoor [37] gave a swap-based 1/4 approximation algorithm that uses a single pass and Õ(kn) space. At any point, their algorithm stores k sets explicitly in memory as the current solution. When a new set arrives, based on a specific rule, their algorithm either swaps it with the set with the least contribution in the current solution or does nothing and moves on to the next set in the stream. Subsequently, Ausiello et al. [9] gave a slightly different swap-based algorithm that also finds a 1/4 approximation using one pass and the same space. Yu and Yuan [41] claimed an Õ(n)-space, single-pass algorithm with an approximation factor around 0.3, with the aid of computer simulation.
Recently, Badanidiyuru et al. [10] gave a generic single-pass algorithm for maximizing a monotone submodular function on the stream's objects subject to the cardinality constraint that at most k objects are selected. Their algorithm guarantees a (1/2 − ε) approximation. At a high level, based on a rule that is different from [9,37] and a guess of the optimal value, their algorithm decides if the next object (which is a set in our case) is added to the current solution. The algorithm stops when it reaches the end of the stream or when k objects have been added to the solution. In the k-set coverage problem, the rule requires knowing the coverage of the current solution. As a result, a careful adaptation to the k-set coverage problem uses Õ(ε^{-1} n) space. For constant ε, this result directly improves upon [9,37]. Subsequently, Chekuri et al. [43] extended this work to non-monotone submodular function maximization under constraints beyond cardinality.
The set cover problem, which is closely related to the k-set coverage problem, has been studied in [7,16,19,22,37]. See [7] for a comprehensive summary of results and discussion.
Maximum Vertex Coverage. The streaming k-vertex coverage problem was studied by Ausiello et al. [9]. They first observed that simply outputting the k vertices with the highest degrees is a 1/2 approximation; this can easily be done in the streaming-set model. The main results of their work were Õ(kN)-space algorithms with better approximation factors for special types of graphs. Their results include a 0.55 approximation for regular graphs and a 0.6075 approximation for regular bipartite graphs. Note that their paper only considered the streaming-set model whereas our results for maximum vertex coverage will consider the more challenging dynamic graph stream model.

Our Contributions
Maximum k-set coverage. Our main goal is to achieve the 1 − 1/e approximation that is possible in the non-streaming or offline setting.
• We present polynomial-time data stream algorithms that achieve a (1 − 1/e − ε) approximation for arbitrarily small ε > 0. The first algorithm uses one pass and Õ(ε^{-2} m) space whereas the second algorithm uses O(ε^{-1}) passes and Õ(ε^{-2} k) space. We consider both algorithms to be pass-efficient, but the second algorithm uses much less space at the cost of using more than one pass. We note that storing the solution itself requires Ω(k) space. Thus, we consider Õ(ε^{-2} k) space to be surprisingly space-efficient.
• For constant k, we show that Ω(m) space is required by any constant-pass (randomized) algorithm to achieve an approximation factor better than 1 − (1 − 1/k)^k with probability at least 0.99; this holds even if the algorithm is permitted exponential time. To the best of our knowledge, this is the first non-trivial space lower bound for this problem. However, with exponential time and Õ(ε^{-2} m · min(k, ε^{-1})) space, we observe that a (1 − ε) approximation is possible in a single pass.
For a slightly worse approximation, a (1/2 − ε) approximation in one pass can be achieved using Õ(ε^{-3} k) space. This follows by building on the result of Badanidiyuru et al. [10]; however, we provide a simpler algorithm and analysis. Finally, we design a (1/3 − ε) approximation algorithm for the budgeted maximum set coverage problem using one pass and Õ(n) space. In this version, each set S has a cost w_S in the interval [0, L]. The goal is to find a collection of sets whose total cost does not exceed L that covers the largest number of distinct elements. Khuller et al. [27] presented a polynomial-time (1 − 1/e) approximation algorithm based on the greedy algorithm and an enumeration technique. Our results are summarized in Figure 1.
Shortly after our original submission, in an independent work, Bateni et al. [44] also presented a polynomial-time, single-pass, Õ(ε^{-3} m)-space algorithm that finds a (1 − 1/e − ε) approximation for the maximum k-set coverage problem. Furthermore, given unlimited post-processing time, their results also imply a single-pass (1 − ε) approximation using Õ(ε^{-3} m) space. This extension to a (1 − ε) approximation is also possible with our approach; see the end of Section 2.1 for details. We also note that our approach works in their edge arrival model, in which the stream reveals the set-element relationships one at a time.
Maximum k-vertex coverage. Compared to the most relevant previous work [9], we study this problem in a more general model, namely the dynamic graph stream model. We achieve better approximation factors and space complexity for general graphs, even when compared to their results for special types of graphs. Our results are summarized in Figure 2. In particular, we show that:
• Õ(ε^{-2} N) space is sufficient for a (1 − ε) approximation (or a (3/4 − ε) approximation if restricted to polynomial time) and arbitrary k in a single pass. The algorithms in [9] use Õ(kN) space and achieve an approximation worse than 0.61 even for special graphs.
• Any constant approximation in constant passes requires Ω(N ) space for constant k.
• For regular graphs, we show that Õ(ε^{-3} k) space is sufficient for a (1 − ε) approximation in a single pass.
We generalize this to a (κ − ε) approximation when the ratio between the minimum and maximum degree is bounded below by κ. We also extend this result to hypergraphs.
Figure 2: Summary of results for MaxVertexCoverage. κ is the ratio of the lowest degree to the highest degree.

Our techniques. On the algorithmic side, our basic approach is a "guess, subsample, and verify" framework. At a high level, suppose we design a streaming algorithm for approximate k-coverage that assumes a priori knowledge of a good guess of the optimal coverage. We show that it is (a) possible to run the same algorithm on a subsampled universe defined by a carefully chosen hash function and (b) possible to remove the assumption that a good guess was already known. If the guess is at least nearly correct, running the algorithm on the subsampled universe results in a small space complexity. However, there are two main challenges. First, an algorithm instance with a wrong guess could use too much space; we simply terminate those instances. The second issue is more subtle: because the hash function is not fully independent, we appeal to a version of the Chernoff bound for limited independence, and this bound need not guarantee a good approximation unless the guess is near-correct. To this end, we use an F_0 estimation algorithm to verify the coverage of the candidate solutions, and we return the solution with the maximum estimated coverage. This framework allows us to restrict the analysis solely to the near-correct guess. The analysis is, therefore, significantly simpler.
Some of our other algorithmic ideas are inspired by previous work. The "thresholding greedy" technique was inspired by [16,18,42]; however, the analysis is different for our problem, and to optimize the number of passes, we rely on new observations. Another algorithmic idea, used in designing the one-pass space-efficient algorithm, is to treat the sets differently based on their contributions. During the stream, we immediately add the sets with large contributions to the solution. We store the contribution of each remaining set explicitly and solve the remaining problem offline. Har-Peled et al. [22] devised a somewhat similar strategy, but the details are different.
For the k-vertex coverage problem, we show that simply running a streaming cut-sparsifier algorithm is sufficient and optimal up to a polylog factor. The novelty is to treat it as an interesting corner case of a more space-efficient algorithm for near-regular graphs, i.e., graphs in which κ is bounded below.
One of the novelties is proving the lower bound via a randomized reduction from the k-party set disjointness problem.

Algorithms for maximum k-set coverage
In this section, we design various algorithms for approximating MaxSetCoverage in the data stream model. Our main algorithmic results in this section are two (1 − 1/e − ε) approximation algorithms. The first algorithm uses one pass and Õ(ε^{-2} m) space whereas the second algorithm uses O(ε^{-1}) passes and Õ(ε^{-2} k) space. We also briefly explore some other trade-offs in a subsequent subsection.
Notation. If A is a collection of sets, then C(A) denotes the union of these sets.

(1 − 1/e − ε) approximation in one pass and Õ(ε^{-2} m) space
Approach. The algorithm adds a set to the current solution if the number of new elements it covers exceeds some threshold. The basic algorithm relies on an estimate z of the optimum coverage OPT. The threshold for including a new set in the solution is that it covers at least z/k new elements. Unfortunately, this threshold is too high to guarantee that the selected sets achieve the required (1 − 1/e − ε) approximation, and we may want to revisit adding a set, say S, that was not added when it first arrived. To facilitate this, we explicitly store the subset of S that was uncovered when S arrived in a collection of sets W. Because S was not added immediately, we know that this subset is not too large. At the end of the pass, we continue augmenting our current solution using the collection W.
Technical Details. For the time being, we suppose that the algorithm is provided with an estimate z such that OPT ≤ z ≤ 4 OPT; we will later remove this assumption. The algorithm uses C to keep track of the elements that have been covered so far. Upon seeing a new set S, the algorithm stores S \ C explicitly in W if S covers fewer than z/k new elements. Otherwise, the algorithm adds S to the solution and updates C immediately. At the end of the stream, if there are fewer than k sets in the solution, we use the greedy approach to find the remaining sets from W.
The basic algorithm maintains I ⊆ [m] and C ⊆ [n], where I corresponds to the IDs of the (at most k) sets in the current solution and C is the union of the corresponding sets. We also maintain the collection of sets W described above. The algorithm proceeds as follows:
1. Initialize C = ∅, I = ∅, and W = ∅.
2. For each set S in the stream: if |I| < k and |S \ C| ≥ z/k, then add ID(S) to I and update C ← C ∪ S; otherwise, store S \ C in W.
3. Post-processing: Greedily add k − |I| sets from W and update I and C appropriately.
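The steps above can be sketched as follows, assuming an estimate z with OPT ≤ z ≤ 4 OPT is already available; the stream is modeled as (ID, set) pairs and all names are ours:

```python
def threshold_one_pass(stream, k, z):
    """One-pass thresholding for max k-set coverage (illustrative sketch).
    Sets covering at least z/k new elements are taken immediately; the
    residuals of the others are kept in W and used greedily afterwards."""
    covered, solution, W = set(), [], {}
    for sid, S in stream:                      # sets arrive one at a time
        new = S - covered
        if len(solution) < k and len(new) >= z / k:
            solution.append(sid)               # large contribution: take it
            covered |= new
        else:
            W[sid] = new                       # small contribution: store residual
    # Post-processing: greedily augment the solution from the residuals.
    while len(solution) < k and W:
        sid = max(W, key=lambda i: len(W[i] - covered))
        gain = W[sid] - covered
        if not gain:
            break
        solution.append(sid)
        covered |= gain
        del W[sid]
    return solution, covered
```

Note that each stored residual has size below z/k by construction, which is what drives the O(k log m + (mz/k) log n) space bound proved next.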
There exists a single-pass, O(k log m + (mz/k) log n)-space algorithm that finds a (1 − 1/e) approximation of MaxSetCoverage.
Proof. We observe that storing the set of covered elements C requires at most OPT log n = O(z log n) bits of space. For each set S such that S \ C is stored explicitly in W, we need O((z/k) log n) bits of space. Storing I requires O(k log m) space. Thus, the algorithm uses space as claimed.
Let a_i be the number of new elements covered by the ith set added to the solution and let b_i be the total number of covered elements after it was added. Furthermore, for i > 0, let c_i = OPT − b_i. Define a_0 := b_0 := 0 and c_0 := OPT. At the end of the stream, suppose |I| = j. Since each set added during the stream covered at least z/k ≥ OPT/k new elements, c_j ≤ OPT − zj/k ≤ OPT(1 − j/k) ≤ OPT(1 − 1/k)^j. Now, we consider the sets that were added in post-processing and proceed with the usual inductive argument to show that c_i ≤ (1 − 1/k)^i OPT for i > j. Before the algorithm added the (i + 1)th set for i ≥ j, there must be a set in W that covers at least c_i/k new elements; therefore c_{i+1} ≤ c_i − c_i/k = (1 − 1/k) c_i. Hence c_k ≤ (1 − 1/k)^k OPT ≤ OPT/e, and the final solution covers at least (1 − 1/e) OPT elements. Following the approach outlined in Section 2.3, we may assume z = O(ε^{-2} k log m) and that OPT ≤ z ≤ 4 OPT.
There exists a single-pass, Õ(ε^{-2} m)-space algorithm that finds a (1 − 1/e − ε) approximation of MaxSetCoverage with high probability.
Better approximation using more space and unlimited post-processing time. We observe that a slight modification of the above algorithm can be used to attain a (1 − 1/(4b)) approximation for any b > 1 if we are permitted unlimited post-processing time and an extra factor of b in the space usage. Specifically, we increase the threshold for when to add a set immediately to the solution from z/k to bz/k and then find the optimal collection of k − |I| sets from W to add in post-processing. It is immediate that this algorithm uses O(k log m + (mbz/k) log n) space.
Suppose a collection S_1 of y sets was added during the stream. These y sets cover |C(S_1)| ≥ y · bz/k ≥ OPT · yb/k elements. On the other hand, the collection of sets S_2 selected in post-processing covers at least ((k − y)/k) · (OPT − |C(S_1)|) new elements. Then,
|C(S_1 ∪ S_2)| ≥ |C(S_1)| + ((k − y)/k)(OPT − |C(S_1)|) ≥ OPT(1 − y/k + b(y/k)²) ≥ (1 − 1/(4b)) OPT,
where the last inequality follows by minimizing over y. Hence, we obtain a (1 − ε) approximation by setting b = 4/ε.

Theorem 3. There exists a single-pass, Õ(ε^{-3} m)-space algorithm that finds a (1 − ε) approximation of MaxSetCoverage with high probability.
(1 − 1/e − ε) approximation in O(ε^{-1}) passes and Õ(ε^{-2} k) space

Our second algorithm is based on the standard greedy approach, but instead of adding the set that increases the coverage of the current solution the most at each step, we add a set if the number of new elements covered by this set exceeds a certain threshold. This threshold decreases with each pass in such a way that after only O(ε^{-1}) passes, we have a good approximate solution, but the resulting algorithm may use too much space. We fix this by first randomly subsampling each set at different rates and running multiple instantiations of the basic algorithm corresponding to different rates of subsampling. The basic "decreasing threshold" approach has been used before in different contexts [16,18,42]. The novelty of our approach is in implementing this approach such that the resulting algorithm uses small space and a small number of passes. For example, a direct implementation of the approach by Badanidiyuru and Vondrák [42] in the streaming model may require O(ε^{-1} log(m/ε)) passes and O(n) space.
Technical Details. We will assume that we are given an estimate z of OPT such that OPT ≤ z ≤ 4 OPT. We start by designing a (1 − 1/e − ε) approximation algorithm that uses Õ(k + z) space and O(ε^{-1}) passes. We will subsequently use a sampling approach to reduce the space to Õ(ε^{-2} k).
As with the previous algorithm, the basic algorithm in this section maintains I ⊆ [m] and C ⊆ [n], where I corresponds to the IDs of the (at most k) sets in the current solution and C is the union of the corresponding sets. Let α > 1 be the parameter governing how quickly the threshold decreases. The algorithm proceeds as follows:
1. Initialize C = ∅ and I = ∅.
2. In pass j = 1, 2, . . .: for each set S in the stream, if |I| < k and |S \ C| ≥ z/(kα^{j−1}), add ID(S) to I and update C ← C ∪ S.
3. Terminate when |I| = k or after O(ε^{-1}) passes.
To analyze the algorithm, we introduce some notation. After the ith set was picked, let a_i be the number of new elements covered by this set and let b_i be the total number of covered elements so far. Furthermore, let c_i = OPT − b_i. We define a_0 := 0 and b_0 := 0.
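The multi-pass procedure can be sketched as follows; each iteration of the outer loop simulates one pass over the stream, and the decay factor alpha > 1 is left as a parameter rather than the exact constant used in the analysis:

```python
def decreasing_threshold(stream_sets, k, z, passes, alpha):
    """Multi-pass decreasing-threshold greedy (illustrative sketch).
    In pass j the acceptance threshold is z / (k * alpha**j); replaying
    the list `stream_sets` stands in for re-reading the stream."""
    covered, solution = set(), []
    for j in range(passes):                    # one iteration = one pass
        threshold = z / (k * alpha ** j)
        for sid, S in stream_sets:
            if len(solution) >= k:
                return solution, covered       # solution is complete
            if sid not in solution and len(S - covered) >= threshold:
                covered |= S
                solution.append(sid)
    return solution, covered
```

Because the threshold only shrinks geometrically, a set rejected in pass j is known to cover fewer than z/(kα^{j−1}) new elements, which is exactly the fact exploited in the proof below.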
Proof. Suppose the algorithm added the (i + 1)th set S during the jth pass, and consider the set of covered elements C just before the algorithm added S.
We first consider the case where j = 1. Then, the algorithm only adds S if |S \ C| ≥ z/k ≥ OPT/k ≥ c_i/k ≥ c_i/(kα). Now, we consider the case where j > 1. Note that just before the algorithm added S, there must exist a set S′ (which could be S itself) that had not already been added where |S′ \ C| ≥ c_i/k. This follows because the optimum collection of k sets covers at least c_i elements that are currently uncovered and hence one of these sets must cover at least c_i/k new elements. But since S′ had not already been added, we know that S′ was not added during the first j − 1 passes and thus |S′ \ C| < z/(kα^{j−2}). Therefore, c_i/k < z/(kα^{j−2}), and in particular, z/(kα^{j−1}) > c_i/(kα). Since the algorithm picked S, we have a_{i+1} = |S \ C| ≥ z/(kα^{j−1}) ≥ c_i/(kα) as required.
Proof of Lemma 4. It is immediate that the number of passes is O(ε^{-1}). The algorithm needs to store the sets I and C. Since |C| ≤ z, the total space is O(k log m + z log n).
To argue about the approximation factor, we first prove by induction that we always have c_i ≤ (1 − 1/(kα))^i OPT; this follows since c_{i+1} = c_i − a_{i+1} ≤ (1 − 1/(kα)) c_i by the previous lemma. Suppose the final solution contains k sets. Then the coverage is b_k = OPT − c_k ≥ (1 − (1 − 1/(kα))^k) OPT ≥ (1 − e^{−1/α}) OPT ≥ (1 − 1/e − ε) OPT for a suitable choice of α = 1 + O(ε).

Removing Assumptions via Guessing, Sampling, and Sketching
In this section, we address the fact that in the previous two sections we assumed a priori knowledge of a constant-factor approximation of the maximum number of elements that could be covered, and that this optimum was of size O(ε^{-2} k log m).
Addressing the two issues is interrelated and based on a subsampling approach. The basic idea is to run the above algorithms on a new instance formed by removing occurrences of certain elements in [n] from all the input sets. The goal is to reduce the maximum coverage to min(n, O(ε^{-2} k log m)) while ensuring that a good approximation in the subsampled instance corresponds to a good approximation in the original instance. In the rest of this section we will assume that k = o(ε²n/log m) since otherwise this bound is trivial. The key lemma shows that subsampling the universe with a suitable hash function h approximately preserves the coverage of every collection of k sets. In the proof, one sets γ = εv/D; since the hash function is γμ = εvp-wise independent, a Chernoff bound for limited independence applies, and note that γ = εv/D ≥ ε/2 because D ≤ OPT ≤ 2v. The lemma follows by taking the union bound over all m^k collections of k sets. In particular, the following corollary establishes that a 1/t approximation when restricted to elements in {e ∈ [n] : h(e) = 1} yields a (1/t − 2ε) approximation, and at most pOPT(1 + ε) = O(ε^{-2} k log m) of these elements can be covered by k sets.
Proof. The fact that OPT′ ≥ pOPT(1 − ε) follows by applying Lemma 7 to the optimum solution, where OPT′ denotes the optimum coverage in the subsampled instance. According to Lemma 7, the same guarantee holds simultaneously for all collections of k sets U_1, . . . , U_k. Hence, since we know v such that OPT/2 ≤ v ≤ OPT, we know that (1 − ε)λ ≤ OPT′ ≤ 2(1 + ε)λ with high probability according to Corollary 8, where λ = pv. Then, by setting z = 2(1 + ε)λ, we ensure that OPT′ ≤ z ≤ 4 OPT′.
Guessing v and F_0 Sketching. We still need to address how to compute v such that OPT/2 ≤ v ≤ OPT. The natural approach is to make log₂ n guesses for v corresponding to 1, 2, 4, 8, . . . since one of these will be correct. We then perform multiple parallel instantiations of the algorithm corresponding to each guess. This increases the space by a factor of O(log n). But how do we determine which instantiation corresponds to the correct guess? The most expedient way to deal with this question is to sidestep the issue as follows. Instantiations corresponding to guesses that are too small may find it is possible to cover ω(ε^{-2} k log m) elements, so we terminate any instantiation as soon as it covers more than O(ε^{-2} k log m) elements. Note that by Corollary 8 and Equation 1, we will not terminate the instantiation corresponding to the correct guess.
Among the instantiations that are not terminated, we simply return the best solution. To find the best solution we want to estimate |∪_{i∈I} S_i|, i.e., the coverage of the corresponding sets before the subsampling. To compute this estimate in small space we can use the F_0-sketching technique. For the purposes of our application, we can summarize the required result as follows: Theorem 9 (Cormode et al. [45]). There exists an Õ(ε^{-2} log δ^{-1})-space algorithm that, given a set S ⊆ [n], can construct a data structure M(S), called an F_0 sketch of S, with the property that the number of distinct elements in a collection of sets S_1, S_2, . . . , S_r can be approximated up to a (1 + ε) factor with probability at least 1 − δ given the collection of sketches M(S_1), . . . , M(S_r). For the algorithms in the previous section we can maintain a sketch M(C) of the set of covered elements in Õ(ε^{-2} log δ^{-1}) space and from this estimate the desired coverage. We set δ = Θ(1/(n log n)) so that the coverages of all non-terminated instances are estimated up to a factor (1 + ε) with high probability.
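For intuition, here is a toy mergeable F_0 (distinct elements) sketch in the spirit of Theorem 9, implemented as a k-minimum-values sketch; the construction of Cormode et al. and its space/accuracy guarantees differ, so this is purely illustrative:

```python
import hashlib

def _h(x):
    # Deterministically hash an element to a float in (0, 1].
    d = hashlib.sha256(str(x).encode()).digest()
    return (int.from_bytes(d[:8], "big") + 1) / 2 ** 64

class F0Sketch:
    """Toy k-minimum-values sketch: keep the t smallest hash values.
    Sketches of two sets can be merged to estimate the size of the union."""
    def __init__(self, t=64):
        self.t = t
        self.mins = []                 # up to t smallest hash values seen

    def add(self, x):
        v = _h(x)
        if v not in self.mins:
            self.mins = sorted(self.mins + [v])[: self.t]

    def merge(self, other):
        s = F0Sketch(self.t)
        s.mins = sorted(set(self.mins) | set(other.mins))[: self.t]
        return s

    def estimate(self):
        if len(self.mins) < self.t:
            return len(self.mins)      # saw fewer than t distinct items
        return int(self.t / self.mins[-1]) - 1
```

The crucial property used above is mergeability: the sketch of S_1 ∪ S_2 is computable from the sketches of S_1 and S_2 alone, so coverage can be estimated without storing the sets themselves.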

Other Algorithmic Results
In this final subsection, we briefly review some other algorithmic results for MaxSetCoverage, either with different trade-offs or for a "budgeted" version of the problem.

(1 − ε) approximation in one pass and Õ(ε^{-2} mk) space
In the previous subsection, we gave a single-pass (1 − 1/e − ε) approximation using Õ(ε^{-2} m) space. Here we observe that if we are permitted Õ(ε^{-2} mk) space and unlimited post-processing time, then a (1 − ε) approximation can be achieved directly from the F_0 sketches.
Specifically, in one pass we construct the F_0 sketches of all m sets, M(S_1), . . . , M(S_m), where the failure probability of the sketches is set to δ = 1/(nm^k). Thus, at the end of the stream, one can (1 + ε)-approximate the coverage |S_{i_1} ∪ . . . ∪ S_{i_k}| for each collection of k sets S_{i_1}, . . . , S_{i_k} with probability at least 1 − 1/(nm^k). Since there are at most (m choose k) ≤ m^k collections of k sets, appealing to the union bound, we can guarantee that the coverages of all of the collections of k sets are preserved up to a (1 + ε) factor with probability at least 1 − 1/n. The space to store the sketches is Õ(ε^{-2} mk).

(1/2 − ε) approximation in one pass and Õ(ε^{-3} k) space
We next observe that it is possible to achieve a (1/2 − ε) approximation using a single pass and Õ(ε^{-3} k) space. Consider the following simple single-pass algorithm that uses an estimate z of OPT such that OPT ≤ z ≤ (1 + ε) OPT. As with previous algorithms, the basic algorithm in this section maintains I ⊆ [m] and C ⊆ [n], where I corresponds to the IDs of the (at most k) sets in the current solution and C is the union of the corresponding sets. The algorithm proceeds as follows:
1. Initialize C = ∅ and I = ∅.
2. For each set S in the stream: if |I| < k and |S \ C| ≥ z/(2k), add ID(S) to I and update C ← C ∪ S.
(As an aside, the number of guesses for v can be reduced to log₂ k if the size of the largest set is known, since this gives a k-approximation of OPT. The size of the largest set can be computed in one additional pass if necessary.)
The described algorithm is a (1/2 − ε) approximation. To see this, if the solution consists of k sets, then the final solution obviously covers at least k · z/(2k) = z/2 ≥ OPT/2 elements. Now we consider the case in which the collection 𝒮 of sets chosen by the algorithm contains fewer than k sets. For a set S, we define S̄ := S \ C(𝒮) to be the set of elements of S that are not covered by the final solution. For each set S in the optimum solution O, if S is unpicked, then |S̄| ≤ z/(2k). Therefore, OPT ≤ |C(𝒮)| + Σ_{S∈O} |S̄| ≤ |C(𝒮)| + k · z/(2k) ≤ |C(𝒮)| + (1 + ε) OPT/2, and thus |C(𝒮)| ≥ ((1 − ε)/2) OPT. We note that the above algorithm uses O(k log m + z log n) space, but we can use an argument similar to that used in Section 2.3 to reduce this to Õ(ε^{-3} k). The only difference is that since we need z such that OPT ≤ z ≤ (1 + ε) OPT, we will guess v in powers of 1 + ε/4 and set λ = 16cε^{-2} k log m. Then Equation 1 becomes (1 − ε/4)λ ≤ OPT′ ≤ (1 + ε/4)²λ and hence z = (1 + ε/4)²λ is a sufficiently good estimate.
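The simple thresholding rule of this subsection can be sketched as follows, assuming an estimate z with OPT ≤ z ≤ (1 + ε) OPT is given; all names are illustrative:

```python
def half_approx_one_pass(stream, k, z):
    """One-pass (1/2 - eps)-style thresholding (illustrative sketch):
    take any set covering at least z/(2k) new elements, up to k sets."""
    covered, solution = set(), []
    for sid, S in stream:
        if len(solution) < k and len(S - covered) >= z / (2 * k):
            covered |= S
            solution.append(sid)
    return solution, covered
```

Unlike the algorithm of Section 2.1, nothing is stored for rejected sets, which is why the space drops to Õ(ε^{-3} k) after subsampling but the approximation factor drops to 1/2 − ε.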

Budgeted Maximum Coverage
In this variation, each set S has a cost w_S ∈ [0, L]. The problem asks us to find a collection of sets whose total cost is at most L that covers the largest number of distinct elements. For I ⊆ [m], we use w(I) to denote Σ_{i∈I} w_{S_i}. We present the algorithm assuming knowledge of an estimate z such that OPT ≤ z ≤ (1 + ε) OPT; this assumption can be removed by running the algorithm for guesses 1, (1 + ε), (1 + ε)², . . . for z and returning the best solution found. The basic algorithm maintains I ⊆ [m] and C ⊆ [n], where I corresponds to the IDs of the sets in the current solution and C is the union of the corresponding sets. The algorithm proceeds as follows:
1. Initialize C = ∅ and I = ∅.
2. For each set S in the stream:
(a) If |S \ C| ≥ (2z/3) · (w_S/L) then:
i. If w(I) + w_S > L: Terminate and return the better of the current solution and the single set S.
ii. Otherwise: add ID(S) to I and update C ← C ∪ S.
Proof. Suppose the clause is satisfied when the set S is being considered. Since every set added to I contributed at least (2z/3) · (w_{S_i}/L) new elements when it was added, we have |C(I)| + |S \ C| ≥ (2z/3) · (w(I) + w_S)/L > 2z/3 ≥ (2/3) OPT, where we used the fact that w_S + w(I) > L. The claim then follows immediately, since the better of the two candidate solutions covers at least OPT/3 elements.
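The budgeted one-pass algorithm can be sketched as follows, with the overflow rule of returning the better of the running solution and the overflowing set, as in the proof above; the stream is modeled as (ID, set, cost) triples and all names are ours:

```python
def budgeted_one_pass(stream, L, z):
    """One-pass budgeted max coverage (illustrative sketch): take a set
    when its marginal coverage is at least (2z/3) * w_S / L; on budget
    overflow, return the better of the solution so far and that set."""
    covered, solution, cost = set(), [], 0.0
    for sid, S, w in stream:
        new = S - covered
        if len(new) >= (2 * z / 3) * (w / L):
            if cost + w > L:
                # Budget would overflow: by the threshold rule, one of
                # the two candidates covers at least z/3 elements.
                if len(covered) >= len(S):
                    return solution, covered
                return [sid], set(S)
            covered |= new
            solution.append(sid)
            cost += w
    return solution, covered
```

The per-unit-cost threshold makes the coverage accumulated by the running solution proportional to the budget it has spent, which is exactly the invariant used in the overflow argument.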

Algorithms for Maximum k-Vertex Coverage
In this section, we present algorithms for the maximum k-vertex coverage problem. We present our results in terms of hypergraphs for full generality. The generalization to hypergraphs can also be thought of as a natural "hitting set" variant of maximum coverage, i.e., the stream consists of a sequence of sets and we want to pick k elements so as to maximize the number of sets that include a picked element.
Notation. Given a hypergraph G and a subset of nodes S, we define C_G(S) to be the number of edges that contain at least one node in S. Recall that the maximum k-vertex coverage problem is to approximate the maximum value of C_G(S) over all sets S containing k nodes. We use E_G and V_G to denote the set of edges and nodes of the hypergraph G respectively. The size of a cut (S, V \ S) in a hypergraph G, denoted by δ_G(S), is defined as the number of hyperedges that contain at least one node in both S and V \ S. In the case that G is weighted, δ_G(S) denotes the total weight of the cut. A core idea of our approach is to use hypergraph sparsification: Definition 15 (ε-sparsifier). Given a hypergraph G = (V, E), we say that a weighted subgraph H = (V, E′) is an ε-sparsifier for G if for all S ⊆ V, (1 − ε) δ_G(S) ≤ δ_H(S) ≤ (1 + ε) δ_G(S).
Any graph on N nodes has an ε-sparsifier with only Õ(ε^{-2} N) edges [39]. Similarly, any hypergraph in which the maximum size of the hyperedges is bounded by d (a rank-d hypergraph) has an ε-sparsifier with only Õ(ε^{-2} dN) edges. Furthermore, an ε-sparsifier can be constructed in the dynamic graph stream model using one pass and Õ(ε^{-2} dN) space [21,24].
First, we show that it is possible to approximate all the coverages by constructing a sparsifier of a slightly modified graph. In particular, we construct the sparsifier H of the graph G′ with an extra node v, i.e., V_{G′} = V_G ∪ {v}, and for every hyperedge e ∈ E_G, we put the hyperedge e ∪ {v} in E_{G′}. It is easy to see that for every S that is a subset of V_G, we have C_G(S) = δ_{G′}(S). Therefore, it is immediate that we can (1 + ε)-approximate all the coverages in G by constructing the sparsifier of G′.
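The apex-node reduction can be checked directly on a small example; the sketch below verifies the identity C_G(S) = δ_{G′}(S) on the exact, unsparsified graph (in the paper, G′ is then compressed with a streaming sparsifier):

```python
def coverage(edges, S):
    # Number of hyperedges containing at least one node of S.
    return sum(1 for e in edges if e & S)

def cut_value(edges, S):
    # Number of hyperedges with a node on each side of (S, V \ S).
    return sum(1 for e in edges if (e & S) and (e - S))

def apex_reduction(edges, apex="v"):
    # Add a fresh apex node to every hyperedge. For any S avoiding the
    # apex, the edge e ∪ {apex} is cut by S iff e intersects S, so the
    # cut value of S in G' equals the coverage of S in G.
    return [e | {apex} for e in edges]
```

Since cut values are (1 ± ε)-preserved by a sparsifier of G′, all k-node coverages of G are preserved as well.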
Theorem 16. There exists a single-pass, Õ(ε^{-2} dN)-space algorithm that finds a (1 − ε) approximation of MaxVertexCoverage on rank-d hypergraphs with high probability.
The above theorem assumes unbounded post-processing time. If k is constant, the post-processing will be polynomial. For larger k, if we still require polynomial running time, then, after constructing the ε-sparsifier H, we can either use the (1 − (1 − 1/d)^d) approximation algorithm via linear programming [1] or the folklore (1 − 1/e) approximation greedy algorithm.

Algorithm for Near-Regular Hypergraphs
In this subsection, we show that it is possible to reduce the space used to Õ(ε^{-3} dk) in the case of hypergraphs that are regular or nearly regular. Define κ ≤ 1 to be the ratio between the smallest degree and the largest degree; for a regular hypergraph, κ = 1. We show that a (κ − ε) approximation is possible using Õ(ε^{-3} dk) space for rank d hypergraphs. This also implies a (1 − ε) approximation for regular hypergraphs.

Proof. Let t_1 and t_2 denote the minimum and the maximum degree of G respectively, so that κ = t_1/t_2 and OPT ≤ kt_2. Suppose we uniformly sample a set S of k nodes. Let L_S(y) = max(0, |y ∩ S| − 1). Then the coverage of S satisfies
$$C_G(S) = \sum_{y \in E_G} \min(1, |y \cap S|) = \sum_{v \in S} \deg(v) - \sum_{y \in E_G} L_S(y) \ \ge\ kt_1 - \sum_{y \in E_G} L_S(y),$$
where the last inequality follows since every node in S covers at least t_1 hyperedges.
Let ξ_y(j) denote the event that exactly j nodes of the hyperedge y are in S and let |y| denote the number of nodes in y. We have
$$\mathbb{E}[L_S(y)] = \sum_{j=0}^{|y|} \max(0, j-1)\Pr[\xi_y(j)] = \sum_{j=0}^{|y|} j \Pr[\xi_y(j)] - 1 + \Pr[\xi_y(0)].$$
The sum $\sum_{j=0}^{|y|} j \Pr[\xi_y(j)]$ is the expected value of a hypergeometric distribution and therefore evaluates to |y|k/N. Furthermore,
$$\Pr[\xi_y(0)] = \binom{N-|y|}{k}\bigg/\binom{N}{k} \le \left(1 - \frac{|y|}{N}\right)^k \le 1 - \frac{k|y|}{N} + \frac{k^2|y|^2}{2N^2}.$$
The last inequality follows from taking the first three terms of the Taylor expansion. Hence,
$$\mathbb{E}[L_S(y)] \le \frac{k^2|y|^2}{2N^2} \le \frac{k^2 d |y|}{2N^2},$$
and summing over all hyperedges, using $\sum_{y \in E_G} |y| = \sum_{v} \deg(v) \le N t_2$,
$$\mathbb{E}\Big[\sum_{y \in E_G} L_S(y)\Big] \le \frac{k^2 d\, t_2}{2N}.$$
Hence, if N ≥ 4kd/ε, then
$$\mathbb{E}\Big[\sum_{y \in E_G} L_S(y)\Big] \le \frac{\epsilon k t_2}{8}.$$
By an application of Markov's inequality,
$$\Pr\Big[\sum_{y \in E_G} L_S(y) \ge \epsilon k t_2\Big] \le \frac{1}{8}.$$
Thus, if we sample O(log N) sets of k nodes in parallel, with high probability there is a sampled set S of k nodes satisfying $\sum_{y \in E_G} L_S(y) \le \epsilon k t_2$, which implies that
$$C_G(S) \ge kt_1 - \epsilon k t_2 = (\kappa - \epsilon)kt_2 \ge (\kappa - \epsilon)\,\mathrm{OPT}.$$
If N ≤ 4kd/ε, we simply construct the sparsifier of G' as described above to achieve a (1 − ε) approximation.
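The sampling scheme itself is simple; the following sketch (our construction, not the paper's code, with an arbitrary trial count standing in for the O(log N) repetitions) tries uniformly random k-subsets of nodes and keeps the one with the largest coverage, here on a 2-regular cycle hypergraph where κ = 1.

```python
# Sketch of the sampling scheme for near-regular hypergraphs: evaluate
# O(log N) uniformly random k-subsets and return the best one.

import math
import random

def coverage(edges, S):
    return sum(1 for e in edges if e & S)

def sample_best_k_set(nodes, edges, k, seed=0):
    rng = random.Random(seed)
    trials = 8 * max(1, math.ceil(math.log(len(nodes))))  # stand-in for O(log N)
    best, best_cov = None, -1
    for _ in range(trials):
        S = set(rng.sample(nodes, k))
        c = coverage(edges, S)
        if c > best_cov:
            best, best_cov = S, c
    return best, best_cov

# A 2-regular "cycle" hypergraph on 12 nodes: every node has degree 2, kappa = 1.
N = 12
nodes = list(range(N))
edges = [frozenset({i, (i + 1) % N}) for i in range(N)]
S, cov = sample_best_k_set(nodes, edges, k=3, seed=1)
print(S, cov)   # any 3 nodes cover between 4 and 6 of the 12 edges
```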

Lower Bounds
In this section, we prove space lower bounds for data stream algorithms that approximate MaxSetCoverage or MaxVertexCoverage. In particular, these imply that improving over a (1 − 1/e) approximation of MaxSetCoverage with constant passes and constant k requires Ω(m) space. Recall that, still assuming k is constant, we designed a constant-pass algorithm that returns a (1 − 1/e − ε) approximation using Õ(ε^{-2} k) space. For constant k, we also show that improving over a κ approximation (where κ is the ratio between the lowest degree and the highest degree) for MaxVertexCoverage requires Ω(Nκ³) space. Our algorithm returns a (κ − ε) approximation using Õ(ε^{-3} k) space.
Approach. We prove both bounds by a reduction from r-player set-disjointness in communication complexity. In this problem, there are r players where the i-th player has a set S_i ⊆ [u]. It is promised that exactly one of the following two cases happens.
• Case 1 (NO instance): All the sets are pairwise disjoint.
• Case 2 (YES instance): There is a unique element e ∈ [u] such that e ∈ S i for all i ∈ [r].
The goal of the communication problem is for the r-th player to answer whether the input is a YES instance or a NO instance correctly with probability at least 0.9. We shall denote this problem by DISJ_r(u).
The communication complexity of the above problem in the p-round, one-way model (where each round consists of player 1 sending a message to player 2, then player 2 sending a message to player 3, and so on) is Ω(u/r) [15], even if the players may use public randomness. Since a p-round protocol consists of at most pr messages, this implies that in any randomized communication protocol, the maximum message sent by a player contains Ω(u/(pr²)) bits. Without loss of generality, we may assume that |S_1 ∪ S_2 ∪ … ∪ S_r| ≥ u/4 via a padding argument.
We first show that any constant-pass algorithm that finds a (1 + 2ε)(1 − (1 − 1/k)^k) approximation of MaxSetCoverage with probability at least 0.99 requires Ω(m/k²) space.

Proof. Our proof is a reduction from DISJ_k(m). Consider a sufficiently large n such that k divides n. For each i ∈ [m], let P_i be a random partition of [n] into k sets $V^i_1, \ldots, V^i_k$ of equal size. Each partition is chosen independently and the players agree on these partitions using public randomness before receiving the input.
For each player j, if i ∈ S_j, then she puts $V^i_j$ in the stream. By the assumption that |S_1 ∪ … ∪ S_k| ≥ m/4, the stream consists of at least m/4 sets.
If the input is a NO instance, then for each i ∈ [m], there is at most one set $V^i_j$ in the stream. Hence, the stream consists of independent random sets of size n/k. Therefore, for each e ∈ [n] and any k sets $V^{i_1}_{j_1}, \ldots, V^{i_k}_{j_k}$ in the stream (necessarily with distinct $i_1, \ldots, i_k$),
$$\Pr\left[e \in V^{i_1}_{j_1} \cup \ldots \cup V^{i_k}_{j_k}\right] = 1 - (1 - 1/k)^k.$$
By an application of the Chernoff bound for negatively correlated boolean random variables [35],
$$\Pr\left[\left|V^{i_1}_{j_1} \cup \ldots \cup V^{i_k}_{j_k}\right| \ge (1+\epsilon)\left(1 - (1-1/k)^k\right)n\right] \le \exp\left(-\frac{\epsilon^2 (1-(1-1/k)^k) n}{3}\right) \le m^{-11k}.$$
The last inequality holds when n is a sufficiently large multiple of ε^{-2} k log m. Therefore, the maximum coverage in this case is at most (1 + ε)(1 − (1 − 1/k)^k)n with probability at least 1 − 1/m^{10} by taking the union bound over all $\binom{m}{k} \le m^k$ possible collections of k sets. If the input is a YES instance, then clearly the maximum coverage is n: there exists i ∈ [m] such that i ∈ S_1 ∩ … ∩ S_k, and therefore $V^i_1, \ldots, V^i_k$ are all in the stream and their union is [n]. Therefore, any constant-pass, O(s)-space algorithm that finds a (1 + 2ε)(1 − (1 − 1/k)^k) approximation of the maximum coverage with probability at least 0.99 implies a protocol to solve the k-party set-disjointness problem where each message contains O(s) bits. Thus, s = Ω(m/k²) as required.
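The NO-instance gap can be observed empirically. The toy simulation below (ours) picks one block from each of k independent random equipartitions of [n], mimicking k stream sets from distinct partitions, and checks that their union covers roughly a (1 − (1 − 1/k)^k) fraction of [n], bounded away from n.

```python
# Toy simulation of the NO instance in the reduction: the union of k blocks
# drawn from k independent random equipartitions of [n] concentrates around
# (1 - (1 - 1/k)^k) * n.

import random

def random_partition(n, k, rng):
    """Partition [0, n) into k blocks of size n/k, uniformly at random."""
    perm = list(range(n))
    rng.shuffle(perm)
    size = n // k
    return [set(perm[j * size:(j + 1) * size]) for j in range(k)]

rng = random.Random(42)
n, k = 12000, 3
# One block from each of k independent partitions (distinct i's => independence).
blocks = [rng.choice(random_partition(n, k, rng)) for _ in range(k)]
cov = len(set().union(*blocks))
expected = (1 - (1 - 1 / k) ** k) * n   # = (19/27) * n for k = 3
print(cov, round(expected))
```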
Consider the sets S_1, …, S_r ⊆ [u] that satisfy the unique intersection promise as in DISJ_r(u). Let X be the r × u matrix in which row X_i is the characteristic vector of S_i. Suppose there are r' = Ω(r²) players. Chakrabarti et al. [14] showed that if each entry of X is given to a unique player and the order in which the entries are given to the players is random, then the players need to use Ω(u/r) bits of communication to tell whether the input is a YES instance or a NO instance with probability at least 0.9. Thus, in any randomized protocol, the maximum message sent by a player contains Ω(u/r³) bits. Hence, using the same reduction and assuming constant k, we show that an analogous lower bound holds even for random-order streams.
Theorem 19. Assuming n = Ω(ε^{-2} k log m), any constant-pass algorithm that finds a (1 + ε)(1 − (1 − 1/k)^k) approximation of MaxSetCoverage with probability at least 0.99 requires Ω(m/k³) space, even when all the sets have the same size and arrive in random order.
Next, we prove a lower bound for the maximum k-vertex coverage problem on graphs where the ratio between the minimum degree and the maximum degree is at least κ. We show that for constant k, beating a κ approximation for constant κ requires Ω(N) space.
Since κ can be smaller than any constant, this also establishes that Ω(N ) space is required for any constant approximation of MaxVertexCoverage.
Proof. Initially, assume k = 1. We consider the multi-party set-disjointness problem DISJ_t(N') where t = 1/κ and N' = N/t. Here, there are t players and the input sets are subsets of [N']. We consider a bipartite graph where the set of possible nodes is L ∪ R where $L = \{u_i\}_{i \in [N']}$ and $R = \{v_{i,j}\}_{i \in [N'], j \in [t]}$. Note that this graph has (t + 1)N' = Θ(N) nodes. However, we only consider a node to exist if the stream contains an edge incident to that node.
The j-th player defines a set of edges on this graph based on her set S_j as follows: if i ∈ S_j, she puts the edge between u_i and v_{i,j} in the stream. If S_1, …, S_t is a YES instance, then there must be a node u_i that has degree t. If it is a NO instance, then every node in the graph has degree at most 1. Hence, the ratio of minimum degree to maximum degree is at least 1/t = κ as required.
Thus, for k = 1, an approximation better than 1/t with probability at least 0.99 on a graph of Θ(N) nodes implies a protocol to solve DISJ_t(N'). Since N' = Nκ, the algorithm therefore requires Ω(N'/t²) = Ω(Nκ³) space. For general k, we make k copies of the above construction to deduce the lower bound Ω(Nκ³/k).
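The gadget in this reduction is easy to exercise directly. The small illustration below (our code; the tuple encodings of u_i and v_{i,j} are ours) builds the players' edges from their sets: a YES instance creates one node u_i of degree t, while in a NO instance every node has degree at most 1, so for k = 1 any approximation better than 1/t separates the two cases.

```python
# Illustration of the vertex-coverage lower-bound gadget: player j emits an
# edge (u_i, v_{i,j}) for each i in S_j. Max degree = best k=1 coverage.

from collections import Counter

def build_edges(sets):
    """sets[j] = S_j; player j emits an edge (u_i, v_{i,j}) for each i in S_j."""
    return [(("u", i), ("v", i, j)) for j, S in enumerate(sets) for i in S]

def max_degree(edges):
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return max(deg.values())

t = 4
yes_instance = [{1, 2 + j} for j in range(t)]   # element 1 lies in every set
no_instance = [{10 + j} for j in range(t)]      # pairwise disjoint sets

print(max_degree(build_edges(yes_instance)))    # u_1 has degree t = 4
print(max_degree(build_edges(no_instance)))     # every node has degree 1
```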