Streaming Algorithms for Maximizing Monotone Submodular Functions Under a Knapsack Constraint

In this paper, we consider the problem of maximizing a monotone submodular function subject to a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially, and at any point of time the algorithm has access only to a small fraction of the data stored in primary memory. For this problem, we propose a (0.363 − ε)-approximation algorithm, requiring only a single pass through the data; moreover, we propose a (0.4 − ε)-approximation algorithm requiring a constant number of passes through the data. The required memory space of both algorithms depends only on the size of the knapsack capacity K and ε.


Introduction
A set function f : 2^E → R_+ on a ground set E is called submodular if it satisfies the diminishing marginal return property, i.e., for any subsets S ⊆ T ⊆ E and e ∈ E \ T, we have f(S ∪ {e}) − f(S) ≥ f(T ∪ {e}) − f(T). A function is monotone if f(S) ≤ f(T) for any S ⊆ T. Submodular functions play a fundamental role in combinatorial optimization, as they capture rank functions of matroids, edge cuts of graphs, and set coverage, just to name a few examples. Besides their theoretical interest, submodular functions have attracted much attention from the machine learning community because they can model various practical problems such as online advertising [1,11,18], sensor location [12], text summarization [16,17], and maximum entropy sampling [14].
Many of the aforementioned applications can be formulated as the maximization of a monotone submodular function under a knapsack constraint. In this problem, we are given a monotone submodular function f : 2^E → R_+, a size function c : E → N, and an integer K ∈ N, where N denotes the set of positive integers. The problem is defined as

maximize f(S) subject to c(S) ≤ K and S ⊆ E, (1)

where we denote c(S) = Σ_{e∈S} c(e) for a subset S ⊆ E. Throughout this paper, we assume that every item e ∈ E satisfies c(e) ≤ K, as otherwise we can simply discard it. Note that, when c(e) = 1 for every item e ∈ E, the constraint coincides with a cardinality constraint. The problem of maximizing a monotone submodular function under a knapsack constraint is classical and well studied. First introduced by Wolsey [20], the problem is known to be NP-hard but can be approximated within a factor of 1 − 1/e (or 1 − 1/e − ε); see, e.g., [3,8,10,13,19].
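To make problem (1) concrete, the following toy example instantiates f as a coverage function, which is monotone and submodular; all names and data are hypothetical and serve only to illustrate the notation.

```python
# A minimal, self-contained illustration of problem (1), assuming a set-coverage
# objective. The instance is tiny so that OPT can be found by brute force.
from itertools import combinations

universe_of = {          # item e -> set of points it covers
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}
c = {"a": 2, "b": 1, "c": 3, "d": 2}   # size function c : E -> N
K = 4                                   # knapsack capacity

def f(S):
    """Coverage objective: number of points covered by S (monotone submodular)."""
    covered = set()
    for e in S:
        covered |= universe_of[e]
    return len(covered)

# Brute-force OPT for this toy instance (feasible only because |E| is tiny).
E = list(universe_of)
best = max(
    (S for r in range(len(E) + 1) for S in combinations(E, r)
     if sum(c[e] for e in S) <= K),
    key=f,
)
print(best, f(best))   # ('b', 'c') covers 5 points within capacity 4
```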
In some applications, the amount of input data is much larger than the main memory capacity of individual computers. In such a case, we need to process the data in a streaming fashion. That is, we consider the situation where each item in the ground set E arrives sequentially, and we are allowed to keep only a small number of the items in memory at any point. This setting effectively rules out most of the techniques in the literature, as they typically require random access to the data. In this work, we also assume that a value oracle for f is available at any point of the process. Such an assumption is standard in the submodular function literature and in the context of the streaming setting [2,7,21]. Badanidiyuru et al. [2] discuss several interesting and useful functions where the oracle can be implemented using a small subset of the entire ground set E.
We note that, under the streaming model, the problem has so far received little attention. Prior to the present work, we are aware of only two results: for the special case of a cardinality constraint, Badanidiyuru et al. [2] gave a single-pass (1/2 − ε)-approximation algorithm, and for the general case of a knapsack constraint, Yu et al. [21] gave a single-pass (1/3 − ε)-approximation algorithm, both using O(K log(K)/ε) space.
We now state our contribution.

Our Technique
We begin with a straightforward generalization of the algorithm of [2] for the special case of a cardinality constraint (Sect. 2). This algorithm adds a new item to the current set only if its marginal ratio (its marginal return with respect to the current set divided by its size) exceeds a certain threshold. The algorithm performs well when all items in OPT are relatively small in size, where OPT is an optimal solution. However, in general, it only gives a (1/3 − ε)-approximation. Note that this technique can be regarded as a variation of the one in [21]. To obtain a better approximation ratio, we need new ideas.
The difficulty in improving this algorithm lies in the following case: a newly arriving item is relatively large in size, passes the marginal-ratio threshold, and is part of OPT, but its addition would cause the current set to exceed the capacity K. In this case, we are forced to throw it away, but in doing so, we are unable to properly bound the ratio of the function value of the current set against that of OPT.
We propose a branching procedure to overcome this issue. Roughly speaking, when the function value of the current set is large enough (depending on the parameters), we create a secondary set. We add an item to the secondary set only if it passes the marginal-ratio threshold (with respect to the original set) but its addition to the original set would violate the size constraint. In the end, whichever set achieves the higher value is returned. In a way, the secondary set serves as a "back-up" with enough remaining space in case the original set runs out of room, and this allows us to bound the ratio properly. Sections 3 and 4 are devoted to explaining this branching algorithm, which gives a (4/11 − ε)-approximation with a single pass.
We note that the main bottleneck of the above single-pass algorithm lies in the situation where there is a large item in OPT whose size exceeds K/2. In Sect. 5, we show that we can first focus on only the large items (more specifically, those items whose sizes differ from the largest item in OPT by at most a (1 + ε) factor) and choose O(1) of them so that at least one of them, along with the rest of OPT (excluding the largest item in it), gives a good approximation to f(OPT). Then, in the next pass, we can apply a modified version of the original single-pass algorithm to collect small items. This multiple-pass algorithm gives a (2/5 − ε)-approximation.

Algorithm 1 MarginalRatioThresholding(α, v)
1: S := ∅.
2: while item e is arriving do
3:   if f(e | S) ≥ (αv − f(S))·c(e)/(K − c(S)) and c(S + e) ≤ K then S := S + e.
4: return S.
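For concreteness, here is a minimal Python sketch of Algorithm 1. It assumes the value oracle f accepts a frozenset and that the stream yields (item, size) pairs; the threshold condition is written in cross-multiplied form to avoid division.

```python
# A sketch of Algorithm 1 (MarginalRatioThresholding), assuming a value oracle
# f over frozensets and a stream of (item, size) pairs.
def marginal_ratio_thresholding(stream, f, K, alpha, v):
    S, c_S = frozenset(), 0
    for e, c_e in stream:
        if c_S + c_e > K:
            continue  # c(S + e) <= K fails: the item cannot be added
        gain = f(S | {e}) - f(S)  # marginal return f(e | S)
        # condition (2): f(e | S) >= (alpha*v - f(S)) * c(e) / (K - c(S)),
        # cross-multiplied by the positive factor K - c(S)
        if gain * (K - c_S) >= (alpha * v - f(S)) * c_e:
            S, c_S = S | {e}, c_S + c_e
    return S
```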

Related Work
Maximizing a monotone submodular function subject to various constraints is a subject that has been extensively studied in the literature. We are unable to give a complete survey here and only highlight the most representative and relevant results. Besides a knapsack constraint or a cardinality constraint mentioned above, the problem has also been studied under (multiple) matroid constraint(s), a p-system constraint, and multiple knapsack constraints; see [4,6,8,9,13,15] and the references therein. In the streaming setting, besides the knapsack constraint discussed above, there are also works considering a matroid constraint: Chakrabarti and Kale [5] gave a 1/4-approximation, and Chekuri et al. [7] gave the same ratio.

Notation
For a subset S ⊆ E and an element e ∈ E, we use the shorthand S + e and S − e to stand for S ∪ {e} and S \ {e}, respectively. For a function f : 2^E → R, we also use the shorthand f(e) to stand for f({e}). The marginal return of adding e ∈ E with respect to S ⊆ E is defined as f(e | S) = f(S + e) − f(S). We frequently use the following, which is immediate from the diminishing marginal return property: f(S ∪ T) ≤ f(S) + Σ_{e∈T\S} f(e | S) for any subsets S, T ⊆ E.

Single-Pass (1/3 − ε)-Approximation Algorithm
In this section, we present a simple (1/3−ε)-approximation algorithm that generalizes the algorithm for a cardinality constraint in [2]. This algorithm will be incorporated into several other algorithms introduced later.

Thresholding Algorithm with Approximate Optimal Value
In this subsection, we present an algorithm MarginalRatioThresholding, which achieves (almost) 1/3-approximation given a (good) approximation v to f(OPT) for an optimal solution OPT. This assumption is removed in Sect. 2.2. Given parameters α ∈ (0, 1] and v ∈ R_+, MarginalRatioThresholding attempts to add a new item e ∈ E to the current set S ⊆ E if its addition does not violate the knapsack constraint and e passes the marginal-ratio threshold condition, i.e.,

f(e | S) ≥ (αv − f(S))·c(e)/(K − c(S)). (2)

The detailed description of MarginalRatioThresholding is given in Algorithm 1. Throughout this subsection, we fix S̄ = MarginalRatioThresholding(α, v) as the output of the algorithm. Then, we have the following lemma.
Lemma 1 The following hold: (1) at any point of the algorithm, the current set S satisfies f(S) ≥ (c(S)/K)·αv; (2) any item e ∈ E \ S̄ that fails the condition (2) upon arrival satisfies f(e | S̄) ≤ αv·c(e)/K.

Proof For (1), suppose that an item e is added to the current set S. Then

f(S + e) = f(S) + f(e | S) ≥ f(S) + (αv − f(S))·c(e)/(K − c(S)) ≥ (c(S)/K)·αv + (αv − (c(S)/K)·αv)·c(e)/(K − c(S)) = (c(S + e)/K)·αv,

where the last inequality follows from the induction hypothesis on the lower bound of f(S). Thus (1) holds. For (2), as the current set S upon the arrival of e satisfies S ⊆ S̄, by the submodularity of f,

f(e | S̄) ≤ f(e | S) < (αv − f(S))·c(e)/(K − c(S)) ≤ αv·c(e)/K,

where the last inequality follows from the first part of the lemma.
An item e ∈ OPT is not added to S̄ if either e does not pass the condition (2), or its addition would cause the size of S to exceed the capacity K. We name the latter condition as follows: Definition 1 An item e ∈ OPT is called bad if e passes the condition (2) but the total size exceeds K when it is added, i.e., f(e | S) ≥ (αv − f(S))·c(e)/(K − c(S)), c(S + e) > K, and c(S) ≤ K, where S is the set we have just before e arrives.
The following lemma says that, if there is no bad item, then we obtain a good approximation.

Lemma 2 If no item in OPT is bad, then f(S̄) ≥ (1 − α)v.

Proof Since we have no bad item, f(e | S̄) ≤ αv·c(e)/K for any e ∈ OPT \ S̄ by Lemma 1(2). Hence f(OPT) ≤ f(OPT ∪ S̄) ≤ f(S̄) + Σ_{e∈OPT\S̄} f(e | S̄) ≤ f(S̄) + αv·c(OPT)/K ≤ f(S̄) + αv, and thus f(S̄) ≥ f(OPT) − αv ≥ (1 − α)v.
Consider an algorithm Singleton, which takes the best singleton, as shown in Algorithm 2. If some item e ∈ OPT is bad, then, together with S′ = Singleton(), we can achieve (almost) 1/3-approximation.
Theorem 2 For ε ∈ (0, 1], suppose that v satisfies v ≤ f(OPT) ≤ (1 + ε)v. Then the algorithm that runs MarginalRatioThresholding(2/3, v) and Singleton() in parallel and chooses the better output has an approximation ratio of 1/(3(1 + ε)) ≥ 1/3 − ε. The space complexity of the algorithm is clearly O(K).
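The combined scheme of Theorem 2 can be sketched in the same style as before: one pass over the stream feeds both the thresholding set (with α = 2/3) and the best-singleton bookkeeping, and the better of the two is returned. The (item, size) stream interface is an assumption carried over from the previous sketch.

```python
# A sketch of the (1/3 - eps) single-pass scheme of Theorem 2.
def one_third_approx(stream, f, K, v, alpha=2/3):
    S, c_S = frozenset(), 0        # MarginalRatioThresholding(2/3, v) state
    best_single = frozenset()      # Singleton() state
    for e, c_e in stream:
        single = frozenset([e])
        if f(single) > f(best_single):
            best_single = single   # keep the best singleton seen so far
        if c_S + c_e <= K:
            gain = f(S | {e}) - f(S)
            if gain * (K - c_S) >= (alpha * v - f(S)) * c_e:
                S, c_S = S | {e}, c_S + c_e
    return max(S, best_single, key=f)  # better of the two outputs
```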

Dynamic Updates
MarginalRatioThresholding requires a good approximation v to f(OPT). This requirement can be removed with dynamic updates in a similar way to [2]. We here present the details based on [2]. We first observe that m ≤ f(OPT) ≤ Km for m = max_{e∈E} f(e), since each item has size at least 1 and hence |OPT| ≤ K. Thus some value in I = {(1 + ε)^i | i ∈ Z, m ≤ (1 + ε)^i ≤ Km} approximates f(OPT) within a factor of 1 + ε. Then, we can run MarginalRatioThresholding for each v ∈ I in parallel and choose the best output. As the size of I is O(log(K)/ε), the space complexity is O(K log(K)/ε). To get rid of the assumption that we are given m in advance, we consider an algorithm, called DynamicMRT, which dynamically updates m to determine the range of guessed optimal values. More specifically, it keeps the (tentative) maximum value m = max f(e), where the maximum is taken over the items e arrived so far, and keeps the approximations v in the interval between m and Km/α. The details are provided in Algorithm 3. We have the following guarantee.

Algorithm 3 DynamicMRT(ε, α)
1: m := 0.
2: I := ∅.
3: S_v := ∅ for each guess v.
4: while item e is arriving do
5:   m := max{m, f(e)}.
6:   I := {(1 + ε)^i | i ∈ Z, m ≤ (1 + ε)^i ≤ Km/α}; discard the sets S_v with v ∉ I.
7:   for each v ∈ I, process e as in MarginalRatioThresholding(α, v) with the set S_v.
8: return the set S_v with the largest f(S_v) over v ∈ I.

Theorem 3
For ε ∈ (0, 1], the algorithm that runs DynamicMRT(ε, 2/3) and Singleton() in parallel and outputs the better of the two is a (1/3 − ε)-approximation streaming algorithm with a single pass for the problem (1). The space complexity of the algorithm is O(K log(K)/ε).

Proof Let e ∈ E be an arriving item. We will show that, if v > Km/α (for α = 2/3), then e always fails the condition (2) in DynamicMRT. Indeed, if v > Km/α and e passes the condition (2) with the current set S, then Lemma 1(1) implies that

f(S + e) ≥ (c(S + e)/K)·αv > c(S + e)·m ≥ |S + e|·max_{e′∈S+e} f(e′),

where the last inequality follows from the fact that c(e′) ≥ 1 for every item e′ and m ≥ max_{e′∈S+e} f(e′). On the other hand, f(S + e) ≤ |S + e|·max_{e′∈S+e} f(e′) as f is submodular, which is a contradiction. Therefore, when an item e ∈ E arrives, e may be added to the current set only if v ≤ Km/α. Moreover, since Singleton returns an item e with f(e) ≥ m, we can discard the case when v < m during the process of DynamicMRT. Thus DynamicMRT simulates all the values in I, only keeping the values in the interval [m, Km/α]. There are O(log(K)/ε) streams, and each stream may have a solution of size O(K). Thus, the total space is O(K log(K)/ε), as desired.
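The lazy guessing of DynamicMRT can be sketched as follows; the window arithmetic over exponents i with v_i = (1 + ε)^i is our own bookkeeping and merely mirrors the pruning argument in the proof above.

```python
# A sketch of DynamicMRT: one thresholding instance per guess v_i = (1+eps)^i
# kept inside the window [m, K*m/alpha]; instances below the window are dropped.
import math

def dynamic_mrt(stream, f, K, eps, alpha=2/3):
    runs = {}  # exponent i -> (S, c(S)) for the guess v_i = (1 + eps)^i
    m = 0.0    # largest singleton value seen so far
    for e, c_e in stream:
        m = max(m, f(frozenset([e])))
        if m <= 0:
            continue
        lo = math.ceil(math.log(m) / math.log(1 + eps))               # v_i >= m
        hi = math.floor(math.log(K * m / alpha) / math.log(1 + eps))  # v_i <= Km/alpha
        runs = {i: st for i, st in runs.items() if i >= lo}  # drop stale guesses
        for i in range(lo, hi + 1):
            S, c_S = runs.get(i, (frozenset(), 0))
            v = (1 + eps) ** i
            if c_S + c_e <= K:
                gain = f(S | {e}) - f(S)
                if gain * (K - c_S) >= (alpha * v - f(S)) * c_e:
                    S, c_S = S | {e}, c_S + c_e
            runs[i] = (S, c_S)
    return max((S for S, _ in runs.values()), key=f, default=frozenset())
```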

Improved Single-Pass Algorithm for Small-Size Items
The main goal of this section is to achieve a (2/5 − O(ε))-approximation with a single pass for the case where every item in OPT has size at most K/2; the complementary case is handled in Sect. 4.

Branching Framework with Approximate Optimal Value
We here provide a framework of a branching algorithm BranchingMRT as Algorithm 4. This will be used with different parameters in Sect. 3.2.
Let v and c1 be (good) approximations to f(OPT) and c(o1)/K, respectively, where o1 denotes an item of OPT of maximum size, and let b ≤ 1/2 be a parameter. The value c1 is supposed to satisfy c1 ≤ c(o1)/K ≤ (1 + ε)c1.

Algorithm 4 BranchingMRT(ε, α, v, c1, b)
1: S := ∅.
2: S′ := ∅.
3: λ := (1 − b)αv.
4: while item e is arriving do
5:   Ignore e with c(e) > min{(1 + ε)c1, 1/2}K.
6:   if f(e | S) ≥ (αv − f(S))·c(e)/(K − c(S)) and c(S + e) ≤ K then S := S + e.
7:   if f(S) ≥ λ then break. // leave the while loop
8: Let ê be the latest added item in S.
9: if c(S) ≤ (1 − b)K then S′ := S else S′ := {ê}.
10: Let S0′ denote this initial set S′.
11: while item e is arriving do
12:   Ignore e with c(e) > min{(1 + ε)c1, 1/2}K.
13:   if f(e | S) ≥ (αv − f(S))·c(e)/(K − c(S)) then
14:     if c(S + e) ≤ K then S := S + e;
15:     else if c(S0′ + e) ≤ K and f(S0′ + e) > f(S′) then S′ := S0′ + e.
16: return S or S′, whichever has the larger function value.
This means that we can ignore items e ∈ E with c(e) > min{(1 + ε)c1, 1/2}K (see lines 5 and 12). The basic idea of BranchingMRT is to take only items with large marginal ratios, similarly to MarginalRatioThresholding. The difference is that we store either the current set S or the latest added item as S′ (see lines 7–10). This guarantees that f(S′) ≥ λ and c(S′) ≤ (1 − b)K, which means that S′ has a large function value and sufficient room to accommodate more elements. We call the process of constructing S′ branching. In the second while-loop, we continue to add items with large marginal ratios to the current set S, and if we cannot add an item to S because it would exceed the capacity, we try to add the item to S′. Note that the set S′, after branching, can have at most one extra item; but this extra item can be replaced if a better candidate comes along (see lines 14–15). A sketch of this bookkeeping is given below.
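Since the listing above is itself a reconstruction, the exact branching rule encoded in the following sketch should be read as an assumption rather than the authors' verbatim procedure.

```python
# One arriving item in the second while-loop of (our reconstruction of)
# Algorithm 4. S is the primary set, S0 the branched base set with
# c(S0) <= (1 - b) * K, and S_prime = S0 plus at most one extra item.
def second_phase_step(e, c_e, S, c_S, S0, c_S0, S_prime, f, K, alpha, v):
    gain = f(S | {e}) - f(S)
    if gain * (K - c_S) < (alpha * v - f(S)) * c_e:
        return S, c_S, S_prime               # fails the marginal-ratio threshold
    if c_S + c_e <= K:
        return S | {e}, c_S + c_e, S_prime   # e still fits into the primary set
    if c_S0 + c_e <= K and f(S0 | {e}) > f(S_prime):
        S_prime = S0 | {e}                   # replace the single extra item on S0
    return S, c_S, S_prime
```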
Note that the sequence of sets S in BranchingMRT is identical to that in MarginalRatioThresholding in Sect. 2. We say that an item e ∈ OPT is bad if it is bad in the sense of MarginalRatioThresholding, i.e., it satisfies the condition in Definition 1. We have the following two lemmas.

Lemma 3 For a bad item e with c(e) ≤ bK, let S_e be the set just before e arrives in Algorithm 4. Then f(S_e) ≥ λ holds. Thus branching has happened before a bad item arrives.

Proof Since e is bad, we have c(S_e + e) > K, and hence c(S_e) > K − c(e) ≥ (1 − b)K. By Lemma 1(1), f(S_e) ≥ (c(S_e)/K)·αv > (1 − b)αv = λ. Since the value of f is non-decreasing during the process, this means that branching has happened before a bad item arrives.

Lemma 4 It holds that f(S0′) ≥ λ and c(S0′) ≤ (1 − b)K for the set S0′ obtained by branching.

Proof We denote by S the set obtained right after leaving the while-loop from Line 4. If c(S) ≤ (1 − b)K, then S0′ = S, and the claim holds since f(S) ≥ λ. Otherwise, S0′ = {ê} for the latest added item ê; in this case c(ê) ≤ K/2 ≤ (1 − b)K since b ≤ 1/2, and the bound on f(S0′) follows from the choice of ê.

Let S̄ and S̄′ be the final two sets computed by BranchingMRT. Note that we can regard S̄ as the output of MarginalRatioThresholding and S̄′ as the final set obtained by adding at most one item to S0′.
Observe that the number of bad items depends on the parameter α. As we will show in Sect. 3.2, by choosing a suitable α, if we have more than two bad items, then the size of S̄ is large enough, implying that f(S̄) is already good for approximation (due to Lemma 1(1)). Therefore, in the following, we concentrate on the case when we have at most two bad items.

Lemma 5
Let α be a number in (0, 1], and suppose that we have only one bad item. Then max{f(S̄), f(S̄′)} ≥ βv for an appropriate value β depending on α and b.

Proof Suppose not, that is, suppose that both f(S̄) and f(S̄′) are smaller than βv. Since S0′ ⊆ S̄, submodularity relates the marginal value of the bad item with respect to S0′ to that with respect to S̄. Since f(S̄) < βv and no item in O_s is bad, where O_s denotes the set of items in OPT of size at most bK, these bounds together with Lemma 1(2) yield f(OPT) < v. Therefore, we have a contradiction. This completes the proof.
For the case when we have exactly two bad items, we obtain the following guarantee.

Lemma 6
Let α be a number in (0, 1], and suppose that we have exactly two bad items. Then max{f(S̄), f(S̄′)} ≥ βv for an appropriate value β depending on α and b.

Proof Suppose not, that is, suppose that both f(S̄) and f(S̄′) are smaller than βv. Since f(S̄) < βv and no items in O_s are bad, Lemma 1(2) bounds the total marginal value of the items in OPT \ S̄, and combining the bounds yields f(OPT) < v, which is a contradiction.

Algorithms with Guessing Large Items
We now use BranchingMRT to obtain a better approximation ratio. In the new algorithm, we guess the sizes of a few large items in an optimal solution OPT, and then use them to determine the parameter α. Let o1, o2, o3 be items of OPT with the largest three sizes, i.e., c(o1) ≥ c(o2) ≥ c(o3). We first remark that, when |OPT| ≤ 2, we can easily obtain a 1/2-approximate solution with a single pass. In fact, since f(OPT) ≤ f(o1) + f(o2) ≤ 2 max{f(o1), f(o2)} by submodularity, the best singleton already gives a 1/2-approximation, so we may assume |OPT| ≥ 3. We start with the case that we have guessed the largest two sizes c(o1) and c(o2) in OPT.
Proof Let S̄ = MarginalRatioThresholding(α, v); recall that the set S in BranchingMRT evolves identically, so the better output has value at least f(S̄). If S̄ has size at least (1 − (1 + ε)c2)K, then Lemma 1(1) implies that f(S̄) ≥ (1 − (1 + ε)c2)αv. Otherwise, c(S̄) < (1 − (1 + ε)c2)K. In this case, we see that only the item o1 can have size more than (1 + ε)c2·K, and hence only o1 can be a bad item. If o1 is not a bad item, then we have no bad item, and hence Lemma 2 implies that f(S̄) ≥ (1 − α)v. If o1 is bad, then Lemma 5 implies a corresponding lower bound on max{f(S̄), f(S̄′)}. Thus the approximation ratio is the minimum of the RHSes of the above three inequalities. This is maximized when α = 1/(2 − c2) or α = 2/(5 − 4c2 − c1), and the maximum value is equal to the RHS of (6).
Note that the approximation ratio achieved in Lemma 7 becomes 1/3 − O(ε) when, for example, c1 = c2 = 1/2. Hence, the above lemma does not show any improvement over Theorem 2 in the worst case. Thus, we next consider the case that we have guessed the largest three sizes c(o1), c(o2), and c(o3) in OPT. Using Lemma 6, we have the following guarantee.
Then the better output S̄ of BranchingMRT(ε, α, v, c1, b1) and BranchingMRT(ε, α, v, c1, b2) satisfies the guarantee below.

Proof If S̄ has size at least (1 − (1 + ε)c3)K, then Lemma 1(1) implies that f(S̄) ≥ (1 − (1 + ε)c3)αv. Otherwise, we see that only o1 and o2 can have size more than (1 + ε)c3·K, and hence only they can be bad items. If we have no bad item, it holds by Lemma 2 that f(S̄) ≥ (1 − α)v. Suppose we have one bad item. If it is o1, then Lemma 5 with b1 implies a lower bound on max{f(S̄), f(S̄′)}, and if it is o2, we obtain the analogous bound by Lemma 5 with b2. Moreover, if we have two bad items o1 and o2, then Lemma 6 implies a further lower bound. Therefore, the approximation ratio is the minimum of the RHSes in the above five inequalities, which is maximized by a suitable choice of α.

We now see that we get an approximation ratio of 2/5 − O(ε) by combining the above two lemmas.

Theorem 4 Let ε ∈ (0, 1] and suppose that v ≤ f(OPT) ≤ (1 + ε)v and c_i ≤ c(o_i)/K ≤ (1 + ε)c_i for i ∈ {1, 2, 3}. If c(o1) ≤ K/2, then we can obtain a (2/5 − O(ε))-approximate solution with a single pass.
Proof We run the two algorithms with the optimal α shown in Lemmas 7 and 8 in parallel. Let S̄ be the output with the better function value. Then we have f(S̄) ≥ βv, where β is the better of the two ratios guaranteed by Lemmas 7 and 8. We can confirm that the first term is at least 2/5, and thus S̄ is a (2/5 − O(ε))-approximate solution.
To eliminate the assumption that we are given v, we can design a dynamic-update version of BranchingMRT by keeping the interval that contains the optimal value, similarly to Theorem 3. The algorithm, called DynamicBranchingMRT, is given in Algorithm 5. The proof for updating the interval I dynamically is the same as the proof of Theorem 3. The number of streams for guessing v is O(log(K)/ε). We also guess the size ratios c1, c2, and c3 by geometric series with O(log(K)/ε) values each, which multiplies the number of streams by O((log(K)/ε)^3). In total, the space complexity is O(K(log(K)/ε)^4), and we obtain the following.

Theorem 5 For ε ∈ (0, 1], if c(o1) ≤ K/2, then there exists a (2/5 − O(ε))-approximation streaming algorithm with a single pass for the problem (1). The space complexity of the algorithm is O(K(log(K)/ε)^4).
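The guessing grids behind these space bounds can be sketched as follows; the helper names are ours, and the grids simply enumerate geometric series for v and for the size ratios c_i.

```python
# Geometric guessing grids; one stream is simulated per combination
# (v, c_1, c_2, c_3), which is where the O((log(K)/eps)^4) factor comes from.
def geometric_guesses(lo, hi, eps):
    # all values lo, lo(1+eps), lo(1+eps)^2, ... up to hi
    out, g = [], lo
    while g <= hi:
        out.append(g)
        g *= 1 + eps
    return out

def size_ratio_guesses(K, eps):
    # candidates c_i with c_i <= c(o_i)/K <= (1 + eps) c_i; O(log(K)/eps) values
    return geometric_guesses(1 / K, 1.0, eps)
```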

Single-Pass (4/11 − ε)-Approximation Algorithm
In this section, we consider the case where c(o1) is larger than K/2. For this purpose, we consider the problem of finding a set S of items that maximizes f(S) subject to the constraint that the total size is at most pK, for a given number p ≥ 2. We say that a set S of items is a (p, α)-approximate solution if c(S) ≤ pK and f(S) ≥ α·f(OPT), where OPT is an optimal solution of the original instance.
Theorem 6 For ε ∈ (0, 1] and a number p ≥ 2, there exists a single-pass streaming algorithm that finds a (p, 2p/(2p + 3) − ε)-approximate solution using O(K(log(K)/ε)^2) space.

The proof will be given in the next subsection. Using Theorem 6, we can design a (4/11 − ε)-approximation streaming algorithm for an instance having a large item.

Algorithm 5 DynamicBranchingMRT (excerpt)
22: return S or S′, whichever has the larger function value.

Theorem 7 For the problem (1), there exists a (4/11 − ε)-approximation streaming algorithm with a single pass. The space complexity of the algorithm is O(K(log(K)/ε)^4).
Proof Let o1 be an item in OPT with the maximum size. If c(o1) ≤ K/2, then Theorem 5 gives a (2/5 − O(ε))-approximate solution, and thus we may assume that c(o1) > K/2. Note that there exists only one item in OPT whose size is more than K/2. Let β be the target approximation ratio, which will be determined later. We may assume that f(o1) < β·f(OPT), as otherwise Singleton (Algorithm 2) gives a β-approximation. Then f(OPT − o1) ≥ (1 − β)f(OPT), and OPT − o1 is feasible for the knapsack capacity K′ = K − c(o1) < K/2. Applying Theorem 6 with capacity K′ (guessed by a geometric series) and p = K/K′ ≥ 2 yields a set of total size at most pK′ = K, i.e., a feasible set, whose value is at least (2p/(2p + 3) − ε)(1 − β)f(OPT) ≥ (4/7 − ε)(1 − β)f(OPT). Balancing the two cases by setting β = (4/7)(1 − β), i.e., β = 4/11, gives the claimed ratio 4/11 − O(ε).
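A quick numeric check of the balancing step in this proof, under the reconstruction above: the bicriteria ratio is at least 4/7 for every p ≥ 2, and β = 4/11 equates the two cases.

```python
# Singleton case gives beta * f(OPT); the bicriteria case gives
# (4/7) * (1 - beta) * f(OPT); beta = 4/11 makes the two equal.
beta = 4 / 11
assert abs((4 / 7) * (1 - beta) - beta) < 1e-12
for p in [2, 3, 10, 1000]:
    assert 2 * p / (2 * p + 3) >= 4 / 7 - 1e-12  # ratio of Theorem 6 for p >= 2
```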

Bicriteria Approximation for a Knapsack Constraint
We here present the proof of Theorem 6. Let p ≥ 2.
The basic framework is the same as in Sect. 3; we design both a simple-thresholding algorithm and a branching algorithm, where the parameters are different and the analysis is simpler. It is sufficient to design algorithms assuming that a (good) approximation v to f (OPT) is given, as we can get rid of the assumption by using the dynamic update technique.
We design a variant of MarginalRatioThresholding. The new algorithm is parameterized by a number p ≥ 2, and it is allowed to pack items up to a total size of pK. Also, we change the marginal-ratio threshold condition to the following:

f(e | S) ≥ (pαv − f(S))·c(e)/(pK − c(S)). (7)

Let MarginalRatioThresholding_p be the resulting algorithm. Similarly to Lemma 1, the following lemma holds; the proof is omitted as it is almost identical to that of Lemma 1.
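A sketch of MarginalRatioThresholding_p in the style of the earlier sketches; note that the rescaled threshold encodes our reconstruction of condition (7) and should be treated as an assumption.

```python
# A sketch of MarginalRatioThresholding_p with the relaxed budget pK.
def marginal_ratio_thresholding_p(stream, f, K, p, alpha, v):
    S, c_S = frozenset(), 0
    for e, c_e in stream:
        if c_S + c_e > p * K:
            continue  # keep the total size within the relaxed budget pK
        gain = f(S | {e}) - f(S)
        # reconstructed condition (7), cross-multiplied by pK - c(S) > 0
        if gain * (p * K - c_S) >= (p * alpha * v - f(S)) * c_e:
            S, c_S = S | {e}, c_S + c_e
    return S
```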

Lemma 9 Let S̄ = MarginalRatioThresholding_p(α, v) for some α ∈ (0, 1] and v ∈ R_+. Then the following hold: (1) at any point of the algorithm, the current set S satisfies f(S) ≥ (c(S)/K)·αv; (2) any item e ∈ E \ S̄ that fails the condition (7) upon arrival satisfies f(e | S̄) ≤ αv·c(e)/K.

Determining α using a good approximation to the largest size c(o1) in OPT gives the following approximation guarantee (Lemma 10).
Proof If the output S̄ has size at least (p − (1 + ε)c1)K, then we have by Lemma 9(1) that f(S̄) ≥ (p − (1 + ε)c1)αv. Otherwise, c(S̄) < (p − (1 + ε)c1)K, and hence there is no bad item, as no single item is large enough to overflow the remaining capacity. Similarly to Lemma 2, it follows from Lemma 9(2) that f(S̄) ≥ (1 − α)v. The approximation ratio is the minimum of the RHSes of the above two inequalities, which is maximized to (p − c1)/(p − c1 + 1) − O(ε) when α = 1/(p − c1 + 1).

Next, we design a branching algorithm based on BranchingMRT. Here, the parameter b should be at most 1, and the marginal-ratio threshold is replaced with (7). Also, λ is set to be (p − b)αv, and at Line 9 of Algorithm 4, the condition is changed to (p − b)K instead of (1 − b)K. Let BranchingMRT_p be the resulting algorithm. The analysis in Sect. 3 can be adapted:

Lemma 11
The following hold for BranchingMRT_p:
– For a bad item e ∈ E with c(e) ≤ bK, let S_e be the set just before e arrives. Then f(S_e) ≥ λ holds. Thus, branching has happened before a bad item arrives.
– The set S0′ obtained by branching satisfies f(S0′) ≥ λ and c(S0′) ≤ (p − b)K.
Note that, in the second statement, we do not need the assumption that b ≤ 1/2, as a single item always has size at most K ≤ (p − b)K when b ≤ 1 ≤ p − 1. Determining α using good approximations to the largest two sizes c(o1) and c(o2) in OPT gives the following approximation guarantee (Lemma 12).

Proof If the output S̄ has size at least (p − (1 + ε)c2)K, then we have by Lemma 9(1) that f(S̄) ≥ (p − (1 + ε)c2)αv. Otherwise, c(S̄) < (p − (1 + ε)c2)K. In this case, we see that there exists at most one bad item. If we have no bad item, it holds by Lemma 9(2) that f(S̄) ≥ (1 − α)v. Suppose that we have one bad item, which must be o1. Following the proof of Lemma 5, we see that max{f(S̄), f(S̄′)} ≥ ((c1 + p)/2)αv − O(ε)v. The approximation ratio is the minimum of the RHSes of the above three inequalities. It is maximized when α = 1/(p − c2 + 1) or α = 2/(c1 + p + 2); this is in fact equal to (c1 + p)/(c1 + p + 2) up to an O(ε) term.

Therefore, if we apply both of the above algorithms and take the better one, we obtain a set S̄ ⊆ E satisfying f(S̄) ≥ (max{(p − c1)/(p − c1 + 1), (c1 + p)/(c1 + p + 2)} − O(ε))v. The right-hand side is minimized when c1 = p/3, and hence we have f(S̄) ≥ (2p/(2p + 3) − O(ε))v. Thus the set S̄ can be found in O(K(log(K)/ε)^2) space. This, together with the dynamic update technique to guess v, proves Theorem 6.
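A numeric sanity check of the final balancing, under the formulas filled in above: the two ratios cross at c1 = p/3, where both equal 2p/(2p + 3).

```python
# The thresholding ratio decreases in c1 while the branching ratio increases,
# so the worst case is at their crossing point c1 = p/3.
for p in [2.0, 3.0, 5.0, 10.0]:
    c1 = p / 3
    thresholding = (p - c1) / (p - c1 + 1)   # ratio of MarginalRatioThresholding_p
    branching = (c1 + p) / (c1 + p + 2)      # ratio of BranchingMRT_p
    assert abs(thresholding - 2 * p / (2 * p + 3)) < 1e-12
    assert abs(branching - 2 * p / (2 * p + 3)) < 1e-12
```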

Multiple-Pass Streaming Algorithm
In this section, we provide a multiple-pass streaming algorithm with approximation ratio 2/5 − ε. In Sect. 5.1, we consider the monotone submodular function maximization with a different constraint and develop an algorithm for it. In Sect. 5.2, this algorithm is used as a subroutine for the original problem with a knapsack constraint.

Dealing with Large Items with a Single Pass
We first consider a generalization of the original problem. Let E_r ⊆ E be a subset of the ground set E. For ease of presentation, we will call E_r the red items. Consider the problem of maximizing f(S) subject to c(S) ≤ K and the additional constraint that S contains exactly one red item. In the following, we show that, given ε ∈ (0, 1], an approximation v to f(OPT) with v ≤ f(OPT) ≤ (1 + ε)v, and an approximation θ such that θv approximates f(o_r) for the unique item o_r in OPT ∩ E_r, we can choose O(1) of the red items so that one of them, e ∈ E_r, satisfies f(OPT − o_r + e) ≥ (Γ(θ) − O(ε))v, where Γ is a piecewise linear function lower-bounded by 2/3. For technical reasons, we will choose θ to be one of the geometric series (1 + ε)^i/2 for i ∈ Z.
Our algorithm picks the first red item e with f(e) ≥ θv/(1 + ε). Then, whenever an item e ∈ E_r with f(e) ≥ θv/(1 + ε) arrives, we add e to our current set S if f(e | S) > (θ − x|S|)v, where x is a parameter optimized later. This is repeated until we have t + 1 red items, where t is a constant determined later. Our algorithm, SelectRedItems, is given in Algorithm 6.
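A sketch of SelectRedItems as just described; the parameters x and t are assumed to be given, as they are fixed only in the analysis.

```python
# A sketch of SelectRedItems over the substream of red items.
def select_red_items(red_stream, f, v, theta, eps, x, t):
    S = []  # chosen red items, O(1) of them in total
    for e in red_stream:
        if f(frozenset([e])) < theta * v / (1 + eps):
            continue  # e is too small to ever stand in for o_r
        current = frozenset(S)
        gain = f(current | {e}) - f(current)  # marginal return f(e | S)
        if not S or gain > (theta - x * len(S)) * v:
            S.append(e)
        if len(S) == t + 1:
            break  # enough red items collected: ignore the rest
    return S
```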
As we will see below, the algorithm guarantees a lower bound on the function value f(OPT − o_r + e) for some chosen red item e.
The following lemma follows immediately from the algorithm.

Lemma 13
During the execution of SelectRedItems, it holds that f(S) ≥ (|S|θ − x·|S|(|S| − 1)/2 − εθ)v.

Proof Since the first item e we pick satisfies f(e) ≥ θv/(1 + ε), and each subsequently added item passes the condition at line 8, i.e., contributes more than (θ − x|S|)v, summing the contributions gives f(S) ≥ θv/(1 + ε) + Σ_{j=1}^{|S|−1}(θ − xj)v ≥ (|S|θ − x·|S|(|S| − 1)/2 − εθ)v. Thus the lemma follows.
The next lemma states that if o r is thrown away at line 8 of the algorithm, then one of the red items in S is already good for our purpose.

Lemma 14
Suppose that |S| < t + 1 holds for the current set S ⊆ E_r, and that the arriving item is o_r and is thrown away at line 8 of the algorithm. Then at least one red item e ∈ S satisfies the desired lower bound on f(OPT − o_r + e).

Proof Suppose not. Combining the lower bound of Lemma 13 with the assumption that no red item in S serves the purpose, we obtain a lower bound on f(o_r | S). But if this is the case, o_r would not have been thrown away by the algorithm at line 8, since it would pass the threshold. Thus the proof is complete.
The next lemma states that, if |S| = t + 1, we can just ignore the rest of the red items, no matter whether o_r has arrived or not.
To avoid triviality, we assume that e1 ≠ o_r. The next lemma states that if there are two items in the returned set S, at least one serves the purpose.
The next lemma states that if o r is thrown away by the algorithm, e 1 itself is already good for approximation.
On the other hand, it holds that f(o_r | e1) is small, as o_r was thrown away after e1 was picked. By submodularity, f(o_r | OPT − o_r + e1) ≤ f(o_r | e1), and the above two inequalities lead to a contradiction.
By the previous two lemmas, one of the red items in the returned set S, along with OPT − o_r, gives value (2/3 − O(ε))v. We can then define Γ(θ) to be 2/3 on the interval θ ∈ [1/2, 2/3).

Lemma 18
When c(o1) ≥ K/2, we may assume that (3/10)v ≤ f(o1) ≤ (2/5)(1 + ε)v; otherwise, either Singleton or the algorithm for the remaining small items already achieves the ratio 2/5 − O(ε). Thus the lemma holds.

Lemma 19
We may assume that c(o1) > K/2, as otherwise Theorem 4 already gives a (2/5 − O(ε))-approximation with a single pass. Thus the lemma holds.

We use the first pass to estimate f(OPT) as follows. For an error parameter ε ∈ (0, 1], perform the single-pass algorithm in Theorem 3 to get a (1/3 − ε)-approximate solution S ⊆ E, which can be used to bound the value of f(OPT), that is, f(S) ≤ f(OPT) ≤ f(S)/(1/3 − ε). We then use a geometric series over this interval to guess f(OPT) within a factor of 1 + ε. Thus, we may assume that we are given a value v with v ≤ f(OPT) ≤ (1 + ε)v. Below we show how to obtain a solution of value at least (2/5 − O(ε))v, using two more passes. Before we start, we introduce slightly modified versions of the algorithms presented in Sect. 2; they will be used as subroutines.
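The first-pass estimation can be sketched as follows: from the (1/3 − ε)-approximate value f(S), O(1/ε) geometric guesses cover the interval that must contain f(OPT). The helper name is ours.

```python
# Candidates v with v <= f(OPT) <= (1 + eps) v, assuming eps < 1/3 so that
# the upper end f(S) / (1/3 - eps) is well defined.
def candidate_opt_values(f_S, eps):
    lo, hi = f_S, f_S / (1 / 3 - eps)
    out, g = [], lo
    while g <= hi * (1 + eps):  # overshoot slightly to cover the whole interval
        out.append(g)
        g *= 1 + eps
    return out
```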

Let the resultant set be S_e.
If there is a bad item, then the set S just before some bad item e arrives satisfies f(S + e) ≥ αv′. Hence f(S) or some singleton has value at least αv′/2. Therefore, when α = 2h/(h + 2), the lower bound is maximized, and the ratio in this case is h/(h + 2). Let all items e ∈ E whose sizes c(e) satisfy c1/(1 + ε) ≤ c(e)/K ≤ c1 be the red items. By Theorem 8, we can select a set S of the red items so that one of them guarantees f(OPT − o1 + e) ≥ (Γ(θ) − O(ε))v, where θ satisfies the condition in Theorem 8. Note that θ can be guessed by a geometric series from the interval [(3/10)v, (2/5)(1 + ε)v] by Lemma 18. The space required is O(ε^{-1}). In the next pass, for each e ∈ S, define a new monotone submodular function g_e(·) = f(· | e) and apply the modified thresholding algorithm (Lemma 20) with h =