Generalized budgeted submodular set function maximization

In this paper we consider a generalization of the well-known budgeted maximum coverage problem. We are given a ground set of elements and a set of bins. The goal is to find a subset of elements along with an associated set of bins, such that the overall cost is at most a given budget, and the profit is maximized. Each bin has its own cost and the cost of each element depends on its associated bin. The profit is measured by a monotone submodular function over the elements. We first present an algorithm that guarantees an approximation factor of $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$, where $\alpha \leq 1$ is the approximation factor of an algorithm for a sub-problem. We give two polynomial-time algorithms to solve this sub-problem. The first one gives us $\alpha=1- \epsilon$ if the costs satisfy a specific condition, which is fulfilled in several relevant cases, including the unitary costs case and the problem of maximizing a monotone submodular function under a knapsack constraint. The second one guarantees $\alpha=1-\frac{1}{e}-\epsilon$ for the general case. The gap between our approximation guarantees and the known inapproximability bounds is $\frac{1}{2}$. We extend our algorithm to a bi-criterion approximation algorithm in which we are allowed to spend an extra budget up to a factor $\beta\geq 1$ to guarantee a $\frac{1}{2}\left(1-\frac{1}{e^{\alpha\beta}}\right)$-approximation. If we set $\beta=\frac{1}{\alpha}\ln \left(\frac{1}{2\epsilon}\right)$, the algorithm achieves an approximation factor of $\frac{1}{2}-\epsilon$, for any arbitrarily small $\epsilon>0$.


Introduction
The Maximum Coverage (MC) problem is a fundamental combinatorial optimization problem with several applications in job scheduling, facility location and resource allocation [14, Ch. 3], as well as in influence maximization [17]. In the classical definition we are given a ground set $X$, a collection $S$ of subsets of $X$ with unit cost, and a budget $k$. The goal is to select a subset $S' \subseteq S$ such that $|S'| \leq k$ and the number of elements of $X$ covered by $S'$ is maximized. A natural greedy algorithm starts with an empty solution and iteratively adds a set with the maximum number of uncovered elements until $k$ sets are selected. This algorithm has an approximation factor of $1-\frac{1}{e}$ [20], and this result is tight given the inapproximability result due to Feige [11]. An interesting special case of the problem where this inapproximability result does not hold is when the size of the sets in $S$ is small. In the maximum $h$-coverage problem, $h$ denotes the maximum size of each set in $S$. This problem is APX-hard for any $h \geq 3$ [16] (notice that when $h = 2$ it is the maximum matching problem), while a simple polynomial local search heuristic has an approximation ratio very close to $\frac{2}{h}$ [4]. A polynomial time algorithm with approximation factor $\frac{5}{6}$ is possible for the case $h = 3$ [5]. In the Budgeted Maximum Coverage (BMC) problem, which is an extension of maximum coverage, the costs of the sets in $S$ are arbitrary, and thus a solution is feasible if the overall cost of the selected subset $S' \subseteq S$ is at most $k$. In [18], the authors present a polynomial time (greedy) algorithm with approximation factor $1-\frac{1}{e}$. In the Generalized Maximum Coverage (GMC) problem every set $s \in S$ has a cost $c(s)$, and every element $x \in X$ has a different weight and cost that depend on which set covers it. In [7], a polynomial time (greedy) algorithm with approximation factor $1-\frac{1}{e}-\epsilon$, for any $\epsilon > 0$, has been shown. In all the above problems the profit of a solution is given by the sum of the weights of the covered elements.
An important and well-studied extension adopts a nonnegative, nondecreasing, submodular function $f$, which assigns a profit to each subset of elements. In the Submodular set Function subject to a Knapsack Constraint maximization (SFKC) problem we have a cost $c(x)$ for each element $x \in X$, and the goal is to select a set $X' \subseteq X$ of elements that maximizes $f(X')$, where $f$ is a monotone submodular function, subject to the constraint that the sum of the costs of the selected elements is at most $k$. This problem admits a polynomial time $\left(1-\frac{1}{e}\right)$-approximation algorithm [23]. Since the MC problem is a special case of the SFKC problem, this result is tight. A more general setting was considered in [15], where the authors study the following problem, called Submodular Cost Submodular Knapsack (SCSK): given a set of elements $V = \{1, 2, \ldots, n\}$, two monotone non-decreasing submodular functions $g$ and $f$ ($f, g : 2^V \to \mathbb{R}$), and a budget $b$, the goal is to find a set of elements $X \subseteq V$ that maximizes the value $g(X)$ under the constraint that $f(X) \leq b$. They show that the problem cannot be approximated within any constant bound. Moreover, they give a $1/n$-approximation algorithm and mainly focus on bi-criterion approximation.
In this paper we consider the Generalized Budgeted submodular set function Maximization problem (GBSM), which is not captured by any of the above settings. We are given a ground set of elements $X$, a set of bins $S$, and a budget $k$. The goal is to find a subset of elements along with an associated set of bins such that their overall cost is at most the given budget and the profit is maximized. Each bin has its own cost, while the cost of each element depends on its associated bin. Finally, the profit is measured by a monotone submodular function over the elements.
We emphasize that the problem considered here is not a special case of the GMC problem, since we consider arbitrary monotone submodular functions for the profits. Moreover, we now show that our cost function is not submodular, and thus that the setting (SCSK) considered in [15] does not generalize our model. Given a set of elements $X$, we denote by $c(X)$ the minimum cost of covering them, that is, the cost of the cheapest choice of bins able to cover the elements of $X$. Consider the instance $(S, X) = (\{s_1, s_2\}, \{x_1, x_2, x_3\})$ with $c(s_1) = c(s_2) = 1$ and the costs of associating elements to bins depicted in Table 1, where $M$ is a large positive value. Let $T = \{x_1, x_2\}$ and $S = \{x_1\} \subseteq T$. We now show that $c(S \cup \{x_3\}) - c(S) < c(T \cup \{x_3\}) - c(T)$, which contradicts submodularity. In fact, notice that $c(S) = 2 - \epsilon$, obtained by covering the element $x_1$ with bin $s_2$. Moreover, $c(T) = 3$, obtained by covering the elements $x_1$ and $x_2$ with bin $s_1$. Finally, $c(S \cup \{x_3\}) = 1 + 1 - \epsilon + \epsilon = 2$ (i.e. $x_1$ and $x_3$ are associated to $s_2$) and $c(T \cup \{x_3\}) = 1 + 1 + 1 + 1 - \epsilon + \epsilon = 4$ (i.e. $x_1$ and $x_3$ are associated to $s_2$, while $x_2$ is associated to $s_1$). We conclude that $c(S \cup \{x_3\}) - c(S) = \epsilon < 1 = c(T \cup \{x_3\}) - c(T)$. Finally, we notice that our setting extends the SFKC problem, since the cost of an element is not fixed as in SFKC, but instead depends on the bin used for covering it.

[Table 1: costs $c(s, x)$ of associating the elements of $X$ to the bins of $S$.]
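The non-submodularity argument can be checked by brute force. The sketch below assumes concrete association costs, since only the resulting covering costs are quoted in the text: $c(s_1, x_1) = c(s_1, x_2) = 1$, $c(s_2, x_1) = 1 - \epsilon$, $c(s_2, x_3) = \epsilon$, and $M$ for the remaining pairs. These entries, the values $\epsilon = 1/10$ and $M = 1000$, and the helper name `min_cover_cost` are illustrative assumptions, chosen so that the four covering costs match those stated above.

```python
from fractions import Fraction
from itertools import combinations

EPS = Fraction(1, 10)  # assumed value for the small epsilon of the example
M = Fraction(1000)     # assumed value for the "large positive value" M

BIN_COST = {"s1": Fraction(1), "s2": Fraction(1)}
# Hypothetical association costs c(s, x), consistent with the costs quoted in the text.
ASSOC_COST = {
    ("s1", "x1"): Fraction(1), ("s1", "x2"): Fraction(1), ("s1", "x3"): M,
    ("s2", "x1"): 1 - EPS, ("s2", "x2"): M, ("s2", "x3"): EPS,
}


def min_cover_cost(elements):
    """Minimum cost of covering `elements`, over all non-empty sets of bins."""
    bins = list(BIN_COST)
    best = None
    for r in range(1, len(bins) + 1):
        for chosen in combinations(bins, r):
            cost = sum(BIN_COST[s] for s in chosen)
            # every element is associated to its cheapest chosen bin
            cost += sum(min(ASSOC_COST[s, x] for s in chosen) for x in elements)
            if best is None or cost < best:
                best = cost
    return best


small, large = {"x1"}, {"x1", "x2"}
gain_small = min_cover_cost(small | {"x3"}) - min_cover_cost(small)
gain_large = min_cover_cost(large | {"x3"}) - min_cover_cost(large)
print(gain_small, gain_large)  # marginal cost of x3 on the smaller and on the larger set
```

Exact rational arithmetic is used so that the comparison between the two marginal costs is not affected by rounding.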
In addition to its theoretical appeal, our setting is motivated by the adaptive seeding problem, an algorithmic challenge arising in influence maximization in social networks [1,22]. In its non-stochastic version, the problem is to select amongst certain accessible nodes in a network, and then select amongst neighbors of those nodes, in order to maximize a global objective function. In particular, given a set $X$ and its neighbors $N(X)$, there is a monotone submodular function defined on $N(X)$, and the goal is to select $t \leq k$ elements in $X$ connected to a set of size at most $k - t$ for which the submodular function has the largest value. Our setting extends this problem since we consider more general costs.

Our results
In Section 3 we present an algorithm that guarantees an approximation factor of $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$ for GBSM. Here, $\alpha$ is the approximation factor of an algorithm used to select a subset of elements whose ratio between marginal increment in the objective function and marginal cost is maximum. We give two polynomial-time algorithms to solve this sub-problem. In particular, in Section 4 we propose an algorithm that gives us $\alpha = 1 - \epsilon$ if the costs satisfy a specific condition. This condition is fulfilled in several relevant cases, including the unitary costs case and the problem of maximizing a monotone submodular function under a knapsack constraint. In Section 5 we propose an algorithm that guarantees $\alpha = 1 - \frac{1}{e} - \epsilon$ for the general case.
The gap between our approximation guarantees and the known inapproximability bounds, i.e. the $1-\frac{1}{e}$ hardness for the MC problem [11] and the $1-\frac{1}{e^{1-\frac{1}{e}}}$ hardness for the non-stochastic adaptive seeding problem with knapsack constraint [21], is $\frac{1}{2}$, unless $P = NP$. In Section 6, we extend our algorithm to a bi-criterion approximation algorithm in which we are allowed to spend an extra budget up to a factor $\beta$. An algorithm gives a $[\rho, \beta]$ bi-criterion approximation for GBSM if it is guaranteed to obtain a solution $(S', X')$ such that $f(X') \geq \rho f(X^*)$ and $c(S', X') \leq \beta k$, where $X^*$ is the optimal solution. We denote by $\beta$ the extra-budget factor we are allowed to use in order to obtain a better approximation factor. Our algorithm guarantees a $\left[\frac{1}{2}\left(1-\frac{1}{e^{\alpha\beta}}\right), \beta\right]$-approximation. If we set $\beta = \frac{1}{\alpha}\ln\left(\frac{1}{2\epsilon}\right)$, the algorithm achieves an approximation factor of $\frac{1}{2}-\epsilon$, for any arbitrarily small $\epsilon > 0$.

Related work
Maximum coverage and submodular set function maximization are important problems. In the literature, besides the works mentioned above, there are many other papers dealing with related issues. For instance, in the maximum coverage problem with group budget constraints, the set $S$ is partitioned into groups, and the goal is to pick $k$ sets from $S$ to maximize the cardinality of their union, with the restriction that at most one set can be picked from each group. In [6], the authors propose a $\frac{1}{2}$-approximation algorithm for this problem, and a smaller constant approximation algorithm for the cost version. In the ground-set-cost budgeted maximum coverage problem, given a budget and a hypergraph where each vertex has a non-negative cost and a non-negative profit, we want to select a set of hyperedges such that the total cost of the covered vertices is at most the budget and the total profit of all covered vertices is maximized. This problem is strictly harder than budgeted maximum coverage. It differs from the budgeted maximum coverage problem in that the costs are associated with the covered vertices instead of the selected hyperedges. In [24], the authors obtain a $\frac{1}{2}\left(1-\frac{1}{\sqrt{e}}\right)$-approximation algorithm for graphs (i.e. sets of size 2) and an FPTAS if the incidence graph of the hypergraph is a forest (i.e. the hypergraph is Berge-acyclic).
Maximizing a submodular set function is another important research topic. In the general version of the problem, we are given a set of elements and a monotone submodular function, and the goal is to find the subset of elements that gives the maximum value, subject to some constraints. The case in which the subset of elements must be an independent set of a matroid over the set of elements has been considered in [3], where the authors show an optimal randomized $\left(1-\frac{1}{e}\right)$-approximation algorithm. A simpler algorithm has been proposed in [13]. The case of $k$ matroid constraints has been considered in [19], where the authors give a $\frac{1}{k+\epsilon}$-approximation. An improved result appeared in [26]. Finally, unconstrained (resp. constrained) general non-monotone submodular maximization has been considered in [2,12] (resp. [25]).
Another related topic is the adaptive seeding problem, in which the aim is to select amongst a set $X$ of nodes of a network, called the core, and then adaptively select amongst the neighbors $N(X)$ of those nodes as they become accessible, in order to maximize a submodular function of the selected nodes in $N(X)$ [1,22]. An approximation algorithm with ratio $\left(1-\frac{1}{e}\right)^2$ has been proposed in [1]. In the adaptive seeding with knapsack constraints problem, nodes in $X$ and in $N(X)$ are associated with a cost and the aim is to maximize the objective function while respecting a budget constraint. In this case, a $\left(1-\frac{1}{e}\right)\left(1-\frac{1}{e^{1-\frac{1}{e}}}\right)$-approximation algorithm is known [21]. In the non-stochastic version of these problems, all the nodes in $N(X)$ become accessible with probability one. Even in this case it is not possible to approximate an optimal solution within a factor greater than $1-\frac{1}{e^{1-\frac{1}{e}}}$, unless $P = NP$. A similar problem, in which the core is made of the whole network and the network can be augmented by adding edges according to a given cost function, has been shown to admit a 0.0878-approximation algorithm [9]. Finally, in [8,10] the authors consider the problem where the core is made of a given set of nodes and the network can be augmented by adding edges incident only to the nodes in the core. In the unit-cost version of the problem, where the cost of adding any edge is constant and equal to 1, the problem is $NP$-hard to approximate within a constant factor greater than $1-(2e)^{-1}$. They provide a greedy approximation algorithm that guarantees an approximation factor of $1-\frac{1}{e}-\epsilon$, where $\epsilon$ is any positive real number. Then, they study the more general problem where the cost of edges is in $[0, 1]$ and propose an algorithm that achieves an approximation guarantee of $1-\frac{1}{e}$ by combining greedy and enumeration techniques.

F. Cellinese, G. D'Angelo, G. Monaco, and Y. Velaj

Preliminaries
We are given a set $X$ of $n$ elements and a set $S$ of $m$ bins. Let us denote the cost of a bin $s \in S$ by $c(s) \in \mathbb{R}_{\geq 0}$. For each bin $s \in S$ and element $x \in X$, we denote by $c(s, x)$ the cost of associating $x$ to $s$. Given a budget $k \in \mathbb{R}_{\geq 0}$ and a monotone submodular function $f : 2^X \to \mathbb{R}_{\geq 0}$, our goal is to find a subset $X'$ of $X$ and a subset $S' \neq \emptyset$ of $S$ such that $c(S', X') = \sum_{s \in S'} c(s) + \sum_{x \in X'} \min_{s \in S'} c(s, x) \leq k$, and $f(X')$ is maximum. We call this problem the Generalized Budgeted submodular set function Maximization problem (GBSM).
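As a small illustration of the cost function $c(S', X')$, the following sketch evaluates it on a toy instance. The dictionary-based representation, the function name `solution_cost`, and the convention that a missing pair $(s, x)$ stands for $c(s, x) = \infty$ are assumptions made here for the example only.

```python
import math


def solution_cost(bins, elements, bin_cost, assoc_cost):
    """c(S', X'): cost of the chosen bins plus, for every chosen element,
    the cheapest association to one of the chosen bins."""
    if not bins:
        raise ValueError("S' must be non-empty")
    total = sum(bin_cost[s] for s in bins)
    for x in elements:
        # a pair (s, x) missing from assoc_cost is treated as cost infinity
        total += min(assoc_cost.get((s, x), math.inf) for s in bins)
    return total


# Toy instance (hypothetical numbers).
bin_cost = {"a": 2.0, "b": 1.0}
assoc_cost = {("a", "x"): 1.0, ("b", "x"): 3.0, ("a", "y"): 0.5}

print(solution_cost({"a"}, {"x", "y"}, bin_cost, assoc_cost))       # 2 + 1 + 0.5
print(solution_cost({"a", "b"}, {"x", "y"}, bin_cost, assoc_cost))  # 3 + 1 + 0.5
```

A solution is feasible exactly when this value is at most the budget $k$; an element that cannot be associated to any chosen bin makes the cost infinite.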
Our problem generalizes several well-known problems. Indeed, by setting $c(s, x) = \infty$, we do not allow the association of element $x$ to bin $s$, while by setting $c(s, x) = 0$ we allow element $x$ to be assigned to bin $s$ at no additional cost. Moreover, we relax the constraints related to the association of elements to bins by setting $c(s) = 0$ for each $s \in S$, and $c(s_1, x) = c(s_2, x)$ for each $s_1, s_2 \in S$ and $x \in X$. By suitably combining these conditions we can capture the following problems: the budgeted maximum coverage problem [18]; the non-stochastic adaptive seeding problem [1] (also with knapsack constraints [21]); and the maximization of a monotone submodular set function subject to a knapsack constraint [23]. Moreover, our cost function is not submodular, and thus the setting considered in [15] does not generalize our model.
Let us consider a partial solution $(S', X')$. Given a set $T \subseteq X \setminus X'$, we denote by $c_{\min}(T)$ the minimum cost of associating the elements in $T$ with a single bin in $S$, considering that the cost of the bins in $S'$ has already been paid. Formally, $c_{\min}(T) = \min_{s \in S}\left\{c_{S'}(s) + \sum_{x \in T} c(s, x)\right\}$, where $c_{S'}(s) = c(s)$ if $s \notin S'$, and $c_{S'}(s) = 0$ if $s \in S'$. We call $c_{\min}(T)$ the marginal cost of $T$ with respect to the partial solution $(S', X')$. We define $s_{\min}(T)$ as the bin $s \in S$ needed to cover $T$ with cost $c_{\min}(T)$. Moreover, we denote by $\bar{c}(T)$ the cost of associating the elements in $T$ to $s_{\min}(T)$, i.e. $\bar{c}(T) = c_{\min}(T) - c_{S'}(s_{\min}(T))$.
The marginal increment of $T \subseteq X$ with respect to the partial solution $(S', X')$ is defined as $f(X' \cup T) - f(X')$. To simplify the notation, we use $g(T) = f(X' \cup T) - f(X')$. In the algorithm in the next section, we will look for subsets of $X$ that maximize the ratio between the marginal increment and the marginal cost with respect to some partial solution. In the following we define a family of subsets of $X$ containing a set that approximates such maximal ratio. Given a partial solution $(S', X')$, we denote by $F$ the family of subsets $T$ of $X$ that can be associated to bins in $S' \cup \{s\}$, for some single bin $s \in S$, with a cost of at most $k$. For $\alpha \leq 1$, a sub-family $L$ of $F$ is an $\alpha$-list with respect to $(S', X')$ if it contains a subset $\bar{T}$ whose ratio between marginal increment and marginal cost is at least $\alpha$ times the optimal such ratio amongst all the subsets in $F$, that is, $\max\left\{\frac{g(T)}{c_{\min}(T)} : T \in L\right\} \geq \alpha \cdot \max\left\{\frac{g(T)}{c_{\min}(T)} : T \in F\right\}$.
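The marginal cost and the ratio used to rank candidate sets can be sketched as follows. The function names, the dictionary-based inputs, and the convention that a missing pair $(s, x)$ means infinite cost are assumptions for illustration; the ratio assumes $c_{\min}(T) > 0$.

```python
import math


def marginal_cost(T, opened, bin_cost, assoc_cost):
    """Return (c_min(T), s_min(T)): the cheapest way to associate all of T
    to a single bin, where bins already in the partial solution cost nothing."""
    best_cost, best_bin = math.inf, None
    for s in bin_cost:
        opening = 0.0 if s in opened else bin_cost[s]
        cost = opening + sum(assoc_cost.get((s, x), math.inf) for x in T)
        if cost < best_cost:
            best_cost, best_bin = cost, s
    return best_cost, best_bin


def ratio(T, chosen, opened, f, bin_cost, assoc_cost):
    """Marginal increment f(X' | T) - f(X') divided by the marginal cost c_min(T)."""
    gain = f(chosen | T) - f(chosen)
    cost, _ = marginal_cost(T, opened, bin_cost, assoc_cost)
    return gain / cost


# Toy instance (hypothetical numbers); f = len counts the selected elements.
bin_cost = {"a": 2.0, "b": 1.0}
assoc_cost = {("a", "x"): 1.0, ("b", "x"): 3.0, ("a", "y"): 0.5, ("b", "y"): 0.5}
T = frozenset({"x", "y"})

print(marginal_cost(T, set(), bin_cost, assoc_cost))  # bin "a" still has to be paid
print(marginal_cost(T, {"a"}, bin_cost, assoc_cost))  # bin "a" already opened
print(ratio(T, frozenset(), {"a"}, len, bin_cost, assoc_cost))
```

The example shows why the best-ratio set need not be a singleton: once a bin has to be opened, adding further cheap elements to the same bin spreads its opening cost.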

Note that the sets that maximize the above ratio are not necessarily singletons, due to the bin opening cost. Moreover, the algorithm given in the next section builds partial solutions $(S', X')$ in such a way that $c_{\min}(T) > 0$ for each $T \in F$.

Greedy Algorithm
In this section we give an algorithm that guarantees a $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$-approximation to the GBSM problem, if we assume that we can compute, in polynomial time, an $\alpha$-list of polynomial size. In the next sections we will give two algorithms to compute such lists for bounded values of $\alpha$.
The pseudo-code is reported in Algorithm 1. In the first step (line 3) we add all zero-cost bins to the solution. Then, the algorithm finds two candidate solutions. The first one is found at lines 4-11 with a greedy strategy as follows. The algorithm iteratively constructs a partial solution $(S', X')$ by adding a subset $\bar{T}$ to $X'$ and a bin $s_{\min}(\bar{T})$ to $S'$. In particular, at each iteration, it first adds all the elements that can be associated to $S'$ with cost 0 (line 5). Then, it selects a subset $\bar{T}$ that maximizes the ratio between the marginal increment and the marginal cost amongst the elements of an $\alpha$-list $L$. Here, we assume that we have an algorithm to compute an $\alpha$-list $L$ w.r.t. $(S', X')$ (see line 6). In the next sections, we will show how to compute $L$ in polynomial time for some bounded $\alpha$. The algorithm stops when adding the set with the maximum ratio would exceed the budget $k$ or when $X' = X$. Without loss of generality, we can assume that at each iteration the sets in the $\alpha$-list $L$ do not contain any element of $X'$, since such elements do not increase the marginal increment and possibly increase the marginal cost. This implies that at each iteration of the greedy procedure at least one new element of $X$ is added to $X'$, and hence the number of iterations is $O(n)$.
Let $(S_G, X_G)$ be the first candidate solution, computed at the end of the greedy procedure. The second candidate solution (lines 12-14) is computed by using the set $\bar{T}$ that is discarded in the last iteration of the greedy procedure because adding $\{s_{\min}(\bar{T})\}$ and $\bar{T}$ to $(S_G, X_G)$ would exceed the budget. Indeed, the second candidate solution is $(S_G \cup \{s_{\min}(\bar{T})\}, \bar{T})$. Note that this solution is feasible because $\bar{T}$ is contained in the $\alpha$-list $L$ computed in the last iteration of the greedy algorithm. Therefore, by definition of $\alpha$-list, $c(S_G \cup \{s_{\min}(\bar{T})\}, \bar{T}) \leq k$.
The algorithm returns one of the two candidate solutions that maximizes the objective function.
The computational complexity of Algorithm 1 is $O(n \cdot (|L_{\max}| + cl))$, where $L_{\max}$ is the largest $\alpha$-list computed and $cl$ is the computational complexity of the algorithm at line 6. In the next sections we will show that our algorithms construct the $\alpha$-lists in such a way that both $|L_{\max}|$ and $cl$ are polynomially bounded in the input size.
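The greedy scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the names `greedy_gbsm`, `c_min`, `total_cost` and `singleton_list` are hypothetical, missing pairs $(s, x)$ are read as infinite cost, and the default candidate list contains only feasible singletons, which is a stand-in that carries no $\alpha$ guarantee.

```python
import math


def c_min(T, opened, bin_cost, assoc):
    """Marginal cost of T and the single bin achieving it (opened bins are free)."""
    options = []
    for s, cost in bin_cost.items():
        opening = 0 if s in opened else cost
        options.append((opening + sum(assoc.get((s, x), math.inf) for x in T), s))
    return min(options)


def total_cost(bins, elems, bin_cost, assoc):
    """c(S', X'): bin costs plus the cheapest association of every element."""
    return sum(bin_cost[s] for s in bins) + sum(
        min(assoc.get((s, x), math.inf) for s in bins) for x in elems)


def singleton_list(X, opened, chosen, bin_cost, assoc, k):
    """Stand-in candidate list: singletons that are feasible on their own."""
    out = []
    for x in X - chosen:
        T = frozenset([x])
        s = c_min(T, opened, bin_cost, assoc)[1]
        if total_cost(opened | {s}, T, bin_cost, assoc) <= k:
            out.append(T)
    return out


def greedy_gbsm(X, bin_cost, assoc, f, k, make_list=singleton_list):
    opened = {s for s, c in bin_cost.items() if c == 0}  # zero-cost bins
    chosen, rejected = set(), None
    while chosen != X:
        # elements that can be associated to an opened bin at cost 0
        chosen |= {x for x in X - chosen
                   if any(assoc.get((s, x), math.inf) == 0 for s in opened)}
        candidates = make_list(X, opened, chosen, bin_cost, assoc, k)
        if not candidates:
            break

        def ratio(T):
            return (f(chosen | T) - f(chosen)) / c_min(T, opened, bin_cost, assoc)[0]

        best = max(candidates, key=ratio)
        s = c_min(best, opened, bin_cost, assoc)[1]
        if total_cost(opened | {s}, chosen | best, bin_cost, assoc) > k:
            rejected = (opened | {s}, set(best))  # second candidate solution
            break
        opened.add(s)
        chosen |= best
    if rejected is not None and f(rejected[1]) > f(chosen):
        return rejected
    return opened, chosen


# Toy instance (hypothetical): one free bin, additive profits.
weights = {"a": 2, "b": 10}
f = lambda T: sum(weights[x] for x in T)
bin_cost = {"s": 0}
assoc = {("s", "a"): 1, ("s", "b"): 10}
print(greedy_gbsm({"a", "b"}, bin_cost, assoc, f, k=10))
```

On this instance the greedy phase takes the high-ratio element `a` and then cannot afford `b`, so the discarded set `{b}` is returned as the better of the two candidates. This is the situation the second candidate solution is designed to handle.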
In what follows we analyze the approximation ratio of Algorithm 1. The proof generalizes known arguments for monotone submodular maximization, see e.g. [7,18,23].
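We give some additional definitions that will be used in the proof. We denote an optimal solution by $(S^*, X^*)$. Let us consider the iterations executed by the greedy algorithm. Let $l + 1$ be the index of the iteration in which an element of the $\alpha$-list is not added to $X'$ because it violates the budget constraint. For $i = 1, 2, \ldots, l$, we define $X_i$ and $S_i$ as the sets $X'$ and $S'$ at the end of the $i$-th iteration of the algorithm, respectively. Moreover, let $X_{l+1} = X_l \cup \bar{T}$ and $S_{l+1} = S_l \cup \{s_{\min}(\bar{T})\}$, where $\bar{T}$ is the set selected at line 7 of Algorithm 1 in iteration $l + 1$. For $i = 1, 2, \ldots, l + 1$, let $c_i$ denote the marginal cost of the set selected at iteration $i$.

Lemma 1. After each iteration $i = 1, 2, \ldots, l + 1$, the following holds: $f(X_i) \geq \left[1 - \prod_{\ell=1}^{i}\left(1 - \frac{\alpha c_\ell}{k}\right)\right] f(X^*)$.

The next lemma will be used in the proof of Lemma 1.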

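Lemma 2. After each iteration $i = 1, 2, \ldots, l + 1$, the following holds: $f(X_i) - f(X_{i-1}) \geq \alpha \frac{c_i}{k}\left(f(X^*) - f(X_{i-1})\right)$.

Proof. Let us consider a partition of the elements in $X^* \setminus X_{i-1}$ according to the bins they are assigned to in the optimal solution $(S^*, X^*)$, that is, the elements of each set $T_j$ in the partition are associated with the same bin in $(S^*, X^*)$. Formally, $X^* \setminus X_{i-1} = T_1 \cup T_2 \cup \ldots \cup T_\ell$ such that for each $j = 1, 2, \ldots, \ell$ and for each $x_1, x_2 \in T_j$, $\arg\min_{s \in S^*}\{c(s, x_1)\} = \arg\min_{s \in S^*}\{c(s, x_2)\}$ and $T_j$ is maximal.

We first show the following: $f(X^*) - f(X_{i-1}) \leq \sum_{j=1}^{\ell}\left(f(X_{i-1} \cup T_j) - f(X_{i-1})\right)$. Indeed, $f(X^*) - f(X_{i-1}) \leq f(X_{i-1} \cup X^*) - f(X_{i-1}) \leq \sum_{j=1}^{\ell}\left(f(X_{i-1} \cup T_j) - f(X_{i-1})\right)$, where the first inequality follows by monotonicity and the last inequality follows by submodularity of $f$. Let us denote by $c^*_i(T_j)$ the marginal cost of adding the elements $T_j$ to solution $(S_{i-1}, X_{i-1})$ using bin $s^*(T_j)$, where $s^*(T_j)$ is the bin in $S^*$ which all the elements of $T_j$ are associated with. By definition of $\alpha$-list and the maximum at line 7 it follows that, for each $j = 1, 2, \ldots, \ell$, $\frac{f(X_i) - f(X_{i-1})}{c_i} \geq \alpha \frac{f(X_{i-1} \cup T_j) - f(X_{i-1})}{c^*_i(T_j)}$. It follows that: $f(X^*) - f(X_{i-1}) \leq \frac{f(X_i) - f(X_{i-1})}{\alpha c_i} \sum_{j=1}^{\ell} c^*_i(T_j) \leq \frac{k}{\alpha c_i}\left(f(X_i) - f(X_{i-1})\right)$, since $\sum_{j=1}^{\ell} c^*_i(T_j) \leq c(S^*, X^*) \leq k$, which implies the statement.

MFCS 2018

Proof of Lemma 1. For $i = 1$, from Lemma 2 it follows that $f(X_1) \geq \alpha \frac{c_1}{k} f(X^*)$. Applying Lemma 2 and the inductive hypothesis we obtain: $f(X_i) \geq \left(1 - \frac{\alpha c_i}{k}\right) f(X_{i-1}) + \frac{\alpha c_i}{k} f(X^*) \geq \left[1 - \prod_{\ell=1}^{i}\left(1 - \frac{\alpha c_\ell}{k}\right)\right] f(X^*)$. Armed with Lemma 1, we can prove Theorem 3.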

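Theorem 3. Algorithm 1 guarantees an approximation factor of $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$.

Proof. We observe that since $(S_{l+1}, X_{l+1})$ violates the budget, then $c(S_{l+1}, X_{l+1}) > k$. Moreover, for a sequence of numbers $a_1, a_2, \ldots, a_n$ such that $\sum_{\ell=1}^{n} a_\ell = A$, the function $1 - \prod_{i=1}^{n}\left(1 - \frac{a_i \cdot \alpha}{A}\right)$ achieves its minimum when $a_i = \frac{A}{n}$, and $1 - \prod_{i=1}^{n}\left(1 - \frac{a_i \cdot \alpha}{A}\right) \geq 1 - \left(1 - \frac{\alpha}{n}\right)^n \geq 1 - e^{-\alpha}$. Therefore, by applying Lemma 1 for $i = l + 1$ and observing that $\sum_{\ell=1}^{l+1} c_\ell = c(S_{l+1}, X_{l+1})$, we obtain: $f(X_{l+1}) \geq \left[1 - \prod_{\ell=1}^{l+1}\left(1 - \frac{\alpha c_\ell}{k}\right)\right] f(X^*) \geq \left[1 - \prod_{\ell=1}^{l+1}\left(1 - \frac{\alpha c_\ell}{c(S_{l+1}, X_{l+1})}\right)\right] f(X^*) \geq \left(1 - \frac{1}{e^\alpha}\right) f(X^*)$. Since, by submodularity, $f(X_{l+1}) \leq f(X_l) + f(\bar{T})$, where $\bar{T}$ is the set selected at iteration $l + 1$, we get $\max\{f(X_l), f(\bar{T})\} \geq \frac{1}{2}\left(1 - \frac{1}{e^\alpha}\right) f(X^*)$. The theorem follows by observing that $\bar{T}$ is the set selected as the second candidate solution at lines 12-14 of Algorithm 1.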
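The numerical inequality used in the proof, $1 - \prod_{i}\left(1 - \frac{a_i \alpha}{A}\right) \geq 1 - \left(1 - \frac{\alpha}{n}\right)^n \geq 1 - e^{-\alpha}$, can be sanity-checked on a concrete sequence. The sketch below is only a spot check on sample values chosen here (the sequence `parts` and $\alpha = 0.6$ are arbitrary assumptions), not a proof.

```python
import math


def product_bound(parts, alpha):
    """1 - prod_i (1 - a_i * alpha / A), with A the sum of the a_i."""
    total = sum(parts)
    return 1 - math.prod(1 - a * alpha / total for a in parts)


def uniform_bound(n, alpha):
    """Value of the same expression when all a_i are equal to A / n."""
    return 1 - (1 - alpha / n) ** n


parts, alpha = [1, 2, 3, 4], 0.6
print(product_bound(parts, alpha))      # uneven split of the total cost
print(uniform_bound(len(parts), alpha)) # even split: the minimizing case
print(1 - math.exp(-alpha))             # limit as n grows
```

The three printed values are non-increasing in that order, matching the chain of inequalities in the proof.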

Computing an α-list for a particular case
In this section, we give a polynomial time algorithm to find a $(1 - \epsilon)$-list with respect to a partial solution $(S', X')$ for the particular case in which, for a given parameter $\epsilon \in (0, 1)$, the following condition holds:

$$\sum_{x \in T} c(s, x) \geq \frac{1}{\epsilon} c(s), \qquad (4)$$

for each $s \in S$ and for each $T \subseteq X$ such that $|T| = \frac{1}{\epsilon}$. We observe that this condition is fulfilled for any $\epsilon \in (0, 1)$ in the case in which $c(s) = 1$ and $c(s, x) \geq 1$, for each $s \in S$ and for each $x \in X$, which generalizes the non-stochastic adaptive seeding problem [1]. Indeed, in this case $\sum_{x \in T} c(s, x) \geq |T| = \frac{1}{\epsilon} c(s)$, for each $s \in S$ and for each $T \subseteq X$ such that $|T| = \frac{1}{\epsilon}$. We give a simple algorithm that returns a $(1 - \epsilon)$-list with respect to a partial solution $(S', X')$. The algorithm works as follows: build a list which contains all the subsets $T$ of $X \setminus X'$ such that $|T| \leq \frac{1}{\epsilon}$ and $c(S' \cup \{s_{\min}(T)\}, T) \leq k$.
Plugging this algorithm into line 6 of Algorithm 1, we can guarantee an approximation factor of $\frac{1}{2}\left(1-\frac{1}{e}\right) - \epsilon'$, where $\epsilon' = \frac{1}{2e}\left(e^{\epsilon} - 1\right)$, for GBSM. We observe that the case in this section contains the problem of maximizing a submodular set function under a knapsack constraint as a special case. Indeed, it is enough to set $c(s) = 0$ for each $s \in S$, and $c(s_1, x) = c(s_2, x)$ for each $s_1, s_2 \in S$ and $x \in X$. Note that in this case Condition (4) is satisfied for any $\epsilon \in (0, 1)$. A special case of submodular set function maximization is the maximum coverage problem, and since the latter is $NP$-hard to approximate within a factor greater than $1-\frac{1}{e}$ [11], the gap between the approximation factor of our algorithm and the best achievable one in polynomial time is $\frac{1}{2}$, unless $P = NP$.
It is easy to see that the computational complexity required by the algorithm in this section is $O(n^{\frac{1}{\epsilon}})$ and that $|L_{\max}| = O(n^{\frac{1}{\epsilon}})$.
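The enumeration behind this list can be sketched as follows. The function name `feasible_small_subsets`, the dictionary-based inputs, and the parameter `inv_eps` (standing for $\frac{1}{\epsilon}$, assumed here to be an integer) are illustrative assumptions; missing pairs $(s, x)$ are read as infinite cost.

```python
import math
from itertools import combinations


def feasible_small_subsets(X, chosen, opened, bin_cost, assoc, k, inv_eps):
    """All subsets T of the unchosen elements with |T| <= 1/eps whose cost
    together with the opened bins and s_min(T) is at most k."""
    candidates = []
    free = sorted(X - chosen)
    for size in range(1, inv_eps + 1):
        for combo in combinations(free, size):
            # s_min(T): cheapest single bin for T, already-opened bins cost nothing
            s_min = min(
                bin_cost,
                key=lambda s: (0 if s in opened else bin_cost[s])
                + sum(assoc.get((s, x), math.inf) for x in combo),
            )
            bins = set(opened) | {s_min}
            cost = sum(bin_cost[b] for b in bins) + sum(
                min(assoc.get((b, x), math.inf) for b in bins) for x in combo
            )
            if cost <= k:
                candidates.append(frozenset(combo))
    return candidates


# Toy instance (hypothetical): one bin of cost 1, unit association costs.
X = {"a", "b", "c", "d"}
bin_cost = {"s": 1}
assoc = {("s", x): 1 for x in X}
print(len(feasible_small_subsets(X, set(), set(), bin_cost, assoc, k=3, inv_eps=2)))
```

With $n$ unchosen elements the loop visits $\sum_{i \leq 1/\epsilon} \binom{n}{i}$ subsets, which is the source of the $O(n^{\frac{1}{\epsilon}})$ bound.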
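In what follows, we assume that any set $T^*$ that maximizes the ratio between marginal increment and marginal cost has size greater than $\frac{1}{\epsilon}$, as otherwise the $\alpha$-list returned by our algorithm would contain such a set. The following two technical lemmata will be used in the analysis of the algorithm.

Lemma 4. If $f$ is monotone and submodular, then $g(T) = f(X' \cup T) - f(X')$ is monotone and submodular.

Proof. It is easy to prove that $g$ is monotone. We show that for each pair of sets $T, S$ such that $T \subseteq S \subseteq (X \setminus X')$ and for each $x \in X \setminus (X' \cup S)$, the increment in the value of $g$ that element $x$ causes in $S \cup \{x\}$ is at most the increment it produces in $T \cup \{x\}$, that is, $g(S \cup \{x\}) - g(S) \leq g(T \cup \{x\}) - g(T)$. By applying the definition we have: $f(X' \cup S \cup \{x\}) - f(X') - \left(f(X' \cup S) - f(X')\right) \leq f(X' \cup T \cup \{x\}) - f(X') - \left(f(X' \cup T) - f(X')\right)$, which is equivalent to: $f(X' \cup S \cup \{x\}) - f(X' \cup S) \leq f(X' \cup T \cup \{x\}) - f(X' \cup T)$. The statement follows because $f$ is submodular.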
[...] We define $T' = T \setminus \{t'\}$, i.e. we remove from $T$ an element with the maximum cost. We obtain $f(T') =$ [...]

The next theorem shows the approximation ratio of the algorithm. The main idea is to consider the subset $\bar{T}$ that maximizes the ratio between the marginal increment and the marginal cost in $L$, and to derive a series of inequalities showing that this ratio is at least $(1 - \epsilon)$ times the ratio of the optimal subset $T^*$. We first compare the ratio computed for $\bar{T}$ with that for $T^*_{\frac{1}{\epsilon}}$, a subset of cardinality $\frac{1}{\epsilon}$ of maximal ratio. Then, by rewriting the marginal cost according to its definition and by exploiting Lemmata 4 and 5 and Condition (4), we compare this ratio to that of the subset $T^*$; this last inequality concludes the proof.

Theorem 6. If for each $T \subseteq X$ such that $|T| = \frac{1}{\epsilon}$ and for each $s \in S$ we have $\sum_{x \in T} c(s, x) \geq \frac{1}{\epsilon} c(s)$, then the list $L$ made of all the subsets of $X \setminus X'$ of size at most $\frac{1}{\epsilon}$ and cost at most $k$ is a $(1 - \epsilon)$-list.
Proof. We recall that $g(T) = f(X' \cup T) - f(X')$. Given a subset $T$ of $X \setminus X'$, we denote by $T_y$ a subset of $T$ such that $|T_y| = y$ and $\frac{f(T_y)}{\bar{c}(T_y)}$ is maximum. Let $T^*$ be the subset of $X \setminus X'$ that maximizes the ratio between the marginal increment and the marginal cost. Let $\bar{T}$ be the element of $L$ that maximizes $\frac{g(T)}{c_{\min}(T)}$. Since $|\bar{T}| \leq \frac{1}{\epsilon}$, then [...] By the hypothesis of the theorem, $\bar{c}\left(T^*_{\frac{1}{\epsilon}}\right)$ [...] Since $f$ is monotone and submodular, then, by Lemma 4, also $g$ is submodular. By Lemma 5 it follows that [...]

We now focus on the denominator, and we obtain that: [...] By applying the hypothesis $\bar{c}(T^*) \geq \frac{1}{\epsilon} c(s_{\min}(T^*))$, it follows that: [...] To conclude: [...]

Computing an α-list for the general case
In this section we give a polynomial time algorithm that builds a $\left(1-\frac{1}{e}\right)(1-\epsilon)$-list with respect to a partial solution $(S', X')$, for any $\epsilon \in (0, 1)$. Using this algorithm as a routine at line 6 of Algorithm 1, we can guarantee an approximation factor of $\frac{1}{2}\left(1-\frac{1}{e^{\alpha}}\right)$ with $\alpha = \left(1-\frac{1}{e}\right)(1-\epsilon)$ for GBSM. We observe that this case generalizes the non-stochastic adaptive seeding with knapsack constraints problem, which cannot be approximated within a factor greater than $1-\frac{1}{e^{1-\frac{1}{e}}}$, unless $P = NP$ [21]. Then, the gap between the approximation factor of our algorithm and the best achievable one in polynomial time is $\frac{1}{2}$, unless $P = NP$.

In the algorithm of this section we make use of a procedure called GreedyMaxCover to maximize the value of a monotone submodular function $g : 2^X \to \mathbb{R}_{\geq 0}$, given a certain budget and costs associated to the elements of $X$. It is well-known that there exists a polynomial-time procedure that guarantees a $\left(1-\frac{1}{e}\right)$-approximation for this problem [23]. Let us denote by $\hat{c}$ the minimum possible positive value of functions $c(s)$ and $c(s, x)$, amongst all elements $x$ and bins $s$, i.e. $\hat{c} = \min\{\min\{c(s) : s \in S, c(s) > 0\}, \min\{c(s, x) : s \in S, x \in X, c(s, x) > 0\}\}$. The main idea is to build an $\alpha$-list $L$ which contains approximate solutions to the problem of maximizing a monotone submodular set function subject to a knapsack constraint, in which the budget increases by a factor $1 + \epsilon$ starting from $\hat{c}$, and the costs of the elements are given by the cost of associating them to a single bin. In particular, we consider $q = \log_{1+\epsilon}\frac{k}{\hat{c}} + 1$ different budgets $B_i$ that iteratively increase by a factor $1 + \epsilon$, i.e. $B_0 = \hat{c}$ and $B_i = (1 + \epsilon) B_{i-1}$, for $i = 1, \ldots, q$. Moreover we define $B_{q+1} = k$. For each $i = 0, \ldots, q + 1$ and for each bin $s \in S$, we apply procedure GreedyMaxCover with ground set $X$, budget $B_i$, and the cost of associating the elements to bin $s$ as cost function. Then, we add the set returned by GreedyMaxCover to $L$.
In this way we consider a budget that is at most a factor $1 + \epsilon$ greater than the cost of an optimal solution, and the solution returned by GreedyMaxCover for this budget has a value that is at least $1 - \frac{1}{e}$ times that of the optimal solution. The pseudo-code of the algorithm is reported in Algorithm 2. The outer cycle at lines 2-11 iteratively selects a bin $s$ in $S$ and finds a list of sets of elements assigned to bin $s$. The inner cycle at lines 4-8, at each iteration $i$, calls procedure GreedyMaxCover, which uses $g$ as the function to maximize, $\hat{c}(1 + \epsilon)^i - c_{S'}(s)$ as budget, and the cost of associating the elements to bin $s$ as cost function (to compute these costs, we only pass $s$ as a parameter to GreedyMaxCover). The budget is increased by a factor $(1 + \epsilon)$ until $\hat{c}(1 + \epsilon)^i \geq k$. Finally the algorithm runs GreedyMaxCover with the full budget $k$. See Figure 1 for an illustration.
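The budget-scaling idea can be sketched as follows. This is an illustration under explicit assumptions and not a transcription of Algorithm 2: `density_greedy` is a simple gain-per-cost greedy used as a stand-in for GreedyMaxCover (on its own it does not carry the $1-\frac{1}{e}$ guarantee of the procedure in [23]), all association costs are assumed positive, and the names `budget_grid` and `build_candidate_list` are hypothetical.

```python
def budget_grid(c_hat, k, eps):
    """Budgets c_hat, c_hat(1+eps), c_hat(1+eps)^2, ... below k, followed by k."""
    budgets, b = [], c_hat
    while b < k:
        budgets.append(b)
        b *= 1 + eps
    budgets.append(k)
    return budgets


def density_greedy(cost, g, budget):
    """Stand-in for GreedyMaxCover: repeatedly add the affordable element
    with the largest marginal gain of g per unit of cost."""
    picked, spent = set(), 0.0
    while True:
        best, best_density = None, 0.0
        for x in cost:
            if x in picked or spent + cost[x] > budget:
                continue
            density = (g(picked | {x}) - g(picked)) / cost[x]
            if density > best_density:
                best, best_density = x, density
        if best is None:
            return picked
        picked.add(best)
        spent += cost[best]


def build_candidate_list(bin_cost, assoc, opened, unchosen, g, k, eps, c_hat):
    """For every bin and every budget of the grid, run the greedy with the
    budget reduced by the bin's opening cost and record the returned set."""
    L = []
    for s in bin_cost:
        opening = 0 if s in opened else bin_cost[s]
        cost = {x: assoc[s, x] for x in unchosen if (s, x) in assoc}
        for B in budget_grid(c_hat, k, eps):
            if B > opening:
                T = density_greedy(cost, g, B - opening)
                if T:
                    L.append((s, B, frozenset(T)))
    return L


# Toy instance (hypothetical): one bin of cost 1, additive gains.
w = {"a": 3, "b": 4, "c": 4}
g = lambda T: sum(w[x] for x in T)
assoc = {("s", "a"): 1, ("s", "b"): 2, ("s", "c"): 4}
L = build_candidate_list({"s": 1}, assoc, set(), {"a", "b", "c"}, g, k=8, eps=0.5, c_hat=1)
for s, B, T in L:
    print(s, B, sorted(T))
```

Because consecutive budgets differ by a factor $1 + \epsilon$, the number of greedy calls per bin grows only logarithmically in $\frac{k}{\hat{c}}$.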
We call $q$ the value of $i$ at the end of the last iteration in the inner cycle of the algorithm. Let $T_j$ be the set in $L$ that maximizes the ratio between $g(T_j)$ and its assigned budget, that is:

$$\frac{g(T_j)}{B_j} = \max\left\{\frac{g(T_i(s))}{B_i} : s \in S,\ i = 0, \ldots, q + 1\right\}, \qquad (5)$$

where $T_i(s)$ denotes the set returned by GreedyMaxCover for bin $s$ and budget $B_i$. In order to bound the approximation ratio, we consider $\bar{X}^*$ as the set with the optimal ratio $\frac{g(\bar{X}^*)}{c_{\min}(\bar{X}^*)}$ amongst all possible subsets of items. Let $B_l$ be the smallest value of $B_i$, for $i \in \{0, 1, \ldots, q + 1\}$, that is greater than or equal to the cost of an optimal solution, that is, the smallest $B_l$ such that $B_l \geq c_{\min}(\bar{X}^*)$. See Figure 2 for an illustration. We call $T^*_l$ the set in $L$ that has the highest ratio $\frac{g(T_l)}{B_l}$ amongst those computed by GreedyMaxCover with budget $B_l$, i.e. $T^*_l = \arg\max\left\{\frac{g(T_l(s))}{B_l} : s \in S\right\}$. We also denote by $X^*_l$ the set that maximizes $g(X^*_l)$ with budget $B_l$. The idea of the approximation analysis is that an optimal solution $\bar{X}^*$ has a value of $g$ that is at most $g(X^*_l)$ and a cost that is at most $1 + \epsilon$ times smaller than $B_l$, while the number of iterations remains polynomial since the size of the intervals grows exponentially. The next two theorems show the bounds on approximation ratio, computational complexity and size of $L$.
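Proof. Since, by construction, $c_{\min}(T_j) \leq B_j$, and, by Equation 5, $\frac{g(T_j)}{B_j}$ is maximum, then $\frac{g(T_j)}{c_{\min}(T_j)} \geq \frac{g(T_j)}{B_j} \geq \frac{g(T^*_l)}{B_l}$. Procedure GreedyMaxCover guarantees a $\left(1-\frac{1}{e}\right)$-approximation, then $g(T^*_l) \geq \left(1-\frac{1}{e}\right) g(X^*_l)$. Moreover, since function $g$ is monotone and $c_{\min}(\bar{X}^*) \leq B_l$, then $g(X^*_l) \geq g(\bar{X}^*)$, and therefore: $\frac{g(T_j)}{c_{\min}(T_j)} \geq \left(1-\frac{1}{e}\right)\frac{g(\bar{X}^*)}{B_l}$. We defined $B_l$ as the smallest value of $B_i$ that is at least $c_{\min}(\bar{X}^*)$; this implies that $B_{l-1} \leq c_{\min}(\bar{X}^*)$. Moreover the ratio between $B_l$ and $B_{l-1}$ is $1 + \epsilon$. It follows that $B_l \leq (1 + \epsilon) c_{\min}(\bar{X}^*)$, which implies: $\frac{g(T_j)}{c_{\min}(T_j)} \geq \left(1-\frac{1}{e}\right)\frac{1}{1+\epsilon}\frac{g(\bar{X}^*)}{c_{\min}(\bar{X}^*)} \geq \left(1-\frac{1}{e}\right)(1-\epsilon)\frac{g(\bar{X}^*)}{c_{\min}(\bar{X}^*)}$. The last inequality holds since $\frac{1}{1+\epsilon} = 1 - \frac{\epsilon}{1+\epsilon} \geq 1 - \epsilon$, for any $\epsilon > 0$, and this concludes the proof.

Proof. The outer for cycle requires $m$ iterations. We now bound the number $q$ of iterations of the inner cycle of the algorithm. By the exit condition of the cycle, we have $\hat{c} \cdot (1 + \epsilon)^q < k$, which is equivalent to $q < \log_{1+\epsilon}\frac{k}{\hat{c}}$. Since for $\epsilon < 1$, $\log_{1+\epsilon}\frac{k}{\hat{c}} = O\left(\frac{1}{\epsilon}\log\frac{k}{\hat{c}}\right)$, the statement follows.

We observe that $O\left(\log\frac{k}{\hat{c}}\right)$ is polynomially bounded in the size of the input.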

Bi-criterion approximation algorithm
In this section we extend the results given in Section 3, providing a bi-criterion approximation algorithm that guarantees a $\frac{1}{2}\left(1-\frac{1}{e^{\alpha\beta}}\right)$-approximation to the GBSM problem if we allow an extra budget up to a factor $\beta \geq 1$. We notice that, if $\beta = 1$, i.e. we do not increase the budget, the approximation factor is $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$, while if $\beta = \frac{1}{\alpha}\ln\left(\frac{1}{2\epsilon}\right)$ the algorithm achieves an approximation factor of $\frac{1}{2}-\epsilon$, for any arbitrarily small $\epsilon > 0$. The algorithm is slightly different from Algorithm 1 and it is reported in Algorithm 3. In this algorithm, we allow the given budget $k$ to be exceeded by a factor $\beta$. In particular we modify lines 8 and 11, admitting a greater budget with respect to Algorithm 1.
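The choice of $\beta$ can be verified by direct substitution. The short derivation below uses only the stated guarantee; the remark on the admissible range of $\epsilon$ follows from requiring $\beta \geq 1$ and is added here as an observation.

```latex
\begin{align*}
\frac{1}{2}\left(1 - \frac{1}{e^{\alpha\beta}}\right)
  &= \frac{1}{2}\left(1 - e^{-\ln\frac{1}{2\epsilon}}\right)
  && \text{since } \alpha\beta = \ln\tfrac{1}{2\epsilon} \\
  &= \frac{1}{2}\left(1 - 2\epsilon\right)
   = \frac{1}{2} - \epsilon , \\
\beta \geq 1
  &\iff \ln\tfrac{1}{2\epsilon} \geq \alpha
   \iff \epsilon \leq \frac{1}{2e^{\alpha}} .
\end{align*}
```

For instance, with $\alpha = 1$ and $\epsilon = 0.01$ the required budget factor is $\beta = \ln 50 \approx 3.91$.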
In the next theorem we show the approximation ratio of this algorithm.
Proof. We observe that since $(S_{l+1}, X_{l+1})$ violates the budget, then $c(S_{l+1}, X_{l+1}) > \beta k$. Moreover, for a sequence of numbers $a_1, a_2, \ldots, a_n$ such that $\sum_{\ell=1}^{n} a_\ell = A$, the function $1 - \prod_{i=1}^{n}\left(1 - \frac{a_i \cdot \alpha}{A}\right)$ achieves its minimum when $a_i = \frac{A}{n}$, and $1 - \prod_{i=1}^{n}\left(1 - \frac{a_i \cdot \alpha}{A}\right) \geq 1 - \left(1 - \frac{\alpha}{n}\right)^n \geq 1 - e^{-\alpha}$. Therefore, by applying Lemma 1 for $i = l + 1$ and observing that [...]

[...] $\frac{1}{2}\left(1-\frac{1}{e^\alpha}\right)$. One possibility to get rid of the $\frac{1}{2}$ factor could be to use the partial enumeration technique exploited in specific subproblems (e.g. the budgeted maximum coverage problem [18] and the monotone submodular set function subject to a knapsack constraint maximization problem [23]). However, this requires that each greedy step selects a single element of $X$ to be added to a partial solution $X'$, while our greedy algorithm selects a subset of $X \setminus X'$ that maximizes the ratio between its marginal increment in the objective function and its marginal cost. Note that this set can contain more than one element in order to ensure that the ratio is non-increasing at each iteration of the greedy algorithm, which is needed to apply the analysis in [18] and [23].
Other research directions, that deserve further investigation, include the study of the GBSM considering different cost functions and also different objective functions where the profit given by an element x depends on the bin s which it is associated with. It would be interesting also to analyse GBSM in the case that each bin s ∈ S has its own budget k to use in order to maximize the objective function.