Improved Approximation Algorithms for Matroid and Knapsack Median Problems and Applications

We consider the matroid median problem [Krishnaswamy et al. 2011], wherein we are given a set of facilities with opening costs and a matroid on the facility-set, and clients with demands and connection costs, and we seek to open an independent set of facilities and assign clients to open facilities so as to minimize the sum of the facility-opening and client-connection costs. We give a simple 8-approximation algorithm for this problem based on LP-rounding, which improves upon the 16-approximation in Krishnaswamy et al. [2011]. We illustrate the power and versatility of our techniques by deriving (a) an 8-approximation for the two-matroid median problem, a generalization of matroid median that we introduce involving two matroids; and (b) a 24-approximation algorithm for matroid median with penalties, which is a vast improvement over the 360-approximation obtained in Krishnaswamy et al. [2011]. We show that a variety of seemingly disparate facility-location problems considered in the literature—data placement problem, mobile facility location, k-median forest, metric uniform minimum-latency Uncapacitated Facility Location (UFL)—in fact reduce to the matroid median or two-matroid median problems, and thus obtain improved approximation guarantees for all these problems. Our techniques also yield an improvement for the knapsack median problem.


Introduction
We consider the matroid median problem, which is defined as follows. As in the uncapacitated facility location problem, we are given a set of facilities $\mathcal{F}$ and a set of clients $\mathcal{D}$. Each facility $i$ has an opening cost of $f_i$. Each client $j \in \mathcal{D}$ has demand $d_j$, and assigning client $j$ to facility $i$ incurs an assignment cost of $d_j c_{ij}$, proportional to the distance between $i$ and $j$. Further, we are given a matroid $M = (\mathcal{F}, \mathcal{I})$ on the set of facilities. The goal is to choose a set $F \in \mathcal{I}$ of facilities to open, that is, an independent set in $M$, and assign each client $j$ to a facility $i(j) \in F$ so as to minimize the total facility-opening and client-assignment costs, that is, $\sum_{i \in F} f_i + \sum_{j \in \mathcal{D}} d_j c_{i(j)j}$. We assume that the facilities and clients are located in a common metric space, so the distances $c_{ij}$ form a metric.
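To make the objective concrete, the following is a minimal sketch that evaluates the cost of a given set of open facilities, assigning each client to its nearest open facility. The function names, the dictionary-based inputs, and the independence oracle `is_independent` are illustrative assumptions, not notation from the paper; the uniform matroid of rank k is used only as a familiar example of an oracle.

```python
def matroid_median_cost(open_facilities, is_independent, opening_cost, demand, dist):
    """Cost of opening `open_facilities` and serving every client from its nearest open facility.

    open_facilities: set of facility ids (the set F in the text)
    is_independent:  callable(set) -> bool, an independence oracle for the matroid M (assumed)
    opening_cost:    dict facility -> f_i
    demand:          dict client -> d_j
    dist:            dict (facility, client) -> c_ij
    """
    if not open_facilities:
        raise ValueError("at least one facility must be open")
    if not is_independent(open_facilities):
        raise ValueError("the open set is not independent in the matroid")
    total = sum(opening_cost[i] for i in open_facilities)
    for j, d_j in demand.items():
        # For a fixed open set, the best assignment sends j to its nearest open facility.
        total += d_j * min(dist[i, j] for i in open_facilities)
    return total


def uniform_matroid(k):
    """Independence oracle of the uniform matroid of rank k: any set of size at most k."""
    return lambda s: len(s) <= k
```

With `uniform_matroid(k)` and all opening costs set to zero, this is the k-median objective.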
The matroid median problem is a generalization of the metric k-median problem, which is the special case where $M$ is a uniform matroid (and there are no facility-opening costs), and is thus NP-hard. The matroid median problem without facility-opening costs was introduced recently by Krishnaswamy et al. [11], who gave a 16-approximation algorithm for this problem. We devise an improved 8-approximation algorithm for this problem (Section 3). Notably, our algorithm is also significantly simpler and cleaner than the one in [11]. The effectiveness of our simpler approach for matroid median is further highlighted when we consider the matroid median problem with penalties, which is the generalization of matroid median where we are allowed to not assign a client to an open facility at the expense of incurring a certain penalty for each such unassigned client. We leverage the techniques underlying our algorithm for matroid median to devise a 24-approximation algorithm (Section 4.1), which is a vast improvement over the approximation ratio of 360 obtained by Krishnaswamy et al. [11].
Our improvement comes from an improved and simpler rounding procedure for a natural LP relaxation of the problem also considered in [11]. We show that a clustering step introduced in [5] for the k-median problem, coupled with two applications of the integrality of the intersection of two submodular (or matroid) polyhedra (one to obtain a half-integral solution, and another to obtain an integral solution), suffices to obtain the desired approximation ratio. In contrast, the algorithm in [11] starts off with the clustering step in [5], but then further dovetails the rounding procedure of [5], creating trees and then stars, and then applies the integrality of the intersection of two submodular polyhedra.
There is a great deal of similarity between the rounding algorithm of [11] for matroid median and the rounding algorithm of Baev and Rajaraman [2] for the data placement problem, who also perform the initial clustering step in [5] and then create trees and then stars and use these to obtain an integral solution. In contrast, our simpler, improved rounding algorithm is similar to the rounding algorithm in [3] for data placement, which uses the initial clustering step of [5] coupled with two min-cost flow computations (one to obtain a half-integral solution and another to obtain an integral solution) to obtain the final solution. These similarities are not surprising since, as we show in Section 5, the data-placement problem is a special case of the matroid median problem. In fact, our improvements are inspired by, and analogous to, those obtained for the data-placement problem by Baev, Rajaraman, and Swamy [3] over the guarantees in [2], and stem from similar insights.
A common theme to emerge from our work (and [3]) is that in various settings, the initial clustering step introduced by [5] imparts sufficient structure to the fractional solution so that one can then round it using two applications of suitable integrality results from combinatorial optimization. First, this initial clustering can be used to derive a half-integral solution. This was observed explicitly in [2] and is implicit in [11], and making this explicit yields significant dividends. Second, and this is the oft-overlooked insight (in [2,11]), a half-integral solution can be easily rounded, and in a better way, without resorting to creating trees and then stars etc. as in the algorithm of [5]. This is due to the fact that a half-integral solution is already "filtered": if client $j$ is assigned to facility $i$ fractionally, then one can bound $c_{ij}$ in terms of the assignment cost paid by the fractional solution for $j$ (see Section 3). This enables one to use a standard facility-location clustering step to set up a suitable combinatorial-optimization problem possessing an integrality property, and hence, round the half-integral solution. The resulting algorithm is typically both simpler and has a better approximation ratio than what one would obtain by mimicking the steps of [5] involving creating trees, stars etc.
We consider some extensions of matroid median in Section 4, including matroid median with penalties and matroid median with two matroids, and show that our techniques readily extend to these problems and yield the same guarantees. We show in Section 5 that a variety of facility location problems that have been considered in the literature (the data placement problem [2,3], mobile facility location [9,1], k-median forest [10], metric uniform minimum-latency UFL [4]) can be cast as instances of matroid median or the extensions we consider in Section 4. This not only gives a unified framework for viewing these seemingly disparate problems, but also yields improved, and in some cases, the first, approximation guarantees for all these problems. We consider this to be one of our contributions. We conclude in Section 6 by showing that our techniques also yield an improvement for the knapsack median problem [11,12].
Recently, Charikar and Li [6] obtained a 9-approximation algorithm for the matroid-median problem; our results were obtained independently. While there is some similarity between our ideas and those in [6], we feel that our algorithm and analysis provide a more illuminating explanation of why matroid median and some of its extensions (e.g., two-matroid median, matroid median with penalties; see Section 4) are "easy" to approximate, whereas other variants such as matroid-intersection median (Section 4) are inapproximable. It is possible that our ideas coupled with the dependent-rounding procedure used in [6] for the k-median problem may lead to further improvements for the matroid median problem; we leave this as future work.

An LP relaxation
We can express the matroid median problem as an integer program and relax the integrality constraints to get a linear program. Throughout we use $i$ to index the facilities in $\mathcal{F}$ and $j$ to index the clients in $\mathcal{D}$. Let $r$ denote the rank function of the matroid $M = (\mathcal{F}, \mathcal{I})$.
Variable $y_i$ indicates if facility $i$ is open, and $x_{ij}$ indicates if client $j$ is assigned to facility $i$. The first and second constraints say that each client must be assigned to an open facility. The third constraint encodes the matroid independence constraint. An integer solution corresponds exactly to a solution to our problem. We note that (P) can be solved in polytime since one can provide an efficient separation oracle for it using, for example, a polytime algorithm for minimization of submodular functions.
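For reference, a relaxation matching the description above can be written as follows. This is a reconstruction from the stated roles of the three constraint families and of the variables; the equation tags (1) and (2) are an assumption based on how later sections refer to the assignment constraint and the matroid constraints.

```latex
\begin{align*}
\text{(P)}\qquad \min \quad & \sum_{i} f_i y_i + \sum_{j} d_j \sum_{i} c_{ij} x_{ij} \\
\text{s.t.} \quad & \sum_{i} x_{ij} \ge 1 && \forall j \in \mathcal{D}, \tag{1} \\
& x_{ij} \le y_i && \forall i \in \mathcal{F},\ j \in \mathcal{D}, \\
& \sum_{i \in S} y_i \le r(S) && \forall S \subseteq \mathcal{F}, \tag{2} \\
& x, y \ge 0.
\end{align*}
```

The rank constraints are exponentially many, which is why a separation oracle based on submodular-function minimization is needed to solve the LP.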

A simple 8-approximation algorithm via LP-rounding
Let $(x, y)$ denote an optimal solution to (P) and $OPT$ be its value. We first describe a simple algorithm to round $(x, y)$ to an integer solution losing a factor of at most 10. In Section 3.4, we use some additional insights to tweak this rounding procedure and improve the approximation ratio to 8. We use the terms connection cost and assignment cost interchangeably. We may assume that $\sum_i x_{ij} = 1$ for every client $j$.

Overview of the algorithm
We first give a high-level description of the algorithm. Suppose for a moment that the optimal solution $(x, y)$ satisfies the following property: for every facility $i$, there is at most one client $j$ such that $x_{ij} > 0$. (*)
Let $F_j = \{i : x_{ij} > 0\}$. Notice that the $F_j$ sets are disjoint. We may assume that for $i \in F_j$, we have $y_i = x_{ij}$, so the objective function is a linear function of only the $y_i$ variables. We can then set up the following matroid intersection problem. The first matroid is $M$ restricted to $\bigcup_j F_j$. The second matroid $M'$ (on the same ground set $\bigcup_j F_j$) is the partition matroid defined by the $F_j$ sets; that is, a set is independent in $M'$ if it contains at most one facility from each $F_j$. Notice the $y_i$-variables yield a fractional point in the intersection of the matroid polyhedron of $M$ and the matroid-base polyhedron of $M'$. Since the intersection of these two polyhedra is known to be integral (see, e.g., [8]), this means that we can round $(x, y)$ to an integer solution of no greater cost.
Of course, the LP solution need not have property (*), so our goal will be to transform $(x, y)$ to a solution that has this property without increasing the cost by much. Roughly speaking we want to do the following: cluster the clients in $\mathcal{D}$ around certain "centers" (also clients) such that (a) every client $k$ is assigned to a "nearby" cluster center $j$ whose LP assignment cost is less than that of $k$, and (b) the facilities serving the cluster centers in the fractional solution $(x, y)$ are disjoint. So, the modified instance where the demand of a client is moved to the center of its cluster has a fractional solution, namely the solution induced by $(x, y)$, that satisfies (*) and has cost at most $OPT$. Furthermore, given a solution to the modified instance we can obtain a solution to the original instance losing a small additive factor.
One option is to use the decomposition method of Shmoys et al. [13] for uncapacitated facility location (UFL) that produces precisely such a clustering. The problem however is that [13] uses filtering, which involves blowing up the $x_{ij}$ and $y_i$ values, thus violating the matroid-independence packing constraints. Chudak and Shmoys [7] use the same clustering idea but without filtering, using the dual solution to bound the cost. The difficulty here in bounding the cost using the dual solution is that there are terms with negative coefficients in the dual objective function that correspond to the primal constraints (2). Although [14] showed that it is possible to overcome this difficulty in certain cases, the situation here looks more complicated and it is not clear how to use their techniques.
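Property (*) from the start of this overview is easy to test on a given fractional assignment. The sketch below is illustrative only: it assumes the assignment is stored as a dictionary keyed by (facility, client) pairs, and it takes the sets F_j to be the supports of the clients, which is an assumed reading of the text.

```python
def star_supports(x, eps=1e-12):
    """Return {client j: set of facilities i with x_ij > 0} if property (*) holds, else None.

    x: dict (facility, client) -> fractional assignment value x_ij.
    Property (*): every facility serves at most one client fractionally.
    """
    owner = {}      # facility -> the unique client it serves
    supports = {}   # client -> its support set (F_j under the assumed reading)
    for (i, j), value in x.items():
        if value <= eps:
            continue
        # setdefault records j as the owner of i on first sight and returns the recorded owner.
        if owner.setdefault(i, j) != j:
            return None  # facility i serves two different clients, so (*) fails
        supports.setdefault(j, set()).add(i)
    return supports
```

When the function returns a dictionary, its values are pairwise disjoint and can serve as the blocks of the partition matroid M′.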
Instead, we use the clustering technique of Charikar et al. [5] to cluster clients and first obtain a half-integral solution $(\hat{x}, \hat{y})$, that is, every $\hat{x}_{ij}, \hat{y}_i \in \{0, \tfrac12, 1\}$, to the modified instance with cluster centers, losing a factor of 3. Further, any solution here will give a solution to the original instance while increasing the cost by at most $4 \cdot OPT$. Now we use the clustering method of [13] without any filtering, since the half-integral solution $(\hat{x}, \hat{y})$ is essentially already filtered; if client $j$ is assigned to $i$ and $i'$, then $c_{ij}, c_{i'j} \le 2(c_{ij}\hat{x}_{ij} + c_{i'j}\hat{x}_{i'j})$. This final step causes us to lose an additive factor equal to the cost of $(\hat{x}, \hat{y})$, so overall we get an approximation ratio of 4 + 3 + 3 = 10. In Section 3.4, we show that by further exploiting the structure of the half-integral solution, we can give a better bound on the cost of the integer solution and thus obtain an 8-approximation.
We now describe each of these steps in detail. Let $\bar{C}_j = \sum_i c_{ij} x_{ij}$ denote the cost incurred by the LP solution to assign one unit of demand of client $j$. Given a vector $v \in \mathbb{R}^{\mathcal{F}}$ and a set $S \subseteq \mathcal{F}$, we use $v(S)$ to denote $\sum_{i \in S} v_i$.

Obtaining a half-integral solution (x, ŷ)
Step I: Consolidating demands around centers. We first consolidate (or cluster) the demand of clients at certain clients, which we call cluster centers. We do not modify the fractional solution $(x, y)$ but only modify the demands so that for some clients $j$, the demand $d_j$ is "moved" to a "nearby" center $k$. We assume every client has a non-zero demand (otherwise we can simply get rid of such clients).
Set $d'_j \leftarrow 0$ for every $j$. Consider the clients in increasing order of $\bar{C}_j$. For each client $j$, if there exists a client $k$ with $d'_k > 0$ and $c_{jk} \le 4\bar{C}_j$, then set $d'_k \leftarrow d'_k + d_j$; otherwise, set $d'_j \leftarrow d_j$. Let $D = \{j \in \mathcal{D} : d'_j > 0\}$ be the resulting set of cluster centers, and let $OPT' = \sum_i f_i y_i + \sum_{j \in D} d'_j \bar{C}_j$ denote the cost of $(x, y)$ for the modified instance consisting of the cluster centers.
Lemma 3.1 The following hold: (i) if $j, k \in D$ with $j \ne k$, then $c_{jk} \ge 4\max(\bar{C}_j, \bar{C}_k)$, (ii) $OPT' \le OPT$, and (iii) any solution $(x', y')$ to the modified instance can be converted to a solution to the original instance incurring an additional cost of at most $4 \cdot OPT$.
Proof: Suppose $j$ was considered after $k$. Then $d'_k > 0$ at this time, otherwise $d'_k$ would remain at 0 and $k$ would not be in $D$. So if $c_{jk} < 4\max(\bar{C}_j, \bar{C}_k) = 4\bar{C}_j$, then $d'_j$ would remain at 0, giving a contradiction. It is clear that if we move the demand of client $k$ to client $j$, then $\bar{C}_j \le \bar{C}_k$ and $c_{jk} \le 4\bar{C}_k$. So the assignment cost for the new instance, $\sum_j d'_j \bar{C}_j$, only decreases and the facility-opening cost $\sum_i f_i y_i$ does not change; hence $OPT' \le OPT$. Given a solution $(x', y')$ to the modified instance, if the demand of $k$ was moved to $j$, the extra cost incurred in assigning $k$ to the same facility(ies) as in $x'$ is at most $d_k c_{jk} \le 4 d_k \bar{C}_k$ by the triangle inequality, so the total extra cost is at most $4 \cdot OPT$.
From now on we focus on the modified instance with client set $D$ and modified demands $d'_j$. At the very end we will use the above lemma to translate an integer solution to the modified instance to an integer solution to the original instance.
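Step I can be sketched in a few lines. The version below is a hedged illustration: it assumes the consolidation rule suggested by the proof of Lemma 3.1, namely that a client's demand moves to an already-chosen center within distance four times the client's own LP assignment cost, and otherwise the client becomes a center itself. The names `consolidate_demands`, `lp_cost` and `moved_to` are hypothetical.

```python
def consolidate_demands(demand, lp_cost, dist):
    """Move each client's demand to a nearby, cheaper cluster center (Step I sketch).

    demand:  dict client -> d_j (> 0)
    lp_cost: dict client -> per-unit LP assignment cost of j (C-bar_j in the text)
    dist:    callable(j, k) -> metric distance c_jk between clients
    Returns (new_demand, moved_to): consolidated demands d'_j, and the center of each client.
    """
    new_demand = {j: 0.0 for j in demand}
    moved_to = {}
    centers = []
    # Process clients in increasing order of LP cost, so centers are never more expensive
    # than the clients consolidated onto them.
    for j in sorted(demand, key=lambda c: lp_cost[c]):
        # Assumed rule: any existing center k with c_jk <= 4 * C-bar_j absorbs j's demand.
        target = next((k for k in centers if dist(j, k) <= 4 * lp_cost[j]), None)
        if target is None:
            centers.append(j)
            target = j
        new_demand[target] += demand[j]
        moved_to[j] = target
    return new_demand, moved_to
```

Under this rule any two centers end up more than four times the larger of their LP costs apart, which is the separation used in part (i) of Lemma 3.1.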
Step II: Transforming to a half-integral solution. We define the cluster of a client $j \in D$ to be the set $F_j$ of all facilities $i$ such that $j$ is the center in $D$ closest to $i$, that is, $F_j = \{i : c_{ij} = \min_{k \in D} c_{ik}\}$, with ties broken arbitrarily. Let $F'_j = \{i \in F_j : c_{ij} \le 2\bar{C}_j\} \subseteq F_j$. Clearly the sets $F_j$ for $j \in D$ are disjoint. By property (i) of Lemma 3.1, we have that $F_j$ contains all the facilities $i$ such that $c_{ij} \le 2\bar{C}_j$. So $y(F'_j) \ge \sum_{i \in F'_j} x_{ij} \ge \tfrac12$ by Markov's inequality. To obtain the half-integral solution, we define a suitable vector $y'$ that lies in a polytope with half-integral extreme points and construct a linear function $T(\cdot)$ such that $T(y')$ bounds the cost of a fractional solution. We show that $T(y') \le 3 \cdot OPT'$. This implies that one can obtain a "better" half-integral vector $\hat{y}$, which we then argue yields a half-integral solution $(\hat{x}, \hat{y})$ to the modified instance of cost at most $T(\hat{y}) \le T(y')$.
Define $\gamma_j := \min_{i \notin F_j} c_{ij}$, and let $G_j = \{i \in F_j : c_{ij} \le \gamma_j\}$. Set $y'_i = x_{ij} \le y_i$ if $i \in G_j$ for some $j \in D$, and $y'_i = 0$ otherwise. [...] Then $y'$ lies in the following polytope:
$P := \{v \in \mathbb{R}^{\mathcal{F}}_+ : v(S) \le r(S)\ \forall S \subseteq \mathcal{F},\ v(F'_j) \ge \tfrac12,\ v(G_j) \le 1\ \forall j \in D\}$. (3)
We claim that $P$ has half-integral extreme points. The easiest way to see this is to note that any extreme point of $P$ is defined by a linearly independent system of tight constraints comprising some $v(S) = r(S)$ equalities corresponding to a laminar set system, and some $v(F'_j) = \tfrac12$ and $v(G_j) = 1$ equalities. The constraint matrix of this system thus corresponds to equations coming from two laminar set systems; such a matrix is known to be totally unimodular, and hence the vector $v$ satisfying this system must be a half-integral solution. (An alternate proof of half-integrality of $P$, based on the integrality of the intersection of two submodular polyhedra, is sketched in Appendix A.) Since $y' \in P$, this implies that we can obtain a half-integral solution $\hat{y}$ such that $T(\hat{y}) \le T(y')$. Observe that there is at least one facility $i \in F'_j$ with $\hat{y}_i > 0$; we call the facility $i \in F'_j$ nearest to $j$ the primary facility of $j$ and set $\hat{x}_{ij} = \hat{y}_i$. Note that every client in $D$ has a distinct primary facility. If $\hat{y}_i < 1$, then let $i'$ be the facility nearest to $j$ other than $i$ such that $\hat{y}_{i'} > 0$; we call $i'$ the secondary facility of $j$, and set $\hat{x}_{i'j} = \tfrac12$.
Lemma 3.2 The cost of $(\hat{x}, \hat{y})$ is at most $T(\hat{y}) \le T(y') \le 3 \cdot OPT'$.
Proof: We first show that $T(y') \le 3 \cdot OPT'$, and then bound the cost of $(\hat{x}, \hat{y})$ by $T(\hat{y})$. Since $T(\hat{y}) \le T(y')$, this proves the lemma. We have $OPT' = \sum_i f_i y_i + \sum_j d'_j \bar{C}_j$, and for any $j \in D$, we have [...] To bound the cost of $(\hat{x}, \hat{y})$, it suffices to show that the assignment cost of each client $j \in D$ is at most [...], where $i'$ is the secondary facility of $j$. We show that $c_{i'j} \le 3\gamma_j$, which implies the desired bound. Let $\gamma_j = c_{i''j}$ where $i'' \in F_k$, $k \ne j$. Let $\ell$ be the primary facility of $k$. Then, [...] Combining the inequalities we get that $c_{i'j} \le 3\gamma_j$.
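The sets and thresholds used in Step II are straightforward to compute once the centers are fixed. The sketch below is illustrative: the function name and argument layout are hypothetical, ties are broken toward the first center in the given order, and the set G_j is taken to be the facilities of F_j within distance γ_j of j, which is an assumed reading of the text.

```python
import math


def facility_clusters(facilities, centers, dist, lp_cost):
    """Compute F_j, F'_j, gamma_j and G_j for every cluster center j (Step II sketch).

    facilities: list of facility ids
    centers:    list of cluster centers (the set D)
    dist:       callable(i, j) -> c_ij
    lp_cost:    dict center -> C-bar_j
    """
    F = {j: [] for j in centers}
    for i in facilities:
        # Each facility joins the cluster of its closest center.
        F[min(centers, key=lambda j: dist(i, j))].append(i)
    F_prime, gamma, G = {}, {}, {}
    for j in centers:
        F_prime[j] = [i for i in F[j] if dist(i, j) <= 2 * lp_cost[j]]
        outside = [dist(i, j) for i in facilities if i not in F[j]]
        gamma[j] = min(outside) if outside else math.inf
        # Assumed definition: G_j keeps the facilities of F_j no farther from j than gamma_j.
        G[j] = [i for i in F[j] if dist(i, j) <= gamma[j]]
    return F, F_prime, gamma, G
```

Since every facility within distance 2·C̄_j of j belongs to F_j, the threshold γ_j is at least 2·C̄_j, so each F′_j is contained in G_j.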

Converting (x, ŷ) to an integer solution
Define $\hat{C}_j = \sum_i c_{ij} \hat{x}_{ij}$ and $S_j = \{i : \hat{x}_{ij} > 0\}$ for $j \in D$.
Step III: Clustering. First, we cluster the clients in $D$ as follows: pick $j \in D$ with smallest $\hat{C}_j$. Remove every client $k \in D$ such that $S_j \cap S_k \ne \emptyset$; we call $j$ the center of $k$ and denote it by $\mathrm{ctr}(k)$. Recurse on the remaining set of clients until no client in $D$ is left. Let $D'$ be the set of clients picked; these are the new cluster centers. Note that $\mathrm{ctr}(j) = j$ for every $j \in D'$.
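The clustering in Step III is a greedy sweep, sketched below. The code reads the removal condition as "the supports of j and k intersect", which is what the later analysis (a facility serving both j and k) requires; the function name and the dictionary inputs are hypothetical.

```python
def cluster_by_shared_facilities(clients, support, cost):
    """Greedy clustering of Step III (sketch).

    clients: list of clients in D
    support: dict client -> set of facilities serving it in the half-integral solution (S_j)
    cost:    dict client -> half-integral assignment cost (C-hat_j)
    Returns (centers, ctr): the new centers D' and the map k -> ctr(k).
    """
    ctr = {}
    centers = []
    # Scanning in increasing cost order is equivalent to repeatedly picking the
    # cheapest client that has not been removed yet.
    for j in sorted(clients, key=lambda c: cost[c]):
        if j in ctr:
            continue  # j was already absorbed by an earlier, cheaper center
        centers.append(j)
        ctr[j] = j
        for k in clients:
            if k not in ctr and support[j] & support[k]:
                ctr[k] = j
    return centers, ctr
```

By construction the supports of the chosen centers are pairwise disjoint, which is what makes the partition-matroid constraint of Step IV well defined.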
Step IV: The matroid intersection problem. For convenience, we will say that every client $j \in D$ has both a primary facility, denoted $i_1(j)$, and a secondary facility $i'$ with $\hat{x}_{i_1(j)j} = \hat{x}_{i'j} = \tfrac12$, with the understanding that if $j$ does not have a secondary facility then $i' = i_1(j)$, and so $\hat{x}_{i_1(j)j} = 1$. We denote the secondary facility of $j$ by $i_2(j)$. Then we have [...] then the resulting instance with client-set $D'$ (and the new demands) satisfies the property (*) mentioned in Section 3.1. Hence, one can set up a matroid-intersection problem as mentioned in Section 3.1 to get an integer solution to the instance with client-set $D'$, which translates to a solution with client-set $D$ (and the original demands $d'_j$). Doing this naively, we lose an additive factor of (at most) $4 \sum_{k \in D} d'_k \hat{C}_k$ in translating the demands back from $D'$ to $D$. We will set up the matroid-intersection problem more carefully so that we only lose a factor of 2 instead.
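For $i \in \mathcal{F}$, define $\hat{y}'_i = \hat{x}_{ij} \le \hat{y}_i$ if $i \in S_j$ where $j \in D'$, and $\hat{y}'_i = \hat{y}_i$ otherwise. Then $\hat{y}'$ lies in the polytope
$R := \{v \in \mathbb{R}^{\mathcal{F}}_+ : v(S) \le r(S)\ \forall S \subseteq \mathcal{F},\ v(S_j) = 1\ \forall j \in D'\}$. (4)
Observe that $R$ is the intersection of the matroid polytope for $M$ with the matroid base polytope for the partition matroid defined by the $S_j$ sets for $j \in D'$. This polytope is known to have integral extreme points.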

Similar to Step II, we define a linear function [...] where [...] Since $R$ is integral, we can find an integer point $\tilde{y} \in R$ such that $H(\tilde{y}) \le H(\hat{y}')$. This yields an integer solution $(\tilde{x}, \tilde{y})$ to the instance with client set $D$, where we assign each client $j \in D'$ to the unique facility opened from $S_j$, and each client $k \in D \setminus D'$ [...], or to the facility opened from $S_{\mathrm{ctr}(k)}$. In Lemma 3.3 we prove that the cost of this integer solution is at most $H(\tilde{y})$, and in Lemma 3.4 we show that $H(\hat{y}')$ is at most twice the cost of $(\hat{x}, \hat{y})$ and hence, at most $6 \cdot OPT$ (by Lemma 3.2). Combined with Lemma 3.1, this yields Theorem 3.5.

Lemma 3.3 The cost of $(\tilde{x}, \tilde{y})$ is at most $H(\tilde{y})$.
Proof: Clearly, the facility opening cost is $\sum_i f_i \tilde{y}_i$, and the assignment cost of a client $j \in D'$ is [...] is the second-nearest facility to $k$, so every facility in $S_j$ is at least as far away from $k$ as $i_2(k)$.

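Lemma 3.4 $H(\hat{y}')$ is at most twice the cost of $(\hat{x}, \hat{y})$.
Proof: Clearly [...]
Theorem 3.5 The integer solution $(\tilde{x}, \tilde{y})$ translates to an integer solution to the original instance of cost at most $10 \cdot OPT$.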

Improvement to 8-approximation
Notice that the rounding procedure described in Section 3.3 for rounding a half-integral solution into an integral one does not use any information about how the half-integral solution is obtained (in Section 3.2). That is, it shows that any half-integral solution can be converted into an integral one losing a (multiplicative) factor of 2 in the cost. We can obtain an improved approximation ratio of 8 by exploiting the structure leading to a half-integral solution. The key to the improvement comes from the following observation (in various flavors). Consider a non-cluster-center $k \in D \setminus D'$ with $\mathrm{ctr}(k) = j$. Let $i$ be a facility serving both $j$ and $k$, and suppose $i$ is not the primary facility of $k$. In the absence of any further information, all we can say is that $c_{jk} \le c_{ij} + c_{ik} \le 3\gamma_j + 3\gamma_k$. However, if we define our half-integral solution slightly differently by setting the secondary facility of $k$ to be the primary facility of the client (in $D$) nearest to $k$, then we have the better bound $c_{jk} \le 2\gamma_j + 2\gamma_k$, which yields an improved bound for $k$'s assignment cost. In order to push this observation through, we will "couple" the rounding steps used to obtain the half-integral and integral solutions; that is, we tailor the function $T(\cdot)$ (defined in Step II above) so as to allow one to bound the total cost of the final integral solution obtained. Also, we use a different criterion for selecting a cluster center in the clustering performed in Step III.
We now detail the requisite changes. The first step is the same as Step I in Section 3.2. So we now work with the client set $D$ and the demands $\{d'_j\}_{j \in D}$. As before, let $OPT'$ denote the cost of $(x, y)$ for this modified instance. Recall that for each $j \in D$, we define [...] and $y'_i = 0$ otherwise. With some hindsight (see Lemma 3.7), we define [...] Since $y'$ lies in the half-integral polytope $P$ (see (3)), we can obtain a half-integral $\hat{y}$ such that $T(\hat{y}) \le T(y')$.
For each client $j \in D$, define $\sigma(j) = j$ if $\hat{y}(G_j) = 1$, and $\sigma(j) = \arg\min_{k \in D : k \ne j} c_{jk}$ otherwise (breaking ties arbitrarily). Note that $c_{j\sigma(j)} \le 2\gamma_j$. As before, we call the facility $i$ nearest to $j$ with $\hat{y}_i > 0$ the primary facility of $j$ and denote it by $i_1(j)$; we set $\hat{x}_{i_1(j)j} = \hat{y}_{i_1(j)}$. Note that $i_1(j) \in F'_j$. If $\hat{y}_{i_1(j)} < 1$ and $\hat{y}(G_j) = 1$, let $i'$ be the fractionally open facility other than $i_1(j)$ nearest to $j$; otherwise, if $\hat{y}_{i_1(j)} < 1$ and $\hat{y}(G_j) < 1$ (so $\sigma(j) \ne j$ and $\hat{y}_{i_1(j)} = \tfrac12$), let $i'$ be the primary facility of $\sigma(j)$. We call $i'$ the secondary facility of $j$, and denote it by $i_2(j)$. Again, for convenience, we will consider $j$ as having both a primary and secondary facility and $\hat{x}_{i_1(j)j} = \hat{x}_{i_2(j)j} = \tfrac12$, with the understanding that if $\hat{y}_{i_1(j)} = 1$, then $i_2(j) = i_1(j)$ and $\hat{x}_{i_1(j)j} = 1$.
A2. Clustering and rounding to an integral solution. For each $j \in D$, define $C'_j = \bigl(c_{i_1(j)j} + c_{j\sigma(j)} + c_{i_2(j)\sigma(j)}\bigr)/2$. We cluster clients as in Step III in Section 3.3, except that we repeatedly pick the client with smallest $C'_j$ among the remaining clients to be the cluster center. As before, let $D'$ denote the set of cluster centers, and let $\mathrm{ctr}(k) = j \in D'$ for $k \in D$ if $k$ was removed in the clustering process because $j$ was chosen as a cluster center and $S_j \cap S_k \ne \emptyset$. (So $\mathrm{ctr}(j) = j$ for $j \in D'$.)

Similar to Step IV in Section 3.3, for each $i \in \mathcal{F}$, define $\hat{y}'_i$ [...] Since $\hat{y}'$ lies in the integral polytope $R$ (see (4)), we can obtain an integral vector $\tilde{y}$ such that $H(\tilde{y}) \le H(\hat{y}')$, and a corresponding integral solution $(\tilde{x}, \tilde{y})$ (as in Step IV in Section 3.3).

Analysis.
It is easy to adapt the proof of Lemma 3.2 to show that $T(\hat{y}) \le T(y') \le 4 \cdot OPT'$. Lemma 3.6 shows that the cost of $(\tilde{x}, \tilde{y})$ is at most $H(\tilde{y}) \le H(\hat{y}')$, and Lemma 3.7 proves that $H(\hat{y}') \le T(\hat{y})$. This shows that the cost of $(\tilde{x}, \tilde{y})$ is at most $4 \cdot OPT$. Combined with Lemma 3.1, this yields the 8-approximation guarantee (Theorem 3.8).
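The accounting behind the factor 8 can be laid out as a single chain. This is a summary under the assumption that the adapted version of Lemma 3.2 gives $T(y') \le 4 \cdot OPT'$, which is what the stated bound of $4 \cdot OPT$ requires.

```latex
\begin{align*}
\mathrm{cost}(\tilde{x}, \tilde{y})
  &\le H(\tilde{y})     && \text{(Lemma 3.6)} \\
  &\le H(\hat{y}')      && \text{(integrality of $R$)} \\
  &\le T(\hat{y})       && \text{(Lemma 3.7)} \\
  &\le T(y')            && \text{(half-integrality of $P$)} \\
  &\le 4 \cdot OPT'     && \text{(adapted Lemma 3.2, assumed)} \\
  &\le 4 \cdot OPT      && \text{(Lemma 3.1 (ii))}.
\end{align*}
```

Translating back to the original instance costs at most a further $4 \cdot OPT$ by Lemma 3.1 (iii), for a total of $4 \cdot OPT + 4 \cdot OPT = 8 \cdot OPT$.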

Lemma 3.6 The cost of $(\tilde{x}, \tilde{y})$ is at most $H(\tilde{y})$.

Proof: The facility opening cost is [...]

Lemma 3.7 $H(\hat{y}') \le T(\hat{y})$.

Proof: We show that $L_j(\hat{y}') \le B_j(\hat{y})$ for every $j \in D$, which will complete the proof. We first argue that [...], and $c_{j\sigma(j)} + c_{i_1(\sigma(j))\sigma(j)} \le 3\gamma_j$; so [...] For a client $j \in D'$, we have [...]

Theorem 3.8 The integer solution $(\tilde{x}, \tilde{y})$ translates to an integer solution to the original instance of cost at most $8 \cdot OPT$.
Remark 3.9 It is easy to modify the above algorithm to obtain a so-called Lagrangian-multiplier preserving (LMP) 8-approximation algorithm, that is, where the solution $(\tilde{x}, \tilde{y})$ returned satisfies [...] To obtain this, the only change is that we redefine [...] We now have [...]. Also, as before, we have $H(\hat{y}') \le T(\hat{y})$. Thus, we have [...]

Matroid median with penalties
This is the generalization of matroid median where we are allowed to leave some clients unassigned at the expense of incurring a penalty $d_j \pi_j$ for each unassigned client $j$. This changes the LP-relaxation (P) as follows. We use a variable $z_j$ for each client $j \in \mathcal{D}$ to denote if we incur the penalty for client $j$, and modify constraint (1) for client $j$ to $\sum_i x_{ij} + z_j \ge 1$; also the objective is now to minimize $\sum_i f_i y_i + \sum_j d_j \bigl(\sum_i c_{ij} x_{ij} + \pi_j z_j\bigr)$. Let $(x, y, z)$ denote an optimal solution to this LP and $OPT$ be its value. Krishnaswamy et al. [11] showed that $(x, y, z)$ can be rounded to an integer solution losing a factor of 360. We show that our rounding approach for matroid median can be adapted to yield a substantially improved 24-approximation algorithm. The rounding procedure is similar to the one described in Section 3 for matroid median, except that we now need to deal with the complication that a client need not be assigned fractionally to an extent of 1.
Let $X_j = \sum_i x_{ij}$, $\bar{C}_j = \sum_i c_{ij} x_{ij} / X_j$, and $LP_j = \sum_i c_{ij} x_{ij} + \pi_j z_j = \bar{C}_j X_j + \pi_j z_j$. We may assume that $X_j + z_j = 1$ for every client $j$ and that if $x_{ij} > 0$ then $c_{ij} \le \pi_j$, so we have $\bar{C}_j \le LP_j \le \pi_j$.
Step 0. First, we set $\tilde{z}_j = 1$ and incur the penalty for each client $j$ for which $\pi_j \le 2LP_j$. In the sequel, we work with the remaining set $D' = \{j \in \mathcal{D} : \pi_j > 2LP_j\}$ of clients [...]
Step I: Consolidating demands. We consolidate demands around centers in a manner similar to Step I of the rounding procedure in Section 3. The difference is that if $k$ is consolidated with client $j$, then we cannot simply add $d_k$ to $j$'s demand and replicate $j$'s assignment for $k$ (since $\pi_k$ could be much larger than $\pi_j$, so that $\bar{C}_j X_j + \pi_k(1 - X_j)$ need not be bounded in terms of $LP_k$). Instead, we treat $k$ as being co-located with $j$ and recompute $k$'s assignment.
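Step 0 is a simple thresholding of each client's penalty against its LP contribution. The sketch below is illustrative; the function name and the dictionary-based representation of the LP solution are assumptions.

```python
def split_by_penalty(clients, x, z, dist, penalty):
    """Step 0 sketch: pay the penalty outright for clients whose penalty is at most 2 * LP_j.

    clients: list of client ids
    x:       dict (facility, client) -> x_ij
    z:       dict client -> z_j
    dist:    dict (facility, client) -> c_ij
    penalty: dict client -> pi_j
    Returns (penalized, remaining, lp), where lp[j] = sum_i c_ij x_ij + pi_j z_j.
    """
    lp = {}
    for j in clients:
        assignment_cost = sum(dist[i, jj] * frac for (i, jj), frac in x.items() if jj == j)
        lp[j] = assignment_cost + penalty[j] * z[j]
    penalized = [j for j in clients if penalty[j] <= 2 * lp[j]]
    remaining = [j for j in clients if penalty[j] > 2 * lp[j]]
    return penalized, remaining, lp
```

Each penalized client pays at most twice its own LP contribution, so these clients account for at most a factor of 2 in the final bound.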
Let $L$ be a list of clients in $D'$ arranged in increasing order of $LP_j$. Let $D = \emptyset$. We compute a new assignment $(x', z')$ for the clients as follows. Set $x'_{ij} = z'_j = 0$ for all $i, j$. Remove the first client $j \in L$ and add it to $D$. Set $x'_{ij} = x_{ij}$ for all facilities $i$ and $z'_j = z_j$; also set $\mathrm{nbr}(j) = j$. For every client $k$ in $L$ with $c_{jk} \le 4LP_k$, we remove $k$ from $L$, and set $\mathrm{nbr}(k) = j$. We consider $k$ to be co-located with $j$ and re-optimize $k$'s assignment. So we set $x'_{ik} = y_i$ starting from the facility nearest to $j$ and continuing until $k$ is completely assigned or until the last facility $i$ such that $c_{ij} \le \pi_k$, in which case we set $z'_k = 1 - \sum_i x'_{ik}$. We repeat this process until $L$ is empty. We call each client in $D$ a cluster center. Let $\{c'_{ij}\}$ denote the assignment costs of the clients with respect to their new locations. Let $OPT' = \sum_i f_i y_i + \sum_{j \in D'} d_j \bigl(\sum_i c'_{ij} x'_{ij} + \pi_j z'_j\bigr)$ denote the cost of the modified solution for the modified instance. The following lemma is immediate.
Lemma 4.1 The following hold: (i) if $j, k \in D'$ are not co-located, then $c_{jk} \ge 4\max(LP_j, LP_k)$, (ii) $OPT' \le 5 \cdot OPT''$, and (iii) any solution to the modified instance can be converted to a solution to the original instance involving client-set $D'$ incurring an additional cost of at most $4 \cdot OPT''$.
Step II: Obtaining a half-integral solution. As in Step II of Section 3, we define a suitable vector $y'$ that lies in a polytope with half-integral extreme points and construct a linear function $T(\cdot)$ with $T(y') = O(OPT')$ bounding the cost of a fractional solution. We can then obtain a "better" half-integral vector $\hat{y}$, which yields a half-integral solution. In Step III, we round $\hat{y}$ to an integral solution whose cost we argue is at most $T(\hat{y}) \le T(y')$.
Consider a client $j \in D$. Let $F_j$ [...] Consider the facilities in $G_j$ in increasing order of their distance from $j$. For every facility $i \in G_j$, we set [...] Clearly $y' \le y$, so $y'$ lies in the polytope $P$ (see (3)), which has half-integral extreme points. So we can obtain a half-integral point $\hat{y} \in P$ such that $T(\hat{y}) \le T(y')$.
We now obtain a half-integral assignment for the clients in $D'$ as follows. Consider a client $k$ and let $j = \mathrm{nbr}(k)$. (Note that we could have $k = j$.) Set $\sigma(j)$ to be $j$ if $\hat{y}(G_j) = 1$, and $\arg\min_{\ell \in D : \ell \ne j} c_{j\ell}$ otherwise (as in Section 3.4). Call the facility $i \in F'_j$ nearest to $j$ the primary facility of $k$, and set $\hat{x}_{ik} = \hat{y}_i$. If $\hat{y}_i < 1$, then define $i'$ to be the facility nearest to $j$ other than $i$ with $\hat{y}_{i'} > 0$ if $\hat{y}(G_j) = 1$, and the primary facility of $\sigma(j)$ otherwise. If $\hat{y}(\{i'' \in G_j : c_{i''j} \le \pi_k\}) = \tfrac12$ and $\pi_k \le 2\gamma_j$, we set $\hat{z}_k = \tfrac12$ [...]
Proof: It suffices to show that for every client $k$, we have [...] Consider the facilities in $G_j$ in increasing order of their distance from $j$. If $\pi_k < \gamma_j$, then (we may assume that) $k$ uses the facilities in $G_j$ with $c'_{ik} = c_{ij} \le \pi_k$ fully (i.e., $x'_{ik} = y_i$) until either it is completely assigned (and the last facility used by $k$ may be partially used) or we exhaust the facilities in $G_j$ with $c_{ij} \le \pi_k$. In both cases, we have [...]
Step III: Rounding $(\hat{x}, \hat{y})$ to an integer solution. This step is quite straightforward. We incur the penalty for all clients $j \in D'$ with $\hat{z}_j = \tfrac12$. Note that all the remaining clients $k$ with $\mathrm{nbr}(k) = j$ are (co-located and) assigned identically and completely in $(\hat{x}, \hat{y}, \hat{z})$. Viewing this as an instance with demand consolidated at the cluster centers, we use the rounding procedure in step A2 of Section 3.4 to convert the half-integral solution of these remaining clients into an integral one. Let $(\tilde{x}, \tilde{y}, \tilde{z})$ denote the resulting integer solution.

Lemma 4.3 The cost of $(\tilde{x}, \tilde{y}, \tilde{z})$ for the modified instance is at most $T(\hat{y}) \le 4 \cdot OPT'$.
Proof: For a client $k$ with $\tilde{z}_k = 1$, we have $B_k(\hat{y}) \ge \pi_k$, since $\hat{z}_k = \tfrac12$ implies that $\hat{y}(N_k) = \tfrac12$, where $N_k = \{i \in G_j : c_{ij} \le \pi_k\}$, and $\pi_k < 2\gamma_j$. For a client $k$ with $\tilde{z}_k = 0$ and $\mathrm{nbr}(k) = j$, we claim that [...] If $\hat{y}(N_k) = \tfrac12$, then this follows since we must have $\pi_k > 2\gamma_j$ for $\hat{z}_k$ to be 0; otherwise, $\hat{y}(N_k) = 1 = \hat{y}(G_j)$ and again the equality holds.
The proof of Lemmas 3.6 and 3.7 now shows that [...], where $H(\cdot)$ is the function defined in step A2 of Section 3.4 for the instance where each cluster center $j$ has demand $d'_j := \sum_{k : \mathrm{nbr}(k) = j,\ \hat{z}_k = 0} d_k$. Hence, the total cost of $(\tilde{x}, \tilde{y}, \tilde{z})$ for the modified instance is at most $T(\hat{y})$.
Combined with parts (ii) and (iii) of Lemma 4.1, we obtain a solution to the original instance involving client-set $D'$ of cost at most $24 \cdot OPT''$. Adding in the penalties of the clients in $\mathcal{D} \setminus D'$ (recall that $\pi_j \le 2LP_j$ for each $j \in \mathcal{D} \setminus D'$), we obtain that the total cost is at most $24 \cdot OPT$.

Matroid median with two matroids
A natural extension of matroid median is the matroid-intersection median problem, wherein we are given two matroids on the facility-set $\mathcal{F}$, and we require the set of open facilities to be an independent set in both matroids. The goal is, as before, to minimize the sum of the facility-opening and client-assignment costs. This problem however turns out to be inapproximable to within any multiplicative factor in polynomial time since, as we show in Appendix B, it is NP-complete to determine if there is a zero-cost solution; this holds even if one of the matroids is a partition matroid.
We consider two extensions of matroid median that are (essentially) special cases of matroid-intersection median and can be used to model some interesting problems (see Section 5). The techniques developed in Section 3 readily extend and yield the same approximation guarantees for both problems. Loosely speaking, these extensions may be viewed in some sense as the most-general special cases of matroid-intersection median that one can hope to approximately solve in polytime. Technically, the key distinction between (general) matroid-intersection median and the extensions we consider, which enables one to achieve polytime multiplicative approximation guarantees for these problems, is the following. In both our extensions, one can define polytopes analogous to $P$ and $R$ in Steps II and IV of the rounding procedure respectively (see (3) and (4) respectively) that encode information from the clustering performed in Steps I and III respectively and whose extreme points are defined by equations coming from two laminar systems. In contrast, for matroid-intersection median, the extreme points of the analogous polytopes are defined by equations coming from three laminar systems (one each from the two matroids, and one that encodes information about the clustering step), which creates an insurmountable obstacle.
The setup in both extensions is similar.We have a matroid M = (F, I) on the facility-set.The facilityset F is partitioned into F 1 ∪ F 2 and clients may only be assigned to facilities in F 1 ; this can be encoded by setting c ij = ∞ for all i ∈ F 2 and j ∈ D. We also have lower and upper bounds (lb1 , ub1 ), (lb2 , ub2 ), and (lb, ub) on the number of facilities that may be opened from F 1 , F 2 , and F respectively.While the role of F 2 may seem unclear at this point, notice that a non-trivial (explicit or implicit) lower bound on the number of F 2 -facilities imposes restrictions on the facilities that may be opened from F 1 (due to the matroid M ).

Two-matroid median (2MMed).
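In addition to the above setup, we have another matroid $M_2 = (\mathcal{F}_2, \mathcal{I}_2)$ on $\mathcal{F}_2$. A set $F \subseteq \mathcal{F}$ of open facilities is feasible if it satisfies: (i) the matroid constraints $F \in \mathcal{I}$ and $F \cap \mathcal{F}_2 \in \mathcal{I}_2$; and (ii) the lower- and upper-bound constraints $lb_1 \le |F \cap \mathcal{F}_1| \le ub_1$, $lb_2 \le |F \cap \mathcal{F}_2| \le ub_2$, and $lb \le |F| \le ub$. We need to open a feasible set of facilities and assign every client to an open facility so as to minimize the total facility-opening and client-assignment cost.

The LP-relaxation for 2MMed is quite similar to (P). Let $r_2$ denote the rank function of $M_2$. We use $y(S)$ to denote $\sum_{i \in S} y_i$. We augment (P) with the constraints:

$$y(S) \le r_2(S) \quad \forall S \subseteq \mathcal{F}_2, \qquad lb_1 \le y(\mathcal{F}_1) \le ub_1, \qquad lb_2 \le y(\mathcal{F}_2) \le ub_2, \qquad lb \le y(\mathcal{F}) \le ub.$$

Let $(x, y)$ denote an optimal solution to this LP, and $OPT$ denote its cost. The rounding procedure dovetails the one in Section 3. The first step is again Step I in Section 3.2. We proceed with the client set $D$ and the demands $\{d'_j\}_{j \in D}$. Let $OPT'$ denote the cost of $(x, y)$ for this modified instance. As before, for each $j \in D$, we define [...]

A slight technicality that arises in mimicking Step A1 in Section 3.4 is that setting $y'_i = x_{ij}$ for some client $j$ need not yield a feasible solution due to the lower-bound constraints. To deal with this, for every $j \in D$ and $i \in G_j$ with $0 < x_{ij} < y_i$, we replace facility $i$ with two co-located "clones" $i_1$ and $i_2$. We set [...] and let $G'_j$ be the set consisting of the new facilities $i$ for which $c_{ij} = \min_{k \in D} c_{ik}$, $c_{ij} \le \gamma_j$, and $x_{ij} = y_i > 0$ (that is, $G'_j$ consists of the new $i_1$-clones and the old facilities $i \in G_j$ with $x_{ij} = y_i > 0$). We continue to let $F'_j$ denote the facilities $i$ with $c_{ij} = \min_{k \in D} c_{ik}$ and $c_{ij} \le 2\bar{C}_j$. Let $\mathcal{F}'_1$ denote the new $\mathcal{F}_1$-set after these changes, and let $\mathcal{F}' = \mathcal{F}'_1 \cup \mathcal{F}_2$. For $i \in \mathcal{F}$, let $h(i) = \{i_1, i_2\}$ if $i$ was replaced by clones, and $h(i) = \{i\}$ otherwise. We update the rank function $r$ to $r'$ (over $2^{\mathcal{F}'}$) in the obvious way: $r'(S) = r(\{i \in \mathcal{F} : h(i) \cap S \ne \emptyset\})$. Note that $r'$ defines a matroid on $\mathcal{F}'$. Clearly, a solution to the modified instance translates to a solution to the original instance and vice versa.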
We continue with steps A1 and A2 in Section 3.4, replacing $G_j$ with $G'_j$, and using suitable polytopes in place of $P$ and $R$ to obtain the half-integral and integral solutions. Define the polytope $P'$ and the function $T(\cdot)$: [...] It is clear that (the new vector) $y$ lies in $P'$. The key observation is that an extreme point of $P'$ is defined by a linearly independent system of tight constraints coming from two laminar systems: one consisting of some tight $v(S) \le r'(S)$ constraints of $M'$ and potentially the tight $lb \le v(\mathcal{F}') \le ub$ constraints; the other consisting of some tight $v(S) \le r_2(S)$ constraints, some tight [...] Thus, $P'$ has half-integral extreme points, and so we can find a half-integral $\hat{y}$ such that $T(\hat{y}) \le T(y)$. As before, we have that [...]

The primary facility $i_1(j)$ of $j \in D$ is the facility $i$ nearest to $j$ with $\hat{y}_i > 0$; set $\hat{x}_{i_1(j)j} = \hat{y}_{i_1(j)}$. Let $\sigma(j) = j$ if $\hat{y}(G'_j) = 1$ and $\sigma(j) = \arg\min_{k \in D : k \ne j} c_{jk}$ otherwise. If $\hat{y}_{i_1(j)} < 1$, we define the secondary facility $i_2(j)$ of $j$ as follows and set $\hat{x}_{i_2(j)j} = \tfrac{1}{2}$. If $\hat{y}(G'_j) = 1$, the secondary facility is the facility other than $i_1(j)$ opened (fractionally) from $G'_j$ closest to $j$; otherwise, $i_2(j)$ is the primary facility of $\sigma(j)$. Let $S_j = \{i : \hat{x}_{ij} > 0\}$.
We define $C'_j$ and cluster clients in $D$ as in step A2 of Section 3.4 (again using $G'_j$ instead of $G_j$) to obtain the set $D'$ of cluster centers. A useful observation is that if $|S_j| = 1$ then we may assume that $j \in D'$. This is because for any $k \in D$ with $k \ne j$ and $S_k \cap S_j \ne \emptyset$, we have $\sigma(k) = j$ and therefore $C'_k \ge c_{jk} + c_{i_1(j)j}/2 \ge c_{i_1(j)j} = C'_j$. Thus, if $j \in D'$, then $\hat{x}_{ij} = \hat{y}_i$ for all $i \in S_j$: this is clearly true if $|S_j| = 1$; otherwise, we have that $|S_{\sigma(j)}| = 2$ (since $\sigma(j) \notin D'$) and so $\hat{y}_{i_1(\sigma(j))} = \tfrac{1}{2}$. The polytope used to round $\hat{y}$ is $R'$ [...], which has integral extreme points. We define the function $H(\cdot)$ as in Step A2 in Section 3.4, and obtain an integral vector $\tilde{y}$ such that $H(\tilde{y}) \le H(\hat{y})$, which then yields an integer solution $(\tilde{x}, \tilde{y})$ (as in Step IV in Section 3.3). Mimicking Lemmas 3.6 and 3.7, we obtain that the cost of $(\tilde{x}, \tilde{y})$ is at most [...]. Thus, we obtain the following theorem.
Theorem 4.5. The integer solution $(\tilde{x}, \tilde{y})$ yields an integer solution to 2MMed of cost at most $8 \cdot OPT$.
Laminarity-constrained matroid median (LCMMed).
In LCMMed, in addition to the matroid $M$ on $\mathcal{F}$ and the bounds $(lb_1, ub_1)$, $(lb_2, ub_2)$, and $(lb, ub)$, we have a laminar family $\mathcal{L}$ on $\mathcal{F}_2$ with integer bounds $0 \le \ell_S \le u_S$ for every set $S \in \mathcal{L}$. We need to choose a set $F \in \mathcal{I}$ of facilities to open satisfying (i) the laminarity constraints $\ell_S \le |F \cap S| \le u_S$ for all $S \in \mathcal{L}$, and (ii) the lower- and upper-bound constraints $lb_1 \le |F \cap \mathcal{F}_1| \le ub_1$, $lb_2 \le |F \cap \mathcal{F}_2| \le ub_2$, and $lb \le |F| \le ub$, and assign each client $j$ to an open facility to minimize the sum of the facility-opening and client-assignment costs. The upper-bound constraints from the laminar family define a matroid on $\mathcal{F}_2$; thus, LCMMed can be viewed as a variant of 2MMed where we have a laminar matroid on $\mathcal{F}_2$ along with lower-bound constraints coming from the same laminar family. Note that the bounds on the number of $\mathcal{F}_2$-facilities can be included as a laminarity constraint; we will assume this from now on. It is not hard to see that the approach used for 2MMed also works for LCMMed. The only (obvious) changes are that the LP-relaxation, as well as the definitions of the polytopes $P'$ and $R'$ (in (5) and (6)), now include the laminarity constraints in place of the rank constraints for the second matroid. All other steps and arguments proceed identically, and so we obtain an 8-approximation algorithm for laminarity-constrained matroid median.
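To make the laminarity constraints concrete, the following Python sketch checks (a) that a family of sets is laminar, i.e., any two sets are disjoint or nested, and (b) that a candidate set of opened facilities satisfies $\ell_S \le |F \cap S| \le u_S$ for every set in the family. The function names and the dictionary encoding of the bounds are assumptions made for this illustration only; they are not part of the rounding algorithm.

```python
from itertools import combinations

def is_laminar(family):
    """Return True if every two sets in `family` are disjoint or nested."""
    sets = [frozenset(s) for s in family]
    for a, b in combinations(sets, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True

def satisfies_laminarity(opened, bounds):
    """`bounds` maps each set S (a frozenset) to its pair (l_S, u_S)."""
    opened = set(opened)
    return all(lo <= len(opened & s) <= hi for s, (lo, hi) in bounds.items())

# Toy laminar family on four F_2-facilities (hypothetical data).
bounds = {
    frozenset({"a", "b", "c", "d"}): (1, 3),
    frozenset({"a", "b"}): (0, 1),
    frozenset({"c"}): (1, 1),
}
```

In this toy instance, opening {"a", "c"} respects all three bounds, whereas opening {"a", "b", "c"} violates the upper bound on {"a", "b"}.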

Applications
We now show that various facility location problems can be cast as special cases of matroid median or the extensions considered in Section 4. These include the data placement problem [2,3], mobile facility location [9,1] (with general movement costs), (non-uniform) k-median forest [10], and metric uniform minimum-latency UFL (MLUFL) [4] (with latency-cost functions). The current-best approximation factors for these problems are: 10 for the data placement problem [3], 16 for non-uniform k-median forest [10], and 10.773 for metric uniform MLUFL [4]. No non-trivial approximation guarantees were previously known for mobile facility location; but since we show that this problem reduces to matroid median, approximation guarantees of 16 and 9 follow from the work of [11] and [6] respectively. Thus, our 8-approximation algorithms for matroid median and the extensions discussed in Section 4 immediately yield improved approximation guarantees for all these problems.
The data placement problem [2,3]. We are given a set of caches $\mathcal{F}$, a set of data objects $\mathcal{O}$, and a set of clients $\mathcal{D}$. Each cache $i \in \mathcal{F}$ has a capacity $u_i$, and each client $j \in \mathcal{D}$ has demand $d_j$ for a specific data object $o(j) \in \mathcal{O}$ and has to be assigned to a cache that stores that object. Storing an object $o$ in cache $i$ incurs a storage cost of $f^o_i$, and assigning client $j$ to cache $i$ incurs an access cost of $d_j c_{ij}$, where the distances $c_{ij}$ form a metric. We want to determine a set of objects $O(i) \subseteq \mathcal{O}$ to place in each cache $i \in \mathcal{F}$ satisfying $|O(i)| \le u_i$, and assign each client $j$ to a cache $i(j)$ that stores object $o(j)$ (i.e., $o(j) \in O(i(j))$) so as to minimize $\sum_{i \in \mathcal{F}} \sum_{o \in O(i)} f^o_i + \sum_{j \in \mathcal{D}} d_j c_{i(j)j}$. Baev, Rajaraman, and Swamy [3] gave a 10-approximation algorithm for this problem.
This can be cast as an instance of matroid median as follows. The facility-set in the matroid-median instance is $\mathcal{F} \times \mathcal{O}$. Facility $(i, o)$ denotes that we store object $o$ in cache $i$, and has cost $f^o_i$. The client set remains unchanged. We set the distance $c_{(i,o)j}$ to be $c_{ij}$ if $o(j) = o$ and $\infty$ otherwise, thus enforcing that each client $j$ is only assigned to a facility containing object $o(j)$. Note that the new distances form a metric if the $c_{ij}$s form a metric. Finally, the cache-capacity constraints are incorporated via the matroid on $\mathcal{F} \times \mathcal{O}$ where a set $S$ is independent if $|\{(i', o) \in S : i' = i\}| \le u_i$ for every $i \in \mathcal{F}$. It is easy to see that the resulting matroid-median instance precisely encodes the data placement problem.
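The reduction above can be sketched in a few lines of Python. This is only an illustration under assumed encodings: the function name, the dictionary layouts (`storage_cost[i, o]`, `dist[i, j]`, `wanted[j]` for $o(j)$), and the toy data are all hypothetical choices, and the independence oracle simply counts the objects stored per cache.

```python
import math

def data_placement_to_matroid_median(caches, objects, clients, capacity,
                                     storage_cost, wanted, dist):
    """Build the matroid-median instance on the facility-set caches x objects."""
    facilities = [(i, o) for i in caches for o in objects]
    opening_cost = {(i, o): storage_cost[i, o] for (i, o) in facilities}
    # Client j may only use a facility that holds its requested object o(j).
    connection = {
        ((i, o), j): dist[i, j] if wanted[j] == o else math.inf
        for (i, o) in facilities for j in clients
    }

    def independent(S):
        """Capacity oracle: at most u_i objects are stored in cache i."""
        load = {}
        for (i, _o) in S:
            load[i] = load.get(i, 0) + 1
        return all(load[i] <= capacity[i] for i in load)

    return facilities, opening_cost, connection, independent

# Toy instance (hypothetical data).
caches, objects, clients = ["c1", "c2"], ["x", "y"], ["j1", "j2"]
capacity = {"c1": 1, "c2": 2}
storage_cost = {(i, o): 1.0 for i in caches for o in objects}
wanted = {"j1": "x", "j2": "y"}
dist = {("c1", "j1"): 1, ("c1", "j2"): 2, ("c2", "j1"): 3, ("c2", "j2"): 1}
facilities, opening_cost, connection, independent = data_placement_to_matroid_median(
    caches, objects, clients, capacity, storage_cost, wanted, dist)
```

Here `independent([("c1", "x"), ("c1", "y")])` is false because cache `c1` has capacity 1, mirroring the constraint $|O(i)| \le u_i$.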

Mobile facility location [9, 1].
In the version with general movement costs, the input is a metric space $(V, \{c_{ij}\})$. We have a set $\mathcal{D} \subseteq V$ of clients, with each client $j$ having demand $d_j$, and a set $\mathcal{F} \subseteq V$ of initial facility locations. A solution moves each facility $i \in \mathcal{F}$ to a final location $s_i \in V$ incurring a movement cost of $w_{i s_i} \ge 0$, and assigns each client $j$ to the final location $s$ of some facility incurring an assignment cost of $d_j c_{sj}$. The goal is to minimize the sum of all the movement and assignment costs. Approximation algorithms are known for this problem when the movement costs are proportional to the $c_{ij}$-distances. Friggstad and Salavatipour [9] gave an LP-rounding based 8-approximation algorithm when $w_{i s_i} = c_{i s_i}$. This was later improved to a local-search based $(3 + \epsilon)$-approximation algorithm by Ahmadian, Friggstad, and Swamy [1], which also extends to the weighted setting where $w_{i s_i} = w_i \cdot c_{i s_i}$. Ahmadian et al. also remark that their local-search algorithm can be arbitrarily bad for the setting with general movement costs.
To encode mobile facility location as a matroid-median instance, we define the facility-set in the matroid-median instance to be $\mathcal{F} \times V$. Facility $(i, s_i)$ denotes that $i \in \mathcal{F}$ is moved to location $s_i \in V$, and has cost $w_{i s_i}$ (note that $s_i$ could be $i$). The client-set is unchanged, and we set $c_{(i,s_i)j}$ to be $c_{s_i j}$ for every facility $(i, s_i) \in \mathcal{F} \times V$ and client $j \in \mathcal{D}$. These new distances form a metric: for facilities $(i, s), (i', s')$ and clients $j, k$, we have $c_{(i,s)j} = c_{sj} \le c_{sk} + c_{s'k} + c_{s'j} = c_{(i,s)k} + c_{(i',s')k} + c_{(i',s')j}$. The constraint that a facility in $\mathcal{F}$ can only be moved to one final location can be encoded by defining a matroid where a set $S \subseteq \mathcal{F} \times V$ is said to be independent if $|\{(i', s) \in S : i' = i\}| \le 1$ for all $i \in \mathcal{F}$.

k-median forest [10]. In the non-uniform version, we are given a set of nodes $V$ endowed with two metrics $\{d_{uv}\}$ and $\{c_{uv}\}$. The goal is to find a set $S \subseteq V$ with $|S| \le k$ and assign every node $j \in V$ to a location $i(j) \in S$ so as to minimize $\sum_j c_{i(j)j} + d(\mathrm{MST}(V \setminus S))$, where $d(\mathrm{MST}(V \setminus S))$ is the cost of a minimum spanning forest where each component contains a node of $S$. Gørtz and Nagarajan [10] gave a 16-approximation algorithm for this problem based on LP-rounding, and a $(3 + \epsilon)$-approximation algorithm based on local search for the setting where the metrics $c$ and $d$ are multiples of each other.
We can actually reduce a generalization of k-median forest, where there is an "opening cost" $f_i \ge 0$ incurred for including $i$ in $S$, to two-matroid median (2MMed) as follows. (The resulting 2MMed instance is also an LCMMed instance.) We add a root $r$ to $V$. The facility-set $\mathcal{F}$ in 2MMed is the edge-set of the complete graph on $V \cup \{r\}$. The client-set is $\mathcal{D} := V$. Selecting a facility $(r, i)$ denotes that $i \in S$, and selecting a facility $(u, v)$, where $u, v \ne r$, denotes that $(u, v)$ is part of $\mathrm{MST}(V \setminus S)$. We let $\mathcal{F}_1$ be the edges incident to $r$, and $\mathcal{F}_2$ be the remaining edges. The cost of a facility $(r, i) \in \mathcal{F}_1$ is $f_i$; the cost of a facility $(u, v) \in \mathcal{F}_2$ is $d_{uv}$. The client-facility distances are given by $c_{(r,i)j} = c_{ij}$ and $c_{ej} = \infty$ for every $e \in \mathcal{F}_2$. Note that these $\{c_{ej}\}$ distances indeed form a metric. We let $M$ be the graphic matroid of the complete graph on $V \cup \{r\}$. We impose a lower bound of $|V|$ on the number of facilities opened from $\mathcal{F}$, and an upper bound of $k$ on the number of facilities opened from $\mathcal{F}_1$. The matroid $M_2$ on $\mathcal{F}_2$ is the vacuous one where every set is independent.
A feasible solution to the 2MMed instance corresponds to a spanning tree on V ∪ {r} where r has degree at most k.This yields a solution to k-median forest of no-greater cost, where the set S is the set of nodes adjacent to r in this edge-set.Conversely, it is easy to see that a solution S to the k-median forest instance yields a 2MMed solution of no-greater cost.
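The correspondence between feasible 2MMed solutions and spanning trees on $V \cup \{r\}$ can be checked mechanically. The Python sketch below is an illustration under assumed names (`is_feasible_tree`, `centers_from_tree`, edges given as pairs): it verifies, with a small union-find, that an edge set has exactly $|V|$ edges, is acyclic (independent in the graphic matroid), and uses at most $k$ edges at the root, and it recovers $S$ as the set of neighbours of $r$.

```python
def is_feasible_tree(nodes, root, edges, k):
    """Check that `edges` is a spanning tree on nodes + root with deg(root) <= k."""
    vertices = list(nodes) + [root]
    # A spanning tree on |V| + 1 vertices has exactly |V| edges.
    if len(edges) != len(nodes):
        return False
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:                        # cycle: dependent in the graphic matroid
            return False
        parent[ru] = rv
    root_degree = sum(1 for u, v in edges if root in (u, v))
    return root_degree <= k

def centers_from_tree(root, edges):
    """Recover S as the set of nodes adjacent to the root."""
    return {v if u == root else u for u, v in edges if root in (u, v)}

nodes = [1, 2, 3, 4]
tree = [("r", 1), ("r", 3), (1, 2), (3, 4)]
```

For this toy tree, the recovered set is $S = \{1, 3\}$, and removing the root leaves the forest with edges (1, 2) and (3, 4), each component containing one node of $S$.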
Metric uniform minimum-latency UFL (MLUFL) [4]. We have a set $\mathcal{F}$ of facilities with opening costs $\{f_i\}_{i \in \mathcal{F}}$, and a set $\mathcal{D}$ of clients with assignment costs $\{c_{ij}\}_{j \in \mathcal{D}, i \in \mathcal{F}}$, where the $c_{ij}$s form a metric. Also, we have a monotone latency-cost function $\lambda : \mathbb{Z}_+ \to \mathbb{R}_+$. The goal is to choose a set $F \subseteq \mathcal{F}$ of facilities to open, assign each open facility $i \in F$ a distinct time-index $t_i \in \{1, \ldots, |\mathcal{F}|\}$, and assign each client $j$ to an open facility $i(j) \in F$ so as to minimize $\sum_{i \in F} f_i + \sum_{j \in \mathcal{D}} \bigl(c_{i(j)j} + \lambda(t_{i(j)})\bigr)$.
We reduce a generalization, where there is a cost $f_{i,t}$ for assigning index $t$ to facility $i$, to matroid median. We define the facility-set to be $\mathcal{F} \times \{1, \ldots, |\mathcal{F}|\}$ and the matroid on this set to encode that a set $S$ is independent if $|\{(i, t') \in S : t' = t\}| \le 1$ for all $t \in \{1, \ldots, |\mathcal{F}|\}$. We set $f_{(i,t)} = f_i + f_{i,t}$ and $c_{(i,t),j} = c_{ij} + \lambda(t)$; note that these distances form a metric. It is easy to see that we can convert any matroid-median solution to one where we open at most one $(i, t)$ facility for any given $i$ without increasing the cost, and hence, the matroid-median instance correctly encodes metric uniform MLUFL.
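A minimal Python sketch of this reduction follows. The names (`mlufl_to_matroid_median`, `keep_earliest_copy`, the optional `index_cost` dictionary for $f_{i,t}$) are assumptions for illustration. The second function shows one way to carry out the conversion mentioned above, assuming $\lambda$ is nondecreasing and all costs are nonnegative: among several opened copies $(i, t)$ of the same facility, keeping only the one with the smallest $t$ cannot increase any client's cost.

```python
def mlufl_to_matroid_median(facilities, clients, f, c, lam, index_cost=None):
    """Build the matroid-median instance on facilities x {1, ..., |facilities|}."""
    slots = range(1, len(facilities) + 1)
    index_cost = index_cost or {}
    new_facilities = [(i, t) for i in facilities for t in slots]
    opening = {(i, t): f[i] + index_cost.get((i, t), 0) for (i, t) in new_facilities}
    connection = {((i, t), j): c[i, j] + lam(t)
                  for (i, t) in new_facilities for j in clients}

    def independent(S):
        """At most one opened pair per time-index t."""
        used = [t for (_i, t) in S]
        return len(used) == len(set(used))

    return new_facilities, opening, connection, independent

def keep_earliest_copy(opened):
    """Keep, for each facility i, only the opened pair (i, t) with the smallest t."""
    earliest = {}
    for i, t in opened:
        earliest[i] = min(t, earliest.get(i, t))
    return {(i, t) for i, t in earliest.items()}

# Toy instance (hypothetical data) with latency cost lambda(t) = 10 t.
new_facilities, opening, connection, independent = mlufl_to_matroid_median(
    ["a", "b"], ["j"], {"a": 1, "b": 2}, {("a", "j"): 3, ("b", "j"): 4},
    lambda t: 10 * t)
```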

Knapsack median
We now consider the knapsack median problem [11,12], wherein instead of a matroid on the facility-set, we have a knapsack constraint on the facility-set. Kumar [12] obtained the first constant-factor approximation algorithm for this problem, and [6] obtained an improved 34-approximation algorithm. We consider a somewhat more-general version of knapsack median, wherein each facility $i$ has a facility-opening cost $f_i$ and a weight $w_i$, and we have a knapsack constraint $\sum_{i \in F} w_i \le B$ constraining the total weight of open facilities. We leverage the ideas from our simpler improved deterministic rounding procedure for matroid median to obtain a slightly-improved 32-approximation algorithm for this (generalized) knapsack-median problem. We show that one can obtain a nearly half-integral solution whose cost is within a constant factor of the optimum. It then turns out to be easy to round this to an integral solution. The resulting algorithm and analysis are simpler than those in [12,6]. We defer the details to Appendix C.

A Alternate proof of half-integrality of the polytope P defined by (3)
Observe that by setting $z_i = 2v_i$ for $i \in \mathcal{F}$, and introducing slack variables $s_j$ for every $j \in D$, the system defining $P$ is equivalent to [...]

[...] the natural LP-relaxation with these constraints to obtain the following LP (K-P).

[...] $x_{ij}, y_i \ge 0 \quad \forall i, j.$

Let $(x, y)$ be an optimal solution to (K-P) and $OPT$ be its value. Let $\bar{C}_j = \sum_i c_{ij} x_{ij}$. Note that if our estimate $C_{opt}$ is correct, then $OPT$ is at most the optimal value $opt$ for the knapsack median instance. We show that $(x, y)$ can be rounded to an integer solution of cost $O(OPT + f_{opt} + C_{opt})$. Thus, if we consider all possible choices for $C_{opt}$ in powers of $(1 + \epsilon)$ and pick the solution returned with least cost, we obtain a solution of cost at most $(32 + \epsilon)$ times the optimum. The rounding procedure is as follows.
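The enumeration over guesses in powers of $(1 + \epsilon)$ can be sketched as follows in Python. This is a generic illustration, not the procedure of Appendix C itself: the range `[lo, hi]` for the guess, the function names, and the callable `round_with_guess` (standing in for "solve (K-P) with this guess and round") are assumptions. The point of the geometric grid is that some guess overestimates the true value by a factor of at most $(1 + \epsilon)$.

```python
def geometric_guesses(lo, hi, eps):
    """Return lo, lo(1+eps), lo(1+eps)^2, ... up to the first value >= hi.

    Assumes lo > 0 and eps > 0.
    """
    guesses = [lo]
    while guesses[-1] < hi:
        guesses.append(guesses[-1] * (1 + eps))
    return guesses

def best_over_guesses(round_with_guess, guesses):
    """Run the (hypothetical) rounding routine for each guess; keep the cheapest.

    `round_with_guess(g)` is expected to return (cost, solution), or None if
    the guess g leads to no solution.
    """
    best = None
    for g in guesses:
        result = round_with_guess(g)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best

guesses = geometric_guesses(1.0, 100.0, 0.5)
```

The number of guesses is logarithmic in `hi / lo` for fixed `eps`, so the enumeration only adds a polynomial factor to the running time when the range is polynomially bounded in bit-length.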
K1. Consolidating demands. We first consolidate demands as in Step I in Section 3.2. We now work with the (consolidated) client-set $D$ and the demands $\{d'_j\}_{j \in D}$. For $j \in D$, we use $M_j$ to denote the set of original clients (including $j$) whose demands were moved to $j$. Note that the $M_j$s partition the original client-set. Let $OPT'$ denote the cost of $(x, y)$ for this modified instance. As before, for each $j \in D$, we define [...]

K2. Obtaining a nearly half-integral solution. Set y
In the sequel, we will only consider facilities in $\mathcal{F}'$. Consider the following polytope $\mathcal{K}$: [...] Since $y' \in \mathcal{K}$, we can efficiently obtain an extreme point $\hat{y}$ of $\mathcal{K}$ such that $K(\hat{y}) \le K(y')$, the support of $\hat{y}$ is a subset of the support of $y'$, and all constraints that are tight under $y'$ remain tight under $\hat{y}$ (see footnote 2). Thus, if $i \in G_j$ and $\hat{y}_i > 0$, then $y'_i > 0$ and so $c_{ij} \le U_j$. Also, if $\hat{y}(G_j) < 1$ then $y'(G_j) < 1$, and so $\gamma_j \le U_j$. We show in Lemma C.1 that there is at most one client, which we call the special client and denote by $s$, such that $G_s$ contains a facility $i$ with $\hat{y}_i \notin \{0, \tfrac{1}{2}, 1\}$.

As in Section 3.4, for each client $j \in D$, define $\sigma(j) = j$ if $\hat{y}(G_j) = 1$, and $\sigma(j) = \arg\min_{k \in D : k \ne j} c_{jk}$ otherwise (breaking ties arbitrarily). Note that $c_{j\sigma(j)} \le 2\gamma_j$. We now define the primary and secondary facilities of each client $j \in D$, which we denote by $i_1(j)$ and $i_2(j)$ respectively, as before. If $j$ is not the special client $s$, then $i_1(j)$ is the facility $i$ nearest to $j$ with $\hat{y}_i > 0$; otherwise, $i_1(j) = \arg\min_{i \in F'_j : \hat{y}_i > 0} w_i$ (breaking ties arbitrarily). If $\hat{y}_{i_1(j)} = 1$, then we set $i_2(j) = i_1(j)$. Otherwise, if $\hat{y}(G_j) < 1$, we set $i_2(j) = i_1(\sigma(j))$. Otherwise, if $j \ne s$, set $i_2(j)$ to be the half-integral facility in $G_j$ other than $i_1(j)$ that is nearest to $j$; else let $i_2(j)$ be the facility with smallest weight among the facilities $i \in G_j$ with $\hat{y}_i > 0$.
Footnote 2. We can obtain $\hat{y}$ as follows. Let $Av \le b$, $v \ge 0$ denote the constraints of $\mathcal{K}$. Recall that $z$ is an extreme point of $\mathcal{K}$ iff the submatrix $A'$ of $A$ corresponding to the non-zero variables and the tight constraints has full column rank. So if $y'$ is not an extreme point, then letting $F'' = \{i : y'_i > 0\}$, we can find some $d' \in \mathbb{R}^{F''}$, $d' \ne 0$, such that $A'd' = 0$. So letting $d_i = d'_i$ if $i \in F''$ and $0$ otherwise, we can find some $\epsilon > 0$ such that both $y' + \epsilon d$ and $y' - \epsilon d$ are feasible and all constraints that were tight under $y'$ remain tight. So moving in the direction that does not increase the $K(\cdot)$-value until some non-zero $y'_i$ drops down to $0$ or some new constraint goes tight, and repeating, we obtain the desired extreme point $\hat{y}$.
The facilities $i_1(j)$ and $i_2(j)$ naturally yield a half-integral solution, where these facilities are open to an extent of $\tfrac{1}{2}$ and $j$ is assigned to them to an extent of $\tfrac{1}{2}$; as before, if $i_1(j) = i_2(j)$, then this means that $i_1(j)$ is open to an extent of 1 and $j$ is assigned completely to $i_1(j)$. The choice of the primary and secondary facilities ensures that this solution is feasible.

K3. Clustering and rounding to an integral solution. This step is quite straightforward. We define $C'_j$ for $j \in D$, and cluster clients in $D$ exactly as in step A2 in Section 3.4, and we open the facility with smallest weight within each cluster. Finally, we assign each client to the nearest open facility. Let $(\tilde{x}, \tilde{y})$ denote the resulting solution. Recall that $D'$ is the set of cluster centers, and for $k \in D$, $\mathrm{ctr}(k)$ denotes the client in $D$ due to which $k$ was removed in the clustering process (so $\mathrm{ctr}(j) = j$ for $j \in D'$).

Analysis. We call a facility $i$ half-integral (with respect to the vector $\hat{y}$ obtained in step K2) if $\hat{y}_i \in \{0, \tfrac{1}{2}, 1\}$ and fractional otherwise.

Lemma C.1. The extreme point $\hat{y}$ of $\mathcal{K}$ obtained in step K2 is such that there is at most one client, called the special client and denoted by $s$, such that $G_s$ contains fractional facilities. Moreover, if $\tfrac{1}{2} < \hat{y}(G_s) < 1$, then there is exactly one facility $i \in F'_s$ such that $\hat{y}_i > 0$.

Proof: Since $\hat{y}$ is an extreme point, it is well known (see, e.g., [8]) that the submatrix $A'$ of the constraint matrix whose columns correspond to the non-zero $\hat{y}_i$s and rows correspond to the tight constraints under $\hat{y}$ has full column-rank. The rows and columns of $A'$ may be accounted for as follows. Each client $j \in D$ contributes: (i) a non-empty disjoint set of columns corresponding to the positive $\hat{y}_i$s in $G_j$; and (ii) a possibly-empty disjoint set of at most two rows corresponding to the tight constraints $\hat{y}(F'_j) = \tfrac{1}{2}$ and $\hat{y}(G_j) = 1$. This accounts for all columns of $A'$. There is at most one remaining row of $A'$, which corresponds to the tight constraint $\sum_i w_i \hat{y}_i = B$. Let $p_j$ and $q_j$ denote respectively the number of columns and rows contributed by $j \in D$. First, note that $p_j \ge q_j$ for all $j \in D$. This is clearly true if $q_j \le 1$; if $q_j = 2$, then $\hat{y}(F'_j) = \tfrac{1}{2}$ and $\hat{y}(G_j) = 1$, so both $F'_j$ and $G_j$ must have at least one positive $\hat{y}_i$. Also, note that if $p_j = q_j$, then $G_j$ contains only half-integral facilities. Since $\sum_j p_j \le \sum_j q_j + 1$, there can be at most one client such that $p_j > q_j$; we let this be our special client $s$. Note that we must have $p_s = q_s + 1$.
If $\tfrac{1}{2} < \hat{y}(G_s) < 1$ then either: (i) $q_s = 0$, so $p_s = 1$; or (ii) $q_s = 1$, so $p_s = 2$, and since $\hat{y}(F'_s) = \tfrac{1}{2} < \hat{y}(G_s)$, both $F'_s$ and $G_s$ contain exactly one positive $\hat{y}_i$.

It is easy to adapt the proof of Lemma 3.2, and obtain that $K(\hat{y}) \le K(y') \le 8 \cdot OPT' \le 8 \cdot OPT$. Next, we prove our main result: the integer solution $(\tilde{x}, \tilde{y})$ computed is feasible and its cost for the modified instance is at most $K(\hat{y}) + f_{opt} + 4C_{opt} + 16 \cdot OPT$. Thus, "moving" the consolidated demands back to their original locations yields a solution of cost at most $(32 + \epsilon) \cdot opt$ for the correct guess of $f_{opt}$ and $C_{opt}$. The following claims will be useful.

Claim C.2. If $\hat{y}(G_j) = 1$ for some $j \in D$, then (we may assume that) $j$ is a cluster center.

Theorem 4.4. One can round $(x, y, z)$ to an integer solution of cost at most $24 \cdot OPT$.
A set $F$ of open facilities is feasible for 2MMed if it satisfies: (i) the matroid constraints $F \in \mathcal{I}$ and $F \cap \mathcal{F}_2 \in \mathcal{I}_2$; and (ii) the lower- and upper-bound constraints $lb_1 \le |F \cap \mathcal{F}_1| \le ub_1$, $lb_2 \le |F \cap \mathcal{F}_2| \le ub_2$, and $lb \le |F| \le ub$. (The upper bounds $ub$ and $ub_2$ may also be incorporated in the definition of the matroids $M$ and $M_2$ respectively.)

[...] $2(C'_k - C'_j) = c_{i_1(k)k} + c_{jk} - c_{i_2(j)j} \ge c_{i_1(k)j} - c_{i_2(j)j} \ge 0$ since $i_2(k) \notin G_j$.

Claim C.3. For any client $j \in D$, we have $d'_j U_j \le C_{opt} + 4 \cdot OPT$.

Proof: By definition, $\sum_k d_k \max\{0, U_j - c_{jk}\} \le C_{opt}$. So
$$d'_j U_j = \sum_{k \in M_j} d_k U_j = \sum_{k \in M_j} d_k (U_j - c_{jk}) + \sum_{k \in M_j} d_k c_{jk} \le C_{opt} + \sum_{k \in M_j} 4 d_k \bar{C}_k \le C_{opt} + 4 \cdot OPT.$$