
Facility Location on High-Dimensional Euclidean Spaces

Euiwoong Lee (University of Michigan, Ann Arbor, MI, USA) and Kijun Shin (Seoul National University, South Korea)
Abstract

Recent years have seen great progress in the approximability of fundamental clustering and facility location problems on high-dimensional Euclidean spaces, including k-Means and k-Median. While they admit strictly better approximation ratios than their general metric versions, their approximation ratios are still higher than the hardness ratios for general metrics, leaving the possibility that the ultimate optimal approximation ratios will be the same between Euclidean and general metrics. Moreover, such an improved algorithm for Euclidean spaces is not known for Uncapacitated Facility Location (UFL), another fundamental problem in the area.

In this paper, we prove that for any $\gamma \ge 1.6774$ there exists $\varepsilon>0$ such that Euclidean UFL admits a $(\gamma,\,1+2e^{-\gamma}-\varepsilon)$-bifactor approximation algorithm, improving the result of Byrka and Aardal [3]. Together with the $(\gamma,\,1+2e^{-\gamma})$ NP-hardness in general metrics, this shows the first separation between general and Euclidean metrics for the aforementioned basic problems. We also present an $(\alpha_{\mathrm{Li}}-\varepsilon)$-(unifactor) approximation algorithm for UFL for some $\varepsilon>0$ in Euclidean spaces, where $\alpha_{\mathrm{Li}}\approx 1.488$ is the best-known approximation ratio for UFL by Li [15].

Keywords and phrases:
Approximation Algorithms, Clustering, Facility Location
Funding:
Euiwoong Lee: Supported by NSF grant CCF-2236669 and a gift from Google.
Copyright and License:
© Euiwoong Lee and Kijun Shin; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Facility location and clustering
Related Version:
Full Version: https://arxiv.org/abs/2501.18105 [14]
Editor:
Raghu Meka

1 Introduction

The (metric) Uncapacitated Facility Location (UFL) problem is one of the most fundamental problems in computer science and operations research. The input consists of a metric space $(X,d)$, a set of facility locations $\mathcal F\subseteq X$, a set of clients $\mathcal C\subseteq X$, as well as facility opening costs $\{f_i\}_{i\in\mathcal F}$. The goal is to open a subset of facilities $S\subseteq\mathcal F$ to minimize the sum of the opening cost ($\sum_{i\in S}f_i$) and the connection cost ($\sum_{j\in\mathcal C}\min_{i\in S}d(i,j)$). After intensive research efforts over the years [11, 12, 16, 15], the best approximation ratio is 1.488 [15] and the best hardness ratio is 1.463 [11].

As the objective function is the sum of two heterogeneous terms, the opening cost and the connection cost, the natural notion of bifactor approximation has been actively studied as well. Formally, given an instance of UFL, a solution $S$ is called a $(\lambda_f,\lambda_c)$-approximation for some $\lambda_f,\lambda_c\ge1$ if, for any solution $T$, the total cost of $S$ is at most $\lambda_fF+\lambda_cC$, where $F$ and $C$ denote the opening and connection cost of $T$ respectively. In particular, the case $\lambda_f=1$, also known as a $\lambda_c$-Lagrangian Multiplier Preserving (LMP) approximation, has been actively studied due to its connection to another fundamental clustering problem, $k$-Median. There is a $(2-\varepsilon)$-LMP approximation for some $\varepsilon>2\times10^{-7}$ [8], and any $\lambda_c$-LMP approximation for UFL can be translated to a $(1.307\,\lambda_c)$-approximation for $k$-Median [9].

Generalizing the hardness of Guha and Khuller [11], Jain, Mahdian and Saberi [12] proved that no polynomial-time $(\lambda_f,\lambda_c)$-approximation algorithm exists for $\lambda_c<1+2e^{-\lambda_f}$ unless $\mathbf{P}=\mathbf{NP}$. (Guha and Khuller's hardness ratio $\gamma\approx1.463$ is exactly the solution of $\gamma=1+2e^{-\gamma}$.) While the optimal value of $\lambda_c$ is not known for small values of $\lambda_f$, Byrka and Aardal gave an algorithm that achieves a $(\lambda_f,\,1+2e^{-\lambda_f})$-approximation for any $\lambda_f\ge1.6774$ [3].

Euclidean spaces are arguably the most natural metric spaces for facility location and clustering problems. Formally, Euclidean UFL is the special case of UFL where the underlying metric is $(\mathbb{R}^k,\|\cdot\|_2)$ for some dimension $k$. When $k=O(1)$, this problem admits a PTAS [5], while the problem remains APX-hard when $k$ is part of the input [7]. (While the cited paper only studies $k$-Median and $k$-Means, the soundness analysis in their Theorem 4.1 (of the arXiv version) can be directly extended to any number of open facilities $k$, implying APX-hardness of Euclidean UFL.)

Recent years have seen active studies of the related $k$-Means and $k$-Median on high-dimensional Euclidean spaces [1, 10, 4], so that the best-known approximation ratios for them are 5.912 and 2.406 respectively. While these are strictly lower than the best-known approximation ratios for general metric spaces (which are 9 and 2.613), they are still larger than the best-known hardness ratios for general metrics (which are $1+8/e\approx3.943$ and $1+2/e\approx1.736$) [12], which means that it is still plausible that the optimal approximation ratios for $k$-Means and $k$-Median are the same between Euclidean metrics and general metrics.

Our first result is the first strict separation between Euclidean and general metric spaces for UFL. In particular, we show that Euclidean UFL admits a $(1.6774,\,1+2e^{-1.6774}-\varepsilon)$-approximation for some universal constant $\varepsilon>0$, which is NP-hard to achieve in general metrics.

Theorem 1.

There exists a $(1.6774,\,1+2e^{-1.6774}-\varepsilon)$-approximation algorithm for Euclidean UFL for some $\varepsilon\ge3\times10^{-42}$.

By the scaling result of Mahdian et al. [16], Theorem 1 implies a $(\gamma,\,1+2e^{-\gamma}-\varepsilon\,e^{1.6774-\gamma})$-approximation for any $\gamma\ge1.6774$. Using this result, we are able to slightly improve on the best-known unifactor approximation ratio $\alpha_{\mathrm{Li}}\approx1.488$ of Li [15].

Theorem 2.

There exists an $(\alpha_{\mathrm{Li}}-\varepsilon)$-approximation algorithm for Euclidean UFL for some $\varepsilon\ge2\times10^{-45}$.

Recent years also have seen great progress on hardness of approximation for clustering problems in high-dimensional Euclidean spaces, including Euclidean k-Means and k-Median [6, 7]. We show that similar techniques extend to UFL as well, proving the APX-hardness.

Theorem 3.

Euclidean UFL is APX-hard.

2 High-level Plan

Our work is based on the framework of Byrka and Aardal [3], who achieved an optimal $(\lambda_f,\,1+2e^{-\lambda_f})$-bifactor approximation for $\lambda_f\ge1.6774$ in general metrics. We first review their framework. It is based on the following standard linear programming (LP) relaxation:

Minimize $\sum_{i\in\mathcal F}\sum_{j\in\mathcal C}d(i,j)\,x_{ij}+\sum_{i\in\mathcal F}f_i\,y_i$
subject to $\sum_{i\in\mathcal F}x_{ij}=1$ for all $j\in\mathcal C$,
$x_{ij}\le y_i$ for all $i\in\mathcal F,\ j\in\mathcal C$,
$x_{ij},y_i\in[0,1]$ for all $i\in\mathcal F,\ j\in\mathcal C$.

The dual formulation is as follows:

Maximize $\sum_{j\in\mathcal C}v_j$
subject to $\sum_{j\in\mathcal C}w_{ij}\le f_i$ for all $i\in\mathcal F$,
$v_j-w_{ij}\le d(i,j)$ for all $i\in\mathcal F,\ j\in\mathcal C$,
$w_{ij}\ge0$ for all $i\in\mathcal F,\ j\in\mathcal C$.
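To make the relaxation concrete, the following minimal sketch builds and solves the primal LP with scipy.optimize.linprog on a hypothetical toy instance (the distance matrix d and opening costs f are made up for illustration, and the flat variable ordering is an implementation choice, not something fixed by the paper). A solver that exposes duals would similarly yield the dual solution $(v,w)$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (hypothetical): 3 facilities, 4 clients.
d = np.array([[1.0, 2.0, 3.0, 2.0],
              [2.0, 1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0, 1.0]])   # d[i][j] = distance(facility i, client j)
f = np.array([2.0, 3.0, 2.5])          # facility opening costs
nf, nc = d.shape

# Variables: x[i][j] (flattened, nf*nc entries) followed by y[i] (nf entries).
c = np.concatenate([d.flatten(), f])

# Equality constraints: sum_i x[i][j] = 1 for every client j.
A_eq = np.zeros((nc, nf * nc + nf))
for j in range(nc):
    A_eq[j, [i * nc + j for i in range(nf)]] = 1.0
b_eq = np.ones(nc)

# Inequality constraints: x[i][j] - y[i] <= 0.
A_ub = np.zeros((nf * nc, nf * nc + nf))
for i in range(nf):
    for j in range(nc):
        A_ub[i * nc + j, i * nc + j] = 1.0
        A_ub[i * nc + j, nf * nc + i] = -1.0
b_ub = np.zeros(nf * nc)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * (nf * nc + nf))
x = res.x[:nf * nc].reshape(nf, nc)
y = res.x[nf * nc:]
print("LP value:", res.fun)  # = C + F of the fractional optimum
```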

A feasible solution $(x,y)$ induces the support graph, which is defined as the bipartite graph $G=((\mathcal F,\mathcal C),E)$ where nodes $i\in\mathcal F$ and $j\in\mathcal C$ are adjacent iff the corresponding LP variable $x_{ij}>0$. Two clients $j,j'\in\mathcal C$ are considered neighbors in $G$ if they are adjacent to a common facility.

Let $(x,y)$ be a fixed optimal solution to the primal program. The overall cost is divided into the facility cost $F=\sum_{i\in\mathcal F}f_iy_i$ and the connection cost $C=\sum_{i\in\mathcal F}\sum_{j\in\mathcal C}d(i,j)x_{ij}$. Our goal is to round this solution to obtain a solution $S$ whose total cost is at most $\lambda_fF+\lambda_cC$; it is well known that this implies the $(\lambda_f,\lambda_c)$-approximation defined in the introduction by scaling [3], so for the rest of the paper we redefine the $(\lambda_f,\lambda_c)$-approximation so that $S$ is $(\lambda_f,\lambda_c)$-approximate if its total cost is at most $\lambda_fF+\lambda_cC$.

The opening cost and connection cost of individual clients can be further divided using the optimal LP dual solution $(v,w)$. For each client $j\in\mathcal C$, the fractional connection cost is given by $C^*_j=\sum_{i\in\mathcal F}d(i,j)\,x_{ij}$, and the fractional facility cost is computed as $F_j=v_j-C^*_j$. (The close and distant facility sets $\mathcal C_j$, $\mathcal D_j$ and the average distance $d(j,\cdot)$ used below are defined in the next paragraphs.) The irregularity of the facilities surrounding $j\in\mathcal C$ is defined by

$r_\gamma(j)=\dfrac{d(j,\mathcal D_j)-d(j,\mathcal D_j\cup\mathcal C_j)}{F_j}.$

Similarly,

$r'_\gamma(j)=(\gamma-1)\,r_\gamma(j)=\dfrac{d(j,\mathcal D_j\cup\mathcal C_j)-d(j,\mathcal C_j)}{F_j}.$

If $\sum_{i\in\mathcal V}y_i=0$, we set $d(j,\mathcal V)=0$. Similarly, when $F_j=0$, we define $r_\gamma(j)=0$ and $r'_\gamma(j)=0$. According to these definitions, the following conditions hold: the irregularity satisfies $0\le r_\gamma(j)\le1$, the average distance to a close facility is $C_j=d(j,\mathcal C_j)=C^*_j-r'_\gamma(j)\,F_j$, and the average distance to a distant facility is $D_j=d(j,\mathcal D_j)=C^*_j+r_\gamma(j)\,F_j$. The maximum distance to a close facility is bounded as $M_j\le D_j$.

Clustering of [3].

At a high level, the clustering operates on the support graph $G=((\mathcal F,\mathcal C),E)$. For each $c\in\mathcal C$, let $N_c:=\{c'\in\mathcal C:\exists f\in\mathcal F\text{ such that }(c,f),(c',f)\in E\}$ be the set of neighbors of $c$. The clustering algorithm iteratively selects some client $c$ as a cluster center, puts all its neighbors into the cluster, and proceeds with the remaining clients. Eventually, every client is assigned to exactly one of these clusters. After this, for each cluster, exactly one facility adjacent to the cluster center is opened. This ensures that every client is connected to a facility that is not too far from it. Therefore, the criteria for choosing cluster centers and opening facilities determine the quality of the solution.

Starting with a fractional solution $(x,y)$ of the LP and a parameter $\gamma\in(1,2)$, [3] constructed the facility-augmented solution $(\bar x,\bar y)$, where each $y_i$ value is multiplied by $\gamma$ and each client $j\in\mathcal C$ reconfigures its $x_{ij}$ values so as to be fractionally connected to facilities as close as possible. (E.g., $\bar x_{ij}>0$ implies $x_{ij}>0$, but not vice versa.) With some postprocessing, one can also assume that $\bar x_{ij}\in\{0,\bar y_i\}$ for every $i\in\mathcal F$, $j\in\mathcal C$. Then one can categorize every facility near $j\in\mathcal C$ into two types: close facilities $\mathcal C_j=\{i\mid\bar x_{ij}>0\}$ and distant facilities $\mathcal D_j=\{i\mid\bar x_{ij}=0\text{ and }x_{ij}>0\}$. This implies that as $\gamma$ increases, the clusters become smaller, and more facilities are opened.

Let the average distance from $j\in\mathcal C$ to a set of facilities $\mathcal V\subseteq\mathcal F$ be defined as $d(j,\mathcal V)=\big(\sum_{i\in\mathcal V}d(i,j)\,y_i\big)/\big(\sum_{i\in\mathcal V}y_i\big)$. Then let $C_j:=d(j,\mathcal C_j)$, $M_j:=\max_{i\in\mathcal C_j}d(j,i)$, and $D_j:=d(j,\mathcal D_j)$. We have $C_j\le M_j\le D_j$.
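These quantities are easy to compute from a fractional solution. The sketch below is a simplified reading of the construction: it assumes the solution is already complete ($x_{ij}\in\{0,y_i\}$) and ignores the postprocessing that splits facilities so that the $\bar y$-mass of $\mathcal C_j$ is exactly 1; the helper name and signature are ours, not the paper's.

```python
import numpy as np

def close_distant_stats(d_j, y, supp, gamma):
    """For one client j: d_j[i] = distance to facility i, y = LP opening values,
    supp = indices with x_ij > 0.  Returns (C_j, M_j, D_j) under the scaled
    solution y_bar = gamma * y.  Assumes masses split exactly (a simplification)."""
    order = sorted(supp, key=lambda i: d_j[i])      # facilities by distance
    close, mass = [], 0.0
    for i in order:                                 # take closest until y_bar-mass 1
        close.append(i)
        mass += gamma * y[i]
        if mass >= 1.0 - 1e-12:
            break
    distant = [i for i in order if i not in close]
    w_close = np.array([gamma * y[i] for i in close])
    C_j = float(np.dot(w_close, [d_j[i] for i in close]) / w_close.sum())
    M_j = max(d_j[i] for i in close)
    # Convention from the text: the average distance to an empty set is 0.
    D_j = (float(np.dot([y[i] for i in distant], [d_j[i] for i in distant])
                 / sum(y[i] for i in distant)) if distant else 0.0)
    return C_j, M_j, D_j
```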

At this point, the support graph is defined by the $(\bar x,\bar y)$ solution. Intuitively, we repeatedly choose the unclustered client $j$ with the smallest $C_j+M_j$ as a new cluster center. Given this clustering, the standard randomized rounding procedure is as follows:

1. For each cluster center $j$, choose exactly one facility from its neighboring facility set $\{i:(i,j)\in E\}$ according to the $\bar y$ values. (Recall that the sum of these values is exactly 1.)

2. For any facility $i$ that is not adjacent to any cluster center in $G$, independently open $i$ with probability $\bar y_i$.

Algorithm 1 greedy: [3]’s clustering algorithm.
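Putting the pieces together, a hedged sketch of the clustering-plus-rounding pipeline might look as follows; the data structures (neighbors, center_facilities, CM) are hypothetical inputs standing in for the $(\bar x,\bar y)$ support, and tie-breaking and postprocessing details are omitted.

```python
import random

def cluster_and_round(clients, neighbors, center_facilities, y_bar, CM):
    """clients: client ids; neighbors[j]: set of clients sharing a facility with j
    in the (x_bar, y_bar) support; center_facilities[j]: {facility: y_bar value}
    for facilities adjacent to j (values sum to 1); CM[j] = C_j + M_j."""
    unclustered, centers, covered = set(clients), [], {}
    while unclustered:
        j = min(unclustered, key=lambda c: CM[c])        # greedy center choice
        centers.append(j)
        cluster = (neighbors[j] | {j}) & unclustered
        for c in cluster:
            covered[c] = j
        unclustered -= cluster
    open_facilities, touched = set(), set()
    for j in centers:                                    # step 1: one facility per center
        facs, probs = zip(*center_facilities[j].items())
        open_facilities.add(random.choices(facs, weights=probs)[0])
        touched |= set(facs)
    for i, yi in y_bar.items():                          # step 2: independent rounding
        if i not in touched and random.random() < yi:
            open_facilities.add(i)
    return open_facilities, covered
```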

Let us consider one client j𝒞 and see how its expected connection cost can be bounded under the above randomized rounding. Byrka and Aardal [3] proved the following properties.

• The probability that at least one facility in $\mathcal C_j$ is opened is at least $1-e^{-1}$.

• The probability that at least one facility in $\mathcal C_j\cup\mathcal D_j$ is opened is at least $1-e^{-\gamma}$.

• Let client $j'\in\mathcal C$ be a neighbor of $j$ in $G$. Then, either $\mathcal C_{j'}\setminus(\mathcal C_j\cup\mathcal D_j)=\emptyset$ or the rerouting cost satisfies $d(j,\mathcal C_{j'}\setminus(\mathcal C_j\cup\mathcal D_j))\le d(j,j')+d(j',\mathcal C_{j'}\setminus(\mathcal C_j\cup\mathcal D_j))\le D_j+C_{j'}+M_{j'}$. In particular, when $j'$ is the cluster center of $j$, this is at most $C_j+M_j+D_j$. (Li [15] refined this bound to $C_j+(3-\gamma)M_j+(\gamma-1)D_j$.)

Then, one can (at least informally) expect that the expected connection cost of $j$ is at most $(1-e^{-1})C_j+(e^{-1}-e^{-\gamma})D_j+e^{-\gamma}(D_j+C_j+M_j)$. It turns out that setting $\gamma\approx1.6774$ (the solution of $e^{-1}+e^{-\gamma}-(\gamma-1)(1-e^{-1}+e^{-\gamma})=0$) ensures that this value is at most $(1+2e^{-\gamma})C_j$, proving their $(\gamma,\,1+2e^{-\gamma})$-bifactor approximation. (See Section 6 for the formal treatment of their analysis as well as our improvement.)
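The threshold $\gamma\approx1.6774$ is easy to verify numerically; a few lines of bisection on the displayed equation, which is positive at $\gamma=1$ and negative at $\gamma=2$, suffice.

```python
import math

def g(gamma):
    # e^{-1} + e^{-gamma} - (gamma - 1) * (1 - e^{-1} + e^{-gamma})
    return (math.exp(-1) + math.exp(-gamma)
            - (gamma - 1) * (1 - math.exp(-1) + math.exp(-gamma)))

lo, hi = 1.0, 2.0
for _ in range(60):                 # bisection: g is decreasing on [1, 2]
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
print(round(lo, 4))                 # -> 1.6774
```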

Exploit the Geometry of Euclidean Spaces.

In order to strictly improve the approximation ratio, it is natural to attempt to find a cluster $N$ and its center $j$ for which the above inequality holds with some additional slack. Let $\mathrm{cost}_j(j')=d(j',\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'}))$. Intuitively, our goal is to find a cluster $N\subseteq\mathcal C$ with center $j$ such that

$\sum_{j'\in N}\mathrm{cost}_j(j')\ \le\ \sum_{j'\in N}\big((1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\big).\qquad(1)$

The only requirement from the rounding algorithm is that $N_j\subseteq N$. Compared to [3]'s clustering, we want to shave $\varepsilon_1C_{j'}$ on average.

Let us consider the very special case where $C_j=M_j=D_j=1$ for every $j\in\mathcal C$; every facility serving $j$ in the original LP solution $(x,y)$ is at the same distance from $j$. Let $j\in\mathcal C$ be a cluster center and $j'\in\mathcal C$ be in the cluster of $j$. Then, a simple 3-hop triangle inequality (just using $\mathcal C_j\cap\mathcal C_{j'}\neq\emptyset$) ensures that $\mathrm{cost}_j(j')\le3$, and our goal is to improve it to $3-\varepsilon_1$. If $\mathrm{cost}_j(j')>3-\varepsilon_1$, how should the instance look around $j$?

It turns out that the instance around $j$ must exhibit a very specific structure in order for the 3-hop triangle inequality to be tight for almost every neighbor $j'\in N_j$. We must have almost every $j'\in N_j$ located around almost the same point at distance 2 from $j$, where almost all facility neighbors of $j$ are at the opposite end of the line connecting $N_j$ and $j$. See Figure 1 for an example. Intuitively, the existence of such a dense region of clients suggests that if we let a client $j''$ in the region be a new center, many of the 3-hop triangle inequalities cannot be tight, which implies an average rerouting cost $\mathrm{cost}_{j''}(j')\le3-\varepsilon_1$. If $j''$ is again problematic, we can repeat this procedure over and over.

Figure 1: A simple case where $\mathrm{cost}_j(j')\approx3$. There is a client-dense region on the left.

However, if we relax the condition to $C_j=M_j=1\le D_j$, certain exceptions begin to emerge. One possible scenario is as follows: since $\mathrm{cost}_j(j')=d(j',\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'}))$ captures the rerouting of $j'$ to $j$'s close facilities $\mathcal C_j$ except $j'$'s own facilities $\mathcal C_{j'}\cup\mathcal D_{j'}$, if $\mathcal C_{j'}\cup\mathcal D_{j'}$ is large enough to exclude the facilities of $\mathcal C_j$, then $\mathrm{cost}_j(j')$ might not behave as expected. However, a large volume of $\mathcal C_{j'}\cup\mathcal D_{j'}$ implies a low $C_{j'}/F_{j'}$ ratio. If the $C/F$ ratio is sufficiently low, then this facility-dominant instance is actually easier to handle with a completely different algorithm, the JMS algorithm [12], which is known to be a $(1.11,1.7764)$-approximation algorithm.

Therefore, from now on, assume that the cluster centered at $j$ is connection-dominant. More precisely, assume that for any neighbor $j'$ of the cluster center $j$, the set $\mathcal C_{j'}\cup\mathcal D_{j'}$ cannot cover half of the ball of radius 1 centered at $j$. At this point, we can finally assert that it is impossible to avoid the formation of a dense region of clients.

Assume towards contradiction that there is no dense region, and consider $j'$ in $j$'s cluster such that $\mathrm{cost}_j(j')>3-\varepsilon_1$. Almost all facilities of $j$ must be placed in one of two locations: on the opposite side of $j$ from $j'$, or within $\mathcal C_{j'}\cup\mathcal D_{j'}$. Since there is no dense region, there must be a neighbor $k$ of $j$ such that $\mathrm{cost}_j(k)>3-\varepsilon_1$ and $k$ is located in a different direction from $j'$. This implies that the facilities positioned on the opposite side of $j$ from $j'$ now help reduce $\mathrm{cost}_j(k)$, forcing them to be in $\mathcal C_k\cup\mathcal D_k$; since we assumed that $\mathcal C_k\cup\mathcal D_k$ cannot cover half of the unit ball around $j$, the angle $\angle j'jk$ must be strictly greater than $\pi/2$! Ultimately, this process can be reduced to the following situation: placing unit vectors on a unit sphere with every pairwise angle greater than $\pi/2+\varepsilon'$ for some constant $\varepsilon'>0$. It is well known in Euclidean geometry that there is an upper bound $f(\varepsilon')$ on the number of such vectors, and such an upper bound shows that one of the regions around $j'$ (or $k$) we considered must have been dense. See Figure 2 for an example.

Figure 2: $\mathcal C_k\cup\mathcal D_k$ contains the facilities positioned on the opposite side of $j$ from $j'$.

However, there are several technical barriers to extending this idea to the general case without restrictions on $C_j$, $M_j$, and $D_j$. In Section 3, we introduce these barriers and formalize the above concept. We also propose sufficient conditions for satisfying (1). From Section 3.1 to Section 5, we demonstrate how to remove these conditions, leaving only the connection-dominance assumption. In Section 6, we propose and analyze the full algorithm, which achieves an improved bifactor approximation.

3 Finding Good Center via Geometry

In this section, we exploit the geometry of Euclidean spaces to prove the existence of a cluster center strictly better than the greedy choice of [3] under certain conditions (Theorem 10). We first define several concepts and explain their motivation, including a sketch of our algorithm.

Recall that our goal is to find a cluster that satisfies (1). In all the following propositions, $\gamma$ is a fixed value in the range $\gamma\in(1.6,2)$.

Definition 4.

Suppose $j\in\mathcal C$ is a cluster center. Let $N_j$ be the set of neighbors of $j$, and let $\varepsilon_1=10^{-12}$. Additionally, define two more sets:

$N_j^-=\{j'\in N_j\mid\mathrm{cost}_j(j')>(1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\}$
$N_j^+=\{j'\in\mathcal C\mid\mathrm{cost}_j(j')\le(1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\}$

Moreover, the Saving and Spending of center $j$ are defined as

$\mathrm{Saving}(j)=\sum_{j'\in N_j^+}\big\{\big((1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\big)-\mathrm{cost}_j(j')\big\},$
$\mathrm{Spending}(j)=\sum_{j'\in N_j^-}\big\{\mathrm{cost}_j(j')-\big((1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\big)\big\}.$

With the goal (1) in mind, $N_j^+$ (resp. $N_j^-$) contains clients $j'$ that meet (resp. do not meet) this goal, and $\mathrm{Saving}(j)$ (resp. $\mathrm{Spending}(j)$) indicates how much $\mathrm{cost}_j(j')$ falls below (resp. exceeds) this goal in aggregate. Note that $N_j^-\subseteq N_j\subseteq N_j^+\cup N_j^-$ by definition; $N_j^+\cup N_j^-$ is the best cluster for center $j$, which contains all its neighbors (as required by the algorithm design) and possibly more clients to increase savings. Therefore, if $\mathrm{Saving}(j)\ge\mathrm{Spending}(j)$, then $j$ is considered a good center; otherwise, it is a bad center.
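In code, Definition 4 amounts to a single pass over the candidate members of a cluster. The sketch below assumes hypothetical accessors cost(j, jp) and per-client maps C, M, D; it is meant only to pin down the bookkeeping, not the paper's actual implementation.

```python
EPS1 = 1e-12

def saving_spending(j, candidates, Nj, cost, C, M, D, gamma):
    """cost(j, jp) = d(jp, C_j \\ (C_jp u D_jp)); C, M, D map clients to C_j, M_j, D_j.
    Returns (N_plus, N_minus, saving, spending) for center j."""
    n_plus, n_minus, saving, spending = [], [], 0.0, 0.0
    for jp in candidates:                      # any client may join N_j^+ ...
        goal = (1 - EPS1) * C[jp] + (3 - gamma) * M[jp] + (gamma - 1) * D[jp]
        slack = goal - cost(j, jp)
        if slack >= 0:
            n_plus.append(jp)
            saving += slack
        elif jp in Nj:                         # ... but only neighbors can be in N_j^-
            n_minus.append(jp)
            spending += -slack
    return n_plus, n_minus, saving, spending

# j is a good center iff saving >= spending.
```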

Now we explain some problematic situations that arise when extending the argument to the general case, i.e., without restrictions on the $C_j$, $M_j$, and $D_j$ values. The first simple concern is that the choice of the center $j$ will no longer be based solely on $C_j+M_j$ as in Algorithm 1, which breaks the previous arguments. Therefore, to gain more flexibility in selecting a new cluster center, it is beneficial to decompose the entire support graph into several layers, where each layer only concerns clients with roughly the same $C_j+M_j$ values.

Definition 5.

A network is a subgraph $((\mathcal F',\mathcal C'),E')$ of the support graph $G=((\mathcal F,\mathcal C),E)$. A network is called homogeneous if there exists $s\ge0$ such that for any client $j\in\mathcal C'$, $s\le C_j+M_j\le(1+\delta)s$ for $\delta=3\times10^{-23}$.

However, there are still two more scenarios where the above strategy might fail, as illustrated in Figure 3. In these scenarios, the neighbors in $N_j^-$ are all bad clients, but do not create a dense region.

I. If $C_j\ll s$, the facilities in $\mathcal C_j$ are concentrated near $j$. In this case, since all these facilities are very close to $j$, the 3-hop triangle inequality is almost tight for any $j'\in N_j$ regardless of where it is.

II. Recall that if $\mathcal C_{j'}\cup\mathcal D_{j'}$ is large enough to exclude the facilities of $\mathcal C_j$, then $\mathrm{cost}_j(j')$ might not behave as expected. In particular, a technical problem arises when two facilities from $\mathcal C_j$ that are almost antipodal with respect to $j$ are both contained in $\mathcal C_{j'}\cup\mathcal D_{j'}$, which is illustrated in Figure 3 (right).

Figure 3: Two examples in a homogeneous network where every neighbor of center $j$ belongs to $N_j^-$ without creating a dense region.

The following definition addresses Scenario I.

Definition 6.

For $K_6=1.302$, let $\theta=\frac{K_6+1-\gamma}{2K_6+2-\gamma}$. A client $j\in\mathcal C$ is normal if $C_j\ge\theta(C_j+M_j)$, and weird otherwise.

The following definition addresses Scenario II. Let $z_j(j')=d(j,\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'}))$, and note that $v_{j'}$ bounds the maximum distance between $j'$ and any facility in $\mathcal C_{j'}\cup\mathcal D_{j'}$. We are interested in the ball around $j$ of radius $0.998\,z_j(j')$ (for the sake of analysis), and $j'$ having a small remote arm with respect to $j$ means that $\mathcal C_{j'}\cup\mathcal D_{j'}$ cannot contain two antipodal points in our ball of interest (with some slack depending on $\alpha$). The right-hand side below is the squared length of the third side of a triangle whose other two sides have lengths $d(j,j')$ and $0.998\,z_j(j')$ with included angle $\frac{\pi}{2}-\alpha$.

Definition 7.

For a normal center $j$, a neighbor $j'$ of $j$ is said to have a small remote arm if the following holds for $\alpha=5\times10^{-4}$:

$(v_{j'})^2=(C^*_{j'}+F_{j'})^2<d(j,j')^2+\big(0.998\,z_j(j')\big)^2-2\,d(j,j')\cdot0.998\,z_j(j')\cos\Big(\frac{\pi}{2}-\alpha\Big).$

Otherwise, $j'$ is said to have a big remote arm.

In Section 3.1, we will show that these two scenarios are the only bad cases to worry about. For instance, if we assume that every client is normal and has a small remote arm, each candidate center $j\in\mathcal C$ is either good or has another candidate center $j''$ with $\mathrm{Saving}(j'')>\mathrm{Saving}(j)$ in its dense region.

How can we handle these two bad scenarios? In the following two lemmas, we prove that both a weird center and a big-remote-arm neighbor imply a high ratio of fractional facility cost $F_j$ to fractional connection cost $C^*_j$. The proofs appear in the extended version of this paper [14].

Lemma 8.

For any weird client $j$, $C^*_j\le K_6F_j$ holds.

Lemma 9.

When $1.6<\gamma<2$, for a normal center $j$, if $j'\in N_j^-$ has a big remote arm, then $C^*_{j'}\le K_6F_{j'}$.

Therefore, if we consider a (sub)instance that has a low facility-to-connection cost ratio, it is natural to expect that this argument applies, which ultimately leads to a (somewhat) good center. Throughout the paper, we will often express this low facility-to-connection ratio condition as $C>KF$ and use the following values for $K$: $K_1=1.3025>K_2=1.3024>K_3=1.3023>K_4=1.3022>K_5=1.3021>K_6=1.302$. The following theorem is the main result of the section.

Theorem 10.

Consider a homogeneous network and let $j$ be the normal center with the highest saving among all normal centers. Let $c=\sum_{j'\in N_j^+\cup N_j^-}C^*_{j'}$ and $f=\sum_{j'\in N_j^+\cup N_j^-}F_{j'}$. If $c>K_5f$, then this cluster is good on average for $\varepsilon_2=5\times10^{-18}$ and every $\gamma\in(1.6,2)$, i.e.,

$\sum_{j'\in N_j^+\cup N_j^-}\mathrm{cost}_j(j')\le\sum_{j'\in N_j^+\cup N_j^-}\big((1-\varepsilon_2)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}\big).$

3.1 Geometric Arguments

In this subsection, we prove Theorem 10 using properties of Euclidean geometry. As previously discussed, our goal is to show: When a bad cluster center j is normal and (many of) its neighbors have a small remote arm, it is possible to find a dense region of clients near j.

Figure 4: Intersection between $T_{jj'}$ and a remote arm $\mathcal C_{j'}\cup\mathcal D_{j'}$.

When we consider the rerouting of $j'$ to $j$'s close facilities $\mathcal C_j$, we can identify the worst facilities. In the definition below, $B_{jj'}\subseteq\mathcal C_j$ is the set of facilities at (almost) the worst distance from $j$, and $T_{jj'}\subseteq B_{jj'}$ is the set of facilities with both a worst distance and a worst angle.

Definition 11.

Let $r:=10^{-8}$, and let $\phi_r\approx2\times10^{-4}$ be the minimum angle that satisfies $1+x^2+2x\cos\phi_r\le(1+(1-r)x)^2$ for all $0\le x\le2$. For $j'\in\mathcal C$ and its center $j$, let

$B_{jj'}:=\{i\in\mathcal C_j\mid0.999\,z_j(j')\le d(i,j)\le M_j\}$
$T_{jj'}:=\{i\in B_{jj'}\mid\angle j'ji>\pi-\phi_r\}.$

The following lemmas show that if $j$ is normal and $j'\in N_j^-$ has a bad rerouting through $j$, then among the facilities in $B_{jj'}\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})$, which are the rerouting candidates at the worst distance, more than half of the $y$-mass must have a bad angle as well. This formalizes the intuition illustrated in Figure 1.

Lemma 12.

For any $j'\in N_j^-$ of a normal center $j$, $z_j(j')>0.99\,C_j$.

Proof.

By the triangle inequality and the homogeneity condition,

$z_j(j')\ \ge\ \mathrm{cost}_j(j')-d(j,j')\ >\ (1-\varepsilon_1)C_{j'}+(3-\gamma)M_{j'}+(\gamma-1)D_{j'}-(M_j+M_{j'})$
$\ge\ (1-\varepsilon_1)C_{j'}+(2-\gamma)M_{j'}+(\gamma-1)D_{j'}-\big((1+\delta)s-C_j\big)$
$\ge\ -(\varepsilon_1+\delta)s+C_j\ \ge\ \Big(1-\frac{\varepsilon_1+\delta}{\theta}\Big)C_j\ \ge\ 0.99\,C_j.$

Lemma 13.

For a normal center $j$ and any $j'\in N_j^-$, let $G_1=T_{jj'}\cap\big(\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})\big)$ and $G_2=(B_{jj'}\setminus T_{jj'})\cap\big(\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})\big)$. Then the following holds:

$\sum_{i\in G_1}y_i>\sum_{i\in G_2}y_i.$

Proof.

Assume the nontrivial case: $\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})\neq\emptyset$. Let $G_3=\big(\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})\big)\setminus(G_1\cup G_2)$. Define the rerouting probability $p_n$ and rerouting length $l_n$ for $1\le n\le3$ as:

$p_n=\frac{\sum_{i\in G_n}y_i}{\sum_{i\in\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})}y_i},\qquad l_n=\frac{\sum_{i\in G_n}y_i\,d(i,j)}{\sum_{i\in G_n}y_i}.$

Note that $p_1+p_2+p_3=1$ and $p_1l_1+p_2l_2+p_3l_3=z_j(j')$. Then the goal of this lemma can be written as $p_1>p_2$.

By Lemma 12, $z_j(j')\ge0.99\,C_j$ holds. We derive a lower bound for $p_1+p_2$ as follows:

$z_j(j')=p_1l_1+p_2l_2+p_3l_3\le(p_1+p_2)M_j+(1-p_1-p_2)\cdot0.999\,z_j(j')$

implies that

$p_1+p_2\ \ge\ \frac{0.001\,z_j(j')}{M_j-0.999\,z_j(j')}\ \ge\ \frac{0.001\cdot0.99\,C_j}{M_j}\ \ge\ 0.00099\,\frac{\theta}{1-\theta},$

since the right-hand side attains its minimum when $M_j/C_j$ is maximal, which is bounded because $j$ is normal.

Denote by $\vec v_u$ the position vector of a point $u$. From the definition of $\phi_r$, $\mathrm{cost}_j(j')$ is at most

$\mathrm{cost}_j(j')=\frac{\sum_{i\in\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})}y_i\,\big\|(\vec v_j-\vec v_{j'})+(\vec v_i-\vec v_j)\big\|}{\sum_{i\in\mathcal C_j\setminus(\mathcal C_{j'}\cup\mathcal D_{j'})}y_i}$
$\le p_1\big(d(j,j')+l_1\big)+p_2\big(d(j,j')+(1-r)l_2\big)+p_3\big(d(j,j')+l_3\big)$
$=d(j,j')+z_j(j')-p_2\,r\,l_2$
$\le C_j+M_j+(2-\gamma)M_{j'}+(\gamma-1)D_{j'}-p_2\,r\,l_2$
$\le(1-0.999\cdot0.99\,p_2r)\,C_j+M_j+(2-\gamma)M_{j'}+(\gamma-1)D_{j'}.$

Therefore, since $j'\in N_j^-$, $(1-\varepsilon_1)C_{j'}+M_{j'}<(1-0.999\cdot0.99\,p_2r)C_j+M_j\le(1-0.98\,p_2r)C_j+M_j$ holds. It implies

$\Big(1-\frac{\varepsilon_1}{2}\Big)(C_{j'}+M_{j'})\ \le\ (1-\varepsilon_1)C_{j'}+M_{j'}\ <\ (1-0.98\,p_2r)C_j+M_j$
$\le\ \big((1-0.98\,p_2r)\theta+(1-\theta)\big)(C_j+M_j).$

From the homogeneity condition,

$\Big(1-\frac{\varepsilon_1}{2}\Big)<(1-0.98\,p_2r\,\theta)(1+\delta)$

which implies

$p_2\ <\ \frac{1}{0.98\,\theta r}\cdot\frac{\delta+\frac{\varepsilon_1}{2}}{1+\delta}\ <\ 0.00099\,\frac{\theta}{1-\theta}\ <\ p_1.$

The following argument demonstrates how the positional distribution of facilities in $\mathcal C_j$ restricts that of the neighbors in $N_j^-$. Consider two neighbors $j',k\in N_j^-$, both with small remote arms, separated by an angle greater than $2\phi_r$, say $\frac{1}{100}$. Then $T_{jj'}\cap T_{jk}=\emptyset$. However, facilities in $T_{jj'}$ reduce $\mathrm{cost}_j(k)$ and vice versa, which implies that either $T_{jj'}\cap(\mathcal C_k\cup\mathcal D_k)\neq\emptyset$ or $T_{jk}\cap(\mathcal C_{j'}\cup\mathcal D_{j'})\neq\emptyset$. As the small remote arm condition of $j'$ limits how much $\mathcal C_{j'}\cup\mathcal D_{j'}$ can intersect $\mathcal C_j$, it is natural to expect that the number of such pairs is small.

Theorem 14.

For a normal center $j$, let $S$ be a subset of $N_j^-$ consisting of clients with a small remote arm. Furthermore, let any two elements $j_1,j_2\in S$ be separated by an angle greater than $\frac{1}{100}$ with respect to center $j$, i.e., $\angle j_1jj_2>\frac{1}{100}$. Then the cardinality of $S$ is bounded by $M=5\times10^6$, independent of the dimension of the Euclidean space.

Proof.

Denote $S=\{j_1,j_2,\dots,j_{|S|}\}$. Without loss of generality,

$\sum_{i\in T_{jj_k}\cap(\mathcal C_j\setminus(\mathcal C_{j_k}\cup\mathcal D_{j_k}))}y_i\ \ge\ \sum_{i\in T_{jj_{k+1}}\cap(\mathcal C_j\setminus(\mathcal C_{j_{k+1}}\cup\mathcal D_{j_{k+1}}))}y_i$

for all $1\le k<|S|$. Suppose $T_{jj_n}\cap(\mathcal C_{j_{n'}}\cup\mathcal D_{j_{n'}})=\emptyset$ for some $n<n'$. Additionally, given $j_{n'}\in N_j^-$ and the small arm condition, it implies $T_{jj_{n'}}\cap(\mathcal C_{j_{n'}}\cup\mathcal D_{j_{n'}})=\emptyset$. Therefore, $T_{jj_n}\cup T_{jj_{n'}}\subseteq\mathcal C_j\setminus(\mathcal C_{j_{n'}}\cup\mathcal D_{j_{n'}})$. Moreover, $T_{jj_n}\cap T_{jj_{n'}}=\emptyset$ since $2\phi_r<\frac{1}{100}$. However, this contradicts Lemma 13.

We now show that two clients in $S$ with similar $z_j(\cdot)$ values form a large angle at $j$. Suppose $\angle j_{n'}jj_n=\beta<\frac{\pi}{2}+\alpha-\phi_r$ for some $n<n'$ with $\frac{z_j(j_{n'})}{z_j(j_n)}\in\big[\frac{1}{1.001},1.001\big]$. Then for any point $x\in T_{jj_n}$, it holds that $d(j,x)\ge0.999\,z_j(j_n)>0.998\,z_j(j_{n'})$ and $\pi-\angle j_{n'}jx<\beta+\phi_r<\frac{\pi}{2}+\alpha$, i.e., $\angle j_{n'}jx>\frac{\pi}{2}-\alpha$. Also, the quadratic function $t\mapsto t^2-2\,d(j,j_{n'})\,t\sin\alpha$ is non-decreasing in the relevant range of $t=d(j,x)$: this follows from Lemma 12, which ensures that $d(j,x)>0.998\,z_j(j_{n'})\ge0.998\cdot0.99\,C_j\ge0.98\,\theta s\ge2(1+\delta)s\sin\alpha\ge d(j,j_{n'})\sin\alpha$. Therefore, since $j_{n'}$ has a small remote arm,

$d(j_{n'},x)^2=d(j,j_{n'})^2+d(j,x)^2-2\,d(j,j_{n'})\,d(j,x)\cos\angle j_{n'}jx$
$>d(j,j_{n'})^2+d(j,x)^2-2\,d(j,j_{n'})\,d(j,x)\sin\alpha$
$\ge d(j,j_{n'})^2+\big(0.998\,z_j(j_{n'})\big)^2-2\,d(j,j_{n'})\cdot0.998\,z_j(j_{n'})\sin\alpha$
$>\big(C^*_{j_{n'}}+F_{j_{n'}}\big)^2=v_{j_{n'}}^2,$

which implies $T_{jj_n}\cap(\mathcal C_{j_{n'}}\cup\mathcal D_{j_{n'}})=\emptyset$, contradicting the result above. Refer to Figure 4.

According to Rankin [17], the maximum number of disjoint spherical caps, each with an angular radius of $\frac{\pi}{4}+\frac{\alpha-\phi_r}{2}$, is at most $1+\csc(\alpha-\phi_r)$ in any dimension. Since $0.99\,C_j\le z_j(j')\le M_j$ holds, it is feasible to segment this range into successive subranges $[0.99C_j,\,1.001\cdot0.99C_j]$, $[1.001\cdot0.99C_j,\,1.001^2\cdot0.99C_j]$, ..., up to $[M_j/1.001,\,M_j]$. Thus each subrange can only contain a finite number of clients with a small remote arm. Moreover, the normal center condition ensures a bounded number of such subranges. Consequently, the cardinality of $S$ is at most

$|S|\le\Big(1+\frac{1}{\sin(\alpha-\phi_r)}\Big)\cdot\frac{\log\frac{M_j}{0.99\,C_j}}{\log1.001}\le\Big(1+\frac{1}{\sin(\alpha-\phi_r)}\Big)\cdot\frac{\log\frac{1-\theta}{0.99\,\theta}}{\log1.001}.$

Therefore, if we have a large $S\subseteq N_j^-$ with small remote arms, there must exist a large subset of $S$ whose pairwise angles are small, creating a dense region.

Lemma 15.

For a normal center $j$, let $S$ be a set of clients from $N_j^-$ with a small remote arm. If $S\neq\emptyset$, then there exists a client $j'\in S$ for which $\mathrm{Saving}(j')\ge\frac{s}{125}\cdot\frac{|S|}{M}$.

Proof.

Let $S'$ be a maximal subset of $S$ in which any two clients are separated by an angle greater than $\frac{1}{100}$ with respect to the center $j$. Then, for any client $j'\in S$, there exists a client $z_{j'}\in S'$ such that $\angle j'jz_{j'}\le\frac{1}{100}$. Therefore, there exists a $z\in S'$ such that the number of clients $j'\in S$ for which $z_{j'}=z$ is at least $\frac{|S|}{|S'|}$.

For $1\le n\le5$, let $R_n$ be the region

$R_n=\Big\{x\in\mathbb{R}^l\ \Big|\ \angle zjx\le\frac{1}{100},\ \frac{2n-2}{5}(1+\delta)s\le d(j,x)\le\frac{2n}{5}(1+\delta)s\Big\}.$

Therefore, there exists an index $k$ such that at least $\frac{|S|}{5|S'|}$ of these clients lie in $R_k$, i.e., $|R_k|\ge\frac{|S|}{5|S'|}$. For any two clients $j_1,j_2\in R_k$, the distance $d(j_1,j_2)$ is bounded by the sum of their radial and angular differences. Hence, $d(j_1,j_2)\le\frac{2(1+\delta)s}{5}+\frac{2(1+\delta)s}{50}=\frac{11}{25}(1+\delta)s$. From the triangle inequality and the homogeneity condition,

$\mathrm{cost}_{j_1}(j_2)\ \le\ d(j_1,j_2)+M_{j_1}\ \le\ \frac{36}{25}(1+\delta)s\ \le\ \frac{3-\varepsilon_1}{2}(1+\delta)s$
$\le\ (1-\varepsilon_1)C_{j_2}+2M_{j_2}\ \le\ (1-\varepsilon_1)C_{j_2}+(3-\gamma)M_{j_2}+(\gamma-1)D_{j_2},$

which implies $j_2\in N_{j_1}^+$. By Theorem 14, $|S'|\le M$. Therefore, the saving of any client $j'\in R_k$ is at least

$\mathrm{Saving}(j')\ \ge\ \frac{|S|}{5|S'|}\Big((1-\varepsilon_1)C_{j_2}+(3-\gamma)M_{j_2}+(\gamma-1)D_{j_2}-\frac{36}{25}(1+\delta)s\Big)$
$\ge\ \frac{|S|}{5M}\cdot\frac{3-72\delta-25\varepsilon_1}{50}\,s\ \ge\ \frac{s}{125}\cdot\frac{|S|}{M}.$

Given these geometric tools, Theorem 10 follows, since the number of big-remote-arm neighbors of the chosen center $j$ can be bounded.

The proof of Theorem 10 appears in the extended version of this paper [14].

4 Clustering for Homogeneous Instances

In this section, we present an algorithm that operates on a connection-dominant homogeneous instance, ensuring strictly better performance than the naive greedy clustering strategy. Let $c(i)$ for $i\in\mathcal C$ denote the cluster center of $i$ when a clustering is given by the context.

Theorem 16.

Suppose a homogeneous network $G=((\mathcal F,\mathcal C),E)$ satisfies $C>K_4F$. Then the clustering produced by Algorithm 2 is good on average. Precisely, for $\varepsilon_3=3\times10^{-32}$ and every $\gamma\in(1.6,2)$, the following holds:

$\sum_{j\in\mathcal C}\mathrm{cost}_{c(j)}(j)\le\sum_{j\in\mathcal C}\big((1-\varepsilon_3)C_j+(3-\gamma)M_j+(\gamma-1)D_j\big).$
Algorithm 2 homogeneous: Homogeneous Clustering.

For a cluster $A$ produced by the algorithm, if the center of $A$ is weird, then the connection-to-facility ratio of $A$ is at most $K_6$, since all clients within it are weird. If $A$ is not "good", its connection-to-facility ratio is at most $K_5$ by Theorem 10. The assumed ratio $K_4$ in this theorem is larger than both of these values, implying that a constant proportion of the clusters are "good", since they satisfy the conditions of Theorem 10. Therefore, by scaling down the $\varepsilon$ value by that proportion, the desired result can be obtained.

Proof.

Divide $\mathcal C$ into three groups:

1. Clients clustered by a normal center, where the ratio of that cluster's connection cost to facility cost is greater than $K_5$.

2. Clients clustered by a normal center, where this ratio is at most $K_5$.

3. Clients clustered by a weird center.

Let $S_n$ be the set of clients in the $n$-th group ($1\le n\le3$). Here, $S_1$ and $S_2$ correspond to clusters formed by the if-branch of Algorithm 2, and $S_3$ to those formed by the else-branch. Define the following values:

$C_n=\sum_{j\in S_n}C^*_j,\qquad F_n=\sum_{j\in S_n}F_j.$

By Theorem 10, $\sum_{j\in S_1}\mathrm{cost}_{c(j)}(j)\le\sum_{j\in S_1}\big((1-\varepsilon_2)C_j+(3-\gamma)M_j+(\gamma-1)D_j\big)$. The rerouting cost for $S_2$ is bounded only via the homogeneity condition: for a client $j$, $\mathrm{cost}_{c(j)}(j)\le C_{c(j)}+M_{c(j)}+(2-\gamma)M_j+(\gamma-1)D_j\le(1+\delta)(C_j+M_j)+(2-\gamma)M_j+(\gamma-1)D_j$. Lastly, clients in $S_3$ are clustered through the greedy strategy; thus $\mathrm{cost}_{c(j)}(j)\le C_j+(3-\gamma)M_j+(\gamma-1)D_j$. Note that $S_3$ consists solely of weird clients, meaning $C_3\le K_6F_3\le K_5F_3$.

From the above, the total rerouting cost is bounded by:

$\sum_{j\in\mathcal C}\mathrm{cost}_{c(j)}(j)=\sum_{j\in S_1}\mathrm{cost}_{c(j)}(j)+\sum_{j\in S_2}\mathrm{cost}_{c(j)}(j)+\sum_{j\in S_3}\mathrm{cost}_{c(j)}(j)$
$\le\sum_{j\in S_1}\big((1-\varepsilon_2)C_j+(3-\gamma)M_j+(\gamma-1)D_j\big)$
$+\sum_{j\in S_2}\big((1+\delta)(C_j+M_j)+(2-\gamma)M_j+(\gamma-1)D_j\big)$
$+\sum_{j\in S_3}\big(C_j+(3-\gamma)M_j+(\gamma-1)D_j\big).$

Therefore, it is sufficient to show that

$(\delta+\varepsilon_3)\sum_{j\in S_2}(C_j+M_j)+\varepsilon_3\sum_{j\in S_3}C_j\ \le\ (\varepsilon_2-\varepsilon_3)\sum_{j\in S_1}C_j.$

Note that $C_2+C_3\le K_5(F_2+F_3)<\frac{K_5}{K_4}C$ holds, which means $C_1>(1-\frac{K_5}{K_4})C$. Also $C_1>K_4F_1$, since $C_2+C_3\le K_5(F_2+F_3)$. Then the following holds:

$(\delta+\varepsilon_3)\sum_{j\in S_2}(C_j+M_j)+\varepsilon_3\sum_{j\in S_3}C_j$
$\le(\delta+\varepsilon_3)\sum_{j\in S_2\cup S_3}(C_j+M_j)\le(\delta+\varepsilon_3)\sum_{j\in\mathcal C}(C_j+D_j)$
$\le(\delta+\varepsilon_3)\big(2C+(2-\gamma)F\big)$
$\le(\delta+\varepsilon_3)\,\frac{2K_4+2-\gamma}{K_4}\,C$
$\le(\varepsilon_2-\varepsilon_3)\,\frac{(K_5-\gamma+1)(K_4-K_5)}{K_4K_5}\,C\le(\varepsilon_2-\varepsilon_3)\,\frac{K_5-\gamma+1}{K_5}\,C_1$
$\le(\varepsilon_2-\varepsilon_3)\big(C_1-(\gamma-1)F_1\big)\le(\varepsilon_2-\varepsilon_3)\sum_{j\in S_1}C_j.$

5 Clustering for Connection-dominant Instances

In this section, we introduce an algorithm that operates on connection-dominant instances without a homogeneity condition. This algorithm uses Algorithm 1 and Algorithm 2 as subroutines. The theorem below states our main result.

Theorem 17.

For any connection-dominant instance, i.e., $C>K_1F$, there exists an algorithm that finds a clustering configuration whose rerouting cost is at most

$\sum_{j\in\mathcal C}\mathrm{cost}_{c(j)}(j)\le\sum_{j\in\mathcal C}\big((1-\varepsilon_5)C_j+(3-\gamma)M_j+(\gamma-1)D_j\big)$

for $\varepsilon_5=2\times10^{-41}$ and every $\gamma\in(1.6,2)$.

Definition 18.

Let $B_0$ be the set of clients for which $C_j+M_j=0$. Let $s=\min_{j\in\mathcal C\setminus B_0}(C_j+M_j)$. For $\delta=7\times10^{-32}$, let the block $B_n$ be the set of clients such that

$B_n=\{j\in\mathcal C\mid(1+\delta)^{n-1}s\le C_j+M_j<(1+\delta)^ns\}.$

Note that if $(1+\delta)^m\le1+3\times10^{-23}$ (the homogeneity parameter of Definition 5), then a network composed of at most $m$ consecutive blocks is still homogeneous. However, applying Algorithm 2 directly to each block would be impossible: a block may be facility-dominant, and neighbors of a center from some block may belong to a different block. Moreover, the following observation implies that the neighbor relationships between consecutive blocks are the main concern.
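Computing the block decomposition is one logarithm per client. A caveat for the sketch below: the paper's $\delta=7\times10^{-32}$ is far below double precision ($1+\delta$ rounds to $1$ in float64), so the demo uses a coarser, hypothetical $\delta$; the logic is otherwise the definition verbatim.

```python
import math

# The paper's delta = 7e-32 underflows in float64; use a coarser demo value.
DELTA = 1e-6

def blocks(CM):
    """CM[j] = C_j + M_j.  Returns {n: set of clients in block B_n}; B_0 holds
    clients with C_j + M_j = 0, and B_n = {j : (1+d)^(n-1) s <= CM[j] < (1+d)^n s}."""
    s = min(v for v in CM.values() if v > 0)
    out = {}
    for j, v in CM.items():
        n = 0 if v == 0 else 1 + math.floor(math.log(v / s) / math.log1p(DELTA))
        out.setdefault(n, set()).add(j)
    return out
```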

Observation 19.

If two neighbors $j,j'\in\mathcal C$ belong neither to the same block nor to consecutive blocks, so that $(1+\delta)(C_{j'}+M_{j'})\le C_j+M_j$, then $j\in N_{j'}^+$ holds whenever the parameter $\varepsilon$ in the $\mathrm{Saving}(\cdot)$ criterion satisfies $\varepsilon\le\frac{2\delta}{1+\delta}$.

Proof.

$\mathrm{cost}_{j'}(j)\ \le\ C_{j'}+M_{j'}+(2-\gamma)M_j+(\gamma-1)D_j\ \le\ \frac{C_j+M_j}{1+\delta}+(2-\gamma)M_j+(\gamma-1)D_j$
$\le\ \Big(1-\frac{2\delta}{1+\delta}\Big)C_j+(3-\gamma)M_j+(\gamma-1)D_j.$

A key idea is that with a sufficiently small $\delta$, the support graph can be segmented almost arbitrarily while still preserving the homogeneity condition, enabling the identification of weak connections between consecutive blocks. Notably, even if two clients $j\in B_n$ and $j'\in B_{n+1}$ come from consecutive blocks, $\mathrm{cost}_{j'}(j)$ is no worse than under the greedy strategy.

Definition 20.

An interval $I$ is a set of consecutive blocks, containing up to a maximum of $2L$ blocks for $L=2\times10^8$. Precisely, $I=\{B_i,\dots,B_{i+k}\}$ for $k<2L$, where $\sum_{j=i}^{i+k}C(B_j)-C(B_i)\ge K_3\sum_{j=i}^{i+k}F(B_j)$ (with $C(B)$ and $F(B)$ denoting the total fractional connection and facility cost of the clients in block $B$), except when $k=0$. The size of an interval $I$ is the number of blocks it contains, denoted $|I|$. The reward of interval $I$ is defined as $R(I)=\sum_{j=i}^{i+k}C(B_j)-C(B_i)$.

Note that a single block is the unique type of interval with size 1 (the exceptional case $k=0$). Also, $(1+\delta)^{2L}\le1+3\times10^{-23}$ holds, so every interval is homogeneous. An interval is the basic unit of clustering to which Algorithm 1 and Algorithm 2 are applied. Therefore, the reward of an interval represents the extent to which a guaranteed Saving can be obtained when the current interval can be clustered using Algorithm 2, regardless of how the preceding intervals have been clustered. From this perspective, the first block, which does not contribute to the reward, serves as a kind of "buffer".

Given the entire support graph $G=((\mathcal F,\mathcal C),E)$ and a subset $\mathcal C'\subseteq\mathcal C$, let $G_{\mathcal C'}$ be the subgraph (network) induced by $\mathcal C'$. Consequently, when we call Algorithm 1 or 2 on $G_{\mathcal C'}$, the calculations of Saving and Spending are performed solely with respect to the implicitly defined client set $\mathcal C'$. We denote Algorithm 1 as greedy and Algorithm 2 as homogeneous. For simplicity, when $I$ denotes an interval, we interpret the expression $j\in I$ as ranging over all clients within the blocks of $I$. The following lemma shows how the reward is related to Saving. The proof appears in the extended version of this paper [14].

Lemma 21.

Let $J$ be a set of non-overlapping intervals. Then the clustering produced by Algorithm 3 is good on average. Precisely, for $\varepsilon_4=2\times10^{-36}$,

$\sum_{j\in\mathcal C}\mathrm{cost}_{c(j)}(j)\le\sum_{j\in\mathcal C}\bigg(\Big(1-\frac{\sum_{I\in J}R(I)}{\sum_{j\in\mathcal C}C_j}\,\varepsilon_4\Big)C_j+(3-\gamma)M_j+(\gamma-1)D_j\bigg).$
Algorithm 3 conn: Connection-dominant Clustering.

In conclusion, it suffices to find a set of non-overlapping intervals $J$ with a high total reward $\sum_{I\in J}R(I)$. Here we briefly sketch the idea. We iterate through the blocks in reverse order ($B_N,\dots,B_0$), cutting out suitable ranges that satisfy the interval conditions. Fix a point $r$ to be the right end of the interval, and expand the left end of the current range one block at a time until the range is suitable for processing. Suppose, at some point, the $C/F$ value of the current range drops below $K_2$, which is strictly less than the ratio $K_1$ of the input instance. Then this range can be considered a minor part and can be excluded, as there is no need for it to become an interval. Therefore, we only consider cases where the current range has a $C/F$ value of at least $K_2$.

However, consider the situation where the current range's reward is small, meaning that the first block must account for most of the $C$. In this case, if we keep expanding the range to the left, we will eventually reach the initial block $B_0$, satisfying the interval condition. There remains a subtle issue that the range's length may exceed $2L$ during the expansion, but this implies that the $C$ value on the left side of the range keeps growing exponentially compared to the right side. This can be resolved by appropriately trimming the right side of the range.

Lemma 22.

For any support graph $G$ that is connection-dominant, i.e., $C>K_1F$, the set of non-overlapping intervals $J$ obtained by Algorithm 4 satisfies $\sum_{I\in J}R(I)\ge\frac{1}{10^5}\,C$.

Algorithm 4 cutinterval: Find a set of non-overlapping intervals J with large rewards.

The proof appears in the extended version of this paper [14].

By directly applying Lemma 21 and Lemma 22, Theorem 17 can be proved.

Algorithm 5 Overall bi-factor clustering process.

6 Improved Bifactor Approximation

In this section, we present an improved bifactor approximation algorithm for UFL, proving Theorem 1. To deal with facility-dominant instances, we employ the JMS algorithm [12], which is known to be a $(1.11,1.7764)$-approximation algorithm.

Now we show that Algorithm 5 satisfies Theorem 1. Let $\gamma_0\approx1.6774$ be a solution of the equation below:

$\frac{1}{e}+\frac{1}{e^{\gamma}}-(\gamma-1)\Big(1-\frac{1}{e}+(1-\varepsilon_5)\frac{1}{e^{\gamma}}\Big)=0.$

To prove Theorem 1, we consider two cases. When $C\le K_1F$, the inequality $1.11F+1.7764C\le1.6774F+(1+2e^{-1.6774}-\varepsilon_6)C$ holds. For the case $C>K_1F$, we can employ the same proof structure as the corresponding statement in [3] (Theorem 4.3 of [3]). The proof appears in the extended version of this paper [14].
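The facility-dominant case reduces to arithmetic. The following check, at the boundary $C=K_1F$ where the comparison is tightest, confirms the claimed inequality with room to spare (the slack is what absorbs $\varepsilon_6$):

```python
import math

K1 = 1.3025
lam_f, lam_c = 1.6774, 1 + 2 * math.exp(-1.6774)   # lam_c = 1.3735...
# On the worst-case boundary C = K1 * F (normalize F = 1):
F, C = 1.0, K1
jms = 1.11 * F + 1.7764 * C                        # JMS cost bound
ours = lam_f * F + lam_c * C                       # target bound at eps6 = 0
print(jms <= ours, ours - jms)                     # True, slack ~ 0.043
```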

7 Improved Unifactor Approximation

In this section, we present an algorithm with a better unifactor approximation guarantee than that of Li [15], proving Theorem 2.

Framework of [15].

Li showed that a hard instance for a certain γ might not be a hard instance for another value of γ. This suggests that selecting a γ value at random could improve the expected performance of the algorithm.

They introduced a characteristic function $h:(0,1]\to\mathbb{R}_{\ge0}$ to represent the distribution of distances between a client and its neighboring facilities. For a client $j\in\mathcal C$, assume $i_1,i_2,\dots,i_k$ are the facilities within $\mathcal C_j\cup\mathcal D_j$, ordered by increasing distance from $j$. Then, for $0<p\le1$, $h_j(p)$ is defined as $d(j,i_t)$, where $t$ is the smallest index satisfying $\sum_{l=1}^ty_{i_l}\ge p$. They also improved the bound on the rerouting cost to $d(j,\mathcal C_{j'}\setminus(\mathcal C_j\cup\mathcal D_j))\le(2-\gamma)M_j+(\gamma-1)D_j+C_{j'}+M_{j'}$, where $j'$ is the cluster center of $j$. For fixed $\gamma$, the expected connection cost of $j$ under the aforementioned algorithm of [3] is at most

$\mathbb{E}[C_j]\le\int_0^1h_j(p)\,e^{-\gamma p}\gamma\,dp+e^{-\gamma}\Big(\gamma\int_0^1h_j(p)\,dp+(3-\gamma)\,h_j\big(\tfrac{1}{\gamma}\big)\Big).$
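The characteristic function $h_j$ is simply a quantile function of the facility-distance distribution around $j$; a minimal sketch (with a made-up input format of (distance, y) pairs):

```python
def h_j(p, facilities):
    """facilities: list of (distance, y_i) pairs over C_j u D_j with total mass 1.
    h_j(p) = distance of the smallest prefix (sorted by distance) with mass >= p."""
    cum = 0.0
    for dist, y in sorted(facilities):
        cum += y
        if cum >= p - 1e-12:
            return dist
    return sorted(facilities)[-1][0]   # guard against rounding in the last step

# e.g. h_j(0.3, [(1.0, 0.5), (2.0, 0.5)]) -> 1.0 ; h_j(0.7, ...) -> 2.0
```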

Therefore, the analysis of the approximation ratio can be modeled as a zero-sum game. The characteristic function for the whole instance is given by $h(p)=\sum_{j\in\mathcal C}h_j(p)$. Assuming $h$ is normalized so that $\int_0^1h(p)\,dp=1$, the algorithm proceeds as follows: with probability $\kappa$, it employs the JMS algorithm [12]; Mahdian et al. [16] proved that the JMS algorithm achieves a $(1.11,1.7764)$-approximation. Otherwise, $\gamma$ is sampled randomly from a distribution $\mu:(1,\infty)\to\mathbb{R}_{\ge0}$, ensuring that $\kappa+\int_1^\infty\mu(\gamma)\,d\gamma=1$. Thus, the value of the zero-sum game, i.e., the approximation ratio of the algorithm under a fixed strategy $(\kappa,\mu)$, is calculated as follows:

$\nu(\kappa,\mu,h)=\max\Big\{\int_1^\infty\gamma\,\mu(\gamma)\,d\gamma+1.11\kappa,\ \int_1^\infty\alpha(\gamma,h)\,\mu(\gamma)\,d\gamma+1.7764\kappa\Big\}$

where

$\alpha(\gamma,h)=\int_0^1h(p)\,e^{-\gamma p}\gamma\,dp+e^{-\gamma}\Big(\gamma+(3-\gamma)\,h\big(\tfrac{1}{\gamma}\big)\Big).$

Moreover, for a given probability density function $\mu$ for $\gamma$, it can be shown that the characteristic function of the hardest instance is a threshold function, defined as

$h_q(p)=\begin{cases}\frac{1}{1-q}&\text{for }p>q\\0&\text{for }p\le q\end{cases}$

for some $0\le q<1$. This means that the final approximation ratio for a given $\mu$ is $\max_{0\le q<1}\nu(\kappa,\mu,h_q)$. In [15], the suggested distribution for $\alpha_{\mathrm{Li}}$ is $\mu(p)=\theta\,D(p-\gamma_1)+\frac{1-\kappa-\theta}{\gamma_2-\gamma_1}\cdot\mathbb{1}[\gamma_1<p<\gamma_2]$, where $D$ is the Dirac delta function, $\gamma_1=1.479311$, $\gamma_2=2.016569$, $\theta=0.503357$, and $\kappa=0.195583$.
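Plugging Li's strategy into the game value is a small numerical exercise. The sketch below evaluates $\nu(\kappa,\mu,h_q)$ over a grid of thresholds $q$, using the closed form of $\int_0^1h_q(p)e^{-\gamma p}\gamma\,dp$; with the stated constants the facility term alone is $\theta\gamma_1+\frac{1-\kappa-\theta}{\gamma_2-\gamma_1}\int_{\gamma_1}^{\gamma_2}\gamma\,d\gamma+1.11\kappa\approx1.488$, and the maximum over $q$ should come out near $\alpha_{\mathrm{Li}}\approx1.488$ (this is a sanity check of the framework, not of our improvement).

```python
import math
from scipy.integrate import quad

g1, g2, theta, kappa = 1.479311, 2.016569, 0.503357, 0.195583

def alpha(g, q):
    """alpha(gamma, h_q): closed form of the integral against the threshold h_q."""
    body = (math.exp(-g * q) - math.exp(-g)) / (1 - q)  # gamma * int_q^1 e^{-gp}/(1-q) dp
    tail = math.exp(-g) * (g + (3 - g) * (1 / (1 - q) if 1 / g > q else 0.0))
    return body + tail

def nu(q):
    uni = (1 - kappa - theta) / (g2 - g1)               # density of the uniform part
    fac = theta * g1 + uni * (g2 ** 2 - g1 ** 2) / 2 + 1.11 * kappa
    conn = (theta * alpha(g1, q)
            + uni * quad(lambda g: alpha(g, q), g1, g2)[0]
            + 1.7764 * kappa)
    return max(fac, conn)

print(max(nu(k / 1000) for k in range(1000)))           # ~ 1.488
```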

Our Improvement.

By using Algorithm 5, the expected connection cost for a given $\gamma$ and characteristic function $h$ becomes:

$\alpha(\gamma,h)=\begin{cases}\int_0^1h(p)\,e^{-\gamma p}\gamma\,dp+e^{-\gamma}\big(\gamma(1-\varepsilon_7)+(3-\gamma)h(\frac{1}{\gamma})\big)&\text{if }1.6<\gamma<2\\\int_0^1h(p)\,e^{-\gamma p}\gamma\,dp+e^{-\gamma}\big(\gamma+(3-\gamma)h(\frac{1}{\gamma})\big)&\text{otherwise.}\end{cases}$

Therefore, we define a new zero-sum game value as follows, assuming $h$ is scaled so that $\int_0^1h(p)\,dp=1$:

$\nu(\kappa,\mu,h)=\max\Big\{\int_1^\infty\gamma\,\mu(\gamma)\,d\gamma+1.11\kappa,\ \int_1^\infty\alpha(\gamma,h)\,\mu(\gamma)\,d\gamma+1.7764\kappa\Big\}.$

Note that $\alpha$ is still linear in $h$. Therefore, even though the game definition changes, the adversary's choice of the characteristic function $h$ remains a threshold function $h_q$ for some $0\le q<1$. Given that there is a positive probability of sampling $\gamma$ between 1.6 and 2, it is possible to achieve a lower cost.

Lemma 23.

Let $\mu_2(\gamma)=(1-\varepsilon_7)\mu_1(\gamma)+\varepsilon_7(1-\kappa_2)\,D(\gamma-\gamma_1)$, where $D$ is the Dirac delta function (and $\mu_1$, $\kappa_2$ form the strategy of [15] described above). Then the following holds:

$\max_{0\le q<1}\max\Big\{\int\gamma\,\mu_2(\gamma)\,d\gamma+1.11\kappa_2,\ \int\alpha(\gamma,h_q)\,\mu_2(\gamma)\,d\gamma+1.7764\kappa_2\Big\}$
$<\ \max_{0\le q<1}\max\Big\{\int\gamma\,\mu_1(\gamma)\,d\gamma+1.11\kappa_2,\ \int\alpha(\gamma,h_q)\,\mu_1(\gamma)\,d\gamma+1.7764\kappa_2\Big\}-\frac{\varepsilon_7}{1000}.$

The proof appears in the extended version of this paper [14].

Therefore, we present an improved unifactor approximation algorithm, Algorithm 6.

Algorithm 6 Overall uni-factor clustering process.

When $C\le K_1F$, the inequality $1.11F+1.7764C\le1.487F+1.487C$ is satisfied. For the case $C>K_1F$, by Lemma 23, Algorithm 6 yields an improved unifactor approximation with $\varepsilon_8=\frac{\varepsilon_7}{1000}=2\times10^{-45}$.

8 APX-Hardness

In this section, we prove that UFL is APX-hard in Euclidean spaces. We use the following result of Austrin, Khot, and Safra [2]. For $\rho\in[-1,+1]$ and $\mu\in[0,1]$, let $\Gamma_\rho(\mu):=\Pr[X\le\Phi^{-1}(\mu)\wedge Y\le\Phi^{-1}(\mu)]$, where $X,Y$ are standard Gaussian random variables with covariance $\rho$ and $\Phi$ is the cumulative distribution function of the standard normal distribution.

Theorem 24.

Assuming the Unique Games Conjecture, for any $q\in(0,1/2)$ and $\varepsilon>0$, it is NP-hard, given a graph $G=(V,E)$, to distinguish between the following two cases.

  • (Completeness) G contains an independent set of size q|V|.

• (Soundness) For any $T\subseteq V$, the number of edges with both endpoints in $T$ is at least $|E|\big(\Gamma_{q/(1-q)}(\mu)-\varepsilon\big)$, where $\mu=|T|/|V|$.

Fix an arbitrary $q\in(0,1/2)$. Without loss of generality, assume $V=[n]$. Also let $m:=|E|$. Our UFL instance has $V$ as the set of facilities and $E$ as the set of clients. The ambient Euclidean space is $\mathbb{R}^n$; let $e_i$ be the $i$th standard unit vector (i.e., $(e_i)_i=1$ and $(e_i)_j=0$ for every $j\neq i$). Each facility $i\in V$ is located at $e_i$ and each client $(i,j)\in E$ is located at $e_i+e_j$. Finally, let $\lambda$ be the common facility opening cost for every $i$, to be determined. This finishes the description of the UFL instance.
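The reduction is easy to instantiate for a concrete graph; the sketch below builds the point set and confirms that only the two distances $1$ and $\sqrt3$ occur ($\lambda$ is left as a parameter, chosen as $\Theta(m/n)$ in the analysis below).

```python
import numpy as np

def ufl_instance(n, edges, lam):
    """Facilities at e_i for i in [n]; one client at e_i + e_j per edge (i, j);
    every facility has opening cost lam."""
    facilities = np.eye(n)
    clients = np.array([facilities[i] + facilities[j] for i, j in edges])
    return facilities, clients, lam

# Example: a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
F, Cl, lam = ufl_instance(4, edges, lam=0.5)
dists = {round(float(np.linalg.norm(c - f)), 6) for c in Cl for f in F}
print(dists)   # {1.0, 1.732051}: every client-facility distance is 1 or sqrt(3)
```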

In the completeness case, there is an independent set $U$ of size $qn$. We open $V\setminus U$. Since $V\setminus U$ is a vertex cover, every client in $E$ has an open facility at distance 1, so the total cost is

$\lambda(1-q)n+m.$

In the soundness case, consider any solution that opens $S\subseteq V$, and let $T=V\setminus S$ and $\mu=|T|/n$. By the soundness guarantee, at least $m\big(\Gamma_{q/(1-q)}(\mu)-\varepsilon\big)$ clients do not have an open facility at distance 1. Since every client-facility distance is either $1$ or $\sqrt3$, the total cost is at least

$\lambda(1-\mu)n+m\Big(1+(\sqrt3-1)\big(\Gamma_{q/(1-q)}(\mu)-\varepsilon\big)\Big).\qquad(2)$

For fixed $q$, the function $\Gamma_{q/(1-q)}(\mu)$ is a strictly convex function of $\mu$, so if we choose $\lambda$ such that

$-\lambda n+m(\sqrt3-1)\,\frac{d\Gamma_{q/(1-q)}(\mu)}{d\mu}\Big|_{\mu=q}=0,$

then (2) is minimized when $\mu=q$, where it becomes

$\lambda(1-q)n+m\Big(1+(\sqrt3-1)\big(\Gamma_{q/(1-q)}(q)-\varepsilon\big)\Big).$

Furthermore, we can note that

$\lambda=\frac{m}{n}(\sqrt3-1)\,\frac{d\Gamma_{q/(1-q)}(\mu)}{d\mu}\Big|_{\mu=q}=\Theta\Big(\frac{m}{n}\Big).$

Then one can see that the optimal value in the soundness case is at least $m(\sqrt3-1)\big(\Gamma_{q/(1-q)}(q)-\varepsilon\big)$ larger than the optimal value in the completeness case. For a fixed $q$, by choosing $\varepsilon$ sufficiently small, one can ensure that this excess is at least a $\delta$ fraction of the completeness-case optimal value for some constant $\delta>0$, which proves a $(1+\delta)$-hardness of approximation.

9 Conclusion

The most natural open problem is to obtain further improved bifactor or unifactor approximations for Euclidean UFL. Though we show a strict separation between general and Euclidean metrics for bifactor approximation in a certain regime, such a separation is not yet achieved for all regimes of bifactor approximation, nor for unifactor approximation.

Whereas our algorithm is based on the primal rounding approach of [3] and [15], it might be a fruitful research direction to design a variant of the Jain-Mahdian-Saberi (JMS) algorithm [12] (a greedy algorithm analyzed by the dual fitting method) or the Jain-Vazirani (JV) algorithm [13] (a primal-dual algorithm) for further improvement. In particular, as the best unifactor approximations for UFL in both general and Euclidean metrics employ the $(1.11,1.7764)$-approximation of the JMS algorithm as a black box, improving the JMS algorithm would directly yield a better unifactor approximation for UFL. The JV algorithm has already been improved in Euclidean spaces [1, 10, 4], but these improvements do not suffice for UFL.

References

  • [1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. SIAM Journal on Computing, 49(4):FOCS17–97, 2019. doi:10.1137/18M1171321.
  • [2] Per Austrin, Subhash Khot, and Muli Safra. Inapproximability of vertex cover and independent set in bounded degree graphs. Theory of Computing, 7(1):27–43, 2011. doi:10.4086/TOC.2011.V007A003.
  • [3] Jaroslaw Byrka and Karen Aardal. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. SIAM Journal on Computing, 39(6):2212–2231, 2010. doi:10.1137/070708901.
  • [4] Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan. Improved approximations for euclidean k-means and k-median, via nested quasi-independent sets. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1621–1628, 2022. doi:10.1145/3519935.3520011.
  • [5] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. Journal of the ACM (JACM), 68(6):1–34, 2021. doi:10.1145/3477541.
  • [6] Vincent Cohen-Addad and CS Karthik. Inapproximability of clustering in ℓp metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 519–539. IEEE, 2019.
  • [7] Vincent Cohen-Addad, Karthik C S, and Euiwoong Lee. Johnson coverage hypothesis: Inapproximability of k-means and k-median in ℓp-metrics. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1493–1530. SIAM, 2022. doi:10.1137/1.9781611977073.63.
  • [8] Vincent Cohen-Addad Viallat, Fabrizio Grandoni, Euiwoong Lee, and Chris Schwiegelshohn. Breaching the 2 lmp approximation barrier for facility location with applications to k-median. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 940–986. SIAM, 2023.
  • [9] Kishen N Gowda, Thomas Pensyl, Aravind Srinivasan, and Khoa Trinh. Improved bi-point rounding algorithms and a golden barrier for k-median. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 987–1011. SIAM, 2023. doi:10.1137/1.9781611977554.CH38.
  • [10] Fabrizio Grandoni, Rafail Ostrovsky, Yuval Rabani, Leonard J Schulman, and Rakesh Venkat. A refined approximation for euclidean k-means. Information Processing Letters, 176:106251, 2022. doi:10.1016/J.IPL.2022.106251.
  • [11] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of algorithms, 31(1):228–248, 1999. doi:10.1006/JAGM.1998.0993.
  • [12] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 731–740, 2002. doi:10.1145/509907.510012.
  • [13] Kamal Jain and Vijay V Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM (JACM), 48(2):274–296, 2001. doi:10.1145/375827.375845.
  • [14] Euiwoong Lee and Kijun Shin. Facility location on high-dimensional euclidean spaces, 2024. arXiv:2501.18105.
  • [15] Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Information and Computation, 222:45–58, 2013. doi:10.1016/J.IC.2012.01.007.
  • [16] Mohammad Mahdian, Yinyu Ye, and Jiawei Zhang. A 1.52-approximation algorithm for the uncapacitated facility location problem. In Proc. of APPROX, pages 229–242, 2002.
  • [17] Robert Alexander Rankin. The closest packing of spherical caps in n dimensions. Glasgow Mathematical Journal, 2(3):139–144, 1955.