
Dimension-Free Parameterized Approximation Schemes for Hybrid Clustering

Ameet Gadekar, CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Tanmay Inamdar, Indian Institute of Technology Jodhpur, India
Abstract

Hybrid k-Clustering is a model of clustering that generalizes two of the most widely studied clustering objectives: k-Center and k-Median. In this model, given a set of n points P, the goal is to find k centers such that the sum of the r-distances of each point to its nearest center is minimized. The r-distance between two points p and q is defined as $\max\{\mathsf{dist}(p,q) - r,\, 0\}$ – this represents the distance of p to the boundary of the r-radius ball around q if p is outside the ball, and 0 otherwise. This problem was recently introduced by Fomin et al. [APPROX 2024], who designed a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation that runs in time $2^{(kd/\varepsilon)^{O(1)}} \cdot n^{O(1)}$ for inputs in $\mathbb{R}^d$; such a bicriteria solution uses balls of radius $(1+\varepsilon)r$ instead of $r$, and has cost at most $1+\varepsilon$ times the cost of an optimal solution using balls of radius $r$.

In this paper we significantly improve upon this result by designing an approximation algorithm with the same bicriteria guarantee, but with a running time that is FPT only in k and ε – crucially, removing the exponential dependence on the dimension d. This resolves an open question posed in their paper. Our results extend further in several directions. First, our approximation scheme works in a broader class of metric spaces, including doubling spaces, minor-free metrics, and bounded treewidth metrics. Secondly, our techniques yield similar bicriteria FPT approximation schemes for other variants of Hybrid k-Clustering, e.g., when the objective features the sum of z-th powers of the r-distances. Finally, we also design a coreset for Hybrid k-Clustering in doubling spaces, answering another open question from the work of Fomin et al.

Keywords and phrases:
Clustering, Parameterized algorithms, FPT approximation, k-Median, k-Center
Funding:
Ameet Gadekar: Partially supported by the Israel Science Foundation (grant No. 1042/22).
Tanmay Inamdar: Supported by IITJ Research Initiation Grant (grant number I/RIG/TNI/20240072).
Copyright and License:
© Ameet Gadekar and Tanmay Inamdar; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Facility location and clustering; Theory of computation → Fixed parameter tractability
Related Version:
Full Version: https://arxiv.org/abs/2501.03663
Acknowledgements:
Tanmay would like to thank his co-authors from [23] – specifically Fedor V. Fomin – for formulating the Hybrid k-Clustering problem and introducing him to it. This work was partially carried out while Ameet was at Bar-Ilan University, Israel.
Editors:
Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng

1 Introduction

k-Center, k-Median, and k-Means are among the most popular clustering problems both in theory and practice, with numerous applications in areas such as machine learning [36, 28, 6, 11, 4, 9, 38, 25], facility location problems [32, 41, 30, 29, 35, 16], and computational geometry [7, 34], among others. (These citations are not meant to be comprehensive, but representative – an interested reader may use them as a starting point for the relevant literature.) All of these problems have long been known to be NP-hard, even in the plane [22, 40, 37]. To cope with these intractability results, there have been several decades of research on designing approximation algorithms for these problems, which have led to polynomial-time constant-factor approximation algorithms for all of them in general metric spaces [27, 20]. Moreover, for more “structured” inputs – such as when the points belong to Euclidean spaces, or planar graph metrics – one can find near-optimal solutions using approximation schemes [24, 17, 19, 5, 34, 7, 2], which include polynomial-time as well as FPT approximation schemes. Furthermore, given the simplicity and ubiquity of these vanilla clustering objectives, the techniques used for obtaining these results have subsequently been used to find clusterings with additional constraints, such as capacities [18], fairness [8], and outliers [31, 3]. One recurring theme in the literature has been to define a unified clustering objective that also captures classical clustering problems like k-Median and k-Center as special cases [42, 12, 15].

Along this line of research, Fomin et al. [23] recently introduced Hybrid k-Clustering. In this problem, we are given an instance $\mathcal{I} = (M=(P,\mathbb{F},\mathsf{dist}), k, r)$, where $M=(P,\mathbb{F},\mathsf{dist})$ is a metric space on the set of clients $P$ and a set of facilities $\mathbb{F}$ with distance function $\mathsf{dist}$. (Fomin et al. [23] only considered the special case of Euclidean inputs, i.e., when $P \subseteq \mathbb{F} = \mathbb{R}^d$; however, the underlying problem can be defined for arbitrary metric spaces.) Here, $k$ is a positive integer denoting the number of clusters, and $r$ is a non-negative real denoting the radius. The goal is to find a set $X \subseteq \mathbb{F}$ of $k$ centers, such that $\mathsf{cost}_r(P,X) := \sum_{p \in P} \mathsf{dist}_r(p,X)$ is minimized – here, for any point $p$, any set $Q \subseteq P \cup \mathbb{F}$, and a real $\alpha$, we define $\mathsf{dist}_\alpha(p,Q) := \max\{\mathsf{dist}(p,Q) - \alpha,\, 0\}$, where $\mathsf{dist}(p,Q) := \min_{q \in Q} \mathsf{dist}(p,q)$. Fomin et al. [23] proposed several motivations for studying this cost function. Indeed, Hybrid k-Clustering can be seen as a shape fitting problem, where one wants to find the “best” set of $k$ balls of radius $r$ that fit the given set of points, where the quality of the solution is determined by the sum of distances of each point to the nearest point on the boundary of the ball – this is analogous to classical regression, where one wants to find the “best” linear function fitting the given set of points. Projective Clustering is a well-studied generalization of linear regression, where the aim is to find $k$ affine spaces that minimize the sum of distances from the points to these spaces (see, e.g., [43]), which is also closely related to the model considered in Hybrid k-Clustering. Furthermore, [23] gave another motivation for studying Hybrid k-Clustering – namely, placing $k$ WiFi routers with identical circular coverage, where the clients that lie outside the coverage area need to travel to the boundary of the nearest ball in order to receive coverage. Finally, the name of the problem is motivated by the fact that the objective is a kind of “hybrid” between the k-Center and k-Median costs, and generalizes both of them. Indeed, as observed in [23], Hybrid k-Clustering with $r=0$ is equivalent to k-Median, while, when $r$ is set to be the optimal radius for k-Center, Hybrid k-Clustering reduces to k-Center. These observations immediately rule out uni-criteria polynomial-time approximation schemes that violate only the cost, or only the radius, by any arbitrary factor $\alpha > 1$ (as formalized in Proposition 1 of [23]). Indeed, such approximation algorithms – even with running times FPT in $k$ – would imply exact FPT algorithms for k-Center and k-Median in low-dimensional continuous Euclidean spaces. To our knowledge, such an algorithm for k-Median is not known (nor, admittedly, is a lower bound), whereas [39] shows W[1]-hardness of k-Center even in $\mathbb{R}^2$. Therefore, given the current state of the art, a bicriteria approximation scheme for Hybrid k-Clustering is essentially the best possible outcome, even if one allows a running time that is FPT in $k$.
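To make the objective concrete, here is a minimal, self-contained sketch (our illustration, not code from [23]) that evaluates $\mathsf{cost}_r(P,X)$ for planar inputs; it also demonstrates the two extreme regimes noted above: $r = 0$ recovers the k-Median cost, while a large enough $r$ drives the cost to $0$, as in k-Center.

```python
# Illustration only: the Hybrid k-Clustering objective for points in the plane.
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dist_r(p, X, r):
    """r-distance of client p to the nearest center in X: max(dist(p,X) - r, 0)."""
    return max(min(dist(p, x) for x in X) - r, 0.0)

def hybrid_cost(P, X, r):
    """cost_r(P, X): sum of the r-distances of all clients to X."""
    return sum(dist_r(p, X, r) for p in P)

P = [(0, 0), (1, 0), (5, 5), (6, 5)]
X = [(0.5, 0.0), (5.5, 5.0)]
print(hybrid_cost(P, X, r=0.0))  # 2.0: with r = 0 this is exactly the k-Median cost
print(hybrid_cost(P, X, r=1.0))  # 0.0: every client is inside some radius-1 ball
```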

For $\alpha, \beta \geq 1$, an $(\alpha,\beta)$-bicriteria approximation for Hybrid k-Clustering is an algorithm that returns a solution $X \subseteq \mathbb{F}$ of size at most $k$ satisfying $\mathsf{cost}_{\beta r}(P,X) \leq \alpha \cdot \mathsf{OPT}_r$. Note that the bicriteria solution $X$ is allowed to consider $\mathsf{dist}_{\beta r}(p,X) = \max\{\mathsf{dist}(p,X) - \beta r,\, 0\}$ instead of $\mathsf{dist}_r(p,X)$, and is also allowed to have cost $\alpha \cdot \mathsf{OPT}_r$, where $\mathsf{OPT}_r$ is the cost of an optimal solution w.r.t. radius $r$, i.e., without any violation of the radius. The main result of Fomin et al. [23] was a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation for inputs in $\mathbb{R}^d$ running in time $2^{(kd/\varepsilon)^{O(1)}} \cdot n^{O(1)}$, where $n = |P|$. An exponential dependence on the dimension $d$ appears inevitable using their approach, although it was not clear whether such a dependence is required. Indeed, for Euclidean k-Median and k-Center, one can obtain dimension-free approximation schemes with FPT dependence only on $k$ and $\varepsilon$ [7, 33, 1]. This naturally motivates the following question, which was explicitly asked as an open question in [23].

Question 1. “An immediate question is whether improving or removing the FPT dependence on the dimension d is possible[…]”

In this work, we answer this question in the affirmative by designing a (randomized) bicriteria FPT Approximation Scheme (FPT-AS) parameterized by $k$ and $\varepsilon$ for the (continuous) Euclidean instances of Hybrid k-Clustering, stated formally in the following theorem. (Such algorithms run in time $f(k,\varepsilon) \cdot n^{O(1)}$ and output a $(1+\varepsilon, 1+\varepsilon)$-bicriteria solution. Another term for FPT-AS is EPAS, which stands for Efficient Parameterized Approximation Scheme.)

Theorem 1 (Bicriteria FPT-AS for Euclidean Spaces).

There exists a randomized algorithm that, given an instance of Hybrid k-Clustering in $\mathbb{R}^d$ for any dimension $d$, runs in time $2^{O(k \log k \cdot (1/\varepsilon^5) \log^2(1/\varepsilon))} \cdot n^{O(1)}$, and returns a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation with high probability.

The algorithm of [23] involves a pre-processing step that separately handles “k-Median-like” and “k-Center-like” instances, using techniques specific to these respective problems. Then, the main algorithm handles the remaining instances that cannot be directly reduced to the respective problems. This approach somewhat undermines the goal of defining a unified problem that captures both of these problems as special cases. In contrast, our algorithm adopts a uniform approach by exploiting the intrinsic structure of the hybrid problem, without the need to handle some instances separately. In fact, our algorithm works for a broader class of metric spaces, called metric spaces of bounded algorithmic scatter dimension – a notion recently introduced by Abbasi et al. [1], and further studied by Bourneuf and Pilipczuk [10]. These metric spaces capture several interesting and well-studied metric spaces, see below. We give a formal definition of the notion of algorithmic scatter dimension in Section 2; however, a good mental picture is to think of them as essentially doubling spaces, i.e., metric spaces with good “packing-covering” properties. (Although this is a good mental picture, it is ultimately inaccurate, since the class of metric spaces of bounded algorithmic scatter dimension is strictly larger than that of doubling spaces. Indeed, continuous Euclidean space of high – $\omega(\log n)$ – dimension does not have bounded doubling dimension, yet it does have bounded algorithmic scatter dimension.) Our general result is stated below.

Theorem 2 (Informal version of Theorem 9).

Hybrid k-Clustering admits a randomized bicriteria FPT-AS in metrics of bounded algorithmic ε-scatter dimension, parameterized by k and ε. In particular, Hybrid k-Clustering admits a randomized bicriteria FPT-AS parameterized by k and ε in continuous Euclidean spaces of any dimension, metrics of bounded doubling dimension, bounded treewidth metrics, and metrics induced by graphs from any fixed proper minor-closed graph class.

We give a technical overview of this result in Section 1.1, and describe the approximation algorithm and its analysis in Section 3.

In their work, [23] also defined the Hybrid (k,z)-Clustering problem, which features the z-th powers of the r-distances, i.e., the objective is to minimize $\mathsf{cost}_r(P,X,z) := \sum_{p \in P} (\mathsf{dist}_r(p,X))^z$, where $z \geq 1$. They designed a bicriteria FPT-AS with a similar running time for Hybrid (k,2)-Clustering, i.e., a hybrid of k-Means and k-Center; but left the possibility of obtaining a similar result for the general case of $z \geq 1$ conditional on the existence of a certain sampling-based approximation algorithm for the (k,z)-clustering problem for Euclidean inputs. Using our approach, we can obtain a bicriteria FPT-AS for any fixed $z \geq 1$ in a unified way, whose running time is independent of the dimension $d$. In fact, our approach works for the much more general problem of Hybrid Norm k-Clustering. We discuss these extensions in Section 4.

Next, we turn to another open direction mentioned in the work of Fomin et al. [23].

Question 2. “Another intriguing question is the design of coresets for Hybrid k-Clustering, which could also have some implications for [Question 1].”

In this paper, we also answer this question affirmatively by designing coresets for Hybrid k-Clustering in metric spaces of bounded doubling dimension. Specifically, we prove the following result.

Theorem 3 (Coreset for Hybrid k-Clustering).

There exists an algorithm that takes as input an instance $\mathcal{I} = ((P,\mathbb{F},\mathsf{dist}), k, r)$ of Hybrid k-Clustering in a doubling metric of dimension $d$ and a parameter $\varepsilon \in (0,1)$, and in time $2^{O(d \log(1/\varepsilon))} |\mathcal{I}|^{O(1)}$ returns a pair $(P', w)$, where $P' \subseteq P$ has size $2^{O(d \log(1/\varepsilon))} \cdot k \cdot \log|P|$, and $w \colon P' \to \mathbb{N}$, such that the following property is satisfied for any $X \subseteq \mathbb{F}$ of size at most $k$:

$|\mathsf{wcost}_r(P',X) - \mathsf{cost}_r(P,X)| \leq \varepsilon \cdot \mathsf{cost}_r(P,X)$

Here, $\mathsf{wcost}_r(P',X) := \sum_{p \in P'} w(p) \cdot \mathsf{dist}_r(p,X)$.

1.1 Technical Overview

In this section, we highlight the technical and conceptual contributions behind our main result (Theorem 2).

A natural direction for designing a dimension-free parameterized approximation algorithm for Hybrid k-Clustering is to leverage insights from the framework of Abbasi et al. [1], who designed dimension-free parameterized approximations for a wide range of clustering problems. However, in Hybrid k-Clustering, the objective is a function of the r-distances, instead of the true distances as in [1]. This introduces several technical challenges. For instance, the most obvious roadblock is that the r-distances do not satisfy the triangle inequality. Moreover, many of the “nice” properties that are enjoyed by the true distances in “structured” metric spaces (e.g., the covering-packing property in doubling spaces, or properties of shortest paths in sparse graphs) do not extend readily to the r-distances. Therefore, we have to overcome several technical and conceptual challenges in order to leverage the ideas presented in the framework of [1]. Let us first recall the key notion from that paper that is crucially used in this framework. (In fact, the algorithm of our paper and of [1] works for a weaker notion called algorithmic scatter dimension. But, for ease of exposition, we work with scatter dimension in this section.)

Scatter Dimension.

Informally, an $\varepsilon$-scattering sequence in a metric space $M = (P,\mathbb{F},\mathsf{dist})$ is a sequence of center-point pairs $(x_1,p_1),\ldots,(x_\ell,p_\ell)$, where $\ell$ is some positive integer and, for $j \in [\ell]$, $x_j \in \mathbb{F}$ and $p_j \in P$, such that $\mathsf{dist}(p_j, x_i) \leq 1$ for all $1 \leq j < i \leq \ell$, and $\mathsf{dist}(p_i, x_i) > 1+\varepsilon$ for all $i \in [\ell]$. The $\varepsilon$-scatter dimension of $M$ is the maximum length of an $\varepsilon$-scattering sequence contained in $M$.
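As a quick sanity check of the definition, the following small routine (our illustration, not from the paper) tests whether a given sequence of center-point pairs forms an $\varepsilon$-scattering sequence with respect to a user-supplied metric.

```python
# Illustration only: check the two conditions defining an eps-scattering sequence.
def is_eps_scattering(pairs, dist, eps):
    """pairs = [(x_1, p_1), ..., (x_l, p_l)]; dist is the metric."""
    for i, (x_i, p_i) in enumerate(pairs):
        if dist(p_i, x_i) <= 1 + eps:    # eps-refutation: dist(p_i, x_i) > 1 + eps
            return False
        for _, p_j in pairs[:i]:         # covering: dist(p_j, x_i) <= 1 for all j < i
            if dist(p_j, x_i) > 1:
                return False
    return True
```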

Now, consider a natural extension of the framework of [1] to Hybrid k-Clustering in a metric space $M$, as follows. Let $\mathsf{OPT}_r$ be the optimal cost of the Hybrid k-Clustering instance corresponding to an optimal solution $O$. The algorithm maintains a cluster constraint $Q_i$ for each cluster $i \in [k]$. Each $Q_i$ consists of a collection of requests of the form $(p, \delta)$, where $p$ is a point and $\delta$ is a distance, representing the demand that $p$ requires a center within distance $\delta$. The algorithm always maintains a solution $X$ such that $x_i \in X$ satisfies the cluster constraint $Q_i$, for all $i \in [k]$. Now consider the sequence of triples $S_i = (x_i^{(1)}, p_i^{(1)}, \delta_i^{(1)}), \ldots, (x_i^{(\ell)}, p_i^{(\ell)}, \delta_i^{(\ell)})$ corresponding to the requests in $Q_i$, where $x_i^{(j)}$ is the $i$-th center maintained by the algorithm just before adding the request $(p_i^{(j)}, \delta_i^{(j)})$ to $Q_i$. If $X$ is a near-optimal solution, then the algorithm terminates successfully and returns $X$. Otherwise, it identifies a point $p \in P$ whose r-distance to $X$ is much larger than its r-distance to $O$. Such a point is called a witness to $X$. A simple averaging argument shows that such a witness can be sampled with high probability. The algorithm then guesses the optimal cluster $i \in [k]$ of $p$ and adds a new request $(p, \delta_p = \mathsf{dist}(p,X)/(1+\varepsilon'))$ to $Q_i$, for some suitable but fixed $\varepsilon'$ depending on $\varepsilon$. The center $x_i$ is recomputed to satisfy the updated cluster constraint $Q_i$, if possible. Otherwise the algorithm reports failure (e.g., if the algorithm failed to sample a witness). The key observation is that the subsequence of pairs $(x_i^{(j_1)}, p_i^{(j_1)}), \ldots, (x_i^{(j_t)}, p_i^{(j_t)})$ corresponding to a fixed radius value forms an $\varepsilon'$-scattering sequence in $M$ (after rescaling). Thus, if the $\varepsilon'$-scatter dimension of $M$ is bounded, then the length of these sequences is also bounded. Furthermore, it can be shown that if the aspect ratio of the radii of requests in $Q_i$, for all $i \in [k]$, is bounded, then the number of iterations of the algorithm can be bounded, yielding an FPT-AS.

Working with the inflated radius.

A major challenge arises because the witness is defined in terms of r-distances, but the requests added to the cluster constraints are based on the true distances. Specifically, we need to ensure that a witness whose r-distance to $X$ is significantly larger than its r-distance to $O$ also has a larger true distance to $X$ than to $O$. This ensures that the request $(p, \mathsf{dist}(p,X)/(1+\varepsilon'))$ is feasible and the algorithm does not fail. However, this condition may not hold, especially when the true distances are close to $r$. In fact, the issue is related to a fundamental barrier identified by [23], assuming standard conjectures: such a framework could potentially yield a near-optimal solution that does not violate the radius, contradicting Proposition 1 in [23]. To overcome this fundamental barrier and maintain a sufficient gap between the true distances, we consider the cost of $X$ with respect to radius $(1+\varepsilon)r$. In other words, we look for a solution $X$ whose $(1+\varepsilon)r$-cost is close to $\mathsf{OPT}_r$. In this case, we redefine a witness as a point in $P$ whose $(1+\varepsilon)r$-distance to $X$ is much larger than its r-distance to $O$. These insights allow us to establish a gap between the true distance of a witness to $X$ and its true distance to $O$, thereby justifying the requests added by the algorithm.

Bounding the aspect ratio of the radii.

Abbasi et al. [1] bound the aspect ratio of the radii by (i) initializing the solution using “good” upper bounds, and (ii) sampling a witness from the points that have comparable distance to $X$ with respect to the corresponding upper bounds. The first step guarantees that the solution $X$ maintained by the algorithm always satisfies the upper bounds within a constant factor. This, together with the second step, allows one to bound the aspect ratio of the radii in each $Q_i$. They show that such witnesses have a good probability mass if the points are sampled proportional to their true distances, and hence the algorithm successfully finds a witness for $X$ with probability that is a function of $k$ and $\varepsilon$. Although devising feasible and “good” upper bounds for Hybrid k-Clustering can be done with some effort, the main challenge arises in the second step, which only guarantees sampling a point whose $(1+\varepsilon)r$-distance to $X$ is comparable with its upper bound. As mentioned before, these $(1+\varepsilon)r$-distances can be much smaller than the corresponding true distances, and hence there is no guarantee that the true distance of a witness $p$ to the current solution $X$ is comparable to its upper bound. Towards this, we introduce a novel idea of dividing witnesses into two sets – a “nearby witness” set, i.e., the set of points within distance $O(r/\varepsilon)$ from $X$, and a “faraway witness” set, containing the points beyond this distance from $X$. The observation is that, since the total probability mass on the witness set, now defined using $(1+\varepsilon)r$-distances, is still high, it must be that either the nearby or the faraway witnesses have sufficient probability mass. However, since we do not know which of the two sets has enough mass, we perform a randomized branching (i.e., make a guess) – which will be “correct” with probability 1/2. Then, when a point is sampled proportional to its $(1+\varepsilon)r$-distance from either the “nearby” or the “faraway” set, it will be a witness with good probability. Now consider each of the two cases separately. Since for a nearby witness $p$, its true distance to $X$ is at least $(1+\varepsilon)r$ and at most $O(r/\varepsilon)$, the aspect ratio of the radii of requests corresponding to the nearby witness set is bounded by $O(1/\varepsilon)$. On the other hand, for requests corresponding to the faraway witness set, we show that their radii lie in a bounded interval, using ideas similar to [1]. Note that these two arguments imply that the radii of the requests lie in two (possibly disjoint) intervals that are themselves bounded. However, it is not clear if the length of the request sequence is bounded, unlike in [1], where the radii belonged to a single interval of bounded aspect ratio. Nevertheless, we observe that, using the techniques of [1], the length of the request sequence can be bounded, even when the radii lie in constantly many (here, two) intervals, each with a bounded aspect ratio.

2 Background on Algorithmic Scatter Dimension

In this paper, we consider a metric (clustering) space $M = (P,\mathbb{F},\mathsf{dist})$, where $P$ is a finite set of $n$ points, $\mathbb{F}$ is a (possibly infinite) set of potential cluster centers, and $\mathsf{dist}$ is a metric on $P \cup \mathbb{F}$. A class of metric spaces is a (possibly infinite) set of metric spaces.

In this paper, we work with a notion that is weaker (and hence more general) than the $\varepsilon$-scatter dimension defined in the overview, called algorithmic $\varepsilon$-scatter dimension, which we explain next. To this end, we first need the following search problem.

Definition 4 (Ball Intersection Problem).

Let $\mathcal{M}$ be a class of metric spaces with possibly infinite sets of centers. Given $M = (P,\mathbb{F},\mathsf{dist}) \in \mathcal{M}$, a finite set $Q \subseteq P \times \mathbb{R}_+$ of distance constraints, and an error parameter $\eta > 0$, the Ball Intersection problem asks to find a center $x \in \mathbb{F}$ that satisfies all distance constraints within multiplicative error $\eta$, i.e., $\mathsf{dist}(x,p) \leq (1+\eta)\delta$ for every $(p,\delta) \in Q$, if such a center exists, and to report failure otherwise.

We say that $\mathcal{M}$ admits a Ball Intersection algorithm if it correctly solves the Ball Intersection problem for every metric space $M \in \mathcal{M}$ and runs in time polynomial in the size of $M$ and $1/\eta$.
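For intuition, when the center set $\mathbb{F}$ is finite, a Ball Intersection algorithm can be realized by exhaustive scanning (as also observed in Section 3 for discrete metric spaces); the sketch below is our illustration under that assumption, with the metric supplied as a function.

```python
# Illustration only: brute-force Ball Intersection for a finite facility set F.
def ball_intersection(F, Q, dist, eta):
    """Q is a list of constraints (p, delta); returns a center meeting every
    constraint up to the multiplicative error eta, or None to report failure."""
    for x in F:
        if all(dist(x, p) <= (1 + eta) * delta for (p, delta) in Q):
            return x
    return None
```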

Now, we are ready to define algorithmic scatter dimension.

Definition 5 (Algorithmic ε-Scatter Dimension).

Given a class $\mathcal{M}$ of metric spaces with a Ball Intersection algorithm $\mathcal{C}$, a space $M \in \mathcal{M}$, and $\varepsilon \in (0,1)$, a $(\mathcal{C},\varepsilon)$-scattering sequence is a sequence $(x_1,p_1,\delta_1),\ldots,(x_\ell,p_\ell,\delta_\ell)$, where $\ell$ is some positive integer and, for $i \in [\ell]$, $x_i \in \mathbb{F}$, $p_i \in P$, and $\delta_i \in \mathbb{R}_+$, such that

  • (Covering by $\mathcal{C}$)   $x_i = \mathcal{C}(M, \{(p_1,\delta_1),\ldots,(p_{i-1},\delta_{i-1})\}, \varepsilon/2)$   for all $2 \leq i \leq \ell$

  • ($\varepsilon$-refutation)   $\mathsf{dist}(x_i,p_i) > (1+\varepsilon)\delta_i$   for all $i \in [\ell]$

The algorithmic $(\varepsilon,\mathcal{C})$-scatter dimension of $\mathcal{M}$ is $\lambda(\varepsilon)$ if any $(\mathcal{C},\varepsilon)$-scattering sequence contains at most $\lambda(\varepsilon)$ many triples per radius value. The algorithmic $\varepsilon$-scatter dimension of $\mathcal{M}$ is the minimum algorithmic $(\varepsilon,\mathcal{C})$-scatter dimension over all Ball Intersection algorithms $\mathcal{C}$ for $\mathcal{M}$.

Although algorithmic scatter dimension only restricts the number of triples in the sequence with the same radius value, we can use the proof technique from [1] to prove the following stronger guarantee. Results marked with (★) can be found in the full version of the paper.

Lemma 6 (★).

Let $\mathcal{M}$ be a class of metric spaces of algorithmic $\varepsilon$-scatter dimension $\lambda(\varepsilon)$. Then there exists a Ball Intersection algorithm $\mathcal{C}$ with the following property. Given $\varepsilon \in (0,1)$, a constant $t \geq 1$, and $a_i > 0$, $\tau_i \geq 2$ for $i \in [t]$, any $(\mathcal{C},\varepsilon)$-scattering contains $O\left(\sum_{i \in [t]} \lambda(\varepsilon/2) (\log \tau_i)/\varepsilon\right)$ many triples whose radii lie in $\bigcup_{i \in [t]} [a_i, \tau_i a_i]$.

3 Bicriteria FPT Approximation Scheme

3.1 Algorithm

Algorithm 1 Approximation Scheme for Hybrid k-Clustering.

Our bicriteria FPT-AS for Hybrid k-Clustering is formally stated in Algorithm 1. As input, we are given an instance $\mathcal{I} = ((P,\mathbb{F},\mathsf{dist}),k,r)$ of Hybrid k-Clustering, an accuracy parameter $\varepsilon$, access to an algorithm $\mathcal{C}$ for the so-called “Ball Intersection” problem (discussed later), and a guess $\mathcal{G}$ for the optimal cost $\mathsf{OPT}_r$. By a standard exponential search, we will assume that $\mathsf{OPT}_r \leq \mathcal{G} \leq (1+\varepsilon/3)\,\mathsf{OPT}_r$. At a high level, the algorithm can be divided into two phases: the initialization phase and the iterative cost-improvement phase. The initialization phase spans lines 1 to 7.

At a high level, the goal of this phase is to compute, for each point $p \in P$, an upper bound $u(p)$ such that $p$ must have an optimal center within distance $u(p)$. Once we find such upper bounds, we use a subset of them to initialize, for each $i \in [k]$, a set of requests $Q_i$, where each request $(p,\delta_p)$ demands that the $i$-th center in the solution must be within distance at most $\delta_p$ from $p$ in every subsequent solution found by the algorithm.

Now, the algorithm moves to the iterative cost-improvement phase in lines 8 to 20, consisting of a while loop that runs as long as the current solution has not become the bicriteria approximation that we are looking for. Thus, in each iteration of the while loop, we are given a solution $X$ that satisfies all the requests in the $Q_i$'s, and yet satisfies $\mathsf{cost}_r(P,X) > (1+\varepsilon)\mathcal{G}$. Then, our algorithm makes a random choice of whether there is enough cost-contribution from nearby witnesses, or from faraway witnesses – here a witness is a point $p$ whose distance to $X$ is sufficiently larger than that in the (unknown) optimal solution $O$. In each of the cases, we sample points from carefully defined sets (cf. $N$ in line 11 and $A$ in line 14), proportional to their contribution to the cost of the respective sets (in the analysis, we will argue that with good probability, we will in fact sample witness points). Having sampled such a point $p$, we guess the index $i$ of the optimal cluster of $p$ in line 17. Finally, assuming $p$ is indeed a witness, we add a request $(p, \mathsf{dist}(p,X)/(1+\varepsilon/12))$ to the $i$-th request set $Q_i$, and in line 19 we recompute $x_i$ using $\mathcal{C}$. This algorithm either returns a center $x_i \in \mathbb{F}$ that satisfies all the requests (with an error of up to a factor of $\varepsilon/40$), or correctly reports that there is no point in $\mathbb{F}$ satisfying all the requests. Note that in discrete metric spaces, $\mathcal{C}$ can be simulated by simply scanning each $x \in \mathbb{F}$ and checking whether it satisfies all the requests in $Q_i$. On the other hand, for continuous Euclidean spaces, such an algorithm was designed in [1]. If the algorithm does not fail at this step, then we update our set $X$ and continue to the next iteration. In the rest of the section, we will prove that the algorithm returns a $(1+\varepsilon, 1+\varepsilon)$-bicriteria approximation with good probability.
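The sketch below distills the iterative cost-improvement phase just described; it is our simplified illustration rather than a transcription of Algorithm 1 (the radius inflation in the while-condition, the exact constants, and the initialization of $X$, the $Q_i$'s, and the upper bounds $u$ follow the formal description; `ball_intersection` is the oracle $\mathcal{C}$, e.g., the brute-force routine sketched in Section 2).

```python
# Illustration only: the iterative phase of the bicriteria FPT-AS.
import random

def improve(P, F, dist, k, r, eps, G, u, X, Q, ball_intersection):
    def d_r(p, rad):  # rad-distance of p to the current solution X
        return max(min(dist(p, x) for x in X) - rad, 0.0)

    while sum(d_r(p, r) for p in P) > (1 + eps) * G:
        if random.random() < 0.5:  # guess: nearby witnesses carry enough cost (set N)
            S = [p for p in P if min(dist(p, x) for x in X) <= 8 * r / eps]
        else:                      # guess: faraway witnesses carry enough cost (set A)
            S = [p for p in P if d_r(p, r) > eps * u[p] / (1000 * k)]
        weights = [d_r(p, r) for p in S]
        if not S or sum(weights) == 0:
            return None            # failure; the outer repetition restarts the algorithm
        p = random.choices(S, weights=weights)[0]  # sample prop. to cost contribution
        i = random.randrange(k)                    # guess the optimal cluster of p
        Q[i].append((p, min(dist(p, x) for x in X) / (1 + eps / 12)))
        x_i = ball_intersection(F, Q[i], dist, eps / 40)
        if x_i is None:
            return None            # no center satisfies all requests in Q_i: failure
        X[i] = x_i
    return X
```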

3.2 Analysis

Throughout the analysis, we assume that we are given a guess $\mathcal{G}$ such that $\mathsf{OPT}_r \leq \mathcal{G} \leq (1+\varepsilon/3)\,\mathsf{OPT}_r$. We divide the analysis into two parts. In the first part, we bound the running time of the algorithm using the following lemma, whose proof appears in Section 3.2.1.

Lemma 7.

Algorithm 1 terminates in $O\left(\frac{k}{\varepsilon} \log\left(\frac{k}{\varepsilon}\right) \lambda\left(\frac{\varepsilon}{40}\right)\right)$ iterations – with or without failure.

In the second part, we show the following lemma, which says that the algorithm terminates without failure with good probability. The proof of the lemma appears in Section 3.2.2.

Lemma 8.

With probability at least $\exp\left(-O\left(\frac{k}{\varepsilon} \log\left(\frac{k}{\varepsilon}\right) \lambda\left(\frac{\varepsilon}{40}\right)\right)\right)$, Algorithm 1 terminates without failure, i.e., returns a solution $X$ satisfying $\mathsf{cost}_{(1+\varepsilon/3)r}(P,X) \leq (1+\varepsilon)\,\mathsf{OPT}_r$.

Using these two lemmas and repeating the algorithm $\exp\left(O\left(\frac{k}{\varepsilon} \log\left(\frac{k}{\varepsilon}\right) \lambda\left(\frac{\varepsilon}{40}\right)\right)\right)$ times, we obtain our main result.

Theorem 9 (Main Theorem).

Let $\mathcal{M}$ be a class of metric spaces closed under scaling distances by a positive constant. There is a randomized algorithm that takes as input an instance $\mathcal{I} = ((P,\mathbb{F},\mathsf{dist}),k,r)$ of Hybrid k-Clustering with $(P,\mathbb{F},\mathsf{dist}) \in \mathcal{M}$ and $\varepsilon \in (0,1)$, and outputs a $(1+\varepsilon,1+\varepsilon)$-bicriteria solution for $\mathcal{I}$ when, for every $\varepsilon' > 0$, the algorithmic $\varepsilon'$-scatter dimension of $\mathcal{M}$ is bounded by $\lambda(\varepsilon')$, for some function $\lambda$. The running time of the algorithm is $2^{O\left(\frac{k}{\varepsilon} \log(k/\varepsilon)\, \lambda(\varepsilon/40)\right)} |\mathcal{I}|^{O(1)}$.

3.2.1 Bounding runtime using Algorithmic Scatter Dimension

First, we show some properties of the initial upper bounds (line 4) that we need later in the proof of Lemma 7.

Lemma 10 (Feasible upper bounds).

Consider $Q_i = \{(p^{(i)}, u(p^{(i)}))\}$ as initialized in Line 4 of Algorithm 1. Then, $\mathsf{dist}(p^{(i)}, O) \leq u(p^{(i)})$.

Proof.

Suppose $\mathsf{dist}(p^{(i)}, O) > u(p^{(i)})$. Letting $\alpha = u(p^{(i)})/3$, we have that $|\mathsf{ball}(p^{(i)}, \alpha)| \geq \mathcal{G}/\alpha$. Since $\mathsf{dist}(p^{(i)}, O) > 3\alpha$, we have $\mathsf{dist}(p, O) > 2\alpha$ for every $p \in \mathsf{ball}(p^{(i)}, \alpha)$. Using $\alpha > r$, this means $\mathsf{dist}_r(p, O) > \alpha$ for every $p \in \mathsf{ball}(p^{(i)}, \alpha)$. Therefore,

$\mathsf{cost}_r(P,O) \geq \sum_{p \in \mathsf{ball}(p^{(i)},\alpha)} \mathsf{dist}_r(p,O) = \sum_{p \in \mathsf{ball}(p^{(i)},\alpha)} (\mathsf{dist}(p,O) - r) > \frac{\mathcal{G}}{\alpha} \cdot \alpha = \mathcal{G},$

contradicting the fact that the cost of $O$ is at most $\mathcal{G}$.

Next, we have the following lemma, whose proof is identical to that in [1], which says that the initialization of $X$ at line 6 is successful, and that the solution maintained by the algorithm always satisfies the upper bounds within a factor of 3.1.

Lemma 11 (Lemma V.5 of  [1]).

The number of marked points in line 3 is at most $k$, i.e., $k' \leq k$. Hence, the initialization of $X$ at line 6 is successful. Furthermore, at any iteration, the solution $X$ maintained by the algorithm satisfies $\mathsf{dist}(p,X) \leq 3.1 \cdot u(p)$, for every $p \in P$.

Bounding aspect ratio of radii

Towards proving Lemma 7, we bound the aspect ratio of the radii in the requests.

Lemma 12.

Consider a request set $Q_i = \{(p_i^{(1)}, \delta_i^{(1)}), \ldots, (p_i^{(\ell)}, \delta_i^{(\ell)})\}$, for $i \in [k]$. Let $X^{(j)}$, $j \in [\ell]$, be the solution maintained by the algorithm just before adding the request $(p_i^{(j)}, \delta_i^{(j)})$ to $Q_i$, and let $x_i^{(j)} \in X^{(j)}$ be the center corresponding to cluster $i$. Then, the sequence $S_i = (x_i^{(1)}, p_i^{(1)}, \delta_i^{(1)}), \ldots, (x_i^{(\ell)}, p_i^{(\ell)}, \delta_i^{(\ell)})$ is an algorithmic $(\mathcal{C}, \varepsilon/20)$-scattering. Furthermore, the radii of the requests in $S_i$ lie in $[r, 8r/\varepsilon] \cup [r_{\min}, 10^5 k \varepsilon^{-2} r_{\min}]$, where $r_{\min}$ is the smallest radius in $S_i$ that is larger than $8r/\varepsilon$.

Proof.

The proof of the first part is similar to [1].

First note that $\delta_i^{(j)} = \mathsf{dist}(p_i^{(j)}, X^{(j)})/(1+\varepsilon/12) \leq \mathsf{dist}(p_i^{(j)}, x_i^{(j)})/(1+\varepsilon/12)$. Finally, since $x_i^{(j')}$ is computed by $\mathcal{C}$ on $\{(p_i^{(1)}, \delta_i^{(1)}), \ldots, (p_i^{(j'-1)}, \delta_i^{(j'-1)})\}$ with error parameter $\varepsilon/40$, we have that $\mathsf{dist}(p_i^{(j)}, x_i^{(j')}) \leq (1+\varepsilon/40)\delta_i^{(j)}$ for all $j < j'$. Hence, $S_i$ is an algorithmic $(\mathcal{C}, \varepsilon/20)$-scattering.

Now we bound the radii of the sequence $S_i$, $i \in [k]$. Towards this, we partition $S_i$ into $S_i^n$ and $S_i^f$, based on a partitioning of $Q_i$, as follows. We say a request $(p_i^{(j)}, \delta_i^{(j)}) \in Q_i$ belongs to $Q_i^n$ if $p_i^{(j)}$ was sampled when the If condition was satisfied (in Line 12); otherwise it belongs to $Q_i^f$, which corresponds to the Else condition (in Line 15). Correspondingly, $(x_i^{(j)}, p_i^{(j)}, \delta_i^{(j)}) \in S_i^n$ if $(p_i^{(j)}, \delta_i^{(j)}) \in Q_i^n$, and $(x_i^{(j)}, p_i^{(j)}, \delta_i^{(j)}) \in S_i^f$ if $(p_i^{(j)}, \delta_i^{(j)}) \in Q_i^f$. We bound the aspect ratio of each part, $S_i^n$ and $S_i^f$, separately. For $(x_i^{(j)}, p_i^{(j)}, \delta_i^{(j)}) \in S_i^n$, we have $r \leq \delta_i^{(j)} \leq 8r/\varepsilon$, and hence the radii of triples in $S_i^n$ lie in the interval $[r, 8r/\varepsilon]$. Now consider $(x_i^{(j)}, p_i^{(j)}, \delta_i^{(j)}) \in S_i^f$. Recall that $X^{(j)}$ is the solution maintained by the algorithm when $(p_i^{(j)}, \delta_i^{(j)})$ is added to $Q_i$. Then, note that $\mathsf{dist}_r(p_i^{(j)}, X^{(j)}) > \varepsilon \cdot u(p_i^{(j)})/(1000k)$ (see Line 14), and hence we have $\mathsf{dist}(p_i^{(j)}, X^{(j)}) > \varepsilon \cdot u(p_i^{(j)})/(1000k)$. The following claim, whose proof is similar to [1], shows that the radii of the requests in $S_i^f$ are also bounded, finishing the proof of the lemma.

Claim 13.

Consider requests $(p, \delta_p)$ and $(p', \delta_{p'})$ added (in any order) to $Q_i$ in Line 18 of Algorithm 1 such that $(p, \delta_p), (p', \delta_{p'}) \in Q_i^f$. If $\delta_{p'} < \varepsilon^2 \delta_p / (10^5 k)$, then the algorithm fails in Line 19 upon making the second of the two requests.

Proof.

Suppose, for the sake of contradiction, that the algorithm does not fail and finds a center $x_i$ such that $\mathsf{dist}(p, x_i) \leq (1+\varepsilon/40)\delta_p$ and $\mathsf{dist}(p', x_i) \leq (1+\varepsilon/40)\delta_{p'}$. Thus, $\mathsf{dist}(p, p') \leq \mathsf{dist}(p, x_i) + \mathsf{dist}(p', x_i) \leq (1+\varepsilon/40)(\delta_p + \delta_{p'})$.

Therefore, we have

$\mathsf{dist}(p,X) \leq \mathsf{dist}(p,p') + \mathsf{dist}(p',X) \leq (1+\varepsilon/40)(\delta_p+\delta_{p'}) + 3.1\,u(p'),$

using Lemma 11. Here, $X$ denotes the solution maintained by the algorithm when the request $(p, \delta_p)$ is added to $Q_i$. Then, we have

$\delta_p = \frac{\mathsf{dist}(p,X)}{1+\varepsilon/12},$ (1)

due to Line 18. Similarly, let $X''$ be the solution maintained by the algorithm when the request $(p', \delta_{p'})$ is added to $Q_i$. Then, since $(p', \delta_{p'}) \in Q_i^f$, we have that $u(p') \leq 900k \cdot \mathsf{dist}(p', X'')/\varepsilon \leq 1000k \cdot \delta_{p'}/\varepsilon$, using $\mathsf{dist}(p', X'') = (1+\varepsilon/12)\,\delta_{p'}$. Therefore, using $\delta_{p'} < \varepsilon^2 \delta_p/(10^5 k)$,

$\mathsf{dist}(p,X) \leq (1+\varepsilon/40)(\delta_p + \delta_{p'}) + 3100k\,\delta_{p'}/\varepsilon \leq (1+\varepsilon/40)\,\delta_p + 3200k\,\delta_{p'}/\varepsilon \leq (1+\varepsilon/40+\varepsilon/25)\,\delta_p < (1+\varepsilon/12)\,\delta_p.$

This means $\delta_p > \frac{\mathsf{dist}(p,X)}{1+\varepsilon/12}$, contradicting (1).

Now we are ready to finish the proof of Lemma 7.

Proof of Lemma 7.

We apply Lemma 6 to $S_i$ and note that the radii in $S_i$ lie in $[r, 8r/\varepsilon] \cup [r_{\min}, 10^5 k \varepsilon^{-2} r_{\min}]$, where $r_{\min}$ is the smallest radius in $S_i$ that is larger than $8r/\varepsilon$, due to Lemma 12. This implies that the length of the sequence $S_i$ is bounded by $O(\lambda(\varepsilon/40) \log(k/\varepsilon)/\varepsilon)$. Since in each iteration the algorithm adds one request to some $Q_i$, the total number of iterations is bounded by $O\left(\frac{k}{\varepsilon}\, \lambda(\varepsilon/40) \log(k/\varepsilon)\right)$.

3.2.2 Bounding success probability

The goal of this subsection is to prove Lemma 8, i.e., give a lower bound on the success probability of the algorithm. To this end, we first introduce the following notion.

Definition 14 (Consistency).

Consider a fixed hypothetical optimal solution $O = (o_1, \ldots, o_k)$. We say that the current state of execution (specified by $(X, Q_1, \ldots, Q_k)$) of Algorithm 1 is consistent with $O$ if for any request $(p, \delta) \in Q_i$, $i \in [k]$, we have $\mathsf{dist}(p, o_i) \leq \delta$.

Note that Lemma 11 implies that the initial set of requests (line 4) is feasible. Since the balls $\mathsf{ball}(p^{(i)}, u(p^{(i)}))$, $i \in [k']$, are disjoint, we can relabel the optimal centers $O = \{o_1, \ldots, o_k\}$ so that $\mathsf{dist}(p^{(i)}, o_i) \leq u(p^{(i)})$. Thus, Lemma 11 implies that the initial state of the algorithm as computed in lines 1 to 6 is consistent with a fixed optimal solution $O$, which is fixed henceforth. To prove Lemma 8, we will inductively argue that the state of the algorithm remains consistent with $O$ – note that the base case is already shown. Most of this subsection is devoted to showing the inductive step, i.e., to showing that, given a consistent state at the start of an iteration, there is a good probability that the state of the algorithm at the end of the iteration is also consistent (cf. Lemma 22).

To this end, we introduce the following definitions w.r.t. the current solution X.

Definition 15 (Contribution and Witness).

For any set $S \subseteq P$, we use $C_S := \sum_{p \in S} \mathsf{dist}_r(p, X)$ to denote the contribution of the set $S$ to the cost of the current solution.

We say that a point $p \in P$ is an $\varepsilon/3$-witness (or simply, a witness) w.r.t. the solution $X$ if it satisfies $\mathsf{dist}_{(1+\varepsilon/3)r}(p, X) > (1+\varepsilon/3) \cdot \mathsf{dist}_r(p, O)$.

The following claim can be proved by a simple averaging argument.

Claim 16 (★).

$C_W \geq \frac{\varepsilon\, C_P}{10}$, where $W$ denotes the set of witnesses w.r.t. $X$.

Next, we introduce several different classifications of witnesses.

Definition 17 (Different subsets of witnesses).

For each $x_j \in X$, let $W_j$ denote the set of witnesses for which $x_j$ is a nearest center in $X$ (breaking ties arbitrarily). Then, $W_{j,\mathsf{near}} := \{p \in W_j : \mathsf{dist}(p, x_j) \leq 8r/\varepsilon\}$, and $W_{j,\mathsf{far}} := W_j \setminus W_{j,\mathsf{near}}$. Further, let $W_{\mathsf{near}} := \bigcup_{j \in [k]} W_{j,\mathsf{near}}$ and $W_{\mathsf{far}} := \bigcup_{j \in [k]} W_{j,\mathsf{far}}$. We will refer to a witness in $W_{\mathsf{near}}$ as a nearby witness and a witness in $W_{\mathsf{far}}$ as a faraway witness.

Now, we consider two different cases regarding the behavior of the algorithm.

Case 1: $C_{W_{\mathsf{near}}} \geq \frac{\varepsilon}{100} C_P$.

In this case, when we sample a point from $N := \{p \in P : \mathsf{dist}(p, X) \leq 8r/\varepsilon\}$ proportional to its $\mathsf{dist}_r(\cdot, X)$ value, the probability of sampling a nearby witness is at least $\frac{\varepsilon}{100}$. This will correspond to the “good event”. We prove this formally in the following lemma.

Lemma 18 (Nearby witness lemma).

Suppose the current solution $X$ at the start of an iteration satisfies $\mathsf{cost}_r(P,X) > (1+\varepsilon)\mathcal{G}$. Further, suppose $C_{W_{\mathsf{near}}} \geq \frac{\varepsilon}{100} C_P$. Then, with probability at least $\frac{\varepsilon}{200k}$, the point $p \in P$, the index $i \in [k]$, and the value $\delta_p$ defined in the iteration satisfy the following properties.

  1. $o_i \in O$ is the closest center to $p$, i.e., $\mathsf{dist}(p, o_i) = \mathsf{dist}(p, O)$,

  2. $p \in W_{\mathsf{near}}$, i.e., (i) $\mathsf{dist}_{(1+\varepsilon/3)r}(p, X) > (1+\varepsilon/3)\,\mathsf{dist}_r(p, o_i)$, and (ii) $\mathsf{dist}(p, X) \leq \frac{8r}{\varepsilon}$,

  3. $\mathsf{dist}(p, o_i) < \frac{\mathsf{dist}(p, X)}{1+\varepsilon/12} =: \delta_p \leq \frac{8r}{\varepsilon}$.

Proof.

First, in line 10, with probability 1/2 we correctly move to the “nearby witness” (If) case of line 10. We condition on this event. Next, note that $W_{\mathsf{near}} \subseteq N$, and $C_{W_{\mathsf{near}}} \geq \frac{\varepsilon}{100} C_P \geq \frac{\varepsilon}{100} C_N$, where the first inequality is due to the case assumption. Therefore, in line 12, the point $p$ sampled from $N$ will belong to $W_{\mathsf{near}}$ with probability at least $\frac{\varepsilon}{100}$. Let $i \in [k]$ denote the index such that $o_i \in O$ is the closest center in the optimal solution $O$; we correctly guess this index $i$ in line 17 with probability $\frac{1}{k}$. Note that the probability of the algorithm making all of these “correct” random choices is at least $\frac{1}{2} \cdot \frac{\varepsilon}{100} \cdot \frac{1}{k} = \frac{\varepsilon}{200k}$.

We condition on the said events. Since $p$ is a witness, we know that $\mathsf{dist}_{r'}(p, X) > (1+\varepsilon/3)\,\mathsf{dist}_r(p, o_i)$, where $r' = (1+\varepsilon/3)r$. We first observe that $\mathsf{dist}_{r'}(p, X)$ must be positive, which implies that $\mathsf{dist}(p, X) > r' = (1+\varepsilon/3)r$. Now, we consider two cases based on the value of $\mathsf{dist}(p, o_i)$.

If $\mathsf{dist}(p, o_i) < r$, then $\mathsf{dist}_r(p, o_i) = 0$, in which case,

$\mathsf{dist}(p, o_i) < r \leq \frac{r'}{1+\varepsilon/12} \leq \frac{\mathsf{dist}(p, X)}{1+\varepsilon/12}$ (2)

Otherwise, $\mathsf{dist}(p, o_i) \geq r$, in which case $\mathsf{dist}_r(p, o_i) = \mathsf{dist}(p, o_i) - r$. Then, consider

$\mathsf{dist}(p,X) - (1+\tfrac{\varepsilon}{3})r = \mathsf{dist}_{r'}(p,X) > (1+\tfrac{\varepsilon}{3})\,\mathsf{dist}_r(p,o_i) = (1+\tfrac{\varepsilon}{3})(\mathsf{dist}(p,o_i) - r)$
$\implies \mathsf{dist}(p,X) - (1+\tfrac{\varepsilon}{3})r > (1+\tfrac{\varepsilon}{3})\,\mathsf{dist}(p,o_i) - (1+\tfrac{\varepsilon}{3})r$
$\implies \mathsf{dist}(p,o_i) < \frac{\mathsf{dist}(p,X)}{1+\varepsilon/3} \leq \frac{\mathsf{dist}(p,X)}{1+\varepsilon/12}$ (3)

Thus, regardless of the value of 𝖽𝗂𝗌𝗍(p,oi), we have established the third item, hence completing the proof of the lemma.

Case 2: $C_{W_{\mathsf{near}}} < \frac{\varepsilon}{100} C_P$.

In this case, most of the contribution of the witnesses comes from “faraway witnesses”. Here, the “correct choice” corresponds to the case when we sample points from the set $A = \{p \in P : \mathsf{dist}_r(p, X) > \frac{\varepsilon}{1000k} u(p)\}$, as defined in line 14. We will show that with good probability, the sampled point is a faraway witness. Specifically, we show the following lemma.

Lemma 19 (Faraway witness lemma).

Suppose the current solution $X$ at the start of an iteration satisfies $\mathsf{cost}_r(P,X) > (1+\varepsilon)\mathcal{G}$. Further, suppose $C_{W_{\mathsf{near}}} < \frac{\varepsilon}{100} C_P$. Then, with probability at least $\frac{\varepsilon}{16k}$, the point $p \in P$, the index $i \in [k]$, and the value $\delta_p$ defined in the iteration satisfy the following properties.

  1. $o_i \in O$ is the closest center to $p$, i.e., $\mathsf{dist}(p, o_i) = \mathsf{dist}(p, O)$,

  2. $p \in W_{\mathsf{far}}$, i.e., (i) $\mathsf{dist}_{(1+\varepsilon/3)r}(p, X) > (1+\varepsilon/3)\,\mathsf{dist}_r(p, o_i)$, and (ii) $\mathsf{dist}(p, X) > \frac{8r}{\varepsilon}$,

  3. $\mathsf{dist}_r(p, X) > \frac{\varepsilon}{1000k}\, u(p)$, and

  4. $\mathsf{dist}(p, o_i) < \frac{\mathsf{dist}(p, X)}{1+\varepsilon/12} =: \delta_p$.

Proof.

First, in line 10, with probability 1/2 we correctly move to the “faraway witness” (Else) case of line 13. We condition on this event. Now, by combining Claim 16 and the case assumption, we obtain

$C_{W_{\mathsf{far}}} = C_W - C_{W_{\mathsf{near}}} \geq C_P\left(\frac{\varepsilon}{10} - \frac{\varepsilon}{100}\right) = \frac{9\varepsilon}{100}\, C_P.$ (4)

Next, let $H := \{j \in [k] : C_{W_{j,\mathsf{far}}} \geq \frac{\varepsilon C_P}{100k}\}$. It follows that,

$\sum_{j \in [k] \setminus H} C_{W_{j,\mathsf{far}}} \leq k \cdot \frac{\varepsilon C_P}{100k} = \frac{\varepsilon C_P}{100}$, and hence
$\sum_{j \in H} C_{W_{j,\mathsf{far}}} \geq C_{W_{\mathsf{far}}} - \sum_{j \in [k] \setminus H} C_{W_{j,\mathsf{far}}} \geq \frac{9\varepsilon C_P}{100} - \frac{\varepsilon C_P}{100} = \frac{8\varepsilon C_P}{100}$ (5)

Fix a $j \in H$, and let us index the points in $W_{j,\mathsf{far}} = \{z_1, z_2, \ldots, z_\ell\}$ in non-decreasing order of their distances $\mathsf{dist}(\cdot, x_j)$ (and, equivalently, $\mathsf{dist}_r(\cdot, x_j)$). First, we state the following simple consequences of various definitions for future reference.

Observation 20 (★).

For each $z \in W_{j,\mathsf{far}}$, the following bounds hold.

  1. $\mathsf{dist}(z, x_j) \geq \frac{8r}{\varepsilon}$,

  2. $\mathsf{dist}_{r'}(z, x_j) > (1+\frac{\varepsilon}{3})\,\mathsf{dist}_r(z, O)$, where $r' = (1+\varepsilon/3)r$,

  3. $\mathsf{dist}_r(z, x_j) \geq \frac{6r}{\varepsilon}$, and

  4. $\mathsf{dist}_r(z, x_j) \leq \mathsf{dist}(z, x_j) \leq (1+\frac{\varepsilon}{5})\,\mathsf{dist}_r(z, x_j)$.

Let us return to the points in $W_{j,\mathsf{far}} = \{z_1, z_2, \ldots, z_\ell\}$. Let $q \in [\ell]$ denote the minimum index such that the contribution of the set $W_{j,\mathsf{far}}^- := \{z_1, z_2, \ldots, z_q\}$ is at least $\frac{C_{W_{j,\mathsf{far}}}}{2}$. Note that, by a simple argument, this implies that the contribution of the set $W_{j,\mathsf{far}}^+ := W_{j,\mathsf{far}} \setminus W_{j,\mathsf{far}}^-$ is also at least $\frac{C_{W_{j,\mathsf{far}}}}{3}$ (see, e.g., Lemma 7.20 in [21]). Hence, we have the following lower bounds on the contributions of both sets, recalling that $j \in H$.

$C_{W_{j,\mathsf{far}}^-} \geq \frac{\varepsilon C_P}{200k}, \qquad C_{W_{j,\mathsf{far}}^+} \geq \frac{\varepsilon C_P}{300k}$ (6)

Next, we prove the following technical claim.

Claim 21.

Let $j \in H$. For all $p \in W_{j,\mathsf{far}}^+$, it holds that $u(p) \leq \frac{900k \cdot \mathsf{dist}(p, x_j)}{\varepsilon}$. Therefore, $W_{j,\mathsf{far}}^+ \subseteq A$.

Proof.

For this proof, we use the following shorthand: $W^+ := W_{j,\mathsf{far}}^+$ and $W^- := W_{j,\mathsf{far}}^-$. Now, fix an arbitrary point $p \in W^+$. For any $q \in W^-$, we have $\mathsf{dist}(p, x_j) \geq \mathsf{dist}(q, x_j)$, which implies that

$\mathsf{dist}(p,q) \leq \mathsf{dist}(p,x_j) + \mathsf{dist}(q,x_j) \leq 2\,\mathsf{dist}(p,x_j) \leq 2(1+\tfrac{\varepsilon}{5})\,\mathsf{dist}_r(p,x_j) \leq 4\,\mathsf{dist}_r(p,x_j)$ (7)

Here, we use property 4 from Observation 20 for the third inequality. On the other hand, note that,

$\frac{\varepsilon\,\mathsf{OPT}_r}{200k} \leq \frac{\varepsilon\, C_P}{200k} \leq C_{W^-} = \sum_{q \in W^-} \mathsf{dist}_r(q, x_j) \leq \mathsf{dist}_r(p, x_j) \cdot |W^-|$ (8)

Then, if we set $\alpha = 12\,\mathsf{dist}_r(p, x_j)$, from (7) we obtain that $W^- \subseteq \mathsf{ball}(p, \alpha/3)$. Now, combining this with (8), we obtain that,

$|\mathsf{ball}(p, \alpha/3)| \geq |W^-| \geq \frac{\varepsilon\,\mathsf{OPT}_r}{300k\,\mathsf{dist}_r(p,x_j)} = \frac{\varepsilon\,\mathsf{OPT}_r}{25k\alpha} = \frac{\varepsilon\,\mathsf{OPT}_r}{75k(\alpha/3)}$ (9)

Hence, we have that $|\mathsf{ball}(p, 75k\alpha/(3\varepsilon))| \geq \frac{\varepsilon\,\mathsf{OPT}_r}{75k(\alpha/3)}$. Using $\mathsf{dist}(p, x_j) \geq 8r/\varepsilon$ from Observation 20 and the fact that $\mathsf{dist}_r(p, x_j) \geq 5\,\mathsf{dist}(p, x_j)/6$, we have $\frac{75k\alpha}{3\varepsilon} = \frac{300k\,\mathsf{dist}_r(p, x_j)}{\varepsilon} > r$. Therefore, $u(p) \leq \frac{900k\,\mathsf{dist}_r(p, x_j)}{\varepsilon} \leq \frac{900k\,\mathsf{dist}(p, x_j)}{\varepsilon}$, as desired.

Recall from (5) that $C_{W_{\mathsf{far}} \cap A} \geq \sum_{j \in H} C_{W_{j,\mathsf{far}}^+} \geq \frac{\varepsilon C_P}{8} \geq \frac{\varepsilon C_A}{8}$; therefore, when we sample a point from $A$, the probability that the sampled point $p$ belongs to $\bigcup_{j \in H} W_{j,\mathsf{far}}^+$ is at least $\frac{\varepsilon}{8}$. Now, we condition on this event. Let $o_i \in O$ be the nearest center to $p$; in line 17, the probability that we sample the correct index is $\frac{1}{k}$. Thus, the total probability of the algorithm making the “correct choices” in this case is at least $\frac{1}{2} \cdot \frac{\varepsilon}{8} \cdot \frac{1}{k} = \frac{\varepsilon}{16k}$. We condition on these choices. Note that item 1 is thus satisfied due to the correct sampling of $i$, item 2 is satisfied due to $p \in \bigcup_{j \in H} W_{j,\mathsf{far}}^+ \subseteq W_{\mathsf{far}} \subseteq W$, and item 3 is satisfied since $p \in A$ by construction. Thus, we are only left with showing item 4, to which end we consider different cases for the distance between $p$ and its closest optimal center $o_i$.

If $\mathsf{dist}(p, o_i) < \frac{6r}{\varepsilon}$, then since $\mathsf{dist}(p, X) > \frac{8r}{\varepsilon}$, it easily follows that,

$\mathsf{dist}(p, o_i) < \frac{3}{4}\,\mathsf{dist}(p, X) \leq \frac{\mathsf{dist}(p, X)}{1+\varepsilon/12}$ (10)

Otherwise, if $\mathsf{dist}(p, o_i) \geq \frac{6r}{\varepsilon}$, then, similarly to item 4 of Observation 20, we can show that,

$\mathsf{dist}_r(p, o_i) \leq \mathsf{dist}(p, o_i) \leq (1+\tfrac{\varepsilon}{5})\,\mathsf{dist}_r(p, o_i)$ (11)

However, since $p$ is an $\varepsilon/3$-witness, it follows that $\mathsf{dist}_r(p, X) > (1+\tfrac{\varepsilon}{3})\,\mathsf{dist}_r(p, O)$. Therefore, we obtain that,

$\mathsf{dist}(p, x_j) \geq \mathsf{dist}_r(p, x_j) > (1+\tfrac{\varepsilon}{3})\,\mathsf{dist}_r(p, o_i) \geq \frac{1+\varepsilon/3}{1+\varepsilon/5}\,\mathsf{dist}(p, o_i) \geq (1+\tfrac{\varepsilon}{12})\,\mathsf{dist}(p, o_i)$ (12)

Thus, in each of the sub-cases, (10) and (12), we have shown that $\mathsf{dist}(p, o_i) < \frac{\mathsf{dist}(p, X)}{1+\varepsilon/12}$, thus completing the proof of the lemma.

Lemma 18 and Lemma 19 can be combined in a straightforward way to obtain the following lemma, completing the inductive step.

Lemma 22.

Consider an iteration of the algorithm such that the state of execution $(X, Q_1, \ldots, Q_k)$ at the start of the iteration is consistent with a fixed optimal solution $O$, and further $\mathsf{cost}_r(P, X) > (1+\varepsilon)\mathcal{G}$. Then, with probability at least $\frac{\varepsilon}{200k}$, the state of the algorithm at the end of the iteration, $(X', Q'_1, Q'_2, \ldots, Q'_k)$, is also consistent with $O$.

Proof.

Consider the scenario described in the premise of the lemma. Then, the solution $X$ at the start of the iteration either satisfies $C_{W_{\mathsf{near}}} \geq \frac{\varepsilon}{100} C_P$ (nearby witness case), or $C_{W_{\mathsf{near}}} < \frac{\varepsilon}{100} C_P$ (faraway witness case). Then, Lemmas 18 and 19, along with the correctness of $\mathcal{C}$, together imply that, conditioned on the respective case assumption, the probability that the state of the algorithm is consistent with $O$ is at least $\frac{\varepsilon}{200k}$. (Note that this probability also accounts for the result of the coin toss in line 10.)

Armed with Lemma 22, it is easy to conclude the proof of Lemma 8.

Proof of Lemma 8.

As argued initially, the state of the algorithm at line 6 is consistent with the optimal solution $O$. Then, Lemma 22 implies that, for any $\ell \geq 1$, if the algorithm runs for $\ell$ iterations, then with probability at least $\left(\frac{\varepsilon}{200k}\right)^\ell$, the state of the algorithm remains consistent. Further, note that as long as the state of the algorithm remains consistent with $O$, the algorithm cannot fail. Finally, Lemma 7 implies that the algorithm terminates within $O\left(\frac{k}{\varepsilon}\, \lambda\left(\frac{\varepsilon}{40}\right) \log\left(\frac{k}{\varepsilon}\right)\right)$ iterations. Therefore, the probability that the algorithm terminates without failure – i.e., by successfully breaking the while loop by finding a solution $X$ satisfying $\mathsf{cost}_{(1+\varepsilon/3)r}(P, X) \leq (1+\varepsilon)\,\mathsf{OPT}_r$ – is at least $\left(\frac{\varepsilon}{k}\right)^{O\left(\frac{k}{\varepsilon}\, \lambda\left(\frac{\varepsilon}{40}\right) \log\left(\frac{k}{\varepsilon}\right)\right)} = \exp\left(-O\left(\frac{k}{\varepsilon} \log\left(\frac{k}{\varepsilon}\right) \lambda\left(\frac{\varepsilon}{40}\right)\right)\right)$.

4 Extensions of the Bicriteria FPT-AS

Fomin et al. [23] defined a generalization of Hybrid k-Clustering, which they call Hybrid (k,z)-Clustering, wherein the objective is $\sum_{p \in P} (\mathsf{dist}_r(p, X))^z$ for a fixed $z \geq 1$, which simultaneously generalizes (k,z)-Clustering and k-Center. They generalized their algorithm to $z = 2$, i.e., Hybrid k-Means, but left the possibility of extending it to an arbitrary value of $z$ conditional on the existence of a certain kind of sampling algorithm. In this paper, we consider the much more general Hybrid Norm k-Clustering problem, which captures all (k,z)-Clustering problems, and much more. Here, the objective is to find a set $X \subseteq \mathbb{F}$ that minimizes $f(\mathbf{d}_r(P, X))$, where $\mathbf{d}_r(P, X) = (\mathsf{dist}_r(p_1, X), \mathsf{dist}_r(p_2, X), \ldots, \mathsf{dist}_r(p_n, X))$ is the vector of r-distances to the solution $X$, and $f \colon \mathbb{R}^n \to \mathbb{R}$ is a monotone norm. (Recall that $f \colon \mathbb{R}^n \to \mathbb{R}$ is a norm if it satisfies the following properties for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and any $\lambda \in \mathbb{R}$: (i) $f(\mathbf{x}) = 0$ iff $\mathbf{x} = \mathbf{0}$, (ii) $f(\mathbf{x}+\mathbf{y}) \leq f(\mathbf{x}) + f(\mathbf{y})$, and (iii) $f(\lambda\mathbf{x}) = |\lambda| f(\mathbf{x})$; furthermore, $f$ is monotone if $f(\mathbf{x}) \leq f(\mathbf{y})$ whenever $\mathbf{x} \leq \mathbf{y}$, where $\leq$ denotes point-wise inequality.) We refer the reader to [1] for a detailed background on norms and related notions.

Algorithm 1 and its analysis can be extended to the general norm objective following the approach of [1]. Here, we only sketch the modifications required to the algorithm and its proof, since this is not the main focus of this work.

The initial upper bounds $u(p)$ can be found in a similar manner. The while loop runs as long as the solution $X$ in hand satisfies $f(\mathbf{d}_{r'}(P, X)) > (1+\varepsilon/3)\,\mathsf{OPT}_r$, where $\mathbf{d}_{r'}$ denotes the vector of distances $(\mathsf{dist}_{r'}(p, X))_{p \in P}$, and $r' = (1+\varepsilon/3)r$. If this is the case, then the first step is to compute a subgradient of $f$ at $\mathbf{d}_{r'}$ (in fact, it suffices to compute an approximate subgradient, as done in [1]), which is, loosely speaking, a weight function $w \colon P \to \mathbb{R}_{\geq 0}$ such that it suffices to focus the attention in the current iteration on $\mathsf{wcost}_{r'}(P, X) := \sum_{p \in P} w(p)\,\mathsf{dist}_{r'}(p, X)$, which satisfies $\mathsf{wcost}_{r'}(P, X) = f(\mathbf{d}_{r'}(P, X)) > (1+\varepsilon/3)\,\mathsf{OPT}_r$. Our algorithm proceeds in a similar fashion, except that whenever points are sampled from the set $N$ (in line 12) or the set $A$ (in line 15), the probability of sampling a point $p$ is proportional to $w(p)\,\mathsf{dist}_{r'}(p, X)$. The rest of the algorithm remains unchanged.

There are a few minor modifications required in the analysis – mainly in Section 3.2.2. First, while the definition of an $(\varepsilon/3)$-witness (or simply a witness) remains unchanged, we redefine the contribution of a subset $S \subseteq P$ to be $C_S := \sum_{p \in S} w(p)\,\mathsf{dist}_{r'}(p, X)$. The nearby and faraway witness cases are then considered exactly as in Section 3.2.2. The proof of the analogue of Lemma 18 goes through without any modifications, whereas we need to be slightly more careful in the proof of the analogue of Lemma 19. Here, in inequalities (8) and (9), we need to take the weighted distances into account and relate $C_P$ to $f(\mathbf{d}_{r'}(P, X))$, which requires a slight worsening of the constants. With these minor changes in place, we can conclude that the state of the algorithm after $\ell$ iterations is consistent with a fixed optimal solution $O$ with probability at least $\left(\Omega(\varepsilon/k)\right)^{\ell}$. The analysis for bounding the number of iterations remains unchanged, and hence we obtain an FPT-AS with a similar running time, modulo the constants hidden in the big-Oh notation. We omit the details.

5 Coreset

In this section, we design a coreset for Hybrid k-Clustering in doubling metrics of bounded dimension. More specifically, we prove Theorem 3 (stated in Section 1).

Proof.

A formal description of our algorithm can be found in the full version of the paper. Here, we provide an overview of the construction, which is based on the grid-construction approach of [26, 2]. Let $T \subseteq P \cup \mathbb{F}$ be an approximate solution that satisfies the following properties: (i) $|T| \leq 2^{O(d)} k$, and (ii) $\mathsf{cost}_r(P, T) \leq 36\,\mathsf{OPT}_r$. (Note that the set $T$ is a bicriteria solution, but in a different sense, since it approximates the size and the cost while using r-distances.) For each $t_i \in T$, consider the balls $B_i^j = \mathsf{ball}(t_i, 2^j R)$, for $j \in \{0, 1, \ldots, 2\log(\alpha n)\}$. Note that, since $\mathsf{dist}(p, T) \leq \mathsf{cost}(P, T) = \alpha n R$, every point $p$ lies in some ball $B_i^j$, for $i \in [\gamma k]$ (where $\gamma = 2^{O(d)}$, so that $|T| \leq \gamma k$) and $j \in \{0, 1, \ldots, 2\log(\alpha n)\}$.

The idea is to decompose each $B_i^j$ into smaller balls, each of radius $\frac{\varepsilon 2^j R}{4\alpha}$, and to associate each point $p \in P$ with a unique smallest ball containing $p$, breaking ties arbitrarily. We say a small ball $b$ is non-empty if there is a point in $P$ associated with $b$. Next, for each non-empty ball $b$, pick an arbitrary point $p'$ associated with $b$ and add $p'$ to $P'$ with weight $w(p')$ equal to the total number of points associated with $b$; we call such a point $p'$ the representative of the points associated with $b$.

To bound the size of $P'$, note that, in a doubling metric of dimension $d$, a unit ball can be covered by $2^{O(d \log(1/\varepsilon))}$ balls, each of radius $\varepsilon$. Hence, we have $|P'| = O\left(2^{O(d \log(1/\varepsilon))} \gamma k \log(\alpha n)\right)$.
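The following sketch (our illustration; it substitutes axis-aligned grid cells for the small balls and assumes Euclidean coordinates, with `T`, `R`, `alpha` coming from the approximate solution discussed above) shows the construction end-to-end: each point is binned by its distance ring $2^j R$ and snapped to a cell of side $\varepsilon 2^j R / (4\alpha)$, and one weighted representative is kept per non-empty cell.

```python
# Illustration only: grid-based coreset construction for points in R^2.
import math
from collections import defaultdict

def coreset(P, T, R, alpha, eps):
    def dist(p, q):
        return math.dist(p, q)

    cells = defaultdict(list)  # cell id -> points associated with that cell
    for p in P:
        d = min(dist(p, t) for t in T)
        j = max(0, math.ceil(math.log2(max(d / R, 1.0))))  # ring index: d <= 2^j * R
        side = eps * (2 ** j) * R / (4 * alpha)            # scale of the small cells
        cells[(j,) + tuple(math.floor(c / side) for c in p)].append(p)
    # one arbitrary representative per non-empty cell, weighted by its load
    return [(pts[0], len(pts)) for pts in cells.values()]
```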

Recall that, for $X \subseteq \mathbb{F}$, $|X| \leq k$, we define the weighted cost of $X$ with respect to $(P', \{w(p)\}_{p \in P'})$ as $\mathsf{wcost}_r(P', X) = \sum_{p \in P'} w(p)\,\mathsf{dist}_r(p, X)$. Now we show that the cost of $X$ with respect to $(P', \{w(p)\}_{p \in P'})$ approximately preserves the cost of $X$ with respect to $P$. Consider a point $p \in P$ and let $p' \in P'$ be its representative in $P'$. The contribution of $p$ towards $\mathsf{cost}_r(P, X)$ is $\mathsf{dist}_r(p, X)$, whereas its contribution towards $\mathsf{wcost}_r(P', X)$ is $\mathsf{dist}_r(p', X)$. Hence,

$|\mathsf{wcost}_r(P', X) - \mathsf{cost}_r(P, X)| \leq \sum_{p \in P} |\mathsf{dist}_r(p', X) - \mathsf{dist}_r(p, X)|.$

Now, if $p \in \bigcup_{i \in [\gamma k]} B_i^0$, then $\mathsf{dist}(p, X) - \frac{\varepsilon R}{4\alpha} \leq \mathsf{dist}(p', X) \leq \mathsf{dist}(p, X) + \frac{\varepsilon R}{4\alpha}$. Hence,

$\max\left\{\mathsf{dist}(p, X) - \frac{\varepsilon R}{4\alpha} - r,\, 0\right\} \leq \mathsf{dist}_r(p', X) \leq \max\left\{\mathsf{dist}(p, X) + \frac{\varepsilon R}{4\alpha} - r,\, 0\right\}$

Therefore, $\mathsf{dist}_r(p, X) - \frac{\varepsilon R}{4\alpha} \leq \mathsf{dist}_r(p', X) \leq \mathsf{dist}_r(p, X) + \frac{\varepsilon R}{4\alpha}$. Hence, we have

$\sum_{p \in P_0} |\mathsf{dist}_r(p', X) - \mathsf{dist}_r(p, X)| \leq \frac{\varepsilon R n}{4\alpha} \leq \frac{\varepsilon\,\mathsf{OPT}_r}{2} \leq \frac{\varepsilon\,\mathsf{cost}_r(P, X)}{2},$ where $P_0 := P \cap \bigcup_{i \in [\gamma k]} B_i^0$.

Now, suppose $p \in P \setminus \bigcup_{i \in [\gamma k]} B_i^0$; then $\mathsf{dist}(p, T) > R$. Let $j \geq 1$ be such that $2^{j-1} R \leq \mathsf{dist}(p, T) \leq 2^j R$. Hence, we have that $2^j R \leq 2\,\mathsf{dist}(p, T)$. Therefore, using $\mathsf{dist}(p, X) - \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha} \leq \mathsf{dist}(p', X) \leq \mathsf{dist}(p, X) + \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha}$, we have

$\max\left\{\mathsf{dist}(p, X) - \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha} - r,\, 0\right\} \leq \mathsf{dist}_r(p', X) \leq \max\left\{\mathsf{dist}(p, X) + \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha} - r,\, 0\right\}$

This implies $\mathsf{dist}_r(p, X) - \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha} \leq \mathsf{dist}_r(p', X) \leq \mathsf{dist}_r(p, X) + \frac{\varepsilon\,\mathsf{dist}(p, T)}{2\alpha}$. Thus, we have

$\sum_{p \in P \setminus P_0} |\mathsf{dist}_r(p', X) - \mathsf{dist}_r(p, X)| \leq \sum_{p \in P \setminus P_0} \frac{\varepsilon}{2\alpha}\,\mathsf{dist}(p, T) \leq \frac{\varepsilon\,\mathsf{OPT}_r}{2} \leq \frac{\varepsilon\,\mathsf{cost}_r(P, X)}{2}.$

Finally, combining the two bounds, we have $|\mathsf{wcost}_r(P', X) - \mathsf{cost}_r(P, X)| \leq \varepsilon\,\mathsf{cost}_r(P, X)$, as desired.

To finish the proof, we invoke the coreset construction algorithm using the approximate solution $T$ obtained from the following lemma. At a high level, to obtain this lemma, we start from an (18,6)-bicriteria approximation for Hybrid k-Clustering from [23, 13], and use the fact that a ball of radius $O(r)$ can be decomposed into $2^{O(d)}$ balls of radius $r$, converting the guarantee from $\mathsf{dist}_{6r}(\cdot,\cdot)$ to $\mathsf{dist}_r(\cdot,\cdot)$.

Lemma 23 (Approximate solution T for coreset construction).

There is a polynomial-time algorithm that, given an instance $\mathcal{I} = ((P, \mathbb{F}, \mathsf{dist}), k, r)$ of Hybrid k-Clustering in a doubling metric of dimension $d$, computes $T \subseteq P \cup \mathbb{F}$ with $|T| \leq 2^{O(d)} k$ such that $\mathsf{cost}_r(P, T) \leq 36\,\mathsf{OPT}_r$, where $\mathsf{OPT}_r$ is the optimal cost for $\mathcal{I}$.

Proof.

Let $A$ be an (18,6)-bicriteria solution for $\mathcal{I}$ obtained from [14], as mentioned in [23]; that is, $\mathsf{cost}_{6r}(P, A) \leq 18\,\mathsf{OPT}_r$. Consider $B_a := \mathsf{ball}(a, 12r)$ for each $a \in A$, and decompose $B_a$ into $2^{O(d)}$ smaller balls, each of radius $r/2$ – note that this is possible since the metric space has doubling dimension $d$. For each smaller ball $b$ such that $b \cap (P \cup \mathbb{F}) \neq \emptyset$, add an arbitrary $t \in b \cap (P \cup \mathbb{F})$ to $T$. Finally, add $A$ to $T$. Hence, $|T| \leq |A| + 2^{O(d)}|A| = k \cdot 2^{O(d)}$.
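The decomposition of each ball $B_a$ can be sketched as a greedy net computation (our illustration, assuming the points of $(P \cup \mathbb{F}) \cap B_a$ are given explicitly; in a doubling metric of dimension $d$ the resulting net has size $2^{O(d)}$, though the code itself works in any metric):

```python
# Illustration only: greedy (r/2)-net of the points inside a ball; each net
# point plays the role of a representative t added to T.
def greedy_net(points_in_ball, dist, radius):
    net = []
    for p in points_in_ball:
        if all(dist(p, c) > radius for c in net):  # p is not yet covered
            net.append(p)
    return net  # every input point is within `radius` of some net point
```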

It is easy to see that, for every $p \in \bigcup_{c \in A} \mathsf{ball}(c, 12r)$, there is some $q \in T$ such that $\mathsf{dist}(p, q) \leq r$. Now, consider some $p \notin \bigcup_{c \in A} \mathsf{ball}(c, 12r)$. Note that $\mathsf{dist}_{6r}(p, A) \geq 6r$, which implies that $\mathsf{dist}(p, A) \leq 2\,\mathsf{dist}_{6r}(p, A)$. Therefore, $\mathsf{dist}_r(p, T) \leq \mathsf{dist}_r(p, A) \leq \mathsf{dist}(p, A) \leq 2\,\mathsf{dist}_{6r}(p, A)$. It follows that, for all $p \in P$, $\mathsf{dist}_r(p, T) \leq 2\,\mathsf{dist}_{6r}(p, A)$, which implies that,

$\mathsf{cost}_r(P, T) \leq 2\,\mathsf{cost}_{6r}(P, A) \leq 36\,\mathsf{OPT}_r.$

This concludes the proof of the lemma.

6 Conclusion

In this paper, we revisit Hybrid k-Clustering, which was introduced and studied recently by Fomin et al. [23]. We resolve two open questions explicitly asked in their work: namely, for continuous Euclidean instances from $\mathbb{R}^d$, (1) we obtain an FPT-AS for Hybrid k-Clustering that has no FPT dependence on the dimension $d$ (and, in fact, the dependence on $k$ is $2^{O(k \log k)}$ instead of $2^{\mathrm{poly}(k)}$ as in [23]), and (2) we design coresets for the same. Indeed, our technique also generalizes to the (k,z)-clustering variant of the problem, which was implicitly considered in [23], but was not completely resolved. To obtain our algorithmic result, we build upon insights from the recent framework of Abbasi et al. [1] for clustering in metric spaces of bounded algorithmic scatter dimension, a notion that encapsulates a broad class of metric spaces. Thus, our result shows that the potential of the framework introduced in [1] extends to clustering problems beyond those captured by the monotone norm setting, paving the way for obtaining FPT-ASes for other clustering problems using similar techniques. However, several interesting questions remain.

Firstly, the framework of [1] is inherently randomized due to key sampling steps, and its derandomization remains an intriguing open question. In particular, derandomizing the step that samples a witness in our algorithm is an interesting challenge. In another direction, now that the approximation landscape of the vanilla version of Hybrid k-Clustering is beginning to be well-understood, it is natural to explore constrained variants of the problem such as those involving fairness or capacity constraints.

References

  • [1] Fateme Abbasi, Sandip Banerjee, Jaroslaw Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase. Parameterized approximation schemes for clustering with general norm objectives. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 1377–1399. IEEE, 2023. doi:10.1109/FOCS57990.2023.00085.
  • [2] Fateme Abbasi, Sandip Banerjee, Jaroslaw Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, and Joachim Spoerhase. Parameterized approximation for robust clustering in discrete geometric spaces. In Karl Bringmann, Martin Grohe, Gabriele Puppis, and Ola Svensson, editors, 51st International Colloquium on Automata, Languages, and Programming, ICALP 2024, July 8-12, 2024, Tallinn, Estonia, volume 297 of LIPIcs, pages 6:1–6:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ICALP.2024.6.
  • [3] Akanksha Agrawal, Tanmay Inamdar, Saket Saurabh, and Jie Xue. Clustering what matters: Optimal approximation for clustering with outliers. J. Artif. Intell. Res., 78:143–166, 2023. doi:10.1613/JAIR.1.14883.
  • [4] Barbara Anthony, Vineet Goyal, Anupam Gupta, and Viswanath Nagarajan. A plant location guide for the unsure: Approximation algorithms for min-max location problems. Mathematics of Operations Research, 35(1):pages 79–101, 2010.
  • [5] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, pages 106–113, New York, NY, USA, 1998. ACM. doi:10.1145/276698.276718.
  • [6] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544–562, 2004. doi:10.1137/S0097539702416402.
  • [7] Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, (STOC), pages 250–257. ACM, 2002. doi:10.1145/509907.509947.
  • [8] Sayan Bandyapadhyay, Fedor V. Fomin, and Kirill Simonov. On coresets for fair clustering in metric and euclidean spaces and their applications. J. Comput. Syst. Sci., 142:103506, 2024. doi:10.1016/J.JCSS.2024.103506.
  • [9] Sayan Bhattacharya, Parinya Chalermsook, Kurt Mehlhorn, and Adrian Neumann. New approximability results for the robust k-median problem. In Scandinavian Workshop on Algorithm Theory (SWAT’14), pages 50–61. Springer, 2014. doi:10.1007/978-3-319-08404-6_5.
  • [10] Romain Bourneuf and Marcin Pilipczuk. Bounding ε-scatter dimension via metric sparsity. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (to appear), 2025.
  • [11] Vladimir Braverman, Shaofeng H-C Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for ordered weighted clustering. In International Conference on Machine Learning (ICML’19), pages 744–753. PMLR, 2019. URL: http://proceedings.mlr.press/v97/braverman19a.html.
  • [12] Jaroslaw Byrka, Krzysztof Sornat, and Joachim Spoerhase. Constant-factor approximation for ordered k-median. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 620–631. ACM, 2018. doi:10.1145/3188745.3188930.
  • [13] Deeparnab Chakrabarty, Prachi Goyal, and Ravishankar Krishnaswamy. The non-uniform k-center problem. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, volume 55 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPICS.ICALP.2016.67.
  • [14] Deeparnab Chakrabarty and Chaitanya Swamy. Interpolating between k-median and k-center: Approximation algorithms for ordered k-median. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), volume 107 of LIPIcs, pages 29:1–29:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPICS.ICALP.2018.29.
  • [15] Deeparnab Chakrabarty and Chaitanya Swamy. Approximation algorithms for minimum norm and ordered optimization problems. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 126–137. ACM, 2019. doi:10.1145/3313276.3316322.
  • [16] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, and Chris Schwiegelshohn. Breaching the 2 LMP approximation barrier for facility location with applications to k-median. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 940–986. SIAM, 2023. doi:10.1137/1.9781611977554.CH37.
  • [17] Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. SIAM J. Comput., 48(2):644–667, 2019. doi:10.1137/17M112717X.
  • [18] Vincent Cohen-Addad and Jason Li. On the fixed-parameter tractability of capacitated clustering. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 41:1–41:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.ICALP.2019.41.
  • [19] Vincent Cohen-Addad, Michał Pilipczuk, and Marcin Pilipczuk. A polynomial-time approximation scheme for facility location on planar graphs. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 560–581. IEEE, 2019.
  • [20] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Proceddings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 169–182. ACM, 2021. doi:10.1145/3406325.3451022.
  • [21] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3.
  • [22] Tomás Feder and Daniel H. Greene. Optimal algorithms for approximate clustering. In Janos Simon, editor, Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 434–444. ACM, 1988. doi:10.1145/62212.62255.
  • [23] Fedor V. Fomin, Petr A. Golovach, Tanmay Inamdar, Saket Saurabh, and Meirav Zehavi. Hybrid k-clustering: Blending k-median and k-center. In Amit Kumar and Noga Ron-Zewi, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2024, August 28-30, 2024, London School of Economics, London, UK, volume 317 of LIPIcs, pages 4:1–4:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.APPROX/RANDOM.2024.4.
  • [24] Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM J. Comput., 48(2):452–480, 2019. doi:10.1137/17M1127181.
  • [25] Mehrdad Ghadiri, Samira Samadi, and Santosh Vempala. Socially fair k-means clustering. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 438–448, 2021. doi:10.1145/3442188.3445906.
  • [26] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 291–300. ACM, 2004. doi:10.1145/1007352.1007400.
  • [27] Dorit S. Hochbaum and Wolfgang Maass. Approximation schemes for covering and packing problems in image processing and VLSI. J. ACM, 32(1):130–136, 1985. doi:10.1145/2455.214106.
  • [28] Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180–184, 1985. URL: http://www.jstor.org/stable/3689371, doi:10.1287/MOOR.10.2.180.
  • [29] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731–740. ACM, 2002. doi:10.1145/509907.510012.
  • [30] Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274–296, 2001. doi:10.1145/375827.375845.
  • [31] Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 646–659. ACM, 2018. doi:10.1145/3188745.3188882.
  • [32] Alfred A Kuehn and Michael J Hamburger. A heuristic program for locating warehouses. Management science, 9(4):643–666, 1963.
  • [33] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear time algorithms for clustering problems in any dimensions. In 32nd International Colloquium on Automata, Languages and Programming (ICALP), volume 3580 of Lecture Notes in Computer Science, pages 1374–1385. Springer, 2005. doi:10.1007/11523468_111.
  • [34] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1–5:32, 2010. doi:10.1145/1667053.1667054.
  • [35] Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Inf. Comput., 222:45–58, 2013. doi:10.1016/J.IC.2012.01.007.
  • [36] S. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28(2):129–137, March 1982. doi:10.1109/TIT.1982.1056489.
  • [37] Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Varadarajan. The planar k-means problem is np-hard. Theor. Comput. Sci., 442:13–21, 2012. doi:10.1016/J.TCS.2010.05.034.
  • [38] Yury Makarychev and Ali Vakilian. Approximation algorithms for socially fair clustering. In Conference on Learning Theory (COLT’21), pages 3246–3264. PMLR, 2021. URL: http://proceedings.mlr.press/v134/makarychev21a.html.
  • [39] Dániel Marx. Parameterized complexity of independence and domination on geometric graphs. In Hans L. Bodlaender and Michael A. Langston, editors, Parameterized and Exact Computation, Second International Workshop, IWPEC 2006, Zürich, Switzerland, September 13-15, 2006, Proceedings, volume 4169 of Lecture Notes in Computer Science, pages 154–165. Springer, 2006. doi:10.1007/11847250_14.
  • [40] Nimrod Megiddo and Kenneth J Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182–196, 1984. doi:10.1137/0213014.
  • [41] John F. Stollsteimer. A working model for plant numbers and locations. Journal of Farm Economics, 45(3):631–645, 1963. URL: http://www.jstor.org/stable/1235442.
  • [42] Arie Tamir. The k-centrum multi-facility location problem. Discret. Appl. Math., 109(3):293–307, 2001. doi:10.1016/S0166-218X(00)00253-5.
  • [43] Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, and Dan Feldman. New coresets for projective clustering and applications. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event, volume 151 of Proceedings of Machine Learning Research, pages 5391–5415. PMLR, 2022. URL: https://proceedings.mlr.press/v151/tukan22a.html.