
Fully Scalable MPC Algorithms for Euclidean k-Center

Artur Czumaj, Department of Computer Science, University of Warwick, Coventry, UK · Guichen Gao, School of Computer Science, Peking University, Beijing, China · Mohsen Ghaffari, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA · Shaofeng H.-C. Jiang, School of Computer Science, Peking University, Beijing, China
Abstract

The k-center problem is a fundamental optimization problem with numerous applications in machine learning, data analysis, data mining, and communication networks. The k-center problem has been extensively studied in the classical sequential setting for several decades, and more recently there have been some efforts in understanding the problem in parallel computing, on the Massively Parallel Computation (MPC) model. For now, we have a good understanding of k-center in the case where each local MPC machine has sufficient local memory to store some representatives from each cluster, that is, when one has Ω(k) local memory per machine. While this setting covers the case of small values of k, for a large number of clusters these algorithms require undesirably large local memory, making them poorly scalable. The case of large k has been considered only recently for the fully scalable low-local-memory MPC model, for Euclidean instances of the k-center problem. However, the earlier works considered only constant-dimensional Euclidean space, required a super-constant number of rounds, and produced only k(1+o(1)) centers whose cost is a super-constant-factor approximation of the k-center optimum.

In this work, we significantly improve upon the earlier results for the k-center problem for the fully scalable low-local-memory MPC model. In the low-dimensional Euclidean case in ℝ^d, we present the first constant-round fully scalable MPC algorithm for (2+ε)-approximation. We push the ratio further to (1+ε)-approximation, albeit using slightly more, (1+ε)k, centers. All these results naturally extend to slightly super-constant values of d. In the high-dimensional regime, we provide the first fully scalable MPC algorithm that in a constant number of rounds achieves an O(log n / log log n)-approximation for k-center.

Keywords and phrases:
Massively Parallel Computing, Euclidean Spaces, k-Center Clustering
Category:
Track A: Algorithms, Complexity and Games
Funding:
Artur Czumaj: Research supported in part by the Centre for Discrete Mathematics and its Applications (DIMAP), by EPSRC award EP/V01305X/1, by a Weizmann-UK Making Connections Grant, by an IBM Award.
Shaofeng H.-C. Jiang: Research partially supported by a national key R&D program of China No. 2021YFA1000900.
Copyright and License:
© Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Massively parallel algorithms; Theory of computation → Facility location and clustering; Theory of computation → Randomness, geometry and discrete structures
Related Version:
Full Version: http://arxiv.org/abs/2504.16382 [16]
Editors:
Keren Censor-Hillel, Fabrizio Grandoni, JoΓ«l Ouaknine, and Gabriele Puppis

1 Introduction

Clustering and the k-Center Problem.

Clustering is a fundamental task in data analysis and machine learning. We consider a well-known clustering problem, called k-Center, in Euclidean spaces. In this problem, given an integer parameter k ≥ 1 and a dataset P ⊂ ℝ^d, the goal is to find a center set C ⊂ ℝ^d of k points such that the following clustering objective is minimized:

cost(P, C) := max_{p ∈ P} dist(p, C). (1)

Here, dist(x, y) := ‖x − y‖₂ for any two points x, y ∈ ℝ^d, and dist(p, C) := min_{c ∈ C} dist(p, c) for any point p ∈ ℝ^d and any set of points C ⊂ ℝ^d.
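To make the objective concrete, here is a minimal (naive, O(|P|·|C|·d)-time) reference implementation of Equation (1); the point representation and function names are our own illustration, not part of the paper.

```python
import math

def dist(x, y):
    # Euclidean distance ||x - y||_2 between two points given as tuples of floats.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cost(P, C):
    # k-Center objective (Equation 1): the farthest data point's distance
    # to its nearest center.
    return max(min(dist(p, c) for c in C) for p in P)

# Example: two centers covering four points on a line; the cost is 0.5.
P = [(0.0,), (1.0,), (9.0,), (10.0,)]
C = [(0.5,), (9.5,)]
assert abs(cost(P, C) - 0.5) < 1e-9
```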

Solving k-Center on massive data sets introduces outstanding scalability issues. To meet this scalability challenge, practical approaches usually use several interconnected computers to solve the problem, i.e., they resort to distributed computing. Accordingly, there has been significant recent interest in scalable algorithms with provable guarantees for k-Center [1, 5, 7, 9, 11, 14, 20, 31, 37, 44, 46], primarily in the Massively Parallel Computation (MPC) model, which has nowadays become the de-facto standard theoretical model for such large-scale distributed computation settings (see, e.g., [6, 29, 36]).

Massively Parallel Computation (MPC) Model.

The MPC model, introduced in [43], provides a theoretical abstraction for widely-used practical frameworks such as MapReduce [19], Hadoop [52], Spark [53], and Dryad [38]. In this model, we are given a set of machines, each with some given memory of size s (also known as local memory). At the beginning of the computation, the input (which in our case is a set of n data points from ℝ^d) is arbitrarily distributed among these machines, with the constraint that it must fit within each machine's local memory. (Hence we require that the number of machines is Ω(n/s), for otherwise the input would not fit the system.) The MPC computation proceeds in synchronous rounds. In each round, each machine first processes its local data and performs an arbitrary computation on this data without communicating with other machines. Then, at the end of the round, machines can communicate by exchanging messages, subject to the constraint that for every machine, the total size of the messages it sends or receives is O(s). When the algorithm terminates, the MPC machines collectively output the solution. The goal is to finish the computational task in as few rounds as possible.

Local Memory Regimes and Full Scalability.

The central parameter determining the computational model is the size of the local memory s. Unlike the input size n, local memory is determined by the hardware provided, and as such, one would like the relation between s and n to be as flexible as possible. Therefore, an ideal MPC algorithm should be fully scalable, meaning that it should work with s = n^σ for any constant σ ∈ (0,1). The importance of designing fully scalable MPC algorithms has been recently observed (cf. [8]) for clustering problems like k-Center (and also k-Means and k-Median), where the prior research (see below) demonstrates that the problem's difficulty changes radically depending on whether s = Ω(k) or not, i.e., whether one machine can hold the entire set of proposed centers or not. It is furthermore desirable for the algorithm to use near-linear total memory.

Prior Work with High Local Memory Requirements.

In the non-fully scalable regime, when s = Ω(k), a classical technique of coresets [32, 33] can be applied to reduce the n input points to a (1+ε)-approximate proxy with only O(k) points (ignoring f(d)·poly(log n) factors). For k-Center, provided that s = Ω(k·n^γ) for some constant γ ∈ (0,1) [7], this small proxy can be computed in O(1) MPC rounds and moved to a single machine. The clustering problem (in fact, its approximate version, because of the approximation caused by the use of coresets) can then be solved locally without further communication in a single MPC round. However, the coreset approach is not applicable when s = o(k), because coresets suffer a trivial size lower bound of Ω(k), making the approach sketched above unsuitable. Hence, new techniques must be developed for fully scalable algorithms when local memory is sub-linear in k.

Several other works for k-Center, although not using a coreset directly, also follow a similar paradigm of finding a sketch of poly(k) points in each machine [1, 20, 31, 46], and therefore they still require s = Ω(k). We remark that these results work in general metrics with a distance oracle, which is a setting very different from our Euclidean setting. This fundamental difference translates to a different flavor of study: in general metrics not much more can be done than using the triangle inequality, whereas in ℝ^d we need to investigate which Euclidean properties are useful for fully scalable algorithms.

Toward Fully Scalable MPC Algorithms.

To combat these technical challenges, several recent works have designed fully scalable MPC algorithms for k-Center and for related clustering problems, including k-Median and k-Means [5, 8, 12, 13, 14, 17]. These MPC algorithms are fully scalable in the sense that they can work with s = n^σ for any constant σ ∈ (0,1). Indeed, they usually work with any s ≥ f(d)·poly log n regardless of k (albeit many of these results output more than k centers, achieving only a bi-criteria approximation). Still, despite this inspiring recent progress, fully scalable MPC algorithms for k-Center remain poorly understood. The state-of-the-art algorithm (for geometric k-Center and only for d = O(1)) is by Coy, Czumaj, and Mishra [14]: it achieves a super-constant approximation ratio of O(log* n) (improving over an O(log log log n)-approximation bound from an earlier work of Bateni et al. [5]), using k + o(k) centers; this violates the constraint of using at most k centers and works in a super-constant O(log log n) number of rounds. These bounds, which are stated for the standard regime of s = n^σ with a constant σ ∈ (0,1), show a drastic gap to the above-mentioned bounds achieved in the (non-fully scalable) s = Ω(k) regime.

Challenges of High Dimensionality.

Apart from full scalability, the high dimensionality of Euclidean spaces is another challenge in designing MPC algorithms. Indeed, there is an emerging line of work that addresses high dimensionality in MPC [4, 8, 12, 13, 17, 21, 39]. Unfortunately, all previous fully scalable MPC algorithms for k-Center work only in the low-dimensional regime, since they require an exp(d) dependence on d in the local space, and fully scalable MPC algorithms suitable for high dimension, especially those with poly(d) dependence, constitute an open area of research. Overall, there has been a large gap in fully scalable algorithms for k-Center in both the low- and high-dimensional regimes.

1.1 Our Results

We give new fully scalable MPC algorithms for Euclidean k-Center, systematically addressing both the low-dimensional and high-dimensional regimes. Our results in low dimension significantly improve upon previous results simultaneously in several aspects (approximation ratio, round complexity, etc.). We also obtain the first results for high dimension, and our approximation ratio bypasses several natural barriers.

Low-Dimensional Regime.

In low dimension, we provide the first fully scalable MPC algorithm that achieves a constant approximation to k-Center, running in a constant number of rounds (Theorem 1.1). This result significantly improves the previous fully scalable algorithms for k-Center [5, 14] in several major aspects: it achieves constant round complexity, a better approximation factor, and a true approximation (without bi-criteria relaxations that allow slightly more than k centers).

Theorem 1.1.

There exists an MPC algorithm that, given ε ∈ (0,1), k ≥ 1, and a dataset P ⊂ ℝ^d of n points distributed across MPC machines with local memory s ≥ (Ω(dε^{-1}))^{Ω(d)}·poly log n, with probability at least 1 − 1/n computes a (2+ε)-approximate solution to k-Center, using O(log_s n) rounds and O(n·poly log n·(O(dε^{-1}))^{O(d)}) total memory.

Furthermore, we can improve the (2+ε) approximation bound if we allow bi-criteria approximations: we design a (1+ε)-approximation k-Center algorithm that uses (1+ε)k centers (Theorem 1.2). This bi-criteria approximation is almost optimal, and is the first of its kind in the literature.

Theorem 1.2.

There exists an MPC algorithm that, given ε ∈ (0,1), k ≥ 1, and a dataset P ⊂ ℝ^d of n points distributed across MPC machines with local memory s ≥ (Ω(dε^{-1}))^{Ω(d)}·poly log n, with probability at least 1 − 1/n computes a (1+ε, 1+ε)-approximate solution to k-Center (an (α, β)-approximate solution to k-Center, also called a bi-criteria solution, is a center set C ⊂ ℝ^d that has at most βk centers and cost at most α times the optimal cost of using at most k centers), using O(log_s n) rounds and O(n·poly log n·(O(dε^{-1}))^{O(d)}) total memory.

Observe that for the memory regime of s = n^σ with a constant σ ∈ (0,1), both Theorems 1.1 and 1.2 run in a constant number of rounds. Moreover, Θ(log_s n) is the complexity of far more rudimentary tasks, e.g., outputting the summation of n numbers (see, e.g., [49]). The dependence on d in the memory bounds is 2^{Θ(d log d)}. Thus, for any constant δ ∈ (0,1), if d = o(log n / log log n), then the algorithms work in a constant number of rounds with local memory s ≥ n^δ and total memory n^{1+o(1)}, which is the regime studied in the previous works on fully scalable algorithms for k-Center [5, 14]. Furthermore, if d = o(log log n / log log log n), then the total memory is in the desirable regime of O(n·poly log n).

High-Dimensional Regime.

Next, we go beyond the low-dimensional regime and explore the high-dimensional case, where d can be as large as O(log n). (As also observed in, e.g., [12, 17], one can assume d = O(log n) without loss of generality by a Johnson-Lindenstrauss transform.) For this regime, we provide the first fully scalable MPC algorithm for k-Center; it achieves an O(log n / log log n)-approximation, running in a constant number of rounds (Theorem 1.3).

Theorem 1.3.

There exists an MPC algorithm that, given ε ∈ (0,1), k ≥ 1, and a dataset P ⊂ ℝ^d of n points distributed across MPC machines with local memory s ≥ poly(d log n), with probability at least 1 − 1/n computes an O(ε^{-1} log n / log log n)-approximate solution to k-Center, using O(log_s n) rounds and O(n^{1+ε}·poly(d log n)) total memory.

We are not aware of any earlier fully scalable MPC algorithms for k-Center in high dimension that we could compare with. In fact, fully scalable MPC algorithms for clustering problems in high dimension are generally less understood; the only known result is an O(log² n)-approximation for k-Median [12] (and in fact this ratio may be improved to O(log^{1.5} n) using the techniques from [2], although this is not explicitly mentioned there), and as far as we know, nothing is known for k-Center and k-Means. These existing results for k-Median rely on the tree embedding technique, and currently only an O(log^{1.5} n)-distortion is known [2] (which translates into the approximation ratio). As a result, even if these techniques could be adapted for k-Center, they would only provide an O(log^{1.5} n)-approximation, which falls short of the O(log n / log log n)-approximation achieved by our method; in fact, our bound is even better than the fundamental Ω(log n) lower bound on the distortion of tree embeddings. This is not to mention the added technical difficulty of using tree embeddings: their expected distance distortion guarantee is too weak to be useful for the "max" aggregation of distances in k-Center.

1.2 Technical Overview

Our algorithms for k-Center rely on variants of the classic reductions to geometric versions of the ruling set (RS) and minimum dominating set (MDS) problems, which are fundamental problems in distributed computing. We briefly describe the reductions in Section 1.2.1, in particular to state our exact setup of RS and MDS, and state the results we obtain for each. Then in Section 1.2.2 and Section 1.2.3, we provide a technical overview of our proposed MPC algorithms for these results, focusing on the key challenges and core techniques. While the relation between k-Center and RS/MDS is well known and has also been used in MPC algorithms for k-Center in general metrics [1, 31], it has not been studied for Euclidean k-Center in the fully scalable setting. This is a key technical difference from previous fully scalable algorithms for Euclidean k-Center [5, 14], which employ successive uniform sampling to find centers.

Both our low-dimensional and high-dimensional results rely on geometric hashing techniques (in Section 3), through which we utilize the Euclidean structure. For low dimension, our algorithms are based on natural parallel algorithms whose variants were also considered in graph MPC algorithms, and the geometric hashing is the key to achieving the new bounds. For high dimension, our algorithm is a variant of the one-round version of Luby's algorithm. It is known that the one-round Luby's algorithm yields a Θ(log n) bound for RS in general graphs (see, e.g., [24, Exercise 1.12]). However, our new variant crucially makes use of the Euclidean structure via geometric hashing, and it breaks the mentioned Θ(log n) bound for general graphs, improving it by an O(log log n) factor in the Euclidean setting; see Section 1.2.3 for a more formal discussion. Due to the space limit, proofs for claims made in this section may be omitted in this version and can be found in the full version [16].

1.2.1 Reductions and Results for Geometric RS and MDS

To establish Theorems 1.1, 1.2, and 1.3, we begin by introducing the definitions and reductions for RS and MDS. Let τ, α > 0 be parameters, and let OPT be the minimum cost of a solution for k-Center.

Geometric RS and MDS.

A subset S ⊆ P is called a τ-independent set (τ-IS) for P if for every x ≠ y ∈ S, dist(x, y) > τ; and we say S ⊆ ℝ^d is a τ-dominating set (τ-DS) for P if for every x ∈ P, dist(x, S) ≤ τ. A subset S ⊆ P is a (τ, α)-ruling set ((τ, α)-RS) for P if S is both a τ-IS and an α-DS for P. A τ-MDS is a τ-DS of minimum size, denoted MDS_τ(P). A related well-known notion is the maximal independent set (MIS), where a τ-MIS is a (τ, τ)-RS.

Reductions.

It is known (see, e.g., [35]) that any τ-IS of P with τ ≥ 2·OPT must have at most k points. Therefore, an MPC algorithm that computes a (2·OPT, α·OPT)-RS of P immediately yields an α-approximation for k-Center on P. On the other hand, for MDS, since the optimal solution for k-Center is itself a candidate τ-DS with at most k points for τ = OPT, a (1+ε)-approximation to the τ-MDS yields at most (1+ε)k centers. This relation helps to obtain the desired bi-criteria approximation. In contrast to the RS setting, which can only lead to an O(1)-approximation, MDS operates on the entire ℝ^d. This is necessary for the (1+ε) ratio, since centers need to be picked from ℝ^d instead of only from P.

Results for RS and MDS.

We obtain the following results for RS and MDS, in both low and high dimension. Combined with the above-mentioned reductions, these results readily imply Theorems 1.1, 1.2, and 1.3. All results run in O(log_s n) rounds, which is constant in the typical setup of s = n^σ for constant 0 < σ < 1. In low dimension, we obtain a (τ, (1+ε)τ)-RS and a (1+ε)τ-DS whose size is at most (1+ε) times that of the τ-MDS (which in a sense is a "bi-criteria" approximation). Both results use (ε^{-1}d)^{O(d)}·poly(log n) local space and n·poly(d log n)·(ε^{-1}d)^{O(d)} total space. In high dimension, we obtain a (τ, O(ε^{-1} log n / log log n)·τ)-RS, using (ideal) poly(d log n) local space and n^{1+ε}·poly(d log n) total space. These results are summarized in Table 1.

Table 1: RS and MDS results, where Õ(·) hides poly(d log n) factors; all run in O(log_s n) rounds.

guarantee | local space | total space | reference
(τ, (1+ε)τ)-RS | (ε^{-1}d)^{O(d)}·Õ(1) | Õ(n)·(ε^{-1}d)^{O(d)} | Lemma 4.1
(1+ε)τ-DS of size (1+ε)·|MDS_τ(P)| | (ε^{-1}d)^{O(d)}·Õ(1) | Õ(n)·(ε^{-1}d)^{O(d)} | Lemma 4.2
(τ, O(ε^{-1} log n / log log n)·τ)-RS | Õ(1) | Õ(n^{1+ε}) | Lemma 4.3

We remark that MIS, RS, and MDS are fundamental yet notoriously challenging problems in MPC. Existing studies of these problems mostly concern (general) graphs or general metric spaces, and they achieve worse bounds than ours: e.g., they need a super-constant number of rounds [25, 26, 27, 30, 40, 47], and/or are not fully scalable [10, 27, 31]. However, our results seem to suggest that these problems in Euclidean spaces behave very differently than in graphs/general metrics. On the one hand, we obtain fully scalable algorithms in both low and high dimension; on the other hand, our algorithms are only "approximations" to MIS and MDS. For instance, in low dimension, both our RS and MDS results are a (1+ε) factor off in the dominating parameter compared with MIS and MDS. For RS/MIS and MDS without violating the dominating parameter, we are only aware of a line of research in distributed computing on growth-bounded graphs, see Schneider and Wattenhofer [51], which indirectly leads to O(log* n)-round fully scalable algorithms for MIS/MDS in ℝ^d for constant d. It remains open to design fully scalable algorithms for MIS, even in 2D, in a constant number of rounds. In fact, this problem is already challenging on a 2D input set with diameter O(τ). Nevertheless, our RS and MDS results suffice for approximations for k-Center.

1.2.2 RS and MDS in Low Dimension

The RS and MDS algorithms in low dimension start with rounding to an ετ/d-grid. Specifically, we move each data point to the nearest ετ/d-grid point, whose coordinates are multiples of ετ/d; denote the resulting set by P′. (In our proof we actually need to use a slightly different rounding to make sure the image also belongs to P.) Then we show that any (τ, ατ)-RS for P′ yields a (τ, (1+ε)ατ)-RS for the original dataset P, and similarly, any τ-DS for P′ yields a (1+ε)τ-DS for P. Hence, we can assume without loss of generality that P is a subset of the said grid, and find RS and MDS on it. This rounding is useful since it ensures that in any subset of ℝ^d of diameter γ·τ (γ ≥ 1), the number of grid points is at most (O(dγ/ε))^d. Hence, as long as s ≥ Ω(d/ε)^d, we can afford to bring all grid points in a small area to a single machine and solve RS/MDS on it locally.
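The following sketch illustrates the simple variant of this rounding step (snapping to the nearest grid point); as noted above, the paper's actual rounding differs slightly so that images stay inside P. All names are our own illustration.

```python
def round_to_grid(P, tau, eps, d):
    # Snap each point (a tuple of d floats) to the nearest (eps*tau/d)-grid point.
    # Each coordinate moves by at most w/2, so each point moves by at most
    # (w/2)*sqrt(d) <= eps*tau/2 in l2, which is why RS/DS guarantees transfer
    # between P and the rounded set up to (1+eps) factors.
    w = eps * tau / d  # grid cell width
    return {tuple(round(x / w) * w for x in p) for p in P}
```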

An Overview for the Proof of RS.

In fact, on the rounded dataset P, we find a (τ, τ)-RS, which is in fact a τ-MIS of P. A standard way of finding an MIS in a graph is the greedy algorithm: start with the entire vertex set, and while the current vertex set is nonempty, find any vertex x, add it to the output (MIS) set, remove all vertices adjacent to x from the current vertex set, and repeat. In the geometric setting, we can use an improved version where in each iteration we identify a large number of independent vertices, instead of only one. Specifically, we partition the entire ℝ^d into T groups 𝒲_1, …, 𝒲_T, such that each group consists of regions that are τ apart from each other, and each region has a bounded diameter ατ for some α ≥ 1. (The reader may notice the reminiscence of the widely-used notion of network decomposition in graphs [3, 50]. In that notion, the node set V of the graph is partitioned into T = O(log n) groups V_1, …, V_T, such that in the subgraph induced by each V_i, each connected component has diameter at most α = O(log n).) For d = 2, there is a simple way to partition with T = O(1) and α = O(1), as illustrated in Figure 1. For general d, we give a partition that achieves T = O(d) and α = O(d^{1.5}); see Lemma 3.1 for the precise statement.

Figure 1: A space partition in 2D with T = 3 and α = 5. The first group is squares with side-length 5τ/2 (the blank space), the second is the red-shaded rectangles, and the third is the cross-like structures in blue shades.

A key property of this decomposition is that the MIS computation can be done independently within each group, as regions are τ-separated. This yields an O(T·log_s n)-round MPC algorithm: iterate over the T groups 𝒲_1, …, 𝒲_T, compute an MIS in each region in parallel, and remove points within distance τ of selected MIS points before proceeding to the next group. Since each region in any group is of diameter ατ, which is at most O(d^{1.5})·τ, there are at most O(ε^{-1}d)^{O(d)} data points in each region, using the assumption of the rounded instance. Hence, as long as s ≥ Ω(ε^{-1}d)^{Ω(d)}, the data points of each region can be stored on a single machine. Each such iteration can be implemented in O(log_s n) MPC rounds, and thus the overall scheme, which has T iterations, takes O(T·log_s n) MPC rounds.
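A sequential sketch of this group-by-group scheme follows, assuming a partition into groups of τ-separated regions is given (e.g., from Lemma 3.1), with each region represented as a set of points; `dist` is Euclidean distance and all helper names are our own.

```python
def greedy_mis_local(points, tau, dist):
    # Exact greedy tau-MIS on a small point set (one region, fits on one machine).
    mis = []
    for p in points:
        if all(dist(p, q) > tau for q in mis):
            mis.append(p)
    return mis

def mis_by_groups(P, groups, tau, dist):
    # groups = [W_1, ..., W_T]; each W_i is a list of regions (sets of points)
    # that are pairwise tau-separated, so regions within a group do not interact.
    alive, out = set(P), []
    for regions in groups:
        picked = []
        for R in regions:  # in MPC, all regions of one group run in parallel
            picked += greedy_mis_local([p for p in alive if p in R], tau, dist)
        out += picked
        # remove every surviving point within tau of a newly selected MIS point
        alive = {p for p in alive if all(dist(p, q) > tau for q in picked)}
    return out
```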

To reduce this to O(log_s n) rounds, we exploit the locality of the above procedure. Instead of iterating sequentially over groups, each region R ∈ 𝒲_i for 1 ≤ i ≤ T directly determines its final subset R′ by identifying the influence from earlier groups ⋃_{j<i} 𝒲_j within a bounded range O(iτ + (i−1)ατ) ≤ poly(d)·τ. Since an MIS selection only affects a τ-radius per iteration, and each region has diameter at most ατ, the cumulative affected area over i iterations remains bounded. Leveraging the rounding property again, the number of relevant regions remains at most O(ε^{-1}d)^{O(d)}, allowing all necessary regions to be replicated locally. This ensures that each region can compute its MIS in parallel in O(log_s n) rounds, provided s ≥ Ω(ε^{-1}d)^{Ω(d)}·poly(log n).

An Overview for the Proof of MDS.

We start with a weaker local space bound that requires a 2^{Ω(d²)} dependence on d in s, instead of the claimed 2^{Ω(d log d)} bound. Our approach starts with a simple algorithm: partition ℝ^d into hypercubes of side-length ατ, where α ≥ 1 is a parameter to be determined. Then send the data points in each hypercube to a single machine, and solve locally the exact MDS of each hypercube. The union of these MDSs forms the final dominating set. Clearly, this returns a dominating set for the entire dataset, but it may not be of small size. Intuitively, the major gap in this algorithm arises when the optimal solution uses points located at the boundary of the hypercubes to dominate adjacent hypercubes, while the algorithm described here only uses a point to dominate points in a single hypercube.
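A sketch of this simple hypercube-based algorithm follows; the exact per-cell MDS solver is abstracted away (on the rounded instance, brute force over the at most (O(dα/ε))^d grid points of a cell is affordable), and all names are our own.

```python
from collections import defaultdict

def ds_by_hypercubes(P, tau, alpha, solve_mds_locally):
    # Partition R^d into hypercubes of side alpha*tau and solve an exact
    # tau-MDS inside each non-empty cell (one cell per machine in MPC).
    side = alpha * tau
    cells = defaultdict(list)
    for p in P:
        cells[tuple(int(x // side) for x in p)].append(p)
    S = []
    for cell_points in cells.values():
        S += solve_mds_locally(cell_points, tau)
    return S  # a valid tau-DS for P, but possibly much larger than optimal
```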

To address this issue, we observe that bridging the gap between the algorithm's solution and the optimal solution requires only bounding the number of points in the outer "τ-extended region" of each hypercube R, denoted U_τ^∞(R), that intersect the optimal solution. Specifically, it suffices to show that this number is at most an ε-fraction of the optimal solution size. To establish this bound, we apply an averaging argument: if we shift the entire hypercube partition by some multiple of O(τ), there must exist a shift that satisfies our requirement, provided that the hypercube side-length ατ is sufficiently large. This may be visualized more easily in 1D: each U_τ^∞(R) is simply two intervals of length τ to the left and right of R (which itself is also an interval, of length ατ). Then, shifting by multiples of O(τ), the two intervals of U_τ^∞ form, over all shifts, a partition of the entire ℝ. Unfortunately, this simple shifting of hypercubes only leads to α = 2^{O(d)}, which translates to a 2^{O(d²)} dependence on d in the local space s. The main reason for this α = 2^{O(d)} bound is that the same point of the optimal solution may belong to up to 2^{O(d)} sets U_τ^∞(R) (over hypercubes R).

To further reduce α to poly(d), which leads to the d^{O(d)} local space bound, we employ a more sophisticated geometric hashing, stated in Lemma 3.1 and Fact 3.2. This hash maps each point of ℝ^d into some bucket, and we replace the hypercubes in the above-mentioned algorithm with such buckets. An important property of this bucketing is that any point in ℝ^d (hence any point in the optimal solution) can intersect at most poly(d) sets U_τ^∞(R) over all buckets R, instead of 2^{O(d)} as in the simple hypercube partition. However, the use of this new bucketing also introduces additional issues. Specifically, since the buckets have a complicated structure, it is difficult to analyze the even more complex sets U_τ^∞(R) for the averaging argument. To this end, we manage to show (in Lemma 3.1, third property) that the union ⋃_R U_τ^∞(R) is contained in the complement of a Cartesian power of (1D) intervals (e.g., ([1,2] ∪ [3,4])^d). This Cartesian-power structure is similar enough to the U_τ^∞ annulus of hypercubes (which may be handled by projecting to each dimension), albeit taking the complement. This eventually enables us to use a modified averaging argument to finish the proof.

1.2.3 RS in High Dimension

Our (τ, O(ε^{-1} log n / log log n)·τ)-RS in high dimension is a modification of the well-known Luby's algorithm. In this discussion, we assume ε = Θ(1) and ignore this parameter. The first modification is that, unlike the standard Luby's algorithm, which runs for O(log n) iterations [45], our algorithm runs Luby's for only one iteration: for every data point x ∈ P, generate a uniform random value h(x) ∈ [0,1], and then for each x ∈ P, include x in the RS if x has the smallest h-value in B_P(x, τ) (the ball centered at x with radius τ, intersected with P). This one-round Luby's algorithm achieves a (τ, O(log n)·τ)-RS (with high probability), and we also show that this is tight in general graphs. (Similar bounds were also mentioned without proof in the literature, see, e.g., [24, Exercise 1.12]; we give a proof sketch of this tight bound for completeness.) However, this is worse than the O(log n / log log n) factor that we can achieve.
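For concreteness, here is a self-contained (offline, quadratic-time) sketch of the one-round Luby step just described; the MPC version instead works with the approximate balls of Lemma 5.7.

```python
import math, random

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def one_round_luby(P, tau):
    # Each point draws h(x) uniformly from [0,1] and joins R iff it attains
    # the minimum h-value within its tau-ball B_P(x, tau).
    h = {p: random.random() for p in P}
    R = []
    for x in P:
        ball = [y for y in P if dist(x, y) <= tau]  # B_P(x, tau), includes x
        if min(ball, key=h.get) == x:
            R.append(x)
    return R  # tau-independent; dominates within O(log n)*tau w.h.p.
```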

A New Preprocessing Step Based on Geometric Hashing.

Hence, we introduce the second important modification to this one-round Luby's algorithm in order to bypass the O(log n) factor. Specifically, before running the one-round Luby's algorithm, we run a preprocessing step that maps the data points to the buckets of a geometric hashing. The geometric hashing that we use is consistent hashing [18] (see Lemma 3.4), and the concrete guarantee in our context is that each hash bucket has diameter ℓ := O(log n / log log n)·τ, and for any subset S ⊆ ℝ^d with diameter at most O(τ), the number of buckets that S intersects is at most Λ := poly(log n). We pick an arbitrary point of P from each (non-empty) bucket, denote the resulting set by P′, and run the one-round Luby's algorithm on P′. Clearly, this hashing step only additively increases the dominating parameter by ℓ = O(log n / log log n)·τ, which we can afford. At a high level, the purpose of this rounding is to limit the size of the O(τ)-neighborhood of every point, which is analogous to the degree of a graph. In a sense, what we prove is that one-round Luby's on graphs with degree bound poly(log n) yields an O(log n / log log n)-ruling set.
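The preprocessing step itself is simple; a sketch is below, where `phi` stands for the consistent hash of Lemma 3.4, assumed to map a point to a hashable bucket identifier (the names are our own).

```python
def bucket_representatives(P, phi):
    # Keep one arbitrary point of P per non-empty bucket of the hash phi.
    reps = {}
    for p in P:
        reps.setdefault(phi(p), p)  # the first point seen in each bucket wins
    return list(reps.values())      # this is P'

# Every point of P lies within the bucket diameter ell = O(log n / log log n)*tau
# of its bucket's representative, so running one-round Luby's on P' loses only
# an additive ell in the dominating radius.
```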

Next, we explain in more detail why this hashing step helps to obtain the O(log n / log log n)·τ dominating parameter. This requires a formal analysis of the one-round Luby's algorithm (which does not seem to have appeared in the literature), utilizing the property of the hashing that for every x ∈ P′, |B_{P′}(x, τ)| ≤ Λ = poly(log n).

A Re-Assignment Argument.

Let R be the set found by running one-round Luby's on P′. Fix a point p ∈ P′; we need to upper bound dist(p, R). To this end, we interpret the algorithm as defining an auxiliary sequence S = (x_0 := p, x_1, …, x_T) (where T is also random). Specifically, S is formed by the following process: we start with i = 0; whenever x_i is not picked into R, we define x_{i+1} as the point with the smallest h-value in B_{P′}(x_i, τ), and we terminate the procedure otherwise. Clearly, dist(p, R) ≤ T·τ, and it suffices to upper bound T (which is a random stopping time). A similar re-assignment argument has been employed in [17], but the definition and analysis of S there is quite different, since that work focuses on facility location.
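In code, this auxiliary sequence replays a fixed run of the algorithm (h and the balls are already determined), so it introduces no new randomness; `ball(x)` plays the role of B_{P′}(x, τ) (or of the approximate ball A_{P′}^β(x, τ) later), and the names are our own.

```python
def assignment_sequence(p, R, h, ball):
    # Follow h-minima until a point of the ruling set R is reached. Since h
    # strictly decreases along the sequence, the walk cannot cycle and must
    # end at a point that is the h-minimum of its own ball, i.e., a point of R.
    S = [p]
    while S[-1] not in R:
        S.append(min(ball(S[-1]), key=h.get))  # x_{i+1}: h-minimum of x_i's ball
    return S  # dist(p, R) <= (len(S) - 1) * tau by the triangle inequality
```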

Now, a crucial step is to bound the probability of the event that, given that the algorithm picks a prefix (x_0, …, x_i) of the sequence S, the algorithm picks some new x_{i+1} ∈ P′ instead of terminating. We call this the extension probability. Ideally, if we could give this extension probability a universal upper bound γ < 1, then for t ≥ 1, the probability that T > t would be at most γ^t, which decreases exponentially in t. However, this seemingly good bound may not immediately suffice, since one still needs to take a union bound over sequences of length t, because the sequence S is random. A naïve bound is n^t, since each x_i may be picked from any point of P′, and this is prohibitively large (i.e., γ^t cannot "cancel it out"). Moreover, an added difficulty is that the "ideal" case of a universal upper bound γ on the extension probability may not be possible in the first place.

Our analysis resolves both difficulties. We show that the extension probability is roughly upper bounded by |B_{P′}(x_i, τ)| / |⋃_{j=1}^{i} B_{P′}(x_j, τ)| (recalling that (x_0, …, x_i) is given). For the union bound, since the hashing guarantees that |B_{P′}(x, τ)| ≤ Λ = poly(log n) for any x ∈ P′, we can approximate the size |B_{P′}(x_i, τ)| by rounding to the next power of 2, denoted ⌈|B_{P′}(x_i, τ)|⌉_2, which leaves only O(log log n) possibilities after rounding. For a (fixed) sequence S′ := (x_0, …, x_m), we define its configuration as the rounded values (⌈|B_{P′}(x_1, τ)|⌉_2, …, ⌈|B_{P′}(x_m, τ)|⌉_2). We can take a union bound with respect to the configurations of the sequences, and there are at most O(log log n)^t configurations of length-t sequences.

Finally, given a sequence S′ = (x_0, …, x_t) with a given configuration (for some t), to upper bound the extension probability we need to analyze ∏_i |B_{P′}(x_i, τ)| / |⋃_{j=1}^{i} B_{P′}(x_j, τ)|, and we show in a key lemma that this is upper bounded by exp(−Ω(t)·log(t / log Λ)). Therefore, we can pick t = log n / log log n, apply the union bound, and conclude that Pr[T > t] ≤ (log log n)^t · exp(−Ω(t)·log(t / log Λ)) ≤ 1/poly(n).
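To see why this choice of t suffices, here is the final calculation spelled out, under the assumption Λ = poly(log n) (so log Λ = O(log log n)) and t = Θ(log n / log log n):

```latex
\Pr[T > t]
  \le (\log\log n)^{t}\cdot\exp\!\bigl(-\Omega(t)\log(t/\log\Lambda)\bigr)
  = \exp\!\bigl(t\log\log\log n-\Omega(t)\log(t/\log\Lambda)\bigr).
% Since t/\log\Lambda = \Omega(\log n/\log^2\log n), we get
% \log(t/\log\Lambda) = \Omega(\log\log n) \gg \log\log\log n, and hence
\Pr[T > t] \le \exp\!\bigl(-\Omega(t\log\log n)\bigr)
  = \exp\!\bigl(-\Omega(\log n)\bigr) = 1/\operatorname{poly}(n).
```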

We also provide a sketch of how to slightly modify our analysis to show that the one-round Luby's algorithm yields a (τ, O(log n)·τ)-RS. As mentioned, this does not seem to have appeared in the literature. We also provide a tight instance for this one-round Luby's algorithm.

1.3 Related Work

The k-Center problem, as one of the fundamental clustering problems [28, 34, 35], has been studied extensively in the sequential setting and, more recently, also in the parallel setting. Among MPC algorithms for k-Center, most prior work has focused on non-fully scalable algorithms that have an Ω(k) dependence in the local memory s and can generally achieve O(1) rounds. In general metric spaces, [20] obtains a large-constant approximation using O(1/σ) rounds if the local memory is poly(k)·n^σ, which offers a tradeoff between the number of rounds and the local memory. More recent works aim to achieve a smaller constant ratio, down to factor 2, at the cost of a local memory size with a fixed polynomial dependence on n and/or a polynomial dependence on the number of machines [1, 31, 37, 46]. There have also been k-Center studies on doubling metric instances (a generalization of ℝ^d) [7, 11], where in the bi-criteria (or with-outliers) setting, not only a constant but even a (1+ε) ratio can be achieved thanks to the low-dimensional structure [7].

Fully scalable MPC algorithms have also been studied for the related clustering problems k-Median and k-Means in ℝ^d. [12] describes a hierarchical clustering algorithm that in a constant number of rounds returns a poly log(n)-approximation for k-Median using s = poly(d)·n^σ local memory; we are not aware of any similar approximation for k-Means (even when d = O(1) and with a poly log n ratio). Furthermore, since the techniques used in [12] rely critically on hierarchically well-separated trees, they are unlikely to lead to algorithms with sub-polylogarithmic approximation ratios. On the other hand, bi-criteria approximations are known for both k-Median and k-Means [8, 17]: in a constant number of rounds and with s = poly(d)·n^σ local memory, one can approximate k-Median and k-Means with an O(1) ratio using (1+ε)k centers [17]. In the same setting, it is also possible to achieve a (1+ε)-approximation for special inputs [13].

2 Preliminaries

For an integer n ≥ 1, let [n] := {1, …, n}. For a mapping f: X → Y, denote by f^{-1}(y) := {x ∈ X : f(x) = y} the pre-image of y under f. For d ≥ 1, a set S ⊆ ℝ^d, and x ∈ ℝ^d, write S + x to denote {x + y : y ∈ S}. For t > 0, let 𝒢_t ⊆ ℝ^d be the set of t-grid points, that is, points whose coordinates are integer multiples of t. For a subset S ⊂ ℝ^d, let diam(S) := max_{x,y ∈ S} dist(x, y) be its diameter. Similarly define dist_∞ and diam_∞ as the ℓ_∞ versions.

Neighborhoods.

Let B(x, r) := {y ∈ ℝ^d : dist(x, y) ≤ r} be the ball centered at x ∈ ℝ^d with radius r ≥ 0. For a point set X ⊆ ℝ^d, let B_X(x, r) := B(x, r) ∩ X denote a ball inside X. For a point set S ⊂ ℝ^d and γ > 0, let N_γ^∞(S) be the γ-neighborhood of S under the ℓ_∞ distance, i.e., N_γ^∞(S) := {x ∈ ℝ^d : ∃s ∈ S, ‖x − s‖_∞ ≤ γ}, and let U_γ^∞(S) := N_γ^∞(S) ∖ S be the γ-annulus of S under the ℓ_∞ distance.

Geometric Independent Set, Ruling Set and Dominating Set.

Let τ > 0 be a threshold, α > 0 a parameter, and P ⊂ ℝ^d a dataset. A subset S ⊆ P is called a τ-independent set (τ-IS) for P if for every x ≠ y ∈ S, dist(x, y) > τ; and we say S ⊆ ℝ^d is a τ-dominating set (τ-DS) for P if for every x ∈ P, dist(x, S) ≤ τ. A subset S ⊆ P is a (τ, α)-ruling set ((τ, α)-RS) for P if S is both a τ-IS and an α-DS for P. A τ-MDS is a τ-DS of minimum size, denoted MDS_τ(P). A related well-known notion is the maximal independent set (MIS), where a τ-MIS for P, denoted MIS_τ(P), is a (τ, τ)-RS for P.
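As a small sanity-check utility (our own illustration, not part of the paper), the definitions above can be verified directly:

```python
import math

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def is_ruling_set(S, P, tau, alpha):
    # S is a (tau, alpha)-RS for P iff it is a tau-IS and an alpha-DS for P.
    independent = all(dist(x, y) > tau for x in S for y in S if x != y)
    dominating = all(min(dist(p, s) for s in S) <= alpha for p in P)
    return independent and dominating
```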

π’Œ-Center.

Recall that the objective of k-Center is defined in Section 1 as cost(P, C) for a dataset P and center set C. Let OPT(P) be the minimum cost of a solution for k-Center, i.e., OPT(P) := min_{C ⊂ ℝ^d, |C| ≤ k} cost(P, C). When the context is clear, we simply write OPT for OPT(P).

Standard MPC Primitives.

In our algorithms we frequently use several basic primitives on MPC with local memory s ≥ poly log(N), using O(N·poly log N) total memory and O(log_s N) rounds, where N is the size of a generic input. This includes the standard procedures of broadcast and converge-cast (of a message of size ≤ s), see, e.g., [23]. Goodrich et al. [29] show that the task of sorting N numbers can also be performed deterministically in the above setting.

Lemma 2.1 (Packing property, cf. [48, Lemma 4.1]).

For a point set SβŠ‚β„d such that βˆ€xβ‰ y∈S,d⁒i⁒s⁒t⁒(x,y)β‰₯ρ, we have that |S|≀(3⁒diam⁑(S)ρ)d.

3 Geometric Hashing

We present our new geometric hashing in Lemma 3.1. This hashing is crucially used in our low-dimensional results. The construction of this hashing is the same as in [18, Theorem 5.3], and the first two properties have also been established in [18]. However, the third property is new, and is based on a careful analysis that utilizes the structure of this specific construction.

Lemma 3.1.

For every β > 0 and ℓ ≥ Θ(d^{1.5}·β), there is a hash function f: ℝ^d → ℝ^d such that the following holds.

1. Each bucket has diameter at most ℓ, namely, for every image u ∈ f(ℝ^d), diam(f^{-1}(u)) ≤ ℓ.

2. The bucket set {f^{-1}(u) : u ∈ f(ℝ^d)} can be partitioned into d+1 groups {𝒲_i}_{i=0}^{d}, such that every two buckets S ≠ S′ in the same group 𝒲_i (0 ≤ i ≤ d) satisfy dist(S, S′) ≥ dist_∞(S, S′) > β.

3. For every 0 < τ ≤ β, (⋃_{u ∈ f(ℝ^d)} U_τ^∞(f^{-1}(u))) ∩ L(z, 2b)^d = ∅, where z := ℓ/d, b := dβ + τ, and L(p, q) := ⋃_{a ∈ ℤ} [ap + q, (a+1)p − q) for p > 2q. (Recall that the notation L(·,·)^d denotes the d-th Cartesian power of L(·,·).)

Furthermore, it takes poly(d) space to store f and to evaluate f(x) for every x ∈ ℝ^d.

Proof.

The proof can be found in Appendix A. ◀

Lemma 3.1 readily implies the following property. Roughly speaking, it ensures that for any point set S with small enough ℓ_∞ diameter, the number of buckets of the hash that S intersects is bounded.

Fact 3.2.

The second property of Lemma 3.1 implies that for every S ⊂ ℝ^d such that diam_∞(S) ≤ β, it holds that |f(S)| ≤ d + 1.

Geometric hash functions with guarantees similar to Fact 3.2 have been studied under the notion of consistent hashing [18], or sparse partitions, which interpret the hashing as a space partition [22, 41]. The definition of consistent hashing is stated below. The consistency guarantee of Fact 3.2 is slightly stronger, in the sense that the diameter bound on S is in ℓ_∞ instead of ℓ_2.

Definition 3.3 ([18, Definition 1.6]).

A mapping φ: ℝ^d → ℝ^d is called a Γ-gap Λ-consistent hash with diameter bound ℓ > 0, or simply a (Γ, Λ)-hash, if it satisfies:

■ Diameter: for every image z ∈ φ(ℝ^d), we have diam(φ^{-1}(z)) ≤ ℓ; and

■ Consistency: for every S ⊂ ℝ^d with diam(S) ≤ ℓ/Γ, we have |φ(S)| ≤ Λ.

Lemma 3.4 gives a space-efficient consistent hashing with a near-optimal parameter tradeoff. We rely on this tradeoff in a preprocessing step of our high-dimensional ruling set result.

Lemma 3.4 ([18, Theorem 5.1]).

For every Γ ∈ [8, 2d], there exists a (deterministic) (Γ, Λ)-hash φ: ℝ^d → ℝ^d with Λ = exp(8d/Γ)·O(d log d). Furthermore, φ can be described using O(d² log² d) bits, and one can evaluate φ(x) for any point x ∈ ℝ^d in space O(d² log² d).
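As a toy illustration of the (Γ, Λ)-hash interface (not the construction behind Lemma 3.4): an axis-aligned grid with cell side ℓ/√d has buckets of diameter at most ℓ, and any set of diameter at most ℓ/Γ with Γ ≥ √d touches at most 2 cells per coordinate, i.e., Λ ≤ 2^d, which is exponentially worse than the tradeoff of Lemma 3.4.

```python
import math

def make_grid_hash(ell, d):
    # Bucket = axis-aligned grid cell of side ell/sqrt(d), so each cell has
    # (Euclidean) diameter exactly ell; the bucket id is the cell's index vector.
    w = ell / math.sqrt(d)
    def phi(x):
        return tuple(math.floor(xi / w) for xi in x)
    return phi

phi = make_grid_hash(ell=1.0, d=2)
assert phi((0.1, 0.1)) == phi((0.2, 0.2))  # nearby points share a bucket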

4 MPC Algorithms for RS and MDS: Formal Statements

In this section, we present the formal statements of our results for RS and MDS. The proofs of the low-dimensional RS and MDS results (Lemmas 4.1 and 4.2) closely follow the arguments outlined in the technical overview (see Section 1.2.2) and are therefore omitted due to the space limitation (they can be found in the full version [16]). We nonetheless provide a proof outline for the high-dimensional RS result (Lemma 4.3) in Section 5.

Lemma 4.1.

There is a deterministic MPC algorithm that, given a threshold τ > 0, a constant ε ∈ (0,1), and a dataset P ⊆ ℝ^d of n points distributed across MPC machines with local memory s ≥ Ω(ε^{-1}d)^{Ω(d)}·poly(log n), computes a (τ, (1+ε)τ)-RS for P in O(log_s n) rounds, using O(n·poly(d log n))·O(ε^{-1}d)^{O(d)} total memory.

Lemma 4.2.

There is a deterministic MPC algorithm that, given a threshold τ > 0, a parameter ε ∈ (0,1), and a dataset P ⊆ ℝ^d of n points distributed across MPC machines with local memory s ≥ Ω(ε^{-1}d)^{Ω(d)}·poly(log n), computes a (1+ε)τ-DS S ⊂ ℝ^d for P such that |S| ≤ (1+ε)·|MDS_τ(P)| in O(log_s n) rounds, using O(n·poly(log n))·O(ε^{-1}d)^{O(d)} total memory.

Lemma 4.3.

There is an MPC algorithm that, given a threshold τ > 0, 0 < ε < 1, and a dataset P of n points in ℝ^d distributed across MPC machines with local memory s ≥ poly(d log n), with probability at least 1 − 1/n computes a (τ, O(ε^{-1} log n / log log n)·τ)-ruling set for P in O(log_s n) rounds, using O(n^{1+ε}·poly(d log n)) total memory.

5 Proof Outline of Lemma 4.3: RS in High Dimension

Our algorithm may be viewed as a Euclidean version of one-round Luby's, with two major differences. One is an additional preprocessing step (Algorithm 1), which is crucially needed to improve the ruling-set parameter from O(log n) (which is what the one-round Luby's algorithm achieves in general [24]) to O(log n / log log n). The other is that it is not immediate to exactly simulate one-round Luby's in high-dimensional Euclidean spaces, as the neighborhood around a data point can be huge and is not easy to handle in MPC. To this end, we use an approximate ball A_P^β(p, τ) that is "sandwiched" between B_P(p, τ) and B_P(p, βτ) for some β and every point p ∈ P; an MPC procedure for this has been suggested in [17] (restated in Lemma 5.7). We address the inaccuracy introduced by A_P^β(·,·) in the analysis of the offline algorithm. In the remainder of this section, we use the notation from Algorithms 1 and 2.

Algorithm 1 Preprocessing, with input P ⊆ ℝ^d of n points, β ≥ 1, τ > 0.
Algorithm 2 Local algorithm for RS on the set P′ resulting from Algorithm 1, with the same β ≥ 1, τ > 0.
Lemma 5.1.

R is a τ-independent set for P′ (with probability 1).

Lemma 5.2.

For any α ≥ 1, any (τ, ατ)-ruling set for P′ is a (τ, (α + βΓ)τ)-ruling set for P.

Fact 5.3.

For every p ∈ P′, |A_{P′}^β(p, τ)| ≤ Λ.

Lemma 5.4.

For every t ≥ Ω(log² Λ), the set R returned by Algorithm 2 satisfies

∀p ∈ P′, dist(p, R) ≤ O(βt)·τ

with probability at least 1 − n·exp(−Ω(t)·log(t / log Λ)).

Proof.

It suffices to show that for every p ∈ P′, with probability 1 − exp(−Ω(t)·log(t / log Λ)), dist(p, R) ≤ O(βt)·τ, since one can then conclude the proof using a union bound over all points p ∈ P′.

Now, fix a point p ∈ P′ and consider dist(p, R). To analyze dist(p, R), we construct a sequence S := (x_0, x_1, …, x_T) using an auxiliary algorithm, stated in Algorithm 3, where the sequence S starts at the point x_0 := p and the length T ≥ 0 of S is random. We emphasize that Algorithm 3 is defined with respect to the internal states of a specific (random) run of Algorithm 2, and it introduces no new randomness.

Algorithm 3 Finding an assignment sequence S = (x_0 = p, …, x_T), for a given p ∈ P′.

Observe that in Algorithm 3, for every i ≥ 0, x_{i+1} ∈ A_{P′}^β(x_i, τ). Since for every p ∈ P′ we have A_{P′}^β(p, τ) ⊆ B_{P′}(p, βτ), it follows that

dist(p, R) ≤ ∑_{i=1}^{T} dist(x_{i−1}, x_i) ≤ T·β·τ

by the triangle inequality. Hence, it remains to give an upper bound on T, which we do in the following Lemma 5.5, our main technical lemma. Its proof can be found in the full version due to the space limit.

Lemma 5.5.

For t ≥ Ω(log² Λ), we have Pr[T ≥ t] ≤ exp(−Ω(t)·log(t / log Λ)).

Finally, as mentioned, applying Lemma 5.5 with a union bound finishes the proof of Lemma 5.4. ◀

Proof of Lemma 4.3.

Our MPC algorithm for Lemma 4.3 is obtained via an MPC implementation of the offline Algorithms 1 and 2. Before we do so, however, our algorithm requires running the Johnson-Lindenstrauss (JL) transform [42] on the dataset as a preprocessing step to reduce the dimension to d = O(log n), using parameter ε = O(1). We restate it below in our notation.

Lemma 5.6 (JL transform [42]).

For 0 < ε < 1, let M ∈ ℝ^{d′×d} be a random matrix whose entries are independent standard Gaussian variables N(0,1), where d′ = O(ε^{-2} log n). Then the mapping g: x ↦ (1/√d′)·Mx satisfies, with probability 1 − 1/poly(n), that ∀x, y ∈ P, dist(g(x), g(y)) ∈ (1 ± ε)·dist(x, y).

This JL transform needs only O(1) rounds in MPC, since one can generate the matrix M on a lead machine, which uses space O(d log n), and then broadcast it to all other machines. Since the pairwise distances between data points are preserved, and our algorithms only use distances between data points, it is immediate that any (τ, ατ)-ruling set on g(P) is also a (Θ(τ), Θ(ατ))-ruling set on P. Hence, it suffices to work on g(P), which is of dimension d′ = O(log n). Without loss of generality, we therefore simply assume P is of dimension O(log n).
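A sketch of this preprocessing with numpy follows; the constant in d′ is illustrative (Lemma 5.6 only requires d′ = O(ε^{-2} log n)).

```python
import numpy as np

def jl_transform(P, eps, rng=np.random.default_rng()):
    # P: (n, d) array of points. Project to d' = O(eps^{-2} log n) dimensions
    # via g(x) = M x / sqrt(d') with i.i.d. N(0,1) entries (Lemma 5.6).
    n, d = P.shape
    d_prime = int(np.ceil(8 * np.log(n) / eps ** 2))  # illustrative constant 8
    M = rng.standard_normal((d_prime, d))
    return (P @ M.T) / np.sqrt(d_prime)

# In MPC: a lead machine samples M (O(d log n) numbers) and broadcasts it;
# every machine then projects its local points with no further communication.
```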

Now, we turn to implementing Algorithms 1 and 2 in MPC. For Algorithm 1, since the hash of Lemma 3.4 that we use is data-oblivious and only requires poly(d) = poly(log n) space, all machines can generate the same hash without communication. The other steps of Algorithm 1 may be implemented using standard MPC procedures, including sorting, broadcasting, and converge-casting. For Algorithm 2, the only nontrivial step is to implement A_{P′}^β(·,·). To this end, we make use of the following lemma from [17]. In particular, our MPC implementation of Algorithm 2 applies Lemma 5.7 with β = O(ε^{-1}). This finishes the description of the algorithm for Lemma 4.3.

Lemma 5.7 ([17, Theorem 3.1]).

There is a deterministic MPC algorithm that takes as input 0 < ε < 1, τ ≥ 0, a set P ⊂ ℝ^d of n points, and for each p ∈ P a value h(p) ∈ ℝ, distributed across machines with local memory s ≥ poly(d log n), and computes for every p ∈ P the value min(h(A_P(p, τ))), where A_P(p, τ) is an arbitrary set that satisfies

B_P(p, τ) ⊆ A_P(p, τ) ⊆ B_P(p, O(ε^{-1}τ)), (2)

in O(log_s n) rounds and O(n^{1+ε}·poly(d log n)) total memory.

Now we turn to the analysis. Observe that the assumptions regarding A_{P′}(·,·) underlying the analysis of Algorithms 1 and 2 remain valid. By Lemma 5.1, the found set R is also a τ-independent set for P, as R ⊆ P′ ⊆ P. Moreover, since we assume d = O(log n), we can take Γ = log n / log log n, which gives Λ = poly(log n). Therefore, applying Lemma 5.4 with t = O(log n / log log n), we conclude that ∀p ∈ P′, dist(p, R) ≤ O(ε^{-1} log n / log log n)·τ with probability 1 − 1/poly(n). Combining with Lemma 5.2, we conclude that R is a (τ, O(ε^{-1} log n / log log n)·τ)-ruling set for P with probability 1 − 1/poly(n).

Finally, the round complexity, local memory, and total memory are dominated by the parallel invocations of Lemma 5.7, and are O(log_s n), poly(d log n), and O(n^{1+ε}·poly(d log n)), respectively. This completes the proof of Lemma 4.3. ◀

References

  • [1] Sepideh Aghamolaei and Mohammad Ghodsi. A 2-approximation algorithm for data-distributed metric k-center. arXiv, 2309.04327, 2023. doi:10.48550/arXiv.2309.04327.
  • [2] AmirMohsen Ahanchi, Alexandr Andoni, MohammadTaghi Hajiaghayi, Marina Knittel, and Peilin Zhong. Massively parallel tree embeddings for high dimensional spaces. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 77–88. ACM, 2023. doi:10.1145/3558481.3591096.
  • [3] Baruch Awerbuch, Andrew V. Goldberg, Michael Luby, and Serge A. Plotkin. Network decomposition and locality in distributed computation. In 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 364–369. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63504.
  • [4] Amir Azarmehr, Soheil Behnezhad, Rajesh Jayaram, Jakub Lacki, Vahab Mirrokni, and Peilin Zhong. Massively parallel minimum spanning tree in general metric spaces. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 143–174. SIAM, 2025. doi:10.1137/1.9781611978322.5.
  • [5] MohammadHossein Bateni, Hossein Esfandiari, Manuela Fischer, and Vahab S. Mirrokni. Extreme k-center clustering. In Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), pages 3941–3949. AAAI Press, 2021. doi:10.1609/AAAI.V35I5.16513.
  • [6] Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. Journal of the ACM, 64(6):40:1–40:58, 2017. doi:10.1145/3125644.
  • [7] Mark de Berg, Leyla Biabani, and Morteza Monemizadeh. k-center clustering with outliers in the MPC and streaming model. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 853–863. IEEE, 2023. doi:10.1109/IPDPS54959.2023.00090.
  • [8] Aditya Bhaskara and Maheshakya Wijewardena. Distributed clustering via LSH based data partitioning. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 569–578. PMLR, 2018. URL: http://proceedings.mlr.press/v80/bhaskara18a.html.
  • [9] Leyla Biabani and Ami Paz. k-center clustering in distributed models. In Structural Information and Communication Complexity - 31st International Colloquium (SIROCCO), volume 14662 of Lecture Notes in Computer Science, pages 83–100. Springer, 2024. doi:10.1007/978-3-031-60603-8_5.
  • [10] Mélanie Cambus, Fabian Kuhn, Shreyas Pai, and Jara Uitto. Time and space optimal massively parallel algorithm for the 2-ruling set problem. In 37th International Symposium on Distributed Computing (DISC), volume 281 of LIPIcs, pages 11:1–11:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.DISC.2023.11.
  • [11] Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially. Proceedings of the VLDB Endowment, 12(7):766–778, 2019. doi:10.14778/3317315.3317319.
  • [12] Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, and Ola Svensson. Parallel and efficient hierarchical k-median clustering. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (NeurIPS), pages 20333–20345, 2021. URL: https://proceedings.neurips.cc/paper/2021/hash/aa495e18c7e3a21a4e48923b92048a61-Abstract.html.
  • [13] Vincent Cohen-Addad, Vahab S. Mirrokni, and Peilin Zhong. Massively parallel k-means clustering for perturbation resilient instances. In International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 4180–4201. PMLR, 2022. URL: https://proceedings.mlr.press/v162/cohen-addad22b.html.
  • [14] Sam Coy, Artur Czumaj, and Gopinath Mishra. On parallel k-center clustering. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 65–75. ACM, 2023. doi:10.1145/3558481.3591075.
  • [15] Artur Czumaj, Arnold Filtser, Shaofeng H.-C. Jiang, Robert Krauthgamer, Pavel Veselý, and Mingwei Yang. Streaming facility location in high dimension via geometric hashing. arXiv preprint arXiv:2204.02095, 2022. The latest version has additional results compared to the preliminary version in [18]. arXiv:2204.02095.
  • [16] Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang. Fully scalable MPC algorithms for Euclidean k-center. arXiv, 2504.16382, 2025. doi:10.48550/arXiv.2504.16382.
  • [17] Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Fully-scalable MPC algorithms for clustering in high dimension. In 51st International Colloquium on Automata, Languages, and Programming (ICALP), volume 297 of LIPIcs, pages 50:1–50:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.ICALP.2024.50.
  • [18] Artur Czumaj, Shaofeng H.-C. Jiang, Robert Krauthgamer, Pavel Veselý, and Mingwei Yang. Streaming facility location in high dimension via geometric hashing. In 63rd IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 450–461. IEEE, 2022. doi:10.1109/FOCS54457.2022.00050.
  • [19] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. doi:10.1145/1327452.1327492.
  • [20] Alina Ene, Sungjin Im, and Benjamin Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 681–689. ACM, 2011. doi:10.1145/2020408.2020515.
  • [21] Alessandro Epasto, Mohammad Mahdian, Vahab S. Mirrokni, and Peilin Zhong. Massively parallel and dynamic algorithms for minimum size clustering. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1613–1660. SIAM, 2022. doi:10.1137/1.9781611977073.66.
  • [22] Arnold Filtser. Scattering and sparse partitions, and their applications. ACM Transactions on Algorithms, 20(4):30:1–30:42, 2024. doi:10.1145/3672562.
  • [23] Mohsen Ghaffari. Massively parallel algorithms, 2019. Lecture Notes from ETH Zürich. URL: http://people.csail.mit.edu/ghaffari/MPA19/Notes/MPA.pdf.
  • [24] Mohsen Ghaffari. Distributed graph algorithms, 2022. Lecture Notes from MIT. URL: https://people.csail.mit.edu/ghaffari/DA22/Notes/DGA.pdf.
  • [25] Mohsen Ghaffari, Christoph Grunau, and Ce Jin. Improved MPC algorithms for MIS, matching, and coloring on trees and beyond. In 34th International Symposium on Distributed Computing (DISC), volume 179 of LIPIcs, pages 34:1–34:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPICS.DISC.2020.34.
  • [26] Mohsen Ghaffari and Jara Uitto. Sparsifying distributed algorithms with ramifications in massively parallel computation and centralized local computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1636–1653. SIAM, 2019. doi:10.1137/1.9781611975482.99.
  • [27] Jeff Giliberti and Zahra Parsaeian. Massively parallel ruling set made deterministic. In 38th International Symposium on Distributed Computing (DISC), volume 319 of LIPIcs, pages 29:1–29:21. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.DISC.2024.29.
  • [28] Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985. doi:10.1016/0304-3975(85)90224-5.
  • [29] Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the MapReduce framework. In Algorithms and Computation - 22nd International Symposium (ISAAC), volume 7074 of Lecture Notes in Computer Science, pages 374–383. Springer, 2011. doi:10.1007/978-3-642-25591-5_39.
  • [30] Chetan Gupta, Rustam Latypov, Yannic Maus, Shreyas Pai, Simo Särkkä, Jan Studený, Jukka Suomela, Jara Uitto, and Hossein Vahidi. Fast dynamic programming in trees in the MPC model. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 443–453. ACM, 2023. doi:10.1145/3558481.3591098.
  • [31] Alireza Haqi and Hamid Zarrabi-Zadeh. Almost optimal massively parallel algorithms for k-center clustering and diversity maximization. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 239–247. ACM, 2023. doi:10.1145/3558481.3591077.
  • [32] Sariel Har-Peled. No, coreset, no cry. In Foundations of Software Technology and Theoretical Computer Science (FSTTCS), volume 3328 of Lecture Notes in Computer Science, pages 324–335. Springer, 2004. doi:10.1007/978-3-540-30538-5_27.
  • [33] Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 291–300. ACM, 2004. doi:10.1145/1007352.1007400.
  • [34] Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180–184, 1985. doi:10.1287/MOOR.10.2.180.
  • [35] Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33(3):533–550, 1986. doi:10.1145/5925.5933.
  • [36] Sungjin Im, Ravi Kumar, Silvio Lattanzi, Benjamin Moseley, and Sergei Vassilvitskii. Massively parallel computation: Algorithms and applications. Foundations and Trends in Optimization, 5(4):340–417, 2023. doi:10.1561/2400000025.
  • [37] Sungjin Im and Benjamin Moseley. Brief announcement: Fast and better distributed MapReduce algorithms for k-center clustering. In Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 65–67. ACM, 2015. doi:10.1145/2755573.2755607.
  • [38] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 EuroSys Conference, pages 59–72. ACM, 2007. doi:10.1145/1272996.1273005.
  • [39] Rajesh Jayaram, Vahab Mirrokni, Shyam Narayanan, and Peilin Zhong. Massively parallel algorithms for high-dimensional Euclidean minimum spanning tree. In Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3960–3996. SIAM, 2024. doi:10.1137/1.9781611977912.139.
  • [40] Hongyan Ji, Kishore Kothapalli, Sriram V. Pemmaraju, and Ajitanshu Singh. Fast deterministic massively parallel ruling sets algorithms. In Proceedings of the 26th International Conference on Distributed Computing and Networking (ICDCN), pages 152–160. ACM, 2025. doi:10.1145/3700838.3700872.
  • [41] Lujun Jia, Guolong Lin, Guevara Noubir, Rajmohan Rajaraman, and Ravi Sundaram. Universal approximations for TSP, Steiner tree, and set cover. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pages 386–395. ACM, 2005. doi:10.1145/1060590.1060649.
  • [42] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984. doi:10.1090/conm/026/737400.
  • [43] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 938–948. SIAM, 2010. doi:10.1137/1.9781611973075.76.
  • [44] Ting Liang, Qilong Feng, Xiaoliang Wu, Jinhui Xu, and Jianxin Wang. Improved approximation algorithm for the distributed lower-bounded k-center problem. In Theory and Applications of Models of Computation - 18th Annual Conference (TAMC), volume 14637 of Lecture Notes in Computer Science, pages 309–319. Springer, 2024. doi:10.1007/978-981-97-2340-9_26.
  • [45] Michael Luby. A simple parallel algorithm for the maximal independent set problem. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing (STOC), pages 1–10. ACM, 1985. doi:10.1145/22145.22146.
  • [46] Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, and Benjamin Moseley. Fast distributed k-center clustering with outliers on massive data. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS), pages 1063–1071, 2015. URL: https://proceedings.neurips.cc/paper/2015/hash/8fecb20817b3847419bb3de39a609afe-Abstract.html.
  • [47] Krzysztof Onak. Round compression for parallel graph algorithms in strongly sublinear space. arXiv, 1807.08745, 2018. doi:10.48550/arXiv.1807.08745.
  • [48] David Pollard. Empirical Processes: Theory and Applications, chapter 4: Packing and Covering in Euclidean Spaces, pages 14–20. IMS, 1990. doi:10.1214/cbms/1462061091.
  • [49] Tim Roughgarden, Sergei Vassilvitskii, and Joshua R. Wang. Shuffles and circuits (on lower bounds for modern parallel computation). Journal of the ACM, 65(6):41:1–41:24, 2018. doi:10.1145/3232536.
  • [50] Václav Rozhoň and Mohsen Ghaffari. Polylogarithmic-time deterministic network decomposition and distributed derandomization. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 350–363. ACM, 2020. doi:10.1145/3357713.3384298.
  • [51] Johannes Schneider and Roger Wattenhofer. An optimal maximal independent set algorithm for bounded-independence graphs. Distributed Computing, 22(5-6):349–361, 2010. doi:10.1007/S00446-010-0097-1.
  • [52] Tom White. Hadoop: The Definitive Guide. O’Reilly, 4th edition, 2015. URL: http://www.oreilly.de/catalog/9781491901632/index.html.
  • [53] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud’10. USENIX Association, 2010. URL: https://www.usenix.org/conference/hotcloud-10/spark-cluster-computing-working-sets.

Appendix A Proof of Lemma 3.1: Geometric Hashing

See Lemma 3.1.

The construction of the hash $f:\mathbb{R}^d\to\mathbb{R}^d$ is the same as that in [15, Theorem 5.3]. Their proof already implies the space bound as well as the first two properties; in fact, our second property is stronger than what they state, but the stronger version was indeed proved in their paper, albeit implicitly. We restate their construction, briefly sketch the proof of the first two properties, and then focus on the third property.

Hash Construction.

We start by reviewing their construction of $f$ (a runnable sketch follows the two steps below). Partition $\mathbb{R}^d$ into hypercubes of side length $z$ of the form $\times_{i=1}^{d}[a_i z,(a_i+1)z)$, where $a_1,\dots,a_d\in\mathbb{Z}$; each such hypercube has diameter $\sqrt{d}\,z=\ell$. Let $\ell_i:=(d-i)\beta$ for $i\in\{0,\dots,d-1\}$. For every $i\in\{0,\dots,d\}$, let $\mathcal{F}_i$ be the set of $i$-dimensional faces of these hypercubes, and let $A_i$ be the set of points that belong to these faces, i.e., $A_i:=\bigcup_{Q\in\mathcal{F}_i}Q$. For each $i\in\{0,\dots,d-1\}$, let $B_i:=N^{\infty}_{\ell_i}(A_i)$ be the $\ell_i$-neighborhood of $A_i$ (with respect to the $\ell_\infty$ distance). Let $B_{-1}:=\emptyset$ and $B_{\leq i}:=\bigcup_{j\leq i}B_j$. The main procedure for the construction of $f$ is as follows.

  • For every $i\in\{0,\dots,d-1\}$ and every $Q\in\mathcal{F}_i$, let $\hat{Q}:=N^{\infty}_{\ell_i}(Q)\setminus B_{\leq i-1}$, and for every $x\in\hat{Q}$ assign $f(x)=q$, where $q\in\hat{Q}$ is an arbitrary but fixed point. Observe that $\hat{Q}\subseteq B_i$, so $x\in\hat{Q}$ will not be assigned again in later iterations.
  • For every $Q\in\mathcal{F}_d$, let $\hat{Q}:=Q\setminus B_{\leq d-1}$ be the remaining part of $Q$ on which $f$ has not yet been assigned, and assign $f(x)=q$ for every $x\in\hat{Q}$, where $q\in\hat{Q}$ is an arbitrary but fixed point.
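To make this staged assignment concrete, the following minimal Python sketch implements the procedure (our own illustration rather than code from [15]; the function name geometric_hash and the bucket-identifier encoding are hypothetical choices, and we assume $z$ is sufficiently large relative to $d\beta$, as guaranteed by the parameter choices in Lemma 3.1). It uses the fact that $\operatorname{dist}_\infty(x,A_i)$ equals the $(d-i)$-th smallest distance from a coordinate of $x$ to the grid $z\mathbb{Z}$, so $x$ is assigned at the first stage $i$ with $x\in B_i$, which is exactly membership in $N^{\infty}_{\ell_i}(Q)\setminus B_{\leq i-1}$ for the unique nearby $i$-face $Q$.

    import math

    def geometric_hash(x, z, beta):
        """Sketch of the face-based hash f (after [15, Theorem 5.3]).

        Returns a canonical bucket identifier instead of an arbitrary
        but fixed representative point q; the two conventions are
        equivalent. Assumes z is sufficiently large relative to d*beta.
        """
        d = len(x)
        # g[k] = l_inf distance from coordinate k to the grid z*Z; the
        # l_inf distance from x to A_i is the (d-i)-th smallest entry of g.
        g = [abs(xk - z * round(xk / z)) for xk in x]
        order = sorted(range(d), key=lambda k: (g[k], k))  # nearest-to-grid first
        for i in range(d):                   # stages i = 0, ..., d-1
            ell_i = (d - i) * beta           # neighborhood width at stage i
            snapped = set(order[: d - i])    # coordinates fixed by an i-face Q
            if all(g[k] <= ell_i for k in snapped):
                # x lies in N^inf_{ell_i}(Q) for the i-face Q identified
                # below, and x escaped all earlier stages, i.e. x is not
                # in B_{<= i-1}, so it joins the bucket of this face.
                face_id = tuple(
                    ("grid", round(x[k] / z)) if k in snapped
                    else ("cell", math.floor(x[k] / z))
                    for k in range(d)
                )
                return (i, face_id)
        # Stage d: the leftover interior of the hypercube containing x.
        return (d, tuple(("cell", math.floor(xk / z)) for xk in x))

Identifying $Q$ via floor and round is valid for any $x$ that survives to stage $i$: every unsnapped coordinate is then more than $\ell_{i-1}$ away from the grid (otherwise $x$ would already lie in $B_{i-1}$), so the face within $\ell_\infty$ distance $\ell_i$ of $x$ is unique.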

First Two Properties.

The first property is immediate from this construction, since every bucket is a subset of a hypercube of side length $z$, whose diameter is $\sqrt{d}\,z=\ell$. To establish the second property, we need to define the $d+1$ groups. For $0\leq i\leq d-1$, define $\mathcal{W}_i:=\{N^{\infty}_{\ell_i}(Q)\setminus B_{\leq i-1}:Q\in\mathcal{F}_i\}$, and let $\mathcal{W}_d:=\{Q\setminus B_{\leq d-1}:Q\in\mathcal{F}_d\}$. Then, by the construction of $f$, $\{f^{-1}(u):u\in f(\mathbb{R}^d)\}=\bigcup_{i=0}^{d}\mathcal{W}_i$, hence the $\mathcal{W}_i$'s form a partition of the buckets into $d+1$ groups. These $\mathcal{W}_i$'s are implicitly analyzed in [15, Lemma 5.5 and Lemma 5.8], restated and adapted to our notation as Lemma A.1 below; this lemma readily implies our second property.

Lemma A.1 ([15, Lemma 5.5 and Lemma 5.8]).

For every $0\leq i\leq d$ and every $S\neq S'\in\mathcal{W}_i$, $\operatorname{dist}(S,S')\geq\operatorname{dist}_\infty(S,S')>\beta$.
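As a sanity check of these two properties (a toy experiment, not part of the proof), one can hash random points with the geometric_hash sketch above and verify that every bucket has $\ell_2$ diameter at most $\sqrt{d}\,z=\ell$ and that buckets within the same group $\mathcal{W}_i$ are more than $\beta$ apart in $\ell_\infty$; all parameter values below are arbitrary illustrative choices.

    import itertools, random

    d, beta = 2, 0.05
    z = 10 * d * beta   # comfortably larger than the separation scale
    random.seed(0)

    buckets = {}
    for _ in range(2000):
        p = tuple(random.uniform(0, 3 * z) for _ in range(d))
        buckets.setdefault(geometric_hash(p, z, beta), []).append(p)

    def l2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Property 1: each bucket fits in a hypercube of side z, so its
    # l_2 diameter is at most sqrt(d) * z = ell.
    for pts in buckets.values():
        assert all(l2(p, q) <= d ** 0.5 * z + 1e-9
                   for p, q in itertools.combinations(pts, 2))

    # Property 2: buckets in the same group i are > beta apart in l_inf
    # (checked on the sampled points, up to floating-point slack).
    for u, v in itertools.combinations(buckets, 2):
        if u[0] == v[0]:
            sep = min(max(abs(a - b) for a, b in zip(p, q))
                      for p in buckets[u] for q in buckets[v])
            assert sep > beta - 1e-9
    print("diameter and separation checks passed")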

Third Property.

We proceed to prove the third property. The third property is well posed (in particular, $L(z,2b)$ is well defined), since by the choice of parameters one can verify that $z>4b$. The proof starts with the following claim, which relates the complicated set $\bigcup_{u}U^{\infty}_{\tau}(f^{-1}(u))$ to a much simpler object, $N^{\infty}_{b}(A_{d-1})$, which is merely the $b$-neighborhood of the $(d-1)$-dimensional faces.

Claim A.2.

$\bigcup_{u\in f(\mathbb{R}^d)}U^{\infty}_{\tau}(f^{-1}(u))\subseteq N^{\infty}_{b}(A_{d-1})$.

Proof.

Recall that $\{f^{-1}(u):u\in f(\mathbb{R}^d)\}=\bigcup_{i=0}^{d}\mathcal{W}_i$. Therefore, it suffices to prove that for every $i\in\{0,\dots,d\}$ and every $S\in\mathcal{W}_i$, $U^{\infty}_{\tau}(S)\subseteq N^{\infty}_{b}(A_{d-1})$. We prove this separately for the cases $i\leq d-1$ and $i=d$.

Case I: π’Šβ‰€π’…βˆ’πŸ.

We first analyze the case $0\leq i\leq d-1$. Fix some $0\leq i\leq d-1$ and some $S:=N^{\infty}_{\ell_i}(Q)\setminus B_{\leq i-1}\in\mathcal{W}_i$ (for some $Q\in\mathcal{F}_i$). Observe that $S=N^{\infty}_{\ell_i}(Q)\setminus B_{\leq i-1}\subseteq N^{\infty}_{\ell_i}(Q)\subseteq N^{\infty}_{\ell_i}(A_i)$. Then $N^{\infty}_{\tau}(S)\subseteq N^{\infty}_{\tau}(N^{\infty}_{\ell_i}(A_i))=N^{\infty}_{\ell_i+\tau}(A_i)$. Observe also that $\ell_i=(d-i)\beta\leq d\beta$ and that $A_i\subseteq A_{d-1}$. Hence, $U^{\infty}_{\tau}(S)=N^{\infty}_{\tau}(S)\setminus S\subseteq N^{\infty}_{\tau}(S)\subseteq N^{\infty}_{\ell_i+\tau}(A_i)\subseteq N^{\infty}_{d\beta+\tau}(A_{d-1})=N^{\infty}_{b}(A_{d-1})$. This finishes the case $i\leq d-1$.
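For readability, here is the same chain of inclusions displayed with a justification for each step (a restatement of the argument above, adding no new claims):

    \begin{align*}
    U^{\infty}_{\tau}(S) = N^{\infty}_{\tau}(S)\setminus S
      &\subseteq N^{\infty}_{\tau}(S)
        && \text{(dropping the subtracted set)}\\
      &\subseteq N^{\infty}_{\tau}\bigl(N^{\infty}_{\ell_i}(A_i)\bigr)
         = N^{\infty}_{\ell_i+\tau}(A_i)
        && \text{(since $S\subseteq N^{\infty}_{\ell_i}(A_i)$)}\\
      &\subseteq N^{\infty}_{d\beta+\tau}(A_{d-1})
        && \text{(since $\ell_i\leq d\beta$ and $A_i\subseteq A_{d-1}$)}\\
      &= N^{\infty}_{b}(A_{d-1}).
    \end{align*}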

Case II: π’Š=𝒅.

Next, we analyze the case $i=d$. By Lemma A.1, for every $S\neq S'\in\mathcal{W}_d$ we have $\operatorname{dist}_\infty(S,S')>\beta\geq\tau$, and this implies

$N^{\infty}_{\tau}(S)\cap S'=\emptyset$, (3)

since the $\ell_\infty$ distance between any two points is at most their $\ell_2$ distance. Now fix some $S\in\mathcal{W}_d$. By the construction, $A_d=\mathbb{R}^d$ and $B_{\leq d-1}\cap\bigcup_{S'\in\mathcal{W}_d}S'=\emptyset$, so $\mathbb{R}^d$ is the disjoint union of $B_{\leq d-1}$ and the sets in $\mathcal{W}_d$. By (3), $N^{\infty}_{\tau}(S)$ does not intersect any set of $\mathcal{W}_d$ other than $S$ itself. Hence,

$U^{\infty}_{\tau}(S)=N^{\infty}_{\tau}(S)\setminus S\subseteq B_{\leq d-1}$. (4)

To further relate $B_{\leq d-1}$ to $A_{d-1}$, we have

$B_{\leq d-1}=\bigcup_{j=0}^{d-1}B_j=\bigcup_{j=0}^{d-1}N^{\infty}_{\ell_j}(A_j)\subseteq\bigcup_{j=0}^{d-1}N^{\infty}_{d\beta}(A_j)\subseteq N^{\infty}_{d\beta}(A_{d-1})$,

where the last step follows from $A_j\subseteq A_{j+1}$ for every $j$. Combining this with (4), we conclude that $U^{\infty}_{\tau}(S)\subseteq N^{\infty}_{d\beta}(A_{d-1})\subseteq N^{\infty}_{b}(A_{d-1})$. This finishes the proof of Claim A.2. ⊲

By Claim A.2, it remains to prove that $N^{\infty}_{b}(A_{d-1})\cap L(z,2b)^d=\emptyset$. Since $N^{\infty}_{b}(A_{d-1})=N^{\infty}_{b}(\bigcup_{Q\in\mathcal{F}_{d-1}}Q)=\bigcup_{Q\in\mathcal{F}_{d-1}}N^{\infty}_{b}(Q)=\bigcup_{Q\in\mathcal{F}_{d-1},\,x\in Q}N^{\infty}_{b}(x)$, it suffices to show that for every $Q\in\mathcal{F}_{d-1}$ and every $x\in Q$, $N^{\infty}_{b}(x)\cap L(z,2b)^d=\emptyset$. Now fix some $Q\in\mathcal{F}_{d-1}$ and $x\in Q$. Since $Q$ is a $(d-1)$-dimensional face, it spans an entire interval of length $z$ in $d-1$ of the dimensions and takes a fixed value of the form $az$ ($a\in\mathbb{Z}$) in exactly one dimension. Formally, $x\in Q$ if and only if there exist $1\leq j\leq d$ and $\{a_i\in\mathbb{Z}\}_{i=1}^{d}$ such that for every $1\leq i\leq d$,

$x_i\in\begin{cases}[a_i z,(a_i+1)z)&\text{if } i\neq j,\\ \{a_i z\}&\text{if } i=j.\end{cases}$ (5)

Let $j$ be as in (5). Then for every $y\in L(z,2b)^d$,

$\|x-y\|_\infty\geq|x_j-y_j|\geq 2b$,

where the last inequality follows from the definition of $L(z,2b)$: recall that $L(z,2b)=\bigcup_{a\in\mathbb{Z}}[az+2b,(a+1)z-2b)$, while $x_j=a_j z$ is a multiple of $z$, so $|x_j-y_j|\geq 2b$ for every $y_j\in L(z,2b)$. By the triangle inequality for $\ell_\infty$, this further implies that for every $x'\in N^{\infty}_{b}(x)$ and every $y\in L(z,2b)^d$, $\|x'-y\|_\infty\geq b$. Therefore, $N^{\infty}_{b}(x)\cap L(z,2b)^d=\emptyset$, and this finishes the proof of Lemma 3.1. ∎
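The geometric core of this last step is easy to check numerically. The following toy experiment (an illustration only, with arbitrary parameter values satisfying $z>4b$) samples points whose coordinates all lie in $L(z,2b)$ and verifies that each of them is at $\ell_\infty$ distance at least $2b$ from $A_{d-1}$, hence outside $N^{\infty}_{b}(A_{d-1})$.

    import random

    z, b, d = 1.0, 0.1, 3   # arbitrary choices with z > 4b
    random.seed(1)

    def in_L(t):
        # L(z, 2b) = union over a in Z of [a*z + 2b, (a+1)*z - 2b)
        r = t % z
        return 2 * b <= r < z - 2 * b

    def dist_to_grid(t):
        # l_inf distance from t to the nearest multiple of z
        r = t % z
        return min(r, z - r)

    for _ in range(100000):
        y = [random.uniform(0, 5 * z) for _ in range(d)]
        if all(in_L(t) for t in y):
            # dist_inf(y, A_{d-1}) = min over coordinates of dist_to_grid,
            # since a (d-1)-face fixes exactly one coordinate on the grid.
            assert min(dist_to_grid(t) for t in y) >= 2 * b - 1e-12
    print("all sampled points of L(z,2b)^d are >= 2b away from A_{d-1}")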