DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.50

Fully-Scalable MPC Algorithms for Clustering in High Dimension

Authors: Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be n^σ for arbitrarily small fixed σ > 0. Importantly, the local memory may be substantially smaller than the number of clusters k, yet all our algorithms are fast, i.e., run in O(1) rounds. We first devise a fast MPC algorithm for O(1)-approximation of uniform Facility Location. This is the first fully-scalable MPC algorithm that achieves O(1)-approximation for any clustering problem in general geometric setting; previous algorithms only provide poly(log n)-approximation or apply to restricted inputs, like low dimension or small number of clusters k; e.g. [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad et al., ICML'22]. We then build on this Facility Location result and devise a fast MPC algorithm that achieves O(1)-bicriteria approximation for k-Median and for k-Means, namely, it computes (1+ε)k clusters of cost within O(1/ε²)-factor of the optimum for k clusters. A primary technical tool that we introduce, and may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing for every data point a statistic of its approximate neighborhood, for statistics like range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].

Cite as

Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Fully-Scalable MPC Algorithms for Clustering in High Dimension. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 50:1-50:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2024.50,
  author =	{Czumaj, Artur and Gao, Guichen and Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Vesel\'{y}, Pavel},
  title =	{{Fully-Scalable MPC Algorithms for Clustering in High Dimension}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{50:1--50:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.50},
  URN =		{urn:nbn:de:0030-drops-201938},
  doi =		{10.4230/LIPIcs.ICALP.2024.50},
  annote =	{Keywords: Massively parallel computing, high dimension, facility location, k-median, k-means}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.92

Algorithms for the Generalized Poset Sorting Problem

Authors: Shaofeng H.-C. Jiang, Wenqian Wang, Yubo Zhang, and Yuhao Zhang

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider a generalized poset sorting problem (GPS), in which we are given a query graph G = (V, E) and an unknown poset 𝒫(V, ≺) that is defined on the same vertex set V, and the goal is to make as few queries as possible to edges in G in order to fully recover 𝒫, where each query (u, v) returns the relation between u, v, i.e., u ≺ v, v ≺ u or u ̸ ∼ v. This generalizes both the poset sorting problem [Faigle et al., SICOMP 88] and the generalized sorting problem [Huang et al., FOCS 11]. We give algorithms with Õ(n poly(k)) query complexity when G is a complete bipartite graph or G is stochastic under the Erdős-Rényi model, where k is the width of the poset, and these generalize [Daskalakis et al., SICOMP 11] which only studies complete graph G. Both results are based on a unified framework that reduces the poset sorting to partitioning the vertices with respect to a given pivot element, which may be of independent interest. Moreover, we also propose novel algorithms to implement this partition oracle. Notably, we suggest a randomized BFS with vertex skipping for the stochastic G, and it yields a nearly-tight bound even for the special case of generalized sorting (for stochastic G) which is comparable to the main result of a recent work [Kuszmaul et al., FOCS 21] but is conceptually different and simplified. Our study of GPS also leads to a new Õ(n^{1 - 1 / (2W)}) competitive ratio for the so-called weighted generalized sorting problem where W is the number of distinct weights in the query graph. This problem was considered as an open question in [Charikar et al., JCSS 02], and our result makes important progress as it yields the first nontrivial sublinear ratio for general weighted query graphs (for any bounded W). We obtain this via an Õ(nk + n^{1.5}) query complexity algorithm for the case where every edge in G is guaranteed to be comparable in the poset, which generalizes a Õ(n^{1.5}) bound for generalized sorting [Huang et al., FOCS 11].

Cite as

Shaofeng H.-C. Jiang, Wenqian Wang, Yubo Zhang, and Yuhao Zhang. Algorithms for the Generalized Poset Sorting Problem. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 92:1-92:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.ICALP.2024.92,
  author =	{Jiang, Shaofeng H.-C. and Wang, Wenqian and Zhang, Yubo and Zhang, Yuhao},
  title =	{{Algorithms for the Generalized Poset Sorting Problem}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{92:1--92:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.92},
  URN =		{urn:nbn:de:0030-drops-202359},
  doi =		{10.4230/LIPIcs.ICALP.2024.92},
  annote =	{Keywords: sorting, poset sorting, generalized sorting}
}

Document

DOI: 10.4230/LIPIcs.SoCG.2024.64

Moderate Dimension Reduction for k-Center Clustering

Authors: Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir

Published in: LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

Abstract

The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of n points in ℝ^d and any fixed ε > 0, it reduces the dimension d to O(log n) while preserving, with high probability, all the pairwise Euclidean distances within factor 1+ε. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or k-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below O(log n) does not preserve the optimal value within factor 1+ε. We propose to focus on another regime, of moderate dimension reduction, where a problem’s value is preserved within factor α > 1 using target dimension (log n)/poly(α). We establish the viability of this approach and show that the famous k-center problem is α-approximated when reducing to dimension O({log n}/α² + log k). Along the way, we address the diameter problem via the special case k = 1. Our result extends to several important variants of k-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input’s doubling dimension. While our poly(α)-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for k-center in dynamic geometric streams, that achieves O(α)-approximation using space poly(kdn^{1/α²}). This is the first algorithm to beat O(n) space in high dimension d, as all previous algorithms require space at least exp(d). Furthermore, it extends to the k-center variants mentioned above.

Cite as

Shaofeng H.-C. Jiang, Robert Krauthgamer, and Shay Sapir. Moderate Dimension Reduction for k-Center Clustering. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 64:1-64:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{jiang_et_al:LIPIcs.SoCG.2024.64,
  author =	{Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Sapir, Shay},
  title =	{{Moderate Dimension Reduction for k-Center Clustering}},
  booktitle =	{40th International Symposium on Computational Geometry (SoCG 2024)},
  pages =	{64:1--64:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-316-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{293},
  editor =	{Mulzer, Wolfgang and Phillips, Jeff M.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.64},
  URN =		{urn:nbn:de:0030-drops-200095},
  doi =		{10.4230/LIPIcs.SoCG.2024.64},
  annote =	{Keywords: Johnson-Lindenstrauss transform, dimension reduction, clustering, streaming algorithms}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.47

Streaming Algorithms for Geometric Steiner Forest

Authors: Artur Czumaj, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

We consider an important generalization of the Steiner tree problem, the Steiner forest problem, in the Euclidean plane: the input is a multiset X ⊆ ℝ², partitioned into k color classes C₁, C₂, …, Cₖ ⊆ X. The goal is to find a minimum-cost Euclidean graph G such that every color class Cᵢ is connected in G. We study this Steiner forest problem in the streaming setting, where the stream consists of insertions and deletions of points to X. Each input point x ∈ X arrives with its color color(x) ∈ [k], and as usual for dynamic geometric streams, the input is restricted to the discrete grid {0, …, Δ}². We design a single-pass streaming algorithm that uses poly(k ⋅ log Δ) space and time, and estimates the cost of an optimal Steiner forest solution within ratio arbitrarily close to the famous Euclidean Steiner ratio α₂ (currently 1.1547 ≤ α₂ ≤ 1.214). This approximation guarantee matches the state of the art bound for streaming Steiner tree, i.e., when k = 1. Our approach relies on a novel combination of streaming techniques, like sampling and linear sketching, with the classical Arora-style dynamic-programming framework for geometric optimization problems, which usually requires large memory and has so far not been applied in the streaming setting. We complement our streaming algorithm for the Steiner forest problem with simple arguments showing that any finite approximation requires Ω(k) bits of space.

Cite as

Artur Czumaj, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Streaming Algorithms for Geometric Steiner Forest. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 47:1-47:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2022.47,
  author =	{Czumaj, Artur and Jiang, Shaofeng H.-C. and Krauthgamer, Robert and Vesel\'{y}, Pavel},
  title =	{{Streaming Algorithms for Geometric Steiner Forest}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{47:1--47:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.47},
  URN =		{urn:nbn:de:0030-drops-163880},
  doi =		{10.4230/LIPIcs.ICALP.2022.47},
  annote =	{Keywords: Steiner forest, streaming, sublinear algorithms, dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.ESA.2018.15

A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics

Authors: T-H. Hubert Chan, Haotian Jiang, and Shaofeng H.-C. Jiang

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

Abstract

We present a unified (randomized) polynomial-time approximation scheme (PTAS) for the prize collecting traveling salesman problem (PCTSP) and the prize collecting Steiner tree problem (PCSTP) in doubling metrics. Given a metric space and a penalty function on a subset of points known as terminals, a solution is a subgraph on points in the metric space, whose cost is the weight of its edges plus the penalty due to terminals not covered by the subgraph. Under our unified framework, the solution subgraph needs to be Eulerian for PCTSP, while it needs to be a tree for PCSTP. Before our work, even a QPTAS for the problems in doubling metrics is not known. Our unified PTAS is based on the previous dynamic programming frameworks proposed in [Talwar STOC 2004] and [Bartal, Gottlieb, Krauthgamer STOC 2012]. However, since it is unknown which part of the optimal cost is due to edge lengths and which part is due to penalties of uncovered terminals, we need to develop new techniques to apply previous divide-and-conquer strategies and sparse instance decompositions.

Cite as

T-H. Hubert Chan, Haotian Jiang, and Shaofeng H.-C. Jiang. A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{chan_et_al:LIPIcs.ESA.2018.15,
  author =	{Chan, T-H. Hubert and Jiang, Haotian and Jiang, Shaofeng H.-C.},
  title =	{{A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{15:1--15:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.15},
  URN =		{urn:nbn:de:0030-drops-94781},
  doi =		{10.4230/LIPIcs.ESA.2018.15},
  annote =	{Keywords: Doubling Dimension, Traveling Salesman Problem, Polynomial Time Approximation Scheme, Steiner Tree Problem, Prize Collecting}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.24

Online Submodular Maximization Problem with Vector Packing Constraint

Authors: T.-H. Hubert Chan, Shaofeng H.-C. Jiang, Zhihao Gavin Tang, and Xiaowei Wu

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

We consider the online vector packing problem in which we have a d dimensional knapsack and items u with weight vectors w_u in R_+^d arrive online in an arbitrary order. Upon the arrival of an item, the algorithm must decide immediately whether to discard or accept the item into the knapsack. When item u is accepted, w_u(i) units of capacity on dimension i will be taken up, for each i in [d]. To satisfy the knapsack constraint, an accepted item can be later disposed of with no cost, but discarded or disposed of items cannot be recovered. The objective is to maximize the utility of the accepted items S at the end of the algorithm, which is given by f(S) for some non-negative monotone submodular function f. For any small constant epsilon > 0, we consider the special case that the weight of an item on every dimension is at most a (1- epsilon) fraction of the total capacity, and give a polynomial-time deterministic O(k / epsilon^2)-competitive algorithm for the problem, where k is the (column) sparsity of the weight vectors. We also show several (almost) tight hardness results even when the algorithm is computationally unbounded. We first show that under the epsilon-slack assumption, no deterministic algorithm can obtain any o(k) competitive ratio, and no randomized algorithm can obtain any o(k / log k) competitive ratio. We then show that for the general case (when epsilon = 0), no randomized algorithm can obtain any o(k) competitive ratio. In contrast to the (1+delta) competitive ratio achieved in Kesselheim et al. [STOC 2014] for the problem with random arrival order of items and under large capacity assumption, we show that in the arbitrary arrival order case, even when |w_u|_infinity is arbitrarily small for all items u, it is impossible to achieve any o(log k / log log k) competitive ratio.

Cite as

T.-H. Hubert Chan, Shaofeng H.-C. Jiang, Zhihao Gavin Tang, and Xiaowei Wu. Online Submodular Maximization Problem with Vector Packing Constraint. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 24:1-24:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{chan_et_al:LIPIcs.ESA.2017.24,
  author =	{Chan, T.-H. Hubert and Jiang, Shaofeng H.-C. and Tang, Zhihao Gavin and Wu, Xiaowei},
  title =	{{Online Submodular Maximization Problem with Vector Packing Constraint}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{24:1--24:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.24},
  URN =		{urn:nbn:de:0030-drops-78190},
  doi =		{10.4230/LIPIcs.ESA.2017.24},
  annote =	{Keywords: Submodular Maximization, Free-disposal, Vector Packing}
}

Search Results

Documents authored by Jiang, Shaofeng H.-C.

Fully-Scalable MPC Algorithms for Clustering in High Dimension

Abstract

Cite as

Algorithms for the Generalized Poset Sorting Problem

Abstract

Cite as

Moderate Dimension Reduction for k-Center Clustering

Abstract

Cite as

Streaming Algorithms for Geometric Steiner Forest

Abstract

Cite as

A Unified PTAS for Prize Collecting TSP and Steiner Tree Problem in Doubling Metrics

Abstract

Cite as

Online Submodular Maximization Problem with Vector Packing Constraint

Abstract

Cite as