DROPS

Document

DOI: 10.4230/LIPIcs.ICDT.2025.7

O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN

Authors: Junhao Gan, Anthony Wirth, and Zhuo Zhang

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)

Abstract

In this paper, we investigate three fundamental problems in the Massively Parallel Computation (MPC) model: (i) grid graph connectivity, (ii) approximate Euclidean Minimum Spanning Tree (EMST), and (iii) approximate DBSCAN. Our first result is a O(1)-round Las Vegas (i.e., succeeding with high probability) MPC algorithm for computing the connected components on a d-dimensional c-penetration grid graph ((d,c)-grid graph), where both d and c are positive integer constants. In such a grid graph, each vertex is a point with integer coordinates in ℕ^d, and an edge can only exist between two distinct vertices with 𝓁_∞-norm at most c. To our knowledge, the current best existing result for computing the connected components (CC’s) on (d,c)-grid graphs in the MPC model is to run the state-of-the-art MPC CC algorithms that are designed for general graphs: they achieve O(log log n + log D) [Behnezhad et al., 2019] and O(log log n + log 1/(λ)) [Sepehr Assadi et al., 2019] rounds, respectively, where D is the diameter and λ is the spectral gap of the graph. With our grid graph connectivity technique, our second main result is a O(1)-round Las Vegas MPC algorithm for computing approximate Euclidean MST. The existing state-of-the-art result on this problem is the O(1)-round MPC algorithm proposed by Andoni et al. [Alexandr Andoni et al., 2014], which only guarantees an approximation on the overall weight in expectation. In contrast, our algorithm not only guarantees a deterministic overall weight approximation, but also achieves a deterministic edge-wise weight approximation. The latter property is crucial to many applications, such as finding the Bichromatic Closest Pair and Single-Linkage Clustering. Last, but not least, our third main result is a O(1)-round Las Vegas MPC algorithm for computing an approximate DBSCAN clustering in O(1)-dimensional Euclidean space.

Cite as

Junhao Gan, Anthony Wirth, and Zhuo Zhang. O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gan_et_al:LIPIcs.ICDT.2025.7,
  author =	{Gan, Junhao and Wirth, Anthony and Zhang, Zhuo},
  title =	{{O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{7:1--7:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.7},
  URN =		{urn:nbn:de:0030-drops-229483},
  doi =		{10.4230/LIPIcs.ICDT.2025.7},
  annote =	{Keywords: Massively Parallel Computation, Graph Connectivity, Grid Graphs, Euclidean Minimum Spanning Tree, DBSCAN}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2024.25

Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound

Authors: Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

Abstract

We consider the Max Unique Coverage problem, including applications to the data stream model. The input is a universe of n elements, a collection of m subsets of this universe, and a cardinality constraint, k. The goal is to select a subcollection of at most k sets that maximizes unique coverage, i.e, the number of elements contained in exactly one of the selected sets. The Max Unique Coverage problem has applications in wireless networks, radio broadcast, and envy-free pricing. Our first main result is a fixed-parameter tractable approximation scheme (FPT-AS) for Max Unique Coverage, parameterized by k and the maximum element frequency, r, which can be implemented on a data stream. Our FPT-AS finds a (1-ε)-approximation while maintaining a kernel of size Õ(k r/ε), which can be combined with subsampling to use Õ(k² r / ε³) space overall. This significantly improves on the previous-best FPT-AS with the same approximation, but a kernel of size Õ(k² r / ε²). In order to achieve our first result, we show upper bounds on the ratio of a collection’s coverage to the unique coverage of a maximizing subcollection; this is by constructing explicit algorithms that find a subcollection with unique coverage at least a logarithmic ratio of the collection’s coverage. We complement our algorithms with our second main result, showing that Ω(m / k²) space is necessary to achieve a (1.5 + o(1))/(ln k - 1)-approximation in the data stream. This dramatically improves the previous-best lower bound showing that Ω(m / k²) is necessary to achieve better than a e^{-1+1/k}-approximation.

Cite as

Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth. Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 25:1-25:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cervenjak_et_al:LIPIcs.APPROX/RANDOM.2024.25,
  author =	{Cervenjak, Philip and Gan, Junhao and Umboh, Seeun William and Wirth, Anthony},
  title =	{{Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{25:1--25:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  URN =		{urn:nbn:de:0030-drops-210183},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  annote =	{Keywords: Maximum unique coverage, maximum coverage, approximate kernel, data streams}
}

@InProceedings{cervenjak_et_al:LIPIcs.APPROX/RANDOM.2024.25,
  author =	{Cervenjak, Philip and Gan, Junhao and Umboh, Seeun William and Wirth, Anthony},
  title =	{{Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{25:1--25:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  URN =		{urn:nbn:de:0030-drops-210183},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.25},
  annote =	{Keywords: Maximum unique coverage, maximum coverage, approximate kernel, data streams}
}

Document

DOI: 10.4230/LIPIcs.SWAT.2022.25

An Almost Optimal Algorithm for Unbounded Search with Noisy Information

Authors: Junhao Gan, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)

Abstract

Given a sequence of integers, 𝒮 = s₁, s₂,… in ascending order, called the search domain, and an integer t, called the target, the predecessor problem asks for the target index N such that s_N is the largest integer in 𝒮 satisfying s_N ≤ t. We consider solving the predecessor problem with the least number of queries to a binary comparison oracle. For each query index i, the oracle returns whether s_i ≤ t or s_i > t. In particular, we study the predecessor problem under the UnboundedNoisy setting, where (i) the search domain 𝒮 is unbounded, i.e., n = |𝒮| is unknown or infinite, and (ii) the binary comparison oracle is noisy. We denote the former setting by Unbounded and the latter by Noisy. In Noisy, the oracle, for each query, independently returns a wrong answer with a fixed constant probability 0 < p < 1/2. In particular, even for two queries on the same index i, the answers from the oracle may be different. Furthermore, with a noisy oracle, the goal is to correctly return the target index with probability at least 1- Q, where 0 < Q < 1/2 is the failure probability. Our first result is an algorithm, called NoS, for Noisy that improves the previous result by Ben-Or and Hassidim [FOCS 2008] from an expected query complexity bound to a worst-case bound. We also achieve an expected query complexity bound, whose leading term has an optimal constant factor, matching the lower bound of Ben-Or and Hassidim. Building on NoS, we propose our NoSU algorithm, which correctly solves the predecessor problem in the UnboundedNoisy setting. We prove that the query complexity of NoSU is ∑_{i = 1}^k (log^{(i)} N) /(1-H(p))+ o(log N) when log Q^{-1} ∈ o(log N), where N is the target index, k = log^* N, the iterated logarithm, and H(p) is the entropy function. This improves the previous bound of O(log (N/Q) / (1-H(p))) by reducing the coefficient of the leading term from a large constant to 1. Moreover, we show that this upper bound can be further improved to (1 - Q) ∑_{i = 1}^k (log^{(i)} N) /(1-H(p))+ o(log N) in expectation, with the constant in the leading term reduced to 1 - Q. Finally, we show that an information-theoretic lower bound on the expected query cost of the predecessor problem in UnboundedNoisy is at least (1 - Q)(∑_{i = 1}^k log^{(i)} N - 2k)/(1-H(p)) - 10. This implies the constant factor in the leading term of our expected upper bound is indeed optimal.

Cite as

Junhao Gan, Anthony Wirth, and Xin Zhang. An Almost Optimal Algorithm for Unbounded Search with Noisy Information. In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 25:1-25:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gan_et_al:LIPIcs.SWAT.2022.25,
  author =	{Gan, Junhao and Wirth, Anthony and Zhang, Xin},
  title =	{{An Almost Optimal Algorithm for Unbounded Search with Noisy Information}},
  booktitle =	{18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)},
  pages =	{25:1--25:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-236-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{227},
  editor =	{Czumaj, Artur and Xin, Qin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.25},
  URN =		{urn:nbn:de:0030-drops-161854},
  doi =		{10.4230/LIPIcs.SWAT.2022.25},
  annote =	{Keywords: Fault-tolerant search, noisy binary search, query complexity}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2020.39

Graph Clustering in All Parameter Regimes

Authors: Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang

Published in: LIPIcs, Volume 170, 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)

Abstract

Resolution parameters in graph clustering control the size and structure of clusters formed by solving a parametric objective function. Typically there is more than one meaningful way to cluster a graph, and solving the same objective function for different resolution parameters produces clusterings at different levels of granularity, each of which can be meaningful depending on the application. In this paper, we address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider a new analysis-friendly objective we call LambdaPrime, involving a parameter λ ∈ (0,1). LambdaPrime is an adaptation of LambdaCC, a significant family of instances of the Correlation Clustering (minimization) problem. Indeed, LambdaPrime and LambdaCC are closely related to other parameterized clustering problems, such as parametric generalizations of modularity. They capture a number of specific clustering problems as special cases, including sparsest cut and cluster deletion. While previous work provides approximation results for a single value of the resolution parameter, we seek a set of approximately optimal clusterings for all values of λ in polynomial time. More specifically, we show that when a graph has m edges and n nodes, there exists a set of at most m clusterings such that, for every λ ∈ (0,1), the family contains an optimal solution to the LambdaPrime objective. This bound is tight on star graphs. We obtain a family of O(log n) clusterings by solving the parametric linear programming (LP) relaxation of LambdaPrime at O(log n) λ values, and rounding each LP solution using existing approximation algorithms. We prove that this is asymptotically tight: for a certain class of ring graphs, for all values of λ, Ω(log n) feasible solutions are required to provide a constant-factor approximation for the LambdaPrime LP relaxation. To minimize the size of the clustering family, we further propose an algorithm that yields a family of solutions of a size no more than twice of the minimum LP-approximating family.

Cite as

Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang. Graph Clustering in All Parameter Regimes. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 170, pp. 39:1-39:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{gan_et_al:LIPIcs.MFCS.2020.39,
  author =	{Gan, Junhao and Gleich, David F. and Veldt, Nate and Wirth, Anthony and Zhang, Xin},
  title =	{{Graph Clustering in All Parameter Regimes}},
  booktitle =	{45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)},
  pages =	{39:1--39:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-159-7},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{170},
  editor =	{Esparza, Javier and Kr\'{a}l', Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2020.39},
  URN =		{urn:nbn:de:0030-drops-127065},
  doi =		{10.4230/LIPIcs.MFCS.2020.39},
  annote =	{Keywords: Graph Clustering, Algorithms, Parametric Linear Programming}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2019.60

Result-Sensitive Binary Search with Noisy Information

Authors: Narthana S. Epa, Junhao Gan, and Anthony Wirth

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)

Abstract

We describe new algorithms for the predecessor problem in the Noisy Comparison Model. In this problem, given a sorted list L of n (distinct) elements and a query q, we seek the predecessor of q in L: denoted by u, the largest element less than or equal to q. In the Noisy Comparison Model, the result of a comparison between two elements is non-deterministic. Moreover, multiple comparisons of the same pair of elements might have different results: each is generated independently, and is correct with probability p > 1/2. Given an overall error tolerance Q, the cost of an algorithm is measured by the total number of noisy comparisons; these must guarantee the predecessor is returned with probability at least 1 - Q. Feige et al. showed that predecessor queries can be answered by a modified binary search with Theta(log (n/Q)) noisy comparisons. We design result-sensitive algorithms for answering predecessor queries. The query cost is related to the index, k, of the predecessor u in L. Our first algorithm answers predecessor queries with O(log ((log^{*(c)} n)/Q) + log (k/Q)) noisy comparisons, for an arbitrarily large constant c. The function log^{*(c)} n iterates c times the iterated-logarithm function, log^* n. Our second algorithm is a genuinely result-sensitive algorithm whose expected query cost is bounded by O(log (k/Q)), and is guaranteed to terminate after at most O(log((log n)/Q)) noisy comparisons. Our results strictly improve the state-of-the-art bounds when k is in omega(1) intersected with o(n^epsilon), where epsilon > 0 is some constant. Moreover, we show that our result-sensitive algorithms immediately improve not only predecessor-query algorithms, but also binary-search-like algorithms for solving key applications.

Cite as

Narthana S. Epa, Junhao Gan, and Anthony Wirth. Result-Sensitive Binary Search with Noisy Information. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 60:1-60:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{epa_et_al:LIPIcs.ISAAC.2019.60,
  author =	{Epa, Narthana S. and Gan, Junhao and Wirth, Anthony},
  title =	{{Result-Sensitive Binary Search with Noisy Information}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{60:1--60:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.60},
  URN =		{urn:nbn:de:0030-drops-115568},
  doi =		{10.4230/LIPIcs.ISAAC.2019.60},
  annote =	{Keywords: Fault-tolerant search, random walks, noisy comparisons, predecessor queries}
}

Search Results

Documents authored by Gan, Junhao

O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN

Abstract

Cite as

Maximum Unique Coverage on Streams: Improved FPT Approximation Scheme and Tighter Space Lower Bound

Abstract

Cite as

An Almost Optimal Algorithm for Unbounded Search with Noisy Information

Abstract

Cite as

Graph Clustering in All Parameter Regimes

Abstract

Cite as

Result-Sensitive Binary Search with Noisy Information

Abstract

Cite as

Thanks for your feedback!

Could not send message