DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.64

Fully Scalable MPC Algorithms for Euclidean k-Center

Authors: Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

The k-center problem is a fundamental optimization problem with numerous applications in machine learning, data analysis, data mining, and communication networks. The k-center problem has been extensively studied in the classical sequential setting for several decades, and more recently there have been some efforts in understanding the problem in parallel computing, on the Massively Parallel Computation (MPC) model. For now, we have a good understanding of k-center in the case where each local MPC machine has sufficient local memory to store some representatives from each cluster, that is, when one has Ω(k) local memory per machine. While this setting covers the case of small values of k, for a large number of clusters these algorithms require undesirably large local memory, making them poorly scalable. The case of large k has been considered only recently for the fully scalable low-local-memory MPC model for the Euclidean instances of the k-center problem. However, the earlier works have been considering only the constant dimensional Euclidean space, required a super-constant number of rounds, and produced only k(1+o(1)) centers whose cost is a super-constant approximation of k-center. In this work, we significantly improve upon the earlier results for the k-center problem for the fully scalable low-local-memory MPC model. In the low dimensional Euclidean case in ℝ^d, we present the first constant-round fully scalable MPC algorithm for (2+ε)-approximation. We push the ratio further to (1 + ε)-approximation albeit using slightly more (1 + ε)k centers. All these results naturally extends to slightly super-constant values of d. In the high-dimensional regime, we provide the first fully scalable MPC algorithm that in a constant number of rounds achieves an O(log n/ log log n)-approximation for k-center.

Cite as

Artur Czumaj, Guichen Gao, Mohsen Ghaffari, and Shaofeng H.-C. Jiang. Fully Scalable MPC Algorithms for Euclidean k-Center. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 64:1-64:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{czumaj_et_al:LIPIcs.ICALP.2025.64,
  author =	{Czumaj, Artur and Gao, Guichen and Ghaffari, Mohsen and Jiang, Shaofeng H.-C.},
  title =	{{Fully Scalable MPC Algorithms for Euclidean k-Center}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{64:1--64:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.64},
  URN =		{urn:nbn:de:0030-drops-234416},
  doi =		{10.4230/LIPIcs.ICALP.2025.64},
  annote =	{Keywords: Massively Parallel Computing, Euclidean Spaces, k-Center Clustering}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.21

GSOHC: Global Synchronization Optimization in Heterogeneous Computing

Authors: Soumik Kumar Basu and Jyothi Vedurada

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

The use of heterogeneous systems has become widespread and popular in the past decade with more than one type of processor, such as CPUs, GPUs (Graphics Processing Units), and FPGAs (Field Programmable Gate Arrays) etc. A wide range of applications use both CPU and GPU to leverage the benefits of their unique features and strengths. Therefore, collaborative computation between CPU and GPU is essential to achieve high program performance. However, poorly placed global synchronization barriers and synchronous memory transfers are the main bottlenecks to enhanced program performance, preventing CPU and GPU computations from overlapping. Based on this observation, we propose a new optimization technique called hetero-sync motion that can relocate such barrier instructions to new locations, resulting in improved performance in CPU-GPU heterogeneous programs. Further, we propose GSOHC, a compiler analysis and optimization framework that automatically finds opportunities for hetero-sync motion in the input program and then performs code transformation to apply the optimization. Our static analysis is a context-sensitive, flow-sensitive inter-procedural data-flow analysis with three phases to identify the optimization opportunities precisely. We have implemented GSOHC using LLVM/Clang infrastructure. On A4000, P100 and A100 GPUs, our optimization achieves speedups of up to 1.8x, 1.9x and 1.9x over the baseline, respectively.

Cite as

Soumik Kumar Basu and Jyothi Vedurada. GSOHC: Global Synchronization Optimization in Heterogeneous Computing. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 21:1-21:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kumarbasu_et_al:LIPIcs.ECOOP.2025.21,
  author =	{Kumar Basu, Soumik and Vedurada, Jyothi},
  title =	{{GSOHC: Global Synchronization Optimization in Heterogeneous Computing}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{21:1--21:30},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.21},
  URN =		{urn:nbn:de:0030-drops-232949},
  doi =		{10.4230/LIPIcs.ECOOP.2025.21},
  annote =	{Keywords: Static Analysis, Synchronization, CPU-GPU, Heterogeneous Computing, Parallelization}
}

Document

DOI: 10.4230/LIPIcs.OPODIS.2024.33

Self-Stabilizing Fully Adaptive Maximal Matching

Authors: Shimon Bitton, Yuval Emek, Taisuke Izumi, and Shay Kutten

Published in: LIPIcs, Volume 324, 28th International Conference on Principles of Distributed Systems (OPODIS 2024)

Abstract

A self-stabilizing randomized algorithm for mending maximal matching (MM) in synchronous networks is presented. Starting from a legal MM configuration and assuming that the network undergoes k faults or topology changes (that may occur in multiple batches), the algorithm is guaranteed to stabilize back to a legal MM configuration in time O(log k) in expectation and with high probability (in k), using constant size messages. The algorithm is simple to implement and is uniform in the sense that it does not assume unique identifiers, nor does it assume any global knowledge of the communication graph including its size. It relies on a generic probabilistic phase synchronization technique that may be useful for other self-stabilizing problems. The algorithm compares favorably with the existing self-stabilizing MM algorithms in terms of the dependence of its run-time on k, a.k.a. fully adaptive run-time. In fact, this dependence is asymptotically optimal for uniform algorithms that use constant size messages.

Cite as

Shimon Bitton, Yuval Emek, Taisuke Izumi, and Shay Kutten. Self-Stabilizing Fully Adaptive Maximal Matching. In 28th International Conference on Principles of Distributed Systems (OPODIS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 324, pp. 33:1-33:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bitton_et_al:LIPIcs.OPODIS.2024.33,
  author =	{Bitton, Shimon and Emek, Yuval and Izumi, Taisuke and Kutten, Shay},
  title =	{{Self-Stabilizing Fully Adaptive Maximal Matching}},
  booktitle =	{28th International Conference on Principles of Distributed Systems (OPODIS 2024)},
  pages =	{33:1--33:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-360-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{324},
  editor =	{Bonomi, Silvia and Galletta, Letterio and Rivi\`{e}re, Etienne and Schiavoni, Valerio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2024.33},
  URN =		{urn:nbn:de:0030-drops-225698},
  doi =		{10.4230/LIPIcs.OPODIS.2024.33},
  annote =	{Keywords: self-stabilization, maximal matching, fully adaptive run-time, dynamic graphs}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2020.28

Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model

Authors: Kishore Kothapalli, Shreyas Pai, and Sriram V. Pemmaraju

Published in: LIPIcs, Volume 182, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)

Abstract

Motivated by recent progress on symmetry breaking problems such as maximal independent set (MIS) and maximal matching in the low-memory Massively Parallel Computation (MPC) model (e.g., Behnezhad et al. PODC 2019; Ghaffari-Uitto SODA 2019), we investigate the complexity of ruling set problems in this model. The MPC model has become very popular as a model for large-scale distributed computing and it comes with the constraint that the memory-per-machine is strongly sublinear in the input size. For graph problems, extremely fast MPC algorithms have been designed assuming Ω̃(n) memory-per-machine, where n is the number of nodes in the graph (e.g., the O(log log n) MIS algorithm of Ghaffari et al., PODC 2018). However, it has proven much more difficult to design fast MPC algorithms for graph problems in the low-memory MPC model, where the memory-per-machine is restricted to being strongly sublinear in the number of nodes, i.e., O(n^ε) for constant 0 < ε < 1. In this paper, we present an algorithm for the 2-ruling set problem, running in Õ(log^{1/6} Δ) rounds whp, in the low-memory MPC model. Here Δ is the maximum degree of the graph. We then extend this result to β-ruling sets for any integer β > 1. Specifically, we show that a β-ruling set can be computed in the low-memory MPC model with O(n^ε) memory-per-machine in Õ(β ⋅ log^{1/(2^{β+1}-2)} Δ) rounds, whp. From this it immediately follows that a β-ruling set for β = Ω(log log log Δ)-ruling set can be computed in in just O(β log log n) rounds whp. The above results assume a total memory of Õ(m + n^{1+ε}). We also present algorithms for β-ruling sets in the low-memory MPC model assuming that the total memory over all machines is restricted to Õ(m). For β > 1, these algorithms are all substantially faster than the Ghaffari-Uitto Õ(√{log Δ})-round MIS algorithm in the low-memory MPC model. All our results follow from a Sample-and-Gather Simulation Theorem that shows how random-sampling-based Congest algorithms can be efficiently simulated in the low-memory MPC model. We expect this simulation theorem to be of independent interest beyond the ruling set algorithms derived here.

Cite as

Kishore Kothapalli, Shreyas Pai, and Sriram V. Pemmaraju. Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 28:1-28:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{kothapalli_et_al:LIPIcs.FSTTCS.2020.28,
  author =	{Kothapalli, Kishore and Pai, Shreyas and Pemmaraju, Sriram V.},
  title =	{{Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model}},
  booktitle =	{40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020)},
  pages =	{28:1--28:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-174-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{182},
  editor =	{Saxena, Nitin and Simon, Sunil},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2020.28},
  URN =		{urn:nbn:de:0030-drops-132690},
  doi =		{10.4230/LIPIcs.FSTTCS.2020.28},
  annote =	{Keywords: Massively Parallel Computation, Ruling Set, Simulation Theorems}
}

Document

DOI: 10.4230/LIPIcs.FSTTCS.2012.136

Super-Fast 3-Ruling Sets

Authors: Kishore Kothapalli and Sriram Pemmaraju

Published in: LIPIcs, Volume 18, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)

Abstract

A t-ruling set of a graph G = (V, E) is a vertex-subset S that is independent and satisfies the property that every vertex v in V is at a distance of at most t from some vertex in S. A maximal independent set (MIS) is a 1-ruling set. The problem of computing an MIS on a network is a fundamental problem in distributed algorithms and the fastest algorithm for this problem is the O(log n)-round algorithm due to Luby (SICOMP 1986) and Alon et al. (J. Algorithms 1986) from more than 25 years ago. Since then the problem has resisted all efforts to yield to a sub-logarithmic round algorithm. There has been recent progress on this problem, most importantly an O(log Delta . sqrt(log n))-round algorithm on graphs with n vertices and maximum degree Delta, due to Barenboim et al. (to appear FOCS 2012). The time complexity of this algorithm is sub-logarithmic for Delta =2^{o(sqrt{log n})}. We approach the MIS problem from a different angle and ask if O(1)-ruling sets can be computed faster than the currently known fastest algorithm for an MIS? As an answer to this question, we show how to compute a 2-ruling set of an n-vertex graph in O((log n)^{3/4}) rounds. We also show that the above result can be improved for special classes of graphs. For instance, on high girth graphs (girth 6 or more), trees, and graphs of bounded arboricity, we show how to compute 3-ruling sets in exp(O({sqrt{loglog n}})) rounds, O((log log n)^2 .logloglog n) rounds, and O((loglog n)^3) rounds, respectively. Our main technique involves randomized sparsification that rapidly reduces the graph degree while ensuring that every deleted vertex is close to some vertex that remains. This technique may have further applications in other contexts, e.g., in designing sub-logarithmic distributed approximation algorithms. Our results raise intriguing questions about how quickly an MIS (or 1-ruling sets) can be computed, given that 2-ruling sets can be computed in sub-logarithmic rounds.

Cite as

Kishore Kothapalli and Sriram Pemmaraju. Super-Fast 3-Ruling Sets. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012). Leibniz International Proceedings in Informatics (LIPIcs), Volume 18, pp. 136-147, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

Copy BibTex To Clipboard

@InProceedings{kothapalli_et_al:LIPIcs.FSTTCS.2012.136,
  author =	{Kothapalli, Kishore and Pemmaraju, Sriram},
  title =	{{Super-Fast 3-Ruling Sets}},
  booktitle =	{IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012)},
  pages =	{136--147},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-47-7},
  ISSN =	{1868-8969},
  year =	{2012},
  volume =	{18},
  editor =	{D'Souza, Deepak and Radhakrishnan, Jaikumar and Telikepalli, Kavitha},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2012.136},
  URN =		{urn:nbn:de:0030-drops-38549},
  doi =		{10.4230/LIPIcs.FSTTCS.2012.136},
  annote =	{Keywords: MIS, ruling sets, graph sparsification, distributed algorithms}
}

5 Search Results for "Kothapalli, Kishore"

Fully Scalable MPC Algorithms for Euclidean k-Center

Abstract

Cite as

GSOHC: Global Synchronization Optimization in Heterogeneous Computing

Abstract

Cite as

Self-Stabilizing Fully Adaptive Maximal Matching

Abstract

Cite as

Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model

Abstract

Cite as

Super-Fast 3-Ruling Sets

Abstract

Cite as

Thanks for your feedback!

Could not send message