O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN

Authors: Junhao Gan, Anthony Wirth, Zhuo Zhang




Author Details

Junhao Gan
  • The University of Melbourne, Australia
Anthony Wirth
  • The University of Melbourne, Australia
Zhuo Zhang
  • The University of Melbourne, Australia

Cite As

Junhao Gan, Anthony Wirth, and Zhuo Zhang. O(1)-Round MPC Algorithms for Multi-Dimensional Grid Graph Connectivity, Euclidean MST and DBSCAN. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.7

Abstract

In this paper, we investigate three fundamental problems in the Massively Parallel Computation (MPC) model: (i) grid graph connectivity, (ii) approximate Euclidean Minimum Spanning Tree (EMST), and (iii) approximate DBSCAN. 
Our first result is an O(1)-round Las Vegas (i.e., succeeding with high probability) MPC algorithm for computing the connected components of a d-dimensional c-penetration grid graph ((d,c)-grid graph), where both d and c are positive integer constants. In such a grid graph, each vertex is a point with integer coordinates in ℕ^d, and an edge can exist only between two distinct vertices whose ℓ_∞-distance is at most c. To our knowledge, the best existing approach to computing the connected components (CCs) of (d,c)-grid graphs in the MPC model is to run the state-of-the-art MPC CC algorithms designed for general graphs: they achieve O(log log n + log D) [Behnezhad et al., 2019] and O(log log n + log(1/λ)) [Assadi et al., 2019] rounds, respectively, where D is the diameter and λ is the spectral gap of the graph.
Building on our grid graph connectivity technique, our second main result is an O(1)-round Las Vegas MPC algorithm for computing an approximate Euclidean MST. The existing state-of-the-art result on this problem is the O(1)-round MPC algorithm of Andoni et al. [Andoni et al., 2014], which guarantees an approximation of the overall weight only in expectation. In contrast, our algorithm guarantees not only a deterministic approximation of the overall weight, but also a deterministic edge-wise weight approximation. The latter property is crucial to many applications, such as finding the Bichromatic Closest Pair and Single-Linkage Clustering.
Last but not least, our third main result is an O(1)-round Las Vegas MPC algorithm for computing an approximate DBSCAN clustering in O(1)-dimensional Euclidean space.
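To make the (d,c)-grid graph definition concrete, the following is a minimal sequential sketch in Python. It is not the paper's MPC algorithm: it checks the edge condition from the definition above and computes connected components with an ordinary union-find. The function names `is_grid_edge` and `connected_components`, the sample points, and the choice to include every permitted pair as an edge are assumptions made purely for illustration; a (d,c)-grid graph may contain any subset of the permitted pairs.

```python
from itertools import combinations


def is_grid_edge(u, v, c):
    """Return True if u and v are distinct and their l_inf distance is at most c.

    This is the condition an edge must satisfy in a (d, c)-grid graph;
    it does not mean the edge is actually present in a given graph.
    """
    return u != v and max(abs(a - b) for a, b in zip(u, v)) <= c


def connected_components(vertices, edges):
    """Sequential union-find baseline (illustration only, not an MPC algorithm)."""
    parent = {v: v for v in vertices}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical (2, 1)-grid graph: integer points in the plane, c = 1.
points = [(0, 0), (1, 1), (5, 5), (6, 5)]
# Assumption: every pair permitted by the definition is taken as an edge.
edges = [(u, v) for u, v in combinations(points, 2) if is_grid_edge(u, v, 1)]
components = connected_components(points, edges)
```

In this sample, (0, 0) and (1, 1) are at ℓ_∞-distance 1 and so may be adjacent, whereas (1, 1) and (5, 5) are at distance 4 and cannot be joined when c = 1.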

Subject Classification

ACM Subject Classification
  • Theory of computation → Massively parallel algorithms
Keywords
  • Massively Parallel Computation
  • Graph Connectivity
  • Grid Graphs
  • Euclidean Minimum Spanning Tree
  • DBSCAN


References

  1. Pankaj K Agarwal, Herbert Edelsbrunner, Otfried Schwarzkopf, and Emo Welzl. Euclidean minimum spanning trees and bichromatic closest pairs. In Proceedings of the sixth annual symposium on Computational geometry, pages 203-210, 1990. URL: https://doi.org/10.1145/98524.98567.
  2. Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, and Abhinandan Nath. Parallel algorithms for constructing range and nearest-neighbor searching data structures. In Proceedings of the 35th ACM PODS 2016, pages 429-440. ACM, 2016. URL: https://doi.org/10.1145/2902251.2902303.
  3. Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116-1127, 1988. URL: https://doi.org/10.1145/48529.48535.
  4. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data, 1998.
  5. Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 574-583. ACM, 2014. URL: https://doi.org/10.1145/2591796.2591805.
  6. Alexandr Andoni, Zhao Song, Clifford Stein, Zhengyu Wang, and Peilin Zhong. Parallel graph connectivity in log diameter rounds. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, 2018. URL: https://doi.org/10.1109/FOCS.2018.00070.
  7. Sunil Arya, David M Mount, Nathan S Netanyahu, Ruth Silverman, and Angela Y Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM (JACM), 45(6):891-923, 1998. URL: https://doi.org/10.1145/293347.293348.
  8. Sepehr Assadi, Xiaorui Sun, and Omri Weinstein. Massively parallel algorithms for finding well-connected components in sparse graphs. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, 2019.
  9. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. Journal of the ACM (JACM), 64(6):1-58, 2017. URL: https://doi.org/10.1145/3125644.
  10. Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Lacki, and Vahab Mirrokni. Near-optimal massively parallel graph connectivity. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1615-1636. IEEE, 2019. URL: https://doi.org/10.1109/FOCS.2019.00095.
  11. Patrick Burger and Hans-Joachim Wuensche. Fast multi-pass 3d point segmentation based on a structured mesh graph for ground vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 2150-2156. IEEE, 2018. URL: https://doi.org/10.1109/IVS.2018.8500552.
  12. Brent N Clark, Charles J Colbourn, and David S Johnson. Unit disk graphs. Discrete mathematics, 86(1-3):165-177, 1990. URL: https://doi.org/10.1016/0012-365X(90)90358-O.
  13. Sam Coy and Artur Czumaj. Deterministic massively parallel connectivity. In STOC '22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 162-175. ACM, 2022. URL: https://doi.org/10.1145/3519935.3520055.
  14. Sam Coy, Artur Czumaj, and Gopinath Mishra. On parallel k-center clustering. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2023, Orlando, FL, USA, June 17-19, 2023, pages 65-75. ACM, 2023. URL: https://doi.org/10.1145/3558481.3591075.
  15. Artur Czumaj, Guichen Gao, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Pavel Veselý. Fully scalable MPC algorithms for clustering in high dimension. CoRR, abs/2307.07848, 2023. URL: https://doi.org/10.48550/arXiv.2307.07848.
  16. Artur Czumaj, Christiane Lammersen, Morteza Monemizadeh, and Christian Sohler. (1+ε)-approximation for facility location in data streams. In Proceedings of SODA 2013, 2013.
  17. Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008. URL: https://doi.org/10.1145/1327452.1327492.
  18. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018. URL: https://arxiv.org/abs/1810.04805.
  19. Hu Ding and Jinhui Xu. Solving the chromatic cone clustering problem via minimum spanning sphere. In International Colloquium on Automata, Languages, and Programming, pages 773-784. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22006-7_65.
  20. Junhao Gan and Yufei Tao. Dbscan revisited: Mis-claim, un-fixability, and approximation. In Proceedings of the 2015 ACM SIGMOD international conference on management of data, pages 519-530, 2015. URL: https://doi.org/10.1145/2723372.2737792.
  21. Junhao Gan and Yufei Tao. On the hardness and approximation of euclidean dbscan. ACM Transactions on Database Systems (TODS), 42(3):1-45, 2017. URL: https://doi.org/10.1145/3083897.
  22. Junhao Gan and Yufei Tao. An i/o-efficient algorithm for computing vertex separators on multi-dimensional grid graphs and its applications. J. Graph Algorithms Appl. (JGAA), 22(2):297-327, 2018. URL: https://doi.org/10.7155/JGAA.00471.
  23. Junhao Gan, Anthony Wirth, and Zhuo Zhang. O(1)-round MPC algorithms for multi-dimensional grid graph connectivity, EMST and DBSCAN, 2025. URL: https://arxiv.org/abs/2501.12044.
  24. Michael T Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the mapreduce framework. In International Symposium on Algorithms and Computation, pages 374-383. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-25591-5_39.
  25. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 770-778. IEEE Computer Society, 2016. URL: https://doi.org/10.1109/CVPR.2016.90.
  26. Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In Proceedings of the thirty-sixth annual ACM Symposium on Theory of Computing, pages 373-380, 2004. URL: https://doi.org/10.1145/1007352.1007413.
  27. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 EuroSys Conference, Lisbon, Portugal, March 21-23, 2007, pages 59-72. ACM, 2007. URL: https://doi.org/10.1145/1272996.1273005.
  28. Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, 2010. URL: https://doi.org/10.1137/1.9781611973075.76.
  29. Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. J. Comput. Syst. Sci., 62(3):516-527, 2001. URL: https://doi.org/10.1006/JCSS.2000.1741.
  30. Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013. URL: http://arxiv.org/abs/1301.3781.
  31. Morteza Monemizadeh. Facility location in the sublinear geometric model. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023), 2023.
  32. Danupon Nanongkai and Michele Scquizzato. Equivalence classes and conditional hardness in massively parallel computations. Distributed computing, 35(2):165-183, 2022. URL: https://doi.org/10.1007/S00446-021-00418-2.
  33. Abhinandan Nath, Kyle Fox, Pankaj K Agarwal, and Kamesh Munagala. Massively parallel algorithms for computing tin dems and contour trees for large terrains. In Proceedings of the 24th ACM SIGSPATIAL, 2016.
  34. Tom White. Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale (3. ed., revised and updated). O'Reilly, 2012. URL: http://www.oreilly.de/catalog/9781449311520/index.html.
  35. Jie Xue. Colored range closest-pair problem under general distance functions. In Proceedings of ACM-SIAM SODA, pages 373-390. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.24.
  36. Grigory Yaroslavtsev and Adithya Vadapalli. Massively parallel algorithms and hardness for single-linkage clustering under lp-distances. In 35th International Conference on Machine Learning (ICML'18), 2018.
  37. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010, 2010. URL: https://www.usenix.org/conference/hotcloud-10/spark-cluster-computing-working-sets.
  38. Yan Zhou, Oleksandr Grygorash, and Thomas F Hain. Clustering with minimum spanning trees. International Journal on Artificial Intelligence Tools, 20(01):139-177, 2011. URL: https://doi.org/10.1142/S0218213011000061.