PAC: Computing Join Queries with Semi-Covers

Authors Heba Aamer , Bas Ketsman



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2025.6.pdf
  • Filesize: 1 MB
  • 20 pages

Document Identifiers

Author Details

Heba Aamer
  • Vrije Universiteit Brussel, Belgium
Bas Ketsman
  • Vrije Universiteit Brussel, Belgium

Cite As Get BibTex

Heba Aamer and Bas Ketsman. PAC: Computing Join Queries with Semi-Covers. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 6:1-6:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.6

Abstract

An increased and growing interest in large-scale data processing has triggered a demand for specialized algorithms that thrive in massively parallel shared-nothing systems. To answer the question of how to efficiently compute join queries in this setting, a rich line of research has emerged specifically for the Massively Parallel Communication (MPC) model. In the MPC model, algorithms are executed in rounds, with each round consisting of a synchronized communication phase and a separate local computation phase. The main cost measure is the load of the algorithm, defined as the maximum number of messages received by any server in any round.
We study worst-case optimal algorithms for the join query evaluation problem in the constant-round MPC model. In the single-round variant of MPC, the worst-case optimal load for this problem is well understood and algorithms exist that guarantee this load for any join query. In the constant-round variant of MPC, queries can often be computed with a lower load compared to the single-round variant, but the worst-case optimal load is only known for specific classes of join queries, including graph-like and acyclic join queries, and the associated algorithms use very different techniques. In this paper, we propose a new constant-round MPC algorithm for computing join queries. Our algorithm is correct for every join query and its load matches (up to a polylog factor) the worst-case optimal load for at least all join queries that are acyclic or graph-like.

Subject Classification

ACM Subject Classification
  • Theory of computation → Distributed computing models
  • Theory of computation → Logic and databases
  • Theory of computation → Abstract machines
Keywords
  • Worst-case optimal load
  • MPC model
  • join queries

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995. URL: http://webdam.inria.fr/Alice/.
  2. Foto N. Afrati and Jeffrey D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282-1298, 2011. URL: https://doi.org/10.1109/TKDE.2011.47.
  3. Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick. Parallel-correctness and transferability for conjunctive queries. In Tova Milo and Diego Calvanese, editors, Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS, 2015, pages 47-58. ACM, 2015. URL: https://doi.org/10.1145/2745754.2745759.
  4. Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. In Richard Hull and Wenfei Fan, editors, Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA, 2013, pages 273-284. ACM, 2013. URL: https://doi.org/10.1145/2463664.2465224.
  5. Paul Beame, Paraschos Koutris, and Dan Suciu. Skew in parallel query processing. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, Snowbird, UT, USA, June 22-27, 2014, pages 212-223. ACM, 2014. URL: https://doi.org/10.1145/2594538.2594558.
  6. Xiao Hu. Cover or pack: New upper and lower bounds for massively parallel joins. In Leonid Libkin, Reinhard Pichler, and Paolo Guagliardo, editors, PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2021, pages 181-198. ACM, 2021. URL: https://doi.org/10.1145/3452021.3458319.
  7. Bas Ketsman and Dan Suciu. A worst-case optimal multi-round algorithm for parallel computation of conjunctive queries. In Emanuel Sallinger, Jan Van den Bussche, and Floris Geerts, editors, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 417-428. ACM, 2017. URL: https://doi.org/10.1145/3034786.3034788.
  8. Bas Ketsman, Dan Suciu, and Yufei Tao. A near-optimal parallel algorithm for joining binary relations. Log. Methods Comput. Sci., 18(2), 2022. URL: https://doi.org/10.46298/lmcs-18(2:6)2022.
  9. Paraschos Koutris. Query Processing for Massively Parallel Systems. PhD thesis, University of Washington, USA, 2015. URL: https://hdl.handle.net/1773/33697.
  10. Paraschos Koutris, Paul Beame, and Dan Suciu. Worst-case optimal algorithms for parallel query processing. In Wim Martens and Thomas Zeume, editors, 19th International Conference on Database Theory, ICDT 2016, volume 48 of LIPIcs, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.ICDT.2016.8.
  11. Miao Qiao and Yufei Tao. Two-attribute skew free, isolated cp theorem, and massively parallel joins. In PODS, pages 166-180, 2021. URL: https://doi.org/10.1145/3452021.3458321.
  12. Yufei Tao. A simple parallel algorithm for natural joins on binary relations. In Carsten Lutz and Jean Christoph Jung, editors, 23rd International Conference on Database Theory, ICDT 2020, volume 155 of LIPIcs, pages 25:1-25:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPICS.ICDT.2020.25.
  13. Yufei Tao. Parallel acyclic joins with canonical edge covers. In Dan Olteanu and Nils Vortmeier, editors, 25th International Conference on Database Theory, ICDT 2022, volume 220 of LIPIcs, pages 9:1-9:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPICS.ICDT.2022.9.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail