Parallel Query Processing with Heterogeneous Machines

Authors Simon Frisk, Paraschos Koutris




File

LIPIcs.ICDT.2025.27.pdf
  • Filesize: 0.8 MB
  • 18 pages


Author Details

Simon Frisk
  • University of Wisconsin-Madison, WI, USA
Paraschos Koutris
  • University of Wisconsin-Madison, WI, USA

Cite As

Simon Frisk and Paraschos Koutris. Parallel Query Processing with Heterogeneous Machines. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 27:1-27:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.27

Abstract

We study the problem of computing a full Conjunctive Query in parallel using p heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function that maps the number of bits it receives to a cost. An optimal algorithm should minimize the maximum cost across all machines. We consider algorithms that use a single communication round and give a lower bound and a matching upper bound for databases where each relation has the same cardinality. We do this both for linear cost functions, as in previous work, and for more general cost functions. For databases with relations of different cardinalities, we also find a lower bound, and give matching upper bounds for specific queries such as the Cartesian product, the join, the star query, and the triangle query. Our approach is inspired by the HyperCube algorithm, but there are additional challenges involved when machines have heterogeneous cost functions.
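The objective described in the abstract (minimize the maximum per-machine cost) can be illustrated with a toy sketch. This is not the paper's algorithm: it ignores the replication that joins require and only splits a fixed amount of input disjointly across machines. The linear cost form `n / speed`, the `speeds` values, and the function names `max_cost` and `proportional_split` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def max_cost(loads: Sequence[float],
             cost_fns: Sequence[Callable[[float], float]]) -> float:
    """Objective from the abstract: the largest cost over all machines."""
    return max(f(n) for f, n in zip(cost_fns, loads))


def proportional_split(total_bits: float, speeds: Sequence[float]) -> List[float]:
    """Give machine i a share of the input proportional to speeds[i].

    With the assumed linear costs n / speed, this equalizes every
    machine's cost for a disjoint split of the input.
    """
    total_speed = sum(speeds)
    return [total_bits * s / total_speed for s in speeds]


# Hypothetical machines: machine i charges n / speeds[i] for receiving n bits.
speeds = [1.0, 2.0, 5.0]
cost_fns = [lambda n, s=s: n / s for s in speeds]

total = 800.0
even = [total / len(speeds)] * len(speeds)    # ignores heterogeneity
balanced = proportional_split(total, speeds)  # accounts for it

print(max_cost(even, cost_fns))      # dominated by the slowest machine
print(max_cost(balanced, cost_fns))  # every machine has the same cost
```

Under these assumptions, the even split is bottlenecked by the slowest machine, while the proportional split gives every machine the same cost. Distributing a join adds the complication that tuples must be replicated across machines, which this sketch does not model.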

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Joins
  • Massively Parallel Computation
  • Heterogeneous

