The Quest for Faster Join Algorithms (Invited Talk)

Authors: Paraschos Koutris, Shaleen Deep, Austen Fan, Hangdong Zhao



Author Details

Paraschos Koutris
  • University of Wisconsin-Madison, WI, USA
Shaleen Deep
  • Microsoft, Madison, WI, USA
Austen Fan
  • University of Wisconsin-Madison, WI, USA
Hangdong Zhao
  • University of Wisconsin-Madison, WI, USA

Cite As

Paraschos Koutris, Shaleen Deep, Austen Fan, and Hangdong Zhao. The Quest for Faster Join Algorithms (Invited Talk). In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 1:1-1:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.1

Abstract

Joins are the cornerstone of relational databases. Surprisingly, even after several decades of research in the database systems and theory communities, we still lack an understanding of how to design the fastest possible join algorithm. In this talk, we will present the exciting progress the database theory community has achieved in join algorithms over the last two decades. The talk will revolve around five key ideas fundamentally shaping this research area: tree decompositions, data partitioning, leveraging statistical information, enumeration, and algebraic techniques.

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Conjunctive Queries
  • Joins
  • Tree Decompositions
  • Enumeration
  • Semirings
