Degree Sequence Bound for Join Cardinality Estimation

Authors Kyle Deeds, Dan Suciu, Magda Balazinska, Walter Cai





Author Details

Kyle Deeds
  • University of Washington, Seattle, WA, USA
Dan Suciu
  • University of Washington, Seattle, WA, USA
Magda Balazinska
  • University of Washington, Seattle, WA, USA
Walter Cai
  • University of Washington, Seattle, WA, USA

Cite As

Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai. Degree Sequence Bound for Join Cardinality Estimation. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICDT.2023.8

Abstract

Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans that take orders of magnitude longer to complete than plans generated with the true cardinality. Cardinality bounding avoids this pitfall by computing an upper bound on the query’s output size using statistics about the database such as table sizes and degrees, i.e., value frequencies. In this paper, we extend this line of work by proving a novel bound, called the Degree Sequence Bound, which takes into account the full degree sequences and the max tuple multiplicity. This work focuses on the important class of Berge-acyclic queries, for which the Degree Sequence Bound is tight. Further, we describe how to practically compute this bound using a functional approximation of the true degree sequences and prove that even this functional form improves upon previous bounds.
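
To make the idea of bounding with degree sequences concrete, the following is a minimal sketch for the simplest case only: a single join of two relations on one attribute. It is not the paper's algorithm for general Berge-acyclic queries, and it ignores the functional approximation and tuple multiplicities. The function names and sample data are hypothetical. It relies on the rearrangement inequality: pairing the two degree sequences in descending order maximizes the sum of products, so that sum is an upper bound on the join size. It is compared against a coarser bound that uses only table sizes and maximum degrees.

```python
from collections import Counter


def degree_sequence(column):
    """Value frequencies of a join column, sorted in descending order."""
    return sorted(Counter(column).values(), reverse=True)


def true_join_size(r_col, s_col):
    """Exact size of the join of R and S on the given columns."""
    r_deg, s_deg = Counter(r_col), Counter(s_col)
    # Each value v contributes deg_R(v) * deg_S(v) output tuples.
    return sum(count * s_deg[v] for v, count in r_deg.items())


def max_degree_bound(r_col, s_col):
    """Upper bound using only table sizes and maximum degrees."""
    if not r_col or not s_col:
        return 0
    r_seq, s_seq = degree_sequence(r_col), degree_sequence(s_col)
    return min(len(r_col) * s_seq[0], len(s_col) * r_seq[0])


def degree_sequence_bound(r_col, s_col):
    """Upper bound using the full degree sequences (two-relation case).

    Pairs the i-th largest degree in R with the i-th largest degree in S;
    by the rearrangement inequality no actual alignment of join values
    can produce more output tuples.
    """
    r_seq, s_seq = degree_sequence(r_col), degree_sequence(s_col)
    return sum(a * b for a, b in zip(r_seq, s_seq))


# Hypothetical join columns R.y and S.y.
r_y = ["a", "a", "a", "b", "c"]
s_y = ["a", "b", "b", "b", "d"]

print(true_join_size(r_y, s_y))         # 6
print(degree_sequence_bound(r_y, s_y))  # 11
print(max_degree_bound(r_y, s_y))       # 15
```

In this toy instance the true size (6) is at most the degree-sequence value (11), which in turn is at most the max-degree value (15). This illustrates why using the full degree sequence can tighten a bound that uses only the maximum degree.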

Subject Classification

ACM Subject Classification
  • Information systems → Query optimization
  • Information systems → Query planning
  • Theory of computation → Database query processing and optimization (theory)
  • Theory of computation → Data modeling
Keywords
  • Cardinality Estimation
  • Cardinality Bounding
  • Degree Bounds
  • Functional Approximation
  • Query Planning
  • Berge-Acyclic Queries

