Database Theory in Action: Search-Based Program Optimization

Authors Yihong Zhang , Dan Suciu , Yisu Remy Wang , Max Willsey



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2025.34.pdf
  • Filesize: 0.75 MB
  • 6 pages

Document Identifiers

Author Details

Yihong Zhang
  • University of Washington, Seattle, WA, USA
Dan Suciu
  • University of Washington, Seattle, WA, USA
Yisu Remy Wang
  • University of California, Los Angeles, CA, USA
Max Willsey
  • University of California, Berkeley, CA, USA

Cite As Get BibTex

Yihong Zhang, Dan Suciu, Yisu Remy Wang, and Max Willsey. Database Theory in Action: Search-Based Program Optimization. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 34:1-34:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.34

Abstract

Recent work in programming languages developed an approach to term rewritings based on equality saturation (EqSat), which, instead of applying destructively the rewrite rules, maintains all equivalent expressions in a structure called an E-graph. This paper describes two surprising connections between EqSat and databases, going both ways. On one hand equality saturation can be viewed as a query evaluation problem, with great benefits. On the other hand, most sophisticated SQL query optimizers are based on the Volcano/Cascades framework which, we explain, is a variant of EqSat.

Subject Classification

ACM Subject Classification
  • Theory of computation → Equational logic and rewriting
  • Theory of computation → Database query processing and optimization (theory)
Keywords
  • Query optimization
  • program optimization
  • Cascades framework
  • equality saturation
  • Datalog

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. Cvc4. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV'11, pages 171-177, Berlin, Heidelberg, 2011. Springer-Verlag. URL: https://doi.org/10.1007/978-3-642-22110-1_14.
  2. Martin E. Bidlingmaier. An evaluation algorithm for datalog with equality, 2023. URL: https://doi.org/10.48550/arXiv.2302.05792.
  3. Leonardo de Moura and Nikolaj Bjørner. Efficient e-matching for smt solvers. In Frank Pfenning, editor, Automated Deduction - CADE-21, pages 183-198, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google Scholar
  4. Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pages 337-340, Berlin, Heidelberg, 2008. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=1792734.1792766.
  5. Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations on the common subexpression problem. Journal of the ACM, 27(4):758-771, October 1980. URL: https://doi.org/10.1145/322217.322228.
  6. Thomas Gilray, Arash Sahebolamri, Sidharth Kumar, and Kristopher Micinski. Higher-order, data-parallel structured deduction, 2022. URL: https://doi.org/10.48550/arXiv.2211.11573.
  7. G. Graefe and W.J. McKenna. The volcano optimizer generator: extensibility and efficient search. In Proceedings of IEEE 9th International Conference on Data Engineering, pages 209-218, 1993. URL: https://doi.org/10.1109/ICDE.1993.344061.
  8. Goetz Graefe. The cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19-29, 1995. URL: http://sites.computer.org/debull/95SEP-CD.pdf.
  9. Thomas Kundefinedhler, Andrés Goens, Siddharth Bhat, Tobias Grosser, Phil Trinder, and Michel Steuwer. Guided equality saturation. Proc. ACM Program. Lang., 8(POPL), January 2024. URL: https://doi.org/10.1145/3632900.
  10. Mengmeng Liu, Zachary G Ives, and Boon Thau Loo. Enabling incremental query re-optimization. In Proceedings of the 2016 International Conference on Management of Data, pages 1705-1720, 2016. URL: https://doi.org/10.1145/2882903.2915212.
  11. I. S. Mumick, S. J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan. Magic is relevant. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, SIGMOD '90, pages 247-258, New York, NY, USA, 1990. Association for Computing Machinery. URL: https://doi.org/10.1145/93597.98734.
  12. Chandrakana Nandi, Max Willsey, Adam Anderson, James R. Wilcox, Eva Darulova, Dan Grossman, and Zachary Tatlock. Synthesizing structured CAD models with equality saturation and inverse transformations. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020, pages 31-44, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3385412.3386012.
  13. Chandrakana Nandi, Max Willsey, Amy Zhu, Yisu Remy Wang, Brett Saiki, Adam Anderson, Adriana Schulz, Dan Grossman, and Zachary Tatlock. Rewrite rule inference using equality saturation. Proc. ACM Program. Lang., 5(OOPSLA), October 2021. URL: https://doi.org/10.1145/3485496.
  14. Charles Gregory Nelson. Techniques for Program Verification. PhD thesis, Stanford University, Stanford, CA, USA, 1980. AAI8011683. Google Scholar
  15. Hung Q. Ngo, Christopher Ré, and Atri Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec., 42(4):5-16, 2013. URL: https://doi.org/10.1145/2590989.2590991.
  16. André Pacak and Sebastian Erdweg. Functional programming with datalog. In 36th European Conference on Object-Oriented Programming (ECOOP 2022). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. Google Scholar
  17. Pavel Panchekha, Alex Sanchez-Stern, James R. Wilcox, and Zachary Tatlock. Automatically improving accuracy for floating point expressions. SIGPLAN Not., 50(6):1-11, June 2015. URL: https://doi.org/10.1145/2813885.2737959.
  18. Bjarne Steensgaard. Points-to analysis in almost linear time. In Hans-Juergen Boehm and Guy L. Steele Jr., editors, Conference Record of POPL'96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, St. Petersburg Beach, Florida, USA, January 21-24, 1996, pages 32-41. ACM Press, 1996. URL: https://doi.org/10.1145/237721.237727.
  19. Robert Endre Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215-225, 1975. URL: https://doi.org/10.1145/321879.321884.
  20. Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. Equality saturation: A new approach to optimization. In Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '09, pages 264-276, New York, NY, USA, 2009. ACM. URL: https://doi.org/10.1145/1480881.1480915.
  21. K. Tuncay Tekle and Yanhong A. Liu. Precise complexity analysis for efficient datalog queries. In Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, PPDP '10, pages 35-44, New York, NY, USA, 2010. Association for Computing Machinery. URL: https://doi.org/10.1145/1836089.1836094.
  22. Alexa VanHattum, Rachit Nigam, Vincent T. Lee, James Bornholt, and Adrian Sampson. Vectorization for Digital Signal Processors via Equality Saturation, pages 874-886. Association for Computing Machinery, New York, NY, USA, 2021. URL: https://doi.org/10.1145/3445814.3446707.
  23. Yisu Remy Wang, Shana Hutchison, Jonathan Leang, Bill Howe, and Dan Suciu. SPORES: Sum-product optimization via relational equality saturation for large scale linear algebra. Proceedings of the VLDB Endowment, 2020. Google Scholar
  24. Yisu Remy Wang, Mahmoud Abo Khamis, Hung Q Ngo, Reinhard Pichler, and Dan Suciu. Optimizing recursive queries with program synthesis. arXiv preprint arXiv:2202.10390, 2022. URL: https://arxiv.org/abs/2202.10390.
  25. Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. Egg: Fast and extensible equality saturation. Proc. ACM Program. Lang., 5(POPL), January 2021. URL: https://doi.org/10.1145/3434304.
  26. Yongwen Xu. Efficiency in the columbia database query optimizer. Master’s thesis, Citeseer, 1998. Google Scholar
  27. Yichen Yang, Phitchaya Mangpo Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy, and Jacques Pienaar. Equality saturation for tensor graph superoptimization. In Proceedings of Machine Learning and Systems, 2021. URL: https://arxiv.org/abs/2101.01332.
  28. Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. Better together: Unifying datalog and equality saturation. Proc. ACM Program. Lang., 7(PLDI), June 2023. URL: https://doi.org/10.1145/3591239.
  29. Yihong Zhang, Yisu Remy Wang, Max Willsey, and Zachary Tatlock. Relational e-matching. Proc. ACM Program. Lang., 6(POPL), January 2022. URL: https://doi.org/10.1145/3498696.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail