Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility

Authors: Thomas Muñoz Serrano, Cristian Riveros, Stijn Vansummeren




File

LIPIcs.ICDT.2024.12.pdf
  • Filesize: 0.86 MB
  • 20 pages


Author Details

Thomas Muñoz Serrano
  • UHasselt, Data Science Institute, Diepenbeek, Belgium
Cristian Riveros
  • Pontificia Universidad Católica de Chile, Santiago, Chile
  • Millennium Institute for Foundational Research on Data, Santiago, Chile
Stijn Vansummeren
  • UHasselt, Data Science Institute, Diepenbeek, Belgium

Cite As

Thomas Muñoz Serrano, Cristian Riveros, and Stijn Vansummeren. Enumeration and Updates for Conjunctive Linear Algebra Queries Through Expressibility. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.12

Abstract

Due to the importance of linear algebra and matrix operations in data analytics, there is significant interest in using relational query optimization and processing techniques for evaluating (sparse) linear algebra programs. In particular, in recent years close connections have been established between linear algebra programs and relational algebra that allow transferring optimization techniques of the latter to the former. In this paper, we ask which linear algebra programs in MATLANG correspond to the free-connex and q-hierarchical fragments of conjunctive first-order logic. Both fragments have desirable query processing properties: free-connex conjunctive queries support constant-delay enumeration after a linear-time preprocessing phase, and q-hierarchical conjunctive queries further allow constant-time updates. By characterizing the corresponding fragments of MATLANG, we hence identify the fragments of linear algebra programs that one can evaluate with constant-delay enumeration after linear-time preprocessing and with constant-time updates. To derive our results, we improve and generalize previous correspondences between MATLANG and relational algebra evaluated over semiring-annotated relations. In addition, we identify properties of semirings that allow the complexity bounds for free-connex and q-hierarchical conjunctive queries to be generalized from Boolean annotations to general semirings.
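
As an illustration of the q-hierarchical fragment mentioned in the abstract, the following is a minimal sketch of a syntactic check on a conjunctive query. It assumes the standard definition from the literature on conjunctive queries under updates, not a construction taken from this paper: a query is hierarchical if, for any two variables, their sets of atoms are nested or disjoint, and q-hierarchical if, in addition, whenever atoms(x) is a strict subset of atoms(y) and x is free, y is free as well. The query encoding (a list of `(relation, variables)` pairs) and all function names are hypothetical choices made for this example.

```python
from itertools import combinations


def atoms_of(query_atoms):
    """Map each variable to the set of indices of the atoms it occurs in."""
    occurrences = {}
    for index, (_relation, variables) in enumerate(query_atoms):
        for var in variables:
            occurrences.setdefault(var, set()).add(index)
    return occurrences


def is_hierarchical(query_atoms):
    """Check that atom sets of any two variables are nested or disjoint."""
    occ = atoms_of(query_atoms)
    for x, y in combinations(occ, 2):
        a, b = occ[x], occ[y]
        if not (a <= b or b <= a or a.isdisjoint(b)):
            return False
    return True


def is_q_hierarchical(query_atoms, free_vars):
    """Hierarchical, and a free variable never sits strictly below a bound one."""
    if not is_hierarchical(query_atoms):
        return False
    occ = atoms_of(query_atoms)
    free = set(free_vars)
    for x in occ:
        for y in occ:
            # atoms(x) strictly contained in atoms(y), x free, y quantified
            if occ[x] < occ[y] and x in free and y not in free:
                return False
    return True


# Q(x, y) :- R(x, y), S(y)
q1 = [("R", ("x", "y")), ("S", ("y",))]
# Q(x) :- R(x, y), S(y): same body, but y is projected away
q2 = q1
# Q(x, y) :- R(x), S(x, y), T(y): atom sets of x and y overlap without nesting
q3 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]

print(is_q_hierarchical(q1, {"x", "y"}))
print(is_q_hierarchical(q2, {"x"}))
print(is_q_hierarchical(q3, {"x", "y"}))
```

Under the assumed definition, the first query passes the check, the second fails because the free variable x occurs in strictly fewer atoms than the quantified variable y, and the third fails because it is not hierarchical at all.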

Subject Classification

ACM Subject Classification
  • Theory of computation → Database theory
Keywords
  • Query evaluation
  • conjunctive queries
  • linear algebra
  • enumeration algorithms

