Learning Aggregate Queries Defined by First-Order Logic with Counting

van Bergerem, Steffen; Schweikardt, Nicole

doi:10.4230/LIPIcs.ICDT.2025.4

Abstract

In the logical framework introduced by Grohe and Turán (TOCS 2004) for Boolean classification problems, the instances to classify are tuples from a logical structure, and Boolean classifiers are described by parametric models based on logical formulas. This is a specific scenario for supervised passive learning, where classifiers should be learned based on labelled examples. Existing results in this scenario focus on Boolean classification. This paper presents learnability results beyond Boolean classification. We focus on multiclass classification problems where the task is to assign input tuples to arbitrary integers. To represent such integer-valued classifiers, we use aggregate queries specified by an extension of first-order logic with counting terms called FOC₁.
Our main result shows the following: given a database of polylogarithmic degree, within quasi-linear time, we can build an index structure that makes it possible to learn FOC₁-definable integer-valued classifiers in time polylogarithmic in the size of the database and polynomial in the number of training examples.
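
To make the setting concrete, here is a toy sketch of an integer-valued classifier defined by a counting term with one parameter. It is an illustration only, not the algorithm of the paper. The graph, the term #(y).(E(x, y) ∧ E(y, w)), and the names `common_neighbours` and `learn_parameter` are assumptions chosen for the example. The search simply tries every candidate parameter, so it runs in time linear in the database size and uses no index structure, unlike the polylogarithmic learning time stated above.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy "database": an undirected graph stored as adjacency sets (hypothetical data).
Graph = Dict[str, Set[str]]


def common_neighbours(graph: Graph, x: str, w: str) -> int:
    """Value of the counting term #(y).(E(x, y) and E(y, w)) at input x, parameter w."""
    return len(graph[x] & graph[w])


def learn_parameter(graph: Graph, examples: List[Tuple[str, int]]) -> Optional[str]:
    """Return a parameter w whose hypothesis agrees with every labelled example.

    Brute force over all vertices; returns None if no parameter is consistent.
    """
    for w in sorted(graph):
        if all(common_neighbours(graph, x, w) == label for x, label in examples):
            return w
    return None


graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

# Training examples: (input vertex, integer label).
examples = [("a", 2), ("b", 1)]
learned = learn_parameter(graph, examples)
```

With this data, the first consistent parameter in sorted order is the vertex "a": it shares two neighbours with itself and one neighbour with "b".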

References

Azza Abouzied, Dana Angluin, Christos H. Papadimitriou, Joseph M. Hellerstein, and Avi Silberschatz. Learning and verifying quantified Boolean queries by example. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, pages 49-60. ACM, 2013. URL: https://doi.org/10.1145/2463664.2465220.
Howard Aizenstein, Tibor Hegedüs, Lisa Hellerstein, and Leonard Pitt. Complexity theoretic hardness results for query learning. Comput. Complex., 7(1):19-53, 1998. URL: https://doi.org/10.1007/PL00001593.
Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan. Characterizing schema mappings via data examples. ACM Trans. Database Syst., 36(4):23:1-23:48, 2011. URL: https://doi.org/10.1145/2043652.2043656.
Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319-342, 1987. URL: https://doi.org/10.1007/BF00116828.
Pablo Barceló, Alexander Baumgartner, Victor Dalmau, and Benny Kimelfeld. Regularizing conjunctive features for classification. J. Comput. Syst. Sci., 119:97-124, 2021. URL: https://doi.org/10.1016/j.jcss.2021.01.003.
Pablo Barceló and Miguel Romero. The complexity of reverse engineering problems for conjunctive queries. In 20th International Conference on Database Theory, ICDT 2017, volume 68 of LIPIcs, pages 7:1-7:17, 2017. URL: https://doi.org/10.4230/LIPIcs.ICDT.2017.7.
Steffen van Bergerem. Learning concepts definable in first-order logic with counting. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, pages 1-13. IEEE, 2019. URL: https://doi.org/10.1109/LICS.2019.8785811.
Steffen van Bergerem. Descriptive Complexity of Learning. PhD thesis, RWTH Aachen University, Germany, 2023. URL: https://doi.org/10.18154/RWTH-2023-02554.
Steffen van Bergerem, Martin Grohe, and Martin Ritzert. On the parameterized complexity of learning first-order logic. In PODS 2022: International Conference on Management of Data, pages 337-346. ACM, 2022. URL: https://doi.org/10.1145/3517804.3524151.
Steffen van Bergerem and Nicole Schweikardt. Learning concepts described by weight aggregation logic. In 29th EACSL Annual Conference on Computer Science Logic, CSL 2021, Ljubljana, Slovenia (Virtual Conference), January 25-28, 2021, volume 183 of LIPIcs, pages 10:1-10:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.CSL.2021.10.
Angela Bonifati, Radu Ciucanu, and Aurélien Lemay. Learning path queries on graph databases. In Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, pages 109-120. OpenProceedings.org, 2015. URL: https://doi.org/10.5441/002/edbt.2015.11.
Angela Bonifati, Radu Ciucanu, and Slawek Staworko. Learning join queries from user examples. ACM Trans. Database Syst., 40(4):24:1-24:38, 2016. URL: https://doi.org/10.1145/2818637.
Angela Bonifati, Ugo Comignani, Emmanuel Coquery, and Romuald Thion. Interactive mapping specification with exemplar tuples. ACM Trans. Database Syst., 44(3):10:1-10:44, 2019. URL: https://doi.org/10.1145/3321485.
Nataly Brukhim, Daniel Carmon, Irit Dinur, Shay Moran, and Amir Yehudayoff. A characterization of multiclass learnability. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pages 943-955. IEEE, 2022. URL: https://doi.org/10.1109/FOCS54457.2022.00093.
Balder ten Cate and Victor Dalmau. Conjunctive queries: Unique characterizations and exact learnability. In 24th International Conference on Database Theory, ICDT 2021, volume 186 of LIPIcs, pages 9:1-9:24, 2021. URL: https://doi.org/10.4230/LIPIcs.ICDT.2021.9.
Balder ten Cate, Victor Dalmau, and Phokion G. Kolaitis. Learning schema mappings. ACM Trans. Database Syst., 38(4):28:1-28:31, 2013. URL: https://doi.org/10.1145/2539032.2539035.
Balder ten Cate, Phokion G. Kolaitis, Kun Qian, and Wang-Chiew Tan. Active learning of GAV schema mappings. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2018, pages 355-368. ACM, 2018. URL: https://doi.org/10.1145/3196959.3196974.
Adrien Champion, Tomoya Chiba, Naoki Kobayashi, and Ryosuke Sato. ICE-based refinement type discovery for higher-order functional programs. J. Autom. Reason., 64(7):1393-1418, 2020. URL: https://doi.org/10.1007/s10817-020-09571-y.
William W. Cohen and C. David Page Jr. Polynomial learnability and inductive logic programming: Methods and results. New Gener. Comput., 13(3&4):369-409, 1995. URL: https://doi.org/10.1007/BF03037231.
Andrew Cropper, Sebastijan Dumancic, Richard Evans, and Stephen H. Muggleton. Inductive logic programming at 30. Mach. Learn., 111(1):147-172, 2022. URL: https://doi.org/10.1007/s10994-021-06089-1.
Amit Daniely, Sivan Sabato, and Shai Shalev-Shwartz. Multiclass learning approaches: A theoretical comparison with implications. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, pages 494-502, 2012. URL: https://proceedings.neurips.cc/paper/2012/hash/19f3cd308f1455b3fa09a282e0d496f4-Abstract.html.
Amit Daniely and Shai Shalev-Shwartz. Optimal learners for multiclass problems. In Maria-Florina Balcan, Vitaly Feldman, and Csaba Szepesvári, editors, Proceedings of The 27th Conference on Learning Theory, COLT 2014, Barcelona, Spain, June 13-15, 2014, volume 35 of JMLR Workshop and Conference Proceedings, pages 287-316. JMLR.org, 2014. URL: http://proceedings.mlr.press/v35/daniely14b.html.
P. Ezudheen, Daniel Neider, Deepak D'Souza, Pranav Garg, and P. Madhusudan. Horn-ICE learning for synthesizing invariants and contracts. Proc. ACM Program. Lang., 2(OOPSLA):131:1-131:25, 2018. URL: https://doi.org/10.1145/3276501.
Solomon Feferman and Robert L. Vaught. The first-order properties of products of algebraic systems. Fundamenta Mathematicae, 47:57-103, 1959.
Jörg Flum and Martin Grohe. Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2006. URL: https://doi.org/10.1007/3-540-29953-X.
Haim Gaifman. On local and non-local properties. In Jacques Stern, editor, Proceedings of the Herbrand Symposium, volume 107 of Studies in Logic and the Foundations of Mathematics, pages 105-135. North-Holland, 1982. URL: https://doi.org/10.1016/S0049-237X(08)71879-2.
Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. ICE: A robust framework for learning invariants. In Computer Aided Verification - 26th International Conference, CAV 2014, volume 8559 of Lecture Notes in Computer Science, pages 69-87. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-08867-9_5.
Georg Gottlob and Pierre Senellart. Schema mapping discovery from data instances. J. ACM, 57(2):6:1-6:37, 2010. URL: https://doi.org/10.1145/1667053.1667055.
Emilie Grienenberger and Martin Ritzert. Learning definable hypotheses on trees. In 22nd International Conference on Database Theory, ICDT 2019, pages 24:1-24:18, 2019. URL: https://doi.org/10.4230/LIPIcs.ICDT.2019.24.
Martin Grohe. Logic, graphs, and algorithms. In Jörg Flum, Erich Grädel, and Thomas Wilke, editors, Logic and Automata: History and Perspectives [in Honor of Wolfgang Thomas], volume 2 of Texts in Logic and Games, pages 357-422. Amsterdam University Press, 2008.
Martin Grohe, Stephan Kreutzer, and Sebastian Siebertz. Deciding first-order properties of nowhere dense graphs. J. ACM, 64(3):17:1-17:32, 2017. URL: https://doi.org/10.1145/3051095.
Martin Grohe, Christof Löding, and Martin Ritzert. Learning MSO-definable hypotheses on strings. In International Conference on Algorithmic Learning Theory, ALT 2017, pages 434-451, 2017. URL: http://proceedings.mlr.press/v76/grohe17a.html.
Martin Grohe and Martin Ritzert. Learning first-order definable concepts over structures of small degree. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, pages 1-12, 2017. URL: https://doi.org/10.1109/LICS.2017.8005080.
Martin Grohe and Nicole Schweikardt. First-order query evaluation with cardinality conditions. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2018, pages 253-266, 2018. URL: https://doi.org/10.1145/3196959.3196970.
Martin Grohe and György Turán. Learnability and definability in trees and similar structures. Theory Comput. Syst., 37(1):193-220, 2004. URL: https://doi.org/10.1007/s00224-003-1112-8.
Steve Hanneke, Shay Moran, and Qian Zhang. Universal rates for multiclass learning. In The 36th Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 5615-5681. PMLR, 2023. URL: https://proceedings.mlr.press/v195/hanneke23a.html.
David Haussler. Learning conjunctive concepts in structural domains. Mach. Learn., 4:7-40, 1989. URL: https://doi.org/10.1007/BF00114802.
Kouichi Hirata. On the hardness of learning acyclic conjunctive queries. In Algorithmic Learning Theory, 11th International Conference, ALT 2000, volume 1968 of Lecture Notes in Computer Science, pages 238-251. Springer, 2000. URL: https://doi.org/10.1007/3-540-40992-0_18.
Jörg-Uwe Kietz and Saso Dzeroski. Inductive logic programming and learnability. SIGART Bull., 5(1):22-32, 1994. URL: https://doi.org/10.1145/181668.181674.
Benny Kimelfeld and Christopher Ré. A relational framework for classifier engineering. ACM Trans. Database Syst., 43(3):11:1-11:36, 2018. URL: https://doi.org/10.1145/3268931.
Stephan Kreutzer. Algorithmic meta-theorems. In Javier Esparza, Christian Michaux, and Charles Steinhorn, editors, Finite and Algorithmic Model Theory, volume 379 of London Mathematical Society Lecture Note Series, pages 177-270. Cambridge University Press, 2011.
Dietrich Kuske and Nicole Schweikardt. First-order logic with counting. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, pages 1-12. IEEE Computer Society, 2017. URL: https://doi.org/10.1109/LICS.2017.8005133.
Christof Löding, P. Madhusudan, and Daniel Neider. Abstract learning frameworks for synthesis. In Tools and Algorithms for the Construction and Analysis of Systems - 22nd International Conference, TACAS 2016, volume 9636 of Lecture Notes in Computer Science, pages 167-185. Springer, 2016. URL: https://doi.org/10.1007/978-3-662-49674-9_10.
Johann A. Makowsky. Algorithmic uses of the Feferman-Vaught theorem. Ann. Pure Appl. Log., 126(1-3):159-213, 2004. URL: https://doi.org/10.1016/j.apal.2003.11.002.
Denis Mayr Lima Martins. Reverse engineering database queries from examples: State-of-the-art, challenges, and research opportunities. Inf. Syst., 83:89-100, 2019. URL: https://doi.org/10.1016/j.is.2019.03.002.
Stephen Muggleton. Inductive logic programming. New Gener. Comput., 8(4):295-318, 1991. URL: https://doi.org/10.1007/BF03037089.
Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods. J. Log. Program., 19/20:629-679, 1994. URL: https://doi.org/10.1016/0743-1066(94)90035-3.
Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, New York, NY, USA, 2014. URL: https://doi.org/10.1017/CBO9781107298019.
Robert H. Sloan, Balázs Szörényi, and György Turán. Learning Boolean functions with queries. In Yves Crama and Peter L. Hammer, editors, Boolean Models and Methods in Mathematics, Computer Science, and Engineering, pages 221-256. Cambridge University Press, 2010. URL: https://doi.org/10.1017/cbo9780511780448.010.
Slawek Staworko and Piotr Wieczorek. Learning twig and path queries. In 15th International Conference on Database Theory, ICDT 2012, pages 140-154. ACM, 2012. URL: https://doi.org/10.1145/2274576.2274592.
Wei Chit Tan, Meihui Zhang, Hazem Elmeleegy, and Divesh Srivastava. Reverse engineering aggregation queries. Proc. VLDB Endow., 10(11):1394-1405, 2017. URL: https://doi.org/10.14778/3137628.3137648.
Wei Chit Tan, Meihui Zhang, Hazem Elmeleegy, and Divesh Srivastava. REGAL+: reverse engineering SPJA queries. Proc. VLDB Endow., 11(12):1982-1985, 2018. URL: https://doi.org/10.14778/3229863.3236240.
Quoc Trung Tran, Chee Yong Chan, and Srinivasan Parthasarathy. Query reverse engineering. VLDB J., 23(5):721-746, 2014. URL: https://doi.org/10.1007/s00778-013-0349-3.
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pages 452-466. ACM, 2017. URL: https://doi.org/10.1145/3062341.3062365.
Yaacov Y. Weiss and Sara Cohen. Reverse engineering SPJ-queries from examples. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, pages 151-166. ACM, 2017. URL: https://doi.org/10.1145/3034786.3056112.
He Zhu, Stephen Magill, and Suresh Jagannathan. A data-driven CHC solver. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pages 707-721. ACM, 2018. URL: https://doi.org/10.1145/3192366.3192416.
