Learning Tree Pattern Transformations

Neider, Daniel; Sabellek, Leif; Schmidt, Johannes; Vehlken, Fabian; Zeume, Thomas

doi:10.4230/LIPIcs.ICDT.2025.24

Abstract

Explaining why and how a tree t structurally differs from another tree t^⋆ is a question encountered throughout computer science, including in understanding tree-structured data such as XML or JSON. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: given a set {(t₁, t₁^⋆), …, (t_n, t_n^⋆)} of pairs of labelled, ordered trees, is there a small set of rules that explains the structural differences between all pairs (t_i, t_i^⋆)? This raises two research questions: (i) what is a good notion of "rule" in this context, and (ii) how can sets of rules explaining a data set be learned algorithmically?
We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g., showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.
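
To make the problem statement concrete, the following is a minimal sketch of what it could mean for a set of rules to "explain" a set of tree pairs. It is not the specification language of the paper: the tree encoding, the "?x"-style variables, and the helper names (match, substitute, rewrites, explains) are all assumptions made for illustration. A rule here is a pair of patterns, and a pair (t, t^⋆) counts as explained if applying some rule once at some node of t yields t^⋆.

```python
# Hypothetical encoding (an assumption, not the paper's formalism):
#   tree    = (label, (child, child, ...))
#   pattern = same shape, but a string such as "?x" is a variable
#             that binds a whole subtree.

def match(pattern, tree, binding):
    """Match pattern against tree; return the extended binding or None."""
    if isinstance(pattern, str):  # variable
        if pattern in binding:
            return binding if binding[pattern] == tree else None
        return {**binding, pattern: tree}
    label, kids = pattern
    if label != tree[0] or len(kids) != len(tree[1]):
        return None
    for p, t in zip(kids, tree[1]):
        binding = match(p, t, binding)
        if binding is None:
            return None
    return binding


def substitute(pattern, binding):
    """Instantiate a pattern; every variable must occur in the binding."""
    if isinstance(pattern, str):
        return binding[pattern]
    label, kids = pattern
    return (label, tuple(substitute(k, binding) for k in kids))


def rewrites(tree, rule):
    """Yield every tree obtained by applying rule once at some node."""
    lhs, rhs = rule
    binding = match(lhs, tree, {})
    if binding is not None:
        yield substitute(rhs, binding)
    label, kids = tree
    for i, kid in enumerate(kids):
        for new_kid in rewrites(kid, rule):
            yield (label, kids[:i] + (new_kid,) + kids[i + 1:])


def explains(rules, pairs):
    """True if every pair (t, t_star) is covered by one step of some rule."""
    return all(
        any(t_star in rewrites(t, rule) for rule in rules)
        for t, t_star in pairs
    )


# Illustrative data: two formulas in which an implication was reversed.
A, B, C, D = ("A", ()), ("B", ()), ("C", ()), ("D", ())
pairs = [
    (("->", (A, B)), ("->", (B, A))),
    (("not", (("->", (C, D)),)), ("not", (("->", (D, C)),))),
]
swap_implication = (("->", ("?x", "?y")), ("->", ("?y", "?x")))

print(explains([swap_implication], pairs))
```

In this toy setting a single rule covers both pairs. The learning problem described in the abstract asks the reverse question: given only the pairs, find a small rule set for which such a check succeeds.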

Cite As

Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, and Thomas Zeume. Learning Tree Pattern Transformations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.24

Author Details

Daniel Neider

TU Dortmund University, Germany
Center for Trustworthy Data Science and Security, UA Ruhr, Germany

Leif Sabellek

CONTACT Research, Germany

Johannes Schmidt

Jönköping University, Sweden

Fabian Vehlken

Ruhr University Bochum, Germany

Thomas Zeume

Ruhr University Bochum, Germany

Funding

Neider, Daniel: Supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grant 434592664.
Schmidt, Johannes: Partially supported by the Swedish Research Council (VR), grant 2022-03214.
Vehlken, Fabian: Supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grants 448468041 and 532727578.
Zeume, Thomas: Supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grants 448468041 and 532727578.

Acknowledgements

We are grateful to Florin Manea and Markus Schmid for insightful discussions.
