Learning Tree Pattern Transformations

Authors: Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, Thomas Zeume




Author Details

Daniel Neider
  • TU Dortmund University, Germany
  • Center for Trustworthy Data Science and Security, UA Ruhr, Germany
Leif Sabellek
  • CONTACT Research, Germany
Johannes Schmidt
  • Jönköping University, Sweden
Fabian Vehlken
  • Ruhr University Bochum, Germany
Thomas Zeume
  • Ruhr University Bochum, Germany

Acknowledgements

We are grateful to Florin Manea and Markus Schmid for insightful discussions.

Cite As

Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, and Thomas Zeume. Learning Tree Pattern Transformations. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.24

Abstract

Explaining why and how a tree t structurally differs from another tree t^⋆ is a question encountered throughout computer science, including in understanding tree-structured data such as XML or JSON. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set {(t_1, t_1^⋆), …, (t_n, t_n^⋆)} of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs (t_i, t_i^⋆)? This raises two research questions: (i) what is a good notion of "rule" in this context; and (ii) how can sets of rules explaining a data set be learned algorithmically?
We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g. showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.

Subject Classification

ACM Subject Classification
  • Theory of computation → Complexity theory and logic
  • Computing methodologies → Supervised learning
  • Theory of computation → Database theory
Keywords
  • Tree pattern transformations
  • learning from positive examples
  • computational complexity

