Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, Thomas Zeume
Creative Commons Attribution 4.0 International license
Explaining why and how a tree $t$ structurally differs from another tree $t^\star$ is a question encountered throughout computer science, including in understanding tree-structured data such as XML or JSON data. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set $\{(t_1, t_1^\star), \dots, (t_n, t_n^\star)\}$ of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs $(t_i, t_i^\star)$? This raises two research questions: (i) what is a good notion of "rule" in this context, and (ii) how can sets of rules explaining a data set be learned algorithmically?
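To make the problem statement concrete, here is a minimal Python sketch built on assumptions of our own, not on the pattern-based specification language defined in the article. Trees are `(label, children)` tuples; a rule is a hypothetical `(pattern, template)` pair in which labels starting with `"?"` are variables; and rules are applied in a single top-down pass where the first matching rule rewrites a node. The helper names (`match`, `instantiate`, `apply_rules`, `explains`) are illustrative only. The sketch shows what it means for a rule set to explain every pair of input and target trees.

```python
from typing import Dict, List, Tuple

# A labelled, ordered tree: (label, children); the order of children matters.
Tree = Tuple[str, list]
# Hypothetical rule format: (pattern, template). A label starting with "?"
# is a variable that matches an arbitrary subtree and copies it into the template.
Rule = Tuple[Tree, Tree]


def leaf(label: str) -> Tree:
    return (label, [])


def match(pattern: Tree, tree: Tree, env: Dict[str, Tree]) -> bool:
    """Match `pattern` against `tree`, recording variable bindings in `env`."""
    label, kids = pattern
    if label.startswith("?"):
        # A repeated variable must bind the same subtree every time.
        return env.setdefault(label, tree) == tree
    t_label, t_kids = tree
    if label != t_label or len(kids) != len(t_kids):
        return False
    return all(match(p, c, env) for p, c in zip(kids, t_kids))


def instantiate(template: Tree, env: Dict[str, Tree]) -> Tree:
    """Build the replacement tree, substituting bound variables."""
    label, kids = template
    if label.startswith("?"):
        return env[label]
    return (label, [instantiate(k, env) for k in kids])


def apply_rules(rules: List[Rule], tree: Tree) -> Tree:
    """Single top-down pass (assumed semantics): the first rule matching a
    node rewrites it, and the rewritten subtree is not visited again."""
    for lhs, rhs in rules:
        env: Dict[str, Tree] = {}
        if match(lhs, tree, env):
            return instantiate(rhs, env)
    label, kids = tree
    return (label, [apply_rules(rules, k) for k in kids])


def explains(rules: List[Rule], pairs: List[Tuple[Tree, Tree]]) -> bool:
    """True if applying `rules` turns every source tree into its target tree."""
    return all(apply_rules(rules, t) == t_star for t, t_star in pairs)


# Usage: eliminating a double negation inside a Boolean expression tree.
double_neg: Rule = (("not", [("not", [leaf("?x")])]), leaf("?x"))
t = ("and", [("not", [("not", [leaf("p")])]), leaf("q")])
t_star = ("and", [leaf("p"), leaf("q")])
print(explains([double_neg], [(t, t_star)]))  # True
```

The first-match, single-pass semantics is chosen only because it is easy to state and always terminates; richer notions of rule (overlapping matches, repeated rewriting, deletions of sibling ranges) would need a more careful definition.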
We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g., showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.
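The learning question can be illustrated, again only as a sketch, by exhaustive search: given a hypothetical pool of candidate rules, try subsets of increasing size until one explains every pair. The rule format and first-match semantics are our own simplifying assumptions (restated here so the block runs on its own), `smallest_explanation` is a hypothetical name, and the search is exponential in the pool size. It is a plain brute-force baseline, not the SAT-based approach the article discusses.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Tree = Tuple[str, list]   # (label, ordered children)
Rule = Tuple[Tree, Tree]  # hypothetical (pattern, template); "?"-labels are variables


def leaf(label: str) -> Tree:
    return (label, [])


def match(p: Tree, t: Tree, env: Dict[str, Tree]) -> bool:
    if p[0].startswith("?"):
        return env.setdefault(p[0], t) == t
    return (p[0] == t[0] and len(p[1]) == len(t[1])
            and all(match(a, b, env) for a, b in zip(p[1], t[1])))


def instantiate(tpl: Tree, env: Dict[str, Tree]) -> Tree:
    if tpl[0].startswith("?"):
        return env[tpl[0]]
    return (tpl[0], [instantiate(k, env) for k in tpl[1]])


def apply_rules(rules: List[Rule], t: Tree) -> Tree:
    # Assumed semantics: one top-down pass, first matching rule wins.
    for lhs, rhs in rules:
        env: Dict[str, Tree] = {}
        if match(lhs, t, env):
            return instantiate(rhs, env)
    return (t[0], [apply_rules(rules, k) for k in t[1]])


def smallest_explanation(pool: List[Rule],
                         pairs: List[Tuple[Tree, Tree]]) -> Optional[List[Rule]]:
    """Try subsets of the candidate pool by increasing size (pool order kept).
    Exponential in len(pool); returns None if no subset explains all pairs."""
    for k in range(len(pool) + 1):
        for subset in combinations(pool, k):
            rules = list(subset)
            if all(apply_rules(rules, t) == t_star for t, t_star in pairs):
                return rules
    return None


# Usage with a tiny hypothetical candidate pool.
dneg: Rule = (("not", [("not", [leaf("?x")])]), leaf("?x"))
swap: Rule = (("or", [leaf("?a"), leaf("?b")]), ("or", [leaf("?b"), leaf("?a")]))
drop: Rule = (("and", [leaf("?a"), leaf("?b")]), leaf("?a"))
pairs = [
    (("not", [("not", [leaf("p")])]), leaf("p")),
    (("or", [leaf("p"), leaf("q")]), ("or", [leaf("q"), leaf("p")])),
]
print(smallest_explanation([drop, dneg, swap], pairs) == [dneg, swap])  # True
```

Because the number of subsets grows combinatorially, this baseline is only usable for very small pools; it also assumes the candidate rules are given, whereas finding good candidates is itself part of the learning problem.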
@InProceedings{neider_et_al:LIPIcs.ICDT.2025.24,
author = {Neider, Daniel and Sabellek, Leif and Schmidt, Johannes and Vehlken, Fabian and Zeume, Thomas},
title = {{Learning Tree Pattern Transformations}},
booktitle = {28th International Conference on Database Theory (ICDT 2025)},
pages = {24:1--24:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-364-5},
ISSN = {1868-8969},
year = {2025},
volume = {328},
editor = {Roy, Sudeepa and Kara, Ahmet},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.24},
URN = {urn:nbn:de:0030-drops-229652},
doi = {10.4230/LIPIcs.ICDT.2025.24},
annote = {Keywords: Tree pattern transformations, learning from positive examples, computational complexity}
}