Learning Definable Hypotheses on Trees

Authors: Emilie Grienenberger, Martin Ritzert

  • 18 pages

Author Details

Emilie Grienenberger
  • ENS Paris-Saclay, 61 Avenue du Président Wilson, 94230 Cachan, France
Martin Ritzert
  • RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany

Cite As

Emilie Grienenberger and Martin Ritzert. Learning Definable Hypotheses on Trees. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 24:1-24:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


We study the problem of learning properties of nodes in tree structures. These properties are specified by logical formulas, such as formulas of first-order logic or monadic second-order logic (MSO). We think of the tree as a database encoding a large dataset and therefore aim for learning algorithms whose running time depends at most sublinearly on the size of the tree. We present a learning algorithm for quantifier-free formulas whose running time depends only polynomially on the number of training examples and not at all on the size of the background structure. A previous result on strings shows that, for general first-order or MSO formulas, sublinear running time cannot be achieved. However, we show that by building an index on the tree in a linear-time preprocessing phase, we obtain a learning algorithm for MSO formulas whose learning phase takes only logarithmic time.
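The MSO result splits the work into a linear-time preprocessing phase that builds an index on the tree and a learning phase whose cost no longer grows linearly with the tree. The paper's actual MSO index is not described on this page, so the following Python sketch is only a much simpler stand-in that shows the same split: one linear pass computes depth-first entry and exit times, after which "is u an ancestor of v?" is answered in constant time without touching the rest of the tree. The function names `build_interval_index` and `is_ancestor` and the child-list tree encoding are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Tuple


def build_interval_index(
    children: Dict[int, List[int]], root: int
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Preprocessing (O(n)): record entry and exit times of an iterative DFS.

    Hypothetical stand-in for a tree index; it is NOT the paper's MSO index.
    """
    enter: Dict[int, int] = {}
    leave: Dict[int, int] = {}
    clock = 0
    stack: List[Tuple[int, bool]] = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants have been visited; close this node's interval.
            leave[node] = clock
            clock += 1
            continue
        enter[node] = clock
        clock += 1
        # Re-push the node so its exit time is recorded after its subtree.
        stack.append((node, True))
        # Push children in reverse so the first child is explored first.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return enter, leave


def is_ancestor(enter: Dict[int, int], leave: Dict[int, int], u: int, v: int) -> bool:
    """Query (O(1)): u is a (non-strict) ancestor of v iff u's interval contains v's."""
    return enter[u] <= enter[v] and leave[v] <= leave[u]


if __name__ == "__main__":
    # Tree: 0 has children 1 and 2; node 1 has child 3.
    tree = {0: [1, 2], 1: [3]}
    enter, leave = build_interval_index(tree, root=0)
    print(is_ancestor(enter, leave, 1, 3))  # True
    print(is_ancestor(enter, leave, 2, 3))  # False
```

Ancestor tests are only one kind of structural query such an index might support; answering arbitrary MSO-definable questions needs far more machinery than interval numbering provides.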

Subject Classification

ACM Subject Classification
  • Theory of computation → Logic

Keywords
  • monadic second-order logic
  • trees
  • query learning



