Learning Definable Hypotheses on Trees
We study the problem of learning properties of nodes in tree structures. These properties are specified by logical formulas, such as formulas of first-order or monadic second-order (MSO) logic. We think of the tree as a database encoding a large dataset and therefore aim for learning algorithms whose running time depends at most sublinearly on the size of the tree. We present a learning algorithm for quantifier-free formulas whose running time depends only polynomially on the number of training examples and not on the size of the background structure. A previous result on strings shows that for general first-order or MSO formulas a sublinear running time cannot be achieved. However, we show that by building an index on the tree in a linear-time preprocessing phase, we obtain a learning algorithm for MSO formulas with a logarithmic learning phase.
monadic second-order logic
trees
query learning
Theory of computation → Logic
24:1-24:18
Regular Paper
Emilie
Grienenberger
Emilie Grienenberger
ENS Paris-Saclay, 61 Avenue du Président Wilson, 94230 Cachan, France
Martin
Ritzert
Martin Ritzert
RWTH Aachen University, Templergraben 55, 52062 Aachen, Germany
This work is supported by the German research council (DFG) Research Training Group 2236 UnRAVeL.
10.4230/LIPIcs.ICDT.2019.24
Emilie Grienenberger and Martin Ritzert
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode