Learning Tree Patterns from Example Graphs
This paper investigates the problem of learning tree patterns that return nodes with a given set of labels, from example graphs provided by the user. Example graphs are annotated by the user as being either positive or negative. The goal is then to determine whether there exists a tree pattern returning tuples of nodes with the given labels in each of the positive examples, but in none of the negative examples, and, furthermore, to find one such pattern if it exists. These are called the satisfiability and learning problems, respectively.
We thoroughly investigate the satisfiability and learning problems in a variety of settings. In particular, we consider example sets that (1) may contain only positive examples, or both positive and negative examples, (2) may contain directed or undirected graphs, and (3) may have multiple occurrences of labels or be uniquely labeled (to some degree). In addition, we consider tree patterns of different types that can allow, or prohibit, wildcard-labeled nodes and descendant edges. We also consider two different semantics for mapping tree patterns to graphs. The complexity of satisfiability is determined for the different combinations of settings. For cases in which satisfiability is polynomial, it is also shown that learning is polynomial. (This is non-trivial, as satisfying patterns may be exponential in size.) Finally, the minimal learning problem, i.e., that of finding a minimal-sized satisfying pattern, is studied for cases in which satisfiability is polynomial.
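To make the problem statement concrete, the following is a minimal sketch of what it means for one candidate pattern to be consistent with a set of annotated examples. It is not the paper's algorithm, and it only checks a given pattern rather than deciding satisfiability. The names `Pattern`, `Graph`, `matches`, and `is_consistent` are hypothetical. The sketch assumes directed graphs, child edges only, no wildcards, and a semantics in which pattern nodes map to graph nodes by a label- and edge-preserving mapping that need not be injective. It also simplifies "returning tuples of nodes with the given labels" to "the pattern matches somewhere in the graph".

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """A tree pattern node (hypothetical representation): a label plus child subpatterns."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Graph:
    """A directed, node-labeled graph (hypothetical representation)."""
    labels: dict  # node id -> label
    edges: dict   # node id -> set of successor node ids


def matches_at(pattern, graph, node):
    """True if `pattern` can be mapped into `graph` with its root at `node`.

    Assumed semantics: labels must agree, and each pattern child must map to
    some successor of `node`. Two pattern children may share a graph node.
    Recursion depth is bounded by the pattern depth, so cycles in the graph
    do not cause non-termination.
    """
    if graph.labels[node] != pattern.label:
        return False
    successors = graph.edges.get(node, ())
    return all(
        any(matches_at(child, graph, succ) for succ in successors)
        for child in pattern.children
    )


def matches(pattern, graph):
    """True if the pattern matches at some node of the graph."""
    return any(matches_at(pattern, graph, node) for node in graph.labels)


def is_consistent(pattern, positives, negatives):
    """True if the pattern matches every positive example and no negative one."""
    return (all(matches(pattern, g) for g in positives)
            and not any(matches(pattern, g) for g in negatives))
```

Under these assumptions, the satisfiability problem asks whether any pattern exists for which `is_consistent` holds, and the learning problem asks for one such pattern. The check itself says nothing about how to search the space of patterns, which is where the complexity results of the paper apply.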
tree patterns
learning
examples
127-143
Regular Paper
Sara Cohen
Yaacov Y. Weiss
10.4230/LIPIcs.ICDT.2015.127
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode