Niehren, Joachim; Planque, Laurent; Talbot, Jean-Marc; Tison, Sophie

N-ary Queries by Tree Automata



Information extraction from semi-structured documents requires finding n-ary queries on trees, that is, queries that define appropriate sets of n-tuples of nodes. We propose new formalisms for representing n-ary queries by tree automata, and we prove that they capture MSO, i.e., that they express exactly the MSO-definable n-ary queries. We then investigate n-ary queries by unambiguous tree automata, which are relevant for query induction in multi-slot information extraction. We show that this formalism captures the class of n-ary queries that are finite unions of Cartesian closed queries, a property we prove decidable.
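To make the notions in the abstract concrete, here is a minimal Python sketch, under stated assumptions; it is not the authors' construction. It assumes the common encoding from MSO-to-automata translations: a tuple (v1, ..., vn) is selected when a deterministic bottom-up automaton accepts the tree whose every node carries a Boolean vector, with bit i set exactly at node vi. Trees are binary, and nodes are addressed by 0/1 paths. The automaton `delta_ab` and all function names are hypothetical. "Cartesian closed" is read here as "the answer set equals the Cartesian product of its projections", which is our assumption; the paper's formal definition may be phrased differently. Answers are enumerated naively over all |t|^n tuples, purely for illustration.

```python
from itertools import product

# A binary tree is (label, children): children is () for a leaf
# or a pair (left, right). Nodes are addressed by paths of child indices.

REJECT = "reject"  # sink state of the hypothetical automaton below


def nodes(tree, path=()):
    """Yield the path of every node in preorder."""
    _, children = tree
    yield path
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))


def run(tree, delta, tup, path=()):
    """Bottom-up run on the tree annotated with Boolean vectors:
    bit i at a node is True iff that node is tup[i]."""
    label, children = tree
    states = tuple(run(c, delta, tup, path + (i,)) for i, c in enumerate(children))
    bits = tuple(path == v for v in tup)
    return delta(label, bits, states)


def answers(tree, n, delta, accepting):
    """n-ary query: every n-tuple of nodes whose annotated tree is accepted."""
    all_nodes = list(nodes(tree))
    return {tup for tup in product(all_nodes, repeat=n)
            if run(tree, delta, tup) in accepting}


def is_cartesian_closed(tuples, n):
    """True iff the set equals the Cartesian product of its projections."""
    projections = [{t[i] for t in tuples} for i in range(n)]
    return set(product(*projections)) == set(tuples)


def delta_ab(label, bits, child_states):
    """Hypothetical binary query: x is labelled 'a', y is labelled 'b'.
    A state records which variables have been seen in the subtree."""
    if REJECT in child_states:
        return REJECT
    if (bits[0] and label != "a") or (bits[1] and label != "b"):
        return REJECT
    seen = list(bits)
    for state in child_states:
        for i, flag in enumerate(state):
            if flag and seen[i]:
                return REJECT  # a variable would be assigned twice
            seen[i] = seen[i] or flag
    return tuple(seen)


tree = ("f", (("a", ()), ("g", (("b", ()), ("a", ())))))
pairs = answers(tree, 2, delta_ab, {(True, True)})
print(sorted(pairs))                  # [((0,), (1, 0)), ((1, 1), (1, 0))]
print(is_cartesian_closed(pairs, 2))  # True
print(is_cartesian_closed({((0,), (0,)), ((1, 1), (1, 1))}, 2))  # False
```

The 'a'-times-'b' query is Cartesian closed because its answer set is the product of the 'a'-nodes and the 'b'-nodes. A diagonal answer set such as {(u, u), (v, v)} is not, since (u, v) is missing.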

  author =	{Joachim Niehren and Laurent Planque and Jean-Marc Talbot and Sophie Tison},
  title =	{N-ary Queries by Tree Automata},
  booktitle =	{Foundations of Semistructured Data},
  year =	{2005},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  number =	{05061},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Internationales Begegnungs- und Forschungszentrum f{\"u}r Informatik (IBFI), Schloss Dagstuhl, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{},
  annote =	{Keywords: Information extraction, semistructured documents, node selecting queries in trees}

Seminar: 05061 - Foundations of Semistructured Data
Issue Date: 2005
Date of publication: 10.08.2005

