Indexing XML Documents Using Tree Paths Automaton

Authors: Eliška Šestáková, Jan Janoušek



File

OASIcs.SLATE.2017.10.pdf
  • Filesize: 1.04 MB
  • 14 pages

Cite As

Eliška Šestáková and Jan Janoušek. Indexing XML Documents Using Tree Paths Automaton. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.10

Abstract

An XML document can be viewed as a tree in a natural way. Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the XML index problem. In this paper, we attempt to support a significant fragment of XPath queries, namely those that use any combination of the child (i.e., /) and descendant-or-self (i.e., //) axes. A systematic approach to the construction of such an XML index, which is a finite automaton called the Tree Paths Automaton, is presented. Given an XML tree model T, the tree is first preprocessed into its linear fragments, called string paths. Since only path queries are considered, the branching structure of the XML tree model can be omitted. For the individual string paths, smaller Tree Paths Automata are built, and these are afterwards combined to form the index. The searching phase uses the index, reads an input query Q of size m, and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on the size of the XML document. Although the number of possible queries is clearly exponential in the number of nodes of the XML tree model, our experimental results suggest that the size of the index is usually only about 2.5 times larger than the size of the original document.
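
The idea of answering / and // path queries with a precomputed finite automaton can be illustrated with a small Python sketch. This is not the construction from the paper: instead of building automata for individual string paths and combining them, it runs a plain subset construction directly on the tree, which is simpler to read but can take exponential time and space on unfavourable inputs. The names build_index and query, the preorder node numbering, and the virtual document node 0 are assumptions made for this illustration only.

```python
import re
import xml.etree.ElementTree as ET


def build_index(xml_text):
    """Build a deterministic automaton over (axis, label) symbols.

    Returns (delta, answers): delta maps (state, axis, label) to a state
    number, and answers[state] is the set of tree nodes that state stands for.
    Nodes are numbered in preorder; node 0 is a virtual document node.
    """
    labels, children, last = [None], [[]], [0]

    def visit(elem, parent):
        node = len(labels)
        labels.append(elem.tag)
        children.append([])
        last.append(node)
        children[parent].append(node)
        for child in elem:
            visit(child, node)
        # In preorder, the subtree of `node` is the id range node..last[node].
        last[node] = len(labels) - 1

    visit(ET.fromstring(xml_text), 0)
    last[0] = len(labels) - 1

    def step(nodes, axis, label):
        if axis == "/":   # child axis
            candidates = (c for n in nodes for c in children[n])
        else:             # "//": any proper descendant
            candidates = (d for n in nodes for d in range(n + 1, last[n] + 1))
        return frozenset(c for c in candidates if labels[c] == label)

    alphabet = sorted(set(labels[1:]))
    start = frozenset({0})
    ids = {start: 0}      # node set -> state number
    answers = [start]
    delta = {}
    todo = [start]
    while todo:
        nodes = todo.pop()
        for axis in ("/", "//"):
            for label in alphabet:
                target = step(nodes, axis, label)
                if not target:
                    continue  # a missing transition means "no occurrence"
                if target not in ids:
                    ids[target] = len(answers)
                    answers.append(target)
                    todo.append(target)
                delta[(ids[nodes], axis, label)] = ids[target]
    return delta, answers


def query(index, xpath):
    """Follow one transition per query step and return the matching node ids."""
    delta, answers = index
    state = 0
    for axis, label in re.findall(r"(//|/)([^/]+)", xpath):
        state = delta.get((state, axis, label))
        if state is None:
            return []
    return sorted(answers[state])


# Preorder ids for this document: a=1, b=2, c=3, c=4, b=5.
index = build_index("<a><b><c/></b><c><b/></c></a>")
print(query(index, "/a//c"))   # [3, 4]
print(query(index, "//c/b"))   # [5]
```

Once the automaton is built, a query with m steps is answered with m dictionary lookups, independent of the size of the document, which mirrors the searching behaviour described in the abstract. The sketch makes no attempt to reproduce the index sizes reported there.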

Keywords
  • XML
  • XPath
  • index
  • tree
  • finite automaton
