Indexing XML Documents Using Tree Paths Automaton

Šestáková, Eliška; Janoušek, Jan

doi:10.4230/OASIcs.SLATE.2017.10

Document Identifiers

DOI: 10.4230/OASIcs.SLATE.2017.10
URN: urn:nbn:de:0030-drops-79457

Author Details

Eliška Šestáková

Jan Janoušek

Cite As

Eliška Šestáková and Jan Janoušek. Indexing XML Documents Using Tree Paths Automaton. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.10

Abstract

An XML document can be viewed as a tree in a natural way. Processing tree data structures usually requires a pushdown automaton as a model of computation, so it is interesting that a finite automaton can be used to solve the XML index problem. In this paper, we aim to support a significant fragment of XPath queries, which may use any combination of the child (i.e., /) and descendant-or-self (i.e., //) axes. We present a systematic approach to the construction of such an XML index, which is a finite automaton called the Tree Paths Automaton. Given an XML tree model T, the tree is first preprocessed into its linear fragments, called string paths. Since only path queries are considered, the branching structure of the XML tree model can be omitted. A smaller Tree Paths Automaton is built for each individual string path, and these automata are then combined to form the index. The searching phase uses the index, reads an input query Q of size m, and computes the list of positions of all occurrences of Q in the tree T. Searching takes O(m) time and does not depend on the size of the XML document. Although the number of possible queries is clearly exponential in the number of nodes of the XML tree model, our experimental results suggest that the index is usually only about 2.5 times larger than the original document.
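To make the notions of string paths and path queries concrete, the following Python sketch decomposes a small XML tree into its root-to-leaf string paths and evaluates a query built from / and // steps directly on those paths. It is only an illustration under stated assumptions, not the Tree Paths Automaton construction of the paper: the function names (string_paths, parse_query, find_occurrences), the preorder node numbering, and the restriction to plain element names are all hypothetical choices made here, and the running time of this naive evaluation depends on the document size, which is exactly what the index avoids.

```python
import re
import xml.etree.ElementTree as ET


def string_paths(root):
    """Return all root-to-leaf paths as lists of (preorder_id, label) pairs."""
    paths = []
    next_id = 0

    def walk(node, prefix):
        nonlocal next_id
        next_id += 1  # assumed numbering: nodes are identified by preorder position
        current = prefix + [(next_id, node.tag)]
        children = list(node)
        if not children:
            paths.append(current)  # a leaf closes one string path
        for child in children:
            walk(child, current)

    walk(root, [])
    return paths


def parse_query(query):
    """Split a query such as '/a//c' into [('/', 'a'), ('//', 'c')]."""
    steps = re.findall(r"(//|/)([^/]+)", query)
    if not steps or "".join(axis + label for axis, label in steps) != query:
        raise ValueError(f"unsupported query: {query!r}")
    return steps


def find_occurrences(root, query):
    """Naively evaluate the query on every string path; return matching node ids."""
    steps = parse_query(query)
    hits = set()
    for path in string_paths(root):
        labels = [label for _, label in path]
        positions = {-1}  # -1 stands for the document node above the root element
        for axis, label in steps:
            if not positions:
                break
            if axis == "/":
                # child step: only the next position on the path can match
                candidates = {p + 1 for p in positions}
            else:
                # '//' step: any later position on the path can match
                candidates = set(range(min(positions) + 1, len(labels)))
            positions = {q for q in candidates
                         if q < len(labels) and labels[q] == label}
        hits.update(path[q][0] for q in positions)
    return sorted(hits)


root = ET.fromstring("<a><b><c/></b><c><b/></c></a>")
print(find_occurrences(root, "/a//c"))  # [3, 4]
print(find_occurrences(root, "//c/b"))  # [5]
```

The example tree has two string paths, a-b-c and a-c-b, and because only path queries are handled, each query can be checked on every string path independently and the results merged. The index described in the abstract would instead precompute the answers in a finite automaton so that a query of size m is answered in O(m) time.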

Keywords

XML
XPath
index
tree
finite automaton
