Node Identification Schemes for Efficient XML Retrieval

Authors Felix Weigel, Klaus U. Schulz, Holger Meuss




File

DagSemProc.05061.6.pdf
  • Filesize: 0.83 MB
  • 23 pages



Cite As

Felix Weigel, Klaus U. Schulz, and Holger Meuss. Node Identification Schemes for Efficient XML Retrieval. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005) https://doi.org/10.4230/DagSemProc.05061.6

Abstract

Node identifiers (IDs) encoding part of the tree structure in XML documents can save I/O for table look-ups,
thus speeding up the evaluation of path and tree queries on large persistent document collections. In
particular, binary tree relations such as the extended XPath axes can be either decided for a given pair of
node IDs, or reconstructed for a single node ID, without access to secondary storage. Several ID schemes have
been proposed so far, which differ with respect to (1) expressiveness, i.e. which relations can be
decided or reconstructed from IDs, (2) the runtime performance and asymptotic behaviour of decision and
reconstruction operations, (3) the storage overhead for the IDs, and (4) robustness, i.e. behaviour in the
presence of updates. First we review five ID schemes, positioning them in the trade-off between these four comparison
criteria. Then we introduce a new ID scheme called BIRD (Balanced Index-based ID scheme for Reconstruction and
Decision) and illustrate it through several examples of decision and reconstruction operations
on IDs. We argue that BIRD's strategy in the above trade-off, which emphasizes runtime performance and expressive
power, is best for many applications, especially where storage minimization is not the primary goal and
updates occur in bulk rather than in real time. Our experimental results on document collections of
up to one gigabyte show BIRD to be the most efficient scheme in terms of expressiveness and runtime performance. Most
notably, BIRD is the only scheme to support both decision and reconstruction of many relations in constant
time. BIRD is also highly competitive in terms of storage and robustness.
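
The distinction between deciding and reconstructing a relation from IDs can be made concrete with a small sketch. The code below does not implement BIRD; it uses two well-known textbook schemes instead, as an assumption for illustration only: pre/post-order interval IDs for decision (is one node an ancestor of another?) and Dewey-style path IDs for reconstruction (what is the parent of this node?). The toy document and all function names are hypothetical.

```python
# Illustrative only: pre/post-order and Dewey IDs, NOT the BIRD scheme.

# Hypothetical toy document: each node is (label, list_of_children).
DOC = ("book", [
    ("title", []),
    ("chapter", [("section", []), ("section", [])]),
])


def number_nodes(tree):
    """Map each node's Dewey ID (a tuple) to its (pre, post) interval ID."""
    intervals = {}
    counter = {"pre": 0, "post": 0}

    def visit(node, dewey):
        _label, children = node
        pre = counter["pre"]          # rank in document (pre-) order
        counter["pre"] += 1
        for i, child in enumerate(children, start=1):
            visit(child, dewey + (i,))
        intervals[dewey] = (pre, counter["post"])  # rank in post-order
        counter["post"] += 1

    visit(tree, (1,))
    return intervals


def is_ancestor(a, d):
    """Decision on a pair of (pre, post) IDs, without touching the tree:
    a is a proper ancestor of d iff a starts before d and ends after d."""
    return a[0] < d[0] and a[1] > d[1]


def parent(dewey):
    """Reconstruction from a single Dewey ID: drop the last component.
    Returns None for the root."""
    return dewey[:-1] if len(dewey) > 1 else None


def ancestors(dewey):
    """Reconstruct all ancestor IDs, nearest first, from one Dewey ID."""
    return [dewey[:k] for k in range(len(dewey) - 1, 0, -1)]


ids = number_nodes(DOC)
chapter, second_section, title = ids[(1, 2)], ids[(1, 2, 2)], ids[(1, 1)]
print(is_ancestor(chapter, second_section))  # chapter contains the section
print(is_ancestor(title, second_section))    # title does not
print(parent((1, 2, 2)))                     # ID of the enclosing chapter
```

Both operations work on the IDs alone, which is the source of the I/O savings the abstract describes. Note that in this sketch the pre/post IDs support only decision while the Dewey IDs support reconstruction at the cost of variable-length IDs; this is one instance of the expressiveness versus storage trade-off on which the surveyed schemes differ.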

Subject Classification

Keywords
  • node identification scheme
  • labelling scheme
  • numbering scheme
  • naming scheme
  • tree encoding
  • BIRD
