License
When quoting this document, please refer to the following
URN: urn:nbn:de:0030-drops-2292
URL: http://drops.dagstuhl.de/opus/volltexte/2005/229/
Go to the corresponding Portal


Weigel, Felix ; Schulz, Klaus U. ; Meuss, Holger

Node Identification Schemes for Efficient XML Retrieval

pdf-format:
Document 1.pdf (853 KB)


Abstract

Node identifiers (IDs) encoding part of the tree structure in XML documents can save I/O for table look-ups, thus speeding up the evaluation of path and tree queries on large persistent document collections. In particular, binary tree relations such as the extended XPath axes can be either decided for a given pair of node IDs, or reconstructed for a single node ID, without access to secondary storage. Several ID schemes have been proposed so far, which differ with respect to (1) expressiveness, i.e. which relations can be decided or reconstructed from IDs, (2) the runtime performance and asymptotic behaviour of decision and reconstruction operations, (3) the storage overhead for the IDs, and (4) robustness, i.e. behaviour in the presence of updates. First we review five ID schemes, positioning them in the trade-off between these four comparison criteria. Then a new ID scheme called BIRD, for Balanced Index-based ID scheme for Reconstruction and Decision, is introduced and illustrated throughout several examples of decision and reconstruction operations on IDs. We argue that emphasizing runtime performance and expressive power, BIRDs strategy in the above trade-off is best for many applications, especially where storage minimization is not the primary goal and updates occur in a bulk-fashion rather than in realtime. Our experimental results on document collections of up to one gigabyte prove BIRD to be most efficient in terms of expressiveness and runtime performance. Most notably, BIRD is the only scheme to support both decision and reconstruction of many relations in constant time. But also in terms of storage and robustness BIRD is highly competitive.

BibTeX - Entry

@InProceedings{weigel_et_al:DSP:2005:229,
  author =	{Felix Weigel and Klaus U. Schulz and Holger Meuss},
  title =	{Node Identification Schemes for Efficient XML Retrieval},
  booktitle =	{Foundations of Semistructured Data},
  year =	{2005},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  number =	{05061},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Internationales Begegnungs- und Forschungszentrum f{\"u}r Informatik (IBFI), Schloss Dagstuhl, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2005/229},
  annote =	{Keywords: node identification scheme, labelling scheme, numbering scheme, naming scheme, tree encoding, BIRD}
}

Keywords: node identification scheme, labelling scheme, numbering scheme, naming scheme, tree encoding, BIRD
Seminar: 05061 - Foundations of Semistructured Data
Issue Date: 2005
Date of publication: 10.08.2005


DROPS-Home | Fulltext Search | Imprint Published by LZI