DROPS

Document

An In-Memory XQuery/XPath Engine over a Compressed Structured Text Representation

Authors: Angela Bonifati, Gregory Leighton, Veli Mäkinen, Sebastian Maneth, Gonzalo Navarro, and Andrea Pugliese

Published in: Dagstuhl Seminar Proceedings, Volume 8261, Structure-Based Compression of Complex Massive Data (2008)

Abstract

We describe the architecture and main algorithmic design decisions for an XQuery/XPath processing engine over XML collections which will be represented using a self-indexing approach, that is, a compressed representation that will allow for basic searching and navigational operations in compressed form. The goal is a structure that occupies little space and thus permits manipulating large collections in main memory.

Cite as

Angela Bonifati, Gregory Leighton, Veli Mäkinen, Sebastian Maneth, Gonzalo Navarro, and Andrea Pugliese. An In-Memory XQuery/XPath Engine over a Compressed Structured Text Representation. In Structure-Based Compression of Complex Massive Data. Dagstuhl Seminar Proceedings, Volume 8261, pp. 1-17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{bonifati_et_al:DagSemProc.08261.6,
  author =	{Bonifati, Angela and Leighton, Gregory and M\"{a}kinen, Veli and Maneth, Sebastian and Navarro, Gonzalo and Pugliese, Andrea},
  title =	{{An In-Memory XQuery/XPath Engine over a Compressed Structured Text Representation}},
  booktitle =	{Structure-Based Compression of Complex Massive Data},
  pages =	{1--17},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8261},
  editor =	{Stefan B\"{o}ttcher and Markus Lohrey and Sebastian Maneth and Wojciech Rytter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08261.6},
  URN =		{urn:nbn:de:0030-drops-16776},
  doi =		{10.4230/DagSemProc.08261.6},
  annote =	{Keywords: Compressed self-index, compressed XML representation, XPath, XQuery}
}

Document

DOI: 10.4230/DagSemProc.08261.9

Optimizing XML Compression in XQueC

Authors: Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese

Published in: Dagstuhl Seminar Proceedings, Volume 8261, Structure-Based Compression of Complex Massive Data (2008)

Abstract

We present our approach to the problem of optimizing compression choices in the context of the XQueC compressed XML database system. In XQueC, data items are aggregated into containers, which are further grouped to be compressed together. This way, XQueC is able to exploit data commonalities and to perform query evaluation in the compressed domain, with the aim of improving both compression and querying performance. However, different compression algorithms have different performance and support different sets of operations in the compressed domain. Therefore, choosing how to group containers and which compression algorithm to apply to each group is a challenging issue. We address this problem through an appropriate cost model and a suitable blend of heuristics which, based on a given query workload, are capable of driving appropriate compression choices.

Cite as

Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese. Optimizing XML Compression in XQueC. In Structure-Based Compression of Complex Massive Data. Dagstuhl Seminar Proceedings, Volume 8261, pp. 1-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{arion_et_al:DagSemProc.08261.9,
  author =	{Arion, Andrei and Bonifati, Angela and Manolescu, Ioana and Pugliese, Andrea},
  title =	{{Optimizing XML Compression in XQueC}},
  booktitle =	{Structure-Based Compression of Complex Massive Data},
  pages =	{1--12},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8261},
  editor =	{Stefan B\"{o}ttcher and Markus Lohrey and Sebastian Maneth and Wojciech Rytter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08261.9},
  URN =		{urn:nbn:de:0030-drops-16924},
  doi =		{10.4230/DagSemProc.08261.9},
  annote =	{Keywords: XML compression}
}

Document

DOI: 10.4230/DagSemProc.08261.12

The XQueC Project: Compressing and Querying XML

Authors: Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese

Published in: Dagstuhl Seminar Proceedings, Volume 8261, Structure-Based Compression of Complex Massive Data (2008)

Abstract

We outline in this paper the main contributions of the XQueC project. XQueC, namely XQuery processor and Compressor, is the first compression tool to seamlessly allow XQuery queries in the compressed domain. It includes a set of data structures that shred the XML document into suitable chunks linked to each other, thus departing from the ‘homomorphic’ principle adopted by previous XML compressors, according to which the compressed document is homomorphic to the original document. Moreover, to avoid the time spent compressing and decompressing intermediate query results, XQueC applies ‘lazy’ decompression by issuing the queries directly in the compressed domain.

Cite as

Andrei Arion, Angela Bonifati, Ioana Manolescu, and Andrea Pugliese. The XQueC Project: Compressing and Querying XML. In Structure-Based Compression of Complex Massive Data. Dagstuhl Seminar Proceedings, Volume 8261, pp. 1-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{arion_et_al:DagSemProc.08261.12,
  author =	{Arion, Andrei and Bonifati, Angela and Manolescu, Ioana and Pugliese, Andrea},
  title =	{{The XQueC Project: Compressing and Querying XML}},
  booktitle =	{Structure-Based Compression of Complex Massive Data},
  pages =	{1--16},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8261},
  editor =	{Stefan B\"{o}ttcher and Markus Lohrey and Sebastian Maneth and Wojciech Rytter},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08261.12},
  URN =		{urn:nbn:de:0030-drops-16919},
  doi =		{10.4230/DagSemProc.08261.12},
  annote =	{Keywords: XML compression, Data structures, XQuery querying}
}

Document

DOI: 10.4230/DagSemProc.05061.4

Exploiting Structural Similarity For Effective Web Information Extraction

Authors: Elio Masciari, Sergio Flesca, Giuseppe Manco, Luigi Pontieri, and Andrea Pugliese

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)

Abstract

In this paper we propose an architecture that exploits the structural information of web pages to extract relevant information from them. In this architecture, a primary role is played by a distance-based classification methodology. This methodology is based on an efficient and effective technique for detecting structural similarities among semistructured documents, which differs significantly from standard methods based on graph-matching algorithms. The technique represents the structure of a document as a time series in which each occurrence of a tag corresponds to a given impulse. By analyzing the frequencies of the corresponding Fourier transform, we can then determine the degree of similarity between documents. Experiments on real data show the effectiveness of the proposed technique.

Cite as

Elio Masciari, Sergio Flesca, Giuseppe Manco, Luigi Pontieri, and Andrea Pugliese. Exploiting Structural Similarity For Effective Web Information Extraction. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)

@InProceedings{masciari_et_al:DagSemProc.05061.4,
  author =	{Masciari, Elio and Flesca, Sergio and Manco, Giuseppe and Pontieri, Luigi and Pugliese, Andrea},
  title =	{{Exploiting Structural Similarity For Effective Web Information Extraction}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.4},
  URN =		{urn:nbn:de:0030-drops-2301},
  doi =		{10.4230/DagSemProc.05061.4},
  annote =	{Keywords: DFT, Web Document Structural Similarity}
}

4 Search Results for "Pugliese, Andrea"
