DROPS

Document

Research

Wavelet Tree, Part I: A Brief History

Authors: Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

The Wavelet Tree data structure introduced in Grossi, Gupta, and Vitter [Grossi et al., 2003] is a space-efficient technique for rank and select queries that generalizes from binary symbols to an arbitrary multisymbol alphabet. Over the last two decades, it has become a pivotal tool in modern full-text indexing and data compression because of its properties and capabilities in compressing and indexing data, with many applications to information retrieval, genome analysis, data mining, and web search. In this paper, we survey the fascinating history and impact of Wavelet Trees; no doubt many more developments are yet to come. Our survey borrows some content from the authors' earlier works. This paper is divided into two parts: one (this one) giving a brief history of Wavelet Trees, including its varieties and practical implementations, dedicated to this Festschrift’s honoree Roberto Grossi; the second part deals with Wavelet Tree-based text indexing and is included in the Festschrift dedicated to Giovanni Manzini [Ferragina et al., 2025].

Cite as

Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter. Wavelet Tree, Part I: A Brief History. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:OASIcs.Grossi.15,
  author =	{Ferragina, Paolo and Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Venturini, Rossano and Vitter, Jeffrey Scott},
  title =	{{Wavelet Tree, Part I: A Brief History}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{15:1--15:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.15},
  URN =		{urn:nbn:de:0030-drops-238143},
  doi =		{10.4230/OASIcs.Grossi.15},
  annote =	{Keywords: Wavelet tree, data compression, text indexing, compressed suffix array, Burrows-Wheeler transform, rank and select}
}

@InProceedings{ferragina_et_al:OASIcs.Grossi.15,
  author =	{Ferragina, Paolo and Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Venturini, Rossano and Vitter, Jeffrey Scott},
  title =	{{Wavelet Tree, Part I: A Brief History}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{15:1--15:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.15},
  URN =		{urn:nbn:de:0030-drops-238143},
  doi =		{10.4230/OASIcs.Grossi.15},
  annote =	{Keywords: Wavelet tree, data compression, text indexing, compressed suffix array, Burrows-Wheeler transform, rank and select}
}

Document

DOI: 10.4230/OASIcs.Manzini.4

Wavelet Tree, Part II: Text Indexing

Authors: Paolo Ferragina, Raffaele Giancarlo, Roberto Grossi, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Wavelet Tree data structure introduced in Grossi, Gupta, and Vitter [Grossi et al., 2003] is a space-efficient technique for rank and select queries that generalizes from binary symbols to an arbitrary multisymbol alphabet. Over the last two decades, it has become a pivotal tool in modern full-text indexing and data compression because of its properties and capabilities in compressing and indexing data, with many applications to information retrieval, genome analysis, data mining, and web search. In this paper, we survey the fascinating history and impact of Wavelet Trees; no doubt many more developments are yet to come. Our survey borrows some content from the authors' earlier works. This paper is divided into two parts: The first part gives a brief history of Wavelet Trees, including its varieties and practical implementations, which appears in the Festschrift dedicated to Roberto Grossi; the second part (this one) deals with Wavelet Tree-based text indexing and is included in the Festschrift dedicated to Giovanni Manzini.

Cite as

Paolo Ferragina, Raffaele Giancarlo, Roberto Grossi, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter. Wavelet Tree, Part II: Text Indexing. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 4:1-4:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:OASIcs.Manzini.4,
  author =	{Ferragina, Paolo and Giancarlo, Raffaele and Grossi, Roberto and Rosone, Giovanna and Venturini, Rossano and Vitter, Jeffrey Scott},
  title =	{{Wavelet Tree, Part II: Text Indexing}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{4:1--4:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.4},
  URN =		{urn:nbn:de:0030-drops-239127},
  doi =		{10.4230/OASIcs.Manzini.4},
  annote =	{Keywords: Wavelet tree, data compression, text indexing, compressed suffix array, Burrows-Wheeler transform, rank and select}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.16

Compressed Weighted de Bruijn Graphs

Authors: Giuseppe F. Italiano, Nicola Prezza, Blerina Sinaimeri, and Rossano Venturini

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.

Cite as

Giuseppe F. Italiano, Nicola Prezza, Blerina Sinaimeri, and Rossano Venturini. Compressed Weighted de Bruijn Graphs. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 16:1-16:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{italiano_et_al:LIPIcs.CPM.2021.16,
  author =	{Italiano, Giuseppe F. and Prezza, Nicola and Sinaimeri, Blerina and Venturini, Rossano},
  title =	{{Compressed Weighted de Bruijn Graphs}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.16},
  URN =		{urn:nbn:de:0030-drops-139675},
  doi =		{10.4230/LIPIcs.CPM.2021.16},
  annote =	{Keywords: weighted de Bruijn graphs, k-mer annotation, compressed data structures, partial sums}
}

Document

DOI: 10.4230/LIPIcs.ESA.2017.38

An Encoding for Order-Preserving Matching

Authors: Travis Gagie, Giovanni Manzini, and Rossano Venturini

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)

Abstract

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., (4, 1, 3, 2) and (10, 3, 7, 5) are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size sigma and a constant c >=1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m >= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log(sigma) = Omega(log log n); our query time is optimal if log(sigma) = Omega(log n). Our space bound contrasts with the Omega(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.

Cite as

Travis Gagie, Giovanni Manzini, and Rossano Venturini. An Encoding for Order-Preserving Matching. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 38:1-38:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{gagie_et_al:LIPIcs.ESA.2017.38,
  author =	{Gagie, Travis and Manzini, Giovanni and Venturini, Rossano},
  title =	{{An Encoding for Order-Preserving Matching}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{38:1--38:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.38},
  URN =		{urn:nbn:de:0030-drops-78726},
  doi =		{10.4230/LIPIcs.ESA.2017.38},
  annote =	{Keywords: Compact data structures, encodings, order-preserving matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.30

Dynamic Elias-Fano Representation

Authors: Giulio Ermanno Pibiri and Rossano Venturini

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

We show that it is possible to store a dynamic ordered set S of n integers drawn from a bounded universe of size u in space close to the information-theoretic lower bound and preserve, at the same time, the asymptotic time optimality of the operations. Our results leverage on the Elias-Fano representation of monotone integer sequences, which can be shown to be less than half a bit per element away from the information-theoretic minimum. In particular, considering a RAM model with memory word size Theta(log u) bits, when integers are drawn from a polynomial universe of size u = n^gamma for any gamma = Theta(1), we add o(n) bits to the static Elias-Fano representation in order to: 1. support static predecessor/successor queries in O(min{1+log(u/n), loglog n}); 2. make S grow in an append-only fashion by spending O(1) per inserted element; 3. describe a dynamic data structure supporting random access in O(log n / loglog n) worst-case, insertions/deletions in O(log n / loglog n) amortized and predecessor/successor queries in O(min{1+log(u/n), loglog n}) worst-case time. These time bounds are optimal.

Cite as

Giulio Ermanno Pibiri and Rossano Venturini. Dynamic Elias-Fano Representation. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 30:1-30:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{pibiri_et_al:LIPIcs.CPM.2017.30,
  author =	{Pibiri, Giulio Ermanno and Venturini, Rossano},
  title =	{{Dynamic Elias-Fano Representation}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{30:1--30:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.30},
  URN =		{urn:nbn:de:0030-drops-73244},
  doi =		{10.4230/LIPIcs.CPM.2017.30},
  annote =	{Keywords: succinct data structures, integer sets, predecessor problem, Elias-Fano}
}

Search Results

Documents authored by Venturini, Rossano

Wavelet Tree, Part I: A Brief History

Abstract

Cite as

Wavelet Tree, Part II: Text Indexing

Abstract

Cite as

Compressed Weighted de Bruijn Graphs

Abstract

Cite as

An Encoding for Order-Preserving Matching

Abstract

Cite as

Dynamic Elias-Fano Representation

Abstract

Cite as

Thanks for your feedback!

Could not send message