Search Results

Documents authored by Rosone, Giovanna


Document
Complete Volume
OASIcs, Volume 132, Grossi's Festschrift, Complete Volume

Authors: Alessio Conte, Andrea Marino, Giovanna Rosone, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
OASIcs, Volume 132, Grossi's Festschrift, Complete Volume

Cite as

From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 1-312, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@Proceedings{conte_et_al:OASIcs.Grossi,
  title =	{{OASIcs, Volume 132, Grossi's Festschrift, Complete Volume}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{1--312},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi},
  URN =		{urn:nbn:de:0030-drops-243429},
  doi =		{10.4230/OASIcs.Grossi},
  annote =	{Keywords: OASIcs, Volume 132, Grossi's Festschrift, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Alessio Conte, Andrea Marino, Giovanna Rosone, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 0:i-0:xxxii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{conte_et_al:OASIcs.Grossi.0,
  author =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{0:i--0:xxxii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.0},
  URN =		{urn:nbn:de:0030-drops-239054},
  doi =		{10.4230/OASIcs.Grossi.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Research
Wavelet Tree, Part I: A Brief History

Authors: Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
The Wavelet Tree data structure introduced in Grossi, Gupta, and Vitter [Grossi et al., 2003] is a space-efficient technique for rank and select queries that generalizes from binary symbols to an arbitrary multisymbol alphabet. Over the last two decades, it has become a pivotal tool in modern full-text indexing and data compression because of its properties and capabilities in compressing and indexing data, with many applications to information retrieval, genome analysis, data mining, and web search. In this paper, we survey the fascinating history and impact of Wavelet Trees; no doubt many more developments are yet to come. Our survey borrows some content from the authors' earlier works. This paper is divided into two parts: one (this one) giving a brief history of Wavelet Trees, including its varieties and practical implementations, dedicated to this Festschrift’s honoree Roberto Grossi; the second part deals with Wavelet Tree-based text indexing and is included in the Festschrift dedicated to Giovanni Manzini [Ferragina et al., 2025].

Cite as

Paolo Ferragina, Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter. Wavelet Tree, Part I: A Brief History. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:OASIcs.Grossi.15,
  author =	{Ferragina, Paolo and Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Venturini, Rossano and Vitter, Jeffrey Scott},
  title =	{{Wavelet Tree, Part I: A Brief History}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{15:1--15:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.15},
  URN =		{urn:nbn:de:0030-drops-238143},
  doi =		{10.4230/OASIcs.Grossi.15},
  annote =	{Keywords: Wavelet tree, data compression, text indexing, compressed suffix array, Burrows-Wheeler transform, rank and select}
}
Document
Research
Subsequence-Based Indices for Genome Sequence Analysis

Authors: Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than just efficiently represent data, but rather are able to bring out salient information within it, which can be exploited for their downstream analysis. Specifically, we first review a recently-introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by the one index are consistent with the expectations about the results of the other index.

Cite as

Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini. Subsequence-Based Indices for Genome Sequence Analysis. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}
Document
BWT and Combinatorics on Words

Authors: Gabriele Fici, Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, and Marinella Sciortino

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Burrows-Wheeler Transform (BWT) is a reversible transformation on words (strings) introduced in 1994 in the context of data compression, which is a permutation of the characters in the word. Its clustering effect, i.e., the remarkable property of grouping identical characters (BWT runs) when they share common contexts, has made it a powerful tool for boosting compression performances and enabling efficient pattern searching in highly repetitive string collections. In this chapter, we analyze the Burrows-Wheeler transform under the combinatorial point of view, and we survey known properties and connections with different aspects of combinatorics on words. In particular, we focus on the properties of words in relation to the number of their BWT runs. The value r, which counts the number of BWT runs, impacts both compression performance and indexing efficiency, and is considered a measure to evaluate the above-mentioned clustering effect and, consequently, the repetitiveness of a word. We give an overview of the results relating r to other combinatorial repetitiveness measures related to the factor complexity. The chapter also explores extremal cases of the clustering effect. Finally, some results on the sensitivity of the measure r are considered, where the effects of combinatorial operations are studied, such as reversal, edits, and the application of morphisms.

Cite as

Gabriele Fici, Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, and Marinella Sciortino. BWT and Combinatorics on Words. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 1:1-1:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{fici_et_al:OASIcs.Manzini.1,
  author =	{Fici, Gabriele and Mantaci, Sabrina and Restivo, Antonio and Romana, Giuseppe and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{BWT and Combinatorics on Words}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{1:1--1:23},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.1},
  URN =		{urn:nbn:de:0030-drops-239090},
  doi =		{10.4230/OASIcs.Manzini.1},
  annote =	{Keywords: Burrows-Wheeler Transform, Combinatorics on Words, Clustering Effect, BWT Runs}
}
Document
BWT for String Collections

Authors: Davide Cenzato, Zsuzsanna Lipták, Nadia Pisanti, Giovanna Rosone, and Marinella Sciortino

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
We survey the different methods used for extending the BWT to collections of strings, following largely [Cenzato and Lipták, CPM 2022, Bioinformatics 2024]. We analyze the specific aspects and combinatorial properties of the resulting BWT variants and give a categorization of publicly available tools for computing the BWT of string collections. We show how the specific method used impacts on the resulting transform, including the number of runs, and on the dynamicity of the transform with respect to adding or removing strings from the collection. We then focus on the number of runs of these BWT variants and present the optimal BWT introduced in [Cenzato et al., DCC 2023], which implements an algorithm originally proposed by [Bentley et al., ESA 2020] to minimize the number of BWT-runs. We also discuss several recent heuristics and study their impact on the compression of biological sequences. We conclude with an overview of the applications and the impact of the BWT of string collections in bioinformatics.

Cite as

Davide Cenzato, Zsuzsanna Lipták, Nadia Pisanti, Giovanna Rosone, and Marinella Sciortino. BWT for String Collections. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 3:1-3:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{cenzato_et_al:OASIcs.Manzini.3,
  author =	{Cenzato, Davide and Lipt\'{a}k, Zsuzsanna and Pisanti, Nadia and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{BWT for String Collections}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{3:1--3:29},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.3},
  URN =		{urn:nbn:de:0030-drops-239113},
  doi =		{10.4230/OASIcs.Manzini.3},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, compressed text indexes, text compression, string collections, bioinformatics}
}
Document
Wavelet Tree, Part II: Text Indexing

Authors: Paolo Ferragina, Raffaele Giancarlo, Roberto Grossi, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Wavelet Tree data structure introduced in Grossi, Gupta, and Vitter [Grossi et al., 2003] is a space-efficient technique for rank and select queries that generalizes from binary symbols to an arbitrary multisymbol alphabet. Over the last two decades, it has become a pivotal tool in modern full-text indexing and data compression because of its properties and capabilities in compressing and indexing data, with many applications to information retrieval, genome analysis, data mining, and web search. In this paper, we survey the fascinating history and impact of Wavelet Trees; no doubt many more developments are yet to come. Our survey borrows some content from the authors' earlier works. This paper is divided into two parts: The first part gives a brief history of Wavelet Trees, including its varieties and practical implementations, which appears in the Festschrift dedicated to Roberto Grossi; the second part (this one) deals with Wavelet Tree-based text indexing and is included in the Festschrift dedicated to Giovanni Manzini.

Cite as

Paolo Ferragina, Raffaele Giancarlo, Roberto Grossi, Giovanna Rosone, Rossano Venturini, and Jeffrey Scott Vitter. Wavelet Tree, Part II: Text Indexing. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 4:1-4:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{ferragina_et_al:OASIcs.Manzini.4,
  author =	{Ferragina, Paolo and Giancarlo, Raffaele and Grossi, Roberto and Rosone, Giovanna and Venturini, Rossano and Vitter, Jeffrey Scott},
  title =	{{Wavelet Tree, Part II: Text Indexing}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{4:1--4:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.4},
  URN =		{urn:nbn:de:0030-drops-239127},
  doi =		{10.4230/OASIcs.Manzini.4},
  annote =	{Keywords: Wavelet tree, data compression, text indexing, compressed suffix array, Burrows-Wheeler transform, rank and select}
}
Document
Algorithms for Computing Very Large BWTs: a Short Survey

Authors: Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Burrows-Wheeler Transform (BWT) is a fundamental string transformation that, although initially introduced for data compression, has been extensively utilized across various domains, including text indexing and pattern matching within large datasets. Although the BWT construction is linear, the constants make the task impractical for large datasets, and as highlighted by Ferragina et al. [Paolo Ferragina et al., 2012], "to use it, one must first build it!". Thus, the construction of the BWT remains a significant challenge. For these reasons, during the past three decades there has been a succession of new algorithms for its construction using techniques that work in external memory or that use text compression. In this survey, we revise some of the most important advancements and tools presented in the past years for computing large BWTs exploiting external memory or text compression approaches without using additional information about the data.

Cite as

Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone. Algorithms for Computing Very Large BWTs: a Short Survey. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 7:1-7:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}
Document
A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem

Authors: Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The Burrows-Wheeler transform (BWT) is a famous text transformation that rearranges the symbols of the input strings so that occurrences of a same symbol tend to occur in runs. The number of runs is an important parameter in the BWT output string, historically associated with its high compressibility and more recently used as a measure for the space complexity of efficient data structures. It is a known fact that reordering the strings in the input collection 𝒮 affects the number of runs in the output string bwt(𝒮) produced by applying the BWT to the string collection. In this paper, we define a class of transformed strings where symbols in particular blocks of the bwt(𝒮) can be reordered according to a different adaptive alphabet order. Then, we introduce new heuristics to reduce the number of runs in the BWT output of a string collection that improve on the two existing heuristics introduced in Cox et al. [Anthony J. Cox et al., 2012]. These new heuristics are computed when applying the BWT to a string collection assuming no a priori order on the input strings and without requiring any pre- and/or post- processing of the collection 𝒮 or of the BWT string. In this paper, we also face the problem of reconstructing the input collection 𝒮 from the string bwt(𝒮) together with the string permutation realized when applying an alphabetical reordering of symbols during the construction of bwt(𝒮).

Cite as

Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone. A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 7:1-7:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bertola_et_al:LIPIcs.CPM.2024.7,
  author =	{Bertola, Gianmarco and Cox, Anthony J. and Guerrini, Veronica and Rosone, Giovanna},
  title =	{{A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{7:1--7:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.7},
  URN =		{urn:nbn:de:0030-drops-201179},
  doi =		{10.4230/LIPIcs.CPM.2024.7},
  annote =	{Keywords: Burrows-Wheeler Transform, SAP-interval, repetitive text, string compression}
}
Document
phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering

Authors: Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) for a collection of DNA sequences to directly reconstruct phylogeny, bypassing the alignment against a reference genome or de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding to use a distance matrix to infer phylogeny. The preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that found in the benchmark phylogeny.

Cite as

Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{guerrini_et_al:LIPIcs.WABI.2022.23,
  author =	{Guerrini, Veronica and Conte, Alessio and Grossi, Roberto and Liti, Gianni and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.23},
  URN =		{urn:nbn:de:0030-drops-170577},
  doi =		{10.4230/LIPIcs.WABI.2022.23},
  annote =	{Keywords: Phylogeny, partition tree, BWT, positional cluster, alignment-free, reference-free, assembly-free}
}
Document
Track A: Algorithms, Complexity and Games
Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

Authors: Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an O(nm^{1.5}sqrt{log m} + N)-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on this question is that N is substantially larger than both n and m, and thus we would like to have a linear dependency on the former. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, similarly as in the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016]. Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in O(nm^{1.5-epsilon} + N) time, for any epsilon>0, refutes this conjecture. Of course, the notion of combinatorial algorithms is not clearly defined, so our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication. Two standard tools used in algorithms on strings are string periodicity and fast Fourier transform. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial O(nm^{1.381} + N)-time algorithm for EDSM. To the best of our knowledge, we are the first to do so.

Cite as

Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{bernardini_et_al:LIPIcs.ICALP.2019.21,
  author =	{Bernardini, Giulia and Gawrychowski, Pawe{\l} and Pisanti, Nadia and Pissis, Solon P. and Rosone, Giovanna},
  title =	{{Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{21:1--21:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.21},
  URN =		{urn:nbn:de:0030-drops-105973},
  doi =		{10.4230/LIPIcs.ICALP.2019.21},
  annote =	{Keywords: string algorithms, pattern matching, elastic-degenerate string, matrix multiplication, fast Fourier transform}
}
Document
Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform

Authors: Nicola Prezza and Giovanna Rosone

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
We show that the Longest Common Prefix Array of a text collection of total size n on alphabet [1,sigma] can be computed from the Burrows-Wheeler transformed collection in O(n log sigma) time using o(n log sigma) bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution from Beller et al., which required O(n) bits of extra working space. We also show how to merge the BWTs of two collections of total size n within the same time and space bounds. The procedure at the core of our algorithms can be used to enumerate suffix tree intervals in succinct space from the BWT, which is of independent interest. An engineered implementation of our first algorithm on DNA alphabet induces the LCP of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using in total 1.5 Bytes per base in RAM. Our second algorithm merges the BWTs of two short-reads collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 Bytes per base in RAM. An extension of this algorithm that computes also the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 Bytes per base in RAM.

Cite as

Nicola Prezza and Giovanna Rosone. Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{prezza_et_al:LIPIcs.CPM.2019.7,
  author =	{Prezza, Nicola and Rosone, Giovanna},
  title =	{{Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.7},
  URN =		{urn:nbn:de:0030-drops-104782},
  doi =		{10.4230/LIPIcs.CPM.2019.7},
  annote =	{Keywords: Burrows-Wheeler Transform, LCP array, DNA reads}
}
Document
A New Class of Searchable and Provably Highly Compressible String Transformations

Authors: Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
The Burrows-Wheeler Transform is a string transformation that plays a fundamental role for the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation outside the domains of strings. However, efforts to find non-trivial alternatives of the original, now 25 years old, Burrows-Wheeler string transformation have met limited success. In this paper we bring new lymph to this area by introducing a whole new family of transformations that have all the "myriad virtues" of the BWT: they can be computed and inverted in linear time, they produce provably highly compressible strings, and they support linear time pattern search directly on the transformed string. This new family is a special case of a more general class of transformations based on context adaptive alphabet orderings, a concept introduced here. This more general class includes also the Alternating BWT, another invertible string transforms recently introduced in connection with a generalization of Lyndon words.

Cite as

Raffaele Giancarlo, Giovanni Manzini, Giovanna Rosone, and Marinella Sciortino. A New Class of Searchable and Provably Highly Compressible String Transformations. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{giancarlo_et_al:LIPIcs.CPM.2019.12,
  author =	{Giancarlo, Raffaele and Manzini, Giovanni and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{A New Class of Searchable and Provably Highly Compressible String Transformations}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{12:1--12:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.12},
  URN =		{urn:nbn:de:0030-drops-104833},
  doi =		{10.4230/LIPIcs.CPM.2019.12},
  annote =	{Keywords: Data Indexing and Compression, Burrows-Wheeler Transformation, Combinatorics on Words}
}
Document
Detecting Mutations by eBWT

Authors: Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in the eBWT of the reads collection, and we develop a tool finding SNPs with a simple scan of the eBWT and LCP arrays. Preliminary results show that our method requires much less coverage than state-of-the-art tools while drastically improving precision and sensitivity.

Cite as

Nicola Prezza, Nadia Pisanti, Marinella Sciortino, and Giovanna Rosone. Detecting Mutations by eBWT. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{prezza_et_al:LIPIcs.WABI.2018.3,
  author =	{Prezza, Nicola and Pisanti, Nadia and Sciortino, Marinella and Rosone, Giovanna},
  title =	{{Detecting Mutations by eBWT}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{3:1--3:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.3},
  URN =		{urn:nbn:de:0030-drops-93051},
  doi =		{10.4230/LIPIcs.WABI.2018.3},
  annote =	{Keywords: BWT, LCP Array, SNPs, Reference-free, Assembly-free}
}
Document
Degenerate String Comparison and Applications

Authors: Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
A generalised degenerate string (GD string) S^ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length k_i but this length can vary between different sets. We denote the sum of these lengths k_0, k_1,...,k_{n-1} by W. This type of uncertain sequence can represent, for example, a gapless multiple sequence alignment of width W in a compact form. Our first result in this paper is an O(N+M)-time algorithm for deciding whether the intersection of two GD strings of total sizes N and M, respectively, over an integer alphabet, is non-empty. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in only linear space. A similar result can be obtained by employing an automata-based approach but its cost is alphabet-dependent. We then apply our string comparison algorithm to compute palindromes in GD strings. We present an O(min{W,n^2}N)-time algorithm for computing all palindromes in S^. Furthermore, we show a similar conditional lower bound for computing maximal palindromes in S^. Finally, proof-of-concept experimental results are presented using real protein datasets.

Cite as

Mai Alzamel, Lorraine A. K. Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Degenerate String Comparison and Applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{alzamel_et_al:LIPIcs.WABI.2018.21,
  author =	{Alzamel, Mai and Ayad, Lorraine A. K. and Bernardini, Giulia and Grossi, Roberto and Iliopoulos, Costas S. and Pisanti, Nadia and Pissis, Solon P. and Rosone, Giovanna},
  title =	{{Degenerate String Comparison and Applications}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{21:1--21:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.21},
  URN =		{urn:nbn:de:0030-drops-93236},
  doi =		{10.4230/LIPIcs.WABI.2018.21},
  annote =	{Keywords: degenerate strings, generalised degenerate strings, elastic-degenerate strings, string comparison, palindromes}
}
Document
On-Line Pattern Matching on Similar Texts

Authors: Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. We make here an attempt to close this gap via proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.

Cite as

Roberto Grossi, Costas S. Iliopoulos, Chang Liu, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, and Luca Versari. On-Line Pattern Matching on Similar Texts. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.CPM.2017.9,
  author =	{Grossi, Roberto and Iliopoulos, Costas S. and Liu, Chang and Pisanti, Nadia and Pissis, Solon P. and Retha, Ahmad and Rosone, Giovanna and Vayani, Fatima and Versari, Luca},
  title =	{{On-Line Pattern Matching on Similar Texts}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.9},
  URN =		{urn:nbn:de:0030-drops-73379},
  doi =		{10.4230/LIPIcs.CPM.2017.9},
  annote =	{Keywords: string algorithms, pattern matching, degenerate strings, elastic-degenerate strings, on-line algorithms}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail