Search Results

Documents authored by Guerrini, Veronica


Document
Research
Subsequence-Based Indices for Genome Sequence Analysis

Authors: Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than just efficiently represent data, but rather are able to bring out salient information within it, which can be exploited for their downstream analysis. Specifically, we first review a recently-introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by the one index are consistent with the expectations about the results of the other index.

Cite as

Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini. Subsequence-Based Indices for Genome Sequence Analysis. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}
Document
Algorithms for Computing Very Large BWTs: a Short Survey

Authors: Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Burrows-Wheeler Transform (BWT) is a fundamental string transformation that, although initially introduced for data compression, has been extensively utilized across various domains, including text indexing and pattern matching within large datasets. Although the BWT construction is linear, the constants make the task impractical for large datasets, and as highlighted by Ferragina et al. [Paolo Ferragina et al., 2012], "to use it, one must first build it!". Thus, the construction of the BWT remains a significant challenge. For these reasons, during the past three decades there has been a succession of new algorithms for its construction using techniques that work in external memory or that use text compression. In this survey, we revise some of the most important advancements and tools presented in the past years for computing large BWTs exploiting external memory or text compression approaches without using additional information about the data.

Cite as

Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone. Algorithms for Computing Very Large BWTs: a Short Survey. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 7:1-7:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}
Document
A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem

Authors: Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The Burrows-Wheeler transform (BWT) is a famous text transformation that rearranges the symbols of the input strings so that occurrences of a same symbol tend to occur in runs. The number of runs is an important parameter in the BWT output string, historically associated with its high compressibility and more recently used as a measure for the space complexity of efficient data structures. It is a known fact that reordering the strings in the input collection 𝒮 affects the number of runs in the output string bwt(𝒮) produced by applying the BWT to the string collection. In this paper, we define a class of transformed strings where symbols in particular blocks of the bwt(𝒮) can be reordered according to a different adaptive alphabet order. Then, we introduce new heuristics to reduce the number of runs in the BWT output of a string collection that improve on the two existing heuristics introduced in Cox et al. [Anthony J. Cox et al., 2012]. These new heuristics are computed when applying the BWT to a string collection assuming no a priori order on the input strings and without requiring any pre- and/or post- processing of the collection 𝒮 or of the BWT string. In this paper, we also face the problem of reconstructing the input collection 𝒮 from the string bwt(𝒮) together with the string permutation realized when applying an alphabetical reordering of symbols during the construction of bwt(𝒮).

Cite as

Gianmarco Bertola, Anthony J. Cox, Veronica Guerrini, and Giovanna Rosone. A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 7:1-7:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bertola_et_al:LIPIcs.CPM.2024.7,
  author =	{Bertola, Gianmarco and Cox, Anthony J. and Guerrini, Veronica and Rosone, Giovanna},
  title =	{{A Class of Heuristics for Reducing the Number of BWT-Runs in the String Ordering Problem}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{7:1--7:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.7},
  URN =		{urn:nbn:de:0030-drops-201179},
  doi =		{10.4230/LIPIcs.CPM.2024.7},
  annote =	{Keywords: Burrows-Wheeler Transform, SAP-interval, repetitive text, string compression}
}
Document
phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering

Authors: Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) for a collection of DNA sequences to directly reconstruct phylogeny, bypassing the alignment against a reference genome or de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike the k-mer-based approaches that need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding to use a distance matrix to infer phylogeny. The preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that found in the benchmark phylogeny.

Cite as

Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{guerrini_et_al:LIPIcs.WABI.2022.23,
  author =	{Guerrini, Veronica and Conte, Alessio and Grossi, Roberto and Liti, Gianni and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.23},
  URN =		{urn:nbn:de:0030-drops-170577},
  doi =		{10.4230/LIPIcs.WABI.2022.23},
  annote =	{Keywords: Phylogeny, partition tree, BWT, positional cluster, alignment-free, reference-free, assembly-free}
}
Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail