4 Search Results for "Alanko, Jarno N."


Document
Computing the LCP Array of a Labeled Graph

Authors: Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queries on the graph’s paths. In this paper, we provide the first efficient algorithm building the LCP array of a directed labeled graph with n nodes and m edges labeled over an alphabet of size σ. The first step is to transform the input graph G into a deterministic Wheeler pseudoforest G_{is} with O(n) edges encoding the lexicographically- smallest and largest strings entering in each node of the original graph. Using state-of-the-art algorithms, this step runs in O(min{mlog n, m+n²}) time on arbitrary labeled graphs, and in O(m) time on Wheeler DFAs. The LCP array of G stores the longest common prefixes between those strings, i.e. it can easily be derived from the LCP array of G_{is}. After arguing that the natural generalization of a compact-space LCP-construction algorithm by Beller et al. [J. Discrete Algorithms 2013] runs in time Ω(nσ) on pseudoforests, we present a new algorithm based on dynamic range stabbing building the LCP array of G_{is} in O(nlog σ) time and O(nlogσ) bits of working space. Combined with our reduction, we obtain the first efficient algorithm to build the LCP array of an arbitrary labeled graph. An implementation of our algorithm is publicly available at https://github.com/regindex/Labeled-Graph-LCP.

Cite as

Jarno N. Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, and Nicola Prezza. Computing the LCP Array of a Labeled Graph. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.CPM.2024.1,
  author =	{Alanko, Jarno N. and Cenzato, Davide and Cotumaccio, Nicola and Kim, Sung-Hwan and Manzini, Giovanni and Prezza, Nicola},
  title =	{{Computing the LCP Array of a Labeled Graph}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{1:1--1:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.1},
  URN =		{urn:nbn:de:0030-drops-201113},
  doi =		{10.4230/LIPIcs.CPM.2024.1},
  annote =	{Keywords: LCP array, Wheeler automata, prefix sorting, pattern matching, sorting}
}
Document
Subset Wavelet Trees

Authors: Jarno N. Alanko, Elena Biagi, Simon J. Puglisi, and Jaakko Vuohtoniemi

Published in: LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)


Abstract
Given an alphabet Σ of σ = |Σ| symbols, a degenerate (or indeterminate) string X is a sequence X = X[0],X[1]…, X[n-1] of n subsets of Σ. Since their introduction in the mid 70s, degenerate strings have been widely studied, with applications driven by their being a natural model for sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank(i,c), which returns the number of subsets up to the i-th subset in the degenerate string that contain the symbol c; and subset-select(i,c), which returns the index in the degenerate string of the i-th subset that contains symbol c. These queries are analogs of rank and select queries that have been widely studied for ordinary strings. Via experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures, and in particular offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT.

Cite as

Jarno N. Alanko, Elena Biagi, Simon J. Puglisi, and Jaakko Vuohtoniemi. Subset Wavelet Trees. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.SEA.2023.4,
  author =	{Alanko, Jarno N. and Biagi, Elena and Puglisi, Simon J. and Vuohtoniemi, Jaakko},
  title =	{{Subset Wavelet Trees}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{4:1--4:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.4},
  URN =		{urn:nbn:de:0030-drops-183549},
  doi =		{10.4230/LIPIcs.SEA.2023.4},
  annote =	{Keywords: degenerate strings, compressed data structures, succinct data structures, string processing, data structures, efficient algorithms}
}
Document
Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time

Authors: Sebastian Schmidt and Jarno N. Alanko

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. For that, we present a formalisation of arc-centric bidirected de Bruijn graphs and carefully prove that it accurately models the k-mer spectrum of the input. Our algorithm first constructs the de Bruijn graph in linear time in the length of the input strings (for a fixed-size alphabet). Then it uses a Eulerian-cycle-based algorithm to compute the minimum representation, in time linear in the size of the output.

Cite as

Sebastian Schmidt and Jarno N. Alanko. Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 2:1-2:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2022.2,
  author =	{Schmidt, Sebastian and Alanko, Jarno N.},
  title =	{{Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{2:1--2:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.2},
  URN =		{urn:nbn:de:0030-drops-170361},
  doi =		{10.4230/LIPIcs.WABI.2022.2},
  annote =	{Keywords: Spectrum preserving string sets, Eulerian cycle, Suffix tree, Bidirected arc-centric de Bruijn graph, k-mer based methods}
}
Document
Finding All Maximal Perfect Haplotype Blocks in Linear Time

Authors: Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)


Abstract
Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.

Cite as

Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye. Finding All Maximal Perfect Haplotype Blocks in Linear Time. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.WABI.2019.8,
  author =	{Alanko, Jarno and Bannai, Hideo and Cazaux, Bastien and Peterlongo, Pierre and Stoye, Jens},
  title =	{{Finding All Maximal Perfect Haplotype Blocks in Linear Time}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.8},
  URN =		{urn:nbn:de:0030-drops-110388},
  doi =		{10.4230/LIPIcs.WABI.2019.8},
  annote =	{Keywords: Population genomics, selection coefficient, haplotype block, positional Burrows-Wheeler Transform}
}
  • Refine by Author
  • 3 Alanko, Jarno N.
  • 1 Alanko, Jarno
  • 1 Bannai, Hideo
  • 1 Biagi, Elena
  • 1 Cazaux, Bastien
  • Show More...

  • Refine by Classification
  • 2 Theory of computation → Data structures design and analysis
  • 2 Theory of computation → Graph algorithms analysis
  • 2 Theory of computation → Pattern matching
  • 1 Applied computing → Bioinformatics
  • 1 Applied computing → Computational biology
  • Show More...

  • Refine by Keyword
  • 1 Bidirected arc-centric de Bruijn graph
  • 1 Eulerian cycle
  • 1 LCP array
  • 1 Population genomics
  • 1 Spectrum preserving string sets
  • Show More...

  • Refine by Type
  • 4 document

  • Refine by Publication Year
  • 1 2019
  • 1 2022
  • 1 2023
  • 1 2024

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail