3 Search Results for "Alanko, Jarno N."


Document
Subset Wavelet Trees

Authors: Jarno N. Alanko, Elena Biagi, Simon J. Puglisi, and Jaakko Vuohtoniemi

Published in: LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)


Abstract
Given an alphabet Σ of σ = |Σ| symbols, a degenerate (or indeterminate) string X is a sequence X = X[0], X[1], …, X[n-1] of n subsets of Σ. Since their introduction in the mid-1970s, degenerate strings have been widely studied and have found many applications, because they naturally model sequences in which there is a degree of uncertainty about the precise symbol at a given position, such as those arising in genomics and proteomics. In this paper we introduce a new data structural tool for degenerate strings, called the subset wavelet tree (SubsetWT). A SubsetWT supports two basic operations on degenerate strings: subset-rank(i,c), which returns the number of subsets up to the i-th subset in the degenerate string that contain the symbol c; and subset-select(i,c), which returns the index in the degenerate string of the i-th subset that contains symbol c. These queries are analogs of the rank and select queries that have been widely studied for ordinary strings. Through experiments in a real genomics application in which degenerate strings are fundamental, we show that subset wavelet trees are practical data structures that, in particular, offer an attractive space-time tradeoff. Along the way we investigate data structures for supporting (normal) rank queries on base-4 and base-3 sequences, which may be of independent interest. Our C++ implementations of the data structures are available at https://github.com/jnalanko/SubsetWT.
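To make the two queries concrete, here is a naive Python reference: a sketch under stated assumptions, not the SubsetWT itself, since every query simply scans the sequence. The helper names (`degenerate`, `subset_rank`, `subset_select`), the exclusive-prefix reading of "up to the i-th subset" (X[0..i-1]), the 1-based occurrence number in select, and the use of a few IUPAC nucleotide ambiguity codes (R = {A, G}, Y = {C, T}, N = {A, C, G, T}) as example input are all our assumptions; the paper's conventions may differ.

```python
# Naive reference semantics for subset-rank / subset-select on a degenerate string.
# Assumed conventions (ours): rank counts subsets in the exclusive prefix X[0..i-1];
# select takes a 1-based occurrence number and returns -1 if it does not exist.

# Illustrative mapping from a few IUPAC nucleotide codes to symbol subsets.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "N": {"A", "C", "G", "T"},
}


def degenerate(text):
    """Turn a string of IUPAC codes into the list of subsets X[0..n-1]."""
    return [IUPAC[ch] for ch in text]


def subset_rank(X, i, c):
    """Number of subsets among X[0], ..., X[i-1] that contain symbol c."""
    return sum(1 for S in X[:i] if c in S)


def subset_select(X, i, c):
    """Index of the i-th (1-based) subset that contains c, or -1 if there is none."""
    seen = 0
    for idx, S in enumerate(X):
        if c in S:
            seen += 1
            if seen == i:
                return idx
    return -1


X = degenerate("ARCNT")          # [{A}, {A,G}, {C}, {A,C,G,T}, {T}]
print(subset_rank(X, 5, "A"))    # 3
print(subset_select(X, 2, "C"))  # 3
```

Each query here costs O(n) time; an index such as the SubsetWT is meant to answer the same queries without scanning, and a scan like this is mainly useful for cross-checking such an index on small inputs.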

Cite as

Jarno N. Alanko, Elena Biagi, Simon J. Puglisi, and Jaakko Vuohtoniemi. Subset Wavelet Trees. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{alanko_et_al:LIPIcs.SEA.2023.4,
  author =	{Alanko, Jarno N. and Biagi, Elena and Puglisi, Simon J. and Vuohtoniemi, Jaakko},
  title =	{{Subset Wavelet Trees}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{4:1--4:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.4},
  URN =		{urn:nbn:de:0030-drops-183549},
  doi =		{10.4230/LIPIcs.SEA.2023.4},
  annote =	{Keywords: degenerate strings, compressed data structures, succinct data structures, string processing, data structures, efficient algorithms}
}
Document
Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time

Authors: Sebastian Schmidt and Jarno N. Alanko

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications, it is important to store the k-mers in small space while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum representation of this kind. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. To this end, we present a formalisation of arc-centric bidirected de Bruijn graphs and carefully prove that it accurately models the k-mer spectrum of the input. Our algorithm first constructs the de Bruijn graph in time linear in the total length of the input strings (for a fixed-size alphabet). It then uses an Eulerian-cycle-based algorithm to compute the minimum representation in time linear in the size of the output.
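The link between "minimum repetition-free representation" and Eulerian cycles can be illustrated with a simplified Python sketch. It ignores reverse complements (handling those is what the paper's bidirected formalisation is for), so it works on a plain directed graph whose nodes are (k-1)-mers and whose arcs are the distinct k-mers. It balances in-degree and out-degree through a hypothetical helper node, runs Hierholzer's algorithm, and cuts the resulting circuits at the helper arcs; each remaining piece is a walk that spells one output string. The function names and the helper node are our own illustration, not the authors' implementation.

```python
from collections import defaultdict

DUMMY = ""        # hypothetical helper node; cannot clash with a (k-1)-mer when k >= 2
START = object()  # sentinel marking the bottom of the Hierholzer stack


def _circuit(start, adj):
    """Iterative Hierholzer: consume a closed walk from `start`, return its arc labels in order."""
    stack = [(start, START)]  # (node, label of the arc used to reach it)
    arcs = []
    while stack:
        node, label = stack[-1]
        if adj[node]:
            dst, lab = adj[node].pop()
            stack.append((dst, lab))
        else:
            stack.pop()
            if label is not START:
                arcs.append(label)
    arcs.reverse()
    return arcs


def eulertigs_unidirected(strings, k):
    """Repetition-free strings spelling exactly the k-mers of `strings` (no reverse complements)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    kmers = sorted({s[p:p + k] for s in strings for p in range(len(s) - k + 1)})

    # Arc-centric de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one arc.
    adj = defaultdict(list)     # node -> list of (successor, arc label)
    balance = defaultdict(int)  # out-degree minus in-degree
    for km in kmers:
        u, v = km[:-1], km[1:]
        adj[u].append((v, km))
        balance[u] += 1
        balance[v] -= 1

    # Balance every node through the helper node; helper arcs carry the label None.
    for node, d in list(balance.items()):
        adj[DUMMY].extend([(node, None)] * max(d, 0))
        adj[node].extend([(DUMMY, None)] * max(-d, 0))

    result = []

    def emit(arcs):
        # Cut the circuit at helper arcs; each maximal run of real arcs is one walk.
        walk = []
        for lab in arcs + [None]:
            if lab is None:
                if walk:
                    # Spell the walk: first k-mer, then the last character of each later k-mer.
                    result.append(walk[0] + "".join(x[-1] for x in walk[1:]))
                walk = []
            else:
                walk.append(lab)

    if adj[DUMMY]:
        emit(_circuit(DUMMY, adj))  # covers every component that had an imbalance
    for node in sorted(adj):
        if adj[node]:
            emit(_circuit(node, adj))  # each remaining balanced component: one closed walk
    return result


print(eulertigs_unidirected(["ACGTACG"], 3))                # ['ACGTAC']
print(sorted(eulertigs_unidirected(["ACGT", "ACGA"], 3)))  # ['ACGA', 'CGT']
```

Because each k-mer is spelled exactly once, m output strings have total length |K| + (k-1)·m, so minimising length means minimising the number of walks. In this directed setting a weakly connected component needs max(1, Σ_v max(0, out(v) - in(v))) walks, which is the number that cutting at the helper arcs produces.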

Cite as

Sebastian Schmidt and Jarno N. Alanko. Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 2:1-2:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{schmidt_et_al:LIPIcs.WABI.2022.2,
  author =	{Schmidt, Sebastian and Alanko, Jarno N.},
  title =	{{Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{2:1--2:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.2},
  URN =		{urn:nbn:de:0030-drops-170361},
  doi =		{10.4230/LIPIcs.WABI.2022.2},
  annote =	{Keywords: Spectrum preserving string sets, Eulerian cycle, Suffix tree, Bidirected arc-centric de Bruijn graph, k-mer based methods}
}
Document
Finding All Maximal Perfect Haplotype Blocks in Linear Time

Authors: Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)


Abstract
Recent large-scale community sequencing efforts allow the identification, at an unprecedented level of detail, of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and are therefore not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. Whether an optimal, linear-time algorithm exists was posed as an open problem. In this paper we give two algorithms that achieve this time bound: a conceptually very simple one using suffix trees, and a second one using the positional Burrows-Wheeler Transform that is also very efficient in practice.
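For a concrete reading of the pattern, here is a naive Python enumerator, assuming the usual definition from Cunha et al.: a block is a set K of at least two haplotypes together with a column interval [i, j] such that all haplotypes in K are identical on [i, j], the block cannot be extended (the haplotypes in K disagree at column i-1 and at column j+1, unless the interval touches the start or end of the sequences), and no further haplotype matches K on [i, j]. It examines all O(n²) intervals and copies substrings, so it is far from linear: it is only a reference for testing, not either of the paper's algorithms. The function name and 0-based indexing are our choices.

```python
from collections import defaultdict


def maximal_perfect_haplotype_blocks(rows):
    """Naive enumeration of maximal perfect haplotype blocks.

    rows: equal-length haplotype strings (e.g. over '0'/'1').
    Returns sorted triples (K, i, j): K is a tuple of row indices with |K| >= 2,
    and [i, j] is a 0-based inclusive column interval.
    """
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all haplotypes must have the same length")
    blocks = []
    for i in range(n):
        for j in range(i, n):
            # Group rows by their content on columns i..j; each group is a maximal K.
            groups = defaultdict(list)
            for r, row in enumerate(rows):
                groups[row[i:j + 1]].append(r)
            for members in groups.values():
                if len(members) < 2:
                    continue
                # Left-maximal: first column, or the members disagree at column i-1.
                left = i == 0 or len({rows[r][i - 1] for r in members}) > 1
                # Right-maximal: last column, or the members disagree at column j+1.
                right = j == n - 1 or len({rows[r][j + 1] for r in members}) > 1
                if left and right:
                    blocks.append((tuple(members), i, j))
    return sorted(blocks)


print(maximal_perfect_haplotype_blocks(["0110", "0111", "1110"]))
# [((0, 1), 0, 2), ((0, 1, 2), 1, 2), ((0, 2), 1, 3)]
```

Grouping rows by their exact content on [i, j] gives the maximal set K for free, so only left- and right-maximality need explicit checks.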

Cite as

Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye. Finding All Maximal Perfect Haplotype Blocks in Linear Time. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{alanko_et_al:LIPIcs.WABI.2019.8,
  author =	{Alanko, Jarno and Bannai, Hideo and Cazaux, Bastien and Peterlongo, Pierre and Stoye, Jens},
  title =	{{Finding All Maximal Perfect Haplotype Blocks in Linear Time}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{8:1--8:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.8},
  URN =		{urn:nbn:de:0030-drops-110388},
  doi =		{10.4230/LIPIcs.WABI.2019.8},
  annote =	{Keywords: Population genomics, selection coefficient, haplotype block, positional Burrows-Wheeler Transform}
}