DROPS

Volume

LIPIcs, Volume 88

17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

WABI 2017, August 21-23, 2017, Boston, MA, USA

Editors: Russell Schwartz and Knut Reinert

Document

DOI: 10.4230/LIPIcs.ESA.2025.55

Safe Sequences via Dominators in DAGs for Path-Covering Problems

Authors: Francisco Sena, Romeo Rizzi, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A path-covering problem on a directed acyclic graph (DAG) requires finding a set of source-to-sink paths that cover all the nodes, all the arcs, or subsets thereof, and additionally they are optimal with respect to some function. In this paper we study safe sequences of nodes or arcs, namely sequences that appear in some path of every path cover of a DAG. We show that safe sequences admit a simple characterization via cutnodes. Moreover, we establish a connection between maximal safe sequences and leaf-to-root paths in the source- and sink-dominator trees of the DAG, which may be of independent interest in the extensive literature on dominators. With dominator trees, safe sequences admit an O(n)-size representation and a linear-time output-sensitive enumeration algorithm running in time O(m + o), where n and m are the number of nodes and arcs, respectively, and o is the total length of the maximal safe sequences. We then apply maximal safe sequences to simplify Integer Linear Programs (ILPs) for two path-covering problems, LeastSquares and MinPathError, which are at the core of RNA transcript assembly problems from bioinformatics. On various datasets, maximal safe sequences can be computed in under 0.1 seconds per graph, on average, and ILP solvers whose search space is reduced in this manner exhibit significant speed-ups. For example on graphs with a large width, average speed-ups are in the range 50-250× for MinPathError and in the range 80-350× for LeastSquares. Optimizing ILPs using safe sequences can thus become a fast building block of practical RNA transcript assembly tools, and more generally, of path-covering problems.

Cite as

Francisco Sena, Romeo Rizzi, and Alexandru I. Tomescu. Safe Sequences via Dominators in DAGs for Path-Covering Problems. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 55:1-55:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{sena_et_al:LIPIcs.ESA.2025.55,
  author =	{Sena, Francisco and Rizzi, Romeo and Tomescu, Alexandru I.},
  title =	{{Safe Sequences via Dominators in DAGs for Path-Covering Problems}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{55:1--55:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.55},
  URN =		{urn:nbn:de:0030-drops-245230},
  doi =		{10.4230/LIPIcs.ESA.2025.55},
  annote =	{Keywords: directed acyclic graph, path cover, dominator tree, integer linear programming, least squares, minimum path error}
}

Document

DOI: 10.4230/OASIcs.Manzini.9

Search Schemes for Approximate Pattern Matching: An Overview

Authors: Lore Depuydt, Jan Fostier, Simon Gottlieb, Gregory Kucherov, Knut Reinert, and Luca Renders

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

We provide a brief survey of results on solving the approximate pattern matching problem using search schemes, as introduced by Kucherov et al. (2016). We demonstrate that search schemes constitute a flexible and versatile tool that enable the specification of various search strategies, including several known filtering methods. We present approaches for designing efficient search schemes and for implementing them effectively. Finally, we conclude with experimental results comparing multiple search schemes on DNA sequencing data using the Columba software by Renders et al. (2021).

Cite as

Lore Depuydt, Jan Fostier, Simon Gottlieb, Gregory Kucherov, Knut Reinert, and Luca Renders. Search Schemes for Approximate Pattern Matching: An Overview. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{depuydt_et_al:OASIcs.Manzini.9,
  author =	{Depuydt, Lore and Fostier, Jan and Gottlieb, Simon and Kucherov, Gregory and Reinert, Knut and Renders, Luca},
  title =	{{Search Schemes for Approximate Pattern Matching: An Overview}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{9:1--9:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.9},
  URN =		{urn:nbn:de:0030-drops-239172},
  doi =		{10.4230/OASIcs.Manzini.9},
  annote =	{Keywords: FM-index, bidirectional index, approximate pattern matching, search scheme}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.20

SimdMinimizers: Computing Random Minimizers, fast

Authors: Ragnar Groot Koerkamp and Igor Martayan

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Motivation. Because of the rapidly-growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller than the input sequences, but preserve sufficient information for downstream analysis. Minimizers are an especially popular sketching technique and used in a wide variety of applications. They sample at least one out of every w consecutive k-mers. As DNA sequencers are getting more accurate, some applications can afford to use a larger w and hence sparser and smaller sketches. And as sketches get smaller, their analysis becomes faster, so the time spent sketching the full-sized input becomes more of a bottleneck. Methods. Our library simd-minimizers implements a random minimizer algorithm using SIMD instructions. It supports both AVX2 and NEON architectures. Its main novelty is two-fold. First, it splits the input into 8 chunks that are streamed over in parallel through all steps of the algorithm. This is enabled by using the completely deterministic two-stacks sliding window minimum algorithm, which seems not to have been used before for finding minimizers. Results. Our library is up to 6.8× faster than a scalar implementation of the rescan method when w = 5 is small, and 3.4× faster for larger w = 19. Computing canonical minimizers is less than 50% slower than computing forward minimizers, and over 15× faster than the existing implementation in the minimizer-iter crate. Our library finds all (canonical) minimizers of a 3.2 Gbp human genome in 5.2 (resp. 6.7) seconds.

Cite as

Ragnar Groot Koerkamp and Igor Martayan. SimdMinimizers: Computing Random Minimizers, fast. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{grootkoerkamp_et_al:LIPIcs.SEA.2025.20,
  author =	{Groot Koerkamp, Ragnar and Martayan, Igor},
  title =	{{SimdMinimizers: Computing Random Minimizers, fast}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.20},
  URN =		{urn:nbn:de:0030-drops-232581},
  doi =		{10.4230/LIPIcs.SEA.2025.20},
  annote =	{Keywords: Minimizers, Randomized algorithms, Sketching, Hashing}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.WABI.2017

LIPIcs, Volume 88, WABI'17, Complete Volume

Authors: Russell Schwartz and Knut Reinert

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

LIPIcs, Volume 88, WABI'17, Complete Volume

Cite as

17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Proceedings{schwartz_et_al:LIPIcs.WABI.2017,
  title =	{{LIPIcs, Volume 88, WABI'17, Complete Volume}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017},
  URN =		{urn:nbn:de:0030-drops-78120},
  doi =		{10.4230/LIPIcs.WABI.2017},
  annote =	{Keywords: Nonnumerical Algorithms and Problems, Pattern Matching, Algorithms, Life and Medical Sciences}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.WABI.2017.0

Front Matter, Table of Contents, Preface, List of Authors

Authors: Russell Schwartz and Knut Reinert

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Front Matter, Table of Contents, Preface, List of Authors

Cite as

17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{schwartz_et_al:LIPIcs.WABI.2017.0,
  author =	{Schwartz, Russell and Reinert, Knut},
  title =	{{Front Matter, Table of Contents, Preface, List of Authors}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{0:i--0:xiv},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.0},
  URN =		{urn:nbn:de:0030-drops-76348},
  doi =		{10.4230/LIPIcs.WABI.2017.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, List of Authors}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.1

Disentangled Long-Read De Bruijn Graphs via Optical Maps

Authors: Bahar Alipanahi, Leena Salmela, Simon J. Puglisi, Martin Muggli, and Christina Boucher

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

While long reads produced by third-generation sequencing technology from, e.g, Pacific Biosciences have been shown to increase the quality of draft genomes in repetitive regions, fundamental computational challenges remain in overcoming their high error rate and assembling them efficiently. In this paper we show that the de Bruijn graph built on the long reads can be efficiently and substantially disentangled using optical mapping data as auxiliary information. Fundamental to our approach is the use of the positional de Bruijn graph and a succinct data structure for constructing and traversing this graph. Our experimental results show that over 97.7% of directed cycles have been removed from the resulting positional de Bruijn graph as compared to its non-positional counterpart. Our results thus indicate that disentangling the de Bruijn graph using positional information is a promising direction for developing a simple and efficient assembly algorithm for long reads.

Cite as

Bahar Alipanahi, Leena Salmela, Simon J. Puglisi, Martin Muggli, and Christina Boucher. Disentangled Long-Read De Bruijn Graphs via Optical Maps. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 1:1-1:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{alipanahi_et_al:LIPIcs.WABI.2017.1,
  author =	{Alipanahi, Bahar and Salmela, Leena and Puglisi, Simon J. and Muggli, Martin and Boucher, Christina},
  title =	{{Disentangled Long-Read De Bruijn Graphs via Optical Maps}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{1:1--1:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.1},
  URN =		{urn:nbn:de:0030-drops-76614},
  doi =		{10.4230/LIPIcs.WABI.2017.1},
  annote =	{Keywords: Positional de Bruijn graph, Genome Assembly, Long Read Data, Optical maps}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.2

Gene Tree Parsimony for Incomplete Gene Trees

Authors: Md. Shamsuzzoha Bayzid and Tandy Warnow

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.

Cite as

Md. Shamsuzzoha Bayzid and Tandy Warnow. Gene Tree Parsimony for Incomplete Gene Trees. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bayzid_et_al:LIPIcs.WABI.2017.2,
  author =	{Bayzid, Md. Shamsuzzoha and Warnow, Tandy},
  title =	{{Gene Tree Parsimony for Incomplete Gene Trees}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{2:1--2:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.2},
  URN =		{urn:nbn:de:0030-drops-76495},
  doi =		{10.4230/LIPIcs.WABI.2017.2},
  annote =	{Keywords: Gene duplication and loss, gene tree parsimony, deep coalescence}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.3

Better Greedy Sequence Clustering with Fast Banded Alignment

Authors: Brian Brubach, Jay Ghurye, Mihai Pop, and Aravind Srinivasan

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Comparing a string to a large set of sequences is a key subroutine in greedy heuristics for clustering genomic data. Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) is a common method used in studying microbial communities. We present a new approach to greedy clustering using a trie-like data structure and Four Russians speedup. We evaluate the running time of our method in terms of the number of comparisons it makes during clustering and show in experimental results that the number of comparisons grows linearly with the size of the dataset as opposed to the quadratic running time of other methods. We compare the clusters output by our method to the popular greedy clustering tool UCLUST. We show that the clusters we generate can be both tighter and larger.

Cite as

Brian Brubach, Jay Ghurye, Mihai Pop, and Aravind Srinivasan. Better Greedy Sequence Clustering with Fast Banded Alignment. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{brubach_et_al:LIPIcs.WABI.2017.3,
  author =	{Brubach, Brian and Ghurye, Jay and Pop, Mihai and Srinivasan, Aravind},
  title =	{{Better Greedy Sequence Clustering with Fast Banded Alignment}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{3:1--3:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.3},
  URN =		{urn:nbn:de:0030-drops-76425},
  doi =		{10.4230/LIPIcs.WABI.2017.3},
  annote =	{Keywords: Sequence Clustering, Metagenomics, String Algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.4

Optimal Computation of Overabundant Words

Authors: Yannis Almirantis, Panagiotis Charalampopoulos, Jia Gao, Costas S. Iliopoulos, Manal Mohamed, Solon P. Pissis, and Dimitris Polychronopoulos

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

The observed frequency of the longest proper prefix, the longest proper suffix, and the longest infix of a word w in a given sequence x can be used for classifying w as avoided or overabundant. The definitions used for the expectation and deviation of w in this statistical model were described and biologically justified by Brendel et al. (J Biomol Struct Dyn 1986). We have very recently introduced a time-optimal algorithm for computing all avoided words of a given sequence over an integer alphabet (Algorithms Mol Biol 2017). In this article, we extend this study by presenting an O(n)-time and O(n)-space algorithm for computing all overabundant words in a sequence x of length n over an integer alphabet. Our main result is based on a new non-trivial combinatorial property of the suffix tree T of x: the number of distinct factors of x whose longest infix is the label of an explicit node of T is no more than 3n-4. We further show that the presented algorithm is time-optimal by proving that O(n) is a tight upper bound for the number of overabundant words. Finally, we present experimental results, using both synthetic and real data, which justify the effectiveness and efficiency of our approach in practical terms.

Cite as

Yannis Almirantis, Panagiotis Charalampopoulos, Jia Gao, Costas S. Iliopoulos, Manal Mohamed, Solon P. Pissis, and Dimitris Polychronopoulos. Optimal Computation of Overabundant Words. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 4:1-4:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{almirantis_et_al:LIPIcs.WABI.2017.4,
  author =	{Almirantis, Yannis and Charalampopoulos, Panagiotis and Gao, Jia and Iliopoulos, Costas S. and Mohamed, Manal and Pissis, Solon P. and Polychronopoulos, Dimitris},
  title =	{{Optimal Computation of Overabundant Words}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{4:1--4:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.4},
  URN =		{urn:nbn:de:0030-drops-76468},
  doi =		{10.4230/LIPIcs.WABI.2017.4},
  annote =	{Keywords: overabundant words, avoided words, suffix tree, DNA sequence analysis}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.5

Detecting Locus Acquisition Events in Gene Trees

Authors: Michal Aleksander Ciach, Anna Muszewska, and Pawel Górecki

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Horizontal Gene Transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. In this work, we investigate the Locus Tree Inference problem as a possible alternative that combines the advantages of both approaches. We show several algorithms to solve the problem in the parsimony framework. We introduce a novel tree mapping, which allows us to obtain a heuristic solution to the problems of locus tree inference and duplication classification. Our approach allows not only for faster comparisons of gene and species trees but also to improve known algorithms for duplication inference in the presence of polytomies in the species trees.

Cite as

Michal Aleksander Ciach, Anna Muszewska, and Pawel Górecki. Detecting Locus Acquisition Events in Gene Trees. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{ciach_et_al:LIPIcs.WABI.2017.5,
  author =	{Ciach, Michal Aleksander and Muszewska, Anna and G\'{o}recki, Pawel},
  title =	{{Detecting Locus Acquisition Events in Gene Trees}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{5:1--5:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.5},
  URN =		{urn:nbn:de:0030-drops-76545},
  doi =		{10.4230/LIPIcs.WABI.2017.5},
  annote =	{Keywords: rank, taxon, ranked species tree, speciation, gene duplication, gene loss, horizontal gene transfer}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.6

An IP Algorithm for RNA Folding Trajectories

Authors: Amir H. Bayegan and Peter Clote

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Vienna RNA Package software Kinfold implements the Gillespie algorithm for RNA secondary structure folding kinetics, for the move sets MS1 [resp. MS2], consisting of base pair additions and removals [resp. base pair addition, removals and shifts]. In this paper, for arbitrary secondary structures s, t of a given RNA sequence, we present the first optimal algorithm to compute the shortest MS2 folding trajectory s = s0, s1, . . . , sm = t, where each intermediate structure si+1 is obtained from its predecessor by the addition, removal or shift of a single base pair. The shortest MS1 trajectory between s and t is trivially equal to the number of base pairs belonging to s but not t, plus the number of base pairs belonging to t but not s. Our optimal algorithm applies integer programming (IP) to solve (essentially) the minimum feedback vertex set (FVS) problem for the "conflict digraph" associated with input secondary structures s, t, and then applies topological sort, in order to generate an optimal MS2 folding pathway from s to t that maximizes the use of shift moves. Since the optimal algorithm may require excessive run time, we also sketch a fast, near-optimal algorithm (details to appear elsewhere). Software for our algorithm will be publicly available at http://bioinformatics.bc.edu/clotelab/MS2distance/.

Cite as

Amir H. Bayegan and Peter Clote. An IP Algorithm for RNA Folding Trajectories. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 6:1-6:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bayegan_et_al:LIPIcs.WABI.2017.6,
  author =	{Bayegan, Amir H. and Clote, Peter},
  title =	{{An IP Algorithm for RNA Folding Trajectories}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{6:1--6:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.6},
  URN =		{urn:nbn:de:0030-drops-76437},
  doi =		{10.4230/LIPIcs.WABI.2017.6},
  annote =	{Keywords: Integer programming, RNA secondary structure, folding trajectory, feedback vertex problem, conflict digraph}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.7

Fast Spaced Seed Hashing

Authors: Samuele Girotto, Matteo Comin, and Cinzia Pizzi

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Hashing k-mers is a common function across many bioinformatics applications and it is widely used for indexing, querying and rapid similarity search. Recently, spaced seeds, a special type of pattern that accounts for errors or mutations, are routinely used instead of k-mers. Spaced seeds allow to improve the sensitivity, with respect to k-mers, in many applications, however the hashing of spaced seeds increases substantially the computational time. Hence, the ability to speed up hashing operations of spaced seeds would have a major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient. In this paper we address the problem of efficient spaced seed hashing. The proposed algorithm exploits the similarity of adjacent spaced seed hash values in an input sequence in order to efficiently compute the next hash. We report a series of experiments on NGS reads hashing using several spaced seeds. In the experiments, our algorithm can compute the hashing values of spaced seeds with a speedup, with respect to the traditional approach, between 1.6x to 5.3x, depending on the structure of the spaced seed.

Cite as

Samuele Girotto, Matteo Comin, and Cinzia Pizzi. Fast Spaced Seed Hashing. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{girotto_et_al:LIPIcs.WABI.2017.7,
  author =	{Girotto, Samuele and Comin, Matteo and Pizzi, Cinzia},
  title =	{{Fast Spaced Seed Hashing}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{7:1--7:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.7},
  URN =		{urn:nbn:de:0030-drops-76501},
  doi =		{10.4230/LIPIcs.WABI.2017.7},
  annote =	{Keywords: k-mers, spaced seeds, efficient hashing}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.8

A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation

Authors: Nadia El-Mabrouk and Aïda Ouangraoua

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible on the available trees. As phylogenetic tools are error-prone, it is a common task to use a correction method for improving an initial tree. Various correction methods exist. In this paper we focus on those based on the Duplication-Loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the obtained non-binary tree in a way minimizing a reconciliation distance with the given species tree. On the other hand, the supertree approach takes as input a set of separated subtrees, either obtained for separared orthology groups or by removing the upper branches of an initial tree to a certain level, and amalgamating them in an optimal way preserving the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a 0-1 edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC Problem whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models allow to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms allowing to handle these two general correction problems.

Cite as

Nadia El-Mabrouk and Aïda Ouangraoua. A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{elmabrouk_et_al:LIPIcs.WABI.2017.8,
  author =	{El-Mabrouk, Nadia and Ouangraoua, A\"{i}da},
  title =	{{A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{8:1--8:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.8},
  URN =		{urn:nbn:de:0030-drops-76565},
  doi =		{10.4230/LIPIcs.WABI.2017.8},
  annote =	{Keywords: Gene tree correction, Supertree, Polytomy, Reconciliation, Phylogeny}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.9

Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time

Authors: Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method.

Cite as

Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz. Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{crochemore_et_al:LIPIcs.WABI.2017.9,
  author =	{Crochemore, Maxime and Francisco, Alexandre P. and Pissis, Solon P. and Vaz, C\'{a}tia},
  title =	{{Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{9:1--9:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.9},
  URN =		{urn:nbn:de:0030-drops-76529},
  doi =		{10.4230/LIPIcs.WABI.2017.9},
  annote =	{Keywords: computational biology, phylogenetic inference, Hamming distance}
}

51 Search Results for "Reinert, Knut"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message