DROPS

Volume

LIPIcs, Volume 312

24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

WABI 2024, September 2-4, 2024, Royal Holloway, London, United Kingdom

Editors: Solon P. Pissis and Wing-Kin Sung

Volume

LIPIcs, Volume 128

30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

CPM 2019, June 18-20, 2019, Pisa, Italy

Editors: Nadia Pisanti and Solon P. Pissis

Volume

LIPIcs, Volume 75

16th International Symposium on Experimental Algorithms (SEA 2017)

SEA 2017, June 21-23, 2017, London, UK

Editors: Costas S. Iliopoulos, Solon P. Pissis, Simon J. Puglisi, and Rajeev Raman

Document

DOI: 10.4230/LIPIcs.WABI.2024.9

Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation

Authors: Tsuyoshi Urata, Manato Yokoyama, and Momoko Hayamizu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

The 𝒞-Orientation problem asks whether it is possible to orient an undirected graph to a directed phylogenetic network of a desired class 𝒞, and to find such an orientation if one exists. The problem can arise when visualising evolutionary data, for example, because popular phylogenetic network reconstruction methods such as Neighbor-Net are distance-based and thus inevitably produce undirected graphs. The complexity of 𝒞-Orientation remains open for many classes 𝒞, including binary tree-child networks, and practical methods are still lacking. In this paper, we propose an exponential but practically efficient FPT algorithm for 𝒞-Orientation, which is parameterised by the reticulation number and the maximum size of minimal basic cycles used in the computation. We also present a very fast heuristic for Tree-Child Orientation. To evaluate the empirical performance of the proposed methods, we compared their accuracy and execution time for Tree-Child Orientation with those of an exponential time 𝒞-orientation algorithm from the literature. Our experiments show that the proposed exact algorithm is significantly faster than the state-of-the-art exponential time algorithm. The proposed heuristic runs even faster but the accuracy decreases as the reticulation number increases.

Cite as

Tsuyoshi Urata, Manato Yokoyama, and Momoko Hayamizu. Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 9:1-9:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{urata_et_al:LIPIcs.WABI.2024.9,
  author =	{Urata, Tsuyoshi and Yokoyama, Manato and Hayamizu, Momoko},
  title =	{{Orientability of Undirected Phylogenetic Networks to a Desired Class: Practical Algorithms and Application to Tree-Child Orientation}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{9:1--9:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.9},
  URN =		{urn:nbn:de:0030-drops-206531},
  doi =		{10.4230/LIPIcs.WABI.2024.9},
  annote =	{Keywords: Phylogenetic Networks, Tree-Child Networks, Graph Orientation Algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.5

Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests

Authors: Lukas Hübner and Alexandros Stamatakis

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

The field of population genetics attempts to advance our understanding of evolutionary processes. It has applications, for example, in medical research, wildlife conservation, and - in conjunction with recent advances in ancient DNA sequencing technology - studying human migration patterns over the past few thousand years. The basic toolbox of population genetics includes genealogical trees, which describe the shared evolutionary history among individuals of the same species. They are calculated on the basis of genetic variations. However, in recombining organisms, a single tree is insufficient to describe the evolutionary history of the whole genome. Instead, a collection of correlated trees can be used, where each describes the evolutionary history of a consecutive region of the genome. The current corresponding state of-the-art data structure, tree sequences, compresses these genealogical trees via edit operations when moving from one tree to the next along the genome instead of storing the full, often redundant, description for each tree. We propose a new data structure, genealogical forests, which compresses the set of genealogical trees into a DAG. In this DAG identical subtrees that are shared across the input trees are encoded only once, thereby allowing for straight-forward memoization of intermediate results. Additionally, we provide a C++ implementation of our proposed data structure, called gfkit, which is 2.1 to 11.2 (median 4.0) times faster than the state-of-the-art tool on empirical and simulated datasets at computing important population genetics statistics such as the Allele Frequency Spectrum, Patterson’s f, the Fixation Index, Tajima’s D, pairwise Lowest Common Ancestors, and others. On Lowest Common Ancestor queries with more than two samples as input, gfkit scales asymptotically better than the state-of-the-art, and is thus up to 990 times faster. In conclusion, our proposed data structure compresses genealogical trees by storing shared subtrees only once, thereby enabling straight-forward memoization of intermediate results, yielding a substantial runtime reduction and a potentially more intuitive data representation over the state-of-the-art. Our improvements will boost the development of novel analyses and models in the field of population genetics and increases scalability to ever-growing genomic datasets.

Cite as

Lukas Hübner and Alexandros Stamatakis. Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 5:1-5:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{hubner_et_al:LIPIcs.WABI.2024.5,
  author =	{H\"{u}bner, Lukas and Stamatakis, Alexandros},
  title =	{{Memoization on Shared Subtrees Accelerates Computations on Genealogical Forests}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{5:1--5:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.5},
  URN =		{urn:nbn:de:0030-drops-206499},
  doi =		{10.4230/LIPIcs.WABI.2024.5},
  annote =	{Keywords: bioinformatics, population genetics, algorithms}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.13

PLA-index: A k-mer Index Exploiting Rank Curve Linearity

Authors: Md. Hasin Abrar and Paul Medvedev

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Given a sorted list of k-mers S, the rank curve of S is the function mapping a k-mer from the k-mer universe to the location in S where it either first appears or would be inserted. An exciting recent development is the observation that, for certain datasets, the rank curve is predictable and can be exploited to create small search indices. In this paper, we develop a novel search index that first estimates a k-mer’s rank using a piece-wise linear approximation of the rank curve and then does a local search to determine the precise location of the k-mer in the list. We combine ideas from previous approaches and supplement them with an innovative data representation strategy that substantially reduces space usage. Our PLA-index uses an order of magnitude less space than Sapling and uses less than half the space of the PGM-index, for roughly the same query time. For example, using only 9 MiB of memory, it can narrow down the position of k-mer in the suffix array of the human genome to within 255 positions. Furthermore, we demonstrate the potential of our approach to impact a variety of downstream applications. First, the PLA-index halves the time of binary search on the suffix array of the human genome. Second, the PLA-index reduces the space of a direct-access lookup table by 76 percent, without increasing the run time. Third, we plug the PLA-index into a state-of-the-art read aligner Strobealign and replace a 2 GiB component with a PLA-index of size 1.5 MiB, without significantly effecting runtime. The software and reproducibility information is freely available at https://github.com/medvedevgroup/pla-index.

Cite as

Md. Hasin Abrar and Paul Medvedev. PLA-index: A k-mer Index Exploiting Rank Curve Linearity. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{abrar_et_al:LIPIcs.WABI.2024.13,
  author =	{Abrar, Md. Hasin and Medvedev, Paul},
  title =	{{PLA-index: A k-mer Index Exploiting Rank Curve Linearity}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.13},
  URN =		{urn:nbn:de:0030-drops-206578},
  doi =		{10.4230/LIPIcs.WABI.2024.13},
  annote =	{Keywords: K-mer index, Piece-wise linear approximation, Learned index}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.23

AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction

Authors: Adam Cicherski, Anna Lisiecka, and Norbert Dojer

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In the current paper we present AlfaPang, a new pangenome graph building algorithm. AlfaPang is based on a novel alignment-free approach that allows to construct pangenome graphs using significantly less computational resources than state-of-the-art tools. The code of AlfaPang is freely available at https://github.com/AdamCicherski/AlfaPang.

Cite as

Adam Cicherski, Anna Lisiecka, and Norbert Dojer. AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 23:1-23:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{cicherski_et_al:LIPIcs.WABI.2024.23,
  author =	{Cicherski, Adam and Lisiecka, Anna and Dojer, Norbert},
  title =	{{AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{23:1--23:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.23},
  URN =		{urn:nbn:de:0030-drops-206673},
  doi =		{10.4230/LIPIcs.WABI.2024.23},
  annote =	{Keywords: pangenome, variation graph, genome alignment, population genomics}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.4

MEM-Based Pangenome Indexing for k-mer Queries

Authors: Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO’s index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5× faster than other approaches. MEMO’s small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.

Cite as

Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, and Ben Langmead. MEM-Based Pangenome Indexing for k-mer Queries. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{hwang_et_al:LIPIcs.WABI.2024.4,
  author =	{Hwang, Stephen and Brown, Nathaniel K. and Ahmed, Omar Y. and Jenike, Katharine M. and Kovaka, Sam and Schatz, Michael C. and Langmead, Ben},
  title =	{{MEM-Based Pangenome Indexing for k-mer Queries}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.4},
  URN =		{urn:nbn:de:0030-drops-206482},
  doi =		{10.4230/LIPIcs.WABI.2024.4},
  annote =	{Keywords: Pangenomics, Comparative genomics, Compressed indexing}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.17

A*PA2: Up to 19× Faster Exact Global Alignment

Authors: Ragnar Groot Koerkamp

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Motivation. Pairwise alignment is at the core of computational biology. Most commonly used exact methods are either based on O(ns) band doubling or O(n+s²) diagonal transition, where n is the sequence length and s the number of errors. However, as the length of sequences has grown, these exact methods are often replaced by approximate methods based on e.g. seed-and-extend and heuristics to bound the computed region. We would like to develop an exact method that matches the performance of these approximate methods. Recently, Astarix introduced the A* shortest path algorithm with the seed heuristic for exact sequence-to-graph alignment. A*PA adapted and improved this for pairwise sequence alignment and achieves near-linear runtime when divergence (error rate) is low, at the cost of being very slow when divergence is high. Methods. We introduce A*PA2, an exact global pairwise aligner with respect to edit distance. The goal of A*PA2 is to unify the near-linear runtime of A*PA on similar sequences with the efficiency of dynamic programming (DP) based methods. Like Edlib, A*PA2 uses Ukkonen’s band doubling in combination with Myers' bitpacking. A*PA2 1) uses large block sizes inspired by Block Aligner, 2) extends this with SIMD (single instruction, multiple data), 3) introduces a new profile for efficient computations, 4) introduces a new optimistic technique for traceback based on diagonal transition, 5) avoids recomputation of states where possible, and 6) applies the heuristics developed in A*PA and improves them using pre-pruning. Results. With the first 4 engineering optimizations, A*PA2-simple has complexity O(ns) and is 6× to 8× faster than Edlib for sequences ≥ 10 kbp. A*PA2-full also includes the heuristic and is often near-linear in practice for sequences with small divergence. The average runtime of A*PA2 is 19× faster than the exact aligners BiWFA and Edlib on >500 kbp long ONT (Oxford Nanopore Technologies) reads of a human genome having 6% divergence on average. On shorter ONT reads of 11% average divergence the speedup is 5.6× (avg. length 11 kbp) and 0.81× (avg. length 800 bp). On all tested datasets, A*PA2 is competitive with or faster than approximate methods.

Cite as

Ragnar Groot Koerkamp. A*PA2: Up to 19× Faster Exact Global Alignment. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 17:1-17:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{grootkoerkamp:LIPIcs.WABI.2024.17,
  author =	{Groot Koerkamp, Ragnar},
  title =	{{A*PA2: Up to 19× Faster Exact Global Alignment}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{17:1--17:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.17},
  URN =		{urn:nbn:de:0030-drops-206610},
  doi =		{10.4230/LIPIcs.WABI.2024.17},
  annote =	{Keywords: Edit distance, Pairwise alignment, A*, Shortest path, Dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.19

RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs

Authors: Théo Boury, Laurent Bulteau, and Yann Ponty

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Inverse folding is a classic instance of negative RNA design which consists in finding a sequence that uniquely folds into a target secondary structure with respect to energy minimization. A breakthrough result of Bonnet et al. shows that, even in simple base pairs-based (BP) models, the decision version of a mildly constrained version of inverse folding is NP-hard. In this work, we show that inverse folding can be solved in linear time for a large collection of targets, including every structure that contains no isolated BP and no isolated stack (or, equivalently, when all helices consist of 3^{+} base pairs). For structures featuring shorter helices, our linear algorithm is no longer guaranteed to produce a solution, but still does so for a large proportion of instances. Our approach introduces a notion of modulo m-separability, generalizing a property pioneered by Hales et al. Separability is a sufficient condition for the existence of a solution to the inverse folding problem. We show that, for any input secondary structure of length n, a modulo m-separated sequence can be produced in time 𝒪(n 2^m) anytime such a sequence exists. Meanwhile, we show that any structure consisting of 3^{+} base pairs is either trivially non-designable, or always admits a modulo-2 separated solution (m = 2). Solution sequences can thus be produced in linear time, and even be uniformly generated within the set of modulo-2 separable sequences.

Cite as

Théo Boury, Laurent Bulteau, and Yann Ponty. RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 19:1-19:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boury_et_al:LIPIcs.WABI.2024.19,
  author =	{Boury, Th\'{e}o and Bulteau, Laurent and Ponty, Yann},
  title =	{{RNA Inverse Folding Can Be Solved in Linear Time for Structures Without Isolated Stacks or Base Pairs}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{19:1--19:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.19},
  URN =		{urn:nbn:de:0030-drops-206632},
  doi =		{10.4230/LIPIcs.WABI.2024.19},
  annote =	{Keywords: RNA structure, String Design, Parameterized Complexity, Uniform Sampling}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.8

Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors: Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Despite the long history of genome assembly research, there remains a large gap between the theoretical and practical work. There is practical software with little theoretical underpinning of accuracy on one hand and theoretical algorithms which have not been adopted in practice on the other. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and the C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational costs and either no or a small increase in the number of misassemblies.

Cite as

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.WABI.2024.8,
  author =	{Schmidt, Sebastian and Toivonen, Santeri and Medvedev, Paul and Tomescu, Alexandru I.},
  title =	{{Applying the Safe-And-Complete Framework to Practical Genome Assembly}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{8:1--8:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.8},
  URN =		{urn:nbn:de:0030-drops-206520},
  doi =		{10.4230/LIPIcs.WABI.2024.8},
  annote =	{Keywords: Genome assembly, Omnitigs, Safe-and-complete framework, graph algorithm, HiFi sequencing data, Assembly evaluation}
}

Document

DOI: 10.4230/LIPIcs.WABI.2024.12

Reconstructing Rearrangement Phylogenies of Natural Genomes

Authors: Leonard Bohnenkämper, Jens Stoye, and Daniel Dörr

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success. Here, we improve on that earlier work, giving a highly optimized ILP that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement w.r.t. the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large. We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. And even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage becomes also visible in an analysis of seven Anopheles taxa.

Cite as

Leonard Bohnenkämper, Jens Stoye, and Daniel Dörr. Reconstructing Rearrangement Phylogenies of Natural Genomes. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bohnenkamper_et_al:LIPIcs.WABI.2024.12,
  author =	{Bohnenk\"{a}mper, Leonard and Stoye, Jens and D\"{o}rr, Daniel},
  title =	{{Reconstructing Rearrangement Phylogenies of Natural Genomes}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.12},
  URN =		{urn:nbn:de:0030-drops-206564},
  doi =		{10.4230/LIPIcs.WABI.2024.12},
  annote =	{Keywords: genome rearrangement, ancestral reconstruction, small parsimony, integer linear programming, double-cut-and-join}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.WABI.2024

LIPIcs, Volume 312, WABI 2024, Complete Volume

Authors: Solon P. Pissis and Wing-Kin Sung

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

LIPIcs, Volume 312, WABI 2024, Complete Volume

Cite as

24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 1-454, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Proceedings{pissis_et_al:LIPIcs.WABI.2024,
  title =	{{LIPIcs, Volume 312, WABI 2024, Complete Volume}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{1--454},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024},
  URN =		{urn:nbn:de:0030-drops-206430},
  doi =		{10.4230/LIPIcs.WABI.2024},
  annote =	{Keywords: LIPIcs, Volume 312, WABI 2024, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.WABI.2024.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Solon P. Pissis and Wing-Kin Sung

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 0:i-0:xii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{pissis_et_al:LIPIcs.WABI.2024.0,
  author =	{Pissis, Solon P. and Sung, Wing-Kin},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{0:i--0:xii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.0},
  URN =		{urn:nbn:de:0030-drops-206447},
  doi =		{10.4230/LIPIcs.WABI.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.WABI.2024.1

Bioinformatics of Pathogens (Invited Talk)

Authors: Tomáš Vinař

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

Abstract

Genomic sequencing has become an important tool in identification and surveillance of human pathogens. Compared to large organisms, where our goal is to obtain high-quality sequences for detailed analysis, in pathogen sequencing the emphasis is often on optimization of cost and time. Consequently, sequencing of pathogens creates interesting computational challenges and development of new methods has a potential to significantly enhance applicability of the results in epidemiology and clinical practice. In my talk, I will give two examples: plasmid identification in bacterial isolates and genomic surveillance of wastewater for SARS-CoV-2. In both cases, application of better algorithms and modeling helps to improve the quality of analysis of very noisy data.

Cite as

Tomáš Vinař. Bioinformatics of Pathogens (Invited Talk). In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{vinar:LIPIcs.WABI.2024.1,
  author =	{Vina\v{r}, Tom\'{a}\v{s}},
  title =	{{Bioinformatics of Pathogens}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.1},
  URN =		{urn:nbn:de:0030-drops-206455},
  doi =		{10.4230/LIPIcs.WABI.2024.1},
  annote =	{Keywords: sequence analysis, genome assembly, neural networks, probabilistic modeling}
}

130 Search Results for "Pissis, Solon P."

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message