Document

**Published in:** LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)

We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model).
Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success. Here, we improve on that earlier work, giving a highly optimized ILP that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement w.r.t. the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large.
We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. And even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage becomes also visible in an analysis of seven Anopheles taxa.

Leonard Bohnenkämper, Jens Stoye, and Daniel Dörr. Reconstructing Rearrangement Phylogenies of Natural Genomes. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{bohnenkamper_et_al:LIPIcs.WABI.2024.12, author = {Bohnenk\"{a}mper, Leonard and Stoye, Jens and D\"{o}rr, Daniel}, title = {{Reconstructing Rearrangement Phylogenies of Natural Genomes}}, booktitle = {24th International Workshop on Algorithms in Bioinformatics (WABI 2024)}, pages = {12:1--12:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-340-9}, ISSN = {1868-8969}, year = {2024}, volume = {312}, editor = {Pissis, Solon P. and Sung, Wing-Kin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.12}, URN = {urn:nbn:de:0030-drops-206564}, doi = {10.4230/LIPIcs.WABI.2024.12}, annote = {Keywords: genome rearrangement, ancestral reconstruction, small parsimony, integer linear programming, double-cut-and-join} }

Document

**Published in:** LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

Two genomes over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. A genome is circular when it contains only circular chromosomes. Different distances of canonical circular genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length. Then, the breakpoint distance is equal to n-c_2, where n is the number of genes and c_2 is the number of cycles of length 2. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance is n-c, where c is the total number of cycles.
The distance problem is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as median or double distance. Interestingly, both median and double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider a σ_k distance, defined to be n-(c_2+c_4+…+c_k), and increasingly investigate the complexities of median and double distance for the σ₄ distance, then the σ₆ distance, and so on. While for the median much effort was done in our and in other research groups but no progress was obtained even for the σ₄ distance, for solving the double distance under σ₄ and σ₆ distances we could devise linear time algorithms, which we present here.

Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx, and Jens Stoye. A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{braga_et_al:LIPIcs.WABI.2022.13, author = {Braga, Mar{\'\i}lia D. V. and Brockmann, Leonie R. and Klerx, Katharina and Stoye, Jens}, title = {{A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance}}, booktitle = {22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)}, pages = {13:1--13:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-243-3}, ISSN = {1868-8969}, year = {2022}, volume = {242}, editor = {Boucher, Christina and Rahmann, Sven}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.13}, URN = {urn:nbn:de:0030-drops-170472}, doi = {10.4230/LIPIcs.WABI.2022.13}, annote = {Keywords: Comparative genomics, genome rearrangement, breakpoint distance, double-cut-and-join (DCJ) distance, double distance} }

Document

**Published in:** Dagstuhl Reports, Volume 9, Issue 6 (2020)

Dagstuhl Seminar 19241 ("25 Years of the Burrows-Wheeler Transform") took place from June 10th to 14th, 2019, and was attended by 45 people from 13 countries and the three fields of Algorithms and Data Structures, Bioinformatics, and Combinatorics on Words. There were four talks and a panel session for each field. Feedback was generally positive and we are confident the seminar fostered interdisciplinary connections and will eventually result in noteworthy joint publications.

Travis Gagie, Giovanni Manzini, Gonzalo Navarro, and Jens Stoye. 25 Years of the Burrows-Wheeler Transform (Dagstuhl Seminar 19241). In Dagstuhl Reports, Volume 9, Issue 6, pp. 55-68, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@Article{gagie_et_al:DagRep.9.6.55, author = {Gagie, Travis and Manzini, Giovanni and Navarro, Gonzalo and Stoye, Jens}, title = {{25 Years of the Burrows-Wheeler Transform (Dagstuhl Seminar 19241)}}, pages = {55--68}, journal = {Dagstuhl Reports}, ISSN = {2192-5283}, year = {2019}, volume = {9}, number = {6}, editor = {Gagie, Travis and Manzini, Giovanni and Navarro, Gonzalo and Stoye, Jens}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagRep.9.6.55}, URN = {urn:nbn:de:0030-drops-114874}, doi = {10.4230/DagRep.9.6.55}, annote = {Keywords: Bioinformatics, Burrows-Wheeler Transform, Combinatorics on Words, Data Compression, Data Structures, Indexing, Sequence Alignment} }

Document

**Published in:** LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.

Jarno Alanko, Hideo Bannai, Bastien Cazaux, Pierre Peterlongo, and Jens Stoye. Finding All Maximal Perfect Haplotype Blocks in Linear Time. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 8:1-8:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alanko_et_al:LIPIcs.WABI.2019.8, author = {Alanko, Jarno and Bannai, Hideo and Cazaux, Bastien and Peterlongo, Pierre and Stoye, Jens}, title = {{Finding All Maximal Perfect Haplotype Blocks in Linear Time}}, booktitle = {19th International Workshop on Algorithms in Bioinformatics (WABI 2019)}, pages = {8:1--8:9}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-123-8}, ISSN = {1868-8969}, year = {2019}, volume = {143}, editor = {Huber, Katharina T. and Gusfield, Dan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.8}, URN = {urn:nbn:de:0030-drops-110388}, doi = {10.4230/LIPIcs.WABI.2019.8}, annote = {Keywords: Population genomics, selection coefficient, haplotype block, positional Burrows-Wheeler Transform} }

Document

**Published in:** LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words.

Luís Cunha, Simone Dantas, Travis Gagie, Roland Wittler, Luis Kowada, and Jens Stoye. Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{cunha_et_al:LIPIcs.CPM.2017.19, author = {Cunha, Lu{\'\i}s and Dantas, Simone and Gagie, Travis and Wittler, Roland and Kowada, Luis and Stoye, Jens}, title = {{Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings}}, booktitle = {28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)}, pages = {19:1--19:9}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-039-2}, ISSN = {1868-8969}, year = {2017}, volume = {78}, editor = {K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.19}, URN = {urn:nbn:de:0030-drops-73418}, doi = {10.4230/LIPIcs.CPM.2017.19}, annote = {Keywords: string algorithms, indexing, jumbled pattern matching, run-length encoding} }

Document

**Published in:** Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

The genomic distance problem in the Hannenhalli-Pevzner (HP) theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of translocations, fusions, fissions and inversions that transform one genome into the other. We will present a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way, and a linear-time algorithm for computing this distance.

Anne Bergeron, Julia Mixtacki, and Jens Stoye. A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

Copy BibTex To Clipboard

@InProceedings{bergeron_et_al:DagSemProc.10231.2, author = {Bergeron, Anne and Mixtacki, Julia and Stoye, Jens}, title = {{A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance}}, booktitle = {Structure Discovery in Biology: Motifs, Networks \& Phylogenies}, pages = {1--25}, series = {Dagstuhl Seminar Proceedings (DagSemProc)}, ISSN = {1862-4405}, year = {2010}, volume = {10231}, editor = {Alberto Apostolico and Andreas Dress and Laxmi Parida}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.2}, URN = {urn:nbn:de:0030-drops-26892}, doi = {10.4230/DagSemProc.10231.2}, annote = {Keywords: Comparative genomics, genomic distance computation, HP theory} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail