Reconstructing Rearrangement Phylogenies of Natural Genomes

Authors: Leonard Bohnenkämper, Jens Stoye, Daniel Dörr



Author Details

Leonard Bohnenkämper
  • Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany
Jens Stoye
  • Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany
Daniel Dörr
  • Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Germany
  • German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research Germany, and Center for Digital Medicine, Heinrich Heine University Düsseldorf, Germany

Acknowledgements

LB thanks Luca Parmigiani for helping with some C++ issues at a critical moment. DD thanks Cedric Chauve for providing the Anopheles dataset.

Cite As

Leonard Bohnenkämper, Jens Stoye, and Daniel Dörr. Reconstructing Rearrangement Phylogenies of Natural Genomes. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.WABI.2024.12

Abstract

We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location; this is known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large-scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success. Here, we improve on that earlier work, giving a highly optimized integer linear program (ILP) that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement over the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large. We benchmark our method on simulated and real data. On simulated phylogenies, we observe a considerable performance improvement on problems that include linear chromosomes. Even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage is also evident in an analysis of seven Anopheles taxa.
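
The genome representation described in the abstract can be illustrated with a small Python sketch. It is not taken from the paper or its accompanying software: the names `Chromosome`, `extremities`, `adjacencies`, and `TELOMERE` are hypothetical, and the encoding of marker copies by their position is an assumption made only for this example. The sketch stores a chromosome as a sequence of oriented markers (duplicates allowed) and lists its adjacencies, that is, the pairs of neighbouring marker extremities on which DCJ-style rearrangements operate.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A marker is a (name, orientation) pair; orientation is +1 or -1.
Marker = Tuple[str, int]
# An extremity is (name, occurrence index, "t" for tail or "h" for head).
Extremity = Tuple[str, int, str]

# Hypothetical placeholder for the open end of a linear chromosome.
TELOMERE: Extremity = ("TELOMERE", -1, "")


@dataclass
class Chromosome:
    markers: List[Marker]
    circular: bool = False


def extremities(marker: Marker, index: int) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of one marker occurrence,
    in reading direction. The index keeps duplicate markers apart."""
    name, sign = marker
    tail, head = (name, index, "t"), (name, index, "h")
    return (tail, head) if sign > 0 else (head, tail)


def adjacencies(chrom: Chromosome) -> List[Tuple[Extremity, Extremity]]:
    """List adjacencies between neighbouring marker extremities.
    A circular chromosome is closed by one extra adjacency; a linear one
    gets a telomere adjacency at each end."""
    ends = [extremities(m, i) for i, m in enumerate(chrom.markers)]
    if not ends:
        return []
    adj = [(right, left) for (_, right), (left, _) in zip(ends, ends[1:])]
    if chrom.circular:
        adj.append((ends[-1][1], ends[0][0]))
    else:
        adj.append((TELOMERE, ends[0][0]))
        adj.append((ends[-1][1], TELOMERE))
    return adj


# Marker "a" occurs twice; "b" is in reverse orientation.
linear = Chromosome([("a", 1), ("b", -1), ("a", 1)], circular=False)
circular = Chromosome([("a", 1), ("b", -1), ("a", 1)], circular=True)
```

With n markers, a linear chromosome yields n + 1 adjacencies (including two telomeric ones) and a circular chromosome yields n. That the same marker sequence admits both readings gives a sense of why the unknown chromosomal structure of ancestral genomes enlarges the solution space.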

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
  • Theory of computation → Integer programming
Keywords
  • genome rearrangement
  • ancestral reconstruction
  • small parsimony
  • integer linear programming
  • double-cut-and-join
