Bridging Disparate Views on the DCJ-Indel Model for a Capping-Free Solution to the Natural Distance Problem

Author Leonard Bohnenkämper




File

LIPIcs.WABI.2023.22.pdf
  • Filesize: 7.45 MB
  • 18 pages


Author Details

Leonard Bohnenkämper
  • Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany

Acknowledgements

I thank my supervisor Jens Stoye and Marília D. V. Braga for their helpful comments in discussions regarding notation, terms and overall structure of the paper. I furthermore thank Daniel Doerr for making me switch from water to lava and all members of the genome informatics group of Bielefeld University for their advice on how to present my research in this manuscript.

Cite As

Leonard Bohnenkämper. Bridging Disparate Views on the DCJ-Indel Model for a Capping-Free Solution to the Natural Distance Problem. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.WABI.2023.22

Abstract

One of the most fundamental problems in genome rearrangement is the (genomic) distance problem. It is typically formulated as finding the minimum number of rearrangements under a given model that are needed to transform one genome into another. A powerful multi-chromosomal model is the Double Cut and Join (DCJ) model. While the DCJ model cannot deal with some situations that occur in practice, such as duplicated or lost regions, it has been extended over time to handle these cases. First, it was extended to the DCJ-indel model, solving the issue of lost markers. Later, ILP solutions were developed for so-called natural genomes, in which each genomic region may occur an arbitrary number of times, making it possible in theory to solve the distance problem for any pair of genomes. However, some theoretical and practical issues remained unsolved. On the theoretical side, there exist two disparate views of the DCJ-indel model, motivated in the same way but with different conceptualizations that could not be reconciled so far. On the practical side, while the solutions for natural genomes typically perform well on telomere-to-telomere resolved genomes, they have been shown in recent years to quickly lose performance on genomes with a large number of contigs or linear chromosomes. This has been linked to a particular technique, named capping, that increases the solution space superexponentially. Recently, we introduced a new conceptualization of the DCJ-indel model within the context of another rearrangement problem. In this manuscript, we apply this new conceptualization to the distance problem. In doing so, we uncover the relation between the disparate conceptualizations of the DCJ-indel model. We are also able to derive an ILP solution to the distance problem that does not rely on capping and therefore significantly improves upon the performance of previous solutions for genomes with high numbers of contigs, while still solving the problem exactly.
To the best of our knowledge, our approach is the first to allow an exact computation of the DCJ-indel distance for natural genomes with large numbers of linear chromosomes. We demonstrate its performance advantage as well as its limitations in comparison to an existing solution on simulated genomes, and show its practical usefulness in an analysis of 11 Drosophila genomes.
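As background for the distance problem described above, here is a minimal sketch of the classical DCJ distance in its simplest setting: two genomes with identical gene content and no duplicate genes. It uses the well-known adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths (Bergeron, Mixtacki and Stoye, 2006). This is not the capping-free ILP of this paper and handles neither indels nor duplicated regions; the genome encoding (signed integer lists tagged "linear" or "circular") and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical encoding: a genome is a list of (kind, genes) pairs, where kind is
# "linear" or "circular" and genes is a list of signed integers (-g = reversed g).


def extremities(gene):
    """Return the two extremities of a signed gene in left-to-right reading order."""
    tail, head = (abs(gene), "t"), (abs(gene), "h")
    return (tail, head) if gene > 0 else (head, tail)


def vertices(genome):
    """Map every extremity to the id of its adjacency or telomere vertex."""
    names = [abs(g) for _, genes in genome for g in genes]
    if len(names) != len(set(names)):
        raise ValueError("duplicate genes are not supported in this sketch")
    owner, next_id = {}, 0
    for kind, genes in genome:
        ext = [e for g in genes for e in extremities(g)]
        # Adjacencies join the right extremity of a gene to the left one of the next.
        pairs = [(ext[i], ext[i + 1]) for i in range(1, len(ext) - 1, 2)]
        if kind == "circular":
            pairs.append((ext[-1], ext[0]))  # close the circle
        else:
            for telomere in (ext[0], ext[-1]):  # singleton vertices at both ends
                owner[telomere] = next_id
                next_id += 1
        for a, b in pairs:
            owner[a] = owner[b] = next_id
            next_id += 1
    return owner, next_id


def dcj_distance(genome_a, genome_b):
    """Classical DCJ distance N - (C + I/2) computed via the adjacency graph."""
    owner_a, n_a = vertices(genome_a)
    owner_b, n_b = vertices(genome_b)
    if set(owner_a) != set(owner_b):
        raise ValueError("genomes must have identical gene content")
    parent = list(range(n_a + n_b))  # union-find: vertices of A, then vertices of B

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each extremity yields one edge between its vertex in A and its vertex in B.
    for x in owner_a:
        ra, rb = find(owner_a[x]), find(n_a + owner_b[x])
        if ra != rb:
            parent[ra] = rb

    edges, nodes = defaultdict(int), defaultdict(int)
    for x in owner_a:
        edges[find(owner_a[x])] += 1
    for v in range(n_a + n_b):
        nodes[find(v)] += 1
    # Every vertex has degree 1 or 2, so a component is a cycle (edges == vertices)
    # or a path (edges == vertices - 1); a path is odd if its edge count is odd.
    cycles = sum(1 for r in nodes if edges[r] == nodes[r])
    odd_paths = sum(1 for r in nodes if edges[r] < nodes[r] and edges[r] % 2 == 1)
    n_genes = len(owner_a) // 2
    return n_genes - (cycles + odd_paths // 2)
```

For instance, `dcj_distance([("linear", [1, 2, 3])], [("linear", [1, -2, 3])])` evaluates to 1, corresponding to a single inversion of gene 2. Union-find is used only to keep the component counting short; any graph traversal would do. Once lost markers or duplicated genes must be handled, this closed formula no longer applies directly, which is where the DCJ-indel model and the ILP formulations discussed in the abstract come in.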

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • Comparative Genomics
  • Genome Rearrangement
  • Double-Cut-And-Join
  • Indels
  • Integer Linear Programming
  • Capping
