Conflict Resolution Algorithms for Deep Coalescence Phylogenetic Networks

Authors: Marcin Wawerka, Dawid Dąbkowski, Natalia Rutecka, Agnieszka Mykowiecka, Paweł Górecki




File

LIPIcs.WABI.2021.17.pdf
  • Filesize: 1.12 MB
  • 21 pages


Author Details

Marcin Wawerka
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
Dawid Dąbkowski
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
Natalia Rutecka
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
Agnieszka Mykowiecka
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
Paweł Górecki
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland

Cite As

Marcin Wawerka, Dawid Dąbkowski, Natalia Rutecka, Agnieszka Mykowiecka, and Paweł Górecki. Conflict Resolution Algorithms for Deep Coalescence Phylogenetic Networks. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 17:1-17:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.WABI.2021.17

Abstract

We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence cost. We propose an O(|G||N|)-time dynamic programming algorithm (DP) that computes a lower bound on the cost of an optimal displayed tree, where |G| and |N| are the sizes of G and N, respectively. The algorithm also reports whether the computed cost is exact or only a lower bound. In addition, it returns a set of reticulation edges that corresponds to the obtained cost. If the cost is exact, this set induces an optimal displayed tree that attains the cost. If the cost is a lower bound, the set contains pairs of conflicting edges, that is, edges sharing a reticulation node. Next, we present a conflict resolution algorithm that requires 2^{r+1}-1 invocations of DP in the worst case, where r is the number of reticulations. We propose a similar O(2^k|G||N|)-time algorithm for level-k networks and a branch and bound solution that computes lower and upper bounds on optimal costs. We also show how our algorithms can be extended to a broader class of phylogenetic networks. Despite their exponential worst-case complexity, our solutions perform well on empirical and simulated datasets, thanks to the strategy of resolving internal dissimilarities between gene trees and networks. In particular, experiments on simulated data indicate that the runtime of our solution is Θ(2^{0.543 k}|G||N|) on average. Therefore, our solution is an efficient alternative to the enumeration strategies commonly proposed in the literature and enables analyses of complex networks with dozens of reticulations.
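To make the problem statement concrete, the following is a minimal Python sketch of the exhaustive enumeration baseline that the abstract contrasts with; it is not the authors' DP or conflict resolution algorithm. It enumerates every displayed tree of a tree-child network by fixing one parent for each reticulation (all 2^r choices), scores each tree with the deep coalescence cost, and keeps the minimum. Assumptions, all ours: trees are rooted nested tuples whose leaves are species labels, every species occurs in the gene tree, the network is a hypothetical `children` dictionary (node to list of children), and the cost is the standard extra-lineage count under the LCA mapping (for each non-root species edge, the number of gene lineages leaving it minus one). All function names are hypothetical.

```python
from itertools import product


def parents_map(children):
    """Invert a node -> children dict into a node -> parents dict."""
    par = {}
    for u, kids in children.items():
        for v in kids:
            par.setdefault(v, []).append(u)
    return par


def is_tree_child(children):
    """Tree-child: every internal node has at least one non-reticulation child."""
    par = parents_map(children)
    return all(any(len(par[v]) < 2 for v in kids)
               for kids in children.values() if kids)


def displayed_trees(children, root):
    """Yield one displayed tree (nested tuples) per choice of reticulation parents.

    Assumes a tree-child network: every internal node keeps a tree child, so
    deleting reticulation edges creates no dead ends; only unary nodes are suppressed.
    """
    par = parents_map(children)
    retics = sorted(v for v, ps in par.items() if len(ps) > 1)
    for choice in product(*(par[r] for r in retics)):
        keep = dict(zip(retics, choice))  # reticulation -> retained parent

        def build(u):
            # Keep tree edges, and a reticulation edge only if u is its chosen parent.
            subs = [build(v) for v in children.get(u, []) if keep.get(v, u) == u]
            if not subs:
                return u  # leaf label
            return subs[0] if len(subs) == 1 else tuple(subs)  # suppress unary node

        yield build(root)


def leafset(t):
    """Set of leaf labels below a nested-tuple tree node."""
    return frozenset().union(*map(leafset, t)) if isinstance(t, tuple) else frozenset([t])


def nodes(t):
    """All nodes of a nested-tuple tree in preorder."""
    yield t
    if isinstance(t, tuple):
        for c in t:
            yield from nodes(c)


def dc_cost(gene, species):
    """Deep coalescence cost: sum over non-root species edges of (lineages leaving - 1)."""
    clusters = [leafset(s) for s in nodes(species)]
    top = leafset(species)

    def lca(leaves):
        # Smallest species cluster containing all given leaves (LCA mapping).
        return min((c for c in clusters if leaves <= c), key=len)

    exits = dict.fromkeys((c for c in clusters if c != top), 0)
    for g in nodes(gene):
        if isinstance(g, tuple):
            mp = lca(leafset(g))
            for child in g:
                mc = lca(leafset(child))
                for s in exits:
                    if mc <= s < mp:  # the lineage of `child` leaves species edge s
                        exits[s] += 1
    return sum(max(k - 1, 0) for k in exits.values())


def optimal_displayed_tree(gene, children, root):
    """Exhaustive baseline: score all 2^r switchings, return (cost, tree)."""
    return min(((dc_cost(gene, t), t) for t in displayed_trees(children, root)),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    # Hypothetical tree-child network with one reticulation h (parents x and y).
    N = {"r": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}
    print(is_tree_child(N))                                  # prints True
    print(list(displayed_trees(N, "r")))                     # prints [(('a', 'b'), 'c'), ('a', ('b', 'c'))]
    print(optimal_displayed_tree((("a", "b"), "c"), N, "r"))  # prints (0, (('a', 'b'), 'c'))
```

The loop over `product(...)` is what makes this baseline exponential in r for every input; the abstract's point is that resolving only the conflicts reported by DP avoids paying this cost on typical data.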

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Combinatorial optimization
  • Applied computing → Computational genomics
Keywords
  • Phylogenetic Network
  • Gene Tree
  • Species Tree
  • Deep Coalescence
  • Reticulation
  • Optimal Displayed Tree

