Empirical Performance of Tree-Based Inference of Phylogenetic Networks

Authors: Zhen Cao, Jiafan Zhu, Luay Nakhleh




Author Details

Zhen Cao
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
Jiafan Zhu
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
Luay Nakhleh
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA

Acknowledgements

The authors would like to thank Dr. Huw A. Ogilvie for his help.

Cite As

Zhen Cao, Jiafan Zhu, and Luay Nakhleh. Empirical Performance of Tree-Based Inference of Phylogenetic Networks. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 21:1-21:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.WABI.2019.21

Abstract

Phylogenetic networks extend the phylogenetic tree structure and allow for modeling vertical and horizontal evolution in a single framework. Statistical inference of phylogenetic networks is computationally prohibitive and currently limited to small networks. An approach that could significantly improve phylogenetic network space exploration is to first infer an evolutionary tree of the species under consideration, and then augment the tree into a network by adding a set of "horizontal" edges to better fit the data.
In this paper, we study the performance of such an approach on networks generated under a birth-hybridization model and explore its feasibility as an alternative to approaches that search the phylogenetic network space directly (without relying on a fixed underlying tree). We find that the concatenation method does poorly at obtaining a "backbone" tree that could be augmented into the correct network, whereas the popular species tree inference method ASTRAL does significantly better at this task. We then evaluate the tree-to-network augmentation phase under the minimizing deep coalescence and pseudo-likelihood criteria. We find that although this approach is much faster than searching the network space directly, its accuracy is much poorer, even when the backbone tree is a good starting tree.
Our results show that tree-based inference of phylogenetic networks could yield very poor results. As exploration of the network space directly in search of maximum likelihood estimates or a representative sample of the posterior is very expensive, significant improvements to the computational complexity of phylogenetic network inference are imperative if analyses of large data sets are to be performed. We show that a recently developed divide-and-conquer approach significantly outperforms tree-based inference in terms of accuracy, albeit still at a higher computational cost.

Subject Classification

ACM Subject Classification
  • Applied computing → Genomics
  • Applied computing → Computational biology
Keywords
  • Phylogenetic networks
  • Species tree
  • Tree-based networks
  • Multi-locus phylogeny

