Reconstructing Phylogenetic Networks via Cherry Picking and Machine Learning

Authors Giulia Bernardini , Leo van Iersel, Esther Julien, Leen Stougie



PDF
Thumbnail PDF

File

LIPIcs.WABI.2022.16.pdf
  • Filesize: 1.27 MB
  • 22 pages

Document Identifiers

Author Details

Giulia Bernardini
  • University of Trieste, Italy
  • CWI, Amsterdam, The Netherlands
Leo van Iersel
  • Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands
Esther Julien
  • Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands
Leen Stougie
  • CWI and Vrije Universiteit, Amsterdam, The Netherlands
  • Erable, France

Acknowledgements

The authors thank Remie Janssen for providing ideas and preliminary code for the randomised heuristics, and Yukihiro Murakami for the inspiring discussions.

Cite AsGet BibTex

Giulia Bernardini, Leo van Iersel, Esther Julien, and Leen Stougie. Reconstructing Phylogenetic Networks via Cherry Picking and Machine Learning. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 16:1-16:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.WABI.2022.16

Abstract

Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of heuristics that are guaranteed to produce a network containing each of the input trees, for practical-size datasets. The main contribution of this paper is the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. This is one of the first applications of machine learning to phylogenetic studies, and we show its promise with a proof-of-concept experimental study conducted on both simulated and real data consisting of binary trees with no missing taxa.

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
Keywords
  • Phylogenetics
  • Hybridization
  • Cherry Picking
  • Machine Learning
  • Heuristic

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Benjamin Albrecht. Computing all hybridization networks for multiple binary phylogenetic input trees. BMC bioinformatics, 16(1):1-15, 2015. Google Scholar
  2. Dana Azouri, Shiran Abadi, Yishay Mansour, Itay Mayrose, and Tal Pupko. Harnessing machine learning to guide phylogenetic-tree search algorithms. Nature communications, 12(1):1-9, 2021. Google Scholar
  3. Robert G Beiko. Telling the whole story in a 10,000-genome world. Biology Direct, 6(1):1-36, 2011. Google Scholar
  4. Magnus Bordewich and Charles Semple. Computing the minimum number of hybridization events for a consistent evolutionary history. Discrete Applied Mathematics, 155(8):914-928, 2007. Google Scholar
  5. Sander Borst, Leo van Iersel, Mark Jones, and Steven Kelk. New FPT algorithms for finding the temporal hybridization number for sets of phylogenetic trees. Algorithmica, 2022. Google Scholar
  6. Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM J. Comput., 13(2):338-355, 1984. URL: https://doi.org/10.1137/0213024.
  7. Peter J Humphries, Simone Linz, and Charles Semple. Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies. Bulletin of mathematical biology, 75(10):1879-1890, 2013. Google Scholar
  8. Remie Janssen and Yukihiro Murakami. On cherry-picking and network containment. Theoretical Computer Science, 856:121-150, 2021. Google Scholar
  9. Sudhir Kumar and Sudip Sharma. Evolutionary sparse learning for phylogenomics. Molecular Biology and Evolution, 38(11):4674-4682, 2021. Google Scholar
  10. Simone Linz and Charles Semple. Attaching leaves and picking cherries to characterise the hybridisation number for a set of phylogenies. Advances in Applied Mathematics, 105:102-129, 2019. Google Scholar
  11. Sajad Mirzaei and Yufeng Wu. Fast construction of near parsimonious hybridization networks for multiple phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13(3):565-570, 2015. Google Scholar
  12. Fabio Pardi and Celine Scornavacca. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS computational biology, 11(4):e1004135, 2015. Google Scholar
  13. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011. Google Scholar
  14. Joan Carles Pons, Celine Scornavacca, and Gabriel Cardona. Generation of level-k LGT networks. IEEE/ACM transactions on computational biology and bioinformatics, 17(1):158-164, 2019. Google Scholar
  15. Charles Semple and Gerry Toft. Trinets encode orchard phylogenetic networks. Journal of Mathematical Biology, 83(3):1-20, 2021. Google Scholar
  16. Claudia Solís-Lemus, Paul Bastide, and Cécile Ané. Phylonetworks: a package for phylogenetic networks. Molecular biology and evolution, 34(12):3292-3298, 2017. Google Scholar
  17. Leo van Iersel, Remie Janssen, Mark Jones, and Yukihiro Murakami. Orchard networks are trees with additional horizontal arcs. Bulletin of Mathematical Biology, 2022. to appear. Google Scholar
  18. Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, and Norbert Zeh. A unifying characterization of tree-based networks and orchard networks using cherry covers. Advances in Applied Mathematics, 129:102222, 2021. URL: https://doi.org/10.1016/j.aam.2021.102222.
  19. Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami, and Norbert Zeh. A practical fixed-parameter algorithm for constructing tree-child networks from multiple binary trees. Algorithmica, 84:917-960, 2022. Google Scholar
  20. Dingqiao Wen, Yun Yu, Jiafan Zhu, and Luay Nakhleh. Inferring phylogenetic networks using phylonet. Systematic biology, 67(4):735-740, 2018. Google Scholar
  21. Yufeng Wu. Close lower and upper bounds for the minimum reticulate network of multiple phylogenetic trees. Bioinformatics, 26(12):i140-i148, 2010. Google Scholar
  22. Tujin Zhu and Yunpeng Cai. Applying neural network to reconstruction of phylogenetic tree. In ICMLC 2021: 13th International Conference on Machine Learning and Computing, Shenzhen China, 26 February, 2021- 1 March, 2021, pages 146-152. ACM, 2021. URL: https://doi.org/10.1145/3457682.3457704.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail