Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees

Authors: Xilin Yu, Thien Le, Sarah Christensen, Erin K. Molloy, Tandy Warnow

Author Details

Xilin Yu
  • Amazon AWS, Seattle, WA, USA
Thien Le
  • Department of EECS, Massachusetts Institute of Technology, Cambridge, MA, USA
Sarah Christensen
  • Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Erin K. Molloy
  • Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Tandy Warnow
  • Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Acknowledgements

This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.

Cite As

Xilin Yu, Thien Le, Sarah Christensen, Erin K. Molloy, and Tandy Warnow. Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.WABI.2020.15

Abstract

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species and spanning all life on earth. However, constructing the Tree of Life is enormously computationally challenging, as the most accurate current methods are all either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving the scalability and accuracy of phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and the subset trees are then merged using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees under the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS, a greedy heuristic that repeatedly applies Exact-RFS-2 to pairs of trees until all the trees are merged into a single supertree. We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS.
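
To make the RFS criterion concrete, the sketch below scores a candidate supertree against a set of source trees by summing the (unnormalized) Robinson-Foulds distances between each source tree and the supertree restricted to that source tree's leaf set. It is only an illustration of the criterion, not the Exact-RFS-2 or GreedyRFS algorithms. It assumes that each unrooted tree is given as its set of non-trivial bipartitions, and the function names (`split`, `restrict`, `rf_distance`, `rfs_score`) are hypothetical.

```python
def split(side, leaves):
    """Canonical bipartition of `leaves` with `side` as one of its two parts."""
    leaves = frozenset(leaves)
    a = frozenset(side) & leaves
    return frozenset({a, leaves - a})


def restrict(splits, leaves):
    """Restrict bipartitions to a leaf subset, dropping those that become trivial."""
    leaves = frozenset(leaves)
    restricted = set()
    for s in splits:
        a, b = (part & leaves for part in s)
        # A bipartition is informative only if both parts keep at least two leaves.
        if len(a) >= 2 and len(b) >= 2:
            restricted.add(frozenset({a, b}))
    return restricted


def rf_distance(splits1, splits2):
    """Unnormalized RF distance: bipartitions found in exactly one of the two trees."""
    return len(set(splits1) ^ set(splits2))


def rfs_score(supertree_splits, source_trees):
    """Sum of RF distances between each source tree and the restricted supertree.

    `source_trees` is a list of (leaf_set, bipartition_set) pairs.
    """
    return sum(
        rf_distance(restrict(supertree_splits, leaves), splits)
        for leaves, splits in source_trees
    )


# Candidate supertree on six species: ((a,b),c,(d,(e,f))) viewed as unrooted.
all_leaves = "abcdef"
supertree = {split("ab", all_leaves), split("abc", all_leaves), split("ef", all_leaves)}

# Two overlapping source trees that agree with the supertree, and one that conflicts.
t1 = (frozenset("abcd"), {split("ab", "abcd")})
t2 = (frozenset("cdef"), {split("cd", "cdef")})
t3 = (frozenset("abcd"), {split("ac", "abcd")})

print(rfs_score(supertree, [t1, t2]))
print(rfs_score(supertree, [t1, t2, t3]))
```

A supertree that displays every source tree scores 0 under this criterion, and each conflicting source tree adds to the score. An RFS method searches for a tree on the full species set that minimizes this total.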

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Graph algorithms
Keywords
  • supertrees
  • divide-and-conquer
  • phylogeny estimation
