Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time

Authors: Shayesteh Arasti, Siavash Mirarab



File

LIPIcs.WABI.2023.4.pdf
  • Filesize: 1.76 MB
  • 20 pages

Author Details

Shayesteh Arasti
  • Computer Science and Engineering Department, University of California, San Diego, CA, USA
Siavash Mirarab
  • Electrical and Computer Engineering Department, University of California, San Diego, CA, USA

Cite As

Shayesteh Arasti and Siavash Mirarab. Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.WABI.2023.4

Abstract

Finding a tree with the minimum total distance to a given set of trees (the median tree) is increasingly needed in phylogenetics. Defining tree distance as the number of induced four-taxon unrooted (i.e., quartet) trees with different topologies, the median of a set of gene trees is a statistically consistent estimator of the species tree under several models of discordance between gene trees and species trees. Because of this, median trees defined with quartet distance are widely used in practice for species tree inference. Nevertheless, the problem is NP-hard and the widely used solutions are heuristics. In this paper, we pave the way for a new type of heuristic solution to this problem. We show that the optimal place to add a subtree of size m onto a tree with n leaves can be found in time that grows quasi-linearly with n and is nearly independent of m. This algorithm can be used to perform subtree prune and regraft (SPR) moves efficiently, which in turn enables hill-climbing heuristic search for the optimal tree. In exploratory experiments, we show that our algorithm can improve the quartet score of trees obtained using the existing widely used methods.
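
To make the quartet distance and the median-tree objective in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it enumerates all four-taxon subsets, so its cost grows with the fourth power of the number of leaves, whereas the paper targets sub-quadratic time. The nested-tuple tree encoding and the function names (`clades`, `quartet_topology`, `quartet_distance`, `total_distance`) are assumptions made for illustration, and both trees are assumed to share the same leaf set.

```python
from itertools import combinations


def clades(tree):
    """Return the leaf set below every internal node of a nested-tuple tree.

    Assumed encoding (hypothetical): a leaf is a string, an internal node is
    a tuple of children. The rooting is arbitrary; each clade C stands for
    the unrooted split C | (all other leaves).
    """
    found = []

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        under = frozenset().union(*(leaves(child) for child in node))
        found.append(under)
        return under

    leaves(tree)
    return found


def quartet_topology(tree_clades, quartet):
    """Return the induced topology, e.g. {{a, b}, {c, d}} for ab|cd.

    Returns None when no split separates the four taxa two against two,
    i.e. the induced quartet is unresolved.
    """
    q = frozenset(quartet)
    for clade in tree_clades:
        inside = clade & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_distance(tree_a, tree_b):
    """Count the four-taxon subsets whose induced topologies differ."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    taxa = sorted(max(clades_a, key=len))  # the root clade holds every leaf
    return sum(
        quartet_topology(clades_a, q) != quartet_topology(clades_b, q)
        for q in combinations(taxa, 4)
    )


def total_distance(candidate, gene_trees):
    """Median-tree objective: summed quartet distance to all input trees."""
    return sum(quartet_distance(candidate, g) for g in gene_trees)


tree_a = ((("a", "b"), "c"), ("d", "e"))
tree_b = ((("a", "c"), "b"), ("d", "e"))
print(quartet_distance(tree_a, tree_b))          # 2
print(total_distance(tree_a, [tree_a, tree_b]))  # 2
```

In the example, the two trees disagree only on the quartets {a, b, c, d} and {a, b, c, e}, so the distance is 2. A hill-climbing search in the sense of the abstract would repeatedly apply SPR moves that lower `total_distance`; evaluating every candidate move with a brute-force routine like this one would be far too slow for large trees, which is the cost the paper's placement algorithm is meant to avoid.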

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • Phylogenetics
  • Gene tree discordance
  • Quartet score
  • Quartet distance
  • Subtree prune and regraft
  • Tree search
  • ASTRAL
