Rapidly Computing the Phylogenetic Transfer Index

Authors Jakub Truszkowski , Olivier Gascuel, Krister M. Swenson



PDF
Thumbnail PDF

File

LIPIcs.WABI.2019.20.pdf
  • Filesize: 0.67 MB
  • 12 pages

Document Identifiers

Author Details

Jakub Truszkowski
  • LIRMM, CNRS, Université Montpellier, Montpellier, France
Olivier Gascuel
  • Unité Bioinformatique Evolutive, Département de Biologie Computationnelle, USR 3756, Institut Pasteur et CNRS, Paris, France
  • LIRMM, CNRS, Université Montpellier, Montpellier, France
Krister M. Swenson
  • LIRMM, CNRS, Université Montpellier, Montpellier, France

Acknowledgements

We would like to thank Jeet Sukuraman for his help with the Dendropy package.

Cite AsGet BibTex

Jakub Truszkowski, Olivier Gascuel, and Krister M. Swenson. Rapidly Computing the Phylogenetic Transfer Index. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.20

Abstract

Given trees T and T_o on the same taxon set, the transfer index phi(b,T_o) is the number of taxa that need to be ignored so that the bipartition induced by branch b in T is equal to some bipartition in T_o. Recently, Lemoine et al. [Lemoine et al., 2018] used the transfer index to design a novel bootstrap analysis technique that improves on Felsenstein’s bootstrap on large, noisy data sets. In this work, we propose an algorithm that computes the transfer index for all branches b in T in O(n log^3 n) time, which improves upon the current O(n^2)-time algorithm by Lin, Rajan and Moret [Lin et al., 2012]. Our implementation is able to process pairs of trees with hundreds of thousands of taxa in minutes and considerably speeds up the method of Lemoine et al. on large data sets. We believe our algorithm can be useful for comparing large phylogenies, especially when some taxa are misplaced (e.g. due to horizontal gene transfer, recombination, or reconstruction errors).

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
  • Theory of computation → Design and analysis of algorithms
Keywords
  • large phylogenies
  • bootstrap analysis
  • tree comparison
  • data structures on trees

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. URL: https://bitbucket.org/thekswenson/rapid_transferindex/.
  2. URL: https://github.com/thekswenson/booster.
  3. GS Brodal, R Fagerberg, and CNS Pedersen. Computing the quartet distance between evolutionary trees in time O(n log n). Algorithmica, 38(2):377-395, 2004. Google Scholar
  4. Daniel G Brown and Jakub Truszkowski. Fast phylogenetic tree reconstruction using locality-sensitive hashing. In Algorithms in Bioinformatics, pages 14-29. Springer, 2012. Google Scholar
  5. Daniel G Brown and Jakub Truszkowski. Fast error-tolerant quartet phylogeny algorithms. Theoretical Computer Science, 483:104-114, 2013. Google Scholar
  6. D Bryant, J Tsang, PE Kearney, and M Li. Computing the quartet distance between evolutionary trees. In Proceedings of SODA 2000, pages 285-286, 2000. Google Scholar
  7. Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein, et al. Introduction to algorithms, volume 2. MIT press Cambridge, 2001. Google Scholar
  8. M Dávila Felipe, J-B Domelevo Entfellner, F Lemoine, J Truszkowski, and O Gascuel. Distribution and asymptotic behavior of the phylogenetic transfer distance. Journal of Mathematical Biology, April 2019. Google Scholar
  9. WHE Day. Optimal algorithms for comparing trees with labeled leaves. Journal of classification, 2(1):7-28, 1985. Google Scholar
  10. Péter L Erdös, Michael A Steel, László A Székely, and Tandy J Warnow. A few logs suffice to build (almost) all trees: part II. Theoretical Computer Science, 221(1-2):77-118, 1999. Google Scholar
  11. Joseph Felsenstein. Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39(4):783-791, 1985. Google Scholar
  12. K Gori, T Suchan, N Alvarez, N Goldman, and C Dessimoz. Clustering genes of common evolutionary history. Molecular biology and evolution, 33(6):1590-1605, 2016. Google Scholar
  13. F Lemoine, J-B Domelevo Entfellner, E Wilkinson, D Correia, M Dávila Felipe, T de Oliveira, and O Gascuel. Renewing Felsenstein’s phylogenetic bootstrap in the era of big data. Nature, 556(7702):452, 2018. Google Scholar
  14. Yu Lin, Vaibhav Rajan, and Bernard ME Moret. A metric for phylogenetic trees based on matching. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(4):1014-1022, 2012. Google Scholar
  15. M Pawlik and N Augsten. RTED: a robust algorithm for the tree edit distance. Proceedings of the VLDB Endowment, 5(4):334-345, 2011. Google Scholar
  16. Morgan N Price, Paramvir S Dehal, and Adam P Arkin. FastTree 2-approximately maximum-likelihood trees for large alignments. PloS one, 5(3):e9490, 2010. Google Scholar
  17. DD Sleator and RE Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362-391, 1983. Google Scholar
  18. Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016. Google Scholar
  19. Jeet Sukumaran and Mark T Holder. DendroPy: a Python library for phylogenetic computing. Bioinformatics, 26(12):1569-1571, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail