Rapidly Computing the Phylogenetic Transfer Index

Truszkowski, Jakub; Gascuel, Olivier; Swenson, Krister M.

doi:10.4230/LIPIcs.WABI.2019.20

File

LIPIcs.WABI.2019.20.pdf

Filesize: 0.67 MB
12 pages

Document Identifiers

DOI: 10.4230/LIPIcs.WABI.2019.20
URN: urn:nbn:de:0030-drops-110505

Author Details

Jakub Truszkowski

LIRMM, CNRS, Université Montpellier, Montpellier, France

Olivier Gascuel

Unité Bioinformatique Evolutive, Département de Biologie Computationnelle, USR 3756, Institut Pasteur et CNRS, Paris, France
LIRMM, CNRS, Université Montpellier, Montpellier, France

Krister M. Swenson

LIRMM, CNRS, Université Montpellier, Montpellier, France

Acknowledgements

We would like to thank Jeet Sukuraman for his help with the Dendropy package.

Cite AsGet BibTex

Jakub Truszkowski, Olivier Gascuel, and Krister M. Swenson. Rapidly Computing the Phylogenetic Transfer Index. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.20

Abstract

Given trees T and T_o on the same taxon set, the transfer index phi(b,T_o) is the number of taxa that need to be ignored so that the bipartition induced by branch b in T is equal to some bipartition in T_o. Recently, Lemoine et al. [Lemoine et al., 2018] used the transfer index to design a novel bootstrap analysis technique that improves on Felsenstein’s bootstrap on large, noisy data sets. In this work, we propose an algorithm that computes the transfer index for all branches b in T in O(n log^3 n) time, which improves upon the current O(n^2)-time algorithm by Lin, Rajan and Moret [Lin et al., 2012]. Our implementation is able to process pairs of trees with hundreds of thousands of taxa in minutes and considerably speeds up the method of Lemoine et al. on large data sets. We believe our algorithm can be useful for comparing large phylogenies, especially when some taxa are misplaced (e.g. due to horizontal gene transfer, recombination, or reconstruction errors).

Subject Classification

ACM Subject Classification

Applied computing → Bioinformatics
Theory of computation → Design and analysis of algorithms

Keywords

large phylogenies
bootstrap analysis
tree comparison
data structures on trees

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

URL: https://bitbucket.org/thekswenson/rapid_transferindex/.
URL: https://github.com/thekswenson/booster.
GS Brodal, R Fagerberg, and CNS Pedersen. Computing the quartet distance between evolutionary trees in time O(n log n). Algorithmica, 38(2):377-395, 2004.
Daniel G Brown and Jakub Truszkowski. Fast phylogenetic tree reconstruction using locality-sensitive hashing. In Algorithms in Bioinformatics, pages 14-29. Springer, 2012.
Daniel G Brown and Jakub Truszkowski. Fast error-tolerant quartet phylogeny algorithms. Theoretical Computer Science, 483:104-114, 2013.
D Bryant, J Tsang, PE Kearney, and M Li. Computing the quartet distance between evolutionary trees. In Proceedings of SODA 2000, pages 285-286, 2000.
Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein, et al. Introduction to algorithms, volume 2. MIT press Cambridge, 2001.
M Dávila Felipe, J-B Domelevo Entfellner, F Lemoine, J Truszkowski, and O Gascuel. Distribution and asymptotic behavior of the phylogenetic transfer distance. Journal of Mathematical Biology, April 2019.
WHE Day. Optimal algorithms for comparing trees with labeled leaves. Journal of classification, 2(1):7-28, 1985.
Péter L Erdös, Michael A Steel, László A Székely, and Tandy J Warnow. A few logs suffice to build (almost) all trees: part II. Theoretical Computer Science, 221(1-2):77-118, 1999.
Joseph Felsenstein. Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39(4):783-791, 1985.
K Gori, T Suchan, N Alvarez, N Goldman, and C Dessimoz. Clustering genes of common evolutionary history. Molecular biology and evolution, 33(6):1590-1605, 2016.
F Lemoine, J-B Domelevo Entfellner, E Wilkinson, D Correia, M Dávila Felipe, T de Oliveira, and O Gascuel. Renewing Felsenstein’s phylogenetic bootstrap in the era of big data. Nature, 556(7702):452, 2018.
Yu Lin, Vaibhav Rajan, and Bernard ME Moret. A metric for phylogenetic trees based on matching. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(4):1014-1022, 2012.
M Pawlik and N Augsten. RTED: a robust algorithm for the tree edit distance. Proceedings of the VLDB Endowment, 5(4):334-345, 2011.
Morgan N Price, Paramvir S Dehal, and Adam P Arkin. FastTree 2-approximately maximum-likelihood trees for large alignments. PloS one, 5(3):e9490, 2010.
DD Sleator and RE Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362-391, 1983.
Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
Jeet Sukumaran and Mark T Holder. DendroPy: a Python library for phylogenetic computing. Bioinformatics, 26(12):1569-1571, 2010.