Cache Oblivious Algorithms for Computing the Triplet Distance Between Trees

Authors Gerth Stølting Brodal, Konstantinos Mampentzidis



PDF
Thumbnail PDF

File

LIPIcs.ESA.2017.21.pdf
  • Filesize: 0.65 MB
  • 14 pages

Document Identifiers

Author Details

Gerth Stølting Brodal
Konstantinos Mampentzidis

Cite As Get BibTex

Gerth Stølting Brodal and Konstantinos Mampentzidis. Cache Oblivious Algorithms for Computing the Triplet Distance Between Trees. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ESA.2017.21

Abstract

We study the problem of computing the triplet distance between two rooted unordered trees with n labeled leafs. Introduced by Dobson 1975, the triplet distance is the number of leaf triples that induce different topologies in the two trees. The current theoretically best algorithm is an O(nlogn) time algorithm by Brodal et al. [SODA 2013]. Recently Jansson et al. proposed a new algorithm that, while slower in theory, requiring O(n log^3 n) time, in practice it outperforms the theoretically faster O(n log n) algorithm. Both algorithms do not scale to external memory.

We present two cache oblivious algorithms that combine the best of both worlds. The first algorithm is for the case when the two input trees are binary trees and the second a generalized algorithm for two input trees of arbitrary degree. Analyzed in the RAM model, both algorithms require O(n log n) time, and in the cache oblivious model O(n/B log_{2}(n/M)) I/Os. Their relative simplicity and the fact that they scale to external memory makes them achieve the best practical performance. We note that these are the first algorithms that scale to external memory, both in theory and practice, for this problem.

Subject Classification

Keywords
  • Phylogenetic tree
  • tree comparison
  • triplet distance
  • cache oblivious algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. M.S. Bansal, J. Dong, and D. Fernández-Baca. Comparing and aggregating partially resolved trees. Theoretical Computer Science, 412(48):6634-6652, 2011. Google Scholar
  2. V. Berry and O. Gascuel. Inferring evolutionary trees with strong combinatorial evidence. Theoretical Computer Science, 240(2):271-298, 2000. URL: http://dx.doi.org/10.1016/S0304-3975(99)00235-2.
  3. G.S. Brodal, R. Fagerberg, C.N.S. Pedersen, T. Mailund, and A. Sand. Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree. 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1814-1832, 2013. URL: http://dx.doi.org/10.1137/1.9781611973105.130.
  4. G.S Brodal and K. Mampentzidis. Cache oblivious algorithms for computing the triplet distance between trees. Computing Research Repository, abs/1706.10284, 2017. Google Scholar
  5. D.E. Critchlow, D.K. Pearl, and C.L. Qian. The Triples Distance for Rooted Bifurcating Phylogenetic Trees. Systematic Biology, 45(3):323, 1996. URL: http://dx.doi.org/10.1093/sysbio/45.3.323.
  6. A.J. Dobson. Comparing the shapes of trees. Combinatorial Mathematics III. Lecture Notes in Mathematics, pages 95-100, 1975. URL: http://dx.doi.org/10.1007/BFb0069548.
  7. G.F. Estabrook, F.R. McMorris, and C.A. Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 34(2):193-200, 1985. URL: http://dx.doi.org/10.2307/2413326.
  8. M. Frigo, C.E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. 40th Annual IEEE Symposium on Foundations of Computer Science, pages 285-297, 1999. URL: http://dx.doi.org/10.1109/SFFCS.1999.814600.
  9. M.K. Holt, J. Johansen, and G.S. Brodal. On the scalability of computing triplet and quartet distances. 16th Workshop on Algorithm Engineering and Experiments, pages 9-19, 2014. URL: http://dx.doi.org/10.1137/1.9781611973198.2.
  10. J. Jansson and R. Rajaby. A more practical algorithm for the rooted triplet distance. International Conference on Algorithms for Computational Biology, pages 109-125, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21233-3_9.
  11. J. Jansson and R. Rajaby. A More Practical Algorithm for the Rooted Triplet Distance. Journal of Computational Biology, 24(2):106-126, 2017. URL: http://dx.doi.org/10.1089/cmb.2016.0185.
  12. D.F. Robinson and L.R. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1):131-147, 1981. URL: http://dx.doi.org/http://dx.doi.org/10.1016/0025-5564(81)90043-2.
  13. N. Saitou and M. Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4):406, 1987. URL: http://dx.doi.org/10.1093/oxfordjournals.molbev.a040454.
  14. A. Sand, G.S. Brodal, R. Fagerberg, C.N.S. Pedersen, and T. Mailund. A practical O(n log² n) time algorithm for computing the triplet distance on binary trees. BMC Bioinformatics, 14(2):S18, 2013. URL: http://dx.doi.org/10.1186/1471-2105-14-S2-S18.
  15. A. Sand, M.K. Holt, J. Johansen, G.S. Brodal, T. Mailund, and C.N.S. Pedersen. tqdist: A library for computing the quartet and triplet distances between binary or general trees. Bioinformatics, 30(14):2079, 2014. URL: http://dx.doi.org/10.1093/bioinformatics/btu157.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail