Computing Maximum Agreement Forests without Cluster Partitioning is Folly

Authors Zhijiang Li, Norbert Zeh



PDF
Thumbnail PDF

File

LIPIcs.ESA.2017.56.pdf
  • Filesize: 0.67 MB
  • 14 pages

Document Identifiers

Author Details

Zhijiang Li
Norbert Zeh

Cite AsGet BibTex

Zhijiang Li and Norbert Zeh. Computing Maximum Agreement Forests without Cluster Partitioning is Folly. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 56:1-56:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ESA.2017.56

Abstract

Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, but they perform remarkably well in practice. We shed light on why this is the case. Our results show that, probably unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. The choice of the algorithm applied to solve individual clusters also significantly impacts performance, even though our limited experiment to evaluate this produced no clear winner; depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance.
Keywords
  • fixed-parameter tractability
  • agreement forests
  • hybridization
  • subtree prune-and-regraft

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Benjamin Albrecht, Céline Scornavacca, Alberto Cenci, and Daniel H. Huson. Fast computation of minimum hybridization networks. Bioinformatics, 28(2):191-197, 2012. Google Scholar
  2. M. Baroni, S. Grünewald, V. Moulton, and C. Semple. Bounding the number of hybridisation events for a consistent evolutionary history. Journal of Mathematical Biology, 51(2):171-182, 2005. Google Scholar
  3. M. Baroni, C. Semple, and M. Steel. Hybrids in real time. Systematic Biology, 55:46-56, 2006. Google Scholar
  4. Robert G. Beiko. Telling the whole story in a 10,000-genome world. Biology Direct, 6(1):34, 2011. Google Scholar
  5. M. Bordewich and C. Semple. Computing the minimum number of hybridization events for a consistent evolutionary history. Discrete Applied Mathematics, 155(8):914-928, 2007. Google Scholar
  6. Magnus Bordewich, Céline Scornavacca, Nihan Tokac, and Mathias Weller. On the fixed parameter tractability of agreement-based phylogenetic distances. Journal of Mathematical Biology, 74(1):239-257, 2017. Google Scholar
  7. Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics, 8(4):409-423, 2005. Google Scholar
  8. Magnus Bordewich and Charles Semple. Computing the hybridization number of two phylogenetic trees is fixed-parameter tractable. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(3):458-466, 2007. Google Scholar
  9. Joshua Collins, Simone Linz, and Charles Semple. Quantifying hybridization in realistic time. Journal of Computational Biology, 18(10):1305–1318, 2011. Google Scholar
  10. Grass Phylogeny Working Group. Phylogeny and subfamilial classification of the grasses (poaceae). Annals of the Missouri Botanical Garden, page 373–457, 2001. Google Scholar
  11. D. M. Hillis, C. Moritz, and B. K. Mable, editors. Molecular Systematics. Sinauer Associates, 1996. Google Scholar
  12. Leo van Iersel, Steven Kelk, Nela Lekić, and Leen Stougie. A short note on exponential-time algorithms for hybridization number. CoRR, abs/1312.1255, 2013. Google Scholar
  13. Leo van Iersel, Steven Kelk, Nela Lekić, and Leen Stougie. Approximation algorithms for nonbinary agreement forests. SIAM Journal on Discrete Mathematics, 28(1):49-66, 2014. Google Scholar
  14. Leo van Iersel and Simone Linz. A quadratic kernel for computing the hybridization number of multiple trees. Information Processing Letters, 113(9):318–323, 2013. Google Scholar
  15. Steven Kelk and Céline Scornavacca. Towards the fixed parameter tractability of constructing minimal phylogenetic networks from arbitrary sets of nonbinary trees. CoRR, abs/1207.7034, 2012. Google Scholar
  16. Zhijiang Li. Fixed-parameter algorithm for hybridization number of two multifurcating trees. Master’s thesis, Faculty of Computer Science, Dalhousie University, 2015. Google Scholar
  17. Zhijiang Li and Norbert Zeh. A fast algorithm for computing a soft hybridization network of two multifurcating trees. Manuscript in preparation. Google Scholar
  18. Simone Linz and Charles Semple. Hybridization in nonbinary trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(1):30-45, 2009. Google Scholar
  19. Simone Linz and Charles Semple. A cluster reduction for computing the subtree distance between phylogenies. Annals of Combinatorics, 15(3):465–484, 2011. Google Scholar
  20. V. Rosas-Magallanes, P. Deschavanne, L. Qintana-Murci, R. Brosch, B. Gicquel, and O. Neyrolles. Horizontal transfer of a virulence operon to the ancestor of mycobacterium tuberculosis. Molecular Biology and Evolution, 23(6):1129-1135, 2006. Google Scholar
  21. Heiko Schmidt. Phylogenetic Trees from Large Datasets. PhD thesis, Heinrich-Heine-Universität, Düsseldorf, Germany, 2003. Google Scholar
  22. Feng Shi, Jie You, and Qilong Feng. Improved approximation algorithm for maximum agreement forest of two trees. In Proceedings of the 8th International Workshop on Frontiers in Algorithmics, pages 205-215, 2014. Google Scholar
  23. Chris Whidden. Efficient Computation of Maximum Agreement Forests and Their Applications. PhD thesis, Faculty of Computer Science, Dalhousie University, 2013. Google Scholar
  24. Chris Whidden, Robert G. Beiko, and Norbert Zeh. Fixed-parameter algorithms for maximum agreement forests. SIAM Journal on Computing, 42(4):1431-1466, 2013. Google Scholar
  25. Chris Whidden, Robert G. Beiko, and Norbert Zeh. Fixed-parameter and approximation algorithms for maximum agreement forests of multifurcating trees. Algorithmica, 74(3):1019-1054, 2016. Google Scholar
  26. Chris Whidden and Norbert Zeh. Computing the SPR distance of rooted binary trees in O(2^kn) time. Manuscript in preparation. Google Scholar
  27. Chris Whidden, Norbert Zeh, and Robert G. Beiko. Supertrees based on the subtree prune-and-regraft distance. Systematic Biology, 63(4):566-581, 2014. Google Scholar
  28. Yufeng Wu and Jiayin Wang. Fast computation of the exact hybridization number of two phylogenetic trees. In Proceedings of the 6th International Symposium on Bioinformatics Research and Applications, pages 203-214. Springer-Verlag, 2010. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail