Better Practical Algorithms for rSPR Distance and Hybridization Number
Computing the rSPR distance of two phylogenetic trees (denoted by RDC) is NP-hard, and so is computing the hybridization number of two phylogenetic trees (denoted by HNC). Because both problems are important in phylogenetics, they have been studied extensively, and quite a number of exact and approximation algorithms have been designed and implemented for them. In this paper, we design and implement one exact algorithm for HNC and several approximation algorithms for RDC and HNC. Our experimental results show that the resulting exact program is much faster than the previous best (more than 80 times faster on the easiest dataset used in the experiments), and that its advantage in speed becomes even more significant for more difficult instances. Moreover, the resulting approximation programs output much better results than the previous best ones; indeed, their outputs are always nearly optimal and often optimal. Of particular interest is the use of the Monte Carlo tree search (MCTS) method in the design of our approximation algorithms. Our experimental results show that with MCTS, we can often solve HNC exactly within a short time.
phylogenetic tree
fixed-parameter algorithms
approximation algorithms
Monte Carlo tree search
Theory of computation → Theory and algorithms for application domains
5:1-5:12
Regular Paper
Our programs are available at http://rnc.r.dendai.ac.jp/rsprHN.html.
Kohei
Yamada
Kohei Yamada
Division of Information System Design, Tokyo Denki University, Japan
Zhi-Zhong
Chen
Zhi-Zhong Chen
Division of Information System Design, Tokyo Denki University, Japan
Lusheng
Wang
Lusheng Wang
Department of Computer Science, City University of Hong Kong, China
10.4230/LIPIcs.WABI.2019.5
B. Albrecht, C. Scornavacca, A. Cenci, and D.H. Huson. Fast computation of minimum hybridization networks. Bioinformatics, 28(2):191-197, 2012.
M. Baroni, C. Semple, and M. Steel. Hybrids in real time. Systematic Biology, 55(1):46-56, 2006.
R.G. Beiko and N. Hamilton. Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology, 6(15):159-169, 2006.
M. Bordewich and C. Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics, 8(4):409-423, 2005.
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P.I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-49, 2012.
Z.-Z. Chen, Y. Fan, and L. Wang. Faster exact computation of rSPR distance. Journal of Combinatorial Optimization, 29(3):605-635, 2015.
Z.-Z. Chen, Y. Harada, Y. Nakamura, and L. Wang. Faster exact computation of rSPR distance via better approximation. IEEE/ACM Transactions on Computational Biology and Bioinformatics, to appear.
Z.-Z. Chen, E. Machida, and L. Wang. An Approximation Algorithm for rSPR Distance. In 22nd International Computing and Combinatorics Conference, Ho Chi Minh City, Vietnam, August 2-4, 2016, pages 468-479, 2016.
Z.-Z. Chen and L. Wang. Algorithms for reticulate networks of multiple phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(2):372-384, 2012.
Z.-Z. Chen and L. Wang. An ultrafast tool for minimum reticulate networks. Journal of Computational Biology, 20(1):38-41, 2013.
L. Collins, S. Linz, and C. Semple. Quantifying hybridization in realistic time. Journal of Computational Biology, 18(10):1305-1318, 2011.
J. Hein, T. Jing, L. Wang, and K. Zhang. On the complexity of comparing evolutionary trees. Discrete Applied Mathematics, 71(1-3):153-169, 1996.
D.H. Huson, R. Rupp, and C. Scornavacca. Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge University Press, 2010.
S. Kelk, L. van Iersel, N. Lekic, S. Linz, C. Scornavacca, and L. Stougie. Cycle killer...qu'est-ce que c'est? On the comparative approximability of hybridization number and directed feedback vertex set. SIAM Journal on Discrete Mathematics, 26(4):1635-1656, 2012.
F. Schalekamp, A. van Zuylen, and S. van der Ster. A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest. In 43rd International Colloquium on Automata, Languages and Programming, Rome, Italy, July 11-15, 2016, pages 70:1-70:14, 2016.
L. van Iersel, S. Kelk, N. Lekic, and C. Scornavacca. A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees. BMC Bioinformatics, 15(127), 2014.
C. Whidden, R.G. Beiko, and N. Zeh. Fast FPT algorithms for computing rooted agreement forest: theory and experiments. In International Symposium on Experimental Algorithms, Naples, Italy, May 20-22, 2010, pages 141-153, 2010.
C. Whidden, R.G. Beiko, and N. Zeh. Fixed-parameter algorithms for maximum agreement forests. SIAM Journal on Computing, 42(4):1431-1466, 2013.
C. Whidden and N. Zeh. A unifying view on approximation and FPT of agreement forests. In 9th International Workshop on Algorithms in Bioinformatics, Philadelphia, PA, USA, September 12-13, 2009, pages 390-401, 2009.
Y. Wu. A practical method for exact computation of subtree prune and regraft distance. Bioinformatics, 25(2):190-196, 2009.
Kohei Yamada, Zhi-Zhong Chen, and Lusheng Wang
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode