Treewidth-Based Algorithms for the Small Parsimony Problem on Networks
Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number of possible character-states with the number of reticulate events (per biconnected component). Here, we consider the treewidth of the undirected graph underlying the input network as parameter, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on networks. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.
Phylogenetics
parsimony
phylogenetic networks
parameterized complexity
dynamic programming
treewidth
Theory of computation~Fixed parameter tractability
Applied computing~Molecular sequence analysis
6:1-6:21
Regular Paper
This work was supported by French Agence Nationale de la Recherche through the CoCoAlSeq project (ANR-19-CE45-0012).
We thank Christophe Paul for sharing his expertise on treewidth formulations.
Celine
Scornavacca
Celine Scornavacca
Institut des Sciences de l’Evolution, Université de Montpellier, CNRS, IRD, EPHE, France
Mathias
Weller
Mathias Weller
LIGM, CNRS, Université Gustave Eiffel, Paris, France
10.4230/LIPIcs.WABI.2021.6
Stefan Arnborg. Efficient algorithms for combinatorial problems on graphs with bounded decomposability - a survey. BIT Numerical Mathematics, 25(1):1-23, 1985.
Various Authors. The graph parameter hierarchy. Available at https://gitlab.com/gruenwald/parameter-hierarchy, 2021.
https://gitlab.com/gruenwald/parameter-hierarchy
Vincent Berry, Celine Scornavacca, and Mathias Weller. Scanning phylogenetic networks is NP-hard. In Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM'20), pages 519-530. Springer, 2020.
Hans L Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25(6):1305-1317, 1996.
Hans L. Bodlaender. Discovering treewidth. In Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM'05), pages 1-16, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
Christopher Bryant, Mareike Fischer, Simone Linz, and Charles Semple. On the quirks of maximum parsimony and likelihood on phylogenetic networks. Journal of Theoretical Biology, 417:100-108, 2017.
David Bryant and Jens Lagergren. Compatibility of unrooted phylogenetic trees is FPT. Theoretical Computer Science, 351(3):296-302, 2006.
Laurent Bulteau and Mathias Weller. Parameterized algorithms in bioinformatics: an overview. Algorithms, 12(12):256, 2019.
Bruno Courcelle. The monadic second-order logic of graphs. i. recognizable sets of finite graphs. Information and computation, 85(1):12-75, 1990.
Holger Dell, Christian Komusiewicz, Nimrod Talmon, and Mathias Weller. The PACE 2017 Parameterized Algorithms and Computational Experiments Challenge: The Second Iteration. In 12th International Symposium on Parameterized and Exact Computation (IPEC 2017), volume 89 of Leibniz International Proceedings in Informatics (LIPIcs), pages 30:1-30:12, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
Nick D. Dendris, Lefteris M. Kirousis, and Dimitrios M. Thilikos. Fugitive-search games on graphs and related parameters. Theoretical Computer Science, 172(1):233-254, 1997.
Joseph Felsenstein. Inferring phylogenies, volume 2. Sinauer associates Sunderland, MA, 2004.
Mareike Fischer, Leo Van Iersel, Steven Kelk, and Celine Scornavacca. On computing the maximum parsimony score of a phylogenetic network. SIAM Journal on Discrete Mathematics, 29(1):559-585, 2015.
Walter M Fitch. Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Biology, 20(4):406-416, 1971.
Jotun Hein. Reconstructing evolution of sequences subject to recombination using parsimony. Mathematical Biosciences, 98(2):185-200, 1990.
Daniel H Huson, Regula Rupp, and Celine Scornavacca. Phylogenetic networks: concepts, algorithms and applications. Cambridge University Press, 2010.
G. Jin, L. Nakhleh, S. Snir, and T. Tuller. Inferring phylogenetic networks by the maximum parsimony criterion: A case study. Molecular Biology and Evolution, 24(1):324-337, 2006.
G. Jin, L. Nakhleh, S. Snir, and T. Tuller. Maximum likelihood of phylogenetic networks. Bioinformatics, 22(21):2604-2611, 2006.
Guohua Jin, L. Nakhleh, S. Snir, and T. Tuller. Parsimony score of phylogenetic networks: Hardness results and a linear-time heuristic. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(3):495-505, 2009.
Lavanya Kannan and Ward C. Wheeler. Maximum Parsimony on Phylogenetic networks. Algorithms for Molecular Biology, 7(1):9, 2012.
Lavanya Kannan and Ward C. Wheeler. Exactly computing the parsimony scores on phylogenetic networks using dynamic programming. Journal of Computational Biology, 21(4):303-319, 2014.
Steven Kelk, Fabio Pardi, Celine Scornavacca, and Leo van Iersel. Finding a most parsimonious or likely tree in a network with respect to an alignment. Journal of Mathematical Biology, 78(1-2):527-547, 2019.
Ton Kloks. Treewidth: computations and approximations, volume 842. Springer Science & Business Media, 1994.
Tuukka Korhonen. Single-exponential time 2-approximation algorithm for treewidth. CoRR, abs/2104.07463, 2021. URL: http://arxiv.org/abs/2104.07463.
http://arxiv.org/abs/2104.07463
Guillaume Mescoff, Christophe Paul, and Dimitrios Thilikos. A polynomial time algorithm to compute the connected tree-width of a series-parallel graph, 2021. URL: http://arxiv.org/abs/2004.00547v5.
http://arxiv.org/abs/2004.00547v5
Luay Nakhleh, Guohua Jin, Fengmei Zhao, and John Mellor-Crummey. Reconstructing phylogenetic networks using maximum parsimony. In 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05), pages 93-102. IEEE, 2005.
Charles-Elie Rabier, Vincent Berry, Marnus Stoltz, João D. Santos, Wensheng Wang, Glaszmann Jean-Christophe, Fabio Pardi, and Celine Scornavacca. On the inference of complicated phylogenetic networks by Markov Chain Monte-Carlo. Submitted.
Hisao Tamaki. Positive-instance driven dynamic programming for treewidth. Journal of Combinatorial Optimization, 37(4):1283-1311, 2019.
Leo Van Iersel, Mark Jones, and Celine Scornavacca. Improved maximum parsimony models for phylogenetic networks. Systematic Biology, 67(3):518-542, 2018.
Ward C Wheeler. Phylogenetic network analysis as a parsimony optimization problem. BMC Bioinformatics, 16(1):1-9, 2015.
Jiafan Zhu, Dingqiao Wen, Yun Yu, Heidi M Meudt, and Luay Nakhleh. Bayesian inference of phylogenetic networks from bi-allelic genetic markers. PLoS Computational Biology, 14(1):e1005932, 2018.
Jiafan Zhu, Yun Yu, and Luay Nakhleh. In the light of deep coalescence: revisiting trees within networks. BMC Bioinformatics, 17(14):271-282, 2016.
Celine Scornavacca and Mathias Weller
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode