MUL-Tree Pruning for Consistency and Compatibility

Authors: Christopher Hampson, Daniel J. Harvey, Costas S. Iliopoulos, Jesper Jansson, Zara Lim, Wing-Kin Sung




File

LIPIcs.CPM.2023.14.pdf
  • Filesize: 1.14 MB
  • 18 pages

Author Details

Christopher Hampson
  • Department of Informatics, King’s College London, UK
Daniel J. Harvey
  • Graduate School of Informatics, Kyoto University, Japan
Costas S. Iliopoulos
  • Department of Informatics, King’s College London, UK
Jesper Jansson
  • Graduate School of Informatics, Kyoto University, Japan
Zara Lim
  • Department of Informatics, King’s College London, UK
Wing-Kin Sung
  • Department of Chemical Pathology, The Chinese University of Hong Kong, China
  • Hong Kong Genome Institute, Hong Kong Science Park, Shatin, China
  • Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, China

Cite As

Christopher Hampson, Daniel J. Harvey, Costas S. Iliopoulos, Jesper Jansson, Zara Lim, and Wing-Kin Sung. MUL-Tree Pruning for Consistency and Compatibility. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CPM.2023.14

Abstract

A multi-labelled tree (or MUL-tree) is a rooted tree leaf-labelled by a set of labels, where each label may appear more than once in the tree. We consider the MUL-tree Set Pruning for Consistency problem (MULSETPC), which takes as input a set of MUL-trees and asks whether there exists a perfect pruning of each MUL-tree that results in a consistent set of single-labelled trees. MULSETPC was proven to be NP-complete by Gascon et al. when the MUL-trees are binary, each leaf label is used at most three times, and the number of MUL-trees is unbounded. Determining the computational complexity of the problem when the number of MUL-trees is constant was left as an open problem. Here, we resolve this question by proving a much stronger result, namely that MULSETPC is NP-complete even when there are only two MUL-trees, every leaf label is used at most twice, and every MUL-tree is either binary or has constant height. Furthermore, we introduce an extension of MULSETPC that we call MULSETPComp, which replaces the notion of consistency with compatibility, and prove that MULSETPComp is NP-complete even when there are only two MUL-trees, every leaf label is used at most three times, and every MUL-tree has constant height. Finally, we present a polynomial-time algorithm for instances of MULSETPC with a constant number of binary MUL-trees, in the special case where every leaf label occurs exactly once in at least one MUL-tree.
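
To make the notion of a perfect pruning concrete, the following is a minimal brute-force sketch in Python; it is not the algorithm from the paper. It assumes that a perfect pruning keeps exactly one leaf per label and suppresses any internal node left with a single child. It also assumes that, when two single-labelled trees have the same label set, consistency amounts to the trees being identical up to child order. The nested-tuple encoding and the names `leaves`, `restrict`, `canon`, and `perfect_prunings` are hypothetical, chosen only for illustration. The enumeration is exponential in the number of repeated labels, which is in keeping with the hardness results stated above.

```python
from itertools import product

# Assumed encoding: a leaf is a str label, an internal node is a tuple of children.

def leaves(tree, path=()):
    """Yield (path, label) for every leaf occurrence in the tree."""
    if isinstance(tree, str):
        yield path, tree
    else:
        for i, child in enumerate(tree):
            yield from leaves(child, path + (i,))

def restrict(tree, keep, path=()):
    """Delete every leaf whose path is not in `keep`, then suppress
    internal nodes that are left with zero or one child."""
    if isinstance(tree, str):
        return tree if path in keep else None
    kids = [restrict(c, keep, path + (i,)) for i, c in enumerate(tree)]
    kids = [k for k in kids if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def canon(tree):
    """Canonical form that ignores the order of children."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canon(c) for c in tree), key=repr))

def perfect_prunings(tree):
    """Return all single-labelled trees obtained by keeping exactly one
    occurrence of each label (assumed meaning of 'perfect pruning')."""
    by_label = {}
    for path, label in leaves(tree):
        by_label.setdefault(label, []).append(path)
    return {canon(restrict(tree, set(choice)))
            for choice in product(*by_label.values())}

# Toy instance: label "a" occurs twice in T1, and T2 is already single-labelled.
T1 = (("a", "b"), ("a", "c"))
T2 = (("a", "c"), "b")

# Same label set on both sides, so under the stated assumption a consistent
# pair of prunings exists exactly when the two sets share a tree.
common = perfect_prunings(T1) & perfect_prunings(T2)
print(len(perfect_prunings(T1)), common)
```

In this toy instance, keeping the second occurrence of `a` in `T1` yields a tree equal to `T2`, so the intersection is non-empty. For trees over different label sets, a supertree-based consistency test would be needed instead of plain equality.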

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • multi-labelled tree
  • phylogenetic tree
  • consistent
  • compatible
  • pruning
  • algorithm
  • NP-complete

References

  1. Alfred V. Aho, Yehoshua Sagiv, Thomas G. Szymanski, and Jeffrey D. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal on Computing, 10(3):405-421, 1981.
  2. Amihood Amir and Dmitry Keselman. Maximum Agreement Subtree in a Set of Evolutionary Trees: Metrics and Efficient Algorithms. SIAM Journal on Computing, 26(6):1656-1669, 1997.
  3. Mukul S Bansal. Linear-time algorithms for some phylogenetic tree completion problems under Robinson-Foulds distance. In RECOMB International conference on Comparative Genomics, pages 209-226. Springer, 2018.
  4. Mukul S Bansal, J Gordon Burleigh, Oliver Eulenstein, and David Fernández-Baca. Robinson-Foulds supertrees. Algorithms for Molecular Biology, 5(1):1-12, 2010.
  5. Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics, 8(4):409-423, 2005.
  6. David Bryant. A classification of consensus methods for phylogenetics. In M. F. Janowitz, F.-J. Lapointe, F. R. McMorris, B. Mirkin, and F. S. Roberts, editors, Bioconsensus, volume 61 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 163-184. American Mathematical Society, 2003.
  7. Richard Cole, Martin Farach-Colton, Ramesh Hariharan, Teresa Przytycka, and Mikkel Thorup. An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees. SIAM Journal on Computing, 30(5):1385-1404, 2000.
  8. Yun Cui, Jesper Jansson, and Wing-Kin Sung. Polynomial-time Algorithms for Building a Consensus MUL-Tree. Journal of Computational Biology, 19(9):1073-1088, 2012.
  9. Yun Deng and David Fernández-Baca. Fast compatibility testing for rooted phylogenetic trees. Algorithmica, 80(8):2453-2477, 2018.
  10. Zhihong Ding, Vladimir Filkov, and Dan Gusfield. A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem. Journal of Computational Biology, 13(2):522-553, 2006.
  11. Joseph Felsenstein. Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, Massachusetts, 2004.
  12. CR Finden and AD Gordon. Obtaining common pruned trees. Journal of Classification, 2(1):255-276, 1985.
  13. Ganeshkumar Ganapathy, Barbara Goodson, Robert Jansen, Hai-son Le, Vijaya Ramachandran, and Tandy Warnow. Pattern identification in biogeography. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(4):334-346, 2006.
  14. Michael R Garey and David S Johnson. Computers and intractability. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co., San Francisco, Calif., 1979. A guide to the theory of NP-completeness.
  15. Mathieu Gascon, Riccardo Dondi, and Nadia El-Mabrouk. Complexity and Algorithms for MUL-Tree Pruning. In Paola Flocchini and Lucia Moura, editors, Combinatorial Algorithms, pages 324-339, Cham, 2021. Springer International Publishing.
  16. Mathieu Gascon, Riccardo Dondi, and Nadia El-Mabrouk. MUL-tree pruning for consistency and optimal reconciliation - complexity and algorithms. Theoret. Comput. Sci., 937:22-38, 2022.
  17. Katharina T Huber and Vincent Moulton. Phylogenetic networks from multi-labelled trees. Journal of Mathematical Biology, 52(5):613-632, 2006.
  18. Katharina T Huber, Vincent Moulton, Mike Steel, and Taoyang Wu. Folding and unfolding phylogenetic trees and networks. Journal of Mathematical Biology, 73(6):1761-1780, 2016.
  19. Katharina T Huber, Bengt Oxelman, Martin Lott, and Vincent Moulton. Reconstructing the evolutionary history of polyploids from multilabeled trees. Molecular Biology and Evolution, 23(9):1784-1791, 2006.
  20. Leo van Iersel, Steven Kelk, Nela Lekić, and Celine Scornavacca. A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees. BMC Bioinformatics, 15(1):1-12, 2014.
  21. Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. Improved algorithms for constructing consensus trees. Journal of the ACM, 63(3), 2016. Article 28.
  22. Manuel Lafond, Nadia El-Mabrouk, Katharina T Huber, and Vincent Moulton. The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoretical Computer Science, 760:15-34, 2019.
  23. Martin Lott, Andreas Spillner, Katharina T Huber, Anna Petri, Bengt Oxelman, and Vincent Moulton. Inferring polyploid phylogenies from multiply-labeled gene trees. BMC Evolutionary Biology, 9(1):1-11, 2009.
  24. Nobuhiro Minaka. Cladograms and reticulated graphs: A proposal for graphic representation of cladistic structures. Bulletin of the Biogeographical Society of Japan, 45(1):1-10, 1990.
  25. Gordon L Nelson and Norman I Platnick. Systematics and Biogeography: Cladistics and Vicariance. Columbia University Press, 1981.
  26. Roderic D M Page. Parasites, phylogeny and cospeciation. International Journal for Parasitology, 23(4):499-506, 1993.
  27. Roderic D M Page. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Systematic Biology, 43(1):58-77, 1994.
  28. Itsik Pe'er, Tal Pupko, Ron Shamir, and Roded Sharan. Incomplete directed perfect phylogeny. SIAM J. Comput., 33(3):590-607, 2004.
  29. David F Robinson and Leslie R Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131-147, 1981.
  30. Celine Scornavacca, Vincent Berry, and Vincent Ranwez. Building species trees from larger parts of phylogenomic databases. Information and Computation, 209(3):590-605, 2011.
  31. Mike Steel. The complexity of reconstructing trees from qualitative characters and subtrees. J. Classification, 9(1):91-116, 1992.
  32. Mike Steel and Tandy Warnow. Kaikoura tree theorems: Computing the maximum agreement subtree. Information Processing Letters, 48:77-82, 1993.
  33. Christopher Whidden, Norbert Zeh, and Robert G Beiko. Supertrees Based on the Subtree Prune-and-Regraft Distance. Systematic Biology, 63(4):566-581, 2014.
  34. Yufeng Wu. A practical method for exact computation of subtree prune and regraft distance. Bioinformatics, 25(2):190-196, 2009.