MUL-Tree Pruning for Consistency and Compatibility

Hampson, Christopher; Harvey, Daniel J.; Iliopoulos, Costas S.; Jansson, Jesper; Lim, Zara; Sung, Wing-Kin

doi:10.4230/LIPIcs.CPM.2023.14

File

LIPIcs.CPM.2023.14.pdf

Filesize: 1.14 MB
18 pages

Document Identifiers

DOI: 10.4230/LIPIcs.CPM.2023.14
URN: urn:nbn:de:0030-drops-179682

Author Details

Christopher Hampson

Department of Informatics, King’s College London, UK

Daniel J. Harvey

Graduate School of Informatics, Kyoto University, Japan

Costas S. Iliopoulos

Department of Informatics, King’s College London, UK

Jesper Jansson

Graduate School of Informatics, Kyoto University, Japan

Zara Lim

Department of Informatics, King’s College London, UK

Wing-Kin Sung

Department of Chemical Pathology, The Chinese University of Hong Kong, China
Hong Kong Genome Institute, Hong Kong Science Park, Shatin, China
Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, China

Cite As

Christopher Hampson, Daniel J. Harvey, Costas S. Iliopoulos, Jesper Jansson, Zara Lim, and Wing-Kin Sung. MUL-Tree Pruning for Consistency and Compatibility. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 14:1-14:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CPM.2023.14

Abstract

A multi-labelled tree (or MUL-tree) is a rooted tree leaf-labelled by a set of labels, where each label may appear more than once in the tree. We consider the MUL-tree Set Pruning for Consistency problem (MULSETPC), which takes as input a set of MUL-trees and asks whether there exists a perfect pruning of each MUL-tree that results in a consistent set of single-labelled trees. MULSETPC was proven to be NP-complete by Gascon et al. when the MUL-trees are binary, each leaf label is used at most three times, and the number of MUL-trees is unbounded. Determining the computational complexity of the problem when the number of MUL-trees is constant was left as an open problem. Here, we resolve this question by proving a much stronger result, namely that MULSETPC is NP-complete even when there are only two MUL-trees, every leaf label is used at most twice, and every MUL-tree is either binary or has constant height. Furthermore, we introduce an extension of MULSETPC that we call MULSETPComp, which replaces the notion of consistency with compatibility, and prove that MULSETPComp is NP-complete even when there are only two MUL-trees, every leaf label is used at most thrice, and every MUL-tree has constant height. Finally, we present a polynomial-time algorithm for instances of MULSETPC with a constant number of binary MUL-trees, in the special case where every leaf label occurs exactly once in at least one MUL-tree.
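
To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the algorithm from the paper. It rests on several assumptions that the abstract does not spell out: trees are encoded as nested tuples with string leaves; a perfect pruning keeps exactly one occurrence of every label and suppresses the resulting unary nodes; all input trees are binary; and consistency is tested by running the BUILD procedure of Aho et al. on the union of the rooted triples of the pruned trees. All function names are hypothetical. The running time is exponential in the number of repeated labels, in line with the NP-completeness results above.

```python
from itertools import product

def leaf_labels(tree):
    """Leaf labels of a nested-tuple tree, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [x for child in tree for x in leaf_labels(child)]

def restrict(tree, keep, counter):
    """Keep only leaves whose left-to-right index is in `keep`; suppress unary nodes."""
    if isinstance(tree, str):
        idx = counter[0]
        counter[0] += 1
        return tree if idx in keep else None
    kids = [r for r in (restrict(c, keep, counter) for c in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def perfect_prunings(tree):
    """Yield every tree obtained by keeping exactly one occurrence of each label."""
    positions = {}
    for i, lab in enumerate(leaf_labels(tree)):
        positions.setdefault(lab, []).append(i)
    for choice in product(*positions.values()):
        yield restrict(tree, set(choice), [0])

def clusters(tree, out):
    """Append the leaf set of every internal node to `out`; return this subtree's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    s = frozenset().union(*(clusters(c, out) for c in tree))
    out.append(s)
    return s

def rooted_triples(tree):
    """Rooted triples ab|c of a single-labelled tree, encoded as ({a, b}, c)."""
    out = []
    all_leaves = clusters(tree, out)
    return {(frozenset((a, b)), c)
            for s in out for a in s for b in s if a < b
            for c in all_leaves - s}

def build_ok(triples, labels):
    """BUILD (Aho et al.): True iff some tree on `labels` displays all the triples."""
    if len(labels) <= 2:
        return True
    comp = {x: {x} for x in labels}
    for pair, c in triples:
        a, b = tuple(pair)
        # Only triples entirely inside the current label set contribute an edge a-b.
        if a in labels and b in labels and c in labels and comp[a] is not comp[b]:
            merged = comp[a] | comp[b]
            for x in merged:
                comp[x] = merged
    parts = {frozenset(s) for s in comp.values()}
    if len(parts) == 1:
        return False  # the graph is connected, so no root split exists
    return all(build_ok(triples, p) for p in parts)

def mulsetpc_bruteforce(mul_trees):
    """Try every combination of perfect prunings (binary input trees assumed)."""
    for combo in product(*(list(perfect_prunings(t)) for t in mul_trees)):
        triples = set().union(*(rooted_triples(t) for t in combo))
        labels = frozenset().union(*(leaf_labels(t) for t in combo))
        if build_ok(triples, labels):
            return True
    return False

t1 = (("a", "b"), ("a", "c"))   # MUL-tree: label "a" occurs twice
t2 = (("a", "c"), "b")          # single-labelled tree with triple ac|b
t3 = (("b", "c"), "a")          # single-labelled tree with triple bc|a
print(mulsetpc_bruteforce([t1, t2]), mulsetpc_bruteforce([t1, t3]))
```

In this toy instance, t1 has two perfect prunings, inducing the triples ab|c and ac|b respectively. The second agrees with t2, while neither agrees with t3. For non-binary trees the triple-based test would correspond more closely to compatibility than to consistency, so the sketch should not be read as covering that case.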

Subject Classification

ACM Subject Classification

Theory of computation → Pattern matching

Keywords

multi-labelled tree
phylogenetic tree
consistent
compatible
pruning
algorithm
NP-complete

References

Alfred V. Aho, Yehoshua Sagiv, Thomas G. Szymanski, and Jeffrey D. Ullman. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal on Computing, 10(3):405-421, 1981.
Amihood Amir and Dmitry Keselman. Maximum Agreement Subtree in a Set of Evolutionary Trees: Metrics and Efficient Algorithms. SIAM Journal on Computing, 26(6):1656-1669, 1997.
Mukul S Bansal. Linear-time algorithms for some phylogenetic tree completion problems under Robinson-Foulds distance. In RECOMB International conference on Comparative Genomics, pages 209-226. Springer, 2018.
Mukul S Bansal, J Gordon Burleigh, Oliver Eulenstein, and David Fernández-Baca. Robinson-Foulds supertrees. Algorithms for molecular biology, 5(1):1-12, 2010.
Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of combinatorics, 8(4):409-423, 2005.
David Bryant. A classification of consensus methods for phylogenetics. In M. F. Janowitz, F.-J. Lapointe, F. R. McMorris, B. Mirkin, and F. S. Roberts, editors, Bioconsensus, volume 61 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 163-184. American Mathematical Society, 2003.
Richard Cole, Martin Farach-Colton, Ramesh Hariharan, Teresa Przytycka, and Mikkel Thorup. An O(n log n) Algorithm for the Maximum Agreement Subtree Problem for Binary Trees. SIAM Journal on Computing, 30(5):1385-1404, 2000.
Yun Cui, Jesper Jansson, and Wing-Kin Sung. Polynomial-time Algorithms for Building a Consensus MUL-Tree. Journal of Computational Biology, 19(9):1073-1088, 2012.
Yun Deng and David Fernández-Baca. Fast compatibility testing for rooted phylogenetic trees. Algorithmica, 80(8):2453-2477, 2018.
Zhihong Ding, Vladimir Filkov, and Dan Gusfield. A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem. Journal of Computational Biology, 13(2):522-553, 2006.
Joseph Felsenstein. Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, Massachusetts, 2004.
CR Finden and AD Gordon. Obtaining common pruned trees. Journal of Classification, 2(1):255-276, 1985.
Ganeshkumar Ganapathy, Barbara Goodson, Robert Jansen, Hai-son Le, Vijaya Ramachandran, and Tandy Warnow. Pattern identification in biogeography. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(4):334-346, 2006.
Michael R Garey and David S Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co., San Francisco, Calif., 1979.
Mathieu Gascon, Riccardo Dondi, and Nadia El-Mabrouk. Complexity and Algorithms for MUL-Tree Pruning. In Paola Flocchini and Lucia Moura, editors, Combinatorial Algorithms, pages 324-339, Cham, 2021. Springer International Publishing.
Mathieu Gascon, Riccardo Dondi, and Nadia El-Mabrouk. MUL-tree pruning for consistency and optimal reconciliation - complexity and algorithms. Theoret. Comput. Sci., 937:22-38, 2022.
Katharina T Huber and Vincent Moulton. Phylogenetic networks from multi-labelled trees. Journal of Mathematical Biology, 52(5):613-632, 2006.
Katharina T Huber, Vincent Moulton, Mike Steel, and Taoyang Wu. Folding and unfolding phylogenetic trees and networks. Journal of Mathematical Biology, 73(6):1761-1780, 2016.
Katharina T Huber, Bengt Oxelman, Martin Lott, and Vincent Moulton. Reconstructing the evolutionary history of polyploids from multilabeled trees. Molecular Biology and Evolution, 23(9):1784-1791, 2006.
Leo van Iersel, Steven Kelk, Nela Lekić, and Celine Scornavacca. A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees. BMC bioinformatics, 15(1):1-12, 2014.
Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. Improved algorithms for constructing consensus trees. Journal of the ACM, 63(3), 2016. Article 28.
Manuel Lafond, Nadia El-Mabrouk, Katharina T Huber, and Vincent Moulton. The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics. Theoretical Computer Science, 760:15-34, 2019.
Martin Lott, Andreas Spillner, Katharina T Huber, Anna Petri, Bengt Oxelman, and Vincent Moulton. Inferring polyploid phylogenies from multiply-labeled gene trees. BMC Evolutionary Biology, 9(1):1-11, 2009.
Nobuhiro Minaka. Cladograms and reticulated graphs: A proposal for graphic representation of cladistic structures. Bulletin of the Biogeographical Society of Japan, 45(1):1-10, 1990.
Gordon L Nelson and Norman I Platnick. Systematics and Biogeography: Cladistics and Vicariance. Columbia University Press, 1981.
Roderic D M Page. Parasites, phylogeny and cospeciation. International Journal for Parasitology, 23(4):499-506, 1993.
Roderic D M Page. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Systematic Biology, 43(1):58-77, 1994.
Itsik Pe'er, Tal Pupko, Ron Shamir, and Roded Sharan. Incomplete directed perfect phylogeny. SIAM J. Comput., 33(3):590-607, 2004.
David F Robinson and Leslie R Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131-147, 1981.
Celine Scornavacca, Vincent Berry, and Vincent Ranwez. Building species trees from larger parts of phylogenomic databases. Information and Computation, 209(3):590-605, 2011.
Mike Steel. The complexity of reconstructing trees from qualitative characters and subtrees. J. Classification, 9(1):91-116, 1992.
Mike Steel and Tandy Warnow. Kaikoura tree theorems: Computing the maximum agreement subtree. Information Processing Letters, 48:77-82, 1993.
Christopher Whidden, Norbert Zeh, and Robert G Beiko. Supertrees Based on the Subtree Prune-and-Regraft Distance. Systematic biology, 63(4):566-581, 2014.
Yufeng Wu. A practical method for exact computation of subtree prune and regraft distance. Bioinformatics, 25(2):190-196, 2009.
