On Two Measures of Distance Between Fully-Labelled Trees

Bernardini, Giulia; Bonizzoni, Paola; Gawrychowski, Paweł

doi:10.4230/LIPIcs.CPM.2020.6

Abstract

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates a need for efficient procedures for comparing rooted trees that represent the evolution of mutations in tumor phylogenies. Motivated by this need, Bernardini et al. [CPM 2019] recently introduced the notion of rearrangement distance for fully-labelled trees. This notion originates from two operations: one that permutes the labels of the nodes, and one that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, while computing the actual rearrangement distance, which combines the two, was proven to be NP-hard. We answer two open questions left by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first one by showing, via a two-way reduction, that calculating the permutation distance between two trees on n nodes is equivalent, up to polylogarithmic factors, to finding the largest cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an 𝒪̃(n^{4/3+o(1)}) time algorithm for computing the permutation distance between two trees on n nodes. We then answer the second question positively by designing a linear-time constant-factor approximation algorithm that does not need any assumption on the trees.
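To make the matching subproblem mentioned in the abstract concrete, the following is a minimal sketch of maximum-cardinality bipartite matching using simple augmenting paths. It is not the Liu and Sidford algorithm, and it does not implement the paper's reduction from permutation distance; the function name `max_bipartite_matching` and the adjacency-list input format are assumptions chosen for illustration only.

```python
from typing import List, Optional


def max_bipartite_matching(adj: List[List[int]], n_right: int) -> int:
    """Return the size of a maximum matching in a bipartite graph.

    adj[u] lists the right-side vertices adjacent to left-side vertex u.
    Illustrative augmenting-path method with O(V * E) worst-case time,
    far slower than the bound quoted in the abstract.
    """
    # match_right[v] is the left vertex currently matched to right vertex v.
    match_right: List[Optional[int]] = [None] * n_right

    def try_augment(u: int, visited: List[bool]) -> bool:
        for v in adj[u]:
            if visited[v]:
                continue
            visited[v] = True
            partner = match_right[v]
            # Take v if it is free, or if its partner can be re-matched elsewhere.
            if partner is None or try_augment(partner, visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, [False] * n_right):
            size += 1
    return size


# Hypothetical toy instance: 3 left vertices, 3 right vertices.
example = [[0, 1], [0], [1, 2]]
result = max_bipartite_matching(example, 3)
print(result)
```

The recursion depth can grow with the number of vertices, so for large sparse graphs an iterative search or the Hopcroft-Karp algorithm would be a more practical choice.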

Amir Abboud, Loukas Georgiadis, Giuseppe F. Italiano, Robert Krauthgamer, Nikos Parotsidis, Ohad Trabelsi, Przemyslaw Uznanski, and Daniel Wolleb-Graf. Faster algorithms for all-pairs bounded min-cuts. In 46th ICALP, pages 7:1-7:15, 2019.
Amir Abboud, Robert Krauthgamer, and Ohad Trabelsi. New algorithms and lower bounds for all-pairs max-flow in undirected graphs. In 31st SODA, pages 48-61, 2020.
Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
Benjamin L Allen and Mike Steel. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics, 5(1):1-15, 2001.
Giulia Bernardini, Paola Bonizzoni, Gianluca Della Vedova, and Murray Patterson. A rearrangement distance for fully-labelled trees. In 30th CPM, pages 28:1-28:15, 2019.
Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, and Mauricio Soto. Beyond perfect phylogeny: Multisample phylogeny reconstruction via ILP. In 8th ACM-BCB, pages 1-10, 2017.
Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, and Mauricio Soto. Does relaxing the infinite sites assumption give better tumor phylogenies? An ILP-based comparative approach. IEEE/ACM Trans. Comput. Biology Bioinform., 16(5):1410-1423, 2018.
Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics, 8(4):409-423, 2005.
Robert S. Boyer and J. Strother Moore. MJRTY: A fast majority vote algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, pages 105-118. Kluwer Academic Publishers, 1991.
Gerth Stølting Brodal, Rolf Fagerberg, Thomas Mailund, Christian NS Pedersen, and Andreas Sand. Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree. In 24th SODA, pages 1814-1832, 2013.
David Bryant. A classification of consensus methods for phylogenetics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 61:163-184, 2003.
Peter Buneman. The recovery of trees from measures of dissimilarity. Mathematics in the Archaeological and Historical Sciences, 1971.
Simone Ciccolella, Giulia Bernardini, Luca Denti, Paola Bonizzoni, Marco Previtali, and Gianluca Della Vedova. Triplet-based similarity score for fully multi-labeled trees with poly-occurring labels, 2020. URL: https://www.biorxiv.org/content/early/2020/04/14/2020.04.14.040550.full.pdf.
Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li, John Tromp, and Louxin Zhang. On distances between phylogenetic trees. In 8th SODA, pages 427-436, 1997.
Zach DiNardo, Kiran Tomlinson, Anna Ritz, and Layla Oesper. Distance measures for tumor evolutionary trees. Bioinformatics, November 2019.
Annette J Dobson. Comparing the shapes of trees. In Combinatorial Mathematics III, pages 95-100. Springer, 1975.
Bartłomiej Dudek and Paweł Gawrychowski. Computing quartet distance is equivalent to counting 4-cycles. In 51st STOC, pages 733-743, 2019.
George F Estabrook, FR McMorris, and Christopher A Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 34(2):193-200, 1985.
Joseph Felsenstein. Inferring phylogenies, volume 2. Sinauer Associates Sunderland, MA, 2004.
Pawel Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann. A faster construction of greedy consensus trees. In 45th ICALP, pages 63:1-63:14, 2018.
Kiya Govek, Camden Sikes, and Layla Oesper. A consensus approach to infer tumor evolutionary histories. In 9th BCB, pages 63-72, 2018.
Russell D Gray, Alexei J Drummond, and Simon J Greenhill. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science, 323(5913):479-483, 2009.
Iman Hajirasouliha, Ahmad Mahmoody, and Benjamin J Raphael. A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics, 30(12):i78-i86, 2014.
John E. Hopcroft and Richard M. Karp. An n^2.5 algorithm for maximum matchings in bipartite graphs. SIAM J. Comput., 2(4):225-231, 1973.
Katharina T. Huber and Vincent Moulton. Phylogenetic networks from multi-labelled trees. Journal of Mathematical Biology, 52(5):613-632, 2006.
Daniel H Huson and David Bryant. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2):254-267, 2006.
Jesper Jansson, Ramesh Rajaby, Chuanqi Shen, and Wing-Kin Sung. Algorithms for the majority rule (+) consensus tree and the frequency difference consensus tree. IEEE/ACM Trans. Comput. Biology Bioinform., 15(1):15-26, 2016.
Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. Improved algorithms for constructing consensus trees. Journal of the ACM, 63(3):1-24, 2016.
Wei Jiao, Shankar Vembu, Amit G Deshwar, Lincoln Stein, and Quaid Morris. Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC bioinformatics, 15(1):35, 2014.
Ming-Yang Kao, Tak Wah Lam, Wing-Kin Sung, and Hing-Fung Ting. A decomposition theorem for maximum weight bipartite matchings. SIAM J. Comput., 31(1):18-26, 2001.
Nikolai Karpov, Salem Malikic, Md Khaledur Rahman, and S Cenk Sahinalp. A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression. Algorithms for Molecular Biology, 14(1):17, 2019.
Robert Krauthgamer and Ohad Trabelsi. Conditional lower bounds for all-pairs max-flow. ACM Trans. Algorithms, 14(4):42:1-42:15, 2018.
Yang P Liu and Aaron Sidford. Faster divergence maximization for faster maximum flow. arXiv preprint arXiv:2003.08929, 2020.
Salem Malikic, Farid Rashidi Mehrabadi, Simone Ciccolella, Md Khaledur Rahman, Camir Ricketts, Ehsan Haghshenas, Daniel Seidman, Faraz Hach, Iman Hajirasouliha, and S Cenk Sahinalp. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Research, 29(11):1860-1877, 2019.
Matt McVicar, Benjamin Sach, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou, and Tijl De Bie. SuMoTED: An intuitive edit distance between rooted unordered uniquely-labelled trees. Pattern Recognition Letters, 79:52-59, 2016.
Luay Nakhleh, Tandy Warnow, Don Ringe, and Steven N Evans. A comparison of phylogenetic reconstruction methods on an Indo-European dataset. Transactions of the Philological Society, 103(2):171-192, 2005.
Peter C Nowell. The clonal evolution of tumor cell populations. Science, 194(4260):23-28, 1976.
Mateusz Pawlik and Nikolaus Augsten. Efficient computation of the tree edit distance. ACM Transactions on Database Systems, 40(1):1-40, 2015.
David F Robinson and Leslie R Foulds. Comparison of weighted labelled trees. In Combinatorial Mathematics VI, pages 119-126. Springer, 1979.
David F Robinson and Leslie R Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131-147, 1981.
D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. J. Comput. Syst. Sci., 26(3):362-391, 1983.
Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
Kuo-Chung Tai. The tree-to-tree correction problem. Journal of the ACM, 26(3):422-433, 1979.
Robert S Walker, Søren Wichmann, Thomas Mailund, and Curtis J Atkisson. Cultural phylogenetics of the Tupi language family in lowland South America. PLOS One, 7(4), 2012.
Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In International Congress of Mathematicians, 2018.
Ke Yuan, Thomas Sakoparnig, Florian Markowetz, and Niko Beerenwinkel. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biology, 16(1):36, 2015.
Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245-1262, 1989.
