On Two Measures of Distance Between Fully-Labelled Trees

Authors: Giulia Bernardini, Paola Bonizzoni, Paweł Gawrychowski

Author Details

Giulia Bernardini
  • University of Milano - Bicocca, Milan, Italy
Paola Bonizzoni
  • University of Milano - Bicocca, Milan, Italy
Paweł Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland

Cite As

Giulia Bernardini, Paola Bonizzoni, and Paweł Gawrychowski. On Two Measures of Distance Between Fully-Labelled Trees. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 6:1-6:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Abstract

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates the need for efficient procedures for comparing rooted trees that represent the evolution of mutations in tumor phylogenies. Motivated by this need, Bernardini et al. [CPM 2019] recently introduced the notion of a rearrangement distance for fully-labelled trees. This notion originates from two operations: one that permutes the labels of the nodes, and another that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, whereas the actual rearrangement distance, which combines the two, was proven to be NP-hard to compute. We answer two questions left open by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first question by showing, via a two-way reduction, that calculating the permutation distance between two trees on n nodes is equivalent, up to polylogarithmic factors, to finding a maximum-cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [arXiv 2020], we obtain an 𝒪̃(n^{4/3+o(1)}) time algorithm for computing the permutation distance between two trees on n nodes. We then answer the second question positively by designing a linear-time constant-factor approximation algorithm that does not need any assumption on the trees.
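
The abstract reduces the permutation distance to maximum-cardinality matching in a bipartite graph. As a rough illustration of that primitive only (not of the paper's reduction, and not of the Liu and Sidford algorithm), the following Python sketch computes a maximum matching with the classical augmenting-path method. The function name `max_bipartite_matching` and the adjacency-dictionary input format are assumptions made for this example, and its O(V·E) worst-case running time is far from the bound quoted above.

```python
from typing import Dict, Hashable, Iterable, Set


def max_bipartite_matching(
    adj: Dict[Hashable, Iterable[Hashable]]
) -> Dict[Hashable, Hashable]:
    """Return a maximum-cardinality matching as a {left: right} dictionary.

    `adj` maps every left vertex to the right vertices adjacent to it.
    This is the simple augmenting-path method (hypothetical helper for
    illustration); it is recursive, so very deep augmenting paths may
    exceed Python's default recursion limit.
    """
    match_right: Dict[Hashable, Hashable] = {}  # right vertex -> left vertex

    def try_augment(u: Hashable, seen: Set[Hashable]) -> bool:
        # Try to match u, re-routing previously matched left vertices if needed.
        for v in adj.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())

    # Flip the bookkeeping so the result is keyed by left vertices.
    return {u: v for v, u in match_right.items()}


if __name__ == "__main__":
    # Toy graph: left vertices a, b, c; right vertices x, y, z.
    example = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
    print(max_bipartite_matching(example))
```

On the toy graph, vertex a is first matched to x and then re-routed to y so that b can take x, which is the augmenting step that distinguishes a maximum matching from a greedy one.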

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Approximation algorithms analysis
  • Theory of computation → Problems, reductions and completeness
Keywords
  • Tree distance
  • Cancer progression
  • Approximation algorithms
  • Fine-grained complexity



References

  1. Amir Abboud, Loukas Georgiadis, Giuseppe F. Italiano, Robert Krauthgamer, Nikos Parotsidis, Ohad Trabelsi, Przemyslaw Uznanski, and Daniel Wolleb-Graf. Faster algorithms for all-pairs bounded min-cuts. In 46th ICALP, pages 7:1-7:15, 2019.
  2. Amir Abboud, Robert Krauthgamer, and Ohad Trabelsi. New algorithms and lower bounds for all-pairs max-flow in undirected graphs. In 31st SODA, pages 48-61, 2020.
  3. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
  4. Benjamin L Allen and Mike Steel. Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics, 5(1):1-15, 2001.
  5. Giulia Bernardini, Paola Bonizzoni, Gianluca Della Vedova, and Murray Patterson. A rearrangement distance for fully-labelled trees. In 30th CPM, pages 28:1-28:15, 2019.
  6. Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, and Mauricio Soto. Beyond perfect phylogeny: Multisample phylogeny reconstruction via ILP. In 8th ACM-BCB, pages 1-10, 2017.
  7. Paola Bonizzoni, Simone Ciccolella, Gianluca Della Vedova, and Mauricio Soto. Does relaxing the infinite sites assumption give better tumor phylogenies? an ILP-based comparative approach. IEEE/ACM Trans. Comput. Biology Bioinform., 16(5):1410-1423, 2018.
  8. Magnus Bordewich and Charles Semple. On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics, 8(4):409-423, 2005.
  9. Robert S. Boyer and J. Strother Moore. MJRTY: A fast majority vote algorithm. In Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, pages 105-118. Kluwer Academic Publishers, 1991.
  10. Gerth Stølting Brodal, Rolf Fagerberg, Thomas Mailund, Christian NS Pedersen, and Andreas Sand. Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree. In 24th SODA, pages 1814-1832, 2013.
  11. David Bryant. A classification of consensus methods for phylogenetics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 61:163-184, 2003.
  12. Peter Buneman. The recovery of trees from measures of dissimilarity. Mathematics in the Archaeological and Historical Sciences, 1971.
  13. Simone Ciccolella, Giulia Bernardini, Luca Denti, Paola Bonizzoni, Marco Previtali, and Gianluca Della Vedova. Triplet-based similarity score for fully multi-labeled trees with poly-occurring labels, 2020. URL: https://www.biorxiv.org/content/early/2020/04/14/2020.04.14.040550.full.pdf.
  14. Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li, John Tromp, and Louxin Zhang. On distances between phylogenetic trees. In 8th SODA, pages 427-436, 1997.
  15. Zach DiNardo, Kiran Tomlinson, Anna Ritz, and Layla Oesper. Distance measures for tumor evolutionary trees. Bioinformatics, November 2019.
  16. Annette J Dobson. Comparing the shapes of trees. In Combinatorial Mathematics III, pages 95-100. Springer, 1975.
  17. Bartłomiej Dudek and Paweł Gawrychowski. Computing quartet distance is equivalent to counting 4-cycles. In 51st STOC, pages 733-743, 2019.
  18. George F Estabrook, FR McMorris, and Christopher A Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 34(2):193-200, 1985.
  19. Joseph Felsenstein. Inferring phylogenies, volume 2. Sinauer Associates Sunderland, MA, 2004.
  20. Paweł Gawrychowski, Gad M. Landau, Wing-Kin Sung, and Oren Weimann. A faster construction of greedy consensus trees. In 45th ICALP, pages 63:1-63:14, 2018.
  21. Kiya Govek, Camden Sikes, and Layla Oesper. A consensus approach to infer tumor evolutionary histories. In 9th BCB, pages 63-72, 2018.
  22. Russell D Gray, Alexei J Drummond, and Simon J Greenhill. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science, 323(5913):479-483, 2009.
  23. Iman Hajirasouliha, Ahmad Mahmoody, and Benjamin J Raphael. A combinatorial approach for analyzing intra-tumor heterogeneity from high-throughput sequencing data. Bioinformatics, 30(12):i78-i86, 2014.
  24. John E. Hopcroft and Richard M. Karp. An n^2.5 algorithm for maximum matchings in bipartite graphs. SIAM J. Comput., 2(4):225-231, 1973.
  25. Katharina T. Huber and Vincent Moulton. Phylogenetic networks from multi-labelled trees. Journal of Mathematical Biology, 52(5):613-632, 2006.
  26. Daniel H Huson and David Bryant. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2):254-267, 2006.
  27. Jesper Jansson, Ramesh Rajaby, Chuanqi Shen, and Wing-Kin Sung. Algorithms for the majority rule (+) consensus tree and the frequency difference consensus tree. IEEE/ACM Trans. Comput. Biology Bioinform., 15(1):15-26, 2016.
  28. Jesper Jansson, Chuanqi Shen, and Wing-Kin Sung. Improved algorithms for constructing consensus trees. Journal of the ACM, 63(3):1-24, 2016.
  29. Wei Jiao, Shankar Vembu, Amit G Deshwar, Lincoln Stein, and Quaid Morris. Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics, 15(1):35, 2014.
  30. Ming-Yang Kao, Tak Wah Lam, Wing-Kin Sung, and Hing-Fung Ting. A decomposition theorem for maximum weight bipartite matchings. SIAM J. Comput., 31(1):18-26, 2001.
  31. Nikolai Karpov, Salem Malikic, Md Khaledur Rahman, and S Cenk Sahinalp. A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression. Algorithms for Molecular Biology, 14(1):17, 2019.
  32. Robert Krauthgamer and Ohad Trabelsi. Conditional lower bounds for all-pairs max-flow. ACM Trans. Algorithms, 14(4):42:1-42:15, 2018.
  33. Yang P Liu and Aaron Sidford. Faster divergence maximization for faster maximum flow. arXiv preprint arXiv:2003.08929, 2020.
  34. Salem Malikic, Farid Rashidi Mehrabadi, Simone Ciccolella, Md Khaledur Rahman, Camir Ricketts, Ehsan Haghshenas, Daniel Seidman, Faraz Hach, Iman Hajirasouliha, and S Cenk Sahinalp. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Research, 29(11):1860-1877, 2019.
  35. Matt McVicar, Benjamin Sach, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou, and Tijl De Bie. SuMoTED: An intuitive edit distance between rooted unordered uniquely-labelled trees. Pattern Recognition Letters, 79:52-59, 2016.
  36. Luay Nakhleh, Tandy Warnow, Don Ringe, and Steven N Evans. A comparison of phylogenetic reconstruction methods on an Indo-European dataset. Transactions of the Philological Society, 103(2):171-192, 2005.
  37. Peter C Nowell. The clonal evolution of tumor cell populations. Science, 194(4260):23-28, 1976.
  38. Mateusz Pawlik and Nikolaus Augsten. Efficient computation of the tree edit distance. ACM Transactions on Database Systems, 40(1):1-40, 2015.
  39. David F Robinson and Leslie R Foulds. Comparison of weighted labelled trees. In Combinatorial Mathematics VI, pages 119-126. Springer, 1979.
  40. David F Robinson and Leslie R Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53(1-2):131-147, 1981.
  41. D.D. Sleator and R.E. Tarjan. A data structure for dynamic trees. J. Comput. Syst. Sci., 26(3):362-391, 1983.
  42. Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
  43. Kuo-Chung Tai. The tree-to-tree correction problem. Journal of the ACM, 26(3):422-433, 1979.
  44. Robert S Walker, Søren Wichmann, Thomas Mailund, and Curtis J Atkisson. Cultural phylogenetics of the Tupi language family in lowland South America. PLOS One, 7(4), 2012.
  45. Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In International Congress of Mathematicians, 2018.
  46. Ke Yuan, Thomas Sakoparnig, Florian Markowetz, and Niko Beerenwinkel. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biology, 16(1):36, 2015.
  47. Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245-1262, 1989.