A Multi-labeled Tree Edit Distance for Comparing "Clonal Trees" of Tumor Progression

Authors Nikolai Karpov, Salem Malikic, Md. Khaledur Rahman, S. Cenk Sahinalp

Author Details

Nikolai Karpov
  • Department of Computer Science, Indiana University, Bloomington, IN, USA
Salem Malikic
  • School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Md. Khaledur Rahman
  • Department of Computer Science, Indiana University, Bloomington, IN, USA
S. Cenk Sahinalp
  • Department of Computer Science, Indiana University, Bloomington, IN, USA

Cite As

Nikolai Karpov, Salem Malikic, Md. Khaledur Rahman, and S. Cenk Sahinalp. A Multi-labeled Tree Edit Distance for Comparing "Clonal Trees" of Tumor Progression. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

We introduce a new edit distance measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample and constructed from single-cell or bulk high-throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone and is labeled with one or more mutations such that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree edit distance (MLTED) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, needed to convert each of the two trees to the maximal common tree. We show that the MLTED measure can be computed in polynomial time and that it captures the similarity between trees of different clonal granularity well. We have implemented our algorithm to compute MLTED exactly and applied it to a variety of data sets. The source code of our method is available at https://github.com/khaled-rahman/leafDelTED.
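To make the three edit operations concrete, the following is a minimal Python sketch of a multi-labeled clonal tree. It is not taken from the authors' implementation: the `Clone` class and the helpers `genotypes`, `delete_label`, `delete_empty_leaf`, and `expand` are hypothetical names chosen for illustration, and the reading of vertex expansion used here (splitting one vertex into a parent/child pair that partitions its labels) is an assumption about the operation the abstract names. The sketch only applies the operations; it does not compute the MLTED itself.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare vertices by identity, not by content
class Clone:
    """One vertex of a clonal tree (hypothetical representation)."""
    labels: set = field(default_factory=set)      # mutations first seen in this clone
    children: list = field(default_factory=list)  # descendant clones


def genotypes(root, inherited=frozenset()):
    """Yield the full mutation set of every clone in preorder.

    Each mutation labels the oldest clone that harbors it, so a clone's
    genotype is the union of the labels on the path from the root.
    """
    current = inherited | root.labels
    yield current
    for child in root.children:
        yield from genotypes(child, current)


def delete_label(node, mutation):
    """Label deletion: remove one mutation from a vertex's label set."""
    node.labels.discard(mutation)


def delete_empty_leaf(parent, leaf):
    """Leaf deletion: only a leaf with no remaining labels may be removed."""
    if leaf.labels or leaf.children:
        raise ValueError("only an unlabeled leaf can be deleted")
    parent.children.remove(leaf)


def expand(node, moved_labels):
    """Vertex expansion (assumed semantics): split `node` in two.

    `node` keeps the labels not in `moved_labels`; a new child receives
    `moved_labels` together with all of the former children of `node`.
    """
    moved = set(moved_labels)
    if not moved <= node.labels:
        raise ValueError("moved labels must already belong to the vertex")
    lower = Clone(labels=moved, children=node.children)
    node.labels = node.labels - moved
    node.children = [lower]
    return lower


# Usage: a three-clone chain a -> {b, c} -> d.
root, mid, leaf = Clone({"a"}), Clone({"b", "c"}), Clone({"d"})
root.children.append(mid)
mid.children.append(leaf)
lower = expand(mid, {"c"})       # refine the clone {b, c} into b -> c
delete_label(leaf, "d")          # the leaf is now unlabeled
delete_empty_leaf(lower, leaf)   # so it can be removed
print([sorted(g) for g in genotypes(root)])
```

Under this reading, expansion refines the granularity of a tree without changing the genotype of any pre-existing descendant, which is one way to see why trees of different clonal granularity can still share a large common tree.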

Subject Classification

ACM Subject Classification
  • Applied computing → Computational genomics
  • Computing methodologies → Combinatorial algorithms

Keywords
  • Intra-tumor heterogeneity
  • tumor evolution
  • multi-labeled tree
  • tree edit distance
  • dynamic programming



