Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

Authors: Manuel Lafond, Binhai Zhu, Peng Zou


Author Details

Manuel Lafond
  • Department of Computer Science, Université de Sherbrooke, Quebec J1K 2R1, Canada
Binhai Zhu
  • Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA
Peng Zou
  • Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA

Cite As

Manuel Lafond, Binhai Zhu, and Peng Zou. Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Abstract

Owing to genomic sequence analysis in several types of cancer, genomic data based on copy number profiles (CNPs) have recently become increasingly popular. A CNP is a vector in which each component is a non-negative integer representing the number of copies of a specific segment of interest. The motivation is that in the late stages of certain types of cancer, genomes evolve rapidly through segmental duplications and deletions, so obtaining the exact sequences becomes difficult. Instead, the number of copies of important segments can be predicted from expression analysis and carries important biological information. Significant research has therefore been devoted to the analysis of genomic data represented as CNPs. In this paper, we present two streams of results. The first consists of negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem, posed by Qingge et al. in 2018. MCNG is defined as follows: given a string S in which each character represents a gene or segment, and a CNP C, compute a string T from S, using the minimum number of segmental duplications and deletions, such that cnp(T)=C. Qingge et al. showed that the problem is NP-hard if the duplications are tandem, and left open the question of whether it remains NP-hard if arbitrary duplications and/or deletions are used. We answer this question affirmatively; in fact, we prove that it is NP-hard even to obtain a constant-factor approximation. This is achieved through a general-purpose lemma on set-cover reductions that require an exact cover in one direction but not the other, which might be of independent interest. We also prove that the corresponding parameterized version is W[1]-hard, answering another open question of Qingge et al. The other result is positive and is based on a new (and more general) problem regarding CNPs.
The Copy Number Profile Conforming (CNPC) problem is formally defined as follows: given two CNPs C₁ and C₂, compute two strings S₁ and S₂ with cnp(S₁)=C₁ and cnp(S₂)=C₂ such that the distance between S₁ and S₂, d(S₁,S₂), is minimized. Here, d(S₁,S₂) is a general term: it could be any genome rearrangement distance (such as reversal, transposition, or tandem duplication). As a first step, we show that if d(S₁,S₂) is the breakpoint distance, then the problem is polynomially solvable. We expect that this will stimulate related research in the near future.
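
To make the notation concrete, the following is a minimal Python sketch of cnp(·) and of the two segmental operations used in MCNG. It is an illustration under stated assumptions, not the authors' implementation: the function names `cnp`, `duplicate`, and `delete`, the fixed alphabet ordering, and the convention that a duplication copies a substring and inserts the copy at an arbitrary position are all choices made here for the example.

```python
from collections import Counter


def cnp(s, alphabet):
    """Copy number profile of s: one count per character of `alphabet`, in order.

    Assumption: the CNP is indexed by a fixed ordering of the segments.
    """
    counts = Counter(s)
    return [counts[ch] for ch in alphabet]


def duplicate(s, i, j, k):
    """Segmental duplication (assumed form): copy s[i:j] and insert it at position k."""
    return s[:k] + s[i:j] + s[k:]


def delete(s, i, j):
    """Segmental deletion: remove the substring s[i:j]."""
    return s[:i] + s[j:]


alphabet = "abc"
S = "abc"
T = duplicate(S, 0, 2, 3)  # copy "ab" and append it: "abcab"
T = delete(T, 2, 3)        # remove "c": "abab"
print(T, cnp(T, alphabet))
```

Here two operations turn S = "abc" into a string T with cnp(T) = [2, 2, 0]. MCNG asks for the minimum number of such operations over all strings T that realize a given profile C, which is the quantity shown above to be NP-hard to compute or approximate within a constant factor.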

Subject Classification

ACM Subject Classification
  • Theory of computation
Keywords
  • Computational genomics
  • cancer genomics
  • copy number profiles
  • NP-hardness
  • approximation algorithms
  • FPT algorithms




References

  1. Sebastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thevenin, and Stephane Vialette. On the approximability of comparing genomes with duplicates. J. Graph Algorithms and Applications, 13(1):19-53, 2009.
  2. Salim Chowdhury, Stanley Shackney, Kerstin Heselmeyer-Haddad, Thomas Ried, Alejandro Schaeffer, and Russell Schwartz. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLOS Computational Biology, 10(7), 2014.
  3. SL Cooke, J Temple, S Macarthur, MA Zahra, LT Tan, RAF Crawford, CKY Ng, M Jimenez-Linan, E Sala, and JD Brenton. Intra-tumour genetic heterogeneity and poor chemoradiotherapy response in cervical cancer. British Journal of Cancer, 104(2):361, 2011.
  4. Susanna L Cooke and James D Brenton. Evolution of platinum resistance in high-grade serous ovarian cancer. The Lancet Oncology, 12(12):1169-1174, 2011.
  5. Garance Cordonnier and Manuel Lafond. Comparing copy-number profiles under multi-copy amplifications and deletions. BMC Genomics, 21(2):1-12, 2020.
  6. Prue A Cowin, Joshy George, Sian Fereday, Elizabeth Loehrer, Peter Van Loo, Carleen Cullinane, Dariush Etemadmoghadam, Sarah Ftouni, Laura Galletta, Michael S Anglesio, et al. LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin. Cancer Research, 72(16):4060-4073, 2012.
  7. Rodney Downey and Michael Fellows. Parameterized complexity. Springer Science & Business Media, 2012.
  8. Mohammed El-Kebir, Benjamin Raphael, Ron Shamir, Roded Sharan, Simone Zaccaria, Meirav Zehavi, and Ron Zeira. Copy-number evolutions: complexity and algorithms. In Proceedings of WABI'2016, LNCS, volume 9838, pages 137-149. Springer, 2016.
  9. Mohammed El-Kebir, Benjamin J Raphael, Ron Shamir, Roded Sharan, Simone Zaccaria, Meirav Zehavi, and Ron Zeira. Complexity and algorithms for copy-number evolution problems. Algorithms for Molecular Biology, 12(1):13, 2017.
  10. Michael Fellows, Danny Hermelin, Frances Rosamond, and Stephane Vialette. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1):53-61, 2009.
  11. Patrick Holloway, Krister Swenson, David Ardell, and Nadia El-Mabrouk. Ancestral genome organization: an alignment approach. Journal of Computational Biology, 20(4):280-295, 2013.
  12. Haitao Jiang, Chunfang Zheng, David Sankoff, and Binhai Zhu. Scaffold filling under the breakpoint and related distances. IEEE/ACM Trans. Bioinformatics and Comput. Biology, 9(4):1220-1229, 2012.
  13. Manuel Lafond, Binhai Zhu, and Peng Zou. The tandem duplication distance is NP-hard. In Proceedings of STACS'2020, LIPIcs, volume 154, pages 15:1-15:15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2020.
  14. Carlo C Maley, Patricia C Galipeau, Jennifer C Finley, V Jon Wongsurawat, Xiaohong Li, Carissa A Sanchez, Thomas G Paulson, Patricia L Blount, Rosa-Ana Risques, Peter S Rabinovitch, et al. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nature Genetics, 38(4):468, 2006.
  15. Andriy Marusyk, Vanessa Almendro, and Kornelia Polyak. Intra-tumour heterogeneity: a looking glass for cancer? Nature Reviews Cancer, 12(5):323, 2012.
  16. Nicholas Navin, Alexander Krasnitz, Linda Rodgers, Kerry Cook, Jennifer Meth, Jude Kendall, Michael Riggs, Yvonne Eberling, Jennifer Troge, Vladimir Grubor, et al. Inferring tumor progression from genomic heterogeneity. Genome Research, 20(1):68-80, 2010.
  17. Cancer Genome Atlas Research Network et al. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609, 2011.
  18. Letu Qingge, Xiaozhou He, Zhihui Liu, and Binhai Zhu. On the minimum copy number generation problem in cancer genomics. In Proceedings of ACM BCB'2018, pages 260-269. ACM, 2018.
  19. Gryte Satas, Simone Zaccaria, Geoffrey Mon, and Benjamin J Raphael. Scarlet: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Systems, 10(4):323-332, 2020.
  20. Roland F Schwarz, Anne Trinh, Botond Sipos, James D Brenton, Nick Goldman, and Florian Markowetz. Phylogenetic quantification of intra-tumour heterogeneity. PLoS Computational Biology, 10(4):e1003535, 2014.
  21. Sohrab P Shah, Ryan D Morin, Jaswinder Khattra, Leah Prentice, Trevor Pugh, Angela Burleigh, Allen Delaney, Karen Gelmon, Ryan Guliany, Janine Senz, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature, 461(7265):809, 2009.
  22. Ron Shamir, Meirav Zehavi, and Ron Zeira. A linear-time algorithm for the copy number transformation problem. In Proceedings of CPM'2016, LIPIcs, volume 54, pages 16:1-16:13. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
  23. Luca Trevisan. Non-approximability results for optimization problems on bounded degree instances. In Proceedings of 33rd ACM Symp. on Theory of Comput. (STOC'01), pages 453-461. ACM, 2001.
  24. G.A. Watterson, W.J. Ewens, T.E. Hall, and A. Morgan. The chromosome inversion problem. J. Theoretical Biology, 99(1):1-7, 1982.
  25. Ruofan Xia, Yu Lin, Jun Zhou, Tieming Geng, Bing Feng, and Jijun Tang. Phylogenetic reconstruction for copy-number evolution problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2):694-699, 2018.
  26. Simone Zaccaria, Mohammed El-Kebir, Gunnar W Klau, and Benjamin J Raphael. Phylogenetic copy-number factorization of multiple tumor samples. Journal of Computational Biology, 25(7):689-708, 2018.