ACM Other Conferences

10.1145/acmotherconferences

0000000

10.5555/0000000

Proceedings of the 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

CPM 2020

10.4230/LIPIcs.CPM.2020.22

10003752

Theory of computation

500

Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

Lafond

Manuel

Department of Computer Science, Universite de Sherbrooke, Quebec J1K 2R1, Canada

manuel.lafond@usherbrooke.ca

Author

Zhu

Binhai

Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA

bhz@montana.edu

Author

Zou

Peng

Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA

peng.zou@student.montana.edu

Author

09 06 2020

22:1 22:15


<book-part-wrapper xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" content-type="research-article">

<collection-meta collection-type="book-series">

<collection-id collection-id-type="doi">10.1145/acmotherconferences</collection-id>

<title-group>

<title>ACM Other Conferences</title>

</title-group>

</collection-meta>

<book-meta>

<book-id book-id-type="acm-id">0000000</book-id>

<book-id book-id-type="doi">10.5555/0000000</book-id>

<book-title-group>

<book-title>Proceedings of the 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)</book-title>

<alt-title alt-title-type="acronym">CPM 2020</alt-title>

</book-title-group>

</book-meta>

<book-part book-part-type="chapter" xml:lang="en">

<book-part-meta>

<book-part-id book-part-id-type="doi">10.4230/LIPIcs.CPM.2020.22</book-part-id>

<book-part-id book-part-id-type="article-no">22</book-part-id>

<subj-group subj-group-type="ccs2012">

<compound-subject>

<compound-subject-part content-type="code">10003752</compound-subject-part>

<compound-subject-part content-type="text">Theory of computation</compound-subject-part>

<compound-subject-part content-type="weight">500</compound-subject-part>

</compound-subject>

</subj-group>

<title-group>

<title>Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms</title>

</title-group>

<contrib-group>

<contrib>

<name>

<surname>Lafond</surname>

<given-names>Manuel</given-names>

</name>

<aff>Department of Computer Science, Universite de Sherbrooke, Quebec J1K 2R1, Canada</aff>

<email>manuel.lafond@usherbrooke.ca</email>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Zhu</surname>

<given-names>Binhai</given-names>

</name>

<aff>Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA</aff>

<email>bhz@montana.edu</email>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Zou</surname>

<given-names>Peng</given-names>

</name>

<aff>Gianforte School of Computing, Montana State University, Bozeman, MT 59717, USA</aff>

<email>peng.zou@student.montana.edu</email>

<role>Author</role>

</contrib>

</contrib-group>

<pub-date date-type="publication">

<day>09</day>

<month>06</month>

<year>2020</year>

</pub-date>

<fpage>22:1</fpage>

<lpage>22:15</lpage>

<abstract>

<p>Recently, driven by genomic sequence analysis in several types of cancer, genomic data based on copy number profiles (CNPs) have become increasingly popular. A CNP is a vector in which each component is a non-negative integer representing the number of copies of a specific segment of interest. The motivation is that in the late stages of certain types of cancer, genomes evolve rapidly through segmental duplications and deletions, so obtaining the exact sequences becomes difficult. Instead, the number of copies of important segments can be predicted from expression analysis and carries important biological information. Consequently, significant research has recently been devoted to the analysis of genomic data represented as CNPs.</p>

<p>In this paper, we present two streams of results. The first consists of negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem, posed by Qingge et al. in 2018. MCNG is defined as follows: given a string S in which each character represents a gene or segment, and a CNP C, compute a string T from S, using the minimum number of segmental duplications and deletions, such that cnp(T)=C. Qingge et al. showed that the problem is NP-hard if the duplications are tandem, and left open whether it remains NP-hard if arbitrary duplications and/or deletions are used. We answer this question affirmatively; in fact, we prove that it is NP-hard even to obtain a constant-factor approximation. This is achieved through a general-purpose lemma on set-cover reductions that require an exact cover in one direction but not the other, which might be of independent interest. We also prove that the corresponding parameterized version is W[1]-hard, answering another open question by Qingge et al.</p>

<p>The other result is positive and concerns a new (and more general) problem on CNPs. The Copy Number Profile Conforming (CNPC) problem is formally defined as follows: given two CNPs C₁ and C₂, compute two strings S₁ and S₂ with cnp(S₁)=C₁ and cnp(S₂)=C₂ such that the distance between S₁ and S₂, d(S₁,S₂), is minimized. Here, d(S₁,S₂) is deliberately general: it could be any genome rearrangement distance (such as reversal, transposition, or tandem duplication). We take a first step by showing that if d(S₁,S₂) is measured by the breakpoint distance, then the problem is polynomially solvable. We expect this to stimulate related research in the near future.</p>

</abstract>

<kwd-group>

<kwd>Computational genomics</kwd>

<kwd>cancer genomics</kwd>

<kwd>copy number profiles</kwd>

<kwd>NP-hardness</kwd>

<kwd>approximation algorithms</kwd>

<kwd>FPT algorithms</kwd>

</kwd-group>

</book-part-meta>

<back>

<ref-list specific-use="unparsed">

<ref>

<mixed-citation>Sebastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thevenin, and Stephane Vialette. On the approximability of comparing genomes with duplicates. J. Graph Algorithms and Applications, 13(1):19-53, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Salim Chowdhury, Stanley Shackney, Kerstin Heselmeyer-Haddad, Thomas Ried, Alejandro Schaeffer, and Russell Schwartz. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLOS Computational Biology, 10(7), 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>SL Cooke, J Temple, S Macarthur, MA Zahra, LT Tan, RAF Crawford, CKY Ng, M Jimenez-Linan, E Sala, and JD Brenton. Intra-tumour genetic heterogeneity and poor chemoradiotherapy response in cervical cancer. British Journal of Cancer, 104(2):361, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Susanna L Cooke and James D Brenton. Evolution of platinum resistance in high-grade serous ovarian cancer. The Lancet Oncology, 12(12):1169-1174, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Garance Cordonnier and Manuel Lafond. Comparing copy-number profiles under multi-copy amplifications and deletions. BMC Genomics, 21(2):1-12, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Prue A Cowin, Joshy George, Sian Fereday, Elizabeth Loehrer, Peter Van Loo, Carleen Cullinane, Dariush Etemadmoghadam, Sarah Ftouni, Laura Galletta, Michael S Anglesio, et al. LRP1B deletion in high-grade serous ovarian cancers is associated with acquired chemotherapy resistance to liposomal doxorubicin. Cancer Research, 72(16):4060-4073, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Rodney Downey and Michael Fellows. Parameterized complexity. Springer Science &amp; Business Media, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Mohammed El-Kebir, Benjamin Raphael, Ron Shamir, Roded Sharan, Simone Zaccaria, Meirav Zehavi, and Ron Zeira. Copy-number evolutions: complexity and algorithms. In Proceedings of WABI'2016, LNCS, volume 9838, pages 137-149. Springer, 2016.</mixed-citation>

</ref>

<ref>

<mixed-citation>Mohammed El-Kebir, Benjamin J Raphael, Ron Shamir, Roded Sharan, Simone Zaccaria, Meirav Zehavi, and Ron Zeira. Complexity and algorithms for copy-number evolution problems. Algorithms for Molecular Biology, 12(1):13, 2017.</mixed-citation>

</ref>

<ref>

<mixed-citation>Michael Fellows, Danny Hermelin, Frances Rosamond, and Stephane Vialette. On the parameterized complexity of multiple-interval graph problems. Theoretical Computer Science, 410(1):53-61, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Patrick Holloway, Krister Swenson, David Ardell, and Nadia El-Mabrouk. Ancestral genome organization: an alignment approach. Journal of Computational Biology, 20(4):280-295, 2013.</mixed-citation>

</ref>

<ref>

<mixed-citation>Haitao Jiang, Chunfang Zheng, David Sankoff, and Binhai Zhu. Scaffold filling under the breakpoint and related distances. IEEE/ACM Trans. Bioinformatics and Comput. Biology, 9(4):1220-1229, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Manuel Lafond, Binhai Zhu, and Peng Zou. The tandem duplication distance is NP-hard. In Proceedings of STACS'2020, LIPIcs, volume 154, pages 15:1-15:15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Carlo C Maley, Patricia C Galipeau, Jennifer C Finley, V Jon Wongsurawat, Xiaohong Li, Carissa A Sanchez, Thomas G Paulson, Patricia L Blount, Rosa-Ana Risques, Peter S Rabinovitch, et al. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nature Genetics, 38(4):468, 2006.</mixed-citation>

</ref>

<ref>

<mixed-citation>Andriy Marusyk, Vanessa Almendro, and Kornelia Polyak. Intra-tumour heterogeneity: a looking glass for cancer? Nature Reviews Cancer, 12(5):323, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Nicholas Navin, Alexander Krasnitz, Linda Rodgers, Kerry Cook, Jennifer Meth, Jude Kendall, Michael Riggs, Yvonne Eberling, Jennifer Troge, Vladimir Grubor, et al. Inferring tumor progression from genomic heterogeneity. Genome Research, 20(1):68-80, 2010.</mixed-citation>

</ref>

<ref>

<mixed-citation>Cancer Genome Atlas Research Network et al. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Letu Qingge, Xiaozhou He, Zhihui Liu, and Binhai Zhu. On the minimum copy number generation problem in cancer genomics. In Proceedings of ACM BCB'2018, pages 260-269. ACM, 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>Gryte Satas, Simone Zaccaria, Geoffrey Mon, and Benjamin J Raphael. Scarlet: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Systems, 10(4):323-332, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Roland F Schwarz, Anne Trinh, Botond Sipos, James D Brenton, Nick Goldman, and Florian Markowetz. Phylogenetic quantification of intra-tumour heterogeneity. PLoS Computational Biology, 10(4):e1003535, 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>Sohrab P Shah, Ryan D Morin, Jaswinder Khattra, Leah Prentice, Trevor Pugh, Angela Burleigh, Allen Delaney, Karen Gelmon, Ryan Guliany, Janine Senz, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature, 461(7265):809, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ron Shamir, Meirav Zehavi, and Ron Zeira. A linear-time algorithm for the copy number transformation problem. In Proceedings of CPM'2016, LIPIcs, volume 54, pages 16:1-16:13. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.</mixed-citation>

</ref>

<ref>

<mixed-citation>Luca Trevisan. Non-approximability results for optimization problems on bounded degree instances. In Proceedings of 33rd ACM Symp. on Theory of Comput. (STOC'01), pages 453-461. ACM, 2001.</mixed-citation>

</ref>

<ref>

<mixed-citation>G.A. Watterson, W.J. Ewens, T.E. Hall, and A. Morgan. The chromosome inversion problem. J. Theoretical Biology, 99(1):1-7, 1982.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ruofan Xia, Yu Lin, Jun Zhou, Tieming Geng, Bing Feng, and Jijun Tang. Phylogenetic reconstruction for copy-number evolution problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16(2):694-699, 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>Simone Zaccaria, Mohammed El-Kebir, Gunnar W Klau, and Benjamin J Raphael. Phylogenetic copy-number factorization of multiple tumor samples. Journal of Computational Biology, 25(7):689-708, 2018.</mixed-citation>

</ref>

</ref-list>

</back>

</book-part>

</book-part-wrapper>