Constructing Founder Sets Under Allelic and Non-Allelic Homologous Recombination

Authors Konstantinn Bonnet, Tobias Marschall , Daniel Doerr¹



PDF
Thumbnail PDF

File

LIPIcs.WABI.2022.6.pdf
  • Filesize: 1.58 MB
  • 23 pages

Document Identifiers

Author Details

Konstantinn Bonnet
  • Institute for Medical Biometry and Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
Tobias Marschall
  • Institute for Medical Biometry and Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
Daniel Doerr¹
  • Institute for Medical Biometry and Bioinformatics, Heinrich Heine University, Düsseldorf, Germany

Acknowledgements

The authors kindly thank Feyza Yilmaz for providing the haplotype data of the 1p36.13 locus.

Cite As Get BibTex

Konstantinn Bonnet, Tobias Marschall, and Daniel Doerr¹. Constructing Founder Sets Under Allelic and Non-Allelic Homologous Recombination. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 6:1-6:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.WABI.2022.6

Abstract

Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements - including deletion, duplication, and inversion - and is underlying many genomic disorders. Despite its importance for genome evolution and disease, there is a lack of computational models to study genomic loci prone to NAHR.
In this work, we propose such a computational model, providing a unified framework for both (allelic) homologous recombination and NAHR. Our model represents a set of genomes as a graph, where human haplotypes correspond to walks through this graph. We formulate two founder set problems under our recombination model, provide flow-based algorithms for their solution, and demonstrate scalability to problem instances arising in practice.

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • founder set reconstruction
  • variation graph
  • pangenomics
  • NAHR
  • homologous recombination

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1 edition, February 1993. Google Scholar
  2. David A Bader, Bernard ME Moret, and Mi Yan. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. In 1st International Workshop on Algorithms in Bioinformatics (WABI 2001), Algorithms in Bioinformatics, pages 365-376, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg. Google Scholar
  3. Vineet Bafna and Pavel A. Pevzner. Genome rearrangements and sorting by reversals. SIAM Journal on Computing, 25(2):272-289, 1996. Google Scholar
  4. Vineet Bafna and Pavel A. Pevzner. Sorting by transpositions. SIAM Journal on Discrete Mathematics, 11(2):224-240, 1998. Google Scholar
  5. Anne Bergeron, Julia Mixtacki, and Jens Stoye. A unifying view of genome rearrangements. In Philipp Bucher and Bernard M. E. Moret, editors, 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), volume 4175 of Algorithms in Bioinformatics, pages 163-173, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. Google Scholar
  6. Leonard Bohnenkämper, Marília DV Braga, Daniel Doerr, and Jens Stoye. Computing the rearrangement distance of natural genomes. Journal of Computational Biology, 28(4):410-431, 2021. Google Scholar
  7. Mark J P Chaisson, Ashley D Sanders, Xuefang Zhao, Ankit Malhotra, David Porubsky, Tobias Rausch, Eugene J Gardner, Oscar L Rodriguez, Li Guo, Ryan L Collins, Xian Fan, Jia Wen, Robert E Handsaker, Susan Fairley, Zev N Kronenberg, Xiangmeng Kong, Fereydoun Hormozdiari, Dillon Lee, Aaron M Wenger, Alex R Hastie, Danny Antaki, Thomas Anantharaman, Peter A Audano, Harrison Brand, Stuart Cantsilieris, Han Cao, Eliza Cerveira, Chong Chen, Xintong Chen, Chen-Shan Chin, Zechen Chong, Nelson T Chuang, Christine C Lambert, Deanna M Church, Laura Clarke, Andrew Farrell, Joey Flores, Timur Galeev, David U Gorkin, Madhusudan Gujral, Victor Guryev, William Haynes Heaton, Jonas Korlach, Sushant Kumar, Jee Young Kwon, Ernest T Lam, Jong Eun Lee, Joyce Lee, Wan-Ping Lee, Sau Peng Lee, Shantao Li, Patrick Marks, Karine Viaud-Martinez, Sascha Meiers, Katherine M Munson, Fabio C P Navarro, Bradley J Nelson, Conor Nodzak, Amina Noor, Sofia Kyriazopoulou-Panagiotopoulou, Andy W C Pang, Yunjiang Qiu, Gabriel Rosanio, Mallory Ryan, Adrian Stütz, Diana C J Spierings, Alistair Ward, Annemarie E Welch, Ming Xiao, Wei Xu, Chengsheng Zhang, Qihui Zhu, Xiangqun Zheng-Bradley, Ernesto Lowy, Sergei Yakneen, Steven McCarroll, Goo Jun, Li Ding, Chong Lek Koh, Bing Ren, Paul Flicek, Ken Chen, Mark B Gerstein, Pui-Yan Kwok, Peter M Lansdorp, Gabor T Marth, Jonathan Sebat, Xinghua Shi, Ali Bashir, Kai Ye, Scott E Devine, Michael E Talkowski, Ryan E Mills, Tobias Marschall, Jan O Korbel, Evan E Eichler, and Charles Lee. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun., 10(1):1784, April 2019. Google Scholar
  8. Zanoni Dias and Joao Meidanis. Genome rearrangements distance by fusion, fission, and transposition is easy. In spire, pages 250-253. Citeseer, 2001. Google Scholar
  9. Richard Durbin. Efficient haplotype matching and storage using the positional burrows-wheeler transform (pbwt). Bioinformatics, 30(9):1266-1272, 2014. Google Scholar
  10. Peter Ebert, Peter A Audano, Qihui Zhu, Bernardo Rodriguez-Martin, David Porubsky, Marc Jan Bonder, Arvis Sulovari, Jana Ebler, Weichen Zhou, Rebecca Serra Mari, Feyza Yilmaz, Xuefang Zhao, Pinghsun Hsieh, Joyce Lee, Sushant Kumar, Jiadong Lin, Tobias Rausch, Yu Chen, Jingwen Ren, Martin Santamarina, Wolfram Höps, Hufsah Ashraf, Nelson T Chuang, Xiaofei Yang, Katherine M Munson, Alexandra P Lewis, Susan Fairley, Luke J Tallon, Wayne E Clarke, Anna O Basile, Marta Byrska-Bishop, André Corvelo, Uday S Evani, Tsung-Yu Lu, Mark J P Chaisson, Junjie Chen, Chong Li, Harrison Brand, Aaron M Wenger, Maryam Ghareghani, William T Harvey, Benjamin Raeder, Patrick Hasenfeld, Allison A Regier, Haley J Abel, Ira M Hall, Paul Flicek, Oliver Stegle, Mark B Gerstein, Jose M C Tubio, Zepeng Mu, Yang I Li, Xinghua Shi, Alex R Hastie, Kai Ye, Zechen Chong, Ashley D Sanders, Michael C Zody, Michael E Talkowski, Ryan E Mills, Scott E Devine, Charles Lee, Jan O Korbel, Tobias Marschall, and Evan E Eichler. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science, February 2021. Google Scholar
  11. LLC Gurobi Optimization. Gurobi optimizer reference manual, 2019. URL: http://www.gurobi.com.
  12. Heng Li, Xiaowen Feng, and Chong Chu. The design and construction of reference pangenome graphs with minigraph. Genome biology, 21(1):1-19, 2020. Google Scholar
  13. Tomas Marques-Bonet, Santhosh Girirajan, and Evan E Eichler. The origins and impact of primate segmental duplications. Trends Genet., 25(10):443-454, October 2009. Google Scholar
  14. Nicholas D Matsakis and Felix S Klock II. The rust language. In ACM SIGAda Ada Letters, volume 34(3), pages 103-104. ACM, 2014. Google Scholar
  15. Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B Hall, Christopher H Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O Twardziok, Alexander Kanitz, et al. Sustainable data analysis with snakemake. F1000Research, 10, 2021. Google Scholar
  16. Tuukka Norri, Bastien Cazaux, Saska Dönges, Daniel Valenzuela, and Veli Mäkinen. Founder reconstruction enables scalable and seamless pangenomic analysis. Bioinformatics, 37(24):4611-4619, July 2021. Google Scholar
  17. Tuukka Norri, Bastien Cazaux, Dmitry Kosolobov, and Veli Mäkinen. Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018), volume 113 of Leibniz International Proceedings in Informatics (LIPIcs), pages 15:1-15:15, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  18. Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Alla Mikheenko, Mitchell R Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J Hoyt, Mark Diekhans, Glennis A Logsdon, Michael Alonge, Stylianos E Antonarakis, Matthew Borchers, Gerard G Bouffard, Shelise Y Brooks, Gina V Caldas, Nae-Chyun Chen, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo G de Lima, Philip C Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T Fiddes, Giulio Formenti, Robert S Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G S Grady, Tina A Graves-Lindsay, Ira M Hall, Nancy F Hansen, Gabrielle A Hartley, Marina Haukness, Kerstin Howe, Michael W Hunkapiller, Chirag Jain, Miten Jain, Erich D Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V Maduro, Tobias Marschall, Ann M McCartney, Jennifer McDaniel, Danny E Miller, James C Mullikin, Eugene W Myers, Nathan D Olson, Benedict Paten, Paul Peluso, Pavel A Pevzner, David Porubsky, Tamara Potapova, Evgeny I Rogaev, Jeffrey A Rosenfeld, Steven L Salzberg, Valerie A Schneider, Fritz J Sedlazeck, Kishwar Shafin, Colin J Shew, Alaina Shumate, Ying Sims, Arian F A Smit, Daniela C Soto, Ivan Sović, Jessica M Storer, Aaron Streets, Beth A Sullivan, Françoise Thibaud-Nissen, James Torrance, Justin Wagner, Brian P Walenz, Aaron Wenger, Jonathan M D Wood, Chunlin Xiao, Stephanie M Yan, Alice C Young, Samantha Zarate, Urvashi Surti, Rajiv C McCoy, Megan Y Dennis, Ivan A Alexandrov, Jennifer L Gerton, Rachel J O'Neill, Winston Timp, Justin M Zook, Michael C Schatz, Evan E Eichler, Karen H Miga, and Adam M Phillippy. The complete sequence of a human genome. Science, 376(6588):44-53, April 2022. Google Scholar
  19. Laxmi Parida, Marta Melé, Francesc Calafell, Jaume Bertranpetit, and Genographic Consortium. Estimating the ancestral recombinations graph (arg) as compatible networks of snp patterns. Journal of Computational Biology, 15(9):1133-1153, 2008. Google Scholar
  20. David Porubsky, Peter Ebert, Peter A Audano, Mitchell R Vollger, William T Harvey, Pierre Marijon, Jana Ebler, Katherine M Munson, Melanie Sorensen, Arvis Sulovari, Marina Haukness, Maryam Ghareghani, Peter M Lansdorp, Benedict Paten, Scott E Devine, Ashley D Sanders, Charles Lee, Mark J P Chaisson, Jan O Korbel, Evan E Eichler, Tobias Marschall, and Human Genome Structural Variation Consortium. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat. Biotechnol., December 2020. Google Scholar
  21. David Porubsky, Wolfram Höps, Hufsah Ashraf, PingHsun Hsieh, Bernardo Rodriguez-Martin, Feyza Yilmaz, Jana Ebler, Pille Hallast, Flavia Angela Maria Maggiolini, William T. Harvey, Barbara Henning, Peter A. Audano, David S. Gordon, Peter Ebert, Patrick Hasenfeld, Eva Benito, Qihui Zhu, Human Genome Structural Variation Consortium (HGSVC), Charles Lee, Francesca Antonacci, Matthias Steinrücken, Christine R. Beck, Ashley D. Sanders, Tobias Marschall, Evan E. Eichler, and Jan O. Korbel. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell, 2022. Google Scholar
  22. Pasi Rastas and Esko Ukkonen. Haplotype inference via hierarchical genotype parsing. In Raffaele Giancarlo and Sridhar Hannenhalli, editors, 7th International Workshop on Algorithms in Bioinformatics (WABI 2007), Algorithms in Bioinformatics, pages 85-97, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google Scholar
  23. Mikko Rautiainen and Tobias Marschall. MBG: Minimizer-based sparse de Bruijn Graph construction. Bioinformatics, 37(16):2476-2478, January 2021. Google Scholar
  24. Andrea Roli, Stefano Benedettini, Thomas Stützle, and Christian Blum. Large neighbourhood search algorithms for the founder sequence reconstruction problem. Computers & Operations Research, 39(2):213-224, 2012. Google Scholar
  25. Andrea Roli and Christian Blum. Tabu search for the founder sequence reconstruction problem: A preliminary study. In Sigeru Omatu, Miguel P. Rocha, José Bravo, Florentino Fernández, Emilio Corchado, Andrés Bustillo, and Juan M. Corchado, editors, Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, pages 1035-1042, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. Google Scholar
  26. Russell Schwartz, Andrew G Clark, and Sorin Istrail. Methods for inferring block-wise ancestral history from haploid sequences. In 2nd International Workshop on Algorithms in Bioinformatics (WABI 2002), Algorithms in Bioinformatics, pages 44-59, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. Google Scholar
  27. Fritz J Sedlazeck, Hayan Lee, Charlotte A Darby, and Michael C Schatz. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat. Rev. Genet., March 2018. Google Scholar
  28. Mingfu Shao, Yu Lin, and Bernard M. E. Moret. An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes. Journal of Computational Biology, 22(5):425-435, 2015. URL: https://doi.org/10.1089/cmb.2014.0096.
  29. Krister M Swenson, Paul Guertin, Hugo Deschênes, and Anne Bergeron. Reconstructing the modular recombination history of staphylococcus aureus phages. BMC bioinformatics, 14(15):1-9, 2013. Google Scholar
  30. Esko Ukkonen. Finding founder sequences from a set of recombinants. In 2nd International Workshop on Algorithms in Bioinformatics (WABI 2002), Algorithms in Bioinformatics, pages 277-286, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. Google Scholar
  31. Mitchell R Vollger, Xavi Guitart, Philip C Dishuck, Ludovica Mercuri, William T Harvey, Ariel Gershman, Mark Diekhans, Arvis Sulovari, Katherine M Munson, Alexandra P Lewis, Kendra Hoekzema, David Porubsky, Ruiyang Li, Sergey Nurk, Sergey Koren, Karen H Miga, Adam M Phillippy, Winston Timp, Mario Ventura, and Evan E Eichler. Segmental duplications and their variation in a complete human genome. Science, 376(6588):eabj6965, April 2022. Google Scholar
  32. Maria Emilia MT Walter, Zanoni Dias, and Joao Meidanis. Reversal and transposition distance of linear chromosomes. In Proceedings. String Processing and Information Retrieval: A South American Symposium (Cat. No. 98EX207), pages 96-102. IEEE, 1998. Google Scholar
  33. Ting Wang, Lucinda Antonacci-Fulton, Kerstin Howe, Heather A Lawson, Julian K Lucas, Adam M Phillippy, Alice B Popejoy, Mobin Asri, Caryn Carson, Mark J P Chaisson, Xian Chang, Robert Cook-Deegan, Adam L Felsenfeld, Robert S Fulton, Erik P Garrison, Nanibaa' A Garrison, Tina A Graves-Lindsay, Hanlee Ji, Eimear E Kenny, Barbara A Koenig, Daofeng Li, Tobias Marschall, Joshua F McMichael, Adam M Novak, Deepak Purushotham, Valerie A Schneider, Baergen I Schultz, Michael W Smith, Heidi J Sofia, Tsachy Weissman, Paul Flicek, Heng Li, Karen H Miga, Benedict Paten, Erich D Jarvis, Ira M Hall, Evan E Eichler, David Haussler, and Human Pangenome Reference Consortium. The human pangenome project: a global resource to map genomic diversity. Nature, 604(7906):437-446, April 2022. Google Scholar
  34. Ryan R. Wick, Mark B. Schultz, Justin Zobel, and Kathryn E. Holt. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics, 31(20):3350-3352, June 2015. URL: https://doi.org/10.1093/bioinformatics/btv383.
  35. Yufeng Wu and Dan Gusfield. Improved algorithms for inferring the minimum mosaic of a set of recombinants. In Bin Ma and Kaizhong Zhang, editors, Combinatorial Pattern Matching, pages 150-161, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. Google Scholar
  36. Sophia Yancopoulos, Oliver Attie, and Richard Friedberg. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics, 21(16):3340-3346, 2005. URL: https://doi.org/10.1093/bioinformatics/bti535.
  37. Xuefang Zhao, Ryan L Collins, Wan-Ping Lee, Alexandra M Weber, Yukyung Jun, Qihui Zhu, Ben Weisburd, Yongqing Huang, Peter A Audano, Harold Wang, Mark Walker, Chelsea Lowther, Jack Fu, Mark B Gerstein, Scott E Devine, Tobias Marschall, Jan O Korbel, Evan E Eichler, Mark J P Chaisson, Charles Lee, Ryan E Mills, Harrison Brand, and Michael E Talkowski. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am. J. Hum. Genet., March 2021. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail