AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction

Authors: Adam Cicherski, Anna Lisiecka, Norbert Dojer




File

LIPIcs.WABI.2024.23.pdf
  • Filesize: 1.04 MB
  • 18 pages


Author Details

Adam Cicherski
  • Institute of Informatics, University of Warsaw, Poland
Anna Lisiecka
  • Institute of Informatics, University of Warsaw, Poland
Norbert Dojer
  • Institute of Informatics, University of Warsaw, Poland

Cite As

Adam Cicherski, Anna Lisiecka, and Norbert Dojer. AlfaPang: Alignment Free Algorithm for Pangenome Graph Construction. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 23:1-23:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.WABI.2024.23

Abstract

The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In this paper we present AlfaPang, a new pangenome graph building algorithm. AlfaPang is based on a novel alignment-free approach that allows pangenome graphs to be constructed using significantly fewer computational resources than state-of-the-art tools. The code of AlfaPang is freely available at https://github.com/AdamCicherski/AlfaPang.

Subject Classification

ACM Subject Classification
  • Applied computing → Computational genomics
Keywords
  • pangenome
  • variation graph
  • genome alignment
  • population genomics

