Synteny Paths for Assembly Graphs Comparison

Authors Evgeny Polevikov, Mikhail Kolmogorov



PDF
Thumbnail PDF

File

LIPIcs.WABI.2019.24.pdf
  • Filesize: 0.65 MB
  • 14 pages

Document Identifiers

Author Details

Evgeny Polevikov
  • Bioinformatics Institute, Saint Petersburg, Russia
Mikhail Kolmogorov
  • Department of Computer Science and Engineering, University of California, San Diego, CA, USA

Acknowledgements

We are grateful to Dmitry Antipov, Pavel Avdeyev, Pavel Pevzner and Giulia Guidi for their helpful comments.

Cite AsGet BibTex

Evgeny Polevikov and Mikhail Kolmogorov. Synteny Paths for Assembly Graphs Comparison. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 24:1-24:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.24

Abstract

Despite the recent developments of long-read sequencing technologies, it is still difficult to produce complete assemblies of eukaryotic genomes in an automated fashion. Genome assembly software typically output assembled fragments (contigs) along with assembly graphs, that encode all possible layouts of these contigs. Graph representation of the assembled genome can be useful for gene discovery, haplotyping, structural variations analysis and other applications. To facilitate the development of new graph-based approaches, it is important to develop algorithms for comparison and evaluation of assembly graphs produced by different software. In this work, we introduce synteny paths: maximal paths of homologous sequence between the compared assembly graphs. We describe Asgan - an algorithm for efficient synteny paths decomposition, and use it to evaluate assembly graphs of various bacterial assemblies produced by different approaches. We then apply Asgan to discover structural variations between the assemblies of 15 Drosophila genomes, and show that synteny paths are robust to contig fragmentation. The Asgan tool is freely available at: https://github.com/epolevikov/Asgan.

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • Assembly graphs
  • Genome assembly
  • Synteny blocks
  • Comparative Genomics

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Sergey S Aganezov and Max A Alekseyev. CAMSA: a tool for comparative analysis and merging of scaffold assemblies. BMC bioinformatics, 18(15):496, 2017. Google Scholar
  2. Dmitry Antipov, Anton Korobeynikov, Jeffrey S McLean, and Pavel A Pevzner. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics, 32(7):1009-1015, 2015. Google Scholar
  3. Pavel Avdeyev, Shuai Jiang, Sergey Aganezov, Fei Hu, and Max A Alekseyev. Reconstruction of ancestral genomes in presence of gene gain and loss. Journal of Computational Biology, 23(3):150-164, 2016. Google Scholar
  4. Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A Gurevich, Mikhail Dvorkin, Alexander S Kulikov, Valery M Lesin, Sergey I Nikolenko, Son Pham, Andrey D Prjibelski, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5):455-477, 2012. Google Scholar
  5. Ali Ebrahimpour Boroojeny, Akash Shrestha, Ali Sharifi-Zarchi, Suzanne Renick Gallagher, S Cenk Sahinalp, and Hamidreza Chitsaz. GTED: Graph traversal edit distance. In Research in Computational Molecular Biology, pages 37-53. Springer, 2018. Google Scholar
  6. Drosophila 12 Genomes Consortium et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450(7167):203, 2007. Google Scholar
  7. Jack Edmonds. Paths, trees, and flowers. Canadian Journal of mathematics, 17:449-467, 1965. Google Scholar
  8. Cristina G Ghiurcuta and Bernard ME Moret. Evaluating synteny for improved comparative studies. Bioinformatics, 30(12):i9-i18, 2014. Google Scholar
  9. Zamin Iqbal, Mario Caccamo, Isaac Turner, Paul Flicek, and Gil McVean. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature genetics, 44(2):226, 2012. Google Scholar
  10. Govinda M Kamath, Ilan Shomorony, Fei Xia, Thomas A Courtade, and N Tse David. HINGE: long-read assembly achieves optimal repeat resolution. Genome research, 27(5):747-756, 2017. Google Scholar
  11. W James Kent, Robert Baertsch, Angie Hinrichs, Webb Miller, and David Haussler. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proceedings of the National Academy of Sciences, 100(20):11484-11489, 2003. Google Scholar
  12. Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin, and Pavel A Pevzner. Assembly of long, error-prone reads using repeat graphs. Nature biotechnology, 37(5):540, 2019. Google Scholar
  13. Sergey Koren, Brian P Walenz, Konstantin Berlin, Jason R Miller, Nicholas H Bergman, and Adam M Phillippy. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome research, 27(5):722-736, 2017. Google Scholar
  14. Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10 (8), pages 707-710, 1966. Google Scholar
  15. Heng Li. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18):3094-3100, 2018. Google Scholar
  16. Yu Lin, Sergey Nurk, and Pavel A Pevzner. What is the difference between the breakpoint graph and the de Bruijn graph? BMC genomics, 15(6):S6, 2014. Google Scholar
  17. Dang Liu, Martin Hunt, and Isheng J Tsai. Inferring synteny between genome assemblies: a systematic evaluation. BMC bioinformatics, 19(1):26, 2018. Google Scholar
  18. Serghei Mangul and David Koslicki. Reference-free comparison of microbial communities via de Bruijn graphs. In Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 68-77. ACM, 2016. Google Scholar
  19. Alla Mikheenko and Mikhail Kolmogorov. Assembly Graph Browser: interactive visualization of assembly graphs. Bioinformatics, 2019. Google Scholar
  20. Danny E Miller, Cynthia Staber, Julia Zeitlinger, and R Scott Hawley. Highly contiguous genome assemblies of 15 Drosophila species generated using nanopore sequencing. G3: Genes, Genomes, Genetics, 8(10):3131-3141, 2018. Google Scholar
  21. Ilya Minkin, Anand Patel, Mikhail Kolmogorov, Nikolay Vyahhi, and Son Pham. Sibelia: a scalable and comprehensive synteny block generation tool for closely related microbial genomes. In International Workshop on Algorithms in Bioinformatics, pages 215-229. Springer, 2013. Google Scholar
  22. Supratim Mukherjee, Dimitri Stamatis, Jon Bertsch, Galina Ovchinnikova, Hema Y Katta, Alejandro Mojica, I-Min A Chen, Nikos C Kyrpides, and TBK Reddy. Genomes OnLine database (GOLD) v. 7: updates and new features. Nucleic acids research, 47(D1):D649-D659, 2018. Google Scholar
  23. Eugene W Myers, Granger G Sutton, Art L Delcher, Ian M Dew, Dan P Fasulo, Michael J Flanigan, Saul A Kravitz, Clark M Mobarry, Knut HJ Reinert, Karin A Remington, et al. A whole-genome assembly of Drosophila. Science, 287(5461):2196-2204, 2000. Google Scholar
  24. Pavel Pevzner and Glenn Tesler. Transforming men into mice: the Nadeau-Taylor chromosomal breakage model revisited. In Proceedings of the seventh annual international conference on Research in computational molecular biology, pages 247-256. ACM, 2003. Google Scholar
  25. Pavel A Pevzner, Haixu Tang, and Glenn Tesler. De novo repeat classification and fragment assembly. Genome research, 14(9):1786-1796, 2004. Google Scholar
  26. Pavel A Pevzner, Haixu Tang, and Michael S Waterman. An Eulerian path approach to DNA fragment assembly. Proceedings of the national academy of sciences, 98(17):9748-9753, 2001. Google Scholar
  27. Adam M Phillippy. New advances in sequence assembly. Genome Research, 27:xi-xiii, 2017. Google Scholar
  28. Sebastian Proost, Jan Fostier, Dieter De Witte, Bart Dhoedt, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele. i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets. Nucleic acids research, 40(2):e11-e11, 2011. Google Scholar
  29. Christian Rödelsperger and Christoph Dieterich. CYNTENATOR: progressive gene order alignment of 17 vertebrate genomes. PloS one, 5(1):e8861, 2010. Google Scholar
  30. Yana Safonova, Anton Bankevich, and Pavel A Pevzner. dipSPAdes: assembler for highly polymorphic diploid genomes. Journal of Computational Biology, 22(6):528-545, 2015. Google Scholar
  31. Naruya Saitou and Masatoshi Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, 4(4):406-425, 1987. Google Scholar
  32. Michael C Schatz, Arthur L Delcher, and Steven L Salzberg. Assembly of large genomes using second-generation sequencing. Genome research, 20(9):1165-1173, 2010. Google Scholar
  33. Alex Shlemov and Anton Korobeynikov. PathRacer: racing profile HMM paths on assembly graph. BioRxiv, page 562579, 2019. Google Scholar
  34. Mitchell R Vollger, Philip C Dishuck, Melanie Sorensen, AnneMarie E Welch, Vy Dang, Max L Dougherty, Tina A Graves-Lindsay, Richard K Wilson, Mark JP Chaisson, and Evan E Eichler. Long-read sequence and assembly of segmental duplications. Nature methods, 16(1):88, 2019. Google Scholar
  35. Ryan R Wick, Louise M Judd, Claire L Gorrie, and Kathryn E Holt. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS computational biology, 13(6):e1005595, 2017. Google Scholar
  36. Ryan R Wick, Mark B Schultz, Justin Zobel, and Kathryn E Holt. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics, 31(20):3350-3352, 2015. Google Scholar
  37. Aleksey V Zimin, Douglas R Smith, Granger Sutton, and James A Yorke. Assembly reconciliation. Bioinformatics, 24(1):42-45, 2007. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail