Disentangled Long-Read De Bruijn Graphs via Optical Maps

Authors: Bahar Alipanahi, Leena Salmela, Simon J. Puglisi, Martin Muggli, Christina Boucher



Cite As

Bahar Alipanahi, Leena Salmela, Simon J. Puglisi, Martin Muggli, and Christina Boucher. Disentangled Long-Read De Bruijn Graphs via Optical Maps. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 1:1-1:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.WABI.2017.1

Abstract

While long reads produced by third-generation sequencing technology from, e.g., Pacific Biosciences have been shown to increase the quality of draft genomes in repetitive regions, fundamental computational challenges remain in overcoming their high error rate and assembling them efficiently. In this paper we show that the de Bruijn graph built on the long reads can be efficiently and substantially disentangled using optical mapping data as auxiliary information. Fundamental to our approach is the use of the positional de Bruijn graph and a succinct data structure for constructing and traversing this graph. Our experimental results show that over 97.7% of directed cycles are removed from the resulting positional de Bruijn graph compared to its non-positional counterpart. Our results thus indicate that disentangling the de Bruijn graph using positional information is a promising direction for developing a simple and efficient assembly algorithm for long reads.
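
The idea of disentangling by position can be illustrated with a toy sketch. It is not the paper's implementation: the paper uses a succinct data structure and derives positions from optical mapping data, whereas the sketch below assumes each read already comes with a hypothetical approximate start offset, and it approximates "nearby positions" by fixed-width buckets. All function names and the `bucket` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def build_plain(reads, k):
    """Ordinary de Bruijn graph: nodes are k-mers, positions are ignored."""
    edges = defaultdict(set)
    for read, _start in reads:
        for i in range(len(read) - k):
            edges[read[i:i + k]].add(read[i + 1:i + k + 1])
    return edges


def build_positional(reads, k, bucket):
    """Toy positional de Bruijn graph: nodes are (k-mer, position bucket).

    Assumption: bucketing stands in for merging k-mers whose approximate
    positions are close; it is a simplification chosen for brevity.
    """
    edges = defaultdict(set)
    for read, start in reads:
        for i in range(len(read) - k):
            u = (read[i:i + k], (start + i) // bucket)
            v = (read[i + 1:i + k + 1], (start + i + 1) // bucket)
            edges[u].add(v)
    return edges


def has_cycle(edges):
    """Detect a directed cycle with an iterative three-colour DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE
    for root in list(edges):
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(edges.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                color[node] = BLACK  # all successors explored
                stack.pop()
            elif color[nxt] == GRAY:
                return True  # back edge: nxt is on the current DFS path
            elif color[nxt] == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(edges.get(nxt, ()))))
    return False


# Two overlapping reads from the toy sequence ATGCATGCA, which contains
# the repeat ATGC. The start offsets are hypothetical inputs.
reads = [("ATGCATG", 0), ("CATGCA", 3)]
plain_graph = build_plain(reads, k=3)
positional_graph = build_positional(reads, k=3, bucket=2)
plain_has_cycle = has_cycle(plain_graph)
positional_has_cycle = has_cycle(positional_graph)
```

In this toy input the repeated k-mer ATG collapses to a single node in the plain graph and closes a directed cycle, while in the positional variant its two occurrences fall into different position buckets and remain separate nodes.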

Keywords

  • Positional de Bruijn graph
  • Genome Assembly
  • Long Read Data
  • Optical maps
