Kermit: Guided Long Read Assembly using Coloured Overlap Graphs

Authors: Riku Walve, Pasi Rastas, Leena Salmela

Author Details

Riku Walve
  • Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
Pasi Rastas
  • Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Leena Salmela
  • Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland

Cite As

Riku Walve, Pasi Rastas, and Leena Salmela. Kermit: Guided Long Read Assembly using Coloured Overlap Graphs. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 11:1-11:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished without short reads at an affordable cost. Due to high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose locations on the genome are approximately known. They therefore provide long-range information that has the potential to greatly aid de novo assembly. Previously, linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist for incorporating linkage maps into assembly; instead, a large amount of manual labour is needed to order the contigs into chromosomes. We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and fewer misassemblies than de novo assembly.
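To make the idea of a cleaning step on a coloured overlap graph more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Kermit implementation: the function name `clean_overlap_graph`, the representation of colours as integer linkage-map bin indices, the `max_gap` parameter, and the rule "keep an edge only if the two reads carry equal or neighbouring colours" are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def clean_overlap_graph(
    edges: List[Edge],
    colours: Dict[str, Set[int]],
    max_gap: int = 1,
) -> List[Edge]:
    """Drop overlap edges whose endpoints have incompatible colours.

    Assumption for this sketch: a colour is the index of a linkage-map
    bin, and two reads are compatible if some pair of their colours
    differs by at most `max_gap`. Reads without any colour carry no
    map information, so their edges are always kept.
    """
    kept: List[Edge] = []
    for u, v in edges:
        cu = colours.get(u, set())
        cv = colours.get(v, set())
        if not cu or not cv:
            # At least one read is uncoloured: no evidence against the edge.
            kept.append((u, v))
        elif any(abs(a - b) <= max_gap for a in cu for b in cv):
            # The reads sit in the same or neighbouring map bins.
            kept.append((u, v))
        # Otherwise the map places the reads far apart: drop the edge.
    return kept


# Toy overlap graph: r4 is coloured with a distant bin, r5 is uncoloured.
edges = [("r1", "r2"), ("r2", "r3"), ("r1", "r4"), ("r3", "r5")]
colours = {"r1": {0}, "r2": {0, 1}, "r3": {1}, "r4": {7}}
cleaned = clean_overlap_graph(edges, colours)
```

In this toy input, the edge between `r1` and `r4` is removed because their colours are far apart on the hypothetical map, while the edge to the uncoloured read `r5` is kept. Pruning edges of this kind is one way a graph could become simpler before contigs are read off it.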

Subject Classification

ACM Subject Classification
  • Applied computing → Sequencing and genotyping technologies
  • Mathematics of computing → Graph theory
Keywords
  • Genome assembly
  • Linkage maps
  • Coloured overlap graph


