GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs

Authors: Vijini G. Mallawaarachchi, Anuradha S. Wickramarachchi, Yu Lin



Author Details

Vijini G. Mallawaarachchi
  • Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, Australia
Anuradha S. Wickramarachchi
  • Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, Australia
Yu Lin
  • Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra, Australia

Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments. Furthermore, this research was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI Australia), an NCRIS enabled capability supported by the Australian Government.

Cite As

Vijini G. Mallawaarachchi, Anuradha S. Wickramarachchi, and Yu Lin. GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 8:1-8:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.WABI.2020.8

Abstract

Metagenomic sequencing allows us to study structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for contig binning only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins.
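
To make the idea of overlapped binning concrete, the toy sketch below shows one way connectivity and coverage could be combined. It is an illustration under stated assumptions, not GraphBin2's actual algorithm: the function name `infer_shared_bins`, the `tolerance` parameter, and the rule that a shared contig's coverage is close to the sum of the average coverages of the bins adjacent to it are all hypothetical choices made for this example.

```python
from itertools import combinations


def infer_shared_bins(contig, graph, coverage, bins, tolerance=0.2):
    """Return the set of bins a contig might belong to (toy heuristic).

    Assumption (hypothetical, not taken from the paper): a contig shared
    by several species has coverage close to the sum of the average
    coverages of the bins found among its assembly-graph neighbours.
    """
    # Group the coverage of already-binned neighbours by bin label.
    neighbour_cov = {}
    for nb in graph.get(contig, ()):
        label = bins.get(nb)
        if label is not None:
            neighbour_cov.setdefault(label, []).append(coverage[nb])
    avg = {b: sum(v) / len(v) for b, v in neighbour_cov.items()}

    # Search for two or more neighbouring bins whose summed coverage
    # matches the contig's coverage within the relative tolerance.
    best, best_err = None, tolerance
    for k in range(2, len(avg) + 1):
        for combo in combinations(sorted(avg), k):
            total = sum(avg[b] for b in combo)
            if total == 0:
                continue
            err = abs(coverage[contig] - total) / total
            if err <= best_err:
                best, best_err = set(combo), err
    if best is not None:
        return best

    # No convincing combination: keep the existing single label, if any.
    label = bins.get(contig)
    return {label} if label is not None else set()


# Tiny made-up example: c3 sits between a contig of bin A and one of bin B.
graph = {"c3": ["c1", "c2"], "c4": ["c1", "c2"]}
coverage = {"c1": 30.0, "c2": 50.0, "c3": 82.0, "c4": 31.0}
bins = {"c1": "A", "c2": "B", "c4": "A"}
shared = infer_shared_bins("c3", graph, coverage, bins)
single = infer_shared_bins("c4", graph, coverage, bins)
```

In this made-up input, `c3` has coverage near the sum of its two neighbouring bins and is assigned to both, while `c4` keeps its single existing label. A real tool would also need to handle unbinned neighbours, noisy coverage estimates, and longer paths in the graph.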

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
  • Applied computing → Computational genomics
Keywords
  • Metagenomics binning
  • contigs
  • assembly graphs
  • overlapped binning
