Alignment- and Reference-Free Phylogenomics with Colored de Bruijn Graphs

Author: Roland Wittler

  • 14 pages

Author Details

Roland Wittler
  • Genome Informatics, Faculty of Technology, Bielefeld University, Germany
  • Center for Biotechnology, Bielefeld University, Germany


Acknowledgements

I thank Guillaume Holley for support on Bifrost, Nina Luhmann for pointers to data sets, and Andreas Rempel for programming assistance.

Cite As

Roland Wittler. Alignment- and Reference-Free Phylogenomics with Colored de Bruijn Graphs. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

We present a new whole-genome-based approach to infer large-scale phylogenies that is both alignment- and reference-free. In contrast to other methods, it does not rely on pairwise comparisons to determine distances from which the edges of a tree are inferred. Instead, a colored de Bruijn graph is constructed, and information on subsequences shared among the genomes is extracted to infer phylogenetic splits. Application to different datasets confirms the robustness of the approach. A comparison with other state-of-the-art whole-genome-based methods indicates comparable or higher accuracy and efficiency.
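The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of reading splits off a colored de Bruijn graph, under simplifying assumptions that are ours and not necessarily the paper's: a plain dictionary from canonical k-mers to the set of genomes containing them ("colors") stands in for a real (compacted) colored de Bruijn graph; each k-mer's color set is read as a split of the taxa (color set versus its complement); a split's weight is simply the number of k-mers supporting it; and splits are filtered greedily by descending weight so that the retained set is pairwise compatible and therefore corresponds to a tree. All function names (`canonical`, `kmer_colors`, `split_weights`, `compatible`, `greedy_tree_splits`) are hypothetical.

```python
from collections import Counter


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return min(kmer, rc)


def kmer_colors(genomes: dict[str, str], k: int) -> dict[str, frozenset[str]]:
    """Map each canonical k-mer to the set of genome names ("colors") containing it.

    Simplified stand-in for a colored de Bruijn graph: only the node-to-color
    mapping is kept, edges and compaction are ignored.
    """
    colors: dict[str, set[str]] = {}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors.setdefault(canonical(seq[i:i + k]), set()).add(name)
    return {km: frozenset(c) for km, c in colors.items()}


def split_weights(colors: dict[str, frozenset[str]], taxa: frozenset[str]) -> Counter:
    """Count how many k-mers support each non-trivial split (color set | complement)."""
    weights: Counter = Counter()
    ref = min(taxa)  # fixed reference taxon used to pick one side of each split
    for c in colors.values():
        if 2 <= len(c) <= len(taxa) - 2:  # skip trivial splits (one taxon vs. rest, or all taxa)
            side = c if ref in c else taxa - c  # {A,B}|{C,D} and {C,D}|{A,B} map to one key
            weights[side] += 1
    return weights


def compatible(s1: frozenset[str], s2: frozenset[str], taxa: frozenset[str]) -> bool:
    """Two splits are compatible if at least one of the four side intersections is empty."""
    c1, c2 = taxa - s1, taxa - s2
    return not (s1 & s2) or not (s1 & c2) or not (c1 & s2) or not (c1 & c2)


def greedy_tree_splits(weights: Counter, taxa: frozenset[str]) -> list[tuple[frozenset[str], int]]:
    """Keep splits by descending weight if compatible with all splits kept so far."""
    chosen: list[tuple[frozenset[str], int]] = []
    for split, w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        if all(compatible(split, s, taxa) for s, _ in chosen):
            chosen.append((split, w))
    return chosen


if __name__ == "__main__":
    # Hypothetical toy genomes; real input would be whole genomes (assemblies or reads).
    genomes = {
        "A": "ACGTTGCAAGGCTTAC",
        "B": "ACGTTGCAAGGCTTAG",
        "C": "ACGATGCTAGGCATAC",
        "D": "ACGATGCTAGGCATAG",
    }
    taxa = frozenset(genomes)
    weights = split_weights(kmer_colors(genomes, k=5), taxa)
    for split, w in greedy_tree_splits(weights, taxa):
        print(sorted(split), "|", sorted(taxa - split), "weight", w)
```

Because a set of pairwise compatible splits always corresponds to a tree, the greedy filter yields a tree directly, without computing any pairwise distances. The greedy choice is a simple heuristic and is not guaranteed to select the maximum-weight compatible subset.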

ACM Subject Classification
  • Applied computing → Bioinformatics
  • Applied computing → Molecular sequence analysis

Keywords
  • Phylogenomics
  • phylogenetics
  • phylogenetic splits
  • colored de Bruijn graphs



