WGSUniFrac: Applying UniFrac Metric to Whole Genome Shotgun Data

Authors Wei Wei, David Koslicki

Author Details

Wei Wei
  • The Pennsylvania State University, University Park, PA, USA
David Koslicki
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
  • Department of Biology, The Pennsylvania State University, University Park, PA, USA
  • Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA

Cite As

Wei Wei and David Koslicki. WGSUniFrac: Applying UniFrac Metric to Whole Genome Shotgun Data. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 15:1-15:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


The UniFrac metric has proven useful in revealing diversity across metagenomic communities. Because the measurement is phylogeny-based, UniFrac has historically been applied only to 16S rRNA data. Meanwhile, Whole Genome Shotgun (WGS) metagenomics has become increasingly widely employed and has proven to provide more information than 16S data, but a UniFrac-like diversity metric suitable for WGS data has not previously been developed. The main obstacle to applying UniFrac directly to WGS data is the absence of phylogenetic distances in the taxonomic relationships derived from WGS data. In this study, we demonstrate a method to overcome this intrinsic difference and compute the UniFrac metric on WGS data by assigning branch lengths to the taxonomic tree obtained from input taxonomic profiles. We conduct a series of experiments to demonstrate that this WGSUniFrac method is comparably robust to traditional 16S UniFrac and is not highly sensitive to branch length assignments, be they data-derived or model-prescribed.
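
The idea of computing a UniFrac-style distance on a taxonomic tree with assigned branch lengths can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes weighted UniFrac in its earth mover's distance form (each edge contributes its length times the absolute difference in abundance beneath it), and the function name `weighted_unifrac`, the toy taxonomy, and the unit branch lengths are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def weighted_unifrac(parent, branch_length, p, q):
    """Weighted UniFrac between two abundance profiles on a rooted tree.

    parent:        dict mapping each node to its parent (root maps to None)
    branch_length: dict mapping each non-root node to the length of the
                   edge joining it to its parent (assigned, not measured)
    p, q:          dicts mapping nodes to relative abundances
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    # Net abundance difference placed directly on each node.
    diff = defaultdict(float)
    for node in parent:
        diff[node] = p.get(node, 0.0) - q.get(node, 0.0)

    total = 0.0
    # Visit nodes deepest first so each subtree total is complete
    # before it is pushed up to the parent.
    for node in sorted(parent, key=depth, reverse=True):
        par = parent[node]
        if par is None:
            continue  # the root has no edge above it
        total += branch_length[node] * abs(diff[node])
        diff[par] += diff[node]
    return total


# Hypothetical toy taxonomy: root -> two genera -> three species.
parent = {
    "root": None,
    "genusA": "root", "genusB": "root",
    "sp1": "genusA", "sp2": "genusA", "sp3": "genusB",
}
# Assumption: every edge gets length 1; other assignments plug in the same way.
branch_length = {node: 1.0 for node in parent if parent[node] is not None}

p = {"sp1": 0.5, "sp2": 0.5}
q = {"sp1": 0.5, "sp3": 0.5}
distance = weighted_unifrac(parent, branch_length, p, q)
```

In this toy case the two profiles differ by half of the abundance sitting in `sp2` versus `sp3`, which are four unit-length edges apart, so the distance is 0.5 × 4 = 2.0. Swapping in a different `branch_length` dictionary is the only change needed to try another branch length assignment.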

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Applied computing → Bioinformatics
  • Applied computing → Computational genomics

Keywords
  • UniFrac
  • beta-diversity
  • Whole Genome Shotgun
  • microbial community similarity


