License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following:
DOI: 10.4230/LIPIcs.WABI.2021.12
URN: urn:nbn:de:0030-drops-143659
URL: https://drops.dagstuhl.de/opus/volltexte/2021/14365/


Kitaya, Kazushi; Shibuya, Tetsuo

Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition



Abstract

Sets of k-mers are used in many bioinformatics tasks, and much work has been done on methods to efficiently represent or compress a single k-mer set. However, methods for compressing multiple k-mer sets have been less studied, despite their obvious benefits for researchers and maintainers of genome-related databases. This paper proposes an algorithm that compresses multiple k-mer sets by iteratively splitting SPSSs (spectrum-preserving string sets). In experiments with 3292 k-mer sets constructed from E. coli whole-genome sequencing data and 2555 k-mer sets constructed from human RNA-Seq data, the proposed algorithm reduced compressed file sizes by 34.7% and 13.2%, respectively, compared with a state-of-the-art colored de Bruijn graph representation. Our method also used less memory than the colored de Bruijn graph method. This paper also introduces several techniques that make the compression algorithm efficient in time and memory, one of which is a parallelizable algorithm for constructing small-weight SPSSs.
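To make the SPSS idea concrete: an SPSS for a k-mer set K is a collection of strings whose k-mers are exactly K, with each k-mer occurring once, so storing the strings stores the set. Its weight is commonly measured as the total number of characters. The sketch below is a simple greedy construction in the spirit of unitig/simplitig-style path covers. It is not the authors' parallelizable small-weight algorithm or their iterative splitting scheme. For simplicity it assumes a plain DNA alphabet, k >= 2, and no reverse-complement canonicalization. The function names `kmers` and `greedy_spss` are hypothetical.

```python
def kmers(s: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def greedy_spss(kmer_set: set[str], k: int, alphabet: str = "ACGT") -> list[str]:
    """Greedily build a spectrum-preserving string set (SPSS).

    Assumptions (illustrative only): k >= 2, forward strand only
    (reverse complements are not merged), deterministic seed choice.
    Each k-mer of kmer_set ends up in exactly one output string, exactly once.
    """
    assert k >= 2, "this sketch assumes k >= 2"
    remaining = set(kmer_set)
    strings = []
    while remaining:
        seed = min(remaining)  # deterministic seed for reproducibility
        remaining.remove(seed)
        s = seed
        # Extend to the right: appending one character creates exactly one
        # new k-mer (the last k characters), so use it only if still unused.
        while True:
            suffix = s[-(k - 1):]
            nxt = next((suffix + c for c in alphabet if suffix + c in remaining), None)
            if nxt is None:
                break
            remaining.remove(nxt)
            s += nxt[-1]
        # Extend to the left symmetrically.
        while True:
            prefix = s[:k - 1]
            prv = next((c + prefix for c in alphabet if c + prefix in remaining), None)
            if prv is None:
                break
            remaining.remove(prv)
            s = prv[0] + s
        strings.append(s)
    return strings


def weight(spss: list[str]) -> int:
    """SPSS weight measured as total character count."""
    return sum(len(s) for s in spss)


if __name__ == "__main__":
    K = kmers("ACGTACGGA", 3)   # 6 distinct 3-mers (ACG occurs twice in the read)
    spss = greedy_spss(K, 3)
    print(spss, weight(spss))   # one 8-character string covering all 6 k-mers
```

Because every extension step consumes one unused k-mer, a string that covers m k-mers has length m + k - 1. The total weight is therefore |K| + (k - 1) times the number of strings, which is why fewer, longer strings give a smaller-weight SPSS.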

BibTeX - Entry

@InProceedings{kitaya_et_al:LIPIcs.WABI.2021.12,
  author =	{Kitaya, Kazushi and Shibuya, Tetsuo},
  title =	{{Compression of Multiple k-Mer Sets by Iterative SPSS Decomposition}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{12:1--12:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/14365},
  URN =		{urn:nbn:de:0030-drops-143659},
  doi =		{10.4230/LIPIcs.WABI.2021.12},
  annote =	{Keywords: sequencing data, k-mer, de Bruijn graph, compression, colored de Bruijn graph}
}

Keywords: sequencing data, k-mer, de Bruijn graph, compression, colored de Bruijn graph
Collection: 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Issue Date: 2021
Date of publication: 22.07.2021

