2 Search Results for "Pandey, Prashant"


Document
Buffered Count-Min Sketch on SSD: Theory and Experiments

Authors: Mayank Goswami, Dzejla Medjedovic, Emina Mekic, and Prashant Pandey

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)


Abstract
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation. In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD. Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M > 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure. Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) >= n epsilon (1+o(1))] <= delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.

Cite as

Mayank Goswami, Dzejla Medjedovic, Emina Mekic, and Prashant Pandey. Buffered Count-Min Sketch on SSD: Theory and Experiments. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{goswami_et_al:LIPIcs.ESA.2018.41,
  author =	{Goswami, Mayank and Medjedovic, Dzejla and Mekic, Emina and Pandey, Prashant},
  title =	{{Buffered Count-Min Sketch on SSD: Theory and Experiments}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{41:1--41:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.41},
  URN =		{urn:nbn:de:0030-drops-95042},
  doi =		{10.4230/LIPIcs.ESA.2018.41},
  annote =	{Keywords: Streaming model, Count-min sketch, Counting, Frequency, External memory, I/O efficiency, Bloom filter, Counting filter, Quotient filter}
}
Document
Rainbowfish: A Succinct Colored de Bruijn Graph Representation

Authors: Fatemeh Almodaresi, Prashant Pandey, and Rob Patro

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)


Abstract
The colored de Bruijn graph— a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors - is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic variants. Muggli et al. introduced VARI, a representation of the colored de Bruijn graph that adopts the BOSS representation for the de Bruijn graph topology and achieves considerable savings in space over Cortex, albeit with some sacrifice in speed. The memory-efficient representation of VARI allows the colored de Bruijn graph to be constructed and analyzed for large datasets, beyond what is possible with Cortex. In this paper, we introduce Rainbowfish, a succinct representation of the color information of the colored de Bruijn graph that reduces the space usage even further. Our representation also uses BOSS to represent the de Bruijn graph, but decomposes the color sets based on an equivalence relation and exploits the inherent skewness in the distribution of these color sets. The Rainbowfish representation is compressed based on the 0th-order entropy of the color sets, which can lead to a significant reduction in the space required to store the relevant information for each edge. In practice, Rainbowfish achieves up to a 20x improvement in space over VARI. Rainbowfish is written in C++11 and is available at https://github.com/COMBINE-lab/rainbowfish.

Cite as

Fatemeh Almodaresi, Prashant Pandey, and Rob Patro. Rainbowfish: A Succinct Colored de Bruijn Graph Representation. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 18:1-18:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{almodaresi_et_al:LIPIcs.WABI.2017.18,
  author =	{Almodaresi, Fatemeh and Pandey, Prashant and Patro, Rob},
  title =	{{Rainbowfish: A Succinct Colored de Bruijn Graph Representation}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{18:1--18:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.18},
  URN =		{urn:nbn:de:0030-drops-76576},
  doi =		{10.4230/LIPIcs.WABI.2017.18},
  annote =	{Keywords: de Bruijn graph, succinct data structures, rank and select operation, colored de Bruijn graph}
}
  • Refine by Author
  • 2 Pandey, Prashant
  • 1 Almodaresi, Fatemeh
  • 1 Goswami, Mayank
  • 1 Medjedovic, Dzejla
  • 1 Mekic, Emina
  • Show More...

  • Refine by Classification
  • 1 Theory of computation → Data structures and algorithms for data management
  • 1 Theory of computation → Database query processing and optimization (theory)
  • 1 Theory of computation → Streaming models

  • Refine by Keyword
  • 1 Bloom filter
  • 1 Count-min sketch
  • 1 Counting
  • 1 Counting filter
  • 1 External memory
  • Show More...

  • Refine by Type
  • 2 document

  • Refine by Publication Year
  • 1 2017
  • 1 2018

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail