1 Search Results for "Medjedovic, Dzejla"


Document
Buffered Count-Min Sketch on SSD: Theory and Experiments

Authors: Mayank Goswami, Dzejla Medjedovic, Emina Mekic, and Prashant Pandey

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)


Abstract
Frequency estimation data structures such as the count-min sketch (CMS) have found numerous applications in databases, networking, computational biology and other domains. Many applications that use the count-min sketch process massive and rapidly evolving data sets. For data-intensive applications that aim to keep the overestimate error low, the count-min sketch becomes too large to store in available RAM and may have to migrate to external storage (e.g., SSD.) Due to the random-read/write nature of hash operations of the count-min sketch, simply placing it on SSD stifles the performance of time-critical applications, requiring about 4-6 random reads/writes to SSD per estimate (lookup) and update (insert) operation. In this paper, we expand on the preliminary idea of the buffered count-min sketch (BCMS) {[Eydi et al., 2017]}, an SSD variant of the count-min sketch, that uses hash localization to scale efficiently out of RAM while keeping the total error bounded. We describe the design and implementation of the buffered count-min sketch, and empirically show that our implementation achieves 3.7 x-4.7 x speedup on update and 4.3 x speedup on estimate operations compared to the traditional count-min sketch on SSD. Our design also offers an asymptotic improvement in the external-memory model over the original data structure: r random I/Os are reduced to 1 I/O for the estimate operation. For a data structure that uses k blocks on SSD, w as the word/counter size, r as the number of rows, M as the number of bits in the main memory, our data structure uses kwr/M amortized I/Os for updates, or, if kwr/M > 1, 1 I/O in the worst case. In typical scenarios, kwr/M is much smaller than 1. This is in contrast to O(r) I/Os incurred for each update in the original data structure. Lastly, we mathematically show that for the buffered count-min sketch, the error rate does not substantially degrade over the traditional count-min sketch. Specifically, we prove that for any query q, our data structure provides the guarantee: Pr[Error(q) >= n epsilon (1+o(1))] <= delta + o(1), which, up to o(1) terms, is the same guarantee as that of a traditional count-min sketch.

Cite as

Mayank Goswami, Dzejla Medjedovic, Emina Mekic, and Prashant Pandey. Buffered Count-Min Sketch on SSD: Theory and Experiments. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{goswami_et_al:LIPIcs.ESA.2018.41,
  author =	{Goswami, Mayank and Medjedovic, Dzejla and Mekic, Emina and Pandey, Prashant},
  title =	{{Buffered Count-Min Sketch on SSD: Theory and Experiments}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{41:1--41:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.41},
  URN =		{urn:nbn:de:0030-drops-95042},
  doi =		{10.4230/LIPIcs.ESA.2018.41},
  annote =	{Keywords: Streaming model, Count-min sketch, Counting, Frequency, External memory, I/O efficiency, Bloom filter, Counting filter, Quotient filter}
}
  • Refine by Author
  • 1 Goswami, Mayank
  • 1 Medjedovic, Dzejla
  • 1 Mekic, Emina
  • 1 Pandey, Prashant

  • Refine by Classification
  • 1 Theory of computation → Data structures and algorithms for data management
  • 1 Theory of computation → Database query processing and optimization (theory)
  • 1 Theory of computation → Streaming models

  • Refine by Keyword
  • 1 Bloom filter
  • 1 Count-min sketch
  • 1 Counting
  • 1 Counting filter
  • 1 External memory
  • Show More...

  • Refine by Type
  • 1 document

  • Refine by Publication Year
  • 1 2018

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail