
Telescoping Filter: A Practical Adaptive Filter

Authors: David J. Lee, Samuel McCauley, Shikha Singh, Max Stein



Author Details

David J. Lee
  • Cornell University, Ithaca, NY, USA
Samuel McCauley
  • Williams College, Williamstown, MA, USA
Shikha Singh
  • Williams College, Williamstown, MA, USA
Max Stein
  • Williams College, Williamstown, MA, USA

Cite As

David J. Lee, Samuel McCauley, Shikha Singh, and Max Stein. Telescoping Filter: A Practical Adaptive Filter. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 60:1-60:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ESA.2021.60

Abstract

Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ∉ S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ε. Correctness is traded for space: filters only use log (1/ε) + O(1) bits per element. The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ε. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ε, regardless of answers to previous queries. This requires "fixing" false positives as they occur. Adaptive filters not only provide strong false positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives. Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds. In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results where we compare its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput that is similar to non-adaptive filters.
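The repeated-false-positive behavior described in the abstract can be illustrated with a minimal sketch. This is not the telescoping adaptive filter or any filter from the paper. It is a toy non-adaptive fingerprint filter, and the names `fingerprint`, `FingerprintFilter`, and `find_false_positive`, the use of SHA-256 as the hash, and the 8-bit fingerprint size are all assumptions made for illustration. With r-bit fingerprints, a single negative query collides with a given stored fingerprint with probability 2^-r, which is where the roughly log(1/ε) bits per element comes from.

```python
import hashlib


def fingerprint(key: str, bits: int) -> int:
    """Hypothetical helper: map a key to a `bits`-bit fingerprint via SHA-256."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


class FingerprintFilter:
    """Toy non-adaptive filter: keeps only a short fingerprint per element."""

    def __init__(self, items, bits: int):
        self.bits = bits
        self.fps = {fingerprint(x, bits) for x in items}

    def query(self, key: str) -> bool:
        # Never a false negative: every member's fingerprint is stored.
        # May be a false positive: a non-member can share a fingerprint.
        return fingerprint(key, self.bits) in self.fps


def find_false_positive(filt, members, limit=100_000):
    """Search for a non-member that the filter reports as present."""
    for i in range(limit):
        candidate = f"probe-{i}"
        if candidate not in members and filt.query(candidate):
            return candidate
    return None


members = {f"item-{i}" for i in range(64)}
filt = FingerprintFilter(members, bits=8)

fp = find_false_positive(filt, members)
# The filter's state never changes, so the same negative query
# fails again on every repetition.
repeats = [filt.query(fp) for _ in range(5)]
print(fp, repeats)
```

Because nothing in this toy structure changes after a false positive, every entry of `repeats` is `True`: once a false positive is found, it recurs with probability 1. An adaptive filter in the sense of the abstract would instead fix its representation after the first false positive, so that later queries to the same element are again false positives with probability only ε.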

Subject Classification

ACM Subject Classification
  • Theory of computation → Data structures design and analysis
Keywords
  • Filters
  • approximate-membership query data structures (AMQs)
  • Bloom filters
  • quotient filters
  • cuckoo filters
  • adaptivity
  • succinct data structures
