Telescoping Filter: A Practical Adaptive Filter

Lee, David J.; McCauley, Samuel; Singh, Shikha; Stein, Max

doi:10.4230/LIPIcs.ESA.2021.60

Abstract

Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ∉ S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ε. Correctness is traded for space: filters only use log(1/ε) + O(1) bits per element.
The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ε. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ε, regardless of answers to previous queries. This requires "fixing" false positives as they occur.
Adaptive filters not only provide strong false positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives.
Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds.
In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results where we compare its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput that is similar to non-adaptive filters.
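The difference between a single-query guarantee and adaptivity can be seen in a small sketch. The Python below is a toy illustration only, not the TAF or the broom filter: `ToyFilter`, its linear scan, and its fix-up rule (lengthening the colliding fingerprint after re-reading the key from the remote set) are assumptions made for this example, and the structure makes no attempt at the space bounds discussed above.

```python
import hashlib

HASH_BITS = 64  # width of the full hash that fingerprints are cut from


def full_hash(key: str) -> int:
    """Deterministic 64-bit hash of a key."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def low_bits(value: int, length: int) -> int:
    """Keep only the `length` low-order bits of `value`."""
    return value & ((1 << length) - 1)


class ToyFilter:
    """Hypothetical fingerprint filter; adaptive=True fixes false positives."""

    def __init__(self, items, bits, adaptive):
        self.adaptive = adaptive
        self.remote = list(items)          # stand-in for the expensive remote set S
        self.remote_set = set(self.remote)
        # One (fingerprint, fingerprint length) pair per stored element.
        self.entries = [(low_bits(full_hash(x), bits), bits) for x in self.remote]

    def contains(self, x):
        """Filter answer: True means "possibly present"."""
        hx = full_hash(x)
        return any(fp == low_bits(hx, length) for fp, length in self.entries)

    def is_false_positive(self, x):
        """Query x, consult the remote set on "present", and report a false positive."""
        if not self.contains(x) or x in self.remote_set:
            return False
        if self.adaptive:
            self._fix(x)
        return True

    def _fix(self, x):
        """Lengthen every fingerprint that collides with x until it no longer matches."""
        hx = full_hash(x)
        for i, (fp, length) in enumerate(self.entries):
            if fp != low_bits(hx, length):
                continue
            stored = full_hash(self.remote[i])  # re-read the key from the remote set
            while length < HASH_BITS and low_bits(stored, length) == low_bits(hx, length):
                length += 1
            self.entries[i] = (low_bits(stored, length), length)


def first_false_positive(filt, limit=100_000):
    """Scan hypothetical query keys until the filter wrongly answers "present"."""
    for i in range(limit):
        q = f"query{i}"
        if q not in filt.remote_set and filt.contains(q):
            return q
    raise RuntimeError("no false positive found within the limit")


def count_repeated_false_positives(adaptive, repeats=10):
    """Query the same false-positive key `repeats` times and count the errors."""
    filt = ToyFilter([f"item{i}" for i in range(100)], bits=12, adaptive=adaptive)
    q = first_false_positive(filt)
    return sum(filt.is_false_positive(q) for _ in range(repeats))
```

With `adaptive=False`, every repeat of the same negative query is a false positive, so the count equals `repeats`. With `adaptive=True`, the first false positive triggers the fix and the count is 1, while stored elements are still reported as present because each lengthened fingerprint remains a prefix of its own key's hash.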

