Fast and Simple Compact Hashing via Bucketing

Köppl, Dominik; Puglisi, Simon J.; Raman, Rajeev

doi:10.4230/LIPIcs.SEA.2020.7

File

LIPIcs.SEA.2020.7.pdf

Filesize: 0.52 MB
14 pages

Document Identifiers

DOI: 10.4230/LIPIcs.SEA.2020.7
URN: urn:nbn:de:0030-drops-120817

Author Details

Dominik Köppl

Department of Informatics, Kyushu University, Japan Society for Promotion of Science (JSPS), Fukuoka, Japan

Simon J. Puglisi

Helsinki Institute for Information Technology, Espoo, Finland
Department of Computer Science, University of Helsinki, Finland

Rajeev Raman

School of Informatics, University of Leicester, United Kingdom

Acknowledgements

We thank Marvin Löbel for the implementations of the Bonsai tables with the additional support of a sparse table layout.

Cite AsGet BibTex

Dominik Köppl, Simon J. Puglisi, and Rajeev Raman. Fast and Simple Compact Hashing via Bucketing. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SEA.2020.7

Abstract

Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0,…,u-1}, and the values are v-bit integers, in close to B(u, n) + nv bits of space, where {b(u, n)} = log₂ binom(u,n) is the information-theoretic lower bound for representing the set of keys in S, and support operations insert, delete and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones have been implemented and used in a number of applications. However, the wins on space usage of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based upon a simple idea of bucketing practically outperform existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also in terms of construction time). A related notion is that of a compact Hash ID map, which stores a set Ŝ of n keys from U, and implicitly associates each key in Ŝ with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n), and supports inserts and lookups on Ŝ, while using close to B(u,n) bits. One of our approaches is suitable for use as a compact Hash ID map.

Subject Classification

ACM Subject Classification

Theory of computation → Sorting and searching

Keywords

compact hashing
hash table
separate chaining

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Yuriy Arbitman, Moni Naor, and Gil Segev. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In Proc. FOCS, pages 787-796, 2010.
Daniel K. Blandford and Guy E. Blelloch. Compact representations of ordered sets. In Proc. SODA, pages 11-19, 2004.
Daniel K. Blandford and Guy E. Blelloch. Compact dictionaries for variable-length keys and data with applications. ACM Trans. Algorithms, 4(2):17:1-17:25, 2008.
Andrej Brodnik and J. Ian Munro. Membership in constant time and almost-minimum space. SIAM J. Comput., 28(5):1627-1640, 1999.
Paolo Carlini, Phil Edwards, Doug Gregor, Benjamin Kosnik, Dhruv Matani, Jason Merrill, Mark Mitchell, Nathan Myers, Felix Natter, Stefan Olsson, Silvius Rus, Johannes Singler, Ami Tavory, and Jonathan Wakely. The GNU C++ Library Manual. FSF, 2018.
John G. Cleary. Compact hash tables using bidirectional linear probing. IEEE Trans. Computers, 33(9):828-834, 1984.
John J. Darragh, John G. Cleary, and Ian H. Witten. Bonsai: a compact representation of trees. Softw., Pract. Exper., 23(3):277-291, 1993.
Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Patrascu. De dictionariis dynamicis pauco spatio utentibus (lat. on dynamic dictionaries using little space). In Proc. LATIN, volume 3887 of LNCS, pages 349-361, 2006.
Johannes Fischer and Dominik Köppl. Practical evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch tries. In Proc. SPIRE, volume 10508 of LNCS, pages 191-207, 2017.
Dimitris Fotakis, Rasmus Pagh, Peter Sanders, and Paul G. Spirakis. Space efficient hash tables with worst case constant access time. Theory Comput. Syst., 38(2):229-248, 2005.
Shunsuke Kanda, Dominik Köppl, Yasuo Tabei, Kazuhiro Morita, and Masao Fuketa. Dynamic path-decomposed tries. arXiv 1906.06015, 2019.
Tobias Maier, Peter Sanders, and Stefan Walzer. Dynamic space efficient hashing. Algorithmica, 81(8):3162-3185, 2019.
Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, 1997.
Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM J. Comput., 31(2):353-363, 2001.
Andreas Poyias, Simon J. Puglisi, and Rajeev Raman. Compact dynamic rewritable (CDRW) arrays. In Proc. ALENEX, pages 109-119, 2017.
Andreas Poyias, Simon J. Puglisi, and Rajeev Raman. m-Bonsai: A practical compact dynamic trie. Int. J. Found. Comput. Sci., 29(8):1257-1278, 2018.
Martin Raab and Angelika Steger. "balls into bins" - A simple and tight analysis. In Proc. RANDOM, volume 1518 of LNCS, pages 159-170, 1998.
Rajeev Raman and S. Srinivasa Rao. Succinct dynamic dictionaries and trees. In Proc. ICALP, volume 2719 of LNCS, pages 357-368, 2003.
John T. Robinson and Murthy V. Devarakonda. Data cache management using frequency-based replacement. In Proc. SIGMETRICS, pages 134-142, 1990.
Kenneth A. Ross. Efficient hash probes on modern processors. In Proc. ICDE, pages 1297-1301, 2007.
Guy L. Steele Jr., Doug Lea, and Christine H. Flood. Fast splittable pseudorandom number generators. In Proc. OOPSLA, pages 453-472, 2014.

Fast and Simple Compact Hashing via Bucketing

Authors Dominik Köppl , Simon J. Puglisi , Rajeev Raman

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Fast and Simple Compact Hashing via Bucketing

Authors Dominik Köppl , Simon J. Puglisi , Rajeev Raman

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

Supplementary Materials

References

Thanks for your feedback!

Could not send message