Fast and Simple Compact Hashing via Bucketing

Authors Dominik Köppl , Simon J. Puglisi , Rajeev Raman



PDF
Thumbnail PDF

File

LIPIcs.SEA.2020.7.pdf
  • Filesize: 0.52 MB
  • 14 pages

Document Identifiers

Author Details

Dominik Köppl
  • Department of Informatics, Kyushu University, Japan Society for Promotion of Science (JSPS), Fukuoka, Japan
Simon J. Puglisi
  • Helsinki Institute for Information Technology, Espoo, Finland
  • Department of Computer Science, University of Helsinki, Finland
Rajeev Raman
  • School of Informatics, University of Leicester, United Kingdom

Acknowledgements

We thank Marvin Löbel for the implementations of the Bonsai tables with the additional support of a sparse table layout.

Cite AsGet BibTex

Dominik Köppl, Simon J. Puglisi, and Rajeev Raman. Fast and Simple Compact Hashing via Bucketing. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.SEA.2020.7

Abstract

Compact hash tables store a set S of n key-value pairs, where the keys are from the universe U = {0,…,u-1}, and the values are v-bit integers, in close to B(u, n) + nv bits of space, where {b(u, n)} = log₂ binom(u,n) is the information-theoretic lower bound for representing the set of keys in S, and support operations insert, delete and lookup on S. Compact hash tables have received significant attention in recent years, and approaches dating back to Cleary [IEEE T. Comput, 1984], as well as more recent ones have been implemented and used in a number of applications. However, the wins on space usage of these approaches are outweighed by their slowness relative to conventional hash tables. In this paper, we demonstrate that compact hash tables based upon a simple idea of bucketing practically outperform existing compact hash table implementations in terms of memory usage and construction time, and existing fast hash table implementations in terms of memory usage (and sometimes also in terms of construction time). A related notion is that of a compact Hash ID map, which stores a set Ŝ of n keys from U, and implicitly associates each key in Ŝ with a unique value (its ID), chosen by the data structure itself, which is an integer of magnitude O(n), and supports inserts and lookups on Ŝ, while using close to B(u,n) bits. One of our approaches is suitable for use as a compact Hash ID map.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • compact hashing
  • hash table
  • separate chaining

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Yuriy Arbitman, Moni Naor, and Gil Segev. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In Proc. FOCS, pages 787-796, 2010. Google Scholar
  2. Daniel K. Blandford and Guy E. Blelloch. Compact representations of ordered sets. In Proc. SODA, pages 11-19, 2004. Google Scholar
  3. Daniel K. Blandford and Guy E. Blelloch. Compact dictionaries for variable-length keys and data with applications. ACM Trans. Algorithms, 4(2):17:1-17:25, 2008. Google Scholar
  4. Andrej Brodnik and J. Ian Munro. Membership in constant time and almost-minimum space. SIAM J. Comput., 28(5):1627-1640, 1999. Google Scholar
  5. Paolo Carlini, Phil Edwards, Doug Gregor, Benjamin Kosnik, Dhruv Matani, Jason Merrill, Mark Mitchell, Nathan Myers, Felix Natter, Stefan Olsson, Silvius Rus, Johannes Singler, Ami Tavory, and Jonathan Wakely. The GNU C++ Library Manual. FSF, 2018. Google Scholar
  6. John G. Cleary. Compact hash tables using bidirectional linear probing. IEEE Trans. Computers, 33(9):828-834, 1984. Google Scholar
  7. John J. Darragh, John G. Cleary, and Ian H. Witten. Bonsai: a compact representation of trees. Softw., Pract. Exper., 23(3):277-291, 1993. Google Scholar
  8. Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Patrascu. De dictionariis dynamicis pauco spatio utentibus (lat. on dynamic dictionaries using little space). In Proc. LATIN, volume 3887 of LNCS, pages 349-361, 2006. Google Scholar
  9. Johannes Fischer and Dominik Köppl. Practical evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch tries. In Proc. SPIRE, volume 10508 of LNCS, pages 191-207, 2017. Google Scholar
  10. Dimitris Fotakis, Rasmus Pagh, Peter Sanders, and Paul G. Spirakis. Space efficient hash tables with worst case constant access time. Theory Comput. Syst., 38(2):229-248, 2005. Google Scholar
  11. Shunsuke Kanda, Dominik Köppl, Yasuo Tabei, Kazuhiro Morita, and Masao Fuketa. Dynamic path-decomposed tries. arXiv 1906.06015, 2019. Google Scholar
  12. Tobias Maier, Peter Sanders, and Stefan Walzer. Dynamic space efficient hashing. Algorithmica, 81(8):3162-3185, 2019. Google Scholar
  13. Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, 1997. Google Scholar
  14. Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM J. Comput., 31(2):353-363, 2001. Google Scholar
  15. Andreas Poyias, Simon J. Puglisi, and Rajeev Raman. Compact dynamic rewritable (CDRW) arrays. In Proc. ALENEX, pages 109-119, 2017. Google Scholar
  16. Andreas Poyias, Simon J. Puglisi, and Rajeev Raman. m-Bonsai: A practical compact dynamic trie. Int. J. Found. Comput. Sci., 29(8):1257-1278, 2018. Google Scholar
  17. Martin Raab and Angelika Steger. "balls into bins" - A simple and tight analysis. In Proc. RANDOM, volume 1518 of LNCS, pages 159-170, 1998. Google Scholar
  18. Rajeev Raman and S. Srinivasa Rao. Succinct dynamic dictionaries and trees. In Proc. ICALP, volume 2719 of LNCS, pages 357-368, 2003. Google Scholar
  19. John T. Robinson and Murthy V. Devarakonda. Data cache management using frequency-based replacement. In Proc. SIGMETRICS, pages 134-142, 1990. Google Scholar
  20. Kenneth A. Ross. Efficient hash probes on modern processors. In Proc. ICDE, pages 1297-1301, 2007. Google Scholar
  21. Guy L. Steele Jr., Doug Lea, and Christine H. Flood. Fast splittable pseudorandom number generators. In Proc. OOPSLA, pages 453-472, 2014. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail