Towards an Analysis of Quadratic Probing

Authors: William Kuszmaul, Zoe Xi



File

LIPIcs.ICALP.2024.103.pdf
  • Filesize: 0.79 MB
  • 19 pages

Author Details

William Kuszmaul
  • Harvard University, Cambridge, MA, USA
Zoe Xi
  • Massachusetts Institute of Technology, Cambridge, MA, USA

Cite As

William Kuszmaul and Zoe Xi. Towards an Analysis of Quadratic Probing. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 103:1-103:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICALP.2024.103

Abstract

Since 1968, one of the simplest open questions in the theory of hash tables has been to prove anything nontrivial about the correctness of quadratic probing. We make the first tangible progress towards this goal, showing that there exists a positive-constant load factor at which quadratic probing is a constant-expected-time hash table. Our analysis applies more generally to any fixed-offset open-addressing hash table, and extends to higher load factors in the case where the hash table examines blocks of some size B = ω(1).
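
For readers unfamiliar with the scheme, the following is a minimal Python sketch of a fixed-offset open-addressing table, with quadratic probing as the default. It is an illustration only, not the construction analyzed in the paper. It assumes the common textbook convention that the i-th probe examines slot (h(x) + i^2) mod n, and that "fixed-offset" means the offset sequence is the same for every key. The names `FixedOffsetTable` and `quadratic_offset` are hypothetical, and deletions and resizing are omitted.

```python
from typing import Callable, Hashable, Iterator, List, Optional


def quadratic_offset(i: int) -> int:
    """Offset of the i-th probe (assumed textbook convention: i squared)."""
    return i * i


class FixedOffsetTable:
    """Open-addressing table whose i-th probe is (h(x) + offset(i)) mod n.

    The offset function is fixed in advance and does not depend on the key,
    so linear probing (offset(i) = i) and quadratic probing
    (offset(i) = i * i) are both instances.
    """

    def __init__(self, capacity: int,
                 offset: Callable[[int], int] = quadratic_offset) -> None:
        self.slots: List[Optional[Hashable]] = [None] * capacity
        self.offset = offset

    def _probe(self, key: Hashable) -> Iterator[int]:
        # Yield the first n positions of the key's probe sequence.
        n = len(self.slots)
        start = hash(key) % n
        for i in range(n):
            yield (start + self.offset(i)) % n

    def insert(self, key: Hashable) -> int:
        """Insert key and return the number of slots examined."""
        probes = 0
        for pos in self._probe(key):
            probes += 1
            if self.slots[pos] is None:
                self.slots[pos] = key
                return probes
            if self.slots[pos] == key:
                return probes  # already present
        # Quadratic offsets need not visit every slot, so this can happen
        # even when the table is not full.
        raise RuntimeError("probe sequence exhausted without a free slot")

    def contains(self, key: Hashable) -> bool:
        """Follow the probe sequence until the key or an empty slot is found."""
        for pos in self._probe(key):
            if self.slots[pos] is None:
                return False
            if self.slots[pos] == key:
                return True
        return False
```

In this sketch the value returned by `insert` is the probe count, which is the quantity whose expectation a constant-expected-time guarantee bounds. Passing `offset=lambda i: i` gives linear probing under the same interface.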

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • quadratic probing
  • hashing
  • open addressing
  • witness trees
