Optimal Hashing in External Memory

Authors Alex Conway, Martín Farach-Colton, Philip Shilane



File

LIPIcs.ICALP.2018.39.pdf
  • Filesize: 0.49 MB
  • 14 pages

Author Details

Alex Conway
  • Rutgers University, New Brunswick, NJ, USA
Martín Farach-Colton
  • Rutgers University, New Brunswick, NJ, USA
Philip Shilane
  • Dell EMC, Newtown, PA, USA

Cite As

Alex Conway, Martín Farach-Colton, and Philip Shilane. Optimal Hashing in External Memory. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 39:1-39:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.39

Abstract

Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Patrascu established an update/query tradeoff curve for external-memory hash tables: a hash table that performs insertions in O(lambda/B) amortized IOs requires Omega(log_lambda N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and lambda is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for lambda that is Omega(log log M + log_M N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of lambda. The simplicity of BOAs allows them to be readily modified to achieve the following results:
  • A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
  • The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of lambda.
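The update/query tradeoff curve stated in the abstract can be made concrete with a small calculation. The following is a minimal sketch, not code from the paper: it evaluates the two expressions lambda/B and log_lambda N with the constants hidden by the asymptotic notation dropped. The function names and the sample values of N, B, and lambda are assumptions chosen for illustration, and the sketch does not check whether a given lambda falls in the range for which any of the data structures is optimal.

```python
import math


def insert_cost(lam: float, B: float) -> float:
    """Amortized insertion cost on the curve, constants dropped: lambda / B."""
    return lam / B


def query_cost(lam: float, N: float) -> float:
    """Expected query cost on the curve, constants dropped: log_lambda N."""
    return math.log(N) / math.log(lam)


def tradeoff_table(N: float, B: float, lambdas):
    """Return (lambda, insert cost, query cost) for each tuning value."""
    return [(lam, insert_cost(lam, B), query_cost(lam, N)) for lam in lambdas]


if __name__ == "__main__":
    # Hypothetical parameters: 2^40 items, 2^10 items per memory transfer.
    N = 2 ** 40
    B = 2 ** 10
    for lam, ins, qry in tradeoff_table(N, B, [2, 16, 256, 1024]):
        print(f"lambda={lam:5d}  insert ~ {ins:.4f} IOs  query ~ {qry:.1f} IOs")
```

Under these assumed parameters, raising lambda makes each insertion more expensive and each query cheaper, which is the direction of the tradeoff the abstract describes.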

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • hash tables
  • external memory algorithms
  • cache-oblivious algorithms
  • asymmetric data structures

References

  1. Peyman Afshani, Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Mayank Goswami, and Meng-Tsung Tsai. Cross-referenced dictionaries and the limits of write optimization. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 1523-1532. SIAM, 2017. URL: http://dx.doi.org/10.1137/1.9781611974782.99.
  2. Michael A. Bender, Richard Cole, Erik D. Demaine, and Martin Farach-Colton. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Rolf H. Möhring and Rajeev Raman, editors, Algorithms - ESA 2002, 10th Annual European Symposium, Rome, Italy, September 17-21, 2002, Proceedings, volume 2461 of Lecture Notes in Computer Science, pages 139-151. Springer, 2002. URL: http://dx.doi.org/10.1007/3-540-45749-6_16.
  3. Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. Cache-oblivious streaming B-trees. In Phillip B. Gibbons and Christian Scheideler, editors, SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Diego, California, USA, June 9-11, 2007, pages 81-92. ACM, 2007. URL: http://dx.doi.org/10.1145/1248377.1248393.
  4. Michael A. Bender, Martin Farach-Colton, Mayank Goswami, Dzejla Medjedovic, Pablo Montes, and Meng-Tsung Tsai. The batched predecessor problem in external memory. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, volume 8737 of Lecture Notes in Computer Science, pages 112-124. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-662-44777-2_10.
  5. Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, and Erez Zadok. Don't thrash: How to cache your hash on flash. PVLDB, 5(11):1627-1637, 2012. URL: http://vldb.org/pvldb/vol5/p1627_michaelabender_vldb2012.pdf.
  6. Michael A. Bender, Martin Farach-Colton, Rob Johnson, Simon Mauras, Tyler Mayer, Cynthia A. Phillips, and Helen Xu. Write-optimized skip lists. In Emanuel Sallinger, Jan Van den Bussche, and Floris Geerts, editors, Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 69-78. ACM, 2017. URL: http://dl.acm.org/citation.cfm?id=3034786, URL: http://dx.doi.org/10.1145/3034786.3056117.
  7. Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970. URL: http://dx.doi.org/10.1145/362686.362692.
  8. Gerth Stølting Brodal and Rolf Fagerberg. Lower bounds for external memory dictionaries. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA, pages 546-554. ACM/SIAM, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644201.
  9. A. Conway, M. Farach-Colton, and P. Shilane. Optimal Hashing in External Memory. ArXiv e-prints, May 2018. URL: http://arxiv.org/abs/1805.09423.
  10. John Esmet, Michael A. Bender, Martin Farach-Colton, and Bradley C. Kuszmaul. The TokuFS streaming file system. In 4th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage'12, Boston, MA, USA, June 13-14, 2012, 2012. URL: https://www.usenix.org/conference/hotstorage12/workshop-program/presentation/esmet.
  11. Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, pages 285-298. IEEE Computer Society, 1999. URL: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6604, URL: http://dx.doi.org/10.1109/SFFCS.1999.814600.
  12. John Iacono and Mihai Patrascu. Using hashing to solve the dictionary problem (in external memory). CoRR, abs/1104.2799, 2011. URL: http://arxiv.org/abs/1104.2799.
  13. John Iacono and Mihai Patrascu. Using hashing to solve the dictionary problem. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 570-582. SIAM, 2012. URL: http://portal.acm.org/citation.cfm?id=2095164&CFID=63838676&CFTOKEN=79617016, URL: http://dx.doi.org/10.1137/1.9781611973099.
  14. William Jannen, Michael A. Bender, Martin Farach-Colton, Rob Johnson, Bradley C. Kuszmaul, and Donald E. Porter. Lazy analytics: Let other queries do the work for you. In 8th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2016, Denver, CO, June 20-21, 2016, 2016. URL: https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/jannen.
  15. Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223-250, 1995. URL: http://dx.doi.org/10.1137/S089548019223872X.
  16. Wikipedia. Thomae’s function - Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Thomae's%20function&oldid=837510765, 2018. [Online; accessed 28-April-2018].