Simplified Tight Bounds for Monotone Minimal Perfect Hashing

Author: Dmitry Kosolobov




Author Details

Dmitry Kosolobov
  • Institute of Natural Sciences and Mathematics, Ural Federal University, Ekaterinburg, Russia

Cite As

Dmitry Kosolobov. Simplified Tight Bounds for Monotone Minimal Perfect Hashing. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 19:1-19:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.CPM.2024.19

Abstract

Given an increasing sequence of integers x₁,…,x_n from a universe {0,…,u-1}, the monotone minimal perfect hash function (MMPHF) for this sequence is a data structure that answers the following rank queries: rank(x) = i if x = x_i, for i ∈ {1,…,n}, and rank(x) is arbitrary otherwise. Assadi, Farach-Colton, and Kuszmaul recently presented at SODA'23 a proof of the lower bound Ω(n min{log log log u, log n}) for the bits of space required by MMPHF, provided u ≥ n 2^{2^{√(log log n)}}, which is tight since there is a data structure for MMPHF that attains this space bound (and answers the queries in O(log u) time). In this paper, we close the remaining gap by proving that, for u ≥ (1+ε)n, where ε > 0 is any constant, the tight lower bound is Ω(n min{log log log(u/n), log n}), which is also attainable; we observe that, for all reasonable cases when n < u < (1+ε)n, known facts imply tight bounds, which virtually settles the problem. Along the way, we substantially simplify the proof of Assadi et al., replacing a part of their heavy combinatorial machinery by trivial observations. However, an important part of the proof still remains complicated. This part of our paper repeats arguments of Assadi et al. and is not novel. Nevertheless, we include it, for completeness, offering a somewhat different perspective on these arguments.
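
The rank-query semantics defined in the abstract can be illustrated with a minimal Python sketch. It is not the space-efficient structure the paper analyzes: it stores the whole sequence explicitly, so it uses far more space than the bounds discussed above. The class name `NaiveMMPHF` and its method are hypothetical and exist only for illustration.

```python
from bisect import bisect_left
from typing import Sequence


class NaiveMMPHF:
    """Toy monotone minimal perfect hash: correct on keys, arbitrary elsewhere.

    Illustrative only: it keeps every key, unlike a real MMPHF.
    """

    def __init__(self, keys: Sequence[int]) -> None:
        # The definition assumes a strictly increasing sequence x_1 < ... < x_n.
        if any(a >= b for a, b in zip(keys, keys[1:])):
            raise ValueError("keys must be strictly increasing")
        self._keys = list(keys)

    def rank(self, x: int) -> int:
        # For a stored key x = x_i this returns i (1-based, as in the abstract).
        # For any other x the result carries no meaning, which the
        # definition explicitly allows.
        return bisect_left(self._keys, x) + 1


mmphf = NaiveMMPHF([3, 8, 15, 42])
print([mmphf.rank(x) for x in (3, 8, 15, 42)])  # [1, 2, 3, 4]
```

The point of the definition is that a query on a non-key may return anything, so a real MMPHF does not have to store the keys themselves; that freedom is what makes the small space bounds in the abstract possible.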

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • monotone minimal perfect hashing
  • lower bound
  • MMPHF
  • hash

References

  1. S. Assadi, M. Farach-Colton, and W. Kuszmaul. Tight bounds for monotone minimal perfect hashing. In Proc. Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 456-476. SIAM, 2023. URL: https://doi.org/10.1137/1.9781611977554.ch2.
  2. D. Belazzougui. Linear time construction of compressed text indices in compact space. In Proceedings of the forty-sixth Annual ACM Symposium on Theory of Computing, pages 148-193, 2014. URL: https://doi.org/10.1145/2591796.2591885.
  3. D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In Proc. SODA, pages 785-794. SIAM, 2009. URL: https://doi.org/10.1137/1.9781611973068.86.
  4. D. Belazzougui, F. C. Botelho, and M. Dietzfelbinger. Hash, displace, and compress. In European Symposium on Algorithms, pages 682-693. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-04128-0_61.
  5. D. Belazzougui, F. Cunial, J. Kärkkäinen, and V. Mäkinen. Linear-time string indexing and analysis in small space. ACM Transactions on Algorithms (TALG), 16(2):1-54, 2020. URL: https://doi.org/10.1145/3381417.
  6. D. Belazzougui and G. Navarro. Alphabet-independent compressed text indexing. ACM Transactions on Algorithms (TALG), 10(4):1-19, 2014. URL: https://doi.org/10.1145/2635816.
  7. D. Belazzougui and G. Navarro. Optimal lower and upper bounds for representing sequences. ACM Transactions on Algorithms (TALG), 11(4):1-21, 2015. URL: https://doi.org/10.1145/2629339.
  8. D. Clark. Compact pat trees. PhD thesis, University of Waterloo, 1997.
  9. R. Clifford, A. Fontaine, E. Porat, B. Sach, and T. Starikovskaya. Dictionary matching in a stream. In Proc. ESA, volume 9294 of LNCS, pages 361-372. Springer, 2015. URL: https://doi.org/10.1007/978-3-662-48350-3_31.
  10. T. M. Cover and J. A. Thomas. Information theory and statistics. Elements of Information Theory, 1(1):279-335, 1991. URL: https://doi.org/10.1002/0471200611.
  11. M. L. Fredman and J. Komlós. On the size of separating systems and families of perfect hash functions. SIAM Journal on Algebraic Discrete Methods, 5(1):61-68, 1984. URL: https://doi.org/10.1137/0605009.
  12. M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538-544, 1984. URL: https://doi.org/10.1145/828.1884.
  13. T. Gagie, G. Navarro, and B. Prezza. Fully functional suffix trees and optimal text searching in bwt-runs bounded space. Journal of the ACM (JACM), 67(1):1-54, 2020. URL: https://doi.org/10.1145/3375890.
  14. R. Grossi, A. Orlandi, and R. Raman. Optimal trade-offs for succinct string indexes. In Automata, Languages and Programming: 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6-10, 2010, Proceedings, Part I 37, pages 678-689. Springer, 2010. URL: https://doi.org/10.1007/978-3-642-14165-2_57.
  15. G. Jacobson. Space-efficient static trees and graphs. In Proc. 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 549-554. IEEE, 1989. URL: https://doi.org/10.1109/SFCS.1989.63533.
  16. K. Mehlhorn. On the program size of perfect and universal hash functions. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pages 170-175. IEEE, 1982. URL: https://doi.org/10.1109/SFCS.1982.80.
  17. J. Radhakrishnan. Improved bounds for covering complete uniform hypergraphs. Information Processing Letters, 41(4):203-207, 1992. URL: https://doi.org/10.1016/0020-0190(92)90181-T.