SPIDER: Improved Succinct Rank and Select Performance

Authors Matthew D. Laws , Jocelyn Bliven , Kit Conklin , Elyes Laalai , Samuel McCauley , Zach S. Sturdevant




File

LIPIcs.SEA.2024.21.pdf
  • Filesize: 0.98 MB
  • 18 pages


Author Details

Matthew D. Laws
  • Williams College Computer Science, Williamstown, MA, USA
Jocelyn Bliven
  • Williams College Computer Science, Williamstown, MA, USA
Kit Conklin
  • Williams College Computer Science, Williamstown, MA, USA
Elyes Laalai
  • Williams College Computer Science, Williamstown, MA, USA
Samuel McCauley
  • Williams College Computer Science, Williamstown, MA, USA
Zach S. Sturdevant
  • Williams College Computer Science, Williamstown, MA, USA

Cite As

Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, and Zach S. Sturdevant. SPIDER: Improved Succinct Rank and Select Performance. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.21

Abstract

Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: Rank(i) gives the number of 1 bits in slots 0 through i, and Select(j) gives the first slot s with Rank(s) = j. A succinct data structure can answer these queries while using space much smaller than the size of the original bit vector. State-of-the-art succinct rank and select data structures use as little as 4% extra space (over the underlying bit vector) while answering rank and select queries very quickly. Rank queries can be answered using only a handful of array accesses. Select queries can be answered by starting with similar array accesses, followed by a linear scan through the bit vector. Nonetheless, a tradeoff remains: data structures that use under 4% space are significantly slower at answering rank and select queries than less-space-efficient data structures (using, say, over 20% extra space). In this paper we make significant progress towards closing this gap. We give a new data structure, SPIDER, which uses 3.82% extra space. SPIDER gives the best known rank query time for data sets of 8 billion or more bits, even compared to much less space-efficient data structures. For select queries, SPIDER outperforms all data structures that use less than 4% space, and significantly closes the gap in select performance between data structures with less than 4% space and those that use more (over 20% for both rank and select) space. SPIDER makes two main technical contributions. For rank queries, it improves performance by interleaving the metadata with the bit vector to improve cache efficiency. For select queries, it uses predictions to almost eliminate the cost of the linear scan. These predictions are inspired by recent results on data structures with machine-learned predictions, adapted to the succinct data structure setting. Our results hold on both real and synthetic data, showing that these predictions are effective in practice.
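To make the two query definitions concrete, the following is a minimal Python sketch of the generic baseline the abstract describes: a few metadata lookups for rank, and metadata lookups followed by a linear scan for select. It is not SPIDER and is not succinct. The class name `RankSelect`, the constant `BLOCK`, and the `counts` array are hypothetical names chosen for illustration, and the sketch neither interleaves metadata with the bit vector nor uses predictions.

```python
from bisect import bisect_left

BLOCK = 8  # hypothetical block size in bits, kept tiny for readability


class RankSelect:
    """Toy rank/select index over a list of 0/1 ints (illustration only)."""

    def __init__(self, bits):
        self.bits = list(bits)
        # counts[b] = number of 1 bits strictly before block b.
        self.counts = [0]
        for start in range(0, len(self.bits), BLOCK):
            ones = sum(self.bits[start:start + BLOCK])
            self.counts.append(self.counts[-1] + ones)

    def rank(self, i):
        """Number of 1 bits in slots 0 through i (inclusive)."""
        block = i // BLOCK
        # One metadata lookup plus a count inside a single block.
        return self.counts[block] + sum(self.bits[block * BLOCK:i + 1])

    def select(self, j):
        """First slot s with rank(s) == j, for 1 <= j <= number of 1 bits."""
        if not 1 <= j <= self.counts[-1]:
            raise ValueError("j out of range")
        # Metadata step: the block b with counts[b] < j <= counts[b + 1].
        block = bisect_left(self.counts, j) - 1
        remaining = j - self.counts[block]
        # Linear scan through the bit vector, starting at that block.
        for s in range(block * BLOCK, len(self.bits)):
            remaining -= self.bits[s]
            if remaining == 0:
                return s
        raise AssertionError("unreachable when counts matches bits")


bits = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1]
index = RankSelect(bits)
print(index.rank(8))    # 4
print(index.select(5))  # 9
```

In this sketch the linear scan in `select` is the step whose cost grows with the block size, which is the cost the abstract says SPIDER's predictions almost eliminate.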

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • Rank and Select
  • Succinct Data Structures
  • Data Structures
  • Cache Performance
  • Predictions

