SPIDER: Improved Succinct Rank and Select Performance

Laws, Matthew D.; Bliven, Jocelyn; Conklin, Kit; Laalai, Elyes; McCauley, Samuel; Sturdevant, Zach S.

doi:10.4230/LIPIcs.SEA.2024.21

Abstract

Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: Rank(i) gives the number of 1 bits in slots 0 through i, and Select(j) gives the first slot s with Rank(s) = j. A succinct data structure can answer these queries while using extra space much smaller than the size of the original bit vector.
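To make the two query definitions concrete, the following is a minimal, unoptimized sketch that follows the wording above directly. It is not the paper's data structure: the function names and the list-of-ints representation of the bit vector are assumptions made purely for illustration, and j is assumed to be at least 1.

```python
def rank(bits, i):
    """Number of 1 bits in slots 0 through i (inclusive)."""
    return sum(bits[: i + 1])


def select(bits, j):
    """First slot s with rank(bits, s) == j, or None if no such slot exists."""
    count = 0
    for s, b in enumerate(bits):
        count += b
        if count == j:
            return s
    return None


bits = [1, 0, 1, 1, 0, 0, 1, 0]
print(rank(bits, 3))    # 3
print(select(bits, 3))  # 3
```

Both functions take time linear in the length of the bit vector; the point of a rank and select data structure is to avoid this scan with a small amount of precomputed metadata.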
State-of-the-art succinct rank and select data structures use as little as 4% extra space (over the underlying bit vector) while answering rank and select queries very quickly. Rank queries can be answered using only a handful of array accesses. Select queries can be answered by starting with similar array accesses, followed by a linear scan through the bit vector.
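The general shape of such a structure can be sketched as follows. This is a simplified illustration under stated assumptions, not the layout of any particular published structure: the block size `BLOCK`, the single-level `counts` array, and the use of binary search to locate the block for select are all hypothetical choices made to keep the example short.

```python
from bisect import bisect_left

BLOCK = 8  # hypothetical block size in bits, chosen only for illustration


def build_block_counts(bits):
    """counts[b] = number of 1 bits strictly before block b (plus a final total)."""
    counts = [0]
    for start in range(0, len(bits), BLOCK):
        counts.append(counts[-1] + sum(bits[start:start + BLOCK]))
    return counts


def rank(bits, counts, i):
    """1 bits in slots 0..i: one metadata lookup plus a scan inside one block."""
    b = i // BLOCK
    return counts[b] + sum(bits[b * BLOCK : i + 1])


def select(bits, counts, j):
    """Slot of the j-th 1 bit (j >= 1): find the block via metadata, then scan it."""
    if j < 1 or j > counts[-1]:
        return None
    # Block b satisfies counts[b] < j <= counts[b + 1].
    b = bisect_left(counts, j) - 1
    remaining = j - counts[b]
    for s in range(b * BLOCK, min((b + 1) * BLOCK, len(bits))):
        remaining -= bits[s]
        if remaining == 0:
            return s
    return None


bits = [1, 0, 1, 1, 0, 0, 1, 0,
        0, 0, 0, 1, 1, 0, 0, 0,
        1]
counts = build_block_counts(bits)
print(rank(bits, counts, 11))   # 5
print(select(bits, counts, 5))  # 11
```

In this sketch the metadata costs one counter per block, so larger blocks mean less extra space but a longer scan per query, which is the tradeoff the abstract goes on to describe.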
Nonetheless, a tradeoff remains: data structures that use under 4% space are significantly slower at answering rank and select queries than less-space-efficient data structures (using, say, over 20% extra space).
In this paper we make significant progress towards closing this gap. We give a new data structure, SPIDER, which uses 3.82% extra space. SPIDER gives the best known rank query time for data sets of 8 billion or more bits, even compared to much less space-efficient data structures. For select queries, SPIDER outperforms all data structures that use less than 4% space, and significantly closes the gap in select performance between data structures with less than 4% space and those that use more (over 20% for both rank and select) space.
SPIDER makes two main technical contributions. For rank queries, it improves performance by interleaving the metadata with the bit vector to improve cache efficiency. For select queries, it uses predictions to almost eliminate the cost of the linear scan. These predictions are inspired by recent results on data structures with machine-learned predictions, adapted to the succinct data structure setting. Our results hold on both real and synthetic data, showing that these predictions are effective in practice.
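The idea of using a prediction to shorten the search in a select query can be illustrated with a small sketch. This is not SPIDER's actual prediction scheme, which the abstract does not specify: the example below simply assumes the 1 bits are spread roughly evenly, guesses a block by linear interpolation, and then corrects the guess using per-block counts. The names `BLOCK`, `build_block_counts`, `predict_block`, and `select_with_prediction` are all hypothetical.

```python
BLOCK = 8  # hypothetical block size in bits, chosen only for illustration


def build_block_counts(bits):
    """counts[b] = number of 1 bits strictly before block b (plus a final total)."""
    counts = [0]
    for start in range(0, len(bits), BLOCK):
        counts.append(counts[-1] + sum(bits[start:start + BLOCK]))
    return counts


def predict_block(counts, j):
    """Guess the block holding the j-th 1 bit, assuming evenly spread 1 bits."""
    num_blocks = len(counts) - 1
    total_ones = counts[-1]
    return min(num_blocks - 1, (j - 1) * num_blocks // total_ones)


def select_with_prediction(bits, counts, j):
    """Slot of the j-th 1 bit (j >= 1), starting from a predicted block."""
    if j < 1 or j > counts[-1]:
        return None
    b = predict_block(counts, j)
    while counts[b] >= j:      # prediction was too far right
        b -= 1
    while counts[b + 1] < j:   # prediction was too far left
        b += 1
    remaining = j - counts[b]
    for s in range(b * BLOCK, min((b + 1) * BLOCK, len(bits))):
        remaining -= bits[s]
        if remaining == 0:
            return s
    return None


bits = [1, 0, 1, 1, 0, 0, 1, 0,
        0, 0, 0, 1, 1, 0, 0, 0,
        1]
counts = build_block_counts(bits)
print(select_with_prediction(bits, counts, 5))  # 11
```

When the prediction lands on or near the correct block, the correction loops run for few or no iterations, so most of the remaining work is the short scan inside a single block. The answer is correct regardless of how good the prediction is; a poor prediction only costs extra correction steps.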
