Succinct Rank Dictionaries Revisited
Abstract
We study data structures for representing sets of $m$ elements drawn from the universe $\{0, 1, \ldots, n-1\}$ that support access and rank queries. A classical approach to this problem, foundational to the fields of succinct and compact data structures, is to represent the set as a bitvector $B$ of $n$ bits, where $B[i] = 1$ iff $i$ is a member of the set. Our particular focus in this paper is on structures taking $nH_0(B) + o(n)$ bits, which stem from the so-called RRR bitvector scheme (Raman et al., ACM Trans. Alg., 2007). In RRR bitvectors, $B$ is conceptually divided into blocks of $b$ bits each. A block containing $c$ 1 bits is then encoded using $\lceil\log(b+1)\rceil + \lceil\log\binom{b}{c}\rceil$ bits, where $\lceil\log(b+1)\rceil$ bits are used to encode $c$, and $\lceil\log\binom{b}{c}\rceil$ bits are used to say which of the $\binom{b}{c}$ possible combinations the block represents. In all existing RRR implementations the code assigned to a block is its lexicographical rank amongst the combinations of its class. In this paper we explore alternative non-lexicographical assignments of codes to blocks. We show these approaches can lead to faster query times and offer relevant space-time trade-offs in practice compared to state-of-the-art implementations (Gog and Petri, Software, Prac. & Exp., 2014) from the Succinct Data Structures Library.
Keywords and phrases:
data structures, data compression, succinct data structures, compressed data structures, weighted de Bruijn sequence, text indexing, string algorithms
2012 ACM Subject Classification:
Theory of computation → Design and analysis of algorithms
Supplementary Material:
Software (Source code): https://github.com/saskeli/h0_bv [3], archived at swh:1:dir:383ab8d9247150886cfacd830469e5e5bdd72551
Acknowledgements:
We thank Zsuzsanna Lipták, Taku Onodera, and Rajeev Raman for interesting discussions, and the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources.
Editors:
Petra Mutzel and Nicola Prezza
Series and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Bit vectors supporting rank and access queries are essential to the field of compact data structures [15, 11] and, as such, have seen intense research in the past two decades.
We are given a sequence $B$ of $n$ bits, of which $m$ are set to 1, and should preprocess $B$ so that the following queries can be answered quickly (another related query is select, which returns the position of the $i$th 1 bit, but it is less used than the other two queries, particularly in the context of text indexing using the backward search algorithm): $\mathsf{access}(B, i)$, which returns the bit $B[i]$; and $\mathsf{rank}(B, i)$, which returns the number of 1 bits in the prefix $B[0..i)$.
Classical methods use $n + o(n)$ bits and leave the input bit string intact, building small structures on top of it to support rank queries in constant time [6, 14]. These methods are extremely fast in practice [5, 8], but incur large space overheads on sparse or otherwise compressible inputs. The Elias-Fano representation, used within web search engines [21, 17] and full-text indexes [10], takes $m\lceil\log(n/m)\rceil + 2m$ bits and supports access in $O(1)$ time and rank in $O(\log(n/m))$ time if extra bits are used (logarithms throughout are base 2). There are also a number of heuristic and hybrid methods that are tuned for certain scenarios, such as full-text indexing [8].
Storing a bitvector of length $n$ with $m$ 1 bits set can be seen as the problem of storing $m$ elements of a finite universe of size $n$ so that membership and rank queries can be answered efficiently in practice. The information theoretic lower bound for storing $m$ elements from a universe of size $n$ is $\lceil\log\binom{n}{m}\rceil$ bits. Raman, Raman, and Rao [19], building on earlier work by Pagh [18], describe a data structure that matches this bound up to an $o(n)$ term and supports membership and rank queries in $O(1)$ time. This data structure is commonly referred to as RRR and has since been heavily engineered by a series of authors [2, 16, 5], culminating in its embodiment in the Succinct Data Structures Library (SDSL, https://github.com/simongog/sdsl).
RRR is of particular interest in full-text indexing because when applied to the bitvectors of a wavelet tree constructed over the Burrows-Wheeler transform of a string $S$ of $n$ symbols, the resulting structure achieves size $nH_k(S) + o(n\log\sigma)$ bits, where $H_k(S)$ is the $k$th-order empirical entropy of $S$ and $\sigma$ its alphabet size [13, 15], a technique called implicit compression boosting [12, 4].
All RRR implementations of which we are aware divide the bitvector into blocks of $b$ bits each and store the encoded block containing $c$ 1 bits using $\lceil\log(b+1)\rceil + \lceil\log\binom{b}{c}\rceil$ bits. $\lceil\log(b+1)\rceil$ of the bits are used to encode $c$, the number of 1 bits the block contains, which is referred to as the block’s class. The other $\lceil\log\binom{b}{c}\rceil$ bits are used to encode the block’s offset, its lexicographical rank amongst all $\binom{b}{c}$ possible combinations of $b$-length blocks containing $c$ 1 bits (we provide a detailed description in Section 2).
Contribution.
In this paper, we revisit the implementation of entropy-compressed bitvectors stemming from the basic RRR scheme. Our contributions are summarized as follows:
1. In order to later decode a block when answering an access or rank query, some RRR implementations [2, 5] use a lookup table of size $O(2^b)$ that is indexed with the block’s class and offset to obtain the original block in $O(1)$ time. We describe two methods for reducing the size of the lookup table: a simple sparsification technique that provides a space-time trade-off, and an approach based on weighted binary de Bruijn sequences from combinatorics on words [20] that reduces the lookup table by a factor of roughly $b$. Both these approaches allow slightly larger values of $b$ to be used, reducing overall space usage.
2. An alternative to the lookup-table approach that is taken by some RRR implementations [16, 5] is to decode the block directly from its class-offset encoding by using properties of lexicographical offset assignment. These methods decode the original block bit-by-bit in $O(b)$ time. We develop an alternative non-lexicographical scheme for assigning offset values that proves slightly faster to decode in practice.
Both our new approaches above represent a departure from the lexicographical assignment of offsets that is common to all previous work on the problem.
Roadmap.
The remainder of this paper is structured as follows. In the next section, we give a detailed overview of the basic RRR approach common to all previous implementations of it, as well as the vagaries of some specific implementations. We also describe the experimental setup used throughout. Then, in Section 3 we describe methods for reducing the size of the block-decoding table relevant to some RRR implementations. Section 4 describes our new method for inverting combinations, with further engineering measures described in Section 4.1. In Section 5 we apply our new RRR implementations as an encoding option for hybrid bitvector schemes. Section 6 concludes and offers directions for further research.
2 Preliminaries
2.1 RRR bitvector structure
In this section we describe the elements of the RRR data structure that are common to existing implementations. Implementations of RRR [2, 16, 5] generally sacrifice the constant-time queries from the original paper [19] in favor of practicality. The first public implementation of RRR was due to Claude and Navarro [2], and we now describe their approach – all other implementations of which we are aware build on it.
We are given a bitvector $B$ of $n$ bits, of which $m$ are set to 1. We divide $B$ up into blocks of $b$ bits each. We assume here for ease of exposition that $b$ divides $n$ – the situation where this is not the case and the last block has fewer than $b$ bits is trivial to deal with.
Denote the $i$th block as $B_i$. Let the class, $c_i$, of block $B_i$ be the number of 1 bits $B_i$ contains. There are $\binom{b}{c}$ different blocks belonging to class $c$. Each block corresponds to one of the $\binom{b}{c_i}$ combinations of its class. We define the offset, $o_i$, of block $B_i$ to be an integer in the range $[0, \binom{b}{c_i})$ that uniquely determines which element of the class (i.e., which combination) the block corresponds to. All previous work on RRR assigns a block’s offset as its lexicographical rank among all blocks of the same class. We call this the block’s lexicographical offset.
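To make the lexicographical offset concrete, the following sketch computes the lexicographic rank of a block among all blocks of its class. This is our own illustration, not code from any of the implementations discussed, and the helper names are ours; bits are scanned from the most significant end.

```cpp
#include <cstdint>

// Binomial coefficient via the exact multiplicative formula.
uint64_t binom(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; ++i) r = r * (n - i + 1) / i;
    return r;
}

// Lexicographic rank of a b-bit block among all blocks of its class
// (its number of 1 bits).
uint64_t lex_offset(uint64_t block, uint64_t b) {
    uint64_t c = __builtin_popcountll(block);
    uint64_t off = 0;
    for (uint64_t i = 0; i < b && c > 0; ++i) {
        if (block >> (b - 1 - i) & 1) {
            // A 1 here skips all blocks that have a 0 in this position
            // and c ones among the remaining b-i-1 positions.
            off += binom(b - 1 - i, c);
            --c;
        }
    }
    return off;
}
```

For example, among the class-2 blocks of length 4, the bitstrings 0011, 0101, 0110, 1001, 1010, 1100 receive offsets 0 through 5.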
Classes are stored in a $(n/b)\lceil\log(b+1)\rceil$-bit array, $C$, that trivially supports random access. The offsets are stored in a bit sequence, $O$. Note that random access to elements of $O$ is not trivial, because they are of variable length. To enable random access to $O$, pointers to the start of every $k$th element of the offset sequence are stored (for all of the implementations $k$ is set to 32 blocks) in a pointer table of $n/(bk)$ words. To support fast rank queries an additional $n/(bk)$-word table storing cumulative ranks up to every $k$th block is added.
Now, to support $\mathsf{rank}(B, i)$ we proceed as follows. Position $i$ is contained in block $j = \lfloor i/b\rfloor$. We scan $C$ and sum the class values $c_{\lfloor j/k\rfloor k}, \ldots, c_{j-1}$, adding this to the cumulative rank at block $\lfloor j/k\rfloor k$, which we have precomputed. This gives the rank up to the start of block $j$. To obtain $o_j$, the offset for the block, when scanning class values $c_{\lfloor j/k\rfloor k}$ to $c_{j-1}$ we sum the values $\lceil\log\binom{b}{c_t}\rceil$, from which we obtain the starting position in $O$ of $o_j$. As described, a scan of up to $k$ elements of $C$ is required, followed by a single access to $O$. The class and offset for a block are thus obtained. What remains is to unrank the block’s offset – i.e., obtain its original bit representation – which allows us to complete the query answer by counting the bits in the relevant prefix in the case of a rank query. It is at this point that implementations begin to differ.
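The superblock bookkeeping just described can be sketched as follows. This is a simplified illustration with hypothetical field names, not the actual SDSL layout; `off_width[c]` stands in for $\lceil\log\binom{b}{c}\rceil$.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Simplified RRR superblock index: classes in C, a pointer into the
// offset stream O and a cumulative rank sampled every k blocks.
struct RRRIndex {
    std::vector<uint8_t> C;         // class per block
    std::vector<uint64_t> ptr;      // bit position of o_j for every k-th block
    std::vector<uint64_t> cum;      // rank up to every k-th block
    std::vector<uint8_t> off_width; // offset width per class
    uint64_t k;                     // superblock size in blocks

    // Returns (rank up to the start of block j, bit position of o_j in O).
    std::pair<uint64_t, uint64_t> locate(uint64_t j) const {
        uint64_t s = j / k;
        uint64_t r = cum[s], pos = ptr[s];
        for (uint64_t t = s * k; t < j; ++t) {
            r += C[t];              // class values sum to block ranks
            pos += off_width[C[t]]; // each offset occupies off_width[c] bits
        }
        return {r, pos};
    }
};
```

With $b = 4$ the offset widths per class are $\lceil\log\binom{4}{c}\rceil = 0, 2, 3, 2, 0$; a toy index over blocks of classes 1, 2, 0, 3 with $k = 2$ then locates any block with at most one scan of $k - 1$ classes.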
Lookup table decoding.
The implementation of Claude and Navarro [2] uses a two-dimensional lookup table of $2^b$ entries that contains all the possible blocks of size $b$ (an approach prescribed in Raman et al.’s original paper [19]). The $c$th row of the table contains all the blocks of class $c$ (i.e., all the blocks containing $c$ 1 bits). Having worked out the class $c_j$ and the offset $o_j$ of the block containing the query index as described above, the query is completed by accessing cell $(c_j, o_j)$ of the table. Claude and Navarro store the blocks in a row in lexicographical order, but the order is not important to the query algorithm just described.
The lookup table has total size $b2^b$ bits and indeed it is this size that adds some tension to the data structure: ideally, one would like to increase $b$ in order to decrease the $\lceil\log(b+1)\rceil$-bit overhead incurred per block for storing the class values. However increasing $b$ causes a rapid blowup in the table size. In order to keep the lookup table manageable, and to fit classes into four bits, Claude and Navarro fix the block size to $b = 15$, resulting in a lookup table $15 \cdot 2^{15}$ bits in size, and an overhead from the class array of $4n/15$ bits. Gog and Petri engineer this approach further in the SDSL [5] with a class called rrr_vector_15.
On-the-fly decoding.
Arguing that 25% is a non-trivial overhead in the context of succinct data structures, Navarro and Providel [16] show that the lookup table can be eliminated by decoding the bits of a block from its lexicographical offset in $O(b)$ time, by checking the range of values that the offset belongs to and extracting bits consecutively as follows.
For a block of length $b$, consider all the $\binom{b}{c}$ blocks of class $c$ arranged in lexicographical order. The first $\binom{b-1}{c}$ of them begin with a 0 and the remaining $\binom{b-1}{c-1}$ begin with a 1. Thus, if $o \geq \binom{b-1}{c}$, then the first bit of the block was a 1. In this case we decrement $b$ and $c$, and decrease $o$ by $\binom{b-1}{c}$. Otherwise, the first bit of the block was a 0. In this case we only decrement $b$. Now we continue extracting the next bit from a block of length $b - 1$. The process can be stopped if $c$ becomes zero or equal to the number of remaining bits, which implies that all the remaining bits are a single run of zeroes or ones. All the binomial coefficients are precomputed in a table in order to make decoding faster, but this table is much smaller than the original lookup table. Navarro and Providel show that queries, while slower than the lookup-table approach, are still acceptably fast in practice. The approach was further engineered by Gog and Petri [5] and is included in the SDSL.
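A minimal sketch of this bit-by-bit decoding loop follows directly from the description above (our illustration; the function names are ours, and the binomial helper is repeated for self-containment):

```cpp
#include <cstdint>

// Binomial coefficient via the exact multiplicative formula.
uint64_t binom(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; ++i) r = r * (n - i + 1) / i;
    return r;
}

// Reconstruct a block of length b and class c from its lexicographical
// offset, extracting bits from most significant to least significant.
uint64_t decode_lex(uint64_t off, uint64_t c, uint64_t b) {
    uint64_t block = 0;
    for (uint64_t i = 0; i < b && c > 0; ++i) {
        uint64_t z = binom(b - 1 - i, c); // blocks starting with 0 here
        if (off >= z) {                   // first remaining bit was a 1
            block |= 1ULL << (b - 1 - i);
            off -= z;
            --c;
        }
    }
    return block; // once c hits zero the remaining bits are all 0
}
```

This inverts the lexicographical ranking: for $b = 4$, $c = 2$, offsets 0 through 5 decode back to 0011, 0101, 0110, 1001, 1010, 1100.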
2.2 Experimental Setup
We measure and report on experiments with our new methods throughout the paper and so define our experimental machine and test data now. Additionally, our source code is available at https://github.com/saskeli/h0_bv.
Data sets.
We used the same data sets for all of our experiments. Three of the bitvectors are derived from real-world data, and chosen to be diverse and relevant to practical compressed indexing applications. WT-WEB-1GB (available at http://people.eng.unimelb.edu.au/sgog/data/WT-WEB-1GB.gz) and WT-DNA-1GB (available at http://people.eng.unimelb.edu.au/sgog/data/WT-DNA-1GB.gz) are data sets by Gog and Petri [5].
bv-dump.bin (available at https://doi.org/10.5281/zenodo.11031751) is the concatenated bit vectors from the “plain matrix” representation of the spectral Burrows-Wheeler transform (SBWT) proposed by Alanko, Puglisi and Vuohtoniemi [1]. The SBWT in question is built from a set of 17,336,887 Illumina HiSeq 2500 reads of length 502 sampled from the human gut (SRA identifier ERR5035349) in a study on irritable bowel syndrome and bile acid malabsorption [7] with $k$-mer size 31.
In addition, we generated random bitvectors with varying probability $p$ for a 1-bit. All of these bit vectors are one gigabyte in size, and each bit is i.i.d. RND-1.bin has $p = 0.01$, RND-5.bin has $p = 0.05$ and RND-10.bin $p = 0.1$.
Some summary statistics on the datasets are presented in Table 1.
| Dataset | | | Mean entropy of 64-bit blocks | Fraction of uniform 64-bit blocks |
|---|---|---|---|---|
| WT-WEB-1GB | ||||
| WT-DNA-1GB | ||||
| bv-dump.bin | ||||
| RND-1.bin | ||||
| RND-5.bin | ||||
| RND-10.bin |
Experiments.
The experimental run is the same for each of the following three sections, with only the data structures under test changing. In particular, data structures are built for each of the data sets, after which a large number of random access and rank queries are run and timed on the built index. The space usage of the built index along with mean query times is reported. We only report times for access queries because the comparative rank performance between data structure implementations remains the same, just very slightly and consistently slower than access queries. Full results for rank and select queries as timed in our experiments, along with construction times, are available in Appendix A.
Machine and Compilation.
Tests were run on a machine with an AMD EPYC 7452 Processor and 500GB of DDR4 RAM. The operating system was AlmaLinux 8.10 (with Linux 4.18.0-372.9.1.el8.x86_64 kernel). Code was compiled with GCC 13.3.0 and the performance-relevant flags: -std=c++2a, -march=native, -Ofast and -DNDEBUG. Experiments were run multiple times, on multiple machines to ensure the stability of the results.
3 Reducing the size of the lookup table
As discussed in Section 2.1, the principal hurdle to reducing the lower order terms in the original RRR representation (and its implementation by Claude and Navarro) is the lookup table that stores all possible bitstrings of length $b$. This lookup table has size $b2^b$ bits, making it very sensitive to increases in $b$. Navarro and Providel’s approach is to remove this lookup table entirely and reconstruct its entries as needed through computation, at the cost of query time. In this section we keep the lookup table, but examine ways to reduce its size, with the hope of increasing $b$ to reduce overall size while retaining fast query times.
The two approaches we explore are sparsification of the original lookup table and essentially compacting the lookup table by replacing it with a series of weighted de Bruijn sequences [20].
Sparse lookup table.
Instead of storing all possible length-$b$ bitstrings with weight $c$ in the table, we can store every $g$th entry ($\lceil\binom{b}{c}/g\rceil$ blocks in total for class $c$) and reconstruct any desired block in at most $g - 1$ constant-time steps. This changes the total space required for the lookup table to $b2^b/g$ bits. Further space savings can be had by storing samples only for classes where $c \leq b/2$, effectively halving the space usage. Our implementations thus use $b2^{b-1}/g$ bits for storing the lookup table, and decoding a block given $c$ and $o$ values takes $O(g)$ time with an algorithm that, given a block, can compute its lexicographic successor in $O(1)$ time.
We use the “Compute the lexicographically next bit permutation” algorithm from the “Bit Twiddling Hacks” website (https://graphics.stanford.edu/~seander/bithacks.html). See Listing 1. The fast enumeration algorithm compiles to 9 instructions per iteration when optimized and compiled by the GCC compiler.
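The snippet in question is the well-known branchless sequence from the Bit Twiddling Hacks page; a 64-bit rendering (our adaptation, requiring a nonzero input) looks as follows:

```cpp
#include <cstdint>

// Lexicographically next bit permutation with the same number of set
// bits, after the "Bit Twiddling Hacks" snippet. Requires v != 0.
uint64_t next_perm(uint64_t v) {
    uint64_t t = v | (v - 1); // set all bits below the lowest set bit
    // (~t & -~t) isolates the lowest zero above the dropped run of ones;
    // the shifted term re-deposits the remaining ones at the bottom.
    return (t + 1) | (((~t & -(~t)) - 1) >> (__builtin_ctzll(v) + 1));
}
```

Starting from the sampled block, applying this at most $g - 1$ times reaches any block between two samples, e.g. 0011 → 0101 → 0110 → 1001 → 1010 → 1100 for weight 2 and length 4.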
A natural selection for gap size is $g = b$, leading to the lookup table taking $2^{b-1}$ bits.
Weighted de Bruijn sequences.
In [20], Ruskey, Sawada and Williams study weighted de Bruijn sequences, circular binary strings from which all possible bitstrings of length $b$ having $c$ 1 bits can be efficiently extracted. In particular, for each class $c$ we create a circular bitstring (the weighted de Bruijn sequence) of length $\binom{b}{c}$, such that all bitstrings of length $b$ and weight $c$ can be created from $(b-1)$-length substrings of this weighted de Bruijn sequence with the necessary bit appended to reach the correct weight.
These remarkable constructions seem tailor-made to reduce the size of the RRR lookup table. The class remains the same as before, but the offset is now the position of the encoded block within the weighted de Bruijn sequence.
Storing these bit strings in practice requires $\binom{b}{c} + b - 2$ bits per class, with the first $b - 2$ bits being appended to the end, to ensure that the “last” substring can be read without overhead due to the circularity of the string. Conveniently, if we concatenate these weighted de Bruijn sequences by increasing weight, adjacent sequences partially overlap, provided each weighted de Bruijn sequence is the lexicographically smallest rotation. See Figure 1. Additionally we only require sequences of weights $c \leq b/2$, since higher weight sequences can be stored as inverted references to lower weight sequences. This leads to the entire lookup table fitting in roughly $2^{b-1}$ bits.
The required binary sequence along with the start positions of each weight are computed using a Python script and output as static arrays for use with the C++ code. These arrays are used for decoding blocks. Reading from the binary sequence consists of reading the desired bytes in big-endian order, followed by masking and shifting to generate a candidate block. After this the last bit is flipped if required, before flipping the entire candidate if the class exceeds $b/2$. The decoding process is visualized in Figure 2.
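To illustrate the property being exploited, the following brute-force checker (purely illustrative; the actual tables come from the Ruskey–Sawada–Williams construction, and all names are ours) verifies that a candidate cyclic string of length $\binom{b}{c}$ generates every weight-$c$ block of length $b$ from its $(b-1)$-bit windows plus one balancing bit:

```cpp
#include <cstdint>
#include <set>

// Checks the weighted de Bruijn property for small b and c: a cyclic
// string s of length len = C(b, c) whose (b-1)-length windows, each
// padded with the one bit bringing its weight to c, enumerate every
// b-bit block of weight c exactly once. Bit i of s is (s >> i) & 1.
bool is_weighted_dbs(uint64_t s, int len, int b, int c) {
    std::set<uint64_t> seen;
    for (int i = 0; i < len; ++i) {
        uint64_t sub = 0;
        for (int j = 0; j < b - 1; ++j)           // read b-1 bits cyclically
            sub = (sub << 1) | (s >> ((i + j) % len) & 1);
        int w = __builtin_popcountll(sub);
        if (w != c && w != c - 1) return false;   // cannot be padded to weight c
        uint64_t block = (sub << 1) | (uint64_t)(w == c - 1); // balancing bit
        if (!seen.insert(block).second) return false;         // duplicate block
    }
    return (int)seen.size() == len;
}
```

For $b = 4$, $c = 2$ the cyclic string encoded as 0b001011 passes the check: its six 3-bit windows, each padded with one bit, yield all six weight-2 blocks of length 4.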
Performance of reduced size lookup tables.
We benchmarked both the sparse lookup table and weighted de Bruijn approaches, with gaps 1, 7, 15, 24, 32 and 64 for the sparse lookup tables. It would have been interesting to test larger block sizes as well, but unfortunately, our methods for compile-time definition of the tables failed for these bigger tables (probably due to the compiler running out of memory or a compile-time limitation in constant expression resolution).
Results for benchmarking can be seen in Figure 3. Our approach allows a new space-time trade-off, increasing compression efficiency with larger $b$ at the cost of lookup speed. The RRR15 implementation by Gog and Petri [5] remains the fastest overall, but compresses worse than our de Bruijn sequence-based approach with $b = 24$ (DBS24).
4 Induced total order for length-$b$ bitstrings
The specific encoding of the offset is irrelevant as long as the mapping from the $b$-length bitstring to the $(c, o)$ pair is bijective and $o \in [0, \binom{b}{c})$. As we saw in Section 2.1, lexicographic order allows decoding and answering queries in $O(b)$ time. Selecting another induced order changes the complexity of decoding. The new method we now describe allows answering queries on blocks in $O(\log^2 b)$ time.
For any bitstrings $u$ and $v$ of length $b$ and class $c$, define the ordering such that $u < v$ if $\mathsf{pop}(u \gg b/2) < \mathsf{pop}(v \gg b/2)$, where $x \gg k$ denotes the binary representation of $x$ shifted right by $k$ bits (discarding the $k$ lower bits of $x$) and $\mathsf{pop}(x)$ returns the number of 1 bits in the integer $x$ (i.e., its population count). That is, the order between blocks is primarily decided by the weight of their prefixes, where a bitstring with more bits in the prefix will have a greater value. This can then be recursively applied to the prefixes and suffixes for a full recursive definition for calculation of the $o$ values.
Based on this, when decoding any given pair $(c, o)$, the first half of the block to decode will contain exactly $c_p$ one-bits, where $c_p$ is the smallest value such that $\sum_{j \leq c_p} \binom{b/2}{j}\binom{b/2}{c-j} > o$.
Given $c_p$, let $o' = o - \sum_{j < c_p} \binom{b/2}{j}\binom{b/2}{c-j}$. A query targeting the prefix of the block can recurse with $(c_p, \lfloor o'/\binom{b/2}{c-c_p}\rfloor)$, and a query targeting the suffix recurses with $(c - c_p, o' \bmod \binom{b/2}{c-c_p})$.
The recurrence takes $O(\log b)$ steps, and binary searching for $c_p$ takes $O(\log b)$ time, given precomputed lookup tables for binomial coefficients. This yields a total run time of $O(\log^2 b)$ for determining the result of any rank or access query at the block level.
The offset values with the desired properties can be computed using the following algorithm.
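The scheme can be sketched as an encode/decode pair as follows. This is our own reconstruction of the recurrence described above, not the paper's listing: the names are ours, and a linear scan stands in for the binary search over binomials.

```cpp
#include <cstdint>

// Binomial coefficient via the exact multiplicative formula.
uint64_t binom(uint64_t n, uint64_t k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; ++i) r = r * (n - i + 1) / i;
    return r;
}

// Encode: offset of a block under the induced order (prefix weight
// first, then prefix offset, then suffix offset). b is a power of two.
uint64_t rec_offset(uint64_t block, uint64_t b) {
    if (b == 1) return 0;
    uint64_t h = b / 2;
    uint64_t p = block >> h, s = block & ((1ULL << h) - 1);
    uint64_t c = __builtin_popcountll(block), cp = __builtin_popcountll(p);
    uint64_t off = 0;
    for (uint64_t j = 0; j < cp; ++j)            // blocks with lighter prefixes
        off += binom(h, j) * binom(h, c - j);
    return off + rec_offset(p, h) * binom(h, c - cp) + rec_offset(s, h);
}

// Decode: invert rec_offset given the class c and block length b.
uint64_t rec_decode(uint64_t off, uint64_t c, uint64_t b) {
    if (b == 1) return c;                        // a single bit equals its class
    uint64_t h = b / 2, cp = 0;
    while (off >= binom(h, cp) * binom(h, c - cp)) { // find the prefix class
        off -= binom(h, cp) * binom(h, c - cp);
        ++cp;
    }
    uint64_t m = binom(h, c - cp);               // number of suffix choices
    return (rec_decode(off / m, cp, h) << h) | rec_decode(off % m, c - cp, h);
}
```

The mapping is bijective within each class, so encoding followed by decoding is the identity on blocks; the divisions by `m` are exactly the div instructions discussed in Section 6.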
Example.
We illustrate the workings of this alternative encoding with $b = 8$ and a block of class 3, and further demonstrate querying with an access query on the encoded block.
The encoded values for the prefix and suffix are computed recursively, but the recurrences are not written out here for brevity.
-
There are $\binom{4}{0}\binom{4}{3} + \binom{4}{1}\binom{4}{2} = 28$ vectors of class 3 where the prefix has fewer than two set bits.
-
, meaning there are two instances for and that are less than , so we add the number of and vectors, with two set bits in the prefix where the prefix encoding is less than . I.e .
-
The suffix , so we add this value to the ones we derived before.
-
Thus .
We start with access with . When we recurse and will be updated accordingly.
-
Since , we know that the class of the prefix of the block is 2. As we recurse to the left we update the offset by subtracting and dividing by the number of length-4 vectors with one set bit.
-
Since , we know that the class of the new prefix is 1. Now we recurse to the right and update and .
-
Since , we know that the class of the new prefix is 1. Since we are recursing to the prefix, the only possible update value for is .
-
We are now accessing a vector of length one with . I.e. for access.
-
Now for , access.
4.1 Implementation details
Recursive split.
Our implementation of the method described in the previous section makes extensive use of (small) lookup tables. For our implementation with $b = 64$ we precompute $\binom{32}{k}$ and $\binom{16}{k}$ for all $k$. The binomials are stored as static heap-ordered binary trees to enable fast branchless binary searching for binomial coefficients. In addition, the decoding for block size 8 is done using precomputed lookup tables.
A drawback with our solution is the need to compute division and modulus with binomial coefficients. We were unable to make any compiler precompute the fast integer divisions for the required binomials, and inverse modular arithmetic is impractical because implementations would require computations using 128-bit integers.
Our implementation relies on the symmetry of the recurrence, requiring a block size that is a power of two, while, for better overall data structure size, $b$ should be $2^j - 1$ for some $j$ so that $j$ bits are sufficient to store every possible class value. Unfortunately, asymmetric recurrences would cause additional unavoidable and unpredictable branching while decoding, with a significant negative impact on query performance.
8-bit block iterative approach.
The same principle for the induced order can be applied to other splits besides the recursive even split described above. Splitting the block into 8-bit sub-blocks leads to an $O(b)$ decode time, which matches the on-the-fly approach of Navarro and Providel [16] (and its later implementation by Gog and Petri [5]). The hope is that the reliance on lookup tables for 8-bit sections of the block will yield lower constant factors compared to other implementations.
This approach allows the use of block sizes that are not powers of two, improving the storage of class values. If the offsets for the 8-bit sub-blocks correspond to the lexicographic order, the first partial block can be decoded with the same lookup table as the full blocks, with minimal impact from the change in $b$.
4.2 Results
Results from benchmarking, now including on-the-fly decoding performance, can be seen in Figure 4. The lookup table-based RRR15 remains the fastest overall, with the on-the-fly approaches lagging behind in query performance, but compressing significantly better. Of the on-the-fly approaches, our symmetric recurrence (REC) and block iterative (IT) methods offer similar compression but typically faster lookup speed than the on-the-fly implementation in the SDSL. On all data sets tested, the new methods offer relevant space-time trade-offs.
The space consumed by various components of the data structures is shown in Figure 5. Data structure size is mainly determined by the block and superblock sizes used, and lookup table sizes are irrelevant. The relative share of the offset array $O$ grows as block size increases, offset by the shrinking of the class array $C$ and the superblock overhead.
5 Hybrid implementation
Regions of bitvectors that consist of alternating runs of ones and zeros have very low higher-order entropy but potentially high zeroth-order entropy. This leads to poor performance for the RRR-based techniques discussed so far, which achieve $nH_0(B)$ bits, ignoring the $o(n)$ term.
The hybrid bitvector by Kärkkäinen, Kempa and Puglisi [8] is a widely used, practically powerful compressed bitvector scheme that, in particular, is an essential ingredient of state-of-the-art general-purpose FM indexes [4]. The approach is to divide the bitvector into blocks and then select the best encoding for each block from a number of possible encodings. The implementation of hybrid bitvectors in the sdsl_lite library uses 256-bit blocks, and then stores each block either as: 1) minority-bit encoded (i.e., to encode the block, we store the positions of the minority bits using 8 bits each); 2) run-length encoded; or 3) in plain form. This scheme works extremely well for blocks with runs or blocks that are extremely dense or sparse, but can perform badly when this is not the case and many blocks get stored in plain form (resulting in no compression).
This hybrid scheme and the RRR-based scheme complement each other, and adding an $H_0$-compressed option for block encoding to the hybrid bitvector should improve compression performance.
With this in mind, we added our 64-bit block iterative encoding as an option in the 256-bit hybrid bitvector implementation included in the SDSL. As a comparison we had another version that used Gog and Petri’s RRR implementation as an encoding option instead of our iterative encoding. This was done in a way that did not cause any additional space overhead for the hybrid structure, but should induce some significant overhead on each query, compared to the implementation without support for $H_0$ encoding.
We ran the same tests on these structures, the results of which are shown in Figure 6. We observe that by including RRR as a block-encoding option, compression levels of hybrid bitvectors can be significantly improved, though at some penalty to query performance.
6 Concluding Remarks
We have described two main improvements to the RRR bitvector data structure, the common theme of which is the non-lexicographical assignment of codes to offset values (in contrast to the lexicographical assignment, which has been used in all previous work on the problem). The first of these approaches represents all possible blocks of size $b$ as a series of weighted de Bruijn sequences. The second provides a new on-the-fly block-decoding mechanism for answering queries in $O(\log^2 b)$ time based on a new induced total order for offset encoding. These methods offer improved practical trade-offs for $H_0$-compressed bitvectors. We have also described a sparsification technique for reducing the size of the lookup table in the case that lexicographic assignment is used.
There are numerous avenues to further optimize the RRR scheme. Firstly, profiling reveals that our non-lookup-table-based implementations execute between three and eight div instructions per query. Eliminating these divisions efficiently would improve the query performance of our data structures and possibly be of independent interest. Efficiently optimizing divisions for 64-bit (and larger) integers is an open problem, but perhaps existing schemes for shorter integers [9] could be applied to larger integers as well as hardware keeps improving.
Larger block sizes have the potential to improve the compression ratio further, but implementations would require computations using large integers (128-, 256- or 512-bit), with significantly slower arithmetic operations than on 64-bit integers.
Finally, the rrr_vector_15 class in the SDSL uses bit-parallelism to great effect. Since its publication [5] some 10 years ago, architecture support for bit-parallelism has improved significantly, and we expect it is possible to improve the existing optimizations of rrr_vector_15, or apply novel optimizations to implementations with larger $b$, using new SIMD instructions.
We leave these open problems to be the focus of future research.
References
- [1] Jarno N. Alanko, Simon J. Puglisi, and Jaakko Vuohtoniemi. Small searchable k-spectra via subset rank queries on the spectral Burrows-Wheeler transform. In SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023, pages 225–236. SIAM, 2023. doi:10.1137/1.9781611977714.20.
- [2] Francisco Claude and Gonzalo Navarro. Practical rank/select queries over arbitrary sequences. In String Processing and Information Retrieval, pages 176–187, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. doi:10.1007/978-3-540-89097-3_18.
- [3] Saska Dönges. saskeli/h0_bv. Software, swhId: swh:1:dir:383ab8d9247150886cfacd830469e5e5bdd72551 (visited on 2025-07-02). URL: https://github.com/saskeli/h0_bv, doi:10.4230/artifacts.23794.
- [4] Simon Gog, Juha Kärkkäinen, Dominik Kempa, Matthias Petri, and Simon J. Puglisi. Fixed block compression boosting in fm-indexes: Theory and practice. Algorithmica, 81(4):1370–1391, 2019. doi:10.1007/S00453-018-0475-9.
- [5] Simon Gog and Matthias Petri. Optimized succinct data structures for massive data. Software: Practice and Experience, 44(11):1287–1314, 2014. doi:10.1002/spe.2198.
- [6] Guy Jacobson. Space-efficient static trees and graphs. In Proc. 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 549–554. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63533.
- [7] Ian B Jeffery, Anubhav Das, Eileen O’Herlihy, Simone Coughlan, Katryna Cisek, Michael Moore, Fintan Bradley, Tom Carty, Meenakshi Pradhan, Chinmay Dwibedi, et al. Differences in fecal microbiomes and metabolomes of people with vs without irritable bowel syndrome and bile acid malabsorption. Gastroenterology, 158(4):1016–1028, 2020.
- [8] Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Hybrid compression of bitvectors for the FM-index. In Proc. Data Compression Conference, pages 302–311. IEEE, 2014. doi:10.1109/DCC.2014.87.
- [9] Daniel Lemire, Owen Kaser, and Nathan Kurz. Faster remainder by direct computation: Applications to compilers and software libraries. Software: Practice and Experience, 49(6):953–970, 2019. doi:10.1002/spe.2689.
- [10] Danyang Ma, Simon J. Puglisi, Rajeev Raman, and Bella Zhukova. On Elias-Fano for rank queries in FM-indexes. In Proc. 31st Data Compression Conference, pages 223–232. IEEE, 2021. doi:10.1109/DCC50243.2021.00030.
- [11] Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithm Design: Bioinformatics in the Era of High-Throughput Sequencing (2nd edition). Cambridge University Press, 2023. URL: http://www.genome-scale.info/.
- [12] Veli Mäkinen and Gonzalo Navarro. Implicit compression boosting with applications to self-indexing. In Proc. String Processing and Information Retrieval, 14th International Symposium, LNCS 4726, pages 229–241. Springer, 2007. doi:10.1007/978-3-540-75530-2_21.
- [13] Giovanni Manzini. An analysis of the Burrows-Wheeler transform. J. ACM, 48(3):407–430, 2001. doi:10.1145/382780.382782.
- [14] J. Ian Munro. Tables. In Proc. 16th Foundations of Software Technology and Theoretical Computer Science, LNCS 1180, pages 37–42. Springer, 1996. doi:10.1007/3-540-62034-6_35.
- [15] Gonzalo Navarro. Compact Data Structures – A practical approach. Cambridge University Press, 2016. ISBN 978-1-107-15238-0. 570 pages.
- [16] Gonzalo Navarro and Eliana Providel. Fast, small, simple rank/select on bitmaps. In Proc. SEA, pages 295–306. Springer, 2012. doi:10.1007/978-3-642-30850-5_26.
- [17] Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proc. 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR, pages 273–282. ACM, 2014. doi:10.1145/2600428.2609615.
- [18] Rasmus Pagh. Low redundancy in static dictionaries with O(1) worst case lookup time. In Proc. Automata, Languages and Programming, 26th International Colloquium, ICALP’99, LNCS 1644, pages 595–604. Springer, 1999. doi:10.1007/3-540-48523-6_56.
- [19] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):Article 43, November 2007. doi:10.1145/1290672.1290680.
- [20] Frank Ruskey, Joe Sawada, and Aaron Williams. De Bruijn sequences for fixed-weight binary strings. SIAM J. Discret. Math., 26(2):605–617, 2012. doi:10.1137/100808782.
- [21] Sebastiano Vigna. Quasi-succinct indices. In Proc. Sixth ACM International Conference on Web Search and Data Mining (WSDM), pages 83–92. ACM, 2013. doi:10.1145/2433396.2433409.
Appendix A Tabulated experiment results
[Tabulated results: the numeric cell values did not survive extraction; only the table structure is recoverable. For each structure and configuration, the tables report build time (s), space usage (bits/bit), and mean access, rank, and select query times (ns) on the six test inputs WT-WEB-1GB, WT-DNA-1GB, bv-dump.bin, RND-1.bin, RND-5.bin, and RND-10.bin. The configurations measured are: GAP with block sizes 15 and 24, each with gap parameters 1, 7, 15, 24, 32, and 64; IT with block sizes 63 and 64; REC with block size 64; DBS with block sizes 15 and 24; RRR with block size 15; SDSL with block sizes 24, 31, and 63; and HYB-IT, HYB-RRR, and HYB with block size 256. Select times for the three HYB variants are reported as "nan" (no select measurement) on all inputs.]