
Succinct Rank Dictionaries Revisited

Saska Dönges, Department of Computer Science, University of Helsinki, Finland
Simon J. Puglisi, Department of Computer Science, University of Helsinki, Finland
Abstract

We study data structures for representing sets of m elements drawn from the universe [0..n−1] that support access and rank queries. A classical approach to this problem, foundational to the fields of succinct and compact data structures, is to represent the set as a bitvector X of n bits, where X[i] = 1 iff i is a member of the set. Our particular focus in this paper is on structures taking $\log_2\binom{n}{m} + o(n)$ bits, which stem from the so-called RRR bitvector scheme (Raman et al., ACM Trans. Alg., 2007). In RRR bitvectors, X is conceptually divided into n/b blocks of b bits each. A block containing c 1-bits is then encoded using $\lceil\log_2(b+1)\rceil + \lceil\log_2\binom{b}{c}\rceil$ bits, where the $\lceil\log_2(b+1)\rceil$ bits are used to encode c, and the $\lceil\log_2\binom{b}{c}\rceil$ bits are used to say which of the $\binom{b}{c}$ possible combinations the block represents. In all existing RRR implementations the code assigned to a block is its lexicographical rank amongst the $\binom{b}{c}$ combinations of its class. In this paper we explore alternative non-lexicographical assignments of codes to blocks. We show these approaches can lead to faster query times and offer relevant space-time trade-offs in practice compared to state-of-the-art implementations (Gog and Petri, Software, Prac. & Exp., 2014) from the Succinct Data Structures Library.

Keywords and phrases:
data structures, data compression, succinct data structures, compressed data structures, weighted de Bruijn sequence, text indexing, string algorithms
Copyright and License:
© Saska Dönges and Simon J. Puglisi; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithms
Supplementary Material:
Software  (Source code): https://github.com/saskeli/h0_bv [3]
  archived at Software Heritage: swh:1:dir:383ab8d9247150886cfacd830469e5e5bdd72551
Acknowledgements:
We thank Zsuzsanna Lipták, Taku Onodera, and Rajeev Raman for interesting discussions, and the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources.
Editors:
Petra Mutzel and Nicola Prezza

1 Introduction

Bit vectors supporting rank and access queries are essential to the field of compact data structures [15, 11] and, as such, have seen intense research in the past two decades.

We are given a sequence X[0..n−1] of n bits, m of which are set to 1, and should preprocess X so that the following queries can be answered quickly:

access(i) = X[i]
rank(i) = the number of 1s among the first i bits of X.

Another related query is select(i), which returns the position of the ith 1-bit, but it is less used than the other two queries, particularly in the context of text indexing using the backward search algorithm.

Classical methods use n + o(n) bits and leave the input bit string intact, building small structures on top of it to support rank queries in constant time [6, 14]. These methods are extremely fast in practice [5, 8], but incur large space overheads on sparse or otherwise compressible inputs. The Elias-Fano representation, used within web search engines [21, 17] and full-text indexes [10], takes $2m + m\lceil\log(n/m)\rceil$ bits and supports access in O(1) time and rank in O(log(n/m)) time if o(n) extra bits are used (logarithms throughout are base 2). There are also a number of heuristic and hybrid methods that are tuned for certain scenarios, such as full-text indexing [8].

Storing a bitvector of length n with m 1-bits can be seen as the problem of storing m elements of a finite universe of size n so that membership and rank queries can be answered efficiently in practice. The information-theoretic lower bound for storing m elements from a universe of size n is $B = \lceil\log\binom{n}{m}\rceil$ bits. Raman, Raman, and Rao [19], building on earlier work by Pagh [18], describe a data structure that matches this bound up to an o(n) term and supports membership and rank queries in O(1) time. This data structure is commonly referred to as RRR and has since been heavily engineered by a series of authors [2, 16, 5], culminating in its embodiment in the Succinct Data Structures Library (SDSL, https://github.com/simongog/sdsl).

RRR is of particular interest in full-text indexing because, when applied to the bitvectors of a wavelet tree constructed over the Burrows-Wheeler transform of a string T of n symbols, the resulting structure achieves size $nH_k + o(n)$ bits, where $H_k$ is the kth-order empirical entropy of T [13, 15], a technique called implicit compression boosting [12, 4].

All RRR implementations of which we are aware divide the bitvector X into n/b blocks of b bits each and store an encoded block containing c 1-bits using $\lceil\log(b+1)\rceil + \lceil\log\binom{b}{c}\rceil$ bits. The $\lceil\log(b+1)\rceil$ bits are used to encode c, the number of 1-bits the block contains, which is referred to as the block's class. The other $\lceil\log\binom{b}{c}\rceil$ bits are used to encode the block's offset, its lexicographical rank amongst all possible $\binom{b}{c}$ combinations of b-length blocks containing c 1-bits (we provide a detailed description in Section 2).
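As a quick worked example (ours, for illustration): with $b = 15$ the class fits in $\lceil\log 16\rceil = 4$ bits, and a block of class $c = 4$ is one of $\binom{15}{4} = 1365$ combinations, so its offset takes $\lceil\log 1365\rceil = 11$ bits, 15 bits in total; a uniform block of class 0 or 15 needs only the 4 class bits.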

Contribution.

In this paper, we revisit the implementation of entropy-compressed bitvectors stemming from the basic RRR scheme. Our contributions are summarized as follows:

1. In order to later decode a block when answering an access or rank query, some RRR implementations [2, 5] use a lookup table of $2^b$ entries that is indexed with the block's class and offset to obtain the original block in O(1) time. We describe two methods for reducing the size of the lookup table: a simple sparsification technique that provides a space-time trade-off, and an approach based on weighted binary de Bruijn sequences from combinatorics on words [20] that reduces the lookup table size by a factor of b. Both these approaches allow slightly larger values of b to be used, reducing overall space usage.

2. An alternative to the lookup-table approach, taken by some RRR implementations [16, 5], is to decode the block directly from its class-offset encoding using properties of the lexicographical offset assignment. These methods decode the original block bit by bit in O(b) time. We develop an alternative non-lexicographical scheme for assigning offset values that proves slightly faster to decode in practice.

Both our new approaches above represent a departure from the lexicographical assignment of offsets that is common to all previous work on the problem.

Roadmap.

The remainder of this paper is structured as follows. In the next section, we give a detailed overview of the basic RRR approach common to all previous implementations of it, as well as the vagaries of some specific implementations. We also describe the experimental setup used throughout. Then, in Section 3 we describe methods for reducing the size of the block-decoding table relevant to some RRR implementations. Section 4 describes our new method for inverting combinations, with further engineering measures described in Section 4.1. In Section 5 we apply our new RRR implementations as an encoding option for hybrid bitvector schemes. Section 6 concludes and offers directions for further research.

2 Preliminaries

2.1 RRR bitvector structure

In this section we describe the elements of the RRR data structure that are common to existing implementations. Implementations of RRR [2, 16, 5] generally sacrifice the constant-time queries of the original paper [19] in favor of practicality. The first public implementation of RRR was due to Claude and Navarro [2], and we now describe their approach – all other implementations of which we are aware build on it.

We are given a bitvector X of n bits, m of which are set to 1. We divide X up into n/b blocks of b bits each. We assume here, for ease of exposition, that b divides n – the situation where this is not the case and the last block has fewer than b bits is trivial to deal with.

Denote the ith block as $x_i$. Let the class, $c_i$, of block $x_i$ be the number of 1-bits $x_i$ contains. There are $\binom{b}{c}$ different blocks belonging to class c, and each block corresponds to one of the $\binom{b}{c}$ combinations of its class. We define the offset, $f_i$, of block $x_i$ to be an integer in the range $[0..\binom{b}{c_i})$ that uniquely determines which element of the class (i.e., which combination) the block corresponds to. All previous work on RRR assigns a block's offset as its lexicographical rank among all blocks of the same class. We call this the block's lexicographical offset.

Classes are stored in an $(n/b)\lceil\log(b+1)\rceil$-bit array, C, that trivially supports random access. The offsets are stored in a $\sum_i\lceil\log\binom{b}{c_i}\rceil$-bit sequence, F. Note that random access to elements of F is not trivial, because they are of variable length. To enable random access to F, pointers to the start of every kth element of the offset sequence are stored in a pointer table of $\lceil n/(bk)\rceil$ words (all of the implementations considered set k to 32 blocks). To support fast rank queries, an additional $\lceil n/(bk)\rceil$-word table storing cumulative ranks up to every kth block is added.

Now, to support rank(i) we proceed as follows. Position i is contained in block $j = \lfloor i/b\rfloor$. Let $j_0 = k\lfloor j/k\rfloor$ be the nearest preceding sampled block. We scan $C[j_0..j)$ and sum the class values $c_{j_0},\ldots,c_{j-1}$, adding this to the precomputed cumulative rank at block $j_0$. This gives the rank up to the start of block $x_j$. To obtain $f_j$, the offset of the block, while scanning the class values $c_{j_0},\ldots,c_{j-1}$ we also sum the offset widths $\lceil\log\binom{b}{c_{j_0}}\rceil, \lceil\log\binom{b}{c_{j_0+1}}\rceil, \ldots, \lceil\log\binom{b}{c_{j-1}}\rceil$, which, added to the sampled pointer, gives the starting position of $f_j$ in F. As described, a scan of up to k elements of C is required, followed by a single access to F. The class and offset for a block are thus obtained. What remains is to unrank the block's offset – i.e., obtain its original bit representation – which allows us to complete the query answer by counting the bits in the relevant prefix in the case of a rank query. It is at this point that implementations begin to differ.
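The query path up to the unranking step can be summarized in code. The following is a minimal sketch of our own, with b = 15 and k = 32 hard-coded; read_bits, the decode function and the bit order of decoded blocks are illustrative assumptions, not the SDSL's actual interfaces:

#include <cstdint>
#include <vector>

// Extract w < 64 bits starting at bit position p of the packed array F.
static uint64_t read_bits(const std::vector<uint64_t>& F, uint64_t p, uint32_t w) {
    uint64_t word = p >> 6, off = p & 63;
    uint64_t v = F[word] >> off;
    if (off + w > 64) v |= F[word + 1] << (64 - off);
    return v & ((uint64_t(1) << w) - 1);
}

uint64_t decode(uint32_t c, uint64_t f); // any unranking method from this paper

struct RRRSketch {
    static constexpr uint64_t b = 15, k = 32;
    std::vector<uint8_t> classes;      // C: one class value per block
    std::vector<uint64_t> F;           // bit-packed variable-width offsets
    std::vector<uint64_t> sample_rank; // cumulative 1s at every k-th block
    std::vector<uint64_t> sample_ptr;  // bit position in F of every k-th offset
    uint8_t offset_width[b + 1];       // ceil(log2 C(b, c)) for each class c

    uint64_t rank(uint64_t i) const {
        uint64_t j = i / b, s = j / k;
        uint64_t ones = sample_rank[s], ptr = sample_ptr[s];
        for (uint64_t t = s * k; t < j; ++t) { // scan at most k - 1 classes
            ones += classes[t];                // class = popcount of block t
            ptr += offset_width[classes[t]];   // skip block t's offset in F
        }
        uint32_t c = classes[j];
        uint64_t block = c ? decode(c, read_bits(F, ptr, offset_width[c])) : 0;
        // count 1s in the first i % b bits (assuming bit 0 = first bit)
        return ones + __builtin_popcountll(block & ((uint64_t(1) << (i % b)) - 1));
    }
};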

Lookup table decoding.

The implementation of Claude and Navarro [2] uses a two-dimensional lookup table of $2^b$ entries that contains all the possible blocks of size b (an approach prescribed in Raman et al.'s original paper [19]). The ith row of the table contains all the blocks of class i (i.e., all the blocks containing i 1-bits). Having worked out the class c and the offset f of the block containing the query index as described above, the query is completed by accessing cell [c, f] of the table. Claude and Navarro store the blocks of a row in lexicographical order, but the order is not important to the query algorithm just described.

The lookup table has total size $b2^b$ bits and indeed it is this size that adds some tension to the data structure: ideally, one would like to increase b in order to decrease the $\lceil\log(b+1)\rceil$-bit overhead incurred per block for storing the class values. However, increasing b causes a rapid blowup in the table size. In order to keep the lookup table manageable, and to fit c into four bits, Claude and Navarro fix the block size to b = 15, resulting in a lookup table $15\cdot2^{15}$ bits in size, and an overhead from the class array of roughly 0.27n bits. Gog and Petri engineer this approach further in the SDSL [5] with a class called rrr_vector_15.

On-the-fly decoding.

Arguing that 25% is a non-trivial overhead in the context of succinct data structures, Navarro and Providel [16] show that the lookup table can be eliminated by decoding the bits of a block from its lexicographical offset f in O(b) time, by checking the range of values that offset f belongs to and extracting bits consecutively as follows.

For a block length b, consider all the blocks of class c arranged in lexicographical order. The first $\binom{b-1}{c}$ of them begin with a 0 and the remaining $\binom{b-1}{c-1}$ begin with a 1. Thus, if $f \geq \binom{b-1}{c}$, then the first bit of the block is a 1; in this case we decrement b and c, and decrease f by $\binom{b-1}{c}$. Otherwise, the first bit of the block is a 0, and we only decrement b. We then continue extracting the next bit from a block of length b−1. The process can be stopped when f becomes zero, at which point the remaining bits are a (possibly empty) run of zeroes followed by a (possibly empty) run of ones. All the binomial coefficients are precomputed in a table in order to make decoding faster, but this table is much smaller than the original lookup table. Navarro and Providel show that queries, while slower than with the lookup-table approach, are still acceptably fast in practice. The approach was further engineered by Gog and Petri [5] and is included in the SDSL.
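A minimal sketch of this bit-by-bit decoding (our illustration; the binomial table C and the MSB-first bit order are assumptions):

#include <cstdint>

// Pascal's triangle up to n = 64; entries with k > n stay zero.
static uint64_t C[65][65];
static void init_binomials() {
    for (int n = 0; n <= 64; ++n) {
        C[n][0] = 1;
        for (int k = 1; k <= n; ++k)
            C[n][k] = C[n - 1][k - 1] + C[n - 1][k];
    }
}

// Decode the lexicographic offset f of a length-b, class-c block back into
// its bits, most significant bit first.
uint64_t decode_lex(uint32_t b, uint32_t c, uint64_t f) {
    uint64_t block = 0;
    while (f > 0) {
        if (f >= C[b - 1][c]) {              // first remaining bit is a 1
            block |= uint64_t(1) << (b - 1);
            f -= C[b - 1][c];
            --c;
        }                                     // otherwise it is a 0
        --b;
    }
    return block | ((uint64_t(1) << c) - 1);  // f = 0: c trailing 1-bits
}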

2.2 Experimental Setup

We measure and report on experiments with our new methods throughout the paper and so define our experimental machine and test data now. Additionally, our source code is available at https://github.com/saskeli/h0_bv.

Data sets.

We used the same data sets for all of our experiments. Three of the bitvectors are derived from real-world data, and chosen to be diverse and relevant to practical compressed indexing applications. WT-WEB-1GB (available at http://people.eng.unimelb.edu.au/sgog/data/WT-WEB-1GB.gz) and WT-DNA-1GB (available at http://people.eng.unimelb.edu.au/sgog/data/WT-DNA-1GB.gz) are data sets by Gog and Petri [5].

bv-dump.bin (available at https://doi.org/10.5281/zenodo.11031751) is the concatenation of the bit vectors from the “plain matrix” representation of the spectral Burrows-Wheeler transform (SBWT) proposed by Alanko, Puglisi and Vuohtoniemi [1]. The SBWT in question is built, with k-mer size 31, from a set of 17,336,887 Illumina HiSeq 2500 reads of length 502 sampled from the human gut (SRA identifier ERR5035349) in a study on irritable bowel syndrome and bile acid malabsorption [7].

In addition, we generated random bitvectors with varying probability p of a 1-bit, with bits drawn i.i.d. All of these bit vectors are one gigabyte in size. RND-1.bin has $p = 2^{-1}$, RND-5.bin has $p = 2^{-5}$ and RND-10.bin has $p = 2^{-10}$.

Some summary statistics on the datasets are presented in Table 1.

Table 1: Details of the datasets used in performance benchmarks.

  Dataset       n              H0          Mean H0 entropy    Fraction of uniform
                                           of 64-bit blocks   64-bit blocks
  WT-WEB-1GB    8 664 343 857  0.997653    0.0997298          0.843696
  WT-DNA-1GB    8 236 129 151  0.980138    0.614706           0.276278
  bv-dump.bin   8 261 672 152  0.811278    0.755564           0.00723309
  RND-1.bin     8 589 934 592  1           0.988641           0
  RND-5.bin     8 589 934 592  0.200631    0.187785           0.131055
  RND-10.bin    8 589 934 592  0.0111723   0.00719596         0.939396

Experiments.

The experimental run is the same for each of the following three sections, with only the data structures under test changing. In particular, data structures are built for each of the data sets, after which $10^7$ random access and $10^7$ random rank queries are run and timed on the built index. The space usage of the built index along with mean query times are reported. We only report times for access queries because the relative rank performance of the implementations is the same, with rank consistently just slightly slower than access. Full results for rank and select queries as timed in our experiments, along with construction times, are available in Appendix A.

Machine and Compilation.

Tests were run on a machine with an AMD EPYC 7452 processor and 500 GB of DDR4 RAM. The operating system was AlmaLinux 8.10 (with Linux kernel 4.18.0-372.9.1.el8.x86_64). Code was compiled with GCC 13.3.0 and the performance-relevant flags -std=c++2a, -march=native, -Ofast and -DNDEBUG. Experiments were run multiple times, on multiple machines, to ensure the stability of the results.

3 Reducing the size of the lookup table

As discussed in Section 2.1, the principal hurdle to reducing the lower-order terms of the original RRR representation (and its implementation by Claude and Navarro) is the lookup table that stores all possible bitstrings of length b. This lookup table has size $b2^b$ bits, making it very sensitive to increases in b. Navarro and Providel's approach is to remove this lookup table entirely and reconstruct its entries as needed through computation, at the cost of query time. In this section we keep the lookup table, but examine ways to reduce its size, with the hope of increasing b to reduce overall size while retaining fast query times.

The two approaches we explore are sparsification of the original lookup table, and compaction of the lookup table by replacing it with a series of weighted de Bruijn sequences [20].

Sparse lookup table.

Instead of storing all $\binom{b}{c}$ possible length-b bitstrings of weight c in the table, we can store every gth entry ($\lceil\binom{b}{c}/g\rceil$ blocks in total per class) and reconstruct any desired block in at most g constant-time steps. This changes the total space required for the lookup table to $b2^b/g + b$ bits. Further space savings can be had by storing samples only for blocks with $c \leq b/2$, effectively halving the space usage. Our implementations thus use $b2^{b-1}/g + b$ bits for storing the lookup table, and decoding a block given its c and f values takes O(g) time using an algorithm that, given a block, computes its lexicographic successor in O(1) time.

We use the “Compute the lexicographically next bit permutation” algorithm from the “Bit Twiddling Hacks” website (https://graphics.stanford.edu/~seander/bithacks.html); see Listing 1. This enumeration algorithm compiles to 9 instructions per iteration when optimized by the GCC compiler.

A natural choice of gap size is g = b, leading to the lookup table taking $2^{b-1} + b$ bits.
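For b = 24 and g = 24, for instance, this is $2^{23} + 24$ bits, about one mebibyte, compared to the $24\cdot2^{24}$ bits (48 MiB) of the full table (our arithmetic, for illustration).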

Listing 1: Code for fast enumeration of the $\binom{b}{c}$ blocks of length b with c set bits.
constexpr uint32_t next(uint32_t v) {
    // Set v's trailing 0s, i.e., all bits below the lowest set bit of v.
    uint32_t t = v | (v - 1);
    // t + 1 moves the lowest block of 1s up by one position; the right-hand
    // term restores the rest of that block at the least significant end.
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctzl(v) + 1));
}
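Given the samples, decoding is then a short loop. A minimal sketch of our own follows; the samples layout (every g-th block of each class, in lexicographic order) is an assumed name, not our actual memory layout:

// Recover block (c, f) by stepping forward from the nearest sample.
uint32_t decode_sparse(uint32_t c, uint32_t f, uint32_t g,
                       const uint32_t* const* samples) {
    uint32_t v = samples[c][f / g];   // nearest sample at or before offset f
    for (uint32_t i = 0; i < f % g; ++i)
        v = next(v);                  // O(1) lexicographic successor
    return v;
}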

Weighted de Bruijn sequences.

In [20], Ruskey, Sawada and Williams study weighted de Bruijn sequences: circular strings of length $\binom{b}{c}$ from which all possible substrings of length b having c 1-bits can be efficiently extracted. In particular, we create circular bitstrings (the weighted de Bruijn sequences) of length $\binom{b}{c}$, such that every bitstring of length b and weight c can be created from a (b−1)-length substring of the sequence with the necessary bit appended to reach the correct weight.

These remarkable constructions seem tailor made to reduce the size of the RRR lookup table. The class c remains the same as before, but the offset f is the relative position of the encoded block within the weighted de Bruijn sequence.

Storing these bit strings in practice requires $\binom{b}{c} + b - 1$ bits, with a copy of the first b−1 bits appended to the end, to ensure that the “last” substring can be read without overhead due to the circularity of the string. Conveniently, if we concatenate these weighted de Bruijn sequences in order of increasing weight, adjacent sequences overlap by b−2 characters, provided each weighted de Bruijn sequence is the lexicographically smallest rotation; see Figure 1. Additionally, we only require the sequences of weights $0 \leq c \leq \lfloor b/2\rfloor$, since higher-weight sequences can be stored as inverted references to lower-weight sequences. This leads to the entire lookup table fitting in

$b - 1 + \sum_{c=1}^{\lfloor b/2\rfloor}\binom{b}{c} + 1 < 2^b$ bits.
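As a quick check of our own: for b = 24 this is $23 + \sum_{c=1}^{12}\binom{24}{c} + 1 = 9\,740\,709$ bits, about 1.16 MiB, versus $24\cdot2^{24}$ bits (48 MiB) for the full table.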

The required binary sequence, along with the start position of each weight's sequence, is computed using a Python script and output as static arrays for use in C++ code. These arrays are used for decoding blocks. Reading from the binary sequence consists of reading the desired bytes in big-endian order, followed by masking and shifting to generate a candidate block. After this the last bit is flipped if required, before inverting the entire candidate if $c > \lfloor b/2\rfloor$. The decoding process is visualized in Figure 2.

Figure 1: The concatenation of weighted de Bruijn sequences for b = 7, creating the full sequence that replaces the traditional lookup table. The first b symbols of the lexicographically smallest rotation of each sequence are highlighted, to show the overlap between subsequent sequences, with the starts being repeated.
Figure 2: The process of reading elements from the weighted de Bruijn sequence with b = 7 and c ∈ {3, 4}: b candidate bits are read from the offset position, the candidate bits are flipped if c = 4, after which the correct block is the exclusive OR of the candidate and $[\mathrm{popcount}(\text{candidate}) \neq c]$.
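A minimal sketch of this lookup (our illustration; seq, start, and the big-endian bit layout are assumed names and conventions for the generated arrays, and b is taken to be less than 64):

#include <cstdint>

// Decode block (c, f) of size b from the concatenated weighted de Bruijn
// sequences. seq is the padded bit sequence (big-endian within bytes) and
// start[c] the bit position where the weight-c sequence begins.
uint64_t decode_dbs(uint32_t b, uint32_t c, uint64_t f,
                    const uint8_t* seq, const uint64_t* start) {
    bool invert = c > b / 2;            // high weights are stored inverted
    uint32_t cc = invert ? b - c : c;
    uint64_t pos = start[cc] + f;
    uint64_t cand = 0;
    for (uint32_t i = 0; i < b; ++i) {  // read b candidate bits, MSB first
        uint64_t p = pos + i;
        cand = (cand << 1) | ((seq[p >> 3] >> (7 - (p & 7))) & 1);
    }
    if (invert)
        cand = ~cand & ((uint64_t(1) << b) - 1);
    // flip the final bit iff the candidate's weight is off by one
    return cand ^ (uint64_t)(__builtin_popcountll(cand) != c);
}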

Performance of reduced size lookup tables.

We benchmarked both the sparse lookup table and the weighted de Bruijn approach with b ∈ {15, 24}, using gaps g ∈ {1, 7, 15, 24, 32, 64} for the sparse lookup tables. It would have been interesting to test b = 31 as well but, unfortunately, our methods for compile-time definition of the tables failed for these bigger tables (probably due to the compiler running out of memory, or a compile-time limit on constant expression resolution).

Results of the benchmarking can be seen in Figure 3. Our approach enables a new space-time trade-off: increasing b improves compression while slowing down lookups. The RRR15 implementation by Gog and Petri [5] remains the fastest overall, but compresses worse than our de Bruijn sequence-based approach with b = 24 (DBS24).

Figure 3: Performance of the different lookup table-based approaches, with the sizes of the lookup tables counted as part of the compressed data structure size. The optimized version based on a full lookup table, due to Claude and Navarro [2] and Gog and Petri [5] (RRR15 in the figure), remains the fastest for queries, and beats both our implementations with b = 15 in space efficiency as well; this is likely due to specific optimizations for b = 15 that are missing from our implementations and would not apply to b = 24. For b = 24, our approaches improve compression, offering a space-time trade-off. The weighted de Bruijn sequence-based approach (DBS) outperforms the sparsified lookup tables (GAP) in all cases. Mean blockwise H0 entropies are added for reference, as a theoretical limit for block-based H0 compression.

4 Induced total order for b-length bitstrings

The specific encoding of the offset is irrelevant as long as the mapping from the b-length bitstring x to the (c, f) pair is bijective and $0 \leq f < \binom{b}{c}$. As we saw in Section 2.1, lexicographic order allows decoding and answering queries in O(b) time. Selecting another induced order changes the complexity of decoding. The new method we now describe allows answering queries on blocks in $O(\log c\,\log b)$ time.

For any bitstrings $x_\alpha$ and $x_\beta$ of length b and class c, define the ordering such that $\mathrm{pop}(x_\alpha \gg b/2) < \mathrm{pop}(x_\beta \gg b/2) \Rightarrow f_\alpha < f_\beta$, where $u \gg v$ denotes the binary representation of u shifted right by v bits (discarding the lower v bits of u) and pop returns the number of 1-bits in an integer (i.e., its population count). That is, the order between blocks is primarily decided by the weight of their prefixes, where a bitstring with more 1-bits in the prefix has a greater f value. This is then applied recursively to the prefixes and suffixes, giving a full recursive definition of the f values.

Based on this, when decoding any given (c, f), the first half of the block x to decode will contain exactly

$c_p = \max\left\{c' \,:\, \sum_{i=0}^{c'-1}\binom{b/2}{i}\binom{b/2}{c-i} \leq f\right\}$

one-bits.

Given $c_p$, a query targeting the prefix of the block can recurse with

$\left(c_p,\; \left\lfloor\Big(f - \textstyle\sum_{i=0}^{c_p-1}\binom{b/2}{i}\binom{b/2}{c-i}\Big) \big/ \binom{b/2}{c-c_p}\right\rfloor\right)$

and a query targeting the suffix recurses with

$\left(c - c_p,\; \Big(f - \textstyle\sum_{i=0}^{c_p-1}\binom{b/2}{i}\binom{b/2}{c-i}\Big) \bmod \binom{b/2}{c-c_p}\right).$

The recursion takes O(log b) steps, and binary searching for $c_p$ takes O(log c) time, given precomputed lookup tables of binomial coefficients. This yields a total running time of $O(\log c\,\log b)$ for determining the result of any rank or access query at the block level.

The offset values with the desired properties can be computed with a recursive algorithm, sketched below.
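The following is a minimal sketch of the encoder (ours, not the paper's exact listing), assuming b is a power of two, at most 64, and reusing the binomial table C defined in the sketch in Section 2.1:

// Encode the b lowest bits of x, of class c = popcount, as an offset in
// [0, C(b, c)) under the induced order.
uint64_t encode(uint64_t x, uint32_t b, uint32_t c) {
    if (b == 1 || c == 0 || c == b) return 0;   // a single combination left
    uint32_t h = b / 2;
    uint64_t hi = x >> h;                       // prefix (high half)
    uint64_t lo = x & ((uint64_t(1) << h) - 1); // suffix (low half)
    uint32_t cp = __builtin_popcountll(hi);     // class of the prefix
    uint64_t f = 0;
    for (uint32_t i = 0; i < cp; ++i)           // blocks with lighter prefixes
        f += C[h][i] * C[h][c - i];             // C[h][k] = 0 when k > h
    return f + encode(hi, h, cp) * C[h][c - cp] + encode(lo, h, c - cp);
}

On the running example below, encode(0b01101000, 8, 3) yields 39, matching the derivation.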

Example.

We illustrate the workings of this alternative encoding with x=01101000b, and further demonstrate querying with an access(x,2) query on the encoded block.

The encoded values for the prefix xp=0110b and suffix xs=1000b are computed recursively, but the recurrences are not written out here for brevity.

  • There are $\sum_{i=0}^{1}\binom{4}{i}\binom{4}{3-i} = 28$ vectors of class 3 where the prefix has fewer than two set bits: $f \leftarrow 28$.

  • $\mathrm{enc}(x_p) = (2, 2)$, meaning there are two blocks with b = 4 and c = 2 that are less than $x_p$, so we add the number of b = 8, c = 3 vectors with two set bits in the prefix whose prefix encoding is less than (2, 2), i.e., $2\binom{4}{1} = 8$: $f \leftarrow f + 8$.

  • The suffix encodes as $\mathrm{enc}(x_s) = (1, 3)$, so we add this offset to the value derived so far: $f \leftarrow f + 3$.

  • Thus enc(01101000b) = (3, 28 + 8 + 3) = (3, 39).

We start with access(i = 2) and (c, f) = (3, 39). As we recurse, i and (c, f) are updated accordingly.

  • Since $28 \leq f < 52$, we know that the class of the prefix of x is 2. As we recurse to the left we update (c, f) by subtracting the accumulated offset and dividing by the number of length-4 vectors with one set bit: $(c, f) \leftarrow (2, \lfloor(39 - 28)/4\rfloor) = (2, 2)$.

  • Since $1 \leq f < 5$, we know that the class of the new prefix is 1. Now we recurse to the right and update $i \leftarrow i - 2 = 0$ and $(c, f) \leftarrow (1, (2 - 1) \bmod 2) = (1, 1)$.

  • Since $1 \leq f < 2$, we know that the class of the new prefix is 1. Since we are recursing to the prefix, the only possible update for c = 1 is $(c, f) \leftarrow (1, 0)$.

  • We are now accessing a vector of length one with c = 1; i.e., for (c, f) = (1, 0), access(0) = 1.

  • So, for the original block with (c, f) = (3, 39), access(2) = 1.
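The recursion of the example translates directly into code. A minimal sketch of our own, again reusing the binomial table C and substituting a linear scan for the branchless binary search of Section 4.1; i counts bits from the most significant end, as in the example:

// Answer access(i) on a block encoded as (c, f) under the induced order.
bool access_bit(uint32_t b, uint32_t c, uint64_t f, uint32_t i) {
    if (c == 0) return false;                 // remaining bits are all 0
    if (c == b) return true;                  // remaining bits are all 1
    uint32_t h = b / 2;
    uint32_t cp = 0;                          // class of the prefix
    uint64_t acc = 0;
    while (acc + C[h][cp] * C[h][c - cp] <= f) {  // find the prefix class
        acc += C[h][cp] * C[h][c - cp];
        ++cp;
    }
    f -= acc;
    return i < h
        ? access_bit(h, cp, f / C[h][c - cp], i)           // recurse left
        : access_bit(h, c - cp, f % C[h][c - cp], i - h);  // recurse right
}

Here access_bit(8, 3, 39, 2) performs exactly the three recursive steps above and returns 1.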

4.1 Implementation details

Recursive split.

Our implementation of the method described in the previous section makes extensive use of (small) lookup tables. For our implementation with b = 64 we precompute $\binom{n}{m}$ for n ∈ {32, 16, 8} and 0 ≤ m ≤ n. The binomials are stored as static heap-ordered binary trees to enable fast branchless binary searching for binomial coefficients. In addition, the decoding of blocks of size 8 is done using precomputed lookup tables.

A drawback of our solution is the need to compute division and modulus with binomial coefficients as divisors. We were unable to make any compiler precompute fast integer division for the required binomials, and modular inverse arithmetic is impractical because implementations would require computations on 128-bit integers.

Our implementation relies on the symmetry of the recurrence, requiring a block size that is a power of two, while, for better overall data structure size, b should be $2^a - 1$ for some positive integer a, so that a bits suffice to store every possible class value. Unfortunately, asymmetric recurrences would cause additional unavoidable and unpredictable branching while decoding, with a significant negative impact on query performance.

8-bit block iterative approach.

The same principle behind the induced order can be applied to splits other than the recursive even split described above. Splitting off 8 bits at a time leads to an O(b) decode time, which matches the on-the-fly approach of Navarro and Providel [16] (and its later implementation by Gog and Petri [5]). The hope is that relying on lookup tables for the 8-bit sections of the block yields lower constant factors than other implementations.

This approach allows the use of block sizes that are not powers of two, improving the storage of class values. If the offsets of the 8-bit sub-blocks correspond to lexicographic order, the first partial sub-block can be decoded with the same lookup table as the full sub-blocks, with minimal impact from the change in b. A minimal sketch of the iterative decoder is given below.
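The sketch (ours) peels one 8-bit sub-block at a time using the induced order with an (8, b−8) split; byte_decode is an assumed name for an 8-bit class/offset lookup table, b is taken to be a multiple of 8 for simplicity (our actual variant also handles a partial first sub-block), and the binomial table C is reused:

#include <cstdint>

extern const uint8_t byte_decode[9][70]; // assumed (class, offset) -> byte table

// Decode a class-c, offset-f block of b bits, eight bits at a time.
uint64_t decode_iter(uint32_t b, uint32_t c, uint64_t f) {
    uint64_t block = 0;
    while (b > 0) {
        uint32_t cp = 0;                  // class of the next 8-bit sub-block
        uint64_t acc = 0;
        while (acc + C[8][cp] * C[b - 8][c - cp] <= f) {
            acc += C[8][cp] * C[b - 8][c - cp];
            ++cp;
        }
        f -= acc;
        uint64_t rest = C[b - 8][c - cp]; // combinations left for the tail
        block = (block << 8) | byte_decode[cp][f / rest];
        f %= rest;
        b -= 8;
        c -= cp;
    }
    return block;
}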

Figure 4: Performance of our H0-based implementations compared to implementations in the SDSL. Pictured are our de Bruijn sequence-based approach (DBS), our balanced recurrence-based approach (REC) and our block iteration-based implementation (IT), along with the lookup table (RRR) and non-lookup table (SDSL) based approaches from the SDSL, with the number denoting the block size b. The highly optimized RRR15 outperforms all other implementations in query performance, while only beating our weighted de Bruijn implementation with b = 15 (DBS15) in compression, due to the small block size. For larger block sizes our implementations compare favorably, being comparable in compression ratio while typically improving query time. Mean blockwise H0 entropies are added for reference, as a theoretical limit for block-based H0 compression.
Figure 5: Visualization of the space usage for some of the tested data structures. Lookup table sizes are insignificant. The size of the F array increases with block size, but this increase is offset by the decrease in superblock overhead and C array size.

4.2 Results

Results of the benchmarking, now including on-the-fly decoding performance, can be seen in Figure 4. The lookup table-based RRR15 remains the fastest overall, with the on-the-fly approaches lagging behind in query performance but compressing significantly better. Of the on-the-fly approaches, our symmetric recurrence (REC) and block iterative (IT) methods offer compression similar to the on-the-fly implementation in the SDSL, but typically faster lookups. On all data sets tested the new methods offer relevant space-time trade-offs.

The space consumed by various components of the data structures is shown in Figure 5. Data structure size is mainly determined by the block and superblock sizes used, and lookup table sizes are irrelevant. Space for the F array increases as block size increases, but this is offset by the shrinking of the C array and superblock overhead.

5 Hybrid implementation

Regions of bitvectors that consist of alternating runs of ones and zeros have very low H1 entropy but potentially high H0 entropy. This leads to poor compression performance for the RRR-based techniques discussed so far, which achieve H0 entropy, ignoring the o(n) term.

The hybrid bitvector of Kärkkäinen, Kempa and Puglisi [8] is a widely used, practically powerful compressed bitvector scheme that, in particular, is an essential ingredient of state-of-the-art general-purpose FM-indexes [4]. The approach is to divide the bitvector into blocks and then select the best encoding for each block from a number of possible encodings. The implementation of hybrid bitvectors in the sdsl-lite library uses 256-bit blocks, and stores each block either as: 1) minority-bit encoded (i.e., to encode the block, we store the positions of the minority bits using 8 bits each); 2) run-length encoded; or 3) in plain form. This scheme works extremely well for blocks with runs or blocks that are extremely dense or sparse, but can perform badly when this is not the case and many blocks get stored in plain form (resulting in no compression).
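To make the per-block choice concrete, here is an illustrative cost model of our own (the constants and the H0 cost are simplifications for exposition, not the sdsl-lite layout):

#include <cmath>
#include <cstdint>

enum class Enc { Minority, RunLength, Plain, H0 };

// log2 of the binomial coefficient C(n, k), via lgamma.
double log2_binom(uint32_t n, uint32_t k) {
    return (std::lgamma(n + 1.0) - std::lgamma(k + 1.0) -
            std::lgamma(n - k + 1.0)) / std::log(2.0);
}

// Pick the cheapest encoding for one 256-bit block with the given number
// of 1-bits and runs. Costs are in bits.
Enc choose_encoding(uint32_t ones, uint32_t runs) {
    const uint32_t b = 256;
    uint32_t minority = ones < b - ones ? ones : b - ones;
    double cost[4] = {
        8.0 * minority,       // one byte per minority-bit position
        8.0 * runs,           // roughly one byte per run boundary
        double(b),            // plain: the raw 256 bits
        log2_binom(b, ones)   // H0: idealized, ignoring class overhead
    };
    int best = 0;
    for (int e = 1; e < 4; ++e)
        if (cost[e] < cost[best]) best = e;
    return static_cast<Enc>(best);
}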

This hybrid scheme and the RRR-based scheme complement each other, and adding an H0 option for block encoding to the hybrid bitvector should improve compression performance.

With this in mind, we added our 64-bit block iterative encoding as an option in the 256-bit hybrid bitvector implementation included in the SDSL. As a comparison, we built another version that instead used Gog and Petri's RRR implementation with b = 256 as the extra encoding option. This was done in a way that causes no additional space overhead in the hybrid structure, but does add some significant overhead to each query, compared to the implementation without support for H0 encoding.

We ran the same tests on these structures; the results are shown in Figure 6. We observe that by including RRR as a block-encoding option, the compression of hybrid bitvectors can be significantly improved, though at some penalty to query performance.

Figure 6: Performance of the SDSL hybrid bitvector, compared to versions with an added option for compressing blocks with either our 64-bit block iterative implementation (IT) or the 256-bit RRR compression, also from the SDSL. We find that compression can be improved as expected (cf. blue and green points), but with a penalty to query performance.

6 Concluding Remarks

We have described two main improvements to the RRR bitvector data structure, the common theme of which is the non-lexicographical assignment of codes to offset values (in contrast to the lexicographical assignment used in all previous work on the problem). The first of these approaches represents all possible blocks of size b as a series of weighted de Bruijn sequences. The second provides a new on-the-fly block-decoding mechanism, based on a new induced total order for offset encoding, that answers queries in $O(\log c\,\log b)$ time. These methods offer improved practical trade-offs for H0-compressed bitvectors. We have also described a sparsification technique for reducing the size of the lookup table in the case that lexicographic assignment is used.

There are numerous avenues for further optimization of the RRR scheme. Firstly, profiling reveals that our non-lookup-table-based implementations execute between three and eight div instructions per query. Eliminating these divisions efficiently would improve the query performance of our data structures and would possibly be of independent interest. Efficiently optimizing division by 64-bit (and larger) integers is an open problem, but perhaps existing schemes for shorter integers [9] can be applied to larger integers as hardware keeps improving.

Larger block sizes have the potential to improve the compression ratio further, but implementations would require computations on wider integers (128, 256 or 512 bits), whose arithmetic operations are significantly slower than those on 64-bit integers.

Finally, the rrr_vector_15 class in the SDSL uses bit-parallelism to great effect. Since its publication [5] some 10 years ago, architectural support for bit-parallelism has improved significantly, and we expect it is possible to improve on the existing optimizations of rrr_vector_15, or to apply novel optimizations to implementations with b > 15, using new SIMD instructions.

We leave these open problems to be the focus of future research.

References

  • [1] Jarno N. Alanko, Simon J. Puglisi, and Jaakko Vuohtoniemi. Small searchable k-spectra via subset rank queries on the spectral Burrows-Wheeler transform. In SIAM Conference on Applied and Computational Discrete Algorithms, ACDA 2023, pages 225–236. SIAM, 2023. doi:10.1137/1.9781611977714.20.
  • [2] Francisco Claude and Gonzalo Navarro. Practical rank/select queries over arbitrary sequences. In String Processing and Information Retrieval, pages 176–187, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. doi:10.1007/978-3-540-89097-3_18.
  • [3] Saska Dönges. saskeli/h0_bv. Software, swhId: swh:1:dir:383ab8d9247150886cfacd830469e5e5bdd72551 (visited on 2025-07-02). URL: https://github.com/saskeli/h0_bv, doi:10.4230/artifacts.23794.
  • [4] Simon Gog, Juha Kärkkäinen, Dominik Kempa, Matthias Petri, and Simon J. Puglisi. Fixed block compression boosting in FM-indexes: Theory and practice. Algorithmica, 81(4):1370–1391, 2019. doi:10.1007/S00453-018-0475-9.
  • [5] Simon Gog and Matthias Petri. Optimized succinct data structures for massive data. Software: Practice and Experience, 44(11):1287–1314, 2014. doi:10.1002/spe.2198.
  • [6] Guy Jacobson. Space-efficient static trees and graphs. In Proc. 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 549–554. IEEE Computer Society, 1989. doi:10.1109/SFCS.1989.63533.
  • [7] Ian B Jeffery, Anubhav Das, Eileen O’Herlihy, Simone Coughlan, Katryna Cisek, Michael Moore, Fintan Bradley, Tom Carty, Meenakshi Pradhan, Chinmay Dwibedi, et al. Differences in fecal microbiomes and metabolomes of people with vs without irritable bowel syndrome and bile acid malabsorption. Gastroenterology, 158(4):1016–1028, 2020.
  • [8] Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. Hybrid compression of bitvectors for the FM-index. In Proc. Data Compression Conference, pages 302–311. IEEE, 2014. doi:10.1109/DCC.2014.87.
  • [9] Daniel Lemire, Owen Kaser, and Nathan Kurz. Faster remainder by direct computation: Applications to compilers and software libraries. Software: Practice and Experience, 49(6):953–970, 2019. doi:10.1002/spe.2689.
  • [10] Danyang Ma, Simon J. Puglisi, Rajeev Raman, and Bella Zhukova. On Elias-Fano for rank queries in FM-indexes. In Proc. 31st Data Compression Conference, pages 223–232. IEEE, 2021. doi:10.1109/DCC50243.2021.00030.
  • [11] Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithm Design: Bioinformatics in the Era of High-Throughput Sequencing (2nd edition). Cambridge University Press, 2023. URL: http://www.genome-scale.info/.
  • [12] Veli Mäkinen and Gonzalo Navarro. Implicit compression boosting with applications to self-indexing. In Proc. String Processing and Information Retrieval, 14th International Symposium, LNCS 4726, pages 229–241. Springer, 2007. doi:10.1007/978-3-540-75530-2_21.
  • [13] Giovanni Manzini. An analysis of the Burrows-Wheeler transform. J. ACM, 48(3):407–430, 2001. doi:10.1145/382780.382782.
  • [14] J. Ian Munro. Tables. In Proc. 16th Foundations of Software Technology and Theoretical Computer Science, LNCS 1180, pages 37–42. Springer, 1996. doi:10.1007/3-540-62034-6_35.
  • [15] Gonzalo Navarro. Compact Data Structures – A practical approach. Cambridge University Press, 2016. ISBN 978-1-107-15238-0. 570 pages.
  • [16] Gonzalo Navarro and Eliana Providel. Fast, small, simple rank/select on bitmaps. In Proc. SEA, pages 295–306. Springer, 2012. doi:10.1007/978-3-642-30850-5_26.
  • [17] Giuseppe Ottaviano and Rossano Venturini. Partitioned Elias-Fano indexes. In Proc. 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR, pages 273–282. ACM, 2014. doi:10.1145/2600428.2609615.
  • [18] Rasmus Pagh. Low redundancy in static dictionaries with O(1) worst case lookup time. In Proc. Automata, Languages and Programming, 26th International Colloquium, ICALP’99, LNCS 1644, pages 595–604. Springer, 1999. doi:10.1007/3-540-48523-6_56.
  • [19] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms, 3(4):43–es, November 2007. doi:10.1145/1290672.1290680.
  • [20] Frank Ruskey, Joe Sawada, and Aaron Williams. De Bruijn sequences for fixed-weight binary strings. SIAM J. Discret. Math., 26(2):605–617, 2012. doi:10.1137/100808782.
  • [21] Sebastiano Vigna. Quasi-succinct indices. In Sixth ACM International Conference on Web Search and Data Mining, WSDM 2013, Rome, Italy, February 4-8, 2013, pages 83–92. ACM, 2013. doi:10.1145/2433396.2433409.

Appendix A Tabulated experiment results

WT-WEB-1GB WT-DNA-1GB bv-dump.bin RND-1.bin RND-5.bin RND-10.bin
GAP block size: 15, gap: 1
build time (s) 3.728 4.055 4.561 4.128 3.828 3.582
bits/bit 0.467 878 951 013 977 86 0.902 709 601 199 146 6 0.987 133 021 414 302 9 1.236 974 106 267 287 4 0.516 198 552 092 920 6 0.389 355 800 938 935 57
access (ns) 48.8088 153.222 164.145 167.843 96.8741 26.032
rank (ns) 162.941 171.973 172.029 172.431 166.629 153.58
select (ns) 913.867 931.872 971.726 1291.3 964.197 896.082
GAP block size: 15, gap: 7
build time (s) 4.136 4.457 4.784 4.553 4.225 3.987
bits/bit 0.467 845 090 862 284 1 0.902 673 980 582 676 7 0.987 097 510 927 937 7 1.236 939 952 806 085 6 0.516 164 398 631 718 8 0.389 321 647 477 733 73
access (ns) 55.2366 225.744 263.242 277.2 126.532 26.5134
rank (ns) 174.946 249.561 276.962 287.748 209.206 157.792
select (ns) 916.729 1006.02 1072.97 1405.67 1007.83 901.581
GAP block size: 15, gap: 15
build time (s) 3.796 4.132 4.476 4.152 3.866 3.638
bits/bit 0.467 842 084 518 100 1 0.902 670 817 931 956 0.987 094 358 055 348 2 1.236 936 920 419 805 3 0.516 161 366 245 438 4 0.389 318 615 091 453 4
access (ns) 56.1212 228.589 272.445 284.819 128.413 26.5664
rank (ns) 175.562 257.209 285.71 295.183 211.898 157.677
select (ns) 926.366 1020.79 1098.0 1418.99 1022.15 919.123
GAP block size: 15, gap: 24
build time (s) 3.792 4.105 4.437 4.152 3.871 3.637
bits/bit 0.467 841 102 100 958 86 0.902 669 784 436 757 4 0.987 093 327 755 460 2 1.236 935 929 492 593 2 0.516 160 375 318 226 4 0.389 317 624 164 241 37
access (ns) 56.9823 237.039 277.709 294.73 130.144 26.6282
rank (ns) 176.863 260.026 291.541 304.902 213.287 157.887
select (ns) 920.346 1023.72 1099.86 1452.77 1021.12 908.684
GAP block size: 15, gap: 32
build time (s) 3.783 4.145 4.481 4.199 3.915 3.629
bits/bit 0.467 840 688 451 636 3 0.902 669 349 280 884 4 0.987 092 893 944 981 1.236 935 512 260 083 0.516 159 958 085 716 1 0.389 317 206 931 731 05
access (ns) 57.1164 239.053 284.746 301.531 130.02 26.5804
rank (ns) 174.242 260.277 297.251 311.316 210.922 155.45
select (ns) 918.053 1022.84 1114.13 1452.71 1020.87 903.73
GAP block size: 15, gap: 64
build time (s) 3.952 4.186 4.543 4.257 3.892 3.635
bits/bit 0.467 840 067 977 652 4 0.902 668 696 547 074 7 0.987 092 243 229 262 3 1.236 934 886 411 317 4 0.516 159 332 236 950 7 0.389 316 581 082 965 6
access (ns) 59.1719 262.017 309.844 336.026 131.382 26.5396
rank (ns) 176.375 283.325 322.053 345.829 211.949 155.514
select (ns) 920.951 1065.62 1168.09 1520.98 1028.45 906.002
WT-WEB-1GB WT-DNA-1GB bv-dump.bin RND-1.bin RND-5.bin RND-10.bin
GAP block size: 24, gap: 1
build time (s) 3.413 10.835 11.661 16.962 2.974 2.399
bits/bit 0.396 031 446 736 880 8 0.860 447 697 958 992 3 0.969 373 827 457 722 8 1.204 960 980 788 164 0.461 326 225 948 883 4 0.314 638 674 708 912 7
access (ns) 57.3598 201.173 205.049 260.746 134.223 28.5569
rank (ns) 169.666 237.213 213.786 268.979 173.468 154.479
select (ns) 866.488 1005.67 1037.46 1532.87 898.629 836.867
GAP block size: 24, gap: 7
build time (s) 3.653 10.519 13.775 16.262 3.126 2.42
bits/bit 0.372 904 512 547 734 6 0.836 118 342 900 947 6 0.945 119 692 726 769 9 1.181 633 712 301 897 2 0.437 998 957 462 616 6 0.291 311 406 222 645 9
access (ns) 64.3484 246.201 285.983 298.609 186.285 29.5175
rank (ns) 179.338 268.256 291.393 308.176 233.904 158.526
select (ns) 878.575 1030.44 1089.68 1543.84 968.787 846.221
GAP block size: 24, gap: 15
build time (s) 3.531 10.505 11.493 16.338 3.27 2.547
bits/bit 0.370 848 786 213 280 8 0.833 955 734 771 221 0.942 963 770 844 603 3 1.179 560 178 484 319 3 0.435 925 423 645 038 7 0.289 237 872 405 068 04
access (ns) 64.8765 250.521 297.227 308.713 190.512 29.5091
rank (ns) 180.816 271.758 303.938 314.941 236.907 159.444
select (ns) 882.324 1029.76 1103.16 1573.71 984.492 854.255
GAP block size: 24, gap: 24
build time (s) 3.827 10.694 11.651 17.072 3.312 2.709
bits/bit 0.370 174 249 740 217 1 0.833 246 127 643 075 6 0.942 256 357 645 567 7 1.178 879 798 919 858 0.435 245 044 080 577 46 0.288 557 492 840 606 8
access (ns) 65.5028 248.99 300.185 308.375 195.675 29.8434
rank (ns) 181.514 271.901 308.174 314.475 239.596 158.259
select (ns) 885.657 1024.92 1107.33 1529.96 993.326 865.059
GAP block size: 24, gap: 32
build time (s) 3.815 10.86 11.811 16.677 3.582 2.835
bits/bit 0.369 893 197 185 287 74 0.832 950 462 539 222 0.941 961 606 664 828 8 1.178 596 311 780 837 8 0.434 961 556 941 557 25 0.288 274 005 701 586 57
access (ns) 65.5902 255.293 306.287 315.676 196.321 29.7771
rank (ns) 181.124 276.118 315.641 322.794 239.999 157.966
select (ns) 888.533 1035.55 1118.28 1546.55 998.906 868.652
GAP block size: 24, gap: 64
build time (s) 4.399 11.433 12.26 17.827 3.967 3.417
bits/bit 0.369 471 614 659 596 1 0.832 506 960 998 121 4 0.941 519 476 320 412 4 1.178 171 077 347 017 3 0.434 536 322 507 736 66 0.287 848 771 267 766
access (ns) 67.9074 276.484 333.083 349.913 201.254 29.7777
rank (ns) 183.644 296.403 342.381 355.715 245.106 157.793
select (ns) 894.616 1061.87 1165.78 1596.93 1020.69 864.823
WT-WEB-1GB WT-DNA-1GB bv-dump.bin RND-1.bin RND-5.bin RND-10.bin
IT block size: 63
build time (s) 2.289 3.238 3.997 4.031 3.982 1.903
bits/bit 0.222 369 605 619 496 95 0.712 744 527 198 054 1 0.843 265 818 362 669 2 1.071 979 860 238 881 0.292 494 886 238 166 05 0.129 363 397 103 820 78
access (ns) 74.8317 249.465 303.511 309.669 236.876 37.5112
rank (ns) 179.46 261.932 316.588 320.151 270.507 159.856
select (ns) 760.392 876.18 954.466 970.194 878.494 659.825
IT block size: 64
build time (s) 1.139 2.228 2.939 2.828 2.54 0.785
bits/bit 0.236 309 100 276 233 53 0.727 050 431 939 216 5 0.857 934 755 624 370 3 1.085 720 250 675 676 5 0.305 864 669 897 670 5 0.143 058 100 580 736 26
access (ns) 64.8315 249.066 303.555 307.486 240.434 33.4113
rank (ns) 178.502 264.999 322.533 320.614 271.908 160.267
select (ns) 764.262 888.36 971.232 1145.56 859.516 646.619
REC block size: 64
build time (s) 2.245 8.691 11.548 13.362 3.621 0.809
bits/bit 0.236 308 598 911 094 76 0.727 049 904 506 986 3 0.857 934 229 822 829 7 1.085 719 744 967 522 2 0.305 864 164 189 516 3 0.143 057 594 872 582
access (ns) 64.6253 248.801 305.583 308.281 226.252 33.549
rank (ns) 186.175 274.88 319.677 320.304 252.252 160.953
select (ns) 762.83 870.66 956.8 1106.23 850.819 703.01
DBS block size: 15
build time (s) 3.764 4.101 4.452 4.131 3.91 3.617
bits/bit 0.467 841 922 936 333 4 0.902 670 647 949 193 2 0.987 094 188 598 129 7 1.236 936 757 438 356 0.516 161 203 263 989 1 0.389 318 452 110 004 04
access (ns) 48.0073 151.446 161.857 165.004 95.4266 26.1222
rank (ns) 167.317 168.541 169.936 170.174 170.15 165.814
select (ns) 909.401 917.893 957.862 1279.39 959.579 905.097
DBS block size: 24
build time (s) 3.321 10.197 11.029 15.812 3.025 2.346
bits/bit 0.370 174 023 525 743 86 0.833 245 889 667 207 5 0.942 256 120 405 461 9 1.178 879 570 745 829 0.435 244 815 906 548 4 0.288 557 264 666 577 7
access (ns) 53.3154 160.277 173.185 176.764 133.093 28.6485
rank (ns) 171.014 182.42 180.797 183.204 172.551 167.101
select (ns) 862.827 916.101 941.277 1393.37 893.444 842.976
RRR block size: 15
build time (s) 3.538 4.769 5.765 5.158 6.343 3.11
bits/bit 0.457 483 470 794 666 3 0.900 648 555 341 946 7 0.987 155 238 708 128 2 1.234 912 143 201 290 4 0.507 886 580 132 793 1 0.370 627 164 873 278 4
access (ns) 34.0164 109.579 125.945 131.151 73.0565 20.3536
rank (ns) 105.353 149.235 157.422 157.874 134.708 93.2746
select (ns) 648.536 830.778 950.841 1278.28 925.266 821.865
WT-WEB-1GB WT-DNA-1GB bv-dump.bin RND-1.bin RND-5.bin RND-10.bin
SDSL block size: 24
build time (s) 6.468 24.749 26.207 40.311 10.079 3.153
bits/bit 0.365 175 565 440 526 9 0.833 398 985 856 804 7 0.942 412 771 499 804 2 1.177 777 904 623 911 4 0.431 538 978 170 313 1 0.278 341 012 795 709
access (ns) 69.2431 255.935 284.384 327.729 150.578 39.6841
rank (ns) 131.619 268.732 303.512 337.913 201.437 145.247
select (ns) 613.157 921.103 1083.35 1521.24 931.851 777.251
SDSL block size: 31
build time (s) 5.691 23.576 24.874 39.573 9.269 2.571
bits/bit 0.304 428 619 579 674 1 0.780 690 324 710 903 6 0.895 611 799 975 038 8 1.131 186 360 447 210 3 0.366 929 970 741 793 73 0.216 593 944 716 498 68
access (ns) 80.7888 276.749 312.463 357.589 175.915 48.5396
rank (ns) 136.336 284.458 325.647 360.295 216.23 154.463
select (ns) 582.405 909.211 1080.24 1311.47 915.935 740.606
SDSL block size: 63
build time (s) 4.318 20.965 22.133 36.357 7.999 1.654
bits/bit 0.220 891 961 599 987 9 0.713 251 545 955 347 0.843 772 811 120 335 1.071 990 387 909 185 9 0.291 513 361 658 801 43 0.125 901 700 922 089 08
access (ns) 94.2154 345.217 391.965 462.072 238.919 53.7692
rank (ns) 147.529 345.117 395.659 456.36 273.225 170.339
select (ns) 507.254 914.16 1102.63 1221.16 894.921 648.104
HYB-IT block size: 256
build time (s) 2.509 2.466 3.685 1.463 9.759 2.894
bits/bit 0.178 405 246 538 911 18 0.726 910 806 155 084 0.900 132 652 272 571 6 1.078 124 716 295 863 0.321 984 713 360 781 1 0.085 882 514 075 320 1
access (ns) 68.1073 150.018 289.457 117.611 171.909 80.628
rank (ns) 75.9432 165.043 291.335 133.961 174.076 96.8056
select (ns) nan nan nan nan nan nan
HYB-RRR block size: 256
build time (s) 3.344 25.407 42.77 2.646 22.861 2.661
bits/bit 0.179 313 915 915 168 02 0.711 185 904 735 567 4 0.853 696 703 216 206 6 1.077 480 904 180 698 8 0.281 826 643 036 107 4 0.085 882 499 174 159 02
access (ns) 126.547 665.36 976.07 143.217 400.072 81.6892
rank (ns) 133.519 668.806 947.346 160.66 407.562 98.2424
select (ns) nan nan nan nan nan nan
HYB block size: 256
build time (s) 2.264 2.029 1.813 1.277 9.86 2.554
bits/bit 0.193 016 071 886 461 17 0.760 724 461 173 873 8 1.058 504 750 251 859 9 1.078 125 081 374 309 3 0.328 133 506 118 256 53 0.085 882 514 075 320 1
access (ns) 49.4219 93.0849 99.518 94.6937 145.674 72.3655
rank (ns) 63.7183 119.323 132.481 128.628 148.508 89.4701
select (ns) nan nan nan nan nan nan