More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Grossi, Roberto; Orlandi, Alessio; Raman, Rajeev; Rao, S. Srinivasa

doi:10.4230/LIPIcs.STACS.2009.1847

Abstract

We consider the problem of representing, in a compressed format, a bit-vector~$S$ of $m$ bits with $n$ $\mathbf{1}$s, supporting the following operations, where $b \in \{ \mathbf{0}, \mathbf{1} \}$:
\begin{itemize}
\item $\mathtt{rank}_b(S,i)$ returns the number of occurrences of bit $b$ in the prefix $S\left[1..i\right]$;
\item $\mathtt{select}_b(S,i)$ returns the position of the $i$th occurrence of bit $b$ in $S$.
\end{itemize}
Such a data structure is called \emph{fully indexable dictionary (\textsc{fid})} [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing $S$ as a set $X = \{ x_1, x_2, \ldots, x_n \}$ of $n$ distinct integers drawn from a universe $[m] = \{1, \ldots, m\}$, the predecessor of integer $y \in [m]$ in $X$ is given by $\ensuremath{\mathtt{select}^{}_1}(S, \ensuremath{\mathtt{rank}_1}(S,y-1))$. {\textsc{fid}}s have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representation for a variety of abstract data types.

Our focus is on space-efficient {\textsc{fid}}s on the \textsc{ram} model with word size $\Theta(\lg m)$ and constant time for all operations, so that the time cost is independent of the input size.

Given the bitstring $S$ to be encoded, having length $m$ and containing $n$ ones, the minimal amount of information that needs to be stored is $B(n,m) = \lceil \log {{m}\choose{n}} \rceil$. The state of the art in building a \textsc{fid}\ for~$S$ is given in~\mbox{}[P\v{a}tra\c{s}cu, 2008] using $B(m,n)+O( m / ( (\log m/ t) ^t) ) + O(m^{3/4}) $ bits, to support the operations in $O(t)$ time.

Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants $0 < \delta \leq 1/2$, $0 < \varepsilon \leq 1$, and integer $s > 0$, it uses
\[ B(n,m) + O\left(n^{1+\delta} + n \left(\frac{m}{n^s}\right)^\varepsilon\right) \]
bits and performs all the operations in time $O(s\delta^{-1} + \varepsilon^{-1})$. The improvement is twofold: our redundancy can be lowered parametrically and, fixing $s = O(1)$, we get a constant-time \textsc{fid}\ whose space is $B(n,m) + O(m^\varepsilon/\mathrm{poly}(n))$ bits, for sufficiently large $m$. This is a significant improvement compared to the previous bounds for the general case.

More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Authors Roberto Grossi, Alessio Orlandi, Rajeev Raman, S. Srinivasa Rao

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Metrics

Thanks for your feedback!

Could not send message