DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2009.1847

More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Authors: Roberto Grossi, Alessio Orlandi, Rajeev Raman, and S. Srinivasa Rao

Published in: LIPIcs, Volume 3, 26th International Symposium on Theoretical Aspects of Computer Science (2009)

Abstract

We consider the problem of representing, in a compressed format, a bit-vector~$S$ of $m$ bits with $n$ $\mathbf{1}$s, supporting the following operations, where $b \in \{ \mathbf{0}, \mathbf{1} \}$: \begin{itemize} \item $\mathtt{rank}_b(S,i)$ returns the number of occurrences of bit $b$ in the prefix $S\left[1..i\right]$; \item $\mathtt{select}_b(S,i)$ returns the position of the $i$th occurrence of bit $b$ in $S$. \end{itemize} Such a data structure is called \emph{fully indexable dictionary (\textsc{fid})} [Raman, Raman, and Rao, 2007], and is at least as powerful as predecessor data structures. Viewing $S$ as a set $X = \{ x_1, x_2, \ldots, x_n \}$ of $n$ distinct integers drawn from a universe $[m] = \{1, \ldots, m\}$, the predecessor of integer $y \in [m]$ in $X$ is given by $\ensuremath{\mathtt{select}^{}_1}(S, \ensuremath{\mathtt{rank}_1}(S,y-1))$. {\textsc{fid}}s have many applications in succinct and compressed data structures, as they are often involved in the construction of succinct representation for a variety of abstract data types. Our focus is on space-efficient {\textsc{fid}}s on the \textsc{ram} model with word size $\Theta(\lg m)$ and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring $S$ to be encoded, having length $m$ and containing $n$ ones, the minimal amount of information that needs to be stored is $B(n,m) = \lceil \log {{m}\choose{n}} \rceil$. The state of the art in building a \textsc{fid}\ for~$S$ is given in~\mbox{}[P\v{a}tra\c{s}cu, 2008] using $B(m,n)+O( m / ( (\log m/ t) ^t) ) + O(m^{3/4}) $ bits, to support the operations in $O(t)$ time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants $0 < \delta \leq 1/2$, $0 < \varepsilon \leq 1$, and integer $s > 0$, it uses \[ B(n,m) + O\left(n^{1+\delta} + n \left(\frac{m}{n^s}\right)^\varepsilon\right) \] bits and performs all the operations in time $O(s\delta^{-1} + \varepsilon^{-1})$. The improvement is twofold: our redundancy can be lowered parametrically and, fixing $s = O(1)$, we get a constant-time \textsc{fid}\ whose space is $B(n,m) + O(m^\varepsilon/\mathrm{poly}(n))$ bits, for sufficiently large $m$. This is a significant improvement compared to the previous bounds for the general case.

Cite as

Roberto Grossi, Alessio Orlandi, Rajeev Raman, and S. Srinivasa Rao. More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries. In 26th International Symposium on Theoretical Aspects of Computer Science. Leibniz International Proceedings in Informatics (LIPIcs), Volume 3, pp. 517-528, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009)

Copy BibTex To Clipboard

@InProceedings{grossi_et_al:LIPIcs.STACS.2009.1847,
  author =	{Grossi, Roberto and Orlandi, Alessio and Raman, Rajeev and Rao, S. Srinivasa},
  title =	{{More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries}},
  booktitle =	{26th International Symposium on Theoretical Aspects of Computer Science},
  pages =	{517--528},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-09-5},
  ISSN =	{1868-8969},
  year =	{2009},
  volume =	{3},
  editor =	{Albers, Susanne and Marion, Jean-Yves},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2009.1847},
  URN =		{urn:nbn:de:0030-drops-18470},
  doi =		{10.4230/LIPIcs.STACS.2009.1847},
  annote =	{Keywords: }
}

Search Results

Documents authored by Orlandi, Alessio

More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Abstract

Cite as