Search Results

Documents authored by Labeit, Julian


Document
Practical Range Minimum Queries Revisited

Authors: Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)


Abstract
Finding the position of the minimal element in a subarray A[i..j] of an array A of size n is a fundamental operation in many applications. In 2011, Fischer and Heun presented the first index of size 2n+o(n) bits which answers the operation in constant time for any subarray. The index can be computed in linear time and queries can be answered without consulting the original array. The most recent and currently fastest practical index is due to Ferrada and Navarro (DCC'16). It reduces the range minimum query (RMQ) to more fundamental and well studied queries on binary vectors, namely rank and select, and a RMQ query on an array of sublinear size derived from A. A range min-max tree is employed to solve this recursive RMQ call. In this paper, we review their practical design and suggest a series of changes which result in consistently faster query times. Specifically, we provide a customized select implementation, switch to two levels of recursion, and use the sparse table solution for the recursion base case instead of a range min-max tree. We provide an extensive empirical evaluation of our new implementation and also compare it to the state of the art. Our experimental study shows that our proposal significantly outperforms the previous solutions on established benchmarks (up to a factor of three) and furthermore accelerates real world applications such as traversing a succinct tree or listing all distinct elements in an interval of an array.

Cite as

Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit. Practical Range Minimum Queries Revisited. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{baumstark_et_al:LIPIcs.SEA.2017.12,
  author =	{Baumstark, Niklas and Gog, Simon and Heuer, Tobias and Labeit, Julian},
  title =	{{Practical Range Minimum Queries Revisited}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{12:1--12:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.12},
  URN =		{urn:nbn:de:0030-drops-76158},
  doi =		{10.4230/LIPIcs.SEA.2017.12},
  annote =	{Keywords: Succinct Data Structures, Range Minimum Queries, Algorithm Engineering}
}
Document
The Quantile Index - Succinct Self-Index for Top-k Document Retrieval

Authors: Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit

Published in: LIPIcs, Volume 75, 16th International Symposium on Experimental Algorithms (SEA 2017)


Abstract
One of the central problems in information retrieval is that of finding the k documents in a large text collection that best match a query given by a user. A recent result of Navarro & Nekrich (SODA 2012) showed that single term and phrase queries of length m can be solved in optimal O(m+k) time using a linear word sized index. While a verbatim implementation of the index would be at least an order of magnitude larger than the original collection, various authors incrementally improved the index to a point where the space requirement is currently within a factor of 1.5 to 2.0 of the text size for standard collections. In this paper, we propose a new time/space trade-off for different top-k indexes. This is achieved by sampling only a quantile of the postings in the original inverted file or suffix array-based index. For those queries that cannot be answered using the sampled version of the index we show how to compute the query results on the fly efficiently. As an example, we apply our method to the top-k framework by Navarro & Nekrich. Under probabilistic assumptions that hold for most standard texts, and for a standard scoring function called term frequency, our index can be represented with only sublinearly many bits plus the space needed for a compressed suffix array of the text, while maintaining poly-logarithmic query times. We evaluate our solution on real-world datasets and compare its practical space usage and performance against state-of-the-art implementations. Our experiments show that our index compresses below the size of the original text. To our knowledge it is the first suffix array-based text index that is able to break this bound in practice even for non-repetitive collections, while still maintaining reasonable query times of under half a millisecond on average for top-10 queries.

Cite as

Niklas Baumstark, Simon Gog, Tobias Heuer, and Julian Labeit. The Quantile Index - Succinct Self-Index for Top-k Document Retrieval. In 16th International Symposium on Experimental Algorithms (SEA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 75, pp. 15:1-15:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{baumstark_et_al:LIPIcs.SEA.2017.15,
  author =	{Baumstark, Niklas and Gog, Simon and Heuer, Tobias and Labeit, Julian},
  title =	{{The Quantile Index - Succinct Self-Index for Top-k Document Retrieval}},
  booktitle =	{16th International Symposium on Experimental Algorithms (SEA 2017)},
  pages =	{15:1--15:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-036-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{75},
  editor =	{Iliopoulos, Costas S. and Pissis, Solon P. and Puglisi, Simon J. and Raman, Rajeev},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2017.15},
  URN =		{urn:nbn:de:0030-drops-76183},
  doi =		{10.4230/LIPIcs.SEA.2017.15},
  annote =	{Keywords: Text Indexing, Succinct Data Structures, Top-k Document Retrieval}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail