Search Results

Documents authored by Vildhøj, Hjalte Wedel


Document
Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Authors: Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let n, and z denote the size of the input string, and the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string P of length m, we can solve the problem in (i) O(m + occ lglg n) time using O(z lg(n/z) lglg z) space, or (ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, for any 0 < e < 1 In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lglg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1+lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}), for constant d > 0, this becomes O(m). Our index also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.

Cite as

Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.16,
  author =	{Bille, Philip and Ettienne, Mikko Berggren and G{\o}rtz, Inge Li and Vildh{\o}j, Hjalte Wedel},
  title =	{{Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.16},
  URN =		{urn:nbn:de:0030-drops-73254},
  doi =		{10.4230/LIPIcs.CPM.2017.16},
  annote =	{Keywords: compressed indexing, pattern matching, LZ77, prefix search}
}
Document
Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Authors: Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)


Abstract
Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

Cite as

Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 18:1-18:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ISAAC.2016.18,
  author =	{Bille, Philip and Cording, Patrick Hagge and G{\o}rtz, Inge Li and Skjoldjensen, Frederik Rye and Vildh{\o}j, Hjalte Wedel and Vind, S{\o}ren},
  title =	{{Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.18},
  URN =		{urn:nbn:de:0030-drops-67872},
  doi =		{10.4230/LIPIcs.ISAAC.2016.18},
  annote =	{Keywords: Relative compression, dynamic compression, dynamic partial sum, sub-string concatenation, external macro compression}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail