4 Search Results for "Wedel Vildhøj, Hjalte"


Document
Hierarchical Relative Lempel-Ziv Compression

Authors: Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow

Published in: LIPIcs, Volume 265, 21st International Symposium on Experimental Algorithms (SEA 2023)


Abstract
Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string S is compressed relative to a second string R (called the reference) by parsing S into a sequence of substrings that occur in R. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. With the now cheap cost of DNA sequencing, such datasets have become extremely abundant and are rapidly growing. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we propose a new compression scheme hierarchical relative Lempel-Ziv (HRLZ) which form a rooted tree (or hierarchy) on the strings and then compress each string using RLZ with parent as reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome datasets, with negligible effect on decompression time compared to the standard single reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a completed weighted digraph of the strings, with weights as the number of phrases in the RLZ parsing of the source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality-sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely effecting compression performance.

Cite as

Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow. Hierarchical Relative Lempel-Ziv Compression. In 21st International Symposium on Experimental Algorithms (SEA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 265, pp. 18:1-18:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.SEA.2023.18,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Puglisi, Simon J. and Tarnow, Simon R.},
  title =	{{Hierarchical Relative Lempel-Ziv Compression}},
  booktitle =	{21st International Symposium on Experimental Algorithms (SEA 2023)},
  pages =	{18:1--18:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-279-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{265},
  editor =	{Georgiadis, Loukas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2023.18},
  URN =		{urn:nbn:de:0030-drops-183680},
  doi =		{10.4230/LIPIcs.SEA.2023.18},
  annote =	{Keywords: Relative compression, Lempel-Ziv compression, RLZ, LZ77, string collections, compressed representation, data structures, efficient algorithms}
}
Document
Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Authors: Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)


Abstract
Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let n, and z denote the size of the input string, and the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string P of length m, we can solve the problem in (i) O(m + occ lglg n) time using O(z lg(n/z) lglg z) space, or (ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, for any 0 < e < 1 In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lglg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1+lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}), for constant d > 0, this becomes O(m). Our index also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.

Cite as

Philip Bille, Mikko Berggren Ettienne, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 16:1-16:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2017.16,
  author =	{Bille, Philip and Ettienne, Mikko Berggren and G{\o}rtz, Inge Li and Vildh{\o}j, Hjalte Wedel},
  title =	{{Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.16},
  URN =		{urn:nbn:de:0030-drops-73254},
  doi =		{10.4230/LIPIcs.CPM.2017.16},
  annote =	{Keywords: compressed indexing, pattern matching, LZ77, prefix search}
}
Document
Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Authors: Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind

Published in: LIPIcs, Volume 64, 27th International Symposium on Algorithms and Computation (ISAAC 2016)


Abstract
Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

Cite as

Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, and Søren Vind. Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 18:1-18:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.ISAAC.2016.18,
  author =	{Bille, Philip and Cording, Patrick Hagge and G{\o}rtz, Inge Li and Skjoldjensen, Frederik Rye and Vildh{\o}j, Hjalte Wedel and Vind, S{\o}ren},
  title =	{{Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation}},
  booktitle =	{27th International Symposium on Algorithms and Computation (ISAAC 2016)},
  pages =	{18:1--18:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-026-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{64},
  editor =	{Hong, Seok-Hee},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2016.18},
  URN =		{urn:nbn:de:0030-drops-67872},
  doi =		{10.4230/LIPIcs.ISAAC.2016.18},
  annote =	{Keywords: Relative compression, dynamic compression, dynamic partial sum, sub-string concatenation, external macro compression}
}
Document
Boxed Permutation Pattern Matching

Authors: Mika Amit, Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, and Hjalte Wedel Vildhøj

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
Given permutations T and P of length n and m, respectively, the Permutation Pattern Matching problem asks to find all m-length subsequences of T that are order-isomorphic to P. This problem has a wide range of applications but is known to be NP-hard. In this paper, we study the special case, where the goal is to only find the boxed subsequences of T that are order-isomorphic to P. This problem was introduced by Bruner and Lackner who showed that it can be solved in O(n^3) time. Cho et al. [CPM 2015] gave an O(n^2m) time algorithm and improved it to O(n^2 log m). In this paper we present a solution that uses only O(n^2) time. In general, there are instances where the output size is Omega(n^2) and hence our bound is optimal. To achieve our results, we introduce several new ideas including a novel reduction to 2D offline dominance counting. Our algorithm is surprisingly simple and straightforward to implement.

Cite as

Mika Amit, Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, and Hjalte Wedel Vildhøj. Boxed Permutation Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 20:1-20:11, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{amit_et_al:LIPIcs.CPM.2016.20,
  author =	{Amit, Mika and Bille, Philip and Hagge Cording, Patrick and Li G{\o}rtz, Inge and Wedel Vildh{\o}j, Hjalte},
  title =	{{Boxed Permutation Pattern Matching}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{20:1--20:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.20},
  URN =		{urn:nbn:de:0030-drops-60744},
  doi =		{10.4230/LIPIcs.CPM.2016.20},
  annote =	{Keywords: Permutation, Subsequence, Pattern Matching, Order Preserving, Boxed Mesh Pattern}
}
  • Refine by Author
  • 4 Bille, Philip
  • 3 Gørtz, Inge Li
  • 2 Vildhøj, Hjalte Wedel
  • 1 Amit, Mika
  • 1 Cording, Patrick Hagge
  • Show More...

  • Refine by Classification
  • 1 Theory of computation → Design and analysis of algorithms

  • Refine by Keyword
  • 2 LZ77
  • 2 Relative compression
  • 1 Boxed Mesh Pattern
  • 1 Lempel-Ziv compression
  • 1 Order Preserving
  • Show More...

  • Refine by Type
  • 4 document

  • Refine by Publication Year
  • 2 2016
  • 1 2017
  • 1 2023

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail