3 Search Results for "Ganczorz, Michal"


Document
Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before

Authors: Michał Gańczorz

Published in: LIPIcs, Volume 154, 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)


Abstract
We propose new entropy measures for trees, the known ones are H_k(?), the k-th order (tree label) entropy (Ferragina at al. 2005), and tree entropy H(?) (Jansson et al. 2006), the former considers only the tree labels and the latter only tree shape. The proposed entropy measures, H_k(?|L) and H_k(L|?), exploit the relation between the labels and the tree shape. We prove that they lower bound label entropy and tree entropy, respectively, i.e. H_k(?|L) ≤ H(?) and H_k(L|?) ≤ H_k(L). Besides being theoretically superior, the new measures are significantly smaller in practice. We also propose a new succinct representation of labeled trees which represents a tree T using one of the following bounds: |T|(H(?) + H_k(L|?)) or |T|(H_k(?|L) + H_k(L)). The representation is based on a new, simple method of partitioning the tree, which preserves both tree shape and node degrees. The previous state-of-the-art method of compressing the tree achieved |T|(H(?) + H_k(L)) bits, by combining the results of Ferragina at al. 2005 and Jansson et al. 2006; so proposed representation is not worse and often superior. Moreover, our representation supports standard tree navigation in constant time as well as more complex queries. Such a structure achieving this space bounds was not known before: aforementioned solution only worked for compression alone, our structure is the first which achieves H_k(?) for k>0 and supports such queries. Lastly, our data structure is fairly simple, both conceptually and in terms of the implementation, moreover it uses known tools, which is a counter-argument to the claim that methods based on tree-partitioning are impractical.

Cite as

Michał Gańczorz. Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 22:1-22:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{ganczorz:LIPIcs.STACS.2020.22,
  author =	{Ga\'{n}czorz, Micha{\l}},
  title =	{{Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{22:1--22:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Paul, Christophe and Bl\"{a}ser, Markus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2020.22},
  URN =		{urn:nbn:de:0030-drops-118836},
  doi =		{10.4230/LIPIcs.STACS.2020.22},
  annote =	{Keywords: succinct data structures, labeled tree, ordered tree, entropy, tree entropy}
}
Document
Entropy Lower Bounds for Dictionary Compression

Authors: Michał Gańczorz

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
We show that a wide class of dictionary compression methods (including LZ77, LZ78, grammar compressors as well as parsing-based structures) require |S|H_k(S) + Omega (|S|k log sigma/log_sigma |S|) bits to encode their output. This matches known upper bounds and improves the information-theoretic lower bound of |S|H_k(S). To this end, we abstract the crucial properties of parsings created by those methods, construct a certain family of strings and analyze the parsings of those strings. We also show that for k = alpha log_sigma |S|, where 0 < alpha < 1 is a constant, the aforementioned methods produce an output of size at least 1/(1-alpha)|S|H_k(S) bits. Thus our results separate dictionary compressors from context-based one (such as PPM) and BWT-based ones, as the those include methods achieving |S|H_k(S) + O(sigma^k log sigma) bits, i.e. the redundancy depends on k and sigma but not on |S|.

Cite as

Michał Gańczorz. Entropy Lower Bounds for Dictionary Compression. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 11:1-11:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{ganczorz:LIPIcs.CPM.2019.11,
  author =	{Ga\'{n}czorz, Micha{\l}},
  title =	{{Entropy Lower Bounds for Dictionary Compression}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{11:1--11:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.11},
  URN =		{urn:nbn:de:0030-drops-104822},
  doi =		{10.4230/LIPIcs.CPM.2019.11},
  annote =	{Keywords: compression, empirical entropy, parsing, lower bounds}
}
Document
Edit Distance with Block Operations

Authors: Michal Ganczorz, Pawel Gawrychowski, Artur Jez, and Tomasz Kociumaka

Published in: LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)


Abstract
We consider the problem of edit distance in which block operations are allowed, i.e. we ask for the minimal number of (block) operations that are needed to transform a string s to t. We give O(log n) approximation algorithms, where n is the total length of the input strings, for the variants of the problem which allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (involution is a generalisation of the reversal). Previously, algorithms only for the first and last variant were known, and they had approximation ratios O(log n log^*n) and O(log n (log^*n)^2), respectively. The edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s, t and the goal is to partition s into minimal number of parts such that they can be permuted in order to obtain t. Thus we also obtain an O(log n) approximation for this problem (compared to the previous O(log n log^* n)). The results use a simplification of the previously used technique of locally consistent parsing, which groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on a deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for block copy or block delete) use a similar algorithm, but the analysis is based on a specially tuned combinatorial function on sets of numbers.

Cite as

Michal Ganczorz, Pawel Gawrychowski, Artur Jez, and Tomasz Kociumaka. Edit Distance with Block Operations. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{ganczorz_et_al:LIPIcs.ESA.2018.33,
  author =	{Ganczorz, Michal and Gawrychowski, Pawel and Jez, Artur and Kociumaka, Tomasz},
  title =	{{Edit Distance with Block Operations}},
  booktitle =	{26th Annual European Symposium on Algorithms (ESA 2018)},
  pages =	{33:1--33:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-081-1},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{112},
  editor =	{Azar, Yossi and Bast, Hannah and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.33},
  URN =		{urn:nbn:de:0030-drops-94963},
  doi =		{10.4230/LIPIcs.ESA.2018.33},
  annote =	{Keywords: Edit distance, Block operations, Common string partition}
}
  • Refine by Author
  • 2 Gańczorz, Michał
  • 1 Ganczorz, Michal
  • 1 Gawrychowski, Pawel
  • 1 Jez, Artur
  • 1 Kociumaka, Tomasz

  • Refine by Classification
  • 1 Theory of computation → Approximation algorithms analysis
  • 1 Theory of computation → Data compression
  • 1 Theory of computation → Design and analysis of algorithms

  • Refine by Keyword
  • 1 Block operations
  • 1 Common string partition
  • 1 Edit distance
  • 1 compression
  • 1 empirical entropy
  • Show More...

  • Refine by Type
  • 3 document

  • Refine by Publication Year
  • 1 2018
  • 1 2019
  • 1 2020

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail