Search Results

Documents authored by Lipták, Zsuzsanna


Document
A Textbook Solution for Dynamic Strings

Authors: Zsuzsanna Lipták, Francesco Masillo, and Gonzalo Navarro

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)


Abstract
We consider the problem of maintaining a collection of strings while efficiently supporting splits and concatenations on them, as well as comparing two substrings, and computing the longest common prefix between two suffixes. This problem can be solved in optimal time O(log N) whp for the updates and O(1) worst-case time for the queries, where N is the total collection size [Gawrychowski et al., SODA 2018]. We present here a much simpler solution based on a forest of enhanced splay trees (FeST), where both the updates and the substring comparison take O(log n) amortized time, n being the lengths of the strings involved. The longest common prefix of length 𝓁 is computed in O(log n + log²𝓁) amortized time. Our query results are correct whp. Our simpler solution enables other more general updates in O(log n) amortized time, such as reversing a substring and/or mapping its symbols. We can also regard substrings as circular or as their omega extension.

Cite as

Zsuzsanna Lipták, Francesco Masillo, and Gonzalo Navarro. A Textbook Solution for Dynamic Strings. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 86:1-86:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.ESA.2024.86,
  author =	{Lipt\'{a}k, Zsuzsanna and Masillo, Francesco and Navarro, Gonzalo},
  title =	{{A Textbook Solution for Dynamic Strings}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{86:1--86:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.86},
  URN =		{urn:nbn:de:0030-drops-211576},
  doi =		{10.4230/LIPIcs.ESA.2024.86},
  annote =	{Keywords: dynamic strings, splay trees, dynamic data structures, LCP, circular strings}
}
Document
BAT-LZ out of hell

Authors: Zsuzsanna Lipták, Francesco Masillo, and Gonzalo Navarro

Published in: LIPIcs, Volume 296, 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)


Abstract
Despite consistently yielding the best compression on repetitive text collections, the Lempel-Ziv parsing has resisted all attempts at offering relevant guarantees on the cost to access an arbitrary symbol. This makes it less attractive for use on compressed self-indexes and other compressed data structures. In this paper we introduce a variant we call BAT-LZ (for Bounded Access Time Lempel-Ziv) where the access cost is bounded by a parameter given at compression time. We design and implement a linear-space algorithm that, in time O(nlog³ n), obtains a BAT-LZ parse of a text of length n by greedily maximizing each next phrase length. The algorithm builds on a new linear-space data structure that solves 5-sided orthogonal range queries in rank space, allowing updates to the coordinate where the one-sided queries are supported, in O(log³ n) time for both queries and updates. This time can be reduced to O(log² n) if O(nlog n) space is used. We design a second algorithm that chooses the sources for the phrases in a clever way, using an enhanced suffix tree, albeit no longer guaranteeing longest possible phrases. This algorithm is much slower in theory, but in practice it is comparable to the greedy parser, while achieving significantly superior compression. We then combine the two algorithms, resulting in a parser that always chooses the longest possible phrases, and the best sources for those. Our experimentation shows that, on most repetitive texts, our algorithms reach an access cost close to log₂ n on texts of length n, while incurring almost no loss in the compression ratio when compared with classical LZ-compression. Several open challenges are discussed at the end of the paper.

Cite as

Zsuzsanna Lipták, Francesco Masillo, and Gonzalo Navarro. BAT-LZ out of hell. In 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 296, pp. 21:1-21:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.CPM.2024.21,
  author =	{Lipt\'{a}k, Zsuzsanna and Masillo, Francesco and Navarro, Gonzalo},
  title =	{{BAT-LZ out of hell}},
  booktitle =	{35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)},
  pages =	{21:1--21:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-326-3},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{296},
  editor =	{Inenaga, Shunsuke and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2024.21},
  URN =		{urn:nbn:de:0030-drops-201317},
  doi =		{10.4230/LIPIcs.CPM.2024.21},
  annote =	{Keywords: Lempel-Ziv parsing, data compression, compressed data structures, repetitive text collections}
}
Document
Complete Volume
LIPIcs, Volume 259, CPM 2023, Complete Volume

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
LIPIcs, Volume 259, CPM 2023, Complete Volume

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1-472, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@Proceedings{bulteau_et_al:LIPIcs.CPM.2023,
  title =	{{LIPIcs, Volume 259, CPM 2023, Complete Volume}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1--472},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023},
  URN =		{urn:nbn:de:0030-drops-179536},
  doi =		{10.4230/LIPIcs.CPM.2023},
  annote =	{Keywords: LIPIcs, Volume 259, CPM 2023, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2023.0,
  author =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.0},
  URN =		{urn:nbn:de:0030-drops-179542},
  doi =		{10.4230/LIPIcs.CPM.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Suffix Sorting via Matching Statistics

Authors: Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

Cite as

Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi. Suffix Sorting via Matching Statistics. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 20:1-20:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.WABI.2022.20,
  author =	{Lipt\'{a}k, Zsuzsanna and Masillo, Francesco and Puglisi, Simon J.},
  title =	{{Suffix Sorting via Matching Statistics}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{20:1--20:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.20},
  URN =		{urn:nbn:de:0030-drops-170545},
  doi =		{10.4230/LIPIcs.WABI.2022.20},
  annote =	{Keywords: Generalized suffix array, matching statistics, string collections, compressed representation, data structures, efficient algorithms}
}
Document
A Theoretical and Experimental Analysis of BWT Variants for String Collections

Authors: Davide Cenzato and Zsuzsanna Lipták

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
The extended Burrows-Wheeler-Transform (eBWT), introduced by Mantaci et al. [Theor. Comput. Sci., 2007], is a generalization of the Burrows-Wheeler-Transform (BWT) to multisets of strings. While the original BWT is based on the lexicographic order, the eBWT uses the omega-order, which differs from the lexicographic order in important ways. A number of tools are available that compute the BWT of string collections; however, the data structures they generate in most cases differ from the one originally defined, as well as from each other. In this paper, we review the differences between these BWT variants, both from a theoretical and from a practical point of view, comparing them on several real-life datasets with different characteristics. We find that the differences can be extensive, depending on the dataset characteristics, and are largest on collections of many highly similar short sequences. The widely-used parameter r, the number of runs of the BWT, also shows notable variation between the different BWT variants; on our datasets, it varied by a multiplicative factor of up to 4.2.

Cite as

Davide Cenzato and Zsuzsanna Lipták. A Theoretical and Experimental Analysis of BWT Variants for String Collections. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{cenzato_et_al:LIPIcs.CPM.2022.25,
  author =	{Cenzato, Davide and Lipt\'{a}k, Zsuzsanna},
  title =	{{A Theoretical and Experimental Analysis of BWT Variants for String Collections}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.25},
  URN =		{urn:nbn:de:0030-drops-161529},
  doi =		{10.4230/LIPIcs.CPM.2022.25},
  annote =	{Keywords: Burrows-Wheeler-Transform, extended BWT, string collections, repetitiveness measures, r, compression}
}
Document
Pattern Discovery in Colored Strings

Authors: Zsuzsanna Lipták, Simon J. Puglisi, and Massimiliano Rossi

Published in: LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)


Abstract
We consider the problem of identifying patterns of interest in colored strings. A colored string is a string in which each position is colored with one of a finite set of colors. Our task is to find substrings that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically infer properties of the embedded system from the analysis of its simulation traces. We show that the number of interesting patterns is upper-bounded by 𝒪(n²) where n is the length of the string. We introduce a baseline algorithm with 𝒪(n²) running time which identifies all interesting patterns for all colors in the string satisfying certain minimality conditions. When one is interested in patterns related to only one color, we provide an algorithm that identifies patterns in 𝒪(n²log n) time, but is faster than the first algorithm in practice, both on simulated and on real-world patterns.

Cite as

Zsuzsanna Lipták, Simon J. Puglisi, and Massimiliano Rossi. Pattern Discovery in Colored Strings. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.SEA.2020.12,
  author =	{Lipt\'{a}k, Zsuzsanna and Puglisi, Simon J. and Rossi, Massimiliano},
  title =	{{Pattern Discovery in Colored Strings}},
  booktitle =	{18th International Symposium on Experimental Algorithms (SEA 2020)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-148-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{160},
  editor =	{Faro, Simone and Cantone, Domenico},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.12},
  URN =		{urn:nbn:de:0030-drops-120862},
  doi =		{10.4230/LIPIcs.SEA.2020.12},
  annote =	{Keywords: property testing, suffix tree, pattern mining}
}
Document
Reconstruction of Trees from Jumbled and Weighted Subtrees

Authors: Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
Let T be an edge-labeled graph, where the labels are from a finite alphabet Sigma. For a subtree U of T the Parikh vector of U is a vector of length |Sigma| which specifies the multiplicity of each label in U. We ask when T can be reconstructed from the multiset of Parikh vectors of all its subtrees, or all of its paths, or all of its maximal paths. We consider the analogous problems for weighted trees. We show how several well-known reconstruction problems on labeled strings, weighted strings and point sets on a line can be included in this framework. We present reconstruction algorithms and non-reconstructibility results, and extend the polynomial method, previously applied to jumbled strings [Acharya et al., SIAM J. on Discr. Math, 2015] and weighted strings [Bansal et al., CPM 2004], to deal with general trees and special tree classes.

Cite as

Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták. Reconstruction of Trees from Jumbled and Weighted Subtrees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 10:1-10:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bartha_et_al:LIPIcs.CPM.2016.10,
  author =	{Bartha, D\'{e}nes and Burcsi, Peter and Lipt\'{a}k, Zsuzsanna},
  title =	{{Reconstruction of Trees from Jumbled and Weighted Subtrees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.10},
  URN =		{urn:nbn:de:0030-drops-60861},
  doi =		{10.4230/LIPIcs.CPM.2016.10},
  annote =	{Keywords: trees, paths, Parikh vectors, reconstruction problems, homometric sets, polynomial method, jumbled strings, weighted strings}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail