DROPS

Document

DOI: 10.4230/LIPIcs.CPM.2025.3

Counting on General Run-Length Grammars

Authors: Gonzalo Navarro and Alejandro Pacheco

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length m in a text of length n in time O(mlog^{2+ε} n), for any constant ε > 0 chosen at indexing time. This is the first solution to an open problem posed by Christiansen et al. [ACM TALG 2020] and enhances our abilities for computation over compressed data; we give an example application.

Cite as

Gonzalo Navarro and Alejandro Pacheco. Counting on General Run-Length Grammars. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 3:1-3:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{navarro_et_al:LIPIcs.CPM.2025.3,
  author =	{Navarro, Gonzalo and Pacheco, Alejandro},
  title =	{{Counting on General Run-Length Grammars}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{3:1--3:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.3},
  URN =		{urn:nbn:de:0030-drops-230977},
  doi =		{10.4230/LIPIcs.CPM.2025.3},
  annote =	{Keywords: Grammar-based indexing, Run-length context-free grammars, Counting pattern occurrences, Periods in strings}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.18

Improved Circular Dictionary Matching

Authors: Nicola Cotumaccio

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

The circular dictionary matching problem is an extension of the classical dictionary matching problem where every string in the dictionary is interpreted as a circular string: after reading the last character of a string, we can move back to its first character. The circular dictionary matching problem is motivated by applications in bioinformatics and computational geometry. In 2011, Hon et al. [ISAAC 2011] showed how to efficiently solve circular dictionary matching queries within compressed space by building on Mantaci et al.’s eBWT and Sadakane’s compressed suffix tree. The proposed solution is based on the assumption that the strings in the dictionary are all distinct and non-periodic, no string is a circular rotation of some other string, and the strings in the dictionary have similar lengths. In this paper, we consider arbitrary dictionaries, and we show how to solve circular dictionary matching queries in O((m + occ) log n) time within compressed space using n log σ (1 + o(1)) + O(n) + O(d log n) bits, where n is the total length of the dictionary, m is the length of the pattern, occ is the number of occurrences, d is the number of strings in the dictionary and σ is the size of the alphabet. Our solution is based on an extension of the suffix array to arbitrary dictionaries and a sampling mechanism for the LCP array of a dictionary inspired by recent results in graph indexing and compression.

Cite as

Nicola Cotumaccio. Improved Circular Dictionary Matching. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 18:1-18:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cotumaccio:LIPIcs.CPM.2025.18,
  author =	{Cotumaccio, Nicola},
  title =	{{Improved Circular Dictionary Matching}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{18:1--18:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.18},
  URN =		{urn:nbn:de:0030-drops-231122},
  doi =		{10.4230/LIPIcs.CPM.2025.18},
  annote =	{Keywords: Circular pattern matching, dictionary matching, suffix tree, compressed suffix tree, suffix array, LCP array, Burrows-Wheeler Transform, FM-index}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.21

Compressed Dictionary Matching on Run-Length Encoded Strings

Authors: Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

Given a set of pattern strings 𝒫 = {P₁, P₂,… P_k} and a text string S, the classic dictionary matching problem is to report all occurrences of each pattern in S. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let m and n be the total length of the patterns 𝒫 and the length of the text string S, respectively, and let ̅m and ̅n be the total number of runs in the run-length encoding of the patterns in 𝒫 and S, respectively. Our main result is an algorithm that achieves O(( ̅m + ̅n)log log m + occ) expected time, and O( ̅m) space, where occ is the total number of occurrences of patterns in S. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within an log log m factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.

Cite as

Philip Bille, Inge Li Gørtz, Simon J. Puglisi, and Simon R. Tarnow. Compressed Dictionary Matching on Run-Length Encoded Strings. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 21:1-21:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bille_et_al:LIPIcs.CPM.2025.21,
  author =	{Bille, Philip and G{\o}rtz, Inge Li and Puglisi, Simon J. and Tarnow, Simon R.},
  title =	{{Compressed Dictionary Matching on Run-Length Encoded Strings}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{21:1--21:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.21},
  URN =		{urn:nbn:de:0030-drops-231158},
  doi =		{10.4230/LIPIcs.CPM.2025.21},
  annote =	{Keywords: Dictionary matching, run-length encoding, compressed pattern matching}
}

Document

DOI: 10.4230/LIPIcs.STACS.2025.38

Two-Dimensional Longest Common Extension Queries in Compact Space

Authors: Arnab Ganguly, Daniel Gibney, Rahul Shah, and Sharma V. Thankachan

Published in: LIPIcs, Volume 327, 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)

Abstract

For a length n text over an alphabet of size σ, we can encode the suffix tree data structure in 𝒪(nlog σ) bits of space. It supports suffix array (SA), inverse suffix array (ISA), and longest common extension (LCE) queries in 𝒪(log^ε_σ n) time, which enables efficient pattern matching; here ε > 0 is an arbitrarily small constant. Further improvements are possible for LCE queries, where 𝒪(1) time queries can be achieved using an index of space 𝒪(nlog σ) bits. However, compactly indexing a two-dimensional text (i.e., an n× n matrix) has been a major open problem. We show progress in this direction by first presenting an 𝒪(n²log σ)-bit structure supporting LCE queries in near 𝒪((log_σ n)^{2/3}) time. We then present an 𝒪(n²log σ + n²log log n)-bit structure supporting ISA queries in near 𝒪(log n ⋅ (log_σ n)^{2/3}) time. Within a similar space, achieving SA queries in poly-logarithmic (even strongly sub-linear) time is a significant challenge. However, our 𝒪(n²log σ + n²log log n)-bit structure can support SA queries in 𝒪(n²/(σ log n)^c) time, where c is an arbitrarily large constant, which enables pattern matching in time faster than what is possible without preprocessing. We then design a repetition-aware data structure. The δ_2D compressibility measure for two-dimensional texts was recently introduced by Carfagna and Manzini [SPIRE 2023]. The measure ranges from 1 to n², with smaller δ_2D indicating a highly compressible two-dimensional text. The current data structure utilizing δ_2D allows only element access. We obtain the first structure based on δ_2D for LCE queries. It takes 𝒪^{~}(n^{5/3} + n^{8/5}δ_2D^{1/5}) space and answers queries in 𝒪(log n) time.

Cite as

Arnab Ganguly, Daniel Gibney, Rahul Shah, and Sharma V. Thankachan. Two-Dimensional Longest Common Extension Queries in Compact Space. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 38:1-38:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ganguly_et_al:LIPIcs.STACS.2025.38,
  author =	{Ganguly, Arnab and Gibney, Daniel and Shah, Rahul and Thankachan, Sharma V.},
  title =	{{Two-Dimensional Longest Common Extension Queries in Compact Space}},
  booktitle =	{42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)},
  pages =	{38:1--38:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-365-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{327},
  editor =	{Beyersdorff, Olaf and Pilipczuk, Micha{\l} and Pimentel, Elaine and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2025.38},
  URN =		{urn:nbn:de:0030-drops-228649},
  doi =		{10.4230/LIPIcs.STACS.2025.38},
  annote =	{Keywords: String matching, text indexing, two-dimensional text}
}

Document

DOI: 10.4230/LIPIcs.ESA.2016.31

Cell-Probe Lower Bounds for Bit Stream Computation

Authors: Raphaël Clifford, Markus Jalsenius, and Benjamin Sach

Published in: LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)

Abstract

We revisit the complexity of online computation in the cell probe model. We consider a class of problems where we are first given a fixed pattern F of n symbols and then one symbol arrives at a time in a stream. After each symbol has arrived we must output some function of F and the n-length suffix of the arriving stream. Cell probe bounds of Omega(delta lg n/w) have previously been shown for both convolution and Hamming distance in this setting, where delta is the size of a symbol in bits and w in Omega(lg n) is the cell size in bits. However, when delta is a constant, as it is in many natural situations, the existing approaches no longer give us non-trivial bounds. We introduce a lop-sided information transfer proof technique which enables us to prove meaningful lower bounds even for constant size input alphabets. Our new framework is capable of proving amortised cell probe lower bounds of Omega(lg^2 n/(w lg lg n)) time per arriving bit. We demonstrate this technique by showing a new lower bound for a problem known as pattern matching with address errors or the L_2-rearrangement distance problem. This gives the first non-trivial cell probe lower bound for any online problem on bit streams that still holds when the cell size is large.

Cite as

Raphaël Clifford, Markus Jalsenius, and Benjamin Sach. Cell-Probe Lower Bounds for Bit Stream Computation. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 31:1-31:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{clifford_et_al:LIPIcs.ESA.2016.31,
  author =	{Clifford, Rapha\"{e}l and Jalsenius, Markus and Sach, Benjamin},
  title =	{{Cell-Probe Lower Bounds for Bit Stream Computation}},
  booktitle =	{24th Annual European Symposium on Algorithms (ESA 2016)},
  pages =	{31:1--31:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-015-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{57},
  editor =	{Sankowski, Piotr and Zaroliagis, Christos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.31},
  URN =		{urn:nbn:de:0030-drops-63827},
  doi =		{10.4230/LIPIcs.ESA.2016.31},
  annote =	{Keywords: Cell-probe lower bounds, algorithms, data streaming}
}

Document

DOI: 10.4230/LIPIcs.STACS.2013.400

Parameterized Matching in the Streaming Model

Authors: Markus Jalsenius, Benny Porat, and Benjamin Sach

Published in: LIPIcs, Volume 20, 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)

Abstract

We study the problem of parameterized matching in a stream where we want to output matches between a pattern of length m and the last m symbols of the stream before the next symbol arrives. Parameterized matching is a natural generalisation of exact matching where an arbitrary one-to-one relabelling of pattern symbols is allowed. We show how this problem can be solved in constant time per arriving stream symbol and sublinear, near optimal space with high probability. Our results are surprising and important: it has been shown that almost no streaming pattern matching problems can be solved (not even randomised) in less than Theta(m) space, with exact matching as the only known problem to have a sublinear, near optimal space solution. Here we demonstrate that a similar sublinear, near optimal space solution is achievable for an even more challenging problem.

Cite as

Markus Jalsenius, Benny Porat, and Benjamin Sach. Parameterized Matching in the Streaming Model. In 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013). Leibniz International Proceedings in Informatics (LIPIcs), Volume 20, pp. 400-411, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)

Copy BibTex To Clipboard

@InProceedings{jalsenius_et_al:LIPIcs.STACS.2013.400,
  author =	{Jalsenius, Markus and Porat, Benny and Sach, Benjamin},
  title =	{{Parameterized Matching in the Streaming Model}},
  booktitle =	{30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013)},
  pages =	{400--411},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-50-7},
  ISSN =	{1868-8969},
  year =	{2013},
  volume =	{20},
  editor =	{Portier, Natacha and Wilke, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2013.400},
  URN =		{urn:nbn:de:0030-drops-39513},
  doi =		{10.4230/LIPIcs.STACS.2013.400},
  annote =	{Keywords: Pattern matching, streaming algorithms, randomized algorithms}
}

6 Search Results for "Sach, Benjamin"

Counting on General Run-Length Grammars

Abstract

Cite as

Improved Circular Dictionary Matching

Abstract

Cite as

Compressed Dictionary Matching on Run-Length Encoded Strings

Abstract

Cite as

Two-Dimensional Longest Common Extension Queries in Compact Space

Abstract

Cite as

Cell-Probe Lower Bounds for Bit Stream Computation

Abstract

Cite as

Parameterized Matching in the Streaming Model

Abstract

Cite as

Thanks for your feedback!

Could not send message