LIPIcs, Volume 369

37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)

Part of: Series: Leibniz International Proceedings in Informatics (LIPIcs)
Part of: Conference: Annual Symposium on Combinatorial Pattern Matching (CPM)

Event

CPM 2026, Copenhagen, Denmark, June 15-17, 2026

Editors

Philip Bille

Technical University of Denmark, Lyngby, Denmark

Nicola Prezza

DAIS, Ca' Foscari University of Venice, Italy

Publication Details

published at: 2026-06-08
Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
ISBN: 978-3-95977-420-8

Documents

Document

Complete Volume

DOI: 10.4230/LIPIcs.CPM.2026

LIPIcs, Volume 369, CPM 2026, Complete Volume

Authors: Philip Bille and Nicola Prezza

Abstract

LIPIcs, Volume 369, CPM 2026, Complete Volume

Cite as

37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 1-644, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@Proceedings{bille_et_al:LIPIcs.CPM.2026,
  title =	{{LIPIcs, Volume 369, CPM 2026, Complete Volume}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{1--644},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026},
  URN =		{urn:nbn:de:0030-drops-262662},
  doi =		{10.4230/LIPIcs.CPM.2026},
  annote =	{Keywords: LIPIcs, Volume 369, CPM 2026, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.CPM.2026.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Philip Bille and Nicola Prezza

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{bille_et_al:LIPIcs.CPM.2026.0,
  author =	{Bille, Philip and Prezza, Nicola},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{0:i--0:xviii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.0},
  URN =		{urn:nbn:de:0030-drops-262653},
  doi =		{10.4230/LIPIcs.CPM.2026.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.1

Hamming Distance Oracles

Authors: Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, and Ely Porat

Abstract

In this paper, we present and study the Hamming distance oracle problem. In this problem, the task is to preprocess two strings S and T of lengths n and m, respectively, into a data structure that returns the Hamming distance between a substring of S and an equal-length substring of T. For strings over a constant-size alphabet, we show that for every x ≤ min{n,m} there is a data structure with Õ(nm/x) preprocessing time and O(x) query time. We also provide a conditional lower bound, showing that for every ε > 0 there is no combinatorial data structure with query time O(x) and preprocessing time O((nm/x)^{1−ε}) unless combinatorial fast matrix multiplication is possible. For strings over a general alphabet, we present a data structure with Õ(nm/√x) preprocessing time and O(x) query time for every x ≤ min{n,m}. Moreover, for every ε > 0 we provide a data structure with Õ((n+m)/ε³) preprocessing time that returns, with high probability, a (1±ε)-approximation of the Hamming distance between two input substrings. The query time of the approximation data structure is Õ(1/ε²).
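To make the query concrete, the following Python sketch (all names are hypothetical, chosen for illustration) contrasts two trivial extremes: answering each query by direct comparison in O(ℓ) time with no preprocessing, and storing one prefix-sum array of mismatches per diagonal of the S × T comparison grid, which costs Θ(nm) time and space but answers any query in O(1) time. The structures described above trade off between such extremes; this is not their construction.

```python
def hamming_naive(S, T, i, j, length):
    """Hamming distance between S[i:i+length] and T[j:j+length] in O(length) time."""
    return sum(1 for a, b in zip(S[i:i + length], T[j:j + length]) if a != b)


class DiagonalPrefixOracle:
    """Theta(nm) preprocessing, O(1) query: mismatch prefix sums along each diagonal."""

    def __init__(self, S, T):
        n, m = len(S), len(T)
        self.diag = {}  # diagonal d = j - i  ->  (first row i0, prefix sums of mismatches)
        for d in range(-(n - 1), m):
            i0 = max(0, -d)
            sums = [0]
            i, j = i0, i0 + d
            while i < n and j < m:
                sums.append(sums[-1] + (S[i] != T[j]))
                i += 1
                j += 1
            self.diag[d] = (i0, sums)

    def query(self, i, j, length):
        """Hamming distance between S[i:i+length] and T[j:j+length] (0-based)."""
        if length == 0:
            return 0
        i0, sums = self.diag[j - i]
        start = i - i0
        return sums[start + length] - sums[start]


oracle = DiagonalPrefixOracle("abcab", "abdab")
print(oracle.query(0, 0, 5))  # 1
```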

Cite as

Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus, and Ely Porat. Hamming Distance Oracles. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 1:1-1:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{boneh_et_al:LIPIcs.CPM.2026.1,
  author =	{Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan and Porat, Ely},
  title =	{{Hamming Distance Oracles}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{1:1--1:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.1},
  URN =		{urn:nbn:de:0030-drops-259278},
  doi =		{10.4230/LIPIcs.CPM.2026.1},
  annote =	{Keywords: Hamming distance, Fine-grained complexity, Data structure, Oracle}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.2

Near-Real-Time Solutions for Online String Problems

Authors: Dominik Köppl and Gregory Kucherov

Abstract

Based on the Breslauer-Italiano online suffix tree construction algorithm (2013), which has double-logarithmic worst-case guarantees on the update time per letter, we develop near-real-time algorithms for several classical problems on strings, including the computation of the longest repeating suffix array, the (reversed) Lempel-Ziv 77 factorization, and the maintenance of minimal unique substrings, all in an online manner. Our solutions improve over the best known running times for these problems in terms of the worst-case time per letter, for which we achieve poly-log-logarithmic time complexity within linear space. The best previously known results for these problems require poly-logarithmic time per letter or provide only amortized bounds. As a result of independent interest, we give conversions between the longest previous factor array and the longest repeating suffix array, with space and time bounds based on their irreducible representations, which can have sizes sublinear in the length of the input string.
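For readers less familiar with the arrays named above, the following offline Python sketch computes the longest previous factor (LPF) array by brute force and reads off the LZ77 factorization from it: the phrase starting at position i has length max(1, LPF[i]), in the self-referential variant where a phrase may overlap its source (one of several LZ77 conventions, assumed here). It takes cubic time in the worst case and only pins down definitions; it is not the online algorithm of the paper, and the function names are hypothetical.

```python
def lpf_naive(T):
    """LPF[i] = length of the longest prefix of T[i:] that also starts at some j < i."""
    n = len(T)
    lpf = [0] * n
    for i in range(n):
        for j in range(i):
            length = 0
            while i + length < n and T[j + length] == T[i + length]:
                length += 1
            lpf[i] = max(lpf[i], length)
    return lpf


def lz77_phrases(T):
    """Greedy LZ77 factorization: each phrase is the longest previous factor, or one new letter."""
    lpf = lpf_naive(T)
    phrases, i = [], 0
    while i < len(T):
        length = max(1, lpf[i])
        phrases.append(T[i:i + length])
        i += length
    return phrases


print(lpf_naive("abababb"))     # [0, 0, 4, 3, 2, 1, 1]
print(lz77_phrases("abababb"))  # ['a', 'b', 'abab', 'b']
```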

Cite as

Dominik Köppl and Gregory Kucherov. Near-Real-Time Solutions for Online String Problems. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 2:1-2:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{koppl_et_al:LIPIcs.CPM.2026.2,
  author =	{K\"{o}ppl, Dominik and Kucherov, Gregory},
  title =	{{Near-Real-Time Solutions for Online String Problems}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{2:1--2:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.2},
  URN =		{urn:nbn:de:0030-drops-259287},
  doi =		{10.4230/LIPIcs.CPM.2026.2},
  annote =	{Keywords: online algorithms, string algorithms, suffix tree, real-time computation, Lempel-Ziv factorization, minimal unique substrings}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.3

Computing k-mers in Graphs

Authors: Jarno N. Alanko and Máximo Pérez-López

Abstract

We initiate the study of computational problems on k-mers (strings of length k) in labeled graphs. As a starting point, we consider the problem of counting the number of distinct k-mers found on the walks of a graph. We establish that this is #P-hard, even on connected deterministic DAGs. However, in the class of deterministic Wheeler graphs (Gagie, Manzini, and Sirén, TCS 2017), we show that the distinct k-mers of such a graph W = (V, E) can be counted using O(|W|k) or O(n⁴ log k) arithmetic operations, where n = |V|, m = |E|, and |W| = n + m. The latter result uses a new generalization of the technique of prefix doubling to Wheeler graphs. To generalize our results beyond Wheeler graphs, we discuss ways to transform a graph into a Wheeler graph in a manner that preserves the k-mers. As an application of our k-mer counting algorithms, we construct a representation of the de Bruijn graph of the k-mers that occupies O(n_k + |W|k log(max_{1 ≤ 𝓁 ≤ k} n_𝓁) + σ log m) bits of space, where n_𝓁 is the number of distinct 𝓁-mers in the Wheeler graph, and σ is the size of the alphabet. We show how to construct it within the same time complexity. Given that the Wheeler graph can be exponentially smaller than the de Bruijn graph, for large k this provides a theoretical improvement over previous methods that construct the de Bruijn graph from a graph, which must spend Ω(k) time per k-mer in the graph.
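The counting problem itself can be stated as a short brute-force Python sketch, assuming (as for Wheeler graphs) that labels sit on edges and that a k-mer is the string spelled by a walk of exactly k edges. It propagates, for every node, the set of strings spelled by walks ending there; these sets can grow exponentially with k, so it is only usable on tiny inputs. It is not the paper's Wheeler-graph algorithm, and the function name is hypothetical.

```python
from collections import defaultdict


def distinct_kmers(num_nodes, edges, k):
    """Set of strings spelled by walks of exactly k edges.

    edges: iterable of (source, target, label) triples with single-letter labels.
    """
    edges = list(edges)
    ending_at = {v: {""} for v in range(num_nodes)}  # walks of length 0
    for _ in range(k):
        extended = defaultdict(set)
        for u, v, label in edges:
            for s in ending_at.get(u, ()):
                extended[v].add(s + label)
        ending_at = extended
    return set().union(*ending_at.values())


# 0 -a-> 1 -b-> 2 -a-> 1 (a cycle through nodes 1 and 2)
g = [(0, 1, "a"), (1, 2, "b"), (2, 1, "a")]
print(sorted(distinct_kmers(3, g, 2)))  # ['ab', 'ba']
print(len(distinct_kmers(3, g, 3)))     # 2
```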

Cite as

Jarno N. Alanko and Máximo Pérez-López. Computing k-mers in Graphs. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 3:1-3:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{alanko_et_al:LIPIcs.CPM.2026.3,
  author =	{Alanko, Jarno N. and P\'{e}rez-L\'{o}pez, M\'{a}ximo},
  title =	{{Computing k-mers in Graphs}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{3:1--3:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.3},
  URN =		{urn:nbn:de:0030-drops-259294},
  doi =		{10.4230/LIPIcs.CPM.2026.3},
  annote =	{Keywords: Wheeler graph, Wheeler language, de Bruijn graph, graph, k-mer, q-gram, DFA, #P-hard}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.4

Compact Representation of Maximal Palindromes

Authors: Takuya Mieno

Abstract

Palindromes are strings that read the same forward and backward. Computing palindromic structures in strings is a fundamental problem in string algorithms, motivated by potential applications in formal language theory and bioinformatics. Although the number of palindromic factors in a string of length n can be quadratic, they can be represented implicitly in O(n log n) bits of space by storing the lengths of all maximal palindromes in an integer array, which can be computed in O(n) time [Manacher, 1975]. In this paper, we propose a novel O(n)-bit representation of all maximal palindromes in a string, which enables O(1)-time retrieval of the length of the maximal palindrome centered at any given position. The data structure can be constructed in O(n) time from the input string of length n. Since Manacher’s algorithm and the notion of maximal palindromes are widely used for solving numerous problems involving palindromic structures, our compact representation will accelerate the development of more space-efficient solutions to such problems. Indeed, as the first application of our compact representation of maximal palindromes, we present a data structure of size O(n) bits that can compute the longest palindrome appearing in any given factor of a string of length n in O(log n) time.
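The O(n log n)-bit baseline mentioned above is the array computed by Manacher's algorithm. A compact Python sketch follows; it interleaves a separator between letters so that odd- and even-length palindromes are handled uniformly, and returns, for each of the 2n − 1 centers, the length of the maximal palindrome there. This is the uncompressed array that a compact representation would replace, not the paper's O(n)-bit structure, and the function name is hypothetical.

```python
def maximal_palindromes(s):
    """Lengths of maximal palindromes at all 2n - 1 centers of s.

    Entry 2p is centered on letter s[p]; entry 2p + 1 is centered between s[p] and s[p + 1].
    """
    t = [None]                      # separators (None) between and around letters
    for ch in s:
        t += [ch, None]
    m = len(t)
    rad = [0] * m                   # rad[c]: largest r with t[c - r .. c + r] a palindrome
    center = right = 0              # palindrome reaching furthest right so far
    for c in range(m):
        r = min(rad[2 * center - c], right - c) if c < right else 0
        while c - r - 1 >= 0 and c + r + 1 < m and t[c - r - 1] == t[c + r + 1]:
            r += 1
        rad[c] = r
        if c + r > right:
            center, right = c, c + r
    return rad[1:-1]                # in t, the radius equals the palindrome length in s


print(maximal_palindromes("abba"))  # [1, 0, 1, 4, 1, 0, 1]
```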

Cite as

Takuya Mieno. Compact Representation of Maximal Palindromes. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{mieno:LIPIcs.CPM.2026.4,
  author =	{Mieno, Takuya},
  title =	{{Compact Representation of Maximal Palindromes}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{4:1--4:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.4},
  URN =		{urn:nbn:de:0030-drops-259304},
  doi =		{10.4230/LIPIcs.CPM.2026.4},
  annote =	{Keywords: palindromes, succinct data structures, internal queries}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.5

Efficient Grammar Compression via RLZ-Based RePair

Authors: Rahul Varki, Travis Gagie, and Christina Boucher

Abstract

Among grammar-based compression techniques, RePair is a notable offline encoding scheme known for its simplicity and powerful combinatorial properties, producing compact grammars by repeatedly replacing the most frequent adjacent pairs of symbols, known as bigrams. However, RePair’s memory usage scales poorly with input size, as it loads the entire text into memory. In contrast, Relative Lempel-Ziv (RLZ) parsing offers a scalable and lightweight online encoding scheme that losslessly represents a text in terms of phrases that refer to a reference string, but it often fails to expose deeper structural patterns. We introduce an algorithm that produces a RePair grammar from the RLZ parse of the input, leveraging the strengths of both methods. Our method, RLZ-RePair, performs bigram replacements systematically, preserving the integrity of the RLZ phrases throughout the RePair iterations. When the reference is well chosen, our method achieves the same grammar as standard RePair while significantly reducing both memory usage and the number of bigram replacements. In particular, we show that RLZ-RePair can reduce memory usage by more than 80% while incurring only a modest runtime increase compared to RePair. To our knowledge, RLZ-RePair is one of the first scalable methods that constructs exact RePair grammars, resulting in a grammar-based compressor that is both practical for large datasets and faithful to the theoretical elegance of RePair.
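For reference, classic RePair can be sketched in a few lines of Python; note that it keeps the whole sequence in memory, which is the bottleneck discussed above. The sketch recounts pairs from scratch in every round (so it is far slower than linear-time RePair implementations), counts non-overlapping occurrences greedily from left to right, and breaks ties by first occurrence. These choices, the nonterminal naming, and all function names are assumptions for illustration; this is not RLZ-RePair.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts, last_start = Counter(), {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if i >= last_start.get(pair, -2) + 2:  # does not overlap the previously counted one
            counts[pair] += 1
            last_start[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` greedily from left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (final sequence, rules); nonterminals are tuples ("R", index)."""
    seq, rules = list(text), {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair = max(counts, key=counts.__getitem__)  # first most frequent pair
        if counts[pair] < 2:
            break
        symbol = ("R", len(rules))
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a grammar symbol back into the text it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


seq, rules = repair("abababab")
print(len(seq), len(rules))                    # 2 2
print("".join(expand(x, rules) for x in seq))  # abababab
```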

Cite as

Rahul Varki, Travis Gagie, and Christina Boucher. Efficient Grammar Compression via RLZ-Based RePair. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{varki_et_al:LIPIcs.CPM.2026.5,
  author =	{Varki, Rahul and Gagie, Travis and Boucher, Christina},
  title =	{{Efficient Grammar Compression via RLZ-Based RePair}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.5},
  URN =		{urn:nbn:de:0030-drops-259310},
  doi =		{10.4230/LIPIcs.CPM.2026.5},
  annote =	{Keywords: RePair, RLZ, Grammar Compression}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.6

Improved Bounds on the Maximum Number of Distinct Squares in Circular Words

Authors: Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba

Abstract

We investigate the asymptotic growth of the function CS(n), which maps n to the maximum number of distinct squares in a circular word of length n (that is, the maximum number of distinct squares of length at most n in a word ww of length 2n). We improve upon the lower bound of 1.25n established by Amit and Gawrychowski [SPIRE 2017] and the straightforward upper bound of 2n, which follows from the recent result of Brlek and Li [Comb. Theory, 2025] stating that there are fewer than n distinct squares in standard (i.e., non-circular) words of length n. (Previously, Amit and Gawrychowski gave an upper bound of (32/15)n using a weaker upper bound on squares in standard words.) Specifically, we show that CS(n) ≤ ⌈1.8n⌉ and that, for infinitely many n, CS(n) ≥ 1.5n − 𝒪(√n). For the lower bound, we exploit the combinatorial structure of Fibonacci words to construct a family of square-rich circular words. For the upper bound, we exploit density properties of the starting positions of long squares, adapting an approach of Amit and Gawrychowski.
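The quantity CS(n) can be made concrete with a brute-force Python sketch that follows the definition above: for a word w of length n it collects the distinct squares uu with |uu| ≤ n occurring in ww (starting positions 0, …, n − 1 suffice, because ww has period n). Maximizing over all words of length n over a fixed alphabet gives a lower bound on CS(n); the enumeration is exponential in n, so it is only usable for very small n. Function names are hypothetical.

```python
from itertools import product


def circular_distinct_squares(w):
    """Number of distinct squares uu with |uu| <= len(w) that occur in w + w."""
    n = len(w)
    ww = w + w
    squares = set()
    for start in range(n):                 # ww has period n, so these starts suffice
        for half in range(1, n // 2 + 1):  # |uu| = 2 * half <= n
            u = ww[start:start + half]
            if ww[start + half:start + 2 * half] == u:
                squares.add(u + u)
    return len(squares)


def max_circular_squares(n, alphabet="ab"):
    """Maximum of circular_distinct_squares over all words of length n over `alphabet`."""
    return max(circular_distinct_squares("".join(p)) for p in product(alphabet, repeat=n))


print(circular_distinct_squares("abab"))  # 2 ("abab" and "baba")
```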

Cite as

Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Improved Bounds on the Maximum Number of Distinct Squares in Circular Words. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{charalampopoulos_et_al:LIPIcs.CPM.2026.6,
  author =	{Charalampopoulos, Panagiotis and Mohamed, Manal and Radoszewski, Jakub and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor},
  title =	{{Improved Bounds on the Maximum Number of Distinct Squares in Circular Words}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{6:1--6:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.6},
  URN =		{urn:nbn:de:0030-drops-259325},
  doi =		{10.4230/LIPIcs.CPM.2026.6},
  annote =	{Keywords: circular words, squares, repetitions}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.7

Optimal Structure for Prefix-Substring Queries

Authors: Paweł Gawrychowski, Florin Manea, and Jonas Richardsen

Abstract

The prefix-substring matching problem [Gu, Farach, and Beigel, SODA 1994] consists of preprocessing a string s of length n for the following queries: given a triple (i, j, k) ∈ {0, … , |s|}³ with 1 ≤ j ≤ k, representing a prefix s[1:i] and a substring s[j:k] of s, find the longest prefix of s that is a suffix of s[1:i]s[j:k]. This is a useful primitive in, e.g., dynamic text indexing, compressed pattern matching, and pattern matching on block graphs. The border tree uses some basic periodicity properties to answer such queries in 𝒪(log n) time after 𝒪(n)-time preprocessing of s. We design a linear-space structure that answers such queries in constant time after 𝒪(n)-time preprocessing of s over a polynomial alphabet, which is worst-case optimal.
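The query semantics can be pinned down with a simple Python reference implementation based on the Knuth-Morris-Pratt failure function (a choice made for this sketch, not the border tree or the paper's structure). Because s[1:i] is itself a prefix of s, the KMP automaton can start in state i and then read s[j:k]; the answer is the state reached. Indices are 1-based and inclusive as in the abstract, and each query takes O(|s|) time in the worst case rather than constant time. Function names are hypothetical.

```python
def failure_function(s):
    """pi[q] = length of the longest proper border of s[:q + 1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        q = pi[i - 1]
        while q > 0 and s[i] != s[q]:
            q = pi[q - 1]
        if s[i] == s[q]:
            q += 1
        pi[i] = q
    return pi


def prefix_substring_query(s, pi, i, j, k):
    """Length of the longest prefix of s that is a suffix of s[1:i] s[j:k] (1-based, inclusive)."""
    q = i  # s[1:i] is a prefix of s, so the automaton is in state i after reading it
    for c in s[j - 1:k]:
        while q > 0 and (q == len(s) or s[q] != c):
            q = pi[q - 1]  # fall back to the longest border that might still extend
        if q < len(s) and s[q] == c:
            q += 1
    return q


s = "abaab"
pi = failure_function(s)
print(prefix_substring_query(s, pi, 3, 4, 5))  # 5: "aba" + "ab" = "abaab"
print(prefix_substring_query(s, pi, 0, 2, 3))  # 1: "ba" ends with "a"
```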

Cite as

Paweł Gawrychowski, Florin Manea, and Jonas Richardsen. Optimal Structure for Prefix-Substring Queries. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{gawrychowski_et_al:LIPIcs.CPM.2026.7,
  author =	{Gawrychowski, Pawe{\l} and Manea, Florin and Richardsen, Jonas},
  title =	{{Optimal Structure for Prefix-Substring Queries}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{7:1--7:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.7},
  URN =		{urn:nbn:de:0030-drops-259333},
  doi =		{10.4230/LIPIcs.CPM.2026.7},
  annote =	{Keywords: Border Tree, Prefix-Substring Query, Data Structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.8

Constant Multiplicative Sensitivity on the CDAWGs

Authors: Rikuya Hamai, Hiroto Fujimaru, and Shunsuke Inenaga

Abstract

Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string T is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string T, and thus CDAWGs are a compact indexing structure. Indeed, the CDAWG size 𝖾 can be sublinear in the length n of T for some highly repetitive strings. Among its various applications, the CDAWG allows for computing pattern occurrences, maximal exact matches (MEMs), minimal absent words (MAWs), and minimal unique substrings (MUSs) in optimal time using O(𝖾) space. For designing space-efficient data storage, it is crucial that the underlying data structure is robust against data edits and errors. As a mathematical measure of this, the notion of compression sensitivity [Akagi et al. 2023] was introduced as the maximum increase in the size of a compressed data structure after edit operations. In this paper, we investigate the sensitivity of CDAWGs when a single-character edit operation is performed at an arbitrary position in the input string T. We show that the size of the CDAWG after an edit operation on T is asymptotically at most 8 times larger than the original CDAWG before the edit. This O(1) upper bound significantly improves on the only known upper bound O(n/log n) for the problem.

Cite as

Rikuya Hamai, Hiroto Fujimaru, and Shunsuke Inenaga. Constant Multiplicative Sensitivity on the CDAWGs. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 8:1-8:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{hamai_et_al:LIPIcs.CPM.2026.8,
  author =	{Hamai, Rikuya and Fujimaru, Hiroto and Inenaga, Shunsuke},
  title =	{{Constant Multiplicative Sensitivity on the CDAWGs}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{8:1--8:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.8},
  URN =		{urn:nbn:de:0030-drops-259345},
  doi =		{10.4230/LIPIcs.CPM.2026.8},
  annote =	{Keywords: string data structures, maximal repeats, data compression, compression sensitivity, CDAWGs}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.9

Indexing and Encoding Arrays for Element Distinctness Queries

Authors: Johannes Fischer and Filippo Lari

Abstract

We introduce the data structure variant of the well-known element distinctness problem. Given an array of n elements, the goal is to preprocess the array into a data structure that supports queries asking whether all elements within a given query range are distinct. This has applications in text indexing and possibly also in other algorithmic domains. In the indexing model (where access to the input array is allowed), we design a data structure using O((n log b)/b) bits and answering queries in the time needed to solve an online element distinctness instance of size O(b), for any b ≥ 1. As a concrete instantiation of this, there exists an index that answers queries in O(log log log n) time using O(n log²(log log log n)/log log log n) bits of additional space. Moving to the encoding model (where access to the input array is not allowed), we begin by proving an information-theoretic lower bound of 2n − O(log n) bits on the space usage, and then design a matching encoding with O(1)-time queries. We then consider the case in which the alphabet size σ is constant. In this setting, the lower bound can be refined to n log r_σ − 3 log(σ+2) + O(1) bits, where r_σ = 4cos²(π/(σ+2)). This lower bound is matched by an encoding with O(1)-time queries.
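As a point of comparison for the space bounds above, the following Python sketch is a simple O(n)-word (that is, O(n log n)-bit) structure with O(1)-time queries that keeps access to nothing but its own array. It stores, for each right end r, the smallest left end for which the range ending at r is duplicate-free; a range [l, r] is then distinct exactly when l ≥ left[r]. It is a simple baseline, not the paper's index or encoding, and the names are hypothetical.

```python
def build_left_ends(a):
    """left[r] = smallest l such that a[l..r] (0-based, inclusive) has no repeated element."""
    last_seen, left, lo = {}, [], 0
    for r, x in enumerate(a):
        if x in last_seen:
            lo = max(lo, last_seen[x] + 1)  # a[l..r] must exclude the previous copy of x
        last_seen[x] = r
        left.append(lo)
    return left


def all_distinct(left, l, r):
    """True iff a[l..r] has pairwise distinct elements (requires l <= r)."""
    return l >= left[r]


left = build_left_ends([3, 1, 4, 1, 5, 9, 2, 6, 5])
print(left)                      # [0, 0, 0, 2, 2, 2, 2, 2, 5]
print(all_distinct(left, 2, 7))  # True
print(all_distinct(left, 2, 8))  # False
```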

Cite as

Johannes Fischer and Filippo Lari. Indexing and Encoding Arrays for Element Distinctness Queries. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 9:1-9:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{fischer_et_al:LIPIcs.CPM.2026.9,
  author =	{Fischer, Johannes and Lari, Filippo},
  title =	{{Indexing and Encoding Arrays for Element Distinctness Queries}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{9:1--9:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.9},
  URN =		{urn:nbn:de:0030-drops-259350},
  doi =		{10.4230/LIPIcs.CPM.2026.9},
  annote =	{Keywords: element distinctness, range queries, lower bounds, succinct data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.10

R-Enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies

Authors: Kotaro Kimura and Tomohiro I

Abstract

A repeat is a substring that occurs at least twice in a string, and is called a maximal repeat if it cannot be extended outwards without reducing its frequency. Nishimoto and Tabei [CPM, 2021] proposed r-enum, an algorithm to enumerate various characteristic substrings, including maximal repeats, in a string T of length n in O(r) words of compressed working space, where r ≤ n is the number of runs in the Burrows-Wheeler transform (BWT) of T. Given the run-length encoded BWT (RLBWT) of T, r-enum runs in O(n log log_w (n/r)) time plus time linear in the number of output strings, where w = Θ(log n) is the word size. In this paper, we first improve the O(n log log_w (n/r)) term to O(n). We next extend r-enum to compute other context-sensitive repeats such as near-supermaximal repeats (NSMRs) and supermaximal repeats, as well as the context diversity of every maximal repeat, with the same complexities. Furthermore, we study net occurrences: an occurrence of a repeat is called a net occurrence if it is not covered by another repeat, and the net frequency of a repeat is the number of its net occurrences. With this terminology, an NSMR is a repeat with a positive net frequency. Given the RLBWT of T, we show how to compute the set 𝒮^{nsmr} of all NSMRs in T together with their net frequencies/occurrences in O(n) time and O(r) space. We also show that an O(r)-space data structure can be built from the RLBWT to compute the net frequency/occurrences of any pattern in optimal time. The data structure is built in O(r) space and in O(n) time with high probability, or in deterministic O(n + |𝒮^{nsmr}| log log min(σ, |𝒮^{nsmr}|)) time, where σ ≤ r is the alphabet size of T. To achieve this, we prove that the total number of net occurrences is less than 2r. With the duality between net occurrences and minimal unique substrings (MUSs), we obtain a new upper bound of 2r on the number of MUSs in T, which may be of independent interest.
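The frequency-based definition of a maximal repeat given above translates directly into a brute-force Python sketch: a substring x occurring at least twice is maximal when every one-letter extension cx or xc occurs strictly fewer times. It enumerates all Θ(n²) substrings of the plain text, so it only serves to check definitions on small strings; it is not r-enum and does not use the RLBWT. Function names are hypothetical.

```python
from collections import Counter


def substring_frequencies(T):
    """Number of occurrences of every non-empty substring of T."""
    freq = Counter()
    for i in range(len(T)):
        for j in range(i + 1, len(T) + 1):
            freq[T[i:j]] += 1
    return freq


def maximal_repeats(T):
    """Repeats x such that every extension cx and xc occurs strictly fewer times than x."""
    freq = substring_frequencies(T)
    alphabet = set(T)
    result = []
    for x, f in freq.items():
        if f < 2:
            continue
        # Counter returns 0 for missing keys without inserting them
        if all(freq[c + x] < f and freq[x + c] < f for c in alphabet):
            result.append(x)
    return sorted(result)


print(maximal_repeats("abaab"))   # ['a', 'ab']
print(maximal_repeats("abcabc"))  # ['abc']
```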

Cite as

Kotaro Kimura and Tomohiro I. R-Enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{kimura_et_al:LIPIcs.CPM.2026.10,
  author =	{Kimura, Kotaro and I, Tomohiro},
  title =	{{R-Enum Revisited: Speedup and Extension for Context-Sensitive Repeats and Net Frequencies}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{10:1--10:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.10},
  URN =		{urn:nbn:de:0030-drops-259361},
  doi =		{10.4230/LIPIcs.CPM.2026.10},
  annote =	{Keywords: Supermaximal repeats, Largest maximal repeats, Net frequencies, Run-length Burrows-Wheeler transform, Compressed data mining}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.11

Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms

Authors: Yuya Uezato

Abstract

Pattern languages (PAT) are a class of languages generated by expressions called patterns that may contain variables. In a pattern, each variable can be instantiated with an arbitrary string. Typed pattern languages extend PAT by associating a type (constraint) with each variable that restricts the domain of allowed substitutions. In this paper, we study regular-typed PAT (PATwRT), where all types are represented either by a regular expression or by an ε-NFA. We consider the PATwRT matching problem for patterns with a single repeated variable of the form P = α₁ β α₂ β ⋯ β α_K. We present simple algorithms whose running time is linear in K and quadratic in the input length N, with polynomial dependence on the sizes of the type representations. Our results extend previous quadratic-time work in two directions: (1) the quadratic-time algorithm of Fernau et al. (STACS 2015) for untyped PAT, and (2) the quadratic-time algorithm of Nogami and Terauchi (MFCS 2025) for the restricted case of PATwRT with K = 3, i.e., α₁ β α₂ β α₃.

Cite as

Yuya Uezato. Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{uezato:LIPIcs.CPM.2026.11,
  author =	{Uezato, Yuya},
  title =	{{Matching Regular-Typed Pattern Languages: Quadratic-Time Algorithms}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.11},
  URN =		{urn:nbn:de:0030-drops-259374},
  doi =		{10.4230/LIPIcs.CPM.2026.11},
  annote =	{Keywords: Pattern languages, Regular expressions, String algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.12

Maximizing Diversity in (Near-)Median String Selection

Authors: Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, and Aravinda Kanchana Ruwanpathirana

Abstract

Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the underlying metric, this problem has extensive applications, ranging from bioinformatics to pattern recognition. However, modern applications often require the generation of multiple (near-)optimal yet diverse median strings to enhance flexibility and robustness in decision-making. In this study, we address this need by focusing on two prominent diversity measures: sum dispersion and min dispersion. We first introduce an exact algorithm for the diameter variant of the problem, which identifies pairs of near-optimal medians that are maximally diverse. Subsequently, we propose a (1-ε)-approximation algorithm (for any ε > 0) for sum dispersion, as well as a bi-criteria approximation algorithm for the more challenging min dispersion case, allowing the generation of multiple (more than two) diverse near-optimal Hamming medians. Our approach primarily leverages structural insights into the Hamming median space and also draws on techniques from error-correcting code construction to establish these results.
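The structure of the Hamming median space is easy to see in the special case of exact medians, where no slack for near-optimality is allowed. A string is a median exactly when each of its positions holds a most frequent symbol of the corresponding column. Two medians can therefore differ only at columns with a tie. The sketch below (the function name is hypothetical) returns two exact medians at maximum Hamming distance from each other, together with the optimal total distance. It does not cover the near-optimal setting that the paper addresses.

```python
from collections import Counter
from typing import List, Tuple


def diverse_median_pair(strings: List[str]) -> Tuple[str, str, int]:
    """Two exact Hamming medians at maximum distance, plus the optimal total distance."""
    if not strings:
        raise ValueError("need at least one string")
    n = len(strings[0])
    if any(len(s) != n for s in strings):
        raise ValueError("all strings must have the same length")
    first, second = [], []
    cost = 0
    for column in zip(*strings):
        counts = Counter(column)
        best = max(counts.values())
        winners = sorted(c for c, k in counts.items() if k == best)
        first.append(winners[0])
        second.append(winners[-1])  # differs from first exactly at tied columns
        cost += len(column) - best  # strings disagreeing with the chosen symbol
    return "".join(first), "".join(second), cost


print(diverse_median_pair(["abc", "abd", "bbc", "aad"]))  # ('abc', 'abd', 4)
```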

Cite as

Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, and Aravinda Kanchana Ruwanpathirana. Maximizing Diversity in (Near-)Median String Selection. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{chakraborty_et_al:LIPIcs.CPM.2026.12,
  author =	{Chakraborty, Diptarka and Kundu, Rudrayan and Purohit, Nidhi and Ruwanpathirana, Aravinda Kanchana},
  title =	{{Maximizing Diversity in (Near-)Median String Selection}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{12:1--12:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.12},
  URN =		{urn:nbn:de:0030-drops-259382},
  doi =		{10.4230/LIPIcs.CPM.2026.12},
  annote =	{Keywords: Diversity maximization, Hamming median, diameter, dispersion, approximation algorithms}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.13

Totally Unclustered BWT Images of Any Length over Non-Binary Alphabets

Authors: Gabriele Fici, Estéban Gabory, Giuseppe Romana, and Marinella Sciortino

Abstract

We prove that for every integer n > 0 and for every alphabet Σ_k of size k ≥ 3, there exist words of length n whose Burrows-Wheeler Transform (BWT) is totally unclustered, i.e., it consists of exactly n runs with no two consecutive equal symbols. These words represent the worst-case behavior of the clustering effect of the BWT. We also establish a lower bound on their number. This contrasts with the binary case, where the existence of infinitely many totally unclustered BWT images is still an open problem, related to Artin’s conjecture on primitive roots.
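The definition can be checked directly on small lengths. The sketch below assumes the rotation-based BWT without an end-of-string marker and uses hypothetical function names. It counts BWT runs and searches exhaustively for a totally unclustered word, meaning one whose BWT has exactly n runs. On small cases it reproduces the contrast stated above: the ternary alphabet admits such a word of length 3, and the binary alphabet does not.

```python
from itertools import product


def bwt(word: str) -> str:
    """BWT of a word via its sorted rotations (no end-of-string marker)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_runs(word: str) -> int:
    """Number of maximal runs of equal symbols in the BWT of word."""
    b = bwt(word)
    return sum(1 for i in range(len(b)) if i == 0 or b[i] != b[i - 1])


def has_totally_unclustered(alphabet: str, n: int) -> bool:
    """Exhaustively search for a word of length n whose BWT has exactly n runs."""
    return any(bwt_runs("".join(w)) == n for w in product(alphabet, repeat=n))


print(bwt("banana"), bwt_runs("banana"))  # nnbaaa 3
```

The search enumerates all σⁿ words, so it is only usable for very small n.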

Cite as

Gabriele Fici, Estéban Gabory, Giuseppe Romana, and Marinella Sciortino. Totally Unclustered BWT Images of Any Length over Non-Binary Alphabets. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{fici_et_al:LIPIcs.CPM.2026.13,
  author =	{Fici, Gabriele and Gabory, Est\'{e}ban and Romana, Giuseppe and Sciortino, Marinella},
  title =	{{Totally Unclustered BWT Images of Any Length over Non-Binary Alphabets}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.13},
  URN =		{urn:nbn:de:0030-drops-259399},
  doi =		{10.4230/LIPIcs.CPM.2026.13},
  annote =	{Keywords: Burrows-Wheeler Transform, BWT-runs, Repetitiveness Measure, Clustering Effect, Generalized de Bruijn Words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.14

Hardness Results on Characteristics for Elastic-Degenerate Strings

Authors: Dominik Köppl and Jannik Olbrich

Abstract

Generalizations of plain strings have been proposed as a compact way to represent a collection of nearly identical sequences or to express uncertainty at specific text positions by enumerating all possibilities. While a plain string stores a character at each of its positions, generalizations store a set of characters (indeterminate strings), a set of strings of equal length (generalized degenerate strings, or GD strings for short), or a set of strings of arbitrary lengths (elastic-degenerate strings, or ED strings for short). These generalizations are important for representing such data compactly, and find applications in bioinformatics for representing and maintaining a set of genetic sequences of the same taxonomy or a multiple sequence alignment. To make them useful, research has focused on answering various query types, such as pattern matching or measuring the similarity of ED strings, by generalizing techniques known for plain strings. However, for some types of queries, it has been shown that a generalization of a query solvable in polynomial time on classic strings becomes NP-hard on ED strings, e.g. [Russo et al., 2022]. In that light, we study other types of queries that are of particular interest to bioinformatics: unique substrings, absent words, anti-powers, longest previous factors, and Lempel-Ziv-like compression schemes. While we obtain a polynomial-time algorithm for a variation of longest previous factors, we show that all other problems are NP-hard to compute, some of them even under the restriction that the input can be modeled as an indeterminate or GD string.
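The three generalizations differ only in which strings a position's set may contain. The sketch below (function names are hypothetical) classifies a generalized string and enumerates the plain strings it spells by choosing one element per position. The size of that language can grow exponentially with the number of positions, so enumeration is not a practical route to answering queries. This observation is offered only as intuition and is not the paper's hardness argument.

```python
from itertools import product
from typing import List, Set


def ed_language(ed: List[Set[str]]) -> Set[str]:
    """All plain strings spelled by a generalized string (one choice per position)."""
    return {"".join(choice) for choice in product(*ed)}


def ed_kind(ed: List[Set[str]]) -> str:
    """Classify by the lengths of the strings stored at each position."""
    if all(len(x) == 1 for position in ed for x in position):
        return "indeterminate"  # every element is a single character
    if all(len({len(x) for x in position}) <= 1 for position in ed):
        return "GD"  # equal lengths within each position
    return "ED"  # arbitrary lengths, including the empty string


print(sorted(ed_language([{"A"}, {"C", "TT", ""}, {"G"}])))  # ['ACG', 'AG', 'ATTG']
```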

Cite as

Dominik Köppl and Jannik Olbrich. Hardness Results on Characteristics for Elastic-Degenerate Strings. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 14:1-14:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{koppl_et_al:LIPIcs.CPM.2026.14,
  author =	{K\"{o}ppl, Dominik and Olbrich, Jannik},
  title =	{{Hardness Results on Characteristics for Elastic-Degenerate Strings}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{14:1--14:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.14},
  URN =		{urn:nbn:de:0030-drops-259409},
  doi =		{10.4230/LIPIcs.CPM.2026.14},
  annote =	{Keywords: Elastic-degenerate strings, NP-hardness, longest common factor, minimal unique substring, minimal absent word, anti-power, longest previous factor}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.15

Improved Approximation Ratios for the Shortest Common Superstring Problem with Reverse Complements

Authors: Ryosuke Yamano and Tetsuo Shibuya

Abstract

The Shortest Common Superstring (SCS) problem asks for the shortest string that contains each of a given set of strings as a substring. Its reverse-complement variant, the Shortest Common Superstring problem with Reverse Complements (SCS-RC), naturally arises in bioinformatics applications, where for each input string, either the string itself or its reverse complement must appear as a substring of the superstring. The well-known MGREEDY algorithm for the standard SCS constructs a superstring by first computing an optimal cycle cover on the overlap graph and then concatenating the strings corresponding to the cycles, while its refined variant, TGREEDY, further improves the approximation ratio. Although the original 4- and 3-approximation bounds of these algorithms have been successively improved for the standard SCS, no such progress has been made for the reverse-complement setting. A previous study extended MGREEDY to SCS-RC with a 4-approximation guarantee and briefly suggested that extending TGREEDY to the reverse-complement setting could achieve a 3-approximation. In this work, we strengthen these results by proving that the extensions of MGREEDY and TGREEDY to the reverse-complement setting achieve 3.75- and 2.875-approximation ratios, respectively. Our analysis extends the classical proofs for the standard SCS to handle the bidirectional overlaps introduced by reverse complements. These results provide the first formal improvement of approximation guarantees for SCS-RC, with the 2.875-approximate algorithm currently representing the best known bound for this problem.
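MGREEDY and TGREEDY operate on pairwise overlaps. In the reverse-complement setting, each string may be used in either orientation, so an overlap between two strings involves four orientation choices. The helper below is a hypothetical sketch that assumes DNA over {A, C, G, T} and proper overlaps (strictly shorter than both strings). It computes only these oriented overlaps and does not implement MGREEDY or TGREEDY. Note that the overlap of rc(x) with rc(y) equals the overlap of y with x.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string over {A, C, G, T}."""
    return s.translate(_COMPLEMENT)[::-1]


def overlap(x: str, y: str) -> int:
    """Length of the longest proper suffix of x that is also a prefix of y."""
    for k in range(min(len(x), len(y)) - 1, 0, -1):
        if x[-k:] == y[:k]:
            return k
    return 0


def best_oriented_overlap(x: str, y: str) -> Tuple[int, bool, bool]:
    """Largest overlap over the four orientations; flags mark reverse-complemented strings."""
    best = (-1, False, False)
    for x_rc in (False, True):
        for y_rc in (False, True):
            k = overlap(revcomp(x) if x_rc else x, revcomp(y) if y_rc else y)
            if k > best[0]:
                best = (k, x_rc, y_rc)
    return best


print(best_oriented_overlap("AAAC", "TTGG"))  # (2, True, False)
```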

Cite as

Ryosuke Yamano and Tetsuo Shibuya. Improved Approximation Ratios for the Shortest Common Superstring Problem with Reverse Complements. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 15:1-15:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{yamano_et_al:LIPIcs.CPM.2026.15,
  author =	{Yamano, Ryosuke and Shibuya, Tetsuo},
  title =	{{Improved Approximation Ratios for the Shortest Common Superstring Problem with Reverse Complements}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{15:1--15:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.15},
  URN =		{urn:nbn:de:0030-drops-259412},
  doi =		{10.4230/LIPIcs.CPM.2026.15},
  annote =	{Keywords: Shortest Common Superstring, Approximation Algorithms, DNA Sequencing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.16

Merging RLBWTs Adaptively

Authors: Travis Gagie

Abstract

We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in O(r) space and O((r + L) log(m + n)) time, where m and n are the lengths of the uncompressed strings, r is the number of runs in the final eBWT, and L is the sum of its irreducible LCP values.
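The sketch below builds the eBWT of a collection naively from the uncompressed strings. It sorts all conjugates in ω-order (comparing infinite powers) and then run-length encodes the result to expose r. It only describes what the merged output should be. It is not the adaptive O((r + L) log(m + n))-time merging algorithm, and the function names are hypothetical. Comparing prefixes of length |u|·|v| suffices to order u^ω and v^ω, because this length is at least |u| + |v| - 1.

```python
from functools import cmp_to_key
from itertools import groupby
from typing import List, Tuple


def omega_cmp(u: str, v: str) -> int:
    """Compare u^omega with v^omega using equal-length prefixes of length |u|*|v|."""
    a, b = u * len(v), v * len(u)
    return (a > b) - (a < b)


def ebwt(strings: List[str]) -> str:
    """Naive eBWT: sort all conjugates of all strings in omega-order, take last symbols."""
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Run-length encoding; the number of pairs is the number r of runs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


print(run_length_encode(ebwt(["banana"])))  # [('n', 2), ('b', 1), ('a', 3)]
```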

Cite as

Travis Gagie. Merging RLBWTs Adaptively. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{gagie:LIPIcs.CPM.2026.16,
  author =	{Gagie, Travis},
  title =	{{Merging RLBWTs Adaptively}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{16:1--16:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.16},
  URN =		{urn:nbn:de:0030-drops-259420},
  doi =		{10.4230/LIPIcs.CPM.2026.16},
  annote =	{Keywords: Burrows-Wheeler Transform, run-length compression, RLBWT, construction, merging}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.17

Sensitivity of Repetitiveness Measures to String Reversal

Authors: Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, and Cristian Urbina

Abstract

We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, r, of runs in the run-length encoding of the Burrows-Wheeler transform (BWT) can increase additively by Θ(n) when reversing the string. This substantially improves the known Ω(log n) lower bound for the additive sensitivity of r and is asymptotically tight. We generalize our result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size z of the Lempel-Ziv 77 (LZ) parsing of the text, and also for some of its variants, including the non-overlapping LZ parsing and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio z(w^R)/z(w) approaches 3 from below as |w| → ∞. We also show an asymptotically tight lower bound of Θ(n) for the additive sensitivity of the size v of the smallest lexicographic parsing to string reversal. Finally, we show that the multiplicative sensitivity of v to reversing the string is Θ(log n), so this bound is also tight. Overall, our results expose the limitations of widely used repetitiveness measures with respect to string reversal, a simple and natural data transformation.
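The results above compare a measure on w with the same measure on its reversal w^R. The sketch below illustrates this for z. It computes the size of the greedy LZ77 parsing by brute force, in cubic time, under one common convention: each phrase is either a fresh letter or the longest prefix of the remaining suffix that also starts at an earlier position, with overlaps allowed. Conventions differ across papers, and the function names are hypothetical.

```python
from typing import Tuple


def lz77_phrases(w: str) -> int:
    """Number z of phrases in the greedy self-referential LZ77 parsing of w."""
    n, i, z = len(w), 0, 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            # Overlap is allowed: j + length may run past i.
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(1, longest)  # a fresh letter forms a phrase of length 1
        z += 1
    return z


def reversal_pair(w: str) -> Tuple[int, int]:
    """Return (z(w), z(w^R)) so the two parse sizes can be compared."""
    return lz77_phrases(w), lz77_phrases(w[::-1])


print(lz77_phrases("abababa"))  # 3
```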

Cite as

Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, and Cristian Urbina. Sensitivity of Repetitiveness Measures to String Reversal. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{bannai_et_al:LIPIcs.CPM.2026.17,
  author =	{Bannai, Hideo and Fujie, Yuto and Guo, Peaker and Inenaga, Shunsuke and Nakashima, Yuto and Puglisi, Simon J. and Urbina, Cristian},
  title =	{{Sensitivity of Repetitiveness Measures to String Reversal}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.17},
  URN =		{urn:nbn:de:0030-drops-259434},
  doi =		{10.4230/LIPIcs.CPM.2026.17},
  annote =	{Keywords: String reversal, Repetitiveness measures, Burrows-Wheeler transform, Lempel-Ziv parsing, Lexicographic parsings}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.18

On Time-Memory Tradeoffs for Maximal Palindromes with Wildcards and k-Mismatches

Authors: Amihood Amir, Ayelet Butman, Michael Itzhaki, and Dina Sokol

Abstract

This paper addresses the problem of identifying palindromic factors in texts that include wildcards, i.e., special characters that match every character. These symbols challenge many classical algorithms, as numerous combinatorial properties no longer hold in their presence. We apply existing wildcard-LCE techniques to obtain a continuous time-memory tradeoff, and present the first non-trivial linear-space algorithm for computing all maximal palindromes with wildcards, improving the best known time-memory product in certain parameter ranges. Our main results are algorithms to find and approximate all maximal palindromes in a given text. We also generalize both methods to the k-mismatches setting, with or without wildcards.
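As a baseline for the problem, the sketch below computes the maximal palindrome at every center by naive expansion. It takes O(n) extra space and quadratic time in the worst case. The wildcard symbol `?` and the function names are hypothetical choices. Up to k mismatching pairs are tolerated, and a wildcard never counts as a mismatch. Spending the mismatch budget greedily during expansion yields the longest extension.

```python
from typing import List

WILDCARD = "?"  # hypothetical wildcard symbol


def symbols_match(a: str, b: str) -> bool:
    return a == b or a == WILDCARD or b == WILDCARD


def maximal_palindromes(text: str, k: int = 0) -> List[int]:
    """Length of the maximal palindrome at each of the 2n - 1 centers.

    An even c centers on position c // 2 (odd lengths); an odd c centers between
    positions c // 2 and c // 2 + 1 (even lengths).
    """
    n = len(text)
    lengths = []
    for c in range(2 * n - 1):
        left, right = c // 2, c // 2 + c % 2
        budget = k  # mismatches still allowed at this center
        while left >= 0 and right < n:
            if not symbols_match(text[left], text[right]):
                if budget == 0:
                    break
                budget -= 1
            left -= 1
            right += 1
        lengths.append(right - left - 1)
    return lengths


print(maximal_palindromes("ab?a"))  # [1, 0, 3, 4, 1, 2, 1]
```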

Cite as

Amihood Amir, Ayelet Butman, Michael Itzhaki, and Dina Sokol. On Time-Memory Tradeoffs for Maximal Palindromes with Wildcards and k-Mismatches. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{amir_et_al:LIPIcs.CPM.2026.18,
  author =	{Amir, Amihood and Butman, Ayelet and Itzhaki, Michael and Sokol, Dina},
  title =	{{On Time-Memory Tradeoffs for Maximal Palindromes with Wildcards and k-Mismatches}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.18},
  URN =		{urn:nbn:de:0030-drops-259444},
  doi =		{10.4230/LIPIcs.CPM.2026.18},
  annote =	{Keywords: Wildcards, Mismatches, Palindrome}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.19

Asymmetric Streaming Approximate Pattern Matching

Authors: Wojciech Janczewski and Tatiana Starikovskaya

Abstract

We study the space complexity of pattern matching in the asymmetric streaming model, focusing on approximate pattern matching under the Hamming and edit distances. In this problem, we are given an m-length pattern and an n-length text and must compute, for every position of the text, the smallest distance between the pattern and a substring of the text which ends at this position. In the asymmetric streaming model, we assume constant-time random access to the pattern, while the text arrives as a stream, one letter at a time. It is known that computing all distances exactly in the asymmetric streaming model requires Ω(m) space (for the edit distance see Li and Zheng [FSTTCS 2021]). Hence, to achieve sublinear space, a relaxation of the problem is necessary. One possible variant is to consider the small distance regime, where the algorithm must compute only those distances that are bounded by a small integer parameter k. In this case, existing algorithms in the more restrictive fully streaming model (Kociumaka, Clifford, Porat [SODA'19], Bhattacharya, Koucký [ICALP'23]) straightforwardly imply the existence of poly(k, log n)-space asymmetric streaming algorithms. Another possible relaxation is computing all distances approximately. For this variant, no small-space algorithms are known in the fully streaming model: the best known algorithm solves pattern matching under the Hamming distance (1+ε)-approximately using 𝒪̃(ε^{-2}√m) space (Starikovskaya, Svagerka, Uznański [APPROX'20]). For the edit distance, no efficient approximation algorithms are known. In this work, we give approximation algorithms for pattern matching under the Hamming and edit distances in the asymmetric streaming model for any constant ε > 0:
1) We show that there is a simple randomised asymmetric streaming algorithm that solves approximate pattern matching under the Hamming distance (1+ε)-approximately using 𝒪(ε^{-3}log³n) bits.
2) As our second and main contribution, we extend the result of Cheng et al. [ICALP 2021] and show that for any integer k there is a deterministic asymmetric streaming algorithm that solves pattern matching under the edit distance (2^k-1+ε)-approximately using 𝒪̃(m^{1/k}) space.
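For reference, the exact quantities being approximated can be computed naively. The sketch below uses hypothetical function names. It computes the Hamming profile directly and the edit-distance profile with Sellers' classical dynamic programming. Sellers' method reads the text one letter at a time and accesses the pattern at random, but it stores a column of m + 1 integers. That linear-space baseline is in line with the Ω(m) bound for exact computation mentioned above.

```python
from typing import List


def hamming_profile(pattern: str, text: str) -> List[int]:
    """Hamming distance between pattern and each length-m substring, by end position."""
    m = len(pattern)
    return [
        sum(p != t for p, t in zip(pattern, text[end - m + 1 : end + 1]))
        for end in range(m - 1, len(text))
    ]


def edit_profile(pattern: str, text: str) -> List[int]:
    """Min edit distance between pattern and any substring ending at each text position.

    Sellers' algorithm: one DP column of m + 1 integers, updated per text letter.
    """
    m = len(pattern)
    column = list(range(m + 1))  # distances of pattern prefixes to the empty substring
    out = []
    for t in text:
        new = [0] * (m + 1)  # new[0] = 0: a substring may start anywhere
        for i in range(1, m + 1):
            new[i] = min(
                column[i - 1] + (pattern[i - 1] != t),  # match or substitute
                column[i] + 1,  # text letter t left unmatched
                new[i - 1] + 1,  # pattern letter i left unmatched
            )
        column = new
        out.append(column[m])
    return out


print(edit_profile("abc", "xabcx"))  # [3, 2, 1, 0, 1]
```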

Cite as

Wojciech Janczewski and Tatiana Starikovskaya. Asymmetric Streaming Approximate Pattern Matching. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 19:1-19:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{janczewski_et_al:LIPIcs.CPM.2026.19,
  author =	{Janczewski, Wojciech and Starikovskaya, Tatiana},
  title =	{{Asymmetric Streaming Approximate Pattern Matching}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{19:1--19:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.19},
  URN =		{urn:nbn:de:0030-drops-259458},
  doi =		{10.4230/LIPIcs.CPM.2026.19},
  annote =	{Keywords: Asymmetric streaming, Pattern matching, Approximation, Edit distance, Hamming distance}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.20

Longest Common Extension of a Dynamic String in Parallel Constant Time

Authors: Daniel Alexander Albert

Abstract

A longest common extension (LCE) query on a string computes the length of the longest common suffix or prefix at two given positions. A dynamic LCE algorithm maintains a data structure that allows efficient LCE queries on a string that can change via character insertions and deletions. A dynamic parallel constant-time algorithm is presented that maintains support for LCE queries on a Common CRCW PRAM with 𝒪(n^ε) work, for any ε > 0. The algorithm maintains a string synchronizing sets hierarchy, which it uses to answer substring equality queries; these, in turn, are used to answer LCE queries. To achieve constant runtime, the algorithm allows parts of its information to become outdated by up to log n log^* n updates. It answers queries by combining this slightly outdated information with a list of the recent changes. Two applications of this dynamic LCE algorithm are shown. First, a dynamic parallel constant-time algorithm can maintain membership in a Dyck language D_k, k > 0, with 𝒪(n^ε) work for any ε > 0. Second, a dynamic parallel constant-time algorithm can maintain squares with 𝒪(n^ε) work for any ε > 0.
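The reduction from LCE queries to substring equality queries can be illustrated in the static, sequential setting. The sketch below substitutes Karp-Rabin fingerprints as the equality oracle; the paper instead derives equality queries from a string synchronizing sets hierarchy. It then binary-searches on the extension length. It shows only the reduction, not the dynamic parallel constant-time algorithm. The class and constant names are hypothetical, and the fixed base means equality is correct only with high probability for non-adversarial inputs.

```python
from typing import List

MOD = (1 << 61) - 1
BASE = 1_000_003  # fixed base; a random base would guard against adversarial inputs


class LCEOracle:
    """Static LCE queries reduced to substring-equality tests (Karp-Rabin sketch)."""

    def __init__(self, s: str) -> None:
        self.s = s
        self.prefix: List[int] = [0]  # prefix[i] = fingerprint of s[:i]
        self.power: List[int] = [1]  # power[i] = BASE**i mod MOD
        for ch in s:
            self.prefix.append((self.prefix[-1] * BASE + ord(ch)) % MOD)
            self.power.append((self.power[-1] * BASE) % MOD)

    def _fingerprint(self, i: int, length: int) -> int:
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % MOD

    def equal(self, i: int, j: int, length: int) -> bool:
        """Test s[i:i+length] == s[j:j+length] by comparing fingerprints."""
        return self._fingerprint(i, length) == self._fingerprint(j, length)

    def lce(self, i: int, j: int) -> int:
        """Longest common prefix of s[i:] and s[j:], by binary search on equal()."""
        lo, hi = 0, min(len(self.s) - i, len(self.s) - j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.equal(i, j, mid):
                lo = mid  # a common extension of length mid exists
            else:
                hi = mid - 1
        return lo


print(LCEOracle("abracadabra").lce(0, 7))  # 4
```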

Cite as

Daniel Alexander Albert. Longest Common Extension of a Dynamic String in Parallel Constant Time. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{albert:LIPIcs.CPM.2026.20,
  author =	{Albert, Daniel Alexander},
  title =	{{Longest Common Extension of a Dynamic String in Parallel Constant Time}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{20:1--20:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.20},
  URN =		{urn:nbn:de:0030-drops-259467},
  doi =		{10.4230/LIPIcs.CPM.2026.20},
  annote =	{Keywords: Dynamic Strings, Work, Parallel Constant Time, Longest Common Extension, Longest Common Prefix}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.21

A Bitwise Approach to SCER Matching in Indeterminate Strings

Authors: Simone Faro, Dominik Köppl, Thierry Lecroq, and Francesco Pio Marino

Abstract

We study the problem of matching a determinate pattern against an indeterminate text of the same length n, where each text position is a set of possible characters drawn from an alphabet Σ of size σ. We consider this matching problem under the order-preserving and parameterized matching settings. For that, we encode character sets by bit expressions using sum-free sequences. This encoding enables constant-time character comparisons and avoids explicit set operations. We present an optimal 𝒪(n) time algorithm for order-preserving matching and an 𝒪(n+(σ_p^x ⋅ σ_p^y) √{σ_p^x + σ_p^y}) time algorithm for parameterized matching, where σ_p^x and σ_p^y denote the number of distinct parameterized symbols in the pattern and the text, respectively. The proposed techniques significantly reduce overhead while maintaining exactness, offering practical performance improvements for pattern matching under uncertainty. Additionally, we extend the parameterized matching framework to allow mismatches, for which we present an algorithm with time complexity 𝒪(σ² n log n + n σ² √σ log(n σ)).
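Powers of two give the simplest encoding of character sets as integers. Every set of characters maps to a distinct integer, and testing whether a character belongs to a set is a constant-time bit test. The sketch below uses this plain bitmask encoding for ordinary exact matching of a determinate pattern against an indeterminate text of equal length. The encoding is an assumption made for illustration. The paper's sum-free sequences and its order-preserving and parameterized matching algorithms are not reproduced, and the function names are hypothetical.

```python
from typing import Iterable, List


def encode_sets(text: List[Iterable[str]], alphabet: str) -> List[int]:
    """Encode each indeterminate position as a bitmask (bit i set iff alphabet[i] is allowed)."""
    bit = {c: 1 << idx for idx, c in enumerate(alphabet)}
    return [sum(bit[c] for c in set(position)) for position in text]


def matches(pattern: str, masks: List[int], alphabet: str) -> bool:
    """Exact match of a determinate pattern against an encoded indeterminate text."""
    index = {c: i for i, c in enumerate(alphabet)}
    return len(pattern) == len(masks) and all(
        (mask >> index[p]) & 1 for p, mask in zip(pattern, masks)
    )


print(encode_sets([{"A"}, {"C", "G"}, {"T"}], "ACGT"))  # [1, 6, 8]
```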

Cite as

Simone Faro, Dominik Köppl, Thierry Lecroq, and Francesco Pio Marino. A Bitwise Approach to SCER Matching in Indeterminate Strings. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 21:1-21:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{faro_et_al:LIPIcs.CPM.2026.21,
  author =	{Faro, Simone and K\"{o}ppl, Dominik and Lecroq, Thierry and Marino, Francesco Pio},
  title =	{{A Bitwise Approach to SCER Matching in Indeterminate Strings}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{21:1--21:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.21},
  URN =		{urn:nbn:de:0030-drops-259470},
  doi =		{10.4230/LIPIcs.CPM.2026.21},
  annote =	{Keywords: string matching, indeterminate strings, SCER matching}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.22

Optimal-Time Mapping in Run-Length Compressed PBWT

Authors: Paola Bonizzoni, Davide Cozzi, and Younan Gao

Abstract

The Positional Burrows-Wheeler Transform (PBWT) is a data structure designed for efficiently representing and querying large collections of sequences, such as haplotype panels in genomics. Forward and backward stepping operations, analogous to LF- and FL-mapping in the traditional BWT, are fundamental to the PBWT and underpin many PBWT-based algorithms for haplotype matching and related analyses. Although the run-length encoded variant of the PBWT (also known as the μ-PBWT) achieves O(r̃)-word space usage, where r̃ is the total number of runs, no data structure supporting both forward and backward stepping in constant time within this space bound was previously known. In this paper, we consider the multi-allelic PBWT, which extends the original binary form to a general ordered alphabet {0, … , σ-1}. We first establish bounds on the size r̃ and then introduce a new O(r̃)-word data structure, built over a list of haplotypes {S_1, … , S_h}, each of length w, that supports constant-time forward and backward stepping. We further revisit two key applications, haplotype retrieval and prefix search, leveraging our efficient forward stepping technique. Specifically, we design an O(r̃)-word space data structure that supports haplotype retrieval in O(log log_w h + w) time. For prefix search, we present an O(h + r̃)-word data structure that answers queries in O(m' log log_w σ + occ) time, where m' denotes the length of the longest common prefix returned and occ denotes the number of haplotypes prefixed by this longest common prefix.
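The sketch below builds the positional prefix arrays a_0, …, a_w with the classical stable-sort construction, generalized to ordered multi-allelic symbols. The step from a_k to a_{k+1}, a stable bucket sort by the allele at column k, is what forward stepping navigates. The sketch also counts runs column by column over the PBWT. We assume this total corresponds to r̃, though the paper's exact definition may differ. Everything is computed explicitly rather than in O(r̃) space, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def pbwt_prefix_arrays(haplotypes: Sequence[str]) -> Tuple[List[List[int]], int]:
    """Positional prefix arrays a_0..a_w and the total number of runs in PBWT columns.

    Alleles are single characters compared in their natural order ('0' < '1' < ...).
    """
    h = len(haplotypes)
    w = len(haplotypes[0]) if h else 0
    order = list(range(h))  # a_0: input order
    arrays = [order]
    runs = 0
    for k in range(w):
        column = [haplotypes[i][k] for i in order]  # PBWT column k
        runs += sum(1 for t in range(h) if t == 0 or column[t] != column[t - 1])
        buckets: Dict[str, List[int]] = {}
        for i in order:  # stable: keeps the a_k order within each allele
            buckets.setdefault(haplotypes[i][k], []).append(i)
        order = [i for allele in sorted(buckets) for i in buckets[allele]]
        arrays.append(order)  # a_{k+1}
    return arrays, runs


arrays, runs = pbwt_prefix_arrays(["010", "110", "001"])
print(arrays[-1], runs)  # [0, 1, 2] 8
```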

Cite as

Paola Bonizzoni, Davide Cozzi, and Younan Gao. Optimal-Time Mapping in Run-Length Compressed PBWT. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 22:1-22:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{bonizzoni_et_al:LIPIcs.CPM.2026.22,
  author =	{Bonizzoni, Paola and Cozzi, Davide and Gao, Younan},
  title =	{{Optimal-Time Mapping in Run-Length Compressed PBWT}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{22:1--22:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.22},
  URN =		{urn:nbn:de:0030-drops-259487},
  doi =		{10.4230/LIPIcs.CPM.2026.22},
  annote =	{Keywords: PBWT, LF-Mapping, prefix searches, run-length encoding}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.23

Improved Bounds on the Sum of Exponents of Runs in a String

Authors: Arkadiusz Czarkowski

Abstract

A substring of a word is a run if it is at least twice as long as its minimum period and cannot be extended to either side with the same period. The exponent of a run is the quotient of its length and its minimum period. ρ(n) is the maximum number of runs in a string of length n, while σ(n) is the maximum sum of exponents of runs in a string of length n. While quite tight bounds on ρ(n) are known (0.944575712n ≤ ρ(n) ≤ n), the best upper bound on σ(n) is 3n whereas the best lower bound on σ(n) is 2.035n. In this paper, we improve the upper bound on σ(n) to 2.3n and the lower bound on σ(n) to 2.04448n. We also provide an improved upper bound on σ(n) of 2.2n in the case of a binary alphabet. Our results are achieved using a combination of theoretical and computer-based approaches.
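To make the definitions concrete, the following brute-force sketch enumerates all runs of a string and sums their exponents exactly with rational arithmetic. Each run is found via its minimum period p, as a maximal stretch of positions i with s[i] = s[i + p]. The sketch runs in roughly cubic time and is only meant for checking the definitions on small strings. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Set, Tuple


def smallest_period(t: str) -> int:
    """Smallest p >= 1 such that t[i] == t[i + p] for all valid i."""
    return next(p for p in range(1, len(t) + 1) if all(t[i] == t[i + p] for i in range(len(t) - p)))


def runs_and_exponent_sum(s: str) -> Tuple[int, Fraction]:
    """Brute-force the runs of s; return their number and the sum of their exponents."""
    n = len(s)
    runs: Set[Tuple[int, int, int]] = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i < n - p:
            if s[i] != s[i + p]:
                i += 1
                continue
            j = i  # extend the maximal stretch of positions with s[x] == s[x + p]
            while j + 1 < n - p and s[j + 1] == s[j + 1 + p]:
                j += 1
            start, end = i, j + p + 1  # s[start:end] has period p and cannot be extended
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                runs.add((start, end, p))
            i = j + 1
    total = sum((Fraction(end - start, p) for start, end, p in runs), Fraction(0))
    return len(runs), total


print(runs_and_exponent_sum("aabaab"))  # (3, Fraction(6, 1))
```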

Cite as

Arkadiusz Czarkowski. Improved Bounds on the Sum of Exponents of Runs in a String. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 23:1-23:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{czarkowski:LIPIcs.CPM.2026.23,
  author =	{Czarkowski, Arkadiusz},
  title =	{{Improved Bounds on the Sum of Exponents of Runs in a String}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{23:1--23:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.23},
  URN =		{urn:nbn:de:0030-drops-259494},
  doi =		{10.4230/LIPIcs.CPM.2026.23},
  annote =	{Keywords: strings, runs, sum of exponents of runs, Lyndon words, L-roots, maximal repetitions, combinatorics on words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.24

On Occurrence-Preserving Morphisms

Authors: Kaisei Kishi, Peaker Guo, Cristian Urbina, and Hideo Bannai

Abstract

A morphism is a mapping that transforms words through letter-wise substitution, where each symbol is consistently replaced by a fixed word. In the field of combinatorics on words, one topic that has attracted considerable attention is the characterization of morphisms that preserve specific properties, such as overlap-freeness, square-freeness, lexicographic order, and primitivity. Continuing this direction, we initiate the study of occurrence-preserving morphisms, addressing the following fundamental question: given a morphism ϕ, two words u and v, and k ≥ 1, under what conditions does the number of occurrences of u in v equal the number of occurrences of ϕ^k(u) in ϕ^k(v)? To answer this question, we introduce the notion of interference-free morphisms, examine their properties, and uncover a connection to recognizable morphisms. We then present a precise characterization of occurrence-preserving morphisms in terms of interference-freeness. As applications of our characterization, we first show that there exists a bijection between the starting positions of the occurrences of u in v and those of ϕ^k(u) in ϕ^k(v). We then apply the characterization to the Fibonacci and Thue-Morse words to identify their minimal unique substrings (MUSs). Finally, we exploit the connection between MUSs and net occurrences to simplify existing proofs on net occurrences in these words.
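The question above can be tested directly for one concrete pair of words. The sketch below (function names are hypothetical) iterates a morphism given as a letter-to-word dictionary, counts possibly overlapping occurrences, and compares occ(u, v) with occ(ϕ^k(u), ϕ^k(v)). A single check says nothing about the general characterization via interference-freeness. The example uses the Fibonacci morphism a ↦ ab, b ↦ a, and also the morphism a ↦ aa, b ↦ b, which fails the check for u = a and v = aa.

```python
from typing import Dict


def apply_morphism(phi: Dict[str, str], word: str, k: int = 1) -> str:
    """Apply the morphism phi letter by letter, k times."""
    for _ in range(k):
        word = "".join(phi[c] for c in word)
    return word


def count_occurrences(u: str, v: str) -> int:
    """Number of (possibly overlapping) occurrences of a nonempty word u in v."""
    return sum(1 for i in range(len(v) - len(u) + 1) if v.startswith(u, i))


def preserves_occurrences(phi: Dict[str, str], u: str, v: str, k: int) -> bool:
    """Check occ(u, v) == occ(phi^k(u), phi^k(v)) for one concrete pair (u, v)."""
    return count_occurrences(u, v) == count_occurrences(
        apply_morphism(phi, u, k), apply_morphism(phi, v, k)
    )


fibonacci = {"a": "ab", "b": "a"}
print(apply_morphism(fibonacci, "a", 4))  # abaababa
```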

Cite as

Kaisei Kishi, Peaker Guo, Cristian Urbina, and Hideo Bannai. On Occurrence-Preserving Morphisms. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 24:1-24:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{kishi_et_al:LIPIcs.CPM.2026.24,
  author =	{Kishi, Kaisei and Guo, Peaker and Urbina, Cristian and Bannai, Hideo},
  title =	{{On Occurrence-Preserving Morphisms}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{24:1--24:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.24},
  URN =		{urn:nbn:de:0030-drops-259504},
  doi =		{10.4230/LIPIcs.CPM.2026.24},
  annote =	{Keywords: Property-preserving morphisms, interference-free morphisms, recognizable morphisms, injective morphisms, Fibonacci words, Thue-Morse words, minimal unique substrings (MUSs), net occurrences}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.25

Compressed Index with Construction in Compressed Space

Authors: Dmitry Kosolobov

Abstract

Suppose that we are given a string s of length n over an alphabet {0,1,…,n^O(1)} and that δ is the string complexity of s, a known compression measure. We describe an index on s with O(δ log(n/δ)) space, measured in O(log n)-bit machine words, which can search s for any string of length m in O(m + (occ + 1) log^ε n) time, where occ is the number of occurrences and ε > 0 is any fixed constant (the big-O in the space bound hides a factor 1/ε). Crucially, the index can be built in O(n log n) expected time by one left-to-right pass over s in a streaming fashion with O(δ log(n/δ)) construction space. The index does not use Karp-Rabin fingerprints, and the randomization in the construction time can be eliminated by using deterministic dictionaries instead of hash tables (with a slowdown). The search time matches the currently best results and the space is almost optimal (the known optimum is O(δ log(n/(δα))), where α = log_σ n and σ is the alphabet size, and it coincides with O(δ log(n/δ)) when δ = O(n/α²)). This is the first index that can be constructed within such space and with such time guarantees. To avoid uninteresting marginal cases, all above bounds are stated for δ ≥ Ω(log log n).
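
The parenthetical claim that the two space bounds coincide can be checked in two lines. A sketch, reading the optimum as O(δ log(n/(δα))) and assuming for simplicity δ ≤ n/α² (constant 1) and α ≥ 1, i.e. n ≥ σ:

```latex
\begin{align*}
\delta \le \frac{n}{\alpha^{2}}
  &\;\Longrightarrow\; \log\alpha \le \tfrac{1}{2}\log\frac{n}{\delta},\\
\log\frac{n}{\delta\alpha}
  &= \log\frac{n}{\delta} - \log\alpha
  \;\ge\; \tfrac{1}{2}\log\frac{n}{\delta}.
\end{align*}
```

Together with log(n/(δα)) ≤ log(n/δ), which holds because α ≥ 1, this gives δ log(n/(δα)) = Θ(δ log(n/δ)) in this regime; a larger constant in δ = O(n/α²) only shifts the bound by an additive constant.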

Cite as

Dmitry Kosolobov. Compressed Index with Construction in Compressed Space. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 25:1-25:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{kosolobov:LIPIcs.CPM.2026.25,
  author =	{Kosolobov, Dmitry},
  title =	{{Compressed Index with Construction in Compressed Space}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{25:1--25:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.25},
  URN =		{urn:nbn:de:0030-drops-259515},
  doi =		{10.4230/LIPIcs.CPM.2026.25},
  annote =	{Keywords: compressed index, pattern matching, string complexity, grammar, block tree}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.26

The Communication Complexity of Pattern Matching with Edits Revisited

Authors: Tomasz Kociumaka, Jakob Nogler, and Philip Wellnitz

Abstract

The decades-old Pattern Matching with Edits problem, given a length-n string T (the text), a length-m string P (the pattern), and a positive integer k (the threshold), asks to list the k-error occurrences of P in T, that is, all fragments of T whose edit distance to P is at most k. The one-way communication complexity of this problem is the minimum number of bits that Alice, given an instance (P,T,k) of the problem, must send to Bob so that Bob can reconstruct the answer solely from that message. In recent work [STOC'24], we showed that, in the natural parameter regime 0 < k < m < n/2, Ω(n/m ⋅ k log(m/k)) bits are necessary and 𝒪(n/m ⋅ k log² m) bits are sufficient for this problem. More generally, for strings over an alphabet Σ, we gave an 𝒪(n/m ⋅ k log m log(m|Σ|))-bit encoding that allows one to recover a shortest sequence of edits for every k-error occurrence of P in T. In this paper, we revisit the original proof and improve the encoding size to 𝒪(n/m ⋅ k log (m|Σ|/k)), which matches the lower bound for constant-sized alphabets. We further establish a new tight lower bound of Ω(n/m ⋅ k log(m|Σ|/k)) for the edit sequence reporting variant we solve. Our encoding size also matches the communication complexity established for the simpler Pattern Matching with Mismatches problem in the context of streaming algorithms [Clifford, Kociumaka, Porat; SODA'19].
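
The output that Bob must reconstruct is just the set of k-error occurrences. The following naive Python sketch spells out that definition by testing every fragment of T against P with the textbook Levenshtein dynamic program. It is a hypothetical illustration of the problem statement only, unrelated to the paper's encoding, and uses 0-indexed half-open fragments T[i:j].

```python
def edit_distance(a: str, b: str) -> int:
    """Classic O(|a||b|) Levenshtein dynamic program with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def k_error_occurrences(P: str, T: str, k: int) -> list[tuple[int, int]]:
    """All fragments T[i:j] whose edit distance to P is at most k.

    Only fragments whose length lies in [|P| - k, |P| + k] can qualify.
    """
    m, n = len(P), len(T)
    out = []
    for i in range(n + 1):
        for length in range(max(0, m - k), m + k + 1):
            j = i + length
            if j <= n and edit_distance(T[i:j], P) <= k:
                out.append((i, j))
    return out
```

For instance, with P = abc and T = abcabd, the only exact occurrence is (0, 3); with k = 1, the fragment abd at (3, 6) is also reported, while cab at (2, 5) is not, since it needs two edits.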

Cite as

Tomasz Kociumaka, Jakob Nogler, and Philip Wellnitz. The Communication Complexity of Pattern Matching with Edits Revisited. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 26:1-26:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{kociumaka_et_al:LIPIcs.CPM.2026.26,
  author =	{Kociumaka, Tomasz and Nogler, Jakob and Wellnitz, Philip},
  title =	{{The Communication Complexity of Pattern Matching with Edits Revisited}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{26:1--26:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.26},
  URN =		{urn:nbn:de:0030-drops-259525},
  doi =		{10.4230/LIPIcs.CPM.2026.26},
  annote =	{Keywords: Edit distance, Pattern matching, Communication complexity}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.27

Exploring the Gap Between LCS and LCStr

Authors: Shay Golan, Matan Kraus, Ely Porat, and B. Riva Shalom

Abstract

The Longest Common Subsequence (LCS) problem and the Longest Common Substring (LCStr) problem are classical string problems with broad theoretical and practical significance. The former has a quadratic conditional lower bound [FOCS, 2015], while the latter admits a linear-time solution. In this paper, we study a natural variation of these problems, the Longest Common Subsequence-Substring (LCSS) problem. The LCSS problem seeks the longest string that is simultaneously a subsequence of one input string and a substring of the other. This variant bridges LCS and LCStr, raising intriguing algorithmic questions: Does the complexity of computing LCSS interpolate between the linear time of LCStr and the quadratic time of LCS? What about approximability? We also examine a natural extension of LCSS to multiple strings, parameterizing the balance between subsequence and substring requirements. Our results reveal several insights. First, under SETH, the inherent complexity of LCSS is quadratic, similar to LCS. In contrast, we provide a linear-time approximation for LCSS. Finally, for the multi-string variant, unlike both problems, we design a quadratic-time algorithm, uncovering deeper structural properties of the problem. By studying the complexity of the LCSS problem, we aim to gain some understanding of what influences whether a variant of the LCS problem behaves more like the standard LCS or like LCStr. Our findings suggest that hybrid constraints can create computational "sweet spots," where problems become more tractable than their pure counterparts. This opens a broader research direction in constraint-mediated algorithm design. Beyond LCSS itself, our work highlights unexpected connections between subsequence and substring constraints, advancing the theoretical understanding of string problems and laying the foundation for new algorithmic techniques and complexity-theoretic insights in the rich space between classical string comparison paradigms.
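
A simple quadratic-time baseline for exact LCSS follows from one observation: for a fixed start j in B, matching B[j], B[j+1], … greedily against A (always taking the leftmost next match) yields the longest prefix of B[j..] that is a subsequence of A. The Python sketch below assumes A plays the subsequence role and B the substring role; it is an illustration of the problem, not the authors' approximation algorithm.

```python
def lcss(A: str, B: str) -> str:
    """Longest string that is a subsequence of A and a substring of B.

    For each start j in B, extend B[j:] character by character using
    leftmost greedy subsequence matching in A. Each start scans A at most
    once, so the total time is O(|A| * |B|).
    """
    best = ""
    for j in range(len(B)):
        pos = 0      # next unread position in A
        length = 0   # length of the matched prefix of B[j:]
        while j + length < len(B):
            c = B[j + length]
            while pos < len(A) and A[pos] != c:
                pos += 1
            if pos == len(A):
                break  # B[j:j+length+1] is not a subsequence of A
            pos += 1
            length += 1
        if length > len(best):
            best = B[j:j + length]
    return best
```

For example, lcss("axbxc", "zabcz") returns "abc": it is a substring of the second string and a subsequence of the first, and no longer substring of zabcz qualifies.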

Cite as

Shay Golan, Matan Kraus, Ely Porat, and B. Riva Shalom. Exploring the Gap Between LCS and LCStr. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 27:1-27:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{golan_et_al:LIPIcs.CPM.2026.27,
  author =	{Golan, Shay and Kraus, Matan and Porat, Ely and Shalom, B. Riva},
  title =	{{Exploring the Gap Between LCS and LCStr}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{27:1--27:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.27},
  URN =		{urn:nbn:de:0030-drops-259535},
  doi =		{10.4230/LIPIcs.CPM.2026.27},
  annote =	{Keywords: Longest Common Subsequence, Longest Common Substring, Conditional Lower Bound}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.28

Periodicity Property Testing on Strings with Wildcards

Authors: Carl Barton, Panagiotis Charalampopoulos, Taha El Ghazi, Jonas Ellert, Oded Lachish, and Tatiana Starikovskaya

Abstract

In this work, we study periodicity in strings with wildcards. A string T with at most k wildcards is called strongly (p,k)-periodic if the wildcards in T can be replaced with alphabet symbols to obtain a string with period p, and weakly (p,k)-periodic if T[i] matches T[i+p] for all i. Both notions naturally generalize to (≤ g, k)-periodicity, which is the property of being (p,k)-periodic for some p ∈ [1..g]. An ε-tester for a property 𝒫 is an algorithm that distinguishes between strings that satisfy 𝒫 and strings where one needs to change at least an ε-fraction of the symbols to obtain a string that satisfies 𝒫. We study one-sided error testers, where strings satisfying 𝒫 must always be accepted, while strings that are ε-far must be rejected with probability at least 2/3. The complexity of a tester is the worst-case number of symbols of an input of length n it must read to make the decision. We design the following testers for p,g ≤ n/2: 1) An ε-tester for strong (p,k)-periodicity with complexity Õ_ε(1). 2) An ε-tester for strong (≤ g,k)-periodicity with complexity Õ_ε(√g). 3) An ε-tester for weak (p,k)-periodicity with complexity Õ_ε(min(k, n/(k+p))). 4) An ε-tester for weak (≤ g,k)-periodicity with complexity Õ_ε(min(k + √(gk), n/√k)). Additionally, we show a lower bound on the complexity of ε-testers for weak (≤ g,k)-periodicity, implying that our tester for weak (≤ g,k)-periodicity is optimal up to a multiplicative (ε^{-1} ln(gk))^O(1) factor for a wide range of g and k. Finally, our tester for strong (≤ g,k)-periodicity generalizes the one of [Lachish and Newman; Algorithmica 2011] for strings without wildcards, matching (up to polylogarithmic factors) the unconditional lower bound of Ω̃(√g) in said work for constant ε.
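
The gap between the two notions is easiest to see with exact, read-everything checks. In the Python sketch below, the wildcard symbol `?` is an assumption. Weak periodicity compares each pair T[i], T[i+p] locally. Strong periodicity holds exactly when every residue class modulo p contains at most one distinct non-wildcard symbol, since the wildcards of each class can then all be filled with that symbol. These checks read all n symbols; they are not the sublinear testers described above.

```python
WILDCARD = "?"  # hypothetical wildcard symbol


def is_weakly_periodic(T: str, p: int) -> bool:
    """T[i] matches T[i+p] for all i, where a wildcard matches any symbol."""
    return all(a == b or WILDCARD in (a, b) for a, b in zip(T, T[p:]))


def is_strongly_periodic(T: str, p: int) -> bool:
    """Wildcards can be filled so that the resulting string has period p.

    Equivalent to: each residue class i mod p holds at most one distinct
    non-wildcard symbol.
    """
    for r in range(p):
        letters = {c for c in T[r::p] if c != WILDCARD}
        if len(letters) > 1:
            return False
    return True
```

For example, a?b is weakly 1-periodic (a matches ?, and ? matches b) but not strongly 1-periodic, because no single symbol can stand in for both a and b.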

Cite as

Carl Barton, Panagiotis Charalampopoulos, Taha El Ghazi, Jonas Ellert, Oded Lachish, and Tatiana Starikovskaya. Periodicity Property Testing on Strings with Wildcards. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 28:1-28:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{barton_et_al:LIPIcs.CPM.2026.28,
  author =	{Barton, Carl and Charalampopoulos, Panagiotis and Ghazi, Taha El and Ellert, Jonas and Lachish, Oded and Starikovskaya, Tatiana},
  title =	{{Periodicity Property Testing on Strings with Wildcards}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{28:1--28:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.28},
  URN =		{urn:nbn:de:0030-drops-259543},
  doi =		{10.4230/LIPIcs.CPM.2026.28},
  annote =	{Keywords: periodicity, property testing, wildcards}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.29

The TAG Array of a Multiple Sequence Alignment

Authors: Jannik Olbrich and Enno Ohlebusch

Abstract

Modern genomic analyses increasingly rely on pangenomes, that is, representations of the genome of entire populations. The simplest representation of a pangenome is a set of individual genome sequences. Compared to e.g. sequence graphs, this has the advantage that efficient exact search via indexes based on the Burrows-Wheeler Transform (BWT) is possible, that no chimeric sequences are created, and that the results are not influenced by heuristics. However, such an index may report a match in thousands of positions even if these all correspond to the same locus, making downstream analysis unnecessarily more expensive. For sufficiently similar sequences (e.g. human chromosomes), a multiple sequence alignment (MSA) can be computed. Since an MSA tends to group similar strings in the same columns, it is likely that a string occurring thousands of times in the pangenome can be described by very few columns in the MSA. We describe a method to tag entries in the BWT with the corresponding column in the MSA and develop an index that can map matches in the BWT to columns in the MSA in time proportional to the output. As a by-product, we can project a match to a designated reference genome, a capability that current pangenome aligners lack.

Cite as

Jannik Olbrich and Enno Ohlebusch. The TAG Array of a Multiple Sequence Alignment. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 29:1-29:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{olbrich_et_al:LIPIcs.CPM.2026.29,
  author =	{Olbrich, Jannik and Ohlebusch, Enno},
  title =	{{The TAG Array of a Multiple Sequence Alignment}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{29:1--29:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.29},
  URN =		{urn:nbn:de:0030-drops-259555},
  doi =		{10.4230/LIPIcs.CPM.2026.29},
  annote =	{Keywords: Burrows-Wheeler Transform, pattern matching, index data structure, pangenomics}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.30

Constructing Suffixient Arrays Revisited

Authors: Paola Bonizzoni, Younan Gao, and Brian Riccardi

Abstract

Recently, Cenzato et al. proposed a new text index, called the suffixient array, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text T[1..n] is available. They show that, given the suffix array, the longest common prefix array, and the Burrows-Wheeler transform (BWT) of the reverse of T[1..n] over an alphabet {1,…,σ}, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. When restricted to a single pass over the arrays, they present an alternative construction algorithm running in O(n + r log σ) time, where r is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.

Cite as

Paola Bonizzoni, Younan Gao, and Brian Riccardi. Constructing Suffixient Arrays Revisited. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 30:1-30:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{bonizzoni_et_al:LIPIcs.CPM.2026.30,
  author =	{Bonizzoni, Paola and Gao, Younan and Riccardi, Brian},
  title =	{{Constructing Suffixient Arrays Revisited}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{30:1--30:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.30},
  URN =		{urn:nbn:de:0030-drops-259564},
  doi =		{10.4230/LIPIcs.CPM.2026.30},
  annote =	{Keywords: Suffixient set, suffixient array, right-maximal substring, linear-time algorithm}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.31

On the Smallest Size of Internal Collage Systems

Authors: Soichiro Migita, Kyotaro Uehata, and Tomohiro I

Abstract

A Straight-Line Program (SLP) for a string T is a context-free grammar in Chomsky normal form that derives T only, which can be seen as a compressed form of T. Kida et al. introduced collage systems [Theor. Comput. Sci., 2003] to generalize SLPs by adding repetition rules and truncation rules. The smallest size c(T) of a collage system for T has attracted attention as a way to measure how much these generalized rules improve the compression ability of SLPs. Navarro et al. [IEEE Trans. Inf. Theory, 2021] showed that c(T) ∈ O(z(T)) and that there is a string family with c(T) ∈ Ω(b(T) log |T|), where z(T) is the number of phrases in the Lempel-Ziv parsing of T and b(T) is the smallest size of a bidirectional scheme for T. They also introduced a subclass of collage systems, called internal collage systems, and proved that its smallest size ĉ(T) for T is at least b(T). While c(T) ≤ ĉ(T) is obvious, it was unknown how large ĉ(T) can be compared to c(T). In this paper, we prove that ĉ(T) = Θ(c(T)) by showing that any collage system of size m can be transformed into an internal collage system of size O(m) in O(m²) time. Thanks to this result, we can focus on internal collage systems to study the asymptotic behavior of c(T), which helps to suppress excess use of truncation rules. As a direct application, we get b(T) = O(c(T)), which answers an open question posed in [Navarro et al., IEEE Trans. Inf. Theory, 2021]. We also give a MAX-SAT formulation to compute ĉ(T) for a given T.
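
The rule types mentioned above can be illustrated with a small expander. In the Python sketch below, the tuple encoding is hypothetical and not the notation of Kida et al. In particular, truncation is modeled here as "keep the first (or last) k symbols", which is an assumed convention. Each nonterminal's expansion is memoized, and the size of the system is simply the number of rules (six in the example).

```python
from functools import lru_cache

# Hypothetical rule encoding (not the paper's notation):
#   ("char", c)        X -> c
#   ("concat", Y, Z)   X -> Y Z                    (SLP rule)
#   ("repeat", Y, k)   X -> Y^k                    (repetition rule)
#   ("prefix", Y, k)   X -> first k symbols of Y   (truncation, assumed convention)
#   ("suffix", Y, k)   X -> last k symbols of Y    (truncation, assumed convention)
RULES = {
    "A": ("char", "a"),
    "B": ("char", "b"),
    "X": ("concat", "A", "B"),  # ab
    "R": ("repeat", "X", 3),    # ababab
    "P": ("prefix", "R", 5),    # ababa
    "S": ("suffix", "R", 3),    # bab
}


def expand(rules: dict, start: str) -> str:
    """Derive the string of `start`, memoizing each nonterminal's expansion."""

    @lru_cache(maxsize=None)
    def go(x: str) -> str:
        kind, *args = rules[x]
        if kind == "char":
            return args[0]
        if kind == "concat":
            return go(args[0]) + go(args[1])
        if kind == "repeat":
            return go(args[0]) * args[1]
        if kind == "prefix":
            return go(args[0])[: args[1]]
        if kind == "suffix":
            s = go(args[0])
            return s[len(s) - args[1]:]
        raise ValueError(f"unknown rule kind {kind!r}")

    return go(start)
```

Without the "repeat", "prefix", and "suffix" kinds this is exactly an SLP; the extra rule kinds are what let a collage system describe some strings with fewer rules.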

Cite as

Soichiro Migita, Kyotaro Uehata, and Tomohiro I. On the Smallest Size of Internal Collage Systems. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 31:1-31:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{migita_et_al:LIPIcs.CPM.2026.31,
  author =	{Migita, Soichiro and Uehata, Kyotaro and I, Tomohiro},
  title =	{{On the Smallest Size of Internal Collage Systems}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{31:1--31:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.31},
  URN =		{urn:nbn:de:0030-drops-259575},
  doi =		{10.4230/LIPIcs.CPM.2026.31},
  annote =	{Keywords: Collage Systems, Dictionary-based compression, Compressibility measures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.32

Balancing Two-Dimensional Straight-Line Programs

Authors: Itai Boneh, Estéban Gabory, Paweł Gawrychowski, and Adam Górkiewicz

Abstract

We consider building, given a straight-line program (SLP) consisting of g productions deriving a two-dimensional string T of size N× N, a structure capable of providing random access to any character of T. For one-dimensional strings, it is now known how to build a structure of size 𝒪(g) that provides random access in 𝒪(log N) time. In fact, it is known that this can be obtained by building an equivalent SLP of size 𝒪(g) and depth 𝒪(log N) [Ganardi, Jeż, Lohrey, JACM 2021]. We consider the analogous question for two-dimensional strings: can we build an equivalent SLP of roughly the same size and small depth? We show that the answer is negative: there exists an infinite family of two-dimensional strings of size N× N described by a 2D SLP of size g such that any 2D SLP of depth 𝒪(log N) describing the same string must be of size Ω(g⋅ N/log³N). We complement this with an upper bound showing how to construct such a 2D SLP of size 𝒪(g⋅ N). Next, we observe that one can naturally define a generalization of 2D SLP, which we call 2D SLP with holes. We show that a known general balancing theorem by [Ganardi, Jeż, Lohrey, JACM 2021] immediately implies that, given a 2D SLP of size g deriving a string of size N× N, we can construct a 2D SLP with holes of depth 𝒪(log N) and size 𝒪(g). This allows us to conclude that there is a structure of size 𝒪(g) providing random access in 𝒪(log N) time for such a 2D SLP. Further, this can be extended (analogously to the 1D case) to obtain a structure of size 𝒪(g log^ε N) providing random access in 𝒪(log N/log log N) time, for any ε > 0. The same (optimal) random access time was very recently achieved by [De and Kempa, SODA 2026], but with a significantly larger structure of size 𝒪(g log^{2+ε} N).
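
The reason depth matters is visible in the one-dimensional case. If each nonterminal stores the length of its expansion, access to T[i] descends from the start symbol, choosing the left or right child at every step, so it takes time proportional to the SLP's height. Hence an equivalent SLP of depth 𝒪(log N) gives 𝒪(log N) access. The Python sketch below covers only this 1D baseline, not 2D SLPs or SLPs with holes. The example grammar for the Fibonacci word abaababa is hypothetical.

```python
# Hypothetical SLP in Chomsky normal form for the Fibonacci word "abaababa".
RULES = {
    "A": "a", "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}


def expansion_lengths(rules: dict) -> dict:
    """Length of the expansion of every nonterminal (memoized recursion)."""
    lengths: dict[str, int] = {}

    def length(x: str) -> int:
        if x not in lengths:
            rhs = rules[x]
            lengths[x] = 1 if isinstance(rhs, str) else length(rhs[0]) + length(rhs[1])
        return lengths[x]

    for x in rules:
        length(x)
    return lengths


def access(rules: dict, lengths: dict, start: str, i: int) -> str:
    """Return T[i] (0-indexed) by walking down from `start`; O(height) steps."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    x = start
    while not isinstance(rules[x], str):
        left, right = rules[x]
        if i < lengths[left]:
            x = left
        else:
            i -= lengths[left]  # skip the left child's expansion
            x = right
    return rules[x]
```

Each loop iteration moves one level down the derivation tree, so the running time is bounded by the grammar's depth, not by N.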

Cite as

Itai Boneh, Estéban Gabory, Paweł Gawrychowski, and Adam Górkiewicz. Balancing Two-Dimensional Straight-Line Programs. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 32:1-32:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{boneh_et_al:LIPIcs.CPM.2026.32,
  author =	{Boneh, Itai and Gabory, Est\'{e}ban and Gawrychowski, Pawe{\l} and G\'{o}rkiewicz, Adam},
  title =	{{Balancing Two-Dimensional Straight-Line Programs}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{32:1--32:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.32},
  URN =		{urn:nbn:de:0030-drops-259582},
  doi =		{10.4230/LIPIcs.CPM.2026.32},
  annote =	{Keywords: Two-dimensional string, straight-line program, random access}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.33

The Smallest String Attractors of Fibonacci and Period-Doubling Words

Authors: Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, and Yoshio Okamoto

Abstract

A string attractor of a string T[1..|T|] is a set of positions Γ of T such that any substring w of T has an occurrence that crosses a position in Γ, i.e., there is a position i such that w = T[i..i+|w|-1] and the intersection [i,i+|w|-1]∩ Γ is nonempty. The size of the smallest string attractor of Fibonacci words is known to be 2. We completely characterize the set of all smallest string attractors of Fibonacci words, and show a recursive formula describing the 2^{n-4} + 2^{⌈n/2⌉ - 2} distinct position pairs that are the smallest string attractors of the nth Fibonacci word for n ≥ 7. Similarly, the size of the smallest string attractor of period-doubling words is known to be 2. We also completely characterize the set of all smallest string attractors of period-doubling words, and show a formula describing the two distinct position pairs that are the smallest string attractors of the nth period-doubling word for n ≥ 2. Our results show that strings with the same smallest attractor size can have a drastically different number of distinct smallest attractors.
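
The attractor definition can be checked directly on small inputs. The Python sketch below uses 1-indexed positions as in the definition above. It tests whether every distinct substring has an occurrence crossing Γ and enumerates the smallest attractors by exhaustive search. The search is exponential, so it is only meant for spot-checking small cases, and the function names are hypothetical.

```python
from itertools import combinations


def is_attractor(T: str, gamma: set[int]) -> bool:
    """Every distinct substring w of T has an occurrence T[i..i+|w|-1]
    (1-indexed, inclusive) whose interval intersects gamma."""
    n = len(T)
    for length in range(1, n + 1):
        covered = set()  # substrings of this length with a crossing occurrence
        for i in range(1, n - length + 2):
            if any(i <= g <= i + length - 1 for g in gamma):
                covered.add(T[i - 1:i - 1 + length])
        all_substrings = {T[i:i + length] for i in range(n - length + 1)}
        if covered != all_substrings:
            return False
    return True


def smallest_attractors(T: str) -> list[set[int]]:
    """All attractors of minimum size, by exhaustive search over position sets."""
    for size in range(1, len(T) + 1):
        found = [set(c) for c in combinations(range(1, len(T) + 1), size)
                 if is_attractor(T, set(c))]
        if found:
            return found
    return []
```

For the Fibonacci word abaababa, every smallest attractor found this way has two positions, and {3, 5} is one of them.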

Cite as

Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, and Yoshio Okamoto. The Smallest String Attractors of Fibonacci and Period-Doubling Words. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 33:1-33:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{banbara_et_al:LIPIcs.CPM.2026.33,
  author =	{Banbara, Mutsunori and Bannai, Hideo and Guo, Peaker and K\"{o}ppl, Dominik and Mieno, Takuya and Okamoto, Yoshio},
  title =	{{The Smallest String Attractors of Fibonacci and Period-Doubling Words}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{33:1--33:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.33},
  URN =		{urn:nbn:de:0030-drops-259599},
  doi =		{10.4230/LIPIcs.CPM.2026.33},
  annote =	{Keywords: String attractors, Fibonacci words, Period-doubling words, Combinatorics on words}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.34

LZBE: An LZ-Style Compressor Supporting O(log n)-Time Random Access

Authors: Hiroki Shibata, Yuto Nakashima, Yutaro Yamaguchi, and Shunsuke Inenaga

Abstract

An LZ-like factorization of a string divides it into factors, each being either a single character or a copy of a preceding substring. While grammar-based compression schemes support efficient random access with space linear in the compressed size, no comparable guarantees are known for general LZ-like factorizations. This limitation motivated restricted variants such as LZ-End [Kreft and Navarro, 2013] and height-bounded LZ (LZHB) [Bannai et al., 2024], which trade off some compression efficiency for faster access. In this paper, we introduce LZ-Begin-End (LZBE), a new LZ-like variant in which every copy factor must refer to a contiguous sequence of preceding factors. This structural restriction ensures that any context-free grammar can be transformed into an LZBE factorization of the same size. We further study the greedy LZBE factorization, which selects each copy factor to be as long as possible while processing the input from left to right, and show that it can be computed in linear time. Moreover, we exhibit a family of strings for which the greedy LZBE factorization is asymptotically smaller than the smallest grammar. These results demonstrate that the LZBE scheme is strictly more expressive than grammar-based compression in the worst case. To support fast queries, we propose a data structure for LZBE-compressed strings that permits O(log n)-time random access within space linear in the compressed size, where n is the length of the input string.
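
The LZBE restriction and the greedy rule can be made concrete with a deliberately naive Python sketch. At each step, it tries every contiguous run f[i] f[i+1] … f[j] of earlier factors, keeps the longest run that is a prefix of the remaining text, and otherwise emits a single character. This roughly cubic sketch is for illustration only and is not the paper's linear-time algorithm. The function name is hypothetical.

```python
def greedy_lzbe(T: str) -> list[str]:
    """Greedy LZ-Begin-End factorization (naive sketch).

    A copy factor must equal the concatenation f[i] f[i+1] ... f[j] of a
    contiguous run of preceding factors; if no such run is a prefix of the
    remaining text, a single character is emitted instead.
    """
    factors: list[str] = []
    pos = 0
    while pos < len(T):
        best = T[pos]  # fallback: a single-character factor
        for i in range(len(factors)):
            cand = ""
            for j in range(i, len(factors)):
                cand += factors[j]
                if not T.startswith(cand, pos):
                    break  # longer runs from i cannot match either
                if len(cand) > len(best):
                    best = cand
        factors.append(best)
        pos += len(best)
    return factors
```

For abababab, the sketch yields a, b, ab, abab: the third factor copies f1 f2 and the fourth copies f1 f2 f3.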

Cite as

Hiroki Shibata, Yuto Nakashima, Yutaro Yamaguchi, and Shunsuke Inenaga. LZBE: An LZ-Style Compressor Supporting O(log n)-Time Random Access. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 34:1-34:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{shibata_et_al:LIPIcs.CPM.2026.34,
  author =	{Shibata, Hiroki and Nakashima, Yuto and Yamaguchi, Yutaro and Inenaga, Shunsuke},
  title =	{{LZBE: An LZ-Style Compressor Supporting O(log n)-Time Random Access}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{34:1--34:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.34},
  URN =		{urn:nbn:de:0030-drops-259609},
  doi =		{10.4230/LIPIcs.CPM.2026.34},
  annote =	{Keywords: data compression, Lempel-Ziv parsing, string algorithms, random access}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.35

Efficient Index for Square Pattern Matching

Authors: Po-Chun Chen, Che-Wei Tsao, Wing-Kai Hon, and Dominik Köppl

Abstract

A string S is called a square if it can be written as the concatenation of two identical strings. Two strings P and Q of the same length are said to square match if, for every substring of P, it is a square if and only if the corresponding substring of Q is also a square. The square pattern matching problem asks for locating all substrings of a given text T of length n that square match a query pattern P of length m. This notion captures similarity in repetition structures and is motivated by applications in areas such as bioinformatics and music structure analysis. In this paper, we introduce a novel technique, called the longest prefix square (LPS) encoding, which represents the square structure of a string as an integer array of the same length. We show that two strings square match if and only if they have identical LPS encodings. Based on this result, we construct an index solving the square pattern matching problem in time O(m lg m + occ) using O(n lg² n) bits of space, where occ denotes the number of occurrences of substrings in T that square match P. If the LPS encoding of P is precomputed, the query time improves to O(m + occ).
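
As a reference point for the definitions, the following Python sketch checks square matching literally by comparing the square status of every corresponding substring, then scans every length-m window of T. It is a brute-force illustration of the problem statement and does not use or reproduce the LPS encoding. The function names are hypothetical.

```python
def is_square(s: str) -> bool:
    """Non-empty s equal to xx for some string x."""
    h = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s[:h] == s[h:]


def square_match(P: str, Q: str) -> bool:
    """Same length, and every substring of P is a square iff Q's is."""
    if len(P) != len(Q):
        return False
    n = len(P)
    # Only even-length substrings can be squares, so odd lengths always agree.
    return all(is_square(P[i:j]) == is_square(Q[i:j])
               for i in range(n) for j in range(i + 2, n + 1, 2))


def square_pattern_matching(T: str, P: str) -> list[int]:
    """0-indexed starts of all length-|P| substrings of T that square match P."""
    m = len(P)
    return [i for i in range(len(T) - m + 1) if square_match(T[i:i + m], P)]
```

For example, xyxy square matches abab, because in both strings the only square substring is the whole string. In xyxyzz, the window xyzz does not square match abab, since zz is a square while the corresponding ab is not.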

Cite as

Po-Chun Chen, Che-Wei Tsao, Wing-Kai Hon, and Dominik Köppl. Efficient Index for Square Pattern Matching. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 35:1-35:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{chen_et_al:LIPIcs.CPM.2026.35,
  author =	{Chen, Po-Chun and Tsao, Che-Wei and Hon, Wing-Kai and K\"{o}ppl, Dominik},
  title =	{{Efficient Index for Square Pattern Matching}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{35:1--35:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.35},
  URN =		{urn:nbn:de:0030-drops-259617},
  doi =		{10.4230/LIPIcs.CPM.2026.35},
  annote =	{Keywords: string algorithms, pattern matching, indexing, squares}
}

Document

DOI: 10.4230/LIPIcs.CPM.2026.36

Set Parameterized Matching via Multi-Layer Hashing

Authors: Moshe Lewenstein and Ely Porat

Abstract

We study the set parameterized matching problem, a generalization of the classical parameterized matching problem introduced by Baker [Baker, 1993; Baker, 1997]. In set parameterized matching, both the pattern and text are sequences where each position contains a set of characters rather than a single character. Two set-strings parameterized match if there exists a bijection between their alphabets that maps one to the other set-wise. Boussidan [Aaron Boussidan, 2025] introduced this problem for the case of equal-length set-strings. We present a randomized algorithm running in O(N + M) time with high probability, where N is the text size and M is the pattern size. Our approach employs a novel three-layer hashing scheme based on Karp-Rabin fingerprinting that addresses the challenges of (1) the size blowup in representations of the problem, (2) set-to-set matching, and (3) the dynamic nature of encodings of text substrings during pattern scanning.
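To make the definition above concrete, here is a naive reference sketch, not the authors' three-layer Karp-Rabin hashing scheme. It assumes that the bijection only needs to be fixed on the characters that actually occur in the two aligned set-strings. Under that assumption, a bijection f with f(P[i]) = T[i] for every position i exists exactly when the multiset of "position profiles" (the set of positions at which a character occurs) is the same on both sides: f must send each character to one with an identical profile, and characters sharing a profile can be paired arbitrarily. The function names are hypothetical, and the scan costs O(M) work per text position rather than the O(N + M) expected time stated in the abstract.

```python
from collections import Counter, defaultdict
from typing import FrozenSet, List, Sequence

SetString = Sequence[FrozenSet[str]]


def position_profiles(s: SetString) -> Counter:
    """Multiset of position sets, one per distinct character occurring in s."""
    positions = defaultdict(set)
    for i, chars in enumerate(s):
        for c in chars:
            positions[c].add(i)
    return Counter(frozenset(ps) for ps in positions.values())


def set_p_match(p: SetString, t: SetString) -> bool:
    """True iff some bijection between occurring characters maps p onto t set-wise."""
    return len(p) == len(t) and position_profiles(p) == position_profiles(t)


def set_p_match_all(text: SetString, pattern: SetString) -> List[int]:
    """Naive scan: start positions of text windows that set-parameterized match pattern."""
    m = len(pattern)
    target = position_profiles(pattern)
    return [i for i in range(len(text) - m + 1)
            if position_profiles(text[i:i + m]) == target]


if __name__ == "__main__":
    # frozenset("xy") == frozenset({"x", "y"}), so each string below is one set.
    pattern = [frozenset(s) for s in ("a", "ab", "b")]
    text = [frozenset(s) for s in ("x", "xy", "y", "yz", "z")]
    print(set_p_match_all(text, pattern))  # [0, 2]
```

When every position holds exactly one character, the same profile test reduces to a check for classical parameterized matching.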

Cite as

Moshe Lewenstein and Ely Porat. Set Parameterized Matching via Multi-Layer Hashing. In 37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 369, pp. 36:1-36:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{lewenstein_et_al:LIPIcs.CPM.2026.36,
  author =	{Lewenstein, Moshe and Porat, Ely},
  title =	{{Set Parameterized Matching via Multi-Layer Hashing}},
  booktitle =	{37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
  pages =	{36:1--36:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-420-8},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{369},
  editor =	{Bille, Philip and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.36},
  URN =		{urn:nbn:de:0030-drops-259620},
  doi =		{10.4230/LIPIcs.CPM.2026.36},
  annote =	{Keywords: Set Parameterized Matching, Pattern Matching, Randomized Algorithms, Hashing, Parameterized Matching}
}

© 2023-2026 Schloss Dagstuhl – LZI GmbH