DROPS

Document

DOI: 10.4230/LIPIcs.STACS.2026.5

On the Complexity of Language Membership for Probabilistic Words

Authors: Antoine Amarilli, Mikaël Monet, Paul Raphaël, and Sylvain Salvati

Published in: LIPIcs, Volume 364, 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Abstract

We study the membership problem to context-free languages L (CFLs) on probabilistic words, that specify for each position a probability distribution on the letters (assuming independence across positions). Our task is to compute, given a probabilistic word, what is the probability that a word drawn according to the distribution belongs to L. This problem generalizes the problem of counting how many words of length n belong to L, or of counting how many completions of a partial word belong to L. We show that this problem is in polynomial time for unambiguous context-free languages (uCFLs), but can be #P-hard already for unions of two linear uCFLs. More generally, we show that the problem is in polynomial time for so-called poly-slicewise-unambiguous languages, where given a length n we can tractably compute an uCFL for the words of length n in the language. This class includes some inherently ambiguous languages, and implies the tractability of bounded CFLs and of languages recognized by unambiguous polynomial-time counter automata; but we show that the problem can be #P-hard for nondeterministic counter automata, even for Parikh automata with a single counter. We then introduce classes of circuits from knowledge compilation which we use for tractable counting, and show that this covers the tractability of poly-slicewise-unambiguous languages and of some CFLs that are not poly-slicewise-unambiguous. Extending these circuits with negation further allows us to show tractability for the language of primitive words, and for the language of concatenations of two palindromes. We finally show the conditional undecidability of the meta-problem that asks, given a CFG, whether the probabilistic membership problem for that CFG is tractable or #P-hard.

Cite as

Antoine Amarilli, Mikaël Monet, Paul Raphaël, and Sylvain Salvati. On the Complexity of Language Membership for Probabilistic Words. In 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 364, pp. 5:1-5:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{amarilli_et_al:LIPIcs.STACS.2026.5,
  author =	{Amarilli, Antoine and Monet, Mika\"{e}l and Rapha\"{e}l, Paul and Salvati, Sylvain},
  title =	{{On the Complexity of Language Membership for Probabilistic Words}},
  booktitle =	{43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)},
  pages =	{5:1--5:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-412-3},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{364},
  editor =	{Mahajan, Meena and Manea, Florin and McIver, Annabelle and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2026.5},
  URN =		{urn:nbn:de:0030-drops-254943},
  doi =		{10.4230/LIPIcs.STACS.2026.5},
  annote =	{Keywords: Automaton, probabilistic words, context-free grammar, membership problem}
}

Document

DOI: 10.4230/LIPIcs.CSL.2026.44

Minimal DFAs Witnessing Language Inequivalence

Authors: Jan Martens

Published in: LIPIcs, Volume 363, 34th EACSL Annual Conference on Computer Science Logic (CSL 2026)

Abstract

We study small witnesses for the inequivalence of two regular languages. A natural witness is a distinguishing word, e.g. a word in exactly one of the two languages. We propose using more succinct witnesses in the form of witnessing DFAs. A witnessing DFA recognizes a subset of one of the languages and contains at least one distinguishing word. In this way the DFA expresses behaviour contained in the first language but not the second. We show witnessing DFAs can be used to present more concise witnesses for the inequivalence of two regular languages. We show that the decision problem for the existence of a witnessing DFA of certain size is NP-complete in general, and in P in the special case of unary DFAs. Besides these computational aspects, we study structural properties of witnessing DFAs. Not all languages can be a minimal witness. It turns out that minimal witnesses are exactly the languages that are not decomposable in the union of languages with smaller state-complexity, the so-called prime languages as studied earlier by Kupferman and Mosheiff.

Cite as

Jan Martens. Minimal DFAs Witnessing Language Inequivalence. In 34th EACSL Annual Conference on Computer Science Logic (CSL 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 363, pp. 44:1-44:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

Copy BibTex To Clipboard

@InProceedings{martens:LIPIcs.CSL.2026.44,
  author =	{Martens, Jan},
  title =	{{Minimal DFAs Witnessing Language Inequivalence}},
  booktitle =	{34th EACSL Annual Conference on Computer Science Logic (CSL 2026)},
  pages =	{44:1--44:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-411-6},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{363},
  editor =	{Guerrini, Stefano and K\"{o}nig, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CSL.2026.44},
  URN =		{urn:nbn:de:0030-drops-254691},
  doi =		{10.4230/LIPIcs.CSL.2026.44},
  annote =	{Keywords: Deterministic Finite Automata, Language Inequivalence, DFA decomposition, Prime languages}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2025.36

Streaming Periodicity with Mismatches, Wildcards, and Edits

Authors: Taha El Ghazi and Tatiana Starikovskaya

Published in: LIPIcs, Volume 359, 36th International Symposium on Algorithms and Computation (ISAAC 2025)

Abstract

In this work, we study the problem of detecting periodic trends in strings. While detecting exact periodicity has been studied extensively, real-world data is often noisy, where small deviations or mismatches occur between repetitions. This work focuses on a generalized approach to period detection that efficiently handles noise. Given a string S of length n, the task is to identify integers p such that the prefix and the suffix of S, each of length n-p+1, are similar under a given distance measure. Ergün et al. [APPROX-RANDOM 2017] were the first to study this problem in the streaming model under the Hamming distance. In this work, we combine, in a non-trivial way, the Hamming distance sketch of Clifford et al. [SODA 2019] and the structural description of the k-mismatch occurrences of a pattern in a text by Charalampopoulos et al. [FOCS 2020] to present a more efficient streaming algorithm for period detection under the Hamming distance. As a corollary, we derive a streaming algorithm for detecting periods of strings which may contain wildcards, a special symbol that match any character of the alphabet. Our algorithm is not only more efficient than that of Ergün et al. [TCS 2020], but it also operates without their assumption that the string must be free of wildcards in its final characters. Additionally, we introduce the first two-pass streaming algorithm for computing periods under the edit distance by leveraging and extending the Bhattacharya-Koucký’s grammar decomposition technique [STOC 2023].

Cite as

Taha El Ghazi and Tatiana Starikovskaya. Streaming Periodicity with Mismatches, Wildcards, and Edits. In 36th International Symposium on Algorithms and Computation (ISAAC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 359, pp. 36:1-36:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{elghazi_et_al:LIPIcs.ISAAC.2025.36,
  author =	{El Ghazi, Taha and Starikovskaya, Tatiana},
  title =	{{Streaming Periodicity with Mismatches, Wildcards, and Edits}},
  booktitle =	{36th International Symposium on Algorithms and Computation (ISAAC 2025)},
  pages =	{36:1--36:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-408-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{359},
  editor =	{Chen, Ho-Lin and Hon, Wing-Kai and Tsai, Meng-Tsung},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2025.36},
  URN =		{urn:nbn:de:0030-drops-249446},
  doi =		{10.4230/LIPIcs.ISAAC.2025.36},
  annote =	{Keywords: approximate periods, pattern matching, streaming algorithms}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2025.9

Small Space Encoding and Recognition of k-Palindromic Prefixes

Authors: Gabriel Bathie, Jonas Ellert, and Tatiana Starikovskaya

Published in: LIPIcs, Volume 359, 36th International Symposium on Algorithms and Computation (ISAAC 2025)

Abstract

Palindromes are non-empty strings that read the same forward and backward. We study the problem of recognizing so-called k-palindromic strings, which can be represented as the concatenation of exactly k palindromes. [Rubinchik and Shur, MFCS 2020] showed that the problem is solvable in linear space and time. We present a read-only algorithm that recognizes all k-palindromic prefixes of a string T of length n in O(n ⋅ 6^{k²} ⋅ log^k n) time and O(6^{k²} ⋅ log^k n) space. As a corollary, we also obtain a read-only algorithm for computing the palindromic length of T, i.e., the smallest k such that T is k-palindromic, in O(n ⋅ 6^{k²} ⋅ log^⌈k/2⌉ n) time and O(6^{k²} ⋅ log^⌈k/2⌉ n) space.

Cite as

Gabriel Bathie, Jonas Ellert, and Tatiana Starikovskaya. Small Space Encoding and Recognition of k-Palindromic Prefixes. In 36th International Symposium on Algorithms and Computation (ISAAC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 359, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bathie_et_al:LIPIcs.ISAAC.2025.9,
  author =	{Bathie, Gabriel and Ellert, Jonas and Starikovskaya, Tatiana},
  title =	{{Small Space Encoding and Recognition of k-Palindromic Prefixes}},
  booktitle =	{36th International Symposium on Algorithms and Computation (ISAAC 2025)},
  pages =	{9:1--9:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-408-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{359},
  editor =	{Chen, Ho-Lin and Hon, Wing-Kai and Tsai, Meng-Tsung},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2025.9},
  URN =		{urn:nbn:de:0030-drops-249178},
  doi =		{10.4230/LIPIcs.ISAAC.2025.9},
  annote =	{Keywords: palindromic length, read-only algorithms, palindromes}
}

Document

DOI: 10.4230/LIPIcs.ESA.2025.8

Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares

Authors: Yuto Nakashima, Jakub Radoszewski, and Tomasz Waleń

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

A k-mismatch square is a string of the form XY where X and Y are two equal-length strings that have at most k mismatches. Kolpakov and Kucherov [Theor. Comput. Sci., 2003] defined two notions of k-mismatch repeats, called k-repetitions and k-runs, each representing a sequence of consecutive k-mismatch squares of equal length. They proposed algorithms for computing k-repetitions and k-runs working in 𝒪(nklog k+output) time for a string of length n over an integer alphabet, where output is the number of the reported repeats. We show that output = 𝒪(nk log k), both in case of k-repetitions and k-runs, which implies that the complexity of their algorithms is actually 𝒪(nk log k). We apply this result to computing parameterized squares. A parameterized square is a string of the form XY such that X and Y parameterized-match, i.e., there exists a bijection f on the alphabet such that f(X) = Y. Two parameterized squares XY and X'Y' are equivalent if they parameterized match. Recently Hamai et al. [SPIRE 2024] showed that a string of length n over an alphabet of size σ contains less than nσ non-equivalent parameterized squares, improving an earlier bound by Kociumaka et al. [Theor. Comput. Sci., 2016]. We apply our bound for k-mismatch repeats to propose an algorithm that reports all non-equivalent parameterized squares in 𝒪(nσ log σ) time. We also show that the number of non-equivalent parameterized squares can be computed in 𝒪(n log n) time. This last algorithm applies to squares under any substring compatible equivalence relation and also to counting squares that are distinct as strings. In particular, this improves upon the 𝒪(nσ)-time algorithm of Gawrychowski et al. [CPM 2023] for counting order-preserving squares that are distinct as strings if σ = ω(log n).

Cite as

Yuto Nakashima, Jakub Radoszewski, and Tomasz Waleń. Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{nakashima_et_al:LIPIcs.ESA.2025.8,
  author =	{Nakashima, Yuto and Radoszewski, Jakub and Wale\'{n}, Tomasz},
  title =	{{Fast Computation of k-Runs, Parameterized Squares, and Other Generalised Squares}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.8},
  URN =		{urn:nbn:de:0030-drops-244768},
  doi =		{10.4230/LIPIcs.ESA.2025.8},
  annote =	{Keywords: string algorithm, k-mismatch square, parameterized square, order-preserving square, maximum gapped repeat}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.51

Word Structures and Their Automatic Presentations

Authors: Xiaoyang Gong, Bakh Khoussainov, and Yuyang Zhuge

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We study automatic presentations of the structures (ℕ; S), (ℕ; E_S), (ℕ; ≤), and their expansions by a unary predicate U. Here S is the successor function, E_S is the undirected version of S, and ≤ is the natural order. We call these structures word structures. Our goal is three-fold. First, we study the isomorphism problem for automatic word structures by focusing on the following three problems. The first problem asks to design an algorithm that, given an automatic structure A, decides if A is isomorphic to (ℕ; S). The second asks to design an algorithm that, given two automatic presentations of (ℕ; S, U₁) and (ℕ; S, U₂), where U₁ and U₂ are unary predicates, decides if these structures are isomorphic. The third problem investigates if there is an algorithm that, given two automatic presentations of (ℕ; ≤, U₁) and (ℕ; ≤, U₂), decides whether U₁ ∩ U₂ ≠ ∅. We show that these problems are undecidable. Next, we study intrinsic regularity of the function S in the structure Path_ω = (ℕ; E_S). We build an automatic presentation of Path_ω in which S is not regular. This implies that S is not intrinsically regular in Path_ω. For U ⊆ ℕ, let d_U be the function that computes the distances between the consecutive elements of U. We build automatic presentations of (ℕ; ≤, U) where d_U can realise logarithmic, radical, intermediate, and exponential functions.

Cite as

Xiaoyang Gong, Bakh Khoussainov, and Yuyang Zhuge. Word Structures and Their Automatic Presentations. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 51:1-51:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gong_et_al:LIPIcs.MFCS.2025.51,
  author =	{Gong, Xiaoyang and Khoussainov, Bakh and Zhuge, Yuyang},
  title =	{{Word Structures and Their Automatic Presentations}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{51:1--51:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.51},
  URN =		{urn:nbn:de:0030-drops-241581},
  doi =		{10.4230/LIPIcs.MFCS.2025.51},
  annote =	{Keywords: Automatic structures, the isomorphism problem, decidability, undecidability, regular relations}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.32

A Proof of Shur’s Conjecture on the Growth of Power-Free Languages over Large Alphabets

Authors: Vuong Bui and Matthieu Rosenfeld

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We settle a conjecture of Shur on an estimation of the exponential growth rates of the languages of (n/(n-1))-free words and (n/(n-1)) ^+-free words over large alphabets of size k with a correction of order O (1/(k²)).

Cite as

Vuong Bui and Matthieu Rosenfeld. A Proof of Shur’s Conjecture on the Growth of Power-Free Languages over Large Alphabets. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 32:1-32:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bui_et_al:LIPIcs.MFCS.2025.32,
  author =	{Bui, Vuong and Rosenfeld, Matthieu},
  title =	{{A Proof of Shur’s Conjecture on the Growth of Power-Free Languages over Large Alphabets}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{32:1--32:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.32},
  URN =		{urn:nbn:de:0030-drops-241398},
  doi =		{10.4230/LIPIcs.MFCS.2025.32},
  annote =	{Keywords: power-free languages, large alphabets, Shur’s conjecture, Dejean’s conjecture}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.62

The Complexity of Reachability Problems in Strongly Connected Finite Automata

Authors: Stefan Kiefer and Andrew Ryzhikov

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

Several reachability problems in finite automata, such as completeness of NFAs and synchronisation of total DFAs, correspond to fundamental properties of sets of nonnegative matrices. In particular, the two mentioned properties correspond to matrix mortality and ergodicity, which ask whether there exists a product of the input matrices that is equal to, respectively, the zero matrix and a matrix with a column of strictly positive entries only. The case where the input automaton is strongly connected (that is, the corresponding set of nonnegative matrices is irreducible) frequently appears in applications and often admits better properties than the general case. In this paper, we address the existence of such properties from the computational complexity point of view, and develop a versatile technique to show that several NL-complete problems remain NL-complete in the strongly connected case. In particular, we show that deciding if a binary total DFA is synchronising is NL-complete even if it is promised to be strongly connected, and that deciding completeness of a binary unambiguous NFA with very limited nondeterminism is NL-complete under the same promise.

Cite as

Stefan Kiefer and Andrew Ryzhikov. The Complexity of Reachability Problems in Strongly Connected Finite Automata. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 62:1-62:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kiefer_et_al:LIPIcs.MFCS.2025.62,
  author =	{Kiefer, Stefan and Ryzhikov, Andrew},
  title =	{{The Complexity of Reachability Problems in Strongly Connected Finite Automata}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{62:1--62:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.62},
  URN =		{urn:nbn:de:0030-drops-241690},
  doi =		{10.4230/LIPIcs.MFCS.2025.62},
  annote =	{Keywords: unambiguous automata, nonnegative matrices, irreducible matrix sets, strongly connected automata, matrix monoids, mortality, completeness, synchronisation, ergodicity}
}

@InProceedings{kiefer_et_al:LIPIcs.MFCS.2025.62,
  author =	{Kiefer, Stefan and Ryzhikov, Andrew},
  title =	{{The Complexity of Reachability Problems in Strongly Connected Finite Automata}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{62:1--62:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.62},
  URN =		{urn:nbn:de:0030-drops-241690},
  doi =		{10.4230/LIPIcs.MFCS.2025.62},
  annote =	{Keywords: unambiguous automata, nonnegative matrices, irreducible matrix sets, strongly connected automata, matrix monoids, mortality, completeness, synchronisation, ergodicity}
}

Document

DOI: 10.4230/OASIcs.Manzini.1

BWT and Combinatorics on Words

Authors: Gabriele Fici, Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, and Marinella Sciortino

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Burrows-Wheeler Transform (BWT) is a reversible transformation on words (strings) introduced in 1994 in the context of data compression, which is a permutation of the characters in the word. Its clustering effect, i.e., the remarkable property of grouping identical characters (BWT runs) when they share common contexts, has made it a powerful tool for boosting compression performances and enabling efficient pattern searching in highly repetitive string collections. In this chapter, we analyze the Burrows-Wheeler transform under the combinatorial point of view, and we survey known properties and connections with different aspects of combinatorics on words. In particular, we focus on the properties of words in relation to the number of their BWT runs. The value r, which counts the number of BWT runs, impacts both compression performance and indexing efficiency, and is considered a measure to evaluate the above-mentioned clustering effect and, consequently, the repetitiveness of a word. We give an overview of the results relating r to other combinatorial repetitiveness measures related to the factor complexity. The chapter also explores extremal cases of the clustering effect. Finally, some results on the sensitivity of the measure r are considered, where the effects of combinatorial operations are studied, such as reversal, edits, and the application of morphisms.

Cite as

Gabriele Fici, Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, and Marinella Sciortino. BWT and Combinatorics on Words. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 1:1-1:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{fici_et_al:OASIcs.Manzini.1,
  author =	{Fici, Gabriele and Mantaci, Sabrina and Restivo, Antonio and Romana, Giuseppe and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{BWT and Combinatorics on Words}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{1:1--1:23},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.1},
  URN =		{urn:nbn:de:0030-drops-239090},
  doi =		{10.4230/OASIcs.Manzini.1},
  annote =	{Keywords: Burrows-Wheeler Transform, Combinatorics on Words, Clustering Effect, BWT Runs}
}

Document

DOI: 10.4230/OASIcs.Manzini.2

A Survey of the Bijective Burrows-Wheeler Transform

Authors: Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Bijective BWT (BBWT), conceived by Scott in 2007, later summarized in a preprint by Gil and Scott in 2009 (arXiv 2012), is a variant of the Burrows-Wheeler Transform which is bijective: every string is the BBWT of some string. Indeed, the BBWT of a string is the extended BWT [Mantaci et al., 2007] of the factors of its Lyndon factorization. The BBWT has been receiving increasing interest in recent years. In this paper, we survey existing research on the BBWT, starting with its history and motivation. We then present algorithmic topics including construction algorithms with various complexities and an index on top of the BBWT for pattern matching. We subsequently address some properties of the BBWT as a compressor, discussing robustness to operations such as reversal, edits, rotation, as well as compression power. We close with listing other bijective variants of the BWT and open problems concerning the BBWT.

Cite as

Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták. A Survey of the Bijective Burrows-Wheeler Transform. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 2:1-2:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:OASIcs.Manzini.2,
  author =	{Bannai, Hideo and K\"{o}ppl, Dominik and Lipt\'{a}k, Zsuzsanna},
  title =	{{A Survey of the Bijective Burrows-Wheeler Transform}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{2:1--2:26},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.2},
  URN =		{urn:nbn:de:0030-drops-239100},
  doi =		{10.4230/OASIcs.Manzini.2},
  annote =	{Keywords: Burrows-Wheeler Transform, compression, text indexing, repetitiveness measure, Lyndon words, index construction algorithms, bijective string transformation}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.20

SimdMinimizers: Computing Random Minimizers, fast

Authors: Ragnar Groot Koerkamp and Igor Martayan

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Motivation. Because of the rapidly-growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller than the input sequences, but preserve sufficient information for downstream analysis. Minimizers are an especially popular sketching technique and used in a wide variety of applications. They sample at least one out of every w consecutive k-mers. As DNA sequencers are getting more accurate, some applications can afford to use a larger w and hence sparser and smaller sketches. And as sketches get smaller, their analysis becomes faster, so the time spent sketching the full-sized input becomes more of a bottleneck. Methods. Our library simd-minimizers implements a random minimizer algorithm using SIMD instructions. It supports both AVX2 and NEON architectures. Its main novelty is two-fold. First, it splits the input into 8 chunks that are streamed over in parallel through all steps of the algorithm. This is enabled by using the completely deterministic two-stacks sliding window minimum algorithm, which seems not to have been used before for finding minimizers. Results. Our library is up to 6.8× faster than a scalar implementation of the rescan method when w = 5 is small, and 3.4× faster for larger w = 19. Computing canonical minimizers is less than 50% slower than computing forward minimizers, and over 15× faster than the existing implementation in the minimizer-iter crate. Our library finds all (canonical) minimizers of a 3.2 Gbp human genome in 5.2 (resp. 6.7) seconds.

Cite as

Ragnar Groot Koerkamp and Igor Martayan. SimdMinimizers: Computing Random Minimizers, fast. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{grootkoerkamp_et_al:LIPIcs.SEA.2025.20,
  author =	{Groot Koerkamp, Ragnar and Martayan, Igor},
  title =	{{SimdMinimizers: Computing Random Minimizers, fast}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.20},
  URN =		{urn:nbn:de:0030-drops-232581},
  doi =		{10.4230/LIPIcs.SEA.2025.20},
  annote =	{Keywords: Minimizers, Randomized algorithms, Sketching, Hashing}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.86

Optimal Distance Labeling for Permutation Graphs

Authors: Paweł Gawrychowski and Wojciech Janczewski

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

A permutation graph is the intersection graph of a set of segments between two parallel lines. In other words, they are defined by a permutation π on n elements, such that u and v are adjacent if an only if u < v but π(u) > π(v). We consider the problem of computing the distances in such a graph in the setting of informative labeling schemes. The goal of such a scheme is to assign a short bitstring 𝓁(u) to every vertex u, such that the distance between u and v can be computed using only 𝓁(u) and 𝓁(v), and no further knowledge about the whole graph (other than that it is a permutation graph). This elegantly captures the intuition that we would like our data structure to be distributed, and often leads to interesting combinatorial challenges while trying to obtain lower and upper bounds that match up to the lower-order terms. For distance labeling of permutation graphs on n vertices, Katz, Katz, and Peleg [STACS 2000] showed how to construct labels consisting of 𝒪(log² n) bits. Later, Bazzaro and Gavoille [Discret. Math. 309(11)] obtained an asymptotically optimal bound by showing how to construct labels consisting of 9log{n}+𝒪(1) bits, and proving that 3log{n}-𝒪(log{log{n}}) bits are necessary. This however leaves a quite large gap between the known lower and upper bounds. We close this gap by showing how to construct labels consisting of 3log{n}+𝒪(1) bits.

Cite as

Paweł Gawrychowski and Wojciech Janczewski. Optimal Distance Labeling for Permutation Graphs. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 86:1-86:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.ICALP.2025.86,
  author =	{Gawrychowski, Pawe{\l} and Janczewski, Wojciech},
  title =	{{Optimal Distance Labeling for Permutation Graphs}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{86:1--86:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.86},
  URN =		{urn:nbn:de:0030-drops-234632},
  doi =		{10.4230/LIPIcs.ICALP.2025.86},
  annote =	{Keywords: informative labeling, permutation graph, distance labeling}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.5

Covers in Optimal Space

Authors: Itai Boneh and Shay Golan

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

A cover of a string S is a string C such that every index of S is contained in some occurrence of C. First introduced by Apostolico and Ehrenfeucht [TCS'93] over 30 years ago, covers have since received significant attention in the string algorithms community. In this work, we present a space-efficient algorithm for computing a compact representation of all covers of a given string. Our algorithm requires only O(log n) additional memory while accessing the input string of length n in a read-only manner. Moreover, it runs in O(n) time, matching the best-known time complexity for this problem while achieving an exponential improvement in space usage.

Cite as

Itai Boneh and Shay Golan. Covers in Optimal Space. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.CPM.2025.5,
  author =	{Boneh, Itai and Golan, Shay},
  title =	{{Covers in Optimal Space}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{5:1--5:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.5},
  URN =		{urn:nbn:de:0030-drops-230993},
  doi =		{10.4230/LIPIcs.CPM.2025.5},
  annote =	{Keywords: Cover, Read-only random access, small space}
}

Document

DOI: 10.4230/LIPIcs.STACS.2025.61

Efficiently Computing the Minimum Rank of a Matrix in a Monoid of Zero-One Matrices

Authors: Stefan Kiefer and Andrew Ryzhikov

Published in: LIPIcs, Volume 327, 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)

Abstract

A zero-one matrix is a matrix with entries from {0, 1}. We study monoids containing only such matrices. A finite set of zero-one matrices generating such a monoid can be seen as the matrix representation of an unambiguous finite automaton, an important generalisation of deterministic finite automata which shares many of their good properties. Let 𝒜 be a finite set of n×n zero-one matrices generating a monoid of zero-one matrices, and m be the cardinality of 𝒜. We study the computational complexity of computing the minimum rank of a matrix in the monoid generated by 𝒜. By using linear-algebraic techniques, we show that this problem is in NC and can be solved in 𝒪(mn⁴) time. We also provide a combinatorial algorithm finding a matrix of minimum rank in 𝒪(n^{2 + ω} + mn⁴) time, where 2 ≤ ω ≤ 2.4 is the matrix multiplication exponent. As a byproduct, we show a very weak version of a generalisation of the Černý conjecture: there always exists a straight line program of size 𝒪(n²) describing a product resulting in a matrix of minimum rank. For the special case corresponding to complete DFAs (that is, for the case where all matrices have exactly one 1 in each row), the minimum rank is the size of the smallest image of the set of states under the action of a word. Our combinatorial algorithm finds a matrix of minimum rank in time 𝒪(n³ + mn²) in this case.

Cite as

Stefan Kiefer and Andrew Ryzhikov. Efficiently Computing the Minimum Rank of a Matrix in a Monoid of Zero-One Matrices. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 61:1-61:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kiefer_et_al:LIPIcs.STACS.2025.61,
  author =	{Kiefer, Stefan and Ryzhikov, Andrew},
  title =	{{Efficiently Computing the Minimum Rank of a Matrix in a Monoid of Zero-One Matrices}},
  booktitle =	{42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)},
  pages =	{61:1--61:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-365-2},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{327},
  editor =	{Beyersdorff, Olaf and Pilipczuk, Micha{\l} and Pimentel, Elaine and Thắng, Nguy\~{ê}n Kim},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2025.61},
  URN =		{urn:nbn:de:0030-drops-228867},
  doi =		{10.4230/LIPIcs.STACS.2025.61},
  annote =	{Keywords: matrix monoids, minimum rank, unambiguous automata}
}

Document

DOI: 10.4230/LIPIcs.ESA.2024.31

String 2-Covers with No Length Restrictions

Authors: Itai Boneh, Shay Golan, and Arseny Shur

Published in: LIPIcs, Volume 308, 32nd Annual European Symposium on Algorithms (ESA 2024)

Abstract

A λ-cover of a string S is a set of strings {C_i}₁^λ such that every index in S is contained in an occurrence of at least one string C_i. The existence of a 1-cover defines a well-known class of quasi-periodic strings. Quasi-periodicity can be decided in linear time, and all 1-covers of a string can be reported in linear time as well. Since in general it is NP-complete to decide whether a string has a λ-cover, the natural next step is the development of efficient algorithms for 2-covers. Radoszewski and Straszyński [ESA 2020] analysed the particular case where the strings in a 2-cover must be of the same length. They provided an algorithm that reports all such 2-covers of S in time near-linear in |S| and in the size of the output. In this work, we consider 2-covers in full generality. Since every length-n string has Ω(n²) trivial 2-covers (every prefix and suffix of total length at least n constitute such a 2-cover), we state the reporting problem as follows: given a string S and a number m, report all 2-covers {C₁,C₂} of S with length |C₁|+|C₂| upper bounded by m. We present an Õ(n + output) time algorithm solving this problem, with output being the size of the output. This algorithm admits a simpler modification that finds a 2-cover of minimum length. We also provide an Õ(n) time construction of a 2-cover oracle which, given two substrings C₁,C₂ of S, reports in poly-logarithmic time whether {C₁,C₂} is a 2-cover of S.

Cite as

Itai Boneh, Shay Golan, and Arseny Shur. String 2-Covers with No Length Restrictions. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 31:1-31:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{boneh_et_al:LIPIcs.ESA.2024.31,
  author =	{Boneh, Itai and Golan, Shay and Shur, Arseny},
  title =	{{String 2-Covers with No Length Restrictions}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{31:1--31:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.31},
  URN =		{urn:nbn:de:0030-drops-211029},
  doi =		{10.4230/LIPIcs.ESA.2024.31},
  annote =	{Keywords: Quasi-periodicity, String cover, Range query, Range stabbing}
}

22 Search Results for "Shur, Arseny"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message