DROPS

Document

DOI: 10.4230/LIPIcs.MFCS.2025.9

Linear Time Subsequence and Supersequence Regex Matching

Authors: Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

It is well known that checking whether a given string w matches a given regular expression r can be done in quadratic time O(|w|⋅|r|) and that this cannot be improved to a truly subquadratic running time of O((|w|⋅|r|)^{1-ε}) assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether w has a subsequence that matches r, and show that regex matching in this sense can be solved in linear time O(|w| + |r|). Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of w that matches r can be solved in O(|w|⋅|r|), i.e., asymptotically no worse than classical regex matching; and we show that O(|w| + |r|) is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.
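To make the subsequence matching paradigm concrete, here is a minimal Python sketch. It is not the linear-time algorithm from the paper: it is the straightforward state-set simulation, and it assumes the regular expression has already been converted to an NFA without ε-transitions. The function name, the `delta` dictionary encoding, and the example automaton for `a(bb)*c` are illustrative assumptions.

```python
def has_matching_subsequence(w, start, accepting, delta):
    """Return True if some subsequence of w is accepted by the NFA.

    delta maps (state, letter) to a set of successor states; the NFA is
    assumed to have no epsilon transitions (an assumption of this sketch).
    """
    reachable = {start}
    for c in w:
        # Either skip the letter c (stay in the current states) or read it.
        step = set()
        for q in reachable:
            step |= delta.get((q, c), set())
        reachable |= step  # the reachable set only ever grows
    return bool(reachable & accepting)


# Hand-built NFA for the regular expression a(bb)*c (illustrative only).
delta = {
    (0, "a"): {1},
    (1, "b"): {2},
    (2, "b"): {1},
    (1, "c"): {3},
}

print(has_matching_subsequence("xabbbcx", 0, {3}, delta))
print(has_matching_subsequence("cba", 0, {3}, delta))
```

Because letters may be skipped, the set of reachable states is monotone, which is the main difference from classical NFA simulation, where the state set is replaced at every step.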

Cite as

Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid. Linear Time Subsequence and Supersequence Regex Matching. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{amarilli_et_al:LIPIcs.MFCS.2025.9,
  author =	{Amarilli, Antoine and Manea, Florin and Ringleb, Tina and Schmid, Markus L.},
  title =	{{Linear Time Subsequence and Supersequence Regex Matching}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.9},
  URN =		{urn:nbn:de:0030-drops-241162},
  doi =		{10.4230/LIPIcs.MFCS.2025.9},
  annote =	{Keywords: subsequence, supersequence, regular language, regular expression, automata}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.56

Negated String Containment Is Decidable

Authors: Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

We provide a positive answer to a long-standing open question of the decidability of the not-contains string predicate. Not-contains is practically relevant, for instance in symbolic execution of string-manipulating programs. Particularly, we show that the predicate ¬Contains(x₁ … x_n, y₁ … y_m), where x₁ … x_n and y₁ … y_m are sequences of string variables constrained by regular languages, is decidable. Decidability of a not-contains predicate combined with chain-free word equations and regular membership constraints follows.

Cite as

Vojtěch Havlena, Michal Hečko, Lukáš Holík, and Ondřej Lengál. Negated String Containment Is Decidable. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 56:1-56:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{havlena_et_al:LIPIcs.MFCS.2025.56,
  author =	{Havlena, Vojt\v{e}ch and He\v{c}ko, Michal and Hol{\'\i}k, Luk\'{a}\v{s} and Leng\'{a}l, Ond\v{r}ej},
  title =	{{Negated String Containment Is Decidable}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{56:1--56:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.56},
  URN =		{urn:nbn:de:0030-drops-241631},
  doi =		{10.4230/LIPIcs.MFCS.2025.56},
  annote =	{Keywords: not-contains, string constraints, word combinatorics, primitive word}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.4

The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable

Authors: Dirk Nowotka and Max Wiedenhöft

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

Patterns are words with terminals and variables. The language of a pattern is the set of words obtained by uniformly substituting all variables with words that contain only terminals. Length constraints restrict valid substitutions of variables by associating the variables of a pattern with a system (or disjunction of systems) of linear Diophantine inequalities. Pattern languages with length constraints contain only words in which all variables are substituted by words with lengths that fulfill such a given set of length constraints. We consider membership, inclusion, and equivalence problems for erasing and non-erasing pattern languages with length constraints. Our main result shows that the erasing equivalence problem, one of the most prominent open problems in the realm of patterns, becomes undecidable if length constraints are allowed in addition to variable equality. Additionally, it is shown that the terminal-free inclusion problem, a prominent problem which has been shown to be undecidable in the binary case for patterns without any constraints, is also generally undecidable for all larger alphabets in this setting. Finally, we also show that considering regular constraints, i.e., associating variables also with regular languages as additional restrictions together with length constraints for valid substitutions, results in undecidability of the non-erasing equivalence problem. This sets a first upper bound on constraints to obtain undecidability in this case, as this problem is trivially decidable in the case of no constraints and as it has unknown decidability if only regular or only length constraints are considered.
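As an illustration of the membership problem for plain patterns (without length or regular constraints), the following Python sketch tests by backtracking whether a word belongs to the language of a pattern. The `Var` type, the function name, and the `erasing` flag are assumptions made for this example; the search is exponential in the worst case and is not taken from the paper.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """A pattern variable (hypothetical representation for this sketch)."""
    name: str


def in_pattern_language(pattern, word, erasing=True):
    """Check whether word is obtained from pattern by a uniform substitution.

    pattern is a list whose items are terminal strings or Var objects.
    With erasing=True variables may be replaced by the empty word.
    """
    def go(i, pos, subst):
        if i == len(pattern):
            return pos == len(word)
        sym = pattern[i]
        if isinstance(sym, Var):
            if sym.name in subst:
                # Repeated variable: it must take the same image again.
                img = subst[sym.name]
                return word.startswith(img, pos) and go(i + 1, pos + len(img), subst)
            min_len = 0 if erasing else 1
            for end in range(pos + min_len, len(word) + 1):
                subst[sym.name] = word[pos:end]
                if go(i + 1, end, subst):
                    return True
                del subst[sym.name]
            return False
        # Terminal symbol(s): must appear literally at the current position.
        return word.startswith(sym, pos) and go(i + 1, pos + len(sym), subst)

    return go(0, 0, {})


x = Var("x")
print(in_pattern_language([x, "a", x], "bab"))
print(in_pattern_language([x, "a"], "a", erasing=False))
```

A length constraint could be layered on top by rejecting a candidate image whose length violates the constraint before recursing; that extension is left out to keep the sketch short.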

Cite as

Dirk Nowotka and Max Wiedenhöft. The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{nowotka_et_al:LIPIcs.CPM.2025.4,
  author =	{Nowotka, Dirk and Wiedenh\"{o}ft, Max},
  title =	{{The Equivalence Problem of E-Pattern Languages with Length Constraints Is Undecidable}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.4},
  URN =		{urn:nbn:de:0030-drops-230988},
  doi =		{10.4230/LIPIcs.CPM.2025.4},
  annote =	{Keywords: Patterns, Pattern Languages, Length Constraints, Regular Constraints, Decidability, Undecidability, Membership, Inclusion, Equivalence}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2021.48

Matching Patterns with Variables Under Hamming Distance

Authors: Paweł Gawrychowski, Florin Manea, and Stefan Siemer

Published in: LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)

Abstract

A pattern α is a string of variables and terminal letters. We say that α matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of α by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between w and any word u obtained by replacing the variables of α by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable.

Cite as

Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching Patterns with Variables Under Hamming Distance. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 48:1-48:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{gawrychowski_et_al:LIPIcs.MFCS.2021.48,
  author =	{Gawrychowski, Pawe{\l} and Manea, Florin and Siemer, Stefan},
  title =	{{Matching Patterns with Variables Under Hamming Distance}},
  booktitle =	{46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)},
  pages =	{48:1--48:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-201-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{202},
  editor =	{Bonchi, Filippo and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.48},
  URN =		{urn:nbn:de:0030-drops-144886},
  doi =		{10.4230/LIPIcs.MFCS.2021.48},
  annote =	{Keywords: Pattern with variables, Matching algorithms, Hamming distance, Conditional lower bounds, Patterns with structural restrictions}
}

Document

DOI: 10.4230/LIPIcs.STACS.2021.25

The Edit Distance to k-Subsequence Universality

Authors: Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer

Published in: LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Abstract

A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (now called Simon's congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, database theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u. In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.
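The two definitions above can be sketched in a few lines of Python. The first function is the usual greedy subsequence test. The second computes the largest k for which a word is k-subsequence universal by a standard greedy argument: scan the word left to right and count how many times the whole alphabet has been seen (the so-called arch factorisation). The function names are illustrative, ignoring letters outside the given alphabet is an assumption of this sketch, and none of this is the edit-distance algorithm of the paper.

```python
def is_subsequence(u, w):
    """Return True if u can be obtained from w by deleting letters."""
    it = iter(w)
    # "c in it" advances the iterator until c is found (or it is exhausted).
    return all(c in it for c in u)


def universality_index(w, alphabet):
    """Largest k such that every word of length k over alphabet is a
    subsequence of w, computed by greedily counting complete arches."""
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be non-empty")
    seen = set()
    arches = 0
    for c in w:
        if c in sigma:  # letters outside the alphabet are ignored (assumption)
            seen.add(c)
            if len(seen) == len(sigma):
                arches += 1
                seen.clear()
    return arches


print(is_subsequence("ace", "abcde"))
print(universality_index("abba", "ab"))
```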

Cite as

Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The Edit Distance to k-Subsequence Universality. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{day_et_al:LIPIcs.STACS.2021.25,
  author =	{Day, Joel D. and Fleischmann, Pamela and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan},
  title =	{{The Edit Distance to k-Subsequence Universality}},
  booktitle =	{38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)},
  pages =	{25:1--25:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-180-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{187},
  editor =	{Bl\"{a}ser, Markus and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.25},
  URN =		{urn:nbn:de:0030-drops-136705},
  doi =		{10.4230/LIPIcs.STACS.2021.25},
  annote =	{Keywords: Subsequence, Scattered factor, Subword, Universality, k-subsequence universality, Edit distance, Efficient algorithms}
}

Document

DOI: 10.4230/LIPIcs.STACS.2021.34

Efficiently Testing Simon’s Congruence

Authors: Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer

Published in: LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Abstract

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection to piecewise testable languages, but also found many applications in, e.g., learning theory, database theory, or linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long-standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well. To achieve these results, we introduce a novel data structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s∼_k t.
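The definition of ∼_k translates directly into a brute-force check, shown below as a Python sketch. It enumerates all subsequences of length at most k, so it takes exponential time in general and is only meant to illustrate the problem statement; it is not the Simon-Tree algorithm described in the abstract. The function names and the convention of returning `None` for equal words (which are congruent for every k) are assumptions of this example.

```python
def subsequences_up_to(w, k):
    """Set of all subsequences of w that have length at most k."""
    result = {""}
    for c in w:
        # Extend every subsequence found so far that is still shorter than k.
        result |= {u + c for u in result if len(u) < k}
    return result


def largest_simon_k(s, t):
    """Largest k with s ~_k t, or None if s == t (congruent for all k)."""
    if s == t:
        return None
    k = 0
    # For distinct words the sets must differ once k reaches max(len(s), len(t)),
    # so this loop terminates.
    while subsequences_up_to(s, k + 1) == subsequences_up_to(t, k + 1):
        k += 1
    return k


print(largest_simon_k("ab", "ba"))
print(largest_simon_k("abab", "abba"))
```

For example, "abab" and "abba" share all subsequences of length at most 2, but "aab" is a subsequence of "abab" only, so the largest k is 2.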

Cite as

Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently Testing Simon’s Congruence. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 34:1-34:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2021.34,
  author =	{Gawrychowski, Pawe{\l} and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan},
  title =	{{Efficiently Testing Simon’s Congruence}},
  booktitle =	{38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)},
  pages =	{34:1--34:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-180-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{187},
  editor =	{Bl\"{a}ser, Markus and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.34},
  URN =		{urn:nbn:de:0030-drops-136796},
  doi =		{10.4230/LIPIcs.STACS.2021.34},
  annote =	{Keywords: Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms}
}

6 Search Results for "Siemer, Stefan"
