DROPS

Document

DOI: 10.4230/LIPIcs.ISAAC.2024.28

Subsequence Matching and Analysis Problems for Formal Languages

Authors: Szilárd Zsolt Fazekas, Tore Koß, Florin Manea, Robert Mercaş, and Timo Specht

Published in: LIPIcs, Volume 322, 35th International Symposium on Algorithms and Computation (ISAAC 2024)

Abstract

In this paper, we study a series of algorithmic problems related to the subsequences occurring in the strings of a given language, under the assumption that this language is succinctly represented by a grammar generating it, or an automaton accepting it. In particular, we focus on the following problems: Given a string w and a language L, does there exist a word of L which has w as a subsequence? Do all words of L have w as a subsequence? Given an integer k alongside L, does there exist a word of L which has all strings of length k, over the alphabet of L, as subsequences? Do all words of L have all strings of length k as subsequences? For the last two problems, efficient algorithms were already presented in [Adamson et al., ISAAC 2023] for the case when L is a regular language, and efficient solutions can be easily obtained for the first two problems. We extend that work as follows: we give sufficient conditions on the class of input languages under which these problems are decidable; we provide efficient algorithms for all these problems in the case when the input language is context-free; we show that all problems are undecidable for context-sensitive languages. Finally, we provide a series of initial results related to a class of languages that strictly includes the regular languages and is strictly included in the class of context-sensitive languages, but is incomparable to the class of context-free languages; these results deviate significantly from those reported for language classes from the Chomsky hierarchy.
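The first problem (does some word of L have w as a subsequence?) can be illustrated for context-free languages with a simple fixed-point computation. The sketch below is not the algorithm from the paper. It assumes a grammar in Chomsky normal form without ε-rules, encoded as a hypothetical list of `(lhs, rhs)` pairs, and records which nonterminals derive some word containing a factor `w[i:j]` as a subsequence. It relies on the fact that w is a subsequence of xy exactly when w splits as w₁w₂ with w₁ a subsequence of x and w₂ a subsequence of y.

```python
def cfg_has_word_with_subsequence(rules, start, w):
    """Hypothetical helper: does the CNF grammar derive a word having w as a subsequence?

    rules: list of (lhs, rhs) where rhs is either a one-letter terminal string
    or a pair (B, C) of nonterminals. Assumes Chomsky normal form, no epsilon rules.
    """
    n = len(w)
    # (A, i, j) in covers  <=>  A derives some word having w[i:j] as a subsequence.
    # (A, i, i) in covers therefore means that A derives at least one terminal word.
    covers = set()
    changed = True
    while changed:  # iterate until the least fixed point is reached
        changed = False
        for lhs, rhs in rules:
            for i in range(n + 1):
                for j in range(i, n + 1):
                    if (lhs, i, j) in covers:
                        continue
                    if isinstance(rhs, str):
                        # A -> a covers the empty factor, or a single matching letter.
                        ok = j == i or (j == i + 1 and w[i] == rhs)
                    else:
                        # A -> B C: split w[i:j] between the words derived by B and C.
                        b, c = rhs
                        ok = any((b, i, m) in covers and (c, m, j) in covers
                                 for m in range(i, j + 1))
                    if ok:
                        covers.add((lhs, i, j))
                        changed = True
    return (start, 0, n) in covers


# Example grammar for { a^n b^n : n >= 1 } in Chomsky normal form.
rules = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
         ("A", "a"), ("B", "b")]
print(cfg_has_word_with_subsequence(rules, "S", "aabb"))  # True
print(cfg_has_word_with_subsequence(rules, "S", "ba"))    # False
```

Each pass costs O(|rules| · |w|³) and at most |N| · (|w| + 1)² facts can be added, so the sketch runs in polynomial time, although it is far from tuned.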

Cite as

Szilárd Zsolt Fazekas, Tore Koß, Florin Manea, Robert Mercaş, and Timo Specht. Subsequence Matching and Analysis Problems for Formal Languages. In 35th International Symposium on Algorithms and Computation (ISAAC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 322, pp. 28:1-28:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{fazekas_et_al:LIPIcs.ISAAC.2024.28,
  author =	{Fazekas, Szil\'{a}rd Zsolt and Ko{\ss}, Tore and Manea, Florin and Merca\c{s}, Robert and Specht, Timo},
  title =	{{Subsequence Matching and Analysis Problems for Formal Languages}},
  booktitle =	{35th International Symposium on Algorithms and Computation (ISAAC 2024)},
  pages =	{28:1--28:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-354-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{322},
  editor =	{Mestre, Juli\'{a}n and Wirth, Anthony},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2024.28},
  URN =		{urn:nbn:de:0030-drops-221551},
  doi =		{10.4230/LIPIcs.ISAAC.2024.28},
  annote =	{Keywords: Stringology, String Combinatorics, Subsequence, Formal Languages, Context-Free Languages, Context-Sensitive Languages}
}

Document

DOI: 10.4230/LIPIcs.ISAAC.2023.4

k-Universality of Regular Languages

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch, Tore Koß, Florin Manea, and Dirk Nowotka

Published in: LIPIcs, Volume 283, 34th International Symposium on Algorithms and Computation (ISAAC 2023)

Abstract

A subsequence of a word w is a word u such that u = w[i₁] w[i₂] … w[i_k], for some set of indices 1 ≤ i₁ < i₂ < … < i_k ≤ |w|. A word w is k-subsequence universal over an alphabet Σ if every word in Σ^k appears in w as a subsequence. In this paper, we study the intersection between the set of k-subsequence universal words over some alphabet Σ and regular languages over Σ. We call a regular language L k-∃-subsequence universal if there exists a k-subsequence universal word in L, and k-∀-subsequence universal if every word of L is k-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is k-∃-subsequence universal and, respectively, if it is k-∀-subsequence universal, for a given k. The algorithms are FPT w.r.t. the size of the input alphabet, and their run-time does not depend on k; they run in polynomial time in the number n of states of the input automaton when the size of the input alphabet is O(log n). Moreover, we show that the problem of deciding if a given regular language is k-∃-subsequence universal is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of k-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of k-subsequence universal words accepted by a given finite automaton.
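A straightforward way to decide both properties for a language given by an NFA is a product construction. Alongside the automaton state, it tracks how many arches of the greedy arch factorisation have been completed (capped at k) and which letters the current arch has already seen. This works because a word over Σ is k-subsequence universal iff it has at least k arches. The sketch below assumes this encoding, with hypothetical names (`capped_arch_counts`, and `delta` as a dict from `(state, letter)` to successor sets). Its state space has size |Q|·(k+1)·2^|Σ|, so, unlike the algorithms described in the abstract, its running time does depend on k.

```python
from collections import deque


def capped_arch_counts(start, accepting, delta, sigma, k):
    """Hypothetical helper: the set of values min(iota(w), k) over all words w
    accepted by the NFA, where iota(w) is the number of arches of w over sigma."""
    alphabet = frozenset(sigma)
    init = (start, 0, frozenset())  # (state, completed arches, letters of current arch)
    visited = {init}
    queue = deque([init])
    counts = set()
    while queue:
        q, arches, current = queue.popleft()
        if q in accepting:
            counts.add(arches)
        for a in alphabet:
            nxt_arches, nxt_current = arches, current | {a}
            if nxt_current == alphabet:  # reading a completes an arch
                nxt_arches, nxt_current = min(arches + 1, k), frozenset()
            for p in delta.get((q, a), ()):
                cfg = (p, nxt_arches, nxt_current)
                if cfg not in visited:
                    visited.add(cfg)
                    queue.append(cfg)
    return counts


def is_k_exists_universal(start, accepting, delta, sigma, k):
    # Some accepted word has at least k arches.
    return k in capped_arch_counts(start, accepting, delta, sigma, k)


def is_k_forall_universal(start, accepting, delta, sigma, k):
    # Every accepted word has at least k arches (vacuously true for the empty language).
    return all(c == k for c in capped_arch_counts(start, accepting, delta, sigma, k))
```

For instance, with `delta = {(0, "a"): {1}, (1, "b"): {0}}`, start state `0` and accepting set `{0}` (the language (ab)*), the language is 2-∃-subsequence universal but not 2-∀-subsequence universal, since it contains the empty word.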

Cite as

Duncan Adamson, Pamela Fleischmann, Annika Huch, Tore Koß, Florin Manea, and Dirk Nowotka. k-Universality of Regular Languages. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 4:1-4:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{adamson_et_al:LIPIcs.ISAAC.2023.4,
  author =	{Adamson, Duncan and Fleischmann, Pamela and Huch, Annika and Ko{\ss}, Tore and Manea, Florin and Nowotka, Dirk},
  title =	{{k-Universality of Regular Languages}},
  booktitle =	{34th International Symposium on Algorithms and Computation (ISAAC 2023)},
  pages =	{4:1--4:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-289-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{283},
  editor =	{Iwata, Satoru and Kakimura, Naonori},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2023.4},
  URN =		{urn:nbn:de:0030-drops-193064},
  doi =		{10.4230/LIPIcs.ISAAC.2023.4},
  annote =	{Keywords: String Algorithms, Regular Languages, Finite Automata, Subsequences}
}

Document

DOI: 10.4230/LIPIcs.STACS.2021.25

The Edit Distance to k-Subsequence Universality

Authors: Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer

Published in: LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Abstract

A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u. In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.
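For very small inputs, the distance can be checked by brute force. A breadth-first search over single-letter insertions, deletions and substitutions (letters drawn from Σ) stops at the first k-subsequence universal word it reaches. This is only a reference sketch with hypothetical function names. It is exponential in the distance and is not one of the efficient algorithms from the paper. Universality is tested via the number of arches in the greedy arch factorisation, which is at least k exactly when the word is k-subsequence universal.

```python
from collections import deque


def universality_index(w, sigma):
    """Number of arches in the greedy arch factorisation of w over sigma."""
    alphabet = set(sigma)
    seen, arches = set(), 0
    for c in w:
        seen.add(c)
        if alphabet <= seen:  # the current arch now contains every letter
            arches += 1
            seen = set()
    return arches


def neighbours(x, sigma):
    """All words obtained from x by one deletion, substitution or insertion."""
    for i in range(len(x)):
        yield x[:i] + x[i + 1:]  # deletion
        for a in sigma:
            if a != x[i]:
                yield x[:i] + a + x[i + 1:]  # substitution
    for i in range(len(x) + 1):
        for a in sigma:
            yield x[:i] + a + x[i:]  # insertion


def edit_distance_to_k_universal(w, k, sigma):
    """Hypothetical brute force: BFS pops words in order of edit distance from w,
    so the first k-universal word popped is at minimum distance."""
    dist = {w: 0}
    queue = deque([w])
    while queue:
        x = queue.popleft()
        if universality_index(x, sigma) >= k:
            return dist[x]
        for y in neighbours(x, sigma):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
```

For example, over Σ = {a, b}, the word `aaaa` needs two substitutions (giving `abab`) to become 2-subsequence universal.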

Cite as

Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The Edit Distance to k-Subsequence Universality. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{day_et_al:LIPIcs.STACS.2021.25,
  author =	{Day, Joel D. and Fleischmann, Pamela and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan},
  title =	{{The Edit Distance to k-Subsequence Universality}},
  booktitle =	{38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)},
  pages =	{25:1--25:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-180-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{187},
  editor =	{Bl\"{a}ser, Markus and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.25},
  URN =		{urn:nbn:de:0030-drops-136705},
  doi =		{10.4230/LIPIcs.STACS.2021.25},
  annote =	{Keywords: Subsequence, Scattered factor, Subword, Universality, k-subsequence universality, Edit distance, Efficient algorithms}
}

Document

DOI: 10.4230/LIPIcs.STACS.2021.34

Efficiently Testing Simon’s Congruence

Authors: Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer

Published in: LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Abstract

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection to piecewise testable languages, but also found many applications in, e.g., learning theory, databases theory, or linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well. To achieve these results, we introduce a novel data-structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s∼_k t.
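The sketch below computes the same quantity directly from the definition of ∼_k, by comparing the sets of subsequences of length at most k for k = 1, 2, and so on. It is exponential in the word lengths and serves only as a cross-check on tiny inputs. It is not a substitute for the linear-time Simon-Tree algorithm, and the function names are hypothetical. The loop terminates because, for s ≠ t, the two words cannot be ∼_k-congruent for k = max(|s|, |t|): each word would then be a subsequence of the other.

```python
from itertools import combinations


def subsequences_up_to(w, k):
    """All distinct subsequences of w of length at most k (exponential; tiny inputs only)."""
    result = set()
    for length in range(k + 1):
        for idx in combinations(range(len(w)), length):
            result.add("".join(w[i] for i in idx))
    return result


def simon_congruent(s, t, k):
    # s ~_k t iff both words have the same subsequences of length at most k.
    return subsequences_up_to(s, k) == subsequences_up_to(t, k)


def largest_simon_k(s, t):
    """Largest k with s ~_k t; None means s ~_k t holds for every k (i.e. s == t)."""
    if s == t:
        return None
    k = 0  # s ~_0 t always holds: both sets contain only the empty word
    while simon_congruent(s, t, k + 1):
        k += 1
    return k


print(largest_simon_k("abab", "abba"))  # 2: "aab" is a subsequence of abab only
```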

Cite as

Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently Testing Simon’s Congruence. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 34:1-34:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2021.34,
  author =	{Gawrychowski, Pawe{\l} and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan},
  title =	{{Efficiently Testing Simon’s Congruence}},
  booktitle =	{38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)},
  pages =	{34:1--34:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-180-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{187},
  editor =	{Bl\"{a}ser, Markus and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.34},
  URN =		{urn:nbn:de:0030-drops-136796},
  doi =		{10.4230/LIPIcs.STACS.2021.34},
  annote =	{Keywords: Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms}
}
