Document

**Published in:** LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)

A pattern α is a string of variables and terminal letters. We say that α matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of α by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between w and any word u obtained by replacing the variables of α by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable.

Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching Patterns with Variables Under Hamming Distance. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 48:1-48:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.MFCS.2021.48, author = {Gawrychowski, Pawe{\l} and Manea, Florin and Siemer, Stefan}, title = {{Matching Patterns with Variables Under Hamming Distance}}, booktitle = {46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)}, pages = {48:1--48:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-201-3}, ISSN = {1868-8969}, year = {2021}, volume = {202}, editor = {Bonchi, Filippo and Puglisi, Simon J.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.48}, URN = {urn:nbn:de:0030-drops-144886}, doi = {10.4230/LIPIcs.MFCS.2021.48}, annote = {Keywords: Pattern with variables, Matching algorithms, Hamming distance, Conditional lower bounds, Patterns with structural restrictions} }

Document

**Published in:** LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u.
In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.

Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The Edit Distance to k-Subsequence Universality. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{day_et_al:LIPIcs.STACS.2021.25, author = {Day, Joel D. and Fleischmann, Pamela and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan}, title = {{The Edit Distance to k-Subsequence Universality}}, booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)}, pages = {25:1--25:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-180-1}, ISSN = {1868-8969}, year = {2021}, volume = {187}, editor = {Bl\"{a}ser, Markus and Monmege, Benjamin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.25}, URN = {urn:nbn:de:0030-drops-136705}, doi = {10.4230/LIPIcs.STACS.2021.25}, annote = {Keywords: Subsequence, Scattered factor, Subword, Universality, k-subsequence universality, Edit distance, Efficient algorithms} }

Document

**Published in:** LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection to piecewise testable languages, but also found many applications in, e.g., learning theory, databases theory, or linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well.
To achieve these results, we introduce a novel data-structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s∼_k t.

Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently Testing Simon’s Congruence. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 34:1-34:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{gawrychowski_et_al:LIPIcs.STACS.2021.34, author = {Gawrychowski, Pawe{\l} and Kosche, Maria and Ko{\ss}, Tore and Manea, Florin and Siemer, Stefan}, title = {{Efficiently Testing Simon’s Congruence}}, booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)}, pages = {34:1--34:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-180-1}, ISSN = {1868-8969}, year = {2021}, volume = {187}, editor = {Bl\"{a}ser, Markus and Monmege, Benjamin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.34}, URN = {urn:nbn:de:0030-drops-136796}, doi = {10.4230/LIPIcs.STACS.2021.34}, annote = {Keywords: Simon’s congruence, Subsequence, Scattered factor, Efficient algorithms} }