The Edit Distance to k-Subsequence Universality

Day, Joel D.; Fleischmann, Pamela; Kosche, Maria; Koß, Tore; Manea, Florin; Siemer, Stefan

doi:10.4230/LIPIcs.STACS.2021.25

Abstract

A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u.
In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.

Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter W. Shor, and Robert E. Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2:195-208, 1987.
Alok Aggarwal, Baruch Schieber, and Takeshi Tokuyama. Finding a minimum-weight k-link path graphs with the concae monge property and applications. Discret. Comput. Geom., 12:263-280, 1994.
Cyril Allauzen and Mehryar Mohri. Linear-space computation of the edit-distance between a string and a finite automaton. CoRR, abs/0904.4686, 2009. URL: http://arxiv.org/abs/0904.4686.
Lorraine A. K. Ayad, Carl Barton, and Solon P. Pissis. A faster and more accurate heuristic for cyclic edit distance computation. Pattern Recognit. Lett., 88:81-87, 2017.
Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363-376, 1991.
Laura Barker, Pamela Fleischmann, Katharina Harwardt, Florin Manea, and Dirk Nowotka. Scattered factor-universality of words. In Natasa Jonoska and Dmytro Savchuk, editors, Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 14-28, 2020.
Wolfgang W. Bein, Lawrence L. Larmore, and James K. Park. The d-edge shortest-path problem for a monge graph. Technical Report, SAND-92-1724C:1-10, 1992.
Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proc. LATIN 2000, volume 1776 of Lecture Notes in Computer Science, pages 88-94, 2000.
Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Leen Stougie, and Michelle Sweering. String sanitization under edit distance. In Inge Li Gørtz and Oren Weimann, editors, Proc. CPM 2020, volume 161 of LIPIcs, pages 7:1-7:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1-40:16, 2018.
Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly sub-cubic algorithms for language edit distance and RNA-folding via fast bounded-difference min-plus product. In Proc. FOCS 2016, pages 375-384, 2016.
Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216-1235, 2018.
Krishnendu Chatterjee, Thomas A. Henzinger, Rasmus Ibsen-Jensen, and Jan Otop. Edit distance for pushdown automata. In Proc. ICALP 2015, volume 9135 of Lecture Notes in Computer Science, pages 121-133. Springer, 2015.
Herman Z. Q. Chen, Sergey Kitaev, Torsten Mütze, and Brian Y. Sun. On universal partial words. Electronic Notes in Discrete Mathematics, 61:231-237, 2017.
Hyunjoon Cheon and Yo-Sub Han. Computing the shortest string and the edit-distance for parsing expression languages. In Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 43-54, 2020.
Hyunjoon Cheon, Yo-Sub Han, Sang-Ki Ko, and Kai Salomaa. The relative edit-distance between two input-driven languages. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 127-139, 2019.
Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007.
Maxime Crochemore, Borivoj Melichar, and Zdenek Tronícek. Directed acyclic subsequence graph - overview. J. Discrete Algorithms, 1(3-4):255-280, 2003.
Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The edit distance to k-subsequence universality. CoRR, abs/2007.09192, 2020. URL: http://arxiv.org/abs/2007.09192.
Joel D. Day, Pamela Fleischmann, Florin Manea, and Dirk Nowotka. k-spectra of weakly-c-balanced words. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 265-277, 2019.
Nicolaas G. de Bruijn. A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen, 49:758-764, 1946.
Aldo de Luca, Amy Glen, and Luca Q. Zamboni. Rich, sturmian, and trapezoidal words. Theor. Comput. Sci., 407(1-3):569-573, 2008.
Xavier Droubay, Jacques Justin, and Giuseppe Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theor. Comput. Sci., 255(1-2):539-553, 2001.
Lukas Fleischer and Manfred Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of LIPIcs, pages 62:1-62:13, 2018.
Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239-248, 2015. available in CoRR abs/1509.00622.
Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. In Proc. 15th STOC, pages 246-251, 1983.
Emmanuelle Garel. Minimal separators of two words. In Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science, pages 35-53, 1993.
Pawel Gawrychowski, Maria Kosche, Tore Koss, Florin Manea, and Stefan Siemer. Efficiently testing Simon’s congruence. CoRR, abs/2005.01112, 2020. URL: http://arxiv.org/abs/2005.01112.
Pawel Gawrychowski, Martin Lange, Narad Rampersad, Jeffrey O. Shallit, and Marek Szykula. Existential length universality. In Proc. STACS 2020, volume 154 of LIPIcs, pages 16:1-16:14, 2020.
Bennet Goeckner, Corbin Groothuis, Cyrus Hettle, Brian Kell, Pamela Kirkpatrick, Rachel Kirsch, and Ryan W. Solava. Universal partial words over non-binary alphabets. Theor. Comput. Sci, 713:56-65, 2018.
Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017.
R. W. Hamming. Error detecting and error correcting codes. Bell Syst. Tech. J., 29(2):147-160, 1950.
Yo-Sub Han, Sang-Ki Ko, and Kai Salomaa. The edit-distance between a regular language and a context-free language. Int. J. Found. Comput. Sci., 24(7):1067-1082, 2013.
Jean-Jacques Hebrard. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theor. Comput. Sci., 82(1):35-49, 22 May 1991.
Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341-343, 1975.
Markus Holzer and Martin Kutrib. Descriptional and computational complexity of finite automata - A survey. Inf. Comput., 209(3):456-470, 2011.
Hiroshi Imai and Takao Asano. Dynamic segment intersection search with applications. In Proc. FOCS 1984, pages 393-402, 1984.
Rajesh Jayaram and Barna Saha. Approximating language edit distance beyond fast matrix multiplication: Ultralinear grammars are where parsing becomes hard! In Proc. ICALP 2017, volume 80 of LIPIcs, pages 19:1-19:15, 2017.
Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515-519, 2015.
Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1-37:22, 2016.
Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019.
Markus Krötzsch, Tomás Masopust, and Michaël Thomazo. Complexity of universality and related problems for partially ordered NFAs. Inf. Comput., 255:177-192, 2017.
Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289-302, 2020.
Dietrich Kuske and Georg Zetzsche. Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348-364, 2019.
Marie Lejeune, Julien Leroy, and Michel Rigo. Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278-291, 2019.
Julien Leroy, Michel Rigo, and Manon Stipulanti. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017.
David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978.
Monroe H. Martin. A problem in arrangements. Bull. Amer. Math. Soc., 40(12):859-864, December 1934.
Alexandru Mateescu, Arto Salomaa, and Sheng Yu. Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1-21, 2004.
Giovanni Pighizzini. How hard is computing the edit distance? Inf. Comput., 165(1):1-13, 2001.
Jean-Eric Pin. The consequences of Imre Simon’s work in the theory of automata, languages, and semigroups. In Proc. LATIN 2004, volume 2976 of Lecture Notes in Computer Science, page 5, 2004.
Jean-Eric Pin. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semigroup Forum, 98:1-8, 2019.
Narad Rampersad, Jeffrey Shallit, and Zhi Xu. The computational complexity of universality problems for prefixes, suffixes, factors, and subwords of regular languages. Fundam. Inf., 116(1-4):223-236, January 2012.
Michel Rigo and Pavel Salimov. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47-57, 2015.
Jacques Sakarovitch and Imre Simon. Subwords. In M. Lothaire, editor, Combinatorics on Words, chapter 6, pages 105-142. Cambridge University Press, 1997.
Arto Salomaa. Connections between subwords and certain matrix mappings. Theoret. Comput. Sci., 340(2):188-203, 2005.
David Sankoff and Joseph Kruskal. Time Warps, String Edits, and Macromolecules The Theory and Practice of Sequence Comparison. Cambridge University Press, 2000 (reprinted). originally published in 1983.
Baruch Schieber. Computing a minimum-weight k-link path in graphs with the concave monge property. In Proc. SODA 1995, pages 405-411. ACM/SIAM, 1995.
Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116-120, 2012. URL: https://doi.org/10.1016/j.tcs.2011.10.017.
Imre Simon. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972.
Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214-222, 1975.
Imre Simon. Words distinguished by their subwords (extended abstract). In Proc. WORDS 2003, volume 27 of TUCS General Publication, pages 6-13, 2003.
Zdenek Tronícek. Common subsequence automaton. In Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science, pages 270-275, 2002.
Robert A. Wagner. Order-n correction for regular languages. Commun. ACM, 17(5):265-268, 1974.
Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, January 1974.
Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1-123:14, 2016.

The Edit Distance to k-Subsequence Universality

Authors Joel D. Day , Pamela Fleischmann , Maria Kosche , Tore Koß , Florin Manea , Stefan Siemer

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

The Edit Distance to k-Subsequence Universality

Authors Joel D. Day , Pamela Fleischmann , Maria Kosche , Tore Koß , Florin Manea , Stefan Siemer

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message