The Edit Distance to k-Subsequence Universality

Authors Joel D. Day , Pamela Fleischmann , Maria Kosche , Tore Koß , Florin Manea , Stefan Siemer



PDF
Thumbnail PDF

File

LIPIcs.STACS.2021.25.pdf
  • Filesize: 0.76 MB
  • 19 pages

Document Identifiers

Author Details

Joel D. Day
  • Loughborough University, UK
Pamela Fleischmann
  • Computer Science Department, Universität Kiel, Germany
Maria Kosche
  • Computer Science Department, Universität Göttingen, Germany
Tore Koß
  • Computer Science Department, Universität Göttingen, Germany
Florin Manea
  • Computer Science Department, Universität Göttingen, Germany
  • Campus-Institut Data Science, Göttingen, Germany
Stefan Siemer
  • Computer Science Department, Universität Göttingen, Germany

Cite As Get BibTex

Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The Edit Distance to k-Subsequence Universality. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 25:1-25:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.STACS.2021.25

Abstract

A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼_k (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼_k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼_k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼_k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼_k-equivalent to u.
In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.

Subject Classification

ACM Subject Classification
  • Theory of computation → Formal languages and automata theory
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Subsequence
  • Scattered factor
  • Subword
  • Universality
  • k-subsequence universality
  • Edit distance
  • Efficient algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alok Aggarwal, Maria M. Klawe, Shlomo Moran, Peter W. Shor, and Robert E. Wilber. Geometric applications of a matrix-searching algorithm. Algorithmica, 2:195-208, 1987. Google Scholar
  2. Alok Aggarwal, Baruch Schieber, and Takeshi Tokuyama. Finding a minimum-weight k-link path graphs with the concae monge property and applications. Discret. Comput. Geom., 12:263-280, 1994. Google Scholar
  3. Cyril Allauzen and Mehryar Mohri. Linear-space computation of the edit-distance between a string and a finite automaton. CoRR, abs/0904.4686, 2009. URL: http://arxiv.org/abs/0904.4686.
  4. Lorraine A. K. Ayad, Carl Barton, and Solon P. Pissis. A faster and more accurate heuristic for cyclic edit distance computation. Pattern Recognit. Lett., 88:81-87, 2017. Google Scholar
  5. Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363-376, 1991. Google Scholar
  6. Laura Barker, Pamela Fleischmann, Katharina Harwardt, Florin Manea, and Dirk Nowotka. Scattered factor-universality of words. In Natasa Jonoska and Dmytro Savchuk, editors, Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 14-28, 2020. Google Scholar
  7. Wolfgang W. Bein, Lawrence L. Larmore, and James K. Park. The d-edge shortest-path problem for a monge graph. Technical Report, SAND-92-1724C:1-10, 1992. Google Scholar
  8. Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proc. LATIN 2000, volume 1776 of Lecture Notes in Computer Science, pages 88-94, 2000. Google Scholar
  9. Giulia Bernardini, Huiping Chen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Leen Stougie, and Michelle Sweering. String sanitization under edit distance. In Inge Li Gørtz and Oren Weimann, editors, Proc. CPM 2020, volume 161 of LIPIcs, pages 7:1-7:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. Google Scholar
  10. Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1-40:16, 2018. Google Scholar
  11. Karl Bringmann, Fabrizio Grandoni, Barna Saha, and Virginia Vassilevska Williams. Truly sub-cubic algorithms for language edit distance and RNA-folding via fast bounded-difference min-plus product. In Proc. FOCS 2016, pages 375-384, 2016. Google Scholar
  12. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216-1235, 2018. Google Scholar
  13. Krishnendu Chatterjee, Thomas A. Henzinger, Rasmus Ibsen-Jensen, and Jan Otop. Edit distance for pushdown automata. In Proc. ICALP 2015, volume 9135 of Lecture Notes in Computer Science, pages 121-133. Springer, 2015. Google Scholar
  14. Herman Z. Q. Chen, Sergey Kitaev, Torsten Mütze, and Brian Y. Sun. On universal partial words. Electronic Notes in Discrete Mathematics, 61:231-237, 2017. Google Scholar
  15. Hyunjoon Cheon and Yo-Sub Han. Computing the shortest string and the edit-distance for parsing expression languages. In Proc. DLT 2020, volume 12086 of Lecture Notes in Computer Science, pages 43-54, 2020. Google Scholar
  16. Hyunjoon Cheon, Yo-Sub Han, Sang-Ki Ko, and Kai Salomaa. The relative edit-distance between two input-driven languages. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 127-139, 2019. Google Scholar
  17. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on strings. Cambridge University Press, 2007. Google Scholar
  18. Maxime Crochemore, Borivoj Melichar, and Zdenek Tronícek. Directed acyclic subsequence graph - overview. J. Discrete Algorithms, 1(3-4):255-280, 2003. Google Scholar
  19. Joel D. Day, Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. The edit distance to k-subsequence universality. CoRR, abs/2007.09192, 2020. URL: http://arxiv.org/abs/2007.09192.
  20. Joel D. Day, Pamela Fleischmann, Florin Manea, and Dirk Nowotka. k-spectra of weakly-c-balanced words. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 265-277, 2019. Google Scholar
  21. Nicolaas G. de Bruijn. A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen, 49:758-764, 1946. Google Scholar
  22. Aldo de Luca, Amy Glen, and Luca Q. Zamboni. Rich, sturmian, and trapezoidal words. Theor. Comput. Sci., 407(1-3):569-573, 2008. Google Scholar
  23. Xavier Droubay, Jacques Justin, and Giuseppe Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theor. Comput. Sci., 255(1-2):539-553, 2001. Google Scholar
  24. Lukas Fleischer and Manfred Kufleitner. Testing Simon’s congruence. In Proc. MFCS 2018, volume 117 of LIPIcs, pages 62:1-62:13, 2018. Google Scholar
  25. Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, and Wojciech Rytter. Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239-248, 2015. available in CoRR abs/1509.00622. Google Scholar
  26. Harold N. Gabow and Robert Endre Tarjan. A linear-time algorithm for a special case of disjoint set union. In Proc. 15th STOC, pages 246-251, 1983. Google Scholar
  27. Emmanuelle Garel. Minimal separators of two words. In Proc. CPM 1993, volume 684 of Lecture Notes in Computer Science, pages 35-53, 1993. Google Scholar
  28. Pawel Gawrychowski, Maria Kosche, Tore Koss, Florin Manea, and Stefan Siemer. Efficiently testing Simon’s congruence. CoRR, abs/2005.01112, 2020. URL: http://arxiv.org/abs/2005.01112.
  29. Pawel Gawrychowski, Martin Lange, Narad Rampersad, Jeffrey O. Shallit, and Marek Szykula. Existential length universality. In Proc. STACS 2020, volume 154 of LIPIcs, pages 16:1-16:14, 2020. Google Scholar
  30. Bennet Goeckner, Corbin Groothuis, Cyrus Hettle, Brian Kell, Pamela Kirkpatrick, Rachel Kirsch, and Ryan W. Solava. Universal partial words over non-binary alphabets. Theor. Comput. Sci, 713:56-65, 2018. Google Scholar
  31. Simon Halfon, Philippe Schnoebelen, and Georg Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1-12, 2017. Google Scholar
  32. R. W. Hamming. Error detecting and error correcting codes. Bell Syst. Tech. J., 29(2):147-160, 1950. Google Scholar
  33. Yo-Sub Han, Sang-Ki Ko, and Kai Salomaa. The edit-distance between a regular language and a context-free language. Int. J. Found. Comput. Sci., 24(7):1067-1082, 2013. Google Scholar
  34. Jean-Jacques Hebrard. An algorithm for distinguishing efficiently bit-strings by their subsequences. Theor. Comput. Sci., 82(1):35-49, 22 May 1991. Google Scholar
  35. Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341-343, 1975. Google Scholar
  36. Markus Holzer and Martin Kutrib. Descriptional and computational complexity of finite automata - A survey. Inf. Comput., 209(3):456-470, 2011. Google Scholar
  37. Hiroshi Imai and Takao Asano. Dynamic segment intersection search with applications. In Proc. FOCS 1984, pages 393-402, 1984. Google Scholar
  38. Rajesh Jayaram and Barna Saha. Approximating language edit distance beyond fast matrix multiplication: Ultralinear grammars are where parsing becomes hard! In Proc. ICALP 2017, volume 80 of LIPIcs, pages 19:1-19:15, 2017. Google Scholar
  39. Prateek Karandikar, Manfred Kufleitner, and Philippe Schnoebelen. On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515-519, 2015. Google Scholar
  40. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1-37:22, 2016. Google Scholar
  41. Prateek Karandikar and Philippe Schnoebelen. The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019. Google Scholar
  42. Markus Krötzsch, Tomás Masopust, and Michaël Thomazo. Complexity of universality and related problems for partially ordered NFAs. Inf. Comput., 255:177-192, 2017. Google Scholar
  43. Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289-302, 2020. Google Scholar
  44. Dietrich Kuske and Georg Zetzsche. Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348-364, 2019. Google Scholar
  45. Marie Lejeune, Julien Leroy, and Michel Rigo. Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278-291, 2019. Google Scholar
  46. Julien Leroy, Michel Rigo, and Manon Stipulanti. Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017. Google Scholar
  47. David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322-336, April 1978. Google Scholar
  48. Monroe H. Martin. A problem in arrangements. Bull. Amer. Math. Soc., 40(12):859-864, December 1934. Google Scholar
  49. Alexandru Mateescu, Arto Salomaa, and Sheng Yu. Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1-21, 2004. Google Scholar
  50. Giovanni Pighizzini. How hard is computing the edit distance? Inf. Comput., 165(1):1-13, 2001. Google Scholar
  51. Jean-Eric Pin. The consequences of Imre Simon’s work in the theory of automata, languages, and semigroups. In Proc. LATIN 2004, volume 2976 of Lecture Notes in Computer Science, page 5, 2004. Google Scholar
  52. Jean-Eric Pin. The influence of Imre Simon’s work in the theory of automata, languages and semigroups. Semigroup Forum, 98:1-8, 2019. Google Scholar
  53. Narad Rampersad, Jeffrey Shallit, and Zhi Xu. The computational complexity of universality problems for prefixes, suffixes, factors, and subwords of regular languages. Fundam. Inf., 116(1-4):223-236, January 2012. Google Scholar
  54. Michel Rigo and Pavel Salimov. Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47-57, 2015. Google Scholar
  55. Jacques Sakarovitch and Imre Simon. Subwords. In M. Lothaire, editor, Combinatorics on Words, chapter 6, pages 105-142. Cambridge University Press, 1997. Google Scholar
  56. Arto Salomaa. Connections between subwords and certain matrix mappings. Theoret. Comput. Sci., 340(2):188-203, 2005. Google Scholar
  57. David Sankoff and Joseph Kruskal. Time Warps, String Edits, and Macromolecules The Theory and Practice of Sequence Comparison. Cambridge University Press, 2000 (reprinted). originally published in 1983. Google Scholar
  58. Baruch Schieber. Computing a minimum-weight k-link path in graphs with the concave monge property. In Proc. SODA 1995, pages 405-411. ACM/SIAM, 1995. Google Scholar
  59. Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116-120, 2012. URL: https://doi.org/10.1016/j.tcs.2011.10.017.
  60. Imre Simon. Hierarchies of events with dot-depth one - Ph.D. thesis. University of Waterloo, 1972. Google Scholar
  61. Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214-222, 1975. Google Scholar
  62. Imre Simon. Words distinguished by their subwords (extended abstract). In Proc. WORDS 2003, volume 27 of TUCS General Publication, pages 6-13, 2003. Google Scholar
  63. Zdenek Tronícek. Common subsequence automaton. In Proc. CIAA 2002 (Revised Papers), volume 2608 of Lecture Notes in Computer Science, pages 270-275, 2002. Google Scholar
  64. Robert A. Wagner. Order-n correction for regular languages. Commun. ACM, 17(5):265-268, 1974. Google Scholar
  65. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, January 1974. Google Scholar
  66. Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1-123:14, 2016. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail