Efficiently Testing Simon’s Congruence

Gawrychowski, Paweł; Kosche, Maria; Koß, Tore; Manea, Florin; Siemer, Stefan

doi:10.4230/LIPIcs.STACS.2021.34

Abstract

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection with piecewise testable languages, but has also found many applications in, e.g., learning theory, database theory, or linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long-standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well.
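
The definition of ∼_k translates directly into a brute-force check. The Python sketch below is not the linear-time algorithm described in this abstract: it enumerates all subsequences, so it takes exponential time and is only meant for tiny words, for example to sanity-check a faster implementation. The function names are hypothetical. It relies on two easy facts: identical words are ∼_k-congruent for every k, and two distinct words s ≠ t are never ∼_m-congruent for m = max(|s|,|t|), because a longest of the two words is a subsequence of itself of length m but not of the other word. Hence the loop always terminates.

```python
import math
from itertools import combinations


def subsequences_of_length(w: str, n: int) -> set[str]:
    """All distinct subsequences of w with exactly n letters (exponential; tiny inputs only)."""
    return {"".join(w[i] for i in idx) for idx in combinations(range(len(w)), n)}


def largest_simon_k(s: str, t: str) -> float:
    """Largest k with s ~_k t, computed directly from the definition.

    Returns math.inf when s == t, since identical words are ~_k-congruent for every k.
    """
    if s == t:
        return math.inf
    # s ~_0 t always holds: the empty word is the only subsequence of length 0.
    k = 0
    # Agreeing on all subsequences of length at most k+1 is the same as agreeing on
    # every exact length 0..k+1, so each step only needs to test one new length.
    # For s != t the sets must differ by length max(|s|, |t|), so the loop terminates.
    while subsequences_of_length(s, k + 1) == subsequences_of_length(t, k + 1):
        k += 1
    return k


print(largest_simon_k("abab", "baba"))  # 2
print(largest_simon_k("ab", "ba"))      # 1
print(largest_simon_k("abc", "abc"))    # inf
```

For instance, abab and baba share all subsequences of length at most 2 but differ at length 3 (aab occurs in abab but not in baba), so the sketch returns 2 for this pair.
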
To achieve these results, we introduce a novel data structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s∼_k t.
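
To make "the equivalence classes induced by ∼_k on the set of suffixes of a word" concrete, the following sketch groups the non-empty suffixes of a word by their sets of subsequences of length at most k. It is again brute force with hypothetical function names, and it is not the paper's Simon-Tree construction. Since s ∼_{k+1} t implies s ∼_k t, the partition for k+1 always refines the partition for k; this nesting across all k is one reason a tree-shaped representation of all the classes is possible, and the refines helper checks it on an example.

```python
from itertools import combinations


def subsequences_up_to(w: str, k: int) -> frozenset[str]:
    """All distinct subsequences of w with at most k letters (exponential; tiny inputs only)."""
    return frozenset(
        "".join(w[i] for i in idx)
        for n in range(min(k, len(w)) + 1)
        for idx in combinations(range(len(w)), n)
    )


def suffix_classes(w: str, k: int) -> list[list[int]]:
    """Group the non-empty suffixes w[i:], identified by their start index i, into ~_k-classes."""
    classes: dict[frozenset[str], list[int]] = {}
    for i in range(len(w)):
        classes.setdefault(subsequences_up_to(w[i:], k), []).append(i)
    return sorted(classes.values())


def refines(finer: list[list[int]], coarser: list[list[int]]) -> bool:
    """True if every class of `finer` lies entirely inside one class of `coarser`."""
    owner = {i: c for c, members in enumerate(coarser) for i in members}
    return all(len({owner[i] for i in members}) == 1 for members in finer)


w = "abab"
print(suffix_classes(w, 1))  # [[0, 1, 2], [3]]
print(suffix_classes(w, 2))  # [[0], [1], [2], [3]]
print(all(refines(suffix_classes(w, k + 1), suffix_classes(w, k)) for k in range(1, len(w))))  # True
```

For w = abab, the suffixes abab, bab and ab (start positions 0, 1, 2) all contain both letters and form one ∼_1-class, while b forms its own; at k = 2 all four suffixes are separated, e.g. aa is a subsequence of abab but not of bab.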
