Efficiently Testing Simon’s Congruence

Authors: Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Author Details

Paweł Gawrychowski
  • Faculty of Mathematics and Computer Science, University of Wrocław, Poland
Maria Kosche
  • Computer Science Department, Universität Göttingen, Germany
Tore Koß
  • Computer Science Department, Universität Göttingen, Germany
Florin Manea
  • Computer Science Department, Universität Göttingen, Germany
  • Campus-Institut Data Science, Göttingen, Germany
Stefan Siemer
  • Computer Science Department, Universität Göttingen, Germany

Cite As

Paweł Gawrychowski, Maria Kosche, Tore Koß, Florin Manea, and Stefan Siemer. Efficiently Testing Simon’s Congruence. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 34:1-34:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

Simon’s congruence ∼_k is a relation on words defined by Imre Simon in the 1970s and intensely studied since then. This congruence was initially used in connection with piecewise testable languages, but has also found many applications in, e.g., learning theory, database theory, and linguistics. The ∼_k-relation is defined as follows: two words are ∼_k-congruent if they have the same set of subsequences of length at most k. A long-standing open problem, stated already by Simon in his initial works on this topic, was to design an algorithm which computes, given two words s and t, the largest k for which s ∼_k t. We propose the first algorithm solving this problem in linear time O(|s|+|t|) when the input words are over the integer alphabet {1,…,|s|+|t|} (or other alphabets which can be sorted in linear time). Our approach can be extended to an optimal algorithm in the case of general alphabets as well. To achieve these results, we introduce a novel data structure, called Simon-Tree, which allows us to construct a natural representation of the equivalence classes induced by ∼_k on the set of suffixes of a word, for all k ≥ 1. We show that such a tree can be constructed for an input word in linear time. Then, when working with two words s and t, we compute their respective Simon-Trees and efficiently build a correspondence between the nodes of these trees. This correspondence, which can also be constructed in linear time O(|s|+|t|), allows us to retrieve the largest k for which s ∼_k t.
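
To make the definition of ∼_k concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time Simon-Tree algorithm described above: it simply enumerates all subsequences, so its running time is exponential and it is only suitable for very short words. The function names (`subsequences_up_to`, `simon_congruent`, `largest_k`) are hypothetical and chosen for illustration; returning `None` for equal words (which are congruent for every k) is likewise an assumption of this sketch.

```python
from itertools import combinations


def subsequences_up_to(word, k):
    """Return the set of distinct subsequences of `word` of length at most k.

    The empty word is included (it is the only subsequence of length 0).
    """
    result = set()
    for length in range(min(k, len(word)) + 1):
        # Every increasing tuple of positions selects one subsequence.
        for positions in combinations(range(len(word)), length):
            result.add("".join(word[i] for i in positions))
    return result


def simon_congruent(s, t, k):
    """Check s ~_k t directly from the definition."""
    return subsequences_up_to(s, k) == subsequences_up_to(t, k)


def largest_k(s, t):
    """Return the largest k with s ~_k t, or None if s == t.

    For s != t the loop terminates: at k = max(len(s), len(t)) the two
    subsequence sets must differ, because the longer word (or, for equal
    lengths, s itself) is a subsequence of only one of the two words.
    """
    if s == t:
        return None  # congruent for every k (convention of this sketch)
    k = 0
    while simon_congruent(s, t, k + 1):
        k += 1
    return k


if __name__ == "__main__":
    print(largest_k("abab", "baba"))  # 2: e.g. "abb" occurs only in "abab"
    print(largest_k("ab", "ba"))      # 1: same letters, different order
    print(largest_k("a", "b"))        # 0: only the empty word is shared
```

For example, `abab` and `baba` share all subsequences of length at most 2, but `abb` is a subsequence of `abab` only, so the largest k is 2.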

Subject Classification

ACM Subject Classification
  • Theory of computation → Formal languages and automata theory
  • Theory of computation → Design and analysis of algorithms

Keywords
  • Simon’s congruence
  • Subsequence
  • Scattered factor
  • Efficient algorithms



