Simon’s Congruence Pattern Matching

Authors: Sungmin Kim, Sang-Ki Ko, Yo-Sub Han



Author Details

Sungmin Kim
  • Department of Computer Science, Yonsei University, Seoul, Republic of Korea
Sang-Ki Ko
  • Department of Computer Science & Engineering, Kangwon National University, Chuncheon-si, Republic of Korea
Yo-Sub Han
  • Department of Computer Science, Yonsei University, Seoul, Republic of Korea

Cite As

Sungmin Kim, Sang-Ki Ko, and Yo-Sub Han. Simon’s Congruence Pattern Matching. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 60:1-60:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ISAAC.2022.60

Abstract

Testing Simon’s congruence asks whether two strings have the same set of subsequences of length at most a given integer. In light of the recent discovery of an optimal linear-time algorithm for testing Simon’s congruence, we solve the Simon’s congruence pattern matching problem, which asks for all substrings of a text that are congruent to a pattern under Simon’s congruence. Our algorithm solves the problem in time linear in the length of the text by reusing the results of previous computations with the help of new data structures called X-trees and Y-trees. Moreover, we define and solve variants of the Simon’s congruence pattern matching problem: finding the longest and the shortest substrings of the text, as well as the shortest subsequence of the text, that are congruent to the pattern under Simon’s congruence. Two further variants, finding the longest congruent subsequence of the text and optimizing the pattern matching problem, are left as open problems.
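The following minimal Python sketch illustrates the definitions above. It tests Simon’s congruence directly from its definition, by comparing the sets of subsequences of length at most k. It then uses that test as a brute-force baseline for the pattern matching problem and for its shortest/longest-substring variants. This is not the authors’ linear-time algorithm. It uses no X-trees or Y-trees, it checks every substring of the text, and its cost can grow exponentially in k. The function names are hypothetical and chosen only for illustration, and the code assumes k ≥ 1.

```python
from typing import List, Optional, Set, Tuple


def subsequences_up_to(w: str, k: int) -> Set[str]:
    """Return every (scattered) subsequence of w whose length is at most k."""
    subs = {""}
    for c in w:
        # A subsequence of w[:i+1] either avoids w[i] or extends a shorter
        # subsequence of w[:i] by the letter w[i].
        subs |= {s + c for s in subs if len(s) < k}
    return subs


def simon_congruent(u: str, v: str, k: int) -> bool:
    """u ~_k v iff u and v have the same subsequences of length <= k."""
    return subsequences_up_to(u, k) == subsequences_up_to(v, k)


def naive_simon_matches(text: str, pattern: str, k: int) -> List[Tuple[int, int]]:
    """Brute force: all half-open spans (i, j) with text[i:j] ~_k pattern."""
    target = subsequences_up_to(pattern, k)
    n = len(text)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)  # non-empty substrings only
        if subsequences_up_to(text[i:j], k) == target
    ]


def shortest_and_longest_match(
    text: str, pattern: str, k: int
) -> Optional[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Return (shortest, longest) congruent substring spans, or None."""
    matches = naive_simon_matches(text, pattern, k)
    if not matches:
        return None
    # min/max return the first (leftmost) span among spans of equal length.
    shortest = min(matches, key=lambda span: span[1] - span[0])
    longest = max(matches, key=lambda span: span[1] - span[0])
    return shortest, longest
```

For k = 1, two strings are congruent exactly when they use the same set of letters. So `naive_simon_matches("abcab", "ba", 1)` returns `[(0, 2), (3, 5)]`, the two occurrences of `ab`. Likewise, `simon_congruent("aa", "aaa", 2)` holds because both strings have exactly the subsequences "", "a", and "aa" of length at most 2.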

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • pattern matching
  • Simon’s congruence
  • string algorithm
  • data structure
