Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares

Authors: Gabriel Bathie, Tomasz Kociumaka, Tatiana Starikovskaya



Author Details

Gabriel Bathie
  • DIENS, École normale supérieure de Paris, PSL Research University, France
  • LaBRI, Université de Bordeaux, France
Tomasz Kociumaka
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Tatiana Starikovskaya
  • DIENS, École normale supérieure de Paris, PSL Research University, France

Cite As

Gabriel Bathie, Tomasz Kociumaka, and Tatiana Starikovskaya. Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ISAAC.2023.10

Abstract

We study the online variant of the language distance problem for two classical formal languages, the language of palindromes and the language of squares, and for the two most fundamental distances, the Hamming distance and the edit (Levenshtein) distance. In this problem, defined for a fixed formal language L, we are given a string T of length n, and the task is to compute the minimal distance to L from every prefix of T. We focus on the low-distance regime, where one must compute only the distances smaller than a given threshold k. Our contribution is twofold:

  1. We show streaming algorithms, which access the input string T only through a single left-to-right scan. Both for palindromes and squares, our algorithms use O(k polylog n) space and time per character in the Hamming-distance case and O(k² polylog n) space and time per character in the edit-distance case. These algorithms are randomised by necessity, and they err with probability inverse-polynomial in n.
  2. We show deterministic read-only online algorithms, which are also provided with read-only random access to the already processed characters of T. Both for palindromes and squares, our algorithms use O(k polylog n) space and time per character in the Hamming-distance case and O(k⁴ polylog n) space and amortised time per character in the edit-distance case.
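To make the problem statement concrete, the following is a minimal brute-force sketch of the Hamming-distance variant. It is not the paper's algorithm: it stores the whole string and spends time linear in the prefix length per character, whereas the results above use space and time polynomial in k and polylogarithmic in n. The function names are hypothetical, reporting `None` for distances above the threshold is an illustrative convention, and treating "within the threshold" as "at most k" is an assumption.

```python
from typing import Callable, List, Optional


def hamming_to_palindrome(s: str) -> int:
    """Minimum number of substitutions turning s into a palindrome.

    Each mismatched mirrored pair (s[i], s[-1 - i]) costs exactly one substitution.
    """
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != s[n - 1 - i])


def hamming_to_square(s: str) -> Optional[int]:
    """Minimum number of substitutions turning s into a square (a string of the form UU).

    Substitutions preserve length, so an odd-length string has no square
    at finite Hamming distance; None is returned in that case.
    """
    n = len(s)
    if n % 2 == 1:
        return None
    half = n // 2
    return sum(1 for i in range(half) if s[i] != s[i + half])


def online_distances(
    text: str, k: int, dist: Callable[[str], Optional[int]]
) -> List[Optional[int]]:
    """Report, for every prefix of text, its distance to the language if it is
    at most k (assumed threshold convention), and None otherwise."""
    out: List[Optional[int]] = []
    for end in range(1, len(text) + 1):
        d = dist(text[:end])  # naive: recomputed from scratch for each prefix
        out.append(d if d is not None and d <= k else None)
    return out
```

For example, with `text = "abab"` and `k = 1`, `online_distances(text, 1, hamming_to_palindrome)` gives `[0, 1, 0, None]` and `online_distances(text, 1, hamming_to_square)` gives `[None, 1, None, 0]`. The edit-distance variant is defined analogously, but it requires a dynamic program rather than a pairwise comparison.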

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Pattern matching
Keywords
  • Approximate pattern matching
  • streaming algorithms
  • palindromes
  • squares
