Approximate Circular Pattern Matching Under Edit Distance

Authors: Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba





Author Details

Panagiotis Charalampopoulos
  • Birkbeck, University of London, UK
Solon P. Pissis
  • CWI, Amsterdam, The Netherlands
  • Vrije Universiteit, Amsterdam, The Netherlands
Jakub Radoszewski
  • University of Warsaw, Poland
Wojciech Rytter
  • University of Warsaw, Poland
Tomasz Waleń
  • University of Warsaw, Poland
Wiktor Zuba
  • CWI, Amsterdam, The Netherlands

Acknowledgements

We thank Tomasz Kociumaka for helpful discussions.

Cite As

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate Circular Pattern Matching Under Edit Distance. In 41st International Symposium on Theoretical Aspects of Computer Science (STACS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 289, pp. 24:1-24:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.STACS.2024.24

Abstract

In the k-Edit Circular Pattern Matching (k-Edit CPM) problem, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions of the substrings of T that are at edit distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented 𝒪(nk²)-time and 𝒪(nk log³ k)-time solutions for the reporting and decision versions of k-Edit CPM, respectively. Here, we show that the reporting and decision versions of k-Edit CPM can be solved in 𝒪(n+(n/m) k⁶) time and 𝒪(n+(n/m) k⁵ log³ k) time, respectively, thus obtaining the first algorithms with a complexity of the type 𝒪(n+(n/m) poly(k)) for this problem. Notably, our algorithms run in 𝒪(n) time when m = Ω(k⁶) and are superior to the previous respective solutions when m = ω(k⁴). We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer. We obtain our solutions by exploiting the structure of approximate circular occurrences of P in T, when T is relatively short w.r.t. P. Roughly speaking, either the starting positions of approximate occurrences of rotations of P form 𝒪(k⁴) intervals that can be computed efficiently, or some rotation of P is almost periodic (is at a small edit distance from a string with small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).
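
To make the problem statement concrete, here is a minimal brute-force sketch of the reporting version of k-Edit CPM. It is not the algorithm of the paper and comes nowhere near the stated running times; it only spells out the definition. The function names are hypothetical, positions are 0-based, and the sketch assumes k < m so that empty substrings never count as occurrences.

```python
from typing import List


def min_edit_distance_to_prefix(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any prefix of `text`."""
    # prev[j] holds the edit distance between pattern[:i] and text[:j].
    prev = list(range(len(text) + 1))
    for i, pc in enumerate(pattern, start=1):
        cur = [i]
        for j, tc in enumerate(text, start=1):
            cur.append(min(
                prev[j] + 1,               # delete pc from the pattern
                cur[j - 1] + 1,            # insert tc into the pattern
                prev[j - 1] + (pc != tc),  # match or substitute
            ))
        prev = cur
    return min(prev)


def k_edit_cpm_naive(text: str, pattern: str, k: int) -> List[int]:
    """Report every 0-based start i such that some substring of `text`
    starting at i is within edit distance k of some rotation of `pattern`.

    Assumption of this sketch: 0 <= k < len(pattern).
    """
    m = len(pattern)
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    positions = []
    for i in range(len(text)):
        # A substring longer than m + k is at distance more than k from
        # any length-m string, so a window of m + k characters suffices.
        window = text[i:i + m + k]
        if any(min_edit_distance_to_prefix(rot, window) <= k
               for rot in rotations):
            positions.append(i)
    return positions


if __name__ == "__main__":
    print(k_edit_cpm_naive("xxcabxx", "abc", 0))  # [2]
    print(k_edit_cpm_naive("xxcabxx", "abc", 1))  # [1, 2, 3]
```

The decision version corresponds to checking whether the returned list is non-empty. With k = 0 the only occurrence is the rotation "cab" at position 2; with k = 1, position 1 ("xca" versus "bca") and position 3 ("ab" versus "abc") are reported as well.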

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • circular pattern matching
  • approximate pattern matching
  • edit distance

References

  1. Andris Ambainis. Quantum query algorithms and lower bounds. In Classical and New Paradigms of Computation and their Complexity Hierarchies, pages 15-32, 2004. URL: https://doi.org/10.1007/978-1-4020-2776-5_2.
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. Journal of Algorithms, 50(2):257-275, 2004. URL: https://doi.org/10.1016/S0196-6774(03)00097-X.
  3. Lorraine A. K. Ayad, Carl Barton, and Solon P. Pissis. A faster and more accurate heuristic for cyclic edit distance computation. Pattern Recognition Letters, 88:81-87, 2017. URL: https://doi.org/10.1016/j.patrec.2017.01.018.
  4. Lorraine A. K. Ayad and Solon P. Pissis. MARS: Improving multiple circular sequence alignment using refined sequences. BMC Genomics, 18(1):86, 2017. URL: https://doi.org/10.1186/s12864-016-3477-5.
  5. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing, 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  6. Adriano Barenco, Charles H. Bennett, Richard Cleve, David P. DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A. Smolin, and Harald Weinfurter. Elementary gates for quantum computation. Physical Review A, 52:3457-3467, 1995. URL: https://doi.org/10.1103/PhysRevA.52.3457.
  7. Carl Barton, Costas S. Iliopoulos, and Solon P. Pissis. Fast algorithms for approximate circular string matching. Algorithms for Molecular Biology, 9:9, 2014. URL: https://doi.org/10.1186/1748-7188-9-9.
  8. Gabriel Bathie, Tomasz Kociumaka, and Tatiana Starikovskaya. Small-space algorithms for the online language distance problem for palindromes and squares. In 34th International Symposium on Algorithms and Computation, ISAAC 2023, volume 283 of LIPIcs, pages 10:1-10:17, 2023. URL: https://doi.org/10.4230/LIPICS.ISAAC.2023.10.
  9. Karl Bringmann, Philip Wellnitz, and Marvin Künnemann. Few matches or almost periodicity: Faster pattern matching with mismatches in compressed texts. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1126-1145. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.69.
  10. Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288(1):21-43, 2002. URL: https://doi.org/10.1016/S0304-3975(01)00144-X.
  11. Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern Hamming distances. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 643-656. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384266.
  12. Timothy M. Chan, Ce Jin, Virginia Vassilevska Williams, and Yinzhan Xu. Faster algorithms for text-to-pattern Hamming distances. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, pages 2188-2203. IEEE, 2023. URL: https://doi.org/10.1109/FOCS57990.2023.00136.
  13. Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Circular pattern matching with k mismatches. Journal of Computer and System Sciences, 115:73-85, 2021. URL: https://doi.org/10.1016/j.jcss.2020.07.003.
  14. Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate circular pattern matching. CoRR, abs/2208.08915, 2022. URL: https://doi.org/10.48550/ARXIV.2208.08915.
  15. Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, and Wiktor Zuba. Approximate circular pattern matching. In 30th Annual European Symposium on Algorithms, ESA 2022, volume 244 of LIPIcs, pages 35:1-35:19, 2022. URL: https://doi.org/10.4230/LIPIcs.ESA.2022.35.
  16. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, pages 978-989. IEEE, 2020. Full version: arXiv:2004.08350v2. URL: https://doi.org/10.1109/FOCS46700.2020.00095.
  17. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance: A reduction to dynamic puzzle matching and the seaweed monoid of permutation matrices. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pages 698-707. IEEE, 2022. Full version: arXiv:2204.03087v1. URL: https://doi.org/10.1109/FOCS54457.2022.00072.
  18. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 2039-2052. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch142.
  19. Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, and Przemysław Uznański. The dynamic k-mismatch problem. In 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, volume 223 of LIPIcs, pages 18:1-18:15, 2022. URL: https://doi.org/10.4230/LIPIcs.CPM.2022.18.
  20. Richard Cole and Ramesh Hariharan. Approximate string matching: A simpler faster algorithm. SIAM Journal on Computing, 31(6):1761-1782, 2002. URL: https://doi.org/10.1137/S0097539700370527.
  21. Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. Optimal dynamic strings. In 29th ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 1509-1528. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.99.
  22. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L_1 distance. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 62:1-62:13, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.62.
  23. Roberto Grossi, Costas S. Iliopoulos, Robert Mercas, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, and Fatima Vayani. Circular sequence comparison: algorithms and applications. Algorithms for Molecular Biology, 11:12, 2016. URL: https://doi.org/10.1186/s13015-016-0076-6.
  24. Yijie Han. Deterministic sorting in O(n log log n) time and linear space. Journal of Algorithms, 50(1):96-105, 2004. URL: https://doi.org/10.1016/j.jalgor.2003.09.001.
  25. Ramesh Hariharan and V. Vinay. String matching in Õ(√n + √m) quantum time. Journal of Discrete Algorithms, 1(1):103-110, 2003. URL: https://doi.org/10.1016/S1570-8667(03)00010-8.
  26. Tommi Hirvola and Jorma Tarhio. Approximate online matching of circular strings. In Experimental Algorithms - 13th International Symposium, SEA 2014, pages 315-325. Springer, 2014. URL: https://doi.org/10.1007/978-3-319-07959-2_27.
  27. Ce Jin and Jakob Nogler. Quantum speed-ups for string synchronizing sets, longest common substring, and k-mismatch matching. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, pages 5090-5121. SIAM, 2023. URL: https://doi.org/10.1137/1.9781611977554.ch186.
  28. Dominik Kempa and Tomasz Kociumaka. Dynamic suffix array with polylogarithmic queries and updates. In STOC 2022: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1657-1670. ACM, 2022. URL: https://doi.org/10.1145/3519935.3520061.
  29. Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323-350, 1977. URL: https://doi.org/10.1137/0206024.
  30. Tomasz Kociumaka, Ely Porat, and Tatiana Starikovskaya. Small-space and streaming pattern matching with k edits. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, pages 885-896. IEEE, 2021. URL: https://doi.org/10.1109/FOCS52979.2021.00090.
  31. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal pattern matching queries in a text and applications. In 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 532-551. SIAM, 2015. Full version: arXiv:1311.6235. URL: https://doi.org/10.1137/1.9781611973730.36.
  32. Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. Journal of Computer and System Sciences, 37(1):63-78, 1988. URL: https://doi.org/10.1016/0022-0000(88)90045-1.
  33. Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989. URL: https://doi.org/10.1016/0196-6774(89)90010-2.
  34. Maurice Maes. On a cyclic string-to-string correction problem. Information Processing Letters, 35(2):73-78, 1990. URL: https://doi.org/10.1016/0020-0190(90)90109-B.
  35. Vicente Palazón-González and Andrés Marzal. Speeding up the cyclic edit distance using LAESA with early abandon. Pattern Recognition Letters, 62:1-7, 2015. URL: https://doi.org/10.1016/j.patrec.2015.04.013.
  36. Vicente Palazón-González, Andrés Marzal, and Juan Miguel Vilar. On hidden Markov models and cyclic strings for shape recognition. Pattern Recognition, 47(7):2490-2504, 2014. URL: https://doi.org/10.1016/j.patcog.2014.01.018.
  37. Süleyman Cenk Sahinalp and Uzi Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm (extended abstract). In 37th Annual Symposium on Foundations of Computer Science, FOCS 1996, pages 320-328. IEEE Computer Society, 1996. URL: https://doi.org/10.1109/SFCS.1996.548491.
  38. Peter H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms, 1(4):359-373, 1980. URL: https://doi.org/10.1016/0196-6774(80)90016-4.
  39. Teresa Anna Steiner. Differentially private approximate pattern matching. In 15th Innovations in Theoretical Computer Science Conference, ITCS 2024, volume 287 of LIPIcs, pages 94:1-94:18, 2024. URL: https://doi.org/10.4230/LIPICS.ITCS.2024.94.