eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2024-03-11
24:1
24:22
10.4230/LIPIcs.STACS.2024.24
article
Approximate Circular Pattern Matching Under Edit Distance
Charalampopoulos, Panagiotis
1
https://orcid.org/0000-0002-6024-1557
Pissis, Solon P.
2
3
https://orcid.org/0000-0002-1445-1932
Radoszewski, Jakub
4
https://orcid.org/0000-0002-0067-6401
Rytter, Wojciech
4
https://orcid.org/0000-0002-9162-6724
Waleń, Tomasz
4
https://orcid.org/0000-0002-7369-3309
Zuba, Wiktor
2
https://orcid.org/0000-0002-1988-3507
Birkbeck, University of London, UK
CWI, Amsterdam, The Netherlands
Vrije Universiteit, Amsterdam, The Netherlands
University of Warsaw, Poland
In the k-Edit Circular Pattern Matching (k-Edit CPM) problem, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions of the substrings of T that are at edit distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented 𝒪(nk²)-time and 𝒪(nk log³ k)-time solutions for the reporting and decision versions of k-Edit CPM, respectively. Here, we show that the reporting and decision versions of k-Edit CPM can be solved in 𝒪(n+(n/m) k⁶) time and 𝒪(n+(n/m) k⁵ log³ k) time, respectively, thus obtaining the first algorithms with a complexity of the type 𝒪(n+(n/m) poly(k)) for this problem. Notably, our algorithms run in 𝒪(n) time when m = Ω(k⁶) and are superior to the previous respective solutions when m = ω(k⁴). We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer.
We obtain our solutions by exploiting the structure of approximate circular occurrences of P in T when T is relatively short with respect to P. Roughly speaking, either the starting positions of approximate occurrences of rotations of P form 𝒪(k⁴) intervals that can be computed efficiently, or some rotation of P is almost periodic (that is, it is at a small edit distance from a string with a small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).
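To make the problem statement concrete, the following is a minimal brute-force sketch of the reporting version of k-Edit CPM. It is not the algorithm of the paper: it simply tries every rotation of P and runs the classic Sellers dynamic program on the reversed strings, so it takes roughly 𝒪(nm²) time and is useful only as a reference for small inputs. The function names, the 0-based positions, and the restriction k < m (which keeps every matched substring nonempty) are assumptions made for this illustration.

```python
def approx_end_positions(pattern, text, k):
    """Sellers' DP: return every exclusive end index j such that some
    substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        new = [0] * (m + 1)   # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            new[i] = min(
                col[i] + 1,                         # skip a text character
                new[i - 1] + 1,                     # skip a pattern character
                col[i - 1] + (pattern[i - 1] != c), # match or substitute
            )
        col = new
        if col[m] <= k:
            ends.append(j)
    return ends


def k_edit_cpm_naive(text, pattern, k):
    """Report all 0-based starting positions of substrings of text that are
    within edit distance k of some rotation of pattern (brute force)."""
    n, m = len(text), len(pattern)
    if not 0 < k < m:
        raise ValueError("this sketch assumes 0 < k < len(pattern)")
    rev_text = text[::-1]
    starts = set()
    for r in range(m):
        rotation = pattern[r:] + pattern[:r]
        # Edit distance is invariant under reversing both strings, so end
        # positions in the reversed text give start positions in the text.
        for end in approx_end_positions(rotation[::-1], rev_text, k):
            starts.add(n - end)
    return sorted(starts)
```

For example, `k_edit_cpm_naive("cdab", "abcd", 1)` reports positions 0 and 1: the rotation `cdab` occurs exactly at position 0, and the substring `dab` at position 1 is one deletion away from that same rotation. The decision version corresponds to checking whether the returned list is nonempty.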
https://drops.dagstuhl.de/storage/00lipics/lipics-vol289-stacs2024/LIPIcs.STACS.2024.24/LIPIcs.STACS.2024.24.pdf
circular pattern matching
approximate pattern matching
edit distance