Charalampopoulos, Panagiotis ;
Kociumaka, Tomasz ;
Radoszewski, Jakub ;
Pissis, Solon P. ;
Rytter, Wojciech ;
Waleń, Tomasz ;
Zuba, Wiktor
Approximate Circular Pattern Matching
Abstract
We investigate the complexity of approximate circular pattern matching (CPM, in short) under the Hamming and edit distance. Under each of these two basic metrics, we are given a lengthn text T, a lengthm pattern P, and a positive integer threshold k, and we are to report all starting positions (called occurrences) of fragments of T that are at distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if there is any such occurrence. All previous results for approximate CPM were either averagecase upper bounds or heuristics, with the exception of the work of Charalampopoulos et al. [CKP^+, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP^+, JCSS'21] from 𝒪(n+(n/m) ⋅ k⁴) to 𝒪(n+(n/m) ⋅ k³ log log k) time; for the edit distance, we give an 𝒪(nk²)time algorithm. Notably, for the decision versions and wide parameterranges, we give algorithms whose complexities are almost identical to the stateoftheart for standard (i.e., noncircular) approximate pattern matching:
 For the decision version of the approximate CPM problem under the Hamming distance, we obtain an 𝒪(n+(n/m) ⋅ k² log k / log log k)time algorithm, which works in 𝒪(n) time whenever k = 𝒪(√{m log log m / log m}). In comparison, the fastest algorithm for the standard counterpart of the problem, by Chan et al. [CGKKP, STOC’20], runs in 𝒪(n) time only for k = 𝒪(√m). We achieve this result via a reduction to a geometric problem by building on ideas from [CKP^+, JCSS'21] and Charalampopoulos et al. [CKW, FOCS'20].
 For the decision version of the approximate CPM problem under the edit distance, the 𝒪(nklog³ k) runtime of our algorithm near matches the 𝒪(nk) runtime of the LandauVishkin algorithm [LV, J. Algorithms'89] for approximate pattern matching under edit distance; the latter algorithm remains the fastest known for k = Ω(m^{2/5}). As a stepping stone, we propose an 𝒪(nklog³ k)time algorithm for solving the Longest Prefix k'Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all k' ∈ {1,…,k}. Our algorithm is based on Tiskin’s theory of seaweeds [Tiskin, Math. Comput. Sci.'08], with recent advancements (see Charalampopoulos et al. [CKW, FOCS'22]), and on exploiting the seaweeds' relation to Monge matrices.
In contrast, we obtain a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its noncircular counterpart. We also show that a strongly subquadratictime algorithm for the decision version of approximate CPM under edit distance would refute the Strong Exponential Time Hypothesis.
BibTeX  Entry
@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2022.35,
author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Radoszewski, Jakub and Pissis, Solon P. and Rytter, Wojciech and Wale\'{n}, Tomasz and Zuba, Wiktor},
title = {{Approximate Circular Pattern Matching}},
booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)},
pages = {35:135:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772471},
ISSN = {18688969},
year = {2022},
volume = {244},
editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/16973},
URN = {urn:nbn:de:0030drops169738},
doi = {10.4230/LIPIcs.ESA.2022.35},
annote = {Keywords: approximate circular pattern matching, Hamming distance, edit distance}
}
01.09.2022
Keywords: 

approximate circular pattern matching, Hamming distance, edit distance 
Seminar: 

30th Annual European Symposium on Algorithms (ESA 2022)

Issue date: 

2022 
Date of publication: 

01.09.2022 