,
Panagiotis Charalampopoulos
,
Tatiana Starikovskaya
Creative Commons Attribution 4.0 International license
In this work, we address the problem of approximate pattern matching with wildcards. Given a pattern P of length m containing D wildcards, a text T of length n, and an integer k, our objective is to identify all fragments of T within Hamming distance k from P.
Our primary contribution is an algorithm with runtime 𝒪(n + (D+k)(G+k)⋅ n/m) for this problem. Here, G ≤ D represents the number of maximal wildcard fragments in P. We derive this algorithm by elaborating in a non-trivial way on the ideas presented by [Charalampopoulos, Kociumaka, and Wellnitz, FOCS'20] for pattern matching with mismatches (without wildcards). Our algorithm improves over the state of the art when D, G, and k are small relative to n. For instance, if m = n/2, k = G = n^{2/5}, and D = n^{3/5}, our algorithm operates in 𝒪(n) time, surpassing the Ω(n^{6/5}) time requirement of all previously known algorithms.
In the case of exact pattern matching with wildcards (k = 0), we present a much simpler algorithm with runtime 𝒪(n + DG ⋅ n/m) that clearly illustrates our main technical innovation: the utilisation of positions of P that do not belong to any fragment of P with a density of wildcards much larger than D/m as anchors for the sought (approximate) occurrences. Notably, our algorithm outperforms the best-known 𝒪(n log m)-time FFT-based algorithms of [Cole and Hariharan, STOC'02] and [Clifford and Clifford, IPL'04] if DG = o(m log m).
We complement our algorithmic results with a structural characterization of the k-mismatch occurrences of P. We demonstrate that in a text of length 𝒪(m), these occurrences can be partitioned into 𝒪((D+k)(G+k)) arithmetic progressions. Additionally, we construct an infinite family of examples with Ω((D+k)k) arithmetic progressions of occurrences, leveraging a combinatorial result on progression-free sets [Elkin, SODA'10].
@InProceedings{bathie_et_al:LIPIcs.ESA.2024.20,
author = {Bathie, Gabriel and Charalampopoulos, Panagiotis and Starikovskaya, Tatiana},
title = {{Pattern Matching with Mismatches and Wildcards}},
booktitle = {32nd Annual European Symposium on Algorithms (ESA 2024)},
pages = {20:1--20:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-338-6},
ISSN = {1868-8969},
year = {2024},
volume = {308},
editor = {Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.20},
URN = {urn:nbn:de:0030-drops-210910},
doi = {10.4230/LIPIcs.ESA.2024.20},
annote = {Keywords: pattern matching, wildcards, mismatches, Hamming distance}
}