In this work, we address the problem of approximate pattern matching with wildcards. Given a pattern P of length m containing D wildcards, a text T of length n, and an integer k, our objective is to identify all fragments of T within Hamming distance k from P. Our primary contribution is an algorithm with runtime 𝒪(n + (D+k)(G+k)⋅ n/m) for this problem. Here, G ≤ D represents the number of maximal wildcard fragments in P. We derive this algorithm by elaborating in a non-trivial way on the ideas presented by [Charalampopoulos, Kociumaka, and Wellnitz, FOCS'20] for pattern matching with mismatches (without wildcards). Our algorithm improves over the state of the art when D, G, and k are small relative to n. For instance, if m = n/2, k = G = n^{2/5}, and D = n^{3/5}, our algorithm operates in 𝒪(n) time, surpassing the Ω(n^{6/5}) time requirement of all previously known algorithms. In the case of exact pattern matching with wildcards (k = 0), we present a much simpler algorithm with runtime 𝒪(n + DG ⋅ n/m) that clearly illustrates our main technical innovation: the utilisation of positions of P that do not belong to any fragment of P with a density of wildcards much larger than D/m as anchors for the sought (approximate) occurrences. Notably, our algorithm outperforms the best-known 𝒪(n log m)-time FFT-based algorithms of [Cole and Hariharan, STOC'02] and [Clifford and Clifford, IPL'04] if DG = o(m log m). We complement our algorithmic results with a structural characterization of the k-mismatch occurrences of P. We demonstrate that in a text of length 𝒪(m), these occurrences can be partitioned into 𝒪((D+k)(G+k)) arithmetic progressions. Additionally, we construct an infinite family of examples with Ω((D+k)k) arithmetic progressions of occurrences, leveraging a combinatorial result on progression-free sets [Elkin, SODA'10].
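To make the problem statement concrete, the following is a minimal brute-force sketch of k-mismatch pattern matching with wildcards, together with a helper that computes the parameters D and G of a pattern. It is a naive 𝒪(nm) reference check and not the algorithm of the paper. The wildcard symbol `?` and the function names `wildcard_stats` and `k_mismatch_occurrences` are assumptions made for illustration only.

```python
WILDCARD = "?"  # assumed wildcard symbol; the abstract does not fix one


def wildcard_stats(pattern):
    """Return (D, G): the number of wildcards in the pattern and the
    number of maximal runs of consecutive wildcards."""
    d = pattern.count(WILDCARD)
    # A maximal run starts at each wildcard not preceded by another wildcard.
    g = sum(
        1
        for i, c in enumerate(pattern)
        if c == WILDCARD and (i == 0 or pattern[i - 1] != WILDCARD)
    )
    return d, g


def k_mismatch_occurrences(pattern, text, k):
    """Naive O(n*m) scan: return all start positions i such that
    text[i:i+m] is within Hamming distance k of the pattern, where
    wildcard positions of the pattern match any character."""
    m, n = len(pattern), len(text)
    result = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if pattern[j] != WILDCARD and pattern[j] != text[i + j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            result.append(i)
    return result
```

For example, the pattern `a??b?c` has D = 3 wildcards grouped into G = 2 maximal fragments; in the text `axybzcaqqbqd` it occurs exactly at position 0, and with k = 1 also at position 6.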
@InProceedings{bathie_et_al:LIPIcs.ESA.2024.20,
  author =	{Bathie, Gabriel and Charalampopoulos, Panagiotis and Starikovskaya, Tatiana},
  title =	{{Pattern Matching with Mismatches and Wildcards}},
  booktitle =	{32nd Annual European Symposium on Algorithms (ESA 2024)},
  pages =	{20:1--20:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-338-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{308},
  editor =	{Chan, Timothy and Fischer, Johannes and Iacono, John and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.20},
  URN =		{urn:nbn:de:0030-drops-210910},
  doi =		{10.4230/LIPIcs.ESA.2024.20},
  annote =	{Keywords: pattern matching, wildcards, mismatches, Hamming distance}
}