An elastic-degenerate (ED) string 𝐓 is a sequence 𝐓 = 𝐓[1] ⋯ 𝐓[n] of n finite sets of strings. The cardinality m of 𝐓 is the total number of strings in 𝐓[i], for all i ∈ [1..n]. The size N of 𝐓 is the total length of all m strings of 𝐓. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1..p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching k-approximate occurrences of P in the language of 𝐓. We call k-Approximate EDSM under the Hamming distance, k-Mismatch EDSM; and we call k-Approximate EDSM under edit distance, k-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple 𝒪(k m p + kN)-time algorithm for k-Mismatch EDSM and an 𝒪(k² m p + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3}mp+√kN)-time algorithm for k-Mismatch EDSM and an Õ(kmp+ kN)-time algorithm for k-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np²+N) time. They have also left the possibility to generalize these solutions for k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np²+N) to 𝒪(np²+N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np² + N) time, for any constant k > 0. Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).
@InProceedings{pissis_et_al:LIPIcs.CPM.2025.28, author = {Pissis, Solon P. and Radoszewski, Jakub and Zuba, Wiktor}, title = {{Faster Approximate Elastic-Degenerate String Matching - Part A}}, booktitle = {36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)}, pages = {28:1--28:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-369-0}, ISSN = {1868-8969}, year = {2025}, volume = {331}, editor = {Bonizzoni, Paola and M\"{a}kinen, Veli}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.28}, URN = {urn:nbn:de:0030-drops-231227}, doi = {10.4230/LIPIcs.CPM.2025.28}, annote = {Keywords: ED string, approximate string matching, Hamming distance, edit distance} }
Feedback for Dagstuhl Publishing