,
Jakub Radoszewski
,
Wiktor Zuba
Creative Commons Attribution 4.0 International license
An elastic-degenerate (ED) string 𝐓 is a sequence 𝐓 = 𝐓[1] ⋯ 𝐓[n] of n finite sets of strings. The cardinality m of 𝐓 is the total number of strings in 𝐓[i], for all i ∈ [1..n]. The size N of 𝐓 is the total length of all m strings of 𝐓. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1..p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching k-approximate occurrences of P in the language of 𝐓. We call k-Approximate EDSM under the Hamming distance, k-Mismatch EDSM; and we call k-Approximate EDSM under edit distance, k-Edit EDSM.
Bernardini et al. (Theoretical Computer Science, 2020) showed a simple 𝒪(k m p + kN)-time algorithm for k-Mismatch EDSM and an 𝒪(k² m p + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3}mp+√kN)-time algorithm for k-Mismatch EDSM and an Õ(kmp+ kN)-time algorithm for k-Edit EDSM.
Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np²+N) time. They have also left the possibility to generalize these solutions for k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np²+N) to 𝒪(np²+N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np² + N) time, for any constant k > 0.
Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).
@InProceedings{pissis_et_al:LIPIcs.CPM.2025.28,
author = {Pissis, Solon P. and Radoszewski, Jakub and Zuba, Wiktor},
title = {{Faster Approximate Elastic-Degenerate String Matching - Part A}},
booktitle = {36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
pages = {28:1--28:19},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-369-0},
ISSN = {1868-8969},
year = {2025},
volume = {331},
editor = {Bonizzoni, Paola and M\"{a}kinen, Veli},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.28},
URN = {urn:nbn:de:0030-drops-231227},
doi = {10.4230/LIPIcs.CPM.2025.28},
annote = {Keywords: ED string, approximate string matching, Hamming distance, edit distance}
}