, Dvir Fried, Shay Golan, Matan Kraus, Ely Porat
Creative Commons Attribution 4.0 International license
In this paper, we present and study the Hamming distance oracle problem. In this problem, the task is to preprocess two strings S and T of lengths n and m, respectively, to obtain a data structure that is able to return the Hamming distance between a substring of S and a substring of T.
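To make the query concrete, here is a minimal sketch of the trivial baseline: no preprocessing, and each query answered by direct comparison in time linear in the substring length. The function name `hamming_query`, the 0-based indexing, and the assumption that both substrings have the same length are illustrative choices, not taken from the paper.

```python
def hamming_query(S: str, T: str, i: int, j: int, length: int) -> int:
    """Hamming distance between S[i:i+length] and T[j:j+length].

    Baseline only: no preprocessing, O(length) time per query.
    Assumes equal-length substrings and 0-based start positions.
    """
    if i < 0 or j < 0 or length < 0:
        raise ValueError("indices and length must be non-negative")
    if i + length > len(S) or j + length > len(T):
        raise ValueError("substring out of range")
    # Count the positions at which the two substrings differ.
    return sum(1 for k in range(length) if S[i + k] != T[j + k])
```

An oracle in the sense of the abstract aims to beat this per-query cost by spending time on preprocessing.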
For strings over a constant-size alphabet, we show that for every x ≤ min{n,m} there is a data structure with Õ(nm/x) preprocessing time and O(x) query time. We also provide a conditional lower bound, showing that for every ε > 0 there is no combinatorial data structure with query time O(x) and preprocessing time O((nm/x)^{1-ε}) unless combinatorial fast matrix multiplication is possible.
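One endpoint of such a trade-off can be illustrated with a folklore construction: store prefix sums of mismatches along every diagonal of the n × m comparison grid. This is a sketch of a simple baseline with O(nm) preprocessing and O(1) query time, not the data structure from the paper; the class name `DiagonalPrefixOracle` and the equal-length, 0-based query convention are assumptions made for the example.

```python
class DiagonalPrefixOracle:
    """Folklore baseline: O(nm) preprocessing, O(1) query time.

    Illustrative only; this is not the construction from the paper.
    """

    def __init__(self, S: str, T: str) -> None:
        self.n, self.m = len(S), len(T)
        # prefix[d][k] = number of mismatches among the first k cells
        # of diagonal d, where d = i - j for a cell (i, j).
        self.prefix = {}
        for d in range(-(self.m - 1), self.n):
            i0, j0 = max(d, 0), max(-d, 0)  # first cell on diagonal d
            sums = [0]
            for i, j in zip(range(i0, self.n), range(j0, self.m)):
                sums.append(sums[-1] + (S[i] != T[j]))
            self.prefix[d] = sums

    def query(self, i: int, j: int, length: int) -> int:
        """Hamming distance between S[i:i+length] and T[j:j+length]."""
        if i < 0 or j < 0 or length < 0:
            raise ValueError("indices and length must be non-negative")
        if i + length > self.n or j + length > self.m:
            raise ValueError("substring out of range")
        if length == 0:
            return 0
        sums = self.prefix[i - j]
        start = min(i, j)  # offset of cell (i, j) along its diagonal
        return sums[start + length] - sums[start]
```

Storing the full prefix-sum table is what makes the query constant time; the results stated above concern the regime where preprocessing is reduced at the cost of O(x) query time.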
For strings over a general alphabet, we present a data structure with Õ(nm/√x) preprocessing time and O(x) query time for every x ≤ min{n,m}. Moreover, for every ε > 0 we provide a data structure with preprocessing time Õ((n+m)/ε³) that returns, with high probability, a (1±ε)-approximation of the Hamming distance between two input substrings. The query time of the approximation data structure is Õ(1/ε²).
@InProceedings{boneh_et_al:LIPIcs.CPM.2026.1,
author = {Boneh, Itai and Fried, Dvir and Golan, Shay and Kraus, Matan and Porat, Ely},
title = {{Hamming Distance Oracles}},
booktitle = {37th Annual Symposium on Combinatorial Pattern Matching (CPM 2026)},
pages = {1:1--1:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-420-8},
ISSN = {1868-8969},
year = {2026},
volume = {369},
editor = {Bille, Philip and Prezza, Nicola},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2026.1},
URN = {urn:nbn:de:0030-drops-259278},
doi = {10.4230/LIPIcs.CPM.2026.1},
annote = {Keywords: Hamming distance, Fine-grained complexity, Data structure, Oracle}
}