Document

# Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

## File

LIPIcs.ICALP.2018.62.pdf
• Filesize: 0.54 MB
• 13 pages

## Cite As

Pawel Gawrychowski and Przemyslaw Uznanski. Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 62:1-62:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.62

## Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Design and analysis of algorithms
##### Keywords
• approximate pattern matching
• conditional lower bounds
• L_1 distance
• Hamming distance

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987.
2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004.
3. Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate matching in the L₁ metric. In CPM, pages 91-103, 2005.
4. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299-1317, 2012.
5. Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In CPM, pages 68-78, 2005.