Gawrychowski, Pawel ;
Uznanski, Przemyslaw
Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance
Abstract
Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every msubstring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the kbounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/ epsilon)approximation seems more natural. For kbounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time tradeoff between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the tradeoff with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for kbounded L_1 distance. Finally, for (1 +/ epsilon)approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{2} n). We improve this to O~(epsilon^{1}n), thus essentially matching the complexity of the best known algorithm for (1 +/ epsilon)approximate Hamming distance.
BibTeX  Entry
@InProceedings{gawrychowski_et_al:LIPIcs:2018:9066,
author = {Pawel Gawrychowski and Przemyslaw Uznanski},
title = {{Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance}},
booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
pages = {62:162:13},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770767},
ISSN = {18688969},
year = {2018},
volume = {107},
editor = {Ioannis Chatzigiannakis and Christos Kaklamanis and D{\'a}niel Marx and Donald Sannella},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9066},
URN = {urn:nbn:de:0030drops90669},
doi = {10.4230/LIPIcs.ICALP.2018.62},
annote = {Keywords: approximate pattern matching, conditional lower bounds, L_1 distance, Hamming distance}
}
04.07.2018
Keywords: 

approximate pattern matching, conditional lower bounds, L_1 distance, Hamming distance 
Seminar: 

45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Issue date: 

2018 
Date of publication: 

04.07.2018 