Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

Gawrychowski, Pawel; Uznanski, Przemyslaw

doi:10.4230/LIPIcs.ICALP.2018.62

File

LIPIcs.ICALP.2018.62.pdf

Filesize: 0.54 MB
13 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ICALP.2018.62
URN: urn:nbn:de:0030-drops-90669

Author Details

Pawel Gawrychowski

Institute of Computer Science, University of Wrocław, Poland

Przemyslaw Uznanski

Department of Computer Science, ETH Zürich, Switzerland

Cite AsGet BibTex

Pawel Gawrychowski and Przemyslaw Uznanski. Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 62:1-62:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.62

Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

Subject Classification

ACM Subject Classification

Theory of computation → Design and analysis of algorithms

Keywords

approximate pattern matching
conditional lower bounds
L_1 distance
Hamming distance

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987.
Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004.
Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate matching in the L₁ metric. In CPM, pages 91-103, 2005.
Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299-1317, 2012.
Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In CPM, pages 68-78, 2005.
Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm. http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf, 2009. Retrieved March 2017.
Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In SODA, pages 2039-2052, 2016.
Daniel Graf, Karim Labib, and Przemyslaw Uznański. Hamming distance completeness and sparse matrix multiplication. CoRR, abs/1711.03887, 2017. URL: http://arxiv.org/abs/1711.03887.
T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of hamming distance. Theory of Computing, 4(1):129-135, 2008.
Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53-60, 1993.
Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the hamming distance in 1/ε time per alignment. In FOCS, pages 601-613, 2015.
Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern hamming distance. In SOSA, pages 10:1-10:5, 2018.
Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986.
Ohad Lipsky and Ely Porat. L₁ pattern matching lower bound. Inf. Process. Lett., 105(4):141-143, 2008.
Ohad Lipsky and Ely Porat. Approximate pattern matching with the L₁, L₂ and L_∞ metrics. Algorithmica, 60(2):335-348, 2011.
David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167-175, 2004.