Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

Authors Pawel Gawrychowski, Przemyslaw Uznanski

Thumbnail PDF


  • Filesize: 0.54 MB
  • 13 pages

Document Identifiers

Author Details

Pawel Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland
Przemyslaw Uznanski
  • Department of Computer Science, ETH Zürich, Switzerland

Cite AsGet BibTex

Pawel Gawrychowski and Przemyslaw Uznanski. Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 62:1-62:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • approximate pattern matching
  • conditional lower bounds
  • L_1 distance
  • Hamming distance


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. Google Scholar
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. Google Scholar
  3. Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate matching in the L₁ metric. In CPM, pages 91-103, 2005. Google Scholar
  4. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299-1317, 2012. Google Scholar
  5. Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In CPM, pages 68-78, 2005. Google Scholar
  6. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm., 2009. Retrieved March 2017.
  7. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In SODA, pages 2039-2052, 2016. Google Scholar
  8. Daniel Graf, Karim Labib, and Przemyslaw Uznański. Hamming distance completeness and sparse matrix multiplication. CoRR, abs/1711.03887, 2017. URL:
  9. T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of hamming distance. Theory of Computing, 4(1):129-135, 2008. Google Scholar
  10. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53-60, 1993. Google Scholar
  11. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the hamming distance in 1/ε time per alignment. In FOCS, pages 601-613, 2015. Google Scholar
  12. Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern hamming distance. In SOSA, pages 10:1-10:5, 2018. Google Scholar
  13. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. Google Scholar
  14. Ohad Lipsky and Ely Porat. L₁ pattern matching lower bound. Inf. Process. Lett., 105(4):141-143, 2008. Google Scholar
  15. Ohad Lipsky and Ely Porat. Approximate pattern matching with the L₁, L₂ and L_∞ metrics. Algorithmica, 60(2):335-348, 2011. Google Scholar
  16. David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167-175, 2004. Google Scholar