Document Open Access Logo

Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance

Authors Pawel Gawrychowski, Przemyslaw Uznanski



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2018.62.pdf
  • Filesize: 0.54 MB
  • 13 pages

Document Identifiers

Author Details

Pawel Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland
Przemyslaw Uznanski
  • Department of Computer Science, ETH Zürich, Switzerland

Cite AsGet BibTex

Pawel Gawrychowski and Przemyslaw Uznanski. Towards Unified Approximate Pattern Matching for Hamming and L_1 Distance. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 62:1-62:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ICALP.2018.62

Abstract

Computing the distance between a given pattern of length n and a text of length m is defined as calculating, for every m-substring of the text, the distance between the pattern and the substring. This naturally generalizes the standard notion of exact pattern matching to incorporate dissimilarity score. For both Hamming and L_{1} distance only relatively slow O~(n sqrt{m}) solutions are known for this generalization. This can be overcome by relaxing the question. For Hamming distance, the usual relaxation is to consider the k-bounded variant, where distances exceeding k are reported as infty, while for L_{1} distance asking for a (1 +/- epsilon)-approximation seems more natural. For k-bounded Hamming distance, Amir et al. [J. Algorithms 2004] showed an O~(n sqrt{k}) time algorithm, and Clifford et al. [SODA 2016] designed an O~((m+k^{2})* n/m) time solution. We provide a smooth time trade-off between these bounds by exhibiting an O~((m+k sqrt{m})* n/m) time algorithm. We complement the trade-off with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails. We also exhibit a series of reductions that together allow us to achieve essentially the same complexity for k-bounded L_1 distance. Finally, for (1 +/- epsilon)-approximate L_1 distance, the running time of the best previously known algorithm of Lipsky and Porat [Algorithmica 2011] was O(epsilon^{-2} n). We improve this to O~(epsilon^{-1}n), thus essentially matching the complexity of the best known algorithm for (1 +/- epsilon)-approximate Hamming distance.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • approximate pattern matching
  • conditional lower bounds
  • L_1 distance
  • Hamming distance

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. Google Scholar
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. Google Scholar
  3. Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate matching in the L₁ metric. In CPM, pages 91-103, 2005. Google Scholar
  4. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299-1317, 2012. Google Scholar
  5. Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In CPM, pages 68-78, 2005. Google Scholar
  6. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm. http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf, 2009. Retrieved March 2017.
  7. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In SODA, pages 2039-2052, 2016. Google Scholar
  8. Daniel Graf, Karim Labib, and Przemyslaw Uznański. Hamming distance completeness and sparse matrix multiplication. CoRR, abs/1711.03887, 2017. URL: http://arxiv.org/abs/1711.03887.
  9. T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of hamming distance. Theory of Computing, 4(1):129-135, 2008. Google Scholar
  10. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53-60, 1993. Google Scholar
  11. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the hamming distance in 1/ε time per alignment. In FOCS, pages 601-613, 2015. Google Scholar
  12. Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern hamming distance. In SOSA, pages 10:1-10:5, 2018. Google Scholar
  13. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. Google Scholar
  14. Ohad Lipsky and Ely Porat. L₁ pattern matching lower bound. Inf. Process. Lett., 105(4):141-143, 2008. Google Scholar
  15. Ohad Lipsky and Ely Porat. Approximate pattern matching with the L₁, L₂ and L_∞ metrics. Algorithmica, 60(2):335-348, 2011. Google Scholar
  16. David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167-175, 2004. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail