Document Open Access Logo

Approximating Text-To-Pattern Distance via Dimensionality Reduction

Author Przemysław Uznański



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.29.pdf
  • Filesize: 0.52 MB
  • 11 pages

Document Identifiers

Author Details

Przemysław Uznański
  • Institute of Computer Science, University of Wrocław, Poland

Cite AsGet BibTex

Przemysław Uznański. Approximating Text-To-Pattern Distance via Dimensionality Reduction. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 29:1-29:11, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.29

Abstract

Text-to-pattern distance is a fundamental problem in string matching, where given a pattern of length m and a text of length n, over an integer alphabet, we are asked to compute the distance between pattern and the text at every location. The distance function can be e.g. Hamming distance or 𝓁_p distance for some parameter p > 0. Almost all state-of-the-art exact and approximate algorithms developed in the past ∼ 40 years were using FFT as a black-box. In this work we present 𝒪~(n/ε²) time algorithms for (1±ε)-approximation of 𝓁₂ distances, and 𝒪~(n/ε³) algorithm for approximation of Hamming and 𝓁₁ distances, all without use of FFT. This is independent to the very recent development by Chan et al. [STOC 2020], where 𝒪(n/ε²) algorithm for Hamming distances not using FFT was presented - although their algorithm is much more "combinatorial", our techniques apply to other norms than Hamming.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sketching and sampling
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Approximate Pattern Matching
  • 𝓁₂ Distance
  • 𝓁₁ Distance
  • Hamming Distance
  • Approximation Algorithms
  • Combinatorial Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. Google Scholar
  2. Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4):671-687, 2003. URL: https://doi.org/10.1016/S0022-0000(03)00025-4.
  3. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. URL: https://doi.org/10.1016/S0196-6774(03)00097-X.
  4. Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate matching in the L₁ metric. In Combinatorial Pattern Matching, 16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22, 2005, Proceedings, pages 91-103, 2005. URL: https://doi.org/10.1007/11496656_9.
  5. Timothy M. Chan, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, and Ely Porat. Approximating text-to-pattern hamming distances. In STOC, 2020 (to appear). Google Scholar
  6. Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In CPM, pages 68-78, 2005. URL: https://doi.org/10.1007/11496656_7.
  7. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm. http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf. Retrieved March 2017. URL: http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf.
  8. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana Starikovskaya. The k-mismatch problem revisited. In SODA, pages 2039-2052, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch142.
  9. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. In SODA, pages 1106-1125, 2019. URL: https://doi.org/10.1137/1.9781611975482.68.
  10. Raphaël Clifford and Tatiana Starikovskaya. Approximate Hamming distance in a stream. In ICALP, pages 20:1-20:14, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.20.
  11. Michael B. Cohen, T. S. Jayram, and Jelani Nelson. Simple analyses of the sparse johnson-lindenstrauss transform. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, pages 15:1-15:9, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.15.
  12. M. J. Fischer and M. S. Paterson. String-matching and other products. Technical report, Massachusetts Institute of Technology, 1974. Google Scholar
  13. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L₁ distance. In ICALP, pages 62:1-62:13, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.62.
  14. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams. In ICALP, pages 65:1-65:16, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.65.
  15. William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space, 1984. Google Scholar
  16. Daniel M. Kane and Jelani Nelson. Sparser johnson-lindenstrauss transforms. J. ACM, 61(1):4:1-4:23, 2014. URL: https://doi.org/10.1145/2559902.
  17. Anatolii Karatsuba. Multiplication of multidigit numbers on automata. In Soviet physics doklady, volume 7, pages 595-596, 1963. Google Scholar
  18. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53-60, 1993. URL: https://doi.org/10.1016/0020-0190(93)90177-B.
  19. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the hamming distance in 1/ε time per alignment. In FOCS, pages 601-613, 2015. URL: https://doi.org/10.1109/FOCS.2015.43.
  20. Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern hamming distance. In SOSA@SODA, pages 10:1-10:5, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.10.
  21. S. R. Kosaraju. Efficient string matching. Manuscript, 1987. Google Scholar
  22. Karim Labib, Przemysław Uznański, and Daniel Wolleb-Graf. Hamming distance completeness. In CPM, pages 14:1-14:17, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.14.
  23. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. URL: https://doi.org/10.1016/0304-3975(86)90178-7.
  24. Ohad Lipsky and Ely Porat. L₁ pattern matching lower bound. Inf. Process. Lett., 105(4):141-143, 2008. URL: https://doi.org/10.1016/j.ipl.2007.08.011.
  25. Ohad Lipsky and Ely Porat. Approximate pattern matching with the L₁, L₂ and L_∞ metrics. Algorithmica, 60(2):335-348, 2011. URL: https://doi.org/10.1007/s00453-009-9345-9.
  26. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In FOCS, pages 315-323, 2009. URL: https://doi.org/10.1109/FOCS.2009.11.
  27. Tatiana Starikovskaya, Michal Svagerka, and Przemysław Uznański. L_p pattern matching in a stream. CoRR, abs/1907.04405, 2019. URL: http://arxiv.org/abs/1907.04405.
  28. Jan Studený and Przemysław Uznański. Approximating approximate pattern matching. In CPM, volume 128, pages 15:1-15:13, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.15.
  29. AL Toom. The complexity of a scheme of functional elements simulating the multiplication of integers. In Doklady Akademii Nauk, volume 150(3), pages 496-498. Russian Academy of Sciences, 1963. Google Scholar
  30. Przemysław Uznański. Approximating text-to-pattern distance via dimensionality reduction. CoRR, abs/2002.03459, 2020. URL: http://arxiv.org/abs/2002.03459.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail