Document Open Access Logo

Approximating Approximate Pattern Matching

Authors Jan Studený, Przemysław Uznański



PDF
Thumbnail PDF

File

LIPIcs.CPM.2019.15.pdf
  • Filesize: 0.49 MB
  • 13 pages

Document Identifiers

Author Details

Jan Studený
  • Department of Computer Science, ETH Zürich, Switzerland
Przemysław Uznański
  • Institute of Computer Science, University of Wrocław, Poland

Cite AsGet BibTex

Jan Studený and Przemysław Uznański. Approximating Approximate Pattern Matching. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 15:1-15:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CPM.2019.15

Abstract

Given a text T of length n and a pattern P of length m, the approximate pattern matching problem asks for computation of a particular distance function between P and every m-substring of T. We consider a (1 +/- epsilon) multiplicative approximation variant of this problem, for l_p distance function. In this paper, we describe two (1+epsilon)-approximate algorithms with a runtime of O~(n/epsilon) for all (constant) non-negative values of p. For constant p >= 1 we show a deterministic (1+epsilon)-approximation algorithm. Previously, such run time was known only for the case of l_1 distance, by Gawrychowski and Uznański [ICALP 2018] and only with a randomized algorithm. For constant 0 <= p <= 1 we show a randomized algorithm for the l_p, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS 2015, SOSA 2018] for Hamming distance (case of p=0) and of Gawrychowski and Uznański for l_1 distance.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Approximate Pattern Matching
  • l_p Distance
  • l_1 Distance
  • Hamming Distance
  • Approximation Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Karl R. Abrahamson. Generalized String Matching. SIAM J. Comput., 16(6):1039-1051, 1987. Google Scholar
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. URL: http://dx.doi.org/10.1016/S0196-6774(03)00097-X.
  3. Amihood Amir, Ohad Lipsky, Ely Porat, and Julia Umanski. Approximate Matching in the L₁ Metric. In CPM, pages 91-103, 2005. URL: http://dx.doi.org/10.1007/11496656_9.
  4. Amit Chakrabarti and Oded Regev. An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance. SIAM J. Comput., 41(5):1299-1317, 2012. URL: http://dx.doi.org/10.1137/120861072.
  5. Peter Clifford, Raphaël Clifford, and Costas S. Iliopoulos. Faster Algorithms for δ,γ-Matching and Related Problems. In CPM, pages 68-78, 2005. URL: http://dx.doi.org/10.1007/11496656_7.
  6. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm. http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf, 2009. Retrieved March 2017.
  7. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In SODA, pages 2039-2052. SIAM, 2016. Google Scholar
  8. M. J. Fischer and M. S. Paterson. STRING-MATCHING AND OTHER PRODUCTS. Technical report, Massachusetts Institute of Technology, 1974. Google Scholar
  9. Sumit Ganguly. Taylor Polynomial Estimator for Estimating Frequency Moments. In ICALP, pages 542-553, 2015. URL: http://dx.doi.org/10.1007/978-3-662-47672-7_44.
  10. Paweł Gawrychowski and Przemysław Uznański. Towards Unified Approximate Pattern Matching for Hamming and L₁ Distance. In ICALP, pages 62:1-62:13, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.62.
  11. Daniel Graf, Karim Labib, and Przemysław Uznański. Brief Announcement: Hamming Distance Completeness and Sparse Matrix Multiplication. In ICALP, pages 109:1-109:4, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.109.
  12. T. S. Jayram, Ravi Kumar, and D. Sivakumar. The One-Way Communication Complexity of Hamming Distance. Theory of Computing, 4(1):129-135, 2008. URL: http://dx.doi.org/10.4086/toc.2008.v004a006.
  13. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the Exact Space Complexity of Sketching and Streaming Small Norms. In SODA, pages 1161-1178, 2010. URL: http://dx.doi.org/10.1137/1.9781611973075.93.
  14. Howard J. Karloff. Fast Algorithms for Approximately Counting Mismatches. Inf. Process. Lett., 48(2):53-60, 1993. URL: http://dx.doi.org/10.1016/0020-0190(93)90177-B.
  15. Tsvi Kopelowitz and Ely Porat. Breaking the Variance: Approximating the Hamming Distance in 1/ε Time Per Alignment. In FOCS, pages 601-613, 2015. URL: http://dx.doi.org/10.1109/FOCS.2015.43.
  16. Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In SOSA, pages 10:1-10:5, 2018. URL: http://dx.doi.org/10.4230/OASIcs.SOSA.2018.10.
  17. Yi Li and David P. Woodruff. A Tight Lower Bound for High Frequency Moment Estimation with Small Error. In APPROX-RANDOM, pages 623-638, 2013. URL: http://dx.doi.org/10.1007/978-3-642-40328-6_43.
  18. Ohad Lipsky. Eficient Distance Computations. Master’s thesis, Bar-Ilan University. Department of Mathematics and Computer Science., 2003. Google Scholar
  19. Ohad Lipsky and Ely Porat. L₁ pattern matching lower bound. Inf. Process. Lett., 105(4):141-143, 2008. URL: http://dx.doi.org/10.1016/j.ipl.2007.08.011.
  20. Ohad Lipsky and Ely Porat. Approximate Pattern Matching with the L₁, L₂ and L_∞ Metrics. Algorithmica, 60(2):335-348, 2011. URL: http://dx.doi.org/10.1007/s00453-009-9345-9.
  21. Marcin Mucha, Karol Węgrzycki, and Michał Włodarczyk. A Subquadratic Approximation Scheme for Partition. In SODA, pages 70-88, 2019. URL: http://dx.doi.org/10.1137/1.9781611975482.5.
  22. John Nolan. Stable distributions: models for heavy-tailed data. Birkhauser New York, 2003. Google Scholar
  23. Ely Porat and Klim Efremenko. Approximating general metric distances between a pattern and a text. In SODA, pages 419-427, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347128.
  24. Jan Studený and Przemysław Uznański. Approximating Approximate Pattern Matching. CoRR, abs/1810.01676, 2018. URL: http://arxiv.org/abs/1810.01676.
  25. David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167-175, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982817.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail