A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

Authors: Tsvi Kopelowitz, Ely Porat



File

OASIcs.SOSA.2018.10.pdf
  • Filesize: 418 kB
  • 5 pages

Cite As

Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 10:1-10:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/OASIcs.SOSA.2018.10

Abstract

The task of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$, both over a general alphabet $\Sigma$, is one of the most fundamental problems in string algorithms. The fastest known runtime for exact computation is $\tilde{O}(n\sqrt{m})$. We recently introduced a complicated randomized algorithm that obtains a $(1\pm\varepsilon)$ approximation for each location in the text in $O(\frac{n}{\varepsilon}\log\frac{1}{\varepsilon}\log n\log m\log|\Sigma|)$ total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes $O(\frac{n}{\varepsilon}\log n\log m)$ time.
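To make the problem in the abstract concrete, here is a minimal Python sketch under stated assumptions. All function names (`hamming_naive`, `hamming_fft`, `hamming_estimate`) are hypothetical. `hamming_naive` defines the required output: one mismatch count per alignment of the pattern in the text. `hamming_fft` computes the same counts exactly with the textbook per-symbol FFT correlation, which costs one convolution per distinct pattern symbol and therefore grows with the alphabet. `hamming_estimate` is a generic random alphabet-reduction estimator included only for illustration; it is not the algorithm of this paper and carries no (1 ± ε) guarantee.

```python
import cmath
import random
from typing import Hashable, List, Sequence


def hamming_naive(text: Sequence[Hashable], pattern: Sequence[Hashable]) -> List[int]:
    """Exact mismatch count at every alignment i, 0 <= i <= n - m, in O(nm) time."""
    n, m = len(text), len(pattern)
    return [sum(text[i + j] != pattern[j] for j in range(m)) for i in range(n - m + 1)]


def _fft(a: List[complex], invert: bool) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    size = len(a)
    if size == 1:
        return a[:]
    even, odd = _fft(a[0::2], invert), _fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * size
    for k in range(size // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / size) * odd[k]
        out[k], out[k + size // 2] = even[k] + w, even[k] - w
    return out


def _correlate(x: List[int], y: List[int]) -> List[int]:
    """Return r[i] = sum_j x[i + j] * y[j] for i = 0..len(x) - len(y), using FFTs."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    fx = _fft([complex(v) for v in x] + [0j] * (size - n), False)
    fy = _fft([complex(v) for v in reversed(y)] + [0j] * (size - m), False)
    conv = _fft([a * b for a, b in zip(fx, fy)], True)
    # Convolving with reversed y puts the correlation at shift i in entry m - 1 + i.
    return [round(conv[m - 1 + i].real / size) for i in range(n - m + 1)]


def hamming_fft(text: Sequence[Hashable], pattern: Sequence[Hashable]) -> List[int]:
    """Exact counts from one correlation of 0/1 indicator vectors per pattern symbol."""
    n, m = len(text), len(pattern)
    matches = [0] * max(0, n - m + 1)
    for c in set(pattern):  # symbols absent from the pattern can never match
        t_ind = [1 if ch == c else 0 for ch in text]
        p_ind = [1 if ch == c else 0 for ch in pattern]
        for i, v in enumerate(_correlate(t_ind, p_ind)):
            matches[i] += v
    return [m - v for v in matches]


def hamming_estimate(text: Sequence[Hashable], pattern: Sequence[Hashable],
                     k: int = 4, trials: int = 8, seed: int = 0) -> List[float]:
    """Hypothetical illustration, not the paper's algorithm: hash every symbol to one
    of k >= 2 buckets uniformly and independently at random. Matches always survive,
    and a mismatch survives with probability 1 - 1/k, so (reduced count) * k / (k - 1)
    is an unbiased estimate of the true count. Averaging over trials reduces variance."""
    rng = random.Random(seed)
    alphabet = set(text) | set(pattern)
    totals = [0.0] * max(0, len(text) - len(pattern) + 1)
    for _ in range(trials):
        h = {c: rng.randrange(k) for c in alphabet}
        reduced = hamming_fft([h[c] for c in text], [h[c] for c in pattern])
        for i, d in enumerate(reduced):
            totals[i] += d * k / (k - 1)
    return [s / trials for s in totals]
```

For example, `hamming_fft("abcab", "ab")` returns `[0, 2, 2, 0]`, matching `hamming_naive`. The design point of the estimator is that each trial runs `hamming_fft` over at most `k` symbols, so its cost no longer grows with |Σ|; the price is variance, which this sketch only reduces by averaging and does not bound.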

Keywords
  • Pattern Matching
  • Hamming Distance
  • Approximation Algorithms

References

  1. K. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987.
  2. A. Amir, O. Lipsky, E. Porat, and J. Umanski. Approximate matching in the L1 metric. In CPM, pages 91-103, 2005.
  3. Amihood Amir, Yonatan Aumann, Gary Benson, Avivit Levy, Ohad Lipsky, Ely Porat, Steven Skiena, and Uzi Vishne. Pattern matching with address errors: rearrangement distances. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 1221-1229, 2006.
  4. Amihood Amir, Yonatan Aumann, Piotr Indyk, Avivit Levy, and Ely Porat. Efficient computations of ℓ1 and ℓ∞ rearrangement distances. In String Processing and Information Retrieval, 14th International Symposium, SPIRE 2007, Santiago, Chile, October 29-31, 2007, Proceedings, pages 39-49, 2007.
  5. Amihood Amir, Yonatan Aumann, Oren Kapah, Avivit Levy, and Ely Porat. Approximate string matching with address bit errors. In Combinatorial Pattern Matching, 19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008, Proceedings, pages 118-129, 2008.
  6. Amihood Amir, Estrella Eisenberg, and Ely Porat. Swap and mismatch edit distance. In Algorithms - ESA 2004, 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004, Proceedings, pages 16-27, 2004.
  7. Amihood Amir, Tzvika Hartman, Oren Kapah, Avivit Levy, and Ely Porat. On the cost of interchange rearrangement in strings. In Algorithms - ESA 2007, 15th Annual European Symposium, Eilat, Israel, October 8-10, 2007, Proceedings, pages 99-110, 2007.
  8. Amihood Amir, Moshe Lewenstein, and Ely Porat. Approximate swapped matching. In Foundations of Software Technology and Theoretical Computer Science, 20th Conference, FST TCS 2000, New Delhi, India, December 13-15, 2000, Proceedings, pages 302-311, 2000.
  9. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 377-386, 2010.
  10. A. Backurs and P. Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In 56th IEEE Symposium on Foundations of Computer Science (FOCS), 2015.
  11. Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In 45th Symposium on Foundations of Computer Science (FOCS 2004), 17-19 October 2004, Rome, Italy, Proceedings, pages 550-559, 2004.
  12. Ayelet Butman, Noa Lewenstein, Benny Porat, and Ely Porat. Jump-matching with errors. In String Processing and Information Retrieval, 14th International Symposium, SPIRE 2007, Santiago, Chile, October 29-31, 2007, Proceedings, pages 98-106, 2007.
  13. Amit Chakrabarti and Oded Regev. An optimal lower bound on the communication complexity of gap-Hamming-distance. SIAM J. Comput., 41(5):1299-1317, 2012.
  14. Raphael Clifford. Matrix multiplication and pattern matching under Hamming norm. http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf. Retrieved August 2015.
  15. Raphaël Clifford, Klim Efremenko, Benny Porat, Ely Porat, and Amir Rothschild. Mismatch sampling. Information and Computation, 214:112-118, 2012.
  16. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. From coding theory to efficient pattern matching. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 778-784, 2009. URL: http://dl.acm.org/citation.cfm?id=1496770.1496855.
  17. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. Pattern matching with don't cares and few errors. Journal of Computer and System Sciences, 76(2):115-124, 2010.
  18. Raphaël Clifford and Ely Porat. A filtering algorithm for k-mismatch with don't cares. Inf. Process. Lett., 110(22):1021-1025, 2010. URL: http://dx.doi.org/10.1016/j.ipl.2010.08.012.
  19. Graham Cormode and S. Muthukrishnan. The string edit distance matching problem with moves. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, pages 667-676, 2002.
  20. M. J. Fischer and M. S. Paterson. String matching and other products. In R. M. Karp (ed.), Complexity of Computation, SIAM-AMS Proceedings, vol. 7, pages 113-125, 1974.
  21. T. S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of Hamming distance. Theory of Computing, 4(1):129-135, 2008.
  22. H. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53-60, 1993.
  23. Tsvi Kopelowitz and Ely Porat. Breaking the variance: Approximating the Hamming distance in 1/ε time per alignment. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS, pages 601-613, 2015.
  24. Vladimir Levenshtein. Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission, 1:8-17, 1965.
  25. O. Lipsky and E. Porat. Approximated pattern matching with the L1, L2 and L∞ metrics. In SPIRE, pages 212-223, 2008.
  26. R. Lowrance and R. A. Wagner. An extension of the string-to-string correction problem. J. ACM, pages 177-183, 1975.
  27. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, October 25-27, 2009, Atlanta, Georgia, USA, pages 315-323, 2009.
  28. Benny Porat, Ely Porat, and Asaf Zur. Pattern matching with pair correlation distance. In String Processing and Information Retrieval, 15th International Symposium, SPIRE 2008, Melbourne, Australia, November 10-12, 2008, Proceedings, pages 249-256, 2008.
  29. Ely Porat and Klim Efremenko. Approximating general metric distances between a pattern and a text. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 419-427, 2008.
  30. Ely Porat and Ohad Lipsky. Improved sketching of Hamming distance with error correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 173-182, 2007.
  31. Ariel Shiftan and Ely Porat. Set intersection and sequence matching. In String Processing and Information Retrieval, 16th International Symposium, SPIRE 2009, Saariselkä, Finland, August 25-27, 2009, Proceedings, pages 285-294, 2009.
  32. David P. Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11-14, 2004, pages 167-175, 2004.