Approximate Online Pattern Matching in Sublinear Time

Authors Diptarka Chakraborty, Debarati Das, Michal Koucký



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2019.10.pdf
  • Filesize: 492 kB
  • 15 pages

Document Identifiers

Author Details

Diptarka Chakraborty
  • National University of Singapore, Singapore
Debarati Das
  • University of Copenhagen, Denmark
Michal Koucký
  • Computer Science Institute of Charles University, Czech Republic

Acknowledgements

Authors would like to thank anonymous reviewers for many helpful suggestions and comments on an earlier version of this paper.

Cite AsGet BibTex

Diptarka Chakraborty, Debarati Das, and Michal Koucký. Approximate Online Pattern Matching in Sublinear Time. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.FSTTCS.2019.10

Abstract

We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern P of length m and a text T of length n over some alphabet Sigma, and a positive integer k. The goal is to find all the positions j in T such that there is a substring of T ending at j which has edit distance at most k from the pattern P. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position t in {1,...,n}, let k_t be the smallest edit distance between P and any substring of T ending at t. In this paper we give a constant factor approximation to the sequence k_1,k_2,...,k_n. We consider both offline and online settings. In the offline setting, where both P and T are available, we present an algorithm that for all t in {1,...,n}, computes the value of k_t approximately within a constant factor. The worst case running time of our algorithm is O~(n m^(3/4)). In the online setting, we are given P and then T arrives one symbol at a time. We design an algorithm that upon arrival of the t-th symbol of T computes k_t approximately within O(1)-multiplicative factor and m^(8/9)-additive error. Our algorithm takes O~(m^(1-(7/54))) amortized time per symbol arrival and takes O~(m^(1-(1/54))) additional space apart from storing the pattern P. Both of our algorithms are randomized and produce correct answer with high probability. To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Koucký and Saks [FOCS'18] for computing a constant factor approximation of edit distance in sub-quadratic time.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Approximate Pattern Matching
  • Online Pattern Matching
  • Edit Distance
  • Sublinear Algorithm
  • Streaming Algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud and Arturs Backurs. Towards Hardness of Approximation for Polynomial Time Problems. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 11:1-11:26, 2017. Google Scholar
  2. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. Google Scholar
  3. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 375-388, 2016. Google Scholar
  4. Amir Abboud and Aviad Rubinstein. Fast and Deterministic Constant Factor Approximation Algorithms for LCS Imply New Circuit Lower Bounds. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 35:1-35:14, 2018. Google Scholar
  5. Karl Abrahamson. Generalized String Matching. SIAM J. Comput., 16(6):1039-1051, December 1987. Google Scholar
  6. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 377-386, 2010. Google Scholar
  7. V. L. Arlazarov, E. A. Dinic, M. A. Konrod, and L. A. Faradzev. On economic construction of the transitive closure of a directed graph. Dokl. Akad, Nauk SSSR 194:487-488, 1970. [in Russian]. English translation: Soviet. Math. Dokl. 11 No. 5 (1970), 1209-1210. Google Scholar
  8. Arturs Backurs and Piotr Indyk. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, pages 51-58, New York, NY, USA, 2015. ACM. Google Scholar
  9. Karl Bringmann and Marvin Künnemann. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 79-97, 2015. Google Scholar
  10. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 979-990, 2018. Google Scholar
  11. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. CoRR, abs/1810.03664, 2018. URL: http://arxiv.org/abs/1810.03664.
  12. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A Black Box for Online Approximate Pattern Matching. In Combinatorial Pattern Matching, 19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008, Proceedings, pages 143-151, 2008. Google Scholar
  13. Raphaël Clifford, Markus Jalsenius, and Benjamin Sach. Cell-probe bounds for online edit distance and other pattern matching problems. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 552-561, 2015. Google Scholar
  14. Raphaël Clifford and Benjamin Sach. Online Approximate Matching with Non-local Distances. In Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings, pages 142-153, 2009. Google Scholar
  15. Raphaël Clifford and Benjamin Sach. Pseudo-realtime Pattern Matching: Closing the Gap. In Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, pages 101-111, 2010. Google Scholar
  16. Richard Cole and Ramesh Hariharan. Approximate String Matching: A Simpler Faster Algorithm. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 25-27 January 1998, San Francisco, California, USA., pages 463-472, 1998. Google Scholar
  17. Maxime Crochemore. String-Matching on Ordered Alphabets. Theor. Comput. Sci., 92(1):33-47, 1992. Google Scholar
  18. Maxime Crochemore, Leszek Gasieniec, Wojciech Plandowski, and Wojciech Rytter. Two-Dimensional Pattern Matching in Linear Time and Small Space. In STACS, pages 181-192, 1995. Google Scholar
  19. Zvi Galil and Raffaele Giancarlo. Data structures and algorithms for approximate string matching. J. Complexity, 4(1):33-72, 1988. Google Scholar
  20. Zvi Galil and Kunsoo Park. An Improved Algorithm for Approximate String Matching. SIAM Journal on Computing, 19(6):989-999, 1990. Google Scholar
  21. Zvi Galil and Joel Seiferas. Time-space-optimal String Matching (Preliminary Report). In Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, STOC '81, pages 106-113, New York, NY, USA, 1981. ACM. Google Scholar
  22. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 397-407, 2017. Google Scholar
  23. Leszek Gasieniec, Wojciech Plandowski, and Wojciech Rytter. The Zooming Method: A Recursive Approach to Time-Space Efficient String-Matching. Theor. Comput. Sci., 147(1&2):19-30, 1995. Google Scholar
  24. Piotr Indyk. Faster Algorithms for String Matching Problems: Matching the Convolution Bound. In 39th Annual Symposium on Foundations of Computer Science, FOCS '98, November 8-11, 1998, Palo Alto, California, USA, pages 166-173, 1998. Google Scholar
  25. Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977. Google Scholar
  26. Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, pages 10:1-10:5, 2018. Google Scholar
  27. Gad M. Landau and Uzi Vishkin. Fast Parallel and Serial Approximate String Matching. Journal of Algorithms, 10(2):157-169, 1989. Google Scholar
  28. VI Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966. Google Scholar
  29. William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980. Google Scholar
  30. G. Myers. Incremental alignment algorithms and their applications. Technical Report, 1986. Google Scholar
  31. Gonzalo Navarro. A Guided Tour to Approximate String Matching. ACM Comput. Surv., 33(1):31-88, March 2001. Google Scholar
  32. Mihai Patrascu. Succincter. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 305-313, 2008. Google Scholar
  33. Peter H. Sellers. The Theory and Computation of Evolutionary Distances: pattern recognition. Journal of Algorithms, pages 1:359-373, 1980. Google Scholar
  34. Tatiana A. Starikovskaya. Communication and Streaming Complexity of Approximate Pattern Matching. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, pages 13:1-13:11, 2017. Google Scholar
  35. Esko Ukkonen. Algorithms for Approximate String Matching. Inf. Control, 64(1-3):100-118, March 1985. Google Scholar
  36. Esko Ukkonen and Derick Wood. Approximate String Matching with Suffix Automata. Algorithmica, 10(5):353-364, 1993. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail