
# Approximate Online Pattern Matching in Sublinear Time

## File

LIPIcs.FSTTCS.2019.10.pdf
• 15 pages

## Acknowledgements

The authors would like to thank the anonymous reviewers for many helpful suggestions and comments on an earlier version of this paper.

## Cite As

Diptarka Chakraborty, Debarati Das, and Michal Koucký. Approximate Online Pattern Matching in Sublinear Time. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 150, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.FSTTCS.2019.10

## Abstract

We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern $P$ of length $m$ and a text $T$ of length $n$ over some alphabet $\Sigma$, and a positive integer $k$. The goal is to find all positions $j$ in $T$ such that some substring of $T$ ending at $j$ has edit distance at most $k$ from the pattern $P$. Recall that the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position $t \in \{1,\dots,n\}$, let $k_t$ be the smallest edit distance between $P$ and any substring of $T$ ending at $t$. In this paper we give a constant-factor approximation to the sequence $k_1, k_2, \dots, k_n$.

We consider both offline and online settings. In the offline setting, where both $P$ and $T$ are available, we present an algorithm that, for all $t \in \{1,\dots,n\}$, computes $k_t$ approximately within a constant factor. The worst-case running time of our algorithm is $\tilde{O}(n m^{3/4})$. In the online setting, we are given $P$ and then $T$ arrives one symbol at a time. We design an algorithm that, upon arrival of the $t$-th symbol of $T$, computes $k_t$ approximately within an $O(1)$ multiplicative factor and an $m^{8/9}$ additive error. Our algorithm takes $\tilde{O}(m^{1-7/54})$ amortized time per symbol arrival and uses $\tilde{O}(m^{1-1/54})$ additional space apart from storing the pattern $P$. Both of our algorithms are randomized and produce the correct answer with high probability.

To the best of our knowledge this is the first algorithm that takes worst-case sublinear (in the length of the pattern) time and sublinear extra space for the online approximate pattern matching problem. To get our result we build on the technique of Chakraborty, Das, Goldenberg, Koucký and Saks [FOCS'18] for computing a constant-factor approximation of edit distance in sub-quadratic time.
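For orientation, the exact values $k_t$ defined above can be obtained with the classical dynamic-programming recurrence for approximate string matching (Sellers [33]), which keeps one column of $m+1$ entries and updates it in $O(m)$ time per text symbol, for $O(nm)$ time overall. The sketch below is only that exact baseline, not the approximation algorithm of this paper; it shows the per-symbol linear cost that the bounds above improve on. The names `OnlineSellers`, `push`, and `all_k` are hypothetical and chosen for illustration.

```python
from typing import Iterable, List


class OnlineSellers:
    """Exact online baseline (not the paper's algorithm): reports k_t per symbol.

    col[i] holds the smallest edit distance between the pattern prefix P[:i]
    and any substring of T ending at the current position (Sellers' recurrence).
    Each update costs O(m) time and the state uses O(m) extra space.
    """

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        # Before any text arrives only the empty substring is available,
        # so matching P[:i] costs i deletions.
        self.col: List[int] = list(range(len(pattern) + 1))

    def push(self, symbol: str) -> int:
        """Consume the next text symbol T[t] and return k_t."""
        prev = self.col
        cur = [0] * len(prev)  # cur[0] = 0: an occurrence may start anywhere in T
        for i in range(1, len(prev)):
            cost = 0 if self.pattern[i - 1] == symbol else 1
            cur[i] = min(
                prev[i - 1] + cost,  # align P[i-1] with T[t] (match or substitution)
                prev[i] + 1,         # T[t] is left unmatched
                cur[i - 1] + 1,      # P[i-1] is left unmatched
            )
        self.col = cur
        return cur[-1]


def all_k(pattern: str, text: Iterable[str]) -> List[int]:
    """Offline use: feed T symbol by symbol and collect k_1, ..., k_n."""
    matcher = OnlineSellers(pattern)
    return [matcher.push(c) for c in text]
```

For example, `all_k("abc", "xabcx")` yields `[3, 2, 1, 0, 1]`: the value drops to 0 where `abc` occurs exactly and rises to 1 one symbol later, since `abcx` is one deletion away from the pattern.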

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Pattern matching
• Theory of computation → Streaming, sublinear and near linear time algorithms
##### Keywords
• Approximate Pattern Matching
• Online Pattern Matching
• Edit Distance
• Sublinear Algorithm
• Streaming Algorithm


## References

1. Amir Abboud and Arturs Backurs. Towards Hardness of Approximation for Polynomial Time Problems. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, pages 11:1-11:26, 2017.
2. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015.
3. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 375-388, 2016.
4. Amir Abboud and Aviad Rubinstein. Fast and Deterministic Constant Factor Approximation Algorithms for LCS Imply New Circuit Lower Bounds. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 35:1-35:14, 2018.
5. Karl Abrahamson. Generalized String Matching. SIAM J. Comput., 16(6):1039-1051, December 1987.
6. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity. In 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 377-386, 2010.
7. V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev. On economic construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR, 194:487-488, 1970. [in Russian]. English translation: Soviet Math. Dokl., 11(5):1209-1210, 1970.
8. Arturs Backurs and Piotr Indyk. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC '15, pages 51-58, New York, NY, USA, 2015. ACM.
9. Karl Bringmann and Marvin Künnemann. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 79-97, 2015.
10. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 979-990, 2018.
11. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. CoRR, abs/1810.03664, 2018. URL: http://arxiv.org/abs/1810.03664.
12. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A Black Box for Online Approximate Pattern Matching. In Combinatorial Pattern Matching, 19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008, Proceedings, pages 143-151, 2008.
13. Raphaël Clifford, Markus Jalsenius, and Benjamin Sach. Cell-probe bounds for online edit distance and other pattern matching problems. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 552-561, 2015.
14. Raphaël Clifford and Benjamin Sach. Online Approximate Matching with Non-local Distances. In Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings, pages 142-153, 2009.
15. Raphaël Clifford and Benjamin Sach. Pseudo-realtime Pattern Matching: Closing the Gap. In Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, pages 101-111, 2010.
16. Richard Cole and Ramesh Hariharan. Approximate String Matching: A Simpler Faster Algorithm. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 25-27 January 1998, San Francisco, California, USA, pages 463-472, 1998.
17. Maxime Crochemore. String-Matching on Ordered Alphabets. Theor. Comput. Sci., 92(1):33-47, 1992.
18. Maxime Crochemore, Leszek Gasieniec, Wojciech Plandowski, and Wojciech Rytter. Two-Dimensional Pattern Matching in Linear Time and Small Space. In STACS, pages 181-192, 1995.
19. Zvi Galil and Raffaele Giancarlo. Data structures and algorithms for approximate string matching. J. Complexity, 4(1):33-72, 1988.
20. Zvi Galil and Kunsoo Park. An Improved Algorithm for Approximate String Matching. SIAM Journal on Computing, 19(6):989-999, 1990.
21. Zvi Galil and Joel Seiferas. Time-space-optimal String Matching (Preliminary Report). In Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, STOC '81, pages 106-113, New York, NY, USA, 1981. ACM.
22. Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 397-407, 2017.
23. Leszek Gasieniec, Wojciech Plandowski, and Wojciech Rytter. The Zooming Method: A Recursive Approach to Time-Space Efficient String-Matching. Theor. Comput. Sci., 147(1&2):19-30, 1995.
24. Piotr Indyk. Faster Algorithms for String Matching Problems: Matching the Convolution Bound. In 39th Annual Symposium on Foundations of Computer Science, FOCS '98, November 8-11, 1998, Palo Alto, California, USA, pages 166-173, 1998.
25. Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977.
26. Tsvi Kopelowitz and Ely Porat. A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance. In 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, pages 10:1-10:5, 2018.
27. Gad M. Landau and Uzi Vishkin. Fast Parallel and Serial Approximate String Matching. Journal of Algorithms, 10(2):157-169, 1989.
28. V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966.
29. William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980.
30. G. Myers. Incremental alignment algorithms and their applications. Technical Report, 1986.
31. Gonzalo Navarro. A Guided Tour to Approximate String Matching. ACM Comput. Surv., 33(1):31-88, March 2001.
32. Mihai Patrascu. Succincter. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 305-313, 2008.
33. Peter H. Sellers. The Theory and Computation of Evolutionary Distances: Pattern Recognition. Journal of Algorithms, 1:359-373, 1980.
34. Tatiana A. Starikovskaya. Communication and Streaming Complexity of Approximate Pattern Matching. In 28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, July 4-6, 2017, Warsaw, Poland, pages 13:1-13:11, 2017.
35. Esko Ukkonen. Algorithms for Approximate String Matching. Inf. Control, 64(1-3):100-118, March 1985.
36. Esko Ukkonen and Derick Wood. Approximate String Matching with Suffix Automata. Algorithmica, 10(5):353-364, 1993.