Streaming Pattern Matching with d Wildcards

Authors Shay Golan, Tsvi Kopelowitz, Ely Porat



PDF
Thumbnail PDF

File

LIPIcs.ESA.2016.44.pdf
  • Filesize: 1.11 MB
  • 16 pages

Document Identifiers

Author Details

Shay Golan
Tsvi Kopelowitz
Ely Porat

Cite As Get BibTex

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming Pattern Matching with d Wildcards. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 44:1-44:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ESA.2016.44

Abstract

In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem the text T arrives one character at a time and the goal is to report, before the next character arrives, if the last m characters match P while using only o(m) words of space.

In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model.
The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0<=delta<=1. This algorithm uses ~O(d^{1-delta}) amortized time per character and ~O(d^{1+delta}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses O(d+log m) worst-case time per character and O(d log m) words of space.

Subject Classification

Keywords
  • wildcards
  • don't-cares
  • streaming pattern matching
  • fingerprints

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: http://dx.doi.org/10.1006/jcss.1997.1545.
  2. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. URL: http://dx.doi.org/10.1016/S0196-6774(03)00097-X.
  3. Jean Berstel and Luc Boasson. Partial words and a theorem of fine and wilf. Theor. Comput. Sci., 218(1):135-141, 1999. URL: http://dx.doi.org/10.1016/S0304-3975(98)00255-2.
  4. Francine Blanchet-Sadri. Algorithmic Combinatorics on Partial Words. Discrete mathematics and its applications. CRC Press, 2008. URL: http://www.crcpress.com/product/isbn/9781420060928.
  5. Francine Blanchet-Sadri and Robert A. Hegstrom. Partial words and a theorem of fine and wilf revisited. Theor. Comput. Sci., 270(1-2):401-419, 2002. URL: http://dx.doi.org/10.1016/S0304-3975(00)00407-2.
  6. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Transactions on Algorithms, 10(4):22:1-22:12, 2014. URL: http://dx.doi.org/10.1145/2635814.
  7. Dany Breslauer, Roberto Grossi, and Filippo Mignosi. Simple real-time constant-space string matching. Theor. Comput. Sci., 483:2-9, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.11.040.
  8. Sabin Cautis, Filippo Mignosi, Jeffrey Shallit, Ming-wei Wang, and Soroosh Yazdani. Periodicity, morphisms, and matrices. Theor. Comput. Sci., 295:107-121, 2003. URL: http://dx.doi.org/10.1016/S0304-3975(02)00398-5.
  9. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007. URL: http://dx.doi.org/10.1016/j.ipl.2006.08.002.
  10. Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat. A black box for online approximate pattern matching. Inf. Comput., 209(4):731-736, 2011. URL: http://dx.doi.org/10.1016/j.ic.2010.12.007.
  11. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. From coding theory to efficient pattern matching. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 778-784, 2009. URL: http://dl.acm.org/citation.cfm?id=1496770.1496855.
  12. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. Pattern matching with don't cares and few errors. J. Comput. Syst. Sci., 76(2):115-124, 2010. URL: http://dx.doi.org/10.1016/j.jcss.2009.06.002.
  13. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. Dictionary matching in a stream. In Nikhil Bansal and Irene Finocchi, editors, Proc. 23rd Annual European Symposium on Algorithms (ESA'15), volume 9294 of LNCS, pages 361-372. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_31.
  14. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 2039-2052, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch142.
  15. Raphaël Clifford, Markus Jalsenius, Ely Porat, and Benjamin Sach. Space lower bounds for online pattern matching. Theor. Comput. Sci., 483:68-74, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.06.012.
  16. Raphaël Clifford and Ely Porat. A filtering algorithm for k-mismatch with don't cares. In String Processing and Information Retrieval, 14th International Symposium, SPIRE 2007, Santiago, Chile, October 29-31, 2007, Proceedings, pages 130-136, 2007. URL: http://dx.doi.org/10.1007/978-3-540-75530-2_12.
  17. Raphaël Clifford and Benjamin Sach. Pseudo-realtime pattern matching: Closing the gap. In Amihood Amir and Laxmi Parida, editors, Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings, volume 6129 of Lecture Notes in Computer Science, pages 101-111. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13509-5_10.
  18. Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 592-601. ACM, 2002. URL: http://dx.doi.org/10.1145/509907.509992.
  19. Funda Ergün, Hossein Jowhari, and Mert Saglam. Periodicity in streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 13th International Workshop, APPROX 2010, and 14th International Workshop, RANDOM 2010, Barcelona, Spain, September 1-3, 2010. Proceedings, pages 545-559, 2010. URL: http://dx.doi.org/10.1007/978-3-642-15369-3_41.
  20. Nathan J Fine and Herbert S Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965. Google Scholar
  21. Michael J Fischer and Michael S Paterson. String-matching and other products. Technical report, DTIC Document, 1974. Google Scholar
  22. Zvi Galil and Joel I. Seiferas. Time-space-optimal string matching. J. Comput. Syst. Sci., 26(3):280-294, 1983. URL: http://dx.doi.org/10.1016/0022-0000(83)90002-8.
  23. Pawel Gawrychowski. Optimal pattern matching in LZW compressed strings. ACM Transactions on Algorithms, 9(3):25, 2013. URL: http://dx.doi.org/10.1145/2483699.2483705.
  24. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming pattern matching with d wildcards. CoRR, abs/1605.16729, 2015. URL: http://arxiv.org/abs/1605.16729.
  25. Monika Rauch Henzinger, Prabhakar Raghavan, and Sridar Rajagopalan. External Memory Algorithms, chapter Computing on data streams, pages 107-118. American Mathematical Society, Boston, USA, 1999. Google Scholar
  26. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In 39th Annual Symposium on Foundations of Computer Science, FOCS'98, November 8-11, 1998, Palo Alto, California, USA, pages 166-173. IEEE Computer Society, 1998. URL: http://dx.doi.org/10.1109/SFCS.1998.743440.
  27. Markus Jalsenius, Benny Porat, and Benjamin Sach. Parameterized matching in the streaming model. In 30th International Symposium on Theoretical Aspects of Computer Science, STACS 2013, February 27 - March 2, 2013, Kiel, Germany, pages 400-411, 2013. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2013.400.
  28. Adam Kalai. Efficient pattern-matching with don't cares. In David Eppstein, editor, Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA., pages 655-656. ACM/SIAM, 2002. URL: http://dl.acm.org/citation.cfm?id=545381.545468.
  29. Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 745-754, 2011. URL: http://dx.doi.org/10.1145/1993636.1993735.
  30. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: http://dx.doi.org/10.1147/rd.312.0249.
  31. Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977. URL: http://dx.doi.org/10.1137/0206024.
  32. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. URL: http://dx.doi.org/10.1016/0304-3975(86)90178-7.
  33. Lap-Kei Lee, Moshe Lewenstein, and Qin Zhang. Parikh matching in the streaming model. In String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Cartagena de Indias, Colombia, October 21-25, 2012. Proceedings, pages 336-341, 2012. URL: http://dx.doi.org/10.1007/978-3-642-34109-0_35.
  34. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. URL: http://dx.doi.org/10.1561/0400000002.
  35. S. Muthukrishnan and H. Ramesh. String matching under a general matching relation. In R. K. Shyamasundar, editor, Foundations of Software Technology and Theoretical Computer Science, 12th Conference, New Delhi, India, December 18-20, 1992, Proceedings, volume 652 of Lecture Notes in Computer Science, pages 356-367. Springer, 1992. URL: http://dx.doi.org/10.1007/3-540-56287-7_118.
  36. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symp. on Foundations of Computer Science, FOCS 2009, Oct. 25-27, 2009, Atlanta, Georgia, USA, pages 315-323, 2009. URL: http://dx.doi.org/10.1109/FOCS.2009.11.
  37. Ely Porat and Ohad Lipsky. Improved sketching of hamming distance with error correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 173-182, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73437-6_19.
  38. William F. Smyth and Shu Wang. A new approach to the periodicity lemma on strings with holes. Theor. Comput. Sci., 410(43):4295-4302, 2009. URL: http://dx.doi.org/10.1016/j.tcs.2009.07.010.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail