Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

Authors Shay Golan, Tsvi Kopelowitz, Ely Porat



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2018.65.pdf
  • Filesize: 0.66 MB
  • 16 pages

Document Identifiers

Author Details

Shay Golan
  • Bar Ilan University, Ramat Gan, Israel
Tsvi Kopelowitz
  • Bar Ilan University, Ramat Gan, Israel
Ely Porat
  • Bar Ilan University, Ramat Gan, Israel

Cite As Get BibTex

Shay Golan, Tsvi Kopelowitz, and Ely Porat. Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 65:1-65:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ICALP.2018.65

Abstract

Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately.
In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear.
We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Streaming approximate pattern matching
  • Dictionary matching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975. URL: http://dx.doi.org/10.1145/360825.360855.
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: http://dx.doi.org/10.1006/jcss.1997.1545.
  3. Amihood Amir, Yonatan Aumann, Gad M. Landau, Moshe Lewenstein, and Noa Lewenstein. Pattern matching with swaps. J. Algorithms, 37(2):247-266, 2000. URL: http://dx.doi.org/10.1006/jagm.2000.1120.
  4. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, and Ely Porat. Function matching. SIAM J. Comput., 35(5):1007-1022, 2006. URL: http://dx.doi.org/10.1137/S0097539702424496.
  5. Amihood Amir, Estrella Eisenberg, and Ely Porat. Swap and mismatch edit distance. Algorithmica, 45(1):109-120, 2006. URL: http://dx.doi.org/10.1007/s00453-005-1192-8.
  6. Amihood Amir, Martin Farach, and S. Muthukrishnan. Alphabet dependence in parameterized matching. Inf. Process. Lett., 49(3):111-115, 1994. URL: http://dx.doi.org/10.1016/0020-0190(94)90086-8.
  7. Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the gap: Essentially optimal algorithms for online dictionary matching with one gap. In Proceedings of the 27th International Symposium on Algorithms and Computation, ISAAC, pages 12:1-12:12, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.ISAAC.2016.12.
  8. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary matching with one gap. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM, pages 11-20, 2014. Google Scholar
  9. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary matching with a few gaps. Theor. Comput. Sci., 589:34-46, 2015. Google Scholar
  10. Amihood Amir, Moshe Lewenstein, and Ely Porat. Approximate swapped matching. Inf. Process. Lett., 83(1):33-39, 2002. URL: http://dx.doi.org/10.1016/S0020-0190(01)00302-7.
  11. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004. URL: http://dx.doi.org/10.1016/S0196-6774(03)00097-X.
  12. Amihood Amir and Gonzalo Navarro. Parameterized matching on non-linear structures. Inf. Process. Lett., 109(15):864-867, 2009. URL: http://dx.doi.org/10.1016/j.ipl.2009.04.012.
  13. Amihood Amir and Igor Nor. Generalized function matching. J. Discrete Algorithms, 5(3):514-523, 2007. URL: http://dx.doi.org/10.1016/j.jda.2006.10.001.
  14. Tanver Athar, carl Barton, Widmer bland, Jia Gao, Costas S. Illopoulos, Chang Liu, and Solon P. Pissis. Fast circular dictionary-matching algorithm. Mathematical Structures in Computer Science, pages 1-14, 2015. Google Scholar
  15. Brenda S. Baker. Parameterized pattern matching by boyer-moore-type algorithms. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 1995., pages 541-550, 1995. URL: http://dl.acm.org/citation.cfm?id=313651.313816.
  16. Brenda S. Baker. Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci., 52(1):28-42, 1996. URL: http://dx.doi.org/10.1006/jcss.1996.0003.
  17. Brenda S. Baker. Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Comput., 26(5):1343-1362, 1997. URL: http://dx.doi.org/10.1137/S0097539793246707.
  18. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Transactions on Algorithms, 10(4):22:1-22:12, 2014. URL: http://dx.doi.org/10.1145/2635814.
  19. Dany Breslauer, Roberto Grossi, and Filippo Mignosi. Simple real-time constant-space string matching. Theor. Comput. Sci., 483:2-9, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.11.040.
  20. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007. URL: http://dx.doi.org/10.1016/j.ipl.2006.08.002.
  21. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. From coding theory to efficient pattern matching. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 778-784, 2009. URL: http://dl.acm.org/citation.cfm?id=1496770.1496855.
  22. Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. Pattern matching with don't cares and few errors. J. Comput. Syst. Sci., 76(2):115-124, 2010. URL: http://dx.doi.org/10.1016/j.jcss.2009.06.002.
  23. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. Dictionary matching in a stream. In 23rd Annual European Symposium of Algorithms, ESA, pages 361-372, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_31.
  24. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 2039-2052, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch142.
  25. Raphaël Clifford, Markus Jalsenius, Ely Porat, and Benjamin Sach. Pattern matching in multiple streams. In Proceedings of Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, pages 97-109, 2012. URL: http://dx.doi.org/10.1007/978-3-642-31265-6_8.
  26. Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k-mismatch problem. CoRR, abs/1708.05223, 2017. URL: http://arxiv.org/abs/1708.05223.
  27. Raphaël Clifford and Ely Porat. A filtering algorithm for k -mismatch with don't cares. In String Processing and Information Retrieval, 14th International Symposium, SPIRE 2007, Santiago, Chile, October 29-31, 2007, Proceedings, pages 130-136, 2007. URL: http://dx.doi.org/10.1007/978-3-540-75530-2_12.
  28. Raphaël Clifford and Tatiana Starikovskaya. Approximate Hamming Distance in a Stream. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), pages 20:1-20:14, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2016.20.
  29. Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 592-601, 2002. URL: http://dx.doi.org/10.1145/509907.509992.
  30. Funda Ergün, Hossein Jowhari, and Mert Saglam. Periodicity in streams. In Proceedings of Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 13th International Workshop, APPROX 2010, and 14th International Workshop, RANDOM, pages 545-559, 2010. Google Scholar
  31. Guy Feigenblat, Ely Porat, and Ariel Shiftan. An improved query time for succinct dynamic dictionary matching. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM, pages 120-129, 2014. Google Scholar
  32. Guy Feigenblat, Ely Porat, and Ariel Shiftan. Linear time succinct indexable dictionary construction with applications. In Data Compression Conference , DCC, pages 13-23, 2016. Google Scholar
  33. Michael J Fischer and Michael S Paterson. String-matching and other products. Technical report, DTIC Document, 1974. Google Scholar
  34. Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. Space-efficient dictionaries for parameterized and order-preserving pattern matching. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM, pages 2:1-2:12, 2016. Google Scholar
  35. Arnab Ganguly, Wing-Kai Hon, and Rahul Shah. A framework for dynamic parameterized dictionary matching. In 15th Scandinavian Symposium and Workshops on Algorithm Theory, SWAT, pages 10:1-10:14, 2016. Google Scholar
  36. Anna C. Gilbert, Hung Q. Ngo, Ely Porat, Atri Rudra, and Martin J. Strauss. l2/l2-foreach sparse recovery with low risk. In Proceedings of Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, pages 461-472, 2013. URL: http://dx.doi.org/10.1007/978-3-642-39206-1_39.
  37. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming Pattern Matching with d Wildcards. In 24th Annual European Symposium on Algorithms (ESA), pages 44:1-44:16, 2016. Google Scholar
  38. Shay Golan and Ely Porat. Real-time streaming multi-pattern search for constant alphabet. In 25th Annual European Symposium on Algorithms, ESA 2017, pages 41:1-41:15, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.ESA.2017.41.
  39. Carmit Hazay, Moshe Lewenstein, and Dina Sokol. Approximate parameterized matching. ACM Trans. Algorithms, 3(3):29, 2007. URL: http://dx.doi.org/10.1145/1273340.1273345.
  40. Wei Huang, Yaoyun Shi, Shengyu Zhang, and Yufan Zhu. The communication complexity of the hamming distance problem. Inf. Process. Lett., 99(4):149-153, 2006. URL: http://dx.doi.org/10.1016/j.ipl.2006.01.014.
  41. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In 39th Annual Symposium on Foundations of Computer Science, FOCS '98, pages 166-173, 1998. URL: http://dx.doi.org/10.1109/SFCS.1998.743440.
  42. Markus Jalsenius, Benny Porat, and Benjamin Sach. Parameterized matching in the streaming model. In Proceedings Symposium on Theoretical Aspects of Computer Science, STACS, pages 400-411, 2013. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2013.400.
  43. Adam Kalai. Efficient pattern-matching with don't cares. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2002, pages 655-656, 2002. URL: http://dl.acm.org/citation.cfm?id=545381.545468.
  44. Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987. URL: http://dx.doi.org/10.1147/rd.312.0249.
  45. Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct online dictionary matching with improved worst-case guarantees. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM, pages 6:1-6:13, 2016. Google Scholar
  46. Gad M. Landau and Uzi Vishkin. Efficient string matching with k mismatches. Theor. Comput. Sci., 43:239-249, 1986. URL: http://dx.doi.org/10.1016/0304-3975(86)90178-7.
  47. Lap-Kei Lee, Moshe Lewenstein, and Qin Zhang. Parikh matching in the streaming model. In String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings, pages 336-341, 2012. URL: http://dx.doi.org/10.1007/978-3-642-34109-0_35.
  48. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. URL: http://dx.doi.org/10.1561/0400000002.
  49. S. Muthukrishnan and H. Ramesh. String matching under a general matching relation. In Foundations of Software Technology and Theoretical Computer Science, 12th Conference, New Delhi, India, December 18-20, 1992, Proceedings, pages 356-367, 1992. URL: http://dx.doi.org/10.1007/3-540-56287-7_118.
  50. Hung Q. Ngo, Ely Porat, and Atri Rudra. Efficiently decodable error-correcting list disjunct matrices and applications - (extended abstract). In Proceedings of Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, pages 557-568, 2011. URL: http://dx.doi.org/10.1007/978-3-642-22006-7_47.
  51. Hung Q. Ngo, Ely Porat, and Atri Rudra. Efficiently decodable compressed sensing by list-recoverable codes and recursion. In 29th International Symposium on Theoretical Aspects of Computer Science, STACS 2012, pages 230-241, 2012. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2012.230.
  52. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 315-323, 2009. URL: http://dx.doi.org/10.1109/FOCS.2009.11.
  53. Ely Porat and Ohad Lipsky. Improved sketching of hamming distance with error correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 173-182, 2007. URL: http://dx.doi.org/10.1007/978-3-540-73437-6_19.
  54. Ely Porat and Amir Rothschild. Explicit nonadaptive combinatorial group testing schemes. IEEE Trans. Information Theory, 57(12):7982-7989, 2011. URL: http://dx.doi.org/10.1109/TIT.2011.2163296.
  55. Jakub Radoszewski and Tatiana A. Starikovskaya. Streaming k-mismatch with error correcting and applications. In 2017 Data Compression Conference, DCC, pages 290-299, 2017. URL: http://dx.doi.org/10.1109/DCC.2017.14.
  56. Tatiana Starikovskaya. Communication and Streaming Complexity of Approximate Pattern Matching. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), pages 13:1-13:11, 2017. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2017.13.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail