Real-Time Streaming Multi-Pattern Search for Constant Alphabet

Authors Shay Golan, Ely Porat



PDF
Thumbnail PDF

File

LIPIcs.ESA.2017.41.pdf
  • Filesize: 0.54 MB
  • 15 pages

Document Identifiers

Author Details

Shay Golan
Ely Porat

Cite As Get BibTex

Shay Golan and Ely Porat. Real-Time Streaming Multi-Pattern Search for Constant Alphabet. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 41:1-41:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ESA.2017.41

Abstract

In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2, . . . ,P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one  character at a time and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities:

* O(log log |Sigma|) time per character in the worst case  and O(d log m) words of space.

* O(1/epsilon) time per character in the worst case  and O(d |\Sigma|^epsilon log m/epsilon) words of space for any 0<epsilon<= 1.

These results improve upon the algorithm of [Clifford et al., ESA'15] which uses O(d log m) words of space and takes O(log log (m+d)) time per character.

Subject Classification

Keywords
  • multi-pattern
  • dictionary
  • streaming pattern matching
  • fingerprints

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975. URL: http://dx.doi.org/10.1145/360825.360855.
  2. Mansoor Alicherry, Muthusrinivasan Muthuprasanna, and Vijay Kumar. High speed pattern matching for network IDS/IPS. In Proceedings of the 14th IEEE International Conference on Network Protocols, ICNP, pages 187-196, 2006. URL: http://dx.doi.org/10.1109/ICNP.2006.320212.
  3. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: http://dx.doi.org/10.1006/jcss.1997.1545.
  4. Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the gap: Essentially optimal algorithms for online dictionary matching with one gap. In 27th International Symposium on Algorithms and Computation, ISAAC 2016, pages 12:1-12:12, 2016. URL: http://dx.doi.org/10.4230/LIPIcs.ISAAC.2016.12.
  5. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary matching with one gap. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM, pages 11-20, 2014. Google Scholar
  6. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary matching with a few gaps. Theor. Comput. Sci., 589:34-46, 2015. Google Scholar
  7. Tanver Athar, carl Barton, Widmer bland, Jia Gao, Costas S. Illopoulos, Chang Liu, and Solon P. Pissis. Fast circular dictionary-matching algorithm. Mathematical Structures in Computer Science, pages 1-14, 2015. Google Scholar
  8. Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, pages 785-794, 2009. URL: http://dl.acm.org/citation.cfm?id=1496770.1496856.
  9. Anat Bremler-Barr, David Hay, and Yaron Koral. Compactdfa: Scalable pattern matching using longest prefix match solutions. IEEE/ACM Trans. Netw., 22(2):415-428, 2014. URL: http://dx.doi.org/10.1109/TNET.2013.2253119.
  10. Dany Breslauer and Zvi Galil. Real-time streaming string-matching. ACM Transactions on Algorithms, 10(4):22:1-22:12, 2014. URL: http://dx.doi.org/10.1145/2635814.
  11. Dany Breslauer, Roberto Grossi, and Filippo Mignosi. Simple real-time constant-space string matching. Theor. Comput. Sci., 483:2-9, 2013. URL: http://dx.doi.org/10.1016/j.tcs.2012.11.040.
  12. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. Dictionary matching in a stream. In Proceedings of Annual European Symposium on Algorithms, ESA, pages 361-372, 2015. Google Scholar
  13. Raphaël Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, and Tatiana A. Starikovskaya. The k-mismatch problem revisited. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pages 2039-2052, 2016. URL: http://dx.doi.org/10.1137/1.9781611974331.ch142.
  14. Funda Ergün, Hossein Jowhari, and Mert Saglam. Periodicity in streams. In Proceedings of Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 13th International Workshop, APPROX 2010, and 14th International Workshop, RANDOM 2010, pages 545-559, 2010. URL: http://dx.doi.org/10.1007/978-3-642-15369-3_41.
  15. Yu Fang, Randy H. Katz, and T. V. Lakshman. Gigabit rate packet pattern-matching using TCAM. In Proceedings of 12th IEEE International Conference on Network Protocols (ICNP 2004), pages 174-183, 2004. URL: http://dx.doi.org/10.1109/ICNP.2004.1348108.
  16. Guy Feigenblat, Ely Porat, and Ariel Shiftan. An improved query time for succinct dynamic dictionary matching. In Combinatorial Pattern Matching - 25th Annual Symposium, CPM, pages 120-129, 2014. Google Scholar
  17. Guy Feigenblat, Ely Porat, and Ariel Shiftan. Linear time succinct indexable dictionary construction with applications. In Data Compression Conference , DCC, pages 13-23, 2016. Google Scholar
  18. Johannes Fischer, Travis Gagie, Pawel Gawrychowski, and Tomasz Kociumaka. Approximating LZ77 via small-space multiple-pattern matching. In Proceedings of Algorithms - ESA 2015 - 23rd Annual European Symposium, pages 533-544, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_45.
  19. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with 0(1) worst case access time. J. ACM, 31(3):538-544, 1984. URL: http://dx.doi.org/10.1145/828.1884.
  20. Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. Space-efficient dictionaries for parameterized and order-preserving pattern matching. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM, pages 2:1-2:12, 2016. Google Scholar
  21. Arnab Ganguly, Wing-Kai Hon, and Rahul Shah. A framework for dynamic parameterized dictionary matching. In 15th Scandinavian Symposium and Workshops on Algorithm Theory, SWAT, pages 10:1-10:14, 2016. Google Scholar
  22. Shay Golan, Tsvi Kopelowitz, and Ely Porat. Streaming Pattern Matching with d Wildcards. In 24th Annual European Symposium on Algorithms (ESA), pages 44:1-44:16, 2016. Google Scholar
  23. Torben Hagerup. Sorting and searching on the word RAM. In STACS 98, 15th Annual Symposium on Theoretical Aspects of Computer Science, Paris, France, February 25-27, 1998, Proceedings, pages 366-398, 1998. URL: http://dx.doi.org/10.1007/BFb0028575.
  24. Torben Hagerup, Peter Bro Miltersen, and Rasmus Pagh. Deterministic dictionaries. J. Algorithms, 41(1):69-85, 2001. URL: http://dx.doi.org/10.1006/jagm.2001.1171.
  25. Monika Rauch Henzinger, Prabhakar Raghavan, and Sridar Rajagopalan. External Memory Algorithms, chapter Computing on data streams, pages 107-118. American Mathematical Society, 1999. Google Scholar
  26. Cheng-Liang Hsieh, Lucas Vespa, and Ning Weng. A high-throughput DPI engine on GPU via algorithm/implementation co-optimization. J. Parallel Distrib. Comput., 88:46-56, 2016. URL: http://dx.doi.org/10.1016/j.jpdc.2015.11.001.
  27. Markus Jalsenius, Benny Porat, and Benjamin Sach. Parameterized matching in the streaming model. In proceedings of 30th International Symposium on Theoretical Aspects of Computer Science, STACS 2013, pages 400-411, 2013. URL: http://dx.doi.org/10.4230/LIPIcs.STACS.2013.400.
  28. Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, pages 745-754, 2011. URL: http://dx.doi.org/10.1145/1993636.1993735.
  29. Junghak Kim and Song-in Choi. High speed pattern matching for deep packet inspection. In Communications and Information Technology, 2009. ISCIT 2009. 9th International Symposium on, pages 1310-1315. IEEE, 2009. Google Scholar
  30. Tsvi Kopelowitz, Ely Porat, and Yaron Rozen. Succinct Online Dictionary Matching with Improved Worst-Case Guarantees. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), volume 54 of Leibniz International Proceedings in Informatics (LIPIcs), pages 6:1-6:13, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2016.6.
  31. Lap-Kei Lee, Moshe Lewenstein, and Qin Zhang. Parikh matching in the streaming model. In Proceedings of String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, pages 336-341, 2012. URL: http://dx.doi.org/10.1007/978-3-642-34109-0_35.
  32. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. URL: http://dx.doi.org/10.1561/0400000002.
  33. Benny Porat and Ely Porat. Exact and approximate pattern matching in the streaming model. In Proceedings of 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, October 25-27, 2009, pages 315-323, 2009. URL: http://dx.doi.org/10.1109/FOCS.2009.11.
  34. Yaron Rozen. On the online dictionary matching problem. Master’s thesis, Bar-Ilan University, Ramat Gan, Israel, September 2016. Under the supervision of Prof. Ely Porat. Google Scholar
  35. Milan Ruzic. Constructing efficient dictionaries in close to sorting time. In Proceedings of Automata, Languages and Programming, 35th International Colloquium, ICALP 2008, pages 84-95, 2008. URL: http://dx.doi.org/10.1007/978-3-540-70575-8_8.
  36. Nathan Tuck, Timothy Sherwood, Brad Calder, and George Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. In Proceedings IEEE INFOCOM 2004, The 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, 2004. URL: http://www.ieee-infocom.org/2004/Papers/54_5.PDF.
  37. Antonino Tumeo, Oreste Villa, and Daniel G. Chavarría-Miranda. Aho-corasick string matching on shared and distributed-memory parallel architectures. IEEE Trans. Parallel Distrib. Syst., 23(3):436-443, 2012. URL: http://dx.doi.org/10.1109/TPDS.2011.181.
  38. Antonino Tumeo, Oreste Villa, and Donatella Sciuto. Efficient pattern matching on gpus for intrusion detection systems. In Proceedings of the 7th Conference on Computing Frontiers, 2010, pages 87-88, 2010. URL: http://dx.doi.org/10.1145/1787275.1787296.
  39. Lucas Vespa and Ning Weng. GPEP: graphics processing enhanced pattern-matching for high-performance deep packet inspection. In 2011 IEEE International Conference on Internet of Things (iThings) & 4th IEEE International Conference on Cyber, Physical and Social Computing (CPSCom), pages 74-81, 2011. URL: http://dx.doi.org/10.1109/iThings/CPSCom.2011.36.
  40. Lucas Vespa and Ning Weng. Swm: Simplified wu-manber for gpu-based deep packet inspection. In Proceedings of the International Conference on Security and Management (SAM), page 1. The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2012. Google Scholar
  41. Yaron Weinsberg, Shimrit Tzur-David, Danny Dolev, and Tal Anker. High performance string matching algorithm for a network intrusion prevention system (nips). In High Performance Switching and Routing, 2006 Workshop on, pages 7-pp. IEEE, 2006. Google Scholar
  42. SangKyun Yun. An efficient tcam-based implementation of multipattern matching using covered state encoding. IEEE Transactions on Computers, 61(2):213-221, 2012. Google Scholar
  43. Xinyan Zha and Sartaj Sahni. Gpu-to-gpu and host-to-host multipattern string matching on a GPU. IEEE Trans. Computers, 62(6):1156-1169, 2013. URL: http://dx.doi.org/10.1109/TC.2012.61.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail