Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap

Authors Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, B. Riva Shalom



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2016.12.pdf
  • Filesize: 0.49 MB
  • 12 pages

Document Identifiers

Author Details

Amihood Amir
Tsvi Kopelowitz
Avivit Levy
Seth Pettie
Ely Porat
B. Riva Shalom

Cite As Get BibTex

Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the Gap: Essentially Optimal Algorithms for Online Dictionary Matching with One Gap. In 27th International Symposium on Algorithms and Computation (ISAAC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 64, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ISAAC.2016.12

Abstract

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary D of d patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from D that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Online DMOG captures the difficulty in a bottleneck procedure for cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap.

In this paper, we demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of Omega(delta(G_D)+op) time per text character, where G_D is a bipartite graph that captures the structure of D, delta(G_D) is the degeneracy of this graph, and op is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal.

We also provide matching upper-bounds (up to sub-polynomial factors), in terms of the degeneracy, for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on delta(G_D). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of practical interest since although delta(G_D) can be as large as sqrt(d), and even larger if G_D is a multi-graph, it is typically a very small constant in practice. Finally, when delta(G_D) is large we are able to obtain even more efficient solutions.

Subject Classification

Keywords
  • Pattern matching
  • Dictionary matching
  • 3SUM
  • Triangle reporting

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 434-443, 2014. Google Scholar
  2. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975. URL: http://dx.doi.org/10.1145/360825.360855.
  3. N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844-856, 1995. Google Scholar
  4. Amihood Amir, Martin Farach, Ramana M. Idury, Johannes A. La Poutré, and Alejandro A. Schäffer. Improved dynamic dictionary matching. Inf. Comput., 119(2):258-282, 1995. Google Scholar
  5. Amihood Amir, Dmitry Keselman, Gad M. Landau, Moshe Lewenstein, Noa Lewenstein, and Michael Rodeh. Text indexing and dictionary matching with one error. J. Algorithms, 37(2):309-325, 2000. URL: http://dx.doi.org/10.1006/jagm.2000.1104.
  6. Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the gap. CoRR, abs/1503.07563, 2015. Google Scholar
  7. Amihood Amir, Avivit Levy, Ely Porat, and B. Riva Shalom. Dictionary matching with one gap. In CPM, pages 11-20, 2014. Google Scholar
  8. Nikhil Bansal and Ryan Williams. Regularity lemmas and combinatorial algorithms. Theory of Computing, 8(1):69-94, 2012. Google Scholar
  9. Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, and David Kofoed Wind. String matching with variable length gaps. Theor. Comput. Sci., 443:25-34, 2012. Google Scholar
  10. Philip Bille and Mikkel Thorup. Regular expression matching with multi-strings and intervals. In Proc. of SODA, pages 1297-1308, 2010. Google Scholar
  11. A. Bjørklund, R. Pagh, V. Vassilevska Williams, and U. Zwick. Listing triangles. In Proceedings 41st Int'l Colloquium on Automata, Languages, and Programming (ICALP (I)), pages 223-234, 2014. Google Scholar
  12. Gerth Stølting Brodal and Leszek Gasieniec. Approximate dictionary queries. In Proc. of CPM, pages 65-74, 1996. Google Scholar
  13. Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14(1):210-223, 1985. URL: http://dx.doi.org/10.1137/0214017.
  14. Hagai Cohen and Ely Porat. Fast set intersection and two-patterns matching. Theor. Comput. Sci., 411(40-42):3795-3800, 2010. URL: http://dx.doi.org/10.1016/j.tcs.2010.06.002.
  15. Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proc. of STOC, pages 91-100, 2004. Google Scholar
  16. Kimmo Fredriksson and Szymon Grabowski. Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr., 11(4):335-357, 2008. Google Scholar
  17. A. Grønlund and S. Pettie. Threesomes, degenerates, and love triangles. CoRR, abs/1404.0799, 2014. Google Scholar
  18. Tuukka Haapasalo, Panu Silvasti, Seppo Sippu, and Eljas Soisalon-Soininen. Online dictionary matching with variable-length gaps. In Proc. of SEA, pages 76-87, 2011. Google Scholar
  19. Kay Hofmann, Philipp Bucher, Laurent Falquet, and Amos Bairoch. The PROSITE database, its status in 1999. Nucleic Acids Research, 27(1):215-219, 1999. Google Scholar
  20. Wing-Kai Hon, Tak-Wah Lam, Rahul Shah, Sharma V. Thankachan, Hing-Fung Ting, and Yilin Yang. Dictionary matching with uneven gaps. In CPM, pages 247-260, 2015. Google Scholar
  21. A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM J. Comput., 7(4):413-423, 1978. Google Scholar
  22. Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Dynamic set intersection. In WADS, 2015. Google Scholar
  23. Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3sum conjecture. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2016. Google Scholar
  24. Gregory Kucherov and Michaël Rusinowitch. Matching a set of strings with variable length don't cares. Theor. Comput. Sci., 178(1-2):129-154, 1997. Google Scholar
  25. Michele Morgante, Alberto Policriti, Nicola Vitacolonna, and Andrea Zuccolo. Structured motifs search. Journal of Computational Biology, 12(8):1065-1082, 2005. Google Scholar
  26. Christian Worm Mortensen. Fully dynamic orthogonal range reporting on RAM. SIAM J. Comput., 35(6):1494-1525, 2006. Google Scholar
  27. Eugene W. Myers. A four russians algorithm for regular expression pattern matching. J. ACM, 39(2):430-448, 1992. Google Scholar
  28. G. Myers and G. Mehldau. A system for pattern matching applications on biosequences. CABIOS, 9(3):299-314, 1993. Google Scholar
  29. Gonzalo Navarro and Mathieu Raffinot. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 10(6):903-923, 2003. URL: http://dx.doi.org/10.1089/106652703322756140.
  30. Mihai Pǎtraşcu. Towards polynomial lower bounds for dynamic problems. In Leonard J. Schulman, editor, STOC, pages 603-610. ACM, 2010. URL: http://dx.doi.org/10.1145/1806689.1806772.
  31. Verint. Personal communication, 2013. Google Scholar
  32. Meng Zhang, Yi Zhang, and Liang Hu. A faster algorithm for matching a set of patterns with variable length don't cares. Inf. Process. Lett., 110(6):216-220, 2010. URL: http://dx.doi.org/10.1016/j.ipl.2009.12.007.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail