Fine-Grained Complexity of Regular Expression Pattern Matching and Membership

Schepper, Philipp

doi:10.4230/LIPIcs.ESA.2020.80

Abstract

The currently fastest algorithm for regular expression pattern matching and membership improves the classical O(nm) time algorithm by a factor of about log^{3/2}n. Instead of focussing on general patterns we analyse homogeneous patterns of bounded depth in this work. For them a classification splitting the types in easy (strongly sub-quadratic) and hard (essentially quadratic time under SETH) is known. We take a very fine-grained look at the hard pattern types from this classification and show a dichotomy: few types allow super-poly-logarithmic improvements while the algorithms for the other pattern types can only be improved by a constant number of log-factors, assuming the Formula-SAT Hypothesis.

Amir Abboud and Karl Bringmann. Tighter connections between formula-sat and shaving logs. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, pages 8:1-8:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. Full version: https://arxiv.org/abs/1804.08978. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.8.
Amir Abboud, Richard Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 218-230. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.17.
Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333-340, 1975. URL: https://doi.org/10.1145/360825.360855.
Arturs Backurs and Piotr Indyk. Which regular expression patterns are hard to match? In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 457-466. IEEE Computer Society, 2016. Full version: https://arxiv.org/abs/1511.07070. URL: https://doi.org/10.1109/FOCS.2016.56.
Philip Bille and Mikkel Thorup. Faster regular expression matching. In Susanne Albers, Alberto Marchetti-Spaccamela, Yossi Matias, Sotiris E. Nikoletseas, and Wolfgang Thomas, editors, Automata, Languages and Programming, 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I, volume 5555 of Lecture Notes in Computer Science, pages 171-182. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-02927-1_16.
Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422-426, 1970. URL: https://doi.org/10.1145/362686.362692.
Maria Luisa Bonet and Samuel R. Buss. Size-depth tradeoffs for boolean fomulae. Inf. Process. Lett., 49(3):151-155, 1994. URL: https://doi.org/10.1016/0020-0190(94)90093-0.
Karl Bringmann, Allan Grønlund, and Kasper Green Larsen. A dichotomy for regular expression membership testing. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 307-318. IEEE Computer Society, 2017. Full version: https://arxiv.org/abs/1611.00918. URL: https://doi.org/10.1109/FOCS.2017.36.
Timothy M. Chan and Ryan Williams. Deterministic apsp, orthogonal vectors, and more: Quickly derandomizing razborov-smolensky. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1246-1255. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch87.
Ruiwen Chen, Valentine Kabanets, and Nitin Saurabh. An improved deterministic #sat algorithm for small de morgan formulas. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger, and Zoltán Ésik, editors, Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part II, volume 8635 of Lecture Notes in Computer Science, pages 165-176. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-44465-8_15.
Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 592-601. ACM, 2002. URL: https://doi.org/10.1145/509907.509992.
Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Monitoring regular expressions on out-of-order streams. In Rada Chirkova, Asuman Dogac, M. Tamer Özsu, and Timos K. Sellis, editors, Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 1315-1319. IEEE Computer Society, 2007. URL: https://doi.org/10.1109/ICDE.2007.369001.
Kenrick Kin, Björn Hartmann, Tony DeRose, and Maneesh Agrawala. Proton: multitouch gestures as regular expressions. In Joseph A. Konstan, Ed H. Chi, and Kristina Höök, editors, CHI Conference on Human Factors in Computing Systems, CHI '12, Austin, TX, USA - May 05-10, 2012, pages 2885-2894. ACM, 2012. URL: https://doi.org/10.1145/2207676.2208694.
Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977. URL: https://doi.org/10.1137/0206024.
Ilan Komargodski, Ran Raz, and Avishay Tal. Improved average-case lower bounds for demorgan formula size. In 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2013, 26-29 October, 2013, Berkeley, CA, USA, pages 588-597. IEEE Computer Society, 2013. URL: https://doi.org/10.1109/FOCS.2013.69.
David Landsman. RNP-1, an RNA-binding motif is conserved in the DNA-binding cold shock domain. Nucleic Acids Research, 20(11):2861-2864, June 1992. URL: https://doi.org/10.1093/nar/20.11.2861.
Quanzhong Li and Bongki Moon. Indexing and querying XML data for regular path expressions. In Peter M. G. Apers, Paolo Atzeni, Stefano Ceri, Stefano Paraboschi, Kotagiri Ramamohanarao, and Richard T. Snodgrass, editors, VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11-14, 2001, Roma, Italy, pages 361-370. Morgan Kaufmann, 2001. URL: http://www.vldb.org/conf/2001/P361.pdf.
Makoto Murata. Extended path expressions for XML. In Peter Buneman, editor, Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 21-23, 2001, Santa Barbara, California, USA. ACM, 2001. URL: https://doi.org/10.1145/375551.375569.
Eugene W. Myers. A four russians algorithm for regular expression pattern matching. J. ACM, 39(2):430-448, 1992. URL: https://doi.org/10.1145/128749.128755.
Gonzalo Navarro and Mathieu Raffinot. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 10(6):903-923, 2003. PMID: 14980017. URL: https://doi.org/10.1089/106652703322756140.
Rahul Santhanam. Fighting perebor: New and improved algorithms for formula and QBF satisfiability. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 183-192. IEEE Computer Society, 2010. URL: https://doi.org/10.1109/FOCS.2010.25.
Ken Thompson. Regular expression search algorithm. Commun. ACM, 11(6):419-422, 1968. URL: https://doi.org/10.1145/363347.363387.
Richard Ryan Williams. The polynomial method in circuit complexity applied to algorithm design (invited talk). In Venkatesh Raman and S. P. Suresh, editors, 34th International Conference on Foundation of Software Technology and Theoretical Computer Science, FSTTCS 2014, December 15-17, 2014, New Delhi, India, volume 29 of LIPIcs, pages 47-60. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2014.47.
Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman, and Randy H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In Laxmi N. Bhuyan, Michel Dubois, and Will Eatherton, editors, Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2006, San Jose, California, USA, December 3-5, 2006, pages 93-102. ACM, 2006. URL: https://doi.org/10.1145/1185347.1185360.

Fine-Grained Complexity of Regular Expression Pattern Matching and Membership

Author Philipp Schepper

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Fine-Grained Complexity of Regular Expression Pattern Matching and Membership

Author Philipp Schepper

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message