From Regular Expression Matching to Parsing

Bille, Philip; Gørtz, Inge Li

doi:10.4230/LIPIcs.MFCS.2019.71

Author Details

Philip Bille

Technical University of Denmark, DTU Compute, Denmark

Inge Li Gørtz

Technical University of Denmark, DTU Compute, Denmark

Cite As

Philip Bille and Inge Li Gørtz. From Regular Expression Matching to Parsing. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 71:1-71:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.MFCS.2019.71

Abstract

Given a regular expression R and a string Q, the regular expression parsing problem is to determine whether Q matches R and, if so, how it matches, e.g., by a mapping of the characters of Q to the characters of R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., extracting IP addresses in internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine whether a string Q matches a regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear-space solutions for regular expression parsing.
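
The difference between matching and parsing can be illustrated with a minimal sketch. It uses Python's standard re module, which is a backtracking engine and not the technique presented in the paper; the pattern, the input string, and the function names are assumptions chosen only for illustration. Matching returns a yes/no answer, while parsing additionally reports which part of Q each subpattern of R consumed.

```python
import re

# Hypothetical pattern R: four dot-separated groups of 1-3 digits.
# This only illustrates the problem statement, not the paper's algorithm.
IPV4 = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")


def matches(q: str) -> bool:
    """Matching: decide only whether the whole string Q matches R."""
    return IPV4.fullmatch(q) is not None


def parse(q: str):
    """Parsing: also report which span of Q each subpattern of R consumed.

    Returns None if Q does not match, otherwise a list of
    (substring, (start, end)) pairs, one per parenthesized subpattern.
    """
    m = IPV4.fullmatch(q)
    if m is None:
        return None
    return [(m.group(i), m.span(i)) for i in range(1, 5)]


if __name__ == "__main__":
    print(matches("192.168.0.1"))
    print(parse("192.168.0.1"))
    print(parse("not an address"))
```

Here the spans returned by parse play the role of the mapping from characters of Q to parts of R mentioned in the abstract, restricted to the parenthesized subpatterns.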

Subject Classification

ACM Subject Classification

Theory of computation → Design and analysis of algorithms

Keywords

regular expressions
finite automata
regular expression parsing
algorithms

