Document Open Access Logo

From Regular Expression Matching to Parsing

Authors Philip Bille , Inge Li Gørtz



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2019.71.pdf
  • Filesize: 1.19 MB
  • 14 pages

Document Identifiers

Author Details

Philip Bille
  • Technical University of Denmark, DTU Compute, Denmark
Inge Li Gørtz
  • Technical University of Denmark, DTU Compute, Denmark

Cite AsGet BibTex

Philip Bille and Inge Li Gørtz. From Regular Expression Matching to Parsing. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 138, pp. 71:1-71:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.MFCS.2019.71

Abstract

Given a regular expression R and a string Q, the regular expression parsing problem is to determine if Q matches R and if so, determine how it matches, e.g., by a mapping of the characters of Q to the characters in R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP-addresses from internet traffic analysis or extracting subparts of genomes from genetic data bases. We present a new general techniques for efficiently converting a large class of algorithms that determine if a string Q matches regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • regular expressions
  • finite automata
  • regular expression parsing
  • algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: principles, techniques, and tools. Addison-Wesley Longman Publishing Co., Inc., 1986. Google Scholar
  2. Arturs Backurs and Piotr Indyk. Which Regular Expression Patterns are Hard to Match? In Proc. 57th FOCS, pages 457-466, 2016. Google Scholar
  3. Philip Bille. New Algorithms for Regular Expression Matching. In Proc. of the 33rd ICALP, pages 643-654, 2006. Google Scholar
  4. Philip Bille and Martin Farach-Colton. Fast and compact regular expression matching. Theor. Comput. Sci., 409(3):486-496, 2008. Google Scholar
  5. Philip Bille and Inge Li Gørtz. From Regular Expression Matching to Parsing. Arxiv preprint arXiv:1804.02906, 2019. Google Scholar
  6. Philip Bille and Mikkel Thorup. Faster Regular Expression Matching. In Proc. 36th ICALP, pages 171-182, 2009. Full version with appendix available at http://www2.compute.dtu.dk/asciitilde phbi/files/publications/2009fremC.pdf. Google Scholar
  7. Philip Bille and Mikkel Thorup. Regular Expression Matching with Multi-Strings and Intervals. In Proc. 21st SODA, pages 1297-1308, 2010. Google Scholar
  8. Danny Dubé and Marc Feeley. Efficiently building a parse tree from a regular expression. Acta Informatica, 37(2):121-144, 2000. Google Scholar
  9. Michael L. Fredman and Dan E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci., 48(3):533-551, 1994. Google Scholar
  10. Alain Frisch and Luca Cardelli. Greedy regular expression matching. In Proc. 31st ICALP, volume 3142, pages 618-629, 2004. Google Scholar
  11. Minos N Garofalakis, Rajeev Rastogi, and Kyuseok Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. 25th VLDB, pages 223-234, 1999. Google Scholar
  12. Victor M. Glushkov. The Abstract Theory of Automata. Russian Math. Surveys, 16(5):1-53, 1961. Google Scholar
  13. D. S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341-343, 1975. Google Scholar
  14. Theodore Johnson, S. Muthukrishnan, and Irina Rozenbaum. Monitoring Regular Expressions on Out-of-Order Streams. In Proc. 23nd ICDE, pages 1315-1319, 2007. Google Scholar
  15. Steven M Kearns. Extending regular expressions with context operators and parse extraction. Software: Practice and Experience, 21(8):787-804, 1991. Google Scholar
  16. Kenrick Kin, Björn Hartmann, Tony DeRose, and Maneesh Agrawala. Proton: multitouch gestures as regular expressions. In Proc. SIGCHI, pages 2885-2894, 2012. Google Scholar
  17. Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, and Jonathan Turner. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In Proc. SIGCOMM, pages 339-350, 2006. Google Scholar
  18. Ville Laurikari. NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In Proc. 7th SPIRE, pages 181-187, 2000. Google Scholar
  19. Quanzhong Li and Bongki Moon. Indexing and Querying XML Data for Regular Path Expressions. In Proc. 27th VLDB, pages 361-370, 2001. Google Scholar
  20. R. McNaughton and H. Yamada. Regular Expressions and State Graphs for Automata. IRE Trans. on Electronic Computers, 9(1):39-47, 1960. Google Scholar
  21. Makoto Murata. Extended path expressions of XML. In Proc. 20th PODS, pages 126-137, 2001. Google Scholar
  22. E. W. Myers. A Four-Russian Algorithm for Regular Expression Pattern Matching. J. ACM, 39(2):430-448, 1992. Google Scholar
  23. Gonzalo Navarro and Mathieu Raffinot. Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching. J. Comp. Biology, 10(6):903-923, 2003. Google Scholar
  24. Lasse Nielsen and Fritz Henglein. Bit-coded Regular Expression Parsing. In Proc. 5th LATA, pages 402-413, 2011. Google Scholar
  25. Martin Sulzmann and Kenny Zhuo Ming Lu. Regular expression sub-matching using partial derivatives. In Proc. 14th PPDP, pages 79-90, 2012. Google Scholar
  26. K. Thompson. Regular Expression Search Algorithm. Commun. ACM, 11:419-422, 1968. Google Scholar
  27. Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman, and Randy H. Katz. Fast and memory-efficient regular expression matching for deep packet inspection. In Proc. ANCS, pages 93-102, 2006. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail