A Verified Earley Parser

Rau, Martin; Nipkow, Tobias

doi:10.4230/LIPIcs.ITP.2024.31

Abstract

An Earley parser is a top-down parsing technique that is capable of parsing arbitrary context-free grammars. We present a functional implementation of an Earley parser verified using the interactive theorem prover Isabelle/HOL. Our formalization builds upon Cliff Jones' extensive, refinement-based paper proof. We implement and prove soundness and completeness of a functional recognizer modeling Jay Earley’s original imperative implementation and extend it with the necessary data structures to enable the construction of parse trees following the work of Elizabeth Scott. Building upon this foundation, we develop a functional parser and prove its soundness. We round off the paper by providing an informal argument and empirical data regarding the running time and space complexity of our implementation.

Failure to patch two-month-old bug led to massive equifax breach. https://arstechnica.com/information-technology/2017/09/massive-equifax-breach-caused-by-failure-to-patch-two-month-old-bug. Accessed: 2024-03-16.
Stagefright: Scary code in the heart of android. https://www.blackhat.com/docs/us-15/materials/us-15-Drake-Stagefright-Scary-Code-In-The-Heart-Of-Android.pdf. Accessed: 2024-03-16.
Windows media parsing remote code execution vulnerability. https://nvd.nist.gov/vuln/detail/CVE-2016-0101. Accessed: 2024-03-16.
Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation, and Compiling. Prentice-Hall, Inc., USA, 1972.
John Aycock and R. Nigel Horspool. Directly-executable Earley parsing. In Reinhard Wilhelm, editor, Compiler Construction, CC 2001, volume 2027 of LNCS, pages 229-243. Springer, 2001. URL: https://doi.org/10.1007/3-540-45306-7_16.
John Aycock and R. Nigel Horspool. Practical Earley parsing. Comput. J., 45(6):620-630, 2002. URL: https://doi.org/10.1093/comjnl/45.6.620.
Aditi Barthwal and Michael Norrish. Verified, executable parsing. In Giuseppe Castagna, editor, Programming Languages and Systems, pages 160-174, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
Clement Blaudeau and Natarajan Shankar. A verified packrat parser interpreter for parsing expression grammars. In Certified Programs and Proofs, CPP 2020, pages 3-17. ACM, 2020. URL: https://doi.org/10.1145/3372885.3373836.
Maksym Bortin. A formalisation of the Cocke-Younger-Kasami algorithm. Archive of Formal Proofs, April 2016. , Formal proof development. URL: https://isa-afp.org/entries/CYK.html.
M. Bouckaert, A. Pirotte, and M. Snelling. Efficient parsing algorithms for general context-free parsers. Information Sciences, 8(1):1-26, 1975. URL: https://doi.org/10.1016/0020-0255(75)90002-X.
Nils Anders Danielsson. Total parser combinators. In Paul Hudak and Stephanie Weirich, editors, International Conference on Functional Programming, ICFP 2010, pages 285-296. ACM, 2010. URL: https://doi.org/10.1145/1863543.1863585.
Jay Earley. An efficient context-free parsing algorithm. Commun. ACM, 13(2):94-102, 1970. URL: https://doi.org/10.1145/362007.362035.
Romain Edelmann, Jad Hamza, and Viktor Kunčak. Zippy LL(1) parsing with derivatives. In Programming Language Design and Implementation, PLDI 2020, pages 1036-1051. ACM, 2020. URL: https://doi.org/10.1145/3385412.3385992.
Denis Firsov and Tarmo Uustalu. Certified CYK parsing of context-free languages. Journal of Logical and Algebraic Methods in Programming, 83(5):459-468, 2014. Nordic Workshop on Programming Theory (NWPT 2012). URL: https://doi.org/10.1016/j.jlamp.2014.09.002.
Denis Firsov and Tarmo Uustalu. Certified normalization of context-free grammars. In Certified Programs and Proofs, CPP '15, pages 167-174. ACM, 2015. URL: https://doi.org/10.1145/2676724.2693177.
Bryan Ford. Packrat parsing: Simple, powerful, lazy, linear time, functional pearl. In International Conference on Functional Programming, ICFP '02, pages 36-47. ACM, 2002. URL: https://doi.org/10.1145/581478.581483.
Bryan Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Principles of Programming Languages, POPL '04, pages 111-122. ACM, 2004. URL: https://doi.org/10.1145/964001.964011.
Dick Grune and Ceriel J. H. Jacobs. Parsing Techniques: a Practical Guide. Ellis Horwood, 1990.
Mark Johnson. The Computational Complexity of GLR Parsing, pages 35-42. Springer US, Boston, MA, 1991. URL: https://doi.org/10.1007/978-1-4615-4034-2_3.
C B Jones. Formal development of correct algorithms: An example based on Earley’s recogniser. In Proceedings of ACM Conference on Proving Assertions about Programs, pages 150-169. ACM, 1972. URL: https://doi.org/10.1145/800235.807083.
Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. Validating LR(1) parsers. In Helmut Seidl, editor, European Symposium on Programming, ESOP 2012, volume 7211 of LNCS, pages 397-416. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-28869-2_20.
Jeffrey Kegler. Marpa, a practical general parser: the recognizer, 2023. URL: https://arxiv.org/abs/1910.08129.
Adam Koprowski and Henri Binsztok. TRX: A formally verified parser interpreter. Log. Methods Comput. Sci., 7(2), 2011. URL: https://doi.org/10.2168/LMCS-7(2:18)2011.
Alexander Krauss. Recursive definitions of monadic functions. In Ekaterina Komendantskaya, Ana Bove, and Milad Niqui, editors, Partiality and Recursion in Interactive Theorem Provers, PAR@ITP 2010, volume 5 of EPiC Series, pages 1-13. EasyChair, 2010. URL: https://doi.org/10.29007/1mdt.
Sam Lasser, Chris Casinghino, Kathleen Fisher, and Cody Roux. A Verified LL(1) Parser Generator. In John Harrison, John O'Leary, and Andrew Tolmach, editors, Interactive Theorem Proving (ITP 2019), volume 141 of Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1-24:18, Dagstuhl, Germany, 2019. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ITP.2019.24.
Joop M.I.M. Leo. A general context-free parsing algorithm running in linear time on every LR(k) grammar without using lookahead. Theoretical Computer Science, 82(1):165-176, 1991. URL: https://doi.org/10.1016/0304-3975(91)90180-A.
Philippe McLean and R. Nigel Horspool. A faster Earley parser. In Tibor Gyimóthy, editor, Compiler Construction, CC'96, volume 1060 of LNCS, pages 281-293. Springer, 1996. URL: https://doi.org/10.1007/3-540-61053-7_68.
Tobias Nipkow and Gerwin Klein. Concrete Semantics with Isabelle/HOL. Springer, 2014. URL: http://concrete-semantics.org.
Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002.
Steven Obua. Local lexing. Archive of Formal Proofs, 2017. , Formal proof development. URL: https://isa-afp.org/entries/LocalLexing.html.
Steven Obua, Phil Scott, and Jacques Fleuriot. Local lexing, 2017. URL: https://arxiv.org/abs/1702.03277.
Sinan Polat, Merve Selcuk-Simsek, and Ilyas Cicekli. A modified Earley parser for huge natural language grammars. Res. Comput. Sci., 117:23-35, 2016. URL: https://rcs.cic.ipn.mx/2016_117/A%20Modified%20Earley%20Parser%20for%20Huge%20Natural%20Language%20Grammars.pdf.
Martin Rau. Earley parser. Archive of Formal Proofs, July 2023. , Formal proof development. URL: https://devel.isa-afp.org/entries/Earley_Parser.html.
Tom Ridge. Simple, functional, sound and complete parsing for all context-free grammars. In Jean-Pierre Jouannaud and Zhong Shao, editors, Certified Programs and Proofs, CPP 2011, volume 7086 of LNCS, pages 103-118. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-25379-9_10.
Elizabeth Scott. SPPF-style parsing from Earley recognisers. Electronic Notes in Theoretical Computer Science, 203(2):53-67, 2008. Workshop on Language Descriptions, Tools, and Applications (LDTA 2007). URL: https://doi.org/10.1016/j.entcs.2008.03.044.
Masaru Tomita. Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems. Kluwer Academic Publishers, USA, 1985.
Masaru Tomita. An efficient augmented-context-free parsing algorithm. Comput. Linguist., 13(1–2):31-46, January 1987.

A Verified Earley Parser

Authors Martin Rau , Tobias Nipkow

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message