Morpheus: Automated Safety Verification of Data-Dependent Parser Combinator Programs

Mishra, Ashish; Jagannathan, Suresh

doi:10.4230/LIPIcs.ECOOP.2023.20

Abstract

Parser combinators are a well-known mechanism used for the compositional construction of parsers, and have shown to be particularly useful in writing parsers for rich grammars with data-dependencies and global state. Verifying applications written using them, however, has proven to be challenging in large part because of the inherently effectful nature of the parsers being composed and the difficulty in reasoning about the arbitrarily rich data-dependent semantic actions that can be associated with parsing actions. In this paper, we address these challenges by defining a parser combinator framework called Morpheus equipped with abstractions for defining composable effects tailored for parsing and semantic actions, and a rich specification language used to define safety properties over the constituent parsers comprising a program. Even though its abstractions yield many of the same expressivity benefits as other parser combinator systems, Morpheus is carefully engineered to yield a substantially more tractable automated verification pathway. We demonstrate its utility in verifying a number of realistic, challenging parsing applications, including several cases that involve non-trivial data-dependent relations.

Michael D. Adams and Ömer S. Ağacan. Indentation-sensitive parsing for parsec. In SIGPLAN Notices, volume 49(12), pages 121-132, New York, NY, USA, September 2014. Association for Computing Machinery. URL: https://doi.org/10.1145/2775050.2633369.
Ali Afroozeh and Anastasia Izmaylova. One parser to rule them all. In 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward!), Onward! 2015, pages 151-170, New York, NY, USA, 2015. Association for Computing Machinery. URL: https://doi.org/10.1145/2814228.2814242.
Angstrom. Angstrom parser-combinator library, 2021. URL: https://github.com/inhabitedtype/angstrom.
Edwin Brady. Idris: Implementing a dependently typed programming language. In Proceedings of the 2014 International Workshop on Logical Frameworks and Meta-Languages: Theory and Practice, LFMTP '14, New York, NY, USA, 2014. Association for Computing Machinery. URL: https://doi.org/10.1145/2631172.2631174.
Arthur Charguéraud. Characteristic formulae for the verification of imperative programs. SIGPLAN Not., 46(9):418-430, September 2011. URL: https://doi.org/10.1145/2034574.2034828.
Nils Anders Danielsson. Total parser combinators. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP '10, pages 285-296, New York, NY, USA, 2010. Association for Computing Machinery. URL: https://doi.org/10.1145/1863543.1863585.
Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In C. R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 337-340, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
Benjamin Delaware, Sorawit Suriyakarn, Clément Pit-Claudel, Qianchuan Ye, and Adam Chlipala. Narcissus: Correct-by-construction derivation of decoders and encoders from binary formats. Proc. ACM Program. Lang., 3(ICFP), July 2019. URL: https://doi.org/10.1145/3341686.
DNS. Domain names - Implementation and specification, 1987. Network Working Group. URL: https://www.rfc-editor.org/rfc/rfc1035.
J. Gross and Adam Chlipala. Parsing parsers a pearl of ( dependently typed ) programming and proof, 2015.
HaskellWiki. Parsec - HaskellWiki, 2021. [Online; accessed 7-July-2022]. URL: https://wiki.haskell.org/index.php?title=Parsec&oldid=64649.
Graham Hutton and Erik Meijer. Monadic parser combinators, September 1999.
Documentation for the Idris Language, 2017. URL: https://docs.idris-lang.org/en/latest/index.html.
Trevor Jim, Yitzhak Mandelbaum, and David Walker. Semantics and algorithms for data-dependent grammars. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '10, pages 417-430, New York, NY, USA, 2010. Association for Computing Machinery. URL: https://doi.org/10.1145/1706299.1706347.
Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. Validating lr(1) parsers. In Helmut Seidl, editor, Programming Languages and Systems, pages 397-416, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. Validating lr(1) parsers. In Helmut Seidl, editor, Programming Languages and Systems, pages 397-416, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. Rustbelt: Securing the foundations of the rust programming language. Proc. ACM Program. Lang., 2(POPL), December 2017. URL: https://doi.org/10.1145/3158154.
Gowtham Kaki and Suresh Jagannathan. A relational framework for higher-order shape analysis. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP '14, pages 311-324, New York, NY, USA, 2014. Association for Computing Machinery. URL: https://doi.org/10.1145/2628136.2628159.
Mark Karpov. Megaparsec: Monadic Parser Combinators, 2022. URL: https://github.com/mrkkrp/megaparsec.
Shin-ya Katsumata. Parametric effect monads and semantics of effect systems. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, pages 633-645, New York, NY, USA, 2014. Association for Computing Machinery. URL: https://doi.org/10.1145/2535838.2535846.
Adam Koprowski and Henri Binsztok. Trx: A formally verified parser editor=Gordon, Andrew D., interpreter. In Programming Languages and Systems, pages 345-365, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
Neelakantan Krishnaswami and Jeremy Yallop. A typed, algebraic approach to parsing. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 379-393, June 2019. URL: https://doi.org/10.1145/3314221.3314625.
Sam Lasser, Chris Casinghino, Kathleen Fisher, and Cody Roux. Costar: A verified all(*) parser. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021, pages 420-434, New York, NY, USA, 2021. Association for Computing Machinery. URL: https://doi.org/10.1145/3453483.3454053.
Nicolas Laurent and Kim Mens. Taming context-sensitive languages with principled stateful parsing. In Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering, SLE 2016, pages 15-27, New York, NY, USA, 2016. Association for Computing Machinery. URL: https://doi.org/10.1145/2997364.2997370.
Daan Leijen and Erik Meijer. Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-27, University of Utrecht, July 2001. User Modeling 2007, 11th International Conference, UM 2007, Corfu, Greece, June 25-29, 2007. URL: https://www.microsoft.com/en-us/research/publication/parsec-direct-style-monadic-parser-combinators-for-the-real-world/.
Simon Marlow. Haskell 2010 language report, 2010. URL: https://www.haskell.org/onlinereport/haskell2010/.
J. McCarthy. Towards a Mathematical Science of Computation, pages 35-56. Springer Netherlands, Dordrecht, 1993. URL: https://doi.org/10.1007/978-94-011-1793-7_2.
Ashish Mishra and Suresh Jagannathan. Morpheus: Automated safety verification of data-dependent parser combinator programs, 2023. URL: https://doi.org/10.48550/arXiv.2305.07901.
Greg Morrisett, Gang Tan, Joseph Tassarotti, Jean-Baptiste Tristan, and Edward Gan. Rocksalt: Better, faster, stronger sfi for the x86. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pages 395-404, New York, NY, USA, 2012. Association for Computing Machinery. URL: https://doi.org/10.1145/2254064.2254111.
Max Murato. MParser, A Simple Monadic Parser Combinator Library, 2021. URL: https://github.com/murmour/mparser.
Aleksandar Nanevski, Greg Morrisett, and Lars Birkedal. Polymorphism and separation in hoare type theory. SIGPLAN Not., 41(9):62-73, September 2006. URL: https://doi.org/10.1145/1160074.1159812.
Aleksandar Nanevski, Greg Morrisett, Avraham Shinnar, Paul Govereau, and Lars Birkedal. Ynot: Dependent types for imperative programs. SIGPLAN Not., 43(9):229-240, September 2008. URL: https://doi.org/10.1145/1411203.1411237.
Meredith L. Patterson. Hammer primer, 2015. URL: https://github.com/sergeybratus/HammerPrimer.
PDF. Iso 32000 (pdf), 2008. PDF Association. URL: https://www.pdfa.org/resource/iso-32000-pdf/pdf-2.
PKWare. żip file format specification, 2020. URL: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT.
Xiaokang Qiu, Pranav Garg, Andrei Ştefănescu, and Parthasarathy Madhusudan. Natural proofs for structure, data, and separation. SIGPLAN Not., 48(6):231-242, June 2013. URL: https://doi.org/10.1145/2499370.2462169.
Tahina Ramananandro, Antoine Delignat-Lavaud, Cédric Fournet, Nikhil Swamy, Tej Chajed, Nadim Kobeissi, and Jonathan Protzenko. Everparse: Verified secure zero-copy parsers for authenticated message formats. In Proceedings of the 28th USENIX Conference on Security Symposium, SEC'19, pages 1465-1482, USA, 2019. USENIX Association.
Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. Liquid types. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pages 159-169, New York, NY, USA, 2008. Association for Computing Machinery. URL: https://doi.org/10.1145/1375581.1375602.
Kathleen Fisher Sam Lasser, Chris Casinghino and Cody Roux. A verified ll(1) parser generator. In ITP, 2019.
Wolfram Schulte. VCC: Contract-based modular verification of concurrent C. In 31st International Conference on Software Engineering, ICSE 2009. IEEE Computer Society, January 2008.
Nikhil Swamy, Joel Weinberger, Cole Schlesinger, Juan Chen, and Benjamin Livshits. Verifying higher-order programs with the dijkstra monad. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 387-398, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2491956.2491978.
Niki Vazou, Alexander Bakst, and Ranjit Jhala. Bounded refinement types. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 48-61, New York, NY, USA, 2015. Association for Computing Machinery. URL: https://doi.org/10.1145/2784731.2784745.
Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon L. Peyton Jones. Refinement types for Haskell. In Johan Jeuring and Manuel M. T. Chakravarty, editors, Proceedings of the 19th ACM SIGPLAN international conference on Functional programming, Gothenburg, Sweden, September 1-3, 2014, pages 269-282. ACM, 2014. URL: https://doi.org/10.1145/2628136.2628161.
Philip Wadler. Monads for functional programming. In Manfred Broy, editor, Program Design Calculi, pages 233-264, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg.
Philip Wadler and Peter Thiemann. The marriage of effects and monads. ACM Trans. Comput. Logic, 4(1):1-32, January 2003. URL: https://doi.org/10.1145/601775.601776.

Morpheus: Automated Safety Verification of Data-Dependent Parser Combinator Programs

Authors Ashish Mishra , Suresh Jagannathan

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Morpheus: Automated Safety Verification of Data-Dependent Parser Combinator Programs

Authors Ashish Mishra , Suresh Jagannathan

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message