K-LLVM: A Relatively Complete Semantics of LLVM IR

Li, Liyi; Gunter, Elsa L.

doi:10.4230/LIPIcs.ECOOP.2020.7

Abstract

LLVM [Lattner and Adve, 2004] is designed for the compile-time, link-time and run-time optimization of programs written in various programming languages. The language supported by LLVM targeted by modern compilers is LLVM IR [llvm.org, 2018]. In this paper we define K-LLVM, a reference semantics for LLVM IR. To the best of our knowledge, K-LLVM is the most complete formal LLVM IR semantics to date, including all LLVM IR instructions, intrinsic functions in the LLVM documentation and Standard-C library functions that are necessary to execute many LLVM IR programs. Additionally, K-LLVM formulates an abstract machine that executes all LLVM IR instructions. The machine allows to describe our formal semantics in terms of simulating a conceptual virtual machine that runs LLVM IR programs, including non-deterministic programs. Even though the K-LLVM memory model in this paper is assumed to be a sequentially consistent memory model and does not include all LLVM concurrency memory behaviors, the design of K-LLVM’s data layout allows the K-LLVM abstract machine to execute some LLVM IR programs that previous semantics did not cover, such as the full range of LLVM IR behaviors for the interaction among LLVM IR casting, pointer arithmetic, memory operations and some memory flags (e.g. readonly) of function headers. Additionally, the memory model is modularized in a manner that supports investigating other memory models. To validate K-LLVM, we have implemented it in 𝕂 [Roşu, 2016], which generated an interpreter for LLVM IR. Using this, we ran tests including 1,385 unit test programs and around 3,000 concrete LLVM IR programs, and K-LLVM passed all of them.

Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizing C++ Concurrency. SIGPLAN Not., 46(1):55-66, January 2011. URL: https://doi.org/10.1145/1925844.1926394.
Yves Bertot and Pierre Casteran. Interactive Theorem Proving and Program Development. SpringerVerlag, 2004.
Sandrine Blazy and Xavier Leroy. Mechanized Semantics for the Clight Subset of the C Language. Journal of Automated Reasoning, 43(3):263-288, 2009. URL: https://doi.org/10.1007/s10817-009-9148-3.
Martin Bodin, Arthur Chargueraud, Daniele Filaretti, Philippa Gardner, Sergio Maffeis, Daiva Naudziuniene, Alan Schmitt, and Gareth Smith. A Trusted Mechanised JavaScript Specification. SIGPLAN Not., 49(1):87-100, January 2014. URL: https://doi.org/10.1145/2578855.2535876.
Denis Bogdănaş and Grigore Roşu. K-Java: A Complete Semantics of Java. In Proceedings of the 42nd Symposium on Principles of Programming Languages (POPL'15), pages 445-456. ACM, January 2015. URL: https://doi.org/10.1145/2676726.2676982.
Egon Börger, Nicu G. Fruja, Vincenzo Gervasi, and Robert F. Stärk. A High-level Modular Definition of the Semantics of C#. Theor. Comput. Sci., 336(2-3):235-284, May 2005. URL: https://doi.org/10.1016/j.tcs.2004.11.008.
Soham Chakraborty and Viktor Vafeiadis. Formalizing the Concurrency Semantics of an LLVM Fragment. In Proceedings of the 2017 International Symposium on Code Generation and Optimization, CGO '17, pages 100-110, Piscataway, NJ, USA, 2017. IEEE Press. URL: http://dl.acm.org/citation.cfm?id=3049832.3049844.
Soham Chakraborty and Viktor Vafeiadis. Grounding Thin-air Reads with Event Structures. Proc. ACM Program. Lang., 3(POPL):70:1-70:28, January 2019. URL: https://doi.org/10.1145/3290383.
Sophia Drossopoulou, Susan Eisenbach, and Sarfraz Khurshid. Is the Java Type System Sound? Theor. Pract. Object Syst., 5(1):3-24, January 1999. URL: https://doi.org/10.1002/(SICI)1096-9942(199901/03)5:1<3::AID-TAPO2>3.0.CO;2-T.
Chucky Ellison and David Lazar. The LLVM Semantics in K, 2012. URL: https://github.com/davidlazar/llvm-semantics.
Chucky Ellison and Grigore Rosu. An Executable Formal Semantics of C with Applications. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '12, pages 533-544, New York, NY, USA, 2012. ACM. URL: https://doi.org/10.1145/2103656.2103719.
Azadeh Farzan, Feng Chen, José Meseguer, and Grigore Roşu. Formal Analysis of Java Programs in JavaFAN. In Rajeev Alur and Doron A. Peled, editors, Computer Aided Verification, pages 501-505, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
Daniele Filaretti and Sergio Maffeis. An Executable Formal Semantics of PHP. In Richard Jones, editor, ECOOP 2014 - Object-Oriented Programming, pages 567-592, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.
Yuri Gurevich. Evolving algebras 1993: Lipari guide. In Egon Börger, editor, Specification and Validation Methods, pages 9-36. Oxford University Press, Inc., New York, NY, USA, 1995. URL: http://dl.acm.org/citation.cfm?id=233976.233979.
Myra Van Inwegen and Elsa L. Gunter. HOL-ML. In Proceedings of the 6th International Workshop on Higher Order Logic Theorem Proving and Its Applications, HUG '93, pages 61-74, London, UK, UK, 1994. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=646520.694367.
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A Promising Semantics for Relaxed-memory Concurrency. SIGPLAN Not., 52(1):175-189, January 2017. URL: https://doi.org/10.1145/3093333.3009850.
Jeehoon Kang, Chung-Kil Hur, William Mansky, Dmitri Garbuzov, Steve Zdancewic, and Viktor Vafeiadis. A Formal C Memory Model Supporting Integer-pointer Casts. SIGPLAN Not., 50(6):326-335, June 2015. URL: https://doi.org/10.1145/2813885.2738005.
Jeehoon Kang, Yoonseung Kim, Youngju Song, Juneyoung Lee, Sanghoon Park, Mark Dongyeon Shin, Yonghyun Kim, Sungkeun Cho, Joonwon Choi, Chung-Kil Hur, and Kwangkeun Yi. Crellvm: Verified Credible Compilation for LLVM. SIGPLAN Not., 53(4):631-645, June 2018. URL: https://doi.org/10.1145/3296979.3192377.
Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. Taming Release-acquire Consistency. SIGPLAN Not., 51(1):649-662, January 2016. URL: https://doi.org/10.1145/2914770.2837643.
Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. Repairing Sequential Consistency in C/C++11. SIGPLAN Not., 52(6):618-632, June 2017. URL: https://doi.org/10.1145/3140587.3062352.
Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO '04, pages 75-, Washington, DC, USA, 2004. IEEE Computer Society. URL: http://dl.acm.org/citation.cfm?id=977395.977673.
Daniel K. Lee, Karl Crary, and Robert Harper. Towards a Mechanized Metatheory of Standard ML. SIGPLAN Not., 42(1):173-184, January 2007. URL: https://doi.org/10.1145/1190215.1190245.
Juneyoung Lee, Chung-Kil Hur, Ralf Jung, Zhengyang Liu, John Regehr, and Nuno P. Lopes. Reconciling High-level Optimizations and Low-level Code in LLVM. Proc. ACM Program. Lang., 2(OOPSLA):125:1-125:28, October 2018. URL: https://doi.org/10.1145/3276495.
Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-Kil Hur, Sanjoy Das, David Majnemer, John Regehr, and Nuno P. Lopes. Taming Undefined Behavior in LLVM. SIGPLAN Not., 52(6):633-647, June 2017. URL: https://doi.org/10.1145/3140587.3062343.
Xavier Leroy, Andrew W. Appel, Sandrine Blazy, and Gordon Stewart. The CompCert Memory Model, Version 2. Research Report RR-7987, INRIA, June 2012. URL: https://hal.inria.fr/hal-00703441.
Liyi Li and Elsa Gunter. LLVM Semantics, 2019. URL: https://github.com/liyili2/llvm-semantics-1.
Liyi Li and Elsa L Gunter. IsaK: A Complete Semantics of K. Technical Report http://hdl.handle.net/2142/100116, University of Illinois at Urbana-Champaign, June 2018.
Liyi Li and Elsa L. Gunter. IsaK-Static: A Complete Static Semantics of K. In Kyungmin Bae and Peter Csaba Ölveczky, editors, Formal Aspects of Component Software, pages 196-215, Cham, 2018. Springer International Publishing.
llvm.org. LLVM Language Reference Manual, 2018. URL: http://releases.llvm.org/6.0.0/docs/LangRef.html.
Nuno P. Lopes, David Menendez, Santosh Nagarakatte, and John Regehr. Provably Correct Peephole Optimizations with Alive. SIGPLAN Not., 50(6):22-32, June 2015. URL: https://doi.org/10.1145/2813885.2737965.
Savi Maharaj and Elsa L. Gunter. Studying the ML Module System in HOL. In Proceedings of the 7th International Workshop on Higher Order Logic Theorem Proving and Its Applications, pages 346-361, London, UK, UK, 1994. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=646521.759249.
William Mansky. Specifying and verifying program transformations with PTRANS. Technical Report http://hdl.handle.net/2142/49385, University of Illinois at Urbana-Champaign, May 2014.
William Mansky, Elsa L. Gunter, Dennis Griffith, and Michael D. Adams. Specifying and Executing Optimizations for Generalized Control Flow Graphs. Science of Computer Programming, 130:2-23, November 2016. URL: https://doi.org/10.1016/j.scico.2016.06.003.
Narciso Martí-Oliet and José Meseguer. Rewriting Logic as a Logical and Semantic Framework. In J. Meseguer, editor, Electronic Notes in Theoretical Computer Science, volume 4. Elsevier Science Publishers, 2000.
Paul E. Mckenney. Memory Barriers: a Hardware View for Software Hackers, 2009.
Kayvan Memarian, Victor B. F. Gomes, Brooks Davis, Stephen Kell, Alexander Richardson, Robert N. M. Watson, and Peter Sewell. Exploring C Semantics and Pointer Provenance. Proc. ACM Program. Lang., 3(POPL):67:1-67:32, January 2019. URL: https://doi.org/10.1145/3290380.
Robin Milner, Mads Tofte, and David MacQueen. The Definition of Standard ML. MIT Press, Cambridge, MA, USA, 1997.
Daejun Park, Andrei Ştefănescu, and Grigore Roşu. KJS: A Complete Formal Semantics of JavaScript. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'15), pages 346-356. ACM, June 2015. URL: https://doi.org/10.1145/2737924.2737991.
Lawrence C. Paulson. Isabelle: The Next 700 Theorem Provers. In P. Odifreddi, editor, Logic and Computer Science, pages 361-386. Academic Press, 1990.
Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. Bridging the gap between programming languages and hardware weak memory models. Proc. ACM Program. Lang., 3(POPL):69:1-69:31, January 2019. URL: https://doi.org/10.1145/3290382.
Grigore Roşu. K Implementation, 2016. URL: https://github.com/kframework/k.
Grigore Roşu and Traian Florin Şerbănuţă. An Overview of the K Semantic Framework. Journal of Logic and Algebraic Programming, 79(6):397-434, 2010. URL: https://doi.org/10.1016/j.jlap.2010.03.012.
Don Syme. Proving Java Type Soundness. In Formal Syntax and Semantics of Java, pages 83-118, London, UK, UK, 1999. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=645580.658814.
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and Understanding Bugs in C Compilers. SIGPLAN Not., 46(6):283-294, June 2011. URL: https://doi.org/10.1145/1993316.1993532.
Jianzhou Zhao, Santosh Nagarakatte, Milo M.K. Martin, and Steve Zdancewic. Formalizing the LLVM Intermediate Representation for Verified Program Transformations. SIGPLAN Not., 47(1):427-440, January 2012. URL: https://doi.org/10.1145/2103621.2103709.

K-LLVM: A Relatively Complete Semantics of LLVM IR

Authors Liyi Li, Elsa L. Gunter

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

K-LLVM: A Relatively Complete Semantics of LLVM IR

Authors Liyi Li, Elsa L. Gunter

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message