Building Code Transpilers for Domain-Specific Languages Using Program Synthesis (Experience Paper)

Authors Sahil Bhatia, Sumer Kohli, Sanjit A. Seshia, Alvin Cheung



PDF
Thumbnail PDF

File

LIPIcs.ECOOP.2023.38.pdf
  • Filesize: 1.19 MB
  • 30 pages

Document Identifiers

Author Details

Sahil Bhatia
  • University of California, Berkeley, CA, USA
Sumer Kohli
  • University of California, Berkeley, CA, USA
Sanjit A. Seshia
  • University of California, Berkeley, CA, USA
Alvin Cheung
  • University of California, Berkeley, CA, USA

Acknowledgements

We would like to thank Shadaj Laddad, Maaz Ahmad and anonymous reviewers for their insightful feedback.

Cite AsGet BibTex

Sahil Bhatia, Sumer Kohli, Sanjit A. Seshia, and Alvin Cheung. Building Code Transpilers for Domain-Specific Languages Using Program Synthesis (Experience Paper). In 37th European Conference on Object-Oriented Programming (ECOOP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 263, pp. 38:1-38:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ECOOP.2023.38

Abstract

Domain-specific languages (DSLs) are prevalent across many application domains. Such languages let developers easily express computations using high-level abstractions that result in performant implementations. To leverage DSLs, however, application developers need to master the DSL’s syntax and manually rewrite existing code. Compilers can aid in this effort, but part of building a compiler requires transpiling code from the source code to the target DSL. Such transpilation is typically done via pattern-matching rules on the source code. Sadly, developing such rules is often a painstaking and error-prone process. In this paper, we describe our experience in using program synthesis to build code transpilers. To do so, we developed MetaLift, a new framework for building transpilers that transform general-purpose code into DSLs using program synthesis. To use MetaLift, transpiler developers first define the target DSL’s semantics using MetaLift’s specification language, and specify the search space for each input code fragment to be transpiled using MetaLift’s API. MetaLift then leverages program synthesizers and theorem provers to automatically find transpilations expressed in the target DSL that is provably semantic equivalent to the input code. We have used MetaLift to build three DSL transpilers targeting different programming models and application domains. Our results show that the MetaLift-based compilers can translate many benchmarks used in prior work created by specialized implementations, but can be built using orders-of-magnitude fewer lines of code as compared to prior work.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Compilers
Keywords
  • Program Synthesis
  • Code Transpilation
  • DSLs
  • Verification

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. https://github.com/uwplse/Casper/tree/master/bin/benchmarks, Accessed: 2022-03-14.
  2. Maaz Bin Safeer Ahmad and Alvin Cheung. Leveraging parallel data processing frameworks with verified lifting. In Proceedings Fifth Workshop on Synthesis, SYNT@CAV 2016, Toronto, Canada, July 17-18, 2016., pages 67-83, 2016. Google Scholar
  3. Maaz Bin Safeer Ahmad and Alvin Cheung. Automatically leveraging mapreduce frameworks for data-intensive applications. In Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein, editors, Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1205-1220. ACM, 2018. Google Scholar
  4. Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Pearson Education, Inc, 2006. Google Scholar
  5. Takuya Akiba, Kentaro Imajo, Hiroaki Iwami, Yoichi Iwata, Toshiki Kataoka, Naohiro Takahashi, Michal Moskal, and Nikhil Swamy. Calibrating research in program synthesis using 72,000 hours of programmer time, 2013. Google Scholar
  6. Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović. Chisel: Constructing hardware in a scala embedded language. In Proceedings of the 49th Annual Design Automation Conference, DAC '12, pages 1216-1225, New York, NY, USA, 2012. ACM. URL: https://doi.org/10.1145/2228360.2228584.
  7. Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-strength smt solver. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 415-442, Cham, 2022. Springer International Publishing. Google Scholar
  8. Mike Barnett and Rustan Leino. Weakest-precondition of unstructured programs. In PASTE '05: The 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 82-87. ACM Press, September 2005. URL: https://www.microsoft.com/en-us/research/publication/weakest-precondition-of-unstructured-programs/.
  9. Clark Barrett, Pascal Fontaine, and Cesare Tinelli. The SMT-LIB Standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa, 2017. Available at www.SMT-LIB.org. Google Scholar
  10. Clark Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability modulo theories. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, chapter 33, pages 1267-1329. IOS Press, second edition, 2021. Google Scholar
  11. Rastislav Bodík and Barbara Jobstmann. Algorithmic program synthesis: introduction. STTT, 15(5-6):397-411, 2013. Google Scholar
  12. Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. Flumejava: easy, efficient data-parallel pipelines. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2010, Toronto, Ontario, Canada, June 5-10, 2010, pages 363-375, 2010. Google Scholar
  13. Xinyun Chen, Chang Liu, and Dawn Song. Tree-to-tree neural networks for program translation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 2552-2562, Red Hook, NY, USA, 2018. Curran Associates Inc. Google Scholar
  14. Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. Optimizing database-backed applications with query synthesis. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 3-14, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2491956.2462180.
  15. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107-113, January 2008. URL: https://doi.org/10.1145/1327452.1327492.
  16. Michael Garland, Scott Le Grand, John Nickolls, Joshua Anderson, Jim Hardwick, Scott Morton, Everett Phillips, Yao Zhang, and Vasily Volkov. Parallel computing experiences with cuda. IEEE Micro, 28(4):13-27, July 2008. URL: https://doi.org/10.1109/MM.2008.57.
  17. Sumit Gulwani. Dimensions in program synthesis. In Proceedings of the 12th International ACM SIGPLAN Conference on Principles and Practice of Declarative Programming, July 26-28, 2010, Hagenberg, Austria, pages 13-24, 2010. Google Scholar
  18. Arie Gurfinkel, Temesghen Kahsai, Anvesh Komuravelli, and Jorge A. Navas. The seahorn verification framework. In Daniel Kroening and Corina S. Păsăreanu, editors, Computer Aided Verification, pages 343-361, Cham, 2015. Springer International Publishing. Google Scholar
  19. C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576-580, October 1969. URL: https://doi.org/10.1145/363235.363259.
  20. Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar-Lezama. Verified lifting of stencil computations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '16, pages 711-726, New York, NY, USA, 2016. ACM. URL: https://doi.org/10.1145/2908080.2908117.
  21. David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. Spatial: A language and compiler for application accelerators, 2018. Google Scholar
  22. Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO '04, pages 75-, Washington, DC, USA, 2004. IEEE Computer Society. URL: http://dl.acm.org/citation.cfm?id=977395.977673.
  23. David Menendez and Santosh Nagarakatte. Alive-Infer: Data-driven Precondition Inference for Peephole Optimizations in LLVM. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, pages 49-63, New York, NY, USA, 2017. ACM. URL: https://doi.org/10.1145/3062341.3062372.
  24. Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, Emina Torlak, and Xi Wang. Scaling symbolic evaluation for automated verification of systems code with serval. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP '19, pages 225-242, New York, NY, USA, 2019. Association for Computing Machinery. URL: https://doi.org/10.1145/3341301.3359641.
  25. Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Lexical statistical machine translation for language migration. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 651-654, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2491411.2494584.
  26. Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL: https://openreview.net/forum?id=rJ0JwFcex.
  27. Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah Chasins, and Rastislav Bodik. Chlorophyll: Synthesis-aided compiler for low-power spatial architectures. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 396-407, New York, NY, USA, 2014. ACM. URL: https://doi.org/10.1145/2594291.2594339.
  28. Elizabeth Polgreen, Kevin Cheang, Pranav Gaddamadugu, Adwait Godbole, Kevin Laeufer, Shaokai Lin, Yatin A. Manerkar, Federico Mora, and Sanjit A. Seshia. UCLID5: multi-modal formal modeling, verification, and synthesis. In 34th International Conference on Computer Aided Verification (CAV), volume 13371 of Lecture Notes in Computer Science, pages 538-551. Springer, 2022. Google Scholar
  29. Elizabeth Polgreen, Andrew Reynolds, and Sanjit A. Seshia. Satisfiability and synthesis modulo oracles. In Proceedings of the 23rd International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), January 2022. Google Scholar
  30. Oleksandr Polozov and Sumit Gulwani. Flashmeta: A framework for inductive program synthesis. In OOPSLA 2015 Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 107-126, October 2015. URL: https://www.microsoft.com/en-us/research/publication/flashmeta-framework-inductive-program-synthesis/.
  31. Cosmin Radoi, Stephen J. Fink, Rodric Rabbah, and Manu Sridharan. Translating imperative code to mapreduce. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA '14, pages 909-927, New York, NY, USA, 2014. ACM. URL: https://doi.org/10.1145/2660193.2660228.
  32. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 519-530, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2491956.2462176.
  33. Baptiste Rozière, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/ed23fbf18c2cd35f8c7f8de44f85c08d-Abstract.html.
  34. Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic superoptimization. In Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, Houston, TX, USA - March 16 - 20, 2013, pages 305-316, 2013. Google Scholar
  35. Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. Packet transactions: High-level programming for line-rate switches. In Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016, pages 15-28, 2016. Google Scholar
  36. Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. Domino examples, October 2018. URL: https://github.com/packet-transactions/domino-examples/blob/master/banzai_atoms/pred_raw.sk.
  37. Sketch. https://people.csail.mit.edu/asolar/, 2016. Accessed: 2016-05-01.
  38. Armando Solar-Lezama. Program Synthesis by Sketching. PhD thesis, University of California at Berkeley, Berkeley, CA, USA, 2008. AAI3353225. Google Scholar
  39. Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Trans. Embed. Comput. Syst., 13(4s):134:1-134:25, April 2014. URL: https://doi.org/10.1145/2584665.
  40. C.-H. Tai, J. Zhu, and N. Dukkipati. Making large scale deployment of rcp practical for real networks. In IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, pages 2180-2188, 2008. URL: https://doi.org/10.1109/INFOCOM.2008.285.
  41. Emina Torlak and Rastislav Bodik. Growing solver-aided languages with rosette. In Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2013, pages 135-152, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2509578.2509586.
  42. The z3 theorem prover. https://github.com/Z3Prover/z3, 2017.
  43. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, 2012. Google Scholar