Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations

Authors Jie Qiu , Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung



PDF
Thumbnail PDF

File

LIPIcs.ECOOP.2024.32.pdf
  • Filesize: 6.56 MB
  • 28 pages

Document Identifiers

Author Details

Jie Qiu
  • Pittsburgh, PA, USA
Colin Cai
  • University of California, Berkeley, CA, USA
Sahil Bhatia
  • University of California, Berkeley, CA, USA
Niranjan Hasabnis
  • Intel Labs, Menlo Park, CA, USA
Sanjit A. Seshia
  • University of California, Berkeley, CA, USA
Alvin Cheung
  • University of California, Berkeley, CA, USA

Acknowledgements

We would like to thank Jayaram Bobba and Zhongkai Zhang from Intel’s Habana team for inputs on Gaudi architecture, TPC-C programming model, and obtaining high-performance from TPC kernels. We would like to thank Hasan Genc and Sophia Shao for helpful insights into Gemmini code generation.

Cite AsGet BibTex

Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, and Alvin Cheung. Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 32:1-32:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ECOOP.2024.32

Abstract

Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort. To enable developers in leveraging such DSLs while preserving their current programming paradigm, we present Tenspiler, a verified-lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code that does not leverage any specialized framework or accelerator) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation. Unlike classical pattern-matching-based compilers, Tenspiler uses program synthesis to translate input code into TensIR, which is then compiled to the target API / ISA. Currently, Tenspiler already supports six DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR. Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on 6 different software frameworks and hardware devices, Tenspiler offers on average 105× kernel and 9.65× end-to-end execution time improvement over the fully-optimized sequential implementation of the same benchmarks.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Compilers
Keywords
  • Program Synthesis
  • Code Transpilation
  • Tensor DSLs
  • Verification

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Claude Model. https://www.anthropic.com/news/claude-3-family. [Online].
  2. Darknet. http://pjreddie.com/darknet/. [Online].
  3. Mathfu. https://github.com/google/mathfu. [Online].
  4. Texas Instrument Digital Signal Processing (DSP) Library for MSP430 Microcontrollers. https://www.ti.com/tool/MSP-DSPLIB. [Online].
  5. Maaz Bin Safeer Ahmad and Alvin Cheung. Automatically leveraging mapreduce frameworks for data-intensive applications. In Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein, editors, Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1205-1220. ACM, 2018. Google Scholar
  6. Maaz Bin Safeer Ahmad, Jonathan Ragan-Kelley, Alvin Cheung, and Shoaib Kamil. Automatically translating image processing libraries to halide. ACM Trans. Graph., 38(6), November 2019. URL: https://doi.org/10.1145/3355089.3356549.
  7. Maaz Bin Safeer Ahmad, Alexander J. Root, Andrew Adams, Shoaib Kamil, and Alvin Cheung. Vector instruction selection for digital signal processors using program synthesis. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '22, pages 1004-1016, New York, NY, USA, 2022. Association for Computing Machinery. URL: https://doi.org/10.1145/3503222.3507714.
  8. Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In 2013 Formal Methods in Computer-Aided Design, pages 1-8, 2013. URL: https://doi.org/10.1109/FMCAD.2013.6679385.
  9. SMT-LIB Authors. SMT-LIB Standard. https://smtlib.cs.uiowa.edu/. [Online].
  10. Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng, Cesare Tinelli, and Yoni Zohar. cvc5: A versatile and industrial-strength smt solver. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 415-442, Cham, 2022. Springer International Publishing. Google Scholar
  11. Mike Barnett and K. Rustan M. Leino. Weakest-precondition of unstructured programs. In Proceedings of the 6th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE '05, pages 82-87, New York, NY, USA, 2005. Association for Computing Machinery. URL: https://doi.org/10.1145/1108792.1108813.
  12. Sahil Bhatia, Sumer Kohli, Sanjit A. Seshia, and Alvin Cheung. Building Code Transpilers for Domain-Specific Languages Using Program Synthesis. In Karim Ali and Guido Salvaneschi, editors, 37th European Conference on Object-Oriented Programming (ECOOP 2023), volume 263 of Leibniz International Proceedings in Informatics (LIPIcs), pages 38:1-38:30, Dagstuhl, Germany, 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ECOOP.2023.38.
  13. L Susan Blackford, Antoine Petitet, Roldan Pozo, Karin Remington, R Clint Whaley, James Demmel, Jack Dongarra, Iain Duff, Sven Hammarling, Greg Henry, et al. An updated set of basic linear algebra subprograms (blas). ACM Transactions on Mathematical Software, 28(2):135-151, 2002. Google Scholar
  14. Alexander Brauckmann, Elizabeth Polgreen, Tobias Grosser, and Michael FP O'Boyle. mlirsynth: Automatic, retargetable program raising in multi-level ir using program synthesis. In 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 39-50. IEEE, 2023. Google Scholar
  15. Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. Optimizing database-backed applications with query synthesis. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 3-14, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2491956.2462180.
  16. Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023. URL: https://lmsys.org/blog/2023-03-30-vicuna/.
  17. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255, 2009. URL: https://doi.org/10.1109/CVPR.2009.5206848.
  18. C. A. R. Hoare. An axiomatic basis for computer programming. Commun. ACM, 12(10):576-580, 1969. Google Scholar
  19. Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar-Lezama. Verified lifting of stencil computations. SIGPLAN Not., 51(6):711-726, June 2016. URL: https://doi.org/10.1145/2980983.2908117.
  20. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. The tensor algebra compiler. Proc. ACM Program. Lang., 1(OOPSLA), October 2017. URL: https://doi.org/10.1145/3133901.
  21. Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, and Joseph M. Hellerstein. Katara: synthesizing crdts with verified lifting. Proc. ACM Program. Lang., 6(OOPSLA2):1349-1377, 2022. Google Scholar
  22. Chris Lattner and Vikram Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, CGO '04, page 75, USA, 2004. IEEE Computer Society. Google Scholar
  23. Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. Mlir: A compiler infrastructure for the end of moore’s law, 2020. URL: https://arxiv.org/abs/2002.11054.
  24. llamacpp. https://github.com/leloykun/llama2.cpp/, 2024. Accessed: 2024-01-19.
  25. José Wesley de Souza Magalhães, Jackson Woodruff, Elizabeth Polgreen, and Michael F. P. O'Boyle. C2taco: Lifting tensor code to taco. In Proceedings of the 22nd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2023, pages 42-56, New York, NY, USA, 2023. Association for Computing Machinery. URL: https://doi.org/10.1145/3624007.3624053.
  26. Benjamin Mariano, Yanju Chen, Yu Feng, Greg Durrett, and Işil Dillig. Automated transpilation of imperative to functional code using neural-guided program synthesis. Proc. ACM Program. Lang., 6(OOPSLA1), April 2022. URL: https://doi.org/10.1145/3527315.
  27. Apple mlx. https://ml-explore.github.io/mlx/, 2024.
  28. Numba. https://numba.readthedocs.io/en/stable/cuda/overview.html, 2024.
  29. Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, and Alvin Cheung. Tenspiler: A verified lifting-based compiler for tensor operations, 2024. URL: https://arxiv.org/abs/2404.18249.
  30. Cosmin Radoi, Stephen J. Fink, Rodric Rabbah, and Manu Sridharan. Translating imperative code to mapreduce. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA '14, pages 909-927, New York, NY, USA, 2014. ACM. Google Scholar
  31. Christopher D Rosin. Stepping stones to inductive synthesis of low-level looping programs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33(01), pages 2362-2370, 2019. Google Scholar
  32. Baptiste Rozière, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/ed23fbf18c2cd35f8c7f8de44f85c08d-Abstract.html.
  33. Mazen AR Saghir. Application-specific instruction-set architectures for embedded DSP applications. Citeseer, 1998. Google Scholar
  34. Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. Packet transactions: High-level programming for line-rate switches. In Marinho P. Barcellos, Jon Crowcroft, Amin Vahdat, and Sachin Katti, editors, Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016, pages 15-28. ACM, 2016. Google Scholar
  35. Sunbeom So and Hakjoo Oh. Synthesizing imperative programs from examples guided by static analysis. In International Static Analysis Symposium, pages 364-381. Springer, 2017. Google Scholar
  36. Emina Torlak and Rastislav Bodik. Growing solver-aided languages with rosette. In Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2013, pages 135-152, New York, NY, USA, 2013. ACM. URL: https://doi.org/10.1145/2509578.2509586.
  37. Tensorflow xla. https://www.tensorflow.org/xla/architecture, 2024.
  38. Vojin Zivojnovic. Dspstone: A dsp-oriented benchmarking methodology. Proc. Signal Processing Applications & Technology, Dallas, TX, 1994, pages 715-720, 1994. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail