Challenges and Opportunities in C/C++ Source-To-Source Compilation (Invited Paper)

Authors: João Bispo, Nuno Paulino, Luís Miguel Sousa


Author Details

João Bispo
  • University of Porto, Portugal
Nuno Paulino
  • Faculty of Engineering, University of Porto, Portugal
Luís Miguel Sousa
  • Faculty of Engineering, University of Porto, Portugal
  • INESC TEC, Porto, Portugal


Acknowledgements

We would like to thank José G. F. Coutinho for reviewing the paper and for the useful feedback.

Cite As

João Bispo, Nuno Paulino, and Luís Miguel Sousa. Challenges and Opportunities in C/C++ Source-To-Source Compilation (Invited Paper). In 14th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 12th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2023). Open Access Series in Informatics (OASIcs), Volume 107, pp. 2:1-2:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

The C/C++ compilation stack (intermediate representations (IRs), compilation passes, and backends) is encumbered by a steep learning curve, which we believe can be lowered by complementing it with approaches such as source-to-source compilation. Source-to-source compilation is a technology that is widely used and quite mature in certain programming environments, such as JavaScript, but that faces a low adoption rate in others. In the particular case of C and C++, some of the identified factors include the high complexity of the languages, the increased difficulty of building and maintaining C/C++ parsers, and the limitations of using source code as an intermediate representation. Additionally, new technologies such as Multi-Level Intermediate Representation (MLIR) have appeared as potential competitors to source-to-source compilers at this level. In this paper, we present what we have identified as current challenges of source-to-source compilation of C and C++, as well as what we consider to be opportunities and possible directions forward. We also present several examples, implemented on top of the Clava source-to-source compiler, that use some of these ideas and techniques to raise the abstraction level of compiler research on complex compiled languages such as C or C++. The examples include automatic parallelization of for loops, high-level synthesis optimisation, hardware/software partitioning with run-time decisions, and automatic insertion of inline assembly for fast prototyping of custom instructions.
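To make the idea of source-to-source compilation concrete, the sketch below shows what an automatic parallelization of a for loop could look like at the source level. It is an illustration only and is not taken from the paper or from Clava's actual output: the function names `scale_original` and `scale_transformed` are hypothetical, and we assume a loop whose iterations are independent, so that adding an OpenMP `parallel for` pragma is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Input: plain C as a developer might write it.
 * Each iteration writes a distinct out[i], so iterations are independent. */
void scale_original(double *out, const double *in, double k, int n) {
    assert(out != NULL && in != NULL);
    for (int i = 0; i < n; i++) {
        out[i] = k * in[i];
    }
}

/* Hypothetical output of a source-to-source pass: still readable C,
 * with an OpenMP pragma inserted before the loop. Compilers without
 * OpenMP enabled ignore the pragma, and the code runs sequentially. */
void scale_transformed(double *out, const double *in, double k, int n) {
    assert(out != NULL && in != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = k * in[i];
    }
}
```

Both functions compute the same result. The point of the sketch is that the transformed program remains ordinary C source, which can be inspected, edited by hand, and compiled with any toolchain (for example with an OpenMP flag such as `-fopenmp` in GCC or Clang to enable the parallel version).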

Subject Classification

ACM Subject Classification
  • Software and its engineering → Compilers
  • Software and its engineering → Source code generation
  • Software and its engineering → Development frameworks and environments
  • Software and its engineering → Software maintenance tools
Keywords
  • Source-to-source
  • compilation
  • transpilers
  • C/C++
  • code transformation



