Embedded Multi-Core Code Generation with Cross-Layer Parallelization

Authors: Oliver Oey, Michael Huebner, Timo Stripf, Juergen Becker

  • 13 pages

Author Details

Oliver Oey
  • Karlsruhe Institute of Technology, Germany
  • emmtrix Technologies GmbH, Karlsruhe, Germany
Michael Huebner
  • BTU Cottbus - Senftenberg, Germany
Timo Stripf
  • emmtrix Technologies GmbH, Karlsruhe, Germany
Juergen Becker
  • Karlsruhe Institute of Technology, Germany

Cite As

Oliver Oey, Michael Huebner, Timo Stripf, and Juergen Becker. Embedded Multi-Core Code Generation with Cross-Layer Parallelization. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


In this paper, we present a method for optimizing C code for embedded multi-core systems using cross-layer parallelization. The method has two phases. In the first phase, the algorithm is developed without any optimization for the target platform. In the second phase, the code is optimized and parallelized across four defined layers (the algorithm, code, task, and data layers) for efficient execution on the target hardware. Each layer addresses specific hardware characteristics. Applied iteratively, the method adapts both individual kernels and composite algorithms closely to the target hardware without requiring changes to the algorithm itself. Cross-layer parallelization is realized through algorithm recognition, code transformations, task distribution, and the insertion of synchronization and communication statements. The method is evaluated first on a common kernel and then on a sample image processing algorithm to demonstrate the benefits of the approach. Compared to methods that rely on only two or three of these layers, it achieves an additional performance gain of 20 to 30%.
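The following minimal sketch in C makes the two phases and the task-distribution step more concrete. It assumes POSIX threads as the target runtime and uses a 3x3 box filter as a stand-in for a "common kernel". The names `box3x3_seq`, `box3x3_parallel`, `box_task` and `MAX_CORES` are hypothetical. This is not the authors' tool or its generated output. It only illustrates task distribution with a single synchronization point (thread join). It does not cover algorithm recognition or the code and data layers.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical upper bound on worker threads (one per core). */
#define MAX_CORES 8

/* Phase 1 building block: a plain 3x3 box filter, written without any
 * platform-specific optimization. Processes interior rows [row_begin, row_end)
 * of a width-wide image; border pixels of `out` are left untouched. */
static void box3x3_rows(const int *in, int *out, int width,
                        int row_begin, int row_end)
{
    for (int y = row_begin; y < row_end; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += in[(y + dy) * width + (x + dx)];
            out[y * width + x] = sum / 9;
        }
    }
}

/* Phase 1: sequential reference version over the whole image. */
void box3x3_seq(const int *in, int *out, int width, int height)
{
    if (height > 2)
        box3x3_rows(in, out, width, 1, height - 1);
}

/* Phase 2 (task layer): per-core work descriptor (hypothetical type). */
typedef struct {
    const int *in;
    int *out;
    int width;
    int row_begin;
    int row_end;
} box_task;

static void *box_worker(void *arg)
{
    const box_task *t = arg;
    box3x3_rows(t->in, t->out, t->width, t->row_begin, t->row_end);
    return NULL;
}

/* Phase 2: distribute contiguous row blocks across `cores` threads.
 * Each thread writes a disjoint set of output rows, so no communication
 * is needed during the computation; the joins form the only
 * synchronization point. Returns 0 on success, -1 on a pthread error. */
int box3x3_parallel(const int *in, int *out, int width, int height, int cores)
{
    pthread_t threads[MAX_CORES];
    box_task tasks[MAX_CORES];
    int interior = height > 2 ? height - 2 : 0;
    int started = 0;
    int rc = 0;

    if (cores < 1)
        cores = 1;
    if (cores > MAX_CORES)
        cores = MAX_CORES;

    for (int c = 0; c < cores; ++c) {
        tasks[c].in = in;
        tasks[c].out = out;
        tasks[c].width = width;
        /* Block partitioning: core c gets interior rows [row_begin, row_end). */
        tasks[c].row_begin = 1 + (interior * c) / cores;
        tasks[c].row_end = 1 + (interior * (c + 1)) / cores;
        if (pthread_create(&threads[c], NULL, box_worker, &tasks[c]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }

    /* Synchronization: wait for every started thread before returning. */
    for (int c = 0; c < started; ++c)
        if (pthread_join(threads[c], NULL) != 0)
            rc = -1;
    return rc;
}
```

Called on the same input, `box3x3_seq` and `box3x3_parallel` should produce identical interior pixels. Splitting the rows into contiguous blocks keeps each core's writes disjoint, so the final join is the only synchronization required. A real multi-core target might instead need explicit communication statements, for example when data lives in core-local memories. Compile with `-pthread`.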

Subject Classification

ACM Subject Classification
  • Software and its engineering → Source code generation
  • Software and its engineering → Imperative languages
  • Software and its engineering → Very high level languages
  • Computer systems organization → Embedded software

Keywords
  • Parallelization
  • Multi-core processors
  • Model-based development
  • Code generation

