
Hiding Communication Delays in Contention-Free Execution for SPM-Based Multi-Core Architectures

Authors Benjamin Rouxel , Stefanos Skalistis , Steven Derrien, Isabelle Puaut



Author Details

Benjamin Rouxel
  • Univ Rennes, Inria, CNRS, IRISA, France
Stefanos Skalistis
  • Univ Rennes, Inria, CNRS, IRISA, France
Steven Derrien
  • Univ Rennes, Inria, CNRS, IRISA, France
Isabelle Puaut
  • Univ Rennes, Inria, CNRS, IRISA, France

Cite As

Benjamin Rouxel, Stefanos Skalistis, Steven Derrien, and Isabelle Puaut. Hiding Communication Delays in Contention-Free Execution for SPM-Based Multi-Core Architectures. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 25:1-25:24, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)


Multi-core systems using ScratchPad Memories (SPMs) are attractive architectures for executing time-critical embedded applications, because they provide both predictability and performance. In this paper, we propose a scheduling technique that jointly selects SPM contents off-line, in such a way that the cost of SPM loading/unloading is hidden. Communications are fragmented to augment hiding possibilities. Experimental results show the effectiveness of the proposed technique on streaming applications and synthetic task graphs. The overlapping of communications with computations allows the length of generated schedules to be reduced by 4% on average on streaming applications, with a maximum of 16%, and by 8% on average for synthetic task graphs. We further show on a case study that generated schedules can be implemented with low overhead on a predictable multi-core architecture (Kalray MPPA).
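
As a rough illustration of why fragmenting a communication can increase hiding, the following is a minimal sketch of a toy model, not the scheduling technique of the paper. It assumes a transfer may only be placed in slots where the shared bus is free while an earlier computation runs, and that a piece of the transfer must fit entirely inside one slot. The function names, the slot representation, and the numbers are all hypothetical.

```python
def hidden_time(free_slots, comm_len, fragment_len):
    """Return how much of a transfer fits into bus-free slots (toy model).

    The transfer of length comm_len is cut into pieces of fragment_len
    (the last piece may be shorter). A piece is placed in a slot only if
    it fits entirely in the room left in that slot.
    """
    remaining = comm_len
    hidden = 0
    for slot in free_slots:
        room = slot
        while remaining > 0:
            piece = min(fragment_len, remaining)
            if piece > room:
                break  # this piece does not fit; try the next slot
            room -= piece
            remaining -= piece
            hidden += piece
    return hidden


def schedule_length(compute_before, compute_after, free_slots,
                    comm_len, fragment_len):
    """Length of a two-task chain where the unhidden part of the transfer
    delays the second task (assumes all free slots lie within the first
    task's execution window)."""
    exposed = comm_len - hidden_time(free_slots, comm_len, fragment_len)
    return compute_before + exposed + compute_after


if __name__ == "__main__":
    slots = [3, 3]  # hypothetical bus-free windows during the first task
    # Unfragmented: the 4-unit transfer fits in neither 3-unit slot.
    print(schedule_length(10, 5, slots, comm_len=4, fragment_len=4))
    # Fragmented into 1-unit pieces: the whole transfer is hidden.
    print(schedule_length(10, 5, slots, comm_len=4, fragment_len=1))
```

In this toy setting the unfragmented transfer stays fully exposed, while the fragmented one is spread over both slots and no longer lengthens the schedule. A real scheduler would also have to account for per-fragment overheads and SPM capacity, which this sketch ignores.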

Subject Classification

ACM Subject Classification
  • Computer systems organization → Embedded and cyber-physical systems
  • Computer systems organization → Real-time systems

Keywords
  • Real-time Systems
  • Contention-Free Scheduling
  • SPM multi-core architecture



