WCET-Driven Dynamic Data Scratchpad Management With Compiler-Directed Prefetching

Authors: Muhammad Refaat Soliman, Rodolfo Pellizzoni



Cite As

Muhammad Refaat Soliman and Rodolfo Pellizzoni. WCET-Driven Dynamic Data Scratchpad Management With Compiler-Directed Prefetching. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 76, pp. 24:1-24:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Abstract

In recent years, the real-time community has produced a variety of approaches for managing on-chip memory (scratchpads and caches) in a predictable way. However, to obtain safe WCET bounds, such techniques generally assume that the processor is stalled while the content of the on-chip memory is reloaded; hence, they are less effective at hiding main memory latency than speculation-based techniques, such as hardware prefetching, that are widely used in general-purpose systems. In this work, we introduce a novel compiler-directed prefetching scheme for scratchpad memory that effectively hides the latency of main memory accesses by overlapping data transfers with program execution. We implement and test an automated program compilation and optimization flow within the LLVM framework, and we show how to obtain improved WCET bounds through static analysis.
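
To make the contrast between stalling and overlapping concrete, the following is a minimal toy model and not the analysis used in the paper. It assumes a program split into sequential segments, each with a worst-case transfer time and a worst-case execution time in cycles, and a hypothetical one-segment-ahead prefetch in which the transfer for the next segment is issued when the current segment starts executing. The function names and cycle counts are illustrative assumptions.

```python
# Toy model (assumption, not the paper's WCET analysis): each segment is a
# pair (dma_cycles, exec_cycles) of worst-case transfer and execution times.

def stalled_bound(segments):
    """Bound when the CPU stalls during every scratchpad reload."""
    return sum(dma + exe for dma, exe in segments)

def prefetch_bound(segments):
    """Bound when the transfer for segment i+1 overlaps execution of segment i.

    Assumes one-segment-ahead prefetching: the next transfer is issued as
    soon as the current segment starts executing.
    """
    if not segments:
        return 0
    total = segments[0][0]  # the first transfer cannot be hidden
    for (_, exe_prev), (dma_next, _) in zip(segments, segments[1:]):
        # The next segment starts once both the previous execution and
        # its own transfer have finished.
        total += max(exe_prev, dma_next)
    total += segments[-1][1]  # execution of the last segment
    return total

# Hypothetical cycle counts, chosen only for illustration.
segments = [(100, 300), (200, 150), (100, 400)]
print(stalled_bound(segments))   # 1250
print(prefetch_bound(segments))  # 950
```

In this model the prefetching bound never exceeds the stalled bound, and the gap grows as more of each transfer fits under the preceding segment's execution time.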

Keywords

  • scratchpad
  • LLVM
  • prefetching
  • real-time
  • genetic algorithm



