WCET-Driven Dynamic Data Scratchpad Management With Compiler-Directed Prefetching

Soliman, Muhammad Refaat; Pellizzoni, Rodolfo

doi:10.4230/LIPIcs.ECRTS.2017.24

Abstract

In recent years, the real-time community has produced a variety of approaches targeted at managing on-chip memory (scratchpads and caches) in a predictable way. However, to obtain safe WCET bounds, such techniques generally assume that the processor is stalled while waiting to reload the content of the on-chip memory; hence, they are less effective at hiding main memory latency compared to speculation-based techniques, such as hardware prefetching, that are largely used in general-purpose systems. In this work, we introduce a novel compiler-directed prefetching scheme for scratchpad memory that effectively hides the latency of main memory accesses by overlapping data transfers with the program execution. We implement and test an automated program compilation and optimization flow within the LLVM framework, and we show how to obtain improved WCET bounds through static analysis.

A. Alhammad, S. Wasly, and R. Pellizzoni. Memory efficient global scheduling of real-time tasks. In 21st IEEE Real-Time and Embedded Technology and Applications Symposium, pages 285-296, April 2015.
Oren Avissar, Rajeev Barua, and Dave Stewart. An optimal memory allocation scheme for scratch-pad-based embedded systems. ACM Trans. Embed. Comput. Syst., 1(1):6-26, November 2002.
P. Burgio, A. Marongiu, P. Valente, and M. Bertogna. A memory-centric approach to enable timing-predictability within embedded many-core accelerators. In Real-Time and Embedded Systems and Technologies (RTEST), 2015 CSI Symposium on, pages 1-8, Oct 2015.
P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 238-252, Los Angeles, California, 1977. ACM Press, New York, NY.
M. Dasygenis, E. Brockmeyer, B. Durinck, F. Catthoor, D. Soudris, and A. Thanailakis. A combined dma and application-specific prefetching approach for tackling the memory latency bottleneck. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(3):279-291, March 2006.
Jean-Francois Deverge and Isabelle Puaut. Wcet-directed dynamic scratchpad memory allocation of data. In Proceedings of the 19th Euromicro Conference on Real-Time Systems, ECRTS'07, pages 179-190, Washington, DC, USA, 2007. IEEE Computer Society.
Angel Dominguez, Sumesh Udayakumaran, and Rajeev Barua. Heap data allocation to scratch-pad memory in embedded systems. J. Embedded Comput., 1(4):521-540, December 2005.
Poletti Francesco, Paul Marchal, David Atienza, Luca Benini, Francky Catthoor, and Jose M. Mendias. An integrated hardware/software approach for run-time scratchpad management. In Proceedings of the 41st Annual Design Automation Conference, DAC'04, pages 238-243, New York, NY, USA, 2004. ACM.
Giovani Gracioli, Ahmed Alhammad, Renato Mancuso, Antônio Augusto Fröhlich, and Rodolfo Pellizzoni. A survey on cache management mechanisms for real-time embedded systems. ACM Comput. Surv., 48(2):32:1-32:36, November 2015.
Richard Johnson, David Pearson, and Keshav Pingali. The program structure tree: Computing control regions in linear time. In Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, PLDI'94, pages 171-185, New York, NY, USA, 1994. ACM.
Sungjun Kim. Using scratchpad memory for stack data in hard real-time embedded systems. In Proceedings of the Memory Architecture and Organization Workshop, 2011.
Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis &transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO'04, pages 75-, Washington, DC, USA, 2004. IEEE Computer Society.
Thomas Lundqvist. A WCET Analysis Method for Pipelined Microprocessors with Cache Memories. PhD thesis, School of Computer Science and Engineering, Chalmers University of Technology, Sweden, 2002.
R. Mancuso, R. Dudko, and M. Caccamo. Light-PREM: Automated software refactoring for predictable execution on cots embedded systems. In 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, pages 1-10, Aug 2014.
Alessandra Melani, Marko Bertogna, Vincenzo Bonifaci, Alberto Marchetti-Spaccamela, and Giorgio Buttazzo. Memory-processor co-scheduling in fixed priority systems. In Proceedings of the 23rd International Conference on Real Time and Networks Systems, RTNS'15, pages 87-96, New York, NY, USA, 2015. ACM.
Sparsh Mittal. A survey of recent prefetching techniques for processor caches. ACM Comput. Surv., 49(2):35:1-35:35, August 2016.
Nghi Nguyen, Angel Dominguez, and Rajeev Barua. Memory allocation for embedded systems with a compile-time-unknown scratch-pad size. ACM Trans. Embed. Comput. Syst., 8(3):21:1-21:32, April 2009.
David Patterson and John L. Hennessy. Computer architecture: a quantitative approach. Elsevier, 2012.
R. Pellizzoni, E. Betti, S. Bak, G. Yao, J. Criswell, M. Caccamo, and R. Kegley. A predictable execution model for cots-based embedded systems. In 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 269-279, April 2011.
Muhammad R. Soliman and Rodolfo Pellizzoni. Data Scratchpad Prefetching for Real-time Systems. Technical report, University of Waterloo, UWSpace, 2017. URL: http://hdl.handle.net/10012/11837.
V. Suhendra, T. Mitra, A. Roychoudhury, and Ting Chen. Wcet centric data allocation to scratchpad memory. In 26th IEEE International Real-Time Systems Symposium (RTSS'05), pages 10 pp.-232, Dec 2005.
R. Tabish, R. Mancuso, S. Wasly, A. Alhammad, S. S. Phatak, R. Pellizzoni, and M. Caccamo. A real-time scratchpad-centric os for multi-core embedded systems. In 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pages 1-11, April 2016.
Stephan Thesing. Safe and Precise WCET Determination by Abstract Interpretation of Pipeline Models. PhD thesis, Universität des Saarlandes, 2004.
Sumesh Udayakumaran, Angel Dominguez, and Rajeev Barua. Dynamic allocation for scratch-pad memory using compile-time decisions. ACM Trans. Embed. Comput. Syst., 5(2):472-511, May 2006.
Jussi Vanhatalo, Hagen Völzer, and Jana Koehler. The refined process structure tree. In Proceedings of the 6th International Conference on Business Process Management, BPM'08, pages 100-115, Berlin, Heidelberg, 2008. Springer-Verlag.
J. Whitham and N. Audsley. Studying the applicability of the scratchpad memory management unit. In 2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 205-214, April 2010.
Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David B. Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter P. Puschner, Jan Staschulat, and Per Stenström. The worst-case execution-time problem: Overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst., 7(3):36:1-36:53, May 2008. URL: http://dx.doi.org/10.1145/1347375.1347389.
Xuejun Yang, Li Wang, Jingling Xue, Tao Tang, Xiaoguang Ren, and Sen Ye. Improving scratchpad allocation with demand-driven data tiling. In Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES'10, pages 127-136, New York, NY, USA, 2010. ACM.
Y. Yang, M. Wang, Z. Shao, and M. Guo. Dynamic scratch-pad memory management with data pipelining for embedded systems. In Computational Science and Engineering, 2009. CSE'09. International Conference on, volume 2, pages 358-365, Aug 2009.

WCET-Driven Dynamic Data Scratchpad Management With Compiler-Directed Prefetching

Authors Muhammad Refaat Soliman, Rodolfo Pellizzoni

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics

References

Thanks for your feedback!

Could not send message