Exploring iGPU Memory Interference Response to L2 Cache Locking

Mascareñas González, Alfonso; Chaudron, Jean-Baptiste; Leconte, Régine; Bouchebaba, Youcef; Doose, David

doi:10.4230/OASIcs.WCET.2023.3

11 pages

Document Identifiers

DOI: 10.4230/OASIcs.WCET.2023.3
URN: urn:nbn:de:0030-drops-184321

Author Details

Alfonso Mascareñas González

ISAE-SUPAERO, Université de Toulouse, France

Jean-Baptiste Chaudron

ISAE-SUPAERO, Université de Toulouse, France

Régine Leconte

ISAE-SUPAERO, Université de Toulouse, France

Youcef Bouchebaba

ONERA, Université de Toulouse, France

David Doose

ONERA, Université de Toulouse, France

Cite As

Alfonso Mascareñas González, Jean-Baptiste Chaudron, Régine Leconte, Youcef Bouchebaba, and David Doose. Exploring iGPU Memory Interference Response to L2 Cache Locking. In 21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023). Open Access Series in Informatics (OASIcs), Volume 114, pp. 3:1-3:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/OASIcs.WCET.2023.3

Abstract

The demand for parallel execution in real-time embedded applications has motivated the integration of GPUs as processing accelerators on embedded System-on-Chip (SoC) architectures, often leading to CPU-iGPU architectures. In the safety-critical domain, it is paramount to ensure that critical tasks do not exceed their execution deadlines. To ease the analysis of such tasks, we can make their worst-case execution time more predictable. One way to achieve this is to mitigate or control the memory interference generated by concurrently executing tasks by applying techniques such as cache partitioning, bank partitioning, cache locking, and bandwidth regulation. These techniques were originally applied to CPUs and, more recently, to GPUs as well. In this work, we focus on hardware-based L2 cache locking on iGPUs as a memory interference mitigation mechanism. We are interested in evaluating its capacity to reduce the worst-case and average-case execution times in different scenarios. Our measurement-based analysis was carried out on NVIDIA’s Jetson AGX Orin 64 GB MPSoC using four representative benchmarks (data resetting, 2D convolution, 3D convolution, and matrix upsampling).

Subject Classification

ACM Subject Classification

Computer systems organization → Embedded systems
Computer systems organization → Real-time systems

Keywords

iGPU
cache locking
real-time
memory interference
