Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms

Ali, Waqar; Yun, Heechul

doi:10.4230/LIPIcs.ECRTS.2018.19

ACM Subject Classification

Software and its engineering → Real-time schedulability
Computer systems organization → Heterogeneous (hybrid) systems
Computer systems organization → Processors and memory architectures

Keywords

GPU
memory bandwidth
resource contention
CPU throttling
fair scheduler

Abstract

Integrated CPU-GPU architectures provide excellent acceleration capabilities for data-parallel applications on embedded platforms while meeting size, weight, and power (SWaP) requirements. However, sharing main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by the GPU. For example, on the NVIDIA Jetson TX2, an integrated CPU-GPU platform, we observed that, in the worst case, GPU kernels can suffer as much as a 3X slowdown in the presence of co-running memory-intensive CPU applications. In this paper, we propose a software mechanism, which we call BWLOCK++, to protect the performance of GPU kernels from co-scheduled memory-intensive CPU applications.

Cite As

Waqar Ali and Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018). https://doi.org/10.4230/LIPIcs.ECRTS.2018.19

Author Details

Waqar Ali

University of Kansas, Lawrence, USA

Heechul Yun

University of Kansas, Lawrence, USA
