Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms

Authors Waqar Ali, Heechul Yun


Author Details

Waqar Ali
  • University of Kansas, Lawrence, USA
Heechul Yun
  • University of Kansas, Lawrence, USA

Cite As

Waqar Ali and Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

Integrated CPU-GPU architectures provide excellent acceleration capabilities for data-parallel applications on embedded platforms while meeting size, weight, and power (SWaP) requirements. However, sharing main memory between CPU applications and GPU kernels can severely slow the GPU kernels and diminish the performance gain provided by the GPU. For example, on the NVIDIA Jetson TX2, an integrated CPU-GPU platform, we observed that GPU kernels can suffer as much as a 3X slowdown in the worst case when memory-intensive CPU applications run alongside them. In this paper, we propose a software mechanism, which we call BWLOCK++, to protect the performance of GPU kernels from co-scheduled memory-intensive CPU applications.
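
The general idea of shielding a GPU kernel from memory-intensive CPU tasks can be illustrated with a toy model. The sketch below is not the BWLOCK++ implementation; it only assumes, for illustration, that a CPU core is limited to a fixed number of memory accesses per regulation period while a "bandwidth lock" is held on behalf of a running GPU kernel. The names CoreRegulator, budget_per_period, and simulate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreRegulator:
    """Toy per-core memory bandwidth regulator (hypothetical, illustration only)."""
    budget_per_period: int   # accesses allowed per period while the lock is held
    used: int = 0            # accesses already issued in the current period
    throttled: bool = False  # True once the budget is exhausted in this period

    def new_period(self) -> None:
        # At each period boundary the budget is replenished and the core resumes.
        self.used = 0
        self.throttled = False

    def on_memory_accesses(self, count: int, lock_held: bool) -> int:
        """Return how many of `count` requested accesses the core may issue."""
        if not lock_held:
            return count  # no GPU kernel to protect: the core is unregulated
        if self.throttled:
            return 0      # budget already exhausted in this period
        allowed = min(count, self.budget_per_period - self.used)
        self.used += allowed
        if self.used >= self.budget_per_period:
            self.throttled = True
        return allowed


def simulate(demand_per_period, lock_schedule, budget):
    """Issue one burst of CPU memory demand per period and record what is granted."""
    reg = CoreRegulator(budget_per_period=budget)
    issued = []
    for demand, lock_held in zip(demand_per_period, lock_schedule):
        reg.new_period()
        issued.append(reg.on_memory_accesses(demand, lock_held))
    return issued


# The lock is held only during periods 2 and 3 (a GPU kernel is assumed to run then).
print(simulate([100, 100, 100, 100], [False, True, True, False], budget=10))
```

In this model the CPU core runs unregulated whenever no GPU kernel needs protection, so CPU throughput is only sacrificed while the lock is held. The budget value and period structure are arbitrary choices made for the sketch.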

Subject Classification

ACM Subject Classification
  • Software and its engineering → Real-time schedulability
  • Computer systems organization → Heterogeneous (hybrid) systems
  • Computer systems organization → Processors and memory architectures

Keywords
  • GPU
  • memory bandwidth
  • resource contention
  • CPU throttling
  • fair scheduler



