% Copyright James H. Anderson; licensed under the Creative Commons Attribution 4.0 International license.
@InProceedings{bakita_et_al:LIPIcs.ECRTS.2025.21,
  author =    {Bakita, Joshua and Anderson, James H.},
  title =     {{Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems}},
  booktitle = {37th Euromicro Conference on Real-Time Systems (ECRTS 2025)},
  pages =     {21:1--21:25},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-377-5},
  ISSN =      {1868-8969},
  year =      {2025},
  volume =    {335},
  editor =    {Mancuso, Renato},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2025.21},
  URN =       {urn:nbn:de:0030-drops-235998},
  doi =       {10.4230/LIPIcs.ECRTS.2025.21},
  annote =    {Keywords: Real-time systems, composable systems, graphics processing units, CUDA},
  abstract =  {As GPU-using tasks become more common in embedded, safety-critical systems, efficiency demands necessitate sharing a single GPU among multiple tasks. Unfortunately, existing ways to schedule multiple tasks onto a GPU often result in either a loss of the ability to meet deadlines or a loss of efficiency. In this work, we develop a system-level spatial compute partitioning mechanism for NVIDIA GPUs and demonstrate that it can be used to execute tasks efficiently without compromising timing predictability. Our tool, called nvtaskset, supports composable systems by requiring no task, driver, or hardware modifications. In our evaluation, we demonstrate sub-1-$\mu$s overheads, stronger partition enforcement, and finer-granularity partitioning when using our mechanism instead of NVIDIA's Multi-Process Service (MPS) or Multi-Instance GPU (MIG) features.}
}