AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads

Otterness, Nathan; Anderson, James H.

doi:10.4230/LIPIcs.ECRTS.2020.10

Abstract

Graphics processing units (GPUs) manufactured by NVIDIA continue to dominate many fields of research, including real-time GPU management. NVIDIA’s status as a key enabling technology for deep learning and image processing makes this unsurprising, especially when combined with the company’s push into embedded, safety-critical domains like autonomous driving. NVIDIA’s primary competitor, AMD, has received comparatively little attention, due in part to few embedded offerings and a lack of support from popular deep-learning toolkits. Recently, however, AMD’s ROCm (Radeon Open Compute) software platform was made available to address at least the second of these two issues. But is ROCm worth the attention of safety-critical software developers? To answer this question, this paper explores the features and pitfalls of AMD GPUs, focusing on how they differ from NVIDIA’s GPU hardware and software. We argue that an open software stack such as ROCm may be able to provide much-needed flexibility and reproducibility in the context of real-time GPU research, where new algorithmic or analysis techniques should typically remain agnostic to the underlying GPU architecture. In support of this claim, we summarize how closed-source platforms have obstructed prior research using NVIDIA GPUs, and then demonstrate that AMD may be a viable alternative by modifying components of the ROCm software stack to implement spatial partitioning. Finally, we present a case study using the PyTorch deep-learning framework that demonstrates the impact such modifications can have on complex real-world software.

