AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads

Authors: Nathan Otterness, James H. Anderson





Author Details

Nathan Otterness
  • The University of North Carolina at Chapel Hill, NC, USA
James H. Anderson
  • The University of North Carolina at Chapel Hill, NC, USA

Cite As

Nathan Otterness and James H. Anderson. AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads. In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 165, pp. 10:1-10:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ECRTS.2020.10

Abstract

Graphics processing units (GPUs) manufactured by NVIDIA continue to dominate many fields of research, including real-time GPU management. The status of NVIDIA’s GPUs as a key enabling technology for deep learning and image processing makes this unsurprising, especially when combined with the company’s push into embedded, safety-critical domains like autonomous driving. NVIDIA’s primary competitor, AMD, has received comparatively little attention, due in part to few embedded offerings and a lack of support from popular deep-learning toolkits. Recently, however, AMD’s ROCm (Radeon Open Compute) software platform was made available to address at least the second of these two issues. Is ROCm worth the attention of safety-critical software developers? To answer this question, this paper explores the features and pitfalls of AMD GPUs, focusing on how they contrast with NVIDIA’s GPU hardware and software. We argue that an open software stack such as ROCm may be able to provide much-needed flexibility and reproducibility in the context of real-time GPU research, where new algorithmic or analysis techniques should typically remain agnostic to the underlying GPU architecture. In support of this claim, we summarize how closed-source platforms have obstructed prior research using NVIDIA GPUs, and then demonstrate that AMD may be a viable alternative by modifying components of the ROCm software stack to implement spatial partitioning. Finally, we present a case study using the PyTorch deep-learning framework that demonstrates the impact such modifications can have on complex real-world software.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Heterogeneous (hybrid) systems
  • Computer systems organization → Real-time systems
  • Software and its engineering → Scheduling
  • Software and its engineering → Concurrency control
  • Computing methodologies → Graphics processors
  • Computing methodologies → Concurrent computing methodologies
Keywords
  • real-time systems
  • graphics processing units
  • parallel computing

