Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems

Yang, Ming; Otterness, Nathan; Amert, Tanya; Bakita, Joshua; Anderson, James H.; Smith, F. Donelson

doi:10.4230/LIPIcs.ECRTS.2018.20

Abstract

NVIDIA's CUDA API has enabled GPUs to be used as computing accelerators across a wide range of applications. This has resulted in performance gains in many application domains, but the underlying GPU hardware and software are subject to many non-obvious pitfalls. This is particularly problematic for safety-critical systems, where worst-case behaviors must be taken into account. While such behaviors were not a key concern for earlier CUDA users, the use of GPUs in autonomous vehicles has taken CUDA programs out of the sole domain of computer-vision and machine-learning experts and into safety-critical processing pipelines. Certification is necessary in this new domain, which is problematic because GPU software may have been developed without any regard for worst-case behaviors. Pitfalls when using CUDA in real-time autonomous systems can arise from a lack of specifics in the official documentation and from developers of GPU software not being aware of the implications of their design choices with regard to real-time requirements. This paper focuses on the particular challenges the real-time community faces when using CUDA-enabled GPUs for autonomous applications, and on best practices for applying real-time safety-critical principles.
