Autonomy Today: Many Delay-Prone Black Boxes

Authors: Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, Yunhua Li

File

LIPIcs.ECRTS.2024.12.pdf
  • Filesize: 2.57 MB
  • 27 pages

Author Details

Sizhe Liu
  • University of North Carolina at Chapel Hill, NC, USA
Rohan Wagle
  • University of North Carolina at Chapel Hill, NC, USA
James H. Anderson
  • University of North Carolina at Chapel Hill, NC, USA
Ming Yang
  • WeRide Corp., San Jose, CA, USA
Chi Zhang
  • WeRide Corp., San Jose, CA, USA
Yunhua Li
  • WeRide Corp., San Jose, CA, USA

Acknowledgements

We thank Huazhong Ning at WeRide for coordinating this collaboration and BlackBerry QNX for supplying relevant software.

Cite As

Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, and Yunhua Li. Autonomy Today: Many Delay-Prone Black Boxes. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 12:1-12:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ECRTS.2024.12

Abstract

Machine-learning (ML) technology has been a key enabler in the push towards realizing ever more sophisticated autonomous-driving features. In deploying such technology, the automotive industry has relied heavily on using "black-box" software and hardware components that were originally intended for non-safety-critical contexts, without a full understanding of their real-time capabilities. A prime example of such a component is CUDA, which is fundamental to the acceleration of ML algorithms using NVIDIA GPUs. In this paper, evidence is presented demonstrating that CUDA can cause unbounded task delays. Such delays are the result of CUDA’s usage of synchronization mechanisms in the POSIX thread (pthread) library, so the latter is implicated as a delay-prone component as well. Such synchronization delays are shown to be the source of a system failure that occurred in an actual autonomous vehicle system during testing at WeRide. Motivated by these findings, a broader experimental study is presented that demonstrates several real-time deficiencies in CUDA, the glibc pthread library, Linux, and the POSIX interface of the safety-certified QNX Operating System for Safety. Partial mitigations for these deficiencies are presented and further actions are proposed for real-time researchers and developers to integrate more complete mitigations.
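
The abstract traces the delays to synchronization mechanisms in the pthread library. As a general POSIX illustration, and not the mitigation proposed in the paper, the sketch below shows the standard way application code asks for bounded priority inversion on a mutex: the priority-inheritance protocol (PTHREAD_PRIO_INHERIT). It assumes a POSIX system with priority-inheritance support (for example, Linux with glibc); the helper name init_pi_mutex is hypothetical. It cannot change how locks inside a closed-source component such as CUDA are configured, and POSIX defines no comparable protocol attribute for reader-writer locks (pthread_rwlock_t).

```c
#define _XOPEN_SOURCE 700 /* expose the POSIX mutex-protocol API even in strict C modes */
#include <pthread.h>

/* Hypothetical helper: initialize *m as a priority-inheritance mutex.
 * Returns 0 on success, or an error number (e.g. ENOTSUP if the
 * platform lacks priority-inheritance support) on failure. */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Under PTHREAD_PRIO_INHERIT, a thread holding the mutex runs at the
     * higher of its own priority and that of the highest-priority thread
     * blocked on it, so intermediate-priority threads cannot preempt the
     * holder while a higher-priority thread is waiting for the lock. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Callers then use pthread_mutex_lock and pthread_mutex_unlock as usual; only initialization differs. Priority inheritance bounds inversion in the classic uniprocessor setting, while multiprocessor locking raises further issues that this sketch does not address.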

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time operating systems
  • Software and its engineering → Process synchronization

Keywords
  • autonomous driving
  • CUDA programming
  • locking protocols
  • POSIX thread
  • operating systems
  • machine learning systems
  • real-time systems
