Leveraging Hardware QoS to Control Contention in the Xilinx Zynq UltraScale+ MPSoC

Authors: Alejandro Serrano-Cases, Juan M. Reina, Jaume Abella, Enrico Mezzetti, Francisco J. Cazorla



File

LIPIcs.ECRTS.2021.3.pdf
  • Filesize: 1.47 MB
  • 26 pages

Author Details

Alejandro Serrano-Cases
  • Barcelona Supercomputing Center (BSC), Spain
Juan M. Reina
  • Barcelona Supercomputing Center (BSC), Spain
Jaume Abella
  • Barcelona Supercomputing Center (BSC), Spain
  • Maspatechnologies S.L, Barcelona, Spain
Enrico Mezzetti
  • Barcelona Supercomputing Center (BSC), Spain
  • Maspatechnologies S.L, Barcelona, Spain
Francisco J. Cazorla
  • Barcelona Supercomputing Center (BSC), Spain
  • Maspatechnologies S.L, Barcelona, Spain

Cite As

Alejandro Serrano-Cases, Juan M. Reina, Jaume Abella, Enrico Mezzetti, and Francisco J. Cazorla. Leveraging Hardware QoS to Control Contention in the Xilinx Zynq UltraScale+ MPSoC. In 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 196, pp. 3:1-3:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ECRTS.2021.3

Abstract

The interference that co-running tasks generate on each other’s timing behavior remains one of the main challenges to be addressed before Multi-Processor Systems-on-Chip (MPSoCs) are fully embraced in critical systems such as those deployed in the avionics and automotive domains. Modern MPSoCs like the Xilinx Zynq UltraScale+ incorporate hardware Quality of Service (QoS) mechanisms that can help control contention among tasks. Given the distributed nature of modern MPSoCs, the route a request follows from its source (usually a compute element like a CPU) to its target (usually a memory) crosses several QoS points, each potentially implementing a different QoS mechanism. Mastering the QoS mechanisms individually, as well as their combined operation, is pivotal to obtaining the expected benefits from the QoS support. In this work, we perform, to our knowledge, the first qualitative and quantitative analysis of the distributed QoS mechanisms in the Xilinx Zynq UltraScale+ MPSoC. We empirically derive QoS information not covered by the technical documentation, and show the limitations and benefits of the available QoS support. To that end, we use a case study building on neural network kernels commonly used in autonomous systems in different real-time domains.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time system architecture
Keywords
  • Quality of Service
  • Real-Time Systems
  • MPSoC
  • Multicore Contention

