Modeling and Analysis of Bus Contention for Hardware Accelerators in FPGA SoCs

Authors Francesco Restuccia, Marco Pagani, Alessandro Biondi, Mauro Marinoni, Giorgio Buttazzo




File

LIPIcs.ECRTS.2020.12.pdf
  • Filesize: 0.74 MB
  • 23 pages

Author Details

Francesco Restuccia
  • TeCIP Institute and Dept. of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy
Marco Pagani
  • TeCIP Institute, Scuola Superiore Sant'Anna, Pisa, Italy
  • Université de Lille, CNRS, Centrale Lille, UMR 9189, CRIStAL, Lille, France
Alessandro Biondi
  • TeCIP Institute and Dept. of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy
Mauro Marinoni
  • TeCIP Institute and Dept. of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy
Giorgio Buttazzo
  • TeCIP Institute and Dept. of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Pisa, Italy

Cite As

Francesco Restuccia, Marco Pagani, Alessandro Biondi, Mauro Marinoni, and Giorgio Buttazzo. Modeling and Analysis of Bus Contention for Hardware Accelerators in FPGA SoCs. In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 165, pp. 12:1-12:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ECRTS.2020.12

Abstract

FPGA System-on-Chips (SoCs) are heterogeneous platforms that combine general-purpose processors with a field-programmable gate array (FPGA) fabric. The FPGA fabric consists of programmable logic in which hardware accelerators can be deployed to speed up specific functionality. The main source of unpredictability when bounding the execution times of hardware accelerators is access to the shared memories via the on-chip bus. This work focuses on bounding the worst-case bus contention experienced by the hardware accelerators deployed in the FPGA fabric. To this end, it considers the AMBA AXI bus, the de facto standard communication interface in most commercial off-the-shelf (COTS) FPGA SoCs, and presents an analysis technique to bound the response times of hardware accelerators implemented on such platforms. A fine-grained model of the AXI bus and AXI interconnects is first provided. Then, contention delays are studied under hierarchical bus infrastructures of arbitrary depth. Finally, experimental results are presented to validate the proposed model with execution traces on two modern FPGA-based SoCs produced by Xilinx (from the Zynq-7000 and Zynq UltraScale+ families) and to assess the performance of the proposed analysis.

Subject Classification

ACM Subject Classification
  • Hardware → Interconnect
  • Hardware → Hardware accelerators
Keywords
  • Heterogeneous computing
  • Predictable hardware acceleration
  • FPGA SoCs
  • Multi-Master architectures

