A Bandwidth Reservation Mechanism for AXI-Based Hardware Accelerators on FPGAs

Authors Marco Pagani, Enrico Rossi, Alessandro Biondi, Mauro Marinoni, Giuseppe Lipari, Giorgio Buttazzo




File

LIPIcs.ECRTS.2019.24.pdf
  • Filesize: 0.72 MB
  • 24 pages


Author Details

Marco Pagani
  • Scuola Superiore Sant'Anna, Pisa, Italy
  • Université de Lille, CNRS, Centrale Lille, UMR 9189, CRIStAL, Lille, France
Enrico Rossi
  • Scuola Superiore Sant'Anna, Pisa, Italy
Alessandro Biondi
  • Scuola Superiore Sant'Anna, Pisa, Italy
Mauro Marinoni
  • Scuola Superiore Sant'Anna, Pisa, Italy
Giuseppe Lipari
  • Université de Lille, CNRS, Centrale Lille, UMR 9189, CRIStAL, Lille, France
Giorgio Buttazzo
  • Scuola Superiore Sant'Anna, Pisa, Italy

Cite As

Marco Pagani, Enrico Rossi, Alessandro Biondi, Mauro Marinoni, Giuseppe Lipari, and Giorgio Buttazzo. A Bandwidth Reservation Mechanism for AXI-Based Hardware Accelerators on FPGAs. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 24:1-24:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.ECRTS.2019.24

Abstract

Hardware platforms for real-time embedded systems are evolving towards heterogeneous architectures comprising different types of processing cores and dedicated hardware accelerators, which can be implemented on silicon or dynamically deployed on FPGA fabric. Such accelerators typically access a shared memory to exchange a significant amount of data with other processing elements. Existing COTS solutions focus on maximizing the overall throughput of the system, rather than guaranteeing the timing constraints of individual hardware accelerators. This paper presents the AXI budgeting unit (ABU), a hardware-based solution to implement a bandwidth reservation mechanism on top of the AMBA AXI standard infrastructure for hardware accelerators deployed on FPGAs. An accurate and tractable model and the corresponding analysis are also proposed to bound the response time of hardware accelerators in the presence of ABUs, in order to verify whether they can complete before their deadlines. Finally, a set of experiments is reported to evaluate the proposed approach on a state-of-the-art platform, namely the Zynq-7020 by Xilinx. The resource consumption of the ABU has been quantified to be less than 1% of the total FPGA resources of the Zynq-7020.
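
To make the idea of bandwidth reservation concrete, the following is a minimal sketch of a generic periodic-budget regulator: a bus master may issue at most a fixed number of transactions per period, and further requests are stalled until the budget is recharged. This is an illustration only and not the ABU design described in the paper; the abstract does not specify the ABU's internals, so the class `BudgetRegulator`, the parameters `budget` and `period`, and the helper `simulate` are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class BudgetRegulator:
    """Generic periodic-budget regulator (illustrative, not the ABU design)."""
    budget: int         # transactions allowed per period (hypothetical parameter)
    period: int         # period length in clock cycles (hypothetical parameter)
    remaining: int = 0  # transactions still available in the current period

    def tick(self, cycle: int, wants_transaction: bool) -> bool:
        """Advance one cycle; return True if a transaction is granted."""
        if cycle % self.period == 0:
            self.remaining = self.budget  # recharge at each period boundary
        if wants_transaction and self.remaining > 0:
            self.remaining -= 1
            return True
        return False  # no request, or budget exhausted and the request stalls


def simulate(budget: int, period: int, cycles: int) -> int:
    """Count grants for a master that requests a transaction on every cycle."""
    reg = BudgetRegulator(budget, period)
    return sum(reg.tick(c, True) for c in range(cycles))


if __name__ == "__main__":
    # A greedy master limited to 3 transactions every 10 cycles.
    print(simulate(budget=3, period=10, cycles=100))
```

Under these assumptions, a master that requests on every cycle is capped at `budget` transactions per `period`, regardless of how aggressively it tries to use the bus, which is the property a response-time analysis can then rely on.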

Subject Classification

ACM Subject Classification
  • Computer systems organization → Real-time systems
  • Computer systems organization → System on a chip
  • Hardware → Reconfigurable logic and FPGAs
Keywords
  • AXI Bus
  • Bandwidth Reservation
  • Hardware Acceleration
  • FPGA

