A Bandwidth Reservation Mechanism for AXI-Based Hardware Accelerators on FPGAs

Pagani, Marco; Rossi, Enrico; Biondi, Alessandro; Marinoni, Mauro; Lipari, Giuseppe; Buttazzo, Giorgio

doi:10.4230/LIPIcs.ECRTS.2019.24

Abstract

Hardware platforms for real-time embedded systems are evolving towards heterogeneous architectures comprising different types of processing cores and dedicated hardware accelerators, which can be implemented on silicon or dynamically deployed on FPGA fabric. Such accelerators typically access a shared memory to exchange a significant amount of data with other processing elements. Existing COTS solutions focus on maximizing the overall throughput of the system, rather than guaranteeing the timing constraints of individual hardware accelerators. This paper presents the AXI budgeting unit (ABU), a hardware-based solution to implement a bandwidth reservation mechanism on top of the AMBA AXI standard infrastructure for hardware accelerators deployed on FPGAs. An accurate and tractable model and the corresponding analysis are also proposed to bound the response time of hardware accelerators in the presence of ABUs, in order to verify whether they can complete before their deadlines. Finally, a set of experiments is reported to evaluate the proposed approach on a state-of-the-art platform, namely the Zynq-7020 by Xilinx. The resource consumption of the ABU has been quantified to be less than 1% of the total FPGA resources of the Zynq-7020.

