Vicuna: A Timing-Predictable RISC-V Vector Coprocessor for Scalable Parallel Computation

Platzer, Michael; Puschner, Peter

doi:10.4230/LIPIcs.ECRTS.2021.1

Abstract

In this work, we present Vicuna, a timing-predictable vector coprocessor. A vector processor can be scaled to satisfy the performance requirements of massively parallel computation tasks, yet its timing behavior can remain simple enough to be efficiently analyzable. Therefore, vector processors are promising for highly parallel real-time applications, such as advanced driver assistance systems and autonomous vehicles. Vicuna has been specifically tailored to address the needs of real-time applications. It features predictable and repeatable timing behavior and is free of timing anomalies, thus enabling effective and tight worst-case execution time (WCET) analysis while retaining the performance and efficiency commonly seen in other vector processors. We demonstrate our architecture’s predictability, scalability, and performance by running a set of benchmark applications on several configurations of Vicuna synthesized on a Xilinx 7 Series FPGA. The peak performance exceeds 10 billion 8-bit operations per second, which is in line with existing non-predictable soft vector-processing architectures.

