Multithread Accelerators on FPGAs: A Dataflow-Based Approach

Ratto, Francesco; Esposito, Stefano; Sau, Carlo; Raffo, Luigi; Palumbo, Francesca

doi:10.4230/OASIcs.PARMA-DITAM.2022.6

Abstract

Multithreading is a well-known technique that lets general-purpose systems deliver substantial performance gains, raising resource efficiency by exploiting periods of underutilization. With the rise of specialized hardware, resource efficiency has become fundamental to controlling the overhead that such devices introduce. In this work, we propose a model-based approach for designing specialized multithread hardware accelerators. This novel approach exploits dataflow models of applications and tagged tokens so that the resulting hardware supports concurrent threads without replicating the whole accelerator. Assessment is carried out over different versions of an accelerator for a compute-intensive step of modern video coding algorithms, under several feeding configurations. Results highlight that the proposed multithread accelerators achieve a valuable tradeoff: they save computational resources with respect to replicated parallel single-thread accelerators, while guaranteeing shorter waiting, response, and elaboration times than a single time-multiplexed single-thread accelerator.

Cite As

Francesco Ratto, Stefano Esposito, Carlo Sau, Luigi Raffo, and Francesca Palumbo. Multithread Accelerators on FPGAs: A Dataflow-Based Approach. In 13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2022). Open Access Series in Informatics (OASIcs), Volume 100, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.6

@InProceedings{ratto_et_al:OASIcs.PARMA-DITAM.2022.6,
  author =	{Ratto, Francesco and Esposito, Stefano and Sau, Carlo and Raffo, Luigi and Palumbo, Francesca},
  title =	{{Multithread Accelerators on FPGAs: A Dataflow-Based Approach}},
  booktitle =	{13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2022)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-231-0},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{100},
  editor =	{Palumbo, Francesca and Bispo, Jo\~{a}o and Cherubin, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2022.6},
  URN =		{urn:nbn:de:0030-drops-161225},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2022.6},
  annote =	{Keywords: multithreading, dataflow, hardware acceleration, heterogeneous systems, tagged dataflow}
}

Author Details

Francesco Ratto

Università degli Studi di Cagliari, Italy

Stefano Esposito

Università degli Studi di Cagliari, Italy

Carlo Sau

Università degli Studi di Cagliari, Italy

Luigi Raffo

Università degli Studi di Cagliari, Italy

Francesca Palumbo

Università degli Studi di Sassari, Italy

Funding

Prof. Palumbo is grateful to the University of Sassari, which supported her studies on this topic through the "fondo di Ateneo per la ricerca 2020".

