Multithread Accelerators on FPGAs: A Dataflow-Based Approach

Authors Francesco Ratto , Stefano Esposito , Carlo Sau , Luigi Raffo , Francesca Palumbo

Thumbnail PDF


  • Filesize: 3.96 MB
  • 14 pages

Document Identifiers

Author Details

Francesco Ratto
  • Università degli Studi di Cagliari, Italy
Stefano Esposito
  • Università degli Studi di Cagliari, Italy
Carlo Sau
  • Università degli Studi di Cagliari, Italy
Luigi Raffo
  • Università degli Studi di Cagliari, Italy
Francesca Palumbo
  • Università degli Studi di Sassari, Italy

Cite AsGet BibTex

Francesco Ratto, Stefano Esposito, Carlo Sau, Luigi Raffo, and Francesca Palumbo. Multithread Accelerators on FPGAs: A Dataflow-Based Approach. In 13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2022). Open Access Series in Informatics (OASIcs), Volume 100, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Multithreading is a well-known technique for general-purpose systems to deliver a substantial performance gain, raising resource efficiency by exploiting underutilization periods. With the increase of specialized hardware, resource efficiency became fundamental to master the introduced overhead of such kind of devices. In this work, we propose a model-based approach for designing specialized multithread hardware accelerators. This novel approach exploits dataflow models of applications and tagged tokens to let the resulting hardware support concurrent threads without the need to replicate the whole accelerator. Assessment is carried out over different versions of an accelerator for a compute-intensive step of modern video coding algorithms, under several feeding configurations. Results highlight that the proposed multithread accelerators achieve a valuable tradeoff: saving computational resources with respect to replicated parallel single-thread accelerators, while guaranteeing shorter waiting, response, and elaboration time than a unique single-thread accelerator multiplexed in time.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Data flow architectures
  • Computing methodologies → Concurrent algorithms
  • Hardware → Best practices for EDA
  • multithreading
  • dataflow
  • hardware acceleration
  • heterogeneous systems
  • tagged dataflow


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Shuvra S Bhattacharyya, Praveen K Murthy, and Edward A Lee. Software synthesis from dataflow graphs, volume 360. Springer Science & Business Media, 1996. Google Scholar
  2. Nicola Carta, Carlo Sau, Danilo Pani, Francesca Palumbo, and Luigi Raffo. A coarse-grained reconfigurable approach for low-power spike sorting architectures. In 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), pages 439-442. IEEE, 2013. Google Scholar
  3. Angelos Charalambidis, Nikolaos Papaspyrou, and Panos Rondogiannis. Tagged dataflow: a formal model for iterative map-reduce. In EDBT/ICDT Workshops, pages 29-36, 2014. Google Scholar
  4. Yen-Kuang Chen and Sun-Yuan Kung. Trend and challenge on system-on-a-chip designs. Journal of signal processing systems, 53(1):217-229, 2008. Google Scholar
  5. Jongsok Choi, Stephen Brown, and Jason Anderson. From software threads to parallel hardware in high-level synthesis for fpgas. In 2013 International Conference on Field-Programmable Technology (FPT), pages 270-277. IEEE, 2013. Google Scholar
  6. Tomasz S Czajkowski, Utku Aydonat, Dmitry Denisenko, John Freeman, Michael Kinsner, David Neto, Jason Wong, Peter Yiannacouras, and Deshanand P Singh. From opencl to high-performance hardware on fpgas. In 22nd international conference on field programmable logic and applications (FPL), pages 531-534. IEEE, 2012. Google Scholar
  7. William J Dally, Yatish Turakhia, and Song Han. Domain-specific hardware accelerators. Communications of the ACM, 63(7):48-57, 2020. Google Scholar
  8. Tiziana Fanni, Lin Li, Timo Viitanen, Carlo Sau, Renjie Xie, Francesca Palumbo, Luigi Raffo, Heikki Huttunen, Jarmo Takala, and Shuvra S Bhattacharyya. Hardware design methodology using lightweight dataflow and its integration with low power techniques. Journal of Systems Architecture, 78:15-29, 2017. Google Scholar
  9. Rajesh K Gupta and Giovanni De Micheli. Hardware-software cosynthesis for digital systems. IEEE Design & test of computers, 10(3):29-41, 1993. Google Scholar
  10. John L Hennessy and David A Patterson. A new golden age for computer architecture. Communications of the ACM, 62(2):48-60, 2019. Google Scholar
  11. Jens Huthmann, Julian Oppermann, and Andreas Koch. Automatic high-level synthesis of multi-threaded hardware accelerators. In 2014 24th International Conference on Field Programmable Logic and Applications (FPL), pages 1-4. IEEE, 2014. Google Scholar
  12. Jörn W Janneck, Ian D Miller, David B Parlour, Ghislain Roquier, Matthieu Wipliez, and Mickaël Raulet. Synthesizing hardware from dataflow programs. Journal of Signal Processing Systems, 63(2):241-249, 2011. Google Scholar
  13. Edward A Lee and David G Messerschmitt. Synchronous data flow. Proceedings of the IEEE, 75(9):1235-1245, 1987. Google Scholar
  14. Rishiyur S Nikhil et al. Executing a program on the mit tagged-token dataflow architecture. IEEE Transactions on computers, 39(3):300-318, 1990. Google Scholar
  15. Francesca Palumbo, Danilo Pani, Emanuele Manca, Luigi Raffo, Marco Mattavelli, and Ghislain Roquier. RVC: A multi-decoder CAL composer tool. In Proceedings of the 2010 Conference on Design & Architectures for Signal & Image Processing, DASIP 2010, Edinburgh, Scotland, UK, October 26-28, 2010, Electronic Chips & Systems design Initiative, ECSI, pages 144-151. IEEE, 2010. URL:
  16. Alexandros Papakonstantinou, Karthik Gururaj, John A Stratton, Deming Chen, Jason Cong, and Wen-Mei W Hwu. Fcuda: Enabling efficient compilation of cuda kernels onto fpgas. In 2009 IEEE 7th Symposium on Application Specific Processors, pages 35-42. IEEE, 2009. Google Scholar
  17. Alfonso Rodrıguez, Juan Valverde, and Eduardo de la Torre. Design of opencl-compatible multithreaded hardware accelerators with dynamic support for embedded fpgas. In 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), pages 1-7. IEEE, 2015. Google Scholar
  18. Leandro Santiago, Leandro AJ Marzulo, Brunno F Goldstein, Tiago AO Alves, and Felipe MG França. Stack-tagged dataflow. In 2014 International Symposium on Computer Architecture and High Performance Computing Workshop, pages 78-83. IEEE, 2014. Google Scholar
  19. Jocelyn Serot, Francois Berry, and Sameer Ahmed. Implementing stream-processing applications on fpgas: A dsl-based approach. In 2011 21st International Conference on Field Programmable Logic and Applications, pages 130-137. IEEE, 2011. Google Scholar
  20. Harald Simmler, Lorne Levinson, and Reinhard Männer. Multitasking on fpga coprocessors. In International Workshop on Field Programmable Logic and Applications, pages 121-130. Springer, 2000. Google Scholar
  21. Ying Wang, Xuegong Zhou, Lingli Wang, Jian Yan, Wayne Luk, Chenglian Peng, and Jiarong Tong. Spread: A streaming-based partially reconfigurable architecture and programming model. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(12):2179-2192, 2013. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail