Search Results

Documents authored by Corporaal, Henk


Document
Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs

Authors: Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis

Published in: OASIcs, Volume 141, 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)


Abstract
Since the emergence of transformer-based models, the computational demands for Large Language Model (LLM) inference have been increasing exponentially, primarily due to their compounding parameter sizes, their structural complexity, and the use of non-linear functions. This tendency leads to the necessity of deploying them on low-power edge devices and DNN accelerators, to fuel next-generation agentic AI systems. Coarse-Grained Reconfigurable Architectures (CGRAs) have proven to be a compelling paradigm for edge acceleration, combining the programmability of general-purpose platforms with the high performance and energy efficiency associated with ASICs. In this work, we introduce an end-to-end performance modeling and mapping framework for LLM inference on heterogeneous CGRAs. Our methodology enables rapid exploration of the micro-architectural design space parameters, i.e., the number of processing elements, vector sizes, and memory configurations, by providing an accurate, explainable, and analytical CGRA performance modeling methodology, with an average cycle error of 0.9%. Architecturally, we build upon R-Blocks, a heterogeneous CGRA platform, and extend it to support floating-point arithmetic operations as well as a full-stack compilation and mapping flow for both full (FP32) and quantized (INT8) Llama2 models. The proposed methodology, evaluated on a 22nm technology node, achieves superior peak performance per Watt compared to related works such as REVAMP and CFEACT (1.8× and 2.8× respectively).

Cite as

Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis. Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs. In 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026). Open Access Series in Informatics (OASIcs), Volume 141, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


@InProceedings{kefallinos_et_al:OASIcs.PARMA-DITAM.2026.8,
  author =	{Kefallinos, Dionysios and Alexandris, Georgios and Maras, Alexis and Chaidos, Panagiotis and Gomony, Manil Dev and Corporaal, Henk and Soudris, Dimitrios and Xydis, Sotirios},
  title =	{{Performance Modeling \& Mapping of LLM Inference on Heterogeneous Vectorized CGRAs}},
  booktitle =	{17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-416-1},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{141},
  editor =	{Baroffio, Davide and Busia, Paola and Denisov, Lev and Shukla, Nitin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2026.8},
  URN =		{urn:nbn:de:0030-drops-256752},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2026.8},
  annote =	{Keywords: Edge AI, LLM, CGRA, Heterogeneous Architectures, Performance Modeling, Hardware Acceleration, Low Power Computing}
}

Document
An Automated Flow to Map Throughput Constrained Applications to a MPSoC

Authors: Roel Jordans, Firew Siyoum, Sander Stuijk, Akash Kumar, and Henk Corporaal

Published in: OASIcs, Volume 18, Bringing Theory to Practice: Predictability and Performance in Embedded Systems (2011)


Abstract
This paper describes a design flow to map throughput-constrained applications onto a Multi-processor System-on-Chip (MPSoC). It integrates several state-of-the-art mapping and synthesis tools into an automated tool flow. The flow takes as input a throughput-constrained application modeled as a synchronous dataflow graph, a C-based implementation of each actor in the graph, and a template-based architecture description. Using these inputs, the tool flow generates an MPSoC platform tailored to the application requirements and subsequently maps the application to this platform. The output of the flow is an FPGA programmable bit file. An easily extensible template-based architecture is presented; it allows fast and flexible generation of a predictable platform that can be synthesized using the presented tool flow. The effectiveness of the tool flow is demonstrated by mapping an MJPEG decoder onto our MPSoC platform. This case study shows that our flow is able to provide a tight, conservative bound on the worst-case throughput of the FPGA implementation. The presented tool flow is freely available at http://www.es.ele.tue.nl/mamps.

Cite as

Roel Jordans, Firew Siyoum, Sander Stuijk, Akash Kumar, and Henk Corporaal. An Automated Flow to Map Throughput Constrained Applications to a MPSoC. In Bringing Theory to Practice: Predictability and Performance in Embedded Systems. Open Access Series in Informatics (OASIcs), Volume 18, pp. 47-58, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2011)


@InProceedings{jordans_et_al:OASIcs.PPES.2011.47,
  author =	{Jordans, Roel and Siyoum, Firew and Stuijk, Sander and Kumar, Akash and Corporaal, Henk},
  title =	{{An Automated Flow to Map Throughput Constrained Applications to a MPSoC}},
  booktitle =	{Bringing Theory to Practice: Predictability and Performance in Embedded Systems},
  pages =	{47--58},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-28-6},
  ISSN =	{2190-6807},
  year =	{2011},
  volume =	{18},
  editor =	{Lucas, Philipp and Wilhelm, Reinhard},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PPES.2011.47},
  URN =		{urn:nbn:de:0030-drops-30819},
  doi =		{10.4230/OASIcs.PPES.2011.47},
  annote =	{Keywords: design flow automation, multi-processor system-on-chip, throughput constrained, synchronous data-flow graphs}
}