Search Results

Documents authored by Xydis, Sotirios


Document
Linking High-Level Synthesis with FPGA Runtime Orchestration

Authors: Despoina Tomkou, Aggelos Ferikoglou, Dimosthenis Masouros, Sotirios Xydis, and Dimitrios Soudris

Published in: OASIcs, Volume 141, 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)


Abstract
FPGAs are increasingly being adopted across the edge-to-cloud continuum due to their ability to provide both high performance and energy efficiency. However, the complexity of programming FPGAs often leads to deployed designs that underutilize available resources. FPGA multi-tenancy has been proposed to enhance resource utilization, yet monolithic designs and dynamic workload demands continue to challenge efficient FPGA usage and compliance with Quality of Service requirements. To address these issues, we propose a novel framework for the optimal orchestration of FPGAs across the edge-to-cloud continuum while meeting user demands. The framework generates approximations of Pareto-optimal designs for each application, capturing trade-offs between performance and resource usage with minimal bitstream generation. This information allows the runtime orchestrator to select the most suitable design based on available PR regions and the QoS requirements of each user. Experimental results demonstrate that the proposed approach achieves an average reduction of QoS violations by a factor of 8.1× across diverse workloads and baseline configurations. Overall, the framework offers a practical and effective solution for realizing FPGA-as-a-Service across the edge-to-cloud continuum.

Cite as

Despoina Tomkou, Aggelos Ferikoglou, Dimosthenis Masouros, Sotirios Xydis, and Dimitrios Soudris. Linking High-Level Synthesis with FPGA Runtime Orchestration. In 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026). Open Access Series in Informatics (OASIcs), Volume 141, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{tomkou_et_al:OASIcs.PARMA-DITAM.2026.7,
  author =	{Tomkou, Despoina and Ferikoglou, Aggelos and Masouros, Dimosthenis and Xydis, Sotirios and Soudris, Dimitrios},
  title =	{{Linking High-Level Synthesis with FPGA Runtime Orchestration}},
  booktitle =	{17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-416-1},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{141},
  editor =	{Baroffio, Davide and Busia, Paola and Denisov, Lev and Shukla, Nitin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2026.7},
  URN =		{urn:nbn:de:0030-drops-256746},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2026.7},
  annote =	{Keywords: FPGA, Orchestration, Partial Reconfiguration, FPGAaaS}
}
Document
Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs

Authors: Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis

Published in: OASIcs, Volume 141, 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)


Abstract
Since the emergence of transformer-based models, the computational demands for Large Language Model (LLM) inference have been increasing exponentially, primarily due to their compounding parameter sizes, their structural complexity, and the use of non-linear functions. This tendency leads to the necessity of deploying them on low-power edge devices and DNN accelerators, to fuel next-generation agentic AI systems. Coarse-Grained Reconfigurable Architectures (CGRAs) have proven to be a compelling paradigm for edge acceleration, combining the programmability of general-purpose platforms with the high performance and energy efficiency associated with ASICs. In this work, we introduce an end-to-end performance modeling and mapping framework for LLM inference on heterogeneous CGRAs. Our methodology enables rapid exploration of the micro-architectural design space parameters, i.e., the number of processing elements, vector sizes, and memory configurations, by providing an accurate, explainable, and analytical CGRA performance modeling methodology, with an average cycle error of 0.9%. Architecturally, we build upon R-Blocks, a heterogeneous CGRA platform, and extend it to support floating-point arithmetic operations as well as a full-stack compilation and mapping flow for both full (FP32) and quantized (INT8) Llama2 models. The proposed methodology, evaluated on a 22nm technology node, achieves superior peak performance per Watt compared to related works such as REVAMP and CFEACT (1.8× and 2.8× respectively).

Cite as

Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis. Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs. In 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026). Open Access Series in Informatics (OASIcs), Volume 141, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{kefallinos_et_al:OASIcs.PARMA-DITAM.2026.8,
  author =	{Kefallinos, Dionysios and Alexandris, Georgios and Maras, Alexis and Chaidos, Panagiotis and Gomony, Manil Dev and Corporaal, Henk and Soudris, Dimitrios and Xydis, Sotirios},
  title =	{{Performance Modeling \& Mapping of LLM Inference on Heterogeneous Vectorized CGRAs}},
  booktitle =	{17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-416-1},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{141},
  editor =	{Baroffio, Davide and Busia, Paola and Denisov, Lev and Shukla, Nitin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2026.8},
  URN =		{urn:nbn:de:0030-drops-256752},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2026.8},
  annote =	{Keywords: Edge AI, LLM, CGRA, Heterogeneous Architectures, Performance Modeling, Hardware Acceleration, Low Power Computing}
}
Document
Complete Volume
OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume

Authors: João Bispo, Sotirios Xydis, Serena Curzel, and Luís Miguel Sousa

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume

Cite as

15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 1-88, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@Proceedings{bispo_et_al:OASIcs.PARMA-DITAM.2024,
  title =	{{OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{1--88},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024},
  URN =		{urn:nbn:de:0030-drops-196938},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024},
  annote =	{Keywords: OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: João Bispo, Sotirios Xydis, Serena Curzel, and Luís Miguel Sousa

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bispo_et_al:OASIcs.PARMA-DITAM.2024.0,
  author =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{0:i--0:x},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.0},
  URN =		{urn:nbn:de:0030-drops-196947},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Resource Aware GPU Scheduling in Kubernetes Infrastructure

Authors: Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris

Published in: OASIcs, Volume 88, 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)


Abstract
Nowadays, there is an ever-increasing number of artificial intelligence inference workloads pushed and executed on the cloud. To effectively serve and manage the computational demands, data center operators have provisioned their infrastructures with accelerators. Specifically for GPUs, support for efficient management lacks, as state-of-the-art schedulers and orchestrators, threat GPUs only as typical compute resources ignoring their unique characteristics and application properties. This phenomenon combined with the GPU over-provisioning problem leads to severe resource under-utilization. Even though prior work has addressed this problem by colocating applications into a single accelerator device, its resource agnostic nature does not manage to face the resource under-utilization and quality of service violations especially for latency critical applications. In this paper, we design a resource aware GPU scheduling framework, able to efficiently colocate applications on the same GPU accelerator card. We integrate our solution with Kubernetes, one of the most widely used cloud orchestration frameworks. We show that our scheduler can achieve 58.8% lower end-to-end job execution time 99%-ile, while delivering 52.5% higher GPU memory usage, 105.9% higher GPU utilization percentage on average and 44.4% lower energy consumption on average, compared to the state-of-the-art schedulers, for a variety of ML representative workloads.

Cite as

Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris. Resource Aware GPU Scheduling in Kubernetes Infrastructure. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{ferikoglou_et_al:OASIcs.PARMA-DITAM.2021.4,
  author =	{Ferikoglou, Aggelos and Masouros, Dimosthenis and Tzenetopoulos, Achilleas and Xydis, Sotirios and Soudris, Dimitrios},
  title =	{{Resource Aware GPU Scheduling in Kubernetes Infrastructure}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{4:1--4:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.4},
  URN =		{urn:nbn:de:0030-drops-136403},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.4},
  annote =	{Keywords: cloud computing, GPU scheduling, kubernetes, heterogeneity}
}
Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail