14 Search Results for "Xydis, Sotirios"


Volume

OASIcs, Volume 116

15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)

PARMA-DITAM 2024, January 18, 2024, Munich, Germany

Editors: João Bispo, Sotirios Xydis, Serena Curzel, and Luís Miguel Sousa

Artifact
Software
FPGAScheduler

Authors: Aggelos Ferikoglou


Abstract

Cite as

Aggelos Ferikoglou. FPGAScheduler (Software, Source Code). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@misc{dagstuhl-artifact-25581,
   title = {{FPGAScheduler}}, 
   author = {Ferikoglou, Aggelos},
   note = {Software, swhId: \href{https://archive.softwareheritage.org/swh:1:dir:2941cf80f4a5b396622da5ebb3f1fb76ffae7b8b;origin=https://github.com/aferikoglou/FPGAScheduler;visit=swh:1:snp:901fb2866292917bb3ed2a4e9973317816eb0787;anchor=swh:1:rev:a7ed7d4ed62bc17fd412752ec49f59cf8f313499}{\texttt{swh:1:dir:2941cf80f4a5b396622da5ebb3f1fb76ffae7b8b}} (visited on 2026-04-10)},
   url = {https://github.com/aferikoglou/FPGAScheduler},
   doi = {10.4230/artifacts.25581},
}
Document
Linking High-Level Synthesis with FPGA Runtime Orchestration

Authors: Despoina Tomkou, Aggelos Ferikoglou, Dimosthenis Masouros, Sotirios Xydis, and Dimitrios Soudris

Published in: OASIcs, Volume 141, 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)


Abstract
FPGAs are increasingly being adopted across the edge-to-cloud continuum due to their ability to provide both high performance and energy efficiency. However, the complexity of programming FPGAs often leads to deployed designs that underutilize available resources. FPGA multi-tenancy has been proposed to enhance resource utilization, yet monolithic designs and dynamic workload demands continue to challenge efficient FPGA usage and compliance with Quality of Service requirements. To address these issues, we propose a novel framework for the optimal orchestration of FPGAs across the edge-to-cloud continuum while meeting user demands. The framework generates approximations of Pareto-optimal designs for each application, capturing trade-offs between performance and resource usage with minimal bitstream generation. This information allows the runtime orchestrator to select the most suitable design based on available PR regions and the QoS requirements of each user. Experimental results demonstrate that the proposed approach achieves an average reduction of QoS violations by a factor of 8.1× across diverse workloads and baseline configurations. Overall, the framework offers a practical and effective solution for realizing FPGA-as-a-Service across the edge-to-cloud continuum.

Cite as

Despoina Tomkou, Aggelos Ferikoglou, Dimosthenis Masouros, Sotirios Xydis, and Dimitrios Soudris. Linking High-Level Synthesis with FPGA Runtime Orchestration. In 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026). Open Access Series in Informatics (OASIcs), Volume 141, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{tomkou_et_al:OASIcs.PARMA-DITAM.2026.7,
  author =	{Tomkou, Despoina and Ferikoglou, Aggelos and Masouros, Dimosthenis and Xydis, Sotirios and Soudris, Dimitrios},
  title =	{{Linking High-Level Synthesis with FPGA Runtime Orchestration}},
  booktitle =	{17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-416-1},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{141},
  editor =	{Baroffio, Davide and Busia, Paola and Denisov, Lev and Shukla, Nitin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2026.7},
  URN =		{urn:nbn:de:0030-drops-256746},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2026.7},
  annote =	{Keywords: FPGA, Orchestration, Partial Reconfiguration, FPGAaaS}
}
Document
Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs

Authors: Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis

Published in: OASIcs, Volume 141, 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)


Abstract
Since the emergence of transformer-based models, the computational demands for Large Language Model (LLM) inference have been increasing exponentially, primarily due to their compounding parameter sizes, their structural complexity, and the use of non-linear functions. This tendency leads to the necessity of deploying them on low-power edge devices and DNN accelerators, to fuel next-generation agentic AI systems. Coarse-Grained Reconfigurable Architectures (CGRAs) have proven to be a compelling paradigm for edge acceleration, combining the programmability of general-purpose platforms with the high performance and energy efficiency associated with ASICs. In this work, we introduce an end-to-end performance modeling and mapping framework for LLM inference on heterogeneous CGRAs. Our methodology enables rapid exploration of the micro-architectural design space parameters, i.e., the number of processing elements, vector sizes, and memory configurations, by providing an accurate, explainable, and analytical CGRA performance modeling methodology, with an average cycle error of 0.9%. Architecturally, we build upon R-Blocks, a heterogeneous CGRA platform, and extend it to support floating-point arithmetic operations as well as a full-stack compilation and mapping flow for both full (FP32) and quantized (INT8) Llama2 models. The proposed methodology, evaluated on a 22nm technology node, achieves superior peak performance per Watt compared to related works such as REVAMP and CFEACT (1.8× and 2.8× respectively).

Cite as

Dionysios Kefallinos, Georgios Alexandris, Alexis Maras, Panagiotis Chaidos, Manil Dev Gomony, Henk Corporaal, Dimitrios Soudris, and Sotirios Xydis. Performance Modeling & Mapping of LLM Inference on Heterogeneous Vectorized CGRAs. In 17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026). Open Access Series in Informatics (OASIcs), Volume 141, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)


Copy BibTex To Clipboard

@InProceedings{kefallinos_et_al:OASIcs.PARMA-DITAM.2026.8,
  author =	{Kefallinos, Dionysios and Alexandris, Georgios and Maras, Alexis and Chaidos, Panagiotis and Gomony, Manil Dev and Corporaal, Henk and Soudris, Dimitrios and Xydis, Sotirios},
  title =	{{Performance Modeling \& Mapping of LLM Inference on Heterogeneous Vectorized CGRAs}},
  booktitle =	{17th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 15th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2026)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-416-1},
  ISSN =	{2190-6807},
  year =	{2026},
  volume =	{141},
  editor =	{Baroffio, Davide and Busia, Paola and Denisov, Lev and Shukla, Nitin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2026.8},
  URN =		{urn:nbn:de:0030-drops-256752},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2026.8},
  annote =	{Keywords: Edge AI, LLM, CGRA, Heterogeneous Architectures, Performance Modeling, Hardware Acceleration, Low Power Computing}
}
Document
System-Level Timing Performance Estimation Based on a Unifying HW/SW Performance Metric

Authors: Vittoriano Muttillo, Vincenzo Stoico, Giacomo Valente, Marco Santic, Luigi Pomante, and Daniele Frigioni

Published in: OASIcs, Volume 127, 16th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 14th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2025)


Abstract
The rapidly increasing complexity of embedded systems and the critical impact of non-functional requirements demand the adoption of an appropriate system-level HW/SW co-design methodology. This methodology tries to satisfy all design requirements by simultaneously considering several alternative HW/SW implementations. In this context, early performance estimation approaches are crucial in reducing the design space, thereby minimizing design time and cost. To address the challenge of system-level performance estimation, this work presents and formalizes a novel approach based on a unifying HW/SW performance metric for early execution time estimation. The proposed approach estimates the execution time of a C function when executed by different HW/SW processor technologies. The approach is validated through an extensive experimental study, demonstrating its effectiveness and efficiency in terms of estimation error (i.e., lower than 10%) and estimation time (close to zero) when compared to existing methods in the literature.

Cite as

Vittoriano Muttillo, Vincenzo Stoico, Giacomo Valente, Marco Santic, Luigi Pomante, and Daniele Frigioni. System-Level Timing Performance Estimation Based on a Unifying HW/SW Performance Metric. In 16th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 14th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2025). Open Access Series in Informatics (OASIcs), Volume 127, pp. 3:1-3:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{muttillo_et_al:OASIcs.PARMA-DITAM.2025.3,
  author =	{Muttillo, Vittoriano and Stoico, Vincenzo and Valente, Giacomo and Santic, Marco and Pomante, Luigi and Frigioni, Daniele},
  title =	{{System-Level Timing Performance Estimation Based on a Unifying HW/SW Performance Metric}},
  booktitle =	{16th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 14th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2025)},
  pages =	{3:1--3:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-363-8},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{127},
  editor =	{Cattaneo, Daniele and Fazio, Maria and Kosmidis, Leonidas and Morabito, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2025.3},
  URN =		{urn:nbn:de:0030-drops-229071},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2025.3},
  annote =	{Keywords: embedded systems, hw/sw co-design, performance estimation, lasso, machine learning}
}
Document
Complete Volume
OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume

Authors: João Bispo, Sotirios Xydis, Serena Curzel, and Luís Miguel Sousa

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume

Cite as

15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 1-88, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@Proceedings{bispo_et_al:OASIcs.PARMA-DITAM.2024,
  title =	{{OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{1--88},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024},
  URN =		{urn:nbn:de:0030-drops-196938},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024},
  annote =	{Keywords: OASIcs, Volume 116, PARMA-DITAM 2024, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: João Bispo, Sotirios Xydis, Serena Curzel, and Luís Miguel Sousa

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{bispo_et_al:OASIcs.PARMA-DITAM.2024.0,
  author =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{0:i--0:x},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.0},
  URN =		{urn:nbn:de:0030-drops-196947},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
High-Level Synthesis Developments in the Context of European Space Technology Research (Invited Talk)

Authors: Fabrizio Ferrandi, Michele Fiorito, Claudio Barone, Giovanni Gozzi, and Serena Curzel

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
European efforts to boost competitiveness in the space services sector promote the research and development of advanced software and hardware solutions. The EU-funded HERMES project contributes to the effort by qualifying radiation-hardened, high-performance programmable microprocessors and developing a software ecosystem that facilitates the deployment of complex applications on such platforms. The main objectives of the project include reaching a technology readiness level of 6 (i.e., validated and demonstrated in relevant environment) for the rad-hard NG-ULTRA FPGA with its ceramic hermetic package CGA 1752, developed within projects of the European Space Agency, French National Centre for Space Studies and the European Union. An equally important share of the project is dedicated to the development and validation of tools that support multicore software programming and FPGA acceleration. The HERMES project selected the Bambu High-Level Synthesis tool to integrate capabilities to translate C/C++ code into Verilog/VHDL in its development ecosystem. In HERMES, Bambu has been and will be extended to support new FPGA targets, architectural models, model-based design, and input applications. The increased performance offered by FPGAs is thus made available also to software developers who do not have hardware design expertise.

Cite as

Fabrizio Ferrandi, Michele Fiorito, Claudio Barone, Giovanni Gozzi, and Serena Curzel. High-Level Synthesis Developments in the Context of European Space Technology Research (Invited Talk). In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 1:1-1:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{ferrandi_et_al:OASIcs.PARMA-DITAM.2024.1,
  author =	{Ferrandi, Fabrizio and Fiorito, Michele and Barone, Claudio and Gozzi, Giovanni and Curzel, Serena},
  title =	{{High-Level Synthesis Developments in the Context of European Space Technology Research}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{1:1--1:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.1},
  URN =		{urn:nbn:de:0030-drops-196951},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.1},
  annote =	{Keywords: High-Level Synthesis, rad-hard FPGAs}
}
Document
Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures

Authors: Alireza Amirshahi, Giovanni Ansaloni, and David Atienza

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and accelerators tailored for transformer models, supporting their computation hotspots with high efficiency. However, memory bandwidth can hinder improvements in hardware accelerators. Against this backdrop, in this paper we propose a novel memory arrangement strategy, governed by the hardware accelerator’s kernel size, which effectively minimizes off-chip data access. This arrangement is particularly beneficial for end-to-end transformer model inference, where most of the computation is based on general matrix multiplication (GEMM) operations. Additionally, we address the overhead of non-GEMM operations in transformer models within the scope of this memory data arrangement. Our study explores the implementation and effectiveness of the proposed accelerator-driven data arrangement approach in both single- and multi-core systems. Our evaluation demonstrates that our approach can achieve up to a 2.7x speed increase when executing inferences employing state-of-the-art transformers.

Cite as

Alireza Amirshahi, Giovanni Ansaloni, and David Atienza. Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{amirshahi_et_al:OASIcs.PARMA-DITAM.2024.2,
  author =	{Amirshahi, Alireza and Ansaloni, Giovanni and Atienza, David},
  title =	{{Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.2},
  URN =		{urn:nbn:de:0030-drops-196960},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.2},
  annote =	{Keywords: Memory arrangement, Data layout, Hardware accelerators, Transformer models, Multi-core, System simulation}
}
Document
Zero-Copy, Minimal-Blackout Virtual Machine Migrations Using Disaggregated Shared Memory

Authors: Andreas Grapentin, Felix Eberhardt, Tobias Zagorni, Andreas Polze, Michele Gazzetti, and Christian Pinto

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
We propose a new live-migration paradigm for virtual machines called zero-copy migration. By making the working set of the virtual machine available on the destination host through transparently byte-addressable disaggregated memory, we remove the need for a pre-copy phase while simultaneously reducing the performance impact of the post-copy phase. We describe an open-source implementation of the proposed paradigm based on QEMU-KVM and libvirt, and we evaluate the efficiency of the approach with a deployment on a functional hardware prototype of a memory disaggregation system realized using ThymesisFlow. Using a series of configurable benchmarks, we show that the lead time and blackout time of the migration are equal to best-case scenarios of traditional pre-copy, post-copy and hybrid approaches. Key performance metrics from the perspective of applications running in the virtual machine, such as memory latency and throughput, are improved by up to three orders of magnitude, increasing both flexibility and responsiveness of live-migrations in the datacenter.

Cite as

Andreas Grapentin, Felix Eberhardt, Tobias Zagorni, Andreas Polze, Michele Gazzetti, and Christian Pinto. Zero-Copy, Minimal-Blackout Virtual Machine Migrations Using Disaggregated Shared Memory. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{grapentin_et_al:OASIcs.PARMA-DITAM.2024.3,
  author =	{Grapentin, Andreas and Eberhardt, Felix and Zagorni, Tobias and Polze, Andreas and Gazzetti, Michele and Pinto, Christian},
  title =	{{Zero-Copy, Minimal-Blackout Virtual Machine Migrations Using Disaggregated Shared Memory}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.3},
  URN =		{urn:nbn:de:0030-drops-196972},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.3},
  annote =	{Keywords: disaggregation, disaggregated memory, vm live migration, thymesisflow, power9, opencapi, performance evaluation, zero copy}
}
Document
Precision Tuning the Rust Memory-Safe Programming Language

Authors: Gabriele Magnani, Lev Denisov, Daniele Cattaneo, Giovanni Agosta, and Stefano Cherubin

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
Precision tuning is an increasingly common approach for exploiting the tradeoff between energy efficiency or speedup, and accuracy. Its effectiveness is particularly strong whenever the maximum performance must be extracted from a computing system, such as embedded platforms. In these contexts, current engineering practice sees a dominance of memory-unsafe programming languages such as C and C++. However, the unsafe nature of these languages has come under great scrutiny as it leads to significant software vulnerabilities. Hence, safer programming languages which prevent memory-related bugs by design have been proposed as a replacement. Amongst these safer programming languages, one of the most popular has been Rust. In this work we adapt a state-of-the-art precision tuning tool, TAFFO, to operate on Rust code. By porting the PolyBench/C benchmark suite to Rust, we show that the effectiveness of the precision tuning is not affected by the use of a safer programming language, and moreover the safety properties of the language can be successfully preserved. Specifically, using TAFFO and Rust we achieved up to a 15× speedup over the base Rust code, thanks to the use of precision tuning.

Cite as

Gabriele Magnani, Lev Denisov, Daniele Cattaneo, Giovanni Agosta, and Stefano Cherubin. Precision Tuning the Rust Memory-Safe Programming Language. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{magnani_et_al:OASIcs.PARMA-DITAM.2024.4,
  author =	{Magnani, Gabriele and Denisov, Lev and Cattaneo, Daniele and Agosta, Giovanni and Cherubin, Stefano},
  title =	{{Precision Tuning the Rust Memory-Safe Programming Language}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{4:1--4:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.4},
  URN =		{urn:nbn:de:0030-drops-196989},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.4},
  annote =	{Keywords: Approximate Computing, Memory Safety, Precision Tuning}
}
Document
Embedded Multi-Core Code Generation with Cross-Layer Parallelization

Authors: Oliver Oey, Michael Huebner, Timo Stripf, and Juergen Becker

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
In this paper, we present a method for optimizing C code for embedded multi-core systems using cross-layer parallelization. The method has two phases. The first is to develop the algorithm without any optimization for the target platform. Then, the second step is to optimize and parallelize the code across four defined layers which are the algorithm, code, task, and data layers, for efficient execution on the target hardware. Each layer is focused on selected hardware characteristics. By using an iterative approach, individual kernels and composite algorithms can be very well adapted to execution on the hardware without further adaptation of the algorithm itself. The realization of this cross-layer parallelization consists of algorithm recognition, code transformations, task distribution, and insertion of synchronization and communication statements. The method is evaluated first on a common kernel and then on a sample image processing algorithm to showcase the benefits of the approach. Compared to other methods that only rely on two or three of these layers, 20 to 30 % of additional performance gain can be achieved.

Cite as

Oliver Oey, Michael Huebner, Timo Stripf, and Juergen Becker. Embedded Multi-Core Code Generation with Cross-Layer Parallelization. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{oey_et_al:OASIcs.PARMA-DITAM.2024.5,
  author =	{Oey, Oliver and Huebner, Michael and Stripf, Timo and Becker, Juergen},
  title =	{{Embedded Multi-Core Code Generation with Cross-Layer Parallelization}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{5:1--5:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.5},
  URN =		{urn:nbn:de:0030-drops-196990},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.5},
  annote =	{Keywords: Parallelization, multi-core Processors, model-based Development, Code Generation}
}
Document
Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions

Authors: Marco Procaccini, Amin Sahebi, Marco Barbone, Wayne Luk, Georgi Gaydadjiev, and Roberto Giorgi

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks, harnessing the power of FPGAs in a distributed environment for accelerated graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture in combination with a partitioning system to improve data locality and reduce inter-partition communication. Utilizing Hadoop at a higher level, the framework maps the graph to the hardware, efficiently distributing pre-processed data to FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers, using offline partitioning for large-scale graph distribution. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns a value to each node in a graph based on its importance. In the realm of large-scale graphs, the single FPGA solution outperformed the GPU solution that were restricted by memory capacity and surpassing CPU speedup by 26x compared to 12x. Moreover, when a single FPGA device was limited due to the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that surpass on-chip memory restrictions.

Cite as

Marco Procaccini, Amin Sahebi, Marco Barbone, Wayne Luk, Georgi Gaydadjiev, and Roberto Giorgi. Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{procaccini_et_al:OASIcs.PARMA-DITAM.2024.6,
  author =	{Procaccini, Marco and Sahebi, Amin and Barbone, Marco and Luk, Wayne and Gaydadjiev, Georgi and Giorgi, Roberto},
  title =	{{Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{6:1--6:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.6},
  URN =		{urn:nbn:de:0030-drops-197003},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.6},
  annote =	{Keywords: Graph processing, Distributed computing, Grid partitioning, FPGA, Accelerators}
}
Document
Resource Aware GPU Scheduling in Kubernetes Infrastructure

Authors: Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris

Published in: OASIcs, Volume 88, 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)


Abstract
Nowadays, there is an ever-increasing number of artificial intelligence inference workloads pushed and executed on the cloud. To effectively serve and manage the computational demands, data center operators have provisioned their infrastructures with accelerators. Specifically for GPUs, support for efficient management lacks, as state-of-the-art schedulers and orchestrators, threat GPUs only as typical compute resources ignoring their unique characteristics and application properties. This phenomenon combined with the GPU over-provisioning problem leads to severe resource under-utilization. Even though prior work has addressed this problem by colocating applications into a single accelerator device, its resource agnostic nature does not manage to face the resource under-utilization and quality of service violations especially for latency critical applications. In this paper, we design a resource aware GPU scheduling framework, able to efficiently colocate applications on the same GPU accelerator card. We integrate our solution with Kubernetes, one of the most widely used cloud orchestration frameworks. We show that our scheduler can achieve 58.8% lower end-to-end job execution time 99%-ile, while delivering 52.5% higher GPU memory usage, 105.9% higher GPU utilization percentage on average and 44.4% lower energy consumption on average, compared to the state-of-the-art schedulers, for a variety of ML representative workloads.

Cite as

Aggelos Ferikoglou, Dimosthenis Masouros, Achilleas Tzenetopoulos, Sotirios Xydis, and Dimitrios Soudris. Resource Aware GPU Scheduling in Kubernetes Infrastructure. In 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021). Open Access Series in Informatics (OASIcs), Volume 88, pp. 4:1-4:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{ferikoglou_et_al:OASIcs.PARMA-DITAM.2021.4,
  author =	{Ferikoglou, Aggelos and Masouros, Dimosthenis and Tzenetopoulos, Achilleas and Xydis, Sotirios and Soudris, Dimitrios},
  title =	{{Resource Aware GPU Scheduling in Kubernetes Infrastructure}},
  booktitle =	{12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
  pages =	{4:1--4:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-181-8},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{88},
  editor =	{Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2021.4},
  URN =		{urn:nbn:de:0030-drops-136403},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2021.4},
  annote =	{Keywords: cloud computing, GPU scheduling, kubernetes, heterogeneity}
}
  • Refine by Type
  • 12 Document/PDF
  • 1 Artifact
  • 1 Document/HTML
  • 1 Volume

  • Refine by Publication Year
  • 3 2026
  • 1 2025
  • 9 2024
  • 1 2021

  • Refine by Author
  • 5 Xydis, Sotirios
  • 3 Curzel, Serena
  • 3 Ferikoglou, Aggelos
  • 3 Soudris, Dimitrios
  • 2 Bispo, João
  • Show More...

  • Refine by Series/Journal
  • 12 OASIcs

  • Refine by Classification
  • 3 Computer systems organization → Cloud computing
  • 3 Computer systems organization → Reconfigurable computing
  • 3 Software and its engineering → Compilers
  • 2 Computer systems organization → Embedded software
  • 2 Computer systems organization → Heterogeneous (hybrid) systems
  • Show More...

  • Refine by Keyword
  • 2 FPGA
  • 1 Accelerators
  • 1 Approximate Computing
  • 1 CGRA
  • 1 Code Generation
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail