DROPS

Document

Invited Paper

DOI: 10.4230/OASIcs.WCET.2024.5

Invited Paper: On the Granularity of Bandwidth Regulation in FPGA-Based Heterogeneous Systems on Chip

Authors: Gianluca Brilli, Giacomo Valente, Alessandro Capotondi, Tania Di Mascio, and Andrea Marongiu

Published in: OASIcs, Volume 121, 22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024)

Abstract

Main memory sharing in commercial, FPGA-based Heterogeneous System on Chips (HeSoCs) can cause significant interference, and ultimately severe slowdown of the executing workload, which bars the adoption of such systems in the context of time-critical applications. Bandwidth regulation approaches based on monitoring and throttling are widely adopted also in commercial hardware to improve the system quality of service (QoS), and previous work has shown that the finer the granularity of the mechanism, the more effective the QoS control. Different mechanisms, however, might exploit more or less effectively the available residual memory bandwidth, provided that the QoS requirement is satisfied. In this paper we present an exhaustive experimental evaluation of how three bandwidth regulation mechanisms with coarse, fine and ultra-fine granularity compare in terms of exploitation of the system memory bandwidth. Our results show that a very fine-grained regulation mechanism might experience worse system-level memory bandwidth exploitation compared to a coarser-grained approach.

Cite as

Gianluca Brilli, Giacomo Valente, Alessandro Capotondi, Tania Di Mascio, and Andrea Marongiu. Invited Paper: On the Granularity of Bandwidth Regulation in FPGA-Based Heterogeneous Systems on Chip. In 22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024). Open Access Series in Informatics (OASIcs), Volume 121, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{brilli_et_al:OASIcs.WCET.2024.5,
  author =	{Brilli, Gianluca and Valente, Giacomo and Capotondi, Alessandro and Di Mascio, Tania and Marongiu, Andrea},
  title =	{{Invited Paper: On the Granularity of Bandwidth Regulation in FPGA-Based Heterogeneous Systems on Chip}},
  booktitle =	{22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024)},
  pages =	{5:1--5:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-346-1},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{121},
  editor =	{Carle, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.WCET.2024.5},
  URN =		{urn:nbn:de:0030-drops-204732},
  doi =		{10.4230/OASIcs.WCET.2024.5},
  annote =	{Keywords: Bandwidth Regulation, System-on-Chip, FPGA}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.2

SlackCheck: A Linux Kernel Module to Verify Temporal Properties of a Task Schedule

Authors: Michele Castrovilli and Enrico Bini

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

The Linux Kernel offers several scheduling classes. From SCHED_DEADLINE down to SCHED_FIFO, SCHED_RR and SCHED_OTHER, the scheduling classes can provide different responsiveness to very diverse user workloads. Still, Linux does not offer any mechanism to take some action upon the violation of temporal constraints at runtime. The lack of such a feature is also due to the difficulty of extending the established notion of deadline to workloads which are not releasing periodic/sporadic jobs. Exploiting the notion of supply functions for any resource schedule, we implemented SlackCheck, a kernel module which is capable to verify at runtime if a given task is assigned a desired amount of resource or not. SlackCheck adds a constant-time check at every scheduling decision and leverages the recent availability of a Runtime Verification engine in the kernel.

Cite as

Michele Castrovilli and Enrico Bini. SlackCheck: A Linux Kernel Module to Verify Temporal Properties of a Task Schedule. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 2:1-2:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{castrovilli_et_al:LIPIcs.ECRTS.2024.2,
  author =	{Castrovilli, Michele and Bini, Enrico},
  title =	{{SlackCheck: A Linux Kernel Module to Verify Temporal Properties of a Task Schedule}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{2:1--2:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.2},
  URN =		{urn:nbn:de:0030-drops-203054},
  doi =		{10.4230/LIPIcs.ECRTS.2024.2},
  annote =	{Keywords: Linux scheduler, Runtime verification, bounded-delay resource partition, supply function, service curve, real-time calculus, network calculus}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.5

Shared Resource Contention in MCUs: A Reality Check and the Quest for Timeliness

Authors: Daniel Oliveira, Weifan Chen, Sandro Pinto, and Renato Mancuso

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

Microcontrollers (MCUs) are steadily embracing multi-core technology to meet growing performance demands. This trend marks a shift from their traditionally simple, deterministic designs to more complex and inherently less predictable architectures. While shared resource contention is well-studied in mid to high-end embedded systems, the emergence of multi-core architectures in MCUs introduces unique challenges and characteristics that existing research has not fully explored. In this paper, we conduct an in-depth investigation of both mainstream and next-generation MCU-based platforms, aiming to identify the sources of contention on systems typically lacking these problems. We empirically demonstrate substantial contention effects across different MCU architectures (i.e., from single- to multi-core configurations), highlighting significant application slowdowns. Notably, we observe that slowdowns can reach several orders of magnitude, with the most extreme cases showing up to a 3800x (times, not percent) increase in execution time. To address these issues, we propose and evaluate muTPArtc, a novel mechanism designed for Timely Progress Assessment (TPA) and TPA-based runtime control specifically tailored to MCUs. muTPArtc is an MCU-specialized TPA-based mechanism that leverages hardware facilities widely available in commercial off-the-shelf MCUs (i.e., hardware breakpoints and cycle counters) to successfully monitor applications' progress, detect, and mitigate timing violations. Our results demonstrate that muTPArtc effectively manages performance degradation due to interference, requiring only minimal modifications to the build pipeline and no changes to the source code of the target application, while incurring minor overheads.

Cite as

Daniel Oliveira, Weifan Chen, Sandro Pinto, and Renato Mancuso. Shared Resource Contention in MCUs: A Reality Check and the Quest for Timeliness. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 5:1-5:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{oliveira_et_al:LIPIcs.ECRTS.2024.5,
  author =	{Oliveira, Daniel and Chen, Weifan and Pinto, Sandro and Mancuso, Renato},
  title =	{{Shared Resource Contention in MCUs: A Reality Check and the Quest for Timeliness}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{5:1--5:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.5},
  URN =		{urn:nbn:de:0030-drops-203088},
  doi =		{10.4230/LIPIcs.ECRTS.2024.5},
  annote =	{Keywords: multi-core microcontrollers, shared resources contention, progress-aware regulation}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.7

The Omnivisor: A Real-Time Static Partitioning Hypervisor Extension for Heterogeneous Core Virtualization over MPSoCs

Authors: Daniele Ottaviano, Francesco Ciraolo, Renato Mancuso, and Marcello Cinque

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

Following the needs of industrial applications, virtualization has emerged as one of the most effective approaches for the consolidation of mixed-criticality systems while meeting tight constraints in terms of space, weight, power, and cost (SWaP-C). In embedded platforms with homogeneous processors, a wealth of works have proposed designs and techniques to enforce spatio-temporal isolation by leveraging well-understood virtualization support. Unfortunately, achieving the same goal on heterogeneous MultiProcessor Systems-on-Chip (MPSoCs) has been largely overlooked. Modern hypervisors are designed to operate exclusively on main cores, with little or no consideration given to other co-processors within the system, such as small microcontroller-level CPUs or soft-cores deployed on programmable logic (FPGA). Typically, hypervisors consider co-processors as I/O devices allocated to virtual machines that run on primary cores, yielding full control and responsibility over them. Nevertheless, inadequate management of these resources can lead to spatio-temporal isolation issues within the system. In this paper, we propose the Omnivisor model as a paradigm for the holistic management of heterogeneous platforms. The model generalizes the features of real-time static partitioning hypervisors to enable the execution of virtual machines on processors with different Instruction Set Architectures (ISAs) within the same MPSoC. Moreover, the Omnivisor ensures temporal and spatial isolation between virtual machines by integrating and leveraging a variety of hardware and software protection mechanisms. The presented approach not only expands the scope of virtualization in MPSoCs but also enhances the overall system reliability and real-time performance for mixed-criticality applications. A full open-source reference implementation of the Omnivisor based on the Jailhouse hypervisor is provided, targeting ARM real-time processing units and RISC-V soft-cores on FPGA. Experimental results on real hardware show the benefits of the solution, including enabling the seamless launch of virtual machines on different ISAs and extending spatial/temporal isolation to heterogenous cores with enhanced regulation policies.

Cite as

Daniele Ottaviano, Francesco Ciraolo, Renato Mancuso, and Marcello Cinque. The Omnivisor: A Real-Time Static Partitioning Hypervisor Extension for Heterogeneous Core Virtualization over MPSoCs. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 7:1-7:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{ottaviano_et_al:LIPIcs.ECRTS.2024.7,
  author =	{Ottaviano, Daniele and Ciraolo, Francesco and Mancuso, Renato and Cinque, Marcello},
  title =	{{The Omnivisor: A Real-Time Static Partitioning Hypervisor Extension for Heterogeneous Core Virtualization over MPSoCs}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{7:1--7:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.7},
  URN =		{urn:nbn:de:0030-drops-203107},
  doi =		{10.4230/LIPIcs.ECRTS.2024.7},
  annote =	{Keywords: Mixed-Criticality, Embedded Virtualization, Real-Time Systems, MPSoCs}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2019.17

Impact of DM-LRU on WCET: A Static Analysis Approach

Authors: Renato Mancuso, Heechul Yun, and Isabelle Puaut

Published in: LIPIcs, Volume 133, 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)

Abstract

Cache memories in modern embedded processors are known to improve average memory access performance. Unfortunately, they are also known to represent a major source of unpredictability for hard real-time workload. One of the main limitations of typical caches is that content selection and replacement is entirely performed in hardware. As such, it is hard to control the cache behavior in software to favor caching of blocks that are known to have an impact on an application’s worst-case execution time (WCET). In this paper, we consider a cache replacement policy, namely DM-LRU, that allows system designers to prioritize caching of memory blocks that are known to have an important impact on an application’s WCET. Considering a single-core, single-level cache hierarchy, we describe an abstract interpretation-based timing analysis for DM-LRU. We implement the proposed analysis in a self-contained toolkit and study its qualitative properties on a set of representative benchmarks. Apart from being useful to compute the WCET when DM-LRU or similar policies are used, the proposed analysis can allow designers to perform WCET impact-aware selection of content to be retained in cache.

Cite as

Renato Mancuso, Heechul Yun, and Isabelle Puaut. Impact of DM-LRU on WCET: A Static Analysis Approach. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 17:1-17:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{mancuso_et_al:LIPIcs.ECRTS.2019.17,
  author =	{Mancuso, Renato and Yun, Heechul and Puaut, Isabelle},
  title =	{{Impact of DM-LRU on WCET: A Static Analysis Approach}},
  booktitle =	{31st Euromicro Conference on Real-Time Systems (ECRTS 2019)},
  pages =	{17:1--17:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-110-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{133},
  editor =	{Quinton, Sophie},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2019.17},
  URN =		{urn:nbn:de:0030-drops-107546},
  doi =		{10.4230/LIPIcs.ECRTS.2019.17},
  annote =	{Keywords: real-time, static cache analysis, abstract interpretation, LRU, deterministic memory, static cache locking, dynamic cache locking, cache profiling, WCET analysis}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2018.1

Deterministic Memory Abstraction and Supporting Multicore System Architecture

Authors: Farzad Farshchi, Prathap Kumar Valsan, Renato Mancuso, and Heechul Yun

Published in: LIPIcs, Volume 106, 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

Poor time predictability of multicore processors has been a long-standing challenge in the real-time systems community. In this paper, we make a case that a fundamental problem that prevents efficient and predictable real-time computing on multicore is the lack of a proper memory abstraction to express memory criticality, which cuts across various layers of the system: the application, OS, and hardware. We, therefore, propose a new holistic resource management approach driven by a new memory abstraction, which we call Deterministic Memory. The key characteristic of deterministic memory is that the platform-the OS and hardware-guarantees small and tightly bounded worst-case memory access timing. In contrast, we call the conventional memory abstraction as best-effort memory in which only highly pessimistic worst-case bounds can be achieved. We propose to utilize both abstractions to achieve high time predictability but without significantly sacrificing performance. We present deterministic memory-aware OS and architecture designs, including OS-level page allocator, hardware-level cache, and DRAM controller designs. We implement the proposed OS and architecture extensions on Linux and gem5 simulator. Our evaluation results, using a set of synthetic and real-world benchmarks, demonstrate the feasibility and effectiveness of our approach.

Cite as

Farzad Farshchi, Prathap Kumar Valsan, Renato Mancuso, and Heechul Yun. Deterministic Memory Abstraction and Supporting Multicore System Architecture. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 1:1-1:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{farshchi_et_al:LIPIcs.ECRTS.2018.1,
  author =	{Farshchi, Farzad and Valsan, Prathap Kumar and Mancuso, Renato and Yun, Heechul},
  title =	{{Deterministic Memory Abstraction and Supporting Multicore System Architecture}},
  booktitle =	{30th Euromicro Conference on Real-Time Systems (ECRTS 2018)},
  pages =	{1:1--1:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-075-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{106},
  editor =	{Altmeyer, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2018.1},
  URN =		{urn:nbn:de:0030-drops-90010},
  doi =		{10.4230/LIPIcs.ECRTS.2018.1},
  annote =	{Keywords: multicore processors, real-time, shared cache, DRAM controller, Linux}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2018.19

Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms

Authors: Waqar Ali and Heechul Yun

Published in: LIPIcs, Volume 106, 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. For example, in the NVIDIA Jetson TX2 platform, an integrated CPU-GPU architecture, we observed that, in the worst case, the GPU kernels can suffer as much as 3X slowdown in the presence of co-running memory intensive CPU applications. In this paper, we propose a software mechanism, which we call BWLOCK++, to protect the performance of GPU kernels from co-scheduled memory intensive CPU applications.

Cite as

Waqar Ali and Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{ali_et_al:LIPIcs.ECRTS.2018.19,
  author =	{Ali, Waqar and Yun, Heechul},
  title =	{{Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms}},
  booktitle =	{30th Euromicro Conference on Real-Time Systems (ECRTS 2018)},
  pages =	{19:1--19:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-075-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{106},
  editor =	{Altmeyer, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2018.19},
  URN =		{urn:nbn:de:0030-drops-89833},
  doi =		{10.4230/LIPIcs.ECRTS.2018.19},
  annote =	{Keywords: GPU, memory bandwidth, resource contention, CPU throttling, fair scheduler}
}

Document

DOI: 10.4230/DARTS.4.2.3

Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms (Artifact)

Authors: Waqar Ali and Heechul Yun

Published in: DARTS, Volume 4, Issue 2, Special Issue of the 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

This artifact is based on BWLOCK++, a software framework to protect the performance of GPU kernels from co-scheduled memory intensive CPU applications in platforms containing integrated GPUs. The artifact is designed to support the claims of the companion paper and contains instructions on how to build and execute BWLOCK++ on a target hardware platform.

Cite as

Waqar Ali and Heechul Yun. Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms (Artifact). In Special Issue of the 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Dagstuhl Artifacts Series (DARTS), Volume 4, Issue 2, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@Article{ali_et_al:DARTS.4.2.3,
  author =	{Ali, Waqar and Yun, Heechul},
  title =	{{Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms (Artifact)}},
  pages =	{3:1--3:2},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2018},
  volume =	{4},
  number =	{2},
  editor =	{Ali, Waqar and Yun, Heechul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.4.2.3},
  URN =		{urn:nbn:de:0030-drops-89719},
  doi =		{10.4230/DARTS.4.2.3},
  annote =	{Keywords: GPU, memory bandwidth, resource contention, CPU throttling, fair scheduler}
}

8 Search Results for "Yun, Heechul"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as