DROPS

Artifact

Software

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems (Artifact)

Authors: Joshua Bakita and James H. Anderson

Cite as

Joshua Bakita and James H. Anderson. Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems (Artifact) (Software, Source Code). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@misc{dagstuhl-artifact-23733,
   title = {{Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems (Artifact)}},
   author = {Bakita, Joshua and Anderson, James H.},
   publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
   year = {2025},
   note = {Software (visited on 2025-07-07)},
   url = {https://www.cs.unc.edu/~jbakita/ecrts25-ae/},
   doi = {10.4230/artifacts.23733},
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2025.21

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Authors: Joshua Bakita and James H. Anderson

Published in: LIPIcs, Volume 335, 37th Euromicro Conference on Real-Time Systems (ECRTS 2025)

Abstract

As GPU-using tasks become more common in embedded, safety-critical systems, efficiency demands necessitate sharing a single GPU among multiple tasks. Unfortunately, existing ways to schedule multiple tasks onto a GPU often either result in a loss of ability to meet deadlines, or a loss of efficiency. In this work, we develop a system-level spatial compute partitioning mechanism for NVIDIA GPUs and demonstrate that it can be used to execute tasks efficiently without compromising timing predictability. Our tool, called nvtaskset, supports composable systems by not requiring task, driver, or hardware modifications. In our evaluation, we demonstrate sub-1-μs overheads, stronger partition enforcement, and finer-granularity partitioning when using our mechanism instead of NVIDIA’s Multi-Process Service (MPS) or Multi-instance GPU (MiG) features.

Cite as

Joshua Bakita and James H. Anderson. Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems. In 37th Euromicro Conference on Real-Time Systems (ECRTS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 335, pp. 21:1-21:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{bakita_et_al:LIPIcs.ECRTS.2025.21,
  author =	{Bakita, Joshua and Anderson, James H.},
  title =	{{Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems}},
  booktitle =	{37th Euromicro Conference on Real-Time Systems (ECRTS 2025)},
  pages =	{21:1--21:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-377-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{335},
  editor =	{Mancuso, Renato},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2025.21},
  URN =		{urn:nbn:de:0030-drops-235998},
  doi =		{10.4230/LIPIcs.ECRTS.2025.21},
  annote =	{Keywords: Real-time systems, composable systems, graphics processing units, CUDA}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.11

Open Problem Resolved: The "Two" in Existing Multiprocessor PI-Blocking Bounds Is Fundamental

Authors: Shareef Ahmed and James H. Anderson

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

The goal of a real-time locking protocol is to reduce any priority-inversion blocking (pi-blocking) a task may incur while waiting to access a shared resource. For mutual-exclusion sharing on an m-processor platform, the best existing lower bound on per-task pi-blocking under suspension-oblivious analysis is a (trivial) lower bound of (m-1) request lengths under any job-level fixed-priority (JLFP) scheduler. Surprisingly, most asymptotically optimal locking protocols achieve a per-task pi-blocking upper bound of (2m-1) request lengths under JLFP scheduling, even though a range of very different mechanisms are used in these protocols. This paper closes the gap between these existing lower and upper bounds by establishing a lower bound of (2m-2) request lengths under global fixed-priority (G-FP) and global earliest-deadline-first (G-EDF) scheduling. This paper also shows that worst-case per-task pi-blocking can be arbitrarily close to (2m-1) request lengths for locking protocols that satisfy a certain property that is met by most (if not all) existing locking protocols. These results imply that most known asymptotically optimal locking protocols are almost truly optimal (not just asymptotic) under G-FP and G-EDF scheduling.

Cite as

Shareef Ahmed and James H. Anderson. Open Problem Resolved: The "Two" in Existing Multiprocessor PI-Blocking Bounds Is Fundamental. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{ahmed_et_al:LIPIcs.ECRTS.2024.11,
  author =	{Ahmed, Shareef and Anderson, James H.},
  title =	{{Open Problem Resolved: The ``Two'' in Existing Multiprocessor PI-Blocking Bounds Is Fundamental}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{11:1--11:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.11},
  URN =		{urn:nbn:de:0030-drops-203145},
  doi =		{10.4230/LIPIcs.ECRTS.2024.11},
  annote =	{Keywords: Real-Time Systems, Real-Time Synchronization, Multiprocessors}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.12

Autonomy Today: Many Delay-Prone Black Boxes

Authors: Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, and Yunhua Li

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

Machine-learning (ML) technology has been a key enabler in the push towards realizing ever more sophisticated autonomous-driving features. In deploying such technology, the automotive industry has relied heavily on using "black-box" software and hardware components that were originally intended for non-safety-critical contexts, without a full understanding of their real-time capabilities. A prime example of such a component is CUDA, which is fundamental to the acceleration of ML algorithms using NVIDIA GPUs. In this paper, evidence is presented demonstrating that CUDA can cause unbounded task delays. Such delays are the result of CUDA’s usage of synchronization mechanisms in the POSIX thread (pthread) library, so the latter is implicated as a delay-prone component as well. Such synchronization delays are shown to be the source of a system failure that occurred in an actual autonomous vehicle system during testing at WeRide. Motivated by these findings, a broader experimental study is presented that demonstrates several real-time deficiencies in CUDA, the glibc pthread library, Linux, and the POSIX interface of the safety-certified QNX Operating System for Safety. Partial mitigations for these deficiencies are presented and further actions are proposed for real-time researchers and developers to integrate more complete mitigations.

Cite as

Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, and Yunhua Li. Autonomy Today: Many Delay-Prone Black Boxes. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 12:1-12:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{liu_et_al:LIPIcs.ECRTS.2024.12,
  author =	{Liu, Sizhe and Wagle, Rohan and Anderson, James H. and Yang, Ming and Zhang, Chi and Li, Yunhua},
  title =	{{Autonomy Today: Many Delay-Prone Black Boxes}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{12:1--12:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.12},
  URN =		{urn:nbn:de:0030-drops-203152},
  doi =		{10.4230/LIPIcs.ECRTS.2024.12},
  annote =	{Keywords: autonomous driving, CUDA programming, locking protocols, POSIX thread, operating systems, machine learning systems, real-time systems}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2024.15

Predictable GPU Sharing in Component-Based Real-Time Systems

Authors: Syed W. Ali, Zelin Tong, Joseph Goh, and James H. Anderson

Published in: LIPIcs, Volume 298, 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

This paper presents a real-time locking protocol whose design was motivated by the goal of enabling safe GPU sharing in time-sliced component-based systems. This locking protocol enables a GPU to be shared concurrently across, and utilized within, isolated components with predictable execution times. It relies on a novel resizing technique where GPU work is dimensioned on-the-fly to run on partitions of an NVIDIA GPU. This technique can be applied to any component that internally utilizes global CPU scheduling. The proposed locking protocol enables increased GPU parallelism and reduces GPU capacity loss with analytically provable benefits.

Cite as

Syed W. Ali, Zelin Tong, Joseph Goh, and James H. Anderson. Predictable GPU Sharing in Component-Based Real-Time Systems. In 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 298, pp. 15:1-15:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{ali_et_al:LIPIcs.ECRTS.2024.15,
  author =	{Ali, Syed W. and Tong, Zelin and Goh, Joseph and Anderson, James H.},
  title =	{{Predictable GPU Sharing in Component-Based Real-Time Systems}},
  booktitle =	{36th Euromicro Conference on Real-Time Systems (ECRTS 2024)},
  pages =	{15:1--15:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-324-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{298},
  editor =	{Pellizzoni, Rodolfo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2024.15},
  URN =		{urn:nbn:de:0030-drops-203183},
  doi =		{10.4230/LIPIcs.ECRTS.2024.15},
  annote =	{Keywords: GPU locking protocols, real-time locking protocols, priority-inversion blocking, component-based systems}
}

Document

Artifact

DOI: 10.4230/DARTS.10.1.1

Predictable GPU Sharing in Component-Based Real-Time Systems (Artifact)

Authors: Syed W. Ali, Zelin Tong, Joseph Goh, and James H. Anderson

Published in: DARTS, Volume 10, Issue 1, Special Issue of the 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

This paper presents a real-time locking protocol whose design was motivated by the goal of enabling safe GPU sharing in time-sliced component-based systems. This locking protocol enables a GPU to be shared concurrently across, and utilized within, isolated components with predictable execution times. It relies on a novel resizing technique where GPU work is dimensioned on-the-fly to run on partitions of an NVIDIA GPU. This technique can be applied to any component that internally utilizes global CPU scheduling. The proposed locking protocol enables increased GPU parallelism and reduces GPU capacity loss with analytically provable benefits.

Cite as

Syed W. Ali, Zelin Tong, Joseph Goh, and James H. Anderson. Predictable GPU Sharing in Component-Based Real-Time Systems (Artifact). In Special Issue of the 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Dagstuhl Artifacts Series (DARTS), Volume 10, Issue 1, pp. 1:1-1:5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Article{ali_et_al:DARTS.10.1.1,
  author =	{Ali, Syed W. and Tong, Zelin and Goh, Joseph and Anderson, James H.},
  title =	{{Predictable GPU Sharing in Component-Based Real-Time Systems (Artifact)}},
  pages =	{1:1--1:5},
  journal =	{Dagstuhl Artifacts Series},
  ISBN =	{978-3-95977-327-0},
  ISSN =	{2509-8195},
  year =	{2024},
  volume =	{10},
  number =	{1},
  editor =	{Ali, Syed W. and Tong, Zelin and Goh, Joseph and Anderson, James H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.10.1.1},
  URN =		{urn:nbn:de:0030-drops-203236},
  doi =		{10.4230/DARTS.10.1.1},
  annote =	{Keywords: GPU locking protocols, real-time locking protocols, priority-inversion blocking, component-based systems}
}

Document

Artifact

DOI: 10.4230/DARTS.10.1.3

Autonomy Today: Many Delay-Prone Black Boxes (Artifact)

Authors: Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, and Yunhua Li

Published in: DARTS, Volume 10, Issue 1, Special Issue of the 36th Euromicro Conference on Real-Time Systems (ECRTS 2024)

Abstract

Machine-learning (ML) technology has been a key enabler in the push towards realizing ever more sophisticated autonomous-driving features. In deploying such technology, the automotive industry has relied heavily on using "black-box" software and hardware components that were originally intended for non-safety-critical contexts, without a full understanding of their real-time capabilities. A prime example of such a component is CUDA, which is fundamental to the acceleration of ML algorithms using NVIDIA GPUs. In this paper, evidence is presented demonstrating that CUDA can cause unbounded task delays. Such delays are the result of CUDA’s usage of synchronization mechanisms in the POSIX thread (pthread) library, so the latter is implicated as a delay-prone component as well. Such synchronization delays are shown to be the source of a system failure that occurred in an actual autonomous vehicle system during testing at WeRide. Motivated by these findings, a broader experimental study is presented that demonstrates several real-time deficiencies in CUDA, the glibc pthread library, Linux, and the POSIX interface of the safety-certified QNX Operating System for Safety. Partial mitigations for these deficiencies are presented and further actions are proposed for real-time researchers and developers to integrate more complete mitigations.

Cite as

Sizhe Liu, Rohan Wagle, James H. Anderson, Ming Yang, Chi Zhang, and Yunhua Li. Autonomy Today: Many Delay-Prone Black Boxes (Artifact). In Special Issue of the 36th Euromicro Conference on Real-Time Systems (ECRTS 2024). Dagstuhl Artifacts Series (DARTS), Volume 10, Issue 1, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Article{liu_et_al:DARTS.10.1.3,
  author =	{Liu, Sizhe and Wagle, Rohan and Anderson, James H. and Yang, Ming and Zhang, Chi and Li, Yunhua},
  title =	{{Autonomy Today: Many Delay-Prone Black Boxes (Artifact)}},
  pages =	{3:1--3:3},
  journal =	{Dagstuhl Artifacts Series},
  ISBN =	{978-3-95977-327-0},
  ISSN =	{2509-8195},
  year =	{2024},
  volume =	{10},
  number =	{1},
  editor =	{Liu, Sizhe and Wagle, Rohan and Anderson, James H. and Yang, Ming and Zhang, Chi and Li, Yunhua},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.10.1.3},
  URN =		{urn:nbn:de:0030-drops-203259},
  doi =		{10.4230/DARTS.10.1.3},
  annote =	{Keywords: autonomous driving, CUDA programming, locking protocols, POSIX thread, operating systems, machine learning systems, real-time systems}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2023.16

Optimal Multiprocessor Locking Protocols Under FIFO Scheduling

Authors: Shareef Ahmed and James H. Anderson

Published in: LIPIcs, Volume 262, 35th Euromicro Conference on Real-Time Systems (ECRTS 2023)

Abstract

Real-time locking protocols are typically designed to reduce any priority-inversion blocking (pi-blocking) a task may incur while waiting to access a shared resource. For the multiprocessor case, a number of such protocols have been developed that ensure asymptotically optimal pi-blocking bounds under job-level fixed-priority scheduling. Unfortunately, no optimal multiprocessor real-time locking protocols are known that ensure tight pi-blocking bounds under any scheduler. This paper presents the first such protocols. Specifically, protocols are presented for mutual exclusion, reader-writer synchronization, and k-exclusion that are optimal under first-in-first-out (FIFO) scheduling when schedulability analysis treats suspension times as computation. Experiments are presented that demonstrate the effectiveness of these protocols.

Cite as

Shareef Ahmed and James H. Anderson. Optimal Multiprocessor Locking Protocols Under FIFO Scheduling. In 35th Euromicro Conference on Real-Time Systems (ECRTS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 262, pp. 16:1-16:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{ahmed_et_al:LIPIcs.ECRTS.2023.16,
  author =	{Ahmed, Shareef and Anderson, James H.},
  title =	{{Optimal Multiprocessor Locking Protocols Under FIFO Scheduling}},
  booktitle =	{35th Euromicro Conference on Real-Time Systems (ECRTS 2023)},
  pages =	{16:1--16:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-280-8},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{262},
  editor =	{Papadopoulos, Alessandro V.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2023.16},
  URN =		{urn:nbn:de:0030-drops-180451},
  doi =		{10.4230/LIPIcs.ECRTS.2023.16},
  annote =	{Keywords: Real-Time Systems, Real-Time Synchronization, Multiprocessors}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2022.10

Overrun-Resilient Multiprocessor Real-Time Locking

Authors: Zelin Tong, Shareef Ahmed, and James H. Anderson

Published in: LIPIcs, Volume 231, 34th Euromicro Conference on Real-Time Systems (ECRTS 2022)

Abstract

Existing real-time locking protocols require accurate worst-case execution time (WCET) estimates for both tasks and critical sections (CSs) in order to function correctly. On multicore platforms, however, the only seemingly viable strategy for obtaining such estimates is via measurements, which cannot produce a true WCET with certainty. The absence of correct WCETs can be partially ameliorated by enforcing execution budgets at both the task and CS levels and by using a locking protocol that is resilient to budget overruns, i.e., that ensures that the schedulability of non-overrunning tasks is not compromised by tasks that do overrun their budgets. Unfortunately, no fully overrun-resilient locking protocol has been proposed to date for multiprocessor systems. To remedy this situation, this paper presents two such protocols, the OR-FMLP and the OR-OMLP, which introduce overrun-resiliency mechanisms to two existing multiprocessor protocols, the spin-based FMLP and suspension-based global OMLP, respectively. In devising such mechanisms, undo code can be problematic. For the important locking use case of protecting shared data structures, it is shown that such code can be avoided entirely by using abortable critical sections, a concept proposed herein that leverages obstruction-free synchronization techniques. Experiments are presented that demonstrate both the effectiveness of the mechanisms introduced in this paper and their cost.

Cite as

Zelin Tong, Shareef Ahmed, and James H. Anderson. Overrun-Resilient Multiprocessor Real-Time Locking. In 34th Euromicro Conference on Real-Time Systems (ECRTS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 231, pp. 10:1-10:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{tong_et_al:LIPIcs.ECRTS.2022.10,
  author =	{Tong, Zelin and Ahmed, Shareef and Anderson, James H.},
  title =	{{Overrun-Resilient Multiprocessor Real-Time Locking}},
  booktitle =	{34th Euromicro Conference on Real-Time Systems (ECRTS 2022)},
  pages =	{10:1--10:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-239-6},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{231},
  editor =	{Maggio, Martina},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2022.10},
  URN =		{urn:nbn:de:0030-drops-163272},
  doi =		{10.4230/LIPIcs.ECRTS.2022.10},
  annote =	{Keywords: Real-Time Systems, Real-Time Synchronization, Budget Enforcement}
}

Document

Artifact

DOI: 10.4230/DARTS.7.1.3

Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads (Artifact)

Authors: Catherine E. Nemitz, Shai Caspin, James H. Anderson, and Bryan C. Ward

Published in: DARTS, Volume 7, Issue 1, Special Issue of the 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)

Abstract

This paper is directed at reader/writer locking for read-dominant real-time workloads. It is shown that state-of-the-art real-time reader/writer locking protocols are subject to performance limitations when reads dominate, and that existing schedulability analysis fails to leverage the sparsity of writes in this case. A new reader/writer locking-protocol implementation and new inflation-free schedulability analysis are proposed to address these problems. Overhead evaluations of the new implementation show a decrease in overheads of up to 70% over previous implementations, leading to throughput for read operations increasing by up to 450%. Schedulability experiments are presented that show that the analysis results in schedulability improvements of up to 156.8% compared to the existing state-of-the-art approach.

Cite as

Catherine E. Nemitz, Shai Caspin, James H. Anderson, and Bryan C. Ward. Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads (Artifact). In Special Issue of the 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Dagstuhl Artifacts Series (DARTS), Volume 7, Issue 1, pp. 3:1-3:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@Article{nemitz_et_al:DARTS.7.1.3,
  author =	{Nemitz, Catherine E. and Caspin, Shai and Anderson, James H. and Ward, Bryan C.},
  title =	{{Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads (Artifact)}},
  pages =	{3:1--3:3},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2021},
  volume =	{7},
  number =	{1},
  editor =	{Nemitz, Catherine E. and Caspin, Shai and Anderson, James H. and Ward, Bryan C.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.7.1.3},
  URN =		{urn:nbn:de:0030-drops-139828},
  doi =		{10.4230/DARTS.7.1.3},
  annote =	{Keywords: Reader/writer, real-time, synchronization, spinlock, RMR complexity}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2021.6

Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads

Authors: Catherine E. Nemitz, Shai Caspin, James H. Anderson, and Bryan C. Ward

Published in: LIPIcs, Volume 196, 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)

Abstract

This paper is directed at reader/writer locking for read-dominant real-time workloads. It is shown that state-of-the-art real-time reader/writer locking protocols are subject to performance limitations when reads dominate, and that existing schedulability analysis fails to leverage the sparsity of writes in this case. A new reader/writer locking-protocol implementation and new inflation-free schedulability analysis are proposed to address these problems. Overhead evaluations of the new implementation show a decrease in overheads of up to 70% over previous implementations, leading to throughput for read operations increasing by up to 450%. Schedulability experiments are presented that show that the analysis results in schedulability improvements of up to 156.8% compared to the existing state-of-the-art approach.

Cite as

Catherine E. Nemitz, Shai Caspin, James H. Anderson, and Bryan C. Ward. Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads. In 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 196, pp. 6:1-6:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{nemitz_et_al:LIPIcs.ECRTS.2021.6,
  author =	{Nemitz, Catherine E. and Caspin, Shai and Anderson, James H. and Ward, Bryan C.},
  title =	{{Light Reading: Optimizing Reader/Writer Locking for Read-Dominant Real-Time Workloads}},
  booktitle =	{33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)},
  pages =	{6:1--6:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-192-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{196},
  editor =	{Brandenburg, Bj{\"o}rn B.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2021.6},
  URN =		{urn:nbn:de:0030-drops-139378},
  doi =		{10.4230/LIPIcs.ECRTS.2021.6},
  annote =	{Keywords: Reader/writer, real-time, synchronization, spinlock, RMR complexity}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2021.11

Tight Tardiness Bounds for Pseudo-Harmonic Tasks Under Global-EDF-Like Schedulers

Authors: Shareef Ahmed and James H. Anderson

Published in: LIPIcs, Volume 196, 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)

Abstract

The global earliest-deadline-first (GEDF) scheduler and its variants are soft-real-time (SRT) optimal for periodic/sporadic tasks, meaning they provide bounded tardiness so long as the underlying platform is not over-utilized. Although their SRT-optimality has long been known, tight tardiness bounds for these schedulers have remained elusive. In this paper, a tardiness bound that does not depend on the processor or task count is derived for pseudo-harmonic periodic tasks, which are commonly used in practice, under global-EDF-like (GEL) schedulers. This class of schedulers includes both GEDF and first-in-first-out (FIFO). This bound is shown to be generally tight via an example. Furthermore, it is shown that exact tardiness bounds for GEL-scheduled pseudo-harmonic periodic tasks can be computed in pseudo-polynomial time.

Cite as

Shareef Ahmed and James H. Anderson. Tight Tardiness Bounds for Pseudo-Harmonic Tasks Under Global-EDF-Like Schedulers. In 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 196, pp. 11:1-11:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{ahmed_et_al:LIPIcs.ECRTS.2021.11,
  author =	{Ahmed, Shareef and Anderson, James H.},
  title =	{{Tight Tardiness Bounds for Pseudo-Harmonic Tasks Under Global-EDF-Like Schedulers}},
  booktitle =	{33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)},
  pages =	{11:1--11:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-192-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{196},
  editor =	{Brandenburg, Bj{\"o}rn B.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2021.11},
  URN =		{urn:nbn:de:0030-drops-139428},
  doi =		{10.4230/LIPIcs.ECRTS.2021.11},
  annote =	{Keywords: soft real-time systems, multicore, tardiness bounds}
}

Document

Artifact

DOI: 10.4230/DARTS.6.1.1

Simultaneous Multithreading and Hard Real Time: Can it be Safe? (Artifact)

Authors: Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson

Published in: DARTS, Volume 6, Issue 1, Special Issue of the 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)

Abstract

The applicability of Simultaneous Multithreading (SMT) to real-time systems has been hampered by the difficulty of obtaining reliable execution costs in an SMT-enabled system. This problem is addressed from two directions. A scheduler, CERT-MT, is introduced that minimizes SMT-related timing variations, and two new timing analysis methods (one based on the binomial distribution and one based on Cantelli’s Inequality) are given. Both methods estimate probabilistic WCETs and attach statistical confidence levels to those estimates. The timing analyses are applied to tasks executing with and without SMT, and it is found that in some cases, two tasks utilizing SMT can be safely executed in less time than would be needed for either task by itself. A large-scale schedulability study is conducted, showing that CERT-MT can schedule systems with total utilizations twice what could otherwise be achieved. This artifact includes benchmark experiments used to compare execution times with and without SMT and code to analyze the benchmark experiments and duplicate the reported schedulability experiments.

Cite as

Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson. Simultaneous Multithreading and Hard Real Time: Can it be Safe? (Artifact). In Special Issue of the 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Dagstuhl Artifacts Series (DARTS), Volume 6, Issue 1, pp. 1:1-1:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@Article{osborne_et_al:DARTS.6.1.1,
  author =	{Osborne, Sims Hill and Bakita, Joshua J. and Anderson, James H.},
  title =	{{Simultaneous Multithreading and Hard Real Time: Can it be Safe? (Artifact)}},
  pages =	{1:1--1:3},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2020},
  volume =	{6},
  number =	{1},
  editor =	{Osborne, Sims Hill and Bakita, Joshua J. and Anderson, James H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.6.1.1},
  URN =		{urn:nbn:de:0030-drops-123915},
  doi =		{10.4230/DARTS.6.1.1},
  annote =	{Keywords: real-time systems, simultaneous multithreading, real-time, scheduling algorithms, timing analysis, probability, statistics}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2020.10

AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads

Authors: Nathan Otterness and James H. Anderson

Published in: LIPIcs, Volume 165, 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)

Abstract

Graphics processing units (GPUs) manufactured by NVIDIA continue to dominate many fields of research, including real-time GPU management. NVIDIA’s status as a key enabling technology for deep learning and image processing makes this unsurprising, especially when combined with the company’s push into embedded, safety-critical domains like autonomous driving. NVIDIA’s primary competitor, AMD, has received comparatively little attention, due in part to few embedded offerings and a lack of support from popular deep-learning toolkits. Recently, however, AMD’s ROCm (Radeon Open Compute) software platform was made available to address at least the second of these two issues, but is ROCm worth the attention of safety-critical software developers? In order to answer this question, this paper explores the features and pitfalls of AMD GPUs, focusing on contrasting details with NVIDIA’s GPU hardware and software. We argue that an open software stack such as ROCm may be able to provide much-needed flexibility and reproducibility in the context of real-time GPU research, where new algorithmic or analysis techniques should typically remain agnostic to the underlying GPU architecture. In support of this claim, we summarize how closed-source platforms have obstructed prior research using NVIDIA GPUs, and then demonstrate that AMD may be a viable alternative by modifying components of the ROCm software stack to implement spatial partitioning. Finally, we present a case study using the PyTorch deep-learning framework that demonstrates the impact such modifications can have on complex real-world software.

Cite as

Nathan Otterness and James H. Anderson. AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads. In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 165, pp. 10:1-10:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{otterness_et_al:LIPIcs.ECRTS.2020.10,
  author =	{Otterness, Nathan and Anderson, James H.},
  title =	{{AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads}},
  booktitle =	{32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)},
  pages =	{10:1--10:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-152-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{165},
  editor =	{V\"{o}lp, Marcus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2020.10},
  URN =		{urn:nbn:de:0030-drops-123732},
  doi =		{10.4230/LIPIcs.ECRTS.2020.10},
  annote =	{Keywords: real-time systems, graphics processing units, parallel computing}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2020.14

Simultaneous Multithreading and Hard Real Time: Can It Be Safe?

Authors: Sims Hill Osborne and James H. Anderson

Published in: LIPIcs, Volume 165, 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)

Abstract

The applicability of Simultaneous Multithreading (SMT) to real-time systems has been hampered by the difficulty of obtaining reliable execution costs in an SMT-enabled system. This problem is addressed by introducing a scheduling framework, called CERT-MT, that combines scheduling-aware timing analysis with a cyclic-executive scheduler in a way that minimizes SMT-related timing variations. The proposed scheduling-aware timing analysis is based on maximum observed execution times and accounts for the uncertainty inherent in measurement-based timing analysis. The timing analysis is found to work for tasks with and without SMT, though some adjustments are required in the former case. A large-scale schedulability study is presented that shows CERT-MT can schedule systems with total utilizations approaching 1.4 times the core count, without sacrificing safety.

Cite as

Sims Hill Osborne and James H. Anderson. Simultaneous Multithreading and Hard Real Time: Can It Be Safe? In 32nd Euromicro Conference on Real-Time Systems (ECRTS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 165, pp. 14:1-14:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{osborne_et_al:LIPIcs.ECRTS.2020.14,
  author =	{Osborne, Sims Hill and Anderson, James H.},
  title =	{{Simultaneous Multithreading and Hard Real Time: Can It Be Safe?}},
  booktitle =	{32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)},
  pages =	{14:1--14:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-152-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{165},
  editor =	{V\"{o}lp, Marcus},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2020.14},
  URN =		{urn:nbn:de:0030-drops-123778},
  doi =		{10.4230/LIPIcs.ECRTS.2020.14},
  annote =	{Keywords: real-time systems, simultaneous multithreading, hard real-time, scheduling algorithms, probability, statistics, timing analysis}
}

Document

Artifact

DOI: 10.4230/DARTS.5.1.8

Simultaneous Multithreading Applied to Real Time (Artifact)

Authors: Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson

Published in: DARTS, Volume 5, Issue 1, Special Issue of the 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)

Abstract

Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled. This artifact includes benchmark experiments used to compare execution times with and without SMT and code to duplicate the reported schedulability experiments.

Cite as

Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson. Simultaneous Multithreading Applied to Real Time (Artifact). In Special Issue of the 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Dagstuhl Artifacts Series (DARTS), Volume 5, Issue 1, pp. 8:1-8:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@Article{osborne_et_al:DARTS.5.1.8,
  author =	{Osborne, Sims Hill and Bakita, Joshua J. and Anderson, James H.},
  title =	{{Simultaneous Multithreading Applied to Real Time (Artifact)}},
  pages =	{8:1--8:2},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2019},
  volume =	{5},
  number =	{1},
  editor =	{Osborne, Sims Hill and Bakita, Joshua J. and Anderson, James H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.5.1.8},
  URN =		{urn:nbn:de:0030-drops-107362},
  doi =		{10.4230/DARTS.5.1.8},
  annote =	{Keywords: real-time systems, simultaneous multithreading, soft real-time, scheduling algorithms}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2019.3

Simultaneous Multithreading Applied to Real Time

Authors: Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson

Published in: LIPIcs, Volume 133, 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)

Abstract

Existing models used in real-time scheduling are inadequate to take advantage of simultaneous multithreading (SMT), which has been shown to improve performance in many areas of computing, but has seen little application to real-time systems. The SMART task model, which allows for combining SMT and real time by accounting for the variable task execution costs caused by SMT, is introduced, along with methods and conditions for scheduling SMT tasks under global earliest-deadline-first scheduling. The benefits of using SMT are demonstrated through a large-scale schedulability study in which we show that task systems with utilizations 30% larger than what would be schedulable without SMT can be correctly scheduled.

Cite as

Sims Hill Osborne, Joshua J. Bakita, and James H. Anderson. Simultaneous Multithreading Applied to Real Time. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 3:1-3:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{osborne_et_al:LIPIcs.ECRTS.2019.3,
  author =	{Osborne, Sims Hill and Bakita, Joshua J. and Anderson, James H.},
  title =	{{Simultaneous Multithreading Applied to Real Time}},
  booktitle =	{31st Euromicro Conference on Real-Time Systems (ECRTS 2019)},
  pages =	{3:1--3:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-110-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{133},
  editor =	{Quinton, Sophie},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2019.3},
  URN =		{urn:nbn:de:0030-drops-107400},
  doi =		{10.4230/LIPIcs.ECRTS.2019.3},
  annote =	{Keywords: real-time systems, simultaneous multithreading, soft real-time, scheduling algorithms}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2019.13

GEDF Tardiness: Open Problems Involving Uniform Multiprocessors and Affinity Masks Resolved

Authors: Stephen Tang, Sergey Voronov, and James H. Anderson

Published in: LIPIcs, Volume 133, 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)

Abstract

Prior work has shown that the global earliest-deadline-first (GEDF) scheduler is soft real-time (SRT)-optimal for sporadic task systems in a variety of contexts, meaning that bounded deadline tardiness can be guaranteed under it for any task system that does not cause platform overutilization. However, one particularly compelling context has remained elusive: multiprocessor platforms in which tasks have affinity masks that determine the processors where they may execute. Actual GEDF implementations, such as the SCHED_DEADLINE class in Linux, have dealt with this unresolved question by foregoing SRT guarantees once affinity masks are set. This unresolved question, as it pertains to SCHED_DEADLINE, was included by Peter Zijlstra in a list of important open problems affecting Linux in his keynote talk at ECRTS 2017. In this paper, this question is resolved along with another open problem that at first blush seems unrelated but actually is. Specifically, both problems are closed by establishing two results. First, a proof strategy used previously to establish GEDF tardiness bounds that are exponential in size on heterogeneous uniform multiprocessors is generalized to show that polynomial bounds exist on a wider class of platforms. Second, both uniform multiprocessors and identical multiprocessors with affinities are shown to be within this class. These results yield the first polynomial GEDF tardiness bounds for the uniform case and the first such bounds of any kind for the identical-with-affinities case.

Cite as

Stephen Tang, Sergey Voronov, and James H. Anderson. GEDF Tardiness: Open Problems Involving Uniform Multiprocessors and Affinity Masks Resolved. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 13:1-13:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{tang_et_al:LIPIcs.ECRTS.2019.13,
  author =	{Tang, Stephen and Voronov, Sergey and Anderson, James H.},
  title =	{{GEDF Tardiness: Open Problems Involving Uniform Multiprocessors and Affinity Masks Resolved}},
  booktitle =	{31st Euromicro Conference on Real-Time Systems (ECRTS 2019)},
  pages =	{13:1--13:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-110-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{133},
  editor =	{Quinton, Sophie},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2019.13},
  URN =		{urn:nbn:de:0030-drops-107504},
  doi =		{10.4230/LIPIcs.ECRTS.2019.13},
  annote =	{Keywords: scheduling theory, multicore, processor affinity masks, GEDF, uniform multiprocessors}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2018.20

Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems

Authors: Ming Yang, Nathan Otterness, Tanya Amert, Joshua Bakita, James H. Anderson, and F. Donelson Smith

Published in: LIPIcs, Volume 106, 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

NVIDIA's CUDA API has enabled GPUs to be used as computing accelerators across a wide range of applications. This has resulted in performance gains in many application domains, but the underlying GPU hardware and software are subject to many non-obvious pitfalls. This is particularly problematic for safety-critical systems, where worst-case behaviors must be taken into account. While such behaviors were not a key concern for earlier CUDA users, the usage of GPUs in autonomous vehicles has taken CUDA programs out of the sole domain of computer-vision and machine-learning experts and into safety-critical processing pipelines. Certification is necessary in this new domain, which is problematic because GPU software may have been developed without any regard for worst-case behaviors. Pitfalls when using CUDA in real-time autonomous systems can result from the lack of specifics in official documentation, and developers of GPU software not being aware of the implications of their design choices with regards to real-time requirements. This paper focuses on the particular challenges facing the real-time community when utilizing CUDA-enabled GPUs for autonomous applications, and best practices for applying real-time safety-critical principles.

Cite as

Ming Yang, Nathan Otterness, Tanya Amert, Joshua Bakita, James H. Anderson, and F. Donelson Smith. Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{yang_et_al:LIPIcs.ECRTS.2018.20,
  author =	{Yang, Ming and Otterness, Nathan and Amert, Tanya and Bakita, Joshua and Anderson, James H. and Smith, F. Donelson},
  title =	{{Avoiding Pitfalls when Using NVIDIA GPUs for Real-Time Tasks in Autonomous Systems}},
  booktitle =	{30th Euromicro Conference on Real-Time Systems (ECRTS 2018)},
  pages =	{20:1--20:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-075-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{106},
  editor =	{Altmeyer, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2018.20},
  URN =		{urn:nbn:de:0030-drops-89845},
  doi =		{10.4230/LIPIcs.ECRTS.2018.20},
  annote =	{Keywords: real-time systems, graphics processing units, scheduling algorithms, parallel computing, embedded software}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2018.25

Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts

Authors: Catherine E. Nemitz, Tanya Amert, and James H. Anderson

Published in: LIPIcs, Volume 106, 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

During the past decade, parallelism-related issues have been at the forefront of real-time systems research due to the advent of multicore technologies. In the coming years, such issues will loom ever larger due to increasing core counts. Having more cores means a greater potential exists for platform capacity loss when the available parallelism cannot be fully exploited. In this paper, such capacity loss is considered in the context of real-time locking protocols. In this context, lock nesting becomes a key concern as it can result in transitive blocking chains that force tasks to execute sequentially unnecessarily. Such chains can be quite long on a larger machine. Contention-sensitive real-time locking protocols have been proposed as a means of "breaking" transitive blocking chains, but such protocols tend to have high overhead due to more complicated lock/unlock logic. To ease such overhead, the usage of lock servers is considered herein. In particular, four specific lock-server paradigms are proposed and many nuances concerning their deployment are explored. Experiments are presented that show that, by executing cache hot, lock servers can enable reductions in lock/unlock overhead of up to 86%. Such reductions make contention-sensitive protocols a viable approach in practice.

Cite as

Catherine E. Nemitz, Tanya Amert, and James H. Anderson. Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts. In 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 106, pp. 25:1-25:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{nemitz_et_al:LIPIcs.ECRTS.2018.25,
  author =	{Nemitz, Catherine E. and Amert, Tanya and Anderson, James H.},
  title =	{{Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts}},
  booktitle =	{30th Euromicro Conference on Real-Time Systems (ECRTS 2018)},
  pages =	{25:1--25:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-075-0},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{106},
  editor =	{Altmeyer, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2018.25},
  URN =		{urn:nbn:de:0030-drops-89789},
  doi =		{10.4230/LIPIcs.ECRTS.2018.25},
  annote =	{Keywords: multiprocess locking protocols, nested locks, priority-inversion blocking, reader/writer locks, real-time locking protocols}
}

Document

DOI: 10.4230/DARTS.4.2.2

Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts (Artifact)

Authors: Catherine E. Nemitz, Tanya Amert, and James H. Anderson

Published in: DARTS, Volume 4, Issue 2, Special Issue of the 30th Euromicro Conference on Real-Time Systems (ECRTS 2018)

Abstract

During the past decade, parallelism-related issues have been at the forefront of real-time systems research due to the advent of multicore technologies. In the coming years, such issues will loom ever larger due to increasing core counts. Having more cores means a greater potential exists for platform capacity loss when the available parallelism cannot be fully exploited. In this work, such capacity loss is considered in the context of real-time locking protocols. In this context, lock nesting becomes a key concern as it can result in transitive blocking chains that force tasks to execute sequentially unnecessarily. Such chains can be quite long on a larger machine. Contention-sensitive real-time locking protocols have been proposed as a means of "breaking" transitive blocking chains, but such protocols tend to have high overhead due to more complicated lock/unlock logic. To ease such overhead, the usage of lock servers is considered herein. In particular, four specific lock-server paradigms are proposed and many nuances concerning their deployment are explored. Experiments are presented that show that, by executing cache hot, lock servers can enable reductions in lock/unlock overhead of up to 86%. Such reductions make contention-sensitive protocols a viable approach in practice. This artifact contains the implementation of two contention-sensitive locking protocol variants implemented with four proposed lock-server paradigms, as well as the experiments with which they were evaluated.

Cite as

Catherine E. Nemitz, Tanya Amert, and James H. Anderson. Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts (Artifact). In Special Issue of the 30th Euromicro Conference on Real-Time Systems (ECRTS 2018). Dagstuhl Artifacts Series (DARTS), Volume 4, Issue 2, pp. 2:1-2:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@Article{nemitz_et_al:DARTS.4.2.2,
  author =	{Nemitz, Catherine E. and Amert, Tanya and Anderson, James H.},
  title =	{{Using Lock Servers to Scale Real-Time Locking Protocols: Chasing Ever-Increasing Core Counts (Artifact)}},
  pages =	{2:1--2:3},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2018},
  volume =	{4},
  number =	{2},
  editor =	{Nemitz, Catherine E. and Amert, Tanya and Anderson, James H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.4.2.2},
  URN =		{urn:nbn:de:0030-drops-89704},
  doi =		{10.4230/DARTS.4.2.2},
  annote =	{Keywords: multiprocess locking protocols, nested locks, priority-inversion blocking, reader/writer locks, real-time locking protocols}
}

Document

DOI: 10.4230/LIPIcs.ECRTS.2017.15

Optimal Dataflow Scheduling on a Heterogeneous Multiprocessor With Reduced Response Time Bounds

Authors: Zheng Dong, Cong Liu, Alan Gatherer, Lee McFearin, Peter Yan, and James H. Anderson

Published in: LIPIcs, Volume 76, 29th Euromicro Conference on Real-Time Systems (ECRTS 2017)

Abstract

Heterogeneous computing platforms with multiple types of computing resources have been widely used in many industrial systems to process dataflow tasks with pre-defined affinity of tasks to subgroups of resources. For many dataflow workloads with soft real-time requirements, guaranteeing fast and bounded response times is often the objective. This paper presents a new set of analysis techniques showing that a classical real-time scheduler, namely earliest-deadline first (EDF), is able to support dataflow tasks scheduled on such heterogeneous platforms with provably bounded response times while incurring no resource capacity loss, thus proving EDF to be an optimal solution for this scheduling problem. Experiments using synthetic workloads with widely varied parameters also demonstrate that the magnitude of the response time bounds yielded under the proposed analysis is reasonably small under all scenarios. Compared to the state-of-the-art soft real-time analysis techniques, our test yields a 68% reduction on response time bounds on average. This work demonstrates the potential of applying EDF into practical industrial systems containing dataflow-based workloads that desire guaranteed bounded response times.

Cite as

Zheng Dong, Cong Liu, Alan Gatherer, Lee McFearin, Peter Yan, and James H. Anderson. Optimal Dataflow Scheduling on a Heterogeneous Multiprocessor With Reduced Response Time Bounds. In 29th Euromicro Conference on Real-Time Systems (ECRTS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 76, pp. 15:1-15:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{dong_et_al:LIPIcs.ECRTS.2017.15,
  author =	{Dong, Zheng and Liu, Cong and Gatherer, Alan and McFearin, Lee and Yan, Peter and Anderson, James H.},
  title =	{{Optimal Dataflow Scheduling on a Heterogeneous Multiprocessor With Reduced Response Time Bounds}},
  booktitle =	{29th Euromicro Conference on Real-Time Systems (ECRTS 2017)},
  pages =	{15:1--15:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-037-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{76},
  editor =	{Bertogna, Marko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2017.15},
  URN =		{urn:nbn:de:0030-drops-71565},
  doi =		{10.4230/LIPIcs.ECRTS.2017.15},
  annote =	{Keywords: Real-time Scheduling, schedulability, heterogeneous multiprocessor}
}

Search Results

Documents authored by Anderson, James H.
