Xiaorui Wang
Creative Commons Attribution 4.0 International license
64442cb4c3672ce03376740d4c75af1e
Many of today’s real-time embedded systems increasingly rely on GPUs for AI-related computing. However, existing GPU scheduling solutions for spatially shared GPU systems are still mostly open-loop and rely on worst-case execution time (WCET) estimation for offline schedulability analysis, which cannot adapt to online workload variations. Although adaptive scheduling has been proposed to handle runtime execution time variations, prior approaches target either CPUs or time-slicing GPUs, where only one task can execute on the GPU within a time slice. In contrast, spatial sharing enables concurrent kernel execution via Streaming Multiprocessor (SM) partitioning, allowing better GPU resource utilization. Therefore, new adaptive solutions must be designed for spatially shared GPU systems. In this paper, we propose DySM, a closed-loop response time control algorithm for spatially shared GPUs in soft real-time systems. In the face of runtime workload variations, DySM leverages dynamic SM scaling to control task response times with low runtime overhead. To model GPU resource contention among tasks, we analytically derive a multi-input-multi-output (MIMO) system model that captures the impact of SM scaling on the response times of different tasks. Based on this model, DySM is designed using feedback control theory for guaranteed system stability and control accuracy. Experimental results on an NVIDIA GPU testbed demonstrate that DySM outperforms state-of-the-art solutions by providing runtime real-time guarantees. Compared to the best-performing baseline, DySM can reduce the deadline miss ratio by up to 90.93%.
@Article{subramaniyan_et_al:DARTS.12.2.4,
author = {Subramaniyan, Srinivasan and Wang, Xiaorui},
title = {{DySM: Dynamic Scaling of GPU Streaming Multiprocessor in Spatially Shared Real-Time Embedded GPU Systems (Artifact)}},
pages = {4:1--4:3},
journal = {Dagstuhl Artifacts Series},
ISSN = {2509-8195},
year = {2026},
volume = {12},
number = {2},
editor = {Subramaniyan, Srinivasan and Wang, Xiaorui},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/DARTS.12.2.4},
URN = {urn:nbn:de:0030-drops-266218},
doi = {10.4230/DARTS.12.2.4},
annote = {Keywords: Real-time systems, GPU scheduling, response time ratio control, SM scaling, feedback control}
}