Louison Jeanmougin, Thomas Carle, Christine Rochange
Creative Commons Attribution 4.0 International license
MD5 sum: 83f8cddda1e1ee16c87fb381700f516d
This paper proposes to model the Worst-Case Execution Time (WCET) of a GPU thread block as the Worst-Case Response Time (WCRT) of the warps composing the block. Inspired by WCRT analyses for classical CPU tasks, the response time of a warp is modeled as its execution time in isolation plus an interference term that accounts for the execution of higher-priority warps. We provide an algorithm to build a representation of the execution of each warp of a thread block that distinguishes phases of execution on the functional units from phases of idleness due to operation latency. A simple formula relying on this model is then proposed to safely upper-bound the WCRT of warps scheduled under greedy policies such as Greedy-Then-Oldest (GTO) or Loose Round-Robin (LRR). We tested our approach using simulations of kernels from a GPU benchmark suite on the Accel-Sim simulator. We also evaluated the model on a GPU program that is likely to be found in safety-critical systems: SGEMM (Single-precision GEneral Matrix Multiplication). This work constitutes a promising first building block of an analysis pipeline for enabling static WCET computation on GPUs.
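The abstract does not spell out the bound itself. Purely as an illustration of the "isolated execution time plus interference" idea, the sketch below encodes a simplified version under assumptions that are ours, not the authors': each warp is a list of hypothetical `Phase` records (`"busy"` cycles occupy a single shared issue unit, `"idle"` cycles wait on a latency that does not depend on other warps), priorities are fixed with index 0 highest, and a ready warp is never blocked by a lower-priority one. Under those assumptions each busy cycle of a higher-priority warp delays warp i by at most one cycle, giving R_i ≤ C_i + Σ_{j<i} B_j, where C_i is the isolated time and B_j the busy time. This is not the paper's formula: GTO and LRR change priorities dynamically, which the sketch ignores. All names (`Phase`, `wcrt_bound`, `block_bound`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    """One phase of a warp's isolated execution (hypothetical model)."""
    kind: str    # "busy": occupies the shared issue unit; "idle": waits on latency
    cycles: int  # length of the phase in cycles

    def __post_init__(self) -> None:
        if self.kind not in ("busy", "idle"):
            raise ValueError(f"unknown phase kind: {self.kind!r}")
        if self.cycles < 0:
            raise ValueError("phase length must be non-negative")


def isolated_time(phases: List[Phase]) -> int:
    """Execution time of a warp running alone: all its phases back to back."""
    return sum(p.cycles for p in phases)


def busy_time(phases: List[Phase]) -> int:
    """Cycles during which a warp occupies the shared issue unit."""
    return sum(p.cycles for p in phases if p.kind == "busy")


def wcrt_bound(warps: List[List[Phase]], i: int) -> int:
    """Upper bound on the response time of warp i (simplified, assumed model).

    Assumes fixed priorities (warps[0] is highest), one shared unit that
    serves a single warp per cycle, latencies independent of other warps,
    and that a ready warp is never blocked by a lower-priority one. Each
    busy cycle of a higher-priority warp can then delay warp i by at most
    one cycle, while idle (latency) cycles of warp i elapse regardless.
    """
    interference = sum(busy_time(warps[j]) for j in range(i))
    return isolated_time(warps[i]) + interference


def block_bound(warps: List[List[Phase]]) -> int:
    """The thread block completes when its slowest warp completes."""
    return max(wcrt_bound(warps, i) for i in range(len(warps)))


if __name__ == "__main__":
    warps = [
        [Phase("busy", 4), Phase("idle", 10), Phase("busy", 2)],  # highest priority
        [Phase("busy", 3), Phase("idle", 5), Phase("busy", 3)],
        [Phase("busy", 5)],                                        # lowest priority
    ]
    print([wcrt_bound(warps, i) for i in range(len(warps))])  # prints [16, 17, 17]
    print(block_bound(warps))                                  # prints 17
```

The lowest-priority warp absorbs the busy time of every other warp, so this bound is pessimistic whenever its own idle phases overlap with higher-priority busy phases; a finer, phase-aware analysis could exploit such overlaps.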
@Article{jeanmougin_et_al:DARTS.11.1.3,
author = {Jeanmougin, Louison and Carle, Thomas and Rochange, Christine},
title = {{Bounding the WCET of a GPU Thread Block with a Multi-Phase Representation of Warps Execution (Artifact)}},
pages = {3:1--3:5},
journal = {Dagstuhl Artifacts Series},
ISSN = {2509-8195},
year = {2025},
volume = {11},
number = {1},
editor = {Jeanmougin, Louison and Carle, Thomas and Rochange, Christine},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/DARTS.11.1.3},
URN = {urn:nbn:de:0030-drops-236047},
doi = {10.4230/DARTS.11.1.3},
annote = {Keywords: GPU, WCET analysis}
}