Warp-Level CFG Construction for GPU Kernel WCET Analysis

Jeanmougin, Louison; Sotin, Pascal; Rochange, Christine; Carle, Thomas

doi:10.4230/OASIcs.WCET.2023.1

Document Identifiers

DOI: 10.4230/OASIcs.WCET.2023.1
URN: urn:nbn:de:0030-drops-184303

Author Details

Louison Jeanmougin

IRIT - Univ. Toulouse 3 - CNRS, France

Pascal Sotin

IRIT - Univ. Toulouse 2 - CNRS, France

Christine Rochange

IRIT - Univ. Toulouse 3 - CNRS, France

Thomas Carle

IRIT - Univ. Toulouse 3 - CNRS, France

Cite As

Louison Jeanmougin, Pascal Sotin, Christine Rochange, and Thomas Carle. Warp-Level CFG Construction for GPU Kernel WCET Analysis. In 21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023). Open Access Series in Informatics (OASIcs), Volume 114, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/OASIcs.WCET.2023.1

Abstract

We present an abstract interpretation technique that automatically builds a Control Flow Graph (CFG) representation of the execution of a GPU kernel. GPUs implement an inherently parallel execution model in which threads are grouped into warps that execute in lockstep. This execution model makes it possible to represent the execution of all the threads of a warp as a single CFG. However, thread divergence may occur within a warp, and its effect must be captured explicitly in the CFG. Our method builds the CFG of a warp by applying abstract interpretation to the assembly (Nvidia SASS) code of a kernel while maintaining an abstract representation of which threads within the warp agree on which values. This allows the method to detect precisely the points in the program where thread divergence may occur and to avoid spurious reactivation edges in the CFG. As a proof of concept, we apply our technique to benchmark kernels and generate Implicit Path Enumeration Technique (IPET) systems from the resulting CFGs.
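
The idea of tracking "which threads agree on which values" can be illustrated with a small sketch. This is not the paper's abstract domain. It is a simplified stand-in that assumes each register is abstracted by a partition of the warp's lanes, where lanes in the same class are known to hold the same value. The register names (R0..R3, P0, P1), the helper names, and the 4-lane warp are all hypothetical and chosen only for illustration. A branch is flagged as possibly divergent when the active lanes are not all known to agree on its predicate, which is a conservative over-approximation.

```python
WARP_SIZE = 4  # kept tiny for readability; Nvidia warps have 32 threads


def uniform(lanes):
    """Partition with a single class: all lanes are known to agree."""
    return frozenset({frozenset(lanes)})


def distinct(lanes):
    """Partition with one class per lane: no agreement is known."""
    return frozenset(frozenset({lane}) for lane in lanes)


def combine(p, q):
    """Abstract transfer for a deterministic binary operation.

    Two lanes are known to agree on the result only if they agree on
    both operands, so the result is the common refinement of p and q.
    """
    return frozenset(a & b for a in p for b in q if a & b)


def may_diverge(predicate, active):
    """A branch may diverge if the active lanes span more than one class."""
    classes = {cls & active for cls in predicate if cls & active}
    return len(classes) > 1


lanes = frozenset(range(WARP_SIZE))
state = {
    "R0": uniform(lanes),   # assumed: a kernel parameter, same in every lane
    "R1": distinct(lanes),  # assumed: the thread index, different in every lane
}
state["R2"] = combine(state["R0"], state["R0"])  # R2 = R0 + R0
state["R3"] = combine(state["R0"], state["R1"])  # R3 = R0 + R1
state["P0"] = combine(state["R2"], uniform(lanes))  # P0 = R2 < constant
state["P1"] = combine(state["R3"], uniform(lanes))  # P1 = R3 < constant

print(may_diverge(state["P0"], lanes))  # False: all lanes agree on P0
print(may_diverge(state["P1"], lanes))  # True: lanes may disagree on P1
```

In this sketch, only the branch on P1 would need divergence and reactivation edges in a warp-level CFG, while the branch on P0 could be treated like an ordinary sequential branch.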

Subject Classification

ACM Subject Classification

Computer systems organization → Real-time systems
Theory of computation → Abstraction

Keywords

Graphical Processing Unit (GPU)
Control Flow Graphs (CFG)
Worst-Case Execution Time (WCET)
Program analysis
