On Static Timing Analysis of GPU Kernels

Author: Vesa Hirvisalo




Cite As

Vesa Hirvisalo. On Static Timing Analysis of GPU Kernels. In 14th International Workshop on Worst-Case Execution Time Analysis. Open Access Series in Informatics (OASIcs), Volume 39, pp. 43-52, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)
https://doi.org/10.4230/OASIcs.WCET.2014.43

Abstract

We study static timing analysis of programs running on GPU accelerators. Such programs follow a data-parallel programming model that allows massive parallelism on manycore processors. Data-parallel programming and the use of GPUs as accelerators have become widespread in recent years. Timing analysis of programs running on single-core machines is well understood and is also applied in practice. For multicore and manycore machines, however, timing analysis remains a significant problem that has not yet been properly solved. In this paper, we present static timing analysis of GPU kernels based on a method that we call abstract CTA simulation. Cooperative Thread Arrays (CTAs) are the basic execution structure of GPU devices, whose operation proceeds in thread groups called warps. Abstract CTA simulation is based on static analysis of thread divergence in warps and on abstract scheduling of the warps.
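
To make the notion of thread divergence concrete, the following is a minimal sketch of how divergence inside a warp can affect a timing bound. It is not the paper's abstract CTA simulation. It assumes a simplified SIMT cost model in which a warp that diverges on an if/else executes both sides one after the other, and in which the warps of a CTA run back to back with no overlap. The warp size of 32, the function names, and the cycle costs are all illustrative assumptions.

```python
from typing import Sequence

WARP_SIZE = 32  # assumption: NVIDIA-style warps of 32 threads


def warp_branch_cost(takes_then: Sequence[bool], then_cost: int, else_cost: int) -> int:
    """Cycles one warp spends on an if/else in a simplified SIMT model.

    takes_then[i] says whether thread i of the warp takes the 'then' side.
    A side is executed (with inactive threads masked off) if at least one
    thread of the warp takes it, so a divergent warp pays for both sides.
    """
    if len(takes_then) != WARP_SIZE:
        raise ValueError("expected one flag per thread in the warp")
    cost = 0
    if any(takes_then):
        cost += then_cost
    if not all(takes_then):
        cost += else_cost
    return cost


def cta_branch_bound(num_threads: int, then_cost: int, else_cost: int,
                     may_diverge: bool) -> int:
    """Upper bound on the cycles a CTA spends on one if/else.

    may_diverge is the (hypothetical) verdict of a static divergence
    analysis: False means every thread of a warp provably takes the same
    side. Assumes warps are executed sequentially, which is pessimistic.
    """
    num_warps = -(-num_threads // WARP_SIZE)  # ceiling division
    if may_diverge:
        per_warp = then_cost + else_cost
    else:
        per_warp = max(then_cost, else_cost)
    return num_warps * per_warp
```

In this model, a CTA of 100 threads (four warps) with branch costs of 10 and 4 cycles is bounded by 56 cycles if divergence cannot be ruled out, and by 40 cycles if the analysis proves the branch uniform within each warp. This illustrates why a static divergence analysis can tighten the bound.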
Keywords
  • Parallelism
  • WCET

