GPU Schedulers: How Fair Is Fair Enough?

Sorensen, Tyler; Evrard, Hugues; Donaldson, Alastair F.

doi:10.4230/LIPIcs.CONCUR.2018.23

File

LIPIcs.CONCUR.2018.23.pdf

Filesize: 0.55 MB
17 pages

Document Identifiers

DOI: 10.4230/LIPIcs.CONCUR.2018.23
URN: urn:nbn:de:0030-drops-95619

Author Details

Tyler Sorensen

Imperial College London, UK

Hugues Evrard

Imperial College London, UK

Alastair F. Donaldson

Imperial College London, UK

Cite As

Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson. GPU Schedulers: How Fair Is Fair Enough? In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 23:1-23:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.CONCUR.2018.23

Abstract

Blocking synchronisation idioms, e.g. mutexes and barriers, play an important role in concurrent programming. However, systems with semi-fair schedulers, e.g. graphics processing units (GPUs), are becoming increasingly common. Such schedulers provide varying degrees of fairness, guaranteeing enough to allow some, but not all, blocking idioms. While a number of applications that use blocking idioms do run on today's GPUs, reasoning about liveness properties of such applications is difficult as documentation is scarce and scattered. In this work, we aim to clarify fairness properties of semi-fair schedulers. To do this, we define a general temporal logic formula, based on weak fairness, parameterised by a predicate that enables fairness per-thread at certain points of an execution. We then define fairness properties for three GPU schedulers: HSA, OpenCL, and occupancy-bound execution. We examine existing GPU applications and show that none of the above schedulers are strong enough to provide the fairness properties required by these applications. It hence appears that existing GPU scheduler descriptions do not entirely capture the fairness properties that are provided on current GPUs. Thus, we present two new schedulers that aim to support existing GPU applications. We analyse the behaviour of common blocking idioms under each scheduler and show that one of our new schedulers allows a more natural implementation of a GPU protocol.
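
As an informal illustration of why blocking idioms depend on scheduler fairness, the following toy sketch simulates two threads contending for a spin-lock under two different schedulers. It is not the paper's formalism: the step-based model and the names `run`, `round_robin`, and `starving` are hypothetical and chosen only for this example. Under the fair scheduler both threads finish; under the unfair one the lock holder is never scheduled again, so the spinning thread waits forever.

```python
from typing import Callable, List, Optional

# A scheduler picks which runnable thread takes the next step.
Scheduler = Callable[[int, List[int]], int]


def run(scheduler: Scheduler, max_steps: int = 1000) -> bool:
    """Simulate two threads that each acquire and release one spin-lock.

    Returns True if both threads finish within max_steps, False otherwise.
    This is a toy model (an assumption of this example), not a real GPU.
    """
    lock_holder: Optional[int] = None  # thread id holding the lock, if any
    pc = [0, 0]  # per-thread state: 0 = acquiring, 1 = releasing, 2 = done

    for step in range(max_steps):
        runnable = [t for t in (0, 1) if pc[t] < 2]
        if not runnable:
            return True
        t = scheduler(step, runnable)
        if pc[t] == 0:
            if lock_holder is None:
                lock_holder = t  # acquire succeeds
                pc[t] = 1
            # otherwise the thread spins: the step makes no progress
        else:
            lock_holder = None  # release the lock
            pc[t] = 2
    return all(state == 2 for state in pc)


def round_robin(step: int, runnable: List[int]) -> int:
    """Fair scheduler: cycles through the runnable threads."""
    return runnable[step % len(runnable)]


def starving(step: int, runnable: List[int]) -> int:
    """Unfair scheduler: runs thread 0 once, then always the highest id."""
    return 0 if step == 0 else max(runnable)


fair_terminates = run(round_robin)
unfair_terminates = run(starving)
print(fair_terminates, unfair_terminates)
```

In this model, thread 0 takes the lock on the first step under `starving` and is then never scheduled again, which mirrors the kind of liveness failure that an insufficiently fair scheduler permits for mutex-style idioms.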

Subject Classification

ACM Subject Classification

Software and its engineering → Semantics
Software and its engineering → Scheduling
Computing methodologies → Graphics processors

Keywords

GPU scheduling
Blocking synchronisation
GPU semantics

