GPU Schedulers: How Fair Is Fair Enough?

Authors: Tyler Sorensen, Hugues Evrard, Alastair F. Donaldson



Author Details

Tyler Sorensen
  • Imperial College London, UK
Hugues Evrard
  • Imperial College London, UK
Alastair F. Donaldson
  • Imperial College London, UK

Cite As

Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson. GPU Schedulers: How Fair Is Fair Enough?. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 23:1-23:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

Blocking synchronisation idioms, e.g. mutexes and barriers, play an important role in concurrent programming. However, systems with semi-fair schedulers, e.g. graphics processing units (GPUs), are becoming increasingly common. Such schedulers provide varying degrees of fairness, guaranteeing enough to allow some, but not all, blocking idioms. While a number of applications that use blocking idioms do run on today's GPUs, reasoning about liveness properties of such applications is difficult as documentation is scarce and scattered. In this work, we aim to clarify fairness properties of semi-fair schedulers. To do this, we define a general temporal logic formula, based on weak fairness, parameterised by a predicate that enables fairness per-thread at certain points of an execution. We then define fairness properties for three GPU schedulers: HSA, OpenCL, and occupancy-bound execution. We examine existing GPU applications and show that none of the above schedulers is strong enough to provide the fairness properties required by these applications. It hence appears that existing GPU scheduler descriptions do not entirely capture the fairness properties that are provided on current GPUs. Thus, we present two new schedulers that aim to support existing GPU applications. We analyse the behaviour of common blocking idioms under each scheduler and show that one of our new schedulers allows a more natural implementation of a GPU protocol.
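To make the liveness issue concrete, the following is a minimal toy simulation (not the paper's formal semantics; all names and the round-robin model are assumptions of this sketch). It models an occupancy-bound scheduler that only ever runs the first `occupancy` threads: a thread spinning on a flag that is set by a non-resident thread starves, even though every resident thread is scheduled weakly fairly.

```python
def run(num_threads, occupancy, max_steps=1000):
    """Toy model of occupancy-bound execution (illustrative only).

    The last thread is a 'producer' that sets a flag; all other
    threads are 'consumers' that block (spin) until the flag is set.
    Only the first `occupancy` threads are ever resident/scheduled.
    """
    flag = [False]
    done = [False] * num_threads

    def step(tid):
        if tid == num_threads - 1:
            flag[0] = True       # producer: signal the waiting threads
            done[tid] = True
        elif flag[0]:
            done[tid] = True     # consumer: observed the signal, proceeds
        # otherwise the consumer spins and makes no progress this step

    resident = list(range(occupancy))  # threads the scheduler ever runs
    for _ in range(max_steps):
        for tid in resident:
            if not done[tid]:
                step(tid)
        if all(done[t] for t in resident):
            break
    return done
```

With `run(2, 2)` both threads are resident and the consumer eventually terminates; with `run(2, 1)` the producer is never scheduled, so the resident consumer spins forever. This is the sense in which occupancy-bound execution is "semi-fair": it is fair only among the threads it makes resident.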

Subject Classification

ACM Subject Classification
  • Software and its engineering → Semantics
  • Software and its engineering → Scheduling
  • Computing methodologies → Graphics processors

Keywords
  • GPU scheduling
  • Blocking synchronisation
  • GPU semantics



