Forward Progress on GPU Concurrency (Invited Talk)

Authors Alastair F. Donaldson, Jeroen Ketema, Tyler Sorensen, John Wickerson

Thumbnail PDF


  • Filesize: 483 kB
  • 13 pages

Document Identifiers

Author Details

Alastair F. Donaldson
Jeroen Ketema
Tyler Sorensen
John Wickerson

Cite AsGet BibTex

Alastair F. Donaldson, Jeroen Ketema, Tyler Sorensen, and John Wickerson. Forward Progress on GPU Concurrency (Invited Talk). In 28th International Conference on Concurrency Theory (CONCUR 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 85, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


The tutorial at CONCUR will provide a practical overview of work undertaken over the last six years in the Multicore Programming Group at Imperial College London, and with collaborators internationally, related to understanding and reasoning about concurrency in software designed for acceleration on GPUs. In this article we provide an overview of this work, which includes contributions to data race analysis, compiler testing, memory model understanding and formalisation, and most recently efforts to enable portable GPU implementations of algorithms that require forward progress guarantees.
  • GPUs
  • concurrency
  • formal verification
  • memory models
  • data races


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. GPU concurrency: Weak behaviours and programming assumptions. In ASPLOS, pages 577-591, 2015. Google Scholar
  2. Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1-7:74, 2014. Google Scholar
  3. Afshin Amighi, Saeed Darabi, Stefan Blom, and Marieke Huisman. Specification and verification of atomic operations in GPGPU programs. In SEFM, pages 69-83, 2015. Google Scholar
  4. Ethel Bardsley, Adam Betts, Nathan Chong, Peter Collingbourne, Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Daniel Liew, and Shaz Qadeer. Engineering a static verification tool for GPU kernels. In CAV, pages 226-242, 2014. Google Scholar
  5. Michael Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, pages 364-387, 2005. Google Scholar
  6. Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In CAV, pages 171-177, 2011. Google Scholar
  7. Ethel Barsdley and Alastair F. Donaldson. Warps and atomics: Beyond barrier synchronisation in the verification of GPU kernels. In NFM, 2014. Google Scholar
  8. Mark Batty, Alastair F. Donaldson, and John Wickerson. Overhauling SC atomics in C11 and OpenCL. In POPL, pages 634-648. ACM, 2016. Google Scholar
  9. Adam Betts, Nathan Chong, Pantazis Deligiannis, Alastair F. Donaldson, and Jeroen Ketema. Implementing and evaluating candidate-based invariant generation. IEEE Trans. Software Eng., 2017. To appear. Google Scholar
  10. Adam Betts, Nathan Chong, Alastair F. Donaldson, Jeroen Ketema, Shaz Qadeer, Paul Thomson, and John Wickerson. The design and implementation of a verification technique for GPU kernels. ACM Trans. Program. Lang. Syst., 37(3):10:1-10:49, 2015. Google Scholar
  11. Adam Betts, Nathan Chong, Alastair F. Donaldson, Shaz Qadeer, and Paul Thomson. GPUVerify: A verifier for GPU kernels. In OOPSLA, pages 113-132, 2012. Google Scholar
  12. Stefan Blom, Marieke Huisman, and Matej Mihelčić. Specification and verification of GPGPU programs. Science of Computer Programming, 95(3):376-388, 2014. Google Scholar
  13. Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5):720-748, 1999. Google Scholar
  14. Wei-Fan Chiang, Ganesh Gopalakrishnan, Guodong Li, and Zvonimir Rakamarić. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In NFM, pages 213-228, 2013. Google Scholar
  15. Nathan Chong, Alastair F. Donaldson, Paul Kelly, Jeroen Ketema, and Shaz Qadeer. Barrier invariants: A shared state abstraction for the analysis of data-dependent GPU kernels. In OOPSLA, pages 605-622, 2013. Google Scholar
  16. Nathan Chong, Alastair F. Donaldson, and Jeroen Ketema. A sound and complete abstraction for reasoning about parallel prefix sums. In POPL, pages 397-410, 2014. Google Scholar
  17. Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. Symbolic crosschecking of data-parallel floating-point code. IEEE Trans. Software Eng., 40(7):710-737, 2014. Google Scholar
  18. Peter Collingbourne, Alastair F. Donaldson, Jeroen Ketema, and Shaz Qadeer. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In ESOP, pages 270-289, 2013. Google Scholar
  19. Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337-340, 2008. Google Scholar
  20. Cormac Flanagan and K. Rustan M. Leino. Houdini, an annotation assistant for ESC/Java. In FME, pages 500-517, 2001. Google Scholar
  21. Kshitij Gupta, Jeff Stuart, and John D. Owens. A study of persistent threads style GPU programming for GPGPU workloads. In InPar, pages 1-14, 2012. Google Scholar
  22. ISO/IEC. Programming languages - C. International standard 9899:2011, 2011. Google Scholar
  23. Jeroen Ketema and Alastair F. Donaldson. Termination analysis for GPU kernels. Science of Computer Programming, 2017. To appear. Google Scholar
  24. Khronos Group. The OpenCL specification version: 2.0 (rev. 29), July 2015. URL:
  25. Peter M. Kogge and Harold S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans. Comp., C-22(8):786-–793, 1973. Google Scholar
  26. Kensuke Kojima and Atsushi Igarashi. A Hoare logic for SIMT programs. In APLAS, pages 58-73, 2013. Google Scholar
  27. Vu Le, Mehrdad Afshari, and Zhendong Su. Compiler validation via equivalence modulo inputs. In PLDI, pages 216-226, 2014. Google Scholar
  28. Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, and Sorin Lerner. Verifying GPU kernels by test amplification. In PLDI, pages 383-394, 2012. Google Scholar
  29. Guodong Li and Ganesh Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In FSE, pages 187-196, 2010. Google Scholar
  30. Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan. GKLEE: Concolic verification and test generation for GPUs. In PPoPP, pages 215-224, 2012. Google Scholar
  31. Christopher Lidbury, Andrei Lascu, Nathan Chong, and Alastair F. Donaldson. Many-core compiler fuzzing. In PLDI, pages 65-76, 2015. Google Scholar
  32. William M. McKeeman. Differential testing for software. Digital Technical Journal, 10(1):100-107, 1998. Google Scholar
  33. NVIDIA. CUDA C programming guide, version 5.5, 2013. Google Scholar
  34. NVIDIA. CUDA 9 features revealed: Volta, cooperative groups and more, accessed 2017. URL:
  35. NVIDIA. CUDA-MEMCHECK, accessed 2017. URL:
  36. Marc S. Orr, Shuai Che, Ayse Yilmazer, Bradford M. Beckmann, Mark D. Hill, and David A. Wood. Synchronization using remote-scope promotion. In ASPLOS, pages 73-86, 2015. Google Scholar
  37. Sreepathi Pai and Keshav Pingali. A compiler for throughput optimization of graph algorithms on GPUs. In OOPSLA, pages 1-19, 2016. Google Scholar
  38. Phillipe A. Pereira, Higo F. Albuquerque, Hendrio Marques, Isabela Silva, Celso Carvalho, Lucas C. Cordeiro, Vanessa Santos, and Ricardo Ferreira. Verifying CUDA programs using SMT-based context-bounded model checking. In SAC, pages 1648-1653, 2016. Google Scholar
  39. Hamid Sarbazi-Azad, editor. Advances in GPU Research and Practice. Morgan Kaufmann, 2017. Google Scholar
  40. Tyler Sorensen and Alastair F. Donaldson. Exposing errors related to weak memory models in GPU applications. In PLDI, pages 100-113, 2016. Google Scholar
  41. Tyler Sorensen and Alastair F. Donaldson. The hitchhiker’s guide to cross-platform OpenCL application development. In IWOCL, pages 2:1-2:12, 2016. Google Scholar
  42. Tyler Sorensen, Alastair F. Donaldson, Mark Batty, Ganesh Gopalakrishnan, and Zvonimir Rakamaric. Portable inter-workgroup barrier synchronisation for gpus. In OOPSLA, pages 39-58, 2016. Google Scholar
  43. Tyler Sorensen, Hugues Evrard, and Alastair F. Donaldson. Cooperative kernels: GPU multitasking for blocking algorithms. In ESEC/FSE, 2017. To appear. Google Scholar
  44. Stavros Tripakis, Christos Stergiou, and Roberto Lublinerman. Checking equivalence of SPMD programs using non-interference. In HotPar, pages 1-5, 2010. Google Scholar
  45. John Wickerson, Mark Batty, Bradford M. Beckmann, and Alastair F. Donaldson. Remote-scope promotion: clarified, rectified, and verified. In OOPSLA, pages 731-747, 2015. Google Scholar
  46. John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constantinides. Automatically comparing memory consistency models. In POPL, pages 190-204, 2017. Google Scholar
  47. Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in C compilers. In PLDI, pages 283-294, 2011. Google Scholar
  48. Mai Zheng, Vignesh T. Ravi, Feng Qin, and Gagan Agrawal. GRace: A low-overhead mechanism for detecting data races in GPU programs. In PPoPP, pages 135-146, 2011. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail