The use of heterogeneous systems has become widespread and popular in the past decade with more than one type of processor, such as CPUs, GPUs (Graphics Processing Units), and FPGAs (Field Programmable Gate Arrays) etc. A wide range of applications use both CPU and GPU to leverage the benefits of their unique features and strengths. Therefore, collaborative computation between CPU and GPU is essential to achieve high program performance. However, poorly placed global synchronization barriers and synchronous memory transfers are the main bottlenecks to enhanced program performance, preventing CPU and GPU computations from overlapping. Based on this observation, we propose a new optimization technique called hetero-sync motion that can relocate such barrier instructions to new locations, resulting in improved performance in CPU-GPU heterogeneous programs. Further, we propose GSOHC, a compiler analysis and optimization framework that automatically finds opportunities for hetero-sync motion in the input program and then performs code transformation to apply the optimization. Our static analysis is a context-sensitive, flow-sensitive inter-procedural data-flow analysis with three phases to identify the optimization opportunities precisely. We have implemented GSOHC using LLVM/Clang infrastructure. On A4000, P100 and A100 GPUs, our optimization achieves up to 1.8x, up to 1.9x and up to 1.9x speedups over baseline, respectively.
@Article{kumarbasu_et_al:DARTS.11.2.17, author = {Kumar Basu, Soumik and Vedurada, Jyothi}, title = {{GSOHC: Global Synchronization Optimization for Heterogeneous Computing (Artifact)}}, pages = {17:1--17:7}, journal = {Dagstuhl Artifacts Series}, ISSN = {2509-8195}, year = {2025}, volume = {11}, number = {2}, editor = {Kumar Basu, Soumik and Vedurada, Jyothi}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/DARTS.11.2.17}, URN = {urn:nbn:de:0030-drops-233603}, doi = {10.4230/DARTS.11.2.17}, annote = {Keywords: Static Analysis, Synchronization, CPU-GPU, Heterogeneous Computing, Parallelization} }
3ff143a2c39552543adbc232bb0dfad2
(Get MD5 Sum)
The artifact has been evaluated as described in the ECOOP 2025 Call for Artifacts and the ACM Artifact Review and Badging Policy.
Feedback for Dagstuhl Publishing