Analysis of GPU Memory Allocation Characteristics

Authors: Marcos Rodriguez, Irune Yarza, Leonidas Kosmidis, Alejandro J. Calderón



Author Details

Marcos Rodriguez
  • Ikerlan Technology Research Center, Mondragón, Spain
  • Universitat Politècnica de Catalunya, Barcelona, Spain
Irune Yarza
  • Ikerlan Technology Research Center, Mondragón, Spain
Leonidas Kosmidis
  • Barcelona Supercomputing Center (BSC), Spain
Alejandro J. Calderón
  • Ikerlan Technology Research Center, Mondragón, Spain

Cite As

Marcos Rodriguez, Irune Yarza, Leonidas Kosmidis, and Alejandro J. Calderón. Analysis of GPU Memory Allocation Characteristics. In 16th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 14th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2025). Open Access Series in Informatics (OASIcs), Volume 127, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/OASIcs.PARMA-DITAM.2025.1

Abstract

The number of applications subject to safety-critical regulations is on the rise, and consequently, the computing requirements for such applications are increasing as well. This trend has led to the integration of General-Purpose Graphics Processing Units (GPGPUs) into these systems. However, the inherent characteristics of GPGPUs, including their black-box nature, dynamic allocation mechanisms, and frequent use of pointers, present challenges in certifying these applications for safety-critical systems.
This paper aims to shed light on the unique characteristics of GPU programs and how they impact the certification process. To achieve this goal, several allocation methods are rigorously evaluated to determine which is best suited to a given application, based on its program characteristics, within the safety-critical domain.
By conducting this evaluation, we seek to provide insights into the complexities of GPU memory accesses and their compatibility with safety-critical requirements. The ultimate objective is to offer recommendations on the most appropriate allocation method based on the unique needs of each application, thus contributing to the safe and reliable integration of GPGPUs into safety-critical systems.
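
As a purely illustrative sketch (not taken from the paper or its artifact), the CUDA fragment below contrasts three allocation styles that an evaluation of this kind typically compares: explicit device allocation with cudaMalloc/cudaMemcpy, pinned host memory with cudaMallocHost, and unified (managed) memory with cudaMallocManaged. The kernel, buffer size, and launch configuration are placeholders, and error checking is omitted for brevity.

```cuda
// Illustrative sketch only (not from the paper): three CUDA allocation styles.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;                      // placeholder problem size
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // 1) Explicit device allocation: separate host buffer, explicit copies.
    float *h_buf = (float *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0f;
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d_buf, 2.0f, n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    free(h_buf);

    // 2) Pinned (page-locked) host allocation: the DMA engine can access the
    //    pages directly, so transfers are faster and more predictable.
    float *p_buf = NULL;
    cudaMallocHost((void **)&p_buf, bytes);
    // ... use p_buf as the source/destination of cudaMemcpy transfers ...
    cudaFreeHost(p_buf);

    // 3) Unified (managed) memory: a single pointer valid on host and device;
    //    the driver migrates pages on demand.
    float *u_buf = NULL;
    cudaMallocManaged((void **)&u_buf, bytes);
    for (size_t i = 0; i < n; ++i) u_buf[i] = 1.0f;
    scale<<<blocks, threads>>>(u_buf, 2.0f, n);
    cudaDeviceSynchronize();   // wait before the host touches u_buf again
    cudaFree(u_buf);

    printf("done\n");
    return 0;
}
```

From a certification standpoint, the explicit and pinned variants keep allocation sizes and transfer points visible in the source code, whereas managed memory delegates placement and migration to the driver, which is the kind of opaque, dynamic behaviour the abstract identifies as a challenge for safety-critical systems.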

Subject Classification

ACM Subject Classification
  • Computer systems organization → Parallel architectures
  • Software and its engineering → Real-time schedulability
  • Software and its engineering → Parallel programming languages
Keywords
  • CUDA
  • Memory allocation
  • Rodinia
  • Embedded
