Analysis of GPU Memory Allocation Characteristics

Authors: Marcos Rodriguez, Irune Yarza, Leonidas Kosmidis, Alejandro J. Calderón



Author Details

Marcos Rodriguez
  • Ikerlan Technology Research Center, Mondragón, Spain
  • Universitat Politècnica de Catalunya, Barcelona, Spain
Irune Yarza
  • Ikerlan Technology Research Center, Mondragón, Spain
Leonidas Kosmidis
  • Barcelona Supercomputing Center (BSC), Spain
Alejandro J. Calderón
  • Ikerlan Technology Research Center, Mondragón, Spain

Cite As

Marcos Rodriguez, Irune Yarza, Leonidas Kosmidis, and Alejandro J. Calderón. Analysis of GPU Memory Allocation Characteristics. In 16th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 14th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2025). Open Access Series in Informatics (OASIcs), Volume 127, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/OASIcs.PARMA-DITAM.2025.1

Abstract

The number of applications subject to safety-critical regulations is on the rise, and consequently, the computing requirements for such applications are increasing as well. This trend has led to the integration of General-Purpose Graphics Processing Units (GPGPUs) into these systems. However, the inherent characteristics of GPGPUs, including their black-box nature, dynamic allocation mechanisms, and frequent use of pointers, present challenges in certifying these applications for safety-critical systems.
This paper aims to shed light on the unique characteristics of GPU programs and how they impact the certification process. To achieve this goal, several allocation methods are rigorously evaluated to determine which is best suited to a given application, based on its program characteristics, within the safety-critical domain.
By conducting this evaluation, we seek to provide insights into the complexities of GPU memory accesses and their compatibility with safety-critical requirements. The ultimate objective is to offer recommendations on the most appropriate allocation method based on the unique needs of each application, thus contributing to the safe and reliable integration of GPGPUs into safety-critical systems.
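
As a purely illustrative sketch (not taken from the paper or its artifact), the CUDA fragment below contrasts three allocation styles that an evaluation of this kind typically compares: explicit device allocation with cudaMalloc/cudaMemcpy, pinned host memory with cudaMallocHost, and unified (managed) memory with cudaMallocManaged. The kernel, buffer size, and launch configuration are placeholders, and error checking is omitted for brevity.

```cuda
// Illustrative sketch only (not from the paper): three CUDA allocation styles.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;                      // placeholder problem size
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // 1) Explicit device allocation: separate host buffer, explicit copies.
    float *h_buf = (float *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0f;
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d_buf, 2.0f, n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    free(h_buf);

    // 2) Pinned (page-locked) host allocation: the DMA engine can access the
    //    pages directly, so transfers are faster and more predictable.
    float *p_buf = NULL;
    cudaMallocHost((void **)&p_buf, bytes);
    // ... use p_buf as the source/destination of cudaMemcpy transfers ...
    cudaFreeHost(p_buf);

    // 3) Unified (managed) memory: a single pointer valid on host and device;
    //    the driver migrates pages on demand.
    float *u_buf = NULL;
    cudaMallocManaged((void **)&u_buf, bytes);
    for (size_t i = 0; i < n; ++i) u_buf[i] = 1.0f;
    scale<<<blocks, threads>>>(u_buf, 2.0f, n);
    cudaDeviceSynchronize();   // wait before the host touches u_buf again
    cudaFree(u_buf);

    printf("done\n");
    return 0;
}
```

From a certification standpoint, the explicit and pinned variants keep allocation sizes and transfer points visible in the source code, whereas managed memory delegates placement and migration to the driver, which is the kind of opaque, dynamic behaviour the abstract identifies as a challenge for safety-critical systems.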

Subject Classification

ACM Subject Classification
  • Computer systems organization → Parallel architectures
  • Software and its engineering → Real-time schedulability
  • Software and its engineering → Parallel programming languages
Keywords
  • CUDA
  • Memory allocation
  • Rodinia
  • Embedded
