DARTS.5.1.4.pdf
- Filesize: 298 kB
- 3 pages
- MD5 Sum: 3978b2398eab0687e51009e681c0ada9
High-performance heterogeneous embedded platforms allow parallel workloads to be offloaded to an integrated accelerator, such as a General-Purpose Graphics Processing Unit (GP-GPU). A time-predictable characterization of task submission is essential in real-time applications. We provide a profiler that measures the time the CPU spends submitting a stereotypical GP-GPU workload, shaped as a Deep Neural Network of parameterized complexity. Submission is performed using the latest available APIs: NVIDIA CUDA, including its various submission techniques, and Vulkan. The tests on Jetson Xavier are fully automated by scripts that install the software dependencies, run the experiments, and collect the results in a PDF report.