Search Results

Documents authored by Cavicchioli, Roberto


Document
Artifact
API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms (Artifact)

Authors: Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, and Marko Bertogna

Published in: DARTS, Volume 5, Issue 1, Special Issue of the 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)


Abstract
High-performance heterogeneous embedded platforms allow offloading of parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). A time-predictable characterization of task submission is a must in real-time applications. We provide a profiler of the time spent by the CPU for submitting stereotypical GP-GPU workload shaped as a Deep Neural Network of parameterized complexity. The submission is performed using the latest API available: NVIDIA CUDA, including its various techniques, and Vulkan. Complete automation for the test on Jetson Xavier is also provided by scripts that install software dependencies, run the experiments, and collect results in a PDF report.

Cite as

Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, and Marko Bertogna. API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms (Artifact). In Special Issue of the 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Dagstuhl Artifacts Series (DARTS), Volume 5, Issue 1, pp. 4:1-4:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@Article{cavicchioli_et_al:DARTS.5.1.4,
  author =	{Cavicchioli, Roberto and Capodieci, Nicola and Solieri, Marco and Bertogna, Marko},
  title =	{{API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms}},
  pages =	{4:1--4:3},
  journal =	{Dagstuhl Artifacts Series},
  ISSN =	{2509-8195},
  year =	{2019},
  volume =	{5},
  number =	{1},
  editor =	{Cavicchioli, Roberto and Capodieci, Nicola and Solieri, Marco and Bertogna, Marko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DARTS.5.1.4},
  URN =		{urn:nbn:de:0030-drops-107322},
  doi =		{10.4230/DARTS.5.1.4},
  annote =	{Keywords: GPU, Applications, Heterogeneus systems}
}
Document
Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Authors: Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, and Marko Bertogna

Published in: LIPIcs, Volume 133, 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)


Abstract
There is an increasing industrial and academic interest towards a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU for submitting GP-GPU operations. We will show that the impact of CPU-to-GPU kernel submissions may be indeed relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows an improved control of GPU command submissions. We will show that this added control may significantly improve the application performance and predictability due to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous Real-Time systems. Our findings are evaluated on a latest generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity.

Cite as

Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, and Marko Bertogna. Novel Methodologies for Predictable CPU-To-GPU Command Offloading. In 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 133, pp. 22:1-22:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{cavicchioli_et_al:LIPIcs.ECRTS.2019.22,
  author =	{Cavicchioli, Roberto and Capodieci, Nicola and Solieri, Marco and Bertogna, Marko},
  title =	{{Novel Methodologies for Predictable CPU-To-GPU Command Offloading}},
  booktitle =	{31st Euromicro Conference on Real-Time Systems (ECRTS 2019)},
  pages =	{22:1--22:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-110-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{133},
  editor =	{Quinton, Sophie},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECRTS.2019.22},
  URN =		{urn:nbn:de:0030-drops-107595},
  doi =		{10.4230/LIPIcs.ECRTS.2019.22},
  annote =	{Keywords: Heterogeneous systems, GPU, CUDA, Vulkan}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail