Search Results

Documents authored by Ansaloni, Giovanni


Document
Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures

Authors: Alireza Amirshahi, Giovanni Ansaloni, and David Atienza

Published in: OASIcs, Volume 116, 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)


Abstract
The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and accelerators tailored for transformer models, supporting their computation hotspots with high efficiency. However, memory bandwidth can hinder improvements in hardware accelerators. Against this backdrop, in this paper we propose a novel memory arrangement strategy, governed by the hardware accelerator’s kernel size, which effectively minimizes off-chip data access. This arrangement is particularly beneficial for end-to-end transformer model inference, where most of the computation is based on general matrix multiplication (GEMM) operations. Additionally, we address the overhead of non-GEMM operations in transformer models within the scope of this memory data arrangement. Our study explores the implementation and effectiveness of the proposed accelerator-driven data arrangement approach in both single- and multi-core systems. Our evaluation demonstrates that our approach can achieve up to a 2.7x speed increase when executing inferences employing state-of-the-art transformers.

Cite as

Alireza Amirshahi, Giovanni Ansaloni, and David Atienza. Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{amirshahi_et_al:OASIcs.PARMA-DITAM.2024.2,
  author =	{Amirshahi, Alireza and Ansaloni, Giovanni and Atienza, David},
  title =	{{Accelerator-Driven Data Arrangement to Minimize Transformers Run-Time on Multi-Core Architectures}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.2},
  URN =		{urn:nbn:de:0030-drops-196960},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.2},
  annote =	{Keywords: Memory arrangement, Data layout, Hardware accelerators, Transformer models, Multi-core, System simulation}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail