Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions

Procaccini, Marco; Sahebi, Amin; Barbone, Marco; Luk, Wayne; Gaydadjiev, Georgi; Giorgi, Roberto

doi:10.4230/OASIcs.PARMA-DITAM.2024.6

File

OASIcs.PARMA-DITAM.2024.6.pdf

Filesize: 0.89 MB
12 pages

Document Identifiers

DOI: 10.4230/OASIcs.PARMA-DITAM.2024.6
URN: urn:nbn:de:0030-drops-197003

Author Details

Marco Procaccini

University of Siena, Italy

Amin Sahebi

University of Siena, Italy

Marco Barbone

Imperial College London, UK

Wayne Luk

Imperial College London, UK

Georgi Gaydadjiev

Delft University of Technology, The Netherlands

Roberto Giorgi

University of Siena, Italy

Cite As

Marco Procaccini, Amin Sahebi, Marco Barbone, Wayne Luk, Georgi Gaydadjiev, and Roberto Giorgi. Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/OASIcs.PARMA-DITAM.2024.6

@InProceedings{procaccini_et_al:OASIcs.PARMA-DITAM.2024.6,
  author =	{Procaccini, Marco and Sahebi, Amin and Barbone, Marco and Luk, Wayne and Gaydadjiev, Georgi and Giorgi, Roberto},
  title =	{{Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions}},
  booktitle =	{15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024)},
  pages =	{6:1--6:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-307-2},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{116},
  editor =	{Bispo, Jo\~{a}o and Xydis, Sotirios and Curzel, Serena and Sousa, Lu{\'\i}s Miguel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.PARMA-DITAM.2024.6},
  URN =		{urn:nbn:de:0030-drops-197003},
  doi =		{10.4230/OASIcs.PARMA-DITAM.2024.6},
  annote =	{Keywords: Graph processing, Distributed computing, Grid partitioning, FPGA, Accelerators}
}

Abstract

Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks for accelerated graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture in combination with a partitioning system to improve data locality and reduce inter-partition communication. Utilizing Hadoop at a higher level, the framework maps the graph to the hardware, efficiently distributing pre-processed data to FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers, using offline partitioning for large-scale graph distribution. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns a value to each node in a graph based on its importance. For large-scale graphs, the single-FPGA solution outperformed the GPU solution, which was restricted by memory capacity, reaching a speedup over the CPU of 26x compared to 12x for the GPU. Moreover, when a single FPGA device was limited by the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that surpass on-chip memory restrictions.
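
As an illustration of the two ideas named in the abstract, grid partitioning and PageRank, the following is a minimal software sketch in Python. It is not the authors' FPGA or Hadoop implementation. The function names, the grid size `p`, the damping factor of 0.85, the iteration count, and the toy edge list are all assumptions chosen for the example. Edges are bucketed into a p x p grid by source and destination vertex chunk, and each PageRank iteration then processes one block at a time, so that a block only touches one chunk of source ranks and one chunk of destination ranks.

```python
def grid_partition(edges, num_vertices, p):
    """Bucket edges into a p x p grid keyed by (source chunk, destination chunk)."""
    chunk = -(-num_vertices // p)  # ceiling division: vertices per chunk
    blocks = {}
    for src, dst in edges:
        blocks.setdefault((src // chunk, dst // chunk), []).append((src, dst))
    return blocks


def pagerank(edges, num_vertices, p=2, damping=0.85, iterations=50):
    """Power-iteration PageRank that walks the edges one grid block at a time."""
    blocks = grid_partition(edges, num_vertices, p)
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread uniformly.
        dangling = sum(r for r, d in zip(rank, out_degree) if d == 0)
        base = (1.0 - damping + damping * dangling) / num_vertices
        new_rank = [base] * num_vertices
        # Each block reads one source chunk and updates one destination chunk.
        for key in sorted(blocks):
            for src, dst in blocks[key]:
                new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Hypothetical toy graph: a 3-cycle (0 -> 1 -> 2 -> 0) plus an edge 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
blocks = grid_partition(edges, num_vertices=4, p=2)
ranks = pagerank(edges, num_vertices=4)
```

In this toy graph, vertex 0 receives rank from two neighbours and ends up with the highest score, while vertex 3 has no incoming edges and keeps only the uniform base share. In a distributed setting, the blocks would be the unit of work handed to individual devices; how the paper's framework schedules them is not shown here.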

Subject Classification

ACM Subject Classification

Hardware → Hardware accelerators

Keywords

Graph processing
Distributed computing
Grid partitioning
FPGA
Accelerators
