Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions

Authors: Marco Procaccini, Amin Sahebi, Marco Barbone, Wayne Luk, Georgi Gaydadjiev, Roberto Giorgi


Author Details

Marco Procaccini
  • University of Siena, Italy
Amin Sahebi
  • University of Siena, Italy
Marco Barbone
  • Imperial College London, UK
Wayne Luk
  • Imperial College London, UK
Georgi Gaydadjiev
  • Delft University of Technology, The Netherlands
Roberto Giorgi
  • University of Siena, Italy

Cite As

Marco Procaccini, Amin Sahebi, Marco Barbone, Wayne Luk, Georgi Gaydadjiev, and Roberto Giorgi. Accelerating Large-Scale Graph Processing with FPGAs: Lesson Learned and Future Directions. In 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2024). Open Access Series in Informatics (OASIcs), Volume 116, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks to accelerate graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture combined with a partitioning system that improves data locality and reduces inter-partition communication. At the higher level, the framework uses Hadoop to map the graph to the hardware and to distribute pre-processed data efficiently to the FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers and relies on offline partitioning to distribute large-scale graphs. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns a value to each node in a graph based on its importance. For large-scale graphs, the single-FPGA solution outperformed the GPU solution, which was restricted by memory capacity, achieving a 26x speedup over the CPU compared to 12x for the GPU. Moreover, when a single FPGA device was limited by the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that exceed on-chip memory capacity.
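As an illustration of the PageRank computation mentioned in the abstract, the following is a minimal software sketch, not the FPGA design evaluated in the paper. It assumes a plain edge list, a damping factor of 0.85, a fixed number of iterations, and uniform redistribution of rank from nodes without outgoing edges; the function name `pagerank` and all parameters are hypothetical choices made for this example.

```python
from typing import List, Tuple


def pagerank(edges: List[Tuple[int, int]], num_nodes: int,
             damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Edge-centric PageRank by power iteration (illustrative sketch only)."""
    # Count outgoing edges per node; each node splits its rank evenly among them.
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    # Start from a uniform distribution.
    rank = [1.0 / num_nodes] * num_nodes

    for _ in range(iterations):
        # Assumption: rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in range(num_nodes) if out_degree[v] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes

        # Stream over the edge list: each edge pushes a share of src's rank to dst.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]

        rank = new_rank

    return rank


if __name__ == "__main__":
    # Tiny example graph: nodes 1 and 2 point to node 0, node 0 points to node 1.
    example_edges = [(1, 0), (2, 0), (0, 1)]
    for node, score in enumerate(pagerank(example_edges, num_nodes=3)):
        print(f"node {node}: {score:.4f}")
```

The inner loop streams over edges rather than over vertices. Under this formulation, splitting the edge list into blocks by source and destination range is one way a partitioning scheme could keep the working set of each block small, which is the kind of data locality the abstract refers to.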

Subject Classification

ACM Subject Classification
  • Hardware → Hardware accelerators

Keywords
  • Graph processing
  • Distributed computing
  • Grid partitioning
  • FPGA
  • Accelerators



