CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

Firmli, Soukaina; Trigonakis, Vasileios; Lozi, Jean-Pierre; Psaroudakis, Iraklis; Weld, Alexander; Chiadmi, Dalila; Hong, Sungpack; Chafi, Hassan

doi:10.4230/LIPIcs.OPODIS.2020.17

Abstract

The graph model enables a broad range of analyses, which makes graph processing an invaluable tool in data analytics. At the heart of every graph-processing system lies a concurrent graph data structure that stores the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Because real-world graphs evolve continuously, are sparse, and are scale-free, graph-processing systems face the challenge of providing a graph data structure that supports both fast analytical workloads and graph mutations with low memory overhead. Existing graph structures force a hard trade-off between read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these trade-offs and enables both fast read-only analytics and quick, memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and LLAMA, a state-of-the-art update-friendly graph structure. In our evaluation, which is based on popular graph-processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) under frequent updates.
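
The trade-off described in the abstract can be illustrated with a small sketch. The code below is not the CSR++ layout from the paper; it only contrasts a textbook CSR with a hypothetical structure in which each vertex owns its own edge array, which is one plausible reading of "combining ideas from CSR and adjacency lists". The class names CSR and PerVertexArrays and their methods are assumptions made for illustration.

```python
class CSR:
    """Textbook Compressed Sparse Row: two flat arrays, compact and fast to scan."""

    def __init__(self, num_vertices, edges):
        # offsets[v]..offsets[v + 1] delimits the neighbors of v in targets.
        self.offsets = [0] * (num_vertices + 1)
        for src, _ in edges:
            self.offsets[src + 1] += 1
        for v in range(num_vertices):
            self.offsets[v + 1] += self.offsets[v]
        self.targets = [0] * len(edges)
        cursor = self.offsets[:-1].copy()  # next free slot for each vertex
        for src, dst in edges:
            self.targets[cursor[src]] = dst
            cursor[src] += 1

    def neighbors(self, v):
        return self.targets[self.offsets[v]:self.offsets[v + 1]]

    def add_edge(self, src, dst):
        # An insertion shifts every later edge and bumps every later offset,
        # so a single update costs O(V + E).
        self.targets.insert(self.offsets[src + 1], dst)
        for v in range(src + 1, len(self.offsets)):
            self.offsets[v] += 1


class PerVertexArrays:
    """Hypothetical hybrid (an assumption, not the paper's design):
    a contiguous vertex array whose entries own separate edge arrays."""

    def __init__(self, num_vertices, edges):
        self.edges = [[] for _ in range(num_vertices)]
        for src, dst in edges:
            self.edges[src].append(dst)

    def neighbors(self, v):
        return self.edges[v]

    def add_edge(self, src, dst):
        # Only the source vertex's edge array is touched.
        self.edges[src].append(dst)


edge_list = [(0, 1), (0, 2), (1, 2), (2, 0)]
csr = CSR(3, edge_list)
hybrid = PerVertexArrays(3, edge_list)
csr.add_edge(1, 0)
hybrid.add_edge(1, 0)
```

Both structures return the same neighbor lists after the update, but the CSR insertion rewrites the shared arrays while the per-vertex version modifies a single small array. That difference is the kind of update cost the abstract refers to; the actual CSR++ design and its measured behavior are described in the paper itself.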

