CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure

Authors Soukaina Firmli, Vasileios Trigonakis, Jean-Pierre Lozi, Iraklis Psaroudakis, Alexander Weld, Dalila Chiadmi, Sungpack Hong, Hassan Chafi



PDF
Thumbnail PDF

File

LIPIcs.OPODIS.2020.17.pdf
  • Filesize: 0.59 MB
  • 16 pages

Document Identifiers

Author Details

Soukaina Firmli
  • Mohammed V University in Rabat, Ecole Mohammadia d'Ingénieurs, SIP Research Team, Morocco
  • Oracle Labs, Casablanca, Morocco
Vasileios Trigonakis
  • Oracle Labs, Zürich, Switzerland
Jean-Pierre Lozi
  • Oracle Labs, Zürich, Switzerland
Iraklis Psaroudakis
  • Oracle Labs, Zürich, Switzerland
Alexander Weld
  • Oracle Labs, Zürich, Switzerland
Dalila Chiadmi
  • Mohammed V University in Rabat, Ecole Mohammadia d'Ingénieurs, SIP Research Team, Morocco
Sungpack Hong
  • Oracle Labs, Palo Alto, CA, USA
Hassan Chafi
  • Oracle Labs, Palo Alto, CA, USA

Cite AsGet BibTex

Soukaina Firmli, Vasileios Trigonakis, Jean-Pierre Lozi, Iraklis Psaroudakis, Alexander Weld, Dalila Chiadmi, Sungpack Hong, and Hassan Chafi. CSR++: A Fast, Scalable, Update-Friendly Graph Data Structure. In 24th International Conference on Principles of Distributed Systems (OPODIS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 184, pp. 17:1-17:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.OPODIS.2020.17

Abstract

The graph model enables a broad range of analysis, thus graph processing is an invaluable tool in data analytics. At the heart of every graph-processing system lies a concurrent graph data structure storing the graph. Such a data structure needs to be highly efficient for both graph algorithms and queries. Due to the continuous evolution, the sparsity, and the scale-free nature of real-world graphs, graph-processing systems face the challenge of providing an appropriate graph data structure that enables both fast analytical workloads and low-memory graph mutations. Existing graph structures offer a hard trade-off between read-only performance, update friendliness, and memory consumption upon updates. In this paper, we introduce CSR++, a new graph data structure that removes these trade-offs and enables both fast read-only analytics and quick and memory-friendly mutations. CSR++ combines ideas from CSR, the fastest read-only data structure, and adjacency lists to achieve the best of both worlds. We compare CSR++ to CSR, adjacency lists from the Boost Graph Library, and LLAMA, a state-of-the-art update-friendly graph structure. In our evaluation, which is based on popular graph-processing algorithms executed over real-world graphs, we show that CSR++ remains close to CSR in read-only concurrent performance (within 10% on average), while significantly outperforming CSR (by an order of magnitude) and LLAMA (by almost 2×) with frequent updates.

Subject Classification

ACM Subject Classification
  • Information systems → Data structures
  • Theory of computation → Concurrency
  • Theory of computation → Graph algorithms analysis
  • Computing methodologies → Concurrent algorithms
Keywords
  • Data Structures
  • Concurrency
  • Graph Processing
  • Graph Mutations

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Boost Adjacency-List Documentation. URL: https://www.boost.org/doc/libs/1_67_0//libs/graph/doc/adjacency_list.html.
  2. Green-Marl Code. URL: https://github.com/stanford-ppl/Green-Marl.
  3. LLAMA Code. URL: https://github.com/goatdb/llama.
  4. OpenMP. URL: https://www.openmp.org.
  5. PGQL: Property Graph Query Language. URL: http://pgql-lang.org/.
  6. Property Graph Model. URL: https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model.
  7. SNAP (2014). Stanford Network Analysis Platform. URL: http://snap.stanford.edu/snap.
  8. SPARQL Query Language For RDF. URL: http://www.w3.org/TR/rdf-sparql-query/.
  9. Tinkerpop, Gremlin. URL: https://github.com/tinkerpop/gremlin/wiki.
  10. Tim Berners-Lee, James Hendler, Ora Lassila, et al. The Semantic Web. Scientific american, 284(5), 2001. Google Scholar
  11. Raymond Cheng, Enhong Chen, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, and Feng Zhao. Kineograph: Taking The Pulse Of A Fast-changing And Connected World. In EuroSys, 2012. Google Scholar
  12. Laxman Dhulipala, Guy Blelloch, and Julian Shun. Julienne: A Framework For Parallel Graph Algorithms Using Work-efficient Bucketing. In SPAA, 2017. Google Scholar
  13. Vinicius Dias, Carlos H. C. Teixeira, Dorgival Guedes, Wagner Meira, and Srinivasan Parthasarathy. Fractal: A General-Purpose Graph Pattern Mining System. In SIGMOD, 2019. Google Scholar
  14. David Ediger, Jason Riedy, David A. Bader, and Henning Meyerhenke. Tracking Structure of Streaming Social Networks. In IPDPSW, 2011. Google Scholar
  15. Soukaina Firmli and Dalila Chiadmi. A Review Of Engines For Graph Storage And Mutations. In Innovation In Information Systems And Technologies To Support Learning Research, 2020. Google Scholar
  16. Gartner. Gartner Top 10 Data And Analytics Trends For 2019. URL: https://www.gartner.com/smarterwithgartner/gartner-top-10-data-analytics-trends/.
  17. Michael Haubenschild, Manuel Then, Sungpack Hong, and Hassan Chafi. ASGraph: A Mutable Multi-versioned Graph Container With High Analytical Performance. In GRADES, 2016. Google Scholar
  18. S. Hong, S. Depner, T. Manhardt, J. Van Der Lugt, M. Verstraaten, and H. Chafi. PGX.D: A Fast Distributed Graph Processing Engine. In SC, 2015. Google Scholar
  19. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. Green-Marl: A DSL For Easy And Efficient Graph Analysis. In ASPLOS, 2012. Google Scholar
  20. Nikolaos D. Kallimanis and Eleni Kanellou. Wait-Free Concurrent Graph Objects With Dynamic Traversals. In OPODIS, 2016. Google Scholar
  21. Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, and Semih Salihoglu. Graphflow: An Active Graph Database. In SIGMOD, 2017. Google Scholar
  22. Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In OSDI, 2012. Google Scholar
  23. P. Macko, V. J. Marathe, D. W. Margo, and M. I. Seltzer. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. In ICDE, 2015. Google Scholar
  24. K. Madduri and D.A. Bader. Compact Graph Representations And Parallel Connectivity Algorithms For Massive Dynamic Network Analysis. In IPDPS, 2009. Google Scholar
  25. Mugilan Mariappan and Keval Vora. GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs. In EuroSys, 2019. Google Scholar
  26. Daniel Mawhirter and Bo Wu. AutoMine: Harmonizing High-level Abstraction And High Performance For Graph Mining. In SOSP, 2019. Google Scholar
  27. Neo4j. Neo4j Graph Database. URL: http://www.neo4j.org.
  28. Oracle. Parallel Graph Analytics (PGX). URL: https://www.oracle.com/middleware/technologies/parallel-graph-analytix.html.
  29. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The Pagerank Citation Ranking: Bringing Order To The Web. Technical report, Stanford InfoLab, 1999. Google Scholar
  30. Marcus Paradies, Wolfgang Lehner, and Christof Bornhövd. GRAPHITE: An Extensible Graph Traversal Framework For Relational Database Management Systems. In SSDBM, 2015. Google Scholar
  31. Raghavan Raman, Oskar van Rest, Sungpack Hong, Zhe Wu, Hassan Chafi, and Jay Banerjee. PGX.ISO: Parallel And Efficient In-memory Engine For Subgraph Isomorphism. In GRADES, 2014. Google Scholar
  32. Nicholas P. Roth, Vasileios Trigonakis, Sungpack Hong, Hassan Chafi, Anthony Potter, Boris Motik, and Ian Horrocks. PGX.D/Async: A Scalable Distributed Graph Pattern Matching Engine. In GRADES, 2017. Google Scholar
  33. Sherif Sakr, Sameh Elnikety, and Yuxiong He. G-SPARQL: A Hybrid Engine For Querying Large Attributed Graphs. In ACM CIKM, 2012. Google Scholar
  34. Martin Sevenich, Sungpack Hong, Oskar van Rest, Zhe Wu, Jayanta Banerjee, and Hassan Chafi. Using Domain-specific Languages For Analytic Graph Databases. PVLDB, 9(13):1257-1268, September 2016. Google Scholar
  35. Julian Shun and Guy E. Blelloch. Ligra: A Lightweight Graph Processing Framework For Shared Memory. In PPoPP, 2013. Google Scholar
  36. Christian L. Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. NetworKit: A Tool Suite For Large-Scale Complex Network Analysis. Network Science, 4(4):508–530, 2016. Google Scholar
  37. Wen Sun, Achille Fokoue, Kavitha Srinivas, Anastasios Kementsietsidis, Gang Hu, and Guo Tong Xie. SQLGraph: An Efficient Relational-Based Property Graph Store. In SIGMOD, 2015. Google Scholar
  38. Oskar van Rest, Sungpack Hong, Jinha Kim, Xuming Meng, and Hassan Chafi. PGQL: A Property Graph Query Language. In GRADES, 2016. Google Scholar
  39. Brian Wheatman and Helen Xu. Packed Compressed Sparse Row: A Dynamic Graph Representation. In HPEC, 2018. Google Scholar
  40. Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. A Distributed Graph Engine For Web Scale RDF Data. PVLDB, 6(4), 2013. Google Scholar
  41. Kaiyuan Zhang, Rong Chen, and Haibo Chen. NUMA-Aware Graph-Structured Analytics. In PPoPP, 2015. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail