Speeding Up BigClam Implementation on SNAP

Authors C. H. Bryan Liu, Benjamin Paul Chamberlain



PDF
Thumbnail PDF

File

OASIcs.ICCSW.2018.1.pdf
  • Filesize: 0.68 MB
  • 13 pages

Document Identifiers

Author Details

C. H. Bryan Liu
  • Department of Computing, Imperial College London, United Kingdom
Benjamin Paul Chamberlain
  • Department of Computing, Imperial College London, United Kingdom

Cite AsGet BibTex

C. H. Bryan Liu and Benjamin Paul Chamberlain. Speeding Up BigClam Implementation on SNAP. In 2018 Imperial College Computing Student Workshop (ICCSW 2018). Open Access Series in Informatics (OASIcs), Volume 66, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.ICCSW.2018.1

Abstract

We perform a detailed analysis of the C++ implementation of the Cluster Affiliation Model for Big Networks (BigClam) on the Stanford Network Analysis Project (SNAP). BigClam is a popular graph mining algorithm that is capable of finding overlapping communities in networks containing millions of nodes. Our analysis shows a key stage of the algorithm - determining if a node belongs to a community - dominates the runtime of the implementation, yet the computation is not parallelized. We show that by parallelizing computations across multiple threads using OpenMP we can speed up the algorithm by 5.3 times when solving large networks for communities, while preserving the integrity of the program and the result.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Concurrent algorithms
Keywords
  • BigClam
  • Community Detection
  • Parallelization
  • Networks

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Reid Andersen, Fan Chung, and Kevin Lang. Local Graph Partitioning Using PageRank Vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS '06, pages 475-486, Washington, DC, USA, 2006. IEEE Computer Society. URL: http://dx.doi.org/10.1109/FOCS.2006.44.
  2. Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008. Google Scholar
  3. T. S. Evans and R. Lambiotte. Line graphs of weighted networks for overlapping communities. The European Physical Journal B, 77(2):265-272, September 2010. URL: http://dx.doi.org/10.1140/epjb/e2010-00261-8.
  4. Santo Fortunato. Community Detection in Graphs. Physics Reports, 486(3):75-174, 2010. URL: http://dx.doi.org/10.1016/j.physrep.2009.11.002.
  5. David F. Gleich and C. Seshadhri. Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, pages 597-605, New York, NY, USA, 2012. ACM. URL: http://dx.doi.org/10.1145/2339530.2339628.
  6. Patrik O Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5(Nov):1457-1469, 2004. Google Scholar
  7. Ravi Kannan, Santosh Vempala, and Adrian Vetta. On Clusterings: Good, Bad and Spectral. J. ACM, 51(3):497-515, May 2004. URL: http://dx.doi.org/10.1145/990308.990313.
  8. Zhana Kuncheva and Giovanni Montana. Community Detection in Multiplex Networks Using Locally Adaptive Random Walks. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, ASONAM '15, pages 1308-1315, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2808797.2808852.
  9. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  10. Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, New York, NY, USA, 2nd edition, 2014. Google Scholar
  11. Jure Leskovec and Rok Sosič. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST), 8(1):1, 2016. Google Scholar
  12. C. H. Bryan Liu. On overlapping community-based networks: generation, detection, and their applications. Master’s thesis, Imperial College London, London, United Kingdom, June 2016. Google Scholar
  13. M. E. J. Newman. Detecting community structure in networks. The European Physical Journal B, 38(2):321-330, March 2004. URL: http://dx.doi.org/10.1140/epjb/e2004-00124-y.
  14. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, February 2004. URL: http://dx.doi.org/10.1103/PhysRevE.69.026113.
  15. OpenMP Architecture Review Board. OpenMP application program interface version 4.5, 2015. URL: http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf.
  16. Pascal Pons and Matthieu Latapy. Computing Communities in Large Networks Using Random Walks. In Proceedings of the 20th International Conference on Computer and Information Sciences, ISCIS'05, pages 284-293, Berlin, Heidelberg, 2005. Springer-Verlag. URL: http://dx.doi.org/10.1007/11569596_31.
  17. Alex Pothen. Graph Partitioning Algorithms with Applications to Scientific Computing. In David E. Keyes, Ahmed Sameh, and V. Venkatakrishnan, editors, Parallel Numerical Algorithms, pages 323-368, Dordrecht, 1997. Springer Netherlands. URL: http://dx.doi.org/10.1007/978-94-011-5412-3_12.
  18. Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E, 83:066114, June 2011. URL: http://dx.doi.org/10.1103/PhysRevE.83.066114.
  19. Rik Sarkar. Community detection and cascades [Lecture]. http://www.inf.ed.ac.uk/teaching/courses/stn/files1516/slides/community-continued.pdf, 2015. Lecture slides from the course Social and Technological Networks (Autumn 2015), University of Edinburgh.
  20. Meng Wang, Chaokun Wang, Jeffrey Xu Yu, and Jun Zhang. Community Detection in Social Networks: An In-depth Benchmarking Study with a Procedure-oriented Framework. Proc. VLDB Endow., 8(10):998-1009, June 2015. URL: http://dx.doi.org/10.14778/2794367.2794370.
  21. Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440-442, 1998. Google Scholar
  22. Joyce Jiyoung Whang, David F Gleich, and Inderjit S Dhillon. Overlapping community detection using seed set expansion. In Proceedings of the 22nd ACM international conference on Conference on information &knowledge management, pages 2099-2108. ACM, 2013. Google Scholar
  23. Yingjun Wu, Joy Arulraj, Jiexi Lin, Ran Xian, and Andrew Pavlo. An Empirical Evaluation of In-memory Multi-version Concurrency Control. Proc. VLDB Endow., 10(7):781-792, March 2017. URL: http://dx.doi.org/10.14778/3067421.3067427.
  24. Reynold S. Xin, Joseph E. Gonzalez, Michael J. Franklin, and Ion Stoica. GraphX: A resilient distributed graph system on spark. In First International Workshop on Graph Data Management Experiences and Systems, GRADES '13, pages 2:1-2:6, New York, NY, USA, 2013. ACM. URL: http://dx.doi.org/10.1145/2484425.2484427.
  25. J. Yang and J. Leskovec. Community-Affiliation Graph Model for Overlapping Network Community Detection. In 2012 IEEE 12th International Conference on Data Mining, pages 1170-1175, December 2012. URL: http://dx.doi.org/10.1109/ICDM.2012.139.
  26. Jaewon Yang and Jure Leskovec. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM '13, pages 587-596, New York, NY, USA, 2013. ACM. URL: http://dx.doi.org/10.1145/2433396.2433471.
  27. Hongyi Zhang, Irwin King, and Michael R. Lyu. Incorporating Implicit Link Preference into Overlapping Community Detection. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pages 396-402. AAAI Press, 2015. URL: http://dl.acm.org/citation.cfm?id=2887007.2887063.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail