Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades

Authors Corrie Jacobien Carstens, Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, Dorothea Wagner



PDF
Thumbnail PDF

File

LIPIcs.ESA.2018.11.pdf
  • Filesize: 0.95 MB
  • 16 pages

Document Identifiers

Author Details

Corrie Jacobien Carstens
  • University of Amsterdam, Netherlands
Michael Hamann
  • Karlsruhe Institute of Technology, Germany
Ulrich Meyer
  • Goethe University, Frankfurt, Germany
Manuel Penschuck
  • Goethe University, Frankfurt, Germany
Hung Tran
  • Goethe University, Frankfurt, Germany
Dorothea Wagner
  • Karlsruhe Institute of Technology, Germany

Cite AsGet BibTex

Corrie Jacobien Carstens, Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, and Dorothea Wagner. Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 11:1-11:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ESA.2018.11

Abstract

Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved [Mihail and Zegura, 2003]. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC [Carstens et al., 2016]. Since trades however are more expensive, we study Curveball's practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we investigate global trades [Carstens et al., 2016] processing every node in a single super step, and show that undirected global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES [M. Hamann et al., 2017] nearly one order of magnitude faster.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Random graphs
Keywords
  • Graph randomisation
  • Curveball
  • I/O-efficiency
  • Parallelism

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. A. Aggarwal, J. Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116-1127, 1988. URL: http://dx.doi.org/10.1145/48529.48535.
  2. L. Arge. The buffer tree: A new technique for optimal I/O-algorithms, pages 334-345. Springer Berlin Heidelberg, 1995. URL: http://dx.doi.org/10.1007/3-540-60220-8_74.
  3. A. Beckmann, R. Dementiev, and J. Singler. Building a parallel pipelined external memory algorithm library. In IPDPS'09, 2009. URL: http://dx.doi.org/10.1109/IPDPS.2009.5161001.
  4. C. J. Carstens. Proof of uniform sampling of binary matrices with fixed row sums and column sums for the fast curveball algorithm. Physical Review E, 91:042812, 2015. Google Scholar
  5. C. J. Carstens. Topology of Complex Networks: Models and Analysis. PhD thesis, RMIT University, January 2016. Google Scholar
  6. C. J. Carstens, A. Berger, and G. Strona. Curveball: a new generation of sampling algorithms for graphs with fixed degree sequence. CoRR, 2016. URL: http://arxiv.org/abs/1609.05137.
  7. C. J. Carstens, M. Hamann, U. Meyer, M. Penschuck, H. Tran, and D. Wagner. Parallel and I/O-efficient randomisation of massive networks using global curveball trades. CoRR, abs/1804.08487, 2018. Google Scholar
  8. G. W. Cobb and Y.-P. Chen. An application of markov chain monte carlo to community ecology. The American Mathematical Monthly, 110(4):265-288, 2003. Google Scholar
  9. R. Dementiev, L. Kettner, and P. Sanders. STXXL: standard template library for XXL data sets. Software: Practice and Experience, 38(6):589-637, 2008. URL: http://dx.doi.org/10.1002/spe.844.
  10. R. B. Eggleton and D. A. Holton. Simple and multigraphic realizations of degree sequences, pages 155-172. Springer Berlin Heidelberg, 1981. URL: http://dx.doi.org/10.1007/BFb0091817.
  11. P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 1959. Google Scholar
  12. C. Greenhill. A polynomial bound on the mixing time of a markov chain for sampling regular directed graphs. The Electronic Journal of Combinatorics, 18(1):P234, 2011. Google Scholar
  13. C. Greenhill. The switch markov chain for sampling irregular graphs: Extended abstract. In Proceedings of SODA '15, pages 1564-1572, 2015. Google Scholar
  14. M. Hamann, U. Meyer, M. Penschuck, H. Tran, and D. Wagner. I/O-efficient generation of massive graphs following the LFR benchmark. CoRR, 2017. URL: http://arxiv.org/abs/1604.08738.
  15. M. Hamann, U. Meyer, M. Penschuck, and D. Wagner. I/O-efficient generation of massive graphs following the LFR benchmark. In ALENEX, 2017. URL: http://dx.doi.org/10.1137/1.9781611974768.
  16. F. Iorio, M. Bernardo-Faura, A. Gobbi, T. Cokelaer, G. Jurman, and J. Saez-Rodriguez. Efficient randomization of biological networks while preserving functional characterization of individual nodes. BMC bioinformatics, 17(1):542, 2016. Google Scholar
  17. S. Itzkovitz, R. Milo, N. Kashtan, G. Ziv, and U. Alon. Subgraphs in random networks. Physical review E, 68:026127, Aug 2003. URL: http://dx.doi.org/10.1103/PhysRevE.68.026127.
  18. A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E, 80:016118, Jul 2009. URL: http://dx.doi.org/10.1103/PhysRevE.80.016118.
  19. A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78:046110, 2008. URL: http://dx.doi.org/10.1103/PhysRevE.78.046110.
  20. D. A. Levin, Y. Peres, and E. L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, Rhode Island, 2009. Google Scholar
  21. A. Maheshwari and N. Zeh. A Survey of Techniques for Designing I/O-Efficient Algorithms, pages 36-61. Springer Berlin Heidelberg, 2003. Google Scholar
  22. U. Meyer, P. Sanders, and J. Sibeyn. Algorithms for Memory Hierarchies: Advanced Lectures. Springer Berlin Heidelberg, 2003. URL: http://dx.doi.org/10.1007/3-540-36574-5.
  23. C. G. M. Mihail and E. Zegura. The markov chain simulation method for generating connected power law random graphs. In Proceedings of ALENEX '03. SIAM, 2003. Google Scholar
  24. R. Milo, N. Kashtan, S. Itzkovitz, M. E. J. Newman, and U. Alon. On the uniform generation of random graphs with prescribed degree sequences. CoRR, 2003. URL: http://arxiv.org/abs/cond-mat/0312028.
  25. M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Struct. Algorithms, 6(2/3):161-179, 1995. Google Scholar
  26. M. E. J. Newman. The Structure and Function of Complex Networks. SIAM Review, 45(2):167-256, 2003. URL: http://dx.doi.org/10.1137/S003614450342480.
  27. M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64:026118, Jul 2001. URL: http://dx.doi.org/10.1103/PhysRevE.64.026118.
  28. R. Pagh. Basic external memory data structures, pages 36-61. Springer Berlin Heidelberg, 2003. Google Scholar
  29. J. Ray, A. Pinar, and C. Seshadhri. Are We There Yet? When to Stop a Markov Chain while Generating Random Graphs, pages 153-164. Springer Berlin Heidelberg, 2012. URL: http://dx.doi.org/10.1007/978-3-642-30541-2_12.
  30. J. Ray, A. Pinar, and C. Seshadhri. A stopping criterion for markov chains when generating independent random graphs. J. of Compl. Net., 3(2), 2015. URL: http://dx.doi.org/10.1093/comnet/cnu041.
  31. W. E. Schlauch, E. Á. Horvát, and K. A. Zweig. Different flavors of randomness: comparing random graph models with fixed degree sequences. Social Network Analysis and Mining, 5(1):1-14, 2015. URL: http://dx.doi.org/10.1007/s13278-015-0267-z.
  32. W. E. Schlauch and K. A. Zweig. Influence of the null-model on motif detection. In ASONAM'15, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2808797.2809400.
  33. C. L. Staudt, A. Sazonovs, and H. Meyerhenke. NetworKit: A tool suite for large-scale complex network analysis. Network Science, 4(04), 2016. URL: http://dx.doi.org/10.1017/nws.2016.20.
  34. S. H. Strogatz. Exploring complex networks. Nature, 410(6825):268, 2001. Google Scholar
  35. G. Strona, D. Nappo, F. Boccacci, S. Fattorini, and J. San-Miguel-Ayanz. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications, 5:4114-, 2014. URL: http://dx.doi.org/10.1038/ncomms5114.
  36. N. D. Verhelst. An efficient MCMC algorithm to sample binary matrices with fixed marginals. Psychometrika, 73(4):705-728, 2008. Google Scholar
  37. F. Viger and M. Latapy. Fast generation of random connected graphs with prescribed degrees. CoRR, feb 2005. Source code available at https://www-complexnetworks.lip6.fr/~latapy/FV/generation.html. URL: http://arxiv.org/abs/cs/0502085.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail