Global and Local Information in Clustering Labeled Block Models

Authors Varun Kanade, Elchanan Mossel, Tselil Schramm



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2014.779.pdf
  • Filesize: 0.59 MB
  • 14 pages

Document Identifiers

Author Details

Varun Kanade
Elchanan Mossel
Tselil Schramm

Cite AsGet BibTex

Varun Kanade, Elchanan Mossel, and Tselil Schramm. Global and Local Information in Clustering Labeled Block Models. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 28, pp. 779-792, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.779

Abstract

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e. p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulie and Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
Keywords
  • stochastic block models
  • information flow on trees

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Armen E. Allahverdyan, Greg Ver Steeg, and Aram Galstyan. Community detection with and without prior information. Europhysics Letters, 90:18002, 2010. Google Scholar
  2. Sugato Basu, Arindam Banerjee, and Raymond J Mooney. Semi-supervised clustering by seeding. In ICML, volume 2, pages 27-34, 2002. Google Scholar
  3. Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. A probabilistic framework for semi-supervised clustering. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 59-68. ACM, 2004. Google Scholar
  4. P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Science, 106(50):21068-21073, 2009. Google Scholar
  5. Olivier Chapelle, Jason Weston, and Bernhard Schoelkopf. Cluster kernels for semi-supervised learning. In NIPS, pages 585-592, 2002. Google Scholar
  6. A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability and Computing, 19(02):227-284, 2010. Google Scholar
  7. A. Condon and Richard M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116-140, 2001. Google Scholar
  8. Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84:066106, Dec 2011. Google Scholar
  9. Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett., 107:065701, 2011. Google Scholar
  10. Martin E. Dyer and Alan M. Frieze. The solution of some random NP-hard problems in polynomial expected time. Journal of Algorithms, 10(4):451-489, 1989. Google Scholar
  11. William Evans, Claire Kenyon, Yuval Peres, and Leonard J. Schulman. Broadcasting on trees and the Ising model. The Annals of Applied Probability, 10(2):410-433, 2000. Google Scholar
  12. David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 369-376. ACM, 2014. Google Scholar
  13. Hamed Hatami, László Lovász, and Balázs Szegedy. Limits of local-global convergent graph sequences. arXiv preprint arXiv:1205.4356, 2012. Google Scholar
  14. P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109-137, 1983. Google Scholar
  15. Mark Jerrum and G. B. Sorkin. The Metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82(1-3):155-175, 1998. Google Scholar
  16. Varun Kanade, Elchanan Mossel, and Tselil Schramm. Global and local information in clustering labeled block models. Available at http://arxiv.org/abs/1404.6325, 2014.
  17. Russell Lyons and Fedor Nazarov. Perfect matchings as iid factors on non-amenable groups. European Journal of Combinatorics, 32(7):1115-1125, 2011. Google Scholar
  18. Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Symposium on the Theory of Computation (STOC), 2014. Google Scholar
  19. Frank McSherry. Spectral partitioning of random graphs. In Proceedings of IEEE Conference on the Foundations of Computer Science (FOCS), pages 529-537, 2001. Google Scholar
  20. Elchanan Mossel. Reconstruction on trees: Beating the second eigenvalue. The Annals of Applied Probability, 11(1):285-300, 2001. Google Scholar
  21. Elchanan Mossel. Survey: Information flow on trees. Available at https://arxiv.org/abs/math/0406446, 2004.
  22. Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. Preprint avaiable at https://arxiv.org/abs/1202.1499, 2012.
  23. Elchanan Mossel, Joe Neeman, and Allan Sly. Belief propogation, robust reconstruction, and optimal recovery of block models. Preprint available at https://arxiv.org/abs/1309.1380, 2013.
  24. Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. Preprint available at http://arxiv.org/abs/1311.4115, 2013.
  25. Elchanan Mossel and Yuval Peres. Information flow on trees. Ann. Appl. Probab., 13(3):817-1230, 2003. Google Scholar
  26. A. Sly. Reconstruction of symmetric potts models. In Proceedings of the 41st ACM Symposium on Theory of Computing, pages 581-590, 2009. Google Scholar
  27. T. A. B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75-100, 1997. Google Scholar
  28. Greg Ver Steeg, Cristopher Moore, Aram Galstyan, and Armen E. Allahverdyan. Phase transitions in community detection: A solvable toy model. Available at http://www.santafe.edu/media/workingpapers/13-12-039.pdf, 2013.
  29. Pan Zhang, Florent Krzakala, Jörg Reichardt, and Lenka Zdeborová. Comparitive study for inference of hidden classes in stochastic block models. Journal of Statistical Mechanics: Theory and Experiment, 2012. Google Scholar
  30. Pan Zhang, Cristopher Moore, and Lenka Zdeborová. Phase transitions in semisupervised clustering of sparse networks. Available at http://arxiv.org/abs/1404.7789, 2014.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail