ACM Other Conferences

10.1145/acmotherconferences

0000000

10.5555/0000000

Proceedings of the Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)

APPROX/RANDOM 2014

10.4230/LIPIcs.APPROX-RANDOM.2014.779

Global and Local Information in Clustering Labeled Block Models

Kanade

Varun

Author

Mossel

Elchanan

Author

Schramm

Tselil

Author

04 09 2014

779-792

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proved independently by Massoulié and by Mossel, Neeman and Sly.

In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
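As an illustration of the model described above, the following is a minimal sketch of sampling a sparse two-cluster labeled block model. It is not the authors' code. The parameterization p = a/n and q = b/n, the function name `sample_labeled_block_model`, and the choice to reveal each node's label independently with probability `reveal_fraction` are all assumptions made for this example.

```python
import random


def sample_labeled_block_model(n, a, b, reveal_fraction, seed=0):
    """Sample a two-cluster block model plus a set of revealed labels.

    Assumptions (not taken from the abstract): p = a/n, q = b/n, and each
    node's true label is revealed independently with probability
    reveal_fraction.
    """
    rng = random.Random(seed)
    # Two equal-sized clusters: the first half is cluster 0, the rest cluster 1.
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    p, q = a / n, b / n  # sparse regime: both probabilities are O(1/n)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # Intra-cluster pairs use p, inter-cluster pairs use q.
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    # Node information: a small random subset of true labels.
    revealed = {i: labels[i] for i in range(n) if rng.random() < reveal_fraction}
    return labels, edges, revealed


labels, edges, revealed = sample_labeled_block_model(
    n=200, a=5.0, b=1.0, reveal_fraction=0.1
)
print(len(edges), "edges;", len(revealed), "revealed labels")
```

Under this parameterization the expected degree of a node is roughly (a + b)/2, independent of n, which is what makes the graph sparse.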

stochastic block models

information flow on trees

Armen E. Allahverdyan, Greg Ver Steeg, and Aram Galstyan. Community detection with and without prior information. Europhysics Letters, 90:18002, 2010.

Sugato Basu, Arindam Banerjee, and Raymond J Mooney. Semi-supervised clustering by seeding. In ICML, volume 2, pages 27-34, 2002.

Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. A probabilistic framework for semi-supervised clustering. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 59-68. ACM, 2004.

P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068-21073, 2009.

Olivier Chapelle, Jason Weston, and Bernhard Schoelkopf. Cluster kernels for semi-supervised learning. In NIPS, pages 585-592, 2002.

A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability and Computing, 19(02):227-284, 2010.

A. Condon and Richard M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116-140, 2001.

Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84:066106, Dec 2011.

Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett., 107:065701, 2011.

Martin E. Dyer and Alan M. Frieze. The solution of some random NP-hard problems in polynomial expected time. Journal of Algorithms, 10(4):451-489, 1989.

William Evans, Claire Kenyon, Yuval Peres, and Leonard J. Schulman. Broadcasting on trees and the Ising model. The Annals of Applied Probability, 10(2):410-433, 2000.

David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 369-376. ACM, 2014.

Hamed Hatami, László Lovász, and Balázs Szegedy. Limits of local-global convergent graph sequences. arXiv preprint arXiv:1205.4356, 2012.

P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109-137, 1983.

Mark Jerrum and G. B. Sorkin. The Metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82(1-3):155-175, 1998.

Varun Kanade, Elchanan Mossel, and Tselil Schramm. Global and local information in clustering labeled block models. Available at http://arxiv.org/abs/1404.6325, 2014.

Russell Lyons and Fedor Nazarov. Perfect matchings as iid factors on non-amenable groups. European Journal of Combinatorics, 32(7):1115-1125, 2011.

Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Symposium on the Theory of Computation (STOC), 2014.

Frank McSherry. Spectral partitioning of random graphs. In Proceedings of IEEE Conference on the Foundations of Computer Science (FOCS), pages 529-537, 2001.

Elchanan Mossel. Reconstruction on trees: Beating the second eigenvalue. The Annals of Applied Probability, 11(1):285-300, 2001.

Elchanan Mossel. Survey: Information flow on trees. Available at https://arxiv.org/abs/math/0406446, 2004.

Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. Preprint available at https://arxiv.org/abs/1202.1499, 2012.

Elchanan Mossel, Joe Neeman, and Allan Sly. Belief propagation, robust reconstruction, and optimal recovery of block models. Preprint available at https://arxiv.org/abs/1309.1380, 2013.

Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. Preprint available at http://arxiv.org/abs/1311.4115, 2013.

Elchanan Mossel and Yuval Peres. Information flow on trees. Ann. Appl. Probab., 13(3):817-1230, 2003.

A. Sly. Reconstruction of symmetric Potts models. In Proceedings of the 41st ACM Symposium on Theory of Computing, pages 581-590, 2009.

T. A. B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75-100, 1997.

Greg Ver Steeg, Cristopher Moore, Aram Galstyan, and Armen E. Allahverdyan. Phase transitions in community detection: A solvable toy model. Available at http://www.santafe.edu/media/workingpapers/13-12-039.pdf, 2013.

Pan Zhang, Florent Krzakala, Jörg Reichardt, and Lenka Zdeborová. Comparative study for inference of hidden classes in stochastic block models. Journal of Statistical Mechanics: Theory and Experiment, 2012.

Pan Zhang, Cristopher Moore, and Lenka Zdeborová. Phase transitions in semisupervised clustering of sparse networks. Available at http://arxiv.org/abs/1404.7789, 2014.

<book-part-wrapper xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" content-type="research-article">

<collection-meta collection-type="book-series">

<collection-id collection-id-type="doi">10.1145/acmotherconferences</collection-id>

<title-group>

<title>ACM Other Conferences</title>

</title-group>

</collection-meta>

<book-meta>

<book-id book-id-type="acm-id">0000000</book-id>

<book-id book-id-type="doi">10.5555/0000000</book-id>

<book-title-group>

<book-title>Proceedings of the Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)</book-title>

<alt-title alt-title-type="acronym">APPROX/RANDOM 2014</alt-title>

</book-title-group>

</book-meta>

<book-part book-part-type="chapter" xml:lang="en">

<book-part-meta>

<book-part-id book-part-id-type="doi">10.4230/LIPIcs.APPROX-RANDOM.2014.779</book-part-id>

<book-part-id book-part-id-type="article-no">54</book-part-id>

<subj-group subj-group-type="ccs2012"/>

<title-group>

<title>Global and Local Information in Clustering Labeled Block Models</title>

</title-group>

<contrib-group>

<contrib>

<name>

<surname>Kanade</surname>

<given-names>Varun</given-names>

</name>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Mossel</surname>

<given-names>Elchanan</given-names>

</name>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Schramm</surname>

<given-names>Tselil</given-names>

</name>

<role>Author</role>

</contrib>

</contrib-group>

<pub-date date-type="publication">

</pub-date>

<abstract>

The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proved independently by Massoulié and by Mossel, Neeman and Sly.

In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).

</abstract>

<kwd-group>

<kwd>stochastic block models</kwd>

<kwd>information flow on trees</kwd>

</kwd-group>

</book-part-meta>

<back>

<ref-list specific-use="unparsed">

<ref>

<mixed-citation>Armen E. Allahverdyan, Greg Ver Steeg, and Aram Galstyan. Community detection with and without prior information. Europhysics Letters, 90:18002, 2010.</mixed-citation>

</ref>

<ref>

<mixed-citation>Sugato Basu, Arindam Banerjee, and Raymond J Mooney. Semi-supervised clustering by seeding. In ICML, volume 2, pages 27-34, 2002.</mixed-citation>

</ref>

<ref>

<mixed-citation>Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. A probabilistic framework for semi-supervised clustering. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 59-68. ACM, 2004.</mixed-citation>

</ref>

<ref>

<mixed-citation>P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068-21073, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Olivier Chapelle, Jason Weston, and Bernhard Schoelkopf. Cluster kernels for semi-supervised learning. In NIPS, pages 585-592, 2002.</mixed-citation>

</ref>

<ref>

<mixed-citation>A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability and Computing, 19(02):227-284, 2010.</mixed-citation>

</ref>

<ref>

<mixed-citation>A. Condon and Richard M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116-140, 2001.</mixed-citation>

</ref>

<ref>

<mixed-citation>Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E, 84:066106, Dec 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett., 107:065701, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Martin E. Dyer and Alan M. Frieze. The solution of some random NP-hard problems in polynomial expected time. Journal of Algorithms, 10(4):451-489, 1989.</mixed-citation>

</ref>

<ref>

<mixed-citation>William Evans, Claire Kenyon, Yuval Peres, and Leonard J. Schulman. Broadcasting on trees and the Ising model. The Annals of Applied Probability, 10(2):410-433, 2000.</mixed-citation>

</ref>

<ref>

<mixed-citation>David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 369-376. ACM, 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>Hamed Hatami, László Lovász, and Balázs Szegedy. Limits of local-global convergent graph sequences. arXiv preprint arXiv:1205.4356, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109-137, 1983.</mixed-citation>

</ref>

<ref>

<mixed-citation>Mark Jerrum and G. B. Sorkin. The Metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82(1-3):155-175, 1998.</mixed-citation>

</ref>

<ref>

<mixed-citation>Varun Kanade, Elchanan Mossel, and Tselil Schramm. Global and local information in clustering labeled block models. Available at http://arxiv.org/abs/1404.6325, 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>Russell Lyons and Fedor Nazarov. Perfect matchings as iid factors on non-amenable groups. European Journal of Combinatorics, 32(7):1115-1125, 2011.</mixed-citation>

</ref>

<ref>

<mixed-citation>Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Symposium on the Theory of Computation (STOC), 2014.</mixed-citation>

</ref>

<ref>

<mixed-citation>Frank McSherry. Spectral partitioning of random graphs. In Proceedings of IEEE Conference on the Foundations of Computer Science (FOCS), pages 529-537, 2001.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel. Reconstruction on trees: Beating the second eigenvalue. The Annals of Applied Probability, 11(1):285-300, 2001.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel. Survey: Information flow on trees. Available at https://arxiv.org/abs/math/0406446, 2004.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. Preprint available at https://arxiv.org/abs/1202.1499, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel, Joe Neeman, and Allan Sly. Belief propagation, robust reconstruction, and optimal recovery of block models. Preprint available at https://arxiv.org/abs/1309.1380, 2013.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. Preprint available at http://arxiv.org/abs/1311.4115, 2013.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elchanan Mossel and Yuval Peres. Information flow on trees. Ann. Appl. Probab., 13(3):817-1230, 2003.</mixed-citation>

</ref>

<ref>

<mixed-citation>A. Sly. Reconstruction of symmetric Potts models. In Proceedings of the 41st ACM Symposium on Theory of Computing, pages 581-590, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>T. A. B. Snijders and K. Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75-100, 1997.</mixed-citation>

</ref>

<ref>

<mixed-citation>Greg Ver Steeg, Cristopher Moore, Aram Galstyan, and Armen E. Allahverdyan. Phase transitions in community detection: A solvable toy model. Available at http://www.santafe.edu/media/workingpapers/13-12-039.pdf, 2013.</mixed-citation>

</ref>

<ref>

<mixed-citation>Pan Zhang, Florent Krzakala, Jörg Reichardt, and Lenka Zdeborová. Comparative study for inference of hidden classes in stochastic block models. Journal of Statistical Mechanics: Theory and Experiment, 2012.</mixed-citation>

</ref>

<ref>

<mixed-citation>Pan Zhang, Cristopher Moore, and Lenka Zdeborová. Phase transitions in semisupervised clustering of sparse networks. Available at http://arxiv.org/abs/1404.7789, 2014.</mixed-citation>

</ref>

</ref-list>

</back>

</book-part>

</book-part-wrapper>