Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains

Katzelnick, Dor; Schwartz, Roy

doi:10.4230/LIPIcs.APPROX/RANDOM.2020.49

Subject Classification

ACM Subject Classification

Theory of computation → Facility location and clustering
Mathematics of computing → Combinatorial optimization

Keywords

Correlation Clustering
Grothendieck’s Inequality
Approximation

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

Correlation Clustering is an elegant model where given a graph with edges labeled + or -, the goal is to produce a clustering that agrees the most with the labels: + edges should reside within clusters and - edges should cross between clusters. In this work we study the MaxCorr objective, aiming to find a clustering that maximizes the difference between edges classified correctly and incorrectly. We focus on the case of bipartite graphs and present an improved approximation of 0.254, improving upon the known approximation of 0.219 given by Charikar and Wirth [FOCS`2004] and going beyond the 0.2296 barrier imposed by their technique. Our algorithm is inspired by Krivine’s method for bounding Grothendieck’s constant, and we extend this method to allow for more than two clusters in the output. Moreover, our algorithm leads to two additional results: (1) the first known approximation guarantees for MaxCorr where the output is constrained to have a bounded number of clusters; and (2) a natural extension of Grothendieck’s inequality to large domains.

Cite As Get BibTex

Dor Katzelnick and Roy Schwartz. Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 49:1-49:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.49

Author Details

Dor Katzelnick

Department of Computer Science, Technion, Haifa, Israel

Roy Schwartz

Department of Computer Science, Technion, Haifa, Israel

References

Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke Van Zuylen. Improved approximation algorithms for bipartite correlation clustering. SIAM Journal on Computing, 41(5):1110-1121, 2012.
Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008.
Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. Quadratic forms on graphs. Inventiones mathematicae, 163(3):499-522, 2006.
Noga Alon and Assaf Naor. Approximating the cut-norm via grothendieck’s inequality. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 72-80, 2004.
Noga Amit. The bicluster graph editing problem. PhD thesis, Tel Aviv University, 2004.
Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, and Alexandros Dimakis. Bipartite correlation clustering: Maximizing agreements. In Artificial Intelligence and Statistics, pages 121-129, 2016.
Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1-3):89-113, 2004.
Amir Ben-Dor, Ron Shamir, and Zohar Yakhini. Clustering gene expression patterns. Journal of computational biology, 6(3-4):281-297, 1999.
Mark Braverman, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. The grothendieck constant is strictly smaller than krivine’s bound. ieee 52nd annual symposium on foundations of computer science focs 2011, 453-462. IEEE Computer Soc., Los Alamitos, CA, 2011.
Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360-383, 2005.
Moses Charikar and Anthony Wirth. Maximizing quadratic programs: Extending grothendieck’s inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 54-60. IEEE, 2004.
Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal lp rounding algorithm for correlationclustering on complete and complete k-partite graphs. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 219-228, 2015.
Yizong Cheng and George M Church. Biclustering of expression data. In Ismb, volume 8, pages 93-103, 2000.
William Cohen and Jacob Richman. Learning to match and cluster entity names. In In ACM SIGIR-2001 Workshop on Mathematical/Formal Methods in Information Retrieval, 2001.
William W. Cohen and Jacob Richman. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, page 475–480, New York, NY, USA, 2002. Association for Computing Machinery.
Tom Coleman, James Saunderson, and Anthony Wirth. A local-search 2-approximation for 2-correlation-clustering. In European Symposium on Algorithms, pages 308-319. Springer, 2008.
Erik D Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172-187, 2006.
Vladimir Filkov and Steven Skiena. Integrating microarray data by consensus clustering. International Journal on Artificial Intelligence Tools, 13(04):863-880, 2004.
Alan M. Frieze and Mark Jerrum. Improved approximation algorithms for MAX k-cut and MAX BISECTION. Algorithmica, 18(1):67-81, 1997. URL: https://doi.org/10.1007/BF02523688.
Ioannis Giotis and Venkatesan Guruswami. Correlation clustering with a fixed number of clusters. Theory of Computing, 2(13):249-266, 2006. URL: https://doi.org/10.4086/toc.2006.v002a013.
Alexandre Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Soc. de Matemática de São Paulo, 1956.
Venkatesan Guruswami and Prasad Raghavendra. Constraint satisfaction over a non-boolean domain: Approximation algorithms and unique-games hardness. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 77-90. Springer, 2008.
David R. Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. J. ACM, 45(2):246-265, 1998. URL: https://doi.org/10.1145/274787.274791.
Marek Karpinski and Warren Schudy. Linear time approximation schemes for the gale-berlekamp game and related minimization problems. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 313-322, 2009.
Subhash Khot and Assaf Naor. Grothendieck-type inequalities in combinatorial optimization. arXiv preprint, 2011. URL: http://arxiv.org/abs/1108.2464.
Jean-Louis Krivine. Constantes de grothendieck et fonctions de type positif sur les spheres. Séminaire Analyse fonctionnelle (dit" Maurey-Schwartz"), pages 1-17, 1978.
Sara C Madeira and Arlindo L Oliveira. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM transactions on computational biology and bioinformatics, 1(1):24-45, 2004.
Andrew McCallum and Ben Wellner. Toward conditional models of identity uncertainty with application to proper noun coreference. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, pages 79-86, August 2003.
JA Reeds. A new lower bound on the real grothendieck constant. Manuscript (http://www. dtc. umn. edu/ reedsj/bound2. dvi), 88(96):107, 1991.
Ronald E Rietz. A proof of the grothendieck inequality. Israel journal of mathematics, 19(3):271-276, 1974.
Cornelis Roos, Tamás Terlaky, Arkadi Nemirovski, and Kees Roos. On maximization of quadratic form over intersection of ellipsoids with common center. Mathematical Programming, 86, September 1999.
Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 526-527. Society for Industrial and Applied Mathematics, 2004.
Jurgen Van Gael and Xiaojin Zhu. Correlation clustering for crosslingual link detection. In IJCAI, pages 1744-1749, 2007.
David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, 2011. URL: http://www.cambridge.org/de/knowledge/isbn/item5759340/?site_locale=de_DE.
Anthony Wirth. Correlation clustering. C. Sammut and G. Webb, editors, Encyclopedia of Machine Learning, pages 227–-231, 2010.
Hongyuan Zha, Xiaofeng He, Chris Ding, Horst Simon, and Ming Gu. Bipartite graph partitioning and data clustering. In Proceedings of the tenth international conference on Information and knowledge management, pages 25-32, 2001.

Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains

Authors Dor Katzelnick, Roy Schwartz

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains

Authors Dor Katzelnick, Roy Schwartz

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message