Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains

Authors Dor Katzelnick, Roy Schwartz



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2020.49.pdf
  • Filesize: 0.58 MB
  • 18 pages

Document Identifiers

Author Details

Dor Katzelnick
  • Department of Computer Science, Technion, Haifa, Israel
Roy Schwartz
  • Department of Computer Science, Technion, Haifa, Israel

Cite AsGet BibTex

Dor Katzelnick and Roy Schwartz. Maximizing the Correlation: Extending Grothendieck’s Inequality to Large Domains. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 49:1-49:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.49

Abstract

Correlation Clustering is an elegant model where given a graph with edges labeled + or -, the goal is to produce a clustering that agrees the most with the labels: + edges should reside within clusters and - edges should cross between clusters. In this work we study the MaxCorr objective, aiming to find a clustering that maximizes the difference between edges classified correctly and incorrectly. We focus on the case of bipartite graphs and present an improved approximation of 0.254, improving upon the known approximation of 0.219 given by Charikar and Wirth [FOCS`2004] and going beyond the 0.2296 barrier imposed by their technique. Our algorithm is inspired by Krivine’s method for bounding Grothendieck’s constant, and we extend this method to allow for more than two clusters in the output. Moreover, our algorithm leads to two additional results: (1) the first known approximation guarantees for MaxCorr where the output is constrained to have a bounded number of clusters; and (2) a natural extension of Grothendieck’s inequality to large domains.

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
  • Mathematics of computing → Combinatorial optimization
Keywords
  • Correlation Clustering
  • Grothendieck’s Inequality
  • Approximation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke Van Zuylen. Improved approximation algorithms for bipartite correlation clustering. SIAM Journal on Computing, 41(5):1110-1121, 2012. Google Scholar
  2. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008. Google Scholar
  3. Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. Quadratic forms on graphs. Inventiones mathematicae, 163(3):499-522, 2006. Google Scholar
  4. Noga Alon and Assaf Naor. Approximating the cut-norm via grothendieck’s inequality. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 72-80, 2004. Google Scholar
  5. Noga Amit. The bicluster graph editing problem. PhD thesis, Tel Aviv University, 2004. Google Scholar
  6. Megasthenis Asteris, Anastasios Kyrillidis, Dimitris Papailiopoulos, and Alexandros Dimakis. Bipartite correlation clustering: Maximizing agreements. In Artificial Intelligence and Statistics, pages 121-129, 2016. Google Scholar
  7. Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1-3):89-113, 2004. Google Scholar
  8. Amir Ben-Dor, Ron Shamir, and Zohar Yakhini. Clustering gene expression patterns. Journal of computational biology, 6(3-4):281-297, 1999. Google Scholar
  9. Mark Braverman, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. The grothendieck constant is strictly smaller than krivine’s bound. ieee 52nd annual symposium on foundations of computer science focs 2011, 453-462. IEEE Computer Soc., Los Alamitos, CA, 2011. Google Scholar
  10. Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360-383, 2005. Google Scholar
  11. Moses Charikar and Anthony Wirth. Maximizing quadratic programs: Extending grothendieck’s inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 54-60. IEEE, 2004. Google Scholar
  12. Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal lp rounding algorithm for correlationclustering on complete and complete k-partite graphs. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 219-228, 2015. Google Scholar
  13. Yizong Cheng and George M Church. Biclustering of expression data. In Ismb, volume 8, pages 93-103, 2000. Google Scholar
  14. William Cohen and Jacob Richman. Learning to match and cluster entity names. In In ACM SIGIR-2001 Workshop on Mathematical/Formal Methods in Information Retrieval, 2001. Google Scholar
  15. William W. Cohen and Jacob Richman. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, page 475–480, New York, NY, USA, 2002. Association for Computing Machinery. Google Scholar
  16. Tom Coleman, James Saunderson, and Anthony Wirth. A local-search 2-approximation for 2-correlation-clustering. In European Symposium on Algorithms, pages 308-319. Springer, 2008. Google Scholar
  17. Erik D Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172-187, 2006. Google Scholar
  18. Vladimir Filkov and Steven Skiena. Integrating microarray data by consensus clustering. International Journal on Artificial Intelligence Tools, 13(04):863-880, 2004. Google Scholar
  19. Alan M. Frieze and Mark Jerrum. Improved approximation algorithms for MAX k-cut and MAX BISECTION. Algorithmica, 18(1):67-81, 1997. URL: https://doi.org/10.1007/BF02523688.
  20. Ioannis Giotis and Venkatesan Guruswami. Correlation clustering with a fixed number of clusters. Theory of Computing, 2(13):249-266, 2006. URL: https://doi.org/10.4086/toc.2006.v002a013.
  21. Alexandre Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques. Soc. de Matemática de São Paulo, 1956. Google Scholar
  22. Venkatesan Guruswami and Prasad Raghavendra. Constraint satisfaction over a non-boolean domain: Approximation algorithms and unique-games hardness. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 77-90. Springer, 2008. Google Scholar
  23. David R. Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. J. ACM, 45(2):246-265, 1998. URL: https://doi.org/10.1145/274787.274791.
  24. Marek Karpinski and Warren Schudy. Linear time approximation schemes for the gale-berlekamp game and related minimization problems. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 313-322, 2009. Google Scholar
  25. Subhash Khot and Assaf Naor. Grothendieck-type inequalities in combinatorial optimization. arXiv preprint, 2011. URL: http://arxiv.org/abs/1108.2464.
  26. Jean-Louis Krivine. Constantes de grothendieck et fonctions de type positif sur les spheres. Séminaire Analyse fonctionnelle (dit" Maurey-Schwartz"), pages 1-17, 1978. Google Scholar
  27. Sara C Madeira and Arlindo L Oliveira. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM transactions on computational biology and bioinformatics, 1(1):24-45, 2004. Google Scholar
  28. Andrew McCallum and Ben Wellner. Toward conditional models of identity uncertainty with application to proper noun coreference. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, pages 79-86, August 2003. Google Scholar
  29. JA Reeds. A new lower bound on the real grothendieck constant. Manuscript (http://www. dtc. umn. edu/ reedsj/bound2. dvi), 88(96):107, 1991. Google Scholar
  30. Ronald E Rietz. A proof of the grothendieck inequality. Israel journal of mathematics, 19(3):271-276, 1974. Google Scholar
  31. Cornelis Roos, Tamás Terlaky, Arkadi Nemirovski, and Kees Roos. On maximization of quadratic form over intersection of ellipsoids with common center. Mathematical Programming, 86, September 1999. Google Scholar
  32. Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 526-527. Society for Industrial and Applied Mathematics, 2004. Google Scholar
  33. Jurgen Van Gael and Xiaojin Zhu. Correlation clustering for crosslingual link detection. In IJCAI, pages 1744-1749, 2007. Google Scholar
  34. David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, 2011. URL: http://www.cambridge.org/de/knowledge/isbn/item5759340/?site_locale=de_DE.
  35. Anthony Wirth. Correlation clustering. C. Sammut and G. Webb, editors, Encyclopedia of Machine Learning, pages 227–-231, 2010. Google Scholar
  36. Hongyuan Zha, Xiaofeng He, Chris Ding, Horst Simon, and Ming Gu. Bipartite graph partitioning and data clustering. In Proceedings of the tenth international conference on Information and knowledge management, pages 25-32, 2001. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail