A Faster Algorithm for Constrained Correlation Clustering

Authors: Nick Fischer, Evangelos Kipouridis, Jonas Klausen, Mikkel Thorup



Author Details

Nick Fischer
  • INSAIT, Sofia University "St. Kliment Ohridski", Bulgaria
Evangelos Kipouridis
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Jonas Klausen
  • BARC, University of Copenhagen, Denmark
Mikkel Thorup
  • BARC, University of Copenhagen, Denmark

Acknowledgements

We thank Lorenzo Beretta for his valuable suggestions on weighted sampling.

Cite As

Nick Fischer, Evangelos Kipouridis, Jonas Klausen, and Mikkel Thorup. A Faster Algorithm for Constrained Correlation Clustering. In 42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 327, pp. 32:1-32:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.STACS.2025.32

Abstract

In the Correlation Clustering problem we are given n nodes, and a preference for each pair of nodes indicating whether we prefer the two endpoints to be in the same cluster or not. The output is a clustering inducing the minimum number of violated preferences. In certain cases, however, the preference between some pairs may be too important to be violated. The constrained version of this problem specifies pairs of nodes that must be in the same cluster as well as pairs that must not be in the same cluster (hard constraints). The output clustering has to satisfy all hard constraints while minimizing the number of violated preferences. 
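As a minimal illustration of the objective described above (a sketch, not taken from the paper), the following Python snippet counts the preferences violated by a given clustering and checks whether the hard constraints hold. The function names `count_violations` and `satisfies_hard_constraints` and the input encoding (a set of `frozenset` pairs that prefer to share a cluster, with every other pair preferring separation) are assumptions made for this example.

```python
from itertools import combinations


def count_violations(nodes, prefer_same, clustering):
    """Count the pairs whose preference is violated by `clustering`.

    nodes:       list of hashable node ids
    prefer_same: set of frozenset({u, v}) pairs that prefer one cluster;
                 every other pair is assumed to prefer separation
    clustering:  dict mapping each node to a cluster label
    """
    violations = 0
    for u, v in combinations(nodes, 2):
        together = clustering[u] == clustering[v]
        wants_together = frozenset((u, v)) in prefer_same
        if together != wants_together:
            violations += 1
    return violations


def satisfies_hard_constraints(must_link, cannot_link, clustering):
    """Return True if every hard constraint holds for `clustering`.

    must_link:   iterable of (u, v) pairs that must share a cluster
    cannot_link: iterable of (u, v) pairs that must be separated
    """
    links_ok = all(clustering[u] == clustering[v] for u, v in must_link)
    cuts_ok = all(clustering[u] != clustering[v] for u, v in cannot_link)
    return links_ok and cuts_ok


if __name__ == "__main__":
    # Hypothetical toy instance: a path 0 - 1 - 2 of "same cluster"
    # preferences plus an isolated node 3.
    nodes = [0, 1, 2, 3]
    prefer_same = {frozenset((0, 1)), frozenset((1, 2))}
    clustering = {0: "A", 1: "A", 2: "A", 3: "B"}
    print(count_violations(nodes, prefer_same, clustering))
    print(satisfies_hard_constraints([(0, 1)], [(2, 3)], clustering))
```

In this encoding, a feasible solution to the constrained problem is any clustering for which `satisfies_hard_constraints` returns True, and the quantity to minimize is the value returned by `count_violations`.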
Constrained Correlation Clustering is APX-hard and has been approximated within a factor 3 by van Zuylen et al. [SODA '07]. Their algorithm is based on rounding an LP with Θ(n³) constraints, resulting in an Ω(n^{3ω}) running time, where ω denotes the matrix multiplication exponent. In this work, using a more combinatorial approach, we show how to approximate this problem significantly faster at the cost of a slightly weaker approximation factor. In particular, our algorithm runs in Õ(n³) time (notice that the input size is Θ(n²)) and approximates Constrained Correlation Clustering within a factor 16.
To achieve our result we need properties guaranteed by a particularly influential algorithm for (unconstrained) Correlation Clustering, the CC-PIVOT algorithm. This algorithm chooses a pivot node u, creates a cluster containing u and all its preferred nodes, and recursively solves the rest of the problem. It is known that selecting pivots at random gives a 3-approximation. As a byproduct of our work, we provide a derandomization of the CC-PIVOT algorithm that still achieves the 3-approximation; furthermore, we show that there exist instances where no ordering of the pivots can give a (3-ε)-approximation, for any constant ε > 0.
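The pivoting scheme described above can be sketched in a few lines of Python. This is only an illustration of the randomized, unconstrained CC-PIVOT procedure as summarized in the abstract; it is not the paper's derandomized variant and it ignores hard constraints. The function name `cc_pivot`, the `rng` parameter, and the `frozenset` encoding of preferences are assumptions made for this example, and the recursion is written as a loop.

```python
import random


def cc_pivot(nodes, prefer_same, rng=None):
    """Randomized pivoting for unconstrained Correlation Clustering.

    nodes:       list of hashable node ids
    prefer_same: set of frozenset({u, v}) pairs that prefer one cluster
    rng:         optional random.Random instance (for reproducibility)

    Returns a list of clusters, each a list of nodes.
    """
    if rng is None:
        rng = random.Random()

    remaining = list(nodes)
    clusters = []
    while remaining:
        # Choose a pivot uniformly at random among the unclustered nodes.
        pivot = rng.choice(remaining)
        # The cluster is the pivot plus every unclustered node that
        # prefers to be in the same cluster as the pivot.
        cluster = [pivot] + [
            v for v in remaining
            if v != pivot and frozenset((pivot, v)) in prefer_same
        ]
        clusters.append(cluster)
        # "Recurse" on the rest of the instance by dropping the cluster.
        clustered = set(cluster)
        remaining = [v for v in remaining if v not in clustered]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy instance: two disjoint groups of mutual preferences.
    nodes = [0, 1, 2, 3, 4]
    prefer_same = {
        frozenset((0, 1)), frozenset((0, 2)), frozenset((1, 2)),
        frozenset((3, 4)),
    }
    print(cc_pivot(nodes, prefer_same, rng=random.Random(0)))
```

On an instance whose preferences already form disjoint cliques, as in the toy example, every pivot order recovers the cliques exactly. On other instances the output depends on the pivot order, which is why the choice of pivots matters for the approximation guarantee.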
Finally, we introduce a node-weighted version of Correlation Clustering, which can be approximated within factor 3 using our insights on Constrained Correlation Clustering. As the general weighted version of Correlation Clustering would require a major breakthrough to approximate within a factor o(log n), Node-Weighted Correlation Clustering may be a practical alternative.

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Clustering
  • Constrained Correlation Clustering
  • Approximation

References

  1. Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas. Generating labels from clicks. In Ricardo Baeza-Yates, Paolo Boldi, Berthier A. Ribeiro-Neto, and Berkant Barla Cambazoglu, editors, Proceedings of the Second International Conference on Web Search and Web Data Mining, WSDM 2009, Barcelona, Spain, February 9-11, 2009, pages 172-181. ACM, 2009. URL: https://doi.org/10.1145/1498759.1498824.
  2. Nir Ailon and Moses Charikar. Fitting tree metrics: Hierarchical clustering and phylogeny. SIAM J. Comput., 40(5):1275-1291, 2011. Announced at FOCS'05. URL: https://doi.org/10.1137/100806886.
  3. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1-23:27, 2008. Announced in STOC 2005. URL: https://doi.org/10.1145/1411509.1411513.
  4. Zeyuan Allen-Zhu and Lorenzo Orecchia. Nearly linear-time packing and covering LP solvers - achieving width-independence and -convergence. Math. Program., 175(1-2):307-353, 2019. URL: https://doi.org/10.1007/s10107-018-1244-x.
  5. Arvind Arasu, Christopher Ré, and Dan Suciu. Large-scale deduplication with constraints using dedupalog. In Yannis E. Ioannidis, Dik Lun Lee, and Raymond T. Ng, editors, Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, pages 952-963. IEEE Computer Society, 2009. URL: https://doi.org/10.1109/ICDE.2009.43.
  6. Sepehr Assadi and Chen Wang. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. CoRR, abs/2109.14528, 2021. URL: https://arxiv.org/abs/2109.14528.
  7. Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Mach. Learn., 56(1-3):89-113, 2004. URL: https://doi.org/10.1023/B:MACH.0000033116.57574.95.
  8. Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Almost 3-approximate correlation clustering in constant rounds. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 720-731. IEEE, 2022. URL: https://doi.org/10.1109/FOCS54457.2022.00074.
  9. Soheil Behnezhad, Moses Charikar, Weiyun Ma, and Li-Yang Tan. Single-pass streaming algorithms for correlation clustering. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 819-849. SIAM, 2023. URL: https://doi.org/10.1137/1.9781611977554.CH33.
  10. Francesco Bonchi, Aristides Gionis, and Antti Ukkonen. Overlapping correlation clustering. Knowl. Inf. Syst., 35(1):1-32, 2013. URL: https://doi.org/10.1007/s10115-012-0522-9.
  11. Mark Bun, Marek Eliás, and Janardhan Kulkarni. Differentially private correlation clustering. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 1136-1146. PMLR, 2021. URL: http://proceedings.mlr.press/v139/bun21a.html.
  12. Melanie Cambus, Fabian Kuhn, Etna Lindy, Shreyas Pai, and Jara Uitto. A (3 + ε)-Approximate Correlation Clustering Algorithm in Dynamic Streams, pages 2861-2880. SIAM, 2024. URL: https://doi.org/10.1137/1.9781611977912.101.
  13. Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, and Lukas Vogl. Understanding the cluster linear program for correlation clustering. In Bojan Mohar, Igor Shinkar, and Ryan O'Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 1605-1616. ACM, 2024. URL: https://doi.org/10.1145/3618260.3649749.
  14. Nairen Cao, Shang-En Huang, and Hsin-Hao Su. Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds, pages 4124-4154. SIAM, 2024. URL: https://doi.org/10.1137/1.9781611977912.143.
  15. Deepayan Chakrabarti, Ravi Kumar, and Kunal Punera. A graph-theoretic approach to webpage segmentation. In Jinpeng Huai, Robin Chen, Hsiao-Wuen Hon, Yunhao Liu, Wei-Ying Ma, Andrew Tomkins, and Xiaodong Zhang, editors, Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008, pages 377-386. ACM, 2008. URL: https://doi.org/10.1145/1367497.1367549.
  16. Sayak Chakrabarty and Konstantin Makarychev. Single-pass pivot algorithm for correlation clustering. Keep it simple! CoRR, abs/2305.13560, 2023. URL: https://doi.org/10.48550/arXiv.2305.13560.
  17. Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. J. Comput. Syst. Sci., 71(3):360-383, 2005. Announced in FOCS 2003. URL: https://doi.org/10.1016/j.jcss.2004.10.012.
  18. Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 219-228. ACM, 2015. URL: https://doi.org/10.1145/2746539.2746604.
  19. Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering sparse graphs. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 2213-2221, 2012. URL: https://proceedings.neurips.cc/paper/2012/hash/1e6e0a04d20f50967c64dac2d639a577-Abstract.html.
  20. Vincent Cohen-Addad, Chenglin Fan, Euiwoong Lee, and Arnaud de Mesmay. Fitting metrics and ultrametrics with minimum disagreements. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 301-311. IEEE, 2022. URL: https://doi.org/10.1109/FOCS54457.2022.00035.
  21. Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 2069-2078. PMLR, 2021. URL: http://proceedings.mlr.press/v139/cohen-addad21b.html.
  22. Vincent Cohen-Addad, Euiwoong Lee, Shi Li, and Alantha Newman. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 1082-1104. IEEE, 2023. URL: https://doi.org/10.1109/FOCS57990.2023.00065.
  23. Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation clustering with Sherali-Adams. CoRR, abs/2207.10889, 2022. URL: https://doi.org/10.48550/arXiv.2207.10889.
  24. Vincent Cohen-Addad, David Rasmussen Lolck, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, and Hanwen Zhang. Combinatorial correlation clustering. In Bojan Mohar, Igor Shinkar, and Ryan O'Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 1617-1628. ACM, 2024. URL: https://doi.org/10.1145/3618260.3649712.
  25. Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation clustering in general weighted graphs. Theor. Comput. Sci., 361(2-3):172-187, 2006. URL: https://doi.org/10.1016/j.tcs.2006.05.008.
  26. Nick Fischer, Evangelos Kipouridis, Jonas Klausen, and Mikkel Thorup. A faster algorithm for constrained correlation clustering, 2025. URL: https://arxiv.org/abs/2501.03154.
  27. Lisa Fleischer. A fast approximation scheme for fractional covering problems with variable upper bounds. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11-14, 2004, pages 1001-1010. SIAM, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982942.
  28. Fedor V. Fomin, Stefan Kratsch, Marcin Pilipczuk, Michal Pilipczuk, and Yngve Villanger. Tight bounds for parameterized complexity of cluster editing with a small number of clusters. J. Comput. Syst. Sci., 80(7):1430-1447, 2014. URL: https://doi.org/10.1016/j.jcss.2014.04.015.
  29. Naveen Garg and Jochen Koenemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. In Proceedings of the 39th Annual Symposium on Foundations of Computer Science, FOCS '98, page 300, USA, 1998. IEEE Computer Society.
  30. Dmitri V. Kalashnikov, Zhaoqi Chen, Sharad Mehrotra, and Rabia Nuray-Turan. Web people search via connection analysis. IEEE Trans. Knowl. Data Eng., 20(11):1550-1565, 2008. URL: https://doi.org/10.1109/TKDE.2008.78.
  31. Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang Dong Yoo. Higher-order correlation clustering for image segmentation. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, pages 1530-1538, 2011. URL: https://proceedings.neurips.cc/paper/2011/hash/98d6f58ab0dafbb86b083a001561bb34-Abstract.html.
  32. Domenico Mandaglio, Andrea Tagarelli, and Francesco Gullo. Correlation clustering with global weight bounds. In Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, and José Antonio Lozano, editors, Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part II, volume 12976 of Lecture Notes in Computer Science, pages 499-515. Springer, 2021. URL: https://doi.org/10.1007/978-3-030-86520-7_31.
  33. Gregory J. Puleo and Olgica Milenkovic. Correlation clustering with constrained cluster sizes and extended weights bounds. SIAM J. Optim., 25(3):1857-1872, 2015. URL: https://doi.org/10.1137/140994198.
  34. Anke van Zuylen and David P. Williamson. Deterministic pivoting algorithms for constrained ranking and clustering problems. Math. Oper. Res., 34(3):594-620, 2009. Announced in SODA 2007. URL: https://doi.org/10.1287/moor.1090.0385.
  35. Nate Veldt. Correlation clustering via strong triadic closure labeling: Fast approximation algorithms and practical lower bounds. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 22060-22083. PMLR, 2022. URL: https://proceedings.mlr.press/v162/veldt22a.html.
  36. Nate Veldt, David F. Gleich, and Anthony Wirth. A correlation clustering framework for community detection. In Pierre-Antoine Champin, Fabien Gandon, Mounia Lalmas, and Panagiotis G. Ipeirotis, editors, Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 439-448. ACM, 2018. URL: https://doi.org/10.1145/3178876.3186110.
  37. Michael D. Vose. A linear algorithm for generating random numbers with a given distribution. IEEE Trans. Software Eng., 17(9):972-975, 1991. URL: https://doi.org/10.1109/32.92917.
  38. Alastair J. Walker. New fast method for generating discrete random numbers with arbitrary frequency distributions. Electronics Letters, 10:127-128(1), April 1974.
  39. Di Wang, Satish Rao, and Michael W. Mahoney. Unified acceleration method for packing and covering problems via diameter reduction. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy, volume 55 of LIPIcs, pages 50:1-50:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.50.
  40. Julian Yarkony, Alexander T. Ihler, and Charless C. Fowlkes. Fast planar correlation clustering for image segmentation. In Andrew W. Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid, editors, Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, volume 7577 of Lecture Notes in Computer Science, pages 568-581. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-33783-3_41.