Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions

Authors Sepehr Assadi, Chen Wang



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2022.10.pdf
  • Filesize: 0.87 MB
  • 20 pages

Document Identifiers

Author Details

Sepehr Assadi
  • Department of Computer Science, Rutgers University, Piscataway, NJ, USA
Chen Wang
  • Department of Computer Science, Rutgers University, Piscataway, NJ, USA

Acknowledgements

We thank the anonymous reviewers of ITCS 2022 for their many insightful suggestions that helped with improving the presentation of this paper.

Cite As Get BibTex

Sepehr Assadi and Chen Wang. Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ITCS.2022.10

Abstract

We present a new approach for solving (minimum disagreement) correlation clustering that results in sublinear algorithms with highly efficient time and space complexity for this problem. In particular, we obtain the following algorithms for n-vertex (+/-)-labeled graphs G:  
- A sublinear-time algorithm that with high probability returns a constant approximation clustering of G in O(nlog²n) time assuming access to the adjacency list of the (+)-labeled edges of G (this is almost quadratically faster than even reading the input once). Previously, no sublinear-time algorithm was known for this problem with any multiplicative approximation guarantee. 
- A semi-streaming algorithm that with high probability returns a constant approximation clustering of G in O(n log n) space and a single pass over the edges of the graph G (this memory is almost quadratically smaller than input size). Previously, no single-pass algorithm with o(n²) space was known for this problem with any approximation guarantee.  The main ingredient of our approach is a novel connection to sparse-dense graph decompositions that are used extensively in the graph coloring literature. To our knowledge, this connection is the first application of these decompositions beyond graph coloring, and in particular for the correlation clustering problem, and can be of independent interest.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Computing methodologies → Machine learning
Keywords
  • Correlation Clustering
  • Sublinear Algorithms
  • Semi-streaming Algorithms
  • Sublinear time Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Saba Ahmadi, Samir Khuller, and Barna Saha. Min-max correlation clustering via multicut. In Andrea Lodi and Viswanath Nagarajan, editors, Integer Programming and Combinatorial Optimization - 20th International Conference, IPCO 2019, Ann Arbor, MI, USA, May 22-24, 2019, Proceedings, volume 11480 of Lecture Notes in Computer Science, pages 13-26. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-17953-3_2.
  2. Sara Ahmadian, Alessandro Epasto, Ravi Kumar, and Mohammad Mahdian. Fair correlation clustering. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, pages 4195-4205. PMLR, 2020. URL: http://proceedings.mlr.press/v108/ahmadian20a.html.
  3. Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. Algorithmica, 83(7):1980-2017, 2021. URL: https://doi.org/10.1007/s00453-021-00816-9.
  4. Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1-23:27, 2008. URL: https://doi.org/10.1145/1411509.1411513.
  5. Noga Alon and Sepehr Assadi. Palette sparsification beyond (Δ+1) vertex coloring. In Jaroslaw Byrka and Raghu Meka, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, volume 176 of LIPIcs, pages 6:1-6:22. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2020.6.
  6. Noga Alon and Joel H Spencer. The probabilistic method. John Wiley & Sons, 2004. Google Scholar
  7. Sepehr Assadi, Yu Chen, and Sanjeev Khanna. Sublinear algorithms for (Δ + 1) vertex coloring. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 767-786, 2019. Google Scholar
  8. Sepehr Assadi and Shay Solomon. When algorithms for maximal independent set and maximal matching run in sublinear time. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, pages 17:1-17:17, 2019. Google Scholar
  9. László Aszalós. Decompose boolean matrices with correlation clustering. Entropy, 23(7):852, 2021. URL: https://doi.org/10.3390/e23070852.
  10. Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine learning, 56(1):89-113, 2004. Google Scholar
  11. Francesco Bonchi, David García-Soriano, and Konstantin Kutzkov. Local correlation clustering. CoRR, abs/1312.5105, 2013. URL: http://arxiv.org/abs/1312.5105.
  12. Francesco Bonchi, Aristides Gionis, and Antti Ukkonen. Overlapping correlation clustering. Knowl. Inf. Syst., 35(1):1-32, 2013. URL: https://doi.org/10.1007/s10115-012-0522-9.
  13. Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, and Fabio Vitale. Correlation clustering with adaptive similarity queries. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 12510-12519, 2019. URL: https://proceedings.neurips.cc/paper/2019/hash/b0ba5c44aaf65f6ca34cf116e6d82ebf-Abstract.html.
  14. Mélanie Cambus, Davin Choo, Havu Miikonen, and Jara Uitto. Massively parallel correlation clustering in bounded arboricity graphs. In Seth Gilbert, editor, 35th International Symposium on Distributed Computing, DISC 2021, October 4-8, 2021, Freiburg, Germany (Virtual Conference), volume 209 of LIPIcs, pages 15:1-15:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.DISC.2021.15.
  15. Yi-Jun Chang, Wenzheng Li, and Seth Pettie. An optimal distributed (Δ+1)-coloring algorithm? In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 445-456, 2018. Google Scholar
  16. Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. J. Comput. Syst. Sci., 71(3):360-383, 2005. URL: https://doi.org/10.1016/j.jcss.2004.10.012.
  17. Yudong Chen, Sujay Sanghavi, and Huan Xu. Clustering sparse graphs. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 2213-2221, 2012. URL: https://proceedings.neurips.cc/paper/2012/hash/1e6e0a04d20f50967c64dac2d639a577-Abstract.html.
  18. Flavio Chierichetti, Nilesh N. Dalvi, and Ravi Kumar. Correlation clustering in mapreduce. In Sofus A. Macskassy, Claudia Perlich, Jure Leskovec, Wei Wang, and Rayid Ghani, editors, The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014, pages 641-650. ACM, 2014. URL: https://doi.org/10.1145/2623330.2623743.
  19. Flavio Chierichetti, Alessandro Panconesi, Giuseppe Re, and Luca Trevisan. Correlation clustering reconstruction in semi-adversarial models. CoRR, abs/2108.04729, 2021. URL: http://arxiv.org/abs/2108.04729.
  20. William W. Cohen and Jacob Richman. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, pages 475-480. ACM, 2002. URL: https://doi.org/10.1145/775047.775116.
  21. Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub Tarnawski. Correlation clustering in constant many parallel rounds. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 2069-2078. PMLR, 2021. URL: http://proceedings.mlr.press/v139/cohen-addad21b.html.
  22. Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex-arrival streams. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 45:1-45:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.45.
  23. Dotan Emanuel and Amos Fiat. Correlation clustering - minimizing disagreements on arbitrary weighted graphs. In Giuseppe Di Battista and Uri Zwick, editors, Algorithms - ESA 2003, 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings, volume 2832 of Lecture Notes in Computer Science, pages 208-220. Springer, 2003. URL: https://doi.org/10.1007/978-3-540-39658-1_21.
  24. Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2-3):207-216, 2005. Google Scholar
  25. Manuela Fischer and Andreas Noever. Tight analysis of parallel randomized greedy MIS. ACM Trans. Algorithms, 16(1):6:1-6:13, 2020. URL: https://doi.org/10.1145/3326165.
  26. Jurgen Van Gael and Xiaojin Zhu. Correlation clustering for crosslingual link detection. In Manuela M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pages 1744-1749, 2007. URL: http://ijcai.org/Proceedings/07/Papers/282.pdf.
  27. David García-Soriano, Konstantin Kutzkov, Francesco Bonchi, and Charalampos E. Tsourakakis. Query-efficient correlation clustering. In Yennun Huang, Irwin King, Tie-Yan Liu, and Maarten van Steen, editors, WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, pages 1468-1478. ACM / IW3C2, 2020. URL: https://doi.org/10.1145/3366423.3380220.
  28. Magnús M. Halldórsson, Fabian Kuhn, Yannic Maus, and Tigran Tonoyan. Efficient randomized distributed coloring in CONGEST. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC '21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1180-1193. ACM, 2021. Google Scholar
  29. David G Harris, Johannes Schneider, and Hsin-Hao Su. Distributed (Δ+ 1)-coloring in sublogarithmic rounds. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 465-478. ACM, 2016. Google Scholar
  30. Jafar Jafarov, Sanchit Kalhan, Konstantin Makarychev, and Yury Makarychev. Local correlation clustering with asymmetric classification errors. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 4677-4686. PMLR, 2021. URL: http://proceedings.mlr.press/v139/jafarov21a.html.
  31. Sungwoong Kim, Chang Dong Yoo, Sebastian Nowozin, and Pushmeet Kohli. Image segmentation usinghigher-order correlation clustering. IEEE Trans. Pattern Anal. Mach. Intell., 36(9):1761-1774, 2014. URL: https://doi.org/10.1109/TPAMI.2014.2303095.
  32. Michael Molloy and Bruce A. Reed. A bound on the total chromatic number. Combinatorica, 18(2):241-280, 1998. Google Scholar
  33. Michael Molloy and Bruce A. Reed. Asymptotically optimal frugal colouring. J. Comb. Theory, Ser. B, 100(2):226-246, 2010. Google Scholar
  34. Michael Molloy and Bruce A. Reed. Colouring graphs when the number of colours is almost the maximum degree. J. Comb. Theory, Ser. B, 109:134-195, 2014. Google Scholar
  35. Bruce Reed. The list colouring constants. Journal of Graph Theory, 31(2):149-153, 1999. Google Scholar
  36. Bruce A. Reed. ω, Δ, and χ. Journal of Graph Theory, 27(4):177-212, 1998. Google Scholar
  37. Bruce A. Reed. A strengthening of Brooks' theorem. J. Comb. Theory, Ser. B, 76(2):136-149, 1999. Google Scholar
  38. Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Lacki, and Vahab S. Mirrokni. Scalable community detection via parallel correlation clustering. Proc. VLDB Endow., 14(11):2305-2313, 2021. URL: http://www.vldb.org/pvldb/vol14/p2305-shi.pdf.
  39. Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In J. Ian Munro, editor, Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, New Orleans, Louisiana, USA, January 11-14, 2004, pages 526-527. SIAM, 2004. URL: http://dl.acm.org/citation.cfm?id=982792.982866.
  40. Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018. Google Scholar
  41. Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37-57, 1985. URL: https://doi.org/10.1145/3147.3165.
  42. Jordi R Weggemans, Alex Urech, Alexander Rausch, Robert Spreeuw, Richard Boucherie, Florian Schreck, Kareljan Schoutens, Jiří Minář, and Florian Speelman. Solving correlation clustering with qaoa and a rydberg qudit system: a full-stack approach. arXiv preprint arXiv:2106.11672, 2021. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail