Fair Correlation Clustering in Forests

Casel, Katrin; Friedrich, Tobias; Schirneck, Martin; Wietheger, Simon

doi:10.4230/LIPIcs.FORC.2023.9

Abstract

The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. Most research on this version of fair clustering has focused on centroid-based objectives.
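The exact fairness condition stated above can be checked directly. The following is a minimal sketch, assuming a clustering is given as a list of vertex lists and the sensitive attribute as a dictionary from vertex to "color"; the function names `color_distribution` and `is_fair` are hypothetical illustrations, not code from the paper. Exact fractions avoid floating-point noise when comparing distributions.

```python
from collections import Counter
from fractions import Fraction


def color_distribution(items, color):
    """Share of each sensitive-attribute value ("color") among `items`, as exact fractions."""
    counts = Counter(color[v] for v in items)
    total = len(items)
    return {c: Fraction(k, total) for c, k in counts.items()}


def is_fair(clusters, color):
    """Exact fairness: every cluster reproduces the color distribution of the whole input set.

    clusters: list of non-empty vertex lists that partition the input set (hypothetical format).
    color:    dict mapping each vertex to its sensitive attribute value.
    """
    everything = [v for cluster in clusters for v in cluster]
    target = color_distribution(everything, color)
    # A cluster missing some color lacks that key, so the dict comparison fails as it should.
    return all(color_distribution(cluster, color) == target for cluster in clusters)
```

For example, with `color = {1: "red", 2: "blue", 3: "red", 4: "blue"}`, the clustering `[[1, 2], [3, 4]]` is fair, while `[[1, 3], [2, 4]]` is not, because each of its clusters contains only one color.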
In contrast, we discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand if there is hope for better results in between these two extremes. To this end, we consider restricted graph classes which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view.
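One way to see why the distribution of the sensitive attribute matters is a small divisibility argument that follows from the exact fairness definition. We derive it here ourselves; it is not quoted from the paper. Let c_i be the number of items of color i and n = c_1 + ... + c_k. A cluster C is exactly fair iff it contains |C| * c_i / n items of each color i. All of these counts are integers iff |C| is a multiple of n / gcd(c_1, ..., c_k). The sketch below computes the smallest such cluster; the name `smallest_fair_cluster` is a hypothetical illustration.

```python
from collections import Counter
from functools import reduce
from math import gcd


def smallest_fair_cluster(colors):
    """Size and color composition of the smallest exactly fair cluster.

    colors: the sensitive attribute value of every item in the input set.
    Every exactly fair cluster has a size that is a multiple of the returned size,
    since each color count |C| * c_i / n must be an integer.
    """
    counts = Counter(colors)
    g = reduce(gcd, counts.values())  # gcd of all color counts; it also divides n
    composition = {c: k // g for c, k in counts.items()}
    return sum(composition.values()), composition  # sum(c_i / g) == n / g
```

For two colors in a 1:1 ratio the result is 2, so fair clusters are exactly those with equally many items of each color. For 2 red and 3 blue items the result is 5, so the only fair clustering is the single cluster containing everything.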
While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable.
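The claim that unfair Correlation Clustering is trivial on forests can be illustrated as follows. This sketch assumes the standard unweighted formulation: an edge marks a similar pair, a non-edge marks a dissimilar pair, and the cost counts cut edges plus non-adjacent pairs placed in the same cluster. Writing m for the number of edges, the cost equals m plus, for every cluster with s vertices and e internal edges, the term s(s−1)/2 − 2e. In a forest e ≤ s − 1, so each term is at least −1, and it equals −1 only for connected clusters of two or three vertices. Each such cluster contains an edge, so at most ν clusters contribute −1, where ν is the size of a maximum matching. The matched pairs achieve exactly this bound, which gives an optimum of m − ν. This derivation and the helper names `disagreement_cost` and `forest_optimum` are our own illustration, not the paper's algorithm. The matching step uses the well-known greedy for trees, which matches a leaf with its parent.

```python
from itertools import combinations


def disagreement_cost(edges, clusters):
    """Unweighted Correlation Clustering cost: cut edges plus non-edges inside clusters."""
    edge_set = {frozenset(e) for e in edges}
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cut = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    missing = sum(
        1
        for cluster in clusters
        for u, v in combinations(cluster, 2)
        if frozenset((u, v)) not in edge_set
    )
    return cut + missing


def forest_optimum(vertices, edges):
    """Optimal (unfair) clustering of a forest: maximum-matching pairs plus singletons."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, order, seen = {}, [], set()
    for root in vertices:  # one iterative DFS per tree of the forest
        if root in seen:
            continue
        seen.add(root)
        parent[root] = None
        stack = [root]
        while stack:
            v = stack.pop()
            order.append(v)  # every vertex is appended after its parent
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    parent[w] = v
                    stack.append(w)
    matched, clusters = set(), []
    for v in reversed(order):  # all descendants of v are handled before v
        p = parent[v]
        if v not in matched and p is not None and p not in matched:
            matched.update((v, p))  # greedy: match an unmatched vertex with its parent
            clusters.append([p, v])
    clusters.extend([v] for v in vertices if v not in matched)
    return clusters
```

On the path 1-2-3-4 (m = 3, ν = 2), `forest_optimum` returns the pairs {3, 4} and {1, 2} with cost 1. On a star with three leaves, it returns one pair and two singletons with cost 2, whereas putting the whole star in one cluster costs 3.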
We consider the most surprising insight to be that the hardness of Fair Correlation Clustering is not caused by the strictness of the fairness condition: we lift most of our results to also hold for the relaxed version of the fairness condition. Instead, the source of hardness seems to be the distribution of the sensitive attribute. On the positive side, we identify some reasonable distributions that are indeed tractable. While this tractability is only shown for forests, it may open an avenue to design reasonable approximations for larger graph classes.

References

Saba Ahmadi, Sainyam Galhotra, Barna Saha, and Roy Schwartz. Fair correlation clustering. CoRR, abs/2002.03508, 2020. URL: https://doi.org/10.48550/arXiv.2002.03508.
Sara Ahmadian, Alessandro Epasto, Ravi Kumar, and Mohammad Mahdian. Fair correlation clustering. In Proceedings of the 23rd Conference on Artificial Intelligence and Statistics (AISTATS), pages 4195-4205, 2020. URL: https://proceedings.mlr.press/v108/ahmadian20a.html.
Sara Ahmadian and Maryam Negahbani. Improved approximation for fair correlation clustering. CoRR, abs/2206.05050, 2022. URL: https://doi.org/10.48550/arXiv.2206.05050.
Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: Ranking and clustering. Journal of the ACM, 55(5):23:1-23:27, 2008. URL: https://doi.org/10.1145/1411509.1411513.
Noga Alon and Douglas B. West. The Borsuk-Ulam theorem and bisection of necklaces. Proceedings of the American Mathematical Society, 98(4):623-628, 1986. URL: https://doi.org/10.2307/2045739.
Sayan Bandyapadhyay, Fedor V. Fomin, and Kirill Simonov. On coresets for fair clustering in metric and euclidean spaces and their applications. In Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP), pages 23:1-23:15, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.23.
Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning, 56(1–3):89-113, 2004. URL: https://doi.org/10.1023/B:MACH.0000033116.57574.95.
Amir Ben-Dor, Ron Shamir, and Zohar Yakhini. Clustering gene expression patterns. Journal of Computational Biology, 6(3–4):281-297, 1999. URL: https://doi.org/10.1089/106652799318274.
Suman K. Bera, Deeparnab Chakrabarty, Nicolas J. Flores, and Maryam Negahbani. Fair algorithms for clustering. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), pages 4954-4965, 2019.
Ioana Oriana Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the cost of essentially fair clusterings. In Proceedings of the 2019 Conference on Approximation for Combinatorial Optimization Problems and the 2019 Conference on Randomization in Computation (APPROX/RANDOM), volume 145 of LIPIcs, pages 18:1-18:22, 2019.
Francesco Bonchi, David García-Soriano, and Francesco Gullo. Correlation Clustering. Morgan & Claypool Publishers, 2022. URL: https://doi.org/10.2200/S01163ED1V01Y202201DMK019.
Sebastian Böcker and Jan Baumbach. Cluster editing. In Proceedings of the 9th Conference on Computability in Europe (CiE), pages 33-44, 2013. URL: https://doi.org/10.1007/978-3-642-39053-1_5.
Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360-383, 2005. URL: https://doi.org/10.1016/j.jcss.2004.10.012.
Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum Flow and Minimum-Cost Flow in Almost-Linear Time. In Proceedings of the 63rd Symposium on Foundations of Computer Science (FOCS), pages 612-623, 2022. URL: https://doi.org/10.1109/FOCS54457.2022.00064.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), pages 5036-5044, 2017.
Vincent Cohen-Addad, Euiwoong Lee, and Alantha Newman. Correlation Clustering with Sherali-Adams. In Proceedings of the 63rd Symposium on Foundations of Computer Science (FOCS), pages 651-661, 2022. URL: https://doi.org/10.1109/FOCS54457.2022.00068.
Julien Darlay, Nadia Brauner, and Julien Moncel. Dense and sparse graph partition. Discrete Applied Mathematics, 160(16):2389-2396, 2012. URL: https://doi.org/10.1016/j.dam.2012.06.004.
Michael Dinitz, Aravind Srinivasan, Leonidas Tsepenekas, and Anil Vullikanti. Fair disaster containment via graph-cut problems. In Proceedings of the 25th Conference on Artificial Intelligence and Statistics (AISTATS), pages 6321-6333, 2022. URL: https://proceedings.mlr.press/v151/dinitz22a.html.
Thomas Epping, Winfried Hochstättler, and Peter Oertel. Complexity results on a paint shop problem. Discrete Applied Mathematics, 136(2-3):217-226, 2004. URL: https://doi.org/10.1016/S0166-218X(03)00442-6.
Seyed A. Esmaeili, Brian Brubach, Leonidas Tsepenekas, and John P. Dickerson. Probabilistic fair clustering. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), pages 12743-12755, 2020.
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 259-268, 2015. URL: https://doi.org/10.1145/2783258.2783311.
Andreas E. Feldmann and Luca Foschini. Balanced partitions of trees and applications. Algorithmica, 71(2):354-376, 2015. URL: https://doi.org/10.1007/s00453-013-9802-3.
Zachary Friggstad and Ramin Mousavi. Fair correlation clustering with global and local guarantees. In Proceedings of the 2021 Workshop on Algorithms and Data Structures (WADS), pages 414-427, 2021. URL: https://doi.org/10.1007/978-3-030-83508-8_30.
Jonggyu Jang and Hyun Jong Yang. α-Fairness-maximizing user association in energy-constrained small cell networks. IEEE Transactions on Wireless Communications, 21(9):7443-7459, 2022. URL: https://doi.org/10.1109/TWC.2022.3158694.
Suchi Kumari and Anurag Singh. Fair end-to-end window-based congestion control in time-varying data communication networks. International Journal of Communication Systems, 32(11), 2019. URL: https://doi.org/10.1002/dac.3986.
Dana Pessach and Erez Shmueli. A review on fairness in machine learning. ACM Computing Surveys, 55(3):51:1-51:44, 2022. URL: https://doi.org/10.1145/3494672.
Simon Régnier. Sur quelques aspects mathématiques des problèmes de classification automatique. Mathématiques et Sciences Humaines, 82:31-44, 1983.
Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Fair coresets and streaming algorithms for fair k-means. In Proceedings of the 17th Workshop on Approximation and Online Algorithms (WAOA), pages 232-251, 2020. URL: https://doi.org/10.1007/978-3-030-39479-0_16.
Roy Schwartz and Roded Zats. Fair correlation clustering in general graphs. In Proceedings of the 2022 Conference on Approximation for Combinatorial Optimization Problems and the 2022 Conference on Randomization in Computation (APPROX/RANDOM), pages 37:1-37:19, 2022. URL: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.37.
Xiao Xin. An FPT algorithm for the correlation clustering problem. Key Engineering Materials, 474–476:924-927, 2011. URL: https://doi.org/10.4028/www.scientific.net/KEM.474-476.924.
Charles T. Zahn, Jr. Approximating symmetric relations by equivalence relations. Journal of the Society for Industrial and Applied Mathematics, 12(4):840-847, 1964. URL: https://doi.org/10.1137/0112071.
Imtiaz Masud Ziko, Jing Yuan, Eric Granger, and Ismail Ben Ayed. Variational fair clustering. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), pages 11202-11209, 2021. URL: https://doi.org/10.1609/aaai.v35i12.17336.
