Proportionally Fair Clustering Revisited

Micha, Evi; Shah, Nisarg

doi:10.4230/LIPIcs.ICALP.2020.85

Author Details

Evi Micha

University of Toronto, Canada

Nisarg Shah

University of Toronto, Canada

Cite As

Evi Micha and Nisarg Shah. Proportionally Fair Clustering Revisited. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 85:1-85:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ICALP.2020.85

Abstract

In this work, we study fairness in centroid clustering. In this problem, k cluster centers must be placed given n points in a metric space, and the cost to each point is its distance to the nearest cluster center. Recent work of Chen et al. [Chen et al., 2019] introduces the notion of a proportionally fair clustering, in which no group of at least n/k points can find a new cluster center that provides lower cost to each member of the group. They propose a greedy capture algorithm which provides a 1+√2 approximation of proportional fairness for any metric space, and derive generalization bounds for learning proportionally fair clustering from samples in the case where a cluster center can only be placed at one of finitely many given locations in the metric space. We focus on the case where cluster centers can be placed anywhere in the (usually infinite) metric space. In the case of the L² distance metric over ℝ^t, we show that the approximation ratio of greedy capture improves to 2. We also show that this is due to a special property of the L² distance; for the L¹ and L^∞ distances, the approximation ratio remains 1+√2. We provide universal lower bounds which apply to all algorithms. We also consider metric spaces defined on graphs. For trees, we show that an exact proportionally fair clustering always exists and provide an efficient algorithm to find one. The corresponding question for general graphs remains an interesting open problem. Finally, we show that for the L² distance, checking whether a proportionally fair clustering exists and implementing greedy capture over an infinite metric space are NP-hard problems, but are (approximately) solvable in special cases. We also derive generalization bounds which show that an approximately proportionally fair clustering for a large number of points can be learned from a small number of samples. Our work advances the understanding of proportional fairness in clustering, and points out many avenues for future work.
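The fairness notion in the abstract can be made concrete with a small check. The sketch below is illustrative only and is not taken from the paper: it assumes the L² distance, a finite list of `candidates` for the deviating center (the paper's focus is the harder case where centers may lie anywhere), and the usual ρ-relaxation in which a group of at least ⌈n/k⌉ points blocks a clustering only if some center y satisfies ρ·d(i, y) < d(i, X) for every member i. The names `find_blocking_center`, `candidates`, and `rho` are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def l2(p: Point, q: Point) -> float:
    """Euclidean (L2) distance between two points."""
    return math.dist(p, q)


def find_blocking_center(
    points: Sequence[Point],
    centers: Sequence[Point],
    candidates: Sequence[Point],
    k: int,
    rho: float = 1.0,
    dist: Callable[[Point, Point], float] = l2,
) -> Optional[Point]:
    """Return a candidate center that witnesses a violation, or None.

    A candidate y is a witness if at least ceil(n/k) points i satisfy
    rho * dist(i, y) < (distance from i to its nearest current center).
    Only the given finite candidate list is searched (an assumption of
    this sketch), so None does not certify fairness over the whole space.
    """
    n = len(points)
    group_size = math.ceil(n / k)
    # Current cost of each point: distance to its nearest chosen center.
    cost = [min(dist(p, c) for c in centers) for p in points]
    for y in candidates:
        # Count the points that would strictly prefer y, scaled by rho.
        improved = sum(1 for p, c in zip(points, cost) if rho * dist(p, y) < c)
        if improved >= group_size:
            return y
    return None
```

For example, with four points split evenly between two locations and k = 2, placing both centers midway leaves each pair of points free to deviate to its own location, so the check reports a blocking candidate; placing one center on each location leaves no candidate that improves anyone.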

Subject Classification

ACM Subject Classification

Theory of computation → Algorithmic mechanism design
Theory of computation → Facility location and clustering

Keywords

Fairness
Clustering
Facility location

References

Alekh Agarwal, Miroslav Dudík, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97, pages 120-129, 2019.
Noga Alon, Michal Feldman, Ariel D. Procaccia, and Moshe Tennenholtz. Strategyproof approximation of the minimax on networks. Mathematics of Operations Research, 35(3):513-526, 2010.
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, 23, 2016.
Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 405-413, 2019.
Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4955-4966, 2019.
Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929-965, 1989.
Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In Proceedings of the IEEE International Conference on Data Mining Workshops (ICDM), pages 13-18, 2009.
Xingyu Chen, Brandon Fain, Charles Lyu, and Kamesh Munagala. Proportionally fair clustering. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1032-1041, 2019.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NeurIPS), pages 5036-5044, 2017.
Vincent Conitzer, Rupert Freeman, Nisarg Shah, and Jennifer Wortman Vaughan. Group fairness for the allocation of indivisible goods. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), pages 1853-1860, 2019.
Joseph Cox and Michael B Partensky. Spatial localization problem and the circle of Apollonius. arXiv, 2007. URL: http://arxiv.org/abs/physics/0701146.
Brandon Fain, Kamesh Munagala, and Nisarg Shah. Fair allocation of indivisible public goods. In Proceedings of the 19th ACM Conference on Economics and Computation (EC), pages 575-592, 2018.
Michal Feldman and Yoav Wilf. Strategyproof facility location and the least squares objective. In Proceedings of the fourteenth ACM conference on Electronic Commerce (EC), pages 873-890, 2013.
Dimitris Fotakis. Incremental algorithms for facility location and k-median. Theoretical Computer Science, 361(2-3):275-313, 2006.
Anupam Gupta, Guru Guruganesh, and Melanie Schmidt. Approximation algorithms for aversion k-clustering via local k-median. In Proceedings of the 43rd International Colloquium on Automata, Languages and Programming (ICALP), pages 66:1-66:13, 2016.
Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NeurIPS), pages 3315-3323, 2016.
Tatsunori B Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. arXiv, 2018. URL: http://arxiv.org/abs/1806.08010.
Stephen T. Hedetniemi, David P. Jacobs, and K. E. Kennedy. A theorem of Ore and self-stabilizing algorithms for disjoint minimal dominating sets. Theoretical Computer Science, 593:132-138, 2015.
Safwan Hossain, Andjela Mladenovic, and Nisarg Shah. Designing fairly fair classifiers via economic fairness notions. In Proceedings of the 29th International World Wide Web Conference (WWW), pages 1559-1569, 2020.
Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 2569-2577, 2018.
Ke Liao and Diansheng Guo. A clustering-based approach to the capacitated facility location problem 1. Transactions in GIS, 12(3):323-339, 2008.
Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. arXiv, 2019. URL: http://arxiv.org/abs/1908.09635.
Hervé Moulin. Fair Division and Collective Welfare. MIT Press, 2003.
Arvind Narayanan. Translation tutorial: 21 fairness definitions and their politics. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT), 2018.
Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
Oystein Ore. Theory of graphs. American Mathematical Society Colloquium. Providence, R.I., 38, 1962.
David C. Parkes and Rakesh V. Vohra. Algorithmic and economic perspectives on fairness. arXiv, 2019. URL: http://arxiv.org/abs/1909.05282.
Clemens Rösner and Melanie Schmidt. Privacy preserving clustering with constraints. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming (ICALP), pages 96:1-96:14, 2018.
Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
Lloyd Shapley and Herbert Scarf. On cores and indivisibility. Journal of Mathematical Economics, 1(1):23-37, 1974.
Vladimir Shenmaier. The problem of a minimal ball enclosing k points. Journal of Applied and Industrial Mathematics, 7(3):444-448, 2013.
Berk Ustun, Yang Liu, and David C. Parkes. Fairness without harm: Decoupled classifiers with preference guarantees. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 6373-6382, 2019.
Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.
Hal Varian. Equity, envy and efficiency. Journal of Economic Theory, 9:63-91, 1974.
Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1-42, 2019.
Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning (ICML), pages 325-333, 2013.
