Proportionally Fair Clustering Revisited

Authors Evi Micha, Nisarg Shah



Author Details

Evi Micha
  • University of Toronto, Canada
Nisarg Shah
  • University of Toronto, Canada

Acknowledgements

We thank anonymous reviewers for suggesting Theorem 11.

Cite As

Evi Micha and Nisarg Shah. Proportionally Fair Clustering Revisited. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 85:1-85:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ICALP.2020.85

Abstract

In this work, we study fairness in centroid clustering. In this problem, k cluster centers must be placed given n points in a metric space, and the cost to each point is its distance to the nearest cluster center. Recent work of Chen et al. [Chen et al., 2019] introduces the notion of a proportionally fair clustering, in which no group of at least n/k points can find a new cluster center which provides lower cost to each member of the group. They propose a greedy capture algorithm which provides a 1+√2 approximation of proportional fairness for any metric space, and derive generalization bounds for learning proportionally fair clustering from samples in the case where a cluster center can only be placed at one of finitely many given locations in the metric space. 
We focus on the case where cluster centers can be placed anywhere in the (usually infinite) metric space. In the case of the L² distance metric over ℝ^t, we show that the approximation ratio of greedy capture improves to 2. We also show that this is due to a special property of the L² distance; for the L¹ and L^∞ distances, the approximation ratio remains 1+√2. We provide universal lower bounds which apply to all algorithms.
We also consider metric spaces defined on graphs. For trees, we show that an exact proportionally fair clustering always exists and provide an efficient algorithm to find one. The corresponding question for general graphs remains an interesting open question.
Finally, we show that for the L² distance, checking whether a proportionally fair clustering exists and implementing greedy capture over an infinite metric space are NP-hard problems, but (approximately) solvable in special cases. We also derive generalization bounds which show that an approximately proportionally fair clustering for a large number of points can be learned from a small number of samples. Our work advances the understanding of proportional fairness in clustering, and points out many avenues for future work.
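The fairness notion described above can be illustrated with a minimal sketch. It assumes a finite list of candidate centers and the Euclidean (L²) distance, whereas the paper mainly studies centers placed anywhere in the metric space. The function name `find_blocking_center`, its parameters, and the factor `rho` (1 for exact fairness, larger for an approximation) are illustrative choices, not taken from the paper.

```python
import math


def find_blocking_center(points, centers, candidates, rho=1.0):
    """Search a finite candidate set for a violation of rho-proportional fairness.

    A candidate y blocks the clustering if at least ceil(n/k) points i satisfy
    rho * d(i, y) < (distance from i to its nearest current center).
    Returns (y, indices_of_deviating_points), or None if no candidate blocks.
    Assumption: only the listed candidates are checked, not the whole space.
    """
    n, k = len(points), len(centers)
    threshold = math.ceil(n / k)
    # Cost of each point: distance to its nearest chosen center.
    costs = [min(math.dist(p, c) for c in centers) for p in points]
    for y in candidates:
        group = [i for i, p in enumerate(points)
                 if rho * math.dist(p, y) < costs[i]]
        if len(group) >= threshold:
            return y, group
    return None


points = [(0, 0), (0, 1), (1, 0), (10, 10)]

# Centers that serve the tight group near the origin: no candidate blocks.
fair = find_blocking_center(points, [(0, 0), (10, 10)], candidates=points)

# Centers that ignore the group near the origin: (0, 0) blocks.
unfair = find_blocking_center(points, [(10, 10), (10, 11)], candidates=points)
```

In this toy instance n = 4 and k = 2, so any group of at least 2 points may deviate. Restricting to a finite candidate list keeps the check polynomial; it can miss violations in the continuous setting, where the abstract notes the corresponding problems are NP-hard for the L² distance.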

Subject Classification

ACM Subject Classification
  • Theory of computation → Algorithmic mechanism design
  • Theory of computation → Facility location and clustering
Keywords
  • Fairness
  • Clustering
  • Facility location


References

  1. Alekh Agarwal, Miroslav Dudík, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97, pages 120-129, 2019.
  2. Noga Alon, Michal Feldman, Ariel D. Procaccia, and Moshe Tennenholtz. Strategyproof approximation of the minimax on networks. Mathematics of Operations Research, 35(3):513-526, 2010.
  3. Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, 23, 2016.
  4. Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 405-413, 2019.
  5. Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS), pages 4955-4966, 2019.
  6. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929-965, 1989.
  7. Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In Proceedings of the IEEE International Conference on Data Mining Workshops (ICDM), pages 13-18, 2009.
  8. Xingyu Chen, Brandon Fain, Charles Lyu, and Kamesh Munagala. Proportionally fair clustering. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1032-1041, 2019.
  9. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NeurIPS), pages 5036-5044, 2017.
  10. Vincent Conitzer, Rupert Freeman, Nisarg Shah, and Jennifer Wortman Vaughan. Group fairness for the allocation of indivisible goods. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), pages 1853-1860, 2019.
  11. Joseph Cox and Michael B Partensky. Spatial localization problem and the circle of Apollonius. arXiv, 2007. URL: http://arxiv.org/abs/physics/0701146.
  12. Brandon Fain, Kamesh Munagala, and Nisarg Shah. Fair allocation of indivisible public goods. In Proceedings of the 19th ACM Conference on Economics and Computation (EC), pages 575-592, 2018.
  13. Michal Feldman and Yoav Wilf. Strategyproof facility location and the least squares objective. In Proceedings of the fourteenth ACM conference on Electronic Commerce (EC), pages 873-890, 2013.
  14. Dimitris Fotakis. Incremental algorithms for facility location and k-median. Theoretical Computer Science, 361(2-3):275-313, 2006.
  15. Anupam Gupta, Guru Guruganesh, and Melanie Schmidt. Approximation algorithms for aversion k-clustering via local k-median. In Proceedings of the 43rd International Colloquium on Automata, Languages and Programming (ICALP), pages 66:1-66:13, 2016.
  16. Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NeurIPS), pages 3315-3323, 2016.
  17. Tatsunori B Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. arXiv, 2018. URL: http://arxiv.org/abs/1806.08010.
  18. Stephen T. Hedetniemi, David P. Jacobs, and K. E. Kennedy. A theorem of Ore and self-stabilizing algorithms for disjoint minimal dominating sets. Theoretical Computer Science, 593:132-138, 2015.
  19. Safwan Hossain, Andjela Mladenovic, and Nisarg Shah. Designing fairly fair classifiers via economic fairness notions. In Proceedings of the 29th International World Wide Web Conference (WWW), pages 1559-1569, 2020.
  20. Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 2569-2577, 2018.
  21. Ke Liao and Diansheng Guo. A clustering-based approach to the capacitated facility location problem. Transactions in GIS, 12(3):323-339, 2008.
  22. Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. arXiv, 2019. URL: http://arxiv.org/abs/1908.09635.
  23. Hervé Moulin. Fair Division and Collective Welfare. MIT Press, 2003.
  24. Arvind Narayanan. Translation tutorial: 21 fairness definitions and their politics. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT), 2018.
  25. Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
  26. Oystein Ore. Theory of graphs. American Mathematical Society Colloquium. Providence, R.I., 38, 1962.
  27. David C. Parkes and Rakesh V. Vohra. Algorithmic and economic perspectives on fairness. arXiv, 2019. URL: http://arxiv.org/abs/1909.05282.
  28. Clemens Rösner and Melanie Schmidt. Privacy preserving clustering with constraints. In Proceedings of the 45th International Colloquium on Automata, Languages and Programming (ICALP), pages 96:1-96:14, 2018.
  29. Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
  30. Lloyd Shapley and Herbert Scarf. On cores and indivisibility. Journal of Mathematical Economics, 1(1):23-37, 1974.
  31. Vladimir Shenmaier. The problem of a minimal ball enclosing k points. Journal of Applied and Industrial Mathematics, 7(3):444-448, 2013.
  32. Berk Ustun, Yang Liu, and David C. Parkes. Fairness without harm: Decoupled classifiers with preference guarantees. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 6373-6382, 2019.
  33. Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.
  34. Hal Varian. Equity, envy and efficiency. Journal of Economic Theory, 9:63-91, 1974.
  35. Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1-42, 2019.
  36. Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning (ICML), pages 325-333, 2013.