On the Cost of Essentially Fair Clusterings

Bercea, Ioana O.; Groß, Martin; Khuller, Samir; Kumar, Aounon; Rösner, Clemens; Schmidt, Daniel R.; Schmidt, Melanie

doi:10.4230/LIPIcs.APPROX-RANDOM.2019.18

Abstract

Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters - especially if the data is already biased. At NIPS 2017, Chierichetti et al. [Flavio Chierichetti et al., 2017] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and a O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [Clemens Rösner and Melanie Schmidt, 2018]. We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.

Karen Aardal, Pieter L. van den Berg, Dion Gijswijt, and Shanfei Li. Approximation algorithms for hard capacitated k-facility location problems. European Journal of Operational Research, 242:358-368, 2015. URL: https://doi.org/10.1016/j.ejor.2014.10.011.
Ankit Aggarwal, Anand Louis, Manisha Bansal, Naveen Garg, Neelima Gupta, Shubham Gupta, and Surabhi Jain. A 3-approximation algorithm for the facility location problem with uniform capacities. Mathematical Programming, 141:527-547, 2013. URL: https://doi.org/10.1007/s10107-012-0565-4.
Gagan Aggarwal, Rina Panigrahy, Tomás Feder, Dilys Thomas, Krishnaram Kenthapadi, Samir Khuller, and An Zhu. Achieving anonymity via clustering. ACM Transactions on Algorithms, 6:49:1-49:19, 2010. URL: https://doi.org/10.1145/1798596.1798602.
Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms. In Chris Umans, editor, Proceedings of the 58th IEEE Symposium on Foundations of Computer Science (FOCS 2017), pages 61-72. IEEE Computer Society, 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
Sara Ahmadian and Chaitanya Swamy. Improved Approximation Guarantees for Lower-Bounded Facility Location. In Thomas Erlebach and Giuseppe Persiano, editors, 10th International Workshop on Approximation and Online Algorithms (WAOA 2012), volume 7846 of Lecture Notes in Computer Science (LNCS), pages 257-271. Springer Berlin Heidelberg, 2012. URL: https://doi.org/10.1007/978-3-642-38016-7_21.
Sara Ahmadian and Chaitanya Swamy. Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 69:1-69:15. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.69.
Hyung-Chan An, Mohit Singh, and Ola Svensson. LP-based algorithms for capacitated facility location. SIAM Journal on Computing, 46:272-306, 2017. URL: https://doi.org/10.1137/151002320.
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry (SoCG 2015), volume 34 of Leibniz International Proceedings in Informatics (LIPIcs), pages 754-767. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.754.
Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable Fair Clustering. CoRR, abs/1902.03519, 2019. URL: http://arxiv.org/abs/1902.03519.
Manisha Bansal, Naveen Garg, and Neelima Gupta. A 5-Approximation for Capacitated Facility Location. In Leah Epstein and Paolo Ferragina, editors, Algorithms - ESA 2012, volume 7501 of Lecture Notes in Computer Science (LNCS), pages 133-144. Springer Berlin Heidelberg, 2012. URL: https://doi.org/10.1007/978-3-642-33090-2_13.
Judit Bar-Ilan, Guy Kortsarz, and David Peleg. How to Allocate Network Centers. Journal of Algorithms, 15:385-415, 1993. URL: https://doi.org/10.1006/jagm.1993.1047.
Suman K. Bera, Deeparnab Chakrabarty, and Maryam Negahbani. Fair Algorithms for Clustering. CoRR, abs/1901.02393, 2019. URL: http://arxiv.org/abs/1901.02393.
Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the cost of essentially fair clusterings. CoRR, abs/1811.10319, 2018. URL: http://arxiv.org/abs/1811.10319.
Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization. ACM Transactions on Algorithms, 13:23:1-23:31, 2017. URL: https://doi.org/10.1145/2981561.
Deeparnab Chakrabarty and Chaitanya Swamy. Facility location with client latencies: LP-based techniques for minimum-latency problems. Math. Oper. Res., 41(3):865-883, 2016. URL: https://doi.org/10.1287/moor.2015.0758.
Moses Charikar, Samir Khuller, David M. Mount, and Giri Narasimhan. Algorithms for facility location problems with outliers. In S. Rao Kosaraju, editor, Proceedings of the Twelfth Annual Symposium on Discrete Algorithms (SODA 2001), pages 642-651, 2001. URL: http://dl.acm.org/citation.cfm?id=365411.365555.
Ke Chen. A constant factor approximation algorithm for k-median clustering with outliers. In Shang-Hua Teng, editor, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2008), pages 826-835. SIAM, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347173.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair Clustering Through Fairlets. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 (NIPS 2017), pages 5036-5044, 2017. URL: http://papers.nips.cc/paper/7088-fair-clustering-through-fairlets.
Marek Cygan, MohammadTaghi Hajiaghayi, and Samir Khuller. LP rounding for k-centers with non-uniform hard capacities. In Venkatesan Guruswami, editor, 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2012), pages 273-282. IEEE Computer Society, 2012. URL: https://doi.org/10.1109/FOCS.2012.63.
Marek Cygan and Tomasz Kociumaka. Constant Factor Approximation for Capacitated k-Center with Outliers. In Ernst W. Mayr and Natacha Portier, editors, 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), Leibniz International Proceedings in Informatics (LIPIcs), pages 251-262. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.STACS.2014.251.
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Shafi Goldwasser, editor, Innovations in Theoretical Computer Science 2012 (ITCS 2012), pages 214-226. ACM, 2012. URL: https://doi.org/10.1145/2090236.2090255.
Teofilo F. Gonzalez. Clustering to Minimize the Maximum Intercluster Distance. Theoretical Computer Science, 38:293-306, 1985. URL: https://doi.org/10.1016/0304-3975(85)90224-5.
Sudipto Guha and Samir Khuller. Greedy Strikes Back: Improved Facility Location Algorithms. Journal of Algorithms, 31:228-248, 1999. URL: https://doi.org/10.1006/jagm.1998.0993.
Moritz Hardt, Eric Price, and Nati Srebro. Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems 29 (NIPS 2016), pages 3315-3323, 2016.
Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33(3):533-550, 1986. URL: https://doi.org/10.1145/5925.5933.
Wen-Lian Hsu and George L. Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics, 1:209-215, 1979. URL: https://doi.org/10.1016/0166-218X(79)90044-1.
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pages 731-740. ACM, 2002. URL: https://doi.org/10.1145/509907.510012.
Samir Khuller and Yoram J. Sussmann. The Capacitated K-Center Problem. SIAM Journal on Discrete Mathematics, 13:403-418, 2000. URL: https://doi.org/10.1137/S0895480197329776.
Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair k-Center Clustering for Data Summarization. CoRR, abs/1901.08628, 2019. URL: http://arxiv.org/abs/1901.08628.
Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi, and Jamie Morgenstern. Guarantees for Spectral Clustering with Fairness Constraints. CoRR, abs/1901.08668, 2019. URL: http://arxiv.org/abs/1901.08668.
Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018), pages 646-659. ACM, 2018. URL: https://doi.org/10.1145/3188745.3188882.
Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Information Processing Letters, 120:40-43, 2017. URL: https://doi.org/10.1016/j.ipl.2016.11.009.
Jian Li, Ke Yi, and Qin Zhang. Clustering with Diversity. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming (ICALP 2010), volume 6198 of Lecture Notes in Computer Science (LNCS), pages 188-200. Springer Berlin Heidelberg, 2010. URL: https://doi.org/10.1007/978-3-642-14165-2_17.
Shanfei Li. An Improved Approximation Algorithm for the Hard Uniform Capacitated k-median Problem. In Klaus Jansen, José D. P. Rolim, Nikhil R. Devanur, and Cristopher Moore, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014), volume 28 of Leibniz International Proceedings in Informatics (LIPIcs), pages 325-338. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2014. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.325.
Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Information and Computation, 222:45-58, 2013. URL: https://doi.org/10.1016/j.ic.2012.01.007.
Shi Li. Approximating capacitated k-median with (1 + ε)k open facilities. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), pages 786-796. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch56.
Shi Li. On Uniform Capacitated k-Median Beyond the Natural LP Relaxation. ACM Transactions on Algorithms, 13:22:1-22:18, 2017. URL: https://doi.org/10.1145/2983633.
Shi Li and Ola Svensson. Approximating k-Median via Pseudo-Approximation. SIAM Journal on Computing, 45:530-547, 2016. URL: https://doi.org/10.1137/130938645.
Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29:582-638, 2014. URL: https://doi.org/10.1017/S0269888913000039.
Clemens Rösner and Melanie Schmidt. Privacy Preserving Clustering with Constraints. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 96:1-96:14. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.96.
Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Fair Coresets and Streaming Algorithms for Fair k-Means Clustering. CoRR, abs/1812.10854, 2018. URL: http://arxiv.org/abs/1812.10854.
Zoya Svitkina. Lower-bounded facility location. ACM Transaction on Algorithms, 6:69:1-69:16, 2010. URL: https://doi.org/10.1145/1824777.1824789.
Indrė Zliobaitė, Faisal Kamiran, and Toon Calders. Handling Conditional Discrimination. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, ICDM '11, pages 992-1001. IEEE Computer Society, 2011. URL: https://doi.org/10.1109/ICDM.2011.72.
Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning Fair Representations. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML 2013), volume 28 of Proceedings of Machine Learning Research, pages 325-333. PMLR, 2013. URL: http://proceedings.mlr.press/v28/zemel13.html.

On the Cost of Essentially Fair Clusterings

Authors Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt , Melanie Schmidt

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

On the Cost of Essentially Fair Clusterings

Authors Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt , Melanie Schmidt

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References