Diverse Data Selection under Fairness Constraints

Authors: Zafeiria Moumoulidou, Andrew McGregor, Alexandra Meliou


Author Details

Zafeiria Moumoulidou
  • College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA
Andrew McGregor
  • College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA
Alexandra Meliou
  • College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA

Cite As

Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 13:1-13:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Abstract

Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe 𝒰 of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified k_i number of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.
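To make the problem statement concrete, the following is a minimal sketch of the objective and the fairness constraint as an exhaustive search over a tiny instance. It is not one of the three approximation algorithms from the paper: it runs in exponential time and is useful only as a reference on toy inputs. The function names `min_pairwise_distance` and `fair_max_min_bruteforce`, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
from itertools import combinations, product
from math import dist, inf


def min_pairwise_distance(points):
    """Diversity of a set: the smallest distance between any two of its points."""
    # Euclidean distance is an assumption; any metric could be substituted.
    return min((dist(p, q) for p, q in combinations(points, 2)), default=inf)


def fair_max_min_bruteforce(groups, quotas):
    """Exhaustively find a fair subset with the largest minimum pairwise distance.

    groups: list of m disjoint lists of points (tuples of numbers)
    quotas: list of m integers; quotas[i] is k_i, the number of elements
            that must be taken from group i
    Returns (best_value, best_subset), or (-inf, None) if no subset
    satisfies the quotas. Exponential time: toy instances only.
    """
    best_value, best_subset = -inf, None
    # Every fair subset is one k_i-sized choice per group, combined across groups.
    per_group_choices = [combinations(g, k_i) for g, k_i in zip(groups, quotas)]
    for choice in product(*per_group_choices):
        subset = [p for part in choice for p in part]
        value = min_pairwise_distance(subset)
        if value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset


# Hypothetical one-dimensional instance: m = 2 groups, k = 3 with k_1 = 2, k_2 = 1.
group_a = [(0,), (1,), (10,)]
group_b = [(4,), (5,)]
value, subset = fair_max_min_bruteforce([group_a, group_b], [2, 1])
print(value, subset)
```

In this toy instance the quotas force two points from the first group and one from the second, so the search compares only subsets that are fair by construction and keeps the one whose closest pair is farthest apart.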

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis

Keywords
  • data selection
  • diversity maximization
  • fairness constraints
  • approximation algorithms



