Improved Approximation and Scalability for Fair Max-Min Diversification

Addanki, Raghavendra; McGregor, Andrew; Meliou, Alexandra; Moumoulidou, Zafeiria

doi:10.4230/LIPIcs.ICDT.2022.7

Abstract

Given an n-point metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results:  
1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation. 
2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m]. 
Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.

Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, Sepideh Mahabadi, and Kasturi R Varadarajan. Diverse near neighbor problem. In Proceedings of the twenty-ninth annual symposium on Computational geometry, pages 207-214, 2013.
Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroid constraints. In KDD '13, pages 32-40, 2013.
Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved approximation and scalability for fair max-min diversification, 2022. URL: http://arxiv.org/abs/2201.06678.
P. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets, 2007.
Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In PODS '12, pages 23-34, 2012.
Pankaj K. Agarwal, Stavros Sintos, and Alex Steiger. Efficient indexes for diverse top-k range queries. In PODS '20, pages 213-227, 2020.
Sepideh Aghamolaei, Majid Farhadi, and Hamid Zarrabi-Zadeh. Diversity maximization via composable coresets. In CCCG, 2015.
Albert Angel and Nick Koudas. Efficient diversity-aware search. In SIGMOD ’11, pages 781-792, 2011.
Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501-555, May 1998.
Patrice Assouad. Plongements lipschitziens dans ℝⁿ. Bulletin de la Société Mathématique de France, 111:429-448, 1983.
Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. Advances in Neural Information Processing Systems, 32:4954-4965, 2019.
Aditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In NIPS’16, pages 4105-4113, 2016.
Michele Borassi, Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. Better sliding window algorithms to maximize subadditive and diversity objectives. In PODS '19, pages 254-268, 2019.
Allan Borodin, Aadhar Jain, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Trans. Algorithms, 2017.
Allan Borodin, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS '12, pages 155-166, 2012.
Jaime Carbonell and Jade Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In SIGIR ’98, pages 335-336, 1998.
Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Fast coreset-based diversity maximization under matroid constraints. In WSDM '18, pages 81-89, 2018.
Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. A general coreset-based approach to diversity maximization under matroid constraints. ACM Trans. Knowl. Discov. Data, 2020.
Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, and Eli Upfal. Mapreduce and streaming algorithms for diversity maximization in metric spaces of bounded doubling dimension. Proc. VLDB Endow., pages 469-480, 2017.
Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, and Nisheeth Vishnoi. Fair and diverse DPP-based data summarization. In ICML '2018, pages 716-725, 2018.
L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In ICALP, 2017.
Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Local search for max-sum diversification. In SODA ’17, pages 130-142, 2017.
Barun Chandra and Magnús M Halldórsson. Approximation algorithms for dispersion problems. J. Algorithms, pages 438-465, 2001.
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In NIPS, 2017.
Ashish Chiplunkar, Sagar Kale, and Sivaramakrishnan Natarajan Ramamoorthy. How to solve fair k-center in massive data models. In ICML 2020, pages 1877-1886, 2020.
Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2009.
Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE 2007, pages 1036-1045, 2007. URL: https://doi.org/10.1109/ICDE.2007.368962.
Ting Deng and Wenfei Fan. On the complexity of query result diversification. ACM Transactions on Database Systems (TODS), 39(2):1-46, 2014.
M. Drosou and E. Pitoura. Diverse set selection over dynamic data. IEEE Transactions on Knowledge and Data Engineering, 26(5):1102-1116, 2014.
Marina Drosou, H.V. Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. Diversity in big data: A review. Big Data, 5:73-84, 2017.
Marina Drosou and Evaggelia Pitoura. Search result diversification. SIGMOD Rec., 39(1):41-47, 2010.
Marina Drosou and Evaggelia Pitoura. Disc diversity: Result diversification based on dissimilarity and coverage. Proc. VLDB Endow., 6(1):13-24, November 2012.
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In ITCS '12, pages 214-226, 2012.
Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, and Jakub M Tarnawski. Fairness in streaming submodular maximization: Algorithms and hardness. In NeurIPS 2020, volume 33, pages 13609-13622, 2020.
Erhan Erkut. The discrete p-dispersion problem. European Journal of Operational Research, 46(1):48-60, 1990.
Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. Fairness testing: Testing software for discrimination. In ESEC/FSE '17, pages 498-510, 2017.
Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In WWW ’09, pages 381-390, 2009.
Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293-306, 1985.
Sudipto Guha. Tight results for clustering and summarizing data streams. In ICDT '09, pages 268-275, 2009.
Anupam Gupta, Robert Krauthgamer, and James R Lee. Bounded geometries, fractals, and low-distortion embeddings. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 534-543. IEEE, 2003.
Refael Hassin, Shlomi Rubinstein, and Arie Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133-137, October 1997.
Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. Composable core-sets for diversity and coverage maximization. In PODS ’14, pages 100-108, 2014.
Matthew Jones, Huy Nguyen, and Thy Nguyen. Fair k-centers via maximum matching. In ICML 2020, pages 4940-4949, 2020.
Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair k-center clustering for data summarization. In ICML '19, volume 97, pages 3448-3457, June 2019.
Michael J. Kuby. Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geographical Analysis, 19(4):315-329, 1987.
Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In ICDT 2021, pages 13:1-13:25, 2021.
Lu Qin, Jeffrey Xu Yu, and Lijun Chang. Diversifying top-k results. Proc. VLDB Endow., 5(11):1124-1135, July 2012.
S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Oper. Res., 42(2):299-310, April 1994.
Alexander Schrijver. Combinatorial optimization: polyhedra and efficiency, volume 24. Springer Science & Business Media, 2003.
Julia Stoyanovich, Ke Yang, and H. V. Jagadish. Online set selection with fairness and diversity constraints. In EDBT, 2018.
Arie Tamir. Obnoxious facility location on graphs. SIAM J. Discrete Math., 4:550-567, November 1991.
Marcos R. Vieira, Humberto L. Razente, Maria C. N. Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina, and Vassilis J. Tsotras. On query result diversification. In ICDE 2011, pages 1163-1174, 2011.
Yanhao Wang, Francesco Fabbri, and Michael Mathioudakis. Fair and representative subset selection from data streams. In WWW 2021, pages 1340-1350, 2021.
Yue Wang, Alexandra Meliou, and Gerome Miklau. Rc-index: Diversifying answers to range queries. Proc. VLDB Endow., 11(7):773-786, 2018.
Ke Yang, Vasilis Gkatzelis, and Julia Stoyanovich. Balanced ranking with diversity constraints. In IJCAI'19, pages 6035-6042, 2019.
Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs. In SSDBM ’17, 2017.
Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo Baeza-Yates. Fa*ir: A fair top-k ranking algorithm. In CIKM '17, pages 1569-1578, 2017.

Improved Approximation and Scalability for Fair Max-Min Diversification

Authors Raghavendra Addanki, Andrew McGregor , Alexandra Meliou, Zafeiria Moumoulidou

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Improved Approximation and Scalability for Fair Max-Min Diversification

Authors Raghavendra Addanki, Andrew McGregor , Alexandra Meliou, Zafeiria Moumoulidou

File

Document Identifiers

Author Details

Funding

Cite As Get BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

Supplementary Materials

References

Thanks for your feedback!

Could not send message