Improved Approximation and Scalability for Fair Max-Min Diversification

Authors Raghavendra Addanki, Andrew McGregor , Alexandra Meliou, Zafeiria Moumoulidou



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2022.7.pdf
  • Filesize: 0.92 MB
  • 21 pages

Document Identifiers

Author Details

Raghavendra Addanki
  • Manning College of Information & Computer Sciences, University of Massachusetts Amherst, MA, USA
Andrew McGregor
  • Manning College of Information & Computer Sciences, University of Massachusetts Amherst, MA, USA
Alexandra Meliou
  • Manning College of Information & Computer Sciences, University of Massachusetts Amherst, MA, USA
Zafeiria Moumoulidou
  • Manning College of Information & Computer Sciences, University of Massachusetts Amherst, MA, USA

Cite AsGet BibTex

Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved Approximation and Scalability for Fair Max-Min Diversification. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 7:1-7:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.ICDT.2022.7

Abstract

Given an n-point metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair Max-Min diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6-approximation that is guaranteed to satisfy the fairness constraints up to a factor 1-ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m-1 approximation. 2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1-ε) k_i points from category i ∈ [m]. Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • algorithmic fairness
  • diversity maximization
  • data selection
  • approximation algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, Sepideh Mahabadi, and Kasturi R Varadarajan. Diverse near neighbor problem. In Proceedings of the twenty-ninth annual symposium on Computational geometry, pages 207-214, 2013. Google Scholar
  2. Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroid constraints. In KDD '13, pages 32-40, 2013. Google Scholar
  3. Raghavendra Addanki, Andrew McGregor, Alexandra Meliou, and Zafeiria Moumoulidou. Improved approximation and scalability for fair max-min diversification, 2022. URL: http://arxiv.org/abs/2201.06678.
  4. P. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric approximation via coresets, 2007. Google Scholar
  5. Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In PODS '12, pages 23-34, 2012. Google Scholar
  6. Pankaj K. Agarwal, Stavros Sintos, and Alex Steiger. Efficient indexes for diverse top-k range queries. In PODS '20, pages 213-227, 2020. Google Scholar
  7. Sepideh Aghamolaei, Majid Farhadi, and Hamid Zarrabi-Zadeh. Diversity maximization via composable coresets. In CCCG, 2015. Google Scholar
  8. Albert Angel and Nick Koudas. Efficient diversity-aware search. In SIGMOD ’11, pages 781-792, 2011. Google Scholar
  9. Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501-555, May 1998. Google Scholar
  10. Patrice Assouad. Plongements lipschitziens dans ℝⁿ. Bulletin de la Société Mathématique de France, 111:429-448, 1983. Google Scholar
  11. Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. Advances in Neural Information Processing Systems, 32:4954-4965, 2019. Google Scholar
  12. Aditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In NIPS’16, pages 4105-4113, 2016. Google Scholar
  13. Michele Borassi, Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. Better sliding window algorithms to maximize subadditive and diversity objectives. In PODS '19, pages 254-268, 2019. Google Scholar
  14. Allan Borodin, Aadhar Jain, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Trans. Algorithms, 2017. Google Scholar
  15. Allan Borodin, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS '12, pages 155-166, 2012. Google Scholar
  16. Jaime Carbonell and Jade Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In SIGIR ’98, pages 335-336, 1998. Google Scholar
  17. Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Fast coreset-based diversity maximization under matroid constraints. In WSDM '18, pages 81-89, 2018. Google Scholar
  18. Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. A general coreset-based approach to diversity maximization under matroid constraints. ACM Trans. Knowl. Discov. Data, 2020. Google Scholar
  19. Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, and Eli Upfal. Mapreduce and streaming algorithms for diversity maximization in metric spaces of bounded doubling dimension. Proc. VLDB Endow., pages 469-480, 2017. Google Scholar
  20. Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, and Nisheeth Vishnoi. Fair and diverse DPP-based data summarization. In ICML '2018, pages 716-725, 2018. Google Scholar
  21. L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In ICALP, 2017. Google Scholar
  22. Alfonso Cevallos, Friedrich Eisenbrand, and Rico Zenklusen. Local search for max-sum diversification. In SODA ’17, pages 130-142, 2017. Google Scholar
  23. Barun Chandra and Magnús M Halldórsson. Approximation algorithms for dispersion problems. J. Algorithms, pages 438-465, 2001. Google Scholar
  24. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In NIPS, 2017. Google Scholar
  25. Ashish Chiplunkar, Sagar Kale, and Sivaramakrishnan Natarajan Ramamoorthy. How to solve fair k-center in massive data models. In ICML 2020, pages 1877-1886, 2020. Google Scholar
  26. Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2009. Google Scholar
  27. Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE 2007, pages 1036-1045, 2007. URL: https://doi.org/10.1109/ICDE.2007.368962.
  28. Ting Deng and Wenfei Fan. On the complexity of query result diversification. ACM Transactions on Database Systems (TODS), 39(2):1-46, 2014. Google Scholar
  29. M. Drosou and E. Pitoura. Diverse set selection over dynamic data. IEEE Transactions on Knowledge and Data Engineering, 26(5):1102-1116, 2014. Google Scholar
  30. Marina Drosou, H.V. Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. Diversity in big data: A review. Big Data, 5:73-84, 2017. Google Scholar
  31. Marina Drosou and Evaggelia Pitoura. Search result diversification. SIGMOD Rec., 39(1):41-47, 2010. Google Scholar
  32. Marina Drosou and Evaggelia Pitoura. Disc diversity: Result diversification based on dissimilarity and coverage. Proc. VLDB Endow., 6(1):13-24, November 2012. Google Scholar
  33. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In ITCS '12, pages 214-226, 2012. Google Scholar
  34. Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, and Jakub M Tarnawski. Fairness in streaming submodular maximization: Algorithms and hardness. In NeurIPS 2020, volume 33, pages 13609-13622, 2020. Google Scholar
  35. Erhan Erkut. The discrete p-dispersion problem. European Journal of Operational Research, 46(1):48-60, 1990. Google Scholar
  36. Sainyam Galhotra, Yuriy Brun, and Alexandra Meliou. Fairness testing: Testing software for discrimination. In ESEC/FSE '17, pages 498-510, 2017. Google Scholar
  37. Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In WWW ’09, pages 381-390, 2009. Google Scholar
  38. Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293-306, 1985. Google Scholar
  39. Sudipto Guha. Tight results for clustering and summarizing data streams. In ICDT '09, pages 268-275, 2009. Google Scholar
  40. Anupam Gupta, Robert Krauthgamer, and James R Lee. Bounded geometries, fractals, and low-distortion embeddings. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 534-543. IEEE, 2003. Google Scholar
  41. Refael Hassin, Shlomi Rubinstein, and Arie Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133-137, October 1997. Google Scholar
  42. Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. Composable core-sets for diversity and coverage maximization. In PODS ’14, pages 100-108, 2014. Google Scholar
  43. Matthew Jones, Huy Nguyen, and Thy Nguyen. Fair k-centers via maximum matching. In ICML 2020, pages 4940-4949, 2020. Google Scholar
  44. Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair k-center clustering for data summarization. In ICML '19, volume 97, pages 3448-3457, June 2019. Google Scholar
  45. Michael J. Kuby. Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geographical Analysis, 19(4):315-329, 1987. Google Scholar
  46. Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse Data Selection under Fairness Constraints. In ICDT 2021, pages 13:1-13:25, 2021. Google Scholar
  47. Lu Qin, Jeffrey Xu Yu, and Lijun Chang. Diversifying top-k results. Proc. VLDB Endow., 5(11):1124-1135, July 2012. Google Scholar
  48. S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Oper. Res., 42(2):299-310, April 1994. Google Scholar
  49. Alexander Schrijver. Combinatorial optimization: polyhedra and efficiency, volume 24. Springer Science & Business Media, 2003. Google Scholar
  50. Julia Stoyanovich, Ke Yang, and H. V. Jagadish. Online set selection with fairness and diversity constraints. In EDBT, 2018. Google Scholar
  51. Arie Tamir. Obnoxious facility location on graphs. SIAM J. Discrete Math., 4:550-567, November 1991. Google Scholar
  52. Marcos R. Vieira, Humberto L. Razente, Maria C. N. Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina, and Vassilis J. Tsotras. On query result diversification. In ICDE 2011, pages 1163-1174, 2011. Google Scholar
  53. Yanhao Wang, Francesco Fabbri, and Michael Mathioudakis. Fair and representative subset selection from data streams. In WWW 2021, pages 1340-1350, 2021. Google Scholar
  54. Yue Wang, Alexandra Meliou, and Gerome Miklau. Rc-index: Diversifying answers to range queries. Proc. VLDB Endow., 11(7):773-786, 2018. Google Scholar
  55. Ke Yang, Vasilis Gkatzelis, and Julia Stoyanovich. Balanced ranking with diversity constraints. In IJCAI'19, pages 6035-6042, 2019. Google Scholar
  56. Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs. In SSDBM ’17, 2017. Google Scholar
  57. Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo Baeza-Yates. Fa*ir: A fair top-k ranking algorithm. In CIKM '17, pages 1569-1578, 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail