Improved Diversity Maximization Algorithms for Matching and Pseudoforest

Authors Sepideh Mahabadi, Shyam Narayanan


Author Details

Sepideh Mahabadi
  • Microsoft Research, Redmond, WA, USA
Shyam Narayanan
  • Massachusetts Institute of Technology, Cambridge, MA, USA

Cite As

Sepideh Mahabadi and Shyam Narayanan. Improved Diversity Maximization Algorithms for Matching and Pseudoforest. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 25:1-25:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

In this work we consider the diversity maximization problem: given a data set X of n elements and a parameter k, the goal is to pick a subset of X of size k that maximizes a certain diversity measure. Chandra and Halldórsson [Barun Chandra and Magnús M. Halldórsson, 2001] defined a variety of diversity measures based on pairwise distances between the points. A constant-factor approximation algorithm was known for all of those diversity measures except "remote-matching", for which only an O(log k) approximation was known. In this work we present an O(1) approximation for this remaining notion. Further, we consider these notions from the perspective of composable coresets. Indyk et al. [Piotr Indyk et al., 2014] provided composable coresets with a constant-factor approximation for all but "remote-pseudoforest" and "remote-matching", for which they again obtained only an O(log k) approximation. Here we also close the gap up to constants and present a constant-factor composable coreset algorithm for these two notions. For remote-matching, our coreset has size only O(k); for remote-pseudoforest, our coreset has size O(k^{1+ε}) for any ε > 0 and is an O(1/ε)-approximate coreset.
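The abstract names the two diversity measures without defining them. As a rough illustration, the sketch below evaluates both by brute force under their standard definitions from the dispersion literature: remote-pseudoforest is the sum, over the chosen points, of each point's distance to its nearest other chosen point, and remote-matching is the weight of a minimum-weight perfect matching on the chosen points. The Euclidean metric, the function names, and the exhaustive search in `best_subset` are assumptions made here for illustration. This is not the paper's approximation or coreset algorithm, and it is only practical for tiny inputs.

```python
from itertools import combinations
from math import dist


def remote_pseudoforest(points):
    """Sum over each point of the distance to its nearest other point.

    Assumes at least two points and a Euclidean metric (illustrative choice).
    """
    return sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def remote_matching(points):
    """Weight of a minimum-weight perfect matching, found by brute force."""
    if len(points) % 2:
        raise ValueError("a perfect matching needs an even number of points")

    def best(remaining):
        # remaining: tuple of indices that are still unmatched
        if not remaining:
            return 0.0
        first, rest = remaining[0], remaining[1:]
        # Try every partner for the first unmatched index, then recurse.
        return min(
            dist(points[first], points[partner])
            + best(tuple(r for r in rest if r != partner))
            for partner in rest
        )

    return best(tuple(range(len(points))))


def best_subset(points, k, measure):
    """Exhaustively pick the size-k subset that maximizes the given measure."""
    return max(combinations(points, k), key=measure)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 0), (5, 1)]
    print(remote_pseudoforest(data))
    print(remote_matching(data))
    print(best_subset(data, 2, remote_matching))
```

The exhaustive search makes the objective concrete: the problem asks for the k-subset whose own cheapest structure (matching or pseudoforest) is as expensive as possible.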

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
  • Theory of computation → Computational geometry

Keywords
  • diversity maximization
  • approximation algorithms
  • composable coresets




References

  1. Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, and Sepideh Mahabadi. Real-time recommendation of diverse related articles. In Proceedings of the 22nd international conference on World Wide Web, pages 1-12, 2013.
  2. Sofiane Abbar, Sihem Amer-Yahia, Piotr Indyk, Sepideh Mahabadi, and Kasturi R Varadarajan. Diverse near neighbor problem. In Proceedings of the twenty-ninth annual symposium on Computational geometry, pages 207-214, 2013.
  3. Zeinab Abbassi, Vahab S Mirrokni, and Mayur Thakur. Diversity Maximization Under Matroid Constraints. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 32-40, 2013.
  4. Sepideh Aghamolaei, Majid Farhadi, and Hamid Zarrabi-Zadeh. Diversity maximization via composable coresets. In CCCG, pages 38-48, 2015.
  5. Albert Angel and Nick Koudas. Efficient diversity-aware search. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 781-792, 2011.
  6. Sepehr Assadi and Sanjeev Khanna. Randomized composable coresets for matching and vertex cover. arXiv preprint, 2017.
  7. Aditya Bhaskara, Mehrdad Ghadiri, Vahab S. Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In Advances in Neural Information Processing Systems, pages 4098-4106, 2016.
  8. Allan Borodin, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems, pages 155-166, 2012.
  9. Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, and Eli Upfal. Mapreduce and streaming algorithms for diversity maximization in metric spaces of bounded doubling dimension. arXiv preprint, 2016.
  10. Alfonso Cevallos, Friedrich Eisenbrand, and Sarah Morell. Diversity maximization in doubling metrics. arXiv preprint, 2018.
  11. Barun Chandra and Magnús M. Halldórsson. Approximation algorithms for dispersion problems. J. Algorithms, 38(2):438-465, 2001.
  12. Artur Czumaj and Christian Sohler. Estimating the weight of metric minimum spanning trees in sublinear time. SIAM J. Comput., 39(3):904-922, 2009.
  13. Marina Drosou and Evaggelia Pitoura. Search result diversification. ACM SIGMOD Record, 39(1):41-47, 2010.
  14. Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, and Peilin Zhong. Improved sliding window algorithms for clustering and coverage via bucketing-based sketches. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3005-3042. SIAM, 2022.
  15. Alessandro Epasto, Vahab Mirrokni, and Morteza Zadimoghaddam. Scalable diversity maximization via small-size composable core-sets (brief announcement). In The 31st ACM symposium on parallelism in algorithms and architectures, pages 41-42, 2019.
  16. E. N. Gilbert and H. O. Pollak. Steiner minimal trees. SIAM J. Appl. Math., 16:1-29, 1968.
  17. Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th international conference on World wide web, pages 381-390, 2009.
  18. Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. Diverse sequential subset selection for supervised video summarization. Advances in neural information processing systems, 27, 2014.
  19. Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293-306, 1985.
  20. Magnús M Halldórsson, Kazuo Iwano, Naoki Katoh, and Takeshi Tokuyama. Finding subsets maximizing minimum structures. SIAM Journal on Discrete Mathematics, 12(3):342-359, 1999.
  21. Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 373-380. ACM, 2004.
  22. Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, and Alireza Rezaei. Composable core-sets for determinant maximization problems via spectral spanners. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1675-1694. SIAM, 2020.
  23. Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. Composable core-sets for diversity and coverage maximization. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, Snowbird, UT, USA, June 22-27, 2014, pages 100-108. ACM, 2014.
  24. Anoop Jain, Parag Sarda, and Jayant R Haritsa. Providing diversity in k-nearest neighbor query results. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 404-413. Springer, 2004.
  25. Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pages 510-520, 2011.
  26. Hui Lin, Jeff Bilmes, and Shasha Xie. Graph-based submodular selection for extractive summarization. In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, pages 381-386. IEEE, 2009.
  27. Sepideh Mahabadi, Piotr Indyk, Shayan Oveis Gharan, and Alireza Rezaei. Composable core-sets for determinant maximization: A simple near-optimal algorithm. In International Conference on Machine Learning, pages 4254-4263. PMLR, 2019.
  28. Vahab Mirrokni and Morteza Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 153-162. ACM, 2015.
  29. Julien Pilourdault, Sihem Amer-Yahia, Dongwon Lee, and Senjuti Basu Roy. Motivation-aware task assignment in crowdsourcing. In EDBT, 2017.
  30. S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Facility dispersion problems: Heuristics and special cases. Algorithms and Data Structures, 519:355-366, 1991.
  31. Michael J Welch, Junghoo Cho, and Christopher Olston. Search result diversity for informational queries. In Proceedings of the 20th international conference on World wide web, pages 237-246, 2011.
  32. Cong Yu, Laks VS Lakshmanan, and Sihem Amer-Yahia. Recommendation diversification using explanations. In 2009 IEEE 25th International Conference on Data Engineering, pages 1299-1302. IEEE, 2009.
  33. Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107(10):4511-4515, 2010.
  34. Cai-Nicolas Ziegler, Sean M McNee, Joseph A Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web, pages 22-32, 2005.