Improved Approximation Algorithms and Lower Bounds for Search-Diversification Problems

Authors Amir Abboud, Vincent Cohen-Addad, Euiwoong Lee, Pasin Manurangsi



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2022.7.pdf
  • Filesize: 0.78 MB
  • 18 pages

Document Identifiers

Author Details

Amir Abboud
  • Weizmann Institute of Science, Rehovot, Israel
Vincent Cohen-Addad
  • Google Research, Zürich, Switzerland
Euiwoong Lee
  • University of Michigan, Ann Arbor, MI, USA
Pasin Manurangsi
  • Google Research, Mountain View, CA, USA

Acknowledgements

We are grateful to Karthik C.S. for insightful discussions, and to Badih Ghazi for encouraging us to work on the problems.

Cite As Get BibTex

Amir Abboud, Vincent Cohen-Addad, Euiwoong Lee, and Pasin Manurangsi. Improved Approximation Algorithms and Lower Bounds for Search-Diversification Problems. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ICALP.2022.7

Abstract

We study several questions related to diversifying search results. We give improved approximation algorithms in each of the following problems, together with some lower bounds.  
1) We give a polynomial-time approximation scheme (PTAS) for a diversified search ranking problem [Nikhil Bansal et al., 2010] whose objective is to minimizes the discounted cumulative gain. Our PTAS runs in time n^{2^O(log(1/ε)/ε)} ⋅ m^O(1) where n denotes the number of elements in the databases and m denotes the number of constraints. Complementing this result, we show that no PTAS can run in time f(ε) ⋅ (nm)^{2^o(1/ε)} assuming Gap-ETH and therefore our running time is nearly tight. Both our upper and lower bounds answer open questions from [Nikhil Bansal et al., 2010]. 
2) We next consider the Max-Sum Dispersion problem, whose objective is to select k out of n elements from a database that maximizes the dispersion, which is defined as the sum of the pairwise distances under a given metric. We give a quasipolynomial-time approximation scheme (QPTAS) for the problem which runs in time n^{O_ε(log n)}. This improves upon previously known polynomial-time algorithms with approximate ratios 0.5 [Refael Hassin et al., 1997; Allan Borodin et al., 2017]. Furthermore, we observe that reductions from previous work rule out approximation schemes that run in n^õ_ε(log n) time assuming ETH. 
3) Finally, we consider a generalization of Max-Sum Dispersion called Max-Sum Diversification. In addition to the sum of pairwise distance, the objective also includes another function f. For monotone submodular function f, we give a quasipolynomial-time algorithm with approximation ratio arbitrarily close to (1-1/e). This improves upon the best polynomial-time algorithm which has approximation ratio 0.5 [Allan Borodin et al., 2017]. Furthermore, the (1-1/e) factor is also tight as achieving better-than-(1-1/e) approximation is NP-hard [Uriel Feige, 1998].

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Approximation Algorithms
  • Complexity
  • Data Mining
  • Diversification

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Zeinab Abbassi, Vahab S. Mirrokni, and Mayur Thakur. Diversity maximization under matroid constraints. In Inderjit S. Dhillon, Yehuda Koren, Rayid Ghani, Ted E. Senator, Paul Bradley, Rajesh Parekh, Jingrui He, Robert L. Grossman, and Ramasamy Uthurusamy, editors, The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11-14, 2013, pages 32-40. ACM, 2013. URL: https://doi.org/10.1145/2487575.2487636.
  2. Amir Abboud, Vincent Cohen-Addad, Euiwoong Lee, and Pasin Manurangsi. Improved approximation algorithms and lower bounds for search-diversification problems. CoRR, abs/2203.01857, 2022. URL: https://doi.org/10.48550/arXiv.2203.01857.
  3. Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Ricardo Baeza-Yates, Paolo Boldi, Berthier A. Ribeiro-Neto, and Berkant Barla Cambazoglu, editors, Proceedings of the Second International Conference on Web Search and Web Data Mining, WSDM 2009, Barcelona, Spain, February 9-11, 2009, pages 5-14. ACM, 2009. URL: https://doi.org/10.1145/1498759.1498766.
  4. Noga Alon, Sanjeev Arora, Rajsekar Manokaran, Dana Moshkovitz, and Omri Weinstein. Inapproximability of densest κ-subgraph from average case hardness, 2011. Google Scholar
  5. Noga Alon, Troy Lee, Adi Shraibman, and Santosh S. Vempala. The approximate rank of a matrix and its algorithmic applications: approximate rank. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 675-684. ACM, 2013. URL: https://doi.org/10.1145/2488608.2488694.
  6. Yuichi Asahiro, Refael Hassin, and Kazuo Iwama. Complexity of finding dense subgraphs. Discrete Applied Mathematics, 121(1–3):15-26, 2002. URL: https://doi.org/10.1016/S0166-218X(01)00243-8.
  7. Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 405-413. PMLR, 2019. URL: http://proceedings.mlr.press/v97/backurs19a.html.
  8. Nikhil Bansal, Anupam Gupta, and Ravishankar Krishnaswamy. A constant factor approximation algorithm for generalized min-sum set cover. In SODA, pages 1539-1545, 2010. URL: https://doi.org/10.1137/1.9781611973075.125.
  9. Nikhil Bansal, Kamal Jain, Anna Kazeykina, and Joseph Naor. Approximation algorithms for diversified search ranking. In ICALP, pages 273-284, 2010. URL: https://doi.org/10.1007/978-3-642-14162-1_23.
  10. Rémi Bardenet and Odalric-Ambrym Maillard. Concentration inequalities for sampling without replacement. Bernoulli, 21(3):1361-1385, 2015. Google Scholar
  11. Siddharth Barman. Approximating nash equilibria and dense subgraphs via an approximate version of carathéodory’s theorem. SIAM J. Comput., 47(3):960-981, 2018. URL: https://doi.org/10.1137/15M1050574.
  12. Julien Baste, Lars Jaffke, Tomás Masarík, Geevarghese Philip, and Günter Rote. FPT algorithms for diverse collections of hitting sets. Algorithms, 12(12):254, 2019. URL: https://doi.org/10.3390/a12120254.
  13. Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an O(n^1/4) approximation for densest k-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 201-210, 2010. URL: https://doi.org/10.1145/1806689.1806718.
  14. Aditya Bhaskara, Moses Charikar, Aravindan Vijayaraghavan, Venkatesan Guruswami, and Yuan Zhou. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pages 388-405, Philadelphia, PA, USA, 2012. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=2095116.2095150.
  15. Aditya Bhaskara, Mehrdad Ghadiri, Vahab S. Mirrokni, and Ola Svensson. Linear relaxations for finding diverse elements in metric spaces. In Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett, editors, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4098-4106, 2016. URL: https://proceedings.neurips.cc/paper/2016/hash/d79c6256b9bdac53a55801a066b70da3-Abstract.html.
  16. Allan Borodin, Aadhar Jain, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Trans. Algorithms, 13(3):41:1-41:25, 2017. URL: https://doi.org/10.1145/3086464.
  17. Mark Braverman, Young Kun-Ko, Aviad Rubinstein, and Omri Weinstein. ETH hardness for densest-k-subgraph with perfect completeness. In SODA, pages 1326-1341, 2017. Google Scholar
  18. Gruia Călinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740-1766, 2011. URL: https://doi.org/10.1137/080733991.
  19. Jaime G. Carbonell and Jade Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In W. Bruce Croft, Alistair Moffat, C. J. van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors, SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28 1998, Melbourne, Australia, pages 335-336. ACM, 1998. URL: https://doi.org/10.1145/290941.291025.
  20. Parinya Chalermsook, Marek Cygan, Guy Kortsarz, Bundit Laekhanukit, Pasin Manurangsi, Danupon Nanongkai, and Luca Trevisan. From gap-exponential time hypothesis to fixed parameter tractable inapproximability: Clique, dominating set, and more. SIAM J. Comput., 49(4):772-810, 2020. URL: https://doi.org/10.1137/18M1166869.
  21. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5029-5037, 2017. URL: https://proceedings.neurips.cc/paper/2017/hash/978fce5bcc4eccc88ad48ce3914124a2-Abstract.html.
  22. Eden Chlamtác, Pasin Manurangsi, Dana Moshkovitz, and Aravindan Vijayaraghavan. Approximation algorithms for label cover and the log-density threshold. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 900-919. SIAM, 2017. URL: https://doi.org/10.1137/1.9781611974782.57.
  23. Irit Dinur. Mildly exponential reduction from gap 3SAT to polynomial-gap label-cover. Electronic Colloquium on Computational Complexity (ECCC), 23:128, 2016. URL: http://eccc.hpi-web.de/report/2016/128.
  24. Marina Drosou, H. V. Jagadish, Evaggelia Pitoura, and Julia Stoyanovich. Diversity in big data: A review. Big Data, 5(2):73-84, 2017. URL: https://doi.org/10.1089/big.2016.0054.
  25. Alessandro Epasto, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Scalable diversity maximization via small-size composable core-sets (brief announcement). In Christian Scheideler and Petra Berenbrink, editors, The 31st ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, AZ, USA, June 22-24, 2019, pages 41-42. ACM, 2019. URL: https://doi.org/10.1145/3323165.3323172.
  26. Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634-652, 1998. URL: https://doi.org/10.1145/285055.285059.
  27. Uriel Feige. Relations between average case complexity and approximation complexity. In Proceedings of the Thiry-fourth Annual ACM Symposium on Theory of Computing, STOC '02, pages 534-543, New York, NY, USA, 2002. ACM. URL: https://doi.org/10.1145/509907.509985.
  28. Uriel Feige, Guy Kortsarz, and David Peleg. The dense k-subgraph problem. Algorithmica, 29(3):410-421, 2001. URL: https://doi.org/10.1007/s004530010050.
  29. Uriel Feige and Michael Langberg. Approximation algorithms for maximization problems arising in graph partitioning. J. Algorithms, 41(2):174-211, November 2001. URL: https://doi.org/10.1006/jagm.2001.1183.
  30. Uriel Feige and Michael Seltser. On the densest k-subgraph problem. Technical report, Weizmann Institute of Science, Rehovot, Israel, 1997. Google Scholar
  31. Fedor V. Fomin, Petr A. Golovach, Fahad Panolan, Geevarghese Philip, and Saket Saurabh. Diverse collections in matroids and graphs. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), volume 187 of LIPIcs, pages 31:1-31:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.31.
  32. Doron Goldstein and Michael Langberg. The dense k subgraph problem. CoRR, abs/0912.5327, 2009. URL: http://arxiv.org/abs/0912.5327,
  33. Sreenivas Gollapudi and Aneesh Sharma. An axiomatic approach for result diversification. In WWW, pages 381-390, 2009. URL: https://doi.org/10.1145/1526709.1526761.
  34. Tesshu Hanaka, Yasuaki Kobayashi, Kazuhiro Kurita, See Woo Lee, and Yota Otachi. Computing diverse shortest paths efficiently: A theoretical and experimental study. CoRR, abs/2112.05403, 2021. URL: http://arxiv.org/abs/2112.05403.
  35. Refael Hassin, Shlomi Rubinstein, and Arie Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133-137, 1997. URL: https://doi.org/10.1016/S0167-6377(97)00034-5.
  36. Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. Composable core-sets for diversity and coverage maximization. In Richard Hull and Martin Grohe, editors, Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS'14, Snowbird, UT, USA, June 22-27, 2014, pages 100-108. ACM, 2014. URL: https://doi.org/10.1145/2594538.2594560.
  37. Subhash Khot. Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM J. Comput., 36(4):1025-1071, 2006. URL: https://doi.org/10.1137/S0097539705447037.
  38. Michael J. Kuby. Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geographical Analysis, 19(4):315-329, 1987. URL: https://doi.org/10.1111/j.1538-4632.1987.tb00133.x.
  39. Alex Kulesza, Ben Taskar, et al. Determinantal point processes for machine learning. Foundations and Trendsregistered in Machine Learning, 5(2-3):123-286, 2012. Google Scholar
  40. Pasin Manurangsi. Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In STOC, pages 954-961, 2017. URL: https://doi.org/10.1145/3055399.3055412.
  41. Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k-coverage, unique set cover and related problems (via t-wise agreement testing theorem). In SODA, pages 62-81, 2020. URL: https://doi.org/10.1137/1.9781611975994.5.
  42. Pasin Manurangsi and Prasad Raghavendra. A birthday repetition theorem and complexity of approximating dense CSPs. In ICALP, pages 78:1-78:15, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.78.
  43. Pasin Manurangsi, Aviad Rubinstein, and Tselil Schramm. The strongish planted clique hypothesis and its consequences. In ITCS, pages 10:1-10:21, 2021. URL: https://doi.org/10.4230/LIPIcs.ITCS.2021.10.
  44. Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005. URL: https://doi.org/10.1017/CBO9780511813603.
  45. I. Douglas Moon and Sohail S. Chaudhry. An analysis of network location problems with distance constraints. Management Science, 30(3):290-307, 1984. URL: http://www.jstor.org/stable/2631804.
  46. Zafeiria Moumoulidou, Andrew McGregor, and Alexandra Meliou. Diverse data selection under fairness constraints. In Ke Yi and Zhewei Wei, editors, 24th International Conference on Database Theory, ICDT 2021, March 23-26, 2021, Nicosia, Cyprus, volume 186 of LIPIcs, pages 13:1-13:25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ICDT.2021.13.
  47. Dana Pessach and Erez Shmueli. Algorithmic fairness. CoRR, abs/2001.09784, 2020. URL: http://arxiv.org/abs/2001.09784.
  48. Prasad Raghavendra and David Steurer. Graph expansion and the unique games conjecture. In Proceedings of the Forty-second ACM Symposium on Theory of Computing, STOC '10, pages 755-764, New York, NY, USA, 2010. ACM. URL: https://doi.org/10.1145/1806689.1806792.
  49. S. S. Ravi, Daniel J. Rosenkrantz, and Giri Kumar Tayi. Heuristic and special case algorithms for dispersion problems. Oper. Res., 42(2):299-310, 1994. URL: https://doi.org/10.1287/opre.42.2.299.
  50. LT Rodrygo, Craig Macdonald, and Iadh Ounis. Search result diversification. Foundations and Trends in Information Retrieval, 9(1):1-90, 2015. Google Scholar
  51. Anand Srivastav and Katja Wolf. Finding dense subgraphs with semidefinite programming. In Proceedings of the International Workshop on Approximation Algorithms for Combinatorial Optimization, APPROX '98, pages 181-191, London, UK, UK, 1998. Springer-Verlag. URL: http://dl.acm.org/citation.cfm?id=646687.702946.
  52. Sepehr Abbasi Zadeh, Mehrdad Ghadiri, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Scalable feature selection via distributed diversity maximization. In Satinder P. Singh and Shaul Markovitch, editors, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 2876-2883. AAAI Press, 2017. URL: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14914.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail