Diversity Maximization in Doubling Metrics

Authors: Alfonso Cevallos, Friedrich Eisenbrand, Sarah Morell





Author Details

Alfonso Cevallos
  • Swiss Federal Institute of Technology (ETH), Switzerland
Friedrich Eisenbrand
  • École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Sarah Morell
  • Technische Universität Berlin (TU Berlin), Germany

Cite As

Alfonso Cevallos, Friedrich Eisenbrand, and Sarah Morell. Diversity Maximization in Doubling Metrics. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 33:1-33:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ISAAC.2018.33

Abstract

Diversity maximization is an important geometric optimization problem with many applications in recommender systems, machine learning, and search engines, among others. A typical diversification problem is as follows: given a finite metric space (X,d) and a parameter k ∈ ℕ, find a subset of k elements of X that has maximum diversity. There are many functions that measure diversity. One of the most popular, called remote-clique, is the sum of the pairwise distances of the chosen elements. In this paper, we present novel results on three widely used diversity measures: remote-clique, remote-star, and remote-bipartition. Our main results are polynomial-time approximation schemes (PTASs) for these three diversification problems under the assumption that the metric space is doubling. This setting has been discussed in the recent literature; however, the existence of such a PTAS was left open. Our results also hold in the setting where the distances are raised to a fixed power q ≥ 1, giving rise to more variants of diversity functions, similar in spirit to the variations of clustering problems depending on the power applied to the pairwise distances. Finally, we prove that remote-clique with squared distances is NP-hard in doubling metric spaces.
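To make the three objectives concrete, here is a minimal brute-force sketch in Python. The abstract defines only remote-clique; for remote-star and remote-bipartition it assumes the standard definitions from the dispersion literature (remote-star: the minimum, over centers c in S, of the summed distances from c to the rest of S; remote-bipartition: the minimum, over splits of S into parts of sizes ⌊|S|/2⌋ and ⌈|S|/2⌉, of the summed distances across the split). The parameter q plays the role of the abstract's distance exponent. All function names are hypothetical, and the exhaustive search enumerates all O(n^k) subsets, so this only illustrates the problem statement, not the paper's PTAS.

```python
import math
from itertools import combinations


def remote_clique(S, d, q=1.0):
    """Sum of d(u, v)**q over all unordered pairs {u, v} in S."""
    return sum(d(u, v) ** q for u, v in combinations(S, 2))


def remote_star(S, d, q=1.0):
    """Min over centers c in S of sum_{v in S} d(c, v)**q (d(c, c) = 0 adds nothing)."""
    return min(sum(d(c, v) ** q for v in S) for c in S)


def remote_bipartition(S, d, q=1.0):
    """Min over subsets Q of size floor(|S|/2) of the summed d**q across (Q, S \\ Q)."""
    S = list(S)
    n = len(S)
    best = math.inf
    for Q in combinations(range(n), n // 2):
        in_q = set(Q)
        cut = sum(d(S[i], S[j]) ** q
                  for i in in_q for j in range(n) if j not in in_q)
        best = min(best, cut)
    return best


def best_subset(X, k, diversity, d, q=1.0):
    """Exhaustive search over all k-subsets of X; exponential in k, illustration only."""
    return max(combinations(X, k), key=lambda S: diversity(S, d, q))
```

For example, on the points (0, 0), (1, 0), (0, 1), (5, 5) with Euclidean distance `math.dist` and k = 2, `best_subset(..., remote_clique, math.dist)` picks the farthest pair `((0, 0), (5, 5))`. Passing `q=2` switches every objective to squared distances, the variant for which the abstract states NP-hardness of remote-clique in doubling metrics.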

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Remote-clique
  • remote-star
  • remote-bipartition
  • doubling dimension
  • grid rounding
  • epsilon-nets
  • polynomial time approximation scheme
  • facility location
  • information retrieval
