Max-Sum Diversity Via Convex Programming
Diversity maximization is an important concept in information retrieval, computational geometry, and operations research. It is usually a variant of the following problem: given a ground set, constraints, and a function f that measures the diversity of a subset, select a feasible subset S that maximizes f(S). A prominent diversification measure in this context is the sum-dispersion function f(S), the sum of the pairwise distances in S; the corresponding problem is known as "max-sum" or "sum-sum" diversification. Many recent results concern the design of constant-factor approximation algorithms for diversification problems involving the sum-dispersion function under a matroid constraint.
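As a small illustration of the objective, the following sketch evaluates the sum-dispersion function and finds the best subset by exhaustive search. It assumes Euclidean distances and a cardinality constraint |S| = k (a uniform matroid); the function names are hypothetical, and the exponential search is for illustration only, not the method of the paper.

```python
from itertools import combinations
from math import dist

def sum_dispersion(points, subset):
    """f(S): sum of pairwise Euclidean distances among the selected indices."""
    return sum(dist(points[i], points[j]) for i, j in combinations(subset, 2))

def best_subset_bruteforce(points, k):
    """Try every k-subset (uniform matroid of rank k); exponential in general."""
    candidates = combinations(range(len(points)), k)
    return max(candidates, key=lambda s: sum_dispersion(points, s))

# Hypothetical sample data: four points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
best = best_subset_bruteforce(points, 2)
print(best, sum_dispersion(points, best))
```

For a general matroid, the search would range over independent sets or bases rather than all k-subsets.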
In this paper, we present a PTAS for the max-sum diversity problem under a matroid constraint for distances d(.,.) of negative type. Distances of negative type include, for example, metric distances stemming from the l_2 and l_1 norms, as well as the cosine (or spherical) distance and the Jaccard distance, which are popular similarity metrics in web and image search.
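The standard definition says that a distance matrix D is of negative type if x^T D x <= 0 for every vector x whose entries sum to zero. The sketch below spot-checks this condition on random zero-sum vectors. It is only a necessary-condition test under that definition, not a proof, and the helper names and sample data are hypothetical.

```python
import random

def quad_form(D, x):
    """Evaluate x^T D x for a square matrix D given as nested lists."""
    n = len(x)
    return sum(x[i] * D[i][j] * x[j] for i in range(n) for j in range(n))

def looks_negative_type(D, trials=1000, tol=1e-9, seed=0):
    """Return False if some sampled zero-sum x gives x^T D x > tol."""
    rng = random.Random(seed)
    n = len(D)
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project onto the zero-sum hyperplane
        if quad_form(D, x) > tol:
            return False
    return True

def l1(p, q):
    """l_1 distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

points = [(0, 0), (1, 2), (3, 1), (2, 5), (4, 4)]
D_l1 = [[l1(p, q) for q in points] for p in points]

# A distance that is not of negative type: x = (1, -2, 1) is a witness.
D_bad = [[0, 1, 9], [1, 0, 1], [9, 1, 0]]
print(looks_negative_type(D_l1), quad_form(D_bad, [1, -2, 1]))
```

An exact test would instead check that the quadratic form is negative semidefinite on the zero-sum subspace, for example via an eigenvalue computation.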
Our algorithm is based on techniques from geometric algorithms, such as metric embeddings and convex optimization. We show that one can compute a fractional solution of the usually non-convex relaxation of the problem, which yields an upper bound on the optimum integer solution. Starting from this fractional solution, we employ a deterministic rounding approach that incurs only a small loss in the objective, thus leading to a PTAS. This technique can be applied to other previously studied variants of the max-sum dispersion function, including combinations of diversity with linear-score maximization, improving over the previous constant-factor approximation algorithms.
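One way to see why the negative-type assumption makes the relaxation tractable is the short derivation below. It is a sketch under assumed notation, not necessarily the paper's exact formulation: D is the symmetric distance matrix, g(x) = (1/2) x^T D x is the relaxed objective, and x, y are fractional points with the same coordinate sum k, as is the case for points of a matroid base polytope of rank k.

```latex
% Assumed notation: D symmetric distance matrix of negative type,
% g(x) = \tfrac12 x^\top D x, and \mathbf{1}^\top x = \mathbf{1}^\top y = k.
\begin{align*}
  z &:= x - y, \qquad \mathbf{1}^\top z = 0
      \;\Longrightarrow\; z^\top D z \le 0 \quad \text{(negative type)}, \\
  g\bigl(\lambda x + (1-\lambda) y\bigr)
    &= \lambda\, g(x) + (1-\lambda)\, g(y)
       - \tfrac12\, \lambda (1-\lambda)\, z^\top D z \\
    &\ge \lambda\, g(x) + (1-\lambda)\, g(y),
       \qquad \lambda \in [0,1].
\end{align*}
```

Hence g is concave on every hyperplane of constant coordinate sum, so maximizing it over a polytope contained in such a hyperplane is a convex program, even though g need not be concave on the whole space.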
Geometric Dispersion
Embeddings
Approximation Algorithms
Convex Programming
Matroids
26:1-26:14
Regular Paper
Alfonso Cevallos
Friedrich Eisenbrand
Rico Zenklusen
10.4230/LIPIcs.SoCG.2016.26
Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 32-40. ACM, 2013.
A. A. Ageev and M. I. Sviridenko. Pipage rounding: a new method of constructing algorithms with proven performance guarantee. Journal of Combinatorial Optimization, 8(3):307-328, 2004.
N. Alon, S. Arora, R. Manokaran, D. Moshkovitz, and O. Weinstein. Inapproximability of densest k-subgraph from average case hardness. Unpublished manuscript, 2011.
S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In Proceedings of the 20th international conference on World wide web, pages 317-326. ACM, 2011.
B. Birnbaum and K. J. Goldman. An improved analysis for a greedy remote-clique algorithm using factor-revealing LPs. Algorithmica, 55(1):42-59, 2009.
L. M. Blumenthal. Theory and Applications of Distance Geometry, volume 347. Oxford, 1953.
A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In Proceedings of the 31st Symposium on Principles of Database Systems, pages 155-166. ACM, 2012.
G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740-1766, 2011.
M. S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pages 380-388. ACM, 2002.
C. Chekuri, J. Vondrák, and R. Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science, pages 575-584, 2010.
M. M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer-Verlag, Berlin, 1997.
M. M. Deza and H. Maehara. Metric transforms and Euclidean embeddings. Transactions of the American Mathematical Society, 317(2):661-671, 1990.
S. P. Fekete and H. Meijer. Maximum dispersion and geometric maximum weight cliques. Algorithmica, 38(3):501-511, 2004.
M. Fréchet. Les dimensions d'un ensemble abstrait. Mathematische Annalen, 68(2):145-168, 1910.
F. R. Giles. Submodular Functions, Graphs and Integer Polyhedra. PhD thesis, University of Waterloo, 1975.
S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web, pages 381-390. ACM, 2009.
J. C. Gower and P. Legendre. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1):5-48, 1986.
M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, 1988.
R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Operations Research Letters, 21(3):133-137, 1997.
L. G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademii Nauk SSSR, 244:1093-1097, 1979.
M. K. Kozlov, S. P. Tarasov, and L. G. Khachiyan. The polynomial solvability of convex quadratic programming. USSR Computational Mathematics and Mathematical Physics, 20(5):223-228, 1980.
Q. Lv, M. Charikar, and K. Li. Image similarity search with compact data structures. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, pages 208-217. ACM, 2004.
K. Makarychev, W. Schudy, and M. Sviridenko. Concentration inequalities for nonlinear matroid intersection. Random Structures & Algorithms, 46(3):541-571, 2015.
C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
J. Matoušek. Lecture notes on metric embeddings. https://kam.mff.cuni.cz/~matousek/ba-a4.pdf, 2013.
E. Pekalska and R. P. W. Duin. The Dissimilarity Representation for Pattern Recognition: Foundations And Applications (Machine Perception and Artificial Intelligence). World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2005.
P. Raghavan and C. D. Thompson. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365-374, 1987.
S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299-310, 1994.
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill Computer Science Series, 1983.
I. J. Schoenberg. Metric spaces and completely monotone functions. Annals of Mathematics, pages 811-841, 1938.
I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44(3):522-536, 1938.
A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency, volume 24. Springer Science & Business Media, 2003.
M. Skutella. Convex quadratic and semidefinite programming relaxations in scheduling. Journal of the ACM, 48(2):206-242, 2001.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode