A Bi-Criteria Approximation Algorithm for k-Means

Makarychev, Konstantin; Makarychev, Yury; Sviridenko, Maxim; Ward, Justin

doi:10.4230/LIPIcs.APPROX-RANDOM.2016.14

Subject Classification

Keywords

k-means clustering
bicriteria approximation algorithms
linear programming
local search

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

We consider the classical k-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output beta*k > k clusters, and must produce a clustering with cost at most alpha times the to the cost of the optimal set of k clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee alpha(beta) depending on the number beta*k of clusters that may be opened. Our guarantee alpha(beta) is always at most 9 + epsilon and improves rapidly with beta (for example: alpha(2) < 2.59, and alpha(3) < 1.4). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.

Cite As Get BibTex

Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. A Bi-Criteria Approximation Algorithm for k-Means. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 14:1-14:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2016.14

Author Details

Konstantin Makarychev

Yury Makarychev

Maxim Sviridenko

Justin Ward

References

Alexander A. Ageev and Maxim Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. of Combinatorial Optimization, 8(3):307-328, 2004.
Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In Proc. of the 12th International Workshop APPROX, pages 15-28, 2009.
D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. In Machine Learning, volume 75, pages 245-249, 2009.
Aris Anagnostopoulos, Anirban Dasgupta, and Ravi Kumar. A constant-factor approximation algorithm for co-clustering. In Theory of Computing, volume 8(1), pages 597-622, 2012.
D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027-1035, 2007.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. on Computing, 33(3):544-562, 2004.
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proc. of the 31st International Symposium on Computational Geometry, pages 754-767, 2015.
Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In Proc. of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 737-756, 2015. Updated version: URL: http://arxiv.org/abs/1406.2951.
Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. on Computing, 34(4):803-824, 2005.
Fabian A. Chudak and David B. Shmoys. Improved approximation algorithms for the uncapacitated facility location problem. SIAM J. on Computing, 33(1):1-25, 2003.
Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proc. of the 47th Annual ACM on Symposium on Theory of Computing, pages 163-172, 2015.
Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. CoRR, abs/1603.09535, 2016. URL: http://arxiv.org/abs/1603.09535.
S. Dasgupta. The hardness of k-means clustering. In Technical Report CS2007-0890, University of California, San Diego., 2007.
Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proc. of the 23rd Annual Symposium on Computational Geometry, pages 11-18, 2007.
Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. CoRR, abs/1603.08976, 2016. URL: http://arxiv.org/abs/1603.08976.
Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract). In Proc. of the 10th Annual Symposium on Computational Geometry, pages 332-339, 1994.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274-296, March 2001.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Angela Y. Wu, and Christine D. Piatko. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2-3):89-112, 2004.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Proc. of the 45th Annual ACM Symposium on Theory of Computing, pages 901-910, 2013.
Jyh-Han Lin and Jeffrey Scott Vitter. Approximation algorithms for geometric median problems. Information Processing Lett., 44(5):245-249, December 1992.
Jyh-Han Lin and Jeffrey Scott Vitter. ε-approximations with minimum packing constraint violation. In Proc. of the 24th Annual ACM Symposium on Theory of Computing, pages 771-782, 1992.
S. Lloyd. Least squares quantization in PCM. Technical report, Bell Laboratories, 1957.
S. Lloyd. Least squares quantization in PCM. IEEE Trans. on Information Theory, 28(2):129-137, Mar 1982.
Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. In Proc. of the 3rd International Workshop on Algorithms and Computation, pages 274-285, 2009.
Jirı Matoušek. On approximate geometric k-clustering. Discrete &Computational Geometry, 24(1):61-84, 2000.
R. Ostrovsky, Y. Rabani, L. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In Proc. of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 165-174, 2006.
Maxim Sviridenko. An improved approximation algorithm for the metric uncapacitated facility location problem. In Integer Programming and Combinatorial Optimization: Proc. of the 9th IPCO Conference, pages 240-257, 2002.

A Bi-Criteria Approximation Algorithm for k-Means

Authors Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, Justin Ward

File

Document Identifiers

Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message