A Bi-Criteria Approximation Algorithm for k-Means

Authors Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, Justin Ward



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2016.14.pdf
  • Filesize: 0.55 MB
  • 20 pages

Document Identifiers

Author Details

Konstantin Makarychev
Yury Makarychev
Maxim Sviridenko
Justin Ward

Cite As Get BibTex

Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. A Bi-Criteria Approximation Algorithm for k-Means. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 14:1-14:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2016.14

Abstract

We consider the classical k-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output beta*k > k clusters, and must produce a clustering with cost at most alpha times the to the cost of the optimal set of k clusters.  We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor.

We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee alpha(beta) depending on the number beta*k of clusters that may be opened.  Our guarantee alpha(beta) is always at most 9 + epsilon and improves rapidly with beta (for example: alpha(2) < 2.59, and alpha(3) < 1.4).  Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.

Subject Classification

Keywords
  • k-means clustering
  • bicriteria approximation algorithms
  • linear programming
  • local search

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexander A. Ageev and Maxim Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. of Combinatorial Optimization, 8(3):307-328, 2004. Google Scholar
  2. Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In Proc. of the 12th International Workshop APPROX, pages 15-28, 2009. Google Scholar
  3. D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. In Machine Learning, volume 75, pages 245-249, 2009. Google Scholar
  4. Aris Anagnostopoulos, Anirban Dasgupta, and Ravi Kumar. A constant-factor approximation algorithm for co-clustering. In Theory of Computing, volume 8(1), pages 597-622, 2012. Google Scholar
  5. D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027-1035, 2007. Google Scholar
  6. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. on Computing, 33(3):544-562, 2004. Google Scholar
  7. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proc. of the 31st International Symposium on Computational Geometry, pages 754-767, 2015. Google Scholar
  8. Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In Proc. of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 737-756, 2015. Updated version: URL: http://arxiv.org/abs/1406.2951.
  9. Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. on Computing, 34(4):803-824, 2005. Google Scholar
  10. Fabian A. Chudak and David B. Shmoys. Improved approximation algorithms for the uncapacitated facility location problem. SIAM J. on Computing, 33(1):1-25, 2003. Google Scholar
  11. Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proc. of the 47th Annual ACM on Symposium on Theory of Computing, pages 163-172, 2015. Google Scholar
  12. Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. CoRR, abs/1603.09535, 2016. URL: http://arxiv.org/abs/1603.09535.
  13. S. Dasgupta. The hardness of k-means clustering. In Technical Report CS2007-0890, University of California, San Diego., 2007. Google Scholar
  14. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proc. of the 23rd Annual Symposium on Computational Geometry, pages 11-18, 2007. Google Scholar
  15. Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. CoRR, abs/1603.08976, 2016. URL: http://arxiv.org/abs/1603.08976.
  16. Mary Inaba, Naoki Katoh, and Hiroshi Imai. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract). In Proc. of the 10th Annual Symposium on Computational Geometry, pages 332-339, 1994. Google Scholar
  17. Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274-296, March 2001. Google Scholar
  18. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Angela Y. Wu, and Christine D. Piatko. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2-3):89-112, 2004. Google Scholar
  19. Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Proc. of the 45th Annual ACM Symposium on Theory of Computing, pages 901-910, 2013. Google Scholar
  20. Jyh-Han Lin and Jeffrey Scott Vitter. Approximation algorithms for geometric median problems. Information Processing Lett., 44(5):245-249, December 1992. Google Scholar
  21. Jyh-Han Lin and Jeffrey Scott Vitter. ε-approximations with minimum packing constraint violation. In Proc. of the 24th Annual ACM Symposium on Theory of Computing, pages 771-782, 1992. Google Scholar
  22. S. Lloyd. Least squares quantization in PCM. Technical report, Bell Laboratories, 1957. Google Scholar
  23. S. Lloyd. Least squares quantization in PCM. IEEE Trans. on Information Theory, 28(2):129-137, Mar 1982. Google Scholar
  24. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. In Proc. of the 3rd International Workshop on Algorithms and Computation, pages 274-285, 2009. Google Scholar
  25. Jirı Matoušek. On approximate geometric k-clustering. Discrete &Computational Geometry, 24(1):61-84, 2000. Google Scholar
  26. R. Ostrovsky, Y. Rabani, L. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In Proc. of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 165-174, 2006. Google Scholar
  27. Maxim Sviridenko. An improved approximation algorithm for the metric uncapacitated facility location problem. In Integer Programming and Combinatorial Optimization: Proc. of the 9th IPCO Conference, pages 240-257, 2002. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail