The Hardness of Approximation of Euclidean k-Means

Authors Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, Ali Kemal Sinop



PDF
Thumbnail PDF

File

LIPIcs.SOCG.2015.754.pdf
  • Filesize: 0.51 MB
  • 14 pages

Document Identifiers

Author Details

Pranjal Awasthi
Moses Charikar
Ravishankar Krishnaswamy
Ali Kemal Sinop

Cite As Get BibTex

Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 754-767, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.SOCG.2015.754

Abstract

The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d.

In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Subject Classification

Keywords
  • Euclidean k-means
  • Hardness of Approximation
  • Vertex Cover

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245-248, 2009. Google Scholar
  2. Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509-516, 1992. Google Scholar
  3. Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley, 1992. Google Scholar
  4. Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 106-113, 1998. Google Scholar
  5. David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 1027-1035, 2007. Google Scholar
  6. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544-562, 2004. Google Scholar
  7. Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 250-257, 2002. Google Scholar
  8. Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 1068-1077, 2009. Google Scholar
  9. Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. CoRR, abs/1406.2951, 2014. Google Scholar
  10. Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129-149, 2002. Google Scholar
  11. Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, University of California, San Diego, 2008. Google Scholar
  12. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22(1):60-65, 2003. Google Scholar
  13. Wenceslas Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, pages 50-58, 2003. Google Scholar
  14. Irit Dinur and Samuel Safra. On the hardness of approximating minimum vertex cover. Annals of Mathematics, 162(1):439-485, 2005. Google Scholar
  15. Tomás Feder and Daniel H. Greene. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 434-444, 1988. Google Scholar
  16. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-8, 2007, pages 11-18, 2007. Google Scholar
  17. Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 537-538, 2003. Google Scholar
  18. Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. J. ACM, 33(3):533-550, 1986. Google Scholar
  19. Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731-740, 2002. Google Scholar
  20. Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274-296, 2001. Google Scholar
  21. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89-112, 2004. Google Scholar
  22. Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-ε. Journal of Computer and System Sciences, 74(3):335-349, 2008. Google Scholar
  23. Stavros G. Kolliopoulos and Satish Rao. A nearly linear-time approximation scheme for the Euclidean k-median problem. SIAM J. Comput., 37(3):757-782, 2007. Google Scholar
  24. Guy Kortsarz, Michael Langberg, and Zeev Nutov. Approximating maximum subgraphs without short cycles. SIAM J. Discrete Math., 24(1):255-269, 2010. Google Scholar
  25. Amit Kumar and Ravindran Kannan. Clustering with spectral norm and the k-means algorithm. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 299-308, 2010. Google Scholar
  26. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1+́ε)-approximation algorithm for k-means clustering in any dimensions. In 45th Symposium on Foundations of Computer Science (FOCS 2004), 17-19 October 2004, Rome, Italy, Proceedings, pages 454-462, 2004. Google Scholar
  27. Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 901-910, 2013. Google Scholar
  28. Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-136, 1982. Google Scholar
  29. Alexander Lubotzky, Ralph Phillips, and Peter Sarnak. Ramanujan graphs. Combinatorica, 8(3):261-277, 1988. Google Scholar
  30. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Varadarajan. The planar k-means problem is NP-hard. Theor. Comput. Sci., 442:13-21, 2012. Google Scholar
  31. Jiri Matoušek. On approximate geometric k-clustering. Discrete and Computational Geometry, 24(1), 2000. Google Scholar
  32. Moshe Morgenstern. Existence and explicit constructions of q + 1 regular ramanujan graphs for every prime power q. J. Comb. Theory, Ser. B, 62(1):44-62, 1994. Google Scholar
  33. Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of lloyd-type methods for the k-means problem. J. ACM, 59(6):28, 2012. Google Scholar
  34. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus F. M. Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. Top 10 algorithms in data mining. Knowl. Inf. Syst., 14(1):1-37, 2008. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail