The Hardness of Approximation of Euclidean k-Means

Awasthi, Pranjal; Charikar, Moses; Krishnaswamy, Ravishankar; Sinop, Ali Kemal

doi:10.4230/LIPIcs.SOCG.2015.754

File

Author Details

Pranjal Awasthi

Moses Charikar

Ravishankar Krishnaswamy

Ali Kemal Sinop

Cite AsGet BibTex

Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 754-767, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)
https://doi.org/10.4230/LIPIcs.SOCG.2015.754

Abstract

The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245-248, 2009.
Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509-516, 1992.
Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley, 1992.
Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 106-113, 1998.
David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 1027-1035, 2007.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544-562, 2004.
Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 250-257, 2002.
Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 1068-1077, 2009.
Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. CoRR, abs/1406.2951, 2014.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129-149, 2002.
Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, University of California, San Diego, 2008.
Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22(1):60-65, 2003.
Wenceslas Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, pages 50-58, 2003.
Irit Dinur and Samuel Safra. On the hardness of approximating minimum vertex cover. Annals of Mathematics, 162(1):439-485, 2005.
Tomás Feder and Daniel H. Greene. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA, pages 434-444, 1988.
Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-8, 2007, pages 11-18, 2007.
Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 537-538, 2003.
Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. J. ACM, 33(3):533-550, 1986.
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731-740, 2002.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274-296, 2001.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89-112, 2004.
Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-ε. Journal of Computer and System Sciences, 74(3):335-349, 2008.
Stavros G. Kolliopoulos and Satish Rao. A nearly linear-time approximation scheme for the Euclidean k-median problem. SIAM J. Comput., 37(3):757-782, 2007.
Guy Kortsarz, Michael Langberg, and Zeev Nutov. Approximating maximum subgraphs without short cycles. SIAM J. Discrete Math., 24(1):255-269, 2010.
Amit Kumar and Ravindran Kannan. Clustering with spectral norm and the k-means algorithm. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 299-308, 2010.
Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1+́ε)-approximation algorithm for k-means clustering in any dimensions. In 45th Symposium on Foundations of Computer Science (FOCS 2004), 17-19 October 2004, Rome, Italy, Proceedings, pages 454-462, 2004.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 901-910, 2013.
Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-136, 1982.
Alexander Lubotzky, Ralph Phillips, and Peter Sarnak. Ramanujan graphs. Combinatorica, 8(3):261-277, 1988.
Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Varadarajan. The planar k-means problem is NP-hard. Theor. Comput. Sci., 442:13-21, 2012.
Jiri Matoušek. On approximate geometric k-clustering. Discrete and Computational Geometry, 24(1), 2000.
Moshe Morgenstern. Existence and explicit constructions of q + 1 regular ramanujan graphs for every prime power q. J. Comb. Theory, Ser. B, 62(1):44-62, 1994.
Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of lloyd-type methods for the k-means problem. J. ACM, 59(6):28, 2012.
Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus F. M. Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. Top 10 algorithms in data mining. Knowl. Inf. Syst., 14(1):1-37, 2008.

The Hardness of Approximation of Euclidean k-Means

Authors Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, Ali Kemal Sinop

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Keywords

Metrics

References

Thanks for your feedback!

Could not send message