Abstract
The Euclidean kmeans problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NPhardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d.
In this paper we provide the first hardness of approximation for the Euclidean kmeans problem. Concretely, we show that there exists a constant c > 0 such that it is NPhard to approximate the kmeans objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on trianglefree graphs: given a trianglefree graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to trianglefree graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.
BibTeX  Entry
@InProceedings{awasthi_et_al:LIPIcs:2015:5117,
author = {Pranjal Awasthi and Moses Charikar and Ravishankar Krishnaswamy and Ali Kemal Sinop},
title = {{The Hardness of Approximation of Euclidean kMeans}},
booktitle = {31st International Symposium on Computational Geometry (SoCG 2015)},
pages = {754767},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783939897835},
ISSN = {18688969},
year = {2015},
volume = {34},
editor = {Lars Arge and J{\'a}nos Pach},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2015/5117},
URN = {urn:nbn:de:0030drops51178},
doi = {10.4230/LIPIcs.SOCG.2015.754},
annote = {Keywords: Euclidean kmeans, Hardness of Approximation, Vertex Cover}
}
Keywords: 

Euclidean kmeans, Hardness of Approximation, Vertex Cover 
Collection: 

31st International Symposium on Computational Geometry (SoCG 2015) 
Issue Date: 

2015 
Date of publication: 

12.06.2015 