Clustering problems arise frequently in fields such as data mining and machine learning. Clustering usually refers to the task of partitioning a collection of objects into groups of similar elements, with respect to a similarity (or dissimilarity) measure. Among clustering problems, k-means clustering in particular has received much attention from researchers. Although k-means is a well-studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is 9+epsilon.

In this paper, we consider the following variant of k-means. Given a set C of points in R^d and a real number f > 0, find a finite set F of points in R^d that minimizes the quantity f*|F| + sum_{p in C} min_{q in F} ||p-q||^2. For any fixed dimension d, we design a PTAS for this problem that is based on local search. We also give a "bi-criterion" local search algorithm for k-means which uses (1+epsilon)k centers and yields a solution whose cost is at most (1+epsilon) times the cost of an optimal k-means solution. The algorithm runs in polynomial time for any fixed dimension.

The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, obtaining a near-optimal approximation bound. This leads us towards a better understanding of the k-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important because little is known about the local search method for geometric approximation.