Bandyapadhyay, Sayan; Varadarajan, Kasturi
On Variants of k-means Clustering
Abstract
Clustering problems often arise in fields like data mining and machine learning. Clustering usually refers to the task of partitioning a collection of objects into groups of similar elements, with respect to a similarity (or dissimilarity) measure. Among the clustering problems, k-means clustering in particular has received much attention from researchers. Although k-means is a well-studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound achievable in polynomial time is 9+epsilon.
In this paper, we consider the following variant of k-means. Given a set C of points in R^d and a real f > 0, find a finite set F of points in R^d that minimizes the quantity f*|F| + sum_{p in C} min_{q in F} ||p-q||^2. For any fixed dimension d, we design a PTAS for this problem that is based on local search. We also give a "bi-criterion" local search algorithm for k-means which uses (1+epsilon)k centers and yields a solution whose cost is at most (1+epsilon) times the cost of an optimal k-means solution. The algorithm runs in polynomial time for any fixed dimension.
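To make the objective concrete, the following is a minimal Python sketch of the cost f*|F| + sum_{p in C} min_{q in F} ||p-q||^2 together with a very simple local search. The abstract does not spell out the neighborhood used by the paper's algorithm, so everything here is an assumption for illustration: the names `sq_dist`, `cost`, and `local_search` are hypothetical, candidate centers are restricted to a finite list (the input points by default), and each step adds, drops, or swaps a single center. This simplified version carries none of the paper's approximation guarantees.

```python
def sq_dist(p, q):
    """Squared Euclidean distance ||p-q||^2 between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(C, F, f):
    """Objective from the abstract: f*|F| + sum_{p in C} min_{q in F} ||p-q||^2."""
    if not F:
        return float("inf")  # with no centers, no point can be served
    return f * len(F) + sum(min(sq_dist(p, q) for q in F) for p in C)


def local_search(C, f, candidates=None):
    """Illustrative single-move local search (an assumption, not the paper's method).

    Starting from one center, repeatedly apply the first add, drop, or swap of a
    single center that strictly lowers the cost; stop when no such move exists.
    """
    candidates = list(candidates if candidates is not None else C)
    F = [candidates[0]]
    best = cost(C, F, f)
    improved = True
    while improved:
        improved = False
        unused = [c for c in candidates if c not in F]
        neighbors = [F + [c] for c in unused]              # add one center
        for i in range(len(F)):
            if len(F) > 1:
                neighbors.append(F[:i] + F[i + 1:])        # drop one center
            for c in unused:
                neighbors.append(F[:i] + [c] + F[i + 1:])  # swap one center
        for N in neighbors:
            value = cost(C, N, f)
            if value < best - 1e-12:  # accept only strict improvements
                F, best, improved = N, value, True
                break
    return F, best


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 0), (10, 1)]
    centers, value = local_search(points, f=1)
    print(centers, value)
```

The per-center charge f is what lets the number of centers vary: opening another center is worthwhile only when it reduces the sum of squared distances by more than f.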
The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, obtaining a near-optimal approximation bound. This leads us towards a better understanding of the k-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that little is known about the local search method for geometric approximation.
BibTeX Entry
@InProceedings{bandyapadhyay_et_al:LIPIcs:2016:5906,
author = {Sayan Bandyapadhyay and Kasturi Varadarajan},
title = {{On Variants of k-means Clustering}},
booktitle = {32nd International Symposium on Computational Geometry (SoCG 2016)},
pages = {14:1--14:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-009-5},
ISSN = {1868-8969},
year = {2016},
volume = {51},
editor = {S{\'a}ndor Fekete and Anna Lubiw},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/5906},
URN = {urn:nbn:de:0030-drops-59061},
doi = {10.4230/LIPIcs.SoCG.2016.14},
annote = {Keywords: k-means, Facility location, Local search, Geometric approximation}
}
Keywords: k-means, Facility location, Local search, Geometric approximation
Seminar: 32nd International Symposium on Computational Geometry (SoCG 2016)
Issue date: 2016
Date of publication: 2016