Since its introduction in 1957, Lloyd’s algorithm for k-means clustering has been extensively studied and has undergone several improvements. While in its original form it does not guarantee any approximation factor at all, Arthur and Vassilvitskii (SODA 2007) proposed k-means++ which enhances Lloyd’s algorithm by a seeding method which guarantees a 𝒪(log k)-approximation in expectation. More recently, Lattanzi and Sohler (ICML 2019) proposed LS++ which further improves the solution quality of k-means++ by local search techniques to obtain a 𝒪(1)-approximation. On the practical side, the greedy variant of k-means++ is often used although its worst-case behaviour is provably worse than for the standard k-means++ variant.

We investigate how to improve LS++ further in practice. We study two options for improving the practical performance: (a) Combining LS++ with greedy k-means++ instead of k-means++, and (b) Improving LS++ by better entangling it with Lloyd’s algorithm. Option (a) worsens the theoretical guarantees of k-means++ but improves the practical quality also in combination with LS++ as we confirm in our experiments. Option (b) is our new algorithm, Foresight LS++. We experimentally show that FLS++ improves upon the solution quality of LS++. It retains its asymptotic runtime and its worst-case approximation bounds.