We consider the Min-Sum k-Clustering (k-MSC) problem. Given a set of points in a metric which is represented by an edge-weighted graph G = (V, E) and a parameter k, the goal is to partition the points V into k clusters such that the sum of distances between all pairs of the points within the same cluster is minimized.

The k-MSC problem is known to be APX-hard on general metrics. The best known approximation algorithms for the problem obtained by Behsaz, Friggstad, Salavatipour and Sivakumar [Algorithmica 2019] achieve an approximation ratio of O(log |V|) in polynomial time for general metrics and an approximation ratio 2+ε in quasi-polynomial time for metrics with bounded doubling dimension. No approximation schemes for k-MSC (when k is part of the input) is known for any non-trivial metrics prior to our work. In fact, most of the previous works rely on the simple fact that there is a 2-approximate reduction from k-MSC to the balanced k-median problem and design approximation algorithms for the latter to obtain an approximation for k-MSC.

In this paper, we obtain the first Quasi-Polynomial Time Approximation Schemes (QPTAS) for the problem on metrics induced by graphs of bounded treewidth, graphs of bounded highway dimension, graphs of bounded doubling dimensions (including fixed dimensional Euclidean metrics), and planar and minor-free graphs. We bypass the barrier of 2 for k-MSC by introducing a new clustering problem, which we call min-hub clustering, which is a generalization of balanced k-median and is a trade off between center-based clustering problems (such as balanced k-median) and pair-wise clustering (such as Min-Sum k-clustering). We then show how one can find approximation schemes for Min-hub clustering on certain classes of metrics.