Space Complexity of Euclidean Clustering

Authors: Xiaoyi Zhu, Yuxiang Tian, Lingxiao Huang, Zengfeng Huang

Author Details

Xiaoyi Zhu
  • School of Data Science, Fudan University, Shanghai, China
Yuxiang Tian
  • School of Data Science, Fudan University, Shanghai, China
Lingxiao Huang
  • State Key Laboratory of Novel Software Technology, Nanjing University, China
Zengfeng Huang
  • School of Data Science, Fudan University, Shanghai, China

Xiaoyi Zhu, Yuxiang Tian, Lingxiao Huang, and Zengfeng Huang. Space Complexity of Euclidean Clustering. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 82:1-82:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


The (k, z)-Clustering problem in Euclidean space ℝ^d has been extensively studied. Given the scale of data involved, compression methods for the Euclidean (k, z)-Clustering problem, such as data compression and dimension reduction, have received significant attention in the literature. However, the space complexity of the clustering problem, specifically the number of bits required to compress the cost function within a multiplicative error ε, remains unclear in the existing literature.
This paper initiates the study of space complexity for Euclidean (k, z)-Clustering and offers both upper and lower bounds. Our space bounds are nearly tight when k is constant, indicating that storing a coreset, a well-known data compression approach, serves as the optimal compression scheme. Furthermore, our lower bound result for (k, z)-Clustering establishes a tight space bound of Θ(n d) for terminal embedding, where n represents the dataset size. Our technical approach leverages new geometric insights for principal angles and discrepancy methods, which may hold independent interest.
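As a rough illustration of the objects the abstract refers to, the sketch below computes a (k, z)-clustering cost and checks whether a weighted summary reproduces it within a multiplicative error ε for one center set. It assumes the standard definition of the cost (the sum, over data points, of the distance to the nearest center raised to the power z), which this page does not spell out. The function names `clustering_cost` and `within_multiplicative_error` are hypothetical and not taken from the paper. A coreset must satisfy the guarantee for every set of k centers, so a check like this can only refute a candidate summary, not certify it.

```python
import math

def clustering_cost(points, centers, z=2, weights=None):
    """Assumed standard (k, z)-clustering cost:
    sum over points of weight * (distance to nearest center) ** z."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(math.dist(p, c) for c in centers)
        total += w * nearest ** z
    return total

def within_multiplicative_error(approx, exact, eps):
    """True if approx lies in [(1 - eps) * exact, (1 + eps) * exact]."""
    return abs(approx - exact) <= eps * exact

# Toy data: four points, two of them duplicated.
points = [(0.0, 0.0), (0.0, 0.0), (2.0, 0.0), (2.0, 0.0)]

# A weighted summary that merges the duplicates (illustrative only).
summary = [(0.0, 0.0), (2.0, 0.0)]
summary_weights = [2.0, 2.0]

centers = [(1.0, 0.0)]  # one candidate center set with k = 1
exact = clustering_cost(points, centers, z=2)
approx = clustering_cost(summary, centers, z=2, weights=summary_weights)
ok = within_multiplicative_error(approx, exact, eps=0.1)
```

In this toy case the summary is exact because it only merges duplicate points. The space-complexity question in the abstract concerns how many bits any such summary needs when the guarantee must hold for all center sets simultaneously.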

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
  • Theory of computation → Facility location and clustering
  • Theory of computation → Data compression

Keywords
  • Space complexity
  • Euclidean clustering
  • coreset
  • terminal embedding


