Linear-Time Approximation Scheme for k-Means Clustering of Axis-Parallel Affine Subspaces

Cho, Kyungjin; Oh, Eunjin

doi:10.4230/LIPIcs.ISAAC.2021.46

Author Details

Kyungjin Cho

POSTECH, Pohang, South Korea

Eunjin Oh

POSTECH, Pohang, South Korea

Cite As

Kyungjin Cho and Eunjin Oh. Linear-Time Approximation Scheme for k-Means Clustering of Axis-Parallel Affine Subspaces. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 46:1-46:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.46

Abstract

In this paper, we present a linear-time approximation scheme for k-means clustering of incomplete data points in d-dimensional Euclidean space. An incomplete data point with Δ > 0 unspecified entries is represented as an axis-parallel affine subspace of dimension Δ. The distance between two incomplete data points is defined as the Euclidean distance between the two closest points of the axis-parallel affine subspaces corresponding to the data points. We present an algorithm for k-means clustering of axis-parallel affine subspaces of dimension Δ that yields a (1+ε)-approximate solution in O(nd) time. The constants hidden in the O(⋅) notation depend only on Δ, ε, and k. This improves on the O(n²d)-time algorithm by Eiben et al. [SODA'21] by a factor of n.
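To make the distance definition concrete, here is a minimal Python sketch. It assumes an incomplete point is stored as a length-d list with None marking each of its Δ unspecified entries, and (as an illustrative assumption) that cluster centers are complete points in R^d. Because every unspecified coordinate can be chosen freely, the closest pair of points agrees on any coordinate missing from either input, so only coordinates specified in both contribute. The helper names are hypothetical, and the code only evaluates the k-means objective for a given set of centers in O(nkd) time; it is not the authors' approximation scheme.

```python
from math import sqrt
from typing import Optional, Sequence

# Hypothetical representation: None marks an unspecified entry (a free axis).
IncompletePoint = Sequence[Optional[float]]


def squared_subspace_distance(p: IncompletePoint, q: IncompletePoint) -> float:
    """Squared distance between the closest points of two axis-parallel
    affine subspaces. A coordinate unspecified in either point can be set
    to agree, so only coordinates specified in both points contribute."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension d")
    total = 0.0
    for a, b in zip(p, q):
        if a is not None and b is not None:
            total += (a - b) ** 2
    return total


def subspace_distance(p: IncompletePoint, q: IncompletePoint) -> float:
    """Euclidean distance between the two closest points of the subspaces."""
    return sqrt(squared_subspace_distance(p, q))


def kmeans_cost(points: Sequence[IncompletePoint],
                centers: Sequence[IncompletePoint]) -> float:
    """k-means objective: sum over all points of the squared distance to the
    nearest center. Runs in O(n * k * d) time for n points and k centers."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(squared_subspace_distance(p, c) for c in centers)
               for p in points)
```

For example, [1.0, None, 3.0] and [4.0, 5.0, None] are both specified only in the first coordinate, so their distance is |1 - 4| = 3. Working with squared distances in `kmeans_cost` avoids taking a square root and then squaring it again.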

Subject Classification

ACM Subject Classification

Theory of computation → Computational geometry

Keywords

k-means clustering
affine subspaces

References

Marcel R. Ackermann, Johannes Blömer, and Christian Sohler. Clustering for metric and nonmetric distance measures. ACM Transactions on Algorithms, 6(4), September 2010.
P. D. Allison. Missing Data. Number 136 in Quantitative Applications in the Social Sciences. SAGE Publications, 2001. URL: https://books.google.co.kr/books?id=ZtYArHXjpB8C.
Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245-248, 2009.
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG 2015), 2015.
Ke Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009.
Kyungjin Cho and Eunjin Oh. Linear-time approximation scheme for k-means clustering of affine subspaces. CoRR, abs/2106.14176, 2021. URL: http://arxiv.org/abs/2106.14176.
Eduard Eiben, Fedor V. Fomin, Petr A. Golovach, William Lochet, Fahad Panolan, and Kirill Simonov. EPTAS for k-means clustering of affine subspaces. In Proceedings of the Thirty-Second ACM-SIAM Symposium on Discrete Algorithms (SODA 2021), pages 2649-2659, 2021.
Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC 2011), pages 569-578, 2011.
Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. Improved algorithms for clustering with outliers. In Proceedings of the 30th International Symposium on Algorithms and Computation (ISAAC 2019), pages 61:1-61:12, 2019.
Jie Gao, Michael Langberg, and Leonard J. Schulman. Analysis of incomplete data and an intrinsic-dimension Helly theorem. Discrete & Computational Geometry, 40(4):537-560, 2008.
Jie Gao, Michael Langberg, and Leonard J. Schulman. Clustering lines in high-dimensional space: Classification of incomplete data. ACM Trans. Algorithms, 7(1), 2010.
Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3-19, January 2007.
Anil Kumar Jain, M. Narasimha Murty, and Patrick J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, 1999.
Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):1-32, 2010.
Euiwoong Lee and Leonard J. Schulman. Clustering affine subspaces: hardness and algorithms. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2013), pages 810-827, 2013.
Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science, 442:13-21, 2012.
Yair Marom and Dan Feldman. k-means clustering of lines for big data. In Advances in Neural Information Processing Systems, volume 32, 2019.
Nimrod Megiddo. On the complexity of some geometric problems in unbounded dimension. Journal of Symbolic Computation, 10(3):327-334, 1990.
Björn Ommer and Jitendra Malik. Multi-scale object detection by clustering lines. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV 2009), pages 484-491, 2009.
