Linear-Time Approximation Scheme for k-Means Clustering of Axis-Parallel Affine Subspaces

Authors Kyungjin Cho, Eunjin Oh




File

LIPIcs.ISAAC.2021.46.pdf
  • Filesize: 0.8 MB
  • 13 pages

Author Details

Kyungjin Cho
  • POSTECH, Pohang, South Korea
Eunjin Oh
  • POSTECH, Pohang, South Korea

Cite As

Kyungjin Cho and Eunjin Oh. Linear-Time Approximation Scheme for k-Means Clustering of Axis-Parallel Affine Subspaces. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 46:1-46:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.46

Abstract

In this paper, we present a linear-time approximation scheme for k-means clustering of incomplete data points in d-dimensional Euclidean space. An incomplete data point with Δ > 0 unspecified entries is represented as an axis-parallel affine subspace of dimension Δ. The distance between two incomplete data points is defined as the Euclidean distance between the two closest points in the axis-parallel affine subspaces corresponding to the data points. We present an algorithm for k-means clustering of axis-parallel affine subspaces of dimension Δ that yields a (1+ε)-approximate solution in O(nd) time. The constants hidden behind O(⋅) depend only on Δ, ε and k. This improves the O(n² d)-time algorithm by Eiben et al. [SODA'21] by a factor of n.
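
The distance defined in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it only evaluates the distance between incomplete points and the k-means cost of a given set of centers. The `None` encoding of unspecified entries and the names `distance` and `kmeans_cost` are assumptions made for this illustration. Because an unspecified coordinate can take any value, the two closest points agree on every coordinate that is missing in either input, so only coordinates specified in both contribute to the distance.

```python
import math
from typing import Optional, Sequence

# Hypothetical encoding: None marks an unspecified entry of a data point.
Point = Sequence[Optional[float]]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between the closest points of two axis-parallel
    affine subspaces, each given as a point with possibly missing entries."""
    if len(p) != len(q):
        raise ValueError("both points must have the same dimension d")
    # Only coordinates specified in both points can contribute.
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(p, q) if a is not None and b is not None)
    )


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all data points of the squared distance to the nearest center."""
    return sum(min(distance(p, c) ** 2 for c in centers) for p in points)


if __name__ == "__main__":
    # A point with its second entry unspecified (a line parallel to that axis).
    print(distance((0.0, None, 0.0), (3.0, 5.0, 4.0)))  # 5.0
    print(kmeans_cost([(1.0, None), (None, 3.0)], [(0.0, 0.0), (5.0, 5.0)]))
```

Evaluating the cost this way takes O(nkd) time for n points and k centers; the difficulty addressed by the paper lies in finding near-optimal centers, which this sketch does not attempt.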

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • k-means clustering
  • affine subspaces

References

  1. Marcel R. Ackermann, Johannes Blömer, and Christian Sohler. Clustering for metric and nonmetric distance measures. ACM Transactions on Algorithms, 6(4), September 2010.
  2. P.D. Allison. Missing Data. Number no. 136 in Missing Data. SAGE Publications, 2001. URL: https://books.google.co.kr/books?id=ZtYArHXjpB8C.
  3. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245-248, 2009.
  4. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG 2015), 2015.
  5. Ke Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009.
  6. Kyungjin Cho and Eunjin Oh. Linear-time approximation scheme for k-means clustering of affine subspaces. CoRR, abs/2106.14176, 2021. URL: http://arxiv.org/abs/2106.14176.
  7. Eduard Eiben, Fedor V. Fomin, Petr A. Golovach, William Lochet, Fahad Panolan, and Kirill Simonov. EPTAS for k-means clustering of affine subspaces. In Proceedings of the Thirty-Second ACM-SIAM Symposium on Discrete Algorithms (SODA 2021), pages 2649-2659, 2021.
  8. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC 2011), pages 569-578, 2011.
  9. Qilong Feng, Zhen Zhang, Ziyun Huang, Jinhui Xu, and Jianxin Wang. Improved algorithms for clustering with outliers. In Proceedings of the 30th International Symposium on Algorithms and Computation (ISAAC 2019), pages 61:1-61:12, 2019.
  10. Jie Gao, Michael Langberg, and Leonard J. Schulman. Analysis of incomplete data and an intrinsic-dimension Helly theorem. Discrete & Computational Geometry, 40(4):537-560, 2008.
  11. Jie Gao, Michael Langberg, and Leonard J. Schulman. Clustering lines in high-dimensional space: Classification of incomplete data. ACM Transactions on Algorithms, 7(1), 2010.
  12. Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3-19, January 2007.
  13. Anil Kumar Jain, M. Narasimha Murty, and Patrick J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, 1999.
  14. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):1-32, 2010.
  15. Euiwoong Lee and Leonard J. Schulman. Clustering affine subspaces: hardness and algorithms. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2013), pages 810-827, 2013.
  16. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science, 442:13-21, 2012.
  17. Yair Marom and Dan Feldman. k-means clustering of lines for big data. In Advances in Neural Information Processing Systems, volume 32, 2019.
  18. Nimrod Megiddo. On the complexity of some geometric problems in unbounded dimension. Journal of Symbolic Computation, 10(3):327-334, 1990.
  19. Björn Ommer and Jitendra Malik. Multi-scale object detection by clustering lines. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV 2009), pages 484-491, 2009.