The Complexity of the k-means Method

Authors Tim Roughgarden, Joshua R. Wang



PDF
Thumbnail PDF

File

LIPIcs.ESA.2016.78.pdf
  • Filesize: 498 kB
  • 14 pages

Document Identifiers

Author Details

Tim Roughgarden
Joshua R. Wang

Cite As Get BibTex

Tim Roughgarden and Joshua R. Wang. The Complexity of the k-means Method. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 78:1-78:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ESA.2016.78

Abstract

The k-means method is a widely used technique for clustering points in Euclidean space.  While it is extremely fast in practice, its worst-case running time is exponential in the number of data points. We prove that the k-means method can implicitly solve PSPACE-complete problems, providing a complexity-theoretic explanation for its worst-case running time.  Our result parallels recent work on the complexity of the simplex method for linear programming.

Subject Classification

Keywords
  • k-means
  • PSPACE-complete

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Ilan Adler, Christos Papadimitriou, and Aviad Rubinstein. On simplex pivoting rules and complexity theory. In Integer Programming and Combinatorial Optimization, pages 13-24. Springer, 2014. Google Scholar
  2. Nina Amenta and Gunter M Ziegler. Deformed products and maximal shadows of polytopes. Contemporary Mathematics, 223:57-90, 1999. Google Scholar
  3. David Arthur, Bodo Manthey, and H Roglin. k-means has polynomial smoothed complexity. In Foundations of Computer Science, 2009. FOCS'09. 50th Annual IEEE Symposium on, pages 405-414. IEEE, 2009. Google Scholar
  4. David Arthur and Sergei Vassilvitskii. How slow is the k-means method? In Proceedings of the twenty-second annual symposium on Computational geometry, pages 144-153. ACM, 2006. Google Scholar
  5. David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027-1035. Society for Industrial and Applied Mathematics, 2007. Google Scholar
  6. Pavel Berkhin. A survey of clustering data mining techniques. In Grouping multidimensional data, pages 25-71. Springer, 2006. Google Scholar
  7. Anup Bhattacharya, Ragesh Jaiswal, and Nir Ailon. A tight lower bound instance for k-means++ in constant dimension. In Theory and Applications of Models of Computation, pages 7-22. Springer, 2014. Google Scholar
  8. Tobias Brunsch and Heiko Röglin. A bad instance for k-means++. In Theory and Applications of Models of Computation, pages 344-352. Springer, 2011. Google Scholar
  9. Yann Disser and Martin Skutella. The simplex algorithm is np-mighty. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 858-872. SIAM, 2015. Google Scholar
  10. John Fearnley and Rahul Savani. The complexity of the simplex method. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 201-208. ACM, 2015. Google Scholar
  11. Paul W Goldberg, Christos H Papadimitriou, and Rahul Savani. The complexity of the homotopy method, equilibrium selection, and lemke-howson solutions. ACM Transactions on Economics and Computation, 1(2):9, 2013. Google Scholar
  12. David S Johnson, Christos H Papadimitriou, and Mihalis Yannakakis. How easy is local search? Journal of computer and system sciences, 37(1):79-100, 1988. Google Scholar
  13. Tapas Kanungo, David M Mount, Nathan S Netanyahu, Christine D Piatko, Ruth Silverman, and Angela Y Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the eighteenth annual symposium on Computational geometry, pages 10-18. ACM, 2002. Google Scholar
  14. Victor Klee and George J Minty. How good is the simplex algorithm. Technical report, DTIC Document, 1970. Google Scholar
  15. Stuart P Lloyd. Least squares quantization in pcm. Information Theory, IEEE Transactions on, 28(2):129-137, 1982. Google Scholar
  16. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is np-hard. In WALCOM: Algorithms and Computation, pages 274-285. Springer, 2009. Google Scholar
  17. Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385-463, 2004. Google Scholar
  18. Andrea Vattani. K-means requires exponentially many iterations even in the plane. Discrete &Computational Geometry, 45(4):596-616, 2011. Google Scholar
  19. Norman Zadeh. A bad network problem for the simplex method and other minimum cost flow algorithms. Mathematical Programming, 5(1):255-266, 1973. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail