Coresets for Fuzzy K-Means with Applications

Authors Johannes Blömer, Sascha Brauer, Kathrin Bujna



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2018.46.pdf
  • Filesize: 422 kB
  • 12 pages

Document Identifiers

Author Details

Johannes Blömer
  • Department of Computer Science, Paderborn University, Paderborn, Germany
Sascha Brauer
  • Department of Computer Science, Paderborn University, Paderborn, Germany
Kathrin Bujna
  • Department of Computer Science, Paderborn University, Paderborn, Germany

Cite AsGet BibTex

Johannes Blömer, Sascha Brauer, and Kathrin Bujna. Coresets for Fuzzy K-Means with Applications. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 46:1-46:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ISAAC.2018.46

Abstract

The fuzzy K-means problem is a popular generalization of the well-known K-means problem to soft clusterings. We present the first coresets for fuzzy K-means with size linear in the dimension, polynomial in the number of clusters, and poly-logarithmic in the number of points. We show that these coresets can be employed in the computation of a (1+epsilon)-approximation for fuzzy K-means, improving previously presented results. We further show that our coresets can be maintained in an insertion-only streaming setting, where data points arrive one-by-one.

Subject Classification

ACM Subject Classification
  • Theory of computation → Unsupervised learning and clustering
Keywords
  • clustering
  • fuzzy k-means
  • coresets
  • approximation algorithms
  • streaming

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In Advances in Neural Information Processing Systems 22, pages 10-18. Curran Associates, Inc., 2009. Google Scholar
  2. P. Awasthi, M. Charikar, R. Krishnaswamy, and A. K. Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry, pages 754-767, 2015. URL: http://dx.doi.org/10.4230/LIPIcs.SOCG.2015.754.
  3. J. L. Bentley and J. B. Saxe. Decomposable searching problems I. Static-to-dynamic transformation. Journal of Algorithms, 1(4):301-358, 1980. Google Scholar
  4. J. C. Bezdek, R. Ehrlich, and W. Full. FCM: The fuzzy c-means clustering algorithm. Computers &Geosciences, 10(2):191-203, 1984. Google Scholar
  5. J. C. Bezdek, R. J. Hathaway, M. J. Sabin, and W. T. Tucker. Convergence theory for fuzzy c-means: Counterexamples and repairs. IEEE Transactions on Systems, Man and Cybernetics, 17(5):873-877, 1987. URL: http://dx.doi.org/10.1109/TSMC.1987.6499296.
  6. J. Blömer, S. Brauer, and K. Bujna. A Theoretical Analysis of the Fuzzy K-Means Problem. In 2016 IEEE 16th International Conference on Data Mining, pages 805-810, 2016. URL: http://dx.doi.org/10.1109/ICDM.2016.0094.
  7. V. Braverman, A. Meyerson, R. Ostrovsky, A. Roytman, M. Shindler, and B. Tagiku. Streaming K-means on Well-clusterable Data. In Proceedings of the Twenty-second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 26-40, 2011. Google Scholar
  8. K. Chen. On Coresets for K-Median and K-Means Clustering in Metric and Euclidean Spaces and Their Applications. SIAM Journal on Computing, 39(3):923-947, 2009. URL: http://dx.doi.org/10.1137/070699007.
  9. D. Dembélé and P. Kastner. Fuzzy C-means method for clustering microarray data. Bioinformatics, 19(8):973-980, 2003. URL: http://dx.doi.org/10.1093/bioinformatics/btg119.
  10. J. C. Dunn. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics, 3(3):32-57, 1973. URL: http://dx.doi.org/10.1080/01969727308546046.
  11. D. Feldman, M. Faulkner, and A. Krause. Scalable Training of Mixture Models via Coresets. In Advances in Neural Information Processing Systems 24, pages 2142-2150. Curran Associates, Inc., 2011. Google Scholar
  12. D. Feldman and M. Langberg. A Unified Framework for Approximating and Clustering Data. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, pages 569-578, 2011. URL: http://dx.doi.org/10.1145/1993636.1993712.
  13. D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for K-means Clustering Based on Weak Coresets. In Proceedings of the Twenty-third Annual Symposium on Computational Geometry, pages 11-18, 2007. URL: http://dx.doi.org/10.1145/1247069.1247072.
  14. D. Feldman, M. Schmidt, and C. Sohler. Turning Big Data into Tiny Data: Constant-size Coresets for K-means, PCA and Projective Clustering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1434-1453, 2013. Google Scholar
  15. S. Har-Peled and A. Kushal. Smaller Coresets for K-median and K-means Clustering. In Proceedings of the Twenty-first Annual Symposium on Computational Geometry, pages 126-134, 2005. URL: http://dx.doi.org/10.1007/s00454-006-1271-x.
  16. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, pages 291-300, 2004. URL: http://dx.doi.org/10.1145/1007352.1007400.
  17. R. J. Hathaway and J. C. Bezdek. Local convergence of the fuzzy c-Means algorithms. Pattern Recognition, 19(6):477-480, 1986. Google Scholar
  18. T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami. Fuzzy c-Means Algorithms for Very Large Data. IEEE Transactions on Fuzzy Systems, 20(6):1130-1146, 2012. URL: http://dx.doi.org/10.1109/TFUZZ.2012.2201485.
  19. K. Hirota and W. Pedrycz. Fuzzy computing for data mining. Proceedings of the IEEE, 87(9):1575-1600, 1999. URL: http://dx.doi.org/10.1109/5.784240.
  20. F. Hoppner and F. Klawonn. A contribution to convergence theory of fuzzy c-means and derivatives. IEEE Transactions on Fuzzy Systems, 11(5):682-694, 2003. URL: http://dx.doi.org/10.1109/TFUZZ.2003.817858.
  21. P. Hore, L. O. Hall, and D. B. Goldgof. Single Pass Fuzzy C Means. In 2007 IEEE International Fuzzy Systems Conference, pages 1-7, 2007. URL: http://dx.doi.org/10.1109/FUZZY.2007.4295372.
  22. M. Inaba, N. Katoh, and H. Imai. Applications of Weighted Voronoi Diagrams and Randomization to Variance-based K-clustering. In Proceedings of the Tenth Annual Symposium on Computational Geometry, pages 332-339, 1994. URL: http://dx.doi.org/10.1145/177424.178042.
  23. W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26, 1984. Google Scholar
  24. T. Kim, J. C. Bezdek, and R. J. Hathaway. Optimality tests for fixed points of the fuzzy c-means algorithm. Pattern Recognition, 21(6):651-663, 1988. URL: http://dx.doi.org/10.1016/0031-3203(88)90037-4.
  25. S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137, 1982. URL: http://dx.doi.org/10.1109/tit.1982.1056489.
  26. Mario Lucic, Olivier Bachem, and Andreas Krause. Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 1-9, 2016. Google Scholar
  27. M. R. Rezaee, P. M. J. van der Zwet, B. P. F. Lelieveldt, R. J. van der Geest, and J. H. C. Reiber. A multiresolution image segmentation technique based on pyramidal segmentation and fuzzy clustering. IEEE Transactions on Image Processing, 9(7):1238-1248, 2000. URL: http://dx.doi.org/10.1109/83.847836.
  28. M. Shindler, A. Wong, and A. Meyerson. Fast and accurate k-means for large datasets. In Advances in neural information processing systems, pages 2375-2383, 2011. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail