Clustering with Faulty Centers

Authors Kyle Fox, Hongyao Huang, Benjamin Raichel



PDF
Thumbnail PDF

File

LIPIcs.ISAAC.2022.10.pdf
  • Filesize: 0.64 MB
  • 14 pages

Document Identifiers

Author Details

Kyle Fox
  • University of Texas at Dallas, TX, USA
Hongyao Huang
  • University of Texas at Dallas, TX, USA
Benjamin Raichel
  • University of Texas at Dallas, TX, USA

Cite As Get BibTex

Kyle Fox, Hongyao Huang, and Benjamin Raichel. Clustering with Faulty Centers. In 33rd International Symposium on Algorithms and Computation (ISAAC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 248, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ISAAC.2022.10

Abstract

In this paper we introduce and formally study the problem of k-clustering with faulty centers. Specifically, we study the faulty versions of k-center, k-median, and k-means clustering, where centers have some probability of not existing, as opposed to prior work where clients had some probability of not existing. For all three problems we provide fixed parameter tractable algorithms, in the parameters k, d, and ε, that (1+ε)-approximate the minimum expected cost solutions for points in d dimensional Euclidean space. For Faulty k-center we additionally provide a 5-approximation for general metrics. Significantly, all of our algorithms have a small dependence on n. Specifically, our Faulty k-center algorithms have only linear dependence on n, while for our algorithms for Faulty k-median and Faulty k-means the dependence is still only n^(1 + o(1)).

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
  • Theory of computation → Computational geometry
Keywords
  • clustering
  • approximation
  • probabilistic input
  • uncertain input

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. P. K. Agarwal and C. M. Procopiuc. Exact and approximation algorithms for clustering. Algorithmica, 33(2):201-226, 2002. URL: https://doi.org/10.1007/s00453-001-0110-y.
  2. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. Np-hardness of euclidean sum-of-squares clustering. Mach. Learn., 75(2):245-248, 2009. URL: https://doi.org/10.1007/s10994-009-5103-0.
  3. Michael Chau, Reynold Cheng, Ben Kao, and Jackey Ng. Uncertain data mining: An example in clustering location data. In Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference (PAKDD), volume 3918, pages 199-204. Springer, 2006. URL: https://doi.org/10.1007/11731139_24.
  4. Shiva Chaudhuri, Naveen Garg, and R. Ravi. The p-neighbor k-center problem. Inf. Process. Lett., 65(3):131-134, 1998. URL: https://doi.org/10.1016/S0020-0190(97)00224-X.
  5. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM J. Comput., 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
  6. Vincent Cohen-Addad, Marcin Pilipczuk, and Michal Pilipczuk. Efficient approximation schemes for uniform-cost clustering problems in planar graphs. In 27th Annual European Symposium on Algorithms (ESA), pages 33:1-33:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.33.
  7. Graham Cormode and Andrew McGregor. Approximation algorithms for clustering uncertain data. In Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 191-200. ACM, 2008. URL: https://doi.org/10.1145/1376916.1376944.
  8. T. Feder and D. H. Greene. Optimal algorithms for approximate clustering. In 20th Annual ACM Symposium on Theory of Computing (STOC), pages 434-444. ACM, 1988. URL: https://doi.org/10.1145/62212.62255.
  9. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Jeff Erickson, editor, Proceedings of the 23rd ACM Symposium on Computational Geometry, Gyeongju, South Korea, June 6-8, 2007, pages 11-18. ACM, 2007. URL: https://doi.org/10.1145/1247069.1247072.
  10. T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293-306, 1985. URL: https://doi.org/10.1016/0304-3975(85)90224-5.
  11. Sudipto Guha and Kamesh Munagala. Exceeding expectations and clustering uncertain data. In Proceedings of the Twenty-Eigth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pages 269-278. ACM, 2009. URL: https://doi.org/10.1145/1559795.1559836.
  12. Sariel Har-Peled. Geometric Approximation Algorithms, volume 173 of Mathematical Surveys and Monographs. AMS, Boston, MA, USA, 2011. Google Scholar
  13. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In 36th Annual ACM Symposium on Theory of Computing (STOC), pages 291-300, 2004. URL: https://doi.org/10.1145/1007352.1007400.
  14. Sariel Har-Peled and Benjamin Raichel. Net and prune: A linear time algorithm for euclidean distance problems. J. ACM, 62(6):44:1-44:35, 2015. URL: https://doi.org/10.1145/2831230.
  15. D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research, 10(2):180-184, 1985. URL: https://doi.org/10.1287/moor.10.2.180.
  16. W. Hsu and G. L. Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics, 1(3):209-215, 1979. URL: https://doi.org/10.1016/0166-218X(79)90044-1.
  17. Lingxiao Huang and Jian Li. Stochastic k-center and j-flat-center problems. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 110-129. SIAM, 2017. URL: https://doi.org/10.1137/1.9781611974782.8.
  18. Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 731-740. ACM, 2002. URL: https://doi.org/10.1145/509907.510012.
  19. Pegah Kamousi, Timothy M. Chan, and Subhash Suri. Closest pair and the post office problem for stochastic points. In Algorithms and Data Structures - 12th International Symposium (WADS), volume 6844 of Lecture Notes in Computer Science, pages 548-559. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22300-6_46.
  20. Pegah Kamousi, Timothy M. Chan, and Subhash Suri. Stochastic minimum spanning trees in euclidean spaces. In Proceedings of the 27th ACM Symposium on Computational Geometry (SoCG), pages 65-74. ACM, 2011. URL: https://doi.org/10.1145/1998196.1998206.
  21. Samir Khuller, Robert Pless, and Yoram J. Sussmann. Fault tolerant k-center problems. Theor. Comput. Sci., 242(1-2):237-245, 2000. URL: https://doi.org/10.1016/S0304-3975(98)00222-9.
  22. Nirman Kumar and Benjamin Raichel. Fault tolerant clustering revisited. In Proceedings of the 25th Canadian Conference on Computational Geometry (CCCG). Carleton University, Ottawa, Canada, 2013. URL: http://cccg.ca/proceedings/2013/papers/paper_36.pdf.
  23. Christiane Lammersen, Melanie Schmidt, and Christian Sohler. Probabilistic k-median clustering in data streams. Theory Comput. Syst., 56(1):251-290, 2015. URL: https://doi.org/10.1007/s00224-014-9539-7.
  24. Alexander Munteanu, Christian Sohler, and Dan Feldman. Smallest enclosing ball for probabilistic data. In 30th Annual Symposium on Computational Geometry (SoCG), page 214. ACM, 2014. URL: https://doi.org/10.1145/2582112.2582114.
  25. Wang Kay Ngai, Ben Kao, Chun Kit Chui, Reynold Cheng, Michael Chau, and Kevin Y. Yip. Efficient clustering of uncertain data. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pages 436-445. IEEE Computer Society, 2006. URL: https://doi.org/10.1109/ICDM.2006.63.
  26. Haitao Wang and Jingru Zhang. One-dimensional k-center on uncertain data. Theor. Comput. Sci., 602:114-124, 2015. URL: https://doi.org/10.1016/j.tcs.2015.08.017.
  27. William Henry Young. On classes of summable functions and their fourier series. Proc. R. Soc. Lond. A, 87:225-229, 1912. URL: https://doi.org/10.1098/rspa.1912.0076.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail