Clustering Under Perturbation Stability in Near-Linear Time

Authors Pankaj K. Agarwal, Hsien-Chih Chang, Kamesh Munagala, Erin Taylor, Emo Welzl



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2020.8.pdf
  • Filesize: 0.54 MB
  • 16 pages

Document Identifiers

Author Details

Pankaj K. Agarwal
  • Department of Computer Science, Duke University, Durham, NC, USA
Hsien-Chih Chang
  • Department of Computer Science, Dartmouth College, Hanover, NH, USA
Kamesh Munagala
  • Department of Computer Science, Duke University, Durham, NC, USA
Erin Taylor
  • Department of Computer Science, Duke University, Durham, NC, USA
Emo Welzl
  • Department of Computer Science, ETH Zürich, Switzerland

Cite AsGet BibTex

Pankaj K. Agarwal, Hsien-Chih Chang, Kamesh Munagala, Erin Taylor, and Emo Welzl. Clustering Under Perturbation Stability in Near-Linear Time. In 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 182, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.FSTTCS.2020.8

Abstract

We consider the problem of center-based clustering in low-dimensional Euclidean spaces under the perturbation stability assumption. An instance is α-stable if the underlying optimal clustering continues to remain optimal even when all pairwise distances are arbitrarily perturbed by a factor of at most α. Our main contribution is in presenting efficient exact algorithms for α-stable clustering instances whose running times depend near-linearly on the size of the data set when α ≥ 2 + √3. For k-center and k-means problems, our algorithms also achieve polynomial dependence on the number of clusters, k, when α ≥ 2 + √3 + ε for any constant ε > 0 in any fixed dimension. For k-median, our algorithms have polynomial dependence on k for α > 5 in any fixed dimension; and for α ≥ 2 + √3 in two dimensions. Our algorithms are simple, and only require applying techniques such as local search or dynamic programming to a suitably modified metric space, combined with careful choice of data structures.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • clustering
  • stability
  • local search
  • dynamic programming
  • coreset
  • polyhedral metric
  • trapezoid decomposition
  • range query

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Margareta Ackerman and Shai Ben-David. Clusterability: A theoretical study. In Proceedings of of the 12th International Conference on Artificial Intelligence and Statistics, volume 5 of JMLR Proceedings, pages 1-8, 2009. Google Scholar
  2. Peyman Afshani, Jérémy Barbay, and Timothy M Chan. Instance-optimal geometric algorithms. Journal of the ACM (JACM), 64(1):3, 2017. Google Scholar
  3. Pankaj K Agarwal, Jeff Erickson, et al. Geometric range searching and its relatives. Contemporary Mathematics, 223:1-56, 1999. Google Scholar
  4. Pankaj K Agarwal, Sariel Har-Peled, and Kasturi R Varadarajan. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1-30, 2005. Google Scholar
  5. Pankaj K Agarwal and Jiří Matoušek. Ray shooting and parametric search. SIAM Journal on Computing, 22(4):794-806, 1993. Google Scholar
  6. Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Approximate clustering with same-cluster queries. In Anna R. Karlin, editor, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pages 40:1-40:21, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ITCS.2018.40.
  7. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine learning, 75(2):245-248, 2009. Google Scholar
  8. Omer Angel, Sébastien Bubeck, Yuval Peres, and Fan Wei. Local max-cut in smoothed polynomial time. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 429-437. ACM, 2017. Google Scholar
  9. Haris Angelidakis, Pranjal Awasthi, Avrim Blum, Vaggos Chatziafratis, and Chen Dan. Bilu-Linial stability, certified algorithms and the independent set problem. Preprint, October 2018. Google Scholar
  10. Haris Angelidakis, Konstantin Makarychev, and Yury Makarychev. Algorithms for stable and perturbation-resilient problems. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 438-451. ACM, 2017. Google Scholar
  11. Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In STOC, volume 98, pages 106-113, 1998. Google Scholar
  12. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544-562, January 2004. URL: https://doi.org/10.1137/S0097539702416402.
  13. Hassan Ashtiani, Shrinu Kushagra, and Shai Ben-David. Clustering with same-cluster queries. In Advances in neural information processing systems, pages 3216-3224, 2016. Google Scholar
  14. Pranjal Awasthi, Avrim Blum, and Or Sheffet. Center-based clustering under perturbation stability. Information Processing Letters, 112(1-2):49-54, 2012. Google Scholar
  15. Mihai Bādoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 250-257. ACM, 2002. Google Scholar
  16. Ainesh Bakshi and Nadiia Chepurko. Polynomial time algorithm for 2-stable clustering instances. Preprint, July 2016. Google Scholar
  17. Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Approximate clustering without the approximation. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, pages 1068-1077. Society for Industrial and Applied Mathematics, 2009. Google Scholar
  18. Maria-Florina Balcan, Avrim Blum, and Santosh Vempala. A discriminative framework for clustering via similarity functions. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 671-680. ACM, 2008. Google Scholar
  19. Maria-Florina Balcan and Mark Braverman. Nash equilibria in perturbation-stable games. Theory of Computing, 13(1):1-31, 2017. Google Scholar
  20. Maria-Florina Balcan, Nika Haghtalab, and Colin White. k-center clustering under perturbation resilience. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and Davide Sangiorgi, editors, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), volume 55 of Leibniz International Proceedings in Informatics (LIPIcs), pages 68:1-68:14, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.68.
  21. Maria Florina Balcan and Yingyu Liang. Clustering under perturbation resilience. SIAM Journal on Computing, 45(1):102-155, 2016. Google Scholar
  22. Shalev Ben-David and Lev Reyzin. Data stability in clustering: A closer look. Theoretical Computer Science, 558(1):51-61, 2014. Google Scholar
  23. Yonatan Bilu and Nathan Linial. Are stable instances easy? Combinatorics, Probability and Computing, 21(5):643-660, 2012. Google Scholar
  24. Paul B Callahan and S Rao Kosaraju. Faster algorithms for some geometric graph problems in higher dimensions. In Proceedings of the fourth annual ACM-SIAM symposium on Discrete algorithms, pages 291-300. Society for Industrial and Applied Mathematics, 1993. Google Scholar
  25. Chandra Chekuri and Shalmoli Gupta. Perturbation resilient clustering for k-center and related problems via LP relaxations. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018), volume 116 of Leibniz International Proceedings in Informatics (LIPIcs), pages 9:1-9:16, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.9.
  26. Vincent Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '18, pages 430-440, Philadelphia, PA, USA, 2018. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=3174304.3175298.
  27. Vincent Cohen-Addad, Arnaud de Mesmay, Eva Rotenberg, and Alan Roytman. The bane of low-dimensionality clustering. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '18, pages 441-456, Philadelphia, PA, USA, 2018. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=3174304.3175300.
  28. Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. Preprint, June 2019. Google Scholar
  29. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT Approximations for k-Median and k-Means. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 42:1-42:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  30. Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. SIAM Journal on Computing, 48(2):644-667, 2019. Google Scholar
  31. Vincent Cohen-Addad and Chris Schwiegelshohn. On the local structure of stable clustering instances. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 49-60. IEEE, 2017. Google Scholar
  32. Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, Department of Computer Science and Engineering, University of California, September 2008. Google Scholar
  33. Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational geometry. In Computational geometry, pages 1-17. Springer, 1997. Google Scholar
  34. Amit Deshpande, Anand Louis, and Apoorv Vikram Singh. On Euclidean k-means clustering with α-center proximity. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2087-2095, 2019. Google Scholar
  35. Abhratanu Dutta, Aravindan Vijayaraghavan, and Alex Wang. Clustering stable instances of Euclidean k-means. In Advances in Neural Information Processing Systems, pages 6500-6509, 2017. Google Scholar
  36. Tomás Feder and Daniel Greene. Optimal algorithms for approximate clustering. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 434-444. ACM, 1988. Google Scholar
  37. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A ptas for k-means clustering based on weak coresets. In Proceedings of the twenty-third annual symposium on Computational geometry, pages 11-18. ACM, 2007. Google Scholar
  38. Zachary Friggstad, Kamyar Khodamoradi, and Mohammad R. Salavatipour. Exact algorithms and lower bounds for stable instances of Euclidean k-means. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '19, pages 2958-2972, Philadelphia, PA, USA, 2019. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=3310435.3310618.
  39. Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM Journal on Computing, 48(2):452-480, 2019. URL: https://doi.org/10.1137/17M1127181.
  40. Sariel Har-Peled. No, coreset, no cry. In International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 324-335. Springer, 2004. Google Scholar
  41. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 291-300. ACM, 2004. Google Scholar
  42. Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM J. Comput., 13(2):338-355, 1984. Google Scholar
  43. Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k-center problem. Math. Oper. Res., 10(2):180-184, May 1985. Google Scholar
  44. Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 731-740. ACM, 2002. Google Scholar
  45. Amit Kumar and Ravindran Kannan. Clustering with spectral norm and the k-means algorithm. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), pages 299-308. IEEE, 2010. Google Scholar
  46. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. A simple linear time (1+ epsilon)-approximation algorithm for k-means clustering in any dimensions. In Annual Symposium on Foundations of Computer Science, volume 45, pages 454-462. IEEE COMPUTER SOCIETY PRESS, 2004. Google Scholar
  47. Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Information Processing Letters, 120:40-43, 2017. Google Scholar
  48. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science, 442:13-21, 2012. Google Scholar
  49. Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. Bilu-Linial stable instances of max cut and minimum multiway cut. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 890-906. SIAM, 2014. Google Scholar
  50. Nimrod Megiddo and Kenneth J Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182-196, 1984. Google Scholar
  51. Matúš Mihalák, Marcel Schöngens, Rastislav Šrámek, and Peter Widmayer. On the complexity of the metric TSP under stability considerations. In International Conference on Current Trends in Theory and Practice of Computer Science, pages 382-393. Springer, 2011. Google Scholar
  52. Rafail Ostrovsky, Yuval Rabani, Leonard J Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. Journal of the ACM (JACM), 59(6):28, 2012. Google Scholar
  53. Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. J. Comput. Syst. Sci., 26(3):362-391, 1983. Google Scholar
  54. Csaba D Toth, Joseph O'Rourke, and Jacob E Goodman, editors. Handbook of discrete and computational geometry. Chapman and Hall/CRC, 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail