Document

# Hardness of Approximation for Euclidean k-Median

## File

LIPIcs.APPROX-RANDOM.2021.4.pdf
• Filesize: 0.82 MB
• 23 pages

## Acknowledgements

Dishant Goyal would like to thank TCS Research Scholar Program.

## Cite As

Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of Approximation for Euclidean k-Median. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.4

## Abstract

The Euclidean k-median problem is defined in the following manner: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d, and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen{-}Addad and {Karthik {C. S.}}, 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide the hardness of approximation result for the Euclidean k-median problem in O(log k) dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output β k centers (for constant β > 1) and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show the hardness of bi-criteria approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Facility location and clustering
• Theory of computation → Approximation algorithms analysis
##### Keywords
• Hardness of approximation
• bicriteria approximation
• approximation algorithms
• k-median
• k-means

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In Irit Dinur, Klaus Jansen, Joseph Naor, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 12th International Workshop, APPROX 2009, and 13th International Workshop, RANDOM 2009, Berkeley, CA, USA, August 21-23, 2009. Proceedings, volume 5687 of Lecture Notes in Computer Science, pages 15-28. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-03685-9_2.
2. S. Ahmadian, A. Norouzi-Fard, O. Svensson, and J. Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 61-72, October 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
3. Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. Streaming k-means approximation. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 10-18. Curran Associates, Inc., 2009. URL: http://papers.nips.cc/paper/3812-streaming-k-means-approximation.pdf.
4. Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. Np-hardness of euclidean sum-of-squares clustering. Mach. Learn., 75(2):245–248, May 2009. URL: https://doi.org/10.1007/s10994-009-5103-0.
5. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544-562, 2004. URL: https://doi.org/10.1137/S0097539702416402.
6. P. Austrin, S. Khot, and M. Safra. Inapproximability of vertex cover and independent set in bounded degree graphs. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 74-80, 2009. URL: https://doi.org/10.1109/CCC.2009.38.
7. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry (SoCG 2015), volume 34 of Leibniz International Proceedings in Informatics (LIPIcs), pages 754-767, Dagstuhl, Germany, 2015. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.754.
8. Sayan Bandyapadhyay and Kasturi Varadarajan. On Variants of k-means Clustering. In Sándor Fekete and Anna Lubiw, editors, 32nd International Symposium on Computational Geometry (SoCG 2016), volume 51 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1-14:15, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.SoCG.2016.14.
9. Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of approximation of euclidean k-median. CoRR, abs/2011.04221, 2020. URL: http://arxiv.org/abs/2011.04221.
10. Mihai Bundefineddoiu, Sariel Har-Peled, and Piotr Indyk. Approximate clustering via core-sets. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC '02, page 250–257, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/509907.509947.
11. Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2), 2017. URL: https://doi.org/10.1145/2981561.
12. Ramaswamy Chandrasekaran and Arie Tamir. Open questions concerning weiszfeld’s algorithm for the fermat-weber location problem. Math. Program., 44(1-3):293-295, 1989. URL: https://doi.org/10.1007/BF01587094.
13. Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, STOC '99, page 1–10, New York, NY, USA, 1999. Association for Computing Machinery. URL: https://doi.org/10.1145/301250.301257.
14. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
15. Michael B. Cohen, Yin Tat Lee, Gary Miller, Jakub Pachocki, and Aaron Sidford. Geometric Median in Nearly Linear Time, page 9–21. Association for Computing Machinery, New York, NY, USA, 2016. URL: https://doi.org/10.1145/2897518.2897647.
16. Vincent Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 430-440. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.29.
17. Vincent Cohen-Addad and Karthik C. S. Inapproximability of clustering in lp metrics. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 519-539. IEEE Computer Society, 2019. URL: https://doi.org/10.1109/FOCS.2019.00040.
18. Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 2635-2648. SIAM, 2021. URL: https://doi.org/10.1137/1.9781611976465.156.
19. Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 00:353-364, 2016. URL: https://doi.org/doi.ieeecomputersociety.org/10.1109/FOCS.2016.46.
20. Sanjoy Dasgupta. The hardness of k-means clustering. Technical Report CS2008-0916, Department of Computer Science and Engineering, University of California San Diego, 2008.
21. Zvi Drezner and Horst W Hamacher. Facility location: applications and theory. Springer Science & Business Media, 2001.
22. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the twenty-third annual symposium on Computational geometry, SCG '07, pages 11-18, New York, NY, USA, 2007. ACM. URL: https://doi.org/10.1145/1247069.1247072.
23. Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 00:365-374, 2016. URL: https://doi.org/doi.ieeecomputersociety.org/10.1109/FOCS.2016.47.
24. Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. J. Algorithms, 31(1):228-248, 1999. URL: https://doi.org/10.1006/jagm.1998.0993.
25. Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica, 70(1):22-46, 2014. URL: https://doi.org/10.1007/s00453-013-9833-9.
26. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the Eighteenth Annual Symposium on Computational Geometry, SCG ’02, page 10–18, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/513400.513402.
27. J. KRARUP and S. VAJDA. On torricelli’s geometrical solution to a problem of fermat. IMA Journal of Management Mathematics, 8(3):215-224, 1997. URL: https://doi.org/10.1093/imaman/8.3.215.
28. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, 2010. URL: https://doi.org/10.1145/1667053.1667054.
29. Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Information Processing Letters, 120:40-43, 2017. URL: https://doi.org/10.1016/j.ipl.2016.11.009.
30. Shi Li. Approximating capacitated k-median with (1+ε)k open facilities. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 786-796. SIAM, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch56.
31. Shi Li. On uniform capacitated k-median beyond the natural lp relaxation. ACM Trans. Algorithms, 13(2), 2017. URL: https://doi.org/10.1145/2983633.
32. Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC ’13, page 901–910, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2488608.2488723.
33. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is np-hard. Theoretical Computer Science, 442:13-21, 2012. Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009). URL: https://doi.org/10.1016/j.tcs.2010.05.034.
34. Konstantin Makarychev, Yury Makarychev, and Ilya Razenshteyn. Performance of johnson-lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, page 1027–1038, New York, NY, USA, 2019. Association for Computing Machinery. URL: https://doi.org/10.1145/3313276.3316350.
35. Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. A Bi-Criteria Approximation Algorithm for k-Means. In Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Umans, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016), volume 60 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1-14:20, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2016.14.
36. Jirí Matousek. On approximate geometric k-clustering. Discret. Comput. Geom., 24(1):61-84, 2000. URL: https://doi.org/10.1007/s004540010019.
37. Nimrod Megiddo and Kenneth J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182-196, 1984. URL: https://doi.org/10.1137/0213014.
38. P. Milasevic and G. R. Ducharme. Uniqueness of the spatial median. Ann. Statist., 15(3):1332-1333, September 1987. URL: https://doi.org/10.1214/aos/1176350511.
39. Stanislav Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4):2308-2335, 2015. URL: https://doi.org/10.3150/14-BEJ645.
40. Andrea Vattani. The hardness of k-means clustering in the plane. Technical report, Department of Computer Science and Engineering, University of California San Diego, 2009.
41. Dennis Wei. A constant-factor bi-criteria approximation guarantee for k-means++. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 604-612. Curran Associates, Inc., 2016. URL: http://papers.nips.cc/paper/6309-a-constant-factor-bi-criteria-approximation-guarantee-for-k-means.pdf.
42. E. WEISZFELD. Sur le point pour lequel la somme des distances de n points donnes est minimum. Tohoku Mathematical Journal, First Series, 43:355-386, 1937.
X

Feedback for Dagstuhl Publishing