FPT Approximation for Constrained Metric k-Median/Means

Goyal, Dishant; Jaiswal, Ragesh; Kumar, Amit

doi:10.4230/LIPIcs.IPEC.2020.14

Abstract

The Metric k-median problem over a metric space (𝒳, d) is defined as follows: given a set L ⊆ 𝒳 of facility locations and a set C ⊆ 𝒳 of clients, open a set F ⊆ L of k facilities such that the total service cost, defined as Φ(F, C) := ∑_{x ∈ C} min_{f ∈ F} d(x, f), is minimised. The metric k-means problem is defined similarly using squared distances (i.e., d²(., .) instead of d(., .)). In many applications there are additional constraints that any solution needs to satisfy. For example, to balance the load among the facilities in resource allocation problems, a capacity u is imposed on every facility. That is, no more than u clients can be assigned to any facility. This problem is known as the capacitated k-means/k-median problem. Likewise, various other applications have different constraints, which give rise to different constrained versions of the problem such as r-gather, fault-tolerant, outlier k-means/k-median problem. Surprisingly, for many of these constrained problems, no constant-approximation algorithm is known. Moreover, the unconstrained problem itself is known [Marek Adamczyk et al., 2019] to be W[2]-hard when parameterized by k. We give FPT algorithms with constant approximation guarantee for a range of constrained k-median/means problems. For some of the constrained problems, ours is the first constant factor approximation algorithm whereas for others, we improve or match the approximation guarantee of previous works. We work within the unified framework of Ding and Xu [Ding and Xu, 2015] that allows us to simultaneously obtain algorithms for a range of constrained problems. In particular, we obtain a (3+ε)-approximation and (9+ε)-approximation for the constrained versions of the k-median and k-means problem respectively in FPT time. In many practical settings of the k-median/means problem, one is allowed to open a facility at any client location, i.e., C ⊆ L. For this special case, our algorithm gives a (2+ε)-approximation and (4+ε)-approximation for the constrained versions of k-median and k-means problem respectively in FPT time. Since our algorithm is based on simple sampling technique, it can also be converted to a constant-pass log-space streaming algorithm. In particular, here are some of the main highlights of this work:
1) For the uniform capacitated k-median/means problems our results matches previously known results of Addad et al. [Vincent Cohen-Addad and Jason Li, 2019].
2) For the r-gather k-median/means problem (clustering with lower bound on the size of clusters), our FPT approximation bounds are better than what was previously known.
3) Our approximation bounds for the fault-tolerant, outlier, and uncertain versions is better than all previously known results, albeit in FPT time.
4) For certain constrained settings such as chromatic, l-diversity, and semi-supervised k-median/means, we obtain the first constant factor approximation algorithms to the best of our knowledge.
5) Since our algorithms are based on a simple sampling based approach, we also obtain constant-pass log-space streaming algorithms for most of the above-mentioned problems.

Marek Adamczyk, Jaroslaw Byrka, Jan Marcinkowski, Syed M. Meesum, and Michal Wlodarczyk. Constant-factor FPT approximation for capacitated k-median. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms (ESA 2019), volume 144 of Leibniz International Proceedings in Informatics (LIPIcs), pages 1:1-1:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.1.
Gagan Aggarwal, Rina Panigrahy, Tomás Feder, Dilys Thomas, Krishnaram Kenthapadi, Samir Khuller, and An Zhu. Achieving anonymity via clustering. ACM Trans. Algorithms, 6(3), July 2010. URL: https://doi.org/10.1145/1798596.1798602.
S. Ahmadian, A. Norouzi-Fard, O. Svensson, and J. Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS 2017), pages 61-72, October 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn., 75(2):245-248, May 2009. URL: https://doi.org/10.1007/s10994-009-5103-0.
Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. Optics: Ordering points to identify the clustering structure. SIGMOD Rec., 28(2):49–60, June 1999. URL: https://doi.org/10.1145/304181.304187.
David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pages 1027-1035, USA, 2007. Society for Industrial and Applied Mathematics.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544-562, 2004. URL: https://doi.org/10.1137/S0097539702416402.
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry (SoCG 2015), volume 34 of Leibniz International Proceedings in Informatics (LIPIcs), pages 754-767, Dagstuhl, Germany, 2015. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.754.
Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Faster algorithms for the constrained k-means problem. Theor. Comp. Sys., 62(1):93–115, January 2018. URL: https://doi.org/10.1007/s00224-017-9820-7.
Vladimir Braverman, Adam Meyerson, Rafail Ostrovsky, Alan Roytman, Michael Shindler, and Brian Tagiku. Streaming k-means on well-clusterable data. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, page 26–40, USA, 2011. Society for Industrial and Applied Mathematics.
Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2), March 2017. URL: https://doi.org/10.1145/2981561.
Jaroslaw Byrka, Piotr Skowron, and Krzysztof Sornat. Proportional Approval Voting, Harmonic k-median, and Negative Association. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1-26:14, Dagstuhl, Germany, 2018. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.26.
Parinya Chalermsook, Marek Cygan, Guy Kortsarz, Bundit Laekhanukit, Pasin Manurangsi, Danupon Nanongkai, and Luca Trevisan. From gap-eth to fpt-inapproximability: Clique, dominating set, and more. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS 2017), pages 743-754, 2017.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65(1):129-149, 2002. URL: https://doi.org/10.1006/jcss.2002.1882.
Ke Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
Vincent Cohen-Addad and Karthik C.S. Inapproximability of clustering in lp metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS 2019), pages 519-539, 2019.
Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT Approximations for k-Median and k-Means. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 42:1-42:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.42.
Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in Euclidean and minor-free metrics. 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS 2016), 00:353-364, 2016. URL: https://doi.org/doi.ieeecomputersociety.org/10.1109/FOCS.2016.46.
Vincent Cohen-Addad and Jason Li. On the Fixed-Parameter Tractability of Capacitated Clustering. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 41:1-41:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.41.
Graham Cormode and Andrew McGregor. Approximation algorithms for clustering uncertain data. In Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’08, page 191–200, New York, NY, USA, 2008. Association for Computing Machinery. URL: https://doi.org/10.1145/1376916.1376944.
Marek Cygan, MohammadTaghi Hajiaghayi, and Samir Khuller. LP rounding for k-centers with non-uniform hard capacities. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 273-282, 2012.
Sanjoy Dasgupta. The hardness of k-means clustering. Technical Report CS2008-0916, Department of Computer Science and Engineering, University of California San Diego, 2008.
Hu Ding. Faster balanced clusterings in high dimension. Theoretical Computer Science, 2020. URL: https://doi.org/10.1016/j.tcs.2020.07.022.
Hu Ding and Jinhui Xu. A unified framework for clustering constrained data without locality property. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, page 1471–1490, USA, 2015. Society for Industrial and Applied Mathematics.
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD 1996, page 226–231. AAAI Press, 1996.
Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the twenty-third annual symposium on Computational geometry, SCG 2007, pages 11-18, New York, NY, USA, 2007. ACM. URL: https://doi.org/10.1145/1247069.1247072.
Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM Journal on Computing, 48(2):452-480, 2019. URL: https://doi.org/10.1137/17M1127181.
Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. Streaming PTAS for constrained k-means, 2019. URL: http://arxiv.org/abs/1909.07511.
Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, 31(1):228-248, 1999. URL: https://doi.org/10.1006/jagm.1998.0993.
Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility location. CoRR, abs/0809.2554, 2008. URL: http://arxiv.org/abs/0809.2554.
Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, Shi Li, and Barna Saha. A constant factor approximation algorithm for fault-tolerant k-median. ACM Trans. Algorithms, 12(3), April 2016. URL: https://doi.org/10.1145/2854153.
Erez Hartuv and Ron Shamir. A clustering algorithm based on graph connectivity. Information Processing Letters, 76(4):175-181, 2000. URL: https://doi.org/10.1016/S0020-0190(00)00142-3.
Anil K. Jain, M Narasimha Murty, and P. J. Flynn. Data clustering: A review. ACM Comput. Surv., 31(3):264–323, September 1999. URL: https://doi.org/10.1145/331499.331504.
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pages 731-740, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/509907.510012.
Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica, 70(1):22-46, 2014. URL: https://doi.org/10.1007/s00453-013-9833-9.
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. In Proceedings of the Eighteenth Annual Symposium on Computational Geometry, SCG 2002, page 10–18, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/513400.513402.
Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, page 646–659, New York, NY, USA, 2018. Association for Computing Machinery. URL: https://doi.org/10.1145/3188745.3188882.
Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, February 2010. URL: https://doi.org/10.1145/1667053.1667054.
Shi Li. Approximating capacitated k-median with (1 + ε)k open facilities. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, page 786–796, USA, 2016. Society for Industrial and Applied Mathematics.
Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2013, page 901–910, New York, NY, USA, 2013. Association for Computing Machinery. URL: https://doi.org/10.1145/2488608.2488723.
Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. The planar k-means problem is NP-hard. Theoretical Computer Science, 442:13-21, 2012. Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009). URL: https://doi.org/10.1016/j.tcs.2010.05.034.
Pasin Manurangsi. Tight running time lower bounds for strong inapproximability of maximum k-coverage, unique set cover and related problems (via t-wise agreement testing theorem). In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, page 62?81, USA, 2020. Society for Industrial and Applied Mathematics.
Pitu B. Mirchandani and Richard L. Francis. Discrete Location Theory. Wiley, 1990.
Latanya Sweeney. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557–570, October 2002. URL: https://doi.org/10.1142/S0218488502001648.
Andrea Vattani. The hardness of k-means clustering in the plane. Technical report, Department of Computer Science and Engineering, University of California San Diego, 2009.
J S Vitter. Random sampling with a reservoir. ACM Trans. Math. Software, 11(1):37-57, 1985.
Dongkuan Xu and Yingjie Tian. A comprehensive survey of clustering algorithms. Annals of Data Science, 2, August 2015. URL: https://doi.org/10.1007/s40745-015-0040-1.

FPT Approximation for Constrained Metric k-Median/Means

Authors Dishant Goyal, Ragesh Jaiswal, Amit Kumar

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message

FPT Approximation for Constrained Metric k-Median/Means

Authors Dishant Goyal, Ragesh Jaiswal, Amit Kumar

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Acknowledgements

References

Thanks for your feedback!

Could not send message