Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions

Authors: Ragesh Jaiswal, Amit Kumar




Author Details

Ragesh Jaiswal
  • CSE, IIT Delhi, India
Amit Kumar
  • CSE, IIT Delhi, India

Cite As

Ragesh Jaiswal and Amit Kumar. Clustering What Matters in Constrained Settings: Improved Outlier to Outlier-Free Reductions. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 41:1-41:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ISAAC.2023.41

Abstract

Constrained clustering problems generalize classical clustering formulations, e.g., k-median, k-means, by imposing additional constraints on the feasibility of a clustering. There has been significant recent progress in obtaining approximation algorithms for these problems, both in the metric and the Euclidean settings. However, the outlier version of these problems, where the solution is allowed to leave out m points from the clustering, is not well understood.

In this work, we give a general framework for reducing the outlier version of a constrained k-median or k-means problem to the corresponding outlier-free version with only (1+ε)-loss in the approximation ratio. The reduction is obtained by mapping the original instance of the problem to f(k, m, ε) instances of the outlier-free version, where f(k, m, ε) = ((k+m)/ε)^O(m). As specific applications, we get the following results:
  • First FPT (in the parameters k and m) (1+ε)-approximation algorithm for the outlier version of capacitated k-median and k-means in Euclidean spaces with hard capacities.
  • First FPT (in the parameters k and m) (3+ε) and (9+ε) approximation algorithms for the outlier version of capacitated k-median and k-means, respectively, in general metric spaces with hard capacities.
  • First FPT (in the parameters k and m) (2-δ)-approximation algorithm for the outlier version of the k-median problem under the Ulam metric.

Our work generalizes the results of Bhattacharya et al. and Agrawal et al. to a larger class of constrained clustering problems. Further, our reduction works for arbitrary metric spaces and so can extend clustering algorithms for outlier-free versions in both Euclidean and arbitrary metric spaces.
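
To make the outlier objective mentioned in the abstract concrete, here is a minimal Python sketch of the k-median cost with m outliers for a fixed set of centers. It is an illustration only and not the paper's reduction: it assumes Euclidean points and the unconstrained problem, where, once the centers are fixed, discarding the m points farthest from their nearest center is optimal. Under capacity or other constraints this greedy choice need not be optimal. The function name `outlier_k_median_cost` and its parameters are hypothetical.

```python
import math


def outlier_k_median_cost(points, centers, m):
    """k-median cost for fixed centers when up to m points may be left out.

    Assumes the unconstrained problem: every kept point is served by its
    nearest center, so the best outliers are the m farthest points.
    """
    # Distance from each point to its nearest center, in increasing order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Keep all but the m largest distances (keep nothing if m >= len(points)).
    keep = max(0, len(dists) - m)
    return sum(dists[:keep])


# Hypothetical toy instance: one point far away from both centers.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
toy_centers = [(0.0, 0.0), (10.0, 0.0)]
cost_without_outliers = outlier_k_median_cost(toy_points, toy_centers, 0)
cost_with_one_outlier = outlier_k_median_cost(toy_points, toy_centers, 1)
```

In this toy instance the single far point dominates the outlier-free cost, so allowing one outlier changes the objective substantially.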

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • clustering
  • constrained
  • outlier

References

  1. Gagan Aggarwal, Rina Panigrahy, Tomás Feder, Dilys Thomas, Krishnaram Kenthapadi, Samir Khuller, and An Zhu. Achieving anonymity via clustering. ACM Trans. Algorithms, 6(3), July 2010. URL: https://doi.org/10.1145/1798596.1798602.
  2. Akanksha Agrawal, Tanmay Inamdar, Saket Saurabh, and Jie Xue. Clustering what matters: Optimal approximation for clustering with outliers, 2023. URL: https://arxiv.org/abs/2212.00696.
  3. S. Ahmadian, A. Norouzi-Fard, O. Svensson, and J. Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 61-72, October 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
  4. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544-562, 2004. URL: https://doi.org/10.1137/S0097539702416402.
  5. Sayan Bandyapadhyay, Fedor V. Fomin, and Kirill Simonov. On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications. In Nikhil Bansal, Emanuela Merelli, and James Worrell, editors, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), volume 198 of Leibniz International Proceedings in Informatics (LIPIcs), pages 23:1-23:15, Dagstuhl, Germany, 2021. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.23.
  6. Suman Bera, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. Fair algorithms for clustering. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper_files/paper/2019/file/fc192b0c0d270dbf41870a63a8c76c2f-Paper.pdf.
  7. Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the Cost of Essentially Fair Clusterings. In Dimitris Achlioptas and László A. Végh, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019), volume 145 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1-18:22, Dagstuhl, Germany, 2019. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.18.
  8. Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In Nitin Saxena and Sunil Simon, editors, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020), volume 182 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1-13:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.FSTTCS.2020.13.
  9. Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. Faster algorithms for the constrained k-means problem. Theor. Comp. Sys., 62(1):93-115, January 2018. URL: https://doi.org/10.1007/s00224-017-9820-7.
  10. V. Braverman, V. Cohen-Addad, H. Jiang, R. Krauthgamer, C. Schwiegelshohn, M. Toftrup, and X. Wu. The power of uniform sampling for coresets. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 462-473, Los Alamitos, CA, USA, November 2022. IEEE Computer Society. URL: https://doi.org/10.1109/FOCS54457.2022.00051.
  11. Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer. Clustering permutations: New techniques with streaming applications. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 31:1-31:24. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPIcs.ITCS.2023.31.
  12. Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65(1):129-149, 2002. URL: https://doi.org/10.1006/jcss.2002.1882.
  13. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. URL: https://doi.org/10.1137/070699007.
  14. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT Approximations for k-Median and k-Means. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 42:1-42:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.42.
  15. Vincent Cohen-Addad and Jason Li. On the Fixed-Parameter Tractability of Capacitated Clustering. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 41:1-41:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.41.
  16. Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 169-182, New York, NY, USA, 2021. Association for Computing Machinery. URL: https://doi.org/10.1145/3406325.3451022.
  17. Rajni Dabas, Neelima Gupta, and Tanmay Inamdar. FPT approximations for capacitated/fair clustering with outliers, 2023. URL: https://arxiv.org/abs/2305.01471.
  18. Hu Ding. Faster balanced clusterings in high dimension. Theoretical Computer Science, 842:28-40, 2020. URL: https://doi.org/10.1016/j.tcs.2020.07.022.
  19. Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the twenty-third annual symposium on Computational geometry, SCG '07, pages 11-18, New York, NY, USA, 2007. ACM. URL: https://doi.org/10.1145/1247069.1247072.
  20. Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. FPT Approximation for Constrained Metric k-Median/Means. In Yixin Cao and Marcin Pilipczuk, editors, 15th International Symposium on Parameterized and Exact Computation (IPEC 2020), volume 180 of Leibniz International Proceedings in Informatics (LIPIcs), pages 14:1-14:19, Dagstuhl, Germany, 2020. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.IPEC.2020.14.
  21. Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, Shi Li, and Barna Saha. A constant factor approximation algorithm for fault-tolerant k-median. ACM Trans. Algorithms, 12(3), April 2016. URL: https://doi.org/10.1145/2854153.
  22. Lingxiao Huang, Shaofeng H. C. Jiang, Jianing Lou, and Xuan Wu. Near-optimal coresets for robust clustering, 2022. URL: https://arxiv.org/abs/2210.10394.
  23. Tanmay Inamdar and Kasturi Varadarajan. Fault tolerant clustering with outliers. In Evripidis Bampis and Nicole Megow, editors, Approximation and Online Algorithms, pages 188-201, Cham, 2020. Springer International Publishing.
  24. Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yogish Sabharwal, and Barna Saha. The matroid median problem. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '11, pages 1117-1130, USA, 2011. Society for Industrial and Applied Mathematics.
  25. Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pages 646-659, New York, NY, USA, 2018. Association for Computing Machinery. URL: https://doi.org/10.1145/3188745.3188882.
  26. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, February 2010. URL: https://doi.org/10.1145/1667053.1667054.
  27. Clemens Rösner and Melanie Schmidt. Privacy Preserving Clustering with Constraints. In Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella, editors, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 96:1-96:14, Dagstuhl, Germany, 2018. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.96.