Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means

Authors Anna Arutyunova, Melanie Schmidt



PDF
Thumbnail PDF

File

LIPIcs.STACS.2021.7.pdf
  • Filesize: 0.7 MB
  • 17 pages

Document Identifiers

Author Details

Anna Arutyunova
  • Universität Bonn, Germany
Melanie Schmidt
  • Universität Köln, Germany

Acknowledgements

We thank anonymous reviewers for their detailed comments to a previous version.

Cite AsGet BibTex

Anna Arutyunova and Melanie Schmidt. Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.STACS.2021.7

Abstract

We study k-clustering problems with lower bounds, including k-median and k-means clustering with lower bounds. In addition to the point set P and the number of centers k, a k-clustering problem with (uniform) lower bounds gets a number B. The solution space is restricted to clusterings where every cluster has at least B points. We demonstrate how to approximate k-median with lower bounds via a reduction to facility location with lower bounds, for which O(1)-approximation algorithms are known. Then we propose a new constrained clustering problem with lower bounds where we allow points to be assigned multiple times (to different centers). This means that for every point, the clustering specifies a set of centers to which it is assigned. We call this clustering with weak lower bounds. We give an 8-approximation for k-median clustering with weak lower bounds and an O(1)-approximation for k-means with weak lower bounds. We conclude by showing that at a constant increase in the approximation factor, we can restrict the number of assignments of every point to 2 (or, if we allow fractional assignments, to 1+ε). This also leads to the first bicritera approximation algorithm for k-means with (standard) lower bounds where bicriteria is interpreted in the sense that the lower bounds are violated by a constant factor. All algorithms in this paper run in time that is polynomial in n and k (and d for the Euclidean variants considered).

Subject Classification

ACM Subject Classification
  • Theory of computation → Facility location and clustering
Keywords
  • Clustering with Constraints
  • lower Bounds
  • k-Means
  • Anonymity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Marek Adamczyk, Jaroslaw Byrka, Jan Marcinkowski, Syed Mohammad Meesum, and Michal Wlodarczyk. Constant-factor FPT approximation for capacitated k-median. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pages 1:1-1:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ESA.2019.1.
  2. Ankit Aggarwal, Anand Louis, Manisha Bansal, Naveen Garg, Neelima Gupta, Shubham Gupta, and Surabhi Jain. A 3-approximation algorithm for the facility location problem with uniform capacities. Mathematical Programming, 141(1-2):527-547, 2013. URL: https://doi.org/10.1007/s10107-012-0565-4.
  3. Gagan Aggarwal, Rina Panigrahy, Tomás Feder, Dilys Thomas, Krishnaram Kenthapadi, Samir Khuller, and An Zhu. Achieving anonymity via clustering. ACM Transactions on Algorithms (TALG), 6(3):49:1-49:19, 2010. URL: https://doi.org/10.1145/1798596.1798602.
  4. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In Proceedings of the 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 61-72, 2017. URL: https://doi.org/10.1109/FOCS.2017.15.
  5. Sara Ahmadian and Chaitanya Swamy. Improved approximation guarantees for lower-bounded facility location. In Proceedings of the 10th International Workshop on Approximation and Online Algorithms (WAOA), pages 257-271, 2012. URL: https://doi.org/10.1007/978-3-642-38016-7_21.
  6. Sara Ahmadian and Chaitanya Swamy. Approximation algorithms for clustering problems with lower bounds and outliers. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, (ICALP), pages 69:1-69:15, 2016. URL: https://doi.org/10.4230/LIPIcs.ICALP.2016.69.
  7. Anna Arutyunova and Melanie Schmidt. Achieving anonymity via weak lower bound constraints for k-median and k-means. URL: http://arxiv.org/abs/2009.03078v2.
  8. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG), pages 754-767, 2015. URL: https://doi.org/10.4230/LIPIcs.SOCG.2015.754.
  9. Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Transaction on Algorithms (TALG), 13(2):23:1-23:31, 2017. URL: https://doi.org/10.1145/2981561.
  10. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In Proceedings of the 46th International Colloqium on Automata, Languages, and Programming (ICALP), pages 42:1-42:14, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.42.
  11. Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science (TCS), 38:293-306, 1985. URL: https://doi.org/10.1016/0304-3975(85)90224-5.
  12. Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, 31(1):228-248, 1999. URL: https://doi.org/10.1006/jagm.1998.0993.
  13. Sudipto Guha, Adam Meyerson, and Kamesh Munagala. Hierarchical placement and network design problems. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 603-612, 2000. URL: https://doi.org/10.1109/SFCS.2000.892328.
  14. Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM, 33(3):533-550, 1986. URL: https://doi.org/10.1145/5925.5933.
  15. Wen-Lian Hsu and George L. Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics (DAM), 1(3):209-215, 1979. URL: https://doi.org/10.1016/0166-218X(79)90044-1.
  16. Tanmay Inamdar and Kasturi Varadarajan. Capacitated sum-of-radii clustering: An FPT approximation. In Proceedings of the 28th Annual European Symposium on Algorithms (ESA), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 62:1-62:17, 2020. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.62.
  17. Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. Journal of the ACM, 50(6):795-824, 2003. URL: https://doi.org/10.1145/950620.950621.
  18. Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 731-740, 2002. URL: https://doi.org/10.1145/509907.510012.
  19. Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM, 48(2):274-296, 2001. URL: https://doi.org/10.1145/375827.375845.
  20. David R. Karger and Maria Minkoff. Building Steiner trees with incomplete global knowledge. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 613-623, 2000. URL: https://doi.org/10.1109/SFCS.2000.892329.
  21. Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Information Processing Letters (IPL), 120:40-43, 2017. URL: https://doi.org/10.1016/j.ipl.2016.11.009.
  22. Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Information and Computation, 222:45-58, 2013. URL: https://doi.org/10.1016/j.ic.2012.01.007.
  23. Shi Li. On facility location with general lower bounds. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2279-2290, 2019. URL: https://doi.org/10.1137/1.9781611975482.138.
  24. Guolong Lin, Chandrashekhar Nagarajan, Rajmohan Rajaraman, and David P. Williamson. A general approach for incremental approximation and hierarchical clustering. SIAM Journal on Computing (SICOMP), 39(8):3633-3669, 2010. URL: https://doi.org/10.1137/070698257.
  25. Richard Matthew McCutchen and Samir Khuller. Streaming algorithms for k-center clustering with outliers and with anonymity. In Proceedings of the 11th International Workshop on Approximation, Randomization and Combinatorial Optimization (APPROX), pages 165-178, 2008. URL: https://doi.org/10.1007/978-3-540-85363-3_14.
  26. Zoya Svitkina. Lower-bounded facility location. ACM Transactions on Algorithms (TALG), 6(4):69, 2010. URL: https://doi.org/10.1145/1824777.1824789.
  27. Jens Vygen. Lecture notes - approximation algorithms for facility location problems, 2004/2005. accessed May 8th, 2019. URL: http://gett.or.uni-bonn.de/~vygen/files/fl.pdf.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail