Document Open Access Logo

Coresets for Clustering in Geometric Intersection Graphs

Authors Sayan Bandyapadhyay , Fedor V. Fomin , Tanmay Inamdar



PDF
Thumbnail PDF

File

LIPIcs.SoCG.2023.10.pdf
  • Filesize: 0.88 MB
  • 16 pages

Document Identifiers

Author Details

Sayan Bandyapadhyay
  • Portland State University, OR, USA
Fedor V. Fomin
  • University of Bergen, Norway
Tanmay Inamdar
  • University of Bergen, Norway

Cite AsGet BibTex

Sayan Bandyapadhyay, Fedor V. Fomin, and Tanmay Inamdar. Coresets for Clustering in Geometric Intersection Graphs. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 10:1-10:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.SoCG.2023.10

Abstract

Designing coresets - small-space sketches of the data preserving cost of the solutions within (1± ε)-approximate factor - is an important research direction in the study of center-based k-clustering problems, such as k-means or k-median. Feldman and Langberg [STOC'11] have shown that for k-clustering of n points in general metrics, it is possible to obtain coresets whose size depends logarithmically in n. Moreover, such a dependency in n is inevitable in general metrics. A significant amount of recent work in the area is devoted to obtaining coresests whose sizes are independent of n for special metrics, like d-dimensional Euclidean space [Huang, Vishnoi, STOC'20], doubling metrics [Huang, Jiang, Li, Wu, FOCS'18], metrics of graphs of bounded treewidth [Baker, Braverman, Huang, Jiang, Krauthgamer, Wu, ICML’20], or graphs excluding a fixed minor [Braverman, Jiang, Krauthgamer, Wu, SODA’21]. In this paper, we provide the first constructions of coresets whose size does not depend on n for k-clustering in the metrics induced by geometric intersection graphs. For example, we obtain (k log²k)/ε^𝒪(1) size coresets for k-clustering in Euclidean-weighted unit-disk graphs (UDGs) and unit-square graphs (USGs). These constructions follow from a general theorem that identifies two canonical properties of a graph metric sufficient for obtaining coresets whose size is independent of n. The proof of our theorem builds on the recent work of Cohen-Addad, Saulpic, and Schwiegelshohn [STOC '21], which ensures small-sized coresets conditioned on the existence of an interesting set of centers, called centroid set. The main technical contribution of our work is the proof of the existence of such a small-sized centroid set for graphs that satisfy the two canonical properties. Loosely speaking, the metrics of geometric intersection graphs are "similar" to the Euclidean metrics for points that are close, and to the shortest path metrics of planar graphs for points that are far apart. The main technical challenge in constructing centroid sets of small sizes is in combining these two very different metrics. The new coreset construction helps to design the first (1+ε)-approximation for center-based clustering problems in UDGs and USGs, that is fixed-parameter tractable in k and ε (FPT-AS).

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
  • Theory of computation → Facility location and clustering
  • Theory of computation → Sparsification and spanners
Keywords
  • k-median
  • k-means
  • clustering
  • coresets
  • geometric graphs

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 61-72. IEEE Computer Society, 2017. Google Scholar
  2. Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for euclidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 106-113, New York, NY, USA, 1998. ACM. URL: https://doi.org/10.1145/276698.276718.
  3. Daniel N. Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in graphs of bounded treewidth. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 569-579. PMLR, 2020. URL: http://proceedings.mlr.press/v119/baker20a.html.
  4. Hari Balakrishnan, Christopher L Barrett, VS Anil Kumar, Madhav V Marathe, and Shripad Thite. The distance-2 matching problem and its relationship to the mac-layer capacity of ad hoc wireless networks. IEEE Journal on Selected Areas in Communications, 22(6):1069-1079, 2004. Google Scholar
  5. Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the johnson-lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 1039-1050, 2019. Google Scholar
  6. Amariah Becker, Philip N Klein, and David Saulpic. Polynomial-time approximation schemes for k-center, k-median, and capacitated vehicle routing in bounded highway dimension. In 26th Annual European Symposium on Algorithms (ESA 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. Google Scholar
  7. Ahmad Biniaz. Plane hop spanners for unit disk graphs: Simpler and better. Comput. Geom., 89:101622, 2020. URL: https://doi.org/10.1016/j.comgeo.2020.101622.
  8. Vladimir Braverman, Shaofeng H-C Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in excluded-minor graphs and beyond. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2679-2696. SIAM, 2021. Google Scholar
  9. Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1-23:31, 2017. Google Scholar
  10. Timothy M. Chan and Dimitrios Skrepetos. Approximate shortest paths and distance oracles in weighted unit-disk graphs. J. Comput. Geom., 10(2):3-20, 2019. URL: https://doi.org/10.20382/jocg.v10i2a2.
  11. Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 1-10, 1999. Google Scholar
  12. Moses Charikar and Shi Li. A dependent lp-rounding approach for the k-median problem. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, volume 7391 of Lecture Notes in Computer Science, pages 194-205. Springer, 2012. Google Scholar
  13. Ke Chen. On coresets for k-median and k-means clustering in metric and euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. Google Scholar
  14. Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. Journal of the ACM (JACM), 68(6):1-34, 2021. Google Scholar
  15. Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 42:1-42:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.42.
  16. Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. SIAM J. Comput., 48(2):644-667, 2019. URL: https://doi.org/10.1137/17M112717X.
  17. Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, and Chris Schwiegelshohn. Towards optimal lower bounds for k-median and k-means coresets. In Stefano Leonardi and Anupam Gupta, editors, STOC '22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 1038-1051. ACM, 2022. URL: https://doi.org/10.1145/3519935.3519946.
  18. Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset framework for clustering. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC '21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 169-182. ACM, 2021. URL: https://doi.org/10.1145/3406325.3451022.
  19. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Lance Fortnow and Salil P. Vadhan, editors, Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569-578. ACM, 2011. URL: https://doi.org/10.1145/1993636.1993712.
  20. Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 569-578, 2011. Google Scholar
  21. Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, pca, and projective clustering. SIAM Journal on Computing, 49(3):601-657, 2020. Google Scholar
  22. Gereon Frahling and Christian Sohler. Coresets in dynamic geometric data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 209-217, 2005. Google Scholar
  23. Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. SIAM J. Comput., 48(2):452-480, 2019. URL: https://doi.org/10.1137/17M1127181.
  24. Jie Gao and Li Zhang. Well-separated pair decomposition for the unit-disk graph metric and its applications. SIAM Journal on Computing, 35(1):151-169, 2005. Google Scholar
  25. Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discret. Comput. Geom., 37(1):3-19, 2007. Google Scholar
  26. Sariel Har-Peled and Soham Mazumdar. On coresets for k-means and k-median clustering. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 291-300. ACM, 2004. Google Scholar
  27. Lingxiao Huang, Shaofeng H-C Jiang, Jian Li, and Xuan Wu. Epsilon-coresets for clustering (with outliers) in doubling metrics. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 814-825. IEEE, 2018. Google Scholar
  28. Lingxiao Huang and Nisheeth K. Vishnoi. Coresets for clustering in euclidean spaces: importance sampling is nearly optimal. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 1416-1429. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384296.
  29. Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274-296, 2001. Google Scholar
  30. Ken-ichi Kawarabayashi, Christian Sommer, and Mikkel Thorup. More compact oracles for approximate distances in undirected planar graphs. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 550-563. SIAM, 2013. URL: https://doi.org/10.1137/1.9781611973105.40.
  31. Fabian Kuhn, Tim Nieberg, Thomas Moscibroda, and Rogert Wattenhofer. Local approximation schemes for ad hoc and sensor networks. In Proceedings of the 2005 joint workshop on Foundations of mobile computing, pages 97-103, 2005. Google Scholar
  32. Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1-5:32, 2010. Google Scholar
  33. Michael Langberg and Leonard J Schulman. Universal ε-approximators for integrals. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 598-607. SIAM, 2010. Google Scholar
  34. Emmanuelle Lebhar and Zvi Lotker. Unit disk graph and physical interference model: Putting pieces together. In 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1-8. IEEE, 2009. Google Scholar
  35. Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM J. Comput., 45(2):530-547, 2016. Google Scholar
  36. Xiang-Yang Li, Gruia Calinescu, and Peng-Jun Wan. Distributed construction of a planar spanner and routing for ad hoc wireless networks. In Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1268-1277. IEEE, 2002. Google Scholar
  37. Xiang-Yang Li, Wen-Zhan Song, and Yu Wang. Efficient topology control for ad-hoc wireless networks with non-uniform transmission ranges. Wireless Networks, 11(3):255-264, 2005. Google Scholar
  38. Frank Schulz. Modeling sensor and ad hoc networks. In Algorithms for Sensor and Ad Hoc Networks, pages 21-36. Springer, 2007. Google Scholar
  39. Amin Shahraki, Amir Taherkordi, Øystein Haugen, and Frank Eliassen. Clustering objectives in wireless sensor networks: A survey and research direction analysis. Computer Networks, 180:107376, 2020. Google Scholar
  40. Christian Sohler and David P Woodruff. Strong coresets for k-median and subspace approximation: Goodbye dimension. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 802-813. IEEE, 2018. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail