We devise ε-coresets for robust (k,z)-Clustering with m outliers through black-box reductions to vanilla clustering. Given an ε-coreset construction for vanilla clustering with size N, we construct coresets of size N⋅ polylog(kmε^{-1}) + O_z(min{kmε^{-1}, m ε^{-2z}log^z(kmε^{-1})}) for various metric spaces, where O_z hides 2^{O(zlog z)} factors. This increases the size of the vanilla coreset by a small multiplicative factor of polylog(kmε^{-1}), and the additive term is up to a (ε^{-1}log (km))^{O(z)} factor to the size of the optimal robust coreset. Plugging in recent vanilla coreset results of [Cohen-Addad, Saulpic and Schwiegelshohn, STOC'21; Cohen-Addad, Draganov, Russo, Saulpic and Schwiegelshohn, SODA'25], we obtain the first coresets for (k,z)-Clustering with m outliers with size near-linear in k while previous results have size at least Ω(k²) [Huang, Jiang, Lou and Wu, ICLR'23; Huang, Li, Lu and Wu, SODA'25]. Technically, we establish two conditions under which a vanilla coreset is as well a robust coreset. The first condition requires the dataset to satisfy special structures - it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only O_z(km ε^{-1}) non-dense points to obtain the O_z(km ε^{-1}) additive bound. Another sufficient condition requires the vanilla coreset to possess an extra size-preserving property. To utilize this condition, we further give a black-box reduction that turns a vanilla coreset to the one that satisfies the said size-preserving property, and this leads to the alternative O_z(mε^{-2z}log^{z}(kmε^{-1})) additive size bound. We also give low-space implementations of our reductions in the dynamic streaming setting. Combined with known streaming constructions for vanilla coresets [Braverman, Frahling, Lang, Sohler and Yang, ICML'17; Hu, Song, Yang and Zhong, arXiv'1802.00459], we obtain the first dynamic streaming algorithms for coresets for k-Median (and k-Means) with m outliers, using space Õ(k + m) ⋅ poly(dε^{-1}log Δ) for inputs on a discrete grid [Δ]^d.
@InProceedings{jiang_et_al:LIPIcs.ICALP.2025.101, author = {Jiang, Shaofeng H.-C. and Lou, Jianing}, title = {{Coresets for Robust Clustering via Black-Box Reductions to Vanilla Case}}, booktitle = {52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)}, pages = {101:1--101:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-372-0}, ISSN = {1868-8969}, year = {2025}, volume = {334}, editor = {Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.101}, URN = {urn:nbn:de:0030-drops-234781}, doi = {10.4230/LIPIcs.ICALP.2025.101}, annote = {Keywords: Coresets, clustering, outliers, streaming algorithms} }
Feedback for Dagstuhl Publishing