
APPROX

**Published in:** LIPIcs, Volume 176, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)

We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, namely intervals and disks. Let α be the cardinality of the largest independent set. Our goal is to estimate α in a small amount of space, given that the input is received as a one-pass stream. We also consider a generalization of this problem in which each object is assigned a weight, and the goal is to estimate β, the maximum total weight of an independent set. We initiate the study of this problem in the turnstile streaming model (insertions and deletions) and provide the first algorithms for estimating α and β.
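To make the two target quantities concrete, the following is a minimal offline sketch (not a streaming algorithm, and not the authors' method) that computes α and β exactly for intervals when the whole input is available. It uses the classical greedy rule (earliest right endpoint first) for α and weighted interval scheduling dynamic programming for β. It assumes closed intervals, so two intervals that share an endpoint are not disjoint; the function names are hypothetical.

```python
from bisect import bisect_left


def max_independent_intervals(intervals):
    """Exact alpha for closed intervals given as (left, right) pairs."""
    count = 0
    last_end = float("-inf")
    # Greedy: scan by right endpoint, keep an interval if it starts after the last kept one ends.
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_end:
            count += 1
            last_end = right
    return count


def max_weight_independent_intervals(intervals):
    """Exact beta for closed intervals given as (left, right, weight) triples."""
    ivs = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ivs]
    best = [0.0] * (len(ivs) + 1)  # best[i] = optimum using only the first i intervals
    for i, (left, _right, weight) in enumerate(ivs):
        # j = number of intervals whose right endpoint is strictly less than `left`,
        # i.e. the prefix of intervals that are disjoint from interval i.
        j = bisect_left(ends, left)
        best[i + 1] = max(best[i], best[j] + weight)
    return best[-1]
```

For example, `max_independent_intervals([(0, 1), (0.5, 1.5), (2, 3)])` returns 2, while giving the middle interval weight 5 and the others weight 1 makes `max_weight_independent_intervals` return 6.0. The streaming results above are about approximating these values in small space under insertions and deletions, which this offline baseline does not attempt.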
For unit-length intervals, we obtain a (2+ε)-approximation to α and β in poly(log(n)/ε) space. We also show a matching lower bound. Combined with the 3/2-approximation for insertion-only streams by Cabello and Pérez-Lantero [Cabello and Pérez-Lantero, 2017], our result implies a separation between the insertion-only and turnstile models. For unit-radius disks, we obtain an (8√3/π)-approximation to α and β in poly(log(n)/ε) space, which is closely related to the hexagonal circle packing constant.
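As a hedged illustration of why a factor close to 2 is natural for unit intervals under insertions and deletions, the sketch below assigns each unit interval [a, a+1] to the grid cell ⌊a⌋ and keeps a counter per cell. Intervals in cells whose indices differ by at least 2 are disjoint, so picking one interval from every nonempty even-indexed cell (or every odd-indexed cell) yields an independent set; and two disjoint closed unit intervals cannot start in the same cell. Hence the returned value E satisfies E ≤ α ≤ 2E. This is a simplified example, not the authors' (2+ε) algorithm: it stores exact counters (linear space) instead of a poly(log(n)/ε)-space sketch, it handles only α, and the class name is hypothetical.

```python
import math
from collections import Counter


class UnitIntervalCellCounter:
    """Hypothetical turnstile illustration for unit intervals [a, a+1].

    Keeps exact per-cell counts, so space is linear in the number of
    nonempty cells; a small-space version would replace them with a sketch.
    Assumes a valid turnstile stream (only previously inserted intervals are deleted).
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, a, delta):
        """delta = +1 inserts [a, a+1]; delta = -1 deletes it."""
        cell = math.floor(a)
        self.counts[cell] += delta
        if self.counts[cell] == 0:
            del self.counts[cell]  # keep only nonempty cells

    def estimate(self):
        """Return E with E <= alpha <= 2E."""
        even = sum(1 for cell in self.counts if cell % 2 == 0)
        odd = len(self.counts) - even
        return max(even, odd)
```

Inserting intervals starting at 0.2, 0.7, 1.5 and 2.1 fills cells 0, 1 and 2, so the estimate is 2, which here equals α; deleting the interval at 1.5 empties cell 1 and the estimate stays 2.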
Finally, we provide algorithms for estimating α for arbitrary-length intervals under a bounded intersection assumption and study the parameterized space complexity of estimating α and β, where the parameter is the ratio of maximum to minimum interval length.

Ainesh Bakshi, Nadiia Chepurko, and David P. Woodruff. Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 176, pp. 64:1-64:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


@InProceedings{bakshi_et_al:LIPIcs.APPROX/RANDOM.2020.64, author = {Bakshi, Ainesh and Chepurko, Nadiia and Woodruff, David P.}, title = {{Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)}, pages = {64:1--64:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-164-1}, ISSN = {1868-8969}, year = {2020}, volume = {176}, editor = {Byrka, Jaros{\l}aw and Meka, Raghu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2020.64}, URN = {urn:nbn:de:0030-drops-126679}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.64}, annote = {Keywords: Weighted Maximum Independent Set, Geometric Graphs, Turnstile Streams} }


Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)

In this work, we study the k-median and k-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been much work on these problems for worst-case instances, we focus on gaining a finer understanding through the lens of beyond worst-case analysis. Our main motivation is the following: for many applications, such as clustering proteins by function or clustering communities in a social network, there is some unknown target clustering, and the hope is that running a k-median or k-means algorithm will produce clusterings that closely match the target clustering. Worst-case results can guarantee constant-factor approximations to the optimal k-median or k-means objective value, but not closeness to the target clustering.
Our first result is a distributed algorithm that returns a near-optimal clustering assuming a natural notion of stability, namely approximation stability [Awasthi and Balcan, 2014], even when a constant fraction of the data are outliers. The communication complexity is Õ(sk+z), where s is the number of machines, k is the number of clusters, and z is the number of outliers. Next, we show that this amount of communication cannot be improved even when the input satisfies various non-worst-case assumptions. We give a matching Ω(sk+z) lower bound on the communication required both for approximating the optimal k-means or k-median cost up to any constant factor, and for returning a clustering that is close to the target clustering in Hamming distance. These lower bounds hold even when the data satisfies approximation stability or other common notions of stability, and the cluster sizes are balanced. Therefore, Ω(sk+z) is a communication bottleneck, even for real-world instances.
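To clarify the role of z, the sketch below evaluates one common formulation of the clustering objective with outliers: each point pays its distance to the nearest center (squared for k-means), and the z most expensive points are discarded as outliers. This is an assumption for illustration; the paper's exact formulation may differ, the function name is hypothetical, and the code only evaluates a given set of centers rather than implementing the distributed algorithm.

```python
import math


def clustering_cost_with_outliers(points, centers, z, power=1):
    """Cost of `centers` on `points` after discarding the z worst-served points.

    power=1 gives the k-median objective; power=2 gives the k-means objective.
    """
    # Cost of each point: distance to its nearest center, raised to `power`.
    costs = sorted(min(math.dist(p, c) for c in centers) ** power for p in points)
    # The z largest costs are treated as outliers and ignored.
    return sum(costs[: max(len(costs) - z, 0)])
```

With points (0,0), (1,0), (10,0), (100,0) and centers (0,0), (10,0), the far point (100,0) costs 90; setting z = 1 discards it, so the k-median cost drops from 91.0 to 1.0.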

Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, and David P. Woodruff. Robust Communication-Optimal Distributed Clustering Algorithms. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{awasthi_et_al:LIPIcs.ICALP.2019.18, author = {Awasthi, Pranjal and Bakshi, Ainesh and Balcan, Maria-Florina and White, Colin and Woodruff, David P.}, title = {{Robust Communication-Optimal Distributed Clustering Algorithms}}, booktitle = {46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)}, pages = {18:1--18:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-109-2}, ISSN = {1868-8969}, year = {2019}, volume = {132}, editor = {Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.18}, URN = {urn:nbn:de:0030-drops-105942}, doi = {10.4230/LIPIcs.ICALP.2019.18}, annote = {Keywords: robust distributed clustering, communication complexity} }
