Found 2 Possible Name Variants:

Document

Invited Talk

**Published in:** LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)

Typical worst case analysis of algorithms has led to a rich theory, but suffers from many pitfalls.
This has inspired several approaches to bypass worst case analysis.
In this talk, I will describe two vignettes from recent work in this realm.
In the first part of the talk, I will discuss tensor decomposition -- a natural higher dimensional analog of matrix decomposition.
Low rank tensor decompositions have proved to be a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. Yet, they pose significant challenges for algorithm design as most problems about tensors
are NP-hard. I will discuss a smoothed analysis framework for analyzing algorithms for tensor decomposition which models realistic instances of learning problems and allows us to overcome many of the usual limitations of using tensor methods.
In the second part of the talk, I will explore the phenomenon of convex relaxations returning integer solutions. Clearly this is not true in the worst case. I will discuss instances of discrete optimization problems where, for a suitable distribution on inputs, LP and SDP relaxations produce integer solutions with high probability. This has been studied in the context of LP decoding, sparse recovery, stochastic block models and so on. I will mention some recent results for clustering problems: when points are drawn from a distribution over k sufficiently separated clusters, the well known k-median relaxation and a (not so well known) SDP relaxation for k-means exactly recover the clusters.

Moses S. Charikar. Bypassing Worst Case Analysis: Tensor Decomposition and Clustering (Invited Talk). In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{charikar:LIPIcs.FSTTCS.2015.1, author = {Charikar, Moses S.}, title = {{Bypassing Worst Case Analysis: Tensor Decomposition and Clustering}}, booktitle = {35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)}, pages = {1--1}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-97-2}, ISSN = {1868-8969}, year = {2015}, volume = {45}, editor = {Harsha, Prahladh and Ramalingam, G.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.1}, URN = {urn:nbn:de:0030-drops-56647}, doi = {10.4230/LIPIcs.FSTTCS.2015.1}, annote = {Keywords: tensor decomposition, smoothed analysis, convex relaxations, integrality} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2).

Moses Charikar and Erik Waingarten. Polylogarithmic Sketches for Clustering. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 38:1-38:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2022.38, author = {Charikar, Moses and Waingarten, Erik}, title = {{Polylogarithmic Sketches for Clustering}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {38:1--38:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.38}, URN = {urn:nbn:de:0030-drops-163793}, doi = {10.4230/LIPIcs.ICALP.2022.38}, annote = {Keywords: sketching, clustering} }

Document

Extended Abstract

**Published in:** LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)

We introduce a model for ant trail formation, building upon previous work on biologically feasible local algorithms that plausibly describe how ants maintain trail networks. The model is a variant of a reinforced random walk on a directed graph, where ants lay pheromone on edges as they traverse them and the next edge to traverse is chosen based on the level of pheromone; this pheromone decays with time. There is a bidirectional flow of ants in the network: the forward flow proceeds along forward edges from source (e.g. the nest) to sink (e.g. a food source), and the backward flow in the opposite direction. Some fraction of ants are lost as they pass through each node (modeling the loss of ants due to exploration observed in the field). We initiate a theoretical study of this model. We note that ant navigation has inspired the field of ant colony optimization, heuristics that have been applied to several combinatorial optimization problems; however the algorithms developed there are considerably more complex and not constrained to being biologically feasible.
We first consider the linear decision rule, where the flow divides itself among the next set of edges in proportion to their pheromone level. Here, we show that the process converges to the path with minimum leakage when the forward and backward flows do not change over time. On the other hand, when the forward and backward flows increase over time (caused by positive reinforcement from the discovery of a food source, for example), we show that the process converges to the shortest path. These results are for graphs consisting of two parallel paths (a case that has been investigated before in experiments). Through simulations, we show that these results hold for more general graphs drawn from various random graph models; proving this convergence in the general case is an interesting open problem. Further, to understand the behaviour of other decision rules beyond the linear rule, we consider a general family of decision rules. For this family, we show that there is no advantage of using a non-linear decision rule, if the goal is to find the shortest or the minimum leakage path. We also show that bidirectional flow is necessary for convergence to such paths. Our results provide a plausible explanation for field observations, and open up new avenues for further theoretical and experimental investigation.

Moses Charikar, Shivam Garg, Deborah M. Gordon, and Kirankumar Shiragur. A Model for Ant Trail Formation and its Convergence Properties (Extended Abstract). In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 85:1-85:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ITCS.2021.85, author = {Charikar, Moses and Garg, Shivam and Gordon, Deborah M. and Shiragur, Kirankumar}, title = {{A Model for Ant Trail Formation and its Convergence Properties}}, booktitle = {12th Innovations in Theoretical Computer Science Conference (ITCS 2021)}, pages = {85:1--85:2}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-177-1}, ISSN = {1868-8969}, year = {2021}, volume = {185}, editor = {Lee, James R.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.85}, URN = {urn:nbn:de:0030-drops-136247}, doi = {10.4230/LIPIcs.ITCS.2021.85}, annote = {Keywords: Ant colonies, Reinforced random walks, Natural Algorithms, Graph Algorithms, Shortest Path, Distributed Algorithms} }

Document

**Published in:** LIPIcs, Volume 129, 35th International Symposium on Computational Geometry (SoCG 2019)

We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using O~(n/alpha) bits for the problem of computing an alpha-approximation for DTW between strings x and y of length n, and we prove a lower bound of Omega(n / alpha) bits for the same problem. Our communication protocol works for strings over an arbitrary metric of polynomial size and aspect ratio, and we optimize the logarithmic factors depending on properties of the underlying metric, such as when the points are low-dimensional integer vectors equipped with various metrics or have bounded doubling dimension. We also consider linear sketches of DTW, showing that such sketches must have size Omega(n).

Vladimir Braverman, Moses Charikar, William Kuszmaul, David P. Woodruff, and Lin F. Yang. The One-Way Communication Complexity of Dynamic Time Warping Distance. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.SoCG.2019.16, author = {Braverman, Vladimir and Charikar, Moses and Kuszmaul, William and Woodruff, David P. and Yang, Lin F.}, title = {{The One-Way Communication Complexity of Dynamic Time Warping Distance}}, booktitle = {35th International Symposium on Computational Geometry (SoCG 2019)}, pages = {16:1--16:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-104-7}, ISSN = {1868-8969}, year = {2019}, volume = {129}, editor = {Barequet, Gill and Wang, Yusu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2019.16}, URN = {urn:nbn:de:0030-drops-104203}, doi = {10.4230/LIPIcs.SoCG.2019.16}, annote = {Keywords: dynamic time warping, one-way communication complexity, tree metrics} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

The state-of-the-art algorithm for maintaining an approximate maximum matching in fully dynamic graphs has a polynomial worst-case update time, even for poor approximation guarantees. Bhattacharya, Henzinger and Nanongkai showed how to maintain a constant approximation to the minimum vertex cover, and thus also a constant-factor estimate of the maximum matching size, with polylogarithmic worst-case update time. Later (in SODA'17 Proc.) they improved the approximation to 2+epsilon. Nevertheless, the fundamental problem of maintaining an approximate matching with sub-polynomial worst-case time bounds remained open.
We present a randomized algorithm for maintaining an almost-maximal matching in fully dynamic graphs with polylogarithmic worst-case update time. Such a matching provides (2+epsilon)-approximations for both maximum matching and minimum vertex cover, for any epsilon > 0. The worst-case update time of our algorithm, O(poly(log n,epsilon^{-1})), holds deterministically, while the almost-maximality guarantee holds with high probability. Our result was done independently of the (2+epsilon)-approximation result of Bhattacharya et al., thus settling the aforementioned problem on dynamic matchings and providing essentially the best possible approximation guarantee for dynamic vertex cover (assuming the unique games conjecture).
To prove this result, we exploit a connection between the standard oblivious adversarial model, which can be viewed as inherently "online", and an "offline" model where some (limited) information on the future can be revealed efficiently upon demand. Our randomized algorithm is derived from a deterministic algorithm in this offline model. This approach gives an elegant way to analyze randomized dynamic algorithms, and is of independent interest.

Moses Charikar and Shay Solomon. Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Worst-Case Time Barrier. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2018.33, author = {Charikar, Moses and Solomon, Shay}, title = {{Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Worst-Case Time Barrier}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {33:1--33:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.33}, URN = {urn:nbn:de:0030-drops-90370}, doi = {10.4230/LIPIcs.ICALP.2018.33}, annote = {Keywords: dynamic graph algorithms, maximum matching, worst-case bounds} }

Document

**Published in:** LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)

Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually producing an alignment (i.e., the sequence of edits) has received far less attention. Somewhat surprisingly, we show that any algorithm to estimate edit distance can be used in a black-box fashion to produce an approximate alignment of strings, with modest loss in approximation factor and small loss in run time. Plugging in the result of Andoni, Krauthgamer, and Onak, we obtain an alignment that is a (log n)^{O(1/epsilon^2)} approximation in time O~(n^{1 + epsilon}).
Closely related to the study of approximation algorithms is the study of metric embeddings for edit distance. We show that min-hash techniques can be useful in designing edit distance embeddings through three results: (1) An embedding from Ulam distance (edit distance over permutations) to Hamming space that matches the best known distortion of O(log n) and also implicitly encodes a sequence of edits between the strings; (2) In the case where the edit distance between the input strings is known to have an upper bound K, we show that embeddings of edit distance into Hamming space with distortion f(n) can be modified in a black-box fashion to give distortion O(f(poly(K))) for a class of periodic-free strings; (3) A randomized dimension-reduction map with contraction c and asymptotically optimal expected distortion O(c), improving on the previous O~(c^{1 + 2 / log log log n}) distortion result of Batu, Ergun, and Sahinalp.

Moses Charikar, Ofir Geri, Michael P. Kim, and William Kuszmaul. On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 34:1-34:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2018.34, author = {Charikar, Moses and Geri, Ofir and Kim, Michael P. and Kuszmaul, William}, title = {{On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings}}, booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)}, pages = {34:1--34:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-076-7}, ISSN = {1868-8969}, year = {2018}, volume = {107}, editor = {Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.34}, URN = {urn:nbn:de:0030-drops-90383}, doi = {10.4230/LIPIcs.ICALP.2018.34}, annote = {Keywords: edit distance, alignment, approximation algorithms, embedding, dimension reduction} }

Document

**Published in:** LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in p-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded 1st moments to a stable "core" with bounded 2nd moments, which may be of independent interest.

Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 45:1-45:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{steinhardt_et_al:LIPIcs.ITCS.2018.45, author = {Steinhardt, Jacob and Charikar, Moses and Valiant, Gregory}, title = {{Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers}}, booktitle = {9th Innovations in Theoretical Computer Science Conference (ITCS 2018)}, pages = {45:1--45:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-060-6}, ISSN = {1868-8969}, year = {2018}, volume = {94}, editor = {Karlin, Anna R.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.45}, URN = {urn:nbn:de:0030-drops-83687}, doi = {10.4230/LIPIcs.ITCS.2018.45}, annote = {Keywords: robust learning, outliers, stochastic block models, p-norm estimation} }

Document

**Published in:** LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)

In the min-cost bipartite perfect matching with delays (MBPMD) problem, requests arrive online at points of a finite metric space. Each request is either positive or negative and has to be matched to a request of opposite polarity. As opposed to traditional online matching problems, the algorithm does not have to serve requests as they arrive, and may choose to match them later at a cost. Our objective is to minimize the sum of the distances between matched pairs of requests (the connection cost) and the sum of the waiting times of the requests (the delay cost). This objective exhibits a natural tradeoff between minimizing the distances and the cost of waiting for better matches. This tradeoff appears in many real-life scenarios, notably, ride-sharing platforms. MBPMD is related to its non-bipartite variant, min-cost perfect matching with delays (MPMD), in which each request can be matched to any other request. MPMD was introduced by Emek et al. (STOC'16), who showed an O(log^2(n)+log(Delta))-competitive randomized algorithm on n-point metric spaces with aspect ratio Delta.
Our contribution is threefold. First, we present a new lower bound construction for MPMD and MBPMD. We get a lower bound of Omega(sqrt(log(n)/log(log(n)))) on the competitive ratio of any randomized algorithm for MBPMD. For MPMD, we improve the lower bound from Omega(sqrt(log(n))) (shown by Azar et al., SODA'17) to Omega(log(n)/log(log(n))), thus, almost matching their upper bound of O(log(n)). Second, we adapt the algorithm of Emek et al. to the bipartite case, and provide a simplified analysis that improves the competitive ratio to O(log(n)). The key ingredient of the algorithm is an O(h)-competitive randomized algorithm for MBPMD on weighted trees of height h. Third, we provide an O(h)-competitive deterministic algorithm for MBPMD on weighted trees of height h. This algorithm is obtained by adapting the algorithm for MPMD by Azar et al. to the apparently more complicated bipartite setting.

Itai Ashlagi, Yossi Azar, Moses Charikar, Ashish Chiplunkar, Ofir Geri, Haim Kaplan, Rahul Makhijani, Yuyi Wang, and Roger Wattenhofer. Min-Cost Bipartite Perfect Matching with Delays. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 1:1-1:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{ashlagi_et_al:LIPIcs.APPROX-RANDOM.2017.1, author = {Ashlagi, Itai and Azar, Yossi and Charikar, Moses and Chiplunkar, Ashish and Geri, Ofir and Kaplan, Haim and Makhijani, Rahul and Wang, Yuyi and Wattenhofer, Roger}, title = {{Min-Cost Bipartite Perfect Matching with Delays}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)}, pages = {1:1--1:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-044-6}, ISSN = {1868-8969}, year = {2017}, volume = {81}, editor = {Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.1}, URN = {urn:nbn:de:0030-drops-75509}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2017.1}, annote = {Keywords: online algorithms with delayed service, bipartite matching, competitive analysis} }

Document

**Published in:** LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)

We study the Target Set Selection (TSS) problem introduced by Kempe, Kleinberg, and Tardos (2003). This problem models the propagation of influence in a network, in a sequence of rounds. A set of nodes is made "active" initially. In each subsequent round, a vertex is activated if at least a certain number of its neighbors are (already) active. In the minimization version, the goal is to activate a small set of vertices initially - a seed, or target, set - so that activation spreads to the entire graph. In the absence of a sublinear-factor algorithm for the general version, we provide a (sublinear) approximation algorithm for the bounded-round version, where the goal is to activate all the vertices in r rounds. Assuming a known conjecture on the hardness of Planted Dense Subgraph, we establish hardness-of-approximation results for the bounded-round version. We show that they translate to general Target Set Selection, leading to a hardness factor of n^(1/2-epsilon) for all epsilon > 0. This is the first polynomial hardness result for Target Set Selection, and the strongest conditional result known for a large class of monotone satisfiability problems. In the maximization version of TSS, the goal is to pick a target set of size k so as to maximize the number of nodes eventually active. We show an n^(1-epsilon) hardness result for the undirected maximization version of the problem, thus establishing that the undirected case is as hard as the directed case. Finally, we demonstrate an SETH lower bound for the exact computation of the optimal seed set.

Moses Charikar, Yonatan Naamad, and Anthony Wirth. On Approximating Target Set Selection. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.APPROX-RANDOM.2016.4, author = {Charikar, Moses and Naamad, Yonatan and Wirth, Anthony}, title = {{On Approximating Target Set Selection}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)}, pages = {4:1--4:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-018-7}, ISSN = {1868-8969}, year = {2016}, volume = {60}, editor = {Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.4}, URN = {urn:nbn:de:0030-drops-66274}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2016.4}, annote = {Keywords: target set selection, influence propagation, approximation algorithms, hardness of approximation, planted dense subgraph} }

Document

**Published in:** LIPIcs, Volume 34, 31st International Symposium on Computational Geometry (SoCG 2015)

The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d.
In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 754-767, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.SOCG.2015.754, author = {Awasthi, Pranjal and Charikar, Moses and Krishnaswamy, Ravishankar and Sinop, Ali Kemal}, title = {{The Hardness of Approximation of Euclidean k-Means}}, booktitle = {31st International Symposium on Computational Geometry (SoCG 2015)}, pages = {754--767}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-83-5}, ISSN = {1868-8969}, year = {2015}, volume = {34}, editor = {Arge, Lars and Pach, J\'{a}nos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SOCG.2015.754}, URN = {urn:nbn:de:0030-drops-51178}, doi = {10.4230/LIPIcs.SOCG.2015.754}, annote = {Keywords: Euclidean k-means, Hardness of Approximation, Vertex Cover} }