13 Search Results for "Charikar, Moses"


Document
Track A: Algorithms, Complexity and Games
Polylogarithmic Sketches for Clustering

Authors: Moses Charikar and Erik Waingarten

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)


Abstract
Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2).

Cite as

Moses Charikar and Erik Waingarten. Polylogarithmic Sketches for Clustering. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 38:1-38:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2022.38,
  author =	{Charikar, Moses and Waingarten, Erik},
  title =	{{Polylogarithmic Sketches for Clustering}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{38:1--38:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.38},
  URN =		{urn:nbn:de:0030-drops-163793},
  doi =		{10.4230/LIPIcs.ICALP.2022.38},
  annote =	{Keywords: sketching, clustering}
}
Document
Invited Paper
Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper)

Authors: Mikkel Thorup

Published in: LIPIcs, Volume 227, 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)


Abstract
We consider the numerical taxonomy problem of fitting an S× S distance matrix D with a tree metric T. Here T is a weighted tree spanning S where the path lengths in T induce a metric on S. If there is a tree metric matching D exactly, then it is easily found. If there is no exact match, then for some k, we want to minimize the L_k norm of the errors, that is, pick T so as to minimize ‖D-T‖_k = (∑_{i,j ∈ S} |D(i,j)-T(i,j)|^k) ^{1/k}. This problem was raised in biology in the 1960s for k = 1,2. The biological interpretation is that T represents a possible evolution behind the species in S matching some measured distances in D. Sometimes, it is required that T is an ultrametric, meaning that all species are at the same distance from the root. An evolutionary tree induces a hierarchical classification of species and this is not just tied to biology. Medicine, ecology and linguistics are just some of the fields where this concept appears, and it is even an integral part of machine learning and data science. Fundamentally, if we can approximate distances with a tree, then they are much easier to reason about: many questions that are NP-hard for general metrics can be answered in linear time on tree metrics. In fact, humans have appreciated hierarchical classifications at least since Plato and Aristotle (350 BC). The numerical taxonomy problem is important in practice and many heuristics have been proposed. In this talk we will review the basic algorithmic theory, results and techniques, for the problem, including the most recent result from FOCS'21 [Vincent Cohen-Addad et al., 2021]. They paint a varied landscape with big differences between different moments, and with some very nice open problems remaining. - At STOC'93, Farach, Kannan, and Warnow [Martin Farach et al., 1995] proved that under L_∞, we can find the optimal ultrametric. Almost all other variants of the problem are APX-hard. - At SODA'96, Agarwala, Bafna, Farach, Paterson, and Thorup [Richa Agarwala et al., 1999] showed that for any norm L_k, k ≥ 1, if the best ultrametric can be α-approximated, then the best tree metric can be 3α-approximated. In particular, this implied a 3-approximation for tree metrics under L_∞. - At FOCS'05, Ailon and Charikar [Nir Ailon and Moses Charikar, 2011] showed that for any L_k, k ≥ 1, we can get an approximation factor of O(((log n)(log log n))^{1/k}) for both tree and ultrametrics. Their paper was focused on the L₁ norm, and they wrote "Determining whether an O(1) approximation can be obtained is a fascinating question". - At FOCS'21, Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [Vincent Cohen-Addad et al., 2021] showed that indeed a constant factor is possible for L₁ for both tree and ultrametrics. This uses the special structure of L₁ in relation to hierarchies. - The status of L_k is wide open for 1 < k < ∞. All we know is that the approximation factor is between Ω(1) and O((log n)(log log n)).

Cite as

Mikkel Thorup. Reconstructing the Tree of Life (Fitting Distances by Tree Metrics) (Invited Paper). In 18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 227, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{thorup:LIPIcs.SWAT.2022.3,
  author =	{Thorup, Mikkel},
  title =	{{Reconstructing the Tree of Life (Fitting Distances by Tree Metrics)}},
  booktitle =	{18th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2022)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-236-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{227},
  editor =	{Czumaj, Artur and Xin, Qin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SWAT.2022.3},
  URN =		{urn:nbn:de:0030-drops-161631},
  doi =		{10.4230/LIPIcs.SWAT.2022.3},
  annote =	{Keywords: Numerical taxonomy, computational phylogenetics, hierarchical clustering}
}
Document
Extended Abstract
A Model for Ant Trail Formation and its Convergence Properties (Extended Abstract)

Authors: Moses Charikar, Shivam Garg, Deborah M. Gordon, and Kirankumar Shiragur

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)


Abstract
We introduce a model for ant trail formation, building upon previous work on biologically feasible local algorithms that plausibly describe how ants maintain trail networks. The model is a variant of a reinforced random walk on a directed graph, where ants lay pheromone on edges as they traverse them and the next edge to traverse is chosen based on the level of pheromone; this pheromone decays with time. There is a bidirectional flow of ants in the network: the forward flow proceeds along forward edges from source (e.g. the nest) to sink (e.g. a food source), and the backward flow in the opposite direction. Some fraction of ants are lost as they pass through each node (modeling the loss of ants due to exploration observed in the field). We initiate a theoretical study of this model. We note that ant navigation has inspired the field of ant colony optimization, heuristics that have been applied to several combinatorial optimization problems; however the algorithms developed there are considerably more complex and not constrained to being biologically feasible. We first consider the linear decision rule, where the flow divides itself among the next set of edges in proportion to their pheromone level. Here, we show that the process converges to the path with minimum leakage when the forward and backward flows do not change over time. On the other hand, when the forward and backward flows increase over time (caused by positive reinforcement from the discovery of a food source, for example), we show that the process converges to the shortest path. These results are for graphs consisting of two parallel paths (a case that has been investigated before in experiments). Through simulations, we show that these results hold for more general graphs drawn from various random graph models; proving this convergence in the general case is an interesting open problem. Further, to understand the behaviour of other decision rules beyond the linear rule, we consider a general family of decision rules. For this family, we show that there is no advantage of using a non-linear decision rule, if the goal is to find the shortest or the minimum leakage path. We also show that bidirectional flow is necessary for convergence to such paths. Our results provide a plausible explanation for field observations, and open up new avenues for further theoretical and experimental investigation.

Cite as

Moses Charikar, Shivam Garg, Deborah M. Gordon, and Kirankumar Shiragur. A Model for Ant Trail Formation and its Convergence Properties (Extended Abstract). In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 85:1-85:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ITCS.2021.85,
  author =	{Charikar, Moses and Garg, Shivam and Gordon, Deborah M. and Shiragur, Kirankumar},
  title =	{{A Model for Ant Trail Formation and its Convergence Properties}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{85:1--85:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.85},
  URN =		{urn:nbn:de:0030-drops-136247},
  doi =		{10.4230/LIPIcs.ITCS.2021.85},
  annote =	{Keywords: Ant colonies, Reinforced random walks, Natural Algorithms, Graph Algorithms, Shortest Path, Distributed Algorithms}
}
Document
Track A: Algorithms, Complexity and Games
Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation

Authors: William Kuszmaul

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
Dynamic time warping distance (DTW) is a widely used distance measure between time series, with applications in areas such as speech recognition and bioinformatics. The best known algorithms for computing DTW run in near quadratic time, and conditional lower bounds prohibit the existence of significantly faster algorithms. The lower bounds do not prevent a faster algorithm for the important special case in which the DTW is small, however. For an arbitrary metric space Sigma with distances normalized so that the smallest non-zero distance is one, we present an algorithm which computes dtw(x, y) for two strings x and y over Sigma in time O(n * dtw(x, y)). When dtw(x, y) is small, this represents a significant speedup over the standard quadratic-time algorithm. Using our low-distance regime algorithm as a building block, we also present an approximation algorithm which computes dtw(x, y) within a factor of O(n^epsilon) in time O~(n^{2 - epsilon}) for 0 < epsilon < 1. The algorithm allows for the strings x and y to be taken over an arbitrary well-separated tree metric with logarithmic depth and at most exponential aspect ratio. Notably, any polynomial-size metric space can be efficiently embedded into such a tree metric with logarithmic expected distortion. Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^epsilon-approximation in time O~(n^{2 - epsilon}), with high probability. Finally, we turn our attention to the relationship between edit distance and dynamic time warping distance. We prove a reduction from computing edit distance over an arbitrary metric space to computing DTW over the same metric space, except with an added null character (whose distance to a letter l is defined to be the edit-distance insertion cost of l). Applying our reduction to a conditional lower bound of Bringmann and Künnemann pertaining to edit distance over {0, 1}, we obtain a conditional lower bound for computing DTW over a three letter alphabet (with distances of zero and one). This improves on a previous result of Abboud, Backurs, and Williams, who gave a conditional lower bound for DTW over an alphabet of size five. With a similar approach, we also prove a reduction from computing edit distance (over generalized Hamming Space) to computing longest-common-subsequence length (LCS) over an alphabet with an added null character. Surprisingly, this means that one can recover conditional lower bounds for LCS directly from those for edit distance, which was not previously thought to be the case.

Cite as

William Kuszmaul. Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 80:1-80:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kuszmaul:LIPIcs.ICALP.2019.80,
  author =	{Kuszmaul, William},
  title =	{{Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{80:1--80:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.80},
  URN =		{urn:nbn:de:0030-drops-106568},
  doi =		{10.4230/LIPIcs.ICALP.2019.80},
  annote =	{Keywords: dynamic time warping, edit distance, approximation algorithm, tree metrics}
}
Document
The One-Way Communication Complexity of Dynamic Time Warping Distance

Authors: Vladimir Braverman, Moses Charikar, William Kuszmaul, David P. Woodruff, and Lin F. Yang

Published in: LIPIcs, Volume 129, 35th International Symposium on Computational Geometry (SoCG 2019)


Abstract
We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using O~(n/alpha) bits for the problem of computing an alpha-approximation for DTW between strings x and y of length n, and we prove a lower bound of Omega(n / alpha) bits for the same problem. Our communication protocol works for strings over an arbitrary metric of polynomial size and aspect ratio, and we optimize the logarithmic factors depending on properties of the underlying metric, such as when the points are low-dimensional integer vectors equipped with various metrics or have bounded doubling dimension. We also consider linear sketches of DTW, showing that such sketches must have size Omega(n).

Cite as

Vladimir Braverman, Moses Charikar, William Kuszmaul, David P. Woodruff, and Lin F. Yang. The One-Way Communication Complexity of Dynamic Time Warping Distance. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.SoCG.2019.16,
  author =	{Braverman, Vladimir and Charikar, Moses and Kuszmaul, William and Woodruff, David P. and Yang, Lin F.},
  title =	{{The One-Way Communication Complexity of Dynamic Time Warping Distance}},
  booktitle =	{35th International Symposium on Computational Geometry (SoCG 2019)},
  pages =	{16:1--16:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-104-7},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{129},
  editor =	{Barequet, Gill and Wang, Yusu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2019.16},
  URN =		{urn:nbn:de:0030-drops-104203},
  doi =		{10.4230/LIPIcs.SoCG.2019.16},
  annote =	{Keywords: dynamic time warping, one-way communication complexity, tree metrics}
}
Document
Mildly Exponential Time Approximation Algorithms for Vertex Cover, Balanced Separator and Uniform Sparsest Cut

Authors: Pasin Manurangsi and Luca Trevisan

Published in: LIPIcs, Volume 116, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)


Abstract
In this work, we study the trade-off between the running time of approximation algorithms and their approximation guarantees. By leveraging a structure of the "hard" instances of the Arora-Rao-Vazirani lemma [Sanjeev Arora et al., 2009; James R. Lee, 2005], we show that the Sum-of-Squares hierarchy can be adapted to provide "fast", but still exponential time, approximation algorithms for several problems in the regime where they are believed to be NP-hard. Specifically, our framework yields the following algorithms; here n denote the number of vertices of the graph and r can be any positive real number greater than 1 (possibly depending on n). - A (2 - 1/(O(r)))-approximation algorithm for Vertex Cover that runs in exp (n/(2^{r^2)})n^{O(1)} time. - An O(r)-approximation algorithms for Uniform Sparsest Cut and Balanced Separator that runs in exp (n/(2^{r^2)})n^{O(1)} time. Our algorithm for Vertex Cover improves upon Bansal et al.'s algorithm [Nikhil Bansal et al., 2017] which achieves (2 - 1/(O(r)))-approximation in time exp (n/(r^r))n^{O(1)}. For Uniform Sparsest Cut and Balanced Separator, our algorithms improve upon O(r)-approximation exp (n/(2^r))n^{O(1)}-time algorithms that follow from a work of Charikar et al. [Moses Charikar et al., 2010].

Cite as

Pasin Manurangsi and Luca Trevisan. Mildly Exponential Time Approximation Algorithms for Vertex Cover, Balanced Separator and Uniform Sparsest Cut. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 20:1-20:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{manurangsi_et_al:LIPIcs.APPROX-RANDOM.2018.20,
  author =	{Manurangsi, Pasin and Trevisan, Luca},
  title =	{{Mildly Exponential Time Approximation Algorithms for Vertex Cover, Balanced Separator and Uniform Sparsest Cut}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018)},
  pages =	{20:1--20:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-085-9},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{116},
  editor =	{Blais, Eric and Jansen, Klaus and D. P. Rolim, Jos\'{e} and Steurer, David},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2018.20},
  URN =		{urn:nbn:de:0030-drops-94241},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2018.20},
  annote =	{Keywords: Approximation algorithms, Exponential-time algorithms, Vertex Cover, Sparsest Cut, Balanced Separator}
}
Document
Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Worst-Case Time Barrier

Authors: Moses Charikar and Shay Solomon

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)


Abstract
The state-of-the-art algorithm for maintaining an approximate maximum matching in fully dynamic graphs has a polynomial worst-case update time, even for poor approximation guarantees. Bhattacharya, Henzinger and Nanongkai showed how to maintain a constant approximation to the minimum vertex cover, and thus also a constant-factor estimate of the maximum matching size, with polylogarithmic worst-case update time. Later (in SODA'17 Proc.) they improved the approximation to 2+epsilon. Nevertheless, the fundamental problem of maintaining an approximate matching with sub-polynomial worst-case time bounds remained open. We present a randomized algorithm for maintaining an almost-maximal matching in fully dynamic graphs with polylogarithmic worst-case update time. Such a matching provides (2+epsilon)-approximations for both maximum matching and minimum vertex cover, for any epsilon > 0. The worst-case update time of our algorithm, O(poly(log n,epsilon^{-1})), holds deterministically, while the almost-maximality guarantee holds with high probability. Our result was done independently of the (2+epsilon)-approximation result of Bhattacharya et al., thus settling the aforementioned problem on dynamic matchings and providing essentially the best possible approximation guarantee for dynamic vertex cover (assuming the unique games conjecture). To prove this result, we exploit a connection between the standard oblivious adversarial model, which can be viewed as inherently "online", and an "offline" model where some (limited) information on the future can be revealed efficiently upon demand. Our randomized algorithm is derived from a deterministic algorithm in this offline model. This approach gives an elegant way to analyze randomized dynamic algorithms, and is of independent interest.

Cite as

Moses Charikar and Shay Solomon. Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Worst-Case Time Barrier. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2018.33,
  author =	{Charikar, Moses and Solomon, Shay},
  title =	{{Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Worst-Case Time Barrier}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{33:1--33:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.33},
  URN =		{urn:nbn:de:0030-drops-90370},
  doi =		{10.4230/LIPIcs.ICALP.2018.33},
  annote =	{Keywords: dynamic graph algorithms, maximum matching, worst-case bounds}
}
Document
On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings

Authors: Moses Charikar, Ofir Geri, Michael P. Kim, and William Kuszmaul

Published in: LIPIcs, Volume 107, 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)


Abstract
Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually producing an alignment (i.e., the sequence of edits) has received far less attention. Somewhat surprisingly, we show that any algorithm to estimate edit distance can be used in a black-box fashion to produce an approximate alignment of strings, with modest loss in approximation factor and small loss in run time. Plugging in the result of Andoni, Krauthgamer, and Onak, we obtain an alignment that is a (log n)^{O(1/epsilon^2)} approximation in time O~(n^{1 + epsilon}). Closely related to the study of approximation algorithms is the study of metric embeddings for edit distance. We show that min-hash techniques can be useful in designing edit distance embeddings through three results: (1) An embedding from Ulam distance (edit distance over permutations) to Hamming space that matches the best known distortion of O(log n) and also implicitly encodes a sequence of edits between the strings; (2) In the case where the edit distance between the input strings is known to have an upper bound K, we show that embeddings of edit distance into Hamming space with distortion f(n) can be modified in a black-box fashion to give distortion O(f(poly(K))) for a class of periodic-free strings; (3) A randomized dimension-reduction map with contraction c and asymptotically optimal expected distortion O(c), improving on the previous O~(c^{1 + 2 / log log log n}) distortion result of Batu, Ergun, and Sahinalp.

Cite as

Moses Charikar, Ofir Geri, Michael P. Kim, and William Kuszmaul. On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 107, pp. 34:1-34:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.ICALP.2018.34,
  author =	{Charikar, Moses and Geri, Ofir and Kim, Michael P. and Kuszmaul, William},
  title =	{{On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings}},
  booktitle =	{45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
  pages =	{34:1--34:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-076-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{107},
  editor =	{Chatzigiannakis, Ioannis and Kaklamanis, Christos and Marx, D\'{a}niel and Sannella, Donald},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2018.34},
  URN =		{urn:nbn:de:0030-drops-90383},
  doi =		{10.4230/LIPIcs.ICALP.2018.34},
  annote =	{Keywords: edit distance, alignment, approximation algorithms, embedding, dimension reduction}
}
Document
Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Authors: Jacob Steinhardt, Moses Charikar, and Gregory Valiant

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)


Abstract
We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in p-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded 1st moments to a stable "core" with bounded 2nd moments, which may be of independent interest.

Cite as

Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 45:1-45:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{steinhardt_et_al:LIPIcs.ITCS.2018.45,
  author =	{Steinhardt, Jacob and Charikar, Moses and Valiant, Gregory},
  title =	{{Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{45:1--45:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.45},
  URN =		{urn:nbn:de:0030-drops-83687},
  doi =		{10.4230/LIPIcs.ITCS.2018.45},
  annote =	{Keywords: robust learning, outliers, stochastic block models, p-norm estimation}
}
Document
Min-Cost Bipartite Perfect Matching with Delays

Authors: Itai Ashlagi, Yossi Azar, Moses Charikar, Ashish Chiplunkar, Ofir Geri, Haim Kaplan, Rahul Makhijani, Yuyi Wang, and Roger Wattenhofer

Published in: LIPIcs, Volume 81, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)


Abstract
In the min-cost bipartite perfect matching with delays (MBPMD) problem, requests arrive online at points of a finite metric space. Each request is either positive or negative and has to be matched to a request of opposite polarity. As opposed to traditional online matching problems, the algorithm does not have to serve requests as they arrive, and may choose to match them later at a cost. Our objective is to minimize the sum of the distances between matched pairs of requests (the connection cost) and the sum of the waiting times of the requests (the delay cost). This objective exhibits a natural tradeoff between minimizing the distances and the cost of waiting for better matches. This tradeoff appears in many real-life scenarios, notably, ride-sharing platforms. MBPMD is related to its non-bipartite variant, min-cost perfect matching with delays (MPMD), in which each request can be matched to any other request. MPMD was introduced by Emek et al. (STOC'16), who showed an O(log^2(n)+log(Delta))-competitive randomized algorithm on n-point metric spaces with aspect ratio Delta. Our contribution is threefold. First, we present a new lower bound construction for MPMD and MBPMD. We get a lower bound of Omega(sqrt(log(n)/log(log(n)))) on the competitive ratio of any randomized algorithm for MBPMD. For MPMD, we improve the lower bound from Omega(sqrt(log(n))) (shown by Azar et al., SODA'17) to Omega(log(n)/log(log(n))), thus, almost matching their upper bound of O(log(n)). Second, we adapt the algorithm of Emek et al. to the bipartite case, and provide a simplified analysis that improves the competitive ratio to O(log(n)). The key ingredient of the algorithm is an O(h)-competitive randomized algorithm for MBPMD on weighted trees of height h. Third, we provide an O(h)-competitive deterministic algorithm for MBPMD on weighted trees of height h. This algorithm is obtained by adapting the algorithm for MPMD by Azar et al. to the apparently more complicated bipartite setting.

Cite as

Itai Ashlagi, Yossi Azar, Moses Charikar, Ashish Chiplunkar, Ofir Geri, Haim Kaplan, Rahul Makhijani, Yuyi Wang, and Roger Wattenhofer. Min-Cost Bipartite Perfect Matching with Delays. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 81, pp. 1:1-1:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{ashlagi_et_al:LIPIcs.APPROX-RANDOM.2017.1,
  author =	{Ashlagi, Itai and Azar, Yossi and Charikar, Moses and Chiplunkar, Ashish and Geri, Ofir and Kaplan, Haim and Makhijani, Rahul and Wang, Yuyi and Wattenhofer, Roger},
  title =	{{Min-Cost Bipartite Perfect Matching with Delays}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2017)},
  pages =	{1:1--1:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-044-6},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{81},
  editor =	{Jansen, Klaus and Rolim, Jos\'{e} D. P. and Williamson, David P. and Vempala, Santosh S.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2017.1},
  URN =		{urn:nbn:de:0030-drops-75509},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2017.1},
  annote =	{Keywords: online algorithms with delayed service, bipartite matching, competitive analysis}
}
Document
On Approximating Target Set Selection

Authors: Moses Charikar, Yonatan Naamad, and Anthony Wirth

Published in: LIPIcs, Volume 60, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)


Abstract
We study the Target Set Selection (TSS) problem introduced by Kempe, Kleinberg, and Tardos (2003). This problem models the propagation of influence in a network, in a sequence of rounds. A set of nodes is made "active" initially. In each subsequent round, a vertex is activated if at least a certain number of its neighbors are (already) active. In the minimization version, the goal is to activate a small set of vertices initially - a seed, or target, set - so that activation spreads to the entire graph. In the absence of a sublinear-factor algorithm for the general version, we provide a (sublinear) approximation algorithm for the bounded-round version, where the goal is to activate all the vertices in r rounds. Assuming a known conjecture on the hardness of Planted Dense Subgraph, we establish hardness-of-approximation results for the bounded-round version. We show that they translate to general Target Set Selection, leading to a hardness factor of n^(1/2-epsilon) for all epsilon > 0. This is the first polynomial hardness result for Target Set Selection, and the strongest conditional result known for a large class of monotone satisfiability problems. In the maximization version of TSS, the goal is to pick a target set of size k so as to maximize the number of nodes eventually active. We show an n^(1-epsilon) hardness result for the undirected maximization version of the problem, thus establishing that the undirected case is as hard as the directed case. Finally, we demonstrate an SETH lower bound for the exact computation of the optimal seed set.

Cite as

Moses Charikar, Yonatan Naamad, and Anthony Wirth. On Approximating Target Set Selection. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 60, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{charikar_et_al:LIPIcs.APPROX-RANDOM.2016.4,
  author =	{Charikar, Moses and Naamad, Yonatan and Wirth, Anthony},
  title =	{{On Approximating Target Set Selection}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-018-7},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{60},
  editor =	{Jansen, Klaus and Mathieu, Claire and Rolim, Jos\'{e} D. P. and Umans, Chris},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  URN =		{urn:nbn:de:0030-drops-66274},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2016.4},
  annote =	{Keywords: target set selection, influence propagation, approximation algorithms, hardness of approximation, planted dense subgraph}
}
Document
Invited Talk
Bypassing Worst Case Analysis: Tensor Decomposition and Clustering (Invited Talk)

Authors: Moses S. Charikar

Published in: LIPIcs, Volume 45, 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)


Abstract
Typical worst case analysis of algorithms has led to a rich theory, but suffers from many pitfalls. This has inspired several approaches to bypass worst case analysis. In this talk, I will describe two vignettes from recent work in this realm. In the first part of the talk, I will discuss tensor decomposition -- a natural higher dimensional analog of matrix decomposition. Low rank tensor decompositions have proved to be a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. Yet, they pose significant challenges for algorithm design as most problems about tensors are NP-hard. I will discuss a smoothed analysis framework for analyzing algorithms for tensor decomposition which models realistic instances of learning problems and allows us to overcome many of the usual limitations of using tensor methods. In the second part of the talk, I will explore the phenomenon of convex relaxations returning integer solutions. Clearly this is not true in the worst case. I will discuss instances of discrete optimization problems where, for a suitable distribution on inputs, LP and SDP relaxations produce integer solutions with high probability. This has been studied in the context of LP decoding, sparse recovery, stochastic block models and so on. I will mention some recent results for clustering problems: when points are drawn from a distribution over k sufficiently separated clusters, the well known k-median relaxation and a (not so well known) SDP relaxation for k-means exactly recover the clusters.

Cite as

Moses S. Charikar. Bypassing Worst Case Analysis: Tensor Decomposition and Clustering (Invited Talk). In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@InProceedings{charikar:LIPIcs.FSTTCS.2015.1,
  author =	{Charikar, Moses S.},
  title =	{{Bypassing Worst Case Analysis: Tensor Decomposition and Clustering}},
  booktitle =	{35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)},
  pages =	{1--1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-97-2},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{45},
  editor =	{Harsha, Prahladh and Ramalingam, G.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.FSTTCS.2015.1},
  URN =		{urn:nbn:de:0030-drops-56647},
  doi =		{10.4230/LIPIcs.FSTTCS.2015.1},
  annote =	{Keywords: tensor decomposition, smoothed analysis, convex relaxations, integrality}
}
Document
The Hardness of Approximation of Euclidean k-Means

Authors: Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop

Published in: LIPIcs, Volume 34, 31st International Symposium on Computational Geometry (SoCG 2015)


Abstract
The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Cite as

Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 754-767, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.SOCG.2015.754,
  author =	{Awasthi, Pranjal and Charikar, Moses and Krishnaswamy, Ravishankar and Sinop, Ali Kemal},
  title =	{{The Hardness of Approximation of Euclidean k-Means}},
  booktitle =	{31st International Symposium on Computational Geometry (SoCG 2015)},
  pages =	{754--767},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-83-5},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{34},
  editor =	{Arge, Lars and Pach, J\'{a}nos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SOCG.2015.754},
  URN =		{urn:nbn:de:0030-drops-51178},
  doi =		{10.4230/LIPIcs.SOCG.2015.754},
  annote =	{Keywords: Euclidean k-means, Hardness of Approximation, Vertex Cover}
}
  • Refine by Author
  • 9 Charikar, Moses
  • 3 Kuszmaul, William
  • 2 Geri, Ofir
  • 1 Ashlagi, Itai
  • 1 Awasthi, Pranjal
  • Show More...

  • Refine by Classification
  • 3 Theory of computation → Approximation algorithms analysis
  • 2 Mathematics of computing → Graph algorithms
  • 1 Applied computing → Biological networks
  • 1 Theory of computation
  • 1 Theory of computation → Communication complexity
  • Show More...

  • Refine by Keyword
  • 2 Vertex Cover
  • 2 approximation algorithms
  • 2 dynamic time warping
  • 2 edit distance
  • 2 tree metrics
  • Show More...

  • Refine by Type
  • 13 document

  • Refine by Publication Year
  • 4 2018
  • 2 2015
  • 2 2019
  • 2 2022
  • 1 2016
  • Show More...

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail