7 Search Results for "Awasthi, Pranjal"


Document
Local Search k-means++ with Foresight

Authors: Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Since its introduction in 1957, Lloyd’s algorithm for k-means clustering has been extensively studied and has undergone several improvements. While in its original form it does not guarantee any approximation factor at all, Arthur and Vassilvitskii (SODA 2007) proposed k-means++ which enhances Lloyd’s algorithm by a seeding method which guarantees a 𝒪(log k)-approximation in expectation. More recently, Lattanzi and Sohler (ICML 2019) proposed LS++ which further improves the solution quality of k-means++ by local search techniques to obtain a 𝒪(1)-approximation. On the practical side, the greedy variant of k-means++ is often used although its worst-case behaviour is provably worse than for the standard k-means++ variant. We investigate how to improve LS++ further in practice. We study two options for improving the practical performance: (a) Combining LS++ with greedy k-means++ instead of k-means++, and (b) Improving LS++ by better entangling it with Lloyd’s algorithm. Option (a) worsens the theoretical guarantees of k-means++ but improves the practical quality also in combination with LS++ as we confirm in our experiments. Option (b) is our new algorithm, Foresight LS++. We experimentally show that FLS++ improves upon the solution quality of LS++. It retains its asymptotic runtime and its worst-case approximation bounds.

Cite as

Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{conrads_et_al:LIPIcs.SEA.2024.7,
  author =	{Conrads, Theo and Drexler, Lukas and K\"{o}nen, Joshua and Schmidt, Daniel R. and Schmidt, Melanie},
  title =	{{Local Search k-means++ with Foresight}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{7:1--7:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.7},
  URN =		{urn:nbn:de:0030-drops-203727},
  doi =		{10.4230/LIPIcs.SEA.2024.7},
  annote =	{Keywords: k-means clustering, kmeans++, greedy, local search}
}
Document
Track A: Algorithms, Complexity and Games
Polylogarithmic Approximations for Robust s-t Path

Authors: Shi Li, Chenyang Xu, and Ruilong Zhang

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)


Abstract
The paper revisits the Robust s-t Path problem, one of the most fundamental problems in robust optimization. In the problem, we are given a directed graph with n vertices and k distinct cost functions (scenarios) defined over edges, and aim to choose an s-t path such that the total cost of the path is always provable no matter which scenario is realized. Viewing each cost function as an agent, our goal is to find a fair s-t path, which minimizes the maximum cost among all agents. The problem is NP-hard to approximate within a factor of o(log k) unless NP ⊆ DTIME(n^{polylog n}), and the best-known approximation ratio is Õ(√n), which is based on the natural flow linear program. A longstanding open question is whether we can achieve a polylogarithmic approximation for the problem; it remains open even if a quasi-polynomial running time is allowed. Our main result is a O(log n log k) approximation for the Robust s-t Path problem in quasi-polynomial time, solving the open question in the quasi-polynomial time regime. The algorithm is built on a novel linear program formulation for a decision-tree-type structure, which enables us to overcome the Ω(√n) integrality gap for the natural flow LP. Furthermore, we show that for graphs with bounded treewidth, the quasi-polynomial running time can be improved to a polynomial. We hope our techniques can offer new insights into this problem and other related problems in robust optimization.

Cite as

Shi Li, Chenyang Xu, and Ruilong Zhang. Polylogarithmic Approximations for Robust s-t Path. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 106:1-106:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{li_et_al:LIPIcs.ICALP.2024.106,
  author =	{Li, Shi and Xu, Chenyang and Zhang, Ruilong},
  title =	{{Polylogarithmic Approximations for Robust s-t Path}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{106:1--106:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.106},
  URN =		{urn:nbn:de:0030-drops-202497},
  doi =		{10.4230/LIPIcs.ICALP.2024.106},
  annote =	{Keywords: Approximation Algorithm, Randomized LP Rounding, Robust s-t Path}
}
Document
Distributionally Robust Data Join

Authors: Pranjal Awasthi, Christopher Jung, and Jamie Morgenstern

Published in: LIPIcs, Volume 256, 4th Symposium on Foundations of Responsible Computing (FORC 2023)


Abstract
Suppose we are given two datasets: a labeled dataset and unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is r₁ away from the empirical distribution over the labeled dataset and r₂ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.

Cite as

Pranjal Awasthi, Christopher Jung, and Jamie Morgenstern. Distributionally Robust Data Join. In 4th Symposium on Foundations of Responsible Computing (FORC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 256, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.FORC.2023.10,
  author =	{Awasthi, Pranjal and Jung, Christopher and Morgenstern, Jamie},
  title =	{{Distributionally Robust Data Join}},
  booktitle =	{4th Symposium on Foundations of Responsible Computing (FORC 2023)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-272-3},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{256},
  editor =	{Talwar, Kunal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2023.10},
  URN =		{urn:nbn:de:0030-drops-179311},
  doi =		{10.4230/LIPIcs.FORC.2023.10},
  annote =	{Keywords: Distributionally Robust Optimization, Semi-Supervised Learning, Learning Theory}
}
Document
APPROX
Hardness of Approximation for Euclidean k-Median

Authors: Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)


Abstract
The Euclidean k-median problem is defined in the following manner: given a set 𝒳 of n points in d-dimensional Euclidean space ℝ^d, and an integer k, find a set C ⊂ ℝ^d of k points (called centers) such that the cost function Φ(C,𝒳) ≡ ∑_{x ∈ 𝒳} min_{c ∈ C} ‖x-c‖₂ is minimized. The Euclidean k-means problem is defined similarly by replacing the distance with squared Euclidean distance in the cost function. Various hardness of approximation results are known for the Euclidean k-means problem [Pranjal Awasthi et al., 2015; Euiwoong Lee et al., 2017; Vincent Cohen{-}Addad and {Karthik {C. S.}}, 2019]. However, no hardness of approximation result was known for the Euclidean k-median problem. In this work, assuming the unique games conjecture (UGC), we provide the hardness of approximation result for the Euclidean k-median problem in O(log k) dimensional space. This solves an open question posed explicitly in the work of Awasthi et al. [Pranjal Awasthi et al., 2015]. Furthermore, we study the hardness of approximation for the Euclidean k-means/k-median problems in the bi-criteria setting where an algorithm is allowed to choose more than k centers. That is, bi-criteria approximation algorithms are allowed to output β k centers (for constant β > 1) and the approximation ratio is computed with respect to the optimal k-means/k-median cost. We show the hardness of bi-criteria approximation result for the Euclidean k-median problem for any β < 1.015, assuming UGC. We also show a similar hardness of bi-criteria approximation result for the Euclidean k-means problem with a stronger bound of β < 1.28, again assuming UGC.

Cite as

Anup Bhattacharya, Dishant Goyal, and Ragesh Jaiswal. Hardness of Approximation for Euclidean k-Median. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 4:1-4:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{bhattacharya_et_al:LIPIcs.APPROX/RANDOM.2021.4,
  author =	{Bhattacharya, Anup and Goyal, Dishant and Jaiswal, Ragesh},
  title =	{{Hardness of Approximation for Euclidean k-Median}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{4:1--4:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  URN =		{urn:nbn:de:0030-drops-146979},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.4},
  annote =	{Keywords: Hardness of approximation, bicriteria approximation, approximation algorithms, k-median, k-means}
}
Document
Bilu-Linial Stability, Certified Algorithms and the Independent Set Problem

Authors: Haris Angelidakis, Pranjal Awasthi, Avrim Blum, Vaggos Chatziafratis, and Chen Dan

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
We study the classic Maximum Independent Set problem under the notion of stability introduced by Bilu and Linial (2010): a weighted instance of Independent Set is gamma-stable if it has a unique optimal solution that remains the unique optimal solution under multiplicative perturbations of the weights by a factor of at most gamma >= 1. The goal then is to efficiently recover this "pronounced" optimal solution exactly. In this work, we solve stable instances of Independent Set on several classes of graphs: we improve upon previous results by solving O~(Delta/sqrt(log Delta))-stable instances on graphs of maximum degree Delta, (k - 1)-stable instances on k-colorable graphs and (1 + epsilon)-stable instances on planar graphs (for any fixed epsilon > 0), using both combinatorial techniques as well as LPs and the Sherali-Adams hierarchy. For general graphs, we present a strong lower bound showing that there are no efficient algorithms for O(n^(1/2 - epsilon))-stable instances of Independent Set, assuming the planted clique conjecture. To complement our negative result, we give an algorithm for (epsilon n)-stable instances, for any fixed epsilon > 0. As a by-product of our techniques, we give algorithms as well as lower bounds for stable instances of Node Multiway Cut (a generalization of Edge Multiway Cut), by exploiting its connections to Vertex Cover. Furthermore, we prove a general structural result showing that the integrality gap of convex relaxations of several maximization problems reduces dramatically on stable instances. Moreover, we initiate the study of certified algorithms for Independent Set. The notion of a gamma-certified algorithm was introduced very recently by Makarychev and Makarychev (2018) and it is a class of gamma-approximation algorithms that satisfy one crucial property: the solution returned is optimal for a perturbation of the original instance, where perturbations are again multiplicative up to a factor of gamma >= 1 (hence, such algorithms not only solve gamma-stable instances optimally, but also have guarantees even on unstable instances). Here, we obtain Delta-certified algorithms for Independent Set on graphs of maximum degree Delta, and (1+epsilon)-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Fürer (1994) and prove that it is a ((Delta + 1)/3 + epsilon)-certified algorithm for Independent Set on graphs of maximum degree Delta where all weights are equal to 1.

Cite as

Haris Angelidakis, Pranjal Awasthi, Avrim Blum, Vaggos Chatziafratis, and Chen Dan. Bilu-Linial Stability, Certified Algorithms and the Independent Set Problem. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 7:1-7:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{angelidakis_et_al:LIPIcs.ESA.2019.7,
  author =	{Angelidakis, Haris and Awasthi, Pranjal and Blum, Avrim and Chatziafratis, Vaggos and Dan, Chen},
  title =	{{Bilu-Linial Stability, Certified Algorithms and the Independent Set Problem}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{7:1--7:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.7},
  URN =		{urn:nbn:de:0030-drops-111288},
  doi =		{10.4230/LIPIcs.ESA.2019.7},
  annote =	{Keywords: Bilu-Linial stability, perturbation resilience, beyond worst-case analysis, Independent Set, Vertex Cover, Multiway Cut}
}
Document
Track A: Algorithms, Complexity and Games
Robust Communication-Optimal Distributed Clustering Algorithms

Authors: Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, and David P. Woodruff

Published in: LIPIcs, Volume 132, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)


Abstract
In this work, we study the k-median and k-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on gaining a finer understanding through the lens of beyond worst-case analysis. Our main motivation is the following: for many applications such as clustering proteins by function or clustering communities in a social network, there is some unknown target clustering, and the hope is that running a k-median or k-means algorithm will produce clusterings which are close to matching the target clustering. Worst-case results can guarantee constant factor approximations to the optimal k-median or k-means objective value, but not closeness to the target clustering. Our first result is a distributed algorithm which returns a near-optimal clustering assuming a natural notion of stability, namely, approximation stability [Awasthi and Balcan, 2014], even when a constant fraction of the data are outliers. The communication complexity is O~(sk+z) where s is the number of machines, k is the number of clusters, and z is the number of outliers. Next, we show this amount of communication cannot be improved even in the setting when the input satisfies various non-worst-case assumptions. We give a matching Omega(sk+z) lower bound on the communication required both for approximating the optimal k-means or k-median cost up to any constant, and for returning a clustering that is close to the target clustering in Hamming distance. These lower bounds hold even when the data satisfies approximation stability or other common notions of stability, and the cluster sizes are balanced. Therefore, Omega(sk+z) is a communication bottleneck, even for real-world instances.

Cite as

Pranjal Awasthi, Ainesh Bakshi, Maria-Florina Balcan, Colin White, and David P. Woodruff. Robust Communication-Optimal Distributed Clustering Algorithms. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 18:1-18:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.ICALP.2019.18,
  author =	{Awasthi, Pranjal and Bakshi, Ainesh and Balcan, Maria-Florina and White, Colin and Woodruff, David P.},
  title =	{{Robust Communication-Optimal Distributed Clustering Algorithms}},
  booktitle =	{46th International Colloquium on Automata, Languages, and Programming (ICALP 2019)},
  pages =	{18:1--18:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-109-2},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{132},
  editor =	{Baier, Christel and Chatzigiannakis, Ioannis and Flocchini, Paola and Leonardi, Stefano},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2019.18},
  URN =		{urn:nbn:de:0030-drops-105942},
  doi =		{10.4230/LIPIcs.ICALP.2019.18},
  annote =	{Keywords: robust distributed clustering, communication complexity}
}
Document
The Hardness of Approximation of Euclidean k-Means

Authors: Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop

Published in: LIPIcs, Volume 34, 31st International Symposium on Computational Geometry (SoCG 2015)


Abstract
The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and the computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general k and a (1+c)-approximation which runs in time poly(n) exp(k/c). At the other extreme, the only known computational complexity result for this problem is NP-hardness [Aloise et al.'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation for the Euclidean k-means problem. Concretely, we show that there exists a constant c > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1+c). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.

Cite as

Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 754-767, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.SOCG.2015.754,
  author =	{Awasthi, Pranjal and Charikar, Moses and Krishnaswamy, Ravishankar and Sinop, Ali Kemal},
  title =	{{The Hardness of Approximation of Euclidean k-Means}},
  booktitle =	{31st International Symposium on Computational Geometry (SoCG 2015)},
  pages =	{754--767},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-83-5},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{34},
  editor =	{Arge, Lars and Pach, J\'{a}nos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SOCG.2015.754},
  URN =		{urn:nbn:de:0030-drops-51178},
  doi =		{10.4230/LIPIcs.SOCG.2015.754},
  annote =	{Keywords: Euclidean k-means, Hardness of Approximation, Vertex Cover}
}
  • Refine by Author
  • 4 Awasthi, Pranjal
  • 1 Angelidakis, Haris
  • 1 Bakshi, Ainesh
  • 1 Balcan, Maria-Florina
  • 1 Bhattacharya, Anup
  • Show More...

  • Refine by Classification
  • 2 Theory of computation → Facility location and clustering
  • 1 Information systems → Clustering
  • 1 Mathematics of computing → Combinatorial algorithms
  • 1 Theory of computation → Algorithm design techniques
  • 1 Theory of computation → Approximation algorithms analysis
  • Show More...

  • Refine by Keyword
  • 2 Vertex Cover
  • 1 Approximation Algorithm
  • 1 Bilu-Linial stability
  • 1 Distributionally Robust Optimization
  • 1 Euclidean k-means
  • Show More...

  • Refine by Type
  • 7 document

  • Refine by Publication Year
  • 2 2019
  • 2 2024
  • 1 2015
  • 1 2021
  • 1 2023