3 Search Results for "Gandhi, Rajiv"


Document
Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs

Authors: Sepehr Assadi, Nirmit Joshi, Milind Prabhu, and Vihan Shah

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)


Abstract
Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any ϕ ∈ (0,1], return a ϕ-quantile of S up to an ε error, i.e., return a ϕ'-quantile with ϕ' = ϕ ± ε. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year old Greenwald-Khanna (GK) summary that uses O((1/ε) log{(ε n)}) space [SIGMOD'01]. This bound was recently proved to be optimal for all deterministic comparison-based summaries by Cormode and Vesleý [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem, where each element arrives with a positive integer weight which denotes the number of copies of that element being inserted. The only known method of handling weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items, and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries for weighted inputs and show that it takes O((1/ε) log(εn)) space and O(log(1/ε)+log log(εn)) update time per element to process a stream of length n (under some quite mild assumptions on the range of weights and ε). En route to this, we also simplify the original GK summaries for unweighted quantiles.

Cite as

Sepehr Assadi, Nirmit Joshi, Milind Prabhu, and Vihan Shah. Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 19:1-19:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{assadi_et_al:LIPIcs.ICDT.2023.19,
  author =	{Assadi, Sepehr and Joshi, Nirmit and Prabhu, Milind and Shah, Vihan},
  title =	{{Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{19:1--19:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.19},
  URN =		{urn:nbn:de:0030-drops-177618},
  doi =		{10.4230/LIPIcs.ICDT.2023.19},
  annote =	{Keywords: Streaming algorithms, Quantile summaries, Rank estimation}
}
Document
Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Authors: Barna Saha and Sanjay Subramanian

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices u and v in the same cluster?" Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under the same-cluster querying framework. Given a complete graph G=(V,E) with positive and negative edge labels, correlation clustering objective aims to compute a graph clustering that minimizes the total number of disagreements, that is the negative intra-cluster edges and positive inter-cluster edges. In a recent work, Ailon et al. (2018b) provided an approximation algorithm for correlation clustering that approximates the correlation clustering objective within (1+epsilon) with O((k^{14} log{n} log{k})/epsilon^6) queries when the number of clusters, k, is fixed. For many applications, k is not fixed and can grow with |V|. Moreover, the dependency of k^14 on query complexity renders the algorithm impractical even for datasets with small values of k. In this paper, we take a different approach. Let C_{OPT} be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_{OPT} rather than by the number of clusters. Indeed, a good clustering must have small C_{OPT}. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_{OPT} queries and an efficient algorithm that outputs a 2-approximation using at most C_{OPT} queries. In addition, we show under a plausible complexity assumption, there does not exist any polynomial time algorithm that has an approximation ratio better than 1+alpha for an absolute constant alpha > 0 with o(C_{OPT}) queries. Therefore, our first algorithm achieves the optimal query bound within a factor of 2. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against known correlation clustering algorithms that do not perform querying. In all cases, our algorithms exhibit superior performance.

Cite as

Barna Saha and Sanjay Subramanian. Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 81:1-81:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{saha_et_al:LIPIcs.ESA.2019.81,
  author =	{Saha, Barna and Subramanian, Sanjay},
  title =	{{Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{81:1--81:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.81},
  URN =		{urn:nbn:de:0030-drops-112020},
  doi =		{10.4230/LIPIcs.ESA.2019.81},
  annote =	{Keywords: Clustering, Approximation Algorithm, Crowdsourcing, Randomized Algorithm}
}
Document
Bicovering: Covering Edges With Two Small Subsets of Vertices

Authors: Amey Bhangale, Rajiv Gandhi, Mohammad Taghi Hajiaghayi, Rohit Khandekar, and Guy Kortsarz

Published in: LIPIcs, Volume 55, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)


Abstract
We study the following basic problem called Bi-Covering. Given a graph G(V, E), find two (not necessarily disjoint) sets A subseteq V and B subseteq V such that A union B = V and that every edge e belongs to either the graph induced by A or to the graph induced by B. The goal is to minimize max{|A|, |B|}. This is the most simple case of the Channel Allocation problem [Gandhi et al., Networks, 2006]. A solution that outputs V,emptyset gives ratio at most 2. We show that under the similar Strong Unique Game Conjecture by [Bansal-Khot, FOCS, 2009] there is no 2 - epsilon ratio algorithm for the problem, for any constant epsilon > 0. Given a bipartite graph, Max-bi-clique is a problem of finding largest k*k complete bipartite sub graph. For Max-bi-clique problem, a constant factor hardness was known under random 3-SAT hypothesis of Feige [Feige, STOC, 2002] and also under the assumption that NP !subseteq intersection_{epsilon > 0} BPTIME(2^{n^{epsilon}}) [Khot, SIAM J. on Comp., 2011]. It was an open problem in [Ambühl et. al., SIAM J. on Comp., 2011] to prove inapproximability of Max-bi-clique assuming weaker conjecture. Our result implies similar hardness result assuming the Strong Unique Games Conjecture. On the algorithmic side, we also give better than 2 approximation for Bi-Covering on numerous special graph classes. In particular, we get 1.876 approximation for Chordal graphs, exact algorithm for Interval Graphs, 1 + o(1) for Minor Free Graph, 2 - 4*delta/3 for graphs with minimum degree delta*n, 2/(1+delta^2/8) for delta-vertex expander, 8/5 for Split Graphs, 2 - (6/5)*1/d for graphs with minimum constant degree d etc. Our algorithmic results are quite non-trivial. In achieving these results, we use various known structural results about the graphs, combined with the techniques that we develop tailored to getting better than 2 approximation.

Cite as

Amey Bhangale, Rajiv Gandhi, Mohammad Taghi Hajiaghayi, Rohit Khandekar, and Guy Kortsarz. Bicovering: Covering Edges With Two Small Subsets of Vertices. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 6:1-6:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bhangale_et_al:LIPIcs.ICALP.2016.6,
  author =	{Bhangale, Amey and Gandhi, Rajiv and Hajiaghayi, Mohammad Taghi and Khandekar, Rohit and Kortsarz, Guy},
  title =	{{Bicovering: Covering Edges With Two Small Subsets of Vertices}},
  booktitle =	{43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
  pages =	{6:1--6:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-013-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{55},
  editor =	{Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.6},
  URN =		{urn:nbn:de:0030-drops-62728},
  doi =		{10.4230/LIPIcs.ICALP.2016.6},
  annote =	{Keywords: Bi-covering, Unique Games, Max Bi-clique}
}
  • Refine by Author
  • 1 Assadi, Sepehr
  • 1 Bhangale, Amey
  • 1 Gandhi, Rajiv
  • 1 Hajiaghayi, Mohammad Taghi
  • 1 Joshi, Nirmit
  • Show More...

  • Refine by Classification
  • 2 Theory of computation → Approximation algorithms analysis
  • 1 Theory of computation → Data structures design and analysis
  • 1 Theory of computation → Streaming, sublinear and near linear time algorithms
  • 1 Theory of computation → Unsupervised learning and clustering

  • Refine by Keyword
  • 1 Approximation Algorithm
  • 1 Bi-covering
  • 1 Clustering
  • 1 Crowdsourcing
  • 1 Max Bi-clique
  • Show More...

  • Refine by Type
  • 3 document

  • Refine by Publication Year
  • 1 2016
  • 1 2019
  • 1 2023

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail