Documents authored by Ghazi, Badih


Pure-DP Aggregation in the Shuffle Model: Error-Optimal and Communication-Efficient

Authors: Badih Ghazi, Ravi Kumar, and Pasin Manurangsi

Published in: LIPIcs, Volume 304, 5th Conference on Information-Theoretic Cryptography (ITC 2024)


Abstract
We obtain a new protocol for binary counting in the ε-DP_shuffle model with error O(1/ε) and expected communication Õ((log n)/ε) messages per user. Previous protocols incur either an error of O(1/ε^1.5) with O_ε(log n) messages per user (Ghazi et al., ITC 2020) or an error of O(1/ε) with O_ε(n²) messages per user (Cheu and Yan, TPDP 2022). Using the new protocol, we obtain improved ε-DP_shuffle protocols for real summation and histograms.
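
For context, O(1/ε) is also the error scale of the classic central-model Laplace mechanism for binary counting. The following is a minimal sketch of that central baseline only, not the shuffle protocol of the paper. It assumes a trusted curator who sees every bit. The function names are hypothetical, and the Laplace noise is drawn as the difference of two i.i.d. exponentials.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. Exp(1/scale) draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def central_binary_count(bits: list, eps: float, rng: random.Random) -> float:
    """eps-DP count of ones in the central model.

    One user changes the count by at most 1 (sensitivity 1), so Laplace noise
    of scale 1/eps suffices; its expected absolute value is exactly 1/eps.
    """
    return sum(bits) + laplace_noise(1.0 / eps, rng)


rng = random.Random(0)
print(central_binary_count([1, 0, 1, 1] * 250, eps=0.5, rng=rng))  # true count is 750
```

With ε = 0.5, the average absolute error over many runs should be close to 1/ε = 2.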

Cite as

Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. Pure-DP Aggregation in the Shuffle Model: Error-Optimal and Communication-Efficient. In 5th Conference on Information-Theoretic Cryptography (ITC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 304, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)



@InProceedings{ghazi_et_al:LIPIcs.ITC.2024.4,
  author =	{Ghazi, Badih and Kumar, Ravi and Manurangsi, Pasin},
  title =	{{Pure-DP Aggregation in the Shuffle Model: Error-Optimal and Communication-Efficient}},
  booktitle =	{5th Conference on Information-Theoretic Cryptography (ITC 2024)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-333-1},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{304},
  editor =	{Aggarwal, Divesh},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITC.2024.4},
  URN =		{urn:nbn:de:0030-drops-205127},
  doi =		{10.4230/LIPIcs.ITC.2024.4},
  annote =	{Keywords: Differential Privacy, Shuffle Model, Aggregation, Pure Differential Privacy}
}

Differentially Private Aggregation via Imperfect Shuffling

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson, and Samson Zhou

Published in: LIPIcs, Volume 267, 4th Conference on Information-Theoretic Cryptography (ITC 2023)


Abstract
In this paper, we introduce the imperfect shuffle differential privacy model, where messages sent from users are shuffled in an almost uniform manner before being observed by a curator for private aggregation. We then consider the private summation problem. We show that the standard split-and-mix protocol by Ishai et al. [FOCS 2006] can be adapted to achieve near-optimal utility bounds in the imperfect shuffle model. Specifically, we show that, surprisingly, no additional error overhead is necessary in the imperfect shuffle model.
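
The split-and-mix idea can be sketched as follows. The sketch assumes integer inputs modulo a public modulus q and an idealized, perfectly uniform shuffler; the paper's point is that an almost-uniform shuffler suffices. It only shows why the analyzer recovers the exact sum. It omits the noise that a differentially private summation protocol would also add, and all names are hypothetical.

```python
import random


def split_and_mix(inputs, num_shares, modulus, rng):
    """Each user splits its input into `num_shares` additive shares mod `modulus`;
    an idealized, perfectly uniform shuffler then mixes all messages."""
    messages = []
    for x in inputs:
        # num_shares - 1 uniformly random shares ...
        shares = [rng.randrange(modulus) for _ in range(num_shares - 1)]
        # ... plus one share chosen so that all shares sum to x mod `modulus`.
        shares.append((x - sum(shares)) % modulus)
        messages.extend(shares)
    rng.shuffle(messages)  # hides which user sent which share
    return messages


def analyze(messages, modulus):
    """The analyzer only sees the shuffled multiset; summing it recovers the total."""
    return sum(messages) % modulus
```

When the modulus exceeds the largest possible total, `analyze` returns the exact sum of the inputs. Each individual share, taken on its own, is uniformly distributed.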

Cite as

Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson, and Samson Zhou. Differentially Private Aggregation via Imperfect Shuffling. In 4th Conference on Information-Theoretic Cryptography (ITC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 267, pp. 17:1-17:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{ghazi_et_al:LIPIcs.ITC.2023.17,
  author =	{Ghazi, Badih and Kumar, Ravi and Manurangsi, Pasin and Nelson, Jelani and Zhou, Samson},
  title =	{{Differentially Private Aggregation via Imperfect Shuffling}},
  booktitle =	{4th Conference on Information-Theoretic Cryptography (ITC 2023)},
  pages =	{17:1--17:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-271-6},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{267},
  editor =	{Chung, Kai-Min},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITC.2023.17},
  URN =		{urn:nbn:de:0030-drops-183453},
  doi =		{10.4230/LIPIcs.ITC.2023.17},
  annote =	{Keywords: Differential privacy, private summation, shuffle model}
}

Track A: Algorithms, Complexity and Games
On Differentially Private Counting on Trees

Authors: Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, and Kewen Wu

Published in: LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)


Abstract
We study the problem of performing counting queries at different levels in hierarchical structures while preserving individuals' privacy. Motivated by applications, we propose a new error measure for this problem by considering a combination of multiplicative and additive approximation to the query results. We examine known mechanisms in differential privacy (DP) and prove their optimality, under this measure, in the pure-DP setting. In the approximate-DP setting, we design new algorithms achieving significant improvements over known ones.
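
As a point of reference for the known pure-DP mechanisms mentioned above, here is a minimal sketch of one standard baseline. It is not the paper's algorithm or error measure. It assumes a public hierarchy in which each individual lies on one root-to-leaf path. Adding or removing one person changes one count per level, so the L1 sensitivity equals the depth. Adding Laplace noise of scale depth/ε to every node count therefore gives ε-DP. The names and the tuple-prefix encoding of nodes are assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. Exp(1/scale) draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_tree_counts(paths, nodes, eps, rng):
    """Hypothetical per-level Laplace baseline for counting on a public tree.

    paths: each individual's root-to-leaf path, e.g. ("US", "CA").
    nodes: every node of the public tree, encoded as a path prefix.
    """
    depth = max(len(node) for node in nodes)
    # True count of a node = number of individuals whose path passes through it.
    counts = Counter(
        path[:level] for path in paths for level in range(1, len(path) + 1)
    )
    # One person touches at most `depth` nodes (one per level), each by 1,
    # so the L1 sensitivity is `depth` and scale depth/eps gives eps-DP.
    scale = depth / eps
    # Release every node of the public tree, including empty ones, so the
    # set of released nodes itself reveals nothing.
    return {node: counts[node] + laplace_noise(scale, rng) for node in nodes}
```

Because the noise scale grows with the depth, deep hierarchies get noisy counts everywhere. Small counts then suffer large relative error, which is the regime a combined multiplicative and additive error measure is meant to capture.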

Cite as

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, and Kewen Wu. On Differentially Private Counting on Trees. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 66:1-66:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{ghazi_et_al:LIPIcs.ICALP.2023.66,
  author =	{Ghazi, Badih and Kamath, Pritish and Kumar, Ravi and Manurangsi, Pasin and Wu, Kewen},
  title =	{{On Differentially Private Counting on Trees}},
  booktitle =	{50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)},
  pages =	{66:1--66:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-278-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{261},
  editor =	{Etessami, Kousha and Feige, Uriel and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.66},
  URN =		{urn:nbn:de:0030-drops-181186},
  doi =		{10.4230/LIPIcs.ICALP.2023.66},
  annote =	{Keywords: Differential Privacy, Algorithms, Trees, Hierarchies}
}

Algorithms with More Granular Differential Privacy Guarantees

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Thomas Steinke

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)


Abstract
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. We study several basic data analysis and learning tasks in this framework, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person (i.e., all the attributes).

Cite as

Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Thomas Steinke. Algorithms with More Granular Differential Privacy Guarantees. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 54:1-54:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{ghazi_et_al:LIPIcs.ITCS.2023.54,
  author =	{Ghazi, Badih and Kumar, Ravi and Manurangsi, Pasin and Steinke, Thomas},
  title =	{{Algorithms with More Granular Differential Privacy Guarantees}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{54:1--54:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.54},
  URN =		{urn:nbn:de:0030-drops-175574},
  doi =		{10.4230/LIPIcs.ITCS.2023.54},
  annote =	{Keywords: Differential Privacy, Algorithms, Per-Attribute Privacy}
}

Private Counting of Distinct and k-Occurring Items in Time Windows

Authors: Badih Ghazi, Ravi Kumar, Jelani Nelson, and Pasin Manurangsi

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)


Abstract
In this work, we study the task of estimating the numbers of distinct and k-occurring items in a time window under the constraint of differential privacy (DP). We consider several variants depending on whether the queries are on general time windows (between times t₁ and t₂), or are restricted to being cumulative (between times 1 and t₂), and depending on whether the DP neighboring relation is event-level or the more stringent item-level. We obtain nearly tight upper and lower bounds on the errors of DP algorithms for these problems. En route, we obtain an event-level DP algorithm for estimating, at each time step, the number of distinct items seen over the last W updates with error polylogarithmic in W; this answers an open question of Bolot et al. (ICDT 2013).
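
To make the estimated quantities concrete, here is a minimal non-private sketch. It assumes items arrive one per time step and that "k-occurring" means occurring exactly k times. It maintains the exact number of distinct items, and of k-occurring items, among the last W updates. It has no privacy protection at all and only pins down what the DP algorithms approximate. The class name is hypothetical.

```python
from collections import Counter, deque


class SlidingWindowDistinct:
    """Exact, NON-private counts over the last `window` updates."""

    def __init__(self, window: int):
        self.window = window
        self.buffer = deque()  # the last `window` items, oldest first
        self.freq = Counter()  # item -> occurrences inside the window

    def update(self, item) -> int:
        """Add one item; return the number of distinct items in the window."""
        self.buffer.append(item)
        self.freq[item] += 1
        if len(self.buffer) > self.window:
            old = self.buffer.popleft()
            self.freq[old] -= 1
            if self.freq[old] == 0:
                del self.freq[old]  # keep only items still present
        return len(self.freq)

    def k_occurring(self, k: int) -> int:
        """Number of items occurring exactly k times in the window."""
        return sum(1 for c in self.freq.values() if c == k)
```

For example, with W = 3 the stream a, b, a, c, c, c yields distinct counts 1, 2, 2, 3, 2, 1.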

Cite as

Badih Ghazi, Ravi Kumar, Jelani Nelson, and Pasin Manurangsi. Private Counting of Distinct and k-Occurring Items in Time Windows. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 55:1-55:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@InProceedings{ghazi_et_al:LIPIcs.ITCS.2023.55,
  author =	{Ghazi, Badih and Kumar, Ravi and Nelson, Jelani and Manurangsi, Pasin},
  title =	{{Private Counting of Distinct and k-Occurring Items in Time Windows}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{55:1--55:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.55},
  URN =		{urn:nbn:de:0030-drops-175580},
  doi =		{10.4230/LIPIcs.ITCS.2023.55},
  annote =	{Keywords: Differential Privacy, Algorithms, Distinct Elements, Time Windows}
}

On Distributed Differential Privacy and Counting Distinct Elements

Authors: Lijie Chen, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)


Abstract
We study the setup where each of n users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of (ε,δ)-differential privacy:
- In the non-interactive local setting, we prove that the additive error of any protocol is Ω(n) for any constant ε and for any δ inverse polynomial in n.
- In the single-message shuffle setting, we prove a lower bound of Ω̃(n) on the error for any constant ε and for some δ inverse quasi-polynomial in n. We do so by building on the moment-matching method from the literature on distribution estimation.
- In the multi-message shuffle setting, we give a protocol with at most one message per user in expectation and with an error of Õ(√n) for any constant ε and for any δ inverse polynomial in n. Our protocol is also robustly shuffle private, and our error of √n matches a known lower bound for such protocols.
Our proof technique relies on a new notion, which we call dominated protocols, that can also be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of selection and learning parity. Our first lower bound for estimating the number of distinct elements provides the first ω(√n) separation between global sensitivity and error in local differential privacy, thus answering an open question of Vadhan (2017). We also provide a simple construction that gives Ω̃(n) separation between global sensitivity and error in two-party differential privacy, thereby answering an open question of McGregor et al. (2011).

Cite as

Lijie Chen, Badih Ghazi, Ravi Kumar, and Pasin Manurangsi. On Distributed Differential Privacy and Counting Distinct Elements. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 56:1-56:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{chen_et_al:LIPIcs.ITCS.2021.56,
  author =	{Chen, Lijie and Ghazi, Badih and Kumar, Ravi and Manurangsi, Pasin},
  title =	{{On Distributed Differential Privacy and Counting Distinct Elements}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{56:1--56:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.56},
  URN =		{urn:nbn:de:0030-drops-135953},
  doi =		{10.4230/LIPIcs.ITCS.2021.56},
  annote =	{Keywords: Differential Privacy, Shuffle Model}
}

Pure Differentially Private Summation from Anonymous Messages

Authors: Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker

Published in: LIPIcs, Volume 163, 1st Conference on Information-Theoretic Cryptography (ITC 2020)


Abstract
The shuffled (aka anonymous) model has recently generated significant interest as a candidate distributed privacy framework with trust assumptions better than the central model but with achievable error rates smaller than the local model. In this paper, we study pure differentially private protocols in the shuffled model for summation, a very basic and widely used primitive. Specifically:
- For the binary summation problem where each of n users holds a bit as an input, we give a pure ε-differentially private protocol for estimating the number of ones held by the users up to an absolute error of O_ε(1), and where each user sends O_ε(log n) one-bit messages. This is the first pure protocol in the shuffled model with error o(√n) for constant values of ε. Using our binary summation protocol as a building block, we give a pure ε-differentially private protocol that performs summation of real numbers in [0, 1] up to an absolute error of O_ε(1), and where each user sends O_ε(log³ n) messages each consisting of O(log log n) bits.
- In contrast, we show that for any pure ε-differentially private protocol for binary summation in the shuffled model having absolute error n^{0.5-Ω(1)}, the per-user communication has to be at least Ω_ε(√(log n)) bits. This implies (i) the first separation between the (bounded-communication) multi-message shuffled model and the central model, and (ii) the first separation between pure and approximate differentially private protocols in the shuffled model. Interestingly, over the course of proving our lower bound, we have to consider (a generalization of) the following question that might be of independent interest: given γ ∈ (0, 1), what is the smallest positive integer m for which there exist two random variables X⁰ and X¹ supported on {0, …, m} such that (i) the total variation distance between X⁰ and X¹ is at least 1 - γ, and (ii) the moment generating functions of X⁰ and X¹ are within a constant factor of each other everywhere? We show that the answer to this question is m = Θ(√(log(1/γ))).

Cite as

Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure Differentially Private Summation from Anonymous Messages. In 1st Conference on Information-Theoretic Cryptography (ITC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 163, pp. 15:1-15:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@InProceedings{ghazi_et_al:LIPIcs.ITC.2020.15,
  author =	{Ghazi, Badih and Golowich, Noah and Kumar, Ravi and Manurangsi, Pasin and Pagh, Rasmus and Velingker, Ameya},
  title =	{{Pure Differentially Private Summation from Anonymous Messages}},
  booktitle =	{1st Conference on Information-Theoretic Cryptography (ITC 2020)},
  pages =	{15:1--15:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-151-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{163},
  editor =	{Tauman Kalai, Yael and Smith, Adam D. and Wichs, Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITC.2020.15},
  URN =		{urn:nbn:de:0030-drops-121208},
  doi =		{10.4230/LIPIcs.ITC.2020.15},
  annote =	{Keywords: Pure differential privacy, Shuffled model, Anonymous messages, Summation, Communication bounds}
}

Dimension Reduction for Polynomials over Gaussian Space and Applications

Authors: Badih Ghazi, Pritish Kamath, and Prasad Raghavendra

Published in: LIPIcs, Volume 102, 33rd Computational Complexity Conference (CCC 2018)


Abstract
We introduce a new technique for reducing the dimension of the ambient space of low-degree polynomials in the Gaussian space while preserving their relative correlation structure. As an application, we obtain an explicit upper bound on the dimension of an ε-optimal noise-stable Gaussian partition. In fact, we address the more general problem of upper bounding the number of samples needed to ε-approximate any joint distribution that can be non-interactively simulated from a correlated Gaussian source. Our results significantly improve (from Ackermann-like to "merely" exponential) the upper bounds recently proved on the above problems by De, Mossel & Neeman [CCC 2017, SODA 2018 resp.] and imply decidability of the larger alphabet case of the gap non-interactive simulation problem posed by Ghazi, Kamath & Sudan [FOCS 2016]. Our technique of dimension reduction for low-degree polynomials is simple, can be seen as a generalization of the Johnson-Lindenstrauss lemma, and may be of independent interest.
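
For readers unfamiliar with the Johnson-Lindenstrauss lemma that the technique generalizes, here is a minimal sketch of the classic Gaussian random projection. It is not the paper's reduction for polynomials. A k × d matrix with i.i.d. N(0, 1/k) entries approximately preserves the squared Euclidean norm of a fixed vector, with relative error on the order of √(2/k). The names are hypothetical, and plain Python lists keep the sketch dependency-free.

```python
import math
import random


def jl_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (standard deviation 1/sqrt(k))."""
    std = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]


def project(matrix, v):
    """Matrix-vector product: maps a d-dimensional vector to k dimensions."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)
```

For a fixed vector v, the ratio sq_norm(project(A, v)) / sq_norm(v) concentrates around 1 as k grows. The same holds for pairwise distances, since they are norms of difference vectors.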

Cite as

Badih Ghazi, Pritish Kamath, and Prasad Raghavendra. Dimension Reduction for Polynomials over Gaussian Space and Applications. In 33rd Computational Complexity Conference (CCC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 102, pp. 28:1-28:37, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)



@InProceedings{ghazi_et_al:LIPIcs.CCC.2018.28,
  author =	{Ghazi, Badih and Kamath, Pritish and Raghavendra, Prasad},
  title =	{{Dimension Reduction for Polynomials over Gaussian Space and Applications}},
  booktitle =	{33rd Computational Complexity Conference (CCC 2018)},
  pages =	{28:1--28:37},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-069-9},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{102},
  editor =	{Servedio, Rocco A.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2018.28},
  URN =		{urn:nbn:de:0030-drops-88616},
  doi =		{10.4230/LIPIcs.CCC.2018.28},
  annote =	{Keywords: Dimension reduction, Low-degree Polynomials, Noise Stability, Non-Interactive Simulation}
}

Compression in a Distributed Setting

Authors: Badih Ghazi, Elad Haramaty, Pritish Kamath, and Madhu Sudan

Published in: LIPIcs, Volume 67, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)


Abstract
Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, a sequence of pairs of players from a set of K players is chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution P and some constant number of bits of information (such as a learning algorithm). Letting T_ε denote the number of iterations it would take for a typical player to obtain an ε-approximation to Q in total variation distance, we ask whether T_ε iterations suffice to compress the messages down roughly to their entropy, and we give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log(D(P || Q))) in Õ(T_ε) iterations while allowing for O(ε) error, where D(· || ·) denotes the KL-divergence between distributions. For large divergences, this compares favorably with the static algorithm that ignores all samples and compresses down to H(Q) + D(P || Q) bits, while not requiring the T_ε · K iterations that it would take players to develop optimal but separate compressions for each pair of players. Along the way, we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the lengths of the messages to be compressed.
Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.

Cite as

Badih Ghazi, Elad Haramaty, Pritish Kamath, and Madhu Sudan. Compression in a Distributed Setting. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 19:1-19:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)



@InProceedings{ghazi_et_al:LIPIcs.ITCS.2017.19,
  author =	{Ghazi, Badih and Haramaty, Elad and Kamath, Pritish and Sudan, Madhu},
  title =	{{Compression in a Distributed Setting}},
  booktitle =	{8th Innovations in Theoretical Computer Science Conference (ITCS 2017)},
  pages =	{19:1--19:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-029-3},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{67},
  editor =	{Papadimitriou, Christos H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2017.19},
  URN =		{urn:nbn:de:0030-drops-81763},
  doi =		{10.4230/LIPIcs.ITCS.2017.19},
  annote =	{Keywords: Distributed Compression, Communication, Language Evolution, Isolating Hash Families}
}

On the Power of Learning from k-Wise Queries

Authors: Vitaly Feldman and Badih Ghazi

Published in: LIPIcs, Volume 67, 8th Innovations in Theoretical Computer Science Conference (ITCS 2017)


Abstract
Several well-studied models of access to data samples, including statistical queries, local differential privacy and low-communication algorithms, rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of E_{x∼D}[q(x)] for any choice of the query function q mapping X to the reals, where D is an unknown data distribution over X.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using k-wise queries, each of which is specified by a function q mapping X^k to the reals. Hence it is natural to ask whether algorithms using k-wise queries can solve learning problems more efficiently and by how much. Blum, Kalai and Wasserman (2003) showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with k-wise SQs is smaller than the (unary) SQ complexity by a factor of at most 2^k. We show that for more general problems over distributions the picture is substantially richer. For every k, the complexity of distribution-independent PAC learning with k-wise queries can be exponentially larger than learning with (k+1)-wise queries. We then give two approaches for simulating a k-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum et al. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using k-wise queries. The second approach exploits the k-party communication complexity of the k-wise query function.
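
The difference between unary and k-wise queries can be illustrated by answering each kind of query from i.i.d. samples. This assumes a sample-based oracle. A real SQ oracle may return any value within its tolerance, so this is only one valid instantiation. The k-wise query is evaluated on disjoint k-tuples of samples, and all names are hypothetical.

```python
import random
from statistics import fmean


def estimate_unary(q, samples):
    """Empirical estimate of E_{x~D}[q(x)] from i.i.d. samples of D."""
    return fmean(q(x) for x in samples)


def estimate_k_wise(q, samples, k):
    """Empirical estimate of E_{(x_1,...,x_k)~D^k}[q(x_1,...,x_k)].

    Consecutive disjoint k-tuples of i.i.d. samples are themselves
    i.i.d. draws from D^k.
    """
    tuples = [tuple(samples[i:i + k]) for i in range(0, len(samples) - k + 1, k)]
    return fmean(q(*t) for t in tuples)


rng = random.Random(4)
samples = [rng.randrange(2) for _ in range(100_000)]  # D = uniform on {0, 1}
print(estimate_unary(lambda x: x, samples))                     # approximately 0.5
print(estimate_k_wise(lambda a, b: float(a == b), samples, 2))  # approximately 0.5
```

The 2-wise query here asks whether two independent samples agree. That is a property of pairs, so it cannot be written as E[q(x)] for a single unary q.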

Cite as

Vitaly Feldman and Badih Ghazi. On the Power of Learning from k-Wise Queries. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 67, pp. 41:1-41:32, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)



@InProceedings{feldman_et_al:LIPIcs.ITCS.2017.41,
  author =	{Feldman, Vitaly and Ghazi, Badih},
  title =	{{On the Power of Learning from k-Wise Queries}},
  booktitle =	{8th Innovations in Theoretical Computer Science Conference (ITCS 2017)},
  pages =	{41:1--41:32},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-029-3},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{67},
  editor =	{Papadimitriou, Christos H.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2017.41},
  URN =		{urn:nbn:de:0030-drops-81801},
  doi =		{10.4230/LIPIcs.ITCS.2017.41},
  annote =	{Keywords: Statistical Queries, PAC Learning, Differential Privacy, Lower bounds, Communication Complexity}
}

The Power of Shared Randomness in Uncertain Communication

Authors: Badih Ghazi and Madhu Sudan

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)


Abstract
In a recent work (Ghazi et al., SODA 2016), the authors with Komargodski and Kothari initiated the study of communication with contextual uncertainty, a setup aiming to understand how efficient communication is possible when the communicating parties imperfectly share a huge context. In this setting, Alice is given a function f and an input string x, and Bob is given a function g and an input string y. The pair (x,y) comes from a known distribution μ, and f and g are guaranteed to be close under this distribution. Alice and Bob wish to compute g(x,y) with high probability. The lack of agreement between Alice and Bob on the function that is being computed captures the uncertainty in the context. The previous work showed that any problem with one-way communication complexity k in the standard model (i.e., without uncertainty, in other words, under the promise that f=g) has public-coin communication at most O(k(1+I)) bits in the uncertain case, where I is the mutual information between x and y. Moreover, a lower bound of Ω(√I) bits on the public-coin uncertain communication was also shown. However, an important question that was left open is related to the power that public randomness brings to uncertain communication. Can Alice and Bob achieve efficient communication amid uncertainty without using public randomness? And how powerful are public-coin protocols in overcoming uncertainty? Motivated by these two questions:
- We prove the first separation between private-coin uncertain communication and public-coin uncertain communication. Namely, we exhibit a function class for which the communication in the standard model and the public-coin uncertain communication are O(1) while the private-coin uncertain communication is a growing function of n (the length of the inputs). This lower bound (proved with respect to the uniform distribution) is in sharp contrast with the case of public-coin uncertain communication, which was shown by the previous work to be within a constant factor of the certain communication. This lower bound also implies the first separation between public-coin uncertain communication and deterministic uncertain communication. Interestingly, we also show that if Alice and Bob imperfectly share a sequence of random bits (a setup weaker than public randomness), then achieving a constant blow-up in communication is still possible.
- We improve the lower bound of the previous work on public-coin uncertain communication. Namely, we exhibit a function class and a distribution (with mutual information I ≈ n) for which the one-way certain communication is k bits but the one-way public-coin uncertain communication is at least Ω(√k · √I) bits.
Our proofs introduce new problems in the standard communication complexity model and prove lower bounds for these problems. Both the problems and the lower bound techniques may be of general interest.

Cite as

Badih Ghazi and Madhu Sudan. The Power of Shared Randomness in Uncertain Communication. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 49:1-49:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)



@InProceedings{ghazi_et_al:LIPIcs.ICALP.2017.49,
  author =	{Ghazi, Badih and Sudan, Madhu},
  title =	{{The Power of Shared Randomness in Uncertain Communication}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{49:1--49:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.49},
  URN =		{urn:nbn:de:0030-drops-74871},
  doi =		{10.4230/LIPIcs.ICALP.2017.49},
  annote =	{Keywords: randomness, uncertainty, communication, imperfectly shared randomness, lower bounds}
}

The Information Complexity of Hamming Distance

Authors: Eric Blais, Joshua Brody, and Badih Ghazi

Published in: LIPIcs, Volume 28, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)


Abstract
The Hamming distance function Ham_{n,d} returns 1 on all pairs of inputs x and y that differ in at most d coordinates and returns 0 otherwise. We initiate the study of the information complexity of the Hamming distance function. We give a new optimal lower bound for the information complexity of the Ham_{n,d} function in the small-error regime where the protocol is required to err with probability at most epsilon < d/n. We also give a new conditional lower bound for the information complexity of Ham_{n,d} that is optimal in all regimes. These results imply the first new lower bounds on the communication complexity of the Hamming distance function for the shared randomness two-way communication model since Pang and El-Gamal (1986). These results also imply new lower bounds in the areas of property testing and parity decision tree complexity.
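
The definition of Ham_{n,d} above translates directly into code. This is a minimal sketch that assumes the inputs are equal-length strings over any alphabet; the function name is hypothetical.

```python
def ham(x: str, y: str, d: int) -> int:
    """Ham_{n,d}(x, y): 1 if x and y differ in at most d coordinates, else 0."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length n")
    mismatches = sum(a != b for a, b in zip(x, y))  # Hamming distance
    return 1 if mismatches <= d else 0
```

For example, "0110" and "0011" differ in two coordinates, so ham("0110", "0011", 2) returns 1. "0110" and "1001" differ in all four, so ham("0110", "1001", 2) returns 0.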

Cite as

Eric Blais, Joshua Brody, and Badih Ghazi. The Information Complexity of Hamming Distance. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 28, pp. 465-489, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)



@InProceedings{blais_et_al:LIPIcs.APPROX-RANDOM.2014.465,
  author =	{Blais, Eric and Brody, Joshua and Ghazi, Badih},
  title =	{{The Information Complexity of Hamming Distance}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)},
  pages =	{465--489},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-74-3},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{28},
  editor =	{Jansen, Klaus and Rolim, Jos\'{e} and Devanur, Nikhil R. and Moore, Cristopher},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2014.465},
  URN =		{urn:nbn:de:0030-drops-47174},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2014.465},
  annote =	{Keywords: Hamming distance, communication complexity, information complexity}
}