Search Results

Documents authored by Tětek, Jakub


Document
RANDOM
Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private

Authors: Jakub Tětek

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)


Abstract
The exponential increase in the amount of available data makes taking advantage of them without violating users' privacy one of the fundamental problems of computer science. This question has been investigated thoroughly under the framework of differential privacy. However, most of the literature has not focused on settings where the amount of data is so large that we are not even able to compute the exact answer in the non-private setting (such as in the streaming setting, sublinear-time setting, etc.). This can often make the use of differential privacy unfeasible in practice. In this paper, we show a general approach for making Monte-Carlo randomized approximation algorithms differentially private. We only need to assume the error R of the approximation algorithm is sufficiently concentrated around 0 (e.g. 𝔼[|R|] is bounded) and that the function being approximated has a small global sensitivity Δ. Specifically, if we have a randomized approximation algorithm with sufficiently concentrated error which has time/space/query complexity T(n,ρ) with ρ being an accuracy parameter, we can generally speaking get an algorithm with the same accuracy and complexity T(n,Θ(ε ρ)) that is ε-differentially private. Our technical results are as follows. First, we show that if the error is subexponential, then the Laplace mechanism with error magnitude proportional to the sum of the global sensitivity Δ and the subexponential diameter of the error of the algorithm makes the algorithm differentially private. This is true even if the worst-case global sensitivity of the algorithm is large or infinite. We then introduce a new additive noise mechanism, which we call the zero-symmetric Pareto mechanism. We show that using this mechanism, we can make an algorithm differentially private even if we only assume a bound on the first absolute moment of the error 𝔼[|R|]. Finally, we use our results to give either the first known or improved sublinear-complexity differentially private algorithms for various problems. This includes results for frequency moments, estimating the average degree of a graph in subliinear time, rank queries, or estimating the size of the maximum matching. Our results raise many new questions and we state multiple open problems.

Cite as

Jakub Tětek. Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 73:1-73:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{tetek:LIPIcs.APPROX/RANDOM.2024.73,
  author =	{T\v{e}tek, Jakub},
  title =	{{Additive Noise Mechanisms for Making Randomized Approximation Algorithms Differentially Private}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{73:1--73:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.73},
  URN =		{urn:nbn:de:0030-drops-210660},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.73},
  annote =	{Keywords: Differential privacy, Randomized approximation algorithms}
}
Document
RANDOM
Bias Reduction for Sum Estimation

Authors: Talya Eden, Jakob Bæk Tejs Houen, Shyam Narayanan, Will Rosenbaum, and Jakub Tětek

Published in: LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)


Abstract
In classical statistics and distribution testing, it is often assumed that elements can be sampled exactly from some distribution 𝒫, and that when an element x is sampled, the probability 𝒫(x) of sampling x is also known. In this setting, recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution 𝒬 that is sufficiently close to 𝒫. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution 𝒬 sufficient, and what is the algorithmic cost of coping with this noise? In this paper, we investigate these questions for the problem of estimating the sum of a multiset of N real values x_1, …, x_N. This problem is well-studied in the statistical literature in the case 𝒫 = 𝒬, where the Hansen-Hurwitz estimator [Annals of Mathematical Statistics, 1943] is frequently used. We assume that for some (known) distribution 𝒫, values are sampled from a distribution 𝒬 that is pointwise close to 𝒫. That is, there is a parameter γ < 1 such that for all x_i, (1 - γ) 𝒫(i) ≤ 𝒬(i) ≤ (1 + γ) 𝒫(i). For every positive integer k we define an estimator ζ_k for μ = ∑_i x_i whose bias is proportional to γ^k (where our ζ₁ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if 𝒬 is pointwise γ-close to uniform and all x_i ∈ {0, 1}, for any ε > 0, we can estimate μ to within additive error ε N using m = Θ(N^{1-1/k}/ε^{2/k}) samples, where k = ⌈lg ε/lg γ⌉. We then show that this sample complexity is essentially optimal. Interestingly, our upper and lower bounds show that the sample complexity need not vary uniformly with the desired error parameter ε: for some values of ε, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.

Cite as

Talya Eden, Jakob Bæk Tejs Houen, Shyam Narayanan, Will Rosenbaum, and Jakub Tětek. Bias Reduction for Sum Estimation. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 62:1-62:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{eden_et_al:LIPIcs.APPROX/RANDOM.2023.62,
  author =	{Eden, Talya and Houen, Jakob B{\ae}k Tejs and Narayanan, Shyam and Rosenbaum, Will and T\v{e}tek, Jakub},
  title =	{{Bias Reduction for Sum Estimation}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)},
  pages =	{62:1--62:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-296-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{275},
  editor =	{Megow, Nicole and Smith, Adam},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2023.62},
  URN =		{urn:nbn:de:0030-drops-188872},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2023.62},
  annote =	{Keywords: bias reduction, sum estimation, sublinear time algorithms, sample complexity}
}
Document
Track A: Algorithms, Complexity and Games
Approximate Triangle Counting via Sampling and Fast Matrix Multiplication

Authors: Jakub Tětek

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)


Abstract
There is a simple O(n³/{ε²T}) time algorithm for 1±ε-approximate triangle counting where T is the number of triangles in the graph and n the number of vertices. At the same time, one may count triangles exactly using fast matrix multiplication in time Õ(n^ω). Is it possible to get a negative dependency on the number of triangles T while retaining the state-of-the-art n^ω dependency on n? We answer this question positively by providing an algorithm which runs in time O({n^ω}/T^{ω-2})⋅poly(n^o(1)/ε). This is optimal in the sense that as long as the exponent of T is independent of n, T, it cannot be improved while retaining the dependency on n. Our algorithm improves upon the state of the art when T ≫ 1 and T ≪ n. We also consider the problem of approximate triangle counting in sparse graphs, parameterized by the number of edges m. The best known algorithm runs in time Õ_ε(m^{3/2}/T) [Eden et al., SIAM Journal on Computing, 2017]. An algorithm by Alon et al. [JACM, 1995] for exact triangle counting that runs in time Õ(m^{2ω/(ω + 1)}). We again get an algorithm whose complexity has a state-of-the-art dependency on m while having negative dependency on T. Specifically, our algorithm runs in time O({m^{2ω/(ω+1)}}/{T^{2(ω-1)/(ω+1)}}) ⋅ poly(n^o(1)/ε). This is again optimal in the sense that no better constant exponent of T is possible without worsening the dependency on m. This algorithm improves upon the state of the art when T ≫ 1 and T ≪ √m. In both cases, algorithms with time complexity matching query complexity lower bounds were known on some range of parameters. While those algorithms have optimal query complexity for the whole range of T, the time complexity departs from the query complexity and is no longer optimal (as we show) for T ≪ n and T ≪ √m, respectively. We focus on the time complexity in this range of T. To the best of our knowledge, this is the first paper considering the discrepancy between query and time complexity in graph parameter estimation.

Cite as

Jakub Tětek. Approximate Triangle Counting via Sampling and Fast Matrix Multiplication. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 107:1-107:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{tetek:LIPIcs.ICALP.2022.107,
  author =	{T\v{e}tek, Jakub},
  title =	{{Approximate Triangle Counting via Sampling and Fast Matrix Multiplication}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{107:1--107:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.107},
  URN =		{urn:nbn:de:0030-drops-164485},
  doi =		{10.4230/LIPIcs.ICALP.2022.107},
  annote =	{Keywords: Approximate triangle counting, Fast matrix multiplication, Sampling}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail