Making Progress Based on False Discoveries
We consider Stochastic Convex Optimization as a case study for Adaptive Data Analysis. A basic question is how many samples are needed to compute ε-accurate estimates of the O(1/ε²) gradients queried by gradient descent. We provide two intermediate answers to this question.
First, we show that for a general analyst (not necessarily gradient descent) Ω(1/ε³) samples are required, which is more than the number of samples required to simply optimize the population loss. Our construction builds upon a new lower bound (which may be of interest in its own right) for an analyst that may ask several non-adaptive questions in a batch within a fixed and known number T of rounds of adaptivity, and that requires a fraction of the discoveries to be true. We show that for such an analyst Ω(√T/ε²) samples are necessary.
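The interaction model in this lower bound — batches of non-adaptive statistical queries issued over a fixed, known number of adaptive rounds, each answered empirically from the oracle's sample — can be illustrated with a minimal sketch. This toy code is not from the paper; the sample distribution, the queries, and the round count T are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
S = rng.normal(size=n)  # the oracle's i.i.d. sample from the population

def answer_batch(queries):
    # The oracle answers a batch of statistical queries with empirical
    # means over its sample S. Queries within a batch are non-adaptive:
    # they are fixed before any answer in this round is revealed.
    return [float(np.mean(q(S))) for q in queries]

T = 3  # fixed and known number of rounds of adaptivity
answers = []
for t in range(T):
    # The batch may depend on answers from *previous* rounds (adaptivity
    # across rounds), but not on answers within the current round.
    shift = answers[-1][0] if answers else 0.0
    batch = [
        lambda z, s=shift: (z > s).astype(float),  # tail-probability query
        lambda z, s=shift: z + s,                  # shifted-mean query
    ]
    answers.append(answer_batch(batch))
```

The analyst in the paper's construction additionally demands that a fraction of the returned answers be true discoveries (close to their population values), which is what drives the Ω(√T/ε²) sample requirement.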
Second, we show that, under certain assumptions on the oracle, Ω̃(1/ε^{2.5}) samples are necessary in an interaction with gradient descent, which is again suboptimal in terms of optimization. Our assumptions are that the oracle has only first-order access and is post-hoc generalizing. First-order access means that the oracle can only compute gradients of the sampled functions at points queried by the algorithm. Our assumption of post-hoc generalization follows from existing lower bounds for statistical queries. More generally, we provide a generic reduction from the standard setting of statistical queries to the problem of estimating gradients queried by gradient descent.
Overall, these results stand in contrast with classical bounds showing that with O(1/ε²) samples one can optimize the population risk to accuracy O(ε), but, as it turns out, with spurious gradients.
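The setting throughout is an analyst (here, gradient descent) adaptively querying gradient estimates from an oracle that holds a finite sample. A minimal sketch of this interaction, with a toy quadratic loss f(x; z) = ½‖x − z‖² chosen purely for illustration (it is not a construction from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1000
# The oracle holds a sample S of n points; population mean is (1, ..., 1).
S = rng.normal(loc=1.0, scale=1.0, size=(n, d))

def oracle_grad(x):
    # First-order access: the oracle only evaluates empirical gradients
    # at the queried point x. For f(x; z) = 0.5 * ||x - z||^2 the
    # empirical gradient is x - mean(S).
    return x - S.mean(axis=0)

eps = 0.1
T = int(1 / eps**2)   # gradient descent issues O(1/eps^2) gradient queries
eta = 0.1
x = np.zeros(d)
for _ in range(T):
    # Each query point depends on all previous oracle answers: this is
    # exactly the adaptivity that makes naive reuse of S problematic.
    x = x - eta * oracle_grad(x)
```

With a benign loss like this one, the empirical gradients concentrate and x lands near the population minimizer (1, …, 1); the paper's lower bounds concern adversarial instances where answering all T adaptive queries ε-accurately forces a larger sample.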
Adaptive Data Analysis
Stochastic Convex Optimization
Learning Theory
Theory of computation~Machine learning theory
76:1-76:18
Regular Paper
This work has been supported by ISF grant no. 2188/20, and the research was funded in part by the ERC grant FoG (101116258).
https://arxiv.org/abs/2204.08809
Roi Livni
Department of Electrical Engineering, Tel Aviv University, Israel
10.4230/LIPIcs.ITCS.2024.76
Roi Livni
Creative Commons Attribution 4.0 International license
https://creativecommons.org/licenses/by/4.0/legalcode