
# Memory-Sample Lower Bounds for Learning Parity with Noise

## File

LIPIcs.APPROX-RANDOM.2021.60.pdf
• Filesize: 0.75 MB
• 19 pages

## Acknowledgements

We would like to thank Avishay Tal and Greg Valiant for helpful discussions.

## Cite As

Sumegha Garg, Pravesh K. Kothari, Pengda Liu, and Ran Raz. Memory-Sample Lower Bounds for Learning Parity with Noise. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 60:1-60:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.60

## Abstract

In this work, we consider the well-studied problem of learning parity under noise, where a learner tries to learn x = (x₁,…,x_n) ∈ {0,1}ⁿ from a stream of random linear equations over 𝔽₂ that are correct with probability 1/2+ε and flipped with probability 1/2-ε (0 < ε < 1/2). We show that any learning algorithm for this problem requires either a memory of size Ω(n²/ε) or an exponential number of samples.

In fact, we study memory-sample lower bounds for a large class of learning problems, as characterized by [Garg et al., 2018], when the samples are noisy. A matrix M: A × X → {-1,1} corresponds to the following learning problem with error parameter ε: an unknown element x ∈ X is chosen uniformly at random. A learner tries to learn x from a stream of samples, (a₁, b₁), (a₂, b₂), …, where for every i, a_i ∈ A is chosen uniformly at random and b_i = M(a_i,x) with probability 1/2+ε and b_i = -M(a_i,x) with probability 1/2-ε (0 < ε < 1/2).

Assume that k, 𝓁, r are such that any submatrix of M with at least 2^{-k} ⋅ |A| rows and at least 2^{-𝓁} ⋅ |X| columns has a bias of at most 2^{-r}. We show that any learning algorithm for the learning problem corresponding to M, with error parameter ε, requires either a memory of size at least Ω((k⋅𝓁)/ε), or at least 2^{Ω(r)} samples. The result holds even if the learner has an exponentially small success probability (of 2^{-Ω(r)}).

In particular, this shows that for a large class of learning problems, the same as those in [Garg et al., 2018], any learning algorithm requires either a memory of size at least Ω(((log|X|)⋅(log|A|))/ε) or an exponential number of noisy samples. Our proof is based on adapting the arguments in [Ran Raz, 2017; Garg et al., 2018] to the noisy case.
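For concreteness, here is a minimal Python sketch (not from the paper) of the sample stream described above, specialized to parity with noise: the hidden x is uniform in {0,1}ⁿ, each a_i is uniform in {0,1}ⁿ, and the label is the inner product ⟨a_i, x⟩ over 𝔽₂, flipped with probability 1/2-ε. The function name noisy_parity_stream and its parameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def noisy_parity_stream(n, eps, num_samples, seed=None):
    """Yield noisy parity samples (a_i, b_i) for a hidden uniform x in {0,1}^n.

    Each a_i is uniform in {0,1}^n; b_i = <a_i, x> mod 2, flipped with
    probability 1/2 - eps (so the label is correct with probability 1/2 + eps).
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)                 # hidden secret, uniform in {0,1}^n
    for _ in range(num_samples):
        a = rng.integers(0, 2, size=n)             # uniform random linear equation over F_2
        b = int(np.dot(a, x)) % 2                  # correct label <a, x> mod 2
        if rng.random() < 0.5 - eps:               # flip with probability 1/2 - eps
            b ^= 1
        yield a, b

# Example: a short stream of 5 noisy samples for n = 8 and eps = 0.1.
if __name__ == "__main__":
    for a, b in noisy_parity_stream(n=8, eps=0.1, num_samples=5, seed=0):
        print(a, b)
```

For the general problem defined by a matrix M, the labels would instead lie in {-1,1}, with b_i = M(a_i, x) flipped with the same probability; the parity case corresponds to M(a,x) = (-1)^{⟨a,x⟩}, where log|X| = log|A| = n, which matches the Ω(n²/ε) memory bound stated above.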

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Machine learning theory
##### Keywords
• learning parity under noise
• space lower bound
• branching program


## References

1. Michael Alekhnovich. More on average case vs approximation complexity. In 44th Symposium on Foundations of Computer Science (FOCS 2003), 11-14 October 2003, Cambridge, MA, USA, Proceedings, pages 298-307. IEEE Computer Society, 2003. URL: https://doi.org/10.1109/SFCS.2003.1238204.
2. Paul Beame, Shayan Oveis Gharan, and Xin Yang. Time-space tradeoffs for learning finite functions from random evaluations, with applications to polynomials. In Conference On Learning Theory, pages 843-856, 2018.
3. Avrim Blum, Adam Kalai, and Hal Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM, 50(4):506-519, 2003. URL: https://doi.org/10.1145/792538.792543.
4. Yuval Dagan, Gil Kur, and Ohad Shamir. Space lower bounds for linear prediction in the streaming model. In Conference on Learning Theory, pages 929-954. PMLR, 2019.
5. Yuval Dagan and Ohad Shamir. Detecting correlations with little memory and communication. In Conference On Learning Theory, pages 1145-1198, 2018.
6. Wei Dai, Stefano Tessaro, and Xihu Zhang. Super-linear time-memory trade-offs for symmetric encryption. Cryptology ePrint Archive, Report 2020/663, 2020. URL: https://eprint.iacr.org/2020/663.
7. Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. On agnostic learning of parities, monomials, and halfspaces. SIAM J. Comput., 39(2):606-645, 2009. URL: https://doi.org/10.1137/070684914.
8. Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 990-1002. ACM, 2018.
9. Sumegha Garg, Ran Raz, and Avishay Tal. Time-space lower bounds for two-pass learning. In 34th Computational Complexity Conference (CCC 2019). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019.
10. Uma Girish, Ran Raz, and Wei Zhan. Quantum logspace algorithm for powering matrices with bounded norm. arXiv preprint arXiv:2006.04880, 2020.
11. Jiaxin Guan and Mark Zhandry. Simple schemes in the bounded storage model. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 500-524. Springer, 2019.
12. Jiaxin Guan and Mark Zhandry. Disappearing cryptography in the bounded storage model. IACR Cryptol. ePrint Arch., 2021:406, 2021.
13. Joseph Jaeger and Stefano Tessaro. Tight time-memory trade-offs for symmetric encryption. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 467-497. Springer, 2019.
14. Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777-1805, 2008. URL: https://doi.org/10.1137/060649057.
15. Michael J. Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983-1006, 1998. URL: https://doi.org/10.1145/293347.293351.
16. Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1067-1080. ACM, 2017.
17. Dana Moshkovitz and Michal Moshkovitz. Mixing implies lower bounds for space bounded learning. In Conference on Learning Theory, pages 1516-1566. PMLR, 2017.
18. Dana Moshkovitz and Michal Moshkovitz. Entropy samplers and strong generic lower bounds for space bounded learning. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
19. Michal Moshkovitz and Naftali Tishby. Mixing complexity and its applications to neural networks. arXiv preprint arXiv:1703.00729, 2017.
20. Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. In 57th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2016), pages 266-275. IEEE, 2016.
21. Ran Raz. A time-space lower bound for a large class of learning problems. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 732-742, 2017.
22. Ohad Shamir. Fundamental limits of online and distributed algorithms for statistical learning and estimation. Advances in Neural Information Processing Systems, 27:163-171, 2014.
23. Vatsal Sharan, Aaron Sidford, and Gregory Valiant. Memory-sample tradeoffs for linear regression with small error. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 890-901, 2019.
24. Jacob Steinhardt, Gregory Valiant, and Stefan Wager. Memory, communication, and statistical queries. In Conference on Learning Theory, pages 1490-1516. PMLR, 2016.
25. Stefano Tessaro and Aishwarya Thiruvengadam. Provable time-memory trade-offs: symmetric cryptography against memory-bounded adversaries. In Theory of Cryptography Conference, pages 3-32. Springer, 2018.
26. Gregory Valiant and Paul Valiant. Information theoretically secure databases. arXiv preprint arXiv:1605.02646, 2016.