# Time-Space Lower Bounds for Two-Pass Learning

## File

LIPIcs.CCC.2019.22.pdf
• Filesize: 0.7 MB
• 39 pages

## Cite As

Sumegha Garg, Ran Raz, and Avishay Tal. Time-Space Lower Bounds for Two-Pass Learning. In 34th Computational Complexity Conference (CCC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 137, pp. 22:1-22:39, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CCC.2019.22

## Abstract

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz, 2016; Kol et al., 2017; Raz, 2017; Moshkovitz and Moshkovitz, 2018; Beame et al., 2018; Garg et al., 2018]. For example, any algorithm for learning parities of size n requires either a memory of size Omega(n^{2}) or an exponential number of samples [Raz, 2016]. All these works modeled the learner as a one-pass branching program, allowing only one pass over the stream of samples.

In this work, we prove the first memory-samples lower bounds (with a super-linear lower bound on the memory size and a super-polynomial lower bound on the number of samples) when the learner is allowed two passes over the stream of samples. For example, we prove that any two-pass algorithm for learning parities of size n requires either a memory of size Omega(n^{1.5}) or at least 2^{Omega(sqrt{n})} samples.

More generally, a matrix M : A x X -> {-1,1} corresponds to the following learning problem: an unknown element x in X is chosen uniformly at random, and a learner tries to learn x from a stream of samples (a_1, b_1), (a_2, b_2), ..., where for every i, a_i in A is chosen uniformly at random and b_i = M(a_i, x). Assume that k, l, r are such that any submatrix of M with at least 2^{-k} * |A| rows and at least 2^{-l} * |X| columns has a bias of at most 2^{-r}. We show that any two-pass learning algorithm for the learning problem corresponding to M requires either a memory of size at least Omega(k * min{k, sqrt{l}}), or at least 2^{Omega(min{k, sqrt{l}, r})} samples.
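For concreteness, parity learning is the instance of this framework where A = X = {0,1}^n and M(a, x) = (-1)^{<a,x> mod 2}. The following Python sketch is our own illustration (not code from the paper; the helper names `parity_label` and `sample_stream` are hypothetical) of the sample stream that a bounded-memory learner must process in one or two passes.

```python
import numpy as np

def parity_label(a: np.ndarray, x: np.ndarray) -> int:
    """M(a, x) = (-1)^{<a, x> mod 2}, a value in {-1, +1}."""
    return 1 - 2 * (int(np.dot(a, x)) % 2)

def sample_stream(n: int, num_samples: int, rng: np.random.Generator):
    """Yield the stream (a_1, b_1), (a_2, b_2), ... for a hidden uniform x."""
    x = rng.integers(0, 2, size=n)          # unknown x, uniform over X = {0,1}^n
    for _ in range(num_samples):
        a = rng.integers(0, 2, size=n)      # a_i uniform over A = {0,1}^n
        yield a, parity_label(a, x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for a, b in sample_stream(n=8, num_samples=3, rng=rng):
        print(a, b)
```

With unbounded memory, roughly n such samples suffice to recover x by Gaussian elimination over GF(2); the abstract's two-pass result says that any learner with memory o(n^{1.5}) instead needs 2^{Omega(sqrt{n})} samples.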

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Machine learning theory
• Theory of computation → Circuit complexity
##### Keywords
• branching program
• two-pass streaming
• PAC learning
• lower bounds

## References

1. David A. Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC¹. Journal of Computer and System Sciences, 38(1):150-164, 1989.
2. Paul Beame, Shayan Oveis Gharan, and Xin Yang. Time-space tradeoffs for learning finite functions from random evaluations, with applications to polynomials. In Conference on Learning Theory, pages 843-856, 2018.
3. Benny Chor and Oded Goldreich. Unbiased bits from sources of weak randomness and probabilistic communication complexity. SIAM Journal on Computing, 17(2):230-261, 1988.
4. Yuval Dagan and Ohad Shamir. Detecting correlations with little memory and communication. In Conference on Learning Theory, pages 1145-1198, 2018.
5. Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 990-1002. ACM, 2018.
6. Gillat Kol and Ran Raz. Interactive channel capacity. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pages 715-724. ACM, 2013.
7. Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1067-1080. ACM, 2017.
8. Dana Moshkovitz and Michal Moshkovitz. Mixing implies lower bounds for space bounded learning. In Conference on Learning Theory, pages 1516-1566, 2017.
9. Dana Moshkovitz and Michal Moshkovitz. Entropy samplers and strong generic lower bounds for space bounded learning. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
10. Michal Moshkovitz and Naftali Tishby. Mixing complexity and its applications to neural networks. arXiv preprint, 2017. URL: http://arxiv.org/abs/1703.00729.
11. Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 266-275. IEEE, 2016.
12. Ran Raz. A time-space lower bound for a large class of learning problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 732-742. IEEE, 2017.
13. Miklos Santha and Umesh V. Vazirani. Generating quasi-random sequences from slightly-random sources. In 25th Annual Symposium on Foundations of Computer Science (FOCS 1984), pages 434-440. IEEE, 1984.
14. Ohad Shamir. Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems, pages 163-171, 2014.
15. Jacob Steinhardt, Gregory Valiant, and Stefan Wager. Memory, communication, and statistical queries. In Conference on Learning Theory, pages 1490-1516, 2016.
16. Gregory Valiant and Paul Valiant. Information theoretically secure databases. arXiv preprint, 2016. URL: http://arxiv.org/abs/1605.02646.