Time-Space Lower Bounds for Two-Pass Learning

Authors Sumegha Garg, Ran Raz, Avishay Tal




File

LIPIcs.CCC.2019.22.pdf
  • Filesize: 0.7 MB
  • 39 pages


Author Details

Sumegha Garg
  • Department of Computer Science, Princeton University, USA
Ran Raz
  • Department of Computer Science, Princeton University, USA
Avishay Tal
  • Department of Computer Science, Stanford University, USA

Cite As

Sumegha Garg, Ran Raz, and Avishay Tal. Time-Space Lower Bounds for Two-Pass Learning. In 34th Computational Complexity Conference (CCC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 137, pp. 22:1-22:39, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.CCC.2019.22

Abstract

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz, 2016; Kol et al., 2017; Raz, 2017; Moshkovitz and Moshkovitz, 2018; Beame et al., 2018; Garg et al., 2018]. For example, any algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{2})$ or an exponential number of samples [Raz, 2016]. All these works modeled the learner as a one-pass branching program, allowing only one pass over the stream of samples. In this work, we prove the first memory-samples lower bounds (with a super-linear lower bound on the memory size and a super-polynomial lower bound on the number of samples) when the learner is allowed two passes over the stream of samples. For example, we prove that any two-pass algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{1.5})$ or at least $2^{\Omega(\sqrt{n})}$ samples. More generally, a matrix $M: A \times X \to \{-1,1\}$ corresponds to the following learning problem: An unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2), \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i, x)$. Assume that $k, l, r$ are such that any submatrix of $M$ with at least $2^{-k} \cdot |A|$ rows and at least $2^{-l} \cdot |X|$ columns has a bias of at most $2^{-r}$. We show that any two-pass learning algorithm for the learning problem corresponding to $M$ requires either a memory of size at least $\Omega(k \cdot \min\{k, \sqrt{l}\})$, or at least $2^{\Omega(\min\{k, \sqrt{l}, r\})}$ samples.
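
The learning problem in the abstract can be made concrete with a short Python sketch. It is illustrative only and not taken from the paper: the names `parity_matrix`, `sample_stream`, and `two_pass_bound_terms` are hypothetical, bit vectors are packed into integers, and the parity instantiation assumes that k, l, and r all grow linearly in n, which is consistent with the parity bounds quoted above. The last function only evaluates the expressions inside the Omega(...) terms with constants dropped, so its output indicates growth rates, not actual bounds.

```python
import math
import random


def parity_matrix(a: int, x: int) -> int:
    """M(a, x) = (-1)^(<a, x> mod 2), with bit vectors packed into ints."""
    return -1 if bin(a & x).count("1") % 2 else 1


def sample_stream(M, num_rows: int, x: int, num_samples: int, rng: random.Random):
    """Yield samples (a_i, b_i): a_i uniform over the rows, b_i = M(a_i, x)."""
    for _ in range(num_samples):
        a = rng.randrange(num_rows)
        yield a, M(a, x)


def two_pass_bound_terms(k: float, l: float, r: float):
    """Evaluate the expressions inside the two-pass bounds, constants dropped.

    Returns (k * min(k, sqrt(l)), min(k, sqrt(l), r)): the term inside the
    memory bound and the exponent inside the sample bound.
    """
    root_l = math.sqrt(l)
    memory_term = k * min(k, root_l)
    sample_exponent = min(k, root_l, r)
    return memory_term, sample_exponent


if __name__ == "__main__":
    n = 16
    rng = random.Random(0)
    x = rng.randrange(2 ** n)  # the unknown element, chosen uniformly
    for a, b in sample_stream(parity_matrix, 2 ** n, x, 3, rng):
        print(f"a = {a:0{n}b}, b = {b:+d}")
    # Assumption: k = l = r = n for parities (up to constant factors).
    print(two_pass_bound_terms(n, n, n))  # (64.0, 4.0)
```

With k = l = r = n, the memory term is n * sqrt(n) = n^{1.5} and the sample exponent is sqrt(n), matching the two-pass parity bounds stated in the abstract.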

Subject Classification

ACM Subject Classification
  • Theory of computation → Machine learning theory
  • Theory of computation → Circuit complexity
Keywords
  • branching program
  • time-space tradeoffs
  • two-pass streaming
  • PAC learning
  • lower bounds


References

  1. David A Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC^1. Journal of Computer and System Sciences, 38(1):150-164, 1989.
  2. Paul Beame, Shayan Oveis Gharan, and Xin Yang. Time-space tradeoffs for learning finite functions from random evaluations, with applications to polynomials. In Conference On Learning Theory, pages 843-856, 2018.
  3. Benny Chor and Oded Goldreich. Unbiased bits from sources of weak randomness and probabilistic communication complexity. SIAM Journal on Computing, 17(2):230-261, 1988.
  4. Yuval Dagan and Ohad Shamir. Detecting Correlations with Little Memory and Communication. In Conference On Learning Theory, pages 1145-1198, 2018.
  5. Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 990-1002. ACM, 2018.
  6. Gillat Kol and Ran Raz. Interactive channel capacity. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 715-724. ACM, 2013.
  7. Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1067-1080. ACM, 2017.
  8. Dana Moshkovitz and Michal Moshkovitz. Mixing implies lower bounds for space bounded learning. In Conference on Learning Theory, pages 1516-1566, 2017.
  9. Dana Moshkovitz and Michal Moshkovitz. Entropy samplers and strong generic lower bounds for space bounded learning. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
  10. Michal Moshkovitz and Naftali Tishby. Mixing complexity and its applications to neural networks. arXiv preprint, 2017. URL: http://arxiv.org/abs/1703.00729.
  11. Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 266-275. IEEE, 2016.
  12. Ran Raz. A time-space lower bound for a large class of learning problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 732-742. IEEE, 2017.
  13. Miklos Santha and Umesh V Vazirani. Generating quasi-random sequences from slightly-random sources. In Foundations of Computer Science, 1984. 25th Annual Symposium on, pages 434-440. IEEE, 1984.
  14. Ohad Shamir. Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems, pages 163-171, 2014.
  15. Jacob Steinhardt, Gregory Valiant, and Stefan Wager. Memory, communication, and statistical queries. In Conference on Learning Theory, pages 1490-1516, 2016.
  16. Gregory Valiant and Paul Valiant. Information theoretically secure databases. arXiv preprint, 2016. URL: http://arxiv.org/abs/1605.02646.