ACM Other Conferences

10.1145/acmotherconferences

0000000

10.5555/0000000

Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

ICALP 2021

10.4230/LIPIcs.ICALP.2021.30

10003752.10010070.10010071.10010073

Theory of computation~Boolean function learning

500

Learning Stochastic Decision Trees

Blanc

Guy

Stanford University, CA, USA

Author

Lange

Jane

MIT, Cambridge, MA, USA

Author

Tan

Li-Yang

Stanford University, CA, USA

Author

02 07 2021

30:1 30:16

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an η-corrupted set of uniform random samples labeled by a size-s stochastic decision tree, our algorithm runs in time n^{O(log(s/ε)/ε²)} and returns a hypothesis with error within an additive 2η + ε of the Bayes optimal. An additive 2η is the information-theoretic minimum.

Previously no non-trivial algorithm with a guarantee of O(η) + ε was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

Learning theory decision trees proper learning algorithms adversarial noise


<book-part-wrapper xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="2.0" xml:lang="en" content-type="research-article">

<collection-meta collection-type="book-series">

<collection-id collection-id-type="doi">10.1145/acmotherconferences</collection-id>

<title-group>

<title>ACM Other Conferences</title>

</title-group>

</collection-meta>

<book-meta>

<book-id book-id-type="acm-id">0000000</book-id>

<book-id book-id-type="doi">10.5555/0000000</book-id>

<book-title-group>

<book-title>Proceedings of the 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)</book-title>

<alt-title alt-title-type="acronym">ICALP 2021</alt-title>

</book-title-group>

</book-meta>

<book-part book-part-type="chapter" xml:lang="en">

<book-part-meta>

<book-part-id book-part-id-type="doi">10.4230/LIPIcs.ICALP.2021.30</book-part-id>

<book-part-id book-part-id-type="article-no">30</book-part-id>

<subj-group subj-group-type="ccs2012">

<compound-subject>

<compound-subject-part content-type="code">10003752.10010070.10010071.10010073</compound-subject-part>

<compound-subject-part content-type="text">Theory of computation~Boolean function learning</compound-subject-part>

<compound-subject-part content-type="weight">500</compound-subject-part>

</compound-subject>

</subj-group>

<title-group>

<title>Learning Stochastic Decision Trees</title>

</title-group>

<contrib-group>

<contrib>

<name>

<surname>Blanc</surname>

<given-names>Guy</given-names>

</name>

<aff>Stanford University, CA, USA</aff>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Lange</surname>

<given-names>Jane</given-names>

</name>

<aff>MIT, Cambridge, MA, USA</aff>

<role>Author</role>

</contrib>

<contrib>

<name>

<surname>Tan</surname>

<given-names>Li-Yang</given-names>

</name>

<aff>Stanford University, CA, USA</aff>

<role>Author</role>

</contrib>

</contrib-group>

<pub-date date-type="publication">

<day>02</day>

<month>07</month>

<year>2021</year>

</pub-date>

<fpage>30:1</fpage>

<lpage>30:16</lpage>

<abstract>

<p>We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an η-corrupted set of uniform random samples labeled by a size-s stochastic decision tree, our algorithm runs in time n^{O(log(s/ε)/ε²)} and returns a hypothesis with error within an additive 2η + ε of the Bayes optimal. An additive 2η is the information-theoretic minimum.</p>

<p>Previously no non-trivial algorithm with a guarantee of O(η) + ε was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.</p>

</abstract>

<kwd-group>

<kwd>Learning theory</kwd>

<kwd>decision trees</kwd>

<kwd>proper learning algorithms</kwd>

<kwd>adversarial noise</kwd>

</kwd-group>

</book-part-meta>

<back>

<ref-list specific-use="unparsed">

<ref>

<mixed-citation>Guy Blanc, Neha Gupta, Jane Lange, and Li-Yang Tan. Universal guarantees for decision tree induction via a higher-order splitting criterion. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Guy Blanc, Jane Lange, and Li-Yang Tan. Provable guarantees for decision tree induction: the agnostic setting. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. Available at URL: https://arxiv.org/abs/2006.00743.</mixed-citation>

</ref>

<ref>

<mixed-citation>Guy Blanc, Jane Lange, and Li-Yang Tan. Top-down induction of decision trees: rigorous guarantees and inherent limitations. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 1-44, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), pages 253-262, 1994.</mixed-citation>

</ref>

<ref>

<mixed-citation>

Avrim Blum. Rank-r decision trees are a subclass of r-decision lists. Information Processing Letters, 42(4):183-185, 1992.

<pub-id pub-id-type="doi" xlink:href="10.1016/0020-0190(92)90237-P">10.1016/0020-0190(92)90237-P</pub-id>

</mixed-citation>

</ref>

<ref>

<mixed-citation>Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. In Proceedings of the 33rd Annual Conference on Learning Theory (COLT), pages 902-915, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Nader Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 302-311, 1993.</mixed-citation>

</ref>

<ref>

<mixed-citation>Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255-275, 2002.</mixed-citation>

</ref>

<ref>

<mixed-citation>Sitan Chen and Ankur Moitra. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 869-880, 2019.</mixed-citation>

</ref>

<ref>

<mixed-citation>Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231-246, 1989.</mixed-citation>

</ref>

<ref>

<mixed-citation>Surbhi Goel, Aravind Gollakota, Zhihan Jin, Sushrut Karmalkar, and Adam Klivans. Superpolynomial lower bounds for learning one-layer neural networks using gradient descent. In Proceedings of the 37th International Conference on Machine Learning (ICML), volume 119, pages 3587-3596, 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Surbhi Goel, Aravind Gollakota, and Adam R. Klivans. Statistical-query lower bounds via functional gradients. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.</mixed-citation>

</ref>

<ref>

<mixed-citation>Surbhi Goel and Adam Klivans. Learning neural networks with two nonlinear layers in polynomial time. In Proceedings of the 32nd Conference on Learning Theory (COLT), volume 99, pages 1470-1499, 2019.</mixed-citation>

</ref>

<ref>

<mixed-citation>Surbhi Goel, Adam Klivans, and Raghu Meka. Learning one convolutional layer with overlapping patches. In Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80, pages 1783-1791, 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), pages 527-536, 2008.</mixed-citation>

</ref>

<ref>

<mixed-citation>Thomas Hancock. Learning kμ decision trees on the uniform distribution. In Proceedings of the 6th Annual Conference on Computational Learning Theory (COLT), pages 352-360, 1993.</mixed-citation>

</ref>

<ref>

<mixed-citation>Thomas Hancock, Tao Jiang, Ming Li, and John Tromp. Lower bounds on learning decision lists and trees. Information and Computation, 126(2):114-122, 1996.</mixed-citation>

</ref>

<ref>

<mixed-citation>David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, 1992.</mixed-citation>

</ref>

<ref>

<mixed-citation>Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: A spectral approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.</mixed-citation>

</ref>

<ref>

<mixed-citation>

Jeffrey C. Jackson and Rocco A. Servedio. On learning random DNF formulas under the uniform distribution. Theory of Computing, 2(8):147-172, 2006.

<pub-id pub-id-type="doi" xlink:href="10.4086/toc.2006.v002a008">10.4086/toc.2006.v002a008</pub-id>

</mixed-citation>

</ref>

<ref>

<mixed-citation>Adam Kalai, Adam Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777-1805, 2008.</mixed-citation>

</ref>

<ref>

<mixed-citation>Adam Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and smoothed analysis. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 395-404, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and System Sciences, 58(1):109-128, 1999.</mixed-citation>

</ref>

<ref>

<mixed-citation>Michael Kearns and Robert Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464-497, 1994.</mixed-citation>

</ref>

<ref>

<mixed-citation>Michael Kearns, Robert Schapire, and Linda Sellie. Toward efficient agnostic learning. Machine Learning, 17(2/3):115-141, 1994.</mixed-citation>

</ref>

<ref>

<mixed-citation>Adam Klivans and Rocco Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7(Apr):587-602, 2006.</mixed-citation>

</ref>

<ref>

<mixed-citation>Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331-1348, 1993.</mixed-citation>

</ref>

<ref>

<mixed-citation>Homin Lee. On the learnability of monotone functions. PhD thesis, Columbia University, 2009.</mixed-citation>

</ref>

<ref>

<mixed-citation>Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform and learnability. Journal of the ACM, 40(3):607-620, 1993.</mixed-citation>

</ref>

<ref>

<mixed-citation>Dinesh Mehta and Vijay Raghavan. Decision tree approximations of Boolean functions. Theoretical Computer Science, 270(1-2):609-623, 2002.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ryan O'Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827-844, 2007.</mixed-citation>

</ref>

<ref>

<mixed-citation>Ronald Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.</mixed-citation>

</ref>

</ref-list>

</back>

</book-part>

</book-part-wrapper>