Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Authors

Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan



File

LIPIcs.APPROX-RANDOM.2021.45.pdf
  • Filesize: 0.69 MB
  • 16 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.45

Author Details

Guy Blanc
  • Stanford University, CA, USA
Jane Lange
  • Massachusetts Institute of Technology, Cambridge, MA, USA
Mingda Qiao
  • Stanford University, CA, USA
Li-Yang Tan
  • Stanford University, CA, USA

Acknowledgements

We thank the anonymous reviewers, whose suggestions have helped improve this paper.

Cite As

Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision Tree Heuristics Can Fail, Even in the Smoothed Setting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.45

Abstract

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
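
To make the abstract's objects concrete, the sketch below illustrates the two ingredients it refers to: the generic greedy top-down heuristic the paper studies, which grows a tree by repeatedly splitting on the coordinate that maximizes impurity reduction, and sampling from a smoothed product distribution in the style of Brutzkus et al., where each coordinate's bias is 1/2 plus a small random perturbation. This Python sketch is an illustration only, not the paper's construction; the function names (gini, best_split, build_tree, smoothed_example) and the choice of Gini impurity as the splitting criterion are ours.

    import random

    def gini(labels):
        # Gini impurity of a list of 0/1 labels; 0.0 means the labels are pure.
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    def best_split(examples):
        # Pick the coordinate whose split most reduces Gini impurity,
        # mimicking the greedy criterion of CART (ID3/C4.5 use entropy instead).
        ys = [y for _, y in examples]
        n = len(examples[0][0])
        def gain(i):
            left = [y for x, y in examples if x[i] == 0]
            right = [y for x, y in examples if x[i] == 1]
            w = len(examples)
            return gini(ys) - (len(left) / w) * gini(left) - (len(right) / w) * gini(right)
        return max(range(n), key=gain)

    def build_tree(examples, depth):
        # Greedy top-down induction with a depth budget; leaves predict
        # the majority label of the examples reaching them.
        ys = [y for _, y in examples]
        majority = int(2 * sum(ys) >= len(ys))
        if depth == 0 or gini(ys) == 0.0:
            return majority
        i = best_split(examples)
        left = [(x, y) for x, y in examples if x[i] == 0]
        right = [(x, y) for x, y in examples if x[i] == 1]
        if not left or not right:
            return majority
        return (i, build_tree(left, depth - 1), build_tree(right, depth - 1))

    def smoothed_example(f, deltas):
        # One labeled example from a c-smoothed product distribution:
        # coordinate i is 1 with probability 1/2 + deltas[i], where each
        # deltas[i] was drawn uniformly from [-c, c] once, in advance.
        x = tuple(int(random.random() < 0.5 + d) for d in deltas)
        return x, f(x)

    # Usage: learn a 2-junta (XOR of the first two coordinates) from
    # samples of a 0.1-smoothed product distribution over {0,1}^8.
    n, c = 8, 0.1
    deltas = [random.uniform(-c, c) for _ in range(n)]
    target = lambda x: x[0] ^ x[1]
    data = [smoothed_example(target, deltas) for _ in range(2000)]
    tree = build_tree(data, depth=2)

In this notation, the paper's lower bound says there are depth-k decision tree targets for which build_tree, even when run on smoothed samples, achieves high accuracy only once its depth budget reaches 2^{Ω(k)}.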

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • decision trees
  • learning theory
  • smoothed analysis


References

  1. Guy Blanc, Neha Gupta, Jane Lange, and Li-Yang Tan. Universal guarantees for decision tree induction via a higher-order splitting criterion. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
  2. Guy Blanc, Jane Lange, and Li-Yang Tan. Provable guarantees for decision tree induction: the agnostic setting. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. URL: https://arxiv.org/abs/2006.00743.
  3. Guy Blanc, Jane Lange, and Li-Yang Tan. Top-down induction of decision trees: rigorous guarantees and inherent limitations. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 1-44, 2020.
  4. Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), pages 253-262, 1994.
  5. Avrim Blum. Rank-r decision trees are a subclass of r-decision lists. Information Processing Letters, 42(4):183-185, 1992. URL: https://doi.org/10.1016/0020-0190(92)90237-P.
  6. Leo Breiman, Jerome Friedman, Charles Stone, and Richard Olshen. Classification and Regression Trees. Wadsworth International Group, 1984.
  7. Alon Brutzkus, Amit Daniely, and Eran Malach. On the optimality of trees generated by ID3. arXiv preprint, abs/1907.05444, 2019.
  8. Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. In Proceedings of the 33rd Annual Conference on Learning Theory (COLT), pages 902-915, 2020.
  9. Nader Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 302-311, 1993.
  10. Sitan Chen and Ankur Moitra. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 869-880, 2019.
  11. Tom Dietterich, Michael Kearns, and Yishay Mansour. Applying the weak learning framework to understand and improve C4.5. In Proceedings of the 13th International Conference on Machine Learning (ICML), pages 96-104, 1996.
  12. Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231-246, 1989.
  13. Amos Fiat and Dmitry Pechyony. Decision trees: more theoretical justification for practical algorithms. In Proceedings of the 15th International Conference on Algorithmic Learning Theory (ALT), pages 156-170, 2004.
  14. Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), pages 527-536, 2008.
  15. Thomas Hancock. Learning kμ decision trees on the uniform distribution. In Proceedings of the 6th Annual Conference on Computational Learning Theory (COLT), pages 352-360, 1993.
  16. Thomas Hancock, Tao Jiang, Ming Li, and John Tromp. Lower bounds on learning decision lists and trees. Information and Computation, 126(2):114-122, 1996.
  17. Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: a spectral approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
  18. Jeffrey C. Jackson and Rocco A. Servedio. On learning random DNF formulas under the uniform distribution. Theory of Computing, 2(8):147-172, 2006. URL: https://doi.org/10.4086/toc.2006.v002a008.
  19. Adam Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and smoothed analysis. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 395-404, 2009.
  20. Michael Kearns. Boosting theory towards practice: recent developments in decision tree induction and the weak learning framework (invited talk). In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), pages 1337-1339, 1996.
  21. Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), pages 459-468, 1996.
  22. Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and System Sciences, 58(1):109-128, 1999.
  23. Adam Klivans and Rocco Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7(Apr):587-602, 2006.
  24. Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331-1348, 1993.
  25. Homin Lee. On the learnability of monotone functions. PhD thesis, Columbia University, 2009.
  26. Dinesh Mehta and Vijay Raghavan. Decision tree approximations of Boolean functions. Theoretical Computer Science, 270(1-2):609-623, 2002.
  27. Ryan O'Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827-844, 2007.
  28. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.
  29. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
  30. Ronald Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
  31. Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385-463, 2004.
  32. Ian Witten, Eibe Frank, Mark Hall, and Christopher Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
  33. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1):1-37, 2008.