Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Authors

Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan



File

LIPIcs.APPROX-RANDOM.2021.45.pdf
  • Filesize: 0.69 MB
  • 16 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.45

Author Details

Guy Blanc
  • Stanford University, CA, USA
Jane Lange
  • Massachusetts Institute of Technology, Cambridge, MA, USA
Mingda Qiao
  • Stanford University, CA, USA
Li-Yang Tan
  • Stanford University, CA, USA

Acknowledgements

We thank the anonymous reviewers, whose suggestions have helped improve this paper.

Cite As

Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision Tree Heuristics Can Fail, Even in the Smoothed Setting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2021.45

Abstract

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
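
To make the abstract's objects concrete, the sketch below illustrates the two ingredients it refers to: the generic greedy top-down heuristic the paper studies, which grows a tree by repeatedly splitting on the coordinate that maximizes impurity reduction, and sampling from a smoothed product distribution in the style of Brutzkus et al., where each coordinate's bias is 1/2 plus a small random perturbation. This Python sketch is an illustration only, not the paper's construction; the function names (gini, best_split, build_tree, smoothed_example) and the choice of Gini impurity as the splitting criterion are ours.

    import random

    def gini(labels):
        # Gini impurity of a list of 0/1 labels; 0.0 means the labels are pure.
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    def best_split(examples):
        # Pick the coordinate whose split most reduces Gini impurity,
        # mimicking the greedy criterion of CART (ID3/C4.5 use entropy instead).
        ys = [y for _, y in examples]
        n = len(examples[0][0])
        def gain(i):
            left = [y for x, y in examples if x[i] == 0]
            right = [y for x, y in examples if x[i] == 1]
            w = len(examples)
            return gini(ys) - (len(left) / w) * gini(left) - (len(right) / w) * gini(right)
        return max(range(n), key=gain)

    def build_tree(examples, depth):
        # Greedy top-down induction with a depth budget; leaves predict
        # the majority label of the examples reaching them.
        ys = [y for _, y in examples]
        majority = int(2 * sum(ys) >= len(ys))
        if depth == 0 or gini(ys) == 0.0:
            return majority
        i = best_split(examples)
        left = [(x, y) for x, y in examples if x[i] == 0]
        right = [(x, y) for x, y in examples if x[i] == 1]
        if not left or not right:
            return majority
        return (i, build_tree(left, depth - 1), build_tree(right, depth - 1))

    def smoothed_example(f, deltas):
        # One labeled example from a c-smoothed product distribution:
        # coordinate i is 1 with probability 1/2 + deltas[i], where each
        # deltas[i] was drawn uniformly from [-c, c] once, in advance.
        x = tuple(int(random.random() < 0.5 + d) for d in deltas)
        return x, f(x)

    # Usage: learn a 2-junta (XOR of the first two coordinates) from
    # samples of a 0.1-smoothed product distribution over {0,1}^8.
    n, c = 8, 0.1
    deltas = [random.uniform(-c, c) for _ in range(n)]
    target = lambda x: x[0] ^ x[1]
    data = [smoothed_example(target, deltas) for _ in range(2000)]
    tree = build_tree(data, depth=2)

In this notation, the paper's lower bound says there are depth-k decision tree targets for which build_tree, even when run on smoothed samples, achieves high accuracy only once its depth budget reaches 2^{Ω(k)}.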

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • decision trees
  • learning theory
  • smoothed analysis


References

  1. Guy Blanc, Neha Gupta, Jane Lange, and Li-Yang Tan. Universal guarantees for decision tree induction via a higher-order splitting criterion. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
  2. Guy Blanc, Jane Lange, and Li-Yang Tan. Provable guarantees for decision tree induction: the agnostic setting. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. URL: https://arxiv.org/abs/2006.00743.
  3. Guy Blanc, Jane Lange, and Li-Yang Tan. Top-down induction of decision trees: rigorous guarantees and inherent limitations. In Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 1-44, 2020.
  4. Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), pages 253-262, 1994.
  5. Avrim Blum. Rank-r decision trees are a subclass of r-decision lists. Information Processing Letters, 42(4):183-185, 1992. URL: https://doi.org/10.1016/0020-0190(92)90237-P.
  6. Leo Breiman, Jerome Friedman, Charles Stone, and Richard Olshen. Classification and Regression Trees. Wadsworth International Group, 1984.
  7. Alon Brutzkus, Amit Daniely, and Eran Malach. On the optimality of trees generated by ID3. arXiv preprint, abs/1907.05444, 2019.
  8. Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. In Proceedings of the 33rd Annual Conference on Learning Theory (COLT), pages 902-915, 2020.
  9. Nader Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 302-311, 1993.
  10. Sitan Chen and Ankur Moitra. Beyond the low-degree algorithm: mixtures of subcubes and their applications. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 869-880, 2019.
  11. Tom Dietterich, Michael Kearns, and Yishay Mansour. Applying the weak learning framework to understand and improve C4.5. In Proceedings of the 13th International Conference on Machine Learning (ICML), pages 96-104, 1996.
  12. Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231-246, 1989.
  13. Amos Fiat and Dmitry Pechyony. Decision trees: more theoretical justification for practical algorithms. In Proceedings of the 15th International Conference on Algorithmic Learning Theory (ALT), pages 156-170, 2004.
  14. Parikshit Gopalan, Adam Kalai, and Adam Klivans. Agnostically learning decision trees. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), pages 527-536, 2008.
  15. Thomas Hancock. Learning kμ decision trees on the uniform distribution. In Proceedings of the 6th Annual Conference on Computational Learning Theory (COLT), pages 352-360, 1993.
  16. Thomas Hancock, Tao Jiang, Ming Li, and John Tromp. Lower bounds on learning decision lists and trees. Information and Computation, 126(2):114-122, 1996.
  17. Elad Hazan, Adam Klivans, and Yang Yuan. Hyperparameter optimization: a spectral approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
  18. Jeffrey C. Jackson and Rocco A. Servedio. On learning random DNF formulas under the uniform distribution. Theory of Computing, 2(8):147-172, 2006. URL: https://doi.org/10.4086/toc.2006.v002a008.
  19. Adam Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and smoothed analysis. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 395-404, 2009.
  20. Michael Kearns. Boosting theory towards practice: recent developments in decision tree induction and the weak learning framework (invited talk). In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI), pages 1337-1339, 1996.
  21. Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC), pages 459-468, 1996.
  22. Michael Kearns and Yishay Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and System Sciences, 58(1):109-128, 1999.
  23. Adam Klivans and Rocco Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7(Apr):587-602, 2006.
  24. Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331-1348, 1993.
  25. Homin Lee. On the learnability of monotone functions. PhD thesis, Columbia University, 2009.
  26. Dinesh Mehta and Vijay Raghavan. Decision tree approximations of Boolean functions. Theoretical Computer Science, 270(1-2):609-623, 2002.
  27. Ryan O'Donnell and Rocco Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 37(3):827-844, 2007.
  28. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.
  29. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
  30. Ronald Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
  31. Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385-463, 2004.
  32. Ian Witten, Eibe Frank, Mark Hall, and Christopher Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
  33. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, et al. Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1):1-37, 2008.