License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following:
DOI: 10.4230/LIPIcs.CP.2021.50
URN: urn:nbn:de:0030-drops-153416
URL: https://drops.dagstuhl.de/opus/volltexte/2021/15341/


Shati, Pouya ; Cohen, Eldan ; McIlraith, Sheila

SAT-Based Approach for Learning Optimal Decision Trees with Non-Binary Features



Abstract

Decision trees are a popular classification model in machine learning due to their interpretability and performance. Traditionally, decision-tree classifiers are constructed using greedy heuristic algorithms; however, these algorithms do not provide guarantees on the quality of the resulting trees. In response, a recent line of work has studied the use of exact optimization approaches for constructing optimal decision trees. Most of the recent approaches that employ exact optimization are designed for datasets with binary features. While numeric and categorical features can be transformed into binary features, this transformation can introduce a large number of binary features and may not be efficient in practice. In this work, we present a novel SAT-based encoding for decision trees that supports non-binary features and demonstrate how it can be used to solve two well-studied variants of the optimal decision tree problem. We perform an extensive empirical analysis that shows our approach obtains superior performance and is often an order of magnitude faster than the current state-of-the-art exact techniques on non-binary datasets.
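
The binarization cost mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's encoding. It assumes one common binarization scheme, in which a numeric column gets one threshold feature per gap between consecutive distinct values and a categorical column gets one indicator feature per category. The function name `binarized_feature_count` and the toy columns are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Union

Value = Union[int, float, str]


def binarized_feature_count(column: Sequence[Value]) -> int:
    """Count the binary features needed to represent one column.

    Assumed scheme (illustrative, not taken from the paper):
    - numeric column: one "x <= t" feature per threshold t placed between
      consecutive distinct values, i.e. (number of distinct values - 1);
    - categorical column: one "x == c" feature per distinct category.
    """
    distinct = set(column)
    if all(isinstance(v, (int, float)) for v in distinct):
        return max(len(distinct) - 1, 0)
    return len(distinct)


# Hypothetical toy columns.
age = [23, 35, 35, 41, 58, 62]
colour = ["red", "green", "blue", "green", "red", "red"]

print(binarized_feature_count(age))     # 5 distinct values -> 4 thresholds
print(binarized_feature_count(colour))  # 3 categories -> 3 indicators
```

Under this assumed scheme, a numeric column with thousands of distinct values would yield thousands of binary features, which is the kind of growth the abstract refers to.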

BibTeX - Entry

@InProceedings{shati_et_al:LIPIcs.CP.2021.50,
  author =	{Shati, Pouya and Cohen, Eldan and McIlraith, Sheila},
  title =	{{SAT-Based Approach for Learning Optimal Decision Trees with Non-Binary Features}},
  booktitle =	{27th International Conference on Principles and Practice of Constraint Programming (CP 2021)},
  pages =	{50:1--50:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-211-2},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{210},
  editor =	{Michel, Laurent D.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/15341},
  URN =		{urn:nbn:de:0030-drops-153416},
  doi =		{10.4230/LIPIcs.CP.2021.50},
  annote =	{Keywords: Decision Tree, Classification, Numeric Data, Categorical Data, SAT, MaxSAT}
}

Keywords: Decision Tree, Classification, Numeric Data, Categorical Data, SAT, MaxSAT
Collection: 27th International Conference on Principles and Practice of Constraint Programming (CP 2021)
Issue Date: 2021
Date of publication: 15.10.2021

