LIPIcs.CP.2023.33.pdf
- Filesize: 1.54 MB
- 19 pages
Decision trees are a popular classification model in machine learning due to their interpretability and performance. However, the number of splits in a decision tree grows exponentially with its depth, which can incur a higher computational cost, increase data fragmentation, hinder interpretability, and restrict applicability to memory-constrained hardware. In contrast, binary decision diagrams (BDDs) use the same split across each level, leading to a linear number of splits in total. Recent work has considered optimal BDDs as compact and accurate classification models, but has focused only on binary datasets and has not explicitly optimized the compactness of the resulting diagrams. In this work, we present a SAT-based encoding for a multi-terminal variant of BDDs (MTBDDs) that incorporates a state-of-the-art direct encoding of numerical features. We then develop and evaluate different approaches to explicitly optimize the compactness of the diagrams. In one family of approaches, we first learn a tree-shaped BDD and model the size of the diagram the tree will be reduced to as a secondary objective, in a one-stage or two-stage optimization scheme. Alternatively, we directly learn diagrams that support multi-dimensional splits for improved expressiveness. Our experiments show that direct encoding of numerical features leads to better performance. Furthermore, we show that exact optimization of size leads to more compact solutions while maintaining higher accuracy. Finally, our experiments show that multi-dimensional splits are a viable approach to achieving higher expressiveness with a lower computational cost.
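To illustrate the tree-to-diagram reduction the abstract alludes to, here is a minimal sketch (not the paper's SAT encoding): a decision tree that tests the same feature on each level is reduced to a BDD-style DAG by hash-consing, i.e. merging structurally identical subtrees and dropping redundant tests whose two children coincide. Node representations and names here are illustrative assumptions.

```python
def mk(level, lo, hi, table):
    """Return a canonical node; skip redundant tests and merge duplicates."""
    if lo == hi:                  # both branches identical: the test is redundant
        return lo
    key = (level, lo, hi)
    if key not in table:          # hash-consing: keep one node per (level, lo, hi)
        table[key] = key
    return table[key]

def reduce_tree(tree, table):
    """Bottom-up reduction of a nested-tuple tree.

    Leaves are ('leaf', label); internal nodes are (level, lo, hi),
    where every node on the same level tests the same split.
    """
    if tree[0] == 'leaf':
        return tree
    level, lo, hi = tree
    return mk(level, reduce_tree(lo, table), reduce_tree(hi, table), table)

# A depth-2 tree whose two subtrees are identical copies:
table = {}
tree = (0, (1, ('leaf', 0), ('leaf', 1)),
           (1, ('leaf', 0), ('leaf', 1)))
root = reduce_tree(tree, table)   # the duplicate subtrees merge, and the
                                  # top-level test becomes redundant
```

After reduction the diagram contains a single internal node instead of the tree's three, which is the kind of size gap the secondary objective in the one-stage and two-stage schemes is meant to capture.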