The Hardness of Decision Tree Complexity

Loff, Bruno; Milovanov, Alexey

doi:10.4230/LIPIcs.STACS.2025.66

The Hardness of Decision Tree Complexity

Bruno Loff

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal Alexey Milovanov

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

Abstract

Let $f$ be a Boolean function given as either a truth table or a circuit. How difficult is it to find the decision tree complexity, also known as deterministic query complexity, of $f$ in both cases? We prove that this problem is $\mathrm{NC^{1}}$ -hard and PSPACE-hard, respectively. The second bound is tight, and the first bound is close to being tight.

Keywords and phrases:

Decision tree, Log-depth circuits

Copyright and License:

2012 ACM Subject Classification:

Theory of computation

Acknowledgements:

The authors would like to thank Wei Zhan for his answer at StackOverflow.

Funding:

This work was funded by the European Union (ERC, HOFGA, 101041696). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. It was also supported by FCT through the LASIGE Research Unit, ref. UIDB/00408/2025 and ref. UIDP/00408/2025, and by CMAFcIO, FCT Project UIDB/04561/2020, https://doi.org/10.54499/UIDB/04561/2020.

DOI:

10.4230/LIPIcs.STACS.2025.66

Event:

42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)

Editors:

Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

The decision tree is one of the most important computational models in theoretical computer science. Decision trees were invented in the 50s with the purpose of analyzing data. In this context, at each node in the tree we query some feature of the data, which partitions the data points depending on the value of this chosen feature. The resulting partition at the leaves should allow us to better understand the data. Starting in the 1980s, several learning algorithms were developed that would process data and produce a classifier. Meaning, we assume the existence of some function $f$ of which we know some sampled pairs $(x,f(x))$ (the data), and we wish to produce a decision tree that would be able to predict $f(x)$ , even on a previously-unseen input $x$ (some famous algorithms are CHAID, CART, ID3, and C4.5, see ([9, 4, 14, 15])). The goal here is to produce the smallest possible decision tree, while making the fewest possible mistakes. This task, of decision-tree synthesis, is used in applications (e.g. [19]).

But one can also consider a more algorithmic problem, which is of theoretical interest. Here, the function $f$ is completely known (i.e., we know $(x,f(x))$ for all $x$ ), and we wish to produce a decision tree which computes $f$ (e.g., without any mistakes). As far as we are aware it was in the 1970s (e.g. [20]) when people first studied, for specific functions $f$ , and for specific ways of querying the input $x$ , how small can be the depth of a decision tree that computes $f(x)$ when given $x$ as input. In the meantime, decision trees have become a ubiquitous computational model, useful in the study of various kinds of computation. To give a few examples, decision trees are relevant to the study of data structures (the cell-probe model [13, 11]), cryptographic reductions (in black-box reductions [17]), and communication complexity¹¹1In lifting theorems. Some examples of lifting theorems that use deterministic decision trees are: [16, 7, 6, 5, 12].. There is a meta-complexity problem underlying this algorithmic problem: how does one determine the decision-tree complexity of a given function $f$ ? This question has been studied before, usually for the learning problem [10], but also for the algorithmic problem [1, 18].

Let us restrict our attention to the simplest of all decision-tree models, where the known function $f:\{0,1\}^{n}\to\{0,1\}$ is a Boolean function, and the computational model is deterministic decision tree that must compute $f(x)$ by querying the bits of $x$ . Here, we can consider two scenarios, with respect to how $f$ is given:

(tt-DT): We are given $f$ as a truth table, meaning a binary string of length $N=2^{n}$ so that $f(x)$ appears at the $x$ -th position.
(circuit-DT): We are given $f$ as a Boolean circuit, which potentially allows for a more succinct encoding of $f$ .

The meta-complexity problem is: we are given $f$ as input, either as a truth-table (tt-DT) or as a circuit (circuit-DT) and we wish to find the deterministic query complexity complexity of $f$ , namely, the smallest depth of a deterministic decision tree that computes $f(x)$ by querying the bits of $x$ . How hard is this problem? In this paper, we give a satisfactory answer for both scenarios, essentially via the same technique.
It is not difficult to see that circuit-DT belongs to PSPACE: see Proposition 7 below. One could immediately conjecture that the problem is PSPACE-hard, but one soon comes across various difficulties in proving such a statement. (After the required definitions are in place, in Section 3.2, we discuss precisely where the difficulty lies.) Our first main result, Theorem 10, is showing that this problem is indeed PSPACE-hard.
It is also well-known, and appears as Proposition 6 below, that tt-DT belongs to P. Meaning, denoting the input length for tt-DT as $N=2^{n}$ , which is the length of the truth-table of a Boolean function $f:\{0,1\}^{n}\to\{0,1\}$ , Proposition 6 states that tt-DT can be solved in $\mathrm{poly}(N)$ time. In Proposition 9, we observe that tt-DT can also be computed in parallel, namely, by a Boolean circuit of depth $O(\log N\log\log N)$ and size $\mathrm{poly}(N)$ . This result is simple enough that it should be considered folklore: easy to prove for anyone who would care to do so. However, despite being a natural and fundamental problem, no matching lower-bound was known. Our second main result, Theorem 11, shows that the above bound is close to tight: tt-DT is $\mathrm{NC^{1}}$ -hard (under uniform $\mathrm{NC^{0}}$ reductions).

2 Definitions and upper bounds

Definition 1.

We let $\{0,1\}^{n}$ denote the set of all binary strings of length $n$ , sometimes called the Boolean cube. We let $\{0,1,\ast\}^{n}$ denote the set of ( $n$ -bit) partial assignments. The elements $\rho\in\{0,1,\ast\}^{n}$ are in bijection with the Boolean subcubes:

\{x\in\{0,1\}^{n}\mid\forall i\in[n]\quad\rho_{i}\neq\ast\implies x_{i}=\rho_{% i}\}.

We will denote this set also by $\rho$ , by abuse of notation.

We let $[i\leftarrow b]=\ast^{i-1}b\ast^{n-i-1}$ assign $b$ to the $i$ -th coordinate. Two partial assignment $\rho$ and $\rho^{\prime}$ are called compatible if $\rho_{i}\neq\ast$ and $\rho^{\prime}_{i}\neq\ast$ implies $\rho_{i}=\rho^{\prime}_{i}$ .

If $\rho,\rho^{\prime}\in\{0,1,\ast\}^{n}$ are compatible, then $\rho\cdot\rho^{\prime}$ is the partial assignment such that $(\rho\cdot\rho^{\prime})_{i}$ equals to $\rho_{i}$ if $\rho_{i}\neq\ast$ , equals $\rho^{\prime}_{i}$ if $\rho^{\prime}_{i}\neq\ast$ , and equals $\ast$ if $\rho_{i}=\rho^{\prime}_{i}=\ast$ .

If $|\rho^{-1}(\ast)|=\ell$ and $y\in\{0,1\}^{\ell}$ , we let $\rho(y)\in\{0,1\}^{n}$ be the binary string which equals $\rho$ where $\rho_{i}\neq\ast$ , and equals $y$ in the remaining coordinates (which are filled in order).

Given a Boolean function $f:\{0,1\}^{n}\to\{0,1\}$ and a partial assignment $\rho\in\{0,1,\ast\}^{n}$ with $\ell=|\rho^{-1}(\ast)|$ the number of $\ast$ , we let $f|_{\rho}:\{0,1\}^{\ell}\to\{0,1\}$ be given by

f|_{\rho}(y)=f(\rho(y)).

I.e. it is the restriction of $f$ to the Boolean subcube $\rho$ .

Definition 2.

A deterministic decision tree over $\{0,1\}^{n}$ is a rooted, labelled, ordered binary tree $T$ :

$\blacksquare$

Each non-leaf node $v$ is labelled by an index $i_{v}\in[n]$ .
$\blacksquare$

Each non-leaf node $v$ has two children $v_{0}$ and $v_{1}$ .

Associated with each node $v$ of $T$ is a partial assignment $\rho_{v}$

$\blacksquare$

The root is associated with $\ast^{n}$ .
$\blacksquare$

For a non-leaf node $v$ , and for both $b\in\{0,1\}$ , $\rho_{v_{b}}=\rho_{v}\cdot[x_{i_{v}}=b]$ .

Definition 3.

The computation of $T$ on input $x\in\{0,1\}^{n}$ is a path in $T$ , which begins at the root, and proceeds at each node $v$ by going to the child $v_{x_{i_{v}}}$ , until it reaches a leaf.

It is easy to see that $\rho_{v}$ is the set of inputs whose computation goes through $v$ . We have $(\rho_{v})_{i}=\ast$ if and only if the coordinate $x_{i}$ has not yet been queried in the computation between the root and node $v$ , i.e., at or before $v$ . If $x_{i}$ has been queried at or before $v$ , then $(\rho_{v})_{i}=x_{i}$ for every $x$ whose computation goes through $v$ .

Definition 4.

We say that $T$ computes a function $f:\{0,1\}^{n}\to\{0,1\}$ if, for every leaf $v$ of $T$ , $f$ is constant on $\rho_{v}$ . The deterministic query complexity of a function $f:\{0,1\}^{n}\to\{0,1\}$ , which will be denoted $\mathsf{D}(f)$ , is the smallest depth of any decision tree that computes $f$ .

Definition 5.

We let tt-DT be the computational problem where we are given the truth-table of a function $f:\{0,1\}^{n}\to\{0,1\}$ , and wish to compute $\mathsf{D}(f)$ . We let circuit-DT be the computational problem where we are given a Boolean circuit $C$ computing a function $f:\{0,1\}^{n}\to\{0,1\}$ , and we wish to compute $\mathsf{D}(f)$ .

The simplest observation that one can make is that tt-DT has a polynomial-time algorithm.

Proposition 6 ([8, 1]).

tt-DT belongs to P. More precisely, there is an algorithm that computes the DT-complexity of an $n$ -ary Boolean function in time $O(3^{n}\cdot n)=O(N^{1.585\ldots}\log N)$ , where $N=2^{n}$ .

This algorithm should be considered folklore, but we include it here because it is simple and insightful.

Proof.

The main, crucial observation is that the best decision tree for $f$ must first choose a coordinate $i$ , query $x_{i}$ , and then run the best decision tree for $f|_{[i\leftarrow x_{i}]}$ . In general, for any partial assignment $\rho\in\{0,1,\ast\}^{n}$ , $\mathsf{D}(f|_{\rho})=0$ if $f$ is constant on $\rho$ , and otherwise

\mathsf{D}(f|_{\rho})=\min_{i\in\rho^{-1}(\ast)}\{1+\max_{b\in\{0,1\}}\mathsf{% D}(f|_{\rho\cdot[i\leftarrow b]})\}.

This gives us a dynamic programming algorithm: knowing $\mathsf{D}(f|_{\rho})$ for all partial assignments $\rho\in\{0,1,\ast\}^{n}$ with $|\rho^{-1}(\ast)|=\ell$ free variables, we can use the above formula to compute $\mathsf{D}(f|_{\rho})$ for all $\rho\in\{0,1,\ast\}^{n}$ with $|\rho^{-1}(\ast)|=\ell+1$ free variables. Finally $f=f|_{\ast^{n}}$ , so we learned $\mathsf{D}(f)$ . There are $3^{n}$ partial assignments in total, and each computation $\mathsf{D}(f|_{\rho})$ takes time $O(n)$ in a random-access machine. $\hfill\blacktriangleleft$ Some more insight will come from the following game reformulation of the statement “ $\mathsf{D}(f)\leq k$ ”. Consider the following game between two players, Alice and Bob. The game lasts for $k$ steps. At every step, Alice chooses a variable $x_{i}$ , and Bob sets a Boolean value to the corresponding variable, either $x_{i}=0$ or $x_{i}=1$ . After $k$ steps, Alice wins $f|_{\rho}$ is constant on the partial assignment $\rho$ corresponding to Alice and Bob’s moves; otherwise, Bob wins. It follows that Alice has a winning strategy in this game if and only if $\mathsf{D}(f)\leq k$ . Indeed, if $\mathsf{D}(f)\leq k$ , then Alice can make moves according to the corresponding tree, and if $\mathsf{D}(f)>k$ , then Bob wins because this inequality means that for every $i$ , the decision tree complexity of $f|_{[i\leftarrow 1]}$ or $f|_{[i\leftarrow 0]}$ is at least $k$ . Bob’s strategy is then to repeatedly choose the value $b\in\{0,1\}$ that maximizes $\mathsf{D}(f|_{[i\leftarrow b]})$ . One can algorithmically find the winner in this game by a simple recursive algorithm. It is easy to see that, if a Boolean function $f:\{0,1\}^{n}\to\{0,1\}$ is given as a Boolean circuit $C$ , an algorithm can decide which of the two players has a winning strategy, using $\mathrm{poly}(n,|C|)$ memory. So, we get the following:

Proposition 7.

circuit-DT belongs to PSPACE.

One may now ask whether Proposition 6 can be at all improved. Indeed, the algorithm can be parallelized. First, a definition:

Definition 8.

For $i\in{\mathbb{N}}$ , we let $\mathrm{NC^{i}}$ denote the class of functions $f:\{0,1\}^{n}\to\{0,1\}^{m}$ computable by Boolean circuits with binary AND and OR gates, and unary NOT gates, in depth $O((\log n)^{i})$ and size $\mathrm{poly}(n,m)$ . We let $\widetilde{\mathrm{NC^{1}}}$ denote the class of functions $f:\{0,1\}^{n}\to\{0,1\}^{m}$ computable by such circuits in depth $O(\log n\cdot\log\log n)$ and size $\mathrm{poly}(n,m)$ .

We now claim the following.

Proposition 9.

tt-DT can be computed by a Boolean circuit of size $O(3^{n}\cdot\mathrm{poly}(n))=O(N^{1.583\dots}\mathop{\mathrm{polylog}}N)$ and depth $O(n\log n)=O(\log N\log\log N$ ). Hence, tt-DT is in $\widetilde{NC^{1}}$ .

Proof.

The equation (2) directly gives us a circuit that uses $O(3^{n})$ min, max, increment, and all-equal gates: for each partial assignment $\rho$ we check if $f|_{\rho}$ is constant using an all-equal gate, otherwise we compute the formula given by (2). This circuit has depth $O(n)$ using such gates. Implementing such gates using Boolean gates will result in a circuit of depth $O(n\log n)$ . $\hfill\blacktriangleleft$

3 Lower bounds

Our two main theorems are the following.

Theorem 10.

circuit-DT is PSPACE-hard under polynomial-time reductions.

Theorem 11.

tt-DT is $\mathrm{NC^{1}}$ -hard under $\mathrm{NC^{0}}$ -reduction.

The first theorem is well understood, but the second requires some clarification regarding reductions and uniformity. Recall the definition of DLOGTIME-uniform cicuits from [3].

Definition 12.

Let ${\mathcal{C}}=\{C_{n}\mid n\in{\mathbb{N}}\}$ be a family of Boolean circuits, so that $C_{n}$ computes a function $f:\{0,1\}^{n}\to\{0,1\}^{m(n)}$ .

$\blacksquare$

The “direct connection language” of ${\mathcal{C}}$ is the set of tuples $\langle 0^{n},t,a,b\rangle$ , where $a\in[|C_{n}|]$ and $b\in\{0\}\cup[|C_{n}|]$ are gate numbers in $C_{n}$ , $t$ is the type of gate $a$ (AND, OR, NOT, or input gate $x_{i}$ ), and gate $b$ is a child of gate $a$ (and equals $0$ if $a$ is a leaf).
$\blacksquare$

The circuit family ${\mathcal{C}}$ is DLOGTIME-uniform if its direct connection language can be recognized in $O(\log(n))$ time by a deterministic multi-tape Turing machine with an index tape for random access to the input.

We say that language $A\subseteq\{0,1\}^{\ast}$ is $\mathrm{NC^{0}}$ -reduced to language $B\subseteq\{0,1\}^{\ast}$ , which we write $A\leq_{\mathrm{NC^{0}}}B$ , if there is a DLOGTIME-uniform family of $\mathrm{NC^{0}}$ -circuits ${\mathcal{C}}=\{C_{n}\}$ such that, for every $x\in\{0,1\}^{n}$ , $x\in A$ iff $C_{n}(x)\in B$ .

It is not difficult to see that this type of reduction satisfies the natural properties like transitivity and closure: If $A\leq_{\mathrm{NC^{0}}}B$ and $B\in\mathrm{NC^{1}}$ then $A\in\mathrm{NC^{1}}$ (this holds for both uniform and non-uniform variants of $\mathrm{NC^{1}}$ ).

3.1 TQBF

The proof of theorems 10 and 11 are similar: we prove that TQBF reduces to DT. Recall that TQBF (True Quantified Boolean Formula) is the problem of determining the truth of a formula

\exists y_{1}\forall x_{1}\exists y_{2}\forall x_{2}\ldots\exists y_{n}\forall x% _{n}h(y_{1},x_{1},y_{2},x_{2},\ldots y_{n},x_{n}),

where $h$ is some Boolean function. As we did for decision-tree complexity, let us consider two variants of the TQBF problem:

tt-TQBF: We are given as input a Boolean function $h:\{0,1\}^{2n}\to\{0,1\}$ as a truth table, and wish to know whether (3.1) holds.
circuit-TQBF: We are given $h$ as a circuit, and wish to know whether (3.1) holds.

It is well-known that circuit-TQBF is PSPACE-complete. It turns out that tt-TQBF is $\mathrm{NC^{1}}$ -complete:

Theorem 13.

tt-TQBF is $\mathrm{NC^{1}}$ -complete under $\leq_{\mathrm{NC^{0}}}$ reductions.

To prove Theorem 13 we use the following result of Barrington.

Theorem 14 ([2, 3]).

The $S_{5}$ identity problem, $S_{5}\mathsf{IP}$ , is the problem of deciding if the product of given permutations from $S_{5}$ is equal to the identity. Then $S_{5}\mathsf{IP}$ is $\mathrm{NC^{1}}$ -complete under $\leq_{\mathrm{NC^{0}}}$ reductions.

This theorem was proved in [2], and in [3] the authors verified that the reasoning proves the desired statement with DLOGTIME-uniform $\mathrm{NC^{0}}$ reductions.

Proof of Theorem 13.

It is easy to see that tt-TQBF is in $\mathrm{NC^{1}}$ : the formula (3.1) is a Boolean formula of depth $n$ whose leaves are entries in the truth-table of $h$ . To prove that tt-TQBF is $\mathrm{NC^{1}}$ -hard, the idea is to consider $S_{5}\mathsf{IP}$ as a game that can be interpreted as a TQBF formula.

Consider the input of $S_{5}\mathsf{IP}$ : permutations $\pi_{1},\ldots,\pi_{N}$ , with $N=2^{n}$ . Imagine two players Alice and Bob; Alice wants to prove that the product is equal to the identity permutation and Bob does not trust her.

They play in the following game. Bob asks: what is the product of the first half of the permutations, i.e. $\pi_{1}\cdot\ldots\cdot\pi_{\lfloor\frac{N}{2}\rfloor}$ . Alice states that this product is some permutation $\sigma_{1}$ . Additionally, since Alice is trying to prove to Bob that all the product of all permutations is equal to the identity, she is also implicitly stating that the product $\pi_{\lfloor\frac{N}{2}\rfloor+1}\cdot\ldots\cdot\pi_{N}$ is equal to $\sigma_{1}^{-1}$ . Bob does not trust Alice, and so he chooses $b_{1}\in\{0,1\}$ to mean that he believes one of Alice’s statements to be false. I.e. he chooses $b_{1}=0$ if he believes that the first part is actually not $\sigma_{1}$ , and he sets $b_{1}=1$ if he believes that the second part is not $\sigma_{1}^{-1}$ . After this, a similar procedure repeats: Alice states that the value of the product of the first half of Bob’s chosen part is $\sigma_{2}$ (this is one quarter of all permutations), which implies that the second half of Bob’s chosen part is $\sigma_{2}^{-1}\sigma_{1}^{(-1)^{b_{1}}}$ . Then Bob chooses one of these two quarters $b_{2}\in\{0,1\}$ where he believes Alice’s statement is false, and so on.

This game produces a sequence $\sigma_{1},b_{1},\ldots,\sigma_{n},b_{n}$ . At the end of the game, the winner is identified by whether $\alpha=\pi_{i}$ or not, where $\alpha$ and $i$ are inferred from the sequence. This game can be interpreted as a TQBF: each statement $\sigma_{i}$ of Alice can be encoded as $7$ bits (since $5!<2^{7}$ ), and each choice $b_{i}$ of Bob can be encoded as $1$ bit. Every value of the truth table of $h$ for tt-TQBF can be constructed from Alice and Bob’s moves and the corresponding value of some $\pi_{i}$ .

Indeed, $h(\sigma_{1},b_{1},\ldots,\sigma_{n},b_{n})=1$ if and only if $\alpha=\pi_{i}$ for the appropriate $\alpha\in S_{5}$ and $i\in[n]$ . Hence, this reduction is in $\mathrm{NC^{0}}$ , since every value in the truth-table of $h$ (i.e. the winner in the corresponding game) depends only on one permutation $\pi_{i}$ of the input, which is encoded using a constant number of bits.

We further claim that the corresponding $\mathrm{NC^{0}}$ reduction can be made DLOGTIME uniform. By logarithmic time we mean $O(\log N)=O(n)$ . We first describe an algorithm which, when given $\sigma_{1},b_{1},\ldots\sigma_{n},b_{n}$ (i.e. the moves in Alice-Bob game), outputs the index $i$ and the permutation $\alpha$ required to compute the bit $h(\sigma_{1},b_{1},\ldots\sigma_{n},b_{n})=[\alpha=\pi_{i}?]$ of $h$ ’s truth table.

The algorithm looks at every pair of moves $(\sigma_{1},b_{1}),\ldots(\sigma_{n},b_{n})$ one time and maintains in memory a permutation $\alpha$ , an index $k\in[n]$ , and two indexes $1\leq s\leq t\leq N$ . These values in memory have the following meaning: after round $k$ , Alice has stated that $\pi_{s}\ldots\pi_{t}=\alpha$ . Initially $\alpha$ is the identity, $k=0$ , $i=1$ , and $j=N$ .

The move of Alice in the $k$ -th round is some permutation $\sigma_{k}\in S_{5}$ , and Alice states that $\pi_{s}\ldots\pi_{m}=\sigma_{k}$ , where $m=\frac{s+t}{2}$ . The move of Bob (some bit $b_{k}$ ) is a choice “left” or “right”. If Bob’s answer is “left”, then we set $s:=s,t:=m$ , $\alpha:=\sigma$ and if Bob’s choice is “right” then we set $s:=m+1$ , $t:=t$ , and $\alpha:=\sigma_{k}^{-1}\cdot\alpha$ . Each such calculation can be done in constant time, so the entire computation can be done in linear time.

The above circuit is very simple, and the above algorithm shows how to compute the connections between the various gates. Since this algorithms runs in $O(\log N)$ time, this circuit is DLOGTIME-uniform. $\hfill\blacktriangleleft$

3.2 The crucial difficulty: TQBF vs DT

We have now laid out enough definitions that we can discuss the crucial difficulty in proving our main result (Theorems 10 and 11). We would like to prove that circuit-DT is PSPACE-hard, and we know that circuit-TQBF is PSPACE-hard. We would like to prove that tt-DT is $\mathrm{NC^{1}}$ -hard, and we know that tt-TQBF is $\mathrm{NC^{1}}$ -hard. The fundamental difference between the two problems can be understood by looking them as a game.

In TQBF, we have two disjoint sets of variables: Alice sets the $y_{i}$ variables and Bob sets the $x_{i}$ variables. In DT, we have a single set of variables: Alice chooses a variable, and Bob sets the variable. Now, what we would like to do, is to simulate the TQBF game using the DT game. The difficulty is that it is not at all obvious how such a simulation should proceed.

To simulate a TQBF game over variables $y_{1},x_{1},\ldots,y_{n},x_{n}$ , we use a DT game over variables $y_{1},y_{1}^{\prime},x_{1},x_{1}^{\prime},\ldots,y_{n},y_{n}^{\prime},x_{n},x_% {n}^{\prime}$ , and some other auxiliary variables. The hope is that the DT instance works in such a way that Alice must play by choosing first $y_{i}$ and then $y_{i}^{\prime}$ , or first $y_{i}^{\prime}$ and then $y_{i}$ . Whichever order she chooses, Bob must play by setting the first chosen $y_{i}$ or $y^{\prime}_{i}$ variable to $1$ , and by setting the second chosen to $0$ . Then Alice must play by choosing one of the two $x_{i}$ or $x_{i}^{\prime}$ variables, and Bob can set it anyway he wants. This way we simulate the TQBF game using the DT game. The difficulty is now in ensuring that Alice and Bob are indeed forced to play by the above “standard” strategies. To enforce this, additional gadgets will be put in place, so that any deviation from the above standard strategies will cause the deviating player to loose the DT game.

So let us begin.

3.3 First auxiliary function

In the reduction we will need an example, due to Wei Zhan, of a Boolean function $W$ such that for some variable $p$ it holds that $\mathsf{D}(W)=\mathsf{D}(W|_{p=1})\gg\mathsf{D}(W|_{p=0})$ . This function will be a kind of product of the following function $w:\{0,1\}^{4}\to\{0,1\}$ :

w(p,a_{0},a_{1},r)=\begin{cases}0&\text{if $p=1$, and $a_{0}=a_{1}$,}\\ a_{r}&\text{if $p=0$, or $a_{0}\neq a_{1}$.}\end{cases}

Lemma 15 ([21]).

The function $w$ is such that

(i)

$\mathsf{D}(w)=3$ ,
(ii)

$\mathsf{D}(w|_{p=1})=3$ and
(iii)

$\mathsf{D}(w|_{p=0})=2$ .

Proof of Lemma 15.

(i)

To determine $w$ one can ask the values of $a_{0}$ and $a_{1}$ . If $a_{0}\not=a_{1}$ then it is enough to know $r$ to determine the value of the function. If $a_{0}=a_{1}$ then it is enough to know $p$ (if $p=0$ then $w=a_{0}=a_{1}$ , if $p=1$ then $w=0$ ). The proof of the lower bound follows from the second item.
(ii)

The upper-bound follows from (i). The lower-bound is proven by brute force. We have

$w(1,a_{0},a_{1},r)=\begin{cases}0&\text{if $a_{0}=a_{1}$,}\\ a_{r}&\text{if $a_{0}\neq a_{1}$.}\end{cases}$

If we choose to query $a_{0}$ and $a_{1}$ first, and they differ, we still need to query $r$ . If we choose to query $a_{i}$ and $r$ first, and $r=1-i$ , we still need to query $a_{1-i}$ .
(iii)

To determine $w(0,a_{0},a_{1},r)$ we may ask $r$ then $a_{r}$ . It is easy to see that one question is not enough.

$\hfill\blacktriangleleft$

Denote by $W_{k}:\{0,1\}^{1+3k}$ the Boolean function given by:

W_{k}(p,a_{0}^{1},a_{1}^{1},r^{1},\ldots,a_{0}^{k},a_{1}^{k},r^{k})=w(p,a_{0}^% {1},a_{1}^{1},r^{1})\oplus\ldots\oplus w(p,a_{0}^{k},a_{1}^{k},r^{k}).

Lemma 16.

The function $W_{k}$ has the following properties:

1.

$\mathsf{D}(W_{k})=3k$ ; Moreover, there is a strategy for the second player (who sets the values) such that if the first player (Alice) chooses variable $p$ then she looses.
2.

$\mathsf{D}(W_{k}|_{p=1})=3k$ ;
3.

$\mathsf{D}(W_{k}|_{p=0})=2k$ .

Proof of Lemma 16.

It is easy to see that if functions $f$ and $g$ have disjoint variables then $\mathsf{D}(f\oplus g)=\mathsf{D}(f)+\mathsf{D}(g)$ . By these reasons the second and the third items are direct corollaries of the same items in Lemma 15. To prove the first item just consider the following obvious inequalities:

3k=\mathsf{D}(W_{k}|_{p=1})\leq\mathsf{D}(W_{k})\leq k\cdot\mathsf{D}(w)=3k.\

$\hfill\blacktriangleleft$

3.4 Second auxiliary function

Another tool in the reduction is the following function:

F_{n}(y_{1},y_{1}^{\prime},x_{1},x_{1}^{\prime},\ldots y_{n},y_{n}^{\prime},x_% {n},x_{n}^{\prime}):=f_{1}\oplus g_{1}\oplus\ldots\oplus f_{n}\oplus g_{n},% \text{ where:}

$\blacksquare$

every $f_{i}$ is defined as $f_{i}(y_{i},y_{i}^{\prime})=y_{i}\land y_{i}^{\prime}$ ;
$\blacksquare$

every $g_{i}$ is defined as

$g_{i}(y_{1},y_{1}^{\prime},x_{1},x_{1}^{\prime},\ldots y_{i},y_{i}^{\prime},x_% {i},x_{i}^{\prime})=\left\{\begin{array}[]{ll}x_{i}&\textrm{if $f_{1}\oplus g_% {1}\oplus\ldots\oplus g_{i-1}\oplus f_{i}=1$ }\\ x_{i}^{\prime}&\textrm{otherwise.}\end{array}\right.$

We claim that $\mathsf{D}(F_{n})=3n$ . Moreover, we want to claim some properties of the corresponding DT-game between Alice (who chooses variables) and Bob (who sets values). We would like to say that Alice and Bob must follow certain standard strategies, or else they will loose the DT game. Standard strategies are as follows:

Standard strategies for Alice.

In a standard strategy for Alice, she spends $2n$ questions to ask about the variables $y_{i}$ and $y_{i}^{\prime}$ for all $i$ . She can ask these questions in an arbitrary order.

She also spends $n$ questions to ask about exactly one of the two variables $x_{i}$ and $x_{i}^{\prime}$ , for each $i$ . Here the order is crucial. The correct variable to choose depends on the value of $f_{1}\oplus g_{1}\oplus\dots\oplus g_{i-1}\oplus f_{i}$ : she chooses $x_{i}$ if this is $1$ , and $x_{i}^{\prime}$ if this is $0$ , as in the definition of $g_{i}$ above. So, before asking about $x_{i}$ or $x_{i}^{\prime}$ , Alice must first ask about $y_{1},y_{1}^{\prime},\ldots,y_{i-1},y_{i-1}^{\prime}$ and about the appropriate variable in every couple $(x_{j},x_{j}^{\prime})$ for all $j<i$ . This defines $f_{j}$ , for all $j\leq i$ , and $g_{j}$ , for all $j<i$ .

Standard strategies for Bob.

A standard strategy for Bob is any strategy such that, the first time Alice asks about one of the two variables $y_{i}$ or $y_{i}^{\prime}$ , Bob answers $1$ .
We now formulate the following lemma about standard strategies.

Lemma 17.

It holds that $\mathsf{D}(F_{n})=3n$ and, furthermore,

1.

If Alice plays according to a standard strategy, she wins the DT game within $3n$ rounds.
2.

If Alice does not play according to a standard strategy then Bob can play in such a way that Alice does not win within $3n$ rounds.
3.

If, while Alice is playing according to a standard strategy, Bob does not answer according to one of his standard strategies, then Alice can win the DT game in strictly fewer than $3n$ rounds.

This auxiliary function is one of the central pieces in the reduction. It will allow us to replace the TQBF game, where Alice and Bob choose values for some variables, by a DT game, where Alice chooses some variables and Bob sets their value. For the reduction to go through, however, we need to put the pieces together in just the right way.

Proof of Lemma 17.

The first observation follows from Items 1 and 2. Item 1 follows because, in a standard strategy, Alice has asked about all variables in the functions $f_{i}$ , and because she asked about the relevant variable among $x_{i}$ and $x_{i}^{\prime}$ , she also learned all the $g_{i}$ .

Now we prove Item 2. Assume that Alice does not play according to a standard strategy. It means that either (a) Alice does not ask about some $y_{i}$ or $y_{i}^{\prime}$ , or (b) she did not ask about one of the $x_{i}$ or $x_{i}^{\prime}$ , or (c. $i$ ) she asks about $x_{i}$ or $x_{i}^{\prime}$ before seeing all the variables of $f_{1}\oplus g_{1}\oplus\ldots\oplus g_{i-1}\oplus f_{i}$ , or (d) for some $i$ , she saw all the variables of $f_{1}\oplus g_{1}\oplus\ldots\oplus g_{i-1}\oplus f_{i}$ , but asked about the wrong $x_{i}$ or $x_{i}^{\prime}$ .

In case (a), Alice did not ask about some $y_{i}$ or $y_{i}^{\prime}$ , so that the value of $f_{i}$ is not defined and independent of the remaining values, and hence the value of $F_{n}$ is also not defined, so if Bob plays according to any standard strategy, they end in a non-monochromatic subcube, and Alice looses. Likewise, in cases (b) and (d), $g_{i}$ is undefined and independent of the remaining values, and hence $F_{n}$ is also undefined, and Alice also looses.

Now suppose we are in case (c. $i$ ), but not (a), (b), or (d), or (c. $j$ ), for any $j<i$ . Then, Alice has asked about one of the variables $x_{i}$ or $x_{i}^{\prime}$ , but she did so before asking every variable of $f_{1}\oplus g_{1}\oplus\ldots\oplus g_{i-1}\oplus f_{i}$ . We then claim that Bob can answer Alice’s questions in such a way that Alice will need to ask at least $3n+1$ questions in total to know the value of $F$ . Indeed, Alice has asked about $x_{i}$ or $x_{i}^{\prime}$ before $f_{1}\oplus g_{1}\oplus\ldots\oplus g_{i-1}\oplus f_{i}$ became defined. So after choosing some value for the variable $x_{i}$ or $x_{i}^{\prime}$ , Bob can still answer Alice’s questions by a standard strategy such that $g_{i}$ equals the variable $x_{i}^{\prime}$ or $x_{i}$ which Alice did not choose. The remaining questions Bob answers according to an arbitrary standard strategy. Since Alice wasted (at least) one question by asking an irrelevant variable, it follows that she needs $3n$ other questions to fix the value of $F_{n}$ .

Finally, to prove Item 3, suppose that the first time Alice asks one of the two variables $y_{i}$ or $y_{i}^{\prime}$ , Bob answers $0$ . Then the function $f_{i}=y_{i}\wedge y_{i}^{\prime}$ becomes fixed and equal to $0$ , so Alice can fix the value of $F$ without ever asking about the other variable. And so Alice wins within $3n-1$ moves. $\hfill\blacktriangleleft$

3.5 The reduction

Let us here sketch of proofs of theorems 10 and 11. We reduce the TQBF instance

\exists y_{1}\forall x_{1}\ldots\forall x_{n}h(y_{1},x_{1},\ldots y_{n},x_{n})

(1)

to computing the query complexity $\mathsf{D}(D)$ of the function:

D=\left\{\begin{array}[]{ll}q,&\textrm{if $G_{n}\lor p=0$ }\\ W_{10n}\oplus F_{n},&\textrm{otherwise.}\end{array}\right.

	$\displaystyle\text{where }D$	$\displaystyle=D(x_{1},x_{1}^{\prime},y_{1},y_{1}^{\prime},\ldots,x_{n},x_{n}^{% \prime},y_{n},y_{n}^{\prime},p,q,a_{0}^{1},a_{1}^{1},r^{1},\ldots,a_{0}^{10n},% a_{1}^{10n},r^{10n})$
is a Boolean function on $4n+2+30n$ variables.
	$\displaystyle W_{10n}$	$\displaystyle=W_{10n}(p,a_{0}^{1},a_{1}^{1},r^{1},\ldots,a_{0}^{10n},a_{1}^{10% n},r^{10n})\text{ was defined in \lx@cref{creftypecap~refnum}{sec:Wn},}$
	$\displaystyle F_{n}$	$\displaystyle=F_{n}(y_{1},y_{1}^{\prime},x_{1},x_{1}^{\prime},\ldots y_{n},y_{% n}^{\prime},x_{n},x_{n}^{\prime})\text{ was defined in \lx@cref{% creftypecap~refnum}{sec:Fn}. And}$
	$\displaystyle G_{n}$	$\displaystyle=G_{n}(y_{1},y_{1}^{\prime},x_{1},x_{1}^{\prime},\ldots y_{n},y_{% n}^{\prime},x_{n},x_{n}^{\prime})$
is a (previously undefined) Boolean function on $2n$ variables, given by:
	$\displaystyle G_{n}$	$\displaystyle=h(y_{1},x_{1},\ldots,y_{n},x_{n})\lor\bigvee_{i=1}^{n}(y_{i}% \land y_{i}^{\prime})\lor\bigvee_{i=1}^{n}x_{i}\oplus x_{i}^{\prime}$

In the next subsection we prove that this reduction is correct. But let us sketch here how the above pieces ( $p$ , $q$ , $W_{10n}$ , $F_{n}$ and $G_{n}$ ) fit together:

$\blacksquare$

If Equation 1 holds, Alice can query the variables of $F_{n}$ using a standard strategy for $F_{n}$ : if she wants to set $y_{i}$ to $1$ , she asks about $y_{i}$ and then $y_{i}^{\prime}$ ; to set $y_{i}$ to $0$ , she asks first about $y_{i}^{\prime}$ and then $y_{i}$ . If Bob plays according to a standard strategy, he will be forced to set the variables $y_{i}$ as Alice wants, and ultimately $h$ will be $1$ , so $G$ will be $1$ , and then Alice and Bob must play the game for $W_{10n}$ . So Alice wins with $3n+30n$ queries in total.
$\blacksquare$

If Equation 1 does not hold, and Alice plays according to a standard strategy on the $3n$ variables of $F_{n}$ , then Bob can force $h$ and hence $G_{n}$ to be $0$ . Now Alice is in trouble: if she doesn’t query $p$ , she won’t know whether $q$ or $W_{10n}$ is relevant, so the function won’t be fixed. But as soon as she queries $p$ , Bob can answer $p=1$ , and now she must query further $30n$ variables to learn $W_{10n}$ . That’s $3n+1+30n$ queries in total.
$\blacksquare$

If at any point one of the players does not use a standard strategy, then the other player will have a way of winning .

Completion of the proofs of Theorems 10 and 11.

First we note that this reduction can be computed in polynomial time if $h$ is given as a circuit and by a $\mathrm{NC^{0}}$ -circuit if $h$ is given a truth-table: every value of $D$ depends only on one value of $h$ . To see DLOGTIME-uniformity one can check that, given values for the input variables of $D$ , one can calculate the ouput, as a function of $h$ , in time $O(n)$ . We simply calculate every value in the above expression for $D$ in time $O(n)$ , and as a result, the output will be either $1$ , $0$ , $h_{n}(\ldots)$ or $\neg h_{n}(\ldots)$ . So the truth table of $D$ can be computed from the truth table of $h$ by a DLOGTIME-uniform $\mathrm{NC^{0}}$ -reduction.

We now claim that (1) is true if and only if $\mathsf{D}(D)\leq 33n$ .

When (1) holds

First we prove if (1) is true, i.e., Alice has a winning strategy in the TQBF-game, then $\mathsf{D}(D)\leq 33n$ , i.e. Alice has a winning strategy in the DT-game, which allows her to win within $33n$ steps.

In the TQBF-game Alice first chooses $y_{1}$ , then $y_{2}$ as a function of $y_{1}$ and (Bob’s choice for) $x_{1}$ then $y_{3}$ as a function from $y_{1},x_{1},y_{2},x_{2}$ , and so on. We wish to translate such a strategy for the TQBF-game, into a strategy for the DT-game. The translation is the following.

Alice begins by fixing the value of $F_{n}$ to a constant. She will do so, by using the following standard DT-strategy (standard as in the proof of Lemma 17). If the TQBF-strategy of Alice sets $y_{1}=1$ , then the standard DT-strategy of Alice first asks about the variable $y_{1}$ , and then asks about the variable $y_{1}^{\prime}$ . If the TQBF-strategy of Alice sets $y_{1}=0$ , the standard DT-strategy of Alice swaps the order: first asking about $y_{1}^{\prime}$ and then about $y_{1}$ . The standard DT-strategy of Alice then asks about the appropriate relevant variable $x_{1}$ or $x_{1}^{\prime}$ (according to her standard strategy). She considers the value of the chosen variable ( $x_{1}$ or $x_{1}^{\prime}$ ) to be the value Bob has chosen for $x_{1}$ in TQBF-game. Then, the DT-strategy of Alice asks about $y_{2}$ and $y_{2}^{\prime}$ in some order, that again depends on the corresponding value for $y_{2}$ in the TQBF-strategy, and so on.

Now, either Bob follows a standard strategy (as in Lemma 17), or not. If he does, then (because the TQBF-strategy is a winning strategy for Alice) the value of $h$ is equal to $1$ and hence $G_{n}$ is also equal to $1$ and therefore $D$ is equal to $F_{n}\oplus W_{10n}$ . Note, that the value of $F_{n}$ is already defined. By Lemma 16 Alice can define the value of $W_{10n}$ in remaining $30n$ moves, so she wins.

If Bob does not use a standard strategy, Alice can define the value of $F_{n}$ in fewer than $3n$ moves. She can do it by Lemma 17. Then Alice asks about $p$ . Then, either:

$\blacksquare$

Bob answers $p=1$ . In this case, $D$ is equal to $F_{n}\oplus W_{10n}$ , $F_{n}$ is already defined and Alice can ask at least $30n$ questions, which is enough to determine the value of $W_{10n}$ .
$\blacksquare$

Bob answers $p=0$ . In this case by Lemma 16 Alice can define the value of $W_{10n}|_{p=0}$ in $20n$ questions, so she has at least $10n$ questions left. She uses these questions to ask about all the variables $q,x_{1},x_{1}^{\prime},y_{1},y_{1}^{\prime},\ldots,x_{n},x_{n}^{\prime},y_{n}$ and $y_{n}^{\prime}$ . Therefore, now Alice knows the values of $W_{10n}$ , $q$ , $p$ , $G_{n}$ , and $F_{n}$ , so Alice knows the value of $D$ .

When (1) is false

Now we prove that if (1) is false then Alice cannot win in $33n$ moves. First we prove the following

Lemma 18.

Assume that (1) is false. Then, Bob can answer the questions of Alice about the variables of $F_{n}$ in such a way that

$\blacksquare$

$F_{n}$ will not be defined as long as Alice has asked fewer than $3n$ questions.
$\blacksquare$

If Alice has asked exactly $3n$ questions, then either (i) the value of $F_{n}$ is not defined, or (ii) the value of $F_{n}$ is defined, but there exists an assignment of variables not asked by Alice such that $G_{n}=0$ .

Proof of Lemma 18.

From Lemma 17 we know that, if Alice does not play according to a standard strategy, she will need more than $3n$ questions to define $F_{n}$ . So we can assume that Alice plays according to a standard strategy. Bob will also play according to a standard strategy, with the following additional constraint: for every pair $\{y_{i},y_{i}^{\prime}\}$ Bob will answer $0$ to the second requested variable. (Recall that, a standard strategy for Bob is one where he answers $1$ for the first requested variable in the pair.)

We claim that if (1) is false, and Alice uses a standard strategy, then Bob can make $x_{i}=x_{i}^{\prime}$ for every $i$ , and $h=0$ , thus forcing $G_{n}=0$ . Indeed, in a standard strategy Alice asks variables in the right order, so that Bob can set $x_{i}$ and $x_{i}^{\prime}$ to the same value, according to his winning strategy in the TQBF-game. This makes the value of $h$ equal to $0$ . $\hfill\blacktriangleleft$

Now we are ready to describe the strategy for Bob. To recall: we are assuming that (1) is false, and we will devise a strategy for Bob that will leave $D$ undefined unless Alice asks more than $33n$ queries.

Bob’s strategy.

$\blacksquare$

For the variables that define $F_{n}$ , Bob uses the strategy from Lemma 18.
$\blacksquare$

If Alice asks about $p$ then Bob answers $1$ .
$\blacksquare$

For the variables that define $W_{10n}$ , Bob uses some best strategy for $W_{10n}|_{p=1}$ .
$\blacksquare$

For $q$ the answer is arbitrary.

We argue that if Bob uses this strategy then Alice cannot define the value of $D$ in $33n$ questions. Indeed, assume that Alice asks about $p$ . Then $p=1$ and the value of $D$ is equal to the value of $F_{n}\oplus W_{10n}$ . Now either Alice asked $<3n$ questions about $F_{n}$ or $<30n$ questions about $W_{10n}|_{p=1}$ . Either way, one of these functions is not defined and hence $D$ is also not defined.

Now assume that Alice does not ask about $p$ . Then by setting $p=1$ , we can always force the value of $D$ to be equal to $F_{n}\oplus W_{10n}$ , so this value must be defined. But to define $F_{n}$ and $W_{10n}$ , Alice must spend all her $33n$ questions, and so she cannot ask about $q$ . Moreover, to learn the value of $F_{n}$ , she must spend exactly $3n$ questions about $F_{n}$ , but she cannot spend any more. From Lemma 18, there must then exist some setting of the variables Alice didn’t ask, such that $G_{n}=0$ . Then, by way of this assignment together with $p=0$ , $D$ becomes equal to $q$ , which Alice did not ask and hence is not defined. $\hfill\blacktriangleleft$

4 Open questions

1.

What is the exact time-complexity of tt-DT? Is it possible to improve $O(3^{n}n)$ -algorithm of Proposition 6? Is it possible to prove any non-trivial bounds (for example, under the Exponential Time Hypothesis)?
2.

Is it possible to improve the $O(\log N\log\log N)$ -depth bound of Proposition 9?
3.

What is the exact time, space, and circuit complexity of the problem of finding the minimum size of a decision tree that computes a given Boolean function? It is known that this problem belongs to P[1], but the best depth upper-bound we know is $O((\log N)^{2})$ . It seemed to us that our reduction cannot be adapted to this case without a significantly new idea.
4.

What can we say about the problem of approximating DT complexity? One can consider, in the reduction above, the function $D_{n}^{1}\oplus\ldots\oplus D_{n}^{k}$ instead of $D_{n}$ for some $k$ , where all $D_{n}^{i}$ are the same functions as $D_{n}$ with fresh variables. It allows to prove that the problem of the approximation of DT with constant term has the same complexity as exact calculation of DT. Is it possible to improve on this result?

References

[1] Scott Aaronson. Algorithms for boolean function query properties. SIAM Journal on Computing, 32(5):1140–1157, 2003. doi:10.1137/S0097539700379644.
[2] David A Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC¹. In Proceedings STOC, 1986. doi:10.1145/12130.12131.
[3] David A Mix Barrington, Neil Immerman, and Howard Straubing. On uniformity within NC1. Journal of Computer and System Sciences, 41(3):274–306, 1990. doi:10.1016/0022-0000(90)90022-D.
[4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman and Hall/CRC, 1984.
[5] Arkadev Chattopadhyay, Michal Kouckỳ, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. Computational Complexity, 28:617–659, 2019. doi:10.1007/S00037-019-00190-7.
[6] Mika Göös. Lower bounds for clique vs. independent set. In Proceedings of FOCS, 2015. doi:10.1109/FOCS.2015.69.
[7] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In Proceedings of the 56th FOCS, 2015. doi:10.1109/FOCS.2015.70.
[8] David Guijarro, Vıctor Lavın, and Vijay Raghavan. Exact learning when irrelevant variables abound. Information Processing Letters, 70(5):233–239, 1999. doi:10.1016/S0020-0190(99)00063-0.
[9] G. V. Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119–127, 1980.
[10] C. Koch, C. Strassle, and L. Tan. Properly learning decision trees with queries is NP-hard. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 2383–2407, Los Alamitos, CA, USA, November 2023. IEEE Computer Society. doi:10.1109/FOCS57990.2023.00146.
[11] Kasper Green Larsen. Models and techniques for proving data structure lower bounds. PhD thesis, Aarhus University, 2013.
[12] Bruno Loff and Sagnik Mukhopadhyay. Lifting theorems for equality. In Proceedings of STACS, 2019. doi:10.4230/LIPIcs.STACS.2019.50.
[13] Mihai Pǎtraşcu. Lower bound techniques for data structures. PhD thesis, Massachusetts Institute of Technology, 2008.
[14] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986. doi:10.1023/A:1022643204877.
[15] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
[16] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Combinatorica, 19(3):403–435, 1999. doi:10.1007/S004930050062.
[17] Omer Reingold, Luca Trevisan, and Salil Vadhan. Notions of reducibility between cryptographic primitives. In Theory of Cryptography Conference, pages 1–20. Springer, 2004. doi:10.1007/978-3-540-24638-1_1.
[18] Detlef Sieling. Minimization of decision trees is hard to approximate. J. Comput. Syst. Sci., 74(3):394–403, 2008. doi:10.1016/J.JCSS.2007.06.014.
[19] Shihao Xuanyuan, Shiang Xuanyuan, and Ye Yue. Application of c4. 5 algorithm in insurance and financial services using data mining methods. Mobile Information Systems, 2022(1), 2022.
[20] Andrew Chi-Chih Yao. On the complexity of comparison problems using linear functions. In 16th Annual Symposium on Foundations of Computer Science (sfcs 1975), pages 85–89. IEEE Computer Society, 1975.
[21] Wei Zhan. URL: https://cstheory.stackexchange.com/questions/52546/a-question-about-decision-tree-complexity.

[bib.bib1] [1] Scott Aaronson. Algorithms for boolean function query properties. SIAM Journal on Computing, 32(5):1140–1157, 2003. doi:10.1137/S0097539700379644.

[bib.bib2] [2] David A Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC¹. In Proceedings STOC, 1986. doi:10.1145/12130.12131.

[bib.bib3] [3] David A Mix Barrington, Neil Immerman, and Howard Straubing. On uniformity within NC1. Journal of Computer and System Sciences, 41(3):274–306, 1990. doi:10.1016/0022-0000(90)90022-D.

[bib.bib4] [4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman and Hall/CRC, 1984.

[bib.bib5] [5] Arkadev Chattopadhyay, Michal Kouckỳ, Bruno Loff, and Sagnik Mukhopadhyay. Simulation theorems via pseudo-random properties. Computational Complexity, 28:617–659, 2019. doi:10.1007/S00037-019-00190-7.

[bib.bib6] [6] Mika Göös. Lower bounds for clique vs. independent set. In Proceedings of FOCS, 2015. doi:10.1109/FOCS.2015.69.

[bib.bib7] [7] Mika Göös, Toniann Pitassi, and Thomas Watson. Deterministic communication vs. partition number. In Proceedings of the 56th FOCS, 2015. doi:10.1109/FOCS.2015.70.

[bib.bib8] [8] David Guijarro, Vıctor Lavın, and Vijay Raghavan. Exact learning when irrelevant variables abound. Information Processing Letters, 70(5):233–239, 1999. doi:10.1016/S0020-0190(99)00063-0.

[bib.bib9] [9] G. V. Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119–127, 1980.

[bib.bib10] [10] C. Koch, C. Strassle, and L. Tan. Properly learning decision trees with queries is NP-hard. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 2383–2407, Los Alamitos, CA, USA, November 2023. IEEE Computer Society. doi:10.1109/FOCS57990.2023.00146.

[bib.bib11] [11] Kasper Green Larsen. Models and techniques for proving data structure lower bounds. PhD thesis, Aarhus University, 2013.

[bib.bib12] [12] Bruno Loff and Sagnik Mukhopadhyay. Lifting theorems for equality. In Proceedings of STACS, 2019. doi:10.4230/LIPIcs.STACS.2019.50.

[bib.bib13] [13] Mihai Pǎtraşcu. Lower bound techniques for data structures. PhD thesis, Massachusetts Institute of Technology, 2008.

[bib.bib14] [14] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986. doi:10.1023/A:1022643204877.

[bib.bib15] [15] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.

[bib.bib16] [16] Ran Raz and Pierre McKenzie. Separation of the monotone NC hierarchy. Combinatorica, 19(3):403–435, 1999. doi:10.1007/S004930050062.

[bib.bib17] [17] Omer Reingold, Luca Trevisan, and Salil Vadhan. Notions of reducibility between cryptographic primitives. In Theory of Cryptography Conference, pages 1–20. Springer, 2004. doi:10.1007/978-3-540-24638-1_1.

[bib.bib18] [18] Detlef Sieling. Minimization of decision trees is hard to approximate. J. Comput. Syst. Sci., 74(3):394–403, 2008. doi:10.1016/J.JCSS.2007.06.014.

[bib.bib19] [19] Shihao Xuanyuan, Shiang Xuanyuan, and Ye Yue. Application of c4. 5 algorithm in insurance and financial services using data mining methods. Mobile Information Systems, 2022(1), 2022.

[bib.bib20] [20] Andrew Chi-Chih Yao. On the complexity of comparison problems using linear functions. In 16th Annual Symposium on Foundations of Computer Science (sfcs 1975), pages 85–89. IEEE Computer Society, 1975.

[bib.bib21] [21] Wei Zhan. URL: https://cstheory.stackexchange.com/questions/52546/a-question-about-decision-tree-complexity.