Toward Better Depth Lower Bounds: Strong Composition of XOR and a Random Function

Chukhin, Nikolai; Kulikov, Alexander S.; Mihajlin, Ivan

doi:10.4230/LIPIcs.STACS.2025.26


Nikolai Chukhin: Neapolis University Pafos, Cyprus; JetBrains Research, Paphos, Cyprus

Alexander S. Kulikov: JetBrains Research, Paphos, Cyprus

Ivan Mihajlin: JetBrains Research, Paphos, Cyprus

Abstract

Proving formula depth lower bounds is a fundamental challenge in complexity theory, with the strongest known bound of $(3-o(1))\log n$ established by Håstad over 25 years ago. The Karchmer–Raz–Wigderson (KRW) conjecture offers a promising approach to advance these bounds and separate ${\mathsf{P}}$ from ${\mathsf{NC}}^{1}$ . It suggests that the depth complexity of a function composition $f\diamond g$ approximates the sum of the depth complexities of $f$ and $g$ .

The Karchmer–Wigderson (KW) relation framework translates formula depth into communication complexity, restating the KRW conjecture as $\mathsf{CC}(\mathsf{KW}_{f}\diamond\mathsf{KW}_{g})\approx\mathsf{CC}(\mathsf{KW}_{f})+\mathsf{CC}(\mathsf{KW}_{g})$. Prior work has confirmed the conjecture under various relaxations, often replacing one or both KW relations with the universal relation or constraining the communication game through strong composition.

In this paper, we examine the strong composition $\mathsf{KW}_{\mathsf{XOR}}\circledast\mathsf{KW}_{f}$ of the parity function and a random Boolean function $f$ . We prove that with probability $1-o(1)$ , any protocol solving this composition requires at least $n^{3-o(1)}$ leaves. This result establishes a depth lower bound of $(3-o(1))\log n$ , matching Håstad’s bound, but is applicable to a broader class of inner functions, even when the outer function is simple. Though bounds for the strong composition do not translate directly to formula depth bounds, they usually help to analyze the standard composition (of the corresponding two functions) which is directly related to formula depth.

Our proof utilizes formal complexity measures. First, we apply Khrapchenko’s method to show that numerous instances of $f$ remain unsolved after several communication steps. Subsequently, we transition to a different formal complexity measure to demonstrate that the remaining communication problem is at least as hard as $\mathsf{KW}_{\mathsf{OR}}\circledast\mathsf{KW}_{f}$ . This hybrid approach not only achieves the desired lower bound, but also introduces a novel technique for analyzing formula depth, potentially informing future research in complexity theory.

Keywords and phrases:

complexity, formula complexity, lower bounds, Boolean functions, depth


2012 ACM Subject Classification:

Theory of computation → Circuit complexity

DOI:

10.4230/LIPIcs.STACS.2025.26

Event:

42nd International Symposium on Theoretical Aspects of Computer Science (STACS 2025)

Editors:

Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Proving formula depth lower bounds is an important and difficult challenge in complexity theory: the strongest known lower bound $(3-o(1))\log n$, proved by Håstad [6] (following a line of works starting from Subbotovskaya [17, 9, 16]), has remained unbeaten for more than 25 years (in 2014, Tal [18] improved the lower order terms in this bound). One of the most actively studied approaches to this problem is the one suggested by Karchmer, Raz, and Wigderson [11]. They conjectured that the naive approach of computing a composition of two functions is close to optimal. Namely, for two Boolean functions $f\colon\{0,1\}^{m}\to\{0,1\}$ and $g\colon\{0,1\}^{n}\to\{0,1\}$, define their composition $f\diamond g\colon\{0,1\}^{m\times n}\to\{0,1\}$ as the function that first applies $g$ to every row of the input matrix and then applies $f$ to the resulting column vector. The KRW conjecture then states that ${\mathsf{D}}(f\diamond g)$ is close to ${\mathsf{D}}(f)+{\mathsf{D}}(g)$, where ${\mathsf{D}}(\cdot)$ denotes the minimum depth of a de Morgan formula computing the given function. Karchmer, Raz, and Wigderson [11] proved that if the conjecture is true, then ${\mathsf{P}}\not\subseteq{\mathsf{NC}}^{1}$, that is, there are functions in ${\mathsf{P}}$ that cannot be computed in logarithmic parallel time.

A convenient way of studying the KRW conjecture is through the framework of Karchmer–Wigderson relations [12]. It not only allows one to apply tools from communication complexity, but also suggests various important special cases of the conjecture. For a function $f\colon\{0,1\}^{n}\to\{0,1\}$, the relation ${\mathsf{KW}}_{f}$ is defined as follows:

{\mathsf{KW}}_{f}=\{(a,b,i)\colon a\in f^{-1}(1),b\in f^{-1}(0),i\in[n],a_{i}\neq b_{i}\}.

The communication complexity ${\mathsf{CC}}({\mathsf{KW}}_{f})$ of this relation is the minimum number of bits that Alice and Bob need to exchange to solve the following communication problem: Alice is given $a\in f^{-1}(1)$ , Bob is given $b\in f^{-1}(0)$ , and their goal is to find an index $i\in[n]$ such that $(a,b,i)\in{\mathsf{KW}}_{f}$ (i.e., $a_{i}\neq b_{i}$ ). Karchmer and Wigderson [12] proved that, for any function $f$ , the communication complexity of ${\mathsf{KW}}_{f}$ is equal to the depth complexity of $f$ : ${\mathsf{CC}}({\mathsf{KW}}_{f})={\mathsf{D}}(f)$ . Within this framework, the KRW conjecture is restated as follows: ${\mathsf{CC}}({\mathsf{KW}}_{f}\diamond{\mathsf{KW}}_{g})$ is close to ${\mathsf{CC}}({\mathsf{KW}}_{f})+{\mathsf{CC}}({\mathsf{KW}}_{g})$ (where ${\mathsf{KW}}_{f}\diamond{\mathsf{KW}}_{g}$ is another name for ${\mathsf{KW}}_{f\diamond g}$ ).

One natural way of relaxing the conjecture is to replace one or both of the two relations ${\mathsf{KW}}_{f}$ and ${\mathsf{KW}}_{g}$ by the universal relation, defined as follows:

U_{n}=\{(a,b,i)\colon a,b\in\{0,1\}^{n},a\neq b,i\in[n],a_{i}\neq b_{i}\}.

Using a universal relation instead of the Karchmer–Wigderson relation makes the corresponding communication game only harder, hence proving lower bounds for it is potentially easier and could lead to the resolution of the original conjecture. For this reason, such relaxations have been studied intensively.

Edmonds et al. [4] proved the KRW conjecture for the composition $U_{m}\diamond U_{n}$ of two universal relations using communication complexity methods. Håstad and Wigderson [7] improved it for a higher degree of composition using a different approach. Karchmer et al. [11] extended this result to monotone functions. Håstad [6] demonstrated the conjecture for the composition $f\diamond{\mathsf{XOR}}_{n}$ of an arbitrary function $f\colon\{0,1\}^{m}\to\{0,1\}$ with the parity function ${\mathsf{XOR}}_{n}$ . This was later reaffirmed by Dinur and Meir [3] through a communication complexity approach. Further advancements were made by Gavinsky et al. [5] who established the conjecture for the composition $f\diamond U_{n}$ of any non-constant function $f\colon\{0,1\}^{m}\to\{0,1\}$ with the universal relation $U_{n}$ . Mihajlin and Smal [15] proved the KRW conjecture for the composition of a universal relation with certain hard functions using ${\mathsf{XOR}}$ -composition. Subsequently, Wu [20] improved this result by extending it to the composition of a universal relation with a wider range of functions (though still not with the majority of them). de Rezende et al. [2] proved the conjecture in a semi-monotone setting for a wide range of functions $g$ .

Another natural way of relaxing the initial conjecture is to constrain the communication game (instead of allowing for more inputs for the game). In the strong composition ${\mathsf{KW}}_{f}\circledast{\mathsf{KW}}_{g}$ , Alice receives $X\in(f\diamond g)^{-1}(1)$ and Bob receives $Y\in(f\diamond g)^{-1}(0)$ , and their objective is to identify a pair of indices $(i,j)$ such that $X_{i,j}\neq Y_{i,j}$ , similar to the regular composition. However, this time it must hold additionally that $g(X_{i})\neq g(Y_{i})$ .
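The difference between the standard and the strong composition is easy to see on a toy instance. The following Python sketch checks candidate answers $(i,j)$ by brute force. The helper names (`compose`, `is_weak_answer`, `is_strong_answer`, `strong_answers`) and the choice $f=\mathsf{XOR}_2$, $g=\mathsf{AND}_2$ are our own, for illustration only. Rows and columns are 0-based.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, ...]
Matrix = Sequence[Row]


def compose(f: Callable[[Row], int], g: Callable[[Row], int], X: Matrix) -> int:
    """(f ◇ g)(X): apply g to every row of X, then f to the resulting column vector."""
    return f(tuple(g(row) for row in X))


def is_weak_answer(X: Matrix, Y: Matrix, i: int, j: int) -> bool:
    """Valid output for the standard composition: X and Y differ at position (i, j)."""
    return X[i][j] != Y[i][j]


def is_strong_answer(g: Callable[[Row], int], X: Matrix, Y: Matrix, i: int, j: int) -> bool:
    """Valid output for the strong composition: additionally g(X_i) != g(Y_i)."""
    return X[i][j] != Y[i][j] and g(X[i]) != g(Y[i])


def strong_answers(g: Callable[[Row], int], X: Matrix, Y: Matrix) -> List[Tuple[int, int]]:
    """All valid outputs of the strong composition on the pair (X, Y)."""
    return [(i, j) for i in range(len(X)) for j in range(len(X[i]))
            if is_strong_answer(g, X, Y, i, j)]


def xor(v: Row) -> int:
    return sum(v) % 2


def and_(v: Row) -> int:
    return int(all(v))


X = ((1, 1), (0, 0))  # g(X) = (1, 0), so (XOR ◇ AND)(X) = 1: Alice's input
Y = ((0, 1), (1, 0))  # g(Y) = (0, 0), so (XOR ◇ AND)(Y) = 0: Bob's input
```

In this example, $(1,0)$ is a valid answer for the standard composition because the matrices differ there. It is not valid for the strong composition, since $g(X_1)=g(Y_1)=0$. The only strong answer is $(0,0)$.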

This way of relaxing the conjecture was considered in a number of previous papers and was formalized recently by Meir [14]. Håstad and Wigderson, in their proof of the lower bound for two universal relations, initially establish the result for what they call the extended universal relation, a concept closely related to strong composition. Similarly, Karchmer et al. [11] demonstrate that, in the monotone setting, strong composition coincides with the standard composition. de Rezende et al. [2] utilized this notion, although without explicitly naming it. Meir [14] formalized the notion of strong composition in his proof of the relaxation of the KRW conjecture.

Theorem 1 (Meir, [14]).

There exists a constant $\gamma>0.04$ such that for every non-constant function $f\colon\{0,1\}^{m}\to\{0,1\}$ and for all $n\in\mathbb{N}$ , there exists a function $g\colon\{0,1\}^{n}\to\{0,1\}$ such that

{\mathsf{CC}}({\mathsf{KW}}_{f}\circledast{\mathsf{KW}}_{g})\geq\log{\mathsf{CC}}({\mathsf{KW}}_{f})-(1-\gamma)m+n-O(\log(mn)).

1.1 Our Result

Håstad [6] proved the KRW conjecture for ${\mathsf{KW}}_{f}\diamond{\mathsf{KW}}_{{\mathsf{XOR}}}$. However, proving the KRW conjecture for ${\mathsf{KW}}_{{\mathsf{XOR}}}\diamond{\mathsf{KW}}_{f}$ is still an open question. In this paper, we study the strong composition ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ of the parity function ${\mathsf{XOR}}_{m}$ with a random function $f\colon\{0,1\}^{\log m}\to\{0,1\}$. Since Alice and Bob receive an input of size $m\log m$, we estimate the size of ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ in terms of $n=m\log m$. It is not difficult to see that the communication complexity of the corresponding game is at most $3\log n$: ${\mathsf{KW}}_{f}$ can be solved in $\log m$ bits of communication, whereas ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}$ can be solved in $2\log m$ bits of communication using the standard divide-and-conquer approach. Alice sends the parity of the first half of her input, and Bob replies whether the parity of his first half is the same. Since the overall parities of the two inputs differ, they differ on at least one of the halves, and both players learn which one. Thus, 2 bits of communication reduce the input size by a factor of two. We prove that if the function $f$ is well balanced and hard to approximate (which happens with probability $1-o(1)$), then the bound $3\log n$ is essentially optimal. Below, we state the result in terms of the protocol size (i.e., the number of leaves) rather than depth, since this gives a more general lower bound. In particular, it immediately implies a $(3-o(1))\log n$ depth lower bound.
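The divide-and-conquer protocol for ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}$ described above can be simulated in a few lines. The Python sketch below counts the bits exchanged. When $m$ is a power of two, it uses exactly $2\log m$ bits. The function name and the 0-based indexing are our own choices. The sketch covers only the outer $\mathsf{XOR}$ game, not the inner ${\mathsf{KW}}_{f}$.

```python
from typing import Sequence, Tuple


def kw_xor_protocol(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int]:
    """Simulate the halving protocol for KW_XOR.

    Alice holds x with odd parity, Bob holds y with even parity.
    Returns an index i with x[i] != y[i] and the number of bits exchanged.
    """
    if len(x) != len(y) or sum(x) % 2 != 1 or sum(y) % 2 != 0:
        raise ValueError("need |x| = |y|, XOR(x) = 1 and XOR(y) = 0")
    lo, hi = 0, len(x)  # invariant: parity(x[lo:hi]) != parity(y[lo:hi])
    bits = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        alice_bit = sum(x[lo:mid]) % 2            # Alice: parity of her first half (1 bit)
        same = alice_bit == sum(y[lo:mid]) % 2     # Bob: do the first halves agree? (1 bit)
        bits += 2
        if same:
            lo = mid  # first halves agree, so the parities differ on the second half
        else:
            hi = mid  # the parities already differ on the first half
    return lo, bits
```

For example, `kw_xor_protocol([1, 1, 1, 0], [1, 1, 0, 0])` returns `(2, 4)`: position 2 is where the inputs differ, found after $2\log 4=4$ bits.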

Corollary 2.

With probability $1-o(1)$ , for a random function $f\colon\{0,1\}^{\log m}\to\{0,1\}$ , any protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ has at least $n^{3-o(1)}$ leaves, where $n=m\log m$ .

In turn, this result follows from the following general lower bound, given in terms of ${\mathsf{L}}_{\frac{3}{4}}$ that stands for the smallest size of a formula that agrees with $f$ on a $\frac{3}{4}$ fraction of inputs.

Theorem 3.

For any $0.49$ -balanced function $f\colon\{0,1\}^{\log m}\to\{0,1\}$ , any protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ has at least $n^{2-o(1)}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves, where $n=m\log m$ .

In contrast to many results mentioned above and similarly to the bound by de Rezende et al. [2], our result works for a wide range of inner functions $f$. This brings us closer to resolving the KRW conjecture, which makes a claim about the complexity of composing any pair of functions. Also, many of the previous techniques work well in the regime where the outer function is hard and give no strong lower bounds when the outer function is easy (as is the case with the ${\mathsf{XOR}}$ function). For example, random restrictions (one of the most successful methods for proving lower bounds) do not seem to give meaningful lower bounds for ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$: under a random restriction, this composition turns into an $\mathsf{XOR}$ of a small number of variables, which is easy to compute. The lower bound by Meir (see Theorem 1) also gives strong lower bounds in the regime where the outer function is hard (and only gives a trivial lower bound of the form $o(\log n)$ for the function that we study).

To prove the lower bound, we exploit formal complexity measures. As in [4, 15], we consider two stages of a protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ . During the first stage, we track the progress using the classical measure by Khrapchenko [13] and ensure that even after many steps of the protocol, there are still many instances of $f$ that need to be solved. At the second stage, we switch to another formal complexity measure and show that the remaining communication problem is, roughly, not easier than ${\mathsf{KW}}_{{\mathsf{OR}}}\circledast{\mathsf{KW}}_{f}$ . We believe that this proof technique is interesting on its own, since it is not only easy to show that Khrapchenko’s measure cannot give superquadratic size lower bounds, but it is also known that natural generalizations of this measure are also unable to give stronger than quadratic lower bounds [8]. For more details on Khrapchenko’s measure and its limitations, see Sections 2.1 and 2.5.

2 Notation, Known Facts, and Technical Lemmas

Throughout the paper, $\log$ denotes the binary logarithm whereas $\ln$ denotes the natural logarithm. By $n$ we usually denote the size of the input. All asymptotic estimates are given under an implicit assumption that $n$ goes to infinity. By $[n]$ , we denote the set $\{1,2,\dotsc,n\}$ . By ${\mathsf{R}}_{+}$ we denote the set $\{x\in{\mathsf{R}}\colon x>0\}$ . We utilize the following asymptotic estimates for binomial coefficients. For any constant $0<\alpha<1$ ,

\Omega(n^{-1/2})2^{h(\alpha)n}\leq\binom{n}{\alpha n}\leq 2^{h(\alpha)n},

(1)

where $h(x)=-x\log x-(1-x)\log(1-x)$ denotes the binary entropy function.

For a string $x\in\{0,1\}^{n}$, its $i$-th bit is denoted by $x_{i}$. For a matrix $X\in\{0,1\}^{m\times n}$, by $X_{i}$ we denote the $i$-th row of $X$ and by $X_{i,j}$ we denote the bit of $X$ in the intersection of the $i$-th row and the $j$-th column.

2.1 Graphs

For a rooted tree, the depth of its node is the number of edges on the path from the node to the root; the depth of the tree is the maximum depth of its nodes.

Let $G(V,E)$ be a graph and $\varnothing\neq A\subseteq V$ be a nonempty subset of its nodes. By $G[A]$, we denote the subgraph of $G$ induced by $A$. By $\operatorname{avgdeg}(G,A)$, we denote the average degree of the nodes in $A$:

\displaystyle\operatorname{avgdeg}(G,A)=\frac{1}{|A|}\sum_{v\in A}\deg(v)\,.

(2)

For a bipartite graph $G(A\sqcup B,E)$ with nonempty parts, let

\psi(G)=\operatorname{avgdeg}(G,A)\cdot\operatorname{avgdeg}(G,B)\,.

(3)

Clearly, $\psi(G)\leq|A|\cdot|B|$ . The lemma below shows that this graph measure is subadditive.

Lemma 4.

Let $G(A\sqcup B,E)$ be a bipartite graph and $A=A_{L}\sqcup A_{R}$ be a partition of $A$ into two parts. Let $G_{L}=G[A_{L}\sqcup B]$ and $G_{R}=G[A_{R}\sqcup B]$ . Then,

\psi(G)\leq\psi(G_{L})+\psi(G_{R}).

Proof.

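Let $E_{L}$ and $E_{R}$ be the sets of edges of $G_{L}$ and $G_{R}$, respectively. Clearly, $E=E_{L}\sqcup E_{R}$. Since $\psi(G)=\frac{|E|}{|A|}\cdot\frac{|E|}{|B|}=\frac{|E|^{2}}{|A||B|}$, and similarly for $G_{L}$ and $G_{R}$, we have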

\begin{aligned}
\psi(G)\leq\psi(G_{L})+\psi(G_{R})&\iff\frac{|E|^{2}}{(|A_{L}|+|A_{R}|)|B|}\leq\frac{|E_{L}|^{2}}{|A_{L}||B|}+\frac{|E_{R}|^{2}}{|A_{R}||B|}\\
&\iff\frac{|E_{L}|^{2}+|E_{R}|^{2}+2|E_{L}||E_{R}|}{|A_{L}|+|A_{R}|}\leq\frac{|E_{L}|^{2}}{|A_{L}|}+\frac{|E_{R}|^{2}}{|A_{R}|}\\
&\iff 2|E_{L}||E_{R}||A_{L}||A_{R}|\leq|E_{R}|^{2}|A_{L}|^{2}+|E_{L}|^{2}|A_{R}|^{2}\\
&\iff 0\leq(|E_{R}||A_{L}|-|E_{L}||A_{R}|)^{2}.
\end{aligned}

$\hfill\blacktriangleleft$
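As a numerical sanity check of Lemma 4 (not a substitute for the proof), the sketch below uses the identity $\psi(G)=|E|^{2}/(|A||B|)$. It compares both sides of the lemma on random bipartite graphs, each with a random split $A=A_{L}\sqcup A_{R}$. The graph sizes, the edge probability, and the helper names are arbitrary choices made for illustration.

```python
import random
from typing import Set, Tuple

Edge = Tuple[int, int]  # (a, b) with a in A = {0..|A|-1} and b in B = {0..|B|-1}


def psi(num_a: int, num_b: int, edges: Set[Edge]) -> float:
    """psi(G) = avgdeg(G, A) * avgdeg(G, B) = |E|^2 / (|A| |B|)."""
    return len(edges) ** 2 / (num_a * num_b)


def lemma4_sides(num_a: int, num_b: int, p: float, seed: int) -> Tuple[float, float]:
    """Return (psi(G), psi(G_L) + psi(G_R)) for a random graph and a random split of A."""
    rng = random.Random(seed)
    edges = {(a, b) for a in range(num_a) for b in range(num_b) if rng.random() < p}
    split = rng.randint(1, num_a - 1)  # A_L = {0..split-1}, A_R = {split..}: both nonempty
    left = {(a, b) for (a, b) in edges if a < split}  # edges of G_L = G[A_L ⊔ B]
    right = edges - left                              # edges of G_R = G[A_R ⊔ B]
    return psi(num_a, num_b, edges), psi(split, num_b, left) + psi(num_a - split, num_b, right)


for seed in range(5):
    lhs, rhs = lemma4_sides(num_a=10, num_b=7, p=0.4, seed=seed)
    print(f"seed={seed}: psi(G)={lhs:.3f} <= psi(G_L)+psi(G_R)={rhs:.3f}")
```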

The next lemma shows that if $A$ contains a node of small enough degree, then deleting it decreases neither $\psi$ nor the average degree of $A$.

Lemma 5.

Let a node $a\in A$ of a bipartite graph $G(A\sqcup B,E)$ satisfy $\deg(G,a)\leq{\operatorname{avgdeg}(G,A)}/{2}$ and let $A^{\prime}=A\setminus\{a\}$ and $G^{\prime}(A^{\prime}\sqcup B,E^{\prime})=G[A\setminus\{a\}\sqcup B]$ . Then,

\psi(G^{\prime})\geq\psi(G),

(4)

\operatorname{avgdeg}(G^{\prime},A^{\prime})\geq\operatorname{avgdeg}(G,A).

(5)

Proof.

The inequality $\operatorname{avgdeg}(G^{\prime},A^{\prime})\geq\operatorname{avgdeg}(G,A)$ holds since $A^{\prime}$ results from $A$ by removing a node whose degree is at most half of the average degree, and hence does not exceed the average degree.

To prove the inequality (4), let $d=\deg(G,a)$ . Then, $|E^{\prime}|=|E|-d$ and

\begin{aligned}
\psi(G^{\prime})\geq\psi(G)&\iff\frac{(|E|-d)^{2}}{(|A|-1)|B|}\geq\frac{|E|^{2}}{|A||B|}\\
&\iff\frac{|E|^{2}-2|E|d+d^{2}}{(|A|-1)|B|}\geq\frac{|E|^{2}}{|A||B|}\\
&\Longleftarrow\frac{|E|-2d}{|A|-1}\geq\frac{|E|}{|A|}\\
&\iff|E||A|-2d|A|\geq|E|(|A|-1)\\
&\iff d\leq\frac{|E|}{2|A|}=\frac{\operatorname{avgdeg}(G,A)}{2}.
\end{aligned}

$\hfill\blacktriangleleft$
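Lemma 5 can be checked numerically in the same spirit. In the sketch below (helper names are ours), a bipartite graph is given by the neighbourhoods in $B$ of the nodes of $A$. The sketch deletes the first node whose degree is at most $\operatorname{avgdeg}(G,A)/2$, then compares $\psi$ and $\operatorname{avgdeg}(\cdot,A)$ before and after the deletion. In the random test, an isolated node is appended so that a deletable node always exists.

```python
import random
from typing import List, Optional, Set, Tuple

Adjacency = List[Set[int]]  # adj[a] = neighbours in B of the node a in A


def psi_and_avgdeg(adj: Adjacency, num_b: int) -> Tuple[float, float]:
    """Return (psi(G), avgdeg(G, A)), using psi(G) = |E|^2 / (|A| |B|)."""
    num_edges = sum(len(nbrs) for nbrs in adj)
    return num_edges ** 2 / (len(adj) * num_b), num_edges / len(adj)


def remove_low_degree_node(adj: Adjacency, num_b: int) -> Optional[Adjacency]:
    """Delete the first node a with deg(a) <= avgdeg(G, A) / 2; return None if none exists."""
    _, avg_a = psi_and_avgdeg(adj, num_b)
    for idx, nbrs in enumerate(adj):
        if len(nbrs) <= avg_a / 2:
            return adj[:idx] + adj[idx + 1:]
    return None


def random_check(seed: int, num_a: int = 8, num_b: int = 6, p: float = 0.5) -> bool:
    """Check both inequalities of Lemma 5 on a random graph plus one isolated node."""
    rng = random.Random(seed)
    adj = [{b for b in range(num_b) if rng.random() < p} for _ in range(num_a)] + [set()]
    smaller = remove_low_degree_node(adj, num_b)
    assert smaller is not None  # the isolated node always qualifies
    psi0, avg0 = psi_and_avgdeg(adj, num_b)
    psi1, avg1 = psi_and_avgdeg(smaller, num_b)
    return psi1 >= psi0 - 1e-12 and avg1 >= avg0 - 1e-12
```

For instance, take `adj = [set(range(5)), set(range(4)), {0}, set(), {1, 2, 3}]` with $|B|=5$. Deleting the degree-1 node raises $\psi$ from $6.76$ to $7.2$ and $\operatorname{avgdeg}(\cdot,A)$ from $2.6$ to $3$.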

2.2 Boolean Functions

By $\mathbb{B}_{n}$ , we denote the set of all Boolean functions on $n$ variables. For two disjoint sets $A,B\subseteq\{0,1\}^{n}$ , the set $A\times B$ is called a combinatorial rectangle, and it is called full if $A$ and $B$ form a partition of $\{0,1\}^{n}$ . Clearly, there is a bijection between $\mathbb{B}_{n}$ and full combinatorial rectangles. For $f\in\mathbb{B}_{n}$ , by $R_{f}=f^{-1}(1)\times f^{-1}(0)$ , we denote the corresponding full rectangle. We say that a Boolean function $f$ is balanced if $|f^{-1}(0)|=|f^{-1}(1)|$ .

In this paper, it will prove convenient to apply a function $g\in\mathbb{B}_{m}$ not only to Boolean vectors $x\in\{0,1\}^{m}$ , but also to matrices $X\in\{0,1\}^{n\times m}$ :

g(X)=(g(X_{1}),\dotsc,g(X_{n})),

i.e., $g(X)\in\{0,1\}^{n}$ results from applying $g$ to every row of $X$. This allows us to define composition in a natural way. For $f\in\mathbb{B}_{m}$ and $g\in\mathbb{B}_{n}$, their composition $f\diamond g\colon\{0,1\}^{m\times n}\to\{0,1\}$ treats the input as an $m\times n$ matrix, first applies $g$ to all its rows, and then applies $f$ to the resulting column vector:

f\diamond g(X)=f(g(X))=f(g(X_{1}),\dotsc,g(X_{m})).

For a set of matrices $\mathcal{X}\subseteq\{0,1\}^{m\times n}$, by the $i$-th projection $\operatorname{proj}_{i}\mathcal{X}$ we denote the set of all $i$-th rows among the matrices of $\mathcal{X}$:

\operatorname{proj}_{i}\mathcal{X}=\{X_{i}\colon X\in\mathcal{X}\}=\{t\in\{0,1\}^{n}\colon\exists X\in\mathcal{X}\colon t=X_{i}\}.

(6)

In the proof of the main result, we will be dealing with Boolean matrices of dimension $n\times\log n$. Let $\mathcal{X}\subseteq\{0,1\}^{n\times\log n}$ be a set of such matrices. We say that $\mathcal{X}$ is $\alpha$-bounded if $|\operatorname{proj}_{i}\mathcal{X}|\leq\alpha n$ for all $i\in[n]$. The $i$-th projection of $\mathcal{X}$ is called sparse if $|\operatorname{proj}_{i}\mathcal{X}|<\frac{3}{8}n$, and dense otherwise. The following lemma shows that if $\mathcal{X}$ is $\alpha$-bounded and $|\mathcal{X}|$ is large, then $\mathcal{X}$ has few sparse projections. Later on, we will apply this lemma to sets $\mathcal{X}$ that are almost $0.5$-bounded and whose size gradually decreases, to argue that the number of sparse projections cannot grow too fast.

Lemma 6.

Let $k\in\mathbb{N}$ and $\alpha\in\left(\frac{3}{8},\frac{1}{2}\right]$ . If $\mathcal{X}\subseteq\{0,1\}^{n\times\log n}$ is $\alpha$ -bounded and $|\mathcal{X}|\geq\alpha^{n}\frac{n^{n}}{2^{k}}$ , then the number of sparse projections of $\mathcal{X}$ does not exceed $k\log^{-1}\frac{8\alpha}{3}$ .

Proof.

Let $\beta_{i}\in[0,1]$ be such that $|\operatorname{proj}_{i}\mathcal{X}|=\beta_{i}\alpha n$ . The $i$ -th projection is sparse if and only if $\beta_{i}<\frac{3}{8\alpha}$ . Let $t$ be the number of sparse projections and assume, without loss of generality, that the first $t$ projections are sparse. Then,

\begin{aligned}
\alpha^{n}\frac{n^{n}}{2^{k}}&\leq|\mathcal{X}|\leq\prod_{i=1}^{n}|\operatorname{proj}_{i}\mathcal{X}|=(\alpha n)^{n}\prod_{i=1}^{n}\beta_{i}\iff\\
\frac{1}{2^{k}}&\leq\prod_{i=1}^{n}\beta_{i}=\prod_{i=1}^{t}\beta_{i}\cdot\prod_{i=t+1}^{n}\beta_{i}<\left(\frac{3}{8\alpha}\right)^{t}\implies k\log^{-1}\frac{8\alpha}{3}\geq t.
\end{aligned}

$\hfill\blacktriangleleft$
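Product sets $\mathcal{X}=S_{1}\times\dots\times S_{n}$ satisfy $|\mathcal{X}|=\prod_{i}|\operatorname{proj}_{i}\mathcal{X}|$, so they make the second inequality in the proof above tight. The sketch below computes the projections (6) of such a set for $n=8$ and counts the sparse ones. It then compares that count with the bound $k\log^{-1}\frac{8\alpha}{3}$, taking $k$ as small as the hypothesis of Lemma 6 allows. The function names and the chosen row sets are our own illustrative assumptions.

```python
import math
from itertools import product
from typing import List, Sequence, Set, Tuple

Row = Tuple[int, ...]
Matrix = Tuple[Row, ...]


def projections(matrices: Sequence[Matrix]) -> List[Set[Row]]:
    """proj_i: the set of distinct i-th rows among the given matrices, cf. (6)."""
    return [{X[i] for X in matrices} for i in range(len(matrices[0]))]


def lemma6_check(row_sets: Sequence[Sequence[Row]], alpha: float) -> Tuple[int, float]:
    """Return (#sparse projections, bound of Lemma 6) for X = S_1 x ... x S_n."""
    n = len(row_sets)
    X = list(product(*row_sets))  # all matrices whose i-th row lies in S_i
    proj = projections(X)
    if any(len(p) > alpha * n for p in proj):
        raise ValueError("X is not alpha-bounded")
    sparse = sum(1 for p in proj if len(p) < 3 * n / 8)
    # smallest integer k >= 0 with |X| >= alpha^n * n^n / 2^k
    k = max(0, math.ceil(math.log2((alpha * n) ** n / len(X))))
    return sparse, k / math.log2(8 * alpha / 3)


rows = list(product((0, 1), repeat=3))  # {0,1}^{log n} for n = 8
sizes = [4, 4, 4, 4, 4, 4, 2, 1]         # |S_i| <= alpha * n = 4 for alpha = 1/2
sparse, bound = lemma6_check([rows[:s] for s in sizes], alpha=0.5)
print(sparse, bound)  # the rows of sizes 2 and 1 are sparse (< 3n/8 = 3)
```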

2.3 Boolean Formulas

The computational model studied in this paper is the de Morgan formula: a binary tree whose leaves are labeled by the variables $x_{1},\dotsc,x_{n}$ and their negations, and whose internal nodes (called gates) are labeled by $\lor$ and $\land$ (binary disjunction and conjunction, respectively). Such a formula $F$ computes a Boolean function $f(x_{1},\dotsc,x_{n})\in\mathbb{B}_{n}$. We also say that a formula $F$ separates a rectangle $A\times B$ if $F(a)=1$ and $F(b)=0$ for all $(a,b)\in A\times B$. This way, if a formula $F$ computes a function $f$, then it separates $R_{f}$.

For a formula $F$ , the size ${\mathsf{L}}(F)$ is defined as the number of leaves in $F$ . This extends to Boolean functions: for $f\in\mathbb{B}_{n}$ , by ${\mathsf{L}}(f)$ we denote the smallest size of a formula computing $f$ . Similarly, the depth ${\mathsf{D}}(F)$ is the depth of the tree whereas ${\mathsf{D}}(f)$ is the smallest depth of a formula computing $f$ .

By ${\mathsf{L}}_{\frac{3}{4}}(f)$ , we denote the smallest size of a formula $F$ that agrees with $f$ on a $\frac{3}{4}$ fraction of inputs, i.e.,

\Pr_{x\in\{0,1\}^{n}}[F(x)=f(x)]\geq\frac{3}{4}.

We say that $F$ approximates $f$ .

It is known that formulas can be balanced: ${\mathsf{D}}(f)=\Theta(\log{\mathsf{L}}(f))$ (see references in [10, Section 6.1]): this is proved by showing that, for any formula $F$ , there exists an equivalent formula $F^{\prime}$ with ${\mathsf{L}}(F^{\prime})\leq{\mathsf{L}}(F)^{O(1)}$ and ${\mathsf{D}}(F^{\prime})\leq O(\log{\mathsf{L}}(F))$ . The following theorem further refines this: by allowing a larger constant in the depth upper bound, one can control the size of the resulting balanced formula.

Theorem 7 ([1]).

For any $k\geq 2$ and any formula $F$ , there exists an equivalent formula $F^{\prime}$ satisfying ${\mathsf{D}}(F^{\prime})\leq 3\ln 2\cdot k\cdot\log{\mathsf{L}}(F)$ and ${\mathsf{L}}(F^{\prime})\leq{\mathsf{L}}(F)^{\gamma}$ , where $\gamma=1+\frac{1}{1+\log(k-1)}$ .
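To illustrate the trade-off in Theorem 7, the sketch below tabulates its two bounds for a few values of $k$. It reads $\log$ as the binary logarithm, following the conventions of Section 2. As $k$ grows, $\gamma\to 1$, so the size blow-up becomes negligible, while the depth constant $3\ln 2\cdot k$ grows. The function name is our own.

```python
import math
from typing import Tuple


def balancing_tradeoff(k: float, size: int) -> Tuple[float, float, float]:
    """Bounds of Theorem 7 for a formula with `size` leaves.

    Returns (depth bound, exponent gamma, size bound); log is binary, ln is natural.
    """
    if k < 2:
        raise ValueError("Theorem 7 requires k >= 2")
    gamma = 1 + 1 / (1 + math.log2(k - 1))
    depth_bound = 3 * math.log(2) * k * math.log2(size)
    return depth_bound, gamma, size ** gamma


for k in (2, 3, 5, 17, 1025):
    depth, gamma, _ = balancing_tradeoff(k, size=2 ** 20)
    print(f"k={k:5d}: depth <= {depth:9.1f}, size <= L^{gamma:.3f}")
```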

Using a counting argument, one can show that, with probability $1-o(1)$, a random Boolean function $f\in\mathbb{B}_{n}$ satisfies ${\mathsf{L}}(f)=\Omega(2^{n}/\log n)$. To prove this, one compares the number of small formulas with the number $|\mathbb{B}_{n}|=2^{2^{n}}$ of Boolean functions, using the estimate of Lemma 8 below (see [10, Lemma 1.23]). It ensures that the number of formulas of size at most $\frac{2^{n}}{100\log n}$ is at most

(17n)^{\frac{2^{n}}{100\log n}}=2^{\frac{\log(17n)}{100\log n}2^{n}}\,,

which is $o(|\mathbb{B}_{n}|)$, since $\frac{\log(17n)}{100\log n}\to\frac{1}{100}$.

Lemma 8.

For all large enough $l$ , the number of Boolean formulas over $n$ variables with at most $l$ leaves is at most

(17n)^{l}.

(7)

Proof.

The number of binary trees with $l$ leaves is at most $4^{l}$ . For each such tree, there are at most $(4n)^{l}$ ways to convert it into a de Morgan formula: there are $2n$ input literals for the leaves and two operations for each internal gate. Consequently, the total number of formulas with at most $l$ leaves is at most

l\cdot 4^{l}\cdot(4n)^{l}=l\cdot 16^{l}\cdot n^{l}\leq(17n)^{l},

which is true for $l\geq 71$ . $\hfill\blacktriangleleft$
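The constants in Lemma 8 can be verified with exact integer arithmetic. The sketch below checks two facts. First, the number of binary trees with $l$ leaves, which is the Catalan number $C_{l-1}$, is at most $4^{l}$. Second, $l\cdot 16^{l}\leq 17^{l}$ holds for every $l$ from $71$ up to a chosen limit but fails at $l=70$. The limit only keeps the check finite, and the function names are ours.

```python
import math


def num_binary_trees(leaves: int) -> int:
    """Rooted binary trees (each internal node has two ordered children) with the given
    number of leaves: the Catalan number C_{leaves - 1}."""
    k = leaves - 1
    return math.comb(2 * k, k) // (k + 1)


def counting_bound_holds(l: int) -> bool:
    """l * 4^l * (4n)^l <= (17n)^l; the factor n^l cancels, leaving l * 16^l <= 17^l."""
    return l * 16 ** l <= 17 ** l


def smallest_valid_tail(limit: int = 1000) -> int:
    """Smallest L such that the bound holds for every l in [L, limit]."""
    l = limit
    while l >= 1 and counting_bound_holds(l):
        l -= 1
    return l + 1
```

Note that the inequality also holds at $l=1$ and fails for $2\leq l\leq 70$, which is why the lemma is stated for large enough $l$.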

We say that $f\in\mathbb{B}_{n}$ is $\alpha$ -balanced if

\alpha\cdot 2^{n}\leq|f^{-1}(0)|,|f^{-1}(1)|\leq(1-\alpha)\cdot 2^{n}

i.e., $\left||f^{-1}(0)|-|f^{-1}(1)|\right|\leq(1-2\alpha)2^{n}$ .

Lemma 9.

For all sufficiently large $n$ and any constant $\frac{3}{8}<\alpha<\frac{1}{2}$ , a random function $f\in\mathbb{B}_{n}$ is $\alpha$ -balanced and ${\mathsf{L}}_{\frac{3}{4}}(f)=\Omega(\frac{2^{n}}{\log n})$ , with probability $1-o(1)$ .

Proof.

For a formula over $n$ variables, the number of Boolean functions it approximates is at most (by the estimate (1))

\sum_{d=3\cdot 2^{n}/4}^{2^{n}}\binom{2^{n}}{d}=\sum_{d=0}^{2^{n}/4}\binom{2^{n}}{d}\leq 2^{n}\binom{2^{n}}{2^{n}/4}\leq 2^{n}\cdot 2^{h(1/4)2^{n}}\,.

Combining this with (7), we get that the number of functions approximated by formulas of size at most $\beta\frac{2^{n}}{\log n}$ is at most

(17n)^{\beta\frac{2^{n}}{\log n}}2^{n}2^{h(1/4)2^{n}}=2^{2^{n}\left(\beta\frac{\log(17n)}{\log n}+h(1/4)\right)+n}.

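For any constant $0<\beta<1-h(1/4)$, this is an $o(1)$ fraction of $\mathbb{B}_{n}$, since $\frac{\log(17n)}{\log n}\to 1$ and hence the exponent is at most $(1-\varepsilon)2^{n}$ for some constant $\varepsilon>0$.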

Now, the probability that a random $f\in\mathbb{B}_{n}$ is not $\alpha$ -balanced (i.e., $||f^{-1}(0)|-|f^{-1}(1)||>(1-2\alpha)\cdot 2^{n}$ ) is at most

\frac{1}{2^{2^{n}}}\cdot 2\cdot\sum_{i=0}^{\alpha\cdot 2^{n}-1}\binom{2^{n}}{i}\leq\frac{1}{2^{2^{n}}}\cdot 2\cdot\alpha\cdot 2^{n}\cdot\binom{2^{n}}{\alpha\cdot 2^{n}}\leq 2^{2^{n}(h(\alpha)-1)+n+1}=o(1).

Thus, with probability $1-o(1)$ , a random $f\in\mathbb{B}_{n}$ is $\alpha$ -balanced and hard to approximate. $\hfill\blacktriangleleft$
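The constants in Lemma 9 are easy to evaluate numerically. The sketch below computes $1-h(1/4)\approx 0.189$, which bounds the admissible $\beta$. For $\alpha=0.49$, the value used in Corollary 2, it also finds the smallest $n$ at which the bound $2^{2^{n}(h(\alpha)-1)+n+1}$ on the probability of not being $\alpha$-balanced drops below $1$. A direct $\alpha$-balancedness test for a truth table is included as well. Since the lemma is asymptotic, these numbers only illustrate how quickly the estimates take effect. The function names are ours.

```python
import math
from typing import Optional, Sequence


def h(x: float) -> float:
    """Binary entropy function h(x) = -x log x - (1 - x) log(1 - x)."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def is_alpha_balanced(truth_table: Sequence[int], alpha: float) -> bool:
    """alpha * 2^n <= |f^{-1}(0)|, |f^{-1}(1)| <= (1 - alpha) * 2^n."""
    size, ones = len(truth_table), sum(truth_table)
    return all(alpha * size <= c <= (1 - alpha) * size for c in (ones, size - ones))


def imbalance_exponent(n: int, alpha: float = 0.49) -> float:
    """Exponent 2^n (h(alpha) - 1) + n + 1 in the bound on Pr[f is not alpha-balanced]."""
    return 2 ** n * (h(alpha) - 1) + n + 1


def first_negative_n(alpha: float = 0.49, limit: int = 64) -> Optional[int]:
    """Smallest n <= limit for which that probability bound drops below 1."""
    for n in range(1, limit + 1):
        if imbalance_exponent(n, alpha) < 0:
            return n
    return None


beta_max = 1 - h(0.25)  # the first part of the proof works for any constant 0 < beta < beta_max
```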

2.4 Karchmer–Wigderson Games

Karchmer and Wigderson [12] came up with the following characterization of Boolean formulas. For a Boolean function $f\in\mathbb{B}_{n}$, the Karchmer–Wigderson game ${\mathsf{KW}}_{f}$ is the following communication problem. Alice is given $a\in f^{-1}(1)$, whereas Bob is given $b\in f^{-1}(0)$, and their goal is to find an index $i\in[n]$ such that $a_{i}\neq b_{i}$. A communication protocol for ${\mathsf{KW}}_{f}$ is a rooted binary tree whose leaves are labeled with indices from $[n]$ and each internal node $v$ is labeled either by a function $A_{v}\colon f^{-1}(1)\to\{0,1\}$ or by a function $B_{v}\colon f^{-1}(0)\to\{0,1\}$. For any pair $(a,b)\in f^{-1}(1)\times f^{-1}(0)$, one reaches a leaf of the protocol by traversing a path from the root: to determine which of the two children of a node $v$ to proceed to, one computes either $A_{v}(a)$ or $B_{v}(b)$. We say that a protocol solves ${\mathsf{KW}}_{f}$ if, for any $(a,b)\in f^{-1}(1)\times f^{-1}(0)$, one reaches a leaf $i\in[n]$ such that $a_{i}\neq b_{i}$. Similarly to formulas, we say that a protocol separates a combinatorial rectangle $A\times B$ if it works correctly for all pairs $(a,b)\in A\times B$.

Karchmer and Wigderson showed that formulas computing $f$ and protocols solving ${\mathsf{KW}}_{f}$ can be transformed (even mechanically) into one another. In particular, the smallest number of leaves in a protocol solving ${\mathsf{KW}}_{f}$ is equal to ${\mathsf{L}}(f)$, whereas the smallest depth of a protocol (also known as the communication complexity of ${\mathsf{KW}}_{f}$, denoted by ${\mathsf{CC}}({\mathsf{KW}}_{f})$) is nothing else but ${\mathsf{D}}(f)$. For a combinatorial rectangle $A\times B$, by ${\mathsf{L}}(A\times B)$ we denote the minimum number of leaves in a protocol separating $A$ and $B$.

With each node of a protocol solving ${\mathsf{KW}}_{f}$ , one can associate a combinatorial rectangle in a natural way. The root of the protocol corresponds to $R_{f}$ . For the two children of Alice’s node $v$ with a rectangle $A\times B$ , one associates two rectangles $A_{0}\times B$ and $A_{1}\times B$ , where $A_{i}=\{a\in A\colon A_{v}(a)=i\}$ . This way, Alice splits the current rectangle horizontally. Similarly, when Bob speaks, he splits the current rectangle vertically. Each leaf of a protocol solving ${\mathsf{KW}}_{f}$ is associated with a monochromatic rectangle, i.e., a rectangle $A\times B$ such that there exists $i\in[n]$ for which $a_{i}\neq b_{i}$ for all $(a,b)\in A\times B$ .

For functions $f\in\mathbb{B}_{m}$ and $g\in\mathbb{B}_{n}$ , the strong composition of ${\mathsf{KW}}_{f}$ and ${\mathsf{KW}}_{g}$ , denoted as ${\mathsf{KW}}_{f}\circledast{\mathsf{KW}}_{g}$ , is the following communication problem: Alice and Bob receive inputs $X\in(f\diamond g)^{-1}(1)$ and $Y\in(f\diamond g)^{-1}(0)$ , respectively, and need to find indices $(i,j)$ such that $X_{i,j}\neq Y_{i,j}$ and $g(X_{i})\neq g(Y_{i})$ . We say that a protocol strongly separates sets $\mathcal{X}\subseteq(f\diamond g)^{-1}(1)$ and $\mathcal{Y}\subseteq(f\diamond g)^{-1}(0)$ , if it solves the strong composition ${\mathsf{KW}}_{f}\circledast{\mathsf{KW}}_{g}$ on inputs $\mathcal{X}\times\mathcal{Y}$ .

2.5 Formal Complexity Measures

For $f\in\mathbb{B}_{n}$ , define a bipartite graph $G_{f}(f^{-1}(1)\sqcup f^{-1}(0),E_{f})$ as follows:

E_{f}=\{\{u,v\}\colon u\in f^{-1}(1),v\in f^{-1}(0),d_{H}(u,v)=1\},

where $d_{H}$ is the Hamming distance. Khrapchenko [13] proved that, for any $f\in\mathbb{B}_{n}$ , $\psi(G_{f})\leq{\mathsf{L}}(f)$ (recall (3) for the definition of $\psi(G)$ ). This immediately gives a lower bound ${\mathsf{L}}({\mathsf{XOR}}_{n})\geq n^{2}$ . Note the two useful properties of $\psi(G_{f})$ : on the one hand, it is a lower bound to ${\mathsf{L}}(f)$ , on the other hand, it is much easier to estimate than ${\mathsf{L}}(f)$ .
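For small $n$, Khrapchenko's bound can be evaluated by brute force. The sketch below collects the edges of $G_{f}$ starting from each $1$-input and returns $\psi(G_{f})=|E_{f}|^{2}/(|f^{-1}(1)||f^{-1}(0)|)$, which is the product of average degrees in (3). For $\mathsf{XOR}_{n}$ it returns $n^{2}$. For $\mathsf{OR}_{n}$ it returns only $n^{2}/(2^{n}-1)$, far below ${\mathsf{L}}(\mathsf{OR}_{n})=n$. The function names are ours.

```python
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def khrapchenko_psi(f: Callable[[Point], int], n: int) -> float:
    """psi(G_f) = |E_f|^2 / (|f^{-1}(1)| |f^{-1}(0)|) by brute force; f must be non-constant."""
    ones = [x for x in product((0, 1), repeat=n) if f(x) == 1]
    num_zeros = 2 ** n - len(ones)
    num_edges = 0
    for x in ones:
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # neighbour of x differing in bit i
            if f(y) == 0:
                num_edges += 1                   # {x, y} is an edge of G_f
    return num_edges ** 2 / (len(ones) * num_zeros)


def xor(x: Point) -> int:
    return sum(x) % 2


def or_(x: Point) -> int:
    return int(any(x))


print([khrapchenko_psi(xor, n) for n in range(1, 7)])  # n^2, matching L(XOR_n) >= n^2
```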

Paterson [19, Section 8.8] noted that Khrapchenko’s approach can be cast as follows. A function $\mu\colon\mathbb{B}_{n}\to{\mathsf{R}}_{+}$ is called a formal complexity measure if it satisfies the following two properties:

1. normalization: $\mu(x_{i}),\mu(\overline{x_{i}})\leq 1$, for all $i\in[n]$;
2. subadditivity: $\mu(f\lor g)\leq\mu(f)+\mu(g)$ and $\mu(f\land g)\leq\mu(f)+\mu(g)$, for all $f,g\in\mathbb{B}_{n}$.

Note that Khrapchenko’s measure can be defined in this notation as $\phi(f)=\psi(G_{f})$ . Its subadditivity is shown in Lemma 4, whereas the normalization property can be easily seen.

It is not difficult to see that ${\mathsf{L}}$ itself is a formal complexity measure. Moreover, it turns out that it is the largest formal complexity measure.

Lemma 10 (Lemma 8.1 in [19]).

For any formal complexity measure $\mu\colon\mathbb{B}_{n}\to{\mathsf{R}}$ and any $f\in\mathbb{B}_{n}$, $\mu(f)\leq{\mathsf{L}}(f)$.

3 Proof of the Main Result

In this section, we prove the main result of the paper.

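Theorem 3 (restated).

For any $0.49$-balanced function $f\colon\{0,1\}^{\log m}\to\{0,1\}$, any protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ has at least $n^{2-o(1)}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves, where $n=m\log m$.

Corollary 2 (restated).

With probability $1-o(1)$, for a random function $f\colon\{0,1\}^{\log m}\to\{0,1\}$, any protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ has at least $n^{3-o(1)}$ leaves, where $n=m\log m$.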

Proof.

Lemma 9 (applied to functions of $\log m$ variables) guarantees that, with probability $1-o(1)$, a random function $f\colon\{0,1\}^{\log m}\to\{0,1\}$ is $0.49$-balanced and satisfies ${\mathsf{L}}_{\frac{3}{4}}(f)=\Omega(m/\log\log m)$. Plugging this into Theorem 3 gives a lower bound of $n^{2-o(1)}\cdot m/\log\log m=n^{3-o(1)}$ leaves, since $m=n/\log m=n^{1-o(1)}$. $\hfill\blacktriangleleft$

3.1 Proof Overview

We start by proving a lower bound on the size of any protocol of logarithmic depth solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$. Then, using balancing techniques (see Theorem 7), we extend the size lower bound to all protocols.

Fix a set $\mathcal{Z}\subseteq\{0,1\}^{\log m}$ such that $|\mathcal{Z}|=0.98m$ and $f$ is balanced on $\mathcal{Z}$ : $|\mathcal{X}_{0}|=|\mathcal{Y}_{0}|=0.49m$ , where $\mathcal{X}_{0}=f^{-1}(1)\cap\mathcal{Z}$ and $\mathcal{Y}_{0}=f^{-1}(0)\cap\mathcal{Z}$ .

We prove a lower bound for any protocol that strongly separates $\mathcal{X}_{T}$ and $\mathcal{Y}_{T}$ (these sets are defined later), i.e., that solves ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ on inputs $\mathcal{X}_{T}\times\mathcal{Y}_{T}$. To this end, we associate a graph similar to $G_{{\mathsf{XOR}}_{m}}$ with the nodes of the protocol and use Khrapchenko's measure to track the progress of the protocol. A node of the graph is associated with all inputs $X$ having the same vector $f(X)$. The reasoning is that, in a natural scenario, the protocol will first solve ${\mathsf{XOR}}_{m}$ and then solve $f$, so it does not need to distinguish between $X$ and $X^{\prime}$ in the initial rounds if $f(X)=f(X^{\prime})$. We connect two graph nodes by an edge if their vectors differ in exactly one coordinate.

We aim to ensure that every edge of the graph has large projections: for any two adjacent nodes, the blocks associated with them cover a substantial number of inputs of the function $f$. Since $f$ is hard to approximate, no small protocol can solve the problem on these two blocks; this is why we require all edges of the graph to have large projections on both sides. To achieve this, we require that the block associated with each node shrinks by at most a factor of two at each step of the protocol. This ensures that a significant number of edges of the graph keep large projections on both sides.

Once the Khrapchenko measure becomes sufficiently small, we can assert that ${\mathsf{XOR}}_{m}$ is nearly solved and that the protocol, in a sense, must now solve an instance of ${\mathsf{OR}}_{d}\circledast f$. Since solving each edge separately is hard, solving ${\mathsf{OR}}_{d}\circledast f$ over these edges should require roughly $d\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves.

3.2 Proof

Throughout this section, we assume that $m$ is large enough and $f\in\mathbb{B}_{\log m}$ is a fixed function that is $0.49$ -balanced. Fix sets $\mathcal{X}_{0}\subseteq f^{-1}(1),\mathcal{Y}_{0}\subseteq f^{-1}(0)$ of size $0.49m$ and let

\mathcal{X}_{T}=\{X\in\{0,1\}^{m\times\log m}\colon({\mathsf{XOR}}_{m}\diamond f)(X)=1\text{ and }X_{i}\in\mathcal{X}_{0}\sqcup\mathcal{Y}_{0}\text{ for all }i\in[m]\},\tag{8}
\mathcal{Y}_{T}=\{Y\in\{0,1\}^{m\times\log m}\colon({\mathsf{XOR}}_{m}\diamond f)(Y)=0\text{ and }Y_{i}\in\mathcal{X}_{0}\sqcup\mathcal{Y}_{0}\text{ for all }i\in[m]\}.\tag{9}

Let $\alpha>0$ be a constant and let $P$ be a protocol that strongly separates $\mathcal{X}_{T}\times\mathcal{Y}_{T}$ and has depth at most $\alpha\log m$. Recall that each node $S$ of $P$ is associated with a rectangle $\mathcal{X}_{S}\times\mathcal{Y}_{S}$. We build a subtree $D$ of $P$ with the same root and associate a graph $G_{N}$ with every node $N$ of $D$. The graphs $G_{N}$ are built inductively from the graph associated with the parent of $N$, as explained below; all of them are subgraphs of the $m$-dimensional hypercube: the node set of each such graph is a subset of $\{0,1\}^{m}$, and every edge $\{u,v\}$ satisfies $d_{H}(u,v)=1$.

For the root $T$ of the protocol $P$, the graph $G_{T}$ is simply $G_{{\mathsf{XOR}}_{m}}$, i.e., the $m$-dimensional hypercube: its node set is $\{0,1\}^{m}$, and two nodes are joined by an edge with label $i$ if they differ exactly in the $i$-th coordinate.
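As an illustration only (the function names are ours, not the paper's), the following Python sketch builds $G_{{\mathsf{XOR}}_{m}}$, splits its nodes by parity into the parts $A_{T}$ and $B_{T}$ defined below, and computes the average degrees and Khrapchenko's measure $\psi=d_{A}\cdot d_{B}$ used later in this section. Coordinates are 0-indexed here, while the text uses labels in $[m]$.

```python
from itertools import product


def hypercube(m):
    """Nodes and labelled edges of G_{XOR_m}, the m-dimensional hypercube.

    Each edge is a triple (u, v, i): u and v differ exactly in coordinate i."""
    nodes = list(product((0, 1), repeat=m))
    edges = []
    for u in nodes:
        for i in range(m):
            if u[i] == 0:  # list each edge once, from its endpoint with a 0 at position i
                v = u[:i] + (1,) + u[i + 1:]
                edges.append((u, v, i))
    return nodes, edges


def parts_and_psi(nodes, edges):
    """Split nodes by parity (A: XOR_m = 1, B: XOR_m = 0) and return
    (A, B, d_A, d_B, psi), where psi = d_A * d_B is Khrapchenko's measure."""
    A = [v for v in nodes if sum(v) % 2 == 1]
    B = [v for v in nodes if sum(v) % 2 == 0]
    # Every hypercube edge joins a node of A with a node of B,
    # so |E| equals the total degree of either part.
    d_A = len(edges) / len(A)
    d_B = len(edges) / len(B)
    return A, B, d_A, d_B, d_A * d_B


nodes, edges = hypercube(5)
A, B, d_A, d_B, psi = parts_and_psi(nodes, edges)
print(len(nodes), len(edges), d_A, d_B, psi)
```

For $m=5$ this reports $d_{A}=d_{B}=5$ and $\psi=25=m^{2}$, matching the value $\psi(G_{T})=m^{2}$ used in the proof of Theorem 17.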

With each node $v$ of the graph $G_{S}$ we associate the following set of inputs, called its block:

\mathcal{B}_{S}(v)=\{X\in\{0,1\}^{m\times\log m}\colon X\in\mathcal{X}_{S}\sqcup\mathcal{Y}_{S}\text{ and }f(X)=v\},\quad\text{where }f(X)=(f(X_{1}),\dotsc,f(X_{m})).

We say that an edge $\{u,v\}$ of $G_{S}$ with label $i$ is heavy if the projections of both $\mathcal{B}_{S}(u)$ and $\mathcal{B}_{S}(v)$ onto the $i$-th coordinate are dense, i.e.,

|\operatorname{proj}_{i}\mathcal{B}_{S}(u)|,\;|\operatorname{proj}_{i}\mathcal{B}_{S}(v)|\geq\frac{3}{8}m,

and light otherwise.
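To make blocks and heavy edges concrete, here is a toy Python sketch. The representation (an input as a tuple of $m$ rows, each row a $\log m$-bit value encoded as an integer; $f$ as a Python function on rows) and the toy parameters are our own illustrative assumptions; in particular, the toy uses all of $\{0,1\}^{\log m}$ in place of a set of size $0.98m$, so only the shape of the definition, not its constants, is mirrored.

```python
from itertools import product


def f_vector(f, X):
    """The vector f(X) = (f(X_1), ..., f(X_m)) for an input given as a tuple of rows."""
    return tuple(f(x) for x in X)


def blocks(f, inputs):
    """Group inputs by f(X): blocks[v] plays the role of the block B_S(v)."""
    out = {}
    for X in inputs:
        out.setdefault(f_vector(f, X), set()).add(X)
    return out


def proj(block, i):
    """Projection of a set of inputs onto row i."""
    return {X[i] for X in block}


def is_heavy(blk, u, v, i, threshold):
    """Edge {u, v} with label i is heavy if both blocks project densely onto row i."""
    return (len(proj(blk.get(u, set()), i)) >= threshold
            and len(proj(blk.get(v, set()), i)) >= threshold)


def f(x):
    """Toy inner function on 2-bit rows encoded as 0..3: the lowest bit (balanced)."""
    return x & 1


m = 4
blk = blocks(f, product(range(4), repeat=m))
u, v = (1, 0, 0, 0), (0, 0, 0, 0)                  # adjacent nodes, edge label 0
heavy_before = is_heavy(blk, u, v, 0, 3 * m / 8)   # both projections have size 2
blk[u] = {X for X in blk[u] if X[0] != 3}          # shrink the block of u
heavy_after = is_heavy(blk, u, v, 0, 3 * m / 8)    # row-0 projection of u's block is {1}
print(heavy_before, heavy_after)
```

The second call shows how shrinking one endpoint's block alone can make an edge light, which is exactly the effect Lemma 13 has to control.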

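Since the nodes of the graph $G_{S}$ are vectors in $\{0,1\}^{m}$, we can naturally divide them into two parts according to the value of ${\mathsf{XOR}}_{m}$: the block of a node in the first part is a subset of $\mathcal{X}_{S}$, and the block of a node in the second part is a subset of $\mathcal{Y}_{S}$.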

A_{S}=\{v\in V(G_{S})\mid{\mathsf{XOR}}_{m}(v)=1\},\qquad B_{S}=\{v\in V(G_{S})\mid{\mathsf{XOR}}_{m}(v)=0\}.

For a graph $G_{S}$ , we define $d_{A}(G_{S})$ as the average degree of the part $A_{S}$ and $d_{B}(G_{S})$ as the average degree of the part $B_{S}$ . We say that a graph $G_{S}$ is special if

\min\{d_{A}(G_{S}),d_{B}(G_{S})\}\leq 12\alpha\log^{2}m.

We construct the tree $D$ inductively. For a node $S$ of $D$, we stop if $G_{S}$ is special; otherwise, we add the two children of $S$ in $P$ to $D$, construct their graphs, and continue building $D$ from these two children. Hence, the graphs associated with internal nodes of $D$ are not special, while the graphs associated with leaves of $D$ are special.

Definition 11.

A graph $G_{S}$ , associated with a node $S$ in the tree $D$ , is adjusted if all its edges are heavy and

\deg(v)>\frac{d_{A}(G_{S})}{2}\;\;\forall v\in A_{S}\qquad\text{and}\qquad\deg(v)>\frac{d_{B}(G_{S})}{2}\;\;\forall v\in B_{S}.

(10)

We will ensure that all graphs $G_{S}$ for any node $S$ in the tree $D$ are adjusted.
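The two conditions just defined, special and adjusted, can be sketched as simple predicates on a graph given by its parts and its edge list. This is an illustrative Python sketch under our own conventions: logarithms are base 2, and the heaviness test is passed in as a caller-supplied predicate (a hypothetical stand-in for the projection test, since blocks are not modelled here).

```python
import math


def average_degrees(A, B, edges):
    """d_A and d_B for a bipartite graph whose edges each join A and B.
    Assumes both parts are nonempty."""
    return len(edges) / len(A), len(edges) / len(B)


def is_special(A, B, edges, alpha, m):
    """min{d_A, d_B} <= 12 * alpha * log^2 m (log base 2)."""
    d_A, d_B = average_degrees(A, B, edges)
    return min(d_A, d_B) <= 12 * alpha * math.log2(m) ** 2


def is_adjusted(A, B, edges, heavy):
    """All edges heavy, and condition (10): every node's degree exceeds half
    the average degree of its part. `heavy` is a predicate on edges."""
    if not all(heavy(e) for e in edges):
        return False
    d_A, d_B = average_degrees(A, B, edges)
    deg = {x: 0 for x in list(A) + list(B)}
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return all(deg[x] > d_A / 2 for x in A) and all(deg[x] > d_B / 2 for x in B)
```

On a small hypercube, say $m=8$ with $\alpha=1$, the graph is adjusted but special, because $12\alpha\log^{2}m$ exceeds $m$; the root becomes non-special only for large $m$, which is why the section assumes that $m$ is large enough.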

Lemma 12.

For each node $v$ of the graph $G_{T}$ (associated with the root $T$ of the protocol $P$ ),

1.

the degree of $v$ is $m$ ;
2.

$|\operatorname{proj}_{i}\mathcal{B}_{T}(v)|=0.49m$ , for all $i\in[m]$ ;
3.

$|\mathcal{B}_{T}(v)|=\frac{(0.98m)^{m}}{2^{m}}$ .

Proof.

The graph $G_{T}$ is the $m$-dimensional hypercube, hence $\deg(v)=m$.

To prove the second property, recall that $f$ is balanced on $\mathcal{X}_{0}\sqcup\mathcal{Y}_{0}$. If $v_{i}=1$ (respectively, $v_{i}=0$) for some $i\in[m]$, then the $i$-th row of an input in $\mathcal{B}_{T}(v)$ can be any element of $\mathcal{X}_{0}$ (respectively, $\mathcal{Y}_{0}$). Hence, $|\operatorname{proj}_{i}\mathcal{B}_{T}(v)|=0.49m$.

Finally, to prove the third property, note that $G_{T}$ has $2^{m}$ nodes and that, for each node $v$, the block $\mathcal{B}_{T}(v)$ has size at most $(0.49m)^{m}$, since each row of its inputs is restricted to $\mathcal{X}_{0}$ or to $\mathcal{Y}_{0}$. Each input from $\mathcal{X}_{T}\sqcup\mathcal{Y}_{T}$ belongs to exactly one block associated with a node of $G_{T}$, and $|\mathcal{X}_{T}\sqcup\mathcal{Y}_{T}|=|\mathcal{X}_{0}\sqcup\mathcal{Y}_{0}|^{m}=(0.98m)^{m}=2^{m}\cdot(0.49m)^{m}$. Therefore, every block has size exactly $|\mathcal{B}_{T}(v)|=\frac{(0.98m)^{m}}{2^{m}}$. $\hfill\blacktriangleleft$
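Items 2 and 3 of Lemma 12 can be checked by brute force on toy instances. In this Python sketch (our own illustration), rows are small integers, `Z` plays the role of $\mathcal{X}_{0}\sqcup\mathcal{Y}_{0}$, and the toy inner functions are arbitrary choices; when $f$ is balanced on `Z`, every root block is a product set of size $(|Z|/2)^{m}$ whose row projections all have size $|Z|/2$.

```python
from itertools import product


def root_blocks(f, Z, m):
    """Blocks B_T(v) at the root: all inputs whose m rows lie in Z, grouped by f(X)."""
    out = {}
    for X in product(Z, repeat=m):
        out.setdefault(tuple(f(x) for x in X), []).append(X)
    return out


def lemma12_holds_toy(f, Z, m):
    """Check items 2 and 3 of Lemma 12 on a toy instance: there are 2^m blocks,
    each of size (|Z|/2)^m = |Z|^m / 2^m, with every row projection of size |Z|/2."""
    half = len(Z) // 2
    blk = root_blocks(f, Z, m)
    if len(blk) != 2 ** m:
        return False
    return all(
        len(block) == half ** m
        and all(len({X[i] for X in block}) == half for i in range(m))
        for block in blk.values()
    )


def parity(x):
    """A toy inner function on rows encoded as integers: parity of the bits of x."""
    return bin(x).count("1") % 2


print(lemma12_holds_toy(parity, range(8), 3))                # balanced on all 3-bit rows
print(lemma12_holds_toy(parity, [0, 1, 2, 3], 4))            # balanced on this subset
print(lemma12_holds_toy(lambda x: int(x == 0), range(4), 2))  # unbalanced inner function
```

The last call illustrates why balance is needed: with an unbalanced inner function the blocks have different sizes.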

Lemma 12 ensures that the graph $G_{T}$ is adjusted and, since $m$ is large enough, not special; thus the root $T$ has two children in $D$. We now show how, using the blocks $\mathcal{B}$, to construct an intermediate graph $H_{N}$ for a child $N$ of a node $S$ of the tree $D$; the graph $G_{N}$ is then obtained from $H_{N}$ by the cleanup procedures described below. Recall that each step of $P$ partitions the set of either Alice's or Bob's inputs into two parts. Let $S$ be a node of $D$ associated with a rectangle $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ whose graph $G_{S}$ is not special (otherwise we stop building the subtree of $S$), and assume, without loss of generality, that it is Alice's turn at $S$. Let $S_{L}$ and $S_{R}$ be the left and right children of $S$ in the protocol $P$; we add them as the children of $S$ in the tree $D$. Every node $v\in B_{S}$ is put into both $H_{S_{L}}$ and $H_{S_{R}}$, since Alice's move does not change the block $\mathcal{B}_{S}(v)\subseteq\mathcal{Y}_{S}$. For each node $v\in A_{S}$, we decide which of the two graphs it goes to. Its block $\mathcal{B}_{S}(v)$ is split into two blocks, $\mathcal{B}_{S_{L}}(v)$ and $\mathcal{B}_{S_{R}}(v)$, corresponding to the two branches of the protocol. We assign $v$ to the left graph $H_{S_{L}}$ if $2\cdot|\mathcal{B}_{S_{L}}(v)|\geq|\mathcal{B}_{S}(v)|$, and to the right graph $H_{S_{R}}$ otherwise, that is, if $2\cdot|\mathcal{B}_{S_{R}}(v)|>|\mathcal{B}_{S}(v)|$. An edge $\{u,v\}$ of $G_{S}$ goes to $H_{S_{L}}$ if and only if both $u$ and $v$ are assigned to $H_{S_{L}}$, and similarly for $H_{S_{R}}$; since every node of $B_{S}$ belongs to both graphs, each edge goes to exactly one of them, namely the one containing its endpoint in $A_{S}$. This ensures that the size of each block shrinks by at most a factor of two when passing from a parent to a child in the tree $D$. The graphs $G_{S_{L}}$ and $G_{S_{R}}$ are then built from $H_{S_{L}}$ and $H_{S_{R}}$, respectively.
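The splitting rule for Alice's move can be sketched as follows. This is a hypothetical Python helper of our own: block sizes are given as plain numbers rather than computed from actual rectangles, and edges are triples `(u, v, label)` with exactly one endpoint in $A_{S}$.

```python
def split_on_alice_move(A, B, edges, block_size, left_size):
    """One construction step of H_{S_L} and H_{S_R} when Alice speaks (illustrative).

    A, B: the two parts of G_S.
    edges: (u, v, label) triples, each with exactly one endpoint in A.
    block_size[v]: |B_S(v)| for v in A.
    left_size[v]: |B_{S_L}(v)|; the rest of the block goes to the right child.
    Both children keep every node of B, whose blocks are unchanged."""
    A_L = {v for v in A if 2 * left_size[v] >= block_size[v]}
    A_R = set(A) - A_L  # for these v, 2 * |B_{S_R}(v)| > |B_S(v)|
    # An edge survives in a child iff both endpoints are there; since B is
    # copied to both children, this is decided by the edge's endpoint in A.
    E_L = [e for e in edges if e[0] in A_L or e[1] in A_L]
    E_R = [e for e in edges if e[0] in A_R or e[1] in A_R]
    return (A_L, set(B), E_L), (A_R, set(B), E_R)
```

Each node of $A_{S}$ lands in exactly one child and keeps at least half of its block there, and the edge set of $G_{S}$ is partitioned between the two children; these are the facts used when applying Lemma 4.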

The structure of the graph $G_{S}$ is modeled on Khrapchenko's graph, so we use the same measure:

\psi(G_{S})=d_{A}(G_{S})\cdot d_{B}(G_{S}).

Lemma 4 states that $\psi$ is subadditive.
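Lemma 4 is used here in a specific form: the $A$-part is split into two, each edge follows its endpoint in $A$, and the $B$-part is shared by both children. Since $\psi=d_{A}\cdot d_{B}=|E|^{2}/(|A||B|)$, the inequality depends only on the counts, and the following Python sketch (our own illustration, using exact rational arithmetic) spot-checks it on random counts. It is a sanity check, not a proof.

```python
import random
from fractions import Fraction


def psi(num_edges, size_A, size_B):
    """Khrapchenko's measure d_A * d_B = |E|^2 / (|A| |B|) of a bipartite graph."""
    return Fraction(num_edges ** 2, size_A * size_B)


def split_is_subadditive(e_L, e_R, a_L, a_R, b):
    """psi(G) <= psi(G_L) + psi(G_R) when A and E are split into disjoint
    halves (sizes a_L, a_R and e_L, e_R) and the part B (size b) is shared."""
    return psi(e_L + e_R, a_L + a_R, b) <= psi(e_L, a_L, b) + psi(e_R, a_R, b)


def random_trials(trials=10_000, seed=0):
    """Check the inequality on random counts with a fixed seed."""
    rng = random.Random(seed)
    return all(
        split_is_subadditive(rng.randint(0, 50), rng.randint(0, 50),
                             rng.randint(1, 20), rng.randint(1, 20), rng.randint(1, 20))
        for _ in range(trials)
    )


print(random_trials())
```

Equality holds exactly when $|E_{R}||A_{L}|=|E_{L}||A_{R}|$, i.e., when both halves have the same average degree.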

After obtaining the graph $H_{C}$ for a node $C$ of the tree $D$, we perform the first cleanup by deleting all light edges: let $H_{C}^{\prime}$ be the graph obtained from $H_{C}$ by removing all its light edges. The next lemma shows that this does not decrease the measure $\psi$ by much.

Lemma 13.

\psi(H_{C}^{\prime})\geq\psi(H_{C})\left(1-\frac{1}{\log m}\right).

Proof.

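Let $S$ be the parent of $C$ in $D$. Since $S$ is not a leaf of $D$, we have $\min\{d_{A}(G_{S}),d_{B}(G_{S})\}>12\alpha\log^{2}m$, and, since $G_{S}$ is adjusted, the degree of every node of $G_{S}$ is greater than half of the average degree of its part. Without loss of generality, assume that inputs were deleted from $\mathcal{X}_{S}$, that is, it was Alice's turn at $S$. Every node of $H_{C}$ that comes from $A_{S}$ keeps all its edges from $G_{S}$ (their other endpoints lie in $B_{S}$, which is copied to both children), and therefore $d_{A}(H_{C})\geq\frac{d_{A}(G_{S})}{2}>6\alpha\log^{2}m$.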

An edge $\{u,v\}$ can become light only because of its endpoint in $A_{S}$, since the blocks on the other side remain unchanged. By Lemma 12, the initial size of each block is $(0.49m)^{m}$, and after each step of the protocol the size of a block shrinks by at most a factor of two. Hence, since the depth of the protocol is at most $\alpha\log m$, for any node $v$ of $H_{C}$ the block $\mathcal{B}_{C}(v)$ has size at least $\frac{(0.49m)^{m}}{2^{\alpha\log m}}$. Using Lemma 6, we can therefore bound the number of light edges incident to $v$ by $3\alpha\log m$ (since $1/\log(8\cdot 0.49/3)<3$). Therefore,
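The constant $3$ in the bound $3\alpha\log m$ comes from the numerical fact $1/\log(8\cdot 0.49/3)<3$, where $8\cdot 0.49/3$ is the ratio between the initial projection size $0.49m$ and the heaviness threshold $\frac{3}{8}m$. A quick Python check, assuming (as elsewhere, since $f$ is defined on $\log m$ bits) that logarithms are base 2; with the natural logarithm the value would be about $3.74$, above the constant used.

```python
import math

# Ratio between the initial projection size 0.49m and the heaviness threshold (3/8)m.
ratio = 8 * 0.49 / 3
# The quantity bounded by 3 in the text, with base-2 logarithms.
constant = 1 / math.log2(ratio)
print(f"ratio = {ratio:.4f}, 1/log2(ratio) = {constant:.4f}")
```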

d_{A}(H_{C}^{\prime})\geq d_{A}(H_{C})-3\alpha\log m.

Now, consider $d_{B}(H_{C}^{\prime})$. Let $E_{C}$ be the set of edges of $H_{C}$, and let $A_{C}$ and $B_{C}$ be its two parts. Since at most $3\alpha\log m$ light edges are removed at each node of $A_{C}$,

d_{B}(H_{C}^{\prime})\geq\frac{|E_{C}|-|A_{C}|\cdot 3\alpha\log m}{|B_{C}|}=d_{B}(H_{C})-3\alpha\log m\,\frac{|A_{C}|}{|B_{C}|}.

Hence,

\begin{aligned}
\psi(H_{C}^{\prime})=d_{A}(H_{C}^{\prime})\,d_{B}(H_{C}^{\prime})&\geq\left(d_{A}(H_{C})-3\alpha\log m\right)\left(d_{B}(H_{C})-3\alpha\log m\,\frac{|A_{C}|}{|B_{C}|}\right)\\
&\geq\psi(H_{C})-3\alpha\,d_{B}(H_{C})\log m-\frac{3\alpha|A_{C}|\,d_{A}(H_{C})\log m}{|B_{C}|}\\
&=\psi(H_{C})\left(1-\frac{3\alpha\log m}{d_{A}(H_{C})}-\frac{3\alpha|A_{C}|\log m}{|E_{C}|}\right)\\
&=\psi(H_{C})\left(1-\frac{6\alpha\log m}{d_{A}(H_{C})}\right)\\
&>\psi(H_{C})\left(1-\frac{6\alpha\log m}{6\alpha\log^{2}m}\right)=\psi(H_{C})\left(1-\frac{1}{\log m}\right).
\end{aligned}

$\hfill\blacktriangleleft$

The next lemma shows how to construct an adjusted graph $G_{C}$ from the intermediate graph $H_{C}^{\prime}$.

Lemma 14.

There exists a subgraph $G_{C}$ of the graph $H_{C}^{\prime}$ such that $G_{C}$ is adjusted and $\psi(G_{C})\geq\psi(H_{C})\left(1-\frac{1}{\log m}\right)$ .

Proof.

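To obtain $G_{C}$, we start from $H_{C}^{\prime}$ and repeatedly remove nodes until the current graph satisfies (10). If (10) is violated, then there exists, without loss of generality, a node $v$ in the $A$-part whose degree is at most half of the average degree of that part; we remove $v$ together with its incident edges. By Lemma 5, this does not decrease the measure $\psi$. Removing nodes does not change the blocks of the remaining nodes, so all remaining edges stay heavy. The process is clearly finite, and, by Lemma 13, the resulting graph satisfies $\psi(G_{C})\geq\psi(H_{C}^{\prime})\geq\psi(H_{C})\left(1-\frac{1}{\log m}\right)$. $\hfill\blacktriangleleft$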
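The cleanup in the proof of Lemma 14 is a simple loop. The Python sketch below (function names are ours) deletes one violating node at a time and records $\psi$ after each deletion, so one can observe that the sequence never decreases, as Lemma 5 guarantees. Blocks are not modelled; in the proof, heaviness survives because deleting nodes leaves the blocks of the other nodes unchanged.

```python
from fractions import Fraction


def psi(A, B, edges):
    """psi = d_A * d_B = |E|^2 / (|A| |B|), taken to be 0 if a part is empty."""
    if not A or not B:
        return Fraction(0)
    return Fraction(len(edges) ** 2, len(A) * len(B))


def degrees(nodes, edges):
    """Degree of every node, for edges given as (u, v, label) triples."""
    deg = {x: 0 for x in nodes}
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def cleanup(A, B, edges):
    """Delete nodes violating (10) one at a time until none is left.

    Returns the final parts, the final edges, and the psi value after every
    deletion (the first entry is psi of the input graph)."""
    A, B, edges = set(A), set(B), list(edges)
    history = [psi(A, B, edges)]
    while A and B:
        deg = degrees(A | B, edges)
        d_A = Fraction(len(edges), len(A))
        d_B = Fraction(len(edges), len(B))
        bad = next((x for x in A if deg[x] <= d_A / 2), None)
        if bad is None:
            bad = next((x for x in B if deg[x] <= d_B / 2), None)
        if bad is None:
            break  # condition (10) holds
        A.discard(bad)
        B.discard(bad)
        edges = [e for e in edges if bad not in (e[0], e[1])]
        history.append(psi(A, B, edges))
    return A, B, edges, history
```

Exact fractions are used so that the comparisons with half the average degree match the non-strict inequality in (10) without rounding issues.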

This way, we construct the graph $G_{C}$ for the node $C$. If $G_{C}$ is not special, we continue expanding the subtree rooted at $C$. Recall also that, for each internal node $S$ of the tree $D$ with children $S_{L}$ and $S_{R}$, the part $A_{S}$ is split between $H_{S_{L}}$ and $H_{S_{R}}$, each edge of $G_{S}$ goes to exactly one of them, and $B_{S}$ is shared; hence, by Lemma 4,

\psi(G_{S})\leq\psi(H_{S_{L}})+\psi(H_{S_{R}}).

Hence, combining it with Lemma 14 we have:

\psi(G_{S})\left(1-\frac{1}{\log m}\right)\leq\psi(G_{S_{L}})+\psi(G_{S_{R}}).

(11)

On the other hand, if $G_{S}$ is special, we use the following two lemmas to argue that strongly separating $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ is still difficult.

Lemma 15.

Let $S$ be a node of the tree $D$ such that $G_{S}$ has a node $v$ of degree $d$. Then any protocol that strongly separates $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ has $\Omega\left(d\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)$ leaves.

Proof.

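Consider the node $v$ and its neighbors $u_{1},\dotsc,u_{d}$ in $G_{S}$, and denote by $l_{i}$ the label of the edge $\{v,u_{i}\}$. Assume, without loss of generality, that $v\in A_{S}$, so that $\mathcal{B}_{S}(v)\subseteq\mathcal{X}_{S}$ and $\mathcal{B}_{S}(u_{i})\subseteq\mathcal{Y}_{S}$. Define a measure $\xi$ on subrectangles of $\mathcal{X}_{S}\times\mathcal{Y}_{S}$: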

\xi(\mathcal{X}\times\mathcal{Y})=\sum_{i=1}^{d}{\mathsf{L}}\left(\operatorname{proj}_{l_{i}}\mathcal{B}\times\operatorname{proj}_{l_{i}}\mathcal{B}_{i}\right),

where $\mathcal{X}\subseteq\mathcal{X}_{S}$, $\mathcal{Y}\subseteq\mathcal{Y}_{S}$, $\mathcal{B}=\mathcal{X}\cap\mathcal{B}_{S}(v)$, and $\mathcal{B}_{i}=\mathcal{Y}\cap\mathcal{B}_{S}(u_{i})$ for all $i\in[d]$. For disjoint sets $A$ and $B$, we denote by ${\mathsf{KW}}(A\times B)$ the Karchmer–Wigderson game in which Alice gets $a\in A$, Bob gets $b\in B$, and they need to find $i$ such that $a_{i}\neq b_{i}$. We prove that any protocol strongly separating $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ has at least $\xi(\mathcal{X}_{S}\times\mathcal{Y}_{S})$ leaves.

It is easy to see that $\xi$ is subadditive, being a sum of subadditive measures: if $\mathcal{X}=\mathcal{X}^{\prime}\sqcup\mathcal{X}^{\prime\prime}$, then $\xi(\mathcal{X}\times\mathcal{Y})\leq\xi(\mathcal{X}^{\prime}\times\mathcal{Y})+\xi(\mathcal{X}^{\prime\prime}\times\mathcal{Y})$, and the same applies when we split $\mathcal{Y}$. Namely, let $\mathcal{Y}=\mathcal{Y}^{\prime}\sqcup\mathcal{Y}^{\prime\prime}$, $\mathcal{B}_{i}^{\prime}=\mathcal{Y}^{\prime}\cap\mathcal{B}_{S}(u_{i})$, and $\mathcal{B}_{i}^{\prime\prime}=\mathcal{Y}^{\prime\prime}\cap\mathcal{B}_{S}(u_{i})$. Then,

\begin{aligned}
\xi(\mathcal{X}\times(\mathcal{Y}^{\prime}\sqcup\mathcal{Y}^{\prime\prime}))&=\sum_{i=1}^{d}{\mathsf{L}}\left(\operatorname{proj}_{l_{i}}\mathcal{B}\times\operatorname{proj}_{l_{i}}(\mathcal{B}_{i}^{\prime}\sqcup\mathcal{B}_{i}^{\prime\prime})\right)\\
&\leq\sum_{i=1}^{d}{\mathsf{L}}\left(\operatorname{proj}_{l_{i}}\mathcal{B}\times\operatorname{proj}_{l_{i}}\mathcal{B}_{i}^{\prime}\right)+\sum_{i=1}^{d}{\mathsf{L}}\left(\operatorname{proj}_{l_{i}}\mathcal{B}\times\operatorname{proj}_{l_{i}}\mathcal{B}_{i}^{\prime\prime}\right)\\
&=\xi(\mathcal{X}\times\mathcal{Y}^{\prime})+\xi(\mathcal{X}\times\mathcal{Y}^{\prime\prime}).
\end{aligned}

Consider a protocol $P^{\prime}$ strongly separating $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ and a leaf $L$ of it, associated with a rectangle of inputs $\mathcal{X^{\prime}}_{L}\times\mathcal{Y^{\prime}}_{L}$. We show that $\xi(\mathcal{X^{\prime}}_{L}\times\mathcal{Y^{\prime}}_{L})\leq 1$. Since $L$ is a leaf, there exist $i$ and $j$ such that for all $X\in\mathcal{X^{\prime}}_{L}$ and $Y\in\mathcal{Y^{\prime}}_{L}$:

X_{i,j}\neq Y_{i,j}\quad\text{and}\quad f(X_{i})\neq f(Y_{i}).

Let $k$ be such that $\mathcal{B}_{k}\neq\varnothing$ (if all $\mathcal{B}_{t}$ are empty, then $\xi=0$). Then $\mathcal{B}_{t}=\varnothing$ for all $t\neq k$. Indeed, $u_{k}$ differs from $v$ only in position $l_{k}$, $u_{t}$ differs from $v$ only in position $l_{t}$, and $l_{k}\neq l_{t}$; so if both $\mathcal{B}_{k}$ and $\mathcal{B}_{t}$ were nonempty, there would be no $i$ such that $f(X_{i})\neq f(Y_{i})$ for all $(X,Y)\in\mathcal{X^{\prime}}_{L}\times\mathcal{Y^{\prime}}_{L}$. Thus, if $\xi(\mathcal{X^{\prime}}_{L}\times\mathcal{Y^{\prime}}_{L})>1$, then ${\mathsf{L}}(\operatorname{proj}_{l_{k}}\mathcal{B}\times\operatorname{proj}_{l_{k}}\mathcal{B}_{k})>1$, which contradicts the existence of the pair $(i,j)$: necessarily $i=l_{k}$, and the $j$-th bit of row $l_{k}$ separates the two projections.

Thus, $\xi$ is normal (its value is at most $1$ on every leaf of any protocol that strongly separates $\mathcal{X}_{S}\times\mathcal{Y}_{S}$) and subadditive. Hence, its value on the whole rectangle $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ is a lower bound on the size of $P^{\prime}$, and it remains to estimate $\xi(\mathcal{X}_{S}\times\mathcal{Y}_{S})$.

Since all $d$ edges are heavy, we have:

|\operatorname{proj}_{l_{i}}\mathcal{B}_{S}(v)|+|\operatorname{proj}_{l_{i}}\mathcal{B}_{S}(u_{i})|\geq\frac{3}{4}m,\quad\forall i\in[d].

Hence,

{\mathsf{L}}\left(\operatorname{proj}_{l_{i}}\mathcal{B}_{S}(v)\times\operatorname{proj}_{l_{i}}\mathcal{B}_{S}(u_{i})\right)=\Omega\left({\mathsf{L}}_{\frac{3}{4}}(f)\right),

for all $i\in[d]$. Summing over all $i\in[d]$ gives the desired lower bound. $\hfill\blacktriangleleft$

Lemma 16.

For a special node $S$ of the tree $D$ , the number of leaves in any protocol strongly separating $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ is

\Omega\left(\frac{\psi(G_{S})\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\right).

Proof.

Assume, without loss of generality, that

d_{A}(G_{S})\geq d_{B}(G_{S})\quad\text{and}\quad d_{B}(G_{S})\leq 12\alpha\log^{2}m.

Applying Lemma 15 to a node of $A_{S}$ of degree at least $d_{A}(G_{S})$ (such a node exists since $d_{A}(G_{S})$ is the average degree in $A_{S}$), we get that the number of leaves is at least

\Omega\left(d_{A}(G_{S})\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)=\Omega\left(\frac{\psi(G_{S})}{d_{B}(G_{S})}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)=\Omega\left(\frac{\psi(G_{S})\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\right).

$\hfill\blacktriangleleft$

We are now ready to lower bound the size of any protocol of logarithmic depth.

Theorem 17.

The size of the protocol $P$ (strongly separating $\mathcal{X}_{T}\times\mathcal{Y}_{T}$ ) is

\Omega\left(\frac{m^{2}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\left(1-\frac{1}{\log m}\right)^{\alpha\log m}\right).

Proof.

By Lemma 16, for every leaf $S$ of the tree $D$, any protocol strongly separating $\mathcal{X}_{S}\times\mathcal{Y}_{S}$ has $\Omega\left(\psi(G_{S})\cdot{\mathsf{L}}_{\frac{3}{4}}(f)/\log^{2}m\right)$ leaves. Let $\mathcal{S}$ be the set of all leaves of the tree $D$. Applying estimate (11) repeatedly from the root and using that the depth of $D$ is at most $\alpha\log m$, we have:

\psi(G_{T})\cdot\left(1-\frac{1}{\log m}\right)^{\alpha\log m}\leq\sum_{S\in\mathcal{S}}\psi(G_{S}).

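Since $\psi(G_{T})=m^{2}$ (by Lemma 12) and the subtrees of $P$ rooted at distinct leaves $S$ of $D$ are disjoint (each of them strongly separates the corresponding rectangle $\mathcal{X}_{S}\times\mathcal{Y}_{S}$), the number of leaves in $P$ is

\Omega\left(\sum_{S\in\mathcal{S}}\frac{\psi(G_{S})\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\right)\geq\Omega\left(\frac{m^{2}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\left(1-\frac{1}{\log m}\right)^{\alpha\log m}\right).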

$\hfill\blacktriangleleft$
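The estimate obtained from (11) follows by a short induction over $D$, spelled out here for completeness (our own write-up; $h(S)$ is our notation for the height of the subtree of $D$ rooted at $S$, and $m\geq 4$ so that $0<1-\frac{1}{\log m}<1$). For a leaf $S$ the claim $\sum\psi(G_{S^{\prime}})\geq(1-\frac{1}{\log m})^{h(S)}\psi(G_{S})$ is trivial since $h(S)=0$; for an internal node $S$ with children $S_{L}$ and $S_{R}$:

```latex
\begin{align*}
\sum_{\substack{S'\in\mathcal{S}\\ S'\text{ below }S}}\psi(G_{S'})
  &\geq \left(1-\tfrac{1}{\log m}\right)^{h(S_L)}\psi(G_{S_L})
       +\left(1-\tfrac{1}{\log m}\right)^{h(S_R)}\psi(G_{S_R})
  && \text{(induction hypothesis)}\\
  &\geq \left(1-\tfrac{1}{\log m}\right)^{h(S)-1}\bigl(\psi(G_{S_L})+\psi(G_{S_R})\bigr)
  && \text{(}h(S_L),h(S_R)\leq h(S)-1\text{)}\\
  &\geq \left(1-\tfrac{1}{\log m}\right)^{h(S)}\psi(G_S)
  && \text{(by (11))}.
\end{align*}
```

At the root, $h(T)\leq\alpha\log m$ because $P$ has depth at most $\alpha\log m$, which gives the inequality used in the proof.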

Recall that $\alpha$ is a constant. Assuming $m\geq 4$, we have $\log m\geq 2$, and thus $1-\frac{1}{\log m}\geq e^{-\frac{2}{\log m}}$ (as $1-x\geq e^{-2x}$ for $0\leq x\leq\frac{1}{2}$). Then,

\frac{m^{2}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\left(1-\frac{1}{\log m}\right)^{\alpha\log m}\geq\frac{m^{2}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)}{\log^{2}m}\,e^{-\frac{2}{\log m}\cdot\alpha\log m}\geq m^{2-\varepsilon}\cdot{\mathsf{L}}_{\frac{3}{4}}(f),

for any constant $\varepsilon>0$ when $m$ is sufficiently large. Hence, any protocol $P$ of depth at most $\alpha\log m$ strongly separating $\mathcal{X}_{T}\times\mathcal{Y}_{T}$ has $m^{2-o(1)}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves.
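The depth bound enters only through the factor $(1-\frac{1}{\log m})^{\alpha\log m}$, which the argument above bounds below by the constant $e^{-2\alpha}$. A short Python sketch (helper names are ours; logarithms base 2) evaluates both sides numerically for a few values of $m$ and $\alpha$, together with the pointwise inequality $1-x\geq e^{-2x}$ on $[0,\frac{1}{2}]$.

```python
import math


def loss_factor(m, alpha):
    """(1 - 1/log m)^(alpha * log m): the total loss from applying (11) along depth alpha*log m."""
    L = math.log2(m)
    return (1 - 1 / L) ** (alpha * L)


def loss_lower_bound(alpha):
    """The m-independent lower bound e^(-2*alpha) used in the text (valid for m >= 4)."""
    return math.exp(-2 * alpha)


def pointwise_bound_holds(x):
    """1 - x >= e^(-2x); this holds for 0 <= x <= 1/2, i.e. for x = 1/log m with m >= 4."""
    return 1 - x >= math.exp(-2 * x)


for alpha in (0.5, 1, 5):
    exact = [round(loss_factor(2 ** k, alpha), 4) for k in (2, 10, 40)]
    print(alpha, exact, round(loss_lower_bound(alpha), 4))
```

Since the factor is bounded below by a constant depending only on $\alpha$, it is absorbed by the $m^{-\varepsilon}$ slack together with the $\log^{2}m$ denominator.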

Finally, we get rid of the assumption that the depth of $P$ is logarithmic and prove the main result.

Proof of Theorem 3.

Suppose, for contradiction, that $P$ is a protocol solving ${\mathsf{KW}}_{{\mathsf{XOR}}_{m}}\circledast{\mathsf{KW}}_{f}$ with at most $m^{2-\varepsilon}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves, for some constant $\varepsilon>0$. Applying Theorem 7, we transform it into a protocol $P^{\prime}$ with at most $\left(m^{2-\varepsilon}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)^{\gamma}$ leaves and depth at most $3(3-\varepsilon)k\ln 2\cdot\log m$, where $\gamma=1+\frac{1}{1+\log(k-1)}$. (Theorem 7 is stated in terms of formulas, but it is not difficult to see that it also works for protocols for strong composition.)

Since $\varepsilon>0$ and $\lim_{k\to\infty}\gamma=1$, there exist $k$ and $\varepsilon^{\prime}>0$ such that

\left(m^{2-\varepsilon}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)^{\gamma}\leq m^{2-\varepsilon^{\prime}}\cdot{\mathsf{L}}_{\frac{3}{4}}(f),

since ${\mathsf{L}}_{\frac{3}{4}}(f)\leq m$. Hence, the protocol $P^{\prime}$ has logarithmic depth and at most $m^{2-\varepsilon^{\prime}}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)$ leaves, which contradicts Theorem 17. Therefore, $P$ has $\Omega\left(m^{2-o(1)}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)=\Omega\left(n^{2-o(1)}\cdot{\mathsf{L}}_{\frac{3}{4}}(f)\right)$ leaves. $\hfill\blacktriangleleft$
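The existence of $k$ and $\varepsilon^{\prime}$ in the last proof can be made explicit. Using only ${\mathsf{L}}_{\frac{3}{4}}(f)\leq m$ and $\gamma\geq 1$ (the choice $\varepsilon^{\prime}:=3-(3-\varepsilon)\gamma$ is ours):

```latex
\begin{align*}
\left(m^{2-\varepsilon}\cdot\mathsf{L}_{\frac{3}{4}}(f)\right)^{\gamma}
  &= m^{(2-\varepsilon)\gamma}\cdot\mathsf{L}_{\frac{3}{4}}(f)\cdot\mathsf{L}_{\frac{3}{4}}(f)^{\gamma-1}\\
  &\leq m^{(2-\varepsilon)\gamma+(\gamma-1)}\cdot\mathsf{L}_{\frac{3}{4}}(f)
  && \text{since } \mathsf{L}_{\frac{3}{4}}(f)\leq m \text{ and } \gamma\geq 1\\
  &= m^{2-\varepsilon'}\cdot\mathsf{L}_{\frac{3}{4}}(f)
  && \text{with } \varepsilon' := 3-(3-\varepsilon)\gamma.
\end{align*}
```

Thus $\varepsilon^{\prime}>0$ as soon as $\gamma<\frac{3}{3-\varepsilon}$, which holds for all sufficiently large $k$ because $\gamma\to 1$.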

References

[1] Maria Luisa Bonet and Samuel R. Buss. Size-depth tradeoffs for boolean fomulae. Inf. Process. Lett., 49(3):151–155, 1994. doi:10.1016/0020-0190(94)90093-0.
[2] Susanna F. de Rezende, Or Meir, Jakob Nordström, Toniann Pitassi, and Robert Robere. KRW composition theorems via lifting. Comput. Complex., 33(1):4, 2024. doi:10.1007/s00037-024-00250-7.
[3] Irit Dinur and Or Meir. Toward the KRW composition conjecture: Cubic formula lower bounds via communication complexity. Comput. Complex., 27(3):375–462, 2018. doi:10.1007/s00037-017-0159-x.
[4] Jeff Edmonds, Russell Impagliazzo, Steven Rudich, and Jirí Sgall. Communication complexity towards lower bounds on circuit depth. Comput. Complex., 10(3):210–246, 2001. doi:10.1007/s00037-001-8195-x.
[5] Dmitry Gavinsky, Or Meir, Omri Weinstein, and Avi Wigderson. Toward better formula lower bounds: The composition of a function and a universal relation. SIAM J. Comput., 46(1):114–131, 2017. doi:10.1137/15M1018319.
[6] Johan Håstad. The shrinkage exponent of de morgan formulas is 2. SIAM J. Comput., 27(1):48–64, 1998. doi:10.1137/S0097539794261556.
[7] Johan Håstad and Avi Wigderson. Composition of the universal relation. In Jin-Yi Cai, editor, Advances In Computational Complexity Theory, Proceedings of a DIMACS Workshop, New Jersey, USA, December 3-7, 1990, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 119–134. DIMACS/AMS, 1990. doi:10.1090/dimacs/013/07.
[8] Pavel Hrubes, Stasys Jukna, Alexander S. Kulikov, and Pavel Pudlák. On convex complexity measures. Theor. Comput. Sci., 411(16-18):1842–1854, 2010. doi:10.1016/j.tcs.2010.02.004.
[9] Russell Impagliazzo and Noam Nisan. The effect of random restrictions on formula size. Random Struct. Algorithms, 4(2):121–134, 1993. doi:10.1002/rsa.3240040202.
[10] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012. doi:10.1007/978-3-642-24508-4.
[11] Mauricio Karchmer, Ran Raz, and Avi Wigderson. Super-logarithmic depth lower bounds via the direct sum in communication complexity. Comput. Complex., 5(3/4):191–204, 1995. doi:10.1007/BF01206317.
[12] Mauricio Karchmer and Avi Wigderson. Monotone circuits for connectivity require super-logarithmic depth. SIAM J. Discret. Math., 3(2):255–265, 1990. doi:10.1137/0403021.
[13] V. M. Khrapchenko. Method of determining lower bounds for the complexity of p-schemes. Mathematical notes of the Academy of Sciences of the USSR, 10(1):474–479, 1971. doi:10.1007/BF01747074.
[14] Or Meir. Toward better depth lower bounds: A krw-like theorem for strong composition. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 1056–1081. IEEE, 2023. doi:10.1109/FOCS57990.2023.00064.
[15] Ivan Mihajlin and Alexander Smal. Toward better depth lower bounds: The XOR-KRW conjecture. In Valentine Kabanets, editor, 36th Computational Complexity Conference, CCC 2021, July 20-23, 2021, Toronto, Ontario, Canada (Virtual Conference), volume 200 of LIPIcs, pages 38:1–38:24. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.CCC.2021.38.
[16] Mike Paterson and Uri Zwick. Shrinkage of de morgan formulae under restriction. Random Struct. Algorithms, 4(2):135–150, 1993. doi:10.1002/rsa.3240040203.
[17] B. A. Subbotovskaya. Realization of linear functions by formulas using $\vee$ , $\&$ , ^-. Dokl. Akad. Nauk SSSR, 136(3):553–555, 1961. URL: http://mi.mathnet.ru/dan24539.
[18] Avishay Tal. Shrinkage of de morgan formulae by spectral techniques. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 551–560. IEEE Computer Society, 2014. doi:10.1109/FOCS.2014.65.
[19] Ingo Wegener. The complexity of Boolean functions. Wiley-Teubner, 1987. URL: http://ls2-www.cs.uni-dortmund.de/monographs/bluebook/.
[20] Hao Wu. An improved composition theorem of a universal relation and most functions via effective restriction. CoRR, abs/2310.07422, 2023. doi:10.48550/arXiv.2310.07422.


Derivation for Lemma 4 ($A=A_{L}\sqcup A_{R}$, $E=E_{L}\sqcup E_{R}$, and $B$ shared):

\begin{aligned}
\psi(G)\leq\psi(G_{L})+\psi(G_{R})&\iff\frac{|E|^{2}}{(|A_{L}|+|A_{R}|)|B|}\leq\frac{|E_{L}|^{2}}{|A_{L}||B|}+\frac{|E_{R}|^{2}}{|A_{R}||B|}\\
&\iff\frac{|E_{L}|^{2}+|E_{R}|^{2}+2|E_{L}||E_{R}|}{|A_{L}|+|A_{R}|}\leq\frac{|E_{L}|^{2}}{|A_{L}|}+\frac{|E_{R}|^{2}}{|A_{R}|}\\
&\iff 2|E_{L}||E_{R}||A_{L}||A_{R}|\leq|E_{R}|^{2}|A_{L}|^{2}+|E_{L}|^{2}|A_{R}|^{2}\\
&\iff 0\leq\left(|E_{R}||A_{L}|-|E_{L}||A_{R}|\right)^{2}.
\end{aligned}

Derivation for Lemma 5 ($G^{\prime}$ is obtained from $G$ by removing a node of degree $d$ from $A$):

\begin{aligned}
\psi(G^{\prime})\geq\psi(G)&\iff\frac{(|E|-d)^{2}}{(|A|-1)|B|}\geq\frac{|E|^{2}}{|A||B|}\\
&\iff\frac{|E|^{2}-2|E|d+d^{2}}{(|A|-1)|B|}\geq\frac{|E|^{2}}{|A||B|}\\
&\Longleftarrow\frac{|E|-2d}{|A|-1}\geq\frac{|E|}{|A|}\\
&\iff|E||A|-2d|A|\geq|E|(|A|-1)\\
&\iff d\leq\frac{|E|}{2|A|}=\frac{\operatorname{avgdeg}(G,A)}{2}.
\end{aligned}