
Catalytic Computing and Register Programs Beyond Log-Depth

Yaroslav Alekseev (Technion - Israel Institute of Technology, Haifa, Israel), Yuval Filmus (Technion - Israel Institute of Technology, Haifa, Israel), Ian Mertz (Charles University, Prague, Czech Republic), Alexander Smal (JetBrains Research, Paphos, Cyprus), Antoine Vinciguerra (Technion - Israel Institute of Technology, Haifa, Israel)
Abstract

In a seminal work, Buhrman et al. (STOC 2014) defined the class CSPACE(s,c) of problems solvable in space s with an additional catalytic tape of size c, which is a tape whose initial content must be restored at the end of the computation. They showed that uniform TC^1 circuits are computable in catalytic logspace, i.e., TC^1 ⊆ CL = CSPACE(O(log n), 2^{O(log n)}), thus giving strong evidence that catalytic space gives L strict additional power. Their study focuses on an arithmetic model called register programs, which has been a focal point of development ever since.

Understanding CL remains a major open problem, as TC^1 remains the most powerful containment to date. In this work, we study the power of catalytic space and register programs to compute circuits of larger depth. Using register programs, we show that for every ϵ > 0,

SAC^2 ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

On the other hand, we know that SAC^2 ⊆ TC^2 ⊆ CSPACE(O(log² n), 2^{O(log n)}). Our result thus shows an O(log log n) factor improvement in the free space needed to compute SAC^2, at the expense of a nearly-polynomial-sized catalytic tape.

We also exhibit non-trivial register programs for matrix powering, which is a further step towards showing NC^2 ⊆ CL.

Keywords and phrases:
catalytic computing, circuit classes, polynomial method
Funding:
Yaroslav Alekseev: Supported by ISF grant 507/24.
Yuval Filmus: Supported by ISF grant 507/24.
Ian Mertz: Supported by the Grant Agency of the Czech Republic under the grant agreement no. 24-10306S and by the Center for Foundations of Contemporary Computer Science (Charles Univ. project UNCE 24/SCI/008).
Antoine Vinciguerra: Supported by ISF grant 507/24.
Copyright and License:
© Yaroslav Alekseev, Yuval Filmus, Ian Mertz, Alexander Smal, and Antoine Vinciguerra; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Circuit complexity; Theory of computation → Complexity classes
Related Version:
Previous Version: https://arxiv.org/pdf/2504.17412
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

1.1 Catalytic Computation

In the realm of space-bounded computation, the catalytic computing framework, first introduced by Buhrman et al. [10], investigates the power of having additional storage which has to be restored to its original content at the end of the computation. A catalytic Turing machine is a space-bounded Turing machine with two read-write tapes: a standard work tape of size s and an additional, dedicated catalytic tape of size c. However, the catalytic tape begins completely filled with some content τ, and despite being available for use as work space, it must be restored to its initial state τ at the end of the computation.

While CSPACE(s,c), the class of functions solvable with work space s and catalytic space c, clearly sits between SPACE(s) and SPACE(s+c), naïvely it might seem that catalytic space should not improve the power of machines, as was conjectured in many previous works [19, 33, 24, 29]. Unexpectedly, [10] showed that catalytic logspace (CL = CSPACE(O(log n), 2^{O(log n)})), the catalytic analogue of the complexity class L, contains the circuit class TC^1, which contains non-deterministic and randomized logspace (NL and BPL, respectively) as well as functions, such as the determinant, widely believed to be outside both. Thus, they gave a strong case for the power of catalytic memory as a resource.

Building on this result, the catalytic approach and its methods have seen growing interest. Many follow-up works have studied variants and properties of catalytic space, such as non-deterministic and randomized analogues [11, 22, 14, 31], non-uniform catalytic models [37, 40, 17, 18], robustness to errors in catalytic resetting [28, 25], catalytic communication protocols [39], and other variants [27, 6, 5] (see the surveys by Koucký [30] and Mertz [35] for more discussion of these and other works).

Furthermore, catalytic tools have been successfully applied to other complexity-theoretic questions in recent years. Two of the most successful directions have been 1) the derandomization of non-catalytic space-bounded classes [38, 23, 32]; and 2) space-bounded algorithms for function composition [15, 16, 18], which recently culminated in a breakthrough simulation by Williams [46] of time with quadratically less space.

Nevertheless, the exact strength of catalytic space remains open. In their first work, [10] showed TC^1 ⊆ CL ⊆ ZPP, with the key open problem being the relationship of CL to P. Further work has shown slight progress on both ends: Cook et al. [14] showed that CL reduces to the lossy coding problem, which itself is in ZPP, while Agarwala and Mertz [1] showed that bipartite matching, a function not known to be anywhere in the NC hierarchy, can be solved in CL.

The key open question in this work is whether or not CL contains higher levels of the NC hierarchy. Mertz [35] posed a concrete approach to showing NC^2 ⊆ CL via register programs, which we turn to now.

1.2 Register Programs

Register programs were first introduced by Coppersmith and Grossman [21], and were later revived by Ben-Or and Cleve to generalize Barrington's theorem to arithmetic circuits [12, 13]. A register program over a ring 𝐑 and a set of inputs x_1, …, x_n ∈ 𝐑 is defined as a sequence of instructions I_1, …, I_t : 𝐑^s → 𝐑^s applied to a set of registers {R_1, …, R_s}.

The key technique of Buhrman et al. [10] was based on register programs, which they showed can be directly simulated on a catalytic machine. They constructed a register program computing the k-th power of some value x with k registers and O(1) accesses to the value x; using some extensions and other works, they showed that CL contains the circuit class TC^1. This was also the driving force behind the result of Cook and Mertz [18], and by extension the work of Williams [46], who showed that the Tree Evaluation problem, which was suggested as a candidate for separating L from P [19], can be computed in space O(log n · log log n).

As with catalytic space more generally, the power of register programs is still a mystery and deserves to be investigated thoroughly. The approach suggested in [35] for CL versus NC^2 is to find a register program computing the k-th power of a matrix, matching the initial result of [10] in the non-commutative setting. More generally, it was proposed that further algebraic extensions of these and other works could pave the way to improving the strength of CL.

1.3 Our Results

In this work, we investigate the complexity of computing polynomials in the register program framework, and make the first progress towards catalytic algorithms for circuit classes beyond TC^1. We present register programs for different kinds of polynomials (such as symmetric polynomials and polynomials representing Boolean functions), as well as more efficient programs for evaluating non-constant depth-d Boolean circuits with a constant number of recursive calls.

From this, we deduce that the class SAC^2 of Boolean circuits of polynomial size and depth O(log² n), with bounded fan-in AND gates and unbounded fan-in OR gates, can be computed with o(log² n) work space when given access to nearly-polynomial catalytic space.

Theorem 1.

For all ϵ>0,

SAC^2 ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

Our technique also gives such an improvement for NC^2 with only polynomial catalytic space, although this can be derived more directly from the results of [10]. It also extends to #SAC^2, the arithmetic variant of SAC^2.

Second, we show register programs for matrix powering with a sublinear number of recursive calls, thus making initial progress towards the program of [34]:

Theorem 2.

Let d, n, p ∈ ℕ, where p is prime, and let M ∈ M_n(𝔽_p). For all ϵ > 0 there is a register program that computes M^d with

  • O_ϵ(d^ϵ·log d) recursive calls to M,

  • O_ϵ(n^{exp(1/ϵ)}) basic instructions, and

  • O_ϵ(n^{exp(1/ϵ)}) registers over 𝔽_q, where q = O((np)^{exp(1/ϵ)}).

1.4 Main Contribution

We introduce a novel clean register program for computing multivariate polynomials. The current state-of-the-art register programs for degree-d multivariate polynomials P(x_1, …, x_n) require d+1 recursive calls and n+1 registers [18]. Our approach dramatically reduces the number of recursive calls to a constant, albeit at the cost of using more registers. This efficiency gain stems from a fundamental observation: the number of recursive calls depends solely on the number of multiplications.

Therefore, unlike traditional techniques that compute the polynomial directly or rely on interpolation, we propose a new strategy: constructing weaker polynomial representations, composed exclusively of additions and powers, over larger fields. This generalizes the technique employed in [10] for computing unbounded Majority gates. As established in [10] and [12], and as we will further elaborate, if a class of functions admits register programs with at most t recursive calls, then depth-d circuits comprising functions from this class can be cleanly computed by register programs with at most t^d recursive calls. Such a register program can be described in O(d·log t) space. Our register program for polynomials, by achieving a constant number of recursive calls, allows us to eliminate this O(log t) term, thereby reducing the descriptive complexity for those circuits.
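For a concrete sense of the savings: with t = O(1) recursive calls per gate, a depth-d circuit costs t^d = 2^{O(d)} recursive calls and the resulting program has description size O(d), whereas t = d+1 calls per gate, as in the interpolation-based program of [18], would give description size O(d·log d).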

2 Preliminaries

2.1 Circuits

A Boolean circuit is a directed acyclic graph C with 0,1-valued inputs and outputs. Internal nodes (gates) are labeled with Boolean operations from a given set B. The circuit size s(C) is the number of nodes, and its depth d(C) is the longest input-output path. While Boolean circuits had been studied earlier [43], their importance for parallel computing gained significant attention in the late 1970s [7, 36, 41, 8].

Definition 3.

We define the following circuit families over input literals x_1, …, x_n, ¬x_1, …, ¬x_n:

  • NC circuits: fan-in 2 AND and OR gates

  • SAC circuits: fan-in 2 AND gates and unbounded fan-in OR gates (since SAC^i is closed under complement for all i ≥ 1 [9], these fan-ins can also be reversed, but for our argument we specifically use this version)

  • AC circuits: unbounded fan-in AND and OR gates

  • TC circuits: unbounded fan-in threshold gates

We denote by NC^i (SAC^i, AC^i, TC^i) the family of functions computable by NC (SAC, AC, TC) circuits of polynomial size and O(log^i n) depth. By NC we denote ∪_i NC^i, and similarly for SAC, AC, and TC.

The relations between these different classes have been extensively studied [41, 45]. For all i, we have the following relations:

NC^i ⊆ SAC^i ⊆ AC^i ⊆ TC^i ⊆ NC^{i+1}.

As a consequence, we have that

NC=SAC=AC=TC.

Furthermore, other lines of research [7, 20, 44] have established other relationships for (logspace) uniform circuit classes:

NC^1 ⊆ L ⊆ NL ⊆ SAC^1 ⊆ AC^1 ⊆ TC^1 ⊆ NC^2 ⊆ SAC^2 ⊆ ⋯ ⊆ NC ⊆ P.

While all containments are widely conjectured to be strict, no separations are known.

2.2 Catalytic Computation

Central to this work is the notion, introduced by Buhrman et al. [10], of catalytic space.

Definition 4.

A catalytic Turing machine with work space s := s(n) and catalytic space c := c(n) is a space-bounded Turing machine M with access to two read-write tapes: a standard work tape of size s, and a catalytic tape of size c. The catalytic tape has the additional restriction that for every τ ∈ {0,1}^c, if we run M on any input x with the catalytic tape in initial configuration τ, the catalytic tape has final configuration τ as well.

This definition gives rise to natural complexity classes:

Definition 5.

We define CSPACE(s,c) as the class of problems solvable by a catalytic Turing machine with a work tape of size s and a catalytic tape of size c. Furthermore, we define catalytic logspace as CL = CSPACE(O(log n), 2^{O(log n)}).

We can view CL as a logspace machine equipped with the maximal effective amount of catalytic tape. Indeed, if a CSPACE(s,c) machine has a catalytic tape of size c = 2^{ω(s)}, then merely storing the position of the catalytic tape's head would already exceed its free space s. This suggests that, beyond c = 2^{Θ(s)}, a catalytic tape offers no additional computational power.

2.3 Register Programs

Inspired by Ben-Or and Cleve's work on straight-line programs and their simulation of NC^1 circuits in this model [3, 12], Buhrman et al. [10] investigated the potential for optimization on a wider class of circuits, examining a more generic type of computational model due to Coppersmith and Grossman [21].

Definition 6.

Let 𝐑 be a ring. An 𝐑-register program over inputs x_1, …, x_n with space s is defined as a sequence of instructions I_1, …, I_t applied to a set of registers R_1, …, R_s, where each instruction I_k has one of the following two forms:

  • Basic instruction: update a register R_i with a polynomial p_k over the other registers:

    I_k : R_i ← R_i + p_k(R_1, …, R_{i−1}, R_{i+1}, …, R_s).

  • Input access/recursive call: add an input x_j (scaled by some λ ∈ 𝐑) to one register R_i:

    I_k : R_i ← R_i + λ·x_j.

The time t of the register program is its total number of instructions.

Furthermore, a register program computes a function f : 𝐑^n → 𝐑^m if there is a set of m registers {R_{i_1}, …, R_{i_m}}, with R_{i_k} initially holding some value τ_k ∈ 𝐑, such that once all the instructions have been applied, we have:

R_{i_k} = τ_k + (f(x_1, …, x_n))_k.

With regard to such subroutines, it will be important to restrict our register programs to a form amenable to composition:

Definition 7.

An 𝐑-register program cleanly computes a function f : 𝐑^n → 𝐑^m if, assuming each register R_i is initialized to some value τ_i ∈ 𝐑:

  • the register program computes f on a subset S of the registers, and

  • for every other register R_j, we have R_j = τ_j at the end of the program.

We note that in Definition 6, we use the term input access when we are given direct access to the input x, while we use the term recursive call when we are designing a subroutine whose input is being received not from the global function but rather from some preceding intermediate computation. We will make this distinction clear where necessary.

We present the following composition lemma to highlight the distinction between these two terms, and also to illustrate the use of clean computation.

Lemma 8 (Composition Lemma).

Let f : 𝐑^n → 𝐑^m and g : 𝐑^o → 𝐑^n be functions. Let Π_f and Π_g be clean register programs computing f and g with t_f and t_g input accesses, s_f and s_g basic instructions, and r_f and r_g registers, respectively. Then there exists a clean register program Π_{f∘g} that computes f∘g with

  • t_f·t_g input accesses,

  • s_f + t_f·s_g basic instructions, and

  • max{r_f, r_g} registers.

Proof.

Let x ∈ 𝐑^o and let y = g(x), so f(g(x)) = f(y). We can cleanly compute f(y) using Π_f, with t_f recursive calls to y, s_f basic instructions, and r_f registers.

On the other hand, we can cleanly compute y = g(x) into the registers of Π_f, using t_g input accesses and s_g basic instructions for each of the t_f recursive calls to y, each implemented by running Π_g.

There are now two cases:

  • If r_g < r_f, since Π_g cleanly computes g, we can use the additional registers of Π_f to compute y.

  • If r_g > r_f, we add r_g − r_f registers and compute y using these new registers together with the additional registers of Π_f.

Thus we can compute f(g(x)) with t_f·t_g input accesses to x, s_f + t_f·s_g basic instructions, and max{r_f, r_g} registers.
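As a concrete instance of these counts: composing two programs that each make 4 input accesses yields 4·4 = 16 input accesses, and composing once more with another 4-access program yields 64; these are exactly the counts that appear in Lemma 20 below.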

Lastly, we connect clean register programs to catalytic computation:

Lemma 9 (Lemma 15 in [10]).

Any clean register program of time t, space s, and with n inputs over a finite ring 𝐑 can be simulated by a catalytic Turing machine in pure space O(log t + log n + log |𝐑|) and catalytic space O(s·log |𝐑|).

2.4 Polynomial Representation

The question of representing Boolean functions as multivariate polynomials over a ring has been intensively studied and has proven to be a useful tool in circuit complexity (see [2] for a survey). We will consider two kinds of representation: one that we will call the representation of f, and the other the weak representation of f.

Definition 10.

Let P be a polynomial on n variables over a ring 𝐑, and let f be a Boolean function with n inputs. We say P represents f if for all inputs x ∈ {0,1}^n, we have P(x) = 0 if and only if f(x) = 0. We say P weakly represents f if there are fixed disjoint sets S_0, S_1 ⊆ 𝐑 such that for all inputs x ∈ {0,1}^n, we have f(x) = b if and only if P(x) ∈ S_b.

Let us underline the difference between these two definitions of representation with an example. Consider the n-ary AND function. Observe that the polynomial P(X_1, …, X_n) = Σ_{i=1}^{n} X_i over ℤ_m, for any m > n, weakly represents AND: indeed, we can take S_0 = {0, …, n−1} and S_1 = {n}. On the other hand, P does not represent AND, and in general, it is known that AND cannot be represented by a degree-1 polynomial.
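As a quick sanity check of this example (ours, not from the paper), the following Python snippet verifies the weak representation of AND on all inputs:

from itertools import product

n, m = 4, 7                          # any m > n works
S0, S1 = set(range(n)), {n}          # S_0 = {0, ..., n-1}, S_1 = {n}
for x in product([0, 1], repeat=n):
    value = sum(x) % m               # P(x) = x_1 + ... + x_n; no wrap-around since m > n
    assert value in (S1 if all(x) else S0)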

Definition 11.

We will say that a Boolean function f has an (𝐑,d)-representation if there is a degree-d polynomial which represents it. We also define the (𝐑,d,s)-representation, where additionally the polynomial is required to have at most s monomials.

3 Register Programs for Polynomials

To prove our main theorem, our main task will be to find an efficient register program to cleanly compute multivariate polynomials. In this section, we present such a register program for multivariate polynomials over ℤ_p, for any prime p, as well as a register program specifically for the Boolean case.

3.1 Computing Univariate Polynomials

In order to prove TC^1 ⊆ CL, [10] designs a register program to compute x^n for any element x of a commutative field. We state a straightforward generalization of their lemma and corresponding program to compute arbitrary univariate polynomials:

Lemma 12.

Let p be a prime number, and let P ∈ 𝔽_p[X] be a univariate polynomial of degree at most n. For all x ∈ 𝔽_p, there is a clean register program that computes P(x) with

  • 4 input accesses to x,

  • 2n+2 basic instructions, and

  • n+2 registers.

Proof.

Let P = Σ_{i=0}^{n} a_i·X^i for some coefficients a_i. Let R_in be a register initially equal to τ_in, on which we will apply our input accesses to x.

It is straightforward to show, by writing x as (τ_in + x) − τ_in, that

x^i = Σ_{j=0}^{i} \binom{i}{j}·(τ_in + x)^j·(−τ_in)^{i−j}.

Based on this observation, [10] presents in Lemma 5 a register program that cleanly computes x^i with i+2 registers and 4 input accesses.

We adapt the latter register program for computing x^n to construct a register program computing P(x), given in Figure 1.

Registers:
  R_in = τ_in
  R_1 = τ_1, …, R_n = τ_n
  R_out = τ_out

R_in ← R_in + x                      // R_in = τ_in + x

For 1 ≤ i ≤ n:
  R_i ← R_i + R_in^i                 // R_i = τ_i + (τ_in + x)^i

R_in ← R_in − x                      // R_in = τ_in

For 1 ≤ i ≤ n:
  R_out ← R_out + a_i·((−1)^i·R_in^i + Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·R_j·R_in^{i−j})
// R_out = τ_out + P(x) − a_0 + Σ_{i=1}^{n} a_i·Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·τ_j·τ_in^{i−j}

R_in ← R_in + x                      // R_in = τ_in + x
For 1 ≤ i ≤ n:
  R_i ← R_i − R_in^i                 // R_i = τ_i
R_in ← R_in − x                      // R_in = τ_in

For 1 ≤ i ≤ n:
  R_out ← R_out − a_i·Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·R_j·R_in^{i−j}
// R_out = τ_out + P(x) − a_0

R_out ← R_out + a_0                  // R_out = τ_out + P(x)

Figure 1: Program for computing a polynomial P(x) of degree n using 4 recursive calls to x, 2n+2 basic instructions, and n+2 registers.
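To make the bookkeeping in Figure 1 concrete, here is a minimal Python simulation (ours, not part of the paper) of the program over 𝔽_p. It checks the clean-computation guarantee: R_out gains P(x), while R_in and all R_i return to their initial values.

from math import comb

def figure1(a, x, p, tau_in, tau_R, tau_out):
    """Run the program of Figure 1 for P(X) = sum_i a[i] X^i over F_p."""
    n = len(a) - 1
    R_in, R, R_out = tau_in, list(tau_R), tau_out

    R_in = (R_in + x) % p                        # input access: R_in = tau_in + x
    for i in range(1, n + 1):                    # R_i = tau_i + (tau_in + x)^i
        R[i - 1] = (R[i - 1] + pow(R_in, i, p)) % p
    R_in = (R_in - x) % p                        # input access: R_in = tau_in

    for i in range(1, n + 1):                    # R_out gains a_i x^i plus cross terms
        acc = (-1) ** i * pow(R_in, i, p)
        for j in range(1, i + 1):
            acc += comb(i, j) * (-1) ** (i - j) * R[j - 1] * pow(R_in, i - j, p)
        R_out = (R_out + a[i] * acc) % p

    R_in = (R_in + x) % p                        # input access
    for i in range(1, n + 1):                    # restore R_i = tau_i
        R[i - 1] = (R[i - 1] - pow(R_in, i, p)) % p
    R_in = (R_in - x) % p                        # input access: R_in = tau_in

    for i in range(1, n + 1):                    # remove the cross terms from R_out
        acc = sum(comb(i, j) * (-1) ** (i - j) * R[j - 1] * pow(R_in, i - j, p)
                  for j in range(1, i + 1))
        R_out = (R_out - a[i] * acc) % p

    R_out = (R_out + a[0]) % p                   # R_out = tau_out + P(x)
    return R_in, R, R_out

p, a, x = 101, [3, 1, 4, 1, 5], 17               # P(X) = 3 + X + 4X^2 + X^3 + 5X^4
tau_in, tau_R, tau_out = 42, [7, 8, 9, 10], 5    # arbitrary initial register contents
R_in, R, R_out = figure1(a, x, p, tau_in, tau_R, tau_out)
P_x = sum(c * pow(x, i, p) for i, c in enumerate(a)) % p
assert (R_in, R) == (tau_in, tau_R)              # catalytic registers restored
assert R_out == (tau_out + P_x) % p              # output register gained P(x)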

As discussed in [10, 18], this program can be adapted to compute a set of polynomials {P_1, …, P_ℓ} with similar parameters:

Lemma 13.

Let p be a prime number, and let P_1, …, P_ℓ ∈ 𝔽_p[X] be univariate polynomials of degree at most n. For all x ∈ 𝔽_p, there is a register program UP_{P_1,…,P_ℓ} that cleanly computes P_1(x), …, P_ℓ(x) with

  • 4 recursive calls to x,

  • 2ℓn + 2ℓ basic instructions, and

  • n + 1 + ℓ registers.

Proof.

The program is the same as that of Lemma 12, but with each instruction involving R_out replaced by ℓ instructions of the same form, where each involves a different output register R_{out,j} and the corresponding polynomial P_j(x).

3.2 Computing Multivariate Polynomials

In the last section, we showed that computing powers of elements of a commutative ring can be done with a constant number of recursive calls. Moreover, computing the sum of variables is an easy operation: we simply add each variable to a fixed output register in turn (see [10]).

Thus, our goal is to represent our polynomial in a way that requires only addition and powering operations.

Theorem 14 ([42, 4]).

Let P ∈ 𝔽_p[x_1, …, x_n] be a homogeneous polynomial of degree d < p. There exist m ∈ ℕ, m elements α_i ∈ 𝔽_p, and nm elements β_{i,j} ∈ 𝔽_p such that:

P(x_1, …, x_n) = Σ_{i=1}^{m} α_i·(Σ_{j=1}^{n} β_{i,j}·x_j)^d.

Moreover, m ≤ \binom{n+d−1}{d−1}.
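As an illustrative instance of Theorem 14 (ours, not from the paper), the degree-2 monomial xy over 𝔽_13 decomposes into a sum of squares of linear forms, xy = 2^{−1}·((x+y)² − x² − y²), using only additions, powerings, and scalar multiplications; here m = 3 = \binom{n+d−1}{d−1} with n = d = 2.

p = 13
inv2 = pow(2, -1, p)                 # inverse of 2 modulo p
for x in range(p):
    for y in range(p):
        lhs = (x * y) % p
        rhs = inv2 * (pow(x + y, 2, p) - pow(x, 2, p) - pow(y, 2, p)) % p
        assert lhs == rhs            # xy written as a combination of powers of linear forms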

Representing our polynomial in that form does not involve any multiplication. We can thus describe a register program that computes a polynomial in O(1) recursive calls:

Lemma 15.

Let P(x_1, …, x_n) be a homogeneous polynomial of degree d < p over 𝔽_p. There is a register program F_P that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d·\binom{n+d}{d}) basic instructions, and

  • O(d·\binom{n+d}{d}) registers over 𝔽_p.

Proof.

We use Theorem 14 and write P as:

P(x_1, …, x_n) = Σ_{i=1}^{m} α_i·(Σ_{j=1}^{n} β_{i,j}·x_j)^d.

Let f_i(x_1, …, x_n) = Σ_{j=1}^{n} β_{i,j}·x_j and g(x) = x^d. Then by the above discussion:

  • each f_i can be cleanly computed with 1 recursive call to each x_j, 0 basic instructions, and 1 register;

  • g can be cleanly computed with 4 recursive calls, 2d+2 basic instructions, and d+2 registers, using the register program from Lemma 12.

Hence, combining Lemma 8 with the register program UP_{g∘f_1, …, g∘f_m} from Lemma 13, slightly modified so that all outputs share the same output register, there exists a register program that cleanly computes P in R_out with:

  • 4 recursive calls to each x_i,

  • m(2d+2) ≤ (2d+2)·\binom{n+d−1}{d−1} + 1 basic instructions, and

  • m(d+1)+1 ≤ (d+1)·\binom{n+d−1}{d−1} + 1 registers,

as required in the statement of the lemma.

This register program immediately generalizes to a non-homogeneous polynomial P by considering the decomposition P = Σ_{i=0}^{d} P_i, where each P_i is a homogeneous polynomial of degree i.

Corollary 16.

Let P(x_1, …, x_n) be a polynomial of degree d < p over 𝔽_p. There is a register program F_P that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d²·\binom{n+d}{d}) basic instructions, and

  • O(d²·\binom{n+d}{d}) registers over 𝔽_p.

This register program has the advantage of employing a constant number of recursive calls. However, this method has two caveats. First, the cost in the number of registers is superpolynomial when d is non-constant. Second, the degree d is upper bounded by the field size.

Concerning the field issue, we will instead consider a field 𝔽_q for some prime q > d, such that (P mod q) mod p = P mod p. Observe that for any degree-d polynomial over ℤ whose coefficients are smaller than p, the evaluation at x = (x_1, …, x_n), where 0 ≤ x_i < p, is upper bounded by

P(x) ≤ p·Σ_{i=0}^{d} \binom{n}{i}·p^i ≤ 2^n·p^{d+1}.

Hence, we can first evaluate P(x) over a field of size q = O(2^n·p^{d+1}). This yields the most general register program, yet leaves the problem of the number of registers unresolved.
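A toy Python check of this modulus switch (ours, not from the paper): as long as the integer evaluation stays below q, reducing mod q and then mod p agrees with reducing mod p directly.

p, q = 3, 1009                       # q exceeds the 2^n * p^{d+1}-style bound on P's values
coeffs = [2, 1, 2]                   # P(X) = 2 + X + 2X^2, coefficients < p
for x in range(p):
    val = sum(c * x ** i for i, c in enumerate(coeffs))   # evaluation over the integers
    assert val < q
    assert (val % q) % p == val % p  # field switch preserves the value mod p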

Corollary 17.

Let P(x_1, …, x_n) be a polynomial of degree d over 𝔽_p. There is a register program that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d²·\binom{n+d}{d}) basic instructions,

  • O(d²·\binom{n+d}{d}) registers over 𝔽_q, where q = O(2^n·p^{d+1}), and

  • 1 register over 𝔽_p.

We also note one register program of orthogonal strength, namely greater in recursive calls but much smaller in space. In order to prove that Tree Evaluation is in SPACE(log n · log log n) (later improved by Stoeckl, see [26]), Cook and Mertz [18] present a register program to compute multivariate polynomials, which we will also use later:

Lemma 18.

Let P(x_1, …, x_n) be a polynomial of degree d < p over a prime field 𝔽_p. There exists a register program that cleanly computes P with:

  • n+1 registers, including a single output register,

  • O(d) basic instructions, and

  • d+1 instructions of the type I_{λ,all} or its inverse I_{λ,all}^{−1}, where:

    I_{λ,all} : for 1 ≤ i ≤ n,
    R_{in,i} ← R_{in,i} + λ·x_i.

3.3 Computing Boolean functions

In the preceding section, we presented a register program to compute polynomials over any finite field. However, for the case of polynomials over 𝔽_2, i.e. Boolean functions, there are specific properties that we can exploit to make them simpler to compute, the most important being that any polynomial over 𝔽_2 is multilinear, since x_i² = x_i.

Since any Boolean function f is uniquely represented by a polynomial P over 𝔽_2, we will directly consider Boolean functions in this section. Let us present a register program that computes any Boolean function f given a representation of f.

Let us first consider the case of symmetric functions. Given a symmetric function f in n variables, there is some univariate polynomial G : {0, …, n} → {0,1} such that f(x_1, …, x_n) = G(x_1 + ⋯ + x_n). We can compute G using Lemma 12, which yields the following register program:

Lemma 19.

Let n ∈ ℕ, and let f be a symmetric Boolean function on n variables. Let p be a prime number such that p > n. There is a register program that cleanly computes f with:

  • 4 recursive calls to each x_i,

  • O(n) basic instructions, and

  • O(p) registers over ℤ_p.
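As an illustration of this symmetric-function reduction (our sketch, not from the paper), the following Python snippet interpolates G for MAJ_3 over 𝔽_11 and checks f(x) = G(x_1 + x_2 + x_3) on all inputs:

from itertools import product

p = 11
points = [(0, 0), (1, 0), (2, 1), (3, 1)]   # G(k) = MAJ_3 on inputs of Hamming weight k

def G(z):
    # Lagrange interpolation of the points over F_p
    total = 0
    for xi, yi in points:
        term = yi
        for xj, _ in points:
            if xj != xi:
                term = term * (z - xj) * pow(xi - xj, -1, p) % p
        total = (total + term) % p
    return total

for x in product([0, 1], repeat=3):
    assert G(sum(x)) == int(sum(x) >= 2)    # MAJ_3(x) = G(x_1 + x_2 + x_3)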

From this, we deduce the following lemma for polynomials in general:

Lemma 20.

Let f : {0,1}^n → {0,1} be a Boolean function which is (ℤ, d, t)-represented, and let p be a prime number such that p > max{d, t}. There is a register program which cleanly computes f with:

  • 64 recursive calls to each x_i,

  • O(t·p²·log p) basic instructions, and

  • O(t·p) registers over ℤ_p.

Proof.

Let P be the polynomial that represents f, which we write as a sum of terms

P = Σ_{k=1}^{t} u_k,

where each term u_k has the form

u_k = Π_{j=1}^{d} x_{i_j},  with i_1, …, i_d ∈ [n].

Observe that each u_k is symmetric in the variables appearing in it, and so using the register program given by Lemma 19, we deduce that we can compute all the terms in parallel with

  • 4 recursive calls to each x_i,

  • O(t·p) basic instructions, and

  • O(t·p) registers over ℤ_p.

To compute Σ_{k=1}^{t} u_k, we can again use the register program of Lemma 19. This yields a register program with

  • 4 recursive calls to each u_k,

  • O(p) basic instructions, and

  • O(p) registers over ℤ_p.

Composing the two register programs using Lemma 8, we have a register program R_f which computes P with

  • 16 recursive calls to each x_i,

  • O(t·p²) basic instructions, and

  • O(t·p) registers over ℤ_p.

Lastly, we convert the computation of P into a computation of f. Note that our final output register for P is over ℤ_p, which we represent in binary with O(log p) bits. We now apply the register program for the OR function from Lemma 12, which uses 4 recursive calls, O(log p) basic instructions, and O(log p) registers. Composing this with the rest of the program gives us a final program for computing f with

  • 64 recursive calls to each x_i,

  • O(t·p²·log p) basic instructions, and

  • O(t·p) registers over ℤ_p,

which completes the lemma.

4 Results for Circuits and Matrix Powering

We now apply the register programs we found for polynomials to the central computation models in question. We first show how to use the register programs to compute non-constant depth Boolean circuits, and later present a register program computing powers of matrices.

4.1 Circuits via Merging Layers

In this section, we will find efficient register programs for circuits, and hence efficient catalytic algorithms, by a strategy of merging layers and directly computing the “super-functions” that emerge. Our starting point is the following lemma, used in [12, 10].

Lemma 21.

Let B be a set of Boolean functions such that for any function g ∈ B we have a clean register program P_g with at most t input accesses, b basic instructions, and r registers computing g. Let C be a depth-d, size-s circuit whose gates are functions in B. Then C can be cleanly computed by a register program P_C with

  • O(t^d) recursive calls,

  • O(s·b·t^d) instructions, and

  • O(r·s) registers.

Proof.

We will perform the operations of each layer of the circuit in parallel. Each layer uses t recursive calls to the previous layer and s·b basic instructions; hence, for a depth-d circuit, applying Lemma 8 iteratively gives O(t^d) recursive calls to the last layer and s·b·O(t^d) total basic instructions. Since each layer has at most s gates, we need at most r·s registers at each layer, and so again by Lemma 8 this gives 2·r·s registers in total.

Our strategy will be as follows: instead of computing a height-h circuit C with AND and OR gates layer by layer, we will show that we can compute d layers at a time by an efficient register program, and thus consider the circuit C′ of height h/d whose gates are themselves depth-d circuits. We do this using polynomial representations of such circuits, which we then combine with Lemma 20 to obtain the register programs in question. Our main statement is the following:

Lemma 22.

Let f : {0,1}^n → {0,1} be a Boolean function. Let C be a size-n^{O(1)}, depth-d circuit with fan-in 2 AND gates and fan-in ℓ OR gates, for some ℓ, computing f. Then C can be represented by a polynomial of degree k ≤ 2^d with t ≤ ℓ^{2^d} terms over ℤ.

Proof.

The proof is by induction. The claim for d=0 follows immediately since, in this case, the circuit computes one of the inputs.

We now assume that the claim holds for depth d−1, and let C be a depth-d circuit. We apply the induction hypothesis to the children of the top gate g, and we have two cases for g itself:

If the top gate is an OR gate:

We can take P = Σ_{i=1}^{ℓ} P_i as a polynomial representing the circuit C, where P_i is the polynomial representing the i-th input of the OR gate. Using the induction hypothesis,

deg P = max_i deg P_i ≤ 2^{d−1} < 2^d.

Moreover, if we let t_i be the number of terms of each P_i, the number of terms of P is

t = Σ_i t_i ≤ ℓ·ℓ^{2^{d−1}} ≤ ℓ^{2^d}.
If the output gate is a binary AND gate:

Let P_l and P_r respectively be the polynomials representing the left and right children. The polynomial P = P_l·P_r represents the circuit C, and

deg P = deg P_l + deg P_r ≤ 2^{d−1} + 2^{d−1} = 2^d.

On the other hand,

t = t_l·t_r ≤ (ℓ^{2^{d−1}})² = ℓ^{2^d},

which completes the proof.
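A tiny instance of Lemma 22 (our check, not from the paper): the depth-2 circuit OR(AND(x_1,x_2), AND(x_3,x_4), AND(x_1,x_3)), with ℓ = 3 and d = 2, is represented over ℤ by P = x_1·x_2 + x_3·x_4 + x_1·x_3, which has degree 2 ≤ 2^d and 3 ≤ ℓ^{2^d} terms:

from itertools import product

def circuit(x1, x2, x3, x4):
    return (x1 and x2) or (x3 and x4) or (x1 and x3)

for x in product([0, 1], repeat=4):
    P = x[0] * x[1] + x[2] * x[3] + x[0] * x[2]
    assert (P == 0) == (circuit(*x) == 0)   # P(x) = 0 iff the circuit outputs 0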

Combining Lemma 22 for ℓ = n^{O(1)} with our register program for representations in Lemma 20, we immediately obtain the following corollary.

Corollary 23.

Let C be a size-s, depth-d circuit on n inputs with unbounded fan-in OR gates and fan-in 2 AND gates, and let p be a prime number such that p > s^{2^d}. There is a register program which cleanly computes C with:

  • 64 recursive calls to each x_i,

  • s^{O(2^d)} basic instructions, and

  • s^{O(2^d)} registers over ℤ_p.

Our main result follows for the right choice of d.

Proof of Theorem 1.

Let d be such that d = ϵ·log log n. Corollary 23 provides a register program cleanly computing any size-n^{O(1)}, depth-d circuit with unbounded fan-in OR and fan-in 2 AND gates, with

  • 64 recursive calls to each x_i,

  • 2^{O(log^{1+ϵ} n)} basic instructions, and

  • 2^{O(log^{1+ϵ} n)} registers over ℤ_p, where p = 2^{O(log^{1+ϵ} n)}.

Given an SAC^2 circuit C of size n^{O(1)} and depth O(log² n), we rewrite it as a circuit C′ of size at most n^{O(1)} and depth O(log² n / d), where each gate is a size-n^{O(1)}, depth-d circuit with unbounded fan-in OR and fan-in 2 AND gates. Hence, using Lemma 21, for p = 2^{O(log^{1+ϵ} n)} we have a register program for C with

  • 64^{O(log² n / (ϵ·log log n))} = 2^{O(log² n / log log n)} recursive calls to each x_i,

  • n^{O(1)}·2^{O(log^{1+ϵ} n)}·2^{O(log² n / log log n)} = 2^{O(log² n / log log n)} basic instructions, and

  • n^{O(1)}·2^{O(log^{1+ϵ} n)} = 2^{O(log^{1+ϵ} n)} registers over ℤ_p.

At the end, these recursive calls translate into basic instructions reading the input, giving a total time of 2^{O(log² n / log log n)}. We can thus translate this register program to a catalytic machine using Lemma 9, and we deduce that

C ∈ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}),

as claimed.

4.1.1 Extension to other models

We briefly note other circuit classes for which the polynomial representation of Lemma 22 gives similar results to Theorem 1. First, using the same technique as above for ℓ = 2, i.e. NC circuits, we get the following result:

Theorem 24.
NC^2 ⊆ CSPACE(O(log² n / log log n), n^{O(1)})

Alternatively, we can show that CL can compute NC circuits of slightly more than logarithmic depth:

Theorem 25.
NCDEPTH(log n · log log n) ⊆ CL

We note that in these cases, Lemma 22 follows immediately from the DNF representation of circuits and thus Theorem 24 and Theorem 25 do not require techniques such as polynomial representation.

While we have exclusively discussed Boolean circuits in this paper, it is also worth noting that Theorem 1 generalizes to arithmetic circuits as well. Indeed, the following lemma is analogous to Lemma 22, and follows even more directly from the definition:

Lemma 26.

Let C be a polynomial-size, depth-d circuit with fan-in 2 × gates and fan-in s + gates over ℤ_m, for some s ∈ ℕ. Then C can be represented by a polynomial of degree k ≤ 2^d with t ≤ s^{2^d} terms over ℤ_m.

Considering a prime p ≥ m and using Corollary 17, we have a register program to compute such circuits:

Lemma 27.

Any size-s, depth-d arithmetic circuit over ℤ_m with fan-in 2 × gates can be computed by a register program with

  • 4 recursive calls to each x_i,

  • O(2^{2d}·\binom{s+2^d}{2^d}) basic instructions,

  • O(2^{2d}·\binom{s+2^d}{2^d}) registers over 𝔽_q, where q = O(2^s·m^{2^d}), and

  • 1 register over 𝔽_p.

We now define #SAC^i(ℤ_m) to be the class of polynomial-size, O(log^i n)-depth circuits over ℤ_m with bounded fan-in × gates and unbounded fan-in + gates. Taking s = n^{O(1)} and d = ϵ·log log n in Lemma 27, we get the following result for #SAC²(ℤ_m):

Theorem 28.

For all m ∈ ℕ and for all ϵ > 0,

#SAC²(ℤ_m) ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

4.2 Matrix Powering via Decomposition

We now move to register programs for computing powers of a matrix M ∈ M_n(𝔽_p). A first attempt can be given by simply applying Lemma 15, as computing M^d is equivalent to computing n² degree-d polynomials in the n² entries; namely, if we denote by m_{i,j}^{(d)} the (i,j) entry of M^d, we have:

m_{i,j}^{(d)} = Σ_{1 ≤ k_1, …, k_{d−1} ≤ n} m_{i,k_1}·(Π_{l=1}^{d−2} m_{k_l,k_{l+1}})·m_{k_{d−1},j}.

We can therefore use Lemma 15 to compute M^d for d < p.
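As a sanity check of this path-sum formula (ours, not from the paper), the following Python snippet compares it against repeated matrix multiplication for a small random matrix over 𝔽_7:

from itertools import product
from math import prod
import random

p, n, d = 7, 3, 4
M = [[random.randrange(p) for _ in range(n)] for _ in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]

Md = M
for _ in range(d - 1):
    Md = matmul(Md, M)

i, j = 0, 2
# m_{i,j}^{(d)} as a sum over all paths i -> k_1 -> ... -> k_{d-1} -> j
entry = sum(M[i][ks[0]]
            * prod(M[ks[l]][ks[l + 1]] for l in range(d - 2))
            * M[ks[-1]][j]
            for ks in product(range(n), repeat=d - 1)) % p
assert entry == Md[i][j]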

Lemma 29.

Let M ∈ M_n(𝔽_p). There is a register program that cleanly computes M^d for d < p with:

  • 4 recursive calls to M,

  • O(d·n²·\binom{n²+d}{d}) basic instructions, and

  • O(d·n²·\binom{n²+d}{d}) registers over 𝔽_p.

The register program from Lemma 29 is a first step towards a more generic program for matrix powering, but it has two major issues:

  • The program can only compute powers up to p−1.

  • The number of registers grows exponentially with d.

We address these issues independently to get a register program that can handle all the cases.

Computing any power d

Let M ∈ M_n(𝔽_p) and let L ∈ M_n(ℤ) be the natural extension of M to the integers. Observe that the entries ℓ_{i,j}^{(d)} of L^d are evaluations of degree-d polynomials with n^{d−1} terms, and hence ℓ_{i,j}^{(d)} ≤ p^d·n^{d−1}.

Let us consider the first prime number q greater than p^d·n^{d−1} ≥ d. In this case, we have (ℓ_{i,j}^{(d)} mod q) mod p = ℓ_{i,j}^{(d)} mod p = m_{i,j}^{(d)}. Hence, we can use the register program of Lemma 29 (which applies since d < q) with an output register over 𝔽_p for each entry. This yields the following:

Lemma 30.

Let M ∈ M_n(𝔽_p). There is a register program that cleanly computes M^d with:

  • 4 recursive calls to M,

  • O(d·n²·\binom{n²+d}{d}) basic instructions,

  • O(d·n²·\binom{n²+d}{d}) registers over 𝔽_q, where q = O(p^d·n^{d−1}), and

  • O(n²) registers over 𝔽_p.

Reducing the Number of Registers

The latter register program works for all d and has a constant number of recursive calls. However, the number of registers is still exponential in d, and therefore it is not usable as is. To fix this issue, observe that if we let f_d be the powering function f_d(M) = M^d, then f_{d^k} = (f_d)^{∘k}, where ∘k denotes composing the same function k times. An iterated application of Lemma 8 gives the following:

Lemma 31.

Let 𝐑 be a ring, x ∈ 𝐑, and δ ∈ ℕ. Suppose that for all k ≤ δ there is a clean register program P_k computing x^k with at most t recursive calls, s basic instructions, and r registers. Then there exists a register program P that computes x^d with

  • at most (log_δ d + 2)·(t/(t−1))·t^{log_δ d + 1} recursive calls,

  • O(log_δ d)·\frac{t + s·(t+1)^{log_δ d + 1}}{t} basic instructions, and

  • 1 + r·(log_δ d + 1) registers over 𝐑.

Proof.

Let f_k : 𝐑 → 𝐑 be the function that computes x^k, for each k. Observe that

d = Σ_{i=0}^{log_δ d} α_i·δ^i,

where the α_i are non-negative integers smaller than δ. Therefore

x^d = Π_{i=0}^{log_δ d} (x^{δ^i})^{α_i}.

Note that (x^{δ^i})^{α_i} = f_{α_i} ∘ (f_δ)^{∘i}(x). We can apply Lemma 8 to obtain a register program R_i that cleanly computes (x^{δ^i})^{α_i} with t^{i+1} recursive calls, (t+1)^i·s basic instructions, and r registers. Then, using Lemma 18 for P = Π_i (x^{δ^i})^{α_i}, we can compute x^d by calling each program R_i at most log_δ d + 2 times, and using a single additional register. In total, we require

  • at most (log_δ d + 2)·Σ_{i=0}^{log_δ d} t^{i+1} ≤ (log_δ d + 2)·(t/(t−1))·t^{log_δ d + 1} recursive calls to x,

  • O(log_δ d)·(1 + s·Σ_{i=0}^{log_δ d} (t+1)^i) ≤ O(log_δ d)·\frac{t + s·(t+1)^{log_δ d + 1}}{t} basic instructions, and

  • 1 + Σ_{i=0}^{log_δ d} r = 1 + r·(log_δ d + 1) registers.

Hence, fixing some δ, we can compute M^d for all d based on the register program for powers up to δ. This yields our register program for Theorem 2.
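The following small Python check (ours, not from the paper) illustrates the base-δ decomposition underlying Lemma 31: writing d in base δ as d = Σ_i α_i·δ^i gives x^d = Π_i (x^{δ^i})^{α_i}.

p, delta = 101, 3
x, d = 7, 46                        # 46 = 1 + 0*3 + 2*9 + 1*27 in base 3

digits, dd = [], d
while dd:
    digits.append(dd % delta)       # alpha_0, alpha_1, ...
    dd //= delta

result = 1
for i, alpha in enumerate(digits):
    result = result * pow(pow(x, delta ** i, p), alpha, p) % p
assert result == pow(x, d, p)       # x^d recovered from the powers x^{delta^i}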

Proof of Theorem 2.

Let δ ∈ ℕ. Combining Lemma 30 and Lemma 31, we get that for all d there exists a register program that computes M^d with:

  • O((log d / log δ)·d^{3/log δ}) recursive calls to M,

  • O(δ·n²·d^{3/log δ}·(log d / log δ)·\binom{n²+δ}{δ}) basic instructions,

  • O(δ·n²·(log d / log δ)·\binom{n²+δ}{δ}) registers over 𝔽_q, where q = O((np)^δ), and

  • O(n²·(log d / log δ)) registers over 𝔽_p.

Setting ϵ = 3/log δ, i.e. δ = 2^{3/ϵ}, yields a program with

  • O(ϵ·d^ϵ·log d) recursive calls to M,

  • O(ϵ·n^{2^{3/ϵ+1}}·2^{3/ϵ−1}·d^ϵ·log d) basic instructions,

  • O(ϵ·n^{2^{3/ϵ+1}}·2^{3/ϵ−1}·log d) registers over 𝔽_q for q = O((np)^{2^{3/ϵ}}), and

  • O(ϵ·n²·log d) registers over 𝔽_p.

This completes the proof.

5 Conclusion and Open problems

In this paper, we introduce a novel approach to compute functions recognized by non-constant depth polynomial-size circuits with bounded AND fan-in and unbounded OR fan-in. Our method relies on representing these functions in a weak polynomial form over a field 𝔽p for a sufficiently large prime p.

This work marks the first new connection between circuit complexity and catalytic computation classes since the seminal result TC^1 ⊆ CL [10].

An open question stemming from our paper is the extensibility of our method to compute circuits of greater depth. Progress in this direction would likely necessitate exploring weaker polynomial representations that allow more efficient register programs for deeper or different types of circuits. For instance, building on existing concepts in the literature, perhaps a polynomial P which weakly represents f, i.e. f(x) = b iff P(x) ∈ S_b for some fixed sets S_0, S_1, could allow us to make further progress.

References

  • [1] Aryan Agarwala and Ian Mertz. Bipartite matching is in catalytic logspace. In Electron. Colloquium Comput. Complex, volume 48, 2025.
  • [2] Richard Beigel. The polynomial method in circuit complexity. In Proceedings of the Eighth Annual Structure in Complexity Theory Conference, pages 82–95. IEEE, 1993. doi:10.1109/SCT.1993.336538.
  • [3] Michael Ben-Or and Richard Cleve. Computing algebraic formulas using a constant number of registers. SIAM J. Comput., 21(1):54–58, 1992. doi:10.1137/0221006.
  • [4] Andrzej Białynicki-Birula and Andrzej Schinzel. Representations of multivariate polynomials by sums of univariate polynomials in linear forms. In Colloquium Mathematicum, volume 2, pages 201–233, 2008.
  • [5] Sagar Bisoyi, Krishnamoorthy Dinesh, Bhabya Deep Rai, and Jayalal Sarma. Almost-catalytic computation. arXiv preprint arXiv:2409.07208, 2024.
  • [6] Sagar Bisoyi, Krishnamoorthy Dinesh, and Jayalal Sarma. On pure space vs catalytic space. Theoretical Computer Science, 921:112–126, 2022. doi:10.1016/J.TCS.2022.04.005.
  • [7] Allan Borodin. On relating time and space to size and depth. SIAM journal on computing, 6(4):733–744, 1977. doi:10.1137/0206054.
  • [8] Allan Borodin, Stephen Cook, and Nicholas Pippenger. Parallel computation for well-endowed rings and space-bounded probabilistic machines. Information and control, 58(1-3):113–136, 1983. doi:10.1016/S0019-9958(83)80060-6.
  • [9] Allan Borodin, Stephen A. Cook, Patrick W. Dymond, Walter L. Ruzzo, and Martin Tompa. Two applications of inductive counting for complementation problems. SIAM J. Comput., 18(3):559–578, 1989. doi:10.1137/0218038.
  • [10] Harry Buhrman, Richard Cleve, Michal Kouckỳ, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 857–866, 2014.
  • [11] Harry Buhrman, Michal Kouckỳ, Bruno Loff, and Florian Speelman. Catalytic space: Non-determinism and hierarchy. Theory of Computing Systems, 62:116–135, 2018. doi:10.1007/S00224-017-9784-7.
  • [12] Richard Cleve. Computing algebraic formulas with a constant number of registers. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 254–257, 1988.
  • [13] Richard Erwin Cleve. Methodologies for designing block ciphers and cryptographic protocols. PhD thesis, University of Toronto, 1990.
  • [14] James Cook, Jiatu Li, Ian Mertz, and Edward Pyne. The structure of catalytic space: Capturing randomness and time via compression. ECCC TR24-106, 2024.
  • [15] James Cook and Ian Mertz. Catalytic approaches to the tree evaluation problem. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 752–760, 2020. doi:10.1145/3357713.3384316.
  • [16] James Cook and Ian Mertz. Encodings and the tree evaluation problem. In Electron. Colloquium Comput. Complex, volume 54, 2021.
  • [17] James Cook and Ian Mertz. Trading time and space in catalytic branching programs. In 37th Computational Complexity Conference (CCC 2022). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2022.
  • [18] James Cook and Ian Mertz. Tree evaluation is in space O(log n · log log n). In Electron. Colloquium Comput. Complex., 2023.
  • [19] Stephen Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branching programs for tree evaluation. ACM Transactions on Computation Theory (TOCT), 3(2):1–43, 2012. doi:10.1145/2077336.2077337.
  • [20] Stephen A Cook. The classification of problems which have fast parallel algorithms. In International Conference on Fundamentals of Computation Theory, pages 78–93. Springer, 1983. doi:10.1007/3-540-12689-9_95.
  • [21] Don Coppersmith and Edna Grossman. Generators for certain alternating groups with applications to cryptography. SIAM Journal on Applied Mathematics, 29(4):624–627, 1975.
  • [22] Samir Datta, Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Randomized and symmetric catalytic computation. In International Computer Science Symposium in Russia, pages 211–223. Springer, 2020. doi:10.1007/978-3-030-50026-9_15.
  • [23] Dean Doron, Edward Pyne, and Roei Tell. Opening up the distinguisher: A hardness to randomness approach for bpl= l that uses properties of bpl. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 2039–2049, 2024. doi:10.1145/3618260.3649772.
  • [24] Jeff Edmonds, Venkatesh Medabalimi, and Toniann Pitassi. Hardness of function composition for semantic read once branching programs. In 33rd Computational Complexity Conference (CCC 2018). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2018.
  • [25] Marten Folkertsma, Ian Mertz, Florian Speelman, and Quinten Tupker. Fully characterizing lossy catalytic computation. arXiv preprint arXiv:2409.05046, 2024. doi:10.48550/arXiv.2409.05046.
  • [26] Oded Goldreich. Solving tree evaluation in O(log n · log log n) space. ECCC, TR24-124, 2024.
  • [27] Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Unambiguous catalytic computation. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2019.
  • [28] Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Lossy catalytic computation. arXiv preprint arXiv:2408.14670, 2024. doi:10.48550/arXiv.2408.14670.
  • [29] Kazuo Iwama and Atsuki Nagao. Read-once branching programs for tree evaluation problems. ACM Transactions on Computation Theory (TOCT), 11(1):1–12, 2018. doi:10.1145/3282433.
  • [30] Michal Koucký et al. Catalytic computation. Bulletin of EATCS, 1(118), 2016.
  • [31] Michal Koucký, Ian Mertz, Ted Pyne, and Sasha Sami. Collapsing catalytic classes. In Electron. Colloquium Comput. Complex, volume 19, 2025.
  • [32] Jiatu Li, Edward Pyne, and Roei Tell. Distinguishing, predicting, and certifying: On the long reach of partial notions of pseudorandomness. In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), pages 1–13. IEEE, 2024. doi:10.1109/FOCS61266.2024.00095.
  • [33] David Liu. Pebbling arguments for tree evaluation. arXiv preprint arXiv:1311.0293, 2013. arXiv:1311.0293.
  • [34] Ian Mertz. Catalytic computing, tree evaluation, & clean computation, 2020.
  • [35] Ian Mertz et al. Reusing space: Techniques and open problems. Bulletin of EATCS, 141(3), 2023.
  • [36] Nicholas Pippenger. On simultaneous resource bounds. In 20th Annual Symposium on Foundations of Computer Science (sfcs 1979), pages 307–311. IEEE, 1979.
  • [37] Aaron Potechin. A note on amortized branching program complexity. arXiv preprint arXiv:1611.06632, 2016.
  • [38] Edward Pyne. Derandomizing logspace with a small shared hard drive. In 39th Computational Complexity Conference (CCC 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
  • [39] Edward Pyne, Nathan S. Sheffield, and William Wang. Catalytic Communication. In Raghu Meka, editor, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025), volume 325 of Leibniz International Proceedings in Informatics (LIPIcs), pages 79:1–79:24, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2025.79.
  • [40] Robert Robere and Jeroen Zuiddam. Amortized circuit complexity, formal complexity measures, and catalytic algorithms. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 759–769. IEEE, 2022.
  • [41] Walter L Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365–383, 1981. doi:10.1016/0022-0000(81)90038-6.
  • [42] Andrzej Schinzel. On a decomposition of polynomials in several variables. Journal de théorie des nombres de Bordeaux, 14(2):647–666, 2002.
  • [43] Claude E Shannon. The synthesis of two-terminal switching circuits. The Bell System Technical Journal, 28(1):59–98, 1949. doi:10.1002/J.1538-7305.1949.TB03624.X.
  • [44] H Venkateswaran. Circuit definitions of nondeterministic complexity classes. SIAM Journal on Computing, 21(4):655–670, 1992. doi:10.1137/0221040.
  • [45] Heribert Vollmer. Introduction to circuit complexity: a uniform approach. Springer Science & Business Media, 1999.
  • [46] Ryan Williams. Simulating time with square-root space. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC 2025), 2025.