
Catalytic Computing and Register Programs Beyond Log-Depth

Yaroslav Alekseev (Technion - Israel Institute of Technology, Haifa, Israel), Yuval Filmus (Technion - Israel Institute of Technology, Haifa, Israel), Ian Mertz (Charles University, Prague, Czech Republic), Alexander Smal (JetBrains Research, Paphos, Cyprus), Antoine Vinciguerra (Technion - Israel Institute of Technology, Haifa, Israel)
Abstract

In a seminal work, Buhrman et al. (STOC 2014) defined the class CSPACE(s,c) of problems solvable in space s with an additional catalytic tape of size c, which is a tape whose initial content must be restored at the end of the computation. They showed that uniform TC^1 circuits are computable in catalytic logspace, i.e., TC^1 ⊆ CL = CSPACE(O(log n), 2^{O(log n)}), thus giving strong evidence that catalytic space gives L strict additional power. Their study focuses on an arithmetic model called register programs, which has been a focal point of development ever since.

Understanding CL remains a major open problem, as TC^1 remains the most powerful containment to date. In this work, we study the power of catalytic space and register programs to compute circuits of larger depth. Using register programs, we show that for every ϵ > 0,

SAC^2 ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

On the other hand, we know that SAC^2 ⊆ TC^2 ⊆ CSPACE(O(log² n), 2^{O(log n)}). Our result thus shows an O(log log n) factor improvement in the free space needed to compute SAC^2, at the expense of a nearly-polynomial-sized catalytic tape.

We also exhibit non-trivial register programs for matrix powering, which is a further step towards showing NC^2 ⊆ CL.

Keywords and phrases:
catalytic computing, circuit classes, polynomial method
Funding:
Yaroslav Alekseev: Supported by ISF grant 507/24.
Yuval Filmus: Supported by ISF grant 507/24.
Ian Mertz: Supported by the Grant Agency of the Czech Republic under the grant agreement no. 24-10306S and by the Center for Foundations of Contemporary Computer Science (Charles Univ. project UNCE 24/SCI/008).
Antoine Vinciguerra: Supported by ISF grant 507/24.
Copyright and License:
© Yaroslav Alekseev, Yuval Filmus, Ian Mertz, Alexander Smal, and Antoine Vinciguerra; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Circuit complexity; Theory of computation → Complexity classes
Related Version:
Previous Version: https://arxiv.org/pdf/2504.17412
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

1.1 Catalytic Computation

In the realm of space-bounded computation, the catalytic computing framework, first introduced by Buhrman et al. [10], investigates the power of having additional storage which has to be restored to its original content at the end of the computation. A catalytic Turing machine is a space-bounded Turing machine with two read-write tapes: a standard work tape of size s and an additional, dedicated catalytic tape of size c. However, the catalytic tape begins completely filled with some content τ, and despite being available for use as work space, it must be restored to its initial state τ at the end of the computation.

While CSPACE(s,c), the class of functions solvable with work space s and catalytic space c, clearly sits between SPACE(s) and SPACE(s+c), naïvely it might seem that catalytic space should not improve the power of machines, as was conjectured in many previous works [19, 33, 24, 29]. Unexpectedly, [10] showed that catalytic logspace (CL = CSPACE(O(log n), 2^{O(log n)})), the catalytic analogue of the complexity class L, contains the circuit class TC^1, which contains non-deterministic and randomized logspace (NL and BPL, respectively) as well as functions, such as the determinant, widely believed to be outside both. Thus, they gave a strong case for the power of catalytic memory as a resource.

Building on this result, the catalytic approach and its methods have seen growing interest. Many follow-up works have studied variants and properties of catalytic space, such as non-deterministic and randomized analogues [11, 22, 14, 31], non-uniform catalytic models [37, 40, 17, 18], robustness to errors in catalytic resetting [28, 25], catalytic communication protocols [39], and other variants [27, 6, 5] (see the surveys by Koucký [30] and Mertz [35] for more discussion of these and other works).

Furthermore, catalytic tools have been successfully applied to other complexity-theoretic questions in recent years. Two of the most successful directions have been 1) the derandomization of non-catalytic space-bounded classes [38, 23, 32]; and 2) space-bounded algorithms for function composition [15, 16, 18], which recently culminated in a breakthrough simulation by Williams [46] of time with quadratically less space.

Nevertheless, the exact strength of catalytic space remains open. In their first work, [10] showed TC^1 ⊆ CL ⊆ ZPP, with the key open problem being the relationship of CL to P. Further work has shown slight progress on both ends: Cook et al. [14] showed that CL reduces to the lossy coding problem, which itself is in ZPP, while Agarwala and Mertz [1] showed that bipartite matching, a function not known to be anywhere in the NC hierarchy, can be solved in CL.

The key open question in this work is whether or not CL contains higher levels of the NC hierarchy. Mertz [35] posed a concrete approach to showing NC^2 ⊆ CL via register programs, which we turn to now.

1.2 Register Programs

Register programs were first introduced by Coppersmith and Grossman [21], and were later revived by Ben-Or and Cleve to generalize Barrington's theorem to arithmetic circuits [12, 13]. A register program over a ring 𝐑 and a set of inputs x_1, …, x_n ∈ 𝐑 is defined as a sequence of instructions I_1, …, I_t : 𝐑^s → 𝐑^s applied to a set of registers {R_1, …, R_s}.

The key technique of Buhrman et al. [10] was based on register programs, which they showed can be directly simulated on a catalytic machine. They constructed a register program computing the k-th power of some value x with k registers and O(1) accesses to the value x; using some extensions and other works, they showed that CL contains the circuit class TC^1. This was also the driving force behind the result of Cook and Mertz [18], and by extension the work of Williams [46], who showed that the Tree Evaluation problem, which was suggested as a candidate for separating L from P [19], can be computed in space O(log n · log log n).

As with catalytic space more generally, the power of register programs is still a mystery and deserves to be investigated thoroughly. The approach suggested in [35] for CL versus NC^2 is to find a register program computing the k-th power of a matrix, matching the initial result of [10] in the non-commutative setting. More generally, it was proposed that further algebraic extensions of these and other works could pave the way to improving the strength of CL.

1.3 Our Results

In this work, we investigate the complexity of computing polynomials in the register program framework, and make the first progress towards catalytic algorithms for circuit classes beyond TC^1. We present register programs for different kinds of polynomials (such as symmetric polynomials and polynomials representing Boolean functions), as well as more efficient programs for evaluating non-constant depth-d Boolean circuits with a constant number of recursive calls.

From this, we deduce that the class SAC^2 of Boolean circuits of polynomial size and depth O(log² n), with bounded fan-in AND gates and unbounded fan-in OR gates, can be computed with o(log² n) work space when given access to nearly-polynomial catalytic space.

Theorem 1.

For all ϵ>0,

SAC^2 ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

Our technique also gives such an improvement for NC^2 with only polynomial catalytic space, although this can be derived more directly from the results of [10]. It also extends to #SAC^2, the arithmetic variant of SAC^2.

Second, we show register programs for matrix powering with a sublinear number of recursive calls, thus making initial progress towards the program of [34]:

Theorem 2.

Let d, n, p ∈ ℕ, where p is prime, and let M ∈ M_n(𝔽_p). For all ϵ > 0 there is a register program that computes M^d with

  • O_ϵ(d^ϵ·log d) recursive calls to M,

  • O_ϵ(n^{exp(1/ϵ)}) basic instructions, and

  • O_ϵ(n^{exp(1/ϵ)}) registers over 𝔽_q, where q = O((np)^{exp(1/ϵ)}).

1.4 Main Contribution

We introduce a novel clean register program for computing multivariate polynomials. The current state-of-the-art register programs for degree-d multivariate polynomials P(x_1, …, x_n) require d+1 recursive calls and n+1 registers [18]. Our approach dramatically reduces the number of recursive calls to a constant, albeit at the cost of using more registers. This efficiency gain stems from a fundamental observation: the number of recursive calls depends solely on the number of multiplications.

Therefore, unlike traditional techniques that compute the polynomial directly or rely on interpolation, we propose a new strategy: constructing weaker polynomial representations, composed exclusively of additions and powers, over larger fields. This generalizes the technique employed in [10] for computing unbounded Majority gates. As established in [10] and [12], and as we will further elaborate, if a class of functions admits register programs with at most t recursive calls, then depth-d circuits comprising functions from this class can be cleanly computed by register programs with at most t^d recursive calls. Such a register program can be described in O(d·log t) space. Our register program for polynomials, by achieving a constant number of recursive calls, allows us to eliminate this O(log t) term, thereby reducing the descriptive complexity for those circuits.
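For a concrete sense of the savings: with t = O(1) recursive calls per gate, a depth-d circuit costs t^d = 2^{O(d)} recursive calls and the resulting program has description size O(d), whereas t = d+1 calls per gate, as in the interpolation-based program of [18], would give description size O(d·log d).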

2 Preliminaries

2.1 Circuits

A Boolean circuit is a directed acyclic graph C with 0,1-valued inputs and outputs. Internal nodes (gates) are labeled with Boolean operations from a given set B. The circuit size s(C) is the number of nodes, and its depth d(C) is the longest input-output path. While Boolean circuits had been studied earlier [43], their importance for parallel computing gained significant attention in the late 1970s [7, 36, 41, 8].

Definition 3.

We define the following circuit families over input literals x_1, …, x_n, ¬x_1, …, ¬x_n:

  • NC circuits: fan-in 2 AND and OR gates

  • SAC circuits: fan-in 2 AND gates and unbounded fan-in OR gates (since SAC^i is closed under complement for all i ≥ 1 [9], these fan-ins can also be reversed, but for our argument we specifically use this version)

  • AC circuits: unbounded fan-in AND and OR gates

  • TC circuits: unbounded fan-in threshold gates

We denote by NC^i (SAC^i, AC^i, TC^i) the family of functions computable by NC (SAC, AC, TC) circuits of polynomial size and O(log^i n) depth. By NC we denote ∪_i NC^i, and similarly for SAC, AC, and TC.

The relations between these different classes have been extensively studied [41, 45]. For all i, we have the following relations:

NC^i ⊆ SAC^i ⊆ AC^i ⊆ TC^i ⊆ NC^{i+1}.

As a consequence, we have that

NC=SAC=AC=TC.

Furthermore, other lines of research [7, 20, 44] have established other relationships for (logspace) uniform circuit classes:

NC^1 ⊆ L ⊆ NL ⊆ SAC^1 ⊆ AC^1 ⊆ TC^1 ⊆ NC^2 ⊆ SAC^2 ⊆ ⋯ ⊆ NC ⊆ P.

While all containments are widely conjectured to be strict, no separations are known.

2.2 Catalytic Computation

Central to this work is the notion, introduced by Buhrman et al. [10], of catalytic space.

Definition 4.

A catalytic Turing machine with work space s := s(n) and catalytic space c := c(n) is a space-bounded Turing machine M with access to two read-write tapes: a standard work tape of size s, and a catalytic tape of size c. The catalytic tape has the additional restriction that for every τ ∈ {0,1}^c, if we run M on any input x with the catalytic tape in initial configuration τ, the catalytic tape has final configuration τ as well.

This definition gives rise to natural complexity classes:

Definition 5.

We define CSPACE(s,c) as the class of problems solvable by a catalytic Turing machine with a work tape of size s and a catalytic tape of size c. Furthermore, we define catalytic logspace as CL = CSPACE(O(log n), 2^{O(log n)}).

We can view CL as a logspace machine equipped with the maximal effective amount of catalytic tape. Indeed, if a CSPACE(s,c) machine has a catalytic tape of size c = 2^{ω(s)}, then merely storing the position of the catalytic tape's head would already exceed its free space s. This suggests that, beyond c = 2^{Θ(s)}, a catalytic tape offers no additional computational power.

2.3 Register Programs

Inspired by Ben-Or and Cleve's work on straight-line programs and their simulation of NC^1 circuits in this model [3, 12], Buhrman et al. [10] investigated the potential for optimization on a wider class of circuits, examining a more generic type of computational model due to Coppersmith and Grossman [21].

Definition 6.

Let 𝐑 be a ring. An 𝐑-register program over inputs x_1, …, x_n with space s is defined as a sequence of instructions I_1, …, I_t applied to a set of registers R_1, …, R_s, where each instruction I_k has one of the following two forms:

  • Basic instruction: update a register R_i with a polynomial p_k over the other registers:

    I_k : R_i ← R_i + p_k(R_1, …, R_{i−1}, R_{i+1}, …, R_s).

  • Input access/recursive call: add an input x_j (scaled by some λ ∈ 𝐑) to one register R_i:

    I_k : R_i ← R_i + λ·x_j.

The time t of the register program is its total number of instructions.

Furthermore, a register program computes a function f : 𝐑^n → 𝐑^m if there is a set of m registers {R_{i_1}, …, R_{i_m}}, with R_{i_k} initially holding some value τ_k ∈ 𝐑, such that once all the instructions have been applied, we have:

R_{i_k} = τ_k + (f(x_1, …, x_n))_k.

With regard to such subroutines, it will be important to restrict our register programs to a form amenable to composition:

Definition 7.

An 𝐑-register program cleanly computes a function f : 𝐑^n → 𝐑^m if, assuming each register R_i is initialized to some value τ_i ∈ 𝐑:

  • the register program computes f on a subset S of the registers, and

  • for every other register R_j, we have R_j = τ_j at the end of the program.

We note that in Definition 6, we use the term input access when we are given direct access to the input x, while we use the term recursive call when we are designing a subroutine whose input is being received not from the global function but rather from some preceding intermediate computation. We will make this distinction clear where necessary.

We present the following composition lemma to highlight the distinction between these two terms, and also to illustrate the use of clean computation.

Lemma 8 (Composition Lemma).

Let f : 𝐑^n → 𝐑^m and g : 𝐑^o → 𝐑^n be functions. Let Π_f and Π_g be clean register programs computing f and g with t_f and t_g input accesses, s_f and s_g basic instructions, and r_f and r_g registers, respectively. Then there exists a clean register program Π_{f∘g} that computes f∘g with

  • t_f·t_g input accesses,

  • s_f + t_f·s_g basic instructions, and

  • max{r_f, r_g} registers.

Proof.

Let x ∈ 𝐑^o and let y = g(x), so f(g(x)) = f(y). We can cleanly compute f(y) using Π_f, with t_f recursive calls to y, s_f basic instructions, and r_f registers.

On the other hand, we can cleanly compute y = g(x) into the registers of Π_f, using t_g input accesses and s_g basic instructions for each of the t_f recursive calls to y, each implemented by running Π_g.

There are now two cases:

  • If r_g < r_f, since Π_g cleanly computes g, we can use the additional registers of Π_f to compute y.

  • If r_g > r_f, we add r_g − r_f registers and compute y using these new registers together with the additional registers of Π_f.

Thus we can compute f(g(x)) with t_f·t_g input accesses to x, s_f + t_f·s_g basic instructions, and max{r_f, r_g} registers.
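As a concrete instance of these counts: composing two programs that each make 4 input accesses yields 4·4 = 16 input accesses, and composing once more with another 4-access program yields 64; these are exactly the counts that appear in Lemma 20 below.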

Lastly, we connect clean register programs to catalytic computation:

Lemma 9 (Lemma 15 in [10]).

Any clean register program of time t, space s, and with n inputs over a finite ring 𝐑 can be simulated by a catalytic Turing machine in pure space O(log t + log n + log |𝐑|) and catalytic space O(s·log |𝐑|).

2.4 Polynomial Representation

The question of representing Boolean functions as multivariate polynomials over a ring has been intensively studied and has proven to be a useful tool in circuit complexity (see [2] for a survey). We will consider two kinds of representation: one that we will call the representation of f, and the other the weak representation of f.

Definition 10.

Let P be a polynomial on n variables over a ring 𝐑, and let f be a Boolean function with n inputs. We say P represents f if for all inputs x ∈ {0,1}^n, we have P(x) = 0 if and only if f(x) = 0. We say P weakly represents f if there are fixed disjoint sets S_0, S_1 ⊆ 𝐑 such that for all inputs x ∈ {0,1}^n, we have f(x) = b if and only if P(x) ∈ S_b.

Let us underline the difference between these two definitions of representation with an example. Consider the n-ary AND function. Observe that the polynomial P(X_1, …, X_n) = Σ_{i=1}^{n} X_i over ℤ_m, for any m > n, weakly represents AND: indeed, we can take S_0 = {0, …, n−1} and S_1 = {n}. On the other hand, P does not represent AND, and in general, it is known that AND cannot be represented by a degree-1 polynomial.
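As a quick sanity check of this example (ours, not from the paper), the following Python snippet verifies the weak representation of AND on all inputs:

from itertools import product

n, m = 4, 7                          # any m > n works
S0, S1 = set(range(n)), {n}          # S_0 = {0, ..., n-1}, S_1 = {n}
for x in product([0, 1], repeat=n):
    value = sum(x) % m               # P(x) = x_1 + ... + x_n; no wrap-around since m > n
    assert value in (S1 if all(x) else S0)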

Definition 11.

We will say that a Boolean function f has an (𝐑,d)-representation if there is a degree-d polynomial which represents it. We also define the (𝐑,d,s)-representation, where additionally the polynomial is required to have at most s monomials.

3 Register Programs for Polynomials

To prove our main theorem, our main task will be to find an efficient register program to cleanly compute multivariate polynomials. In this section, we present such a register program for multivariate polynomials over ℤ_p, for any prime p, as well as a register program specifically for the Boolean case.

3.1 Computing Univariate Polynomials

In order to prove TC^1 ⊆ CL, [10] designs a register program to compute x^n for any element x of a commutative field. We state a straightforward generalization of their lemma and corresponding program to compute arbitrary univariate polynomials:

Lemma 12.

Let p be a prime number, and let P ∈ 𝔽_p[X] be a univariate polynomial of degree at most n. For all x ∈ 𝔽_p, there is a clean register program that computes P(x) with

  • 4 input accesses to x,

  • 2n+2 basic instructions, and

  • n+2 registers.

Proof.

Let P = Σ_{i=0}^{n} a_i·X^i for some coefficients a_i. Let R_in be a register initially equal to τ_in, on which we will apply our input accesses to x.

It is straightforward to show, by writing x as (τ_in + x) − τ_in, that

x^i = Σ_{j=0}^{i} \binom{i}{j}·(τ_in + x)^j·(−τ_in)^{i−j}.

Based on this observation, [10] presents in Lemma 5 a register program that cleanly computes x^i with i+2 registers and 4 input accesses.

We adapt the latter register program for computing x^n to construct a register program computing P(x), given in Figure 1.

Registers:
  R_in = τ_in
  R_1 = τ_1, …, R_n = τ_n
  R_out = τ_out

R_in ← R_in + x                      // R_in = τ_in + x

For 1 ≤ i ≤ n:
  R_i ← R_i + R_in^i                 // R_i = τ_i + (τ_in + x)^i

R_in ← R_in − x                      // R_in = τ_in

For 1 ≤ i ≤ n:
  R_out ← R_out + a_i·((−1)^i·R_in^i + Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·R_j·R_in^{i−j})
// R_out = τ_out + P(x) − a_0 + Σ_{i=1}^{n} a_i·Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·τ_j·τ_in^{i−j}

R_in ← R_in + x                      // R_in = τ_in + x
For 1 ≤ i ≤ n:
  R_i ← R_i − R_in^i                 // R_i = τ_i
R_in ← R_in − x                      // R_in = τ_in

For 1 ≤ i ≤ n:
  R_out ← R_out − a_i·Σ_{j=1}^{i} \binom{i}{j}·(−1)^{i−j}·R_j·R_in^{i−j}
// R_out = τ_out + P(x) − a_0

R_out ← R_out + a_0                  // R_out = τ_out + P(x)

Figure 1: Program for computing a polynomial P(x) of degree n using 4 recursive calls to x, 2n+2 basic instructions, and n+2 registers.
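To make the bookkeeping in Figure 1 concrete, here is a minimal Python simulation (ours, not part of the paper) of the program over 𝔽_p. It checks the clean-computation guarantee: R_out gains P(x), while R_in and all R_i return to their initial values.

from math import comb

def figure1(a, x, p, tau_in, tau_R, tau_out):
    """Run the program of Figure 1 for P(X) = sum_i a[i] X^i over F_p."""
    n = len(a) - 1
    R_in, R, R_out = tau_in, list(tau_R), tau_out

    R_in = (R_in + x) % p                        # input access: R_in = tau_in + x
    for i in range(1, n + 1):                    # R_i = tau_i + (tau_in + x)^i
        R[i - 1] = (R[i - 1] + pow(R_in, i, p)) % p
    R_in = (R_in - x) % p                        # input access: R_in = tau_in

    for i in range(1, n + 1):                    # R_out gains a_i x^i plus cross terms
        acc = (-1) ** i * pow(R_in, i, p)
        for j in range(1, i + 1):
            acc += comb(i, j) * (-1) ** (i - j) * R[j - 1] * pow(R_in, i - j, p)
        R_out = (R_out + a[i] * acc) % p

    R_in = (R_in + x) % p                        # input access
    for i in range(1, n + 1):                    # restore R_i = tau_i
        R[i - 1] = (R[i - 1] - pow(R_in, i, p)) % p
    R_in = (R_in - x) % p                        # input access: R_in = tau_in

    for i in range(1, n + 1):                    # remove the cross terms from R_out
        acc = sum(comb(i, j) * (-1) ** (i - j) * R[j - 1] * pow(R_in, i - j, p)
                  for j in range(1, i + 1))
        R_out = (R_out - a[i] * acc) % p

    R_out = (R_out + a[0]) % p                   # R_out = tau_out + P(x)
    return R_in, R, R_out

p, a, x = 101, [3, 1, 4, 1, 5], 17               # P(X) = 3 + X + 4X^2 + X^3 + 5X^4
tau_in, tau_R, tau_out = 42, [7, 8, 9, 10], 5    # arbitrary initial register contents
R_in, R, R_out = figure1(a, x, p, tau_in, tau_R, tau_out)
P_x = sum(c * pow(x, i, p) for i, c in enumerate(a)) % p
assert (R_in, R) == (tau_in, tau_R)              # catalytic registers restored
assert R_out == (tau_out + P_x) % p              # output register gained P(x)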

As discussed in [10, 18], this program can be adapted to compute a set of polynomials {P_1, …, P_ℓ} with similar parameters:

Lemma 13.

Let p be a prime number, and let P_1, …, P_ℓ ∈ 𝔽_p[X] be univariate polynomials of degree at most n. For all x ∈ 𝔽_p, there is a register program UP_{P_1,…,P_ℓ} that cleanly computes P_1(x), …, P_ℓ(x) with

  • 4 recursive calls to x,

  • 2ℓn + 2ℓ basic instructions, and

  • n + 1 + ℓ registers.

Proof.

The program is the same as that of Lemma 12, but with each instruction involving R_out replaced by ℓ instructions of the same form, where each involves a different output register R_{out,j} and the corresponding polynomial P_j(x).

3.2 Computing Multivariate Polynomials

In the last section, we showed that computing powers of elements of a commutative ring can be done with a constant number of recursive calls. Moreover, computing the sum of variables is an easy operation: we simply add each variable to a fixed output register in turn (see [10]).

Thus, our goal is to represent our polynomial in a way that requires only addition and powering operations.

Theorem 14 ([42, 4]).

Let P ∈ 𝔽_p[x_1, …, x_n] be a homogeneous polynomial of degree d < p. There exist m ∈ ℕ, m elements α_i ∈ 𝔽_p, and nm elements β_{i,j} ∈ 𝔽_p such that:

P(x_1, …, x_n) = Σ_{i=1}^{m} α_i·(Σ_{j=1}^{n} β_{i,j}·x_j)^d.

Moreover, m ≤ \binom{n+d−1}{d−1}.
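As an illustrative instance of Theorem 14 (ours, not from the paper), the degree-2 monomial xy over 𝔽_13 decomposes into a sum of squares of linear forms, xy = 2^{−1}·((x+y)² − x² − y²), using only additions, powerings, and scalar multiplications; here m = 3 = \binom{n+d−1}{d−1} with n = d = 2.

p = 13
inv2 = pow(2, -1, p)                 # inverse of 2 modulo p
for x in range(p):
    for y in range(p):
        lhs = (x * y) % p
        rhs = inv2 * (pow(x + y, 2, p) - pow(x, 2, p) - pow(y, 2, p)) % p
        assert lhs == rhs            # xy written as a combination of powers of linear forms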

Representing our polynomial in that form does not involve any multiplication. We can thus describe a register program that computes a polynomial in O(1) recursive calls:

Lemma 15.

Let P(x_1, …, x_n) be a homogeneous polynomial of degree d < p over 𝔽_p. There is a register program F_P that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d·\binom{n+d}{d}) basic instructions, and

  • O(d·\binom{n+d}{d}) registers over 𝔽_p.

Proof.

We use Theorem 14 and write P as:

P(x_1, …, x_n) = Σ_{i=1}^{m} α_i·(Σ_{j=1}^{n} β_{i,j}·x_j)^d.

Let f_i(x_1, …, x_n) = Σ_{j=1}^{n} β_{i,j}·x_j and g(x) = x^d. Then by the above discussion:

  • each f_i can be cleanly computed with 1 recursive call to each x_j, 0 basic instructions, and 1 register;

  • g can be cleanly computed with 4 recursive calls, 2d+2 basic instructions, and d+2 registers, using the register program from Lemma 12.

Hence, combining Lemma 8 with the register program UP_{g∘f_1, …, g∘f_m} from Lemma 13, slightly modified so that all outputs share the same output register, there exists a register program that cleanly computes P in R_out with:

  • 4 recursive calls to each x_i,

  • m(2d+2) ≤ (2d+2)·\binom{n+d−1}{d−1} + 1 basic instructions, and

  • m(d+1)+1 ≤ (d+1)·\binom{n+d−1}{d−1} + 1 registers,

as required in the statement of the lemma.

This register program immediately generalizes to a non-homogeneous polynomial P by considering the decomposition P = Σ_{i=0}^{d} P_i, where each P_i is a homogeneous polynomial of degree i.

Corollary 16.

Let P(x_1, …, x_n) be a polynomial of degree d < p over 𝔽_p. There is a register program F_P that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d²·\binom{n+d}{d}) basic instructions, and

  • O(d²·\binom{n+d}{d}) registers over 𝔽_p.

This register program has the advantage of employing a constant number of recursive calls. However, this method has two caveats. First, the cost in the number of registers is superpolynomial when d is non-constant. Second, the degree d is upper bounded by the field size.

Concerning the field issue, we will instead consider a field 𝔽_q for some prime q > d, such that (P mod q) mod p = P mod p. Observe that for any degree-d polynomial over ℤ whose coefficients are smaller than p, the evaluation at x = (x_1, …, x_n), where 0 ≤ x_i < p, is upper bounded by

P(x) ≤ p·Σ_{i=0}^{d} \binom{n}{i}·p^i ≤ 2^n·p^{d+1}.

Hence, we can first evaluate P(x) over a field of size q = O(2^n·p^{d+1}). This yields the most general register program, yet leaves the problem of the number of registers unresolved.
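A toy Python check of this modulus switch (ours, not from the paper): as long as the integer evaluation stays below q, reducing mod q and then mod p agrees with reducing mod p directly.

p, q = 3, 1009                       # q exceeds the 2^n * p^{d+1}-style bound on P's values
coeffs = [2, 1, 2]                   # P(X) = 2 + X + 2X^2, coefficients < p
for x in range(p):
    val = sum(c * x ** i for i, c in enumerate(coeffs))   # evaluation over the integers
    assert val < q
    assert (val % q) % p == val % p  # field switch preserves the value mod p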

Corollary 17.

Let P(x_1, …, x_n) be a polynomial of degree d over 𝔽_p. There is a register program that cleanly computes P with

  • 4 recursive calls to each x_i,

  • O(d²·\binom{n+d}{d}) basic instructions,

  • O(d²·\binom{n+d}{d}) registers over 𝔽_q, where q = O(2^n·p^{d+1}), and

  • 1 register over 𝔽_p.

We also note one register program of orthogonal strength, namely greater in recursive calls but much smaller in space. In order to prove that Tree Evaluation is in SPACE(log n · log log n) (later improved by Stoeckl, see [26]), Cook and Mertz [18] present a register program to compute multivariate polynomials, which we will also use later:

Lemma 18.

Let P(x_1, …, x_n) be a polynomial of degree d < p over a prime field 𝔽_p. There exists a register program that cleanly computes P with:

  • n+1 registers, including a single output register,

  • O(d) basic instructions, and

  • d+1 instructions of the type I_{λ,all} or its inverse I_{λ,all}^{−1}, where:

    I_{λ,all} : for 1 ≤ i ≤ n,
    R_{in,i} ← R_{in,i} + λ·x_i.

3.3 Computing Boolean functions

In the preceding section, we presented a register program to compute polynomials over any finite field. However, for the case of polynomials over 𝔽_2, i.e. Boolean functions, there are specific properties that we can exploit to make them simpler to compute, the most important being that any polynomial over 𝔽_2 is multilinear, since x_i² = x_i.

Since any Boolean function f is uniquely represented by a polynomial P over 𝔽_2, we will directly consider Boolean functions in this section. Let us present a register program that computes any Boolean function f given a representation of f.

Let us first consider the case of symmetric functions. Given a symmetric function f in n variables, there is some univariate polynomial G : {0, …, n} → {0,1} such that f(x_1, …, x_n) = G(x_1 + ⋯ + x_n). We can compute G using Lemma 12, which yields the following register program:

Lemma 19.

Let n ∈ ℕ, and let f be a symmetric Boolean function on n variables. Let p be a prime number such that p > n. There is a register program that cleanly computes f with:

  • 4 recursive calls to each x_i,

  • O(n) basic instructions, and

  • O(p) registers over ℤ_p.
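As an illustration of this symmetric-function reduction (our sketch, not from the paper), the following Python snippet interpolates G for MAJ_3 over 𝔽_11 and checks f(x) = G(x_1 + x_2 + x_3) on all inputs:

from itertools import product

p = 11
points = [(0, 0), (1, 0), (2, 1), (3, 1)]   # G(k) = MAJ_3 on inputs of Hamming weight k

def G(z):
    # Lagrange interpolation of the points over F_p
    total = 0
    for xi, yi in points:
        term = yi
        for xj, _ in points:
            if xj != xi:
                term = term * (z - xj) * pow(xi - xj, -1, p) % p
        total = (total + term) % p
    return total

for x in product([0, 1], repeat=3):
    assert G(sum(x)) == int(sum(x) >= 2)    # MAJ_3(x) = G(x_1 + x_2 + x_3)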

From this, we deduce the following lemma for polynomials in general:

Lemma 20.

Let f : {0,1}^n → {0,1} be a Boolean function which is (ℤ, d, t)-represented, and let p be a prime number such that p > max{d, t}. There is a register program which cleanly computes f with:

  • 64 recursive calls to each x_i,

  • O(t·p²·log p) basic instructions, and

  • O(t·p) registers over ℤ_p.

Proof.

Let P be the polynomial that represents f, which we write as a sum of terms

P = Σ_{k=1}^{t} u_k,

where each term u_k has the form

u_k = Π_{j=1}^{d} x_{i_j},  with i_1, …, i_d ∈ [n].

Observe that each u_k is symmetric in the variables appearing in it, and so using the register program given by Lemma 19, we deduce that we can compute all the terms in parallel with

  • 4 recursive calls to each x_i,

  • O(t·p) basic instructions, and

  • O(t·p) registers over ℤ_p.

To compute Σ_{k=1}^{t} u_k, we can again use the register program of Lemma 19. This yields a register program with

  • 4 recursive calls to each u_k,

  • O(p) basic instructions, and

  • O(p) registers over ℤ_p.

Composing the two register programs using Lemma 8, we have a register program R_f which computes P with

  • 16 recursive calls to each x_i,

  • O(t·p²) basic instructions, and

  • O(t·p) registers over ℤ_p.

Lastly, we convert the computation of P into a computation of f. Note that our final output register for P is over ℤ_p, which we represent in binary with O(log p) bits. We now apply the register program for the OR function from Lemma 12, which uses 4 recursive calls, O(log p) basic instructions, and O(log p) registers. Composing this with the rest of the program gives us a final program for computing f with

  • 64 recursive calls to each x_i,

  • O(t·p²·log p) basic instructions, and

  • O(t·p) registers over ℤ_p,

which completes the lemma.

4 Results for Circuits and Matrix Powering

We now apply the register programs we found for polynomials to the central computation models in question. We first show how to use the register programs to compute non-constant depth Boolean circuits, and later present a register program computing powers of matrices.

4.1 Circuits via Merging Layers

In this section, we will find efficient register programs for circuits, and hence efficient catalytic algorithms, by a strategy of merging layers and directly computing the “super-functions” that emerge. Our starting point is the following lemma, used in [12, 10].

Lemma 21.

Let B be a set of Boolean functions such that for any function g ∈ B we have a clean register program P_g with at most t input accesses, b basic instructions, and r registers computing g. Let C be a depth-d, size-s circuit whose gates are functions in B. Then C can be cleanly computed by a register program P_C with

  • O(t^d) recursive calls,

  • O(s·b·t^d) instructions, and

  • O(r·s) registers.

Proof.

We will perform the operations of each layer of the circuit in parallel. Each layer uses t recursive calls to the previous layer and s·b basic instructions; hence, for a depth-d circuit, applying Lemma 8 iteratively gives O(t^d) recursive calls to the last layer and s·b·O(t^d) total basic instructions. Since each layer has at most s gates, we need at most r·s registers at each layer, and so again by Lemma 8 this gives 2·r·s registers in total.

Our strategy will be as follows: instead of computing a height-h circuit C with AND and OR gates layer by layer, we will show that we can compute d layers at a time by an efficient register program, and thus consider the circuit C′ of height h/d whose gates are themselves depth-d circuits. We do this using polynomial representations of such circuits, which we then combine with Lemma 20 to obtain the register programs in question. Our main statement is the following:

Lemma 22.

Let f : {0,1}^n → {0,1} be a Boolean function. Let C be a size-n^{O(1)}, depth-d circuit with fan-in 2 AND gates and fan-in ℓ OR gates, for some ℓ, computing f. Then C can be represented by a polynomial of degree k ≤ 2^d with t ≤ ℓ^{2^d} terms over ℤ.

Proof.

The proof is by induction. The claim for d=0 follows immediately since, in this case, the circuit computes one of the inputs.

We now assume that the claim holds for depth d−1, and let C be a depth-d circuit. We apply the induction hypothesis to the children of the top gate g, and we have two cases for g itself:

If the top gate is an OR gate:

We can take P = Σ_{i=1}^{ℓ} P_i as a polynomial representing the circuit C, where P_i is the polynomial representing the i-th input of the OR gate. Using the induction hypothesis,

deg P = max_i deg P_i ≤ 2^{d−1} < 2^d.

Moreover, if we let t_i be the number of terms of each P_i, the number of terms of P is

t = Σ_i t_i ≤ ℓ·ℓ^{2^{d−1}} ≤ ℓ^{2^d}.
If the output gate is a binary AND gate:

Let P_l and P_r respectively be the polynomials representing the left and right children. The polynomial P = P_l·P_r represents the circuit C, and

deg P = deg P_l + deg P_r ≤ 2^{d−1} + 2^{d−1} = 2^d.

On the other hand,

t = t_l·t_r ≤ (ℓ^{2^{d−1}})² = ℓ^{2^d},

which completes the proof.
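A tiny instance of Lemma 22 (our check, not from the paper): the depth-2 circuit OR(AND(x_1,x_2), AND(x_3,x_4), AND(x_1,x_3)), with ℓ = 3 and d = 2, is represented over ℤ by P = x_1·x_2 + x_3·x_4 + x_1·x_3, which has degree 2 ≤ 2^d and 3 ≤ ℓ^{2^d} terms:

from itertools import product

def circuit(x1, x2, x3, x4):
    return (x1 and x2) or (x3 and x4) or (x1 and x3)

for x in product([0, 1], repeat=4):
    P = x[0] * x[1] + x[2] * x[3] + x[0] * x[2]
    assert (P == 0) == (circuit(*x) == 0)   # P(x) = 0 iff the circuit outputs 0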

Combining Lemma 22 for ℓ = n^{O(1)} with our register program for representations in Lemma 20, we immediately obtain the following corollary.

Corollary 23.

Let C be a size-s, depth-d circuit on n inputs with unbounded fan-in OR gates and fan-in 2 AND gates, and let p be a prime number such that p > s^{2^d}. There is a register program which cleanly computes C with:

  • 64 recursive calls to each x_i,

  • s^{O(2^d)} basic instructions, and

  • s^{O(2^d)} registers over ℤ_p.

Our main result follows for the right choice of d.

Proof of Theorem 1.

Let d be such that d = ϵ·log log n. Corollary 23 provides a register program cleanly computing any size-n^{O(1)}, depth-d circuit with unbounded fan-in OR and fan-in 2 AND gates, with

  • 64 recursive calls to each x_i,

  • 2^{O(log^{1+ϵ} n)} basic instructions, and

  • 2^{O(log^{1+ϵ} n)} registers over ℤ_p, where p = 2^{O(log^{1+ϵ} n)}.

Given an SAC^2 circuit C of size n^{O(1)} and depth O(log² n), we rewrite it as a circuit C′ of size at most n^{O(1)} and depth O(log² n / d), where each gate is a size-n^{O(1)}, depth-d circuit with unbounded fan-in OR and fan-in 2 AND gates. Hence, using Lemma 21, for p = 2^{O(log^{1+ϵ} n)} we have a register program for C with

  • 64^{O(log² n / (ϵ·log log n))} = 2^{O(log² n / log log n)} recursive calls to each x_i,

  • n^{O(1)}·2^{O(log^{1+ϵ} n)}·2^{O(log² n / log log n)} = 2^{O(log² n / log log n)} basic instructions, and

  • n^{O(1)}·2^{O(log^{1+ϵ} n)} = 2^{O(log^{1+ϵ} n)} registers over ℤ_p.

At the end, these recursive calls translate into basic instructions reading the input, giving a total time of 2^{O(log² n / log log n)}. We can thus translate this register program to a catalytic machine using Lemma 9, and we deduce that

C ∈ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}),

as claimed.

4.1.1 Extension to other models

We briefly note other circuit classes for which the polynomial representation of Lemma 22 gives similar results to Theorem 1. First, using the same technique as above for ℓ = 2, i.e. NC circuits, we get the following result:

Theorem 24.
NC^2 ⊆ CSPACE(O(log² n / log log n), n^{O(1)})

Alternatively, we can show that CL can compute NC circuits of slightly more than logarithmic depth:

Theorem 25.
NCDEPTH(log n · log log n) ⊆ CL

We note that in these cases, Lemma 22 follows immediately from the DNF representation of circuits and thus Theorem 24 and Theorem 25 do not require techniques such as polynomial representation.

While we have exclusively discussed Boolean circuits in this paper, it is also worth noting that Theorem 1 generalizes to arithmetic circuits as well. Indeed, the following lemma is analogous to Lemma 22, and follows even more directly from the definition:

Lemma 26.

Let C be a polynomial-size, depth-d circuit with fan-in 2 × gates and fan-in s + gates over ℤ_m, for some s ∈ ℕ. Then C can be represented by a polynomial of degree k ≤ 2^d with t ≤ s^{2^d} terms over ℤ_m.

Considering a prime p ≥ m and using Corollary 17, we have a register program to compute such circuits:

Lemma 27.

Any size-s, depth-d arithmetic circuit over ℤ_m with fan-in 2 × gates can be computed by a register program with

  • 4 recursive calls to each x_i,

  • O(2^{2d}·\binom{s+2^d}{2^d}) basic instructions,

  • O(2^{2d}·\binom{s+2^d}{2^d}) registers over 𝔽_q, where q = O(2^s·m^{2^d}), and

  • 1 register over 𝔽_p.

We now define #SAC^i(ℤ_m) to be the class of polynomial-size, O(log^i n)-depth circuits over ℤ_m with bounded fan-in × gates and unbounded fan-in + gates. Taking s = n^{O(1)} and d = ϵ·log log n in Lemma 27, we get the following result for #SAC²(ℤ_m):

Theorem 28.

For all m ∈ ℕ and for all ϵ > 0,

#SAC²(ℤ_m) ⊆ CSPACE(O(log² n / log log n), 2^{O(log^{1+ϵ} n)}).

4.2 Matrix Powering via Decomposition

We now move to register programs for computing powers of a matrix M ∈ M_n(𝔽_p). A first attempt can be given by simply applying Lemma 15, as computing M^d is equivalent to computing n² degree-d polynomials in the n² entries; namely, if we denote by m_{i,j}^{(d)} the (i,j) entry of M^d, we have:

m_{i,j}^{(d)} = Σ_{1 ≤ k_1, …, k_{d−1} ≤ n} m_{i,k_1}·(Π_{l=1}^{d−2} m_{k_l,k_{l+1}})·m_{k_{d−1},j}.

We can therefore use Lemma 15 to compute M^d for d < p.
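As a sanity check of this path-sum formula (ours, not from the paper), the following Python snippet compares it against repeated matrix multiplication for a small random matrix over 𝔽_7:

from itertools import product
from math import prod
import random

p, n, d = 7, 3, 4
M = [[random.randrange(p) for _ in range(n)] for _ in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]

Md = M
for _ in range(d - 1):
    Md = matmul(Md, M)

i, j = 0, 2
# m_{i,j}^{(d)} as a sum over all paths i -> k_1 -> ... -> k_{d-1} -> j
entry = sum(M[i][ks[0]]
            * prod(M[ks[l]][ks[l + 1]] for l in range(d - 2))
            * M[ks[-1]][j]
            for ks in product(range(n), repeat=d - 1)) % p
assert entry == Md[i][j]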

Lemma 29.

Let M ∈ M_n(𝔽_p). There is a register program that cleanly computes M^d for d < p with:

  • 4 recursive calls to M,

  • O(d·n²·\binom{n²+d}{d}) basic instructions, and

  • O(d·n²·\binom{n²+d}{d}) registers over 𝔽_p.

The register program from Lemma 29 is a first step towards a more generic program for matrix powering, but it has two major issues:

  • The program can only compute powers up to p−1.

  • The number of registers grows exponentially with d.

We address these issues independently to get a register program that can handle all the cases.

Computing any power d

Let M ∈ M_n(𝔽_p) and let L ∈ M_n(ℤ) be the natural extension of M to the integers. Observe that the entries ℓ_{i,j}^{(d)} of L^d are evaluations of degree-d polynomials with n^{d−1} terms, and hence ℓ_{i,j}^{(d)} ≤ p^d·n^{d−1}.

Let us consider the first prime number q greater than p^d·n^{d−1} ≥ d. In this case, we have (ℓ_{i,j}^{(d)} mod q) mod p = ℓ_{i,j}^{(d)} mod p = m_{i,j}^{(d)}. Hence, we can use the register program of Lemma 29 (which applies since d < q) with an output register over 𝔽_p for each entry. This yields the following:

Lemma 30.

Let M ∈ M_n(𝔽_p). There is a register program that cleanly computes M^d with:

  • 4 recursive calls to M,

  • O(d·n²·\binom{n²+d}{d}) basic instructions,

  • O(d·n²·\binom{n²+d}{d}) registers over 𝔽_q, where q = O(p^d·n^{d−1}), and

  • O(n²) registers over 𝔽_p.

Reducing the Number of Registers

The latter register program works for all d and has a constant number of recursive calls. However, the number of registers is still exponential in d, and therefore it is not usable as is. To fix this issue, observe that if we let f_d be the powering function f_d(M) = M^d, then f_{d^k} = (f_d)^{∘k}, where ∘k denotes composing the same function k times. An iterated application of Lemma 8 gives the following:

Lemma 31.

Let 𝐑 be a ring, x ∈ 𝐑, and δ ∈ ℕ. Suppose that for all k ≤ δ there is a clean register program P_k computing x^k with at most t recursive calls, s basic instructions, and r registers. Then there exists a register program P that computes x^d with

  • at most (log_δ d + 2)·(t/(t−1))·t^{log_δ d + 1} recursive calls,

  • O(log_δ d)·\frac{t + s·(t+1)^{log_δ d + 1}}{t} basic instructions, and

  • 1 + r·(log_δ d + 1) registers over 𝐑.

Proof.

Let f_k : 𝐑 → 𝐑 be the function that computes x^k, for each k. Observe that

d = Σ_{i=0}^{log_δ d} α_i·δ^i,

where the α_i are non-negative integers smaller than δ. Therefore

x^d = Π_{i=0}^{log_δ d} (x^{δ^i})^{α_i}.

Note that (x^{δ^i})^{α_i} = f_{α_i} ∘ (f_δ)^{∘i}(x). We can apply Lemma 8 to obtain a register program R_i that cleanly computes (x^{δ^i})^{α_i} with t^{i+1} recursive calls, (t+1)^i·s basic instructions, and r registers. Then, using Lemma 18 for P = Π_i (x^{δ^i})^{α_i}, we can compute x^d by calling each program R_i at most log_δ d + 2 times, and using a single additional register. In total, we require

  • at most (log_δ d + 2)·Σ_{i=0}^{log_δ d} t^{i+1} ≤ (log_δ d + 2)·(t/(t−1))·t^{log_δ d + 1} recursive calls to x,

  • O(log_δ d)·(1 + s·Σ_{i=0}^{log_δ d} (t+1)^i) ≤ O(log_δ d)·\frac{t + s·(t+1)^{log_δ d + 1}}{t} basic instructions, and

  • 1 + Σ_{i=0}^{log_δ d} r = 1 + r·(log_δ d + 1) registers.

Hence, fixing some δ, we can compute M^d for all d based on the register program for powers up to δ. This yields our register program for Theorem 2.
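The following small Python check (ours, not from the paper) illustrates the base-δ decomposition underlying Lemma 31: writing d in base δ as d = Σ_i α_i·δ^i gives x^d = Π_i (x^{δ^i})^{α_i}.

p, delta = 101, 3
x, d = 7, 46                        # 46 = 1 + 0*3 + 2*9 + 1*27 in base 3

digits, dd = [], d
while dd:
    digits.append(dd % delta)       # alpha_0, alpha_1, ...
    dd //= delta

result = 1
for i, alpha in enumerate(digits):
    result = result * pow(pow(x, delta ** i, p), alpha, p) % p
assert result == pow(x, d, p)       # x^d recovered from the powers x^{delta^i}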

Proof of Theorem 2.

Let δ ∈ ℕ. Combining Lemma 30 and Lemma 31, we get that for all d there exists a register program that computes M^d with:

  • O((log d / log δ)·d^{3/log δ}) recursive calls to M,

  • O(δ·n²·d^{3/log δ}·(log d / log δ)·\binom{n²+δ}{δ}) basic instructions,

  • O(δ·n²·(log d / log δ)·\binom{n²+δ}{δ}) registers over 𝔽_q, where q = O((np)^δ), and

  • O(n²·(log d / log δ)) registers over 𝔽_p.

Setting ϵ = 3/log δ, i.e. δ = 2^{3/ϵ}, yields a program with

  • O(ϵ·d^ϵ·log d) recursive calls to M,

  • O(ϵ·n^{2^{3/ϵ+1}}·2^{3/ϵ−1}·d^ϵ·log d) basic instructions,

  • O(ϵ·n^{2^{3/ϵ+1}}·2^{3/ϵ−1}·log d) registers over 𝔽_q for q = O((np)^{2^{3/ϵ}}), and

  • O(ϵ·n²·log d) registers over 𝔽_p.

This completes the proof.

5 Conclusion and Open problems

In this paper, we introduce a novel approach to compute functions recognized by non-constant depth polynomial-size circuits with bounded AND fan-in and unbounded OR fan-in. Our method relies on representing these functions in a weak polynomial form over a field 𝔽p for a sufficiently large prime p.

This work marks the first new connection between circuit complexity and catalytic computation classes since the seminal result TC^1 ⊆ CL [10].

An open question stemming from our paper is the extensibility of our method to compute circuits of greater depth. Progress in this direction would likely necessitate exploring weaker polynomial representations that allow more efficient register programs for deeper or different types of circuits. For instance, building on existing concepts in the literature, perhaps a polynomial P which weakly represents f, i.e. f(x) = b iff P(x) ∈ S_b for some fixed sets S_0, S_1, could allow us to make further progress.

References

  • [1] Aryan Agarwala and Ian Mertz. Bipartite matching is in catalytic logspace. In Electron. Colloquium Comput. Complex, volume 48, 2025.
  • [2] Richard Beigel. The polynomial method in circuit complexity. In Proceedings of the Eighth Annual Structure in Complexity Theory Conference, pages 82–95. IEEE, 1993. doi:10.1109/SCT.1993.336538.
  • [3] Michael Ben-Or and Richard Cleve. Computing algebraic formulas using a constant number of registers. SIAM J. Comput., 21(1):54–58, 1992. doi:10.1137/0221006.
  • [4] Andrzej Białynicki-Birula and Andrzej Schinzel. Representations of multivariate polynomials by sums of univariate polynomials in linear forms. In Colloquium Mathematicum, volume 2, pages 201–233, 2008.
  • [5] Sagar Bisoyi, Krishnamoorthy Dinesh, Bhabya Deep Rai, and Jayalal Sarma. Almost-catalytic computation. arXiv preprint arXiv:2409.07208, 2024.
  • [6] Sagar Bisoyi, Krishnamoorthy Dinesh, and Jayalal Sarma. On pure space vs catalytic space. Theoretical Computer Science, 921:112–126, 2022. doi:10.1016/J.TCS.2022.04.005.
  • [7] Allan Borodin. On relating time and space to size and depth. SIAM journal on computing, 6(4):733–744, 1977. doi:10.1137/0206054.
  • [8] Allan Borodin, Stephen Cook, and Nicholas Pippenger. Parallel computation for well-endowed rings and space-bounded probabilistic machines. Information and control, 58(1-3):113–136, 1983. doi:10.1016/S0019-9958(83)80060-6.
  • [9] Allan Borodin, Stephen A. Cook, Patrick W. Dymond, Walter L. Ruzzo, and Martin Tompa. Two applications of inductive counting for complementation problems. SIAM J. Comput., 18(3):559–578, 1989. doi:10.1137/0218038.
  • [10] Harry Buhrman, Richard Cleve, Michal Kouckỳ, Bruno Loff, and Florian Speelman. Computing with a full memory: catalytic space. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 857–866, 2014.
  • [11] Harry Buhrman, Michal Kouckỳ, Bruno Loff, and Florian Speelman. Catalytic space: Non-determinism and hierarchy. Theory of Computing Systems, 62:116–135, 2018. doi:10.1007/S00224-017-9784-7.
  • [12] Richard Cleve. Computing algebraic formulas with a constant number of registers. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 254–257, 1988.
  • [13] Richard Erwin Cleve. Methodologies for designing block ciphers and cryptographic protocols. PhD thesis, University of Toronto, 1990.
  • [14] James Cook, Jiatu Li, Ian Mertz, and Edward Pyne. The structure of catalytic space: Capturing randomness and time via compression. ECCC TR24-106, 2024.
  • [15] James Cook and Ian Mertz. Catalytic approaches to the tree evaluation problem. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 752–760, 2020. doi:10.1145/3357713.3384316.
  • [16] James Cook and Ian Mertz. Encodings and the tree evaluation problem. In Electron. Colloquium Comput. Complex, volume 54, 2021.
  • [17] James Cook and Ian Mertz. Trading time and space in catalytic branching programs. In 37th Computational Complexity Conference (CCC 2022). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2022.
  • [18] James Cook and Ian Mertz. Tree evaluation is in space O(log n · log log n). In Electron. Colloquium Comput. Complex., 2023.
  • [19] Stephen Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branching programs for tree evaluation. ACM Transactions on Computation Theory (TOCT), 3(2):1–43, 2012. doi:10.1145/2077336.2077337.
  • [20] Stephen A Cook. The classification of problems which have fast parallel algorithms. In International Conference on Fundamentals of Computation Theory, pages 78–93. Springer, 1983. doi:10.1007/3-540-12689-9_95.
  • [21] Don Coppersmith and Edna Grossman. Generators for certain alternating groups with applications to cryptography. SIAM Journal on Applied Mathematics, 29(4):624–627, 1975.
  • [22] Samir Datta, Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Randomized and symmetric catalytic computation. In International Computer Science Symposium in Russia, pages 211–223. Springer, 2020. doi:10.1007/978-3-030-50026-9_15.
  • [23] Dean Doron, Edward Pyne, and Roei Tell. Opening up the distinguisher: A hardness to randomness approach for bpl= l that uses properties of bpl. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 2039–2049, 2024. doi:10.1145/3618260.3649772.
  • [24] Jeff Edmonds, Venkatesh Medabalimi, and Toniann Pitassi. Hardness of function composition for semantic read once branching programs. In 33rd Computational Complexity Conference (CCC 2018). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2018.
  • [25] Marten Folkertsma, Ian Mertz, Florian Speelman, and Quinten Tupker. Fully characterizing lossy catalytic computation. arXiv preprint arXiv:2409.05046, 2024. doi:10.48550/arXiv.2409.05046.
  • [26] Oded Goldreich. Solving tree evaluation in O(log n · log log n) space. ECCC, TR24-124, 2024.
  • [27] Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Unambiguous catalytic computation. In 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2019.
  • [28] Chetan Gupta, Rahul Jain, Vimal Raj Sharma, and Raghunath Tewari. Lossy catalytic computation. arXiv preprint arXiv:2408.14670, 2024. doi:10.48550/arXiv.2408.14670.
  • [29] Kazuo Iwama and Atsuki Nagao. Read-once branching programs for tree evaluation problems. ACM Transactions on Computation Theory (TOCT), 11(1):1–12, 2018. doi:10.1145/3282433.
  • [30] Michal Koucký et al. Catalytic computation. Bulletin of EATCS, 1(118), 2016.
  • [31] Michal Koucký, Ian Mertz, Ted Pyne, and Sasha Sami. Collapsing catalytic classes. In Electron. Colloquium Comput. Complex, volume 19, 2025.
  • [32] Jiatu Li, Edward Pyne, and Roei Tell. Distinguishing, predicting, and certifying: On the long reach of partial notions of pseudorandomness. In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), pages 1–13. IEEE, 2024. doi:10.1109/FOCS61266.2024.00095.
  • [33] David Liu. Pebbling arguments for tree evaluation. arXiv preprint arXiv:1311.0293, 2013. arXiv:1311.0293.
  • [34] Ian Mertz. Catalytic computing, tree evaluation, & clean computation, 2020.
  • [35] Ian Mertz et al. Reusing space: Techniques and open problems. Bulletin of EATCS, 141(3), 2023.
  • [36] Nicholas Pippenger. On simultaneous resource bounds. In 20th Annual Symposium on Foundations of Computer Science (sfcs 1979), pages 307–311. IEEE, 1979.
  • [37] Aaron Potechin. A note on amortized branching program complexity. arXiv preprint arXiv:1611.06632, 2016.
  • [38] Edward Pyne. Derandomizing logspace with a small shared hard drive. In 39th Computational Complexity Conference (CCC 2024). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024.
  • [39] Edward Pyne, Nathan S. Sheffield, and William Wang. Catalytic Communication. In Raghu Meka, editor, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025), volume 325 of Leibniz International Proceedings in Informatics (LIPIcs), pages 79:1–79:24, Dagstuhl, Germany, 2025. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ITCS.2025.79.
  • [40] Robert Robere and Jeroen Zuiddam. Amortized circuit complexity, formal complexity measures, and catalytic algorithms. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 759–769. IEEE, 2022.
  • [41] Walter L Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365–383, 1981. doi:10.1016/0022-0000(81)90038-6.
  • [42] Andrzej Schinzel. On a decomposition of polynomials in several variables. Journal de théorie des nombres de Bordeaux, 14(2):647–666, 2002.
  • [43] Claude E Shannon. The synthesis of two-terminal switching circuits. The Bell System Technical Journal, 28(1):59–98, 1949. doi:10.1002/J.1538-7305.1949.TB03624.X.
  • [44] H Venkateswaran. Circuit definitions of nondeterministic complexity classes. SIAM Journal on Computing, 21(4):655–670, 1992. doi:10.1137/0221040.
  • [45] Heribert Vollmer. Introduction to circuit complexity: a uniform approach. Springer Science & Business Media, 1999.
  • [46] Ryan Williams. Simulating time with square-root space. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing (STOC 2025), 2025.