Abstract 1 Introduction 2 Preliminaries 3 Circuit and Query Models 4 Exponential Polynomials 5 Algorithms for Identity Testing References

Identity Testing for Circuits with Exponentiation Gates

Jiatu Li ORCID Massachusetts Institute of Technology, Cambridge, MA, USA Mengdi Wu ORCID Carnegie Mellon University, Pittsburgh, PA, USA
Abstract

Motivated by practical applications in the design of optimization compilers for neural networks, we initiated the study of identity testing problems for arithmetic circuits augmented with exponentiation gates that compute the real function xex. These circuits compute real functions of form P(x)/P(x), where both P(x) and P(x) are exponential polynomials

i=1kfi(x)exp(gi(x)hi(x)),

for polynomials fi(x),gi(x), and hi(x).

We formalize a black-box query model over finite fields for this class of circuits, which is mathematical simple and reflects constraints faced by real-world neural network compilers. We proved that a simple and efficient randomized identity testing algorithm achieves perfect completeness and non-trivial soundness. Concurrent with our work, the algorithm has been implemented in the optimization compiler Mirage by Wu et al. (OSDI 2025), demonstrating promising empirical performance in both efficiency and soundness error. Finally, we propose a number-theoretic conjecture under which our algorithm is sound with high probability.

Keywords and phrases:
Polynomial Identity Testing, Exponential Polynomials
Funding:
Jiatu Li: received support from the National Science Foundation under Grant CCF-2127597.
Mengdi Wu: received support from NSF awards CNS-2147909, CNS-2211882, and CNS-2239351 and research awards from Amazon, Cisco, Google, Meta, NVIDIA, Oracle, Qualcomm, and Samsung.
Copyright and License:
[Uncaptioned image] © Jiatu Li and Mengdi Wu; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Circuit complexity
Related Version:
Full Version: https://arxiv.org/abs/2506.04529
Acknowledgements:
We are grateful to Seyoon Ragavan for mentioning the connection of our results to [11] and related works. We also thank Ryan O’Donnell and Ryan Williams for discussions about the identity testing algorithm and its analysis, and Zhihao Jia and Oded Padon for discussions about modeling the real-world problem.
Editor:
Shubhangi Saraf

1 Introduction

Polynomial Identity Testing (PIT) is a central problem of theoretical computer science. Given a multi-variate polynomial f:𝔽n𝔽 with certain properties, the goal of the problem is to verify whether f is identically zero.

Early results on identity testing of polynomials date back to DeMillo and Lipton [8], Zippel [26], and Schwartz [19] (see also [14] for a special case on finite fields). It is proved that a black-box probabilistic testing, namely checking whether f(x)=0 for a random element x𝔽n, works well for low-degree polynomials. This simple algorithm serves as a key step in many randomized algorithms (see, e.g., [22, 12, 1]) as well as probabilistic proof systems, including the celebrated interactive proof system for 𝖯𝖲𝖯𝖠𝖢𝖤 [20] and the PCP theorem [3]. Subsequently, there has been a rich literature on identity testing of various types of polynomials, including sparse polynomials and polynomials computed by depth-3 algebraic circuits (see, e.g., [17, 18] and the references therein).

Perhaps surprisingly, the probabilistic identity testing algorithm for low-degree polynomials has recently been found useful in developing optimization compilers for deep neural networks. Loosely speaking, deep neural networks are represented as a high-level programming language, and the optimization compiler is designed to automatically identify redundancy in programs to improve the efficiency of the neural network. In a recent optimization compiler PET [24], Wang et al. observed that redundancy detection in partial neural network computation without non-linear activation can be naturally modeled as black-box identity testing of low-degree polynomials. By applying the classical probabilistic testing algorithm [8, 26, 19], PET achieves a significant speed-up over its benchmark.

 Remark 1 (black-box model).

In the black-box setting, the algorithm is given oracle access to the polynomial rather than its description. For the optimization compilers of neural networks (e.g., [24]), the neural networks are gigantic and are usually loaded into optimization devices such as GPUs or TPUs. Black-box identity testing algorithms are favorable, as evaluation on (partial) neural networks exploits the optimization devices and is therefore efficient in practice. In contrast, algorithms analyzing the circuit in white-box, even when they have lower computational complexity in theory, may still be less efficient if they cannot make full use of optimization devices.111Moreover, black-box algorithms are much easier to implement, as one can reuse the existing software infrastructure for evaluating neural networks, making them more reliable while reducing human efforts.

A natural question is, therefore, whether this approach can be extended to neural networks with non-linear activation. In a subsequent work, Wu et al. [25] introduced a framework, Mirage, that could benefit from extending the redundancy detection algorithm in PET [24] to neural network components with exponentiation operators. This enables the modeling of non-linear activation functions such as 𝗌𝗈𝖿𝗍𝗆𝖺𝗑(). In the language of circuit complexity, the program representation in Mirage can be modeled as integer-coefficient arithmetic circuits with “exponentiation gates”, where there is at most one exponentiation gate in each path from output gates to input gates (see Section 3.1). The goal of Mirage is to “efficiently” check whether a circuit C:n of such form is identical to 0 on in a black-box query model that is practically satisfactory.

Compared to polynomials and standard arithmetic circuits computing polynomials, the behavior of circuits with exponentiation gates is not well studied. Integer-coefficient polynomials can be naturally evaluated over 𝔽p with modular arithmetic. However, it is not a-prior clear what it the correct way to define the evaluation of circuits with exponentiation gates over finite fields, as there is no standard analogy of the exponentiation function in finite fields. This makes it hard to even define a suitable “black-box query model” for the identity testing problem that is easy to analyze and captures the real-world constraints. Moreover, it is unclear whether the standard randomized algorithm [8, 19, 26] works after we define the evaluation of such circuits.

Our Contribution.

Motivated by the practical applications, we initiate a theoretical analysis of the identity testing of circuit with exponentiation gates.

  • We introduce a natural circuit model that formalizes circuits with exponentiation gates xexp(x), which suffices to captures components of modern neural networks such as Attention [23] with certain non-linear activation functions. The circuit model generalizes the standard algebraic circuit model by allowing exponentiation gates.

  • We prove that in the idealized model where the algorithm could query the circuit on any real number, a simple probabilistic testing is correct with high probability.

  • We introduce an algebraic query model that captures the real-world constraints in optimization compilers for deep neural networks such as Mirage [25], and design a simple randomized algorithm that is perfectly complete and non-trivially sound. Concurrent with our work, this algorithm is implemented in Mirage as one of its two redundancy detection algorithms. We also introduce a conjecture about sparse polynomials in finite field, under which our randomized algorithm in algebraic query model is sound with high probability.

1.1 Main Results: A Simplified Setting

Before formally describing our circuit and query models in Section 3, we try to describe a simplified version of the main mathematical problem. Let k, fi,gi,hi be n-variate degree-d polynomials with integer coefficients in [w,w], and P:n{} be a partial real function defined as

P(x)i=1kfi(x)exp(gi(x)hi(x)),

where exp(x)ex. This function is similar to the exponential polynomials that are used in, e.g., transcendental number theory [4, Chapter 12]. For simplicity, we will call P(x) a degree-d exponential polynomial. The number of terms k in P(x) is said to be the width of P(x). We call {fi(x)}i[k] the coefficient polynomials of P(x), and {gi(x)/hi(x)}i[k] the exponent fractions of it.

The algorithmic problem we are interested in is to check whether P(x) is identically zero on n with queries of one of the following two forms:

  • (Real): Given xn, the oracle outputs {0,1,} indicating that P(x)=0, P(x){0}, and P(x)= (i.e. undefined due to division-by-zero), respectively.

  • (Algebraic): Let p,q be prime numbers such that qp1, G=g be the unique multiplicative subgroup of 𝔽p of order q. Given u𝔽pn,v𝔽qn, and aG, the oracle outputs

    Pa(u,v)(i=1kfi(u)agi(v)(hi(v))1modq)modp,

    where ()1 on the exponent denotes the multiplicative inverse in 𝔽q. The oracle returns if hi(v)modq=0 for any i[k].

Before introducing our results, we make a few short remarks on our modeling of the identity testing problem and its connection to real world engineering practice.

 Remark 2 (numerical stability issues).

Testing whether P(x) (modeling a program) is identically zero on n is motivated by the following compilation task – compilers may replace a (sub)program by another optimized one provided that they compute the same function over real numbers. As real numbers are only an idealized model for, say, floating point numbers, this optimization strategy may cause numerical stability issues. Nevertheless, most modern classical and neural network compilers offer the option for users to enable such optimizations as it may significantly improve performance, e.g., the -Ofast option of GCC [21, Chapter 3] and the FlashAttention [7] implementation in PyTorch [15, 16]; see also [10, 24, 25].

 Remark 3 (query models).

The former type of query is an idealized model. In practical applications (e.g. [24, 25]), there is no efficient algorithm to implement the oracle that checks whether P(x)=0 precisely given x, and the approximation of real numbers using floating-point numbers is also unsatisfactory in practice. The latter type of query is more realistic; indeed, it is implemented in the optimization compiler Mirage [25] where the explicit representation of P(x) is a (partial) neural network.222The special case of the algebraic query model without exponentiation functions is implemented in [24]. See Section 1.2 and 3 for more detailed discussions.

Identity Testing Algorithms in the Simplified Setting.

Now we describe our identity testing algorithms in the simplified setting. First, we show that a simple randomized algorithm with one query works in the real query model. The algorithm is to randomly sample x[B]n for B=20dk2 and accepts if P(x){0,}. It is clear that the algorithm is perfectly complete, and the soundness is given by the following theorem:

Theorem 4 (Theorem 22, simplified).

Let P(x) be an n-variate degree-d exponential polynomial of width k that is not identically zero on its domain, and S be any finite non-empty set. For xSn sampled uniformly at random, Pr[P(x){0,}]8dk2/|S|.

As the second result, we analyze the following simple randomized identity testing algorithm in the algebraic query model: Let p,q,G be defined as above, the algorithm randomly samples u𝔽pn, v𝔽qn, and aG, and accepts if Pa(u,v){0,}. The following theorem formalizes the completeness and soundness of the algorithm:

Theorem 5 (Theorem 7, simplified).

Let P(x) be an n-variate degree-d exponential polynomial of width k. Let p,q be prime numbers, qp1, and G=g be the unique order-q multiplicative subgroup of 𝔽p. Suppose that all integer coefficients in P(x) are within [w,w], q>2w, then:

  • (Completeness). If P(x) is identically zero on its domain as a partial real function, for any aG, u𝔽pn, and v𝔽qn, Pa(u,v){0,}.

  • (Soundness). If P(x) is not identically zero on its domain as a partial real function, p,q2kw, then for uniformly random aG, u𝔽pn, and v𝔽qn, the probability that Pa(u,v){0,} is at most q1k1+O(dk2/q).

Note that for sufficiently large k,q such that lnqk1, we have

12lnqk1q1k11lnq2(k1).

Thus in a typical setting that dk3qkO(1), the soundness error is 1Θ(logk/(k1)).333The error probability is constant if q is exponential in k. However, this setting is not meaningful: The width k is usually comparable to the input length, and thus the arithmetic operations over q would be extremely inefficient. By parallel repetition of the randomized algorithm for O(klogk/logq) times, we can boost the error probability to 1/kO(1). This leads to an efficient randomized identity testing algorithm when k is relatively small and the evaluation is much more efficient than obtaining the description of P(x), which, in particular, captures the real-world constraints for the optimization compilers such as Mirage [25].

1.2 Connection to the Circuit Models

We stress that the abstraction in previous subsection is mathematically clean but omits crucial details in modeling the real-world problem. Specifically:

  • It is not immediately clear how the identity testing results for exponential polynomials apply to our motivating real-world application, namely identity testing of neural networks formalized by “circuits with exponentiation gates” [24, 25].

  • It is also not clear how the function Pa(u,v) in Theorem 7 relates to the circuits, and why it can be computed efficiently when P(x) is explicitly given by a “circuit with exponentiation gates”. This is important as in real-world applications, see [25], the function Pa(u,v) is implemented on specialized devices such as GPU or TPU, which are optimized for specific computation patterns. Moreover, it is unclear where the restrictions in the algebraic query model (e.g., a must be in G) come from.

Circuit Model.

We briefly explain our circuit model (called 𝖠𝖤𝗑𝗉1 circuits) and refer readers to Sections 3 and 4 for more details. We work with (arithmetic) circuits with integer coefficients, unbounded fan-in addition and multiplication gates, fan-in two division gates, and fan-in one exponentiation gates. In addition, we impose the restriction that on each path from input variables to output gates, there is at most one exponentiation gates – the circuit cannot compute double exponential xeex by composing exponentiation gates.

The evaluation of circuits over real numbers is defined by a standard gate-by-gate evaluation algorithm, where the exponentiation gate is interpreted as the function xex. However, it is unclear how to define the evaluation of such circuits over finite fields, as there is no standard interpretation of the exponential function.

Let p be a prime number. To define the evaluation of an 𝖠𝖤𝗑𝗉1 circuit C over 𝔽p, a natural idea is to replace the exponential function by xax for some element a𝔽p. This is not ideal as such an interpretation is inconsistent with the evaluation over : It could be the case that

ax1+x2modpax1ax2(modp)

for x1,x2𝔽p, while ex1+x2=ex1ex2 for any x1,x2. As a result, this definition cannot be used for the identity testing of 𝖠𝖤𝗑𝗉1 circuits.

To address the issue, we exploit the algebraic structure of finite fields by using different moduli over and under the exponents. Let p,q be prime numbers such that qp1, G𝔽p be the unique order-q multiplicative subgroup, and aG. It follows that

ax1+x2modqax1ax2(modp)

for any x1,x2𝔽q. If we use q as the modulus “over the exponent” and p as the modulus “under the exponent”, the function xax will be consistent with the arithmetic law ex1+x2=ex1ex2. Following the intuition, we define the evaluation of C with respect to (p,q,a) using a gate-by-gate evaluation algorithm such that:

  • Each input variable or wire carries a pair (u,v)(𝔽p{})×(𝔽q{}).

  • Addition, multiplication, and division gates are implemented coordinate-wisely.

  • The exponentiation gate is implemented by (u,v)(avmodp,).

This evaluation algorithm can be implemented efficiently given the description of the circuit C. We refer readers to Section 3 for more details.

A Structural Lemma.

Let u=(u1,,un)𝔽pn, v=(v1,,vn)𝔽qn, we use Ca(u,v) to denote the output of the evaluation algorithm where the i-th input variable is assigned to (ui,vi). Similar to the standard results that algebraic circuits compute polynomials [2, Section 16], we can then prove that 𝖠𝖤𝗑𝗉1 circuits can be converted to an equivalent fraction of exponential polynomials:

Lemma 6 (Lemma 20, informal).

For every n-input 𝖠𝖤𝗑𝗉1 circuit C, there are n-variate integer-coefficient exponential polynomials P(x) and P(x) such that the following holds:

  • For each u, C(u)=P(u)/P(u).

  • Let p,q be prime numbers such that qp1, Gp,q be the multiplicative subgroup of 𝔽p of order q, and aGp,q. Then for every (u,v)𝔽p×𝔽q, Ca(u,v)Pa(u,v)(Pa(u,v))1(modp), where (a)1 denotes the multiplicative inverse of a modulo p.

We stress that the structural lemma does not provide a polynomial upper bound on the degree, width, or integer weight of the integer-coefficient exponential polynomials P,P. Nevertheless, it can be verified that the exponential polynomials P,P for 𝖠𝖤𝗑𝗉1 circuits from real-world neural network applications, such as 𝗌𝗈𝖿𝗍𝗆𝖺𝗑() and Attention with softmax activations [23], tend to have relatively small degree, width, and weight; interested readers are referred to Section 4.5 for more discussion, and [25] for experimental results.

The Main Theorem.

Let 𝖽𝗈𝗆(C) be the domain of the circuit C, i.e., the set of xn such that C(x). An 𝖠𝖤𝗑𝗉1 circuit C is said to have degree d, width k, and weight w if there are integer-coefficient degree-d width-k exponential polynomials P(x) and P(x) with all integer coefficients within [w,w] that satisfy Lemma 6.

By generalizing Theorems 4 and 5 to fractions of exponential polynomials and combining them with Lemma 6, we can obtain the final result:

Theorem 7.

Let C:n be an 𝖠𝖤𝗑𝗉1 circuit of width k, degree d and weight w, p and q be prime numbers such that qp1 and q>2(kw)2. Let Gp,q be the unique multiplicative subgroup of 𝔽p of order q. The following hold:

  • (Completeness). If C is identically zero on 𝖽𝗈𝗆(C), then for every aGp,q, Ca(u,v){0,} for every (u,v)𝔽pn×𝔽qn.

  • (Soundness). If C is not identically zero on 𝖽𝗈𝗆(C), then for uniformly random u𝔽pn,v𝔽qn,aGp,q, Pr[Ca(u,v){0,}]18dk4q1q1/(k21).

1.3 Technical Overview

We briefly explain the proof of our simplified technical results Theorem 4 and Theorem 5 in the simplified setting, as well as the main theorem (see Theorem 7) that generalizes the results to the circuit setting.

Real Queries.

The idea behind Theorem 4 is very simple: We manage to combine the Schwartz-Zippel lemma and Lindemann-Weierstrass theorem, i.e., e is transcendental (see Theorem 13).

To avoid technical subtlety, we assume that P(x) is a degree-d exponential polynomial where the exponent fractions gi(x)/hi(x) are pairwisely distinct; that is, for every pair of indices i,j[k], ij, gi(x)hj(x)gj(x)hi(x) is not a zero polynomial. Such an exponential polynomial is said to be condensed, and the general case can be reduced to the condensed case by considering another exponential polynomial P^ that “groups” coefficients based on the exponent fraction; see Section 4.3. Moreover, we assume without loss of generality that fi0 and hi0 for each i[k].

Let S be a non-empty finite set, we know by the Schwartz-Zippel lemma that for any i[k], j[k], ij, each of the following non-zero polynomials of degree at least 2d

fi(x),hi(x),gi(x)hj(x)gj(x)hi(x)

evaluates to zero with probability at most 2d/|S| for a uniformly random xSn. By the union bound, we can further prove that with probability at least 13dk2/|S|, none of the polynomials evaluate to 0. For each of such x, we can see that for the rational numbers αigi(x)/hi(x) and βifi(x), we have

P(x)=β1eα1+β2eα2++βkeαk,

which must be non-zero as e is transcendental.

Connection to Algebraic Queries.

Unwinding the proof of Theorem 4, we could actually give a necessary and sufficient condition for an exponential polynomial with integer weights to be identically zero over real evaluations. Let P(x) be a condensed degree-d exponential polynomial

P(x)=i=1kfi(x)exp(gi(x)hi(x))

with integer coefficients, then P(x) is identically zero on its domain (over real evaluation) if and only if for every i[k], fi=0. Subsequently, if p,q are chosen to be larger than the integer coefficients, the identity testing of a condensed exponential function P(x) is equivalent to testing whether fi=0(modp) for every i[k].

With the characterization above, the completeness of Theorem 5 is relatively simple, so we will focus on the soundness property.

We try to mimic the strategy in the proof of Theorem 4. Suppose that P(x) is not identically zero on real evaluations, we know by the discussion above that when p are sufficiently large, there is at least one fi0(modp) for any i[k]. Moreover, if q is sufficiently large, we know that for any ij, gihjgjhi0(modq). Assume without loss of generality that fi0 for all i[k]. For u𝔽pn and v𝔽qn sampled uniformly at random, we still know by the Schwartz-Zippel lemma and the union bound that with probability at least 13dk2/q, none of

fi(x)modp,hi(x)modq,gi(x)hj(x)gj(x)hi(x)modq

evaluates to zero for i,j[k], ij. We can then define αigi(x)/hi(x)modq and βifi(x)modp for i[k] such that

Pa(u,v)=β1aα1+βkaαkmodp,

where β1,,βk are non-zero and α1,,αk are pairwisely distinct.

A Weak Descartes’ Rule in Finite Fields.

It then suffices to prove that for any non-zero β1,,βk𝔽p and distinct α1,,αk𝔽q, k1, if we sample aG uniformly at random, the probability that β1aα1+βkaαkmodp=0 is at most q1/(k1). We note that this can be viewed as a generalization of the Descartes’ Rule (see also [11]), which asserts that any univariate polynomial with k monomials has at most 2k roots.

We provide an elementary proof of a weaker result: The probability that β1aα1+βkaαkmodp=0 is at most 11/k. Theorem 5 is proved using a lemma from [11], which employs a similar but slightly more complicated argument.

Recall that the order-q multiplicative subgroup G𝔽p is a cyclic group. For any fixed i𝔽q, a random element aG can be viewed as generated from randomly sampling gG and outputting gi. Our idea is to choose a good i𝔽q such that at least a 1/k fraction of g satisfies that β1giα1+βkgiαkmodp0.

We say that an index i𝔽q is good for α𝔽q if iαmodq(11/k)q, and is said to be good if for every j[k], i is good for αi. It turns out that if there is a good i𝔽q,

β1giα1++βkgiαk

can be viewed as a polynomial in 𝔽p[g] of degree at most (11/k)q; in that case, there are at most (11/k)q roots in 𝔽p (and also in G).

Therefore, it remains to prove the existence of a good i. Since iiα is a bijection in 𝔽q, we know that for any fixed α, the probability that a random i𝔽q is good for α is at least 11/k. By the union bound, we know that a random i𝔽q is good for α1,,αk with non-zero probability. This implies that there must be a good i𝔽q, which completes the proof.

 Remark 8 (related results).

The analogy of Descartes’ Rule over finite fields has been studied prior to our work. Motivated by understanding the security of the Diffie-Hellman cryptosystem, Canettie et al. [5] proved an upper bound on the number of roots of sparse univariate polynomials. This upper bound was later improved by Kelley [11]. We use the techniques from [5, 11] and obtain similar upper bounds. Note that [5, 11] considers the number of roots in 𝔽p, while we consider the number of roots in a multiplicative subgroup G𝔽P; as a result, our upper bound is cleaner and easier to prove.

Generalization to the Circuit Setting.

There are a few technical issues to obtain the main theorem (see Theorem 7).

First, the proof overview above assumes that the exponential polynomial is condensed, i.e., its exponent fractions are pairwisely distinct. For exponential polynomials that are not condensed, we need to first condense the polynomial by merging terms with equivalent exponent fractions. For instance, the following exponential polynomial P(x)=exp(x23xx3)+exp(x22xx2) may be condensed to P^(x)=2exp(x23xx3). In general, such condensation procedure results into another exponential polynomial P^ that has larger domain and agree with P on the domain of P (see Proposition 19). We need to bridge the gap between P and P^ with standard probability analysis.

Second, as shown in Section 1.2, 𝖠𝖤𝗑𝗉1 circuits are converted to fractions P/P of exponential polynomials rather than exponential polynomials. This requires a careful (but straightforward) adaption of the techniques above. In particular, in the soundness analysis, we use the observation that P/P is identically zero on its domain (over ) if and only if the exponential polynomial PP is identically zero on its domain (over ). This leads to a quadratic overhead (in k) in the soundness error of Theorem 7 compared to Theorem 5, as PP may have width up to k2 when both P and P are of width k.

Organization of the Paper.

We review basic definitions and classical results that we will need in Section 2. In Section 3, we formally describe our circuit model as well as the query models we considered in our paper – the idealized real query model and the algebraic query model; we also briefly explain why the algebraic query model is a better abstraction in the application of optimization compilers for neural networks. In Section 4, we define exponential polynomials, discuss its basic properties, and prove the structural lemma that converts circuits to fractions of exponential polynomials. In Section 5, we prove the correctness of our probabilistic testing algorithm in the real and algebraic query models.

1.4 Discussion and Open Problems

The most interesting open problem is to improve the soundness error of Theorem 5 and Theorem 7. We conjecture that the actual soundness error is 1Ω(1). In particular, we propose the following number-theoretic conjecture under which the soundness error is indeed 1Ω(1):

Conjecture 9 (Strong Descartes’ Rule Conjecture over Finite Fields).

There are constants ε,δ<1 such that the following holds. Let p, q be sufficiently large prime numbers such that qp1, g𝔽q be an element of order g, and Gg be the unique multiplicative subgroup of 𝔽p of order q. For any kqδ, distinct α1,,αk𝔽q, and β1,,βk𝔽p{0}, the non-zero univariate polynomial

f(z)β1zα1+β2zα2++βkzαk

has at most εq roots in G.

This conjecture can be interpreted as a property of the isomorphism mapping

I:[f](f(1),f(g),,f(gq1))

from 𝔽p[x]/(xq1) onto 𝔽pq. It states that there is no non-zero polynomial [f]𝔽p[x]/(xq1) such that both f and I([f]) are very sparse. This seems to be an analogy of the “uncertainty principle” in Fourier analysis (see, e.g., [13, Exercise 3.15]). We also note that numerical experiments (see, e.g., [11, 6]) suggest that it is hard to construct sparse polynomials over 𝔽p with many roots for prime p.

Complexity-theoretic Perspectives.

It is also interesting to consider whether we can design better identity testing algorithms for some restricted classes of 𝖠𝖤𝗑𝗉1 circuits (e.g. of small constant depth). This would potentially lead to real-world applications, as the structure of circuits from neural networks is relatively simple. To start with, one may consider adapting existing techniques for identity testing of algebraic circuits (see, e.g., [17, 18]) to 𝖠𝖤𝗑𝗉1 circuits.

On the other hand, it is interesting to consider whether there are conditional or unconditional lower bounds for identity testing of 𝖠𝖤𝗑𝗉1 circuits in black-box models, such as the algebraic query models that we introduced in Section 1.2.

Other Modeling of the Problem.

We note the the algebraic query model in Section 1.2 is not necessarily the only reasonable formalization of the real world problem. Recall that in optimization compilers for neural networks [24, 25], the neural network (modeled by an 𝖠𝖤𝗑𝗉1 circuit) is loaded into specialized devices (such as GPU or TPU) that are optimized for specific computation patterns and have high communication overhead with the CPU. Algorithms in other black-box models are potentially useful for real-world applications if:

  1. 1.

    the queries can be efficiently implemented on those specialized devices; and

  2. 2.

    the communication overhead is small.

We are not aware of other natural models that satisfies these constraints for identity testing of 𝖠𝖤𝗑𝗉1 circuits, and it remains an interesting open problem that may require joint efforts of system and theory communities. Note that it may make it possible to design better identity testing algorithms if we work with another black-box query model that has better mathematical properties.

2 Preliminaries

2.1 Abstract Algebra

We assume basic familiarity to elementary ring and field theory (see, e.g., [9]). We will use the standard notation: R[x1,,xm] denotes the ring of m-variate polynomials with coefficients in R; for any ideal I in R, R/I denotes the quotient ring of R modulo I; 𝖰𝗎𝗈𝗍(R) denotes the quotient field (i.e. the field of fraction) extending an integral domain R. For any prime p and u𝔽p{0}, we use 𝖨𝗇𝗏p(u) to denote the multiplicative inverse of u modulo p.

Recall that an integral domain is a non-zero commutative ring where the multiplication of two non-zero elements is non-zero. We will need the following result for polynomials.

Lemma 10 ([9, Proposition 1 of Section 9.1]).

For any integral domain R, the ring of R-coefficient multi-variate polynomials R[x1,,xm] is also an integral domain.

2.2 Schwartz-Zippel Lemma

Lemma 11 ([8, 26, 19]).

Let R be an integral domain and SR be a finite subset of R. For any m,d and any non-zero m-variate polynomial f:Rm of total degree d

PrxSm[f(x)=0]d|S|,

where x=(x1,,xm) is uniformly sampled from Sm.

2.3 Transcendental Number Theory

Definition 12.

A complex number α is called algebraic if it is the root of a non-zero integer-coefficient polynomial of finite degree, and called transcendental if it is not algebraic.

Algebraic numbers, denoted by ¯, is a sub-field of . In particular, any rational number α=p/q is algebraic as it is the root of the degree-1 integer-coefficient polynomial qxp.

Theorem 13 (Lindemann–Weierstrass Theorem [4, Theorem 1.4]).

If α1,,αn¯ are distinct algebraic numbers, eα1,,eαn are linearly independent over the field ¯ of algebraic numbers. In particular, e is transcendental.

3 Circuit and Query Models

3.1 Definition of the Circuit Model

The circuit model we introduce next extends the standard arithmetic circuit model by exponentiation gates that is intended to model the exponential function exp(x)ex over real numbers. Formally, an 𝖠𝖤𝗑𝗉1 circuit is a DAG consisting of vertices that are either an input variable or a gate of the following types (all gates are of fan-out 1):

  1. 1.

    constant gates 𝖢a of fan-in 0 intended to model a fixed integer a;

  2. 2.

    addition gates Σ(x1,,xm) of unbounded fan-in that are intended to model the addition of real numbers, i.e., x1++xm;

  3. 3.

    multiplication gates Π(x1,,xm) of unbounded fan-in that are intended to model the multiplication of real numbers, i.e., x1x2xn.

  4. 4.

    division gates 𝖥𝗋𝖺𝖼(x,y) of fan-in 2 intended to model the division of real numbers, i.e., x/y;

  5. 5.

    exponentiation gates 𝖤𝗑𝗉(x) of fan-in 1 intended to model the exponential function, i.e., ex.

In addition, we do not allow compositions of exponentiation gates in 𝖠𝖤𝗑𝗉1 circuits; that is, for any 𝖠𝖤𝗑𝗉1 circuit, there is at most one exponentiation gate along any path from an input variable to an output gate.444Similarly, one can define 𝖠𝖤𝗑𝗉k circuits where there are at most k exponential gates along any such path. In this work, we focus on 𝖠𝖤𝗑𝗉1 circuits as it is natural and is more relevant to the practical motivation of this work. We assume by default that an 𝖠𝖤𝗑𝗉1 circuit has only one output gate, though the definitions and our results can be easily generalized to multi-output circuits.

Evaluation of 𝗔𝗘𝘅𝗽𝟏 Circuits over Real Numbers.

The evaluation of 𝖠𝖤𝗑𝗉1 circuits over real numbers should be clear through the definition. Let C be any n-input 𝖠𝖤𝗑𝗉1 circuit and xn, the evaluation of C(x) is defined as the output of the output gate, where the output values of gates are defined by gate-by-gate evaluation following a topological order.

In particular, if the divisor of a division gate is zero, C(x) is undefined. Therefore, an 𝖠𝖤𝗑𝗉1 circuit may compute a partial function C:n. We define the domain of a C(x) circuit on , denoted by 𝖽𝗈𝗆(C), as the set of xn such that C(x) is defined. For any x𝖽𝗈𝗆(C), we may also say C(x)=.

Practical Motivation: Modeling Components in Neural Networks.

Our circuit model is a straightforward formalization of the program representation in Mirage [25]. Intuitively, the motivation to introduce 𝖠𝖤𝗑𝗉1 circuit is to model components that are widely used in model artificial neural networks that consist of non-linear activation functions.

3.2 Query Model over Reals, and Why it is not Satisfactory

Since 𝖠𝖤𝗑𝗉1 circuits compute functions over , a natural idealized model for the identity testing problem is that the algorithm can evaluate the circuit over any real input. Formally, a identity testing algorithm for 𝖠𝖤𝗑𝗉1 circuits C in real query model is allowed to query the following oracle:

  • (Evaluation). Given an input xn, the evaluation oracle reports whether C(x) is undefined, C(x)=0, or C(x){0}.

The main problem of the real query model is that computers cannot deal with real numbers. For real-world applications, real numbers are usually approximated by floating-point numbers, which does not satisfy arithmetic laws such as the commutativity and associativity of addition. This leads two-sided errors in the implementation of a testing algorithm in real query model:

  • (Completeness). Even if C is identically zero over , it may evaluate to a non-zero value on some inputs when the evaluation queries are implemented in float-point numbers.

  • (Soundness). Even if C is not identically zero, it may evaluate to 0 on any input when evaluation queries are implemented in float-point numbers, either because of the precision issue or because of the failure of arithmetic laws.

The occurrences of errors on both sides reduce the consistency and reliability of the testing algorithms. Take the example of optimization compilers PET [24] or Mirage [25] that aim to detect redundancy in neural networks modeled as 𝖠𝖤𝗑𝗉1 circuits. Two sided error will make the algorithms less predictable and reliable to users; indeed, it could be possible that the oracle will hardly ever report C(x){0} even if C(x) is defined and non-zero for most or all x due to the floating-point issue. In that case, the optimization compiler will barely find any redundancy. It is worth noting that the error is on top of the completeness and soundness error of the identity testing algorithm, so it cannot be resolved by a better (e.g., deterministic) testing algorithm.

3.3 Query Model over Finite Fields

Next, we introduce a natural query model for the identity testing problem of 𝖠𝖤𝗑𝗉1 circuits over finite fields that can be implemented in real-world applications without the precision issue.

Evaluation over Finite Fields.

Let p,q be two primes such that qp1, and Gp,q be the (unique) multiplicative subgroup of 𝔽p of order q. Equivalently, Gp,q contains the roots of the univariate polynomial zq1 over 𝔽p. Loosely speaking, we will define the evaluation of C on the input (u,v)𝔽pn×𝔽qn and aGp,q (modulo (p,q)) by

  • interpreting the exponentiation gate by the function xaxmodp,

  • implementing all computation “on the exponent” in 𝔽q,

  • implementing all computation “under the exponent” in 𝔽p,

and evaluating the circuit gate by gate.

We use Ca(u,v) to denote the evaluation above; see the full version for its formal definition.

Similar to the evaluation of 𝖠𝖤𝗑𝗉1 circuits on , Ca(u,v) may be undefined, denoted by Ca(u,v)=, due to division-by-zero. For each aGp,q, we define the domain of C with respect to a modulo (p,q), denoted by 𝖽𝗈𝗆p,q,a(C), as {(u,v)𝔽pn×𝔽qnCa(u,v)}.

Algebraic Query Model.

Subsequently, we define the identity testing problem for an 𝖠𝖤𝗑𝗉1 circuit C over the algebraic query model as follows. Let p,q,Gp,q be specified above. The algorithm for identity testing is allowed to query the following oracle:

  • (Evaluation). Given (u,v)𝔽pn×𝔽qn and aGp,q, the oracle reports Ca(u,v)𝔽p{}.

This query model is efficient both theoretically and practically. It can be observed, for instance, that the oracle in the algebraic query model can be implemented by algorithms in O(s(logp+logq)) space and polynomial time. In practice, the simplicity of the evaluation model makes it possible to implement it in the neural network compiler scenario (see [24, 25] for more details).

4 Exponential Polynomials

To understand the functions computed by 𝖠𝖤𝗑𝗉1 circuits, we will define exponential polynomials that, intuitively, generalize (multi-variate) polynomials by allowing terms of form exp(). Formally:

Definition 14 (exponential polynomial).

Let R be an integral domain and d,m be integers, d0, m1. An m-variate R-coefficient exponential polynomial P(x) of degree d is defined as

P(x)f1(x)exp(g1(x)h1(x))+f2(x)exp(g2(x)h2(x))++fk(x)exp(gk(x)hk(x)),

where x=(x1,,xm) denotes the indeterminates, fi, gi and hi are m-variate polynomials of degree d with coefficients from R.

The number of terms k is said to be the width of P(x), {fi(x)}i[k] are said to be the coefficient polynomials of P(x), and {gi(x)/hi(x)}i[k] are said to be the exponent fractions of P(x).

In this paper, we only use the special case where R=, i.e., integer-coefficient exponential polynomials. Nevertheless, we will develop the elementary arithmetic of exponential polynomials with respect to an arbitrary integral domain R for coefficients.

4.1 Basic Arithmetic Properties

We stress that an exponential polynomial should be considered as an abstract expression rather than a function. In particular, exp() should be considered as a symbol instead of the exponential function over . For simplicity, we will also use the summation symbol to define an exponential polynomial, i.e.,

P(x)i=1kfi(x)exp(gi(x)hi(x)),

where the summation symbol is a shorthand of the k-term summation in Definition 14.

We first define the addition and multiplication of exponential polynomials. Let R be a ring and P(x),P(x) be R-coefficient exponential polynomials defined as

P(x)i=1kfi(x)exp(gi(x)hi(x)),P(x)i=1kfi(x)exp(gi(x)hi(x)). (1)

We can naturally define the addition and multiplication of P(x) and P(x) as:

P(x)+P(x)i=1kfi(x)exp(gi(x)hi(x))+i=1kfi(x)exp(gi(x)hi(x)). (2)
P(x)P(x)i=1kj=1kfi(x)fj(x)exp(gi(x)hj(x)+gj(x)hi(x)hi(x)hj(x)). (3)

Next, we consider the arithmetic laws for exponential polynomials. As hinted at in the definitions of addition and multiplication, both operations are commutative and associative, and multiplication is distributive over addition. Moreover, it is implicit in the definition of multiplication that exponentiation symbol satisfies

exp(g(x)h(x))exp(g(x)h(x))=exp(g(x)h(x)+g(x)h(x)h(x)h(x)). (4)

Furthermore, we impose the following axiom that allows merging terms with the same exponent fraction:

f(x)exp(g(x)h(x))+f(x)exp(g(x)h(x))=(f(x)+f(x))exp(g(x)h(x)). (5)

We say that P(x) and P(x) are identical, denoted by P(x)=P(x), if they can be transformed to each other using the arithmetic laws above.

We note that the following two exponential polynomials

P1(x)=exp(x);P2(x)=exp(x22xx2)

are not identical as the division law is not allowed in the exponentiation symbol. This is intentional as in the evaluation of 𝖠𝖤𝗑𝗉1 circuits, we will only evaluate the circuit gate by gate without trying to simplify the circuit using the division law.

4.2 Evaluation of Exponential Polynomials

Similar to standard multivariate polynomials, we can define the evaluation of exponential polynomials that explains how to view such abstract expressions as functions. In particular, we will introduce two definitions corresponding to the real evaluation model and algebraic evaluation model of 𝖠𝖤𝗑𝗉1 circuits.

Let P(x) be an n-variate integer-coefficient exponential polynomial defined by the coefficient polynomials {fi}i[k] and exponent fractions {gi/hi}i[k].

Evaluation of Exponential Polynomials on Real Numbers.

We can view P(x) as a function P():n as follows. Given an input un, the evaluation of P(x) on the input u is defined as

P(u)i=1kfi(u)exp(gi(u)hi(u))

where all the operators (i.e., additions, divisions, multiplications, and exponentiations) are interpreted as corresponding functions in . If for some i[k], hi(u)=0, the exponential polynomial is said to be undefined on u. The domain of P(x) over , denoted by 𝖽𝗈𝗆(P), is defined as 𝖽𝗈𝗆(P){unhi(u)0i[k]}.

Evaluation of Exponential Polynomials on Finite Fields.

Similar to the algebraic query model for 𝖠𝖤𝗑𝗉1 circuits, we require two finite fields for the computation under and over exponents to define the evaluation of P over finite fields. Let p,q be two primes such that qp1, and Gp,q be the multiplicative subgroup of 𝔽p of order q. Given inputs u𝔽pn,v𝔽qn, and aGp,q, the evaluation of P on u and u, denoted by Pa(u,v), is defined as:

Pa(u,v)(i=1kfi(u)agi(v)(hi(v))1modq)modp.

Note that Pa(u,v) is undefined, denoted by Pa(u,v)=, if hi(v)modq=0 for some i[k]. The domain of P(x) over (p,q,a) is defined as

𝖽𝗈𝗆p,q,a(P(x)){(u,v)𝔽pn×𝔽qnhi(u)0modqi[k]}.
Proposition 15.

Let P(x) and P(x) be identical integer-coefficient exponential polynomials, p,q be two primes such that qp1. For any a in the multiplicative subgroup of 𝔽p of order q, we have that 𝖽𝗈𝗆p,q,a(P)=𝖽𝗈𝗆p,q,a(P) and Pa(u,v)=Pa(u,v) for any u𝔽pn,v𝔽qn.

4.3 Condensation of Exponential Polynomials

Recall that two exponential polynomials are said to be identical if they can be transformed to each other by arithmetic laws. As there is no division law for the exponent fractions, the exponential polynomials

P1(x)exp(x);P2(x)exp(x22xx2)

are not considered to be the same polynomial. In particular, we can notice that 𝖽𝗈𝗆(P1)𝖽𝗈𝗆(P2). Nevertheless, these two exponential polynomials as functions are essentially the same over all but the input x=2.

We now introduce the condensation of integer-coefficient exponential polynomials that simplifies an exponential polynomial by merging terms with “essentially the same” exponent fractions together, which will be useful in proving our main results.

Equivalent Exponential Fractions.

Let P(x) be a multi-variate integer-coefficient exponential polynomial of width k, {gi(x)/hi(x)}i[k] be the exponent fractions of P(x). Suppose that for each i[k], hi is not a zero polynomial. We define P to be the relation over [k] such that

iPj iff gi(x)hj(x)gj(x)hi(x)=0.

Note that gi(x)hj(x)gj(x)hi(x)=0 means that it is a zero integer-coefficient polynomial, or equivalently, gi(u)hj(u)gj(u)hi(u)=0 for any x.

Lemma 16.

Suppose that hi(x) is not a zero polynomial for any i[k], then P is an equivalence relation over [k].

Condensation of Exponential Polynomials.

Subsequently, we define a condensation P^ of an exponential polynomial P as obtained by grouping the coefficient polynomials according to the relation P. Formally:

Definition 17 (Condensation of exponential polynomials).

Let P(x) be an integer-coefficient exponential polynomial of degree d and width k, {fi(x)}i[k] be the coefficient polynomials of P(x), {gi(x)/hi(x)} be the exponent fractions of P(x) such that hi(x) is not a zero polynomial for each i[k]. Let π={[i1]π,[i2]π,,[it]π} be the partition of [k] induced by P, and i1,,it be arbitrary representation elements. We say that P^(x) is a condensation of P(x) if it is of form

P~(x)F1(x)exp(gi1(x)hi1(x))++Ft(x)exp(git(x)hit(x))

where Fj(x)i[ij]πfi(x) is an integer-coefficient polynomial of degree at most d^.

Definition 18 (Condensed exponential polynomials).

Let P(x) be an exponential polynomial with exponent fractions {gi(x)/hi(x)}i[k]. P(x) is a condensed exponential polynomial if hi’s are not zero polynomials and gi(x)hj(x)hi(x)gj(x) for i,j[k] and ij.

We stress that the condensation of an exponential polynomial is not unique as we can choose the representative elements i1,,it arbitrarily from their equivalent classes. The following proposition shows that P^ may have a larger domain compared to P, but they agree on the domain of P^.

Proposition 19.

Let P^ be a condensation of an integer-coefficient exponential polynomial P. Then:

  • 𝖽𝗈𝗆(P)𝖽𝗈𝗆(P^), and P,P^ agree on 𝖽𝗈𝗆(P).

  • Let p,q be prime numbers such that qp1, Gp,q be the multiplicative subgroup of 𝔽p of order q, and aGp,q. Then 𝖽𝗈𝗆p,q,a(P)𝖽𝗈𝗆p,q,a(P^), and P,P^ agree on 𝖽𝗈𝗆p,q,a(P).

4.4 Structural Lemma for 𝗔𝗘𝘅𝗽𝟏 Circuits

Now we are ready to prove a structural lemma showing that 𝖠𝖤𝗑𝗉1 circuits can be seen as fractions of exponential polynomials.

For simplicity, we introduce the following notation. Let P,P be n-variate exponential polynomials, we define 𝖽𝗈𝗆(P/P) be the set {u𝖽𝗈𝗆(P)𝖽𝗈𝗆(P)P(u)0}, i.e., the domain of the fraction P(x)/P(x). Similarly, let p,q be prime numbers such that qp1, Gp,q be the multiplicative subgroup of 𝔽p of order q, and aGp,q, we define 𝖽𝗈𝗆p,q,a(P/P) as the set {u𝖽𝗈𝗆p,q,a(P)𝖽𝗈𝗆p,q,a(P)P(u)0}.

Lemma 20.

For every n-input 𝖠𝖤𝗑𝗉1 circuit C, there are n-variate integer-coefficient exponential polynomials P(x) and P(x) such that the following holds:

  • 𝖽𝗈𝗆(C)=𝖽𝗈𝗆(P/P), and for each u𝖽𝗈𝗆(C), C(u)=P(u)/P(u).

  • Let p,q be prime numbers such that qp1, Gp,q be the multiplicative subgroup of 𝔽p of order q, and aGp,q. Then 𝖽𝗈𝗆p,q,a(C)=𝖽𝗈𝗆p,q,a(P/P) and for every (u,v)𝖽𝗈𝗆p,q,a(C), Ca(u,v)Pa(u,v)𝖨𝗇𝗏p(Pa(u,v))(modp).

In particular, if C does not contain exponentiation gates, both P and P do not contain the exponentiation symbol, i.e., P and P are integer-coefficient polynomials.

The proof is left to the full version of the paper.

4.5 Width, Degree, and Weight of Concrete Neural Network Components

It is proved in Lemma 20 that any 𝖠𝖤𝗑𝗉1 circuit can be transformed to an equivalent fraction of exponential polynomials. We can therefore define the width, degree, and weight of an 𝖠𝖤𝗑𝗉1 circuit.

Definition 21 (width, degree, and weight of 𝖠𝖤𝗑𝗉1 circuits).

An 𝖠𝖤𝗑𝗉1 circuit C is said to have width k, degree d, and weight w if there are integer-coefficient degree-d width-k exponential polynomials P(x) and P(x) that satisfy both conditions in Lemma 20.

We note that the transformation in Lemma 20 may not be efficient – the width, degree, and the bit-length of the integer coefficients may grow exponentially with respect to the size of the circuit. Nevertheless, it can be verified that many neural network components can be simulated by fractions of exponential polynomials with relatively small width, degree, and coefficients. We refer readers to the full version for more discussions.

5 Algorithms for Identity Testing

Now we are ready to describe our identity testing algorithms in real and algebraic query models. Indeed, our algorithms are essentially the same one: Randomly sample an input x (in corresponding models) and check whether the circuit evaluates to zero or on the input x. We will first prove the correctness of the algorithm in real query model (see Section 5.1), and generalize the proof to the algebraic query models in subsequent subsections.

5.1 Identity Testing in Real Query Model

Formally, the identity testing algorithm in real query model works as follows: Suppose that C:n is an 𝖠𝖤𝗑𝗉1 circuit of width k and degree d, let B=20dk2 be sufficiently large and S={1,2,,B}, the algorithm uniformly sample x1,x2,,xnS, and accept if C(x1,,xn)0.

It is clear that the algorithm is perfectly complete, and the soundness can be formalized as the following theorem.

Theorem 22.

Let C:n be an 𝖠𝖤𝗑𝗉1 circuit of width k and degree d that is not identically zero on 𝖽𝗈𝗆(C). Then for any non-empty finite subset S, if xSn is sampled uniformly at random, Pr[C(x){0,}]8dk2/|S|.

The key step for proving Theorem 22 is a necessary and sufficient condition for an exponential polynomial P to be identically zero. Intuitively, if the exponent fractions g1/h1,,gk/hk of P are pairwisely distinct, then P is identically zero on n if and only if all coefficient polynomials f1,,fk are all identically zero. Formally:

Lemma 23.

Let P:n be a condensed exponential polynomial of form

P(x)=i=1kfi(x)exp(gi(x)hi(x)),

where fi,gi,hi are integer-coefficient polynomials of total degree d and hi(x) is not identically zero on n for every i[k]. Then the following statements are equivalent:

  1. 1.

    P is identically zero on 𝖽𝗈𝗆(P);

  2. 2.

    Pr[P(x){0,}]>3dk2/|S| for xSn sampled uniformly at random for any non-empty finite subset S.

  3. 3.

    For every i[k], fi is identically zero on n.

The proof is omitted; see the full version of the paper.

We are now ready to prove Theorem 22.

Proof of Theorem 22.

Let C:n be an 𝖠𝖤𝗑𝗉1 circuit of width k and degree d that is not identically zero on 𝖽𝗈𝗆(C). Let S be a non-empty finite set. Then by the definition of degree and width of C, there are exponential polynomials P,P defined by

P(x) i[k]fi(x)exp(gi(x)hi(x)),P(x)j[k]fj(x)exp(gj(x)hj(x)).

such that C(x)=P(x)/P(x) for every xn, and for each i[k],j[k], fi,gi,hi,fj,gj,hj are integer-coefficient n-variate polynomials of total degree at most d.

Note that for each i[k],j[k], we have that hi(x),hj(x) are not identically zero on n. This is because otherwise at least one of P(x),P(x) is undefined on every xn, which implies that 𝖽𝗈𝗆(C)=. Moreover, we know that P and P are not identically zero on their domains, respectively, as otherwise C must be identically zero on its domain.

Let P^(x) be a condensation of P(x). As P is not identically zero on its domain, we know by Proposition 19 that P^(x) is not identically zero on its domain. Consider the random variable xSn sampled uniformly at random. We calculate the probability of the following events:

  • Let P^ be the event that P^(x){0,}, i.e., P^(x) is defined and P^(x)0. By Lemma 23, we can see that Pr[P^]13dk2/|S|.

  • Let be the event that P(x) is defined, i.e., hi(x)0 for every i[k]. Note that Pr[hi(x)=0]d/|S| for every fixed i[k] by Lemma 11. By the union bound, we know that Pr[]=1Pr[¬]1dk/|S|.

  • Let P be the event that P(x){0,}. Notice that

    Pr[P^]=Pr[P^]+Pr[P^¬]13dk2/|S|,

    where Pr[P^]=Pr[P] by Proposition 19 and Pr[P^¬]Pr[¬]dk/|S|. Therefore, Pr[P]13dk2/|S|dk/|S|14dk2/|S|.

Following the same argument, we can prove that Pr[P(x){0,}]14dk2/|S| over uniformly random xSn. It follows that

Pr[C(x){0,}] =Pr[P(x){0,}P(x){0,}]
Pr[P(x){0,}]+Pr[P(x){0,}]8dk2/|S|.

This completes the proof.

5.2 A Weak Descartes’ Rule over Finite Fields

Theorem 24.

Let p, q be prime numbers such that qp1, g𝔽p be an element of order q, and Gg be the unique multiplicative subgroup of 𝔽p of order q. Let kq, α1,,αk𝔽q be distinct, and β1,,βk𝔽p. Then the univariate polynomial

f(z)β1zα1+β2zα2++βkzαk

has at most q11/(k1) roots in G.

The key idea of the proof (which follows from [5, 11]) is to find a low-degree polynomial that has as many roots as f(z) in G. Let p,q be prime numbers such that qp1, and G=g be the unique multiplicative subgroup of 𝔽p of order q. Formally:

Proposition 25.

Let c{1,2,,q1} and fc(z) be the polynomial

fc(z)β1zcα1modq+β2zcα2modq++βkzcαkmodq.

Then the polynomials {fc(z)}c[q1] have the same number of roots in G.

Lemma 26 (Lemma 4.1 of [11]).

For α1,,αt,N and nN/gcd(α1,,αt,N), there is a c[n1] such that

maxi[t]{αicmodN}Nn1/t.

Proof of Theorem 24.

Fix p,q,g𝔽p,k, elements α1,,αk𝔽q and β1,,βk𝔽p. Let fc(z)𝔽p[z] be the univariate polynomial

fc(z)β1zcα1modq+β2zcα2modq++βkzcαkmodq

and f(z)=f1(z). Assume that α1α2αk. We may further assume without loss of generality that α1=0, as otherwise we can consider the polynomial f(z)/zα1 that has the same number of roots in G.

Let Nq and n=q, we have gcd(α2,,αk,N)=1 and thus nN/gcd(α2,,αk,N). By Lemma 26, there exists some c{1,2,,q1} such that for every i{2,3,,k},

αicmodqq11/(k1),

which further implies that the polynomial fc(z) is of degree at most q11/(k1). Subsequently, fc(z) has at most q11/(k1) roots in G. This completes the proof as f(z) and fc(z) have the same number of roots in G by Proposition 25.

5.3 Identity Testing in Algebraic Query Model

Theorem 7. [Restated, see original statement.]

Let C:n be an 𝖠𝖤𝗑𝗉1 circuit of width k, degree d and weight w, p and q be prime numbers such that qp1 and q>2(kw)2. Let Gp,q be the unique multiplicative subgroup of 𝔽p of order q. The following hold:

  • (Completeness). If C is identically zero on 𝖽𝗈𝗆(C), then for every aGp,q, Ca(u,v){0,} for every (u,v)𝔽pn×𝔽qn.

  • (Soundness). If C is not identically zero on 𝖽𝗈𝗆(C), then for uniformly random u𝔽pn,v𝔽qn,aGp,q, Pr[Ca(u,v){0,}]18dk4q1q1/(k21).

We first prove two lemmas: Lemma 27 is used to prove the completeness property, and Lemma 28 is used to prove the soundness property.

Lemma 27.

Let P^ be a condensation of an integer-coefficient exponential polynomial P. If P is identically zero on 𝖽𝗈𝗆(P) and 𝖽𝗈𝗆(P), then P^ is identically zero on 𝖽𝗈𝗆(P^).

The proof is omitted; see the full version of the paper.

Lemma 28.

Let p,q be prime numbers such that qp1, Gp,q be the unique multiplicative subgroup of 𝔽p of order q. Let P be a condensed integer-coefficient exponential polynomial

P(x)=i[k]fi(x)exp(gi(x)hi(x))

of degree d and width k. Suppose that for every i[k], fi(x) is not identically zero on 𝔽p and hi(x) is not identically zero on 𝔽q, then for (u,v,a) uniformly sampled from 𝔽pn×𝔽qn×Gp,q,

Pr[Pa(u,v){0,}]3dk2q1+q1/(k1).

The proof is similar to that of Lemma 23; see the full version of the paper.

Proof of Theorem 7.

By the definition of width, degree of 𝖠𝖤𝗑𝗉 circuits, there are exponential polynomials

P(x)=i[k]fi(x)exp(gi(x)hi(x)),P(x)=i[k]fi(x)exp(gi(x)hi(x))

such that the following hold:

  • fi,gi,hi,fi,gi,hi are d-degree polynomials with coefficients in [w,w],

  • 𝖽𝗈𝗆(C)=𝖽𝗈𝗆(P/P), and C(u)=P(u)/P(u) for u𝖽𝗈𝗆(C),

  • 𝖽𝗈𝗆p,q,a(C)=𝖽𝗈𝗆p,q,a(P/P), and Ca(u,v)=P(u,v)𝖨𝗇𝗏p(P(u,v)) for any aGp,q and (u,v)𝖽𝗈𝗆p,q,a(C).

Proof of Completeness.

Let C be an 𝖠𝖤𝗑𝗉1 circuit that is identically zero on 𝖽𝗈𝗆(C). We first prove that either P(x) is identically zero on 𝖽𝗈𝗆(P), or P(x) is identically zero on 𝖽𝗈𝗆(P).

Towards a contradiction, we assume that P is not identically zero on 𝖽𝗈𝗆(P) and P is not identically zero on 𝖽𝗈𝗆(P). Let S be a set of size 30dk2. By Theorem 22, we know that for xSn sampled uniformly at random,

Pr[P(x){0,}],Pr[P(x){0,}]<13.

By the union bound, we know that for xSn sampled uniformly at random, with probability at least 1/3, P(x),P(x){0,}. For any such xSm, we have that x𝖽𝗈𝗆(C) and C(x)=P(x)/P(x)0, which is impossible as C is identically zero on its domain.

Suppose that P(x) is identically zero on 𝖽𝗈𝗆(P), we know by Lemma 27 that the condensation P^(x) of P(x) is also identically zero on 𝖽𝗈𝗆(P^). Let k be the width of P^ and

P^(x)=i[k]f^i(x)exp(g^i(x)h^i(x)).

By Lemma 23, we know that for every i[k], f^i(x) is identically zero on n, or equivalently, f^i(x) is a zero polynomial. In that case, we must have f^i(x)0(modp) for every p. This implies that

P^a(u,v)=i[k]f^i(u)exp(g^i(v)h^i(v)){0,}.

Subsequently, Pa(u,v){0,} as P^ and P agree on 𝖽𝗈𝗆p,q,a(P) (see Proposition 19), which further implies that Ca(u,v){0,}.

Similarly, if P(x) is identically zero on 𝖽𝗈𝗆(P), then Pa(u,v){0,}, which implies that Ca(u,v)=. This concludes the completeness of the algorithm.

Proof of Soundness.

Let R(x)=P(x)P(x). It can be verified that it is an exponential polynomial with width k2, degree 2d and weight w2. It follows that:

  • For any xn, C(x)=P(x)/P(x){0,} if and only if R(x){0,}.

  • For any x𝔽pn×𝔽qn,aGp,q, Ca(x)=Pa(x)/Pa(x){0,} if and only if Ra(x){0,}.

Since C is not identically zero on 𝖽𝗈𝗆(C), i.e., C(x){0,} for some xn, R is not identically zero on 𝖽𝗈𝗆(R). Let R^ be a condensation of R. By Proposition 19, R^ is not identically zero on 𝖽𝗈𝗆(R^).

Let {fi′′}i[k2] be the coefficient polynomials of R^, and {gi′′/hi′′}i[k2] be the exponent fractions of R^. Note that hi′′ is non-zero for every i[k2], as otherwise 𝖽𝗈𝗆(R)=. By Lemma 23, we know that fi′′ is non-zero for some i[k2] and, without loss of generality, we may assume that fi′′ is non-zero for every i[k2].

It can be verified that the integer weights in hi′′ are within [w2,w2]. Moreover, the integer weights in fi′′ are within [(kw)2,(kw)2], as each fi′′ is a summation of at most k2 polynomials that have integer weights within [w2,w2]. As p>q>2(kw)2, we know that for every i[k2], fi′′ is not identically zero on 𝔽pn and hi′′ is not identically zero on 𝔽qn.

Let x=(u,v)𝔽pn×𝔽qn and aGp,q be random variables sampled uniformly at random. We calculate the probability of the following events:

  • Let R^ be the event that R^(x){0,}. By Lemma 28, Pr[R^]16dk4/qq1/(k21).

  • Let be the event that hi′′(v)0 for every i[k2]. For every fixed i[k2], by Lemma 11, Pr[hi′′(v)=0]2d/q. Then by the union bound, Pr[]=1Pr[¬]12dk2/q.

  • Let R be the event that R(x){0,}. Notice that

    Pr[R^]=Pr[R^]+Pr[R^¬]16dk4/qq1/(k21),

    where Pr[R^]=Pr[R] and Pr[R^¬]2dk2/q. Therefore,

    Pr[R]16dk4/qq1/(k21)2dk2/q18dk4/qq1/(k21).

This completes the proof, as Ca(x){0,} if and only if Ra(x){0,}.

References

  • [1] Manindra Agrawal and Somenath Biswas. Primality and identity testing via chinese remaindering. J. ACM, 50(4):429–443, 2003. doi:10.1145/792538.792540.
  • [2] Sanjeev Arora and Boaz Barak. Computational Complexity - A Modern Approach. Cambridge University Press, 2009. URL: http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521424264.
  • [3] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70–122, 1998. doi:10.1145/273865.273901.
  • [4] Alan Baker. Transcendental Number Theory. Cambridge Mathematical Library. Cambridge University Press, 2022.
  • [5] Ran Canetti, John Friedlander, Sergei Konyagin, Michael Larsen, Daniel Lieman, and Igor Shparlinski. On the statistical properties of diffie-hellman distributions. Israel Journal of Mathematics, 120:23–46, 2000.
  • [6] Qi Cheng, Shuhong Gao, J. Maurice Rojas, and Daqing Wan. Sparse univariate polynomials with many roots over finite fields. Finite Fields Their Appl., 46:235–246, 2017. doi:10.1016/J.FFA.2017.03.006.
  • [7] Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning. In International Conference on Learning Representations (ICLR), 2024.
  • [8] Richard A. DeMillo and Richard J. Lipton. A probabilistic remark on algebraic program testing. Inf. Process. Lett., 7(4):193–195, 1978. doi:10.1016/0020-0190(78)90067-4.
  • [9] David Steven Dummit and Richard M Foote. Abstract algebra, volume 3. Wiley Hoboken, 2004.
  • [10] Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken. Taso: optimizing deep learning computation with automatic generation of graph substitutions. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP ’19, pages 47–62, New York, NY, USA, 2019. Association for Computing Machinery. doi:10.1145/3341301.3359630.
  • [11] Zander Kelley. Roots of sparse polynomials over a finite field. LMS Journal of Computation and Mathematics, 19(A):196–204, 2016. doi:10.1112/S1461157016000334.
  • [12] László Lovász. On determinants, matchings, and random algorithms. In Lothar Budach, editor, Fundamentals of Computation Theory, FCT 1979, Proceedings of the Conference on Algebraic, Arthmetic, and Categorial Methods in Computation Theory, Berlin/Wendisch-Rietz, Germany, September 17-21, 1979, pages 565–574. Akademie-Verlag, Berlin, 1979.
  • [13] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014. URL: http://www.cambridge.org/de/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/analysis-boolean-functions.
  • [14] Øystein Ore. Über höhere kongruenzen. Grøndahl, 1921.
  • [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Naresh Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Liang, Mengli Liu, Ignasi Mattei, Suriya Singh, Jonathan Smith, Nikolaus Sygnowski, Agata Wu, Yongfei Yao, Zachary Bell, Gregory Devito, Jie Yuan, Brian Zhu, Michael Zick, Luke Jia, Tero Maggioni, Soumith Chintala, and Gavin Stevens. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, volume 32, 2019.
  • [16] PyTorch. PyTorch Documentation, entry: torch.nn.functional.scaled_dot_product_attention. https://docs.pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html, 2025. Accessed: 2025-11-27.
  • [17] Nitin Saxena. Progress on polynomial identity testing. Bull. EATCS, 99:49–79, 2009.
  • [18] Nitin Saxena. Progress on polynomial identity testing-II. In Perspectives in Computational Complexity: The Somenath Biswas Anniversary Volume, pages 131–146. Springer, 2014.
  • [19] Jacob T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. ACM, 27(4):701–717, 1980. doi:10.1145/322217.322225.
  • [20] Adi Shamir. IP = PSPACE. J. ACM, 39(4):869–877, 1992. doi:10.1145/146585.146609.
  • [21] Richard M. Stallman and the GCC Developer Community. Using the GNU Compiler Collection. GNU Press, Boston, MA, USA, 2025. For GCC version 15.2.0. URL: https://gcc.gnu.org/onlinedocs/gcc-15.2.0/gcc.pdf.
  • [22] William T Tutte. The factorization of linear graphs. Journal of the London Mathematical Society, 1(2):107–111, 1947.
  • [23] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
  • [24] Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, and Zhihao Jia. PET: Optimizing tensor programs with partially equivalent transformations and automated corrections. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pages 37–54. USENIX Association, July 2021. URL: https://www.usenix.org/conference/osdi21/presentation/wang.
  • [25] Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, and Zhihao Jia. Mirage: A multi-level superoptimizer for tensor programs. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25), Boston, MA, July 2025. USENIX Association. URL: https://www.usenix.org/conference/osdi25/presentation/wu-mengdi.
  • [26] Richard Zippel. Probabilistic algorithms for sparse polynomials. In Edward W. Ng, editor, Symbolic and Algebraic Computation, EUROSAM ’79, An International Symposiumon Symbolic and Algebraic Computation, Marseille, France, June 1979, Proceedings, volume 72 of Lecture Notes in Computer Science, pages 216–226. Springer, 1979. doi:10.1007/3-540-09519-5_73.