Faster & Deterministic FPT Algorithm for Worst-Case Tensor Decomposition
Abstract
We present a deterministic $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$-time algorithm for decomposing $d$-dimensional, width-$n$ tensors of rank at most $r$ over $\mathbb{R}$ and $\mathbb{C}$. This improves upon the previous randomized algorithm of Peleg, Shpilka, and Volk (ITCS '24), whose running time has an exponential-tower dependence on $r$, and upon the deterministic algorithms of Bhargava, Saraf, and Volkovich (STOC '21), which run in polynomial time only for constant $r$.
Our work resolves an open question asked by Peleg, Shpilka, and Volk (ITCS '24) on whether a deterministic Fixed-Parameter Tractable (FPT) algorithm exists for worst-case tensor decomposition. We also make substantial progress on the fundamental problem of how the tractability of tensor decomposition varies as the tensor rank increases. Our result implies that we can achieve deterministic polynomial-time decomposition as long as the rank of the tensor is at most $(\log n)^{1/C}$, where $C$ is some fixed constant independent of $n$ and $d$. Further, we note that there cannot exist a polynomial-time algorithm for $r = \omega(\log n)$ unless the Exponential Time Hypothesis (ETH) fails. Our algorithm works for all fields; however, over fields other than $\mathbb{R}$ and $\mathbb{C}$ the time complexity worsens (it additionally depends on the cost of univariate polynomial factorization over the field), and randomization is required for finite fields of large characteristic. Both conditions are provably necessary unless there are improvements in the state of the art for polynomial system solving over the corresponding fields.
Our approach achieves this by designing a proper learning (reconstruction) algorithm for set-multilinear depth-3 arithmetic circuits. On a technical note, we design a “partial” clustering algorithm for set-multilinear depth-3 arithmetic circuits that lets us isolate a cluster from any set-multilinear depth-3 circuit while preserving the structure of the circuit.
Keywords and phrases:
Algebraic circuits, Deterministic algorithms, FPT algorithm, Learning circuits, Reconstruction, Tensor Decomposition, Tensor Rank
2012 ACM Subject Classification:
Theory of computation → Algebraic complexity theory; Theory of computation → Pseudorandomness and derandomization
1 Introduction
Tensors, higher-dimensional analogs of matrices, are multi-dimensional arrays with entries from a field $\mathbb{F}$. For instance, a 3-dimensional tensor can be written as $T \in \mathbb{F}^{n_1 \times n_2 \times n_3}$. Tensor rank and tensor decomposition have become fundamental tools in various branches of modern science, with applications in machine learning, statistics, signal processing, and computational complexity. Within machine learning theory alone, tensor decomposition algorithms are leveraged to give efficient algorithms for phylogenetic reconstruction [40], topic modeling [2], community detection [3], independent component analysis [39], and learning various mixture models [27, 30]. For more details on the applications of tensor decomposition and the rich theory of tensors in general, we refer the reader to the detailed monograph by Landsberg [37] and the references therein.
We will work with general $d$-dimensional tensors $T \in \mathbb{F}^{n_1 \times n_2 \times \cdots \times n_d}$. The rank of a tensor $T$ is defined as the smallest $r$ for which $T$ can be written as a sum of $r$ tensors of rank 1, where a rank-1 tensor is a tensor of the form $v_1 \otimes v_2 \otimes \cdots \otimes v_d$ with $v_j \in \mathbb{F}^{n_j}$. Here, $\otimes$ denotes the Kronecker (outer) product, also known as the tensor product. The expression of $T$ as a sum of such rank-1 tensors over the field $\mathbb{F}$ is called $\mathbb{F}$-tensor decomposition, or simply tensor decomposition.
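To make the definition concrete, here is a minimal numpy sketch (ours, not from the paper; all names are illustrative) that builds a width-$n$, 3-dimensional tensor of rank at most $r$ as a sum of outer products:

```python
import numpy as np

n, r = 4, 2
rng = np.random.default_rng(0)

# Each rank-1 component is an outer product v1 ⊗ v2 ⊗ v3.
components = [tuple(rng.standard_normal(n) for _ in range(3)) for _ in range(r)]

# T is a sum of r rank-1 tensors, hence rank(T) <= r.
T = sum(np.einsum('i,j,k->ijk', v1, v2, v3) for v1, v2, v3 in components)
print(T.shape)  # (4, 4, 4)
```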
Tensor decomposition is a notoriously difficult problem (the seminal work of Hillar and Lim [26], titled "Most tensor problems are NP-hard", aptly justifies the use of this adjective). Håstad [25] showed that determining the tensor rank is an NP-hard problem over $\mathbb{Q}$ and NP-complete over finite fields. The follow-up work of Schaefer and Stefankovic [42] further improved our understanding by giving a very tight reduction from algebraic system solving. Formally, they show that for any field $\mathbb{F}$, given a system of algebraic equations $\mathcal{S}$ over $\mathbb{F}$, we can in polynomial time construct a 3-dimensional tensor $T_{\mathcal{S}}$ and an integer $r_{\mathcal{S}}$ such that $\mathcal{S}$ has a solution in $\mathbb{F}$ iff $T_{\mathcal{S}}$ has rank at most $r_{\mathcal{S}}$ over $\mathbb{F}$.
Despite the known hardness results, tensor decomposition remains widely studied due to its broad applicability. There is a rich history of tensor decomposition algorithms in the literature. Most of these algorithms are average-case (i.e., they work when the entries are sampled randomly from a distribution), or they work when the input is non-degenerate (i.e., when the tensor components are chosen from a Zariski open set), or they are heuristic. See, for instance, [14, 4, 36]. This area also includes interesting active research directions, such as making these algorithms noise-resilient and studying their smoothed analysis [6, 43, 16].
On the other hand, tensor decomposition in the worst-case scenario has only recently started to receive attention. Firstly, the work of Bhargava, Saraf, and Volkovich [11] provided a deterministic algorithm for decomposing rank-$r$ tensors whose running time is polynomial in $n$ and $d$ for every fixed $r$ (the exponent of the polynomial grows with $r$), showing that an efficient algorithm exists to decompose fixed-rank tensors.
The next natural step in furthering the understanding of this NP-hard problem is studying its Fixed-Parameter Tractability (FPT). In the FPT setting, the input instance comes with a parameter capturing some specific quantity (e.g., the treewidth of a graph). In our case of tensor decomposition, this parameter is the tensor rank $r$. This setting treats $r$ as a parameter much smaller than the instance size $N$, thus relaxing the required runtime of the algorithm from $\mathrm{poly}(N)$ to $f(r) \cdot \mathrm{poly}(N)$, for any computable function $f$. The class FPT comprises parameterized problems that admit an algorithm with this running time.
Peleg, Shpilka, and Volk [41] designed a randomized FPT algorithm for decomposing rank-$r$ tensors. Specifically, they provided a randomized algorithm running in time $f(r) \cdot \mathrm{poly}(n,d)$, where $f$ has an exponential-tower dependence on $r$. As a direct consequence, they demonstrated that tensors can be decomposed in randomized polynomial time even beyond constant tensor rank.
Both these works use a standard connection between tensor decomposition and reconstruction of a special case of depth-3 circuits. We start by describing this connection below.
Tensors as depth-3 arithmetic circuits
For a tensor $T \in \mathbb{F}^{n_1 \times \cdots \times n_d}$, consider the following polynomial in the $d$ disjoint sets of variables $X_j = \{x_{j,1}, \dots, x_{j,n_j}\}$:
$$ f_T(X_1, \dots, X_d) \;=\; \sum_{i_1 \in [n_1], \dots, i_d \in [n_d]} T_{i_1, \dots, i_d} \cdot x_{1,i_1} x_{2,i_2} \cdots x_{d,i_d}. $$
Let $C = \sum_{i=1}^{k} \prod_{j=1}^{d} \ell_{i,j}(X_j)$ be a set-multilinear depth-3 circuit over $\mathbb{F}$ respecting the partition $X = X_1 \sqcup \cdots \sqcup X_d$ (denoted by $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$), and computing $f_T$. Then observe that
$$ T \;=\; \sum_{i=1}^{k} v_{i,1} \otimes v_{i,2} \otimes \cdots \otimes v_{i,d}, $$
where $v_{i,j} \in \mathbb{F}^{n_j}$ corresponds to the linear form $\ell_{i,j}$ viewed as an $n_j$-dimensional vector over $\mathbb{F}$. Indeed, it is easy to see that a tensor $T$ has rank at most $k$ if and only if $f_T$ can be computed by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit. Therefore, the rank of $T$ is the smallest $k$ for which $f_T$ can be computed by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit. We say that a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit has width $n$ if $|X_j| \le n$ for all $j \in [d]$.
Now that we understand the direct correspondence between tensors and set-multilinear depth-3 arithmetic circuits, we will, for simplicity of analysis, stick to this polynomial representation of tensors. We emphasize that all operations on the polynomial associated with a tensor can be interpreted as standard efficient operations on tensors themselves. We will describe these basic operations used in our algorithm in terms of tensor notation, which follows directly from the above equivalence.
For instance, any partial evaluation of $f_T$ on a subset of the variable partition (say, $X_1, X_3, X_8$) is simply the weighted contraction of the tensor $T$ along those directions $(1,3,8)$ with weights given by the evaluation points $a_1, a_3$, and $a_8$, respectively. A complete evaluation is simply the dot product of $T$ and the outer product of the evaluation points, that is, $f_T(a_1, \dots, a_d) = \langle T,\; a_1 \otimes a_2 \otimes \cdots \otimes a_d \rangle$. Note that the linear forms in the circuit representation of $f_T$ correspond to vectors in the tensor decomposition.
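As a sanity check on this correspondence, the following numpy sketch (ours; variable names are illustrative) verifies that a complete evaluation of $f_T$ is the dot product of $T$ with the outer product of the evaluation points, and that a partial evaluation is a weighted contraction:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3, 3))                      # a 3-dimensional tensor
a1, a2, a3 = (rng.standard_normal(3) for _ in range(3))

# Complete evaluation: f_T(a1, a2, a3) = <T, a1 ⊗ a2 ⊗ a3>.
full = np.einsum('ijk,i,j,k->', T, a1, a2, a3)
assert np.isclose(full, np.sum(T * np.einsum('i,j,k->ijk', a1, a2, a3)))

# Partial evaluation on the first variable part only: a weighted
# contraction of T along direction 1 with weight vector a1.
partial = np.einsum('ijk,i->jk', T, a1)
print(partial.shape)  # (3, 3)
```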
Keeping this connection in mind, we get that if we can learn the optimal representation of $f_T$ as a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit, that will directly give an optimal tensor decomposition of $T$ as well. The problem of learning a circuit representation of a polynomial from black-box (a.k.a. oracle/membership-query) access to the polynomial is referred to as arithmetic circuit reconstruction. This problem is the algebraic analog of exact learning in Boolean circuit complexity [5]. In recent years, much attention has been focused on reconstruction algorithms for various interesting subclasses of arithmetic circuits, both in the worst-case setting [7, 35, 34, 17, 10] and in the average-case setting [23, 24, 33, 19, 9]. Given the natural expressibility of polynomials (and algebraic circuits), arithmetic circuit reconstruction connects to many other learning problems: it is not restricted to tensor decomposition but also extends to other fundamental learning problems such as learning mixtures of Gaussians and subspace clustering [31, 20, 12]. Finding other such connections is a very interesting research direction.
Returning to the aforementioned connection, we now focus on learning these special depth-3 circuits. A set-multilinear depth-3 circuit with respect to a partition $X = X_1 \sqcup \cdots \sqcup X_d$ and top fan-in $k$, denoted by $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$, computes a set-multilinear polynomial of the form
$$ C(X) \;=\; \sum_{i=1}^{k} \prod_{j=1}^{d} \ell_{i,j}(X_j), $$
where each $\ell_{i,j}$ is a linear form in the variables $X_j$. A polynomial is called set-multilinear with respect to the partition $X_1 \sqcup \cdots \sqcup X_d$ if every monomial that appears in it is of the form $x_{1,i_1} x_{2,i_2} \cdots x_{d,i_d}$, where $x_{j,i_j} \in X_j$. The relation between tensors and depth-3 set-multilinear circuits is discussed in detail in the full version of the paper. We assume black-box (oracle/membership-query) access to the polynomial $f_T$ rather than access to the full tensor $T$. Simply reading the entries of the tensor would require $n^d$ time; however, in the low-rank case, we can learn the decomposition with far fewer measurements. This approach of studying and decomposing tensors using evaluations ("measurements") of the tensor has been extensively studied in the literature [18, 13].
Non-uniqueness of tensor decomposition.
One reason for the hardness of tensor decomposition is its non-uniqueness. This is evident even in the average-case setting due to the limitations of any tensor decomposition algorithm beyond Kruskal’s uniqueness bounds [38, 47]. Naturally, in the worst-case setting, things become even more convoluted.
Fortunately, thanks to strong structural results on polynomial identities of depth-3 circuits, we can obtain strong almost-uniqueness results for such circuit representations. These are structural results for identically zero circuits, and they essentially show that, under some mild conditions, any depth-3 circuit that computes the identically zero polynomial must have its linear forms contained in a "low-dimensional" space. In the case of set-multilinear depth-3 circuits, this implies that any identically zero circuit is of "low degree" after stripping the linear factors. Observe that two different ways of writing a polynomial constitute an elaborate polynomial identity. This directly yields that two decompositions of $f_T$ can differ on only a few (a function of $r$ alone) linear forms. For more details on these structural results for decompositions, we refer the reader to [11, Section 3.4.2].
2 Our Results
As described in the previous section, the problem of worst-case tensor decomposition, or equivalently, proper learning of set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuits, was studied in [11], where the authors presented a deterministic algorithm that runs in polynomial time when the rank $r$ is a constant. This was extended beyond constant rank in the recent work of [41], which proposed a randomized algorithm whose running time is FPT in $r$ but has an exponential-tower dependence on $r$; consequently, it runs in randomized polynomial time only for slightly super-constant tensor rank.
In [41, Section 1.4], two open problems were posed: Firstly, to understand how the tractability of tensor decomposition changes as tensor rank increases; and secondly, whether a deterministic fixed-parameter tractable (FPT) algorithm exists for worst-case tensor decomposition.
We make significant progress in answering the first question and completely resolve the second. We now state our result for learning set-multilinear arithmetic circuits.
Theorem 1 (Learning set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuits).
Given black-box access to a degree-$d$, $nd$-variate polynomial $f$ computable by a set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}$ circuit with top fan-in $k$ over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$, there exists a deterministic algorithm that outputs a set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}$ circuit over $\mathbb{F}$ with top fan-in $k$ computing $f$, in time $2^{k^{O(1)}} \cdot \mathrm{poly}(n,d)$.
As a direct consequence of the above theorem, we get the following corollary.
Corollary 2 (Decomposing rank-$r$ tensors).
Let $T$ be a $d$-dimensional, width-$n$ tensor of rank at most $r$ over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$, and let $f_T$ be its associated polynomial. Given black-box access to measurements of $T$ (equivalently, to evaluations of $f_T$), there exists a deterministic $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$-time algorithm for computing a decomposition of $T$ as a sum of at most $r$ rank-1 tensors.
Note that just reading the entries of the tensor would require $n^d$ time. However, the above corollary states that in the low-rank case, we can learn the decomposition with far fewer measurements of the tensor.
As a direct consequence of the above result, we can perform tensor decomposition for any tensor with tensor rank $r \le (\log n)^{1/C}$ in $\mathrm{poly}(n,d)$ time, for a fixed constant $C$. This substantially improves the previously known bound on the tensor rank for which polynomial-time decomposition was possible [41].
Remark 3.
The only point at which the field plays a crucial role in our work is in the polynomial system solving (which is provably necessary; see the discussion around Theorem 6). Aside from this, our techniques are field-independent and deterministic. The computational model in Theorem 1 assumes the input entries are integers (or rationals); without loss of generality, we can maintain this assumption by approximating or truncating any given real/complex numbers to a fixed number of bits.
The coefficients in our output are derived from solutions to systems of polynomial equations. These solutions lie in an algebraic extension of the base field, and their rational approximations can be computed with an additional factor, polynomial in the desired bit accuracy, in the time complexity; see, for instance, [21, remark on page 2].
On the intractability/hardness side, using Håstad's [25] tight reduction between tensor rank and 3-SAT, we see that even testing tensor rank requires $2^{\Omega(r)}$ time assuming the Exponential Time Hypothesis (ETH, [29]). This demonstrates that our time complexity is somewhat tight with respect to the parameter $r$. We formalize this in the observation below.
Observation 4.
Given a 3-dimensional tensor $T$ (with all side lengths $O(n)$) and an integer $r = \Theta(n)$, if there exists a $2^{o(r)} \cdot \mathrm{poly}(n)$-time algorithm for testing if $\mathrm{rank}(T) \le r$, then ETH is false.
Proof.
Note that Håstad ([25, Lemma 2]) converts a 3-SAT instance with $n$ variables and $m$ clauses to a tensor whose rank is at most $4n + 2m$ if and only if the 3-SAT instance is satisfiable (we can assume $m = O(n)$ by an easy reduction from general 3-SAT). Thus, if there existed a $2^{o(r)}$-time algorithm for testing tensor rank, it would yield a $2^{o(n)}$-time algorithm for 3-SAT, since the reduction incurs only a linear blow-up, thus refuting ETH.
Comparison with previous works
Two main works [11, 41] have previously addressed this problem using similar approaches, and below we describe why exactly the approaches in these two works fail to achieve the required deterministic running time of $f(r) \cdot \mathrm{poly}(n,d)$.
-
In [11], the authors develop a way of computing almost all of the linear forms (i.e., most of the last layer) in the circuit $C$. After this, they have to brute-force search over all subsets of the variable partition of size $\mathrm{poly}(r)$ to obtain the remaining linear forms of each of the gates. This breaks the running time required for an FPT algorithm, as it adds a $d^{\mathrm{poly}(r)}$ factor to the running time.
-
In [41], the authors focus on the reconstruction of multilinear $\Sigma\Pi\Sigma(k)$ circuits and consequently provide a reconstruction algorithm for set-multilinear $\Sigma\Pi\Sigma(k)$ circuits. Their approach hinges on obtaining black-box access to a cluster (of "multiplication" gates) of multilinear gates such that the cluster representation is unique. The benefit of this approach is that, thanks to this uniqueness, one can aim to obtain black-box access to a cluster of gates. However, even after removing the linear factors from these clusters, the degree can be exponential in $r$. To learn this linear-factor-free part of the cluster (referred to as the simple part), they have to solve a system of polynomial equations with exponentially many (in $r$) variables, which takes doubly exponential time over the fields $\mathbb{R}$ or $\mathbb{C}$. Although fixed-parameter tractable (FPT), this leads to a very inefficient exponential-tower dependence on $r$. Another important distinction is that the algorithm in [41] is randomized, placing tensor decomposition only in randomized FPT time. As observed in [41, Remark 7.3], in the process of getting black-box access to a cluster, the authors have to do polynomial identity testing of a polynomial of very high degree, and it is well beyond current PIT techniques to do this deterministically in FPT time. Therefore, there is no easy derandomization of their algorithm.
We summarize the comparison in the table below (bounds stated qualitatively; see the respective works for the precise exponents).

| Work | Running time | Deterministic? |
| --- | --- | --- |
| Bhargava–Saraf–Volkovich [11] | polynomial for constant $r$ (exponent grows with $r$) | yes |
| Peleg–Shpilka–Volk [41] | FPT, with an exponential-tower dependence on $r$ | no (randomized) |
| This work | $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$ | yes |
Can we improve the time complexity further to $2^{O(r)} \cdot \mathrm{poly}(n,d)$?
Our approach is based on using "uniqueness" results (called rank bounds in the algebraic-complexity literature) for tensor decomposition, which forces us to work with low-degree set-multilinear circuits of degree $\mathrm{poly}(r)$. If we use polynomial system solving to learn such a low-degree set-multilinear circuit, it will create a system of equations in $\mathrm{poly}(r)$ variables. With these parameters, the currently known best techniques for solving a system of polynomial equations require $2^{r^{O(1)}}$ running time, making it difficult to directly improve the running time to $2^{O(r)} \cdot \mathrm{poly}(n,d)$. Such an improvement would either require a new approach to learning low-degree set-multilinear circuits or a detailed study of these algebraic systems of equations to achieve better time complexity in finding their solutions. However, we cannot rule out the possibility of the decision version of this problem being solved in $2^{O(r)} \cdot \mathrm{poly}(n,d)$ time. We leave this as an open problem.
Over other fields
Our algorithm works for all fields; however, the time complexity worsens (it additionally depends on the cost of univariate polynomial factorization over the field), and randomization is required for finite fields of large characteristic. Both conditions are provably necessary, as we will discuss now. Let us start by stating our result in full generality for all fields.
Theorem 5 (Learning over a general field $\mathbb{F}$).
Given black-box access to a degree-$d$, $nd$-variate polynomial $f$ computable by a set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit over $\mathbb{F}$, there exists an algorithm that outputs a set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit computing $f$ over an algebraic extension of $\mathbb{F}$ of degree $2^{k^{O(1)}}$, in time $2^{k^{O(1)}} \cdot \mathrm{poly}(n,d) \cdot \mathsf{Fact}_{\mathbb{F}}\big(2^{k^{O(1)}}\big)$.
Here, $\mathsf{Fact}_{\mathbb{F}}(t)$ denotes the time complexity of factorizing a univariate polynomial of degree $t$ over $\mathbb{F}$.
The reason for the different time complexities over different fields lies in the difficulty of polynomial system solving over these fields. Our algorithm uses as a subroutine an algorithm for solving algebraic systems with $\mathrm{poly}(k)$ unknowns, and the above result follows directly from using the best known algorithm for algebraic system solving over the field $\mathbb{F}$. This sensitivity to the underlying field is natural in tensor decomposition, as the choice of field can significantly affect the computational complexity. For example, over the rationals ($\mathbb{Q}$), determining the exact tensor rank – even for constant tensor rank – is believed to be undecidable [44, 42].
In fact, a tight reduction in [42] solidifies this connection between the feasibility of algebraic systems over $\mathbb{F}$ and computing tensor rank over $\mathbb{F}$:
Theorem 6 ([42]).
For any field $\mathbb{F}$, given a system of algebraic equations $\mathcal{S}$ over $\mathbb{F}$, we can construct in polynomial time a 3-dimensional tensor $T_{\mathcal{S}}$ (of polynomially bounded shape) and an integer $r_{\mathcal{S}}$ such that $\mathcal{S}$ has a solution in $\mathbb{F}$ if and only if $T_{\mathcal{S}}$ has rank at most $r_{\mathcal{S}}$ over $\mathbb{F}$.
Even restricting ourselves to finite fields, the time complexity of Theorem 5 depends on the efficiency of univariate polynomial factorization over the field. Efficient randomized algorithms exist for this task, but derandomizing them remains a notoriously difficult problem, as noted in [1, Problem 15].
Interestingly, the hardness of derandomizing univariate polynomial factorization over finite fields directly impacts tensor decomposition, as observed by Volkovich [46]. Specifically, there is no known way to derandomize Theorem 5 over finite fields unless we can derandomize the factorization of univariate (even quadratic) polynomials over finite fields.
Theorem 7 ([46, Theorem 5]).
If the class of (set-)multilinear $\Sigma\Pi\Sigma(k)$ circuits over the field $\mathbb{F}_q$ is learnable as multilinear circuits of polynomial size in deterministic time $T$, then the learning algorithm can be converted into a deterministic $\mathrm{poly}(T)$-time algorithm that computes the square root of any element of $\mathbb{F}_q$, if it exists.
3 Proof Idea
As the introduction mentions, we will use the connection between tensor rank and set-multilinear depth-3 circuits. For readers who are not very familiar with the algebraic-circuit representation of tensors, we emphasize that we are simply performing standard efficient operations on tensors and use this notation solely for ease of analysis. We refer the reader to Section 1 to familiarize themselves with this notation. Overall, we simply perform weighted contractions (partial evaluations) on our tensor after a change of basis (variable reduction) to first learn most of the vectors in the decomposition. We then use the uniqueness of low-rank tensor decomposition in high-dimensional settings to learn the remaining vectors in the decomposition. We now give a detailed sketch of our algorithm using the terminology of set-multilinear depth-3 circuits. (Throughout this overview, we write $\mathrm{poly}(r)$ and $2^{r^{O(1)}}$ for bounds depending only on $r$, suppressing the exact exponents, which can be found in the full version.)
Recall that we only need black-box access to $f_T$, i.e., we can query evaluations of $f_T$ at chosen points in $\mathbb{F}^{nd}$ in constant time, rather than requiring a dense representation of the tensor/polynomial.
Through some preprocessing, we can assume the following:
-
We know $r$, the rank of the tensor, or equivalently the minimum $k$ such that $f_T$ is computable by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit. Indeed, we can guess the value of $r$ and output the first $k$ for which the learning algorithm works, since we can always test whether our output is correct deterministically using an existing black-box PIT algorithm (Corollary 9). This affects the time complexity by a multiplicative factor of at most $r$. A toy illustration of this guess-and-verify loop appears after this list.
-
We can strip off any linear factors of $f_T$ (writing $f_T = \ell_1 \cdots \ell_t \cdot g$ with $g$ having no linear factors), since any optimal decomposition (with tensor rank $r$) of $g$ gives an optimal decomposition for $f_T$ as well. This comes from a result on factoring $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuits from [45], described in Lemma 10, which shows that if $f$ is computable by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit, then any irreducible factor of $f$ can also be computed by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit. It also describes how to obtain black-box access to these irreducible factors in deterministic FPT time.
-
We can assume that $n \le r$, i.e., that the width is at most the top fan-in. This follows from the width-reduction step; see [11, Section 5.1] and Lemma 11. In the low-degree reconstruction of Lemma 12, the system of polynomial equations has roughly $n \cdot d \cdot r$ variables, which go into the exponent of the running time required to find a solution. Therefore, we must be able to reduce the width to $r$ so that we can obtain an FPT algorithm.
-
Note that there can be multiple optimal decompositions for $f_T$, but for the sake of argument, we will fix one representation $C = \sum_{i=1}^{r} T_i$, where $T_i = \prod_{j=1}^{d} \ell_{i,j}(X_j)$, and argue the proof using this fixed representation $C$.
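The following toy sketch (ours) illustrates the guess-and-verify loop from the first preprocessing item in the simplest possible setting: 2-dimensional tensors, where a truncated SVD stands in for the learning algorithm and exact comparison stands in for the deterministic black-box PIT.

```python
import numpy as np

def first_k_that_fits(T, max_k, tol=1e-8):
    """Return the smallest k for which a rank-k candidate exactly fits T."""
    U, s, Vt = np.linalg.svd(T)
    for k in range(1, max_k + 1):
        candidate = (U[:, :k] * s[:k]) @ Vt[:k]   # "learner": best rank-k fit
        if np.allclose(candidate, T, atol=tol):   # "PIT": verify the guess
            return k
    raise ValueError("rank exceeds max_k")

M = np.outer([1., 2.], [3., 4.]) + np.outer([0., 1.], [1., 1.])
print(first_k_that_fits(M, max_k=4))  # 2
```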
Low Degree Reconstruction using system solving
If the degree $d$ is at most $\mathrm{poly}(r)$, then we can simply reduce the problem to finding a solution of a $\mathrm{poly}(r)$-variate algebraic system. Indeed, if the degree is small, the number of monomials appearing in $f$ is small, and the total number of variables appearing in $f$ is small (recall that the width is at most $r$). One can invoke black-box reconstruction algorithms for sparse polynomials [35, 8] to learn $f$ as a sum of monomials. Then, keeping the coefficients of the linear forms of a candidate $\Sigma\Pi\Sigma_{\mathrm{sm}}(r)$ representation as unknowns, we set up a system of polynomial equations in $\mathrm{poly}(r)$ variables such that every solution to the system corresponds to a representation of $f$.
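A minimal sympy illustration (ours) of this reduction: the coefficients of a candidate representation are kept as unknowns, and equating coefficients monomial by monomial yields a polynomial system whose solutions are exactly the valid representations.

```python
from sympy import symbols, expand, Poly, solve

x1, x2, y1, y2 = symbols('x1 x2 y1 y2')   # variable parts X = {x1,x2}, Y = {y1,y2}
a1, a2, b1, b2 = symbols('a1 a2 b1 b2')   # unknown coefficients of the linear forms

f = expand((x1 + x2) * (y1 - y2))         # target set-multilinear polynomial
C = (a1*x1 + a2*x2) * (b1*y1 + b2*y2)     # candidate circuit with top fan-in 1

# Each coefficient of f - C must vanish: a polynomial system in 4 unknowns.
equations = Poly(expand(f - C), x1, x2, y1, y2).coeffs()
print(solve(equations, [a1, a2, b1, b2], dict=True))
```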
Our Computational budget
Note that for low-degree learning (of degree $\mathrm{poly}(r)$), we require at least $2^{r^{O(1)}}$ time to learn the low-degree gates, as it requires solving a system of equations with at least $\mathrm{poly}(r)$ variables. Our goal is to ensure that all the algebraic manipulations necessary to learn the full circuit fit within this $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$ time budget. It is unclear whether the full [32]- and [41]-style clustering approach to learning, which requires solving a system of polynomial equations in exponentially many (in $r$) variables and therefore takes doubly exponential time, can be performed within this constraint. Therefore, we combine ideas from both [11] and [41] to adopt partial clustering – just enough to isolate a single gate (or, rather, its corresponding cluster). Subsequently, we can subtract this cluster and reduce to a circuit with smaller top fan-in, which we can then learn by induction.
Learning almost all the gates of the circuit
We adopt the same approach as in [11, Sections 5.3 and 5.4] to learn almost all the gates of the circuit. The exact result we need is described in Lemma 17. The idea is to use almost-uniqueness for set-multilinear circuits twice. Firstly, we will project the circuit to a low degree (approximately $\mathrm{poly}(r)$) and then learn this projection using low-degree reconstruction. Note that the representation we learn will have some linear forms that are the same as in the original representation $C$, by almost-uniqueness. In fact, we can ensure that we get two distinct linear forms supported on the same variable part.
In the next step, we will use these linear forms to learn most of the linear forms of $C$. Once we learn $\ell$ and $\ell'$ appearing in $C$ and supported on the same variable part, we try to learn more linear forms as follows. The algorithm applies a suitable setting of the variables of that part which makes $\ell$ evaluate to $0$, resulting in a circuit with fewer than $r$ multiplication gates. Call the restricted polynomial $f|_{\ell = 0}$, and let $C|_{\ell = 0}$ be the restricted version of $C$. By the inductive hypothesis, we can learn a representation of $f|_{\ell = 0}$, which will be close to $C|_{\ell = 0}$ by almost-uniqueness. Repeating the same with $\ell'$, and then iterating over all ways of matching up the multiplication gates (this matching step involves a brute-force search, which might seem wasteful, but it fits within our computational budget, so we do not need to optimize it) and all choices of overlap, we can generate a list of circuits containing a $\tilde C = \sum_{i=1}^{r} \tilde T_i$ such that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i$. Here, $\Delta(T_i, \tilde T_i)$ is a measure of how many linear forms are different in $T_i$ and $\tilde T_i$ (a distance measure between two gates). We refer to $\tilde C$ as the almost circuit.
3.1 Using Linear Forms Learned in the Almost Circuit
Now, we have reconstructed an almost circuit $\tilde C = \sum_{i=1}^{r} \tilde T_i$ such that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i$. We know that at most $\mathrm{poly}(r)$ linear forms in each $\tilde T_i$ are incorrect in the representation we learned. Therefore, there are at most $r \cdot \mathrm{poly}(r)$ incorrect linear forms overall, and so there are at least $d - r \cdot \mathrm{poly}(r)$ variable parts for which we know that all linear forms in $\tilde C$ are correct. We call this set of variable parts $\mathcal{G}$.
As discussed, $|\mathcal{G}| \ge d - r \cdot \mathrm{poly}(r)$. In [11], the authors guessed this set, which introduced a $d^{\mathrm{poly}(r)}$-type dependence, but we will do something different. Note that, for any subset $S$ of variable parts with $|S| > r \cdot \mathrm{poly}(r)$, there would be at least one variable part of $S$ in $\mathcal{G}$.
For any variable part $X_t \in \mathcal{G}$, we observe that if there exists some gate, say $T_1$, such that $\ell_{1,t} \notin \mathrm{span}\{\ell_{i,t} : i \neq 1\}$, then we can reconstruct $T_1$ exactly. To do this, we substitute the variables of $X_t$ with a point $a$ such that $\ell_{i,t}(a) = 0$ for all $i \neq 1$ and $\ell_{1,t}(a) \neq 0$. Now, using black-box factoring on $f$ after the substitution, we can learn $T_1|_{X_t = a}$, as all other terms vanish and what survives is a product of linear forms. Due to unique factorization, we learn this projection correctly, and we can recover $\ell_{1,t}$ simply using $\tilde T_1$ (its linear form on $X_t$ is correct, since $X_t \in \mathcal{G}$). Once we have learned $T_1$ exactly, we can just subtract it from the rest of $f$ and learn $f - T_1$ as a set-multilinear circuit with top fan-in $r - 1$.
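A toy sympy illustration (ours) of this kill-and-factor step, with two gates over two variable parts:

```python
from sympy import symbols, expand, factor_list

x1, x2, y1, y2 = symbols('x1 x2 y1 y2')

# f = T1 + T2 with T1 = (x1 + x2)(y1 + 2*y2) and T2 = x2*(y1 - y2).
f = expand((x1 + x2)*(y1 + 2*y2) + x2*(y1 - y2))

# Setting x2 = 0 kills T2 (its linear form on the X-part vanishes) while
# T1's form x1 + x2 stays nonzero; the survivor is a product of linear
# forms, recovered exactly thanks to unique factorization.
print(factor_list(f.subs(x2, 0)))   # e.g. (1, [(x1, 1), (y1 + 2*y2, 1)])
```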
What if there is no variable part for which the above condition holds, i.e., $\ell_{1,t} \in \mathrm{span}\{\ell_{i,t} : i \neq 1\}$ for every $X_t \in \mathcal{G}$? One could try to use the above technique iteratively, decreasing the top fan-in using variable parts in $\mathcal{G}$: pick a variable part $X_t \in \mathcal{G}$ and fix its variables to a value $a$ such that one linear form depending on $X_t$ (other than $\ell_{1,t}$) in $C$ (and also in $\tilde C$, as $X_t \in \mathcal{G}$) is set to zero while keeping $\ell_{1,t}$ nonzero (by ensuring $\ell_{1,t}(a) \neq 0$). This decreases the fan-in by at least one; repeat until we are left with our target gate $T_1$, which we can learn using the above-mentioned technique. This approach also fails, as there may be other gates that differ from $T_1$ on only a few variable parts, none of which are in $\mathcal{G}$, and hence cannot be differentiated from $T_1$ using $\mathcal{G}$ alone. To avoid this issue, we will focus on learning a cluster of gates, that is, multiplication gates (the $T_i$'s in the circuit) that differ from $T_1$ on only a few ($\mathrm{poly}(r)$) linear forms, instead of learning just one multiplication gate. See Lemma 15 for the formal definition of clusters.
3.2 Set-Multilinear Clustering
Karnin and Shpilka [32] introduced the notion of clustering the multiplication gates of any depth-3 circuit. Just like clustering points in space, where close points form a cluster and distant points form different clusters, clustering multiplication gates with $\Delta$ as a distance metric ensures that gates in one cluster differ on few linear forms, while gates in different clusters differ on substantially more linear forms. One significant benefit of studying the clustered representation is that it is unique! Furthermore, if we can get black-box access to a single cluster of low simple rank, then we can learn a circuit representation of it. Indeed, we can strip off the gcd of the different multiplication gates in the cluster, and what is left is just a low-degree circuit. In fact, this exact approach has been used in [32], [11], and [41] for learning (multilinear) depth-3 circuits.
However, one major drawback of the multilinear clustering used in [41] and [32] is that the rank of the simple part (i.e., the polynomial that remains after stripping off the linear factors) of a cluster can only be upper-bounded by a quantity exponential in $r$. When we try to learn this simple part as a low-degree circuit, it requires solving a system of equations in exponentially many (in $r$) variables. This was one of the main culprits behind the exponential-tower dependence on $r$ in [41].
We develop a partial cluster representation (Lemma 15) specifically designed for isolating a single cluster from a set-multilinear depth-3 circuit. For instance, if we want to isolate a cluster containing the gate $T_1$, then our clustering algorithm will output a set $\mathcal{I} \subseteq [r]$ with $1 \in \mathcal{I}$, the partial cluster containing $T_1$, such that the degree of the simple part of the cluster $\mathcal{C}_{\mathcal{I}} = \sum_{i \in \mathcal{I}} T_i$ is at most $\mathrm{poly}(r)$, while ensuring that the "distance" between the isolated cluster and the other gates is high enough. Formally, $\Delta(\mathcal{I}, j) > \mathrm{poly}(r)$ for all $j \notin \mathcal{I}$.
Note that, if we can isolate the cluster, that is, get black-box access to $\mathcal{C}_{\mathcal{I}}$, we can reconstruct it using low-degree reconstruction, as the degree of the simple part of $\mathcal{C}_{\mathcal{I}}$ is at most $\mathrm{poly}(r)$. Our clustering mechanism ensures that $\Delta(\mathcal{I}, j)$ is large for all $j \notin \mathcal{I}$. This further implies that every gate outside the cluster differs from $T_1$ on many variable parts.
A natural approach would be to use these variable parts to set all the gates not in $\mathcal{I}$ to zero while ensuring that $\mathcal{C}_{\mathcal{I}}$ doesn't vanish. However, we don't have any idea what these variable parts are, and any brute-force search for them would have a $d^{\mathrm{poly}(r)}$-type dependence. Furthermore, our approach should also ensure that the projection doesn't kill $T_1$ or $\mathcal{C}_{\mathcal{I}}$.
3.3 Good $\mathcal{I}$-isolating Projections
We will now describe how to handle the above issues. Firstly, we will ensure that the variable parts we use are not only in $\mathcal{G}$, but also that the linear forms depending on these variable parts are linear factors of $\mathcal{C}_{\mathcal{I}}$; i.e., we choose variable parts from a subset $\mathcal{G}' \subseteq \mathcal{G}$ on which all gates of the cluster share the same linear form. Fixing variables belonging to the variable parts from the set $\mathcal{G}'$ will cost us only linear factors of $\mathcal{C}_{\mathcal{I}}$, which we can multiply back later. This fixing of variables is what we call a good $\mathcal{I}$-isolating projection, as defined in Definition 19.
And secondly, in Section 7.1, we describe an algorithm to search for those variable parts on which the linear forms in the other gates differ from $\mathcal{C}_{\mathcal{I}}$, in time $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$. We design a recursive approach that, using the structural guarantees of $\tilde C$, outputs a list of candidates for these variable parts, with a guarantee that one of them will help us project onto the cluster $\mathcal{C}_{\mathcal{I}}$. The reduction of the search time for these variable parts from $d^{\mathrm{poly}(r)}$ to $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$ is the main technical contribution of our work.
We now elaborate on the structural guarantees of $\tilde C$ that let us do this. Since the number of variable parts excluded from $\mathcal{G}'$ is at most $\mathrm{poly}(r)$, any set of more than $\mathrm{poly}(r)$ variable parts contains one from $\mathcal{G}'$. Therefore, our algorithm will pick $\mathrm{poly}(r)$ variable parts, one of which is guaranteed to be in $\mathcal{G}'$, such that on each of them the linear forms of the gates not yet set to 0 span a space of dimension at least 2. Then, for each of these, we set a linear form that is not a scalar multiple of $\tilde\ell_{1,t}$ to 0 while keeping $\tilde\ell_{1,t}$ nonzero, decreasing the top fan-in, and then recursively call the function. The distance condition in Lemma 15 ensures that if there is a gate not in $\mathcal{I}$ that has not yet been set to zero, we will be able to find such a variable part. Therefore, we have $2^{r^{O(1)}}$ recursive calls, one of which is such that the gates remaining are only those in $\mathcal{I}$ and the variable parts fixed all belong to $\mathcal{G}'$, and thus we can learn $\mathcal{C}_{\mathcal{I}}$.
For the correct choice of these variable parts, we can ensure that the linear forms in $\mathcal{C}_{\mathcal{I}}$ depending on the part don't vanish. This ensures that no other gate in $\mathcal{I}$ vanishes either, as these forms are part of their gcd (the initial representation is chosen so that the simple part of $\mathcal{C}_{\mathcal{I}}$ has no linear factors). Also, due to unique factorization, and since all the linear forms that got fixed to nonzero constants are linear factors of $\mathcal{C}_{\mathcal{I}}$, we can simply multiply back the linear forms we learned in $\tilde T_1$ (which agree with those of $T_1$ and the other gates in $\mathcal{I}$, as the corresponding parts lie in $\mathcal{G}'$) for those variable parts.
Once we have learned $\mathcal{C}_{\mathcal{I}}$, we subtract it from $f$ and learn the remaining circuit, which has smaller top fan-in. To ensure that the output circuit computes the correct polynomial, we run the efficient deterministic PIT for set-multilinear depth-3 circuits from Corollary 9 at the end, which ensures that the only circuit output is a correct one.
4 Preliminaries
In this section, we will define notation and develop the basic preliminaries in algebraic complexity and reconstruction required to understand this work. An experienced reader can presumably skip to Section 5. A more detailed set of preliminaries with notation can be found in the full version.
Complexity of Solving a System of Polynomial Equations
Theorem 8.
Let $f_1, \dots, f_m$ be $k$-variate polynomials of degree at most $D$ over $\mathbb{F}$. Then, the complexity of finding a single solution to the system $f_1 = \cdots = f_m = 0$ (if one exists) over various fields is as follows:
1. [21] For $\mathbb{F} = \mathbb{R}$: deterministic $(mD)^{k^{O(1)}}$ time. Here the authors assumed that the constants appearing in the system are integers (or rationals). Note that for all computational applications we can WLOG assume this by simply approximating/truncating a given real number at some number of bits.
2. [28] For $\mathbb{F} = \mathbb{C}$ (or any algebraically closed field): deterministic $(mD)^{k^{O(1)}}$ time.
3. For all fields $\mathbb{F}$: time $(mD)^{k^{O(1)}} \cdot \mathsf{Fact}_{\mathbb{F}}\big(D^{k^{O(1)}}\big)$, where $\mathsf{Fact}_{\mathbb{F}}(t)$ denotes the time complexity of factorizing a univariate polynomial of degree $t$ over $\mathbb{F}$, randomized or deterministic. This follows from standard techniques in elimination theory. For a detailed sketch of the argument and a bound on the size of the required field extension, see [11, Appendix A].
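For intuition on item 2, feasibility over an algebraically closed field can be decided with Groebner bases via the weak Nullstellensatz; here is a small sympy illustration (ours, not the algorithm of [28]):

```python
from sympy import groebner, symbols

x, y = symbols('x y')

# Over C, a system has no common root iff its reduced Groebner basis is {1}.
gb_infeasible = groebner([x + y - 3, x - y, x**2 + y**2 - 1], x, y)
gb_feasible   = groebner([x - y, x**2 + y**2 - 2], x, y)
print(gb_infeasible.exprs)  # [1]  -> no common solution over C
print(gb_feasible.exprs)    # nontrivial basis -> solutions x = y = ±1 exist
```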
Polynomial Identity Testing
Corollary 9 ([22, Theorem 3]).
Over any field $\mathbb{F}$, there is a deterministic black-box PIT algorithm, running in time $g(k) \cdot \mathrm{poly}(n,d)$ for a computable function $g$, for the class of set-multilinear polynomials computed by depth-3 set-multilinear circuits of degree $d$ and top fan-in $k$.
The result in [22] is quantitatively much stronger, but we restate only what is most suited for our application.
Factoring: structural and algorithmic results
In their work [45], Shpilka and Volkovich showed that for any class of multilinear polynomials, one can derandomize polynomial factoring using PIT algorithms for that class. Using that fact directly with the PIT algorithm of the previous subsection, we get the following.
Lemma 10 ([45, Corollary 1.2]).
There is a deterministic algorithm that, given black-box access to a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit $C$, outputs black-boxes for the irreducible factors of $C$ in time $g(k) \cdot \mathrm{poly}(n,d)$. In addition, each such irreducible factor is computable by a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit. Moreover, there is a deterministic algorithm that outputs linear forms $\ell_1, \dots, \ell_t$ and black-box access to a simple (linear-factor-free) set-multilinear circuit $g$ such that $C = \ell_1 \cdots \ell_t \cdot g$, in the same time bound.
Width Reduction for Set-Multilinear Circuits
In [11], the authors presented an algorithm for learning set-multilinear $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuits of arbitrary width in roughly the same amount of time it takes to learn set-multilinear circuits of width $k$. The following lemma follows directly from the result of [11, Corollary 5.3] and the fast PIT algorithm of [22].
Lemma 11 ([11, Corollary 5.3]).
Suppose $\mathcal{A}$ is an algorithm that has the following behavior: on input black-box access to a degree-$d$, $N$-variate polynomial $f$ such that $f$ is computable by a width-$k$ $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit over the field $\mathbb{F}$, $\mathcal{A}$ runs in deterministic time $T$ and outputs a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit computing $f$. Then there is another algorithm $\mathcal{A}'$ that has the following behavior: on input black-box access to a degree-$d$, $N$-variate polynomial $f$ such that $f$ is computable by an arbitrary-width $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit over the field $\mathbb{F}$, $\mathcal{A}'$ runs in deterministic time $T \cdot \mathrm{poly}(N, d)$ and outputs a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit computing $f$.
We refer the reader to Section 5.1 and Section 5.6 of [11] for the details.
Low Degree Reconstruction
In [11], the authors gave an algorithm that reduces the reconstruction of a low-degree set-multilinear circuit to solving a system of polynomial equations with few variables. We will be using it mainly in the case where the degree is $\mathrm{poly}(k)$. The result is as follows.
Lemma 12 ([11], Lemma 5.5).
Given black-box access to a degree-$d$ polynomial $f$ such that $f$ is computable by a width-$n$ $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit over the field $\mathbb{F}$, there is a deterministic $2^{(ndk)^{O(1)}}$-time algorithm that outputs a $\Sigma\Pi\Sigma_{\mathrm{sm}}(k)$ circuit computing $f$. Further, using Lemma 11, we can reduce the width to $k$ and get a running time of $2^{(dk)^{O(1)}} \cdot \mathrm{poly}(N)$.
Remark 13.
When $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$, the output circuit is over the same underlying field $\mathbb{F}$. In general, the output circuit might be over an algebraic extension of $\mathbb{F}$ (of degree bounded as in Theorem 8).
5 Partial Set-Multilinear Cluster Representation
We will be clustering all gates close to the gate $T_1$ to create a representation with a cluster of gates containing $T_1$, such that the remaining gates are far from the cluster. The gates clubbed together will be represented by an index set $\mathcal{I} \subseteq [r]$ with $1 \in \mathcal{I}$, fulfilling the properties described in Lemma 15.
For any set of gates $\mathcal{I}$ of the circuit $C$, we will only consider representations of $\mathcal{C}_{\mathcal{I}} = \sum_{i \in \mathcal{I}} T_i$ in which the simple part $\mathrm{sim}(\mathcal{C}_{\mathcal{I}})$, i.e., $\mathcal{C}_{\mathcal{I}}$ divided by the gcd of the gates in $\mathcal{I}$, has no linear factors. Such a representation always exists for the polynomial computed by $\mathcal{C}_{\mathcal{I}}$, with a depth-3 set-multilinear circuit of the same top fan-in, as can be inferred from Lemma 10.
Next, we introduce the following definition of distance that we will be using to define the cluster representation.
Definition 14 (Cluster Distance).
For a circuit $C = \sum_{i=1}^{r} T_i$ with a cluster $\mathcal{I} \subseteq [r]$, in a representation such that $\mathrm{sim}(\mathcal{C}_{\mathcal{I}})$ has no linear factors, we define $\Delta(\mathcal{I}, j)$ for $j \notin \mathcal{I}$, i.e., the cluster distance between the cluster $\mathcal{I}$ and the gate $T_j$, as the number of variable parts on which the linear forms in $\mathcal{C}_{\mathcal{I}}$ and $T_j$ differ. It can be represented as follows in mathematical notation:
$$ \Delta(\mathcal{I}, j) \;=\; \big|\{\, t \in [d] \;:\; \ell_{j,t} \text{ is not a scalar multiple of any } \ell_{i,t},\ i \in \mathcal{I} \,\}\big|. $$
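A simplified numerical stand-in (ours) for this distance, in the special case of comparing two single gates given as lists of coefficient vectors, one linear form per variable part:

```python
import numpy as np

def gate_distance(Ti, Tj, tol=1e-9):
    """Number of variable parts on which the two gates carry
    non-proportional (i.e., genuinely different) linear forms."""
    def proportional(u, v):
        return np.linalg.matrix_rank(np.vstack([u, v]), tol=tol) <= 1
    return sum(1 for u, v in zip(Ti, Tj) if not proportional(u, v))

# Two gates over d = 3 parts agreeing (up to scaling) only on the first part:
g1 = [np.array([1., 2.]), np.array([1., 0.]), np.array([0., 1.])]
g2 = [np.array([2., 4.]), np.array([0., 1.]), np.array([1., 1.])]
print(gate_distance(g1, g2))  # 2
```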
We will use the cluster representation with the properties described in the following lemma.
Lemma 15.
For a polynomial $f$ computed by a set-multilinear depth-3 circuit $C = \sum_{i=1}^{r} T_i$, there exists a set $\mathcal{I} \subseteq [r]$ and a representation in which $\mathrm{sim}(\mathcal{C}_{\mathcal{I}})$ doesn't have any linear factors, such that:
-
the cluster contains the target gate, i.e., $1 \in \mathcal{I}$;
-
the degree of the simple part is low, i.e., $\deg(\mathrm{sim}(\mathcal{C}_{\mathcal{I}})) \le \mathrm{poly}(r)$;
-
for all remaining gates $T_j$ with $j \notin \mathcal{I}$, $\mathcal{C}_{\mathcal{I}}$ and $T_j$ differ on many variable parts, i.e., $\Delta(\mathcal{I}, j) > \mathrm{poly}(r)$.
Due to space constraints, we defer the formal proof of the above lemma to the full version of the paper.
Remark 16.
See the full version for a comparison with previously known clusterings.
6 Almost Circuit: Learning Most Gates
Given a circuit $C = \sum_{i=1}^{r} T_i$ computing the polynomial $f$, we can use the techniques in [11] to learn a circuit $\tilde C = \sum_{i=1}^{r} \tilde T_i$ such that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i \in [r]$.
We briefly explain how such a $\tilde C$ is obtained in time $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$. We will use Lemma 12 to do deterministic reconstruction when the degree is $\mathrm{poly}(r)$, in time $2^{r^{O(1)}}$. Now, similar to Lemma 5.6 in [11], we set all but $\mathrm{poly}(r)$ variable parts to some fixed values and reconstruct the resulting low-degree polynomial, which will be close to $C$ (on the variable parts that were not fixed), to obtain two independent linear forms $\ell$ and $\ell'$ supported on the same variable part $X_t$. By setting $\ell$ and $\ell'$ to $0$ one at a time, and recursively calling reconstruction for set-multilinear circuits with top fan-in $r - 1$, we obtain representations close to $C|_{\ell = 0}$ and $C|_{\ell' = 0}$. These close representations have each multiplication gate close to a multiplication gate in $C$, due to rank bounds, as the gates that contain $\ell$ survive when we pass to $\ell' = 0$, and vice versa. As described in Lemma 5.7 of [11], we obtain a list of at most $2^{r^{O(1)}}$ circuits such that, working through all possibilities, at least one circuit $\tilde C$ in the list satisfies $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i$.
Lemma 17.
Given black-box access to a set-multilinear depth-3 circuit with top fan-in $r$ computing $f$, there exists an algorithm that runs in time $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$ and outputs a list of $2^{r^{O(1)}}$ set-multilinear circuits such that one circuit $\tilde C = \sum_{i=1}^{r} \tilde T_i$ in the list has the property that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i \in [r]$.
7 Finding a good Projection
The focus of this section will be on learning the cluster $\mathcal{C}_{\mathcal{I}}$ (which contains $T_1$). Due to our clustering lemma (we actually never run a clustering algorithm; it is used only for existential arguments), we know there exists a representation of $f$ with a set $\mathcal{I} \ni 1$ such that $\deg(\mathrm{sim}(\mathcal{C}_{\mathcal{I}})) \le \mathrm{poly}(r)$ and $\Delta(\mathcal{I}, j) > \mathrm{poly}(r)$ for all $j \notin \mathcal{I}$.
We will now define our useful variable parts by excluding from $\mathcal{G}$ the parts for which the linear forms in $\mathcal{C}_{\mathcal{I}}$ are not part of its gcd. Therefore, we define
$$ \mathcal{G}' \;=\; \{\, t \in \mathcal{G} \;:\; \text{all gates } T_i,\ i \in \mathcal{I}, \text{ share the same linear form on } X_t \text{ up to scalar multiples} \,\}. $$
As $\deg(\mathrm{sim}(\mathcal{C}_{\mathcal{I}})) \le \mathrm{poly}(r)$, we have $|\mathcal{G}'| \ge |\mathcal{G}| - \mathrm{poly}(r) \ge d - \mathrm{poly}(r)$. Our target remains to find a set of variable parts in $\mathcal{G}'$ such that we can fix them to values that make the other gates vanish while $\mathcal{C}_{\mathcal{I}}$ remains nonzero. We define any such fixing of variables, recorded as tuples in a set denoted $\sigma$, that keeps the gates in $\mathcal{I}$ (containing $T_1$) alive and sets the rest to $0$, as an $\mathcal{I}$-isolating projection.
Definition 18 ($\mathcal{I}$-isolating Projection).
An $\mathcal{I}$-isolating projection is a tuple $(\mathcal{I}, \sigma)$ such that $\mathcal{I} \subseteq [r]$, and $\sigma$ is a set of tuples $(t, a_t)$ such that $t \in [d]$ indexes a variable part and $a_t$ is an assignment to the variables in the $t$-th variable part. All gates except those in $\mathcal{I}$ vanish after the substitution of variables according to $\sigma$, i.e., $T_j|_{\sigma} = 0$ for all $j \notin \mathcal{I}$, while $\mathcal{C}_{\mathcal{I}}|_{\sigma} \neq 0$.
The goal of our computation will be getting black-box access to $\mathcal{C}_{\mathcal{I}}$; thus, we define a good $\mathcal{I}$-isolating projection, which lets us compute black-box access to $\mathcal{C}_{\mathcal{I}}$.
Definition 19 (Good $\mathcal{I}$-isolating Projection). An $\mathcal{I}$-isolating projection $(\mathcal{I}, \sigma)$ is good if every variable part fixed by $\sigma$ belongs to $\mathcal{G}'$, so that the linear forms it fixes are linear factors of $\mathcal{C}_{\mathcal{I}}$ and can be multiplied back after reconstruction. (We state the definition informally; the precise formulation is in the full version.)
7.1 Computing a Good $\mathcal{I}$-isolating Projection
This subsection will describe how we can compute a good $\mathcal{I}$-isolating projection using white-box access to $\tilde C$. Our idea is straightforward. We know that for each $j \notin \mathcal{I}$, $T_j$ differs from $\mathcal{C}_{\mathcal{I}}$ on more than $\mathrm{poly}(r)$ variable parts. By the "closeness" of $C$ and $\tilde C$, we know that even in $\tilde C$ there will be many variable parts on which $\tilde T_j$ differs from the cluster. If we knew exactly what these variable parts are for each $j \notin \mathcal{I}$, together with the fact that the corresponding linear forms are not scalar multiples of those of $T_1$, this would mean there is an assignment that kills at least one linear form of $T_j$ (and thus $T_j$ itself) for each $j \notin \mathcal{I}$ while keeping $\mathcal{C}_{\mathcal{I}}$ alive. However, we don't have any idea what these variable parts are, and any brute-force search for them would have a $d^{\mathrm{poly}(r)}$-type dependence. We design a recursive approach that, using the structural guarantees of $\tilde C$, outputs a small list of candidates for these variable parts, with a guarantee that one of them will help us project onto the cluster $\mathcal{C}_{\mathcal{I}}$.
We now elaborate on the structural guarantees of $\tilde C$ that let us do this. We pick a set of variable parts of size at most $\mathrm{poly}(r)$ (for simplicity, in Algorithm 1, we use a fixed polynomial bound) such that for each variable part in the set, the span of the linear forms of the still-alive gates of $\tilde C$ on that part has dimension at least 2. This means that for each of the variable parts in this set, we can pick at least one linear form $\ell'$ that is not a scalar multiple of $\tilde\ell_{1,t}$. Since $\ell'$ and $\tilde\ell_{1,t}$ are linearly independent, we can find an assignment of the variables of that part such that $\ell' = 0$ (and therefore every gate containing $\ell'$ vanishes) while keeping $\tilde\ell_{1,t}$, and hence $\tilde T_1$, nonzero. Thus, we have found a projection with the top fan-in decreased by at least 1. This projection is added to $\sigma$, and all gates in $\tilde C$ that are set to zero are removed from consideration in the execution that continues after fixing this variable part. We do this recursively until we reach an execution level where we cannot find any variable parts for our set, which can happen if only gates close to $\tilde T_1$ survive or if the remaining gates of $\tilde C$ have the same linear forms (up to scalar multiples) as $\tilde T_1$ on all the variable parts that have not yet been fixed. After doing this for all variable parts in the chosen set, the algorithm adds the surviving gates and the projection that produced them to a global list GoodProjList.
Input: black-box access to the circuit $C$ computing $f$, and white-box access to $\tilde C$ such that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i$.
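Since only the input specification of Algorithm 1 is reproduced here, the following toy white-box Python sketch (entirely our simplification: explicit coefficient vectors replace black-box access, real arithmetic replaces a general field, and we branch over all unfixed parts rather than a carefully chosen $\mathrm{poly}(r)$-sized set) conveys the recursive structure:

```python
import numpy as np

def kill_assignment(ell_kill, ell_keep, tol=1e-9):
    """Point a with <ell_kill, a> = 0 and <ell_keep, a> != 0; such a point
    exists iff the two linear forms are linearly independent."""
    u = ell_kill / np.linalg.norm(ell_kill)
    a = ell_keep - (ell_keep @ u) * u          # project ell_keep off ell_kill
    return a if abs(ell_keep @ a) > tol else None

def search(gates, alive, fixed, out):
    """Branch over unfixed variable parts; on each branch, kill one alive
    gate whose form is independent of gate 0's, keeping gate 0 alive.
    Records every projection under which only gate 0 survives."""
    if alive == {0}:
        out.append(fixed)
        return
    fixed_parts = {p for p, _ in fixed}
    for p in range(len(gates[0])):
        if p in fixed_parts:
            continue
        indep = [i for i in alive - {0} if np.linalg.matrix_rank(
                     np.vstack([gates[i][p], gates[0][p]])) == 2]
        if not indep:
            continue
        a = kill_assignment(gates[indep[0]][p], gates[0][p])
        still = {i for i in alive if abs(gates[i][p] @ a) > 1e-9}
        search(gates, still, fixed + [(p, a)], out)

# Three gates over d = 2 parts with width n = 2; gate 0 is the target.
gates = [[np.array([1., 0.]), np.array([1., 0.])],
         [np.array([0., 1.]), np.array([1., 0.])],
         [np.array([1., 1.]), np.array([0., 1.])]]
projections = []
search(gates, {0, 1, 2}, [], projections)
print(len(projections), "isolating projections found")  # 2
```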
Now, to see that this list contains a good $\mathcal{I}$-isolating projection, observe that in each recursive call either we picked $\mathrm{poly}(r)$ variable parts, or Algorithm 1 could not find enough variable parts with the required property. In the former case, since at most $\mathrm{poly}(r)$ parts lie outside $\mathcal{G}'$, at least one variable part in our set came from $\mathcal{G}'$. We use the distance property of the clustering to show, in the proof of Lemma 20, that if there is a gate $T_j$ with $j \notin \mathcal{I}$ that has not yet been set to zero, then there is a variable part in $\mathcal{G}'$ on which the gate $\tilde T_j$ (and $T_j$, as the part lies in $\mathcal{G}$) has a linear form independent of $\tilde\ell_1$, and therefore of $\ell_1$. Therefore, in the latter case as well, either we pick a variable part in $\mathcal{G}'$ or all surviving gates already belong to $\mathcal{I}$.
We focus our attention on the path of execution where each variable part picked was from $\mathcal{G}'$. Since all variable parts in $\mathcal{G}'$ are also in $\mathcal{G}$, setting a linear form of $\tilde C$ to zero using such a part also sets the corresponding form of $C$ to zero (the linear forms depending on the variable part are the same for $C$ and $\tilde C$, by the definition of $\mathcal{G}$), while keeping $\tilde\ell_1$ nonzero keeps $\ell_1$ nonzero. Also, as all variable parts in $\mathcal{G}'$ carry a common linear form of the gates in $\mathcal{I}$ and the circuit is set-multilinear, keeping $T_1$ nonzero also keeps $T_i$ nonzero for all $i \in \mathcal{I}$. Therefore, along this execution path, at each recursive level at least one gate not in $\mathcal{I}$ gets set to zero, while all the gates in $\mathcal{I}$ are kept nonzero by the fixed variable parts, until only the gates of $\mathcal{I}$ survive. In the execution call on this path where that happens, the projection $\sigma$ added to GoodProjList is a good $\mathcal{I}$-isolating projection, and it occurs at a depth of at most $r$.
This computation can be seen as a tree of arity $\mathrm{poly}(r)$ and depth at most $r$ (hence of size $2^{O(r \log r)}$), with each node a recursive call of the function. We start with all $r$ gates alive and, at each node, make at most $\mathrm{poly}(r)$ choices of variable parts such that at least one of them is in $\mathcal{G}'$. In the execution of the algorithm along the good path, at each step, at least one gate not in $\mathcal{I}$ is set to zero by fixing the chosen variable part while keeping all gates in $\mathcal{I}$ nonzero, until we reach a call in which only the gates of $\mathcal{I}$ remain, which adds a good $\mathcal{I}$-isolating projection to the list.
Lemma 20.
Given black-box access to $C$ and white-box access to $\tilde C$ as above, Algorithm 1 runs in time $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$ and outputs a list GoodProjList of size $2^{r^{O(1)}}$ that contains a good $\mathcal{I}$-isolating projection. (Stated informally; the precise statement is in the full version.)
Due to space constraints, we defer the formal proof of the above lemma to the full version of the paper.
8 Reconstruction using a good Projection
Now, we will describe how we can obtain a $2^{r^{O(1)}}$-sized list of candidate circuits for $\mathcal{C}_{\mathcal{I}}$ using the list computed in Lemma 20, one of which will be computing $\mathcal{C}_{\mathcal{I}}$.
Input: black-box access to a set-multilinear circuit $C$ computing $f$, white-box access to $\tilde C$ such that $\Delta(T_i, \tilde T_i) \le \mathrm{poly}(r)$ for all $i$, and a list GoodProjList containing a good $\mathcal{I}$-isolating projection.
Lemma 21.
Given the list of $\mathcal{I}$-isolating projections from Lemma 20, black-box access to a set-multilinear circuit computing $f$ with top fan-in $r$, and white-box access to a circuit $\tilde C$ as in Lemma 17, Algorithm 2 computes a $2^{r^{O(1)}}$-sized list of depth-3 set-multilinear circuits such that at least one of them computes $\mathcal{C}_{\mathcal{I}}$, in time $2^{r^{O(1)}} \cdot \mathrm{poly}(n,d)$.
Due to space constraints, we defer the formal proof of the above lemma to the full version of the paper. We conclude by presenting the algorithm mentioned in our main theorem (Theorem 1). The proof of its correctness can be found in the full version of the paper.
Input: black-box access to a set-multilinear circuit $C$ computing $f$.
References
- [1] L. M. Adleman and K. S. McCurley. Open problems in number theoretic complexity, II. In Algorithmic Number Theory, First International Symposium, ANTS-I, Ithaca, NY, USA, May 6-9, 1994, Proceedings, pages 291–322, 1994. doi:10.1007/3-540-58691-1.
- [2] Anima Anandkumar, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Yi-Kai Liu. A spectral algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 917–925, 2012.
- [3] Animashree Anandkumar, Rong Ge, Daniel Hsu, and Sham M Kakade. A tensor approach to learning mixed membership community models. The Journal of Machine Learning Research, 15(1):2239–2312, 2014. doi:10.5555/2627435.2670323.
- [4] Animashree Anandkumar, Rong Ge, and Majid Janzamin. Learning overcomplete latent variable models through tensor methods. In Conference on Learning Theory, pages 36–112. PMLR, 2015. URL: http://proceedings.mlr.press/v40/Anandkumar15.html.
- [5] D. Angluin. Queries and concept learning. Machine Learning, 2:319–342, 1988.
- [6] Boaz Barak, Jonathan A Kelner, and David Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 143–151, 2015. doi:10.1145/2746539.2746605.
- [7] A. Beimel, F. Bergadano, N. H. Bshouty, E. Kushilevitz, and S. Varricchio. Learning functions represented as multiplicity automata. J. ACM, 47(3):506–530, 2000. doi:10.1145/337244.337257.
- [8] M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynominal interpolation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC), pages 301–309, 1988.
- [9] Vishwas Bhargava, Ankit Garg, Neeraj Kayal, and Chandan Saha. Learning generalized depth three arithmetic circuits in the non-degenerate case. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2022.
- [10] Vishwas Bhargava, Shubhangi Saraf, and Ilya Volkovich. Deterministic factorization of sparse polynomials with bounded individual degree. Journal of the ACM (JACM), 67(2):1–28, 2020. doi:10.1145/3365667.
- [11] Vishwas Bhargava, Shubhangi Saraf, and Ilya Volkovich. Reconstruction algorithms for low-rank tensors and depth-3 multilinear circuits. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 809–822, 2021. doi:10.1145/3406325.3451096.
- [12] Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, and Tanmay Sinha. Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), volume 287, pages 25:1–25:19, 2024. doi:10.4230/LIPIcs.ITCS.2024.25.
- [13] S. Chen and R. Meka. Learning polynomials in few relevant dimensions. In Jacob D. Abernethy and Shivani Agarwal, editors, Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 1161–1227. PMLR, 2020. URL: http://proceedings.mlr.press/v125/chen20a.html.
- [14] Lieven De Lathauwer, Joséphine Castaing, and Jean-François Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55(6):2965–2973, 2007. doi:10.1109/TSP.2007.893943.
- [15] Harm Derksen. Kruskal’s uniqueness inequality is sharp. Linear Algebra and its Applications, 438(2):708–712, 2013.
- [16] Jingqiu Ding, Tommaso d’Orsi, Chih-Hung Liu, David Steurer, and Stefan Tiegel. Fast algorithm for overcomplete order-3 tensor decomposition. In Conference on Learning Theory, pages 3741–3799. PMLR, 2022. URL: https://proceedings.mlr.press/v178/ding22a.html.
- [17] M. A. Forbes and A. Shpilka. Quasipolynomial-time identity testing of non-commutative and read-once oblivious algebraic branching programs. Electronic Colloquium on Computational Complexity (ECCC), 19:115, 2012.
- [18] Michael A Forbes and Amir Shpilka. On identity testing of tensors, low-rank recovery and compressed sensing. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 163–172, 2012. doi:10.1145/2213977.2213995.
- [19] A. Garg, N. Kayal, and C. Saha. Learning sums of powers of low-degree polynomials in the non-degenerate case. arXiv preprint arXiv:2004.06898, 2020.
- [20] Ankit Garg, Neeraj Kayal, and Chandan Saha. Learning sums of powers of low-degree polynomials in the non-degenerate case. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 889–899. IEEE, 2020. doi:10.1109/FOCS46700.2020.00087.
- [21] D. Yu. Grigor’ev and N.N. Vorobjov. Solving systems of polynomial inequalities in subexponential time. Journal of Symbolic Computation, 5(1):37–64, 1988. doi:10.1016/S0747-7171(88)80005-1.
- [22] Zeyu Guo and Rohit Gurjar. Improved explicit hitting-sets for roabps. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020). Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2020.
- [23] A. Gupta, N. Kayal, and S. V. Lokam. Efficient reconstruction of random multilinear formulas. In IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS, pages 778–787, 2011. doi:10.1109/FOCS.2011.70.
- [24] A. Gupta, N. Kayal, and Y. Qiao. Random arithmetic formulas can be reconstructed efficiently. Computational Complexity, 23(2):207–303, 2014. doi:10.1007/s00037-014-0085-0.
- [25] Johan Håstad. Tensor rank is NP-complete. J. Algorithms, 11(4):644–654, 1990. doi:10.1016/0196-6774(90)90014-6.
- [26] Christopher J. Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. J. ACM, 60(6), November 2013. doi:10.1145/2512329.
- [27] Daniel Hsu and Sham M Kakade. Learning mixtures of spherical gaussians: moment methods and spectral decompositions. arXiv preprint arXiv:1306.0021, 2013. Presented at the 4th Conference on Innovations in Theoretical Computer Science (ITCS).
- [28] D. Ierardi. Quantifier elimination in the theory of an algebraically-closed field. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, STOC ’89, pages 138–147, New York, NY, USA, 1989. Association for Computing Machinery. doi:10.1145/73007.73020.
- [29] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367–375, 2001. doi:10.1006/JCSS.2000.1727.
- [30] Prateek Jain and Sewoong Oh. Learning mixtures of discrete product distributions using spectral decompositions. arXiv preprint arXiv:1404.4604, 2014. Presented at the Conference on Learning Theory (COLT).
- [31] Nathaniel Johnston, Benjamin Lovitz, and Aravindan Vijayaraghavan. Computing linear sections of varieties: quantum entanglement, tensor decompositions and beyond. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1316–1336. IEEE, 2023. doi:10.1109/FOCS57990.2023.00079.
- [32] Z. S. Karnin and A. Shpilka. Reconstruction of generalized depth-3 arithmetic circuits with bounded top fan-in. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity (CCC), pages 274–285, 2009.
- [33] N. Kayal and C. Saha. Reconstruction of non-degenerate homogeneous depth three circuits. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019., pages 413–424, 2019. doi:10.1145/3313276.3316360.
- [34] A. Klivans and A. Shpilka. Learning restricted models of arithmetic circuits. Theory of computing, 2(10):185–206, 2006. doi:10.4086/TOC.2006.V002A010.
- [35] A. Klivans and D. Spielman. Randomness efficient identity testing of multivariate polynomials. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing (STOC), pages 216–223, 2001.
- [36] Pravesh K. Kothari, Ankur Moitra, and Alexander S. Wein. Overcomplete tensor decomposition via koszul-young flattenings, 2024. doi:10.48550/arXiv.2411.14344.
- [37] J. Landsberg. Tensors: geometry and applications. Representation theory, 381(402):3, 2012.
- [38] Benjamin Lovitz and Fedor Petrov. A generalization of Kruskal's theorem on tensor decomposition. In Forum of Mathematics, Sigma, volume 11, page e27. Cambridge University Press, 2023.
- [39] Ankur Moitra and Alexander S Wein. Spectral methods from tensor networks. arXiv preprint arXiv:1811.00944, 2018. arXiv:1811.00944.
- [40] Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden markov models. arXiv preprint arXiv:1506.08512, 2005. Presented at the Thirty-Seventh Annual ACM Symposium on Theory of Computing (STOC).
- [41] Shir Peleg, Amir Shpilka, and Ben Lee Volk. Tensor Reconstruction Beyond Constant Rank. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), volume 287, pages 87:1–87:20, 2024. doi:10.4230/LIPIcs.ITCS.2024.87.
- [42] M. Schaefer and D. Stefankovic. The complexity of tensor rank. CoRR, abs/1612.04338, 2016. arXiv:1612.04338.
- [43] Tselil Schramm and David Steurer. Fast and robust tensor decomposition with applications to dictionary learning. In Conference on Learning Theory, pages 1760–1793. PMLR, 2017. URL: http://proceedings.mlr.press/v65/schramm17a.html.
- [44] Y. Shitov. How hard is the tensor rank? arXiv preprint arXiv:1611.01559, 2016.
- [45] A. Shpilka and I. Volkovich. On the relation between polynomial identity testing and finding variable disjoint factors. In Automata, Languages and Programming, 37th International Colloquium (ICALP), pages 408–419, 2010. Full version at https://eccc.weizmann.ac.il/report/2010/036.
- [46] Ilya Volkovich. A guide to learning arithmetic circuits. In Conference on Learning Theory, pages 1540–1561. PMLR, 2016. URL: http://proceedings.mlr.press/v49/volkovich16.html.
- [47] Alexander S Wein. Average-case complexity of tensor decomposition for low-degree polynomials. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1685–1698, 2023. doi:10.1145/3564246.3585232.