
Nearest Neighbor Complexity and Boolean Circuits

Mason DiCicco, Worcester Polytechnic Institute, MA, USA; Vladimir Podolskii, Tufts University, Medford, MA, USA; Daniel Reichman, Worcester Polytechnic Institute, MA, USA
Abstract

A nearest neighbor representation of a Boolean function f is a set of vectors (anchors) labeled by 0 or 1 such that f(𝒙)=1 if and only if the closest anchor to 𝒙 is labeled by 1. This model was introduced by Hajnal, Liu and Turán [2022], who studied bounds on the minimum number of anchors required to represent Boolean functions under different choices of anchors (real vs. Boolean vectors) as well as the analogous model of k-nearest neighbors representations.

We initiate a systematic study of the representational power of nearest and k-nearest neighbors through Boolean circuit complexity. To this end, we establish a close connection between Boolean functions with polynomial nearest neighbor complexity and those that can be efficiently represented by classes based on linear inequalities – min-plus polynomial threshold functions – previously studied in relation to threshold circuits. This extends an observation of Hajnal et al. [2022]. Next, we further extend the connection between nearest neighbor representations and circuits to the k-nearest neighbors case.

As an outcome of these connections we obtain exponential lower bounds on the k-nearest neighbors complexity of explicit n-variate functions, assuming k ≤ n^{1−ε}. Previously, no superlinear lower bound was known for any k>1. At the same time, we show that proving superpolynomial lower bounds for the k-nearest neighbors complexity of an explicit function for arbitrary k would require a breakthrough in circuit complexity. In addition, we prove an exponential separation between the nearest neighbor and k-nearest neighbors complexity (for unrestricted k) of an explicit function. These results address questions raised by [17] of proving strong lower bounds for k-nearest neighbors and understanding the role of the parameter k. Finally, we devise new bounds on the nearest neighbor complexity for several families of Boolean functions.

Keywords and phrases:
Complexity, Nearest Neighbors, Circuits
Copyright and License:
© Mason DiCicco, Vladimir Podolskii, and Daniel Reichman; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Complexity classes
Related Version:
Previous Version: https://arxiv.org/pdf/2402.06740
Acknowledgements:
We would like to thank Bill Martin for several insightful comments. The second and third authors thank the Simons Institute for the Theory of Computing, where collaboration on this project began.
Editors:
Raghu Meka

1 Introduction

A nearest-neighbor representation of a function f : {0,1}^n → {0,1} is a set of vectors, called “anchors,” say S = P ∪ N, such that f(𝒙)=1 if and only if the nearest anchor to 𝒙 (under the Euclidean distance metric) belongs to P. The set of anchors can be seen as a (disjoint) union of “positive” and “negative” examples. If S ⊆ {0,1}^n, we refer to the representation as Boolean, and if S ⊆ ℝ^n we call it real. This model was pioneered by [17], who advocated the study of Boolean functions admitting efficient representations, focusing on the number of anchors as a measure of the complexity of the representation. They also consider the k-nearest neighbors variation, where the value of f on input 𝒙 is computed as a majority vote of the k nearest anchors to 𝒙.

In particular, [17] observed a relationship between nearest-neighbor representations and threshold circuits. Motivated by their work, we initiate a systematic study of the connections to circuit complexity. Some of our results are related to the weight complexity (number of bits) needed when representing Boolean functions by real anchors: a topic receiving recent attention [24].

There are numerous extensions and variations of nearest neighbor complexity. While studying all of these here is infeasible, one goal of ours is to encourage further exploration of this broad topic. We discuss some future directions in the conclusion section.

1.1 Motivation

We believe that nearest neighbor representations are a natural and interesting model of computation worthy of study. Furthermore, nearest neighbor complexity is relevant to several central research topics listed next.

1.1.1 Boolean function complexity

Nearest neighbor representations are related to depth-two threshold circuits and decision trees: [17] establish that lower bounds for these models are useful for establishing lower bounds for nearest neighbors. Polynomial threshold functions and their relation to geometric representations of Boolean functions also bear a strong connection to nearest neighbor representations. In fact, we will show that the expressivity of nearest neighbors is essentially equivalent to that of polynomial threshold functions over the tropical semiring – an interesting model of computation due to [18].

Our approach to understanding nearest neighbor complexity in terms of Boolean circuit complexity follows a long tradition in computational learning theory and computational complexity (e.g., [29, 22, 26, 27, 41, 39]). Uncovering new connections between nearest neighbors and well-studied computational models (such as linear decision lists and depth-two circuits) enables us to prove new unconditional lower bounds and upper bounds on the nearest neighbor complexity of explicit functions (by “explicit,” we mean a function that can be computed in polynomial time by a Turing machine). It also allows us to phrase some open problems in circuit complexity in terms of nearest neighbor complexity. For instance, the difficulty of proving super-polynomial lower bounds for k-nearest neighbors representations – for appropriate values of k – follows from our constructions showing that such representations can simulate circuit classes for which the same lower bounds remain open.

1.1.2 Machine learning and pattern recognition

Learning algorithms based on nearest neighbors have been the subject of extensive research for more than 50 years [8, 11]. For example, there is evidence that increasing k in the k-nearest neighbors rule can decrease its estimation error [10]. However, much less seems to be known about the capacity of the nearest and k-nearest neighbor rules to represent certain functions, while the capacity of other machine learning models such as Boolean circuits and neural networks has received considerable interest [16, 19, 30, 12, 38].

1.1.3 Algorithms for nearest neighbors classification and search

Efficient implementation of the nearest neighbor rule in high dimension has received considerable interest, leading to efficient algorithms and sophisticated data structures [7, 2, 1]. The study of nearest neighbor complexity leads naturally to the study of nonuniform algorithms (Boolean circuits) for this rule. Our work (along with previous findings) yields upper and lower bounds on the size of a circuit needed to implement nearest neighbor classification. We focus on exact representations, leaving the study of circuit complexity for approximate nearest neighbor search (e.g., [20, 28]) for future work.

1.2 Our results

The objects of our study are the classes of Boolean functions admitting real and Boolean nearest neighbor representations of polynomial complexity, 𝖭𝖭 and 𝖧𝖭𝖭 respectively. Standard complexity classes based on circuit-like models of computation are closed under two natural “rewiring” operations: substitution of variables by constants and duplication of variables (see Definition 11). However, it is not clear how to efficiently perform these operations in the nearest neighbor model. Thus, if we would like to give a precise characterization of 𝖭𝖭 and 𝖧𝖭𝖭 in terms of circuits, we have to consider the closure of these classes under the same operations. As a result, we obtain classes of subfunctions of polynomial-size nearest neighbor representations, 𝖭𝖭¯ and 𝖧𝖭𝖭¯ (see Definition 12). We then give a precise characterization (equality) of 𝖭𝖭¯ and 𝖧𝖭𝖭¯ in terms of the class of min-plus polynomial threshold functions of polynomial complexity (𝗆𝗉𝖯𝖳𝖥); an 𝗆𝗉𝖯𝖳𝖥 is an expression of the form min{L_1(𝒙),…,L_ℓ(𝒙)} ≤ min{R_1(𝒙),…,R_r(𝒙)}, where the L_i, R_i are linear forms. This adds to previous results of [17] showing the containment of 𝖭𝖭 and 𝖧𝖭𝖭 in 𝗆𝗉𝖯𝖳𝖥. As a consequence, we prove (among other results) that 𝖧𝖭𝖭 contains functions that cannot be computed by depth-two threshold circuits with polynomial size and weight. We also observe that the closure of 𝖧𝖭𝖭 is closely connected to the class of functions with 𝖭𝖭 representations of logarithmic bit-complexity.

We study the k-nearest neighbors complexity of Boolean functions for k>1. First, we extend the aforementioned characterization – the closure of 𝖭𝖭 in terms of 𝗆𝗉𝖯𝖳𝖥 – to the closure of 𝗄𝖭𝖭. We use this characterization to prove that 𝗄𝖭𝖭 for constant k is closely related to 𝖭𝖭 and that there exists an explicit function that requires exponential 𝗄𝖭𝖭 complexity when k ≤ n^{1−ε} (for an n-variate function). Next, we generalize the characterization of 𝗄𝖭𝖭 to arbitrary k by introducing a new class, 𝗄𝖲𝖳𝖠𝖳 – functions realizable by an inequality of the k-statistics of two sets of linear forms – which generalizes 𝗆𝗉𝖯𝖳𝖥. Consequently, we show that proving lower bounds for 𝗄𝖭𝖭 for arbitrary k would result in lower bounds for the circuit class 𝖲𝖸𝖬∘𝖬𝖠𝖩, which would be a major breakthrough in Boolean circuit complexity.

Finally, we present new bounds on the nearest neighbor complexity of specific Boolean functions such as disjointness, CNFs, and majority. For example, we show that CNFs with polynomially many clauses have 𝖭𝖭 representations with polynomially many anchors which also exhibit constant bit-complexity. In contrast, there exist CNFs of polynomial size with exponential 𝖧𝖭𝖭 complexity. We also establish a new lower bound of n/2 + 2 on the 𝖧𝖭𝖭 complexity of the majority function on an even number n of inputs. This lower bound is tight, as it matches the upper bound proved in [17].

1.3 Related work

Nearest neighbor complexity (under Euclidean distance) was formalized by [17]. They prove that the functions 𝖳𝖧𝖱_{n/3} and 𝖷𝖮𝖱 (where 𝖳𝖧𝖱_t(𝒙) = 1 ⟺ Σ_i x_i ≥ t) both require an exponential number of Boolean anchors but only 2 and n+1 real anchors, respectively. In fact, the same argument proves that 𝖳𝖧𝖱_t requires at least (n choose t)/(2t choose t) anchors for any t, which gives an exponential lower bound for t bounded away from n/2, but is vacuous for 𝖳𝖧𝖱_{n/2}. It was subsequently shown in [24] that any symmetric Boolean function f has an 𝖭𝖭 representation with I(f) anchors, where I(f) denotes the number of intervals of f – the minimal number of contiguous intervals [a,b] partitioning [0,n] such that f(𝒙) is constant whenever a ≤ Σ_i x_i ≤ b – and this bound is optimal when all intervals have the same length. This extends the result of [17] that every symmetric function has nearest neighbor complexity at most n+1.

1.3.1 Connections to circuits

It was observed in [17] that functions with polynomial nearest neighbor complexity can be simulated by min-plus polynomial threshold functions, but it is an open question whether the inclusion 𝖭𝖭 ⊆ 𝗆𝗉𝖯𝖳𝖥 is proper. Relations to the class 𝗆𝗉𝖯𝖳𝖥 are of interest because they deepen the connection between nearest neighbors and circuit complexity. For instance, [18] establish that systems of 𝗆𝗉𝖯𝖳𝖥s compute exactly the class of 𝖠𝖭𝖣∘𝖮𝖱∘𝖳𝖧𝖱 circuits.

The expressive power of the k-nearest neighbors rule was also studied by [17]. In particular, they prove that 𝗄𝖭𝖭 can simulate linear decision trees, which yields a linear (in n) lower bound on the number of anchors in 𝗄𝖭𝖭 representations of the inner product mod 2 function (𝖨𝖯). They also state the open problem of proving stronger lower bounds for the k-nearest neighbors complexity of an explicit function. In [25], it is shown that, under some regularity assumptions, polynomial-size nearest neighbor representations can simulate convex polytopes (i.e., 𝖠𝖭𝖣∘𝖳𝖧𝖱) as well as logarithmic fan-in 𝖲𝖸𝖬∘𝖳𝖧𝖱 circuits and linear decision lists (𝖫𝖣𝖫).

Constructions of Boolean circuits computing nearest neighbor classification are known. [31] constructs an 𝖮𝖱∘𝖠𝖭𝖣∘𝖳𝖧𝖱 circuit of size O(m²) computing any function with an m-anchor 𝖭𝖭 representation (see Appendix B). A very similar depth-three construction for 𝗄𝖭𝖭, also of size O(m²), was found by [6]. Note that the weights of the above circuits are bounded by a polynomial in n.

1.3.2 Bit complexity

It was shown in [24] that (the aforementioned) 𝖭𝖭 representations for symmetric functions have logarithmic bit-complexity, and that this is tight for some functions. It is left as an open problem to characterize 𝖭𝖭 representations of threshold functions in terms of bit-complexity. To this end, the same authors (in [25]) show that logarithmic bit complexity suffices to represent the comparison, equality, and odd-max bit Boolean functions, and conjecture that a logarithmic upper bound holds for any threshold function.

Other works have studied the role of bit-complexity in approximate nearest neighbor search, where we wish to find an anchor whose distance to a query point is minimal up to a factor of (1+ε). For example, [21] provide tight bounds (in terms of bit-complexity) on the size of data structures performing approximate nearest neighbor search. This setting is quite different from our focus on exact classification of Boolean vectors.

The bit-complexity of the weights in polynomial-size threshold circuits has been studied extensively (see, e.g., surveys [34, 37]). For example, it was proved by [14, 15] that arbitrarily large weights can be reduced to have logarithmic bit-complexity by slightly increasing the depth of the circuit (along with a polynomial blow-up in size).

1.4 Organization

Section 2 outlines basic definitions required in subsequent sections. Section 3 establishes the equivalence between 𝖧𝖭𝖭¯, 𝖭𝖭¯ and min-plus polynomial threshold functions, then discusses some of the consequences. Section 4 generalizes 𝗆𝗉𝖯𝖳𝖥 to a new class, 𝗄𝖲𝖳𝖠𝖳, and proves that a similar equivalence holds with 𝗄𝖭𝖭¯. Here, we also derive several connections to circuit classes such as 𝖲𝖸𝖬∘𝖬𝖠𝖩. Section 5 contains new results (upper and lower bounds) for the nearest neighbor complexity of explicit Boolean functions. Many proofs are relegated to Appendix A due to space constraints. Appendix B contains some direct constructions of threshold circuits computing 𝖧𝖭𝖭, one of which has depth two.

2 Preliminaries

We use the following notation throughout the paper:

Vectors are written in bold (e.g., 𝒙 = (x_1, …, x_n)).
The k’th statistic of 𝒙, denoted 𝒙_(k), is the k’th smallest element of 𝒙. In particular, 𝒙_(k) = x_{σ(k)}, where σ ∈ S_n is a permutation such that x_{σ(1)} ≤ … ≤ x_{σ(n)}.
Δ(𝒙,𝒚) := ‖𝒙 − 𝒚‖₂² denotes the squared Euclidean distance between 𝒙 and 𝒚.
⟨𝒙,𝒚⟩ denotes the real inner product (dot product), x_1y_1 + … + x_ny_n.
poly(n) refers to an arbitrary polynomial in the variable n.
𝟙[P] denotes the Boolean function whose value is 1 if and only if P holds.

Note that the (squared) Euclidean distance between two Boolean vectors is equal to their Hamming distance, Δ(𝒙,𝒚) = Σ_{i≤n} 𝟙[x_i ≠ y_i], so the Hamming weight of a Boolean vector 𝒑 is denoted Δ(𝒑) := Δ(𝒑,𝟎) = ‖𝒑‖₂².
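This identity is easy to check directly; a minimal sanity check (our own illustration):

```python
# Squared Euclidean distance equals Hamming distance on Boolean vectors.
x, y = (1, 0, 1, 1), (0, 0, 1, 0)
sq = sum((a - b) ** 2 for a, b in zip(x, y))   # ||x - y||_2^2
ham = sum(a != b for a, b in zip(x, y))        # number of disagreeing coordinates
assert sq == ham == 2
```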

2.1 Boolean functions

Definition 1.

A threshold gate is a Boolean function f : {0,1}^n → {0,1} defined by a weight vector 𝒘 ∈ ℝ^n and a threshold θ ∈ ℝ such that

f(𝒙) = 1 ⟺ ⟨𝒘,𝒙⟩ ≥ θ. (1)

A threshold circuit is a sequence (f_1, …, f_s) of s ≥ n gates such that the first n gates are equal to the input variables (i.e., f_i = x_i for i ≤ n) and subsequent gates are threshold gates whose inputs are some subset of the previous gates. The output of the circuit is equal to the output of the final gate. The size of the circuit is equal to s − n.

A threshold circuit can be viewed as a directed acyclic graph. Nodes with fan-in 0 correspond to inputs, and other nodes correspond to threshold gates applied to the values of the preceding nodes. The node with fan-out 0 corresponds to the output node. The depth of the circuit is the length of the longest path from an input node to the output node.

 Remark 2.

It is well known that we may assume that the weights (and the threshold) are integers without loss of generality: Since the domain of a threshold gate is finite, we may approximate each weight by a rational number and multiply by a common denominator. See [23] for a comprehensive introduction to circuit complexity.

Definition 3.

A Nearest Neighbor (𝖭𝖭) representation of a Boolean function f : {0,1}^n → {0,1} is defined by two disjoint sets of positive and negative anchors P, N ⊆ ℝ^n such that

  • f(𝒙)=1 if there exists a 𝒑P with Δ(𝒙,𝒑)<Δ(𝒙,𝒒) for all 𝒒N.

  • f(𝒙)=0 if there exists a 𝒒N with Δ(𝒙,𝒒)<Δ(𝒙,𝒑) for all 𝒑P.

A Hamming Nearest Neighbor (𝖧𝖭𝖭) representation is defined identically for Boolean anchors in {0,1}n.
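As an illustration (ours, not from the paper), the definition can be evaluated by brute force over the anchor sets; the 4-anchor set below is a valid Boolean (𝖧𝖭𝖭) representation of XOR on two variables:

```python
def nn_eval(x, positives, negatives):
    """Evaluate the NN representation of Definition 3 at the point x.
    Assumes the representation is valid, i.e., no ties occur."""
    def sqdist(a):
        return sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    dp = min(sqdist(p) for p in positives)   # nearest positive anchor
    dn = min(sqdist(q) for q in negatives)   # nearest negative anchor
    assert dp != dn, "a valid representation never produces a tie"
    return 1 if dp < dn else 0

# XOR on two bits, with each input point serving as its own anchor.
P = [(0, 1), (1, 0)]
N = [(0, 0), (1, 1)]
assert [nn_eval(x, P, N) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```

Note that this small example does not contradict the exponential 𝖧𝖭𝖭 lower bound for 𝖷𝖮𝖱, which is asymptotic in n.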

Definition 4.

A k-Nearest Neighbors (𝗄𝖭𝖭) representation of a function f : {0,1}^n → {0,1} is defined by two disjoint sets of positive and negative anchors P, N ⊆ ℝ^n and an integer k such that

f(𝒙) = 1 ⟺ there exists an A ⊆ P ∪ N with the following properties:

  1. |A| = k

  2. |A ∩ P| ≥ |A ∩ N|

  3. Δ(𝒙,𝒂) < Δ(𝒙,𝒃) for all 𝒂 ∈ A, 𝒃 ∉ A.

A 𝗄𝖧𝖭𝖭 representation is defined identically for Boolean anchors.
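The voting rule of Definition 4 can likewise be evaluated by sorting anchors by distance; a short sketch (the anchor set is our own illustration, and we assume the strict-separation condition of item 3 holds):

```python
def knn_eval(x, positives, negatives, k):
    """Evaluate a kNN representation: the k nearest anchors vote, and the
    condition |A ∩ P| >= |A ∩ N| means positives win ties in the vote."""
    def sqdist(a):
        return sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    ranked = sorted([(sqdist(p), 1) for p in positives] +
                    [(sqdist(q), 0) for q in negatives])
    # Definition 4 requires the k nearest anchors to be strictly closer
    # than the rest, so the k-th and (k+1)-th distances must differ.
    assert len(ranked) == k or ranked[k - 1][0] != ranked[k][0]
    votes = sum(label for _, label in ranked[:k])
    return 1 if 2 * votes >= k else 0

# With k = 1 this is exactly the nearest neighbor rule (here: XOR on 2 bits).
P, N = [(0, 1), (1, 0)], [(0, 0), (1, 1)]
assert [knn_eval(x, P, N, 1) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```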

Definition 5 ([18]).

A min-plus polynomial threshold function (𝗆𝗉𝖯𝖳𝖥) is a Boolean function f : {0,1}^n → {0,1} defined by two sets of linear forms with integer coefficients (as for threshold gates, there is no loss of generality in the assumption that the weights of 𝗆𝗉𝖯𝖳𝖥s are integers), {L_1, …, L_{ℓ₁}} and {R_1, …, R_{ℓ₂}}, satisfying

f(𝒙) = 1 ⟺ min_{i ≤ ℓ₁} L_i(𝒙) ≤ min_{j ≤ ℓ₂} R_j(𝒙). (2)

The number of terms in an 𝗆𝗉𝖯𝖳𝖥 is equal to ℓ₁ + ℓ₂, and the maximum weight is equal to the largest absolute value of the coefficients of any form.
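Definition 5 is easy to evaluate directly. In this sketch (ours) a linear form is encoded as a pair (weights, constant), and the OR example is our own illustration:

```python
def mpptf_eval(x, left, right):
    """Return 1 iff the minimum over 'left' forms is at most the minimum
    over 'right' forms, as in Eq. (2). Forms are (weights, constant)."""
    def value(form):
        weights, constant = form
        return sum(w * xi for w, xi in zip(weights, x)) + constant
    return 1 if min(map(value, left)) <= min(map(value, right)) else 0

# OR(x1, x2) as an mpPTF:  min{1 - x1 - x2} <= min{0}  iff  x1 + x2 >= 1.
left = [((-1, -1), 1)]
right = [((0, 0), 0)]
assert [mpptf_eval(x, left, right)
        for x in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
```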

Definition 6 ([36]).

A linear decision list (𝖫𝖣𝖫) representation of a Boolean function f is a sequence of instructions “if f_i(𝒙) = 1, then output c_i (and halt)” for 1 ≤ i ≤ m, followed by “output 0.” Here, f_1, …, f_m are threshold gates and c_1, …, c_m ∈ {0,1}. Exact linear decision lists (𝖤𝖫𝖣𝖫) are defined similarly using exact threshold functions – threshold gates where the inequality in (1) is replaced with equality. The length of an 𝖫𝖣𝖫 or 𝖤𝖫𝖣𝖫 is the number of gates, m, and its maximum weight is equal to the largest coefficient of any f_i.

Definition 7.

We consider the following well-known Boolean functions.

The majority function, 𝖬𝖠𝖩(x_1, …, x_n) = 𝟙[x_1 + … + x_n ≥ n/2]
The disjointness function, 𝖣𝖨𝖲𝖩(𝒙,𝒚) = 𝟙[⟨𝒙,𝒚⟩ = 0]
The inner product mod 2 function, 𝖨𝖯(𝒙,𝒚) = ⟨𝒙,𝒚⟩ mod 2
The odd-max-bit function, 𝖮𝖬𝖡(x_1, …, x_n) = max{i : x_i = 1} mod 2
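Reference implementations of these four functions, transcribed straight from the definitions (indices are 1-based as in the text; for 𝖮𝖬𝖡 we take the empty maximum to be 0):

```python
def MAJ(x):
    return 1 if sum(x) >= len(x) / 2 else 0

def DISJ(x, y):
    return 1 if sum(a * b for a, b in zip(x, y)) == 0 else 0

def IP(x, y):
    return sum(a * b for a, b in zip(x, y)) % 2

def OMB(x):
    # max{i : x_i = 1} mod 2, with the empty maximum taken to be 0.
    ones = [i for i in range(1, len(x) + 1) if x[i - 1] == 1]
    return max(ones, default=0) % 2

assert MAJ((1, 0, 1)) == 1 and MAJ((1, 0, 0)) == 0
assert DISJ((1, 0), (0, 1)) == 1 and DISJ((1, 1), (0, 1)) == 0
assert IP((1, 1), (1, 1)) == 0 and IP((1, 0), (1, 1)) == 1
assert OMB((1, 0, 0)) == 1 and OMB((1, 1, 0)) == 0
```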

2.2 Function classes

First, we define classes of Boolean circuits whose inputs may be variables, their negations, or the constants 0 and 1. 𝖠𝖭𝖣, 𝖮𝖱, 𝖳𝖧𝖱, and 𝖲𝖸𝖬 are the classes of polynomial-size (“polynomial” in this context is always with respect to the input size, n) depth-one circuits composed of 𝖠𝖭𝖣, 𝖮𝖱, threshold gates, and symmetric functions (i.e., Boolean functions which depend only on the Hamming weight of the input), respectively. 𝖬𝖠𝖩 ⊆ 𝖳𝖧𝖱 is the set of threshold gates with polynomial weights (we abuse notation by denoting with 𝖬𝖠𝖩 both a specific function and a class of functions; the meaning will be clear from context). 𝖠𝖢⁰ is the class of constant-depth circuits consisting of a polynomial number of 𝖠𝖭𝖣, 𝖮𝖱, and 𝖭𝖮𝖳 gates.

For two circuit classes C₁, C₂, the class of circuits consisting of a circuit from C₁ whose inputs are (the outputs of) a polynomial number of circuits from C₂ is denoted by C₁ ∘ C₂ (e.g., 𝖳𝖧𝖱∘𝖳𝖧𝖱 refers to depth-two threshold circuits of polynomial size).

Definition 8.

𝖭𝖭 is the class of Boolean functions that have nearest neighbor representations with polynomially-many anchors. 𝖧𝖭𝖭 is the same class where anchors are Boolean. 𝗄𝖭𝖭 and 𝗄𝖧𝖭𝖭 are defined in the same manner for a positive integer k.

Definition 9.

𝗆𝗉𝖯𝖳𝖥(∞) is the class of min-plus polynomial threshold functions with a polynomial number (in terms of the number of inputs) of terms and unbounded maximum weight. 𝗆𝗉𝖯𝖳𝖥(poly(n)) is the same class with polynomially-bounded maximum weight.

Definition 10.

𝖫𝖣𝖫 is the class of Boolean functions representable by linear decision lists with polynomial length. 𝖫𝖣𝖫^ is the same class with polynomially-bounded maximum weight. 𝖤𝖫𝖣𝖫 and 𝖤𝖫𝖣𝖫^ are defined similarly for exact linear decision lists.

3 Min-plus PTFs vs. nearest neighbors

In this section, we introduce the closure operation and derive an equivalence between (the closure of) 𝖭𝖭, 𝖧𝖭𝖭 and 𝗆𝗉𝖯𝖳𝖥.

Definition 11.

Define a substitution of variables as a function v : {0,1}^n → {0,1}^ñ that duplicates variables or adds constant variables (e.g., x₁x₂ ↦ x₁x₁x₂x₂x₂0). Then, a Boolean function f : {0,1}^n → {0,1} is a subfunction of g : {0,1}^ñ → {0,1} when ñ = poly(n) and there exists a substitution of variables v such that f(𝒙) = g(v(𝒙)) for all 𝒙 ∈ {0,1}^n.

Subfunctions may equivalently be obtained from g:{0,1}n~{0,1} by identifying variables (e.g., x1=x2) and assigning variables to constants (e.g., x1=0).

Definition 12.

For any function class C, let C¯ denote the closure of C: The set of subfunctions derived from the elements of C. In particular, we say that a Boolean function f has an “𝖭𝖭¯ representation” if it is a subfunction of some g𝖭𝖭.

 Note 13.

Circuit classes are already closed under this operation. For example, 𝖬𝖠𝖩¯=𝖬𝖠𝖩: Subfunctions of the majority function simply add (polynomially-bounded) coefficients and constant terms.

Theorem 14.
𝖭𝖭¯ = 𝗆𝗉𝖯𝖳𝖥(∞),  𝖧𝖭𝖭¯ = 𝗆𝗉𝖯𝖳𝖥(poly(n))

Theorem 14 and some consequences are proved in Appendix A.1. Namely, we observe that any n-variate function in 𝖭𝖭¯ is a subfunction of an (n+1)-variate 𝖭𝖭 representation, and that 𝗆𝗉𝖯𝖳𝖥(poly(n)) captures precisely the power of 𝖭𝖭¯ representations with bit-complexity O(log n). Then, using the results of [17] and [18], we immediately establish the following two corollaries.

Corollary 15.

𝖧𝖭𝖭 ⊊ 𝖧𝖭𝖭¯

Corollary 16.

𝖭𝖭¯ representations of 𝖨𝖯 and f_n(𝒙,𝒚) := ⋀_{i=1}^{n} ⋁_{j=1}^{n²} (x_{i,j} ∧ y_{i,j}) require 2^{Ω(n)} anchors.

(Proofs in A.2 and A.3.) Theorem 14 also yields lower bounds on the circuit complexity of functions belonging to 𝖧𝖭𝖭. (A direct construction in Appendix B shows that 𝖧𝖭𝖭 ⊆ 𝖳𝖧𝖱∘𝖬𝖠𝖩.)

Theorem 17.
𝖧𝖭𝖭 ⊈ 𝖬𝖠𝖩∘𝖬𝖠𝖩

More precisely, there is a Boolean function with an 𝖧𝖭𝖭 representation with n+1 anchors which cannot be computed by a depth-two majority circuit with poly(n) gates.

Proof.

First, we claim that 𝖮𝖬𝖡∘𝖠𝖭𝖣₂ ∈ 𝖧𝖭𝖭¯. Indeed, this function is computed by an 𝗆𝗉𝖯𝖳𝖥 with n+1 terms:

min{L₁(𝒙,𝒚), L₃(𝒙,𝒚), …} ≤ min{−1, L₂(𝒙,𝒚), L₄(𝒙,𝒚), …}

where L_k(𝒙,𝒚) = (k+1)(1 − x_k − y_k). Note that if x_i = y_i = 1, then L_i(𝒙,𝒚) = −(i+1), and otherwise L_i(𝒙,𝒚) ≥ 0. Hence, the minimum over all forms is attained at the maximum index j with x_j = y_j = 1. The claim follows from Theorem 14.

Second, it is known that 𝖮𝖬𝖡∘𝖠𝖭𝖣₂ ∉ 𝖬𝖠𝖩∘𝖬𝖠𝖩 by [4, 16]. Thus, if 𝖧𝖭𝖭 were contained in 𝖬𝖠𝖩∘𝖬𝖠𝖩, then we could use the 𝖧𝖭𝖭¯ representation described above to get a 𝖬𝖠𝖩∘𝖬𝖠𝖩 circuit computing 𝖮𝖬𝖡∘𝖠𝖭𝖣₂, which is a contradiction.
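The claimed 𝗆𝗉𝖯𝖳𝖥 can be verified exhaustively for small n. Here we take L_k(𝒙,𝒚) = (k+1)(1 − x_k − y_k), with the odd-indexed forms on the left and the even-indexed forms, together with the constant −1, on the right; the check itself is our own sanity test of the construction:

```python
from itertools import product

def omb_and2(x, y):
    # OMB of the coordinate-wise AND: parity of the largest i with x_i = y_i = 1
    # (0 when no such index exists).
    active = [i for i in range(1, len(x) + 1) if x[i - 1] == 1 and y[i - 1] == 1]
    return max(active, default=0) % 2

def mpptf_omb_and2(x, y):
    n = len(x)
    L = lambda k: (k + 1) * (1 - x[k - 1] - y[k - 1])
    left = [L(k) for k in range(1, n + 1, 2)]           # odd-indexed forms
    right = [-1] + [L(k) for k in range(2, n + 1, 2)]   # constant and even forms
    return 1 if min(left) <= min(right) else 0

n = 4
ok = all(mpptf_omb_and2(x, y) == omb_and2(x, y)
         for x in product((0, 1), repeat=n) for y in product((0, 1), repeat=n))
assert ok
```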

Finally, we observe a connection between 𝗆𝗉𝖯𝖳𝖥s and linear decision lists. This provides additional proof techniques for 𝖧𝖭𝖭¯ and helps to relate a question of separation of 𝖧𝖭𝖭¯ and 𝖭𝖭¯ to the similar question for linear decision lists. The following lemma is proved in Appendix A.4.

Lemma 18.
𝗆𝗉𝖯𝖳𝖥(poly(n)) ⊆ 𝖫𝖣𝖫^.

More precisely, any 𝗆𝗉𝖯𝖳𝖥 with m terms and maximum weight W is equivalent to a linear decision list with length and maximum weight O(m²W).

 Remark 19.

This lemma enables another technique, apart from sign-rank, for proving lower bounds against 𝖧𝖭𝖭¯. More specifically, it is known that any function without large monochromatic rectangles requires long linear decision lists [5].

Lemma 20.

𝖫𝖣𝖫 ⊆ 𝗆𝗉𝖯𝖳𝖥(∞).

Proof.

It was shown in [18, Lemma 22] that 𝖮𝖬𝖡∘𝖳𝖧𝖱 ⊆ 𝗆𝗉𝖯𝖳𝖥(∞). Our lemma follows since 𝖮𝖬𝖡∘𝖳𝖧𝖱 is complete for the class of linear decision lists – see [18, Lemma 22].

Whether 𝖫𝖣𝖫^ and 𝖫𝖣𝖫 are equal is an open problem [5]. Lemmas 18 and 20 immediately allow us to relate this problem to the problem of separating 𝖧𝖭𝖭¯ and 𝖭𝖭¯.

Corollary 21.

If 𝖫𝖣𝖫^ ≠ 𝖫𝖣𝖫, then 𝖧𝖭𝖭¯ ≠ 𝖭𝖭¯.

Proof.

From Theorem 14 and Lemmas 18 and 20, we have the following sequence of inclusions:

𝖧𝖭𝖭¯ = 𝗆𝗉𝖯𝖳𝖥(poly(n)) ⊆ 𝖫𝖣𝖫^ ⊆ 𝖫𝖣𝖫 ⊆ 𝗆𝗉𝖯𝖳𝖥(∞) = 𝖭𝖭¯.

If 𝖧𝖭𝖭¯ = 𝖭𝖭¯, then the whole sequence of inclusions collapses and, in particular, 𝖫𝖣𝖫^ = 𝖫𝖣𝖫.

4 kNN vs. Circuits

In this section, we give a circuit-style characterization of 𝗄𝖭𝖭 and provide connections to known circuit classes. From these results, we obtain a separation between 𝗄𝖭𝖭 and 𝖭𝖭. Additionally, our results imply complexity theoretic barriers for proving superpolynomial lower bounds for 𝗄𝖭𝖭 representations of explicit functions.

4.1 Characterization for small 𝒌

Here, we use the connection to 𝗆𝗉𝖯𝖳𝖥 representations to get our first results on k-nearest neighbors complexity. In particular, we relate k-nearest neighbors representations for constant k to 𝖭𝖭¯ and prove a lower bound on k-nearest neighbors complexity for sublinear k.

Theorem 22.

Any Boolean function with an m-anchor 𝗄𝖭𝖭 representation is computed by an 𝗆𝗉𝖯𝖳𝖥 with (m choose k) terms.

Proof.

We prove only the first statement, as both arguments are identical. As noted in the proof of Theorem 14, the distances from the anchors to a query point 𝒙 are linear forms L_1(𝒙), …, L_m(𝒙). Assign each linear form a label ℓ_1, …, ℓ_m ∈ {−1, 1}, where a positive label indicates placement on the left-hand side of the 𝗆𝗉𝖯𝖳𝖥 and vice versa.

Then, consider the collection A⁺(𝒙) = {L_{i₁}(𝒙) + … + L_{i_k}(𝒙) | ℓ_{i₁} + … + ℓ_{i_k} ≥ 0} and its complement A⁻(𝒙) = {L_{i₁}(𝒙) + … + L_{i_k}(𝒙) | ℓ_{i₁} + … + ℓ_{i_k} < 0}. The resulting 𝗆𝗉𝖯𝖳𝖥 with (m choose k) terms, 𝟙[min A⁺(𝒙) ≤ min A⁻(𝒙)], realizes the original 𝗄𝖭𝖭 representation: the minimum is attained by a group of k nearest neighbors, and the inequality holds exactly when such a group has a positive majority.
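This construction can be made concrete with a short script (our own sketch): one form per anchor (on Boolean inputs the squared distance is a linear form, since x_i² = x_i), with the sums over k-subsets split by the sign of their label sums. With k = 1 it reduces to the plain nearest neighbor rule:

```python
from itertools import combinations, product

def knn_via_mpptf(x, positives, negatives, k):
    def sqdist(a):
        return sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    anchors = [(p, +1) for p in positives] + [(q, -1) for q in negatives]
    plus, minus = [], []            # the two sides of the resulting mpPTF
    for group in combinations(anchors, k):
        total = sum(sqdist(a) for a, _ in group)
        if sum(label for _, label in group) >= 0:
            plus.append(total)      # k-subsets with a positive majority
        else:
            minus.append(total)
    return 1 if min(plus) <= min(minus) else 0

# k = 1 recovers the NN rule; here, a 4-anchor representation of XOR on 2 bits.
P, N = [(0, 1), (1, 0)], [(0, 0), (1, 1)]
assert [knn_via_mpptf(x, P, N, 1) for x in product((0, 1), repeat=2)] == [0, 1, 1, 0]
```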

It follows that Boolean functions with m-anchor 𝗄𝖭𝖭 representations can be represented in 𝖭𝖭¯ with (m choose k) anchors. These results generalize to both weighted 𝗄𝖭𝖭 and to non-Boolean inputs. See Appendix A.5 for a discussion.

As a consequence of Theorem 22, sign-rank lower bounds (e.g., Corollary 16) also apply to 𝗄𝖭𝖭. In particular, we get an exponential lower bound for 𝗄𝖭𝖭 with k = O(n^{1−ε}) for constant ε > 0. This addresses an open question posed in [17] regarding k-nearest neighbors complexity.

Corollary 23.

Any 𝗄𝖭𝖭¯ representation of 𝖨𝖯 or f_n(𝒙,𝒚) := ⋀_{i=1}^{n} ⋁_{j=1}^{n²} (x_{i,j} ∧ y_{i,j}) requires 2^{Ω(n/k)} anchors.

Proof.

Assume that 𝖨𝖯 (or f_n) has a 𝗄𝖭𝖭¯ representation with m anchors. By Theorems 14 and 22, 𝖨𝖯 has an 𝖭𝖭¯ representation with (m choose k) ≤ m^k anchors. By Corollary 16, we have m^k ≥ 2^{Ω(n)} and thus m ≥ 2^{Ω(n/k)}.

4.2 Characterization for arbitrary 𝒌

In this section, we generalize the ideas of Theorem 14 to the closure of 𝗄𝖭𝖭, yielding further connections between nearest neighbors and circuit complexity.

Definition 24.

Define by 𝗄𝖲𝖳𝖠𝖳 the class of functions f : {0,1}^n → {0,1} representable by an inequality between k-statistics of two sets consisting of a polynomial number of linear forms: given {L_1, …, L_{ℓ₁}} and {R_1, …, R_{ℓ₂}} and integers k_l and k_r,

f(𝒙) = 1 ⟺ (L_1(𝒙), …, L_{ℓ₁}(𝒙))_(k_l) < (R_1(𝒙), …, R_{ℓ₂}(𝒙))_(k_r) (3)

where ℓ₁ + ℓ₂ is bounded by a polynomial in n.

As usual, we can assume that all coefficients in the linear forms are integers. Define the subclass 𝗄𝖲𝖳𝖠𝖳^ where all coefficients are bounded by a polynomial in n. (Note that 𝗆𝗉𝖯𝖳𝖥 can be viewed as a special case of 𝗄𝖲𝖳𝖠𝖳 in which k_l = k_r = 1.)

Note that we can reduce Definition 24 to the case of k_l = k_r with only a linear increase in the size. This can be done by adding “dummy” linear forms that are always smaller than all others.
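A direct evaluator for Definition 24 (the form encoding and both toy examples are our own illustrations):

```python
def kstat_eval(x, left, kl, right, kr):
    """Return 1 iff the kl-th smallest left form is strictly less than the
    kr-th smallest right form, as in Eq. (3). Forms are (weights, constant)."""
    def values(forms):
        return sorted(sum(w * xi for w, xi in zip(ws, x)) + c for ws, c in forms)
    return 1 if values(left)[kl - 1] < values(right)[kr - 1] else 0

# MAJ on 3 bits via a genuine 2nd statistic: 0 < (x1, x2, x3)_(2), i.e.,
# the 2nd smallest input is 1 exactly when at least two inputs are 1.
forms = [((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0)]
zero = [((0, 0, 0), 0)]
assert [kstat_eval(x, zero, 1, forms, 2)
        for x in ((0, 0, 0), (1, 1, 0), (1, 0, 0), (1, 1, 1))] == [0, 1, 0, 1]
```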

Theorem 25.
𝗄𝖭𝖭¯ = 𝗄𝖲𝖳𝖠𝖳,  𝗄𝖧𝖭𝖭¯ = 𝗄𝖲𝖳𝖠𝖳^.

See Appendix A.6 for the proof. Next, we provide another equivalent form of 𝗄𝖲𝖳𝖠𝖳 that is sometimes more convenient.

Theorem 26.

The class 𝗄𝖲𝖳𝖠𝖳 consists exactly of the functions f : {0,1}^n → {0,1} for which there exist linear forms {L_1, …, L_p} with p = poly(n), a positive integer k, and a labelling function label : {1, …, p} → {0,1}, such that for all 𝒙,

f(𝒙) = 1 ⟺ (L_1(𝒙), …, L_p(𝒙))_(k) = L_i(𝒙) for some i with label(i) = 1. (4)

The class 𝗄𝖲𝖳𝖠𝖳^ consists exactly of functions with the same representation with polynomial-size coefficients in the linear forms.

See Appendix A.7 for the proof. Now we show that some well-known circuit classes, for which we do not have any known lower bounds, are computable by 𝗄𝖧𝖭𝖭¯.

Theorem 27.
𝖲𝖸𝖬∘𝖬𝖠𝖩 ⊆ 𝗄𝖲𝖳𝖠𝖳^.

Any symmetric function of s threshold functions has a 𝗄𝖲𝖳𝖠𝖳^ representation with k=s+1.

See Appendix A.8 for the proof. Using the same strategy, we can embed a large complexity class into 𝗄𝖭𝖭 directly:

Theorem 28.
𝖲𝖸𝖬∘𝖠𝖭𝖣 ⊆ 𝗄𝖭𝖭.

Any symmetric function of s conjunctions has a 𝗄𝖭𝖭 representation with k=2s+1.

See Appendix A.9 for the proof.

 Remark 29.

Note that 𝖲𝖸𝖬∘𝖠𝖭𝖣 ⊆ 𝖲𝖸𝖬∘𝖬𝖠𝖩, and 𝖲𝖸𝖬∘𝖠𝖭𝖣 is known to simulate the whole class 𝖠𝖢𝖢⁰ within quasi-polynomial size [3]. Related classes are of interest in the context of obtaining lower bounds through circuit satisfiability algorithms [40, Conjecture 1].

As a result of Theorem 28, if we prove for some explicit function f that f ∉ 𝗄𝖭𝖭, it will follow that f ∉ 𝖲𝖸𝖬∘𝖠𝖭𝖣, and this would be a major breakthrough in circuit complexity. Also note that 𝖨𝖯 ∈ 𝖲𝖸𝖬∘𝖠𝖭𝖣 and thus, by Theorem 28, 𝖨𝖯 ∈ 𝗄𝖭𝖭. Together with Corollary 16, this gives a separation between 𝖭𝖭 and 𝗄𝖭𝖭. It also shows that the dependence on k in the lower bound of Corollary 23 cannot be removed.

Theorem 30.

𝖤𝖫𝖣𝖫 ⊆ 𝗄𝖲𝖳𝖠𝖳, 𝖤𝖫𝖣𝖫^ ⊆ 𝗄𝖲𝖳𝖠𝖳^.

See Appendix A.10 for the proof.

 Remark 31.

The class 𝖤𝖫𝖣𝖫 is known to be contained in 𝖳𝖧𝖱∘𝖳𝖧𝖱, and proving super-polynomial lower bounds for 𝖤𝖫𝖣𝖫 is an open problem (see [9]).

5 New bounds for the nearest neighbor complexity of Boolean functions

In this section, we derive several bounds on the nearest neighbor complexity of Boolean functions.

5.1 Nearest neighbor complexity of CNFs

We first show that any CNF admits an efficient 𝖭𝖭 representation.

Theorem 32.

Any CNF or DNF with m clauses has an 𝖭𝖭 representation with m+1 anchors and constant bit-complexity.

Proof.

It suffices to prove the statement for DNFs as any CNF can be converted to a DNF by negation.

Let N = {𝒒}, where 𝒒 := (1/2, …, 1/2), and note that d(𝒙,𝒒) = n/4 for every input 𝒙 ∈ {0,1}^n (where d is the squared Euclidean distance). For each term, say C(𝒙) = (x₁ ∧ … ∧ x_k), introduce a positive anchor

𝒑_C = (1, 3/2, …, 3/2, 1/2, …, 1/2),

whose first k coordinates correspond to the literals of C (a single 1 followed by k−1 copies of 3/2) and whose remaining n−k coordinates equal 1/2. If any variable is negated, replace the corresponding 3/2 (or 1) with −1/2 (or 0).

If C(𝒙) = 1, then d(𝒙,𝒑_C) = (n−1)/4 < d(𝒙,𝒒). Otherwise, some literal in C is equal to zero, hence d(𝒙,𝒑_C) ≥ 1 + (n−1)/4 > d(𝒙,𝒒). Therefore, the entire DNF, say C₁ ∨ … ∨ C_m, is satisfied if and only if some 𝒑_{C_i} is a nearest anchor of 𝒙.
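A brute-force check of this anchor construction on a small example (the encoding of terms as signed indices is our own convention):

```python
from itertools import product

def term_anchor(term, n):
    """Anchor for a DNF term given as signed 1-indexed literals,
    e.g. [1, -3] means (x1 AND NOT x3). The first literal's coordinate is
    set to exactly 1 (or 0 if negated), the rest to 3/2 (or -1/2), and
    every untouched coordinate to 1/2."""
    a = [0.5] * n
    for position, lit in enumerate(term):
        i = abs(lit) - 1
        if position == 0:
            a[i] = 1.0 if lit > 0 else 0.0
        else:
            a[i] = 1.5 if lit > 0 else -0.5
    return a

def sqdist(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

n = 4
terms = [[1, 2], [-2, 3], [2, -4]]   # (x1&x2) | (~x2&x3) | (x2&~x4)
P = [term_anchor(t, n) for t in terms]
q = [0.5] * n                        # the single negative anchor

ok = True
for x in product((0, 1), repeat=n):
    dnf = any(all((x[abs(l) - 1] == 1) == (l > 0) for l in t) for t in terms)
    nearest_positive = min(sqdist(x, p) for p in P) < sqdist(x, q)
    ok = ok and (dnf == nearest_positive)
assert ok
```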

The polynomial-size representation above does not generalize to 𝖠𝖢⁰ circuits of depth greater than two. For instance, Corollary 16 exhibits a function computable by a depth-three De Morgan circuit of polynomial size which does not belong to 𝖭𝖭¯. For the well-studied disjointness function (which admits a compact CNF representation) we can get an efficient 𝖧𝖭𝖭 representation:

Theorem 33.
𝖣𝖨𝖲𝖩 ∈ 𝖧𝖭𝖭

The disjointness function (in 2n dimensions) has an 𝖧𝖭𝖭 representation with 3n anchors.

Proof.

Consider the anchors P = {𝒆₁, …, 𝒆_{2n}} and N = {(𝒆₁,𝒆₁), …, (𝒆_n,𝒆_n)}, where 𝒆_i denotes the i’th standard basis vector (of the appropriate dimension) and (𝒆_i,𝒆_i) denotes the concatenation of 𝒆_i with itself.

Let 𝒙,𝒚 ∈ {0,1}^n and suppose x_i = y_i = 1 for some i. Then, for all j it holds that Δ((𝒙,𝒚),(𝒆_i,𝒆_i)) ≤ Δ((𝒙,𝒚),𝒆_j) − 1, with equality when i = j; hence the nearest anchor is negative and 𝖣𝖨𝖲𝖩(𝒙,𝒚) = 0. Otherwise, min_i Δ((𝒙,𝒚),(𝒆_i,𝒆_i)) ≥ min_j Δ((𝒙,𝒚),𝒆_j) + 1, so the nearest anchor is positive and 𝖣𝖨𝖲𝖩(𝒙,𝒚) = 1.
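The anchor set can be checked exhaustively for small n; in this check (our own) the 2n standard basis vectors serve as the positive anchors and the concatenations (𝒆_i,𝒆_i) as the negative anchors, the labeling under which disjoint pairs are accepted:

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

n = 3
def basis(i, dim):                   # e_i, 1-indexed
    return tuple(int(j == i - 1) for j in range(dim))

P = [basis(i, 2 * n) for i in range(1, 2 * n + 1)]        # positive anchors
N = [basis(i, n) + basis(i, n) for i in range(1, n + 1)]  # (e_i, e_i), negative

ok = True
for x in product((0, 1), repeat=n):
    for y in product((0, 1), repeat=n):
        disj = sum(a * b for a, b in zip(x, y)) == 0
        nearest_positive = (min(hamming(x + y, p) for p in P)
                            < min(hamming(x + y, q) for q in N))
        ok = ok and (disj == nearest_positive)
assert ok
```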

 Remark 34.

It can be shown that the number of anchors in Theorem 33 is nearly tight; based on the Ω(n) lower bound for 𝖣𝖨𝖲𝖩 of [33], a simple argument proves that 𝖭𝖭 representations of disjointness require Ω(n/log n) anchors. We omit the details.

Proceeding, we show that some CNFs with polynomially many clauses have exponential Boolean nearest neighbor complexity.

Definition 35.

The Hamming cube graph is the undirected graph with vertex set V = {0,1}^n and edge set E = {(𝒖,𝒗) ∈ V × V : Δ(𝒖,𝒗) = 1}. The components of a Boolean function f are the connected components of the subgraph of the Hamming cube graph induced by the vertex set f⁻¹(1).

Lemma 36.

If a Boolean function f has m components then any 𝖧𝖭𝖭 representation of f has at least m anchors.

Proof.

Consider some component C of f and let δ(C) denote the vertex boundary of C: the vertices in {0,1}ⁿ ∖ C with a neighbor in C. Note that δ(C) ⊆ f⁻¹(0).

Suppose f has an 𝖧𝖭𝖭 representation P ∪ N and let 𝒑 ∈ P be the nearest anchor to some 𝒙 ∈ C. Assume for contradiction that 𝒑 ∉ C. Note that Δ(𝒙,𝒑) is equal to the length of a shortest path from 𝒙 to 𝒑 in the Hamming cube graph, which by assumption must contain some 𝒚 ∈ δ(C). (In particular, Δ(𝒙,𝒑) = Δ(𝒙,𝒚) + Δ(𝒚,𝒑).) Thus, there must exist some negative anchor 𝒒 ∈ N with Δ(𝒚,𝒒) < Δ(𝒚,𝒑). By the triangle inequality,

Δ(𝒙,𝒒) ≤ Δ(𝒙,𝒚) + Δ(𝒚,𝒒) < Δ(𝒙,𝒚) + Δ(𝒚,𝒑) = Δ(𝒙,𝒑)

which contradicts the minimality of 𝒑. Thus, each component contains an anchor.

Using the previous results, another separation between 𝖧𝖭𝖭 and 𝖭𝖭 follows from the existence of a CNF (over n variables) with poly(n) clauses and exponentially (in n) many components. (See Appendix A.)

Theorem 37.

For any constant k ≥ 2, there exists a k-CNF over n variables with poly(n) clauses for which any 𝖧𝖭𝖭 representation has 2^{Ω(n)} anchors.

5.2 A new lower bound for majority

We now discuss the disparity in the 𝖧𝖭𝖭 complexity of the majority function shown in [17, Theorem 4]: when n is even, the best known upper bound is n/2 + 2 anchors, whereas 2 anchors suffice when n is odd. Note that if ties were allowed (won by positive anchors) in Definition 3, then P = {1ⁿ} and N = {0ⁿ} would suffice as an 𝖧𝖭𝖭 representation of 𝖬𝖠𝖩 for all n.

Theorem 38.

For even n, any 𝖧𝖭𝖭 representation of 𝖬𝖠𝖩 requires n/2 + 2 anchors.

Proof.

Suppose P ∪ N is an 𝖧𝖭𝖭 representation of 𝖬𝖠𝖩 for even n. We claim that for each 𝒙 ∈ {0,1}ⁿ satisfying Δ(𝒙) = n/2, there is a positive anchor 𝒑 ≠ 𝟏 with 𝒙 ≤ 𝒑 in the coordinate-wise order:

It follows from [17] that the nearest anchor 𝒑 to 𝒙 satisfies 𝒙 ≤ 𝒑. Indeed, for some i it holds that xᵢ = 1, so suppose for contradiction that pᵢ = 0. Then, construct 𝒚 = 𝒙 − 𝒆ᵢ and let 𝒒 ∈ N be the nearest anchor to 𝒚. This yields Δ(𝒙,𝒑) = Δ(𝒚,𝒑) + 1 > Δ(𝒚,𝒒) + 1, contradicting the fact that

Δ(𝒙,𝒑) < Δ(𝒙,𝒒) ≤ Δ(𝒚,𝒒) + 1. (5)

A similar argument shows that 𝒒 ≤ 𝒚. Hence, Δ(𝒚,𝒒) ≤ n/2 − 1, and (5) becomes Δ(𝒙,𝒑) < n/2, which implies that Δ(𝒑) ≤ n − 1, proving the claim.

For contradiction, assume that |P ∪ N| ≤ n/2 + 1. Since there must be at least one negative anchor, we have |P| ≤ n/2. Then, we can construct 𝒙 ∈ {0,1}ⁿ with Δ(𝒙) = n/2 for which there is no positive anchor 𝒑 ≠ 𝟏 with 𝒙 ≤ 𝒑, leading to a contradiction: for each 𝒑 ∈ P ∖ {𝟏}, arbitrarily select some i where pᵢ = 0 and set xᵢ = 1, ensuring 𝒙 ≰ 𝒑. After this process, Δ(𝒙) ≤ |P| ≤ n/2. Arbitrarily fixing more coordinates of 𝒙 to 1 so that Δ(𝒙) = n/2 completes the construction.
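For n = 4 (where n/2 + 2 = 4), the lower bound can be confirmed exhaustively. The brute-force search below is our own sanity check, taking 𝖬𝖠𝖩(𝒙) = 1 iff Δ(𝒙) ≥ n/2 and forbidding ties, as in Definition 3.

```python
from itertools import product, combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

n = 4
cube = list(product([0, 1], repeat=n))
maj = {x: int(sum(x) >= n // 2) for x in cube}

def represents_maj(P, N):
    """Check whether the labeled anchors P (positive) and N (negative)
    form a tie-free HNN representation of MAJ."""
    for x in cube:
        dp = min(hamming(x, p) for p in P)
        dn = min(hamming(x, q) for q in N)
        if dp == dn:                 # ties are not allowed
            return False
        if (dp < dn) != maj[x]:
            return False
    return True

def exists_hnn(m):
    """Search all m-anchor Boolean anchor sets and labelings."""
    for anchors in combinations(cube, m):
        for mask in range(1, 2 ** m - 1):    # at least one anchor of each label
            P = [a for i, a in enumerate(anchors) if mask >> i & 1]
            N = [a for i, a in enumerate(anchors) if not mask >> i & 1]
            if represents_maj(P, N):
                return True
    return False

# matches the lower bound of Theorem 38: fewer than n/2 + 2 anchors never suffice
assert not exists_hnn(2) and not exists_hnn(3)
```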

6 Conclusion

We have studied nearest neighbor representations of Boolean functions, proving new lower and upper bounds and devising connections to circuit complexity. There are many future questions and research directions:

  • Studying representations of Boolean functions using ideas from approximate nearest neighbor search [20, 28] could be of interest. Such a study could potentially lead to new insights and more compact representations avoiding the curse of dimensionality.

  • Studying nearest neighbor complexity with respect to additional discrete domains such as grids as well as more than two labels is an interesting future direction.

  • Circuit complexity has been used to derive new algorithms for nearest neighbor problems [1]. Can ideas about nearest neighbor complexity such as connections to mpPTFs be used to obtain new algorithms for nearest neighbor classification and search?

  • Finally, it remains open whether 𝖭𝖭¯=𝖧𝖭𝖭¯.

References

  • [1] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 136–150. IEEE, 2015. doi:10.1109/FOCS.2015.18.
  • [2] Alexandr Andoni. Nearest neighbor search: the old, the new, and the impossible. PhD thesis, Massachusetts Institute of Technology, 2009.
  • [3] Richard Beigel and Jun Tarui. On ACC. Comput. Complex., 4:350–366, 1994. doi:10.1007/BF01263423.
  • [4] Harry Buhrman, Nikolay Vereshchagin, and Ronald de Wolf. On computation and communication with small bias. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 24–32. IEEE, 2007. doi:10.1109/CCC.2007.18.
  • [5] Arkadev Chattopadhyay, Meena Mahajan, Nikhil S. Mande, and Nitin Saurabh. Lower bounds for linear decision lists. Chic. J. Theor. Comput. Sci., 2020, 2020. URL: http://cjtcs.cs.uchicago.edu/articles/2020/1/contents.html.
  • [6] Yan Qiu Chen, Mark S Nixon, and Robert I Damper. Implementing the k-nearest neighbour rule via a neural network. In Proceedings of ICNN’95-International Conference on Neural Networks, volume 1, pages 136–140. IEEE, 1995. doi:10.1109/ICNN.1995.488081.
  • [7] Kenneth L Clarkson. Nearest neighbor queries in metric spaces. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 609–617, 1997. doi:10.1145/258533.258655.
  • [8] Thomas Cover and Peter Hart. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1):21–27, 1967. doi:10.1109/TIT.1967.1053964.
  • [9] Yogesh Dahiya, K. Vignesh, Meena Mahajan, and Karteek Sreenivasaiah. Linear threshold functions in decision lists, decision trees, and depth-2 circuits. Inf. Process. Lett., 183:106418, 2024. doi:10.1016/J.IPL.2023.106418.
  • [10] Luc Devroye. On the asymptotic probability of error in nonparametric discrimination. The Annals of Statistics, 9(6):1320–1327, 1981.
  • [11] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition, volume 31. Springer Science & Business Media, 2013.
  • [12] Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks. In Conference on learning theory, pages 907–940. PMLR, 2016. URL: http://proceedings.mlr.press/v49/eldan16.html.
  • [13] Jürgen Forster. A linear lower bound on the unbounded error probabilistic communication complexity. Journal of Computer and System Sciences, 65(4):612–625, 2002. doi:10.1016/S0022-0000(02)00019-3.
  • [14] Mikael Goldmann, Johan Håstad, and Alexander Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992. doi:10.1007/BF01200426.
  • [15] Mikael Goldmann and Marek Karpinski. Simulating threshold circuits by majority circuits. SIAM Journal on Computing, 27(1):230–246, 1998. doi:10.1137/S0097539794274519.
  • [16] András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth. Journal of Computer and System Sciences, 46(2):129–154, 1993. doi:10.1016/0022-0000(93)90001-D.
  • [17] Péter Hajnal, Zhihao Liu, and György Turán. Nearest neighbor representations of boolean functions. Information and Computation, 285:104879, 2022. doi:10.1016/J.IC.2022.104879.
  • [18] Kristoffer Arnsfelt Hansen and Vladimir V Podolskii. Polynomial threshold functions and boolean threshold circuits. Information and Computation, 240:56–73, 2015. doi:10.1016/J.IC.2014.09.008.
  • [19] Lisa Hellerstein and Rocco A Servedio. On PAC learning algorithms for rich boolean function classes. Theoretical Computer Science, 384(1):66–76, 2007. doi:10.1016/J.TCS.2007.05.018.
  • [20] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613, 1998. doi:10.1145/276698.276876.
  • [21] Piotr Indyk and Tal Wagner. Approximate nearest neighbors in limited space. In Conference On Learning Theory, pages 2012–2036. PMLR, 2018. URL: http://proceedings.mlr.press/v75/indyk18a.html.
  • [22] Jeffrey C Jackson, Adam R Klivans, and Rocco A Servedio. Learnability beyond AC0. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 776–784, 2002.
  • [23] Stasys Jukna. Boolean function complexity: advances and frontiers, volume 5. Springer, 2012. doi:10.1007/978-3-642-24508-4.
  • [24] Kordag Mehmet Kilic, Jin Sima, and Jehoshua Bruck. On the information capacity of nearest neighbor representations. In 2023 IEEE International Symposium on Information Theory (ISIT), pages 1663–1668, 2023. doi:10.1109/ISIT54713.2023.10206832.
  • [25] Kordag Mehmet Kilic, Jin Sima, and Jehoshua Bruck. Nearest neighbor representations of neurons. In 2024 IEEE International Symposium on Information Theory (ISIT), 2024.
  • [26] Adam R Klivans and Rocco A Servedio. Learning DNF in time 2^{Õ(n^{1/3})}. Journal of Computer and System Sciences, 68(2):303–318, 2004.
  • [27] Adam R Klivans and Rocco A Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7(4), 2006. URL: https://jmlr.org/papers/v7/klivans06a.html.
  • [28] Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 614–623, 1998. doi:10.1145/276698.276877.
  • [29] Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, fourier transform, and learnability. Journal of the ACM (JACM), 40(3):607–620, 1993. doi:10.1145/174130.174138.
  • [30] James Martens, Arkadev Chattopadhya, Toni Pitassi, and Richard Zemel. On the representational efficiency of restricted boltzmann machines. Advances in Neural Information Processing Systems, 26, 2013.
  • [31] O Murphy. Nearest neighbor pattern classification perceptrons. Neural Networks: Theoretical Foundations and Analysis, pages 263–266, 1992.
  • [32] Edward A Patrick and Frederic P Fischer III. A generalized k-nearest neighbor rule. Information and control, 16(2):128–152, 1970. doi:10.1016/S0019-9958(70)90081-1.
  • [33] Alexander A Razborov. On the distributional complexity of disjointness. In International Colloquium on Automata, Languages, and Programming, pages 249–253. Springer, 1990. doi:10.1007/BFB0032036.
  • [34] Alexander A Razborov. On small depth threshold circuits. In Scandinavian Workshop on Algorithm Theory, pages 42–52. Springer, 1992. doi:10.1007/3-540-55706-7_4.
  • [35] Alexander A Razborov and Alexander A Sherstov. The sign-rank of AC0. SIAM Journal on Computing, 39(5):1833–1855, 2010.
  • [36] Ronald L Rivest. Learning decision lists. Machine learning, 2:229–246, 1987. doi:10.1007/BF00058680.
  • [37] Michael E. Saks. Slicing the hypercube, pages 211–256. London Mathematical Society Lecture Note Series. Cambridge University Press, 1993.
  • [38] Matus Telgarsky. Benefits of depth in neural networks. In Conference on learning theory, pages 1517–1539. PMLR, 2016. URL: http://proceedings.mlr.press/v49/telgarsky16.html.
  • [39] Gal Vardi, Daniel Reichman, Toniann Pitassi, and Ohad Shamir. Size and depth separation in approximating benign functions with neural networks. In Conference on Learning Theory, pages 4195–4223. PMLR, 2021. URL: http://proceedings.mlr.press/v134/vardi21a.html.
  • [40] Nikhil Vyas and R. Ryan Williams. Lower bounds against sparse symmetric functions of ACC circuits: Expanding the reach of #SAT algorithms. Theory Comput. Syst., 67(1):149–177, 2023. doi:10.1007/S00224-022-10106-8.
  • [41] R. Ryan Williams. Limits on representing boolean functions by linear combinations of simple functions: Thresholds, relus, and low-degree polynomials. In 33rd Computational Complexity Conference (CCC 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.

Appendix A Omitted proofs

A.1 Proof of Theorem 14

We break the proof of this theorem into two separate lemmas.

Lemma 39.
𝖭𝖭¯ ⊆ 𝗆𝗉𝖯𝖳𝖥(∞), 𝖧𝖭𝖭¯ ⊆ 𝗆𝗉𝖯𝖳𝖥(poly(n))

More precisely, any 𝖭𝖭¯ representation with m anchors is equivalent to an 𝗆𝗉𝖯𝖳𝖥 with m terms, and any 𝖧𝖭𝖭¯ representation with m anchors in ñ dimensions is equivalent to an 𝗆𝗉𝖯𝖳𝖥 with m terms and maximum weight ñ.

Proof.

The distance from 𝒙 ∈ {0,1}ⁿ to an anchor 𝒑 ∈ ℝⁿ is a linear form in the variables 𝒙:

Σᵢ (xᵢ − pᵢ)² = Σᵢ [xᵢ² − 2pᵢxᵢ + pᵢ²] = Σᵢ [(1 − 2pᵢ)xᵢ + pᵢ²] = ⟨𝟏 − 2𝒑, 𝒙⟩ + ‖𝒑‖₂²,

using that xᵢ² = xᵢ on {0,1}.

We can observe that 𝖭𝖭 representations essentially compute 𝟙[min_{𝒑∈P} Δ(𝒙,𝒑) ≤ min_{𝒒∈N} Δ(𝒙,𝒒)], which is an 𝗆𝗉𝖯𝖳𝖥. Subfunctions merely multiply coefficients and add constants to each linear form; for example, d((x₁, x₁, 0), (p₁, p₁, p₃)) = 2(1 − 2p₁)x₁ + (p₃² + 2p₁²).

In the case of 𝖧𝖭𝖭, all anchors satisfy 𝒑 ∈ {0,1}ⁿ, and Δ(𝒙,𝒑) is a linear form with ±1 coefficients and a constant term bounded (in absolute value) by n. As a result, the weights in the 𝗆𝗉𝖯𝖳𝖥 are bounded by n as well.
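The identity underlying Lemma 39 is easy to test numerically; the sketch below checks that the squared distance to a random real anchor agrees with the linear form ⟨𝟏 − 2𝒑, 𝒙⟩ + ‖𝒑‖² on Boolean inputs.

```python
import random

def sq_dist(x, p):
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

random.seed(0)
n = 6
p = [random.uniform(-2, 2) for _ in range(n)]      # a real-valued anchor
w = [1 - 2 * pi for pi in p]                       # coefficients 1 - 2p_i
c = sum(pi ** 2 for pi in p)                       # constant term ||p||^2
for _ in range(100):
    x = [random.randint(0, 1) for _ in range(n)]
    linear = sum(wi * xi for wi, xi in zip(w, x)) + c
    assert abs(sq_dist(x, p) - linear) < 1e-9      # x_i^2 = x_i on {0,1}
```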

Lemma 40.
𝗆𝗉𝖯𝖳𝖥(∞) ⊆ 𝖭𝖭¯, 𝗆𝗉𝖯𝖳𝖥(poly(n)) ⊆ 𝖧𝖭𝖭¯

More precisely, any 𝗆𝗉𝖯𝖳𝖥 with m terms has an 𝖭𝖭¯ representation with m anchors in n+1 dimensions. Any 𝗆𝗉𝖯𝖳𝖥 with m terms and maximum weight W has an 𝖧𝖭𝖭¯ representation with m anchors in n~=O(nW) dimensions.

Proof.

We start with the 𝗆𝗉𝖯𝖳𝖥(poly(n)) case. Let 𝟙[min_{i≤ℓ₁} L₁ᵢ(𝒙) ≤ min_{j≤ℓ₂} L₂ⱼ(𝒙)] be an arbitrary 𝗆𝗉𝖯𝖳𝖥(poly(n)). We first make some preprocessing steps. First, multiply each linear form by 2 and add one to the right-hand side, so that ties are won by the left-hand side. Second, we would like to make all coefficients positive. For this, while there exists a negative term a_ijk·x_k (or constant θ_ij), just add x_k (or 1) to every linear form until all negative terms are eliminated. No coefficient (or constant) will increase by more than W. Third, we make all coefficients even by multiplying all linear forms by two. Finally, we add the same constant Θ (to be decided later) to all linear forms. Then, every linear form is equal to L_ij(𝒙) = a_ij1·x₁ + ⋯ + a_ijn·xₙ + θ_ij + Θ, for positive, even constants a_ijk, θ_ij ≤ 8W.

Define n block sizes t₁,…,tₙ by t_k := max_{i,j} a_ijk (i.e., the maximum coefficient of x_k in any linear form). Also define C = Θ + max_{i,j} θ_ij and let ñ := t₁ + ⋯ + tₙ + C. Inputs 𝒙 ∈ {0,1}ⁿ are mapped to query points 𝒙̃ ∈ {0,1}^ñ, and linear forms L_ij are mapped to anchors 𝒑̃_ij ∈ {0,1}^ñ, such that Δ(𝒙̃, 𝒑̃_ij) = L_ij(𝒙). In particular,

𝒙̃ := (x₁,…,x₁, x₂,…,x₂, …, xₙ,…,xₙ, 1,…,1), where each x_k is repeated t_k times and the trailing 1 is repeated C times,

and

𝒑̃_ij := (0,…,0, 1,…,1, …, 0,…,0, 1,…,1, 0,…,0, 1,…,1), where the k'th block consists of (t_k + a_ijk)/2 zeros followed by (t_k − a_ijk)/2 ones, and the final block consists of z_ij zeros followed by C − z_ij ones,

where z_ij will be chosen momentarily. (Let P = {𝒑̃_{1j}}_{j≤ℓ₁} and N = {𝒑̃_{2j}}_{j≤ℓ₂}.) The distance between 𝒙̃ and 𝒑̃_ij is equal to

Δ(𝒙̃, 𝒑̃_ij) = z_ij + Σ_k [ ((t_k + a_ijk)/2)·x_k + ((t_k − a_ijk)/2)·(1 − x_k) ] = z_ij + ⟨𝒂_ij, 𝒙⟩ + Σ_k (t_k − a_ijk)/2.

Now let z_ij = Θ + θ_ij − Σ_k (t_k − a_ijk)/2 so that Δ(𝒙̃, 𝒑̃_ij) = ⟨𝒂_ij, 𝒙⟩ + Θ + θ_ij. This is valid (i.e., z_ij is a non-negative integer) if we choose a large enough value for Θ: the minimal value of Θ such that z_ij ≥ 0 for all i,j is

Θ = max_{i,j} ( Σ_k (t_k − a_ijk)/2 − θ_ij ) ≤ Σ_k t_k/2 ≤ 4nW.

Thus, for Θ = 4nW, we may always choose 0 ≤ z_ij ≤ Θ + θ_ij ≤ C. Observe that the map 𝒙 ↦ 𝒙̃ duplicates each xᵢ at most 8W times and introduces at most 4nW + 8W constant variables. Thus, the original 𝗆𝗉𝖯𝖳𝖥 is equivalent to a subfunction of an 𝖧𝖭𝖭 representation with m anchors in O(nW) dimensions.
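The block construction can be exercised on a single linear form. The toy check below is our own: the constants a, θ, the block sizes t, and the tail length C are chosen to satisfy the constraints above (positive even coefficients a_k ≤ t_k, and 0 ≤ z ≤ C), and the Hamming distance to the anchor is verified to equal the value of the form.

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def embed(x, t, C):
    """x-tilde: repeat x_k exactly t_k times, then append C constant ones."""
    out = []
    for xk, tk in zip(x, t):
        out += [xk] * tk
    return out + [1] * C

def anchor(a, theta, t, C):
    """Anchor for L(x) = sum a_k x_k + theta (a_k even, 0 <= a_k <= t_k)."""
    out = []
    for ak, tk in zip(a, t):
        out += [0] * ((tk + ak) // 2) + [1] * ((tk - ak) // 2)
    z = theta - sum((tk - ak) // 2 for ak, tk in zip(a, t))
    assert 0 <= z <= C                 # the non-negativity condition on z_ij
    return out + [0] * z + [1] * (C - z)

a, theta = [2, 4, 2], 6                # L(x) = 2x1 + 4x2 + 2x3 + 6
t = [4, 4, 4]                          # block sizes (>= the max coefficients)
C = 8
p = anchor(a, theta, t, C)
for x in product([0, 1], repeat=3):
    L = sum(ak * xk for ak, xk in zip(a, x)) + theta
    assert hamming(embed(x, t, C), p) == L
```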

For the 𝗆𝗉𝖯𝖳𝖥(∞) case, the same method applies, only now we do not need to increase the dimension as much. All coefficients can be realized by choosing anchors 𝒑_ij = (𝟏 − 𝒂_ij)/2, and all constants θ_ij can be corrected using one additional dimension.

From this we can also deduce the following:

Theorem 41.

Any function with an m-anchor 𝖭𝖭 representation with bit-complexity O(logn) is equivalent to an 𝗆𝗉𝖯𝖳𝖥(poly(n)) with m terms. Any function of n inputs with an 𝗆𝗉𝖯𝖳𝖥(poly(n)) representation with m terms is equivalent to a subfunction of a function of n+1 inputs with an m-anchor 𝖭𝖭 representation with bit-complexity O(logn).

Proof.

Observe that in Lemmas 39 and 40 – for 𝖭𝖭 and 𝗆𝗉𝖯𝖳𝖥() – the bit-complexity of 𝖭𝖭 and the logarithms of weights of 𝗆𝗉𝖯𝖳𝖥 are linearly related.

A.2 Proof of Corollary 15

Proof.

It is shown in [17] that 𝖷𝖮𝖱 has a unique 𝖧𝖭𝖭 representation with 2n anchors. Furthermore, it is established in [18] that 𝖷𝖮𝖱 ∈ 𝗆𝗉𝖯𝖳𝖥(poly(n)): in particular, 𝖷𝖮𝖱(𝒙) = 1 if and only if min{L₀(𝒙), L₂(𝒙), …} > min{L₁(𝒙), L₃(𝒙), …}, where Lᵢ(𝒙) = i² − 2i(x₁ + ⋯ + xₙ).
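This representation is easy to verify by brute force. The check below assumes the forms Lᵢ(𝒙) = i² − 2i(x₁+⋯+xₙ) for i = 0,…,n, and tests that the odd-indexed forms attain the strictly smaller minimum exactly on odd-weight inputs (note Lᵢ = (i − s)² − s² for weight s, so the minimum is attained at the index of matching parity).

```python
from itertools import product

def L(i, s):
    """L_i(x) = i^2 - 2*i*(x_1 + ... + x_n), a function of the weight s."""
    return i * i - 2 * i * s

n = 6
for x in product([0, 1], repeat=n):
    s = sum(x)
    even = min(L(i, s) for i in range(0, n + 1, 2))
    odd = min(L(i, s) for i in range(1, n + 1, 2))
    # the odd-indexed forms win exactly when XOR(x) = 1
    assert (even > odd) == (s % 2 == 1)
```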

A.3 Proof of Corollary 16

Proof.

It was shown by [17] that the 𝖭𝖭 complexity of a Boolean function f is bounded below by the sign-rank of f, and this can be easily extended to 𝖭𝖭¯ through Theorem 14: the number of terms in an 𝗆𝗉𝖯𝖳𝖥 computing f is also bounded below by the sign-rank of f, by [18].

[13] and [35] respectively establish that the sign-rank of 𝖨𝖯 is 2^{n/2} and the sign-rank of fₙ is 2^{Ω(n)}.

A.4 Proof of Lemma 18

Proof.

Consider a function f ∈ 𝗆𝗉𝖯𝖳𝖥(poly(n)) and let 𝟙[min_{i≤ℓ₁} L₁ᵢ(𝒙) ≤ min_{j≤ℓ₂} L₂ⱼ(𝒙)] be its representation. We can assume that all possible values of all linear forms are distinct. For this it is enough to multiply all forms by ℓ₁ + ℓ₂ and to add to each form its own unique remainder modulo ℓ₁ + ℓ₂.

Observe that each linear form attains only polynomially many values (since its output is polynomially bounded in absolute value). Denote the possible values of the form L_ij by a_ij1, …, a_ijt for some t polynomially bounded in n. Note that, for different linear forms, the number of values attained might not be the same; to simplify the notation, we pad the lists with repeated values to make them all of equal size t.

Now we are ready to produce the decision list. Let c₁ = 1 and c₂ = 0. We consider each a_ijk in increasing order and query whether L_ij(𝒙) ≤ a_ijk. If so, we output c_i. If not, we proceed to the next a_ijk.

This decision list computes f since we are just looking for the minimal value of a linear form among all possible values of the forms.

A.5 Consequences of Theorem 22

Corollary 42.

Any Boolean function with a 𝗄𝖭𝖭¯ representation with m anchors has an 𝖭𝖭¯ representation with (m choose k) anchors. (Similarly, any Boolean function with a 𝗄𝖧𝖭𝖭¯ representation with m anchors has an 𝖧𝖭𝖭¯ representation with (m choose k) anchors.)

 Remark 43.

Theorem 22 and Corollary 42 can be extended to non-Boolean inputs. More precisely, the same statements are true over any finite domain Dⁿ. For this, we can express (squared) distances to anchors as quadratic forms; for each subset of distances of size k, we consider the average of these distances and represent it as a distance to a new anchor. We still need to add an extra dimension to absorb constant terms.

 Remark 44.

Theorem 22 and Corollaries 42 and 23 can be extended to the case of weighted 𝗄𝖭𝖭. Indeed, in Theorem 22, instead of sums of linear forms we will have weighted sums. This will require (m choose k)·k! = m!/(m−k)! terms in the 𝗆𝗉𝖯𝖳𝖥 representation. If the weights in the weighted 𝗄𝖭𝖭 representation are small and the bit-complexity of the anchors is small, this results in an 𝖧𝖭𝖭¯ representation, and if there are no restrictions on weights and bit-complexity, we get an 𝖭𝖭¯ representation. The proof of Corollary 23 still works despite the increase in the number of anchors to m!/(m−k)!.

A.6 Proof of Theorem 25

We first make the following general observation: [32] show that finding the k'th nearest positive anchor and the k'th nearest negative anchor and classifying based on which is closer is equivalent to computing a (2k−1)-nearest neighbors representation. This fact can be generalized, considering the closure of 𝗄𝖭𝖭.

Lemma 45.

Let A and B be two sets of numbers and let S be the set of the k smallest elements of A ∪ B. Then,

|A ∩ S| ≥ |B ∩ S| ⟺ A₍t₎ < B₍t₎

where t = (k+1)/2 and A₍t₎ denotes the t'th smallest element of A. (As in 𝗄𝖭𝖭, we assume S exists and is unique.)

Proof.

A contains a majority of the elements in S if and only if |A ∩ S| ≥ t. This happens if and only if the t'th smallest element of A is smaller than the t'th smallest element of B.
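A quick randomized check of Lemma 45 for k = 5 (so t = 3), with all elements distinct:

```python
import random

def majority_in_k_smallest(A, B, k):
    """Does A contribute at least half of the k smallest elements of A ∪ B?"""
    S = set(sorted(A + B)[:k])
    return len(S & set(A)) >= len(S & set(B))

random.seed(1)
k, t = 5, 3                            # t = (k + 1) / 2
for _ in range(300):
    pool = random.sample(range(1000), 12)   # 12 distinct values
    A, B = pool[:6], pool[6:]
    # the lemma: majority among the k smallest iff A_(t) < B_(t)
    assert majority_in_k_smallest(A, B, k) == (sorted(A)[t - 1] < sorted(B)[t - 1])
```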

We now proceed with the proof of Theorem 25.

Proof.

For the inclusion 𝗄𝖭𝖭¯ ⊆ 𝗄𝖲𝖳𝖠𝖳, consider any function f in 𝗄𝖭𝖭¯. It is a subfunction of some function g with a 𝗄𝖭𝖭 representation P ∪ N. As in Lemma 39, the distances between 𝒙 and the anchors are linear forms A = {L₁(𝒙), …, L_{|P|}(𝒙)} and B = {R₁(𝒙), …, R_{|N|}(𝒙)}, which we assume have integer coefficients by the usual finite precision argument. By definition, g(𝒙) = 1 if and only if the set S of k nearest neighbors satisfies |P ∩ S| ≥ |N ∩ S|. By Lemma 45, this happens if and only if A₍t₎ < B₍t₎, taking t = (k+1)/2. Hence, g ∈ 𝗄𝖲𝖳𝖠𝖳. As 𝗄𝖲𝖳𝖠𝖳 is closed under taking subfunctions, f ∈ 𝗄𝖲𝖳𝖠𝖳 as well.

For the inclusion 𝗄𝖲𝖳𝖠𝖳 ⊆ 𝗄𝖭𝖭¯, assume that f has a 𝗄𝖲𝖳𝖠𝖳 representation. By adding dummy linear forms we can ensure k_l = k_r. By Lemma 45, the inequality (3) holds if and only if the 2k_l − 1 smallest linear forms include more linear forms from the left-hand side than from the right. Representing each linear form by an anchor, we obtain a representation of the same function in 𝗄𝖭𝖭¯.

The case of 𝗄𝖧𝖭𝖭¯ and 𝗄𝖲𝖳𝖠𝖳^ is analogous.

A.7 Proof of Theorem 26

Proof.

Suppose a Boolean function f has a representation {L₁,…,L_p} satisfying (4) for some function 𝗅𝖺𝖻𝖾𝗅 and integer k. We will show that f ∈ 𝗄𝖲𝖳𝖠𝖳. First, we assume that all coefficients in all linear forms are integers and ensure that all values of all linear forms are distinct and even. For this, multiply all forms by 2p and shift each form by its own even remainder modulo 2p.

For each i ≤ p, we add one linear form to each side of (3). If 𝗅𝖺𝖻𝖾𝗅(i) = 1, then place the form Lᵢ(𝒙) on the left-hand side and Lᵢ(𝒙) + 1 on the right. If 𝗅𝖺𝖻𝖾𝗅(i) = 0, put Lᵢ(𝒙) on the right-hand side and Lᵢ(𝒙) + 1 on the left. It is easy to see that the k'th statistics of the left and right-hand sides of the resulting 𝗄𝖲𝖳𝖠𝖳 representation are Lᵢ(𝒙) and Lᵢ(𝒙) + 1 (not necessarily in that order), where Lᵢ(𝒙) is the k'th statistic of the original representation. Hence, the inequality in (3) holds if and only if 𝗅𝖺𝖻𝖾𝗅(i) = 1.

For the other direction, assume we have a function f ∈ 𝗄𝖲𝖳𝖠𝖳 given by (3). We again assume that all coefficients are integers and all values of all linear forms are distinct. Now we construct the required representation of f. For each form Lᵢ we add to the representation the forms L_ij(𝒙) := Lᵢ(𝒙) + j/(k_l + k_r) for all j ∈ {0,1,…,k_l + k_r − 1}, and for each form Rᵢ we add to the representation the forms R_ij(𝒙) := Rᵢ(𝒙) + j/(k_l + k_r + 1) for all j ∈ {0,1,…,k_l + k_r}. (That is, we have k_l + k_r copies of each form Lᵢ and k_l + k_r + 1 copies of each form Rᵢ.) To each L_ij, R_ij we assign the label 0 if j < k_l, and 1 if j ≥ k_l. Finally, we set k = (k_l + k_r − 1)(k_l + k_r + 1) + 1.

Now, observe that the inequality (3) holds if and only if, among the k_l + k_r − 1 smallest forms, there are at least k_l forms Lᵢ. Assume that there are precisely a forms Lᵢ and b forms Rᵢ. In particular, a + b = k_l + k_r − 1. Then, in the new representation, these linear forms give us

a(k_l + k_r) + b(k_l + k_r + 1) = (a + b)(k_l + k_r + 1) − a = (k_l + k_r − 1)(k_l + k_r + 1) − a

smallest forms. By construction, the next smallest forms are either L_{i,0} < ⋯ < L_{i,k_l+k_r−1} or R_{i,0} < ⋯ < R_{i,k_l+k_r} for some i. Thus, the k'th smallest form is either L_{i,a} or R_{i,a}, and its label is 1 if and only if a ≥ k_l, as desired.

A.8 Proof of Theorem 27

Proof.

Suppose we are given a function f𝖲𝖸𝖬𝖬𝖠𝖩 and a circuit computing it. We are going to construct a 𝗄𝖲𝖳𝖠𝖳^ representation of f in the form given by Theorem 26.

We can assume that all 𝖬𝖠𝖩 gates in the circuit have the same threshold t=0. For this we can just add dummy variables and fix them to constants. Denote the linear forms for 𝖬𝖠𝖩 gates by L1,,Ls (all weights are integers) and denote by g:{0,1}s{0,1} the symmetric function at the top of the circuit. Here, s is the size of the circuit. Now, construct a 𝗄𝖲𝖳𝖠𝖳^ representation with the following linear forms:

(s+2)·L₁(𝒙), …, (s+2)·Lₛ(𝒙), 1, 2, …, s+1. (6)

That is, we multiply each linear form by (s+2) and add (s+1) constant linear forms with values 1,,s+1. We let k=s+1.

It is easy to see that the k’th statistic of (6) is always one of the constant linear forms. It is the form i if and only if i1 of the linear forms among L1,,Ls are positive. We assign label 1 to the form i if and only if g(𝒙)=1 for inputs of weight i1. As a result, we get the desired representation for f and show that f𝗄𝖲𝖳𝖠𝖳^.

 Remark 46.

The well-known argument that shows 𝖬𝖠𝖩𝖳𝖧𝖱=𝖬𝖠𝖩𝖬𝖠𝖩 (see [14]) can be straightforwardly adapted to show that 𝖲𝖸𝖬𝖳𝖧𝖱=𝖲𝖸𝖬𝖬𝖠𝖩. Thus, 𝖲𝖸𝖬𝖳𝖧𝖱𝗄𝖲𝖳𝖠𝖳^ follows from Theorem 27 as well.

A.9 Proof of Theorem 28

Proof.

First, as a warm-up, we show that 𝖨𝖯 ∈ 𝗄𝖭𝖭. Recall that 𝖨𝖯(𝒙,𝒚) = ⊕ᵢ₌₁ⁿ (xᵢ ∧ yᵢ). Denote by 𝒂 = (1/2,…,1/2) the 2n-dimensional vector with 1/2 in each coordinate. Note that Δ(𝒂,(𝒙,𝒚)) = n/2 for all (𝒙,𝒚) ∈ {0,1}²ⁿ.

For each i = 1,…,n introduce two anchors 𝒑ᵢ⁰ = 𝒂 + (1/2)(𝒆ᵢ + 𝒆ᵢ₊ₙ) and 𝒑ᵢ¹ = 𝒂 + (1/4)(𝒆ᵢ + 𝒆ᵢ₊ₙ). If for some (𝒙,𝒚) we have xᵢ = yᵢ = 1, then

Δ((𝒙,𝒚), 𝒑ᵢʲ) ≤ n/2 − 2(1/4 − 1/16) = n/2 − 3/8.

If, on the other hand, xi=0 or yi=0, then

Δ((𝒙,𝒚), 𝒑ᵢʲ) ≥ n/2 − (1/4 − 1/16) − (1/4 − 9/16) = n/2 + 1/8 > n/2.

For each i = 1,…,n+1 and j = 0,1 and l = 0,1, introduce an anchor 𝒒_{i,j,l} = 𝒂 + (−1)ˡ·((2i+j)/(8n))·𝒆₁. For (𝒙,𝒚) with x₁ = 1 it is not hard to see that

n/2 − 3/8 < Δ((𝒙,𝒚), 𝒒_{n+1,1,0}) < Δ((𝒙,𝒚), 𝒒_{n+1,0,0}) < ⋯ < Δ((𝒙,𝒚), 𝒒_{1,1,0}) < Δ((𝒙,𝒚), 𝒒_{1,0,0}) < n/2

and Δ((𝒙,𝒚), 𝒒_{i,j,1}) > n/2 for all i,j. The situation is symmetric for x₁ = 0. We assign label j to the anchor 𝒑ᵢʲ. We assign label 1 to the anchor 𝒒_{i,j,l} iff i + j is odd. We let k = 2n+1.

It is easy to see that for a given (𝒙,𝒚), among the k closest anchors we have both anchors 𝒑ᵢ⁰, 𝒑ᵢ¹ for every i such that xᵢ = yᵢ = 1. Denote the number of such i by t. Also among the k closest anchors we will have the pairs of anchors 𝒒_{i,0,l}, 𝒒_{i,1,l} for the appropriate l and for i = n+1,…,t+2. In each of these pairs the labels of the anchors are opposite, and they cancel out when we compute the majority. Finally, the last anchor among the k closest is 𝒒_{t+1,1,l}. The label of this anchor determines the majority among the k closest anchors, and it is 1 iff t is odd. As a result, we get the desired representation for 𝖨𝖯 with 6n + 4 anchors.
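The warm-up construction can be checked exhaustively for small n. The harness below is our own, uses exact rational arithmetic to avoid floating-point ties, and assumes the shift (2i+j)/(8n) for the 𝒒 anchors; it verifies that the majority vote of the k = 2n+1 nearest anchors computes 𝖨𝖯 for n = 3.

```python
from fractions import Fraction as F
from itertools import product

n = 3
half, quarter = F(1, 2), F(1, 4)
a = [half] * (2 * n)                           # the all-1/2 center

def e(i):                                      # 2n-dim basis vector e_i (1-indexed)
    return [F(int(j == i - 1)) for j in range(2 * n)]

def shift(base, direction, c):                 # base + c * direction
    return [b + c * d for b, d in zip(base, direction)]

anchors = []                                   # pairs (point, label)
for i in range(1, n + 1):
    ei = [u + v for u, v in zip(e(i), e(i + n))]
    anchors.append((shift(a, ei, half), 0))    # p_i^0, label 0
    anchors.append((shift(a, ei, quarter), 1)) # p_i^1, label 1
for i in range(1, n + 2):
    for j in (0, 1):
        for l in (0, 1):
            c = (-1) ** l * F(2 * i + j, 8 * n)
            anchors.append((shift(a, e(1), c), (i + j) % 2))   # q_{i,j,l}

k = 2 * n + 1
for x in product([0, 1], repeat=n):
    for y in product([0, 1], repeat=n):
        z = list(x) + list(y)
        d = sorted((sum((zi - pi) ** 2 for zi, pi in zip(z, p)), lab)
                   for p, lab in anchors)
        assert d[k - 1][0] < d[k][0]           # the k nearest are well defined
        votes = sum(lab for _, lab in d[:k])
        ip = sum(xi * yi for xi, yi in zip(x, y)) % 2
        assert (2 * votes > k) == (ip == 1)
```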

Now we extend this argument to 𝖲𝖸𝖬∘𝖠𝖭𝖣. Consider a function f(𝒙) = g(f₁(𝒙),…,fₛ(𝒙)), where each fᵢ has the form fᵢ(𝒙) = (⋀_{k∈Sᵢ} x_k) ∧ (⋀_{k∈Tᵢ} ¬x_k) for some disjoint Sᵢ, Tᵢ ⊆ [n]. For each fᵢ we let 1/2 > ϵᵢ₁ > ϵᵢ₀ > 0 be a pair of parameters to be fixed later. We introduce a pair of anchors 𝒑ᵢ¹, 𝒑ᵢ⁰ in the following way: set the k'th coordinate of 𝒑ᵢʲ to

𝒑ᵢʲ(k) = 1/2 if k ∉ Sᵢ ∪ Tᵢ,  3/2 − ϵᵢⱼ if k ∈ Sᵢ,  ϵᵢⱼ − 1/2 if k ∈ Tᵢ.

It is easy to see that for 𝒙 such that fᵢ(𝒙) = 1 we have Δ(𝒑ᵢʲ, 𝒙) = n/4 − |Sᵢ ∪ Tᵢ|(ϵᵢⱼ − ϵᵢⱼ²), and for 𝒙 such that fᵢ(𝒙) = 0 we have Δ(𝒑ᵢʲ, 𝒙) ≥ n/4 − (|Sᵢ ∪ Tᵢ| − 1)(ϵᵢⱼ − ϵᵢⱼ²) + 1. We fix ϵᵢⱼ in such a way that n/4 − |Sᵢ ∪ Tᵢ|(ϵᵢⱼ − ϵᵢⱼ²) < n/4 − 1/2 and n/4 − (|Sᵢ ∪ Tᵢ| − 1)(ϵᵢⱼ − ϵᵢⱼ²) + 1 > n/4 + 1/2. We set 𝗅𝖺𝖻𝖾𝗅(𝒑ᵢʲ) = j.

We construct anchors 𝒒_{i,j,l} for i = 1,…,s+1 and j = 0,1 in the same way as above, and assign 𝗅𝖺𝖻𝖾𝗅(𝒒_{i,1,l}) to be equal to g(𝒚) for 𝒚 of weight i − 1 and 𝗅𝖺𝖻𝖾𝗅(𝒒_{i,0,l}) to be the opposite. We let k = 2s + 1. The same argument as for 𝖨𝖯 shows that we get the desired representation of f with 6s + 4 anchors.

A.10 Proof of Theorem 30

Proof.

Consider a function f ∈ 𝖤𝖫𝖣𝖫 and suppose the linear forms in its representation are L₁,…,Lₛ. Here Lᵢ corresponds to the i'th query. As in the proof of Theorem 27, we can assume that all thresholds in all linear forms are 0.

We are going to construct a representation for f of the form provided by Theorem 26. We add to this representation the following linear forms:

(s+1)L₁, (s+1)L₁, (s+1)L₂ + 1, (s+1)L₂ − 1, …, (s+1)Lₛ + s − 1, (s+1)Lₛ − s + 1.

That is, for each form Li in 𝖤𝖫𝖣𝖫 representation, we add the two forms (s+1)Li+(i1) and (s+1)Li(i1). We set k=s.

Assume that for some 𝒙 we have Lᵢ(𝒙) = 0 and all previous linear forms are non-zero. We then have that (s+1)Lᵢ(𝒙) + i − 1 = i − 1. It is not hard to see that for j < i, among the forms (s+1)Lⱼ(𝒙) + j − 1 and (s+1)Lⱼ(𝒙) − j + 1, exactly one is greater than i − 1: the first one if Lⱼ(𝒙) > 0 and the second one if Lⱼ(𝒙) < 0. For j > i, in a similar way we can see that among the forms (s+1)Lⱼ(𝒙) + j − 1 and (s+1)Lⱼ(𝒙) − j + 1, exactly one is greater than i − 1: the first one if Lⱼ(𝒙) ≥ 0 and the second one if Lⱼ(𝒙) < 0. As a result, there are exactly s − 1 forms that are greater than (s+1)Lᵢ(𝒙) + i − 1. We assign to this form the same label that Lᵢ has in the 𝖤𝖫𝖣𝖫. From this it follows that the constructed representation computes the same function.

Clearly, the coefficients in the constructed form are polynomially related to the coefficients in the original forms. Thus, the same proof gives 𝖤𝖫𝖣𝖫^𝗄𝖲𝖳𝖠𝖳^.

 Remark 47.

Note that decision lists are computable in 𝖠𝖢0 and thus can be computed by quasi-polynomial-size 𝖲𝖸𝖬∘𝖠𝖭𝖣 circuits. As a result, 𝖤𝖫𝖣𝖫 can be computed by quasi-polynomial-size circuits in 𝖲𝖸𝖬∘𝖠𝖭𝖣∘𝖤𝖳𝖧𝖱 = 𝖲𝖸𝖬∘𝖤𝖳𝖧𝖱 = 𝖲𝖸𝖬∘𝖬𝖠𝖩, where the first equality follows since 𝖤𝖳𝖧𝖱 is closed under the 𝖠𝖭𝖣 operation. Still, Theorem 30 gives a polynomial reduction that translates to the case of small coefficients.

A.11 Proof of Theorem 37

Such constructions are likely known; we outline a simple one for completeness.

Lemma 48.

For any even integer k > 0, there exists a CNF with n variables and n·2^{k−o_k(k)}/k clauses with 2^{(1−o_k(1))n} components.

Proof.

Assume k divides n. Divide the set of variables into n/k disjoint sets S₁,…,S_{n/k} of size k. For each set Sᵢ, define a CNF Cᵢ which evaluates to 1 if and only if exactly half of the variables in Sᵢ are equal to 1. This can be achieved with (k choose k/2) = 2^{k−o_k(k)} clauses.

Then, the CNF C = C₁ ∧ ⋯ ∧ C_{n/k} has exactly (k choose k/2)^{n/k} = 2^{(1−o_k(1))n} satisfying assignments, and the Hamming distance between any two such assignments is at least 2. Thus, each of them constitutes a component.
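Both counts in the lemma can be confirmed directly for a small instance (n = 8, k = 4, so (k choose k/2)^{n/k} = 36):

```python
from itertools import product
from math import comb

n, k = 8, 4
def f(x):
    """Every length-k block must contain exactly k/2 ones."""
    return all(sum(x[i:i + k]) == k // 2 for i in range(0, n, k))

sat = [x for x in product([0, 1], repeat=n) if f(x)]
assert len(sat) == comb(k, k // 2) ** (n // k)        # 6^2 = 36 components

# distinct satisfying assignments differ in at least two coordinates,
# so each one is an isolated vertex of the induced Hamming cube subgraph
assert all(sum(u != v for u, v in zip(s, r)) >= 2
           for s in sat for r in sat if s != r)
```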

Hence, Theorem 37 follows from Lemmas 36 and 48 by taking k to be a constant independent of n. It is easy to extend the construction above to odd k. We omit the simple details.

Appendix B Circuits computing nearest neighbors

In this section we describe a straightforward construction of a depth-three circuit computing 𝖧𝖭𝖭 representations and then compress it to depth two at the cost of exponential weights. A folklore result of [31] is that any 𝖭𝖭 representation with m anchors can be computed by a depth-three threshold circuit of size O(m²). A short proof can be found in [24].

Theorem 49 ([31]).
  • 𝖭𝖭 ⊆ 𝖮𝖱∘𝖠𝖭𝖣∘𝖳𝖧𝖱, 𝖠𝖭𝖣∘𝖮𝖱∘𝖳𝖧𝖱

  • 𝖧𝖭𝖭 ⊆ 𝖮𝖱∘𝖠𝖭𝖣∘𝖬𝖠𝖩, 𝖠𝖭𝖣∘𝖮𝖱∘𝖬𝖠𝖩

Namely, every 𝖭𝖭 (𝖧𝖭𝖭) representation is computed by a depth-three 𝖠𝖢0∘𝖳𝖧𝖱 (𝖬𝖠𝖩) circuit with size |P|·|N| + min{|P|,|N|} + 1.

Note that the only difference between the circuits for 𝖧𝖭𝖭 and 𝖭𝖭 is that the first-level threshold gates are guaranteed to have polynomial weights (in the case of 𝖧𝖭𝖭). It turns out that the size of the 𝖧𝖭𝖭 circuit can be improved (when n ≪ |P| + |N|).

Lemma 50.
𝖧𝖭𝖭 ⊆ 𝖮𝖱∘𝖠𝖭𝖣∘𝖬𝖠𝖩

In particular, every 𝖧𝖭𝖭 representation with m anchors is computed by an 𝖮𝖱∘𝖠𝖭𝖣∘𝖬𝖠𝖩 circuit with size (n+1)·m + (n+1)·|P| + 1.

Proof.

Note that 𝟙[Δ(𝒙,𝒑) ≤ i] is computed by a threshold gate f^𝒑_{≤i}(𝒙) defined by the weights 𝒘 = 𝒑 − 𝒑̄ and threshold θ = Δ(𝒑) − i, where 𝒑̄ denotes the complement of 𝒑. (And similarly for 𝟙[Δ(𝒙,𝒑) ≥ i].) Suppose f has an 𝖧𝖭𝖭 representation P ∪ N. Then, f(𝒙) = ⋁_{i≤n} ⋁_{𝒑∈P} ( f^𝒑_{≤i}(𝒙) ∧ ⋀_{𝒒∈N} f^𝒒_{≥i+1}(𝒙) ).
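The first-level gates are easy to realize concretely; the sketch below implements 𝟙[Δ(𝒙,𝒑) ≤ i] as a threshold gate with weights 𝒑 − 𝒑̄ and threshold Δ(𝒑) − i, and checks it exhaustively.

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def gate_le(p, i, x):
    """Threshold gate for 1[Hamming(x, p) <= i]:
    weights p - complement(p) = 2p - 1, threshold (weight of p) - i."""
    w = [2 * pk - 1 for pk in p]
    return int(sum(wk * xk for wk, xk in zip(w, x)) >= sum(p) - i)

n = 5
for p in [(1, 0, 1, 1, 0), (0, 0, 0, 1, 1)]:
    for i in range(n + 1):
        for x in product([0, 1], repeat=n):
            assert gate_le(p, i, x) == int(hamming(x, p) <= i)
```

The weights are ±1, so these gates are majority gates over the inputs and their negations, matching the 𝖬𝖠𝖩 bottom layer in the lemma.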

Note that the threshold circuits from Theorem 49 and Lemma 50 have size O(m2) and O(mn) respectively. In fact, the latter circuit can be compressed to a depth-two threshold circuit with exponential weights.

Theorem 51.
𝖧𝖭𝖭 ⊆ 𝖳𝖧𝖱∘𝖬𝖠𝖩.

Namely, every 𝖧𝖭𝖭 representation with m anchors is computed by a threshold of 2nm majority gates.

Proof.

The first level will consist of 2mn gates f^𝒑_{≤i}, f^𝒑_{≥i}, which output 1 if and only if Δ(𝒙,𝒑) ≤ i and Δ(𝒙,𝒑) ≥ i, respectively, for 1 ≤ i ≤ n. Define the sum

gᵢᵖ(𝒙) := f^𝒑_{≤i}(𝒙) + f^𝒑_{≥i}(𝒙) − 1

and note that gᵢᵖ(𝒙) = 𝟙[Δ(𝒙,𝒑) = i]. We can then write the output gate as

h(𝒙) = 𝟙[ Σ_{𝒑∈P, i≤n} m^{3(n−i)+1}·gᵢᵖ(𝒙) − Σ_{𝒒∈N, i≤n} m^{3(n−i)}·gᵢᑫ(𝒙) ≥ 0 ].

If some positive anchor is at distance at most j and all negative anchors are at distance at least j from 𝒙, then

Σ_{𝒑∈P, i≤n} m^{3(n−i)+1}·gᵢᵖ(𝒙) ≥ m^{3(n−j)+1} > Σ_{𝒒∈N, i≤n} m^{3(n−i)}·gᵢᑫ(𝒙).

Conversely, if some negative anchor is at distance at most j and all positive anchors are at distance at least j+1, then

Σ_{𝒑∈P, i≤n} m^{3(n−i)+1}·gᵢᵖ(𝒙) ≤ m^{3(n−j)−1} < m^{3(n−j)} ≤ Σ_{𝒒∈N, i≤n} m^{3(n−i)}·gᵢᑫ(𝒙).
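The weighting scheme can be spot-checked on random anchor sets. The harness below is our own; it includes i = 0 in the sums so that anchors at distance zero also register, and skips inputs where the representation would be ill-defined (ties).

```python
from itertools import product
import random

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

random.seed(3)
n = 4
cube = list(product([0, 1], repeat=n))
for _ in range(20):
    P = random.sample(cube, 2)
    N = random.sample([v for v in cube if v not in P], 3)
    m = len(P) + len(N)
    for x in cube:
        dp = min(hamming(x, p) for p in P)
        dn = min(hamming(x, q) for q in N)
        if dp == dn:
            continue                       # not a valid representation at x
        count = lambda A, i: sum(hamming(x, p) == i for p in A)  # sum of g_i
        lhs = sum(m ** (3 * (n - i) + 1) * count(P, i) for i in range(n + 1)) \
            - sum(m ** (3 * (n - i)) * count(N, i) for i in range(n + 1))
        # the sign of the weighted sum matches the nearest-anchor label
        assert (lhs >= 0) == (dp < dn)
```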

 Remark 52.

Theorem 51 can be obtained through Theorem 14, as a consequence of the following result derived from [18]. We include the direct construction to avoid the slight increase in circuit size.

Lemma 53.
𝗆𝗉𝖯𝖳𝖥(poly(n)) ⊆ 𝖳𝖧𝖱∘𝖬𝖠𝖩

Every 𝗆𝗉𝖯𝖳𝖥 with ℓ terms and maximum weight W is computed by a linear threshold of at most 4Wℓ·log ℓ majority gates.

Proof.

Let 𝖯𝖳𝖥_{1,2} refer to Boolean functions (over {1,2}ⁿ) equal to the sign of an n-variate polynomial. [18] prove that any 𝖯𝖳𝖥_{1,2} with ℓ terms and degree at most d is computed by a linear threshold (with exponential weights) of at most 2dℓ majority gates (replacing {1,2} with {0,1}), and any 𝗆𝗉𝖯𝖳𝖥 with ℓ terms and maximum weight W can be represented by a 𝖯𝖳𝖥_{1,2} with ℓ terms and degree at most 2W·log ℓ.

 Remark 54.

It is not hard to see that the circuits constructed in this section are polynomial-time uniform; they can be generated by a Turing machine given the set of anchors in polynomial time.