Nearest Neighbor Complexity and Boolean Circuits
Abstract
A nearest neighbor representation of a Boolean function is a set of vectors (anchors) labeled by or such that if and only if the closest anchor to is labeled by . This model was introduced by Hajnal, Liu and Turán [2022], who studied bounds on the minimum number of anchors required to represent Boolean functions under different choices of anchors (real vs. Boolean vectors) as well as the analogous model of -nearest neighbors representations.
We initiate a systematic study of the representational power of nearest and -nearest neighbors through Boolean circuit complexity. To this end, we establish a close connection between Boolean functions with polynomial nearest neighbor complexity and those that can be efficiently represented by classes based on linear inequalities – min-plus polynomial threshold functions – previously studied in relation to threshold circuits. This extends an observation of Hajnal et al. [2022]. Next, we further extend the connection between nearest neighbor representations and circuits to the -nearest neighbors case.
As an outcome of these connections we obtain exponential lower bounds on the -nearest neighbors complexity of explicit -variate functions, assuming . Previously, no superlinear lower bound was known for any . At the same time, we show that proving superpolynomial lower bounds for the -nearest neighbors complexity of an explicit function for arbitrary would require a breakthrough in circuit complexity. In addition, we prove an exponential separation between the nearest neighbor and -nearest neighbors complexity (for unrestricted ) of an explicit function. These results address questions raised by [17] of proving strong lower bounds for -nearest neighbors and understanding the role of the parameter . Finally, we devise new bounds on the nearest neighbor complexity for several families of Boolean functions.
Keywords and phrases:
Complexity, Nearest Neighbors, CircuitsCopyright and License:
![[Uncaptioned image]](x1.png)
2012 ACM Subject Classification:
Theory of computation Complexity classesAcknowledgements:
We would like to thank Bill Martin for several insightful comments. The second and third author thank the Simons Institute for the Theory of Computing where collaboration on this project has began.Editors:
Raghu MekaSeries and Publisher:

1 Introduction
A nearest-neighbor representation of a function is a set of vectors, called “anchors,” say such that if and only if the nearest anchor to (under the Euclidean distance metric) belongs to . The set of anchors can be seen as a (disjoint) union of “positive” and “negative” examples. If , we refer to the representation as Boolean, and if we call it real. This model was pioneered by [17], who advocated the study of Boolean functions admitting efficient representations, focusing on the number of anchors as a measure of the complexity of the representation. They also consider the -nearest neighbors variation, where the value of on input is computed as a majority vote of the nearest anchors to .
In particular, [17] observed a relationship between nearest-neighbor representations and threshold circuits. Motivated by their work, we initiate a systematic study of the connections to circuit complexity. Some of our results are related to the weight complexity (number of bits) needed when representing Boolean functions by real anchors: A topic receiving recent attention by [24].
There are numerous extensions and variations of nearest neighbor complexity. While studying all of these here is infeasible, one goal of ours is to encourage further exploration of this broad topic. We discuss some future directions in the conclusion section.
1.1 Motivation
We believe that nearest neighbor representations are a natural and interesting model of computation worthy of study. Furthermore, nearest neighbor complexity is relevant to several central research topics listed next.
1.1.1 Boolean function complexity
Nearest neighbor representations are related to depth-two threshold circuits and decision trees: [17] establish that lower bounds for these models are useful for establishing lower bounds for nearest neighbors. Polynomial threshold functions and their relation to geometric representations of Boolean functions also bear a strong connection to nearest neighbor representations. In fact, we will show that the expressivity of nearest neighbors is essentially equivalent to that of polynomial threshold functions over the tropical semiring – An interesting model of computation due to [18].
Our approach to understanding nearest neighbor complexity in terms of Boolean circuit complexity follows a long tradition in computational learning theory and computational complexity (e.g., [29, 22, 26, 27, 41, 39]). Uncovering new connections between nearest neighbors and well-studied computational models (such as linear decision lists and depth-two circuits) enables us to prove new unconditional lower bounds and upper bounds on the nearest neighbor complexity of explicit111By “explicit,” we mean a function that can be computed in polynomial time by a Turing machine. functions. It also allows us to phrase some open problems in circuit complexity in terms of nearest neighbor complexity. For instance, the difficulty of proving super-polynomial lower bounds for -nearest neighbors representations – for appropriate values of – follows from representations (that we construct) of circuits with the same difficulty.
1.1.2 Machine learning and pattern recognition
Learning algorithms based on nearest neighbors have been the subject of extensive research for more than 50 years [8, 11]. For example, there is evidence that increasing in the -nearest neighbors rule can decrease its estimation error [10]. However, much less seems to be known about the capacity of the nearest and -nearest neighbor rules to represent certain functions, while the capacity of other machine learning models such as Boolean circuits and neural networks has received considerable interest [16, 19, 30, 12, 38].
1.1.3 Algorithms for nearest neighbors classification and search
Efficient implementation of the nearest neighbor rule in high dimension has received considerable interest, leading to efficient algorithms and sophisticated data structures [7, 2, 1]. The study of nearest neighbor complexity leads naturally to the study of nonuniform algorithms (Boolean circuits) for this rule. Our work (along with previous findings) yields upper and lower bounds on the size of a circuit needed to implement nearest neighbor classification. We focus on exact representations, leaving the study of circuit complexity for approximate nearest neighbor search (e.g., [20, 28]) for future work.
1.2 Our results
The objects of our study are the classes of Boolean functions admitting real and Boolean nearest neighbor representation of polynomial complexity, and respectively. Standard complexity classes based on circuit-like models of computation are closed under two natural “rewiring” operations: Substitution of variables by constants and duplication of variables (See Definition 11). However, it is not clear how to efficiently perform these operations in the nearest neighbor model. Thus, if we would like to give a precise characterization of and in terms of circuits, we have to consider the closure of these classes under the same operations. As a result, we obtain classes of subfunctions of polynomial-size nearest neighbor representations, and (see Defintion 12). We then give a precise characterization (equality) of and in terms of the class of min-plus polynomial threshold functions of polynomial complexity ()222An mpPTF is an expression of the form where are linear forms.. This adds to previous results of [17] showing the containment of and in . As a consequence, we prove (among other results) that contains functions that cannot be computed by depth-two threshold circuits with polynomial size and weight. We also observe that the closure of is closely connected to the class of functions with representations of logarithmic bit-complexity.
We study the -nearest neighbors complexity of Boolean functions for . First, we extend the aforementioned characterization – the closure of in terms of – to the closure of . We use this characterization to prove that for constant is closely related to and that there exists an explicit function that requires exponential complexity when (for an -variate function). Next, we generalize the characterization of to arbitrary by introducing a new class, – functions realizable by an inequality of the -statistics of two sets of linear forms – which generalizes . Consequently, we show that proving lower bounds for for arbitrary would result in lower bounds for the circuit class , which would be a major breakthrough in Boolean circuit complexity.
Finally, we present new bounds for nearest neighbor complexity of specific Boolean functions such as disjointness, CNFs, and majority. For example, we show that CNFs with polynomially many clauses have representations with polynomially-many anchors which also exhibit constant bit-complexity. In contrast, there exist CNFs of polynomial size with exponential complexity. We also establish a new lower bound of on the complexity of the majority function with an even number of inputs (). This lower bound is tight, as it matches the upper bound proved in [17].
1.3 Related work
Nearest neighbor complexity (under Euclidean distance) was formalized by [17]. They prove that the functions333 and both require an exponential number of Boolean anchors but only and real anchors, respectively. In fact, the same argument proves that requires at least anchors for any , which gives exponential lower bound for bounded away from , but is vacuous for . It was subsequently shown in [24] that any symmetric Boolean function has an representation with anchors, where denotes the number of intervals of – the minimal number of contiguous intervals partitioning where is constant for – and this bound is optimal when all intervals have the same length. This extends the result of [17] that every symmetric function has nearest neighbor complexity at most .
1.3.1 Connections to circuits
It was observed in [17] that functions with polynomial nearest neighbor complexity can be simulated by min-plus polynomial threshold functions, but it is an open question of whether or not the inclusion is proper. Relations to the class are of interest because it deepens the connection between nearest neighbors and circuit complexity. For instance, [18] establish that systems of s compute exactly the class of circuits.
The expressive power of -nearest neighbors rule was also studied by [17]. In particular, they prove that can simulate linear decision trees, which yields a linear (in ) lower bound for the number of anchors in representations of the function. They also state the open problem of proving stronger lower bounds for the -nearest neighbors complexity of an explicit function. In [25], it is shown that, under some regularity assumptions, polynomial-size nearest neighbor representations can simulate convex polytopes (i.e., ) as well as logarithmic fan-in circuits and linear decision lists ().
Constructions of Boolean circuits computing nearest neighbor classification are known. [31] constructs an circuit computing any function with an -anchor representation in size . (See Appendix B). A very similar depth-three construction for , also with size , was found by [6]. Note that the weights of the above circuits are bounded by a polynomial in .
1.3.2 Bit complexity
It was shown in [24] that (the aforementioned) representations for symmetric functions have logarithmic bit-complexity, and that this is tight for some functions. It is left as an open problem to characterize representations of threshold functions in terms of bit-complexity. To this end, the same authors (in [25]) show that logarithmic bit complexity suffices to represent the comparison, equality, and odd-max bit Boolean functions, and conjecture that a logarithmic upper bound holds for any threshold function.
Other works have studied the role of bit-complexity in approximate nearest neighbor search; where we wish to find an anchor whose distance is minimal to a query point, up to a factor of . For example, [21] provide tight bounds (in terms of bit-complexity) on the size of data structures performing approximate nearest neighbor search. This setting is quite different from our focus on exact classification of Boolean vectors.
The bit-complexity of the weights in polynomial-size threshold circuits has been studied extensively (see, e.g., surveys [34, 37]). For example, it was proved by [14, 15] that arbitrarily large weights can be reduced to have logarithmic bit-complexity by slightly increasing the depth of the circuit (along with a polynomial blow-up in size).
1.4 Organization
Section 2 outlines basic definitions required in subsequent sections. Section 3 establishes the equivalence between and min-plus polynomial threshold functions, then discusses some of the consequences. Section 4 generalizes to a new class, , and proves that a similar equivalence holds with . Here, we also derive several connections to circuit classes such as . Section 5 contains new results (upper and lower bounds) for the nearest neighbor complexity of explicit Boolean functions. Many proofs are relegated to Appendix A due to space constraints. Appendix B contains some direct constructions of threshold circuits computing , one of which has depth two.
2 Preliminaries
We use the following notation throughout the paper:
Vectors are written in bold (i.e. ). |
---|
The ’th statistic of , denoted , is the ’th smallest element of . – In particular, |
denotes the squared Euclidean distance between . |
denotes the real inner product (dot product), . |
refers to an arbitrary polynomial in the variable . |
denotes the Boolean function whose value is 1 if and only if holds. |
Note that the (squared) Euclidean distance between two Boolean vectors is equal to their Hamming distance, , so the Hamming weight of a Boolean vector is denoted .
2.1 Boolean functions
Definition 1.
A threshold gate is a Boolean function defined by a weight vector and a threshold such that
(1) |
A threshold circuit is a sequence of gates such that the first gates are equal to the input variables (i.e., for ) and subsequent gates are threshold gates whose inputs are some subset of the previous gates. The output of the circuit is equal to the output of the final gate. The size of the circuit is equal to .
A threshold circuit can be viewed as a directed acyclic graph. Nodes with fan-in 0 correspond to inputs, and other nodes correspond to threshold gates applied to the values of the preceding nodes. The node with fan-out 0 correspond to the output node. The depth of the circuit is the length of the longest path from an input node to the output node.
Remark 2.
It is well known that we may assume that the weights (and the threshold) are integers without loss of generality: Since the domain of a threshold gate is finite, we may approximate each weight by a rational number and multiply by a common denominator. See [23] for a comprehensive introduction to circuit complexity.
Definition 3.
A Nearest Neighbor () representation of a Boolean function is defined by two disjoint sets of positive and negative anchors such that
-
if there exists a with for all .
-
if there exists a with for all .
A Hamming Nearest Neighbor () representation is defined identically for Boolean anchors in .
Definition 4.
A -Nearest Neighbors () representation of a function is defined by two disjoint sets of positive and negative anchors and an integer such that
there exists an with the following properties:
-
1.
-
2.
-
3.
for all , .
A representation is defined identically for Boolean anchors.
Definition 5 ([18]).
A min-plus polynomial threshold function () is a Boolean function defined by two sets of linear forms with integer coefficients444As for threshold gates, there is no loss of generality in the assumption that weights of s are integers. satisfying
(2) |
The number of terms in an is equal to , and the maximum weight is equal to the largest absolute value of the coefficients of any form.
Definition 6 ([36]).
A linear decision list () representation of a Boolean function is a sequence of instructions “if , then output (and halt)” for , followed by “output 0.” Here, are threshold gates and . Exact linear decision lists () are defined similarly using exact threshold functions – threshold gates where the inequality in (1) is replaced with equality. The length of an or is the number of gates, , and its maximum weight is equal to the largest coefficient of any .
Definition 7.
We consider the following well-known Boolean functions.
The majority function, |
The disjointness function, |
The inner product mod 2 function, |
The odd-max-bit function, |
2.2 Function classes
First, we define classes of Boolean circuits whose inputs may be variables, their negations, or the constants 0 and 1. , , , and are the classes of polynomial-size555“polynomial” in this context is always with respect to the input size, . depth-one circuits composed of , , threshold gates, and symmetric functions (i.e., Boolean functions which depend only on the Hamming weight of the input) respectively. is the set of threshold gates with polynomial weights666We abuse the notation denoting by both specific function and a class of function. The meaning of our notation will be also clear from the context.. is the class of constant-depth circuits consisting of a polynomial number of , , and gates.
For two circuit classes , , the class of circuits consisting of a circuit from whose inputs are (the outputs of) a polynomial number of circuits from is denoted by . (e.g., refers to depth two threshold circuits of polynomial size.)
Definition 8.
is the class of Boolean functions that have nearest neighbor representations with polynomially-many anchors. is the same class where anchors are Boolean. and are defined in the same manner for a positive integer .
Definition 9.
is the class of min-plus polynomial threshold functions with a polynomial number (in terms of the number of inputs) of terms and unbounded maximum weight. is the same class with polynomially-bounded maximum weight.
Definition 10.
is the class of Boolean functions representable by linear decision lists with polynomial length. is the same class with polynomially-bounded maximum weight. and are defined similarly for exact linear decision lists.
3 Min-plus PTFs vs. nearest neighbors
In this section, we introduce the closure operation and derive an equivalence between (the closure of) , and .
Definition 11.
Define a substitution of variables as a function where duplicates variables or adds constant variables (e.g., ). Then, a Boolean function is a subfunction of when and there exists a substitution of variables such that for all .
Subfunctions may equivalently be obtained from by identifying variables (e.g., ) and assigning variables to constants (e.g., ).
Definition 12.
For any function class , let denote the closure of : The set of subfunctions derived from the elements of . In particular, we say that a Boolean function has an “ representation” if it is a subfunction of some .
Note 13.
Circuit classes are already closed under this operation. For example, : Subfunctions of the majority function simply add (polynomially-bounded) coefficients and constant terms.
Theorem 14.
Theorem 14 and some consequences are proved in Appendix A.1. Namely, we observe that any -variate function in is a sub-function of an -variate representation, and that captures precisely the power of representations with bit-complexity . Then, using the results of [17] and [18], we immediately establish the following two corollaries.
Corollary 15.
Corollary 16.
representations of and require anchors.
(Proofs in A.2 and A.3.) Theorem 14 also yields lower bounds for the circuit complexity of functions belonging to . (A direct construction in Appendix B shows that .)
Theorem 17.
More precisely, there is a Boolean function with an representation with anchors which cannot be computed by a depth-two majority circuit with gates.
Proof.
First, we claim that . Indeed, is computed by an with terms:
where . Note that if , then and otherwise . Hence, the minimum is obtained at the maximum index where . The claim follows from Theorem 14.
Second, it is known that by [4, 16]. Thus, if was in , then we could use the representation described above to get a circuit computing , which is a contradiction.
Finally, we observe a connection between s and linear decision lists. This provides additional proof techniques for and helps to relate a question of separation of and to the similar question for linear decision lists. The following lemma is proved in Appendix A.4.
Lemma 18.
More precisely, any with terms and maximum weight is equivalent to a linear decision list with length and maximum weight .
Remark 19.
This lemma enables another technique to prove lower bounds for apart from sign-rank. More specifically, it is known that any function without large monochromatic rectangles must have a large linear decision list by [5].
Lemma 20.
.
Proof.
It was shown in [18, Lemma 22] that . Our lemma follows since is complete for the class of decision lists – See [18, Lemma 22].
It is open whether and are equal by [5]. Lemmas 18 and 20 immediately allow us to relate this problem to the problem of separating and .
Corollary 21.
If , then .
Proof.
4 kNN vs. Circuits
In this section, we give a circuit-style characterization of and provide connections to known circuit classes. From these results, we obtain a separation between and . Additionally, our results imply complexity theoretic barriers for proving superpolynomial lower bounds for representations of explicit functions.
4.1 Characterization for small
Here, we use the connection to representations to get our first results on -nearest neighbors complexity. In particular, we relate -nearest neighbors representations for constant to and prove a lower bound on -nearest neighbors complexity for sublinear .
Theorem 22.
Any Boolean function with an -anchor representation is computed by an with terms.
Proof.
We prove only the first statement as both arguments are identical. As noted in the proof of Theorem 14, the distances from anchors to a query point are linear forms . Assign each linear form a label where a positive label indicates placement on the left-hand side of the and vice versa.
Then, consider the collection and the compliment . The resulting with terms, , realizes the original representation: The minimum is attained by groups of -nearest neighbors and if any such group has a positive majority then the inequality holds.
It follows that Boolean functions with -anchor representations can be represented in with anchors. These results generalize to both weighted and to non-Boolean inputs. See Appendix A.5 for a discussion.
As a consequence of Theorem 22, sign-rank lower bounds (e.g., Corollary 16) also apply to . In particular, we get an exponential lower bound for with for constant . This addresses an open question posed in [17] regarding -nearest neighbors complexity.
Corollary 23.
Any representation of or requires anchors.
Proof.
4.2 Characterization for arbitrary
In this section, we generalize the ideas of Theorem 14 to the closure of , yielding further connections between nearest neighbors and circuit complexity.
Definition 24.
Define by the class of functions representable by an inequality between -statistics of two sets consisting of a polynomial number of linear forms: Given and integers and ,
(3) |
and is bounded by a polynomial in .
As usual, we can assume that all coefficients in the linear forms are integers. Define the subclass where all coefficients are bounded by a polynomial in 777 can be viewed as a special case of in which ..
Note that we can reduce Definition 24 to the case of with only a linear increase in the size. This can be done by adding “dummy” linear forms that are always smaller than all others.
Theorem 25.
See Appendix A.6 for the proof. Next, we provide another equivalent form of that is sometimes more convenient.
Theorem 26.
The class consists exactly of functions for which there exist linear forms with , a positive integer , and a labelling function , such that for all ,
(4) |
The class consists exactly of functions with the same representation with polynomial-size coefficients in the linear forms.
See Appendix A.7 for the proof. Now we show that some well-known circuit classes, for which we do not have any known lower bounds, are computable by .
Theorem 27.
Any symmetric function of threshold functions has a representation with .
See Appendix A.8 for the proof. Using the same strategy, we can embed a large complexity class into directly:
Theorem 28.
Any symmetric function of conjunctions has a representation with .
See Appendix A.9 for the proof.
Remark 29.
As a result of Theorem 28, if we prove for some explicit function that , it will follow that , and this would be a major breakthrough in circuit complexity. Also note that and thus, by Theorem 28, . Together with Corollary 16, this gives a separation between and . This also shows that in Corollary 23 we cannot get rid of in the lower bound.
Theorem 30.
, .
See Appendix A.10 for the proof.
Remark 31.
The class is known to be contained in and proving super-polynomial lower bounds for is an open problem (See [9]).
5 New bounds for the nearest neighbor complexity of Boolean functions
In this section, we derive several bounds on the nearest neighbor complexity of Boolean functions.
5.1 Nearest neighbor complexity of CNFs
We first show that any CNF admits an efficient representation.
Theorem 32.
Any CNF or DNF with clauses has an representation with anchors and constant bit-complexity.
Proof.
It suffices to prove the statement for DNFs as any CNF can be converted to a DNF by negation.
Let and note that for every input (where is the squared Euclidean distance). For each clause, say , introduce a positive anchor
If any variable is negated, replace the corresponding (or ) with (or ).
If , then . Otherwise, some literal in is equal to zero, hence . Therefore, the entire DNF, say , is satisfied if and only if some is a nearest neighbor of .
The polynomial-size representation above does not generalize to deeper circuits of depth larger than . For instance, Corollary 16 exhibits a function computable by a depth-three De Morgan circuit of polynomial size which does not belong to . For the well studied disjointness function (that admits a compact CNF representation) we can get an efficient representation:
Theorem 33.
The disjointness function (in dimensions) has an representation with anchors.
Proof.
Consider anchors and where denotes the ’th standard basis vector and their concatenation.
Let and suppose for some . Then, for all it holds that with equality when . Otherwise, for all .
Remark 34.
Proceeding, we show that some CNFs with polynomially many clauses have exponential Boolean nearest neighbor complexity.
Definition 35.
The Hamming cube graph is an undirected graph with vertices and edges . The components of a Boolean function are the connected components of the subgraph of the Hamming cube graph induced by the vertex set .
Lemma 36.
If a Boolean function has components then any representation of has at least anchors.
Proof.
Consider some component of and let denote the vertex boundary of : Vertices in with a neighbor in . Note that .
Suppose has representation and let be the nearest anchor to some . Assume for contradiction that . Note that is equal to the length of the shortest path from to in the Hamming cube graph, which by assumption must contain some . (In particular, .) Thus, there must exist some negative anchor with . By the triangle inequality,
which contradicts the minimality of . Thus, each component contains an anchor.
Using the previous results, another separation between and follows from the existence of a CNF (over -variables) with clauses and exponentially (in ) many components. (See Appendix A.)
Theorem 37.
For any , there exists a -CNF over variables with clauses for which any representation has anchors.
5.2 A new lower bound for majority
We now discuss the disparity between the complexity of the majority function in [17, Theorem 4]: In particular, when is even, the best upper bound is anchors, whereas anchors suffices when is odd. Note that if ties were allowed (won by positive anchors) in Definition 3, then and would suffice as an representation for for all .
Theorem 38.
For even , any representation of requires anchors.
Proof.
Suppose is an representation of for even . We claim that for each satisfying , there is a positive anchor with in coordinate-wise order:
It follows from [17] that the nearest anchor to satisfies . Indeed, for some it holds that , so suppose for contradiction that . Then, construct and let be the nearest anchor to . This yields , contradicting the fact that
(5) |
A similar argument shows that . Hence, , and (5) becomes which implies that , proving the claim.
For contradiction, assume that . Since there must be at least one negative anchor, we have . Then, we can construct with for which there is no positive anchor with , leading to a contradiction: For each , arbitrarily select some where and set , ensuring . After this process, . Arbitrarily fixing more coordinates of to so that completes the construction.
6 Conclusion
We have studied nearest neighbor representations of Boolean functions, proving new lower and upper bounds and devising connections to circuit complexity. There are many future questions and research directions:
-
Studying nearest neighbor complexity with respect to additional discrete domains such as grids as well as more than two labels is an interesting future direction.
-
Circuit complexity has been used to derive new algorithms for nearest neighbor problems [1]. Can ideas about nearest neighbor complexity such as connections to mpPTFs be used to obtain new algorithms for nearest neighbor classification and search?
-
Finally, it remains open whether .
References
- [1] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 136–150. IEEE, 2015. doi:10.1109/FOCS.2015.18.
- [2] Alexandr Andoni. Nearest neighbor search: the old, the new, and the impossible. PhD thesis, Massachusetts Institute of Technology, 2009.
- [3] Richard Beigel and Jun Tarui. On ACC. Comput. Complex., 4:350–366, 1994. doi:10.1007/BF01263423.
- [4] Harry Buhrman, Nikolay Vereshchagin, and Ronald de Wolf. On computation and communication with small bias. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 24–32. IEEE, 2007. doi:10.1109/CCC.2007.18.
- [5] Arkadev Chattopadhyay, Meena Mahajan, Nikhil S. Mande, and Nitin Saurabh. Lower bounds for linear decision lists. Chic. J. Theor. Comput. Sci., 2020, 2020. URL: http://cjtcs.cs.uchicago.edu/articles/2020/1/contents.html.
- [6] Yan Qiu Chen, Mark S Nixon, and Robert I Damper. Implementing the k-nearest neighbour rule via a neural network. In Proceedings of ICNN’95-International Conference on Neural Networks, volume 1, pages 136–140. IEEE, 1995. doi:10.1109/ICNN.1995.488081.
- [7] Kenneth L Clarkson. Nearest neighbor queries in metric spaces. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 609–617, 1997. doi:10.1145/258533.258655.
- [8] Thomas Cover and Peter Hart. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1):21–27, 1967. doi:10.1109/TIT.1967.1053964.
- [9] Yogesh Dahiya, K. Vignesh, Meena Mahajan, and Karteek Sreenivasaiah. Linear threshold functions in decision lists, decision trees, and depth-2 circuits. Inf. Process. Lett., 183:106418, 2024. doi:10.1016/J.IPL.2023.106418.
- [10] Luc Devroye. On the asymptotic probability of error in nonparametric discrimination. The Annals of Statistics, 9(6):1320–1327, 1981.
- [11] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition, volume 31. Springer Science & Business Media, 2013.
- [12] Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks. In Conference on learning theory, pages 907–940. PMLR, 2016. URL: http://proceedings.mlr.press/v49/eldan16.html.
- [13] Jürgen Forster. A linear lower bound on the unbounded error probabilistic communication complexity. Journal of Computer and System Sciences, 65(4):612–625, 2002. doi:10.1016/S0022-0000(02)00019-3.
- [14] Mikael Goldmann, Johan Håstad, and Alexander Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992. doi:10.1007/BF01200426.
- [15] Mikael Goldmann and Marek Karpinski. Simulating threshold circuits by majority circuits. SIAM Journal on Computing, 27(1):230–246, 1998. doi:10.1137/S0097539794274519.
- [16] András Hajnal, Wolfgang Maass, Pavel Pudlák, Mario Szegedy, and György Turán. Threshold circuits of bounded depth. Journal of Computer and System Sciences, 46(2):129–154, 1993. doi:10.1016/0022-0000(93)90001-D.
- [17] Péter Hajnal, Zhihao Liu, and György Turán. Nearest neighbor representations of boolean functions. Information and Computation, 285:104879, 2022. doi:10.1016/J.IC.2022.104879.
- [18] Kristoffer Arnsfelt Hansen and Vladimir V Podolskii. Polynomial threshold functions and boolean threshold circuits. Information and Computation, 240:56–73, 2015. doi:10.1016/J.IC.2014.09.008.
- [19] Lisa Hellerstein and Rocco A Servedio. On PAC learning algorithms for rich boolean function classes. Theoretical Computer Science, 384(1):66–76, 2007. doi:10.1016/J.TCS.2007.05.018.
- [20] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613, 1998. doi:10.1145/276698.276876.
- [21] Piotr Indyk and Tal Wagner. Approximate nearest neighbors in limited space. In Conference On Learning Theory, pages 2012–2036. PMLR, 2018. URL: http://proceedings.mlr.press/v75/indyk18a.html.
- [22] Jeffrey C Jackson, Adam R Klivans, and Rocco A Servedio. Learnability beyond AC0. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pages 776–784, 2002.
- [23] Stasys Jukna. Boolean function complexity: advances and frontiers, volume 5. Springer, 2012. doi:10.1007/978-3-642-24508-4.
- [24] Kordag Mehmet Kilic, Jin Sima, and Jehoshua Bruck. On the information capacity of nearest neighbor representations. In 2023 IEEE International Symposium on Information Theory (ISIT), pages 1663–1668, 2023. doi:10.1109/ISIT54713.2023.10206832.
- [25] Kordag Mehmet Kilic, Jin Sima, and Jehoshua Bruck. Nearest neighbor representations of neurons. In 2024 IEEE International Symposium on Information Theory (ISIT), 2024.
- [26] Adam R Klivans and Rocco A Servedio. Learning DNF in time . Journal of Computer and System Sciences, 2(68):303–318, 2004.
- [27] Adam R Klivans and Rocco A Servedio. Toward attribute efficient learning of decision lists and parities. Journal of Machine Learning Research, 7(4), 2006. URL: https://jmlr.org/papers/v7/klivans06a.html.
- [28] Eyal Kushilevitz, Rafail Ostrovsky, and Yuval Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 614–623, 1998. doi:10.1145/276698.276877.
- [29] Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, fourier transform, and learnability. Journal of the ACM (JACM), 40(3):607–620, 1993. doi:10.1145/174130.174138.
- [30] James Martens, Arkadev Chattopadhya, Toni Pitassi, and Richard Zemel. On the representational efficiency of restricted boltzmann machines. Advances in Neural Information Processing Systems, 26, 2013.
- [31] O Murphy. Nearest neighbor pattern classification perceptrons. Neural Networks: Theoretical Foundations and Analysis, pages 263–266, 1992.
- [32] Edward A Patrick and Frederic P Fischer III. A generalized k-nearest neighbor rule. Information and control, 16(2):128–152, 1970. doi:10.1016/S0019-9958(70)90081-1.
- [33] Alexander A Razborov. On the distributional complexity of disjointness. In International Colloquium on Automata, Languages, and Programming, pages 249–253. Springer, 1990. doi:10.1007/BFB0032036.
- [34] Alexander A Razborov. On small depth threshold circuits. In Scandinavian Workshop on Algorithm Theory, pages 42–52. Springer, 1992. doi:10.1007/3-540-55706-7_4.
- [35] Alexander A Razborov and Alexander A Sherstov. The sign-rank of AC0. SIAM Journal on Computing, 39(5):1833–1855, 2010.
- [36] Ronald L Rivest. Learning decision lists. Machine learning, 2:229–246, 1987. doi:10.1007/BF00058680.
- [37] Michael E. Saks. Slicing the hypercube, pages 211–256. London Mathematical Society Lecture Note Series. Cambridge University Press, 1993.
- [38] Matus Telgarsky. Benefits of depth in neural networks. In Conference on learning theory, pages 1517–1539. PMLR, 2016. URL: http://proceedings.mlr.press/v49/telgarsky16.html.
- [39] Gal Vardi, Daniel Reichman, Toniann Pitassi, and Ohad Shamir. Size and depth separation in approximating benign functions with neural networks. In Conference on Learning Theory, pages 4195–4223. PMLR, 2021. URL: http://proceedings.mlr.press/v134/vardi21a.html.
- [40] Nikhil Vyas and R. Ryan Williams. Lower bounds against sparse symmetric functions of ACC circuits: Expanding the reach of #SAT algorithms. Theory Comput. Syst., 67(1):149–177, 2023. doi:10.1007/S00224-022-10106-8.
- [41] R. Ryan Williams. Limits on representing boolean functions by linear combinations of simple functions: Thresholds, relus, and low-degree polynomials. In 33rd Computational Complexity Conference (CCC 2018). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018.
Appendix A Omitted proofs
A.1 Proof of Theorem 14
We break the proof of this theorem into two separate lemmas.
Lemma 39.
More precisely, any representation with anchors is equivalent to an with terms, and any representation with anchors in dimensions is equivalent to an with terms and maximum weight .
Proof.
The distance from to an anchor is a linear form in variables :
We can observe that representations essentially compute , which is an . Subfunctions merely multiply coefficients and add constants to each linear form – For example, .
In the case of , we have for all anchors that and is a linear form with coefficients and positive constants bounded (in absolute value) by . As a result, the weights in are bounded by as well.
Lemma 40.
More precisely, any with terms has an representation with anchors in dimensions. Any with terms and maximum weight has an representation with anchors in dimensions.
Proof.
We start with the case. Let be an arbitrary . First make some pre-processing steps. First, multiply each linear form by and add one to the right-hand side, so that ties are won by the left-hand side. Second, we would like to make all coefficients positive. For this, while there exists a negative term (or constant ), just add (or ) to every linear form until all negative terms are eliminated. No coefficient (or constant) will increase by more than . Third, we make all coefficients even by multiplying all linear forms by two. Finally, we add the same constant (to be decided later) to all linear forms. Then, every linear form is equal to , for positive, even constants .
Define block sizes by (i.e., the maximum coefficient of in any linear form). Also define and let . Inputs are mapped () to query points and linear forms are mapped to anchors such that . In particular,
and
where will be chosen momentarily. (Let and .) The distance between and is equal to
Now let so that . This is valid (i.e., is a non-negative integer) if we choose a large enough value for : The minimal value of such that for all is
Thus, for , we may always choose . Observe that by duplicating each at most times and introducing at most constant variables. Thus, the original is equivalent to a subfunction of an representation with anchors at most dimensions.
For the case, the same method applies, only now we do not need to increase the dimension that much. All coefficients can be realized by choosing anchors and all constants can be corrected using one additional dimension.
From this we can also deduce the following:
Theorem 41.
Any function with an -anchor representation with bit-complexity is equivalent to an with terms. Any function of inputs with an representation with terms is equivalent to a subfunction of a function of inputs with an -anchor representation with bit-complexity .
Proof.
A.2 Proof of Corollary 15
Proof.
A.3 Proof of Corollary 16
Proof.
A.4 Proof of Lemma 18
Proof.
Consider a function and let be its representation. We can assume that all possible values of all linear forms are distinct. For this it is enough to multiply all forms by and to add to each form it’s own unique remainder modulo .
Observe that all linear forms obtain only polynomially many variables (since there output is polynomially bounded in absolute value). Denote possible values of the form by for some polynomially bounded in . Note that, for different linear forms, the number of the values obtained might be not the same. To simplify the notation we assume that we add several equal values to the list to make them all of equal size .
Now we are ready to produce the decision list. Let and . We consider each in increasing order and query if . If so, we output . If not, we proceed to the next .
This decision list computes since we are just looking for the minimal value of a linear form among all possible values of the forms.
A.5 Consequences of Theorem 22
Corollary 42.
Any Boolean function with a representation with anchors has an representation with anchors. (Similarly, Boolean function with a representation with anchors has an representation with anchors.)
Remark 43.
Theorem 22 and Corollary 42 can be extended to non-Boolean inputs. More precisely, the same statements are true over any finite domain . For this we can express (squared) distances to anchors as quadratic forms, for each subset of distances of size consider the average of these distances and represent them as a distance to a new anchor. We still need to add an extra dimension to absorb constant terms.
Remark 44.
Theorem 22 and Corollaries 42 and 23 can be extended to the case of weighted . Indeed, in Theorem 22, instead of sums of linear forms we will have weighted sums. This will require terms in the representation. If the weights in the weighted representation are small and the bit-complexity of anchors is small, this results in a representation and if there are no restrictions of weights and bit-complexity, we get representation. The proof of Corollary 23 still works despite the increase of the number of anchors to .
A.6 Proof of Theorem 25
We first make the following general observation: [32] show that finding the ’th nearest positive anchor and ’th nearest negative anchor and classifying based on which is closest is equivalent to computing a -nearest neighbors representation. This fact can be generalized, considering the closure of .
Lemma 45.
Let and be two sets of numbers and let be the smallest elements of . Then,
where . (As in , we assume exists and is unique).
Proof.
contains a majority of the elements in if and only if . This happens if and only if the ’th smallest element in is smaller than the ’th smallest element in .
We now proceed with the proof of Theorem 25.
Proof.
For the inclusion , consider any function in . It is a subfunction of some function with a representation . As in Lemma 39, the distances between and each anchor are linear forms and which we assume have integer coefficients by the usual finite precision argument. By definition if and only if the set of -nearest neighbors satisfies . By Lemma 45, this happens if and only if , taking . Hence, . As is closed under taking subfunctions, as well.
For the inclusion , assume that has a representation. By adding dummy linear forms we can have . By Lemma 45, the inequality (3) holds if and only if the smallest linear forms consist of more linear forms from the left-hand side than the right. Representing each inequality by an anchor, we obtain a representation of the same function in .
The case of and is analogous.
A.7 Proof of Theorem 26
Proof.
Suppose a Boolean function has a representation satisfying (4) for some function and integer . We will show that . First, we assume that all coefficients in all linear form are integers and ensure that all values of all linear forms are distinct and even. For this, multiply all forms by and shift each form by its own even remainder modulo .
For each , we add one linear form to each side of (3). If , then place the form on the left-hand side and on the right. If , put the on right-hand side and on the left. It is easy to see that the ’th statistics in the left and right-hand sides of the resulting representation are and (not necessarily in that order), where is the ’th statistic of the original representation. Hence, the inequality in (3) holds if and only if .
For the other direction, assume we have a function given by (3). We again assume that all coefficients are integers and all values of all linear forms are distinct. Now we construct the required representation of . For each form we add to the representation the forms for all , and for each form we add to the representation the forms for all . (That is, we have copies of each form and copies of each form ). To each , we assign the label if , and if . Finally, we set .
Now, observe that the inequality (3) holds if and only if, among the smallest forms, there are at least forms . Assume that there are precisely forms and forms . In particular, . Then, in the new representation, these linear forms give us
smallest forms. By construction, the next smallest forms are either or for some . Thus, the ’th smallest form is either or and its label is if and only if as desired.
A.8 Proof of Theorem 27
Proof.
Suppose we are given a function and a circuit computing it. We are going to construct a representation of in the form given by Theorem 26.
We can assume that all gates in the circuit have the same threshold . For this we can just add dummy variables and fix them to constants. Denote the linear forms for gates by (all weights are integers) and denote by the symmetric function at the top of the circuit. Here, is the size of the circuit. Now, construct a representation with the following linear forms:
(6) |
That is, we multiply each linear form by and add constant linear forms with values . We let .
It is easy to see that the ’th statistic of (6) is always one of the constant linear forms. It is the form if and only if of the linear forms among are positive. We assign label to the form if and only if for inputs of weight . As a result, we get the desired representation for and show that .
A.9 Proof of Theorem 28
Proof.
First, as a warm-up, we show that . Recall that . Denote by an -dimensional vector with in each coordinate. Note that for all .
For each introduce two anchors and . If for some we have , then
If, on the other hand, or , then
For each and and introduce an anchor . For with it is not hard to see that
and for all . The situation is symmetric for . We assign label to the anchor . We assign label to the anchor iff is odd. We let .
It is easy to see that for a given among the closest anchors we have all pairs of anchors for all such that . Denote the number of such by . Also among the closest anchors we will have pairs of anchors for an appropriate and for . In each of these pairs the labels of anchors are opposite and they cancel out when we compute the majority. Finally, one last anchor we will have among the closest anchors is . The label of this anchor determines the majority among the closest anchors and it is 1 iff is odd. As a result, we get the desired representation for with anchors.
Now we extend this argument to . Consider a function , where each has the form for some disjoint . For each we let be a couple of parameters to be fixed later. We introduce a pair of anchors in the following way: Set the th coordinate of to
It is easy to see that for such that we have and for such that we have We fix in such a way that and . We set .
We construct anchors for and the same way as above and assign to be equal to for of weight and to be the opposite. We let . The same argument as for shows that we get the desired representation of with anchors.
A.10 Proof of Theorem 30
Proof.
Consider a function and suppose the linear forms in its representation are . Here corresponds to the ’th query. As in the proof of Theorem 27, we can assume that all thresholds in all linear forms are 0.
We are going to construct a representation for of the form provided by Theorem 26. We add to this representation the following linear forms:
That is, for each form in representation, we add the two forms and . We set .
Assume that for some we have and all previous linear forms are non-zero. We than have that . It is not hard to see that for we have that among forms and exactly one is greater than : it is the first one if and the second one if . For in a similar way we can see that among the forms and exactly one is greater than : it is the first one if and the second one if . As a result there are exactly forms that are greater than . We assign to this form the same label has in . From this it follows that the constructed representation computes the same function.
Clearly, the coefficients in the constructed form are polynomially related to the coefficients in the original forms. Thus, the same proof gives .
Remark 47.
Note that decision lists are computable in and thus can be computed by quasi-polynomial-size circuits. As a result, can be computed by quasi-polynomial-size circuit in , where the second equality follows since is closed under operation. Still, Theorem 30 gives a polynomial reduction that translates to the case of small coefficients.
A.11 Proof of Theorem 37
Such constructions are likely known; we outline a simple one for completeness.
Lemma 48.
For any even integer , there exists a CNF with variables and clauses with components.
Proof.
Assume divides . Divide the set of variables to disjoint sets of size . For each set , define a CNF which evaluates to if and only if exactly half of the variables in are equal to . This can be achieved with clauses.
Then, the CNF has exactly satisfying assignments, and the Hamming distance between any two of such assignments is at least . Thus, each of them constitutes a component.
Appendix B Circuits computing nearest neighbors
In this section we describe a straightforward construction of a depth-three circuit computing and then compress it to depth-two at the cost of exponential weights. The folklore result of [31] is that any representation with anchors can be computed by a depth three threshold circuit with size . A short proof can be found in [24].
Theorem 49 ([31]).
Namely, every () representation is computed by a depth-three circuit with size .
Note that the only difference between the circuits for and is that the first-level threshold gates are guaranteed to have polynomial weights (in the case of ). It turns out that the size of the circuit can be improved (when ).
Lemma 50.
In particular, every representation with anchors is computed by an circuit with size .
Proof.
Note that is computed by a threshold gate defined by and . (And similarly .) Suppose has an representation . Then,
Note that the threshold circuits from Theorem 49 and Lemma 50 have size and respectively. In fact, the latter circuit can be compressed to a depth-two threshold circuit with exponential weights.
Theorem 51.
Namely, every representation with anchors is computed by a threshold of majority gates.
Proof.
The first level will consist of gates , which output if and only if and , respectively, for . Define the sum
and note that . We can then write the output gate as
If some positive anchor is at distance at most and all negative anchors are at distance at least to , then
Conversely, if some negative anchor is at distance at most and all positive anchors are at distance at least , then
Remark 52.
Lemma 53.
Every with terms and maximum weight is computed by a linear threshold of at most majority gates.
Proof.
Let refer to Boolean functions (over ) equal to the sign of an -variate polynomial. [18] prove that any with terms and degree at most is computed by a linear threshold (with exponential weights) of at most majority gates (replacing with ), and any with terms and maximum weight can be represented by a with terms and degree at most .
Remark 54.
It is not hard to see that the circuits constructed in this section are polynomial-time uniform; they can be generated by a Turing machine given the set of anchors in polynomial time.