The List-Decoding Size of Fourier-Sparse Boolean Functions

A function defined on the Boolean hypercube is $k$-Fourier-sparse if it has at most $k$ nonzero Fourier coefficients. For a function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ and parameters $k$ and $d$, we prove a strong upper bound on the number of $k$-Fourier-sparse Boolean functions that disagree with $f$ on at most $d$ inputs. Our bound implies that the number of uniform and independent random samples needed for learning the class of $k$-Fourier-sparse Boolean functions on $n$ variables exactly is at most $O(n \cdot k \log k)$. As an application, we prove an upper bound on the query complexity of testing Booleanity of Fourier-sparse functions. Our bound is tight up to a logarithmic factor and quadratically improves on a result due to Gur and Tamuz (Chicago J. Theor. Comput. Sci., 2013).


Introduction
Functions defined on the Boolean hypercube $\{0,1\}^n = \mathbb{F}_2^n$ are fundamental objects in theoretical computer science. It is well known that every such function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ can be represented as a linear combination of the $2^n$ functions $\{\chi_S\}_{S \subseteq [n]}$ defined by $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$. This representation is known as the Fourier expansion of the function $f$, and the numbers $\hat{f}(S)$ are known as its Fourier coefficients. The Fourier expansion of functions plays a central role in analysis of Boolean functions and finds applications in numerous areas of theoretical computer science including learning theory, property testing, hardness of approximation, social choice theory, and cryptography. For an in-depth introduction to the topic the reader is referred to the book of O'Donnell [22].
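As a small illustration of the Fourier expansion, the following sketch computes all Fourier coefficients of a function on a few variables by brute force. It assumes the standard normalization $\hat{f}(S) = 2^{-n} \sum_x f(x) \chi_S(x)$, which is not spelled out in the text above; the helper names (`chi`, `fourier_coefficients`, `fourier_sparsity`) are illustrative only.

```python
from itertools import product

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S); S is a tuple of indices."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)}, assuming f_hat(S) = 2^-n * sum_x f(x) * chi_S(x)."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for mask in points:
        S = tuple(i for i in range(n) if mask[i])
        coeffs[S] = sum(f(x) * chi(S, x) for x in points) / 2 ** n
    return coeffs

def fourier_sparsity(f, n, tol=1e-9):
    """Number of nonzero Fourier coefficients of f."""
    return sum(1 for c in fourier_coefficients(f, n).values() if abs(c) > tol)

# Example: AND of two bits, viewed as a {0,1}-valued function on F_2^2.
and2 = lambda x: x[0] & x[1]
coeffs = fourier_coefficients(and2, 2)
print(coeffs)
print(fourier_sparsity(and2, 2))
```

The brute-force loop takes time $4^n$, so it is only meant for checking small examples by hand.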
A classical result in learning theory is a general algorithm due to Kushilevitz and Mansour [19], based on results of Linial, Mansour, and Nisan [20] and Goldreich and Levin [12], which makes it possible to efficiently learn classes of Boolean functions with a "simple" Fourier expansion. A common ...
Theorem 1.1. For every function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, the number of $k$-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(n d k \log k / 2^n)}$.
We observe that for certain choices of $k$ and $d$ the bound given in Theorem 1.1 is tight. For example, let $f$ be the constant zero function, let $k < 2^{0.9n}$ be a power of 2, and take $d = 2^n/k$. Consider all the indicator functions of linear subspaces of $\mathbb{F}_2^n$ of co-dimension $\log_2 k$. Every such function is of distance $d$ from $f$ and is $k$-Fourier-sparse (see Claim 2.4). The number of such functions is $2^{\Theta(n \log k)} = 2^{\Theta(n d k \log k / 2^n)}$.

Learning from samples.
As an application of the list-decoding bound, we next consider the problem of learning the class of k-Fourier-sparse Boolean functions on n variables (exactly) from uniform and independent random samples (see, e.g., [2,18] for related work). Let us note already at the outset that all the results mentioned here are not efficient: it is not known if there is an algorithm for the problem whose running time is some fixed polynomial in n times an arbitrary function of k. Among other things, such an algorithm would imply a breakthrough on the longstanding open question of learning juntas from samples [7,21,25,18].
The question of recovering a function that is sparse in the Fourier (or other) basis from a few samples is the central question in the area of sparse recovery. It has been intensely investigated for over a decade and, among other things, has applications for compressed sensing and for the data stream model. The best previously known bounds on our question are $O(n \cdot k \log^3 k) \leq O(n^4 \cdot k)$ due to Cheraghchi, Guruswami, and Velingker [11] and $O(n^2 \cdot k \log k) \leq O(n^3 \cdot k)$ due to Bourgain [8], improving on a previous bound of Rudelson and Vershynin [23] (who themselves improved on the work of Candès and Tao [10]). We note in passing that they actually answer a harder question, for three reasons. First, they handle all functions, not necessarily Boolean-valued. Second, they show that a randomly chosen set of sample locations of the above cardinality is good with high probability simultaneously for all $k$-Fourier-sparse functions (sometimes known as the "deterministic" setting), whereas we only want a random set of sample locations to be good with high probability for any fixed $k$-Fourier-sparse function (the "randomized" setting). Finally, they obtain the recovery result by proving a "restricted isometry property" of the Fourier matrix, which among other things implies a recovery algorithm running in time polynomial in $2^n$ and $k$.
Using Theorem 1.1, we improve the upper bound on the sample complexity of learning Fourier-sparse Boolean functions.

Corollary 1.2. The number of uniform and independent random samples required for learning the class of k-Fourier-sparse Boolean functions on n variables is O(n · k log k).
We believe that our better bound and its elementary proof shed more light on the problem and might be useful elsewhere. In fact, in a follow-up work [15] we employ the techniques developed here to study the "restricted isometry property" of random submatrices of Fourier (and other) matrices, improving on the aforementioned works [11,8]. We finally note that a lower bound of $\Omega(k \cdot (n - \log_2 k))$ on the sample complexity can be obtained by considering the problem of learning indicator functions of affine subspaces of $\mathbb{F}_2^n$ of co-dimension $\log_2 k$ (see Theorem 3.7; see, e.g., [3] for the same lower bound in a different setting).
Testing Booleanity.
We next consider the problem of testing Booleanity of Fourier-sparse functions, which was introduced and studied by Gur and Tamuz in [14]. In this problem, given access to a $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, one has to decide if $f$ is Boolean, i.e., its image is contained in $\{0,1\}$, or not. The objective is to distinguish between the two cases with some constant probability using as few queries to $f$ as possible. It was shown in [14] that there exists a (non-adaptive one-sided error) tester for the problem with query complexity $O(k^2)$, and that every tester for the problem has query complexity $\Omega(k)$. Here, we use our result on learning $k$-Fourier-sparse Boolean functions to improve the upper bound of [14] and prove the following.
Theorem 1.3. For every $k$ there exists a non-adaptive one-sided error tester that, using $O(k \cdot \log^2 k)$ queries to an input $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, decides if $f$ is Boolean or not with constant success probability.
We note that, while the tester established in Theorem 1.3 has an improved query complexity, it is not clear if it is efficient with respect to running time. It can be shown, though, that using the learning algorithm of Fourier-sparse functions that follows from [8,15] (instead of Corollary 1.2) in our proof of Theorem 1.3, one can obtain an efficient algorithm (running in time polynomial in $n$ and $k$) with the slightly worse query complexity of $O(k \cdot \log^3 k)$.
Finally, we complement Theorem 1.3 by the following nearly matching lower bound.

Theorem 1.4. Every non-adaptive one-sided error tester for Booleanity of $k$-Fourier-sparse functions has query complexity $\Omega(k \cdot \log k)$.

The List-Decoding Size of Fourier-Sparse Boolean Functions
In order to prove Theorem 1.1, we have to bound from above the number of $k$-Fourier-sparse Boolean functions of distance at most $d$ from a general function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$. In the discussion below, let us consider the special case where $f$ is the constant zero function. The general result follows easily.
Here, we have to bound the number of $k$-Fourier-sparse Boolean functions $g: \mathbb{F}_2^n \rightarrow \{0,1\}$ of support size at most $d$. We start by observing, using Parseval's theorem, that such functions have small spectral norm $\|\hat{g}\|_1 = \sum_{S \subseteq [n]} |\hat{g}(S)|$. Next, we observe that the Fourier expansion of the normalized function $g/\|\hat{g}\|_1$ is a convex combination of functions $\pm\chi_S$, and thus can be viewed, following a technique of Bruck and Smolensky [9], as an expectation over a distribution on the $S$'s. Using the Chernoff-Hoeffding bound and the bound on the spectral norm, we obtain a succinct representation for every such function $g$. The ability to represent these functions by a binary string of bounded length yields the upper bound on their number. We note that the proof approach somewhat resembles that of the upper bound on the list-decoding size of Reed-Muller codes due to Kaufman, Lovett, and Porat [17].

Learning Fourier-Sparse Boolean Functions
As a warmup, let us mention an easy upper bound of $O(n \cdot k^2)$. This follows by recalling that there are at most $2^{O(nk)}$ $k$-Fourier-sparse Boolean functions, and that each one differs from any other $k$-Fourier-sparse function on at least a $1/(2k)$ fraction of the inputs (see Claim 2.2). Hence by the union bound, after $O(n \cdot k^2)$ samples all other functions will be eliminated.
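The warmup calculation can be written out as follows. This is a sketch under stated assumptions: $c$ denotes the unspecified constant in the $2^{O(nk)}$ count, and the disagreement fraction is taken to be $1/(2k)$ as given by Claim 2.2 (the same as $1/k$ up to a constant factor).

```latex
\Pr\bigl[\exists\, g \neq f \text{ consistent with all } q \text{ samples}\bigr]
  \;\le\; 2^{cnk} \cdot \Bigl(1 - \tfrac{1}{2k}\Bigr)^{q}
  \;\le\; 2^{cnk} \cdot e^{-q/(2k)}
  \;\le\; 2^{-nk}
  \qquad \text{whenever } q \;\ge\; 2(c+1)(\ln 2) \cdot n k^{2}.
```

The last step uses $e^{-q/(2k)} \le 2^{-(c+1)nk}$ for such $q$, so $q = O(n \cdot k^2)$ samples suffice.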
The improved bound in Corollary 1.2 follows similarly using the list-decoding result of Theorem 1.1. Namely, we apply the union bound separately on functions of different distances from the input function. Functions that are nearby are harder to "hit" using random samples, but by the theorem, there are few of them; functions that are further away are in abundance, but they are easier to "hit" using random samples.

Testing Booleanity of Fourier-Sparse Functions
The testing Booleanity problem is somewhat different from typical property testing problems. Indeed, in property testing one usually has to distinguish objects that satisfy a certain property from those that are $\varepsilon$-far from the property for some distance parameter $\varepsilon > 0$. However, here the tester is required to decide if the function satisfies the Booleanity property or not, with no distance parameter involved. This unusual setting makes sense in this case because Fourier-sparse non-Boolean functions are always quite far from every Boolean function. More precisely, the authors of [14] used the uncertainty principle (see Proposition 2.1) to prove that every $k$-Fourier-sparse non-Boolean function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ is non-Boolean on at least $\Omega(2^n/k^2)$ inputs (see Claim 2.3). This immediately implies a (non-adaptive one-sided error) tester that uses $O(k^2)$ queries: just check that $f$ is Boolean on $O(k^2)$ uniform inputs in $\mathbb{F}_2^n$. The analysis of [14] turns out to be tight, as there are $k$-Fourier-sparse non-Boolean functions that are not Boolean at only $\Theta(2^n/k^2)$ points. Indeed, for an even integer $n$, consider the function which is not Boolean at only one point and has Fourier-sparsity $2 \cdot 2^{n/2}$ (see Claim 2.4).
Upper bound. We prove Theorem 1.3 using our learning result, Corollary 1.2. To do so, we first observe that a restriction of a $k$-Fourier-sparse non-Boolean function to a random subspace of dimension $O(\log k)$ is non-Boolean with high probability (see Lemma 4.1). Since a restriction to a subspace does not increase the Fourier-sparsity, this reduces our problem to testing Booleanity of $k$-Fourier-sparse functions on $n = O(\log k)$ variables. Then, after $O(k \cdot \log^2 k)$ samples from the subspace, if a non-Boolean value was found then we are clearly done. Otherwise, by Corollary 1.2, the samples uniquely specify a Boolean candidate for the restricted function. Such a function must be quite far from every other $k$-Fourier-sparse function (Boolean or not; see Claim 2.2). This enables us to decide if the restricted function equals the Boolean candidate function or not.
Lower bound. The upper bound in Theorem 1.3 gets close to the $\Omega(k)$ lower bound proven by Gur and Tamuz in [14]. For their lower bound, they considered the following two distributions: (a) the uniform distribution over all Boolean $n$-variable functions that depend only on their first $\log_2 k$ variables; (b) the uniform distribution over all $n$-variable functions that depend only on their first $\log_2 k$ variables and return a Boolean value on $k - 1$ of the assignments to the relevant variables and the value 2 otherwise. It can be easily seen that any (possibly adaptive) tester that distinguishes with some constant probability between distributions (a) and (b) has query complexity $\Omega(k)$. Since the first distribution is supported on $k$-Fourier-sparse Boolean functions and the second on $k$-Fourier-sparse non-Boolean functions, this implies that the same lower bound holds for the query complexity of testing Booleanity of $k$-Fourier-sparse functions. Note that the distributions considered above are supported on $\log_2 k$-Fourier-dimensional functions. It can be seen (say, using the uncertainty principle) that such functions are not Boolean on at least a $1/k$ fraction of their inputs, so $O(k)$ random samples suffice for finding a non-Boolean value if one exists. Hence, in order to get beyond the $\Omega(k)$ lower bound, we need to consider $k$-Fourier-sparse functions that are not Boolean at only an $o(1/k)$ fraction of the inputs; our functions will ...
Specifically, we consider the distribution of functions obtained by composing the function $f$ given in (1) with a random invertible affine transformation. This is the class of functions that can be represented as a sum $1_{V_1} + 1_{V_2}$ of two indicators of affine subspaces $V_1, V_2 \subseteq \mathbb{F}_2^n$ of dimension $n/2$, which intersect at exactly one point.
Intuitively, it seems that distinguishing the functions in this class from those where $V_1$ and $V_2$ have empty intersection requires the tester to learn the affine subspaces $V_1$ and $V_2$, a task that requires $\Omega(n \cdot 2^{n/2})$ queries. We prove such a lower bound for non-adaptive one-sided error testers. Since the above functions are $k$-Fourier-sparse for $k = O(2^{n/2})$, the obtained lower bound is $\Omega(k \cdot \log k)$.

Fourier Expansion
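Thus, every function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ can be uniquely represented as a linear combination $f = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S$ of this basis. This representation is called the Fourier expansion of $f$, and the numbers $\hat{f}(S)$ are referred to as its Fourier coefficients. The support of $f$ is defined by $\mathrm{supp}(f) = \{x \in \mathbb{F}_2^n \mid f(x) \neq 0\}$ and the support of $\hat{f}$, known as the Fourier spectrum of $f$, by $\mathrm{supp}(\hat{f}) = \{S \subseteq [n] \mid \hat{f}(S) \neq 0\}$. The uncertainty principle says that there is no nonzero function $f$ for which the supports of both $f$ and $\hat{f}$ are small (see, e.g., [22, Exercise 3.15]). We state it below with two simple consequences.
Proposition 2.1 (Uncertainty principle). For every nonzero function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, $|\mathrm{supp}(f)| \cdot |\mathrm{supp}(\hat{f})| \geq 2^n$.
Claim 2.2. Every two distinct $k$-Fourier-sparse functions $f, g: \mathbb{F}_2^n \rightarrow \mathbb{R}$ differ on at least $2^n/(2k)$ inputs.
Proof: Apply Proposition 2.1 to the function $f - g$, whose Fourier-sparsity is at most $2k$.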
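The uncertainty principle can be sanity-checked exhaustively on a tiny domain. The sketch below is only an illustration, not part of the paper's argument: it enumerates every nonzero function from $\mathbb{F}_2^3$ to $\{-1,0,1\}$ and computes the smallest value of (support size) times (spectrum size), using an unnormalized Walsh-Hadamard transform so that all arithmetic stays in exact integers. The function name `walsh_hadamard` is illustrative.

```python
from itertools import product

def walsh_hadamard(values):
    """Unnormalized transform: out[S] = sum_x values[x] * (-1)^<S,x>,
    with x and S encoded as integer bitmasks (list indices)."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

n = 3
# Minimum of |supp(f)| * |supp(f_hat)| over all nonzero f: F_2^3 -> {-1, 0, 1}.
min_product = min(
    sum(v != 0 for v in values) * sum(c != 0 for c in walsh_hadamard(values))
    for values in product((-1, 0, 1), repeat=2 ** n)
    if any(values)
)
print(min_product, 2 ** n)
```

The minimum is attained, for example, by the indicator of a single point, whose spectrum is full.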

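Claim 2.3 ([14]). For every $k$-Fourier-sparse non-Boolean function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, the number of inputs $x \in \mathbb{F}_2^n$ with $f(x) \notin \{0,1\}$ is at least $2^n/L$, where $L = (k^2 + k + 2)/2$.
Proof: Apply Proposition 2.1 to the function $f \cdot (f - 1)$, whose Fourier-sparsity is at most $|\{S \triangle T \mid S, T \in \mathrm{supp}(\hat{f})\} \cup \mathrm{supp}(\hat{f})| \leq \binom{k}{2} + 1 + k = (k^2 + k + 2)/2$, where $\triangle$ stands for symmetric difference of sets.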
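We also need the following simple claim.
Claim 2.4. For every affine subspace $V \subseteq \mathbb{F}_2^n$ of co-dimension $k$, the indicator function $1_V$ is $2^k$-Fourier-sparse.
Proof: Since $V$ has co-dimension $k$, there exist $a_1, \ldots, a_k \in \mathbb{F}_2^n$ and $b_1, \ldots, b_k \in \mathbb{F}_2$ such that $V = \{x \in \mathbb{F}_2^n \mid \langle x, a_i \rangle = b_i,\ i = 1, \ldots, k\}$. For every $i$, let $S_i \subseteq [n]$ denote the set whose characteristic vector is $a_i$, and observe that for every $x \in \mathbb{F}_2^n$, $1_V(x) = \prod_{i=1}^{k} \frac{1 + (-1)^{b_i} \chi_{S_i}(x)}{2}$. This representation implies that $1_V$ is $2^k$-Fourier-sparse.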
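The statement that the indicator of an affine subspace of co-dimension $k$ has at most $2^k$ nonzero Fourier coefficients can be checked numerically on a small case. The sketch below uses an arbitrary, hypothetical choice of two independent constraints on $\mathbb{F}_2^4$; the helper names are illustrative.

```python
from itertools import product

def inner(a, x):
    """Inner product <a, x> over F_2 for 0/1 tuples."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def fourier_support(f, n, tol=1e-9):
    """Sets S (as characteristic vectors) with a nonzero Fourier coefficient."""
    points = list(product((0, 1), repeat=n))
    support = []
    for S in points:
        coeff = sum(f(x) * (-1) ** inner(S, x) for x in points) / 2 ** n
        if abs(coeff) > tol:
            support.append(S)
    return support

def subspace_indicator(constraints):
    """Indicator of V = {x : <x, a> = b for every (a, b) in constraints}."""
    return lambda x: int(all(inner(a, x) == b for a, b in constraints))

n = 4
# Two linearly independent constraints, so V has co-dimension 2.
constraints = [((1, 0, 1, 0), 1), ((0, 1, 1, 1), 0)]
one_V = subspace_indicator(constraints)
size_V = sum(one_V(x) for x in product((0, 1), repeat=n))
support = fourier_support(one_V, n)
print(size_V, len(support), support)
```

In this example the Fourier spectrum is exactly the linear span of the two constraint vectors, which has $2^2$ elements.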

The List-Decoding Size of Fourier-Sparse Boolean Functions
We turn to prove Theorem 1.1, which provides an upper bound on the list-decoding size of the code of block length $2^n$ of all $k$-Fourier-sparse Boolean functions on $n$ variables. Equivalently, for a general distance $d$ and a function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ we bound the number of $k$-Fourier-sparse Boolean functions on $n$ variables of distance at most $d$ from $f$.
We start by proving that a function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ with small spectral norm can be well approximated by a linear combination of few functions from $\{\chi_S\}_{S \subseteq [n]}$ with coefficients of equal magnitude. This was essentially proved in [9] and we include the proof here for completeness.

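Proof: Observe that the function $f$ can be represented as follows.
$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \cdot \chi_S(x) = \|\hat{f}\|_1 \cdot \mathbb{E}_{S \sim D}\bigl[\mathrm{sign}(\hat{f}(S)) \cdot \chi_S(x)\bigr],$$
where $D$ is the distribution defined by $D(S) = |\hat{f}(S)|/\|\hat{f}\|_1$. Let $\mathcal{F}$ be a collection of $|\mathcal{F}| = O(\|\hat{f}\|_1^2 \cdot \log(1/\delta)/\varepsilon^2)$ independent random samples from the distribution $D$. For every $x \in \mathbb{F}_2^n$, the Chernoff-Hoeffding bound (Theorem 2.5) implies that with probability at least $1 - \delta$ it holds that
$$\Bigl| f(x) - \frac{\|\hat{f}\|_1}{|\mathcal{F}|} \sum_{S \in \mathcal{F}} a_S \cdot \chi_S(x) \Bigr| \leq \varepsilon, \qquad (2)$$
where $a_S = \mathrm{sign}(\hat{f}(S))$. By linearity of expectation, it follows that there exist $\mathcal{F}$ and signs $(a_S)_{S \in \mathcal{F}}$ for which (2) holds for all but at most a $\delta$ fraction of $x \in \mathbb{F}_2^n$, as required.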
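The sampling argument in the proof can be illustrated with a short simulation. The sketch below is hypothetical and not the paper's construction verbatim: it draws sets from the distribution $D(S) = |\hat{f}(S)|/\|\hat{f}\|_1$, keeps only the sign of each sampled coefficient, and compares the resulting average (scaled by the spectral norm) with $f$. The example function, the sample count of 400, and the names `sample_approximator` and `approx` are assumptions chosen for illustration.

```python
import random
from itertools import product

def chi(S, x):
    """chi_S(x) = (-1)^<S,x>, with S and x given as 0/1 tuples."""
    return -1 if sum(s & xi for s, xi in zip(S, x)) % 2 else 1

def sample_approximator(coeffs, num_samples, rng):
    """coeffs maps S -> nonzero Fourier coefficient. Returns a function x -> estimate of f(x)
    built from num_samples sets drawn (with repetition) from D(S) = |coeffs[S]| / norm1."""
    sets = list(coeffs)
    weights = [abs(coeffs[S]) for S in sets]
    norm1 = sum(weights)  # spectral norm of f
    drawn = rng.choices(sets, weights=weights, k=num_samples)
    signed = [(1 if coeffs[S] > 0 else -1, S) for S in drawn]

    def approx(x):
        return norm1 * sum(a * chi(S, x) for a, S in signed) / num_samples

    return approx

# Example: f is the indicator of {x in F_2^4 : x_0 = x_1 = 0}; its expansion is
# (1 + chi_{0}) * (1 + chi_{1}) / 4, i.e. four coefficients equal to 1/4.
coeffs = {(0, 0, 0, 0): 0.25, (1, 0, 0, 0): 0.25, (0, 1, 0, 0): 0.25, (1, 1, 0, 0): 0.25}
f = lambda x: int(x[0] == 0 and x[1] == 0)
approx = sample_approximator(coeffs, 400, random.Random(0))
max_error = max(abs(f(x) - approx(x)) for x in product((0, 1), repeat=4))
print(max_error)
```

Because $f$ takes integer values, any estimate with error below $1/2$ at a point determines $f$ there by rounding, which is the mechanism used later to turn the approximation into a succinct description.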
We now apply Lemma 3.1 to Fourier-sparse functions in $\mathbb{F}_2^n \rightarrow \{-1, 0, +1\}$ with bounded support size, and then, in Corollary 3.3, derive an upper bound on the number of these functions.
Footnote 2. Repetitions of subsets in the collection $\mathcal{F}$ are allowed.

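Corollary 3.3. The number of $k$-Fourier-sparse functions $f: \mathbb{F}_2^n \rightarrow \{-1, 0, +1\}$ satisfying $|\mathrm{supp}(f)| \leq d$ is $2^{O(n d k \log k / 2^n)}$.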
Proof: For every $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \{-1, 0, +1\}$ satisfying $|\mathrm{supp}(f)| \leq d$, let $\mathcal{F}$ and $(a_S)_{S \in \mathcal{F}}$ be as given by Corollary 3.2 for, say, $\delta = 1/(5k)$. Since the range of $f$ is $\{-1, 0, +1\}$, it follows that the collection $\mathcal{F}$, the signs $(a_S)_{S \in \mathcal{F}}$, and the value of $\|\hat{f}\|_1$ define a function of distance at most $\delta \cdot 2^n$ from $f$. Notice that by Claim 2.2 and our choice of $\delta$, the distance between every two distinct $k$-Fourier-sparse functions is larger than $2\delta \cdot 2^n$. Thus, a function of distance at most $\delta \cdot 2^n$ from $f$ fully defines $f$. This implies that $f$ can be represented by a binary string of length $O(n \cdot d k \log k / 2^n)$, so the total number of such functions is $2^{O(n d k \log k / 2^n)}$.

The bound in Corollary 3.3 implies a bound on the number of Fourier-sparse Boolean functions of bounded distance from a given Boolean function.
Corollary 3.4. For every $k$-Fourier-sparse Boolean function $f: \mathbb{F}_2^n \rightarrow \{0, 1\}$, the number of $k$-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(n d k \log k / 2^n)}$.
Proof: Let $f: \mathbb{F}_2^n \rightarrow \{0, 1\}$ be a $k$-Fourier-sparse Boolean function. Consider the mapping that maps every $k$-Fourier-sparse Boolean function $g: \mathbb{F}_2^n \rightarrow \{0, 1\}$, whose distance from $f$ is at most $d$, to the function $h = f - g$. Observe that $h$ is a $2k$-Fourier-sparse function from $\mathbb{F}_2^n$ to $\{-1, 0, +1\}$ satisfying $|\mathrm{supp}(h)| \leq d$. By Corollary 3.3, the number of such functions $h$ is bounded by $2^{O(n d k \log k / 2^n)}$. Since the above mapping is injective, this bound holds for the number of functions $g$ as well.
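Equipped with Corollary 3.3, we restate and prove Theorem 1.1.
Theorem 1.1. For every function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, the number of $k$-Fourier-sparse Boolean functions of distance at most $d$ from $f$ is $2^{O(n d k \log k / 2^n)}$.
Proof: If there is no $k$-Fourier-sparse Boolean function of distance at most $d$ from $f$, then the bound trivially holds. So assume that such a function $g: \mathbb{F}_2^n \rightarrow \{0, 1\}$ exists. Observe that every $k$-Fourier-sparse Boolean function of distance at most $d$ from $f$ has distance at most $2d$ from $g$. Thus, by Corollary 3.4 applied to $g$, the number of such functions is at most $2^{O(n d k \log k / 2^n)}$.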

The Sample Complexity of Learning Fourier-Sparse Boolean Functions
The sample complexity of learning a class of functions is the minimum number of uniform and independent random samples needed from a function in the class for specifying it with high success probability. Here we consider the class of k-Fourier-sparse Boolean functions on n variables, and show how Theorem 1.1 implies an upper bound on the sample complexity of learning it (Corollary 3.6).
Theorem 3.5. For every $n$, $1 < k \leq 2^n$, and a $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, the following holds. The probability that when sampling $O(n \cdot k \log k)$ uniform and independent random samples from $f$, there exists a $k$-Fourier-sparse Boolean function $g \neq f$ that agrees with $f$ on all the samples is $2^{-\Omega(n \log k)}$.
Proof: Consider $q$ random samples $(x, f(x))$ of $f$, where each $x$ is distributed uniformly and independently in $\mathbb{F}_2^n$. By Claim 2.2, the distance between $f$ and every other $k$-Fourier-sparse function is at least $2^n/(2k)$. For an integer $\ell \in [1, \lfloor \log_2 2k \rfloor]$, consider all the $k$-Fourier-sparse Boolean functions whose distance from $f$ is in $[2^{n-\ell}, 2^{n-\ell+1}]$. By Theorem 1.1, the number of such functions is $2^{O(n k \log k / 2^\ell)}$. The probability that such a function agrees with $q$ random independent samples of $f$ is at most $(1 - 2^{-\ell})^q$. By the union bound, the probability that at least one of these functions agrees with the $q$ samples is at most
$$2^{O(n k \log k / 2^\ell)} \cdot (1 - 2^{-\ell})^q \leq 2^{O(n k \log k / 2^\ell)} \cdot e^{-q/2^\ell} \leq 2^{-\Omega(n \log k)},$$
where the last inequality holds for an appropriate choice of $q = O(n k \log k)$. By applying the union bound over all the values of $\ell$, it follows that with probability $1 - 2^{-\Omega(n \log k)}$ all the $k$-Fourier-sparse Boolean functions (besides $f$) are eliminated, completing the proof.
The following corollary follows immediately from Theorem 3.5 and confirms Corollary 1.2.
Corollary 3.6. For every $n$ and $1 \leq k \leq 2^n$, the number of uniform and independent random samples required for learning the class of $k$-Fourier-sparse Boolean functions on $n$ variables with success probability $1 - 2^{-\Omega(n \log k)}$ is $O(n \cdot k \log k)$.
We end with the following simple lower bound.
Theorem 3.7. For every $n$ and $1 \leq k \leq 2^n$, the number of uniform and independent random samples required for learning the class of $k$-Fourier-sparse Boolean functions on $n$ variables with constant success probability is $\Omega(k \cdot (n - \log_2 k))$.
Proof: Assume without loss of generality that $k$ is a power of 2. Let $A$ be an algorithm for learning the class above with constant success probability $p > 0$ using $q$ uniform and independent random samples. Consider the class $G$ of indicators of affine subspaces of $\mathbb{F}_2^n$ of co-dimension $\log_2 k$ (i.e., affine subspaces of $\mathbb{F}_2^n$ of size $2^n/k$). By Claim 2.4, the functions in $G$ are $k$-Fourier-sparse. Observe that their number satisfies $|G| = 2^{\Theta(n \cdot \min(\log_2 k,\, n - \log_2 k))}$.
By Yao's minimax principle, there exists a deterministic algorithm $A'$ (obtained by fixing the random coins of $A$) that given evaluations of a function, chosen uniformly at random from $G$, on a fixed collection of $q$ points in $\mathbb{F}_2^n$, learns it with success probability $p$. Now, observe that the expected number of 1-evaluations that $A'$ receives is $q/k$. By Markov's inequality, the probability that $A'$ receives at least $2q/(pk)$ 1-evaluations is at most $p/2$. It follows that for at least a $p/2$ fraction of the functions in $G$ the algorithm $A'$ receives at most $2q/(pk)$ 1-evaluations and learns them correctly. Assuming that $pk \geq 2$, the number of possible evaluation sequences on these inputs is at most
$$\sum_{i=0}^{2q/(pk)} \binom{q}{i} \leq \Bigl(\frac{e p k}{2}\Bigr)^{2q/(pk)} \leq 2^{O(q \cdot \log_2 k / k)},$$
where for the first inequality we used the standard inequality $\sum_{i=0}^{t} \binom{q}{i} \leq (qe/t)^t$ which holds for $t \leq q$ (see, e.g., [16, Proposition 1.4]). The above is bounded from below by $|G| \cdot p/2$, implying that $q \geq \Omega(n \cdot \min(\log_2 k,\, n - \log_2 k) \cdot k / \log_2 k) \geq \Omega(k \cdot (n - \log_2 k))$, where the last inequality follows by considering separately the cases of $k \geq 2^{n/2}$ and $k < 2^{n/2}$. In case that $pk < 2$, the number of possible evaluation sequences is at most $2^q$, and the bound follows similarly using the assumption that $p$ is a fixed constant.

Testing Booleanity of Fourier-Sparse Functions
In this section we prove upper and lower bounds on the query complexity of testing Booleanity of Fourier-sparse functions. For a parameter $k$, consider the problem in which given access to a $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ one has to decide if $f$ is Boolean, i.e., $f(x) \in \{0, 1\}$ for every $x \in \mathbb{F}_2^n$, or not, with some constant success probability.

Upper Bound
As mentioned before, Gur and Tamuz proved in [14] that every $k$-Fourier-sparse non-Boolean function $f$ on $n$ variables satisfies $f(x) \notin \{0, 1\}$ for at least $\Omega(2^n/k^2)$ inputs $x \in \mathbb{F}_2^n$ (see Claim 2.3). Thus, querying the input function $f$ on $O(k^2)$ independent and random inputs suffices in order to catch a non-Boolean value of $f$ if such a value exists. The following lemma shows that the $O(k^2)$ random vectors need not be chosen independently. It turns out that a restriction of a $k$-Fourier-sparse non-Boolean function to a random linear subspace of size $O(k^2)$, that is, of dimension $\approx 2 \log_2 k$, is with high probability non-Boolean. Thus, the tester could randomly pick such a subspace and query $f$ on all of its vectors. This decreases the amount of randomness used in the tester of [14] from $O(n k^2)$ to $O(n \log k)$. More importantly for us, this reduces the problem of testing Booleanity of $k$-Fourier-sparse functions on $n$ variables to the case of $k = \Theta(2^{n/2})$.
Lemma 4.1. Let $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ be a $k$-Fourier-sparse non-Boolean function, and denote $L = (k^2 + k + 2)/2$. Then, for every $\delta > 0$, the restriction of $f$ to a uniformly chosen random linear subspace of dimension $r \geq \log_2(L/\delta)$ is also non-Boolean with probability at least $1 - \delta$.
Proof: Let $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ be a $k$-Fourier-sparse non-Boolean function. By Claim 2.3, there are at least $2^n/L$ vectors $x \in \mathbb{F}_2^n$ for which $f(x) \notin \{0, 1\}$. This implies that there exists a set $S$ of at least $\log_2(2^n/L)$ linearly independent vectors in $\mathbb{F}_2^n$ on which $f$ is not Boolean. Consider a linear subspace $V \subseteq \mathbb{F}_2^n$ of dimension $n - 1$ chosen uniformly at random. Since the vectors in $S$ are linearly independent, the probability that no vector in $S$ is in $V$ is $2^{-|S|} \leq L/2^n$. It follows that the restriction $f|_V$ of $f$ to $V$ is a $k$-Fourier-sparse function defined on a linear subspace of dimension $n - 1$, and its probability to be Boolean is at most $L/2^n$. Note that one can think of the domain of $f|_V$ as $\mathbb{F}_2^{n-1}$, because $V$ and $\mathbb{F}_2^{n-1}$ are isomorphic and a composition with an invertible linear transformation does not affect the Fourier-sparsity. Now, let us repeat the above process $n - r - 1$ additional times, until we get a linear subspace of dimension $r$. The probability that the function becomes Boolean in one of the steps is at most
$$\sum_{i=r+1}^{n} \frac{L}{2^i} \leq \frac{L}{2^r} \leq \delta,$$
and we are done.
We now restate and prove Theorem 1.3, which gives an upper bound of $O(k \cdot \log^2 k)$ on the query complexity of testing Booleanity of $k$-Fourier-sparse functions. In the proof, we first apply Lemma 4.1 to restrict the input function to a subspace of dimension $O(\log k)$. Then, we apply Theorem 3.5 in an attempt to learn the restricted function and check if it is consistent with some $k$-Fourier-sparse Boolean function.
Theorem 1.3. For every $k$ there exists a non-adaptive one-sided error tester that, using $O(k \cdot \log^2 k)$ queries to an input $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$, decides if $f$ is Boolean or not with constant success probability.
Proof: Consider the tester that given access to an input $k$-Fourier-sparse function $f: \mathbb{F}_2^n \rightarrow \mathbb{R}$ acts as follows:
1. Pick uniformly at random a linear subspace $V$ of $\mathbb{F}_2^n$ of dimension $r = \min(n, \lceil \log_2(100L) \rceil)$, where $L = (k^2 + k + 2)/2$, and let $T$ be an invertible linear transformation mapping $\mathbb{F}_2^r$ to $V$.
2. Query $f$ on $O(r \cdot k \log k)$ random vectors chosen uniformly and independently from the subspace $V$. Note that these queries can be seen as uniform and independent random samples from the function $g: \mathbb{F}_2^r \rightarrow \mathbb{R}$ defined as $g = f \circ T$.
3. If there exists a $k$-Fourier-sparse Boolean function on $r$ variables that agrees with the above samples of $g$ then accept, and otherwise reject.
We turn to prove the correctness of the above tester. If f is a k-Fourier-sparse Boolean function then so is g, because a restriction to a subspace and a composition with a linear transformation leave the function k-Fourier-sparse and Boolean. Hence, in this case the tester accepts with probability 1.
On the other hand, if $f$ is a $k$-Fourier-sparse non-Boolean function, then by Lemma 4.1 the restriction of $f$ to the random subspace $V$ of dimension $r$ picked in Item 1, as well as the function $g$ defined in Item 2, are also non-Boolean with probability at least 0.99. In this case, by Theorem 3.5, the probability that there is a $k$-Fourier-sparse Boolean function on $r$ variables that agrees with $O(r \cdot k \log k)$ uniform and independent random samples from $g$ is $2^{-\Omega(r \log k)}$, thus the tester correctly rejects with probability at least, say, 0.9, as required. Finally, observe that the number of queries made by the tester is $O(r \cdot k \log k) = O(k \cdot \log^2 k)$.
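The first two items of the tester can be sketched in code. This is a partial, hypothetical illustration only: vectors of $\mathbb{F}_2^n$ are encoded as integer bitmasks, the subspace is chosen by rejection-sampling a basis, and Item 3 (searching for a consistent $k$-Fourier-sparse Boolean function) is not implemented. The final helper checks only the obvious necessary condition that no sampled value lies outside $\{0, 1\}$. All function names are illustrative.

```python
import random

def rank_gf2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    basis = {}  # leading bit position -> reduced vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

def random_subspace_basis(n, r, rng):
    """Rejection-sample r linearly independent vectors of F_2^n (requires 1 <= r <= n)."""
    basis = []
    while len(basis) < r:
        v = rng.getrandbits(n)
        if rank_gf2(basis + [v]) == len(basis) + 1:
            basis.append(v)
    return basis

def apply_T(basis, y):
    """Linear map T: F_2^r -> V sending y to the sum of the basis vectors selected by y."""
    x = 0
    for i, b in enumerate(basis):
        if (y >> i) & 1:
            x ^= b
    return x

def sample_restriction(f, basis, num_queries, rng):
    """Uniform, independent samples (y, g(y)) of g = f o T."""
    r = len(basis)
    return [(y, f(apply_T(basis, y))) for y in (rng.getrandbits(r) for _ in range(num_queries))]

def has_non_boolean_value(samples):
    """Necessary condition for rejection only; this does not implement Item 3."""
    return any(value not in (0, 1) for _, value in samples)

rng = random.Random(1)
basis = random_subspace_basis(6, 3, rng)
samples = sample_restriction(lambda x: bin(x).count("1") % 2, basis, 10, rng)
print(basis, has_non_boolean_value(samples))
```

Choosing each basis vector uniformly among those independent of the previous ones yields a uniformly random ordered basis, and hence a uniformly random subspace of the given dimension.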

Lower Bound
We turn to restate and prove our lower bound on the query complexity of testing Booleanity of $k$-Fourier-sparse functions.
Theorem 1.4. Every non-adaptive one-sided error tester for Booleanity of $k$-Fourier-sparse functions has query complexity $\Omega(k \cdot \log k)$.
Proof: For a given integer $k$, let $n$ be the largest even integer satisfying $k \geq 3 \cdot 2^{n/2}$. Define a distribution $D_{\mathrm{no}}$ over functions in $\mathbb{F}_2^n \rightarrow \{0, 1, 2\}$ as follows. Pick uniformly at random a pair $(V_1, V_2)$ of affine subspaces satisfying $\dim(V_1) = \dim(V_2) = n/2$ and $|V_1 \cap V_2| = 1$, and output the sum of indicators $1_{V_1} + 1_{V_2}$. Notice that, by Claim 2.4, such a function has Fourier-sparsity at most $2 \cdot 2^{n/2} \leq k$. Thus, a function chosen from $D_{\mathrm{no}}$ is $k$-Fourier-sparse and non-Boolean with probability 1.
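A single function of this form can be examined directly for $n = 4$. The sketch below fixes one hypothetical pair of subspaces, $V_1 = \{x : x_1 = x_2 = 0\}$ and $V_2 = \{x : x_3 = x_4 = 0\}$ (0-indexed in the code), rather than sampling a random pair, and checks that the sum of indicators is non-Boolean at exactly one point while having at most $2 \cdot 2^{n/2}$ nonzero Fourier coefficients.

```python
from itertools import product

n = 4
points = list(product((0, 1), repeat=n))

# Two subspaces of dimension n/2 = 2 that meet only at the origin.
in_V1 = lambda x: x[0] == 0 and x[1] == 0
in_V2 = lambda x: x[2] == 0 and x[3] == 0

# f = 1_{V1} + 1_{V2}, stored as a table of values in {0, 1, 2}.
f = {x: int(in_V1(x)) + int(in_V2(x)) for x in points}
non_boolean_points = [x for x in points if f[x] not in (0, 1)]

def scaled_coefficient(S):
    """2^n times the Fourier coefficient of f at S (S as a characteristic vector)."""
    return sum(f[x] * (-1) ** sum(s & xi for s, xi in zip(S, x)) for x in points)

spectrum = [S for S in points if scaled_coefficient(S) != 0]
print(non_boolean_points, len(spectrum), 2 * 2 ** (n // 2))
```

Here the two indicators share the coefficient of the empty set, so the sparsity falls one short of the $2 \cdot 2^{n/2}$ bound.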
Let $T$ be a non-adaptive one-sided error randomized tester for Booleanity of $k$-Fourier-sparse functions with query complexity $q$ and success probability at least 2/3. By Yao's minimax principle, there exists a deterministic tester $T'$ (obtained by fixing the random coins of $T$) that rejects a random function chosen from $D_{\mathrm{no}}$ with probability at least 2/3. Since $T$ is non-adaptive and has one-sided error, it follows that $T'$ queries an input function on $q$ fixed vectors $a_1, \ldots, a_q \in \mathbb{F}_2^n$, accepts every $k$-Fourier-sparse Boolean function, and rejects a function chosen from $D_{\mathrm{no}}$ with probability at least 2/3. We turn to prove that $q > (n \cdot 2^{n/2})/1000 = \Omega(k \cdot \log k)$.
Assume for contradiction that $q \leq (n \cdot 2^{n/2})/1000$. Let $f$ be a random function chosen from $D_{\mathrm{no}}$, that is, $f = 1_{V_1} + 1_{V_2}$ for random affine subspaces $V_1$ and $V_2$ of dimension $n/2$ satisfying $|V_1 \cap V_2| = 1$. For $i = 1, 2$, let $W_i$ be the affine span of $\{a_1, \ldots, a_q\} \cap V_i$. Let $E$ be the event that the intersection of $W_1$ and $W_2$ is empty. We turn to prove that if the event $E$ happens then the tester $T'$ accepts the function $f$ and that the probability of this event is at least 0.9. This contradicts the success probability of $T'$ on functions chosen from $D_{\mathrm{no}}$ and completes the proof.

Lemma 4.2. If the event $E$ happens then the tester $T'$ accepts the function $f$.
Proof: Assume that the event $E$ happens, i.e., $W_1 \cap W_2 = \emptyset$. Then, there exists an affine subspace $V_2'$ of dimension $n/2 - 1$ satisfying $W_2 \subseteq V_2' \subsetneq V_2$ and $V_1 \cap V_2' = \emptyset$. Consider the function $g = 1_{V_1} + 1_{V_2'}$. By Claim 2.4, $g$ is a Boolean function whose Fourier-sparsity is at most $3 \cdot 2^{n/2} \leq k$, thus it is accepted by $T'$. However, $g$ satisfies $g(a_i) = f(a_i)$ for every $1 \leq i \leq q$. This implies that $T'$ cannot distinguish between $g$ and $f$, so it must accept $f$ as well.

Lemma 4.3. The probability that the event $E$ happens is at least 0.9.
Proof: Denote by $X$ the number of vectors in $\{a_1, \ldots, a_q\} \cap V_1$. Since $V_1$ is distributed uniformly over all affine subspaces of dimension $n/2$, the probability that $a_i$ belongs to $V_1$ is $2^{-n/2}$ for every $1 \leq i \leq q$. Thus, by linearity of expectation, $\mathbb{E}[X] = \frac{q}{2^{n/2}} \leq \frac{(n \cdot 2^{n/2})/1000}{2^{n/2}} = \frac{n}{1000}$.
Now, fix a choice of $V_1$ for which $\dim(W_1) < n/10$, and consider the randomness over the choice of $V_2$. Notice that, conditioned on $V_1$, $V_2$ is distributed uniformly over all the affine subspaces of dimension $n/2$ which contain exactly one vector from $V_1$. By symmetry, every vector of $V_1$ has probability $|V_1|^{-1} = 2^{-n/2}$ to belong to $V_2$. Thus, the probability that the vector that belongs to both $V_1$ and $V_2$ is in $W_1$ is $|W_1| \cdot 2^{-n/2} < 2^{n/10} \cdot 2^{-n/2} = 2^{-2n/5}$.