On Using Toeplitz and Circulant Matrices for Johnson–Lindenstrauss Transforms

The Johnson–Lindenstrauss lemma is one of the cornerstone results in dimensionality reduction. A common formulation of it is that there exists a random linear mapping $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ such that for any vector $x \in \mathbb{R}^n$, $f$ preserves its norm to within $(1 \pm \varepsilon)$ with probability $1 - \delta$ if $m = \Theta(\varepsilon^{-2} \lg(1/\delta))$. Much effort has gone into developing fast embedding algorithms, with the Fast Johnson–Lindenstrauss transform of Ailon and Chazelle being one of the most well-known techniques. The current fastest algorithm that yields the optimal $m = O(\varepsilon^{-2} \lg(1/\delta))$ dimensions has an embedding time of $O(n \lg n + \varepsilon^{-2} \lg^3(1/\delta))$. An exciting approach towards improving this, due to Hinrichs and Vybíral, is to use a random $m \times n$ Toeplitz matrix for the embedding. Using the Fast Fourier Transform, the embedding of a vector can then be computed in $O(n \lg m)$ time.

The big question is of course whether $m = O(\varepsilon^{-2} \lg(1/\delta))$ dimensions suffice for this technique. If so, this would end a decades-long quest to obtain faster and faster Johnson–Lindenstrauss transforms. The current best analysis of the embedding of Hinrichs and Vybíral shows that $m = O(\varepsilon^{-2} \lg^2(1/\delta))$ dimensions suffice. The main result of this paper is a proof that this analysis unfortunately cannot be tightened any further, i.e., there exist vectors requiring $m = \Omega(\varepsilon^{-2} \lg^2(1/\delta))$ for the Toeplitz approach to work.


Introduction
The performance of many geometric algorithms depends heavily on the dimension of the input data. A widely used technique to combat this "curse of dimensionality" is to preprocess the input via dimensionality reduction while approximately preserving important geometric properties. Running the algorithm on the lower dimensional data then uses fewer resources (time, space, etc.), and an approximate result for the high dimensional data can be derived from the low dimensional result.
Theorem 1 (Distributional Johnson–Lindenstrauss lemma) For any $0 < \varepsilon, \delta < 1/2$, there exists a distribution $\mathcal{D}$ over linear functions $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ for some $m = O(\varepsilon^{-2} \lg(1/\delta))$ such that for all $x \in \mathbb{R}^n$,

$$\Pr_{f \sim \mathcal{D}}\left[\,\big|\, \|f(x)\|_2^2 - \|x\|_2^2 \,\big| \le \varepsilon \|x\|_2^2 \,\right] \ge 1 - \delta.$$

This result dates back to 1984 and says that to preserve the norm of any vector in $\mathbb{R}^n$ to within a factor $(1 \pm \varepsilon)$ with error probability $\delta$, it suffices to use just $m = O(\varepsilon^{-2} \lg(1/\delta))$ dimensions. The bound on $m$ was very recently proven optimal [18]. If we have a set $X \subset \mathbb{R}^n$ of $N$ vectors, we can use Theorem 1 to reduce the dimension of our vectors to $O(\varepsilon^{-2} \lg N)$ while preserving the pairwise distances: choose $\delta = 1/N^2$ and use the fact that the embedding is linear to union bound over all pairs $x, y \in X$, proving that the probability that all pairwise distances (i.e., the norms of the difference vectors $x - y$) are preserved is at least $1/2$.
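In more detail, the union bound calculation reads as follows (a standard computation, spelled out here for completeness):

$$\Pr\big[\exists\, x, y \in X : \big|\, \|f(x-y)\|_2^2 - \|x-y\|_2^2 \,\big| > \varepsilon \|x-y\|_2^2\big] \;\le\; \binom{N}{2} \delta \;=\; \binom{N}{2} \frac{1}{N^2} \;<\; \frac{1}{2}.$$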
The standard technique for constructing a map with the properties of Theorem 1 is the following. Let $A$ be an $m \times n$ matrix with entries independently sampled as either $\mathcal{N}(0,1)$ random variables (as in [10]) or Rademacher (uniform among $\{-1, +1\}$) random variables (as in [1]). Once such entries have been drawn, let $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be defined as $f(x) = \frac{1}{\sqrt{m}} Ax$.
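As a concrete illustration, here is a minimal NumPy sketch of this classic construction (our own code, not from the paper; the Rademacher variant is shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_jl(x, m, rng):
    """Classic Johnson-Lindenstrauss embedding f(x) = (1/sqrt(m)) A x,
    where A has independent Rademacher (+1/-1) entries."""
    n = x.shape[0]
    A = rng.choice([-1.0, 1.0], size=(m, n))
    return (A @ x) / np.sqrt(m)

x = rng.standard_normal(1000)
fx = dense_jl(x, m=256, rng=rng)
# The squared norm is preserved up to a (1 +/- eps) factor with high probability.
print(np.linalg.norm(fx) ** 2 / np.linalg.norm(x) ** 2)
```

Note that the embedding time of this dense construction is $O(nm)$, which is exactly the bottleneck discussed next.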

Time Complexity
Examining the classic Johnson–Lindenstrauss reduction above, we see that to embed a vector, we need to multiply with a dense matrix, and the embedding time becomes $O(nm)$ (or equivalently $O(n \varepsilon^{-2} \lg(1/\delta))$). This may be prohibitively large for many applications (recall that one prime usage of dimensionality reduction is to speed up algorithms), and much research has been devoted to obtaining faster embedding times.

Fast Johnson–Lindenstrauss Transform. Ailon and Chazelle [2] were the first to address the question of faster Johnson–Lindenstrauss transforms. In their seminal paper, they introduced the so-called Fast Johnson–Lindenstrauss transform for speeding up dimensionality reduction. The basic idea in their paper is to first "precondition" the input data by multiplying with a diagonal matrix with random signs, followed by multiplying with a Hadamard matrix. This has the effect of "spreading out" the mass of the input vectors, allowing the dense matrix $A$ above to be replaced with a sparse matrix.
Since we can multiply with a Hadamard matrix using the Fast Fourier Transform, this gives an embedding time of $O(n \lg n + \varepsilon^{-2} \lg^3(1/\delta))$ for embedding into the optimal $m = O(\varepsilon^{-2} \lg(1/\delta))$ dimensions. For $m = \varepsilon^{-2} \lg(1/\delta) \le n^{1/2 - \gamma}$ for any constant $\gamma > 0$, the embedding complexity was improved even further down to $O(n \lg m)$ in [3]. Another approach to achieve the $O(n \lg m)$ embedding time, but without the restriction $\varepsilon^{-2} \lg(1/\delta) \le n^{1/2 - \gamma}$, is to sacrifice the target dimension. This was done in [4] and later improved in [17], where the embedding complexity was $O(n \lg m)$ at the cost of an increased target dimension $m = O(\varepsilon^{-2} \lg(1/\delta) \lg^4 n)$.

Sparse Vectors. Another approach to improve the performance of JL transforms is to assume the input data is sparse, i.e., has few non-zero coordinates. Designing an algorithm based on the work in [24], Dasgupta et al. [9] achieved an embedding using sparse matrices, which was subsequently improved by Kane and Nelson [16]. The main idea behind this approach is embedding using a sparse matrix where each column has only $\varepsilon^{-1} \lg(1/\delta)$ non-zero entries (Table 1).

Toeplitz Matrices. Finally, another very exciting approach is to use Toeplitz matrices or partial circulant matrices for the embedding. We first introduce the terminology.
An $m \times n$ Toeplitz matrix is an $m \times n$ matrix where every entry on a diagonal has the same value:

$$T = \begin{pmatrix} t_0 & t_1 & t_2 & \cdots & t_{n-1} \\ t_{-1} & t_0 & t_1 & \cdots & t_{n-2} \\ \vdots & & \ddots & & \vdots \\ t_{-(m-1)} & t_{-(m-2)} & \cdots & & t_{n-m} \end{pmatrix}.$$

A partial circulant matrix is a special kind of Toeplitz matrix, where every row, except the first, is the previous row rotated once:

$$P = \begin{pmatrix} t_1 & t_2 & t_3 & \cdots & t_n \\ t_n & t_1 & t_2 & \cdots & t_{n-1} \\ \vdots & & \ddots & & \vdots \\ t_{n-m+2} & t_{n-m+3} & \cdots & & t_{n-m+1} \end{pmatrix}.$$

(Table 1 summarizes the embedding times and target dimensions of the approaches discussed above, including the sparse and Hadamard-based ones; in it, $n$ is the dimension of the input vectors, $m$ is the dimension of the output vectors, $\varepsilon$ is the distortion, and $\delta$ is the error probability.)

Hinrichs and Vybíral [13] proposed the following algorithm for generating a JL embedding based on a Toeplitz matrix: draw the entries $t_{-(m-1)}, \ldots, t_{n-1}$ of the $m \times n$ Toeplitz matrix $T$ as independent Rademacher random variables, draw an $n \times n$ diagonal matrix $D$ with independent Rademacher variables on the diagonal, and embed a vector $x$ as

$$f(x) = \frac{1}{\sqrt{m}} T D x.$$

Multiplying with a Toeplitz matrix corresponds to computing a convolution and can be done using the Fast Fourier Transform. By appropriately blocking the input coordinates, the complexity of embedding a vector $x$ is just $O(n \lg m)$ for any target dimension $m$. The big question is of course: how low can the target dimension $m$ be while preserving the distances between vectors up to a factor of $1 \pm \varepsilon$?
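The convolution structure is what enables the fast embedding. Below is a minimal NumPy sketch (our own illustration, not the paper's code), using the partial circulant variant with the convention $T_{i,j} = c_{(i-j) \bmod n}$; the paper's $O(n \lg m)$ algorithm additionally blocks the input coordinates into chunks of length $O(m)$, which we omit here, so this sketch runs in $O(n \lg n)$ time:

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_circulant_embed(x, c, d, m):
    """f(x) = (1/sqrt(m)) T D x, where T is the m x n partial circulant
    matrix with T[i, j] = c[(i - j) mod n] and D = diag(d). The product
    T D x is the first m entries of a circular convolution, computed
    here with the FFT."""
    y = d * x                                    # apply the random-sign diagonal D
    conv = np.fft.ifft(np.fft.fft(c) * np.fft.fft(y)).real
    return conv[:m] / np.sqrt(m)                 # keep the first m rows of T D x

n, m = 1024, 64
c = rng.choice([-1.0, 1.0], size=n)              # Rademacher entries defining T
d = rng.choice([-1.0, 1.0], size=n)              # Rademacher diagonal of D
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

fx = partial_circulant_embed(x, c, d, m)
print(np.linalg.norm(fx) ** 2)                   # close to 1 for most draws of c, d
```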
In the original paper [13], the authors proved that when setting the target dimension to $m = O(\varepsilon^{-2} \lg^3(1/\delta))$, the norm of any vector is preserved to within $(1 \pm \varepsilon)$ with probability at least $1 - \delta$. Later, the analysis was refined in [23], which lowered the target dimension to $m = O(\varepsilon^{-2} \lg^2(1/\delta))$ for preserving norms to within $(1 \pm \varepsilon)$ with probability $1 - \delta$. Now if the analysis could be tightened even further to give the optimal $m = O(\varepsilon^{-2} \lg(1/\delta))$ dimensions, this would end the decades-long quest for faster and faster embedding algorithms!

Our Contribution. Our main result unfortunately shows that the analysis of Vybíral [23] cannot be tightened to give an even lower target dimensionality when preserving norms. More specifically, we prove that the upper bound given in [23] is optimal.
Theorem 2 Let $T$ and $D$ be the $m \times n$ Toeplitz and $n \times n$ diagonal matrix in the embedding proposed by [13]. For all $0 < \varepsilon < C$, where $C$ is a universal constant, and any desired error probability $\delta > 0$, if the following holds for every unit vector $x \in \mathbb{R}^n$:

$$\Pr\left[\left|\, \left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2 - 1 \,\right| \le \varepsilon\right] \ge 1 - \delta,$$

then it must be the case that $m = \Omega(\varepsilon^{-2} \lg^2(1/\delta))$.

Preserving Pairwise Distances
While Theorem 2 already shows that one cannot tighten the analysis of Vybíral for preserving the norm of just one vector, it does leave open the possibility that one would not need to union bound over all $N^2$ pairs of difference vectors when trying to preserve all pairwise distances amongst a set of $N$ vectors.
Krahmer and Ward [17] avoided this union bound by introducing a dependency on the original dimension $n$: using the restricted isometry property, they show that to preserve pairwise distances among $N$ vectors in $\mathbb{R}^n$ to within $(1 \pm \varepsilon)$, the target dimension $m = O(\max\{\varepsilon^{-1} \lg^{3/2} N \lg^{3/2} n,\; \varepsilon^{-2} \lg N \lg^4 n\})$ suffices for the Toeplitz approach, which for some range of parameters is an improvement on [23]. However, we show that there is some range of parameters where the upper bound in [23] is tight with respect to pairwise distances.

Theorem 3 Let $T$ and $D$ be the $m \times n$ Toeplitz and $n \times n$ diagonal matrix in the embedding proposed by [13]. For all $0 < \varepsilon < C$, where $C$ is a universal constant, if the following holds for every set of $N$ vectors $X \subset \mathbb{R}^n$:

$$\Pr\left[\forall x, y \in X : \left|\, \left\|\tfrac{1}{\sqrt{m}} T D (x - y)\right\|_2^2 - \|x - y\|_2^2 \,\right| \le \varepsilon \|x - y\|_2^2\right] \ge \frac{1}{2},$$

then it must be the case that either $m = \Omega(\varepsilon^{-2} \lg^2 N)$ or $m = \Omega(n/N)$.
We remark that our proofs also work if we replace $T$ by a partial circulant matrix (which was also proposed in [13]). Furthermore, we expect that minor technical manipulations to our proof would also show the above theorems when the entries of $T$ and $D$ are $\mathcal{N}(0,1)$ distributed rather than Rademacher (this was also proposed in [13]).

Lower Bound for One Vector
Let $T$ be an $m \times n$ Toeplitz matrix defined from random variables $t_{-(m-1)}, t_{-(m-2)}, \ldots, t_{n-1}$, such that entry $(i, j)$ takes the value $t_{j-i}$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Let $D$ be an $n \times n$ diagonal matrix with the random variable $d_i$ giving the $i$'th diagonal entry. This section shows the following.

Theorem 4 Let $T$ be an $m \times n$ Toeplitz matrix and $D$ an $n \times n$ diagonal matrix. If $t_{-(m-1)}, \ldots, t_{n-1}$ and $d_1, \ldots, d_n$ are independently distributed Rademacher random variables, then for all $0 < \varepsilon < C$, where $C$ is a universal constant, there exists a unit vector $x \in \mathbb{R}^n$ such that

$$\Pr\left[\left|\, \left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2 - 1 \,\right| \ge \varepsilon\right] \ge 2^{-O(\varepsilon \sqrt{m})},$$

and furthermore, all but the first $O(\sqrt{m})$ coordinates of $x$ are 0.
It follows from Theorem 4 that if we want to have probability at least $1 - \delta$ of preserving the norm of any unit vector $x$ to within $(1 \pm \varepsilon)$, it must be the case that $m = \Omega(\varepsilon^{-2} \lg^2(1/\delta))$. This is precisely the statement of Theorem 2, and thus we set out to prove Theorem 4.
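To spell out the arithmetic behind this implication (with $C_1$ a constant of our own naming, standing in for the $O(\cdot)$ in Theorem 4): requiring the failure probability from Theorem 4 to be at most $\delta$ gives

$$2^{-C_1 \varepsilon \sqrt{m}} \le \delta \iff C_1 \varepsilon \sqrt{m} \ge \lg(1/\delta) \iff m \ge \frac{\lg^2(1/\delta)}{C_1^2\, \varepsilon^2}.$$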
To prove Theorem 4, we wish to invoke the Paley–Zygmund inequality, which states that if $X$ is a non-negative random variable with finite variance and $0 \le \theta \le 1$, then

$$\Pr\big[X \ge \theta\, \mathbb{E}[X]\big] \ge (1 - \theta)^2\, \frac{\mathbb{E}[X]^2}{\mathbb{E}[X^2]}.$$

We carefully choose a unit vector $x$, and define the random variable for Paley–Zygmund to be the $k$'th moment of the difference between the squared norm of the embedded $x$ and 1.
Proof Let $k$ be an even positive integer less than $m/4$ and define $s := 4k$. Note that $s \le m$. Let $x$ be an arbitrary $n$-dimensional unit vector such that the first $s$ coordinates are in $\{-1/\sqrt{s}, +1/\sqrt{s}\}$, while the remaining $n - s$ coordinates are 0. Define the random variable parameterized by $k$:

$$Z_k := \left(\left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2 - 1\right)^k.$$

Since $k$ is even, the random variable $Z_k$ is non-negative. We wish to lower-bound $\mathbb{E}[Z_k]$ and upper-bound $\mathbb{E}[Z_k^2]$ in order to invoke Paley–Zygmund. The bounds we prove are as follows.

Lemma 1 If $k \le \sqrt{m}$, then there exist universal constants $c_1, C_2 > 0$ such that

$$\mathbb{E}[Z_k] \ge \left(\frac{c_1 k}{\sqrt{m}}\right)^k \quad \text{and} \quad \mathbb{E}[Z_k^2] \le \left(\frac{C_2 k}{\sqrt{m}}\right)^{2k}.$$
Before proving Lemma 1, we show how to use it together with Paley–Zygmund to complete the proof of Theorem 4.
We start by invoking Paley–Zygmund with $\theta = 1/2$ and then rewriting the expectations according to Lemma 1,

$$\Pr\left[Z_k \ge \tfrac{1}{2}\mathbb{E}[Z_k]\right] \ge \frac{1}{4} \cdot \frac{\mathbb{E}[Z_k]^2}{\mathbb{E}[Z_k^2]} \ge \frac{1}{4}\left(\frac{c_1}{C_2}\right)^{2k} \ge 2^{-C_0 k}.$$

Here $C_0$ is some constant greater than 0. For any $0 < \varepsilon < 1/C_0$, we can now set $k = \Theta(\varepsilon \sqrt{m})$, rounded up to an even integer, so that $\left(\tfrac{1}{2}\mathbb{E}[Z_k]\right)^{1/k} \ge \varepsilon$. This choice of $k$ satisfies $k \le \sqrt{m}$ as required by Lemma 1. Since $Z_k \ge \tfrac{1}{2}\mathbb{E}[Z_k]$ implies $\big|\, \|\tfrac{1}{\sqrt{m}} T D x\|_2^2 - 1 \,\big| \ge \left(\tfrac{1}{2}\mathbb{E}[Z_k]\right)^{1/k} \ge \varepsilon$, we have thus shown that

$$\Pr\left[\left|\, \left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2 - 1 \,\right| \ge \varepsilon\right] \ge 2^{-C_0 k} = 2^{-O(\varepsilon \sqrt{m})},$$

which completes the proof of Theorem 4.

Remark 1 Theorem 4 can easily be extended to partial circulant matrices. The difference between partial circulant and Toeplitz matrices is the dependence between the values in the first $m$ and last $m$ columns. However, as only the first $s = 4k \le 4\sqrt{m}$ entries in $x$ are non-zero, the last $m$ columns are ignored, and so partial circulant and Toeplitz matrices behave identically in our proof.
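The lower bound of Theorem 4 is easy to observe empirically. The following Monte Carlo sketch (our own; the parameter choices are for illustration only) estimates the deviation probability for the hard vector whose first $s$ coordinates equal $1/\sqrt{s}$:

```python
import numpy as np

rng = np.random.default_rng(2)

def deviation(m, n, s):
    """One draw of |norm((1/sqrt(m)) T D x)^2 - 1| for the hard vector x."""
    t = rng.choice([-1.0, 1.0], size=m + n - 1)   # t_a stored at index a + (m - 1)
    d = rng.choice([-1.0, 1.0], size=n)
    x = np.zeros(n)
    x[:s] = 1.0 / np.sqrt(s)
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    T = t[(j - i) + (m - 1)]                      # Toeplitz: entry (i, j) = t_{j-i}
    fx = (T @ (d * x)) / np.sqrt(m)
    return abs(fx @ fx - 1.0)

m, n, eps = 64, 256, 0.5
s = 4 * max(2, int(eps * np.sqrt(m)))             # s = 4k with k on the order of eps*sqrt(m)
devs = np.array([deviation(m, n, s) for _ in range(2000)])
print((devs >= eps).mean())                       # noticeably non-zero, as Theorem 4 predicts
```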

Proof of Lemma 1
Before we prove the two bounds in Lemma 1 individually, we rewrite $\mathbb{E}[Z_k]$, as this benefits both proofs.
Expanding the squared norm and using that the diagonal terms $j = h$ contribute exactly $\|x\|_2^2 = 1$, we get

$$Z_k = \left(\frac{1}{m} \sum_{i \in [m]} \sum_{\substack{j, h \in [n] \\ j \ne h}} t_{j-i}\, t_{h-i}\, d_j\, d_h\, x_j\, x_h\right)^k.$$

Observe that for $j > s$ or $h > s$ the product becomes 0, as either $x_j$ or $x_h$ is 0. By removing all these terms, we simplify the sum to

$$\mathbb{E}[Z_k] = \frac{1}{m^k} \sum_{\substack{((i_1,j_1,h_1), \ldots, (i_k,j_k,h_k)) \in ([m] \times [s] \times [s])^k \\ \forall \ell : j_\ell \ne h_\ell}} \mathbb{E}\left[\prod_{\ell=1}^{k} t_{j_\ell - i_\ell}\, t_{h_\ell - i_\ell}\, d_{j_\ell}\, d_{h_\ell}\right] \prod_{\ell=1}^{k} x_{j_\ell} x_{h_\ell},$$

where the expectation of the product of $t$'s and $d$'s is either 0 or 1. To see this, note that by the independence of the random variables, we can write the expectation of the product as a product of expectations, where each factor gathers all the occurrences of the same random variable. Since the $d_j$'s and $t_a$'s are Rademachers, the expectation of any odd power of one of these random variables is 0. Thus if just a single random variable amongst the $d_j$'s and $t_a$'s occurs an odd number of times, the whole term vanishes. Similarly, we observe that if every random variable occurs an even number of times, then the expectation of the product is 1, and the term contributes exactly $1/s^k$, since each $x_j$ also occurs an even number of times. If we therefore let $\Gamma_k$ denote the number of tuples $((i_1,j_1,h_1), \ldots, (i_k,j_k,h_k)) \in ([m] \times [s] \times [s])^k$ with $j_\ell \ne h_\ell$ for all $\ell$, in which every column index ($j_\ell$ or $h_\ell$) and every diagonal index ($j_\ell - i_\ell$ or $h_\ell - i_\ell$) occurs an even number of times, then we conclude

$$\mathbb{E}[Z_k] = \frac{\Gamma_k}{m^k s^k}. \qquad (1)$$

Note that $Z_k^2 = Z_{2k}$. Therefore,

$$\mathbb{E}[Z_k^2] = \mathbb{E}[Z_{2k}] = \frac{\Gamma_{2k}}{m^{2k} s^{2k}}. \qquad (2)$$

To complete the proof of Lemma 1 we need lower and upper bounds on $\Gamma_k$ and $\Gamma_{2k}$. The bounds we prove are the following.

Lemma 2 If $k \le \sqrt{m}$, then $\Gamma_k$ and $\Gamma_{2k}$ satisfy

$$\Gamma_k \ge \left(c_3 k^2 \sqrt{m}\right)^k \quad \text{and} \quad \Gamma_{2k} \le \left(C_4 k^2 \sqrt{m}\right)^{2k}$$

for universal constants $c_3, C_4 > 0$.

The proofs of the two bounds in Lemma 2 are given in Sects. 2.1 and 2.2. Substituting the bounds from Lemma 2 in (1) and (2), and using $s = 4k$, we get

$$\mathbb{E}[Z_k] = \frac{\Gamma_k}{m^k s^k} \ge \left(\frac{c_3 k^2 \sqrt{m}}{4 k m}\right)^k = \left(\frac{c_1 k}{\sqrt{m}}\right)^k \quad \text{and} \quad \mathbb{E}[Z_k^2] = \frac{\Gamma_{2k}}{m^{2k} s^{2k}} \le \left(\frac{C_2 k}{\sqrt{m}}\right)^{2k},$$

which are the bounds we sought for Lemma 1.
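Identity (1) can be sanity-checked by brute force for tiny parameters. The script below (our own; it does not need the relation $s = 4k$, which is only used for the bounds) compares $\Gamma_k/(m^k s^k)$ against an exact computation of $\mathbb{E}[Z_k]$ over all sign assignments:

```python
import itertools

import numpy as np

def gamma_k(m, s, k):
    """Count tuples ((i1,j1,h1),...,(ik,jk,hk)) in ([m]x[s]x[s])^k with
    j != h in every triple, in which every column index and every
    diagonal index j - i occurs an even number of times."""
    triples = [(i, j, h) for i in range(m) for j in range(s)
               for h in range(s) if j != h]
    count = 0
    for tup in itertools.product(triples, repeat=k):
        cols, diags = {}, {}
        for (i, j, h) in tup:
            for c in (j, h):
                cols[c] = cols.get(c, 0) + 1
            for a in (j - i, h - i):
                diags[a] = diags.get(a, 0) + 1
        if all(v % 2 == 0 for v in cols.values()) and \
           all(v % 2 == 0 for v in diags.values()):
            count += 1
    return count

def exact_moment(m, n, s, k):
    """E[Z_k] by enumerating all sign assignments of the t's and d's."""
    x = np.zeros(n)
    x[:s] = 1.0 / np.sqrt(s)
    i_idx = np.arange(m)[:, None]
    j_idx = np.arange(n)[None, :]
    total, cnt = 0.0, 0
    for t in itertools.product([-1.0, 1.0], repeat=m + n - 1):
        T = np.asarray(t)[(j_idx - i_idx) + (m - 1)]
        for d in itertools.product([-1.0, 1.0], repeat=n):
            fx = (T @ (np.asarray(d) * x)) / np.sqrt(m)
            total += (fx @ fx - 1.0) ** k
            cnt += 1
    return total / cnt

m, n, s, k = 2, 3, 2, 2
print(gamma_k(m, s, k) / (m**k * s**k), exact_moment(m, n, s, k))  # both equal 0.5 here
```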

Lower-Bounding $\Gamma_k$
We first recall that the definition of $\Gamma_k$ is the number of tuples $((i_1,j_1,h_1), \ldots, (i_k,j_k,h_k)) \in ([m] \times [s] \times [s])^k$ with $j_\ell \ne h_\ell$ for all $\ell$, in which every column index and every diagonal index occurs an even number of times.

We view a triple $(i, j, h) \in [m] \times [s] \times [s]$ as two entries $(i, j)$ and $(i, h)$ in an $m \times s$ matrix, and we say that the triple touches the columns $j$ and $h$ as well as the diagonals $j - i$ and $h - i$. Let $F$ be the set of all tuples $S \in ([m] \times [s] \times [s])^k$ with $j \ne h$ in every triple, such that every column and every diagonal touched by $S$ is touched exactly twice. From this and the definition of $\Gamma_k$ it is clear that $|F| \le \Gamma_k$. When constructing $S \in F$, we view $S$ as consisting of two halves $S_1$ and $S_2$, such that $S_1$ touches exactly the same columns and diagonals as $S_2$ and both $S_1$ and $S_2$ touch each column and diagonal at most once. To capture this, we give the following definition, where $\mathcal{S}$ is meant to be the family of such halves $S_1$ and $S_2$.

Definition 1 Let $\mathcal{S}$ be the set of all tuples $S \in ([m] \times [s] \times [s])^{k/2}$ such that:

1. every triple $(i, j, h)$ in $S$ satisfies $j \ne h$;
2. every column is touched at most once by $S$;
3. every diagonal is touched at most once by $S$.

Definition 1 mimics the definition of $\Gamma_k$, and the first item in Definition 1 ensures that the triples in a tuple in $\mathcal{S}$ are of the same form as in $\Gamma_k$. The final two items ensure that each column and diagonal, respectively, is touched at most once. These are exactly the properties we wanted of $S_1$ and $S_2$ individually.
We can now construct tuples in $F$ as concatenations of pairs of (half) tuples $S_1, S_2 \in \mathcal{S}$ such that $S_1$ touches exactly the same columns and diagonals as $S_2$. To capture that $S_1$ and $S_2$ touch the same columns and diagonals, we introduce the notion of a signature. The signature of $S_i$ is the set of columns and diagonals touched by $S_i$.

For $S_1$ and $S_2$ to touch exactly the same columns and diagonals, it is necessary and sufficient that they have the same signature. We introduce the following notation: $B$ denotes the number of signatures with at least one member, and, enumerating the signatures, we let $b_i$ denote the number of (half) tuples in $\mathcal{S}$ with signature $i$.

We recall that a (half) tuple $S_1 \in \mathcal{S}$ touches each column and diagonal at most once, and if $S_1$ and $S_2$ share the same signature, they touch exactly the same columns and diagonals. Therefore, using $\bullet$ to mean concatenation, $S = S_1 \bullet S_2 \in F$, as each column and diagonal touched is touched exactly twice. Since $|F|$ is a lower bound for $\Gamma_k$, and for a given signature $i$ the number of choices of $S_1$ and $S_2$ with that signature is $b_i^2$, this gives the following inequality:

$$\Gamma_k \ge |F| \ge \sum_{i=1}^{B} b_i^2.$$

We now apply the Cauchy–Schwarz inequality:

$$\Gamma_k \ge \sum_{i=1}^{B} b_i^2 \ge \frac{1}{B}\left(\sum_{i=1}^{B} b_i\right)^2 = \frac{|\mathcal{S}|^2}{B}. \qquad (3)$$

To get a lower bound on $|\mathcal{S}|^2/B$ (and in turn $\Gamma_k$), we need a lower bound on $|\mathcal{S}|$ and an upper bound on $B$. These bounds are stated in the following lemmas.

Lemma 3 If $s = 4k$, then $|\mathcal{S}| \ge (m s^2/4)^{k/2}$.

Lemma 4 $B = O\left(\binom{m+s}{k/2}\, s^{k/2} \binom{s}{k}\right)$.
Before proving any of these lemmas, we show that they together with (3) give the desired lower bound on $\Gamma_k$:

$$\Gamma_k \ge \frac{|\mathcal{S}|^2}{B} \ge \frac{(m s^2/4)^k}{O\left(\binom{m+s}{k/2}\, s^{k/2} \binom{s}{k}\right)}. \qquad (4)$$

Because $s = 4k$ and $k \le \sqrt{m}$, we have $\binom{m+s}{k/2} \le \left(\frac{2e(m+s)}{k/2} \cdot \frac{k/2}{k}\right)^{k/2} \le 2^{O(k)} (m/k)^{k/2}$ and $\binom{s}{k} = \binom{4k}{k} \le 2^{O(k)}$. With this we can simplify (4) as

$$\Gamma_k \ge \left(\frac{\sqrt{m}\, \sqrt{k}\, s^{3/2}}{2^{O(1)}}\right)^k = \left(c_3 k^2 \sqrt{m}\right)^k,$$

using $s = 4k$ in the last step, which is the lower bound we sought.

Proof of Lemma 3 Recall that $\mathcal{S} \subseteq ([m] \times [s] \times [s])^{k/2}$ is the set of (half) tuples that touch each column and diagonal at most once and, for each triple $(i, j, h)$ in these (half) tuples, have $j \ne h$.
We prove Lemma 3 by analysing how we can create a large number of distinct $S \in \mathcal{S}$ by choosing the triples in $S$ iteratively.

For each triple, we choose a row and two distinct entries on this row. We choose the row among any of the $m$ rows. However, because $S \in \mathcal{S}$, when choosing entries on the row, we cannot choose entries that lie on columns or diagonals touched by previously chosen triples; we choose the two entries among the remaining entries of the row. Whenever we choose a triple, this triple prevents at most four row entries from being chosen for every subsequent triple, as the two diagonals and two columns touched by the chosen triple intersect with at most four entries on the row of any subsequent triple. This leads to the following recurrence, describing a lower bound for the number of (half) tuples:

$$F(r, c, t) = \begin{cases} 1 & \text{if } t = 0, \\ r \cdot c \cdot (c - 1) \cdot F(r, c - 4, t - 1) & \text{otherwise,} \end{cases} \qquad (5)$$

where $r$ is the number of rows to choose from, $c$ is the minimum number of choosable entries in any row, and $t$ is the number of triples left to choose. Inspecting (5), we can see that $F$ can equivalently be defined as

$$F(r, c, t) = \prod_{i=0}^{t-1} r \cdot (c - 4i) \cdot (c - 4i - 1). \qquad (6)$$

If $t \le c/8$ then the factors $(c - 4i)$ and $(c - 4i - 1)$ inside the product in (6) are at least $c/2$, so we can bound $F$ from below:

$$F(r, c, t) \ge \left(\frac{r c^2}{4}\right)^t.$$

We now insert the values $r = m$, $c = s$ and $t = k/2$ to find a lower bound for $|\mathcal{S}|$, noting that $s = 4k$ ensures that $t \le c/8$:

$$|\mathcal{S}| \ge F(m, s, k/2) \ge \left(\frac{m s^2}{4}\right)^{k/2}.$$

This completes the proof of Lemma 3.
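For tiny parameters, Lemma 3 can be checked against an exhaustive count (our own sketch; with $s = 4k$ the constraint $t \le c/8$ holds with equality):

```python
import itertools

def count_half_tuples(m, s, k):
    """Exhaustively count the (half) tuples of Definition 1: k/2 triples
    (i, j, h) with j != h, touching every column and diagonal at most once."""
    triples = [(i, j, h) for i in range(m) for j in range(s)
               for h in range(s) if j != h]
    count = 0
    for tup in itertools.product(triples, repeat=k // 2):
        cols = [c for (i, j, h) in tup for c in (j, h)]
        diags = [a for (i, j, h) in tup for a in (j - i, h - i)]
        if len(set(cols)) == len(cols) and len(set(diags)) == len(diags):
            count += 1
    return count

m, k = 2, 4
s = 4 * k
print(count_half_tuples(m, s, k), (m * s**2 / 4) ** (k / 2))  # count exceeds the bound
```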

Proof of Lemma 4
Recall that for a (half) tuple $S \in \mathcal{S}$ we define the signature as the set of columns and diagonals touched by $S$. Furthermore, viewing a triple $(i, j, h)$ as two entries $(i, j)$ and $(i, h)$ in an $m \times s$ matrix, we define the left endpoint as $(i, \min\{j, h\})$ and the right endpoint as $(i, \max\{j, h\})$. The claim to prove is

$$B = O\left(\binom{m+s}{k/2}\, s^{k/2} \binom{s}{k}\right).$$

This is proven by first showing an upper bound on the number of choices for the diagonals of left endpoints, then for the diagonals of right endpoints, and finally for the columns.
In an $m \times s$ matrix there are $m + s$ different diagonals, and as the chosen diagonals have to be distinct, there are $\binom{m+s}{k/2}$ choices for the diagonals corresponding to left endpoints of the triples.
As the right endpoint of a triple has to be in the same row as the left endpoint, there are at most $s$ choices for the diagonal corresponding to the right endpoint when the left endpoint has been chosen (which it has in our case). This gives a total of $s^{k/2}$ choices for the diagonals corresponding to right endpoints.
Finally, there are $s$ columns to choose from and the chosen columns have to be distinct, so the total number of choices of columns is at most $\binom{s}{k}$. The product of these numbers of choices gives the upper bound sought, completing the proof of Lemma 4.

Upper-Bounding $\Gamma_{2k}$

To prove an upper bound on $\Gamma_{2k}$, we now let $F$ denote the set of tuples counted by $\Gamma_{2k}$, i.e., tuples of $2k$ triples in which every column and diagonal occurs an even number of times, so that $|F| = \Gamma_{2k}$. We show how to encode a tuple $S \in F$ using at most $k \lg m + 2k \lg s + 2k \lg k + O(k)$ bits, such that $S$ can be decoded from this encoding. Since any $S \in F$ can be encoded using $k \lg m + 2k \lg s + 2k \lg k + O(k)$ bits and $|F| = \Gamma_{2k}$, we can conclude

$$\Gamma_{2k} \le 2^{k \lg m + 2k \lg s + 2k \lg k + O(k)} = \left(2^{O(1)} \sqrt{m}\, s\, k\right)^{2k} = \left(C_4 k^2 \sqrt{m}\right)^{2k},$$

where the last step uses $s = 4k$. This is the upper bound claimed in Lemma 2.
Let $\sigma$ denote the encoding function and $\sigma^{-1}$ denote the decoding function. If $S \in F$ and $t \in S$, then $\sigma(t)$ denotes the encoding of the triple $t$, $\sigma(S)$ denotes the encoding of the entire tuple $S$, and $\sigma(F)$ denotes the image of $\sigma$.
A tuple $S \in F$ consists of triples $t_1, t_2, \ldots, t_{2k}$ such that $S = t_1 \bullet t_2 \bullet \cdots \bullet t_{2k}$. To encode $S \in F$ we encode each of the triples and store the encodings in the same order:

$$\sigma(S) = \sigma(t_1) \bullet \sigma(t_2) \bullet \cdots \bullet \sigma(t_{2k}).$$

We will first describe a graph view of a tuple $S$ which will be useful for encoding and decoding, then we will show an encoding algorithm and finally a decoding algorithm.

Graph. A tuple $S \in F$ forms a (multi)graph structure, where every triple $(i, j, h) \in S$ is a vertex. Since $S \in F$, there lies an even number of triple endpoints on each diagonal. We can thus pair endpoints lying on the same diagonal, such that every endpoint is paired with exactly one other endpoint. When two triples have endpoints that are paired, the triples have an edge between them in the graph. As every triple has two endpoints, every vertex has degree two, and so the graph consists entirely of simple cycles of length at least two.

Encoding. To encode an $S \in F$, we first encode each cycle by itself by defining the $\sigma(t)$'s for the triples $t$ in the cycle. After this, we order the defined $\sigma(t)$'s as the $t$'s were ordered in the input.

1. For each cycle we perform the following.
(a) We pick any vertex $t = (i, j, h)$ of the cycle and give it the type head. Define $\sigma(t)$ as the concatenation of its type head, its row $i$, and its two columns $j$ and $h$. This uses $\lg m + 2 \lg s + O(1)$ bits.

(b) We iterate through the cycle starting after the head and give the vertices, except the last, the type mid. The last vertex, just before the head, is given the type last.

(c) For each triple $t$ of type mid we store its type and two columns explicitly. However, instead of storing its row we store the index of its predecessor in the cycle order as well as how they are connected: if we typed $t_r$ just before $t_s$ when iterating through the cycle, then when encoding $t_s$ we store $r$ as well as whether $t_r$ and $t_s$ are connected by the left or right endpoint of $t_r$ and the left or right endpoint of $t_s$. So define $\sigma(t)$ as the concatenation of its type mid, its two columns, its predecessor index, and how it is connected to the predecessor. All in all we spend $\lg k + 2 \lg s + O(1)$ bits encoding each mid.

(d) Finally, to encode the triple $t$ which is typed last, we define $\sigma(t)$ as the concatenation of its type, its predecessor index, how it is connected to its predecessor, and the column of the endpoint on the predecessor's diagonal. We thus spend $\lg k + \lg s + O(1)$ bits to encode a last. However, since $s = 4k$, the number of bits per encoded last is equivalent to $2 \lg k + O(1)$, which turns out to simplify the analysis later.
Note that for each triple, the type is encoded in the first two bits of the encoding of the triple. This will be important during decoding.

2. After encoding all cycles, we order the encoded triples in the same order as the triples in the input, and output the concatenation of the encoded triples, $\sigma(S) = \sigma(t_1) \bullet \cdots \bullet \sigma(t_{2k})$.

To analyse the number of bits needed in total, we look at the average number of bits per triple inside a cycle. Since all cycles have length at least two, each cycle contains exactly one head and exactly one last. Pairing the head with the last, the two together use $\lg m + 2 \lg s + 2 \lg k + O(1)$ bits, i.e., $\frac{1}{2} \lg m + \lg s + \lg k + O(1)$ bits on average per triple. Each mid uses $\lg k + 2 \lg s + O(1) = \lg s + \lg k + \lg k + O(1)$ bits, which is at most $\frac{1}{2} \lg m + \lg s + \lg k + O(1)$ since $k \le \sqrt{m}$. Hence the total length of the encoding of all $2k$ triples is at most $k \lg m + 2k \lg s + 2k \lg k + O(k)$ bits, as claimed. Decoding then proceeds cycle by cycle: the two type bits identify each triple's role, a head supplies its row and columns directly, and for a mid or last the connecting endpoint lies on a diagonal already known from its predecessor, so its row can be recovered from the stored column as the column minus that diagonal.
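The cycle structure described above can be made concrete with a few lines of Python (our own sketch; the pairing of endpoints on a diagonal is arbitrary, as in the text):

```python
from collections import defaultdict

def cycle_degrees(S):
    """Given a tuple S of triples (i, j, h) in which every diagonal is
    touched an even number of times, pair up endpoints lying on the same
    diagonal and return each triple's degree in the resulting multigraph.
    Every degree is 2, so the graph is a disjoint union of cycles."""
    by_diag = defaultdict(list)
    for idx, (i, j, h) in enumerate(S):
        by_diag[j - i].append(idx)  # endpoint (i, j) lies on diagonal j - i
        by_diag[h - i].append(idx)  # endpoint (i, h) lies on diagonal h - i
    deg = defaultdict(int)
    for endpoints in by_diag.values():
        for a, b in zip(endpoints[::2], endpoints[1::2]):  # arbitrary pairing
            deg[a] += 1
            deg[b] += 1
    return dict(deg)

S = [(0, 0, 1), (2, 2, 3), (0, 0, 1), (2, 2, 3)]  # toy member of F (m = 3, s = 4)
print(cycle_degrees(S))  # {0: 2, 1: 2, 2: 2, 3: 2}
```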

Lower Bound for N Vectors
In this section, we generalize the result of Sect. 2 to obtain a lower bound for preserving all pairwise distances amongst a set of $N$ vectors. Our proof uses Theorem 4 as a building block. Recall that Theorem 4 guarantees that there is a unit vector $x$ such that

$$\Pr\left[\left|\, \left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2 - 1 \,\right| \ge \varepsilon\right] \ge 2^{-O(\varepsilon \sqrt{m})}.$$

Moreover, the vector $x$ has non-zeroes only in the first $O(\sqrt{m})$ coordinates. From such a vector $x$, define $x^{\rightarrow i}$ as the vector having its $j$'th coordinate equal to the $(j - i)$'th coordinate of $x$ if $j > i$, and otherwise its $j$'th coordinate is 0. In words, $x^{\rightarrow i}$ is just $x$ with all coordinates shifted by $i$.

Now assume $n \ge C' N m$ for a sufficiently large constant $C'$ (otherwise $m = \Omega(n/N)$ and we are done). Then we can pick shifts $i_1 < i_2 < \cdots < i_{N-1}$, spaced $\Theta(m)$ apart, such that the embeddings $\tfrac{1}{\sqrt{m}} T D x^{\rightarrow i_\ell}$ depend on pairwise disjoint sets of the random variables $t_a$ and $d_j$. Let $X$ consist of the 0-vector together with the $N - 1$ shifted copies $x^{\rightarrow i_1}, \ldots, x^{\rightarrow i_{N-1}}$. The 0-vector clearly maps to the 0-vector when using $\tfrac{1}{\sqrt{m}} T D$ as embedding. Furthermore, by the arguments above, the embeddings of all the remaining vectors are independent and have the same distribution. It follows that

$$\Pr\left[\forall x^{\rightarrow i} \in X : \left|\, \left\|\tfrac{1}{\sqrt{m}} T D x^{\rightarrow i}\right\|_2^2 - 1 \,\right| \le \varepsilon\right] \le \left(1 - 2^{-O(\varepsilon \sqrt{m})}\right)^{N-1}.$$

Now since $0 \in X$, it follows that to preserve all pairwise distances amongst vectors in $X$ to within $(1 \pm \varepsilon)$, we also have to preserve all norms to within $(1 \pm \varepsilon)$. This is true since for all $x$ of unit norm:

$$\left\|\tfrac{1}{\sqrt{m}} T D (x - 0)\right\|_2^2 = \left\|\tfrac{1}{\sqrt{m}} T D x\right\|_2^2.$$

This proves the following.

Theorem 5 Let $T$ and $D$ be the $m \times n$ Toeplitz and $n \times n$ diagonal matrix in the embedding proposed by [13], and let $n \ge C' N m$ for a sufficiently large constant $C'$. Then there exists a set $X \subset \mathbb{R}^n$ of $N$ vectors such that

$$\Pr\left[\forall x, y \in X : \left|\, \left\|\tfrac{1}{\sqrt{m}} T D (x - y)\right\|_2^2 - \|x - y\|_2^2 \,\right| \le \varepsilon \|x - y\|_2^2\right] \le \left(1 - 2^{-O(\varepsilon \sqrt{m})}\right)^{N-1}.$$

It follows from Theorem 5 that if we want to have constant probability of successfully embedding any set of $N$ vectors, then either it must be the case that $m = \Omega(n/N)$, or

$$\left(1 - 2^{-C_0 \varepsilon \sqrt{m}}\right)^{N-1} = \Omega(1),$$

where $C_0$ is a constant. This in turn implies that $2^{-C_0 \varepsilon \sqrt{m}} = O(1/N)$, i.e., $\varepsilon \sqrt{m} = \Omega(\lg N)$, and hence $m = \Omega(\varepsilon^{-2} \lg^2 N)$. This completes the proof of Theorem 3.
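To close, a small simulation (our own; parameters chosen for illustration only) of the shifted-copies construction shows the success probability decaying as $N$ grows, matching Theorem 5:

```python
import numpy as np

rng = np.random.default_rng(3)

def embed(x, t, d, m):
    """(1/sqrt(m)) T D x with Toeplitz entry (i, j) = t_{j-i}."""
    n = x.size
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    return (t[(j - i) + (m - 1)] @ (d * x)) / np.sqrt(m)

m, N, eps = 36, 8, 0.5
s = 4 * max(2, int(eps * np.sqrt(m)))
n = 3 * N * m                                   # n = Omega(N m): room for disjoint shifts
base = np.zeros(n)
base[:s] = 1.0 / np.sqrt(s)                     # hard vector from Theorem 4
X = [np.roll(base, 2 * m * i) for i in range(N - 1)]  # shifted copies, spaced 2m apart

successes, trials = 0, 500
for _ in range(trials):
    t = rng.choice([-1.0, 1.0], size=m + n - 1)
    d = rng.choice([-1.0, 1.0], size=n)
    if all(abs(np.linalg.norm(embed(x, t, d, m)) ** 2 - 1.0) <= eps for x in X):
        successes += 1
print(successes / trials)                       # decays as N grows
```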