
Near-Optimal Averaging Samplers and Matrix Samplers

Zhiyang Xun, Department of Computer Science, University of Texas at Austin, TX, USA
David Zuckerman, Department of Computer Science, University of Texas at Austin, TX, USA
Abstract

We present the first efficient averaging sampler that achieves asymptotically optimal randomness complexity and near-optimal sample complexity. For any $\delta<\varepsilon$ and any constant $\alpha>0$, our sampler uses $m+O(\log(1/\delta))$ random bits to output $t=O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}\right)$ samples $Z_1,\ldots,Z_t\in\{0,1\}^m$ such that for any function $f:\{0,1\}^m\to[0,1]$,

$$\Pr\left[\left|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}[f]\right|\le\varepsilon\right]\ge 1-\delta.$$

The randomness complexity is optimal up to a constant factor, and the sample complexity is optimal up to the $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{\alpha}\right)$ factor.

Our technique generalizes to matrix samplers. A matrix sampler is defined similarly, except that $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ and the absolute value is replaced by the spectral norm. Our matrix sampler achieves randomness complexity $m+\tilde{O}(\log(d/\delta))$ and sample complexity $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ for any constant $\alpha>0$, both near-optimal with only a logarithmic factor in randomness complexity and an additional $\alpha$ exponent on the sample complexity.

We use known connections with randomness extractors and list-decodable codes to give applications to these objects. Specifically, we give the first extractor construction with optimal seed length up to an arbitrarily small constant factor above 1, when the min-entropy k=βn for a large enough constant β<1. Finally, we generalize the definition of averaging sampler to any normed vector space.

Keywords and phrases:
Pseudorandomness, Averaging Samplers, Randomness Extractors
Funding:
Zhiyang Xun: Supported by NSF Grant CCF-2312573, a Simons Investigator Award (#409864, David Zuckerman), NSF award CCF-2008868 and the NSF AI Institute for Foundations of Machine Learning (IFML).
David Zuckerman: Supported by NSF Grant CCF-2312573 and a Simons Investigator Award (#409864).
Copyright and License:
[Uncaptioned image] © Zhiyang Xun and David Zuckerman; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Pseudorandomness and derandomization
Acknowledgements:
We thank Kuan Cheng for introducing us to the matrix sampler problem. We thank Shravas Rao for simplifying and slightly improving Lemma 30 (although this didn’t improve our final result). We thank Oded Goldreich, Dana Moshkovitz, Amnon Ta-Shma, Salil Vadhan, and anonymous reviewers for helpful comments.
Editors:
Srikanth Srinivasan

1 Introduction

Randomization plays a crucial role in computer science, offering significant benefits across various applications. However, obtaining true randomness can be challenging. It’s therefore natural to study whether we can achieve the benefits of randomization while using few random bits.

One of the most basic uses of randomness is sampling. Given oracle access to an arbitrary function $f:\{0,1\}^m\to[0,1]$ on a large domain, our goal is to estimate its average value. By drawing $t=O(\log(1/\delta)/\varepsilon^{2})$ independent random samples $Z_1,\ldots,Z_t\in\{0,1\}^m$, the Chernoff bound guarantees that $\left|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\right|\le\varepsilon$ with probability at least $1-\delta$. This method uses full independence in sampling, but more efficient strategies can be pursued. This leads to the following definition:
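For contrast with the samplers constructed in this paper, the following is a minimal sketch (in Python, with illustrative names) of the naive fully independent estimator just described; it spends $m$ fresh random bits per sample.

```python
import math
import random

def independent_sampler(m, eps, delta, C=2.0):
    """Draw t = ceil(C * log(2/delta) / eps^2) independent uniform points of {0,1}^m.

    The constant C is illustrative; Chernoff gives some absolute constant."""
    t = math.ceil(C * math.log(2.0 / delta) / eps ** 2)
    return [random.getrandbits(m) for _ in range(t)]

def estimate_mean(f, samples):
    """Empirical average of f over the sample points."""
    return sum(f(z) for z in samples) / len(samples)
```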

Definition 1 ([6]).

A function $\mathrm{Samp}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ is a $(\delta,\varepsilon)$ averaging sampler with $t$ samples using $n$ random bits if for every function $f:\{0,1\}^m\to[0,1]$, we have

$$\Pr_{(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)}\left[\left|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right|\le\varepsilon\right]\ge 1-\delta.$$

The goal is to construct explicit samplers using a small number of random bits that have sample complexity close to optimal. Researchers have made significant progress toward this goal, and a summary is presented in Table 1. Bellare and Rompel [6] suggested that interesting choices of parameters are $\delta=\exp(-\mathrm{poly}(m))$ and $\varepsilon=1/\mathrm{poly}(m)$, which allow us to use $\mathrm{poly}(m)$ random bits and generate $\mathrm{poly}(m)$ samples. For simplicity, we assume $\delta\le\varepsilon$ throughout the paper (see Remark 10 for further discussion).

Table 1: Comparison of averaging samplers, $\alpha$ any positive constant.

| Reference | Method | Random Bits | Sample Complexity |
|---|---|---|---|
| [10] | Lower Bound | $m+\log(1/\delta)-\log(O(t))$ | $\Omega(\log(1/\delta)/\varepsilon^{2})$ |
| [10] | Non-Explicit | $m+2\log(2/\delta)+\log\log(1/\varepsilon)$ | $2\log(4/\delta)/\varepsilon^{2}$ |
| Standard | Full Independence | $O(m\log(1/\delta)/\varepsilon^{2})$ | $O(\log(1/\delta)/\varepsilon^{2})$ |
| [12] | Pairwise Independence | $O(m+\log(1/\delta))$ | $O(1/(\delta\varepsilon^{2}))$ |
| [17] | Expander Walks | $m+O(\log(1/\delta)/\varepsilon^{2})$ | $O(\log(1/\delta)/\varepsilon^{2})$ |
| [6] | Iterated Sampling | $O(m+(\log m)\log(1/\delta))$ | $\mathrm{poly}(1/\varepsilon,\log(1/\delta),\log m)$ |
| [43] | Hash-Based Extractors | $(1+\alpha)(m+\log(1/\delta))$ | $\mathrm{poly}(1/\varepsilon,\log(1/\delta),m)$ |
| [31] | Zig-Zag Extractors | $m+(1+\alpha)\log(1/\delta)$ | $\mathrm{poly}(1/\varepsilon,\log(1/\delta))$ |
| Theorem 1 | Compose [31] with Almost $\ell$-wise Ind. | $m+O(\log(1/\delta))$ | $O((\log(1/\delta)/\varepsilon^{2})^{1+\alpha})$ |

The best existing randomness-efficient averaging sampler comes from the equivalence between averaging samplers and extractors [43], which we will elaborate on later in the paper. Improving Zuckerman’s construction, Reingold, Vadhan, and Wigderson [31] provided a (δ,ε) averaging sampler for domain {0,1}m that uses m+(1+α)log(1/δ) random bits for any positive constant α. This almost matches the lower bound in [10]. However, a notable gap remains in sample complexity: the existing construction’s complexity poly(1/ε,log(1/δ)) does not align with the optimal O(log(1/δ)/ε2). This raised the following open problem (see, e.g., [41, Open Problem 4.24], [18, Section 6]):

Problem 1.

Can we explicitly design a (δ,ε) averaging sampler for domain {0,1}m that uses O(m+log(1/δ)) random bits and only O(log(1/δ)/ε2) samples?

We note that such algorithms do exist for general samplers, which query f and estimate 𝔼f through a more complicated computation than taking the average [5]. However, many applications require the use of averaging samplers, such as the original use in interactive proofs [6]. Beyond these applications, averaging samplers act as a fundamental combinatorial object that relate to other notions such as randomness extractors, expander graphs, and list-decodable codes [43, 39].

1.1 Our Averaging Sampler

In this paper, we construct a polynomial-time computable (δ,ε) averaging sampler with near-optimal sample complexity using an asymptotically optimal number of random bits. In fact, the sampler we constructed is a strong sampler, defined as follows:

Definition 2.

A $(\delta,\varepsilon)$ averaging sampler Samp is strong if for every sequence of $t$ functions $f_1,\ldots,f_t:\{0,1\}^m\to[0,1]$, we have

$$\Pr_{(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)}\left[\left|\frac{1}{t}\sum_i(f_i(Z_i)-\mathbb{E}f_i)\right|\le\varepsilon\right]\ge 1-\delta.$$

We now state our main theorems about averaging samplers, which follow from a more general theorem that is slightly more complicated to state, Theorem 21.

Theorem 1.

For every constant $\alpha>0$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ that uses $m+O(\log(1/\delta))$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}\right)$ samples.

This nearly resolves Problem 1. We also give a sampler with asymptotically optimal sample complexity but a worse randomness complexity.

Theorem 2.

There exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ that uses $m+O\left(\log\frac{1}{\delta}\left(\log\frac{1}{\varepsilon}+\log\log\frac{1}{\delta}\right)\right)$ random bits and $O\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples.

1.2 Matrix Samplers

A natural generalization of the classic Chernoff bound is the Matrix Chernoff Bound [33, 2, 37]. Suppose we wish to estimate $\mathbb{E}f$ for a matrix-valued function $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ satisfying $\|f(x)\|\le 1$. By drawing $t=O(\log(d/\delta)/\varepsilon^{2})$ independent random samples $Z_1,\ldots,Z_t\in\{0,1\}^m$, the Matrix Chernoff Bound guarantees that

$$\Pr\left[\left\|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta,$$

where $\|\cdot\|$ denotes the spectral norm. As in the real-valued case, we wish to derandomize this process without increasing the sample complexity too much. To address this, Wigderson and Xiao [42] initiated the study of randomness-efficient matrix samplers:

Definition 3.

A function $\mathrm{Samp}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ is a $d$-dimensional $(\delta,\varepsilon)$ matrix sampler with $t$ samples using $n$ random bits if the following holds: For any function $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ such that $\|f(x)\|\le 1$ for all $x\in\{0,1\}^m$, we have

$$\Pr_{(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)}\left[\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta.$$
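For concreteness, a minimal sketch of the fully independent estimator that Definition 3 derandomizes; here $f$ is assumed to be a Python callable returning a $d\times d$ numpy array of spectral norm at most 1, and $m$ is assumed small enough to fit in a machine integer.

```python
import numpy as np

def matrix_estimate(f, m, t, rng=None):
    """Average of f over t independent uniform points of {0,1}^m (full independence)."""
    rng = rng or np.random.default_rng()
    zs = rng.integers(0, 2 ** m, size=t)
    return sum(f(int(z)) for z in zs) / t

def spectral_error(estimate, true_mean):
    """Estimation error measured in the spectral norm, as in Definition 3."""
    return np.linalg.norm(estimate - true_mean, 2)
```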

Extending the construction of non-explicit standard averaging samplers [10], we can show that there exists a non-explicit matrix sampler that requires only an additional 2logd bits of randomness compared to averaging samplers while achieving asymptotically optimal sample complexity.

Proposition 4.

There exists a (non-explicit) $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $O\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples and $m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}$ random bits.

However, explicitly constructing randomness-efficient matrix samplers turns out to be very challenging. While a union bound over matrix entries suggests that a randomness-optimal averaging sampler directly implies a randomness-optimal matrix sampler (see Lemma 17), this method incurs an unavoidable d2 factor in sample complexity, making the dependence on d exponentially worse than optimal. This raises an open question: can we construct a matrix sampler with (nearly) optimal randomness complexity and polynomial sample complexity, analogous to the averaging samplers in [6] and [43]?

Problem 2.

Can we explicitly design a d-dimensional (δ,ε) matrix sampler for domain {0,1}m that uses O(m+log(d/δ)) random bits and poly(1/ε,log(1/δ),logd) samples?

Table 2: Comparison of matrix samplers, $\alpha$ any positive constant, $\varepsilon=1/\mathrm{poly}(m)$, $\delta=\exp(-\mathrm{poly}(m))$, ignoring lower order terms. The complexity of the union bound sampler depends on the complexity of the "base" averaging sampler, and we use the bound in Theorem 1 here.

| Reference | Method | Random Bits | Sample Complexity |
|---|---|---|---|
| Proposition 4 | Non-Explicit | $m+2\log(1/\delta)+2\log d$ | $O(\log(d/\delta)/\varepsilon^{2})$ |
| Standard | Full Independence | $O(m\log(d/\delta)/\varepsilon^{2})$ | $O(\log(d/\delta)/\varepsilon^{2})$ |
| [42] | Union Bound Over Entries | $m+O(\log(d/\delta))$ | $O((d/\varepsilon)^{2+\alpha}\log^{1+\alpha}(1/\delta))$ |
| [16] | Expander Walks | $m+O((1/\varepsilon^{2})\log(d/\delta))$ | $O(\log(d/\delta)/\varepsilon^{2})$ |
| Theorem 3 | Iterated Sampler Composition | $m+O(\log(1/\delta)+\log d\log\log d)$ | $O((\log(d/\delta)/\varepsilon^{2})^{1+\alpha})$ |

We summarize prior matrix sampler constructions in Table 2. The best existing construction, a matrix analog of the expander walks sampler, was provided by Garg, Lee, Song, and Srivastava [16]. Similar to expander walks for real-valued sampling, this construction gives asymptotically optimal sample complexity, but the randomness complexity is worse than optimal by a $\mathrm{poly}(1/\varepsilon)$ factor. We note that even if we allow the matrix sampler to be non-averaging, no better construction is currently known.

In this work, we construct a polynomial-time computable $(\delta,\varepsilon)$ matrix sampler with near-optimal randomness and sample complexity. The randomness complexity is optimal up to a logarithmic factor, and the sample complexity is within a $\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{\alpha}$ factor of optimal for an arbitrarily small constant $\alpha>0$. This brings us close to resolving Problem 2.

Theorem 3.

For any constant $\alpha>0$: There exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ that uses $m+O(\log(1/\delta)+\log(d/\varepsilon)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ samples.

Additionally, we construct a matrix sampler achieving asymptotically optimal randomness complexity, though at the cost of increased sample complexity. This breaks the d2 barrier in sample complexity for randomness-optimal matrix samplers.

Theorem 4.

For any constant $\alpha>0$, there exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ that uses $m+O(\log(d/\delta))$ random bits and $O\left(\frac{d^{\alpha}}{\varepsilon^{2+\alpha}}\log^{1+\alpha}\frac{1}{\delta}\right)$ samples.

1.3 Samplers for General Normed Vector Space

Apart from spectral norms of matrices, it is natural to study the averaging-sampling problem in other normed vector spaces V.

Definition 5.

A function $\mathrm{Samp}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ is a $(V,\mathcal{F})$-sampler for a normed space $V$ and a class of functions $\mathcal{F}\subseteq\{f:\{0,1\}^m\to V\}$ if, for every $f\in\mathcal{F}$,

$$\Pr_{(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)}\left[\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta.$$

We call Samp a $V$-sampler when $\mathcal{F}=\{f:\{0,1\}^m\to V\mid \|f(x)\|\le 1\text{ for all }x\}$.

Under this definition, a $d$-dimensional matrix sampler is precisely a $\mathbb{C}^{d\times d}$-sampler under the spectral norm $\|\cdot\|_2$. Previous work also studied $(\mathbb{R},\mathcal{F})$-samplers for broader classes $\mathcal{F}$, such as subgaussian or subexponential real-valued functions [8, 1]. Extending our construction to other normed spaces and broader function classes remains an interesting direction for future research.

1.4 Randomness Extractors

Our sampler construction has implications for randomness extractors. A randomness extractor is a function that extracts almost-uniform bits from a low-quality source of randomness. We define the quality of a random source as its min-entropy.

Definition 6.

The min-entropy of a random variable X is

$$H_{\infty}(X):=\min_{x\in\mathrm{supp}(X)}\log\left(\frac{1}{\Pr[X=x]}\right).$$

An (n,k)-source is a random variable on n bits with min-entropy at least k.

Then a randomness extractor is defined as:

Definition 7 ([28]).

A function $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ is a $(k,\varepsilon)$ extractor if for every $(n,k)$-source $X$, the distribution $\mathrm{Ext}(X,U_d)\approx_{\varepsilon}U_m$. We say Ext is a strong $(k,\varepsilon)$ extractor if for every $(n,k)$-source $X$, the distribution $(\mathrm{Ext}(X,Y),Y)\approx_{\varepsilon}U_{m+d}$, where $Y$ is chosen from $U_d$.

Randomness extractors are essential tools in theoretical computer science. However, there has been little study of explicit extractors with the right dependence on $\varepsilon$ for vanishing $\varepsilon$. This is a particular concern in cryptography, where extractors are widely used as building blocks and security requirements demand superpolynomially small $\varepsilon$ [26, 40, 9, 14, 24, 23, 15]. Existentially, there are extractors with seed length $d=\log(n-k)+2\log(1/\varepsilon)+O(1)$, and there is a matching lower bound [29].

Zuckerman [43] showed that averaging samplers are essentially equivalent to extractors. Specifically, an extractor $\mathrm{Ext}:\{0,1\}^n\times[2^d]\to\{0,1\}^m$ can be seen as a sampler that generates $\mathrm{Ext}(X,i)$ as its $i$-th sample point using the random source $X$. Using this equivalence, we give the first extractor construction with optimal seed length up to an arbitrarily small constant factor bigger than 1, when the min-entropy $k=\beta n$ for a large enough constant $\beta<1$.
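A minimal sketch of this sampler-to-extractor reading (illustrative names; the seed is simply the sample index):

```python
def extractor_from_sampler(samp):
    """samp(x) returns the list of t sample points; the seed is an index in [t]."""
    def ext(x, i):
        return samp(x)[i]
    return ext
```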

Theorem 5.

For every constant $\alpha>0$, there exists a constant $\beta<1$ such that for all $\varepsilon>0$ and $k\ge\beta n$, there is an efficient strong $(k,\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $m=\Omega(k)-\log(1/\varepsilon)$ and $d=(1+\alpha)\log(n-k)+(2+\alpha)\log(1/\varepsilon)+O(1)$.

Prior to our work, extractors with a seed length dependence on $\varepsilon$ achieving $2\log(1/\varepsilon)$ or close to it were based on the leftover hash lemma [7, 21, 20] and expander random walks [17, 44]. Extractors using the leftover hash lemma have a seed length of $n+2\log(1/\varepsilon)$, which is far from optimal. Expander random walks give a $(k,\varepsilon)$ extractor with $k>(1-\Omega(\varepsilon^{2}))n$ and an optimal seed length of $\log(n-k)+2\log(1/\varepsilon)+O(1)$. Our extractor is better than expander walks for all vanishing $\varepsilon$ by allowing smaller entropy $k$.

In fact, if we aim to remove the $\alpha$ and achieve the optimal seed length of $\log(n-k)+2\log(1/\varepsilon)+O(1)$ to match expander random walks, we can set $s=2$ in Theorem 21 and get the following extractor for entropy rate $1-O(1/\log n)$ for $\varepsilon\ge 1/\mathrm{poly}(n)$:

Theorem 6.

There exists a constant $\beta<1$ such that for all $\varepsilon>0$ and $k\ge\left(1-\frac{\beta}{\log n+\log(1/\varepsilon)}\right)n$, there is an efficient strong $(k,\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $m=\Omega(k)-\log^{2}(1/\varepsilon)$ and $d=\log(n-k)+2\log(1/\varepsilon)+O(1)$.

This is better than expander random walks' entropy rate of $1-O(\varepsilon^{2})$ for all $\varepsilon=o(1/\log n)$.

1.5 List-Decodable Codes

Another perspective on averaging samplers is its connection to error-correcting codes. Ta-Shma and Zuckerman [35] showed that strong randomness extractors are equivalent to codes with good soft-decision decoding, which is related to list recovery. From this perspective, the composition scheme in our construction is similar to code concatenation.

For codes over the binary alphabet, soft decision decoding amounts to list decodability, which we focus on here.

We give good list-decodable codes without using the composition. That is, by just applying our almost $\ell$-wise independence sampler on the binary alphabet, we can get a binary list-decodable code with rate $\Omega(\varepsilon^{2+\alpha})$ and non-trivial list size, although the list size is still exponential.

Theorem 7.

For every constant $\alpha>0$: there exists an explicit binary code with rate $\Omega(\varepsilon^{2+\alpha})$ that is $(\rho=\frac{1}{2}-\varepsilon,L)$ list-decodable with list size $L=2^{(1-c)n}$ for some constant $c=c(\alpha)>0$.

Prior to our work, the best known code rate was $\Omega(\varepsilon^{3})$ by Guruswami and Rudra [19]. We emphasize that their code achieved a list size of $L=\mathrm{poly}(n)$, while our list size is exponentially large, making our code unlikely to be useful.

1.6 Techniques

1.6.1 Averaging Samplers

Our construction of the averaging sampler is very simple, and is based on two observations:

  1. Rather than querying every sample point produced by a sampler Samp, we can use an inner sampler $\mathrm{Samp}_{\mathrm{in}}$ to select a subset of samples for querying. This sub-sampling approach has been utilized in previous sampler constructions [6, 18]. Although $\mathrm{Samp}_{\mathrm{in}}$ incurs an additional randomness cost, the final sample complexity depends only on $\mathrm{Samp}_{\mathrm{in}}$, leading to reduced overall sample complexity. Since the domain of $\mathrm{Samp}_{\mathrm{in}}$ is much smaller than the original domain, we can leverage more efficient sampling strategies.

  2. The bottleneck of generating an almost $\ell$-wise independent sequence over a large domain $\{0,1\}^m$ lies in sampling $\ell$ independent random points, which costs $\ell m$ random bits. Since we can only afford $O(m)$ random bits, we are restricted to generating constant-wise independent samples. However, for a much smaller domain, we can use few random bits to generate an almost $\ell$-wise independent sequence for large $\ell$.

Our construction is outlined as follows. Let $\mathrm{Samp}_{\mathrm{E}}:\{0,1\}^n\times[t]\to\{0,1\}^m$ be the extractor-based sampler in [31]. Let $Y_1,\ldots,Y_{t'}$ be an almost $\ell$-wise independent sequence over domain $[t]$, thinking of $t'\ll t$. Our sampler is then defined by

$$\mathrm{Samp}:=(\mathrm{Samp}_{\mathrm{E}}(X,Y_1),\mathrm{Samp}_{\mathrm{E}}(X,Y_2),\ldots,\mathrm{Samp}_{\mathrm{E}}(X,Y_{t'})).$$

In this construction, we use the almost $\ell$-wise independent sequence to sub-sample from the extractor-based sampler. This can be viewed as a composition, similar to other cases such as Justesen codes [22] and the first PCP theorem [4], where the goal is to optimize two main parameters simultaneously by combining two simpler schemes, each optimizing one parameter without significantly compromising the other.
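A minimal sketch of this sub-sampling composition; both `samp_E` (the extractor-based outer sampler of [31]) and `almost_lwise_seq` (a Naor–Naor style generator) are assumed here as black boxes with illustrative names.

```python
def composed_sampler(x, y_bits, samp_E, almost_lwise_seq, t_prime):
    """x: randomness for the outer sampler; y_bits: randomness for the inner sequence.

    Returns the t' sub-sampled points Samp_E(x, Y_1), ..., Samp_E(x, Y_{t'})."""
    ys = almost_lwise_seq(y_bits, t_prime)   # indices Y_i in [t], almost l-wise independent
    return [samp_E(x, y) for y in ys]        # query only the selected sample points
```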

Previous works have applied almost $\ell$-wise independence in extractor constructions. Srinivasan and Zuckerman [34] proved a randomness-efficient leftover hash lemma by sampling an almost $\ell$-wise independent function $f$ using a uniform seed $U_d$ and outputting $f(X)$, where $X$ is the weak random source. From an extractor perspective, our inner sampler takes an inverse approach: we use $X$ to pick a function $f$ in the space of almost $\ell$-wise independent functions, and then output $f(U_d)$. Furthermore, Raz's two-source extractor [30] follows a more general framework, where two weak random sources are used – one to sample an almost $\ell$-wise independent function and the other as its input. However, directly applying Raz's error bound in our analysis (Lemma 20) results in a sample complexity that is off by a $\log(1/\delta)$ factor.

1.6.2 Matrix Samplers

Using the connection between averaging samplers and matrix samplers (see Lemma 17), our averaging sampler directly implies a $(\delta,\varepsilon)$ matrix sampler using $m+O(\log(d/\delta))$ random bits and $O\left(\left(\frac{d^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}\right)$ samples. This already gives the best randomness-optimal matrix sampler to date; however, its sample complexity has exponentially worse dependence on $d$ than optimal.

Our sub-sampling technique using almost $\ell$-wise independence offers a way to further reduce sample complexity. The composition of samplers only depends on the triangle inequality, which also applies to spectral norms. The remaining task is to verify that almost $\ell$-wise independence also provides good concentration bounds for matrix sampling, which is straightforward given the extensive literature on moment inequalities for random matrices [11, 25, 38].

Applying this composition, we get a $(\delta,\varepsilon)$ matrix sampler using $m+O(\log(d/\delta))$ random bits and $O\left(\frac{d^{\alpha}}{\varepsilon^{2+\alpha}}\log^{1+\alpha}\frac{1}{\delta}\right)$ samples, as described in Theorem 4. This is close to optimal for cases where $d<\mathrm{poly}(1/\varepsilon,\log(1/\delta))$, though it is not yet sufficient for larger $d$.

However, we can apply the composition recursively. By repeating the composition $O(\log\log d)$ times, the dependence on $d$ becomes $d^{\alpha^{O(\log\log d)}}=O(1)$. Each round of composition costs an additional $O(\log(d/\delta))$ random bits, resulting in a $(\delta,\varepsilon)$ matrix sampler using $m+O(\log(d/\delta)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ samples. This already gives a matrix sampler using $m+\tilde{O}(\log(d/\delta))$ random bits and near-optimal sample complexity.
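As a quick sanity check of the recursion (assuming each round is applied with a constant $\alpha\le 1/2$ and $r=\log_2\log_2 d$ rounds), the exponent of $d$ decays as

$$d^{\alpha^{r}}\;\le\;d^{(1/2)^{\log_2\log_2 d}}\;=\;d^{1/\log_2 d}\;=\;2\;=\;O(1).$$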

To further improve the dependence on δ in randomness complexity and achieve the bound in Theorem 3, we introduce an alternative way of composing samplers:

Proposition 8.

Suppose we are given two efficient matrix samplers:

  • Let $\mathrm{Samp}_{\mathrm{out}}:\{0,1\}^{n_1}\times[t_1]\to\{0,1\}^m$ be a $(\delta_1,\varepsilon_1)$ matrix sampler.

  • Let $\mathrm{Samp}_{\mathrm{in}}:\{0,1\}^{n_2}\times[t_2]\to\{0,1\}^{n_1}$ be a $(\delta_2,\varepsilon_2)$ averaging sampler.

Then, for a uniformly random source $X\sim U_{n_2}$,

$$\mathrm{Samp}(X):=(\mathrm{Samp}_{\mathrm{out}}(\mathrm{Samp}_{\mathrm{in}}(X,i),j))_{i\in[t_2],\,j\in[t_1]}$$

is an efficient $(\delta_2,\,2\delta_1+2\varepsilon_2+\varepsilon_1)$ matrix sampler for domain $\{0,1\}^m$ with $t_1t_2$ samples using $n_2$ random bits.

This essentially says, composing a good (ε,ε) matrix sampler Sampout and a good (δ,ε) standard averaging sampler Sampin would give a good (δ,O(ε)) matrix sampler. Although this slightly increases the sample complexity, we can use our sub-sampling technique to reduce it later.

Unlike sub-sampling, in the composition of Proposition 8,Sampin generates multiple random seeds for Sampout, and we query all the samples it produces. This approach effectively reduces the error probability of Sampout from ε to δ. The key idea is that only an O(ε) fraction of the seeds generated by Sampin lead to failure in Sampout, contributing only a tolerable O(ε) additive error in the estimate of 𝔼f. The reasoning is straightforward: at most an ε fraction of all possible seeds for Sampout cause failure, and with probability 1δ, Sampin does not oversample these failure seeds by more than an additional ε proportion. As a result, the final proportion of failure seeds remains bounded by O(ε).
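A minimal sketch of this seed-sampling composition from Proposition 8; both samplers are assumed here as black boxes with illustrative names, and every sample of the outer sampler is queried.

```python
def seed_composed_sampler(x, samp_in, samp_out, t2, t1):
    """x: randomness for the inner averaging sampler, which produces t2 seeds
    for the outer matrix sampler; all t1*t2 resulting points are returned."""
    seeds = [samp_in(x, i) for i in range(t2)]
    return [samp_out(seed, j) for seed in seeds for j in range(t1)]
```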

 Remark 9.

We can also define strong matrix samplers as a matrix analog of strong averaging samplers. All results for matrix samplers in this paper would hold for strong matrix samplers as well, with proofs following similar arguments. However, for simplicity, we present our results in the non-strong case only.

2 Preliminaries

Notation. We use $[t]$ to represent the set $\{1,\ldots,t\}$. For an integer $m$, $U_m$ is a random variable distributed uniformly over $\{0,1\}^m$. For random variables $X$ and $Y$, we write $X\approx_{\varepsilon}Y$ to denote that the statistical distance (total variation distance) between $X$ and $Y$ is at most $\varepsilon$, i.e.,

$$\max_{T\subseteq\mathrm{supp}(X)}\left|\Pr_{x\sim X}[x\in T]-\Pr_{y\sim Y}[y\in T]\right|\le\varepsilon.$$

We refer to an algorithm as “efficient” if it is polynomial-time computable. For simplicity, we omit domain sizes for samplers and matrix dimensions when context permits. Unless otherwise specified, statements such as “there exists a (δ,ε) sampler” mean that for all 0<δε<1, there exists a (δ,ε) sampler with the stated properties.

 Remark 10.

The condition $\delta\le\varepsilon$ is very mild and holds in nearly all applications. This requirement can be relaxed to $\delta\le\varepsilon^{\alpha}$ for averaging samplers, and to $\delta\le d\varepsilon^{\alpha}$ for matrix samplers, where $\alpha$ is an arbitrarily small positive constant. Such relaxations do not alter the results.

In the extreme case where $\delta>\varepsilon^{\alpha}$ for every constant $\alpha>0$, pairwise independence is already a near-optimal averaging sampler (see Lemma 12). Specifically, this yields an efficient strong sampler with $O(1/(\delta\varepsilon^{2}))\le O(1/\varepsilon^{2+\alpha})$ samples, using only $O(m+\log(1/\varepsilon))$ random bits. Similarly, for matrix samplers under the condition $\delta>d\varepsilon^{\alpha}$ for all $\alpha>0$, pairwise independence also achieves near-optimal efficiency with $O(1/\varepsilon^{2+\alpha})$ samples and $O(m+\log(1/\varepsilon))$ random bits.

2.1 Extractor-Based Sampler

As mentioned above, averaging samplers are equivalent to extractors. We will introduce this in detail in Section 5.1. Reingold, Vadhan, and Wigderson used this equivalence to achieve the following:

Theorem 11 ([31, Corollary 7.3], see also [18, Theorem 6.1]).

For every constant α>0, there exists an efficient (δ,ε) averaging sampler over {0,1}m with poly(1/ε,log(1/δ)) samples using m+(1+α)log2(1/δ) random bits.

For ease of presentation, we often denote an extractor-based averaging sampler by $\mathrm{Samp}_{\mathrm{E}}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$, where $\mathrm{Samp}_{\mathrm{E}}(X,i)$ is the $i$-th output sample point of the sampler using randomness input $X$. Therefore, the sample complexity of $\mathrm{Samp}_{\mathrm{E}}$ is $2^d$.

2.2 Almost -wise Independence

A sequence Z1,,Zt is pairwise independent if the marginal distribution of every pair (Zi1,Zi2) is uniformly random. Chor and Goldreich [12] proved that using pairwise independence, we can have a sampler using few random bits but with unsatisfying sample complexity.

Lemma 12 ([12]).

For all $\delta,\varepsilon>0$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ with $O(1/(\delta\varepsilon^{2}))$ samples using $O(m+\log(1/\delta)+\log(1/\varepsilon))$ random bits.

Generalizing pairwise independence, an almost $\ell$-wise independent sequence is a sequence of random variables such that the marginal distribution of every $\ell$ of them is close to uniform.

Definition 13 ([27]).

A sequence of random variables $Z_1,\ldots,Z_t\in\{0,1\}^m$ is said to be $\gamma$-almost $\ell$-wise independent if for all subsets $S\subseteq[t]$ such that $|S|\le\ell$,

$$(Z_i)_{i\in S}\approx_{\gamma}U_{m\cdot|S|}.$$

In particular, the pairwise independent sequence mentioned above is a 0-almost 2-wise independent sequence. Naor and Naor proved that such sequences can be randomness-efficiently generated.

Lemma 14 ([3]).

There exists an efficient algorithm that uses $(2+o(1))\left(\frac{m\ell}{2}+\log\log t\right)+2\log\frac{1}{\gamma}$ random bits to generate a $\gamma$-almost $\ell$-wise independent sequence $z_1,\ldots,z_t\in\{0,1\}^m$.

Using standard techniques, we have the following concentration bound for almost $\ell$-wise independent sequences (see Appendix B for the proof). Similar bounds for exact $\ell$-wise independent sequences have been shown in [6, 13].

Lemma 15.

Let $Z_1,\ldots,Z_t\in\{0,1\}^m$ be a sequence of $\gamma$-almost $\ell$-wise independent variables for an even integer $\ell$. Then for every sequence of functions $f_1,\ldots,f_t:\{0,1\}^m\to[0,1]$,

$$\Pr\left[\left|\frac{1}{t}\sum_{i=1}^{t}(f_i(Z_i)-\mathbb{E}f_i)\right|\le\varepsilon\right]\ge 1-\left(\frac{25\ell}{\varepsilon^{2}t}\right)^{\ell/2}-\frac{\gamma}{\varepsilon^{\ell}}.$$

2.3 Composition of Samplers

The idea of composing samplers has been studied before. More specifically, Goldreich proved the following proposition.

Proposition 16 ([18]).

Suppose we are given two efficient samplers:

  • A $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ with $t_1$ samples using $n_1$ random bits.

  • A $(\delta',\varepsilon')$ averaging sampler for domain $\{0,1\}^{\log t_1}$ with $t_2$ samples using $n_2$ random bits.

Then, there exists an efficient $(\delta+\delta',\varepsilon+\varepsilon')$ averaging sampler for domain $\{0,1\}^m$ with $t_2$ samples using $O(n_1+n_2)$ random bits.

2.4 Averaging Samplers Imply Matrix Samplers

When Wigderson and Xiao first introduced matrix samplers, they observed that an averaging sampler also functions as a matrix sampler with weaker parameters, though they did not provide a formal proof. We formalize this observation below:

Lemma 17.

A $(\delta,\varepsilon)$ averaging sampler is a $d$-dimensional $(2d^{2}\delta,\,2d\varepsilon)$ matrix sampler.

The proof is presented in Appendix C.

3 Construction of Averaging Samplers

Our construction is based on a reduction lemma that constructs a sampler for domain {0,1}m based on a sampler for domain {0,1}O(log(1/ε)+loglog(1/δ)). We exploit the fact that when composing averaging samplers, the final sample complexity depends on only one of the samplers. Our strategy is:

  • Apply the extractor sampler in Theorem 11 as a $(\delta/2,\varepsilon/2)$ sampler over domain $\{0,1\}^m$. This uses $m+O(\log(1/\delta))$ random bits and generates $\mathrm{poly}(1/\varepsilon,\log(1/\delta))$ samples.

  • By Proposition 16, we only need to design a $(\delta/2,\varepsilon/2)$ averaging sampler over domain $\{0,1\}^{O(\log(1/\varepsilon)+\log\log(1/\delta))}$ using $O(\log(1/\delta))$ random bits. The total sample complexity will be equal to the sample complexity of this sampler. For this sampler, we use almost $\ell$-wise independence.

Following the idea of Proposition 16, we first prove that composing samplers maintains the properties of a strong sampler.

Lemma 18 (Strong Composition).

Suppose we are given two efficient averaging samplers:

  • Let $\mathrm{Samp}_{\mathrm{out}}:\{0,1\}^{n_1}\times[t_1]\to\{0,1\}^m$ be a $(\delta,\varepsilon)$ sampler.

  • Let $\mathrm{Samp}_{\mathrm{in}}:\{0,1\}^{n_2}\times[t_2]\to\{0,1\}^{\log t_1}$ be a strong $(\delta',\varepsilon')$ sampler.

Then, for uniformly random sources $X_1\sim U_{n_1}$ and $X_2\sim U_{n_2}$,

$$\mathrm{Samp}(X_1\circ X_2):=(\mathrm{Samp}_{\mathrm{out}}(X_1,\mathrm{Samp}_{\mathrm{in}}(X_2,i)))_{i\in[t_2]}$$

is an efficient $(\delta+\delta',\varepsilon+\varepsilon')$ strong averaging sampler for domain $\{0,1\}^m$ with $t_2$ samples using $n_1+n_2$ random bits.

Proof.

Let $f_1,\ldots,f_{t_2}:\{0,1\}^m\to[0,1]$ be an arbitrary sequence of functions, and define $f_{\mathrm{avg}}:=\frac{1}{t_2}\sum_{i=1}^{t_2}f_i$. Since $\mathrm{Samp}_{\mathrm{out}}$ is a $(\delta,\varepsilon)$ averaging sampler and $f_{\mathrm{avg}}$ is bounded in $[0,1]$, we have

$$\Pr_{X_1\sim U_{n_1}}\left[\left|\mathbb{E}_{Y\sim U_{\log t_1}}f_{\mathrm{avg}}(\mathrm{Samp}_{\mathrm{out}}(X_1,Y))-\mathbb{E}f_{\mathrm{avg}}\right|\le\varepsilon\right]\ge 1-\delta.$$

Equivalently, we can express this as

$$\Pr_{X_1\sim U_{n_1}}\left[\left|\frac{1}{t_2}\sum_{i=1}^{t_2}\mathbb{E}_{Y\sim U_{\log t_1}}f_i(\mathrm{Samp}_{\mathrm{out}}(X_1,Y))-\frac{1}{t_2}\sum_{i=1}^{t_2}\mathbb{E}f_i\right|\le\varepsilon\right]\ge 1-\delta. \qquad (1)$$

For an arbitrary $x$, view $f_i(\mathrm{Samp}_{\mathrm{out}}(x,\cdot))$ as a $[0,1]$-valued function on domain $\{0,1\}^{\log t_1}$. Therefore, since $\mathrm{Samp}_{\mathrm{in}}(X_2,1),\ldots,\mathrm{Samp}_{\mathrm{in}}(X_2,t_2)$ are generated by a strong $(\delta',\varepsilon')$ sampler,

$$\Pr_{X_2}\left[\left|\frac{1}{t_2}\sum_{i=1}^{t_2}\left(f_i(\mathrm{Samp}_{\mathrm{out}}(x,\mathrm{Samp}_{\mathrm{in}}(X_2,i)))-\mathbb{E}_{Y\sim U_{\log t_1}}f_i(\mathrm{Samp}_{\mathrm{out}}(x,Y))\right)\right|\le\varepsilon'\right]\ge 1-\delta'. \qquad (2)$$

By the triangle inequality and a union bound over equations (1) and (2), we have

$$\Pr_{X_1,X_2}\left[\left|\frac{1}{t_2}\sum_{i=1}^{t_2}\left(f_i(\mathrm{Samp}_{\mathrm{out}}(X_1,\mathrm{Samp}_{\mathrm{in}}(X_2,i)))-\mathbb{E}f_i\right)\right|\le\varepsilon+\varepsilon'\right]\ge 1-\delta-\delta'.$$

This proves that the sampler we constructed is a strong $(\delta+\delta',\varepsilon+\varepsilon')$ averaging sampler.

Instantiating Lemma 18 with the extractor-based sampler from Theorem 11 gives:

Lemma 19 (Reduction Lemma).

For any $\alpha>0$: For a sufficiently large constant $C>0$, suppose there exists an efficient $(\delta',\varepsilon')$ averaging sampler $\mathrm{Samp}_{\mathrm{base}}$ for domain $\{0,1\}^{C(\log(1/\varepsilon)+\log\log(1/\delta))}$ with $t$ samples using $n'$ random bits. Then there exists an efficient $(\delta+\delta',\varepsilon+\varepsilon')$ averaging sampler Samp for domain $\{0,1\}^m$ with $t$ samples using $m+(1+\alpha)\log(1/\delta)+n'$ random bits. Moreover, if $\mathrm{Samp}_{\mathrm{base}}$ is a strong sampler, then Samp is also strong.

Proof.

By Theorem 11, there exists an explicit $(\delta,\varepsilon)$ averaging sampler $\mathrm{Samp}_{\mathrm{E}}:\{0,1\}^{n}\times\{0,1\}^{d}\to\{0,1\}^m$ with

$$n=m+(1+\alpha)\log(1/\delta)\quad\text{and}\quad d=\log(\mathrm{poly}(1/\varepsilon,\log(1/\delta)))\le C(\log\log(1/\delta)+\log(1/\varepsilon))$$

when $C$ is large enough. Therefore, $\mathrm{Samp}_{\mathrm{base}}$ can work for domain $\{0,1\}^d$.

This enables us to apply Lemma 18. The total number of random bits used is $n+n'=m+(1+\alpha)\log(1/\delta)+n'$, and a total of $t$ samples are used.

Next, we show that for domain $\{0,1\}^m$ with $m\le O(\log(1/\varepsilon)+\log\log(1/\delta))$, we can use an almost $\ell$-wise independent sequence to design a strong averaging sampler with near-optimal sample complexity.

Lemma 20.

For any $2\le s<1/\delta$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ with $O\left(\frac{s}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples using $(2+o(1))\frac{(m+2\log(1/\varepsilon))\log(1/\delta)}{\log s}+2\log(1/\delta)$ random bits.

Proof.

We begin by setting $\ell=\frac{2\log(2/\delta)}{\log s}$, $\gamma=\frac{\delta\varepsilon^{\ell}}{2}$, and $t=\frac{50s\log(2/\delta)}{\varepsilon^{2}\log s}$. We then define our sampler by outputting a $\gamma$-almost $\ell$-wise independent sequence $Z_1,\ldots,Z_t\in\{0,1\}^m$. Taking the parameters of Lemma 15, observe

$$\left(\frac{25\ell}{\varepsilon^{2}t}\right)^{\ell/2}=\left(\frac{1}{s}\right)^{\ell/2}=\left(\frac{1}{s}\right)^{\log(2/\delta)/\log s}=\frac{\delta}{2},$$

and

$$\frac{\gamma}{\varepsilon^{\ell}}=\frac{\delta}{2}.$$

Therefore, for every sequence of functions $f_1,\ldots,f_t:\{0,1\}^m\to[0,1]$,

$$\Pr\left[\left|\frac{1}{t}\sum_{i=1}^{t}(f_i(Z_i)-\mathbb{E}f_i)\right|\le\varepsilon\right]\ge 1-\delta.$$

Furthermore, Lemma 14 shows that we have an efficient algorithm that uses only

$$(2+o(1))\left(\frac{m\ell}{2}+\log\log t\right)+2\log\frac{1}{\gamma}=(2+o(1))\frac{(m+2\log(1/\varepsilon))\log(1/\delta)}{\log s}+2\log\frac{1}{\delta}$$

random bits to generate this $\gamma$-almost $\ell$-wise independent sequence.

We remark that the construction in Lemma 20 can be replaced with a perfectly $\ell$-wise independent sequence. This yields a slightly weaker sampler that uses $O\left(\left(\frac{m+\log(1/\varepsilon)+\log\log(1/\delta)}{\log s}+1\right)\log\frac{1}{\delta}\right)$ random bits and $O\left(\frac{s}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples. This looser construction is already sufficient to establish Theorem 1. However, our tighter construction in Lemma 20 applies more broadly, particularly when $m$ is small – for example, in Lemma 36.

Combining Lemma 19 and Lemma 20, we can prove our main result about averaging samplers:

Theorem 21.

For every constant $\alpha>0$, and for any $m\ge 1$, $\delta,\varepsilon>0$, and $2\le s\le 1/\delta$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ that uses

$$m+O\left(\frac{\log(1/\varepsilon)+\log\log(1/\delta)}{\log s}\log\frac{1}{\delta}\right)+(3+\alpha)\log\frac{1}{\delta}$$

random bits and $O((s/\varepsilon^{2})\log(1/\delta))$ samples.

Proof.

By Lemma 19, our goal is to design an efficient strong $(\delta/2,\varepsilon/2)$ averaging sampler $\mathrm{Samp}_{\mathrm{base}}$ for domain $\{0,1\}^{C(\log(1/\varepsilon)+\log\log(1/\delta))}$ for some large enough constant $C$. By Lemma 20, for any $2\le s<1/\delta$, such a sampler exists using $O\left(\frac{s}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples and

$$O\left(\frac{(\log(1/\varepsilon)+\log\log(1/\delta))\log(1/\delta)}{\log s}\right)+2\log\frac{1}{\delta}$$

random bits. Plugging these into Lemma 19 gives the desired bounds.

For an arbitrarily small constant $\alpha$, by setting $s=\varepsilon^{-2\alpha}\log^{\alpha}(1/\delta)$ in Theorem 21, we get Theorem 1 as a corollary.

Theorem 1. [Restated, see original statement.]

For every constant $\alpha>0$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ that uses $m+O(\log(1/\delta))$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}\right)$ samples.
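As a quick check of this substitution (with $s=\varepsilon^{-2\alpha}\log^{\alpha}(1/\delta)$, so that $\log s=2\alpha\log(1/\varepsilon)+\alpha\log\log(1/\delta)$), the two bounds of Theorem 21 become

$$\frac{s}{\varepsilon^{2}}\log\frac{1}{\delta}=\frac{1}{\varepsilon^{2+2\alpha}}\log^{1+\alpha}\frac{1}{\delta}=\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}
\qquad\text{and}\qquad
\frac{\log(1/\varepsilon)+\log\log(1/\delta)}{\log s}\le\frac{1}{\alpha},$$

so for constant $\alpha$ the randomness term is $O(\log(1/\delta))$, as claimed.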

We can also set s=2 in Theorem 21 and get the following sampler with asymptotically optimal sample complexity but a worse randomness complexity.

Theorem 2. [Restated, see original statement.]

There exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ that uses $m+O\left(\log\frac{1}{\delta}\left(\log\frac{1}{\varepsilon}+\log\log\frac{1}{\delta}\right)\right)$ random bits and $O\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples.
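Similarly, a quick check of the $s=2$ instantiation behind Theorem 2: the sample complexity is $\frac{2}{\varepsilon^{2}}\log\frac{1}{\delta}=O\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$, and the randomness complexity is

$$m+O\left(\left(\log\frac{1}{\varepsilon}+\log\log\frac{1}{\delta}\right)\log\frac{1}{\delta}\right)+(3+\alpha)\log\frac{1}{\delta}=m+O\left(\log\frac{1}{\delta}\left(\log\frac{1}{\varepsilon}+\log\log\frac{1}{\delta}\right)\right).$$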

4 Construction of Matrix Samplers

Before moving further, we note that non-explicitly, a good matrix sampler exists. This generalizes the non-explicit sampler given in [10], with the proof deferred to Appendix A.

Proposition 4. [Restated, see original statement.]

There exists a (non-explicit) $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $O\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples and $m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}$ random bits.

Our improved averaging sampler directly implies the best randomness-optimal matrix sampler to date. Applying Lemma 17 to our sampler in Theorem 1 gives:

Lemma 22.

For every constant $\alpha>0$, there exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $m+O(\log(d/\delta))$ random bits and $O\left(\left(\frac{d^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1+\alpha}\right)$ samples.
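One way to read off these parameters, as a back-of-the-envelope instantiation rather than a formal proof: apply Theorem 1 as a $(\delta/(2d^{2}),\,\varepsilon/(2d))$ averaging sampler and invoke Lemma 17, which gives

$$m+O\left(\log\frac{2d^{2}}{\delta}\right)=m+O\left(\log\frac{d}{\delta}\right)\ \text{random bits}
\qquad\text{and}\qquad
O\left(\left(\frac{(2d)^{2}}{\varepsilon^{2}}\log\frac{2d^{2}}{\delta}\right)^{1+\alpha}\right)\ \text{samples}.$$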

However, compared to the optimal sample complexity given in the non-explicit construction, our dependence on d is exponentially worse. As d is potentially very large, our goal is to utilize our composition to reduce the sample complexity while not increasing the randomness complexity too much.

4.1 One-Layer Composition

It is easy to verify that the composition lemma holds for matrices:

Lemma 23 (Matrix Composition).

Suppose we are given two efficient matrix samplers:

  • Let $\mathrm{Samp}_{\mathrm{out}}:\{0,1\}^{n_1}\times[t_1]\to\{0,1\}^m$ be a $(\delta_1,\varepsilon_1)$ matrix sampler.

  • Let $\mathrm{Samp}_{\mathrm{in}}:\{0,1\}^{n_2}\times[t_2]\to\{0,1\}^{\log t_1}$ be a $(\delta_2,\varepsilon_2)$ matrix sampler.

Then, for uniformly random sources $X_1\sim U_{n_1}$ and $X_2\sim U_{n_2}$,

$$\mathrm{Samp}(X_1\circ X_2):=(\mathrm{Samp}_{\mathrm{out}}(X_1,\mathrm{Samp}_{\mathrm{in}}(X_2,i)))_{i\in[t_2]}$$

is an efficient $(\delta_1+\delta_2,\varepsilon_1+\varepsilon_2)$ matrix sampler for domain $\{0,1\}^m$ with $t_2$ samples using $n_1+n_2$ random bits.

The proof is essentially the same as the proof of Lemma 18, since the triangle inequality applies to spectral norms, but since we are not dealing with the strong case we only have to do a union bound for two events, a bad sample from Sampout and a bad sample from Sampin.

The following lemma is the matrix version of Lemma 20, and we delay its proof to Section 4.4.

Lemma 24.

For any $2\le s<d/\delta$, there exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ with $O\left(\frac{s}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples using $O\left(\frac{(m+\log(1/\varepsilon))\log(d/\delta)}{\log s}+\log(d/\delta)\right)$ random bits.

Composing Lemma 22 with Lemma 24 gives us the next theorem.

Lemma 25.

Suppose we have an efficient $d$-dimensional $(\delta_1,\varepsilon_1)$ matrix sampler for domain $\{0,1\}^m$ with $t$ samples using $n$ bits. For any constant $\alpha>0$ such that $(t/\varepsilon_2)^{\alpha}\le d/\delta_2$, we can construct an efficient $d$-dimensional $(\delta_1+\delta_2,\varepsilon_1+\varepsilon_2)$ matrix sampler for domain $\{0,1\}^m$ with $O\left(\frac{t^{\alpha}}{\varepsilon_2^{2+\alpha}}\log\frac{d}{\delta_2}\right)$ samples using $n+O(\log(d/\delta_2))$ bits.

Proof.

When $(t/\varepsilon_2)^{\alpha}\le d/\delta_2$, by setting $s=(t/\varepsilon_2)^{\alpha}$ in Lemma 24, we have a $(\delta_2,\varepsilon_2)$ matrix sampler for domain $\{0,1\}^{\log t}$ with $O\left(\frac{t^{\alpha}}{\varepsilon_2^{2+\alpha}}\log\frac{1}{\delta_2}\right)$ samples using

$$O\left(\frac{(\log t+\log(1/\varepsilon_2))\log(d/\delta_2)}{\log s}+\log(d/\delta_2)\right)=O(\log(d/\delta_2))$$

random bits. Then Lemma 23 gives the sampler we want.

Theorem 4. [Restated, see original statement.]

For any constant $\alpha>0$, there exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ that uses $m+O(\log(d/\delta))$ random bits and $O\left(\frac{d^{\alpha}}{\varepsilon^{2+\alpha}}\log^{1+\alpha}\frac{1}{\delta}\right)$ samples.

Proof.

Setting $\delta_1=\delta_2=\delta/2$ and $\varepsilon_1=\varepsilon_2=\varepsilon/2$ in Lemma 25 and applying it to Lemma 22 gives the result.

4.2 Iterated Composition

Lemma 26.

There’s an efficient d-dimensional (δ,ε) matrix sampler for domain {0,1}m using m+O(log(d/δ)loglogd) random bits and O((loglogdε)5log2dδ) samples.

Proof.

Let $r=\log\log d$. We will prove that there exists a constant $C>0$ such that for each $i\in[r]$, there exists an efficient $\left(\frac{i\delta}{r},\frac{i\varepsilon}{r}\right)$ matrix sampler using at most $m+iC\log\frac{d}{\delta}$ random bits and $t_i$ samples, where

$$t_i=d^{2^{3-i}}\cdot\frac{Cr^{5}}{\varepsilon^{5}}\log^{2}\frac{d}{\delta}.$$

The i=r case proves the lemma.

We will prove by induction on i from 1 to r.

Base Case (𝒊=𝟏).

By Lemma 22, there exists an efficient d-dimensional (δr,εr) matrix sampler using m+C1(logdδ/r) random bits and C2d3(ε/r)3log1.5rδ samples for some constants C1,C2>0. When C2C1 and CC2, we have

m+C1(logdδ/r)m+C(logdδ)

and

C2d3(ε/r)3log1.5rδd22Cr5ε5log2dδt1.
Inductive Step.

Assume that for some i[1,r1], there exists an efficient (iδr,iεr) matrix sampler using m+iClogdδ random bits and ti samples. By choosing some constant α<1/2 such that (tir/ε)α<dr/δ in Lemma 25, we have a ((i+1)δr,(i+1)εr) matrix sampler using m+iClogdδ+C3logdδ/r random bits and C4ti(ε/r)2.5logdδ/r samples for some constants C3 and C4. When C2C3 and C2C4, we have

m+iClogdδ+C3logdδ/rm+(i+1)Clogdδ

and

C4tiα(ε/r)2+αlogdδ/r2C4Cd23ir5ε5log2dδd23(i+1)Cr5ε5log2dδti+1.

This finishes the induction and proves the lemma. Using Lemma 25, we have the following lemma:

Lemma 27.

For any constant $\alpha>0$: There exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $m+O(\log(d/\delta)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ samples.

Proof.

Setting $\delta_1=\delta_2=\delta/2$ and $\varepsilon_1=\varepsilon_2=\varepsilon/2$ in Lemma 25 and applying it to Lemma 26 gives the result.

4.3 Another Composition Scheme

To further reduce the number of random bits used in Lemma 27, we introduce another way of composing matrix samplers. Instead of sub-sampling the samples, we sample the random seeds here.

Proposition 8. [Restated, see original statement.]

Suppose we are given two efficient matrix samplers:

  • Let $\mathrm{Samp}_{\mathrm{out}}:\{0,1\}^{n_1}\times[t_1]\to\{0,1\}^m$ be a $(\delta_1,\varepsilon_1)$ matrix sampler.

  • Let $\mathrm{Samp}_{\mathrm{in}}:\{0,1\}^{n_2}\times[t_2]\to\{0,1\}^{n_1}$ be a $(\delta_2,\varepsilon_2)$ averaging sampler.

Then, for a uniformly random source $X\sim U_{n_2}$,

$$\mathrm{Samp}(X):=(\mathrm{Samp}_{\mathrm{out}}(\mathrm{Samp}_{\mathrm{in}}(X,i),j))_{i\in[t_2],\,j\in[t_1]}$$

is an efficient $(\delta_2,\,2\delta_1+2\varepsilon_2+\varepsilon_1)$ matrix sampler for domain $\{0,1\}^m$ with $t_1t_2$ samples using $n_2$ random bits.

Proof.

Let $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ be a function such that $\|f(x)\|\le 1$ for all $x\in\{0,1\}^m$. We define $h_f:\{0,1\}^{n_1}\to\{0,1\}$ as follows:

$$h_f(x):=\mathbf{1}\left[\left\|\frac{1}{t_1}\sum_{i=1}^{t_1}f(\mathrm{Samp}_{\mathrm{out}}(x,i))-\mathbb{E}f\right\|>\varepsilon_1\right].$$

Then $\mathbb{E}h_f<\delta_1$. One good property of $h_f$ is that

$$\left\|\frac{1}{t_1}\sum_{i=1}^{t_1}f(\mathrm{Samp}_{\mathrm{out}}(x,i))-\mathbb{E}f\right\|\le 2h_f(x)+\varepsilon_1.$$

For $Z_1,\ldots,Z_{t_2}$ the output of $\mathrm{Samp}_{\mathrm{in}}$, we have

$$\Pr\left[\left|\frac{1}{t_2}\sum_i h_f(Z_i)-\mathbb{E}h_f\right|\le\varepsilon_2\right]\ge 1-\delta_2.$$

Therefore, with $1-\delta_2$ probability,

$$\frac{1}{t_2}\sum_i h_f(Z_i)<\delta_1+\varepsilon_2.$$

Then we have

$$\left\|\frac{1}{t_1t_2}\sum_{i=1}^{t_2}\sum_{j=1}^{t_1}f(\mathrm{Samp}_{\mathrm{out}}(Z_i,j))-\mathbb{E}f\right\|
\le\frac{1}{t_2}\sum_{i=1}^{t_2}\left\|\frac{1}{t_1}\sum_{j=1}^{t_1}f(\mathrm{Samp}_{\mathrm{out}}(Z_i,j))-\mathbb{E}f\right\|
\le\frac{1}{t_2}\sum_{i=1}^{t_2}\left(2h_f(Z_i)+\varepsilon_1\right)
\le\varepsilon_1+\frac{2}{t_2}\sum_i h_f(Z_i).$$

This shows that, with probability $1-\delta_2$, the error of Samp is at most $\varepsilon_1+2\delta_1+2\varepsilon_2$.

Theorem 3. [Restated, see original statement.]

For any constant $\alpha>0$: There exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ that uses $m+O(\log(1/\delta)+\log(d/\varepsilon)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ samples.

Proof.

We apply Proposition 8 with the following choices:

  • $\mathrm{Samp}_{\mathrm{out}}$: Use the $(\varepsilon/5,\varepsilon/5)$ matrix sampler for domain $\{0,1\}^m$ in Lemma 27 by choosing $\alpha=0.5$. This uses $m+O(\log(d/\varepsilon)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\varepsilon}\right)^{1.5}\right)$ samples.

  • $\mathrm{Samp}_{\mathrm{in}}$: Use the $(\delta,\varepsilon/5)$ averaging sampler for domain $\{0,1\}^{m+O(\log(d/\varepsilon)\log\log d)}$ in Theorem 1 by choosing $\alpha=0.5$. This uses $m+O(\log(1/\delta)+\log(d/\varepsilon)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1.5}\right)$ samples.

This gives us an efficient $(\delta,\varepsilon)$ matrix sampler using $m+O(\log(1/\delta)+\log(d/\varepsilon)\log\log d)$ random bits and $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\varepsilon}\right)^{1.5}\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)^{1.5}\right)$ samples.

Setting $\delta_1=\delta_2=\delta/2$ and $\varepsilon_1=\varepsilon_2=\varepsilon/2$ in Lemma 25 and applying it to this sampler reduces the sample complexity to $O\left(\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)^{1+\alpha}\right)$ for any constant $\alpha>0$.

4.4 Proof of Lemma 24

4.4.1 Concentration of Random Matrices

The goal of this section is to prove Lemma 30, an analog of Lemma 15 for random Hermitian matrices. Our approach follows standard techniques from the random matrix literature [36, 11, 25].

Lemma 28.

Let $\{X_i\}_{i\in[t]}$ be a sequence of independent, mean-zero, self-adjoint random matrices of size $d\times d$ such that $\|X_i\|<1$ for every $i\in[t]$. Then, for every even integer $q\ge 2$, we have

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right]\le d(qt)^{q/2},$$

where the sequence $\{\varepsilon_i\}$ consists of independent Rademacher random variables.

Proof.

Note that

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right]=\sum_{v=(v_1,\ldots,v_q)\in[t]^{q}}\mathbb{E}\left[\mathrm{Tr}(\varepsilon_{v_1}X_{v_1}\cdots\varepsilon_{v_q}X_{v_q})\right].$$

When some index appears an odd number of times in the word $v=(v_1,\ldots,v_q)$, the corresponding term vanishes. Hence we may restrict the sum to

$$\mathcal{T}=\{v\in[t]^{q}:\text{every index in }v\text{ appears an even number of times}\},$$

and we have

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right]=\sum_{v\in\mathcal{T}}\mathbb{E}\left[\mathrm{Tr}(\varepsilon_{v_1}X_{v_1}\cdots\varepsilon_{v_q}X_{v_q})\right]\le\sum_{v\in\mathcal{T}}d\,\mathbb{E}\left[\|X_{v_1}\cdots X_{v_q}\|\right]\le\sum_{v\in\mathcal{T}}d,$$

where the last inequality comes from the fact that $\|X_i\|\le 1$ always holds for every $i$.

It remains to bound $|\mathcal{T}|$. Since each index appearing in $v\in\mathcal{T}$ must occur an even number of times, one can group the $q$ positions into exactly $q/2$ unordered pairs so that the two entries in each pair carry the same index. The number of ways to pair is bounded by $q^{q/2}$. For each pairing there are $t$ choices of index per pair. Therefore,

$$|\mathcal{T}|\le q^{q/2}t^{q/2}=(qt)^{q/2}.$$

Therefore,

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right]\le d|\mathcal{T}|\le d(qt)^{q/2},$$

which proves the lemma.

Using a standard symmetrization trick, we get the following lemma, which is an analog of Proposition 39:

Lemma 29.

Let $\{X_i\}_{i\in[t]}$ be a sequence of independent, mean-zero, self-adjoint random matrices of size $d\times d$ such that $\|X_i\|<1$ for every $i\in[t]$. Then, for every even integer $q\ge 2$, we have

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}X_i\right)^{q}\right)\right]\le d(4qt)^{q/2}.$$
Proof.

Let us write

$$S:=\sum_{i=1}^{t}X_i,\qquad S':=\sum_{i=1}^{t}X'_i,$$

where $\{X'_i\}_{i=1}^{t}$ is an independent copy of $\{X_i\}_{i=1}^{t}$, independent also of the original $X_i$'s. Since each $X'_i$ has mean zero, we have

$$\mathbb{E}[\mathrm{Tr}(S^{q})]=\mathbb{E}\left[\mathrm{Tr}\left((S-\mathbb{E}[S'])^{q}\right)\right]\le\mathbb{E}\left[\mathrm{Tr}\left((S-S')^{q}\right)\right],$$

where the last inequality follows from Jensen's inequality.

Next, observe that

$$S-S'=\sum_{i=1}^{t}(X_i-X'_i).$$

Introduce a fresh sequence of Rademacher random variables $\{\varepsilon_i\}_{i=1}^{t}$, independent of everything else. Then, conditional on $\{X_i,X'_i\}_{i=1}^{t}$, the random matrix

$$\sum_{i=1}^{t}\varepsilon_i(X_i-X'_i)$$

has the same distribution as $S-S'$. Consequently,

$$\mathbb{E}\left[\mathrm{Tr}\left((S-S')^{q}\right)\right]=\mathbb{E}_{X,X'}\mathbb{E}_{\varepsilon}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_i(X_i-X'_i)\right)^{q}\right)\right].$$

Note that for each fixed $(X,X',\varepsilon)$,

$$\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_i(X_i-X'_i)\right)^{q}\right)\le 2^{q-1}\left(\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)+\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX'_i\right)^{q}\right)\right).$$

Taking $\mathbb{E}_{\varepsilon}$, then $\mathbb{E}_{X,X'}$, and using that $\{X'_i\}$ has the same law as $\{X_i\}$ gives

$$\mathbb{E}\left[\mathrm{Tr}\left((S-S')^{q}\right)\right]=\mathbb{E}_{X,X'}\mathbb{E}_{\varepsilon}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_i(X_i-X'_i)\right)^{q}\right)\right]\le 2^{q}\,\mathbb{E}_{X}\mathbb{E}_{\varepsilon}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right].$$

In combination with the symmetrization bound $\mathbb{E}[\mathrm{Tr}(S^{q})]\le\mathbb{E}[\mathrm{Tr}((S-S')^{q})]$, this yields

$$\mathbb{E}[\mathrm{Tr}(S^{q})]\le 2^{q}\,\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}\varepsilon_iX_i\right)^{q}\right)\right].$$

Plugging in the bound from Lemma 28, we have

$$\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}X_i\right)^{q}\right)\right]\le 2^{q}d(qt)^{q/2}=d(2^{2}qt)^{q/2}=d(4qt)^{q/2}.$$

Lemma 30.

Let $Z_1,\ldots,Z_t\in\{0,1\}^m$ be a sequence of $\gamma$-almost $\ell$-wise independent variables for a positive even integer $\ell$. Then for any $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ such that for all $x\in\{0,1\}^m$, $f(x)$ is Hermitian and $\|f(x)\|\le 1$, we have

$$\Pr\left[\left\|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-d\left(\frac{16\ell}{\varepsilon^{2}t}\right)^{\ell/2}-\frac{2^{\ell}\gamma d}{\varepsilon^{\ell}}.$$
Proof.

Let $W_i:=(f(Z_i)-\mathbb{E}f)/2$. We have

$$\Pr\left[\left\|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\right\|>\varepsilon\right]=\Pr\left[\left\|\sum_{i=1}^{t}W_i\right\|>\frac{t\varepsilon}{2}\right]\le\frac{\mathbb{E}\left[\left\|\sum_{i=1}^{t}W_i\right\|^{\ell}\right]}{(t\varepsilon/2)^{\ell}}.$$

Note that each $W_i$ is always a Hermitian matrix, so their sum is always Hermitian and therefore normal. Then we have

$$\left\|\sum_{i=1}^{t}W_i\right\|^{\ell}=\left\|\left(\sum_{i=1}^{t}W_i\right)^{\ell}\right\|.$$

Moreover, since $\ell$ is a positive even integer, $\left(\sum_{i=1}^{t}W_i\right)^{\ell}$ is positive semidefinite. Therefore,

$$\left\|\left(\sum_{i=1}^{t}W_i\right)^{\ell}\right\|\le\mathrm{Tr}\left(\left(\sum_{i=1}^{t}W_i\right)^{\ell}\right)=\mathrm{Tr}\left(\sum_{i_1,\ldots,i_{\ell}\in[t]}W_{i_1}W_{i_2}\cdots W_{i_{\ell}}\right)=\sum_{i_1,\ldots,i_{\ell}\in[t]}\mathrm{Tr}\left(W_{i_1}W_{i_2}\cdots W_{i_{\ell}}\right).$$

Let $W'_1,\ldots,W'_t$ be a sequence of independent random variables, where $W'_i:=(f(U^{(i)})-\mathbb{E}f)/2$ for independent uniform $U^{(i)}\sim U_m$.

Since the $Z_i$'s are $\gamma$-almost $\ell$-wise independent and $\|W_i\|\le 1$, we have

$$\mathbb{E}\left[\left\|\sum_{i=1}^{t}W_i\right\|^{\ell}\right]\le\mathbb{E}\left[\sum_{i_1,\ldots,i_{\ell}\in[t]}\mathrm{Tr}\left(W_{i_1}W_{i_2}\cdots W_{i_{\ell}}\right)\right]=\sum_{i_1,\ldots,i_{\ell}\in[t]}\mathbb{E}\left[\mathrm{Tr}\left(W_{i_1}W_{i_2}\cdots W_{i_{\ell}}\right)\right]\le\sum_{i_1,\ldots,i_{\ell}\in[t]}\mathbb{E}\left[\mathrm{Tr}\left(W'_{i_1}W'_{i_2}\cdots W'_{i_{\ell}}\right)\right]+\gamma dt^{\ell}=\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}W'_i\right)^{\ell}\right)\right]+\gamma dt^{\ell}.$$

Therefore, by Lemma 29, we have

$$\mathbb{E}\left[\left\|\sum_{i=1}^{t}W_i\right\|^{\ell}\right]\le\mathbb{E}\left[\mathrm{Tr}\left(\left(\sum_{i=1}^{t}W'_i\right)^{\ell}\right)\right]+\gamma dt^{\ell}\le d(4\ell t)^{\ell/2}+\gamma dt^{\ell}.$$

Hence,

$$\Pr\left[\left\|\sum_{i=1}^{t}W_i\right\|>\frac{t\varepsilon}{2}\right]\le\frac{\mathbb{E}\left[\left\|\sum_{i=1}^{t}W_i\right\|^{\ell}\right]}{(t\varepsilon/2)^{\ell}}\le d\left(\frac{16\ell}{\varepsilon^{2}t}\right)^{\ell/2}+\frac{2^{\ell}\gamma d}{\varepsilon^{\ell}}.$$

4.4.2 Almost ℓ-wise independence for small domains

We first prove that the concentration analysis for Hermitian matrices directly implies the general case.

Lemma 31.

Let $\mathrm{Samp}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ be a function. Suppose for any $f:\{0,1\}^m\to\mathbb{C}^{2d\times 2d}$ such that for all $x\in\{0,1\}^m$, $f(x)$ is Hermitian and $\|f(x)\|\le 1$,

$$\Pr_{(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)}\left[\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta.$$

Then Samp is a d-dimensional (δ,ε) matrix sampler.

Proof.

Let $(Z_1,\ldots,Z_t)\sim\mathrm{Samp}(U_n)$. We are going to prove that for any function $f:\{0,1\}^m\to\mathbb{C}^{d\times d}$ such that $\|f(x)\|\le 1$ for all $x\in\{0,1\}^m$, we have

$$\Pr\left[\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta.$$

For any matrix $A\in\mathbb{C}^{d\times d}$, its Hermitian dilation $\mathcal{H}(A)\in\mathbb{C}^{2d\times 2d}$ is defined by

$$\mathcal{H}(A):=\begin{bmatrix}0&A\\A^{*}&0\end{bmatrix}.$$

It is easy to verify that $\|A\|=\|\mathcal{H}(A)\|$. Then, for the function $g:x\mapsto\mathcal{H}(f(x))$, we have

$$\Pr\left[\left\|\frac{1}{t}\sum_i g(Z_i)-\mathbb{E}g\right\|\le\varepsilon\right]\ge 1-\delta.$$

Note that we have

$$\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|=\left\|\mathcal{H}\left(\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right)\right\|=\left\|\frac{1}{t}\sum_i g(Z_i)-\mathbb{E}g\right\|.$$

Hence,

$$\Pr\left[\left\|\frac{1}{t}\sum_i f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta.$$
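As a quick numerical sanity check (not part of the proof), the norm-preservation of the Hermitian dilation can be verified with numpy on random complex matrices:

```python
import numpy as np

def dilation(A):
    """Hermitian dilation H(A) = [[0, A], [A*, 0]]."""
    d = A.shape[0]
    Z = np.zeros((d, d), dtype=A.dtype)
    return np.block([[Z, A], [A.conj().T, Z]])

A = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
# spectral norm (largest singular value) is preserved by the dilation
assert np.isclose(np.linalg.norm(A, 2), np.linalg.norm(dilation(A), 2))
```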

Now we are ready to prove Lemma 24.

Lemma 24. [Restated, see original statement.]

For any $2\le s<d/\delta$, there exists an efficient $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ with $O\left(\frac{s}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples using $O\left(\frac{(m+\log(1/\varepsilon))\log(d/\delta)}{\log s}+\log(d/\delta)\right)$ random bits.

Proof.

We begin by setting $\ell=\frac{2\log(2d/\delta)}{\log s}$, $\gamma=\frac{\delta\varepsilon^{\ell}}{2^{\ell+1}d}$, and $t=\frac{16s\ell}{\varepsilon^{2}}$. We then define our sampler by outputting a $\gamma$-almost $\ell$-wise independent sequence $Z_1,\ldots,Z_t\in\{0,1\}^m$. Taking the parameters of Lemma 30, observe

$$d\left(\frac{16\ell}{\varepsilon^{2}t}\right)^{\ell/2}\le d\left(\frac{1}{s}\right)^{\ell/2}=d\left(\frac{1}{s}\right)^{\log(2d/\delta)/\log s}=\frac{\delta}{2},$$

and

$$\frac{2^{\ell}\gamma d}{\varepsilon^{\ell}}=\frac{\delta}{2}.$$

Therefore, by Lemma 30, for every Hermitian-matrix-valued $f$ with $\|f(x)\|\le 1$ for all $x\in\{0,1\}^m$, we have

$$\Pr\left[\left\|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\right\|\le\varepsilon\right]\ge 1-\delta,$$

and hence Samp is a $d$-dimensional $(\delta,\varepsilon)$ matrix sampler by Lemma 31.

Our sampler uses

$$t=O\left(\frac{s\ell}{\varepsilon^{2}}\right)=O\left(\frac{s}{\varepsilon^{2}\log s}\log\frac{d}{\delta}\right)$$

samples. Furthermore, Lemma 14 shows that we have an efficient algorithm that uses only

$$O(m\ell+\log(1/\gamma)+\log\log t)=O(\ell(m+\log(1/\varepsilon))+\log(d/\delta))=O\left(\frac{(m+\log(1/\varepsilon))\log(d/\delta)}{\log s}+\log(d/\delta)\right)$$

random bits to generate this $\gamma$-almost $\ell$-wise independent sequence.

 Remark 32.

The initial work by Wigderson and Xiao [42] on matrix samplers focused on Hermitian matrices, where each f(x) was assumed to be Hermitian. Nevertheless, as shown in Lemma 31, any sampler that works for Hermitian matrices can naturally be applied to general matrices as well.

5 Applications to Extractors and Codes

5.1 Applications to Extractors

Zuckerman showed that averaging samplers are equivalent to randomness extractors [43]. Here we state the only direction that we need.

Lemma 33 ( [43]).

An efficient strong $(\delta,\varepsilon)$ averaging sampler $\mathrm{Samp}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ gives an efficient strong $(n-\log(1/\delta)+\log(1/\varepsilon),\,2\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^{\log t}\to\{0,1\}^m$.

Applying Lemma 33 on Theorem 1 gives Theorem 5:

Theorem 5. [Restated, see original statement.]

For every constant $\alpha>0$, there exists a constant $\beta<1$ such that for all $\varepsilon>0$ and $k\ge\beta n$, there is an efficient strong $(k,\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $m=\Omega(k)-\log(1/\varepsilon)$ and $d=(1+\alpha)\log(n-k)+(2+\alpha)\log(1/\varepsilon)+O(1)$.

Proof.

By Theorem 1, for any positive constant $\alpha>0$, there exists a constant $\lambda>1$ such that there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ with $O\left(\frac{1}{\varepsilon^{2+\alpha}}\log^{1+\alpha}\frac{1}{\delta}\right)$ samples using $\lambda\left(m+\log\frac{1}{\delta}\right)$ random bits.

To construct the required strong $(k,\varepsilon)$ extractor for every $n$, we set $\delta$ such that $\log(1/\delta)=\frac{n}{2\lambda}+\log(1/\varepsilon)$. Then, we construct an efficient strong $(\delta,\varepsilon)$ sampler Samp for domain $\{0,1\}^m$ where

$$m=\frac{n}{\lambda}-\log(1/\delta)\ge\frac{n}{2\lambda}-\log(1/\varepsilon)=\Omega(n)-\log(1/\varepsilon).$$

By the above, Samp uses $n$ random bits and generates $O\left(\frac{1}{\varepsilon^{2+\alpha}}\log^{1+\alpha}\frac{1}{\delta}\right)$ samples.

By Lemma 33, Samp implies an efficient strong $(n-\log(1/\delta)+\log(1/\varepsilon),\,2\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $d\le(1+\alpha)\log(n-k)+(2+\alpha)\log(1/\varepsilon)+O(1)$. It is only left to verify that $n-\log(1/\delta)+\log(1/\varepsilon)\le\beta n$ for some constant $\beta<1$. We have

$$n-\log(1/\delta)+\log(1/\varepsilon)=n-\frac{n}{2\lambda}=\frac{2\lambda-1}{2\lambda}\,n.$$

This proves the theorem.

If we would like an extractor with the optimal seed length of $d=\log(n-k)+2\log(1/\varepsilon)+O(1)$, we can obtain the following extractor using Theorem 2.

Theorem 6. [Restated, see original statement.]

There exists a constant $\beta<1$ such that for all $\varepsilon>0$ and $k\ge\left(1-\frac{\beta}{\log n+\log(1/\varepsilon)}\right)n$, there is an efficient strong $(k,\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $m=\Omega(k)-\log^{2}(1/\varepsilon)$ and $d=\log(n-k)+2\log(1/\varepsilon)+O(1)$.

Proof.

By Theorem 2, there exists a constant $\lambda>1$ such that there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for domain $\{0,1\}^m$ with $O\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples using $m+\lambda\log\frac{1}{\delta}\left(\log\log(1/\delta)+\log(1/\varepsilon)\right)$ random bits.

To construct the required strong $(k,\varepsilon)$ extractor for every $n$, we set $\delta$ such that $\log(1/\delta)=\frac{1}{2\lambda}\cdot\frac{n}{\log n+\log(1/\varepsilon)}+\log(1/\varepsilon)$. Then, we construct an efficient strong $(\delta,\varepsilon)$ sampler Samp for domain $\{0,1\}^m$ where

$$m=n-\lambda\log\frac{1}{\delta}\left(\log\log(1/\delta)+\log(1/\varepsilon)\right)\ge n-\frac{n}{2}\cdot\frac{\log\log(1/\delta)+\log(1/\varepsilon)}{\log n+\log(1/\varepsilon)}-O\left(\log^{2}(1/\varepsilon)\right)-O\left(\log(1/\varepsilon)\log n\right)\ge\Omega(n)-\log^{2}(1/\varepsilon).$$

By the above, Samp uses $n$ random bits and generates $O\left(\frac{1}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ samples.

By Lemma 33, Samp implies an efficient strong $(n-\log(1/\delta)+\log(1/\varepsilon),\,2\varepsilon)$ extractor $\mathrm{Ext}:\{0,1\}^n\times\{0,1\}^d\to\{0,1\}^m$ with $d=\log(n-k)+2\log(1/\varepsilon)+O(1)$. It is only left to verify that $n-\log(1/\delta)+\log(1/\varepsilon)\le\left(1-\frac{\beta}{\log n+\log(1/\varepsilon)}\right)n$ for some constant $\beta<1$. We have

$$n-\log(1/\delta)+\log(1/\varepsilon)=n-\frac{1}{2\lambda}\cdot\frac{n}{\log n+\log(1/\varepsilon)}\le\left(1-\frac{1}{2\lambda(\log n+\log(1/\varepsilon))}\right)n.$$

This proves the theorem.

5.2 Application to List-Decodable Codes

Error-correcting codes are combinatorial objects that enable messages to be accurately transmitted, even when parts of the data get corrupted. Codes have been extensively studied and have proven to be extremely useful in computer science. Here we focus on the combinatorial property of list-decodability, defined below.

Definition 34.

A code $\mathrm{ECC}:\{0,1\}^n\to(\{0,1\}^m)^{t}$ is $(\rho,L)$ list-decodable if for every received message $r\in(\{0,1\}^m)^{t}$, there are at most $L$ messages $x\in\{0,1\}^n$ such that $d_H(\mathrm{ECC}(x),r)\le\rho t$, where $d_H$ denotes the Hamming distance. A code is binary if $m=1$.

We focus on the binary setting, i.e., m=1.

Lemma 35 ([35]).

An efficient strong $(\delta,\varepsilon)$ averaging sampler $\mathrm{Samp}:\{0,1\}^n\to\{0,1\}^{t}$ over the binary domain gives an efficient binary code that is $(\rho=\frac{1}{2}-\varepsilon,\,\delta 2^{n})$ list-decodable with code rate $R=n/t$.
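A minimal sketch of this correspondence (illustrative; `samp` is assumed to return its t binary sample points as a list of bits, so the encoder is just the sampler itself):

```python
def code_from_sampler(samp, x):
    """Codeword for message x: the length-t bit string of sample points, rate n/t."""
    return samp(x)
```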

To construct our codes, we will use our almost $\ell$-wise independence sampler in Lemma 20 directly.

Lemma 36.

For all constant $\alpha>0$, there exists an efficient strong $(\delta,\varepsilon)$ averaging sampler for the binary domain with $O\left(\frac{1}{\varepsilon^{2+\alpha}}\log\frac{1}{\delta}\right)$ samples using $n=C\log(1/\delta)$ random bits for some constant $C\ge 1$.

Proof.

By setting $s=1/\varepsilon^{\alpha}$ and $m=1$ in Lemma 20, we have that whenever $1/\varepsilon^{\alpha}<1/\delta$, there is a strong $(\delta,\varepsilon)$ sampler with $O\left(\frac{1}{\varepsilon^{2+\alpha}}\log\frac{1}{\delta}\right)$ samples using $O(\log(1/\delta))$ random bits. When $1/\varepsilon^{\alpha}\ge 1/\delta$, the pairwise independence sampler in Lemma 12 for the binary domain satisfies the required bounds.

Applying Lemma 35 to Lemma 36 gives Theorem 7.

Theorem 7. [Restated, see original statement.]

For every constant $\alpha>0$: there exists an explicit binary code with rate $\Omega(\varepsilon^{2+\alpha})$ that is $(\rho=\frac{1}{2}-\varepsilon,L)$ list-decodable with list size $L=2^{(1-c)n}$ for some constant $c=c(\alpha)>0$.

Proof.

We use the (δ,ε) sampler in Lemma 36, where we choose δ such that n=Clog(1/δ). Applying Lemma 35 to this sampler implies Theorem 7, where c(α)=1/C here.

6 Open Problems

Our work raises interesting open problems.

  • Compared to the sampler in [31], which uses $m+(1+\alpha)\log(1/\delta)$ random bits, our averaging sampler requires $m+O(\log(1/\delta))$ random bits. Can we improve our randomness efficiency while maintaining a good sample complexity?

  • Is there a way to eliminate the additional $\alpha$ in the sample complexity? For $\varepsilon=1/\mathrm{poly}(m)$ and $\delta=\exp(-\mathrm{poly}(m))$, can we design an efficient averaging sampler that is asymptotically optimal in both randomness and sample complexity?

  • Can we further improve the randomness complexity of our matrix samplers to fully resolve Problem 2?

  • Is it possible to reduce the list size of the list-decodable codes in Theorem 7 to poly(n) using the structure of the list?

  • Can we construct randomness-efficient V-samplers on other normed spaces V?

References

  • [1] Rohit Agrawal. Samplers and Extractors for Unbounded Functions. In Dimitris Achlioptas and László A. Végh, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019), volume 145 of Leibniz International Proceedings in Informatics (LIPIcs), pages 59:1–59:21, Dagstuhl, Germany, 2019. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.APPROX-RANDOM.2019.59.
  • [2] Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, 2002. doi:10.1109/18.985947.
  • [3] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple constructions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992. doi:10.1002/RSA.3240030308.
  • [4] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501–555, May 1998. doi:10.1145/278298.278306.
  • [5] Mihir Bellare, Oded Goldreich, and Shafi Goldwasser. Randomness in interactive proofs. Computational Complexity, 3:319–354, 1993. doi:10.1007/BF01275487.
  • [6] Mihir Bellare and John Rompel. Randomness-efficient oblivious sampling. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 276–287. IEEE, 1994. doi:10.1109/SFCS.1994.365687.
  • [7] Charles H Bennett, Gilles Brassard, and Jean-Marc Robert. Privacy amplification by public discussion. SIAM journal on Computing, 17(2):210–229, 1988. doi:10.1137/0217014.
  • [8] Jarosław Błasiok. Optimal streaming and tracking distinct elements with high probability. ACM Transactions on Algorithms (TALG), 16(1):1–28, 2019. doi:10.1145/3309193.
  • [9] Ran Canetti, Yevgeniy Dodis, Shai Halevi, Eyal Kushilevitz, and Amit Sahai. Exposure-resilient functions and all-or-nothing transforms. In Advances in Cryptology – EUROCRYPT 2000, International Conference on the Theory and Application of Cryptographic Techniques, Bruges, Belgium, May 14-18, 2000, Proceedings, pages 453–469. Springer, 2000. doi:10.1007/3-540-45539-6_33.
  • [10] Ran Canetti, Guy Even, and Oded Goldreich. Lower bounds for sampling algorithms for estimating the average. Information Processing Letters, 53(1):17–25, 1995. doi:10.1016/0020-0190(94)00171-T.
  • [11] Richard Y Chen, Alex Gittens, and Joel A Tropp. The masked sample covariance estimator: an analysis using matrix concentration inequalities. Information and Inference: A Journal of the IMA, 1(1):2–20, 2012.
  • [12] Benny Chor and Oded Goldreich. On the power of two-point based sampling. Journal of Complexity, 5(1):96–106, 1989. doi:10.1016/0885-064X(89)90015-0.
  • [13] Yevgeniy Dodis. PhD thesis: Exposure-resilient cryptography. Massachusetts Institute of Technology, September 2000.
  • [14] Yevgeniy Dodis and Joel Spencer. On the (non)universality of the one-time pad. In 43rd Symposium on Foundations of Computer Science (FOCS 2002), 16-19 November 2002, Vancouver, BC, Canada, Proceedings, pages 376–385. IEEE Computer Society, 2002.
  • [15] Yevgeniy Dodis and Daniel Wichs. Non-malleable extractors and symmetric key cryptography from weak secrets. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31–June 2, 2009, pages 601–610. ACM, 2009. doi:10.1145/1536414.1536496.
  • [16] Ankit Garg, Yin Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander chernoff bound. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1102–1114, 2018. doi:10.1145/3188745.3188890.
  • [17] David Gillman. A chernoff bound for random walks on expander graphs. SIAM Journal on Computing, 27(4):1203–1220, 1998. doi:10.1137/S0097539794268765.
  • [18] Oded Goldreich. A sample of samplers: A computational perspective on sampling. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation: In Collaboration with Lidor Avigad, Mihir Bellare, Zvika Brakerski, Shafi Goldwasser, Shai Halevi, Tali Kaufman, Leonid Levin, Noam Nisan, Dana Ron, Madhu Sudan, Luca Trevisan, Salil Vadhan, Avi Wigderson, David Zuckerman, pages 302–332. Springer, 2011. doi:10.1007/978-3-642-22670-0_24.
  • [19] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Transactions on Information Theory, 54(1):135–150, 2008. doi:10.1109/TIT.2007.911222.
  • [20] Johan Håstad, Russell Impagliazzo, Leonid A Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999. doi:10.1137/S0097539793244708.
  • [21] Russell Impagliazzo and David Zuckerman. How to recycle random bits. In FOCS, volume 30, pages 248–253, 1989. doi:10.1109/SFCS.1989.63486.
  • [22] Jørn Justesen. Class of constructive asymptotically good algebraic codes. IEEE Transactions on information theory, 18(5):652–656, 1972. doi:10.1109/TIT.1972.1054893.
  • [23] Yael Tauman Kalai, Xin Li, and Anup Rao. 2-source extractors under computational assumptions and cryptography with defective randomness. In 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, October 25-27, 2009, Atlanta, Georgia, USA, pages 617–626. IEEE Computer Society, 2009. doi:10.1109/FOCS.2009.61.
  • [24] Yael Tauman Kalai, Xin Li, Anup Rao, and David Zuckerman. Network extractor protocols. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pages 654–663. IEEE Computer Society, 2008. doi:10.1109/FOCS.2008.73.
  • [25] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media, 2013.
  • [26] Chin-Laung Lu. Hyper-encryption against space-bounded adversaries from on-line strong extractors. In Advances in Cryptology - CRYPTO 2002, 22nd Annual International Cryptology Conference, Santa Barbara, California, USA, August 18-22, 2002, Proceedings, pages 257–271. Springer, 2002. doi:10.1007/3-540-45708-9_17.
  • [27] Joseph Naor and Moni Naor. Small-bias probability spaces-efficient constructions and applications. SIAM Journal on Computing, 22(4):838–856, 1993. doi:10.1137/0222053.
  • [28] Noam Nisan and David Zuckerman. Randomness is linear in space. Journal of Computer and System Sciences, 52(1):43–52, 1996. doi:10.1006/JCSS.1996.0004.
  • [29] Jaikumar Radhakrishnan and Amnon Ta-Shma. Bounds for dispersers, extractors, and depth-two superconcentrators. SIAM Journal on Discrete Mathematics, 13(1):2–24, 2000. doi:10.1137/S0895480197329508.
  • [30] Ran Raz. Extractors with weak random seeds. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 11–20, 2005. doi:10.1145/1060590.1060593.
  • [31] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 3–13. IEEE, 2000. doi:10.1109/SFCS.2000.892006.
  • [32] Yao-Feng Ren and Han-Ying Liang. On the best constant in Marcinkiewicz–Zygmund inequality. Statistics & Probability Letters, 53(3):227–233, 2001.
  • [33] Mark Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60–72, 1999.
  • [34] Aravind Srinivasan and David Zuckerman. Computing with very weak random sources. SIAM Journal on Computing, 28(4):1433–1459, 1999. doi:10.1137/S009753979630091X.
  • [35] Amnon Ta-Shma and David Zuckerman. Extractor codes. IEEE Transactions on Information Theory, 50(12):3015–3025, 2004. doi:10.1109/TIT.2004.838377.
  • [36] Nicole Tomczak-Jaegermann. The moduli of smoothness and convexity and the Rademacher averages of the trace classes $S_p$ ($1\le p<\infty$). Studia Mathematica, 50(2):163–182, 1974. URL: http://eudml.org/doc/217886.
  • [37] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12:389–434, 2012. doi:10.1007/S10208-011-9099-Z.
  • [38] Joel A Tropp. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015. doi:10.1561/2200000048.
  • [39] Salil Vadhan. The unified theory of pseudorandomness: guest column. ACM SIGACT News, 38(3):39–54, 2007. doi:10.1145/1324215.1324225.
  • [40] Salil P. Vadhan. On constructing locally computable extractors and cryptosystems in the bounded storage model. In Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, California, USA, August 17-21, 2003, Proceedings, pages 61–77. Springer, 2003. doi:10.1007/978-3-540-45146-4_4.
  • [41] Salil P. Vadhan. Pseudorandomness. Foundations and Trends® in Theoretical Computer Science, 7(1–3):1–336, 2012. doi:10.1561/0400000010.
  • [42] Avi Wigderson and David Xiao. A randomness-efficient sampler for matrix-valued functions and applications. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), pages 397–406. IEEE, 2005. doi:10.1109/SFCS.2005.8.
  • [43] David Zuckerman. Randomness-optimal oblivious sampling. Random Structures & Algorithms, 11(4):345–367, 1997. doi:10.1002/(SICI)1098-2418(199712)11:4<345::AID-RSA4>3.0.CO;2-Z.
  • [44] David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. Theory of Computing, 3(6):103–128, 2007. doi:10.4086/toc.2007.v003a006.

Appendix A Proof of Proposition 4

In this section, we extend the non-explicit averaging sampler construction from [10] to matrix samplers.

Lemma 37.

Let Samp be a $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $t$ samples. Then there exists a $d$-dimensional $(2\delta,3\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}+1$ random bits and $t$ samples.

Proof.

Our goal is to construct a $(2\delta,3\varepsilon)$ matrix sampler Samp′ based on Samp.

Output Approximation via Discretization.

We construct a discretization grid $G$ by

\[G:=\big\{a\Delta+b\Delta i \;\big|\; a,b\in\mathbb{Z}\text{ and }|a|^{2}+|b|^{2}\le\Delta^{-2}\big\},\]

where $\Delta=\frac{\varepsilon}{d}$. For each $x\in\{0,1\}^m$, define an approximation $f'(x)$ that rounds each entry of $f(x)$ to the nearest point in $G$, yielding $f':\{0,1\}^m\to G^{d\times d}$. Since each entry of $f'(x)$ differs from the corresponding entry of $f(x)$ by at most $\Delta$, the approximation error per matrix (in spectral norm) is bounded by $d\Delta\le\varepsilon$ according to Proposition 40. Thus, the average of $f'$ approximates the average of $f$ within $\varepsilon$, and the set of all such approximations $f'$ forms a finite function class, which we denote by $F$.
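The following sketch is purely illustrative and not part of the construction: it rounds each entry of a complex matrix with spectral norm at most 1 to the nearest point of a grid of spacing $\Delta$ (omitting the disk restriction in $G$ for simplicity) and numerically confirms that the spectral-norm error stays below $d\Delta=\varepsilon$.

```python
import numpy as np

def discretize(M, delta):
    """Round each complex entry of M to the nearest grid point a*delta + b*delta*i
    (a, b integers); the disk restriction of G is omitted in this sketch."""
    return (np.round(M.real / delta) + 1j * np.round(M.imag / delta)) * delta

d, eps = 8, 0.1
delta = eps / d
rng = np.random.default_rng(0)
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M /= np.linalg.norm(M, 2)                      # normalize so that ||M|| <= 1
err = np.linalg.norm(M - discretize(M, delta), 2)
print(err, "<=", d * delta)                    # spectral error is at most d * Delta = eps
assert err <= d * delta + 1e-12
```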

Bounding the Size of 𝑭.

Each entry of a matrix in $G^{d\times d}$ takes at most $1/\Delta^{2}$ possible values. The number of possible matrices is therefore bounded by

\[\Big(\frac{1}{\Delta^{2}}\Big)^{d^{2}}=\Big(\frac{d}{\varepsilon}\Big)^{2d^{2}},\]

so the total number of functions in $F$ is

\[|F|\le\bigg(\Big(\frac{d}{\varepsilon}\Big)^{2d^{2}}\bigg)^{2^{m}}=2^{2^{m+1}d^{2}\log\frac{d}{\varepsilon}}.\]
Probabilistic Reduction of Random Bits.

For each function $f'\in F$, call a random seed bad for $f'$ if the estimate of Samp deviates from the true average of $f'$ by more than $\varepsilon$ in spectral norm. Since Samp is a $(\delta,\varepsilon)$-sampler, the fraction of bad random seeds for any $f'\in F$ is at most $\delta$. By Hoeffding's inequality, if we select $k$ random seeds independently at random, the probability that more than $2\delta k$ of them are bad is at most $2e^{-2\delta^{2}k}$. Applying a union bound over all $f'\in F$, the probability that there exists an $f'\in F$ with more than $2\delta k$ bad seeds is at most $|F|\cdot 2e^{-2\delta^{2}k}$.
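As a numerical sanity check of this step (illustrative only; the parameters below are arbitrary), one can plant a $\delta$ fraction of bad seeds, draw $k$ seeds independently, and compare the empirical probability of seeing more than $2\delta k$ bad ones against the Hoeffding bound $2e^{-2\delta^{2}k}$:

```python
import numpy as np

delta, k, trials = 0.1, 500, 20000
rng = np.random.default_rng(1)

# Each drawn seed is bad independently with probability delta.
bad_counts = rng.binomial(k, delta, size=trials)
empirical = np.mean(bad_counts > 2 * delta * k)
hoeffding = 2 * np.exp(-2 * delta**2 * k)

print(f"empirical Pr[more than 2*delta*k bad seeds] = {empirical:.5f}")
print(f"Hoeffding bound 2*exp(-2*delta^2*k)         = {hoeffding:.5f}")
```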

Choosing 𝒌 and Applying the Probabilistic Method.

Set

\[k=\frac{\ln|F|+\ln 2.01}{2\delta^{2}}\le\frac{2^{m}d^{2}\log\frac{d}{\varepsilon}+1}{\delta^{2}}\]

so that $|F|\cdot 2e^{-2\delta^{2}k}<1$. With this choice, there exists a set $K$ of $k$ random seeds such that, for all $f'\in F$, the fraction of bad seeds in $K$ is at most $2\delta$. The number of random bits required to select a seed $\rho\in K$ is

\[\log k\le m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}+1.\]
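For concreteness, and assuming $2^{m}d^{2}\log\frac{d}{\varepsilon}\ge 1$, this last bound follows from

\[\log k\;\le\;\log\frac{2^{m}d^{2}\log\frac{d}{\varepsilon}+1}{\delta^{2}}\;\le\;\log\Big(2^{m+1}d^{2}\log\frac{d}{\varepsilon}\Big)+2\log\frac{1}{\delta}\;=\;m+2\log d+\log\log\frac{d}{\varepsilon}+2\log\frac{1}{\delta}+1.\]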
Defining the New Sampler Samp′.

We define Samp′ as follows: select a seed $\rho\in K$ uniformly at random, and run $\mathrm{Samp}(\rho)$ to get samples $Z_1,\dots,Z_t\in\{0,1\}^m$. With probability at least $1-2\delta$, the seed $\rho$ is good for $f'$, and then

\[\Big\|\frac{1}{t}\sum_{i=1}^{t}f(Z_i)-\mathbb{E}f\Big\|=\Big\|\frac{1}{t}\sum_{i=1}^{t}\Big(\big(f(Z_i)-f'(Z_i)\big)+\big(f'(Z_i)-\mathbb{E}f'\big)+\big(\mathbb{E}f'-\mathbb{E}f\big)\Big)\Big\|\le 3\varepsilon,\]

since the first and third groups of terms are each bounded by $\varepsilon$ by the discretization guarantee, and the middle group is bounded by $\varepsilon$ whenever $\rho$ is good for $f'$.

Samp′ is then a $(2\delta,3\varepsilon)$ matrix sampler, using only

\[m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}+1\]

random bits. This completes the proof.

Theorem 38 (Matrix Chernoff Bound, see [38]).

Let $X_1,\dots,X_t$ be independent $d\times d$ complex random matrices. Suppose $\|X_i\|\le 1$ for all $i\in[t]$. Then, for any $\varepsilon>0$, the following inequality holds:

\[\Pr\Big(\Big\|\frac{1}{t}\sum_{i=1}^{t}\big(X_i-\mathbb{E}[X_i]\big)\Big\|>\varepsilon\Big)\le 2d\exp\Big(-\frac{3}{8}t\varepsilon^{2}\Big).\]
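The short Monte Carlo experiment below is an illustrative check of this bound and is not used anywhere in the argument; independent random ±1 diagonal matrices are just a convenient family with $\|X_i\|\le 1$ and $\mathbb{E}[X_i]=0$.

```python
import numpy as np

d, t, eps, trials = 3, 200, 0.25, 5000
rng = np.random.default_rng(2)

deviations = []
for _ in range(trials):
    # Independent diagonal sign matrices: ||X_i|| = 1 and E[X_i] = 0.
    signs = rng.choice([-1.0, 1.0], size=(t, d))
    avg = np.diag(signs.mean(axis=0))          # (1/t) * sum_i (X_i - E[X_i])
    deviations.append(np.linalg.norm(avg, 2))

empirical = np.mean(np.array(deviations) > eps)
bound = 2 * d * np.exp(-3 * t * eps**2 / 8)
print(f"empirical Pr[deviation > eps] = {empirical:.5f}")
print(f"matrix Chernoff bound         = {bound:.5f}")
```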

Applying the matrix Chernoff bound, we can prove Proposition 4.

Proposition 4. [Restated, see original statement.]

There exists a (non-explicit) $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $O\!\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples and $m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}$ random bits.

Proof.

By Theorem 38, taking $t=\frac{24}{\varepsilon^{2}}\log\frac{4d}{\delta}$ independent samples in $\{0,1\}^m$ would give a $d$-dimensional $(\delta/2,\varepsilon/3)$ matrix sampler for domain $\{0,1\}^m$ using $t$ samples and $tm$ random bits. Applying Lemma 37, we get a $d$-dimensional $(\delta,\varepsilon)$ matrix sampler for domain $\{0,1\}^m$ using $O\!\left(\frac{1}{\varepsilon^{2}}\log\frac{d}{\delta}\right)$ samples and $m+2\log\frac{1}{\delta}+2\log d+\log\log\frac{d}{\varepsilon}$ random bits.
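To get a feel for these quantities (reading log as base 2, which only affects constants, and with arbitrary example values), the sample and randomness counts of Proposition 4 work out roughly as in the purely illustrative computation below.

```python
import math

m, d, eps, delta = 64, 16, 0.05, 1e-6
t = math.ceil(24 / eps**2 * math.log2(4 * d / delta))      # samples, from Theorem 38
bits = m + 2 * math.log2(1 / delta) + 2 * math.log2(d) + math.log2(math.log2(d / eps))
print(f"samples t      ~= {t}")
print(f"random bits    ~= {bits:.1f}")
print(f"naive t*m bits  = {t * m}")                         # fully independent sampling
```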

Appendix B Proof of Lemma 15

Proposition 39 (Marcinkiewicz–Zygmund inequality [32]).

Let $\{X_i, i\ge 1\}$ be a sequence of independent random variables with $\mathbb{E}X_i=0$ and $\mathbb{E}|X_i|^{p}<\infty$. Then for $p\ge 2$:

\[\mathbb{E}\Big|\sum_{i=1}^{n}X_i\Big|^{p}\le C(p)\,n^{p/2-1}\sum_{i=1}^{n}\mathbb{E}|X_i|^{p},\]

where $C(p)\le(3\sqrt{2})^{p}p^{p/2}$.
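As an illustration (a Monte Carlo check with Rademacher variables, which are chosen arbitrarily and satisfy $\mathbb{E}X_i=0$ and $\mathbb{E}|X_i|^{p}=1$), one can compare the empirical $p$-th moment of the sum against the bound with $C(p)=(3\sqrt{2})^{p}p^{p/2}$:

```python
import numpy as np

n, p, trials = 100, 4, 20000
rng = np.random.default_rng(3)

X = rng.choice([-1.0, 1.0], size=(trials, n))     # independent Rademacher variables
lhs = np.mean(np.abs(X.sum(axis=1)) ** p)         # estimate of E|sum_i X_i|^p
rhs = (3 * np.sqrt(2)) ** p * p ** (p / 2) * n ** (p / 2 - 1) * n
print(f"E|sum X_i|^p (empirical)    ~= {lhs:.1f}")
print(f"Marcinkiewicz-Zygmund bound  = {rhs:.1f}")
```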

Lemma 15. [Restated, see original statement.]

Let $Z_1,\dots,Z_t\in\{0,1\}^m$ be a sequence of $\gamma$-almost $\ell$-wise independent variables for an even integer $\ell$. Then for every sequence of functions $f_1,\dots,f_t:\{0,1\}^m\to[0,1]$,

\[\Pr\Big[\Big|\frac{1}{t}\sum_{i=1}^{t}\big(f_i(Z_i)-\mathbb{E}f_i\big)\Big|\le\varepsilon\Big]\ge 1-\Big(\frac{25\ell}{\varepsilon^{2}t}\Big)^{\ell/2}-\frac{\gamma}{\varepsilon^{\ell}}.\]
Proof.

Let $W_i:=f_i(Z_i)-\mathbb{E}f_i$. By Markov's inequality applied to the $\ell$-th moment, we have

\[\Pr\Big[\Big|\sum_{i=1}^{t}W_i\Big|>t\varepsilon\Big]\le\frac{\mathbb{E}\big[\big|\sum_{i=1}^{t}W_i\big|^{\ell}\big]}{(t\varepsilon)^{\ell}}.\]

Let $W'_1,\dots,W'_t$ be a sequence of independent random variables where $W'_i:=f_i(U_{\{0,1\}^m})-\mathbb{E}f_i$. Since the $W_i$'s are $\gamma$-almost $\ell$-wise independent and $|W_i|\le 1$, we have

\[\mathbb{E}\Big[\Big|\sum_{i=1}^{t}W_i\Big|^{\ell}\Big]=\mathbb{E}\Big[\Big(\sum_{i=1}^{t}W_i\Big)^{\ell}\Big]\le\mathbb{E}\Big[\Big(\sum_{i=1}^{t}W'_i\Big)^{\ell}\Big]+\gamma t^{\ell}=\mathbb{E}\Big[\Big|\sum_{i=1}^{t}W'_i\Big|^{\ell}\Big]+\gamma t^{\ell}.\]

Since $\mathbb{E}W'_i=0$ and $|W'_i|\le 1$, they satisfy the conditions of the Marcinkiewicz–Zygmund inequality. We have

\[\mathbb{E}\Big[\Big|\sum_{i=1}^{t}W'_i\Big|^{\ell}\Big]\le(3\sqrt{2})^{\ell}\,\ell^{\ell/2}\,t^{\ell/2-1}\sum_{i=1}^{t}\mathbb{E}|W'_i|^{\ell}\le\big(5\sqrt{\ell t}\big)^{\ell}.\]

Therefore,

\[\frac{\mathbb{E}\big[\big|\sum_{i=1}^{t}W_i\big|^{\ell}\big]}{(t\varepsilon)^{\ell}}\le\Big(\frac{25\ell}{\varepsilon^{2}t}\Big)^{\ell/2}+\frac{\gamma}{\varepsilon^{\ell}}.\]

Appendix C Proof of Lemma 17

To prove Lemma 17, we need the following property of matrix norms:

Proposition 40.

Let $A\in\mathbb{C}^{d\times d}$ and define

\[r=\max_{i,j}|A_{ij}|.\]

Then the spectral norm of $A$ satisfies

\[r\le\|A\|\le dr.\]
Proof.

Select standard basis vectors $e_i,e_j\in\mathbb{C}^{d}$ such that $|A_{ij}|=r$. Then,

\[\|A\|\ge\frac{|e_i^{*}Ae_j|}{\|e_i\|_2\|e_j\|_2}=|A_{ij}|=r.\]

We also have

\[\|A\|\le\|A\|_F=\sqrt{\sum_{i=1}^{d}\sum_{j=1}^{d}|A_{ij}|^{2}}\le dr.\]
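A quick numerical check of both inequalities on a random complex matrix (illustrative only):

```python
import numpy as np

d = 6
rng = np.random.default_rng(4)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

r = np.abs(A).max()                    # largest entry in modulus
spec = np.linalg.norm(A, 2)            # spectral norm
frob = np.linalg.norm(A, 'fro')        # Frobenius norm

print(f"r = {r:.3f}, ||A|| = {spec:.3f}, ||A||_F = {frob:.3f}, d*r = {d * r:.3f}")
assert r <= spec + 1e-12 and spec <= frob + 1e-12 and frob <= d * r + 1e-12
```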

Lemma 17. [Restated, see original statement.]

A $(\delta,\varepsilon)$ averaging sampler is a $d$-dimensional $(2d^{2}\delta,2d\varepsilon)$ matrix sampler.

Proof.

Let $Z_1,\dots,Z_t$ be the sampler's output. We define

\[A:=\frac{1}{t}\sum_{i=1}^{t}f(Z_i).\]

Now we fix some $i,j\in[d]$. For all $x\in\{0,1\}^m$, we have $|f(x)_{ij}|\le\|f(x)\|\le 1$ by Proposition 40. Then, since the $Z_i$'s are the output of a $(\delta,\varepsilon)$ averaging sampler, we have

\[\Pr\big[|\operatorname{Re}(A_{ij})-\operatorname{Re}((\mathbb{E}f)_{ij})|\le\varepsilon\big]\ge 1-\delta\quad\text{and}\quad\Pr\big[|\operatorname{Im}(A_{ij})-\operatorname{Im}((\mathbb{E}f)_{ij})|\le\varepsilon\big]\ge 1-\delta,\]

where $\operatorname{Re}(x)$ and $\operatorname{Im}(x)$ denote the real and imaginary parts of $x$, respectively. Taking a union bound, with probability at least $1-2\delta$,

\[|A_{ij}-(\mathbb{E}f)_{ij}|\le|\operatorname{Re}(A_{ij}-(\mathbb{E}f)_{ij})|+|\operatorname{Im}(A_{ij}-(\mathbb{E}f)_{ij})|\le 2\varepsilon.\]

By a union bound over all entries, with probability at least $1-2d^{2}\delta$, all entries have an additive error bounded by $2\varepsilon$, and this implies that $\|A-\mathbb{E}f\|\le 2d\varepsilon$ by Proposition 40.
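The sketch below illustrates the reduction numerically. It is not the construction from the lemma: fresh independent samples stand in for the output of an averaging sampler, and the toy function $f$ is an arbitrary table of norm-1 complex matrices; the point is only that the spectral error is controlled by $d$ times the worst entrywise error, as in the proof.

```python
import numpy as np

def matrix_estimate(f, samples):
    """Average f over the sampler's output, as in the definition of A."""
    return np.mean([f(z) for z in samples], axis=0)

d, m, t = 4, 10, 4000
rng = np.random.default_rng(5)

# Toy f: {0,1}^m -> C^{d x d} with ||f(x)|| <= 1, stored as a lookup table.
table = rng.standard_normal((2**m, d, d)) + 1j * rng.standard_normal((2**m, d, d))
for idx in range(2**m):
    table[idx] /= np.linalg.norm(table[idx], 2)

def f(x):
    return table[x]

samples = rng.integers(0, 2**m, size=t)   # stand-in for an averaging sampler's output
A = matrix_estimate(f, samples)
true_mean = table.mean(axis=0)

entrywise = np.abs(A - true_mean).max()
spectral = np.linalg.norm(A - true_mean, 2)
print(f"max entrywise error = {entrywise:.4f}")
print(f"spectral error      = {spectral:.4f} <= d * entrywise = {d * entrywise:.4f}")
```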