Device-independent randomness extraction for arbitrarily weak min-entropy source

In this paper we design a protocol to extract random bits with an arbitrarily low bias from a single arbitrarily weak min-entropy block source in a device-independent setting. The protocol employs Mermin devices that exhibit super-classical correlations. The number of devices used scales polynomially in the length n of a block containing at least two bits of entropy. Our protocol is robust: it can tolerate devices that malfunction with a probability dropping polynomially in n, at the cost of a constant increase in the number of devices used.

High-quality randomness is a very useful resource in many computational and cryptographic tasks. In fact, it has been shown that many protocols (including quantum ones) vitally require perfect randomness for their security [1][2][3].
Unfortunately, perfect randomness is at the same time very rare. In the classical world, true randomness, i.e. independent uniformly distributed random bits, cannot be produced at all. The only available resource is pseudo-randomness: sequences that appear random to all observers (often referred to as adversaries) who do not have full information about the whole environment. Thus classical randomness generators produce pseudorandom numbers stemming from external sources and fluctuations, hoping that the adversary will not be able to reconstruct all the background information. Sources producing imperfect randomness, even taking into account the limited capabilities of the adversary, are called weak random sources.
To enhance the quality and security of these sources, randomness extractors are used. These are devices that combine several sources of randomness to obtain fewer bits of higher quality [4].
On the other hand, the production of true randomness is theoretically possible if one assumes quantum theory to be valid: preparing a pure state and measuring it in a complementary basis yields a perfectly random result. This is due to the randomness inherent in quantum theory itself; this principle is used in the design of commercially available devices [5]. The assumption, however, is high quality and stability of the quantum devices in an adversarial setting, which is far from trivial to achieve [6].
In addition, quantum devices in reality act more like black boxes that are inaccessible to users except for providing classical inputs and obtaining classical outputs. It is very hard, if not impossible, to directly test what these devices actually do: whether they perform the operations and measurements as promised, and whether their outputs really come from quantum measurements. It is therefore crucial to test these devices even during their operation; passing these tests shall guarantee that the devices are correctly designed and manufactured and that they work as desired. This is possible by utilizing super-classical correlations of certain quantum states: if the device consists of separate parts, their classical results can be tested for correlations, and a correlation level breaking the classical bound can serve as a guarantee of their quantum nature. Using non-trusted (or self-testing) quantum devices is referred to as device independence in a broader scope. Throughout this letter, the process of transforming a weak random source into uniformly random bits is called randomness extraction.
Weak random sources - To provide a figure of merit for randomness extractors, one needs to characterize the randomness of the input source. One possible parameterization is the so-called Santha-Vazirani (SV) parametrization [7], given by the following property: Let X = (X_1, X_2, . . . ) be an arbitrarily long random bit string produced by an ε-SV source. Then for any 1 ≤ i ≤ n it holds that

1/2 − ε ≤ P(X_i = 0 | X_1, . . . , X_{i−1}, E) ≤ 1/2 + ε,

where E is any information an adversary Eve might hold. Note here that the apparent randomness of each X_i (i.e. without knowledge of E) may well be uniform. The purpose of introducing the random variable E is to represent possible correlations between the choice of the measurement settings and the internal workings of the devices running a Bell-type test. A second possibility is to consider a one-shot source that produces n-bit strings X (with n arbitrarily large). Here we can characterize the randomness of the source by the (conditional) min-entropy of the produced sequence, defined as

H_∞(X|E) = −log_2 max_x P(X = x | E).

A source is called an (n, k) source if H_∞(X|E) ≥ k, and it may also be characterized by its min-entropy rate R = k/n. Combining these two approaches we get the reusable min-entropy source with n-bit blocks of output and guaranteed min-entropy k. Such a source can be modeled as a sequence of n-bit random variables X_1, X_2, . . . , such that

H_∞(X_i | X_1, . . . , X_{i−1}, E) ≥ k.

Therefore, each new block has a guaranteed minimal min-entropy, even conditioned on the previous ones and any information of the adversary. It is easy to see that SV sources are recovered with n = 1 and ε = 2^{−H_∞(X)} − 1/2. A source of this type is also called a block source.
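As a concrete illustration of the definition above, the (unconditioned) min-entropy of a distribution is determined by its single largest probability; a minimal sketch in Python:

```python
import math

def min_entropy(probs):
    """H_inf(X) = -log2( max_x P(X = x) )."""
    return -math.log2(max(probs))

# A flat (4, 2) source: uniform on 4 of the 2**4 = 16 strings.
flat = [0.25] * 4 + [0.0] * 12
print(min_entropy(flat))  # 2.0
```

Any distribution that is uniform on a 4-element subset of the sample space has min-entropy exactly 2, which is why such "flat" distributions reappear as the extremal points in the analysis below.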
Classically, the task of transforming a single weak source, characterized either as a Santha-Vazirani source or as a min-entropy (block) source, into a fully random bit is known to be impossible [4, 7]. However, with non-classical resources the task becomes possible. More precisely, a weak random source can be used to choose measurements for a Bell test in order to certify that the observed correlations cannot be explained by local theories and thus must necessarily contain intrinsic randomness.
In their seminal paper, Colbeck and Renner [8] showed that amplification of Santha-Vazirani sources is possible for a certain range of the parameter ε, and thus opened a line of research devoted to SV amplification. Subsequent works provided protocols that are able to amplify SV sources for any ε < 1/2 in various settings [9][10][11][12]. This line of research culminated in the work of Brandão et al. [13], who showed how to amplify such a source of randomness with the use of only eight non-communicating devices. Their work was quickly followed by that of Coudron and Yuen [14], who showed how to use 20 non-communicating devices to obtain arbitrarily many bits from a Santha-Vazirani source.
On the other hand, extraction from min-entropy sources is relatively unexplored. There is a sequence of works exploring the validity of Bell tests when the measurements are chosen according to a min-entropy source [15, 16], and the authors of this paper provided a protocol which uses the 3-party GHZ paradox to amplify sources with min-entropy rate R > (1/4) log_2(10) against quantum adversaries [17]. Recently an extensive work on this topic was made public on the pre-print archive [18]. In this letter we conclude this line of work by providing a protocol extracting random bits from min-entropy sources with any non-zero min-entropy rate.
Device-independent concept and Mermin inequality - In this letter we use the tripartite Mermin inequality. Let us consider three spatially separated boxes, each of them having a single bit of input and a single bit of output. Let us denote the input bits of the respective boxes by X, Y and Z and the corresponding output bits by A, B and C. By construction we guarantee X ⊕ Y ⊕ Z = 1, i.e. we consider only inputs XYZ ∈ {111, 100, 010, 001}, supplied simultaneously to all boxes. The value v of the Mermin term is a function of the 4 conditional probabilities defined by the behaviour of the device and of the probability distribution p on inputs,

v = Σ_{xyz ∈ {111,100,010,001}} p(XYZ = xyz) P(A ⊕ B ⊕ C = x · y · z | XYZ = xyz).     (3)

In particular, for the uniform input distribution we set P(XYZ = 111) = P(XYZ = 010) = P(XYZ = 001) = P(XYZ = 100) = 1/4 and denote the Mermin term by v_u. Assuming the uniform distribution on all four inputs, the maximal value of v_u achievable by a classical device [19] is 3/4 (thus the Mermin inequality reads v_u ≤ 3/4), and there exists a classical device that can make any 3 of the conditional probabilities simultaneously equal to 1. In the quantum world we can achieve v_u = 1 and satisfy all 4 conditional probabilities perfectly, using the tripartite GHZ state (1/√2)(|000⟩ + |111⟩) and measuring σ_X (σ_Y) when receiving 0 (1) on input.
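The quantum value v_u = 1 can be checked with a small state-vector computation. The sketch below uses a GHZ state with a local phase convention, (|000⟩ + i|111⟩)/√2 (a local-unitary adjustment of the state quoted in the text, chosen here as an assumption so that the assignment σ_X ↔ input 0, σ_Y ↔ input 1 wins the test for all four allowed inputs):

```python
import numpy as np

# Pauli matrices used as the two measurement settings of each box.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

# GHZ-type state (|000> + i|111>)/sqrt(2); the phase i is a local-unitary
# convention making sigma_X <-> 0, sigma_Y <-> 1 optimal for this test.
ghz = np.zeros(8, dtype=complex)
ghz[0] = 1 / np.sqrt(2)
ghz[7] = 1j / np.sqrt(2)

def pass_probability(x, y, z):
    """P(A + B + C = x*y*z mod 2) when measuring X for input 0, Y for 1."""
    ops = [Y if s else X for s in (x, y, z)]
    M = np.kron(np.kron(ops[0], ops[1]), ops[2])
    # <M> = sum_abc (-1)^(A+B+C) p(abc), so P(parity = 0) = (1 + <M>)/2.
    expect = np.real(np.conj(ghz) @ M @ ghz)
    return (1 + expect) / 2 if (x * y * z) == 0 else (1 - expect) / 2

v_u = np.mean([pass_probability(*xyz)
               for xyz in [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]])
print(v_u)  # 1.0 (up to floating-point precision)
```

All four conditional probabilities evaluate to 1, saturating the Mermin term, while any classical device is capped at v_u = 3/4.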
The beautiful property of the Mermin inequality is that the violation v directly gives the probability that the device passes the specific test A ⊕ B ⊕ C = X · Y · Z. The probability of failing the test reads w = 1 − v.
Mironowicz, Gallego and Pawlowski (MGP) [10] showed the following result: take a linearly ordered sequence of k Mermin devices D_1, . . . , D_k (k being arbitrary) that have a uniform distribution on inputs, where each device knows the inputs and outputs of its predecessors (for optional cheating purposes), but no device can signal to its predecessors. Let us assume that the inputs of the devices are described by random variables XYZ_1, . . . , XYZ_k, and the outputs by ABC_1, . . . , ABC_k. Then there exists a function f(ε) such that if the value of the Mermin term (3) with uniform inputs is at least v_u ≥ f(ε), then the output bit A_k has a bias of at most ε conditioned on the inputs and outputs of all its predecessors and the adversarial knowledge. This function can be lower bounded by a semi-definite program (SDP) using any level of the hierarchy introduced in [20]. By using the second level of the hierarchy one can obtain the bound on f(ε) as a function of ε shown in Fig. 1. We can set k = 1 (having just a single device) and get the lower bound on the probability of detecting the production of a bit biased by more than ε, which is w_u > 1 − f(ε). Additional independent non-communicating devices can be ordered into an arbitrary sequence, and thus this limit holds for any of these devices simultaneously.
Single-round protocol - In the rest of our analysis we will work with (n, k) sources for arbitrary n and k ≥ 2. This simplifies the explanation: by grouping ⌈2/k'⌉ blocks of an arbitrary (n', k') source with k' > 0 we get an (n, k) source with n = ⌈2/k'⌉ n' and k = ⌈2/k'⌉ k' ≥ 2. Let us start with a min-entropy (n, 2) source (recall that an (n, k) source with k > 2 is also an (n, 2) source) and a class of hash functions H = {h_1, . . . , h_m}, h_i : {0, . . . , N − 1} → {0, 1, 2, 3} with N = 2^n, where the outputs 0, 1, 2, 3 of a function identify the inputs 111, 100, 010, 001 for the device.
We want to construct H with the property that for every 4-element set S ⊆ {0, . . . , N − 1} there exists at least one hash function h ∈ H such that h(S) = {0, 1, 2, 3}. This is trivially satisfied by the set of all hash functions H_full = {0, 1, 2, 3}^N; however, such a class, with its 4^N elements, is impractically large. In the supplementary material we show a construction with a number of functions logarithmic in N, so the number of devices needed scales polynomially with the length n of the sequence. We also stress that for large N a single hash function covers as much as 9% of all four-tuples, independently of N, so the size of an optimal set of hash functions might not depend on n at all.
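The covering property of H can be checked by brute force for toy sizes. The sketch below uses a random family rather than the paper's derandomized construction (the sizes N and m are illustrative assumptions, not the paper's parameters); it also reproduces the ~9% single-function coverage figure, which is just 4!/4^4:

```python
import itertools
import random

# Fraction of four-tuples a single random function maps onto {0,1,2,3}:
# 4!/4**4 = 24/256, i.e. roughly 9%.
print(24 / 4**4)  # 0.09375

def covers_all_quadruples(family, N):
    """Defining property of H: every 4-element subset S of {0..N-1}
    is mapped onto {0,1,2,3} by at least one function in the family."""
    return all(
        any({h[s] for s in S} == {0, 1, 2, 3} for h in family)
        for S in itertools.combinations(range(N), 4)
    )

# Toy check with a random family; m here is deliberately generous so the
# check passes with overwhelming probability.
random.seed(1)
N, m = 16, 200
family = [[random.randrange(4) for _ in range(N)] for _ in range(m)]
print(covers_all_quadruples(family, N))
```

A random family needs roughly log N / |log(1 - 24/4^4)| functions per quadruple on average, which is the intuition behind the logarithmic-size derandomized construction in the supplementary material.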
The protocol works as follows:
1. We obtain a (weakly) random n-bit string X from the random number generator.
2. Into each device D_i we input the 3-bit string identified by h_i(X), i.e. the inputs X_i, Y_i and Z_i, and obtain the outputs A_i, B_i and C_i.
3. We verify for each device D_i the condition A_i ⊕ B_i ⊕ C_i = X_i · Y_i · Z_i. If it does not hold, we abort the protocol due to a cheating attempt of the provider.
4. We define the output bit of the protocol as b = ⊕_{i=1}^{m} A_i.
The protocol is depicted in Fig. 2.
Fig. 2: Depiction of a single-round protocol. A bit string drawn from the flat random source is hashed into m inputs for the Mermin devices, so that at least one device receives a perfectly random input distribution. This guarantees that at least one result is almost perfectly random, which then also holds for the XOR of the individual results.
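The steps above can be sketched end to end with honest devices. An ideal GHZ box is simulated classically here by sampling outputs uniformly subject to the winning condition A ⊕ B ⊕ C = X · Y · Z, which reproduces the observable statistics of the quantum strategy; the hash family is a toy random one, and all names and sizes are illustrative:

```python
import random

def ideal_mermin_box(x, y, z):
    """Ideal GHZ-box statistics: outputs uniform subject to
    A ^ B ^ C = x*y*z (the test condition, satisfied with certainty)."""
    a, b = random.randrange(2), random.randrange(2)
    return a, b, a ^ b ^ (x & y & z)

# Outputs 0..3 of a hash function identify these device inputs.
INPUTS = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def single_round(x_string, family):
    """One round: hash the source output into every device, check the
    Mermin test for each, and XOR the first output bits."""
    b = 0
    for h in family:
        xyz = INPUTS[h[x_string]]
        a, bb, c = ideal_mermin_box(*xyz)
        if a ^ bb ^ c != (xyz[0] & xyz[1] & xyz[2]):
            raise RuntimeError("test failed -> abort, cheating suspected")
        b ^= a
    return b

random.seed(0)
N, m = 16, 20
family = [[random.randrange(4) for _ in range(N)] for _ in range(m)]
print(single_round(random.randrange(N), family))  # a single bit, 0 or 1
```

With honest ideal boxes the abort branch is never taken; a device that broke the parity condition would trigger the abort in step 3.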
Let us now examine the properties of the bit b. First consider only flat (n, 2) distributions; recall that these are exactly the distributions that are uniform on 4-element subsets of the sample space. Our construction of the class H of hash functions assures that for any flat probability distribution there is a function h_j ∈ H and a corresponding device D_j such that the inputs of D_j (hashed by h_j from this flat distribution) are uniform. Consequently, if the adversary restricts herself to flat distributions and wants to achieve a bias greater than ε for the output bit b, she must achieve this bias in particular for the device D_j. The probability that she is not detected while doing so is at most f(ε). The same bound then also holds for the XOR of all output bits b.
The set of all (n, 2) distributions is convex, and flat distributions are exactly the extremal points of this convex set. Thus any (n, 2) distribution d can be expressed as a convex combination of at most N flat distributions,

d = Σ_{i=1}^{N} p_i d_i.

The probability that the adversary is not detected is given by the successful cheating probabilities when using the flat distributions d_i ∈ {d_i}_{i=1}^{N}, averaged over the probability distribution {p_i} on these flat distributions [21]. Thus the upper bound v_u ≤ f(ε) holds for non-flat distributions as well.
To summarize this part: having an (n, k) source with k ≥ 2, a single round of the protocol produces a single bit that is biased by at most ε with a certainty of 1 − f(ε).
Multiple-round protocol for block sources - Let us state the most general task: we have an (n, k) block source with arbitrary n and k ≥ 2 (recall that any source with k > 0 can be grouped into larger blocks to obtain k ≥ 2). We would like to produce a bit that is biased by no more than ε with certainty of at least 1 − δ.
If the one-round version does not meet these parameters, we repeat the whole protocol l times. Using new devices and new outputs of the block source, each run j produces a bit b_j that, conditioned on all the previous bits, is biased by at most ε from a perfectly random bit, up to a probability f(ε) of the cheating remaining undetected. Thus also the XOR of all output bits, b = ⊕_{j=1}^{l} b_j, has bias at most ε.
After l rounds, the probability of the adversary not being detected is upper bounded by f(ε)^l. Note that the product form does not come from the detection probabilities being independent (they are not); it is a product of a chain of conditional probabilities. Recall that the bound f(ε) holds conditioned on any inputs and outputs of the previous devices (in an arbitrary ordering that respects causality). Thus choosing

l > log δ / log f(ε)

will guarantee the fulfillment of the conditions for the parameters ε and δ.
Summing up, with an (n, k) block source and O((log δ / log f(ε)) · Poly(2n/k)) Mermin devices we can produce a single random bit with bias smaller than ε with probability larger than 1 − δ. For producing more bits we simply repeat the whole procedure: all the bits produced will have bias smaller than ε conditioned on the bits produced so far, with linear scaling of resources.
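For concreteness, the required number of rounds from the bound l > log δ / log f(ε) can be evaluated numerically. The value f = 0.9 below is an illustrative placeholder for the SDP bound f(ε), not a value taken from the paper:

```python
import math

def rounds_needed(f, delta):
    """Smallest integer l with f**l <= delta, i.e. l > log(delta)/log(f)."""
    return math.ceil(math.log(delta) / math.log(f))

# E.g. with f(eps) = 0.9 and target failure probability delta = 1e-9:
print(rounds_needed(0.9, 1e-9))  # 197
```

The count grows only logarithmically in 1/δ, which is what makes the multi-round protocol cheap in rounds even for very small failure probabilities.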
Protocol for one-shot min-entropy sources - We can model a different scenario where the random source is described by a single-use min-entropy source characterized by its min-entropy rate R. In such a case we cannot use the same scheme as before, as no independent blocks of randomness with guaranteed min-entropy are available. In spite of this fact, randomness extraction is still possible, at the cost of increasing the number of devices used.
We can draw from the source a bit string of length n with min-entropy Rn, securing at least 2^{Rn} realizations of the string appearing with non-zero probability. We use this string for a single round of the protocol, however with the full set of hash functions H_full. Then, for flat sources, there will be at least Rn/2 devices obtaining perfectly random input distributions, independent of each other (see the supplementary material for an explicit construction), yielding a failure probability of the protocol δ < f(ε)^{Rn/2}. Thus choosing

n > (2/R) · (log δ / log f(ε))

will produce a random bit biased by no more than ε up to a probability δ, though at the cost of a number of devices double-exponential in 1/R and log δ / log f(ε). For non-flat sources the same result holds due to the Carathéodory theorem mentioned earlier.
Robustness - Aborting the protocol after even a single mistake of the devices is certainly highly impractical from the implementation point of view. Therefore we expand our analysis to the situation where we tolerate a certain amount of noise in the devices, which would manifest itself in occasional failures of the test condition even for honest devices. More specifically, we shall tolerate a certain fraction of the devices malfunctioning without aborting the protocol.
In the supplementary material we show that we can tolerate up to (1 − f(ε))l/2 device failures in the whole protocol and still achieve the same result as for the perfect protocol, by choosing l > 8 ln δ / (f(ε) − 1). This translates into increasing the number of rounds of the protocol, compared to the case of ideal devices, by a factor of 8 ln f(ε) / (f(ε) − 1). For small ε the parameter f(ε) approaches 1 and the multiplication factor saturates at 8. For honest devices with individual failure probability bounded by μ = (1 − f(ε))/(4m), the probability of a false alarm decreases exponentially with the number of protocol rounds l.

Conclusion - In this letter we have introduced a protocol that extracts weak randomness obtained from a min-entropy source in the device-independent setting. The protocol works for arbitrarily weak min-entropy sources, both single-use and block, with a reasonable scaling of the number of devices in the latter case. Our protocol is also robust, as it allows tolerating some fraction of malfunctioning devices at the cost of a constant increase in the number of devices used.

Supplementary material

We want to construct H with the property that for every set S ⊆ {0, . . . , N − 1} with |S| ≥ 4 there is at least one hash function h ∈ H such that {0, 1, 2, 3} ⊆ h(S). This is trivially satisfied by H_full = {0, 1, 2, 3}^N; however, such a class of functions is impractically large, having 4^N elements. Therefore we shall construct a smaller set fulfilling the condition.
Derandomization construction of the class H
Let us consider a sequence of random variables Z = (Z_0, . . . , Z_{N−1}) with Z_i ∈ {0, 1, 2, 3}. The outcomes of such a random experiment are N-position sequences from the set {0, 1, 2, 3}^N. It is easy to see that each such sequence uniquely specifies a particular function h : {0, . . . , N − 1} → {0, 1, 2, 3}, and vice versa; from now on we will use them interchangeably.
Let us assume that the random variables Z satisfy the condition that for every 4-tuple of positions j_0, j_1, j_2, j_3 and every string a_0 a_1 a_2 a_3 ∈ {0, 1, 2, 3}^4 it holds that

P[Z_{j_0} = a_0, Z_{j_1} = a_1, Z_{j_2} = a_2, Z_{j_3} = a_3] > 0.     (4)

Note that for our purposes even a weaker assumption on Z is sufficient: it is enough if for every 4-tuple of positions j_0, j_1, j_2, j_3 there exists at least one string a_0 a_1 a_2 a_3 ∈ {0, 1, 2, 3}^4 with a_0, a_1, a_2, a_3 mutually different satisfying (4). However, the stronger condition will make it easier to find a suitable set.
Let us denote H = {a ∈ {0, 1, 2, 3}^N s.t. P[Z = a] > 0}. Using the probabilistic method we see that for each 4-tuple of positions j_0, j_1, j_2, j_3 and every string a_0 a_1 a_2 a_3 ∈ {0, 1, 2, 3}^4 there exists a function h ∈ H such that h(j_0) = a_0, h(j_1) = a_1, h(j_2) = a_2 and h(j_3) = a_3. The number of functions in H is the same as the number of (nonzero-probability) sample-space elements of Z. It remains to construct Z with a sample space as small as possible.
Construction of a random variable Z

Definition 1. We define the distance of two distributions D_1 and D_2 by

dist(D_1, D_2) = Σ_{ω ∈ Ω} |P_{D_1}(ω) − P_{D_2}(ω)|,

where Ω is the set of all possible events.

Definition 2. A sequence of random variables Z_0, . . . , Z_{N−1} is k-wise δ-dependent if for every index subset S of size at most k it holds that dist(D(S), U(S)) ≤ δ, where U(S) is a uniform distribution over |S|-bit strings and D(S) is the marginal distribution over the subset of variables specified by S.

Theorem 3 [22]. The logarithm of the cardinality of the sample space needed for constructing N k-wise δ-dependent random variables is O(k + log log N + log(1/δ)).
Let us consider two sequences X_0, . . . , X_{N−1} and Y_0, . . . , Y_{N−1} of binary 4-wise δ-dependent random variables, the two sequences being mutually independent. Let Z_i = 2X_i + Y_i, so that Z_i ∈ {0, 1, 2, 3}. As both X and Y are δ-dependent, their distance from the uniform distribution on every subset of size at most 4 is at most δ. If there were a zero probability for at least one binary string out of {0, 1}^4 at positions (j_0, j_1, j_2, j_3), the distance of such a distribution from the uniform distribution would be at least 2 × 2^{−4} = 2^{−3}.
Hence, ensuring that δ < 2^{−3}, we obtain that for each 4-tuple of positions there is a non-zero probability of every 4-bit string appearing, in both X and Y. Hence, for the sequence of random variables Z it holds that at every 4-tuple of positions, every string out of {0, 1, 2, 3}^4 appears with non-zero probability.
In our case we need two independent sets of N = 2^n 4-wise 1/8-dependent random variables, resulting in a sample space of size O(n^c), yielding the desired polynomial construction.

Robustness
Let us assume we tolerate failures of at most (1 − f(ε))l/2 devices during the run of the whole protocol. Let us first calculate the number of rounds l of the protocol needed to obtain the original ε and δ characteristics of the non-robust protocol.

Efficiency
Assuming the adversary is cheating (wants to achieve a bias greater than ε), in each round of the protocol there will be at least one device failure with probability at least 1 − f(ε). The probability δ that the adversary stays undetected while all devices produce bias at least ε is bounded by the distribution function of the binomial distribution,

δ ≤ P[ Bin(l, 1 − f(ε)) < (1 − f(ε))l/2 ].

This probability can be upper bounded by Chernoff's inequality as

δ ≤ exp(−(1 − f(ε))l/8).

We can derive the necessary number of rounds of the protocol to be

l > 8 ln δ / (f(ε) − 1).

Comparing to the number of rounds needed for the non-robust protocol, ln δ / ln f(ε), we obtain the scaling factor

s = 8 ln f(ε) / (f(ε) − 1).

For f(ε) → 1 (which is the case for small ε) the scaling factor approaches the constant 8.
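The limiting behaviour of the scaling factor s = 8 ln f(ε) / (f(ε) − 1) as f(ε) → 1 can be checked numerically (the f values are illustrative):

```python
import math

def scaling_factor(f):
    """Round overhead of the robust protocol: s = 8*ln(f)/(f-1) -> 8 as f -> 1."""
    return 8 * math.log(f) / (f - 1)

for f in (0.9, 0.99, 0.999):
    print(round(scaling_factor(f), 4))
```

Since ln f / (f − 1) → 1 as f → 1, the overhead of robustness is bounded by a factor of 8 in the regime of small ε.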

Imperfectness
We also want to ensure that there exists a non-zero failure probability μ of each individual device such that the protocol execution will not, with high probability, be falsely declared to be attacked by the adversary. Let us consider an honest provider (not trying to cheat) and set μ = (1 − f(ε))/(4m). We will calculate the probability that more than (1 − f(ε))l/2 devices fail during the process. Since the producer of the devices is assumed to be honest (otherwise the protocol failure is justified), we may assume that failures of devices are independent of each other. Therefore the failures can be modeled by i.i.d. Bernoulli random variables Z_1, . . . , Z_{ml} (Z_i = 1 if the i-th device fails the test), with P(Z_i = 1) = μ = (1 − f(ε))/(4m). The number of failures Z = Σ_{i=1}^{ml} Z_i is binomially distributed with mean E[Z] = (1 − f(ε))l/4. For the protocol not to abort we need fewer than (1 − f(ε))l/2 failures, hence we need to upper bound the probability P[Z ≥ (1 − f(ε))l/2]. We can use the Hoeffding inequality:

P[Z − E[Z] ≥ (1 − f(ε))l/4] ≤ exp(−(1 − f(ε))² l / (8m)),

i.e. the probability of a false protocol abort drops exponentially with the number of rounds l.
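The exponential decrease of the false-abort probability can be seen by evaluating the Hoeffding bound exp(−(1 − f(ε))² l / (8m)) for growing l (the values of f and m below are illustrative assumptions):

```python
import math

def false_abort_bound(f, m, l):
    """Hoeffding bound on P[more than (1-f)l/2 honest failures] when each
    of the m*l devices fails independently with mu = (1-f)/(4m)."""
    return math.exp(-((1 - f) ** 2) * l / (8 * m))

# E.g. f(eps) = 0.9 and m = 20 devices per round:
for l in (10**3, 10**4, 10**5):
    print(false_abort_bound(0.9, 20, l))
```

The bound shrinks like exp(−c·l), so an honest implementation running enough rounds aborts falsely only with negligible probability.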
Using H_full for Non-Block Sources
We used the following claim in the main text: if we hash the outcome of an (n, Rn) flat distribution by each of the hash functions from the full set H_full = {h_i : {0, 1}^n → {0, 1, 2, 3}}, at least Rn/2 functions have uniform and mutually independent outcomes. First let us suppose that Rn is a natural even number. Then there are 4^{Rn/2} strings, each appearing with probability (1/4)^{Rn/2}. Let us label them {s_i}_{i=0}^{4^{Rn/2}−1}. We will now explicitly construct hash functions {h_j}_{j=0}^{Rn/2−1} with the desired properties.
Let M be the Rn/2 × 4^{Rn/2} matrix whose i-th column is the representation of i in base 4. Let us assign h_j(s_i) = M_{ji} (an example with Rn = 4 is depicted in Fig. 3). Although this is only a partial definition of {h_j}_{j=0}^{Rn/2−1}, it is sufficient for our purposes, because the other strings appear with probability 0. It should now be straightforward to see that each vector of outcomes (h_0, . . . , h_{Rn/2−1}) appears with equal probability, and therefore the marginal distribution of the outputs of each single function h_j is uniform and independent of the others. By the Carathéodory theorem, all other values of Rn can be handled by writing the source as a convex combination of (n, m) flat sources with m = 2⌊Rn/2⌋, which gives us that the probability to cheat with such an (n, Rn) source is at most the same as with an (n, m) flat source, i.e. at most f(ε)^{⌊Rn/2⌋}.
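The base-4 matrix construction and its uniformity property can be written out and verified directly for the Rn = 4 example (NumPy is used for convenience; sizes are the example's):

```python
import numpy as np

def base4_matrix(rows):
    """M: rows x 4**rows matrix whose i-th column holds the base-4 digits
    of i; row j defines h_j on the 4**rows equiprobable strings s_i."""
    cols = 4 ** rows
    M = np.zeros((rows, cols), dtype=int)
    for i in range(cols):
        v = i
        for j in range(rows):
            M[j, i] = v % 4
            v //= 4
    return M

# Rn = 4 -> Rn/2 = 2 hash functions on 4**2 = 16 equiprobable strings.
M = base4_matrix(2)
# Each h_j is uniform: every output value 0..3 appears equally often per row,
print([np.bincount(M[j], minlength=4).tolist() for j in range(2)])
# and the functions are jointly uniform: each pair (h_0, h_1) occurs once.
print(len(set(zip(M[0].tolist(), M[1].tolist()))))  # 16
```

Because every output vector occurs exactly once among the equiprobable columns, each h_j is marginally uniform and the functions are mutually independent, exactly as the claim requires.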