Explicit List-Decodable Codes with Optimal Rate for Computationally Bounded Channels

A stochastic code is a pair of encoding and decoding procedures (Enc, Dec) where Enc : {0,1}^k × {0,1}^d → {0,1}^n. The code is (p, L)-list decodable against a class C of "channel functions" C : {0,1}^n → {0,1}^n if for every message m ∈ {0,1}^k and every channel C ∈ C that induces at most pn errors, applying Dec to the "received word" C(Enc(m, S)) produces a list of at most L messages that contains m with high probability over the choice of a uniform S ← {0,1}^d. Note that neither the channel C nor the decoding algorithm Dec receives the random string S when attempting to decode. The rate of a code is R = k/n, and a code is explicit if Enc and Dec run in time poly(n).

Guruswami and Smith (Journal of the ACM, 2016) showed that for all constants 0 < p < 1/2, ε > 0 and c > 1, there exist a constant L and a Monte Carlo explicit construction of stochastic codes with rate R ≥ 1 − H(p) − ε that are (p, L)-list decodable for size n^c channels. Here, Monte Carlo means that the encoding and decoding need to share a public, uniformly chosen poly(n^c)-bit string Y, and the constructed stochastic code is (p, L)-list decodable with high probability over the choice of Y. Guruswami and Smith posed the open problem of giving fully explicit (that is, not Monte Carlo) codes with the same parameters, under hardness assumptions. In this paper, we resolve this open problem, using a minimal assumption: the existence of poly-time computable pseudorandom generators for small circuits, which follows from standard complexity assumptions by Impagliazzo and Wigderson (STOC 97).

Guruswami and Smith also asked for fully explicit unconditional constructions with the same parameters against O(log n)-space online channels. (These are channels that have space O(log n) and are allowed to read the input codeword in one pass.) We also resolve this open problem. Finally, we consider a tighter notion of explicitness, in which the running time of the encoding and list-decoding algorithms does not increase when the complexity of the channel increases. We give explicit constructions (with rate approaching 1 − H(p) for every p ≤ p_0, for some p_0 > 0) for channels that are circuits of size 2^{n^{Ω(1/d)}} and depth d. Here, the running time of encoding and decoding is a polynomial that does not depend on the depth of the circuit.

Our approach builds on the machinery developed by Guruswami and Smith, replacing some probabilistic arguments with explicit constructions. We also present a simplified and general approach that makes the reductions in the proof more efficient, so that we can handle weak classes of channels.


Introduction
List-decodable codes. List-decodable codes are extensively studied in coding theory and in theoretical computer science, and have many applications. In the paragraph below, we define list-decodable codes using a functional view, which is more convenient for this paper.
A code is defined by a pair (Enc, Dec) of encoding and decoding procedures. We say that Enc : {0,1}^k → {0,1}^n is (p, L)-list decodable if there exists a function Dec which, given y ∈ {0,1}^n, produces a list Dec(y) of size L containing all messages m ∈ {0,1}^k such that δ(y, Enc(m)) ≤ p (here δ(x, y) is the relative Hamming distance between x and y). Unique decoding is the special case where L = 1, and a code is explicit if both encoding and decoding can be performed in time polynomial in n. The rate of a code is R = k/n. (A more detailed formal definition is given in Section 3.2.)

Natural examples of complexity classes are polynomial size circuits and logarithmic space branching programs. Note that these two classes are nonuniform, and it is more natural to use nonuniform classes, as such classes trivially contain channels C whose error function E_C(z) = z ⊕ C(z) is constant (meaning that there is a fixed error vector e such that C(z) = z ⊕ e). Such channels are called "additive channels," and as they are the simplest form of adversarial behavior, it makes sense that we allow them in any class of computationally bounded channels.
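The functional definition of (p, L)-list decoding above can be made concrete with a small brute-force sketch. This is our own toy illustration (all names are ours, and the exhaustive search is of course not how explicit decoders work):

```python
from itertools import product

def rel_hamming(x, y):
    """Relative Hamming distance delta(x, y) between two equal-length bit tuples."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def list_decode_brute_force(enc, k, y, p):
    """Return all messages m in {0,1}^k whose encoding lies within relative
    distance p of the received word y. The code is (p, L)-list decodable
    iff this list has size at most L for every received word y."""
    return [m for m in product((0, 1), repeat=k) if rel_hamming(enc(m), y) <= p]

# Toy example: the 3-bit repetition code of a single bit (k = 1, n = 3, rate 1/3).
enc = lambda m: m * 3  # repeat the 1-bit message three times
print(list_decode_brute_force(enc, 1, (0, 0, 1), 1/3))  # → [(0,)]
```

Here unique decoding (L = 1) succeeds because only one codeword is within relative distance 1/3 of the received word.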
Another advantage of using nonuniform classes of channels is that it is sufficient to consider deterministic channels in order to obtain security against randomized channels. This follows by an averaging argument: if there is a computationally bounded randomized channel that is able to prevent decoding on some message m, then we can fix its random coins and obtain a deterministic channel (which is hardwired with a good choice of random coins).

Stochastic codes.
Unfortunately, the notion of computationally bounded channels is not interesting in the standard setup of error correcting codes: it is easy to show that if a code Enc : {0,1}^k → {0,1}^n is list decodable against additive channels, then it is list decodable against unbounded channels. (Specifically, if a code is not combinatorially list decodable, then there exists a received word y that has too many codewords close to it. Let c be one of these codewords, let e = c ⊕ y, and consider the additive channel C_e(z) = z ⊕ e. This channel "breaks" the code, as C_e(c) = c ⊕ e = y, and y is a received word on which decoding cannot succeed.) Several setup assumptions were introduced in order to circumvent this problem. In this paper, we are interested in the setup of "stochastic codes" studied by Guruswami & Smith (2016). We remark that other setups have been considered, and we mention these in Section 1.4.

Let C be a class of channels that induce at most pn errors. A stochastic code against C (with rate R) consists of a pair of algorithms (Enc, Dec) such that:

• The encoding algorithm Enc(m, S) receives a message m ∈ {0,1}^{Rn} and a uniform string S (that is not known to the channel or decoding algorithm) and outputs an n-bit string that is the codeword.
• A channel C ∈ C that does not receive the string S corrupts the codeword, generating C(Enc(m, S)).
• The decoding algorithm gets the "corrupted codeword" C(Enc(m, S)), but does not receive the string S.
• For every message m, and for every channel C ∈ C, the decoding done by Dec(C(Enc(m, S))) needs to successfully recover the original message m with probability 1 − ν over the choice of S. (ν > 0 is an error parameter).
Here, "success" means to output m (in case of unique decoding) or output a list of size L that contains m (in case of list-decoding).
We typically parameterize C with two parameters: the complexity of the functions in the class and the number of errors that they induce. We say that the code is:
• L-list decodable with success probability 1 − ν against channels in C if the function Dec above is allowed to output a list of size at most L that contains m.
A code is explicit if its encoding and decoding functions are computable in time polynomial in their input and output lengths. The rate of the code is the ratio of the message length to the output length of Enc. Guruswami & Smith (2016) gave explicit constructions of stochastic codes with rate approaching 1 − H(p) (for 0 < p < 1/2) that are uniquely decodable against additive channels. They also showed that for p > 1/4 there are computationally weak channel families against which stochastic codes with rate approaching 1 − H(p) and unique decoding do not exist. (All the complexity classes considered in this paper can simulate these weak channels.)

A Monte Carlo construction of stochastic codes for poly-size circuits. Guruswami & Smith (2016) showed that for every constant c, there is a Monte Carlo explicit construction of list-decodable stochastic codes against channels of size n^c (by this we mean channels implementable by circuits of size n^c). Moreover, these codes achieve a rate that approaches 1 − H(p). By Monte Carlo, we mean that:
• The encoding and decoding algorithms receive an additional input y of length poly(n^c).
• With high probability over the choice of y, the encoding and decoding algorithms (that are hardwired with y) form the required stochastic code.
The string y is not kept secret from the channel, and this definition allows the channel to depend on y. However, we mention that the approach of Guruswami and Smith dictates that the length of y is larger than n^c (and in general larger than the logarithm of the number of allowed channels). This means that a channel cannot receive y as input, as the number of possible channels is much smaller than the number of choices for the string y.

Our results.
Guruswami and Smith stated the following open problem: give a fully explicit (that is, not Monte Carlo) construction of stochastic codes against poly-size circuits, under complexity theoretic assumptions.
Necessity of complexity theoretic assumptions. As we explain later, complexity theoretic assumptions are not necessary in order to give Monte Carlo constructions of stochastic codes. They are necessary for fully explicit constructions (which are not Monte Carlo) in the following sense: Given a stochastic code against circuits of size n^c, we can consider the "optimal channel" that, given a codeword z ∈ {0,1}^n, tries all possible error vectors e ∈ {0,1}^n of relative Hamming weight p and finds the first one on which decoding fails, if such a vector exists. This channel succeeds iff the code is not secure against unbounded channels. If the code is not secure against unbounded channels (but is secure against size n^c channels), then this attack cannot be carried out in size n^c. This means that there is a problem computable in E = DTIME(2^{O(n)}) that for every sufficiently large n cannot be solved by size n^c circuits. We remark that this type of assumption (namely, that there is a problem in E that requires large circuits) is exactly the type of assumption that implies and is implied by the existence of explicit pseudorandom generators in the Nisan-Wigderson setting (Impagliazzo & Wigderson 1997).
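The "optimal channel" in this argument is just an exhaustive search over error vectors. The following toy sketch (our own illustration, with a hypothetical stand-in decoder) makes explicit why it takes exponential, not polynomial, time:

```python
from itertools import combinations

def optimal_channel(z, dec, m, p):
    """Try all error vectors of relative Hamming weight <= p, in order of
    weight, and return the first corrupted word on which decoding fails to
    include m. The search is exponential in n, which is exactly why a size
    n^c circuit cannot run it when the code is only computationally secure."""
    n = len(z)
    for w in range(int(p * n) + 1):
        for positions in combinations(range(n), w):
            y = list(z)
            for i in positions:
                y[i] ^= 1                 # flip the chosen positions
            if m not in dec(tuple(y)):
                return tuple(y)           # attack succeeded: decoding fails on y
    return z  # no error pattern of weight <= pn defeats Dec on this codeword

# Stand-in: 5-bit repetition code with majority decoding (list of size 1).
dec = lambda y: [(0,)] if sum(y) <= 2 else [(1,)]
attack = optimal_channel((0, 0, 0, 0, 0), dec, (0,), p=0.2)
print(attack)  # the repetition code survives a single flipped bit
```

With p = 0.2 (at most one flip out of five), no error pattern fools majority decoding, so the search returns the unmodified codeword; raising p past the code's distance makes the attack succeed.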

Explicit stochastic codes for poly-size circuits.
Our first result resolves the open problem posed by Guruswami and Smith: we construct explicit stochastic codes against poly-size circuit channels, under an assumption that is only somewhat stronger than what is implied by the existence of such codes.
Theorem 1.2 (Explicit stochastic codes for poly-size channels). If there exist a constant β > 0 and a problem in E = DTIME(2^{O(n)}) such that for every sufficiently large n, solving the problem on inputs of length n requires circuits of size 2^{β·n}, then for all constants 0 < p < 1/2, ε > 0 and c > 1, there exists a constant L such that for infinitely many n, there are explicit stochastic codes with rate 1 − H(p) − ε that are L-list decodable for size n^c circuits that induce at most pn errors.
Theorem 1.2 is stated in more detailed form in Theorem 5.5. The assumption used in the theorem is a standard complexity assumption and was used by Impagliazzo & Wigderson (1997) to show that BPP = P.

Unconditional explicit stochastic codes for space O(log n) online channels. Guruswami and Smith also considered "space s online channels." These are channels C : {0,1}^n → {0,1}^n implemented by space s (or, equivalently, width 2^s) oblivious read-once branching programs (ROBPs). Below is a standard definition of ROBPs tailored for functions that output many bits.
Read Once Branching Programs. We will only be interested in space s ≥ log n. A space s ROBP C : {0,1}^n → {0,1}^n is defined using a layered graph with n + 1 layers, where the first layer has a single node v_0, and the remaining layers have 2^s nodes. Each node v in the first n layers has two outgoing edges (labeled with zero and one) connected to nodes in the next layer, and each node v is also labeled by an "output bit" b(v). On input x ∈ {0,1}^n, the computation of C is defined by following the unique path from v_0 to the last layer, obtained by taking the edge marked with x_i at step i. The output C(x) is the concatenation of the n output bits collected at the nodes along the path. It is standard that for s ≥ Ω(log n), ROBPs with space O(s) capture the nonuniform version of space O(s) computation that reads its n-bit input x in a fixed order. We remark that all the results in this paper also hold if we allow channels to have s bits of "lookahead," allowing them to also read the bits i + 1, . . . , i + s before outputting the i'th bit.

Guruswami & Smith (2016) contained an unconditional Monte Carlo construction of stochastic codes against space O(log n) online channels and a conditional Monte Carlo construction for size n^c circuits (relying on the existence of "Nisan-Wigderson style" pseudorandom generators for size n^c circuits). However, Monte Carlo constructions can easily obtain "Nisan-Wigderson style" pseudorandom generators, as a random function with polynomial size description is w.h.p. such a generator. Consequently, no hardness assumption is needed for Monte Carlo constructions against polynomial size circuits, which are secure also against O(log n) space online channels.

Theorem 1.3
(Explicit stochastic codes for space O(log n) online channels). For all constants 0 < p < 1/2, ε > 0 and c > 1, there exists a constant L such that for infinitely many n, there are explicit stochastic codes with rate 1 − H(p) − ε that are L-list decodable for space c log n online channels that induce at most pn errors. Theorem 1.3 is stated in more detailed form in Theorem 5.6.
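The ROBP channels from the definition above amount to a table walk through a layered graph. A minimal sketch (our own toy encoding of the graph, not the paper's notation):

```python
def run_robp(edges, out_bit, x):
    """Evaluate an oblivious read-once branching program channel on input x.

    edges[i][v][b] : node reached from node v of layer i when reading bit b
    out_bit[i][v]  : the output bit labeling node v of layer i
    The output is the concatenation of the output bits along the unique path."""
    v, out = 0, []
    for i, b in enumerate(x):
        v = edges[i][v][b]              # follow the edge labeled with x_i
        out.append(out_bit[i + 1][v])   # collect the output bit at the new node
    return tuple(out)

# A width-2 ROBP computing the identity channel C(x) = x:
# each layer remembers the last bit read and outputs it.
n = 3
edges = [{0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}} for _ in range(n)]
out_bit = [None] + [{0: 0, 1: 1} for _ in range(n)]
print(run_robp(edges, out_bit, (1, 0, 1)))  # → (1, 0, 1)
```

Width 2^s corresponds to space s; an additive channel C_e(z) = z ⊕ e is a width-2 ROBP of the same shape, which is why online logspace channels subsume additive ones.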
Efficiency of encoding/decoding versus channel complexity. The approach in Guruswami & Smith (2016) (which we also use) dictates that security can only be proven for channel families that are not sufficiently strong to run the decoding algorithm. Consequently, in the Monte Carlo construction and in our Theorem 1.2, the running time of encoding and decoding is a polynomial in n that is larger than the circuit size n^c. It is an intriguing open problem whether stochastic codes with rate approaching 1 − H(p) that can be encoded and decoded in fixed polynomial time (say n^3) against any polynomial size channel can be constructed (under cryptographic assumptions). We do not know whether this is possible.
One could hope, however, to obtain fixed polynomial time (that does not depend on the constant c) for encoding and decoding in our Theorem 1.3. Unfortunately, this is not the case: the encoding and decoding algorithms that we obtain in Theorem 1.3 run in time polynomial in n^c (and in particular larger than n^c) when working against space c log n channels. We do not know how to avoid this dependence.
1.2.3. Stochastic codes for AC^0 channels, with fixed poly-time encoding/decoding. We are able to obtain fixed polynomial time algorithms for encoding and decoding for a family of channels implemented by superpolynomial size, constant depth circuits. For technical reasons, we achieve this only for p ≤ p_0 for some p_0 > 0. The result is stated below.
Theorem 1.4 (Explicit stochastic codes for AC^0 channels). There are constants p_0 > 0 and a > 0 such that the following holds. For all constants 0 < p ≤ p_0, ε > 0 and d, there exists a constant L such that for infinitely many n, there are explicit stochastic codes with rate 1 − H(p) − ε that are L-list decodable for size 2^{n^{1/(ad)}} circuits of depth d that induce at most pn errors. (Here, encoding and decoding run in polynomial time that does not depend on d.)

The constant p_0 comes from a specific construction of AG codes, and it seems that p_0 can be pushed to any constant strictly smaller than 1/12. Theorem 1.4 is stated in more detailed form in Theorem 5.7.

Perspective.
Explicit codes against computationally bounded channels give the "best of both worlds": They can recover from errors induced by adversarial channels, while having information theoretic optimal rate approaching 1 − H(p).
As pointed out by Guruswami and Smith, essentially all randomized channels studied in the Shannon framework of error correcting codes are computationally simple (and it seems that all of them can be implemented by constant depth circuits or online logspace). This means that the computational perspective leads to a unified construction of explicit codes that are good for all "Shannon style" randomized channels simultaneously, while also being able to recover against many adversarial channels (and in particular against additive channels).
We believe that the distinction we make above (namely, whether encoding/decoding efficiency is allowed to increase with the complexity of the channel) is important, so that the added benefit of codes for computationally bounded channels does not come with a price tag of being less efficient. Specifically, our construction for AC^0 channels uses "regular" coding theoretic ingredients and does not have to "pay extra" for being able to handle channels that are superpolynomial size circuits of constant depth.
An intriguing open problem is whether unique decoding is possible for computationally bounded channels with rate approaching 1 − H(p). Guruswami & Smith (2016) showed that this is impossible for p > 1/4 (and their argument works for all classes of channels discussed in this paper). It is not known whether unique decoding is possible for p < 1/4 for the channel classes that we consider.
1.4. Some related work. The notion of computationally bounded channels was initially studied in cryptographic setups. We mention some of these works below.
Shared private randomness. We start with the notion of codes with "shared private randomness." While this setup was considered before the notion of stochastic codes, in this paper it is natural to view it as a version of stochastic codes in which the decoding algorithm does receive the string S.
This corresponds to a standard symmetric cryptography setup in which honest parties (the encoder and decoder) share a uniform private key S, and the bad party (the channel) does not get the key. Lipton (1994) and following work (see Smith (2007) for more details) gave explicit constructions of uniquely decodable codes against computationally bounded channels, with rate approaching 1 − H(p), under cryptographic assumptions.
Note that the setup of stochastic codes is lighter. The encoder and decoder do not need to share a private random key. Moreover, a fresh key can be chosen on the spot every time the encoder encodes a message.
We also point out that the Monte Carlo construction of Guruswami and Smith also requires less setup. While the encoder and decoder do need to share a random string, this string does not need to be private. It can be chosen once and revealed to the channel.
Private Codes. A related notion of "private codes" was studied by Langberg (2004). Here channels are unbounded, codes are existential (and not explicit), and the focus is on minimizing the length of the shared key. Langberg provides asymptotically matching upper and lower bounds of Θ(log n + log(1/ν)) on the amount of randomness that needs to be shared for unique decoding in this setup, where ν is the error parameter.
Public key setup. Micali et al. (2010) considered computationally bounded channels and a cryptographic public key setup. Their focus is to use this setup to convert a given (standard) explicit list-decodable code into an explicit uniquely decodable code (in this specific public key setup).
Subsequent work. Following this paper, Kopparty et al. (2019) gave an explicit construction of stochastic codes with rate approaching 1 − H(p) for space bounded channels with space s = n^{Ω(1)}, with encoding and decoding in quasilinear time. This improves upon the construction in this paper, which requires encoding and decoding time at least 2^s and therefore only achieves polynomial time for s = O(log n).
In a recent and yet unpublished result, Shaltiel & Silbak (2020) extended the results of Kopparty et al. (2019) to give explicit uniquely decodable codes with rate approaching 1 − H(p) for every p < 1/4. Both these works build heavily on the approach of Guruswami & Smith (2016), as well as on the techniques introduced in this paper.

Overview of the technique
In this section, we give a high level overview of the construction. Our construction heavily relies on previous work in the area (mainly on that of Guruswami & Smith (2016)). In this high-level overview, we attempt to highlight our technical contribution, while also giving a high level overview of the many ideas from previous work that are used in the construction. Therefore, we start with a high level description of earlier work and build up to the work of Guruswami and Smith. Along the way, in Section 2.2 we explain the modifications that allow us to handle weak classes of channels. Finally, in Section 2.4, we present a self-contained problem (that of constructing inner stochastic codes). Constructing such explicit codes is the main source of our improvement over Guruswami and Smith, and we give a high level overview of our approach. The reader can skip this high level overview and go directly to the technical section.

Codes for the setup of shared private randomness.
We start by explaining how to construct codes with rate approaching 1 − H(p) in the case that the setup allows shared private randomness. Recall that this can be thought of as a stochastic code in which the decoding algorithm receives the random string chosen by the encoder. We present the ideas that are used to construct codes against bounded channels in this setup in two steps: we first explain how to handle additive channels, and then explain how this method can be extended to handle bounded channels that are not additive. The ideas from both these reductions are key components in the construction of Guruswami and Smith.

Reducing additive channels to binary symmetric channels.
We start by constructing codes with shared private randomness against additive channels. The encoder and decoder will share a description S_π of a uniformly selected permutation π : [n] → [n]. The encoding is defined by Enc(m, S_π) = π(Enc_BSC(m)), meaning that Enc encodes m by a code for binary symmetric channels and then uses the permutation π to rearrange the n indices of the encoding, placing the i'th bit in the π(i)'th position. Note that for any additive channel C_e(z) = z ⊕ e that induces pn errors, the effect of the channel on Enc(m, S_π) can essentially be viewed as applying a binary symmetric channel to Enc_BSC(m), meaning that the decoder is able to uniquely decode against additive channels, with a code that has rate approaching R = 1 − H(p) (which can be achieved explicitly for binary symmetric channels). Smith (2007) showed that an (almost) t-wise independent permutation can be coupled with specific constructions of codes for binary symmetric channels and used instead of a truly random permutation. This reduces the length of the shared key and allows keys shorter than n.
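The scrambling step above can be sketched as follows. This is our own toy illustration: `permute`/`unpermute` are hypothetical helpers, and the hardcoded bit list stands in for Enc_BSC(m).

```python
import random

def permute(bits, pi):
    """Place bit i of `bits` at position pi[i] (the encoder's scrambling step)."""
    out = [None] * len(bits)
    for i, b in enumerate(bits):
        out[pi[i]] = b
    return out

def unpermute(bits, pi):
    """Invert the scrambling (the decoder knows pi from the shared key S_pi)."""
    return [bits[pi[i]] for i in range(len(bits))]

# Shared private randomness: a uniformly chosen permutation pi of [n].
n = 8
rng = random.Random(0)
pi = list(range(n))
rng.shuffle(pi)

codeword = [1, 0, 1, 1, 0, 0, 1, 0]   # stands in for Enc_BSC(m)
sent = permute(codeword, pi)

# An additive channel flips a fixed burst of positions; after unpermuting,
# those flips land on positions of the BSC codeword determined by pi,
# i.e., they look like random locations from the channel's point of view.
e = [1, 1, 1, 0, 0, 0, 0, 0]          # fixed (worst-case) error vector, weight 3
received = [s ^ b for s, b in zip(sent, e)]
recovered = unpermute(received, pi)
flipped = [i for i in range(n) if recovered[i] != codeword[i]]
print(flipped)  # the three burst errors, scattered by pi^{-1}
```

The key point is that the channel commits to e without seeing π, so the error positions on Enc_BSC(m) are distributed as a random weight-3 set, which a code for binary symmetric channels can handle.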
Reducing computationally bounded channels to additive channels. It is possible to use cryptography (or more generally, pseudorandomness) to handle computationally bounded channels: Assume that in addition to the seed S_π, the encoder and decoder also share a seed S_PRG for a pseudorandom generator PRG that fools computationally bounded channels and outputs n bits, and define: Enc′(m, (S_π, S_PRG)) = Enc(m, S_π) ⊕ PRG(S_PRG) = π(Enc_BSC(m)) ⊕ PRG(S_PRG).
This means that the rate of Enc′ is inherited from Enc and can approach 1 − H(p). A useful property is that for every fixed s_π, the random variable Enc′(m, (s_π, S_PRG)) is pseudorandom for the channel. This can be used to show that a computationally bounded channel cannot prevent correct decoding. We now explain this argument more precisely (as we will need to extend it in order to handle weak channel classes). We start by specifying the decoding algorithm Dec′(y, (s_π, s_PRG)), which simply computes y′ = y ⊕ PRG(s_PRG) and applies the previous decoding algorithm Dec to y′ and s_π. We now show that for every computationally bounded channel C that induces at most pn errors, the decoding succeeds with probability at least 1 − (ν + ε_PRG), where ε_PRG is the error of the generator PRG.
We consider the function A(m, s_π, e) that checks whether Dec_BSC(Enc(m, s_π) ⊕ e) successfully recovers m. In the previous section, we have seen that for every message m and error vector e of relative Hamming weight at most p, Pr[A(m, S_π, e) = 1] ≥ 1 − ν. Consequently, for every channel C that induces pn errors, Pr[A(m, S_π, E_C(U_n)) = 1] ≥ 1 − ν (this follows as U_n is independent of S_π, and recall that E_C(z) = z ⊕ C(z)). If decoding does not work, then there exists a message m such that: Pr[A(m, S_π, E_C(Enc′(m, (S_π, S_PRG)))) = 1] < 1 − (ν + ε_PRG).
By averaging over S_π, there exists a fixed value s_π such that: Pr[A(m, s_π, E_C(Enc′(m, (s_π, S_PRG)))) = 1] < 1 − (ν + ε_PRG), meaning that D(z) = A(m, s_π, E_C(z)) distinguishes Enc′(m, (s_π, S_PRG)) from U_n with advantage greater than ε_PRG, which is a contradiction if PRG is ε_PRG-pseudorandom against D (which is essentially the composition of the channel and Dec_BSC). As Dec_BSC runs in polynomial time, it follows that a PRG against poly-size circuits suffices to handle poly-size channels.
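The masking construction itself is a one-time-pad-style XOR and can be sketched as follows. All stand-ins below are our own placeholders: `prg` is a seeded PRNG stream (not an actual pseudorandom generator for circuits), and `enc`/`dec` stand in for the permuted BSC code and its decoder.

```python
import random

def enc_prime(m, s_pi, s_prg, enc, prg):
    """Enc'(m, (S_pi, S_PRG)) = Enc(m, S_pi) XOR PRG(S_PRG)."""
    c = enc(m, s_pi)
    return [a ^ b for a, b in zip(c, prg(s_prg))]

def dec_prime(y, s_pi, s_prg, dec, prg):
    """Dec' strips the pseudorandom mask, then runs the inner decoder."""
    y2 = [a ^ b for a, b in zip(y, prg(s_prg))]
    return dec(y2, s_pi)

def prg(seed, n=4):
    """Placeholder 'generator': a deterministic seeded bit stream."""
    r = random.Random(seed)
    return [r.randrange(2) for _ in range(n)]

enc = lambda m, s_pi: m   # placeholder for pi(Enc_BSC(m))
dec = lambda y, s_pi: y   # placeholder for the matching decoder

m = [1, 0, 1, 1]
y = enc_prime(m, None, 7, enc, prg)
assert dec_prime(y, None, 7, dec, prg) == m  # mask is transparent to honest parties
```

The honest parties, who share s_prg, can always remove the mask; the point of the argument above is that a bounded channel, which cannot distinguish the masked codeword from uniform bits, cannot aim its errors any better than an additive channel can.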

2.2. A more efficient reduction for online logspace and AC^0. At this point, we make a detour from the description of the earlier work in order to explain how we modify this argument when we need to handle weak channel classes like constant depth circuits or online logspace channels.
A drawback of the approach described above is that while the decoding algorithm Dec_BSC runs in polynomial time, existing constructions of such decoders rely on decoding an "outer code" (typically Reed-Solomon), which cannot be done by small constant depth circuits or small space ROBPs. In this paper, we are interested in channels that run in online logspace or AC^0. We would like to use PRGs that fool these weaker classes (for which explicit constructions are unconditional) rather than PRGs for poly-size circuits (which are inherently conditional, as they imply circuit lower bounds).
For this purpose, we replace the code (Enc_BSC, Dec_BSC) (for binary symmetric channels) by a code (Enc_balanced, Dec_balanced) that is list decodable from balanced errors. We now define this notion. A string e ∈ {0,1}^n is (b, p, γ)-balanced if, when viewed as e ∈ ({0,1}^b)^{n/b}, at most a γ fraction of the blocks of e (of size b) have relative Hamming weight larger than p. It is not hard to construct explicit codes that are list decodable (with constant size lists) against error vectors that are (b, p, γ)-balanced and have rate approaching 1 − H(p) for small constant γ. We give such a construction in Section 3.2.1.
If we take an error vector of relative Hamming weight p and permute it using a random (or t-wise independent) permutation, then with high probability it will indeed be (b, p + α, γ)-balanced for sufficiently large b and small constants α, γ > 0. This means that codes against balanced errors in particular work against binary symmetric channels.
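Checking whether an error vector is (b, p, γ)-balanced needs only counting, which is the point of the switch. A small sketch of the check (our own code, assuming for simplicity that n is a multiple of b):

```python
def is_balanced(e, b, p, gamma):
    """Check whether the error vector e (a list of bits) is (b, p, gamma)-balanced:
    at most a gamma fraction of its length-b blocks have relative Hamming
    weight larger than p. This requires only (approximate) counting, which
    is what makes it feasible for weak models such as small ROBPs or AC^0."""
    blocks = [e[i:i + b] for i in range(0, len(e), b)]
    heavy = sum(1 for blk in blocks if sum(blk) / b > p)
    return heavy / len(blocks) <= gamma

spread = [1, 0, 0, 0] * 4         # one error in every block of size 4
burst = [1, 1, 1, 0] + [0] * 12   # the same total weight, concentrated in one block
print(is_balanced(spread, b=4, p=0.25, gamma=0.1),
      is_balanced(burst, b=4, p=0.25, gamma=0.1))  # → True False
```

The two examples have the same Hamming weight; only the spread-out vector is balanced, illustrating why a random permutation (which spreads any fixed error vector) reduces worst-case errors to balanced ones.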
The advantage of this notion is that the function A of the previous section can be made more efficient. Rather than having to decode Enc_BSC, it is sufficient to check whether the error vector e is (b, p + α, γ)-balanced, which can be performed by models that can count (or even only approximately count), such as small ROBPs or AC^0. This leads to more efficient reductions that enable us to use PRGs for weaker classes.

2.3. Stochastic codes for bounded channels. We now survey the approach of Guruswami & Smith (2016) to take codes for shared private randomness (as presented in Section 2.1) and convert them into stochastic codes.
Let (Enc′, Dec′) be the code for shared private randomness (presented in the previous section). We will reserve N for the block length of the stochastic code that we want to construct, and use N_data for the block length of Enc′. The rate of Enc′ can approach 1 − H(p), and so it suffices that the rate of the code (Enc, Dec) that we construct approaches that of Enc′.
We will set N = N_data + N_ctrl where N_ctrl = ε·N (for a small constant ε) so that the rate indeed approaches 1 − H(p). Loosely speaking, when given a message m and "control information" S (which will include (S_π, S_PRG) as well as additional randomness), we will set c_data = Enc′(m, (S_π, S_PRG)) ∈ {0,1}^{N_data}, and c_ctrl ∈ {0,1}^{N_ctrl} will be an encoding of S (that we specify later). We will then merge these two strings into a string c = (c_data, c_ctrl) of length N.
The high-level intuition is that the encoder encodes the control information S and embeds it in the encoding of m, hoping that the decoder can find it, decode it to recover S, and then use the decoding algorithm Dec′ (which requires S) to decode the data part. However, there are two seemingly contradictory requirements: On the one hand, the decoder needs to find the "control information" in order to recover S. On the other, if it is easy to identify which part of the encoding encodes the "control information," then the channel can focus its errors on it, wiping it out completely.
Stochastic codes for additive channels. The first step taken by Guruswami and Smith is to ensure that an additive channel cannot wipe out the control information. For this purpose, they divide the N output bits into n = N/b blocks of length b (where b is a parameter to be chosen later). The encoder will use additional randomness S_samp to choose a random set I = {i_1, ..., i_{n_ctrl}} of distinct indices in [n]. The string S_samp will be part of the "control information" S (making S = (S_samp, S_π, S_PRG)) and, in order to make its length less than n, the sampling is done by a randomness-efficient averaging sampler (see Section 3.3 for details). We will pretend that the set I is completely random in this high-level presentation.
The set I will define which blocks are "control blocks," and the final embedding of c_data, c_ctrl into an N-bit string is done by placing c_ctrl in the control blocks and c_data in the remaining blocks (which are suitably called data blocks). The sampling of I guarantees that for every fixed error vector e of relative Hamming weight at most p, at least an ε/2 fraction of the control blocks are not hit with significantly more than pb errors. This will suffice for the decoding algorithm.
The decoder (which does not know I) will go over all n blocks, treating each one of them as a potential control block. Even if no errors are inflicted, only ε·n of the n blocks are indeed control blocks. We want the decoder to be able to "list-decode" and output a small list of candidates s for the "control information." This can be done as follows: When preparing c_ctrl, the control information S will be encoded by a concatenated code, where the outer code is list decodable (or more generally list recoverable) and has block length ε·n, and the inner code has symbols of b bits and is decodable from slightly more than pb errors. This way, if at least an ε/2 fraction of the control blocks suffer not much more than pb errors (and are therefore decoded correctly by the inner code), then the list-decoding algorithm of the outer code produces a list of candidates that includes the correct control information s. Decoding can now proceed: for each such candidate s, it can apply Dec′ on the data part (defined by s_samp) using the control information (s_π, s_PRG). This indeed suffices for list-decoding against additive channels.
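The decoding loop just described can be sketched as follows; `inner_decode` and `outer_list_recover` are hypothetical placeholders for the inner decoder and the outer list-recovery algorithm, which the construction instantiates with concrete codes.

```python
# High-level sketch (hypothetical helper names) of control-information
# recovery: every length-b block of the received word is treated as a
# potential control block and decoded by the inner code; the resulting
# symbols are fed to the outer code's list-recovery algorithm.

def recover_control_candidates(received, b, inner_decode, outer_list_recover):
    n = len(received) // b
    symbols = []
    for i in range(n):
        block = received[i * b:(i + 1) * b]
        # inner_decode may return a wrong symbol (or None) on data blocks
        # and on control blocks with too many errors; that is tolerated.
        symbols.append(inner_decode(block))
    # If enough control blocks were decoded correctly, the correct control
    # information survives in the list produced by the outer code.
    return outer_list_recover(symbols)
```

In the toy usage below, the "inner decoder" is just the parity of a block and the "outer list recovery" is the identity, which only illustrates the data flow, not the actual codes.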
Extending the approach to computationally bounded channels. There is an obvious concern if we use this strategy against channels that are not additive: The channel C may inspect the n blocks and try to identify which of them are control blocks. It is crucial that the channel is unable to distinguish a control block from a data block. This means that we want the inner code that produces the b-bit control blocks to have three properties:

• It should be able to decode from roughly pb errors.

• The channel should not be able to distinguish control blocks from data blocks.

• Control blocks shouldn't reveal information about S to the channel.
Here, it is useful that the data part is XORed with PRG(S_PRG) and is therefore pseudorandom. This means that we can obtain these three properties if we use a stochastic code (instead of a standard code) and require that the output is pseudorandom. Note that here the notion of stochastic codes is not used to "improve decoding properties" (we could make do with standard codes). Instead, it is used to perform encoding in a way that does not reveal information about the message. This notion of stochastic codes is defined in the next section.

Pseudorandom stochastic inner codes. Guruswami and Smith considered the following version of stochastic codes. Let Enc : {0,1}^k × {0,1}^d → {0,1}^n be a function.

1. We say that Enc is ε-pseudorandom for a class of functions C if for every m ∈ {0,1}^k and for every C ∈ C, the distribution Enc(m, U_d) is ε-pseudorandom for C, meaning that:

|Pr[C(Enc(m, U_d)) = 1] − Pr[C(U_n) = 1]| ≤ ε.

2. We say that Enc is L-list decodable with radius p if there exists a function Dec such that for every y ∈ {0,1}^n there are at most L pairs (m, r) with δ(Enc(m, r), y) ≤ p, and Dec(y) outputs the list of all such pairs.

Such codes can be plugged in as "inner control codes" in the scheme described in the previous section, and the two properties above suffice for the correctness of the construction (if pseudorandomness is guaranteed against a class sufficiently stronger than the channel, as explained in Section 2.2). Consequently, the task of explicitly constructing stochastic codes against bounded channels reduces to explicitly constructing such stochastic codes with constant size lists. Here, we benefit from the fact that these codes are used as inner codes. The block length b of the inner stochastic code can be much smaller than the block length N of the final code. Note, however, that pseudorandomness needs to hold with respect to channels (and even more complex functions) that have complexity measured as a function of N (which in turn gives a lower bound on b).
We first concentrate on the case where channels are circuits of size N^c (which is the case considered by Guruswami and Smith). This allows us to set k, d, b = O(log N), which in turn means that we need pseudorandomness against circuits of size N^c = 2^{Ω(b)}, and we are allowed to perform encoding and list-decoding in time 2^{O(b)}.
However, even with these choices, it seems hard to construct such stochastic codes (no matter what complexity assumption we use). Guruswami & Smith (2016) were not able to give such explicit constructions. Instead, they settle for a Monte Carlo construction using the probabilistic method: They describe a probability space over functions Enc : {0,1}^k × {0,1}^d → {0,1}^b in which a code with the two properties above is chosen with high probability. The description of Enc in this probability space is of length polynomial in N^c, and so this indeed gives a Monte Carlo construction. 7

New constructions of pseudorandom weak inner codes.
We observe that we can relax the second property in the definition of stochastic inner codes and still be able to use them in the framework described in the earlier sections. Specifically, let Enc : {0,1}^k × {0,1}^d → {0,1}^n be a function; we use the following modification of condition (2) above:

2'. We say that Enc is L-weakly list decodable with radius p if there exists a function Dec such that for every y ∈ {0,1}^n, Dec(y) outputs the list of all messages m for which there exists a seed r with δ(Enc(m, r), y) ≤ p, and this list has size at most L.

The key difference between "weakly list decodable" and the notion used by Guruswami and Smith (which we will call "strongly list decodable") is that this definition allows a message m to be encoded to the same value under many different seeds r, whereas the previous definition did not. 8 It turns out that constructing codes with properties 1 and 2' is significantly simpler than constructing codes with the original properties. More specifically, let a, a′, q be constants so that q is sufficiently larger than a, a′ (the exact dependence is q ≥ (a + a′)/(1 − H(p) − 1/(L + 1))). For the case of inputs and outputs of length O(log N), we give a general transformation that for every 0 < p < 1/2 takes a PRG G that is pseudorandom for a class C closed under xored restrictions, and converts it into a stochastic code Enc that is pseudorandom for C and is L-weakly list decodable with radius p. Furthermore, encoding and list-decoding can be done in time that is poly(N^q) times the running time of G. (Note that the obvious approach for checking whether a candidate Enc is pseudorandom against circuits of size N^c requires going over all such circuits, which is not feasible in polynomial time.)
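The distinction between the two notions can be made concrete with a brute-force counter on toy parameters; this sketch (not part of the construction) counts either distinct messages (the weak notion) or distinct codeword occurrences (the strong notion) inside a Hamming ball.

```python
# Brute-force list-size computation for a stochastic code Enc on toy inputs.
# weak=True counts distinct messages m with some seed r landing in the ball
# of relative radius p around y; weak=False counts all (m, r) pairs.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def list_size(Enc, msgs, seeds, y, p, weak=True):
    n = len(y)
    hits = [(m, r) for m in msgs for r in seeds
            if hamming(Enc(m, r), y) <= p * n]
    if weak:
        return len({m for (m, _) in hits})  # distinct messages only
    return len(hits)                        # all (message, seed) pairs
```

With the toy code Enc(m, r) = m‖r (concatenation of one-bit strings) and radius 1/2, the ball around "00" contains three codewords but only two distinct messages, so the weak list is strictly smaller than the strong one.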
This transformation works by setting Enc(m, r) = E(m) ⊕ G(r), where E : {0,1}^{a·log n} → {0,1}^{q·log n} is a random code. The argument is similar to the proof that random codes are list decodable, and explicitness is achieved by derandomizing the probabilistic argument using (L+1)-wise independence and using brute-force decoding. (Here it is crucial that we are allowed to encode and decode in time exponential in the input and output length.) 9 We can use this transformation to obtain stochastic codes that are weakly list decodable from radius 0 < p < 1/2 and are:

• Pseudorandom against size N^c circuits, using the pseudorandom generators of Impagliazzo & Wigderson (1997), which rely on the assumption that there exist a constant β > 0 and a problem in E = DTIME(2^{O(n)}) that cannot be solved by circuits of size 2^{β·n} for every sufficiently large n. This gives Theorem 1.2.
• Pseudorandom against space O(log n) ROBPs, using the pseudorandom generators of Nisan & Zuckerman (1996). This (together with the improvements explained in Section 2.2 and some additional effort that goes into making the reduction implementable by small space ROBPs, explained in Section 6.2) gives Theorem 1.3.
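The transformation underlying both items sets Enc(m, r) = E(m) ⊕ G(r), as described above. A minimal sketch with toy, hypothetical E and G (bit strings represented as tuples of 0/1):

```python
# Sketch of the transformation Enc(m, r) = E(m) XOR G(r): a message is
# mapped through a code E and masked with a PRG output G(r). E and G here
# are arbitrary placeholders, not the actual instantiations from the paper.

def xor_bits(u, v):
    assert len(u) == len(v)
    return tuple(a ^ b for a, b in zip(u, v))

def make_stochastic_code(E, G):
    """E: message -> codeword, G: seed -> mask; equal-length bit tuples."""
    def Enc(m, r):
        return xor_bits(E(m), G(r))
    return Enc
```

Pseudorandomness is inherited from G: for any fixed m, Enc(m, U) is G(U) shifted by the fixed string E(m), which a class closed under xored restrictions cannot distinguish from uniform.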

Inner stochastic codes for AC 0 .
Our goal is to construct a stochastic code Enc : {0,1}^k × {0,1}^d → {0,1}^n that is weakly list decodable from radius p > 0 and ε-pseudorandom against large circuits of constant depth d. We want these codes to have fixed poly(n)-time encoding and decoding. This is because in the final construction we will choose the block length n to be N^{0.1} (where N is the block length of the final codeword). This choice will enable fooling circuits of superpolynomial size. We will use an explicit binary linear code Enc_AG : {0,1}^{d+k} → {0,1}^n with constant rate R that is decodable from pn errors. There are constructions of explicit codes with rate R > 0 and p > 0 that have the additional property that the relative distance of the dual code is at least p. Such constructions can be obtained by using the algebraic geometric codes of Garcia & Stichtenoth (1996) (which are over constant-size alphabets that can be chosen to be a power of two) and viewing them as binary codes (which preserves rate, and decreases relative distance and relative dual distance by a constant factor). A description of these codes appears in a paper by Shpilka (2009) (in an appendix attributed to Guruswami), and we elaborate on this result in Section 4.3.
Let G be the (d+k) × n generator matrix of such a code, let G^(t) denote the d × n matrix obtained from the first d rows of G, and let G^(b) denote the bottom k × n rows of G. For simplicity, let us set k = d, so that both are linear in n. In the construction of Garcia and Stichtenoth, it can be arranged that G^(t) is a generator matrix for a code with similar properties, and in particular the code generated by G^(t) has relative dual distance p > 0 (we may need to slightly decrease p for this to hold). We define:

Enc(x, r) = x·G^(b) ⊕ r·G^(t).

We note that the dual of the code defined by G^(t) has relative distance p. This means that (the transpose of) G^(t) is the parity check matrix of a code with relative distance p, which in turn implies that every pn columns of G^(t) are linearly independent. This gives that the distribution r·G^(t) for r ← U_d is pn-wise independent, and implies that for every x ∈ {0,1}^k, Enc(x, U_d) is pn-wise independent. By Braverman's theorem (Braverman 2010) (see also later improvements by Tal (2014)), "polylog-wise independence fools AC^0," and in particular, pn-wise independent distributions are pseudorandom for circuits of size 2^{n^{Ω(1/d)}} and depth d.
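A toy sketch of this encoding (with placeholder matrices, not the actual AG-code generator) over GF(2):

```python
# Sketch of the inner AC^0 code: Enc(x, r) = x*G_b XOR r*G_t over GF(2),
# where G_t and G_b are the top and bottom halves of the generator matrix.
# The matrices below are toy placeholders.

def gf2_matvec(v, M):
    """Row vector v times matrix M over GF(2); M is given as a list of rows."""
    n = len(M[0])
    out = [0] * n
    for vi, row in zip(v, M):
        if vi:  # adding row vectors over GF(2) is coordinatewise XOR
            out = [o ^ b for o, b in zip(out, row)]
    return out

def enc_ac0(x, r, G_top, G_bottom):
    return [a ^ b for a, b in zip(gf2_matvec(x, G_bottom),
                                  gf2_matvec(r, G_top))]
```

Since x·G^(b) is a fixed string once x is fixed, the pseudorandomness of Enc(x, U_d) reduces to the pn-wise independence of r·G^(t).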
The code Enc_AG is uniquely decodable from pn errors. This immediately gives that Enc is (strongly) list decodable with radius p.

Organization of the paper
In Section 3, we give definitions of objects used in our constructions, and the constructions from earlier work that we rely on. In Section 4, we give precise definitions for several variants of stochastic codes, and give constructions of inner stochastic codes that will be used in the main result. In Section 5, we present the construction of stochastic codes, and restate the theorems from the introduction in a more precise way. In Section 6, we prove the correctness of the construction (and explain how to handle weak classes of channels). In Appendix A, we show a construction and proof for codes against balanced errors.

Ingredients used in the construction
In this section, we give formal definitions of the notions and ingredients used in the construction. We also cite previous results from coding theory and pseudorandomness that are used in the construction.

Pseudorandom generators.
Definition 3.1 (Pseudorandom generators). A distribution X on n bits is ε-pseudorandom for a class C of functions from n bits to one bit if for every C ∈ C,

|Pr[C(X) = 1] − Pr[C(U_n) = 1]| ≤ ε.

In the sections below, we list the constructions of pseudorandom generators that we use in this paper. We consider several choices of classes C.

Definition 3.2 (E is hard for exponential size circuits). We say that E is hard for exponential size circuits if there exist β > 0 and a language L ∈ E = DTIME(2^{O(n)}) such that for every sufficiently large n, circuits of size 2^{β·n} fail to compute the characteristic function of L on inputs of length n.
Theorem 3.3. (Impagliazzo & Wigderson 1997) If E is hard for exponential size circuits then for every constant c > 1, there exists a constant b > 1 such that for every sufficiently large n, there is a G : {0,1}^{b·log n} → {0,1}^n that is a (1/n^c)-PRG for circuits of size n^c. Furthermore, G is computable in time poly(n^c) (where this polynomial depends on the constant β hidden in the assumption).
We also need PRGs with error that is exponentially small in the seed length. In this setup, we only require arbitrary linear stretch.
Theorem 3.5. (Nisan & Zuckerman 1996) For every b > 1, there exists a constant a > 1 such that for every sufficiently large n, there is a G :

Constant depth circuits.
Theorem 3.6. (Nisan 1991; Tal 2014; Trevisan & Xue 2013) There exists a constant a > 1 such that for every constant d, and for every sufficiently large n, there is a G : {0,1}^{(log(s/ε))^{a·d}} → {0,1}^n that is an ε-PRG for circuits of size s ≥ n and depth d. Furthermore, G is computable in time poly(n).
We will also use Braverman's result that polylog-wise independence fools AC 0 .
Theorem 3.7. (Braverman 2010; Tal 2014) There exists a constant a > 1 such that for every sufficiently large n, every (log(s/ε))^{a·d}-wise independent distribution on n bits is ε-pseudorandom for circuits of size s and depth d.

Error-Correcting Codes.
We give a nonstandard definition of error-correcting codes below. For our purposes it is more natural to define codes in terms of a pair (Enc, Dec) of encoding and decoding algorithms. Different variants are obtained by considering different tasks (decoding, list-decoding, list-recovering) for the decoding algorithms and different types of error vectors. 11

Definition 3.8 (Codes). Let k, n, q be parameters and let Enc : {0,1}^k → ({0,1}^{log q})^n be a function. We say that Enc is an encoding function for a code that is: A code is explicit if its encoding and decoding functions are computable in time polynomial in their input and output. The rate of the code is the ratio of the message length and output length of Enc, where both lengths are measured in bits.

Codes for balanced errors.
We will make use of codes for balanced error vectors (as explained in Section 2). It is not hard to construct codes for balanced errors with rate approaching 1 − H(p), using code concatenation. The proof of Theorem 3.10 appears in Appendix A.
Theorem 3.10 (codes against balanced errors). There is a constant c > 0 so that the following holds. For all constants 0 < p < 1/2, ε > 0 and γ ≤ (ε/c)^2, there are constants b and L such that for infinitely many n, there is a code (Enc, Dec) with rate 1 − H(p) − ε that is L-list decodable against (b, p, γ)-balanced error strings of length n. Moreover, the code is explicit (encoding and list decoding can be performed in time poly(n), where the polynomial depends on ε).

List-recoverable codes.
We will make use of the following list-recoverable code.

Theorem 3.11. (List-recoverable codes, Sudan (1997), Guruswami & Sudan (1999)) There is a constant β > 0 such that for all constants α > 0 and L > 1, and every sufficiently large block length n, there is a code (Enc, Dec) that is (α, β·α·L·n, L)-list recoverable, has rate R ≥ β·α/L and alphabet size q = n^2. 13

This follows as Sudan (1997) (see also Guruswami & Sudan (1999)) showed that Reed–Solomon codes are list recoverable from a collection. Given a code Enc that is list recoverable from a collection, Enc′(x)_i = (Enc(x)_i, i) gives a code that is list recoverable, while increasing the alphabet size. This is why we have alphabet size q = n^2 (and not q = n) for a Reed–Solomon code. This idea is also implicitly used by Guruswami & Smith (2016).

Averaging Samplers.
The reader is referred to Goldreich (1997) for a survey on averaging samplers.

Definition 3.12 (Averaging Samplers). A function Samp
A sampler has distinct samples if for every x ∈ {0, 1} n , the elements in Samp(x) are distinct.
The next theorem follows from the "expander sampler." This particular form can be found (for example) in Vadhan (2004).

Almost t-wise permutations.
We also need the following notion of almost t-wise permutations.
Theorem 3.15. (Kaplan et al. 2009) For every t and every sufficiently large n, there exists an (ε, t)-wise independent permutation π with seed length d = O(t·log n + log(1/ε)). Furthermore, π is computable in polynomial time.

Inner stochastic codes
As explained in Section 2.4 and Section 2.5, the construction will rely on an "inner stochastic code." We now give a formal definition of the properties required from these codes. This definition formalizes the looser description given in Section 2. If we do not mention whether the code is weakly or strongly list decodable, then we mean "weakly." In the remainder of this section, we give explicit constructions of "inner stochastic codes" for various channel classes that we consider. We start with a general transformation that transforms a PRG into an inner stochastic code.

PRGs give inner stochastic codes.
We give a general transformation that given a PRG with: • A seed length that is logarithmic in the complexity of the channel.
• Sufficiently large linear stretch as a function of p.
Produces a stochastic code that: • Inherits the logarithmic seed length and pseudorandomness properties of the PRG.
• Is able to encode a string of length logarithmic in the complexity of the channel.
• Is L-weakly list decodable from radius p where L is a constant.
• Has encoding and decoding running in time polynomial in the complexity of the channel and the running time of the PRG.
This transformation is formally stated in the next theorem. We need the following definition that formally defines the action (which we call "xored-restriction") of restricting functions to a subset of the input and negating some of the remaining input bits. The complexity classes that we consider in this paper (AC 0 , P/poly, logspace ROBPs) are all closed under xored restriction. (This is also the case for any natural nonuniform complexity class).
Definition 4.2 (xored restriction). We say that a function C′ over n′ bits is an xored-restriction of a function C over n bits if there exist strings y ∈ {0,1}^{n′}, a ∈ {0,1}^{n−n′} and a set S ⊆ [n] of size n′ such that for every input x′, C′(x′) = C(x), where x is the n-bit string obtained by "filling" the indices in S with x′ ⊕ y, and the indices outside of S with a.
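A direct transcription of this definition as a sketch (zero-based coordinates for concreteness):

```python
# Sketch of Definition 4.2: given C over n bits, build the xored-restriction
# C' determined by a set S of n' coordinates, a mask y in {0,1}^{n'} and a
# fixed string a in {0,1}^{n-n'}. Bits are lists of 0/1.

def xored_restriction(C, n, S, y, a):
    S = sorted(S)
    rest = [i for i in range(n) if i not in set(S)]
    def C_prime(x_prime):
        x = [0] * n
        for (i, xi, yi) in zip(S, x_prime, y):
            x[i] = xi ^ yi          # indices in S get x' XOR y
        for (i, ai) in zip(rest, a):
            x[i] = ai               # remaining indices get the fixed string a
        return C(x)
    return C_prime
```

For example, restricting the 3-bit parity function to coordinates {0, 2} with y = (1, 0) and a = (1,) yields a 2-bit function; this closure property is what lets a distinguisher for Enc_SC be turned into a distinguisher for G.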
We now state our transformation. We note that the rate of the inner stochastic code is of secondary importance in our application, as we will use it to encode "control information" that is shorter than the data. • L-weakly list decodable from radius p.
• If for every function C in C and every xored-restriction C′ of C, the function C′ is in C′, then Enc_SC is ε-pseudorandom for C.
• The algorithms Enc_SC, Dec_SC are computable in time poly(n^{q·L}) given oracle access to G. (In particular, the code can be encoded/decoded in time poly(n) if G runs in time poly(n).)
Proof. The code will be a combination of two functions, E : {0,1}^{a·log n} → {0,1}^{q·log n} and G : {0,1}^{b·log n} → {0,1}^{q·log n}, and we will have Enc_SC(x, r) = E(x) ⊕ G(r). We will use a probabilistic construction (similar to that used to show the existence of capacity-achieving, binary list-decodable codes) which we later derandomize using (L+1)-wise independence.

Claim 4.4. Let E : {0,1}^{a·log n} → {0,1}^{q·log n} be chosen at random, so that the random variables (E(x))_{x∈{0,1}^{a·log n}} are (L+1)-wise independent. Then, with positive probability, Enc_SC(x, r) = E(x) ⊕ G(r) is L-weakly list decodable from radius p.
Proof. (of claim) Given y ∈ {0,1}^{q·log n}, we use B(y, p) to denote the ball of radius p·q·log n centered at y. Given a tuple of distinct x_1, ..., x_{L+1} ∈ {0,1}^{a·log n} and y ∈ {0,1}^{q·log n}, let B_{x_1,...,x_{L+1},y} be the indicator of the "bad event" that the L+1 points x_1, ..., x_{L+1} all have seeds of G that make them land in the ball of y, namely B_{x_1,...,x_{L+1},y} = 1 iff for every i ∈ [L+1] there exists r ∈ {0,1}^{b·log n} such that E(x_i) ⊕ G(r) ∈ B(y, p). The random variables E(x_1), ..., E(x_{L+1}) are independent, and therefore,

Pr[B_{x_1,...,x_{L+1},y} = 1] ≤ (2^{b·log n} · |B(y, p)|/2^{q·log n})^{L+1} ≤ (n^b · n^{−q·(1−H(p))})^{L+1}.

Note that Enc_SC(x, r) = E(x) ⊕ G(r) is L-weakly list decodable from radius p if and only if the event B_{x_1,...,x_{L+1},y} does not occur for all choices of distinct x_1, ..., x_{L+1} ∈ {0,1}^{a·log n} and y ∈ {0,1}^{q·log n}. Therefore, by a union bound, the probability that we don't obtain an L-weakly list-decodable code from radius p is at most:

n^{a·(L+1)} · n^q · (n^{b − q·(1−H(p))})^{L+1}.

If q ≥ (a + b)/(1 − H(p) − 1/(L+1)), then the probability is less than one, and there exists an L-weakly list-decodable code from radius p. 14 Given oracle access to a candidate function E : {0,1}^{a·log n} → {0,1}^{q·log n} and to G : {0,1}^{b·log n} → {0,1}^{q·log n}, we can check whether E induces a code with the required properties in time poly(n^q).
It is standard that there are constructions of 2^{a·log n} = n^a random variables that are (L+1)-wise independent, where each variable is uniform over {0,1}^{q·log n}, that can be sampled using only (L+1)·q·log n random bits. Therefore, in time poly(n^{L·q}) we can go over all candidate E's and find one that induces a code that is L-weakly list decodable from radius p.
Once we find a good function E, we are guaranteed that Enc_SC is ε-pseudorandom for C: otherwise, there exist x′ ∈ {0,1}^{a·log n} and a function C ∈ C that distinguishes Enc_SC(x′, U_{b·log n}) = E(x′) ⊕ G(U_{b·log n}) from uniform. This means that there is an xored-restriction C′ of C that distinguishes G(U_{b·log n}) from uniform, and this is a contradiction.
Finally, it remains to justify the claim about the decoding procedure. Given a string y ∈ {0,1}^{q·log n}, the decoding algorithm uses brute force to go over all (x, r) ∈ {0,1}^{a·log n} × {0,1}^{b·log n}, and checks for each whether δ(Enc(x, r), y) ≤ p. By the L-weakly list-decodable property, there will be at most L distinct values of x. The decoding complexity is O(2^{a·log n} · 2^{b·log n}) = poly(n^{a+b}) with oracle access to G.

Inner Stochastic codes for circuits and ROBPs.
By plugging in the pseudorandom generators from Theorem 3.3 and Theorem 3.5 in Theorem 4.3, we immediately obtain the following stochastic codes (that will be used in the construction).
Theorem 4.6 (inner stochastic code for poly-size circuits). If E is hard for exponential size circuits, then for all constants 0 ≤ p < 1/2, c > 1 and a > 0 there exist constants L, b, q such that for every sufficiently large n, there is a stochastic code (Enc, Dec) where Enc : {0,1}^{a·log n} × {0,1}^{b·log n} → {0,1}^{q·log n} is: • L-weakly list decodable from radius p.
• (1/n^c)-pseudorandom for size n^c circuits.
Furthermore, the code is explicit. Specifically, Enc, Dec are computable in time poly(n c ), where the polynomial depends on p, a and the constant β > 0 hidden in the hardness assumption.
• (1/n^c)-pseudorandom for space c·log n ROBPs.
Furthermore, the code is explicit. Specifically, Enc, Dec are computable in time poly(n c ) where the polynomial depends on p, a.
4.3. Inner stochastic codes for AC^0 channels. In this section, we give a construction of inner stochastic codes for circuits of constant depth. This construction has the advantage that the encoding and decoding of the inner stochastic code run in fixed polynomial time and do not depend on the size or depth of the circuit family.

Theorem 4.8 (inner stochastic code for AC^0). There exist constants p > 0, R > 0 and r, a > 1 such that for every sufficiently large m and n of the form n = (r^{m+2} − r^{m+1})·2 log r, there is a stochastic code (Enc, Dec) where Enc : {0,1}^{Rn} × {0,1}^{Rn} → {0,1}^n is:

• 1-strongly list decodable from radius p.

• For every constant d, 2^{−n^{1/(a·d)}}-pseudorandom for circuits of size 2^{n^{1/(a·d)}} and depth d.

Furthermore, the code is explicit. Specifically, Enc, Dec are computable in time poly(n), for a fixed universal polynomial (only the choice of what n is sufficiently large depends on the constants).
Proof. The theorem will use the following claim (that exploits properties of the algebraic geometric codes of Garcia & Stichtenoth (1996)).

Claim 4.9. There exist constants p > 0, R > 0 and an integer constant r which is a power of 2, such that for every sufficiently large m, and n of the form n = (r^{m+2} − r^{m+1})·2 log r, there is a 2Rn × n matrix G^(n) such that:

• G^(n) is a generator matrix for a binary linear [n, 2Rn]-code that is decodable from pn errors.
• The code (Enc, Dec) that is defined by G^(n) is explicit (and in particular G^(n) can be constructed in time poly(n)).
• Let G^(n)_t be the Rn × n matrix obtained by taking the first Rn rows of G^(n). Then G^(n)_t is a generator matrix for a binary linear [n, Rn]-code whose dual code has distance larger than pn.
Proof. (of claim) It has already been observed that a code satisfying the first two properties can be derived from the codes of Garcia & Stichtenoth (1996). A self-contained summary of this argument is presented in Shpilka (2009) (the summary is in an appendix written by Guruswami). Theorem 4 in Shpilka (2009) (see also Theorem 24 in the appendix) contains a precise statement on the existence of such codes. Furthermore, Theorem 4 in Shpilka (2009) also asserts that the relative dual distance of the whole code (namely, the code generated by the generator matrix G^(n)) is at least some positive constant.
We now observe that using the exact same argument, we can obtain the third item in the claim, namely that the code defined by the generator matrix G^(n)_t has positive relative dual distance.
For this purpose, we now survey the proof presented in Shpilka (2009) and explain why it also gives the third item of the claim. The proof presented in Shpilka (2009) uses definitions and notions from algebraic geometry. We will not define these concepts explicitly, and will rely on the notation used in Shpilka (2009).
The first step in the proof presented in Shpilka (2009) is to observe that in order to prove the claim for binary codes, it is sufficient to prove that for q = r^2 and every sufficiently large m, setting n_m = r^{m+2} − r^{m+1}, there is a q-ary linear code with block length n_m that satisfies the properties, where r is a power of 2. This is because (as explained in Lemmas 25 and 26 in Shpilka (2009)) a trivial argument shows that if q is a power of 2, and one views a q-ary linear code as a binary code, then the obtained code inherits the rate of the original code, while the relative distance and relative dual distance are divided by at most log q.
The proof in Shpilka (2009) proceeds to construct such a q-ary code. Specifically, it shows that there exists R > 0 such that for every r ≥ 8 there exists p > 0 such that for every sufficiently large m, there is an explicit r^2-ary linear [n_m, 2Rn_m] code with block length n_m = r^{m+2} − r^{m+1} and constant positive rate 2R, and the code is explicitly decodable from p·n_m errors. This code is obtained by using the argument of Garcia and Stichtenoth. The paper defines a family of q-ary codes that is referred to as C(D, αQ). In this notation, D and Q are certain mathematical objects that we will not elaborate on, and α is an integer. What is important for our purposes is that the choice of D and Q determines two additional integer parameters n and g, and furthermore, in Lemma 22 in Shpilka (2009) it is shown that if α > 2g − 2 then the code C(D, αQ) has dimension k = α − g + 1, distance at least n − α, and dual distance at least α − (2g − 2). 15 Another observation that follows directly from the definition of the linear space C(D, αQ) is that if α′ ≤ α then C(D, α′Q) ⊆ C(D, αQ).
The code C^(m) described above is obtained by using the construction of Garcia & Stichtenoth (1996). More specifically, in the proof of Theorem 24 in Shpilka (2009), it is shown that for every integer r ≥ 8 and every sufficiently large m, there are mathematical objects D_m, Q_m with parameters n_m = r^{m+2} − r^{m+1} and g_m ≤ r^{m+1}. The r^2-ary code C^(m) is then defined to be C(D_m, α_m Q_m) for α_m = n_m/2 + g_m − 1. This choice is made so that the dimension of the code C^(m) is k_m = α_m − g_m + 1 = n_m/2. 16 It is argued in the proof of Theorem 24 that this code is explicit, has positive constant rate (in fact, it has rate precisely 1/2) and can be explicitly decoded from a positive fraction of errors. We define α′_m = n_m/4 + g_m − 1 and consider the code C^(m)_t = C(D_m, α′_m Q_m), which by the aforementioned properties has dual distance at least

α′_m − (2g_m − 2) = n_m/4 − g_m + 1.

By choosing r to be a sufficiently large power of 2, we can make sure that this dual distance is at least r^{m+2}/8 ≥ n_m/8. This gives that the relative dual distance of C^(m)_t is at least 1/8 > 0, as required. This concludes the proof of the claim.

Footnote 15: A helpful analogy to keep in mind is that if g = 0, these parameters are similar to those obtained by a Reed–Solomon code, when using polynomials of degree α. The advantage of algebraic geometric codes is that at the cost of adding the additional "genus parameter" g, they enable alphabets of constant size.

Footnote 16: We remark that the proof in Shpilka (2009) aims for more ambitious parameters than the ones we state here. Specifically, the choice of α_m above is made to achieve rate exactly 1/2 while approaching relative distance of 1/2.
This code is 1-strongly list decodable from radius p by the unique decoding properties of the code generated by G. More precisely, given z ∈ {0,1}^n, we can decode to the unique message y ∈ {0,1}^{2Rn} whose encoding has Hamming distance at most pn from z, and this message y = (r, x) can be found efficiently.
We now show the pseudorandomness of Enc. Let G_b denote the bottom Rn rows of G (and recall that G_t denotes the top Rn rows of G). For every x, r ∈ {0,1}^{Rn},

Enc(x, r) = x·G_b ⊕ r·G_t.

The generator matrix G_t generates a code whose dual code has distance larger than pn. Recall that G_t is the parity check matrix of the dual code. Since the dual code has distance larger than pn, it follows that every pn columns of G_t are linearly independent. This gives that the distribution r·G_t for r ← U_Rn is pn-wise independent, and implies that for every x ∈ {0,1}^{Rn}, Enc(x, U_Rn) is pn-wise independent. Braverman (2010) (with later improvements by Tal (2014); see Theorem 3.7) showed that t-wise independent distributions are ε-pseudorandom for circuits of size s and depth d if t ≥ (log(s/ε))^{c·d} for some constant c. This gives that there exists a constant a > 1 such that Enc(x, U_Rn) is 2^{−n^{1/(a·d)}}-pseudorandom for circuits of size 2^{n^{1/(a·d)}} and depth d, as required.
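The fact used here (every t columns of G_t linearly independent, equivalently dual distance larger than t, implies t-wise independence of r·G_t) can be checked by brute force on a toy example: the [7,4] Hamming code, whose dual (the [7,3] simplex code) has distance 4, so its codeword distribution is 3-wise but not 4-wise independent.

```python
# Brute-force check of "dual distance > t implies t-wise independence"
# on the [7,4] Hamming code (a toy stand-in for G_t; not the AG code).
from itertools import product, combinations

G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def codeword(r):
    """Row vector r times G over GF(2)."""
    return tuple(sum(ri * g for ri, g in zip(r, col)) % 2
                 for col in zip(*G))

words = [codeword(r) for r in product([0, 1], repeat=4)]

def is_twise_independent(words, t):
    # t-wise independent iff every projection onto t coordinates is uniform.
    for coords in combinations(range(len(words[0])), t):
        counts = {}
        for w in words:
            key = tuple(w[i] for i in coords)
            counts[key] = counts.get(key, 0) + 1
        if set(counts.values()) != {len(words) // 2 ** t}:
            return False
    return True
```

The 4-wise check fails because the dual code contains a weight-4 word, so some four columns of G are linearly dependent.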

The construction of stochastic codes
In this section, we give the construction of the stochastic code. Our construction imitates that of Guruswami & Smith (2016) (with the modifications explained in Section 2). We start by introducing some notation.

Partitioning codewords into control blocks and data blocks.
The construction will think of codewords c ∈ {0, 1}^N as being composed of n = n_ctrl + n_data blocks of length b = N/n. Given a subset I ⊆ [n] of n_ctrl distinct indices, we can decompose c into its data part c_data ∈ {0, 1}^{N_data}, where N_data = n_data · b, and its control part c_ctrl ∈ {0, 1}^{N_ctrl}, where N_ctrl = n_ctrl · b. Similarly, given strings c_data and c_ctrl, we can prepare the codeword c, which we denote by (c_data, c_ctrl)_I, by the reverse operation. More specifically, c = (c_data, c_ctrl)_I is a block composition of c_data and c_ctrl in which each of the n_ctrl control blocks that compose c_ctrl is inserted into c according to the indices specified in I, so that restricting c to the blocks in I gives c_ctrl (similarly, restricting c to the blocks in [n] \ I gives c_data). This is stated formally in the definition below.
Definition 5.1. Let I = {i_1 < · · · < i_{n_ctrl}} ⊆ [n] be a subset of indices of size n_ctrl.
• Given strings c_data ∈ {0, 1}^{N_data} and c_ctrl ∈ {0, 1}^{N_ctrl}, we define an N-bit string c, denoted by (c_data, c_ctrl)_I, as follows: We think of c_data, c_ctrl and c as being composed of blocks of length b. We enumerate the indices in [n] \ I by j_1 < · · · < j_{n_data}, and set the i_t'th block of c to be the t'th block of c_ctrl (for t ∈ [n_ctrl]) and the j_t'th block of c to be the t'th block of c_data (for t ∈ [n_data]).

Permuting strings. Our construction will also use permutations to permute strings as follows:

Definition 5.2. Given a string v ∈ {0, 1}^N and a permutation π over [N], we let π(v) denote the string obtained by permuting the coordinates of v according to π.

Description of the construction. Our construction is described in detail in the three figures below. The choice of parameters and ingredients is described in Figure 5.1. The encoding algorithm is described in Figure 5.2, and the list-decoding algorithm is described in Figure 5.3. We state a general theorem that summarizes the correctness of the construction and will be used to prove Theorems 1.2, 1.3 and 1.4.
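The decomposition and the composition (c_data, c_ctrl)_I of Definition 5.1 can be sketched as follows (a toy implementation over Python strings; the helper names are ours):

```python
def split_blocks(c, b):
    """Split a bit-string c into consecutive blocks of length b."""
    return [c[t * b:(t + 1) * b] for t in range(len(c) // b)]

def decompose(c, b, I):
    """Split codeword c into (c_data, c_ctrl) according to index set I."""
    blocks = split_blocks(c, b)
    ctrl = ''.join(blocks[i] for i in sorted(I))
    data = ''.join(blocks[i] for i in range(len(blocks)) if i not in I)
    return data, ctrl

def compose(c_data, c_ctrl, b, I, n):
    """The reverse operation (c_data, c_ctrl)_I of Definition 5.1:
    control blocks go to the positions in I, data blocks fill the rest."""
    out = [None] * n
    for blk, i in zip(split_blocks(c_ctrl, b), sorted(I)):
        out[i] = blk
    data_positions = (i for i in range(n) if i not in I)
    for blk, i in zip(split_blocks(c_data, b), data_positions):
        out[i] = blk
    return ''.join(out)

c = '001011100101'          # n = 6 blocks of length b = 2
data, ctrl = decompose(c, 2, {1, 4})
print(data, ctrl)           # 00111001 1001
print(compose(data, ctrl, 2, {1, 4}, 6) == c)  # True: round trip
```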

Correctness of the construction.
Let C be a class of channels C : {0, 1}^N → {0, 1}^N that induce at most pN errors. We now show that if the ingredients PRG, Enc_SC are pseudorandom for a class C′ that is sufficiently stronger than C, then the decoding algorithm of Figure 5.3 succeeds with high probability. This is stated precisely in the next theorem, which uses the notion of "xored restrictions" defined in Definition 4.2. (We remind the reader that nonuniform complexity classes such as the ones we consider in this paper are closed under xored restrictions.)
Theorem 5.3 (Correctness of construction). For all constants 0 ≤ p < 1/2 and 0 < ε < 1/2 − p there exists a constant L = L_LR · L_balanced such that for infinitely many N the following holds:

Figure 5.1: Choice of parameters and ingredients.

• N — the length (in bits) of the codeword. (Throughout we assume that N is sufficiently large.) Other parameters are constants or are chosen as a function of N.
• p -The fraction of errors we need to recover from. This is a constant.
• C′ — a class of functions (slightly stronger than the class C of channels we allow).
• b — we will divide the N output bits into n = N/b blocks of length b, where 2 log N ≤ b ≤ N^{1/10} is a function of N that will be chosen later on. This implies n ≥ N^{0.9}.
• ν ≥ 2^{−√N} — a bound on the failure probability of decoding (can be chosen as a function of N).

Internal parameters:
• Blocks will be of two kinds: "control" and "data". We set n_ctrl = ε · n and n_data = n − n_ctrl, so that n = n_ctrl + n_data. Let N_ctrl = b · n_ctrl and N_data = b · n_data, so that N = N_ctrl + N_data.
• Let α > 0 be a sufficiently small constant that will be chosen later.

Ingredients that depend on the choice of channel class:
We are given a pseudorandom generator PRG and an inner stochastic code (Enc_SC, Dec_SC) such that Enc_SC is (ν/(10 · n_ctrl))-pseudorandom for C′ and is L_SC-weakly list decodable from radius p + ε. We require that L_SC is a constant.
Other Ingredients: • A code Enc balanced : {0, 1} RN → {0, 1} N data with an algorithm Dec balanced that performs L balanced -list decoding from (b, p + α, α)-balanced errors. By Theorem 3.10 we have an explicit construction with rate R ≥ 1−H(p+α)−c √ α for some constant c, and whereb and L balanced are constants (chosen as a function of the constants α and p). We can choose α > 0 to be sufficiently small constant so that we indeed have R ≥ RN/N data = R/(1 − ), which guarantees that the rate of the final code is R.
• A code Enc_LR : {0, 1}^{d_ctrl} → ({0, 1}^{2 log n_ctrl})^{n_ctrl} that is (ε²/100, L_SC · n, L_LR)-list recoverable. Note that L_SC · n = (L_SC/ε) · n_ctrl. By Theorem 3.11 we can obtain such a code with some constant rate for some constant L_LR (these two constants depend on ε). The rate we allow for Enc_LR above is d_ctrl/(2 log n_ctrl · n_ctrl). By Theorem 3.15 we have an explicit construction with seed length N^{0.7} ≤ d_ctrl.
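For concreteness, the balanced-error condition used above can be read as the following small checker (this is our reading of the definition, which is stated in Section 3 of the paper, and the helper name is ours):

```python
def is_balanced(e, b, p, alpha):
    """Our reading of '(b, p+alpha, alpha)-balanced': when the error
    vector e is divided into blocks of length b, at most an alpha
    fraction of the blocks have relative Hamming weight exceeding
    p + alpha."""
    num_blocks = len(e) // b
    blocks = [e[t * b:(t + 1) * b] for t in range(num_blocks)]
    heavy = sum(1 for blk in blocks if sum(blk) / b > p + alpha)
    return heavy <= alpha * num_blocks

# Two error vectors of the same overall weight: spreading the errors
# evenly keeps the vector balanced; crowding them into few blocks does not.
spread = [1, 0, 0, 0] * 25          # every length-4 block has weight 1
crowded = [1] * 20 + [0] * 80       # 5 all-ones blocks, then zeros
print(is_balanced(spread, 4, 0.2, 0.1))   # True
print(is_balanced(crowded, 4, 0.2, 0.1))  # False
```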

Prepare data part:
We prepare a string c_data of length N_data as follows: • Encode m by x = Enc_balanced(m). • Permute x using the permutation π_{s_π} and mask the result with the output of PRG on seed s_PRG, obtaining c_data = π_{s_π}(x) ⊕ PRG(s_PRG).

Prepare control part:
We prepare a string c_ctrl of length N_ctrl (which we view as n_ctrl blocks of length b) as follows: • Encode s by z = Enc_LR(s). This is a string composed of n_ctrl blocks of length 2 log n_ctrl. • Use Enc_SC as an "inner code" to encode the blocks of z using the randomness r_1, …, r_{n_ctrl}. That is, (c_ctrl)_j = Enc_SC(z_j, r_j) = Enc_SC(Enc_LR(s)_j, r_j).

Merge data and control parts:
We prepare the final output codeword c ∈ {0, 1} N by merging c data and c ctrl . That is, c = (c data , c ctrl ) I .
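Schematically, the encoding steps above compose as follows. The sketch below wires trivial stand-ins through the data, control and merge steps purely to show the data flow; every ingredient here (enc_balanced, enc_lr, enc_sc, the mask and the permutation) is a placeholder of our own, not the paper's construction.

```python
def merge(data, ctrl, b, I):
    """(data, ctrl)_I: interleave length-b blocks, control blocks at I."""
    n = (len(data) + len(ctrl)) // b
    out, d, c = [], 0, 0
    for i in range(n):
        if i in I:
            out += ctrl[c:c + b]; c += b
        else:
            out += data[d:d + b]; d += b
    return out

def encode_sketch(m, I, b, enc_balanced, permute, prg_mask, enc_lr,
                  enc_sc, s, rs):
    """Data flow of the encoder:
    data part = permute(enc_balanced(m)) XOR prg_mask,
    ctrl part = enc_sc applied blockwise to enc_lr(s),
    codeword  = (data, ctrl) merged by the index set I."""
    x = enc_balanced(m)
    data = [xi ^ mi for xi, mi in zip(permute(x), prg_mask)]
    ctrl = [bit for zj, rj in zip(enc_lr(s), rs) for bit in enc_sc(zj, rj)]
    return merge(data, ctrl, b, I)

# Trivial stand-ins, only to exercise the data flow.
codeword = encode_sketch(
    m=[0, 1, 1, 0], I={0}, b=2,
    enc_balanced=lambda m: m,        # placeholder "balanced code"
    permute=lambda x: x,             # placeholder permutation
    prg_mask=[0, 0, 0, 0],           # placeholder PRG output
    enc_lr=lambda s: [s],            # placeholder list-recoverable code
    enc_sc=lambda zj, rj: zj,        # placeholder inner stochastic code
    s=[1, 0], rs=[None])
print(codeword)  # [1, 0, 0, 1, 1, 0]
```

With identity stand-ins the "codeword" simply interleaves the control block with the message blocks; the real construction replaces each placeholder with the ingredient of Figure 5.1.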
If the parameters and ingredients are chosen as shown in Figure 5.1, then the stochastic code (Enc, Dec) specified in Figure 5.2 and Figure 5.3 has the following properties: • It has rate R ≥ 1 − H(p) − ε.
• It is L-list decodable with success probability 1 − ν for channels in C.

Use each control candidate s to decode data: For each s = (s_samp, s_π, s_PRG) ∈ List_ctrl (recall that there are L_LR of them) we produce a list List_s of L_balanced candidate messages. Our final output list will be the union of these lists.

Merge lists: The final output is List = ∪_{s ∈ List_ctrl} List_s.

• There exists a polynomial P(·) that depends on p and ε such that:

– The function Enc can be computed in DTIME^{PRG, Enc_SC}(P(N)) (and is therefore explicit if PRG, Enc_SC are explicit).
– The function Dec can be computed in DTIME^{PRG, Dec_SC}(P(N)) (and is therefore explicit if PRG, Dec_SC are explicit).

(… an outer code in the proof of Theorem 3.10.) In a preliminary version of this work, we incorrectly claimed that the list size in our current construction is L = poly(1/ε), which is not obtained by the current written proof. Nevertheless, we remark that we believe that both these issues can be resolved by using a more complicated construction and analysis; more specifically, by replacing codes for balanced errors with codes for t-wise independent errors (as was originally done by Guruswami & Smith (2016), see Section 2.2). This modification requires some changes to the construction. It also requires a more complicated analysis for channels that are not sufficiently strong to decode codes for t-wise independent errors (such as AC⁰ and online space channels). The approach for this analysis can be found in a preliminary version of Guruswami & Smith (2016), where it was used to handle online channels.

Choosing ingredients and parameters for specific channel families.
We now put everything together and choose pseudorandom generators and inner stochastic codes for poly-size circuits, online logspace and AC⁰ channels.

Poly-size circuit channels.
Here we use the pseudorandom generator of Impagliazzo & Wigderson (1997) (which requires the assumption that E is hard for exponential-size circuits). This PRG has logarithmic seed length, and can be used both as PRG and as the pseudorandom generator that is transformed into an inner stochastic code Enc_SC (as done in Theorem 4.6). The precise statement and parameter choices appear below.

Theorem 5.5 (explicit codes for poly-size channels). If there exists a constant β > 0 and a problem in E = DTIME(2^{O(n)}) such that for every sufficiently large n, solving the problem on inputs of length n requires circuits of size 2^{β·n}, then for all constants 0 ≤ p < 1/2, ε > 0, there exists a constant L such that for every constant c > 1 and for infinitely many N:

• Let C be the class of all circuits C : {0, 1}^N → {0, 1}^N of size N^c that induce at most pN errors.
• Let C′ be the class of all size-N^{2c} circuits that output one bit (this includes circuits for all input lengths up to N). Here, we assume w.l.o.g. that c is sufficiently large so that in time N^{2c} we can compose size-N^c computations with fixed polynomial-size computations.
• Let (Enc_SC, Dec_SC) and the block length b be determined by Theorem 4.6. Specifically, let b = q · log N for a sufficiently large constant q, guaranteed by Theorem 4.6, so that Enc_SC is L_SC-weakly list decodable from radius p + ε for a sufficiently large constant L_SC (chosen as a function of p), and furthermore, Enc_SC is N^{−(c+1)}-pseudorandom for C′.

Online logspace channels.
Here we use the pseudorandom generator of Nisan (1992). This PRG has poly-logarithmic seed length, and can be used as PRG. However, it is unsuitable to serve in the construction of inner stochastic codes. This is because the dependence of the seed length on the error does not allow linear stretch with error that is exponentially small in the seed length. Instead, we use the pseudorandom generator of Nisan & Zuckerman (1996), which has these properties and can be transformed into an inner stochastic code Enc_SC (as done in Theorem 4.7). The precise statement and parameter choices appear below.

Theorem 5.6 (explicit codes for online logspace channels). For all constants 0 ≤ p < 1/2, ε > 0, there exists a constant L such that for every constant c > 1 and for infinitely many N:

• Let C be the class of all space-(c log N) online channels C : {0, 1}^N → {0, 1}^N that induce at most pN errors.

• Let C′ be the class of all space-(2c log N) ROBPs that output one bit (this includes ROBPs for all input lengths up to N).
Here we assume w.l.o.g. that c is sufficiently large so that an ROBP of space 2c log N can compose a space-(c log N) online computation with a space-(c_0 log N) online computation, for any fixed c_0.
• Let (Enc_SC, Dec_SC) and the block length b be determined by Theorem 4.7. Specifically, let b = q · log N for a sufficiently large constant q, guaranteed by Theorem 4.7, so that Enc_SC is L_SC-weakly list decodable from radius p + ε for a sufficiently large constant L_SC (chosen as a function of p), and furthermore, Enc_SC is N^{−(c+1)}-pseudorandom for C′. (Note that N^{−(c+1)} ≤ ν/(10 · n_ctrl), as required.)
These choices satisfy the requirements of Theorem 5.3, and by this theorem the stochastic code (Enc, Dec) specified in Figure 5.1, Figure 5.2 and Figure 5.3 has rate 1 − H(p) − ε, and is L-list decodable with success probability 1 − N^{−c} against channels in C. Furthermore, Enc and Dec are computable in time poly(N^c), where the polynomial depends on p and ε.

Constant depth channels.
Here we use the pseudorandom generator of Nisan (1991). This PRG has seed length that is subpolynomial for any fixed constant depth d, and can be used as PRG. We use the construction of inner stochastic codes given in Theorem 4.8 for Enc_SC. This construction only works for p < p_0 for some p_0 > 0, and this requirement is inherited by our final theorem. The precise statement and parameter choices appear below.

Theorem 5.7 (explicit codes for constant-depth channels). There exist constants p_0 > 0, d_0 > 1 and a > 0 such that for all constants 0 ≤ p < p_0, ε > 0, there exists a constant L such that for every constant d > d_0 and for infinitely many N:

^17 In Theorem 4.8 we are forced to choose b of a special form, namely b = (r^{m+2} − r^{m+1}) · 2 log r for some constant r. However, as these choices of b are dense, we can always find a suitable b that is Θ(N^{1/10}), which we can use.

Analyzing the construction
This section is devoted to proving Theorem 5.3.
The setup: Throughout the remainder of the section, we fix the following setup: Let 0 ≤ p < 1/2 and 0 < ε < 1/2 − p be constants. Let C, C′ be classes as required in Theorem 5.3. We use the choices and requirements made in Figure 5.1. More specifically, as shown in Figure 5.1, we assume that we are supplied with PRG and (Enc_SC, Dec_SC) that satisfy the requirements made in Figure 5.1. That is, for some "required error" parameter ν ≥ 2^{−√N}, Enc_SC is (ν/(10 · n_ctrl))-pseudorandom for C′ and is L_SC-weakly list decodable from radius p + ε, for a constant L_SC.
Our goal in this section is to show that for infinitely many N, the encoding and decoding algorithms specified in Figure 5.2 and Figure 5.3 satisfy the conclusion of Theorem 5.3. This setup is assumed throughout this section.

Milestones for correct decoding. Following Guruswami & Smith (2016), we will analyze the construction in two steps: We first consider the case that the channel C is an additive channel, namely that C(z) = z ⊕ e for some fixed error vector e, and later extend to general channels that can choose e as a function of z.
We present the following abstraction of this method (which will be convenient for our purposes, as we use several different classes of channels). We will define "milestones" (as a function of m, s_π, s_samp and e) and will require that:

1. If the milestones occur, then the decoding algorithm succeeds (that is, m appears in the output list).

2. If S_π, S_samp are random and e is fixed (that is, if the channel is additive), then the milestones occur with probability close to one.
3. Checking whether the milestones occur is computationally easy.
We will state a general theorem showing that if such milestones exist, then the correctness of the decoding holds even against channels that are not additive, as long as the construction is using pseudorandomness against a class C′ that can simulate the channel and milestones. This is stated formally in the definition and theorem below (in which we allow milestones to be probabilistic). In particular, the milestones lemma asserts that if A is a milestone function (in the sense of Definition 6.1), then for every channel C ∈ C and every message m,

Pr[m ∈ Dec(C(Enc(m, S, R)))] ≥ 1 − ν.

We defer the proof of the milestones lemma to Section 6.3. In the next section, we explain how the milestones lemma implies Theorem 5.3.

Milestones Lemma implies Theorem 5.3.
In this section, we show that Lemma 6.2 implies Theorem 5.3. Our task is to define a milestone function that meets the three requirements in Definition 6.1. We start with the following definition, where e_{i_j} denotes the i_j'th block of e (of size b).
We will use slightly different milestone functions for different complexity measures (as we need the milestone function to be efficient for the corresponding complexity measure). It will be convenient to start by defining two milestone functions (a strong one, and a weak one). We will later show that more efficient milestone functions can be "sandwiched" between the two milestone functions. This will mean that correctness of the more efficient milestone functions will follow by analyzing the simpler versions.
Definition 6.4. It will be convenient to denote the input to a milestone function by (x, y), where x = (m, s_samp, s_π, e) and y is the "random coins"; we define the following functions (which do not depend on y)^18: Control milestone: Let μ = ε²/4.
Combined milestones: Note that for every (x, y), A_strong(x, y) = 1 implies A_weak(x, y) = 1. The next two lemmas give that any milestone function that is "sandwiched" between A_weak and A_strong satisfies the first two properties of a milestone function. Lemma 6.5 follows as the function A_weak was defined precisely so that the decoding components in the decoding algorithm of Figure 5.3 are used with the correct guarantee; a full proof appears in Section 6.4. Lemma 6.6 follows as the function A_strong was defined precisely so that the pseudorandom components (the sampler and permutation) are "sufficiently random" to imply that A_strong holds; for this, we only need to analyze the case where e is fixed and the seeds (S_samp, S_π) are chosen at random. A full proof appears in Section 6.5.
Milestones for poly-size circuits. Both functions A_weak, A_strong satisfy the first two properties, and are obviously computable in polynomial time. This immediately gives that they satisfy the third and final property if C′ is sufficiently stronger than C in the sense that it can run poly-time computations "on top of" computations in C. This also immediately implies Theorem 5.3 for the case where A is allowed to run in some fixed polynomial time.
We would like to give tighter reductions in which the milestone function is computable in AC 0 or by a small space ROBP. We now explain how to achieve such milestone functions.
Milestone function for constant-depth circuits. We would like to implement the milestone function A_weak (or A_strong) by a poly-size constant-depth circuit. Note that the third property in Definition 6.1 considers the case that S_samp, S_π are fixed to some values s_samp, s_π, and the only live input is e. This means that the choice of permutation, and of which blocks are control blocks, is fixed (and can be hardwired to the circuit as nonuniform advice). Furthermore, in the data milestone the inputs can be rearranged according to π_{s_π} at no cost, meaning that the circuit can compute e_π from e for free. Thus, computing the milestone function reduces to several counting tasks on the number of ones in e and e_π.
It is known that the problem of counting the number of ones in an n-bit input cannot be solved by poly-size constant-depth circuits. However, Ajtai (1983) showed that for every η > 0, there is a polynomial-size constant-depth circuit that can produce a quantity that approximates the number of ones up to an error of ηn. (In fact, the results of Ajtai are much stronger, and in particular allow subconstant η.) This means that there is a constant-depth, polynomial-size circuit A_{s_samp, s_π}(e) such that for every m, y:

A_strong(m, s_samp, s_π, e, y) = 1 ⇒ A_{s_samp, s_π}(e) = 1 ⇒ A_weak(m, s_samp, s_π, e, y) = 1.

This means that the milestone function A_middle(x, y) = A_{s_samp, s_π}(e) satisfies the three properties of a milestone function, proving Theorem 5.3 for the case of constant-depth circuits.
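The "sandwich" reasoning above is purely about thresholds and can be checked mechanically: if an approximate counter errs by at most ηn, then a threshold test on the approximate count sits between the strong test (true count ≤ t − ηn) and the weak test (true count ≤ t + ηn). A toy verification, with names of our own choosing:

```python
def threshold_tests(true_count, approx_count, t, eta_n):
    """Return the (strong, middle, weak) threshold tests from the
    sandwiching argument.  'middle' is the test that an approximate
    counter (additive error at most eta_n) can implement."""
    strong = true_count <= t - eta_n
    middle = approx_count <= t
    weak = true_count <= t + eta_n
    return strong, middle, weak

# Whenever the counter's error is at most eta_n, strong => middle => weak.
ok = True
t, eta_n = 50, 5
for true in range(0, 101):
    for err in range(-eta_n, eta_n + 1):
        strong, middle, weak = threshold_tests(true, true + err, t, eta_n)
        ok &= (not strong or middle) and (not middle or weak)
print(ok)  # True
```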
Milestones for read-once branching programs. As in the case of constant-depth circuits, we need to implement the milestone function by an O(log n)-space ROBP for fixed s_samp, s_π. Using the approach we used for constant-depth circuits, this may seem easy at first glance, as ROBPs with space O(log n) can count up to n^{O(1)}, and this sufficed for the earlier implementation. Indeed, this reasoning applies to the control milestone, and the functions A^{strong}_{ctrl} and A^{weak}_{ctrl} can easily be implemented by an ROBP of space O(log n) (for fixed s_samp, s_π).
The functions A^{strong}_{data} and A^{weak}_{data} pose a problem. Unlike circuits, an ROBP is not allowed to reorder the input by a fixed permutation π_{s_π} prior to reading it. Thus, we cannot assume that online access to e gives online access to e_π.
We do have that s_π is fixed, and can be hardwired to the ROBP. This means that when an ROBP reads the i'th bit of the input e, it can tell whether this bit belongs to a control block or a data block, and in the latter case, it can tell to which of the N_data/b̃ blocks of length b̃ the index i belongs. (All these are operations that do not depend on e, and only depend on the fixed s_samp, s_π.) The issue is that the order in which the ROBP reads the data bits is permuted, and does not respect their partitioning into blocks of length b̃. This means that the ROBP cannot keep a single counter and use it for all blocks, and must maintain different counters if it wants to count the number of ones in different blocks. The naive way to check whether e_π is balanced is to keep counters for all N_data/b̃ blocks, and as b̃ is a constant, this takes space O(N_data/b̃), which is way too much. The solution is to use randomization. The milestone function is allowed to toss random coins (in the form of the input y). It will choose ℓ = O(log N) uniform indices from [N_data/b̃], and will only keep count of the number of ones in these ℓ blocks. (This can indeed be done in space O(log N).) The milestone function will compute the fraction ρ′ of sampled blocks which have relative Hamming weight larger than p + α/4, and use this quantity ρ′ as an approximation of the real quantity ρ (the fraction of blocks in e_π which have relative Hamming weight larger than p + α/4). By a Chernoff bound, with probability 1 − 2^{−Ω(α²·ℓ)} = 1 − N^{−Ω(1)}, we have that |ρ − ρ′| ≤ α/100. Therefore, the ROBP can safely output one if ρ′ ≤ α/2, as this indeed implies that ρ ≤ α. This gives that, by Lemma 6.5, A_middle satisfies the first property of a milestone function. By Lemma 6.6, A_middle defined in this form satisfies the second property of milestone functions, where we suffer an additive loss of 2^{−Ω(α²·ℓ)} relative to what we can get for A_strong, because of the error induced by the Chernoff bound.
In Theorem 5.3, we are allowed to use space O(log N) for ν = 2^{−Ω(log N)}, and as α is a constant, the theorem follows.
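The sampling-based estimate used by the ROBP milestone can be sketched as follows; the names and parameters are our own, and the real milestone computes its estimate in a single left-to-right pass with O(log N) counters rather than by random access:

```python
import random

def heavy_fraction(e_pi, b, threshold, sample=None, seed=None):
    """Fraction of length-b blocks of e_pi whose relative Hamming weight
    exceeds `threshold`.  With sample=None, the exact fraction rho is
    computed (a census over all blocks); with sample=l, rho is estimated
    from l uniformly chosen blocks, as the space-bounded milestone does."""
    num_blocks = len(e_pi) // b
    if sample is None:
        indices = list(range(num_blocks))
    else:
        rng = random.Random(seed)
        indices = [rng.randrange(num_blocks) for _ in range(sample)]
    heavy = sum(1 for i in indices
                if sum(e_pi[i * b:(i + 1) * b]) > threshold * b)
    return heavy / len(indices)

# 3 of 10 blocks are all-ones; the exact fraction rho is 0.3.
e_pi = [1, 1, 1, 1] * 3 + [0, 0, 0, 0] * 7
print(heavy_fraction(e_pi, 4, 0.5))  # 0.3
```

A Chernoff bound (as in the text) says that with ℓ = O(log N) samples the estimate deviates from the exact fraction by more than α/100 only with probability N^{−Ω(1)}.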

Proof of Milestones Lemma.
We prove the milestones lemma in two steps, described in the two sections below.

6.3.1. The hiding lemma. The following lemma states that for a function D that is slightly weaker than functions in C′, an encoding of a message m is pseudorandom for D. (We will later consider the case where D is a composition of a channel and milestone functions.)
Lemma 6.7 (Hiding Lemma). Let D be a function such that every xored-restriction of D is in C′. For every message m ∈ {0, 1}^{RN}, sampler seed s_samp and permutation seed s_π, let V = Enc(m, s_π, s_samp, S_PRG, R_1, …, R_{n_ctrl}) be a random variable (defined over the probability space where S_PRG, R_1, …, R_{n_ctrl} are chosen uniformly and independently). Then V is pseudorandom for D.

We now use the hiding lemma to prove the milestones lemma. Suppose, toward a contradiction, that its conclusion fails for some channel C ∈ C and message m. Let B = {(s, r) : m ∉ Dec(C(Enc(m, s, r)))} be the set of pairs on which C causes a decoding error. We have that Pr[(S, R) ∈ B] ≥ ν.
Note that for a fixed (s, r), the error vector e induced by the channel C is also fixed. We consider the probability space where (S, R) = (s, r) are fixed and Y (the random coins of the function A) is chosen uniformly. By the first property of a milestone function, we have that for a fixed (s, r) ∈ B and a fixed error e, Pr[A(m, s_samp, s_π, e, Y) = 0] > 1/2 (as otherwise decoding must succeed). Let A = A(m, S_samp, S_π, E_C(Z), Y) be the random variable of the output of the function A in the probability space where S, R, Y are chosen uniformly. We add an independent random variable Z_U that is uniform over {0, 1}^N to our probability space (which now consists of independently chosen S, R, Y, Z_U). By the second property of a milestone function, we have that for every error vector e, Pr[A(m, S_samp, S_π, e, Y) = 1] ≥ 1 − ν/10.
As Z_U is independent of (S_samp, S_π), this holds also for an error vector of the form E_C(Z_U). By averaging, there exist fixed values s_samp, s_π and y such that we may consider the event W = {S_samp = s_samp, S_π = s_π, Y = y}.
We have that (S_samp, S_π, Y) is independent of Z_U and also independent of (S_PRG, R). This setup (namely, where s_samp, s_π are fixed, and S_PRG, R = (R_1, …, R_{n_ctrl}) are uniform) is exactly the probability space considered in the hiding lemma (Lemma 6.7). By the third property of milestone functions, every xored restriction of the function D(z) = A(m, s_samp, s_π, E_C(z), y) is in C′. Therefore, the function D that we obtained gives a contradiction to the hiding lemma.
6.4. Proof of Lemma 6.5. We will prove the lemma in two steps that correspond to the two steps of the decoding: decoding control, and decoding data.
Claim 6.10. For every m, s = (s_samp, s_π, s_PRG), r, e and y, let c = Enc(m, s, r) and c′ = c ⊕ e. If A^{weak}_{ctrl}(m, s_samp, s_π, e, y) = 1, then s ∈ List_ctrl, where List_ctrl is the list obtained in the decoding algorithm described in Figure 5.3.

Proof. By the decoding properties of Enc_SC, if the Hamming weight of e_i is less than (p + ε) · b then c_i ∈ List_i. We have that e is (p + ε, μ/10)-good for s_samp, and this means that for at least (μ/10) · n_ctrl = ε² · n_ctrl/40 of the n_ctrl control blocks i ∈ I = Samp(s_samp), c_i ∈ List_i. Thus, setting List_SC = ∪_{i∈[n]} List_i, we indeed have that Pr_{i←[n_ctrl]}[Enc_LR(s)_i ∈ List_SC] ≥ μ/10 > ε²/100 for a set List_SC of size n · L_SC. By the list recoverability of Enc_LR we get that s ∈ List_ctrl, meaning that the control information was successfully recovered, as desired.
Claim 6.11. For every m, s = (s_samp, s_π, s_PRG), r, e and y, let c = Enc(m, s, r) and c′ = c ⊕ e. If A^{weak}_{data}(m, s_samp, s_π, e, y) = 1 and s ∈ List_ctrl (meaning that s was recovered correctly by the first step of decoding), then m ∈ Dec(c′).
Proof. We have that s ∈ List_ctrl, meaning that s is one of the candidates considered in the second step of the decoding. Let y′ be the string obtained from c′ after the decoding uses s_samp to find the data blocks, s_PRG to unmask the data, and s_π to permute it back to its original state. The requirement that A^{weak}_{data}(m, s_samp, s_π, e, y) = 1 implies that e_π is (b̃, p + α, α)-balanced. Note that e_π is the error vector acting on the balanced code. By the guarantee on Dec_balanced, this gives that m ∈ List_s = Dec_balanced(y′). Since the correct control information s is in List_ctrl, we conclude that m ∈ Dec(c′) = ∪_{s∈List_ctrl} List_s, as desired.
The lemma follows from the combination of both claims.

6.5. Proof of Lemma 6.6. A good intuition to keep in mind is that we are trying to bound the harm that can be caused by an additive channel that uses a fixed error vector e of Hamming weight at most pN.
We start by showing that with high probability, no more than an ε²/4 fraction of the control blocks suffer too many errors from the error vector e.
Claim 6.12. For every m, every e of Hamming weight at most pN, every y, and every s_π, the error vector e is (p + ε/4, (1 − 1/10) · μ)-good for a uniformly chosen S_samp with probability at least 1 − 2^{−N^{0.6}}.

Proof. For a given error vector e, we define T_e = {i : the i'th block of e has weight at most (p + ε/4) · b}. For every e that has Hamming weight at most pN, it holds that |T_e| > (ε/4) · n (otherwise e would have more than pN errors). Define f_e : [n] → {0, 1} such that f_e(i) = 1 iff i ∈ T_e. By the properties of the sampler Samp, if we choose S_samp uniformly, then with probability 1 − 2^{−N^{0.6}}, the number of control blocks that are good (i.e., have error rate less than p + ε/4) is at least (ε/4 − ε²/100) · n_ctrl > (9/10) · (ε²/4) · n_ctrl = ((1 − 1/10) · μ) · n_ctrl. This means that the error vector e is (p + ε/4, (1 − 1/10) · μ)-good with probability 1 − 2^{−N^{0.6}}, and the claim holds.
We now show that the fraction of errors induced by e to the data part cannot be significantly larger than p.
Thus, with probability 1 − 2^{−N^{0.6}}, the number of errors induced on the control blocks is at least N_ctrl · (p − α/100), which implies that the number of errors induced on the data part is less than N_data · (p + α/100), and the claim follows.
We will now show that permuting the data part of e produces a balanced error vector with high probability. Let s_samp be a sampler seed that is good with respect to the two previous claims; a 1 − 2 · 2^{−N^{0.6}} fraction of sampler seeds satisfy these properties. By Claim 6.13, we can assume that the relative Hamming weight of e^{s_samp}_{data} is at most p + α/100. We will denote e_data = e^{s_samp}_{data} in order to avoid clutter. The lemma will follow from the following claim.
This is because, together, the three claims above give that with probability 1 − 2^{−N^{0.51}} all good events happen, and A_strong(x, y) = 1. In the remainder of this section, we prove Claim 6.14.
Let N′ = N_data/b̃ be the number of length-b̃ blocks. We now define random variables D_1, …, D_{N′} as follows.
D_i = 1 if the i'th block of π(S_π, e_data) has weight more than (p + α/4) · b̃, and D_i = 0 otherwise. Claim 6.14 can now be seen as the claim that the sum of the D_i's is small with high probability. We will use a Chernoff-style bound for limited independence, due to Schmidt, Siegel and Srinivasan (Schmidt et al. 1995), in order to bound the probability of deviation.
We can now use Lemma 6.15 with k = N^{0.55}, δ = 1 and μ = α/10. In order to prove Claim 6.16, we prove the following claim, for which we introduce the following notation: we use e^{s_π} to denote π_{s_π}(e_data); we use e^{s_π}[i] to denote the i'th block of e^{s_π} (where blocks are of length b̃); and we use e^{s_π}[i, j] to denote the j'th bit in the i'th block of e^{s_π}.
Claim 6.17. Let v < N^{0.55}, let i_1, …, i_v ∈ [N′] be distinct blocks, let i ∈ [N′] be an additional block, and let j_1, …, j_k ∈ [b̃]. Let a_1, …, a_v ∈ {0, 1}^{b̃} be strings such that the relative Hamming weight of each a_i is at least p + α/100. Let E = ∩_{m∈[v]} {e^{s_π}[i_m] = a_m}. Then Pr[e^{s_π}[i, j_1] = · · · = e^{s_π}[i, j_k] = 1 | E] ≤ (p + α/50)^k.

Let us first imagine that π is a (0, k)-wise independent permutation. In this case, the denominator Pr[E] is some quantity β ≥ 1/N_data^v ≥ 1/N_data^{N^{0.55}} ≥ 2^{−N^{0.56}}, and the numerator is at most β · (p + α/100)^k. This is because, conditioned on the v values, the fraction of ones that is "still available" in e_data has not increased, and is still at most p + α/100. It follows that the actual quantity is at most

(β · (p + α/100)^k + 2^{−N^{0.6}}) / (β − 2^{−N^{0.6}}) = ((p + α/100)^k + 2^{−N^{0.6}}/β) / (1 − 2^{−N^{0.6}}/β) ≤ (p + α/50)^k,

where the last inequality follows for sufficiently large N because p, α and k ≤ b̃ are constants, while 2^{−N^{0.6}}/β ≤ 2^{−N^{0.6}} · 2^{N^{0.56}} is negligible. We now show that Claim 6.16 follows directly from Claim 6.17, using Lemma 6.15.

Conclusion and open problems
Encoding and decoding that run in fixed polynomial time. The milestone proof approach is an abstraction that enables us to deal with a variety of channel families. However, as stated before, this framework demands that the encoding and decoding are computationally "stronger" than the channel; this drawback is inherent in the construction and proof. It remains unclear whether it is possible to achieve explicit optimal binary stochastic codes when the channel is not computationally inferior to the encoding and decoding functions.
This gives motivation to re-examine our problem under different settings and assumptions. A natural example is the cryptographic setup, which suggests the following problem: construct an explicit stochastic code with optimal rate, such that the encoding and decoding run in a fixed polynomial time (say n³) against any polynomial-size channel, under a cryptographic assumption.

Uniquely decodable codes beyond the GV bound.
The following open problem was stated by Guruswami and Smith, and remains relevant in our modified and improved construction. The codes we design for time bounded channels are list decodable, but not necessarily uniquely decodable. This is inherent to the current analysis, since even a very simple adversary may inject valid control blocks into the codeword, potentially causing the decoder to come up with several seemingly valid control strings.
For p ≥ 1/4, the limitation is inherent to any construction, as Guruswami & Smith (2016) showed an attack that can be carried out by a very simple attacker that yields the lower bound. However, for p < 1/4, it may still be possible to design codes that lead the decoder to a unique, correct codeword with high probability and achieve rate approaching 1 − H(p).
This problem is addressed in a recent subsequent work that is yet unpublished (Shaltiel & Silbak 2020). It is shown that against space bounded channels for every p < 1/4 there is an explicit stochastic code with rate approaching 1 − H(p) that is uniquely decodable.
Codes for (larger) space-bounded online channels. Our result resolves the open problem posed by Guruswami and Smith, as we construct unconditional explicit stochastic codes for space-O(log n) online channels. An intriguing open problem is to extend our codes to handle space larger than O(log n), for example O(√n). This is not possible in our current construction, which "modifies" a PRG for small space into an inner code with pseudorandom properties: since the decoding procedure performs a brute-force search over all seeds of the PRG, we are naturally bound by a logarithmic seed length (and known PRGs for space-s computation require seed length ≥ s).
In a subsequent work, Kopparty, Shaltiel and Silbak (Kopparty et al. 2019) address this problem and give a construction that is able to handle channels with space n^{Ω(1)}, where encoding and decoding run in quasilinear time. This result also builds on the approach of Guruswami & Smith (2016), while utilizing many of the improvements and techniques developed in this paper.

By averaging, for every such m, we have that for a 1 − √γ fraction of i_1 ∈ [n_1], and so performing two steps of list-recovering indeed recovers the original message.
The outer code C_1 can be taken to be a folded Reed-Solomon code with evaluation points restricted to an explicit subspace-evasive set; the reader is referred to Guruswami & Wang (2013) for the polynomial-time encoding and decoding algorithms used for folded Reed-Solomon codes, and to Dvir & Lovett (2012) for an explicit construction of subspace-evasive sets (see, for example, Figure 1 in Hemenway & Wootters (2015)). The suggested code achieves the desirable parameters if ε ≥ O(√γ) and L is a sufficiently large constant determined as a function of ε and L_1. We now turn our attention to the inner code C_2. We will use the probabilistic method to show the existence of a good code, and such a code can later be found by exhaustive search using (L_1 + 1)-wise independence, as we did in Theorem 4.3.

Proof.
We consider a uniformly chosen C_2. For every subset S ⊆ {0, 1}^{k_2} of size L_1 + 1, and every collection T of sets T_1, …, T_{n_2} ⊆ {0, 1}^{n_in} of size L_in, let B_{S,T} be the event that for every x ∈ S, for a 1 − √γ fraction of i ∈ [n_2], Enc_{C_2}(x)_i ∈ T_i.
For this to hold we choose n_in > 2 log(L_in)/√γ and L_1 + 1 ≥ L_in/√γ.
The inner code C_2 is over an alphabet of logarithmic size, and can be found (and decoded) by brute-force search using (L_1 + 1)-wise independence, as explained before.