
Near-Optimal List-Recovery of Linear Code Families

Ray Li, Department of Mathematics and Computer Science, Santa Clara University, CA, USA
Nikhil Shagrithaya, Department of EECS, University of Michigan, Ann Arbor, MI, USA
Abstract

We prove several results on linear codes achieving list-recovery capacity. We show that random linear codes achieve list-recovery capacity with constant output list size (independent of the alphabet size and length). That is, over alphabets of size at least ℓ^{Ω(1/ε)}, random linear codes of rate R are (1−R−ε, ℓ, (ℓ/ε)^{O(ℓ/ε)})-list-recoverable for all R ∈ (0,1) and ℓ. Together with a result of Levi, Mosheiff, and Shagrithaya, this implies that randomly punctured Reed–Solomon codes also achieve list-recovery capacity. We also prove that our output list size is near-optimal among all linear codes: every (1−R−ε, ℓ, L)-list-recoverable linear code must have L ≥ ℓ^{Ω(R/ε)}.

Our simple upper bound combines the Zyablov-Pinsker argument with recent bounds from Kopparty, Ron-Zewi, Saraf, Wootters, and Tamo on the maximum intersection of a “list-recovery ball” and a low-dimensional subspace with large distance. Our lower bound is inspired by a recent lower bound of Chen and Zhang.

Keywords and phrases:
Error-Correcting Codes, Randomness, List-Recovery, Reed-Solomon Codes, Random Linear Codes
Category:
RANDOM
Funding:
Ray Li: Research supported by NSF grant CCF-2347371.
Nikhil Shagrithaya: Research supported in part by NSF grants CCF-2236931 and CCF-2107345.
Copyright and License:
© Ray Li and Nikhil Shagrithaya; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Coding theory; Theory of computation → Pseudorandomness and derandomization; Mathematics of computing → Combinatorics
Related Version:
Full Version: https://arxiv.org/abs/2502.13877
Acknowledgements:
The authors would like to thank Yeyuan Chen and Zihan Zhang for pointing out a mistake in Theorem 12 in an earlier version of the paper.
Editors:
Alina Ene and Eshan Chattopadhyay

1 Introduction

In this work, we study list-recovery for random linear codes and random Reed–Solomon codes, proving near-optimal upper and lower bounds.

An (error correcting) code 𝒞 is a subset of Σ^n for an alphabet Σ, which, in this work, is always 𝔽_q for some prime power q. We study linear codes, which are subspaces of 𝔽_q^n. We want codes to be large, meaning they have large rate R = (log_q|𝒞|)/n. We also want codes to tolerate more errors. In the standard unique-decoding setting, tolerating many errors means that, for any vector z ∈ 𝔽_q^n, there is at most one codeword c ∈ 𝒞 that agrees with z on many coordinates.

We study a generalization of the unique-decoding problem known as list-recovery. In list-recovery, we want that, for any ℓ×ℓ×⋯×ℓ combinatorial rectangle, there are few codewords that “agree” with this rectangle on many coordinates. Formally, a code 𝒞 is (ρ, ℓ, L)-list-recoverable if for any sets S_1,…,S_n ⊆ 𝔽_q of size |S_i| = ℓ, there are at most L codewords c ∈ 𝒞 such that c_i ∈ S_i for at least (1−ρ)n coordinates i. The special case of (ρ, 1, 1)-list-recoverability is the standard unique-decoding setting, and the special case of (ρ, 1, L)-list-recoverability is known as list-decodability.

List-recovery has motivations in coding theory, complexity theory, and algorithms. In coding theory, list-recovery has been used as a tool to obtain efficient list-decoding algorithms [24, 22, 27, 34, 30]. Also, list-recoverable random linear codes – which we study in this work – are used as a building block in other coding constructions [18, 31]. In complexity theory, list-recoverable codes find applications in constructions of other pseudorandom objects such as extractors [48] and condensers [26]. In algorithms, they are also useful primitives in group testing [32, 40], sparse recovery [12], and streaming algorithms [36, 8].

The list-recovery capacity theorem states that ρ = 1−R is the optimal tradeoff between the error radius ρ and the code rate R (see, e.g., [23, 42]). That is, below capacity (ρ < 1−R), there exist (ρ, ℓ, O(1))-list-recoverable codes of rate R, and above capacity (ρ > 1−R), any (ρ, ℓ, L)-list-recoverable code must have exponential list size L ≥ q^{Ω(n)}. The existence holds because uniformly random codes of rate R (over sufficiently large alphabets q ≥ ℓ^{Ω(1/ε)}) are (1−R−ε, ℓ, O(ℓ/ε))-list-recoverable with high probability.

We wish to understand what kinds of codes achieve list-recovery capacity. A number of explicit code constructions are known to achieve list-recovery capacity, including Folded Reed–Solomon codes, Multiplicity codes, and Folded Algebraic–Geometry codes. Additional techniques – subspace evasive sets, subspace designs, and expander techniques – can be used to improve the output list size L and alphabet size q [22, 34, 27, 28, 29, 31, 30, 19, 9, 35, 49] (see Table 1 in [35]; see also [47, 5] for even tighter list size bounds in the special case of list-decoding).

Still, several fundamental questions remain open.

  1. First, how list-recoverable is a random linear code? A random linear code is a random subspace of 𝔽_q^n. All explicit constructions are based on linear codes (though many are only linear over a subfield), so it is natural to wonder about the list-recovery of a “typical” linear code. As list-recovery is a pseudorandom property, this question also addresses the deeper geometric question of “how similar is a random subspace to a random subset of 𝔽_q^n?”, which is well-studied in the more specific context of list-decoding [51, 10, 17, 7, 50, 43, 44, 45, 38, 20, 1].

  2. Second, how list-recoverable are Reed–Solomon codes? The above constructions all generalize the Reed–Solomon code, the most fundamental polynomial evaluation code. Can Reed–Solomon codes themselves achieve list-recovery capacity? Given recent progress showing, in the special case of list-decoding, that Reed–Solomon codes achieve list-decoding capacity [4, 15, 1], this general case of list-recovery has been an obvious and tantalizing open question.

  3. Lastly, is there a fundamental separation between linear and nonlinear codes for list-recovery? On one hand, there is no apparent separation for the special case of list-decoding, where random linear codes are list-decodable to capacity with list size O(1/ε) [17, 50, 38, 20], just like uniformly random codes. On the other hand, uniformly random codes are list-recoverable with list size O(ℓ/ε), but all known linear constructions require output list size at least ℓ^{Ω(1/ε)}, and this lower bound has been proven in various specific settings [20, 37, 5].

We answer all three questions. We show that random linear codes are list-recoverable to capacity with provably near-optimal output list size. By a recent result of [37], this implies that randomly punctured Reed–Solomon codes are list-recoverable to capacity with near-optimal output list size. Lastly, we prove a fundamental separation between linear and non-linear codes by showing that all linear codes of rate R must have list size at least L ≥ ℓ^{Ω(R/ε)}.

1.1 Our results

We now state our results in the context of prior work.

Table 1: List-recovery of Random Linear Codes.

| Citation  | Radius ρ | Input list size | Output list size     |
| [51, 16]  | 1−R−ε    | ℓ               | q^{O(ℓ/ε)}            |
| [45]      | 1−R−ε    | ℓ               | q^{O(log²(ℓ/ε))}      |
| This work | 1−R−ε    | ℓ               | (ℓ/ε)^{O(ℓ/ε)}        |

List recovery for Random Linear Codes

Several known arguments show that random linear codes achieve list-recovery capacity. A random linear code is a code generated by a uniformly random generator matrix 𝐆 ∈ 𝔽_q^{n×k}. First, the Zyablov–Pinsker argument [51] adapted to list-recovery shows that random linear codes of rate R over alphabets of size q ≥ ℓ^{Ω(1/ε)} are (1−R−ε, ℓ, q^{O(ℓ/ε)})-list-recoverable (see, for example, [16, Lemma 9.6]). Rudra and Wootters [45] improved the output list size to q^{O(log²(ℓ/ε))}, showing that random linear codes of rate R over alphabets of size q ≥ ℓ^{Ω(1/ε)} are (1−R−ε, ℓ, q^{O(log²(ℓ/ε))})-list-recoverable. We improve the output list size to be independent of the alphabet size q.

Theorem 1 (Theorem 8, Informal).

For all R, ε ∈ (0,1), ℓ, and q ≥ ℓ^{Ω(1/ε)}, a random linear code of rate R is (1−R−ε, ℓ, (ℓ/ε)^{O(ℓ/ε)})-list-recoverable with high probability.

Our list size improves on the prior bounds when q ≥ Ω(ℓ/ε), which covers most alphabet sizes (q ≥ ℓ^{Ω(1/ε)} is needed to achieve list-recovery capacity), and the improvement is more significant when q is larger. This improvement to an alphabet-independent output list size is critical for Theorem 2 below (see Remark 3). As we show in Theorem 5, this output list size is near-optimal among all linear codes.

Our proof is simple, combining the Zyablov–Pinsker [51] argument with recent analyses of the list-recovery of explicit constructions like Folded Reed–Solomon codes. In particular, the Zyablov–Pinsker argument [51] shows that a random linear code can be list-recovered so that, with high probability, the output list always lies in a subspace of dimension at most O(ℓ/ε). Naively, this implies an output list size bound of q^{O(ℓ/ε)}. However, recent analyses of list-recovering explicit codes [35, 49] showed that subspaces of dimension D with good distance – random linear codes are well known to have good distance with high probability – can have at most (ℓ/ε)^{O(D)} points inside an ℓ-list-recovery ball, thus giving our improved output list size. We also show that we get the best possible output list size for our proof technique, in the sense that, for any linear code, there are output lists that span a subspace of dimension at least Ω(ℓ/ε) (see Proposition 13).
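To make the parameter bookkeeping concrete, here is a sketch of how the two ingredients combine, with the specific constants used later in Section 3 (Lemma 9 and Lemma 10); it is only an instantiation of the argument just described.

```latex
% If every output list of the random linear code lies in a subspace of dimension
% r = 2\ell/\varepsilon (Lemma 10), and the code has relative distance
% \delta \ge 1 - R - \varepsilon/2, so that Lemma 9 applies with radius
% \delta - \varepsilon/2 \ge 1 - R - \varepsilon, then
L \;\le\; \left(\tfrac{2\ell}{\varepsilon}\right)^{r}
  \;=\; \left(\tfrac{2\ell}{\varepsilon}\right)^{2\ell/\varepsilon}
  \;=\; \left(\tfrac{\ell}{\varepsilon}\right)^{O(\ell/\varepsilon)}.
```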

We believe that the simplicity of our proof is a strong indicator that we have found the right way to approach the problem, which had previously resisted various other proof techniques.

List recovery for Random Reed-Solomon Codes

Reed–Solomon codes [41] are the most fundamental evaluation codes. A Reed–Solomon code is given by n evaluation points α_1, α_2, …, α_n in a finite field 𝔽_q and a degree bound k < n, and is defined as

RS_{n,k}(α_1,…,α_n) := { (f(α_1),…,f(α_n)) : f ∈ 𝔽_q[x], deg f < k }.
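As a concrete illustration (ours, not from the paper), the following minimal Python sketch produces Reed–Solomon codewords over a prime field 𝔽_p by polynomial evaluation; the prime p, the degree bound, and the evaluation points are arbitrary example choices.

```python
# Minimal Reed-Solomon encoder over a prime field F_p (illustrative sketch only).
# The parameters p, k, and the evaluation points are arbitrary example choices.

def rs_encode(coeffs, points, p):
    """Evaluate the polynomial with the given coefficients (degree < k) at every
    evaluation point, producing one RS codeword."""
    def poly_eval(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % p
        return acc
    return [poly_eval(a) for a in points]

p = 101                      # a prime, so Z/pZ is the field F_p
points = list(range(1, 11))  # n = 10 distinct evaluation points
k = 3                        # polynomials of degree < 3, so rate R = k/n = 0.3
codeword = rs_encode([5, 2, 7], points, p)   # f(x) = 5 + 2x + 7x^2
print(codeword)
```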

List-decoding and list-recovery of Reed–Solomon codes are well-studied questions. The seminal Guruswami–Sudan [24] algorithm showed that Reed–Solomon codes are list-decodable and list-recoverable up to the Johnson radius 1 − √(ℓR) [33, 25]. Since then, there has been much interest in determining whether Reed–Solomon codes are list-decodable and list-recoverable beyond the Johnson bound, and perhaps even up to capacity ρ = 1−R (the capacity is 1−R for both list-decoding and list-recovery). Initially, there was evidence against this possibility [21, 6, 2], suggesting that Reed–Solomon codes could not be list-decoded or list-recovered much beyond the Johnson bound. Since then, an exciting line of work has shown, to the contrary, that Reed–Solomon codes can beat the Johnson bound for list-decoding [43, 46, 11, 13, 14, 4, 15, 1], and, in fact, can be list-decoded up to capacity [4, 15, 1]. All of these works studied randomly punctured Reed–Solomon codes, where α_1,…,α_n are chosen at random from a larger field 𝔽_q.

Despite the exciting progress for list-decoding, there has been comparatively little progress on list-recovery. Lund and Potukuchi [39] and Guo, Li, Shangguan, Tamo, and Wootters [14] proved that (randomly punctured) Reed–Solomon codes are list-recoverable beyond the Johnson bound in the low-rate regime: [39] shows (ρ, ℓ, L)-list-recovery for ρ = 1 − 1/√2, L = O(ℓ), and rate Ω(1/(ℓ log q)), and [14] shows (Ω(ε/log(1/ε)), ℓ, O(ℓ/ε))-list-recovery for rate 1−ε Reed–Solomon codes. Both improve on the Johnson radius in the low-rate setting.

In [37], Levi, Mosheiff and Shagrithaya showed that random Reed–Solomon codes and random linear codes are locally equivalent, meaning that both random code families achieve identical rate thresholds for all “local properties”, which include (the complements of) list-decoding and list-recovery. Thus, our result for list-recovery of random linear codes transfers to random Reed–Solomon codes as well.

Theorem 2 (Corollary 11, Informal).

For all R, ε ∈ (0,1) and ℓ, a randomly punctured Reed–Solomon code of length n and rate R over an alphabet of size q = n·(ℓ/ε)^{(ℓ/ε)^{O(ℓ/ε)}} is (1−R−ε, ℓ, (ℓ/ε)^{O(ℓ/ε)})-list-recoverable with high probability.

We note that the problem of determining optimal list sizes for random Reed–Solomon codes across all rates R(0,1) has proven resistant to a variety of previous approaches. The simplicity of our proof suggests that the method presented here may offer a promising direction for completely resolving this question.

 Remark 3.

We note that in order to use the equivalence result from [37], it is crucial that the upper bound on the list size be independent of the alphabet size, as guaranteed by Theorem 1. Hence, previous results on list size cannot be used with the equivalence result.

 Remark 4.

A fruitful line of work [22, 34, 27, 9, 35, 49] has culminated in output list sizes of (ℓ/ε)^{O(log(ℓ)/ε)} for various explicit list-recoverable codes such as Folded Reed–Solomon codes and Multiplicity codes. This list size is better than our list size of (ℓ/ε)^{O(ℓ/ε)} by roughly a factor of ℓ/log(ℓ) in the exponent. However, our results are still interesting because, as described above, the list-recovery of random linear codes and Reed–Solomon codes are fundamental questions, and also because our results yield linear codes for list-recovery and use much smaller alphabet sizes.

Lower bounds for list-recovery

We now discuss impossibility results for list-recovery. An early impossibility result of Guruswami and Rudra [21] showed that, in the setting of zero-error list-recovery (ρ = 0), many full-length Reed–Solomon codes of rate R require R ≤ 1/ℓ in order to have poly(n) output list size, so many full-length Reed–Solomon codes cannot be list-recovered beyond the Johnson bound – note this does not contradict Theorem 2, as we consider randomly punctured, as opposed to full-length (n = q), codes. More recently, it was shown that achieving list-recovery capacity requires exponential list size ℓ^{Ω(1/ε)} for particular codes: random linear codes in the high-rate zero-error (ρ = 0) regime [20], random linear codes in general parameter settings [37], and Reed–Solomon codes, Folded Reed–Solomon codes, and Multiplicity codes in general parameter settings [5].

Inspired by the lower bound in [5], we show that any linear code list-recoverable to capacity must have output list size at least ℓ^{Ω(R/ε)}.

Theorem 5 (Theorem 12, Informal).

Over any field, any linear code of rate R that is (1−R−ε, ℓ, L)-list-recoverable must satisfy L ≥ ℓ^{Ω(R/ε)}.

One takeaway from Theorem 5 is that our list sizes of (ℓ/ε)^{O(ℓ/ε)} in Theorem 1 and Theorem 2 are near-optimal. Additionally, Doron and Wootters [8] asked whether there were explicit list-recoverable codes with, among other desired guarantees, output list size L = O(ℓ). Our result shows this is not possible with any linear code. Lastly, our lower bound shows a separation between non-linear and linear codes for list-recovery, which is perhaps surprising given that no such separation exists for list-decoding.

We point out that, for list-decoding (ℓ = 1), our lower bound is trivial (L ≥ 1), so it does not contradict the recent results that random linear codes, randomly punctured Reed–Solomon codes, and randomly punctured Algebraic–Geometry codes achieve list-decoding capacity with output list size O(1/ε) [4, 15, 1, 3].

2 Preliminaries

For a prime power q, let 𝔽_q be the finite field of order q. Let [n] denote the set {1,…,n}. For a given vector space V, let ℒ(V) denote the set of all subspaces of V. For a given set S, let 2^S denote the power set of S. For a vector v, let v[i] denote its i-th entry.

A code 𝒞 ⊆ 𝔽_q^n is said to be linear if it is a linear subspace, and said to have rate R ∈ (0,1) if R = dim(𝒞)/n. We say 𝒞 has relative distance δ ∈ (0,1) if every nonzero c ∈ 𝒞 satisfies wt(c) ≥ δn, where wt(c) denotes the number of non-zero entries of the codeword c. A matrix 𝐆 ∈ 𝔽_q^{n×Rn} with linearly independent columns is said to be a generator matrix of 𝒞 if every codeword c ∈ 𝒞 can be written as a linear combination of the columns of 𝐆. 𝒞 is said to contain a set of vectors s_1,…,s_b ∈ 𝔽_q^n if s_i ∈ 𝒞 for every i ∈ [b].

For a vector x ∈ 𝔽_q^n and sets S_1,…,S_n ⊆ 𝔽_q, the agreement set agr(x, S_1,…,S_n) is defined as:

agr(x, S_1,…,S_n) := { i ∈ [n] : x[i] ∈ S_i }.

A ρ-radius ℓ-list-recovery ball B(ρ, S_1×⋯×S_n) is given by input lists S_1,…,S_n ⊆ 𝔽_q of size ℓ, and is defined to be

B(ρ, S_1×⋯×S_n) = { x ∈ 𝔽_q^n : |agr(x, S_1,…,S_n)| ≥ (1−ρ)n }.   (1)

We can alternatively define (ρ, ℓ, L)-list-recovery using the above definition: a code 𝒞 ⊆ 𝔽_q^n is (ρ, ℓ, L)-list-recoverable if every ρ-radius ℓ-list-recovery ball B contains at most L codewords.
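As a concrete (and deliberately tiny) illustration of these definitions, here is a Python sketch (our own, not from the paper) of the agreement set, ball membership, and a brute-force check of (ρ, ℓ, L)-list-recoverability; the toy repetition code and parameters are arbitrary, and the check is far too slow for realistic parameters.

```python
from fractions import Fraction
from itertools import combinations, product

def agr(x, lists):
    """Agreement set: coordinates i where x[i] lies in the input list S_i."""
    return {i for i, (xi, Si) in enumerate(zip(x, lists)) if xi in Si}

def in_ball(x, rho, lists):
    """Membership in the rho-radius ell-list-recovery ball B(rho, S_1 x ... x S_n)."""
    n = len(x)
    return len(agr(x, lists)) >= (1 - rho) * n

def is_list_recoverable(code, rho, ell, L, q):
    """Brute force: every choice of size-ell input lists captures at most L codewords."""
    n = len(next(iter(code)))
    for lists in product(combinations(range(q), ell), repeat=n):
        if sum(in_ball(c, rho, lists) for c in code) > L:
            return False
    return True

# Toy example over F_3: the length-3 repetition code, checked with exact arithmetic.
code = [(a, a, a) for a in range(3)]
print(is_list_recoverable(code, rho=Fraction(1, 3), ell=1, L=1, q=3))
```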

For 0 < R < 1, a random linear code (RLC) of rate R is a linear code whose generator matrix 𝐆 ∈ 𝔽_q^{n×Rn} has entries chosen uniformly at random from 𝔽_q, independently of one another. For α_1,…,α_n ∈ 𝔽_q, we use 𝖱𝖲(α_1,…,α_n; Rn) to denote the Reed–Solomon (RS) code of rate R obtained by evaluating polynomials of degree < Rn on the evaluation points α_1,…,α_n ∈ 𝔽_q. We say this is a random RS code if the evaluation points have been chosen uniformly at random and independently of one another. (This is different from the usual model for random RS codes, where the random evaluation points are required to be distinct; however, it can be shown that both models behave similarly; refer to [37], Appendix A, for details.)
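For illustration only (our own sketch, with arbitrary parameters and a prime q so that arithmetic mod q is field arithmetic), sampling an RLC generator matrix and encoding a message looks as follows:

```python
import random

def sample_rlc_generator(n, k, q, seed=0):
    """Sample a uniformly random generator matrix G in F_q^{n x k}; the code is {Gx}."""
    rng = random.Random(seed)
    return [[rng.randrange(q) for _ in range(k)] for _ in range(n)]

def encode(G, msg, q):
    """Encode a message x in F_q^k as the codeword Gx."""
    return [sum(gij * xj for gij, xj in zip(row, msg)) % q for row in G]

q, n, R = 11, 12, 0.5          # illustrative parameters, not from the paper
k = int(R * n)
G = sample_rlc_generator(n, k, q)
print(encode(G, [1] * k, q))   # one codeword of the sampled random linear code
```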

2.1 Local Coordinate-Wise Linear (LCL) Properties

We now introduce the machinery from [37] that connects random linear codes to (randomly punctured) Reed–Solomon codes. A code property 𝒫_n for codes of block length n in 𝔽_q^n is simply a family of codes in 𝔽_q^n. We say that a code 𝒞_n ⊆ 𝔽_q^n satisfies 𝒫_n if 𝒞_n ∈ 𝒫_n. Denoting 𝒫 = {𝒫_n}_n, we say that an infinite family of codes 𝒞 = {𝒞_n}_n satisfies 𝒫 if 𝒞_n ∈ 𝒫_n for every n. In this paper, we focus on local, monotone-increasing code properties. A local code property, informally speaking, is defined by the inclusion of some bad set. A monotone-increasing code property is one for which the following is true: if 𝒞 satisfies 𝒫, then every 𝒞′ with 𝒞 ⊆ 𝒞′ also satisfies 𝒫. An example of a local, monotone-increasing code property is the complement of (ρ, L)-list-decodability, defined as the family of all codes that contain at least one set of L+1 distinct vectors, all lying within a Hamming ball of radius ρ.

For a locality parameter b, an ordered tuple of subspaces 𝒱 = (𝒱_1,…,𝒱_n), where 𝒱_i ∈ ℒ(𝔽_q^b) for each i ∈ [n], is defined to be a b-local profile. Note that 𝒱 ∈ ℒ(𝔽_q^b)^n. We say that a matrix A ∈ 𝔽_q^{n×b} is contained in 𝒱 if the i-th row of A belongs to 𝒱_i, for all i. A code 𝒞 ⊆ 𝔽_q^n is said to contain 𝒱 if

  (a) there exists a matrix A ∈ 𝔽_q^{n×b} with distinct columns such that the set of columns of A is contained in 𝒞, and

  (b) A is contained in 𝒱.

For a family of b-local profiles ℳ = {ℳ_n}_n, where ℳ_n ⊆ ℒ(𝔽_q^b)^n, we define a b-local coordinate-wise linear (b-LCL) property 𝒫 = {𝒫_n}_n as follows:

𝒫_n = { 𝒞 ⊆ 𝔽_q^n : ∃ 𝒱 ∈ ℳ_n such that 𝒞 contains 𝒱 }.

The complement of (ρ, ℓ, L)-list-recoverability is an (L+1)-LCL property. This is proven in [37, Proposition 2.2], but we provide a justification in this paragraph. Every bad set of vectors lying within a given ρ-radius ℓ-list-recovery ball agrees with some input lists S_1,…,S_n ⊆ 𝔽_q on many coordinates. This implies that the vectors agree with one another on many coordinates as well, and once we arrange the bad vector sets as columns of a matrix of dimension n×(L+1), we can specify these agreements as linear constraints on the rows of such matrices. Formally, the property is defined by a family ℳ_𝒫 = {ℳ_n}_n of (L+1)-local profiles that we now describe. For every n, we define ℳ_n by describing the (L+1)-local profiles 𝒱 ∈ ℒ(𝔽_q^{L+1})^n that constitute it. Let S_{(1−ρ)} ⊆ (2^{[n]})^{L+1} denote the collection of all (L+1)-length tuples in which each element is a subset of [n] of size exactly (1−ρ)n. Furthermore, let M_{[ℓ]} := [ℓ]^{n×(L+1)} denote the set of all matrices of dimension n×(L+1) with entries in [ℓ]. (Even though S_{(1−ρ)} and M_{[ℓ]} depend on n, we suppress this dependence in the notation for the sake of clarity.) Then, for every s = (s_1,…,s_{L+1}) ∈ S_{(1−ρ)} and every M ∈ M_{[ℓ]}, define 𝒱(s,M) = (𝒱_1,…,𝒱_n) such that for every i ∈ [n],

𝒱_i := { r ∈ 𝔽_q^{L+1} : ∀ j,k ∈ [L+1], r[j] = r[k] whenever (i ∈ s_j) ∧ (i ∈ s_k) ∧ (M[i,j] = M[i,k]) }.

Note that each 𝒱_i is a subspace of 𝔽_q^{L+1}, and therefore 𝒱(s,M) is a valid linear profile. We can now define the associated family of linear profiles for the complement of (ρ, ℓ, L)-list-recoverability:

ℳ_n := { 𝒱 ∈ ℒ(𝔽_q^{L+1})^n : ∃ s ∈ S_{(1−ρ)}, M ∈ M_{[ℓ]} such that 𝒱 = 𝒱(s,M) }.

Observe that

|ℳ_𝒫| ≤ |S_{(1−ρ)}| · |M_{[ℓ]}| ≤ binom(n, ρn)^{L+1} · ℓ^{(L+1)n}.   (2)
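As an illustration of the coordinate-wise subspaces 𝒱_i (our own sketch, with 0-based indices and hand-picked s and M, unlike the 1-based notation above), the following Python snippet groups the columns constrained at a coordinate i into equality classes and computes dim(𝒱_i):

```python
from collections import defaultdict

def profile_subspace_classes(i, s, M, L):
    """For coordinate i, group the columns j with i in s_j by the list index M[i][j].
    Within each group the constraint r[j] = r[k] is enforced, so V_i is the subspace
    of F_q^{L+1} in which coordinates of the same group are equal."""
    groups = defaultdict(list)
    for j in range(L + 1):
        if i in s[j]:
            groups[M[i][j]].append(j)
    return list(groups.values())

def profile_subspace_dim(i, s, M, L):
    """dim(V_i) = (L+1) minus one for every extra member of each equality class."""
    classes = profile_subspace_classes(i, s, M, L)
    return (L + 1) - sum(len(c) - 1 for c in classes)

# Tiny hypothetical example: n = 4 coordinates, L + 1 = 3 columns, entries of M in [ell] = {1, 2}.
L = 2
s = [{0, 1, 2}, {0, 1, 3}, {1, 2, 3}]             # agreement sets s_1, ..., s_{L+1}
M = [[1, 1, 2], [2, 2, 2], [1, 2, 1], [2, 1, 1]]  # M[i][j]
print([profile_subspace_dim(i, s, M, L) for i in range(4)])   # dimensions of V_1, ..., V_4
```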

In the same work, the authors also prove a threshold theorem for random linear codes (RLCs) in relation to all LCL properties, and moreover give a complete characterization of the rate threshold. Informally, the theorem says that RLCs over a sufficiently large alphabet exhibit a sharp threshold phenomenon for all LCL properties. That is, for every LCL property 𝒫, there exists a rate threshold R_𝒫 such that RLCs of rate R_𝒫 + ε satisfy 𝒫 with high probability, and RLCs of rate R_𝒫 − ε do not satisfy 𝒫 with high probability.

Theorem 6 ([37], Theorem 3.1).

Let 𝒫 be a b-LCL property of codes in 𝔽_q^n and let ℳ ⊆ ℒ(𝔽_q^b)^n be a corresponding family of profiles. Let 𝒞 ⊆ 𝔽_q^n be an RLC of rate R. Then, there is some threshold rate R_𝒫 for which the following holds.

  1. If R ≥ R_𝒫 + ε then Pr[𝒞 satisfies 𝒫] ≥ 1 − q^{−εn + b²}.

  2. If R ≤ R_𝒫 − ε then Pr[𝒞 satisfies 𝒫] ≤ |ℳ| · q^{−εn + b²}.

  3. In particular, if R ≤ R_𝒫 − ε and q ≥ 2^{2 log₂|ℳ| / (εn)} then Pr[𝒞 satisfies 𝒫] ≤ q^{−εn/2 + b²}.

The concept of LCL properties allows for “transfer type” theorems between random linear codes and random RS codes. In more detail, for every reasonable LCL property (that is, for every LCL property whose corresponding family of profiles is large), the rate thresholds for random linear codes and random RS codes are the same. That is, any rate threshold proved for LCL properties of RLCs also applies for random RS codes, and vice versa. For our purposes, we only require one part of this result, which we formally state below.

Theorem 7 ([37], Theorem 3.10 (part 1) (Threshold theorem for RS codes)).

Let 𝒫 be a b-LCL property of codes in 𝔽_q^n, with associated local profile family ℳ_𝒫 ⊆ ℒ(𝔽_q^b)^n and (random linear code) threshold rate R_𝒫. Let 0 < R < 1 and let 𝒞 = 𝖱𝖲_{𝔽_q}(α_1,…,α_n; Rn), where α_1,…,α_n are sampled independently and uniformly from 𝔽_q. Assume that q > Rnb. Fix ε with εn ≥ 2b(b+1). If R ≤ R_𝒫 − ε, then

Pr[𝒞 satisfies 𝒫] ≤ (2^b − 1) · ((4b)^{4b}·Rn / (εq))^{εn/(2b)} · |ℳ_𝒫|.   (3)

3 List-recovery of Random Linear Codes

In this section, we prove Theorem 1, that random linear codes achieve list-recovery capacity with constant output list size. Formally, we show the following.

Theorem 8.

Fix 0 < R < 1, ε > 0 so that 1−R−ε > 0, and ℓ, and let q be a prime power such that q ≥ max(ℓ^{8R/ε + 6}, ℓ·2^{4/ε}). Let 𝒞 ⊆ 𝔽_q^n be an RLC of rate R. Then with probability at least 1 − 2q^{−εn/8}, 𝒞 is (1−R−ε, ℓ, L)-list-recoverable with L satisfying L ≤ (2ℓ/ε)^{2ℓ/ε}.

The theorem follows as a consequence of two lemmas. We first state both lemmas, and then give the proof of Theorem 8 using them. The first lemma essentially states that any low dimensional subspace with good distance has few points in a list-recovery ball. This lemma appears in [35, 49]; we state the version from [49, Lemma 3.1].

Lemma 9 ([49], see also [35]).

For ε > 0 and ℓ, let 𝒞 ⊆ 𝔽_q^n be a linear code with relative distance δ > ε/2 that is (δ − ε/2, ℓ, L)-list-recoverable. Assume further that any output list is contained in a subspace V ⊆ 𝒞 of dimension r. Then the output list size satisfies L ≤ (2ℓ/ε)^r.

The second lemma uses the Zyablov–Pinsker argument [51], showing that a random linear code does not have too many linearly independent codewords within a list-recovery ball.

Lemma 10.

Fix 0 < R < 1, ε > 0 so that 1−R−ε > 0, and ℓ, and let q be a prime power such that q ≥ max(ℓ^{8R/ε + 6}, ℓ·2^{4/ε}). Let 𝒞 ⊆ 𝔽_q^n be an RLC of rate R. Then with probability at least 1 − q^{−εnL/8}, where L := 2ℓ/ε, the following holds: for all input lists S_1,…,S_n of size ℓ, the maximal linearly independent subset of 𝒞 within the (1−R−ε)-radius ℓ-list-recovery ball B(1−R−ε, S_1×⋯×S_n) has size less than 2ℓ/ε.

Proof of Lemma 10.

Denote ρ := 1−R−ε and L := 2ℓ/ε. We also assume q is a multiple of ℓ for simplicity of exposition, and note that the result holds in the general case as well. We show that an RLC “avoids” all bad configurations with high probability. A bad configuration is a set V of linearly independent vectors of size L such that there exist input lists S_1,…,S_n of size ℓ with V ⊆ B(ρ, S_1×⋯×S_n). We say that 𝒞 contains a bad configuration V if every v ∈ V is also in 𝒞. If this condition is not satisfied, then we say that 𝒞 does not contain V. It is easy to see that if 𝒞 contains no bad configurations, then the maximal linearly independent subset of 𝒞 within any ℓ-list-recovery ball of radius ρ has size less than 2ℓ/ε. Therefore we show that the probability of 𝒞 containing a bad configuration is low.

Fix input lists S_1,…,S_n of size ℓ, and let B := B(ρ, S_1×⋯×S_n) be the corresponding ρ-radius ℓ-list-recovery ball. The size of B is at most binom(n, ρn)·ℓ^{(1−ρ)n}·q^{ρn}. The probability that a particular configuration is bad equals the probability that the encodings of some L linearly independent messages all lie inside B simultaneously, which is (|B|/q^n)^L. By a union bound over at most q^{ℓn} possible input lists and all L-sized linearly independent subsets of the message vectors in 𝔽_q^{Rn} (there are at most q^{RnL} such subsets), we have

Pr_𝒞[𝒞 contains a bad configuration]
  ≤ ( binom(n, ρn) · ℓ^{(1−ρ)n} · q^{ρn} / q^n )^L · q^{ℓn} · q^{RnL}
  = ( (ℓ/q)^n · binom(n, ρn) · (q/ℓ)^{ρn} )^L · q^{ℓn} · q^{RnL}
  ≤ ( (ℓ/q)^n · (q/ℓ)^{H_{q/ℓ}(ρ)·n} )^L · q^{ℓn} · q^{RnL}
  = ( (q/ℓ)^{−(1 − H_{q/ℓ}(ρ))·n} )^L · q^{ℓn} · q^{RnL}
  ≤ ( (q/ℓ)^{−(R + 3ε/4)·n} )^L · q^{ℓn} · q^{RnL}   (4)
  = ℓ^{(R + 3ε/4)·nL} · q^{−(R + 3ε/4)·nL} · q^{εnL/2} · q^{RnL}
  = ℓ^{(R + 3ε/4)·nL} · q^{−εnL/4}
  ≤ q^{−εnL/8}.

In Equation 4, we used q ≥ ℓ·2^{4/ε} (so that 1 − H_{q/ℓ}(ρ) ≥ R + 3ε/4), and for the last inequality, we used q ≥ ℓ^{8R/ε + 6}. This implies that the probability that 𝒞 does not contain any bad configuration is at least 1 − q^{−εnL/8}.
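As a quick numerical sanity check of the exponent bookkeeping above (our own illustration, with arbitrary sample parameters satisfying the alphabet-size assumption), one can compare the base-q logarithm of the first union-bound expression against −εnL/8:

```python
import math

def log_q_binom(n, k, q):
    """log_q of the binomial coefficient C(n, k), computed via lgamma."""
    ln_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_binom / math.log(q)

# Arbitrary illustrative parameters with q >= max(ell^(8R/eps + 6), ell * 2^(4/eps)).
R, eps, ell, n = 0.5, 0.1, 2, 1000
L = round(2 * ell / eps)            # L = 40
q = ell ** round(8 * R / eps + 6)   # q = 2^46, which also exceeds ell * 2^(4/eps) = 2^41
rho = 1 - R - eps

# log_q of (binom(n, rho n) * ell^((1-rho)n) * q^(rho n) / q^n)^L * q^(ell n) * q^(R n L):
log_q_bound = (L * (log_q_binom(n, round(rho * n), q)
                    + (1 - rho) * n * math.log(ell, q)
                    + rho * n - n)
               + ell * n + R * n * L)
print(log_q_bound, -eps * n * L / 8)   # the bound's exponent sits well below -eps*n*L/8
```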

We now prove Theorem 8.

Proof of Theorem 8.

Denote ρ := 1−R−ε. Denote by E₁ the event that, for an RLC 𝒞 of rate R, the maximal linearly independent subset of 𝒞 within every (1−R−ε)-radius ℓ-list-recovery ball has size less than 2ℓ/ε. By Lemma 10, we know that E₁ happens with probability at least 1 − q^{−εnL/8}. Let E₂ denote the event that a rate-R RLC 𝒞 has relative distance at least 1−R−ε/2. By the Gilbert–Varshamov bound (see [23], Section 4.2), and because q ≥ ℓ·2^{4/ε} ≥ 2^{4/ε}, E₂ happens with probability at least 1 − q^{−εn/2}. Therefore we have

Pr_𝒞[E₁ ∧ E₂] ≥ 1 − q^{−εnL/8} − q^{−εn/2} ≥ 1 − 2q^{−εn/8}.

When E₁ and E₂ occur simultaneously, the assumptions of Lemma 9 are satisfied by 𝒞 with r = 2ℓ/ε and δ = 1−R−ε/2, and therefore we see that

Pr_𝒞[ for every B, |𝒞 ∩ B| ≤ (2ℓ/ε)^{2ℓ/ε} ] ≥ 1 − 2q^{−εn/8},

where B ranges over all (1−R−ε)-radius ℓ-list-recovery balls, and we are done.

4 List-Recovery of Reed–Solomon codes

In this section, we prove the following result, which says that random Reed–Solomon codes are list-recoverable to capacity with constant output list size. The proof combines Theorem 8 from the previous section with the threshold and transfer theorems from [37] (Theorems 6 and 7).

Corollary 11.

Fix 0 < R < 1, ε > 0 so that 1−R−ε > 0, and ℓ. Fix a constant ε′ > 0 such that ε′ < R, and denote L := (2ℓ/ε)^{2ℓ/ε}. Let η > 0 be a constant, and let q be a prime power satisfying q > (4(L+1))^{4(L+1)} · (Rn/ε′) · 2^{((log₂ℓ + 2)(L+1) + η)·2(L+1)/ε′}. Then, a random RS code of rate R − ε′ in 𝔽_q^n is (1−R−ε, ℓ, L)-list-recoverable with probability at least 1 − 2^{−ηn}.

Proof of Corollary 11.

Denote L := (2ℓ/ε)^{2ℓ/ε} and b := L+1. Let 𝒫 be the b-LCL property of not being (1−R−ε, ℓ, L)-list-recoverable, and let R_𝒫 be the corresponding (random linear code) threshold rate. By Theorem 6, part 1 [37], we know that if 𝒞 is an RLC of rate R, then the following holds for every constant ε′ > 0:

Pr[𝒞 satisfies 𝒫] < 1 − q^{−ε′n + b²}  ⟹  R < R_𝒫 + ε′.

According to Theorem 8, a rate-R RLC (over a sufficiently large alphabet) satisfies 𝒫 with probability at most 2q^{−εn/8} < 1 − q^{−ε′n + b²}. Therefore, R < R_𝒫 + ε′ for every ε′ > 0, and so R ≤ R_𝒫.

We will now work with random RS codes having rate slightly less than R. Define R′ := R − ε′ ≤ R_𝒫 − ε′, and take n to be large enough so that ε′n ≥ 2b(b+1). Note that q > Rnb > R′nb. Define 𝒞 = 𝖱𝖲_{𝔽_q}(α_1,…,α_n; R′n), where α_1,…,α_n are sampled independently and uniformly from 𝔽_q. Denoting by ℳ_𝒫 the local profile family associated with the property 𝒫, we see that the hypothesis of Theorem 7 [37] is satisfied, and therefore Equation 3 holds.

Recall that we calculated an upper bound for |ℳ_𝒫| in Equation 2, and so |ℳ_𝒫| ≤ binom(n, (1−R−ε)n)^b · ℓ^{bn}. Substituting this bound on |ℳ_𝒫| into Equation 3,

Pr[𝒞 satisfies 𝒫] ≤ (2^b − 1) · ((4b)^{4b}·Rn/(ε′q))^{ε′n/(2b)} · |ℳ_𝒫|
  ≤ (2^b − 1) · ((4b)^{4b}·Rn/(ε′q))^{ε′n/(2b)} · binom(n, (1−R−ε)n)^b · ℓ^{bn}
  ≤ ((4(L+1))^{4(L+1)}·Rn/(ε′q))^{ε′n/(2(L+1))} · (ℓ·2^{H₂(1−R−ε)+1})^{(L+1)n}.

Because q > (4(L+1))^{4(L+1)} · (Rn/ε′) · 2^{((log₂ℓ + 2)(L+1) + η)·2(L+1)/ε′}, we see that Pr[𝒞 satisfies 𝒫] ≤ 2^{−ηn}. Thus, 𝒞 is (1−R−ε, ℓ, L)-list-recoverable with probability at least 1 − 2^{−ηn}.
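To get a feel for the parameters (our own illustration, plugging arbitrary sample values into the list-size and alphabet-size expressions as stated in Corollary 11), note that the n-dependence of the required q is linear while the constant factor is doubly exponential in ℓ/ε:

```python
import math

def output_list_size(ell, eps):
    """L = (2*ell/eps)^(2*ell/eps), the output list size from Theorem 8 / Corollary 11."""
    x = 2 * ell / eps
    return x ** x

def required_alphabet_log2(R, eps, ell, eps_prime, eta, n):
    """log2 of the alphabet-size lower bound in Corollary 11 (as stated above)."""
    L = output_list_size(ell, eps)
    b = L + 1
    return (4 * b * math.log2(4 * b)
            + math.log2(R * n / eps_prime)
            + ((math.log2(ell) + 2) * b + eta) * 2 * b / eps_prime)

# Arbitrary illustrative parameters.
print(output_list_size(ell=2, eps=0.5))                  # L = 8^8 = 16,777,216
print(required_alphabet_log2(R=0.4, eps=0.5, ell=2, eps_prime=0.1, eta=1, n=10**6))
```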

5 Any linear code needs output list size ℓ^{Ω(R/ε)}

We now prove our lower bound for list-recovery: any linear code list-recoverable to capacity needs output list size ℓ^{Ω(R/ε)}.

Theorem 12.

Let R, ε ∈ (0,1), let ℓ be a positive integer, and let n ≥ n₀(ℓ, R, ε) be sufficiently large. Let 𝒞 ⊆ 𝔽^n be a linear code of rate R. If 𝒞 is (1−R−ε, ℓ, L)-list-recoverable, then L > ℓ^{R/ε}.

Proof.

Let k := Rn be the dimension of the code. Let k′ := ⌈(ε/R)·k⌉. Let m := ⌊(k−1)/(k′+1)⌋. By Gaussian elimination and by permuting rows and columns, we may, without loss of generality, write the generator matrix of 𝒞 as

𝐆 = [   I_k   ]
    [ ⋆ ⋆ ⋯ ⋆ ]        (5)

where I_k is the k×k identity matrix and each ⋆ is a length-(n−k) column vector. For i ∈ [k], let v_i ∈ 𝔽^n denote the i-th column of 𝐆. By rank–nullity, there exist vectors w_0,…,w_{m−1} ∈ 𝔽^n such that w_i is a nontrivial linear combination of v_{i(k′+1)+1},…,v_{(i+1)(k′+1)} and w_i is not supported on the indices k+1,…,k+k′ (there are k′+1 vectors and only k′ indices). Now let w_m := v_k. Restricted to the indices in [k+k′], the vectors w_0,…,w_m have pairwise disjoint supports: within the indices [k+k′], for i = 0,…,m−1, the vector w_i is supported on {i(k′+1)+1, …, (i+1)(k′+1)} ⊆ [k−1], and the vector w_m is supported on {k, k+1, …, k+k′}.

Now fix arbitrary distinct values β_1,…,β_ℓ ∈ 𝔽. Consider the output list ℒ := { ∑_{i=0}^{m} β_{r_i}·w_i : r_0,…,r_m ∈ [ℓ] }, consisting of all linear combinations of the w_i with coefficients from {β_1,…,β_ℓ}. The fact that the vectors w_0,…,w_m have pairwise disjoint supports on [k+k′] implies (i) the vectors w_0,…,w_m are linearly independent, so all ℓ^{m+1} vectors in ℒ are distinct, and (ii) the vectors in ℒ can take on at most ℓ values at any index in [k+k′]. Therefore, we can choose input lists S_1,…,S_{k+k′}, each of size ℓ, such that every codeword in ℒ agrees with all of S_1,…,S_{k+k′}.

Choosing the rest of the input lists arbitrarily, we see that if this code is (ρ, ℓ, L)-list-recoverable with radius ρ = (n−k−k′)/n ≤ 1−R−ε, then the list size satisfies L ≥ ℓ^{m+1} > ℓ^{R/ε}. (Here m+1 = ⌊(k−1)/(k′+1)⌋ + 1 ≥ (k + (ε/R)·k)/((ε/R)·k + 2) > R/ε, where we used that k > 2R²/ε².)
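To illustrate the counting in this proof (with hypothetical disjointly supported vectors picked by hand, not extracted from an actual generator matrix), the following sketch enumerates the list ℒ over a small prime field and checks that all ℓ^{m+1} combinations fit inside one ℓ-list-recovery rectangle on the supported coordinates:

```python
from itertools import product

p = 13                 # work over the prime field F_p
betas = [1, 2, 3]      # ell = 3 distinct field elements used as coefficients

# Hypothetical w_0, ..., w_m with pairwise disjoint supports on the first k + k' coordinates
# (in the proof these come from the generator matrix; here they are hand-picked).
ws = [
    (1, 4, 0, 0, 0, 0),
    (0, 0, 2, 0, 0, 0),
    (0, 0, 0, 5, 7, 0),
]

# The output list: all combinations sum_i beta_{r_i} * w_i with coefficients from betas.
combos = {tuple(sum(b * w[j] for b, w in zip(choice, ws)) % p for j in range(len(ws[0])))
          for choice in product(betas, repeat=len(ws))}
print(len(combos))              # ell^(m+1) = 3^3 = 27 distinct vectors

# Coordinate-wise, the list takes at most ell distinct values, so input lists of size ell
# (padded arbitrarily where fewer values occur) capture every element of the list.
lists = [sorted({v[j] for v in combos}) for j in range(len(ws[0]))]
print([len(S) for S in lists])  # every entry is at most ell = 3
```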

We also show that our Zyablov–Pinsker-type argument in Theorem 1 (Lemma 10) is tight, in the sense that any linear code must have Ω(ℓ/ε) linearly independent codewords in a list-recovery ball.

Proposition 13.

Let R, ε ∈ (0,1), let ℓ be a positive integer, and let n ≥ n₀(ℓ, R, ε) be sufficiently large. Let 𝒞 be a linear code of rate R. Then there exists a (1−R−ε)-radius ℓ-list-recovery ball B that contains at least (1−R)·ℓ/ε − 1 linearly independent elements of 𝒞.

Proof.

Upon writing the generator matrix of 𝒞 in the same form as in the proof of Theorem 12, consider the first m := ⌈(1−R)·ℓ/ε⌉ − 1 < (1−R)·ℓ/ε columns of the generator matrix. Denote these linearly independent column vectors by v_1,…,v_m. Create input lists S_1,…,S_k, each of size ℓ, that contain 0 and 1 but are otherwise arbitrary. Then create lists S_{k+1},…,S_n of size ℓ, each containing elements that are distributed evenly among v_1,…,v_m, so that each v_i agrees with an equal share of these lists. Thus, each of v_1,…,v_m agrees with S_1,…,S_n on the first k coordinates, and on at least (ℓ/m)·(n−k) > εn of the remaining coordinates. Therefore, these vectors lie inside the (1−R−ε)-radius ℓ-list-recovery ball B(1−R−ε, S_1×⋯×S_n), as desired.

6 Concluding remarks

We showed that random linear codes and Reed–Solomon codes are list-recoverable to capacity with near-optimal output-list size. Several open questions remain.

  1. What is the optimal output-list size for random linear codes and Reed–Solomon codes? There is a gap between our upper bound of (ℓ/ε)^{O(ℓ/ε)} and the lower bound of ℓ^{Ω(1/ε)}. We surmise that the correct answer is closer to the lower bound.

  2. As asked by Doron and Wootters [8], are there explicit list-recoverable codes with output list size L = O_ε(ℓ) (and, even better, over alphabet size q = poly(ℓ))? We showed (Theorem 5) that any such code must be nonlinear.

  3. Our alphabet size for list-recovering Reed–Solomon codes (Theorem 2) is optimal in that it is linear in n, but the constant is double-exponential in ℓ/ε. By contrast, for list-decoding, the best known alphabet size for achieving capacity has an exponential-type constant, 2^{poly(1/ε)}·n [1]. Can our alphabet size be improved?

References

  • [1] Omar Alrabiah, Venkatesan Guruswami, and Ray Li. Randomly punctured reed-solomon codes achieve list-decoding capacity over linear-sized fields. In Bojan Mohar, Igor Shinkar, and Ryan O’Donnell, editors, Proceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, Vancouver, BC, Canada, June 24-28, 2024, pages 1458–1469. ACM, 2024. doi:10.1145/3618260.3649634.
  • [2] Eli Ben-Sasson, Swastik Kopparty, and Jaikumar Radhakrishnan. Subspace Polynomials and Limits to List Decoding of Reed-Solomon Codes. IEEE Trans. Inform. Theory, 56(1):113–120, January 2010. doi:10.1109/TIT.2009.2034780.
  • [3] Joshua Brakensiek, Manik Dhar, Sivakanth Gopi, and Zihan Zhang. AG codes achieve list decoding capacity over constant-sized fields. CoRR, abs/2310.12898, 2023. doi:10.48550/arXiv.2310.12898.
  • [4] Joshua Brakensiek, Sivakanth Gopi, and Visu Makam. Generic reed-solomon codes achieve list-decoding capacity. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1488–1501. ACM, 2023. doi:10.1145/3564246.3585128.
  • [5] Yeyuan Chen and Zihan Zhang. Explicit folded reed-solomon and multiplicity codes achieve relaxed generalized singleton bounds. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 1–12. ACM, 2025. doi:10.1145/3717823.3718114.
  • [6] Qi Cheng and Daqing Wan. On the List and Bounded Distance Decodability of Reed-Solomon Codes. SIAM J. Comput., 37(1):195–209, April 2007. Place: Philadelphia, PA, USA Publisher: Society for Industrial and Applied Mathematics. doi:10.1137/S0097539705447335.
  • [7] Mahdi Cheraghchi, Venkatesan Guruswami, and Ameya Velingker. Restricted isometry of Fourier matrices and list decodability of random linear codes. SIAM J. Comput., 42(5):1888–1914, 2013. doi:10.1137/120896773.
  • [8] Dean Doron and Mary Wootters. High-Probability List-Recovery, and Applications to Heavy Hitters. In Electron. Colloquium Comput. Complex., volume 27, page 162, 2020.
  • [9] Zeev Dvir and Shachar Lovett. Subspace evasive sets. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 351–358, 2012. doi:10.1145/2213977.2214010.
  • [10] Peter Elias. Error-correcting codes for list decoding. IEEE Transactions on Information Theory, 37(1):5–12, 1991. Publisher: IEEE. doi:10.1109/18.61123.
  • [11] Asaf Ferber, Matthew Kwan, and Lisa Sauermann. List-decodability with large radius for Reed-Solomon codes. IEEE Transactions on Information Theory, 68(6):3823–3828, 2022. Publisher: IEEE. doi:10.1109/TIT.2022.3148779.
  • [12] Anna C Gilbert, Hung Q Ngo, Ely Porat, Atri Rudra, and Martin J Strauss. l2/l2-foreach sparse recovery with low risk. In International Colloquium on Automata, Languages, and Programming, pages 461–472. Springer, 2013. doi:10.1007/978-3-642-39206-1_39.
  • [13] Eitan Goldberg, Chong Shangguan, and Itzhak Tamo. Singleton-type bounds for list-decoding and list-recovery, and related results. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 2565–2570. IEEE, 2022. doi:10.1109/ISIT50566.2022.9834849.
  • [14] Zeyu Guo, Ray Li, Chong Shangguan, Itzhak Tamo, and Mary Wootters. Improved list-decodability and list-recoverability of Reed-Solomon codes via tree packings. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 708–719. IEEE, 2022.
  • [15] Zeyu Guo and Zihan Zhang. Randomly punctured reed-solomon codes achieve the list decoding capacity over polynomial-size alphabets. In 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023, Santa Cruz, CA, USA, November 6-9, 2023, pages 164–176. IEEE, 2023. doi:10.1109/FOCS57990.2023.00019.
  • [16] Venkatesan Guruswami. List Decoding of Error-Correcting Codes (Winning Thesis of the 2002 ACM Doctoral Dissertation Competition), volume 3282 of Lecture Notes in Computer Science. Springer, 2004. doi:10.1007/B104335.
  • [17] Venkatesan Guruswami, Johan Håstad, and Swastik Kopparty. On the List-Decodability of Random Linear Codes. IEEE Trans. Inform. Theory, 57(2):718–725, February 2011. doi:10.1109/TIT.2010.2095170.
  • [18] Venkatesan Guruswami and Piotr Indyk. Expander-based constructions of efficiently decodable codes. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 658–667. IEEE, 2001. doi:10.1109/SFCS.2001.959942.
  • [19] Venkatesan Guruswami and Swastik Kopparty. Explicit subspace designs. Combinatorica, 36(2):161–185, 2016. Publisher: Springer. doi:10.1007/S00493-014-3169-1.
  • [20] Venkatesan Guruswami, Ray Li, Jonathan Mosheiff, Nicolas Resch, Shashwat Silas, and Mary Wootters. Bounds for list-decoding and list-recovery of random linear codes. IEEE Transactions on Information Theory, 68(2):923–939, 2021. Publisher: IEEE. doi:10.1109/TIT.2021.3127126.
  • [21] Venkatesan Guruswami and Atri Rudra. Limits to List Decoding Reed–Solomon Codes. IEEE Trans. Inform. Theory, 52(8):3642–3649, August 2006. Place: Piscataway, NJ, USA Publisher: IEEE Press. doi:10.1109/TIT.2006.878164.
  • [22] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Transactions on Information Theory, 54(1):135–150, 2008. Publisher: IEEE. doi:10.1109/TIT.2007.911222.
  • [23] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory, 2022. Draft available at https://cse.buffalo.edu/faculty/atri/courses/coding-theory/book/prev-ver/web-coding-book-Jan31-2022.pdf.
  • [24] Venkatesan Guruswami and Madhu Sudan. Improved Decoding of Reed–Solomon and Algebraic-Geometry Codes. IEEE Transactions on Information Theory, 45(6):1757–1767, 1999. doi:10.1109/18.782097.
  • [25] Venkatesan Guruswami and Madhu Sudan. Extensions to the Johnson bound. Manuscript, February, 2001.
  • [26] Venkatesan Guruswami, Christopher Umans, and Salil P. Vadhan. Unbalanced expanders and randomness extractors from parvaresh-vardy codes. J. ACM, 56(4):20:1–20:34, 2009. doi:10.1145/1538902.1538904.
  • [27] Venkatesan Guruswami and Carol Wang. Linear-algebraic list decoding for variants of Reed–Solomon codes. IEEE Transactions on Information Theory, 59(6):3257–3268, 2013. Publisher: IEEE. doi:10.1109/TIT.2013.2246813.
  • [28] Venkatesan Guruswami and Chaoping Xing. Folded codes from function field towers and improved optimal rate list decoding. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 339–350, 2012. doi:10.1145/2213977.2214009.
  • [29] Venkatesan Guruswami and Chaoping Xing. List decoding Reed-Solomon, Algebraic-Geometric, and Gabidulin subcodes up to the Singleton bound. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 843–852, 2013. doi:10.1145/2488608.2488715.
  • [30] Brett Hemenway, Noga Ron-Zewi, and Mary Wootters. Local list recovery of high-rate tensor codes and applications. SIAM Journal on Computing, 49(4):FOCS17–157, 2019. Publisher: SIAM. doi:10.1137/17M116149X.
  • [31] Brett Hemenway and Mary Wootters. Linear-time list recovery of high-rate expander codes. Information and Computation, 261:202–218, 2018. Publisher: Elsevier. doi:10.1016/J.IC.2018.02.004.
  • [32] Piotr Indyk, Hung Q Ngo, and Atri Rudra. Efficiently decodable non-adaptive group testing. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1126–1142. SIAM, 2010. doi:10.1137/1.9781611973075.91.
  • [33] Selmer Johnson. A new upper bound for error-correcting codes. IRE Transactions on Information Theory, 8(3):203–207, 1962. Publisher: IEEE. doi:10.1109/TIT.1962.1057714.
  • [34] Swastik Kopparty. List-decoding multiplicity codes. Theory of Computing, 11(1):149–182, 2015. Publisher: Theory of Computing Exchange. doi:10.4086/TOC.2015.V011A005.
  • [35] Swastik Kopparty, Noga Ron-Zewi, Shubhangi Saraf, and Mary Wootters. Improved decoding of folded Reed-Solomon and multiplicity codes. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 212–223. IEEE, 2018.
  • [36] Kasper Green Larsen, Jelani Nelson, Huy L Nguyen, and Mikkel Thorup. Heavy hitters via cluster-preserving clustering. Communications of the ACM, 62(8):95–100, 2019. doi:10.1145/3339185.
  • [37] Matan Levi, Jonathan Mosheiff, and Nikhil Shagrithaya. Random Reed-Solomon Codes and Random Linear Codes are Locally Equivalent, November 2024. arXiv:2406.02238. doi:10.48550/arXiv.2406.02238.
  • [38] Ray Li and Mary Wootters. Improved list-decodability of random linear binary codes. IEEE Transactions on Information Theory, 67(3):1522–1536, 2020. Publisher: IEEE. doi:10.1109/TIT.2020.3041650.
  • [39] Ben Lund and Aditya Potukuchi. On the List Recoverability of Randomly Punctured Codes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020), volume 176, pages 30:1–30:11, 2020. doi:10.4230/LIPIcs.APPROX/RANDOM.2020.30.
  • [40] Hung Q Ngo, Ely Porat, and Atri Rudra. Efficiently decodable error-correcting list disjunct matrices and applications. In Automata, Languages and Programming: 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part I 38, pages 557–568. Springer, 2011.
  • [41] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960. doi:10.1137/0108018.
  • [42] Nicolas Resch. List-decodable codes: (randomized) constructions and applications, 2020.
  • [43] Atri Rudra and Mary Wootters. Every list-decodable code for high noise has abundant near-optimal rate puncturings. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 764–773. ACM, 2014. doi:10.1145/2591796.2591797.
  • [44] Atri Rudra and Mary Wootters. It’ll probably work out: Improved list-decoding through random operations. In Tim Roughgarden, editor, Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS 2015, Rehovot, Israel, January 11-13, 2015, pages 287–296. ACM, 2015. doi:10.1145/2688073.2688092.
  • [45] Atri Rudra and Mary Wootters. Average-radius list-recoverability of random linear codes. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 644–662. SIAM, 2018. doi:10.1137/1.9781611975031.42.
  • [46] Chong Shangguan and Itzhak Tamo. Combinatorial list-decoding of Reed-Solomon codes beyond the Johnson radius. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing, STOC 2020, pages 538–551, 2020. doi:10.1145/3357713.3384295.
  • [47] Shashank Srivastava. Improved list size for folded reed-solomon codes. In Yossi Azar and Debmalya Panigrahi, editors, Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025, New Orleans, LA, USA, January 12-15, 2025, pages 2040–2050. SIAM, 2025. doi:10.1137/1.9781611978322.64.
  • [48] Amnon Ta-Shma and David Zuckerman. Extractor codes. IEEE Trans. Inf. Theory, 50(12):3015–3025, 2004. doi:10.1109/TIT.2004.838377.
  • [49] Itzhak Tamo. Tighter List-Size Bounds for List-Decoding and Recovery of Folded Reed-Solomon and Multiplicity Codes, December 2023. arXiv:2312.17097. doi:10.48550/arXiv.2312.17097.
  • [50] Mary Wootters. On the List Decodability of Random Linear Codes with Large Error Rates. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC ’13, pages 853–860, New York, NY, USA, 2013. ACM. event-place: Palo Alto, California, USA. doi:10.1145/2488608.2488716.
  • [51] Victor Vasilievich Zyablov and Mark Semenovich Pinsker. List concatenated decoding. Problemy Peredachi Informatsii, 17(4):29–33, 1981. Russian Academy of Sciences.