
Tight Bounds on List-Decodable and List-Recoverable Zero-Rate Codes

Nicolas Resch, Informatics Institute, University of Amsterdam, The Netherlands; Chen Yuan, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China; Yihan Zhang, Institute of Science and Technology Austria, Klosterneuburg, Austria
Abstract

In this work, we consider the list-decodability and list-recoverability of codes in the zero-rate regime. Briefly, a code 𝒞 ⊆ [q]^n is (p,ℓ,L)-list-recoverable if for all tuples of input lists (Y₁,…,Y_n) with each Y_i ⊆ [q] and |Y_i| = ℓ, the number of codewords 𝒄 ∈ 𝒞 such that c_i ∉ Y_i for at most pn choices of i ∈ [n] is less than L; list-decoding is the special case of ℓ = 1. In recent work by Resch, Yuan and Zhang (ICALP 2023) the zero-rate threshold for list-recovery was determined for all parameters: that is, the work explicitly computes p∗ := p∗(q,ℓ,L) with the property that for all ε > 0 (a) there exist positive-rate (p∗−ε,ℓ,L)-list-recoverable codes, and (b) any (p∗+ε,ℓ,L)-list-recoverable code has rate 0. In fact, in the latter case the code has constant size, independent of n. However, the constant size in their work is quite large in 1/ε, at least (1/ε)^{O(qL)}.

Our contribution in this work is to show that for all choices of q, ℓ and L with q ≥ 3, any (p∗+ε,ℓ,L)-list-recoverable code must have size O_{q,ℓ,L}(1/ε), and furthermore this upper bound is complemented by a matching lower bound Ω_{q,ℓ,L}(1/ε). This greatly generalizes work by Alon, Bukh and Polyanskiy (IEEE Trans. Inf. Theory 2018), which focused only on the case of a binary alphabet (and thus necessarily only on list-decoding). We remark that we can in fact recover the same result for q = 2 and even L, as obtained by Alon, Bukh and Polyanskiy: we thus strictly generalize their work.

Our main technical contribution is (a) to properly define a linear programming relaxation of the list-recovery condition over large alphabets, and (b) to demonstrate that a certain function defined on the q-ary probability simplex is maximized by the uniform distribution. This represents the core challenge in generalizing to larger q (as a binary simplex can be naturally identified with a one-dimensional interval). We can subsequently reuse certain Schur convexity and convexity properties established for a related function by Resch, Yuan and Zhang, along with ideas of Alon, Bukh and Polyanskiy.

Keywords and phrases:
List Decoding, List Recovery, Zero Rate
Copyright and License:
© Nicolas Resch, Chen Yuan, and Yihan Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Coding theory
Related Version:
Full Version: https://arxiv.org/abs/2309.01800
Funding:
The research of C. Yuan was supported in part by the National Key R&D Program of China under Grant 2023YFE0123900 and the Natural Science Foundation of Shanghai under the 2024 Shanghai Action Plan for Science, Technology and Innovation Grant 24BC3200700. The research of N. Resch is supported in part by an NWO (Dutch Research Council) grant with number C.2324.0590, and this work was done in part while he was visiting the Simons Institute for the Theory of Computing, supported by DOE grant #DE-SC0024124.
Editors:
Raghu Meka

1 Introduction

Given an error-correcting code 𝒞 ⊆ [q]^n, a fundamental requirement is that the codewords are sufficiently well-spread in order to guarantee some non-trivial correctability properties. This is typically enforced by requiring that the minimum distance of the code, d = min{d_H(𝒄,𝒄′) : 𝒄 ≠ 𝒄′ ∈ 𝒞}, is large, where d_H(⋅,⋅) denotes the Hamming distance (i.e., the number of coordinates on which two strings differ). Note that minimum distance d is equivalent to the following “packing” property: if we put a ball of radius r := ⌊(d−1)/2⌋ around any point 𝒚 ∈ [q]^n – i.e., we consider the Hamming ball ℬ_H(𝒚,r) := {𝒙 ∈ [q]^n : d_H(𝒙,𝒚) ≤ r} – then all these balls contain at most 1 codeword from 𝒞.

This latter viewpoint can easily be generalized to obtain list-decodability, where we now require that such Hamming balls do not capture “too many” codewords. That is, for p ∈ [0,1] and L ∈ ℕ, a code is called (p,L)-list-decodable if every Hamming ball of radius pn contains less than L codewords from 𝒞. In notation: for all 𝒚 ∈ [q]^n, |ℬ_H(𝒚,pn) ∩ 𝒞| ≤ L−1. (Typically the upper bound is parametrized as L, rather than L−1; however, for “impossibility” arguments this parametrization is more common, as it leads to less cumbersome computations.) This notion was already introduced in the 1950s by Elias and Wozencraft [10, 30, 11] but has in the past 20 years seen quite a bit of attention due to its connections to other parts of theoretical computer science [13, 2, 21, 19, 18, 26].

One can push this generalization further to obtain list-recoverability. Here, we consider a tuple of input lists 𝒀 = (Y₁,…,Y_n), where each Y_i ⊆ [q] has size at most ℓ (for some ℓ ∈ ℕ). The requirement is that the number of codewords that “disagree” with 𝒀 in at most pn coordinates is at most L−1. More formally, if for all such 𝒀 = (Y₁,…,Y_n) the number of codewords 𝒄 ∈ 𝒞 such that |{i ∈ [n] : c_i ∉ Y_i}| ≤ pn is at most L−1, the code is called (p,ℓ,L)-list-recoverable. Note that (p,L)-list-decodability is nothing other than (p,1,L)-list-recoverability. Initially, list-recoverability was abstracted as a useful stepping stone towards list-decoding concatenated codes. However, in recent years this notion has found many connections to other parts of computer science, e.g., in cryptography [15, 16], randomness extraction [14], hardness amplification [8], group testing [17, 23], streaming algorithms [9], and beyond.
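The definition can be checked directly by brute force. The following sketch is our own (the function name and interface are not from the paper), and it is only feasible for very small parameters, since it enumerates all tuples of input lists:

```python
from itertools import combinations, product

def is_list_recoverable(code, q, p, ell, L):
    """Return True iff every tuple of input lists Y_1, ..., Y_n with
    |Y_i| = ell captures at most L - 1 codewords, where a codeword c is
    captured when c_i lies outside Y_i in at most p*n coordinates."""
    n = len(code[0])
    alphabet = range(1, q + 1)
    for Y in product(combinations(alphabet, ell), repeat=n):
        captured = sum(
            1 for c in code
            if sum(c[i] not in Y[i] for i in range(n)) <= p * n
        )
        if captured >= L:
            return False
    return True

# Toy example: the length-4 repetition code over [3] is (0, 1, 2)-list-
# recoverable, since each tuple of singleton lists captures at most one codeword.
code = [(x,) * 4 for x in (1, 2, 3)]
print(is_list_recoverable(code, q=3, p=0.0, ell=1, L=2))  # True
```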

Rate versus noise-resilience.

Having fixed a desired “error-tolerance” as determined by the parameters p, ℓ and L, we would also like the code 𝒞 to be as large as possible: intuitively, this implies that the code contains the minimal amount of redundancy possible. A fundamental question in coding theory is to understand the achievable tradeoffs between the rate R := log_q |𝒞| / n and some “error-resilience” property of the code, e.g., minimum distance, list-decodability, or list-recoverability.

This question in full generality is wide open. Even the special case of q = 2 and L = 2 (i.e., determining the optimal tradeoff between rate and distance for binary codes) is unclear: on the possibility side we have the Gilbert-Varshamov bound [12, 28] showing that R ≥ 1 − H₂(2p) is achievable (here, H₂(x) = −x log₂ x − (1−x) log₂(1−x) is the binary entropy function), while bounds of Elias and Bassalygo [3] and the linear programming bound [29, 22, 7] give incomparable and non-tight upper bounds. None of these bounds have been substantially improved in at least 40 years. The situation is even more complicated for larger q: for q = 49 (and larger prime powers) the celebrated algebraic geometry codes of Tsfasman, Vlăduț and Zink [27] provide explicit codes of higher rate in certain regimes than those promised by the Gilbert-Varshamov bound.

When one relaxes the question to allow an asymptotically growing list size L, then we do have a satisfactory answer: it is provided by the list-decoding/-recovery capacity theorem, which states that for all ε > 0 there exist (p, ℓ, O(1/ε))-list-recoverable codes of rate 1 − H_{q,ℓ}(p) − ε, where

H_{q,ℓ}(x) := x·log_q((q−ℓ)/x) + (1−x)·log_q(ℓ/(1−x))

is the (q,ℓ)-ary entropy function [24]. (Note that setting ℓ = 1 recovers the standard q-ary entropy function, which itself reduces to the binary entropy function upon setting q = 2.) On the other hand, any code of rate R ≥ 1 − H_{q,ℓ}(p) + ε fails to be (p,ℓ,L)-list-recoverable unless L ≥ q^{Ω(εn)}. However, this does not provide very meaningful bounds if one is interested in, say, (p,2,5)-list-recoverable codes.
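For reference, the entropy function above is easy to evaluate numerically. The following helper is our own sketch, following the formula above with the usual convention 0·log(…) = 0 at the endpoints (it assumes 1 ≤ ℓ < q):

```python
from math import log

def H(q, ell, p):
    """(q, ell)-ary entropy: p*log_q((q - ell)/p) + (1 - p)*log_q(ell/(1 - p))."""
    val = 0.0
    if p > 0:
        val += p * log((q - ell) / p, q)
    if p < 1:
        val += (1 - p) * log(ell / (1 - p), q)
    return val

# Sanity checks: ell = 1 recovers the q-ary entropy, and q = 2, ell = 1
# recovers the binary entropy function H_2.
print(H(2, 1, 0.11))      # ~0.4999, so the rate bound 1 - H is ~1/2
print(1 - H(4, 2, 0.25))  # rate bound for q = 4, ell = 2 at p = 1/4
```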

Positive versus zero-rate regimes.

Thus far, we have implicitly been discussing the positive-rate regime. However, one can also ask questions about the behaviour of codes in the zero-rate regime. For context, recent work by Resch, Yuan and Zhang [25] computed the zero-rate threshold for list-recovery: that is, for all alphabet sizes q ≥ 2, input list sizes ℓ and output list sizes L, they determine the value p∗(q,ℓ,L) such that (a) for all p < p∗(q,ℓ,L) there exist infinite families of positive-rate (p,ℓ,L)-list-recoverable codes over the alphabet [q], and (b) for all p > p∗(q,ℓ,L) there does not exist such an infinite family.

Having now delineated the “positive-rate” and “zero-rate” regimes depending on how p compares to p∗(q,ℓ,L), in this work we study the zero-rate regime for list-recoverable codes for all alphabet sizes q. In [25], it is shown that (p,ℓ,L)-list-recoverable codes 𝒞 ⊆ [q]^n with p = p∗(q,ℓ,L) + ε have constant size (that is, independent of the block length n); however, this constant is massive in the parameters due to the use of a Ramsey-theoretic bound. In particular, the dependence on ε is at least (1/ε)^{2qL}, and this is additionally multiplied by a tower of 2’s of height roughly L.

To the best of our knowledge, prior work on this question focuses exclusively on the q = 2 case. For example, in the case of L = 2 (i.e., unique decoding) we have p∗(2,1,2) = 1/4, and work by Levenshtein [20] shows that a code construction based on Hadamard matrices corrects a 1/4 + ε fraction of errors and has size 1/(4ε) + O(1). A particularly relevant prior work is due to Alon, Bukh and Polyanskiy [1]. Therein the authors consider this question for the special case of q = 2 (and thus, necessarily, only for list-decoding). In particular, they show that when L is even, if p = p∗(2,1,L) + ε then any such (p,L)-list-decodable code 𝒞 ⊆ [2]^n has size at most O_L(1/ε), and moreover they provide a construction of such a code with size Ω_L(1/ε). (For the special case of q = 2, the zero-rate threshold for list-decoding had already been established by Blinovsky [4].) They observe some interesting behaviour in the case of odd L; in particular, the maximum size of a (p∗(2,1,3)+ε, 3)-list-decodable code is Θ(1/ε^{3/2}). (This argument in fact reveals a flaw in an earlier proof of Blinovsky claiming that such codes have size O_L(1/ε) for all L.)

Our motivations for this investigation are three-fold. Firstly, the zero-rate regime offers combinatorial challenges and interesting behaviours that we uncover in this work. Secondly, many codes that find applications in other areas of theoretical computer science in fact have subconstant rate. Lastly, the zero-rate regime appears much more tractable than the positive-rate regime – indeed, we can obtain tight upper and lower bounds on the size of a code, as we will soon see. It would be interesting to determine to what extent such techniques could be useful for understanding the positive-rate regime as well.

1.1 Our results

Our main result in this work is a tight bound on the size of a (p,ℓ,L)-list-recoverable code over an alphabet of size q ≥ 3 when p > p∗(q,ℓ,L). The main technical challenge is to establish the following upper bound on the size of such a code.

Theorem 1.

Let q, ℓ, L ∈ ℕ with q ≥ 3, ℓ < q and L > ℓ be fixed constants. Let ε > 0 and put p = p∗(q,ℓ,L) + ε. Suppose 𝒞 ⊆ [q]^n is (p,ℓ,L)-list-recoverable. Then |𝒞| ≤ O_{q,ℓ,L}(1/ε).

We complement the above negative result with the following code construction, showing the upper bound is tight.

Theorem 2.

Let q, ℓ, L ∈ ℕ with q ≥ 3 and ℓ < q be fixed constants. Let ε > 0 and put p = p∗(q,ℓ,L) + ε. There exists a (p,ℓ,L)-list-recoverable code 𝒞 ⊆ [q]^n such that |𝒞| ≥ Ω_{q,ℓ,L}(1/ε).

We emphasize that in the above theorems the implied constants may depend on q, ℓ and L.

Note that our results explicitly exclude the case of q = 2. As [1] proves, the binary alphabet behaves in subtle ways: the bound on the code size depends on the parity of L. Intriguingly, our work demonstrates that such behaviour does not arise over larger alphabets.

1.2 Technical Overview

The double-counting argument.

Since our focus is on zero-rate list-decodable/-recoverable codes, it helps to first review the proof of the zero-rate threshold p∗(q,ℓ,L). A lower bound can be easily obtained by a random construction that attains a positive rate for any p ≤ p∗(q,ℓ,L) − ε. For the upper bound, let us first consider the list-decoding case, i.e., ℓ = 1. The proof in [5, 6, 25], at a high level, proceeds via a double-counting argument. (A characterization of p∗(q,1,L) was announced in [5, 6], but the proof therein was flawed; the work [25] filled in the gaps and characterized p∗(q,ℓ,L) for general ℓ.) For any (p,L)-list-decodable code 𝒞 ⊆ [q]^n, the proof aims to upper and lower bound the radius of a list averaged over the choice of the list from 𝒞:

(1/M^L) ∑_{(𝒄₁,…,𝒄_L)∈𝒞^L} rad_H(𝒄₁,…,𝒄_L),   (1)

where M = |𝒞|.

Comparing the two bounds produces an upper bound on |𝒞|. Here rad_H(⋅), known as the Chebyshev radius of a list, is the relative radius of the smallest Hamming ball containing all codewords in the list. A lower bound on Equation 1 essentially follows from the list-decodability of 𝒞. Indeed, each term (corresponding to lists consisting of distinct codewords) is lower bounded by p, as otherwise a list that fits into a ball of radius at most pn is found, violating the list-decodability of 𝒞. Therefore Equation 1 is at least p − o(1), where the o(1) accounts for lists with not-all-distinct codewords.

On the other hand, it is much trickier to upper bound Equation 1 since, in general, rad_H admits no closed analytic form and can only be computed by solving a min-max problem. The previous proof [25] first extracts a subcode 𝒞′ ⊆ 𝒞 with highly regular list structure via the hypergraph Ramsey theorem. This allows one to assert that all lists have essentially the same radius and that all codewords in each list have essentially the same distance to the center of the list. As a result, the min-max expression is “linearized” and Equation 1 can be upper bounded when restricted to 𝒞′. The downside is that the Ramsey reduction step is rather lossy in terms of code size.

Weighted average radius.

The effect of the Ramsey reduction, put formally, is to enforce the average radius:

rad̄_H(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈{0,1}^n} (1/L) ∑_{i=1}^L d_H(𝒄_i,𝒓)   (2)

of every list in the subcode to be approximately equal. To extract the regularity structures in lists without resorting to extremal bounds from Ramsey theory, [1] introduced the notion of weighted average radius which “linearizes” the Chebyshev radius in a weighted manner:

rad̄_ω(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈{0,1}^n} ∑_{i=1}^L ω(i)·d_H(𝒄_i,𝒓)

where ω is a distribution on L elements. For any weighting ω, the values rad̄_ω of lists from the code form a suite of succinct statistics of the list distribution. It turns out that, on average, rad̄_{U_L} = rad̄ (where U_L denotes the uniform distribution on [L]) is maximal among all ω. Recall that the double-counting argument suggests that in an optimal zero-rate code, the behaviour of the ensemble average of rad is essentially captured by that of rad̄. In particular, list-decodability ensures that rad̄ of most lists should be large. However, not too many lists in an optimal code are expected to have large rad̄_ω for any ω ≠ U_L. [1] then managed to quantify the gap between rad̄ = rad̄_{U_L} and rad̄_ω (with ω ≠ U_L), which yields an improved (and sometimes optimal) size-radius trade-off for zero-rate codes.

Generalization to 𝒒-ary list-decoding.

Our major technical contribution is in extrapolating the above ideas to list-recovery. The challenge lies particularly in defining a proper notion of weighted average radius and proving its properties. Our definition relies crucially on an embedding φ from [q] to the simplex in ℝ^q and relaxes the center 𝒓 of the list to be a fractional vector. Specifically, denoting by Δ = {𝒂 ∈ ℝ_{≥0}^q : ∑_{i=1}^q a_i = 1} the simplex in ℝ^q and by ∂Δ = {𝒆₁,…,𝒆_q} its set of vertices (i.e., the standard basis of ℝ^q), we let the embedding φ map each symbol x ∈ [q] to the standard basis vector 𝒆_x ∈ ∂Δ. Denoting by 𝒙₁,…,𝒙_L ∈ (∂Δ)^n the (element-wise) images of a list 𝒄₁,…,𝒄_L ∈ [q]^n, we define the weighted average radius of 𝒙₁,…,𝒙_L as:

rad̄_ω(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} ½·𝔼_{i∼ω}[‖𝒙_i − 𝒚‖₁],   (3)

where ω is any distribution on [L].

The nontriviality and significance of the above notion, especially of the embedding used therein, is three-fold.

  • First, as the weighting ω varies, rad̄_ω serves as a bridge between the standard average radius in Equation 2 and the Chebyshev radius. Indeed, ω = U_L recovers the former, and the maximum of rad̄_ω over ω recovers the latter. However, we caution that the second statement does not hold without the embedding, since the Hamming distance between q-ary symbols per se cannot be interpolated by a convex function, which makes the minimax theorem inapplicable. Fortunately, our embedding affinely extends the q-ary Hamming distance to the simplex, and therefore brings back the applicability of the minimax theorem and connects max_ω rad̄_ω to rad.

  • Second, our definition in Equation 3 allows 𝒚 to take any value in the simplex, instead of only its vertices, i.e., the image of [q] under φ. Though embedding naively into the hypercube [0,1]^q may seem convenient, upon solving the expression with a fractional 𝒚 one does not necessarily obtain a notion that is guaranteed to closely approximate the original version with integral 𝒚. In contrast, using linear programming duality, we show that our embedding yields a relaxed notion of radius which closely approximates the actual Chebyshev radius. Indeed, upon rounding the fractional center 𝒚 and taking its pre-image under φ, our results guarantee that the resulting radius differs negligibly from the Chebyshev radius. Precisely speaking, we want to find a vector 𝒚 = (y(i,j))_{(i,j)∈[n]×[q]} ∈ Δ^n close to the L images 𝒙₁,…,𝒙_L of the codewords by linear programming. Meanwhile, we want each block 𝒚(i) := (y(i,1),…,y(i,q)) to belong to ∂Δ so that we can find a preimage of 𝒚(i) in [q]. Since 𝒚(i) ∈ Δ, the components of 𝒚(i) are subject to ∑_{j=1}^q y(i,j) = 1; this implies that at least one component of 𝒚(i) is nonzero. The theory of basic feasible solutions guarantees that there exists a feasible solution in which most of the y(i,j) are 0. Combined with the constraint ∑_{j=1}^q y(i,j) = 1, this forces (y(i,1),…,y(i,q)) ∈ ∂Δ for almost all i ∈ [n]. Thus, we incur only a negligible loss in the conversion between the Hamming distance and the ℓ₁ distance.

  • Finally, under the embedding φ, the weighted average radius rad̄_ω still retains the appealing feature that the minimization can be solved analytically, therefore giving rise to an explicit expression (see Equation 31) which greatly facilitates our analysis.

We then show, via techniques deviating from those in [1], three key properties that are required by the subsequent arguments.

  1. For any fixed distribution P, if the entries of the codewords in the list are generated i.i.d. according to P, then

    f(P,ω) := 𝔼_{(X₁,…,X_L)∼P^L}[1 − max_{x∈[q]} ∑_{i∈[L]: X_i = x} ω(i)]

    is maximized when ω = U_L. Moreover, for q ≥ 3 and any L, equality holds if and only if ω = U_L. Our approach differs from [1], as we cannot explicitly represent the function f(P,ω).

  2. Furthermore, if the entries of the codewords in the list are generated i.i.d. according to some P, then f(P,U_L) is upper bounded by f(P_{q,p},U_L), where P_{q,p} = ((1−p)/(q−1),…,(1−p)/(q−1), p) and p = max_{i∈[q]} P(i). This follows from the Schur convexity property proved in [25].

  3. Finally, denoting by P_i the distribution of the i-th components of the codewords in the code 𝒞, Schur convexity promises f(P_i,U_L) ≤ f(P_{q,p_i},U_L), where p_i = max_{x∈[q]} P_i(x). In [25], it is proved that f(P_{q,p},U_L) is concave for p ∈ [1/q,1]. Thus, we can conclude that

    (1/n) ∑_{i∈[n]} f(P_i,U_L) ≤ f(P_{q,p̄},U_L)

    with p̄ = (1/n) ∑_{i∈[n]} p_i.

The remaining part of our proof is similar to [1]. We show that a code 𝒞 either has radius

rad(𝒞) = (1/n) min_{𝒙∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒙) ≤ 1 − 1/q − δ

or most L-tuples of distinct codewords in 𝒞 are distributed close to uniformly. In the former case, we use the concavity property to show that the list-decodability of 𝒞 cannot exceed f(U_q,U_L) = p∗(q,L) by much. In the latter case, since most L-tuples of distinct codewords in 𝒞^L look uniformly random, we can show that the list-decodability of 𝒞 is very close to that of random codes, which is f(U_q,U_L).

Generalization to list-recovery.

For list-recovery, i.e., ℓ > 1, we find an embedding φ that maps each element of [q] to a superposition of vertices of the simplex in ℝ^𝒳, i.e., we map each element of [q] to a vector in [0,1]^𝒳, where 𝒳 = binom([q], ℓ) is the collection of all ℓ-subsets of [q]. Concretely, we define φ(i) := ∑_{A∈𝒳: i∈A} 𝒆_A, where (𝒆_A)_{A∈𝒳} is the standard basis of ℝ^𝒳. The intuition behind this map is that if i ∈ X, we have ‖φ(i) − 𝒆_X‖₁ = binom(q−1, ℓ−1) − 1, and otherwise ‖φ(i) − 𝒆_X‖₁ = binom(q−1, ℓ−1) + 1. Similarly to list-decoding, given L codewords in [q]^n, we obtain L vectors 𝒙₁,…,𝒙_L under the map φ. Our goal is to find a vector 𝒚 = (y(i,A))_{(i,A)∈[n]×𝒳} close to these L vectors, subject to the constraint that ∑_{A∈𝒳} y(i,A) = 1 for every i ∈ [n]. This constraint, combined with the basic-feasible-solution argument, forces that for almost all i ∈ [n], (y(i,A))_{A∈𝒳} is of the form 𝒆_X. For such i, we can find an ℓ-subset X ∈ 𝒳 preserving the distance, i.e.,

d_LR(i,X) = 𝟙{i ∉ X} = ½(‖φ(i) − 𝒆_X‖₁ − binom(q−1, ℓ−1) + 1).

Besides the linear programming relaxation, further adjustments are required in the proofs of the properties analogous to Items 1, 2 and 3 above.
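The embedding identity above can be checked exhaustively for small parameters. The following sketch is ours (the names phi and subsets are not from the paper):

```python
from itertools import combinations
from math import comb

q, ell = 5, 2
subsets = list(combinations(range(1, q + 1), ell))  # the index set of all ell-subsets

def phi(i):
    # superposition of the vertices e_A over all ell-subsets A containing i
    return [1 if i in A else 0 for A in subsets]

for i in range(1, q + 1):
    for X in subsets:
        e_X = [1 if A == X else 0 for A in subsets]
        l1 = sum(abs(a - b) for a, b in zip(phi(i), e_X))
        # d_LR(i, X) = 1{i not in X} = (||phi(i) - e_X||_1 - binom(q-1, ell-1) + 1)/2
        assert (i not in X) == (l1 - comb(q - 1, ell - 1) + 1) // 2
print("embedding identity verified for q = 5, ell = 2")
```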

Code construction.

As alluded to before, a code that saturates the optimal size-radius trade-off should essentially saturate both the upper and lower bounds on the quantity

(1/M^L) ∑_{(𝒄₁,…,𝒄_L)∈𝒞^L} rad̄_H(𝒄₁,…,𝒄_L)

considered in the double-counting argument. Indeed, our impossibility result implies that any optimal zero-rate code must contain a large fraction of random-like L-tuples (𝒄₁,…,𝒄_L), i.e., tuples such that for every 𝒖 ∈ [q]^L,

∑_{i=1}^n 𝟙{(𝒄₁(i),…,𝒄_L(i)) = 𝒖} ≈ n·q^{−L},   (4)

where 𝒄_j = (𝒄_j(1),…,𝒄_j(n)) ∈ [q]^n. To match such an impossibility result, an optimal construction should contain as many such L-tuples as possible. A simplex-like code then becomes a natural candidate; this is a natural extension of the construction in [1] to larger alphabets. An M × n codebook 𝒞 consisting of M codewords, each of length n, is constructed by taking as columns all possible distinct length-M vectors that contain identical numbers of the symbols 1, 2, …, q. It is not hard to see by symmetry that (4) becomes an equality for every L-tuple of distinct codewords in 𝒞. Thus, 𝒞 is the most regular code.
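The codebook can be generated by brute force, as in the following sketch of ours; the enumeration of balanced columns is factorial in M, so this is only feasible for very small q and m:

```python
from itertools import permutations

def simplex_like_code(q, m):
    """Rows of the M x n matrix whose columns are all distinct length-M
    vectors containing each symbol of [q] exactly m times (M = q*m)."""
    M = q * m
    base = tuple(sym for sym in range(1, q + 1) for _ in range(m))
    columns = sorted(set(permutations(base)))  # all balanced columns
    return [tuple(col[row] for col in columns) for row in range(M)]

code = simplex_like_code(q=3, m=1)  # M = 3 codewords of length n = 3! = 6
for c in code:
    print(c)
# By symmetry, the symbol patterns across coordinates of any tuple of
# distinct codewords are maximally balanced, in the spirit of Equation (4).
```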

We also remark that, unlike for positive-rate codes, the prototypical random construction (with expurgation) does not lead to a favorable size-radius trade-off, since the deviation of random sampling is comparatively too large in the zero-rate regime. In contrast, the simplex code is deterministically regular and exhibits no deviation.

1.3 Organization

The remainder of the paper is organized as follows. First, Section 2 provides the necessary notation and definitions, together with some preliminary results which will be useful in the subsequent arguments. Sections 3.1, 3.2, 3.3, and 3.4 contain our argument establishing Theorem 1 for list-decoding (i.e., the case ℓ = 1); in Section 4 we elucidate the changes that need to be made to establish the theorem for general ℓ. Next, Section 5 provides the code construction establishing Theorem 2.

2 Preliminaries

Firstly, for the convenience of the reader, we begin by summarizing the notation that we use. This is particularly relevant as we will often be in situations where we need multiple indices for, e.g., lists of vectors where each coordinate lies in a probability simplex, so the reader is encouraged to refer to this table whenever it is unclear what is intended.

Table 1: Notation for list-decoding.
English letter in boldface: [q]^n-valued vector
Greek letter in boldface: Δ([q])-valued vector
Δ := Δ([q]): simplex in [0,1]^q, i.e., Δ = {(x₁,…,x_q) ∈ [0,1]^q : ∑_{i=1}^q x_i = 1}
∂Δ: set of vertices of Δ
𝒆_x ∈ ∂Δ: the image of x ∈ [q] under φ, i.e., the x-th vertex of Δ
𝒄_i ∈ [q]^n: the i-th codeword in a list
𝒙_i ∈ (∂Δ)^n: image of 𝒄_i under φ (applied component-wise)
𝒚 ∈ Δ^n: relaxed center of a list
𝒙(j) ∈ ∂Δ, 𝒚(j) ∈ Δ: the j-th block (of length q) of 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, respectively
x(j,k) ∈ {0,1}, y(j,k) ∈ [0,1]: the (j,k)-th element of 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, respectively
rad_H: (standard) Chebyshev radius
rad: relaxed Chebyshev radius
rad̄: average radius
rad̄_ω: average radius weighted by ω ∈ Δ([L])
f(P,ω): expected average radius (weighted by ω) of P-distributed symbols
(X₁,…,X_L) ∼ P^L: a list of L i.i.d. P-distributed symbols
U_k: uniform distribution on [k]

For a finite set S and an integer 0 ≤ k ≤ |S|, we denote binom(S, k) := {T ⊆ S : |T| = k}. Let [q] = {1,…,q}.

2.1 List-Decoding

Fix q ≥ 3 and L ≥ 2. Let d_H(𝒄,𝒓) denote the Hamming distance between 𝒄,𝒓 ∈ [q]^n, i.e., the number of coordinates on which the strings differ. For t ∈ [0,n], let ℬ_H(𝒚,t) := {𝒄 ∈ [q]^n : d_H(𝒄,𝒚) ≤ t} denote the Hamming ball centered around 𝒚 of radius t.

Definition 3 (List-decodable code).

Let p ∈ [0,1]. A code 𝒞 ⊆ [q]^n is (p,L)_q-list-decodable if for any 𝒚 ∈ [q]^n,

|𝒞 ∩ ℬ_H(𝒚,np)| ≤ L−1.

In [25] the zero-rate threshold for list-decoding was derived, which is the supremum over p ∈ [0,1] for which (p−ε,L)_q-list-decodable codes of positive rate exist for all ε > 0. This value was shown to be

p∗(q,L) = 1 − (1/L)·𝔼_{(X₁,…,X_L)∼U_q^L}[𝗉𝗅(X₁,…,X_L)],   (5)

where the function 𝗉𝗅 outputs the number of times the most popular symbol appears. In [25] it is shown that (p∗(q,L)+ε, L)-list-decodable codes have size O_{ε,q,L}(1), i.e., some constant independent of n. Our target in this work is to show that the correct dependence on ε is O_{q,L}(1/ε), except for the case of q = 2 with odd L.
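Equation 5 can be evaluated exactly for small parameters by enumerating [q]^L. The following sketch is ours (the helper name p_star is not from the paper):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def p_star(q, L):
    """Evaluate Equation 5 exactly: 1 - E[pl(X_1, ..., X_L)]/L under
    i.i.d. uniform X_i, where pl is the plurality (most popular) count."""
    total = Fraction(0)
    for word in product(range(q), repeat=L):
        total += Counter(word).most_common(1)[0][1]
    return 1 - total / (L * q**L)

print(p_star(2, 2))  # 1/4, the unique-decoding threshold for binary codes
print(p_star(3, 3))  # the zero-rate threshold for q = 3, L = 3
```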

A “dual” definition of list-decodability is proffered by the Chebyshev radius.

Definition 4 (Chebyshev radius).

The Chebyshev radius of a list of distinct vectors 𝒄₁,…,𝒄_L ∈ [q]^n is defined as

rad_H(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈[q]^n} max_{i∈[L]} d_H(𝒄_i,𝒓).

Observe that a code 𝒞 ⊆ [q]^n is (p,L)-list-decodable if and only if

min{rad_H(𝒄₁,…,𝒄_L) : 𝒄₁,…,𝒄_L ∈ 𝒞 distinct} > p.   (6)

In particular, to show a code fails to be list-decodable, it suffices to upper bound the Chebyshev radius of L distinct codewords.

Recall that our main target is an upper bound on the size of list-decodable/-recoverable codes (in the zero-rate regime). A natural approach is to derive from Equation 6 the desired bound on the code; however, this quantity is quite difficult to work with directly. We therefore work with a relaxed version, which we now introduce.

We require the following definitions. Let us embed [q] into the simplex Δ([q]) via the following map φ:

φ : [q] → Δ([q]), x ↦ 𝒆_x,   (9)

where 𝒆_x is the q-dimensional vector with a 1 in the x-th location and 0 everywhere else. Denote by Δ = Δ([q]) the simplex and by ∂Δ = {𝒆₁,…,𝒆_q} its set of vertices. For 𝝌 = 𝒆_x ∈ ∂Δ and 𝜼 ∈ Δ, let

d(𝝌,𝜼) := ½‖𝝌 − 𝜼‖₁ = ½(1 − η(x) + ∑_{x′∈[q]∖{x}} η(x′)).   (10)

Note that if 𝜼 = 𝒆_y ∈ ∂Δ, then

d(𝝌,𝜼) = d_H(x,y).

From now on we will only work with Δ^n-valued vectors and will still denote such length-qn vectors by boldface letters, abusing notation. For 𝒚 ∈ Δ^n, we use y(j,k) ∈ [0,1] to denote its (j,k)-th element and use 𝒚(j) = (y(j,1),…,y(j,q)) ∈ Δ to denote its j-th block of size q. For 𝒄 ∈ [q]^n, we use 𝒄(j) ∈ [q] to denote its j-th element.

For 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, the definition of d(⋅,⋅) can be extended to length-qn vectors in the natural way. Specifically,

d(𝒙,𝒚) = ∑_{j=1}^n d(𝒙(j),𝒚(j)).   (11)

We may now define the relaxed Chebyshev radius.

Definition 5.

The relaxed Chebyshev radius of a list of distinct vectors 𝒙₁,…,𝒙_L ∈ (∂Δ)^n is

rad(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} max_{i∈[L]} d(𝒙_i,𝒚).   (12)

Observe that

rad(𝒙₁,…,𝒙_L) ≤ rad_H(𝒄₁,…,𝒄_L),   (13)

where φ(𝒄i)=𝒙i (here we extend the definition of φ to length-n inputs in a similar way as in Equation 11). This justifies the “relaxation” terminology.

As a last piece of terminology, we define the radius of a code.

Definition 6.

For any code 𝒞 ⊆ [q]^n, we define the Chebyshev radius of 𝒞 as

rad(𝒞) = (1/n) min_{𝒙∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒙).

3 Zero-Rate List-Decoding

3.1 Linear Programming Relaxation

We have shown in Equation 13 that rad is at most rad_H. Conversely, Lemma 7 below establishes that rad and rad_H do not differ by much. That is, for any list, if a center 𝒚 ∈ Δ^n achieves a relaxed radius t, then there must exist 𝒓 ∈ [q]^n attaining approximately the same radius t for sufficiently large n.

Lemma 7 (rad is close to radH).

Let 𝒄₁,…,𝒄_L ∈ [q]^n. Denote by 𝒙₁,…,𝒙_L ∈ (∂Δ)^n the images of 𝒄₁,…,𝒄_L under the embedding φ. Then

rad_H(𝒄₁,…,𝒄_L) ≤ rad(𝒙₁,…,𝒙_L) + L/n.

Proof.

Suppose rad(𝒙₁,…,𝒙_L) = t. Then there exists 𝒚 ∈ Δ^n such that for every i ∈ [L],

d(𝒙_i,𝒚) = ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt,

where the first equality is by Equations 10 and 11. That is, the following polytope is nonempty:

{𝒚 ∈ Δ^n : d(𝒙_i,𝒚) ≤ nt ∀i ∈ [L]} = {(y(j,k))_{(j,k)∈[n]×[q]} : y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q]; ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n]; ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt ∀i ∈ [L]}.   (14)

Equivalently, the following linear program (LP) is feasible:

maximize 0 over (y(j,k))_{(j,k)∈[n]×[q]}
subject to: y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q];
  ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n];
  ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt ∀i ∈ [L].   (19)

Since the inequality ⟨𝒂,𝒚⟩ ≤ b is equivalent to the equality ⟨𝒂,𝒚⟩ + z = b with a slack variable z ≥ 0, the above LP can be written in equational form:

maximize 0 over (y(j,k))_{(j,k)∈[n]×[q]}, (z(i))_{i∈[L]}
subject to: y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q]; z(i) ≥ 0 ∀i ∈ [L];
  ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n];
  ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) + z(i) = nt ∀i ∈ [L],   (25)

or more compactly in matrix form:

maximize 0 over 𝒚 ∈ ℝ^{nq}, 𝒛 ∈ ℝ^L
subject to: [A I_L; B 0][𝒚; 𝒛] = [nt·𝟏_L; 𝟏_n], 𝒚,𝒛 ≥ 𝟎.   (29)

Here A ∈ ℝ^{L×(nq)} and B ∈ ℝ^{n×(nq)} encode respectively the fourth and third constraints in Equation 25, and I_L ∈ ℝ^{L×L} and 𝟏_L ∈ ℝ^L denote respectively the L×L identity matrix and the all-one vector of length L. It is clear that rk([A I_L; B 0]) ≤ n + L. This implies that there exists a feasible solution (𝒚,𝒛) with at most n + L nonzeros, and thus 𝒚 = (y(j,k))_{(j,k)∈[n]×[q]} has at most n + L nonzeros; indeed, such solutions are known as basic feasible solutions. Note that for every block j ∈ [n], ∑_{k=1}^q y(j,k) = 1. This implies that y(j,1),…,y(j,q) cannot all be simultaneously 0; moreover, if q−1 of them are 0, the remaining one is forced to be 1. Since there are n blocks in total, by the pigeonhole principle there are at least n−L choices of j ∈ [n] such that 𝒚(j) = (y(j,1),…,y(j,q)) ∈ ∂Δ. Without loss of generality, we assume that these n−L indices are 1,…,n−L. Let 𝒓 ∈ [q]^n be such that φ(𝒓(j)) = 𝒚(j) for j = 1,…,n−L and 𝒓(j) is an arbitrary value in [q] for j = n−L+1,…,n. Since d(𝒙_i(j),𝒚(j)) ∈ [0,1] and d_H(𝒄_i(j),𝒓(j)) ∈ {0,1}, the difference between d(𝒙_i,𝒚) and d_H(𝒄_i,𝒓) is at most L. The proof is completed.
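The following is a minimal sketch of ours (not the paper's code, and assuming SciPy is available) of the feasibility LP in Equation 19, with t taken as an absolute (unnormalized) radius. The HiGHS solver typically returns a basic feasible solution, so all but roughly n + L of the y(j,k) vanish, matching the rounding step above:

```python
import numpy as np
from scipy.optimize import linprog

def relaxed_center(codewords, q, t):
    """Feasibility version of LP (19): find y in Delta^n with
    d(x_i, y) <= t (absolute) for all i; variables y(j, k) are
    flattened at index j*q + (k - 1)."""
    L, n = len(codewords), len(codewords[0])
    A_eq = np.zeros((n, n * q))
    for j in range(n):
        A_eq[j, j * q:(j + 1) * q] = 1          # each block sums to 1
    # Using sum_k y(j, k) = 1, the distance constraint for codeword i
    # simplifies to n - sum_j y(j, c_i(j)) <= t.
    A_ub = np.zeros((L, n * q))
    for i, c in enumerate(codewords):
        for j in range(n):
            A_ub[i, j * q + (c[j] - 1)] = -1
    res = linprog(np.zeros(n * q), A_ub=A_ub, b_ub=np.full(L, t - n),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))
    return res.x.reshape(n, q) if res.success else None

print(relaxed_center([(1, 1, 2), (1, 2, 2)], q=3, t=1.0))
# Most rows (blocks) of the output are already vertices of the simplex.
```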

We further relax rad by defining the weighted average radius. For 𝒙₁,…,𝒙_L ∈ (∂Δ)^n and ω ∈ Δ([L]), let

rad̄_ω(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} 𝔼_{i∼ω}[d(𝒙_i,𝒚)] = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚).

In words, weighted average radius is obtained by replacing the maximization over i[L] in the definition of relaxed radius (see Equation 12) with an average with respect to a distribution ω.

Since the objective of the minimization is separable, one can minimize over each 𝒚(j) individually and obtain an alternative expression. Suppose 𝒙₁,…,𝒙_L ∈ (∂Δ)^n are the images of 𝒄₁,…,𝒄_L ∈ [q]^n under the embedding φ. Then

rad̄_ω(𝒙₁,…,𝒙_L) = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚)
= (1/(2n)) min_{(𝒚₁,…,𝒚_n)∈Δ^n} ∑_{j=1}^n [∑_{i∈[L]} ω(i)(1 − 𝒚_j(𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} 𝒚_j(x))]
= (1/(2n)) ∑_{j=1}^n min_{𝒚_j∈Δ} [∑_{i∈[L]} ω(i)(1 − 2𝒚_j(𝒄_i(j)) + ∑_{x∈[q]} 𝒚_j(x))]   (30)
= (1/(2n)) ∑_{j=1}^n min_{𝒚_j∈Δ} [∑_{i∈[L]} ω(i)(2 − 2𝒚_j(𝒄_i(j)))]
= (1/n) ∑_{j=1}^n min_{𝒚_j∈Δ} [1 − ∑_{i∈[L]} ω(i)·𝒚_j(𝒄_i(j))] = 1 − (1/n) ∑_{j=1}^n max_{x∈[q]} ∑_{i∈[L]: 𝒄_i(j)=x} ω(i).   (31)

Equation 30 holds since the objective in brackets only depends on 𝒚_j, and not on the other (𝒚_{j′})_{j′∈[n]∖{j}}. To see Equation 31, note that a maximizer 𝒚∗ ∈ Δ of the problem max_{𝒚∈Δ} ∑_{i∈[L]} ω(i)·y(x_i), where ω ∈ Δ([L]) and (x₁,…,x_L) ∈ [q]^L are fixed, is given by 𝒚∗ = 𝒆_{x∗}, where x∗ ∈ [q] satisfies x∗ ∈ argmax_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i = x}.
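The closed form in Equation 31 is straightforward to evaluate. The following direct implementation is our own sketch (the function name is ours):

```python
def weighted_avg_radius(codewords, omega, q):
    """Equation 31: 1 - (1/n) * sum_j max_x sum_{i: c_i(j) = x} omega(i)."""
    n = len(codewords[0])
    total = 0.0
    for j in range(n):
        mass = {x: 0.0 for x in range(1, q + 1)}
        for i, c in enumerate(codewords):
            mass[c[j]] += omega[i]          # omega-weight of symbol c_i(j)
        total += max(mass.values())          # omega-plurality of column j
    return 1 - total / n

cs = [(1, 1, 2), (1, 2, 2), (3, 2, 1)]
L = len(cs)
print(weighted_avg_radius(cs, [1 / L] * L, q=3))  # 1/3 with omega = U_L
```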

Obviously, by definition, for any 𝒙₁,…,𝒙_L ∈ (∂Δ)^n and ω ∈ Δ([L]), rad̄_ω(𝒙₁,…,𝒙_L) ≤ rad(𝒙₁,…,𝒙_L). In fact, the following lemma shows that rad is equal to the maximum of rad̄_ω over ω.

Lemma 8 (rad equals maximum rad¯ω).

For any 𝒙₁,…,𝒙_L ∈ (∂Δ)^n,

rad(𝒙₁,…,𝒙_L) = max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L).

Proof.

Note that

rad(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} max_{i∈[L]} d(𝒙_i,𝒚) = (1/n) min_{𝒚∈Δ^n} max_{ω∈Δ([L])} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚),

since the inner maximum is anyway achieved by a singleton distribution. Note also that the objective function

∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚) = ½ ∑_{i∈[L]} ω(i) ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x))

is affine in ω and linear in 𝒚. Therefore, von Neumann’s minimax theorem allows us to interchange min and max and obtain

rad(𝒙₁,…,𝒙_L) = (1/n) max_{ω∈Δ([L])} min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚) = max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L),

as claimed by the lemma.

In fact, we can say something stronger: it is not necessary to maximize over the entire (uncountable) probability simplex Δ([L]). Instead, we can extract a finite subset Ω_L ⊆ Δ([L]) and maximize over this set to recover rad. The following lemma is analogous to [1, Lemma 6].

Lemma 9 (rad is achieved by finitely many ω).

For every L ∈ ℕ, there exists a finite set of probability measures Ω_L ⊆ Δ([L]) such that

rad(𝒙₁,…,𝒙_L) = max_{ω∈Ω_L} rad̄_ω(𝒙₁,…,𝒙_L)

for all 𝒙₁,…,𝒙_L ∈ (∂Δ)^n.

Proof.

The idea is to view the computation of max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L) as finding the maximum among a finite collection of linear-program optima over convex polytopes, and then to take Ω_L to be the set of vertices of these polytopes.

First, we define the convex polytopes based on a (q-ary version of a) signature. For each ω ∈ Δ([L]), we define a signature for ω, which is a function S_ω : [q]^L → [q] such that

S_ω(𝒖) ∈ argmax_{x∈[q]} ∑_{i: 𝒖(i)=x} ω(i)

for 𝒖 ∈ [q]^L. Define further the q halfspaces H_{𝒖,x} := {ω ∈ Δ([L]) : ∑_{i: 𝒖(i)=x} ω(i) ≥ 1/q} for x ∈ [q]. Observe that if S(𝒖) = x, where S is a signature for ω, then ω ∈ H_{𝒖,x}. Thus, by ranging over the choices of 𝒖 ∈ [q]^L and x ∈ [q], we obtain q^{L+1} halfspaces that partition the (L−1)-dimensional space Δ([L]) into at most ∑_{j≤L−1} binom(q^{L+1}, j) regions.

For each possible signature S : [q]^L → [q], let Ω_S = {ω ∈ Δ([L]) : S is a signature for ω}, and note that Ω_S is a convex polytope. Indeed, it is an intersection over 𝒖 ∈ [q]^L of the convex polytopes

{ω ∈ Δ([L]) : ∃ a signature S_ω for ω s.t. S_ω(𝒖) = S(𝒖)}
= ⋂_{y∈[q]∖{S(𝒖)}} {ω ∈ Δ([L]) : ∑_{i: 𝒖(i)=S(𝒖)} ω(i) ≥ ∑_{i: 𝒖(i)=y} ω(i)},
where S_ω is a signature for ω. Now, to maximize

rad̄_ω(𝒙₁,…,𝒙_L) = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚)

over ω ∈ Δ([L]), consider the sets T_𝒖 = {i ∈ [n] : (𝒄₁(i),…,𝒄_L(i)) = 𝒖} for 𝒖 ∈ [q]^L, and let a_𝒖 = |T_𝒖|/n. We claim it suffices to optimize the following linear function:

∑_{𝒖∈[q]^L} a_𝒖 y_𝒖,  where y_𝒖 = ∑_{i: 𝒖(i)=S(𝒖)} ω(i),   (32)

over all ω ∈ Ω_S. Indeed, by Equation 31, we have

rad̄_ω(𝒙₁,…,𝒙_L) = 1 − (1/n) ∑_{j=1}^n max_{x∈[q]} ∑_{i∈[L]: 𝒄_i(j)=x} ω(i).

This implies that an optimizer only depends on the index sets T_𝒖, and furthermore that its value is determined by the a_𝒖’s as in Equation 32.

We can thus take the union of all vertex sets of all polytopes ΩS for all signatures S. Multiplying this by the Oq,L(1) regions defined by all the halfspaces H𝒖,x we obtain a finite set of vertices, as desired.

3.2 Properties of 𝒇(𝑷,𝝎)

Now, we consider the expected weighted average radius of a sequence of i.i.d. symbols. Specifically, for P ∈ Δ([q]) and ω ∈ Δ([L]), let

f(P,ω) := 𝔼_{(X₁,…,X_L)∼P^L}[rad̄_ω(𝒆_{X₁},…,𝒆_{X_L})].

[1] studies f(P,ω) for q = 2 and even L. In this case, one can take advantage of the fact that P ∈ Δ([2]) may be parametrized by a single real number, thereby yielding a fairly simple expression for f(P,ω).

Nonetheless, in this subsection we will show that all the properties of f(P,ω) established in [1] for q = 2 and even L can be generalized to any q ≥ 3 and any L. Let us first provide a more explicit expression for f(P,ω) using Equation 31:

f(P,ω) = 𝔼_{(X₁,…,X_L)∼P^L}[1 − max_{x∈[q]} ∑_{i∈[L]: X_i=x} ω(i)]
= 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i = x}.

We define the shorthand notation

max_ω(x₁,…,x_L) := max_{x∈[q]} ∑_{i∈[L]: x_i=x} ω(i)   (33)

for any ω ∈ Δ([L]) and (x₁,…,x_L) ∈ [q]^L.

The first property that we would like to establish is that f(P,ω) only increases if ω is replaced by U_L, and furthermore that the maximum is uniquely attained at U_L if P(x) > 0 for all x ∈ [q]. In order to do this, we will repeatedly “average out” coordinates of ω and then show that the function value increases (or at least, does not decrease). To introduce some terminology, for S ⊆ [L] we say that ω̄ is obtained from ω by averaging out the subset S of coordinates if ω̄ is defined as

ω̄(i) = { (∑_{j∈S} ω(j))/|S| if i ∈ S;  ω(i) if i ∉ S. }
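The following numeric sketch of ours evaluates f(P,ω) via the explicit expression above and illustrates the averaging-out operation; consistent with Lemmas 10 to 12 below, averaging coordinates of ω can only increase f(P,ω):

```python
from itertools import product

def f(P, omega):
    """1 - sum_x (prod_i P(x_i)) * max_omega(x), as in the expression above."""
    q, L = len(P), len(omega)
    val = 0.0
    for x in product(range(q), repeat=L):
        prob = 1.0
        for xi in x:
            prob *= P[xi]
        val += prob * max(sum(omega[i] for i in range(L) if x[i] == s)
                          for s in range(q))
    return 1 - val

def average_out(omega, S):
    avg = sum(omega[i] for i in S) / len(S)
    return [avg if i in S else w for i, w in enumerate(omega)]

P = [1 / 3] * 3                       # uniform distribution on [3]
omega = [0.5, 0.3, 0.1, 0.1]          # a non-uniform weighting, L = 4
omega_bar = average_out(omega, {0, 1})
print(f(P, omega), f(P, omega_bar), f(P, [0.25] * 4))
# The printed values are non-decreasing, as predicted.
```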

The following lemma gives a simple criterion for establishing that, if ω̄ is obtained from ω by averaging two coordinates, then f(P,ω̄) ≥ f(P,ω); it furthermore gives a criterion for the inequality to be strict. The main thrust of the proof of Lemma 11 is thus to show that this criterion is always satisfied.

Lemma 10.

Let P ∈ Δ([q]) and ω ∈ Δ([L]). Suppose ω(L−1) ≤ ω(L) and that ω̄ ∈ Δ([L]) is obtained by averaging out the last two coordinates of ω. Suppose that for all (x₁,…,x_L) ∈ [q]^L we have

½(max_ω(x₁,…,x_{L−1},x_L) + max_ω(x₁,…,x_L,x_{L−1})) ≥ max_{ω̄}(x₁,…,x_{L−1},x_L).   (34)

Then f(P,ω̄) ≥ f(P,ω).

Furthermore, suppose that additionally there exists (x₁,…,x_L) ∈ [q]^L with ∏_{i=1}^L P(x_i) > 0 such that the inequality in Equation 34 is strict. Then f(P,ω̄) > f(P,ω).

Proof.

Define ω′ ∈ Δ([L]) as

ω′(i) := { ω(i) if i ∈ [L]∖{L−1,L};  ω(L) if i = L−1;  ω(L−1) if i = L. }

That is, ω′ is obtained by swapping the last two components of ω. By symmetry, we have f(P,ω) = f(P,ω′) and so

f(P,ω) = ½(f(P,ω) + f(P,ω′))
= ½(1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_ω(x₁,…,x_{L−1},x_L)
  + 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{ω′}(x₁,…,x_{L−1},x_L))
= 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·½(max_ω(x₁,…,x_{L−1},x_L) + max_ω(x₁,…,x_L,x_{L−1}))
≤ 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{ω̄}(x₁,…,x_{L−1},x_L)
= f(P,ω̄),

where the inequality follows from Equation 34. From the above chain, it is also clear that if additionally there exists (x₁,…,x_L) ∈ [q]^L with ∏_{i=1}^L P(x_i) > 0 for which the inequality in Equation 34 is strict, then f(P,ω̄) > f(P,ω).

We now establish that the function value cannot decrease if ω is replaced by UL.

Lemma 11.

For any P ∈ Δ([q]) and ω ∈ Δ([L]), f(P,ω) ≤ f(P,U_L).

Proof.

Fix any (x₁,…,x_L) ∈ [q]^L. Let ω ∈ Δ([L]) be non-uniform. Without loss of generality, assume ω(L−1) < ω(L). Let ω̄ ∈ Δ([L]) be obtained by uniformizing the last two components of ω, i.e.,

ω̄(i) := { ω(i) if i ∈ [L]∖{L−1,L};  ½(ω(L−1)+ω(L)) if i ∈ {L−1,L}. }

We claim f(P,ω̄) ≥ f(P,ω). By Lemma 10, we just need to establish Equation 34.

Equation 34 trivially holds if x_{L−1} = x_L. We therefore assume below that x_{L−1} ≠ x_L. Let x_{L−1} = a and x_L = b. Let

ω(a) := ∑_{i∈[L−2]: x_i=a} ω(i),  ω(b) := ∑_{i∈[L−2]: x_i=b} ω(i).

Then, we have

∑_{i∈[L]: x_i=a} ω̄(i) = ω(a) + ½(ω(L−1)+ω(L)),  ∑_{i∈[L]: x_i=b} ω̄(i) = ω(b) + ½(ω(L−1)+ω(L)).

We first assume that there exists c ∉ {a,b} such that

max_{ω̄}(x₁,…,x_L) = ∑_{i∈[L]: x_i=c} ω̄(i) = ∑_{i∈[L]: x_i=c} ω(i),

where the second equality follows since the set {i ∈ [L] : x_i = c} does not contain L−1, L. Equation 34 therefore holds, as

max_ω(x₁,…,x_{L−1},x_L) ≥ ∑_{i∈[L]: x_i=c} ω(i),  max_ω(x₁,…,x_L,x_{L−1}) ≥ ∑_{i∈[L]: x_i=c} ω(i).

We proceed to the case that

max_{ω̄}(x₁,…,x_L) = max{½(ω(L−1)+ω(L)) + ω(a), ½(ω(L−1)+ω(L)) + ω(b)}.

Equation 34 holds as

max_ω(x₁,…,x_{L−1},x_L) ≥ max{ω(a)+ω(L−1), ω(b)+ω(L)}

and

max_ω(x₁,…,x_L,x_{L−1}) ≥ max{ω(a)+ω(L), ω(b)+ω(L−1)}.

Thus, Lemma 10 implies f(P,ω̄) ≥ f(P,ω), as desired.

We can then continue averaging components of ω and in this way obtain a sequence (ω_i)_{i∈ℕ} of distributions with ω₁ = ω. This sequence converges in ℓ_∞-norm to the uniform distribution U_L and satisfies f(P,ω_{i+1}) ≥ f(P,ω_i) for all i. Observing that ω ↦ f(P,ω) is a continuous function – the term max_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i=x} is a maximum over linear functions of ω, hence continuous, implying that f(P,⋅) is a linear combination of continuous functions – it follows that f(P,U_L) = lim_{i→∞} f(P,ω_i), and in particular that f(P,U_L) ≥ f(P,ω₁) = f(P,ω), as desired.

We now strengthen the conclusion of Lemma 11 by showing that for all q ≥ 3 and L ≥ 2 the function ω ↦ f(P,ω) is uniquely maximized by the setting ω = U_L, except for degenerate cases where P(x) = 0 for some x ∈ [q].

Before stating and proving this fact, we note that the proof of Lemma 11 in fact shows that we can average out any subset of coordinates of ω and only increase the value of f(P,ω). We formalize this in the following lemma, which will be useful in the subsequent arguments.

Lemma 12.

Let P ∈ Δ([q]), ω ∈ Δ([L]) and S ⊆ [L]. Let ω̄ be obtained from ω by averaging out the subset S of coordinates. Then f(P,ω̄) ≥ f(P,ω).

Theorem 13.

Let q ≥ 3, L ≥ 2 and let P ∈ Δ([q]) be such that P(x) > 0 for all x ∈ [q]. (In fact, our proof is only applied with P = U_q, which clearly satisfies the condition.) Then for all ω ∈ Δ([L]), f(P,ω) ≤ f(P,U_L), with equality if and only if ω = U_L.

Proof.

The inequality was already established in Lemma 11, so we focus on showing that ω = U_L whenever f(P,ω) = f(P,U_L). As q ≥ 3, let a, b and c denote 3 distinct elements of [q]. Let ω ≠ U_L and suppose for a contradiction that f(P,ω) is a maximum of the function ω′ ↦ f(P,ω′). The proof proceeds via a number of cases.

  1. 𝑳 is even. Without loss of generality, ω(L−1) < ω(L). If L ≥ 4, let ω′ be obtained from ω by averaging out the first L−2 coordinates; by Lemma 12, f(P,ω′) ≥ f(P,ω). If L = 2, set ω′ = ω.

    Since 2 | (L−2), if L ≥ 4 we can set x₁ = ⋯ = x_{L/2−1} = a and x_{L/2} = ⋯ = x_{L−2} = b. Set further x_{L−1} = a and x_L = b. We observe that for this (x₁,…,x_L) ∈ [q]^L and ω̄ obtained from ω′ by averaging out the last two coordinates, Equation 34 holds strictly. Indeed,

    max_{ω′}(x₁,…,x_{L−1},x_L) + max_{ω′}(x₁,…,x_L,x_{L−1}) = 2∑_{i=1}^{(L−2)/2} ω′(i) + 2ω(L)

    and

    2·max_{ω̄}(x₁,…,x_{L−1},x_L) = 2∑_{i=1}^{(L−2)/2} ω′(i) + ω(L−1) + ω(L).

    Thus Lemma 10 implies f(P,ω̄) > f(P,ω′) ≥ f(P,ω), a contradiction.

  2. 𝑳 is odd and at least three components of ω take distinct values, or the components of ω take only two distinct values and at least two of them attain the minimum value. Without loss of generality, ω(L−2) ≤ ω(L−1) < ω(L). If L ≥ 5, let ω′ be obtained from ω by averaging out the first L−3 coordinates; by Lemma 12, f(P,ω′) ≥ f(P,ω). If L = 3, set ω′ = ω.

    Since 2 | (L−3), if L ≥ 5 we set x₁ = ⋯ = x_{(L−1)/2−1} = a and x_{(L−1)/2} = ⋯ = x_{L−3} = b. Let x_{L−2} = c, x_{L−1} = a and x_L = b. We observe that for this (x₁,…,x_L) ∈ [q]^L and ω̄ obtained from ω′ by averaging out the last two coordinates, Equation 34 holds strictly. Indeed, since ω(L−2) < ω(L) we have

    max_{ω′}(x₁,…,x_{L−1},x_L) + max_{ω′}(x₁,…,x_L,x_{L−1}) = 2∑_{i=1}^{(L−3)/2} ω′(i) + 2ω(L)

    and

    2·max_{ω̄}(x₁,…,x_{L−1},x_L) = 2∑_{i=1}^{(L−3)/2} ω′(i) + ω(L−1) + ω(L).

    Thus Lemma 10 implies f(P,ω̄) > f(P,ω′) ≥ f(P,ω), a contradiction.

  3. 𝑳 is odd and only one component attains the minimum value. That is, ω(1) = ω(2) = ⋯ = ω(L−1) > ω(L). Let ω′ be obtained from ω by averaging out the subset {L−1, L}. Then f(P,ω′) ≥ f(P,ω) by Lemma 12, and moreover ω′ is such that at least two coordinates attain the minimum value, as ω′(1) = ⋯ = ω′(L−2) > ω′(L−1) = ω′(L). The argument from the previous case can now be applied to derive a contradiction.

Thus, except for degenerate choices of P ∈ Δ([q]), it follows that the function ω ↦ f(P,ω) is maximized exactly by the choice ω = U_L. The next step is to determine the distribution P ∈ Δ([q]) maximizing P ↦ f(P,U_L). At this point, we can rely on a main result of [25], as the function P ↦ 1 − f(P,U_L) is the same as the function f_{q,L}(P) defined in [25, Equation (17)]. It is shown therein that f_{q,L}(P) is strictly Schur convex, which in particular means that f_{q,L}(P) has a unique minimum at P = U_q. That is, f(P,U_L) has a unique maximum at P = U_q.

The (strict) Schur convexity also implies the following: if p = max_{x∈[q]} P(x), then f_{q,L}(P) ≥ f_{q,L}(P_{q,p}), where

P_{q,p}(x) = { (1−p)/(q−1) if x ∈ {1,2,…,q−1};  p if x = q. }   (35)

That is, we can conclude that f(P,U_L) ≤ f(P_{q,p},U_L). We encapsulate these facts in the following proposition.

Proposition 14 (Theorem 1,2 [25]).

Let q ≥ 2, L ≥ q and P ∈ Δ([q]). Suppose p = max_{x∈[q]} P(x). Then f(P,U_L) ≤ f(P_{q,p},U_L). Furthermore, f(P_{q,p},U_L) ≤ f(U_q,U_L), and p ↦ f(P_{q,p},U_L) is monotone decreasing for p ≥ 1/q. Lastly, f(P_{q,p},U_L) is concave in p on [1/q,1]; i.e., (1/n)∑_{i=1}^n f(P_{q,p_i},U_L) ≤ f(P_{q,p̄},U_L) with p̄ = (1/n)∑_{i=1}^n p_i.
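The monotonicity and concavity in Proposition 14 can be checked numerically on a grid (this is our own sanity check, consistent with, but of course not a proof of, the proposition):

```python
from itertools import product

def f(P, omega):  # the explicit expression for f from Section 3.2
    q, L = len(P), len(omega)
    val = 0.0
    for x in product(range(q), repeat=L):
        prob = 1.0
        for xi in x:
            prob *= P[xi]
        val += prob * max(sum(omega[i] for i in range(L) if x[i] == s)
                          for s in range(q))
    return 1 - val

q, L = 3, 3
U_L = [1 / L] * L
vals = []
for k in range(7):
    p = 1 / q + k * (1 - 1 / q) / 6           # grid on [1/q, 1]
    P = [(1 - p) / (q - 1)] * (q - 1) + [p]   # the distribution P_{q,p}
    vals.append(f(P, U_L))
print(all(vals[i] >= vals[i + 1] for i in range(6)))             # decreasing
print(all(vals[i] - 2 * vals[i + 1] + vals[i + 2] <= 1e-9
          for i in range(5)))     # nonpositive second differences: concavity
```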

A further fact that we have from [25] is that

p∗(q,L) = f(U_q,U_L).

In fact, this was taken as the definition of p∗(q,L). To end this subsection, we prove the following theorem by utilizing the concavity of f(P_{q,p},U_L).

Theorem 15.

Assume rad_H(𝒞) ≤ p. Then for any ω ∈ Δ([L]) we have

𝔼_{(𝒄₁,…,𝒄_L)∼𝒞^L}[rad̄_ω(φ(𝒄₁),…,φ(𝒄_L))] ≤ f(P_{q,1−p},U_L).

Proof.

Let 𝒚 be a center attaining rad_H(𝒞). Without loss of generality, we can assume 𝒚 is the all-zero vector. Let P_i be the distribution of the symbols in the i-th coordinate of 𝒞, i.e., P_i(j) = Pr[𝒄(i) = j] with the probability taken over a uniformly random 𝒄 ∈ 𝒞. Let p_i = max_{x∈[q]} P_i(x) and p̄ = (1/n)∑_{i=1}^n p_i. Clearly, p_i ≥ 1/q; moreover, since every codeword agrees with 𝒚 in at least (1−p)n coordinates, p̄ ≥ 1−p. Then, we have

𝔼_{(𝒄₁,…,𝒄_L)∼𝒞^L}[rad̄_ω(φ(𝒄₁),…,φ(𝒄_L))] = (1/n)∑_{i=1}^n f(P_i,ω)
≤ (1/n)∑_{i=1}^n f(P_i,U_L) ≤ (1/n)∑_{i=1}^n f(P_{q,p_i},U_L) ≤ f(P_{q,p̄},U_L) ≤ f(P_{q,1−p},U_L).

The first inequality is due to Lemma 11, and the second and third inequalities are due to Proposition 14. The last inequality is due to rad_H(𝒞) ≤ p (which, as noted, forces p̄ ≥ 1−p) together with the monotonicity of p′ ↦ f(P_{q,p′},U_L). The proof is completed.

3.3 Abundance of Random-Like 𝑳-tuples

Recall the notation

rad(𝒞) = (1/n) min_{𝒚∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒚)

and

𝗍𝗒𝗉𝖾_𝒖(𝒄₁,…,𝒄_L) = (1/n)∑_{i=1}^n 𝟙{(𝒄₁(i),…,𝒄_L(i)) = 𝒖},

where 𝒄_i = (𝒄_i(1),…,𝒄_i(n)) ∈ [q]^n and 𝒖 ∈ [q]^L. In this subsection, we prove that a code 𝒞 ⊆ [q]^n either contains a large subcode 𝒞′ ⊆ 𝒞 with radius rad(𝒞′) ≤ 1 − 1/q − ε, or most of its L-tuples have type close to the uniform distribution (for all 𝒖 ∈ [q]^L).

We first show that for any projection π_A with |A| ≥ μn (for some parameter μ ∈ [0,1]), the projection π_A(𝒞) almost preserves the radius rad(𝒞), with small loss. Then, if rad(𝒞′) > 1 − 1/q − ε for every subcode 𝒞′ of large size, we find a codeword 𝒄₁ in 𝒞 whose symbol distribution is close to uniform; in fact, most codewords in 𝒞 satisfy this requirement. Let A_i be the index set on which 𝒄₁ takes value i. We apply π_{A_i} to 𝒞 and claim that π_{A_i}(𝒞) preserves the radius. Thus, we can find a codeword 𝒄₂ such that for every i ∈ [q], the symbol distribution of π_{A_i}(𝒄₂) is close to uniform; moreover, most codewords in 𝒞 satisfy this requirement. The proof is then completed by induction.

Lemma 16.

Let π_A : [q]^n → [q]^A be the projection onto a set A of size m. Suppose 𝒞 ⊆ [q]^n is a code of size qs satisfying rad_H(π_A(𝒞)) ≤ 1 − 1/q − ε. Then, there exists a subcode 𝒞′ ⊆ 𝒞 of size at least s such that rad_H(𝒞′) ≤ 1 − 1/q − (m/n)·ε.

Proof.

The proof is omitted due to the limited space.

Theorem 17.

Let L be fixed. For every ε > 0, there exists a δ > 0 with the following property. For every natural number s, there exist constants M₀ = M₀(s) and c(s) such that for any code 𝒞 ⊆ [q]^n of size M ≥ M₀, at least one of the following must hold:

  1. There exists 𝒞′ ⊆ 𝒞 such that |𝒞′| ≥ s and rad_H(𝒞′) ≤ 1 − 1/q − δ.

  2. There exist at least M^L − c(s)·M^{L−1} many L-tuples of distinct codewords (𝒄₁,…,𝒄_L) in 𝒞 such that for all 𝒖 ∈ [q]^L we have

    |𝗍𝗒𝗉𝖾_𝒖(𝒄₁,…,𝒄_L) − q^{−L}| ≤ ε

    and thus

    |rad̄_ω(φ(𝒄₁),…,φ(𝒄_L)) − f(U_q,ω)| ≤ q^L·ε.

Proof.

The proof is omitted due to the limited space.

3.4 Putting Everything Together

The argument follows the same line of reasoning as [1]; the full proof is deferred to the full version. Define ρ_L(𝒞) = min rad(φ(𝒄₁),…,φ(𝒄_L)), with the minimum taken over all L-tuples (𝒄₁,…,𝒄_L) ∈ 𝒞^L with distinct elements, where we recall that rad denotes the relaxed Chebyshev radius (Definition 5).

Theorem 18.

Let L ≥ 2 and q ≥ 3. If 𝒞 ⊆ [q]^n is (p∗(q,L)+ε, L)-list-decodable, then |𝒞| = O_{q,L}(1/ε).

Proof.

The proof is omitted due to the limited space.

4 Zero-Rate List-Recovery

In this section, we show how our results on list-decoding can naturally be extended to list-recovery. Due to the page limit, we refer the reader to the full version. The result is summarized in Theorem 1.

5 Code Construction

In this section, we present a simple simplex-like code construction and show that it attains the optimal size-radius trade-off by analyzing its list-decoding and -recovery radius.

Our construction will be identical for list-decoding and -recovery, and therefore we will directly analyze its list-recovery radius. Before presenting the construction and its analysis, let us define the average radius rad̄. This is a standard notion that “linearizes” the Chebyshev radius rad and often finds use in the analysis of list-recoverable codes in the literature. The definition reads as follows: for any 𝒄₁,…,𝒄_L ∈ [q]^n,

rad̄(𝒄₁,…,𝒄_L) := (1/L) min_{𝒀∈𝒳^n} ∑_{i=1}^L d_LR(𝒄_i,𝒀),

where d_LR(𝒄_i,𝒀) = |{j ∈ [n] : 𝒄_i(j) ∉ Y_j}|.

It is well-known and easy to verify (by, e.g., following the derivations leading to Equation 31) that the above minimization admits the following explicit solution:

rad̄(𝒄₁,…,𝒄_L) = ∑_{j=1}^n (1 − (1/L)·𝗉𝗅_ℓ(𝒄₁(j),…,𝒄_L(j))),   (36)

where 𝗉𝗅_ℓ outputs the total number of occurrences of the ℓ most popular symbols.

Equation 36 should be interpreted as the average distance from each 𝒄_i to the “centroid” 𝒀∗ ∈ 𝒳^n of the list, defined by

𝒀∗(j) ∈ argmax_{A∈𝒳} ∑_{i=1}^L 𝟙{𝒄_i(j) ∈ A}

for each j ∈ [n]. (If there are multiple maximizers, take an arbitrary one; the value of rad̄ remains the same.)
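The explicit formula in Equation 36 is easy to implement. The following sketch is ours; it computes the (unnormalized) average radius via the ℓ-plurality of each column:

```python
from collections import Counter

def avg_radius_LR(codewords, ell):
    """Equation 36 (unnormalized): sum over columns of 1 - pl_ell/L,
    where pl_ell is the total count of the ell most popular symbols."""
    L = len(codewords)
    total = 0.0
    for j in range(len(codewords[0])):
        counts = sorted(Counter(c[j] for c in codewords).values(),
                        reverse=True)
        total += 1 - sum(counts[:ell]) / L
    return total

print(avg_radius_LR([(1, 2, 3), (2, 2, 1), (3, 1, 3)], ell=2))  # 1/3
```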

Finally, for integers q ≥ 1 and L ≥ 0, denote by

𝒜_{q,L} := {(a₁,…,a_q) ∈ ℤ_{≥0}^q : ∑_{i=1}^q a_i = L}

the set of q-partitions of L, i.e., a_i is the number of indices taking value i. For 𝒂 = (a₁,…,a_q) ∈ 𝒜_{q,L}, we shorthand binom(L; 𝒂) = binom(L; a₁,…,a_q) for the multinomial coefficient. Define max_ℓ{𝒂} = max_{A∈𝒳} ∑_{i∈A} a_i, i.e., the sum of the ℓ largest components of 𝒂.

Theorem 19 (Construction of zero-rate list-recoverable codes).

Fix any integers q ≥ 3, ℓ ≥ 1 and L ≥ 2. For any sufficiently large m, there exists a (p,ℓ,L)-list-recoverable code 𝒞 with blocklength

n = binom(qm; m,…,m),   (37)

and the trade-off between code size M and (relative) radius p given by:

M = qm,  p = p∗(q,ℓ,L) + c_{q,ℓ,L}·m^{−1} + O(m^{−2}),

where

c_{q,ℓ,L} := q^{−L} ∑_{𝒂∈𝒜_{q,L}} (max_ℓ{𝒂}/L)·binom(L; 𝒂)·[∑_{i=1}^q binom(a_i, 2) − (1/q)·binom(L, 2)] > 0.   (38)

Proof.

The proof is omitted due to the page limit.
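The constant in Equation 38 can be evaluated exactly for small parameters. The following sketch is ours and uses rational arithmetic, confirming positivity:

```python
from fractions import Fraction
from math import comb, factorial

def partitions(q, L):  # all a in A_{q,L}: q nonnegative integers summing to L
    if q == 1:
        yield (L,)
        return
    for first in range(L + 1):
        for rest in partitions(q - 1, L - first):
            yield (first,) + rest

def c_const(q, ell, L):
    total = Fraction(0)
    for a in partitions(q, L):
        multinom = factorial(L)
        for ai in a:
            multinom //= factorial(ai)
        max_ell = sum(sorted(a, reverse=True)[:ell])     # max_ell{a}
        bracket = (sum(Fraction(comb(ai, 2)) for ai in a)
                   - Fraction(comb(L, 2), q))
        total += Fraction(max_ell, L) * multinom * bracket
    return total / q**L

print(c_const(3, 1, 2), c_const(3, 2, 3))  # 1/9 and 2/27: positive, as claimed
```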

References

  • [1] Noga Alon, Boris Bukh, and Yury Polyanskiy. List-decodable zero-rate codes. IEEE Transactions on Information Theory, 65(3):1657–1667, 2018. doi:10.1109/TIT.2018.2868957.
  • [2] László Babai, Lance Fortnow, Noam Nisan, and Avi Wigderson. BPP has subexponential time simulations unless EXPTIME has publishable proofs. Comput. Complex., 3:307–318, 1993. doi:10.1007/BF01275486.
  • [3] L. A. Bassalygo. New upper bounds for error-correcting codes. Probl. of Info. Transm., 1:32–35, 1965.
  • [4] Vladimir M Blinovsky. Bounds for codes in the case of list decoding of finite volume. Problems of Information Transmission, 22:7–19, 1986.
  • [5] Vladimir M Blinovsky. Code bounds for multiple packings over a nonbinary finite alphabet. Problems of Information Transmission, 41:23–32, 2005. doi:10.1007/S11122-005-0007-5.
  • [6] Vladimir M Blinovsky. On the convexity of one coding-theory function. Problems of Information Transmission, 44:34–39, 2008. doi:10.1134/S0032946008010031.
  • [7] Philippe Delsarte. An algebraic approach to the association schemes of coding theory. Philips Res. Rep. Suppl., 10:vi+–97, 1973.
  • [8] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1057–1068. IEEE, 2020. doi:10.1109/FOCS46700.2020.00102.
  • [9] Dean Doron and Mary Wootters. High-probability list-recovery, and applications to heavy hitters. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICALP.2022.55.
  • [10] Peter Elias. List decoding for noisy channels. Wescon Convention Record, Part 2, pages 94–104, 1957.
  • [11] Peter Elias. Error-correcting codes for list decoding. IEEE Transactions on Information Theory, 37(1):5–12, 1991. doi:10.1109/18.61123.
  • [12] Edgar N Gilbert. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504–522, 1952.
  • [13] Oded Goldreich and Leonid A Levin. A hard-core predicate for all one-way functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), pages 25–32. ACM, 1989. doi:10.1145/73007.73010.
  • [14] Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. Unbalanced expanders and randomness extractors from parvaresh–vardy codes. Journal of the ACM (JACM), 56(4):1–34, 2009. doi:10.1145/1538902.1538904.
  • [15] Iftach Haitner, Yuval Ishai, Eran Omri, and Ronen Shaltiel. Parallel hashing via list recoverability. In Annual Cryptology Conference, pages 173–190. Springer, 2015. doi:10.1007/978-3-662-48000-7_9.
  • [16] Justin Holmgren, Alex Lombardi, and Ron D Rothblum. Fiat–shamir via list-recoverable codes (or: parallel repetition of gmw is not zero-knowledge). In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 750–760, 2021. doi:10.1145/3406325.3451116.
  • [17] Piotr Indyk, Hung Q Ngo, and Atri Rudra. Efficiently decodable non-adaptive group testing. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1126–1142. SIAM, 2010. doi:10.1137/1.9781611973075.91.
  • [18] Jeffrey C Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):414–440, 1997. doi:10.1006/JCSS.1997.1533.
  • [19] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993. doi:10.1137/0222080.
  • [20] V. I. Levenshtein. Application of Hadamard matrices to a problem in coding. Problems of Cybernetics, 5:123–136, 1961.
  • [21] Richard J Lipton. Efficient checking of computations. In Proceedings of the 7th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 207–215. Springer, 1990. doi:10.1007/3-540-52282-4_44.
  • [22] Robert J. McEliece, Eugene R. Rodemich, Howard Rumsey, Jr., and Lloyd R. Welch. New upper bounds on the rate of a code via the Delsarte-MacWilliams inequalities. IEEE Trans. Inform. Theory, IT-23(2):157–166, 1977. doi:10.1109/tit.1977.1055688.
  • [23] Hung Q Ngo, Ely Porat, and Atri Rudra. Efficiently decodable error-correcting list disjunct matrices and applications. In International Colloquium on Automata, Languages, and Programming, pages 557–568. Springer, 2011.
  • [24] Nicolas Resch. List-decodable codes:(randomized) constructions and applications. School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep., CMU-CS-20-113, 2020.
  • [25] Nicolas Resch, Chen Yuan, and Yihan Zhang. Zero-rate thresholds and new capacity bounds for list-decoding and list-recovery. arXiv preprint, 2022. doi:10.48550/arXiv.2210.07754.
  • [26] Madhu Sudan, Luca Trevisan, and Salil Vadhan. Pseudorandom generators without the XOR lemma. Journal of Computer and System Sciences, 62(2):236–266, 2001. doi:10.1006/JCSS.2000.1730.
  • [27] Michael A. Tsfasman, S. G. Vlăduț, and Th. Zink. Modular curves, Shimura curves, and Goppa codes, better than Varshamov-Gilbert bound. Mathematische Nachrichten, 109(1):21–28, 1982.
  • [28] R. R. Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117:739–741, 1957.
  • [29] Lloyd R. Welch, Robert J. McEliece, and Howard Rumsey, Jr. A low-rate improvement on the Elias bound. IEEE Trans. Inform. Theory, IT-20:676–678, 1974. doi:10.1109/tit.1974.1055279.
  • [30] Jack Wozencraft. List decoding. Quarter Progress Report, 48:90–95, 1958.