
Tight Bounds on List-Decodable and List-Recoverable Zero-Rate Codes

Nicolas Resch, Informatics Institute, University of Amsterdam, The Netherlands; Chen Yuan, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China; Yihan Zhang, Institute of Science and Technology Austria, Klosterneuburg, Austria
Abstract

In this work, we consider the list-decodability and list-recoverability of codes in the zero-rate regime. Briefly, a code 𝒞 ⊆ [q]^n is (p,ℓ,L)-list-recoverable if for all tuples of input lists (Y₁,…,Y_n) with each Y_i ⊆ [q] and |Y_i| = ℓ, the number of codewords 𝒄 ∈ 𝒞 such that c_i ∉ Y_i for at most pn choices of i ∈ [n] is less than L; list-decoding is the special case of ℓ = 1. In recent work by Resch, Yuan and Zhang (ICALP 2023) the zero-rate threshold for list-recovery was determined for all parameters: that is, the work explicitly computes p∗ := p∗(q,ℓ,L) with the property that for all ε > 0 (a) there exist positive-rate (p∗−ε,ℓ,L)-list-recoverable codes, and (b) any (p∗+ε,ℓ,L)-list-recoverable code has rate 0. In fact, in the latter case the code has constant size, independent of n. However, the constant size in their work is quite large in 1/ε, at least (1/ε)^{O(qL)}.

Our contribution in this work is to show that for all choices of q, ℓ and L with q ≥ 3, any (p∗+ε,ℓ,L)-list-recoverable code must have size O_{q,ℓ,L}(1/ε), and furthermore this upper bound is complemented by a matching lower bound Ω_{q,ℓ,L}(1/ε). This greatly generalizes work by Alon, Bukh and Polyanskiy (IEEE Trans. Inf. Theory 2018), which focused only on the case of a binary alphabet (and thus necessarily only on list-decoding). We remark that we can in fact recover the same result for q = 2 and even L, as obtained by Alon, Bukh and Polyanskiy: we thus strictly generalize their work.

Our main technical contribution is (a) to properly define a linear programming relaxation of the list-recovery condition over large alphabets, and (b) to demonstrate that a certain function defined on the q-ary probability simplex is maximized by the uniform distribution. This represents the core challenge in generalizing to larger q (as a binary simplex can be naturally identified with a one-dimensional interval). We can subsequently reuse certain Schur convexity and convexity properties established for a related function by Resch, Yuan and Zhang, along with ideas of Alon, Bukh and Polyanskiy.

Keywords and phrases:
List Decoding, List Recovery, Zero Rate
Copyright and License:
© Nicolas Resch, Chen Yuan, and Yihan Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing → Coding theory
Related Version:
Full Version: https://arxiv.org/abs/2309.01800
Funding:
The research of C. Yuan was supported in part by the National Key R&D Program of China under Grant 2023YFE0123900 and the Natural Science Foundation of Shanghai under the 2024 Shanghai Action Plan for Science, Technology and Innovation Grant 24BC3200700. The research of N. Resch is supported in part by an NWO (Dutch Research Council) grant with number C.2324.0590, and this work was done in part while he was visiting the Simons Institute for the Theory of Computing, supported by DOE grant #DE-SC0024124.
Editors:
Raghu Meka

1 Introduction

Given an error-correcting code 𝒞 ⊆ [q]^n, a fundamental requirement is that the codewords are sufficiently well-spread in order to guarantee some non-trivial correctability properties. This is typically enforced by requiring that the minimum distance of the code, d = min{d_H(𝒄,𝒄′) : 𝒄 ≠ 𝒄′ ∈ 𝒞}, is large, where d_H(⋅,⋅) denotes the Hamming distance (i.e., the number of coordinates on which two strings differ). Note that minimum distance d is equivalent to the following “packing” property: if we put a ball of radius r := ⌊(d−1)/2⌋ around any point 𝒚 ∈ [q]^n – i.e., we consider the Hamming ball ℬ_H(𝒚,r) := {𝒙 ∈ [q]^n : d_H(𝒙,𝒚) ≤ r} – then all these balls contain at most 1 codeword from 𝒞.

This latter viewpoint can easily be generalized to obtain list-decodability, where we now require that such Hamming balls do not capture “too many” codewords. That is, for p ∈ [0,1] and L ∈ ℕ, a code is called (p,L)-list-decodable if every Hamming ball of radius pn contains less than L codewords from 𝒞. In notation: for all 𝒚 ∈ [q]^n, |ℬ_H(𝒚,pn) ∩ 𝒞| ≤ L−1. (Typically the upper bound is parametrized as L, rather than L−1; however, for “impossibility” arguments this parametrization is more common, as it leads to less cumbersome computations.) This notion was already introduced in the 1950s by Elias and Wozencraft [10, 30, 11] but has in the past 20 years seen quite a bit of attention due to its connections to other parts of theoretical computer science [13, 2, 21, 19, 18, 26].

One can push this generalization further to obtain list-recoverability. Here, we consider a tuple of input lists 𝒀 = (Y₁,…,Y_n), where each Y_i ⊆ [q] has size at most ℓ (for some ℓ ∈ ℕ). The requirement is that the number of codewords that “disagree” with 𝒀 in at most pn coordinates is at most L−1. More formally, if for all such 𝒀 = (Y₁,…,Y_n) the number of codewords 𝒄 ∈ 𝒞 such that |{i ∈ [n] : c_i ∉ Y_i}| ≤ pn is at most L−1, the code is called (p,ℓ,L)-list-recoverable. Note that (p,L)-list-decodability is nothing other than (p,1,L)-list-recoverability. Initially, list-recoverability was abstracted as a useful stepping stone towards list-decoding concatenated codes. However, in recent years this notion has found many connections to other parts of computer science, e.g., in cryptography [15, 16], randomness extraction [14], hardness amplification [8], group testing [17, 23], streaming algorithms [9], and beyond.
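The definition can be checked directly by brute force. The following sketch is our own (the function name and interface are not from the paper), and it is only feasible for very small parameters, since it enumerates all tuples of input lists:

```python
from itertools import combinations, product

def is_list_recoverable(code, q, p, ell, L):
    """Return True iff every tuple of input lists Y_1, ..., Y_n with
    |Y_i| = ell captures at most L - 1 codewords, where a codeword c is
    captured when c_i lies outside Y_i in at most p*n coordinates."""
    n = len(code[0])
    alphabet = range(1, q + 1)
    for Y in product(combinations(alphabet, ell), repeat=n):
        captured = sum(
            1 for c in code
            if sum(c[i] not in Y[i] for i in range(n)) <= p * n
        )
        if captured >= L:
            return False
    return True

# Toy example: the length-4 repetition code over [3] is (0, 1, 2)-list-
# recoverable, since each tuple of singleton lists captures at most one codeword.
code = [(x,) * 4 for x in (1, 2, 3)]
print(is_list_recoverable(code, q=3, p=0.0, ell=1, L=2))  # True
```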

Rate versus noise-resilience.

Having fixed a desired “error-tolerance” as determined by the parameters p, ℓ and L, we would also like the code 𝒞 to be as large as possible: intuitively, this implies that the code contains the minimal amount of redundancy possible. A fundamental question in coding theory is to understand the achievable tradeoffs between the rate R := log_q |𝒞| / n and some “error-resilience” property of the code, e.g., minimum distance, list-decodability, or list-recoverability.

This question in full generality is wide open. Even the special case of q = 2 and L = 2 (i.e., determining the optimal tradeoff between rate and distance for binary codes) is unclear: on the possibility side we have the Gilbert-Varshamov bound [12, 28] showing that R ≥ 1 − H₂(2p) is achievable (here, H₂(x) = −x log₂ x − (1−x) log₂(1−x) is the binary entropy function), while bounds of Elias and Bassalygo [3] and the linear programming bound [29, 22, 7] give incomparable and non-tight upper bounds. None of these bounds have been substantially improved in at least 40 years. The situation is even more complicated for larger q: for q = 49 (and larger prime powers) the celebrated algebraic geometry codes of Tsfasman, Vlăduț and Zink [27] provide explicit codes of higher rate in certain regimes than those promised by the Gilbert-Varshamov bound.

When one relaxes the question to allow an asymptotically growing list size L, then we do have a satisfactory answer: it is provided by the list-decoding/-recovery capacity theorem, which states that for all ε > 0 there exist (p, ℓ, O(1/ε))-list-recoverable codes of rate 1 − H_{q,ℓ}(p) − ε, where

H_{q,ℓ}(x) := x·log_q((q−ℓ)/x) + (1−x)·log_q(ℓ/(1−x))

is the (q,ℓ)-ary entropy function [24]. (Note that setting ℓ = 1 recovers the standard q-ary entropy function, which itself reduces to the binary entropy function upon setting q = 2.) On the other hand, any code of rate R ≥ 1 − H_{q,ℓ}(p) + ε fails to be (p,ℓ,L)-list-recoverable unless L ≥ q^{Ω(εn)}. However, this does not provide very meaningful bounds if one is interested in, say, (p,2,5)-list-recoverable codes.
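For reference, the entropy function above is easy to evaluate numerically. The following helper is our own sketch, following the formula above with the usual convention 0·log(…) = 0 at the endpoints (it assumes 1 ≤ ℓ < q):

```python
from math import log

def H(q, ell, p):
    """(q, ell)-ary entropy: p*log_q((q - ell)/p) + (1 - p)*log_q(ell/(1 - p))."""
    val = 0.0
    if p > 0:
        val += p * log((q - ell) / p, q)
    if p < 1:
        val += (1 - p) * log(ell / (1 - p), q)
    return val

# Sanity checks: ell = 1 recovers the q-ary entropy, and q = 2, ell = 1
# recovers the binary entropy function H_2.
print(H(2, 1, 0.11))      # ~0.4999, so the rate bound 1 - H is ~1/2
print(1 - H(4, 2, 0.25))  # rate bound for q = 4, ell = 2 at p = 1/4
```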

Positive versus zero-rate regimes.

Thus far, we have implicitly been discussing the positive-rate regime. However, one can also ask questions about the behaviour of codes in the zero-rate regime. For context, recent work by Resch, Yuan and Zhang [25] computed the zero-rate threshold for list-recovery: that is, for all alphabet sizes q ≥ 2, input list sizes ℓ and output list sizes L, they determine the value p∗(q,ℓ,L) such that (a) for all p < p∗(q,ℓ,L) there exist infinite families of positive-rate (p,ℓ,L)-list-recoverable codes over the alphabet [q], and (b) for all p > p∗(q,ℓ,L) there does not exist such an infinite family.

Having now delineated the “positive-rate” and “zero-rate” regimes depending on how p compares to p∗(q,ℓ,L), in this work we study the zero-rate regime for list-recoverable codes for all alphabet sizes q. In [25], it is shown that (p,ℓ,L)-list-recoverable codes 𝒞 ⊆ [q]^n with p = p∗(q,ℓ,L) + ε have constant size (that is, independent of the block length n); however, this constant is massive in the parameters due to the use of a Ramsey-theoretic bound. In particular, the dependence on ε is at least (1/ε)^{2qL}, and this is additionally multiplied by a tower of 2’s of height roughly L.

To the best of our knowledge, prior work on this question focuses exclusively on the q = 2 case. For example, in the case of L = 2 (i.e., unique decoding) we have p∗(2,1,2) = 1/4, and work by Levenshtein [20] shows that a code construction based on Hadamard matrices corrects a 1/4 + ε fraction of errors and has size 1/(4ε) + O(1). A particularly relevant prior work is due to Alon, Bukh and Polyanskiy [1]. Therein the authors consider this question for the special case of q = 2 (and thus, necessarily, only for list-decoding). In particular, they show that when L is even, if p = p∗(2,1,L) + ε then any such (p,L)-list-decodable code 𝒞 ⊆ [2]^n has size at most O_L(1/ε), and moreover they provide a construction of such a code with size Ω_L(1/ε). (For the special case of q = 2, the zero-rate threshold for list-decoding had already been established by Blinovsky [4].) They observe some interesting behaviour in the case of odd L; in particular, the maximum size of a (p∗(2,1,3)+ε, 3)-list-decodable code is Θ(1/ε^{3/2}). (This argument in fact reveals a flaw in an earlier proof of Blinovsky claiming that such codes have size O_L(1/ε) for all L.)

Our motivations for this investigation are three-fold. Firstly, the zero-rate regime offers combinatorial challenges and interesting behaviours that we uncover in this work. Secondly, many codes that find applications in other areas of theoretical computer science in fact have subconstant rate. Lastly, the zero-rate regime appears much more tractable than the positive-rate regime – indeed, we can obtain tight upper and lower bounds on the size of a code, as we will soon see. It would be interesting to determine to what extent such techniques could be useful for understanding the positive-rate regime as well.

1.1 Our results

Our main result in this work is a tight bound on the size of a (p,ℓ,L)-list-recoverable code over an alphabet of size q ≥ 3 when p > p∗(q,ℓ,L). The main technical challenge is to establish the following upper bound on the size of such a code.

Theorem 1.

Let q, ℓ, L ∈ ℕ with q ≥ 3, ℓ < q and L > ℓ be fixed constants. Let ε > 0 and put p = p∗(q,ℓ,L) + ε. Suppose 𝒞 ⊆ [q]^n is (p,ℓ,L)-list-recoverable. Then |𝒞| ≤ O_{q,ℓ,L}(1/ε).

We complement the above negative result with the following code construction, showing the upper bound is tight.

Theorem 2.

Let q, ℓ, L ∈ ℕ with q ≥ 3 and ℓ < q be fixed constants. Let ε > 0 and put p = p∗(q,ℓ,L) + ε. There exists a (p,ℓ,L)-list-recoverable code 𝒞 ⊆ [q]^n such that |𝒞| ≥ Ω_{q,ℓ,L}(1/ε).

We emphasize that in the above theorems the implied constants may depend on q, ℓ and L.

Note that our results explicitly exclude the case of q = 2. As [1] proves, the binary alphabet behaves in subtle ways: the bound on the code size depends on the parity of L. Intriguingly, our work demonstrates that such behaviour does not arise over larger alphabets.

1.2 Technical Overview

The double-counting argument.

Since our focus is on zero-rate list-decodable/-recoverable codes, it helps to first review the proof of the zero-rate threshold p∗(q,ℓ,L). A lower bound can be easily obtained by a random construction that attains a positive rate for any p ≤ p∗(q,ℓ,L) − ε. For the upper bound, let us first consider the list-decoding case, i.e., ℓ = 1. The proof in [5, 6, 25], at a high level, proceeds via a double-counting argument. (A characterization of p∗(q,1,L) was announced in [5, 6], but the proof therein was flawed; the work [25] filled in the gaps and characterized p∗(q,ℓ,L) for general ℓ.) For any (p,L)-list-decodable code 𝒞 ⊆ [q]^n, the proof aims to upper and lower bound the radius of a list averaged over the choice of the list from 𝒞:

(1/M^L) ∑_{(𝒄₁,…,𝒄_L)∈𝒞^L} rad_H(𝒄₁,…,𝒄_L),   (1)

where M = |𝒞|.

Comparing the two bounds produces an upper bound on |𝒞|. Here rad_H(⋅), known as the Chebyshev radius of a list, is the relative radius of the smallest Hamming ball containing all codewords in the list. A lower bound on Equation 1 essentially follows from the list-decodability of 𝒞. Indeed, each term (corresponding to lists consisting of distinct codewords) is lower bounded by p, as otherwise a list that fits into a ball of radius at most pn is found, violating the list-decodability of 𝒞. Therefore Equation 1 is at least p − o(1), where the o(1) accounts for lists with not-all-distinct codewords.

On the other hand, it is much trickier to upper bound Equation 1 since, in general, rad_H admits no closed analytic form and can only be computed by solving a min-max problem. The previous proof [25] first extracts a subcode 𝒞′ ⊆ 𝒞 with highly regular list structure via the hypergraph Ramsey theorem. This allows one to assert that all lists have essentially the same radius and that all codewords in each list have essentially the same distance to the center of the list. As a result, the min-max expression is “linearized” and Equation 1 can be upper bounded when restricted to 𝒞′. The downside is that the Ramsey reduction step is rather lossy in terms of code size.

Weighted average radius.

The effect of the Ramsey reduction, put formally, is to enforce the average radius:

rad̄_H(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈{0,1}^n} (1/L) ∑_{i=1}^L d_H(𝒄_i,𝒓)   (2)

of every list in the subcode to be approximately equal. To extract the regularity structures in lists without resorting to extremal bounds from Ramsey theory, [1] introduced the notion of weighted average radius which “linearizes” the Chebyshev radius in a weighted manner:

rad̄_ω(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈{0,1}^n} ∑_{i=1}^L ω(i)·d_H(𝒄_i,𝒓)

where ω is a distribution on L elements. For any weighting ω, the values rad̄_ω of lists from the code form a suite of succinct statistics of the list distribution. It turns out that, on average, rad̄_{U_L} = rad̄ (where U_L denotes the uniform distribution on [L]) is maximal among all ω. Recall that the double-counting argument suggests that in an optimal zero-rate code, the behaviour of the ensemble average of rad is essentially captured by that of rad̄. In particular, list-decodability ensures that rad̄ of most lists should be large. However, not too many lists in an optimal code are expected to have large rad̄_ω for any ω ≠ U_L. [1] then managed to quantify the gap between rad̄ = rad̄_{U_L} and rad̄_ω (with ω ≠ U_L), which yields an improved (and sometimes optimal) size-radius trade-off for zero-rate codes.

Generalization to 𝒒-ary list-decoding.

Our major technical contribution is in extrapolating the above ideas to list-recovery. The challenge lies particularly in defining a proper notion of weighted average radius and proving its properties. Our definition relies crucially on an embedding φ from [q] to the simplex in ℝ^q and relaxes the center 𝒓 of the list to be a fractional vector. Specifically, denoting by Δ = {𝒂 ∈ ℝ_{≥0}^q : ∑_{i=1}^q a_i = 1} the simplex in ℝ^q and by ∂Δ = {𝒆₁,…,𝒆_q} its set of vertices (i.e., the standard basis of ℝ^q), we let the embedding φ map each symbol x ∈ [q] to the standard basis vector 𝒆_x ∈ ∂Δ. Denoting by 𝒙₁,…,𝒙_L ∈ (∂Δ)^n the (element-wise) images of a list 𝒄₁,…,𝒄_L ∈ [q]^n, we define the weighted average radius of 𝒙₁,…,𝒙_L as:

rad̄_ω(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} ½·𝔼_{i∼ω}[‖𝒙_i − 𝒚‖₁],   (3)

where ω is any distribution on [L].

The nontriviality and significance of the above notion, especially of the embedding used therein, is three-fold.

  • First, as the weighting ω varies, rad̄_ω serves as a bridge between the standard average radius in Equation 2 and the Chebyshev radius. Indeed, ω = U_L recovers the former, and the maximum of rad̄_ω over ω recovers the latter. However, we caution that the second statement does not hold without the embedding, since the Hamming distance between q-ary symbols per se cannot be interpolated by a convex function, which makes the minimax theorem inapplicable. Fortunately, our embedding affinely extends the q-ary Hamming distance to the simplex, and therefore brings back the applicability of the minimax theorem and connects max_ω rad̄_ω to rad.

  • Second, our definition in Equation 3 allows 𝒚 to take any value in the simplex, instead of only its vertices, i.e., the image of [q] under φ. Though embedding naively into the hypercube [0,1]^q may seem convenient, upon solving the expression with a fractional 𝒚 one does not necessarily obtain a notion that is guaranteed to closely approximate the original version with integral 𝒚. In contrast, using linear programming duality, we show that our embedding yields a relaxed notion of radius which closely approximates the actual Chebyshev radius. Indeed, upon rounding the fractional center 𝒚 and taking its pre-image under φ, our results guarantee that the resulting radius differs negligibly from the Chebyshev radius. Precisely speaking, we want to find a vector 𝒚 = (y(i,j))_{(i,j)∈[n]×[q]} ∈ Δ^n close to the L images 𝒙₁,…,𝒙_L of the codewords by linear programming. Meanwhile, we want each block 𝒚(i) := (y(i,1),…,y(i,q)) to belong to ∂Δ so that we can find a preimage of 𝒚(i) in [q]. Since 𝒚(i) ∈ Δ, the components of 𝒚(i) are subject to ∑_{j=1}^q y(i,j) = 1; this implies that at least one component of 𝒚(i) is nonzero. The theory of basic feasible solutions guarantees that there exists a feasible solution in which most of the y(i,j) are 0. Combined with the constraint ∑_{j=1}^q y(i,j) = 1, this forces (y(i,1),…,y(i,q)) ∈ ∂Δ for almost all i ∈ [n]. Thus, we incur only a negligible loss in the conversion between the Hamming distance and the ℓ₁ distance.

  • Finally, under the embedding φ, the weighted average radius rad̄_ω still retains the appealing feature that the minimization can be solved analytically, therefore giving rise to an explicit expression (see Equation 31) which greatly facilitates our analysis.

We then show, via techniques deviating from those in [1], three key properties that are required by the subsequent arguments.

  1. For any fixed distribution P, if the entries of the codewords in the list are generated i.i.d. according to P, then

    f(P,ω) := 𝔼_{(X₁,…,X_L)∼P^L}[1 − max_{x∈[q]} ∑_{i∈[L]: X_i = x} ω(i)]

    is maximized when ω = U_L. Moreover, for q ≥ 3 and any L, equality holds if and only if ω = U_L. Our approach differs from [1], as we cannot explicitly represent the function f(P,ω).

  2. Furthermore, if the entries of the codewords in the list are generated i.i.d. according to some P, then f(P,U_L) is upper bounded by f(P_{q,p},U_L), where P_{q,p} = ((1−p)/(q−1),…,(1−p)/(q−1), p) and p = max_{i∈[q]} P(i). This follows from the Schur convexity property proved in [25].

  3. Finally, denoting by P_i the distribution of the i-th components of the codewords in the code 𝒞, Schur convexity promises f(P_i,U_L) ≤ f(P_{q,p_i},U_L), where p_i = max_{x∈[q]} P_i(x). In [25], it is proved that f(P_{q,p},U_L) is concave for p ∈ [1/q,1]. Thus, we can conclude that

    (1/n) ∑_{i∈[n]} f(P_i,U_L) ≤ f(P_{q,p̄},U_L)

    with p̄ = (1/n) ∑_{i∈[n]} p_i.

The remaining part of our proof is similar to [1]. We show that a code 𝒞 either has radius

rad(𝒞) = (1/n) min_{𝒙∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒙) ≤ 1 − 1/q − δ

or most L-tuples of distinct codewords in 𝒞 are distributed close to uniformly. In the former case, we use the concavity property to show that the list-decodability of 𝒞 cannot exceed f(U_q,U_L) = p∗(q,L) by much. In the latter case, since most L-tuples of distinct codewords in 𝒞^L look uniformly random, we can show that the list-decodability of 𝒞 is very close to that of random codes, which is f(U_q,U_L).

Generalization to list-recovery.

For list-recovery, i.e., ℓ > 1, we find an embedding φ that maps each element of [q] to a superposition of vertices of the simplex in ℝ^𝒳, i.e., we map each element of [q] to a vector in [0,1]^𝒳, where 𝒳 = binom([q], ℓ) is the collection of all ℓ-subsets of [q]. Concretely, we define φ(i) := ∑_{A∈𝒳: i∈A} 𝒆_A, where (𝒆_A)_{A∈𝒳} is the standard basis of ℝ^𝒳. The intuition behind this map is that if i ∈ X, we have ‖φ(i) − 𝒆_X‖₁ = binom(q−1, ℓ−1) − 1, and otherwise ‖φ(i) − 𝒆_X‖₁ = binom(q−1, ℓ−1) + 1. Similarly to list-decoding, given L codewords in [q]^n, we obtain L vectors 𝒙₁,…,𝒙_L under the map φ. Our goal is to find a vector 𝒚 = (y(i,A))_{(i,A)∈[n]×𝒳} close to these L vectors, subject to the constraint that ∑_{A∈𝒳} y(i,A) = 1 for every i ∈ [n]. This constraint, combined with the basic-feasible-solution argument, forces that for almost all i ∈ [n], (y(i,A))_{A∈𝒳} is of the form 𝒆_X. For such i, we can find an ℓ-subset X ∈ 𝒳 preserving the distance, i.e.,

d_LR(i,X) = 𝟙{i ∉ X} = ½(‖φ(i) − 𝒆_X‖₁ − binom(q−1, ℓ−1) + 1).

Besides the linear programming relaxation, further adjustments are required in the proofs of the properties analogous to Items 1, 2 and 3 above.
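The embedding identity above can be checked exhaustively for small parameters. The following sketch is ours (the names phi and subsets are not from the paper):

```python
from itertools import combinations
from math import comb

q, ell = 5, 2
subsets = list(combinations(range(1, q + 1), ell))  # the index set of all ell-subsets

def phi(i):
    # superposition of the vertices e_A over all ell-subsets A containing i
    return [1 if i in A else 0 for A in subsets]

for i in range(1, q + 1):
    for X in subsets:
        e_X = [1 if A == X else 0 for A in subsets]
        l1 = sum(abs(a - b) for a, b in zip(phi(i), e_X))
        # d_LR(i, X) = 1{i not in X} = (||phi(i) - e_X||_1 - binom(q-1, ell-1) + 1)/2
        assert (i not in X) == (l1 - comb(q - 1, ell - 1) + 1) // 2
print("embedding identity verified for q = 5, ell = 2")
```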

Code construction.

As alluded to before, a code that saturates the optimal size-radius trade-off should essentially saturate both the upper and lower bounds on the quantity

(1/M^L) ∑_{(𝒄₁,…,𝒄_L)∈𝒞^L} rad̄_H(𝒄₁,…,𝒄_L)

considered in the double-counting argument. Indeed, our impossibility result implies that any optimal zero-rate code must contain a large fraction of random-like L-tuples (𝒄₁,…,𝒄_L), i.e., tuples such that for every 𝒖 ∈ [q]^L,

∑_{i=1}^n 𝟙{(𝒄₁(i),…,𝒄_L(i)) = 𝒖} ≈ n·q^{−L},   (4)

where 𝒄_j = (𝒄_j(1),…,𝒄_j(n)) ∈ [q]^n. To match such an impossibility result, an optimal construction should contain as many such L-tuples as possible. A simplex-like code then becomes a natural candidate; this is a natural extension of the construction in [1] to larger alphabets. An M × n codebook 𝒞 consisting of M codewords, each of length n, is constructed by taking as columns all possible distinct length-M vectors that contain identical numbers of the symbols 1, 2, …, q. It is not hard to see by symmetry that (4) becomes an equality for every L-tuple of distinct codewords in 𝒞. Thus, 𝒞 is the most regular code.
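The codebook can be generated by brute force, as in the following sketch of ours; the enumeration of balanced columns is factorial in M, so this is only feasible for very small q and m:

```python
from itertools import permutations

def simplex_like_code(q, m):
    """Rows of the M x n matrix whose columns are all distinct length-M
    vectors containing each symbol of [q] exactly m times (M = q*m)."""
    M = q * m
    base = tuple(sym for sym in range(1, q + 1) for _ in range(m))
    columns = sorted(set(permutations(base)))  # all balanced columns
    return [tuple(col[row] for col in columns) for row in range(M)]

code = simplex_like_code(q=3, m=1)  # M = 3 codewords of length n = 3! = 6
for c in code:
    print(c)
# By symmetry, the symbol patterns across coordinates of any tuple of
# distinct codewords are maximally balanced, in the spirit of Equation (4).
```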

We also remark that, unlike for positive-rate codes, the prototypical random construction (with expurgation) does not lead to a favorable size-radius trade-off, since the deviation of random sampling is comparatively too large in the zero-rate regime. In contrast, the simplex code is deterministically regular and exhibits no deviation.

1.3 Organization

The remainder of the paper is organized as follows. First, Section 2 provides the necessary notation and definitions, together with some preliminary results which will be useful in the subsequent arguments. Sections 3.1, 3.2, 3.3, and 3.4 contain our argument establishing Theorem 1 for list-decoding (i.e., the case ℓ = 1); in Section 4 we elucidate the changes that need to be made to establish the theorem for general ℓ. Next, Section 5 provides the code construction establishing Theorem 2.

2 Preliminaries

Firstly, for the convenience of the reader, we begin by summarizing the notation that we use. This is particularly relevant as we will often be in situations where we need multiple indices for, e.g., lists of vectors where each coordinate lies in a probability simplex, so the reader is encouraged to refer to this table whenever it is unclear what is intended.

Table 1: Notation for list-decoding.
English letter in boldface: [q]^n-valued vector
Greek letter in boldface: Δ([q])-valued vector
Δ := Δ([q]): simplex in [0,1]^q, i.e., Δ = {(x₁,…,x_q) ∈ [0,1]^q : ∑_{i=1}^q x_i = 1}
∂Δ: set of vertices of Δ
𝒆_x ∈ ∂Δ: the image of x ∈ [q] under φ, i.e., the x-th vertex of Δ
𝒄_i ∈ [q]^n: the i-th codeword in a list
𝒙_i ∈ (∂Δ)^n: image of 𝒄_i under φ (applied component-wise)
𝒚 ∈ Δ^n: relaxed center of a list
𝒙(j) ∈ ∂Δ, 𝒚(j) ∈ Δ: the j-th block (of length q) of 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, respectively
x(j,k) ∈ {0,1}, y(j,k) ∈ [0,1]: the (j,k)-th element of 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, respectively
rad_H: (standard) Chebyshev radius
rad: relaxed Chebyshev radius
rad̄: average radius
rad̄_ω: average radius weighted by ω ∈ Δ([L])
f(P,ω): expected average radius (weighted by ω) of P-distributed symbols
(X₁,…,X_L) ∼ P^L: a list of L i.i.d. P-distributed symbols
U_k: uniform distribution on [k]

For a finite set S and an integer 0 ≤ k ≤ |S|, we denote binom(S, k) := {T ⊆ S : |T| = k}. Let [q] = {1,…,q}.

2.1 List-Decoding

Fix q ≥ 3 and L ≥ 2. Let d_H(𝒄,𝒓) denote the Hamming distance between 𝒄,𝒓 ∈ [q]^n, i.e., the number of coordinates on which the strings differ. For t ∈ [0,n], let ℬ_H(𝒚,t) := {𝒄 ∈ [q]^n : d_H(𝒄,𝒚) ≤ t} denote the Hamming ball centered around 𝒚 of radius t.

Definition 3 (List-decodable code).

Let p ∈ [0,1]. A code 𝒞 ⊆ [q]^n is (p,L)_q-list-decodable if for any 𝒚 ∈ [q]^n,

|𝒞 ∩ ℬ_H(𝒚,np)| ≤ L−1.

In [25] the zero-rate threshold for list-decoding was derived, which is the supremum over p ∈ [0,1] for which (p−ε,L)_q-list-decodable codes of positive rate exist for all ε > 0. This value was shown to be

p∗(q,L) = 1 − (1/L)·𝔼_{(X₁,…,X_L)∼U_q^L}[𝗉𝗅(X₁,…,X_L)],   (5)

where the function 𝗉𝗅 outputs the number of times the most popular symbol appears. In [25] it is shown that (p∗(q,L)+ε, L)-list-decodable codes have size O_{ε,q,L}(1), i.e., some constant independent of n. Our target in this work is to show that the correct dependence on ε is O_{q,L}(1/ε), except for the case of q = 2 with odd L.
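Equation 5 can be evaluated exactly for small parameters by enumerating [q]^L. The following sketch is ours (the helper name p_star is not from the paper):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def p_star(q, L):
    """Evaluate Equation 5 exactly: 1 - E[pl(X_1, ..., X_L)]/L under
    i.i.d. uniform X_i, where pl is the plurality (most popular) count."""
    total = Fraction(0)
    for word in product(range(q), repeat=L):
        total += Counter(word).most_common(1)[0][1]
    return 1 - total / (L * q**L)

print(p_star(2, 2))  # 1/4, the unique-decoding threshold for binary codes
print(p_star(3, 3))  # the zero-rate threshold for q = 3, L = 3
```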

A “dual” definition of list-decodability is proffered by the Chebyshev radius.

Definition 4 (Chebyshev radius).

The Chebyshev radius of a list of distinct vectors 𝒄₁,…,𝒄_L ∈ [q]^n is defined as

rad_H(𝒄₁,…,𝒄_L) := (1/n) min_{𝒓∈[q]^n} max_{i∈[L]} d_H(𝒄_i,𝒓).

Observe that a code 𝒞 ⊆ [q]^n is (p,L)-list-decodable if and only if

min{rad_H(𝒄₁,…,𝒄_L) : 𝒄₁,…,𝒄_L ∈ 𝒞 distinct} > p.   (6)

In particular, to show a code fails to be list-decodable, it suffices to upper bound the Chebyshev radius of L distinct codewords.

Recall that our main target is an upper bound on the size of list-decodable/-recoverable codes (in the zero-rate regime). A natural approach is to derive from Equation 6 the desired bound on the code; however, this quantity is quite difficult to work with directly. We therefore work with a relaxed version, which we now introduce.

We require the following definitions. Let us embed [q] into the simplex Δ([q]) via the following map φ:

φ : [q] → Δ([q]), x ↦ 𝒆_x,   (9)

where 𝒆_x is the q-dimensional vector with a 1 in the x-th location and 0 everywhere else. Denote by Δ = Δ([q]) the simplex and by ∂Δ = {𝒆₁,…,𝒆_q} its set of vertices. For 𝝌 = 𝒆_x ∈ ∂Δ and 𝜼 ∈ Δ, let

d(𝝌,𝜼) := ½‖𝝌 − 𝜼‖₁ = ½(1 − η(x) + ∑_{x′∈[q]∖{x}} η(x′)).   (10)

Note that if 𝜼 = 𝒆_y ∈ ∂Δ, then

d(𝝌,𝜼) = d_H(x,y).

From now on we will only work with Δ^n-valued vectors and will still denote such length-qn vectors by boldface letters, abusing notation. For 𝒚 ∈ Δ^n, we use y(j,k) ∈ [0,1] to denote its (j,k)-th element and use 𝒚(j) = (y(j,1),…,y(j,q)) ∈ Δ to denote its j-th block of size q. For 𝒄 ∈ [q]^n, we use 𝒄(j) ∈ [q] to denote its j-th element.

For 𝒙 ∈ (∂Δ)^n and 𝒚 ∈ Δ^n, the definition of d(⋅,⋅) can be extended to length-qn vectors in the natural way. Specifically,

d(𝒙,𝒚) = ∑_{j=1}^n d(𝒙(j),𝒚(j)).   (11)

We may now define the relaxed Chebyshev radius.

Definition 5.

The relaxed Chebyshev radius of a list of distinct vectors 𝒙₁,…,𝒙_L ∈ (∂Δ)^n is

rad(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} max_{i∈[L]} d(𝒙_i,𝒚).   (12)

Observe that

rad(𝒙₁,…,𝒙_L) ≤ rad_H(𝒄₁,…,𝒄_L),   (13)

where φ(𝒄i)=𝒙i (here we extend the definition of φ to length-n inputs in a similar way as in Equation 11). This justifies the “relaxation” terminology.

As a last piece of terminology, we define the radius of a code.

Definition 6.

For any code 𝒞 ⊆ [q]^n, we define the Chebyshev radius of 𝒞 as

rad(𝒞) = (1/n) min_{𝒙∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒙).

3 Zero-Rate List-Decoding

3.1 Linear Programming Relaxation

We have shown in Equation 13 that rad is at most rad_H. Conversely, Lemma 7 below establishes that rad and rad_H do not differ by much. That is, for any list, if a center 𝒚 ∈ Δ^n achieves a relaxed radius t, then there must exist 𝒓 ∈ [q]^n attaining approximately the same radius t for sufficiently large n.

Lemma 7 (rad is close to radH).

Let 𝒄₁,…,𝒄_L ∈ [q]^n. Denote by 𝒙₁,…,𝒙_L ∈ (∂Δ)^n the images of 𝒄₁,…,𝒄_L under the embedding φ. Then

rad_H(𝒄₁,…,𝒄_L) ≤ rad(𝒙₁,…,𝒙_L) + L/n.

Proof.

Suppose rad(𝒙₁,…,𝒙_L) = t. Then there exists 𝒚 ∈ Δ^n such that for every i ∈ [L],

d(𝒙_i,𝒚) = ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt,

where the first equality is by Equations 10 and 11. That is, the following polytope is nonempty:

{𝒚 ∈ Δ^n : d(𝒙_i,𝒚) ≤ nt ∀i ∈ [L]} = {(y(j,k))_{(j,k)∈[n]×[q]} : y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q]; ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n]; ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt ∀i ∈ [L]}.   (14)

Equivalently, the following linear program (LP) is feasible:

maximize 0 over (y(j,k))_{(j,k)∈[n]×[q]}
subject to: y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q];
  ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n];
  ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) ≤ nt ∀i ∈ [L].   (19)

Since the inequality ⟨𝒂,𝒚⟩ ≤ b is equivalent to the equality ⟨𝒂,𝒚⟩ + z = b with a slack variable z ≥ 0, the above LP can be written in equational form:

maximize 0 over (y(j,k))_{(j,k)∈[n]×[q]}, (z(i))_{i∈[L]}
subject to: y(j,k) ≥ 0 ∀(j,k) ∈ [n]×[q]; z(i) ≥ 0 ∀i ∈ [L];
  ∑_{k=1}^q y(j,k) = 1 ∀j ∈ [n];
  ½ ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x)) + z(i) = nt ∀i ∈ [L],   (25)

or more compactly in matrix form:

maximize 0 over 𝒚 ∈ ℝ^{nq}, 𝒛 ∈ ℝ^L
subject to: [A I_L; B 0][𝒚; 𝒛] = [nt·𝟏_L; 𝟏_n], 𝒚,𝒛 ≥ 𝟎.   (29)

Here A ∈ ℝ^{L×(nq)} and B ∈ ℝ^{n×(nq)} encode respectively the fourth and third constraints in Equation 25, and I_L ∈ ℝ^{L×L} and 𝟏_L ∈ ℝ^L denote respectively the L×L identity matrix and the all-one vector of length L. It is clear that rk([A I_L; B 0]) ≤ n + L. This implies that there exists a feasible solution (𝒚,𝒛) with at most n + L nonzeros, and thus 𝒚 = (y(j,k))_{(j,k)∈[n]×[q]} has at most n + L nonzeros; indeed, such solutions are known as basic feasible solutions. Note that for every block j ∈ [n], ∑_{k=1}^q y(j,k) = 1. This implies that y(j,1),…,y(j,q) cannot all be simultaneously 0; moreover, if q−1 of them are 0, the remaining one is forced to be 1. Since there are n blocks in total, by the pigeonhole principle there are at least n−L choices of j ∈ [n] such that 𝒚(j) = (y(j,1),…,y(j,q)) ∈ ∂Δ. Without loss of generality, we assume that these n−L indices are 1,…,n−L. Let 𝒓 ∈ [q]^n be such that φ(𝒓(j)) = 𝒚(j) for j = 1,…,n−L and 𝒓(j) is an arbitrary value in [q] for j = n−L+1,…,n. Since d(𝒙_i(j),𝒚(j)) ∈ [0,1] and d_H(𝒄_i(j),𝒓(j)) ∈ {0,1}, the difference between d(𝒙_i,𝒚) and d_H(𝒄_i,𝒓) is at most L. The proof is completed.
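The following is a minimal sketch of ours (not the paper's code, and assuming SciPy is available) of the feasibility LP in Equation 19, with t taken as an absolute (unnormalized) radius. The HiGHS solver typically returns a basic feasible solution, so all but roughly n + L of the y(j,k) vanish, matching the rounding step above:

```python
import numpy as np
from scipy.optimize import linprog

def relaxed_center(codewords, q, t):
    """Feasibility version of LP (19): find y in Delta^n with
    d(x_i, y) <= t (absolute) for all i; variables y(j, k) are
    flattened at index j*q + (k - 1)."""
    L, n = len(codewords), len(codewords[0])
    A_eq = np.zeros((n, n * q))
    for j in range(n):
        A_eq[j, j * q:(j + 1) * q] = 1          # each block sums to 1
    # Using sum_k y(j, k) = 1, the distance constraint for codeword i
    # simplifies to n - sum_j y(j, c_i(j)) <= t.
    A_ub = np.zeros((L, n * q))
    for i, c in enumerate(codewords):
        for j in range(n):
            A_ub[i, j * q + (c[j] - 1)] = -1
    res = linprog(np.zeros(n * q), A_ub=A_ub, b_ub=np.full(L, t - n),
                  A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))
    return res.x.reshape(n, q) if res.success else None

print(relaxed_center([(1, 1, 2), (1, 2, 2)], q=3, t=1.0))
# Most rows (blocks) of the output are already vertices of the simplex.
```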

We further relax rad by defining the weighted average radius. For 𝒙₁,…,𝒙_L ∈ (∂Δ)^n and ω ∈ Δ([L]), let

rad̄_ω(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} 𝔼_{i∼ω}[d(𝒙_i,𝒚)] = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚).

In words, weighted average radius is obtained by replacing the maximization over i[L] in the definition of relaxed radius (see Equation 12) with an average with respect to a distribution ω.

Since the objective of the minimization is separable, one can minimize over each 𝒚(j) individually and obtain an alternative expression. Suppose 𝒙₁,…,𝒙_L ∈ (∂Δ)^n are the images of 𝒄₁,…,𝒄_L ∈ [q]^n under the embedding φ. Then

rad̄_ω(𝒙₁,…,𝒙_L) = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚)
= (1/(2n)) min_{(𝒚₁,…,𝒚_n)∈Δ^n} ∑_{j=1}^n [∑_{i∈[L]} ω(i)(1 − 𝒚_j(𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} 𝒚_j(x))]
= (1/(2n)) ∑_{j=1}^n min_{𝒚_j∈Δ} [∑_{i∈[L]} ω(i)(1 − 2𝒚_j(𝒄_i(j)) + ∑_{x∈[q]} 𝒚_j(x))]   (30)
= (1/(2n)) ∑_{j=1}^n min_{𝒚_j∈Δ} [∑_{i∈[L]} ω(i)(2 − 2𝒚_j(𝒄_i(j)))]
= (1/n) ∑_{j=1}^n min_{𝒚_j∈Δ} [1 − ∑_{i∈[L]} ω(i)·𝒚_j(𝒄_i(j))] = 1 − (1/n) ∑_{j=1}^n max_{x∈[q]} ∑_{i∈[L]: 𝒄_i(j)=x} ω(i).   (31)

Equation 30 holds since the objective in brackets only depends on 𝒚_j, and not on the other (𝒚_{j′})_{j′∈[n]∖{j}}. To see Equation 31, note that a maximizer 𝒚∗ ∈ Δ of the problem max_{𝒚∈Δ} ∑_{i∈[L]} ω(i)·y(x_i), where ω ∈ Δ([L]) and (x₁,…,x_L) ∈ [q]^L are fixed, is given by 𝒚∗ = 𝒆_{x∗}, where x∗ ∈ [q] satisfies x∗ ∈ argmax_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i = x}.
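The closed form in Equation 31 is straightforward to evaluate. The following direct implementation is our own sketch (the function name is ours):

```python
def weighted_avg_radius(codewords, omega, q):
    """Equation 31: 1 - (1/n) * sum_j max_x sum_{i: c_i(j) = x} omega(i)."""
    n = len(codewords[0])
    total = 0.0
    for j in range(n):
        mass = {x: 0.0 for x in range(1, q + 1)}
        for i, c in enumerate(codewords):
            mass[c[j]] += omega[i]          # omega-weight of symbol c_i(j)
        total += max(mass.values())          # omega-plurality of column j
    return 1 - total / n

cs = [(1, 1, 2), (1, 2, 2), (3, 2, 1)]
L = len(cs)
print(weighted_avg_radius(cs, [1 / L] * L, q=3))  # 1/3 with omega = U_L
```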

Obviously, by definition, for any 𝒙₁,…,𝒙_L ∈ (∂Δ)^n and ω ∈ Δ([L]), rad̄_ω(𝒙₁,…,𝒙_L) ≤ rad(𝒙₁,…,𝒙_L). In fact, the following lemma shows that rad is equal to the maximum of rad̄_ω over ω.

Lemma 8 (rad equals maximum rad¯ω).

For any 𝒙₁,…,𝒙_L ∈ (∂Δ)^n,

rad(𝒙₁,…,𝒙_L) = max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L).

Proof.

Note that

rad(𝒙₁,…,𝒙_L) := (1/n) min_{𝒚∈Δ^n} max_{i∈[L]} d(𝒙_i,𝒚) = (1/n) min_{𝒚∈Δ^n} max_{ω∈Δ([L])} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚),

since the inner maximum is anyway achieved by a singleton distribution. Note also that the objective function

∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚) = ½ ∑_{i∈[L]} ω(i) ∑_{j=1}^n (1 − y(j,𝒄_i(j)) + ∑_{x∈[q]∖{𝒄_i(j)}} y(j,x))

is affine in ω and linear in 𝒚. Therefore, von Neumann’s minimax theorem allows us to interchange min and max and obtain

rad(𝒙₁,…,𝒙_L) = (1/n) max_{ω∈Δ([L])} min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚) = max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L),

as claimed by the lemma.

In fact, we can say something stronger: it is not necessary to maximize over the entire (uncountable) probability simplex Δ([L]). Instead, we can extract a finite subset Ω_L ⊆ Δ([L]) and maximize over this set to recover rad. The following lemma is analogous to [1, Lemma 6].

Lemma 9 (rad is achieved by finitely many ω).

For every L ∈ ℕ, there exists a finite set of probability measures Ω_L ⊆ Δ([L]) such that

rad(𝒙₁,…,𝒙_L) = max_{ω∈Ω_L} rad̄_ω(𝒙₁,…,𝒙_L)

for all 𝒙₁,…,𝒙_L ∈ (∂Δ)^n.

Proof.

The idea is to view the computation of max_{ω∈Δ([L])} rad̄_ω(𝒙₁,…,𝒙_L) as finding the maximum among a finite collection of linear-program optima over convex polytopes, and then to take Ω_L to be the set of vertices of these polytopes.

First, we define the convex polytopes based on a (q-ary version of a) signature. For each ω ∈ Δ([L]), we define a signature for ω, which is a function S_ω : [q]^L → [q] such that

S_ω(𝒖) ∈ argmax_{x∈[q]} ∑_{i: 𝒖(i)=x} ω(i)

for 𝒖 ∈ [q]^L. Define further the q halfspaces H_{𝒖,x} := {ω ∈ Δ([L]) : ∑_{i: 𝒖(i)=x} ω(i) ≥ 1/q} for x ∈ [q]. Observe that if S(𝒖) = x, where S is a signature for ω, then ω ∈ H_{𝒖,x}. Thus, by ranging over the choices of 𝒖 ∈ [q]^L and x ∈ [q], we obtain q^{L+1} halfspaces that partition the (L−1)-dimensional space Δ([L]) into at most ∑_{j≤L−1} binom(q^{L+1}, j) regions.

For each possible signature S : [q]^L → [q], let Ω_S = {ω ∈ Δ([L]) : S is a signature for ω}, and note that Ω_S is a convex polytope. Indeed, it is an intersection over 𝒖 ∈ [q]^L of the convex polytopes

{ω ∈ Δ([L]) : ∃ a signature S_ω for ω s.t. S_ω(𝒖) = S(𝒖)}
= ⋂_{y∈[q]∖{S(𝒖)}} {ω ∈ Δ([L]) : ∑_{i: 𝒖(i)=S(𝒖)} ω(i) ≥ ∑_{i: 𝒖(i)=y} ω(i)},
where S_ω is a signature for ω. Now, to maximize

rad̄_ω(𝒙₁,…,𝒙_L) = (1/n) min_{𝒚∈Δ^n} ∑_{i∈[L]} ω(i)·d(𝒙_i,𝒚)

over ω ∈ Δ([L]), consider the sets T_𝒖 = {i ∈ [n] : (𝒄₁(i),…,𝒄_L(i)) = 𝒖} for 𝒖 ∈ [q]^L, and let a_𝒖 = |T_𝒖|/n. We claim it suffices to optimize the following linear function:

∑_{𝒖∈[q]^L} a_𝒖 y_𝒖,  where y_𝒖 = ∑_{i: 𝒖(i)=S(𝒖)} ω(i),   (32)

over all ω ∈ Ω_S. Indeed, by Equation 31, we have

rad̄_ω(𝒙₁,…,𝒙_L) = 1 − (1/n) ∑_{j=1}^n max_{x∈[q]} ∑_{i∈[L]: 𝒄_i(j)=x} ω(i).

This implies that an optimizer only depends on the index sets T_𝒖, and furthermore that its value is determined by the a_𝒖’s as in Equation 32.

We can thus take the union of all vertex sets of all polytopes ΩS for all signatures S. Multiplying this by the Oq,L(1) regions defined by all the halfspaces H𝒖,x we obtain a finite set of vertices, as desired.

3.2 Properties of 𝒇(𝑷,𝝎)

Now, we consider the expected weighted average radius of a sequence of i.i.d. symbols. Specifically, for P ∈ Δ([q]) and ω ∈ Δ([L]), let

f(P,ω) := 𝔼_{(X₁,…,X_L)∼P^L}[rad̄_ω(𝒆_{X₁},…,𝒆_{X_L})].

[1] studies f(P,ω) for q = 2 and even L. In this case, one can take advantage of the fact that P ∈ Δ([2]) may be parametrized by a single real number, thereby yielding a fairly simple expression for f(P,ω).

Nonetheless, in this subsection we will show that all the properties of f(P,ω) established in [1] for q = 2 and even L can be generalized to any q ≥ 3 and any L. Let us first provide a more explicit expression for f(P,ω) using Equation 31:

f(P,ω) = 𝔼_{(X₁,…,X_L)∼P^L}[1 − max_{x∈[q]} ∑_{i∈[L]: X_i=x} ω(i)]
= 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i = x}.

We define the shorthand notation

max_ω(x₁,…,x_L) := max_{x∈[q]} ∑_{i∈[L]: x_i=x} ω(i)   (33)

for any ω ∈ Δ([L]) and (x₁,…,x_L) ∈ [q]^L.

The first property that we would like to establish is that f(P,ω) only increases if ω is replaced by U_L, and furthermore that the maximum is uniquely attained at U_L if P(x) > 0 for all x ∈ [q]. In order to do this, we will repeatedly “average out” coordinates of ω and then show that the function value increases (or at least, does not decrease). To introduce some terminology, for S ⊆ [L] we say that ω̄ is obtained from ω by averaging out the subset S of coordinates if ω̄ is defined as

ω̄(i) = { (∑_{j∈S} ω(j))/|S| if i ∈ S;  ω(i) if i ∉ S. }
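The following numeric sketch of ours evaluates f(P,ω) via the explicit expression above and illustrates the averaging-out operation; consistent with Lemmas 10 to 12 below, averaging coordinates of ω can only increase f(P,ω):

```python
from itertools import product

def f(P, omega):
    """1 - sum_x (prod_i P(x_i)) * max_omega(x), as in the expression above."""
    q, L = len(P), len(omega)
    val = 0.0
    for x in product(range(q), repeat=L):
        prob = 1.0
        for xi in x:
            prob *= P[xi]
        val += prob * max(sum(omega[i] for i in range(L) if x[i] == s)
                          for s in range(q))
    return 1 - val

def average_out(omega, S):
    avg = sum(omega[i] for i in S) / len(S)
    return [avg if i in S else w for i, w in enumerate(omega)]

P = [1 / 3] * 3                       # uniform distribution on [3]
omega = [0.5, 0.3, 0.1, 0.1]          # a non-uniform weighting, L = 4
omega_bar = average_out(omega, {0, 1})
print(f(P, omega), f(P, omega_bar), f(P, [0.25] * 4))
# The printed values are non-decreasing, as predicted.
```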

The following lemma gives a simple criterion for establishing that, if ω̄ is obtained from ω by averaging two coordinates, then f(P,ω̄) ≥ f(P,ω); it furthermore gives a criterion for the inequality to be strict. The main thrust of the proof of Lemma 11 is thus to show that this criterion is always satisfied.

Lemma 10.

Let P ∈ Δ([q]) and ω ∈ Δ([L]). Suppose ω(L−1) ≤ ω(L) and that ω̄ ∈ Δ([L]) is obtained by averaging out the last two coordinates of ω. Suppose that for all (x₁,…,x_L) ∈ [q]^L we have

½(max_ω(x₁,…,x_{L−1},x_L) + max_ω(x₁,…,x_L,x_{L−1})) ≥ max_{ω̄}(x₁,…,x_{L−1},x_L).   (34)

Then f(P,ω̄) ≥ f(P,ω).

Furthermore, suppose that additionally there exists (x₁,…,x_L) ∈ [q]^L with ∏_{i=1}^L P(x_i) > 0 such that the inequality in Equation 34 is strict. Then f(P,ω̄) > f(P,ω).

Proof.

Define ω′ ∈ Δ([L]) as

ω′(i) := { ω(i) if i ∈ [L]∖{L−1,L};  ω(L) if i = L−1;  ω(L−1) if i = L. }

That is, ω′ is obtained by swapping the last two components of ω. By symmetry, we have f(P,ω) = f(P,ω′) and so

f(P,ω) = ½(f(P,ω) + f(P,ω′))
= ½(1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_ω(x₁,…,x_{L−1},x_L)
  + 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{ω′}(x₁,…,x_{L−1},x_L))
= 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·½(max_ω(x₁,…,x_{L−1},x_L) + max_ω(x₁,…,x_L,x_{L−1}))
≤ 1 − ∑_{(x₁,…,x_L)∈[q]^L} (∏_{i=1}^L P(x_i))·max_{ω̄}(x₁,…,x_{L−1},x_L)
= f(P,ω̄),

where the inequality follows from Equation 34. From the above chain, it is also clear that if additionally there exists (x₁,…,x_L) ∈ [q]^L with ∏_{i=1}^L P(x_i) > 0 for which the inequality in Equation 34 is strict, then f(P,ω̄) > f(P,ω).

We now establish that the function value cannot decrease if ω is replaced by UL.

Lemma 11.

For any P ∈ Δ([q]) and ω ∈ Δ([L]), f(P,ω) ≤ f(P,U_L).

Proof.

Fix any (x₁,…,x_L) ∈ [q]^L. Let ω ∈ Δ([L]) be non-uniform. Without loss of generality, assume ω(L−1) < ω(L). Let ω̄ ∈ Δ([L]) be obtained by uniformizing the last two components of ω, i.e.,

ω̄(i) := { ω(i) if i ∈ [L]∖{L−1,L};  ½(ω(L−1)+ω(L)) if i ∈ {L−1,L}. }

We claim f(P,ω̄) ≥ f(P,ω). By Lemma 10, we just need to establish Equation 34.

Equation 34 trivially holds if x_{L−1} = x_L. We therefore assume below that x_{L−1} ≠ x_L. Let x_{L−1} = a and x_L = b. Let

ω(a) := ∑_{i∈[L−2]: x_i=a} ω(i),  ω(b) := ∑_{i∈[L−2]: x_i=b} ω(i).

Then, we have

∑_{i∈[L]: x_i=a} ω̄(i) = ω(a) + ½(ω(L−1)+ω(L)),  ∑_{i∈[L]: x_i=b} ω̄(i) = ω(b) + ½(ω(L−1)+ω(L)).

We first assume that there exists c ∉ {a,b} such that

max_{ω̄}(x₁,…,x_L) = ∑_{i∈[L]: x_i=c} ω̄(i) = ∑_{i∈[L]: x_i=c} ω(i),

where the second equality follows since the set {i ∈ [L] : x_i = c} does not contain L−1, L. Equation 34 therefore holds, as

max_ω(x₁,…,x_{L−1},x_L) ≥ ∑_{i∈[L]: x_i=c} ω(i),  max_ω(x₁,…,x_L,x_{L−1}) ≥ ∑_{i∈[L]: x_i=c} ω(i).

We proceed to the case that

max_{ω̄}(x₁,…,x_L) = max{½(ω(L−1)+ω(L)) + ω(a), ½(ω(L−1)+ω(L)) + ω(b)}.

Equation 34 holds as

max_ω(x₁,…,x_{L−1},x_L) ≥ max{ω(a)+ω(L−1), ω(b)+ω(L)}

and

max_ω(x₁,…,x_L,x_{L−1}) ≥ max{ω(a)+ω(L), ω(b)+ω(L−1)}.

Thus, Lemma 10 implies f(P,ω̄) ≥ f(P,ω), as desired.

We can then continue averaging components of ω and in this way obtain a sequence (ω_i)_{i∈ℕ} of distributions with ω₁ = ω. This sequence converges in ℓ_∞-norm to the uniform distribution U_L and satisfies f(P,ω_{i+1}) ≥ f(P,ω_i) for all i. Observing that ω ↦ f(P,ω) is a continuous function – the term max_{x∈[q]} ∑_{i∈[L]} ω(i)·𝟙{x_i=x} is a maximum over linear functions of ω, hence continuous, implying that f(P,⋅) is a linear combination of continuous functions – it follows that f(P,U_L) = lim_{i→∞} f(P,ω_i), and in particular that f(P,U_L) ≥ f(P,ω₁) = f(P,ω), as desired.

We now strengthen the conclusion of Lemma 11 by showing that for all q ≥ 3 and L ≥ 2 the function ω ↦ f(P,ω) is uniquely maximized by the setting ω = U_L, except for degenerate cases where P(x) = 0 for some x ∈ [q].

Before stating and proving this fact, we note that the proof of Lemma 11 in fact shows that we can average out any subset of coordinates of ω and only increase the value of f(P,ω). We formalize this in the following lemma, which will be useful in the subsequent arguments.

Lemma 12.

Let P ∈ Δ([q]), ω ∈ Δ([L]) and S ⊆ [L]. Let ω̄ be obtained from ω by averaging out the subset S of coordinates. Then f(P,ω̄) ≥ f(P,ω).

Theorem 13.

Let q ≥ 3, L ≥ 2 and let P ∈ Δ([q]) be such that P(x) > 0 for all x ∈ [q]. (In fact, our proof is only applied with P = U_q, which clearly satisfies the condition.) Then for all ω ∈ Δ([L]), f(P,ω) ≤ f(P,U_L), with equality if and only if ω = U_L.

Proof.

The inequality was already established in Lemma 11, so we focus on showing that ω = U_L whenever f(P,ω) = f(P,U_L). As q ≥ 3, let a, b and c denote 3 distinct elements of [q]. Let ω ≠ U_L and suppose for a contradiction that f(P,ω) is a maximum of the function ω′ ↦ f(P,ω′). The proof proceeds via a number of cases.

  1. 𝑳 is even. Without loss of generality, ω(L−1) < ω(L). If L ≥ 4, let ω′ be obtained from ω by averaging out the first L−2 coordinates; by Lemma 12, f(P,ω′) ≥ f(P,ω). If L = 2, set ω′ = ω.

    Since 2 | (L−2), if L ≥ 4 we can set x₁ = ⋯ = x_{L/2−1} = a and x_{L/2} = ⋯ = x_{L−2} = b. Set further x_{L−1} = a and x_L = b. We observe that for this (x₁,…,x_L) ∈ [q]^L and ω̄ obtained from ω′ by averaging out the last two coordinates, Equation 34 holds strictly. Indeed,

    max_{ω′}(x₁,…,x_{L−1},x_L) + max_{ω′}(x₁,…,x_L,x_{L−1}) = 2∑_{i=1}^{(L−2)/2} ω′(i) + 2ω(L)

    and

    2·max_{ω̄}(x₁,…,x_{L−1},x_L) = 2∑_{i=1}^{(L−2)/2} ω′(i) + ω(L−1) + ω(L).

    Thus Lemma 10 implies f(P,ω̄) > f(P,ω′) ≥ f(P,ω), a contradiction.

  2. 𝑳 is odd and at least three components of ω take distinct values, or the components of ω take only two distinct values and at least two of them attain the minimum value. Without loss of generality, ω(L−2) ≤ ω(L−1) < ω(L). If L ≥ 5, let ω′ be obtained from ω by averaging out the first L−3 coordinates; by Lemma 12, f(P,ω′) ≥ f(P,ω). If L = 3, set ω′ = ω.

    Since 2 | (L−3), if L ≥ 5 we set x₁ = ⋯ = x_{(L−1)/2−1} = a and x_{(L−1)/2} = ⋯ = x_{L−3} = b. Let x_{L−2} = c, x_{L−1} = a and x_L = b. We observe that for this (x₁,…,x_L) ∈ [q]^L and ω̄ obtained from ω′ by averaging out the last two coordinates, Equation 34 holds strictly. Indeed, since ω(L−2) < ω(L) we have

    max_{ω′}(x₁,…,x_{L−1},x_L) + max_{ω′}(x₁,…,x_L,x_{L−1}) = 2∑_{i=1}^{(L−3)/2} ω′(i) + 2ω(L)

    and

    2·max_{ω̄}(x₁,…,x_{L−1},x_L) = 2∑_{i=1}^{(L−3)/2} ω′(i) + ω(L−1) + ω(L).

    Thus Lemma 10 implies f(P,ω̄) > f(P,ω′) ≥ f(P,ω), a contradiction.

  3. 𝑳 is odd and only one component attains the minimum value. That is, ω(1) = ω(2) = ⋯ = ω(L−1) > ω(L). Let ω′ be obtained from ω by averaging out the subset {L−1, L}. Then f(P,ω′) ≥ f(P,ω) by Lemma 12, and moreover ω′ is such that at least two coordinates attain the minimum value, as ω′(1) = ⋯ = ω′(L−2) > ω′(L−1) = ω′(L). The argument from the previous case can now be applied to derive a contradiction.

Thus, except for degenerate choices of P ∈ Δ([q]), it follows that the function ω ↦ f(P,ω) is maximized exactly by the choice ω = U_L. The next step is to determine the distribution P ∈ Δ([q]) maximizing P ↦ f(P,U_L). At this point, we can rely on a main result of [25], as the function P ↦ 1 − f(P,U_L) is the same as the function f_{q,L}(P) defined in [25, Equation (17)]. It is shown therein that f_{q,L}(P) is strictly Schur convex, which in particular means that f_{q,L}(P) has a unique minimum at P = U_q. That is, f(P,U_L) has a unique maximum at P = U_q.

The (strict) Schur convexity also implies the following: if p = max_{x∈[q]} P(x), then f_{q,L}(P) ≥ f_{q,L}(P_{q,p}), where

P_{q,p}(x) = { (1−p)/(q−1) if x ∈ {1,2,…,q−1};  p if x = q. }   (35)

That is, we can conclude that f(P,U_L) ≤ f(P_{q,p},U_L). We encapsulate these facts in the following proposition.

Proposition 14 (Theorem 1,2 [25]).

Let q ≥ 2, L ≥ q and P ∈ Δ([q]). Suppose p = max_{x∈[q]} P(x). Then f(P,U_L) ≤ f(P_{q,p},U_L). Furthermore, f(P_{q,p},U_L) ≤ f(U_q,U_L), and p ↦ f(P_{q,p},U_L) is monotone decreasing for p ≥ 1/q. Lastly, f(P_{q,p},U_L) is concave in p on [1/q,1]; i.e., (1/n)∑_{i=1}^n f(P_{q,p_i},U_L) ≤ f(P_{q,p̄},U_L) with p̄ = (1/n)∑_{i=1}^n p_i.
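The monotonicity and concavity in Proposition 14 can be checked numerically on a grid (this is our own sanity check, consistent with, but of course not a proof of, the proposition):

```python
from itertools import product

def f(P, omega):  # the explicit expression for f from Section 3.2
    q, L = len(P), len(omega)
    val = 0.0
    for x in product(range(q), repeat=L):
        prob = 1.0
        for xi in x:
            prob *= P[xi]
        val += prob * max(sum(omega[i] for i in range(L) if x[i] == s)
                          for s in range(q))
    return 1 - val

q, L = 3, 3
U_L = [1 / L] * L
vals = []
for k in range(7):
    p = 1 / q + k * (1 - 1 / q) / 6           # grid on [1/q, 1]
    P = [(1 - p) / (q - 1)] * (q - 1) + [p]   # the distribution P_{q,p}
    vals.append(f(P, U_L))
print(all(vals[i] >= vals[i + 1] for i in range(6)))             # decreasing
print(all(vals[i] - 2 * vals[i + 1] + vals[i + 2] <= 1e-9
          for i in range(5)))     # nonpositive second differences: concavity
```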

A further fact that we have from [25] is that

p∗(q,L) = f(U_q,U_L).

In fact, this was taken as the definition of p∗(q,L). To end this subsection, we prove the following theorem by utilizing the concavity of f(P_{q,p},U_L).

Theorem 15.

Assume rad_H(𝒞) ≤ p. Then for any ω ∈ Δ([L]) we have

𝔼_{(𝒄₁,…,𝒄_L)∼𝒞^L}[rad̄_ω(φ(𝒄₁),…,φ(𝒄_L))] ≤ f(P_{q,1−p},U_L).

Proof.

Let 𝒚 be a center attaining rad_H(𝒞). Without loss of generality, we can assume 𝒚 is the all-zero vector. Let P_i be the distribution of the symbols in the i-th coordinate of 𝒞, i.e., P_i(j) = Pr[𝒄(i) = j] with the probability taken over a uniformly random 𝒄 ∈ 𝒞. Let p_i = max_{x∈[q]} P_i(x) and p̄ = (1/n)∑_{i=1}^n p_i. Clearly, p_i ≥ 1/q; moreover, since every codeword agrees with 𝒚 in at least (1−p)n coordinates, p̄ ≥ 1−p. Then, we have

𝔼_{(𝒄₁,…,𝒄_L)∼𝒞^L}[rad̄_ω(φ(𝒄₁),…,φ(𝒄_L))] = (1/n)∑_{i=1}^n f(P_i,ω)
≤ (1/n)∑_{i=1}^n f(P_i,U_L) ≤ (1/n)∑_{i=1}^n f(P_{q,p_i},U_L) ≤ f(P_{q,p̄},U_L) ≤ f(P_{q,1−p},U_L).

The first inequality is due to Lemma 11, and the second and third inequalities are due to Proposition 14. The last inequality is due to rad_H(𝒞) ≤ p (which, as noted, forces p̄ ≥ 1−p) together with the monotonicity of p′ ↦ f(P_{q,p′},U_L). The proof is completed.

3.3 Abundance of Random-Like 𝑳-tuples

Recall the notation

rad(𝒞) = (1/n) min_{𝒚∈[q]^n} max_{𝒄∈𝒞} d_H(𝒄,𝒚)

and

𝗍𝗒𝗉𝖾_𝒖(𝒄₁,…,𝒄_L) = (1/n)∑_{i=1}^n 𝟙{(𝒄₁(i),…,𝒄_L(i)) = 𝒖},

where 𝒄_i = (𝒄_i(1),…,𝒄_i(n)) ∈ [q]^n and 𝒖 ∈ [q]^L. In this subsection, we prove that a code 𝒞 ⊆ [q]^n either contains a large subcode 𝒞′ ⊆ 𝒞 with radius rad(𝒞′) ≤ 1 − 1/q − ε, or most of its L-tuples have type close to the uniform distribution (for all 𝒖 ∈ [q]^L).

We first show that for any projection π_A with |A| ≥ μn (for some parameter μ ∈ [0,1]), the projection π_A(𝒞) almost preserves the radius rad(𝒞), with small loss. Then, if rad(𝒞′) > 1 − 1/q − ε for every subcode 𝒞′ of large size, we find a codeword 𝒄₁ in 𝒞 whose symbol distribution is close to uniform; in fact, most codewords in 𝒞 satisfy this requirement. Let A_i be the index set on which 𝒄₁ takes value i. We apply π_{A_i} to 𝒞 and claim that π_{A_i}(𝒞) preserves the radius. Thus, we can find a codeword 𝒄₂ such that for every i ∈ [q], the symbol distribution of π_{A_i}(𝒄₂) is close to uniform; moreover, most codewords in 𝒞 satisfy this requirement. The proof is then completed by induction.

Lemma 16.

Let π_A : [q]^n → [q]^A be the projection onto a set A of size m. Suppose 𝒞 ⊆ [q]^n is a code of size qs satisfying rad_H(π_A(𝒞)) ≤ 1 − 1/q − ε. Then, there exists a subcode 𝒞′ ⊆ 𝒞 of size at least s such that rad_H(𝒞′) ≤ 1 − 1/q − (m/n)·ε.

Proof.

The proof is omitted due to the limited space.

Theorem 17.

Let L be fixed. For every ε > 0, there exists a δ > 0 with the following property. For every natural number s, there exist constants M₀ = M₀(s) and c(s) such that for any code 𝒞 ⊆ [q]^n of size M ≥ M₀, at least one of the following must hold:

  1. There exists 𝒞′ ⊆ 𝒞 such that |𝒞′| ≥ s and rad_H(𝒞′) ≤ 1 − 1/q − δ.

  2. There exist at least M^L − c(s)·M^{L−1} many L-tuples of distinct codewords (𝒄₁,…,𝒄_L) in 𝒞 such that for all 𝒖 ∈ [q]^L we have

    |𝗍𝗒𝗉𝖾_𝒖(𝒄₁,…,𝒄_L) − q^{−L}| ≤ ε

    and thus

    |rad̄_ω(φ(𝒄₁),…,φ(𝒄_L)) − f(U_q,ω)| ≤ q^L·ε.

Proof.

The proof is omitted due to the limited space.

3.4 Putting Everything Together

The argument follows the same line of reasoning as [1]; the full proof is deferred to the full version. Define ρ_L(𝒞) = min rad(φ(𝒄₁),…,φ(𝒄_L)), with the minimum taken over all L-tuples (𝒄₁,…,𝒄_L) ∈ 𝒞^L with distinct elements, where we recall that rad denotes the relaxed Chebyshev radius (Definition 5).

Theorem 18.

Let L ≥ 2 and q ≥ 3. If 𝒞 ⊆ [q]^n is (p∗(q,L)+ε, L)-list-decodable, then |𝒞| = O_{q,L}(1/ε).

Proof.

The proof is omitted due to the limited space.

4 Zero-Rate List-Recovery

In this section, we show how our results on list-decoding can naturally be extended to list-recovery. Due to the page limit, we refer the reader to the full version. The result is summarized in Theorem 1.

5 Code Construction

In this section, we present a simple simplex-like code construction and show that it attains the optimal size-radius trade-off by analyzing its list-decoding and -recovery radius.

Our construction will be identical for list-decoding and -recovery, and therefore we will directly analyze its list-recovery radius. Before presenting the construction and its analysis, let us define the average radius rad̄. This is a standard notion that “linearizes” the Chebyshev radius rad and often finds use in the analysis of list-recoverable codes in the literature. The definition reads as follows: for any 𝒄₁,…,𝒄_L ∈ [q]^n,

rad̄(𝒄₁,…,𝒄_L) := (1/L) min_{𝒀∈𝒳^n} ∑_{i=1}^L d_LR(𝒄_i,𝒀),

where d_LR(𝒄_i,𝒀) = |{j ∈ [n] : 𝒄_i(j) ∉ Y_j}|.

It is well-known and easy to verify (by, e.g., following the derivations leading to Equation 31) that the above minimization admits the following explicit solution:

rad̄(𝒄₁,…,𝒄_L) = ∑_{j=1}^n (1 − (1/L)·𝗉𝗅_ℓ(𝒄₁(j),…,𝒄_L(j))),   (36)

where 𝗉𝗅_ℓ outputs the total number of occurrences of the ℓ most popular symbols.

Equation 36 should be interpreted as the average distance from each 𝒄_i to the “centroid” 𝒀∗ ∈ 𝒳^n of the list, defined by

𝒀∗(j) ∈ argmax_{A∈𝒳} ∑_{i=1}^L 𝟙{𝒄_i(j) ∈ A}

for each j ∈ [n]. (If there are multiple maximizers, take an arbitrary one; the value of rad̄ remains the same.)
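The explicit formula in Equation 36 is easy to implement. The following sketch is ours; it computes the (unnormalized) average radius via the ℓ-plurality of each column:

```python
from collections import Counter

def avg_radius_LR(codewords, ell):
    """Equation 36 (unnormalized): sum over columns of 1 - pl_ell/L,
    where pl_ell is the total count of the ell most popular symbols."""
    L = len(codewords)
    total = 0.0
    for j in range(len(codewords[0])):
        counts = sorted(Counter(c[j] for c in codewords).values(),
                        reverse=True)
        total += 1 - sum(counts[:ell]) / L
    return total

print(avg_radius_LR([(1, 2, 3), (2, 2, 1), (3, 1, 3)], ell=2))  # 1/3
```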

Finally, for integers q ≥ 1 and L ≥ 0, denote by

𝒜_{q,L} := {(a₁,…,a_q) ∈ ℤ_{≥0}^q : ∑_{i=1}^q a_i = L}

the set of q-partitions of L, i.e., a_i is the number of indices taking value i. For 𝒂 = (a₁,…,a_q) ∈ 𝒜_{q,L}, we shorthand binom(L; 𝒂) = binom(L; a₁,…,a_q) for the multinomial coefficient. Define max_ℓ{𝒂} = max_{A∈𝒳} ∑_{i∈A} a_i, i.e., the sum of the ℓ largest components of 𝒂.

Theorem 19 (Construction of zero-rate list-recoverable codes).

Fix any integers q ≥ 3, ℓ ≥ 1 and L ≥ 2. For any sufficiently large m, there exists a (p,ℓ,L)-list-recoverable code 𝒞 with blocklength

n = binom(qm; m,…,m),   (37)

and the trade-off between code size M and (relative) radius p given by:

M = qm,  p = p∗(q,ℓ,L) + c_{q,ℓ,L}·m^{−1} + O(m^{−2}),

where

c_{q,ℓ,L} := q^{−L} ∑_{𝒂∈𝒜_{q,L}} (max_ℓ{𝒂}/L)·binom(L; 𝒂)·[∑_{i=1}^q binom(a_i, 2) − (1/q)·binom(L, 2)] > 0.   (38)

Proof.

The proof is omitted due to the page limit.
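The constant in Equation 38 can be evaluated exactly for small parameters. The following sketch is ours and uses rational arithmetic, confirming positivity:

```python
from fractions import Fraction
from math import comb, factorial

def partitions(q, L):  # all a in A_{q,L}: q nonnegative integers summing to L
    if q == 1:
        yield (L,)
        return
    for first in range(L + 1):
        for rest in partitions(q - 1, L - first):
            yield (first,) + rest

def c_const(q, ell, L):
    total = Fraction(0)
    for a in partitions(q, L):
        multinom = factorial(L)
        for ai in a:
            multinom //= factorial(ai)
        max_ell = sum(sorted(a, reverse=True)[:ell])     # max_ell{a}
        bracket = (sum(Fraction(comb(ai, 2)) for ai in a)
                   - Fraction(comb(L, 2), q))
        total += Fraction(max_ell, L) * multinom * bracket
    return total / q**L

print(c_const(3, 1, 2), c_const(3, 2, 3))  # 1/9 and 2/27: positive, as claimed
```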

References

  • [1] Noga Alon, Boris Bukh, and Yury Polyanskiy. List-decodable zero-rate codes. IEEE Transactions on Information Theory, 65(3):1657–1667, 2018. doi:10.1109/TIT.2018.2868957.
  • [2] László Babai, Lance Fortnow, Noam Nisan, and Avi Wigderson. BPP has subexponential time simulations unless EXPTIME has publishable proofs. Comput. Complex., 3:307–318, 1993. doi:10.1007/BF01275486.
  • [3] L. A. Bassalygo. New upper bounds for error-correcting codes. Probl. of Info. Transm., 1:32–35, 1965.
  • [4] Vladimir M Blinovsky. Bounds for codes in the case of list decoding of finite volume. Problems of Information Transmission, 22:7–19, 1986.
  • [5] Vladimir M Blinovsky. Code bounds for multiple packings over a nonbinary finite alphabet. Problems of Information Transmission, 41:23–32, 2005. doi:10.1007/S11122-005-0007-5.
  • [6] Vladimir M Blinovsky. On the convexity of one coding-theory function. Problems of Information Transmission, 44:34–39, 2008. doi:10.1134/S0032946008010031.
  • [7] Philippe Delsarte. An algebraic approach to the association schemes of coding theory. Philips Res. Rep. Suppl., 10:vi+–97, 1973.
  • [8] Dean Doron, Dana Moshkovitz, Justin Oh, and David Zuckerman. Nearly optimal pseudorandomness from hardness. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1057–1068. IEEE, 2020. doi:10.1109/FOCS46700.2020.00102.
  • [9] Dean Doron and Mary Wootters. High-probability list-recovery, and applications to heavy hitters. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.ICALP.2022.55.
  • [10] Peter Elias. List decoding for noisy channels. Wescon Convention Record, Part 2, pages 94–104, 1957.
  • [11] Peter Elias. Error-correcting codes for list decoding. IEEE Transactions on Information Theory, 37(1):5–12, 1991. doi:10.1109/18.61123.
  • [12] Edgar N Gilbert. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504–522, 1952.
  • [13] Oded Goldreich and Leonid A Levin. A hard-core predicate for all one-way functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing (STOC), pages 25–32. ACM, 1989. doi:10.1145/73007.73010.
  • [14] Venkatesan Guruswami, Christopher Umans, and Salil Vadhan. Unbalanced expanders and randomness extractors from parvaresh–vardy codes. Journal of the ACM (JACM), 56(4):1–34, 2009. doi:10.1145/1538902.1538904.
  • [15] Iftach Haitner, Yuval Ishai, Eran Omri, and Ronen Shaltiel. Parallel hashing via list recoverability. In Annual Cryptology Conference, pages 173–190. Springer, 2015. doi:10.1007/978-3-662-48000-7_9.
  • [16] Justin Holmgren, Alex Lombardi, and Ron D Rothblum. Fiat–shamir via list-recoverable codes (or: parallel repetition of gmw is not zero-knowledge). In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 750–760, 2021. doi:10.1145/3406325.3451116.
  • [17] Piotr Indyk, Hung Q Ngo, and Atri Rudra. Efficiently decodable non-adaptive group testing. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1126–1142. SIAM, 2010. doi:10.1137/1.9781611973075.91.
  • [18] Jeffrey C Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):414–440, 1997. doi:10.1006/JCSS.1997.1533.
  • [19] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993. doi:10.1137/0222080.
  • [20] V. I. Levenshtein. Application of Hadamard matrices to a problem in coding. Problems of Cybernetics, 5:123–136, 1961.
  • [21] Richard J Lipton. Efficient checking of computations. In Proceedings of the 7th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 207–215. Springer, 1990. doi:10.1007/3-540-52282-4_44.
  • [22] Robert J. McEliece, Eugene R. Rodemich, Howard Rumsey, Jr., and Lloyd R. Welch. New upper bounds on the rate of a code via the Delsarte-MacWilliams inequalities. IEEE Trans. Inform. Theory, IT-23(2):157–166, 1977. doi:10.1109/tit.1977.1055688.
  • [23] Hung Q Ngo, Ely Porat, and Atri Rudra. Efficiently decodable error-correcting list disjunct matrices and applications. In International Colloquium on Automata, Languages, and Programming, pages 557–568. Springer, 2011.
  • [24] Nicolas Resch. List-decodable codes:(randomized) constructions and applications. School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep., CMU-CS-20-113, 2020.
  • [25] Nicolas Resch, Chen Yuan, and Yihan Zhang. Zero-rate thresholds and new capacity bounds for list-decoding and list-recovery. arXiv preprint, 2022. doi:10.48550/arXiv.2210.07754.
  • [26] Madhu Sudan, Luca Trevisan, and Salil Vadhan. Pseudorandom generators without the XOR lemma. Journal of Computer and System Sciences, 62(2):236–266, 2001. doi:10.1006/JCSS.2000.1730.
  • [27] Michael A. Tsfasman, S. G. Vlăduț, and Th. Zink. Modular curves, Shimura curves, and Goppa codes, better than Varshamov-Gilbert bound. Mathematische Nachrichten, 109(1):21–28, 1982.
  • [28] R. R. Varshamov. Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117:739–741, 1957.
  • [29] Lloyd R. Welch, Robert J. McEliece, and Howard Rumsey, Jr. A low-rate improvement on the Elias bound. IEEE Trans. Inform. Theory, IT-20:676–678, 1974. doi:10.1109/tit.1974.1055279.
  • [30] Jack Wozencraft. List decoding. Quarter Progress Report, 48:90–95, 1958.