
Detecting and Correcting Computationally Bounded Errors: A Simple Construction Under Minimal Assumptions

Jad Silbak (Northeastern University, Boston, MA, USA)
Daniel Wichs (Northeastern University, Boston, MA, USA, and NTT Research, Sunnyvale, CA, USA)
Abstract

We study error detection and error correction in a computationally bounded world, where errors are introduced by an arbitrary polynomial time adversarial channel. We consider codes where the encoding procedure uses random coins and define two distinct variants: (1) in randomized codes, fresh randomness is chosen during each encoding operation and is unknown a priori, while (2) in self-seeded codes, the randomness of the encoding procedure is fixed once upfront and is known to the adversary. In both cases, the randomness need not be known to the decoding procedure, and there is no trusted common setup between the encoder and decoder. The encoding and decoding algorithms are efficient and run in some fixed polynomial time, independent of the run time of the adversary.

The parameters of standard codes for worst-case (inefficient) errors are limited by the Singleton bound: for rate R it is not possible to detect more than a 1−R fraction of errors, or uniquely correct more than a (1−R)/2 fraction of errors, and efficient codes matching this bound exist for sufficiently large alphabets. In the computationally bounded setting, we show that going beyond the Singleton bound implies one-way functions in the case of randomized codes and collision-resistant hash functions in the case of self-seeded codes. We construct randomized and self-seeded codes under these respective minimal assumptions with essentially optimal parameters over a constant-sized alphabet:

  • Detection: the codes have a rate R ≈ 1 while detecting a ρ ≈ 1 fraction of errors.

  • Correction: for any ρ < 1/2, the codes uniquely correct a ρ fraction of errors with rate R ≈ 1−ρ.

Codes for computationally bounded errors were studied in several prior works starting with Lipton (STACS ’94), but all such works either: (a) need some trusted common setup (e.g., public-key infrastructure, common reference string) between the encoder and decoder, or (b) only handle channels whose complexity is a priori bounded below that of the code.

Keywords and phrases:
Error Correction, One-Way Functions, Collision Resistant Hashing
Funding:
Jad Silbak: Research supported by the Khoury College Distinguished Postdoctoral Fellowship.
Daniel Wichs: Research supported by NSF CNS-2349972 and CNS-2055510.
Copyright and License:
© Jad Silbak and Daniel Wichs; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Cryptographic primitives
Related Version:
Full Version: https://eprint.iacr.org/2024/1461 [22]
Acknowledgements:
We are very grateful to Ronen Shaltiel and Tal Yankovitz for very useful discussions.
Editor:
Raghu Meka

1 Introduction

The theory of error detection and error correction goes back to the seminal works of Shannon [21] and Hamming [7]. A sender encodes a message and sends the resulting codeword over a noisy channel that may introduce errors by modifying a subset of the codeword symbols. A receiver gets the modified codeword and attempts to decode the original message. The goal of error-detection is to recover the original message when there are no errors or detect the presence of errors. The goal of error-correction is to always uniquely recover the original message. The main parameters of interest are the rate R, defined as the ratio of message length to codeword length, and the relative error tolerance ρ, defined as the ratio between the number of errors that can be detected/corrected and the codeword length.

Starting with the work of Hamming [7], a large body of research studies codes for worst-case computationally unbounded (adversarial) channels that can introduce arbitrary error patterns. The error tolerance in this setting is fully determined by the distance of the code, which is subject to several well-established fundamental bounds. For example, the Singleton bound implies that for any rate R the relative error tolerance is ρ ≤ 1−R for error detection and ρ ≤ (1−R)/2 for error correction. Furthermore, there are efficient codes matching the Singleton bound for sufficiently large alphabet sizes.

Computationally Bounded Channels

We consider a setting similar to that of Hamming [7], where errors are introduced by a fully adversarial channel, but with the additional natural restriction that the channel is computationally bounded, and can run in arbitrary polynomial (or even sub-exponential) time. This restriction is well motivated – if an error pattern takes longer than the lifetime of the universe to compute, then we can safely assume it will not occur. Our goal is to construct efficient codes that can detect/correct errors in this setting with rate vs error-tolerance trade-offs that go beyond the Singleton bound. Many prior works starting with [11] and including [14, 12, 8, 4, 17, 1, 18, 19, 20, 3] have previously considered computationally bounded channels, either to get better parameters, or to achieve other desirable properties such as improved local/list decoding. However, all such prior works suffer from at least one of the following deficiencies:

  1. Trusted setup: Many works require a trusted coordinated setup between the encoding and decoding algorithms. For example, [11] assumes that they share a secret key, which is unknown to the adversarial channel. Alternatively, [12] requires a public-key infrastructure where the encoding algorithm has its own secret key, while the decoding algorithm (as well as the adversarial channel) gets the corresponding public key. In both cases, the encoding algorithm is also stateful and updates its state between operations. The use of secret keys and state are significant limitations that make such codes inapplicable in many natural settings. The work of Grossman, Holmgren and Yogev [3] introduces the notion of seeded codes that only requires a light trusted setup consisting of a common (uniformly) random seed shared by the encoder and decoder. They construct such seeded codes correcting ρ ≈ 1−R errors over a large alphabet based on strong and non-standard cryptographic hardness assumptions – namely a certain form of two-input correlation-intractable hash functions, which are currently only known to exist in the random-oracle model, but have no known provably secure instantiations under standard cryptographic hardness assumptions.

  2. Codes have higher complexity than adversary: Many works [4, 17, 18, 19, 20] restrict the complexity (time/size or space) of the adversarial channel to be bounded by some arbitrary polynomial chosen a priori, and the complexity of the encoding/decoding algorithms then scales accordingly to some sufficiently larger polynomial. In particular, the adversarial channel is not powerful enough to evaluate the encoding/decoding algorithms in these schemes. This is a significant limitation, since we cannot argue that such schemes protect against all errors that can arise in a computationally bounded world – a world that can compute the encoding/decoding algorithms of the scheme can also find error patterns to defeat them! In contrast, we would like to have efficient codes with some fixed polynomial run-time that protect against arbitrary polynomial (or even sub-exponential time) adversarial channels.

The different prior works and their properties are summarized in Table 1.

Table 1: Summary of related work. Seeded codes are also known as Monte-Carlo codes in the literature and require a common random seed shared by the encoder and decoder as a trusted setup. Randomized codes require fresh on-the-fly randomness at each encoding operation. Seeded + randomized codes require both simultaneously. Codes for channels in SIZE(n^c) have correspondingly larger run-time poly(n^c), while codes for channels in P/poly have some a priori fixed polynomial run-time independent of the channel. Some works focus on the binary alphabet while others focus on larger alphabets.
work | Setup/Type | Channel | Decoding | Assumption
[11] | symmetric-key, stateful encoding | P/poly | unique | one-way functions
[12] | public-key, stateful encoding | P/poly | unique | one-way functions
[4] | seeded + randomized | SIZE(n^c) | list | none
[17] | randomized | SIZE(n^c) | list | derandomization assumption
[18] | randomized | SPACE(n^δ) | unique | none
[19] | seeded + randomized | SIZE(n^c) | unique | none
[20] | randomized | SIZE(n^c) | unique | derandomization assumption
[3] | seeded | P/poly | unique | correlation-intractable hash (random oracle)
This work | randomized | P/poly | unique | one-way functions
This work | self-seeded | P/poly | unique | collision-resistant hash

1.1 Our Results

In this work we construct error detection and error correction codes for computationally bounded channels, while simultaneously achieving: (1) no trusted setup, (2) a fixed polynomial-time code that protects against arbitrary polynomial (or even sub-exponential) time adversarial channels, (3) provable guarantees under basic/minimal computational hardness assumptions, (4) essentially optimal trade-offs between rate and error tolerance beyond the Singleton bound for sufficiently large constant-sized alphabets.

Randomized/Self-Seeded Codes

Standard deterministic codes cannot achieve better parameters against (non-uniform) computationally bounded channels beyond what they achieve for computationally unbounded channels, and in particular, they cannot go beyond the Singleton bound. This is because a worst-case message and error pattern for which detection/correction fails can simply be hard-coded as non-uniform advice to an efficient adversary. To overcome this limitation, we therefore consider codes (Enc,Dec), where the encoding procedure Enc takes additional randomness as input. We consider two variants. In the basic variant of randomized codes, fresh randomness r for the encoding algorithm is chosen each time on the fly and is unknown to the adversary ahead of time. The adversary chooses a worst-case message m, gets the resulting randomized codeword c ← Enc(m;r) for a fresh random r, and then chooses the error e as an arbitrary efficiently computable function of the codeword. We also consider a stronger variant called self-seeded codes, where the randomness r of the encoding algorithm is fixed once and for all and known to the adversary a priori. That is, the adversary gets the randomness r and then uses it to adaptively choose the message m (which fully defines the codeword c = Enc(m;r)) along with the error e. In both cases, the fractional Hamming weight of e is bounded by the error tolerance ρ. Error detection ensures that with overwhelming probability the resulting erroneous codeword c′ = c + e decodes to m when there are no errors, and decodes to either m or ⊥ when there are errors. Error correction ensures that with overwhelming probability c′ decodes to m.

The main technical difference between randomized and self-seeded codes is adaptivity: in the former the message m to be encoded is chosen by the adversary before the randomness r is chosen, while in the latter the message m can adaptively depend on r. (Randomized codes may also be secret-coin, where the adversary never sees the full randomness r. However, our construction will be public-coin and the randomness r can be easily recovered from the codeword c.) Self-seeded codes allow us to fix the randomness for the code once and for all as a public seed, and afterwards the code is fully deterministic. Moreover, the seed can be chosen by the encoder unilaterally without any coordination with the decoder. For example, different software manufacturers can individually choose their own seeds to be used by the encoding algorithms in their software, while allowing for interoperability with the decoders implemented by other manufacturers without any coordination. Self-seeded codes should be contrasted with the notion of seeded codes in [3], which required a shared random seed as a common trusted setup between the encoder and decoder. Self-seeded codes are the strongest notion and directly imply the seemingly incomparable notions of randomized codes and seeded codes.

Minimal Assumptions

We show that any randomized code that goes beyond the Singleton bound with error detection tolerance ρ > 1−R or error correction tolerance ρ > (1−R)/2 implies one-way functions, while any such seeded (and therefore also self-seeded) code implies collision-resistant hash functions.

Main Theorem

We construct randomized and self-seeded codes going beyond the Singleton bound, with essentially optimal parameters for both error detection and error correction, under the above respective minimal assumptions.

Theorem 1 (Informal).

Assuming one-way functions (resp. collision-resistant hash functions) there exist efficiently computable randomized (resp. self-seeded) error-detection and error-correction codes for arbitrary polynomial-time adversarial errors with the following parameters:

  • Error Detection: For any constant 0 < ε < 1, there is a code over a constant-sized alphabet with rate R = 1−ε that detects a ρ = 1−ε fraction of errors.

  • Error Correction: For any constants 0 < ε < 1 and 0 ≤ ρ < 1/2, there is a code over a constant-sized alphabet with rate R = 1−ρ−ε that uniquely corrects from a ρ fraction of errors.

If we assume sub-exponentially secure one-way functions (resp. collision-resistant hash functions), then the above properties hold even for sub-exponential time adversarial errors.

Recall that, in the case of error-detection, the Singleton bound says that the relative error tolerance ρ ≤ 1−R necessarily degrades as the rate improves when considering worst-case errors; in contrast, for computationally bounded errors, our result simultaneously achieves the best possible error tolerance ρ ≈ 1 and rate R ≈ 1. In the case of error correction, the Singleton bound says the maximal error tolerance is ρ ≤ (1−R)/2 for worst-case errors, while for computationally bounded errors our result achieves a twice as large error tolerance ρ ≈ 1−R, subject to the upper limit ρ < 1/2.

Our achieved parameters are essentially optimal, up to the arbitrarily small constant ε. This is clear for error detection, where we get the best possible rate R ≈ 1 and error tolerance ρ ≈ 1 simultaneously. For error-correction, one cannot tolerate a ρ ≥ 1/2 fraction of errors since an efficient adversarial channel can modify 1/2 of the codeword positions to a fresh encoding of a different message, causing decoding to fail with probability at least 1/2. (The works of [11, 12] overcome this limitation by having a stateful and secret-keyed encoding procedure to ensure that an adversarial channel cannot generate/reuse valid codewords.) Moreover, one cannot have rate R > 1−ρ since zeroing out the first ρ fraction of positions of the codeword would remove information about the message, causing decoding to fail.

We remark that, not only are one-way functions and collision-resistant hash functions the minimal assumptions needed to achieve the above results, they are considered some of the most standard and well-studied cryptographic assumptions/primitives. In particular, one-way functions are a minimal assumption necessary for essentially any non-trivial task in cryptography and are implied by all concrete cryptographic hardness assumptions. Collision-resistant hash functions are not known to be implied by one-way functions, but can be constructed from essentially every concrete cryptographic hardness assumption, such as hardness of factoring, discrete logarithms, learning with errors, etc.

The closest related prior work of [3] achieves essentially the same parameters as our work, but only for the weaker notion of seeded codes, which require trusted setup, and only by assuming a strong form of “two-input correlation-intractable hash functions”, which we do not know how to instantiate under any standard cryptographic hardness assumption. Our result for self-seeded codes yields a stronger primitive under significantly weaker minimal assumptions. Comparing the notions of randomized, seeded and self-seeded codes, it is clear that self-seeded codes are the strongest notion easily implying the other two, while randomized and seeded codes are seemingly incomparable. Interestingly, since we show that any non-trivial seeded codes imply collision-resistant hashing, which in turn implies self-seeded codes, we show that seeded and self-seeded codes are equivalent in terms of the needed assumptions.

Simultaneous Correction and Detection

While we stated separate results for error-detection and error-correction above, we can also simultaneously achieve a non-trivial combination of correction and detection. In particular, we can augment the previous theorem statement with the following:

For any constants 0 < ε < 1 and 0 ≤ ρ_c < 1/2, there is a code over a constant-sized alphabet with rate R = 1−ρ_c−ε that simultaneously uniquely corrects from a ρ_c fraction of errors and detects a ρ_d = 1−ρ_c−ε fraction of errors.

The above implies both of the previously stated results for error detection (by setting ρ_c = 0) and error correction (by ignoring ρ_d). However, simultaneous error correction and detection is an interesting property, which is more than the sum of its parts. It gives a single decoding procedure that either outputs a message or ⊥ (error detection symbol) and is guaranteed to uniquely recover the correct message if the error rate is at most ρ_c and to never output the wrong message as long as the error rate is at most ρ_d. We can envision scenarios where correcting a small fraction of errors (e.g., ρ_c = .1) may suffice. Our result shows that for such instances, we can have a code with a correspondingly large rate (e.g., R ≈ .9) while simultaneously being able to detect a much larger fraction of errors (e.g., ρ_d ≈ .9) at no additional cost.

Alphabet Size

We note that our results hold for large constant-size alphabets, but do not extend to the case of a binary alphabet. Giving optimal results for binary codes that detect/correct computationally bounded errors under minimal assumptions remains an interesting open problem. The results of [3] for seeded error correcting codes using a strong form of correlation-intractable hash functions extend to the binary case and show that one can uniquely decode computationally bounded errors with essentially the same parameters as list-decoding for worst-case errors. It remains an open problem to achieve such results for randomized or self-seeded codes under any assumptions, and/or to get provably secure seeded codes under standard cryptographic hardness assumptions.

1.2 Our Techniques

Our techniques are quite simple in retrospect. We start by showing how to construct self-seeded codes assuming collision-resistant hashing, and then discuss how to modify these results to get analogous ones for randomized codes from one-way functions. Throughout the introduction, we use “≈” to denote that two values differ by an arbitrarily small constant.

Self-Seeded Error Detection with Sub-Exponential Alphabet

We start with a naive construction of self-seeded error-detection codes that requires a sub-exponentially large alphabet size, and then show how to reduce the alphabet size to some (large) constant. Let ℋ = {h_r : {0,1}^k → {0,1}^λ}_{r∈{0,1}^λ} be a collision-resistant hash function family with output length λ (security parameter), which we can think of as λ = k^α for some small constant α > 0. Given a random public seed r ← {0,1}^λ, no polynomial-time adversary can find two messages m ≠ m′ such that h_r(m) = h_r(m′).

The encoding algorithm Enc(m;r) takes a message m ∈ {0,1}^k and parses it as n = k/w symbols m = (m_1, …, m_n) over an alphabet Σ_data = {0,1}^w for some large w = O(λ). It then “folds in” the seed r and hash of the message h_r(m) into each message symbol to yield the codeword

c = ([m_1, r, h_r(m)], [m_2, r, h_r(m)], …, [m_n, r, h_r(m)]) ∈ Σ^n

interpreted as n symbols over the larger alphabet Σ = {0,1}^{w+2λ}. The decoding algorithm Dec(c′) gets

c′ = ([m′_1, r′_1, y′_1], [m′_2, r′_2, y′_2], …, [m′_n, r′_n, y′_n]) ∈ Σ^n,

which gives a candidate message m′ = (m′_1, …, m′_n). It checks that the second part of each codeword symbol has the same common seed value r′_1 = ⋯ = r′_n = r′ and that the third part of each symbol has the same common hash value y′_i = h_{r′}(m′); if so it outputs m′, else ⊥. The code achieves rate R = w/(w+2λ) ≈ 1 by choosing a sufficiently large w = O(λ). Moreover, it achieves error-tolerance ρ = 1 − 1/n. Unless the adversary changes every symbol of the codeword c, or finds a collision, decoding is guaranteed to output the correct message m or ⊥. To see this, assume at least one symbol of the codeword remains unchanged and decoding outputs m′ ≠ m. Then it must be the case that r′ = r and y′_i = h_r(m′) = h_r(m) for all i, and therefore this gives a collision m ≠ m′ such that h_r(m) = h_r(m′).
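To make the naive construction concrete, here is a minimal Python sketch. It is illustrative only: we stand in for the collision-resistant family h_r by keying SHA-256 with the seed (a heuristic placeholder, not a provably secure instantiation), and the names enc/dec are ours.

```python
import hashlib
import os

LAMBDA = 32  # seed/hash length in bytes (stand-in for the security parameter)

def h(r: bytes, m: bytes) -> bytes:
    # Heuristic stand-in for the collision-resistant family h_r:
    # key SHA-256 with the seed r (illustrative only).
    return hashlib.sha256(r + m).digest()

def enc(m: bytes, r: bytes, w: int) -> list:
    # Parse m into n = |m|/w data symbols and fold (r, h_r(m)) into each one.
    assert len(m) % w == 0
    tag = h(r, m)
    return [(m[i:i + w], r, tag) for i in range(0, len(m), w)]

def dec(c: list):
    # Check that every symbol carries the same seed and the same hash value,
    # and that the hash is consistent with the recovered message.
    m = b"".join(sym[0] for sym in c)
    r, tag = c[0][1], c[0][2]
    if all(sym[1] == r and sym[2] == tag for sym in c) and h(r, m) == tag:
        return m
    return None  # stands for the detection symbol ⊥

r = os.urandom(LAMBDA)
c = enc(b"hello world!", r, w=4)
assert dec(c) == b"hello world!"
c[1] = (b"XXXX", c[1][1], c[1][2])  # corrupt one data symbol
assert dec(c) is None               # the error is detected
```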

Self-Seeded Error Detection with Constant Alphabet

We reduce the alphabet size to a constant by applying a standard error-correcting code (Enc_ctrl, Dec_ctrl) over a constant-sized alphabet to the small “control” value (r, h_r(m)). In particular, for some constants w > v, we use a code with alphabet Σ_ctrl = {0,1}^v such that Enc_ctrl : {0,1}^{2λ} → Σ_ctrl^n maps a 2λ-bit message to a codeword of length n = k/w. The corresponding rate of the code is very small, R_ctrl = O(λ/k) = o(1), but we will want large distance δ_ctrl ≈ 1. Such codes can be constructed by concatenating Reed-Solomon codes with a brute-force inner code over the alphabet Σ_ctrl (see Theorem 13). We do not need efficient error-decoding for now, but only require an efficient way to recognize valid codewords and invert them without errors. The encoding algorithm of our self-seeded code Enc(m;r) first computes c_ctrl = (c_{ctrl,1}, …, c_{ctrl,n}) = Enc_ctrl((r, h_r(m))) ∈ Σ_ctrl^n. It interprets m ∈ {0,1}^k as m = (m_1, …, m_n) ∈ Σ_data^n consisting of n = k/w symbols over the alphabet Σ_data = {0,1}^w and then “folds” the two vectors together to yield the final codeword

c = ([m_1, c_{ctrl,1}], …, [m_n, c_{ctrl,n}]) ∈ Σ^n,

interpreted as n symbols over the alphabet Σ = Σ_data × Σ_ctrl = {0,1}^{w+v}. The decoding algorithm Dec(c′) gets

c′ = ([m′_1, c′_{ctrl,1}], …, [m′_n, c′_{ctrl,n}]) ∈ Σ^n,

and recovers a candidate message m′ = (m′_1, …, m′_n) as well as a candidate c′_ctrl = (c′_{ctrl,1}, …, c′_{ctrl,n}). It checks that c′_ctrl is a valid codeword and if so recovers (r′, y′) such that c′_ctrl = Enc_ctrl(r′, y′). If c′_ctrl is not a valid codeword or h_{r′}(m′) ≠ y′, it outputs ⊥, else it outputs m′. Note that this generalizes the previous naive idea with a sub-exponentially large alphabet, which corresponds to using a repetition code for the control code. The rate of the overall code is R = w/(w+v), which can be made arbitrarily close to 1 by choosing a sufficiently large constant w > v. The error-tolerance is ρ = δ_ctrl ≈ 1. In particular, if c′ is within relative distance ρ of c and c′_ctrl is a valid codeword, then it must be the case that c′_ctrl = c_ctrl and hence r′ = r, y′ = h_r(m). In this case, the only way that decoding can output some m′ ≠ m is if the values form a hash collision h_r(m′) = h_r(m).
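A sketch of the constant-alphabet variant, reusing h and LAMBDA from the sketch above. The control code is abstracted behind the three operations the text asks for – encoding, recognizing valid codewords, and inverting them – and the repetition code below is only a placeholder (as noted, it recovers the naive construction); a real instantiation would use a concatenated code of relative distance ≈ 1 (Theorem 13).

```python
def ctrl_encode(w_val: bytes, n: int) -> list:
    # Placeholder control code: the repetition code. A real instantiation
    # would use a concatenated code with large distance (Theorem 13).
    return [w_val] * n

def ctrl_is_valid(c_ctrl: list) -> bool:
    return all(sym == c_ctrl[0] for sym in c_ctrl)

def ctrl_invert(c_ctrl: list) -> bytes:
    return c_ctrl[0]

def enc_fold(m: bytes, r: bytes, w: int) -> list:
    # Fold the control codeword for (r, h_r(m)) symbol-by-symbol into the data.
    assert len(m) % w == 0
    n = len(m) // w
    c_ctrl = ctrl_encode(r + h(r, m), n)
    return [(m[i * w:(i + 1) * w], c_ctrl[i]) for i in range(n)]

def dec_fold(c: list):
    m = b"".join(d for d, _ in c)
    c_ctrl = [x for _, x in c]
    if not ctrl_is_valid(c_ctrl):
        return None  # ⊥: control part is not a valid codeword
    w_val = ctrl_invert(c_ctrl)
    r, tag = w_val[:LAMBDA], w_val[LAMBDA:]
    return m if h(r, m) == tag else None
```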

Self-Seeded Error Correction

To achieve error-correction, we use essentially the same construction as before, but additionally encode the data m using an efficiently list-decodable error-correcting code (Enc_data, Dec_data) over a constant-sized alphabet Σ_data = {0,1}^w for some large constant w. For any constant ρ < 1/2 we want this code to have rate R_data ≈ 1−ρ and to be list-decodable from a ρ fraction of errors. This can be accomplished using the codes of [5]. Moreover, we will now want the code (Enc_ctrl, Dec_ctrl) to efficiently uniquely decode from a ρ fraction of errors, which can be achieved with the same parameters as previously. Our self-seeded code Enc(m;r) computes c_ctrl = (c_{ctrl,1}, …, c_{ctrl,n}) = Enc_ctrl((r, h_r(m))) ∈ Σ_ctrl^n as before, and also c_data = (c_{data,1}, …, c_{data,n}) = Enc_data(m) ∈ Σ_data^n. It then “folds” the two codewords together to yield the final codeword

c = ([c_{data,1}, c_{ctrl,1}], …, [c_{data,n}, c_{ctrl,n}]) ∈ Σ^n

where Σ = Σ_data × Σ_ctrl = {0,1}^{w+v}. The decoding algorithm Dec(c′) gets

c′ = ([c′_{data,1}, c′_{ctrl,1}], …, [c′_{data,n}, c′_{ctrl,n}]) ∈ Σ^n.

It applies list-decoding on the data part c′_data of the received value c′ to recover a polynomial-size list of candidate messages {m^(i)} = Dec_data(c′_data). It also uniquely decodes (r′, y′) = Dec_ctrl(c′_ctrl). Then it outputs the first value m^(i) such that h_{r′}(m^(i)) = y′, or outputs ⊥ if none is found or if either of the decoding procedures fails. The rate of the overall code is R = R_data·w/(w+v) ≈ 1−ρ by choosing sufficiently large constants w > v. To analyze the error-correction, first observe that if the received value c′ is within relative distance ρ of c = Enc(m;r) then, by the correctness of list-decoding of the “data code”, the correct message m will be in the decoded list with m = m^(i) for some i, and by the correctness of unique decoding of the “control code” we have (r′, y′) = (r, h_r(m)). So the only way that error correction can fail is if there is another m^(j) ≠ m in the list such that h_r(m^(j)) = h_r(m), but this gives a collision on the hash function.
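The correction decoder, as a sketch. The list decoder for the data code and the unique decoder for the control code are treated as black boxes (dec_data_list and dec_ctrl below are hypothetical interfaces; the paper instantiates them with [5] and Theorem 13), and the keyed hash h is passed in as a parameter.

```python
def dec_correct(c_recv: list, dec_data_list, dec_ctrl, h):
    # Split each received symbol into its data part and control part.
    z_data = [d for d, _ in c_recv]
    z_ctrl = [x for _, x in c_recv]
    out = dec_ctrl(z_ctrl)              # unique decoding of the control code
    if out is None:
        return None                     # ⊥: control decoding failed
    r, tag = out
    for cand in dec_data_list(z_data):  # polynomial-size candidate list
        if h(r, cand) == tag:
            return cand                 # first candidate whose hash matches
    return None                         # ⊥: no consistent candidate found
```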

Randomized Codes from One-Way Functions

Our constructions of randomized codes with error-detection and error-correction are identical to the above constructions of self-seeded codes, but instead of requiring the hash family h_r to be collision-resistant, we only need it to be a universal one-way hash function (UOWHF). A UOWHF ensures that a polynomial-time adversary that selectively chooses a message m first and then gets the seed r cannot find a collision m′ ≠ m such that h_r(m) = h_r(m′), except with negligible probability. Such UOWHFs can be constructed from one-way functions [13, 16]. Recall that the only difference between self-seeded and randomized codes is whether the encoded message m can be chosen adaptively depending on the randomness r or selectively before r is chosen, which exactly matches the difference between collision-resistant hash functions and UOWHFs.

Necessity of Assumptions

We show that self-seeded codes beating the Singleton bound, with error-detection tolerance ρ > 1−R or error-correction tolerance ρ > (1−R)/2, imply collision-resistant hash functions (CRHFs). Start by considering a self-seeded error-detection code (Enc,Dec) with error-detection tolerance ρ. We claim that for every fixed subset S of a 1−ρ fraction of codeword positions, the function h_r(m) = Enc(m;r)[S] that outputs the codeword symbols of Enc(m;r) in the positions indexed by S is collision-resistant. In particular, if there is an efficient adversary that, given r, can find a collision m ≠ m′ such that h_r(m) = h_r(m′), it means that Enc(m;r) and Enc(m′;r) are at relative distance ≤ ρ and therefore such an adversary can break error detection by choosing the message m and modifying Enc(m;r) to Enc(m′;r). Moreover, if the rate of the code exceeds the Singleton bound with R > 1−ρ, then the functions h_r are compressing, making them CRHFs. To get the analogous result for error-correction, we simply note that ρ-error-correction implies (2ρ)-error-detection. In particular, if an efficient adversary given r can find two messages m_0 ≠ m_1 such that Enc(m_0;r) and Enc(m_1;r) are within relative distance 2ρ, then it can find a “midpoint” c′ which is at relative distance ≤ ρ from both Enc(m_0;r) and Enc(m_1;r). For one of b ∈ {0,1}, modifying Enc(m_b;r) to c′ necessarily causes decoding to output the incorrect message m_{1−b}.
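A sketch of the hash family extracted from a detection code, with the encoder treated as a black box (the enc interface is hypothetical):

```python
def code_hash(enc, r, S):
    # h_r(m) = Enc(m; r)[S]: the codeword of m under seed r, restricted to
    # the fixed index set S covering a 1-ρ fraction of the positions.
    def h_r(m):
        c = enc(m, r)
        return tuple(c[i] for i in sorted(S))
    return h_r
```

A collision m ≠ m′ for h_r gives two codewords agreeing on S, i.e., differing in at most a ρ fraction of positions – exactly the error pattern described above – and when R > 1−ρ the output of h_r is shorter than the message, so h_r compresses.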

We also show that randomized error-detection and error-correction codes beating the Singleton bound imply one-way functions. The argument is somewhat different from the above since we don’t want to assume that Enc(m;r) reveals r in full. Instead, we rely on the fact that, if one-way functions don’t exist, then we can efficiently sample “almost uniform” inverses of any efficient function [9]. Consider a random codeword c = Enc(m;r) for a random message m and randomness r. For any subset S of a 1−ρ fraction of codeword positions we can efficiently find an almost uniform inverse (m′, r′) of the function f(m,r) = Enc(m;r)[S] that outputs the codeword symbols in positions S. This means that c = Enc(m;r) and c′ = Enc(m′;r′) agree on the positions in S and therefore have relative distance ≤ ρ. If the rate of the code exceeds the Singleton bound with R > 1−ρ, then there is a non-negligible probability that m′ ≠ m since f loses information about m. Therefore the channel that changes c to c′ is efficient and defeats error detection by introducing a ρ fraction of errors. Similarly, the channel that changes a random subset of ρ/2 of the positions outside of S from c to c′ defeats error correction, since decoding is almost as likely to output m′ as m.

Organization of the Paper

In Section 2 we give general notations and state the ingredients that we will use. In Section 3 we give formal definitions for randomized codes and self-seeded codes. In Section 4 we state and prove our main theorems. In Section 5 we show that randomized codes imply one-way functions. Due to space limitations, this extended abstract does not contain the proof that self-seeded codes imply collision-resistant hash functions. The reader is referred to [22] for a full version that contains this proof.

2 Preliminaries and Ingredients

2.1 Notations

All logarithms are taken in base 2. Let poly stand for the set of all polynomials. Let PPT stand for non-uniform probabilistic polynomial-time. We note that while we rely on the non-uniform model of computation by default, all our results also hold if we were to define PPT in the uniform model. We use sans-serif (e.g., 𝖠) for PPT algorithms. A function ν : ℕ → [0,1] is negligible, denoted ν(k) = neg(k), if ν(k) < 1/p(k) for every p ∈ poly and large enough k. For a set S, let x ← S denote that x was drawn uniformly from S. Given a distribution P, we use x ← P to denote that x is chosen according to P.

Hamming distance

The Hamming distance between x, y ∈ [q]^n is Δ(x,y) = |{i : x_i ≠ y_i}|. The relative Hamming distance between x, y ∈ [q]^n is δ(x,y) = Δ(x,y)/n.
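For concreteness, the two distances in Python:

```python
def hamming(x, y):
    # Δ(x, y) = |{i : x_i ≠ y_i}|
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def rel_hamming(x, y):
    # δ(x, y) = Δ(x, y) / n
    return hamming(x, y) / len(x)
```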

Distributions and random variables

The statistical distance (also known as variation distance) of two distributions P and Q over a discrete domain X is defined by SD(P,Q) = max_{S⊆X} |P(S) − Q(S)|.

Definition 2 (Min-entropy).

A random variable X has min-entropy k, denoted H_∞(X) ≥ k, if max_x 𝐏𝐫[X = x] ≤ 2^{−k}.

We will also use the following notion of average min-entropy:

Definition 3 (Average min-entropy).

Let (X,Y) be a pair of random variables. The average min-entropy of X conditioned on Y is:

H̃_∞(X|Y) = −log[𝐄_{y←Y}[2^{−H_∞(X|Y=y)}]]

Average min-entropy gives a bound on the probability that an adversary that gets the value of Y will guess the value of X in a single attempt. We will use the following useful lemma that connects average min-entropy and min-entropy.

Lemma 4 ([2, 15]).

H̃_∞(X|Y) ≥ H_∞(X,Y) − v, where 2^v is the number of elements in the support of Y.
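As a quick sanity check of Lemma 4, take X uniform on {0,1}^k and let Y be the first v bits of X (so Y has support of size 2^v). Then:

```latex
\widetilde{H}_\infty(X \mid Y)
  = -\log \mathbf{E}_{y \leftarrow Y}\!\left[ 2^{-H_\infty(X \mid Y = y)} \right]
  = -\log \mathbf{E}_{y \leftarrow Y}\!\left[ 2^{-(k - v)} \right]
  = k - v,
```

matching the bound H_∞(X,Y) − v = k − v of Lemma 4 with equality.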

2.2 One-Way Functions

Definition 5 (One-way functions).

A PPT algorithm 𝖥 : {0,1}^* → {0,1}^* is a one-way function if for every PPT 𝖠 and every sufficiently large k,

𝐏𝐫_{x←{0,1}^k}[𝖠(1^k, 𝖥(x)) ∈ 𝖥^{−1}(𝖥(x))] ≤ neg(k).
Distributional one-way functions

We will also use the classical result of Impagliazzo and Luby on distributional one-way functions implying one-way functions.

Definition 6 (Distributional One-way functions).

A PPT algorithm 𝖥 : {0,1}^* → {0,1}^* is a distributional one-way function if there exists a constant c > 0 such that for every PPT 𝖠 and every sufficiently large k,

SD((𝖠(1^k, 𝖥(X)), 𝖥(X)), (X, 𝖥(X))) ≥ 1/k^c,

where X is sampled uniformly from {0,1}^k.

Theorem 7 (Distributional one-way functions imply one-way functions, [9]).

If there exists a distributional one-way function then there exists a one-way function.

2.3 Universal One-Way Hash Functions

Definition 8 (Universal one-way hash functions).

For a security parameter k, a family of functions ℋ = {h_s : {0,1}^{n(k)} → {0,1}^{ℓ(k)}}_{s∈{0,1}^{ℓ(k)}} is a family of universal one-way hash functions (UOWHFs) if it satisfies:

  1. Efficiency: Given s ∈ {0,1}^{ℓ(k)} and x ∈ {0,1}^{n(k)}, h_s(x) can be evaluated in time poly(n(k), ℓ(k)).

  2. Shrinking: ℓ(k) < n(k).

  3. Target Collision Resistance: For every PPT 𝖠 and sufficiently large k, consider the following randomized experiment:

    • (x, state) ← 𝖠(1^k) ∈ {0,1}^{n(k)} × {0,1}^*,

    • Sample s ← {0,1}^{ℓ(k)},

    • x′ ← 𝖠(state, s).

    Then, 𝐏𝐫[x ≠ x′ ∧ h_s(x) = h_s(x′)] ≤ neg(k).

The notion of universal one-way hash functions (UOWHF) was first introduced by Naor and Yung [13]. Rompel [16] gave the first UOWHF construction from arbitrary one-way functions.

Theorem 9 (One-way functions imply universal one-way hash function, [13, 16]).

There is a black-box construction of universal one-way hash functions (UOWHFs) from one-way functions (OWFs). In particular, assuming OWFs, for any arbitrarily large constant c and arbitrarily small constant δ > 0 there is a UOWHF with input length n(k) = k^c and output/seed length ℓ(k) = k^δ.

2.4 Collision-Resistant Hashing

Definition 10 (Collision-resistant hash functions (CRHF)).

For a security parameter k, a family of functions ℋ = {h_s : {0,1}^{n(k)} → {0,1}^{ℓ(k)}}_{s∈{0,1}^{ℓ(k)}} is a family of collision-resistant hash functions (CRHFs) if it satisfies:

  1. Efficiency: Given s ∈ {0,1}^{ℓ(k)} and x ∈ {0,1}^{n(k)}, h_s(x) can be evaluated in time poly(n(k), ℓ(k)).

  2. Shrinking: ℓ(k) < n(k).

  3. Collision Resistance: For every PPT 𝖠 and sufficiently large k,

    𝐏𝐫_{s←{0,1}^{ℓ(k)}, (x,x′)←𝖠(s)}[x ≠ x′ ∧ h_s(x) = h_s(x′)] ≤ neg(k).

We mention that the parameters of CRHFs above do not matter much: any “non-trivial” CRHF implies an “arbitrarily good” CRHF. That is, start with any CRHF with arbitrarily large polynomial seed/output length ℓ(k) that has even 1-bit compression n(k) = ℓ(k)+1. Then, for any arbitrarily small constant δ > 0 and arbitrarily large constant c, we can also get a CRHF with small seed/output length ℓ′(k) = k^δ and large input length n′(k) = k^c. Therefore, when we assume the existence of CRHFs we without loss of generality assume the above “arbitrarily good” CRHF parameters. (We can decrease the seed/output length by changing the security parameter from k to k^ε for a sufficiently small ε and noting that poly/negl in k^ε is the same as poly/negl in k. We can increase the input length arbitrarily using the Merkle-Damgård transform.)
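A minimal sketch of the Merkle-Damgård input extension mentioned above, assuming a compression function h_s mapping ℓ + b bits to ℓ bits; the interface (h_s, iv) and the simplified zero-padding are ours.

```python
def merkle_damgard(h_s, iv: bytes, msg: bytes, block_len: int) -> bytes:
    # Chain the fixed-input-length compressing function h_s over the message
    # blocks; the final block encodes the message length (Merkle-Damgard
    # strengthening), which is what preserves collision resistance.
    padded = msg + b"\x00" * (-len(msg) % block_len)
    padded += len(msg).to_bytes(block_len, "big")
    state = iv
    for i in range(0, len(padded), block_len):
        state = h_s(state + padded[i:i + block_len])
    return state
```

Any collision in the chained hash traces back to a collision in some invocation of h_s, so even 1-bit compression bootstraps to arbitrarily long inputs.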

2.5 Standard Correcting/Detecting Codes

We now define error-correcting codes for information-theoretic adversaries, also called Hamming channels. In this section, a code is a pair (Enc, Dec) of deterministic encoding and decoding algorithms, and different notions are obtained by considering the requirements of the decoding algorithm.

Definition 11 (Standard codes for Hamming channels).

Let k, n, q be parameters and let Enc : {0,1}^k → [q]^n be a function. We say that Enc is an encoding function for a code that:

  • decodes d errors, if there exists a function Dec : [q]^n → {0,1}^k such that for every m ∈ {0,1}^k and every v ∈ [q]^n with Δ(Enc(m), v) ≤ d, we have Dec(v) = m.

  • L-list-decodes d errors, if the function Dec is allowed to output a list of size at most L, and for every m ∈ {0,1}^k and every v ∈ [q]^n with Δ(Enc(m), v) ≤ d, we have m ∈ Dec(v).

  • detects d errors, if there exists a function Dec : [q]^n → {0,1}^k ∪ {⊥} such that for every m ∈ {0,1}^k, Dec(Enc(m)) = m, and for every v ∈ [q]^n such that v ≠ Enc(m) and Δ(Enc(m), v) ≤ d, we have Dec(v) = ⊥.

The rate of the code is the ratio of the message length and output length of Enc, where both lengths are measured in bits. That is, the rate R = k/(n·log q).

A code (Enc, Dec) is explicit if Enc and Dec run in polynomial time. (Naturally, this makes sense only for a family of encoding and decoding/detecting functions with varying message length k, block length n(k), and alphabet size q(k).) (We point out that traditionally codes are indexed by the codeword length; that is, k and q are chosen as a function of the codeword size n. However, for the sake of consistency with the rest of our definitions, we chose to define the codeword length n and alphabet q as a function of the input size k.)

We will also use the following construction by Guruswami and Rudra [5] for optimal list-decodable codes over a constant alphabet.

Theorem 12 ([5]).

For every 0 < R < 1 and every sufficiently small ε > 0, there is an explicit family of codes over an alphabet of size 2^{O(ε^{−4} log(1/ε))} that have rate at least R and which can be list-decoded up to a fraction 1−R−ε of errors in polynomial time, with a list of size n^{O(ε^{−1} log(1/ε))}. (We remark that Theorem 12 does not explicitly mention that the rate can be achieved for every sufficiently large message size k, and instead just claims that it holds for infinitely many choices of k. A careful examination of the construction and proof by [5] reveals that the argument and proof hold for every sufficiently large k, given a suitable expander graph. An explicit construction of such graphs was given for every sufficiently large k in [10] (see Lemma 2.7 for a formal proof). Thus, we assume that Theorem 12 holds for every sufficiently large k.)

It is folklore that for every ε > 0, by using code concatenation it is possible to construct standard codes with relative distance at least 1−ε and rate R = o(1) for any sufficiently large k, with an alphabet size that depends on ε. We make this formal in the following theorem (see, for example, the book by Guruswami, Rudra, and Sudan [6] for more details).

Theorem 13.

For every sufficiently small ε > 0 and every computable function n : ℕ → ℕ such that for every sufficiently large k, n(k) > k^c (for a universal constant c > 1), there exists an explicit standard code (Enc, Dec), for Enc : {0,1}^k → [q]^{n(k)} and Dec : [q]^{n(k)} → {0,1}^k, with alphabet size q = 2^{1/ε^3}, such that the code has distance d(k) > n(k) − ε·n(k) and (uniquely) decodes from d(k)/2 − 1 errors.

3 Definitions for Randomized Codes and Self-Seeded Codes

In this section, we formally define randomized codes and self-seeded codes, and their related notions of error detection and correction.

3.1 Randomized Codes

We start by giving a formal definition of randomized encoding and decoding algorithms. This syntax will be also shared with our notion of self-seeded codes defined in the next section.

Definition 14 (Randomized codes).

Let n, q : ℕ → ℕ be poly-time computable functions. An (n,q)-randomized code is a pair (Enc, Dec) such that:

  • Enc is a PPT algorithm that on input m ∈ {0,1}^k outputs a string c ∈ [q(k)]^{n(k)}. We use Enc_r(m) to denote the instantiation of Enc(m) when using the string r as random coins.

  • Dec is a PPT algorithm that on input z ∈ [q(k)]^{n(k)} outputs a string m̂ ∈ {0,1}^k ∪ {⊥}.

For a fixed k, the rate of the code for k is defined to be R_k = k/(log(q)·n(k)). When R = lim_{k→∞} R_k ∈ [0,1] is well defined, we say that R is the rate of the code. We omit all or a subset of the functions n and q when they are clear from the context.

We now define a notion of reliability for randomized codes, which essentially states that it is possible to recover the message given a codeword (that was not damaged by errors).

Definition 15 (Codes with reliable decoding).

A randomized code (Enc, Dec) is said to be perfectly reliable if for every k and m ∈ {0,1}^k, 𝐏𝐫[Dec(Enc(m)) = m] = 1.

Intuitively, in randomized codes, each time the sender wants to send a message, it samples fresh randomness on the fly and uses it to encode the message. We make this formal in the definition below.

Definition 16 (Error detection/correction for randomized codes).

Let n, ℓ, d : ℕ → ℕ be poly-time computable functions and let (Enc, Dec) be a randomized code. For a PPT algorithm 𝖠 and k ∈ ℕ, consider the following randomized experiment:

  1. (m, state) ← 𝖠(1^k) ∈ {0,1}^k × {0,1}^*

  2. Compute c ← Enc(m) (by sampling r ← {0,1}^{ℓ(k)} and computing Enc_r(m)).

  3. z ← 𝖠(m, state, c)

We say that the code:

  • is reliable if for every PPT algorithm 𝖠 and every sufficiently large k:
    𝐏𝐫[Dec(c) ≠ m] ≤ neg(k). (This is a weaker variant of Definition 15. Our construction achieves the stronger variant of perfect reliability. However, our lower bound results are stronger since they apply to this weaker notion.)

  • detects d-errors if it is a reliable code, and for every PPT algorithm 𝖠 and every sufficiently large k: 𝐏𝐫[Δ(c, z) ≤ d(k) ∧ Dec(z) ∉ {⊥, m}] ≤ neg(k).

  • corrects d-errors if for every PPT algorithm 𝖠 and every sufficiently large k:
    𝐏𝐫[Δ(c, z) ≤ d(k) ∧ Dec(z) ≠ m] ≤ neg(k).

For a constant ρ ∈ [0,1], the code detects or corrects ρ-relative errors if for every sufficiently large k, ρ ≤ d(k)/n(k).

Note that by definition, if a code corrects d-errors for d > 0 then the code is reliable. In addition, our definition of error detection also permits error correction (that is, when errors are induced, the decoding algorithm can either return ⊥ or successfully decode and return the original message m). By doing so we can achieve simultaneous error detection and correction (that is, the same decoding function can at the same time satisfy the notions of detection and correction defined above).

3.2 Self-Seeded Codes

We now formally define the notion of self-seeded codes. A self-seeded code is a randomized code that shares the same syntax as defined in Definition 14. However, a self-seeded code provides a strictly stronger notion of security as compared to the notion of error correction and detection for randomized codes stated above in Definition 16.

Definition 17 (Error detection/correction for self-seeded codes).

Let n, ℓ, d : ℕ → ℕ be poly-time computable functions. A randomized code is said to be a self-seeded code if the following holds: For a PPT algorithm 𝖠 and k ∈ ℕ, consider the following randomized experiment:

  1. r ← {0,1}^{ℓ(k)}

  2. (m, z) ← 𝖠(r)

We say that the code:

  • is reliable if for every PPT algorithm 𝖠 and every sufficiently large k:
    𝐏𝐫[Dec(Enc_r(m)) ≠ m] ≤ neg(k).

  • detects d-errors if it is a reliable code, and for every PPT algorithm 𝖠 and every sufficiently large k: 𝐏𝐫[Δ(Enc_r(m), z) ≤ d(k) ∧ Dec(z) ∉ {⊥, m}] ≤ neg(k).

  • corrects d-errors if for every PPT algorithm 𝖠 and every sufficiently large k:
    𝐏𝐫[Δ(Enc_r(m), z) ≤ d(k) ∧ Dec(z) ≠ m] ≤ neg(k).

For a constant ρ ∈ [0,1], the code detects or corrects ρ-relative errors if for every sufficiently large k, ρ ≤ d(k)/n(k).

Self-seeded codes enable the sender to sample its randomness once, fix it, and then use this same randomness for encoding multiple messages (this contrasts with the standard notion of randomized codes, which requires the sender to generate fresh randomness for each message in real time). We stress that after being fixed, this randomness is known to any potential adversary but is not given to the decoding algorithm. (We note that our notion of self-seeded codes is strictly stronger than the notion of “seeded codes” defined in [3]. Seeded codes, unlike self-seeded codes, require the encoding and decoding algorithms to share a random seed. Thus, unlike self-seeded codes, seeded codes necessitate a trusted setup between the encoding and decoding.) Still, we require that no efficient adversary can find a message and a close error vector that leads to a decoding error.

Indeed, self-seeded codes offer a stronger adaptive notion of security, where the adversary can choose the message m adaptively based on the randomness fixed by the sender. In contrast, standard randomized codes require the sender to choose fresh randomness after the adversary has decided on the message.

4 Optimal Randomized Codes and Self-Seeded Codes

We now state our main theorems for randomized codes and self-seeded codes. Our results show that it is possible to construct codes that bypass the Singleton bound for computationally bounded channels under minimal cryptographic assumptions. Specifically, for the randomized codes construction, we assume the existence of one-way functions, and for the self-seeded codes construction, we assume the existence of collision-resistant hash functions.

Main theorem

We state our result for randomized codes and self-seeded codes together in the following theorem.

Theorem 18 (Randomized codes [resp., self-seeded codes] with constant alphabet size).

Assuming the existence of one-way functions [resp., collision-resistant hash functions], the following holds: For every constant 0 ≤ p < 1/2 and every sufficiently small constant ε > 0, there exists a (q,n)-randomized [resp., (q,n)-self-seeded] code with rate R = 1−p−ε that simultaneously (uniquely) corrects p relative errors and detects 1−p−ε relative errors. Moreover, the alphabet size q is a constant that depends on ε.

The proof of Theorem 18 appears in Section 4.2 and relies on our construction described in Section 4.1.

 Remark 19.

If the underlying cryptographic primitives assumed in our theorem statements (that is, one-way functions for randomized codes and collision-resistant hash functions for self-seeded codes) have sub-exponential security, then the codes constructed in Theorem 18 will also have sub-exponential failure probability for correction and detection against channels that are computable by sub-exponential size circuits. This follows since all our reductions run in polynomial time and have a polynomial loss in advantage. However, for the sake of simplicity, we focus on the case of polynomial/negligible security in our formal statements and proofs.

4.1 Construction

In this section, we describe the construction that we will use to prove Theorem 18.

We now describe the encoding and decoding algorithms. In Figure 1, we state the parameters and ingredients that are needed to define our encoding and decoding algorithms (Enc,Dec) that are specified in Figure 2. The encoding and decoding algorithms are essentially the same for randomized codes and for self-seeded codes. In both cases the encoding algorithm Enc takes as input a message m and some value r. For randomized codes, a fresh random r is chosen every time we send a new message, and for the self-seeded codes r is sampled once (and is given to any possible adversary), and this single fixed r could be used in the encoding of multiple messages.

In both cases, the decoding algorithm Dec only takes as input a possibly damaged codeword z. In particular, the decoding algorithm is not provided with the seed r.

Figure 1: Parameters and Ingredients for folding construction.
Figure 2: Encoding and decoding algorithm.
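Since Figures 1 and 2 are not reproduced in this version, the following sketch reconstructs the decoding steps in the order they are referenced in the proof of Lemma 20 below (the “Check control distance” and “Check consistency” steps); all subroutines are assumed black boxes, and the names are ours.

```python
def decode(z, dec_data_list, dec_ctrl, enc_ctrl, h, d_data):
    # z: the received word, a list of n symbols (data part, control part).
    z_data = [d for d, _ in z]
    z_ctrl = [x for _, x in z]
    w_hat = dec_ctrl(z_ctrl)              # uniquely decode the control code
    if w_hat is None:
        return None                       # decoding failure: output ⊥
    y_hat, r_hat = w_hat                  # candidate (hash value, seed)
    # "Check control distance": the received control part must be within
    # distance d_data of the re-encoding of the decoded control value.
    c_ctrl_hat = enc_ctrl(w_hat)
    if sum(a != b for a, b in zip(z_ctrl, c_ctrl_hat)) > d_data:
        return None
    # "Check consistency": output the first list-decoding candidate whose
    # hash under the recovered seed matches the recovered hash value.
    for m_hat in dec_data_list(z_data):
        if h(r_hat, m_hat) == y_hat:
            return m_hat
    return None
```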

4.2 Proof of Theorem 18

In this section, we prove Theorem 18. We make use of the following technical lemma. The proof for the lemma is given below in Section 4.2.1.

Lemma 20.

Suppose there are ingredients and parameters that satisfy the conditions stated in Figure 1, and that the hash family ℋ specified in Figure 1 is a family of universal one-way hash functions [resp., collision-resistant hash functions]. Then the code (Enc, Dec) as specified in Figure 2 is a (q,n)-randomized code [resp., (q,n)-self-seeded code]. Moreover, the code is perfectly reliable, has alphabet size q = q_ctrl·q_data, rate R = R_data·log(q_data)/(log(q_data)+log(q_ctrl)), and the decoding algorithm simultaneously

  • detects from d_ctrl − d_data errors, and

  • corrects from d_data errors.

Proof of Theorem 18.

Note that one-way functions imply universal one-way hash functions that satisfy the conditions stated for the hash family in Figure 1 (see Theorem 9).

Given a sufficiently small ε > 0, let ε̂ = ε/4 and apply Theorem 12 with ε̂ and R_data = 1−p−ε̂ to get a standard code (Enc_data, Dec_data) with rate R_data = 1−p−ε̂, codeword length n_data, and alphabet size q_data = 2^{O(ε̂^{−5})}. Moreover, (Enc_data, Dec_data) list-decodes from d_data = p·n_data errors (that is, from p relative errors) with a list size that depends only on ε̂ and k. Note also that we can increase the alphabet size by bundling symbols together without damaging the list decoding and the rate R_data of the code, and so we can assume that log(q_data) is of size at least ε̂^{−5}. To summarize, (Enc_data, Dec_data) meets the conditions stated in Figure 1 for the “data code” for the parameters R_data and d_data and log(q_data) ≥ ε̂^{−5}.

Apply Theorem 13 with ε̂ to get a code (Enc_ctrl, Dec_ctrl) with codeword length n_ctrl = n_data and alphabet size at most q_ctrl = 2^{1/ε̂^3} that has distance d_ctrl > n_ctrl(1−ε̂) and efficient decoding of up to d_ctrl/2 − 1 errors. Note that for a sufficiently small ε (and ε̂) it holds that d_ctrl > 2d_data + 1; see the quick check below. This means that (Enc_ctrl, Dec_ctrl) meets the conditions stated in Figure 1 for the “control code” with parameters d_ctrl and log(q_ctrl) ≤ ε̂^{−3}.
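A quick check of the claim d_ctrl > 2d_data + 1 (with n = n_ctrl = n_data):

```latex
d_{\mathrm{ctrl}} > n(1 - \hat{\varepsilon})
\qquad \text{and} \qquad
2 d_{\mathrm{data}} + 1 = 2pn + 1,
```

so it suffices that n(1 − ε̂) ≥ 2pn + 1, i.e., ε̂ ≤ 1 − 2p − 1/n; since p < 1/2 is a fixed constant, this holds for every sufficiently small ε̂ and every sufficiently large n.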

Thus, we can apply Lemma 20 to (Enc_data, Dec_data), (Enc_ctrl, Dec_ctrl), and ℋ to get a (q,n)-randomized code when we take ℋ to be a universal one-way hash family, and a (q,n)-self-seeded code if ℋ is a collision-resistant hash family. In both cases, it follows that

  • n = n_data,

  • q = q_data·q_ctrl = 2^{O(ε^{−5})},

  • for every sufficiently small ε (and ε̂), R = R_data·log(q_data)/(log(q_data)+log(q_ctrl)) > (1−p−ε̂)·1/(1+ε̂) > 1−p−ε,

  • the code detects from d_ctrl − d_data > n(1−ε̂) − np ≥ n(1−p−ε) errors; that is, it detects 1−p−ε relative errors,

  • the code corrects d_data = pn errors; that is, it corrects up to p relative errors.

This concludes the proof of Theorem 18.

4.2.1 Proof of Lemma 20

Let Enc_data, Enc_ctrl, and h_r be as defined in Figure 1 and let (Enc, Dec) be as defined in Figure 2. By construction, (Enc, Dec) is perfectly reliable. This holds since, for every fixed randomness r, if no errors are applied then all ingredients in the construction are deterministic, and standard codes are by definition perfectly reliable. Similarly, by construction q = q_ctrl·q_data and

R = k/(n·log(q)) = R_data·(n·log(q_data)) / (n·(log(q_ctrl)+log(q_data))) = R_data·log(q_data)/(log(q_data)+log(q_ctrl)).

We now prove the rest of Lemma 20. For simplicity, we focus on the case of randomized codes; the proof for self-seeded codes is essentially the same and follows along similar lines. (The main difference is that we consider an adaptive adversary (that is, first r ← {0,1}^{ℓ(k)} and then (m, z) ← 𝖠(r)) and we reduce to the security of the CRHF instead of the security of the UOWHF; otherwise, the proof is identical.)

Consider the following randomized experiment for a PPT algorithm 𝖠 and k:

  1. (m, state) ← 𝖠(1^k) ∈ {0,1}^k × {0,1}^*

  2. Compute c ← Enc_r(m) (by sampling r ← {0,1}^{ℓ(k)}).

  3. z ← 𝖠(m, state, c)

We make use of the following notation: Let c_data = Enc_data(m), w = (h_r(m), r), and c_ctrl = Enc_ctrl(w). Recall that z ∈ ([q_data]×[q_ctrl])^n, and define z_data = ((z_data)_1, …, (z_data)_n) and z_ctrl = ((z_ctrl)_1, …, (z_ctrl)_n) such that for every i ∈ [n], z_i = ((z_data)_i, (z_ctrl)_i). Let ŵ = (ŷ, r̂) = Dec_ctrl(z_ctrl) and List = Dec_data(z_data).

Error correction

To prove the error correction property, we need to show that

𝐏𝐫[Δ(c, z) ≤ d_data ∧ Dec(z) ≠ m] ≤ neg(k).

We make use of the following claims:

Claim 21.

If Δ(c, z) ≤ d_data then m ∈ List and w = ŵ.

Proof of Claim 21.

If Δ(c, z) ≤ d_data, it follows that Δ(c_ctrl, z_ctrl) ≤ d_data and Δ(c_data, z_data) ≤ d_data. Since (Enc_ctrl, Dec_ctrl) is a standard code that uniquely decodes from d_ctrl/2 − 1 ≥ d_data errors, it holds that w = ŵ. Similarly, since (Enc_data, Dec_data) is a standard code that list-decodes from d_data errors, it follows that m ∈ List.

Claim 22.

For every m̂ ∈ List, if m̂ ≠ m then 𝐏𝐫[h_r(m̂) = h_r(m)] ≤ neg(k).

Proof of Claim 22.

Follows directly from the security of the universal one-way hash function: otherwise, we can use 𝖠 to find collisions in ℋ, contradicting the target collision resistance property.

By Claim 21, m ∈ List and w = ŵ. Thus, by Claim 22, with all but negligible probability, m is the only message in List for which h_r(m) = ŷ (passing the “Check consistency” step in the decoding algorithm), which implies Dec(z) = m.

Error detection

To prove the error detection property, we need to show that

𝐏𝐫[Δ(c, z) ≤ d_ctrl − d_data ∧ Dec(z) ∉ {m, ⊥}] ≤ neg(k).

We make use of the following Claim.

Claim 23.

If Δ(c, z) ≤ d_ctrl − d_data and Dec(z) ≠ ⊥ then w = ŵ.

Proof.

Let ĉ_ctrl = Enc_ctrl(ŵ) and note that by the “Check control distance” step in the decoding algorithm (see Figure 2), if Dec(z) ≠ ⊥ it follows that Δ(z_ctrl, ĉ_ctrl) ≤ d_data. Moreover, by construction, if Δ(c, z) ≤ d_ctrl − d_data it follows that Δ(c_ctrl, z_ctrl) ≤ d_ctrl − d_data. Thus, if Dec(z) ≠ ⊥ and Δ(c, z) ≤ d_ctrl − d_data, by the triangle inequality

Δ(c_ctrl, ĉ_ctrl) ≤ d_ctrl.

Since c_ctrl and ĉ_ctrl are both valid codewords of the standard code (Enc_ctrl, Dec_ctrl) that has distance d_ctrl, it follows that w = ŵ. By Claim 23, if Dec(z) ≠ ⊥ then w = ŵ. This concludes the proof, since by Claim 22, if w = ŵ, with all but negligible probability, m is the only valid message that can be output by Dec(z).

5 Randomized Codes Imply One-Way Functions

We now show that non-trivial randomized codes imply the existence of one-way functions. For large alphabets, this is indeed optimal, since as we have seen in our construction it is possible to construct optimal randomized codes assuming only universal one-way hash functions (which can be constructed from one-way functions).

The following theorem states that if a randomized code has a rate that bypasses the Singleton bound, it implies one-way functions.

Theorem 24 (Randomized codes with rate beating the Singleton bound imply one-way functions).

Let (Enc,Dec) be a randomized code with rate 0R<1. If one of the following holds:

  • The code detects ρ_0-relative errors and ρ_0 > 1−R+1/n.

  • The code corrects from ρ_1-relative errors and 2ρ_1 > 1−R+2/n.

Then there exist one-way functions.

The proof will make use of the following lemma.

Lemma 25 (Non-trivial error detection/correction for randomized codes implies one-way functions).

Let (Enc,Dec) be an (n,q)-randomized code that has at least one of the following properties:

  • The code detects d-errors.

  • The code corrects d/2-errors.

Let s ≥ n−d. If v = log(q)·s ≤ k−1, then 𝖥 : {0,1}^{k+ℓ} → {0,1}^v, defined so that on input (m, r) ∈ {0,1}^{k+ℓ} it computes Enc_r(m) and outputs a binary representation of the first s symbols, is a distributional one-way function. (Here, ℓ is just the implicitly assumed bound on the number of random bits used by Enc.) (The statement of Lemma 25 could be readily extended so that restricting the output of Enc_r(m) to any s symbols (that can be efficiently identified), not just the first s symbols, is itself a distributional one-way function. This would be similar to our statement on self-seeded codes made in the full version [22], where we show that restricting the output to any subset of size s is a collision-resistant hash family. However, for the sake of simplicity, we restrict our statement and analysis to the simple case where the distributional one-way function is defined to be the first s symbols.)
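A sketch of the function 𝖥 from Lemma 25, treating the encoder as a black box (the enc interface is hypothetical):

```python
def make_F(enc, s: int):
    # F(m, r): encode m with coins r and keep only the first s symbols
    # (or, more precisely, a binary representation thereof).
    def F(m: bytes, r: bytes):
        c = enc(m, r)        # codeword: a sequence of n symbols over [q]
        return tuple(c[:s])
    return F
```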

The proof for the lemma is given below, but first, we will use it to prove Theorem 24.

Proof of Theorem 24.

Let d = max(ρ_0·n, 2ρ_1·n) and note that by definition at least one of the following holds: the code detects d-errors, or corrects d/2-errors. Moreover, by our assumption on the rate of the code, it holds that d > n − Rn + 1. Let s = n−d and let v = log(q)·s; since R = k/(log(q)·n), it holds that v = log(q)·s < k − log(q) ≤ k−1. Thus, we meet the conditions of Lemma 25, which implies that there exists a distributional one-way function (in fact, outputting the binary representation of the first s symbols of Enc is the distributional one-way function), and by Theorem 7 there exists a one-way function.

Proof of Lemma 25.

For k ∈ ℕ, let M_k and R_k denote the random variables sampled uniformly from {0,1}^k and {0,1}^{ℓ(k)}, respectively. Let Y_k = 𝖥(M_k, R_k) and assume for contradiction that 𝖥 is not a distributional one-way function (see Definition 6). That is, for every constant c > 0 there exists a PPT algorithm 𝖠 such that for infinitely many k,

SD((𝖠(Y_k), Y_k), ((M_k, R_k), Y_k)) ≤ 1/k^c. (1)

For a fixed k for which the above holds, we omit the subscript k from the random variables to avoid clutter.

Let (M′, R′) = 𝖠(Y) and note that by the definition of average min-entropy (see Definition 3),

𝐏𝐫[M′ = M] ≤ 2^{−H̃_∞(M|Y)} ≤ 1/2 (2)

where the last inequality follows by Lemma 4, which states that H̃_∞(M|Y) ≥ H_∞(M,Y) − v ≥ k − v ≥ 1.

Let C = Enc_R(M) and C′ = Enc_{R′}(M′), and recall that by definition, Y = 𝖥(M, R) is just the binary representation of the first s symbols of C. Thus, if 𝖥(M′, R′) = 𝖥(M, R) = Y it follows that Δ(Enc_R(M), Enc_{R′}(M′)) ≤ n − s ≤ d, which by Equation 1 implies that:

𝐏𝐫[Δ(C, C′) > d] ≤ 1/k^c. (3)

Note also that by Equation 1,

𝐏𝐫[M′ = ⊥] ≤ 1/k^c (4)

We first handle the case where the code detects d-errors and then the case where the code corrects d/2-errors.

Detection.

By Equation 1 and by the reliability of the code,

𝐏𝐫[Dec(C′) = M′] ≥ 𝐏𝐫[Dec(C) = M] − 1/k^c > 0.9 (5)

We now define a PPT algorithm 𝖠′ (that makes use of 𝖠) that contradicts the detection property of the code: On input 1^k, 𝖠′ samples m ← {0,1}^k, and on input c ∈ [q]^n it runs (m′, r′) ← 𝖠(y), where y is just the binary representation of the first s symbols of c (i.e., y = 𝖥(m, r)), and outputs c′ = Enc_{r′}(m′). It holds that:

  • M = 𝖠′(1^k) (where by definition M is sampled uniformly by 𝖠′).

  • Compute C = Enc_R(M) (by sampling R ← {0,1}^ℓ).

  • C′ = Enc_{R′}(M′) = 𝖠′(M, C).

Thus, 𝖠′ contradicts the d-detection property since

𝐏𝐫[Dec(C′) ∉ {M, ⊥} ∧ Δ(C, C′) ≤ d]
≥ 𝐏𝐫[Dec(C′) = M′ ∧ M′ ∉ {M, ⊥} ∧ Δ(C, C′) ≤ d]
≥ 𝐏𝐫[Dec(C′) = M′] − 𝐏𝐫[M′ = M] − 𝐏𝐫[M′ = ⊥] − 𝐏𝐫[Δ(C, C′) > d]
≥ 0.9 − 0.5 − 2/k^c ≥ 0.1,

where the penultimate inequality follows from Equations 5, 2, 4 and 3.

Correction.

We now prove the correction property. Let G be a function defined as follows: On inputs c, c′ ∈ [q]^n, G samples a set I of ⌈(n−s)/2⌉ random indices from the set [n]∖[s] (if n−s is an odd number, then with probability 1/2 it samples ⌈(n−s)/2⌉ indices and with probability 1/2 it samples ⌈(n−s)/2⌉ − 1 indices), and outputs c″ ∈ [q]^n where for every i ∈ I, c″_i = c′_i and for every i ∈ [n]∖I, c″_i = c_i. Intuitively, if c and c′ agree on the first s indices then G(c, c′) = c″ is a random “midpoint” between c and c′.
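A sketch of the midpoint sampler G, including the coin flip for the odd case described above:

```python
import random

def G(c, c_prime, s):
    # Output c'' that agrees with c_prime on roughly half of the last n - s
    # positions and with c everywhere else.
    n = len(c)
    t = (n - s) // 2
    if (n - s) % 2 == 1 and random.random() < 0.5:
        t += 1                              # odd case: round up w.p. 1/2
    I = random.sample(range(s, n), t)
    out = list(c)
    for i in I:
        out[i] = c_prime[i]
    return out
```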

Recall that C = Enc_R(M), C′ = Enc_{R′}(M′), and let C″ = G(C, C′). By Equation 1 and symmetry,

$\mathrm{SD}\big((M, R, C, C''),\ (M', R', C', C'')\big) \le 2/k^c.$ (6)

Since by assumption $n - s \le d$, by the definition of $C''$ it follows that:

$\Pr[\Delta(C, C'') > d/2] \le 1/k^c.$ (7)

Recall that $(M, R)$ is sampled uniformly at random; thus, by Equations 6 and 7 and the decoding property of the code,

$\Pr[\mathsf{Dec}(C'') = M'] \ge \Pr[\mathsf{Dec}(C'') = M] - 2/k^c \ge 1 - 3/k^c \ge 0.9.$ (8)

We now define a PPT algorithm $\mathsf{A}'$ (that makes use of $\mathsf{A}$) that contradicts the correction property of the code (see the sketch after this list): On input $1^k$, $\mathsf{A}'$ samples $m \leftarrow \{0,1\}^k$; on input $c \in [q]^n$, it runs $(m', r') \leftarrow \mathsf{A}(y)$, where $y$ is just the binary representation of the first $s$ symbols of $c$ (i.e., $y = \mathsf{F}(m, r)$), computes $c' = \mathsf{Enc}_{r'}(m')$, and outputs $c'' = G(c, c')$. In the resulting experiment, it holds that:

  • $M = \mathsf{A}'(1^k)$ (where by definition $M$ is sampled uniformly by $\mathsf{A}'$).

  • $C = \mathsf{Enc}_R(M)$ is computed (by sampling $R \leftarrow \{0,1\}^{\ell}$).

  • $C'' = \mathsf{A}'(M, C)$ (recall that $C'' = G(C, C')$ for $C' = \mathsf{Enc}_{R'}(M')$ and $(M', R') = \mathsf{A}(\mathsf{F}(M, R))$).
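The correction-case channel then composes the inverter wrapper with $G$; again a hedged sketch with the same hypothetical interfaces (and the `G` from the previous sketch):

```python
import math

def correction_channel(c, inverter, encode, s, q):
    """The channel A' from the correction case (sketch): invert the prefix y
    to get (m', r'), re-encode to c', and output the midpoint c'' = G(c, c').
    When c and c' agree on their first s symbols, c'' is within d/2 of both,
    so a d/2-correcting decoder must output m, yet by symmetry it is almost
    equally likely to output m' != m."""
    bits_per_symbol = int(math.log2(q))
    y = "".join(format(sym, f"0{bits_per_symbol}b") for sym in c[:s])
    m_prime, r_prime = inverter(y)
    c_prime = encode(m_prime, r_prime)     # c' = Enc_{r'}(m')
    return G(c, c_prime, s)                # c'' = G(c, c')
```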

Thus, $\mathsf{A}'$ contradicts the $d/2$-correction property, since

$\Pr[\mathsf{Dec}(C'') \ne M \wedge \Delta(C, C'') \le d/2]$
$\ge \Pr[\mathsf{Dec}(C'') = M' \wedge M' \ne M \wedge \Delta(C, C'') \le d/2]$
$\ge \Pr[\mathsf{Dec}(C'') = M'] - \Pr[M' = M] - \Pr[\Delta(C, C'') > d/2]$
$\ge 0.9 - 0.5 - 1/k^c \ge 0.1,$

where the penultimate inequality follows from Equations 8, 2, and 7. This concludes the proof of Lemma 25.

6 Self-Seeded Codes Imply Collision Resistance Hash Functions

We now show that non-trivial self-seeded codes imply the existence of a collision-resistant hash family. Loosely speaking, the following theorem states that if a self-seeded code has a rate that beats the Singleton bound, then it implies collision-resistant hash functions (since there are codes over large alphabets that achieve the Singleton bound, this means that, over large alphabets, any non-trivial self-seeded code implies collision-resistant hash functions).

We note that while the result below is stated with respect to the notion of self-seeded codes, the exact same proof also holds for the weaker notion of seeded codes defined in [3], in which the encoding and decoding algorithms both share the random seed. Since our lower bound holds even if the decoding algorithm is given the random seed as input, it immediately extends to the weaker notion of seeded codes. However, to avoid confusion and clutter, we present our result with respect to the notion of self-seeded codes defined in this paper.

Theorem 26 (Self-seeded codes with rate beating the Singleton bound imply collision-resistant hash functions).

Let $(\mathsf{Enc}, \mathsf{Dec})$ be a self-seeded code with rate $0 \le R < 1$. If at least one of the following holds:

  • The code detects $\rho_0$-relative errors and $\rho_0 > 1 - R + \frac{1}{n}$.

  • The code corrects $\rho_1$-relative errors and $2\rho_1 > 1 - R + \frac{2}{n}$.

Then there exists an explicit construction of a collision-resistant hash family given black-box access to the code.

The proof of Theorem 26 appears in the full version [22].
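While the proof is deferred to the full version [22], the candidate hash family implicit in the theorem is simple to describe: fix the encoding randomness $r$ once (the self-seed) and hash a message to the binary representation of the first $s$ symbols of its encoding; a collision would yield two codewords within distance $n - s$ of each other, contradicting detection/correction beyond the Singleton bound. The sketch below is illustrative only, with the same hypothetical `encode` interface as before.

```python
import math

def make_hash(encode, r, s, q):
    """Candidate collision-resistant hash from a self-seeded code (sketch):
    the seed r is sampled once and may be public; H_r compresses a k-bit
    message to s * log(q) bits."""
    bits_per_symbol = int(math.log2(q))
    def H(m_bits):
        prefix = encode(m_bits, r)[:s]     # first s symbols of Enc_r(m)
        return "".join(format(sym, f"0{bits_per_symbol}b") for sym in prefix)
    return H
```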

References

  • [1] Jeremiah Blocki, Venkata Gandikota, Elena Grigorescu, and Samson Zhou. Relaxed locally correctable codes in computationally bounded channels. IEEE Transactions on Information Theory, 67(7):4338–4360, 2021. doi:10.1109/TIT.2021.3076396.
  • [2] Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM journal on computing, 38(1):97–139, 2008. doi:10.1137/060651380.
  • [3] Ofer Grossman, Justin Holmgren, and Eylon Yogev. Transparent error correcting in a computationally bounded world. In Rafael Pass and Krzysztof Pietrzak, editors, Theory of Cryptography - 18th International Conference, TCC 2020, Durham, NC, USA, November 16-19, 2020, Proceedings, Part III, volume 12552 of Lecture Notes in Computer Science, pages 530–549. Springer, 2020. doi:10.1007/978-3-030-64381-2_19.
  • [4] Venkatesan Guruswami and Adam Smith. Optimal rate code constructions for computationally simple channels. Journal of the ACM (JACM), 63(4):35, 2016.
  • [5] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Transactions on information theory, 54(1):135–150, 2008. doi:10.1109/TIT.2007.911222.
  • [6] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory, 2012. Draft available at https://cse.buffalo.edu/faculty/atri/courses/coding-theory/book/web-coding-book.pdf.
  • [7] Richard W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950. doi:10.1002/j.1538-7305.1950.tb00463.x.
  • [8] Brett Hemenway, Rafail Ostrovsky, Martin J. Strauss, and Mary Wootters. Public key locally decodable codes with short keys. In Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14th International Workshop, APPROX 2011, and 15th International Workshop, RANDOM 2011, Princeton, NJ, USA, August 17-19, 2011. Proceedings, volume 6845 of Lecture Notes in Computer Science, pages 605–615. Springer, 2011. doi:10.1007/978-3-642-22935-0_51.
  • [9] Russell Impagliazzo and Michael Luby. One-way functions are essential for complexity based cryptography. In 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 230–235, 1989. doi:10.1109/SFCS.1989.63483.
  • [10] Swastik Kopparty, Or Meir, Noga Ron-Zewi, and Shubhangi Saraf. High-rate locally correctable and locally testable codes with sub-polynomial query complexity. Journal of the ACM (JACM), 64(2):1–42, 2017. doi:10.1145/3051093.
  • [11] Richard J. Lipton. A new approach to information theory. In 11th Annual Symposium on Theoretical Aspects of Computer Science (STACS), pages 699–708, 1994. doi:10.1007/3-540-57785-8_183.
  • [12] Silvio Micali, Chris Peikert, Madhu Sudan, and David A. Wilson. Optimal error correction for computationally bounded noise. IEEE Trans. Information Theory, 56(11):5673–5680, 2010. doi:10.1109/TIT.2010.2070370.
  • [13] Moni Naor and Moti Yung. Universal one-way hash functions and their cryptographic applications. In David S. Johnson, editor, Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 14-17, 1989, Seattle, Washington, USA, pages 33–43. ACM, 1989. doi:10.1145/73007.73011.
  • [14] Rafail Ostrovsky, Omkant Pandey, and Amit Sahai. Private locally decodable codes. In Lars Arge, Christian Cachin, Tomasz Jurdzinski, and Andrzej Tarlecki, editors, Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland, July 9-13, 2007, Proceedings, volume 4596 of Lecture Notes in Computer Science, pages 387–398. Springer, 2007. doi:10.1007/978-3-540-73420-8_35.
  • [15] Leonid Reyzin. Some notions of entropy for cryptography: (invited talk). In International Conference on Information Theoretic Security, pages 138–142. Springer, 2011. doi:10.1007/978-3-642-20728-0_13.
  • [16] John Rompel. One-way functions are necessary and sufficient for secure signatures. In Harriet Ortiz, editor, Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13-17, 1990, Baltimore, Maryland, USA, pages 387–394. ACM, 1990. doi:10.1145/100216.100269.
  • [17] Ronen Shaltiel and Jad Silbak. Explicit list-decodable codes with optimal rate for computationally bounded channels. Comput. Complex., 30(1):3, 2021. doi:10.1007/s00037-020-00203-w.
  • [18] Ronen Shaltiel and Jad Silbak. Explicit uniquely decodable codes for space bounded channels that achieve list-decoding capacity. In STOC '21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1516–1526, 2021. doi:10.1145/3406325.3451048.
  • [19] Ronen Shaltiel and Jad Silbak. Error correcting codes that achieve BSC capacity against channels that are poly-size circuits. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 13–23, 2022. doi:10.1109/FOCS54457.2022.00009.
  • [20] Ronen Shaltiel and Jad Silbak. Explicit codes for poly-size circuits and functions that are hard to sample on low entropy distributions. Electronic Colloquium on Computational Complexity (ECCC), 2024. URL: https://eccc.weizmann.ac.il/report/2023/149.
  • [21] Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(4):623–656, 1948. doi:10.1002/J.1538-7305.1948.TB00917.X.
  • [22] Jad Silbak and Daniel Wichs. Detecting and correcting computationally bounded errors: A simple construction under minimal assumptions. Cryptology ePrint Archive, Paper 2024/1461, 2024. URL: https://eprint.iacr.org/2024/1461.