Detecting and Correcting Computationally Bounded Errors: A Simple Construction Under Minimal Assumptions
Abstract
We study error detection and error correction in a computationally bounded world, where errors are introduced by an arbitrary polynomial time adversarial channel. We consider codes where the encoding procedure uses random coins and define two distinct variants: (1) in randomized codes, fresh randomness is chosen during each encoding operation and is unknown a priori, while (2) in self-seeded codes, the randomness of the encoding procedure is fixed once upfront and is known to the adversary. In both cases, the randomness need not be known to the decoding procedure, and there is no trusted common setup between the encoder and decoder. The encoding and decoding algorithms are efficient and run in some fixed polynomial time, independent of the run time of the adversary.
The parameters of standard codes for worst-case (inefficient) errors are limited by the Singleton bound: for rate $R$ it is not possible to detect more than a $1-R$ fraction of errors, or uniquely correct more than a $(1-R)/2$ fraction of errors, and efficient codes matching this bound exist for sufficiently large alphabets. In the computationally bounded setting, we show that going beyond the Singleton bound implies one-way functions in the case of randomized codes and collision-resistant hash functions in the case of self-seeded codes. We construct randomized and self-seeded codes under these respective minimal assumptions with essentially optimal parameters over a constant-sized alphabet:
- Detection: the codes have rate $R \approx 1$ while detecting a $\rho \approx 1$ fraction of errors.
- Correction: for any $\rho < 1/2$, the codes uniquely correct a $\rho$ fraction of errors with rate $R \approx 1 - \rho$.
Codes for computationally bounded errors were studied in several prior works starting with Lipton (STACS ’94), but all such works either: (a) need some trusted common setup (e.g., public-key infrastructure, common reference string) between the encoder and decoder, or (b) only handle channels whose complexity is a priori bounded below that of the code.
Keywords and phrases: Error Correction, One-Way Functions, Collision-Resistant Hashing
Funding: Jad Silbak: Research supported by the Khoury College Distinguished Postdoctoral Fellowship.
2012 ACM Subject Classification: Theory of computation → Cryptographic primitives
Acknowledgements: We are very grateful to Ronen Shaltiel and Tal Yankovitz for very useful discussions.
Editor: Raghu Meka
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
The theory of error detection and error correction goes back to the seminal works of Shannon [21] and Hamming [7]. A sender encodes a message and sends the resulting codeword over a noisy channel that may introduce errors by modifying a subset of the codeword symbols. A receiver gets the modified codeword and attempts to decode the original message. The goal of error-detection is to recover the original message when there are no errors or detect the presence of errors. The goal of error-correction is to always uniquely recover the original message. The main parameters of interest are the rate $R$, defined as the ratio of message length to codeword length, and the relative error tolerance $\rho$, defined as the ratio between the number of errors that can be detected/corrected and the codeword length.
Starting with the work of Hamming [7], a large body of research studies codes for worst-case computationally unbounded (adversarial) channels that can introduce arbitrary error patterns. The error tolerance in this setting is fully determined by the distance of the code, which is subject to several well-established fundamental bounds. For example, the Singleton bound implies that for any rate $R$ the relative error tolerance is at most $\rho \le 1-R$ for error detection and $\rho \le (1-R)/2$ for error correction. Furthermore, there are efficient codes matching the Singleton bound for sufficiently large alphabet sizes.
Computationally Bounded Channels
We consider a setting similar to that of Hamming [7], where errors are introduced by a fully adversarial channel, but with the additional natural restriction that the channel is computationally bounded, and can run in arbitrary polynomial (or even sub-exponential) time. This restriction is well motivated – if an error pattern takes longer than the lifetime of the universe to compute, then we can safely assume it will not occur. Our goal is to construct efficient codes that can detect/correct errors in this setting with rate vs. error-tolerance trade-offs that go beyond the Singleton bound. Many prior works, starting with [11] and including [14, 12, 8, 4, 17, 1, 18, 19, 20, 3], have considered computationally bounded channels, either to get better parameters or to achieve other desirable properties such as improved local/list decoding. However, all such prior works suffer from at least one of the following deficiencies:
1. Trusted setup: Many works require a trusted coordinated setup between the encoding and decoding algorithms. For example, [11] assumes that they share a secret key, which is unknown to the adversarial channel. Alternatively, [12] requires a public-key infrastructure where the encoding algorithm has its own secret key, while the decoding algorithm (as well as the adversarial channel) gets the corresponding public key. In both cases, the encoding algorithm is also stateful and updates its state between operations. The use of secret keys and state are significant limitations that make such codes inapplicable in many natural settings. The work of Grossman, Holmgren and Yogev [3] introduces the notion of seeded codes that only require a light trusted setup consisting of a common (uniformly) random seed shared by the encoder and decoder. They construct such seeded codes correcting errors over a large alphabet based on strong and non-standard cryptographic hardness assumptions – namely a certain form of two-input correlation-intractable hash functions, which are currently only known to exist in the random-oracle model and have no known provably secure instantiations under standard cryptographic hardness assumptions.
2. Codes have higher complexity than adversary: Many works [4, 17, 18, 19, 20] restrict the complexity (time/size or space) of the adversarial channel to be bounded by some arbitrary polynomial chosen a priori, and the complexity of the encoding/decoding algorithms then scales accordingly to some sufficiently larger polynomial. In particular, the adversarial channel is not powerful enough to evaluate the encoding/decoding algorithms in these schemes. This is a significant limitation, since we cannot argue that such schemes protect against all errors that can arise in a computationally bounded world – a world that can compute the encoding/decoding algorithms of the scheme can also find error patterns to defeat them! In contrast, we would like to have efficient codes with some fixed polynomial run-time that protect against arbitrary polynomial (or even sub-exponential time) adversarial channels.
The different prior works and their properties are summarized in Table 1.
Table 1: Summary of prior works and their properties.

| Work | Setup/Type | Channel | Decoding | Assumption |
|---|---|---|---|---|
| [11] | shared secret key (stateful) | P/poly | unique | one-way functions |
| [12] | public-key infrastructure (stateful) | P/poly | unique | one-way functions |
| [4] | seeded + randomized | SIZE (fixed poly) | list | none |
| [17] | randomized | SIZE (fixed poly) | list | derandomization assumption |
| [18] | randomized | SPACE (fixed poly) | unique | none |
| [19] | seeded + randomized | SIZE (fixed poly) | unique | none |
| [20] | randomized | SIZE (fixed poly) | unique | derandomization assumption |
| [3] | seeded | P/poly | unique | two-input correlation-intractable hashing |
| This work | randomized | P/poly | unique | one-way functions |
| This work | self-seeded | P/poly | unique | collision-resistant hashing |
1.1 Our Results
In this work we construct error detection and error correction codes for computationally bounded channels, while simultaneously achieving: (1) no trusted setup, (2) a fixed polynomial-time code that protects against arbitrary polynomial (or even sub-exponential) time adversarial channels, (3) provable guarantees under basic/minimal computational hardness assumptions, (4) essentially optimal trade-offs between rate and error tolerance beyond the Singleton bound for sufficiently large constant-sized alphabets.
Randomized/Self-Seeded Codes
Standard deterministic codes cannot achieve better parameters against (non-uniform) computationally bounded channels beyond what they achieve for computationally unbounded channels, and in particular, they cannot go beyond the Singleton bound. This is because a worst-case message and error pattern for which detection/correction fails can simply be hard-coded as non-uniform advice to an efficient adversary. To overcome this limitation, we therefore consider codes $(\mathsf{Enc}, \mathsf{Dec})$, where the encoding procedure takes additional randomness $r$ as input. We consider two variants. In the basic variant of randomized codes, fresh randomness for the encoding algorithm is chosen each time on the fly and is unknown to the adversary ahead of time. The adversary chooses a worst-case message $m$, gets the resulting randomized codeword $c = \mathsf{Enc}(m; r)$ for a fresh random $r$, and then chooses the error $e$ as an arbitrary efficiently computable function of the codeword. We also consider a stronger variant called self-seeded codes, where the randomness $r$ of the encoding algorithm is fixed once and for all and known to the adversary a priori. That is, the adversary gets the randomness $r$ and then uses it to adaptively choose the message $m$ (which fully defines the codeword $c = \mathsf{Enc}(m; r)$) along with the error $e$. In both cases, the fractional Hamming weight of $e$ is bounded by the error tolerance $\rho$. Error detection ensures that with overwhelming probability the resulting erroneous codeword $\tilde{c}$ decodes to $m$ when there are no errors, and decodes to either $m$ or $\bot$ when there are errors. Error correction ensures that with overwhelming probability $\tilde{c}$ decodes to $m$.
The main technical difference between randomized and self-seeded codes is adaptivity: in the former the message to be encoded is chosen by the adversary before the randomness $r$ is chosen, while in the latter the message can adaptively depend on $r$. (Randomized codes may also be secret-coin, where the adversary never sees the full randomness $r$; however, our construction will be public-coin and the randomness $r$ can be easily recovered from the codeword $c = \mathsf{Enc}(m; r)$.) Self-seeded codes allow us to fix the randomness for the code once and for all as a public seed, and afterwards the code is fully deterministic. Moreover, the seed can be chosen by the encoder unilaterally without any coordination with the decoder. For example, different software manufacturers can individually choose their own seeds to be used by the encoding algorithms in their software, while allowing for interoperability with the decoders implemented by other manufacturers without any coordination. Self-seeded codes should be contrasted with the notion of seeded codes in [3], which required a shared random seed as a common trusted setup between the encoder and decoder. Self-seeded codes are the strongest notion and directly imply the seemingly incomparable notions of randomized codes and seeded codes.
Minimal Assumptions
We show that any randomized code that goes beyond the Singleton bound, with error detection tolerance $\rho > 1 - R$ or error correction tolerance $\rho > (1-R)/2$, implies one-way functions, while any such seeded (and therefore also self-seeded) code implies collision-resistant hash functions.
Main Theorem
We construct randomized and self-seeded codes going beyond the Singleton bound, with essentially optimal parameters for both error detection and error correction, under the above respective minimal assumptions.
Theorem 1 (Informal).
Assuming one-way functions (resp. collision-resistant hash functions) there exist efficiently computable randomized (resp. self-seeded) error-detection and error-correction codes for arbitrary polynomial-time adversarial errors with the following parameters:
- Error Detection: For any constant $\epsilon > 0$, there is a code over a constant-sized alphabet with rate $R = 1 - \epsilon$ that detects a $\rho = 1 - \epsilon$ fraction of errors.
- Error Correction: For any constants $\epsilon > 0$ and $\rho < 1/2$, there is a code over a constant-sized alphabet with rate $R = 1 - \rho - \epsilon$ that uniquely corrects a $\rho$ fraction of errors.
If we assume sub-exponentially secure one-way functions (resp. collision-resistant hash functions), then the above properties hold even for sub-exponential time adversarial errors.
Recall that, in the case of error-detection, the Singleton bound says that the relative error tolerance $\rho \le 1 - R$ necessarily degrades as the rate improves when considering worst-case errors; in contrast, for computationally bounded errors, our result simultaneously achieves the best possible error tolerance $\rho \approx 1$ and rate $R \approx 1$. In the case of error correction, the Singleton bound says the maximal error tolerance is $\rho = (1-R)/2$ for worst-case errors, while for computationally bounded errors our result achieves twice as large error tolerance $\rho \approx 1 - R$, subject to the upper limit $\rho < 1/2$.
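As a concrete worked instance of this comparison (numbers chosen purely for illustration): at rate $R = 0.9$, worst-case errors allow at most
$$\rho_{\mathrm{det}} \le 1 - R = 0.1, \qquad \rho_{\mathrm{cor}} \le \frac{1-R}{2} = 0.05,$$
while for computationally bounded errors our codes achieve $\rho_{\mathrm{det}} \approx 0.9$ at rate $\approx 0.9$, and $\rho_{\mathrm{cor}} \approx 0.1$ at rate $\approx 0.9$.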
Our achieved parameters are essentially optimal, up to the arbitrarily small constant $\epsilon$. This is clear for error detection, where we get the best possible rate and error tolerance simultaneously. For error-correction, one cannot tolerate a $\rho \ge 1/2$ fraction of errors, since an efficient adversarial channel can modify $1/2$ of the codeword positions to a fresh encoding of a different message, causing decoding to fail with probability at least $1/2$. (The works of [11, 12] overcome this limitation by having a stateful and secret-keyed encoding procedure to ensure that an adversarial channel cannot generate/reuse valid codewords.) Moreover, one cannot have rate $R > 1 - \rho$, since zeroing out the first $\rho$ fraction of positions of the codeword would remove information about the message, causing decoding to fail.
We remark that, not only are one-way functions and collision-resistant hash functions the minimal assumptions needed to achieve the above results, they are considered some of the most standard and well-studied cryptographic assumptions/primitives. In particular, one-way functions are a minimal assumption necessary for essentially any non-trivial task in cryptography and are implied by all concrete cryptographic hardness assumptions. Collision-resistant hash functions are not known to be implied by one-way functions, but can be constructed from essentially every concrete cryptographic hardness assumption, such as the hardness of factoring, discrete logarithms, learning with errors, etc.
The closest related prior work of [3] achieves essentially the same parameters as our work, but only for the weaker notion of seeded codes, which require trusted setup, and only by assuming a strong form of “two-input correlation-intractable hash functions”, which we do not know how to instantiate under any standard cryptographic hardness assumption. Our result for self-seeded codes yields a stronger primitive under significantly weaker minimal assumptions. Comparing the notions of randomized, seeded and self-seeded codes, it is clear that self-seeded codes are the strongest notion, easily implying the other two, while randomized and seeded codes are seemingly incomparable. Interestingly, since we show that any non-trivial seeded codes imply collision-resistant hashing, which in turn implies self-seeded codes, it follows that seeded and self-seeded codes are equivalent in terms of the needed assumptions.
Simultaneous Correction and Detection
While we stated separate results for error-detection and error-correction above, we can also simultaneously achieve a non-trivial combination of correction and detection. In particular, we can augment the previous theorem statement with the following:
For any constants $\epsilon > 0$ and $\rho < 1/2$, there is a code over a constant-sized alphabet with rate $R = 1 - \rho - \epsilon$ that simultaneously uniquely corrects a $\rho$ fraction of errors and detects a $1 - \rho - \epsilon$ fraction of errors.
The above implies both of the previously stated results for error detection (by setting $\rho = 0$) and error correction (by ignoring detection). However, simultaneous error correction and detection is an interesting property, which is more than the sum of its parts. It gives a single decoding procedure that either outputs a message or $\bot$ (error detection symbol) and is guaranteed to uniquely recover the correct message if the error rate is at most $\rho$ and to never output the wrong message as long as the error rate is at most $1 - \rho - \epsilon$. We can envision scenarios where correcting a small fraction of errors (e.g., $\rho = 0.1$) may suffice. Our results show that for such instances, we can have a code with a correspondingly large rate (e.g., $R \approx 0.9$) while simultaneously being able to detect a much larger fraction of errors (e.g., $\approx 0.9$) at no additional cost.
Alphabet Size
We note that our results hold for large constant-size alphabets, but do not extend to the case of a binary alphabet. Giving optimal results for binary codes that detect/correct computationally bounded errors under minimal assumptions remains an interesting open problem. The results of [3] for seeded error correcting codes using a strong form of correlation intractable hash functions extend to the binary case and show that one can uniquely decode computationally bounded errors with essentially the same parameters as list-decoding for worst-case errors. It remains an open problem to achieve such results for randomized or self-seeded codes under any assumptions, and/or to get provably secure seeded codes under standard cryptographic hardness assumptions.
1.2 Our Techniques
Our techniques are quite simple in retrospect. We start by showing how to construct self-seeded codes assuming collision-resistant hashing, and then discuss how to modify these results to get analogous ones for randomized codes from one-way functions. Throughout the introduction, we use “$\approx$” to denote that two values differ by an arbitrarily small constant.
Self-Seeded Error Detection with Sub-Exponential Alphabet
We start with a naive construction of self-seeded error-detection codes that requires a sub-exponentially large alphabet size, and then show how to reduce the alphabet size to some (large) constant. Let $\{h_r\}$ be a collision-resistant hash function family with output length $\lambda$ (security parameter), which we can think of as $\lambda = k^{\delta}$ for a message length $k$ and some small constant $\delta > 0$. Given a random public seed $r$, no polynomial-time adversary can find two messages $m \neq m'$ such that $h_r(m) = h_r(m')$.
The encoding algorithm takes a message $m$ and parses it as $n$ symbols $m = (m_1, \ldots, m_n)$ over an alphabet $\Sigma = \{0,1\}^{\ell}$ for some large $\ell$. It then “folds in” the seed $r$ and hash $y = h_r(m)$ of the message into each message symbol to yield the codeword
$$c = \big((m_1, r, y), \ldots, (m_n, r, y)\big),$$
interpreted as $n$ symbols over the larger alphabet $\Sigma' = \Sigma \times \{0,1\}^{|r|} \times \{0,1\}^{\lambda}$. The decoding algorithm gets
$$\tilde{c} = \big((m'_1, r'_1, y'_1), \ldots, (m'_n, r'_n, y'_n)\big),$$
which gives a candidate message $m' = (m'_1, \ldots, m'_n)$. It checks that the second part of each codeword symbol has the same common seed value $r'_i = r'$ and that the third part of each symbol has the same common hash value $y'_i = y'$ with $y' = h_{r'}(m')$; if so it outputs $m'$, else $\bot$. The code achieves rate $\approx 1$ by choosing a sufficiently large $\ell$. Moreover, it achieves error tolerance $\rho = (n-1)/n \approx 1$. Unless the adversary changes every symbol of the codeword $c$, or finds a collision, decoding is guaranteed to output the correct message or $\bot$. To see this, assume at least one symbol of the codeword remains unchanged and decoding outputs $m' \neq m$. Then it must be the case that $(r'_i, y'_i) = (r, y)$ for all $i$, and therefore this gives a collision $m \neq m'$ such that $h_r(m) = h_r(m')$.
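The following is a minimal Python sketch of this naive construction, purely for illustration: seeded SHA-256 stands in for the assumed collision-resistant family $h_r$, and `repr` is an illustration-only serialization of the message.

```python
import hashlib

def h(r: bytes, m: bytes) -> bytes:
    # Seeded SHA-256 standing in for an assumed collision-resistant family h_r.
    return hashlib.sha256(r + m).digest()

def encode(m: list, r: bytes) -> list:
    # Fold the seed r and the hash y = h_r(m) into every message symbol.
    y = h(r, repr(m).encode())
    return [(mi, r, y) for mi in m]

def decode(c: list):
    m = [sym[0] for sym in c]
    seeds = {sym[1] for sym in c}
    hashes = {sym[2] for sym in c}
    # All symbols must carry the same (r, y), and y must equal h_r(m).
    if len(seeds) == 1 and len(hashes) == 1:
        (r,), (y,) = seeds, hashes
        if h(r, repr(m).encode()) == y:
            return m
    return None  # report an error (the ⊥ symbol)

# Unless every symbol is modified (or a collision in h is found), decode
# returns the original message or None, matching the detection guarantee.
assert decode(encode(["a", "b", "c"], b"seed")) == ["a", "b", "c"]
```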
Self-Seeded Error Detection with Constant Alphabet
We reduce the alphabet size to a constant by applying a standard error-correcting code over a constant-sized alphabet to the small “control” value $(r, y)$. In particular, for some constant-sized alphabet $\Sigma_{\mathrm{ctrl}}$ and constant $\epsilon > 0$, we use a code $\mathsf{Enc}_{\mathrm{ctrl}}$ with alphabet $\Sigma_{\mathrm{ctrl}}$ that maps the $(|r| + \lambda)$-bit control value to a codeword of length $n$. The corresponding rate of the code is very small, $o(1)$, but we will want large distance $(1 - \epsilon) n$. Such codes can be constructed by concatenating Reed–Solomon codes with a brute-force inner code over the alphabet $\Sigma_{\mathrm{ctrl}}$ (see Theorem 13). We do not need efficient error-decoding for now, but only require an efficient way to recognize valid codewords and invert them without errors. The encoding algorithm of our self-seeded code first computes $c_{\mathrm{ctrl}} = \mathsf{Enc}_{\mathrm{ctrl}}(r, y)$ for $y = h_r(m)$. It interprets $m$ as consisting of $n$ symbols $(m_1, \ldots, m_n)$ over the alphabet $\Sigma$ and then “folds” the two vectors together to yield the final codeword
$$c = \big((m_1, c_{\mathrm{ctrl},1}), \ldots, (m_n, c_{\mathrm{ctrl},n})\big),$$
interpreted as $n$ symbols over the alphabet $\Sigma \times \Sigma_{\mathrm{ctrl}}$. The decoding algorithm gets
$$\tilde{c} = \big((m'_1, \tilde{c}_{\mathrm{ctrl},1}), \ldots, (m'_n, \tilde{c}_{\mathrm{ctrl},n})\big)$$
and recovers a candidate message $m' = (m'_1, \ldots, m'_n)$ as well as a candidate $\tilde{c}_{\mathrm{ctrl}} = (\tilde{c}_{\mathrm{ctrl},1}, \ldots, \tilde{c}_{\mathrm{ctrl},n})$. It checks that $\tilde{c}_{\mathrm{ctrl}}$ is a valid codeword and if so recovers $(r', y')$ such that $\mathsf{Enc}_{\mathrm{ctrl}}(r', y') = \tilde{c}_{\mathrm{ctrl}}$. If $\tilde{c}_{\mathrm{ctrl}}$ is not a valid codeword or $y' \neq h_{r'}(m')$, it outputs $\bot$, else it outputs $m'$. Note that this generalizes the previous naive idea with a sub-exponentially large alphabet, which corresponds to using a repetition code for the control code. The rate of the overall code is $\approx \log|\Sigma| / \log(|\Sigma| \cdot |\Sigma_{\mathrm{ctrl}}|)$, which can be made arbitrarily close to $1$ by choosing a sufficiently large constant $|\Sigma|$. The error tolerance is $\rho = 1 - \epsilon$. In particular, if $\tilde{c}_{\mathrm{ctrl}}$ is within distance $< (1-\epsilon) n$ of $c_{\mathrm{ctrl}}$ and is a valid codeword, then it must be the case that $\tilde{c}_{\mathrm{ctrl}} = c_{\mathrm{ctrl}}$ and hence $(r', y') = (r, y)$. In this case, the only way that decoding can output some $m' \neq m$ is if the values form a hash collision $h_r(m') = h_r(m)$.
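A sketch of the folding step, again illustration-only: a toy repetition code plays the role of the control code $\mathsf{Enc}_{\mathrm{ctrl}}$ (the real construction uses a concatenated code with far better rate), SHA-256 again stands in for $h_r$, and the parameters `SEED_LEN`, `REPS` are hypothetical choices made so that the lengths align.

```python
import hashlib

SEED_LEN, REPS = 8, 8  # toy parameters; the real control code is from Theorem 13

def h(r: bytes, m: bytes) -> bytes:
    return hashlib.sha256(r + m).digest()  # stand-in collision-resistant hash

def ctrl_encode(ctrl: bytes) -> list:
    return [b for b in ctrl for _ in range(REPS)]  # toy high-distance code

def ctrl_invert(cw: list):
    # Recognize a valid control codeword and invert it; None if invalid.
    blocks = [cw[i:i + REPS] for i in range(0, len(cw), REPS)]
    if any(len(set(b)) != 1 for b in blocks):
        return None
    return bytes(b[0] for b in blocks)

def encode(m: bytes, r: bytes) -> list:
    ctrl_cw = ctrl_encode(r + h(r, m))
    assert len(m) == len(ctrl_cw)          # parameters chosen so lengths align
    return list(zip(m, ctrl_cw))           # symbol i = (data_i, ctrl_i)

def decode(c: list):
    m = bytes(sym[0] for sym in c)
    ctrl = ctrl_invert([sym[1] for sym in c])
    if ctrl is not None:
        r, y = ctrl[:SEED_LEN], ctrl[SEED_LEN:]
        if h(r, m) == y:
            return m
    return None  # ⊥

msg = bytes(range(256)) + bytes(64)        # (8 + 32) * REPS = 320-byte message
assert decode(encode(msg, b"fixedseed"[:SEED_LEN])) == msg
```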
Self-Seeded Error Correction
To achieve error-correction, we use essentially the same construction as before, but additionally encode the data using an efficiently list-decodable error-correcting code $\mathsf{Enc}_{\mathrm{data}}$ over a constant-sized alphabet $\Sigma_{\mathrm{data}}$ for some large constant $|\Sigma_{\mathrm{data}}|$. For any constants $\epsilon > 0$ and $\rho < 1/2$, we want this code to have rate $1 - \rho - \epsilon$ and to be list-decodable from a $\rho$ fraction of errors. This can be accomplished using the codes of [5]. Moreover, we will now want the control code to efficiently uniquely decode from a $\rho$ fraction of errors, which can be achieved with the same parameters as previously. Our self-seeded code computes $c_{\mathrm{ctrl}} = \mathsf{Enc}_{\mathrm{ctrl}}(r, h_r(m))$ as before, and also $c_{\mathrm{data}} = \mathsf{Enc}_{\mathrm{data}}(m)$. It then “folds” the two codewords together to yield the final codeword
$$c = \big((c_{\mathrm{data},1}, c_{\mathrm{ctrl},1}), \ldots, (c_{\mathrm{data},n}, c_{\mathrm{ctrl},n})\big),$$
where each symbol lies in $\Sigma_{\mathrm{data}} \times \Sigma_{\mathrm{ctrl}}$. The decoding algorithm gets
$$\tilde{c} = \big((\tilde{c}_{\mathrm{data},1}, \tilde{c}_{\mathrm{ctrl},1}), \ldots, (\tilde{c}_{\mathrm{data},n}, \tilde{c}_{\mathrm{ctrl},n})\big).$$
It applies list-decoding on the data part of the received value to recover a polynomial-size list of candidate messages $m^{(1)}, \ldots, m^{(L)}$. It also uniquely decodes the control part to recover $(r', y')$. Then it outputs the first value $m^{(i)}$ such that $h_{r'}(m^{(i)}) = y'$, or outputs $\bot$ if none is found or if either of the decoding procedures fails. The rate of the overall code is $\approx 1 - \rho$ by choosing sufficiently large constant alphabets. To analyze the error-correction, first observe that if the received value is within relative distance $\rho$ of $c$ then, by the correctness of list-decoding of the “data code”, the correct message will be in the decoded list with $m = m^{(i)}$ for some $i$, and by the correctness of unique decoding of the “control code” we have $(r', y') = (r, h_r(m))$. So the only way that error correction can fail is if there is another $m^{(j)} \neq m$ in the list such that $h_r(m^{(j)}) = h_r(m)$, but this gives a collision on the hash function.
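The decoder for the error-correcting variant combines the two subroutines as follows. In this sketch, `list_decode_data` and `unique_decode_ctrl` are assumed stand-ins for the list-decodable data code of [5] and the control code, respectively, and `SEED_LEN` is a hypothetical parameter.

```python
import hashlib

SEED_LEN = 8  # illustration-only parameter

def h(r: bytes, m: bytes) -> bytes:
    return hashlib.sha256(r + m).digest()     # stand-in hash family

def decode(c, list_decode_data, unique_decode_ctrl):
    data_part = [sym[0] for sym in c]
    ctrl_part = [sym[1] for sym in c]
    candidates = list_decode_data(data_part)  # poly-size message list, or None
    ctrl = unique_decode_ctrl(ctrl_part)      # the control value (r, y), or None
    if candidates is None or ctrl is None:
        return None                           # ⊥: a decoding step failed
    r, y = ctrl[:SEED_LEN], ctrl[SEED_LEN:]
    for m in candidates:
        if h(r, m) == y:                      # the hash disambiguates the list
            return m
    return None                               # ⊥: no candidate matches
```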
Randomized Codes from One-Way Functions
Our constructions of randomized codes with error-detection and error-correction are identical to the above constructions of self-seeded codes, but instead of requiring the hash family to be collision-resistant, we only need it to be a universal one-way hash function (UOWHF). A UOWHF ensures that a polynomial-time adversary that selectively chooses a message $m$ first and then gets the seed $r$ cannot find a collision $m' \neq m$ such that $h_r(m') = h_r(m)$, except with negligible probability. Such UOWHFs can be constructed from one-way functions [13, 16]. Recall that the only difference between self-seeded and randomized codes is whether the encoded message can be chosen adaptively depending on the randomness $r$, or selectively before $r$ is chosen, which exactly matches the difference between collision-resistant hash functions and UOWHFs.
Necessity of Assumptions
We show that self-seeded codes beating the Singleton bound, with error-detection tolerance $\rho > 1 - R$ or error-correction tolerance $\rho > (1-R)/2$, imply collision-resistant hash functions (CRHFs). Start by considering a self-seeded error-detection code with error-detection tolerance $\rho$. We claim that for every fixed subset $S$ of a $(1-\rho)$ fraction of codeword positions, the function $f_S$ that outputs the codeword symbols of $\mathsf{Enc}(m; r)$ in the positions indexed by $S$ (with $r$ serving as the hash seed) is collision-resistant. In particular, if there is an efficient adversary that, given $r$, can find a collision $m \neq m'$ such that $f_S(m) = f_S(m')$, it means that $\mathsf{Enc}(m; r)$ and $\mathsf{Enc}(m'; r)$ are at relative distance $\le \rho$, and therefore such an adversary can break error detection by choosing the message $m$ and modifying $\mathsf{Enc}(m; r)$ to $\mathsf{Enc}(m'; r)$. Moreover, if the rate of the code exceeds the Singleton bound with $\rho > 1 - R$, then the functions $f_S$ are compressing, making them CRHFs. To get the analogous result for error-correction, we simply note that $\rho$-error-correction implies $2\rho$-error-detection. In particular, if an efficient adversary given $r$ can find two messages $m \neq m'$ such that $\mathsf{Enc}(m; r)$ and $\mathsf{Enc}(m'; r)$ are within relative distance $2\rho$, then it can find a “midpoint” $c^*$ which is at relative distance $\le \rho$ from both codewords. For at least one of $m, m'$, modifying the corresponding codeword to $c^*$ necessarily causes decoding to output an incorrect message.
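In code form, the function family at the heart of this argument looks as follows; `Enc` is an assumed encoding algorithm, and the comment records why a collision breaks detection.

```python
def restricted_encoding(Enc, r: bytes, S: set):
    """f_S(m) = the symbols of Enc(m; r) in the positions indexed by S."""
    def f_S(m):
        c = Enc(m, r)
        return tuple(c[i] for i in sorted(S))
    return f_S

# A collision f_S(m) == f_S(m') with m != m' means Enc(m; r) and Enc(m'; r)
# differ only outside S, i.e. on at most a rho fraction of positions -- an
# error pattern an efficient channel can apply to break rho-detection. If
# rho > 1 - R then |S|*log(q) < k bits, so f_S is also compressing: a CRHF.
```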
We also show that randomized error-detection and error-correction codes beating the Singleton bound imply one-way functions. The argument is somewhat different from the above since we don't want to assume that $\mathsf{Enc}(m; r)$ reveals $r$ in full. Instead, we rely on the fact that, if one-way functions don't exist, then we can efficiently sample “almost uniform” inverses of any efficient function [9]. Consider a random codeword $c = \mathsf{Enc}(m; r)$ for a random message $m$ and randomness $r$. For any subset $S$ of a $(1-\rho)$ fraction of codeword positions, we can efficiently find an almost uniform inverse $(m', r')$ of the function $f_S(m, r)$ that outputs the codeword symbols in positions $S$. This means that $c' = \mathsf{Enc}(m'; r')$ and $c$ agree on the positions in $S$ and therefore have relative distance $\le \rho$. If the rate of the code exceeds the Singleton bound with $\rho > 1 - R$, then there is a non-negligible probability that $m' \neq m$, since $f_S$ loses information about the message. Therefore the channel that changes $c$ to $c'$ is efficient and defeats error detection by introducing a $\le \rho$ fraction of errors. Similarly, the channel that changes a random subset of half of the positions outside of $S$ from $c$ to $c'$ defeats error correction, since decoding is almost as likely to output $m'$ as $m$.
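For illustration, here is the sampler at the core of this channel, with a brute-force preimage search standing in for the efficient inverter that [9] provides when one-way functions do not exist; the function name and the tiny binary search space are hypothetical, illustration-only choices.

```python
import itertools, random

def sample_agreeing_codeword(Enc, c, S, msg_len, rand_len):
    # Sample (almost) uniformly a codeword c' = Enc(m'; r') that agrees with
    # c on the positions in S. Brute force over a tiny binary space stands in
    # for the efficient inverter that [9] yields when OWFs do not exist.
    S = sorted(S)
    target = tuple(c[i] for i in S)
    matches = []
    for m in itertools.product((0, 1), repeat=msg_len):
        for r in itertools.product((0, 1), repeat=rand_len):
            cw = Enc(list(m), list(r))
            if tuple(cw[i] for i in S) == target:
                matches.append(cw)
    return random.choice(matches)  # agrees with c on S => distance <= rho*n
```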
Organization of the Paper
In Section 2 we give general notations and state the ingredients that we will use. In Section 3 we give formal definitions for randomized codes and self-seeded codes. In Section 4 we state and prove our main theorems. In Section 5 we show that randomized codes imply one-way functions. Due to space limitations, this extended abstract does not contain the proof that self-seeded codes imply collision resistance hash functions. The reader is referred to [22] for a full version that contains this proof.
2 Preliminaries and Ingredients
2.1 Notations
All logarithms are taken in base $2$. Let $\mathrm{poly}$ stand for the set of all polynomials. Let $\mathsf{PPT}$ stand for non-uniform probabilistic polynomial-time. We note that while we rely on the non-uniform model of computation by default, all our results also hold if we were to define $\mathsf{PPT}$ in the uniform model. We use sans-serif (e.g., $\mathsf{Enc}$) for algorithms. A function $\mu(n)$ is negligible, denoted $\mu = \mathrm{negl}(n)$, if $\mu(n) \le n^{-c}$ for every constant $c > 0$ and large enough $n$. For a set $S$, let $x \leftarrow S$ denote that $x$ was drawn uniformly from $S$. Given a distribution $D$, we use $x \leftarrow D$ to denote that $x$ is chosen according to $D$.
Hamming distance
The Hamming distance between $x, y \in \Sigma^n$ is $\Delta(x, y) = |\{i \in [n] : x_i \neq y_i\}|$. The relative Hamming distance between $x$ and $y$ is $\delta(x, y) = \Delta(x, y)/n$.
Distributions and random variables
The statistical distance (also known as variation distance) of two distributions $P$ and $Q$ over a discrete domain $\mathcal{X}$ is defined by $\mathrm{SD}(P, Q) = \frac{1}{2} \sum_{x \in \mathcal{X}} |P(x) - Q(x)|$.
Definition 2 (Min-entropy).
A random variable $X$ has min-entropy $k$, denoted $H_\infty(X) \ge k$,
if $\max_x \Pr[X = x] \le 2^{-k}$.
We will also use the following notion of average min-entropy:
Definition 3 (Average min-entropy, [2]).
Let $(X, Y)$ be a pair of random variables. The average min-entropy of $X$ conditioned on $Y$ is:
$$\widetilde{H}_\infty(X \mid Y) = -\log\Big(\mathbb{E}_{y \leftarrow Y}\big[2^{-H_\infty(X \mid Y = y)}\big]\Big).$$
Average min-entropy gives a bound on the probability that an adversary that gets the value of $Y$ will guess the value of $X$ in a single attempt. We will use the following useful lemma that connects average min-entropy and min-entropy.
Lemma 4 ([2]).
Let $(X, Y)$ be a pair of random variables, where $Y$ is supported on a set of size at most $2^{\ell}$. Then $\widetilde{H}_\infty(X \mid Y) \ge H_\infty(X) - \ell$.
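A quick sanity check of Lemma 4 on a worked instance: let $X$ be uniform over $\{0,1\}^n$ and let $Y = X_1 \cdots X_\ell$ be its first $\ell$ bits. Then
$$H_\infty(X \mid Y = y) = n - \ell \text{ for every } y, \quad\text{so}\quad \widetilde{H}_\infty(X \mid Y) = n - \ell = H_\infty(X) - \ell,$$
matching the bound of Lemma 4 with equality.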
2.2 One-Way Functions
Definition 5 (One-way functions).
A poly-time computable function $f : \{0,1\}^* \to \{0,1\}^*$ is a one-way function if for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $n$,
$$\Pr_{x \leftarrow \{0,1\}^n}\big[A(f(x)) \in f^{-1}(f(x))\big] \le \mathrm{negl}(n).$$
Distributional one-way functions
We will also use the classical result of Impagliazzo and Luby that distributional one-way functions imply one-way functions.
Definition 6 (Distributional one-way functions).
A poly-time computable function $f : \{0,1\}^* \to \{0,1\}^*$ is a distributional one-way function if there exists a constant $c > 0$ such that for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $n$,
$$\mathrm{SD}\big((x, f(x)),\, (A(f(x)), f(x))\big) \ge n^{-c},$$
where $x$ is sampled uniformly from $\{0,1\}^n$.
Theorem 7 (Distributional one-way functions imply one-way functions, [9]).
If there exists a distributional one-way function then there exists a one-way function.
2.3 Universal One-Way Hash Functions
Definition 8 (Universal one-way hash functions).
For a security parameter $\lambda$, a family of functions $\mathcal{H} = \{h_r : \{0,1\}^{n(\lambda)} \to \{0,1\}^{\lambda}\}_{r \in \{0,1\}^{s(\lambda)}}$ is a family of universal one-way hash functions (UOWHFs) if it satisfies:
1. Efficiency: Given $r \in \{0,1\}^{s(\lambda)}$ and $x \in \{0,1\}^{n(\lambda)}$, $h_r(x)$ can be evaluated in time $\mathrm{poly}(\lambda)$.
2. Shrinking: $n(\lambda) > \lambda$.
3. Target Collision Resistance: For every $\mathsf{PPT}$ adversary $A = (A_1, A_2)$ and sufficiently large $\lambda$, if we consider the following randomized experiment:
- $(x, \mathsf{st}) \leftarrow A_1(1^{\lambda})$,
- Sample $r \leftarrow \{0,1\}^{s(\lambda)}$,
- $x' \leftarrow A_2(r, \mathsf{st})$,
then
$$\Pr\big[x' \neq x \,\wedge\, h_r(x') = h_r(x)\big] \le \mathrm{negl}(\lambda).$$
The notion of universal one-way hash functions (UOWHF) was first introduced by Naor and Yung [13]. Rompel [16] gave the first UOWHF construction from arbitrary one-way functions.
Theorem 9 (One-way functions imply universal one-way hash functions, [13, 16]).
There is a black-box construction of universal one-way hash functions (UOWHFs) from one-way functions (OWFs). In particular, assuming OWFs, for any arbitrarily large constant $c$ and arbitrarily small constant $\delta > 0$, there is a UOWHF with input length $n = \lambda^{c}$ and output/seed length at most $n^{\delta}$.
2.4 Collision-Resistant Hashing
Definition 10 (Collision-resistant hash functions (CRHF)).
For a security parameter $\lambda$, a family of functions $\mathcal{H} = \{h_r : \{0,1\}^{n(\lambda)} \to \{0,1\}^{\lambda}\}_{r \in \{0,1\}^{s(\lambda)}}$ is a family of collision-resistant hash functions (CRHFs) if it satisfies:
1. Efficiency: Given $r \in \{0,1\}^{s(\lambda)}$ and $x \in \{0,1\}^{n(\lambda)}$, $h_r(x)$ can be evaluated in time $\mathrm{poly}(\lambda)$.
2. Shrinking: $n(\lambda) > \lambda$.
3. Collision Resistance: For every $\mathsf{PPT}$ adversary $A$ and sufficiently large $\lambda$,
$$\Pr_{r \leftarrow \{0,1\}^{s(\lambda)}}\big[(x, x') \leftarrow A(r) : x \neq x' \,\wedge\, h_r(x) = h_r(x')\big] \le \mathrm{negl}(\lambda).$$
We mention that the precise parameters of CRHFs above do not matter much: any “non-trivial” CRHF implies an “arbitrarily good” CRHF. That is, start with any CRHF with arbitrarily large polynomial seed/output length that has even 1 bit of compression, $n(\lambda) = \lambda + 1$. Then, for any arbitrarily small constant $\delta > 0$ and arbitrarily large constant $c$, we can also get a CRHF with small seed/output length $n^{\delta}$ and large input length $n = \lambda^{c}$. Therefore, when we assume the existence of CRHFs, we assume without loss of generality the above “arbitrarily good” CRHF parameters. (We can decrease the seed/output length by changing the security parameter from $\lambda$ to $\lambda^{\delta'}$ for a sufficiently small $\delta'$ and noting that poly/negl in $\lambda$ is the same as poly/negl in $\lambda^{\delta'}$. We can increase the input length arbitrarily using the Merkle–Damgård transform.)
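A minimal sketch of the Merkle–Damgård transform mentioned in the parenthetical above; `compress` stands in for the assumed fixed-input-length CRHF $h_r$, and the padding (MD strengthening) is what makes collisions on the iterated hash yield collisions on `compress`.

```python
def merkle_damgard(compress, block_len: int, out_len: int, msg: bytes) -> bytes:
    # Pad with 0x80, zeros, and an 8-byte message length (MD strengthening).
    pad = b"\x80" + b"\x00" * ((-len(msg) - 9) % block_len)
    data = msg + pad + len(msg).to_bytes(8, "big")
    state = b"\x00" * out_len  # fixed public IV
    for i in range(0, len(data), block_len):
        state = compress(state + data[i:i + block_len])  # chain the blocks
    return state

# Usage with a toy compressor (illustration only):
# import hashlib
# md = lambda m: merkle_damgard(lambda x: hashlib.sha256(x).digest(), 64, 32, m)
```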
2.5 Standard Correcting/Detecting Codes
We now define error-correcting codes for information-theoretic adversaries, also called Hamming channels. In this section, a code is a pair of deterministic encoding and decoding algorithms, and different notions are obtained by considering the requirements of the decoding algorithm.
Definition 11 (Standard codes for Hamming channels).
Let $k, n, q, t$ be parameters and let $\mathsf{Enc} : \{0,1\}^k \to [q]^n$ be a function. We say that $\mathsf{Enc}$ is an encoding function for a code that:
- decodes $t$ errors, if there exists a function $\mathsf{Dec}$ such that for every $m \in \{0,1\}^k$ and every $\tilde{c} \in [q]^n$ with $\Delta(\tilde{c}, \mathsf{Enc}(m)) \le t$, we have $\mathsf{Dec}(\tilde{c}) = m$.
- $L$-list-decodes $t$ errors, if the function $\mathsf{Dec}$ is allowed to output a list of size at most $L$, and for every $m \in \{0,1\}^k$ and every $\tilde{c} \in [q]^n$ with $\Delta(\tilde{c}, \mathsf{Enc}(m)) \le t$, we have $m \in \mathsf{Dec}(\tilde{c})$.
- detects $t$ errors, if there exists a function $\mathsf{Dec}$ such that for every $m \in \{0,1\}^k$, $\mathsf{Dec}(\mathsf{Enc}(m)) = m$, and for every $\tilde{c} \in [q]^n$ such that $\tilde{c} \neq \mathsf{Enc}(m)$ and $\Delta(\tilde{c}, \mathsf{Enc}(m)) \le t$, we have $\mathsf{Dec}(\tilde{c}) = \bot$.
The rate of the code is the ratio of the message length and output length of $\mathsf{Enc}$, where both lengths are measured in bits. That is, the rate is $R = \frac{k}{n \log q}$.
A code $(\mathsf{Enc}, \mathsf{Dec})$ is explicit if $\mathsf{Enc}$ and $\mathsf{Dec}$ run in polynomial time. (Naturally, this makes sense only for a family of encoding and decoding/detecting functions with varying message length $k$, block length $n = n(k)$, and alphabet size $q = q(k)$.) We point out that traditionally codes are indexed by the codeword length $n$ (that is, $k$ and $q$ are chosen as a function of the codeword length $n$). However, for the sake of consistency with the rest of our definitions, we chose to define the codeword length and alphabet as a function of the input size $k$.
We will also use the following construction by Guruswami and Rudra [5] for optimal list-decodable codes over a constant alphabet.
Theorem 12 ([5]).
For every $0 < \rho < 1$ and every sufficiently small $\epsilon > 0$, there is an explicit family of codes over an alphabet of size that depends only on $\epsilon$, that have rate at least $1 - \rho - \epsilon$ and which can be list decoded up to a $\rho$ fraction of errors in polynomial time, with a list of size that depends only on $\rho$ and $\epsilon$. (We remark that Theorem 12 does not explicitly mention that the rate can be achieved for every sufficiently large message size $k$, and instead just claims that it holds for infinitely many choices of $k$. A careful examination of the construction and proof by [5] reveals that the argument holds for every sufficiently large $k$, given a suitable expander graph. An explicit construction of such graphs was given for every sufficiently large $k$ in [10] (see Lemma 2.7 for a formal proof). Thus, we assume that Theorem 12 holds for every sufficiently large $k$.)
It is folklore that, for every $\epsilon > 0$, by using code concatenation it is possible to construct standard codes with distance $(1-\epsilon) n$ and rate $\mathrm{poly}(\epsilon)$ for any sufficiently large $n$, with an alphabet size that depends on $\epsilon$. We make this formal in the following theorem (see for example the book by Guruswami, Rudra, and Sudan [6] for more details).
Theorem 13.
For every sufficiently small $\epsilon > 0$, and every computable function $n(k)$ such that for every sufficiently large $k$, $n(k) \ge \frac{c \cdot k}{\epsilon^{3}}$ (for a universal constant $c$), there exists an explicit standard code $\mathsf{Enc} : \{0,1\}^k \to \Sigma^{n(k)}$, for every sufficiently large $k$ and with alphabet $\Sigma$ of size $\mathrm{poly}(1/\epsilon)$, such that the code has distance $(1-\epsilon) n$ and (uniquely) decodes from $\frac{(1-\epsilon)}{2} n$ errors.
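To make the distance claim concrete, here is a toy sketch of the outer Reed–Solomon component (the actual code behind Theorem 13 concatenates such an outer code with a brute-force inner code to shrink the alphabet; the field size here is an illustration-only choice).

```python
P = 257  # toy prime field GF(257)

def rs_encode(msg: list, n: int) -> list:
    # Evaluate the degree-(k-1) polynomial with coefficients msg at 0..n-1.
    assert len(msg) <= n <= P
    def poly_eval(x: int) -> int:
        acc = 0
        for coeff in reversed(msg):  # Horner's rule
            acc = (acc * x + coeff) % P
        return acc
    return [poly_eval(x) for x in range(n)]

# Two distinct degree-(k-1) polynomials agree on at most k-1 points, so
# distinct messages yield codewords at distance >= n-k+1: taking n >> k
# gives relative distance arbitrarily close to 1, as Theorem 13 requires.
assert sum(a != b for a, b in zip(rs_encode([7, 3], 200),
                                  rs_encode([7, 4], 200))) >= 199
```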
3 Definitions for Randomized Codes and Self-Seeded Codes
In this section, we formally define randomized codes and self-seeded codes, and their related notions of error detection and correction.
3.1 Randomized Codes
We start by giving a formal definition of randomized encoding and decoding algorithms. This syntax will also be shared by our notion of self-seeded codes defined in the next section.
Definition 14 (Randomized codes).
Let $n = n(k)$ and $q = q(k)$ be poly-time computable functions. An $(n, q)$-randomized code is a pair $(\mathsf{Enc}, \mathsf{Dec})$ such that:
- $\mathsf{Enc}$ is a randomized poly-time algorithm that on input $m \in \{0,1\}^k$ outputs a string in $[q]^{n}$. We use $\mathsf{Enc}(m; r)$ to denote the instantiation of $\mathsf{Enc}$ when using the string $r \in \{0,1\}^{\ell}$ as random coins (where $\ell = \ell(k)$ bounds the number of random coins used by $\mathsf{Enc}$).
- $\mathsf{Dec}$ is a deterministic poly-time algorithm that on input $\tilde{c} \in [q]^{n}$ outputs a string in $\{0,1\}^k \cup \{\bot\}$.
For a fixed $k$, the rate of the code for $k$ is defined to be $\frac{k}{n \log q}$. When $\lim_{k \to \infty} \frac{k}{n(k) \log q(k)}$ is well defined, we say that it is the rate of the code. We omit all or a subset of the functions $n, q, \ell$ when they are clear from the context.
We now define a notion of reliable decoding for randomized codes, which essentially states that it is possible to recover the message from a codeword that was not damaged by errors.
Definition 15 (Codes with reliable decoding).
A randomized code $(\mathsf{Enc}, \mathsf{Dec})$ is said to be perfectly reliable if for every $m \in \{0,1\}^k$ and $r \in \{0,1\}^{\ell}$, $\mathsf{Dec}(\mathsf{Enc}(m; r)) = m$.
Intuitively, in randomized codes, each time the sender wants to send a message, it samples fresh randomness on the fly and uses it to encode the message. We make this formal in the definition below.
Definition 16 (Error detection/correction for randomized codes).
Let $n, q$ be poly-time computable functions and let $(\mathsf{Enc}, \mathsf{Dec})$ be a randomized code. For a $\mathsf{PPT}$ algorithm $A$ and $k \in \mathbb{N}$, consider the following randomized experiment:
1. $m \leftarrow A(1^k)$.
2. Compute $c = \mathsf{Enc}(m; r)$ (by sampling $r \leftarrow \{0,1\}^{\ell}$ and computing $\mathsf{Enc}(m; r)$).
3. $\tilde{c} \leftarrow A(c)$.
We say that the code:
- is reliable if for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$: $\Pr[\mathsf{Dec}(c) \neq m] \le \mathrm{negl}(k)$. (This is a weaker variant of Definition 15. Our construction achieves the stronger variant of perfect reliability. However, our lower bound results are stronger since they apply to this weaker notion.)
- detects $d$-errors if it is a reliable code, and for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$:
$$\Pr\big[\Delta(c, \tilde{c}) \le d \,\wedge\, \mathsf{Dec}(\tilde{c}) \notin \{m, \bot\}\big] \le \mathrm{negl}(k).$$
- corrects $d$-errors if for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$:
$$\Pr\big[\Delta(c, \tilde{c}) \le d \,\wedge\, \mathsf{Dec}(\tilde{c}) \neq m\big] \le \mathrm{negl}(k).$$
For a constant $\rho$, the code detects or corrects $\rho$-relative errors if, for every sufficiently large $k$, it detects or corrects $d = \rho \cdot n(k)$ errors.
Note that, by definition, if a code corrects $d$-errors for any $d \ge 0$ then the code is reliable. In addition, our definition of error detection also permits error correction (that is, when errors are introduced, the decoding algorithm can either return $\bot$ or successfully decode and return the original message $m$). By doing so we can achieve simultaneous error detection and correction (that is, the same decoding function can at the same time satisfy the notions of detection and correction defined above).
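For concreteness, the experiment of Definition 16 can be phrased as the following harness; all callables and the adversary interface are assumed stand-ins, `d` is the error budget, and a `True` return value is a win for the adversary against detection.

```python
import secrets

def detection_experiment(Enc, Dec, adversary, k: int, rand_len: int, d: int) -> bool:
    m = adversary.choose_message(k)        # step 1: m <- A(1^k), before r exists
    r = secrets.token_bytes(rand_len)      # step 2: fresh randomness, unseen by A
    c = Enc(m, r)
    c_tilde = adversary.corrupt(c)         # step 3: c~ <- A(c)
    errors = sum(a != b for a, b in zip(c, c_tilde))
    decoded = Dec(c_tilde)
    # The adversary wins if it stays within the error budget yet makes the
    # decoder output a *wrong* message (None, i.e. ⊥, does not count).
    return errors <= d and decoded is not None and decoded != m
```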
3.2 Self-Seeded Codes
We now formally define the notion of self-seeded codes. A self-seeded code is a randomized code that shares the same syntax as defined in Definition 14. However, a self-seeded code provides a strictly stronger notion of security compared to the notion of error correction and detection for randomized codes stated above in Definition 16.
Definition 17 (Error detection/correction for self-seeded codes).
Let $n, q$ be poly-time computable functions. A randomized code $(\mathsf{Enc}, \mathsf{Dec})$ is said to be a self-seeded code if the following holds: For a $\mathsf{PPT}$ algorithm $A$ and $k \in \mathbb{N}$, consider the following randomized experiment:
1. $r \leftarrow \{0,1\}^{\ell}$.
2. $(m, \tilde{c}) \leftarrow A(r)$, and let $c = \mathsf{Enc}(m; r)$.
We say that the code:
- is reliable if for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$: $\Pr[\mathsf{Dec}(c) \neq m] \le \mathrm{negl}(k)$.
- detects $d$-errors if it is a reliable code, and for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$:
$$\Pr\big[\Delta(c, \tilde{c}) \le d \,\wedge\, \mathsf{Dec}(\tilde{c}) \notin \{m, \bot\}\big] \le \mathrm{negl}(k).$$
- corrects $d$-errors if for every $\mathsf{PPT}$ algorithm $A$ and every sufficiently large $k$:
$$\Pr\big[\Delta(c, \tilde{c}) \le d \,\wedge\, \mathsf{Dec}(\tilde{c}) \neq m\big] \le \mathrm{negl}(k).$$
For a constant $\rho$, the code detects or corrects $\rho$-relative errors if, for every sufficiently large $k$, it detects or corrects $d = \rho \cdot n(k)$ errors.
A self-seeded code enables the sender to sample its randomness once, fix it, and then use this same randomness for encoding multiple messages (this contrasts with the standard notion of randomized codes, which requires the sender to generate fresh randomness for each message in real time). We stress that after being fixed, this randomness is known to any potential adversary but is not given to the decoding algorithm. (We note that our notion of self-seeded codes is strictly stronger than the notion of “seeded codes” defined in [3]. Seeded codes, unlike self-seeded codes, require the encoding and decoding algorithms to share a random seed. Thus, unlike self-seeded codes, seeded codes necessitate a trusted setup between the encoding and decoding algorithms.) Still, we require that no efficient adversary can find a message and a close error vector that leads to a decoding error.
Indeed, self-seeded codes offer a stronger adaptive notion of security, where the adversary can choose the message adaptively based on the randomness fixed by the sender. In contrast, standard randomized codes require the sender to choose fresh randomness after the adversary has decided on the message.
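The corresponding harness for Definition 17 differs from the one shown after Definition 16 only in the order of steps: the seed is sampled first and handed to the adversary, which then picks the message adaptively (again, all callables are assumed stand-ins).

```python
import secrets

def self_seeded_detection_experiment(Enc, Dec, adversary, rand_len: int, d: int) -> bool:
    r = secrets.token_bytes(rand_len)      # step 1: r is fixed first and public
    m, c_tilde = adversary.attack(r)       # step 2: adaptive (m, c~) <- A(r)
    c = Enc(m, r)
    errors = sum(a != b for a, b in zip(c, c_tilde))
    decoded = Dec(c_tilde)
    return errors <= d and decoded is not None and decoded != m
```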
4 Optimal Randomized Codes and Self-Seeded Codes
We now state our main theorems for randomized codes and self-seeded codes. Our results show that it is possible to construct codes that bypass the Singleton bound for computationally bounded channels under minimal cryptographic assumptions. Specifically, for the randomized codes construction, we assume the existence of one-way functions, and for the self-seeded codes construction, we assume the existence of a collision-resistant hash function.
Main theorem
We state our result for randomized codes and self-seeded codes together in the following theorem.
Theorem 18 (Randomized codes [resp., self-seeded codes] with constant alphabet size).
Assuming the existence of one-way functions [resp., collision-resistant hash functions], the following holds: For every constant $0 \le \rho < 1/2$ and every sufficiently small constant $\epsilon > 0$, there exists a randomized code [resp., self-seeded code] with rate $1 - \rho - \epsilon$ that simultaneously (uniquely) corrects $\rho$-relative errors and detects $(1 - \rho - \epsilon)$-relative errors. Moreover, the alphabet size is a constant that depends on $\epsilon$.
The proof of Theorem 18 appears in Section 4.2 and relies on our construction described in Section 4.1.
Remark 19.
If the underlying cryptographic primitives assumed in our theorem statements (that is, one-way functions for randomized codes and collision-resistant hash functions for self-seeded codes) have sub-exponential security, then the codes constructed in Theorem 18 will also have sub-exponentially small failure probability for correction and detection against channels that are computable by sub-exponential size circuits. This follows since all our reductions run in polynomial time and have a polynomial loss in advantage. However, for the sake of simplicity, we focus on the case of polynomial/negligible security in our formal statements and proofs.
4.1 Construction
In this section, we describe the construction that we will use to prove Theorem 18.
We now describe the encoding and decoding algorithms. In Figure 1, we state the parameters and ingredients that are needed to define our encoding and decoding algorithms, which are specified in Figure 2. The encoding and decoding algorithms are essentially the same for randomized codes and for self-seeded codes. In both cases the encoding algorithm takes as input a message $m$ and some value $r$. For randomized codes, a fresh random $r$ is chosen every time we send a new message, and for self-seeded codes $r$ is sampled once (and is given to any possible adversary), and this single fixed $r$ can be used in the encoding of multiple messages.
In both cases, the decoding algorithm only takes as input a possibly damaged codeword $\tilde{c}$. In particular, the decoding algorithm is not provided with the seed $r$.
4.2 Proof of Theorem 18
In this section, we prove Theorem 18. We make use of the following technical lemma. The proof for the lemma is given below in Section 4.2.1.
Lemma 20.
Suppose there are ingredients and parameters that satisfy the conditions stated in Figure 1, and that the hash family specified in Figure 1 is a family of universal one-way hash functions [resp., collision-resistant hash functions]. Then the code specified in Figure 2 is a randomized code [resp., self-seeded code]. Moreover, the code is perfectly reliable, has constant alphabet size, rate $1 - \rho - \epsilon$, and the decoding algorithm simultaneously,
- detects $(1 - \rho - \epsilon) \cdot n$ errors, and
- corrects $\rho \cdot n$ errors.
Proof of Theorem 18.
Note that one-way functions imply universal one-way hash functions that satisfy the conditions stated for the hash family in Figure 1 (see Theorem 9).
Given a sufficiently small $\epsilon > 0$, let $\epsilon' = \epsilon/2$ and apply Theorem 12 for $\rho$ and $\epsilon'$ to get a standard code $\mathsf{Enc}_{\mathrm{data}}$ with rate $1 - \rho - \epsilon'$, codeword length $n$, and constant alphabet size. Moreover, $\mathsf{Enc}_{\mathrm{data}}$ list decodes from $\rho \cdot n$ errors (that is, from $\rho$-relative errors) with a list size that depends only on $\rho$ and $\epsilon'$. Note also that we can increase the alphabet size by bundling symbols together without damaging the list decoding and the rate of the code, and so, we can assume that the alphabet is at least as large as required in Figure 1. To summarize, $\mathsf{Enc}_{\mathrm{data}}$ meets the conditions stated in Figure 1 for the “data code” for the parameters $\rho$, $\epsilon'$ and $n$.
Apply Theorem 13 for $\epsilon'$ to get a code $\mathsf{Enc}_{\mathrm{ctrl}}$ with codeword length $n$, alphabet size at most $\mathrm{poly}(1/\epsilon')$ that has distance $(1-\epsilon') n$ and efficient decoding of up to $\frac{(1-\epsilon')}{2} n$ errors. Note that for a sufficiently small $\epsilon'$ (and any $\rho < 1/2$) it holds that $\rho \cdot n \le \frac{(1-\epsilon')}{2} n$. This means that $\mathsf{Enc}_{\mathrm{ctrl}}$ meets the conditions stated in Figure 1 for the “control code” with parameters $\epsilon'$ and $n$.
Thus, we can apply Lemma 20 for $\mathsf{Enc}_{\mathrm{data}}$, $\mathsf{Enc}_{\mathrm{ctrl}}$ and the hash family $\mathcal{H}$ to get a randomized code when we take $\mathcal{H}$ to be a universal one-way hash family, and a self-seeded code if $\mathcal{H}$ is a collision-resistant hash family. In both cases, it follows that:
- the code is perfectly reliable,
- the alphabet size is a constant that depends on $\epsilon$,
- for every sufficiently small $\epsilon$ (and $\epsilon' = \epsilon/2$), the rate is at least $1 - \rho - \epsilon$,
- the code detects $(1 - \rho - \epsilon) \cdot n$ errors, that is, it detects $(1 - \rho - \epsilon)$-relative errors, and
- the code corrects $\rho \cdot n$ errors, that is, it corrects up to $\rho$-relative errors.
This concludes the proof of Theorem 18.
4.2.1 Proof of Lemma 20
Let $\mathsf{Enc}_{\mathrm{data}}$, $\mathsf{Enc}_{\mathrm{ctrl}}$, $\mathcal{H}$ be as defined in Figure 1 and let $(\mathsf{Enc}, \mathsf{Dec})$ be as defined in Figure 2. By construction, $(\mathsf{Enc}, \mathsf{Dec})$ is perfectly reliable. This holds since for every fixed randomness $r$, if no errors are applied, all ingredients in the construction are deterministic, and standard codes are by definition perfectly reliable. Similarly, by construction, the alphabet size and rate are as claimed.
We now prove the rest of Lemma 20. For simplicity, we focus on the randomized codes case; the proof for the self-seeded codes is essentially the same and follows along similar lines. (The main difference is that we consider an adaptive adversary (that is, first $r$ and then $m$) and we reduce to the security of the CRHF instead of the security of the UOWHF; otherwise, the proof is identical.)
Consider the following randomized experiment for a $\mathsf{PPT}$ algorithm $A$ and $k \in \mathbb{N}$:
1. $m \leftarrow A(1^k)$.
2. Compute $c = \mathsf{Enc}(m; r)$ (by sampling $r \leftarrow \{0,1\}^{\ell}$).
3. $\tilde{c} \leftarrow A(c)$.
We make use of the following notation: Let $y = h_r(m)$, $c_{\mathrm{data}} = \mathsf{Enc}_{\mathrm{data}}(m)$ and $c_{\mathrm{ctrl}} = \mathsf{Enc}_{\mathrm{ctrl}}(r, y)$. Recall that $c_i = (c_{\mathrm{data},i}, c_{\mathrm{ctrl},i})$, and define $\tilde{c}_{\mathrm{data}}$ and $\tilde{c}_{\mathrm{ctrl}}$ such that for every $i \in [n]$, $\tilde{c}_i = (\tilde{c}_{\mathrm{data},i}, \tilde{c}_{\mathrm{ctrl},i})$. Let $(r', y')$ denote the control value that $\mathsf{Dec}$ decodes from $\tilde{c}_{\mathrm{ctrl}}$, and let $\mathsf{List}$ denote the list of messages obtained by list decoding $\tilde{c}_{\mathrm{data}}$.
Error correction
To prove the error correction property, we need to show that
$$\Pr\big[\Delta(c, \tilde{c}) \le \rho n \,\wedge\, \mathsf{Dec}(\tilde{c}) \neq m\big] \le \mathrm{negl}(k).$$
We make use of the following claims:
Claim 21.
If $\Delta(c, \tilde{c}) \le \rho n$ then $(r', y') = (r, y)$ and $m \in \mathsf{List}$.
Proof of Claim 21.
If $\Delta(c, \tilde{c}) \le \rho n$ it follows that $\Delta(c_{\mathrm{ctrl}}, \tilde{c}_{\mathrm{ctrl}}) \le \rho n$ and
$\Delta(c_{\mathrm{data}}, \tilde{c}_{\mathrm{data}}) \le \rho n$.
Since $\mathsf{Enc}_{\mathrm{ctrl}}$ is a standard code that uniquely decodes from $\rho n$ errors, it holds that $(r', y') = (r, y)$. Similarly, since $\mathsf{Enc}_{\mathrm{data}}$ is a standard code that list-decodes from $\rho n$ errors, it follows that $m \in \mathsf{List}$.
Claim 22.
With all but negligible probability, if $m' \in \mathsf{List}$ and $h_r(m') = y$ then $m' = m$.
Proof of Claim 22.
Follows directly from the security of the universal one-way hash function, as otherwise, we can use $A$ to find collisions in $h_r$, contradicting the target collision resistance property.
By Claims 21 and 22, if $\Delta(c, \tilde{c}) \le \rho n$ then, with all but negligible probability, $\mathsf{Dec}(\tilde{c}) = m$, as required.
Error detection
To prove the error detection property, we need to show that
$$\Pr\big[\Delta(c, \tilde{c}) \le (1 - \rho - \epsilon) n \,\wedge\, \mathsf{Dec}(\tilde{c}) \notin \{m, \bot\}\big] \le \mathrm{negl}(k).$$
We make use of the following claim.
Claim 23.
If $\Delta(c, \tilde{c}) \le (1 - \rho - \epsilon) n$ and $\mathsf{Dec}(\tilde{c}) \neq \bot$ then $(r', y') = (r, y)$.
Proof.
Let $c'_{\mathrm{ctrl}} = \mathsf{Enc}_{\mathrm{ctrl}}(r', y')$ and note that, by the “Check control distance” step in the decoding algorithm (see Figure 2), if $\mathsf{Dec}(\tilde{c}) \neq \bot$ it follows that $\Delta(\tilde{c}_{\mathrm{ctrl}}, c'_{\mathrm{ctrl}}) \le \rho n$. Moreover, by construction, if $\Delta(c, \tilde{c}) \le (1 - \rho - \epsilon) n$ it follows that $\Delta(c_{\mathrm{ctrl}}, \tilde{c}_{\mathrm{ctrl}}) \le (1 - \rho - \epsilon) n$. Thus, if $\Delta(c, \tilde{c}) \le (1 - \rho - \epsilon) n$ and $\mathsf{Dec}(\tilde{c}) \neq \bot$, by the triangle inequality
$$\Delta(c_{\mathrm{ctrl}}, c'_{\mathrm{ctrl}}) \le \Delta(c_{\mathrm{ctrl}}, \tilde{c}_{\mathrm{ctrl}}) + \Delta(\tilde{c}_{\mathrm{ctrl}}, c'_{\mathrm{ctrl}}) \le (1 - \rho - \epsilon) n + \rho n = (1 - \epsilon) n < (1 - \epsilon') n.$$
Since $c_{\mathrm{ctrl}}$ and $c'_{\mathrm{ctrl}}$ are both valid codewords of the standard code that has distance $(1 - \epsilon') n$, it follows that $c_{\mathrm{ctrl}} = c'_{\mathrm{ctrl}}$, and hence $(r', y') = (r, y)$. By Claim 23, if $\Delta(c, \tilde{c}) \le (1 - \rho - \epsilon) n$ and $\mathsf{Dec}(\tilde{c}) \neq \bot$ then $(r', y') = (r, y)$. This concludes the proof, since by Claim 22, if $(r', y') = (r, y)$ then, with all but negligible probability, $m$ is the only valid message that can be output by $\mathsf{Dec}$.
5 Randomized Codes Imply One-Way Functions
We now show that non-trivial randomized codes imply the existence of one-way functions. For large alphabets, this is indeed optimal since, as we have seen in our construction, it is possible to construct optimal randomized codes assuming only universal one-way hash functions (which can be constructed from one-way functions).
The following theorem states that if a randomized code has a rate that bypasses the Singleton bound, it implies one-way functions.
Theorem 24 (Randomized codes with rate beating the Singleton bound imply one-way functions).
Let $(\mathsf{Enc}, \mathsf{Dec})$ be a randomized code with rate $R$. If one of the following holds:
- The code detects $\rho$-relative errors and $\rho > 1 - R$.
- The code corrects $\rho$-relative errors and $\rho > \frac{1-R}{2}$.
Then there exist one-way functions.
The proof will make use of the following lemma.
Lemma 25 (Non-trivial error detection/correction for randomized codes implies one-way functions).
Let $(\mathsf{Enc}, \mathsf{Dec})$ be an $(n, q)$-randomized code that has at least one of the following properties:
- The code detects $d$-errors.
- The code corrects $d/2$-errors.
Let $a = n - d$. If $a \log q < k$, then the function $f$ defined so that on input $(m, r) \in \{0,1\}^k \times \{0,1\}^{\ell}$ it computes $\mathsf{Enc}(m; r)$ and outputs a binary representation of the first $a$ symbols is a distributional one-way function. (Here, $\ell$ is just the implicitly assumed bound on the number of random bits used by $\mathsf{Enc}$. The statement of Lemma 25 could be readily extended so that restricting the output of $\mathsf{Enc}$ to any $a$ symbols (that can be efficiently identified), not just the first $a$ symbols, is itself a distributional one-way function. This would be similar to our statement on self-seeded codes made in the full version [22], where we show that restricting the output to any subset of size $a$ is a collision-resistant hash family. However, for the sake of simplicity, we restrict our statement and analysis to the simple case where the distributional one-way function is defined by the first $a$ symbols.)
The proof for the lemma is given below, but first, we will use it to prove Theorem 24.
Proof of Theorem 24.
In the case of detection let $d = \rho \cdot n$, and in the case of correction let $d = 2\rho \cdot n$; note that by definition at least one of the following holds: the code detects $d$-errors, or corrects $d/2$-errors. Moreover, by our assumption on the rate of the code, it holds that $d > (1 - R) n$. Let $a = n - d$ and note that since $R = \frac{k}{n \log q}$, it holds that $a \log q < R \cdot n \log q = k$. Thus, we meet the conditions of Lemma 25, which implies that there exists a distributional one-way function (in fact, outputting the binary representation of the first $a$ symbols of $\mathsf{Enc}$ is the distributional one-way function), and by Theorem 7 there exists a one-way function.
Proof of Lemma 25.
For $k \in \mathbb{N}$, let $M_k$ and $S_k$ denote random variables sampled uniformly from $\{0,1\}^k$ and $\{0,1\}^{\ell}$ respectively. Let $a = n - d$ and assume for contradiction that $f$ is not a distributional one-way function (see Definition 6). That is, for every constant $c > 0$ there exists a $\mathsf{PPT}$ algorithm $A$ such that for infinitely many $k$,
$$\mathrm{SD}\Big(\big((M, S),\, f(M, S)\big),\; \big(A(f(M, S)),\, f(M, S)\big)\Big) \le k^{-c}. \tag{1}$$
For a fixed $k$ for which the above holds, we omit the subscript $k$ from the random variables to avoid clutter.
Let $Z = f(M, S)$ and note that by the definition of average min-entropy (see Definition 3)
$$\widetilde{H}_\infty(M \mid Z) \ge H_\infty(M) - a \log q = k - a \log q \ge 1, \tag{2}$$
where the first inequality follows by Lemma 4, which states that $\widetilde{H}_\infty(M \mid Z) \ge H_\infty(M) - a \log q$ since $Z$ takes at most $2^{a \log q}$ values, and the last inequality follows from the assumption $a \log q < k$.
Let $(M', S') = A(Z)$ and $C' = \mathsf{Enc}(M'; S')$, and recall that by definition, $Z$ is just the binary representation of the first $a$ symbols of $C = \mathsf{Enc}(M; S)$. Thus, if $(M', S') \in f^{-1}(Z)$ it follows that $\Delta(C, C') \le n - a = d$, which by Equation 1 implies that:
$$\Pr[\Delta(C, C') \le d] \ge 1 - k^{-c}. \tag{3}$$
Note also that by Equation 1,
$$\Pr[M' = M] \le 2^{-\widetilde{H}_\infty(M \mid Z)} + k^{-c} \le \frac{1}{2} + k^{-c}. \tag{4}$$
We first handle the case where the code detects $d$-errors and then the case where the code corrects $d/2$-errors.
Detection.
By Equation 1 and by the reliability of the code,
$$\Pr[\mathsf{Dec}(C') = M'] \ge 1 - \mathrm{negl}(k) - k^{-c}. \tag{5}$$
We now define a $\mathsf{PPT}$ algorithm $B$ (that makes use of $A$) that contradicts the detection property of the code: On input $1^k$, $B$ samples $m \leftarrow \{0,1\}^k$, and on input $c$ it runs $(m', s') \leftarrow A(z)$ where $z$ is just the binary representation of the first $a$ symbols of $c$ (i.e., $z = f(m, s)$), and outputs $\tilde{c} = \mathsf{Enc}(m'; s')$. It holds that:
- $m \leftarrow B(1^k)$ (where by definition $m$ is sampled uniformly by $B$).
- Compute $c = \mathsf{Enc}(m; s)$ (by sampling $s \leftarrow \{0,1\}^{\ell}$).
- $\tilde{c} \leftarrow B(c)$.
Thus, $B$ contradicts the $d$-detection property since
$$\Pr\big[\Delta(C, \tilde{C}) \le d \,\wedge\, \mathsf{Dec}(\tilde{C}) \notin \{M, \bot\}\big] \ge 1 - \Pr[\Delta(C, C') > d] - \Pr[\mathsf{Dec}(C') \neq M'] - \Pr[M' = M] \ge \frac{1}{2} - 3k^{-c} - \mathrm{negl}(k),$$
which is non-negligible, where the penultimate inequality follows from Equations 5, 2, 4 and 3.
Correction.
We now prove the correction property. Let $\mathsf{Mid}$ be a randomized function defined as follows: On inputs $c, c' \in [q]^n$ that agree on their first $a$ symbols, $\mathsf{Mid}$ samples a set $T$ of $\Delta(c, c')/2$ random indices from the set $D = \{i : c_i \neq c'_i\}$ (if $\Delta(c, c')$ is an odd number, then with probability $1/2$ it samples $\lceil \Delta(c, c')/2 \rceil$ indices and with probability $1/2$ it samples $\lfloor \Delta(c, c')/2 \rfloor$ indices), and outputs $c^*$ where $c^*_i = c'_i$ for every $i \in T$, and $c^*_i = c_i$ for every $i \notin T$. Intuitively, if $c$ and $c'$ agree on the first $a$ indices then $c^*$ is a random “midpoint” between $c$ and $c'$.
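A sketch of the midpoint sampler $\mathsf{Mid}$, matching the description above (the odd case is split by a fair coin):

```python
import random

def midpoint(c: list, c_prime: list) -> list:
    diff = [i for i in range(len(c)) if c[i] != c_prime[i]]
    random.shuffle(diff)
    take = len(diff) // 2 + (len(diff) % 2) * random.randint(0, 1)
    moved = set(diff[:take])  # flip a random half of the disagreeing positions
    return [c_prime[i] if i in moved else c[i] for i in range(len(c))]

# The output agrees with each of c and c_prime on all but roughly half of
# the positions where they differ, so it is a random midpoint between them.
```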
Recall that $(M', S') = A(Z)$ and $C' = \mathsf{Enc}(M'; S')$, and let $C^* = \mathsf{Mid}(C, C')$. By Equation 1 and symmetry,
$$\Pr[\mathsf{Dec}(C^*) = M] \le \Pr[\mathsf{Dec}(C^*) = M'] + k^{-c}. \tag{6}$$
Since by Equation 3 we have $\Delta(C, C') \le d$ with high probability, by the definition of $\mathsf{Mid}$ it follows that:
$$\Pr\big[\Delta(C, C^*) \le d/2 \,\wedge\, \Delta(C', C^*) \le d/2\big] \ge 1 - k^{-c}. \tag{7}$$
Recall that $M, S$ are sampled uniformly at random and that $\mathsf{Dec}(C^*)$ is a single value, so $\Pr[\mathsf{Dec}(C^*) = M] + \Pr[\mathsf{Dec}(C^*) = M'] \le 1 + \Pr[M' = M]$. Thus, by Equations 6 and 7 and the decoding property of the code,
$$\Pr[\mathsf{Dec}(C^*) = M] \le \frac{3}{4} + O(k^{-c}). \tag{8}$$
We now define a $\mathsf{PPT}$ algorithm $B$ (that makes use of $A$) that contradicts the correction property of the code: On input $1^k$, $B$ samples $m \leftarrow \{0,1\}^k$, and on input $c$ it runs $(m', s') \leftarrow A(z)$ where $z$ is just the binary representation of the first $a$ symbols of $c$ (i.e., $z = f(m, s)$), computes $c' = \mathsf{Enc}(m'; s')$ and outputs $c^* = \mathsf{Mid}(c, c')$. It holds that:
- $m \leftarrow B(1^k)$ (where by definition $m$ is sampled uniformly by $B$).
- Compute $c = \mathsf{Enc}(m; s)$ (by sampling $s \leftarrow \{0,1\}^{\ell}$).
- $c^* \leftarrow B(c)$ (recall that $\Delta(c, c^*) \le d/2$ whenever $\Delta(c, c') \le d$, for $c^* = \mathsf{Mid}(c, c')$).
Thus, $B$ contradicts the $d/2$-correction property since
$$\Pr\big[\Delta(C, C^*) \le d/2 \,\wedge\, \mathsf{Dec}(C^*) \neq M\big] \ge 1 - \Pr[\Delta(C, C^*) > d/2] - \Pr[\mathsf{Dec}(C^*) = M] \ge \frac{1}{4} - O(k^{-c}),$$
which is non-negligible, where the penultimate inequality follows from Equations 2, 7 and 8. This concludes the proof of Lemma 25.
6 Self-Seeded Codes Imply Collision-Resistant Hash Functions
We now show that non-trivial self-seeded codes imply the existence of a collision-resistant hash family. Loosely speaking, the following theorem states that if a self-seeded code has a rate that bypasses the Singleton bound, then it implies collision-resistant hash functions (since there are codes with large alphabet that can achieve the Singleton bound, this result means that for large alphabets, any non-trivial self-seeded code implies collision-resistant hash functions).
We note that while the result below is stated with respect to the notion of self-seeded codes, the same exact proof also holds for the weaker notion of seeded codes defined in [3], in which the encoding and decoding algorithms both share the random seed. Since our lower bound still holds even if the decoding algorithm is given the random seed as input, the lower bound immediately extends to the weaker notion of seeded codes. However, to avoid confusion and clutter, we present our result with respect to the notion of self-seeded codes defined in this paper.
Theorem 26 (Self-seeded codes with rate beating the Singleton bound imply collision-resistant hash functions).
Let $(\mathsf{Enc}, \mathsf{Dec})$ be a self-seeded code with rate $R$. If one of the following holds:
- The code detects $\rho$-relative errors and $\rho > 1 - R$.
- The code corrects $\rho$-relative errors and $\rho > \frac{1-R}{2}$.
Then there exists an explicit construction of a collision-resistant hash family given black-box access to the code.
References
- [1] Jeremiah Blocki, Venkata Gandikota, Elena Grigorescu, and Samson Zhou. Relaxed locally correctable codes in computationally bounded channels. IEEE Transactions on Information Theory, 67(7):4338–4360, 2021. doi:10.1109/TIT.2021.3076396.
- [2] Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM journal on computing, 38(1):97–139, 2008. doi:10.1137/060651380.
- [3] Ofer Grossman, Justin Holmgren, and Eylon Yogev. Transparent error correcting in a computationally bounded world. In Rafael Pass and Krzysztof Pietrzak, editors, Theory of Cryptography - 18th International Conference, TCC 2020, Durham, NC, USA, November 16-19, 2020, Proceedings, Part III, volume 12552 of Lecture Notes in Computer Science, pages 530–549. Springer, 2020. doi:10.1007/978-3-030-64381-2_19.
- [4] V. Guruswami and A. Smith. Optimal rate code constructions for computationally simple channels. Journal of the ACM (JACM), 63(4):35, 2016.
- [5] Venkatesan Guruswami and Atri Rudra. Explicit codes achieving list decoding capacity: Error-correction with optimal redundancy. IEEE Transactions on information theory, 54(1):135–150, 2008. doi:10.1109/TIT.2007.911222.
- [6] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory, 2012. Draft available at https://cse.buffalo.edu/faculty/atri/courses/coding-theory/book/web-coding-book.pdf.
- [7] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950. doi:10.1002/j.1538-7305.1950.tb00463.x.
- [8] Brett Hemenway, Rafail Ostrovsky, Martin J. Strauss, and Mary Wootters. Public key locally decodable codes with short keys. In Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 14th International Workshop, APPROX 2011, and 15th International Workshop, RANDOM 2011, Princeton, NJ, USA, August 17-19, 2011. Proceedings, volume 6845 of Lecture Notes in Computer Science, pages 605–615. Springer, 2011. doi:10.1007/978-3-642-22935-0_51.
- [9] Russell Impagliazzo and Michael Luby. One-way functions are essential for complexity based cryptography. In FOCS 30, pages 230–235, 1989. doi:10.1109/SFCS.1989.63483.
- [10] Swastik Kopparty, Or Meir, Noga Ron-Zewi, and Shubhangi Saraf. High-rate locally correctable and locally testable codes with sub-polynomial query complexity. Journal of the ACM (JACM), 64(2):1–42, 2017. doi:10.1145/3051093.
- [11] R. J. Lipton. A new approach to information theory. In 11th Annual Symposium on Theoretical Aspects of Computer Science, pages 699–708, 1994. doi:10.1007/3-540-57785-8_183.
- [12] Silvio Micali, Chris Peikert, Madhu Sudan, and David A. Wilson. Optimal error correction for computationally bounded noise. IEEE Trans. Information Theory, 56(11):5673–5680, 2010. doi:10.1109/TIT.2010.2070370.
- [13] Moni Naor and Moti Yung. Universal one-way hash functions and their cryptographic applications. In David S. Johnson, editor, Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 14-17, 1989, Seattle, Washington, USA, pages 33–43. ACM, 1989. doi:10.1145/73007.73011.
- [14] Rafail Ostrovsky, Omkant Pandey, and Amit Sahai. Private locally decodable codes. In Lars Arge, Christian Cachin, Tomasz Jurdzinski, and Andrzej Tarlecki, editors, Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland, July 9-13, 2007, Proceedings, volume 4596 of Lecture Notes in Computer Science, pages 387–398. Springer, 2007. doi:10.1007/978-3-540-73420-8_35.
- [15] Leonid Reyzin. Some notions of entropy for cryptography: (invited talk). In International Conference on Information Theoretic Security, pages 138–142. Springer, 2011. doi:10.1007/978-3-642-20728-0_13.
- [16] John Rompel. One-way functions are necessary and sufficient for secure signatures. In Harriet Ortiz, editor, Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13-17, 1990, Baltimore, Maryland, USA, pages 387–394. ACM, 1990. doi:10.1145/100216.100269.
- [17] R. Shaltiel and J. Silbak. Explicit list-decodable codes with optimal rate for computationally bounded channels. Comput. Complex., 30(1):3, 2021. doi:10.1007/s00037-020-00203-w.
- [18] R. Shaltiel and J. Silbak. Explicit uniquely decodable codes for space bounded channels that achieve list-decoding capacity. In STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1516–1526, 2021. doi:10.1145/3406325.3451048.
- [19] R. Shaltiel and J. Silbak. Error correcting codes that achieve BSC capacity against channels that are poly-size circuits. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 13–23, 2022. doi:10.1109/FOCS54457.2022.00009.
- [20] R. Shaltiel and J. Silbak. Explicit codes for poly-size circuits and functions that are hard to sample on low entropy distributions. Electronic Colloquium on Computational Complexity (ECCC), 2024. URL: https://eccc.weizmann.ac.il/report/2023/149.
- [21] Claude E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27(4):623–656, 1948. doi:10.1002/J.1538-7305.1948.TB00917.X.
- [22] Jad Silbak and Daniel Wichs. Detecting and correcting computationally bounded errors: A simple construction under minimal assumptions. Cryptology ePrint Archive, Paper 2024/1461, 2024. URL: https://eprint.iacr.org/2024/1461.