
Tight Bounds for Stream Decodable Error-Correcting Codes

Meghal Gupta, UC Berkeley, CA, USA; Venkatesan Guruswami, UC Berkeley, CA, USA; Mihir Singhal, UC Berkeley, CA, USA
Abstract

In order to communicate a message over a noisy channel, a sender (Alice) uses an error-correcting code to encode her message, a bitstring x, into a codeword. The receiver (Bob) decodes x correctly whenever there is at most a small constant fraction of adversarial errors in the transmitted codeword. We investigate the setting where Bob is restricted to be a low-space streaming algorithm. Specifically, Bob receives the message as a stream and must process it and write x in order to a write-only tape while using low (say polylogarithmic) space. Note that such a primitive then allows the execution of any downstream streaming computation on x.

We show three basic results about this setting, which are informally as follows:

  1. (i)

    There is a stream decodable code of near-quadratic length, resilient to error-fractions approaching the optimal bound of 1/4.

  2. (ii)

    There is no stream decodable code of sub-quadratic length, even to correct any small constant fraction of errors.

  3. (iii)

    If Bob need only compute a private linear function of the bits of x, instead of writing them all to the output tape, there is a stream decodable code of near-linear length.

Our constructions use locally decodable codes with additional functionality in the decoding, and (for the result on linear functions) repeated tensoring. Our lower bound, which rather surprisingly demonstrates a strong information-theoretic limitation originating from a computational restriction, proceeds via careful control of the message indices that may be output during successive blocks of the stream, a task complicated by the arbitrary state of the decoder during the algorithm.

Keywords and phrases:
Coding theory, Streaming computation, Locally decodable code, Lower Bounds
Copyright and License:
© Meghal Gupta, Venkatesan Guruswami, and Mihir Singhal; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Error-correcting codes
Related Version:
Full Version: https://arxiv.org/abs/2407.06446
Funding:
Research supported in part by NSF GRFP Fellowships (for M.G. and M.S.), a Simons Investigator award (for V.G.), and NSF grants CCF-2210823 and CCF-2211972.
Editors:
Srikanth Srinivasan

1 Introduction

Consider the following task: a sender wishes to communicate a message to a receiver that it receives and processes bit-by-bit. This scenario arises, for instance, in automatic control applications, where a device receives an incoming stream of instructions that it executes in sequence. Concretely, consider a small satellite receiving instructions from a large control center on the ground. The control center wants to send the satellite instructions in a way that satisfies two properties:

  • Error resilience. The satellite should execute the correct set of instructions even if a constant fraction of the transmission is corrupted.

  • Low-memory decoding. The satellite should be able to process the instructions in order while only using limited space (significantly less than the total length of the instructions).

Sending the list of instructions x_1 ⋯ x_n directly, although easy to process and execute one-by-one, offers no resilience against errors. On the other hand, encoding x_1 ⋯ x_n into 𝖤𝖢𝖢(x_1 ⋯ x_n) with a standard error-correcting code [29, 17] is resilient to error, but requires the receiver to store the whole stream to decode, and thus use too much space. An intermediate approach would be to encode the individual instructions separately as 𝖤𝖢𝖢(x_1) 𝖤𝖢𝖢(x_2) ⋯ 𝖤𝖢𝖢(x_n). However, this does not withstand a constant overall fraction of corruption: the adversary can corrupt 𝖤𝖢𝖢(x_1) entirely, using only a 1/n fraction of corruption, and thus never allow the satellite to recover x_1. We remark that this is reminiscent of the issue faced by the naive approach, which encodes each round separately, in protecting interactive communication against errors [26]. What we would like is a code that encodes the message globally but can be decoded in low space as the encoded message arrives.
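To make the failure of this intermediate approach concrete, here is a toy sketch (our illustration, not from the paper; a 9-fold repetition code stands in for 𝖤𝖢𝖢):

```python
# Toy illustration of why encoding each instruction separately is not
# resilient to a constant overall fraction of errors.
def ecc(bit, reps=9):              # stand-in ECC: 9-fold repetition
    return [bit] * reps

def naive_encode(x, reps=9):       # ECC(x_1) ECC(x_2) ... ECC(x_n)
    return [b for bit in x for b in ecc(bit, reps)]

x = [1, 0, 1, 1, 0, 0, 1, 0]
stream = naive_encode(x)
# Corrupt only the first block (a 1/n = 1/8 fraction of the stream):
stream[:9] = [1 - b for b in stream[:9]]
# Majority decoding of the first block now returns the wrong bit, and no
# decoder whatsoever can recover x_1: its entire encoding was overwritten.
decoded_x1 = int(sum(stream[:9]) > 4)
print(decoded_x1 == x[0])          # False
```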

Stream-decodable codes.

This motivates the notion of a stream-decodable error-correcting code. In this model, we require that the receiver can output the entire message x_1 ⋯ x_n whenever at most a small fraction ρ of the transmission is adversarially corrupted, while using low space (for example, polylog(n) space) to process the transmission bit-by-bit. More formally, a stream decodable code has the following two components:

  • An encoding function 𝖾𝗇𝖼: {0,1}^n → {0,1}^{m(n)}.

  • A randomized decoding algorithm 𝖽𝖾𝖼: {0,1}^{m(n)} → {0,1}^n that uses s(n) space (s(n) is much smaller than n; for instance, s(n) = polylog(n)). For all x, and for any z ∈ {0,1}^{m(n)} within Hamming distance ρ·m(n) of 𝖾𝗇𝖼(x), it should hold that 𝖽𝖾𝖼(z) outputs x with high probability.
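As a schematic illustration of this interface (the names below are ours, not from the paper), note that the identity encoding 𝖾𝗇𝖼(x) = x already admits a trivial O(1)-space streaming decoder that writes each bit as it arrives; it simply offers no error resilience at all:

```python
from typing import Callable, Iterator

def identity_dec(stream: Iterator[int],          # the received codeword, bit by bit
                 write: Callable[[int], None],   # one-way write-only output tape
                 n: int) -> None:
    # A stream decoder must emit x_1, ..., x_n in order; here enc is the identity,
    # so the decoder keeps O(1) state but tolerates zero corrupted bits.
    for _, bit in zip(range(n), stream):
        write(bit)
```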

It is not clear that such codes should exist at all for any s(n) = o(n), even for small error rates ρ and an arbitrary communication blow-up m(n). In particular, decoding a standard error-correcting code may require storing essentially the entire received word, and so requires s(n) = Ω(n).

Our first result constructs stream-decodable codes that achieve the following parameters (see Theorem 1 for a precise statement).

  • Error resilience of ρ = 1/4 − ε for any ε > 0, matching the best possible error resilience of standard error-correcting codes.

  • Near-quadratic blow-up in communication: m(n) = n^{2+r(s(n))}/s(n) (here r(s(n)) is small – typically o(1); when s(n) = (log n)^t, then r(s(n)) = O(1/t)). This is a larger blow-up in communication than incurred by standard error-correcting codes.

The construction itself is quite simple: it encodes the message by a locally decodable code of near-linear length, and repeats the encoding O(n) times. The more interesting part is the corresponding decoding algorithm for Bob. To this end, we work with stronger local decoding guarantees: specifically, unique decoding with access to soft (confidence) information, and local list decoding with advice from an error fraction close to 1/2.

A matching lower bound.

Our next result demonstrates, surprisingly, that the communication blowup of our codes is essentially optimal: any stream-decodable code requires transmission length m(n) = Ω(n²/s(n)), in contrast to the standard error-correction model. (See Theorem 2 for the precise statement.) This result is surprising and notable because it obtains a strong lower bound on an information-theoretic quantity (the codeword blow-up) leveraging the computational restriction of space-boundedness. The lower bound is established by carefully controlling the set of message indices that the decoder can output when processing successive blocks of sub-linear size of the stream. It is also interesting to contrast this with the earlier mentioned setting of interactive communication, which despite the challenge of encoding the global conversation that is only available in local fragments, admits non-trivial schemes with a constant factor communication blow-up, from Schulman’s seminal work [26] and many follow-ups (surveyed in [10]).

Comparison to [12].

Our work can be viewed as a strengthening of and resolution to most of the open problems in [12]. The problem is framed slightly differently in their work: in [12], instead of outputting x_1 ⋯ x_n, the decoder is only required to output a single bit f(x_1 ⋯ x_n). Here, the function f represents the output of an arbitrary streaming algorithm performed on x_1 ⋯ x_n, so it must be possible to compute in s(n) space in a streaming manner. The function f is unknown to Alice (or else she could simply send the value f(x_1 ⋯ x_n) to Bob) but known to the adversary causing the errors. One could imagine, for example, that f is an arbitrary linear function of x_1 ⋯ x_n, or one’s physical location after executing some (possibly non-commutative) sequence of movements x_1 ⋯ x_n.

For this problem, [12] provide a scheme with encoding length slightly larger than n^4. Specifically, if s(n) = (log n)^t, their code has length n^{4+O(1/t)}. Our notion of a stream-decodable code is necessarily stronger: if the decoder can write x_1 ⋯ x_n to an output tape in that order, it can also compute the output of any streaming algorithm f in low space. In this sense, we provide a generic “front-end” error-correction scheme that enables executing any streaming task in a noise-resilient manner. Further, our construction improves upon the encoding length of the one in [12], providing a nearly quadratic-length code for their problem.

Gupta and Zhang [12] also specifically investigate the scenario where Alice knows beforehand that Bob’s function f is a linear function of x_1, …, x_n. For this restricted class of functions, they demonstrate a scheme with encoding length slightly larger than n².

Near-linear length code for stream computation of linear functions.

Our final result, stated precisely as Theorem 3, is a new scheme for stream-decoding linear functions. Specifically, we improve upon Gupta and Zhang’s result, demonstrating a scheme that requires only near-linear communication in n for computing linear functions. This is achieved using a tensor product of locally decodable codes for the encoding, and a careful recursive decoding approach to recover the desired linear function of the message.

1.1 The model definition

Before we provide the technical statements of our three main results, let us formally define the model of stream decodable codes. A (ρ,m(n),s(n))-stream coding scheme with probability p of success consists of the following:

  • An explicit family of encoding algorithms 𝖾𝗇𝖼 = {𝖾𝗇𝖼_n: {0,1}^n → {0,1}^{m(n)}} with encoding time poly(n).

  • An explicit family of randomized decoding algorithms 𝖽𝖾𝖼 = {𝖽𝖾𝖼_n: {0,1}^{m(n)} → {0,1}^n} that read the input in a stream and are permitted s(n) memory and poly(n) time. The output is written to a one-way (left-to-right) write-only tape. Whenever the Hamming distance Δ(z, 𝖾𝗇𝖼(x)) < ρm, 𝖽𝖾𝖼(z) correctly outputs x with probability at least p.

More generally, a (ρ, m(n), s(n))-stream coding scheme for a family of functions ℱ = {f: {0,1}^n → {0,1}} consists of the following similar components, with the same time and space guarantees as above:

  • An explicit family of encoding algorithms 𝖾𝗇𝖼^(ℱ) = {𝖾𝗇𝖼_n^(ℱ): {0,1}^n → {0,1}^{m(n)}}.

  • For each f ∈ ℱ, an explicit family of randomized decoding algorithms 𝖽𝖾𝖼^(f) = {𝖽𝖾𝖼_n^(f): {0,1}^{m(n)} → {0,1}} that read the input in a stream and write their output to a one-way write-only tape. Whenever the Hamming distance Δ(z, 𝖾𝗇𝖼^(ℱ)(x)) < ρm, the decoder 𝖽𝖾𝖼^(f)(z) outputs f(x) with probability at least p.

We emphasize that the encoding has no knowledge of f, only of the family ℱ, while for each f ∈ ℱ the corresponding decoding algorithm must succeed.

1.2 Our results

In this section, we formally state our results. When we use the phrase “absolute constant” to describe a parameter, we mean that any asymptotic notation henceforth may hide constants depending on that absolute constant.

The first result is a stream decodable error-correcting code incurring approximately quadratic blow-up in communication.

Theorem 1.

Fix any absolute constant ε>0. Then, for some large enough C=C(ε) and c=c(ε), the following hold.

  • If s(n) = (log n)^t for some absolute constant t > C, then there is a (1/4 − ε, n^{2+c/t}, s(n))-stream coding scheme.

  • For any function s(n) = (log n)^{ω(1)}, there is a (1/4 − ε, n^{2+o(1)}/s(n), s(n))-stream coding scheme. (Here, the implicit constants in the o(1) may depend on those in the ω(1).)

Both schemes succeed with probability 1 − 1/n^{ω(1)}.

The second result establishes that Theorem 1 is essentially optimal. That is, any encoding of a message that permits a low-space streaming algorithm to decode requires length Ω(n²/s(n)). We view this as one of the major surprises and contributions of this work.

Theorem 2.

Fix an absolute constant ρ > 0 and let the space for the decoding algorithm be s(n) ≥ log n. Suppose there is a (ρ, m, s(n))-coding scheme for streams that succeeds with probability at least 1 − 1/(2n²). Then, m = Ω(n²/s(n)).

The final result states that the encoding length can be made almost linear for stream coding schemes that compute a linear function f(x). Here, the decoder need only output a private linear function f of the input bits rather than the entire input. When s(n) = n^δ for some absolute constant δ > 0, the length can even be made exactly linear, which is optimal.

Theorem 3.

Fix any absolute constant ε>0. Then, for some large enough C=C(ε) and c=c(ε), the following hold.

  • If s(n) = (log n)^t for some absolute constant t > C, then there is a (1/4 − ε, n^{1+c/t}, s(n))-stream coding scheme for the family of linear functions.

  • If s(n) = Ω(n^δ) for some absolute constant δ > 0, then there is a (1/4 − ε, O(n), s(n))-coding scheme for the family of linear functions.

Both schemes succeed with probability 1 − 1/n^{ω(1)}.

1.3 Discussion and further directions

Tightening lower order terms

Both Theorem 1 and Theorem 3 construct codes that are not optimal in lower-order terms. It may be possible to construct stream coding schemes of length O(n²/s(n)) and stream coding schemes for linear functions of length O(n). This is an interesting direction for future work.

Specifically, in the case of linear functions, it may be possible to construct constant rate codes. Interestingly, we can pose this question for an even more restrictive class of functions than linear functions: the class of index functions. These are the functions f_i(x) = x_i for all i ∈ [n]. We do not even know whether constant rate stream coding schemes exist here, when s(n) is sufficiently small, say polylog(n).

One simple strategy is to encode with a locally decodable code requiring Q = s(n)^{O(1)} queries to recover an index. The decoder can then ignore all the indices except the Q it needs to recover the target index, and in poly(Q) ≤ s(n) space, recover any individual index of the message. Indeed, for our constructions in both Theorem 1 and Theorem 3, this procedure is an important primitive, as sketched below. Unfortunately, the best locally decodable codes for polylogarithmic locality have super-linear encoding length [32], and so are not constant rate.
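The following sketch illustrates this primitive (our illustration; the query positions are assumed to come from the locally decodable code’s non-adaptive local decoder, so they can be drawn before the stream arrives):

```python
def collect_queries(stream, query_positions):
    """One left-to-right pass that stores only the Q symbols the local decoder
    will need; everything else in the stream is ignored."""
    wanted = set(query_positions)          # |wanted| = Q, so roughly Q log N bits of state
    answers = {}
    for pos, symbol in enumerate(stream):
        if pos in wanted:
            answers[pos] = symbol
    return answers                         # then fed to the LDC's local decoding routine
```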

However, this is not necessarily the only way. Barring a constant rate construction of polylog(n)-query locally decodable codes, can we construct constant rate stream decoding schemes? We remark that if one removes the streaming requirement and only requires that the decoder be low space with arbitrary queries, constant rate codes are known [30, 13].

Improvements to the lower bound

We will discuss a few potential strengthenings to Theorem 2.

The work of [12] initially proposes the model of stream coding schemes where the decoder need only output f(x_1 ⋯ x_n), for an arbitrary choice of Boolean function f that can be computed by receiving x_1 ⋯ x_n in a stream in s(n) space. The simplest way to accomplish this task is to recover x_1 ⋯ x_n in order and perform the streaming computation of f as each bit is discovered. Our lower bound shows that this method requires encoding length Ω(n²/s(n)), but there could be a different way. Nonetheless, we conjecture that there is a Boolean function computable by an s(n)-space streaming algorithm for which the lower bound of Ω(n²/s(n)) encoding length holds. Candidates for such a function might be some form of group product over non-abelian groups.

Secondly, our lower bound in Theorem 2 only rules out stream coding schemes where the decoder outputs x with probability 1 − 1/poly(n). However, random guessing only outputs x correctly with probability 2^{−n}. We conjecture that a stream coding scheme requires encoding length Ω(n²/s(n)) even if we only require the success probability to be 1/poly(n).

1.4 Related Work

Aside from the connection to [12], we discuss the relation of our work to different models of error-correcting codes and to streaming algorithms.

Efficiency of error-correction algorithms.

Our work explores the model of stream decodable error-correcting codes where the decoder must be low-space and read the codeword bits in a single pass. Without the latter restriction, it is known how to construct asymptotically good codes for which a logspace decoder can correct a constant fraction of errors, and in fact one can also have a logspace encoder [30, 13]. Note that the decoder is not restricted to a single pass on the codeword. Since one typically receives a communicated codeword as a stream, a natural question is whether such results extend to low-space decoding in a stream. This is our focus, and we show that for error-correction in this setting, one cannot have codes of constant rate, and in fact a near-quadratic blow-up in encoding length becomes necessary.

Codes against streaming channels.

Streaming in the context of error-correction has previously been considered for channels which are low-space (say logarithmic in the code length) and cause errors in an online fashion as they read the input codeword in a single pass. This was part of a more general line of work on coding against computationally bounded channels whose systematic study was initiated in [15]. List-decodable codes with optimal rate against such channels were constructed in [27] for all error fractions p ∈ [0, 1/2), and their decoding time improved in [20]. More recently and surprisingly, even unique-decodable codes with optimal rate (for error fractions p < 1/4 caused by a streaming channel) were constructed in [28].

There is also beautiful work on causal channels, which must read the codeword in one pass, but there is no space restriction [5]. In contrast, our work is in the model where the receiver, as opposed to the channel, is the computationally restricted party.

Additionally, the authors of [8] consider a related version of the problem we consider, where the encoder also receives the message as a stream rather than all at once, but the encoder and decoder are permitted shared randomness. In this setting, they show that it is possible to achieve a constant-rate encoding with any constant fraction of errors less than 1 (over large alphabets).

Locally decodable codes.

One type of error-correcting code closely related to our results is the locally decodable code. Locally decodable codes [33] can be viewed as a low-time and low-space version of error-correcting codes, where the goal is to learn a single bit of the original message. In contrast, for us, the decoder must recover the entire message, with the relaxation that it accesses the entire encoding via a stream rather than via query access. Locally decodable codes have been constructed in a variety of different parameter regimes, including constant query [7, 6] and rates approaching 1 [19]. In our work, we will use Reed-Muller codes [24, 25] that achieve polylog(n) query complexity and slightly super-linear block length.

As discussed in Section 1.3, our work also connects to locally decodable codes as a relaxation of the query model. Specifically, q-query locally decodable codes are poly(q) space stream coding schemes for the family of linear functions (as long as the locally decodable code permits poly(q) decoding space). Thus, our model can be viewed as a simpler setting than local decoding in which to construct high rate codes. In particular, the existence of constant rate stream decodable codes for index functions may be easier to resolve than constant rate polylogarithmic-query locally decodable codes.

Streaming Algorithms.

The algorithms in this work are streaming algorithms for processing noisy encoded communication.

Streaming algorithms are a prolific field of research with algorithms for a multitude of problems, including approximate counting [23], heavy hitters [3], ℓ_p-norm approximation [1, 22, 18], and identifying a nonzero entry in a vector (for turnstile algorithms) [22]. Many works, for example [9, 4, 21, 2], also consider the problem of processing noisy data using streaming algorithms. [9] shows a memory lower bound for learning a parity function with noisy samples of random linear equations.

However, the typical streaming setting is quite different from ours. The algorithms mentioned above are used to process adversarial or “random” data. Our algorithms, on the other hand, process carefully formatted streams in the presence of communication noise, rather than sample or data noise: they are streaming algorithms for processing formatted communication rather than for processing data.

2 Preliminaries

Notation

  • The function log is in base 2 unless otherwise specified.

  • The set [n] denotes the integers 1, …, n.

  • Given a tuple T and element i, the expression T|i denotes i concatenated to T.

  • The phrase “with high probability in n” means with probability at least 1 − 1/n^{ω(1)}.

  • We use Δ(x,y) to denote the Hamming distance between two strings x, y ∈ (Σ ∪ {⊥})^n, and δ(x,y) to denote the relative distance between them (i.e., δ(x,y) = Δ(x,y)/n). Any element of Σ is considered to be at distance 1/2 from ⊥.

  • For clarity, we will often omit floor and ceiling signs where they would technically be necessary.

Lemma 4 (Tail bound for k-wise independent random variables).

Let k > 4 be an even integer. Suppose X_1, X_2, …, X_n are k-wise independent random variables taking values in [0,1]. Let Z = Σ_i X_i and μ = 𝔼[Z]. Then

Pr[|Z − μ| ≥ A] ≤ 8·((kμ + k²)/A²)^{k/2}.
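As an illustrative instantiation (ours, not from the paper), taking A = μ/2 under the additional assumption μ ≥ 4k gives

```latex
\Pr\bigl[\,|Z-\mu|\ge \mu/2\,\bigr]
  \;\le\; 8\left(\frac{k\mu+k^2}{(\mu/2)^2}\right)^{k/2}
  \;\le\; 8\left(\frac{8k}{\mu}\right)^{k/2},
```

which is n^{−ω(1)} when, say, k = log n and μ ≥ (log n)³.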

2.1 Error-correcting codes

We begin with some results about error-correcting codes. We first state a theorem detailing the existence of codes of distance 1 − 1/|𝕂| − ε that are efficiently encodable and efficiently decodable. It is standard, and based on GMD decoding of concatenated codes with an outer code approaching the Singleton bound (like Reed-Solomon or algebraic-geometric codes), and a small inner code of relative distance close to 1 − 1/|𝕂| (see for instance [14, Chap. 14]).

Theorem 5.

For every ε > 0 and every finite field 𝕂, there exists an explicit systematic linear error-correcting code 𝖤𝖢𝖢_ε = {𝖤𝖢𝖢_{ε,n}: 𝕂^n → 𝕂^m}_n with relative distance at least 1 − 1/|𝕂| − ε and m ≤ n/ε^{O(1)}, and an O_ε(n²)-time and -space decoding algorithm 𝖣𝖤𝖢_ε: 𝕂^m → 𝕂^n, such that for any x ∈ 𝕂^n and w ∈ 𝕂^m satisfying δ(𝖤𝖢𝖢_ε(x), w) < (1 − 1/|𝕂|)(1 − ε)/2, it holds that x = 𝖣𝖤𝖢_ε(w).

Our constructions will use Reed-Muller codes (based on evaluations of multivariate polynomials) concatenated with the codes from Theorem 5. In order to locally decode these codes, we will correct them along lines for which we would need to run list decoding algorithms for concatenated codes with outer Reed-Solomon codes. The following list decoding result for such codes is standard, and based on list-decoding the inner codes by brute force and list-recovering the outer Reed-Solomon codes; see for example [16]. (Better parameters are possible, but having poly(ε) rate and poly(1/ε) output list size suffices for our purposes.)

Theorem 6.

Let ε > 0. Let 𝖢 be a concatenated code with an outer Reed-Solomon code over 𝔽_q of rate (ε/4)^4 and an inner code of relative distance at least 1/2 − ε²/16. Then 𝖢 can be list-decoded in poly(q) time from a fraction (1 − ε)/2 of errors with an output list size of 64/ε³.

3 Locally decodable/correctable codes

In this section, we introduce the locally decodable/correctable codes that will form the backbone of the constructions in our paper. There are two theorems we require, one for Section 4 and one for Section 6. Each of our codes will require a feature besides just correctness of local decoding/correcting when the distance to a codeword is small. Both proofs are omitted for this conference version.

Our first code, for binary alphabets, has two additional requirements. It requires that the decoder output a probability for each of the outputs 0 and 1, rather than only one of them (smoothness). This requirement is similar to list decoding with confidences from [12], and we adapt their proof. Secondly, it has a local decoding with advice guarantee. To establish this notion, we use ideas similar to [31] and the locally list-decodable codes of [11].

Our second code requires only a smoothness guarantee for local decoding. However, it is in the regime with large alphabet and non-constant ε, and the code is required to be linear.

Theorem 7 (Binary locally decodable code).

Fix an arbitrary ε > 0. Let Q = Q(n) ∈ [(log n)^{100}, n]. There is a code 𝖫𝖣𝖢: {0,1}^n → {0,1}^N that satisfies the following properties:

  • Length: The length N = N(n) satisfies N ≤ n·(log_Q n)^{100 log_Q n}.

  • Distance: For any x ≠ y ∈ {0,1}^n, it holds that δ(𝖫𝖣𝖢(x), 𝖫𝖣𝖢(y)) ≥ 1/2 − ε.

  • Smooth local decoding/correcting: There exists a randomized algorithm 𝒜 that on input i ∈ [n] (resp. input i ∈ [N]) non-adaptively queries Q bits of the encoding, runs in O(Q³) time and space, and achieves the following guarantee. For any word w ∈ {0,1}^N, with high probability in n, the algorithm outputs probabilities p(b) for b ∈ {0,1} that satisfy p(0) + p(1) = 1 and p(x_i) > 1 − 2δ(w, 𝖫𝖣𝖢(x)) − ε (resp. p(𝖫𝖣𝖢(x)_i) > 1 − 2δ(w, 𝖫𝖣𝖢(x)) − ε).

  • Local decoding with advice: There exists a randomized algorithm that on input i ∈ [n] queries Q bits of the encoding (non-adaptively) and runs in poly(Q) time and space, and a distribution 𝒟 on [N]^u (independent of i) for some u = O(Q) (that is, on tuples of u indices), which does the following. For any word w ∈ {0,1}^N satisfying δ(w, 𝖫𝖣𝖢(x)) < 1/2 − ε, the algorithm outputs x_i with high probability in n when additionally given 𝖫𝖣𝖢(x)_{d_1}, …, 𝖫𝖣𝖢(x)_{d_u} for (d_1, …, d_u) ∼ 𝒟.

The proof can be found in the full version of the paper on arXiv.

We next turn to the proof of the large alphabet version of Theorem 7. Here, we will only need the smooth local decoding guarantee.

Theorem 8 (Large alphabet locally decodable code).

Let ε > 0, let 𝕂 be a field of the form 𝔽_{2^k} where 2^k > ε^{−10}, and let Q = Q(n) ∈ [(ε^{−1} log n)^{100}, n]. There is a linear code 𝖫𝖣𝖢: 𝕂^n → 𝕂^N that satisfies the following properties:

  • Length: The length N = N(n) satisfies N ≤ n·(ε^{−1} log_Q n)^{100 log_Q n}.

  • Distance: For any x ≠ y ∈ 𝕂^n, it holds that δ(𝖫𝖣𝖢(x), 𝖫𝖣𝖢(y)) ≥ 1 − ε/6.

  • Large alphabet smooth local decoding/correcting: There exists a randomized algorithm 𝒜 that on input i ∈ [n] (resp. input i ∈ [N]) queries Q symbols of the encoding (non-adaptively) and runs in O(Q³) time and space and does the following. For any word w ∈ (𝕂 ∪ {⊥})^N, it outputs a list of probabilities p(σ) for σ ∈ 𝕂 ∪ {⊥}, satisfying that Σ_{σ ∈ 𝕂 ∪ {⊥}} p(σ) = 1, at most one σ ∈ 𝕂 has p(σ) > 0, and p(x_i) + 0.5·p(⊥) > 1 − δ(w, 𝖫𝖣𝖢(x)) − ε (resp. p(𝖫𝖣𝖢(x)_i) + 0.5·p(⊥) > 1 − δ(w, 𝖫𝖣𝖢(x)) − ε) with high probability in n. Here, the Hamming distance between ⊥ and any σ ∈ 𝕂 is 0.5. Moreover, the decoding algorithm queries any specific index with probability at most 1.1Q/N.

The proof can be found in the full version of the paper on arXiv.

4 Stream decodable codes of near quadratic length

In this section we prove Theorem 1, which is a construction of a stream-decodable code that achieves nearly quadratic length in polylogarithmic space. We restate it here:

Theorem 1. [Restated, see original statement.]

Fix any absolute constant ε>0. Then, for some large enough C=C(ε) and c=c(ε), the following hold.

  • If s(n) = (log n)^t for some absolute constant t > C, then there is a (1/4 − ε, n^{2+c/t}, s(n))-stream coding scheme.

  • For any function s(n) = (log n)^{ω(1)}, there is a (1/4 − ε, n^{2+o(1)}/s(n), s(n))-stream coding scheme. (Here, the implicit constants in the o(1) may depend on those in the ω(1).)

Both schemes succeed with probability 1 − 1/n^{ω(1)}.

For convenience, we will scale ε by a factor of 10, so that the adversary introduces at most (1/4 − 10ε)m errors into the stream (rather than (1/4 − ε)m). Also, we will present an algorithm whose space is O(s(n)) rather than just s(n), since we can simply scale s(n) to account for this. We will assume throughout this section that n is sufficiently large.

The encoding algorithm 𝖾𝗇𝖼 will be very simple. First, we let Q = min(s(n)^{0.1}, 2^{√(log n)}). Then, we take a code 𝖫𝖣𝖢 as in Theorem 7 with parameters n, ε, Q, and let y = 𝖫𝖣𝖢(x) have length N = n·(log_Q n)^{O(log_Q n)}. Then, we simply define 𝖾𝗇𝖼 as follows:

Definition 9.

Define 𝖾𝗇𝖼(x) = 𝖫𝖣𝖢(x)^k = y^k, where y^k denotes the string y repeated k times, and k = Q²n/s.

Note that we will then have m = |𝖾𝗇𝖼(x)| = kN, so the adversary will be permitted at most (1/4 − 10ε)kN errors.
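A minimal sketch of this encoder (our illustration; ldc_encode stands in for the code of Theorem 7 with parameters n, ε, Q):

```python
def enc(x, ldc_encode, Q, s):
    """Definition 9: repeat the LDC encoding of x enough times."""
    y = ldc_encode(x)              # y = LDC(x), length N = n * (log_Q n)^{O(log_Q n)}
    k = (Q * Q * len(x)) // s      # number of repetitions, k = Q^2 n / s
    return y * k                   # the transmitted stream is y repeated k times
```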

Algorithm 1 Decoding algorithm 𝖽𝖾𝖼 for Theorem 1.

We now describe how the decoding algorithm 𝖽𝖾𝖼 will work, as given formally in Algorithm 1. Let u be chosen as in the “local decoding with advice” property in Theorem 7, and let v = (log n)². Pick indices j_1, …, j_{u+v} ∈ [N], where j_1, …, j_u are chosen according to the distribution 𝒟 in Theorem 7 and j_{u+1}, …, j_{u+v} are chosen uniformly at random.

Informally, these bits will be used to create a “checksum” for y = 𝖫𝖣𝖢(x). More explicitly, we will use the first part of the stream to determine what y_{j_1}, …, y_{j_{u+v}} are, with high probability. Then, for the remaining (corrupted) copies of y, we will first check whether they have at most a 1/4 − ε fraction of errors by comparing their bits at j_{u+1}, …, j_{u+v}. Then, if they do, we will use the local decoding with advice property in Theorem 7 to recover bits of x.

More specifically, for each index j_t, we will keep track of quantities P_0^t, P_1^t, which are the total confidence that y_{j_t} is 0 or 1, respectively. When receiving the stream, we will perform smooth local decoding (as described in Theorem 7) on each (corrupted) copy of y, to obtain confidences p_0, p_1 for each index j_t (in parallel for each t). For each t, we then increment P_0^t, P_1^t, respectively, by the obtained p_0, p_1. We show the following claim:
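The sketch below (our illustration) shows this confidence-accumulation step; the smooth local decoder is a stand-in for Theorem 7, and the decision rule shown is just one plausible way of using the threshold (1 − ε)k/2 that appears in Claim 10:

```python
def accumulate(copies, tracked_indices, smooth_decode, k, eps):
    # P[t][b] is the running total confidence that y_{j_t} = b.
    P = [[0.0, 0.0] for _ in tracked_indices]
    decided = [None] * len(tracked_indices)
    for copy_bits in copies:                        # one pass, copy by copy
        for t, j in enumerate(tracked_indices):
            p0, p1 = smooth_decode(copy_bits, j)    # p0 + p1 = 1
            P[t][0] += p0
            P[t][1] += p1
            for b in (0, 1):                        # declare y_{j_t} = b once its total
                if decided[t] is None and P[t][b] >= (1 - eps) * k / 2:
                    decided[t] = b                  # confidence crosses (1 - eps) k / 2
    return decided
```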

Claim 10.

With high probability, the following holds for all t and at every step of the algorithm: Let b = y_{j_t}. Suppose that, after reading ℓ corrupted copies of y, we have P_b^t ≤ (1 − ε)k/2. Then, the number of errors in the first ℓ copies of y is at least (1/2)(1 − ε)(ℓ − k/2)N.

The proof can be found in the full version of the paper on arXiv.

5 Stream decodable codes require quadratic length

We now prove our lower bound, Theorem 2 (restated below), demonstrating that the construction in Section 4 is essentially tight. That is, any error-correcting code that can be decoded with failure probability at most 1/(2n²) by a streaming algorithm permitted s(n) space must have encoding length at least Ω(n²/s(n)).

Theorem 2. [Restated, see original statement.]

Fix an absolute constant ρ > 0 and let the space for the decoding algorithm be s(n) ≥ log n. Suppose there is a (ρ, m, s(n))-coding scheme for streams that succeeds with probability at least 1 − 1/(2n²). Then, m = Ω(n²/s(n)).

Proof.

Suppose otherwise; that is, suppose that there is a (ρ, m, s)-coding scheme for streams, where m = ρn²/(10⁴s), ρ is a fixed constant, and s = s(n) ≥ log n (and n is sufficiently large). Also, we may assume that s < n/100 (otherwise the statement is obvious).

We will then demonstrate how to construct an adversarial input for this coding scheme, so that 𝖽𝖾𝖼 fails with probability at least 1/(2n²). First, note that we can assume that 𝖽𝖾𝖼 does not output anything when it receives a 0 bit (except at the end of the stream): instead, we may have 𝖽𝖾𝖼 keep track of the length of the current run of 0s (using only O(log n) ≤ s additional memory), and process all the 0s when it encounters the next 1. In particular, we will have the adversary replace several parts of the input with all 0s, and thus we can assume that the algorithm does not output anything at these parts (except perhaps the last block).
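Schematically, this run-of-zeros reduction looks as follows (a sketch with illustrative names; the two handlers stand for whatever processing 𝖽𝖾𝖼 performs):

```python
def process_stream(stream, process_zero_run_then_one, process_trailing_zeros):
    zeros = 0                      # only O(log n) extra bits of state
    for b in stream:
        if b == 0:
            zeros += 1             # defer all work on 0s
        else:
            process_zero_run_then_one(zeros)   # handle the run and the 1 together
            zeros = 0
    process_trailing_zeros(zeros)  # the only place output may occur on a run of 0s
```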

For an input string x ∈ {0,1}^n, the encoding 𝖾𝗇𝖼(x) has length m. We split 𝖾𝗇𝖼(x) up into k = n/(100s) contiguous “blocks” which each consist of ℓ = ρn/100 bits; denote these blocks B_1(x), …, B_k(x) (we will sometimes abbreviate B_i(x) by just B_i). We will consider what the algorithm 𝖽𝖾𝖼 may output during each block, assuming that the block B_i is uncorrupted. Essentially, we will show that there is a fixed set of roughly ℓ indices such that 𝖽𝖾𝖼 essentially only outputs indices from this set while it is processing block i.

To this end, we will let a_i ∈ {0,1}^s denote the contents of the memory of 𝖽𝖾𝖼 right before receiving block i. Note that a_i is random and may depend on the randomness of 𝖽𝖾𝖼, as well as on the bits that the adversary has changed in previous blocks. We will mostly restrict our attention to values of a_i that do not cause the algorithm to fail with significant probability. Specifically, we say that a_i is good with respect to x, or just good (for particular values of x and i), if the probability that 𝖽𝖾𝖼 outputs an incorrect bit during block i with starting memory a_i is at most 1/n². (This probability is taken over only the randomness of the algorithm 𝖽𝖾𝖼, since the contents of block B_i are a deterministic function of x.)

Now, suppose that the decoder 𝖽𝖾𝖼 currently has memory state a_i and is about to receive block i (whose contents are B_i). While it processes B_i, it will output various bits of x; that is, there are various pairs (j, b) for which 𝖽𝖾𝖼 will output that x_j = b. (We assume, as we may, that 𝖽𝖾𝖼 keeps track of which index it is on, so we can determine j from the memory contents of 𝖽𝖾𝖼.) When it does so, we say that 𝖽𝖾𝖼 outputs the pair (j, b). Then, let T(a_i, B_i) be the set of all (j, b) which 𝖽𝖾𝖼 outputs with probability at least 1/n² when it receives B_i with initial memory contents a_i. (Again, this probability is only over the randomness of 𝖽𝖾𝖼.) Note that if a_i is good (with respect to x), then T(a_i, B_i) must match x (that is, x_j = b for all (j, b) ∈ T(a_i, B_i)). Note that T(a_i, B_i) is a deterministic function of a_i, B_i.

We are now ready to prove the following lemma.

Lemma 11.

There exists x ∈ {0,1}^n such that the following holds for all i: There is a set S_i of size at most 3ℓ such that for all good a_i, we have |T(a_i, B_i(x)) \ S_i| < 3s.

Proof.

Let a_i^{(1)}, …, a_i^{(r)} each be good a_i (with respect to a particular choice of x and i), where r = ℓ/s. Consider the following union:

𝒯 = ⋃_{1 ≤ j ≤ r} T(a_i^{(j)}, B_i).

Essentially, if we can show, for a particular x, that this union is always small, we will then be able to show that T(ai,Bi) cannot take too large a range of values as (good) ai varies, because the union of any r such instances will have small size. To this end, we first show the following claim.

Claim 12.

There exists x such that, for every i, the union 𝒯 always has size at most 3ℓ (no matter the choice of the r good a_i^{(j)}’s).

Proof.

First observe that since each T(a_i^{(j)}, B_i) must match x, this means that 𝒯 must also match x (recall that this means that for every (j, b) ∈ 𝒯, we have x_j = b). However, 𝒯 is a deterministic function of (B_i, a_i^{(1)}, …, a_i^{(r)}), which consists of ℓ + rs = 2ℓ bits. Therefore, there are only at most 2^{2ℓ} possible values that 𝒯 can take for any particular i. Thus, in total (over all i) there are at most n·2^{2ℓ} < 2^{3ℓ} possible values for 𝒯.

However, each possible value of 𝒯 that has size at least 3ℓ can only match a 2^{−3ℓ} proportion of the strings x ∈ {0,1}^n. Therefore, there exists some x which does not match any possible 𝒯 of size at least 3ℓ, thus proving the claim.

Now fix x such that Claim 12 holds, and let i be arbitrary. We will now finish the proof of Lemma 11 by constructing S_i. Let 𝒮 be the family that consists of T(a_i, B_i) for all good a_i. Claim 12 means that the union of any r sets in 𝒮 has size at most 3ℓ. We wish to find S_i which contains all but at most 3s elements of each T ∈ 𝒮.

Now, construct S_i in steps as follows: at each step, find some T ∈ 𝒮 which has more than 3s elements that are not in S_i, and add all its elements to S_i. This process terminates when there is no such T remaining. Obviously the resulting set satisfies |T \ S_i| ≤ 3s for all T ∈ 𝒮, so it remains only to check that |S_i| < 3ℓ. Indeed, suppose that |S_i| ≥ 3ℓ; consider the first step in which its size reached or exceeded 3ℓ. Note that at each step the size of S_i increases by more than 3s, so in total the number of steps for |S_i| to reach 3ℓ is at most 3ℓ/(3s) = r. But then S_i is the union of at most r sets in 𝒮 and has size at least 3ℓ, contradicting Claim 12. Thus, S_i has the desired properties, completing the proof of Lemma 11.
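For concreteness, the greedy construction of S_i above can be transcribed directly (a sketch; the family 𝒮 is passed as a collection of sets of indices):

```python
def build_S(family, s):
    """Greedily absorb any T in the family with more than 3s elements outside S."""
    S = set()
    changed = True
    while changed:
        changed = False
        for T in family:
            if len(T - S) > 3 * s:
                S |= T             # add all of T's elements to S
                changed = True
    return S                       # now every T satisfies |T \ S| <= 3s
```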

With this lemma proven, we return to the proof of Theorem 2. We will now demonstrate a strategy for the adversary such that, with probability at least 1/(2n²), the decoding algorithm 𝖽𝖾𝖼 fails to output x. Fix the input x and sets S_i such that Lemma 11 is satisfied.

Now, the adversary picks a uniformly random index j ∈ {1, …, n/2}. Then, for each i such that S_i contains at least 5s(n) indices in [j + 10(i−1)s, j + 10is), the adversary replaces the whole block B_i with zeros (unless it is the last block). We will first show that 𝖽𝖾𝖼 must fail on this input with probability at least 1/(2n²). Let us suppose otherwise.

As before, let a_i be the (random) memory state of 𝖽𝖾𝖼 before processing B_i. Note that with probability at least 1/2, all the a_i are good (since in the cases where they are not, 𝖽𝖾𝖼 fails with probability at least 1/n²). If they are all good, then by the definition of T and a union bound (over the block number i and the output indices), with probability at least 0.99, at every block B_i, the indices output during block i are all in T(a_i, B_i). In this case, observe that during block i, 𝖽𝖾𝖼 may never output the index j + 10is (or any greater index). Indeed, if this were not the case, during some block 𝖽𝖾𝖼 would have to output everything in [j + 10(i−1)s, j + 10is), but then we would have |T(a_i, B_i) \ S_i| > 5s > 3s, contradicting Lemma 11.

Thus, right before the last block, the algorithm cannot have output any index past j + 10ks = j + n/10. Then, in the last block, the algorithm outputs at most |S_i| + 3s ≤ 3ℓ + 3s indices, by Lemma 11. Therefore, overall, with probability at least 0.49, the algorithm outputs at most j + n/10 + 3ℓ + 3s < n indices, and thus does not output all of x.

Therefore we have shown that, under this strategy for the adversary, the algorithm must fail on x with probability at least 1/(2n²). It remains only to show that the adversary deletes (i.e., replaces with 0’s) at most a ρ fraction of blocks. Indeed, it is enough to show that at most a ρ fraction of blocks are deleted in expectation, since the adversary can pick the j for which the fewest blocks are deleted. Fix a block B_i. The probability that B_i gets deleted is equal to the probability that S_i has at least 5s indices in the interval [j + 10(i−1)s, j + 10is). For any fixed index j′ ∈ S_i, there are at most 10s choices of j such that j′ lands in this interval, so the probability that it does is at most 20s/n (since j is chosen uniformly at random from n/2 choices). Thus the expected number of indices of S_i in the interval is at most (20s/n)·|S_i| ≤ 60sℓ/n. By Markov’s inequality, the probability that this number is at least 5s is at most 12ℓ/n < ρ. Therefore, the probability that B_i is replaced with 0’s is at most ρ for each block B_i. The expected number of blocks replaced by 0’s is therefore at most ρk.

Putting everything together, the adversary has a strategy which deletes at most a ρ fraction of blocks (and thus corrupts at most a ρ fraction of the bits) and causes 𝖽𝖾𝖼 to fail with probability at least 1/(2n²). This completes the proof of Theorem 2.

6 Stream decodable codes for linear functions of near linear length

Our final result is a noise-resilient encoding of essentially linear length that admits efficient stream decoding of arbitrary linear functions. The family of linear functions consists of the functions f: {0,1}^n → {0,1} for which there exists y ∈ {0,1}^n such that f(x) = ⟨x, y⟩ mod 2.

Theorem 3. [Restated, see original statement.]

Fix any absolute constant ε>0. Then, for some large enough C=C(ε) and c=c(ε), the following hold.

  • If s(n) = (log n)^t for some absolute constant t > C, then there is a (1/4 − ε, n^{1+c/t}, s(n))-stream coding scheme for the family of linear functions.

  • If s(n) = Ω(n^δ) for some absolute constant δ > 0, then there is a (1/4 − ε, O(n), s(n))-coding scheme for the family of linear functions.

Both schemes succeed with probability 1 − 1/n^{ω(1)}.

Parameters and notation

Throughout this section, fix ε (we will hide dependence on ε in big-O notation), fix δ if it exists, and fix the space function s. Let n represent the length of Alice’s message. We assume that s(n) > (log n)^{1000}. Set the following parameters:

r = s(n)^{0.2},  d = (log n)/(log r),  and  ε′ = ε/(10d).

Let 𝕂 be 𝔽_{2^k} where ε′^{−10} < 2^k ≤ 2ε′^{−10}, so that the condition of Theorem 8 is satisfied for 𝕂 and ε′.

Let 𝖫𝖣𝖢: 𝕂^r → 𝕂^R be a linear locally decodable code satisfying the guarantees of Theorem 8 for ε′. The locality must satisfy Q > (ε′^{−1} log r)^{100}. This is satisfied whenever Q > (d log r)^{1000}, because

(ε′^{−1} log r)^{100} = (10d·ε^{−1}·log r)^{100} ≤ (d(log r)²)^{100} ≤ (d log r)^{1000}

since log r is sufficiently large compared to ε^{−1}. We will set Q subject to this constraint later. Also note that Q > ε′^{−1} log n = ε′^{−1}·d·log r, a fact we will use later; in addition, Q < r must be satisfied.

This gives us a value of R ≤ r·(ε′^{−1} log_Q r)^{100 log_Q r}, which is at most r·(d log_Q r)^{150 log_Q r}. We will actually set R later, subject to this constraint. We can make R larger as needed by a variety of methods, such as padding the input or duplicating each symbol of the code.

We assume for simplicity that r and d are integers, so that n = r^d. It will be useful to index Alice’s (the sender’s) input x ∈ {0,1}^n by a tuple in [r]^d rather than by an integer in [n]. Whenever we say an event occurs with high probability, we mean with high probability in n unless specified otherwise. We refer the reader to Section 2 to review the notation used in this section.

6.1 Statement of encoding scheme

The encoding 𝖾𝗇𝖼(x) that Alice (the encoder) sends in the stream is a tensor code. To this end, we begin by defining a tensor power of a linear code 𝖢. We remark that tensor products of distinct linear codes can also be defined, but we will only need to take a tensor power of one code.

Definition 13 (𝖢^{⊗k}).

Let 𝖢: 𝕂^m → 𝕂^M be a linear error-correcting code on strings of length m over some alphabet 𝕂. Since the code is linear, 𝖢 is given by an M × m matrix. The k-th tensor power of this encoding matrix defines the encoding function 𝖢^{⊗k}: 𝕂^{[m]^k} → 𝕂^{[M]^k}. We note that for any code 𝖢, it holds that 𝖢^{⊗0}: 𝕂 → 𝕂 is the identity function.
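Concretely, the tensor power encoding can be computed by applying the generator matrix along each of the k axes of the message array in turn. The sketch below is ours and works over GF(2) as a stand-in for the field 𝕂 of this section:

```python
import numpy as np

def tensor_power_encode(G: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Encode a k-dimensional message array x of shape (m,)*k with the k-th
    tensor power of the linear code whose M x m generator matrix is G."""
    y = x % 2
    for axis in range(y.ndim):
        y = np.moveaxis(y, axis, -1)   # bring the current axis to the end
        y = (y @ G.T) % 2              # apply the code along that axis
        y = np.moveaxis(y, -1, axis)   # put the (now length-M) axis back
    return y                           # shape (M,)*k

G = np.array([[1, 0], [0, 1], [1, 1]])   # generator of the [3, 2] binary parity code
x = np.arange(4).reshape(2, 2) % 2       # a 2 x 2 message array
codeword = tensor_power_encode(G, x)     # shape (3, 3)
```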

Next, we will state the encoding 𝖾𝗇𝖼(x) that Alice (the encoder) sends in the stream.

Definition 14 (𝖢_inner).

Let 𝖢_inner: 𝕂 → {0,1}^{O(1)} be a linear code of relative distance (1 − ε)/4 guaranteed by Theorem 5. Its length is N_inner.

Definition 15 (𝖾𝗇𝖼(x)).

Alice’s encoding 𝖾𝗇𝖼(x) is defined as follows. Viewing x as an element of 𝕂^n (which we may, since 𝕂 is of the form 𝔽_{2^k}), she computes 𝖫𝖣𝖢^{⊗d}(x) (where the message and codeword symbols are both taken in lexicographic order) and concatenates this code with 𝖢_inner.

We note that Alice’s encoding has length R^d·N_inner = O(R^d), and her encoding takes time at most poly(n). We will later choose our parameters to satisfy the conditions of Theorem 3.

6.2 Statement of decoding scheme

Let Bob’s private vector be ℓ = (ℓ_{(1,…,1)}, …, ℓ_{(r,…,r)}) (here the ordering of the indices is lexicographic). The function Bob is trying to compute is ℓ·x = ℓ_{(1,…,1)}x_{(1,…,1)} + ⋯ + ℓ_{(r,…,r)}x_{(r,…,r)}. For a ≤ d, the vector ℓ_{(i_1,…,i_{d−a})} is defined to be the r^a-dimensional vector consisting of ℓ restricted to indices whose first d − a entries are (i_1, …, i_{d−a}). Throughout, we will view ℓ and x as elements of 𝕂^n rather than 𝔽_2^n, and note that it suffices to compute ℓ·x in 𝕂. The same notation applies to x or any other string canonically indexed by tuples.
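As a small illustration of this restriction notation (ours; 0-based indices, with ℓ viewed as a d-dimensional r × ⋯ × r array):

```python
import numpy as np

def restrict(l_array: np.ndarray, prefix: tuple) -> np.ndarray:
    """Return l_{(i_1,...,i_{d-a})}: fix the first d-a coordinates of the tuple index.
    l_array has shape (r,)*d and the result has shape (r,)*a, where a = d - len(prefix);
    flattening it in C order recovers the lexicographic ordering used in the text."""
    return l_array[prefix]
```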

Before we state our main algorithm, we efficiently construct lists 𝗊𝗅𝗂𝗌𝗍1,,𝗊𝗅𝗂𝗌𝗍r satisfying certain properties that Bob can find in s(n) space and poly(n) time. These lists will be the indices of LDC that we query for each i, and it will be important that they overlap as little as possible.

Lemma 16.

Given a locally decodable code 𝖫𝖣𝖢: 𝕂^m → 𝕂^M with m > log n and q queries, satisfying the requirements of Theorem 8 for ε, there is a (randomized) algorithm (permitted 1/m^{ω(1)} probability of failure) that generates lists 𝗊𝗅𝗂𝗌𝗍_1, …, 𝗊𝗅𝗂𝗌𝗍_m in time poly(mq) and space O(mq²) such that the following holds. The smooth local decoding algorithm for each index i queries only the indices in 𝗊𝗅𝗂𝗌𝗍_i, and for each z ∈ (𝕂 ∪ {⊥})^M, with high probability in m, it satisfies the smoothness guarantees of Theorem 8 for all i. Moreover, no index I ∈ [M] appears in more than 3mq/(2M) lists.

The proof can be found in the full version of the paper on arXiv.

We are now ready to state the main algorithm.

Algorithm 2 Bob’s decoding algorithm 𝗅𝗂𝗇𝖾𝖺𝗋_𝖽𝖾𝖼.

The proof of Theorem 3 can be found in the full version of the paper on arXiv.

References

  • [1] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20–29, 1996. doi:10.1145/237814.237823.
  • [2] Omri Ben-Eliezer, Rajesh Jayaram, David P Woodruff, and Eylon Yogev. A framework for adversarially robust streaming algorithms. Journal of the ACM (JACM), 69(2):1–33, 2022. doi:10.1145/3498334.
  • [3] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693–703. Springer, 2002. doi:10.1007/3-540-45465-9_59.
  • [4] Jiecao Chen and Qin Zhang. Bias-aware sketches. arXiv preprint arXiv:1610.07718, 2016. arXiv:1610.07718.
  • [5] Zitan Chen, Sidharth Jaggi, and Michael Langberg. A characterization of the capacity of online (causal) binary channels. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing, pages 287–296, 2015. doi:10.1145/2746539.2746591.
  • [6] Zeev Dvir, Parikshit Gopalan, and Sergey Yekhanin. Matching vector codes. SIAM Journal on Computing, 40(4):1154–1178, 2011. doi:10.1137/100804322.
  • [7] Klim Efremenko. 3-query locally decodable codes of subexponential length. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 39–44, 2009. doi:10.1145/1536414.1536422.
  • [8] Matthew Franklin, Ran Gelles, Rafail Ostrovsky, and Leonard J. Schulman. Optimal coding for streaming authentication and interactive communication. IEEE Transactions on Information Theory, 61(1):133–145, 2015. doi:10.1109/TIT.2014.2367094.
  • [9] Sumegha Garg, Pravesh K Kothari, Pengda Liu, and Ran Raz. Memory-sample lower bounds for learning parity with noise. arXiv preprint arXiv:2107.02320, 2021. arXiv:2107.02320.
  • [10] Ran Gelles. Coding for interactive communication: A survey. Found. Trends Theor. Comput. Sci., 13(1-2):1–157, 2017. doi:10.1561/0400000079.
  • [11] Shafi Goldwasser, Dan Gutfreund, Alexander Healy, Tali Kaufman, and Guy N Rothblum. Verifying and decoding in constant depth. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 440–449, 2007. doi:10.1145/1250790.1250855.
  • [12] Meghal Gupta and Rachel Yun Zhang. A noise resilient transformation for streaming algorithms. arXiv preprint arXiv:2307.07087, 2023. doi:10.48550/arXiv.2307.07087.
  • [13] Venkatesan Guruswami and Valentine Kabanets. Hardness amplification via space-efficient direct products. Computational Complexity, 17(4):475–500, 2008. doi:10.1007/S00037-008-0253-1.
  • [14] Venkatesan Guruswami, Atri Rudra, and Madhu Sudan. Essential coding theory. Draft available at http://cse.buffalo.edu/faculty/atri/courses/coding-theory/book, 2019.
  • [15] Venkatesan Guruswami and Adam Smith. Optimal rate code constructions for computationally simple channels. Journal of the ACM (JACM), 63(4):1–37, 2016. doi:10.1145/2936015.
  • [16] Venkatesan Guruswami and Madhu Sudan. List decoding algorithms for certain concatenated codes. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, STOC ’00, pages 181–190, New York, NY, USA, 2000. Association for Computing Machinery. doi:10.1145/335305.335327.
  • [17] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950. doi:10.1002/j.1538-7305.1950.tb00463.x.
  • [18] Piotr Indyk and David Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202–208, 2005. doi:10.1145/1060590.1060621.
  • [19] Swastik Kopparty, Shubhangi Saraf, and Sergey Yekhanin. High-rate codes with sublinear-time decoding. J. ACM, 61(5), September 2014. doi:10.1145/2629416.
  • [20] Swastik Kopparty, Ronen Shaltiel, and Jad Silbak. Quasilinear time list-decodable codes for space bounded channels. In 60th Annual Symposium on Foundations of Computer Science (FOCS), 2019.
  • [21] Chaoyi Ma, Haibo Wang, Olufemi Odegbile, and Shigang Chen. Noise measurement and removal for data streaming algorithms with network applications. In 2021 IFIP Networking Conference (IFIP Networking), pages 1–9. IEEE, 2021. doi:10.23919/IFIPNETWORKING52078.2021.9472845.
  • [22] Morteza Monemizadeh and David P Woodruff. 1-pass relative-error lp-sampling with applications. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1143–1160. SIAM, 2010. doi:10.1137/1.9781611973075.92.
  • [23] Robert Morris. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840–842, 1978. doi:10.1145/359619.359627.
  • [24] David E Muller. Application of boolean algebra to switching circuit design and to error detection. Transactions of the IRE professional group on electronic computers, 3(3):6–12, 1954. doi:10.1109/IREPGELC.1954.6499441.
  • [25] Irving S Reed. A class of multiple-error-correcting codes and the decoding scheme. IEEE Transactions on Information Theory, 4(4):38–49, 1954. doi:10.1109/TIT.1954.1057465.
  • [26] Leonard J. Schulman. Coding for interactive communication. IEEE Trans. Inf. Theory, 42(6):1745–1756, 1996. doi:10.1109/18.556671.
  • [27] Ronen Shaltiel and Jad Silbak. Explicit list-decodable codes with optimal rate for computationally bounded channels. Computational Complexity, 30:1–70, 2021.
  • [28] Ronen Shaltiel and Jad Silbak. Explicit uniquely decodable codes for space bounded channels that achieve list-decoding capacity. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1516–1526, 2021. doi:10.1145/3406325.3451048.
  • [29] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x.
  • [30] Daniel A. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE Trans. Inf. Theory, 42(6):1723–1731, 1996. doi:10.1109/18.556668.
  • [31] Madhu Sudan, Luca Trevisan, and Salil Vadhan. Pseudorandom generators without the XOR lemma. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 537–546, 1999.
  • [32] Sergey Yekhanin. Locally decodable codes. Foundations and Trends in Theoretical Computer Science, 6(3):139–255, 2012. doi:10.1561/0400000030.
  • [33] Sergey Yekhanin. Locally decodable codes. Foundations and Trends in Theoretical Computer Science, 6(3):139–255, 2012. doi:10.1561/0400000030.