Abstract 1 Introduction 2 Graph codes: Basics 3 Existence of optimal graph codes 4 Explicit graph codes for high distance: Concatenated Codes 5 Explicit graph codes with very high distance: dual-BCH Codes References Appendix A Justensen-like code

Error-Correcting Graph Codes

Swastik Kopparty ORCID Department of Mathematics and Department of Computer Science, University of Toronto, Canada Aditya Potukuchi ORCID Department of Electrical Engineering and Computer Science, York University, Toronto, Canada Harry Sha111Optional footnote, e.g. to mark corresponding author ORCID Department of Mathematics and Department of Computer Science, University of Toronto, Canada
Abstract

In this paper, we construct Error-Correcting Graph Codes. An error-correcting graph code of distance δ is a family C of graphs, on a common vertex set of size n, such that if we start with any graph in C, we would have to modify the neighborhoods of at least δn vertices in order to obtain some other graph in C. This is a natural graph generalization of the standard Hamming distance error-correcting codes for binary strings.

Yohananov and Yaakobi were the first to construct codes in this metric. We extend their work by showing

  1. 1.

    Combinatorial results determining the optimal rate vs distance trade-off nonconstructively.

  2. 2.

    Graph code analogues of Reed-Solomon codes and code concatenation, leading to positive distance codes for all rates and positive rate codes for all distances.

  3. 3.

    Graph code analogues of dual-BCH codes, yielding large codes with distance δ=1o(1). This gives an explicit “graph code of Ramsey graphs”.

Several recent works, starting with the paper of Alon, Gujgiczer, Körner, Milojević, and Simonyi, have studied more general graph codes; where the symmetric difference between any two graphs in the code is required to have some desired property. Error-correcting graph codes are a particularly interesting instantiation of this concept.

Keywords and phrases:
Graph codes, explicit construction, concatenation codes, tensor codes
Funding:
Swastik Kopparty: Research supported by an NSERC Discovery Grant.
Aditya Potukuchi: Research supported by an NSERC Discovery Grant.
Copyright and License:
[Uncaptioned image] © Swastik Kopparty, Aditya Potukuchi, and Harry Sha; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Coding theory
Acknowledgements:
We are grateful to Mike Saks, Shubhangi Saraf and Pat Devlin for valuable discussions.
Editors:
Raghu Meka

1 Introduction

In this paper, we studi Error-Correcting Graph Codes. These are large families of undirected graphs on the same vertex set such that any two graphs in the family are “far apart” in a natural graph distance. Informally, the graph distance between two graphs on the same vertex set of size n measures the minimum number of vertices one needs to delete to make the resulting graphs identical (not just isomorphic). This can also be thought of as (1) the number of vertices whose neighborhoods one has to modify to go from one graph to another, (2) the vertex cover number of the symmetric difference of the two graphs, or (3) n minus the largest independent set in the symmetric difference of the two graphs.

Definition (Graph Distance).

Given two graphs G and H on vertex set [n], the graph distance dgraph(G,H) is the size of the smallest set S[n] such that G[S¯]=H[S¯].

This is a very natural metric and encompasses deep information about graphs. For example, note the following two simple facts (1) the graph distance of a graph from the empty graph is n minus the independence number of the graph. (2) the graph distance of a graph from the complete graph is n minus the clique number of the graph. Thus the answer to the question: “how far can a graph be from both the empty graph and the complete graph?” is precisely the question of finding the right bound for the diagonal Ramsey numbers; the answer is nO(logn).

Additionally, the notion of graph distance arises in the definition of node differential privacy (see for example [5, 8, 11, 9]). One instantiation of this setting is a graph encoding a social network where vertices correspond to people and edges correspond to social connections. The goal for node differential privacy is to design an algorithm 𝒜 that approximately computes certain statistics of the graph (such as counting edges, triangles, and connected components) while maintaining each individual’s privacy. Here, privacy is ensured by requiring that the output distribution on a certain graph does not change by much with any one vertex is deleted. In other words, for any graphs G, H of graph distance 1, the output distribution of 𝒜 on the G and H should be similar. Graph distances of greater than 1 are then considered in the continual release model where the graph varies over time as studied in [9].

Codes in the graph-distance metric were initially studied by Yohananov, Efron, and Yaakobi [19, 18], who gave several optimal and near-optimal constructions in a variety of parameters. Their setting allows for arbitrary symbols α𝔽q to be written on the edges, and the graph is allowed to have self-loops. We provide several new constructions for the binary setting, and the graph is not allowed to have self-loops. We show that random codes are asymptotically optimal codes in this metric and so focus our attention on explicit constructions.

Error-correcting graph codes also fall into the general framework of graph codes defined by Alon, Gujgiczer, Körner, Milojević, and Simonyi [3], where for a fixed family of graphs, one seeks a large code C of graphs on the same n-vertex set such that the symmetric difference of any two graphs in C does not lie in . This class of problems was studied for a wide variety of natural in a number of recent works [3, 2, 1]. As discussed in [2], for a suitable choice of , graph codes become equivalent to classical Hamming distance codes.

Finally, we note that error-correcting codes are pseudorandom objects, and the connection to Ramsey graphs suggests that error-correcting graph codes might be closely related to pseudorandom graphs. Thus, the problem of studying and explicitly constructing a pseudorandom family of pseudorandom graphs is interesting in its own right.

1.1 Related work

Similar to the Hamming setting, we briefly define the dimension, rate, and distance of a graph code C on n vertices where each edge is allowed to have an arbitrary symbol α𝔽q written on it.

  • Dimension. k=logq(|C|).

  • Rate. R=logq(|C|)/(n2).

  • Distance. The distance of a code is the largest d such that dgraph(G,H)d for each G,HC such that GH. The relative distance δ is d/n.

Unless specified otherwise , we will always be interested in asymptotics of the above parameters as n.

As noted in [19], an argument similar to the Singleton bound states that for a graph code of dimension k, and distance d, we have

k(nd+12). (1)

In terms of and rate R, and relative distance δ, we have

R(1δ)2+o(1). (2)

A code is called optimal if it R(1δ)2, equivalently, δ1R.

The relevant constructions in [19], [18] are summarized in Table 1. Note that all of the constructions in Table 1 are for undirected graphs where self-loops are allowed. One can impose n linear constraints to get codes with no self-loops (zeros on the diagonals of the adjacency matrix). Since the block length of these codes is (n2), adding these these constraints changes the rate by o(1).

Table 1: Summary of constructions in [19], and [18].
Name 𝜹 Field Tradeoff
[19] (Construction 1) =2/n 𝔽2 Optimal
[19] (Construction 3) 1 𝔽q,qn1 Optimal
[19] (Construction 5) 1/2 𝔽2 R=12δ
[18] (Construction 1) =3/n 𝔽2 Optimal
[18] (Construction 2) =4/n 𝔽2 Optimal

Construction 5 of [19] leverages a connection to symmetric array codes and construction by Schmidt [15]. The trade-off achieved, R12δ is close to optimal for δ very small.

Construction 3 of [19] translates Hamming distance into graph distance using the tensor product. Although this construction doesn’t directly imply anything about binary codes due to the requirement on field size, a similar code will be an important ingredient in our constructions.

1.2 Results

Our main results are:

  1. 1.

    We show that there are optimal binary codes for any constant δ(0,1).

  2. 2.

    We give constructions of graph codes that have positive constant R for all constant δ<1. In particular, we give a quasi-polynomial time explicit construction achieving δ=1O(R1/4), while optimal codes have δ=1R1/2. We also give an explicit construction with δ=1O(R1/6), and a strongly explicit construction with δ=1O(R1/8). Although these codes are not optimal, they are the first binary error-correcting graph codes achieving δ>1/2 with a constant rate.

  3. 3.

    We give (strongly) explicit constructions of graph codes with very high δ=1O(nϵ) and R=Ω(nϵ1/2) for constant ϵ>0. This gives a “graph code of Ramsey graphs” as will be discussed later.

Independent work

Pat Devlin and Cole Franks [6] independently proposed the study of graph error-correcting codes under this metric, determined the optimal R vs δ tradeoff, and gave some weaker explicit constructions of graph codes that worked for certain ranges of R and δ.

1.3 Techniques

We now discuss our techniques. We will often specify graphs by their adjacency matrices, viewed as matrices with 𝔽2 entries.

Our nonconstructive existence result is a straightforward application of the probabilistic method. We consider a uniformly random 𝔽2-linear subspace of the 𝔽2-linear space of symmetric 0-diagonal n×n matrices (i.e., the space of all adjacency matrices of graphs); this turns out to give a good graph code with optimal R vs δ tradeoff. Such graph codes can be specified by an 𝔽2 basis for it. We say that a construction is explicit if this basis can be produced in poly(n) time. We say it is strongly explicit if, given (i,j,k), the (i,j) entry of the k’th basis element can be computed in poly(log(n)) time.

To get asymptotically good codes for any constant δ(0,1) we take a longer detour.

  1. 1.

    We start with a slight variation of [19] (Construction 3), which gives a way to get a good graph code from a classical Hamming-distance linear code C𝔽2n. We first consider the tensor code CC, where the elements are matrices all of whose rows and columns are codewords of C. A-priori, C could contain matrices that are neither symmetric, not have a 0 diagonal. But interestingly, if we consider the set C of all matrices in CC that are symmetric and have 0 diagonal, then C is a linear space with quite large dimension. In particular, if the classical Hamming distance code C has positive rate, then so does the graph code C. We call this construction C=STCZD(C) (Symmetric Tensor Code with Zero Diagonals).

    It turns out that if C has good relative distance (in the Hamming metric), then STCZD(C) has good distance in the graph metric. However the relative graph distance of STCZD(C) such a code is bounded by the relative distance of C – and since C is a binary code, this is at most 1/2.

  2. 2.

    Now, we bring in another idea from the Hamming code world: code concatenation. Instead of constructing a graph code of symmetric zero-diagonal matrices over 𝔽2, we instead construct a “large-alphabet graph code” of symmetric zero-diagonal matrices over 𝔽q for some large q=2t and then try to reduce the alphabet size down to 2 by replacing the q-ary symbols with 𝔽2-matrices with suitable properties.

    Applying the analog of STCZD to a large alphabet code allows one to get large-alphabet graph codes with large δ, approaching 1 (since over large alphabets Hamming distance codes can have length approaching 1). Using Reed-Solomon codes as these large alphabet codes also allows us to make the STCZD construction strongly explicit. Furthermore, when applied to Reed-Solomon codes, these codes have a natural direct description: these are the evaluation tables of low-degree bivariate polynomials P(X,Y) on product sets S×S that are (1) symmetric (to get a symmetric matrix), and (2) multiples of (XY)2 (to get zero diagonal).

  3. 3.

    What remains now is to develop the right kind of concatenation so that the resulting graph code has good distance. This turns out to be subtle and requires an “inner code” with a stronger “directed graph distance” property with δ nearly 1. Fortunately, this inner code we seek is of very slowly growing size, and we may find this by brute force search. This concludes our description of our explicit construction of graph codes with δ approaching 1 and positive constant R.

Finally, we discuss our constructions for very high distances, δ=1o(1). In this regime, as mentioned earlier, this is related to constructions of Ramsey graphs, a difficult problem in pseudorandomness with a long history. Our constructions work up to δ=1Ω(1n); concretely, we get a large linear space of graphs such that all graphs in the family have no clique or independent set of size Ω(n). The construction is based on polynomials over finite fields of characteristic 2: When n=2t, we consider a linear space of certain low degree univariate polynomials f(X) over 𝔽2t, and create the 𝔽2 matrix with rows and columns indexed by 𝔽2t whose x,y entry is 𝖳𝗋(f(x+y)). Here 𝖳𝗋 is the finite field trace map from 𝔽2t to 𝔽2. The use of 𝖳𝗋 of polynomials is inspired by the construction of dual-BCH codes. We then show that any such matrix has no large clique or independent set unless 𝖳𝗋f is identically 0 or identically 1 (corresponding to the empty and complete graphs respectively). The proof uses the Weil bounds on character sums and a Fourier analytic approach to bound the independence number for the graphs. Our constructions are listed in Table 2.

Table 2: A list of constructions of error-correcting graph codes in this paper. All except Concatenated RS Tensor Codes are explicit. Note that the last four constructions are interesting in the regime where δ is close to 1.
Name Approximate Tradeoffs Strongly Explicit?
Random Linear Codes (Proposition 4) R=(1δ)2o(1) No
Concatenated RS Tensor Codes (19) R=(1δ)4o(1) No
Double Concatenated RS Tensor Codes R=(1δ1/3)6o(1) No
Triple Concatenated RS Tensor Codes (22) R=(1δ1/4)8o(1) Yes
Dual BCH Codes (28) R=log(n)(1δ)/n Yes

1.4 Concluding thoughts and questions

The most interesting question in this context is to obtain explicit constructions of graph codes with optimal R vs δ tradeoff. While we have several constructions achieving nontrivial parameters in various regimes, it would even be interesting to get the right asymptotic behavior for the endpoints with δ approaching 1. The setting of large δ (including δ=1o(1)) seems especially challenging, given the connection with the notorious problem of constructing Ramsey graphs.

Another interesting question is to get decoding algorithms for graph codes. For a certain graph code C, if we are given a graph that is promised to be close in graph distance to some graph G in C. Then, can we efficiently find G?

A more general context relevant to error-correcting graph codes is the error correction of strings under more general error patterns. Suppose we have a collection of subsets Si[m] for i[t], where iSi=[m]. These Si denote the corruption zones; a single “corruption” of a string z{0,1}m entails, for some i[t], changing z|Si to something arbitrary in {0,1}Si. We want to design a code C{0,1}m such that starting at any xC if we do fewer than d corruptions to x, we do not end up at any yC with yx. When the Si are all of size b and form a partition of [m] into t=m/b parts, then such a code is exactly the same as a classical Hamming distance code an alphabet of size 2b. Error-correcting graph codes give a first step into the challenging setting where the Si all pairwise intersect - here we have m=(n2), t=n, the Si (which correspond to all edges incident on a given vertex) all have size n1, and every pair Si and Sj intersect in exactly 1 element. It would be interesting to develop this theory – to both find the limits of what is achievable and to develop techniques for constructing codes against this error model.

Finally, there are many other themes from classical coding theory that could make sense to study in the context of graph codes and graph distance, including in the context of sublinear time algorithms. It would be interesting to explore this.

Organization of this paper

We set up basic notions in Section 2. We show the existence of optimal graph codes in Section 3. In Section 4 we construct asymptotically good codes. Finally, in Section 5, we show explicit constructions of graph codes with very high distance.

2 Graph codes: Basics

Definitions and notations

All graphs will be simple, undirected graphs on the vertex set [n] unless otherwise noted. For any graph G, use AG to denote the adjacency matrix of G, and view AG as an element of the vector space 𝔽2(n2). For two graphs G, H, let GΔH be the symmetric difference of the two graphs, i.e. AGΔH=AGAH. For a subset S[n], we use G[S] to denote the subgraph of G induced by the vertex set S. If A is a n×n matrix and S,T[n], let AS,T be the sub-matrix indexed by S on the rows and T on the columns. For x[0,1], we use h2(x)=xlog2x(1x)log2(1x) to denote the binary entropy function.

Definition 1 (Graph distance and relative graph distance).
  • The graph distance between two graphs G and H, denoted by dgraph(G,H), is the smallest d such that there is a set S[n], |S|=d, and G[[n]S]=H[[n]S].

  • The relative graph distance, or simply relative distance, between G and H is denoted by δgraph(G,H), and is the quantity dgraph(G,H)n.

In the above definition, we require that the graphs G[S¯] and H[S¯] be identical and not just isomorphic. Lemma 2 describes several equivalent characterizations of graph distance.

Proposition 2 (Alternate characterizations of dgraph).

Suppose G and H are graphs on the same vertex set. Then

  1. 1.

    dgraph(G,H) is the minimum vertex cover size of GΔH.

  2. 2.

    dgraph(G,H) is the minimum number of vertices whose neighborhoods you need to edit to transform G into H

  3. 3.

    dgraph(G,H) is the minimum number of vertices whose neighborhoods you need to edit to transform GΔH into the empty graph.

Note that dgraph is a metric (see [19], Lemma 5).

Definition 3 (Graph code).

We say that a set C2([n]2) is a graph code on [n] with distance d if for every pair of graph G,HC, we have that dgraph(G,H)d.

  • The rate of C, denoted by RC, is the quantity log2(|C|)(n2)2log2(|C|)n2.

  • The distance (resp. relative distance) of C, denoted by dC (resp. δC), is the quantity minG,HCGHdgraph(G,H) (resp. minG,HCGHδgraph(G,H)).

Upper bound

As noted in [19], the Singleton bound can be used to obtain an upper bound on the rate of a graph code. We include a proof here for completeness.

Proposition 4.

Any graph code with relative distance δ has dimension at most (n(1δ)+12).

Proof.

Consider any graph code C of relative distance δ. Let A[n] be any subset of at most δn1 vertices. For any two distinct G1,G2C, we have that the graphs induced on the vertices outside A, G1[[n]A] and G2[[n]A], are different. Indeed, since otherwise, A is a vertex cover of G1ΔG2, contradicting the relative distance assumption. So, we have that

|C|2(n(1δ)+12).

Expressed in terms of rate and distance, Proposition 4 implies that for constant relative distance δ, R(1δ)2+O(1/n).

3 Existence of optimal graph codes

As with other objects in the theory of error-correcting codes, the first question we seek to answer relates to the optimal rate-distance tradeoff.

In contrast to the Hamming world, we find that, in the graph distance, random linear codes meet the Singleton bound.

Proposition 5.

Let δ(0,1). Then, there exists a linear graph code with distance greater than δ and dimension at least

max{(n(1δ)2)h2(δ)n2,0}.
Proof.

We only consider the case when (n(1δ)2)h2(δ)n2>0, and prove this by a probabilistic construction. Let 𝐆n,1/2, be the Erdös-Rényi random graph distribution where the vertices are [n], and each of the (n2) possible edges are selected independently with probability 1/2. Let k=(n(1δ)2)h2(δ)n2, and let G1,,Gk be graphs chosen independently from 𝐆n,1/2. Consider the 𝔽2-linear space C=span{AG1,,AGk}. We wish to show that C has distance at least δn with high probability.

For α𝔽2k, let Hα be the graph with the adjacency matrix i=1kαiAGi. Recalling the definition of graph distance, we need that for any distinct α,β𝔽2k, HαΔHβ must have minimum vertex cover size at greater than δn. Since HαΔHβ=Hαβ, it suffices to show that for any non-zero α𝔽2k, that Hα has no vertex cover of size δn.

For any α𝔽2k{0}, Hα has the same law as 𝐆n,12. Let Bα be the event that Hα has a vertex cover of size δn. We have

Pr(Bα) =Pr(S([n]δn):G[[n]S]= is the empty graph)
(nδn)2(n(1δ)2)
2(n(1δ)2)+h2(δ)n,

where the first inequality uses the union bound over subsets of size δn, and the fact that there are (n(1δ)2) edges outside of a vertex cover that all have to be unselected. Then, union bounding over the choices of α, we get

Pr(α𝔽2k{0}Bα)2k(n(1δ)2)+h2(δ)n1/4.

Therefore, with probability at least 3/4, there does not exist α𝔽2k, such that α0, and Hα has a vertex cover of size δn, and hence C is a code with relative distance greater than δ.

Finally, note that if Hα does not have a vertex cover of size δn, then Hα is not the empty graph. Thus, the fact that there does not exist α𝔽2k, α0 such that Hα has a vertex cover of size δn implies that G1,,Gk are linearly independent. Hence, the dimension of C is k=(n(1δ)2)h2(δ)n2, as required.

As a result, we have the following corollaries.

Corollary 6.

For any constant δ(0,1), there exist optimal linear graph codes with relative distance at least δ.

Corollary 7.

For any constant c>2, there exists a linear graph code with dimension at least Ω(log2n) and relative distance at least δ=1clognn.

4 Explicit graph codes for high distance: Concatenated Codes

To get explicit codes of distance δ>1/2, we start with Construction 3 of [19]. The construction utilizes the tensor product code introduced by [17], where elements of the code are matrices where all rows and columns are codewords over some base Hamming code. The elements of the tensor code are then the adjacency matrices of graphs. Since we consider undirected graphs and do not allow self-loops, we take the subcode of the tensor code containing only symmetric matrices with zeros on the diagonal.

Definition 8 (Symmetric Tensor Code with Zeros on the Diagonal).

Let C be a code over 𝔽q. The symmetric tensor code with zeros on the diagonal built on C denoted STCZD(C) is the set of matrices A over 𝔽qn×n such that (1) A is symmetric, (2) the rows and columns of A are codewords of C, and (3) the entries on the diagonal are all 0.

Properties of elements of Tensor Product Codes that are symmetric and zero-diagonal were also previously studied, in the context of constructing a gap-preserving reduction from SAT to the Minimum Distance of a Code problem, by Austrin and Khot [4].

We will also define another notion of distance that will be useful later on.

Definition 9 (Directed graph distance).

Let A, and B be n×n matrices over some field. Define the directed graph distance denoted ddirected(G,H) to be the minimum d such that there exists sets S,T[n] of size d where (AB)S¯,T¯=𝟎.

For weighted, directed graphs, G, and H, abbreviate ddirected(AG,AH)=ddirected(G,H). To better distinguish between ddirected and dgraph, we sometimes refer to dgraph as the undirected graph distance.

When G and H are weighted directed graphs, ddirected(AG,AH) can be viewed as the minimum d such that you can go from G to H by editing the incoming edges of d vertices and the outgoing edges of d vertices. The main difference between directed and undirected graph distance is that directed graph distance allows the subset of rows and the subset of columns to be edited to be different. Insisting that S=T in the definition for directed graph distance recovers the undirected graph distance. From this, it easily follows that if G and H are undirected graphs, then ddirected(G,H)dgraph(G,H). Thus, to find codes with high graph distance, it suffices to find codes with large directed graph distance, where all the elements are adjacency matrices of undirected, unweighted graphs (i.e., 0/1 matrices that are symmetric and zero diagonal). Note that when discussing rate directed graph codes C, we are referring to the quantity logq(|C|)/n2 instead of logq(|C|)/(n2).

In the next lemma, we show several properties of STCZD(C). Most importantly, the Hamming distance of C translates to the directed graph distance of STCZD(C).

Lemma 10.

Let C be a linear [n,k,d]q-code, then STCZD(C)𝔽qn×n is linear, has dimension at least (k+12)n, and has directed graph distance d.

Proof.

Let C be a linear [n,k,d]q-code, and let C=STCZD(C). C is linear because C is linear, and the sum of symmetric matrices is symmetric.

WLOG, we assume that C is systematic, i.e., it has k×n generator matrix G=[I|A], where I is the k×k identity and A is a k×(nk) matrix. Then, for every X𝔽qk×k, the following has rows and columns belonging to C

GTXG=[XXAATXATXA].

Furthermore, GTXG is symmetric and has zeros on the diagonal iff X is symmetric, X has zeros on the diagonal, and ATXA has zeros on the diagonal. This imposes (k+12)+(nk) linear constraints on the entries of X. Thus, the subspace of X for which GTXGC has dimension at least k2(k+12)(nk)=(k+12)n.

Since C is linear, to show the distance property, it suffices to show that ddirected(A,𝟎)d for every non-zero AC. Let AC be a non-zero element of C, we’ll show that for any S,T[n], with |S|<d, and |T|<d, AS¯,T¯𝟎.

Since A is non-zero, there is some non-zero entry Aij. Since the rows are elements of a linear code of distance d, the Hamming weight of the ith row is at least d. Since |T|<d, there is some jT such that Aij is non-zero. Then, the jth column is also a non-zero codeword of C, so it also has Hamming weight at least d. Since |S|<d, there is some iS such that Aij is non-zero. Thus, AS¯,T¯𝟎.

 Remark 11.

A simple calculation shows that if C has constant rate, R, then STCZD(C) has rate R2/2o(1) as a directed graph code.

Given this lemma (and using the fact that dgraphddirected), for any binary code C𝔽2n with rate R and relative distance δ, STCZD(C) is a (undirected) graph code with rate R2o(1), and relative distance δ. Thus, if C has rate distance tradeoff R=f(δ), then STCZD(C) has rate distance tradeoff R=f(δ)2o(1). Immediately, we get that taking the STCZD of any asymptotically good binary code yields an asymptotically good graph code.

There are two problems with this construction. Firstly, these codes may not be strongly explicit. Secondly, the Plotkin bound [13] implies that any binary code with distance >1/2 has vanishing rate. So this falls short of our goal of obtaining strongly explicit, asymptotically good codes with δ>1/2.

We will address the first problem by showing that if the base code is a Reed Solomon code [14], then there is a large subcode that is strongly explicit.

Code 12 (Reed Solomon Code RS(n,R,q) [14]).

The Reed Solomon Code with parameters n, R, and q, where qn, is a code over 𝔽qn with rate R and distance 1R.

Lemma 13.

Let CRS(n,R,q) where Rn=k1. Then, there exists a strongly explicit subcode SSTCZD(C) such that the dimension of S is at least (k12).

Proof.

Essentially, we will evaluate symmetric polynomials that are a multiple of (XY)2 on a n×n grid.

Suppose h(X,Y) is a symmetric polynomial of individual degree at most k3, and let M be the evaluations of f(X,Y)=(XY)2h(X,Y) on a n×n grid. M is symmetric and has zeros on the diagonal. Furthermore, for a fixed value, y, f(X,y) is a univariate polynomial in X of degree at most k1, and hence the column indexed by y is an element of a Reed Solomon code of dimension k, and block length n. Similarly, the rows are also elements of the same code. Thus MSTCZD.

Let S be the space of bivariate symmetric polynomials of degree at most k3. For a,b, define polynomials pa,b(X,Y)=XaYb+XbYa. Notice that pa,b is symmetric, and furthermore the set

{pa,b:0a<bk3}{XiYi:i{0,1,,k3}},

is linearly independent. Thus dim(S)=(k22)+k2=(k12), as desired.

To extend this construction to the setting of δ>1/2, we use the concatenation paradigm from standard error-correcting code theory, initially introduced by Forney [7].

We will start with a code over a large alphabet and then concatenate with an inner code, which will be an optimal directed graph code.

Lemma 14.

For any ϵ>0, and sufficiently large n, for any k<ϵ2n22n, there exists a linear directed graph code over 𝔽2 of dimension k and distance at least (1ϵ)n.

The proof is standard and similar to that of Proposition 5. So we will omit it.

Code 15 (Optimal Directed Graph Code Opt(ϵ,n,k)).

Require k<ϵ2n22n. Refer to a code with the properties in Lemma 14 as Opt(ϵ,n,k).

4.1 Symmetric concatenation

Since our inner code is not guaranteed to be symmetric, simply replacing each field element in the outer code with its encoding might result in an asymmetric matrix. To remedy this, we transpose the encoding for entries below the diagonal. This is made formal below.

Definition 16 (Symmetric Concatenation).

Let q,Q be prime powers, and n,N be positive integers. Let Cout𝔽QN×N and Cin𝔽qn×n such that |Cin|=Q. Define CinCout𝔽qnN×nN to be the code obtained by taking codewords of Cout and replacing each symbol of the outer alphabet with by their encodings under Cin if they lie above or on the diagonal, and with the transpose of their encodings if they lie below the diagonal.

Refer to caption
Figure 1: Example of symmetric concatenation. An outer codeword is shown on the left, with field elements represented as different colors. The concatenation with the inner code is shown to the right. Black squares represent 0, and white squares represent 1s.

Figure 1 visualizes an example of symmetric concatenation. We now show that distance and dimension concatenate exactly like it does for standard error-correcting codes.

Lemma 17.

Suppose Cin and Cout are linear codes as in the previous definition with directed graph distance d and D, respectively. Let k be the dimension of Cin, and K be the dimension of Cout. Note |Cin|=qk=Q. Then CinCout is linear and has distance at least dD, and dimension kK.

Proof.

Let C=CinCout. First note that CinCout can be made linear by using a 𝔽q-linear map from 𝔽qk to 𝔽qk before encoding with the inner code.

Consider a non-zero outer codeword O, and let A be the codeword after concatenation. Let S,T[nN] be of size less than dD. We’ll show that AS¯,T¯𝟎. Partition A into N×N blocks, where the (I,J)’th block for I,J[N], is the n×n matrix encoding the symbol at OIJ. Identify the indices [nN] with [N]×[n] where the tuple (I,i) corresponds to the i’th index in the I’th block.

For I[N], let SI={i[n]:(I,i)S} be the set rows in S in the I’th block. Define TJ similarly. Let S={I[N]:|SI|d}, be the set of blocks in which there are at least d elements in S, and similarly define T, S<, and T<.

Since I[N]|SI|<dD, J[N]|TJ|<dD, we have |S|<D, and |T|<D. Since the outer code has directed distance D, OS<,T<𝟎, so there exists IS<, and JT< such that OI,J is non-zero. So, the (I,J)’th block of A is a non-zero codeword or the transpose of a non-zero codeword of Cin. Let us call it X, and suppose that XCin.

Since |SI|<d, and |TJ|<d, and the inner code has distance at least d, we have that XSI¯,TJ¯𝟎. To finish the proof, note that ddirected(X,𝟎)=ddirected(XT,𝟎) by switching the roles of S and T in the definition of directed graph distance.

For the claim about dimension, note that the number of codewords in C is the number of codewords in Cout, which is QK. The dimension of C is then Klogq(Q)=Kk.

Additionally, it is clear from the definition of symmetric concatenation that if Cout is symmetric and zero-diagonal, so is CinCout.

 Remark 18.

Lemma 17 also holds for the standard definition of concatenation (without transposing blocks below the diagonal). However, we will not need this fact.

4.2 Concatenated graph codes

We can instantiate the concatenated code using Reed Solomon codes.

Code 19 (Concatenated Code CRS(ϵ,n,k,N,ρ)).

Let Q=2k be the size of the alphabet of the outer code. Let ϵ,ρ(0,1), and n,k,N, to be integers satisfying k<ϵ2n22n, and NQ. Then,

CRS=Opt(ϵ,n,k)STCZD(RS(N,ρ,Q)).

The following theorem follows directly from Lemmas 10 and 17. As a reminder, here we are considering the rate of the codes as a (undirected) graph code.

Theorem 20.

Let ϵ,n,k,N,ρ be parameters satisfying the requirements listed in 19, then CRS(ϵ,n,k,N,ρ) is a graph code with rate ϵ2ρ2o(1), and relative distance (1ϵ)(1ρ).

Note that using this construction, we can get asymptotically good codes for any constant rate and distance - including for distances >1/2, which was not obtained in any of the previous constructions. We get R=(1δ)4o(1) by setting ϵ=ρ.

One drawback of this construction is that it is not strongly explicit or even explicit. The outer code can be made strongly explicit using Lemma 13, however, the inner code was an optimal code which we obtained by a randomized construction. The complexity of searching for such a code by brute force is too large. In particular, the optimal code has dimension ϵ2n2, and block length n2. Since we need the size of the code to be equal to the size of the outer alphabet, we have N=2ϵ2n2, so n=log(N)/ϵ. Then, there are at least 2ϵ2n4=2log(N)2/ϵ2 generator matrices to search over. Thus, we cannot find such a code efficiently.

To address this, we reduce the search space by concatenating multiple times. The resulting code will have a slightly worse distance/rate tradeoff but will still be asymptotically good for any constant distance or rate.

We also note that CRS can also be made strongly explicit using a Justensen-like construction. However, although this code is again asymptotically good, it has distance bounded away from 1. We present this construction in the Appendix.

4.3 Multiple concatenation

While concatenating twice suffices to obtain an explicit code, it is not clear that the obtained code is strongly explicit. This may be addressed by concatenating three times, at the cost of slightly weaker parameters. Here, we will also use the tensor product code as a building block. For any linear code C𝔽qn let TC(C) be the tensor product code of C. As a reminder, TC(C) is the code consisting of matrices A𝔽qn×n such that each row and each column of A are elements of C.

 Remark 21.

Suppose C is a linear code with distance d and rate R. Then, it follows from the proof of Lemma 10 that TC(C) has directed graph distance at least d. It is also well known that TC(C) has rate R2.

Below we present the analysis for triple concatenation.

Code 22 (Triple Concatenation CTrip(ρ,N)).

For ρ(0,1) and an integer N, let C be the subcode of STCZD(RS(N,ρ,N)) in Lemma 13. Then

CTrip=Opt(N3,ρ)TC(RS(N2,ρ,N2))TC(RS(N1,ρ,N1))C,

where N1,N2 and N3 are picked to make the concatenation work, i.e., |Opt(N3,ρ)|N2, and so on.

Notice that only the outer-most code needs to be symmetric and have zero diagonal since we use the symmetric concatenation operation (entries below the diagonal will be transposed). Thus, using the Tensor Product Code for the two middle codes (instead of STCZD) is sufficient and saves a factor of 2 (each time) on the rate.

Theorem 23.

Let N be a positive integer and ρ(0,1). Then CTrip(ρ,N) has distance at least (1ρ)4, and rate ρ8. Furthermore, CTrip(ρ,N) is strongly explicit.

Proof.

The claims about rate and distance follow directly from Lemma 17.

We’ll now show that this code is strongly explicit. The outermost code C is strongly explicit, and the two codes in the middle built on Reed Solomon codes are also strongly explicit. The idea is that the concatenation steps middle allow us to shrink the alphabet size from N to (less than) log(log(N)). Searching for optimal codes of this size can be done easily by brute force.

The dimension of TC(RS(N1,ρ,N1)) is (ρN1)2, so the number of codewords is N1(ρ1N1)2, and for the concatenation to work, we need this to be at least N. That is, we need (ρ1N1)2log(N1)log(N), which we can get easily by setting N1=O(log(N)). For the same reason, we can take N2=O(loglog(N)).

This is now small enough to do a brute-force search for an optimal code. The inner-most code has dimension ρ2N32, so we need 2ρ2N32=N2, or N3=O(log(N2)). There are then ρ2N34 possible generator matrices to search over. So the total number of codes we will need to search over is at most 2ρ2n4=2O(logloglog(N)2)=2o(loglog(N))=O(log(N)).

The tradeoff for this code is then

R=(1δ1/4)8.

Thus, we get strongly explicit asymptotically good codes for any constant distance or rate.

If we just wanted explicit codes (instead of strongly explicit), concatenating twice would suffice. In particular, the search space for the inner-most code has size

2O(loglog(N)2)=2o(log(N)),

which is smaller than any polynomial in N, but not polylogarithmic. The corresponding tradeoff for the double concatenated code is R=(1δ1/3)6.

5 Explicit graph codes with very high distance: dual-BCH Codes

In this section, we give explicit constructions of graph codes for the setting of very high distance (δ=1o(1)). As noted earlier, when the complete graph and the empty graph are part of the code, this is a generalization of the problem of constructing explicit Ramsey graphs (i.e., graphs with no large clique or independent set), which corresponds to graph codes of size at least 3.

Our main result here is an explicit construction of a graph code with distance 1nϵn1/2 and dimension Ω(nϵlogn), for all ϵ[0,1/2).

Theorem 24.

For all d, there is a strongly explicit construction construction of a code with dimension Ω(dlogn) and distance nO(dn).

In analogy with the situation for Hamming-distance codes, these are the dual-BCH codes of the graph-distance world.

5.1 Warmup: a graph code with dimension 𝛀(𝐥𝐨𝐠𝒏)

As a warmup, we first construct code with distance 11nϵ with growing dimension.

Let n=2t. Let 𝖳𝗋:𝔽2t𝔽2 denote the finite field trace map. Concretely, it is given by:

𝖳𝗋(x)=x+x2+x4++x2i++x2t1

For each α𝔽2t, consider the matrix Mα𝔽2n×n, where the rows and columns of Mα are indexed by elements of 𝔽2t, given by:

(Mα)x,y=𝖳𝗋(α(x+y)3).

Note that each Mα is symmetric. Consider the code

Code 25.

For n of the form 2t, let us define the family of codes

CWarmup={Mαα𝔽2t}.

We have that CWarmup is a linear code of dimension t=log2n.

Theorem 26.

The distance of CWarmup is at least 1O(n1/2).

Proof.

Fix any α𝔽2t. Let S𝔽2t be an arbitrary subset of vertices. It suffices to show that if S is bigger than Ω(n1/2)=Ω(2t/2), then there exist some x,yS such that

𝖳𝗋(α(x+y)3)=1.

Suppose not. Then we have:

x,yS(1)𝖳𝗋(α(x+y)3)=|S|2.

By Cauchy-Schwarz, we get:

|S|4 =(xSyS(1)𝖳𝗋(α(x+y)3))2
(xS(yS(1)𝖳𝗋(α(x+y)3))2)|S|
(x𝔽2t(yS(1)𝖳𝗋(α(x+y)3))2)|S|
=(x𝔽2ty1,y2S(1)𝖳𝗋(α((x+y1)3+(x+y2)3)))|S|
(y1,y2|x𝔽2t(1)𝖳𝗋(α((x+y1)3+(x+y2)3))|)|S|.

For y1,y2𝔽2t, let Py1,y2(X) be the polynomial

Py1,y2(X) =α((X+y1)3+(X+y2)3)
=α((y1+y2)X2+(y12+y22)X+(y13+y23)).

The key observation is that for most (y1,y2)S2, the trace of the polynomial Py1,y2(X) is a nonconstant 𝔽2-linear function, and thus the inner sum:

x𝔽2t(1)𝖳𝗋(Py1,y2(x))

equals 0.

Lemma 27.

If P(X)=aX2+bX+c𝔽2t[X], then

𝖳𝗋P:𝔽2t𝔽2

is a nonconstant 𝔽2-linear function unless a=b2.

The proof is standard, and we omit it.

By the lemma, we get that there are at most 4|S| choices of (y1,y2) such that the inner sum is non-zero (namely those (y1,y2)S2 for which α(y1+y2)=(α(y12+y22))2, which are few in number by the Schwartz-Zippel lemma).

Thus we get:

|S|44|S|22t,

from which we get |S|O(2t/2), as desired.

5.2 Larger dimension

We now see how to get graph codes of distance 11nϵ with ϵ<12 and larger rate.

For a polynomial f(X)𝔽2t[X], let Mf be n×n matrix with rows and columns indexed by 𝔽2t for which:

(Mf)x,y=𝖳𝗋(f(x+y)).

Let Wd be the 𝔽2t-linear space of all polynomials f(X) of the form:

f(X)=32i+1dαiX2i+1,

where the αi𝔽2t.

Then, let us define our construction.

Code 28.

For n of the form 2t and dn1/2, let us define the family of codes

CDualBCH(n,d)={Mf:fWd}.
Theorem 29.

We have that CDualBCH(n,d) is a linear graph code of distance 1O(dn1/2) and dimension Ω(dt)=Ω(dlogn).

Proof.

The proof is very similar to the proof of Theorem 26. Consider any MfCDualBCH(n,d) with f0. It suffices to show that the independence number222An essentially identical proof shows that the clique number also has the same bound. The only change is to replace the LHS of (3) by (|S|2|S|), and this sign change does not affect anything later because we immediately apply Cauchy-Schwarz to get (4). This justifies our referring to this code as a “code of Ramsey graphs”. of Mf is O(dn1/2).

Assume that S𝔽2t is an independent set in Mf. Then

|S|2=x,yS(1)𝖳𝗋(f(x+y)). (3)

As in the proof of Theorem 26, by the Cauchy-Schwartz inequality and some simple manipulations, we get:

|S|4 (y1,y2S|x𝔽2t(1)𝖳𝗋(Py1,y2(x))|)|S|, (4)

where:

Py1,y2(X)=f(X+y1)f(X+y2).

At this point, we need an upper bound in the inner sum:

Uy1,y2=|x𝔽2t(1)𝖳𝗋(Py1,y2(x))|.

To get this, we will invoke the deep and powerful Weil bound:

Theorem 30 ([16], Chapter II, Theorem 2E).

Suppose P(X)𝔽2t(X) is a nonzero polynomial of odd degree with degree at most d. Then:

|x𝔽2t(1)𝖳𝗋(P(x))|O(d2t/2).

We will use this to show that all but a few pairs (y1,y2)S2, Uy1,y2 are small.

Lemma 31.

For all but d|S| pairs (y1,y2)S2,

Uy1,y2O(d2t/2).

is at most d|S|.

Assuming this for the moment, we can proceed with Equation (4):

|S|4 (d|S|2t+|S|2O(d2t/2))|S|
=d|S|22t+O(d|S|32t/2).

Thus:

|S|2d2t+O(d|S|2t/2),

which implies that |S|O(d2t/2), as desired.

Proof of Lemma 31

Proof.

Theorem 30 only applies to polynomials with odd degree. We first recall a standard trick involving the 𝖳𝗋 map to deduce consequences for arbitrary degree polynomials.

Note that 𝖳𝗋(a2)=𝖳𝗋(a) for all a𝔽2t, and that every element of 𝔽2t has a square root. Thus for any positive degree monomial M(X)=αXi, where i=j2k with j odd, the equality:

𝖳𝗋(M(x))=𝖳𝗋(M~(x))

for each x𝔽2t, where M~(X) is the odd degree monomial given by:

M~(X)=α1/2kXj.

Extending by linearity, this allows us to associate, to every polynomial P(X)𝔽2t[X], a polynomial P~(X) with

𝖳𝗋(P(x))=𝖳𝗋(P~(x))

for all x𝔽2t, and where every monomial of P~(X) (except possibly the constant term) has odd degree.

The key observation is that whenever P~y1,y2(X) is nonconstant, it has odd degree, and so we can apply the Weil bound. In this case, since:

𝖳𝗋(Py1,y2(x))=𝖳𝗋(P~y1,y2(x))

for each x𝔽2t, we get:

Uy1,y2 =|x𝔽2t(1)𝖳𝗋(P~y1,y2(x))| (5)
O(d2t/2), (6)

where the last step follows from the Weil bound (Theorem 30).

Thus we simply need to show that there are at most d|S| pairs (y1,y2)S2 for which P~y1,y2(X) is a constant.

Suppose f(X) has degree exactly 2e+1. Let α be the coefficient of X2e+1 in f(X).

Define γi(Y)𝔽q[Y] by:

f(X+Y)=j=02e+1γi(Y)Xi.

Note that deg(γi(Y))2e+1i. It is easy to check that γ2e+1(Y)=α and γ2d(Y)=αY.

Then we have:

Py1,y2(X) =f(X+y1)f(X+y2)
=i2e(γi(y1)γi(y2))Xi.

Then by definition,

P~y1,y2(X)=j2e1j odd (k0j2k2e(γj2k(y1)γj2k(y2))12k)Xj.

We are trying to show that for most y1,y2, this is nonconstant. We will do this by identifying a monomial of positive degree which often has a nonzero coefficient. Let e=j02k0 with j odd. We will focus on the coefficient of Xk0. It equals:

(γ2e(y1)γ2e(y2))1/2k0+1+(0kk0(γj2k(y1)γj2k(y2))12k).

By linearity of the map aa1/2k, this can be expressed in the form Q(y11/2k0+1,y21/2k0+1), where Q(Z1,Z2) is a bivariate polynomial of degree at most 2e. Furthermore, using the fact that γ2e(Y)=αY, the homogeneous part of Q(Z1,Z2) of degree 1 exactly equals:

α1/2k0+1(Z1Z2),

which is nonzero. Thus Q(Z1,Z2) is a nonzero polynomial.

Thus by the Schwartz-Zippel lemma, for T={y1/2k0+1yS}, there are at most 2e|T|d|S| values of (z1,z2)T2 such that Q(z1,z2)=0. Thus there are at most d|S| values of (y1,y2)S2 for which the coefficient of Xk0 in P~y1,y2(X) is 0. Whenever it is nonzero, Equation (6) bounding Uy1,y2 applies, and we get the desired result.

References

  • [1] Noga Alon. Connectivity graph-codes. arXiv preprint, 2023. arXiv:2308.07653.
  • [2] Noga Alon. Graph-codes. arXiv preprint, 2023. arXiv:2301.13305.
  • [3] Noga Alon, Anna Gujgiczer, János Körner, Aleksa Milojević, and Gábor Simonyi. Structured codes of graphs. SIAM Journal on Discrete Mathematics, 37(1):379–403, 2023. doi:10.1137/22M1487989.
  • [4] Per Austrin and Subhash Khot. A Simple Deterministic Reduction for the Gap Minimum Distance of Code Problem. IEEE Transactions on Information Theory, 60(10):6636–6645, 2014. doi:10.1109/TIT.2014.2340869.
  • [5] Shixi Chen and Shuigeng Zhou. Recursive mechanism: Towards node differential privacy and unrestricted joins. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pages 653–664. Association for Computing Machinery, 2013. doi:10.1145/2463676.2465304.
  • [6] Pat Devlin. Personal Communication.
  • [7] G David Forney. Concatenated codes. Technical report, Massachusetts Institute of Technology, Research Laboratory of Electronics, 1965.
  • [8] Michael Hay, Chao Li, Gerome Miklau, and David Jensen. Accurate Estimation of the Degree Distribution of Private Networks. In 2009 Ninth IEEE International Conference on Data Mining, pages 169–178, 2009. doi:10.1109/ICDM.2009.11.
  • [9] Palak Jain, Adam Smith, and Connor Wagaman. Time-Aware Projections: Truly Node-Private Graph Statistics under Continual Observation. arXiv, 2024. doi:10.48550/arXiv.2403.04630.
  • [10] J. Justesen. Class of constructive asymptotically good algebraic codes. IEEE Transactions on Information Theory, 18(5):652–656, 1972. doi:10.1109/TIT.1972.1054893.
  • [11] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Analyzing graphs with node differential privacy. In Proceedings of the 10th Theory of Cryptography Conference on Theory of Cryptography, TCC’13, pages 457–476. Springer-Verlag, 2013. doi:10.1007/978-3-642-36594-2_26.
  • [12] James L Massey. Threshold decoding. Technical report, Massachusetts Institute of Technology, Research Laboratory of Electronics, 1963.
  • [13] M. Plotkin. Binary codes with specified minimum distance. IRE Transactions on Information Theory, 6(4):445–450, 1960. doi:10.1109/TIT.1960.1057584.
  • [14] I. S. Reed and G. Solomon. Polynomial Codes Over Certain Finite Fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960. doi:10.1137/0108018.
  • [15] Kai-Uwe Schmidt. Symmetric bilinear forms over finite fields with applications to coding theory. Journal of Algebraic Combinatorics, 42(2):635–670, 2015. doi:10.1007/s10801-015-0595-0.
  • [16] W.M. Schmidt. Equations over Finite Fields: An Elementary Approach. Lecture Notes in Mathematics. Springer Berlin Heidelberg, 2006. URL: https://books.google.co.uk/books?id=up97CwAAQBAJ.
  • [17] J. Wolf. On codes derivable from the tensor product of check matrices. IEEE Transactions on Information Theory, 11(2):281–284, 1965. doi:10.1109/TIT.1965.1053771.
  • [18] Lev Yohananov, Yuval Efron, and Eitan Yaakobi. Double and Triple Node-Erasure-Correcting Codes Over Complete Graphs. IEEE Transactions on Information Theory, 66(7):4089–4103, 2020. doi:10.1109/TIT.2020.2971997.
  • [19] Lev Yohananov and Eitan Yaakobi. Codes for Graph Erasures. IEEE Transactions on Information Theory, 65(9):5433–5453, 2019. doi:10.1109/TIT.2019.2910040.

Appendix A Justensen-like code

The construction in this example is inspired by the Justensen code [10], which uses an ensemble of codes for the inner code instead of a single inner code. Justensen uses an ensemble known as the Wozencraft Ensemble [12] with the following properties.

Theorem 32 (Wozencraft Ensemble).

For every large enough k, there exists codes C(1),C(2),,C(2k1) over 𝔽22k with rate 1/2, where 1ϵ fraction of them have distance at least H21(1/2ϵ).

Since our goal is graph distance, we use the STCZD operation to covert the Wozencraft Ensemble from codes over strings with good Hamming distance to codes of matrices with good graph distance.

Lemma 33 (Wozencraft Ensemble Modification).

For any ϵ>0, and large enough k, there exists codes D(1),D(2),,D(2k1) over 𝔽22k×2k. View these as directed graph codes. Then these codes have rate 1/8, and at least a 1ϵ fraction of them have distance at least H21(1/2ϵ).

Proof.

Let C(1),C(2),,C(N) be the Wozencraft Ensemble. For each I[N], define D(I)=STCZD(C(I)). Note that each D(I) is a code over 𝔽22k×2k. Note that by lemma Lemma 10, each of the codes has rate 1/8. Since the STCZD operation translates Hamming distance to directed graph distance, we also have the same guarantee as the original Wozencraft Ensemble - at least (1ϵ) fraction of the codes have distance at least H21(1/2ϵ).

Concatenating STCZD(RS) with the modified Wozencraft Ensemble in a particular arrangement yields our next construction.

Code 34 (Justensen-like CJustensen(ϵ,k,ρ)).

Require ϵ,ρ(0,1). Let Q=2k, and N=2k1.

Let D(1),D(2),,D2k1 be the modified Wozencraft Ensemble Lemma 33.

Then CJustensen is the code where for each element of ASTCZD(RS(N,ρ,Q)), for each I,J[N], we replace the symbol at AIJ with its encoding under D(min(I,J)). If J<I, we transpose the encoding (to keep the matrix symmetric).

Figure 2 shows where each inner code is applied.

[D(1)D(1)D(1)D(1)D(1)D(1)TD(2)D(2)D(2)D(2)D(1)TD(2)TD(3)D(3)D(3)D(1)TD(2)TD(3)TD(4)D(4)D(1)TD(2)TD(3)TD(4)TD(5)D(1)TD(2)TD(3)TD(4)TD(5)T]
Figure 2: Inner code arrangement for CJustensen.
Theorem 35.

For any ϵ,ρ(0,1), and k, a sufficiently large integer, CJustensen(ϵ,ρ,k) is a strongly explicit linear graph code with rate ρ2/8o(1), and distance at least (1ρϵ)H1(1/2ϵ).

Proof.

Let N=2k1, and n=2k be the side lengths of the inner and outer codes, respectively. First note that CJustensen is a linear graph code over 𝔽2nN×nN, since both the inner and outer codes are linear, and we can apply a 𝔽2 linear map from 𝔽2k𝔽2k before encoding with the inner code.

Rate.

By Lemma 10, the outer code, STCZD(RS(N,ρ,Q)), has rate ρ2/2o(1), and by Lemma 33, the inner codes have rate 1/8o(1). Thus, the rate is ρ2/8o(1) as an undirected graph code.

Distance.

Let O be a non-zero outer codeword. For convenience, let d=H1(1/2ϵ), and n=2k be the side length of the inner code. We claim the distance is at least (1ρϵ)d.

Call I[N], bad if the distance of D(I)<d, and good otherwise. Let B[N] be the subset of bad indices. By the guarantee of the Wozencraft ensemble, we know that |B|<ϵN. Since OIJ gets encoded with min(I,J), if I,JB, then OIJ is encoded with an inner code of distance at least d.

Define SI,TJ as in the proof of Lemma 17. Let S={I:|SI|din and IB}. Similarly, define T. Then |S|,|T|<(1ρϵ)N. Then |SB|,|TB<(1ρ)N. Since this is less than the outer distance of the code, we have that OIJ0 for some ISB, and JTB. In other words, |SI|<d, |TJ|<din, and OIJ is encoded with a code of directed graph distance at least d. Thus, by the inner distance, there remains a non-zero element in the (I,J)th block outside of SI, and TJ.