Erasure-Resilient Sublinear-Time Graph Algorithms

We investigate sublinear-time algorithms that take partially erased graphs represented by adjacency lists as input. Our algorithms make degree and neighbor queries to the input graph and work with a specified fraction of adversarial erasures in adjacency entries. We focus on two computational tasks: testing if a graph is connected or $\varepsilon$-far from connected and estimating the average degree. For testing connectedness, we discover a threshold phenomenon: when the fraction of erasures is less than $\varepsilon$, this property can be tested efficiently (in time independent of the size of the graph); when the fraction of erasures is at least $\varepsilon$, then a number of queries linear in the size of the graph representation is required. Our erasure-resilient algorithm (for the special case with no erasures) is an improvement over the previously known algorithm for connectedness in the standard property testing model and has optimal dependence on the proximity parameter $\varepsilon$. For estimating the average degree, our results provide an "interpolation" between the query complexity for this computational task in the model with no erasures in two different settings: with only degree queries, investigated by Feige (SIAM J. Comput. '06), and with degree queries and neighbor queries, investigated by Goldreich and Ron (Random Struct. Algorithms '08) and Eden et al. (ICALP '17). We conclude with a discussion of our model and open questions raised by our work.


Introduction
The goal of this work is to model and investigate sublinear-time algorithms that run on graphs with incomplete information. Typically, sublinear-time models assume that algorithms have query or sample access to an input graph. However, this assumption does not accurately reflect reality in some situations. Consider, for example, the case of a social network where vertices represent individuals and edges represent friendships. Individuals might want to hide their friendship relations for privacy reasons. When input graphs are represented by their adjacency lists, such missing information can be modeled as erased entries in the lists. In this work, we initiate an investigation of sublinear-time algorithms whose inputs are graphs represented by the adjacency lists with some of the entries adversarially erased.
In our erasure-resilient model of sublinear-time graph algorithms, an algorithm gets a parameter α ∈ [0, 1] and query access to the adjacency lists of a graph with at most an α fraction of the entries in the adjacency lists erased. We call such a graph α-erased or, when α is clear from the context, partially erased. Algorithms access partially erased graphs via degree and neighbor queries. The answer to a degree query v is the degree of the vertex v. A neighbor query is of the form (v, i), and the answer is the $i$th entry in the adjacency list of v. If the $i$th entry is erased, the answer is a special symbol ⊥. A completion of a partially erased graph G is a valid graph represented by adjacency lists (with no erasures) that coincide with the adjacency lists of G on all nonerased entries. We formulate our computational tasks in terms of valid completions of partially erased input graphs and analyze the performance of our erasure-resilient algorithms in the worst case over all α-erased graphs. We investigate representative problems from two fundamental classes of computational tasks in our model: graph property testing and estimating a graph parameter.
In the context of graph property testing [GGR98], we study the problem of testing whether a partially erased graph is connected. Our model is a generalization of the general graph model of Parnas and Ron [PR02] (which is in turn a generalization of the bounded degree model of Goldreich and Ron [GR02]) to the setting with erasures. A partially erased graph G has property P (in our case, is connected) if there exists a completion of G that has the property. For ε ∈ (0, 1), such a graph with m edges (more precisely, 2m entries in its adjacency lists) is ε-far from P (in our case, from being connected) if every completion of G is different in at least εm edges from every graph with the property. The goal of a testing algorithm is to distinguish, with high probability, α-erased graphs that have the property from those that are ε-far. For testing connectedness in our erasure-resilient model, we discover a threshold phenomenon: when the fraction of erasures is less than ε, this property can be tested efficiently (in time independent of the size of the graph); when the fraction of erasures is at least ε, then a number of queries linear in the size of the graph is required to test connectedness. Additionally, when there are no erasures, our tester has better query complexity than the best previously known standard tester for connectedness [PR02,BRY14], also mentioned in the book on property testing by Goldreich [Gol17]. Our tester has optimal dependence on ε, as evidenced by a recent lower bound in [PRV20] for this fundamental property.
Next, we study erasure-resilient algorithms for estimating the average degree of a graph. The problem of estimating the average degree of a graph, in the case with no erasures, was studied by Feige [Fei06], Goldreich and Ron [GR08], and Eden et al. [ERS17,ERS19]. Feige designed an algorithm that, for all ε > 0, makes $O(\sqrt{n}/\varepsilon)$ degree queries to an n-node graph and outputs, with high probability, an estimate that is within a factor of 2 + ε of the average degree. He also showed that to get a 2-approximation, one needs Ω(n) degree queries. Goldreich and Ron proved that if an algorithm can make uniformly random neighbor queries (that is, obtain a uniformly random neighbor of a specified vertex) then, for all ε > 0, the average degree can be estimated to within a factor of 1 + ε using $O(\sqrt{n} \cdot \mathrm{poly}(\log n, 1/\varepsilon))$ queries. Eden et al. proved a tighter bound of $O(\sqrt{n} \cdot \log\log n \cdot \mathrm{poly}(1/\varepsilon))$ on the query complexity of this problem and provided a simpler analysis. We describe an algorithm that estimates the average degree of α-erased graphs to within a factor of 1 + min(2α, 1) + ε using $O(\sqrt{n} \cdot \log\log n \cdot \mathrm{poly}(1/\varepsilon))$ queries. Our result can be thought of as an interpolation between the results in [Fei06] and [GR08,ERS17,ERS19]. In particular, when there are no erasures, that is, when α = 0, we get a (1 + ε)-approximation; when all adjacency entries are erased, and only the degree queries are useful, that is, when α = 1, we obtain a (2 + ε)-approximation. We also show that our result cannot be improved significantly: to get a (1 + α)-approximation, Ω(n) queries are necessary.
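The interpolation between the two regimes can be made concrete with a minimal sketch that evaluates the stated guarantee 1 + min(2α, 1) + ε. The function name `approximation_factor` is hypothetical and only illustrates the formula; it is not part of the algorithm itself.

```python
def approximation_factor(alpha: float, eps: float) -> float:
    """Evaluate the guarantee 1 + min(2*alpha, 1) + eps quoted in the text.

    alpha is the fraction of erased adjacency entries, eps the accuracy parameter.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return 1.0 + min(2.0 * alpha, 1.0) + eps


# No erasures recover the (1 + eps) regime; alpha >= 1/2 already gives 2 + eps.
factors = [approximation_factor(a, 0.25) for a in (0.0, 0.25, 0.5, 1.0)]
```

Under this formula the guarantee stops degrading once α reaches 1/2, which matches the later remark that the degree-only regime is entered when the fraction of erasures approaches 1/2.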
Discussion of our model. For the case of graph property testing, our model is an adaptation of the erasure-resilient model for testing properties of functions by Dixit et al. [DRTV18]. Dixit et al. designed erasure-resilient testers for many properties of functions, including monotonicity, the Lipschitz property, and convexity. The conceptual difference between the two models is that the adjacency lists representation of a graph cannot be viewed as a function. (This is not the case for the adjacency matrix representation.) For a function, erased entries can be filled in arbitrarily and, as a result, they never contribute to the distance to the property. For the adjacency lists representation, this is not the case: erasures have to be filled so that the resulting completion is a valid graph. The restrictions on how they can be filled may result in some contribution to the distance coming from the erased entries. For example, consider the property of bipartiteness. Let B be a complete balanced bipartite graph (U, V; E), and let $B'$ be obtained from B by adding an erased entry to the adjacency list of every vertex in U. Then, in every completion of $B'$, all formerly erased entries have to be changed to make the graph bipartite. Furthermore, Dixit et al. [DRTV18] gave results only on property testing in the erasure-resilient model. We go beyond property testing in our exploration of erasure-resilient algorithms by considering more general computational tasks.
Finally, our model opens up many new research directions, some of which are discussed in Section 4.

The Model
We consider simple undirected graphs G = (V, E) represented by adjacency lists, where some entries in the adjacency lists could be adversarially erased (these entries are denoted by ⊥).
Definition 1.1 (α-erased graph; completion). Let α ∈ [0, 1] be a parameter. An α-erased graph on a vertex set V is a concatenation of the adjacency lists of a simple undirected graph (V, E) with at most an α fraction of all entries (that is, at most 2α|E| entries) in the lists erased. A completion of an α-erased graph G is the adjacency lists representation of a simple undirected graph $G'$ that coincides with G on all nonerased entries. By definition, every partially erased graph has a completion, because it was obtained by erasing entries in a valid graph.
Given a partially erased graph G over a vertex set V, we use n to denote |V| and m to denote the number of edges in any completion of G, that is, half the sum of lengths of the adjacency lists of all the vertices in G. The average degree, that is, 2m/n, is denoted by d. For u ∈ V, we use Adj(u) to denote the adjacency list of u. The degree of u, denoted deg(u), is the length of Adj(u).
Definition 1.2 (Nonerased and half-erased edges). Let G be a partially erased graph over a vertex set V. For vertices u, v ∈ V, the set {u, v} is a nonerased edge in G if u is present in Adj(v) and vice versa. The set {u, v} is a half-erased edge if u is in Adj(v) but v is not in Adj(u), or vice versa.
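Definition 1.2 can be illustrated with a short sketch. The representation is an assumption made only for this example: a partially erased graph is a dictionary mapping each vertex to its adjacency list, with `None` standing for the erased symbol ⊥. The helper name `classify_edges` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

ERASED = None  # stands for the erased symbol in an adjacency list

Graph = Dict[int, List[Optional[int]]]


def classify_edges(adj: Graph) -> Tuple[Set[FrozenSet[int]], Set[FrozenSet[int]]]:
    """Split the visible edges into nonerased and half-erased edges."""
    nonerased: Set[FrozenSet[int]] = set()
    half_erased: Set[FrozenSet[int]] = set()
    for v, entries in adj.items():
        for u in entries:
            if u is ERASED:
                continue
            edge = frozenset((u, v))
            if v in adj[u]:
                nonerased.add(edge)    # u lists v and v lists u
            else:
                half_erased.add(edge)  # only one endpoint lists the other
    return nonerased, half_erased


# A triangle on {0, 1, 2} with two erased entries: Adj(1) hides 2, Adj(2) hides 0.
triangle = {0: [1, 2], 1: [0, ERASED], 2: [ERASED, 1]}
nonerased, half_erased = classify_edges(triangle)
```

In this example only {0, 1} is a nonerased edge, while {0, 2} and {1, 2} are half-erased because exactly one endpoint still lists the other.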
Our algorithms make two types of queries: degree queries and neighbor queries. A degree query specifies a vertex v, and the answer is deg(v). A neighbor query specifies (v, i), and the answer is the $i$th entry in Adj(v).
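The query model above can be sketched as a small oracle that answers degree and neighbor queries and counts them. The class name `ErasedGraphOracle`, the 1-based index convention, and the use of `None` for ⊥ are assumptions of this sketch.

```python
from fractions import Fraction
from typing import Dict, List, Optional

Graph = Dict[int, List[Optional[int]]]


class ErasedGraphOracle:
    """Query access to a partially erased graph; None plays the role of the erased symbol."""

    def __init__(self, adj: Graph) -> None:
        self._adj = {v: list(entries) for v, entries in adj.items()}
        self.queries = 0  # total number of degree and neighbor queries made

    def degree(self, v: int) -> int:
        """Degree query: the length of Adj(v), erased entries included."""
        self.queries += 1
        return len(self._adj[v])

    def neighbor(self, v: int, i: int) -> Optional[int]:
        """Neighbor query (v, i): the i-th entry of Adj(v), 1-indexed, or None if erased."""
        self.queries += 1
        return self._adj[v][i - 1]


def erased_fraction(adj: Graph) -> Fraction:
    """Fraction of adjacency-list entries that are erased."""
    total = sum(len(entries) for entries in adj.values())
    erased = sum(1 for entries in adj.values() for u in entries if u is None)
    return Fraction(erased, total)


graph = {0: [1, 2], 1: [0, None], 2: [None, 1]}
oracle = ErasedGraphOracle(graph)
```

A graph is α-erased in the sense of Definition 1.1 when `erased_fraction` is at most α; degree queries are unaffected by erasures because erased entries still count toward the list length.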
Definition 1.3 (Distance to a property; erasure-resilient property tester). Let α ∈ [0, 1], ε ∈ (0, 1) be parameters. An α-erased graph G satisfies a property P if there exists a completion of G that satisfies P. An α-erased graph G is ε-far from a property P if every completion $G'$ of G is different in at least εm edges from every graph that satisfies P.

Our Results
In this section, we state our main results for the erasure-resilient model of sublinear-time algorithms.

Testing Connectedness
The problem of testing connectedness in the general graph model (which we further generalize to the erasure-resilient setting) was studied by Parnas and Ron [PR02]. The results on this fundamental problem are described in Section 10.2.1 in [Gol17]. The best tester for this problem to date, due to [BRY14], had query complexity $O\left(\frac{1}{(\varepsilon d)^2}\right)$. We give two erasure-resilient testers for connectedness: one for small values of α and another for intermediate values of α. Both testers work for all values of the proximity parameter ε. We first give a tester that works for all α < ε/2. (This tester is presented in Section 2.1.)
Theorem 1.4. There exists an α-erasure-resilient ε-tester for connectedness of graphs with the average degree d that has $O\left(\min\left\{\frac{1}{((\varepsilon-2\alpha)d)^2}, \frac{1}{\varepsilon-2\alpha}\log\frac{1}{(\varepsilon-2\alpha)d}\right\}\right)$ query and time complexity and works for every ε ∈ (0, 2/d) and α ∈ [0, ε/2). The tester has 1-sided error. When the average degree d of the input graph is unknown, α-erasure-resilient ε-testing of connectedness (with 1-sided error) has query and time complexity $O\left(\frac{1}{\varepsilon-2\alpha}\log\frac{1}{\varepsilon-2\alpha}\right)$.
Importantly, when the input adjacency lists have no erasures (i.e., when α = 0), our tester has better query complexity than the previously known best (standard) tester for connectedness, which was due to [BRY14]. We present a standalone algorithm for this important special case in Appendix A for easy reference. By substituting α = 0 in Theorem 1.4, we get $O\left(\min\left\{\frac{1}{(\varepsilon d)^2}, \frac{1}{\varepsilon}\log\frac{1}{\varepsilon d}\right\}\right)$ query complexity for the case when d is known and $O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$ query complexity when d is unknown. For the case with no erasures, the improvement in query complexity as a function of ε is from $O(1/\varepsilon^2)$ to $O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$. The latter is optimal, as evidenced by an $\Omega\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$ lower bound for testing connectedness of graphs of degree 2 in [PRV20]. We note that Berman et al. [BRY14] already proved that testing connectedness of graphs (with no erasures) in the bounded degree graph model of [GR02] has query complexity $O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon D}\right)$, where D denotes the degree bound. Our result shows that the same query complexity (with D replaced by d) is attainable in the general graph model.
Our first tester looks for small connected components that do not have any erasures. When α ∈ [ε/2, ε), some α-erased graphs that are ε-far from connected may not have any connected component that is free of erasures. Consequently, our first tester fails to reject such graphs. We give a different algorithm (presented in Section 2.2) which works by looking for a subset of vertices that has at most one erasure and gets completed to a unique connected component in every completion of the partially erased graph. (In the beginning of Section 2.2, we give an explanation, illustrated by Figure 1, of why two erasures in a witness may render it not detectable from a local view obtained by a sublinear algorithm.)
Theorem 1.5. There exists an α-erasure-resilient ε-tester for connectedness of graphs with the average degree d that has $O\left(\min\left\{\frac{1}{(\varepsilon-\alpha)^2 d}, \frac{1}{((\varepsilon-\alpha)d)^3}\right\}\right)$ query and time complexity and works for every ε ∈ (0, 2/d) and α ∈ [0, ε). The tester has 1-sided error.
Finally, we show that when α ≥ ε, the task of α-erasure-resilient ε-testing of connectedness requires examining a linear portion of the graph representation.That is, we discover a phase transition in the complexity of this problem when the fraction of erasures α reaches the proximity parameter ε.
Theorem 1.6. For all ε ∈ (0, 1/7], every ε-erasure-resilient ε-tester for connectedness that makes only degree and neighbor queries requires a number of queries linear in the size of the graph representation.
To prove this theorem, we construct (in Section 2.3) a family of partially erased graphs for which it is hard to distinguish connected graphs from graphs that are far from connected. The average degree of the graphs in our constructions is constant. So, the lower bound for this graph family is Ω(n) = Ω(m).

Estimating the Average Degree
In Section 3.1, we give an erasure-resilient algorithm for estimating the average degree by generalizing the algorithm of Eden et al. [ERS17,ERS19] to work for the case with erasures.
Theorem 1.7. Let α ∈ [0, 1] and ε ∈ (0, 1/2). There exists an algorithm that makes $O(\sqrt{n} \cdot \log\log n \cdot \mathrm{poly}(1/\varepsilon))$ degree queries and uniformly random neighbor queries to an α-erased input graph of average degree d ≥ 1 and outputs, with probability at least 2/3, an estimate $\hat{d}$ that is within a factor of $1 + \min(2\alpha, 1) + \varepsilon$ of $d$. The running time of the algorithm is the same as its query complexity.
For graphs with no erasures, a good estimate of the number of edges gives a good estimate of the average degree. Feige's algorithm [Fei06] (that has access only to degree queries) counts some edges twice and gets an estimate of the average degree that is within a factor of 2 + ε. Goldreich and Ron [GR08] and Eden et al. [ERS17,ERS19] avoid the issue of double-counting by ranking vertices according to their degrees and estimating, within a factor of 1 + ε, the number of edges going from lower-ranked to higher-ranked vertices. These algorithms use degree queries and uniformly random neighbor queries. Having erasures in the adjacency lists is, in a rough sense, equivalent to not having access to some of the neighbor queries. This results in the additional 2α error term in the approximation guarantee. Consequently, when the fraction of erasures approaches 1/2, all the "relevant" entries in the adjacency lists of the input graph could be erased, and we enter the regime of having access only to degree queries.
In Section 3.2, we show that, for any fraction α ∈ (0, 1], estimating the average degree of an α-erased graph to within a factor of (1 + α) requires Ω(n) queries. In other words, the approximation ratio of our erasure-resilient algorithm for estimating the average degree cannot be improved significantly.
Theorem 1.8. Let α ∈ (0, 1] be rational. For all γ < α, at least Ω(n) queries are necessary for every algorithm that makes degree and neighbor queries to an α-erased graph with the average degree d and outputs, with probability at least 2/3, an estimate $\hat{d} \in [d, (1+\gamma)d]$.

Research Directions and Further Observations
There are numerous research questions that arise from our work. In Section 4, we discuss some of them and also give additional observations about variants of our model. We mention open questions about another (weaker) threshold in erasure-resilient testing of connectedness, about erasure-resilient testing of monotone graph properties, about the relationship between testing with erasures and testing with errors, and about the variant of our model that allows only symmetric erasures. We show that some of the questions we discuss are open in our model, but easy in the bounded-degree version of our model.
Sublinear-time algorithms for estimating various graph parameters have also received significant attention. There are sublinear-time algorithms for estimating the weight of a minimum weight spanning tree [CRT05], the number of connected components [CRT05,BKM14], the average degree [Fei06,GR08], the average pairwise distance [GR08], moments of the degree distribution [GRS11,ERS17], and subgraph counts [GRS11,ELRS17,ERS18,ER18,ABG+18,AKK19].

Erasure-Resilient Testing of Connectedness
In this section, we present our results on erasure-resilient testing of connectedness in graphs.

An Erasure-Resilient Connectedness Tester for α < ε/2
In this section, we present our connectedness tester for small α and prove Theorem 1.4. The tester looks for witnesses to disconnectedness in the form of connected components with no erasures. It repeatedly performs a breadth first search (BFS) from a random vertex until it finds a witness to disconnectedness or exceeds a specified query budget.
A simple counting argument shows that if a partially erased graph is far from connected then it has many small witnesses to disconnectedness. Moreover, the size of the average witness among them is at most some bound b (that we calculate later). Our tester uses BFS to detect a witness to disconnectedness of size at most b.
The best tester for connectedness to date, by Berman et al. [BRY14], uses a technique called the work investment strategy. Specifically, their algorithm repeatedly samples a uniformly random vertex v, guesses the size of the witness to disconnectedness $C^{(v)}$ containing v, and then performs a BFS from v for $|C^{(v)}|^2$ queries. Clearly, $|C^{(v)}|^2$ queries are enough to detect $C^{(v)}$. Using the fact that the expected size of a witness is b, they argue that their algorithm has complexity $O(b^2)$.
The new idea in our connectedness tester is to perform the BFS from a uniformly random vertex v for $|C^{(v)}| \cdot \deg(v)/2$ queries. The expected value of the latter quantity is bounded by $E^{(v)}$, where $E^{(v)}$ denotes the number of edges in the witness containing v, and the expectation is over the choice of a uniformly random vertex from $C^{(v)}$. That is, in expectation, the number of queries that we invest into the BFS from v is enough to detect $C^{(v)}$. We show that, overall, the expected complexity of this algorithm is $O(b \cdot d \cdot \log b)$.
Our erasure-resilient tester is Algorithm 1, with a small standard modification to ensure that the stated complexity bounds hold in the worst case (not just in expectation). It is obtained by running the algorithm of Berman et al. (generalized to handle erasures) when b < d and running the above algorithm otherwise.
Before stating the algorithm, we formalize the notion of the witness to disconnectedness and argue that partially erased graphs that are far from being connected have many witnesses to disconnectedness.
Definition 2.1 (Witness to disconnectedness). A set C of vertices is a witness to disconnectedness in a partially erased graph G if the adjacency lists of vertices in C have no erasures, and C forms a connected component in every completion of G.
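The way a budgeted BFS can certify a witness in the sense of Definition 2.1 can be sketched as follows. This is an illustration under stated assumptions, not Algorithm 1: the graph is a dictionary of adjacency lists with `None` for ⊥, the budget counts neighbor queries only, and the name `find_witness` is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[int, List[Optional[int]]]


def find_witness(adj: Graph, start: int, budget: int) -> Optional[Set[int]]:
    """Run a BFS from start that reads at most `budget` adjacency entries.

    Returns the explored vertex set if every entry of every explored vertex was
    read, none was erased, and the set is a proper subset of the vertices.
    Returns None if an erasure is met, the budget runs out, or all vertices
    are reached.
    """
    seen = {start}
    queue = deque([start])
    used = 0
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            used += 1
            if used > budget or u is None:
                return None
            if u not in seen:
                seen.add(u)
                queue.append(u)
    # All lists of `seen` are fully read, erasure-free, and point inside `seen`.
    return seen if len(seen) < len(adj) else None


# Two components: the edge {0, 1} and the path 2-3-4, where Adj(3) hides vertex 4.
example = {0: [1], 1: [0], 2: [3], 3: [2, None], 4: [3]}
witness = find_witness(example, 0, budget=10)
```

When the search terminates without meeting ⊥, the explored set has no erased entries and no entry leaving it, so no vertex outside can be adjacent to it in any completion; this is why the returned set is a witness.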
Next, in Claim 2.3, we argue that if the fraction of erasures is small, many of the connected components present in a completion $G'$ are also present as witnesses to disconnectedness in G.
Proof. By Observation 2.2, every completion $G'$ of G has at least εm + 1 connected components. The number of connected components in $G'$ with at least one erased entry in the union of their adjacency lists (with respect to G) is at most 2αm. Hence, the number of connected components in $G'$ that do not have any erased entry in the union of their adjacency lists (with respect to G) is at least $\varepsilon m + 1 - 2\alpha m > (\varepsilon - 2\alpha)m$.
By Claim 2.3, the size of the average witness to disconnectedness is at most b. Now we are ready to state Algorithm 1.
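The bound on the average witness size follows from the counting in the proof above by a short calculation. The explicit value of b is not visible in this part of the text, so the last equality below should be read as one choice of b consistent with the counting (it assumes α < ε/2 and uses d = 2m/n), not as a quoted definition.

```latex
\#\{\text{witnesses to disconnectedness}\}
  \;\ge\; \varepsilon m + 1 - 2\alpha m
  \;>\; (\varepsilon - 2\alpha)\, m ,
\qquad
\text{average witness size}
  \;\le\; \frac{n}{(\varepsilon - 2\alpha)\, m}
  \;=\; \frac{2}{(\varepsilon - 2\alpha)\, d} .
```

The middle step uses only that distinct witnesses are disjoint vertex sets, so their sizes sum to at most n.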
Clearly, Algorithm 1 accepts all connected partially erased graphs.
Proof. Let V be the vertex set of G. We start by defining the quality of a vertex v ∈ V. The definition is different for the two cases, corresponding to the two stopping conditions Algorithm 1 uses for BFS. First, we consider the case when $b \le d \cdot \log b$, that is, when Algorithm 1 runs the version of BFS specified in Step 6.
Definition 2.5 (Quality of a vertex when $b \le d \cdot \log b$). The quality of a vertex v, denoted q(v), is defined as follows. If v belongs to a witness to disconnectedness in G then $q(v) = 1/|C^{(v)}|$, where $C^{(v)}$ denotes the witness to disconnectedness that v belongs to. Otherwise, q(v) = 0.
The important feature of q(v) is that, for a witness C to disconnectedness, $\sum_{v \in C} q(v) = 1$.
Next, we define the quality of a vertex for the case when b > d • log b, that is, when Algorithm 1 runs the version of BFS specified in Step 8.
and let $E^{(v)}$ denote the number of edges in $C^{(v)}$. The quality of a vertex v, denoted q(v), is defined as
As was the case for q(v) from Definition 2.5, for a witness C to disconnectedness, $\sum_{v \in C} q(v) = 1$. The rest of the proof of Lemma 2.4 is the same for both cases. We analyze the expected quality of a uniformly random vertex v ∈ V. Using the fact that $\sum_{v \in C} q(v) = 1$ and Claim 2.3,
Finally, we apply the following work investment strategy lemma due to [BRY14, Lemma 2.5].

Lemma 2.7 ([BRY14]). Let X be a random variable that takes values in
We apply Lemma 2.7 with X equal to q(v) for a uniformly random v ∈ V. Set β = 1/b and t = ⌈log(4/β)⌉. For i ∈ [t], set $p_i$ to be the probability that a vertex v sampled uniformly at random belongs to a witness to disconnectedness of G that has at most (i)
Then the probability that Step 9 of the tester does not reject is $\prod_{i=1}^{t} (1 - p_i)^{k_i}$. By Lemma 2.7, this step rejects with probability at least 5/6.
Proof of Theorem 1.4. We start by analyzing the query and time complexity of Algorithm 1.
Case 1: When $b \le d \cdot \log b$, the query and time complexity of Algorithm 1 is
Case 2: When $b > d \cdot \log b$, the expected query and time complexity of Algorithm 1 is
Substituting the value of b, we get:
The final tester is obtained by running Algorithm 1 and then aborting and accepting if the number of queries exceeds six times its expectation. The final tester then has the query complexity and the running time stated in Theorem 1.4.
The final tester never rejects a connected partially erased graph. However, a partially erased graph that is ε-far from connected can get accepted incorrectly if Algorithm 1 accepts it or if the final algorithm aborts. The probability of the former event is at most 1/6, by Lemma 2.4. The probability of aborting is also at most 1/6, by Markov's inequality. By a union bound, the final algorithm accepts incorrectly with probability at most 1/3, completing the proof of the theorem for the case when d is given to the algorithm.
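The shape of a work investment schedule, and why the per-level search cost drives the total complexity, can be seen in a minimal sketch. The number of levels t = ⌈log(4/β)⌉ is taken from the text (base 2 is assumed here). The per-level sample counts $k_i$ are not visible in this part of the text, so the choice $k_i = \lceil c/(2^i \beta) \rceil$ with a constant c is purely an assumption of this sketch, as are the function names.

```python
import math
from typing import Callable, List, Tuple


def investment_schedule(beta: float, c: float = 4.0) -> List[Tuple[int, int]]:
    """Return pairs (i, k_i): at level i, sample k_i vertices.

    Assumed form: k_i = ceil(c / (2^i * beta)), for i = 1..t with t = ceil(log2(4 / beta)).
    """
    t = math.ceil(math.log2(4.0 / beta))
    return [(i, math.ceil(c / (2 ** i * beta))) for i in range(1, t + 1)]


def total_work(beta: float, cost: Callable[[int], int], c: float = 4.0) -> int:
    """Total queries if each of the k_i searches at level i costs cost(i) queries."""
    return sum(k * cost(i) for i, k in investment_schedule(beta, c))


beta = 1 / 8  # plays the role of 1/b
schedule = investment_schedule(beta)
linear_cost = total_work(beta, lambda i: 2 ** i)     # search cost proportional to 2^i
quadratic_cost = total_work(beta, lambda i: 4 ** i)  # search cost proportional to (2^i)^2
```

With a per-search cost proportional to $2^i$, every level contributes about the same amount, so the total is on the order of $(1/\beta) \log(1/\beta)$; with a quadratic per-search cost, the last level dominates and the total is on the order of $1/\beta^2$. Under the stated assumptions this mirrors the gap between the two stopping rules discussed above.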
We can adjust the algorithm to work without access to the average degree at a small cost in query and time complexity, using the technique explained in Appendix A.1.

Our Erasure-Resilient Connectedness Tester for α < ε
In this section, we prove Theorem 1.5. We describe and analyze a 1-sided error α-erasure-resilient ε-tester for connectedness that can work with more erasures in the input graph than Algorithm 1 can handle. Specifically, the tester works for all α < ε. However, it has better performance than Algorithm 1 only for α ∈ [ε/2, ε).
When α > ε/2, an α-erased graph that is ε-far from being connected may not contain any witnesses to disconnectedness as defined in Section 2.1. Specifically, every set C of nodes that gets completed to a connected component could have an erasure in the union of the adjacency lists of the nodes in C. To get around this issue, our tester looks for a generalized witness to disconnectedness, which is, intuitively, a connected component with at most one erasure. Observe that a component with two erasures could have a unique completion but be impossible to certify as a separate connected component from the local view from any of its vertices. Figure 1 shows an example of a small component, where a BFS from any vertex will be unable to certify that the graph is disconnected.
Figure 1: An example of a component with two erasures, where a BFS from any vertex fails to detect that this component is disconnected from the rest of the graph.
Our tester repeatedly performs a BFS from a random vertex until it detects a generalized witness to disconnectedness, or exceeds a specified query budget. We show, by a counting argument, that every partially erased graph that is far from connected has several small generalized witnesses to disconnectedness. The correctness of the tester is ensured by the observation that each such witness C contains at least one vertex from which all the other vertices in C are reachable. (It is possible to have exactly one vertex in C from which all the other vertices are reachable. Figure 2 shows an example of a connected component, where a BFS can detect the generalized witness to disconnectedness only if started at vertex $v_1$, but will fail to do so from all other vertices.) Before we state our tester, we formalize the notion of generalized witnesses.
Definition 2.8 (Generalized witness to disconnectedness). Given a partially erased graph G over a vertex set V, a set C ⊂ V is a generalized witness to disconnectedness of G if
1. there is at most one erased entry (⊥) in $\bigcup_{v \in C} \mathrm{Adj}(v)$,
2. every nonerased entry in $\bigcup_{v \in C} \mathrm{Adj}(v)$ is a vertex from C,
Definition 2.8 implies that the only erasure, if any, in the union of the adjacency lists of the nodes in C is part of a half-erased edge within C, and that C forms a connected component in every completion of G.
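The first two conditions of Definition 2.8 are easy to check once the adjacency lists of C have been read. The sketch below checks only those two conditions; the third condition is not visible in this part of the text, so the check is necessary but not sufficient, as the example shows. The dictionary representation with `None` for ⊥ and the function name are assumptions of the sketch.

```python
from typing import Dict, Iterable, List, Optional

Graph = Dict[int, List[Optional[int]]]


def satisfies_conditions_1_and_2(adj: Graph, candidate: Iterable[int]) -> bool:
    """Check that the lists of `candidate` contain at most one erased entry
    (condition 1) and that every nonerased entry points inside `candidate`
    (condition 2)."""
    members = set(candidate)
    entries = [u for v in members for u in adj[v]]
    erasures = sum(1 for u in entries if u is None)
    closed = all(u in members for u in entries if u is not None)
    return erasures <= 1 and closed


# Path 0-1-2 in which Adj(1) hides its entry for vertex 2.
path = {0: [1], 1: [0, None], 2: [1]}
whole_path_ok = satisfies_conditions_1_and_2(path, {0, 1, 2})
prefix_ok = satisfies_conditions_1_and_2(path, {0, 1})
```

Here {0, 1} passes both checks even though vertex 2 lists vertex 1, so {0, 1} is not a connected component in any completion. This illustrates why the definition needs a further condition tying the single erasure to a half-erased edge inside C.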
Let $b = 4/((\varepsilon - \alpha)d)$. Our tester is presented in Algorithm 2. In the rest of the section, we analyze the correctness and complexity of the tester.
Definition 2.9 (Small and big sets). Let G be a partially erased graph and let $\varepsilon^\star \in (0, 2/d)$ be a parameter. The representation length of a set C of nodes is the sum of lengths of the adjacency lists of nodes in C. The set C is $\varepsilon^\star$-small if either
Algorithm 2 (recoverable steps):
4 Run a BFS starting from s using at most $\min\{b^2, b \cdot d\}$ neighbor queries.
5 if Step 4 detected a generalized witness to disconnectedness then
6 Reject.
Claim 2.10 shows that a partially erased graph that is far from connected has sufficiently many small generalized witnesses to disconnectedness.
Proof. We first argue that there are many small connected components in every completion $G'$ of G and then prove that many of these are generalized witnesses in G.
Consider a completion $G'$ of G. The number of (ε − α)-big connected components in $G'$ is at most $(\varepsilon - \alpha)m/2$, since the representation length of the vertex set V of G is 2m. By Observation 2.2, the total number of connected components in $G'$ is at least εm + 1. Hence, the number of (ε − α)-small connected components in $G'$ is at least (ε + α)m/2. Let C ⊂ V denote the set of vertices corresponding to an (ε − α)-small connected component in $G'$. If $\bigcup_{v \in C} \mathrm{Adj}(v)$ has no erasures, then C is a generalized witness to disconnectedness of G. Next, assume that $\bigcup_{v \in C} \mathrm{Adj}(v)$ has exactly one erasure. We show that the set C is a generalized witness to disconnectedness of G. Condition 1 is satisfied by definition. Condition 2 is true since C forms a connected component in $G'$. To see that Condition 3 holds, let u ∈ C be the vertex with ⊥ ∈ Adj(u). Since C is a connected component in $G'$, this erased entry was completed with the label of another vertex v ∈ C. Moreover, every vertex in C is reachable by a BFS from v, since C forms a connected component in $G'$, and the erased entry is not needed for these searches because it would lead back to v. Therefore, C is a generalized witness to disconnectedness of G if $\bigcup_{v \in C} \mathrm{Adj}(v)$ has exactly one erasure.
Among the (ε − α)-small connected components in $G'$, at most αm have at least 2 erased entries in the union of their adjacency lists. Hence, the number of (ε − α)-small generalized witnesses to disconnectedness of G is at least $(\varepsilon + \alpha)m/2 - \alpha m = (\varepsilon - \alpha)m/2$.
Proof. Consider an α-erased graph G over a vertex set V. Assume that G is connected, that is, there exists a connected completion $G'$ of G. Consider an arbitrary C ⊂ V. There exist vertices
Figure 3: The partially erased graphs $G^+$ and $G^-$ described in the proof of Theorem 1.6. The dotted lines represent erased entries in the adjacency lists of the corresponding vertices. In $G^+$, the directed edges from $v^\star$ point to the vertices in its adjacency list. The circles represent cycles.
Hence, C is not a generalized witness to disconnectedness of G. Therefore, the tester accepts G.
Next, assume that G is ε-far from connected. Let W denote the family of all (ε − α)-small generalized witnesses to disconnectedness of G.
Step 4 of Algorithm 2 makes at most $\min\{b^2, bd\}$ queries. Thus, the query complexity of Algorithm 2 is $O(b \cdot \min\{b^2, bd\})$, which simplifies to the claimed expression. Checking (in Step 5) whether a set C is a generalized witness to disconnectedness can be done with a constant number of passes over the adjacency lists of vertices in C. Since the algorithm queried all entries in them, its running time is asymptotically equal to its query complexity.

A Lower Bound for Erasure-Resilient Connectedness Testing
In this section, we prove Theorem 1.6. We note that hard graphs in our construction have constant average degree. That is, for those graphs, our lower bound is Ω(n) = Ω(m).
Proof of Theorem 1.6. We apply Yao's minimax principle, as stated in [RS06]. Specifically, we construct distributions $D^+$ and $D^-$, the former over connected graphs and the latter over graphs that are ε-far from connected, such that every deterministic ε-erasure-resilient ε-tester for connectedness makes Ω(m) queries to distinguish the two distributions.
Without loss of generality, assume that t = (1 − ε)/(2ε) is an integer. Observe that t ≥ 3 as ε ≤ 1/7. Let k be an even number and n = kt + 1. We first construct two partially erased n-node graphs $G^+$ and $G^-$, depicted in Figure 3. The vertices of $G^+$ are partitioned into k + 1 sets. Each of the first k sets induces a t-node cycle. Exactly one node in each cycle has degree 3 and has an erasure in its adjacency list, in addition to its two neighbors on the cycle. The last set contains a single node $v^\star$ of degree k. Its adjacency list contains the labels of the degree-3 vertices in the cycles. The graph $G^-$ is the same as $G^+$, except that in $G^-$, we have that $\mathrm{Adj}(v^\star)$ is empty, that is, $v^\star$ is isolated.
We can obtain a connected completion of $G^+$ by connecting the vertex $v^\star$ to all the degree-3 vertices. In contrast, at least k/2 edges need to be added to every completion of $G^-$ to make it connected. Hence, the distance from $G^-$ to connectedness is (k/2)/(kt + k/2) = 1/(2t + 1) = ε.
The distributions $D^+$ and $D^-$ are uniform over the sets of all partially erased graphs isomorphic to $G^+$ and $G^-$, respectively. Each partially erased graph sampled from $D^+$ is connected. Each partially erased graph sampled from $D^-$ is ε-far from connected.
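The construction of $G^+$ and $G^-$ and the arithmetic behind the distance claim can be checked on a small instance. The vertex numbering (cycle c occupies labels ct, ..., ct + t − 1, and $v^\star$ gets label kt), the use of `None` for ⊥, and the helper names are assumptions of this sketch; the distributions in the proof additionally randomize over isomorphic copies, which the sketch does not do.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[Optional[int]]]


def build_hard_instances(k: int, t: int) -> Tuple[Graph, Graph]:
    """Build G+ and G-: k cycles of t nodes each, one degree-3 node per cycle
    carrying an erased entry, plus a hub that lists those nodes (G+) or is isolated (G-)."""
    hub = k * t
    g_plus: Graph = {}
    g_minus: Graph = {}
    for c in range(k):
        base = c * t
        for j in range(t):
            entries: List[Optional[int]] = [base + (j - 1) % t, base + (j + 1) % t]
            if j == 0:
                entries.append(None)  # the erased third entry of the degree-3 node
            g_plus[base + j] = list(entries)
            g_minus[base + j] = list(entries)
    g_plus[hub] = [c * t for c in range(k)]
    g_minus[hub] = []
    return g_plus, g_minus


def num_edges(adj: Graph) -> int:
    """m = half the total length of the adjacency lists."""
    return sum(len(entries) for entries in adj.values()) // 2


def erased_fraction(adj: Graph) -> Fraction:
    total = sum(len(entries) for entries in adj.values())
    erased = sum(1 for entries in adj.values() for u in entries if u is None)
    return Fraction(erased, total)


# eps = 1/7 gives t = (1 - eps) / (2 * eps) = 3; take k = 4 cycles.
k, t = 4, 3
g_plus, g_minus = build_hard_instances(k, t)
distance_lower_bound = Fraction(k // 2, num_edges(g_minus))
```

For this instance both the fraction of erased entries in $G^-$ and the ratio (k/2)/m equal 1/(2t + 1) = 1/7, which is the regime α = ε of Theorem 1.6.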
Claim 2.12. Every deterministic algorithm A has to make Ω(n) queries to distinguish $D^+$ and $D^-$ with probability at least 2/3.
Proof. Let q denote the number of queries made by A and assume q ≤ n/6. In this proof, we use $v^\star$ as a shorthand for the vertex from the singleton set in the construction of $D^+$ and $D^-$, as opposed to the label of that vertex. Since $D^+$ and $D^-$ differ only on $v^\star$, it is important to understand when A gets any information about $v^\star$.
Definition 2.13 (Node status). Given a sequence of queries made by A and answers it has received so far, a node v is known if it has been queried (via a degree or neighbor query) or received as an answer to a (neighbor) query; otherwise, it is unknown.
The node v ⋆ is unknown before A makes its first query.Since v ⋆ cannot be received as an answer to a query for the graphs in the support of D + and D − , it can become known only if A queries an unknown node that happens to be v ⋆ .At most two new nodes become known per query.So, the probability (over the distribution D + or D − ) that a specific unknown node queried by A turns out to be v ⋆ is at most 1/(n − 2q).Let p denote the probability that v ⋆ becomes known by the end of an execution of A. By a union bound over all queries made by A, If v ⋆ is unknown by the end of a particular execution then the view of the partially erased graph obtained by A in that execution arises with the same probability under D + and under D − .Such an execution of A can distinguish D + and D − with probability at most 1/2.Therefore, the probability that In our construction, m = Θ(n).Thus, every ε-erasure-resilient ε-tester for connectedness that uses only degree and neighbor queries must make Ω(m) queries in the worst case over the input graph, completing the proof of Theorem 1.6.

An Algorithm for Estimating the Average Degree
In this section, we describe and analyze an algorithm for estimating the average degree of (or, equivalently, the number of edges in) a partially erased graph and prove Theorem 1.7. Our algorithm is a generalization of the algorithm for counting the number of edges in graphs by Eden et al. [ERS17, ERS19] to the case of partially erased graphs. We first give an algorithm (Algorithm 3) that takes a crude estimate of the average degree as input and outputs a more accurate estimate. Our final algorithm (Algorithm 4) uses Algorithm 3 as a subroutine to gradually refine its estimate of the average degree.

Algorithm 3, like the algorithm of Eden et al. [ERS17, ERS19], works by empirically estimating a random variable whose expectation is close to the number of edges in the graph. We first rank vertices according to their degrees, breaking ties arbitrarily. Then we orient the nonerased edges of the graph from lower-ranked to higher-ranked endpoints. This orientation allows us to attribute each nonerased edge to its lower-ranked endpoint in order to avoid double-counting the edge. Since the number of edges between high-degree vertices is small, we ignore such edges. Algorithm 3 samples low-degree vertices uniformly at random and estimates, via sampling, the number of edges "credited" to them.

The crucial difference in the behavior of the algorithm in the case of partially erased graphs is the following. When we sample an erased entry from the adjacency list of a low-degree vertex $u$, we assume that it gets completed to a vertex ranked higher than $u$ and, therefore, attribute the corresponding edge to $u$. Consequently, some erased edges get counted twice. This results in the additional term depending on the fraction of erasures in the approximation guarantee.
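The sampling procedure described above can be sketched in Python as follows. This is a hedged illustration rather than the paper's pseudocode: the function names, the dictionary representation of adjacency lists, and `None` as the erasure symbol are hypothetical, and the sample size and the low-degree threshold $4\sqrt{n\bar d/\varepsilon}$ are assumptions read off from the analysis in this section. Ties in the degree ranking are broken by label.

```python
import math
import random

ERASED = None  # assumed stand-in for an erased adjacency entry


def precedes(adj, u, v):
    """Ranking u before v: smaller degree first, ties broken by label."""
    return (len(adj[u]), u) < (len(adj[v]), v)


def refine_estimate(adj, crude, eps, delta, rng=random):
    """Refine a crude average-degree estimate by sampling low-degree nodes."""
    nodes = list(adj)
    n = len(nodes)
    # Assumed sample size and low-degree threshold (see the lead-in above).
    s = math.ceil(660 * math.log(2 / delta) * math.sqrt(n / (eps**5 * crude)))
    threshold = 4 * math.sqrt(n * crude / eps)
    total = 0
    for _ in range(s):
        u = rng.choice(nodes)
        deg_u = len(adj[u])          # degree query
        if deg_u == 0 or deg_u > threshold:
            continue                 # isolated or high-degree node: credit 0
        v = rng.choice(adj[u])       # neighbor query on a uniformly random entry
        # An erased entry is treated as if it led to a higher-ranked node.
        if v is ERASED or precedes(adj, u, v):
            total += deg_u
    return 2 * total / s             # twice the empirical mean of the credits
```

On a graph with no erasures, every edge is credited to exactly one endpoint, so the output concentrates around the average degree. When both entries of an edge are erased, both endpoints receive credit, which is the source of the extra erasure-dependent term in the guarantee.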
The ranking, that is, the total ordering on the vertices of a graph, is defined in Definition 3.1.
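Lemma 3.2. Let $G$ be an $\alpha$-erased $n$-node graph with average degree $d \ge 1$. Let $\bar d$ be a crude estimate of the average degree, given as an input to Algorithm 3. Then the output $\tilde d$ of Algorithm 3 satisfies the following:
1. If $\bar d \ge d/8$ then, with probability at least 3/4, we have $\tilde d \le 8d$.
2. Furthermore, if $d/8 \le \bar d \le 8d$ then, with probability at least $1-\delta$, we have $(1-\varepsilon)d < \tilde d < (1 + \varepsilon + 2\min(\alpha, \tfrac{1}{2}))d$.
The query complexity of the algorithm is $O\big(\sqrt{n/\bar d}\cdot\log(1/\delta)/\varepsilon^{2.5}\big)$.

Proof. The algorithm makes at most two degree queries and one neighbor query in each iteration, and it runs for $s = \Theta\big(\sqrt{n/\bar d}\cdot\log(1/\delta)/\varepsilon^{2.5}\big)$ iterations. Hence, the bound on its query complexity is as claimed in the lemma.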
To prove the guarantees on the output estimate $\tilde d$, we first show that for all $i \in [s]$, the expected value of $\chi_i$ is a good estimate of the average degree of the partially erased graph, where $s$ is the number of samples taken by Algorithm 3. We then apply Markov's inequality and the Chernoff bound to prove parts 1 and 2 of the lemma, respectively. For all $i \in [s]$, the random variables $\chi_i$ set by the algorithm are mutually independent and identically distributed. Hence, it suffices to bound $\mathbb{E}[\chi_1]$, which is done in Claim 3.3.

Proof of Claim 3.3. Let $m = nd/2$ denote the total number of edges in the graph, and let $H = \{u \in V : \deg(u) > 4\sqrt{n\bar d/\varepsilon}\}$ denote the set of high-degree vertices. Let $\bar m = n\bar d/2$ be the number of edges in the graph estimated from the input parameter $\bar d$. Since $\bar d \ge d/8$, we have $\bar m \ge m/8$. Hence,
$$|H| \le \frac{2m}{4\sqrt{2\bar m/\varepsilon}} \le \sqrt{\varepsilon m}, \qquad (1)$$
where the first inequality holds because the sum of degrees of high-degree vertices is at most $2m$, and the second inequality follows from $\bar m \ge m/8$.

The following quantity, $d^+(u)$, was defined in [ERS19] for (standard) graphs. We extend their definition to partially erased graphs.

Definition 3.4. For a vertex $u$ in a partially erased graph $G$, let $N(u)$ denote the set of (nonerased) neighbors present in $\mathrm{Adj}(u)$. Let $d^+(u) = |\{v \in N(u) \mid u \prec v\}|$ denote the number of nonerased neighbors of $u$ that are higher than $u$ with respect to the ordering on vertices (as in Definition 3.1).
Roughly, $d^+(u)$ denotes the number of nonerased neighbors of $u$ with degree higher than that of $u$. The following fact is based on an observation by [ERS19].

Fact 3.5. For a partially erased graph $G$ over a vertex set $V$, the sum $\sum_{u \in V} d^+(u) \le m$. The inequality can be replaced with equality when $G$ has no erasures.

The fact holds because each nonerased and half-erased edge in $G$ is counted exactly once and at most once, respectively, in the sum $\sum_{u \in V} d^+(u)$.
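As a small sanity check of Definition 3.4 and Fact 3.5, the following Python sketch computes $d^+(u)$ for every vertex of a toy graph. The function name, the dictionary representation, and `None` as the erasure symbol are hypothetical choices made for this illustration; ties in the ordering are broken by label.

```python
ERASED = None  # assumed stand-in for an erased adjacency entry


def higher_neighbor_counts(adj):
    """Return d+(u) for every u: nonerased neighbors ranked above u."""
    rank = {u: (len(adj[u]), u) for u in adj}  # degree first, then label
    return {
        u: sum(1 for v in adj[u] if v is not ERASED and rank[u] < rank[v])
        for u in adj
    }


# A path 0 - 1 - 2 with no erasures: the sum of d+ equals m = 2.
path = {0: [1], 1: [0, 2], 2: [1]}
# The same path with the entry of node 0 erased (a half-erased edge).
half_erased = {0: [ERASED], 1: [0, 2], 2: [1]}
```

On the intact path the counts sum to the number of edges, while on the half-erased version the half-erased edge is not counted at all, so the sum drops below $m$.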
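Let $u_1, u_2, \dots, u_{|H|}$ be a labeling of the high-degree vertices such that $u_1 \prec u_2 \prec \dots \prec u_{|H|}$. For each $j \in [|H|]$, observe that $d^+(u_j) \le |H| - j$, as $d^+(u_j)$ is at most the number of vertices that are higher than $u_j$ in the ordering. Hence,
$$\sum_{u \in H} d^+(u) \le \sum_{j=1}^{|H|} (|H| - j) \le \frac{|H|^2}{2} \le \frac{\varepsilon m}{2}, \qquad (2)$$
where the last inequality follows from (1). Let $d^\perp(u)$ denote the number of erased entries in $\mathrm{Adj}(u)$. The expectation
$$\mathbb{E}[\chi_1] = \frac{1}{n} \sum_{u \in V \setminus H} \big(d^+(u) + d^\perp(u)\big), \qquad (3)$$
since the degree of the sampled vertex $u$ is assigned to $\chi_1$ if and only if
1. $\deg(u) \le 4\sqrt{n\bar d/\varepsilon}$, i.e., $u \in V \setminus H$; and
2. the queried entry from $\mathrm{Adj}(u)$ is either a vertex $v \succ u$ or $\perp$.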
We now bound the quantity on the right-hand side of (3) from below and above. Let $G'$ be an arbitrary completion of $G$, and let $d^+_{G'}(\cdot)$ denote the quantity defined in Definition 3.4 with respect to $G'$ (instead of $G$). For each $u \in V$, observe that $d^+(u) + d^\perp(u) \ge d^+_{G'}(u)$. Also note that the upper bound in (2) still holds if we replace $d^+(\cdot)$ with $d^+_{G'}(\cdot)$. Hence, from (3),
$$\mathbb{E}[\chi_1] \ge \frac{1}{n} \sum_{u \in V \setminus H} d^+_{G'}(u) = \frac{1}{n}\Big(m - \sum_{u \in H} d^+_{G'}(u)\Big) \ge \Big(1 - \frac{\varepsilon}{2}\Big)\frac{m}{n}. \qquad (4)$$
On the other hand, from (3), […] where the last inequality uses Fact 3.5 and […]. This completes the proof of Claim 3.3 because, using (4), (5) and (6), we get […].

Let the random variable $\chi = \frac{1}{s}\sum_{i=1}^{s} \chi_i$ denote the mean of the $\chi_i$'s calculated in Step 9 of Algorithm 3. Since all $\chi_i$'s are independent and identically distributed, […] where we used $\varepsilon < 1/2$ and $\bar d \le 8d$ in the simplification. Hence, with probability at least $1-\delta$, […]. Since $\tilde d = 2\chi$, by Claim 3.3, we get that with probability at least $1-\delta$,
$$(1-\varepsilon)d < \tilde d < \big(1 + \varepsilon + 2\min(\alpha, \tfrac{1}{2})\big)d,$$
proving part 2 of Lemma 3.2.
Proof of Theorem 1.7. Our algorithm (Algorithm 4) uses Algorithm 3 as a subroutine. It runs with values of initial estimates $\bar d$ set in powers of 2, stopping and returning the current estimate once it exceeds the initial estimate for this iteration. Let $\ell \in \{0, 1, \dots, \lceil \log n \rceil\}$ be the iteration in which the algorithm returns the estimate in Step 7. If the algorithm returns the estimate in Step 8 then we let $\ell$ be $\lceil \log n \rceil + 1$. Consider an iteration $i \in \{0, 1, \dots, \ell\}$ of the algorithm. Call iteration $i$ good if $\tilde d_i$ satisfies the guarantees of Lemma 3.2 and bad otherwise. The probability that iteration $i$ is bad is at most the probability that at least $t/2$ runs of Step 4 fail to satisfy the guarantees of Lemma 3.2. By the Chernoff bound, this probability is at most $1/(4\log n)$. Hence, by the union bound, the probability that there exists a bad iteration in the execution of the algorithm is at most $\frac{\ell+1}{4\log n} \le \frac{\lceil \log n \rceil + 1}{4\log n}$, which is at most 1/3 whenever $n \ge 39$. In the rest of the proof, we condition on the event that all iterations are good.

Furthermore, when all iterations are good, we have $n/2^{\ell} \ge d/8$, which implies that $\ell \le \log(8n/d)$. Summing the $O(2^{i/2}/\varepsilon^{2.5})$ running time of each of the $t = O(\log\log n)$ runs of Algorithm 3 over the iterations $i \le \ell$, the running time of the algorithm is $O\big(\sqrt{n/d}\cdot\frac{\log\log n}{\varepsilon^{2.5}}\big)$ when Algorithm 4 outputs the correct estimate. When it fails to output the correct estimate, the worst-case query complexity is $O\big(\sqrt{n}\cdot\frac{\log\log n}{\varepsilon^{2.5}}\big)$.

A Lower Bound for Estimating the Average Degree
In this section, we prove Theorem 1.8.
Proof of Theorem 1.8. Fix $\lambda = \frac{2\alpha}{1+\alpha}$. Note that $\lambda \in (0, 1]$ since $\alpha \in (0, 1]$. Consider any integer $n$ such that $\lambda(n-1)$ is an even integer. Since $\alpha$ is rational, there are infinitely many such $n$. We define two $n$-node graphs, $G_1$ and $G_2$ (see Figure 4). Both graphs contain a cycle consisting of $(1-\lambda)(n-1)$ vertices. Of the remaining $\lambda(n-1)+1$ vertices, both graphs have $\lambda(n-1)$ vertices of degree 1, with the only entry in the adjacency list of each such vertex erased. The last vertex, called $v^\star$, is where $G_1$ and $G_2$ differ. In $G_1$, the list $\mathrm{Adj}(v^\star)$ consists of the labels of the $\lambda(n-1)$ degree-1 vertices. In contrast, in $G_2$, the vertex $v^\star$ is isolated.

The graph $G_1$ can only be completed to a graph consisting of two components: a cycle of length $(1-\lambda)(n-1)$ and a star consisting of $\lambda(n-1)$ edges. The graph $G_2$ can only be completed to a graph consisting of a cycle of length $(1-\lambda)(n-1)$, one isolated vertex, and a matching of size $\lambda(n-1)/2$. Hence, the total lengths of the adjacency lists of $G_1$ and $G_2$ are $2(n-1)$ and $(2-\lambda)(n-1)$, respectively. The number of entries erased in both graphs is $\lambda(n-1)$. So, the fractions of erased entries in the adjacency lists of $G_1$ and $G_2$ are $\frac{\lambda}{2}$ and $\frac{\lambda}{2-\lambda}$, respectively. Hence, both $G_1$ and $G_2$ are $\alpha$-erased, as $\frac{\lambda}{2-\lambda} = \alpha$. The average degrees of $G_1$ and $G_2$ are $\frac{2(n-1)}{n}$ and $\frac{(2-\lambda)(n-1)}{n}$, respectively. The ratio of the average degrees is $\frac{2}{2-\lambda} = 1 + \alpha$.

The rest of the proof is similar to that of Theorem 1.6. We define two distributions $D_1$ and $D_2$ as the uniform distributions over the sets of all graphs isomorphic to $G_1$ and $G_2$, respectively. To differentiate between the two distributions, any tester must necessarily query $v^\star$, which requires $\Omega(n)$ queries. The ratio of the average degrees of the two distributions is $1 + \alpha$. Hence, to approximate the average degree within a factor of $(1+\gamma)$, where $\gamma < \alpha$, any tester must query $\Omega(n)$ vertices.
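The arithmetic in this construction can be checked mechanically. The Python sketch below is an illustration only; the function name and the returned dictionary keys are hypothetical. It recomputes $\lambda$, the fractions of erased entries, and the ratio of average degrees with exact rationals, using the list lengths $2(n-1)$ and $(2-\lambda)(n-1)$ stated above.

```python
from fractions import Fraction


def lower_bound_parameters(alpha: Fraction, n: int):
    """Recompute the quantities of the G_1 / G_2 construction exactly."""
    lam = 2 * alpha / (1 + alpha)
    leaves = lam * (n - 1)  # number of degree-1 vertices, all entries erased
    if leaves.denominator != 1 or leaves % 2 != 0:
        raise ValueError("lambda * (n - 1) must be an even integer")
    len_g1 = 2 * (n - 1)            # total adjacency-list length of G_1
    len_g2 = (2 - lam) * (n - 1)    # total adjacency-list length of G_2
    return {
        "lambda": lam,
        "erased_fraction_g1": leaves / len_g1,
        "erased_fraction_g2": leaves / len_g2,
        "degree_ratio": Fraction(len_g1) / len_g2,
    }
```

For instance, with $\alpha = 1/2$ and $n = 13$ we get $\lambda = 2/3$, erased fractions $1/3$ and $1/2$, and a ratio of average degrees equal to $3/2 = 1 + \alpha$.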

Conclusion and Open Questions
In this work, we initiate the study of sublinear-time algorithms for problems on partially erased graphs. Our investigation opens up a plethora of research directions and possibilities for future work. In what follows, we discuss several specific open questions arising from our work.

Phase Transitions in the Complexity of Erasure-Resilient Connectedness Testing. As shown in Section 2, there is a phase transition in the complexity of connectedness testing at $\alpha = \varepsilon$ from time independent of the size of the graph to $\Omega(n)$. Our upper bound on the complexity of this problem exhibits another, less drastic phase transition at $\alpha = \varepsilon/2$, when the asymptotic dependence of the running time on $\varepsilon$ and $\alpha$ changes. We conjecture that this second phase transition is inherent (and not an artifact of our techniques). It would be interesting to investigate whether connectedness testing when $\alpha \in [\varepsilon/2, \varepsilon)$ is fundamentally different from the same problem when $\alpha \in [0, \varepsilon/2)$.

Erasure-Resilient Testing of Monotone Properties in the Bounded-Degree Model. A property of a graph is monotone if it is preserved under deletion of edges and vertices. That is, if $G$ satisfies a monotone property then so does every subgraph of $G$. Many important graph properties, including bipartiteness, 3-colorability, and triangle-freeness, are monotone.
In the bounded-degree property testing model [GR02], an $n$-node graph $G$ with the degree bound $D$ is represented as a concatenation of $n$ adjacency lists, each of length $D$. For a vertex $v \in G$ and an index $i \in [D]$, a neighbor query $(v, i)$ returns a valid vertex in the graph if $i \le \deg(v)$ and a special symbol if $i > \deg(v)$. The graph $G$ is $\varepsilon$-far from satisfying a property $P$ if at least $\varepsilon n D$ entries in the adjacency lists of $G$ need to be modified to make it satisfy $P$.
Bounded-degree property testing can be generalized in a natural way to account for erased entries in adjacency lists. A bounded-degree graph is $\alpha$-erased if at most $\alpha n D$ entries of its adjacency lists are erased. We observe that a tester for a monotone property of bounded-degree graphs can be made erasure-resilient via a simple transformation.

Observation 4.1. Let $P$ be a monotone property of graphs. Suppose there exists an $\varepsilon$-tester for $P$ in the bounded-degree model that makes $q(\varepsilon, n, D)$ queries. Then there exists an $\alpha$-erasure-resilient $\varepsilon$-tester for $P$ in the bounded-degree model that makes at most $D^2 \cdot q(\varepsilon - 2\alpha, n, D)$ queries and works for all $\alpha \in (0, \varepsilon/2)$.
Proof. Fix an $\alpha$-erased bounded-degree graph $G$ on the vertex set $V$. Let $G^\star = (V, E^\star)$ be the graph consisting of only the nonerased edges of $G$ (see Definition 1.2). We construct an oracle $O$ that simulates access to $G^\star$ by querying $G$. Let $\mathrm{Adj}(\cdot)$ and $\mathrm{Adj}^\star(\cdot)$ denote the adjacency lists of $G$ and $G^\star$, respectively. On a degree or a neighbor query for a vertex $v \in V$, the oracle $O$ internally constructs $\mathrm{Adj}^\star(v)$ from $\mathrm{Adj}(v)$ as follows:
1. Initialize $\mathrm{Adj}^\star(v)$ to an empty list.
2. For each nonerased entry $u \in \mathrm{Adj}(v)$, append $u$ to $\mathrm{Adj}^\star(v)$ if $v \in \mathrm{Adj}(u)$.
3. Pad $\mathrm{Adj}^\star(v)$ with special characters so that its length is $D$.
The oracle $O$ then answers the query with respect to the nonerased adjacency list $\mathrm{Adj}^\star(v)$. As $\mathrm{Adj}(v)$ has length at most $D$, and checking if $v \in \mathrm{Adj}(u)$ for each $u \in \mathrm{Adj}(v)$ takes at most $D$ queries, the oracle makes at most $D^2$ queries to $G$ to answer each query about $G^\star$.
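The oracle just described can be sketched in Python. This is a minimal illustration under assumptions: the class name, the padding symbol `"#"`, the use of `None` for erasures, and the query counter are all hypothetical. The sketch also counts the $D$ queries needed to read $\mathrm{Adj}(v)$ itself, so its per-call cost is at most $D + D^2$, which is still $O(D^2)$.

```python
ERASED = None  # assumed stand-in for an erased adjacency entry
PAD = "#"      # hypothetical padding symbol for unused list positions


class NonerasedOracle:
    """Simulates access to the nonerased graph using queries to G."""

    def __init__(self, adj, D):
        self.adj, self.D, self.queries = adj, D, 0

    def _entry(self, v, i):
        """One neighbor query to G: position i of the list of v."""
        self.queries += 1
        return self.adj[v][i] if i < len(self.adj[v]) else PAD

    def star_list(self, v):
        """Build the nonerased adjacency list of v, padded to length D."""
        out = []
        for i in range(self.D):
            u = self._entry(v, i)
            if u is ERASED or u == PAD:
                continue
            # Keep u only if the edge is also recorded on the side of u.
            if any(self._entry(u, j) == v for j in range(self.D)):
                out.append(u)
        return out + [PAD] * (self.D - len(out))
```

A tester for a monotone property would then be run against `star_list` in place of the original adjacency lists, with the proximity parameter adjusted as in Observation 4.1.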
Observe that an edge $\{u, v\} \in G^\star$ iff $u \in \mathrm{Adj}(v)$ and $v \in \mathrm{Adj}(u)$. If $G$ satisfies $P$ then so does $G^\star$, as $G^\star$ is a subgraph of a completion of $G$ that satisfies the monotone property $P$. Suppose that $G$ is $\varepsilon$-far from satisfying $P$. Fix an arbitrary completion $G'$ of $G$. As $G$ is $\alpha$-erased, at most $\alpha n D$ edges of $G'$ are (fully or partially) erased in $G$. As $G^\star$ is a subgraph of $G'$ consisting of only the nonerased edges, the adjacency lists of $G$ and $G^\star$ differ on at most $2\alpha n D$ entries. As $G'$ is $\varepsilon$-far from $P$, the graph $G^\star$ is $(\varepsilon - 2\alpha)$-far from $P$.

Let $T$ be an $\varepsilon$-tester for $P$ whose query complexity is $q(\varepsilon, n, D)$. Then, for $\alpha < \varepsilon/2$, an $\alpha$-erasure-resilient $\varepsilon$-tester $T'$ for $P$ can be obtained by simulating $T$ with the proximity parameter $\varepsilon - 2\alpha$ on $G^\star$ via the oracle $O$ and returning the result of the simulation. The complexity of $T'$ is $D^2 \cdot q(\varepsilon - 2\alpha, n, D)$, as the oracle $O$ makes at most $D^2$ queries to $G$ for each of the $q(\varepsilon - 2\alpha, n, D)$ queries it receives.

This transformation is not efficient for general graphs, as the maximum degree of a graph can be $n - 1$. It is interesting to understand how much erasure-resilience affects the query complexity of testing monotone properties in our erasure-resilient model for general graphs.
Erasure-Resilient vs. Tolerant Testing of Graphs. For $0 \le \varepsilon_1 < \varepsilon_2 < 1$, an $(\varepsilon_1, \varepsilon_2)$-tolerant tester for a property $P$ must accept, with high probability, if the input is $\varepsilon_1$-close to $P$ and reject, with high probability, if the input is $\varepsilon_2$-far from $P$ [PRR06]. Dixit et al. [DRTV18] observed that, for properties of functions, erasure-resilient testing is no harder than tolerant testing. Specifically, a tolerant tester for a property of functions can be easily converted to an erasure-resilient tester with the same complexity. The new tester can run the tolerant tester, filling in the queried erasures with arbitrary values. However, this argument fails in the case of testing properties of graphs represented as adjacency lists, since the erased entries have to be filled in so that the resulting completion is a valid graph. In the bounded-degree model, we can use a $(2\alpha, \varepsilon - 2\alpha)$-tolerant tester for a property $P$ to obtain an $\alpha$-erasure-resilient $\varepsilon$-tester for $P$ with an overhead of $O(D^2)$ in query complexity via a transformation similar to the one explained in our discussion of monotone properties. It is an important open question to understand the relationship between erasure-resilient and tolerant testing in the general graph model.

Symmetric vs. Asymmetric Erasures. Our definition of partially erased graphs is general in the sense that erased entries may be asymmetric: an edge $(u, v)$ can be erased in $\mathrm{Adj}(u)$, but not in $\mathrm{Adj}(v)$. A partially erased graph has only symmetric erasures if it has no half-erased edges, that is, $u \in \mathrm{Adj}(v)$ iff $v \in \mathrm{Adj}(u)$ for any two nodes $u, v$. It is an interesting direction to investigate which computational tasks are strictly easier in the model with symmetric erasures compared to the model with asymmetric erasures.
Erasure-resilient sublinear-time algorithms, in the context of testing properties of functions, were first investigated by Dixit et al. [DRTV18], and further studied by Raskhodnikova et al. [RRV19], Pallavoor et al. [PRW20], and Ben-Eliezer et al. [BFLR20].Property testing in the general graph model was first studied by Parnas and Ron [PR02], who considered a relaxed version of the problem of testing whether the input graph has small diameter.Kaufman et al. [KKR04] studied the problem of testing bipartiteness in the general graph model and obtained tight upper and lower bounds on its complexity.

Figure 2: An example of a generalized witness to disconnectedness, where only a BFS from $v_1$ (but not from any other vertex) detects the generalized witness. A dotted line represents an erasure in the adjacency list of the corresponding vertex. An arrow pointing from a vertex $a$ in the direction of a vertex $b$ represents that $b \in \mathrm{Adj}(a)$, but $a \notin \mathrm{Adj}(b)$.
representation length of $C$ is at most $b \cdot d < b^2$. Hence, the representation length of $C$ is at most $\min\{b^2, b \cdot d\}$. If $\bigcup_{v \in C} \mathrm{Adj}(v)$ has no erasures then every vertex in $C$ is reachable from every other vertex in $C$. Otherwise, the vertex $v$ in Condition 3 of Definition 2.8 is such a vertex. If Algorithm 2 performs a BFS from $v$, it will detect a generalized witness to disconnectedness after at most $\min\{b^2, b \cdot d\}$ queries and reject. Since $|W| \ge (\varepsilon - \alpha)m/2$ and each generalized witness in $W$ has at least one vertex from which the generalized witness is detectable by a BFS, a single iteration of Algorithm 2 rejects with probability at least $|W|/n = 1/b$. Hence, Algorithm 2 rejects with probability at least $1 - (1 - 1/b)^{\lceil b \ln 3 \rceil} \ge 1 - \exp(-\ln 3) = 2/3$.

Definition 3.1 (Total ordering $\prec$). In a partially erased graph $G$, for any two vertices $u, v$, we write $u \prec v$ if either $\deg(u) < \deg(v)$, or $\deg(u) = \deg(v)$ and $u$ is lexicographically smaller than $v$.

Algorithm 3: Erasure-Resilient Algorithm for Improving an Estimate of Average Degree
input: Parameters $\varepsilon \in (0, 1/2)$, $\delta \in (0, 1/3)$; query access to a partially erased graph $G$ on $n$ nodes; a crude estimate $\bar d$ of the average degree of $G$
1 Set $s \leftarrow 660 \ln(2/\delta) \sqrt{\frac{n}{\varepsilon^5 \cdot \bar d}}$.
2 for $i = 1$ to $s$ do
3   Sample a node $u$ from $V$ uniformly at random and query its degree, $\deg(u)$.
4   …

Algorithm 4: Erasure-Resilient Algorithm for Estimating the Average Degree
input: Parameter $\varepsilon \in (0, 1/2)$; query access to a partially erased graph $G$ on $n$ nodes
1 Set $t \leftarrow \lceil 12 \ln(4 \log n) \rceil$.
2 for $i = 0$ to $\lceil \log n \rceil$ do
3   Set $\bar d_i \leftarrow n/2^i$.
4   repeat $t$ times
5     Run Algorithm 3 on inputs $\varepsilon$ and $\bar d_i$ with $\delta = 1/4$.
6   Let $\tilde d_i$ be the median of the answers returned by Algorithm 3 in all the runs.
7   if $\tilde d_i > \bar d_i$ then return $\tilde d_i$.
8 return 1.
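The geometric search performed by Algorithm 4 can be sketched in Python as follows. This is a hedged illustration: the function name and the `refine` callback (a stand-in for one run of Algorithm 3) are hypothetical, and the logarithm in the iteration count is assumed to be base 2, matching the estimates $n/2^i$.

```python
import math
import statistics


def estimate_average_degree(n, eps, refine):
    """Halve the crude estimate until the median refined estimate exceeds it.

    `refine(crude, eps, delta)` is a stand-in for one run of Algorithm 3.
    Requires n >= 2 so that the repetition count is positive.
    """
    log_n = math.log2(n)  # assumption: logarithms are base 2
    t = math.ceil(12 * math.log(4 * log_n))
    for i in range(math.ceil(log_n) + 1):
        crude = n / 2**i
        runs = [refine(crude, eps, 0.25) for _ in range(t)]
        refined = statistics.median(runs)  # median boosts the success probability
        if refined > crude:
            return refined
    return 1
```

Passing the subroutine as a callback keeps the sketch independent of any particular implementation of Algorithm 3; a stub that always answers 3.0 makes the search stop at the first crude estimate below 3.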

Claim 3.6. If all iterations are good then $\bar d_i \ge d/8$ for all $i \in \{0, 1, \dots, \ell\}$.

Proof. Since $\bar d_{i-1} = 2\bar d_i$ for all $i \in [\ell]$, it suffices to prove that $\bar d_\ell \ge d/8$. Suppose for the sake of contradiction that $\bar d_\ell < d/8$. Then, for some iteration $k < \ell$, the estimate $\bar d_k$ satisfied $d/4 \le \bar d_k < d/2$. Since iteration $k$ was good, part 2 of Lemma 3.2 implies that $\tilde d_k \ge (1-\varepsilon)d > d/2$. Hence, $\tilde d_k > \bar d_k$. Then Step 7 in iteration $k$ would have returned an output and terminated the algorithm, contradicting the fact that the algorithm ran for $\ell$ iterations. Hence, $\bar d_\ell \ge d/8$.

By Step 7, $\bar d_\ell < \tilde d_\ell$. By Claim 3.6 and part 1 of Lemma 3.2, the output satisfies $\tilde d_\ell \le 8d$. Hence, $\bar d_\ell \le 8d$. Combining this with Claim 3.6, by part 2 of Lemma 3.2, the output of the algorithm satisfies $(1-\varepsilon)d < \tilde d_\ell < (1 + \varepsilon + 2\min(\alpha, \tfrac{1}{2}))d$. The running time of each run of Algorithm 3 in Step 4 of iteration $i$ is $O(2^{i/2}/\varepsilon^{2.5})$.

Figure 4: The partially erased graphs $G_1$ and $G_2$ described in the proof of Theorem 1.8. The dotted lines represent erased entries in the adjacency lists of the corresponding vertices. A line with an arrow indicates that the entry corresponds to the vertex to which the arrow points. The circles represent the $(1-\lambda)(n-1)$-cycles.