Abstract 1 Introduction 2 Properties of random graphs 3 Proof of Theorem 1 4 Proof of Theorem 3 5 Proof of Theorem 2 6 Proof of Theorem 4 7 Open questions References

Reconstructing Random Graphs from Distance Queries

Michael Krivelevich ORCID School of Mathematical Sciences, Tel Aviv University, Israel Maksim Zhukovskii ORCID School of Computer Science, The University of Sheffield, UK
Abstract

We estimate the minimum number of distance queries that is sufficient to reconstruct the binomial random graph G(n,p) with constant diameter with high probability. We get a tight (up to a constant factor) answer for all p>n1+o(1) outside “threshold windows” around nk/(k+1)+o(1), k>0: with high probability the query complexity equals Θ(n4dp2d), where d is the diameter of the random graph. This demonstrates the following non-monotone behaviour: the query complexity jumps down at moments when the diameter gets larger; yet, between these moments the query complexity grows. We also show that there exists a non-adaptive algorithm that reconstructs the random graph with O(n4dp2dlnn) distance queries with high probability, and this is best possible.

Keywords and phrases:
random graphs, graph reconstruction, distance queries, query complexity
Funding:
Michael Krivelevich: Research supported in part by NSF-BSF grant 2023688.
Copyright and License:
[Uncaptioned image] © Michael Krivelevich and Maksim Zhukovskii; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Random graphs
; Mathematics of computing Graph algorithms
Related Version:
Full Version: https://arxiv.org/pdf/2404.18318
Editors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz Herman

1 Introduction

Reconstruction of graphs was thoroughly studied in many different contexts and has various applications, e.g., in DNA sequencing [2, 13, 34], phylogenetics [1, 23, 29], and recovering neural networks [39]. A vast amount of literature is devoted to the average-case problem, i.e. to the reconstruction of random graphs [8, 12, 19, 21, 28, 31, 33, 35, 41]. In particular, the conjecture of Kelly and Ulam [24, 42] which is considered as one of the main reconstruction challenges is known to be true in binomial random graphs whp111With high probability, that is, with probability tending to 1 as n. [8, 28, 35].

Let G be a connected graph on [n]:={1,,n}, and let 𝒬([n]2) be a set of pairs of vertices. In the usual way, we denote by dG(x,y) the graph distance between x and y in G. Let us say that 𝒬 reconstructs G, if G is the only graph on [n] with distances dG(x,y) between {x,y}𝒬, i.e. for every graph G on [n] such that dG(x,y)=dG(x,y) for all {x,y}𝒬, we have G=G. We will also call the graph 𝒬-reconstructible in this case.

In this paper, we study the model of reconstruction with an access to distance query oracle introduced in [5], although its variants were considered long before that for modelling reconstruction of a phylogenetic tree [18, 25, 43]. For every queried pair of vertices, the oracle answers the distance between these vertices in the hidden unknown graph G. For a queried pair of vertices {x,y}, we denote the oracle’s answer to this query by d(x,y), omitting the unknown graph in the subscript. An adaptive algorithm, at every step, selects the next query (a pair of vertices) based on the responses from the oracle to earlier queries.

Let q. If there exists an adaptive algorithm 𝙰 such that, for the hidden input graph G on [n], it queries at most q pairs of vertices comprising a set 𝒬([n]2) that reconstructs G, then we call G q-reconstructible by 𝙰. Let us call the minimum q such that G is q-reconstructible by some algorithm the (distance) query complexity of G. It is easy to see that, in the worst case, the query complexity equals (n2): all pairs should be queried in order to reconstruct the graph. For example, this is the case when G is a clique. Reyzin and Srivastava [37] showed that, even for some trees, Ω(n2) pairs are required to query. On the other hand, Mathieu and Zhou [31] presented an algorithm such that whp a uniformly random d-regular graph on [n] is (nln2n+dn/2)-reconstructible by this algorithm, and wondered whether methods similar to those developed in their paper are applicable to sparse binomial random graphs G(n,p). In this paper, in contrast, we study the distance query complexity of relatively dense random graphs G(n,p) assuming p>n1+ε for some ε>0, i.e., when the random graph has bounded diameter whp. Although our main motivation lies in attempting to achieve tight bounds for the average-case complexity in a situation where the general setup is hard to analyse, additional interest in this problem is sparked by a surprising phenomenon, which will be discussed in the next two paragraphs.

One can expect that dense graphs are typically harder to recover through distance queries, as the influence of a single edge to be recovered on distances between vertices might be less pronounced and thus harder to detect. In order to get some intuition of how the query complexity of the random graph evolves, let us first assume that p is significantly above the threshold for the property of having diameter at most 2, namely, p2n2lnnlnlnn as n. In this case, whp every pair of non-adjacent vertices has at least two common neighbours. Indeed, probability that a fixed pair of vertices has at most one common neighbour equals

(Bin(n2,p2)1)=(1p2)n2+(n2)p2(1p2)n3=O(np2enp2)=o(n2).

Therefore, by the union bound, every non-adjacent pair of vertices of G(n,p) has at least two common neighbours whp. Then whp the query complexity equals (n2). Indeed, for any set 𝒬 of pairs of vertices of size less than (n2), changing any single adjacency relation of a pair from ([n]2)𝒬 does not influence the distance between x and y for any {x,y}𝒬. Therefore, any set 𝒬 that does not contain all pairs does not reconstruct G(n,p) whp.

On the other hand, it seems likely that for p close to the connectivity threshold lnn/n, the query complexity might be much less – quasilinear, as in the case of random regular graphs. Note that, in general the query complexity does not decrease as the graph becomes sparser since, in particular, as we mention above, for certain trees, the query complexity equals Ω(n2). Nevertheless, we show that, on average, as density decreases, the query complexity jumps down at specific moments around hitting times of diameter’s increments. However, between these moments the query complexity grows quite rapidly. In particular, there exist n1+εp1p2n1/2 such that whp the diameter of G(n,p1) is larger than the diameter of G(n,p2) while the query complexity of the former is also larger.

Related work

Kannan, Mathieu, and Zhou [22, 30] presented a reconstruction algorithm that uses O~(n3/2) distance queries for bounded degree graphs. They also proved an information–theoretic lower bound Ω(nlogn/loglogn) for trees with maximum degree 3 and asked whether O~(n) is achievable for all bounded degree graphs. Tight results for certain families of bounded degree graphs – trees, chordal graphs, and graphs with bounded treelength – were recently obtained by Bastide and Groenland in [4]. In particular, they proved that for every Δ3, any randomised algorithm requires Ω(nlogn) queries to reconstruct n-vertex trees of maximum degree Δ for a certain sequence of sizes of trees, which is tight since there exists a deterministic algorithm to reconstruct such trees using O(nlogn) queries. In [31], Mathieu and Zhou answered the question from [22] for random d-regular graphs with constant d: whp nln2n distance queries is enough to reconstruct a uniformly random d-regular graph.

A modified version of the problem where the hidden graph is a tree and it is only allowed to query pairs of leaves (that are known) comes from biology: it serves as a model of reconstructing evolutionary (phylogenetic) trees and was introduced by Waterman, Smith, Singh, and Beyer [43]. This problem is well studied: lower bounds on query complexity for deterministic and randomised algorithms were obtained in [25] and [4] respectively, and upper bounds were investigated in [11, 23, 25, 27].

Other types of query oracles explored in the literature include all-shortest-path, all-distances, and shortest-path queries. The first two oracles answer all shortest paths [5, 38] and all distances [5, 14] from a queried vertex respectively. The shortest-path oracle answers a shortest path between a queried pair of vertices [22, 37]. Erlebach, Hall, and Mihal’ák [15] studied reconstruction of G(n,p) from all-shortest-path queries for p=const and p=n1+ε for specific ε>0.

Mathieu and Zhou [31] also showed that their construction of the set of distance queries can be used to get an upper bound for the metric dimension of random d-regular graphs. Let us recall that the metric dimension β(G) of a graph G is the minimum cardinality of SV(G) such that all vertices of G have different vectors of distances to the vertices from S. Such resolving sets S or their variants were used, in particular, to label canonically vertices of random graphs [3, 26]. The metric dimension of G(n,p) was studied by Bollobás, Mitsche, and Prałat in [10]. In particular, they proved that the metric dimension of G(n,p) undergoes the following “zigzag” behaviour: if p=nα, then lognβ(G(n,p))=1(1α)1/(1α) whp; see also [36] for related results. Although in our proof of the upper bound for the distance query complexity of G(n,p) we use a resolving set whose size is far from being optimal, as we show below the “jumps” of the metric dimension and of the query complexity are synchronous – both happen around nk/(k+1), k>0.

Our contribution

As follows from the next result, the query complexity drops for the first time when p passes a threshold n1/2+o(1) on its way down. More generally, we prove that, for every integer constant k1, as soon as p passes a threshold n(k+1)/(k+2)+o(1), the query complexity drops by a factor of Θ(np).

Theorem 1.

Let k1 be a fixed integer, ε>0, n1+ε<p<nk/(k+1)ε, and let GnG(n,p). Then there exist C=C(k,ε)>0 and an adaptive algorithm 𝙰 such that, for q=C/(nk2pk), whp Gn is q-reconstructible by 𝙰.

We prove Theorem 1 in Section 3.

Theorem 2.

Let k1 be a fixed integer, ε>0, and let nk+1pk+22lnn as n. Then whp GnG(n,p) has query complexity at least 12(k+1+ε)nk2pk.

Theorem 2 is proven in Section 5. Let us emphasise that its proof is based on an explicit argument which is more efficient than the usual information-theoretic approach: we show a lower bound on the query complexity of a deterministic graph in terms of its diameter and maximum degree, see Claim 14. Unlike the information-theoretic approach, our approach allows to bound the query complexity of any given graph. So, it actually demonstrates that, when the number of queries is less than the bound, even knowing the input graph G does not help choosing the optimal quering strategy – no matter which pairs of vertices are queried, there is always a graph GG with exactly the same answers.

Note that, for any wn as n and any

(2lnn+wnnk+1)1/(k+2)<p<nk/(k+1)ε, (1)

the bounds in Theorem 1 and Theorem 2 differ by a constant factor, see Figure 1. So, roughly speaking (ignoring the constant factor) the query complexity in the interval (1) increases with p.

Refer to caption
Figure 1: Upper and lower bounds for query complexity of G(n,p) for p satisfying (1).

We also prove tight (up to a constant factor) bounds for “non-adaptive reconstructibility”. It turns out that if the set of queried pairs 𝒬 is fixed, then both upper and lower bounds on the minimum |𝒬| such that G(n,p) is 𝒬-reconstructable increase by a factor of logn when p satisfies (1).

Theorem 3.

Let k1 be a fixed integer, ε>0, and let p>n1+ε. Then there exists a set 𝒬=𝒬(n)([n]2) of size at most 3nk2pkenkpk+1lnn such that whp GnG(n,p) is 𝒬-reconstructible.

Theorem 3 is proven in Section 4.

Theorem 4.

Let k1 be a fixed integer, ε>0, and let wn as n. If

(2lnn+wnnk+1)1/(k+2)<p<((1k+1ε)lnnnk)1/(k+1),

then there is no set 𝒬=𝒬(n) of size at most 18(k+1)2nk2pkenkpk+1lnn such that whp GnG(n,p) is 𝒬-reconstructible.

Theorem 4 is proven in Section 6.

Proofs outline

For simplicity of presentation, let us assume that p=nα, where kk+1<α<k+1k+2 for some k>0. The crucial observation that allows to prove tight (up to a constant) upper bounds both for adaptive and non-adaptive query complexities is the following: if {v1,x} and {v2,x} are queried and d(v2,x)d(v1,x)+2, then v1 and v2 are not adjacent.

Although whp the diameter of GnG(n,p) equals k+2, for a sufficiently large fixed set X[n], every vertex v[n] has a vertex from X in its k-neighbourhood. Indeed, if X has size Θ(lnnnk1pk), then the expected number of k-paths from a fixed vertex v[n] to X is Θ(lnn). Thus, using standard concentration inequalities, it is possible to overcome the union bound over the choice of v[n] for |X|=Clnnnk1pk and C sufficiently large. Actually, the following stronger property can be proved: whp for every v1,v2X, there exists xX such that dGn(v1,x)k while dGn{v1,v2}(v2,x)k+2, where Gn{v1,v2} is a spanning subgraph of Gn that is obtained by excluding the edge {v1,v2}. Thus, as soon as all pairs of vertices with at least one vertex in X are queried, all the remaining adjacencies are reconstructible whp. Indeed, for non-adjacent v1,v2, there exists x such that d(v1,x)k and d(v2,x)k+2, thus we may reconstruct all non-adjacencies. All the other pairs of vertices must be adjacent since otherwise if we could exclude, say, the edge {v1,v2}, then there is an xX such that d(v2,x)k+1 while, in the corrupted graph, the distance between v2 and x would be at least k+2. This proves Theorem 3, see details in Section 4. A counterpart of this very simple algorithm was employed by Mathieu and Zhou in [31], who showed that it allows typically to reconstruct random regular graphs with (1+o(1))nln2n queries, which is tight, up to a (logn)1+o(1)-factor.

The proof of Theorem 1 in Section 3 is based on similar ideas but also utilises the possibility of updating queries adaptively: indeed, instead of querying all pairs {xX,v[n]} simultaneously, we may split the set X into ν=Θ(logn) equal parts X1,,Xν and then, for every i=1,,ν, query a pair {xXi,v[n]} only if v has not yet been excluded from the consideration. A vertex v is excluded from the consideration at time i, if it has sufficiently many (at least M for some large enough constant M) vertices in the intersection of its k-neighbourhood with X1Xi. This allows to query the desired number of pairs whp. In order to complete the proof, it remains to observe that whp there are no v1,v2 that have M different vertices xX so that dGn{v1,v2}(v1,x)k and dGn{v1,v2}(v2,x)k+1. This would imply the existence of a good xX as in the proof of Theorem 3 for any pair {v1,v2}.

Proofs of both lower bounds in Theorem 2 and Theorem 4 use the notion of an undetectable pair in a graph G having diameter d3 with a set of queried pairs 𝒬(V(G)2): a pair of vertices {u1,u2} is undetectable if {u1,u2}𝒬E(G) and there is no “path” u1w1wu2 in G, where d2, such that of its “edges” are actual edges of G, and the one remaining pair of consecutive vertices belongs to 𝒬. It is easy to see that if G has at least one undetectable pair, then it is not reconstructible from 𝒬 since the addition of the edge {u1,u2} does not change answers to queries. When 𝒬 is small, then a simple counting proof shows that there are undetectable pairs in Gn whp. This proves Theorem 2, see details in Section 5.

In order to improve the lower bound by a factor of logn for the “non-adaptive reconstructibility” (Theorem 4), we first refine the notion of undetectability, and then introduce a randomised algorithm that finds an undetectable pair (in the refined sense) in Gn whp. Let us call a pair of vertices {u1,u2} -undetectable in G, if {u1,u2}𝒬E(G) and there is no “path” u1w1wu2 in G such that of its “edges” are actual edges of G, the one remaining pair of consecutive vertices belongs to 𝒬, and the distance between the two vertices in this pair is at least +2. Though the refined notion is weaker, it still allows to prove non-reconstructibility: if G has diameter d and has a pair which is -undetectable for all d2, then G is not 𝒬-reconstructible. Notice that, to decide whether a given pair {u1,u2} is undetectable, it is sufficient to know only the edges induced by the union of k-neighbourhoods of u1 and u2. In Section 6, we show that the following algorithm finds an undetectable pair whp: at every step, choose a pair of vertices {u1,u2} uniformly at random from the set of unconsidered vertices, consider all vertices in k-neighbourhoods of u1,u2, and check whether {u1,u2} is -undetectable for all k.

Structure of the paper

The rest of the paper is organised as follows. In Section 2 we recall properties of random graphs that we will use in our proofs: namely, asymptotic behaviour of sizes of balls of small radii and the diameter. Proofs of upper bounds – Theorem 1 and Theorem 3 – appear in Sections 3 and 4 respectively. In Sections 5 and 6 we prove the lower bounds, Theorems 2 and 4 respectively. Finally, in Section 7 we raise several natural questions that we leave unanswered.

2 Properties of random graphs

In this section, we describe certain properties of GnG(n,p) that we will use in the proofs. We also refer the reader to monographs [9, 16, 20] for a comprehensive exposition of properties of random graphs and of probabilistic tools – especially, to [20, Chapter 2] containing well-known standard concentration inequalities (in particular, Chernoff bounds for binomial distribution [20, Theorem 2.1], Hoeffding’s exponential tail bound for the hypergeometric distribution [20, Theorem 2.10], and Harris’ lemma [20, Theorem 2.12]) that we will use in the proofs and will not state them explicitly.

Let us state a technical claim about the sizes of k-neighbourhoods in Gn that we will use frequently both for lower and upper bounds. For v[n] and r0, everywhere in the paper we denote by Nr(v) the (random) r-ball around v in Gn. We will also denote Nr(U):=vUNr(v) for U[n].

Claim 5.

Let k be a positive integer and let log2nnp(nk1logn)1/k. Fix v[n]. With probability 1o(n2), in G(n,p) we have |Nr(v)|=(np)r(1+o(log1n)), for all r[k].

Proof.

From the Chernoff bound, it follows that |N1(v)|=np(1+o(1/logn)) with probability 1o(n2). Let us prove by induction over r that, for every, rk, |Nr(v)|=(np)i(1+o(1/logn)) with probability 1o(n2). Assume that for an r[k1], this probability bound is already obtained. Let us expose the r-ball around v and denote the set of vertices at distance exactly r by Nr. Due to the induction assumption, we may suppose that |Nr|=(np)r(1+o(1/logn))=o(n/logn). Thus, the number of vertices in [n]Nr(v) having neighbours in Nr (denoted by |Nr+1|) is distributed as Bin(n(1o(1/logn)),1(1p)|Nr|). Clearly, Nr+1(v)=Nr(v)Nr+1. Since 𝔼|Nr+1|=nr+1pr+1(1+o(1/logn))log4n, we get that

(Nr+1|𝔼|Nr+1>𝔼|Nr+1|/(lnn)3/2)=exp[Ω(𝔼|Nr+1|/ln3n)]=o(n2)

by the Chernoff bound, as needed.

Let us recall the result of Bollobás [7] about the asymptotic distribution of the diameter of Gn, see also [9, Chapter 10.2].

Theorem 6.

Let c be a constant and p=(2lnn+cnk)1/(k+1). Then the diameter of Gn converges in distribution to a random variable taking values k+1 and k+2 with probabilities eec/2 and 1eec/2 respectively.

Since the property of having diameter at most d is monotone, we immediately get the following corollary.

Corollary 7.

If nkpk+12lnn as n, then whp the diameter of Gn is at most k+1. If nkpk+12lnn as n, then whp the diameter of Gn is at least k+2.

3 Proof of Theorem 1

Let us recall that k1 is a fixed integer, ε>0, and n1+ε<p<nk/(k+1)ε. We let GnG(n,p). Let M be large enough and let us choose a large enough real CM. Let us also set ν:=log2n and fix a set X[n] of size νC/(2nk1pk). We will need the following auxiliary statements.

Lemma 8.

Let i=i(n)[ν] and let XX be a fixed set of size iC/(nk1pk). Then with probability 1o(1/logn), at least (n|X|)(12i) vertices of [n]X have at least M vertices at distance at most k in X.

For an edge e of a graph G, we denote by Ge the spanning subgraph of G obtained by deleting e. If eE(G), then we let Ge=G.

Lemma 9.

Whp there are no v1,v2[n] and u1,,uMX such that, for all j[M],

dGn{v1,v2}(v1,uj)k and dGn{v1,v2}(v2,uj)k+1. (2)

We prove Lemma 8 and Lemma 9 in Sections 3.1 and 3.2, respectively.

Let us consider a balanced partition of X into ν disjoint sets X1,,Xν and assume that Gn is a graph satisfying conclusions of Lemma 8 and Lemma 9 deterministically, i.e.:

  1. 1

    for every i[ν], the number of vertices v[n]X that have at least M vertices uX1Xi such that d(v,u)k is at least (n|X|)(12i);

  2. 2

    there are no v1,v2[n] and u1,,uMX such that (2) holds for all j[M].

Let us prove that these conclusions imply the statement of Theorem 1.

Let us first query all pairs from (X2) in Gn. Then, we query all pairs from X1×([n]X) and find all vertices vX such that there are at least M vertices uX1 satisfying d(v,u)k. Let V1[n]X be the set of all such vertices v. If V1=[n]X, then we set i^:=1. We then proceed by induction: assume that at step i{1,,ν1} disjoint sets V1,,Vi[n]X are chosen and V1Vi[n]X. We then query all pairs from Xi+1×([n](XV1Vi)) and find all vertices vXV1Vi such that there are at least M vertices uX1XiXi+1 satisfying d(v,u)k. We let Vi+1[n](XV1Vi) be the set of all such v. If V1Vi+1=[n]X, then we set i^:=i+1. From 𝟏, we get that, for every i, |V1Vi|(12i)(n|X|). Thus, the inductive procedure stops at some moment i^ν, i.e. V1Vi^=nX, and we query

(|X|2)+i=1i^|Xi||[n](XV1Vi1)|C2nk1pki=1i^n21i<2nC2nk1pkq

pairs.

Let us now show that Gn is reconstructible from the set of queried pairs. For every ii^, we have to recover all adjacencies between v1Vi and v2[n]. Without loss of generality we may assume that v2 is either from X or from Vj for some j>i. Since all pairs from Vi×(X1Xi) are queried, we may consider only v2(Xi+1Xν)(Vi+1Vi^). It is crucial that for all such v2, the pairs {v2,u} for uX1Xi are queried, so that we got both d(v1,u) and d(v2,u) from the oracle.

Let us first assume that v1 and v2 are non-adjacent in Gn. Let u1,,uMX1Xi be at distance at most k from v1. From 𝟐, it follows that there exists u{u1,,uM} such that d(v1,u)k while d(v2,u)k+2. It may only happen when v1,v2 are not adjacent. Thus, we may “reconstruct non-adjacencies” for all pairs of non-adjacent vertices in Gn.

Now, assume that G is obtained from Gn by removing some edges that belong to ii^Vi×([n](X1Xi)) and that for G we get exactly the same answers as for Gn for all queried pairs. Let {v1,v2}GnG, where v1Vi and v2(Xi+1Xν)(Vi+1Vi^). As above, we let u1,,uMX1Xi be at distance at most k from v1 in Gn. From 𝟐, we get that there exists u{u1,,uM} such that dG(v2,u)k+2. This inequality holds true since distances in G cannot be less than in Gn. But then dGn(v2,u)k+2 as dGn(v2,u)=dG(v2,u)=d(v2,u) by the assumption since the pair {v2,u} was queried. On the other hand, v1v2 in Gn and d(v1,u)=dGn(v1,u)k. So d(v2,u)d(v1,u)+1k+1 – a contradiction. Thus, edges are also reconstructible, Theorem 1 follows.

3.1 Proof of Lemma 8

Fix i[ν] and XX of size iC/(2nk1pk). For v[n]X, we let v be the event saying that at least M vertices in X are at distance at most k from v. We denote ξ:=v[n]X𝟙v the number of such vertices. We have to prove that (ξ<(n|X|)(12i))=o(1/logn).

Fix a vertex v[n]X. For any N[n], subject to the event {|Nk(v)|=N}, the set Nk(v) is a uniformly random N-subset of [n] that contains v. We derive the following inequality from Hoeffding’s exponential tail bound for the hypergeometric distribution (see [20, Theorem 2.10]): for every N=(np)k(1+o(1)),

(|Nk(v)X|<14iC||Nk(v)|=N)exp[iC16o(1)]. (3)

First of all, (3) implies that (|Nk(v)X|<M|Nk|=N)=o(1/(nlogn)) uniformly over N=(np)k(1+o(1)) if i17lnn/C. By the union bound and Claim 5, we get that ξ=n|X| for such i with probability 1o(1/logn). We then further assume that i<17lnn/C and that C is so large that e2i>n2/(k+1).

For v[n]X, denote by Nr¬X(v) the r-ball around v in Gn[[n]X]. Due to Claim 5, with probability 1o(1/n), |Nk1¬X(v)|=(np)k1(1+o(1)) for all v[n]X. We then expose only edges induced by [n]X and assume that the latter event holds deterministically.

For v[n]X, we let 𝒞v be the event saying that less than M vertices in X have neighbours in Nk1¬X(v). Let η=v[n]X𝟙𝒞v. Note that ηn|X|ξ. Therefore, it suffices to prove that (η2i(n|X|))=o(1/logn). A vertex from X has a neighbour in Nk1¬X(v) for a fixed v[n]X with probability

1(1p)(np)k1(1+o(1))=1enk1pk(1+o(1))nk1pk(1+o(1))

since p<nk/(k+1)ε. Therefore, by the Chernoff bound,

(𝒞v)(Bin(|X|,nk1pk(1+o(1))<14iC)exp[iC16o(1)].

We get that 𝔼ηei(n|X|) for C>16 and for large n. Since the relation uNk1¬X(v) on pairs (u,v) of vertices from [n]X is symmetric, every vertex from [n]X belongs to at most (np)k1(1+o(1)) balls Nk1¬X(v), v[n]X. We shall use the following bounded difference inequality applied to η considered as a function of edges between X and [n]X:

Theorem 10 ([32]).

Let ξ1,,ξNBern(p) be independent, and let f:N satisfy

|f(x1,,xi1,xi,xi+1,,xN)f(x1,,xi1,xi,xi+1,,xN)|ci

for some c1,,cN>0 and every i[N], x1,,xN,xi. Then, for every t>0,

(|f(ξ1,,ξN)𝔼f(ξ1,,ξN)|t)exp[t22p(1p)i=1Nci2+23tmaxici].

Assuming that G^n is a graph on [n] such that G^n[[n]X]=Gn[[n]X] contains exactly the exposed edges, we get that, for any pair eX×([n]X), values η(G^n) and η(G^ne) differ by at most (np)k1(1+o(1)) since e may change only the values of 𝟙𝒞v such that the vertex of e from [n]X belongs to Nk1¬X(v). Therefore, we may set all ci equal to (np)k1(1+o(1)). Then Theorem 10 implies that, for some constant c>0,

(η2i(n|X|)) (ηe2ei(n|X|))
exp[ce2i(n|X|)2p(n|X|)|X|(np)2k2+ei(n|X|)(np)k1]
exp[2cC+1e2ini(np)k1]exp[nε(k1)+o(1)]=o(1logn),

where the penultimate inequality is due to e2i>n2/(k+1), completing the proof.

3.2 Proof of Lemma 9

Assume that MM1/ε.

Claim 11.

Fix a vertex v[n]. With probability 1o(n2), |Nk1(v)X|M.

Proof.

Due to Claim 5, we have that |Nk1(v)|=(np)k1(1+o(1)) with probability 1o(n2). For simplicity of presentation, let us assume that vX (the case vX does not differ much). Subject to |Nk1(v)|=N+1, the random variable |Nk1(v)X| has a hypergeometric distribution, implying that

(|Nk1(v)X|>M) (1+o(1))maxN=(np)k1(1+o(1))(|X|M)(n|X|1NM)(n1N)
(1+o(1))(|X|Mn)M=o(n2).

Let us define the following family of graphs on [n]: fix a vertex v1 and draw from this vertex M/M disjoint paths of length at most k to distinct vertices from X and avoiding the edge {v1,v2}; fix another vertex v2 and draw from this vertex a path of length at most k+1 to each of the paths starting at v1 and avoiding the edge {v1,v2}: these M/M paths from v2 are not necessarily disjoint.

Claim 12.

Whp Gn does not contain any subgraph from .

Proof.

First of all note that any H has at least M/M and at most 2kM/M+2 vertices. Moreover, for any x in this range, the number of graphs in with exactly x vertices is at most cxnxM/M|X|M/M for a certain constant cx – there are at most |X|M/M choices for the M/M ends in X of disjoint paths emanating from v1 and at most nxM/M ways to choose the remaining xM/M vertices. Let us bound from below the number of edges in such a graph and apply the union bound. Let us fix vertices v1,v2 and u1,,uM/M that play the role of ends of paths from v1 in X. Note that a graph from can be obtained from this set of vertices by a sequential addition of 2M/M paths joining two vertices that are presented in the graph at the previous step. First M/M paths are of length at most k, and the last M/M paths are of length at most k+1. Assuming that at the j-th step the number of added vertices equals xj, we get that the constructed graph H has density

ρ(H)=|E(H)||V(H)|=j=12M/M(xi+1)2+M/M+j=12M/Mxi=x2+M/Mx(2k+1)M/M2kM/M+2.

Finally, assuming M/M>3ε(k+1), the expected number of subgraphs H in Gn is at most

xcxnxM/M|X|M/Mpx(2k+1)M/M2kM/M+2=O((νnk+2/M/Mpk+1)M/M)=O(n1).

Let us now fix v1,v2 and assume that the graph G:=Gn{v1,v2} satisfies the following properties deterministically:

  • for all v[n], |Nk1(v)X|M;

  • u1,,uMNk(v1);

  • G does not contain a graph from family .

It is then sufficient to prove that there exists j[M] such that ujNk+1(v2).

Take a shortest path P1 (of length at most k) between v1 and u1. We then proceed by induction. Assume that at step j<M/M, we found j disjoint shortest paths P1,,Pj from v1 to u1,,uj respectively. Let w1,,wj be the first vertices (after v1) on P1,,Pj respectively. We have that there exists j[M][j] such that ujNk1(w1)Nk1(wj). Without loss of generality let j=j+1. Observe that the shortest path Pj+1 between v1 and uj+1 cannot intersect any of P1,,Pj outside of v1. Eventually we get M/M disjoint paths from v1 to u1,,uM/M. If, for some j[M/M], ujNk+1(v2), we are done. Otherwise, Gn contains a graph from – a contradiction.

4 Proof of Theorem 3

Let λ=nkpk+1. Without loss of generality we may assume that

p((k+o(1))lnn(k+1)nk)1/(k+1) (4)

so that 3eλlnnnk2pk(n2). Let X be a fixed set of vertices of size 3eλlnn/(nk1pk)n2. Let us query all the pairs from [X×([n]X)](X2). Thus, we query at most 3nk2pkeλlnn pairs. As in the previous section, we let GnG(n,p), and, for v[n], we let Nk(v) denote the set of vertices in Gn at distance at most k from v.

We further show that whp in Gn for every v1,v2[n]X, there is an xX such that x is at distance at most k from v1 and at least k+2 from v2 in the graph obtained from Gn by removing the edge {v1,v2}. It would immediately imply that whp Gn is q-reconstructible. Indeed, we need only to reconstruct adjacency relations between pairs of vertices v1,v2X. If v1,v2 are non-adjacent, then we queried {v1,x}, {v2,x} such that d(v1,x)k, d(v2,x)k+2. This may only happen when v1,v2 are non-adjacent. Then we can reconstruct all non-adjacencies. It remains to prove that we cannot delete any subset of edges from Gn[[n]X] without changing the answers to the queries. If v1v2, then we queried {v1,x}, {v2,x} such that d(v1,x)k, d(v2,x)d(v1,x)+1, and any shortest path from x to v2 follows the edge {v1,v2}. Thus, the edge {v1,v2} must be in the graph.

Therefore, the following lemma immediately imples Theorem 3.

Lemma 13.

Whp for every v1,v2[n]X, there is an xX such that

dGn{v1,v2}(v1,x)kanddGn{v1,v2}(v2,x)k+2.
Proof.

Fix vertices v1,v2X and remove the edge {v1,v2} if it is in Gn. Let N be the (closed) (k1)-neghbourhood of v1 in Gn[X{v2}], and let Ni′′ be the (closed) i-neghbourhood of v2 in Gn[NX] for i{k1,k}. Note that, as soon as N is exposed, Nk1′′ and Nk′′ are defined solely by adjacencies that are entirely outside N and their cardinalities do not depend on edges induced by N – only on the size of N. Therefore, due to Claim 5, we may assume that |N|=(np)k1(1+o(1/logn)), |Ni′′|=(np)i(1+o(1/logn)), i{k1,k}. By the Chernoff bound and due to (4), the number of neighbours of vertices from Nk1′′ in N is o((np)k1/logn) with probability 1o(n2). Note that the above makes sense only when k2. In the case k=1, we do not need to apply the Chernoff bound since N={v1}, Nk1′′={v2}, and there is no edge {v1,v2} in the considered graph.

We then exclude from N the neighbours of Nk1′′ and get that the refined set N0 still has the size (np)k1(1+o(1/logn)). If we can prove that with probability 1o(n2), there exists a vertex xX that has a neighbour in N0 and does not have neighbours in Nk′′(NN0), we immediately get the statement of Lemma 13: if P is a path of length at most k+1 from v2 to x in Gn{v1,v2}, then by construction the neighbour of x on P is in N0. Then the farthest from x vertex in PN0 has a neighbour in PNk1′′ – a contradiction. The probability bound on the latter event is also immediate since

(Bin(|X|,(1(1p)|N0|)(1p)|Nk′′|+|NN0|)=0)=(1(1(1p)(np)k1(1+o(1/logn)))(1p)(np)k(1+o(1/logn)))|X|=(4)exp[(1+o(1))|X|nk1pkeλ]=o(n2).

5 Proof of Theorem 2

Without loss of generality, we assume p=o(1). Let us show that Theorem 2 follows from

Claim 14.

Let d3, ε>0. There exists C=C(d,ε)>0 such that, for every graph G on [n] with diameter at most d and maximum degree Δ>C, its query complexity q satisfies:

q(d1+ε)Δd2(n2)|E(G)|. (4)

Indeed, whp Gn has (1+o(1))(n2)p edges, diameter at most k+2 (by Corollary 7), and maximum degree np(1+o(1)) (by Claim 5, see also [6]), implying that the query complexity of G(n,p) is at least 12(k+1+ε)nk2pk whp, as needed.

Proof of Claim 14..

Let q be an integer that does not satisfy (4). Let 𝒬([n]2) have size q. Let us call a pair of vertices {u1,u2} undetectable in G if:

  • u1,u2 are not adjacent in G,

  • the pair {u1,u2} was not queried,

  • there is no “path” u1w1wu2, d2, such that of its “edges” are actual edges of G, and the one remaining pair of consecutive vertices belongs to 𝒬.

Assuming that the pair {u1,u2} is undetectable in G, observe that the addition of the edge {u1,u2} agrees with the answers to all the queries. Indeed, if it affects some queried pair {w,w}, then there should be a path wu1u2w of length at most d1 in G{u1,u2} – a contradiction.

It remains to prove that G has an undetectable pair. For an appropriate choice of C, the number of pairs that do not satisfy the third property of an undetectable pair is at most

=1d2(+1)qΔ<q((d1)Δd2+(d2)2Δd3)<q(d1+ε)Δd2q

On the other hand, the number of pairs that satisfy the first two properties is at least (n2)|E(G)|q>q(d1+ε)Δd2q, implying the existence of an undetectable pair.

6 Proof of Theorem 4

Let GnG(n,p), N:=eλlnn8(k+1)2nk2pk, where λ=nkpk+1. We assume that a set 𝒬([n]2) of N pairs of vertices from [n] is fixed in advance to be queried. We call a pair {u1,u2} small if

  • the number of {v1,v2}𝒬 such that v1Nk(u1) and v2Nkk(u2) for some k{0,1,,k} is at most M:=7(k+1)(np)kN/n2;

  • there are no {v1,v2}𝒬 such that v1Nk(u1) and v2Nk1k(u2) for some k{0,1,,k1};

  • there are no {v1,v2}𝒬 such that, for some k{0,1,,k}, v1Nk(u1), v2Nkk(u2), and either v2Nk(u1) or v1Nk(u2).

Claim 15.

Whp at least 12(n2) pairs of vertices are small.

Proof.

Due to Claim 5, whp, for every k{0,1,,k}, for every {v1,v2}𝒬, there are at most (1+o(1))(np)k ordered pairs (u1,u2) such that v1Nk(u1) and v2Nkk(u2). Then whp the following holds: if the number of pairs {u1,u2} satisfying the first condition in the definition of a small pair is less than 56(n2), then 16n(n1)M<(1+o(1))(k+1)(np)kN. The later inequality is false for large enough n. Similarly, whp, for every {v1,v2}𝒬, there are at most (1+o(1))k(np)k1 ordered pairs (u1,u2) such that v1Nk(u1) and v2Nk1k(u2) for some k{0,1,,k1}. Then whp the following holds: if the number of pairs {u1,u2} satisfying the second condition in the definition of a small pair is less than 56(n2), then 16n(n1)<(1+o(1))k(np)k1N. The later inequality is false for large enough n as well.

Let us finally show that whp the number of pairs {u1,u2} satisfying the third condition in the definition of a small pair is at least 56(n2). We will do it in two steps: first, we will show that whp there are only o(n2) pairs {u1,u2} that have in their “neighbourhood” Nk(u1)×Nkk(u2) queried pairs {v1,v2} with d(v1,v2)k; second, we will observe that whp as soon as a queried pair {v1,v2} has d(v1,v2)>k, then the intersection of the k-ball around v1 and the k-ball around v2 has size around (np)k+k/n – that will be sufficient to finish the proof.

Fix a queried pair {v1,v2}. Since the expected number of paths of length at most k between v1,v2 is at most nk1pk(1+o(1)), we get that the distance between v1,v2 is at most k with probability at most nk1pk(1+o(1)) by Markov’s inequality. Therefore, the expected number of pairs in 𝒬 that are at distance at most k is at most nk1pk(1+o(1))N. By Markov’s inequality, we get that the number of such pairs in 𝒬 is Op(nk1pkN). As we noted above, whp, for every pair {v1,v2}𝒬, there are at most (1+o(1))(k+1)(np)k pairs (u1,u2) such that v1Nk(u1), v2Nkk(u2) for some k{0,1,,k}. Let us fix a sequence wn growing to infinity with n sufficiently slowly. We get that whp, if the number of pairs {u1,u2} that have v1Nk(u1) and v2Nkk(u2) for some k{0,1,,k} such that {v1,v2}𝒬 and d(v1,v2)k is at least 112(n2), then 112n(n1)<wn(np)knk1pkN. This is false for all large enough n. This proves that whp the number of pairs satisfying the last property in the definition of small pairs with the restriction that d(v1,v2)k (note that it always holds for k{0,k}) is at least 1112(n2). It then remains to prove that whp the number of pairs {u1,u2} with no {v1,v2}𝒬 such that d(v1,v2)>k and, for some k[k1], v1Nk(u1), v2Nkk(u2), and either v2Nk(u1) or v1Nk(u2), is at least 1112(n2) as well.

As promised, we first show that whp, for any {v1,v2} such that d(v1,v2)>k and any k[k], the intersection of the k-ball around v1 and the k-ball around v2 has size at most (1+o(1))(np)k+k/n+ln2n. Fix v2 and expose the k-ball around it. Due to Claim 5, with probability 1o(n2), this ball Nk(v2) has size (1+o(1))(np)k. Let us now observe that we may assume that all vertices from S:=Nk(v2)Nk1(v2) have degrees at most ln2n in Nk(v2). Indeed, the complementary event implies that there exists a vertex wS such that the total number of (k+1)-paths and k-paths between u in v2 is at least ln2n, which is false with probability 1o(n2): due to [40, Theorem 5], there exists C>0 such that, with probability 1o(n2), the number of paths of length at most k+1 between any two vertices in G(n,p) is at most Clnn.

Then, fix v1Nk(v2). For i{0,1,,k1}, let Ni be the i-neighbourhood of v1 in [n]Nk(v2), and let Ni be the set of neighbours of Ni in Nk(v2). Note that, for every vertex wNk(v1)Nk(v2), a shortest path from w to v1 leaves Nk(v2) at some vertex wS. Thus, the set Nk(v1)Nk(v2) coincides with i=0k1[Nk1i(Ni)Nk(v2)]. Then

|Nk(v2)Nk(v1)|i=0k2|Ni|ln2n(np)k2i+|Nk1| (5)

with probability 1o(n2). Moreover, due to Claim 5, with probability 1o(n2), for every i{0,1,,k1}, |Ni|(np)i(1+o(1)).

Expose all Ni, i{0,1,,k1}, and assume that |Ni|(np)i(1+o(1)) and |Nk(v2)|(1+o(1))(np)k hold deterministically. Since |Ni| is the number of vertices in Nk(v2) that have at least one neighbour in Ni, we get, by the Chernoff bound,

(|Ni|>(np)k+ip(1+(lnlnn)1)+ln2n)(Bin((np)k(1+o(1)),1(1p)(np)i(1+o(1)))>(np)k+ip(1+1lnlnn)+ln2n)=o(n2).

We finally get from (5) that with probability 1o(n2),

|Nk(v1)Nk(v2)|i=0k2ln2n(np)k2i((np)k+ip(1+o(1))+ln2n)+(np)k+k1p(1+o(1))+ln2n=(1+o(1))(np)k+k1p+ln2n,

as needed. From this and since whp, for every vertex v2 and every k[k1], |Nkk(v2)|=(1+o(1))(np)kk due to Claim 5, we get that whp, for every {v1,v2}𝒬 satisfying d(v1,v2)>k, there at most k((np)2kn+(np)k1ln2n) pairs (u1,u2) such that,

for some k[k1],v1Nk(u1),v2Nkk(u2),andv2Nk(u1). (6)

So, the event that the number of pairs {u1,u2} that have a pair {v1,v2} satisfying d(v1,v2)>k and (6) is at least 124(n2) implies

124n(n1)<k((np)2kn+(np)k1ln2n)N,

which is false for all large n. Due to symmetry, we get that whp the number of pairs {u1,u2} that have a pair {v1,v2} satisfying d(v1,v2)>k, v1Nk(u1), v2Nkk(u2), and v1Nk(u2) is less than 124(n2) as well, completing the proof.

Let [k]. We call a pair of vertices (u1,u2) -undetectable, if

  • u1,u2 are not adjacent in Gn and {u1,u2}𝒬,

  • there is no “path” u1w1wu2, such that of the pairs of consecutive vertices {u1,w1}, {w1,w2}, , {w,u2} are actual edges of Gn, the one remaining pair of consecutive vertices is from 𝒬, and the distance between the vertices in this pair is at least +2 in Gn.

Let us recall that whp Gn has diameter k+2 due to Corollary 7. Assuming that the pair {u1,u2} is -undetectable for all [k], observe that whp the addition of the edge {u1,u2} agrees with the answers to all the queries. Indeed, if it affects some queried pair {w,w}, then there should be a path wu1u2w of length +1k+1 in Gn{u1,u2} while in Gn the distance between w,w should be at least +2. This contradicts the fact that {u1,u2} is -undetectable since if we consider the sequence u2wwu1 obtained form the path wu1u2w by replacing the prefix wu1 and the suffix u2w, we get that the pair {w,w} is queried, d(w,w)+2, and all the other consecutive pairs of vertices are edges of Gn. Thus, whp the event that there exists a pair {u1,u2} which is -undetectable for all [k] implies that Gn is not 𝒬-reconstructible. It then remains to prove that whp Gn has a pair {u1,u2} which is -undetectable for all [k].

Let us now consider the following iterative procedure. At every step i1, we are given with a set of “considered” vertices Xi[n] (initially X1=), then we sample a pair {u1,u2} uniformly at random from ([n]Xi2) and expose Gn[Nk(u1)] and Gn[Nk(u2)]. If the pair {u1,u2} is not small or u1,u2 are adjacent (note that the exposed edges are enough to get these decisions), then we just skip this step and switch to the step i+1 with Xi+1:=XiNk(u1)Nk(u2).

Now, assume that at the current step i, we observe that the pair {u1,u2} is small and u1,u2 are not adjacent. Note that the pair {u1,u2} is -undetectable for every [k1] since it is small (so, due to the first bullet point in the definition of a small pair, it cannot have any “path” of length at most k from the definition of an undetectable pair). The crucial observation is that the edges between Nk(u1)Nk(u2) and Nk(u2)Nk(u1) have not been exposed yet unless they are entirely inside Xi. We then expose these missing edges and show that they define an event that has sufficiently large probability and implies that the pair {u1,u2} is k-undetectable (see Claim 16 below). After this step, we switch to the step i+1 with Xi+1:=XiNk(u1)Nk(u2) as before.

We perform τ=n1/(k+1)o(1) steps, where τ is chosen so that τ(np)k=o(n/logn). Due to Claim 5, whp (in the measure of the union of the balls exposed before this step) |Xτ|=o(n/logn), implying that whp (in the same measure), for every iτ, the probability that the pair {u1,u2} we sample at step i is small (in the measure of the random pair {u1,u2}) is at least 1/2o(1) by Claim 15.

Claim 16.

There exists an event u1,u2 that depends only on edges between Nk(u1)Nk(u2) and Nk(u2)Nk(u1), implies that the pair {u1,u2} is k-undetectable, and satisfies

(u1,u2Gn[Xi],{u1,u2}E(Gn),{u1,u2} is small)n7/(8(k+1))+o(1).
Proof.

Fix kk, v1Nk(u1), v2Nkk(u2) such that the pair {v1,v2} was queried. Note that since {u1,u2} is small, it may only happen when v1Nk(u2), v2Nk(u1).

Without loss of generality, assume that kkk. Note that, if there exists an edge in Gn between A(v1):=Nkk(v1)(XiNk(u2)) and A(v2):=Nk(v2)Nk(u1), then the distance between v1,v2 is at most k+1. Due to Claim 5, whp

|Nkk(v1)|=(1+o(1/logn))(np)kk, |Nk(u1)|=(1+o(1/logn))(np)k,
|Nk(v2)|=(1+o(1/logn))(np)k, |Nk(u2)|=(1+o(1/logn))(np)k.

Since |Xi|=o(n/logn), due to Hoeffding’s exponential tail bound for the hypergeometric distribution, whp |A(v1)|=(1+o(log1n))(np)kk and |A(v2)|=(1+o(log1n))(np)k. Then

(there is an edge between A(v1) and A(v2))=1(1p)(1+o(1/logn))(np)k=1eλ+o(1).

Since {u1,u2} is small, the number of queried {v1,v2} in k=0kNk(u1)×Nkk(u2) is at most M. Now, we define the desired event u1,u2: for every k and every queried pair {v1Nk(u1),v2Nkk(u2)}, there exists an edge between the respective A(v1) and A(v2). Then, due to the Harris inequality [17],

(u1,u2)(1eλ+o(1))Mexp[(7/8+o(1))lnn/(k+1)].

Whp in at least (1/2o(1))τ steps we get a small non-adjacent pair. Due to Claim 16, we then get that the probability that, for at least one of these small pairs {u1,u2}, the respective event u1,u2 happens is at least

1(1n7/(8(k+1))+o(1))(1/2o(1))τ=1exp[12n1+o(1)8(k+1)]=1o(1),

completing the proof.

7 Open questions

Although we get tight bounds on distance query complexity for the binomial random graph G(n,p) when the edge probability p satisfies (1), as p gets closer to a hitting time of a diameter increment, the difference between the bounds increases. In particular, there is a polynomial gap between the bounds even for the “non-adaptive reconstructibility” when p=Θ((nklnn)1/(k+1)) satisfies

(1o(1))(lnn(k+1)nk)1/(k+1)<p<(2lnn+O(1)nk)1/(k+1).

It would be interesting to get at least the right power of n in the query complexity in this case.

The further natural step is to generalise our results to growing diameter, i.e. to p=n1+o(1). Although Theorems 2, 3, and 4 are generalised directly to pln2n/n, it is unclear whether it is possible to extend Theorem 1 without any significant modification of the proof method. The main complication is in the application of the bounded difference inequality (Theorem 10) since we use the fact that e2i>n2/(k+1) for all i<lnn/C and some sufficiently large constant C. If k is growing, then there is no such C.

As we mention in Introduction, Mathieu and Zhou [31] proved that, for constant d, the query complexity of the random d-regular graph on [n] is at most nln2n whp. On the other hand, it is easy to prove the information–theoretic lower bound Ω(nlogn/loglogn). It seems likely that the query complexity of the random d-regular graph actually equals n(logn)1o(1) whp, and we are asking whether the lower bound is tight.

References

  • [1] R. Afshar, M. T. Goodrich, P. Matias, and M. C. Osegueda. Reconstructing Biological and Digital Phylogenetic Trees in Parallel. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms (ESA 2020), volume 173 of Leibniz International Proceedings in Informatics (LIPIcs), pages 3:1–3:24, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.ESA.2020.3.
  • [2] R. Arratia, D. Martin, G. Reinert, and M. S. Waterman. Poisson process approximation for sequence repeats, and sequencing by hybridization. Journal of Computational Biology, 3(3):425–463, 1996. doi:10.1089/CMB.1996.3.425.
  • [3] L. Babai, P. Erdős, and S. M. Selkow. Random graph isomorphism. SIAM Journal on Computing, 9(3):628–635, 1980. doi:10.1137/0209047.
  • [4] P. Bastide and C. Groenland. Optimal distance query reconstruction for graphs without long induced cycles. arXiv preprint arXiv:2306.05979, 2023.
  • [5] Z. Beerliova, F. Eberhard, T. Erlebach, A. Hall, M. Hoffmann, M. Mihal’ak, and L. S. Ram. Network discovery and verification. IEEE Journal on selected areas in communications, 24(12):2168–2181, 2006. doi:10.1109/JSAC.2006.884015.
  • [6] B. Bollobás. The distribution of the maximum degree of a random graph. Discrete Mathematics, 32:201–203, 1980. doi:10.1016/0012-365X(80)90054-0.
  • [7] B. Bollobás. The diameter of random graphs. Transactions of the American Mathematical Society, 267:41–52, 1981.
  • [8] B. Bollobás. Almost every graph has reconstruction number three. Journal of Graph Theory, 14(1):1–4, 1990. doi:10.1002/JGT.3190140102.
  • [9] B. Bollobás. Random Graphs. Cambridge University Press, 2 edition, 2001.
  • [10] B. Bollobás, D. Mitsche, and P. Prałat. Metric dimension for random graphs. Electronic Journal of Combinatorics, 20(4):P1, 2013.
  • [11] R. Brodal, G. S. Fagerberg, C. N. S. Pedersen, and A. Östlin. The complexity of constructing evolutionary trees using experiments. BRICS Report Series, 8(1), 2001.
  • [12] Y. Demidovich, Y. Panichkin, and M. Zhukovskii. Reconstruction of graph colourings. arXiv preprint, 2023. arXiv:2308.01671.
  • [13] M. Dyer, A. Frieze, and S. Suen. The probability of unique solutions of sequencing by hybridization. Journal of Computational Biology, 1(2):105–110, 1994. doi:10.1089/CMB.1994.1.105.
  • [14] T. Erlebach, A. Hall, M. Hoffmann, and M. Mihal’ák. Network discovery and verification with distance queries. Algorithms and Complexity, pages 69–80, 2006.
  • [15] T. Erlebach, A. Hall, and M. Mihal’ák. Approximate discovery of random graphs. In Stochastic Algorithms: Foundations and Applications, SAGA 2007, volume 4665 of Lecture Notes in Computer Science. Springer, 2007. doi:10.1007/978-3-540-74871-7_8.
  • [16] A. Frieze and M. Karoński. Introduction to Random Graphs. Cambridge University Press, 2015.
  • [17] T. E. Harris. A lower bound for the critical probability in a certain percolation process. Mathematical Proceedings of the Cambridge Philosophical Society, 56:13–20, 1960.
  • [18] J. J. Hein. An optimal algorithm to reconstruct trees from additive distance data. Bulletin of mathematical biology, 51(5):597–603, 1989.
  • [19] H. Huang and K. Tikhomirov. Shotgun assembly of unlabeled Erdős–Rényi graphs. Probab. Theory Relat. Fields, 192:575–624, 2025. doi:10.1007/s00440-024-01347-4.
  • [20] S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. J. Wiley & Sons, 2000.
  • [21] T. Johnston, G. Kronenberg, A. Roberts, and A. Scott. Shotgun assembly of random graphs. Probability Theory and Related Fields, 2025. doi:10.1007/s00440-025-01380-x.
  • [22] S. Kannan, C. Mathieu, and H. Zhou. Graph reconstruction and verification. ACM Transactions on Algorithms, 14(4):1–30, 2018. doi:10.1145/3199606.
  • [23] S. K. Kannan, E. L. Lawler, and T. J. Warnow. Determining the evolutionary tree using experiments. Journal of Algorithms, 21(1):26–50, 1996. doi:10.1006/JAGM.1996.0035.
  • [24] P. J. Kelly. A congruence theorem for trees. Pacific Journal of Mathematics, 7:961–968, 1957.
  • [25] V. King, L. Zhang, and Y. Zhou. On the complexity of distance-based evolutionary tree reconstruction. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’03), pages 444–453, 2003.
  • [26] L. Kučera. Canonical labeling of regular graphs in linear average time. In Proceedings of 28th Annual Symposium on Foundations of Computer Science (FOCS’87), pages 271–279, 1987.
  • [27] A. Lingas, H. Olsson, and A. Östlin. Efficient merging, construction, and maintenance of evolutionary trees. In Automata, Languages and Programming, pages 544–553, 1999.
  • [28] N. Linial and J. Mosheiff. On the rigidity of sparse random graphs. Journal of Graph Theory, 85(2):466–480, 2017. doi:10.1002/JGT.22073.
  • [29] G. D. Marmerola, M. A. Oikawa, Z. Dias, S. Goldenstein, and A. Rocha. On the reconstruction of text phylogeny trees: evaluation and analysis of textual relationships. PloS one, 11(12):e0167822, 2016.
  • [30] C. Mathieu and H. Zhou. Graph reconstruction via distance oracles. In International Colloquium on Automata, Languages and Programming (ICALP’13), pages 733–744, 2013.
  • [31] C. Mathieu and H. Zhou. A simple algorithm for graph reconstruction. In 29th Annual European Symposium on Algorithms (ESA’21), pages 68:1–68:18, 2021. doi:10.4230/LIPIcs.ESA.2021.68.
  • [32] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics: Invited Papers at the Twelfth British Combinatorial Conference, London Mathematical Society Lecture Note Series, pages 148–188. Cambridge University Press, 1989.
  • [33] E. Mossel and N. Ross. Shotgun assembly of labelled graphs. IEEE Transactions on Network Science and Engineering, 6(2):145–157, 2019. doi:10.1109/TNSE.2017.2776913.
  • [34] A. S. Motahari, G. Bresler, and D. N. Tse. Information theory of dna shotgun sequencing. IEEE Transactions on Information Theory, 59(10):6273–6289, 2013. doi:10.1109/TIT.2013.2270273.
  • [35] V. Müller. Probabilistic reconstruction from subgraphs. Commentationes Mathematicae Universitatis Carolinae, 17(4):709–719, 1976.
  • [36] G. Odor. The role of adaptivity in source identification with time queries. PhD thesis, EPFL, 2022.
  • [37] L. Reyzin and N. Srivastava. Learning and verifying graphs using queries with a focus on edge counting. In Algorithmic Learning Theory, pages 285–297. Springer, 2007. doi:10.1007/978-3-540-75225-7_24.
  • [38] S. Sen and V. N. Muralidhara. The covert set-cover problem with application to network discovery. In International Conference and Workshops on Algorithms and Computation (WALCOM’10), pages 228–239, 2010.
  • [39] D. Soudry, S. Keshri, P. Stinson, M.-h. Oh, G. Iyengar, and L. Paninski. Efficient “shotgun” inference of neural connectivity from highly sub-sampled activity data. PLoS Computational Biology, 11(12):e1004657, 2015.
  • [40] J. Spencer. Counting extensions. Journal of Combinatorial Theory, Series A, 55:247–255, 1990. doi:10.1016/0097-3165(90)90070-D.
  • [41] H. Spinoza and D. B. West. Reconstruction from the deck of k-vertex induced subgraphs. Journal of Graph Theory, 90:497–522, 2019. doi:10.1002/JGT.22409.
  • [42] S. M. Ulam. A collection of mathematical problems. Number 8 in Interscience Tracts in Pure and Applied Mathematics. Interscience Publishers, New York–London, 1960.
  • [43] M. S. Waterman, T. F. Smith, M. Singh, and W. A. Beyer. Additive evolutionary trees. Journal of Theoretical Biology, 64(2):199–213, 1977.