Abstract 1 Introduction 2 Proof Overview 3 Preliminaries 4 Streaming Algorithm 5 Universal Optimality References

Universally Optimal Streaming Algorithm for Random Walks in Dense Graphs

Klim Efremenko ORCID Ben-Gurion University, Be’er-Sheva, Israel Gillat Kol ORCID Princeton University, NJ, US Raghuvansh R. Saxena Tata Institute of Fundamental Research, Mumbai, India Zhijun Zhang ORCID INSAIT, Sofia University “St. Kliment Ohridski”, Bulgaria
Abstract

Sampling a random walk is a fundamental primitive in many graph applications. In the streaming model, it is known that sampling an L-step random walk on an n-vertex directed graph requires Ω(nL) space, implying that no sublinear-space streaming algorithm exists for general graphs.

We show that sublinear algorithms are possible for the case of dense graphs, where every vertex has out-degree at least Ω(n). In particular, we give a one-pass turnstile streaming algorithm that uses only 𝒪~(L) memory for such graphs. More broadly, for graphs with minimum out-degree at least d, our streaming algorithm samples a random walk using 𝒪~(ndL) memory.

We show that our algorithm is optimal in a strong “beyond worst-case” sense. To formalize this, we introduce the notion of universal optimality for graph streaming algorithms. Informally, a streaming algorithm is universally optimal if it performs (almost) as well as possible on every graph, assuming a worst-case choice of the streaming order. This notion of universal optimality is a key conceptual contribution of our work.

Keywords and phrases:
Random Walk, streaming Algorithm, universal Optimality
Funding:
Klim Efremenko: Supported by the Israel Science Foundation (ISF) through grant No. 1456/18 and European Research Council Grant number: 949707.
Gillat Kol: Supported by a National Science Foundation CAREER award CCF-1750443 and by a BSF grant No. 2018325.
Raghuvansh R. Saxena: Supported by the Department of Atomic Energy, Government of India, under project no. RTI4001.
Zhijun Zhang: This research was partially done while at Princeton University and was partially funded by the Ministry of Education and Science of Bulgaria (support for INSAIT, part of the Bulgarian National Roadmap for Research Infrastructure).
Copyright and License:
[Uncaptioned image] © Klim Efremenko, Gillat Kol, Raghuvansh R. Saxena, and Zhijun Zhang; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithms
Editor:
Shubhangi Saraf

1 Introduction

1.1 Background and Motivation

Sampling a random-walk in a graph is a fundamental problem with broad algorithmic applications. For instance, it underlies techniques for connectivity testing [17], clustering [2, 3, 6, 21], sampling [13], constructing random spanning trees [20], and approximate counting [12]. Another well-known application of random walk sampling is for PageRank estimation [19], where the rank of a webpage corresponds to the likelihood that a user, randomly following hyperlinks, arrives at that page111Formally, in a web graph G=(V,E), PageRank values satisfy the system 𝖯𝖺𝗀𝖾𝖱𝖺𝗇𝗄(u)=(v,u)E𝖯𝖺𝗀𝖾𝖱𝖺𝗇𝗄(v)/dout(v) for every vertex u, where dout(v) denotes the out-degree of v [5].. Since these applications often involve extremely large, real-world networks, an important challenge is to develop space-efficient graph-streaming algorithms that process the graph edge-by-edge rather than storing it explicitly.

The folklore algorithm.

A well-known folklore streaming algorithm for simulating an L-step random walk on an n-vertex directed graph from a given start vertex v0 uses 𝒪~(nL) memory222Throughout, the notation 𝒪~() hides polylogarithmic factors in max(n,L). Algorithms are assumed to know n in advance, but not the length of the stream (the number of edges).. The algorithm works as follows: As the graph’s edges arrive in a stream, maintain for each vertex v a list of L outgoing edges selected independently and uniformly at randomly. To generate the walk, begin at v0 and, at step i[L], move to a new vertex vi by taking the first sampled edge that haven’t been taken before from the precomputed list of outgoing edges at vi1.333For example, if the first two sampled edges from vi1 have already been used, take the third. A single edge (v,u) may appear and be traversed multiple times if it occurs repeatedly in v’s stored list of outgoing edges.

The folklore algorithm was later shown to be optimal [14]; however, the hard instances used by the lower-bound proof crucially involve vertices of very low out-degree. This naturally raises the question of whether better algorithms might exist for dense graphs, which we define as graphs in which every vertex has out-degree at least d for some large d1.444We assume d1, since if a vertex (e.g., v0) has out-degree 0, there may not be any paths of length L starting from v0. We use the minimum out-degree d as our measure of density rather than, say, the average degree (or equivalently, the total number of edges), as, for instance, the graph formed by the union of a clique and a sparse subgraph may still have a high average out-degree. Yet, if v0 is in the sparse subgraph, then sampling a random walk starting from v0 requires sampling a random walk in the sparse subgraph.

1.2 Our Results

1.2.1 Random Walk Sampling for Dense Graphs

The algorithm 𝝅𝒅,𝑳sparse.

We begin by noting that if the input graph has minimum out-degree at least d and Ld is large, the folklore algorithm can be improved to store only 𝒪~(Ldn) edges. Specifically, instead of storing L outgoing edges per vertex, πd,Lsparse stores 𝒪~(Ld) outgoing edges for each vertex. We show that this is enough because, with high probability, no vertex is visited more than that many times during an L-step random walk on a graph with minimum out-degree at least d. We call this algorithm πd,Lsparse.

The distortion of a random walk sampling algorithm is the maximum, over all graphs G, of the statistical distance between the true distribution of random walk on G and the distribution over outputs produced by the algorithm on input G. The distortion of πd,Lsparse is small but nonzero, since, for example, it never outputs the rare walks in which a vertex is visited more than Ld times.

The algorithm 𝝅𝒅,𝑳dense.

Since πd,Lsparse must store at least one sampled edge per vertex (and we assume Ld is large), it never achieves sublinear space. We therefore propose a second, fundamentally different, streaming algorithm, πd,Ldense, which, when d is sufficiently large (relative to L), only uses sublinear memory.

The algorithm πd,Ldense proceeds as follows: Set V0={v0}. Sample L sets of vertices (layers of a graph) V1,,VL, each of size 𝒪~(nd) (if a vertex is sampled in multiple layers, we treat each occurrence as a distinct copy of the original vertex). Intuitively, Vi represents the possible choices for vertex vi. Then, for each i{0,,L1} and vVi, the algorithm samples a random outgoing edge from v into Vi+1, thereby constructing an L-layer graph. The output is the unique path of length L starting from v0 in this layered graph.

It is not hard to verify the correctness of πd,Ldense: πd,Ldense fails in the event that some vertex in a layer has no outgoing edge to the next layer. However, the sizes of the layers are selected such that this event happens with low probability. If the algorithm does not fail and the path (v0,v1,,vL) is sampled, we claim that each vi+1 is a random (outgoing) neighbor of vi, and therefore that the output is a random walk. Indeed, since Vi+1 is a random subset of vertices (of a specific size), choosing a random neighbor of vi within Vi+1 is equivalent to selecting a random neighbor of vi in V. Additionally, the algorithm only stores 𝒪~(ndL) edges, and, in particular, on graphs with minimum out-degree d=Ω(n), its space usage reduces to 𝒪~(L), which can be sublinear in n.

Note that πd,Ldense enjoys the following property: It fails with a small probability, but, conditioned on non-failure, it samples from the true distribution of random walks. πd,Lsparse does not satisfy this property.

1.2.2 Streaming Universal Optimality

The notion of streaming universal optimality.

πd,Ldense is essentially optimal for very dense graphs with d=Ω(n) as it uses only 𝒪~(L) memory and the output consists of L edges. But is πd,Ldense optimal for all values of d?

A worst-case optimality result for a graph streaming algorithm would identify a family of hard graph instances (on which the algorithm uses the most memory) and show that no other correct algorithm can use less memory on these instances. However, for easier instances, the algorithm may be highly suboptimal. This notion of optimality can be viewed as existential optimality: There exist inputs within the considered class for which the algorithm is optimal. A much more ambitious goal is to seek universal optimality, where the algorithm achieves optimal performance on every instance.

Universal optimality [10], along with other “beyond worst-case” notions, has been studied for decades in various computational models, see the great book [18]. In this work, we introduce the notion of universal optimality for streaming algorithms. Informally, a streaming algorithm is universally optimal if it performs (nearly) as well as possible on every graph under worst-case choice of streaming order. Specifically, for any graph G, if the algorithm uses S space on some streaming order of G’s edges, then every other correct algorithm must also use at least (approximately) S space on some (possibly different) streaming order of G. Further discussion of the notion of universal optimality for streaming algorithms appears in Section 1.2.3.

Universally optimal random walk samplers?

A priori, it is not clear that universally optimal algorithms should even exist for the task T of sampling an L-step random walk with distortion, say, 0.1, over the set of graphs 𝒢 consisting of directed graphs with minimum out-degree at least d. Indeed, it is not hard to see that neither πd,Lsparse nor πd,Ldense satisfies such universal optimality for all values of d and L.555For d=Ω(n) and L=n, πd,Lsparse uses Ω(n) memory, while πd,Ldense uses 𝒪~(n), so πd,Lsparse is not universally optimal. If L=n and n vertices have out-degree Ω(n) while the rest have constant out-degrees, d is constant. In this case, πd,Ldense uses Ω(n1.5) memory, whereas πd,Lsparse uses only 𝒪~(n) (as each of nn constant out-degree vertices only stores a constant number of edges), showing that πd,Ldense is not universally optimal.

Nevertheless, we show that with a careful implementation, πd,Lsparse and πd,Ldense can be combined into a universally optimal algorithm πd,L as follows: If Ld is greater than (roughly) logn, run πd,Lsparse; otherwise, run πd,Ldense.

Theorem 1.

For d,L, πd,L solves the task of sampling an L-step random walk, with distortion 0.1, from a given vertex in any directed graph with minimum out-degree d. It uses at most 𝒪~(vmin(dout(v),Ld))𝒪~(ndL) space.

Furthermore, it is universally optimal over the set of directed graphs with minimum out degree at least d, for any d,Llog10n.

We note that πd,L does not work for dynamic streams, since it relies on reservoir sampling [23] to select the outgoing edges to be stored. Nevertheless, by replacing reservoir sampling with 0-samplers [15], we can obtain a new algorithm that works for dynamic streams and uses 𝒪~(ndL) space. However, this algorithm is not (insertion-only) universally optimal with a straightforward implementation666Intuitively, it is possible that a vertex v has Ω(L/d) outgoing edges during the stream while its actual out-degree is very small at the end. For πd,Lsparse, this means Ω(L/d) 0 samplers are needed. In contrast, if only insertion-only streams are considered, πd,Lsparse can simply store all outgoing edges if dout(v)<L/d.. In Section 1.2.3, we will show that this turns out to be the best we can do.

As pointed out in [10], an interesting “side-effect” of universal optimality is that it precisely identifies the parameters of the problem that are inherently responsible for its complexity. In our case, the minimum out-degree d of the graph dictates the optimal space complexity (at least when d is known to the algorithm and within certain parameter regimes). In particular, once d is specified, the optimal space requirement no longer depends on other parameters (such as the average out-degree, the minimum total degree, or the length of the longest directed path in the graph).

Do we need to know 𝒅?

The algorithm πd,L requires advance knowledge of d. A natural question is whether one can design a one-pass streaming algorithm with comparable space usage that does not rely on knowing d beforehand777This can be done in just two passes: the first pass computes the minimum out-degree d, and the second pass runs πd,L using the degree d computed in the first pass..

Specifically, let d be sufficiently large for Theorem 1 to hold, and let πL be an algorithm for sampling an L-step random walk in any graph with minimum out-degree at least d (but unknown to πL). Let G be a graph with minimum out-degree d=Ω(n). Running πd,L on G requires Ω(ndL) space. By the universal optimality of πd,L guaranteed by Theorem 1, it follows that any algorithm that works for all graphs with minimum out-degree at least d, including πL, must also use Ω~(ndL) space on some streaming order of G. In contrast, πd,L requires only 𝒪~(L) space on every streaming order of G. Hence, algorithms that do not know d in advance cannot achieve the same space as πd,L.

1.2.3 Further Discussion of the Notion of Universal Optimality

Universal optimality for dynamic streaming algorithms.

Our definition of universal optimality considers an algorithm τ that works for insertion-only streams, and, for every graph, compares its space usage to that of any other algorithm τ that works for insertion-only streams. One can consider the same definition where τ and τ work for dynamic streams. However, we observe that notion of universal optimality for dynamic streaming algorithms coincides with standard worst-case optimality.

To see this, suppose τ is a worst-case optimal algorithm for some task T on dynamic streams. Fix n. Let S denote the maximum memory used by τ on graphs with n vertices. By (worst-case) optimality of τ, for every other algorithm τ that solves T, there exists a stream of edge updates Eτ on n vertices such that τ requires at least S memory on Eτ. Now take any graph G on n vertices and consider an algorithm τ for T. Construct the following stream for G: First insert/delete edges according to Eτ, then reverse all of them, and finally insert the edges of G. Since τ already uses at least S memory on the prefix Eτ, it uses at least S memory on this streaming order of G. In contrast, τ never exceeds S memory on any stream. Hence, τ is also universally optimal.

Universal optimality vs. instance optimality.

Our definition of universal optimality requires the algorithm to performs (nearly) as well as any competitor on every graph G, when comparing their worst-case memory usage across all possible streaming orders of G. However, this leaves open the possibility that on specific streaming orders of G, a competing algorithm could do substantially better. A stronger requirement would be to guarantee optimality on every stream, not just every graph. Concretely, for any edge stream I, the memory used by any competing algorithm on I must be at least as large as that used by our algorithm on the same I. This stronger guarantee corresponds to adapting the notion of instance optimality, where an algorithm must be optimal on each individual instance, to the context of graph streaming algorithms [9] (see also [1, 22]).

However, it turns out that instance optimality makes little sense. Suppose d=n and L=n. Consider an n-vertex graph G which is the disjoint union of n cliques of size n each, along with the input stream where edges are given in lexicographical order. We can construct an algorithm τ which checks if the input stream is exactly the edges of G given in lexicographical order. This can be done in only 𝒪~(1) additional space because the edges of G can be easily generated in lexicographical order in such space. As soon as the check fails, τ simulates πd,L from the beginning of the stream. This is possible because the prefix of the stream can be regenerated. If the check passes at the end, then an L-step random walk can be easily generated in 𝒪~(L) space as we know all the edges (implicitly). Clearly, τ always performs as well as πd,L. Furthermore, τ is strictly better than πd,L on the specific instance of G and a lexicographical order of the edges of G.

More generally, the above “hard-coding” method can apply to any graph problem, if we can design a more space-efficient algorithm tailored to specific instances that have succinct representations and can be checked in small space. However, note that such things cannot happen with universal optimality as there is no way to check, using small space, if the input graph falls into certain categories.

1.3 Additional Related Work

In an influential work, [19] give an 𝒪~(L)-pass, 𝒪~(n)-space algorithm for simulating L-step random walks on directed graphs, and used it to derive space-efficient algorithms for estimating PageRank on graph streams. For one-pass streaming, [14] proved an Ω(nL) space lower bound, thereby proving the optimality of the folklore algorithm. For undirected graphs, [14] also gave an 𝒪(nL)-space algorithm and a matching lower bound for sufficiently large L. Recently, [7] showed that the space complexity of random walk sampling on directed graphs is Θ~(nL) when two passes over the stream are allowed. Additionally, [16, 8] studied the random walk sampling problem in the random-order streaming model.

[4] studies sublinear algorithms for MAXCUT and correlation clustering. For dense graphs, constant-space algorithms are known, whereas for sparse graphs, Ω(n) space lower bounds hold in the streaming model. Their work bridges these two extremes by providing algorithms whose performance is parameterized by the average degree.

2 Proof Overview

We overview the proof of Theorem 1 in this section. As the algorithm mentioned was already described in Section 1, we mostly focus on the proof of universal optimality. However, we do give some extra intuition for the algorithm πd,Ldense.

2.1 The Sampling Algorithm 𝝅𝒅,𝑳dense

Just like πd,Lsparse, our algorithm πd,Ldense is based on the folklore algorithm but develops it from a different perspective than πd,Lsparse. Recall that the standard view of the folklore algorithm interprets its memory as a set {Lv}vV of n lists, where each Lv stores L sampled outgoing edges from v. The algorithm πd,Lsparse is obtained by simply shortening these lists by a factor of d and a proof that such a shortening will not affect the correctness of the algorithm.

For the algorithm πd,Ldense, we reinterpret the memory of the folklore algorithm as maintaining an L-layered graph rather than a collection of lists. Each layer contains a copy of every vertex v, and the (only) outgoing edge of v in layer i points to some vertex in layer i+1. Specifically, if the i-th edge in Lv is (v,u), then the copy of v in layer i has a single outgoing edge to the copy of u in layer i+1. This perspective is equivalent to viewing the folklore algorithm’s memory as an n×L matrix of edges, where the row corresponding to vertex vV encodes Lv. In this representation, the i-th column specifies the edges connecting layer i to layer i+1 in the layered graph.

In this matrix (or layered-graph) view, πd,Lsparse corresponds to restricting the number of columns (equivalently, the number of layers) from L to Ld. By contrast, our algorithm πd,Ldense reduces the number of rows (equivalently, the size of each layer) from n to nd, using the guarantee on the minimum out-degree to show that such a reduction does not affect the correctness of the algorithm. It is easy to see that this memory restriction of the folklore algorithm yields exactly the algorithm πd,Ldense described in Section 1.2.1.

The property testing approach.

Prior work has sometimes been motivated by a query model perspective to design algorithms for dense graphs. As such a perspective also yields an algorithm for our problem, we outline it here. However, the algorithm so obtained turns out to be suboptimal.

In the query model, a random walk can be efficiently simulated if adaptivity is allowed. Starting from v0, we select a random neighbor v1 by querying random entries in the adjacency row of v0 until a 1 is found, then proceed similarly to select a neighbor v2 of v1, and so on. Consider, for instance, the case where minimum degree at least d=Ω(n). In this case, the expected number of queries needed to find the next vertex is constant. Consequently, the entire process requires roughly Θ(L) queries in expectation.

To obtain a one-pass streaming algorithm, we aim to transform the above adaptive query algorithm into a non-adaptive one, which are easily simulated by a streaming algortihm. The beautiful technique of [11] shows how to convert an adaptive query algorithm with q queries into a non-adaptive one by sampling an induced subgraph on 𝒪(q) vertices, which requires 𝒪(q2) queries.888While [11] state their result in the context of converting adaptive graph property testers into non-adaptive ones, their technique can be applied to convert the adaptive random walk algorithm above into a non-adaptive version. However, applying this idea directly to our adaptive random walk sampling algorithm yields a memory complexity of 𝒪~(L2) when d=Ω(n), which is not optimal as it is worse than the bound we get.

2.2 Universal Optimality

We now overview the proof that our algorithm is universally optimal. For this, we first describe a lower bound approach designed for directed bipartite graphs that has been helpful in prior works as well. Consider a directed bipartite graph G=(V1V2,E) and an integer d>0 such that every vertex vV1 has out-edges to an independent uniformly random subset of size dout of vertices in V2. The vertices in V2 have only one out-edge and all these edges lead to the same vertex zV1 that is chosen uniformly at random.

From a “sampling” problem to a “search” problem.

Because all the vertices in V2 lead to the same vertex z, a random walk starting from any vertex in V2 would first go to z, then take a uniformly random edge going out of z, then go to z again, and so on for L steps. This means that any such walk contains within it, a sequence of L/2 edges going out of z chosen independently and uniformly at random. As the number of distinct out-edges of z is dout, we get that with high probability at least Ω(min(dout,L)) of these edges are distinct. We conclude that any algorithm that samples a random walk in this graph can also be used to solve a search problem, where the goal is to identify a subset of edges of size Ω(min(dout,L)) coming out of z, with high probability.

Lower bound for the search problem.

However, it can be shown that the search problem is hard for streaming algorithms. Indeed, consider what happens when the edges in the graph are presented in a stream with all the edges going out of the vertices in V1 appearing first in the stream followed by the edges going out of the vertices in V2. In this case, when the algorithm is processing the edges going out of the vertices in V1, it has no information about the vertex z for which it has to remember the out-edges. Thus, unless the algorithm records enough out-edges from a lot of vertices in V1, it is highly likely that when z is chosen uniformly at random from the vertices in V1, the algorithm would not have stored enough edges coming out of z.

In other words, in order to output Ω(min(dout,L)), the algorithm actually needs to remember this number of edges from all the vertices in V1, thereby needing at least Ω(|V1|min(dout,L)) memory. Observe that this proof is an adaptation of the lower bound proof for the well-known Index problem.

Further discussion and extensions of this approach.

Note that it is important in the above proof that every vertex in V1 has the same number dout of edges going out of it. In case the number of out-edges going out of different vertices in V1 is vastly different, the algorithm can focus on remembering only the edges from vertices that have few out-edges, and hope that when z is sampled it happens to be one of these vertices. This could mean a worse space lower bound.

Also, observe that the instances considered in this lower bound approach are such that all vertices in V2 have out-degree 1. However, for all dmin(dout,|V1|), it can be extended to the case where the out-degree of every vertex is at least d by letting all the vertices in V2 have out-edges to the same subset ZV1 of size d of vertices in V1 instead of having edges to exactly one vertex zV1.

With this change, a random walk would not always return to the same vertex in V1 and instead returns to a uniformly random vertex in Z. Thus, for any given vertex in Z, a random walk will only return to it (roughly) L/(2d) many times. In turn, we have that the algorithm only needs to store around L/(2d) many edges coming out of any vertex. As this number is now smaller, this would lead to a reduced space lower bound of Ω(|V1|min(dout,Ld)). This is exactly where the expression in Theorem 1 comes from, but its not the end of the proof, as the input graph is not guaranteed to be bipartite or promised to have the same number of edges from all vertices in V1.

Extending to general graphs 𝑮.

We now turn to show how the above approach can be extended to work for all graphs and give a lower bound of Ω~(vmin(dout(v),Ld)) by embedding a hard instance of the form above in any input graph G. For this, we “clean” any input graph G in a sequence of steps, ultimately leading to a hard instance like those above.

Our first step is to identify a “dominant” degree in the graph. For this, we partition the vertices into logn levels based on which two consecutive powers of 2 its out-degree lies in. This also partitions the sum we want in our lower bound and we focus on the part that is the largest. This gives a subset of vertices VV of vertices that have similar degrees and suffices for the lower bound up to logarithmic factors.

Having decided on a set V to focus on, we now identify a cut in the graph G such that a constant fraction of the vertices in V lie on the left side of the cut, and a constant fraction of the neighbors of all these vertices lie on the right side of the cut. To show that such a cut exists, note that a random cut in the graph will have the property such that in expectation, a constant fraction of the neighbors of any vertex lie on a different side of the cut as that vertex. Using probabilistic arguments, we show that a cut exists for which this is true for a constant fraction of the vertices in V.

We are almost there, except that the vertices in V don’t have exactly the same degree which was assumed in our proof approach. This can be easily fixed by removing some of the edges from all vertices in V till they have the same degree. As they had degrees within a factor of 2 to being with, this loses only a constant fraction of edges and only costs us another constant factor in our lower bound. Another difference is that we started with a graph where every vertex is promised to have out-degree at least d. However, as we removed the edges not in the cut, we may have shrunk the out-degrees of vertices. As this may make some out-degrees smaller than d, it may take the resulting graph outside the promise, effectively making it useless. To fix this, we identify another set of 𝒪(d) randomly chosen vertices in the graph and connect them with each other in a clique and also connect all the other vertices to them. This increases the degrees of all vertices back to at least d, satisfying the promise. Moreover, as999When d and n are within a constant factor of each other, our bound becomes 𝒪~(L) which is tight as it is the size of the output. d=o(n), this only loses us another constant in the lower bound.

A hard distribution.

Finally, note that the above construction only gave us a single graph while our lower bound approach required a distribution of graphs. This is a major difference, as there are algorithms that will solve the search problem on this particular graph (one can simply “hardwire” the answer for this graph in the algorithm) making a lower bound impossible. We get around this by converting our hard graph into a distribution over graphs as follows: For every vertex in V, we only include a randomly chosen subset of half its edges in the graph and the other edges are discarded. This again only costs us a constant factor but creates for us a distribution just like we need for the lower bound approach.

We finish this section by arguing why a lower bound obtained from such a hard distribution after making all these changes would also apply to the original graph G, as required for universal optimality. For this, we observe that our lower bound approach shows a lower bound on the memory needed at a specific point in the stream, when it has processed all the edges coming out from V1 but before processing any edges that come out from V2. In our constructed distribution, the set V1 corresponds to the set V and thus, the inputs streams we generate will start with the edges coming out of the vertices in V and the memory lower bound will hold for the memory state of the algorithm immediately after processing the edges. Crucially, (unlike the edges going into V or the edges added to raise the degree) all these edges were also part of the original graph G. Thus, for all the algorithm knows, the graph might as well have been G with these edges presented first in the stream, and the lower bound we show for our constructed distribution will also apply to the original graph G, finishing the proof.

3 Preliminaries

Notation.

For integer n>0, we write [n] as a shorthand for {1,,n}. denotes the set of positive integers. We often omit ceiling symbols, and by assigning a real number x to an integer y, we mean that y gets x.

Graphs.

For a directed graph G=(V,E) and any vertex vV, we define NoutG(v) to be the set of all out-edges of v and define doutG(v)=|NoutG(v)| to be the out-degree of v. We omit the superscript G when it is clear from context. We use n=|V| throughout.

Random walks.

An L-step random walk in G with starting vertex v0V is a (directed) path (v0,,vL) of length L in G such that vi is an independent and uniformly random out-neighbor of vi1 for all i[L]. We say an algorithm samples an L-step random walk starting from v0 with distortion δ0, if the sampled distribution of (v0,,vL) is δ-close in statistical distance to the true distribution of an L-step random walk starting from v0.

Graph streaming.

An (insertion-only) graph streaming algorithm takes as input a sequence of edges of an underlying graph, and is required to process the edges one at a time using limited space. At the end of the stream, the algorithm needs to return an output for certain graph problems. In this paper, we consider the task of sampling random walks.

Universal optimality.

A graph streaming algorithm τ is universally optimal for a task T over a set of graphs 𝒢 if:

  1. 1.

    τ solves T.

  2. 2.

    For any streaming algorithm τ that also solves T, the following holds: Let G be any graph in 𝒢 and let S be the maximum memory size used by τ over all streaming orders of G. Then, the maximum memory size used by τ over any streaming order of G is at most Spolylogn.

Chernoff bound.

We use the following version of the Chernoff bound.

Lemma 2 (Multiplicative Chernoff bound).

Suppose 𝖷1,,𝖷n are independent random variables taking values in [0,1]. Let 𝖷 denote their sum. Then,

Pr(𝖷(1+δ)𝔼[𝖷]) eδ2𝔼[𝖷]2+δ, 0δ,
Pr(𝖷(1δ)𝔼[𝖷]) eδ2𝔼[𝖷]2, 0δ1.
Information theory.

We use sans-serif letters to denote random variables. We reserve to denote an arbitrary event. All random variables will be assumed to be discrete and we shall adopt the convention 0log10=0. When it is clear from context, we may abbreviate an event 𝖷=x as just x. All logarithms are taken with base 2.

Definition 3 (Entropy).

The (binary) entropy of 𝖷 is defined as:

(𝖷)=x𝗌𝗎𝗉𝗉(𝖷)Pr(x)log1Pr(x).

The entropy of 𝖷 conditioned on is defined as:

(𝖷)=x𝗌𝗎𝗉𝗉(𝖷)Pr(x)log1Pr(x).
Definition 4 (Conditional entropy).

We define the conditional entropy of 𝖷 given 𝖸 and as:

(𝖷𝖸,)=y𝗌𝗎𝗉𝗉(𝖸)Pr(y)(𝖷y,).

Henceforth, we shall omit writing the 𝗌𝗎𝗉𝗉() when it is clear from context.

Lemma 5 (Chain rule for entropy).

It holds for all 𝖷, 𝖸, 𝖹 and that:

(𝖷,𝖸𝖹,)=(𝖷𝖹,)+(𝖸𝖷,𝖹,).
Lemma 6 (Conditioning reduces entropy).

It holds for all 𝖷, 𝖸, 𝖹 and that:

(𝖷𝖸,𝖹,)(𝖷𝖹,).

Equality holds if and only if 𝖷 and 𝖸 are independent conditioned on 𝖹, .

As a corollary of Lemmas 5 and 6, we get that entropy is subadditive.

Corollary 7 (Subadditivity of entropy).

It holds for all 𝖷, 𝖸, 𝖹 and that:

(𝖷,𝖸𝖹,)(𝖷𝖹,)+(𝖸𝖹,).
Lemma 8.

It holds for all 𝖷 and that:

0(𝖷)log(|𝗌𝗎𝗉𝗉(𝖷)|).

The second inequality is tight if and only if 𝖷 conditioned on is the uniform distribution over 𝗌𝗎𝗉𝗉(𝖷).

As a corollary of Lemmas 5 and 8, we also have the following bound.

Corollary 9.

It holds for all 𝖷, 𝖸, 𝖹 and that:

(𝖷𝖸,𝖹,)(𝖷𝖹,)(𝖸𝖹,).
Lemma 10 (Shearer’s inequality).

Let n>0 and 𝖷1,,𝖷n be random variables. Let 𝖷=(𝖷1,,𝖷n) and 𝖹[n] be a set-valued random variable that is independent of 𝖷. For any event determined by 𝖷, it holds that:

(𝖷)mini[n]Pr(i𝖹)(𝖷𝖹𝖹,).
Proof.

We have:

(𝖷𝖹𝖹,) =ZPr(Z)(𝖷𝖹Z,) (Definition 4)
=ZPr(Z)(𝖷ZZ,)
=ZPr(Z)(𝖷Z) (as 𝖹 is independent of 𝖷,)
=ZPr(Z)iZ(𝖷i𝖷Z[i1],) (Lemma 5)
ZPr(Z)iZ(𝖷i𝖷[i1],) (Lemma 6)
=i=1nZPr(Z)𝟙(iZ)(𝖷i𝖷[i1],)
mini[n]Pr(i𝖹)i=1n(𝖷i𝖷[i1],)
=(𝖷)mini[n]Pr(i𝖹). (Lemma 5)

Corollary 11.

Let n>0 and 𝖷1,,𝖷n be random variables. Let 𝖷=(𝖷1,,𝖷n) and 𝖹[n] be a set-valued random variable that is independent of 𝖷. For any random variable 𝖸 determined by 𝖷, it holds that:

(𝖷𝖸)mini[n]Pr(i𝖹)(𝖷𝖹𝖹,𝖸).

4 Streaming Algorithm

The goal of this section is to prove the following Theorem 12. Algorithms πd,L, πd,Lsparse, and πd,Ldense, are given in Algorithms 1, 2, and 3, respectively. By “output Fail,” we mean that the algorithm outputs (v0,,v0). We will elaborate on the details of their streaming implementation later on.

Algorithm 1 The protocol πd,L(G,v0).
Theorem 12.

Let d,L. πd,L solves the task of sampling an L-step random walk with distortion 0.1. It uses 𝒪~(vmin(dout(v),L/d)) space.

As Algorithm 1 is simply a combination of Algorithms 2 and 3, Theorem 12 is a direct corollary of the following Lemmas 13 and 14. Indeed, when L/d103logn, we know that L/d is larger than min(dout(v),L/d) by a multiplicative factor of at most 103logn as dout(v)d1. Note that vmin(dout(v),L/d) can be sublinear in n when L/d1.

We also emphasize that the proof of Lemma 14 does not require L/d103logn. In other words, Algorithm 3 alone is sufficient to achieve 𝒪~(Ln/d) space in the worst case. However, as noted in Section 1, Algorithms 2 and 13 are necessary for achieving universal optimality.

Algorithm 2 The protocol πd,Lsparse(G,v0).
Lemma 13.

Let d,L such that L/d>103logn. πd,Lsparse solves the task of sampling an L-step random walk with distortion 0.1. It uses 𝒪~(vmin(dout(v),L/d)) space.

Algorithm 3 The protocol πd,Ldense(G,v0).
Lemma 14.

Let d,L. πd,Ldense outputs Fail with probability at most 0.1 and otherwise samples an (unbiased) L-step random walk. It uses 𝒪~(Ln/d) space.

The following two sections constitute proofs of Lemmas 13 and 14, respectively.

4.1 Proof of Lemma 13

The correctness of Algorithm 2 follows from the following claim.

Claim 15.

Let d,L such that L/d>103logn. For any directed graph G=(V,E) with minimum out-degree at least d and vertex v0V, with probability at least 0.9, an L-step random walk from v0 in G visits any vertex at most 2L/d times.

Proof.

Fix vV. For i[L], let Vi be the random variable for the i-th vertex on an L-step random walk from v0 in G. For any sub-path (v0,,vi1), Pr(Vi=vV0=v0,,Vi1=vi1)1d because vi1 has at least d outgoing edges, with vi possibly being one of them.

Note that the events Vi=v may not be independent. Nevertheless, as each step of a random walk is sampled independently, the probability of v being visited more than 2L/d times is always upper bounded by the probability that L independent biased coins, each with 1/d probability of getting a head, generate more than 2L/d heads in total. By Lemma 2 and the assumption L/d>103logn, this happens with probability at most 1/n2. The claim then follows by union bounding over all vertices.

As to the space usage, Algorithm 2 can be implemented using reservoir sampling [23]. Concretely, for each vertex vV, it takes 𝒪(logn) bits to keep track of dout(v). Meanwhile, it stores all its outgoing edges while dout(v)2L/d. Once dout(v)>2L/d during the stream, it can use 2L/d independent reservoir samplers, each taking 𝒪(logn) bits. So each vertex v takes 𝒪~(min(dout(v),L/d)) space. Since the reservoir samplers are unbiased, this concludes the proof of Lemma 13 by applying Claim 15.

4.2 Proof of Lemma 14

To implement Algorithm 3, for each iL and vertex vVi1, one reservoir sampler is enough since all Vi’s can be sampled in advance. The space bound follows as L is at most poly(n).

We now show that conditioned on no failure, Algorithm 3 samples an (unbiased) L-step random walk. In other words, for every i[L], vi has to be a uniformly random out-neighbor of vi1 in G. To this end, consider any two out-neighbors u,u of vi1 in G, and subset UV of size s. Suppose Vi=U. If u,uU, each of them has the same probability of being sampled as vi. If u,uU, none of them would be sampled as vi. If exactly one of u,u is in U, assume uU without loss of generality. Since Vi is a uniformly random subset of size s, U=(U{u}){u} has the same probability as U of being sampled as Vi. Furthermore, the number of out-neighbors of vi1 in Vi is the same in the two cases. As a result, the probability of u being sampled as vi in the case Vi=U is the same as the probability of u being sampled as vi in the case Vi=U. Altogether, this means that the overall probability of being sampled as vi is the same for u,u.

It then remains to show that Algorithm 3 outputs Fail with probability at most 0.1. Note that the only possibility of failure is when there exists i[L] and vertex vVi1 such that v has no out-neighbor in Vi. This happens only if s=n/dln(10Ln). For any i[L] and vVi1, the probability of v not having out-neighbors in Vi is at most

(nds)(ns)=j=1sndj+1nj+1(ndn)sexp(dsn)=110Ln.

This concludes the proof of Lemma 14 by union bounding over all 1+s(L1)LsLn options for i,v.

5 Universal Optimality

The goal of this section is to show that the algorithm described in Algorithm 1 is universally optimal upto logarithmic factors.

We define a triple of positive integers L,d,n to be significant if n>0 is large enough for asymptotic inequalities to holds and L(logn)10 and (logn)10dn(logn)10.101010If dn/(logn)10, πd,L (or simply πd,Ldense) uses at most 𝒪~(L) space. This must be universally optimal as outputting a length-L path requires Ω(L) space. For d,n>0, we define the set 𝒢n,d to be the set of all directed simple graphs where there are n vertices and every vertex has out-degree at least d. We say that a streaming algorithm 𝖠𝗅𝗀 is a random-walk algorithm if given significant integers L,d,n, a graph G=(V,E)𝒢n,d and a start vertex V, the algorithm 𝖠𝗅𝗀 samples an L-step random walk from the start vertex with distortion at most 0.1. The theorem we show in this section is the following Theorem 16. Theorem 1 then follows directly from Theorem 16 and Theorem 12.

Theorem 16.

Let L,d,n be significant, 𝖠𝗅𝗀 be a random-walk algorithm, and G=(V,E)𝒢n,d with a designated start vertex. There exists a streaming order for the edges of G on which 𝖠𝗅𝗀 needs memory at least 1(logn)4vVmin(dout(v),Ld).

The proof of Theorem 16 spans this entire section. Fix L,D,n,𝖠𝗅𝗀, and G be as in the theorem statement. As G𝒢n,d, we have that for all vV, it holds that ddout(v)<n. This means that there exists logd<klogn such that:

1lognvVmin(dout(v),Ld)vV2k1dout(v)<2kmin(dout(v),Ld). (1)

We fix this k for the rest of this proof and define V to be the set of vertices in the sum on the right. Namely,

V={vV2k1dout(v)<2k}. (2)

Our first lemma shows that the set V is large.

Lemma 17.

It holds that |V|dlogn.

Proof.

We first prove this under the assumption that Ld2. In this case, we have Ldddout(v) for all vV. Plugging this in Equation 1, we get 1lognnLd|V|Ld which gives the lemma as dn. Now, assume that L>d2. This means that Ld>d implying that the minimum in the left hand side in Equation 1 can be lower bounded by d while the minimum on the right hand side can be upper bounded by n. Plugging in, we get 1lognndn|V|. Rearranging gives the result.

We use PrUV() and 𝔼UV[] to denote the probability and expectation when U is a subset of vertices chosen uniformly at random. For any subset UV, we define 𝖼𝗎𝗍(U)=E((U×U¯)(U¯×U)) to be the set of edges in the cut formed by U. Also, for all UV, define:

𝖡𝗂𝗀(U)={vV|Nout(v)𝖼𝗎𝗍(U)|0.1dout(v)}. (3)

We show that there exists UV such that many of the vertices in V lie in 𝖡𝗂𝗀(U)U.

Lemma 18.

There exists UV satisfying:

|V𝖡𝗂𝗀(U)U|15|V|.
Proof.

For any vV such that dout(v)10, we have that v𝖡𝗂𝗀(U) (for some UV) if and only if all its out-neighbors are on the same side of the cut as v. As every vertex has at least one out-neighbor, we get that this event happens with probability at most 12 for such v for a uniformly random UV. For vV such that dout(v)10, we have by Lemma 2 that:

PrUV(v𝖡𝗂𝗀(U))PrUV(|Nout(v)𝖼𝗎𝗍(U)|0.1dout(v))e0.1dout(v)<12.

Combining, we get that PrUV(v𝖡𝗂𝗀(U))12 for all vV. It follows that:

𝔼UV[V𝖡𝗂𝗀(U)]12|V|.

In particular, there exists a cut U for which V𝖡𝗂𝗀(U)12|V|. From this, we get that at least one of the quantities V𝖡𝗂𝗀(U)U and V𝖡𝗂𝗀(U)U¯ is at least 15|V|. To finish the proof, observe from Equation 3 that 𝖡𝗂𝗀(U)=𝖡𝗂𝗀(U¯) and thus, at least one of U and U¯ satisfies the lemma.

Next, we fix U to be the set promised by Lemma 18 and define U=V𝖡𝗂𝗀(U)U and d=0.1max(d,2k1). It follows that |U|15|V|d5logn by Lemma 17. Moreover, for all vU, we have that

|Nout(v)({v}×U¯)||Nout(v)({v}×U¯)|=|Nout(v)𝖼𝗎𝗍(U)|0.1dout(v)0.1max(d,2k1)=d. (4)

Next, we isolate a suitably chosen set of vertices W1V.

Lemma 19.

There exists a subset W1V such that all the following hold:

2d|W1| 4d,
|UW1| 12|U|,
vU:|Nout(v)({v}×(U¯W1¯))| 12|Nout(v)({v}×U¯)|.
Proof.

Proof by the probabilistic method. Consider the set-valued random variable 𝖶1 obtained by including all vV with probability 3d/n. By Lemma 2 and using the fact that d>(logn)10, we get that the first inequality holds except with probability 1n2. For the other inequalities, consider any set SV such that |S|(logn)5. Note that because of our bound |U|d5logn and the bound in Equation 4, we have that U and111111Technically, as it needs to be a subset of V, the projection of Nout(v)({v}×U¯) on its second coordinate is such a set, but the same argument applies. Nout(v)({v}×U¯) for all vU are such sets. As dn(logn)10 implies that 3d/n0.01, we have from Lemma 2 that Pr(|S𝖶1|12|S|)1n2 for any such set S. A union bound over all sets S considered in the lemma statement finishes the proof.

Henceforth, we let W1 be as promised by Lemma 19. We also define W2=UW1 and conclude from Lemma 19 that |W2|12|U|d10logn. Finally, we define W3=V(W1W2). For all vW2, we have:

|Nout(v)({v}×W3)|=|Nout(v)({v}×W1W2¯)|=|Nout(v)({v}×UW1¯)|12|Nout(v)({v}×U¯)|d/2. (5)

For all vW2, let Ev be an (arbitrary) subset of its out-edges going to vertices in W3 of size d/2. Observe from Equation 5 that at least one such subset exists and therefore, Ev is well defined.

Next, for all vW2, define the set-valued random variable 𝖷vEv to be an independent and uniformly random subseteq of Ev of size d/4. For notational convenience, for any subset ZW2, we will abbreviate (𝖷v)vZ to 𝖷Z and omit the subscript when Z=W2. Also, define the set-valued random variable 𝖹W2 to be a uniformly random subset of W2 of size d20logn chosen independently of 𝖷. As |W2|d10logn, we have that the random variable 𝖹 is well defined. The following lemma is straightforward.

Lemma 20.

Let vW2 and SvEv. It holds that:

|{Xv𝗌𝗎𝗉𝗉(𝖷v)SvXv}|(d/2d/4)2|Sv|.
Proof.

The size of the set we have to bound equals (d/2|Sv|d/4|Sv|)(d/2d/4)2|Sv|.

5.1 A “Search” Lower Bound

Before showing the desired lower bound on the memory used by the algorithm 𝖠𝗅𝗀, we first define a related distributional search problem and show a lower bound for algorithms solving that. For this, let 𝖷=(𝖷v)vW2 and 𝖹 be as defined above. We use these random variables to define a graph G with the vertex set V=W1W2W3 and the following edges:

  1. (1)

    For all vV and all vvW1, there is an edge from v to v. These edges are called the fixed edges.

  2. (2)

    For all vW2, all the edges in Xv are present in G. These edges are called the 𝖷-edges.

  3. (3)

    For all vW1 and all vZW2, there is an edge from v to v. These edges are called the 𝖹-edges.

We let 𝖦 be the random variable corresponding to G. Note that the random variables 𝖦 and (𝖷,𝖹) determine each other, and, more specifically, the 𝖷-edges and 𝖷 determine each other and the 𝖹-edges and 𝖹 determine each other. Also, note from the fact that |W1|2d (see Lemma 19) that the out-degree of any vertex in any graph G in the support of 𝖦 is at least d. It follows that the support of 𝖦 is a subset of 𝒢n,d.

Having defined the random variable 𝖦, the search problem we consider receives the edges of a graph G𝖦, which we can equivalently view as a pair (X,Z), in the stream, with the 𝖷-edges arriving first (in lexicographic order), then the other edges (in lexicographic order). The goal of the algorithm is to make one-pass over the stream, and with probability at least 0.8, output at least d(logn)3min(d,Ld) distinct edges in vZXv.

Consider any streaming algorithm 𝒜 that solves the above problem. As we study 𝒜 on a specific distribution of inputs (specifically, the distribution 𝖦), we can assume without loss of generality that 𝒜 is deterministic. Let 𝖬 the random variable denoting the memory state of the algorithm 𝒜 after it has processed the 𝖷-edges but before processing the other edges and let 𝖮 be the random variable denoting the output of 𝒜. As 𝒜 is deterministic, observe that 𝖬 is a function of 𝖷 and that 𝖮 is a function of (𝖬,𝖹). We abbreviate events of the form 𝖬=M to just M, 𝖹=Z to just Z, and so on.

Lemma 21.

It holds that:

(𝖷𝖹𝖹,𝖬)d20lognlog(d/2d/4)12d(logn)3min(d,Ld).
Proof.

Let 𝖤 be an indicator random variable that is 1 if and only if the algorithm 𝒜 is correct on input 𝖦 (which determines and is determined by (𝖷,𝖹)). We have:

(𝖷𝖹𝖹,𝖬) (𝖷𝖹𝖹,𝖬,𝖤)+(𝖤𝖹,𝖬) (Corollary 9)
1+(𝖷𝖹𝖹,𝖬,𝖤) (Lemma 8)
1+Z,M,EPr(Z,M,E)(𝖷𝖹Z,M,E) (Definition 4)
1+Z,M,EPr(Z,M,E)(𝖷ZZ,M,E).

We now use the upper bound for the entropy in Lemma 8. For the terms corresponding to E=0, we use Lemma 20 with Sv= to get |𝗌𝗎𝗉𝗉(𝖷v)|(d/2d/4) for all vZ. For the terms corresponding to E=1, we use the fact that Z and M together determine the output of the algorithm, which when E=1, determines d(logn)3min(d,Ld) of the edges in vZXv. By Lemma 20, this means that conditioned on Z, M, E=1, the random variable 𝖷Z is supported on a set of size at most ((d/2d/4))d20logn2d(logn)3min(d,Ld). We get:

(𝖷𝖹𝖹,𝖬) 1+d20lognlog(d/2d/4)Pr(𝖤=1)d(logn)3min(d,Ld)
1+d20lognlog(d/2d/4)0.8d(logn)3min(d,Ld) (As Pr(𝖤=1)0.8)
d20lognlog(d/2d/4)12d(logn)3min(d,Ld). (As L,d(logn)10 and d0.1d)

Combining Lemma 21 with Corollary 11, we get:

(𝖷𝖬)d20logn1|W2|d20lognlog(d/2d/4)12d(logn)3min(d,Ld).

Using the fact that (𝖷)=|W2|log(d/2d/4) and simplifying, we get:

(𝖷𝖬)(𝖷)10|W2|(logn)2min(d,Ld).

From Corollary 9 and the fact that d2k/20, this means that:

(𝖬)12|W2|(logn)2min(2k,Ld).

Now, from Equation 1 and as |W2|12|U|110|V|, we get:

(𝖬)1(logn)4vVmin(dout(v),Ld).

As (𝖬) is upper bounded by the logarithm of the size of the support of 𝖬, we have that the algorithm 𝒜 must use at least 1(logn)4vVmin(dout(v),Ld) bits of memory.

5.2 Finishing the Proof

To finish the argument, we prove that the streaming 𝖠𝗅𝗀 (that is a random-walk algorithm) implies a streaming algorithm that solves our search problem from Section 5.1 with probability at least 0.8. For this, recall that any graph G in the support of 𝖦 lies in 𝒢n,d. This means that we can run the algorithm 𝖠𝗅𝗀 (with the start vertex being an arbitrary vertex v0W1 and the order of edges being the same as in the search problem) and it will sample an L-step random walk from v0 with distortion at most 0.1. Let 𝖯G=(v0,𝗏1,,𝗏L) be the random variable denoting the path, written as a sequence of vertices, taken by an L-step random walk from v0.

Lemma 22.

For all graphs G, which we can equivalently view as a pair (X,Z), it holds that:

Pr(𝖯G has <d(logn)3min(d,Ld) distinct edges from vZXv)<0.1
Proof.

Note that when the walk is at any vertex vW1, the edge from v will either go to another vertex in W1 (when it is a fixed edge added in Item 1 in Section 5.1) or it will go to a vertex in ZW2 (when it is a 𝖹-edge added in Item 3). In the latter case, either the second edge will go to a vertex in W1 (a fixed edge) or it will go to a vertex in W3 (when it is an 𝖷-edge added in Item 2). If it does go to a vertex in W3, the third edge has to be fixed and take it to a vertex in W1. Thus, at least one of any three consecutive vertices in 𝖯G must lie in W1.

In particular, if we break the random walk into chunks of length 10, then each such chunk l[L/10] will contain at least two vertices in W1 within it. For all l[L/10], we define 𝗎l(1),𝗎l(2) to be random variables denoting the positions (these are values in the interval [(l1)10+1,(l1)10+6]) of the first two vertices in W1 in chunk l. As |W1|4d by Lemma 19, |Z|=d20logn and |Xv|=d/4d/40 for all vW2, we have by the above discussion that, for all l[L/10]:

Pr(𝗎l(2)𝗎l(1)=3)11000logn.

Because of the fact that 𝖯G is a random walk, the same bound holds even when conditioned on 𝗎1(1),𝗎1(2),𝗎l1(1),𝗎l1(2). This means that we can find mutually independent indicator random variables 𝖸1,,𝖸L/10, jointly distributed with 𝖯G such that for all l[L/10], we have Pr(𝖸l=1)=11000logn and 𝖸l=1𝗎l(2)𝗎l(1)=3 with probability 1. From Lemma 2, it follows that:

Pr(l=1L/10𝟙(𝗎l(2)𝗎l(1)=3)L(logn)2)Pr(l=1L/10𝖸lL(logn)2)1n2,

as L(logn)10 and n is large enough.

For the rest of this proof, we condition on values (ul(1),ul(2))l[L/10] of (𝗎l(1),𝗎l(2))l[L/10] such that the above event does not occur, and also condition on all the vertices visited in the walk except the vertices (𝗏ul(1)+1,𝗏ul(1)+2)l[L/10]:ul(2)ul(1)=3. Observe that, under this conditioning, for all l[L/10] such that ul(2)ul(1)=3, the edge (𝗏ul(1)+1,𝗏ul(1)+2) is an independent and uniformly random edge in vZXv. As the total number of edges in vZXv is dd80logn and these edges constitute at least L(logn)2 independent uniformly random samples from it, the number of distinct samples will be at least 1100min(dd80logn,L(logn)2)d(logn)3min(d,Ld) except with probability at most 0.05. A union bound finishes the proof.

As for any graph G𝖦, the algorithm 𝖠𝗅𝗀 samples from 𝖯G with distortion at most 0.1, we have from Lemma 22 that the algorithm 𝖠𝗅𝗀 solves the search problem with at least probability 0.8. From the result proved in Section 5.1, we get that the memory needed by the 𝖠𝗅𝗀 after processing the 𝖷-edges is at least 1(logn)4vVmin(dout(v),Ld). As the 𝖷-edges are always a subset of the edges in the graph G, there is a streaming order of the edges of G such that these are the edges that the algorithm 𝖠𝗅𝗀 sees first when it is run on G. It follows that 𝖠𝗅𝗀 needs memory at least 1(logn)4vVmin(dout(v),Ld) when run on G and the proof of Theorem 16 is complete.

References

  • [1] Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms. J. ACM, 64(1):3:1–3:38, 2017. doi:10.1145/3046673.
  • [2] Reid Andersen, Fan Chung, and Kevin Lang. Using pagerank to locally partition a graph. Internet Mathematics, 4(1):35–64, 2007. doi:10.1080/15427951.2007.10129139.
  • [3] Reid Andersen and Yuval Peres. Finding sparse cuts locally using evolving sets. In Symposium on Theory of computing (STOC), pages 235–244, 2009. doi:10.1145/1536414.1536449.
  • [4] Aditya Bhaskara, Samira Daruki, and Suresh Venkatasubramanian. Sublinear algorithms for MAXCUT and correlation clustering. In International Colloquium on Automata, Languages, and Programming (ICALP), volume 107 of LIPIcs, pages 16:1–16:14, 2018. doi:10.4230/LIPIcs.ICALP.2018.16.
  • [5] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 30(1-7):107–117, 1998. doi:10.1016/S0169-7552(98)00110-X.
  • [6] Moses Charikar, Liadan O’Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Symposium on Theory of computing (STOC), pages 30–39, 2003. doi:10.1145/780542.780548.
  • [7] Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu. Near-optimal two-pass streaming algorithm for sampling random walks over directed graphs. In International Colloquium on Automata, Languages, and Programming (ICALP), volume 198, pages 52:1–52:19, 2021. doi:10.4230/LIPIcs.ICALP.2021.52.
  • [8] Ashish Chiplunkar, John Kallaugher, Michael Kapralov, and Eric Price. Factorial lower bounds for (almost) random order streams. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 486–497. IEEE, 2022. doi:10.1109/FOCS54457.2022.00053.
  • [9] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614–656, 2003. doi:10.1016/S0022-0000(03)00026-6.
  • [10] Juan A. Garay, Shay Kutten, and David Peleg. A sublinear time distributed algorithm for minimum-weight spanning trees. SIAM journal on computing, 27(1):302–316, 1998. doi:10.1137/S0097539794261118.
  • [11] Oded Goldreich and Luca Trevisan. Three theorems regarding testing graph properties. Random Struct. Algorithms, 23(1):23–57, 2003. doi:10.1002/rsa.10078.
  • [12] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM journal on computing, 18(6):1149–1178, 1989. doi:10.1137/0218077.
  • [13] Mark R Jerrum, Leslie G Valiant, and Vijay V Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical computer science, 43:169–188, 1986. doi:10.1016/0304-3975(86)90174-X.
  • [14] Ce Jin. Simulating random walks on graphs in the streaming model. In Avrim Blum, editor, Innovations in Theoretical Computer Science Conference (ITCS), volume 124 of LIPIcs, pages 46:1–46:15, 2019. doi:10.4230/LIPIcs.ITCS.2019.46.
  • [15] Hossein Jowhari, Mert Saglam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Symposium on Principles of Database Systems (PODS), pages 49–58, 2011. doi:10.1145/1989284.1989289.
  • [16] John Kallaugher, Michael Kapralov, and Eric Price. Simulating random walks in random streams. In Symposium on Discrete Algorithms (SODA), pages 3091–3126, 2022. doi:10.1137/1.9781611977073.120.
  • [17] Omer Reingold. Undirected connectivity in log-space. Journal of the ACM (JACM), 55(4):1–24, 2008. doi:10.1145/1391289.1391291.
  • [18] Tim Roughgarden, editor. Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press, 2020. doi:10.1017/9781108637435.
  • [19] Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13:1–13:19, 2011. doi:10.1145/1970392.1970397.
  • [20] Aaron Schild. An almost-linear time algorithm for uniform random spanning tree generation. In Symposium on Theory of Computing (STOC), pages 214–227, 2018. doi:10.1145/3188745.3188852.
  • [21] Daniel A Spielman and Shang-Hua Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on computing, 42(1):1–26, 2013. doi:10.1137/080744888.
  • [22] Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM J. Comput., 46(1):429–455, 2017. doi:10.1137/151002526.
  • [23] Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37–57, 1985. doi:10.1145/3147.3165.