Parameterized Streaming Algorithms for Topological Sorting
Abstract
Computing a topological ordering for an -node directed acyclic graph (DAG) is computationally challenging in streaming models. Chakrabarti et al. [SODA 2020] showed that in the insertion-only streaming model, every single-pass algorithm requires space, and every -pass algorithm requires space for any constant .
We study the parameterized complexity of streaming algorithms for topological sorting, considering two parameters: the independence number and the maximum displacement . Our results include an -pass -space streaming algorithm and an -pass -space streaming algorithm. For dense random DAGs, both and are small, allowing us to improve the state-of-the-art for topological sorting in random DAGs.
As applications, we show that strongly connected components (SCC) decomposition and 2-satisfiability (2-SAT) can be solved in passes using space and space, respectively, where denotes the independence number of the implication graph induced by the input 2-SAT instance.
Keywords and phrases:
Independence Number, Chain Cover, SCC Decomposition, 2-SatisfiabilityCopyright and License:
2012 ACM Subject Classification:
Theory of computation Streaming, sublinear and near linear time algorithmsAcknowledgements:
We want to thank the anonymous reviewers for their valuable comments.Funding:
This research was supported in part by the National Science and Technology Council under contract NSTC 114-2221-E-001-023 and 113-2221-E-002-204-MY3.Editors:
Pat Morin and Eunjin OhSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
We study the problem of computing a topological ordering of a given -node directed acyclic graph (DAG) using space subquadratic in the number of nodes. Specifically, the goal is to output an ordering of the nodes in using space for some constant , such that for every 111We define . with , there is no directed path in from to . This problem has been studied by Chakrabarti et al. [5], who showed that in the insertion-only streaming model, every single-pass algorithm requires space, and every -pass algorithm requires space for any constant .
For problems that require substantial space when computed with few passes in the streaming model, a natural direction is to study their parameterized complexity. This approach was introduced by Fafianie and Kratsch for edge dominating set [10] and by Chitnis et al. for maximal matching and vertex cover [7]. Since then, the parameterized complexity of various problems has been explored. Chitnis and Cormode studied treewidth and dominating set [6]. Agrawal et al. studied Min-Ones SAT [1]. Oostveen and van Leeuwen studied diameter and connectivity [21]. Lokshtanov et al. studied feedback vertex set and multiway cut [17].
Our model of computation follows the graph streaming model [20, 18]. The edges of the input graph are presented in an order determined by an adversary who has access to the algorithm but not its random seed. The algorithm processes the edges sequentially, making up to passes over the input while using at most space. During each pass, the algorithm can only read the stream forward and cannot backtrack. We refer to such an algorithm as a -pass -space streaming algorithm. If the edges are static, meaning only edge insertions occur in the stream, the model is called the insertion-only model. In contrast, if both edge insertions and deletions are allowed, where deletions can only remove previously inserted edges, the model is referred to as the turnstile model.
Our first main result shows that any -node directed acyclic graph with independence number can be topologically sorted in passes using space. A directed graph has independence number if there exists a set of nodes in such that no two nodes in the set are joined by an edge in , and no larger set satisfies this property. The number of passes can be further reduced to for every if space is available. Additionally, we provide a smooth tradeoff between passes and space. Formally, we have:
Theorem 1.
Let be an -node directed acyclic graph with independence number . For every , there exists an -pass -space deterministic streaming algorithm to topologically sort in the insertion-only model. Moreover, there is a smooth tradeoff between passes and space: for every constant , there exists an -pass -space streaming algorithm with the same functionality with probability .
Another main result is that any -node directed acyclic graph with an advice of maximum displacement can be topologically sorted in passes using space. For an -node acyclic graph , a function is an advice of with maximum displacement if admits a topological ordering such that for every , the displacement satisfies . Note that the function is not required to be injective. We provide also a smooth tradeoff between passes and space. Formally, we have:
Theorem 2.
Let be an -node directed acyclic graph with an advice of maximum displacement . For every , there exists an -pass -space deterministic streaming algorithm in the insertion-only model to topologically sort .
The above two algorithms can be applied to random directed acyclic graphs (random DAGs) , where each edge is included independently with probability . In random DAGs, the independence number can be estimated quite accurately, and an advice can be provided using the in-degree of the nodes. Using these facts, we have an -pass -space streaming algorithm and an -pass -space streaming algorithm. These algorithms have almost identical pass and space requirement as [5]. But, unlike [5] which requires a chain that goes through all nodes, our algorithms apply to random DAGs .
Furthermore, both of our algorithm exhibit smooth trade-offs between pass and space. Also, both of our algorithms and their analysis directly applies to graphs of type , where is a random DAG and can be any given graph with maximum in-degree that has the same node set and the same topological ordering as .
The final main result gives a deterministic algorithm in the turnstile model to compute a topological ordering for any directed acyclic graph, stated below:
Theorem 3.
Let be an -node directed acyclic graph. For every , there exists an -pass -space deterministic streaming algorithm in the turnstile model to topologically sort .
1.1 Technical Ingredients
If a directed acyclic graph has independence number , then every node-induced subgraph of admits a chain cover of at most chains. This is a stronger condition than simply assuming that a largest antichain in contains at most nodes, a parameter studied extensively for algorithms on a RAM [16, 4], as our condition ensures a structural property that holds for all node-induced subgraphs, enabling a divide-and-conquer approach to compute a chain cover consisting of few chains. Since the chain cover consists of chains, the set of nodes reachable from any node in can be succinctly represented by a tuple of integers. Combined with a distributed variant of the Bellman-Ford algorithm, this forms the basis of our first algorithm.
Our second and third algorithms rely on the concept of a source subgraph, a node-induced subgraph of defined by a set such that no edge in enters from . The second algorithm leverages the maximum displacement to construct a subgraph that fits in memory and contains an -node source subgraph of for some sufficiently large . The third algorithm refines by removing a majority of out-neighbors from high-degree nodes, producing a sparse subgraph (in terms of edge density) that also contains a large source subgraph of . Consequently, topological sorting of can be performed by iteratively peeling away a source subgraph. Since each source subgraph contains a moderate number of nodes, the number of peeling rounds remains low.
1.2 Applications
Finally, in Section 7, we show that the idea used in our topological sort algorithm can be applied to SCC decomposition, yielding an -pass -space deterministic algorithm for any , as well as a ranodmized -pass -space. Since SCC decomposition can be directly applied to solve 2-SAT, these results also extend to the 2-SAT problem.
1.3 Paper Organization
The paper is organized as follows. In Section 2, we introduce our notation. In Section 3, we develop algorithms for topologically sorting directed acyclic graphs with independence number , proving Theorem 1. In Section 4, we design algorithms for topologically sorting directed acyclic graphs given an advice function with maximum displacement , thereby proving Theorem 2. We then combine the results from Section 3 and Section 4 to obtain a result for random DAGs in Section 5. In Section 6, we introduce a deterministic algorithm in the turnstile model for topologically sorting general DAGs. Finally, in Section 7, we discuss selected applications of topological sorting, including SCC decomposition and 2-satisfiability.
2 Notation
Our input graph is an -node directed acyclic graph in which each directed edge is from node to node . Let denote the set and (resp. ) denote the closed (resp. open) interval between and . A topological ordering of is an ordering of the nodes in such that there is no directed path in from to for any with . We use to denote the node-induced subgraph of induced by the node set ; that is, where all edges in with both endnodes in form . We use to denote an -node random directed acyclic graph where each edge is included in with probability independently, and orient the directions of all edges based on a random permutation on the nodes from an earlier node in the permutation to a later node.
For each node in , if there is a directed edge , we say that is an out-neighbor of , and is an in-neighbor of . A node with no in-neighbors (resp. no out-neighbors) is called a source (resp. a sink). We say that a node has in-degree (resp. out-degree) if it has in-neighbors (resp. out-neighbors), denoted as (resp. ).
The transitive closure of a directed graph is the graph , where if and only if there exists a directed path from to in . A chain of is a sequence of nodes for some integer , such that forms a directed path in the transitive closure . A chain cover of is a collection of chains such that the nodes contained in the chains partition the node set . An antichain of is a set of nodes such that no directed path exists between any two distinct nodes in the set.
We say that a directed graph has independence number if there exists a set of nodes in such that no two nodes in the set are joined by an edge in , and no larger set satisfies this property.
For an -node acyclic graph , we say that a function is an advice of with maximum displacement if admits a topological ordering such that for every , the displacement satisfies .
In this paper, the space usage of each algorithm is measured in words, while the space lower bound of each problem is measured in bits. Therefore, if an algorithm’s space usage matches the problem’s space lower bound, the algorithm is considered nearly optimal in space usage, up to an factor.
3 An -Pass -Space Algorithm and Tradeoff
We begin with showing that the transitive closure of an -node directed acyclic graph with independence number can be represented by the transitive closure of an -edge subgraph of . Since and have the same transitive closure, computing a topological ordering of suffices to topologically sort . Then, we devise a streaming algorithm in the insertion-only model to compute , which yields an -pass -space streaming algorithm to topologically sort any given -node acyclic directed graph with independence number . Moreover, there is a smooth tradeoff between passes and space: for every constant , there exists an -pass -space streaming algorithm with the same functionality with probability . This proves Theorem 1.
Lemma 4.
Every directed acyclic graph with independence number contains an -edge subgraph such that and have the same transitive closure. Moreover, if a graph has the same transitive closure as , then also contains an -edge subgraph with the same transitive closure, even if the independence number of exceeds .
Proof.
Let be an antichain in ; that is, a subset of nodes in such that for any two nodes and in the subset, there exists no directed path from to . Thus, is an independent set and . This property holds also for any largest antichain in . By Dilworth’s theorem [9], has a chain cover consisting of at most chains. For each node in , if has out-degree greater than , then by the pigeonhole principle, there exist two out-neighbors and of that belong to the same chain in the chain cover. Let the chain be for some and for some . If the edge is removed from , then the resulting graph has the same transitive closure as . This holds because can still reach via the edge followed by a directed path that witnesses the chain . Note that is not on , as is acyclic, and neither is the edge . Consequently, one can iteratively remove an edge from while retaining the transitive closure unchanged until every node in has out-degree at most .
The same argument works for any graph that has the same transitive closure as , even if the independence number of exceeds .
One may find that Lemma 4 still holds if the number of nodes in a largest antichain in is . However, it is essential for our algorithm to have the independence number of bounded by . In this way, every node-induced subgraph of (rather than only ) has an antichain consisting of at most chains, which is the key to find the subgraph efficiently in the streaming model.
First, we find the chain cover used in the proof of Lemma 4 as follows.
Lemma 5.
Let be an -node acyclic directed graph with independence number . For every , there exists an -pass -space streaming algorithm that outputs a chain cover of consisting of chains.
Proof.
We begin with an -pass -space streaming algorithm. First, we partition the nodes of into subsets arbitrarily for some integer , each containing nodes. We then scan the stream of all edges in once. For each , we collect all edges in the node-induced subgraph in memory. Next, we apply the approach from the proof of Lemma 4 to compute a chain cover consisting of chains for and trim into a sparse subgraph with edges.
We then scan the stream of all edges in again. For each pair of consecutive node-induced subgraphs and , we collect edges with one endnode in and the other in (i.e., crossing edges). Since we have a chain cover consisting of chains for both and , at most essential crossing edges per node in and are required to preserve the transitive closure. This can be done using space because, for each of the chains in the node-induced subgraph (resp. ), each node in (resp. ) only needs to keep the single out-neighbor that appears first in the chain. Finally, we replace the union of , , and their essential crossing edges with an -edge subgraph , ensuring that and have the same transitive closure and thus the chain cover of consists of at most chains. Such a sparse subgraph can be found because the union of , , and the essential crossing edges has the same transitive closure as and hence we can apply Lemma 4 though the independence number of the union may exceed .
The above process reduces the number of subsets by half. The space usage remains . Repeating this process for iterations merges all subsets into a single subgraph with a chain cover of chains.
The above approach groups two subsets at a time. More generally, we can group subsets at a time. This increases space usage by a factor of while reducing the number of passes to
Then, we use the chain cover found in Lemma 5 to compute as follows.
Corollary 6.
Let be an -node acyclic directed graph with independence number . For every , there exists an -pass -space streaming algorithm that outputs an -edge subgraph such that and have the same transitive closure.
Proof.
By Lemma 5, we obtain a chain cover of consisting of chains using the claimed number of passes and space. Then, in another pass over all edges of , for each node , we store all of its out-neighbors in a list. If the list exceeds out-neighbors, at least one can be discarded, as established in the proof of Lemma 4.
Finally, given the subgraph , a topological ordering of can be computed by topologically sorting . This gives an -pass -space deterministic streaming algorithm to compute a topological ordering for any n-node directed acyclic graph in the insertion-only model. This proves the few-pass algorithm stated in Theorem 1.
3.1 Tradeoff Between Passes and Space
We give a smooth tradeoff between passes and space: for every constant , there exists an -pass -space streaming algorithm with the same functionality. Our divide-and-conquer approach is inspired by the parallel topological sorting method of Schudy [22]. However, since reachability is computationally expensive in streaming models, we overcome this limitation by leveraging the concept of chain cover, which is a novel aspect of our approach compared to Schudy’s result.
Lemma 7.
Given an -node directed acyclic graph and a topological ordering of nodes, sampled independently and uniformly at random (possibly with repetition), there exists an -pass -space streaming algorithm that topologically sorts with probability .
Proof.
We add a new node to , with a directed edge to every node in that has in-degree . Let be a set of sampled nodes, denoted as , such that no directed path in starts from to for any , where . Define .
For each integer , let be the set of nodes in that are reachable from but not from any with . Since can reach every node in , the sets for all integer form a partition of . Therefore, storing all in memory needs space. Given this partition, a topological ordering of can be obtained by first sorting the nodes within each independently and then concatenating their orderings.
Since is a randomly selected subset of nodes, the sets satisfy the following property:
Claim 8.
For each integer and each node , any longest path in (with unweighted edges) from to contains at most edges with probability .
Proof.
Let be the set of nodes in , including . Consider any pair of nodes such that is reachable from . Fix a longest path from to , and let be the collection of such paths whose lengths exceed . Since there are at most such pairs, we have .
The probability that none of the internal nodes in (excluding and ) is included in is
Applying the union bound, we conclude that with probability at least , every path in contains at least one internal node in .
Consequently, if there exists a node such that a longest path from to contains more than edges, then with probability at least , some internal node in belongs to . Since there is a path from to , we have , contradicting the assumption that is in .
To match the claimed number of passes and space, we design a distributed variant of the Bellman-Ford algorithm (stated also in Algorithm 1). Assign a weight of to all edges in . Then, the shortest distance from to any node corresponds to the length of the longest path (in terms of edge count) from to . Sorting the nodes by their distances from yields a topological ordering. However, computing the distances from may require many passes or substantial space.
Instead of running the Bellman-Ford algorithm from a single source node , we execute it in parallel from every node in the set . If a node is reachable from two nodes and with , we discard the information from , since does not belong to . In Algorithm 1, this behavior is enforced by the rank function , which is set such that for all with , ensuring that only information from the appropriate source is retained. As a result, the space usage per node and in total. By Claim 8, the Bellman-Ford algorithm terminates after iterations.
The node ordering returned by Algorithm 1 is a topological ordering of because it topologically sorts the nodes in each by their longest distances (in term of edge count) from . Moreover, the concatenation of the topological orderings of nodes in s for all cannot have any edge in for some and with , as it would violate the definition of the sets .
Lemma 9.
Given an -node directed acyclic graph , a chain cover of consisting of chains, and nodes sampled uniformly at random from independently (possibly with repetition), there exists an -pass -space streaming algorithm that topologically sorts these sampled nodes with probability .
Proof.
We topologically sort the sampled nodes by ordering them based on the cardinalities of their reachable sets. Let be the set of sampled nodes. For each , we perform a depth- BFS rooted at . The chains can be stored in memory using space. Each BFS computation requires an additional space, as detailed below, and across all BFSs, the exploration of nodes at depth is performed simultaneously. Consequently, the total number of passes is .
We execute each of these BFSs as follows. The key difference between our BFS and a standard BFS is that we define the child nodes of any node to include both its out-neighbors and the out-neighbors of all descendant nodes of along the chain to which belongs. Since we are only interested in the set of nodes that can reach, we do not need to maintain the full structure of a standard BFS. Thus, for each BFS level, we only need to store at most nodes. This is because if two nodes belong to the same chain, where one is an ancestor of the other, then the child nodes of the ancestor form a superset of those of the descendant.
We only need to perform depth- BFSs for all to compute the reachable sets for all . For every pair of nodes , consider a shortest path from to . Let be the collection of all such shortest paths of length at least . For each path in , the probability that none of its internal nodes belong to is
Applying the union bound over all paths in , where , we conclude that with probability , every path in contains at least one internal node in . Consequently, if there exists a node that is reachable from via a shortest path longer than , then it must be visible from the -depth BFS rooted at some node in .
We are now ready to prove the tradeoff stated in Theorem 1. Let be any constant in . First, we partition the node set of into subsets arbitrarily, each containing nodes. For each subset , we apply Lemma 5, setting and , to obtain a chain cover of the subgraph induced by , which consists of chains. This holds because any node-induced subgraph of also has independence number . Combining these chain covers across all subsets yields a chain cover of with chains.
3.2 Graph Classes with Small Independence Number
In what follows, we provide a brief proof (without claiming novelty) that a dense random DAG has a small independence number with high probability for every . Given the bounds on the number of nodes in a largest antichain in a random DAG due to Bollobás and Brightwell [3], our bound in Lemma 10 is nearly optimal, up to a logarithmic factor. We note that a result by Frieze [12] may eliminate this logarithmic factor, but it requires the assumption .
Lemma 10.
For a random DAG , where each edge is included independently with probability , the independence number of is with probability .
Proof.
Let , and define as the number of independent sets of nodes in . Since , we have
Since is non-negative, applying Markov’s inequality gives
Thus, with probability , contains no independent set of nodes or more.
4 An -Pass -Space Algorithm and Tradeoff
In this section, we present an -pass -space topological sort algorithm assuming that an advice with maximum displacement is given, where can be any given number in . This proves Theorem 2.
The main idea is that, given any , we devise a process that can find and sort a source subgraph of at least nodes using a single pass and space. We say a node-induced subgraph of is a source subgraph if no edge in is from a node outside to a node in . Our algorithm just repeats this process times. In order to find a source subgraph, we need the following observation:
Lemma 11.
Let be an -node directed acyclic graph with an advice of maximum displacement . Given any . Let be the subgraph induced by all nodes with and be the subgraph induced by all nodes with . If is a subgraph of and a source subgraph of , then is a source subgraph of .
Proof.
It suffices to show that no edge goes from to . Since is a source subgraph of , we only need to prove that that no edge goes from to .
Let be a topological ordering of such that for all nodes . For any node , we have . Similarly, for any node , we have . Combining the two inequalities, we have . Therefore, no edge goes from to , which completes the proof.
Using the above lemma, to find a source subgraph in , it suffices to remember the induced subgraph and find the maximum source subgraph that satisfies the conditions listed. This can be done by remembering the entire induced subgraph in a single pass. The following two lemmas show that keeping the entire subgraph takes space, and the resulting source subgraph has size at least .
Lemma 12.
Let be an -node directed acyclic graph with an advice of maximum displacement . Given any . There are at most nodes with .
Proof.
Let be a topological ordering of such that for all nodes . Only the first nodes of can satisfy .
Lemma 12 shows that the subgraph defined in Lemma 11 has nodes. Since the maximum displacement between the advice and the true ordering is at most , any pair of nodes whose advice values differ by more than must appear in the same relative order as suggested by the advice. Therefore, when computing a topological ordering, it suffices to retain only the edges between nodes whose advice values differ by less than . These edges will be referred to as relevant edges. The subgraph can be stored in space if we only keep the relevant edges.
Lemma 13.
Let be an -node directed acyclic graph with an advice of maximum displacement . Given any . Let be the subgraph induced by all nodes with and be the subgraph induced by all nodes with . There exists a subgraph of size at least such that is a subgraph of and a source subgraph of .
Proof.
Let be a topological ordering of such that for all nodes . The subgraph induced by the first nodes in has and thus is a subgraph of . Since is the first nodes in a topological ordering, has no incoming edges and thus is a source subgraph of .
We are now ready to present the algorithm stated in Theorem 2. Given a graph and an advice function , the algorithm makes a single pass over the input and stores the subgraph induced by all nodes with , along with all relevant edges between them. It then identifies the largest source subgraph of by iteratively expanding , starting from the empty set and adding nodes whose advice values are at most and whose in-neighbors have all already been included in . By Lemma 13, this source subgraph, denoted , contains at least nodes. The algorithm sorts the nodes of and places them at the front of the final topological ordering. Repeating this process times yields a topological ordering of the entire graph .
According to Lemma 12, the space requirement for keeping the subgraph in each step is . space is needed to keep the final topological ordering. Therefore, the total space usage for our streaming algorithm is . Note that when , the algorithm just remembers all relevant edges in a single pass using space .
5 Random DAGs
In this section, we consider the topological ordering of random directed acyclic graphs (random DAGs) defined in Section 2. We apply our algorithms described in Section 3 and Section 4 to random DAGs in the streaming model and obtain a hybrid algorithm comprised of the two. We conclude this section by comparing our results to the results in [5].
First, for dense random DAGs, we can apply our algorithm in Section 3. Lemma 10 shows that, for a random DAG , where each edge is included independently with probability , the independence number of is with probability . Substituting this number into our algorithm in Section 3, we have an -pass -space streaming algorithm for every . Choosing to be , we have an -pass -space streaming algorithm.
Second, we apply our topological sort algorithm described in Section 4. In order to apply the algorithm, we must first obtain an advice . For a random DAG , we can obtain an advice with maximum displacement at most with probability using the incoming degrees of the nodes. The same technique has been used in [5] while considering a variant of random directed acyclic graphs.
Lemma 14.
Let be a random DAG. For every node , with probability .
Proof.
For each node , the in-degree is a random variable that follows a binomial distribution with trials and success probability . By applying Chernoff Bound we have
By Lemma 14, is an advice with maximum displacement with high probability. The calculation of can be done in one pass and space by counting the incoming degrees of the nodes.
Using as an advice, Theorem 2 directly gives an -pass -space streaming algorithm where . Furthermore, notice that the space requirement is for keeping the relevant edges for each of nodes in . However, in a random graph , the probability that all subgraphs that could potentially be chosen as have only edges is at least . As a result, the algorithm described in Theorem 2 only uses -pass -space with probability . Choosing to be , we have an -pass -space streaming algorithm.
Combining the above algorithms, we have two topological sort algorithms for random DAGs with almost identical (up to poly-logarithmic improvement) pass and space requirement as [5]. Unlike [5] which requires a chain that goes through all nodes, our algorithms apply to random DAGs .
Furthermore, both of our algorithm exhibit trade-offs between pass and space. Also, both of our algorithms and their analysis directly apply to graphs of type , where is a random DAG and can be any given graph with maximum in-degree that has the same node set and same node ordering as . The independence number only decreases after adding the edge set of to , and the advice derived from Lemma 14 is still correct with high probability for sparse .
6 A Deterministic Approach in the Turnstile Model
We devise a deterministic algorithm to compute a topological ordering for any directed acyclic graph in the turnstile model. This proves Theorem 3. Our technical ingredients for this deterministic result include a sparse certificate and the set reconciliation.
Let be an -node directed acyclic graph. We say that a subgraph of is a -certificate of topological sorting if for any topological ordering of there exists a topological ordering of so that the longest common prefix of and has length at least . Our deterministic algorithm to compute (initialized as an empty sequence) is simply computing a topological ordering of a -certificate of , appending the first nodes in to , followed by reducing the problem to a subproblem. Note that the first nodes in induces a source subgraph of . See also Algorithm 2, whose correctness follows from the fact that the first nodes in form a valid prefix of .
In what follows, we present how to obtain a sparse -certificate . For each node , define to be the set of edges in that connect ’s in-neighbors to . Define further that to be any subset of of cardinality exactly . We show in Lemma 15 that the union of for all is a -certificate of edges.
Lemma 15.
For every integer , the subgraph
is a -certificate of consisting of edges.
Proof.
Let be a topological ordering of . Let be the set of nodes placed at the first positions of . Let be the subgraph induced by the node set . It suffices to show that the concatenation of any topological ordering of and is a topological ordering of .
For any node , because all in-neighbors of in have to precede in . Combining this observation with the definition of , we have that for any node . Hence, no edges crossing and in can be directed from to . Consequently, let (resp. ) be the topological ordering of (resp. ), the concatenation will not induce any backward edge. Because and both are subgraphs of and is acyclic, this ensures the existence of and .
The remaining to show is how to obtain for all deterministically in the turnstile model. To achieve this, we use the characteristic polynomials introduced in the study of the set reconciliation problem [19]. The reason not to use -samplers [15] to obtain , a standard building block for the computations in the dynamic streams, is that our algorithm has to be deterministic here. Unfortunately, we have no idea how to obtain for all deterministically in the turnstile stream. To get around with this issue, we note that the nodes with are not involved in the computation of the first positions in , as shown in the proof of Lemma 15. We maintain the degree of node using space to decide whether has or not. If , the edges incident to does not involve the computation of the first positions in , so there is no need to compute ; otherwise , we use Lemma 16 to recover based on the characteristic polynomials used for the set reconciliation [19].
Lemma 16.
For every integer , for every node with , the set can be obtained deterministically in the turnstile model using space.
Proof.
Define the characteristic polynomial of to be the univariate polynomial
noting that the roots of comprise the set and the degree of corresponds to . Though at the end of the dynamic input stream, it can be much larger than in the middle of processing the input. Hence, to recover using space we cannot directly maintain . Instead, we maintain the values of at distinct points outside the range . That is, for each , while processing an edge insertion update with and while processing an edge deletion update with , i.e. the multiplicative inverse of . Given the point-value pairs , by the standard interpolation [19], if the degree of less than at the end of the dynamic input stream (i.e. the given premise ) can be recovered.
We are now ready to prove Theorem 3. We use -certificates times, each requiring a single pass and using space. Set as , so our algorithm uses space as claimed.
7 Applications
In this section, we present applications of our topological sorting algorithms. In Section 7.1, we develop an -pass -space streaming algorithm for SCC decomposition, based on the chain cover construction described in Section 3. We then present an -pass -space streaming algorithm for SCC decomposition, using the notion of the -certificate introduced in Section 6. Given these algorithms, the results for 2-SAT in Section 7.3 follow immediately.
7.1 An -Pass -Space SCC decompisition Algorithm
In this subsection, we focus on the problem of SCC decomposition. Let be an -node directed graph. A strongly connected component (SCC) is a maximal set of nodes such that, for all , there exists a directed path from to . The SCC decomposition of is typically defined as a partition of , where each is an SCC of .
However, in many applications, it is not only the identification of SCCs that matters, but also their topological ordering. For instance, the algorithm for finding a satisfying assignment of 2-SAT [2] relies on such an ordering.
Therefore, we define an SCC decomposition to be an ordered collection of sets , where each is an SCC, , and there are no edges from any node in to any node in for all .
We begin with showing that the transitive closure of an -node directed graph (possibly cyclic) with independence number can be represented by the transitive closure of a subgraph containing edges. We note that Lemma 17 extends Lemma 4, which applies only to DAGs.
Lemma 17.
Every directed graph (possibly cyclic) with independence number contains an -edge subgraph such that and have the same transitive closure. Moreover, if a graph has the same transitive closure as , then also contains an -edge subgraph with the same transitive closure, even if the independence number of exceeds .
Proof.
By the Gallai-Milgram Theorem [8], the node set of a directed graph with independence number can be covered by at most node-disjoint paths. The proof of Lemma 4 can be adapted to this setting by replacing the chain cover with the above node-disjoint paths.
Lemma 18.
Let be an -node directed graph with independence number . For every , there exists an -pass -space streaming algorithm that outputs a collection of node-disjoint directed paths whose node sets partition the node set of .
Proof.
The algorithm is almost the same as that in Lemma 5 except that we replace the chain cover used in Lemma 5 with the path cover stated in Lemma 17.
Corollary 19.
Let be an -node directed graph with independence number . For every , there exists an -pass -space streaming algorithm that outputs an -edge subgraph such that and have the same transitive closure.
Proof.
By Lemma 18, we obtain a path cover of consisting of directed paths using the claimed number of passes and space. Then, in another pass over all edges of , for each node , we store all of its out-neighbors in a list. If the list exceeds out-neighbors, at least one can be discarded, as established in the proof of Lemma 4.
Finally, given the subgraph , the SCC decomposition of can be computed by computing the SCC decomposition of . This gives an -pass -space deterministic streaming algorithm to compute the SCC decomposition for any n-node directed graph in the insertion-only model.
7.2 An -Pass -Space SCC Decomposition Algorithm
In this subsection we devise an SCC decomposition algorithm based on our topological sort algorithm in Section 6. We first present an algorithm that w.h.p. identifies all SCCs that are large enough. We then demonstrate that a modified version of our topological sort algorithm can produce the SCC decomposition of when all SCCs are small. Combining the two algorithms, we obtain an algorithm for SCC decomposition in directed graphs.
For every , let be the set of nodes reachable from , and be the set of nodes that can reach . The intersection is the SCC that includes . We use this property to devise our algorithm. Given an integer , we sample a set of nodes from uniformly at random independently. If we can compute for each , then all the SCCs containing at least nodes are identified with high probability as shown in the following lemma.
Lemma 20.
Let be an -node directed graph. Given any , let be a set of nodes sampled uniformly at random from . Every SCC of that has at least nodes contains some node in with probability .
Proof.
Let be an SCC with at least nodes. The probability that none of the nodes of are sampled is at most for some constant . Applying a union bound over all SCCs with at least nodes, the probability that there exists such an SCC containing no sampled nodes is .
In what follows, we show how to compute for each . We perform BFS from each in both forward and backward directions. Besides usual BFS steps, whenever a node reaches another node in the -th forward (resp. backward) BFS step, we merge all nodes that can reach in forward (resp. backward) BFS steps to . Formally, for all and a given integer , let be the set of nodes that can reach in BFS steps. can be obtained from within one pass. We initiate to be . For each input edge , if , then we add to . If , then we also add to . Before the algorithm starts, we initiate for all . We define similarly as above.
The following lemma shows that, with high probability, after passes of forward and backward BFS from each sampled node , the intersection is correctly identified for every sampled node .
Lemma 21.
Let be an -node directed graph. Given any , let be a set of nodes sampled uniformly at random from . Then, with probability , for every .
Proof.
Consider a pair of nodes where and is a node in the same SCC as . By definition, there is a path from to and a path from to within the SCC. Let be the length of a shortest path from to . If , apparently . If , we claim that in each subpath of length in , there is a node with probaility at least . Let be the sampled nodes on the path arranged by their distance from . By the claim, for all the distance between is at most with probability at least . Therefore for all . Also, by the claim, and . Observe that, by our construction, for every pair of , if , then . Therefore . By a similar proof, . Combining the two, we have with probability at least . Applying the union bound, we complete the proof. It remains to prove the claim. The probability that all nodes in a subpath are not sampled is at most for a sufficiently large constant , as desired.
The above results lead to the following algorithm for finding all large enough SCCs.
Lemma 22.
Let be an -node directed graph. Given any , there is an pass space streaming algorithm that determines all SCCs of that contain at least nodes with probability .
Proof.
Given any , the algorithm samples nodes uniformly at random, and then performs BFS from each node for steps in parallel. Lemma 21 shows that the reachability sets for all sampled nodes can be obtained with probability , and Lemma 20 shows that all SCCs with at least nodes will be found with probability . The algorithm uses passes to simulate the -step BFS. Performing BFS from each of the sampled nodes requires space. Therefore, the space usage of the algorithm is .
It is unclear whether the algorithm described in Lemma 22 is optimal. Nevertheless, we present a one-pass lower bound by first proving a lower bound for an easier problem and then extend to it.
Theorem 23.
Let be an -node directed graph. Deciding whether is strongly connected in one pass requires bits.
Proof.
A reduction from -reachability to this problem is shown in [13] on a RAM. For completeness, we restate it here. Given an instance of -reachability consisting of a directed graph and two designated nodes , we generate a new graph by adding edges for all and for all to . is strongly connected if and only if has a path from to . Adding the edges is straightforward in the streaming model and hence the reduction can be done within space and without using extra passes. Since -reachability requires bits to solve in one pass [11], the same lower bound extends to this problem. Theorem 23 immediately implies the following:
Corollary 24.
Given any integer , identifying all SCCs of with at least nodes in one pass requires bits.
Corollary 25.
Identifying all SCCs for an -node directed graph in one pass requires bits.
In the rest of the section, we show that our topological sort algorithm from Section 6 can be modified to produce the SCC decomposition of when all SCCs in are small. We start with the following observation.
Lemma 26.
Let be an -node directed graph, and let be a source subgraph of . For any SCC of , if contains at least one node from , then is entirely contained within .
Proof.
Suppose, for the sake of contradiction, that there exists an SCC such that but . That is, there exist nodes where . Take any , by the definition of an SCC, there must be a path from to . This implies the existence of an edge from to , which contradicts the assumption that is a source subgraph.
Using Lemma 26, we can design an algorithm for SCC decomposition. The process proceeds as follows. We first identify a source subgraph of and compute the SCC decomposition of . We then remove from and repeat this procedure until we obtain the complete SCC decomposition of .
Here we present an efficient way to find a source subgraph. We can assume that the input graph is an -node directed graph with all SCCs having at most nodes, as larger SCCs can be identified and contracted by the above algorithm. Given any , we consider a subgraph
In the following lemmas, we will show that by leveraging the structure of along with the degree information of all nodes in , we can identify a source subgraph containing at least nodes.
Lemma 27.
Let be an -node directed graph. Given , let be the set of nodes such that for all , . Let
If is a source subgraph of and , then is a source subgraph of .
Proof.
For all nodes , since , we have . In other words, all nodes in have all their incoming edges recorded in . Therefore, if has no incoming edges in , it also has no incoming edges in , which completes the proof.
Lemma 28.
Let be an -node directed graph with all SCCs having at most nodes. Given any , let be the set of nodes such that for all , . Let
There exists a subgraph with at least nodes s.t. is a source subgraph of and .
Proof.
Consider any topological ordering of the SCCs of . Let be the largest integer in such that the number of nodes in the union of is at most . Define as the subgraph induced by the nodes in the union. We will show that this choice of satisfies all the required conditions.
Note that the SCCs have no incoming edges from . Therefore, is a source subgraph of and of any subgraph of that contains it. Since every node contained in for has at most incoming edges, each such node belongs to and the subgraph induced by the nodes in the union is contained in . Thus, is a source subgraph of . In addition, the union of must contain more than nodes and has at most nodes, it follows that the union of contains at least nodes, as required.
Theorem 29.
Let be an -node directed graph such that all SCCs have at most nodes. For any integer , there is an -pass -space streaming algorithm that determines the SCC decomposition of .
Proof.
Combining Lemma 27 and Lemma 28, we can determine the SCC decomposition for at least nodes by keeping edges in memory in one pass. We only need to repeat times to obtain the SCC decomposition of . The space requirement is since we keep at most edges in each pass, and maintaining the SCC decomposition uses space.
Combining Lemma 22 and Theorem 29, we obtain an SCC decomposition algorithm.
Theorem 30.
Let be an -node directed graph. For any , there is an -pass -space algorithm that determines the SCC decomposition of with probability in the insertion-only model.
Proof.
We begin by applying the algorithm from Lemma 22 to , which identifies all SCCs containing at least nodes. This step runs in passes and uses space, succeeding with probability at least . After identifying these SCCs, we contract them, resulting in a new graph , where all SCCs contain at most nodes.
Next, we apply the algorithm from Theorem 29 to compute the SCC decomposition of . We set , ensuring that , which satisfies the conditions of Theorem 29. Substituting these parameters, the total number of passes used is , and the total space required is .
7.3 2-SAT
In this subsection, we focus on the problem of 2-SAT. A 2-SAT instance is a conjunction of clauses, . Each clause is a disjunction of two literals. Every , is one of the Boolean variables or their negations . A satisfying assignment of is an assignment of truth values for each variable that makes to be true, and we say is satisfiable if there exists at least one satisfying assignment. The task is to determine if is satisfiable, and if so, determine a satisfying assignment. Under streaming setting, each input of the input stream is a clause , and the conjunction of all inputs is the 2-SAT instance under examination.
Before we show how to apply SCC decomposition to 2-SAT in streaming, we show a one pass lower bound for 2-SAT.
Theorem 31.
Deciding if a 2SAT instance is satisfiable in one pass requires bits.
Proof.
We translate the reduction shown in [14] into streaming setting. Suppose to the contrary that we have a 2-SAT solver that could solve 2-SAT in one pass using bits, then we have an algorithm that could solve -reachability in one pass using bits by the following reduction. Given an instance of -reachability consisting of a directed graph and two designated nodes , for each input edge , generate input for the 2-SAT solver, and for the two nodes , generate input for the 2-SAT solver. The 2SAT instance generated is unsatisfiable if and only if can reach in . Since testing -reachability in a directed graph in one pass requires bits [11], there cannot exist such a 2-SAT solver. Theorem 31 shows that, merely deciding if a 2-SAT instance is satisfiable in one pass requires space enough to keep the whole input. Henceforth we focus on multi-pass algorithms.
We follow the approach of [2] which reduces 2-SAT to SCC-decomposition. The reduction constructs an implication graph from a 2-SAT instance . The satisfiability and the truth assignment of can be retrieved from the SCC decomposition of . We show that, given an input stream of 2-SAT, the construction of a stream of its implication graph only double the input complexity. For each variable and its negation , construct nodes . For each input clause , construct two edges , . The resulting implication graph is a graph with nodes and edges. We can apply our SCC decomposition algorithms to and obtain a solution to the 2-SAT instance.
Theorem 32.
Given any , 2-SAT can be solved in -pass and -space with probability in the insertion-only model.
Let denote the independence number of the implication graph of the input 2-SAT instance. We can apply our algorithm presented in Corollary 19.
Theorem 33.
A 2-SAT instance can be solved in -pass -space in the insertion-only model.
References
- [1] Akanksha Agrawal, Arindam Biswas, Édouard Bonnet, Nick Brettell, Radu Curticapean, Dániel Marx, Tillmann Miltzow, Venkatesh Raman, and Saket Saurabh. Parameterized streaming algorithms for Min-Ones d-SAT. In 39th Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), volume 150 of LIPIcs, pages 8:1–8:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.FSTTCS.2019.8.
- [2] Bengt Aspvall, Michael F. Plass, and Robert Endre Tarjan. A linear-time algorithm for testing the truth of certain quantified boolean formulas. Inf. Process. Lett., 8(3):121–123, 1979. doi:10.1016/0020-0190(79)90002-4.
- [3] Béla Bollobás and Graham R. Brightwell. The dimension of random graph orders. In The Mathematics of Paul Erdős II, pages 47–68. Springer, 2013. doi:10.1007/978-1-4614-7254-4_5.
- [4] Manuel Cáceres. Minimum chain cover in almost linear time. In 50th International Colloquium on Automata, Languages, and Programming (ICALP), volume 261 of LIPIcs, pages 31:1–31:12. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.ICALP.2023.31.
- [5] Amit Chakrabarti, Prantar Ghosh, Andrew McGregor, and Sofya Vorotnikova. Vertex ordering problems in directed graph streams. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1786–1802. SIAM, 2020. doi:10.1137/1.9781611975994.109.
- [6] Rajesh Chitnis and Graham Cormode. Towards a theory of parameterized streaming algorithms. In 14th International Symposium on Parameterized and Exact Computation (IPEC), volume 148 of LIPIcs, pages 7:1–7:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPICS.IPEC.2019.7.
- [7] Rajesh Hemant Chitnis, Graham Cormode, Mohammad Taghi Hajiaghayi, and Morteza Monemizadeh. Parameterized streaming: Maximal matching and vertex cover. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1234–1251. SIAM, 2015. doi:10.1137/1.9781611973730.82.
- [8] Reinhard Diestel. Graph Theory, 4th Edition, volume 173 of Graduate texts in mathematics. Springer, 2012.
- [9] R. P. Dilworth. A decomposition theorem for partially ordered sets. Annals of Mathematics, 51(1):161–166, 1950.
- [10] Stefan Fafianie and Stefan Kratsch. Streaming kernelization. In Mathematical Foundations of Computer Science 2014 - 39th International Symposium (MFCS), volume 8635 of Lecture Notes in Computer Science, pages 275–286. Springer, 2014. doi:10.1007/978-3-662-44465-8_24.
- [11] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2-3):207–216, 2005. doi:10.1016/J.TCS.2005.09.013.
- [12] Alan M. Frieze. On the independence number of random graphs. Discret. Math., 81(2):171–175, 1990. doi:10.1016/0012-365X(90)90149-C.
- [13] Neil D. Jones. Space-bounded reducibility among combinatorial problems. J. Comput. Syst. Sci., 11(1):68–85, 1975. doi:10.1016/S0022-0000(75)80050-X.
- [14] Neil D. Jones, Y. Edmund Lien, and William T. Laaser. New problems complete for nondeterministic log space. Math. Syst. Theory, 10:1–17, 1976. doi:10.1007/BF01683259.
- [15] Hossein Jowhari, Mert Saglam, and Gábor Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, (PODS), pages 49–58. ACM, 2011. doi:10.1145/1989284.1989289.
- [16] Shimon Kogan and Merav Parter. Faster and unified algorithms for diameter reducing shortcuts and minimum chain covers. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 212–239. SIAM, 2023. doi:10.1137/1.9781611977554.CH9.
- [17] Daniel Lokshtanov, Pranabendu Misra, Fahad Panolan, M. S. Ramanujan, Saket Saurabh, and Meirav Zehavi. Meta-theorems for parameterized streaming algorithms. In Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 712–739. SIAM, 2024.
- [18] Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Rec., 43(1):9–20, 2014. doi:10.1145/2627692.2627694.
- [19] Yaron Minsky, Ari Trachtenberg, and Richard Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Trans. Inf. Theory, 49(9):2213–2218, 2003. doi:10.1109/TIT.2003.815784.
- [20] S. Muthukrishnan. Data streams: Algorithms and applications. Found. Trends Theor. Comput. Sci., 1(2):117–236, August 2005. doi:10.1561/0400000002.
- [21] Jelle J. Oostveen and Erik Jan van Leeuwen. Parameterized complexity of streaming diameter and connectivity problems. Algorithmica, 86(9):2885–2928, 2024. doi:10.1007/S00453-024-01246-Z.
- [22] Warren Schudy. Finding strongly connected components in parallel using reachability queries. In Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 146–151. ACM, 2008. doi:10.1145/1378533.1378560.
