Abstract 1 Introduction 2 Preliminaries 3 Technical Overview 4 Discussion References Appendix A Model-Agnostic Algorithm

Model-Agnostic Approximation of Constrained Forest Problems

Corinna Coupette ORCID Aalto University, Finland
Max Planck Institute for Informatics, Saarbrücken, Germany
Alipasha Montaseri ORCID Sharif University of Technology, Tehran, Iran Christoph Lenzen ORCID CISPA Helmholtz Center for Information Security, Saarbrückem, Germany
Abstract

Constrained Forest Problems (CFPs) as introduced by Goemans and Williamson in 1995 capture a wide range of network design problems with edge subsets as solutions, such as Minimum Spanning Tree, Steiner Forest, and Point-to-Point Connection. While individual CFPs have been studied extensively in individual computational models, a unified approach to solving general CFPs in multiple computational models has been lacking. Against this background, we present the shell-decomposition algorithm, a model-agnostic meta-algorithm that efficiently computes a (2+ε)-approximation to CFPs for a broad class of forest functions. The shell-decomposition algorithm isolates the problem-specific hardness of individual CFPs in a single computational subroutine, breaking the remainder of the computation into fundamental tasks that are studied extensively in a wide range of computational models. In contrast to prior work, our framework is compatible with the use of approximate distances.

To demonstrate the power and flexibility of this result, we instantiate our algorithm for three fundamental, NP-hard CFPs (Steiner Forest, Point-to-Point Connection, and Facility Placement and Connection) in three different computational models (Congest, PRAM, and Multi-Pass Streaming). For constant ε, we obtain the following (2+ε)-approximations in the Congest model:

  1. (1)

    For Steiner Forest specified via input components (SF–IC), where each node knows the identifier of one of k disjoint subsets of V (the input components), we achieve a deterministic (2+ε)-approximation in 𝒪~(n+D+k) rounds, where D is the hop diameter of the graph, significantly improving over the state of the art.

  2. (2)

    For Steiner Forest specified via symmetric connection requests (SF–SCR), where connection requests are issued to pairs of nodes u,vV, we leverage randomized equality testing to reduce the running time to 𝒪~(n+D), succeeding with high probability.

  3. (3)

    For Point-to-Point Connection, we provide a (2+ε)-approximation in 𝒪~(n+D) rounds.

  4. (4)

    For Facility Placement and Connection, a relative of non-metric Uncapacitated Facility Location, we obtain a (2+ε)-approximation in 𝒪~(n+D) rounds.

We further show how to replace the n+D term by the complexity of solving Partwise Aggregation, achieving (near-)universal optimality in any setting in which a solution to Partwise Aggregation in near-shortcut-quality time is known. Notably, all of our concrete results can be derived with relative ease once our model-agnostic meta-algorithm has been specified. This demonstrates the power of our modularization approach to algorithm design.

Keywords and phrases:
Distributed Graph Algorithms, Model-Agnostic Algorithms, Steiner Forest
Funding:
Corinna Coupette: Supported by Digital Futures at KTH Royal Institute of Technology.
Copyright and License:
[Uncaptioned image] © Corinna Coupette, Alipasha Montaseri, and Christoph Lenzen; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Distributed algorithms
; Mathematics of computing Graph algorithms ; Theory of computation Models of computation
Related Version:
Full Version: https://arxiv.org/abs/2407.14536 [10]
Editor:
Dariusz R. Kowalski

1 Introduction

The classic approach to determining the computational complexity of a task consists in deriving upper and lower bounds that are as tight as possible in a particular model of computation. On the upper-bound side, this can result in solutions that are specifically engineered toward the chosen computational model, such that the task at hand needs to be reexamined for every relevant model. Worse still, one might end up with fragile solutions that overcome only the challenges specifically represented by the model under study. At first glance, lower bounds might appear more robust in this regard, since they highlight an obstacle that any algorithm needs to overcome. However, many such lower bounds are shown for very specific network topologies that are rarely observed in practice. This might lead to a false sense of success in having classified the complexity of a given task – although the lower bound merely indicates that the parameters used to capture the computational complexity of the task are insufficient. In brief, traditional upper and lower bounds bear two limitations: model specificity and existential optimality.

A textbook illustration of these limitations is provided by the Steiner Forest problem (SF), which generalizes the well-known Steiner Tree problem (ST). In the classic SF formulation using input components (SF–IC), we are given a weighted graph, along with disjoint subsets of nodes V1,,VkV, and the goal is to determine a minimum-weight subgraph spanning each Vi, i[k]. This general and fundamental connectivity problem has been studied in depth in the classic centralized model of computation [1, 16, 28, 8, 19]. It is of significant practical importance, both due to direct application, see e.g., [36, Sec. 18.5.5], and being the key building block in variants of Steiner-type problems with broad real-world utility, e.g., Steiner Tree reoptimization [5] and Multicommodity Rent-or-Buy [20]. The state of the art in the Congest model is due to Lenzen and Patt-Shamir [30], who presented modified and adapted variants of a well-known centralized algorithm by Agrawal et al. [1]. One might suspect that algorithmic techniques underlying these tailored solutions could be transferred to other computational models, but this cannot be readily determined from their specialized solution (model specificity). Furthermore, for any fixed topology, there are inputs such that even their fastest algorithm runs for Ω~(n) rounds.111We use 𝒪~ and Ω~ to suppress factors of log𝒪(1)n. This is known to be necessary in the worst case (existential optimality): the Minimum Spanning Tree (MST) problem is a special case of the Steiner Forest Problem with k=1 and V1=V, to which the Ω~(n) lower bound by Das Sarma et al. [11] applies. However, the topology of the lower-bound graph is highly contrived, and the technique of using low-congestion shortcuts has been demonstrated to overcome the n barrier for MST in many more realistic settings [13, 24, 14]. These findings motivate us to revisit the Steiner Forest problem, along with a much broader class of connectivity problems, in an attempt to overcome the abovementioned limitations.

Beyond Model Specificity and Existential Optimality

To go beyond of model specificity and existential optimality, two paradigms have emerged.

First, numerous works have pushed toward what we term model agnosticism, developing algorithms that can be readily instantiated in many computational models [35, 32, 7, 6, 21]. The individual steps of these algorithms are either core tasks like distance computation, identifying connected components, and sorting, which are well-studied across a wide range of models, or they are easily implemented at low cost in any notable model. Hence, we call these algorithms model-agnostic (meta-)algorithms. Naturally, model-agnostic algorithms are more robust against model variations, and by allowing us to plug in optimized model-specific subroutines for the core computational problems, they sacrifice little performance over model-specific solutions. Thus, model-agnostic algorithms can be viewed as model-agnostic reductions to more basic computational tasks. Moreover, since these basic tasks can be solved by model-specific subroutines, they improve whenever progress is made on the current performance bottleneck in a given computational model.

Second, universal optimality, which was coined in the context of distributed computing, pushes for the design of topology-adaptive algorithms that are asymptotically worst-case optimal on every network topology, i.e., when varying input parameters other than the underlying communication graph [27, 15, 37].222Note that instance optimality, which requires that an algorithm is 𝒪(1)-competitive with any other always-correct algorithm on each instance, including the best algorithm for the specific instance, is often unachievable [27]. In particular, one can test whether the input matches an abritrary fixed instance in D rounds. If so, a matching solution can be output without further computation; otherwise, a fallback algorithm is executed. One might argue that this idea is no different than taking into account more parameters, such as the network diameter, the node connectivity, or any other quantity meaningful for the computational task. However, parametrizing complexity by the input graph is extremely general, subsuming a large number of parameters that one might consider, and capturing any topology-specific obstacle for the task at hand in a given computational model.

Our Contributions in Brief

In this work, we study a general class of connectivity problems called Constrained Forest Problems (CFPs), introduced by Goemans and Williamson [16], through the lenses of model agnosticism and universal optimality. Intuitively, a CFP is specified by a Boolean function f that indicates, for each node subset SV, whether it needs to be connected to its complement, i.e., if the output must contain an edge from S to VS. We require f to be proper, i.e., (1) f(V)=0, (2) f(S)=f(VS), and (3) f(A)=f(B)=0 for disjoint A,BV implies that f(AB)=0. For example, the Steiner Forest problem can be specified by setting f(S)=1 if and only if, for given disjoint node subsets ViV, there exists some i[k] such that both ViS and Vi(VS) are non-empty.

Model-Agnostic Algorithm for CFPs

We devise the shell-decomposition algorithm, a generic approximation algorithm for CFPs that is efficient if f can be evaluated efficiently.

Theorem 1 (Model-Agnostic Complexity of Constrained Forest Problems).

Given 0<ε1 and a graph G=(V,E) with polynomially bounded edge weights c:E, a (2+ε)-approximation to a CFP with proper forest function f:2V{0,1} can (up to book-keeping operations) be obtained at complexity 𝒪~((aSSSP+MST+RPS+FFE)ε1), with the terms in the sum denoting the complexities of solving (1) aSSSP: (1+ε)-approximate Set-Source Shortest-Path Forest, (2) MST: Minimum Spanning Tree, (3) RPS: Root-Path Selection, and (4) FFE: Forest-Function Evaluation for f.

By book-keeping operations, we denote simple steps that extract the input for the next subroutine from the output of the previous ones and can be easily implemented at low complexity, where the complexity measure(s) depend on the model at hand (e.g., round complexity in Congest). MST is the special case of SF in which all nodes are to be connected by a lightest possible tree. The task of aSSSP requires us, given ε>0 and SV, to compute a forest that preserves distances to S up to a factor of 1+ε, which ie equivalent to computing a (1+ε)-approximate shortest-path tree in the graph obtained by identifying all nodes in S. Finally, RPS (a special case of the more general transshipment problem) demands that, given a rooted forest FE and a set of marked nodes, we select all edges on paths from marked nodes to the roots of their respective trees.

MST, aSSSP, and RPS are well-understood in many models of computation. In contrast, f can be (ab)used to force the evaluation of an arbitrarily hard function g on the entire topology.333For an input graph G=(V,E) with uniform edge weights and any function g mapping such a graph to a desired range, choosing an arbitrary node vV and setting f(S)=1 if and only if (i) S={v} or S=V{v}, and (ii) g(G)=1 results in a proper forest function f. Thus, Theorem 1 can be viewed as confining the hardness of the task arising from the choice of f to 𝒪(ε1logn) iterations of evaluating f(C) for all C𝒞, where 𝒞 is a set of disjoint connected components. Because for any such 𝒞, one can enforce this evaluation by assigning weight 1 to edges inside each C and a large weight, say n2, to all other edges, this is the best possible up to an 𝒪(ε1logn) factor.

To illustrate the power and flexibility of our result, we apply our machinery in three models of computation – Congest, Parallel Random-Access Machine (PRAM), and Multi-Pass Streaming (MPS) – to three NP-hard CFPs: (1) Steiner Forest (SF); (2) Point-to-Point Connection (PPC), i.e., given X,YV of equal cardinality, finding a lightest set of edges such that each induced component the number of nodes from X is the same as the number of nodes from Y; and (3) Facility Placement and Connection (FPC), i.e., minimizing the cost of opening facilities at some nodes and connecting a set of clients CV to them. Footnote˜5 summarizes our results in Congest; for all problems, our PRAM algorithms require 𝒪~(ε3m) work and 𝒪~(ε3) depth, and our MPS algorithms need 𝒪~(ε3) passes and 𝒪~(n) memory.

Table 1: Congest results for SF derived from our algorithm, for εΘ(1). Deterministic and randomized results are marked with 𝔇 and , respectively. Here, n is the number of nodes, D is the (unweighted) hop diameter, s is the shortest-path diameter (which may be different from the weighted diameter), t is the number of terminals, k is the number of SF input components, and Q is the shortcut quality of the input graph, cf. Table 3. In our results, the n+D terms can be replaced by TPAno(1), where TPA𝒪(n+D), TPAQ, is the running time of Partwise Aggregation [25]. The no(1) factors can be removed conditionally on the existence of certain cycle-cover algorithms [35]. For any values of the other parameters, s and t can be linear in n.555Strictly speaking, the upper bounds from Lenzen and Patt-Shamir [30] are incomparable to ours, as it is possible to construct instances in which s, t, and k are small, yet TPA is not. This corresponds to “easy” instances for the subroutines we use, so one could provide refined bounds. However, this is orthogonal to our goals in this work, as we seek to derive bounds that depend on the topology (V,E) only.
Problem LB Previous Work Our Work
Ref. APX Complexity Ref. APX Complexity
SF(–IC) Ω~(Q) [27] (2+ε) 𝔇 𝒪~(sk+min{st,n}) [30] (2+ε) 𝔇 𝒪~(n+D+k)
𝒪(logn) 𝒪~(min{s,n}+D+k)
PPC Ω~(Q) [27] (2+ε) 𝔇 𝒪~(n+D)
FPC Ω~(Q) [27] (2+ε) 𝔇 𝒪~(n+D)
Table 2: Congest results for the four input variants of SF, with the notational conventions from Footnote˜5. Regardless of the input representation, on any graph, a lower bound of Ω~(Q) applies to randomized algorithms; the listed lower bounds arising from the input representation are existential (ex.). As before, n+D terms can be replaced by TPAno(1).
Problem Input ex. LB APX Complexity
SF–IC Each node v is given its component identifier λv[k]{} Ω~(k) (2+ε) 𝔇 𝒪~(n+D+k)
SF–CIC As in SF–IC, but node v knows λv and |{uVλu=λv}| (if λv) Ω~(k) 𝔇 (2+ε) 𝒪~(n2/3+D)
SF–CR Each node v is given vV{v} Ω~(t) (2+ε) 𝔇 𝒪~(n+D+t)
SF–SCR (V2); each node v knows
v={uV{u,v}}
Ω~(t) 𝔇 (2+ε) 𝒪~(n+D)
Approaching Universal Optimality

In our upper bounds for Congest, n+D can largely be replaced by TPAno(1), where TPA is the running time of an algorithm performing Partwise Aggregation (PA) [25, 14] – i.e., given a partition of V into connected subsets of nodes, computing the result of an aggregation function separately on each subset and outputting the result at each of its constituent nodes. Due to the respective hardness results [27], this implies that the running times of our solutions to PPC and FPC are universally optimal up to a factor of no(1).

For SF–IC, this is true up to the additive term of k. Here, PA is insufficient: Evaluating f requires us to determine, for each set Vi, if it is contained in a single connected component induced by the set of edges that have been selected into the current (partial) solution, but the Vi may not induce connected components in G. Existential lower bounds demonstrate that this obstacle is unavoidable in general [30]. We suspect that studying Partwise Aggregation with disjoint components that may be internally disconnected – i.e., Disjoint Aggregation (DA) – is key to characterizing the universal complexity of SF–IC, but leave this to future work, noting that an upper bound of 𝒪(k+D) arises from standard pipelining techniques.

An orthogonal consideration, however, yields surprising results, as summarized in Table 2. For the SF problem, the input representation drives the problem complexity. It is known that if nodes are given the identifiers of other nodes they must connect to in the output to encode connectivity requirements (SF–CR), an existential lower bound of Ω~(t) applies [30], where t is the number of terminals, i.e., nodes that need to be connected to some other node. In our corresponding upper bound, the additive k then becomes an additive t, again resulting in existential but not universal optimality. Interestingly, this picture is turned upside down when inputs are symmetric in the following sense: If uV knows that it must connect to vV by the input, then also v knows that it must connect to u (SF–SCR). In this setting, we can exploit symmetry to efficiently evaluate f using randomized equality testing. This leads to an algorithm running in ε3TPAno(1) rounds, which is close to universal optimality, provided that an efficient partwise-aggregation routine is available. Similarly, we observe that the Ω~(k) bound can be circumvented if nodes receive not only the identifier of their input component Vi, but also the size |Vi| of this input component (SF–CIC). We then obtain a randomized algorithm running in 𝒪~(ε3(n+D)+ε1n2/3) time.

We note that a simple adaptation of the lower-bound construction from Lenzen and Patt-Shamir [30] shows the same hardness (i.e., Ω~(t) for SF–SCR and Ω~(k) for SF–CIC, respectively) for deterministic algorithms, based on a communication-complexity reduction from 2-player equality testing. To the best of our knowledge, this is the first natural example of a provably large gap between the randomized and deterministic complexity in the Congest model for a global problem.

Structure

After introducing our main definitions in Section 2, we provide a technical overview of our results in Section 3. To conclude the main text, we discuss open questions in Section 4. A full specification of our model-agnostic algorithm and a detailed proof of its correctness are given in Appendix A. Model descriptions, model-specific results, and further related work are deferred to the additional appendices available in the extended version of this work [10].

2 Preliminaries

Table 3: Main notation used in this work. All notation involving nodes and edges makes implicit reference to the graph G.
Symbol Definition Meaning
|S| Cardinality of a set S
2S = {XXS} Set of all subsets (i.e., power set) of S
(Sk) = {XS|X|=k} Set of k-element subsets of S
[k] = {ii,ik} Set of nonnegative integers no larger than k
[k]0 = {ii0,ik} Set of positive integers no larger than k
G = (V,E) Graph with node set V and edge set E
n = |V| Number of nodes
m = |E| Number of edges
c(e) ={1,2,} Cost of edge e
c(Z) = eZc(e) Cost of edge subset ZE
p(u,v) = (e1,,e) s.t. (v1,,v+1): -hop path between u and v
ei={vi,vi+1}E i[], (distinct nodes and distinct edges)
u=v1, v=v+1
h(u,v) = min{ip(u,v)with|p(u,v)|=i} Hop (= unweighted) distance between u and v
D = max{h(u,v){u,v}(V2)} Hop diameter of G
d(u,v) = min{ip(u,v)withc(p(u,v)} Weighted distance between u and v
P(u,v) = p(u,v) with c(p(u,v))=d(u,v) Shortest path between u and v
s = max{min{|P(u,v)|{u,v}(V2)} Shortest-path diameter of G
δ(S) = {eE|eS|=1} Set of edges with exactly one endpoint in SV
𝒯 = {vVf({v})=1} Terminals
t = |𝒯| Number of terminals
k Number of input components in SF–IC

We begin by defining our basic notation as well as the class of problems we are interested in.

Basic Notation

Our basic notation is summarized in Table 3.

For a set S, we denote its cardinality by |S|, its power set by 2S={XXS}, and the set of its k-element subsets by (Sk). We extend functions f:S to subsets XS in the natural way by setting f(X)=xXf(x), and we write the sets of positive and nonnegative integers no greater than k as [k]={iik} resp. [k]0={i0ik}.

We consider weighted graphs G=(V,E) with n=|V| nodes, m=|E| edges, and edge weights (edge costs) c:E0 polynomially bounded in n.666Assuming polynomially bounded edge weights allows us to encode polynomial sums of edge weights with 𝒪(logn) bits, which means that we can encode edge weights in a single message (Congest) or memory word (PRAM and MPS). Zero-weight edges arise naturally when simulating contractions in the distributed setting. We can handle them by scaling all non-zero edge weights by n/ε (where w.l.o.g., 1/ε is polynomially bounded as well), and assigning weight 1 to all previously zero-weight edges. Each node is equipped with a unique identifier of 𝒪(logn) bits, which is also used to break ties. An -hop path from uV to vV, denoted p(u,v), is a sequence of distinct edges (e1,e) arising from a sequence of distinct nodes (v1,,v+1) such that ei={vi,vi+1}E for i[], v1=u, and v+1=v. The (unweighted) hop distance between u and v is the smallest number of hops needed to go from u to v, i.e., h(u,v)=min{ip(u,v)with|p(u,v)|=i}, and the hop diameter of G is D=max{h(u,v){u,v}(V2)}. The (weighted) shortest-path distance between u and v is d(u,v)=min{ip(u,v)withc(p(u,v))=i}. A shortest path between u and v, denoted P(u,v), is a path from u to v of length d(u,v). The shortest-path diameter s of G is the maximum over all {u,v}(V2) of the minimum number of hops contained in a shortest path from u to v, i.e., s=max{min{|P(u,v)|{u,v}(V2)}}. Given a cut (S,VS), the set of edges with exactly one endpoint in S is denoted as δ(S)={eE|eS|=1}.

An event occurring with high probability (w.h.p.) has probability at least 11/nc for a freely chosen constant c1.

Constrained Forest Problems

We are interested in Constrained Forest Problems (CFPs) as introduced by Goemans and Williamson [16].777To simplify the technical exposition, like Goemans and Williamson [16], we disallow zero-weight edges. However, it is straightforward to extend our approach to zero-weight edges by scaling edge weights as discussed in Footnote 6. Given a graph G=(V,E) with edge costs c:E and a function f:2V{0,1}, a CFP asks us to solve the integer program stated as Problem 1, whose dual relaxation is provided as Problem 2.

Problem 1 (CFP Primal IP).
mineEc(e)xe
s.t. x(δ(S))f(S)SV
xe{0,1}eE
Problem 2 (CFP Dual LP).
maxSVf(S)yS
s.t. S:eδ(S)ySc(e)eE
yS0

That is, a CFP is a minimization problem whose optimal solution is a forest888For any cycle and node set S, δ(S) cannot contain exactly one edge from the cycle. Hence, from any solution with a cycle, we can remove an edge. of edges from the input graph G that meets the constraints imposed by the forest function f. Like Goemans and Williamson [16], we consider CFPs with proper functions f, which satisfy (1) zero, i.e., f(V)=0, (2) symmetry, i.e., f(S)=f(VS), and (3) disjointness (also called maximality [17]), i.e., if AB= for two sets A and B, then f(A)=f(B)=0 implies f(AB)=0.999We also assume that f is nontrivial, i.e., that there exists at least one SV such that f(S)=1.

For a CFP with forest function f, a node v is called a terminal if f({v})=1. Given a forest function f, we denote the set of terminals as 𝒯={{v}vV,f({v})=1} and the number of terminals as t=|𝒯|.

Specific Constrained Forest Problems

To demonstrate the flexibility of our approach, we consider Survivable Network Design Problems (SNDPs) that originate from three different real-world challenges.

Definition 2 (Steiner Forest (SF)).

Given a partition of the terminals 𝒯=V1˙˙Vk, find a minimum-cost edge subset connecting the nodes in each input component. The corresponding forest function evaluates to 1 on SV if and only if there is i[k] such that ViSVi. We distinguish between several input representations:

SF–CR

Terminal v is given the identifiers of other terminals as connection requests Rv𝒯.

SF–SCR

As SF-CR, but connection requests are symmetric, i.e., uRvvRu.

SF–IC

Terminal vVi is given a unique identifier (of size 𝒪(logn)) for its input component.

SF–CIC

As SF-IC, but vVi is also given the cardinality |Vi| of its input component as input.

SF has practical relevance especially in infrastructure development [1], with MST (𝒯=V1=V) and ST (𝒯=V1V) as special cases. It is NP-complete [29] and APX-hard [9].

Definition 3 (Point-to-Point Connection (PPC)).

Given a set of sources XV and a set of targets YV, find a minimum-cost edge subset such that in each connected component, the number of sources equals the number of targets. That is, for SV, f(S)=1 if and only if |SX||SY|.

PPC is motivated by challenges from circuit switching and VLSI design, and the problem is NP-complete [31].

Our last problem is Facility Placement and Connection (FPC), an NP-complete facility-location-type problem arising, e.g., in operations research. Intuitively, FPC can be stated as follows.

Definition 4 (FPC [intuitive]).

Given for each node vV an opening cost ov and indication whether it is in the set of clients CV, identify a subset OV of facilities to open and an edge set FE such that each client is connected to a facility by F, minimizing vOov+eFc(e).

To turn this task into a CFP matching our framework, we add one additional node sV and, for each vV, an edge {v,s} of weight c({v,s})=ov. The task then becomes to determine a (low-weight) Steiner Tree spanning C{s}, i.e., the special case of SF with k=1.

Definition 5 (FPC [rephrased]).

Given, for each node vV, an opening cost ov and an indication whether it is in the set of clients CV, solve ST on G=(V˙{s},E˙{{v,s}|vV}), with edge costs of c(e) for eE and c({v,s})=ov for vV, where the terminals are 𝒯=C{s}.

Even in Congest, we can solve ordinary ST instances efficiently, regardless of the specific input representation. However, the virtual node s and its incident edges need to be simulated in the chosen model of computation. This is trivial in PRAM (simply modify the input representation in parallel) and Multi-Pass Streaming (since we use Ω(nlogn) bits of memory anyway) but nontrivial in Congest. For Congest, we show that we can simulate the virtual node efficiently using Partwise Aggregation.

Partwise Aggregation and Shortcut Quality

Finally, we introduce a subroutine that we use as a black box to achieve near-universal optimality in cases where efficient solutions are known.

Definition 6 (Partwise Aggregation [13]).

For disjoint node sets V1,,VkV, suppose that Vi induces a connected subgraph. In the Partwise Aggregation (PA) problem, each vVi is given a unique identifier for Vi (of size 𝒪(logn)) and a second 𝒪(logn)-bit value x(v)X. For a specified associative and commutative operator :X×XX, for each i[k] and each vVi, v needs to compute its output wVix(w).

Definition 7 (Shortcut Quality).

The shortcut quality of G, denoted by Q, is the maximum over all feasible operators and partitions of V of the minimum number of rounds in which a Congest algorithm with knowledge of the full topology can solve Partwise Aggregation. Put differently, the algorithm may preprocess the graph and the operator, but must then compute the output within Q rounds after the nodes have been given their inputs to the PA instance.

Haeupler et al. [27] show that Ω~(Q) is a lower bound for MST (i.e., SF and ST with k=1 and V1=V) and shortest s-t path (the special case of PPC with |X|=|Y|=1) – regardless of the approximation ratio and also for randomized Las Vegas algorithms, i.e., those that guarantee a feasible output. They further prove that PA can be solved in 𝒪~(Q) rounds in the Supported Congest model, which is Congest with the unweighted graph topology given as part of the input.

3 Technical Overview

In this section, we outline our techniques and discuss the technical challenges to be overcome. We also use this opportunity to discuss the most relevant related work in context.

Figure 1: Overview of our model-agnostic shell-decomposition algorithm for approximating Constrained Forest Problems. Main tasks are colored; book-keeping operations are shaded in gray.
Model-Agnostic Algorithm for Proper Constrained Forest Problems

Our shell-decomposition algorithm, depicted in Figure 1, is based on the primal-dual formulation for general CFPs given by Goemans and Williamson [16] (GW algorithm), restated as Algorithm 1, which yields a (22/t)-approximation [16] for CFPs with proper forest functions. Our algorithm can also be seen as a model-agnostic generalization of the algorithm by Agrawal et al. [1], ported to the distributed setting by Lenzen and Patt-Shamir [30]. These algorithms have also been called moat-growing algorithms [18, 3, 30].

Figure 2: Illustration of our shell-decomposition argument on a small instance of Steiner Forest (SF–IC). Gray lines indicate original edges, black lines indicate (parts of) selected edges, black circle linings indicate active components, and node colors indicate input components. The illustrations provided by Goemans and Williamson [16] (Figs. 2–5) are stylistically similar, but our drawings clarify the phase-wise charging argument underlying our shell-decomposition algorithm.

Intuitively, our algorithm operates as follows. We maintain connected components 𝒞, initialized to the singletons 𝒞={{v}vV}. A component C𝒞 is active if f(C)=1. The algorithm concurrently grows balls around all active components, with respect to the metric induced by the then-current edge costs c. In the growth process, two balls can touch only when at least one of their associated components is active, and only when the balls of two active components C, C touch, adding edges to merge the components can make the resulting component inactive – otherwise, i.e., assuming f(CC)=f(C)=0, symmetry implies f(V(CC))=0, disjointness implies f(VC)=f((V(CC))C)=0, and using symmetry again we get f(C)=0, a contradiction.

As illustrated in Figure 2, until the balls around two components touch, they are disjoint, witnessing that the dual problem has a solution with weight larger than the product of the current radius times the number of currently active components. Accordingly, when merging active components, we can afford to connect the terminals by adding a shortest path between them to the primal solution, paying a cost of at most twice the radius at merge. Because each merge we perform reduces the number of active components by at least one, the ball growth always witnesses sufficient additional weight in a dual solution to pay for future merges up to an approximation factor of 2. Upon termination, i.e., when all components are inactive, the set of edges we selected constitutes a feasible solution.

The intuition sketched above already suggests that the ball-growing process allows for substantial concurrency. To achieve high efficiency across the board in various models, our shell-decomposition algorithm performs several modifications to the centralized GW algorithm, which originally assumes a final filtering step to eliminate unneeded edges from the solution, as well as exact distances and immediate forest-function evaluation.

  1. (A)

    Incremental Solution-Set Construction. We merge active components that touch regardless of whether this ultimately turns out to be necessary to satisfy connectivity requirements. This does not impact the approximation guarantee, which was implicit already in the contribution by Agrawal et al. [1] and is now made explicit in our reformulation of the GW framework [16]. As a result, we need to determine if f imposes further connectivity requirements only as often as we iterate through the loop in Figure 1.

  2. (B)

    Approximate Distance Computations. At the cost of a factor of 1+ε in the approximation ratio, we replace exact distance computations with (1+ε)-approximate distance computations. The challenge here is to ensure that we do not violate the dual constraints when charging dual variables (corresponding to cuts) based on the progress of the primal solution. This can be achieved by constructing the dual solution in the true metric space, rather than reusing the approximate distances leveraged by the primal solution. In contrast to the other modifications, this requires a comparatively involved argument, and it is the main technical novelty and contribution in this part of our work.

  3. (C)

    Deferred Forest-Function Evaluation. Again at the cost of a factor of 1+ε in the approximation ratio, we let components grow to radii that are integer powers of 1+ε before reevaluating our forest function to update their activity status. This technique was introduced by Lenzen and Patt-Shamir [30] for the Steiner Forest problem specifically; we show its general correctness in the context of the GW algorithm. Using this technique, we can limit the number of loop iterations in Figure 1 to 𝒪(ε1logn) (assuming polynomially bounded edge weights).

Figure 3: Operation of our shell-decomposition algorithm (Algorithm 2) on an s-t-shortest-path instance seeking to connect the red nodes, starting with r0=1/2 (cf. Line 7), and working over phases 0, 1, and 2. Panels are labeled with their phase number and the illustrated step of Algorithm 2. Nodes absorbed by the SSSP forest are drawn in orange, edge-cost reduction is indicated in purple, edges selected into the SSSP forest are marked in green, and edges selected into the solution are marked in black. Distance approximations and deferred forest-function evaluation are not shown.

The pseudocode of the resulting model-agnostic shell-decomposition algorithm is given as Algorithm 2. Here, to increase clarity and highlight the primal-dual nature of our algorithm, we also compute and output the lower bound LB on the cost of an optimal solution, which is not required to determine the output forest F. An example execution of Algorithm 2 is depicted in Figure 3. In Appendix A, we prove that the algorithm maintains an approximation ratio of 2+ε.

Leveraging the modifications specified in Items B, C, and A, to derive concrete algorithms in specific models of computation, what remains is to implement the individual steps in Figure 1. Up to simple book-keeping operations, this entails four main tasks (colored boxes in Figure 1):

  1. (1)

    (Approximate) Set-Source Shortest-Path Forest (aSSSP). This is essentially computing a single-source shortest-path tree with a virtual source node, so we can plug in state-of-the-art algorithms for each model of interest [4, 35].

Algorithm 1 GW Algorithm for (22/t)-Approximation of Constrained Forest Problems [16].
Algorithm 2 Our Shell-Decomposition Algorithm for (2+ε)-Approximation of Constrained Forest Problems. To obtain the desired approximation guarantee, we choose ε,ε′′ε/4.
  1. (2)

    Minimum Spanning Tree (MST). This is another well-studied task for which near-optimal solutions are known in all prominent models [33, 27, 34, 12].

  2. (3)

    Root-Path Selection (RPS). Here, we are given a forest rooted at sources (where each node knows its parent in its tree and its closest source), along with a number of marked nodes. Our goal is to select the edges on the path from each marked node to its closest source. This problem can be straightforwardly addressed by solving a much more general flow problem: (Approximate) Transshipment, in which flow demands are to be satisfied at minimum cost. Again, we can plug in state-of-the-art algorithms for each model of interest [4, 35]. Note that in our case, the solution is restricted to containing the edges of a predetermined forest, such that the solution is unique, edge weights play no role, and the challenge becomes to determine the feasible flow quickly.

  3. (4)

    Forest-Function Evaluation (FFE). The remaining subtask is to assess if f(C)=1, for each component CV in a set of disjoint, internally connected components 𝒞– i.e., to determine which of the components still need to be connected to others. This is the only step that depends on f, requiring an implementation of f matching the given model of computation.

Note that f is an arbitrary proper function, so evaluating it can be arbitrarily hard. Thus, our algorithm confines the hardness of the task that comes from the choice of f to O(ε1logn) evaluations of f(C) for disjoint connected components C𝒞 (cf. Item A). In contrast, the other three subtasks can be solved by state-of-the-art algorithms from the literature in a black-box fashion.

To illustrate the power of our result, we apply our machinery in three models of computation – Congest, Parallel Random-Access Machine (PRAM), and Multi-Pass Streaming (MPS) – to three Constrained Forest Problems. Our results follow from Theorem 1 with (1) the referenced results on aSSSP, MST, and RPS, (2) model- and problem-specific subroutines for FFE, and (3) model-specific subroutines for book-keeping operations.

In Footnote˜5, we highlight the improvements over the state of the art achieved for our example problems by instantiating the shell-decomposition algorithm in the Congest model.

  1. (I)

    Steiner Forest (SF). From 𝒪~(ε1(sk+min{st,n})) time101010Recall that n denotes the number of nodes, k the number of input components, t the number of terminals, D the unweighted (hop) diameter, and s the shortest-path diameter. Both t and s can be up to Ω(n), regardless of k or D. for a (2+ε)-approximation and 𝒪~(min{s,n}+D+k) time for an 𝒪(logn)-approximation to SF–IC obtained by Lenzen and Patt-Shamir [30] to (2+ε)-approximations in time (1) 𝒪~(ε3(n+D)+ε1k) for SF–IC, (2) 𝒪~(ε3(n+D)) for SF–SCR, and (3) 𝒪~(ε3(n+D)+ε1min{n2/3,k}) for SF–CIC. For SF–CR, we incur the same additive running-time overhead of 𝒪(t) as Lenzen and Patt-Shamir [30], matching the existential lower bound they showed.

  2. (II)

    Point-to-Point Connection (PPC). We are not aware of prior work providing Congest algorithms for the PPC problem. Here, we obtain a (2+ε)-approximation in 𝒪~(ε3(n+D)) time.

  3. (III)

    Facility Placement and Connection (FPC). We do not know of any existing Congest algorithms for the FPC problem, and we realize a (2+ε)-approximation in 𝒪~(ε3(n+D)) time.

We also derive algorithms for SF, PPC, and FPC in the PRAM and Multi-Pass Streaming models, taking 𝒪~(ε3m) work and 𝒪~(ε3) depth in the PRAM model, as well as 𝒪~(ε3) passes and 𝒪~(n) space in the Multi-Pass Streaming model. To the best of our knowledge, these algorithms are either the first ones to perform these tasks in their respective models or they cover more general classes of instances than the state of the art (see extended version). Notably, we obtain our diverse results with relative ease once the analysis of our model-agnostic shell-decomposition algorithm is in place – which contrasts with the challenges of directly designing specific algorithms for specific problems in specific models.

Taking Shortcuts

In the Congest model, the n term in the complexity is due to the fact that, in general, it is not possible to solve Partwise Aggregation (PA) in o~(n) rounds. As formalized in Definition 6, PA denotes the task of performing an aggregation and broadcast operation on each subset in a partition of V that induces connected components [14]. We can leverage PA, inter alia, to determine a leader and distribute its ID, or to find and make known a heaviest outgoing edge, for each component (a.k.a. part) in parallel. PA-type operations are key to efficient MST construction, and any TPA-round solution to Partwise Aggregation lets us solve MST in 𝒪~(TPA+D)=𝒪~(TPA) rounds [13, 14].

As MST computation is a CFP, it might not surprise that Partwise Aggregation can serve as a key subroutine for other CFPs in the Congest model as well. We show that the n term in the complexity of CFPs can be replaced by TPA. Since this is already known for (1+ε)-approximate Set-Source Shortest-Path Forest [35], Minimum Spanning Tree [13], and (1+ε)-approximate Transshipment [35] (which can be used to solve Root-Path Selection),111111The results for approximate Set-Source Shortest-Path Forest and approximate Transshipment are conditional on the existence of fast (𝒪~(1),𝒪~(1))-cycle-cover algorithms; otherwise, we incur an no(1) overhead (see Footnote 5). again we need to show this for Forest-Function Evaluation only. This is straightforward for PPC and FPC, and simple algorithms evaluate the forest function for SF-IC and SF-CR in 𝒪(TPA+k) and 𝒪(TPA+t) rounds, respectively.

The substantial literature on low-congestion shortcuts provides a large array of results on solving Partwise Aggregation in time comparable to the shortcut quality of the input graph, i.e., the best possible running time TPA for Partwise Aggregation [37, 35, 27, 26, 15, 14, 22, 23, 2]. In particular, TPA𝒪~(D) if the input graph does not contain a fixed 𝒪~(1)-dense minor, without precomputations or further knowledge of G [14]. Examples of graphs without 𝒪~(1)-dense minors are planar graphs and, more generally, graphs of bounded genus. Moreover, in Supported Congest (where (V,E) is known – or rather, can be preprocessed), TPA𝒪~(Q), i.e., within a polylogarithmic factor of the optimum. Due to the modular structure of our results, any future results on Partwise Aggregation and low-congestion shortcuts will automatically improve the state of the art for CFPs in the Congest model.

The Quest for Universal Optimality

In the Congest model, it is known that even on a fixed network topology, Ω~(TPA) rounds are required to obtain any non-trivial approximation for a large class of problems. This class includes MST, a special case of ST, which reduces to FPC by making all but the opening cost of a single node very high, and shortest s-t-path [27], a special case of PPC. Thus, this universal lower bound applies to our example tasks as well. In particular, our results on PPC and FPC have almost universally tight running times.

In contrast, the additive terms of k and t in the running times of our algorithms for SF-IC and SF-CR, respectively, are only existentially optimal [30]: The lower-bound graph – a double star – has shortcut quality 𝒪(1), but in a fully connected graph, it is trivial to evaluate f for all current components in two rounds. This motivates us to split Forest-Function Evaluation for Steiner Forest into two parts: (1) determining, for each input component ViV or connection request r, respectively, whether the specified connectivity requirements are met, and (2) determining, for each current component C𝒞, whether it contains a terminal whose connectivity requirements are not met. While the second part can be solved via standard Partwise Aggregation, the first part requires aggregating information within k (or t) disjoint components that may be internally disconnected. We call this task Disjoint Aggregation on p parts, DA(p), achieve the trivial bound TDA(p)𝒪(D+p) via pipelining over a BFS tree, and hypothesize that the universal complexity of DA(p) merits further investigation in future work.

Figure 4: Example of a lower-bound graph used to reduce Equality Testing to Steiner Forest, for an instance with n=8. Node colors indicate input components (SF–CIC) resp. connection requests (SF–SRC), and node shapes indicate which nodes are simulated by which player.

As an orthogonal approach, we explore the effect of the input specification on our SF results. The assumptions that (1) pairs of terminals know that they need to be connected (SF–SCR), or (2) the size of each input component is known to its constituent terminals (SF–CIC) both are plausible, and they change the basis of the applicable existential lower bounds from 2-party set disjointness [30] to 2-party equality, as depicted in Figure 4. We provide the details in the extended version.

Assuming SF–SCR, we show how to achieve a running time of 𝒪~(min{TPAno(1),n+D}) using efficient randomized 2-party equality testing. We sketch a solution assuming shared randomness here; standard techniques achieve the same without shared randomness. For each terminal-request pair {u,v}, we flip a fair independent coin and denote the result by c{u,v}{0,1}. Now each component C aggregates uCvvc{u,v}mod2. Observe that if u,vC for request pair {u,v}, then for SF–SCR, it holds that vu and uv. Hence, if C contains, for each request pair, either both terminals or none of the terminals, then the sum is guaranteed to be 0 modulo 2. Otherwise, fix a request pair {u,v} with uC and vVC. After evaluating the sum up to coin cu,v, it is either 0 or 1, and hence, by independence of cu,v, with probability 1/2, the sum is 1 modulo 2. Therefore, performing the process 𝒪(logn) times in parallel, we can distinguish between f(C)=0 and f(C)=1 with high probability. This computation can be performed by a single aggregation, where the aggregation operator is given by bit-wise addition modulo 2.

Our strategy for SF–CIC is similar, but results in a much weaker bound of 𝒪~(n2/3+D); our main point here is to demonstrate that the Ω~(k) can be beaten. We distinguish three cases. (1) Components C of size at most n2/3 are spanned by a rooted tree of size n2/3. Here, we can aggregate the terminal counts, for all input components, within C at C’s root in 𝒪(n2/3) rounds to determine activity status. (2) For each input component of size at least n1/3, we globally aggregate if there are two distinct component IDs with a terminal from that input component. This requires one aggregation for each such input component, all of which can be completed within 𝒪(n2/3+D) rounds via a BFS tree, as there can be at most n2/3 input components of this size. (3) For input components of size s<n1/3, each component C of size larger than n2/3 uses the same strategy as for SF–SCR. However, only input components of the same size can be handled in a single aggregation, as the summation is now modulo s. Hence, 𝒪~(n1/3) aggregations by at most n1/3 components of size larger than n2/3 are required, which can again be performed in 𝒪~(n2/3+D) rounds.

4 Discussion

In this work, we presented a general model-agnostic framework for the (2+ε)-approximation of Constrained Forest Problems (CFPs) and demonstrated its utility on three NP-hard CFPs in three models of computation. We conclude with a number of open questions – beyond applying our framework to other graph problems and computational models – in increasing order of generality:

  1. (Q1)

    Can the running time of SF–CIC in Congest be improved to nearly TPA?

  2. (Q2)

    Many of our results are near-universally optimal in Congest, even for Las Vegas algorithms – i.e., randomized with guaranteed success – but our algorithms for SF–SCR and SF–CIC are Monte Carlo. Due to existential lower bounds based on the communication complexity of 2-party equality, this is required to (always) achieve small running times w.h.p.
    Is Ω~(Q) also a lower bound for Las Vegas Congest algorithms?

  3. (Q3)

    How hard is Disjoint Aggregation, i.e., how can we characterize TDA(p)?

  4. (Q4)

    The FPC problem minimizes the sum of opening and forest costs, disregarding the (often distance-based) connection costs considered in other facility-location-type problems.
    Does our approach to FPC generalize to problems with connection costs?

  5. (Q5)

    As we reduce most of our tasks to few PA instances, in Congest, TPAQ rounds are both necessary and sufficient to achieve near-universal optimality. Since PA can be solved in 𝒪(n) work and 𝒪(logn)=𝒪~(1) depth in PRAM, PA-based algorithms also yield good solutions in PRAM. However, in the streaming model, while PA can be solved in 𝒪~(n) memory and two passes, it is unclear if this yields optimal results (unless we insist that the output needs to be held in memory), as we are mostly limited to existential Ω~(n) lower bounds.
    Are there o~(n)-memory streaming algorithms with 𝒪~(1) passes? Better yet: Is there an analog to universal optimality in the MPS model?
    Note that TPA seems inadequate as a parameter here, as each part will require some memory for a few-pass implementation, but there may be Ω(n) parts.

  6. (Q6)

    Does our approach generalize to CFPs on hypergraphs?

  7. (Q7)

    Beyond proper functions, the primal-dual method has proven useful for uncrossing functions [17]. One of the main features of optimization problems with uncrossing functions is that they are guaranteed to feature an optimal dual solution that is laminar.
    Can our approach be extended to uncrossing or other non-proper functions?

References

  • [1] Ajit Agrawal, Philip N. Klein, and R. Ravi. When trees collide: An approximation algorithm for the generalized steiner problem on networks. SIAM J. Comput., 24(3):440–456, 1995. doi:10.1137/S0097539792236237.
  • [2] Ioannis Anagnostides, Christoph Lenzen, Bernhard Haeupler, Goran Zuzic, and Themis Gouleakis. Almost universally optimal distributed laplacian solvers via low-congestion shortcuts. Distributed Comput., 36(4):475–499, 2023. doi:10.1007/s00446-023-00454-0.
  • [3] Aaron Archer, MohammadHossein Bateni, MohammadTaghi Hajiaghayi, and Howard J. Karloff. Improved approximation algorithms for prize-collecting steiner tree and TSP. SIAM J. Comput., 40(2):309–332, 2011. doi:10.1137/090771429.
  • [4] Ruben Becker, Sebastian Forster, Andreas Karrenbauer, and Christoph Lenzen. Near-optimal approximate shortest paths and transshipment in distributed and streaming models. SIAM J. Comput., 50(3):815–856, May 2021. doi:10.1137/19M1286955.
  • [5] Davide Bilò. New algorithms for steiner tree reoptimization. Algorithmica, 86(8):2652–2675, 2024. doi:10.1007/S00453-024-01243-2.
  • [6] Joakim Blikstad, Jan van den Brand, Yuval Efron, Sagnik Mukhopadhyay, and Danupon Nanongkai. Nearly optimal communication and query complexity of bipartite matching. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 1174–1185. IEEE, 2022. doi:10.1109/FOCS54457.2022.00113.
  • [7] Yi-Jun Chang, Manuela Fischer, Mohsen Ghaffari, Jara Uitto, and Yufan Zheng. The complexity of (Δ+1) coloring in congested clique, massively parallel computation, and centralized local computation. In Peter Robinson and Faith Ellen, editors, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, PODC 2019, Toronto, ON, Canada, July 29 - August 2, 2019, pages 471–480. ACM, 2019. doi:10.1145/3293611.3331607.
  • [8] Chandra Chekuri and F. Bruce Shepherd. Approximate integer decompositions for undirected network design problems. SIAM J. Discret. Math., 23(1):163–177, 2008. doi:10.1137/040617339.
  • [9] Miroslav Chlebík and Janka Chlebíková. The steiner tree problem on graphs: Inapproximability results. Theor. Comput. Sci., 406(3):207–214, 2008. doi:10.1016/j.tcs.2008.06.046.
  • [10] Corinna Coupette, Alipasha Montaseri, and Christoph Lenzen. Model-agnostic approximation of constrained forest problems, 2024. doi:10.48550/arXiv.2407.14536.
  • [11] Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. Distributed verification and hardness of distributed approximation. SIAM J. Comput., 41(5):1235–1265, 2012. doi:10.1137/11085178X.
  • [12] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2-3):207–216, 2005. doi:10.1016/j.tcs.2005.09.013.
  • [13] Mohsen Ghaffari and Bernhard Haeupler. Distributed algorithms for planar networks II: low-congestion shortcuts, mst, and min-cut. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 202–219. SIAM, 2016. doi:10.1137/1.9781611974331.ch16.
  • [14] Mohsen Ghaffari and Bernhard Haeupler. Low-congestion shortcuts for graphs excluding dense minors. In Avery Miller, Keren Censor-Hillel, and Janne H. Korhonen, editors, PODC ’21: ACM Symposium on Principles of Distributed Computing, Virtual Event, Italy, July 26-30, 2021, pages 213–221. ACM, 2021. doi:10.1145/3465084.3467935.
  • [15] Mohsen Ghaffari and Goran Zuzic. Universally-optimal distributed exact min-cut. In Alessia Milani and Philipp Woelfel, editors, PODC ’22: ACM Symposium on Principles of Distributed Computing, Salerno, Italy, July 25 - 29, 2022, pages 281–291. ACM, 2022. doi:10.1145/3519270.3538429.
  • [16] Michel X. Goemans and David P. Williamson. A general approximation technique for constrained forest problems. SIAM J. Comput., 24(2):296–317, 1995. doi:10.1137/S0097539793242618.
  • [17] Michel X Goemans and David P Williamson. The primal-dual method for approximation algorithms and its application to network design problems. In Approximation Algorithms for NP-Hard Problems, pages 144–191. PWS Publishing Co., 1996.
  • [18] Anupam Gupta and Amit Kumar. A constant-factor approximation for stochastic steiner forest. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 659–668. ACM, 2009. doi:10.1145/1536414.1536504.
  • [19] Anupam Gupta and Amit Kumar. Greedy algorithms for steiner forest. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 871–878. ACM, 2015. doi:10.1145/2746539.2746590.
  • [20] Anupam Gupta, Amit Kumar, Martin Pál, and Tim Roughgarden. Approximation via cost-sharing: A simple approximation algorithm for the multicommodity rent-or-buy problem. In 44th Symposium on Foundations of Computer Science, FOCS 2003, Cambridge, MA, USA, October 11-14, 2003, Proceedings, pages 606–615. IEEE Computer Society, 2003. doi:10.1109/SFCS.2003.1238233.
  • [21] Bernhard Haeupler, D. Ellis Hershkowitz, and Thatchaphol Saranurak. Maximum length-constrained flows and disjoint paths: Distributed, deterministic, and fast. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1371–1383. ACM, 2023. doi:10.1145/3564246.3585202.
  • [22] Bernhard Haeupler, Taisuke Izumi, and Goran Zuzic. Low-congestion shortcuts without embedding. In George Giakkoupis, editor, Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, PODC 2016, Chicago, IL, USA, July 25-28, 2016, pages 451–460. ACM, 2016. doi:10.1145/2933057.2933112.
  • [23] Bernhard Haeupler, Taisuke Izumi, and Goran Zuzic. Near-optimal low-congestion shortcuts on bounded parameter graphs. In Cyril Gavoille and David Ilcinkas, editors, Distributed Computing - 30th International Symposium, DISC 2016, Paris, France, September 27-29, 2016. Proceedings, volume 9888 of Lecture Notes in Computer Science, pages 158–172. Springer, 2016. doi:10.1007/978-3-662-53426-7_12.
  • [24] Bernhard Haeupler, Taisuke Izumi, and Goran Zuzic. Low-congestion shortcuts without embedding. Distributed Comput., 34(1):79–90, 2021. doi:10.1007/s00446-020-00383-2.
  • [25] Bernhard Haeupler and Jason Li. Faster distributed shortest path approximations via shortcuts. In Ulrich Schmid and Josef Widder, editors, 32nd International Symposium on Distributed Computing, DISC 2018, New Orleans, LA, USA, October 15-19, 2018, volume 121 of LIPIcs, pages 33:1–33:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.DISC.2018.33.
  • [26] Bernhard Haeupler, Harald Räcke, and Mohsen Ghaffari. Hop-constrained expander decompositions, oblivious routing, and distributed universal optimality. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 1325–1338. ACM, 2022. doi:10.1145/3519935.3520026.
  • [27] Bernhard Haeupler, David Wajc, and Goran Zuzic. Universally-optimal distributed algorithms for known topologies. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1166–1179. ACM, 2021. doi:10.1145/3406325.3451081.
  • [28] Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. J. ACM, 48(2):274–296, 2001. doi:10.1145/375827.375845.
  • [29] Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85–103. Plenum Press, New York, 1972. doi:10.1007/978-1-4684-2001-2_9.
  • [30] Christoph Lenzen and Boaz Patt-Shamir. Improved distributed Steiner forest construction. In Magnús M. Halldórsson and Shlomi Dolev, editors, ACM Symposium on Principles of Distributed Computing, PODC ’14, Paris, France, July 15-18, 2014, pages 262–271. ACM, 2014. doi:10.1145/2611462.2611464.
  • [31] Chung-Lun Li, S. Thomas McCormick, and David Simchi-Levi. The point-to-point delivery and connection problems: complexity and algorithms. Discret. Appl. Math., 36(3):267–292, 1992. doi:10.1016/0166-218X(92)90258-C.
  • [32] Sagnik Mukhopadhyay and Danupon Nanongkai. Weighted min-cut: Sequential, cut-query, and streaming algorithms. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 496–509. ACM, 2020. doi:10.1145/3357713.3384334.
  • [33] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algorithm. J. ACM, 49(1):16–34, 2002. doi:10.1145/505241.505243.
  • [34] Seth Pettie and Vijaya Ramachandran. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. Comput., 31(6):1879–1895, 2002. doi:10.1137/S0097539700371065.
  • [35] Václav Rozhon, Christoph Grunau, Bernhard Haeupler, Goran Zuzic, and Jason Li. Undirected (1+ε)-shortest paths via minor-aggregates: Near-optimal deterministic parallel and distributed algorithms. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 478–487. ACM, 2022. doi:10.1145/3519935.3520074.
  • [36] Stefan Voß. Steiner tree problems in telecommunications. In Mauricio G. C. Resende and Panos M. Pardalos, editors, Handbook of Optimization in Telecommunications, pages 459–492. Springer, 2006. doi:10.1007/978-0-387-30165-5_18.
  • [37] Goran Zuzic, Gramoz Goranci, Mingquan Ye, Bernhard Haeupler, and Xiaorui Sun. Universally-optimal distributed shortest paths and transshipment via graph-based 1-oblivious routing. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, pages 2549–2579. SIAM, 2022. doi:10.1137/1.9781611977073.100.

Appendix A Model-Agnostic Algorithm

A.1 Modified Centralized (𝟐+𝜺)-Approximation

Theorem 8.

For a graph G=(V,E), edge costs c:E, and proper function f, the modified GW algorithm with incremental solution-set construction, (1+ε)-approximate distance computations, and (1+ε′′)-deferred forest-function evaluation (Algorithm 2) yields a (22/t)(1+ε)(1+ε′′)2-approximation to the optimal solution, i.e., a (2+ε)-approximation for ε,ε′′ε/41/4.

Denote by i=0,1, the iterations of the while loop in Algorithm 2, calling each iteration a merge phase, and for each phase by (i) ri:=(1+ε′′)iε′′4 the radius r at the beginning of phase i (with r1:=(1+ε′′)1ε′′4), (ii) Ui, ci Fi, Fi, Ai, Ei, and 𝒯i1 the values of the respective variables at the end of phase i, and (iii) ai|𝒯i1| the number of active components at the end of the phase i (with a1t). We prepare our proof in five lemmas.

Lemma 9.

For all i0 s.t. Algorithm 2 does not terminate at the end of phase i, we have (i) ri=(1+ε′′)ri1, (ii) ci+1(e)ci(e)c(e) for each eEi, (iii) Ui is spanned by FiAi, (iv) UiUi+1, (v) FiAiFi+1, (vi) FiFi+1, (vii) FiFi=FiAi and (viii) (V,Ei) is connected and the weighted diameter w.r.t. reduced costs c is decreasing.

Proof.

The first and the second statement hold by construction due to Lines 21 and 12, respectively. For the third statement, observe that Fi connects each node in Ui to some node in 𝒯i1, and the set Mi includes all edges that are both contained in Ui (i.e., whose reduced cost has become 0) and connect different connectivity components with respect to Fi of Ui. Thus, the choice of Ai ensures that Ui is spanned by FiAi. This readily implies the forth statement, since FiAiEi, as well as the fifth, since all other edges e of reduced cost c(e)=0 are removed to obtain Ei, forcing any approximate SSSP solution to reselect the edges from FiAi into Fi+1. The sixth statement holds because the algorithm only adds edges to F, and the seventh statement holds by induction, using that Fi=Fi1AiX for some set XFi. Finally, for the eighth statement, observe that since Ui is spanned by FiAi, whose reduced cost is 0, and any edges not fully contained in Ui remain in Ei, the shortest path between any pair of nodes w.r.t. c only becomes shorter: For any edge contained in Ui, there is now a path of reduced weight 0 between its endpoints, while any edge e with reduced cost ci(e)>0 is still present and retains at most its original cost c(e).

Lemma 10.

For ε,ε′′n𝒪(1), Algorithm 2 terminates in 𝒪(lognε′′) iterations of its loop.

Proof.

Since each of the operations within the loop terminate (assuming correct subroutines), it suffices to show the claimed bound on the number of loop iterations. To exit the while-loop, we must have f(C)=0 for each C𝒞, where 𝒞 is the set of connected components induced by our current edge set F. Recall that edge weights and hence the weighted diameter of the graph are polynomially bounded in n. As ε′′,εn𝒪(1), there is an j𝒪(lognε′′) for which rj=(1+ε′′)jε′′4 exceeds the weighted diameter of G times 1+ε. By Lemma 9 (viii), this entails that if we reach this phase, then Uj contains the entire connected graph (V,Ej). By Lemma 9 (iii), Uj (and thus V) is spanned by FjAj. The merge operation in Line 16 hence ensures that all terminals in 𝒯j11 are connected by Fj. Denote by C the connectivity component of (V,Fj) containing the nodes in 𝒯j11. For each terminal τ𝒯j11C, τ lies in a connected component C of (V,Fj1) satisfying that f(C)=0. This entails that we can partition VC into two types of sets: (i) components C of (V,Fj1) satisfying that f(C)=0, and (ii) non-terminals vV𝒯, which satisfy f({v})=0 by definition. By disjointness of the proper forest function f, it follows that f(V𝒞)=0, and by symmetry it follows that f(C)=0. Finally, by Lemma 9 (vi), we have Fj1Fj. Therefore, each connectivity component C of (V,Fj) other than C decomposes into connectivity components of (V,Fj1), satisfying that f(C)=0, and non-terminals, which again by disjointness implies that f(C)=0. Thus, Algorithm 2 terminates by the end of phase j𝒪(lognε′′).

Lemma 11.

The edge set F output by Algorithm 2 is primal feasible.

Proof.

When Algorithm 2 terminates, we must have exited its while-loop, implying that we have f(C)=0 for each C𝒞, where 𝒞 is the set of connected components induced by our output set F. Consider any set SV. If S non-trivially intersects a component C𝒞, i.e., SCC, then there is an edge of F in the cut defined by S, i.e., |δ(S)F|1f(S). Otherwise, S=C𝒞SC for some 𝒞S𝒞. As f satisfies disjointness and f(C)=0 for each C𝒞S, it follows that |δ(S)F|0=f(S) in this case, too. Thus, F is primal feasible.

Lemma 12.

The value LB output by Algorithm 2 is the cost of a feasible dual solution.

Proof.

Recall that r=ri=(1+ε′′)iε′′4 in phase i, and denote by j the index of the final phase Abbreviating Δi(1+ε)1ri, we can write the value LB as

LB=(1+ε)1i=0j(1+ε′′)iε′′4ai=(1+ε)1i=0jriai=i=0jΔiai.

Denote the cost reduction of an edge achieved in phase i as c¯i(e)cUi(e)=ci1(e)ci(e), with c1(e)c(e), and the amount that y is increased in phase i for some set SV as yi(S). Now observe that to construct a feasible dual solution of value LB, it suffices to, in each phase i[j]0, for each component C surviving phase i, increase the dual variables associated with the subsets SC in sum by Δi, while ensuring that for each eE, we maintain S:δ(S)eyi(S)c¯i(e).

Since f is proper, we know that for each component C surviving phase i, there exists at least one component C active in phase i such that CC. Therefore, to allocate Δi to dual variables associated with C, we can trace the construction of C from the set of components 𝒞{CCf(C)=1, C is a connected component of (V,Fi1)} (where F1) as follows. Starting with 𝒞 as it resulted from phase i1, we increase the radii of all components C𝒞 at rate ρ and the y-variables of C𝒞 at rate ρ1+ε. Thus, we gradually construct Ui and reduce the costs of all edges in the affected cuts (i.e., increase cUi=c¯i(e)) until the first edge achieves ci(e)=0 (or the phase ends). This happens at the latest when the first dual constraint in phase i becomes tight – it can happen earlier as the edge cost reductions associated with our radius increases are based on (1+ε)-approximate distances. When an edge achieves ci(e)=0, we update F and 𝒞 to ensure that both endpoints of e lie in the same component, and we iterate the process described above.

Since C survived phase i, this process does not add an edge to F that lies in the cut (C,VC) until we have increased the radius by ri. We claim that at no point in this process, 𝒞 becomes empty. Assuming otherwise, we would have that C=C𝒞C with f(C)=0 at the respective point of the process, implying the contradiction that f(C)=0 by disjointness of f. Accordingly, the total increase of y-variables that we attribute to C during phase i is at least Δi, as desired. Moreover, our direct coupling of the y-variable increases with edge-cost reductions further ensures that the y-variables relevant for any individual edge e increase by at most c¯i(e), i.e., its cost reduction in phase i, guaranteeing feasibility.

Summing over the ai components surviving phase i, we increase the dual variables by at least Δiai per merge phase, concluding the proof.

Lemma 13.

Denoting by j the phase after which Algorithm 2 terminates, it holds that c(F)(22t)(1+ε′′)2i=0j1riai.

Proof.

Recall that merge phase i0 is the phase in which r=ri=(1+ε′′)iε′′4, and ai is the number of active components surviving phase i. Moreover, note that all edges e added to F satisfy that c(e)=0 on termination, and that the cost reduction for each edge in phase i is at most 2ri, as each endpoint of the edge is contained in at most one tree of F. Hence, the cost of each edge in F can be amortized over the phases i in which its cost is reduced by the algorithm, i.e., which it starts with ci1(e)>0, and in which it intersects Ui. Accordingly, F decomposes into a set of nested shells UiUi1 (with U1:=), which the algorithm iteratively and implicitly constructs around active components. This shell-decomposition argument is illustrated in Figure 2.

Since in phase i, the remaining edges FFi1 (where F1) to be added to F form a forest with ai active components as nodes, and the average degree of such a forest is at most 22/ai, we can bound the cost of the forest F output by Algorithm 2 from above as

c(F)i=0jriai1(22ai1)(22t)(1+ε′′)i=0jri1ai1=(22t)(1+ε′′)i=1j1riai,

where a1t. Since F is a feasible primal solution by Lemma 11, each terminal must be incident with at least one edge from F, and the minimum edge weight is at least 1, we have that c(F)t/2. On the other hand, a1r1=ε′′t4(1+ε′′), yielding that
c(F)(1+ε′′)c(F)ε′′t2=(1+ε′′)(c(F)2a1r1)(22t)(1+ε′′)2i=0j1riai.

Using the above lemmas, we can now prove Theorem 8.

Proof of Theorem 8.

Assume that Algorithm 2 terminates at the end of phase j, i.e., aj=0; by Lemma 10, such a phase exists. Observe that the value LB output by Algorithm 2 then satisfies LB=(1+ε)1i=0jriai=(1+ε)1i=0j1riai. By Lemma 11, F is a feasible primal solution. As LB is the cost of a feasible dual solution by Lemma 12, using Lemma 13, we obtain the desired approximation guarantee as

c(F)LB(22/t)(1+ε′′)2i=0j1riai(1+ε)1i=0j1riai=(22t)(1+ε)(1+ε′′)2.

A.2 Specification of our Model-Agnostic Meta-Algorithm

The task of (2+ε)-approximating CFPs in any specific model reduces to simulating Algorithm 2. By Lemma 10, we can divide the computation into 𝒪(lognε′′) phases, where, starting at phase 0, phase i grows components by radius ri=(1+ε′′)iε′′4. Each phase tackles six abstract problems, corresponding to the six building blocks of our algorithm (see Figure 1).

A.2.1 Problems Used as Building Blocks

Problem 3 (α-approximate Set-Source Shortest-Path Forest).

Given a connected graph G=(V,E) with edge costs c:E0 and a set of sources SV, compute a forest F spanning G such that for all nodes vV, dF(S,v)αd(S,v), where d(S,v)=minuS{d(u,v)}, and dF(u,v) is the weighted distance between u and v in F.

Note that in phase i, we can confine ourselves to computing an α-approximate set-source shortest-path forest up to distance ri (see Algorithm 2, Lines 910).

Problem 4 (Candidate-Merge Identification).

Given a graph G=(V,E), edge costs c:E0, and a rooted forest F with a subset of its trees marked such that each node v in a marked tree knows that its tree is marked as well as the identity of its root τv, identify all edges e={u,v} that are in distinct marked trees and satisfy c(e)=0.

Problem 5 (Minimum Spanning Tree).

Given a graph G=(V,E) with edge costs c:E0, compute the Minimum Spanning Tree of G.

Problem 6 (Root-Path Selection).

Given a graph G=(V,E) with edge costs c:E0, a rooted forest, and a set of marked nodes SV, select the forest edges connecting each marked node to its root.121212Root-Path Selection can be reduced to approximate transshipment as follows: (i) count the number of marked nodes in each tree; (ii) set the demand of each marked node to 1 and the demand of the root of each tree to the number of marked nodes in the tree; (iii) set edge costs to 0 for tree edges and to + for all other edges; (iv) solve the approximate transshipment problem for these demands and edge weights; and (v) select all edges with non-zero flow in the output. Note that the only non-trivial step of the reduction is the computation of the demand, which boils down to a single Partwise Aggregation.

Problem 7 (Edge-Cost Reduction).

Given a graph G=(V,E) with edge costs c:E0, a radius r, and an output of Problem 3, compute, for each edge eE, c(e)=max{0,c(e)vemax{rdF(S,v),0}}.

Problem 8 (Forest-Function Evaluation).

Given a graph G=(V,E), a partition 𝒞 of the node set such that each C𝒞 is connected, and a proper forest function f:2V{0,1}, evaluate f(C) for each component C𝒞.

A.2.2 Meta-Algorithm Using the Building Blocks

Initialize F, 𝒞, 𝒯1, c, and r as in Algorithm 2. Throughout our algorithm, we maintain a set of connected components 𝒞 with activity statuses f(C) for each C𝒞. At the beginning of phase 0, 𝒞 contains exactly the singleton sets corresponding to all nodes, i.e., 𝒞={{v}vV}, and the active components are the terminals. Each phase i of our algorithm (i.e., one loop iteration in Figure 1, simulating one while-loop iteration of Algorithm 2) then consists of the following steps, executed for r=(1+ε′′)iε′′4.

  1. (1)

    Approximate Set-Source Shortest-Path Forest (aSSSP). Assign as temporary edge weight to eE the reduced cost c(e) if c(e)>0 or eFF (where F in phase 0). Compute a (1+ε)-approximate (r-restricted) SSSP forest F, using the active terminals 𝒯1={min{vv𝒯C}C𝒞,f(C)=1} as sources, i.e., for each active component, 𝒯1 contains the terminal with the minimum identifier. After Step 1, for each node vV𝒯1, we know its parent in the truncated SSSP forest, its closest source u𝒯1 in the respective shortest-path tree (if any), and its distance dF(u,v) to that source.

  2. (2)

    Edge-Cost Reduction (ECR). Using the approximate r-restricted SSSP forest and the distances computed in Step 1, update the edge costs in accordance with Problem 7.131313We can keep the edge costs in 0 by making sure that phases end with integral values of r. Note that εn𝒪(1), or distance computations must be exact. Scale all weights by 1/ε. Now rounding r up to the next integer has marginal impact on the approximation guarantee, as overgrowing by factor (1+ε) plus an additive 1 is not worse than overgrowing by factor (1+2ε).

  3. (3)

    Candidate-Merge Identification (CMI). Using that nodes’ parents and reduced edge costs are known, identify candidate merges M for adjacent trees of the aSSSP forest (Step 1).

  4. (4)

    Minimum Spanning Tree (MST). Compute a Minimum Spanning Tree T of G with the following edge weights: (i) 0 for edges in F, i.e., the tree edges in the output of Step 1, (ii) 1 for edges in M, i.e., those determined in Step 3, and (iii) + (or a large value) for all other edges. Mark all selected edges of T that are also in the set M known from Step 3, i.e., the edges constituting A, and add them to F (thus excluding all edges with temporary weight greater than 1). For each connected component C of the forest constituted by the selected edges of temporary weight 0 or 1 that contains a terminal τC, set min{vCf({v})=1} as the new identifier of the component C to be created from C, making it known to all vC.

  5. (5)

    Root-Path Selection (RPS). Connect the marked edges identified in Step 4 to the roots (i.e., the node with the same identifier as the component) of the components they connect by adding the necessary edges to F.

  6. (6)

    Forest-Function Evaluation (FFE). Using the new component memberships known from Step 4, update the set 𝒞 and evaluate f(C) for each updated C𝒞. If f(C)=0 for all such components, terminate and output F– else, continue with the next loop iteration.

A.3 Correctness and Complexity of our Model-Agnostic Meta-Algorithm

Theorem 14 (Model-Agnostic Shell-Decomposition Algorithm).

For ε,ε,ε′′ as in Theorem 8, a graph G=(V,E), edge costs c:E, and proper function f, the modular shell-decomposition algorithm (A.2.2) yields a feasible (2+ε)-approximation to the optimal solution.

Proof.

We prove the claim by induction on the phase i, going step by step through the algorithm given in Section A.2.2 and arguing why the computed objects, in particular F, match those of Algorithm 2. We augment the induction hypothesis by the claim that at the beginning of a phase, in Algorithm 2, E contains exactly the edges of non-zero reduced cost and Fi1Fi1. The induction anchor (phase i=1) is given by the identical initialization of objects. For the step to phase i0, observe first that by the induction hypothesis (in particular the additional claim), Step 1 computes the same Fi and the same distances as Algorithm 2, and hence Step 2 yields the same ci. It follows that Step 3 computes the same set M of candidate merges, implying that Step 4 correctly determines Ai and adds it to F. Note that the latter step also updates component memberships and component identifiers, but does not yet evaluate whether f(C)=1 for the new components. This is finally done in Step 6, such that in Step 1 of the next phase, the correct set 𝒯i1 will be used. It remains to prove the additional claim that Ei contains exactly the edges of non-zero reduced cost and Fi1Fi1, which now is immediate from Line 17 and Lemma 9 (vii).

We conclude that both algorithms terminate at the end of the same phase j, returning the same forest F=Fj, which by Theorem 8 is a (2+ε)-approximation.

The proof of our main theorem now follows immediately.

Proof of Theorem 1.

By Theorem 14, our modular shell-decomposition algorithm (Algorithm 2) delivers the desired approximation guarantee. Without loss of generality, we may assume that εn𝒪(1), as this is enough to enforce that ε times the cost of an optimal solution is smaller than 1, i.e., a 2-approximation is guaranteed. Thus, it is sufficient to instantiate the algorithm with ε,ε′′n𝒪(1), such that by Lemma 10, the algorithm will terminate after 𝒪(lognε′′)=𝒪(lognε)=𝒪~(ε1) while-loop iterations. In each of these iterations (up to bookkeeping operations), aSSSP, MST, RPS, and FFE computations are performed exactly once, yielding the desired model-agnostic complexity.