Demand-Aware Network Designs of Bounded Degree

Traditionally, networks such as datacenter interconnects are designed to optimize worst-case performance under arbitrary traffic patterns. Such network designs can, however, be far from optimal when considering the actual workloads and traffic patterns they serve. This insight led to the development of demand-aware datacenter interconnects, which can be reconfigured depending on the workload. Motivated by these trends, this paper initiates the algorithmic study of demand-aware network (DAN) designs, and in particular the design of bounded-degree networks. The inputs to the network design problem are a discrete communication request distribution, D, defined over communicating pairs from the node set V, and a bound, d, on the maximum degree. Our objective is to design an (undirected) demand-aware network N = (V,E) of bounded degree d which provides short routing paths between frequently communicating nodes distributed across N. In particular, the designed network should minimize the expected path length on N (with respect to D), which is a basic measure of the efficiency of the network. We show that this fundamental network design problem exhibits interesting connections to several classic combinatorial problems and to information theory. We derive a general lower bound based on the entropy of the communication pattern D, and present asymptotically optimal network design algorithms for important distribution families, such as sparse distributions and distributions of locally bounded doubling dimension.


Introduction
The problem studied in this paper is motivated by the advent of more flexible datacenter interconnects, such as ProjectToR [15,16]. These interconnects aim to overcome a fundamental drawback of traditional datacenter network designs: network designers must decide in advance how much capacity to provision between electrical packet switches, e.g., between top-of-rack (ToR) switches in datacenters. This leads to an undesirable tradeoff [22]: either capacity is over-provisioned and the interconnect is therefore expensive (e.g., a fat-tree provides full-bisection bandwidth), or one risks congestion, resulting in poor cloud application performance. Accordingly, systems such as ProjectToR provide a reconfigurable interconnect that allows links to be established flexibly and in a demand-aware manner. For example, direct links, or at least short communication paths, can be established between frequently communicating ToR switches. Such links can be implemented using a bounded number of lasers, mirrors, and photodetectors per node [16]. First experiments with this technology demonstrated promising results: although the interconnecting network is of bounded degree, short routing paths can be provided between communicating nodes.
The problem of designing demand-aware networks is a fundamental one, and finds interesting applications in many distributed and networked systems. For example, while many peer-to-peer overlay networks today are designed towards optimizing the worst-case performance (e.g., minimal diameter and/or degree), it is an intriguing question whether the "hard instances" actually show up in real life, and whether better topologies can be designed if we are given more information about the actual communication patterns these networks serve in practice.
While the problem is natural, surprisingly little is known today about the design of demand-aware networks. At the same time, as we will show in this paper, the design of demand-aware networks is related to several classic combinatorial problems.
Our vision is similar in spirit to the question posed by Sleator and Tarjan over 30 years ago in the context of binary search trees [9,23]: while there is an inherent lower bound of Ω(log n) for accessing an arbitrary element in a binary search tree, can we do better on some "easier" instances? The authors identified the entropy as a natural metric to measure performance under actual demand patterns. We will provide evidence in this paper that the entropy, in a slightly different flavor, also plays a crucial role in the context of network designs, establishing an interesting connection.

The Problem: Bounded Network Design. We consider the following network design problem, henceforth referred to as the Bounded Network Design problem, BND for short. We consider a set of n nodes (e.g., top-of-rack switches, servers, peers) V = {1, . . . , n} interacting according to a certain communication pattern. The pattern is modelled by D, a discrete distribution over communication requests defined over V × V. We represent this distribution using a communication matrix $M_D = [p(i,j)]_{n \times n}$, where the (i, j) entry indicates the communication frequency, p(i, j), from the (communication) source i to the (communication) destination j. The matrix is normalized, i.e., $\sum_{i,j} p(i,j) = 1$. Moreover, we can interpret the distribution D as a weighted directed demand graph $G_D$, defined over the same set of nodes V: a directed edge $(u,v) \in E(G_D)$ exists iff p(u, v) > 0. We set the edge weight to the communication probability: w(i, j) = p(i, j).
Our objective is to design an unweighted, undirected Demand-Aware Network (DAN) defined over the set of nodes V and the distribution D, henceforth denoted as N(D), or just N when D is clear from the context. The goal is that N(D) optimally serves the communication requests from D under the constraint that N must be chosen from a certain family $\mathcal{N}$ of desired topologies. In particular, we are interested in sparse networks (i.e., having a linear number of edges) with bounded degree ∆ (i.e., nodes have a small number of lasers [16]), and we denote the family of ∆-bounded-degree graphs by $\mathcal{N}_\Delta$.
Note that the designed network can be seen as "hosting" the served communication pattern, i.e., the demand graph is embedded on the designed network. Accordingly, we will sometimes refer to the demand graph as the guest network and to the designed network as the host network.
Our objective is to minimize the expected path length [1,2,21] of the designed host network $N \in \mathcal{N}$: for u, v ∈ V(N), let $d_N(u,v)$ denote the length of a shortest path between u and v in N. Given a distribution D over V × V and a graph N over V, the Expected Path Length (EPL) of route requests is defined as:

$$\mathrm{EPL}(D, N) = \mathbb{E}_D[d_N(\cdot,\cdot)] = \sum_{(u,v) \in D} p(u,v)\, d_N(u,v)$$

Since routing across the host network usually occurs along shortest paths, the expected path length captures the average hop count of a route (e.g., latency incurred or energy consumed along the way).
Succinctly, the Bounded Network Design (BND) problem is to minimize the expected path length and is defined as follows: Definition 1 (Bounded Network Design) Given a communication distribution D and a maximum degree ∆, find a host graph $N \in \mathcal{N}_\Delta$ that minimizes the expected path length:

$$\mathrm{BND}(D, \Delta) = \min_{N \in \mathcal{N}_\Delta} \mathrm{EPL}(D, N)$$
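To make the objective concrete, the following Python sketch evaluates EPL(D, N) for a small host network by running a breadth-first search (BFS) from each source. The dictionary-based representation and the function names (`bfs_distances`, `expected_path_length`, `max_degree`) are illustrative assumptions, not part of the paper; computing BND(D, ∆) itself would additionally require a search over all of $\mathcal{N}_\Delta$, which this sketch does not attempt.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src in an undirected graph given as {node: set(neighbors)}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def expected_path_length(demand, adj):
    """EPL(D, N) = sum over requests (u, v) of p(u, v) * d_N(u, v).

    demand: {(u, v): probability}, assumed to sum to 1.
    adj: host network N as an adjacency dict; assumed to connect every demand pair.
    """
    cache = {}  # one BFS per distinct source
    total = 0.0
    for (u, v), p in demand.items():
        if u not in cache:
            cache[u] = bfs_distances(adj, u)
        total += p * cache[u][v]
    return total


def max_degree(adj):
    """Maximum degree of the host network, to check membership in the family N_Delta."""
    return max(len(nbrs) for nbrs in adj.values())
```

For uniform demand over all 20 ordered pairs of five nodes, a star centred at node 0 (degree 4) gives an expected path length of about 1.6, below the star bound of 2, while a 5-node line (degree 2) gives about 2.0.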

Our Contributions
This paper initiates the study of a fundamental problem: the design of demand-aware communication networks. While our work is motivated by recent trends in datacenter network designs, our model is natural and finds applications in many distributed and networked systems (e.g., peer-to-peer overlays). The main contribution of this paper is to establish an interesting connection between the network design problem and the conditional entropy of the communication matrix. In particular, we present a lower bound on the expected path length of a network with maximum degree ∆ which is proportional to the conditional entropy of D, $H_\Delta(X \mid Y) + H_\Delta(Y \mid X)$, where ∆ is the base of the logarithm used for calculating the entropy. While this lower bound can be as high as log n, for many distributions it can be much lower (even constant). Our main results are Theorem 4, which proves a matching upper bound when D is a sparse distribution, and Theorem 6, which proves a matching upper bound when D is a regular and uniform (but possibly dense) distribution of locally bounded doubling dimension. In both cases, the conditional entropy can range from a constant up to log n. At the heart of our technical contribution is a novel technique to transform a low-distortion network of high maximum degree into a low-degree network whose maximum degree is proportional to the average degree of the original network, while maintaining an expected path length on the order of the conditional entropy. Moreover, in Theorem 5 we show an interesting reduction from uniform and regular distributions to graph spanners.

Paper Organization
The remainder of this paper is organized as follows. We put our work into perspective with respect to related work in Section 2 and introduce some preliminaries in Section 3. We derive lower bounds in Section 4 and present algorithms to design networks for sparse distributions resp. regular and uniform distributions in Section 5 resp. Section 6. We conclude our work and outline directions for future research in Section 7.

Putting Things Into Perspective and Related Work
There are at least three interesting perspectives on our problem. The first arises when trying to gain some intuition about the problem complexity. If ∆ = n, the problem is simple: the demand (or guest) graph $G_D$ itself can be used as the host graph or DAN $N \in \mathcal{N}_\Delta$, providing an ideal expected path length of 1. If a sparse host graph is desired, a star topology could be used as a DAN to provide an expected path length of at most 2. At the other end of the spectrum, if ∆ = 2 (and the host network is required to be connected), the DAN N must be a line or a ring graph. However, the problem of arranging nodes on a line or a ring such that the expected path length is minimized is already NP-hard: it is essentially a Minimum Linear Arrangement (MinLA) problem [7,10,14]. One way to view our contribution is that we are interested in what happens between these extremes, for other values of ∆, and in particular for a constant ∆, which guarantees that our host network is sparse, i.e., has a linear number of edges. In contrast to the general arrangement problem, which asks for an embedding of the guest graph on a specific, given host graph, in our network design problem we are free to choose the best host graph from a given family of graphs (i.e., bounded-degree graphs). One might wonder: does this flexibility make the problem easier?

Sparse and distance-preserving spanners open a second perspective on our work: intuitively, a good host graph N for $G_D$ "looks similar" to $G_D$. But in contrast to classic spanner problems in the literature, which are primarily concerned with minimizing the worst-case distortion (resp. the average distortion) among all node pairs [4,19], we are only interested in the local distortion. Namely, we aim to find a good "spanner" which preserves the locality of neighborhoods, i.e., of 1-hop neighborhoods in the demand graph.
Second, unlike classic spanner problems but similar to geometric (metric) spanners, the designed network N does not have to be a subgraph and may include edges which do not exist in the demand network $G_D$, i.e., 0-entries in the corresponding communication matrix $M_D$. We refer to the corresponding edges as auxiliary edges (a.k.a. shortcut edges [18]). It is easy to see that auxiliary edges can indeed be required to compute optimal network designs, and can yield strictly lower communication costs than subgraph spanners. Third, in contrast to the frequently studied sparse graph spanner problem variants, we require that nodes in the designed network have degree at most ∆. Finally, we are not aware of any work studying the relationship between spanners and entropy. This makes our model fundamentally different from existing models studied in the literature.
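The ∆ = 2 extreme from the first perspective can be explored by brute force on toy inputs. The hypothetical helper below enumerates all orderings of the nodes on a line and keeps the one with the smallest expected path length; its exponential running time is in line with the NP-hardness of the MinLA-type problem, so it is meant only as an illustration and not as a practical algorithm.

```python
from itertools import permutations


def best_line_arrangement(nodes, demand):
    """Exhaustive search over line (path) host networks, i.e. the Delta = 2 case.

    demand: {(u, v): p(u, v)}. Returns (best_epl, order). Runs in O(n! * |D|) time,
    so it is only usable on tiny toy instances.
    """
    best_epl, best_order = float("inf"), None
    for order in permutations(nodes):
        pos = {v: k for k, v in enumerate(order)}
        # On a line, the hop distance between u and v is |pos[u] - pos[v]|.
        epl = sum(p * abs(pos[u] - pos[v]) for (u, v), p in demand.items())
        if epl < best_epl:
            best_epl, best_order = epl, order
    return best_epl, best_order
```

With a star-shaped demand in which node 0 sends to nodes 1, 2 and 3 with probability 1/3 each, the best line achieves 4/3, whereas a star host would achieve 1: the degree bound forces one of the three destinations to distance 2.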
The fact that our matrix represents a distribution provides some interesting structure. In particular, it leads us to a third connection, namely to information and coding theory; see also [3] for a code-based network design of arbitrary degree. It is known that the expected path length in binary search trees [23] as well as in network designs providing local routing [2,3,21] is related to the entropy H(X) (over the elements X in the data structure) resp. the conditional entropy of the distribution: H(X|Y) + H(Y|X) is a lower bound on the expected path length of local routing tree designs [21], where X, Y are the random variables distributed according to the marginal distributions of the sources and destinations in D. This bound is tight for the limited case where D is a product distribution (i.e., p(i, j) = p(i)p(j)). Additionally, the optimal binary search tree can be computed efficiently for every D using dynamic programming [21]. In the current work, we extend this line of research by studying more general distributions and a larger family of host networks.

Preliminaries
We start with some notation about D. Let D[i, j] or p(i, j) denote the probability that source i routes to destination j. Let p(i) denote the probability that i is a source, i.e., $p(i) = \sum_j p(i,j)$. Similarly, let q(j) denote the probability that j is a destination, i.e., $q(j) = \sum_i p(i,j)$. Let X, Y be random variables describing the marginal distributions of the sources and destinations in D, respectively. Let $\overrightarrow{D}[i]$ denote the normalized i'th row of D, that is, the probability distribution of destinations given that the source is i. Similarly, let $\overleftarrow{D}[j]$ denote the normalized j'th column of D, that is, the probability distribution of sources given that the destination is j. Let $Y_i$ and $X_j$ be random variables that are distributed according to $\overrightarrow{D}[i]$ and $\overleftarrow{D}[j]$, respectively. We say that D is regular if $G_D$ is a regular graph (both in terms of in- and out-degrees). We say that D is uniform if all non-zero entries of D are equal, i.e., every request in the support of D has the same probability.

We will show that a natural measure to assess the quality of a designed network relates to the entropy of the communication pattern. For a discrete random variable X with possible values $\{x_1, \dots, x_n\}$, the entropy H(X) of X is defined as

$$H(X) = \sum_{i=1}^{n} p(x_i) \log_2 \frac{1}{p(x_i)}$$

where $p(x_i)$ is the probability that X takes the value $x_i$. Note that $0 \cdot \log_2 \frac{1}{0}$ is considered to be 0. If $\bar{p}$ is a discrete distribution vector (i.e., $p_i \ge 0$ and $\sum_i p_i = 1$), then we may write $H(\bar{p})$ or $H(p_1, p_2, \dots, p_n)$ to denote the entropy of a random variable that is distributed according to $\bar{p}$. If $\bar{p}$ is the uniform distribution with support s (s being the number of entries with $p_i > 0$, i.e., $p_i = 1/s$), then $H(\bar{p}) = \log s$.
Using the decomposition (a.k.a. grouping) properties of entropy the following is well-known [8]: and For a joint distribution over X, Y, the joint entropy is defined as

$$H(X, Y) = \sum_{x, y} p(x, y) \log_2 \frac{1}{p(x, y)}$$

Also recall the definition of the conditional entropy H(X|Y):

$$H(X \mid Y) = \sum_{y} p(y)\, H(X \mid Y = y) = H(X, Y) - H(Y)$$

For X and Y defined as above from D, we also have

$$H(X \mid Y) = \sum_{j} q(j)\, H(\overleftarrow{D}[j])$$

H(Y|X) is defined similarly, and we note that it may be the case that H(X|Y) ≠ H(Y|X). We may simply write H for the entropy if the purpose is clear from the context. By default, we denote by H the entropy computed using the binary logarithm; if a different logarithmic base ∆ is used to compute the entropy, we explicitly write $H_\Delta$. We recall the definition of a graph spanner. Given a graph G = (V, E), a subgraph G' = (V, E') with E' ⊆ E is a t-spanner of G if $d_{G'}(u,v) \le t \cdot d_G(u,v)$ for all u, v ∈ V, and t is the distortion of the spanner. We say that G' = (V, E') is a graph metric t-spanner if it satisfies the same distance condition but is not necessarily a subgraph of G, i.e., it may have additional edges that are not in G.
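The conditional entropies can be computed directly from the communication matrix. The following is a minimal Python sketch, assuming D is given as a dense list of rows with P[i][j] = p(i, j); the helper names are hypothetical. It weights the entropy of each normalized row by p(i) and of each normalized column by q(j), exactly as in the definitions above.

```python
import math


def entropy(weights, base=2.0):
    """Entropy of the normalized non-negative vector `weights` (0 log 0 := 0)."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total, base) for w in weights if w > 0)


def conditional_entropies(P, base=2.0):
    """Return (H(Y|X), H(X|Y)) for a joint request matrix P given as a list of rows.

    H(Y|X) = sum_i p(i) * H(row i normalized), with p(i) the row sum;
    H(X|Y) = sum_j q(j) * H(column j normalized), with q(j) the column sum.
    """
    n = len(P)
    cols = [[P[i][j] for i in range(n)] for j in range(n)]
    h_y_given_x = sum(sum(row) * entropy(row, base) for row in P if sum(row) > 0)
    h_x_given_y = sum(sum(col) * entropy(col, base) for col in cols if sum(col) > 0)
    return h_y_given_x, h_x_given_y
```

For an r-regular and uniform distribution (for example, each of n = 8 nodes sending to its next r = 4 nodes on a cycle with probability 1/32 each), both values equal log r = 2 bits. A single source sending to two destinations with probability 1/2 each gives H(Y|X) = 1 but H(X|Y) = 0, showing that the two quantities can differ.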

A Lower Bound
We now establish an interesting connection to information theory and show that the conditional entropy serves as a natural metric for bounded network designs. In particular, we prove that the expected path length BND(D, ∆) of any demand-aware bounded network design is at least on the order of the conditional entropy. Formally: Theorem 1 Consider the joint frequency distribution D. Let X, Y be the random variables distributed according to the marginal distributions of the sources and destinations in D, respectively. Then

$$\mathrm{BND}(D, \Delta) \ge \Omega\big(H_\Delta(Y \mid X) + H_\Delta(X \mid Y)\big)$$

Before delving into the proof, let $\mathrm{EPL}(\bar{p}, T)$ denote the expected path length in a tree T from the root to its nodes, where the expectation is taken over a distribution $\bar{p}$. That is, $\mathrm{EPL}(\bar{p}, T) = \sum_i p_i\, d_T(\mathrm{root}, i)$. We recall the following well-known theorem: Theorem 2 Let $H(\bar{p}) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$ be the entropy of the frequency distribution $\bar{p} = (p_1, p_2, \dots, p_n)$. Let T be an optimal binary search tree built over the above frequency distribution. Then $\mathrm{EPL}(\bar{p}, T) \ge \frac{1}{\log 3} H(\bar{p})$.
Namely, the entropy $H(\bar{p})$, scaled by $1/\log 3$, is a lower bound on the expected path length from the root to the nodes in the tree. For higher-degree graphs, we can extend the result: Lemma 1 Let $H_\Delta(\bar{p})$ be the entropy (calculated using the logarithm of base ∆) of the frequency distribution $\bar{p} = (p_1, p_2, \dots, p_n)$. Let T be an optimal ∆-ary search tree built over the above frequency distribution. Then $\mathrm{EPL}(\bar{p}, T) \ge \Omega(H_\Delta(\bar{p}))$. The proof almost directly follows from the proof of Theorem 2 in [17], by extending properties of binary trees to ∆-ary trees; see the appendix for details. We now prove the lower bound.
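Proof: [Proof of Theorem 1] The proof idea is to consider a network which is the union of the optimal trees, one for each individual node. While the resulting network may violate the degree requirement, it constitutes a valid lower bound. So let us start by finding an optimal structure for each source node i, according to its communication to all its destination nodes. Towards this end, we construct n ∆-ary trees, and let $T^i_\Delta$ be the optimal tree for source node i, built using $\overrightarrow{D}[i]$. Now consider any bounded-degree network $N \in \mathcal{N}_\Delta$ and compare it to the forest T made up of the n trees $T^1_\Delta, T^2_\Delta, \dots, T^n_\Delta$. Then,

$$\mathrm{EPL}(D, N) = \sum_i p(i) \sum_j \overrightarrow{D}[i]_j \, d_N(i, j) \ge \sum_i p(i)\, \mathrm{EPL}(\overrightarrow{D}[i], T^i_\Delta) \ge \Omega\Big(\sum_i p(i)\, H_\Delta(\overrightarrow{D}[i])\Big) = \Omega\big(H_\Delta(Y \mid X)\big)$$

Similarly, we can consider a set of trees optimized toward the incoming communication of each node j, $\overleftarrow{D}[j]$, and the marginal destination probability q(j). We obtain:

$$\mathrm{EPL}(D, N) \ge \Omega\Big(\sum_j q(j)\, H_\Delta(\overleftarrow{D}[j])\Big) = \Omega\big(H_\Delta(X \mid Y)\big)$$

Since EPL(D, N) is at least the maximum, and hence at least half the sum, of these two bounds, and N was arbitrary, the theorem follows.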
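The intuition behind this proof, that a node in a ∆-bounded network cannot have too many nodes close to it, can also be illustrated with a simple counting bound. The Python sketch below uses a Moore-type argument (at most ∆(∆−1)^(k−1) nodes lie exactly k hops away) instead of the optimal ∆-ary trees used in the proof, so it is a swapped-in illustration rather than the paper's construction. The function names are hypothetical, and D is assumed to contain no self-requests.

```python
def moore_lower_bound(row_probs, delta):
    """Lower bound on sum_j p_j * d_N(i, j) for any host N of max degree delta.

    At most delta * (delta - 1) ** (k - 1) nodes lie exactly k hops from i, so
    the cheapest conceivable placement fills the closest layers with the
    largest probabilities first. Assumes no self-requests (p(i, i) = 0).
    """
    if delta < 2:
        raise ValueError("delta must be at least 2")
    probs = sorted((p for p in row_probs if p > 0), reverse=True)
    total, k, start = 0.0, 1, 0
    while start < len(probs):
        slots = delta * (delta - 1) ** (k - 1)  # capacity of layer k
        total += k * sum(probs[start:start + slots])
        start += slots
        k += 1
    return total


def epl_lower_bound(P, delta):
    """Weight the per-source bounds by the source marginal p(i) (the row sum)."""
    bound = 0.0
    for row in P:
        p_i = sum(row)
        if p_i > 0:
            bound += p_i * moore_lower_bound([w / p_i for w in row], delta)
    return bound
```

For uniform demand among five nodes, the bound evaluates to 1.5 for ∆ = 2, which a 5-node ring attains, and drops to 1 for ∆ = 4.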

Network Design for Sparse Distributions
We now present families of distributions which enable DANs that match the lower bound. Our approach will be based on replacing neighborhoods with near optimal binary (or ∆-ary) trees. Following the lower bound of Lemma 1, it is easy to show a matching upper bound (for a constant ∆).
Lemma 2 Let $\bar{p}$ be a probability distribution on a set V of destination (source) nodes, and let u be a single source (destination) node. Then one can design a tree T with u as a root node of degree one, connected to a ∆-ary tree over V, such that the expected path length from u to all destinations (or from all sources to u) is:

$$\mathrm{EPL}(\bar{p}, T) \le H_\Delta(\bar{p}) + 2$$

Proof: The proof follows by designing a ∆-ary Huffman code over $\bar{p}$ (with expected code length less than $H_\Delta(\bar{p}) + 1$ [8]) and using it to build a rooted ∆-ary tree. While the nodes in the Huffman code are tree leaves, we can move them up to become internal nodes, which only improves the expected path length.
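The Huffman-based construction in the proof of Lemma 2 can be sketched as follows. This Python snippet, with hypothetical helper names, builds a ∆-ary Huffman tree by padding with zero-weight dummy leaves (the standard way to make every merge combine exactly ∆ subtrees), keeps the real nodes as leaves instead of moving them up, and reports the expected path length from u when u is attached above the Huffman root. Since the code length is below $H_\Delta(\bar{p}) + 1$, adding the hop from u keeps the expected path length below $H_\Delta(\bar{p}) + 2$; skipping the move-up step only makes the result looser.

```python
import heapq
import math
from itertools import count


def huffman_depths(probs, delta):
    """Leaf depths (codeword lengths) of a delta-ary Huffman tree over `probs`."""
    n = len(probs)
    if n == 1:
        return [0]
    tie = count()  # unique tiebreaker, so heapq never compares the leaf lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    # Pad with zero-weight dummy leaves until (leaves - 1) is divisible by (delta - 1),
    # so that every merge combines exactly `delta` subtrees.
    heap += [(0.0, next(tie), []) for _ in range((1 - n) % (delta - 1))]
    heapq.heapify(heap)
    depth = [0] * n
    while len(heap) > 1:
        merged_p, merged_leaves = 0.0, []
        for _ in range(delta):
            p, _tie, leaves = heapq.heappop(heap)
            merged_p += p
            merged_leaves += leaves
        for leaf in merged_leaves:
            depth[leaf] += 1  # all leaves below the new internal node move down a level
        heapq.heappush(heap, (merged_p, next(tie), merged_leaves))
    return depth


def lemma2_tree_epl(probs, delta):
    """EPL from u when u (degree one) sits above the Huffman root: depth + 1 hops."""
    return sum(p * (d + 1) for p, d in zip(probs, huffman_depths(probs, delta)))


def entropy_base(probs, base):
    """Entropy H_base of a probability vector (0 log 0 := 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)
```

For $\bar{p} = (0.4, 0.3, 0.2, 0.1)$ and ∆ = 2, the expected code length is 1.9 against an entropy of roughly 1.85, so the tree rooted at u has expected path length 2.9.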

Tree Distributions
A fundamental class of distributions for which we can construct optimal network designs is based on trees.
Theorem 3 Let D be a communication request distribution such that $G_D$ is a tree (i.e., ignoring edge directions, $G_D$ forms a tree). Let X, Y be the random variables of the sources and destinations in D, respectively. Then it is possible to generate a DAN N with maximum degree 8 such that

$$\mathrm{EPL}(D, N) \le O\big(H(Y \mid X) + H(X \mid Y)\big)$$

This is asymptotically optimal.
Proof: We generate N as follows. Consider an arbitrary node as the root of the tree $G_D$, call this tree $T_D$, and consider the parent-child relationship implied by the root. Let π(i) denote the parent of node i. The construction has two phases. In the first phase, we replace outgoing edges. For each node i, replace the edges between i and its children with a binary tree according to $\overrightarrow{c}_i$ and the method of [17] (or Lemma 2 for a general ∆) for creating a near-optimal binary tree. Let $\overrightarrow{B}_i$ denote this tree and recall that $\mathrm{EPL}(\overrightarrow{c}_i, \overrightarrow{B}_i) \le H(\overrightarrow{c}_i) + O(1)$. Note that every node i may appear in at most two trees, $\overrightarrow{B}_i$ and $\overrightarrow{B}_{\pi(i)}$: in $\overrightarrow{B}_i$ its degree is one and in $\overrightarrow{B}_{\pi(i)}$ its degree is at most 3, so the outgoing degree of each node is at most 4 after this phase.
In the second phase, we take care of the remaining incoming edges from children to parents. For each node i, replace the edges from its children to it with a binary tree according to $\overleftarrow{c}_i$ and the method of [17] for creating a near-optimal binary tree. Let $\overleftarrow{B}_i$ denote this tree and recall that $\mathrm{EPL}(\overleftarrow{c}_i, \overleftarrow{B}_i) \le H(\overleftarrow{c}_i) + O(1)$. Note that every node i may appear in at most two trees, $\overleftarrow{B}_i$ and $\overleftarrow{B}_{\pi(i)}$; in $\overleftarrow{B}_i$, i's degree is one, and in $\overleftarrow{B}_{\pi(i)}$, i's degree is at most 3. Thus, the incoming degree of each node is bounded by 4 after this phase. Now we bound EPL(D, N) by bounding the expected path lengths in the corresponding binary trees of each node, first considering all edges from parents to children and then all edges from children to parents. Let p(i) and q(i) denote the probabilities that node i will be a source and a destination of a communication request, respectively. Then: This matches our lower bound in Theorem 1.

General Sparse Distributions
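Asymptotically optimal demand-aware networks can even be designed for general sparse distributions. Theorem 4 Let D be a sparse communication request distribution, i.e., its demand graph $G_D$ has average degree $\Delta_{avg}$ (so $|E(G_D)| = O(\Delta_{avg}\, n)$). Let X, Y be the random variables of the sources and destinations in D, respectively. Then it is possible to generate a DAN N with maximum degree $12\Delta_{avg}$ such that

$$\mathrm{EPL}(D, N) \le O\big(H(Y \mid X) + H(X \mid Y)\big)$$

This is asymptotically optimal when $\Delta_{avg}$ is a constant.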
Proof: Recall that $G_D$ (for short G) is a directed graph, and define in-degree and out-degree in the canonical way. Let the (undirected) degree of a node be the sum of its in-degree and out-degree, and denote the average degree by $\Delta_{avg}$. Denote the n/2 nodes with the lowest degree in G as low degree nodes and the rest as high degree nodes. Note that each low degree node has degree at most $2\Delta_{avg}$, and hence both its in-degree and out-degree are at most $2\Delta_{avg}$. A node with out-degree (in-degree) larger than $2\Delta_{avg}$ is called a high out-degree (high in-degree) node (some nodes are neither low degree nodes nor high in- or out-degree nodes).
The construction of N is done in two phases. In the first phase, we consider only (directed) edges (u, v) between a high out-degree node u and a high in-degree node v. We subdivide each such edge with two edges that connect u to v via a helping low degree node ℓ, i.e., we remove the directed edge (u, v) and add the edges (u, ℓ) and (ℓ, v). Note that there are at most m such edges, so we can distribute the help among low degree nodes in such a way that each low degree node helps at most $\Delta_{avg}$ such edges. Call the resulting graph G'. Accordingly, we also create a new matrix D' which, initially, is identical to D. Then, for each (u, v) and ℓ as above, we set D'(u, v) = 0, D'(u, ℓ) = D(u, ℓ) + D(u, v) and D'(ℓ, v) = D(ℓ, v) + D(u, v). Note that D' is not a distribution matrix anymore, as the sum of all its entries is more than one, but it has the following property for each high degree node i (Eq. (2)).
In the second phase, we construct N from G'. Consider each node i with high out-degree and create a nearly optimal binary tree $\overrightarrow{B}_i$ according to $\overrightarrow{D'}[i]$ using the method of [17]. Add an edge from i to the root of $\overrightarrow{B}_i$ and delete all the out-edges of i from G'. Similarly, consider each node j with high in-degree and create a nearly optimal binary tree $\overleftarrow{B}_j$ according to $\overleftarrow{D'}[j]$ using the method of [17]. Add an edge from j to the root of $\overleftarrow{B}_j$ and delete all the in-edges of j from G'. This completes the construction of N.
We first bound the maximum degree in N . First consider a low degree node , helping an edge (u, v), i.e., u is high out-degree and v is high-indegree. So is part of both u's and v's binary tree, hence 's degree increases by at most 6 (two times degree 3 for being an internal node). Note that needs to help at most ∆ avg edges itself. For each of these ∆ avg edges, 's degree will be at most 6, resulting in a degree of 6∆ avg . Since 's degree was at most 2∆ avg , in the worst case, was associate with 2∆ avg high in-degree or out-degree nodes, i.e., will be present in all these 2∆ avg trees, which results in another 6∆ avg degrees for . In total, 's degree is 12∆ avg . If a node h has both high out-degree and high in-degree, then its degree will be two: h is connected to the root of the tree created of its out-edges and to the root of the tree created of its in-edges. If a node u is only a high out-degree node, its degree in N is bounded by 6∆ avg + 1 (and similarly for a node u which is only a high in-degree node). If a node is neither high nor low degree, then its degree in N is bounded by 12∆ avg (originally it was up to 4∆ avg in G ). We now bound EPL(D, N ). Recall that from Lemma 2 and Eq. (2), we have, For each request (i, j) in D there are two possibilities for the route on N : either the edge (i, j) ∈ N is a direct route, or the route goes via − → B i or ← − B j or both. Let O and I be the set of high out-degree and in-degree nodes, respectively. Then: This matches our lower bound in Theorem 1.

Regular and Uniform Distributions
Another large family of distributions for which demand-aware networks can be designed are regular and uniform distributions D. While it is easy to see that both conditions can be relaxed such that the supported distributions can be "nearly regular" and "nearly uniform", for ease of presentation, we keep the conditions strict in what follows. We first establish an interesting connection to spanners. As we will see, this connection will provide a simple and powerful technique to design a wide range of demand-aware networks meeting the conditional entropy lower bound.
Theorem 5 Let D be an arbitrary (possibly dense) regular and uniform request distribution. If we can find a constant and sparse (i.e., constant distortion, linear-sized) spanner for $G_D$, then we can design a constant-degree DAN N providing an expected path length of

$$\mathrm{EPL}(D, N) \le O\big(H(Y \mid X) + H(X \mid Y)\big)$$

This is asymptotically optimal.
In other words, for regular and uniform distributions, the network design problem boils down to finding a constant sparse spanner: as we will see, we can automatically transform this spanner into an efficient network (which may contain auxiliary edges). The remainder of this section is devoted to the proof of the theorem.
Assume that D is r-regular and uniform. Recall that in this case H(Y | X) = H(X | Y ) = log r, so BND(D, ∆) ≥ Ω(H(Y | X)) where ∆ is a constant. We now describe how to transform a constant sparse (but potentially irregular) spanner for G D into a constant-degree host network N with EPL(D, N ) ≤ O(log r). This will be done using a similar degree reduction technique as discussed earlier (in the proof of Theorem 4).
Lemma 3 Let G be a graph of maximum degree $\Delta_{max}$ and average degree $\Delta_{avg}$. Then we can construct a graph G' with maximum degree $8\Delta_{avg}$ which is a graph metric $\log \Delta_{max}$-spanner of G, i.e.,

$$d_{G'}(u, v) \le O(\log \Delta_{max}) \cdot d_G(u, v) \quad \text{for all } u, v \in V$$

Proof: Let us call the n/2 nodes with the lowest degree in G the low degree nodes and the remaining nodes high degree nodes. By an averaging (pigeonhole) argument, each low degree node has degree at most $2\Delta_{avg}$. The construction of G' proceeds in two phases. In the first phase, we take every edge between high degree nodes u, v and subdivide it with two edges that connect u to v via a helping low degree node ℓ, i.e., we remove the edge (u, v) and add the edges (u, ℓ) and (v, ℓ). Note that there are at most m edges connecting high degree nodes, so we can distribute the help among low degree nodes such that each low degree node helps at most $\Delta_{avg}$ such edges.
In the second phase, we consider each high degree node u and replace the set of edges between u and its neighbors, Γ(u), with a balanced binary tree that has u as its root and Γ(u) as the remaining nodes of the tree. Denote this tree by $B_u$ and note that the height of $B_u$ is at most log(|Γ(u)| + 1). We leave edges between low degree nodes as in G.
Let us analyze the degrees in G'. Since every high degree node u in G' only connects to low degree nodes, it is only a member of $B_u$, and its degree reduces to 2 in G'. Now consider a low degree node ℓ: for each edge (u, v) it helps, ℓ participates in $B_u$ and $B_v$. Hence, its degree increases by at most 6 in G' compared to G. Overall, for helping high degree nodes, the degree of ℓ can increase by $6\Delta_{avg}$. Together with its original neighbors from G, the degree of ℓ in G' can be at most $8\Delta_{avg}$. Next, consider the distortion of G'. The distortion between neighboring low degree nodes is one. The distortion between neighboring high degree nodes is at most twice $\log \Delta_{max}$, and the distortion between a neighboring high and low degree node is at most $\log \Delta_{max}$.
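The two-phase degree reduction of Lemma 3 can be prototyped on small graphs. The Python sketch below follows the described steps: it labels the lowest-degree half of the nodes as low degree, subdivides every high-high edge through a helper (assigned round robin here, which is an arbitrary choice), and replaces each high degree node's star by a heap-ordered balanced binary tree rooted at that node. The function names are hypothetical and the sketch makes no attempt at efficiency; `max_stretch` simply measures, for every original edge, the distance between its endpoints in the new graph.

```python
import math
from collections import deque


def reduce_degree(adj):
    """Two-phase degree reduction sketch for an undirected graph {node: set(nbrs)}.

    Returns (new_adj, helped), where helped[node] counts the high-high edges that
    the low degree node subdivides. Nodes are assumed to be sortable labels.
    """
    by_degree = sorted(adj, key=lambda v: len(adj[v]))
    low = set(by_degree[: math.ceil(len(by_degree) / 2)])  # lowest-degree half
    high = set(adj) - low
    low_list = sorted(low)
    helped = {node: 0 for node in low}
    # gamma[u]: the (all low degree) nodes that hub u will reach through its tree B_u.
    gamma = {u: {w for w in adj[u] if w in low} for u in high}
    # Phase 1: subdivide each high-high edge (u, v) through a helper, round robin.
    hh_edges = sorted({tuple(sorted((u, v))) for u in high for v in adj[u] if v in high})
    for k, (u, v) in enumerate(hh_edges):
        helper = low_list[k % len(low_list)]
        helped[helper] += 1
        gamma[u].add(helper)
        gamma[v].add(helper)
    new_adj = {v: set() for v in adj}
    for a in low:  # edges between low degree nodes are kept as in G
        new_adj[a] |= adj[a] & low
    # Phase 2: replace each hub's star by a heap-ordered balanced binary tree rooted at it.
    for u in high:
        order = [u] + sorted(gamma[u])
        for k in range(1, len(order)):
            parent = order[(k - 1) // 2]
            new_adj[order[k]].add(parent)
            new_adj[parent].add(order[k])
    return new_adj, helped


def max_stretch(adj, new_adj):
    """Largest distance in new_adj between the endpoints of an original edge."""
    worst = 0
    for u in adj:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in new_adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        worst = max([worst] + [dist[v] for v in adj[u]])
    return worst
```

In this sketch every high degree node ends with degree at most 2, and each original edge is stretched to at most $2\lfloor \log_2(\Delta_{max} + 1) \rfloor$ hops, matching the logarithmic distortion in the lemma.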
We will apply Lemma 3 to prove Theorem 5. Proof: [Proof of Theorem 5] Let S be a constant and sparse spanner of $G_D$ (S could be either a subgraph or a metric spanner whose maximum degree is asymptotically not larger than that of $G_D$) of degree at most r. Lemma 3 then tells us how to transform S into a DAN N of maximum degree $O(\Delta_{avg})$, where $\Delta_{avg}$, the average degree of S, is a constant since S is sparse. Since S is a constant spanner, there is a constant c such that

$$d_S(u, v) \le c \quad \text{for every } (u, v) \in E(G_D)$$

Since S has maximum degree r, according to Lemma 3, it has a graph metric spanner N such that the distance of any source-destination pair of $G_D$ in N is at most 2 log r times their distance in S. Hence:

$$\mathrm{EPL}(D, N) = \sum_{(u,v)} p(u,v)\, d_N(u,v) \le \sum_{(u,v)} p(u,v)\, 2 \log r \cdot d_S(u,v) \le 2c \log r = O\big(H(Y \mid X) + H(X \mid Y)\big)$$

The last equality holds since D is r-regular and uniform. The bound is asymptotically optimal when ∆ is a constant: it matches our lower bound in Theorem 1. Theorem 5 allows us to simplify the design of asymptotically optimal networks for uniform and regular distributions D where $G_D$ has a constant sparse spanner. In particular, the approach can be used to design optimal networks for the following large families of distributions, which are known to have constant and sparse graph spanners.
Corollary 1 Let D describe a uniform and regular communication request distribution. Then it is possible to generate a constant-degree DAN N such that $\mathrm{EPL}(D, N) \le O(H(Y \mid X) + H(X \mid Y))$ in the following scenarios:
• If, for a constant c ≥ 1, $G_D$ has a minimum degree $\delta \ge n^{1/c}$.
• If $G_D$ forms a hypercube with n log n edges.
• If $G_D$ forms a (possibly dense) chordal graph.
See the Appendix for proof and details. We round off our study of uniform and regular distributions by considering one more interesting family of (possibly very dense) distributions: distributions D which have a bounded local doubling dimension. Note that this family is more general than the standard bounded doubling dimension graphs, which are sparse.
First, recall that a metric space (V, d) has a constant doubling dimension if and only if there exists a constant λ such that every ball of radius r in V can be covered by λ balls of half the radius, r/2, for all r ≥ 1. In general, the smallest λ which satisfies this property for a metric space is called the doubling constant, and $\log_2 \lambda$ is called the doubling dimension [6,11,12,13]. A metric space is said to have bounded (a.k.a. constant or low) doubling dimension if λ is a constant. There has been a wide range of work on spanners for bounded doubling dimension metrics [6,5,12,13]. However, if the metric is induced by a graph (via shortest paths), then a bounded doubling dimension implies that the graph is nearly regular, of bounded (constant) degree, and sparse. Theorem 4 already solved the case of sparse $G_D$, even for non-uniform and irregular distributions.
In the following, however, we are interested in a more general notion of doubling dimension, which allows higher density and unbounded degree; we call it locally-bounded doubling dimension: Definition 2 (Locally-bounded doubling dimension) The graph $G_D$ implied by the distribution D has a locally-bounded doubling dimension if and only if there exists a constant λ such that the 2-hop neighbors of any node u are covered by the 1-hop neighborhoods of at most λ nodes. Formally, for each u ∈ V, there exists a set of nodes $y_1, y_2, \dots, y_\lambda$ such that

$$B(u, 2) \subseteq \bigcup_{i=1}^{\lambda} B(y_i, 1)$$

where B(u, r) is the set of nodes at distance at most r hops from u in $G_D$.
Clearly, every bounded doubling dimension graph is also of locally-bounded doubling dimension, but the converse is not true. In particular, the latter admits graphs which can be dense, with unbounded and possibly irregular degrees.
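Whether a given demand graph satisfies Definition 2 can be probed with a greedy set cover. The hypothetical helper below covers B(u, 2) with 1-hop balls greedily at every node. Greedy set cover only yields an upper bound on the smallest λ (it may use more balls than necessary), so the result is an estimate of the local doubling constant, not its exact value.

```python
def ball(adj, u, r):
    """Nodes within r hops of u (BFS truncated at depth r)."""
    seen, frontier = {u}, {u}
    for _ in range(r):
        frontier = {w for x in frontier for w in adj[x]} - seen
        seen |= frontier
    return seen


def greedy_local_doubling(adj):
    """Greedy upper bound on the constant lambda of Definition 2, maximized over all nodes."""
    worst = 0
    for u in adj:
        target = ball(adj, u, 2)
        # Only nodes within 3 hops of u have a 1-hop ball that meets B(u, 2).
        candidates = {y: ball(adj, y, 1) & target for y in ball(adj, u, 3)}
        uncovered, used = set(target), 0
        while uncovered:
            best = max(candidates, key=lambda y: len(candidates[y] & uncovered))
            uncovered -= candidates[best]
            used += 1
        worst = max(worst, used)
    return worst
```

A complete bipartite graph $K_{5,5}$ is dense, yet two 1-hop balls (one per side) cover every 2-hop neighborhood, illustrating that locally-bounded doubling dimension allows unbounded degree.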
In the remainder of this section, we will prove the following theorem.
Theorem 6 Let D describe a uniform and regular communication request distribution of locally-bounded doubling dimension. Then it is possible to design a constant-degree DAN N such that

$$\mathrm{EPL}(D, N) \le O\big(H(Y \mid X) + H(X \mid Y)\big)$$

This is asymptotically optimal.
Proof: Again, our proof strategy is to employ Theorem 5. Accordingly, we show that a constant sparse spanner exists for networks of locally-bounded doubling dimension. In particular, we will design this spanner based on an ε-net construction. We first recall the definition of ε-nets [6].
Definition 3 (ε-net) A subset $V' \subseteq V$ is an ε-net for a graph $G = (V, E)$ if it satisfies the following two conditions: (i) packing: for all distinct $u, v \in V'$, $d_G(u, v) > \varepsilon$; and (ii) covering: for every $w \in V$, there exists $u \in V'$ with $d_G(u, w) \leq \varepsilon$.

Let $G_D = (V, E)$ be a network of locally-bounded doubling dimension. We first construct a spanner S of $G_D$ that is a subgraph of $G_D$, using the following (ε = 2)-net: we sort all nodes by decreasing (remaining) degree, iteratively select the highest-degree remaining node into the 2-net, and remove its 2-hop neighborhood. Clearly, after this process, each node is either part of the 2-net or has a 2-net node at distance at most 2 hops, and no two 2-net nodes are within 2 hops of each other; hence we have computed a legal 2-net.
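The greedy 2-net construction can be sketched in Python as follows. The sketch assumes $G_D$ is given as an adjacency dict with integer node ids, interprets "remaining degree" as the number of neighbors not yet removed, and breaks ties toward the smaller id; these choices and the helper names (`ball`, `greedy_two_net`, `is_two_net`) are assumptions made for illustration. The checker tests the packing and covering conditions of Definition 3 with ε = 2.

```python
from collections import deque

def ball(adj, u, r):
    """B(u, r): nodes within r hops of u, found by a depth-limited BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def greedy_two_net(adj):
    """Pick the remaining node of highest remaining degree, add it to the net,
    and delete its 2-hop neighbourhood (distances measured in the full graph)."""
    remaining = set(adj)
    net = []
    while remaining:
        # Remaining degree = neighbours not yet removed; ties go to the smaller node id.
        v = max(remaining, key=lambda x: (sum(1 for y in adj[x] if y in remaining), -x))
        net.append(v)
        remaining -= ball(adj, v, 2)
    return net

def is_two_net(adj, net):
    """Definition 3 with epsilon = 2: net nodes pairwise > 2 hops apart, all nodes within 2 hops."""
    for i, a in enumerate(net):
        if any(b in ball(adj, a, 2) for b in net[i + 1:]):
            return False  # packing violated
    covered = set().union(*(ball(adj, v, 2) for v in net))
    return covered == set(adj)  # covering
```

On a path 0-1-...-6 this selects nodes 1 and 5, which are 4 hops apart and together cover every node within 2 hops. Packing holds by construction: a node can only be selected while it is still present, i.e., while it is more than 2 hops from every earlier net node.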
To form the spanner S, we next match each node u not in the 2-net to one of its nearest 2-net nodes v (ties broken arbitrarily), and add the edges along a shortest path from u to v to the spanner S. This results in a forest of connected components (2-layered stars). We call these connected components clusters and the corresponding 2-net nodes cluster heads. We denote by Cl(u) the cluster containing node u; for a net node u, this is the cluster headed by u.
We next connect the clusters to each other in a sparse manner: we connect each pair of clusters with a single, arbitrarily chosen edge of $G_D$ if they contain at least one pair of communicating nodes in $G_D$. We can claim the following.
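A minimal Python sketch of the subgraph spanner S, assuming the 2-net is already given as a list of node ids and $G_D$ as an adjacency dict. It records each node's cluster head instead of materializing Cl(u) as a set, breaks nearest-head ties toward the smaller id, and keeps the first $G_D$ edge it encounters between two clusters as the representative; these are illustrative choices, and the function names are hypothetical. The sketch does not verify the forest structure claimed in the text.

```python
from collections import deque

def bfs_parents(adj, src):
    """BFS from src; returns hop distances and BFS-tree parents (parent[src] is None)."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
    return dist, parent

def build_cluster_spanner(adj, net):
    """Return (head, edges): head[u] is the cluster head of u, and edges is the set of
    frozenset({a, b}) edges of G_D selected into the spanner S."""
    net_set = set(net)
    head, edges = {}, set()
    for u in adj:
        if u in net_set:
            head[u] = u
            continue
        dist, parent = bfs_parents(adj, u)
        # Nearest net node (ties -> smaller id); a valid 2-net guarantees it is <= 2 hops away.
        v = min((x for x in net if x in dist), key=lambda x: (dist[x], x))
        head[u] = v
        # Walk back from v to u along the BFS tree, adding the shortest-path edges to S.
        x = v
        while x != u:
            edges.add(frozenset((x, parent[x])))
            x = parent[x]
    # One G_D edge per pair of clusters that contain a communicating pair of nodes.
    linked = set()
    for a in adj:
        for b in adj[a]:
            pair = frozenset((head[a], head[b]))
            if len(pair) == 2 and pair not in linked:
                linked.add(pair)
                edges.add(frozenset((a, b)))
    return head, edges
```

On a path 0-1-...-6 with net nodes 1 and 5, nodes 0 to 3 join the cluster of 1, nodes 4 to 6 join the cluster of 5, and the single inter-cluster edge is (3, 4).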
Lemma 4 S is a constant and sparse spanner of $G_D$ (with distortion 9).
Proof: Let (u, v) be an edge in $G_D$ with u ∈ Cl(u) and v ∈ Cl(v), and let C(u) and C(v) denote the respective cluster heads. By construction, there are nodes x ∈ Cl(u) and y ∈ Cl(v) that are connected by an edge in S, and hence there is a path u, C(u), x, y, C(v), v in S of length at most 2 + 2 + 1 + 2 + 2 = 9.

The spanner is also sparse. In a nutshell, due to the 2-net properties, the distance between communicating cluster heads is at most 5; since in a graph of locally-bounded doubling dimension the number of cluster heads within distance 5 is bounded, each cluster communicates with only a small number of neighboring clusters. More formally, after associating each node with a unique cluster, the spanner contains a linear number of intra-cluster edges (each non-net node contributes at most 2 path edges). Next, we bound the number of outgoing edges of each cluster. Let (u, v) be such an edge, where u ∈ Cl(u) and v ∈ Cl(v), and let i and j be the cluster heads of Cl(u) and Cl(v), respectively. By construction, i and j are at distance at most 5 in $G_D$, i.e., $d_{G_D}(i, j) \leq 2 + 1 + 2 = 5$. Hence, bounding the number of 2-net nodes that lie within 5 hops of a net node i bounds the number of edges we add between Cl(u) and other clusters. According to Definition 2, all 2-hop neighbors of i are covered by the 1-hop neighborhoods of λ nodes, where λ is the corresponding doubling constant. The 2-hop neighborhoods of these λ nodes cover all 3-hop neighbors of i, and the 2-hop neighborhood of each of them is in turn covered by the 1-hop neighborhoods of λ nodes. So, to cover all 3-hop neighbors of i, we need the 1-hop neighborhoods of at most $\lambda^2$ nodes. Repeating this argument inductively, the 1-hop neighborhoods of at most $\lambda^4$ nodes cover all 5-hop neighbors of i. Since we constructed a 2-net, each of these $\lambda^4$ balls of radius one contains at most one 2-net node (any two nodes in such a ball are at most 2 hops apart). Hence there are at most $\lambda^4$ 2-net nodes at distance 5 hops or less from i. Consequently, at most $\lambda^4$ inter-cluster edges are associated with a cluster Cl(u), or equivalently with its cluster head i.
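The two bounds used in this proof can be summarized as follows, writing C(·) for a node's cluster head. The second chain is one way to spell out the inductive covering step: every node at distance k + 1 from i has a node of B(i, 2) on a shortest path to it, at distance k − 1 from it; taking k = 4 yields the $\lambda^4$ bound.

```latex
d_S(u,v) \le \underbrace{d_S(u,C(u))}_{\le 2} + \underbrace{d_S(C(u),x)}_{\le 2} + \underbrace{1}_{(x,y)}
         + \underbrace{d_S(y,C(v))}_{\le 2} + \underbrace{d_S(C(v),v)}_{\le 2} \le 9,
\qquad
B(i,k+1) \subseteq \bigcup_{j=1}^{\lambda} B(y_j,k) \subseteq \cdots \subseteq \bigcup_{\ell=1}^{\lambda^{k}} B(z_\ell,1).
```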
Since there cannot be more than a linear number of clusters, the number of edges in S is also linear.
Using Lemma 4, we can directly apply Theorem 5, which concludes the proof of Theorem 6.
In fact, if we allow a metric spanner, i.e., one that may use auxiliary edges not in $G_D$, we can improve the above distortion and construct a better constant sparse spanner S'. The idea is to add inter-cluster edges only between cluster heads. This reduces the distortion to 5 while keeping the same total number of edges. The degree of each node in S' increases by at most a constant, $\lambda^4$. Adjusting Theorem 5 accordingly to support metric spanners (and not only subgraph spanners) will enable us to use S' instead of S.
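For S', writing C(·) for a node's cluster head, the corresponding bound replaces the detour through x and y by a direct auxiliary edge between the two cluster heads, and each head gains at most one auxiliary edge per 2-net node within 5 hops:

```latex
d_{S'}(u,v) \le \underbrace{d_{S'}(u,C(u))}_{\le 2} + \underbrace{1}_{(C(u),C(v))} + \underbrace{d_{S'}(C(v),v)}_{\le 2} \le 5,
\qquad
\deg_{S'}(w) \le \deg_{S}(w) + \lambda^{4} \quad \text{for every node } w.
```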

Conclusion
This paper initiated the study of a fundamental network design problem. While our work is motivated in particular by emerging technologies for more flexible datacenter interconnects as well as by peer-to-peer overlays, we believe that our model is very natural and of interest beyond these application domains. For example, the design of a sparse, bounded-degree, and distance-preserving network can also be understood from the perspective of graph sparsification [24]: the designed network can be seen as a compact representation of the original network.
The subject of bounded-degree network design offers several interesting avenues for future research. In particular, while we presented asymptotically optimal network design algorithms for a wide range of distributions, and while we believe that entropy is the right measure for assessing network designs, optimal network designs remain open for several (dense) distributions, perhaps requiring us to explore alternative flavors of graph entropy.

Corollary 1
Proof:[Proof of Corollary 1] We prove the claims in turn.
Case $G_D$ has a minimum degree $\Delta \geq n^{1/c}$: For this distribution D, we have, Create a Δ-ary tree N with the nodes of $G_D$. Then,

Case hypercube. Follows directly from Theorem 5 and the fact that hypercubes admit a sparse 3-spanner [20], allowing us to design an O(log log n) (metric) spanner of bounded degree.
Case chordal graphs. Follows from Theorem 5 and the fact that chordal graphs have a constant sparse spanner [19].

B Notions of Distortion
In the spanner problem, the goal is to find a sparse subgraph $S = (V, E')$ of G, i.e., $E' \subseteq E$ with $|E'| = O(n)$, which approximately preserves the distances of G despite having fewer edges. Usually, the following notion of average distortion [4] is considered, referred to as the all-pairs distortion:

Definition 4 (All-Pairs Distortion (APD)) The average all-pairs distortion on a spanner S of a graph G is

$$APD(G, S) = \frac{1}{\binom{n}{2}} \sum_{\{u, v\} \subseteq V,\; u \neq v} \frac{d_S(u, v)}{d_G(u, v)}.$$

In this paper, we are only interested in preserving distances between communicating neighbors in G, captured by the following neighborhood distortion:

Definition 5 (Neighborhood Distortion (ND))
The average neighborhood distortion on a spanner S of a graph G with m edges is

$$ND(G, S) = \frac{1}{m} \sum_{(u, v) \in E} \frac{d_S(u, v)}{d_G(u, v)} = \frac{1}{m} \sum_{(u, v) \in E} d_S(u, v).$$

Next, we claim that these two notions of distortion are indeed different: a low all-pairs distortion does not imply a low neighborhood distortion, and vice versa.
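Both averages can be computed directly with breadth-first search. The following Python sketch assumes G and S are connected, unweighted graphs on the same node set, given as adjacency dicts; the function names are hypothetical. APD is normalized over unordered pairs of distinct nodes, and ND over the m edges of G, where $d_G(u, v) = 1$.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph given as an adjacency dict."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def all_pairs_distortion(g_adj, s_adj):
    """APD(G, S): mean of d_S(u, v) / d_G(u, v) over all unordered pairs of distinct nodes."""
    nodes = sorted(g_adj)
    d_s = {u: bfs_dist(s_adj, u) for u in nodes}
    d_g = {u: bfs_dist(g_adj, u) for u in nodes}
    ratios = [d_s[u][v] / d_g[u][v] for u, v in combinations(nodes, 2)]
    return sum(ratios) / len(ratios)

def neighborhood_distortion(g_adj, s_adj):
    """ND(G, S): mean of d_S(u, v) over the m edges (u, v) of G, where d_G(u, v) = 1."""
    edges = {frozenset((u, v)) for u in g_adj for v in g_adj[u]}
    total = 0
    for e in edges:
        u, v = tuple(e)
        total += bfs_dist(s_adj, u)[v]
    return total / len(edges)
```

For example, take G to be a 4-cycle and S the path obtained by deleting one cycle edge: the deleted edge is stretched to 3 hops, so ND = 6/4 = 1.5, while APD = 8/6 = 4/3, since only one of the six pairs is stretched.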
Claim 1 There is a family of graphs $G_n$ and a corresponding family of spanners $S_n$ (where n is the size of the graph and $S_n$ is a spanner of $G_n$) such that

$$\lim_{n \to \infty} \frac{APD(G_n, S_n)}{ND(G_n, S_n)} = \infty.$$

Claim 2 There is a family of graphs $G_n$ and a corresponding family of spanners $S_n$ (where n is the size of the graph and $S_n$ is a spanner of $G_n$) such that

$$\lim_{n \to \infty} \frac{ND(G_n, S_n)}{APD(G_n, S_n)} = \infty.$$

We show both claims by example. First consider Figure 2 (a), which establishes Claim 2. There is a $\Theta(\sqrt{n})$-sized clique in the center, and each clique node is attached to a line containing $\Theta(\sqrt{n})$ nodes. To compute the optimal tree spanner with maximum degree Δ, we turn the clique nodes into a Δ-regular tree of diameter $\Theta(\log_\Delta \sqrt{n}) = O(\log_\Delta n)$. The nodes remain connected to their corresponding lines.
The asymptotic distortion w.r.t. Definition 5 is (the Θ(n) clique edges are stretched to $O(\log_\Delta n)$ hops, while the Θ(n) line edges keep length 1):

$$\frac{n \cdot \log_\Delta n + n \cdot 1}{n} = \Theta(\log_\Delta n).$$

Now consider the all-pairs distortion of the same spanner for this graph. Consider any two nodes that belong to different lines and are also members of the clique. Their distance in the spanner may grow to $\log \sqrt{n}$, so, according to Definition 4, their ratio $d_S(u, v)/d_G(u, v)$ equals $\frac{1}{2} \log n$. We now upper-bound the number φ of pairs whose distance can grow by up to a factor of $O(\log n)$. Consider all nodes on all lines that are within distance $\log n$ of their corresponding clique node. In the original graph, the distance between any two such nodes lies in the range $[1, 2\log n + 1]$. There are $\sqrt{n}\log n$ such nodes and hence $O(n\log^2 n)$ such node pairs, so $\varphi = O(n\log^2 n)$. Now consider any node on a line that is at distance at least $1 + \log n$ from its corresponding clique node: for every pair containing such a node, the original distance already exceeds $\log n$, and the spanner adds only $O(\log n)$ hops, so its distortion is bounded by a constant (taken as 2 below). Hence

$$APD(G, S) \leq \frac{n\log^2 n \cdot \log n + (n^2 - n\log^2 n) \cdot 2}{n^2} = O(1).$$

Now consider Figure 2 (b), which establishes Claim 1. There is a star of size $n/\log n$ in the center, and each of its $n/\log n$ nodes is associated with a clique of size $\log n$. Thus, in total, there are $\Theta(n\log n)$ edges. To compute a tree spanner of degree $\Delta = \log n$, we replace each clique of $\log n$ nodes with a star on the same $\log n$ nodes, and transform the central star of $n/\log n$ nodes into a Δ-regular tree of diameter $\Theta(\log n/\log\log n)$. Each tree node is then associated with the root of the star corresponding to its associated clique. This tree spanner also contains auxiliary edges. The asymptotic distortion w.r.t. Definition 5 is then

$$\frac{\frac{n}{\log n} \cdot \log^2 n \cdot O(1) + \frac{n}{\log n} \cdot \frac{\log n}{\log\log n}}{n\log n} = O(1).$$

In contrast, the distortion w.r.t. Definition 4 is $\Omega(\log n/\log\log n)$, since all pairs of nodes from two different cliques now suffer a distortion of $\Theta(\log n/\log\log n)$, and there are $\Theta(n^2)$ such pairs. Finally, since ND(G, S) is constant, EPL(D, S) is also constant; i.e., Equation 13 holds if ND(G, S) is constant.