Approximating Optimal Broadcast of Files in a Hose-Model Network
Abstract
The paper considers the problem of file sharing among peers who are connected to a common core network through links of differing upload and download capacities, as is the case in networks provisioned according to the hose model. The file is assumed to be divided into equal-sized chunks, and a peer can start sending a “chunk” of the file to another peer only after it has received the entire chunk. The objective is to share a chunk, initially residing on one of the peers, with all other peers in the least time possible. Peers can simultaneously send/receive parts of a chunk to/from multiple peers, subject to the upload and download capacity constraints. We only consider the problem of broadcasting one chunk to all peers.
We consider two different models – in the migratory model, a peer can receive the chunk from multiple peers, while in the non-migratory model, any peer can receive the chunk only from one peer. For the migratory model, introduced in this paper, we show a novel integer program and use the optimum solution to the LP-relaxation to give a schedule with makespan where is the time required by the slowest peer to download the chunk.
Minimising makespan in the non-migratory model is known to be NP-hard. We give a solution with makespan and this is the first approximation algorithm for heterogeneous and asymmetric upload/download capacities. We also consider 2 special cases. For uniform download capacities, we obtain a solution with makespan extending a result due to Liu [12]. For uniform upload capacities, we give the first approximation algorithm, producing makespan at most .
Keywords and phrases:
File sharing, scheduling, peer-to-peer networksFunding:
Naveen Garg: Supported by Usha Hasteer Chair.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithmsEditors:
C. Aiswarya, Ruta Mehta, and Subhajit RoySeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
We consider the problem of file sharing between peers in networks. The file to be shared initially resides with a root peer and has to be broadcast, i.e., sent to all the remaining peers. The peers are all connected to a core network with links of bounded upload and download capacities. The core network is suitably provisioned so that any traffic matrix between the peers that meets the upload and download capacity constraints on each peer can be supported; this is commonly referred to as the hose model [6, 1]. The rate of transfer between two peers is only constrained by the upload and download capacities of the sending and receiving peer, respectively. For instance, if peer has an upload capacity of 10Mbps and peers have download capacities of 6Mbps then peer can transfer data to peers at a rate of 5Mbps or to peer at 6Mbps and peer at 4Mbps or vice-versa. Our objective is to schedule the transfers between peers, efficiently.
Such a scenario would, for instance, arise when the core is a cloud network [2] to which a set of enterprise clients are connected who have to share a file with all when it becomes available at one. While there has been a lot of previous work on broadcasting in networks, most of it focuses on the structure of the communication network [9, 10], which in our model plays no role.
Large files are typically distributed in “chunks”. A chunk is the smallest unit of the file that must be completely available to a peer before it can start sharing any part of it with another peer. A small file can be treated as a single chunk. In this paper, we shall only be concerned with distributing one chunk residing at the root peer to the remaining peers in the network. Our model permits a chunk to be split and for a peer to send parts of the chunk to multiple peers concurrently. We consider two variants of this model. In the non-migratory setting, a peer must receive the entire chunk from only one peer (its parent peer) while in the migratory setting, a peer could receive parts of the chunk from different peers. Our objective in both settings is to schedule these transfers so that the time by which all peers receive the chunk (the makespan) is minimised.
We remark that it is standard in networking that certain units (such as IP packets) can only be forwarded once they have been received in full, for example, because a checksum needs to be verified before one can be sure that the unit has been received correctly and can start forwarding it. Dropping this requirement for chunks in our setting, i.e., allowing parts of a chunk to be forwarded immediately, gives rise to the fluid limit model, in which the makespan can be minimised in polynomial time. We emphasise that most results on broadcasting treat the message to be broadcast as an atomic unit; the chunk, on the other hand can be split and reach a peer along different paths.
If upload capacities were uniform then it would be natural to distribute the chunk in order of decreasing download capacities with the peer having the largest download capacity receiving it first. Similarly, if the download capacities were uniform it would make sense to distribute the chunk in order of decreasing upload capacity and first sending it to the peer with the largest upload capacity. When upload and download capacities are identical (the symmetric setting) both these ideas yield the same order for distributing the chunk to the peers. However, when upload and download capacities are different (for a peer and across peers) there seems to be no natural preference order for distributing the chunk. For the non-migratory setting minimizing makespan is NP-hard [8] but for the migratory setting, we do not know if the problem is NP-hard.
The problem of scheduling the broadcast of multiple chunks is also of much interest but in this paper, we limit our discussion to a single chunk.
1.1 Our Results
This paper is the first to address broadcast in the hose model in its full generality, i.e., the asymmetric heterogeneous setting where the upload and download capacities of each peer can be different. Previous papers have only considered the symmetric setting when the upload and download capacity of each peer are identical [8]; see Section 1.2 for details. The problem of finding a constant approximation for the asymmetric setting has been open, and even the setting of uniform upload capacities had not been considered earlier.
Additionally, the paper introduces the novel migratory setting that allows a peer to download from multiple peers subject to its download capacity constraint. This is a natural relaxation and does not lead to loss of data identity, as we still process the chunk as a whole. The constraint that a peer starts uploading only after receiving the complete chunk is important and separates our model from the fluid limit model.
Our first result is a novel completion time integer program for the problem of minimising makespan in the migratory setting. Formulating this integer program requires proving that a certain non-trivial property is necessary and sufficient for characterising valid broadcast schedules. Relaxing the integrality constraints gives a linear program, and we exploit the structure of the optimum solution of this LP to obtain a solution with makespan at most (Theorem 5). is the time required for the slowest peer to download the chunk and while it is definitely less than , it could be significantly smaller.
For the non-migratory setting, we do not know how to formulate a suitable integer program, but we circumvent this difficulty by restricting attention to so-called slotted schedules. We formulate an LP by restricting attention to downloads that fit into “slots”; we establish that this restriction does not lead to a significant increase in makespan. Our algorithm for rounding the optimum solution to the linear program proceeds through three stages each of which requires careful reasoning about the structure of the solution. This yields a solution of makespan where is the minimum makespan of any slotted schedule. Since (Lemma 7) we obtain a solution of makespan at most (Theorem 11).
In the Appendix, we also consider 2 special cases in the non-migratory setting. When all download capacities are identical we extend the results of [12] to obtain a 2-approximation algorithm. When all upload capacities are identical we modify the download capacities of some peers and add some dummy peers to obtain a modified instance with makespan at most . We then find an optimum schedule for this modified instance and convert it into a schedule for the original instance with makespan at most .
In the results mentioned above, we assume that the root peer has an upload capacity at least as large as the largest download capacity. If such is not the case, we can, in time units ( is the upload capacity of the root peer) transfer the chunk to the peer with the largest upload capacity and then run our algorithms. This increases the makespan by at most as is also a lower bound on . In the non-migratory model, no peer can receive the chunk at a rate larger than the largest upload capacity, and so we can cap all download capacities at this value. For the migratory model, we show (Theorem 5) that the schedule obtained by our algorithm has makespan where is the ratio of the largest upload capacity to the largest download capacity of the peers.
1.2 Previous Work
The problem of broadcasting, where the task is to disseminate a message from a source node to the remaining nodes of a network as quickly as possible, has been studied under various models (see [9] for a survey). A popular model is the telephone model where communication happens in rounds. In each round, a peer with the message can communicate it to exactly one other peer and the objective is to minimize the number of rounds. If every node can communicate with any other node, Bar-Noy et al. [5] gave a simple recursive doubling algorithm which is optimal when the number of nodes is even and an additive 1 approximation when the number of nodes is odd. When communication can happen only between adjacent nodes of a given graph, Ravi [14] defined the notion of poise and used it to give a -approximation, later improved to a -approximation by Elkin and Kortsarz [7].
In the simultaneous send/receive model a peer can send the message to one peer and receive a message from another peer in the same round. Bar-Noy et al. [5] gave an optimal algorithm for broadcasting multiple messages for this model. A natural generalisation of this model is the -port model which allows each peer to send messages to peers and receive messages from peers in a single round. An additive 1 approximation for the minimum number of rounds needed to send multiple messages was obtained in [4].
Another natural generalisation to these models is allowing for network heterogeneity. Banikazemi et al. [3] considered a setting of heterogeneous processors (peers) capable of point-to-point communication with each other. The peers could have different sending speeds/rates and the model requires that any peer take part in at most a single message transmission at any given time. For this model they proposed a fastest node first algorithm (FNF); in each iteration, the peer with the least transmission time which has the message sends it to the peer with the least transmission time which does not have the message. Liu [12] gave a 2-approximation algorithm for minimizing makespan in this model and showed that the fastest node first algorithm is optimum if sending rates are integer multiples of each other. Khuller and Kim [11] proved that the Fastest node first algorithm is a 1.5-approximation.
Mundinger et al. [13] considered the file-sharing problem and introduced the uplink-sharing model. The file is broken into multiple chunks and a peer can receive a chunk only from 1 other peer. However, a peer could simultaneously send a chunk to multiple peers. The rate of transfer is constrained by the upload capacity of the sending peer and the authors assume that concurrent uploads from one peer would share the upload capacity equally (fair-sharing assumption). Under this assumption, they prove that for identical upload capacities, a simple greedy algorithm is optimal. The authors [13] also studied the fluid limit model where the chunks are infinitesimally small.
Goetzmann et al. [8] removed the fair sharing assumption from [13] and generalised the model by introducing download capacity constraints. They consider the setting of symmetric heterogeneous capacities – equal upload and download capacities per peer that may differ among peers. Continuing with the assumption that a peer can receive a chunk from only one other peer, they prove that the problem of minimizing makespan is strongly NP-hard and give a -approximation algorithm for a single chunk and a 5-approximation algorithm for multiple chunks.
2 Preliminaries and Notations
For , let denote the set of integers between (and including) and . We use as shorthand for and as shorthand for . For , denotes the set of reals between (and including) and , and the reals such that .
The peers are numbered 0 through , and we assume that the chunk resides at peer 0, which we refer to as the root peer. We define a unit of time to be the minimum time needed by the peer with the highest download capacity to download the chunk. The upload/download capacity of a peer is defined as the fraction of the chunk it can upload/download in one time unit. From our definition of a time unit, it follows that the maximum download capacity of any peer is 1. Let be the upload and the download capacities. Thus, a peer with download capacity can download the entire chunk in time units. Note that the peer may download the chunk at a rate lower than its download capacity; in this case, the time it will take is longer than . Let be the maximum time a peer would take to download the chunk if it does so at the maximum permissible rate.
To allow us to model the problems as time-indexed integer programs, we discretize our schedules. For the migratory setting, we assume that all downloads and uploads begin and end at times that are integer multiples of . If a peer is the peer to receive a chunk in the optimum schedule, its completion time will increase by at most due to this assumption, and this increases the makespan of the optimum schedule by at most . In this setting, time-slot , will refer to the time interval .
In the non-migratory setting, our schedules are “slotted” and all uploads, downloads begin and end at integer times. We shall argue (Lemma 7) that this assumption does not cause a significant increase in the makespan of our solution. In this setting, time-slot , will refer to the time interval . In both settings, time will refer to a point in time, while time-slot will refer to an interval.
Let for migratory schedules and for non-migratory schedules. A schedule is a function where is the fraction of the chunk sent from peer to peer in time-slot . Further, satisfies the following constraints.
- Upload Capacity:
-
The total upload by peer in time-slot , cannot exceed its upload capacity .
- Download Capacity:
-
The total download by peer in time slot , , cannot exceed its download capacity .
- Full Download before Upload:
-
A peer can start uploading only after it has received the entire chunk. Thus, if , then .
Figure 1 shows a schedule for an instance with 4 peers with (upload, download) capacities respectively. The root peer has upload capacity 1 and sends the chunk to peers and during . At , the total upload capacity is 2 (root and peer uploading), and the chunk is sent to peers and during . If the root peer sends the chunk to and peer sends the chunk to during , this schedule is non-migratory, as every peer receives the chunk from a single other peer.
3 Migratory Schedules
We first consider the problem of finding a migratory schedule that minimizes makespan. In such schedules, a peer can download the chunk from multiple peers simultaneously and hence the key challenge is to ensure that there is enough upload capacity available to support all the downloads at any point in time.
3.1 Formulating a Linear Program
We formulate an integer program with completion times to determine if a single chunk can be shared by all the peers within time . We then relax the integrality constraint to obtain a linear program that we solve to obtain a fractional solution.
3.1.1 Feasibility of download completion times
Given times , is there a schedule in which peer completes downloading the chunk at time ? We renumber the peers so that . The chunk is initially with peer 0, and so . Let be the total upload capacity of all peers who have received the chunk at or before time . Thus, for , , and . Note that is a monotonically non-decreasing step-function with steps at ; the steps in are the points of discontinuity of . It now remains to check if there is a way of scheduling downloads such that the total download at time does not exceed and at any time the download of peer does not exceed .
By time the chunk has been downloaded by peers. The total upload capacity available till time , , must thus be at least . This condition, although necessary, for the , , to form a feasible set of download completion times, is not sufficient since we have the added constraint that at any time a peer can only download a fraction of the chunk.
Consider a time and a peer such that . Note that at most a fraction of the chunk can be downloaded in the interval and if at least a fraction of the chunk would have to be downloaded by peer before . Since this is true for every peer with , it follows that all such peers download a fraction of the chunk before . Peers whose completion time is at or before download the entire chunk before and hence
| (1) |
is a necessary condition for feasibility of .
3.1.2 Proving sufficiency
We next give an algorithm for scheduling downloads that establishes the sufficiency of condition 1. We pack the downloads in the upload profile in order of increasing completion times. Let denote the residual upload profile after downloads for peers 1 to have been (see Figure 2). Thus and since downloads for peers 1 to are scheduled before time , are identical after time . For every , we ensure that is a monotonically non-decreasing step-function.
We now give the procedure for scheduling the download of peer . Initially set and . We wish to schedule the download of a fraction of the chunk for peer in . Let be such that . We would like to use the part of the profile of above to pack the download for peer . Since is monotonically non-decreasing, such a packing would be valid if and only if . If this condition is true, then we have completed scheduling the download of peer . Else, we schedule units of download for peer at every time in the interval where is the time at which has a step immediately preceding the step at . The remaining download for peer would be scheduled in the interval and this is done by repeating the above process after setting to and to .
Note that when we repeat the above process, the value of decreases. Our procedure would fail if at some step we realize that the total residual upload capacity in which is , is less than and thus insufficient to schedule the download of peer . We next argue that this is a witness to the violation of condition 1.
Let be the value of at the step at which our procedure fails to schedule the download for peer . Thus .
Lemma 1.
If the algorithm fails to schedule downloads that respect the given completion times, then
Proof.
To prove this lemma, we start with the following Claims.
Claim 2.
, is a monotonically non-decreasing step function. Every step in is also a step in .
Proof.
We prove the statement is true by induction on ; it is true for since .
From our procedure for scheduling the download of peer it follows that there exist steps at , in such that
It also follows from our choice of that for . Monotonicity of in follows from the fact that is monotonically non-decreasing and for . Thus it remains to argue that . If then our choice of in the penultimate step of our process would have been such that and so our procedure would have terminated at the penultimate step.
From the above argument it follows that all steps in in the intervals and are retained in . However the steps in in are not steps anymore in since for . This implies all steps in are also steps in and hence in .
Since is a step in , the above claim implies that there exists such that .
Claim 3.
If peer , receives a download fraction less than at time then peer receives the entire chunk in .
Proof.
Since is a step in it is also a step in . If peer received some download before time , then it was downloaded to an extent less than at times before and after . Hence which would imply that is no more a step in leading to a contradiction. The above claim implies that if the download of a peer is scheduled before , then for all , peer received a fraction of the chunk. This in turn implies that fraction of the chunk is downloaded by peer before . Peers 1 to complete their entire download before and since the algorithm fails while scheduling download for peer it must be the case that
which completes the proof of the Lemma.
Thus, constraint (1) is violated, and this establishes the sufficiency of the condition.
3.1.3 An LP relaxation
The constraint (1) gives a necessary and sufficient condition for to be a feasible set of completion times. We use this to formulate a completion time integer program that is feasible if and only if all downloads can be completed by time . Our first step is to round the download capacities down to the nearest power of , where is a constant to be picked later. To simplify notation, we continue to use for the rounded download capacities. We define class as the set of peers with download capacity ; the number of classes is at most , and we denote this by .
Let and be an indicator variable that is 1 if peer finishes downloading the chunk in the time slot and 0 otherwise. Clearly,
| (2) |
Then , where denotes the total upload capacity in time slot . The constraint (1) can now be formulated as
| (3) |
Relaxing integrality constraints on the variables gives a linear program with variables and constraints. In Theorem 5 we argue that and therefore the LP can be solved in time polynomial in .
Figure 3 shows an instance with an integrality gap of 3/2. Peers have (upload, download) capacities (0,1/2),(2,1),(0,1) respectively, and the root peer has upload capacity 1. The linear program is feasible for while the best integral solution has makespan 3.
3.2 Constructing an integer solution
Let be an optimum solution to the linear program and let . We can modify the algorithm from Section 3.1.2 to compute the schedule corresponding to the solution . With each we associate a pseudo-peer which has download capacity , upload capacity and needs to download only a fraction of the chunk by time slot . The algorithm can be used to schedule the download of the chunk to these pseudo-peers in order of increasing . The only modification needed is that , the fraction of the chunk that remains to be downloaded to the pseudo-peer , is initially set to , and , the largest fraction of chunk which can be downloaded in a time-slot, is now set to .
Thus, if the upload capacity in time slot is , we can build a schedule such that an fraction of the chunk finishes downloading to pseudo-peer and hence to peer by time slot . Let be the earliest time slot such that . Peer completes downloading the chunk at time slot in this schedule we have built and hence it can start uploading the chunk only after time slot . Thus the true (or integral) upload capacity in time slot in this schedule is . Note that , the fractional upload capacity at time slot , is the sum of the upload capacities of all peers weighted by the fraction of the chunk that they have received by time slot . It follows therefore, that for all time slots , . A feasible schedule can, in any time slot , only permit downloads of total capacity at most the true upload capacity and hence the schedule obtained from solution is not feasible. In order to build a feasible schedule we need a better understanding of the difference between and . Informally, the following lemma shows that for two peers in the same class, the download of the peer with higher upload capacity precedes the download of the peer with lower upload capacity.
Lemma 4.
There exists an optimum fractional solution in which such that , if then
Proof.
For contradiction assume that and time slots such that . Let and we construct a new LP solution which is identical to with the exception that and .
We next argue that is also a feasible solution to the linear program. Replacing the download of peer that finishes in time slot with an equal download of peer does not change the left-hand side of inequality (3) since . The same holds for replacing the download of peer ending in time slot with a download for peer . Doing these exchanges together ensures that the constraint (2) is also met.
Let . Since , exchanging downloads of and in time slots will only increase the upload capacity at time slots . Thus and they are equal otherwise. This implies that the right-hand-side of inequality 3 is only larger in than in and hence the constraint is met.
3.2.1 Shifting downloads within a class
Let be peers of a class such that . From the above lemma, it follows that there exists an LP solution, say , such that . In time slot , the total contribution (in solution ) by peers of class to the true upload capacity is exactly while their contribution to the fractional upload capacity is at most .
We now build a solution by shifting the downloads of peers within each class. This is done by downloading the chunk for peer in the time slots in which the download for peer was scheduled in the solution . This can be done since the restriction on the fraction of the chunk that can be downloaded in a time-slot (the download capacity) is identical for peers . Thus for all and , we set, , where are peers of a class. For now, we will assume that peer has already received the chunk from the root peer and shall remove this assumption later.
We repeat this process of shifting downloads of peers for all classes. Once this shifting is done, consider and the true upload capacity in solution at time slot . The total contribution by peers of class to this quantity is , which is at least as large as their contribution to the fractional upload capacity in time slot in solution . Since the fractional upload capacity was sufficient to schedule all downloads, it follows that the solution is indeed a feasible solution. We note our assumption that the peers with the highest upload capacity in each class (the leaders) have already received the chunk at time from the root peer and hence their upload capacity is being included in the true upload capacity available in each time slot.
3.2.2 Distributing chunks to class-leaders
We now show how to distribute the chunk to the leaders. The root peer simultaneously uploads the chunk to each of the class leaders at a rate , and hence all class leaders receive the chunk at time . This requires that the root peer has an upload capacity of at least .
We assume is chosen such that is an integer. We construct a new solution by shifting the solution forward in time by units. Formally, for peers and time slots we set . This shifting creates empty time-slots at the start, which we use for downloading the chunk from the root to all class leaders. The makespan of this modified solution is at most .
We are now ready to prove the following theorem.
Theorem 5.
In the migratory setting, there is a polynomial time algorithm to compute a schedule of transfers such that a root peer with upload capacity can share a chunk with all peers in time at most , where is the time required by the slowest peer to download the chunk. When , our schedule has makespan .
Proof.
Suppose . All peers have download capacity at least . Since , they can trivially download the chunk simultaneously from the root peer (which has upload capacity ) in time , and this is the optimum. Hence, we assume . All peers can also download the chunk sequentially from the root peer in time at most , and this is an upper bound on the makespan .
Let be the original instance and the instance with download capacities rounded down to a power of .
Claim 6.
.
Let be a guess on the makespan of the instance . Note that is at least and at most . By doing a binary search on and solving the linear program for the instance on each guess, we determine the smallest value of , say , for which the linear program is feasible. Note that and let be the solution to the linear program. We modify as discussed in previous sections to obtain a feasible solution with makespan .
The upload capacity of the root peer , must be at least and this implies . With this choice of , the schedule returned by our algorithm has makespan . When , the makespan is at most .
If the peer with the largest upload capacity is different from the root then we incur an additional time in transferring the chunk from the root to this peer.
4 The non-migratory setting
This section considers the setting when a peer can download the chunk from only one peer – the parent peer. Let be an estimate of the minimum makespan of a slotted schedule, defined below. We divide the interval into slots of different sizes. Let . For , a class--slot is the time-interval and we refer to this interval as the slot . For a slot and time , we let “” denote , “” denote , and “” denote . Note that any two slots – of the same or different classes – are either disjoint or one slot contains the other.
4.1 Slotted Schedules
A peer is in class if ; let denote the class of peer . A peer downloads the chunk in slot if the chunk is downloaded in the interval at a uniform rate . A peer can download the chunk in a slot of class only if . We call a schedule slotted if every peer downloads the chunk in some slot and we formulate an LP to minimize the makespan of a slotted schedule. The following Lemma, whose proof we omit due to space constraints, argues that the makespan of a slotted schedule is not significantly more than the makespan of the optimum schedule.
Lemma 7.
Any non-migratory schedule with makespan can be converted into a non-migratory slotted schedule with makespan at most .
4.2 An LP for slotted schedules
Let be our estimate of the minimum makespan and let be 1 if peer downloads the chunk from peer in slot . Note that peer can upload the chunk in a slot only if . We combine this with our earlier observation that to define variables only for where
-
1.
The constraint ensures that peer downloads the chunk in some slot.
-
2.
Let denote the fraction of the chunk downloaded by peer before time . Note that this is at most . Peer 0 (the root) has the chunk available to it at time 0 and hence .
-
3.
The total downloads from peer assigned to slots containing is and this cannot exceed which is the total upload capacity of peer at time .
This then yields the following system of linear inequalities.
| (4) | |||||
| (5) | |||||
| (6) | |||||
If the above linear system is infeasible, our estimate of is too low and we double . By performing a binary search on we determine the smallest , say , such that the linear system is feasible and let be this feasible solution.
4.3 Rounding to a non-migratory schedule
An important property of , along the lines of Lemma 4, is that if are peers of the same class and , then cannot be downloading the chunk in a slot that ends strictly before any slot in which downloads the chunk.
Lemma 8.
Let be such that . Let and . Then for all such that and , .
Thus for each class, the chunk is received by peers of this class in decreasing order of their upload capacity. For each class order the peers of this class by decreasing upload capacity, breaking ties arbitrarily and let be the peer in this ordering; it would be convenient to define as 0 (the root peer). Let be the total extent to which peers of class download the chunk in a slot . To determine which slot each peer of class downloads the chunk in, we order slots by increasing end-time. Then peer downloads the chunk in the earliest slots of this ordering for which the sum of is 1. In general, if denotes the time at which the chunk is fully downloaded by peer then is the earliest time at which . We define as 0. Thus, . Note that a peer could be downloading the chunk from many different peers and hence, is a migratory schedule. We now modify in a sequence of steps to obtain a non-migratory schedule.
4.3.1 Ensuring a peer uploads only after downloading the entire chunk
From the preceding discussion it follows that for , for and for . Note that is monotonically non-decreasing with time and increases from 0 to 1 in the interval . In the linear system, a fraction of the upload capacity of peer is available at time and thus in the solution , peer can start uploading the chunk after time and before time . However, recall that in a valid schedule, peer can upload the chunk to the other peers only after it has completed the download at time . We resolve this by
-
1.
assuming that at time 0, the chunk is available to the first peer of each class, and
-
2.
scheduling the downloads by peer , , in slots which peer uses to download the chunk.
Peer would then complete its download by time and the uploads by peer after time in the modified solution would now be valid. The modified solution is defined by
Let be the peers who are not class leaders. We next argue that the solution satisfies the constraints of our linear system for peers in .
Proof.
Consider a peer . Since , we have
where the second inequality follows from the fact that is a feasible solution and satisfies constraint (4).
When viewed as a function of , increases from 0 to 1 at . Since for all it follows that for all , . Thus,
and this establishes constraint (5)
Finally to prove constraint (6) we observe that for any peer and slot , . Thus
where the last inequality follows from the fact that for any peer , increases from 0 to 1 at while remains 0 till and is at most 1 at . For a peer , for all while .
4.3.2 Ensuring a peer downloads from only one peer
Note that in a peer could be downloading the chunk from multiple peers and so is a migratory solution. We will now modify this solution so that each peer downloads the chunk completely in one slot from one peer. Let be the total extent to which the chunk is downloaded from peer in slot .
We now construct a bipartite graph where contains a vertex for each peer which is not a class leader. For each triple we have a vertex in if . Let be a function on the vertices in defined such that for , . For peer we include edge in if and .
The graph and the function define an instance of an assignment problem in which we wish to assign every vertex to a vertex in , such that and the number of vertices assigned to any vertex is at most . For integer it is known that a fractional assignment implies the existence of an integer assignment. The variables define a fractional assignment as follows
-
1.
For edge , assign vertex to vertex to an extent .
-
2.
For , the feasibility of solution implies
Hence vertex is assigned to a total extent of 1 in this fractional assignment.
-
3.
and hence the total fractional assignment of vertices of to a vertex is no more than .
An integer assignment can be computed by finding a maximum flow in a network obtained by adding vertices and edges and to the graph . All edges of and edges incident to are assigned unit capacity while edges are assigned capacity . Since all edge capacities are integer a maximum flow in this network is integral and edges of carrying unit-flow give an integer assignment. We use this assignment to define an integral solution by setting to 1 iff the vertex corresponding to peer in is assigned to vertex . We also set .
Lemma 10.
Proof.
In the assignment we use to define , every peer in is assigned to a (peer, slot) pair. This implies that for ,
which is constraint (4).
For a peer , we have set for and hence we only need to check constraint (5) for . From our construction of graph it follows that in the integer assignment, peer is assigned to a vertex such that . This implies that for
which implies constraint (5) is satisfied.
As concerns constraint (6), note that for any time and peer
where the last inequality follows from the fact that peer is uploading the chunk at time in solution , and hence .
Since every peer downloads the chunk from one peer the solution is non-migratory. However, it violates the constraint that the total download scheduled from a peer at any time should not exceed the upload capacity of that peer. In addition, we need to distribute the chunk to class leaders.
4.3.3 Scaling and shifting the schedule
We first project the solution to obtain a solution that does not reference slots. Let be the fraction of the chunk sent from peer to peer in time-slot . For , we have
and hence every peer in receives the complete chunk. Since is 0/1, peer downloads the chunk in some slot where . This implies for all and hence no peer downloads at a rate more than its download capacity.
Since is 0/1 and
implies . Hence, peer uploads the chunk only after it has completed its download. Finally, the total upload by peer in time slot can be bounded as
To ensure that this does not exceed the upload capacity of peer , we scale and stretch the solution by 3. Formally, for , and we set
With this modification, the total upload by peer in a time slot does not exceed its upload capacity .
Finally, we shift this solution by time units and use the interval to distribute the chunk to class leaders as was done for the migratory setting. The final solution, which we denote by , is defined as . Note that the makespan of the solution is where is the minimum makespan of a slotted schedule. By Lemma 7, where is the minimum makespan of a non-migratory schedule.
Theorem 11.
There is a polynomial time algorithm to find a non-migratory schedule of makespan at most where is the minimum makespan of a non-migratory schedule and is the maximum, over all peers, of the minimum time that a peer needs to download a chunk from root.
5 Conclusion
The paper takes a linear programming approach for scheduling transfers of a chunk of a file in a hose-model network and presents the first results for the general setting when the upload and download capacities of peers could be different – for each peer and also across peers. We revisit the assumption made in earlier papers that a peer can download the chunk from only one other peer (We call this the non-migratory setting). The migratory setting has no such constraint. We present approximation results for makespan in both these settings and various special cases. The non-migratory assumption meant that the solution defined a tree and hence all previous approaches were in effect focused on constructing a suitable tree. Whether the migratory setting (like the non-migratory setting) is NP-hard needs to be studied.
Extending our approach to multiple chunks seems challenging. For sharing multiple chunks one would like to design a schedule which is periodic – the schedule for distributing each chunk is identical, albeit shifted in time. Periodic schedules are easy to implement in a network since the same upload and download policies are followed for every chunk. However, in a periodic schedule the upload capacity for a chunk is no more a monotonically non-decreasing function and this causes most of our arguments to break down.
It would be interesting to see if the algorithm in this paper can be redesigned so that it does not involve solving a linear program or if the linear program could be solved very efficiently perhaps using a combinatorial algorithm.
References
- [1] Satyajeet Singh Ahuja, Varun Gupta, Vinayak Dangui, Soshant Bali, Abishek Gopalan, Hao Zhong, Petr Lapukhov, Yiting Xia, and Ying Zhang. Capacity-efficient and uncertainty-resilient backbone network planning with hose. In Fernando A. Kuipers and Matthew C. Caesar, editors, ACM SIGCOMM 2021 Conference, Virtual Event, USA, August 23-27, 2021, pages 547–559. ACM, 2021. doi:10.1145/3452296.3472918.
- [2] What is cloud networking? accessed on 27/01/2025. URL: https://aws.amazon.com/what-is/cloud-networking/.
- [3] Mohammad Banikazemi, Vijay Moorthy, and Dhabaleswar K. Panda. Efficient collective communication on heterogeneous networks of workstations. In 1998 International Conference on Parallel Processing (ICPP ’98), 10-14 August 1998, Minneapolis, Minnesota, USA, Proceedings, pages 460–467. IEEE Computer Society, 1998. doi:10.1109/ICPP.1998.708518.
- [4] Amotz Bar-Noy and Ching-Tien Ho. Broadcasting multiple messages in the multiport model. IEEE Trans. Parallel Distributed Syst., 10(5):500–508, 1999. doi:10.1109/71.770196.
- [5] Amotz Bar-Noy, Shlomo Kipnis, and Baruch Schieber. Optimal multiple message broadcasting in telephone-like communication systems. Discret. Appl. Math., 100(1-2):1–15, 2000. doi:10.1016/S0166-218X(99)00155-9.
- [6] Nick G. Duffield, Pawan Goyal, Albert G. Greenberg, Partho Pratim Mishra, K. K. Ramakrishnan, and Jacobus E. van der Merwe. A flexible model for resource management in virtual private networks. In Lyman Chapin, James P. G. Sterbenz, Guru M. Parulkar, and Jonathan S. Turner, editors, Proceedings of the ACM SIGCOMM 1999 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 30 - September 3, 1999, Cambridge, Massachusetts, USA, pages 95–108. ACM, 1999. doi:10.1145/316188.316209.
- [7] Michael Elkin and Guy Kortsarz. Sublogarithmic approximation for telephone multicast. Journal of Computer and System Sciences, 72(4):648–659, 2006. doi:10.1016/j.jcss.2005.12.002.
- [8] Kai-Simon Goetzmann, Tobias Harks, Max Klimm, and Konstantin Miller. Optimal file distribution in peer-to-peer networks. In Takao Asano, Shin-Ichi Nakano, Yoshio Okamoto, and Osamu Watanabe, editors, Algorithms and Computation - 22nd International Symposium, ISAAC 2011, Yokohama, Japan, December 5-8, 2011. Proceedings, volume 7074 of Lecture Notes in Computer Science, pages 210–219. Springer, 2011. doi:10.1007/978-3-642-25591-5_23.
- [9] Sandra Mitchell Hedetniemi, Stephen T. Hedetniemi, and Arthur L. Liestman. A survey of gossiping and broadcasting in communication networks. Networks, 18(4):319–349, 1988. doi:10.1002/net.3230180406.
- [10] J. Hromkovic, R. Klasing, B. Monien, and R. Peine. Dissemination of information in interconnection networks (broadcasting and gossiping). In F. Hsu and D.-Z. Du, editors, Combinatorial Network Theory, pages 125–212. Kluwer Academic Publishers, 1995.
- [11] Samir Khuller and Yoo Ah Kim. Broadcasting in heterogeneous networks. Algorithmica, 48(1):1–21, 2007. doi:10.1007/s00453-006-1227-9.
- [12] Pangfeng Liu. Broadcast scheduling optimization for heterogeneous cluster systems. Journal of Algorithms, 42(1):135–152, 2002. doi:10.1006/jagm.2001.1204.
- [13] Jochen Mundinger, Richard R. Weber, and Gideon Weiss. Optimal scheduling of peer-to-peer file dissemination. J. Sched., 11(2):105–120, 2008. doi:10.1007/s10951-007-0017-9.
- [14] R. Ravi. Rapid rumor ramification: Approximating the minimum broadcast time (extended abstract). In 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico, USA, 20-22 November 1994, pages 202–213. IEEE Computer Society, 1994. doi:10.1109/SFCS.1994.365693.
Appendix A Special cases of non-migratory schedules
The high approximation guarantee of Theorem 11 motivates us to study two special cases of the non-migratory model – when all peers have the same download capacity and when all peers have the same upload capacity. For uniform download capacities, we adapt a proof by Liu [12] to show that the Greedy algorithm produces the optimum makespan for instances where all upload capacities are a power of ; the full details for this special case and proofs of Lemma marked with a are deferred to the full paper.
A.1 Uniform upload capacity
We now assume that for all and recall . If then peer can transfer the chunk to peer at a maximum rate . Since a peer can receive the chunk from only one peer, there is no loss of generality in limiting download capacities to . Hence for all , . Next, we scale all download and upload capacities by to get an instance, , with upload capacity 1 and .
We now consider a procedure for scheduling downloads of peers that would involve modifying the download capacity of some peers besides adding some dummy peers. We begin by rounding down to the nearest power of 1/2 and renumber peers so that . For the rest of this section, we will assume is a power of 1/2. We will schedule the downloads by peers in blocks in the order to and let be the earliest peer whose download has not yet been scheduled. Let be the earliest time by which peers have completed their download and note that peers (including the root) have received the chunk by time and the upload capacity at time , , is . Initially at , .
Let and note that downloads at rate can be scheduled during ; in subsequent discussion, we refer to this time interval as a block. Peers would be the peers downloading the chunk in this block and we increase their download capacities to . If , i.e., the number of remaining peers is less than then we create dummy peers with upload capacity 1 and download capacity . At the end of this round the value of increases to while that of increases to . We continue to the next round if . Let be the value of when this procedure completes and let be the instance obtained. Note that might have more peers than the original instance and in some peers might have a download capacity different from their download capacity in . Let be the schedule obtained and note that is a valid schedule for and has makespan .
Lemma 12 ().
.
We now convert the schedule into a schedule, , which is valid for the instance and has makespan at most . Consider a block in in which peers to download the chunk during the interval . Let . Note that , the block following in , corresponds to the interval . We observe that peers who download the chunk in block have a download capacity larger than and can therefore be scheduled in block in a valid schedule for instance . This suggests constructing from by simply shifting downloads of peers in a block to the following block. A block, is appended to to accommodate the downloads of peers in the last block of . Hence has makespan .
The last piece of the argument is to prove that . To prove this we shall show that a makespan of is infeasible for instance . To show this we present a technique to argue infeasibility of a makespan value for a given instance of the problem.
Let be a monotonically non-increasing assignment of weights to time slots. If is the upload capacity in time slot in a given schedule then we refer to the quantity as the total weighted upload capacity of and denote it by .
Let denote the sum . If peer finishes its download at time in then it contributes to . The download by peer consumes at least units of . Thus the net contribution of peer to if it completes its download at time in is and we denote this by . Finally, for , let and let , the net contribution of peer 0 to the , equal . Note that is independent of but depends on our choice of .
Claim 13.
If is a feasible value for the makespan then for any choice of , .
Proof.
Let be a schedule with makespan and be the fraction of the chunk that peer downloads from peer in time slot . Let be the time at which peer completes its download. For any peer , and for any time , . This together with the fact that is non-increasing implies that for all ,
| (7) |
For all ,
Multiplying the above inequality by and summing over all yields
We now sum inequality (7) for all and combine it with the above inequality to obtain
which implies that . The claim follows from the fact that . The above claim implies that if we can find an assignment of weights such that then we can argue that .
Lemma 14 ().
Theorem 15.
There exists a polynomial time algorithm for the broadcast problem in the hose model with uniform upload capacities that finds a solution with makespan at most .
