Scheduling Problems over Network of Machines

We consider scheduling problems in which jobs need to be processed through a (shared) network of machines. The network is given in the form of a graph whose edges represent the machines. We are also given a set of jobs, each specified by its processing time and a path in the graph. Every job needs to be processed in the order of edges specified by its path. We assume that jobs can wait between machines and preemption is not allowed; that is, once a job starts being processed on a machine, it must be completed without interruption. Every machine can process only one job at a time. The makespan of a schedule is the earliest time by which all the jobs have finished processing. The flow time (a.k.a. the completion time) of a job in a schedule is the difference in time between when it finishes processing on its last machine and when it begins processing on its first machine. The total flow time (or the sum of completion times) is the sum of flow times (or completion times) of all jobs. Our focus is on finding schedules with the minimum sum of completion times or minimum makespan. In this paper, we develop several algorithms (both approximate and exact) for the problem both on general graphs and when the underlying graph of machines is a tree.


Introduction
Scheduling problems have been studied extensively over the past several decades. In this paper, we consider a class of scheduling problems in which there is an underlying network of machines. Before stating our problem, let us start with the classical job shop scheduling problem. In job shop, we are given a collection J of n jobs and a set M of m machines. Each job j consists of a sequence of µ j operations O 1j , O 2j , . . . , O µ j j . Operation O ij takes p ij ∈ Z ≥0 time units on machine m ij ∈ M . A feasible schedule specifies for each job the times at which its operations are performed, such that each machine processes at most one operation at any time and, for each job, an operation is performed only after all preceding operations have been performed. We assume all jobs are available at time zero. Let C j be the completion time of job j in a schedule. Then the makespan of the schedule is C max = max j C j and the weighted sum of completion times is Σ j w j C j , where w j ≥ 0, j ∈ J, are given weights for the jobs. Two common performance measures are to find schedules with minimum makespan or minimum (weighted) sum of completion times. We refer to the latter as the min-sum or weighted min-sum objective. When the p ij 's are all equal to p j (i.e. independent of the machine) we have the identical machine setting. Otherwise, we have the unrelated machine setting.
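To make the definitions concrete, here is a minimal sketch that checks feasibility of a given job shop schedule and computes its makespan and sum of completion times. The encoding of a job as a list of (machine, processing time) operations is ours, not from the paper.

```python
def verify_and_measure(jobs, start_times):
    """Check that start_times gives a feasible job shop schedule and return
    (makespan, sum of completion times).

    jobs[j]        : list of (machine, p) operations of job j, in order
    start_times[j] : list of start times, one per operation of job j
    """
    intervals = {}  # machine -> list of (start, end) busy intervals
    completion = []
    for j, ops in enumerate(jobs):
        prev_end = 0
        for i, (machine, p) in enumerate(ops):
            s = start_times[j][i]
            # operations of a job must run in their given order
            assert s >= prev_end, f"job {j}: operation {i} starts too early"
            intervals.setdefault(machine, []).append((s, s + p))
            prev_end = s + p
        completion.append(prev_end)
    # each machine processes at most one operation at a time
    for machine, ivs in intervals.items():
        ivs.sort()
        for (s1, e1), (s2, e2) in zip(ivs, ivs[1:]):
            assert s2 >= e1, f"machine {machine}: overlapping operations"
    return max(completion), sum(completion)

# Two jobs sharing machine "a": the second must wait for the first to finish.
jobs = [[("a", 2), ("b", 1)], [("a", 1)]]
starts = [[0, 2], [2]]
print(verify_and_measure(jobs, starts))  # (3, 6)
```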
There are many special cases of job shop scheduling studied in the literature. One specialization that still generalizes several other problems and has drawn attention more recently is when there is an underlying network of machines. In this setting, we assume we are given a graph G = (V, E) where each edge e corresponds to a machine. Each job j ∈ J has a specific path Q j starting at s j ∈ V and ending at t j ∈ V . The path specifies the set of machines the job has to go through in a specific order (i.e. the sequence of its operations). If the graph G is a simple path P = v 1 , v 2 , . . . , v m+1 (where v i v i+1 corresponds to machine m i ) and s j = v 1 and t j = v m+1 for all jobs j ∈ J, then we get the classical flow shop problem. Another interesting special case is when we have a general graph G but all p ij 's are 1; this becomes the classical packet routing problem in a network (see [14,15]). There are also works where the underlying graph G is a tree or other special graphs (see [1,13,19,20]).

Previous work
The amount of previous work on these problems is simply too large to be reviewed comprehensively here. We mention only some of the work and refer the reader to the references therein. Trivial lower bounds used in much of the previous work on makespan are the congestion and dilation lower bounds. If C is the largest congestion of any machine (the maximum over all machines i of the total running time of jobs that have an operation on i) and D is the largest dilation (the longest time it would take a job to perform all its operations regardless of the presence of other jobs), then lb = max{C, D} is clearly a lower bound on the makespan. For general job shop, Shmoys et al. [25] presented an algorithm with performance ratio O((log lb) 2 / log log lb). When jobs can be preempted (i.e. their processing can be paused in the middle of any operation to be resumed later), one can get better results (see [2]).
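The lower bound lb = max{C, D} is easy to compute directly from an instance. A small sketch, with jobs encoded as lists of (machine, processing time) operations (our encoding, not from the paper):

```python
def congestion_dilation_lb(jobs):
    """Return lb = max{C, D} for a job shop instance.

    C: max over machines of the total processing time routed through it.
    D: max over jobs of the job's own total running time."""
    load = {}
    dilation = 0
    for ops in jobs:
        total = 0
        for machine, p in ops:
            load[machine] = load.get(machine, 0) + p
            total += p
        dilation = max(dilation, total)   # D: a job's own running time
    congestion = max(load.values())       # C: heaviest machine
    return max(congestion, dilation)

jobs = [[("a", 2), ("b", 1)], [("a", 1), ("c", 5)]]
print(congestion_dilation_lb(jobs))  # max{C, D} = max{5, 6} = 6
```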
Acyclic job shop is a special case of job shop where no job has two operations on the same machine. For this setting, Scheideler and Feige [6] present an algorithm to schedule with makespan O(lb log lb log log lb). To complement this, for acyclic job shop with identical machines they provide a family of instances with optimum makespan Ω(lb log lb / log log lb).
The approximation in [6] is also the best known result for the case of flow shop (which is a special case of acyclic job shop). For the slightly more general setting of flow shop where each job still has to go through the machines in the order they appear but may not need to be run on all of them (i.e. it only needs to be run on a subsequence of the machines), Mastrolilli and Svensson [18] prove a hardness of approximation of ratio Ω((log lb) 1−ε ) for any ε > 0. For the flow shop problem with identical machines (also referred to as proportionate flow shop), Shakhlevich et al. [23] present a polynomial time algorithm for the weighted min-sum objective.
As mentioned earlier, for the special case of p ij = 1 for all i, j, the problem reduces to the packet routing problem, where each job is simply a packet that takes one unit of time to travel each edge (be it a machine or a router). For this, the celebrated result of Leighton et al. [14,15] and subsequent works show that there is a schedule of length O(lb). The most recent result, by Harris and Srinivasan [10], shows that there exists a schedule of length 7.26 • (C + D) (non-constructive) and an algorithm that finds a schedule of length 8.84 • (C + D). More recently, Peis et al. [19] have shown that for the case of packet routing on a tree, one can get a schedule of length at most C + D − 1; this implies a simple 2-approximation. For the special case of packet routing when G is simply a path and all packets go from left to right, [1,12] show that the schedule in which, at each time step, each machine (edge) processes the job that has the shortest distance left to go finds the optimum solution for the min-sum objective. Similar algorithms (namely, furthest-to-go first) find the optimum solution for the makespan objective [12].
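The shortest-distance-to-go rule on a path can be simulated directly. The following sketch (our encoding: packets are (source, destination) pairs over edges 0, ..., m−1) illustrates the priority rule itself, not the optimality proof of [1,12]:

```python
def shortest_to_go_on_path(packets, m):
    """Simulate unit packets moving left to right over edges 0..m-1; at each
    time step every edge forwards the waiting packet with the least distance
    left to travel. Returns the list of completion times."""
    assert all(0 <= s < t <= m for s, t in packets)
    pos = [s for s, t in packets]      # current node of each packet
    done = [None] * len(packets)
    time = 0
    while any(d is None for d in done):
        time += 1
        chosen = {}  # edge -> index of the packet it forwards this step
        for j, (s, t) in enumerate(packets):
            if done[j] is None:
                e = pos[j]  # next edge the packet must cross
                if e not in chosen or t - pos[j] < packets[chosen[e]][1] - pos[chosen[e]]:
                    chosen[e] = j
        for e, j in chosen.items():
            pos[j] += 1
            if pos[j] == packets[j][1]:
                done[j] = time
    return done

# Both packets start at node 0; the one closer to its destination goes first.
print(shortest_to_go_on_path([(0, 3), (0, 2)], 3))  # [4, 2]
```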
For packet routing on in-trees or out-trees (directed trees in which the in-degree of each node is at most one, or the out-degree is at most one, respectively), results of [16] show that the furthest-to-go strategy gives the optimum solution for makespan. Based on this, [19] observe that it is easy to get a 2-approximation for makespan on undirected trees (by converting the tree into a rooted tree and splitting each schedule into two stages: in the first stage all the packets first go up, and in the second stage all the packets go down to their destinations). Similar results are claimed by Kowalski et al. [13] for the makespan and min-sum objectives on trees. In [17,22], the authors give a general framework for a broad class of scheduling problems (using LP rounding) that shows that any approximation algorithm with ratio ρ w.r.t. the trivial lower bound lb for makespan can be used to obtain a 2eρ-approximation for the min-sum objective. As a special case, this applies to the scheduling problems on networks of identical machines. We will use this result in some of our results. It is worth pointing out that some of the ideas in [17,22], which are also used in subsequent works, have similarities to the ideas used in approximating minimum latency in vehicle routing problems (like the classical minimum latency problem), which use an approximation for minimum k-stroll or minimum k-spanning tree (k-MST) as a subroutine (see [4] and earlier works).
More recent works have looked at other variants of scheduling on a network. Im and Moseley [11] look at the online scheduling problem where the network is a tree. In their model, the edges are considered routers and each leaf node corresponds to a machine. Each job must start from the root and then pass through the routers to arrive at a machine to be scheduled on. Each router and machine can process one job at a time. Machines may be unrelated, but routers are identical. They present constant-factor competitive algorithms using constant speed-up for makespan. Bhattacharya et al. [3] look at coordination mechanisms for routing problems on a tree.

Our results
All of our results are for the identical machines setting (so each job j ∈ J has a processing time p j , independent of the machine).

Our first result is really a collection of smaller observations on our part; our more interesting results are mentioned later. However, it points out an improvement for the acyclic job shop problem with identical machines, so we think it bears mentioning.

Theorem 1. For trees, for both the makespan and min-sum objectives, there are polynomial time O(min{log n, log m, log p max })-approximation algorithms, where p max is the maximum processing time among all jobs. If all jobs have unit processing time, then there is a polynomial time 4e-approximation for the min-sum objective.
For acyclic job shop with identical machines, under both the makespan and the min-sum objective there is an O(min{log(nℓ), log p max })-approximation, where ℓ is the maximum number of machines in a job's sequence.
Note p max ≤ lb, so this improves over the approximation for acyclic job shop in [6] by an O(log log lb) factor, but only for the identical machines case. Recall that [6] shows the existence of a family of instances of acyclic job shop with identical machines having optimum makespan Ω(lb log lb / log log lb), so the upper bound is tight within an O(log log lb) factor.
We should point out that earlier works [1,12] imply a 2-approximation for minimizing the makespan for identical jobs on trees. We also consider a special case of trees, called junction-trees: in this setting, the network is a rooted tree T and, for each job j ∈ J, the path Q j contains the root. A special junction-tree is when T is simply a star with all the jobs starting and ending at the leaves of T .
Theorem 2. For scheduling on junction-trees, there is a 4-approximation for makespan and an 8e-approximation for the min-sum objective. Furthermore, if all processing times are 1, there is a different 3-approximation algorithm for the min-sum objective.
Perhaps the strongest and most technical result of our paper is for the simplest setting of star networks. We prove the following.

Theorem 3. For the min-sum objective on stars where all the jobs start and end on leaves, there is a 7.279-approximation algorithm. For the special case of unit processing times, there is a 1.796-approximation algorithm.

This setting is more interesting than one might initially think; it is closely related to biprocessor scheduling problems studied in, say, [9]. This connection is examined more closely at the start of Section 2.
Another special case of junction trees is when each job starts at the root and may take any root-to-leaf path in order to be completed. So there is no specified path of machines that job j must run on; instead, we have to decide the path as well as how to schedule the jobs. This is the same setting as in [11], for which the authors present online algorithms. It turns out that for this special case, computing a schedule minimizing the min-sum objective can, in fact, be done in polynomial time. We call this problem rooted-tree routing scheduling.

Theorem 4. For rooted-tree routing scheduling, there is a polynomial time algorithm to compute a schedule minimizing the min-sum objective.

Outline of the paper:
We start by studying the simplest setting (star networks) and prove Theorem 3 in Section 2. The approximation algorithms for trees and junction trees as well as the observation for acyclic job shop with identical machines (Theorems 1, 2, and 4) are presented in Section 3.

Approximation Algorithms for Stars
In this section, we look at the min-sum objective for scheduling on a star where jobs start/end at leaves. One problem related to the scheduling problem defined on a star network is biprocessor scheduling, or data migration, which can be modelled as edge sum-coloring or edge sum multi-coloring [7,8,9]. In the data migration problem, one has to move data stored among devices in a network from one configuration to another. The network is modeled as a graph G = (V, E) where each vertex v ∈ V represents a data storage device and an edge e = v i v j represents the need to transfer data between v i and v j . This transfer may take p e time units and will keep both v i and v j busy for that many steps. A transfer cannot be preempted (hence, once started, it must run until completed) and no node v i can be transferring data to/from more than one other device at the same time. So, only data transfers over edges that form a matching can happen concurrently. The goal is to find a schedule for these transfers that minimizes the makespan (the time the last transfer completes) or the min-sum objective (the sum of the times at which the transfers complete). This is essentially biprocessor scheduling where the nodes are the processors, the tasks are represented by edges, and each task requires two specific resources (its two end-points) in order to run. When all p e 's are one, minimizing the min-sum objective is equivalent to the min-sum edge coloring of G [9], and it has been studied extensively. In min-sum edge coloring, one has to find a proper edge coloring φ : E → Z + that minimizes Σ e φ(e). One can think of φ(e) as the time step in which edge e is scheduled to run on the two processors of its end-points. In min-sum edge multi-coloring, each edge e has a requirement p e and one has to assign p e distinct integers (as colors) to e such that for any two adjacent edges the sets of colors assigned to them are disjoint. If one further requires each set of colors to form a consecutive sequence of integers, then those p e integers can be considered to be the time steps in which task e = v i v j is supposed to run on the two processors v i , v j . The best approximation algorithm for min-sum edge coloring is due to Halldorsson et al. [9], who present a configuration LP rounding with ratio 1.8298 and a combinatorial 1.8886-approximation. For biprocessor scheduling with arbitrary processing times p e , Gandhi et al. [7] give a 7.682-approximation.
The problem we are considering, when restricted to star networks, is another form of biprocessor scheduling in which each task must be performed on two specific processors and in a specific order. More formally, suppose that the star T = (V, E) with root/center node r is the network and each job j ∈ J starts and ends at leaf nodes s j and t j , respectively. We first create a directed demand graph H = (V H , E H ) whose vertices correspond to machines (i.e. edges of T ) and whose arcs correspond to jobs in J, where each arc (s j , t j ) ∈ E H reflects the fact that job j needs to be processed on machine {s j , r} and then on machine {r, t j }. So |V H | = m and |E H | = n. We will use e j ∈ E H to refer to a job j ∈ J.
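As a small illustration, H can be built directly from a list of star jobs. Our encoding: a job is a (s_j, t_j, p_j) triple, and each machine is named by its leaf endpoint; machines carrying no job are omitted here, so only the used part of V_H is built.

```python
def demand_graph(jobs):
    """Build the demand graph H of a star instance.

    jobs: list of (s, t, p) with s != t leaves of the star.
    Returns (vertices, arcs): vertices are the machines that appear in some
    job, and arcs[j] = (s_j, t_j, p_j) is the arc of job j."""
    vertices = sorted({v for s, t, _ in jobs for v in (s, t)})
    arcs = [(s, t, p) for s, t, p in jobs]
    return vertices, arcs

# Two jobs on a star with leaves u, v, w: u -> v (length 2) and v -> w (length 1).
V, A = demand_graph([("u", "v", 2), ("v", "w", 1)])
print(V, A)  # ['u', 'v', 'w'] [('u', 'v', 2), ('v', 'w', 1)]
```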
In this section, we prove Theorem 3. We first present the algorithm for the general case, which achieves an approximation ratio of 7.279. We then present a modified algorithm that has ratio 1.796 when all p j 's are 1.

Approximating stars with general processing times
Our algorithm for both general and unit processing times has the following general framework, which is somewhat similar to the general framework for minimizing latency (see [4] and earlier works) used to convert a makespan objective into a min-sum objective. Our algorithm works in stages, where in each stage we try to find the maximum number of jobs that can be scheduled subject to a makespan bound B, which increases geometrically from one iteration to the next.
APPROX/RANDOM'17

Data: Auxiliary graph H, a constant c ∈ R >0 to be fixed later
Result: A scheduling of the jobs
1 Pick α uniformly at random from [0, 1)
2 R ← J (the set of remaining jobs)
3 i ← 1
4 while R ≠ ∅ do
5     t i ← c i+α
6     Find a (1.5, t i )-proper set J i ⊆ R using Lemma 6
7     Schedule J i using Proposition 7, starting at the previous iteration's completion time
8     R ← R \ J i
9     i ← i + 1
10 end
Algorithm 1: Approximation for the min-sum scheduling on stars with identical machines.
We show how even a bicriteria approximation for this makespan version of the problem can give a good approximation for the min-sum objective. Most of the work is in finding a good schedule subject to the makespan bound. Given a schedule, for a subset of jobs Ĵ ⊆ J, we define the makespan of Ĵ as the difference in time between when the last job of Ĵ finishes processing on its last machine and when the first job of Ĵ begins processing on its first machine. We also define the load of a machine i to be the total processing time of jobs in Ĵ incident to i in H. Note that the notions of makespan (in our original graph T ) and load (in our demand graph H) are closely related. We define (ρ, t)-proper sets of jobs, which will be used in our algorithm.
Definition 5 ((ρ, t)-proper set). For ρ ≥ 1 and t > 0, we call a subset of jobs Ĵ ⊆ J a (ρ, t)-proper set if the following two conditions hold:
(i) | Ĵ | is at least the size of the maximum subset of J that can be scheduled with a makespan of at most t.
(ii) For each machine i, the total load (congestion) of jobs in Ĵ that have i as their first machine (called the in-load of i) is at most ρ • t, and the load of jobs in Ĵ that have i as their second machine (called the out-load of i) is also at most ρ • t.
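The load condition of Definition 5 (in-load and out-load at most ρ • t at every machine) is easy to test mechanically. A sketch, with jobs encoded as (first machine, second machine, processing time) triples as in our demand-graph view:

```python
def loads(job_set):
    """Per-machine in-load and out-load of a set of star jobs."""
    in_load, out_load = {}, {}
    for first, second, p in job_set:
        in_load[first] = in_load.get(first, 0) + p      # jobs starting on `first`
        out_load[second] = out_load.get(second, 0) + p  # jobs finishing on `second`
    return in_load, out_load

def satisfies_load_condition(job_set, rho, t):
    """True iff every machine's in-load and out-load is at most rho * t."""
    in_load, out_load = loads(job_set)
    bound = rho * t
    ok_in = all(v <= bound for v in in_load.values())
    ok_out = all(v <= bound for v in out_load.values())
    return ok_in and ok_out

jobs = [("a", "b", 3), ("a", "c", 2), ("c", "b", 4)]
print(satisfies_load_condition(jobs, 1.5, 4))  # out-load(b) = 7 > 6, so False
```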
Later on, in Proposition 7, we show how to build a schedule of the jobs in a (ρ, t)-proper subset Ĵ with small makespan and small average completion time. Assuming we have an algorithm that can find a (ρ, t)-proper set of jobs for any given t, combined with Proposition 7 we show how to build an algorithm for the star scheduling problem with the min-sum objective. At each iteration i, we fix a value t i and do the following: we first find a proper set of the remaining jobs with respect to t i and then find a "good" scheduling of these jobs. Algorithm 1 describes the procedure formally. Before we proceed with the analysis of Algorithm 1, we show how to perform Step 6, i.e. find a proper set of jobs among the remaining jobs, and also give some details about Step 7.

Lemma 6. There is a polynomial time algorithm that finds a (1.5, t)-proper set for any t.
Proof. Let OPT t be the maximum number of jobs from J that can be scheduled with makespan at most t. First, observe that jobs/edges e in H with p e > t/2 do not appear in any feasible scheduling with a makespan of t, as each such job needs to run sequentially on two machines. Remove such jobs from consideration. Let p max = max j p j ; thus p max ≤ t/2. We will find a set of jobs J̃ such that the in-load and the out-load of each machine are at most t + p max ≤ 1.5 • t and | J̃ | ≥ OPT t .
To find this set, we first consider the problem of picking the maximum number of jobs such that for each machine i the in-load and out-load are at most t. Note that the size of this set is at least OPT t . To find such a set, we round an LP relaxation.
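For intuition, on tiny instances this selection problem can be solved by brute force over all subsets (exponential time; the paper instead rounds an LP relaxation). Jobs are (first machine, second machine, processing time) triples as before.

```python
from itertools import combinations

def max_jobs_within_load(jobs, t):
    """Largest set of jobs whose in-load and out-load stay within t at every
    machine. Exponential-time reference implementation for tiny instances."""
    for k in range(len(jobs), -1, -1):
        for subset in combinations(jobs, k):
            in_load, out_load = {}, {}
            ok = True
            for first, second, p in subset:
                in_load[first] = in_load.get(first, 0) + p
                out_load[second] = out_load.get(second, 0) + p
                if in_load[first] > t or out_load[second] > t:
                    ok = False
                    break
            if ok:
                return list(subset)
    return []

jobs = [("a", "b", 3), ("a", "c", 3), ("c", "b", 2)]
print(max_jobs_within_load(jobs, 4))  # [('a', 'c', 3), ('c', 'b', 2)]
```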
Construct an undirected bipartite graph H̃ = ( Ṽ 1 ∪ Ṽ 2 , Ẽ) from H: corresponding to every vertex v ∈ V H (i.e. for each machine), we create two copies ṽ 1 and ṽ 2 in Ṽ 1 and Ṽ 2 , respectively; for every (directed) edge e = (u, v) ∈ R i (which corresponds to a job) with p e ≤ t/2, we put an undirected edge ẽ = (ũ 1 , ṽ 2 ) into Ẽ and let p ẽ denote the corresponding value p e . We work with the following LP relaxation:

maximize Σ ẽ∈Ẽ x ẽ subject to Σ ẽ incident to ṽ p ẽ x ẽ ≤ t for all ṽ ∈ Ṽ 1 ∪ Ṽ 2 , and 0 ≤ x ẽ ≤ 1 for all ẽ ∈ Ẽ.

This LP is exactly the LP relaxation for the so-called demand matching problem, whose study was initiated in [24]. From [24] (which uses an iterated relaxation technique) and the fact that the graph H̃ is bipartite, we can find an integral vector x ∈ {0, 1} Ẽ whose value Σ ẽ x ẽ is at least the LP optimum and in which every vertex ṽ carries load Σ ẽ incident to ṽ p ẽ x ẽ ≤ t + p max . The edges in E H corresponding to ẽ ∈ Ẽ with x ẽ = 1 form a (1.5, t)-proper set.
We should point out that the (1.5, t)-proper set obtained in the proof of Lemma 6 has the property that the in-load and out-load of each node are at most t + p max . We now describe a method that, given such a (ρ, t)-proper set Ĵ (for any ρ ≥ 1), returns a schedule of these jobs with a makespan of at most 2ρ • t and, furthermore, a small average completion time per job.

Proposition 7. Suppose that Ĵ is a (1.5, t)-proper set as obtained by Lemma 6. There is a scheduling of the jobs in Ĵ with a makespan of at most 2t + 2p max ≤ 3t. Furthermore, the average completion time of a job in that schedule is at most γ = 2t + p max ≤ 2.5t.
The algorithm for this proposition is a simple 2-stage one: in the first stage, each machine i processes (in some arbitrary order) those jobs in Ĵ that have i as their first leg, i.e. that are going towards the center of the star with this machine as their first leg. Once all the jobs in Ĵ have arrived at the center of the star (i.e. have completed their first leg), each machine i starts processing the jobs that have i as their second machine, from smallest to largest processing time. It is straightforward to observe that each stage takes at most t + p max ≤ 1.5t units of time to complete; so the total makespan of all jobs is at most 2t + 2p max ≤ 3t.
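The two-stage procedure is simple enough to state in code. A sketch under our usual encoding of star jobs as (first machine, second machine, processing time) triples; stage-1 order is arbitrary and stage 2 runs shortest-first as described:

```python
def two_stage_schedule(jobs):
    """Return (makespan, completion_times) of the two-stage star schedule."""
    # Stage 1: each machine processes its outgoing (first-leg) jobs back to back.
    t_machine = {}   # machine -> time it becomes free in stage 1
    first_done = {}  # job -> time its first leg finishes
    for j, (first, second, p) in enumerate(jobs):
        start = t_machine.get(first, 0)
        first_done[j] = start + p
        t_machine[first] = start + p
    stage2_start = max(first_done.values())  # all first legs are finished
    # Stage 2: each machine runs its incoming jobs from smallest p to largest.
    by_machine = {}
    for j, (first, second, p) in enumerate(jobs):
        by_machine.setdefault(second, []).append((p, j))
    completion = {}
    for second, lst in by_machine.items():
        t = stage2_start
        for p, j in sorted(lst):
            t += p
            completion[j] = t
    return max(completion.values()), completion

jobs = [("a", "b", 2), ("c", "b", 1), ("a", "c", 1)]
mk, comp = two_stage_schedule(jobs)
print(mk)  # 6: stage 1 ends at time 3, machine b then runs loads 1 and 2
```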
The proof that the average completion time of each job is at most 2t + p max is a bit more involved, and we defer the detailed proof to the full version of the paper. Using this proposition in Step 7, we can turn the (1.5, t i )-proper set found in Step 6 into a schedule for that set with makespan at most 3c i+α , and the average completion time of each job in that set will be at most 2.5c i+α .

Theorem 8. Algorithm 1 is a 7.279-approximation algorithm for the min-sum objective on stars when jobs have general processing times.

Proof. Following the notation of [4], let u j be the completion time of the j'th job in our schedule and let c opt j be the completion time of the j'th job in a schedule with the optimum min-sum objective (note that these jobs might not be the same). We would like to bound u j w.r.t. c opt j . Assume that c opt j = dc k for some d < c and some k ≥ 1. Based on the value of d with respect to the random variable α in Algorithm 1, two cases arise: i) d < c α or ii) d ≥ c α . For the first case, note that since in the optimum there is a schedule of j jobs with makespan at most c opt j = dc k < c k+α , the iteration in which the j'th job is scheduled in our algorithm is at most k. Also, note that the completion time of any job in each iteration i of the previous k − 1 iterations is at most ρc i+α , where ρ = 3, and the average completion time of each job in iteration k (using Proposition 7) is at most γc k+α , where γ = 2.5. Thus:

u j ≤ ρ(c 1+α + · · · + c k−1+α ) + γc k+α ≤ (ρ/(c − 1) + γ)c k+α .

Similarly, for when d ≥ c α , c opt j = dc k < c k+1+α . Thus, the j'th job is scheduled no later than iteration k + 1. Therefore:

u j ≤ ρ(c 1+α + · · · + c k+α ) + γc k+1+α ≤ (ρ/(c − 1) + γ)c k+1+α .

In the first case, α ∈ [log c d, 1) and in the second case, α ∈ [0, log c d). By taking the expectation over α over the two cases, one gets

E[u j ] ≤ (ρ/(c − 1) + γ) • ((c − 1)/ ln c) • c opt j .

Setting ρ = 3 and γ = 2.5, and c = 2.912 leads to the approximation ratio of 7.279.
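The final constants can be sanity-checked numerically. The closed form used below, (ρ/(c−1) + γ) • ((c−1)/ln c), is our reconstruction of the omitted algebra (the standard geometric-stages expectation), not a formula stated explicitly in the text:

```python
import math

def ratio(rho, gamma, c):
    """Approximation ratio of the geometric-stages analysis, under our
    reconstructed bound E[u_j] <= (rho/(c-1) + gamma) * (c-1)/ln(c) * c_opt_j."""
    return (rho / (c - 1) + gamma) * (c - 1) / math.log(c)

# Plugging in rho = 3, gamma = 2.5, c = 2.912 recovers the stated constant.
print(ratio(3, 2.5, 2.912))  # ≈ 7.279
```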

Refinements for the case of unit processing times
In this section, we modify our general framework to obtain better approximation factors for the case of unit processing times. The main new ingredient is a different algorithm for finding (ρ, t)-proper sets, used instead of Lemma 6. Recall that our general framework works in two steps: first, partition the jobs into disjoint blocks, and second, schedule each block separately. For unit processing times, we follow the same general framework, but we use a standard b-matching algorithm for the partitioning and a more careful scheduling algorithm for the jobs of each block. Algorithm 2 describes each stage more formally.
In our algorithm, the procedure b-Matching(b) finds a maximum size b-matching (a subgraph with maximum degree b) in the undirected subgraph obtained from the set of edges in R i in polynomial time (e.g. [5]).
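On tiny graphs, b-Matching can be emulated by brute force over edge subsets (exponential time; the actual algorithm referenced as [5] runs in polynomial time):

```python
from itertools import combinations

def max_b_matching(edges, b):
    """Largest edge set in which every vertex has degree at most b.
    Exponential-time reference implementation for tiny graphs only."""
    for k in range(len(edges), -1, -1):
        for subset in combinations(edges, k):
            deg = {}
            for u, v in subset:
                deg[u] = deg.get(u, 0) + 1
                deg[v] = deg.get(v, 0) + 1
            if all(d <= b for d in deg.values()):
                return list(subset)
    return []

triangle = [("x", "y"), ("y", "z"), ("z", "x")]
print(len(max_b_matching(triangle, 1)))  # 1: a matching takes one triangle edge
print(len(max_b_matching(triangle, 2)))  # 3: the whole triangle is a 2-matching
```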

Lemma 9. For even b ≥ 0, any b-matching can be partitioned into b/2 2-matchings.

Data: Auxiliary graph H, a constant c ∈ R >0 to be fixed later
Result: A scheduling of the jobs
Pick α uniformly at random from [0, 1)
R ← E H (the set of remaining jobs); i ← 1
while R ≠ ∅ do
    t i ← c i+α rounded up to an even integer
    J i ← b-Matching(t i ) on the edges in R
    Partition J i into t i /2 2-matchings (slots) J 1 i , . . . , J t i /2 i using Lemma 9
    Schedule jobs in J i according to Lemma 10
    R ← R \ J i ; i ← i + 1
end
Algorithm 2: Approximation for the min-sum objective on stars with identical jobs.

Lemma 9 is known for b-regular graphs [21]. It is straightforward to prove the same for graphs with maximum degree b as well; the details appear in the full version.
Next, we schedule the jobs in each block. We note that using Vizing's algorithm for edge coloring, we could schedule the jobs in J i using t i + 1 new time steps (details omitted here); however, in order to obtain a better approximation ratio, we do the following. Let J = {J 1 , J 2 , . . . , J ℓ } be the partitioning constructed by the algorithm, where J i is a maximum t i -matching. Recall that each J i is further partitioned into slots J 1 i , J 2 i , . . . , J t i /2 i . Our goal is to find a scheduling of the jobs in J i (for each i ≥ 1) with small makespan and, at the same time, small average completion time. We show how to find a schedule with makespan t i for each J i , i ≥ 2 (relative to the end of the last group J i−1 ), and with makespan t 1 + 1 for J 1 ; furthermore, for each J i the average completion time of the jobs in J i (relative to its start) will be at most (t i + 1)/2. In the following lemma, we slightly abuse the definition of the makespan within each slot to refer to the number of new time units (in comparison to the previous slot) that are used to schedule its edges.

Lemma 10. Given the partitioning J , there exists a scheduling in which every slot J t i has a makespan of 2, except for the very first slot J 1 1 , which has a makespan of 3. The makespan of each job in J k will be at most 1 + (t 1 + t 2 + · · · + t k ). Furthermore, the average completion time of jobs in J k will be at most 1 + (t 1 + · · · + t k−1 ) + (t k + 1)/2.

We only sketch the proof here and defer the details to a full version of the paper.
Proof Sketch. Given that each slot J t k accommodates a 2-matching, we first develop a schedule for the first slot J 1 1 with a makespan of 3. In doing so, we observe that any 2-matching accommodated in a slot can be modified to a cycle (or path) whose vertices alternate between having an in-degree of 2 and an out-degree of 2. By scheduling the jobs of J 1 1 with a makespan of 3, we create one slack time unit, since every machine processes at most 2 jobs. We then carry this slack time unit over to the subsequent slots and schedule the jobs in each J t k (except J 1 1 ) with a makespan of 2.
The proof of the following theorem is analogous to that of Theorem 8, and we defer it to the appendix.

Theorem 11. Algorithm 2 is a 1.796-approximation algorithm for the min-sum objective on stars when all jobs have unit processing time.

Scheduling on Trees and General Networks
In this section, we first focus on situations where the topology of the machines is a tree and then on the general acyclic job shop setting. We prove Theorems 1, 2, and 4. We first recall a result from [17,22] that shows how to convert an approximation for the makespan objective that is relative to the lower bound max{C, D} into an approximation for the weighted min-sum objective, losing only an additional constant factor. Here, C is the congestion and D is the dilation of the input. The statement below paraphrases their result.
Theorem 12 ([17, 22]). Consider an instance of job shop scheduling with jobs J having weights w j ≥ 0, j ∈ J. Suppose for any J' ⊆ J we can find a schedule of J' in polynomial time having makespan γ • max{C(J'), D(J')}, where C(J') is the maximum congestion of an edge under jobs J' and D(J') is the dilation of J'. Then in polynomial time we can find a schedule for all of J where the weighted completion time is at most 2eγ times the minimum possible weighted completion time.
When we invoke this, we will simply have proved that for the given instance we can schedule all jobs with makespan bounded by some factor times max{C, D}. But it should be clear that we would get the analogous bound if we restricted attention to any subset of jobs, because that restricted instance falls in the same family of instances we are considering (e.g. on a tree, or acyclic job shop with identical machines).

Proof of Theorem 1
First, note that if all p j 's are 1, then we simply have the packet routing problem in a tree. Peis et al. [19] presented a simple algorithm in this setting that has makespan at most C + D − 1 (where C and D are the congestion and dilation). This, together with the result of [17,22], yields a 4e-approximation for the min-sum objective in the unit processing time case. Now, suppose that we have general processing times. We first present an algorithm with ratio O(min{log m, log n}) with respect to the two lower bounds C, D for the makespan. Combined with Theorem 12, this yields the same approximation ratio for the min-sum objective. Finally, we focus on acyclic job shop and present an O(min{log(nℓ), log p max })-approximation. This will also provide the O(log p max ) part of the guarantee stated in Theorem 1 for trees.
So, we now focus on trees. Let T be the underlying network. Our plan is to present an O(log m)-approximation and also an O(log n)-approximation for makespan; we simply return the better of the two. For each, we decompose the problem into a logarithmic number of independent instances, each of which is the union of vertex-disjoint junction-tree instances.
To do this, pick an arbitrary node v 1 ∈ T as the root (we specify which vertex to pick below) and then partition the jobs into two groups: G 1 , those jobs j whose path Q j contains node v 1 ; the rest are placed in J − G 1 . Note that no job in J − G 1 ever needs processing on any edge incident with v 1 ; therefore, each such job lives in a subtree of T − v 1 . We claim that we can always pick v 1 such that the number of jobs in each of the subtrees of T − v 1 is at most n/2.

Claim 13. Given a tree T with some subpaths Q 1 , . . . , Q n , where each Q i is an s i , t i -path for some s i , t i ∈ V (T ), one can always pick a vertex v ∈ T such that the number of paths that are entirely within any subtree of T − v is at most n/2.

Proof. For every edge e = uv, if more than n/2 of the paths Q i are contained entirely in one subtree of T − e, direct e toward this subtree. Otherwise, direct e arbitrarily. After directing all edges, there is a node v that has no outgoing edge. It is easy to see that v has the required properties.
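Claim 13 suggests a simple (if naive) quadratic-time search for the splitting vertex. A sketch, with the tree given as an adjacency dict and each path given by its vertex set (our encoding):

```python
def components_after_removal(adj, v):
    """Vertex sets of the connected components of T - v."""
    seen, comps = {v}, []
    for u in adj:
        if u not in seen:
            stack, comp = [u], set()
            seen.add(u)
            while stack:
                w = stack.pop()
                comp.add(w)
                for x in adj[w]:
                    if x not in seen:
                        seen.add(x)
                        stack.append(x)
            comps.append(comp)
    return comps

def splitting_vertex(adj, paths):
    """Return a vertex v such that every component of T - v fully contains
    at most len(paths)/2 of the paths (each path = set of its vertices)."""
    n = len(paths)
    for v in adj:
        comps = components_after_removal(adj, v)
        if all(sum(p <= c for p in paths) <= n / 2 for c in comps):
            return v
    return None  # Claim 13 guarantees this is never reached

# Path graph 1-2-3-4-5, with most jobs concentrated on the left half.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
paths = [{1, 2}, {1, 2}, {2, 3}, {2, 3}, {4, 5}]
print(splitting_vertex(adj, paths))  # 2
```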

Trees
Note that we can find a schedule for each of the subtrees of T − v 1 independently and run them in parallel. Therefore, we can now solve the problem on each of those subtrees independently. For each such subtree, we again pick a node as the root; all the jobs whose paths contain one of these roots form group G 2 , and the rest of the jobs belong to J − G 1 − G 2 ; we continue recursively in each subtree. Since the number of jobs left in a subtree halves each time, there are at most log n iterations, and hence we obtain σ ≤ log n groups G 1 , G 2 , . . . , G σ , each of which is the union of independent (i.e. vertex-disjoint) junction-tree instances. Using Theorem 2, we can obtain a 4-approximation for the makespan of each group. Running these log n schedules in any arbitrary order gives an O(log n)-approximation for makespan.
The algorithm for finding an O(log m)-approximation is similar. We only need to pick the root v 1 (and subsequent roots) in such a way that the number of edges (i.e. machines) in each remaining subtree is at most half the number of edges in the original one. Such a node is commonly called a centroid of the tree. We therefore obtain log m groups this way, each of which is a collection of independent junction-tree instances. Combining these, we get an O(min{log n, log m})-approximation for the makespan on trees and subsequently the same approximation ratio for the min-sum objective.

Acyclic Job Shop
The approximation we devise for acyclic job shop is really just a sequence of simple observations. Recall we are assuming the processing times are integers, so p_j ≥ 1 for all jobs j. As in [6], by losing a factor of 2 in p_max, C, and D, we may assume p_j = 2^k for some k ∈ Z≥0. This is achieved by rounding every p_j up to a power of 2. Observe that the optimum solution value at most doubles: we could simply double the start times of all operations in an optimum solution. Also, any schedule under these rounded processing times yields a schedule under the original times by using the same start times for each operation.
For each integer 0 ≤ k ≤ log_2 p_max, form the group B_k = {j : p_j = 2^k}. We can view each group B_k as an instance of acyclic job shop with identical jobs, so by [15] there is a solution with makespan O(C + D). More specifically, we can scale the running times of the jobs in B_k down to 1, which also scales the congestion and dilation by 2^{−k}. In polynomial time, we can find a schedule for these unit-length jobs with makespan O(2^{−k} · (C + D)) [15]; under the original running times 2^k, we get a solution with makespan O(C + D).
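The rounding and bucketing step can be sketched in a few lines (the function name is ours; integer processing times are assumed, as in the text):

```python
from collections import defaultdict

def power_of_two_buckets(processing_times):
    """Round each integer p_j up to the nearest power of two and group
    the jobs by the rounded value: B_k = {j : rounded p_j = 2^k}.
    Rounding at most doubles every processing time, so C, D, and the
    optimum value increase by at most a factor of 2."""
    buckets = defaultdict(list)
    for j, p in enumerate(processing_times):
        k = (p - 1).bit_length()   # smallest k with 2**k >= p (for p >= 1)
        buckets[k].append(j)
    return dict(buckets)
```

Using `bit_length` avoids the floating-point pitfalls of `ceil(log2(p))` on exact powers of two.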
Finally, we simply concatenate the resulting solutions for these 1 + log_2 p_max groups to get a solution for all jobs with makespan O(log p_max · (C + D)). As this is an approximation relative to the lower bound max{C, D}, we also get an O(log p_max)-approximation for the min-sum objective using Theorem 12.
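Concatenating a list of schedules means shifting each schedule's start times by the total makespan of everything run before it; a minimal sketch (the dictionary representation is an assumption of ours):

```python
def concatenate(schedules):
    """Run the given schedules back to back. Each schedule maps a job
    to its (start, finish) pair; each block is shifted by the makespan
    of all earlier blocks, so the total makespan is the sum of the
    individual makespans."""
    offset, merged = 0, {}
    for sched in schedules:
        for job, (s, f) in sched.items():
            merged[job] = (s + offset, f + offset)
        if sched:
            offset += max(f for _, f in sched.values())
    return merged
```

For example, running a length-3 schedule followed by a length-4 one yields a combined makespan of 7.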
For the O(log nµ)-approximation, where µ = max_j µ_j is the maximum number of operations of a job, we perform the same bucketing but also form a "small job" group B_small = B_0 ∪ B_1 ∪ ... ∪ B_a, where a = ⌊log_2 p_max − log_2(nµ)⌋. We round all jobs in B_small up to processing time 2^a. We can schedule B_small trivially by a greedy algorithm that simply ensures no machine is idle while it has an available job to process.
The makespan of this schedule is at most 2^a · µn: there are at most µn operations in total to be performed over all jobs, and at any point in time before all jobs are completed at least one machine is busy. Note that 2^a · µn ≤ p_max ≤ C + D. We then solve the remaining O(log nµ) buckets B_{a+1}, ..., B_{log_2 p_max} as before and concatenate their schedules for a total makespan of O(log nµ · (C + D)). Again, using Theorem 12, this yields an O(log nµ)-approximation for the min-sum objective.

Proof of Theorem 2
Recall that in this setting the network of machines forms a tree T rooted at r, and the path Q_j of each job j contains r.

General processing times
In this section, we present a 4-approximation for the makespan on junction trees, based on the trivial lower bounds C and D. Again, combined with the result of [17, 22], this implies an 8e-approximation for the min-sum objective.
Let L be the makespan of an optimum solution. Our algorithm for the makespan has two stages: in the first stage each job j moves from s_j to r; in the second stage each job j moves from r to t_j. Clearly, each stage alone requires makespan at most L. We show how each stage can be completed with makespan at most 2L, which yields a solution with makespan at most 4L.
It is easier to describe the algorithm for the second stage first: there, all the jobs are already at the root, and the goal is to send them to their destinations (the t_j's). If u_1, ..., u_σ are the children of r, it is enough to focus on the jobs that travel down one arbitrary edge ru_i and describe the algorithm for the subtree rooted at u_i. We sort these jobs by processing time from smallest to largest and start sending them (smallest first) as soon as ru_i is free. Since each job j starts on its first edge ru_i after exactly the jobs with smaller processing times, job j encounters no delay/waiting other than at the root. Let p_1 ≤ p_2 ≤ ... ≤ p_n be the processing times of the jobs going down ru_i. Then the maximum delay any job encounters (which happens for the last job) is Σ_{i=1}^{n−1} p_i, which is at most the congestion C. Also, note that once job j starts on its first edge, the total time it takes to complete j is exactly |rt_j| · p_j. Since the largest |rt_j| · p_j is the dilation D, every job finishes at most D time units after it starts processing. Therefore, the whole makespan is at most C + D, which is at most 2L.
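The second-stage argument can be simulated directly; a minimal sketch (the pair representation of a job is ours) that returns the makespan of the SPT schedule down one subtree:

```python
def stage_two_makespan(jobs):
    """Send jobs waiting at the root down one subtree in SPT order.
    Each job is a pair (p_j, depth_j) where depth_j = |rt_j| is the
    number of edges on its root-to-destination path. A job's only wait
    is at the root, behind the smaller jobs sent before it; the text
    bounds the result by C + D."""
    jobs = sorted(jobs)                    # smallest processing time first
    start, makespan = 0, 0
    for p, depth in jobs:
        makespan = max(makespan, start + depth * p)
        start += p                         # next job waits for edge ru_i
    return makespan
```

For jobs (1, 3), (2, 2), (3, 1) the makespan is 6, comfortably below C + D = 6 + 4 = 10 for that instance.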
The algorithm for sending the jobs to the root is almost the same. The easiest way to describe it is to run the same algorithm as if the jobs were to start at the root and each job j were to be sent to its start point s_j. By the argument above, all jobs reach their designated vertices s_j within time 2L. Running this schedule backwards moves every job j from s_j to r within time 2L.

Special case of unit processing times
Here, we consider the case of junction trees with unit processing times and present a 3-approximation algorithm for the min-sum objective. Since all jobs have unit processing time, we can think of the schedule in a synchronized setting where, in each time step, each machine starts processing one job that is available to it. We assume each edge e = uv has two buffers (queues) b_e(u) and b_e(v) at its two endpoints u and v; b_e(u) buffers the jobs that arrive at u and want to cross e, and b_e(v) buffers the jobs that arrive at v and want to cross e.
Our algorithm, called Algorithm 3, is very simple; it tries to keep the machines busy. More specifically, at each time step, each machine e = uv (where v is the parent of u) does the following: if there is any job in b_e(u), process the next job from b_e(u) and send it along its path; else, if there is any job in b_e(v), process the next job from b_e(v) and send it along its path; else, do nothing. Whenever a job arrives at a machine e = uv from either endpoint, it enters the corresponding buffer. Essentially, the algorithm keeps the machines busy by processing the jobs that have arrived at them (from either endpoint), giving priority to the jobs that are moving towards the root (i.e., those still on the first leg of their path).
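A discrete-time toy simulation of this rule is easy to write. The sketch below is ours and simplifies the buffers: each job is a pair of edge lists (its upward leg toward r and its downward leg), and within a time step each machine serves at most one job, preferring jobs still on their upward leg; ties are broken by job index rather than true FIFO order.

```python
def algorithm3(jobs):
    """Unit-processing-time simulation sketch of Algorithm 3.
    jobs[j] = (up, down): ordered edge ids of job j's s_j->r leg and
    its r->t_j leg. Returns each job's completion time."""
    n = len(jobs)
    full = [list(up) + list(down) for up, down in jobs]
    up_len = [len(up) for up, _ in jobs]
    pos = [0] * n                 # index of the next edge each job must cross
    finish, t = {}, 0
    while len(finish) < n:
        t += 1
        busy, moved = set(), set()
        # two passes: machines first serve jobs still heading to the root
        for prefer_up in (True, False):
            for j in range(n):
                if j in finish or j in moved:
                    continue
                if (pos[j] < up_len[j]) != prefer_up:
                    continue
                e = full[j][pos[j]]
                if e in busy:     # this machine already worked this step
                    continue
                busy.add(e); moved.add(j)
                pos[j] += 1       # one hop per job per time step
                if pos[j] == len(full[j]):
                    finish[j] = t
    return finish
```

With two jobs sharing the upward edge 'a' and then splitting to 'b' and 'c', the second job waits one step at the shared machine and finishes at time 3.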
We show that this is a 3-approximation for the min-sum objective, which implies the second part of Theorem 2.

Theorem 14. Algorithm 3 is a 3-approximation for the min-sum objective.
We use δ(r) to denote the set of machines incident to r. For each edge e, let L(e) be the set of jobs whose path contains e and l(e) = |L(e)|. Recall that for each job j, Q_j is the unique s_j,t_j-path, and |Q_j| is the number of machines j needs to be processed on. Let OPT denote an optimum schedule and C_opt the total flow time of OPT. We use C to denote the cost of our solution. In the following two lemmas, we give lower bounds for the optimum. The proof of the first lemma is immediate, and the proof of the second is deferred to the full version of this paper.
We defer the details to the full version of the paper and conclude this section by noting that Algorithm 3 is a 2-approximation for the special case when the machines form a star: since Σ_{e∈δ(r)} l(e) = 2n and |Q_j| = 2 for every job j, the bounds proved in Lemmas 16 and 18 simplify accordingly. Recall that for this setting our (more complicated) algorithm of Theorem 3 yields a 1.796-approximation.

Proof of Theorem 4
In this setting, each job j starts at the root and, unlike the previous settings in which a job must be processed on all machines along a given (s_j, t_j)-path, it may take any path to any leaf of the tree, with processing time p_j on every machine it crosses. For this case, we show that a simple greedy algorithm finds a schedule minimizing the min-sum objective in polynomial time, hence proving Theorem 4.
Suppose c_1, ..., c_d are the children of r. Consider an optimum solution OPT and let J_k be the set of jobs that go down a path starting at the edge (machine) rc_k. The following observation is immediate.

Observation 19. In any optimum solution, the following two properties hold: 1. The jobs in J_k are processed in order of their processing times, from smallest to largest. 2. All the jobs in J_k follow a shortest root-to-leaf path.
Processing jobs from smallest to largest is known as the SPT (Shortest Processing Time) rule, and it is known that on a single machine SPT minimizes the total flow time (i.e., it minimizes the total delay/waiting on one machine). Since under SPT no job incurs any delay on the machines after the first one, it follows that the optimum sends the jobs down each path using the SPT rule.
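The single-machine SPT fact is easy to verify experimentally; a small sketch (function names are ours) compares the SPT cost with an exhaustive search over orders:

```python
from itertools import permutations

def total_completion_time(order):
    """Sum of completion times when jobs with these processing times
    run back to back on one machine, in the given order."""
    t = total = 0
    for p in order:
        t += p        # this job completes at the current finishing time
        total += t
    return total

def spt_cost(jobs):
    """Cost of the SPT order: smallest processing time first."""
    return total_completion_time(sorted(jobs))
```

For jobs [3, 1, 2], SPT gives completions 1, 3, 6 with total 10, and brute force over all 6 orders confirms no order does better.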
Let n_k = |J_k| and let m_k be the length of the path (the number of machines from root to leaf) that the jobs in J_k travel. Suppose the jobs in J_k, from smallest to largest, are j_k^1, j_k^2, ..., j_k^{n_k}. Each job j_k^a ∈ J_k incurs a delay only at the root, this delay is Σ_{1≤i≤a−1} p_{j_k^i}, and the job has a path of m_k machines to go through; hence the total flow time of j_k^a is m_k · p_{j_k^a} + Σ_{1≤i≤a−1} p_{j_k^i}. Thus, the total flow time of all the jobs in J_k is Σ_{1≤i≤n_k} (m_k + n_k − i) · p_{j_k^i}.

Proof of Lemma 20. By way of contradiction, suppose that OPT is an optimum solution in which two children c_k, c_{k'} of r with n_k, n_{k'} > 0 have loads h_k ≥ h_{k'} + 2, where j_k^1, ..., j_k^{n_k} and j_{k'}^1, ..., j_{k'}^{n_{k'}} are the sequences of jobs scheduled on branches rc_k and rc_{k'}, respectively. Suppose we remove job j_k^1 from branch rc_k and add it to the front of the queue J_{k'}. The total flow time of the jobs on branch rc_k goes down by h_k · p_{j_k^1} and the total flow time of the jobs on branch rc_{k'} goes up by (h_{k'} + 1) · p_{j_k^1}. So the net change in flow time is (−h_k + h_{k'} + 1) · p_{j_k^1} < 0, which contradicts the optimality of OPT.

We call a schedule in which the loads of any two branches differ by at most 1 an almost balanced schedule; the above lemma shows that every optimum solution is almost balanced. We can also assume w.l.o.g. that in any optimum solution for jobs n, ..., 1, if job 1 (the smallest job) is removed from the schedule, the remaining schedule is still almost balanced. In other words, if J_k is the set of jobs containing job 1 and scheduled on branch rc_k, then the load h_k is as large as any other branch load. To see this, suppose that job 1 is scheduled on branch rc_k with h_k < h_{k'} for some other branch rc_{k'} with n_{k'} > 0. Let i be the smallest job in J_{k'} and swap jobs 1 and i in the schedule. Since both jobs are the first on their branches, the net change in the total flow time is (p_i − p_1)(h_k − h_{k'}) ≤ 0. These properties suggest the following simple greedy algorithm, which we show below finds an optimum solution.
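The greedy rule suggested by these properties can be sketched as follows (the function name and argument format are ours; as the analysis indicates, each successive job goes to a branch of minimum current load, and smaller jobs end up nearer the front of each queue):

```python
import heapq

def greedy_schedule(processing_times, path_lengths):
    """Greedy sketch: take jobs in non-increasing order of processing
    time and put each at the front of the queue of a branch with the
    currently minimum load h_k = m_k + n_k.
    path_lengths[k] is m_k, the shortest root-to-leaf path via c_k."""
    heap = [(m, k) for k, m in enumerate(path_lengths)]
    heapq.heapify(heap)                     # branches ordered by load
    queues = [[] for _ in path_lengths]
    jobs = sorted(range(len(processing_times)),
                  key=lambda j: processing_times[j], reverse=True)
    for j in jobs:
        h, k = heapq.heappop(heap)          # minimum-load branch
        queues[k].insert(0, j)              # front of that branch's queue
        heapq.heappush(heap, (h + 1, k))    # its load grows by one job
    return queues
```

With processing times [5, 3, 1] and two branches of depth 1, the two large jobs split across the branches and the smallest job joins the first, ending up at the front of its queue.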

Conclusion
We have presented a number of approximations for special cases of acyclic job shop with identical machines. There are still many interesting open questions. For example, we tightened the bound between lb and the minimum makespan for acyclic job shop with identical machines by an O(log log lb) factor, and now the gap is only an O(log log lb) factor. Can this be tightened further? Perhaps more interestingly, is acyclic job shop with identical machines hard to approximate within any constant? It may even be hard to approximate within Ω(log^{1−ε} lb), just like flow shop with unrelated machines [18].
Are we resigned to losing logarithmic factors on trees, or can we do better? Note that obtaining an O(1)-approximation for instances of acyclic flow shop with identical machines where the underlying network is a path and each job must follow a subpath is still open.
Finally, the fact that the makespan objective for acyclic job shop is super-constant hard does not necessarily mean its min-sum counterpart is also hard. By way of analogy, min-sum set cover admits a constant-factor approximation while its classic variant, minimum set cover (which can be viewed as a makespan version), has a logarithmic hardness of approximation. Obtaining further improvements under the min-sum objective and establishing a super-constant hardness are both open.

A Proof of Theorem 11
Proof. Similar to our analysis for the case of general processing times, let u_j be the completion time of the j'th job in our schedule and let c_j^opt be the completion time of the j'th job in a schedule with the optimum min-sum objective. Assume c_j^opt = d · c^k for some d < c. We consider the two cases d < c^α and d ≥ c^α, and in each case bound u_j from above by an amortized bound. Note that the first two terms in both of these bounds correspond to the sum of completion times of all the jobs in the previous blocks (∆_k), and the last term corresponds to the amortized completion time of job j in the last block.
Simplifying the bound in the first case yields an expression in c^α and c^{1+α}, and the second case yields a corresponding bound. Taking the expectation of u_j over α then gives the claimed ratio of Theorem 11.

Algorithm 3: Approximation for the min-sum objective on junction trees with unit processing times.
1 while there is an unfinished job do
2     foreach machine e = uv (with v being the parent of u) do
3         if b_e(u) ≠ ∅ then
4             process the first job in b_e(u) and pass it to the next buffer;
5         else if b_e(v) ≠ ∅ then
6             process the first job in b_e(v) and pass it to the next buffer;

and the total flow time of all the jobs in OPT is Σ_{1≤k≤d} Σ_{1≤i≤n_k} (m_k + n_k − i) · p_{j_k^i}. We use h_k = m_k + n_k and call it the "load" of the branch rc_k. The following lemma follows easily.

Lemma 20. In any optimum solution, for any two children c_k, c_{k'} of r with n_k, n_{k'} > 0, we must have |m_k + n_k − m_{k'} − n_{k'}| ≤ 1. In other words, the loads of any two branches differ by at most 1.

APPROX/RANDOM'17 — Scheduling Problems over Network of Machines

Theorem 11. Algorithm 2 is a 1.796-approximation algorithm for the star scheduling problem when jobs have unit processing times.

Algorithm 4 (greedy):
1 Sort the jobs in non-increasing order of processing time, say p_n, p_{n−1}, ..., p_1;
2 Let c_1, ..., c_d be the children of r, and let J_i ← ∅ be the queue of jobs going down branch rc_i;
3 Let m_i be the length of the shortest root-to-leaf path starting with rc_i and n_i ← |J_i|;
4 foreach job j in the sorted order do
5     Let k be a branch with minimum load h_k = m_k + n_k;
6     Schedule job j in front of the queue J_k;

Theorem 21. The greedy algorithm (Algorithm 4) finds an optimum solution.

Proof. We prove by backward induction on i that the greedy algorithm finds an optimum solution for the set of jobs n, ..., i, for all n ≥ i ≥ 1. The case i = n is trivial. Let k ≤ n be an arbitrary integer and suppose that the greedy partial schedule for jobs n, ..., k + 1 is optimum for this set of jobs; call this schedule S_{k+1}, let S_k be the greedy schedule after adding job k, and let O_k be an optimum schedule for jobs n, ..., k. Let O' be the schedule for jobs n, ..., k + 1 obtained from O_k by removing job k. Since S_{k+1} is optimum (by the induction hypothesis), cost(S_{k+1}) ≤ cost(O'). Also, note that both S_{k+1} and O' are almost balanced and have the same number of jobs. Therefore, if h_min(O') and h_min(S_{k+1}) are the minimum loads in O' and S_{k+1}, respectively, then h_min(O') = h_min(S_{k+1}). This implies cost(S_k) = cost(S_{k+1}) + p_k · (h_min(S_{k+1}) + 1) ≤ cost(O') + p_k · (h_min(O') + 1) = cost(O_k).