Energy efficient scheduling via partial shutdown



Abstract
Motivated by issues of saving energy in data centers, we define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of m machines (unrelated or related), with each machine i having an activation cost of a_i. There is also a collection of n jobs that need to be performed, and p_{i,j} is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. However, in our setting, we assume that there is an activation cost budget of A: we would like to select a subset S of the machines to activate with total cost a(S) ≤ A and find a schedule for the n jobs on the machines in S minimizing the makespan (or any other metric).
We consider both the unrelated machines setting, as well as the setting of scheduling uniformly related parallel machines, where machine i has activation cost a_i and speed s_i, and the processing time of job j on machine i is p_{i,j} = p_j/s_i, where p_j is the processing requirement of job j. For the general unrelated machine activation problem, our main results are that if there is a schedule with makespan T and activation cost A, then we can obtain a schedule with makespan (2 + ε)T and activation cost 2(1 + 1/ε)(ln(n/OPT) + 1)A, for any ε > 0. We also consider assignment costs for jobs as in the generalized assignment problem, and, using our framework, provide algorithms that minimize the machine activation and the assignment cost simultaneously. In addition, we present a greedy algorithm, which only works for the basic version, and yields a makespan of 2T and an activation cost A(1 + ln n).
For the uniformly related parallel machine scheduling problem, we develop a polynomial time approximation scheme that outputs a schedule with the property that the activation cost of the subset of machines is at most A and the makespan is at most (1 + ε)T, for any ε > 0. For the special case of m identical speed machines, the machine activation problem is trivial, since the cheapest subset of k machines is always the best choice if the optimal solution activates k machines. In addition, we consider the case when some jobs can be dropped (and are treated as outliers).

Introduction
Large scale data centers have emerged as an extremely popular way to store and manage a large volume of data. Most large corporations, such as Google, HP and Amazon have dozens of data centers. These data centers are typically composed of thousands of machines, and have extremely high energy requirements. Data centers are now being used by companies such as Amazon Web Services, to run large scale computation tasks for other companies who do not have the resources to create their own data centers. This is in addition to their own computing requirements.
These data centers are designed to be able to handle extremely high workloads in periods of peak demand. However, since the workload on these data centers fluctuates over time, we can selectively shut down parts of the system to save energy when the demand on the system is low. Energy savings result not just from putting machines in a sleep state, but also from savings in cooling costs.
Hamilton (see the recent SIGACT News article [3]) argues that a ten-fold reduction in the power needs of the data center may be possible if we can simply build systems that are optimized with power management as their primary goal.

In our model, each machine i has an activation cost of a_i, and the activation cost of a subset S of machines is a(S) = Σ_{i∈S} a_i. We show that if there is a schedule with activation cost A and makespan T, then we can find a schedule with activation cost 2(1 + 1/ε)(ln(n/OPT) + 1)A and makespan (2 + ε)T for any ε > 0 by an LP-rounding scheme (we call this a ((2 + ε), 2(1 + 1/ε)(ln(n/OPT) + 1))-approximation). We also present a greedy algorithm which gives us a (2, 1 + ln n)-approximation. In fact, the ln n term in the activation cost with this general formulation is unavoidable, since this problem is at least as hard to approximate as the set cover problem, for which a (1 − ε) ln n approximation algorithm would imply that NP ⊆ DTIME(n^{O(log log n)}) [8].
We also show that the recent PTAS developed by Epstein and Sgall [7] can be extended to the framework of machine activation problems for the case of scheduling jobs on uniformly related parallel machines. (The original PTAS by Hochbaum and Shmoys [12] is slightly more complicated than the method suggested by Epstein and Sgall [7].) We also consider a version of the problem in which a subset of the jobs may be dropped to save energy. In this version of the problem, each job j also has a benefit π_j, and we need to process a subset of jobs with total benefit at least Π. Suppose that a schedule exists with cost C_Π and makespan T_Π that obtains a total benefit of at least Π. We show that the method due to Shmoys and Tardos [20] can be extended to find a collection of jobs to perform with expected benefit at least Π and expected cost C_Π, with a makespan guaranteed to be at most 2T_Π (see Appendix A). (The recent work by Gupta et al. [11] gives a clever deterministic scheme with makespan 3T_Π and cost (1 + ε)C_Π, along with several other results on scheduling with outliers. This has been further improved to makespan (2 + ε)T_Π and cost (1 + ε)C_Π in [18].)

Related Work on Scheduling

Generalizations of the work by Shmoys and Tardos [20] have considered the L_p norm. Azar and Epstein [2] give a 2-approximation for any L_p norm for any p > 1, and a √2-approximation for the L_2 norm. The bounds for p = 2 have been subsequently improved [16]. In addition, we can have release times r_{i,j} associated with each job: this specifies the earliest time when job j can be started on machine i. Koulamas et al. [15] give a heuristic solution to this problem on uniformly related machines with a worst-case approximation ratio of O(√m).

Minimizing resource usage has also been considered before. In this framework, a collection of jobs J needs to be executed; each job has a processing time p_j, a release time r_j and a deadline d_j. In the continuous setting, a job can be executed on any machine between its release time and its deadline. In the discrete setting, each job has a set of intervals during which it can be executed. The goal is to minimize the number of machines that are required to perform all the jobs. For the continuous case, Chuzhoy and Codenotti [4] have recently developed a constant-factor approximation, improving upon a previous algorithm given by Chuzhoy et al. [5]. For the discrete version, Chuzhoy and Naor [6] have shown an Ω(log log n) hardness of approximation. However, this framework does not model the non-uniformity of machines, which is one of the key issues in data centers. In addition, the non-uniformity of activation costs is not addressed in their work either.

Related Work on Energy Minimization
Augustine, Irani and Swamy [1] develop online algorithms to decide when a particular device should transition to a sleep state when multiple sleep states are available. Each sleep state has a different power consumption rate and a different transition cost. They provide deterministic online algorithms with competitive ratio arbitrarily close to optimal to decide in an online way which sleep state to enter when there is an idle period. See also the survey by Irani and Pruhs for other related work [14].

Our Contributions

Our main contributions are:
• A randomized rounding method that approximates both activation cost and makespan for unrelated parallel machines. This method is based on rounding the LP solution of a certain carefully defined LP relaxation and uses ideas from work on dependent rounding [10,16] (Section 2).
• Extensions of the above method when we have assignment costs in addition to activation costs as part of the objective function (Section 3).
• A greedy algorithm that approximates both activation cost and makespan for unrelated parallel machines and gives a (2, 1 + ln n)-approximation (Section 4).
• Extensions of these results to the case of handling outliers using the methods from [11] as well as release times (Section 5).
• A polynomial time approximation scheme for the machine activation problem for uniformly related parallel machines, extending the construction given for the version of the problem with no activation costs [7] (Section 6).
• A simple dependent rounding scheme for the partial GAP problem (Appendix A).

LP Rounding for Machine Activation on Unrelated Machines
In this section, we first provide a simple rounding scheme with an approximation ratio of (O(log n), O(log n)). Then we improve it to a (2 + ε, 2(1 + 1/ε)(ln(n/OPT) + 1))-approximation by a new rounding scheme. We can formulate the scheduling activation problem as an integer program. We define a variable y_i for each machine i, which is 1 if the machine is open and 0 if it is closed. For every machine-job pair, we have a variable x_{i,j}, which is 1 if job j is assigned to machine i and 0 otherwise. In the corresponding linear programming relaxation, we relax the y_i and x_{i,j} variables to be in [0, 1]. The first set of constraints requires that each job is assigned to some machine. The second set of constraints restricts jobs to be assigned only to active machines, and the third set of constraints limits the workload on a machine. We require that 1 ≥ x_{i,j}, y_i ≥ 0, and that x_{i,j} = 0 if p_{i,j} > T. Suppose an integral solution with activation cost A and makespan T exists. The LP relaxation will have cost at most A with the correct choice of T. All the bounds we show are with respect to these terms. In Section 2.2 we show that unless we relax the makespan constraint, there is a large integrality gap for this formulation.
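Written out from the constraint descriptions above, the LP relaxation takes the following form (a reconstruction consistent with the constraints just listed and with Lemma 2.4 below, not a verbatim copy of Eq (2.1)):

```latex
\begin{align*}
\min \quad & \sum_{i} a_i\, y_i \\
\text{s.t.} \quad & \sum_{i} x_{i,j} = 1 && \forall j \\
& x_{i,j} \le y_i && \forall i, j \\
& \sum_{j} p_{i,j}\, x_{i,j} \le T\, y_i && \forall i \\
& 0 \le x_{i,j},\, y_i \le 1; \qquad x_{i,j} = 0 \text{ whenever } p_{i,j} > T .
\end{align*}
```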

Simple Rounding
We first start with a simple rounding scheme. Let us denote the optimum LP solution by ȳ, x̄. The rounding consists of the following four steps:

1. Round each y_i to 1 with probability ȳ_i, and to 0 with probability 1 − ȳ_i. If y_i is rounded to 1, open machine i.

2. For each open machine i, consider the set of jobs j that have fractional assignment > 0 on machine i. For each such job, set X_{i,j} = x̄_{i,j}/ȳ_i. If Σ_j p_{i,j} X_{i,j} < T (it is always ≤ T), then uniformly increase the X_{i,j}. Stop increasing any X_{i,j} that reaches 1. Stop the process when either the total fractional makespan is T or all X_{i,j}'s are 1. If X_{i,j} = 1, assign job j to machine i. If machine i has no job fractionally assigned to it, drop machine i from further consideration. For each job j that has fractional assignment X_{i,j}, assign it to machine i with probability X_{i,j}.
3. Discard all assigned jobs. If there are some unassigned jobs, repeat the procedure.
4. If some job is assigned to multiple machines, choose any one of them arbitrarily.
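As a concrete illustration, one iteration of steps 1 and 2 can be sketched as follows. This is our simplified sketch, not the paper's code: the uniform increase of the X_{i,j} is done in a single scaling step rather than incrementally, and all identifiers are ours.

```python
import random

def simple_round_iteration(jobs, machines, p, xbar, ybar, T, rng=random.Random(0)):
    """One iteration of the simple rounding scheme (a sketch).

    p[i][j]: processing time of job j on machine i; xbar, ybar: optimal LP values.
    Returns (assigned, opened): a dict job -> machine and the set of opened machines.
    """
    assigned, opened = {}, set()
    for i in machines:
        if rng.random() < ybar[i]:           # open machine i with probability ybar[i]
            opened.add(i)
    for i in opened:
        frac = {j: xbar[i][j] / ybar[i] for j in jobs
                if xbar[i][j] > 0 and j not in assigned}
        if not frac:
            continue                          # no job fractionally assigned: drop machine
        load = sum(p[i][j] * f for j, f in frac.items())
        if 0 < load < T:
            # One uniform scale-up, capped so no X_{i,j} exceeds 1 (simplification
            # of the paper's incremental increase).
            scale = min(T / load, 1.0 / max(frac.values()))
            frac = {j: min(1.0, f * scale) for j, f in frac.items()}
        for j, f in frac.items():
            if f >= 1.0 or rng.random() < f:  # assign job j with probability X_{i,j}
                assigned[j] = i
    return assigned, opened
```

Steps 3 and 4 then discard the assigned jobs and repeat on the remainder.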
In the above rounding scheme, we use the ȳ_i's as probabilities for opening machines, and for each opened machine we assign jobs following the probability distribution given by the X_{i,j}'s. It is obvious that the expected activation cost of the machines in each iteration is exactly the cost of the fractional solution given by the LP. The following lemmas bound the number of iterations and the final load on each machine.

Lemma 2.1. With probability at least 1 − 1/n, all jobs are assigned within 2 ln n iterations.

Proof. Consider a job j. In a single iteration, Pr(job j is not assigned to machine i) ≤ 1 − ȳ_i · (x̄_{i,j}/ȳ_i) = 1 − x̄_{i,j}, and hence Pr(job j is not assigned in an iteration) ≤ Π_i (1 − x̄_{i,j}) ≤ (1 − 1/m)^m ≤ 1/e. The second inequality holds since Σ_i x̄_{i,j} = 1 and the quantity is maximized when all the x̄_{i,j}'s are equal. Then it is easy to see that the probability that job j is not assigned after 2 ln n iterations is at most 1/n². Therefore, by the union bound, with probability at least 1 − 1/n, all jobs are assigned within 2 ln n iterations. ⊓⊔

Lemma 2.2. The load on any machine is O(T log n) with high probability.
Proof. Consider any iteration h, and denote the value of X_{i,j} at iteration h by X^h_{i,j}. For each open machine i and each job j, define a random variable for the contribution of j to the load M_i on machine i.

2.2 Integrality Gap: In the gap instance, machine B is open only to the extent of 1/m, and all jobs are assigned to the extent of 1/m on each of the machines A_1, A_2, …, A_{m−1}. So the total processing time on any machine A_i is m · (T/m) = T. The remaining 1/m part of each job is assigned to B, so the total processing time on B is (T/(m·m)) · m = T/m. It is easy to see that the optimal fractional cost is at most m + R/m (by setting y_B = 1/m). Therefore, the integrality gap is at least ≈ m.

Main Rounding Algorithm for Minimizing Scheduling Activation Cost with Makespan Budget

In this section, we describe our main rounding approach, which achieves an approximation factor of 2(1 + 1/ε)(ln(n/OPT) + 1) for activation cost and (2 + ε) for makespan. Based on this new rounding scheme, we show in Section 3 how to simultaneously approximate both machine activation and job assignment cost along with makespan, and how to extend it to handle outliers, when some jobs can be dropped (Section 5). For the basic problem with only activation cost and makespan, we show in Section 4 that a greedy algorithm achieves an approximation factor of (2, 1 + ln n). However, the greedy algorithm is significantly slower than the LP rounding algorithm, since it requires the computation of (m − i) linear programs at the ith step of the greedy choice, where i can run from 1 to min(m, n), and m, n are the numbers of machines and jobs respectively.
The algorithm begins by solving LP (Eq(2.1)). As before, x̄, ȳ denote the optimum fractional solution of the LP. Let M denote the set of machines and J denote the set of jobs. Let |M| = m and |J| = n. We define a bipartite graph G = (M ∪ J, E) as follows: M ∪ J are the vertices of G, and e = (i, j) ∈ E if x̄_{i,j} > 0. The weight on edge (i, j) is x̄_{i,j} and the weight on machine node i is ȳ_i. Rounding consists of several iterations. Initialize X = x̄ and Y = ȳ. The algorithm iteratively modifies X and Y, such that at the end X and Y become integral. Random variables at the end of iteration h are denoted by X^h_{i,j} and Y^h_i. The three main steps of rounding are as follows:

1. Transforming the Solution: This consists of creating two graphs G_1 and G_2 from G, where G_1 has an almost-forest structure and in G_2 the weight of an edge and the weight of the incident machine node are very close. In this step, only the X_{i,j}'s are modified, while the Y_i's remain fixed at ȳ_i.

2. Cycle Breaking: This breaks the remaining cycles of G_1 and converts it into a forest, by moving certain edges to G_2.

3. Exploiting the properties of G_1 and G_2, and rounding on G_1 and G_2 separately.
We now describe each of these steps in detail.

Transforming the Solution
We decompose G into two graphs G_1 and G_2 through several rounds. Initially G_1 = G and G_2 is empty. In each round, we either move one job node and/or one edge from G_1 to G_2, or delete an edge from G_1. Thus we always make progress. An edge moved to G_2 retains its weight through the rest of the iterations, while the weights of the edges in G_1 keep on changing. We maintain the invariants (I1-I4); in particular:

(I4) Once a variable is rounded to 0 or 1, it is never changed.

Figure 1: Linear system at the beginning of iteration (h + 1)
Consider round one. Remove any machine node that has Y^1_i = 0 from both G_1 and G_2. Activate any machine that has Y^1_i = 1. Similarly, discard any edge (i, j) with X^1_{i,j} = 0, and if X^1_{i,j} = 1, assign job j to machine i and remove j. If X^1_{i,j} ≥ ȳ_i/γ, then remove the edge (i, j) from G_1 and add the job j (if not added yet) and the edge (i, j) with weight x_{i,j} (≥ ȳ_i/γ) to G_2. Note that if for some (i, j) ∈ G, p_{i,j} = 0, then we can simply take x̄_{i,j} = ȳ_i and move the edge to G_2. Thus we can always assume that for every edge (i, j) ∈ G_1, p_{i,j} > 0. It is easy to see that after round one, all the invariants (I1-I4) are maintained.
Let us consider iteration (h + 1), and let J′, M′ denote the sets of job and machine nodes in G_1 with degree at least 1 at the beginning of the iteration. Note that Y^h_i = Y^1_i = ȳ_i for all h. Let |M′| = m′ and |J′| = n′. As in iteration one, any edge with X^h_{i,j} = 0 in G_1 is discarded, and any edge with X^h_{i,j} ≥ ȳ_i/γ is moved to G_2 (if node j does not belong to G_2, add it to G_2 as well). We denote by w_{i,j} the weight of an edge (i, j) ∈ G_2. Any edge moved to G_2, and its weight, will not be changed further. Since w_{i,j} is fixed when (i, j) is inserted into G_2, we can treat it as a constant thereafter. Consider the linear system (Ax = b) as in Figure 1.
We call the fractional solution x canonical if x_{i,j} ∈ (0, y_i/γ) for all (i, j). Clearly {X^h_{i,j}}, for (i, j) ∈ E(G_1), is a canonical feasible solution for the linear system in Figure 1. Now, if a linear system is under-determined, we can efficiently find a non-zero vector r with Ar = 0. Since x is canonical, we can also efficiently identify strictly positive reals α and β such that for all (i, j), x_{i,j} + αr_{i,j} and x_{i,j} − βr_{i,j} lie in [0, y_i/γ], and there exists at least one (i, j) such that one of the two entries, x_{i,j} + αr_{i,j} and x_{i,j} − βr_{i,j}, is in {0, y_i/γ}. We now define the basic randomized rounding step, RandStep(A, x, b): with probability β/(α + β), return the vector x + αr, and with the complementary probability α/(α + β), return the vector x − βr.
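RandStep can be sketched directly in code. The version below is our illustration, not the paper's implementation: it works on a generic under-determined system with per-variable upper bounds ub_k (which in the text would be ȳ_i/γ), and extracts a null-space direction via the SVD.

```python
import numpy as np

def rand_step(A, x, ub, rng=np.random.default_rng(0)):
    """One RandStep(A, x, b): move along a null-space direction of A until some
    variable hits 0 or its upper bound, randomizing the sign of the move so that
    E[X] = x while AX = Ax is preserved with probability 1 (a sketch)."""
    # A non-zero r with Ar = 0 exists because the system is under-determined;
    # take the right singular vector for the smallest singular value.
    r = np.linalg.svd(A)[2][-1]
    pos = [k for k in range(len(x)) if r[k] > 1e-12]
    neg = [k for k in range(len(x)) if r[k] < -1e-12]
    # Largest alpha with x + alpha*r in [0, ub]; largest beta with x - beta*r in [0, ub].
    alpha = min([(ub[k] - x[k]) / r[k] for k in pos] + [x[k] / -r[k] for k in neg])
    beta = min([x[k] / r[k] for k in pos] + [(ub[k] - x[k]) / -r[k] for k in neg])
    if rng.random() < beta / (alpha + beta):
        return x + alpha * r
    return x - beta * r
```

The expectation is preserved because the two moves cancel in expectation: (β/(α+β))·α − (α/(α+β))·β = 0.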
If X = RandStep(A, x, b), then the returned solution X satisfies E[X_{i,j}] = x_{i,j} for all (i, j), and AX = b with probability 1 [16]. If the linear system in Figure 1 is under-determined, then we apply RandStep to obtain the updated vector X^{h+1}. If for some (i, j), X^{h+1}_{i,j} = 0, then we remove that edge (variable) from G_1. If X^{h+1}_{i,j} = ȳ_i/γ, then we remove the edge from G_1 and add it with weight ȳ_i/γ to G_2. Thus the invariants (I1, I3 and I4) are maintained. Since the weight of any edge in G_2 is never changed and the load constraints on all machine nodes belong to the linear system, we also have, for each machine i and iteration h, Σ_j X^h_{i,j} p_{i,j} = Σ_j x_{i,j} p_{i,j} with probability 1. Thus the invariant (I2) is maintained as well.
If the linear system (Figure 1) becomes determined, then this step ends and we proceed to the next step of "Cycle Breaking".
2.5 Cycle Breaking: Let M′ and N′ be the machine and job nodes respectively in G_1 when the previous step ended, with |M′| = m′ and |N′| = n′. The number of edges in G_1 is |E(G_1)| ≤ m′ + n′; otherwise, the linear system (Figure 1) would remain under-determined. In fact, in each connected component of G_1, the number of edges is at most the number of vertices, for the same reason. Therefore, each component of G_1 can contain at most one cycle.
If there is no cycle in G_1, we are done; else each component contains at most one cycle, say C = v_0, e_1, v_1, e_2, …, e_k, v_k = v_0. Note that since G_1 is bipartite, C always has even length. For simplicity of notation, let the current X value on edge e_t = (v_{t−1}, v_t) be denoted by Z_t. Note that if v_t is a machine node, then Z_t ∈ (0, ȳ_{v_t}/γ); else v_{t−1} is a machine node and Z_t ∈ (0, ȳ_{v_{t−1}}/γ). We next choose values µ_1, µ_2, …, µ_k deterministically, and update the X value of each edge e_t = (v_{t−1}, v_t) to Z_t + µ_t. Suppose that we initialized some value for µ_1, and have chosen the increments µ_1, µ_2, …, µ_t, for some t ≥ 1. Then the value µ_{t+1} (corresponding to edge e_{t+1} = (v_t, v_{t+1})) is determined as follows: (P1) If v_t ∈ J (i.e., v_t is a job node), then µ_{t+1} = −µ_t (i.e., we retain the total assignment value of v_t); (P2) If v_t ∈ M (i.e., v_t is a machine node), we set µ_{t+1} in such a way that the load on machine v_t remains unchanged, i.e., we set µ_{t+1} = −p_{v_t,v_{t−1}} µ_t / p_{v_t,v_{t+1}}, which ensures that the incremental load p_{v_t,v_{t−1}} µ_t + p_{v_t,v_{t+1}} µ_{t+1} is zero. Since p_{v_t,v_{t+1}} is non-zero by the property of G_1, dividing by p_{v_t,v_{t+1}} is admissible.
Let α be the smallest positive value such that, if we set µ_1 = α, then all X_{i,j} values (after incrementing by the vector µ as described above) stay in [0, ȳ_i/γ], and at least one of them becomes 0 or ȳ_i/γ. Similarly, let β be the smallest positive value such that, if we set µ_1 = −β, then again all X_{i,j} values after the increments lie in [0, ȳ_i/γ] and at least one of them is rounded to 0 or ȳ_i/γ. (It is easy to see that α and β always exist and are strictly positive.) We now choose the vector µ so that the load on machine v_0 does not increase. If some X_{i,j} is rounded to 0, we remove that edge from G_1. If some X_{i,j} becomes ȳ_i/γ, then we remove it from G_1 and add it to G_2 with weight ȳ_i/γ. Since at least one of these occurs, we are able to break the cycle.
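The propagation of µ_2, …, µ_k from µ_1 via (P1)/(P2) is mechanical. The sketch below (our code, with v_0 assumed to be a machine node so that v_t is a job node exactly for odd t) computes the increments and lets one check that job totals and interior machine loads are preserved:

```python
def cycle_increments(p, mu1):
    """Increments mu_1..mu_k along an even cycle C = v_0, e_1, v_1, ..., e_k, v_k = v_0.

    p[t] is the processing time on edge e_{t+1} (0-indexed); v_0 is taken to be
    a machine node, so v_t is a job node for odd t. A sketch of rules (P1)/(P2)."""
    k = len(p)
    mu = [mu1]
    for t in range(1, k):
        if t % 2 == 1:                         # (P1) v_t is a job node: keep its assignment
            mu.append(-mu[-1])
        else:                                  # (P2) v_t is a machine node: keep its load
            mu.append(-p[t - 1] * mu[-1] / p[t])
    return mu
```

Only the load of v_0 itself can change, which is exactly the quantity controlled by the choice of µ_1.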
Let φ denote the fractional assignment of the x variables at the beginning of the cycle-breaking phase. Then clearly, after this step, for all jobs j, considering both G_1 and G_2, Σ_i X_{i,j} = Σ_i φ_{i,j}.
For any machine i ∈ M, if i ∉ C, then clearly Σ_j p_{i,j} X_{i,j} = Σ_j p_{i,j} φ_{i,j}. If i ∈ C but i ≠ v_0, then by property (P2), before inserting any edge into G_2 we have Σ_j p_{i,j} X_{i,j} = Σ_j p_{i,j} φ_{i,j}. Any edge added to G_2 after the cycle-breaking step has the same weight as it had in G_1. Therefore, for any i ≠ v_0, and considering both G_1 and G_2, Σ_j p_{i,j} X_{i,j} = Σ_j p_{i,j} φ_{i,j}. Now consider the machine v_0 (= v_k). Its change in load is exactly µ_1 (p_{v_0,v_1} − p_{v_0,v_{k−1}} µ_k / µ_1). Therefore, by the choice of µ_1, the load on machine v_0 can only decrease. Hence, by property (2.5), we have the following lemma.

Lemma 2.4. Considering both G_1 and G_2, we have after the cycle-breaking step, with probability 1: Σ_i X_{i,j} = 1 ∀j; Σ_j X_{i,j} p_{i,j} ≤ T ȳ_i ∀i; X_{i,j} ≤ ȳ_i ∀i, j.
2.6 Rounding on G_1 and G_2: The previous two steps ensure that G_1 is a forest and that in G_2, X_{i,j} ≥ ȳ_i/γ for all (i, j) ∈ E(G_2). We remove any isolated nodes from G_1 and G_2, and round them separately.

2.6.1 Further Relaxing the Solution: Let us denote the job and machine nodes in G_1 (respectively G_2) by J(G_1) (respectively J(G_2)) and M(G_1) (respectively M(G_2)). Consider a job node j ∈ J(G_2). If Σ_{i:(i,j)∈E(G_2)} X_{i,j} < 1/δ (we choose δ later), we simply remove all the edges (i, j) from G_2, and the following must hold: Σ_{i:(i,j)∈E(G_1)} X_{i,j} ≥ 1 − 1/δ. Otherwise, if Σ_{i:(i,j)∈E(G_2)} X_{i,j} ≥ 1/δ, we remove all edges (i, j) ∈ E(G_1) from G_1. Therefore, at the end of this modification, a job node can belong to either J(G_1) or J(G_2), but not both. If j ∈ J(G_1), we have Σ_{i∈M} X_{i,j} ≥ 1 − 1/δ. For the makespan analysis it will be easier to partition the edges incident on a machine node i into two parts: the job nodes incident to it in G_1 and in G_2. The fractional processing time due to jobs in J(G_1) (respectively J(G_2)) will be denoted by T′ȳ_i (respectively T″ȳ_i), i.e., T′ȳ_i = Σ_{j∈J(G_1)} p_{i,j} X_{i,j} (and T″ȳ_i = Σ_{j∈J(G_2)} p_{i,j} X_{i,j}).

2.6.2 Rounding on G_2: If we decide to open a machine node i ∈ M(G_2), then we can assign all the nodes j ∈ J(G_2) that have an edge (i, j) ∈ E(G_2), paying at most T″γ in the makespan.
Hence, we only concentrate on opening machines in G_2; if a machine is opened, we assign to it all the jobs incident to it in G_2. For each machine i ∈ M(G_2), we define Y_i = min{1, ȳ_i δ}. Since for all job nodes j ∈ J(G_2) we know that Σ_{i∈M(G_2)} X_{i,j} ≥ 1/δ, after scaling we have, for all j ∈ J(G_2), Σ_{(i,j)∈E(G_2)} Y_i ≥ 1. Therefore, this exactly forms a fractional set-cover instance, which can be rounded using the randomized rounding method developed in [22] to get activation cost within a factor of δ(log(n/OPT) + 1). The instance in G_2 thus nicely captures the hard part of the problem, which comes from the hardness of approximation of set cover. Thus we have the following lemma.
Lemma 2.5. Considering only the job nodes in G_2, the final load on any machine i ∈ M(G_2) is at most T″γ, and the total activation cost is at most δ(log(n/OPT) + 1)·OPT, where T″ is the fractional load on machine i ∈ M(G_2) before rounding on G_2 and OPT is the optimum activation cost.
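The scaled variables Y_i = min{1, ȳ_i δ} define a fractional set cover, which suggests a rounding loop of the following shape. This is a sketch of standard repeated randomized rounding for set cover, not the exact algorithm of [22]; identifiers are ours.

```python
import random

def round_g2(machines, jobs, edges, ybar, delta, rng=random.Random(1)):
    """Set-cover style rounding on G_2 (a sketch).

    edges: set of (i, j) pairs in E(G_2). Each round opens machine i with
    probability Y_i = min(1, ybar_i * delta); rounds repeat until every job
    in J(G_2) has an open incident machine."""
    Y = {i: min(1.0, ybar[i] * delta) for i in machines}
    opened, covered = set(), set()
    while covered != set(jobs):
        for i in machines:
            if i not in opened and rng.random() < Y[i]:
                opened.add(i)
        covered = {j for (i, j) in edges if i in opened}
    return opened
```

Since Σ Y_i ≥ 1 over each job's incident machines, each round covers a job with constant probability, so O(log n) rounds suffice with high probability, matching the logarithmic cost factor of the lemma.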
2.6.3 Rounding on G_1: For rounding in G_1, we traverse each tree in G_1 bottom-up. If there is a job node j that is a child of a machine node i, then if X_{i,j} < 1/η (η to be fixed later), we remove the edge (i, j) from G_1. Since initially each j ∈ J(G_1) has Σ_{i∈M} X_{i,j} ≥ 1 − 1/δ, even after these edges are removed, Σ_{i∈M} X_{i,j} ≥ 1 − 1/δ − 1/η. Otherwise (X_{i,j} ≥ 1/η), we open machine i if it is not already open and add job j to machine i. Initially ȳ_i ≥ 1/η, since ȳ_i ≥ X_{i,j}; the initial contribution to the cost by machine i was ≥ (1/η)·a_i, and now it becomes a_i. If Σ_j (X_{i,j}/ȳ_i) p_{i,j} = T′, with X_{i,j} ≥ 1/η, the load can now become at most ηT′.
After the above modification, the yet-to-be-assigned jobs in J(G_1) form disjoint stars, with the job nodes at their centers. Consider a star S_j with job node j at its center, and let i_1, i_2, …, i_{ℓ_j} be the machine nodes in S_j; then Σ_l X_{i_l,j} ≥ 1 − 1/δ − 1/η. If there is already some opened machine i_l, assign j to i_l, increasing the makespan by at most an additive T. Otherwise, open the machine i_l with the cheapest a_{i_l}; the total contribution of these machines to the cost is then within a factor of 1/(1 − 1/δ − 1/η) of the fractional cost. Hence, we have the following lemma.

Lemma 2.6. Considering only the job nodes in G_1, the final load on any machine i ∈ M(G_1) is at most T′η + max_{i,j} p_{i,j}, and the total activation cost is at most max(η, 1/(1 − 1/δ − 1/η))·OPT, where T′ is the fractional load on machine i ∈ M(G_1) before rounding on G_1 and OPT is the optimum activation cost.

Now combining Lemmas 2.4, 2.5 and 2.6, and optimizing the values of δ, η and γ, we get the following theorem.
Theorem 2.1. A schedule can be constructed efficiently with machine activation cost 2(1 + 1/ε)(ln(n/OPT) + 1)·OPT and makespan (2 + ε)T, where T is the optimum makespan possible for any schedule with activation cost OPT.

Minimizing Machine Activation Cost and Assignment Cost
We now consider the scheduling problem with assignment costs and machine activation costs. As before, each job can be scheduled on only one machine, and processing job j on machine i requires p_{i,j} time and incurs a cost of c_{i,j}. Each machine is available for T time units, and the objective is to minimize the total incurred cost. In this version of the machine activation model, we wish to minimize the sum of the machine activation and job assignment costs, i.e., Σ_i a_i y_i + Σ_{i,j} c_{i,j} x_{i,j}, subject to the same constraints as the LP defined in Eq(2.1). Our algorithm for the simultaneous minimization of machine activation and assignment cost follows the same paradigm as developed in Section 2.3, with some problem-specific changes. We mention the differences here.

Transforming the Solution
After solving the LP, we obtain C = Σ_{i,j} c_{i,j} x̄_{i,j}. Though we have the additional quantity C = Σ_{i,j} c_{i,j} x_{i,j} to take care of, we do not include it in the linear system, and we proceed exactly as in Subsection 2.4. As long as the system is under-determined, we can repeatedly apply RandStep to form the two graphs G_1 and G_2. By Property 2.6, ∀i, j, h, E[X^h_{i,j}] = x̄_{i,j}, and hence the expected cost is Σ_{i,j} c_{i,j} x̄_{i,j}. The procedure can be directly derandomized by the method of conditional expectation, giving a 1-approximation to the assignment cost.
When the system becomes determined, we move to the next step. At that point, in every component of G_1, the number of edges is at most the number of vertices, so again each component of G_1 can contain at most one cycle. In G_2, for all (i, j) ∈ E(G_2), we have X_{i,j} ≥ ȳ_i/γ.

Breaking the Cycles
For breaking the cycle in every component of G_1, we proceed in a slightly different manner from the previous section, since we now have two parameters, p_{i,j} and c_{i,j}, associated with each edge. Suppose (i′, j) is an edge in a cycle.
If the X_{i′,j} value of this edge exceeds 1/2, then we can assign job j to machine i′ and increase the processing load on the machine by p_{i′,j}. This increases the makespan by at most an additive T/2, since the job was already assigned to an extent of 1/2 on that machine. The assignment cost also goes up, but since we pay c_{i′,j} to assign j to i′ and the LP solution pays at least (1/2)c_{i′,j}, this costs a factor of at most 2 even after summing up all such assignment costs. Similarly, the activation cost is also affected only by a factor of 2.
If the X_{i′,j} value is at most 1/2, then we simply delete the edge (i′, j) and scale up all the X_{i,j} values and ȳ_i values by 2. Thus the total assignment of any job remains at least 1, and the costs of activation and assignment can go up by a factor of at most 2.
3.3 Rounding on G_1, G_2: The first part involves further relaxing the solution, identically to the procedure described in Subsection 2.6.1. Therefore, we now concentrate on rounding G_1 and G_2 separately.

3.3.1 Rounding on G_2: In G_2, since we have X_{i,j} ≥ ȳ_i/γ for all (i, j) ∈ E(G_2), if we decide to open machine i, all the jobs j ∈ J(G_2) incident to it can be assigned to i by losing only a factor of γ in the makespan. Therefore, we just need to concentrate on minimizing the cost of opening machines plus the total assignment cost, subject to the constraint that every job in J(G_2) must have an open machine to which it can be assigned. This is exactly the case of non-metric uncapacitated facility location, and we can employ the rounding approach developed in [21] to obtain an approximation factor of O(log((n + m)/OPT)) + O(1) on the machine activation and assignment costs.

3.3.2 Rounding on G_1
Rounding on G_1 is similar to the case with no assignment costs, with a few modifications. We proceed in the same manner and obtain stars with job nodes at their centers. For each star S_j with j at its center, we consider all the machine nodes in S_j. If some machine i ∈ S_j is already open, we set its opening cost to 0. We then open the machine ℓ ∈ S_j for which a_ℓ + c_{ℓ,j} is minimum. By the same reasoning as in Subsection 2.6.3, the total cost increases by at most a factor of 1/(1 − 1/δ − 1/η). Optimizing α, β, γ, we get the following theorem.
Theorem 3.1. If there is a schedule with total machine activation and assignment cost OPT and makespan T, then a schedule with total cost O(log(n+m)/OPT + 1)·OPT and makespan at most (3 + ε)T can be constructed in polynomial time.
Note that both for minimizing the machine activation cost alone and for minimizing the activation and assignment costs simultaneously, the total cost is in fact bounded within a constant factor of log d, where d is the maximum degree (the number of edges of the bipartite graph incident to it) of any machine node in G_2.

The Greedy Algorithm
In this section, we present a greedy algorithm that achieves an approximation factor of (2, 1 + ln n). The algorithm is similar to the standard greedy algorithm for set cover and runs in iterations: in each iteration, the most "cost-effective" set, i.e., the set that maximizes the ratio of its incremental benefit to its cost, is chosen and added to the solution, until all elements are covered.
Given that a solution with activation cost A and makespan T exists, at each step we wish to select a machine to activate based on its "cost-effectiveness". Given a set S of active machines, let F(S) denote the maximum number of jobs that can be scheduled with makespan T. However, F(S) is NP-hard to compute, so we are unlikely to have efficient procedures either to test the feasibility of the current set of active machines or to find the most cost-effective machine to activate. The central idea is that instead of the integral function F(S), which is hard to compute, we use a fractional relaxation that is much easier to compute and still allows us to apply the greedy framework.
Formally, for a value T, we first set every p_{i,j} that is larger than T to infinity (equivalently, the corresponding x_{i,j} to 0). Let f(S) be the maximum number of jobs that can be fractionally processed by a set S of machines, each allowed to run for time T. In other words,

f(S) = max Σ_{i∈S} Σ_j x_{i,j}  subject to  Σ_{i∈S} x_{i,j} ≤ 1 for all j,  Σ_j p_{i,j} x_{i,j} ≤ T for all i ∈ S,  x ≥ 0.   (4.7)

Note that f(S) can be computed by a general LP solver or by a generalized flow computation. The generalized flow problem is the same as the traditional network flow problem except that each arc e has a gain factor γ(e): for each unit of flow that enters the arc, γ(e) units exit. To see that f can be computed by a generalized flow computation, we add a sink t to the bipartite graph G(M ∪ J, E) and connect each job to t with an arc of capacity 1. Each edge (i, j), i ∈ M, j ∈ J, has capacity p_{i,j} and gain factor 1/p_{i,j}. Every machine i ∈ S has a flow excess of T. It is easy to see that the maximum amount of flow reaching t is exactly the optimal value of LP (4.7).
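The generalized-flow construction can be written out as a small instance builder. The tuple encoding of arcs as (tail, head, capacity, gain) and the node-labelling scheme are assumptions made for illustration; the sketch only builds the network, it does not solve the generalized flow problem:

```python
def build_generalized_flow_network(S, jobs, p, T):
    """Construct the generalized-flow instance whose maximum flow into the
    sink t equals f(S).  Arcs are (tail, head, capacity, gain) tuples; each
    machine i in S starts with a flow excess of T."""
    arcs = []
    for i in S:
        for j in jobs:
            if p[(i, j)] != float("inf"):  # p_ij > T was already set to infinity
                # Sending p_ij units of machine time processes job j once,
                # hence gain factor 1/p_ij.
                arcs.append((("m", i), ("j", j), p[(i, j)], 1.0 / p[(i, j)]))
    for j in jobs:
        # Each job contributes at most one unit of flow to the sink.
        arcs.append((("j", j), "t", 1.0, 1.0))
    excess = {("m", i): T for i in S}
    return arcs, excess
```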
A function z : 2^N → R is submodular if z(S) + z(P) ≥ z(S ∩ P) + z(S ∪ P) for any S, P ⊆ N. Let z(S) be the maximum amount of flow that reaches t starting from the excesses at the nodes in S. Recently, Fleischer [9] proved that such a function z is submodular; it is a direct consequence that f(S) is submodular.
Define gain(i, S) = f(S ∪ {i}) − f(S) for any i ∈ M and S ⊆ M. Our greedy algorithm starts with an empty set S of active machines and, in each iteration, activates the machine i that maximizes gain(i, S)/a_i, until f(S) > n − 1. We then round the fractional solution to an integral one using the scheme of Shmoys and Tardos [20].
Finally, we activate the machines in the set S and round the fractional solution to an integral assignment.
The problem is actually a special case of the submodular set cover problem: min{ Σ_{j∈S} a_j : z(S) = z(N), S ⊆ N }, where z is a nondecreasing submodular function. In fact, Wolsey [23] shows the following result about the greedy algorithm, rephrased in our notation.
Theorem 4.1. The greedy algorithm produces a solution S_t of cost at most (1 + ln(z(N)/(z(N) − z(S_{t−1}))))·OPT, where S_{t−1} is the solution at the beginning of the final iteration and OPT is the optimal solution.
In particular, if f() is integer-valued, the theorem yields a (1 + ln n)-approximation. However, f() is not necessarily integral in our problem. We therefore terminate only when more than n − 1 (rather than n) fractional jobs are satisfied; thus f(M) − f(S_{t−1}) ≥ 1, and Theorem 4.1 gives a (1 + ln n)-approximation for the activation cost.
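As a minimal illustration of the greedy loop, consider the special case where every finite p_{i,j} equals T exactly, so each active machine completes at most one job by the deadline and f(S) reduces to the size of a maximum bipartite matching, which is integral and easy to compute. The sketch below makes that assumption; it is not the paper's general algorithm, which needs a generalized-flow or LP oracle for f:

```python
def max_matching(machines, adj):
    """Maximum bipartite matching via augmenting paths.  adj[i] lists the
    jobs machine i can run within time T (every finite p_ij equals T here,
    so an active machine completes at most one job)."""
    match = {}  # job -> machine

    def augment(i, seen):
        for j in adj.get(i, []):
            if j in seen:
                continue
            seen.add(j)
            if j not in match or augment(match[j], seen):
                match[j] = i
                return True
        return False

    return sum(1 for i in machines if augment(i, set()))

def greedy_activation(machines, jobs, adj, a):
    """Greedy machine activation: repeatedly open the machine maximising
    gain(i, S)/a_i = (f(S ∪ {i}) - f(S))/a_i until all jobs are covered."""
    S = []
    n = len(jobs)
    while max_matching(S, adj) < n:
        f_S = max_matching(S, adj)
        best = max((i for i in machines if i not in S),
                   key=lambda i: (max_matching(S + [i], adj) - f_S) / a[i])
        S.append(best)
    return S
```

Because f is integral in this special case, the loop can simply run until f(S) = n rather than f(S) > n − 1.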
Finally, we remark that the rounding step is guaranteed to find a feasible integral solution even though the fractional solution we start with only satisfies more than n − 1 jobs. The reason lies in the construction of Shmoys and Tardos (refer to [20] for more details): the fractional solution induces a fractional matching of value more than n − 1 in a bipartite graph, and since the bipartite matching polytope is integral, there exists an integral matching of value at least n, i.e., one in which all jobs are matched. Moreover, it is also proven there that the job assignment induced by any integral matching has makespan at most T + max_{i,j} p_{i,j}. Therefore, our final makespan is at most 2T.

Handling Release Times
Suppose each job j has a machine-dependent release time r_{i,j}, i.e., job j can only be processed on machine i after time r_{i,j}. We can modify the algorithm in Section 2 to handle release times as follows.
For any "guess" of the makespan T, we set x_{i,j} = 0 in the LP formulation whenever r_{i,j} + p_{i,j} > T. Then we run the (2 + ε, 2(1 + 1/ε)(ln n/OPT + 1))-approximation, ignoring the release times, and obtain a subset of active machines and an assignment of jobs to these machines. Suppose the subset J_i of jobs is assigned to machine i. We now schedule the jobs in J_i on machine i in order of release time. It is not hard to see that the makespan of machine i is at most T + Σ_{j∈J_i} p_{i,j}, since every job assigned to machine i is released by time T. Therefore, we get a (3 + ε, 2(1 + 1/ε)(ln n/OPT + O(1)))-approximation. Similar extensions can be done for the case with activation and assignment costs.
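Scheduling the jobs of one machine in release order can be sketched directly; the function name is hypothetical, and jobs are given as (release, processing) pairs for a single machine i:

```python
def machine_makespan(jobs):
    """Schedule the jobs assigned to one machine in order of release time.
    jobs is a list of (r_ij, p_ij) pairs; since the LP only allows jobs with
    r_ij + p_ij <= T, every job is released by time T, so the completion
    time is at most T plus the total processing time on the machine."""
    t = 0.0
    for r, p in sorted(jobs):
        t = max(t, r) + p  # wait for the release time if the machine is idle
    return t
```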

Scheduling with Outliers
We now consider the case where each job j has a profit π_j and we are not required to schedule all the jobs: some jobs can be dropped, but the total profit dropped must be at most Π′, so the total profit earned must be at least Σ_j π_j − Π′ = Π. We now show how, using our framework and a clever trick from [11], we can obtain a bound of (3 + ε) on the makespan and 2(1 + 1/ε)(ln n/OPT + 1) on the machine activation cost, while guaranteeing that profit at most Π′(1 + ε) is left unscheduled. If we consider both machine activation and assignment costs, then we obtain a total cost within O(log(n+m)/OPT + O(1)) of the optimum without altering the makespan and profit approximation factors.
We create a dummy machine dum with activation cost a_dum = 0 and, for all j, assignment cost c_{dum,j} = 0; the processing time of job j on dum is π_j. It is a simple exercise to show that both algorithms of the previous sections still work when the makespan constraint differs across machines: if the makespan constraint on machine i is T_i, then the makespan of machine i is at most (1 + ε)T_i + max_j p_{i,j}. For the dummy machine dum we set a makespan constraint of Π′, so after the final assignment the load on dum, i.e., the total dropped profit, is at most (1 + ε)Π′ + max_j π_j. With some work it can be shown that we can regain the lost profit by moving a job with maximum profit on dum either to an existing machine or to a newly opened machine; this either increases our cost slightly or increases the makespan to at most (3 + ε)T.
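The dummy-machine construction is a small instance transformation, sketched below; the dictionary encoding and the name `add_dummy_machine` are illustration choices, not the paper's notation:

```python
def add_dummy_machine(a, c, p, jobs, profit, Pi_drop):
    """Augment an instance with the dummy machine 'dum' used for outliers:
    activation cost 0, assignment cost 0 to every job, processing time
    pi_j for job j, and a per-machine makespan bound of Pi_drop on 'dum'.
    Jobs assigned to 'dum' are the dropped ones, so its load equals the
    total dropped profit."""
    a = dict(a)   # copy so the original instance is untouched
    c = dict(c)
    p = dict(p)
    a["dum"] = 0
    for j in jobs:
        c[("dum", j)] = 0
        p[("dum", j)] = profit[j]
    makespan_bounds = {"dum": Pi_drop}
    return a, c, p, makespan_bounds
```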

Minimizing Machine Activation Cost in Uniformly Related Machines
In this section, we show that for related parallel machines there is a polynomial-time (1 + ε, 1)-approximation for any ε > 0: if a schedule with activation cost A and makespan T exists, then we find a schedule with activation cost A and makespan at most (1 + ε)T.
We briefly sketch the algorithm, which is a slight generalization of the approximation scheme for makespan minimization on related parallel machines by Epstein and Sgall [7]. Their algorithm can in fact optimize a class of objective functions including, for example, the makespan and the L_p norm of the load vector. We discuss only the makespan objective here; the extensions to other objectives are straightforward.
Roughly speaking, Epstein and Sgall's algorithm works as follows (see [7] for detailed definitions and proofs). They define the notion of a principal configuration, a vector of constant dimension used to succinctly represent a set of jobs (after rounding their sizes). A principal configuration (see Appendix B for more details) is of the form (w, n), where w = 0 or w = 2^i for some integer i and n is a vector of non-negative integers. The number of distinct principal configurations is polynomially bounded (for any fixed ε > 0). They also construct a graph of configurations in which each vertex is of the form (i, α(A)) for 1 ≤ i ≤ m and principal configuration α(A) of a job set A ⊆ J. There is a directed edge from (i − 1, α) to (i, α′) if α′ represents a superset of the jobs that α represents; its length is the (1 + ε)-approximate ratio of the total weight of the jobs in the difference of the two sets to the speed s_i of machine i. Intuitively, an assignment J_1, . . . , J_m, with the jobs in J_i assigned to machine i, corresponds to a path P = {(i, α_i)}_i in G such that α_i represents ∪_{j=1}^{i} J_j and the length of edge ((i − 1, α_{i−1}), (i, α_i)) is approximately the load of machine i. By computing a path P in G from (0, α(∅)) to (m, α(J)) that minimizes the maximum length of any edge on P, we obtain a (1 + ε)-approximation for minimizing the makespan.
To obtain a (1 + ε, 1)-approximation for the machine activation problem, we slightly modify the above construction of the graph. The sets of vertices and edges are the same as before, but we associate a cost with each edge: if both endpoints of edge ((i − 1, α_{i−1}), (i, α_i)) have the same principal configuration, α_{i−1} = α_i, then the cost of the edge is 0; otherwise, the cost is the activation cost a_i of machine i. For a guess T^# of the makespan, we compute a path from (0, α(∅)) to (m, α(J)) such that the maximum length of any edge on the path is at most T^# and the total cost is minimized. If T^# ≥ (1 + ε)T, where T is the optimal makespan, we are guaranteed to find a path of cost at most A.
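The final search is a min-cost path computation restricted to edges of length at most the guess T^#. A standard Dijkstra sketch over a hypothetical edge-list encoding (tail, head, length, cost) suffices:

```python
import heapq

def min_cost_path_under_bottleneck(edges, source, target, T_sharp):
    """Dijkstra on the configuration graph restricted to edges of length at
    most T_sharp (the guessed makespan).  Returns the minimum total edge
    cost of a source-target path, or None if no feasible path exists."""
    adj = {}
    for u, v, length, cost in edges:
        if length <= T_sharp:          # discard edges violating the guess
            adj.setdefault(u, []).append((v, cost))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                    # stale heap entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None
```

Running this for each guess T^# (over the polynomially many candidate values) and taking the smallest feasible guess with path cost at most A yields the claimed (1 + ε, 1) guarantee.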

Conclusions
Current research includes considering different L_p norms as well as other measures such as weighted completion time. The greedy approach currently works only for the most basic version, giving a makespan of 2T and an activation cost of O(log n)·A. Extending it to handle other generalizations of the basic problem is ongoing research.