Energy Efficient Scheduling and Routing via Randomized Rounding

We propose a unifying framework based on configuration linear programs and randomized rounding for different energy optimization problems in the dynamic speed-scaling setting. We apply our framework to various scheduling and routing problems in heterogeneous computing and networking environments. We first consider the energy minimization problem of scheduling a set of jobs on a set of parallel speed-scalable processors in a fully heterogeneous setting. For both the preemptive non-migratory and the preemptive migratory variants, our approach allows us to obtain solutions of almost the same quality as for the homogeneous environment. By exploiting the result for the preemptive non-migratory variant, we are able to improve the best known approximation ratio for the single processor non-preemptive problem. Furthermore, we show that our approach yields a constant-factor approximation algorithm for the power-aware preemptive job shop scheduling problem. Finally, we consider the min-power routing problem, where we are given a network modeled by an undirected graph and a set of uniform demands that have to be routed on integral routes from their sources to their destinations so that the energy consumption is minimized. We improve the best known approximation ratio for this problem.


Introduction
We focus on energy minimization problems in heterogeneous computing and networking environments in the dynamic speed-scaling setting. For many years, the exponential increase of processors' frequencies followed Moore's law. This is no longer possible because of physical (thermal) constraints. Today, designers improve the performance of modern computing systems through parallelism, i.e., multiple cores running at lower frequencies but offering better overall performance than a single core. These systems can be either homogeneous, where an identical core is used many times, or heterogeneous, combining general-purpose and special-purpose cores. Heterogeneity offers the possibility of further improving the performance of the system by executing each job on the most appropriate type of processor [19]. However, in order to exploit the opportunities offered by heterogeneous systems, it is essential to focus on the design of new efficient power-aware algorithms that take into account the heterogeneity of these architectures. In this direction, Gupta et al. [24] have studied the impact of the introduction of heterogeneity on the difficulty of various power-aware scheduling problems.
In this paper, we show that rounding configuration linear programs helps in handling the heterogeneity of both the jobs and the processors. We adopt one of the main mechanisms for reducing the energy consumption in modern computer systems, which is based on the use of speed-scalable processors. Starting from the seminal paper of Yao et al. [30], many papers adopted the speed-scaling model in which, if a processor runs at speed s, then the rate of energy consumption, i.e., the power, is P(s) = s^α with α a constant close to 3 (newer studies suggest that α is rather smaller: 1.11 for the Intel PXA 270, 1.62 for the Pentium M770 and 1.66 for a TCP offload engine [29]). The energy consumption is the integral of the power over time. This model captures the intuitive idea that the faster a processor works, the more energy it consumes per unit of work.
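As a quick numerical illustration of this model (not from the paper; `energy` is a hypothetical helper and the numbers are arbitrary), the sketch below computes the energy p·(w/p)^α of running w units of work at constant speed over p time units, and shows that stretching the same work over a longer period saves energy when α > 1:

```python
def energy(work, proc_time, alpha):
    """Energy to run `work` units at constant speed over `proc_time` time units:
    speed s = work/proc_time, power s**alpha, energy = power * time."""
    speed = work / proc_time
    return proc_time * speed ** alpha

# With alpha = 3: running 10 units of work twice as slowly divides the
# energy by 4 (from 1000 to 250), illustrating the convexity of s**alpha.
fast = energy(10.0, 1.0, 3.0)
slow = energy(10.0, 2.0, 3.0)
assert slow < fast
```

This convexity is exactly why, in the analyses below, a job always runs at a constant speed within its allotted time.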
We first consider a fully heterogeneous environment where the jobs' characteristics are processor-dependent and every processor has its own power function. Formally, we consider the following problem: we are given a set J of n jobs and a set P of m parallel processors. Every processor i ∈ P obeys a different speed-to-power function, i.e., it is associated with a different α_i ≥ 1, and hence if a job runs at speed s on processor i, then the power is P_i(s) = s^{α_i}. Each job j ∈ J has a different release date r_{i,j}, deadline d_{i,j} and workload w_{i,j} if it is executed on processor i ∈ P. The goal is to find a schedule of minimum energy respecting the release dates and the deadlines of the jobs.
The assumption that the jobs have processor-dependent workloads covers, for example, the problem of scheduling in the restricted assignment model (see [28]). In this model each job is associated with a subset of processors and has to be executed on one of them. Clearly, in our model the work of each job is the same on the processors of its corresponding subset and infinite on the remaining processors. Moreover, processor-dependent release dates have already been studied in the literature when the processors are connected by a network. In such a case, it is assumed that every job is initially available at a given processor and that a transfer time must elapse before it becomes available at a new machine [11,20].
In this paper we propose a unifying framework for minimizing energy in different heterogeneous computing and networking environments. We first consider two variants of the heterogeneous multiprocessor preemptive problem. In both cases, the execution of a job may be interrupted and resumed later. In the non-migratory case each job has to be entirely executed on a single processor. In the migratory case each job may be executed on more than one processor, without allowing parallel execution of a job. We also focus on the non-preemptive single processor case. Furthermore, we consider the energy minimization problem in a heterogeneous job shop environment where the jobs can be preempted. Finally, we study the min-power routing problem, introduced in [6], where a set of uniform demands have to be routed on integral routes from their sources to their destinations so that the energy consumption is minimized. We believe that our general techniques will find further applications in energy optimization.

Related Work
Yao et al. [30] proposed an optimal algorithm for finding a feasible preemptive schedule with minimum energy consumption when a single processor is available.
The homogeneous multiprocessor case has been solved optimally in polynomial time when both the preemption and the migration of jobs are allowed [4,8,14,18]. Albers et al. [5] considered the homogeneous multiprocessor preemptive problem where the migration of jobs is not allowed. They proved that the problem is NP-hard even for instances with common release dates and common deadlines. Greiner et al. [22] gave a generic reduction transforming an optimal schedule for the homogeneous multiprocessor problem with migration into a B_⌈α⌉-approximate solution for the homogeneous multiprocessor preemptive problem without migration, where B_⌈α⌉ is the ⌈α⌉-th Bell number.
Antoniadis and Huang [9] proved that the single processor non-preemptive problem is NP-hard even for instances in which, for any two jobs j and j′ with r_j ≤ r_{j′}, it holds that d_j ≥ d_{j′}. They also proposed a 2^{5α−4}-approximation algorithm for general instances. For the homogeneous multiprocessor non-preemptive case, an approximation algorithm of ratio m^α (n^{1/m})^{α−1} has been proposed in [13]. Andrews et al. [6] studied the min-power routing problem for slightly heterogeneous power functions. The heterogeneity here is not expressed in the exponent but by a multiplicative factor, i.e., the power is of the form c_i s^α, where c_i is a constant characterizing the component i. For uniform demands, i.e., in the case where all the demands have the same value, they proposed a γ-approximation algorithm, where γ = max{1 + τ·2^{α(τ+1)} log e, 2 + τ·2^{α(τ+1)}}, with τ = 2 log(α + 4). For non-uniform demands, they proposed an O(log^{α−1} D)-approximation algorithm, where D is the maximum value of the demands. Gupta et al. [25] improved the above result by presenting an algorithm whose competitive ratio depends only on α for the homogeneous case. There also exist works for the more enhanced setting where the network components are speed scalable and may be shut down when idle. More specifically, the power of a link is of the form σ + f^α, where σ is the static power, which is paid whenever the link is used, and f is the number of demands using the link. For the slightly heterogeneous setting where the power functions of the edges are of the form c_i(σ + f^α), Andrews et al. [7] derived a poly-logarithmic approximation algorithm, which was later improved by Antoniadis et al. [10], who proposed an improved and simpler O(log^α k)-approximation algorithm.
For further results on energy-efficient scheduling we refer the interested reader to the reviews [2, 3, 21].

Notation
We denote by E(S) the total energy consumed by a schedule S. Moreover, we denote by S* an optimal schedule and by OPT the energy consumption of S*. For each job j ∈ J, we say that j is alive on processor i ∈ P during the interval [r_{i,j}, d_{i,j}]. Let α = max_{i∈P} {α_i}. The Bell number, B_n, is defined for any integer n ≥ 0 and corresponds to the number of partitions of a set of n items. It is well known that the Bell numbers satisfy the following equality, known as Dobinski's formula:

B_n = (1/e) · Σ_{k=0}^{∞} k^n / k!.

Another way to state this formula is that the n-th Bell number is equal to the n-th moment of a Poisson random variable with parameter (expected value) 1. This naturally leads to a more general definition. The generalized Bell number, denoted by B̃_α and defined for any α ∈ R^+ as

B̃_α = (1/e) · Σ_{k=0}^{∞} k^α / k!,

corresponds to the α-th (fractional) moment of a Poisson random variable with parameter 1. Note that the ratios of our algorithms depend on the generalized Bell number, while the previously known ones [22] depend on the standard Bell number.
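These definitions are easy to check numerically. The sketch below (illustrative only; `bell_dobinski` is a hypothetical helper) evaluates a truncated Dobinski series in log-space to avoid overflow; for integer α it recovers the classical Bell numbers B_1 = 1, B_2 = 2, B_3 = 5, and for fractional α it gives the generalized Bell number B̃_α appearing in our ratios:

```python
import math

def bell_dobinski(alpha, terms=100):
    """Truncated Dobinski series (1/e) * sum_{k>=1} k**alpha / k!.

    Each term is computed as exp(alpha*log k - log k! - 1), using
    lgamma(k+1) = log k!, so huge factorials never overflow a float.
    """
    return sum(math.exp(alpha * math.log(k) - math.lgamma(k + 1) - 1.0)
               for k in range(1, terms))

# Integer alpha recovers the classical Bell numbers (set-partition counts).
assert round(bell_dobinski(3)) == 5
# A fractional moment, e.g. alpha = 2.5, lies between B_2 = 2 and B_3 = 5.
assert 2 < bell_dobinski(2.5) < 5
```

The series converges extremely fast, so a few dozen terms already give full floating-point accuracy.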

Our Contribution
In this paper we formulate heterogeneous scheduling and routing problems using configuration linear programs (LPs) and we apply randomized rounding.
In Section 3, we consider the heterogeneous multiprocessor speed-scaling problem without migrations and we propose an approximation algorithm of ratio (1 + ε)B̃_α based on a configuration LP. As this LP has an exponential number of variables, we give an alternative (compact) formulation of the problem using a polynomial number of variables and we prove the equivalence between the two LP relaxations. For real values of α, our result improves the B_⌈α⌉ approximation ratio of [22] for the homogeneous case to (1 + ε)B̃_α for the fully heterogeneous environment that we consider here (see Table 1).
In Section 4, using again a configuration LP formulation, we present an algorithm for the heterogeneous multiprocessor speed-scaling problem with migration. This algorithm returns a solution which is within an additive factor of ε of the optimal solution and runs in time polynomial in the size of the instance and in 1/ε. This result generalizes the results of [4,8,14,18] from a homogeneous environment to a fully heterogeneous one.
In Section 5, we transform the single processor speed-scaling problem without preemptions into the heterogeneous multiprocessor problem without migrations and we give an approximation algorithm of ratio 2^{α−1}(1 + ε)B̃_α. This result improves upon the previously known 2^{5α−4}-approximation algorithm of [9] for any α < 114 (see Table 1).
In Section 6, we study the power-aware preemptive job shop scheduling problem and we propose a ((1 + ε)B̃_α)-approximation algorithm.
Finally, in Section 7, we improve the analysis for the min-power routing problem with uniform demands given in [6], based on the randomized rounding analysis that we propose in this paper. Our approach gives an approximation ratio of B̃_α, significantly improving the analysis given in [6] (see Table 1).

Technical Probabilistic Propositions
In this section, we state and prove a series of technical propositions which are key ingredients in our analysis. Proposition 1 bounds the expectation of a specific function of random variables.

Table 1: Comparison of our approximation ratios vs. the best previously known ratios for: (i) the preemptive multiprocessor problem without migrations, (ii) the single processor non-preemptive problem, and (iii) the min-power routing problem.
Proposition 1. Consider n random variables X_1, X_2, ..., X_n and let α > 1.

Proposition 2. Let S be a random set generated by choosing each element e_j, 1 ≤ j ≤ n, independently at random with probability Y_j. Moreover, let e_{n+1} = e_n, and let S′ be a random set generated by choosing each element of {e_1, e_2, ..., e_{n+1}} independently at random, with probabilities Y_1, ..., Y_{n−1}, Y′_n, Y′_{n+1}, where Y′_n + Y′_{n+1} = Y_n. Then E[(Σ_{e∈S} e)^α] ≤ E[(Σ_{e∈S′} e)^α].

Proof. Let Pr(T) be the probability that exactly the constants in the set T are chosen among the constants in U = {e_1, e_2, ..., e_{n−1}}, that is, Pr(T) = Π_{e_j∈T} Y_j · Π_{e_j∈U\T} (1 − Y_j). Conditioning on each set T and using the fact that Y_n = Y′_n + Y′_{n+1}, the claimed inequality can be rewritten in an equivalent per-set form. If either Y′_n = 0 or Y′_{n+1} = 0, the inequality is clearly true. Otherwise, the inequality holds due to the convexity of the function g(x) = x^α: for any x, y > 0 and θ ∈ [0, 1], it must be the case that g(θx + (1 − θ)y) ≤ θ g(x) + (1 − θ) g(y). For x = A, y = A + 2B and θ = 1/2, we get (A + B)^α ≤ (A^α + (A + 2B)^α)/2.

Proposition 3 is a corollary of the generalized means inequality.
Proposition 4. For any convex function f, E[f(B_a)] ≤ E[f(P_a)], where B_a is a sum of n independent Bernoulli random variables with E[B_a] = a, and P_a is a Poisson random variable with parameter a.
Proof. To upper bound the expected value of f(B_a), we need the following probabilistic fact, which was first proved by Hoeffding [26] for finite sums of Bernoulli random variables and was later generalized to more general distributions by Berend and Tassa [17].

Proposition 5 ([17, 26]). Let X be a sum of t independent Bernoulli random variables with E[X] = µ, where t may be infinite, and let f be a convex function. Then E[f(X)] ≤ E[f(Y)], where Y is a binomial random variable with distribution Y ∼ B(t, µ/t) in case t < ∞, and a Poisson random variable with distribution Y ∼ P(µ) otherwise.

We define a random variable B′_a as the sum of B_a and an infinite sequence of Bernoulli random variables Y_j, j = n + 1, n + 2, ..., such that Pr(Y_j = 1) = 0. Obviously, B′_a coincides with B_a. Since the function f is convex, we can apply Proposition 5 to B′_a with parameter t = ∞ and the statement follows.
Proposition 6 estimates the moments of a Poisson random variable with parameter λ through the moments of a Poisson random variable with parameter 1.
Proposition 6. For any real α ≥ 1 and a Poisson random variable P_λ with parameter λ ≥ 0, we have: (a) E[P_λ^α] ≤ λ · E[P_1^α] if λ ≤ 1, and (b) E[P_λ^α] ≤ λ^α · E[P_1^α] if λ > 1.

Proof. For λ = 0 the statement is trivial, so assume λ > 0. Part (a) follows by a direct computation on the series defining the two moments. (b) For the case where λ > 1, we use two basic facts. The first fact is that, given a Poisson random variable X_1 with parameter λ_1 and a Poisson random variable X_2 with parameter λ_2 that are mutually independent, the random variable X_1 + X_2 is a Poisson random variable with parameter λ_1 + λ_2. The second fact is that for any random variable X the quantity E[|X|^p]^{1/p} defines a norm and therefore satisfies the triangle inequality (Minkowski's norm inequality), i.e., ||X + Y||_p ≤ ||X||_p + ||Y||_p.

Assume first that λ = A/B is a rational number, with A, B ∈ Z^+ and A > B ≥ 1. Let X be a Poisson random variable with parameter λ, and let Y_1, ..., Y_A be independent Poisson random variables with parameter 1/B, so that X has the same distribution as Y_1 + ... + Y_A; for S ⊆ {1, ..., A}, let Y_S = Σ_{i∈S} Y_i. Let P_1 be a Poisson random variable with parameter 1. Applying the triangle inequality to this decomposition yields the inequality in the statement of the lemma for any rational value of λ > 1.

To derive the inequality for any real λ > 1, we apply a standard limiting argument: every real number is the limit of a sequence of rationals, the inequality holds for each rational value in the sequence, and therefore it must hold for the real value of λ as well.
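Part (a) of Proposition 6, in the form E[P_λ^α] ≤ λ · E[P_1^α] for λ ≤ 1 (the form used later in the analysis of Theorem 1), can be sanity-checked numerically. The sketch below (illustrative only; `poisson_moment` is a hypothetical helper) evaluates the α-th Poisson moment by a truncated series, computed in log-space to avoid overflow:

```python
import math

def poisson_moment(alpha, lam, terms=100):
    """E[P_lam**alpha] = sum_{k>=1} k**alpha * e**(-lam) * lam**k / k!,
    truncated; each term computed as exp(...) with lgamma(k+1) = log k!."""
    return sum(math.exp(alpha * math.log(k) + k * math.log(lam)
                        - lam - math.lgamma(k + 1))
               for k in range(1, terms))

# Proposition 6(a): for lam <= 1, E[P_lam^alpha] <= lam * E[P_1^alpha].
alpha = 2.5
for lam in (0.1, 0.5, 1.0):
    assert poisson_moment(alpha, lam) <= lam * poisson_moment(alpha, 1.0) + 1e-9
```

At λ = 1 the bound is tight, and for α = 1 it reduces to the identity E[P_λ] = λ.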

Heterogeneous Multiprocessor without Migrations
In this section we consider the case where the migration of jobs is not permitted but their preemption is allowed. The corresponding homogeneous problem is known to be NP-hard even if all jobs have common release dates and deadlines [5]. We propose an approximation algorithm by formulating the problem as a configuration integer program (IP) with an exponential number of variables and a polynomial number of constraints. Given an optimal solution of the configuration LP relaxation, we apply randomized rounding to get a feasible schedule for our problem. In order to get a polynomial-time algorithm, we present another (compact) formulation of our problem with a polynomial number of variables and constraints. Then, we show that any feasible solution of the configuration LP relaxation can be transformed into a feasible solution of the compact LP relaxation with the same energy consumption, and vice versa.

Linear Programming Relaxation
In order to formulate our problem as a configuration IP we need to discretize time. In the following lemma we assume that the release dates and the deadlines of all jobs on all processors are integers.
Lemma 1. There is a feasible schedule with energy consumption at most ((1 + ε/(1−ε))(1 + 2/(n−2)))^α · OPT in which each piece of each job j ∈ J (where j is executed on processor i ∈ P) starts and ends at a time point of the form r_{i,j} + k · (ε/n³)(d_{i,j} − r_{i,j}), where k ≥ 0 is an integer and ε ∈ (0, 1).

Proof. We first transform an optimal schedule S* into a feasible schedule S′ in which the execution time of each job j ∈ J executed on processor i ∈ P is at least (ε/n)(d_{i,j} − r_{i,j}). As the release dates and the deadlines are integers, we can divide the time into unit-length slots. We obtain the schedule S′ from S* as follows. In each unit slot we increase the processors' speeds so as to create an idle period of length ε. This can be done by increasing the speeds by a factor of 1 + ε/(1−ε), and hence the total energy consumption in S′ is increased by a factor of (1 + ε/(1−ε))^α. For each job j ∈ J, we reserve a period of length ε/n in each unit slot of (r_{i,j}, d_{i,j}] on the processor on which j was executed in S*. In S′, we decrease the speed of j so that its total work is executed during the periods where j was executed in S* plus the additional reserved periods, of total length (ε/n)(d_{i,j} − r_{i,j}). Therefore, in the resulting schedule the processing time of each job j ∈ J is at least (ε/n)(d_{i,j} − r_{i,j}). After this transformation, we apply the Earliest Deadline First (EDF) policy on each processor separately, with respect to the set of jobs assigned to this processor in S* and the speeds defined above. This ensures that each job is preempted at most n times, since in EDF a job may be interrupted only when another job is released.

Next, we transform S′ into a new schedule S satisfying the statement of the lemma. For each job j ∈ J which is executed on processor i ∈ P, we split the interval (r_{i,j}, d_{i,j}] into slots of the form (r_{i,j} + k · (ε/n³)(d_{i,j} − r_{i,j}), r_{i,j} + (k+1) · (ε/n³)(d_{i,j} − r_{i,j})], where k ≥ 0 is an integer. As the processing time of j in S′ is at least (ε/n)(d_{i,j} − r_{i,j}), the execution of j spans at least n² slots. In each of these slots, j is either executed during the whole slot or during a fraction of it. As we have applied the EDF policy, the job j is preempted at most n times, and hence at most 2n of these slots are not completely covered by j: for each preempted piece of j, at most two slots, the first and the last of its execution, may not be completely covered. We modify S′ into the schedule S in which j is executed only in the slots that it covered entirely in S′. The number of these slots is at least n² − 2n. Thus, we have to increase the speed of j by a factor of at most n²/(n² − 2n) = 1 + 2/(n−2), and hence the energy is increased by a factor of at most (1 + 2/(n−2))^α. Taking into account that S′ is within a factor of (1 + ε/(1−ε))^α of the optimal, the lemma follows.

Let S be a schedule that satisfies Lemma 1 and let j ∈ J be a job executed on processor i ∈ P in S. The above lemma implies that the interval (r_{i,j}, d_{i,j}] can be partitioned into a polynomial (in n and 1/ε) number of equal-length slots. In each of these slots, j is either executed during the whole slot or not executed at all. In what follows we consider only schedules that satisfy Lemma 1.
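To make the discretization concrete, here is a small sketch (with made-up numbers; `slot_boundaries` is a hypothetical helper, not from the paper) that lists the admissible start/end points r_{i,j} + k · (ε/n³)(d_{i,j} − r_{i,j}) of a job's pieces:

```python
def slot_boundaries(r, d, n, eps):
    """Boundaries r + k*(eps/n**3)*(d - r) cutting the feasibility interval
    (r, d] of a job into n**3/eps equal-length slots."""
    num_slots = round(n ** 3 / eps)
    length = (d - r) * eps / n ** 3
    return [r + k * length for k in range(num_slots + 1)]

# Toy job alive in (0, 8] with n = 2 jobs and eps = 0.5:
# n^3/eps = 16 slots of length 0.5 each, hence 17 boundary points.
pts = slot_boundaries(r=0.0, d=8.0, n=2, eps=0.5)
assert len(pts) == 17
assert pts[0] == 0.0 and abs(pts[-1] - 8.0) < 1e-9
```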
A configuration c is a schedule for a single job on a single processor. Specifically, a configuration determines the slots, with respect to Lemma 1, during which one job is executed. Given a configuration c for a job j ∈ J, the execution time of j is equal to the number of slots in c multiplied by the length of the slot. Due to the convexity of the speed-to-power function, in a minimum-energy schedule that satisfies Lemma 1, the job j runs at a constant speed s_j; hence, s_j is equal to the work of j divided by its execution time. Let C_{ij} be the set of all possible feasible configurations of a job j ∈ J on a processor i ∈ P.
In order to ensure the feasibility of our schedule we need to further partition the time, by merging the slots of all jobs. Given a processor i ∈ P, consider the time points of the form r_{i,j} + k · (ε/n³)(d_{i,j} − r_{i,j}) of all jobs, and let t_{i,1}, t_{i,2}, ..., t_{i,ℓ_i} be the ordered sequence of these time points. Consider now the intervals (t_{i,p}, t_{i,p+1}], 1 ≤ p ≤ ℓ_i − 1. In a schedule that satisfies Lemma 1, in each such interval either there is exactly one job that is executed during the whole interval or the interval is idle. Note also that these intervals need not have the same length. Let I be the set of all these intervals over all processors.
We introduce a binary variable x_{i,j,c} that is equal to one if the job j ∈ J is entirely executed on the processor i ∈ P according to the configuration c, and zero otherwise. Note that, given the configuration c and the processor i on which the job j is executed, we can compute the energy consumption E_{i,j,c} of j. For ease of notation, we write I ∈ (i, j, c) if the interval I ∈ I is included in the configuration c of processor i ∈ P for the job j ∈ J, that is, there is a slot of c that contains I. The configuration IP is the following:

min Σ_{i,j,c} E_{i,j,c} · x_{i,j,c}
Σ_{i∈P} Σ_{c∈C_{ij}} x_{i,j,c} = 1    ∀ j ∈ J    (1)
Σ_{j,c : I∈(i,j,c)} x_{i,j,c} ≤ 1    ∀ i ∈ P, I ∈ I    (2)
x_{i,j,c} ∈ {0, 1}    ∀ i ∈ P, j ∈ J, c ∈ C_{ij}    (3)

Inequality (1) enforces that each job is entirely executed according to exactly one configuration. Inequality (2) ensures that at most one job is executed in each interval (t_{i,p}, t_{i,p+1}]. We next relax the constraints (3) to x_{i,j,c} ≥ 0. Since the structure of this LP is quite simple, we can define an equivalent compact LP relaxation with a polynomial number of constraints and variables; we describe how to do this in Section 3.3. For now we assume that we can find an optimal solution of our configuration LP in polynomial time.
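On a toy instance the configuration IP can be solved by brute force (all numbers below are hypothetical: a single processor with α = 3, unit-length slots, and a configuration being any nonempty subset of a job's alive slots). The sketch enumerates one configuration per job, rejects choices that put two jobs in the same slot (constraint (2)), and keeps the minimum total energy:

```python
from itertools import combinations, product

ALPHA = 3.0  # common power exponent for this toy single-processor instance

def configs(allowed):
    """All nonempty slot subsets of a job's alive window: its configurations."""
    return [frozenset(c) for k in range(1, len(allowed) + 1)
            for c in combinations(allowed, k)]

def config_energy(work, num_slots):
    # The job runs at constant speed work/num_slots over num_slots unit slots.
    return num_slots * (work / num_slots) ** ALPHA

def solve(jobs):
    """Brute-force the configuration IP: one configuration per job (constraint
    (1)), no slot used twice (constraint (2)), minimum total energy."""
    best = float("inf")
    for choice in product(*(configs(slots) for _, slots in jobs)):
        if sum(len(c) for c in choice) == len(frozenset().union(*choice)):
            best = min(best, sum(config_energy(w, len(c))
                                 for (w, _), c in zip(jobs, choice)))
    return best

# Two unit-work jobs with overlapping alive windows {0,1} and {1,2}: the
# optimum gives one job two slots (energy 0.25) and the other one (energy 1).
assert abs(solve([(1.0, (0, 1)), (1.0, (1, 2))]) - 1.25) < 1e-9
```

The real formulation avoids this exponential enumeration precisely by solving the LP relaxation instead.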

Randomized Rounding
Now we show how to apply randomized rounding to get an approximation algorithm for our problem. Recall that, by definition, an interval I ∈ I corresponds to a single processor i ∈ P. Our algorithm (Algorithm 1) first computes an optimal solution of the relaxed LP and then, independently for each job j ∈ J, chooses a processor i and a configuration c with probability x_{i,j,c} and schedules j on i according to c.

Theorem 1. Assume that α_i ≥ 1 for all i = 1, ..., m. Algorithm 1 achieves an approximation ratio of ((1 + ε/(1−ε))(1 + 2/(n−2)))^α · B̃_α for the heterogeneous multiprocessor preemptive speed-scaling problem without migrations, in time polynomial in n and 1/ε, where α = max_{i∈P} α_i and ε ∈ (0, 1).
Proof. For each interval I ∈ I, we estimate the expected energy consumption during I. So, in the remainder of the proof, we fix such an interval (and processor).
Initially, the algorithm computes an optimal solution of the relaxed LP. For each job j ∈ J, let n_j be the number of non-zero variables x_{i,j,c} such that I ∈ (i, j, c). Note that every such variable corresponds to some configuration c such that, if the job j is executed according to c, then it must be executed during I. For notational convenience, let X_{j,k}, 1 ≤ k ≤ n_j, be the k-th of these non-zero variables and s_{j,k} the corresponding speed. The probability that the job j is executed during I in the algorithm's schedule is Y_j = Σ_{k=1}^{n_j} X_{j,k}. If the job j is entirely executed according to the configuration which corresponds to the variable X_{j,k}, then its energy consumption during I is e_{j,k} = |I| · s_{j,k}^{α_i}. The energy consumption achieved by the optimal solution of the LP relaxation during I is LP*_I = Σ_{j=1}^{n} Σ_{k=1}^{n_j} e_{j,k} X_{j,k}. Assume that the randomized rounding assigns exactly the jobs in the set S to be processed during the interval I; the probability of such an event is Pr(S) = Π_{j∈S} Y_j · Π_{j∉S} (1 − Y_j). Let E(S) be the expected energy consumption during I under the condition that exactly the jobs in the set S are executed during I. Then, the expected energy consumption E_I of our algorithm during I can be expressed as E_I = Σ_{S⊆J} Pr(S) · E(S). We now estimate E(S). Let U(S) be the set of all combinations of pairs (j, k) that we can choose in order to schedule exactly the jobs in the set S ⊆ J during I. If the algorithm schedules the jobs in S during I according to the pairs in U ∈ U(S), then the total energy consumption during I in the algorithm's schedule is |I| · (Σ_{(j,k)∈U} s_{j,k})^{α_i}. For each job j ∈ S, we denote by ẽ_j a random variable which takes the value e_{j,k} with probability X_{j,k}/Y_j, and we set ē_j = E[ẽ_j].

The expected energy consumption during I can then be upper bounded accordingly. We may assume that there exists a sufficiently large Q ∈ N such that Y_j = q_j/Q, 1 ≤ j ≤ n, for some q_j ∈ N (we make no assumption on the encoding length of these numbers; they are used only for the analysis), since these values come from solving an LP with rational coefficients. Let Y = 1/Q and q = Σ_{j=1}^{n} q_j; note that q ≤ Q. By applying Proposition 2 iteratively, we split each Y_j into q_j pieces of probability 1/Q each, and then apply Proposition 3. By changing the order of the sums and using the fact that (q−1 choose k−1) is the number of sets of cardinality k that contain a given element j, we obtain a bound in terms of B_{q/Q}, a random variable with expectation q/Q which corresponds to the sum of q i.i.d. Bernoulli random variables. Therefore, E_I ≤ LP*_I · E[P_1^{α_i}], where the second inequality follows from Proposition 4 and the last inequality follows from Proposition 6(a). Summing over all intervals and processors, and since α = max_{i∈P} α_i, we get E ≤ LP* · E[P_1^α] = LP* · B̃_α. The theorem follows.
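The per-interval accounting can be simulated directly. In the sketch below (toy numbers, not from the paper), three jobs each land in a unit-length interval I independently with their LP marginals Y_j; all jobs that land there share I, so their speeds add up and the energy spent in I is (Σ s_j)^α. The empirical mean stays below B̃_3 · LP*_I = 5 · LP*_I, matching the guarantee of the analysis:

```python
import random

ALPHA = 3.0
Y = [0.5, 0.3, 0.2]   # LP marginals of three jobs on interval I (sum <= 1)
S = [1.0, 1.0, 1.0]   # speeds of the corresponding configurations, |I| = 1

rng = random.Random(0)

def energy_in_interval():
    # Jobs that land in I are squeezed into it, so their speeds add up.
    total_speed = sum(s for y, s in zip(Y, S) if rng.random() < y)
    return total_speed ** ALPHA

samples = 200_000
mean = sum(energy_in_interval() for _ in range(samples)) / samples
lp_cost = sum(y * s ** ALPHA for y, s in zip(Y, S))   # LP*_I = 1.0
bell_3 = 5                                            # B~_3 = B_3 = 5
assert mean <= bell_3 * lp_cost
```

For these marginals the exact expectation is 3.04, comfortably below the worst-case bound of 5; the bound becomes tight only as the probability mass spreads over many tiny pieces, which is exactly the Poisson limit in the proof.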
Remark: Let f_i be the contribution of the variables x_{i,j,c} of processor i to the value of the optimal LP solution, i.e., Σ_{i∈P} f_i = f, where f is the optimal value of the configuration LP. One can refine the analysis of Theorem 1 and show an approximation factor of Σ_{i∈P} (f_i/f) · B̃_{α_i}.

Compact Linear Programming Relaxation
Next, we define a compact integer programming formulation for the problem without migrations and we show that any feasible solution of its LP relaxation can be transformed into a feasible solution of the LP relaxation of the configuration integer program presented before, and vice versa. Recall that, by Lemma 1, there is always a ((1 + ε/(1−ε))(1 + 2/(n−2)))^α-approximate schedule for our problem such that, if the job j ∈ J is executed on the processor i ∈ P, then its feasibility interval (r_{i,j}, d_{i,j}] can be partitioned into equal-length slots. Given such a slot t, j is either executed during the whole of t or not at all during t. The number of these slots is n³/ε, and each slot t has length (ε/n³)(d_{i,j} − r_{i,j}). Recall also that I denotes the set of all intervals obtained by merging the slots of all jobs.
In order to formulate our problem as a compact LP, we introduce a binary variable y_{i,j,q} which is equal to one if and only if the job j is executed on the processor i during exactly q slots, and zero otherwise. Moreover, we introduce a binary variable z_{i,j,q,t} which is equal to one if and only if the job j is executed on the processor i during the slot t and is executed during exactly q slots in total; otherwise, z_{i,j,q,t} is equal to zero. We define p_{i,j,q} and E_{i,j,q} as the total execution time and the energy consumption, respectively, of the job j if it is entirely executed on the processor i during exactly q slots. The compact IP is the following:

min Σ_{i,j,q} E_{i,j,q} · y_{i,j,q}
Σ_{i∈P} Σ_q y_{i,j,q} = 1    ∀ j ∈ J    (4)
Σ_t z_{i,j,q,t} = q · y_{i,j,q}    ∀ i ∈ P, j ∈ J, q    (5)
Σ_{j,q} Σ_{t : I⊆t} z_{i,j,q,t} ≤ 1    ∀ i ∈ P, I ∈ I    (6)
y_{i,j,q}, z_{i,j,q,t} ∈ {0, 1}    ∀ i, j, q, t    (7)

The constraint (4) ensures that each job is entirely executed on some processor. The constraint (5) establishes the relationship between the variables z_{i,j,q,t} and y_{i,j,q}: if y_{i,j,q} = 1, then exactly q variables z_{i,j,q,t} must be equal to one. The constraint (6) enforces that at most one job is executed on each processor at each time. Specifically, given a job j ∈ J which is executed on the processor i ∈ P, if j is executed during the slot t ∈ {1, 2, ..., n³/ε}, then j is executed during every interval I ∈ I such that I ⊆ t. Note that the numbers of both the variables and the constraints of the above LP are polynomial in n and 1/ε.
The configuration and the compact formulations are equivalent, as they both lead to a minimum-energy schedule satisfying Lemma 1. Consider now the LPs that occur if we relax constraints (3) and (7), respectively. In Lemma 2 we prove that the equivalence also holds for these relaxations, through a transformation of a solution of the configuration LP relaxation into a solution of the compact LP relaxation, and vice versa. As a result, given a solution of the compact LP relaxation obtained by any polynomial-time algorithm, we can get a solution of the configuration LP relaxation. Then, we can apply the randomized rounding presented in the previous section and get the approximation ratio of Theorem 1.

Lemma 2. Any feasible solution of the configuration LP relaxation can be transformed into a feasible solution of the compact LP relaxation with the same energy consumption, and vice versa.

Proof. Assume that we are given a feasible solution of the relaxation of the configuration LP. Such a solution corresponds to a schedule of the jobs on the processors. Specifically, the value of the variable x_{i,j,c} specifies the part of the job j ∈ J executed on processor i ∈ P during the slots that belong to the configuration c ∈ C_{ij}. Let C_{ijq} ⊆ C_{ij} be the set of configurations of j on i with exactly q slots. Then, we define y_{i,j,q} = Σ_{c∈C_{ijq}} x_{i,j,c} and z_{i,j,q,t} = Σ_{c∈C_{ijq} : t∈c} x_{i,j,c}. This defines a feasible solution of the relaxation of the compact LP.
Assume now that we are given a feasible solution of the compact LP relaxation. We will define a set of configurations and assign a non-zero value to each variable x_{i,j,c} that corresponds to these configurations; the number of these configurations will be polynomial in n and 1/ε. The remaining variables of the configuration LP are set to zero.
Consider a non-zero variable y_{i,j,q} (and its corresponding variables z_{i,j,q,t}) in the solution of the compact LP. We partition the part of the schedule defined by y_{i,j,q} into a set of configurations with q slots each, and we specify the values of the variables x_{i,j,c} that correspond to these configurations. To do this, for each variable y_{i,j,q} and its associated variables z_{i,j,q,t}, we construct a bipartite graph G = (A ∪ B, E) as follows. The set A contains q nodes, A = {a_1, a_2, ..., a_q}; intuitively, each of these nodes corresponds to one of the q slots of the configurations that will correspond to y_{i,j,q}. The set B contains n³/ε nodes, one for each possible slot of j on the processor i (see Lemma 1), i.e., B = {b_1, b_2, ..., b_{n³/ε}}. We define the set of edges E and their weights such that each node a_k ∈ A has weighted degree exactly y_{i,j,q} and each node b_t ∈ B has weighted degree exactly z_{i,j,q,t}. Note that the total weight of all the edges will be q · y_{i,j,q} = Σ_t z_{i,j,q,t}. We start by adding edges from a_1 to b_1, b_2, ... of weight z_{i,j,q,1}, z_{i,j,q,2}, ..., respectively, as long as Σ_{t=1}^{k} z_{i,j,q,t} ≤ y_{i,j,q}. The first time that Σ_{t=1}^{k} z_{i,j,q,t} > y_{i,j,q}, we add an edge between a_1 and b_k of weight y_{i,j,q} − Σ_{t=1}^{k−1} z_{i,j,q,t}, and moreover an edge between a_2 and b_k of weight z_{i,j,q,k} − (y_{i,j,q} − Σ_{t=1}^{k−1} z_{i,j,q,t}). We continue adding edges from a_2 to b_{k+1}, b_{k+2}, ... of weight z_{i,j,q,k+1}, z_{i,j,q,k+2}, ..., respectively, until the sum of their weights exceeds y_{i,j,q}; at this point we add an edge of the appropriate weight starting from a_3, and we continue in the same way. Note that, by construction, each node b_t ∈ B has degree either one or two. We then construct a set of configurations based on the following proposition.
Proposition 7. Let G = (A ∪ B, E) be a weighted bipartite graph in which each node in A has weighted degree exactly one and each node in B has weighted degree at most one. Then there are perfect matchings M_1, M_2, ..., M_r (i.e., matchings having exactly |A| edges) and coefficients λ_1, λ_2, ..., λ_r with Σ_{i=1}^{r} λ_i = 1, such that for each edge e it holds that Σ_{i : e∈M_i} λ_i = w_e, where w_e is the weight of the edge e.
Proof. By the construction of the graph G, all its edges have a positive weight and all nodes in the set A have the same weighted degree. Consider an arbitrary perfect matching M in G. Let w_min = min_{e∈M} {w_e}; clearly, w_min > 0. We define λ_1 = w_min and we modify the graph G by setting the weight of every edge e ∈ M to w_e − w_min. Then, we remove all edges with zero weight. We repeat this procedure until the graph is empty. Given that we remove at least one edge in each iteration, we compute a polynomial number of perfect matchings.
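The peeling argument can be sketched directly on a toy fractional assignment (hypothetical helper names and weights; the brute-force matching search is adequate only for tiny graphs such as the ones built above, where each b-node has degree at most two):

```python
from itertools import permutations

def find_perfect_matching(edges, a_nodes, b_nodes):
    """Return any matching covering every a-node, or None if none exists."""
    for perm in permutations(b_nodes, len(a_nodes)):
        pairs = list(zip(a_nodes, perm))
        if all(e in edges for e in pairs):
            return pairs
    return None

def decompose(edges, a_nodes, b_nodes):
    """Peel perfect matchings off a weighted bipartite graph: each round
    removes the minimum weight of some perfect matching from its edges, so
    at least one edge disappears and at most |E| rounds are needed."""
    edges, result = dict(edges), []
    while edges:
        m = find_perfect_matching(edges, a_nodes, b_nodes)
        lam = min(edges[e] for e in m)      # the w_min of the proof
        result.append((lam, m))
        for e in m:
            edges[e] -= lam
            if edges[e] <= 1e-12:           # drop edges of (numerically) zero weight
                del edges[e]
    return result

# q = 2 configuration slots (A) spread over 3 time slots (B), y = 0.6:
matchings = decompose({(0, 0): 0.6, (1, 1): 0.4, (1, 2): 0.2},
                      a_nodes=(0, 1), b_nodes=(0, 1, 2))
assert abs(sum(lam for lam, _ in matchings) - 0.6) < 1e-9  # lambdas sum to y
```

Each peeled matching corresponds to one configuration c, and its coefficient λ becomes the value assigned to x_{i,j,c}.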
It is easy to see that the solution obtained for the configuration LP is feasible. The fact that Constraint (1) is satisfied comes from Constraints (4) and (5). The fact that Constraint (2) is satisfied comes from Constraint (6).

Heterogeneous Multiprocessor with Migrations
In this section we present an algorithm for the heterogeneous multiprocessor speed-scaling problem with preemptions and migrations. We assume that, if $x$ units of work of the job $j$ are executed on the processor $i$, then an $x/w_{i,j}$ portion of $j$ is accomplished by $i$. We formulate the problem as a configuration LP with an exponential number of variables and a polynomial number of constraints, and we show how to obtain an $OPT+\varepsilon$ solution with the Ellipsoid algorithm in time polynomial in the size of the instance and in $1/\varepsilon$, for any $\varepsilon>0$.
A configuration $c$ is a one-to-one assignment of $n_c$, $0\le n_c\le m$, jobs to the $m$ processors, as well as an assignment of a speed value to every processor. We denote by $C$ the set of all possible configurations. A well-defined schedule for our problem has to specify exactly one configuration at each time $t$. The cardinality of $C$ is unbounded, since the processors' speeds may be arbitrary real values. Hence, we have to discretize the possible speed values and consider only a finite number of speeds at which the processors can run.

Lemma 3.
There is a feasible schedule of energy consumption at most $OPT+\varepsilon$ that uses a finite (exponential in the size of the instance and polynomial in $1/\varepsilon$) number of discrete processors' speeds, for any $\varepsilon>0$.
Proof. To discretize the speeds, we first define a lower and an upper bound on the speed of any processor in an optimal schedule. For the lower bound, consider a job $j\in\mathcal{J}$. Recall that the release date and the deadline of $j$ are different on different processors. Hence, the feasible intervals of $j$ on different processors may be completely disjoint, that is, the processing time of $j$ in an optimal schedule can be equal to $\sum_{i\in\mathcal{P}}(d_{i,j}-r_{i,j})$. Therefore, due to the convexity of the speed-to-power function, a non-zero lower bound on the speed of every processor is the minimum density among all the jobs, i.e., $s_{LB}=\min_{j\in\mathcal{J}}\left\{\frac{\min_{i\in\mathcal{P}} w_{i,j}}{\sum_{i\in\mathcal{P}}(d_{i,j}-r_{i,j})}\right\}$. For the upper bound, consider a processor $i\in\mathcal{P}$. An upper bound on the speed of $i$ can be obtained by calculating the speed at which the jobs would run if they were all executed in the minimum alive interval of any job on $i$, i.e., $\frac{\sum_{j\in\mathcal{J}} w_{i,j}}{\min_{j\in\mathcal{J}}(d_{i,j}-r_{i,j})}$. Hence, an upper bound on the speed of every processor is $s_{UB}=\max_{i\in\mathcal{P}}\left\{\frac{\sum_{j\in\mathcal{J}} w_{i,j}}{\min_{j\in\mathcal{J}}(d_{i,j}-r_{i,j})}\right\}$. Given these lower and upper bounds and a small constant $\delta>0$, we discretize the speed values in a geometric way. In other words, we consider only the speeds of the form $(1+\delta)s_{LB},(1+\delta)^2 s_{LB},\ldots,(1+\delta)^k s_{LB}$, where $k$ is the first integer such that $(1+\delta)^k s_{LB}\ge s_{UB}$. Hence, the number of speed values is equal to $k=\lceil\log_{1+\delta}\frac{s_{UB}}{s_{LB}}\rceil$, which is polynomial in the size of the instance and in $1/\log(1+\delta)$.
Consider now an optimal schedule for our problem. Let $S$ be the schedule obtained from the optimal one by rounding up the processors' speeds to the closest discrete value. The ratio of the energy consumption of any processor $i\in\mathcal{P}$ at any time $t$ in $S$ over the energy consumption of $i$ at $t$ in the optimal schedule is at most $(1+\delta)^{\alpha_i}$. By summing up over all processors and all time instants, we get that the energy consumption of $S$ is at most $(1+\delta)^{\alpha}OPT$. Finally, if we pick $\delta=(1+\frac{\varepsilon}{OPT})^{1/\alpha}-1$, then the energy consumption of $S$ is at most $OPT+\varepsilon$. However, this selection makes the number of discrete speeds exponential in the size of the instance and polynomial in $1/\varepsilon$.
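The geometric grid itself is straightforward to generate; a minimal sketch (hypothetical helper; $s_{LB}$, $s_{UB}$ and $\delta$ are assumed to be computed as in the proof):

```python
def geometric_speeds(s_lb, s_ub, delta):
    """Return the discrete speeds (1+delta)*s_lb, (1+delta)^2*s_lb, ...,
    stopping at the first value that reaches s_ub (Lemma 3 grid)."""
    speeds = []
    s = (1 + delta) * s_lb
    while True:
        speeds.append(s)
        if s >= s_ub:
            break
        s *= 1 + delta
    return speeds
```

With $\delta=1$, $s_{LB}=1$ and $s_{UB}=8$ this yields the grid $2,4,8$; halving $\delta$ roughly doubles the grid size, matching the $\log_{1+\delta}(s_{UB}/s_{LB})$ bound.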
In what follows in this section, we deal with schedules that satisfy Lemma 3. Let now $t_0<t_1<\ldots<t_\ell$ be the time instants that correspond to release dates and deadlines of jobs, so that there is a time $t_i$ for every possible release date and deadline. We denote by $\mathcal{I}$ the set of all possible intervals of the form $(t_{i-1},t_i]$, for $1\le i\le\ell$. Let $|I|$ be the length of the interval $I$. We introduce a variable $x_{I,c}$, for each $I\in\mathcal{I}$ and $c\in C$, which corresponds to the total processing time during the interval $I$ in which the processors run according to the configuration $c$. We denote by $E_{I,c}$ the instantaneous energy consumption of the processors if they run with respect to the configuration $c$ during the interval $I$. Moreover, let $s_{j,c}$ be the speed of the job $j$ according to the configuration $c$. For notational convenience, we denote by $(I,c)$ the set of jobs which are alive during the interval $I$ and which are executed on some processor by the configuration $c$. Finally, let $i(j,c)$ be the processor on which the job $j$ is assigned in the configuration $c$. We propose the following configuration LP:
$$\min \sum_{I\in\mathcal{I}}\sum_{c\in C} E_{I,c}\, x_{I,c}$$
$$\sum_{c\in C} x_{I,c} = |I| \qquad \forall I\in\mathcal{I} \quad (8)$$
$$\sum_{I\in\mathcal{I}}\sum_{c:\, j\in(I,c)} \frac{s_{j,c}}{w_{i(j,c),j}}\, x_{I,c} \ge 1 \qquad \forall j\in\mathcal{J} \quad (9)$$
$$x_{I,c}\ge 0 \qquad \forall I\in\mathcal{I},\, c\in C$$
Consider the schedule for the interval $I$ obtained by an arbitrary ordering of the configurations assigned to $I$. This schedule is feasible, as the total processing time of the configurations assigned to $I$ is equal to the length of the interval. Hence, Equality (8) ensures that for each interval $I$ there is exactly one configuration at each time $t\in I$. Inequality (9) implies that each job $j$ is entirely executed. The above LP has an exponential number of variables. In order to handle this, we create the dual LP, which has an exponential number of constraints. Next, we show how to efficiently apply the Ellipsoid algorithm to it (see [23]).
For this, we provide a polynomial-time separation oracle, i.e., a polynomial-time algorithm which, given a solution of the dual LP, decides whether this solution is feasible or, otherwise, identifies a violated constraint. As we can compute an optimal solution of the dual LP, we can also find an optimal solution of the primal LP by solving it restricted to the variables corresponding to the constraints that were found to be violated during the run of the Ellipsoid method, setting all other primal variables to zero. The number of these violated constraints is polynomial in the size of the instance and in $1/\varepsilon$. Thus, we can solve the primal LP with a polynomial number of variables.
The dual LP is the following:
$$\max \sum_{j\in\mathcal{J}} \lambda_j - \sum_{I\in\mathcal{I}} |I|\,\mu_I$$
$$\sum_{j\in(I,c)} \frac{s_{j,c}}{w_{i(j,c),j}}\,\lambda_j - \mu_I \le E_{I,c} \qquad \forall I\in\mathcal{I},\, c\in C$$
$$\lambda_j \ge 0 \qquad \forall j\in\mathcal{J}$$
The separation oracle for the dual LP works as follows. For each $I\in\mathcal{I}$, we try to find a violated constraint. Recall that there are $O(nm)$ intervals in the set $\mathcal{I}$. For a fixed $I$, it suffices to find a configuration $c$ that minimizes $E_{I,c}-\sum_{j\in(I,c)}\frac{s_{j,c}}{w_{i(j,c),j}}\lambda_j$ and to check whether this minimum is smaller than $-\mu_I$. For each job $j\in\mathcal{J}$ that is alive during $I$ and each processor $i\in\mathcal{P}$, the contribution of the pair $(j,i)$ to this expression is $s^{\alpha_i}-\frac{s}{w_{i,j}}\lambda_j$, which is minimized at the discrete value $v_{i,j}$ that is one of the two closest possible discrete speeds to the value $(\frac{\lambda_j}{\alpha_i w_{i,j}})^{1/(\alpha_i-1)}$. To see this, we just need to notice that we minimize a one-variable convex function over a set of possible discrete values; the above value is obtained by minimizing $s^{\alpha_i}-\frac{s}{w_{i,j}}\lambda_j$ without the discretization of the speeds, i.e., by equating the derivative of the last expression with zero. Hence, given an interval $I$, we want to find a configuration $c$ that minimizes $\sum_{j\in(I,c)}(v_{i(j,c),j}^{\alpha_{i(j,c)}}-\frac{v_{i(j,c),j}}{w_{i(j,c),j}}\lambda_j)$. Since a configuration $c$ assigns $n_c$, $0\le n_c\le m$, jobs to the $m$ processors, the problem of minimizing the last expression reduces to a maximum weighted matching problem on the bipartite graph constructed as follows: we introduce one node for each job and one node for each processor, and there is an edge between each job $j\in\mathcal{J}$ alive in the interval $I$ and each processor $i\in\mathcal{P}$ with weight equal to $-(v_{i,j}^{\alpha_i}-\frac{v_{i,j}}{w_{i,j}}\lambda_j)$. A maximum weight matching in this bipartite graph defines a configuration $c$, that is, an assignment of $n_c\le m$ jobs to the $m$ processors.
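The per-pair minimization inside the oracle can be sketched as follows (hypothetical helper; `speeds` is the sorted discrete grid of Lemma 3, and the formula for the continuous minimizer follows from setting the derivative to zero):

```python
import bisect

def best_discrete_speed(speeds, alpha, w, lam):
    """Minimize s**alpha - (s / w) * lam over a sorted speed grid.
    The function is convex in s, with continuous minimizer
    s* = (lam / (alpha * w)) ** (1 / (alpha - 1)), so only the two
    grid points bracketing s* need to be compared."""
    s_star = (lam / (alpha * w)) ** (1.0 / (alpha - 1))
    k = bisect.bisect_left(speeds, s_star)
    candidates = speeds[max(k - 1, 0):k + 1]
    return min(candidates, key=lambda s: s ** alpha - (s / w) * lam)
```

For example, with grid $\{1,2,4\}$, $\alpha_i=2$, $w_{i,j}=1$ and $\lambda_j=4$, the continuous minimizer is $s^*=2$, which is itself on the grid.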
Hence, there is a separation oracle for the dual LP that runs in polynomial time. To apply the Ellipsoid method in polynomial time, we need to check two additional technical conditions. The first condition is that the values of all dual variables are upper bounded by some number $R$. The second condition is that the dual program has a feasible point such that every point within radius $r$ of it is feasible. Then the running time of the Ellipsoid method is polynomial in $\log\frac{R}{r}$. The first condition and the bound on $R$ can be derived from the fact that the solution of the problem must be a vertex of the corresponding polyhedron, since we know that the value of an optimal solution is bounded. Therefore, $R$ is a polynomial in various input parameters; we skip its precise definition. The second condition is satisfied by the point $(\lambda,\mu)$ defined as follows: $\lambda_j=1$ for all $j\in\mathcal{J}$, and $\mu_I$ large enough such that $\sum_{j\in(I,c)}\frac{s_{j,c}}{w_{i(j,c),j}}\lambda_j-\mu_I\le E_{I,c}$ holds with sufficient slack for every configuration $c$. Hence, the inequalities are satisfied in the ball of radius 1 around $(\lambda,\mu)$, that is, $r=1$.

Theorem 2.
A schedule of energy consumption at most $OPT+\varepsilon$ for the heterogeneous multiprocessor speed-scaling problem with migrations can be found in time polynomial in the size of the instance and in $1/\varepsilon$, for any $\varepsilon>0$.

Single processor without Preemptions
In this section we present an approximation algorithm for the single processor speed-scaling problem in which the preemption of jobs is not allowed. As a single processor is available, each job $j\in\mathcal{J}$ has a unique release date $r_j$, deadline $d_j$ and amount of work $w_j$, while when the processor runs at speed $s$, it consumes energy at a rate of $s^{\alpha}$. Due to the convexity of the speed-to-power function, each job $j$ runs at a constant speed $s_j$ in an optimal schedule $S^*$. Antoniadis and Huang [9] proved that this problem is $\mathcal{NP}$-hard and gave a $2^{5\alpha-4}$-approximation algorithm.
The algorithm in [9] consists of a series of transformations of the initial instance. Our algorithm applies the first of these transformations. Then, we give a transformation to the heterogeneous multiprocessor speed-scaling problem without migrations.
For completeness, we describe the first transformation given in [9]. We partition the time as follows: let $t_1$ be the smallest deadline of any job in $\mathcal{J}$, i.e., $t_1=\min\{d_j: j\in\mathcal{J}\}$. Let $\mathcal{J}_1\subseteq\mathcal{J}$ be the set of jobs which are released before $t_1$, i.e., $\mathcal{J}_1=\{j\in\mathcal{J}: r_j\le t_1\}$. Next, we set $t_2=\min\{d_j: j\in\mathcal{J}\setminus\mathcal{J}_1\}$ and $\mathcal{J}_2=\{j\in\mathcal{J}: t_1<r_j\le t_2\}$, and we continue this procedure until all jobs are assigned to a subset of jobs. Let $k$ be the number of subsets of jobs that have been created. Moreover, let $t_0=\min\{r_j: j\in\mathcal{J}\}$ and $t_{k+1}=\max\{d_j: j\in\mathcal{J}\}$.
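This partition can be sketched directly (a minimal sketch; jobs are represented as $(r_j,d_j)$ pairs):

```python
def partition_jobs(jobs):
    """Iteratively set t_i to the smallest deadline among the unassigned
    jobs and group every unassigned job released by t_i."""
    remaining = list(jobs)
    cuts, groups = [], []
    while remaining:
        t = min(d for _, d in remaining)
        groups.append([j for j in remaining if j[0] <= t])
        remaining = [j for j in remaining if j[0] > t]
        cuts.append(t)
    return cuts, groups
```

For jobs $\{(0,2),(1,5),(3,4),(6,7)\}$ this yields cut points $t_1=2$, $t_2=4$, $t_3=7$ with groups $\mathcal{J}_1=\{(0,2),(1,5)\}$, $\mathcal{J}_2=\{(3,4)\}$, $\mathcal{J}_3=\{(6,7)\}$.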
Consider the intervals $(t_{i-1},t_i]$, $1\le i\le k+1$. Let $\mathcal{I}_j$ be the set of intervals in which the job $j\in\mathcal{J}$ is alive. In some of them $j$ is alive during the whole interval, while in at most two of them it is alive during only a part of the interval. Consider now the non-preemptive problem in which the execution of $j$ must take place in exactly one interval $I\in\mathcal{I}_j$. Note that the execution of $j$ must respect its release date and its deadline. The following proposition is proved in [9].

Proposition 8. Let $S$ be an optimal non-preemptive schedule for the problem in which the execution of each job $j\in\mathcal{J}$ must take place in exactly one interval $I\in\mathcal{I}_j$. It holds that $E(S)\le 2^{\alpha-1}OPT$.
Proof. In order to relate the energy consumption of the schedule $S$ to that of the optimal schedule $S^*$, consider first a job $j\in\mathcal{J}$ which is alive in more than one interval, i.e., $|\mathcal{I}_j|\ge 2$. By definition, it holds that $r_j\le t_\ell$ and $t_{\ell'}<d_j\le t_{\ell'+1}$, where $\ell<\ell'$. Moreover, consider a $p$, $\ell<p\le\ell'$, and let $j'\in\mathcal{J}_p$ be the job that defines $t_p$, i.e., $d_{j'}=t_p$. By definition, for $j'$ it holds that $t_{p-1}<r_{j'}\le t_p$. Although $j$ is alive at times $t_{p-1}$ and $t_p$, there is no feasible schedule in which $j$ is executed at both of them; otherwise $j'$ could not be feasibly executed, as only one processor is available. Therefore, in $S^*$ a job cannot appear in more than two consecutive intervals $(t_{\ell-1},t_\ell]$ and $(t_\ell,t_{\ell+1}]$. Starting from $S^*$, we create a feasible non-preemptive schedule $S'$ for the problem in which the execution of each job $j\in\mathcal{J}$ must take place in exactly one interval $I\in\mathcal{I}_j$, respecting its release date and its deadline. To do this, consider a job $j\in\mathcal{J}$ which is executed in two intervals in $S^*$, say $(t_{\ell-1},t_\ell]$ and $(t_\ell,t_{\ell+1}]$. Let $e_{j,\ell}$ and $e_{j,\ell+1}$ be the execution time of $j$ in $(t_{\ell-1},t_\ell]$ and $(t_\ell,t_{\ell+1}]$, respectively. Assume, w.l.o.g., that $e_{j,\ell}\ge e_{j,\ell+1}$. In $S'$, we execute the whole work of $j$ during $(t_{\ell-1},t_\ell]$ such that its execution takes exactly $\frac{e_{j,\ell}+e_{j,\ell+1}}{2}$ time. To do this, we just have to increase the speed $s_j$ that $j$ had in $S^*$ by a factor of 2. Hence, the energy consumption of $j$ in $S^*$ was $(e_{j,\ell}+e_{j,\ell+1})s_j^{\alpha}$, while in $S'$ it is $\frac{e_{j,\ell}+e_{j,\ell+1}}{2}(2s_j)^{\alpha}$. By summing up over all jobs we get that $E(S')\le 2^{\alpha-1}OPT$. As $S$ is an optimal schedule for the transformed problem, we get that $E(S)\le E(S')\le 2^{\alpha-1}OPT$.
Next, we describe how to pass from the transformed problem to the heterogeneous multiprocessor speed-scaling problem without migrations. For every interval $(t_{i-1},t_i]$, $1\le i\le k+1$, we create a processor $i$. For every job $j\in\mathcal{J}$ which is alive during a part of or during the whole interval $(t_{i-1},t_i]$, $1\le i\le k+1$, we set $r_{i,j}=\max\{r_j-t_{i-1},0\}$, $d_{i,j}=\min\{d_j-t_{i-1},t_i-t_{i-1}\}$ and $w_{i,j}=w_j$. We next apply the approximation algorithm presented in Section 3, which is based on the rounding of a configuration LP. Note that the number of configurations of each job here is polynomial in $n$ and in $1/\varepsilon$, as we consider that preemptions are not allowed and hence a configuration can only contain consecutive slots. Thus, the LP resulting after the transformation has polynomial size and can be solved directly, without using the compact LP presented in Section 3.3. Note also that the algorithm presented in Section 3 will create a preemptive schedule $S$. However, we can transform $S$ into a non-preemptive schedule $S'$ of the same energy consumption. To see this, note that on each processor $i$, $1\le i\le k+1$, each job $j\in\mathcal{J}$ has $r_{i,j}=0$ or $d_{i,j}=t_i-t_{i-1}$. Hence, by applying the Earliest Deadline First policy to each processor separately, we can get the non-preemptive schedule $S'$.

Theorem 3. The single processor speed-scaling problem without preemptions can be approximated within a factor of $2^{\alpha-1}\left((1+\frac{\varepsilon}{1-\varepsilon})(1+\frac{2}{n-2})\right)^{\alpha}\tilde{B}_{\alpha}$, where $\varepsilon\in(0,1)$.

Job Shop Scheduling with Preemptions
In this section, we consider the energy minimization problem in a job shop environment. An instance of the problem contains a set of jobs $\mathcal{J}$, where each job $j\in\mathcal{J}$ consists of $\mu_j$ operations $O_{j,1},O_{j,2},\ldots,O_{j,\mu_j}$, which must be executed in this order. That is, there are precedence constraints of the form $O_{j,k}\to O_{j,k+1}$, for each $j\in\mathcal{J}$ and $1\le k\le\mu_j-1$, meaning that the operation $O_{j,k+1}$ can start only once the operation $O_{j,k}$ has finished. Let $\mu$ be the total number of operations, i.e., $\mu=\sum_{j\in\mathcal{J}}\mu_j$. Each operation $O_{j,k}$ has an amount of work $w_{j,k}$. Moreover, we are given a set of $m$ heterogeneous processors $\mathcal{P}$. Every operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, is also associated with a single processor $i\in\mathcal{P}$ on which it must be entirely executed. Note that more than one operation of the same job may have to be executed on the same processor. Furthermore, for each operation $O_{j,k}$, we are given a release date $r_{j,k}$ and a deadline $d_{j,k}$. For each $j\in\mathcal{J}$, we can assume that $r_{j,1}\le r_{j,2}\le\ldots\le r_{j,\mu_j}$ as well as $d_{j,1}\le d_{j,2}\le\ldots\le d_{j,\mu_j}$. Preemptions of operations are allowed. The objective is to find a feasible schedule of minimum energy consumption. There are no known results for the energy minimization job shop problem in the literature. However, job shop is an interesting setting, since it introduces the notion of dependency between operations of the same job. Although these precedence dependencies have a very simple form and the operations are already preassigned to the processors, the job shop environment can be considered as a first step towards scheduling parallel applications as these are defined in [1]. Moreover, for the related problem of scheduling non-preemptive jobs subject to general precedence constraints, given a budget of energy, so as to minimize the schedule length (makespan), a 2-approximation algorithm is given in [15].
There is also a relation between shop problems and the MapReduce programming paradigm for which some energy-aware results have been presented in [12].
Next, we formulate the job shop problem as an integer configuration LP. A configuration is a schedule for a job, i.e., a schedule for all its operations. In order to define formally the notion of a configuration, we have to discretize the time. We define the time points $t_0,t_1,\ldots,t_\tau$, in increasing order, where each $t_\ell$ corresponds to either a release date or a deadline, so that there is a corresponding $t_\ell$ for each possible release date and deadline of an operation. Then, we define the intervals $I_\ell=(t_{\ell-1},t_\ell]$, for $1\le\ell\le\tau$, and we denote by $|I_\ell|$ the length of $I_\ell$. We further discretize the time inside each interval $I_\ell$, $1\le\ell\le\tau$, based on the following lemmas, in which it is assumed that the release dates and the deadlines of all operations are integers.

Lemma 4.
There is a feasible schedule with energy consumption at most $(1+\varepsilon)^{\alpha}\cdot OPT$ in which each piece of each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, executed during the interval $I_\ell$, $1\le\ell\le\tau$, starts and ends at a time point $t_{\ell-1}+h\frac{\varepsilon}{\mu(1+\varepsilon)}|I_\ell|$, where $h\ge 0$ is an integer and $\varepsilon\in(0,1)$.
Proof. Consider an optimal schedule $S^*$ for our problem and an interval $I_\ell$, $1\le\ell\le\tau$. We define the time points $u_0=t_{\ell-1},u_1,u_2,\ldots,u_p=t_\ell$, in increasing order, where each $u_q$, $0\le q\le p$, corresponds to either a begin time or a completion time of a piece of an operation on any processor during $I_\ell$ in $S^*$, so that for each begin time and completion time there is a corresponding $u_q$. We call the interval $(u_{q-1},u_q]$, for $1\le q\le p$, a slice. Consider any such slice and any processor $i\in\mathcal{P}$. During the whole slice, the processor is either idle or fully occupied by a single operation. Note that we can see the part of the schedule $S^*$ during the interval $I_\ell$ as a schedule for the preemptive job shop problem without speed-scaling in which the makespan is at most $|I_\ell|$. Baptiste et al. [16] (see their Corollary 4.2) showed that there is always a schedule for this problem with at most $\mu$ slices.
We will now transform $S^*$ into a feasible schedule $S$ satisfying the lemma. Consider an interval $I_\ell$, $1\le\ell\le\tau$. We first create an idle period of length at least $\frac{\varepsilon}{1+\varepsilon}|I_\ell|$. This can be done by increasing the speeds of all processors in all slices of $I_\ell$ by a factor of $1+\varepsilon$. Hence, the energy consumption becomes at most a factor of $(1+\varepsilon)^{\alpha}$ away from the energy of $S^*$. In order to obtain $S$, we round up the length of each slice to the closest multiple of $\frac{\varepsilon}{\mu(1+\varepsilon)}|I_\ell|$. In this way, the length of each slice is increased by at most $\frac{\varepsilon}{\mu(1+\varepsilon)}|I_\ell|$. Since the number of slices is at most $\mu$, the total processing time in $I_\ell$ is increased by at most $\mu\left(\frac{\varepsilon}{\mu(1+\varepsilon)}|I_\ell|\right)=\frac{\varepsilon}{1+\varepsilon}|I_\ell|$, which is the length of the created idle period. Thus, $S$ is a feasible schedule, and the lemma follows.
Lemma 5. There is a feasible schedule with energy consumption at most $(1+\varepsilon)^{\alpha}(1+\frac{2}{\mu-2})^{\alpha}(1+\frac{\varepsilon}{1-\varepsilon})^{\alpha}\cdot OPT$ such that, for each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, there are two time points $b_{j,k}$ and $c_{j,k}$, as the ones defined in Lemma 4, so that each piece of $O_{j,k}$ starts and ends at a time point $b_{j,k}+h\frac{\varepsilon}{\mu^3}(c_{j,k}-b_{j,k})$, where $h\ge 0$ is an integer and $\varepsilon\in(0,1)$.

Proof. Consider a schedule $S$ satisfying Lemma 4. In $S$, each interval $I_\ell$, $1\le\ell\le\tau$, is partitioned into a number of equal-length slots that is polynomial in $\mu$ and in $1/\varepsilon$. In each of these slots, each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, is either executed during the whole slot or not executed at all. Let $b_{j,k}$ and $c_{j,k}$ be the starting time of the first piece and the completion time of the last piece, respectively, of $O_{j,k}$ in $S$.
We will first transform the schedule $S$ into a feasible schedule $S'$ in which the execution time of each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, is at least $\frac{\varepsilon}{\mu}(c_{j,k}-b_{j,k})$. For each time slot $s$ of Lemma 4, we increase the processors' speeds in order to create an idle period of length $\varepsilon|s|$, where $|s|$ is the length of the slot. This can be done by increasing the speeds by a factor of $1+\frac{\varepsilon}{1-\varepsilon}$, and hence the total energy consumption in $S'$ is increased by a factor of $(1+\frac{\varepsilon}{1-\varepsilon})^{\alpha}$. For each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, we reserve an $\frac{\varepsilon|s|}{\mu}$ period in each slot $s$ in $(b_{j,k},c_{j,k}]$. We then decrease the speed of $O_{j,k}$ so that its total work is executed during the periods where $O_{j,k}$ was executed in $S$ and the additional reserved periods. Therefore, in the resulting schedule the processing time of each operation $O_{j,k}$ is at least $\frac{\varepsilon}{\mu}(c_{j,k}-b_{j,k})$. After this transformation, we apply the Earliest Deadline First (EDF) policy to the operations of each processor separately, considering as release date and deadline of each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, the time points $b_{j,k}$ and $c_{j,k}$, respectively. This ensures that we have a feasible schedule with at most $\mu$ preemptions, as in EDF an operation may be interrupted only when another operation is released.
Next, we transform $S'$ into a new schedule $S''$ satisfying the lemma. For each operation $O_{j,k}$, $j\in\mathcal{J}$ and $1\le k\le\mu_j$, we split the interval $(b_{j,k},c_{j,k}]$ into slots of length $\frac{\varepsilon}{\mu^3}(c_{j,k}-b_{j,k})$, i.e., we partition $(b_{j,k},c_{j,k}]$ into intervals of the form $(b_{j,k}+h\frac{\varepsilon}{\mu^3}(c_{j,k}-b_{j,k}),\,b_{j,k}+(h+1)\frac{\varepsilon}{\mu^3}(c_{j,k}-b_{j,k})]$, where $h\ge 0$ is an integer. As the processing time of $O_{j,k}$ in $S'$ is at least $\frac{\varepsilon}{\mu}(c_{j,k}-b_{j,k})$, the execution of $O_{j,k}$ has been partitioned into at least $\mu^2$ slots. In each of these slots, the operation $O_{j,k}$ is either executed during the whole slot or executed in a fraction of it. As we have applied the EDF policy, each operation is preempted at most $\mu$ times. Thus, among the time slots in which $O_{j,k}$ is executed, at most $2\mu$ of them are not fully occupied by $O_{j,k}$, because for each preempted piece of $O_{j,k}$ at most two slots may not be completely covered by it. We can modify the schedule $S'$ and get the schedule $S''$ in which the operation $O_{j,k}$ is executed only in the slots where it was entirely executed in $S'$. The number of these slots is at least $\mu^2-2\mu$. Thus, we have to increase the speed of $O_{j,k}$ by a factor of $1+\frac{2}{\mu-2}$, and hence the energy is increased by a factor of $(1+\frac{2}{\mu-2})^{\alpha}$. By taking into account Lemma 4 and the fact that $S'$ is a factor of $(1+\frac{\varepsilon}{1-\varepsilon})^{\alpha}$ away from $S$, the lemma follows.

Henceforth, we consider schedules that satisfy the above lemma. That is, for each operation $O_{j,k}$, we consider that there is a polynomial number of candidate time points $b_{j,k}$ and $c_{j,k}$ such that $O_{j,k}$ is entirely executed during $(b_{j,k},c_{j,k}]$. Moreover, the interval $(b_{j,k},c_{j,k}]$ is partitioned into a polynomial number of equal-length slots so that, given such a slot, the operation $O_{j,k}$ is either executed during the whole slot or not executed at all during that slot. Now, we can formulate our problem as an integer program. A configuration $c$ is a schedule for a single job $j$, i.e., a feasible schedule for all its operations.
So, a configuration specifies the interval $(b_{j,k},c_{j,k}]$ and the time slots inside this interval, with respect to Lemma 5, during which each operation $O_{j,k}$ of the job $j$ is executed. Let $C_j$ be the set of all possible feasible configurations for job $j\in\mathcal{J}$.
In order to proceed, we need an additional definition combining the slots of all the operations. Specifically, given a processor $i\in\mathcal{P}$, consider the time points of all operations of the form $b_{j,k}+h\frac{\varepsilon}{\mu^3}(c_{j,k}-b_{j,k})$, as introduced in Lemmas 4 and 5. Let $t_{i,1},t_{i,2},\ldots,t_{i,p_i}$ be the ordered sequence of these time points on the processor $i\in\mathcal{P}$. In a schedule that satisfies Lemma 5, in each interval $(t_{i,q},t_{i,q+1}]$, $1\le q\le p_i-1$, either there is exactly one operation that is executed during the whole interval or the interval is idle on $i$. Note also that these intervals might not have the same length. Let $\mathcal{I}$ be the set of all these intervals for all processors. According to Lemmas 4 and 5, the size of $\mathcal{I}$ is polynomial in the size of the instance and in $1/\varepsilon$. Recall that the execution interval of a job $j$ can be partitioned into a set of equal-length time slots so that, for every such time slot, either a single operation of $j$ is executed during the whole slot or $j$ is not executed at all during that slot. It has to be noticed that, by definition, every such slot consists of one or more intervals in $\mathcal{I}$, and every interval in $\mathcal{I}$ (during which some operation of $j$ is alive) is contained entirely in a single slot of $j$.
If we know the configuration according to which the job $j$ is executed, we can compute the energy consumption $E_{j,c}$ for the execution of $j$, because there is always an optimal schedule in which each operation is executed at constant speed. For notational convenience, we say that $I\in(j,c)$ if the job $j$ is executed during the interval $I\in\mathcal{I}$ according to the configuration $c$, that is, if there is an operation $O_{j,k}$, two time points $b_{j,k}$ and $c_{j,k}$, and a slot of $(b_{j,k},c_{j,k}]$ which contains $I$ and during which $O_{j,k}$ is executed. We propose the following integer program:
$$\min \sum_{j\in\mathcal{J}}\sum_{c\in C_j} E_{j,c}\, x_{j,c}$$
$$\sum_{c\in C_j} x_{j,c} = 1 \qquad \forall j\in\mathcal{J} \quad (10)$$
$$\sum_{j\in\mathcal{J}}\sum_{c:\, I\in(j,c)} x_{j,c} \le 1 \qquad \forall I\in\mathcal{I} \quad (11)$$
$$x_{j,c}\in\{0,1\} \qquad \forall j\in\mathcal{J},\, c\in C_j$$
Constraint (10) enforces that each job is entirely executed according to exactly one configuration. Constraint (11) ensures that at most one job is executed in each interval $I\in\mathcal{I}$. We consider the relaxed LP of the above integer program, where the integrality constraints $x_{j,c}\in\{0,1\}$ are replaced by the constraints $x_{j,c}\ge 0$, for all $j\in\mathcal{J}$ and $c\in C_j$. This LP contains an exponential number of variables, but it can be solved in polynomial time by applying the Ellipsoid algorithm to its dual, as we explain in the following. The dual LP is:
$$\max \sum_{j\in\mathcal{J}} \lambda_j - \sum_{I\in\mathcal{I}} \kappa_I$$
$$\lambda_j - \sum_{I\in(j,c)} \kappa_I \le E_{j,c} \qquad \forall j\in\mathcal{J},\, c\in C_j$$
$$\kappa_I \ge 0 \qquad \forall I\in\mathcal{I}$$
We will show that the dual program can be solved in polynomial time by applying the Ellipsoid algorithm. In order to do so, it suffices to construct a polynomial-time separation oracle. Assume that we are given a solution $(\lambda_j,\kappa_I)$ of the dual LP. The separation oracle works as follows. For each job $j\in\mathcal{J}$, we try to minimize the term $E_{j,c}+\sum_{I\in(j,c)}\kappa_I$. If the value $\min_c\{E_{j,c}+\sum_{I\in(j,c)}\kappa_I\}$ is less than $\lambda_j$, then we have a violated constraint. Otherwise, the solution is feasible.
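For intuition, if a single operation of work $w_{j,k}$ may use any subset of $\ell$ equal-length candidate slots, the term $E_{j,c}+\sum_{I\in(j,c)}\kappa_I$ is minimized by a one-dimensional search over the number of used slots, always taking the cheapest $\kappa$ values first (a hedged sketch under these simplifying assumptions; names are hypothetical):

```python
def best_slot_choice(work, slot_len, kappas, alpha):
    """Choose how many (and implicitly which) equal-length slots to use:
    with q slots the speed is work / (q * slot_len), the energy is
    q * slot_len * speed**alpha, and the q cheapest kappa values are added."""
    kappas = sorted(kappas)
    best = None
    prefix = 0.0
    for q in range(1, len(kappas) + 1):
        prefix += kappas[q - 1]
        speed = work / (q * slot_len)
        cost = q * slot_len * speed ** alpha + prefix
        if best is None or cost < best[0]:
            best = (cost, q)
    return best
```

With work 4, unit-length slots, all $\kappa_I=0$ and $\alpha=2$, the cost $16/q$ decreases with $q$, so all four slots are used at speed 1.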
In order to find the configuration that minimizes the above expression, we use dynamic programming. Consider some configuration $c$. The contribution of the operation $O_{j,k}$ to the expression $E_{j,c}+\sum_{I\in(j,c)}\kappa_I$ is the energy consumption of $O_{j,k}$ plus the $\kappa_I$'s of the intervals $I\in\mathcal{I}$ contained in the time slots during which $O_{j,k}$ is executed. Let $A_{k,I}$ be the minimum contribution of the operations $O_{j,1},O_{j,2},\ldots,O_{j,k}$ to the objective function of our separation problem among the configurations in which $O_{j,k}$ completes not later than $I$. Furthermore, let $B_{k,I',I}$ be the minimum contribution of the operation $O_{j,k}$ to the objective function of the separation problem among the configurations in which it is executed after $I'$ and not later than $I$. Clearly, $A_{k,I}=\min_{I'}\{A_{k-1,I'}+B_{k,I',I}\}$. In order to complete our dynamic programming algorithm, we have to specify a way of computing the term $B_{k,I',I}$ efficiently. Assume that there are $\ell$ time slots between $I'$ and $I$ during which $O_{j,k}$ can be executed, respecting Lemma 5. If we restrict our attention to configurations in which $O_{j,k}$ is executed in exactly $q\le\ell$ slots, then $O_{j,k}$ must be executed during the $q$ slots with the minimum $\kappa_I$'s, so that $E_{j,c}+\sum_{I\in(j,c)}\kappa_I$ is minimized. These slots can be computed easily. In order to compute $B_{k,I',I}$, it suffices to check all possible values $q=1,2,\ldots,\ell$. Thus, for each job $j\in\mathcal{J}$, we can compute, in polynomial time, the minimum of $E_{j,c}+\sum_{I\in(j,c)}\kappa_I$ over all the configurations $c\in C_j$, and hence we have a polynomial-time separation oracle for the dual LP. So, we can solve the dual LP in polynomial time with the Ellipsoid algorithm. Given our discussion in Section 4, we can solve the relaxed LP as well. Then, by applying the same randomized rounding algorithm and analysis as in Section 3.2, we obtain the following theorem.

Routing
Now, we turn our attention to the min-power routing problem. Formally, we are given an undirected graph $G=(V,E)$ and a set of demands $\mathcal{D}$. Each demand $i\in\mathcal{D}$ is associated with a source node $s_i$ and a destination node $t_i$, and it requests $d_i$ integer units of bandwidth. We consider the special case where all the demands request the same bandwidth, i.e., $d_i=d$ for all $i\in\mathcal{D}$. Each edge $e\in E$ is associated with two constants $\alpha_e$ and $c_e$ such that, if $f$ units of demand cross $e$, then there is an energy consumption equal to $c_e f^{\alpha_e}$. The objective is to route all the demands from their sources to their destinations so that the total energy consumption is minimized. We consider the unsplittable version of the problem, in which each demand has to be routed through a single path.
Andrews et al. [6] formulated the above problem as an integer convex program and presented an analysis based on randomized rounding. Using an analysis similar to that of Section 3.2, we show that the algorithm presented in [6] achieves a significantly better approximation ratio.
In order to obtain the integer convex programming formulation of [6] for the min-power routing problem, we introduce a variable $x_e$, for each $e\in E$, which corresponds to the number of demands that cross the edge $e$, and a binary variable $y_{i,e}$ which indicates whether the demand $i\in\mathcal{D}$ crosses the edge $e$. The integer convex program follows.
The above integer convex program is a valid formulation of our problem. Our goal is to minimize the total energy consumption over all edges, i.e., $\sum_{e\in E} c_e d^{\alpha_e} x_e^{\alpha_e}$. Since all variables $x_e$ are integers in any feasible integral solution, an optimal integer solution for the above program corresponds to an optimal integer solution to the program with objective function $\sum_{e\in E} c_e d^{\alpha_e} x_e^{\alpha_e}$ and the same set of constraints. However, the use of this objective leads to an integer program with a large integrality gap [6]. For this reason, we modify the objective to $\sum_{e\in E} c_e d^{\alpha_e}\max\{x_e,x_e^{\alpha_e}\}$, obtaining a program with a smaller integrality gap. Equation (15) relates the variables $x_e$ and $y_{i,e}$, while Equations (16)-(18) ensure flow conservation.
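The modified objective is easy to evaluate for a given fractional point (an illustrative helper; edges are given as $(c_e,\alpha_e,x_e)$ triples and $d$ is the uniform demand):

```python
def routing_energy(edges, d):
    """Relaxed objective sum_e c_e * d**alpha_e * max(x_e, x_e**alpha_e);
    the max term dominates the linear one for fractional x_e < 1,
    which is what reduces the integrality gap."""
    return sum(c * d ** a * max(x, x ** a) for (c, a, x) in edges)
```

For $x_e=1/2$ the linear term dominates ($\max\{0.5,0.25\}=0.5$), whereas for $x_e=2$ the power term dominates.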
In order to obtain a feasible integral solution for our problem, we solve the relaxation of the above convex program, where the constraints $y_{i,e}\in\{0,1\}$ are relaxed to $y_{i,e}\ge 0$, and we obtain a fractional solution. Then, we apply a randomized rounding procedure, introduced by Raghavan and Thompson [27], in order to select a path for each demand. Specifically, for each demand $i\in\mathcal{D}$, we consider the subgraph of $G$ that contains only the edges with $y_{i,e}>0$ and perform the standard flow decomposition. We compute an $(s_i,t_i)$-path $p$ in this subgraph and we set $z_{i,p}=\min_{e\in p}\{y_{i,e}\}$. Then, we subtract $z_{i,p}$ from the variables $y_{i,e}$ which correspond to the edges of the path $p$. We continue this procedure until there are no $(s_i,t_i)$-paths left. The randomized rounding algorithm chooses the path $p$ for the demand $i$ with probability $z_{i,p}$. Note that $\sum_p z_{i,p}=1$.
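The decomposition-plus-rounding step can be sketched as follows (a minimal sketch under simplifying assumptions: the fractional flow is represented on directed edges, is acyclic, and carries one unit from source to sink; names are hypothetical):

```python
import random

def choose_path(y, source, sink, rng=random):
    """Strip (s,t)-paths from the support of a fractional routing y
    (dict: directed edge -> flow), then pick one path with probability
    equal to its flow value (Raghavan-Thompson style rounding)."""
    y = {e: v for e, v in y.items() if v > 1e-12}
    paths = []
    while any(a == source for a, _ in y):
        path, node = [], source
        while node != sink:
            # flow conservation on an acyclic support guarantees progress
            nxt = next(b for a, b in y if a == node)
            path.append((node, nxt))
            node = nxt
        z = min(y[e] for e in path)
        paths.append((z, path))
        for e in path:
            y[e] -= z
            if y[e] <= 1e-12:
                del y[e]
    r, acc = rng.random(), 0.0
    for z, p in paths:
        acc += z
        if r <= acc:
            return p
    return paths[-1][1]
```

On a diamond graph where half a unit goes through each of two parallel two-edge routes, the decomposition produces the two paths with value $1/2$ each, and the rounding picks one of them uniformly.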
Theorem 5. There is a $\tilde{B}_{\alpha_{\max}}$-approximation algorithm for the min-power routing problem with uniform demands.
Proof. Consider an edge $e\in E$ and let $\lambda_e=\sum_{i\in\mathcal{D}} y_{i,e}$ be the expected number of demands that cross $e$. Let $X_e$ be the number of demands that cross $e$ after the rounding; the expected energy consumption on the edge $e$ is $c_e d^{\alpha_e}\,\mathbb{E}[\max\{X_e,X_e^{\alpha_e}\}]$. Since the $y_{i,e}$ come from a mathematical programming solver, we can assume that there exists $N\in\mathbb{N}$ such that $y_{i,e}=\lambda_e\cdot\frac{q_{i,e}}{N}$ for some $q_{i,e}\in\mathbb{N}$. Similarly to the proof of Theorem 1, we can chop each $y_{i,e}$ into $q_{i,e}$ pieces $z_{i,e,\ell}=\frac{\lambda_e}{N}$. Note that $N=\sum_{i\in\mathcal{D}} q_{i,e}$, since $\sum_{i\in\mathcal{D}}\frac{y_{i,e}}{\lambda_e}=1$. For ease of exposition, we identify the set $\{1,2,\ldots,N\}$ with the set of all pairs $((i,e),\ell)$ such that $i\in\mathcal{D}$ and $1\le\ell\le q_{i,e}$. By applying Proposition 2 iteratively, we get that $\mathbb{E}[\max\{X_e,X_e^{\alpha_e}\}]\le\mathbb{E}[\max\{P_{\lambda_e},P_{\lambda_e}^{\alpha_e}\}]$, where $P_{\lambda_e}$ is a Poisson random variable with parameter $\lambda_e$. By summing up over all edges and setting $\alpha_{\max}=\max_{e\in E}\{\alpha_e\}$, the theorem follows.

Conclusions
We have presented a unified framework for dealing with various scheduling and routing problems in the speed-scaling setting. Our algorithms are based on configuration linear programs and randomized rounding. Improving the approximation ratios or studying the inapproximability of the considered problems are interesting directions for future work.
The most intriguing open question is the existence of a constant factor approximation algorithm for the non-preemptive multiprocessor scheduling problem.