Closing the Gap for Makespan Scheduling via Sparsification Techniques

Makespan scheduling on identical machines is one of the most basic and fundamental packing problems studied in the discrete optimization literature. It asks for an assignment of $n$ jobs to a set of $m$ identical machines that minimizes the makespan. The problem is strongly NP-hard, and thus we do not expect a $(1+\epsilon)$-approximation algorithm with a running time that depends polynomially on $1/\epsilon$. Furthermore, Chen et al. [3] recently showed that a running time of $2^{(1/\epsilon)^{1-\delta}}+\text{poly}(n)$ for any $\delta>0$ would imply that the Exponential Time Hypothesis (ETH) fails. A long sequence of algorithms has been developed that try to obtain low dependencies on $1/\epsilon$, the best of which achieves a running time of $2^{\tilde{O}(1/\epsilon^2)}+O(n\log n)$ [11]. In this paper we obtain an algorithm with a running time of $2^{\tilde{O}(1/\epsilon)}+O(n\log n)$, which is tight under ETH up to logarithmic factors on the exponent. Our main technical contribution is a new structural result on the configuration-IP. More precisely, we show the existence of a highly symmetric and sparse optimal solution, in which all but a constant number of machines are assigned a configuration with small support. This structure can then be exploited by integer programming techniques and enumeration. We believe that our structural result is of independent interest and should find applications to other settings. In particular, we show how the structure can be applied to the minimum makespan problem on related machines and to a larger class of objective functions on parallel machines. For all these cases we obtain an efficient PTAS with running time $2^{\tilde{O}(1/\epsilon)} + \text{poly}(n)$.


Introduction
Minimum makespan scheduling is one of the foundational problems in the literature on approximation algorithms [7,8]. In the identical machine setting the problem asks for an assignment of a set of $n$ jobs $J$ to a set of $m$ identical machines $M$. Each job $j \in J$ is characterized by a positive processing time $p_j \in \mathbb{Z}_{>0}$. The load of a machine is the total processing time of the jobs assigned to it, and our objective is to minimize the makespan, that is, the maximum machine load. This problem is usually denoted $P||C_{\max}$. It is well known to admit a polynomial time approximation scheme (PTAS) [10], and there have been many subsequent works improving the running time or deriving PTASs for more general settings. The fastest PTAS for $P||C_{\max}$ achieves a running time of $2^{O((1/\epsilon^2)\log^3(1/\epsilon))} + O(n\log n)$ for $(1+\epsilon)$-approximate solutions [11]. Very recently, Chen et al. [3] showed that, assuming the exponential time hypothesis (ETH), there is no PTAS that yields $(1+\epsilon)$-approximate solutions for $\epsilon > 0$ with running time $2^{(1/\epsilon)^{1-\delta}} + \text{poly}(n)$ for any $\delta > 0$.
Given a guess $T \in \mathbb{N}$ on the optimal makespan, which can be found with binary search, the problem reduces to deciding the existence of a packing of the jobs into $m$ machines (or bins) of capacity $T$. If we aim for a $(1+\epsilon)$-approximate solution, for some $\epsilon > 0$, we can assume that all processing times are integral and $T$ is a constant, namely $T \in O(1/\epsilon^2)$. This can be achieved with well known rounding and scaling techniques [1,2,9] which will be specified later. Let $\pi_1 < \pi_2 < \ldots < \pi_d$ be the job sizes appearing in the instance after rounding, and let $b_k$ denote the number of jobs of size $\pi_k$. The mentioned rounding procedure implies that the number of different job sizes is $d = O((1/\epsilon)\log(1/\epsilon))$. Hence, for large $n$ we obtain a highly symmetric problem where several jobs have the same processing time. Consider the knapsack polytope $P = \{c \in \mathbb{R}^d_{\geq 0} : \pi \cdot c \leq T\}$. A packing on one machine can be expressed as a vector $c \in Q = \mathbb{Z}^d \cap P$, where $c_k$ denotes the number of jobs of size $\pi_k$ assigned to the machine. Elements in $Q = \mathbb{Z}^d \cap P$ are called configurations. Considering a variable $x_c \in \mathbb{Z}_{\geq 0}$ that decides the multiplicity of configuration $c$ in the solution, our problem reduces to finding a feasible solution to the following linear integer program (ILP):
$$\text{[conf-IP]} \qquad \sum_{c \in Q} c_k\, x_c = b_k \ \text{ for } k = 1,\ldots,d, \qquad \sum_{c \in Q} x_c = m, \qquad x_c \in \mathbb{Z}_{\geq 0} \ \text{ for all } c \in Q.$$
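For intuition, when $d$ and $T$ are tiny the configuration set $Q$ can be enumerated by brute force. The following Python sketch (with made-up sizes $\pi$ and capacity $T$; purely illustrative, not the paper's algorithm) lists the integer points of the knapsack polytope:

```python
from itertools import product

def configurations(pi, T):
    """Enumerate Q, i.e. all integer vectors c >= 0 with pi . c <= T."""
    ranges = [range(T // p + 1) for p in pi]  # c_k can be at most T / pi_k
    return [c for c in product(*ranges)
            if sum(p * x for p, x in zip(pi, c)) <= T]

# hypothetical rounded sizes and makespan guess
Q = configurations((2, 3, 5), 7)
```

Already on such toy inputs $|Q|$ grows quickly with $d$ and $T$, which is why bounding the number of relevant configurations matters.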
In this article we derive new insights on this ILP that help us to design faster algorithms for $P||C_{\max}$ and other more general problems. These include makespan scheduling on related machines $Q||C_{\max}$, and a more general class of objective functions on parallel machines. We show that all these problems admit a PTAS with running time $2^{O((1/\epsilon)\log^4(1/\epsilon))} + \text{poly}(n)$. Hence, our algorithm is best possible up to polylogarithmic factors in the exponent assuming ETH [3].
The fastest PTAS known to date for $P||C_{\max}$ achieves a running time of $2^{O((1/\epsilon^2)\log^3(1/\epsilon))} + O(n\log n)$ [11]. More generally, this work gives an EPTAS for the case of related (uniform) machines, where each machine $i \in M$ has a speed $s_i$ and assigning job $j$ to machine $i$ incurs a processing time of $p_j/s_i$. For this more general case the running time is $2^{O((1/\epsilon^2)\log^3(1/\epsilon))} + \text{poly}(n)$. For the simpler case of $P||C_{\max}$, the ILP can be solved directly since the number of variables is a constant. This can be done with Lenstra's algorithm [15], or even with Kannan's algorithm [14], which gives an improved running time. This technique yields a running time that is doubly exponential in $1/\epsilon$. This was, in essence, the approach of Alon et al. [1,2] and Hochbaum and Shmoys [9]. To lower the dependency on $1/\epsilon$, Jansen [11] uses a result by Eisenbrand and Shmonin [4] that implies the existence of a solution $x$ with support of size at most $O(d\log(dT)) = O((1/\epsilon)\log^2(1/\epsilon))$. First guessing the support and then solving the ILP with $O((1/\epsilon)\log^2(1/\epsilon))$ integer variables via Kannan's algorithm yields the desired running time of $2^{O((1/\epsilon^2)\log^3(1/\epsilon))} + O(n\log n)$.
The configuration ILP has recently been studied in the context of the (1-dimensional) cutting stock problem. In this case, the dimension $d$ is constant, $T = 1$, and $\pi$ is a rational vector. Moreover, $\pi$ and $b$ are part of the input. Goemans and Rothvoß [5] obtain an optimal solution in time $\log(\Delta)^{2^{O(d)}}$, where $\Delta$ is the largest number appearing in the denominators of the $\pi_k$ or the multiplicities $b_k$. This is achieved by first showing that there exists a pre-computable set $\tilde{Q} \subseteq Q$ with polynomially many elements, such that there exists a solution $x$ that gives all but a constant (depending only on $d$) amount of weight to $\tilde{Q}$. We remark that applying this result to a rounded instance of $P||C_{\max}$ yields a running time that is doubly exponential in $1/\epsilon$.

Our Contributions
Our main contribution is a new insight on the structure of the solutions of [conf-IP]. These properties are tailored specifically to problems in which $T$ is bounded by a constant, which in the case of $P||C_{\max}$ can be guaranteed by rounding and scaling. The same holds for $Q||C_{\max}$ after a more complex rounding and case analysis.
We first classify configurations by their support. We say that a configuration is simple if its support has size at most $\log(T+1)$; otherwise it is complex. Our main structural result, Theorem 1, states that if [conf-IP] is feasible, then it admits a solution $x$ in which the support has size $O(d\log(dT))$ (as implied by Eisenbrand and Shmonin [4]), no complex configuration has weight larger than 1, and the total weight given to complex configurations is bounded:
$$\sum_{c \in Q_c} x_c \in O(d\log(dT)),$$
where $Q_c$ denotes the set of complex configurations. We call a solution satisfying the properties of the theorem thin. The key ingredient is a sparsification lemma showing that if a solution gives a weight of two or more to a complex configuration, then we can replace this partial solution by two configurations with smaller support. The sparsification lemma follows from a simple application of the pigeonhole principle. Theorem 1 is then proved by combining this technique with the theorem of Eisenbrand and Shmonin [4] and a potential function argument.
As an application of our main structural theorem, we derive a PTAS for $P||C_{\max}$ by first guessing the jobs assigned to complex configurations. An optimal solution for this subinstance can be derived by a dynamic program. For the remaining instance we know that a solution using only simple configurations exists. Then we can guess the support of such a solution and solve the corresponding [conf-IP] restricted to the guessed variables. The main use of having simple configurations is that we can guess the support of the solution much faster, as the number of simple configurations is (asymptotically) smaller than the total number of configurations. The complete procedure takes time $2^{O((1/\epsilon)\log^4(1/\epsilon))} + O(n\log n)$. Moreover, using the rounding and case analysis of Jansen [11], we derive a mixed integer linear program that can be suitably decomposed in order to apply our structural result iteratively. This yields a PTAS with a running time of $2^{O((1/\epsilon)\log^4(1/\epsilon))} + \text{poly}(n)$ for $Q||C_{\max}$.
Similarly, we can extend our results to derive PTASs for a larger family of objective functions as considered by Alon et al. [1,2]. Let $\ell_i$ denote the load of machine $i$, that is, the total processing time of jobs assigned to machine $i$ in a given solution. Our techniques then give a PTAS with the same running time for the problem of minimizing the $L_p$-norm of the loads (for fixed $p$), and for maximizing $\min_{i \in M} \ell_i$, among others. To solve these problems, we can round the instance and state an IP analogous to [conf-IP] but with an objective function. However, the objective function prevents us from using the main theorem as it is stated. To get around this issue, we study several ILPs. In each ILP we consider $x_c$ to be a variable only if $c$ has a given load, and fix the rest to the values of some optimal solution. Applying Theorem 1 to each such ILP, plus some extra ideas, yields an analogous structural theorem. Afterwards, an algorithm similar to the one for makespan minimization yields the desired PTAS.
From a structural point of view, our sparsification lemma has further consequences for the structure of the knapsack polytope and the LP-relaxation of [conf-IP]. More precisely, we can show that any vertex of the convex hull of $Q$ must be simple. This, for example, helps us to upper bound the number of vertices by $2^{O(\log^2(T)+\log^2(d))}$. Moreover, we can show that if the configuration-LP, obtained by replacing the integrality restriction in [conf-IP] by $x \geq 0$, is feasible, then it admits a solution whose support consists purely of simple configurations. Due to space limitations we leave many details and proofs to the appendix.

Preliminaries
We will use the following notation throughout the paper. By default $\log(\cdot) = \log_2(\cdot)$, unless stated otherwise. Given two sets $A, I$, we denote by $A^I$ the set of all vectors indexed by $I$ with entries in $A$, that is, $A^I = \{(a_i)_{i \in I} : a_i \in A \text{ for all } i \in I\}$. Moreover, for $A \subseteq \mathbb{R}$, we denote the support of a vector $a \in A^I$ by $\text{supp}(a) = \{i \in I : a_i \neq 0\}$.
We consider an arbitrary knapsack polytope $P = \{c \in \mathbb{R}^d_{\geq 0} : \pi \cdot c \leq T\}$, where $\pi \in \mathbb{Z}^d_{\geq 0}$ is a non-negative integral (row) vector and $T$ is a positive integer. We assume without loss of generality that each coordinate $\pi_k$ of $\pi$ is upper bounded by $T$ (otherwise $c_k = 0$ for all $c \in \mathbb{Z}^d \cap P$). We focus on the set of integral vectors in $P$, which we denote by $Q = \mathbb{Z}^d \cap P$. We call an element $c \in Q$ a configuration. Given $b \in \mathbb{Z}^d_{\geq 0}$, consider the problem of decomposing $b$ as a conic integral combination of $m$ configurations; that is, our aim is to find a feasible solution to [conf-IP], defined above. A crucial property of [conf-IP] is that there is always a solution whose support has small cardinality. This follows from a Carathéodory-type bound obtained by Eisenbrand and Shmonin [4]. Since we will need the argument later, we state the result applied to our case and revise its (very elegant) proof. We split the proof into two lemmas.
For a given subset $A \subseteq Q$, let us denote by $x^A$ the indicator vector of $A$, that is, $x^A_c = 1$ if $c \in A$, and $0$ otherwise. Let us also denote by $M$ the $(d+1) \times |Q|$ matrix that defines the system of equalities (1) and (2).

Lemma 2 (Eisenbrand and Shmonin [4]). Let $x \in \mathbb{Z}^Q_{\geq 0}$ be a vector such that $|\text{supp}(x)| > 2(d+1)\log(4(d+1)T)$. Then there exist two disjoint sets $A, B \subseteq \text{supp}(x)$ such that $Mx^A = Mx^B$.

Proof. Let $s := |\text{supp}(x)|$. Each coordinate of $M$ is at most $T$. Hence, for any $A \subseteq \text{supp}(x)$, each coordinate of $Mx^A$ is no larger than $|A| \cdot T \leq sT$. Thus, $Mx^A$ belongs to $\{0, \ldots, sT\}^{d+1}$, and hence there are at most $(sT+1)^{d+1} = 2^{(d+1)\log(sT+1)}$ different possibilities for the vector $Mx^A$, over all possible subsets $A \subseteq \text{supp}(x)$. On the other hand, there are $2^s$ many different subsets of $\text{supp}(x)$. Since $s > 2(d+1)\log(4(d+1)T)$, one can verify that $2^s > (sT+1)^{d+1}$; hence by the pigeonhole principle there exist two distinct sets $A', B' \subseteq \text{supp}(x)$ with $Mx^{A'} = Mx^{B'}$. Removing the intersection, the sets $A = A' \setminus B'$ and $B = B' \setminus A'$ are disjoint and still satisfy $Mx^A = Mx^B$.

Lemma 3 (Eisenbrand and Shmonin [4]). If [conf-IP] is feasible, then there exists a solution $x$ with $|\text{supp}(x)| \leq 2(d+1)\log(4(d+1)T)$.

Proof. Let $x$ be a solution to [conf-IP] that minimizes $|\text{supp}(x)| = s$. Assume by contradiction that $s > 2(d+1)\log(4(d+1)T)$. By Lemma 2 there exist two disjoint sets $A, B \subseteq \text{supp}(x)$ with $Mx^A = Mx^B$. Let $\lambda := \min\{x_c : c \in A\}$ and define $x' := x - \lambda x^A + \lambda x^B$. Then $Mx' = Mx$ and $x' \geq 0$, so $x'$ is also a solution to [conf-IP], and it has a strictly smaller support: since $B \subseteq \text{supp}(x)$ no new configuration enters the support, while a configuration $c^* \in \arg\min\{x_c : c \in A\}$ satisfies $x'_{c^*} = 0$. This contradicts the minimality of $s$.

Structural Results
Recall that we call a configuration $c$ simple if $|\text{supp}(c)| \leq \log(T+1)$ and complex otherwise. An important observation towards showing Theorem 1 is that if $c$ is a complex configuration, then $2c$ can be written as the sum of two configurations of smaller support. This is shown by the following Sparsification Lemma.
Lemma 4 (Sparsification Lemma). Let $c \in Q$ be a complex configuration. Then there exist two configurations $c_1, c_2 \in Q$ such that 1. $\pi \cdot c_1 = \pi \cdot c_2 = \pi \cdot c$, 2. $2c = c_1 + c_2$, and 3. $\text{supp}(c_1) \subsetneq \text{supp}(c)$ and $\text{supp}(c_2) \subsetneq \text{supp}(c)$.

Proof. For each subset $S \subseteq \text{supp}(c)$, let $c_S$ denote the vector with $(c_S)_k = c_k$ for $k \in S$ and $(c_S)_k = 0$ otherwise, and let $V$ be the collection of all such vectors. For any vector $c_S \in V$ it holds that $\pi \cdot c_S \leq \pi \cdot c \leq T$. Hence, $\pi \cdot c_S \in \{0, 1, \ldots, T\}$ can take only $T+1$ different values. Using that $c$ is a complex configuration and hence $2^{|\text{supp}(c)|} > 2^{\log(T+1)} = T+1$, the pigeonhole principle ensures that there are two different non-empty sets $S, R \subseteq \text{supp}(c)$ with $\pi \cdot c_S = \pi \cdot c_R$. By removing the intersection, we can assume w.l.o.g. that $S$ and $R$ are disjoint. We define $c_1 = c - c_S + c_R$ and $c_2 = c - c_R + c_S$, which satisfy the properties of the lemma: $\pi \cdot c_1 = \pi \cdot c - \pi \cdot c_S + \pi \cdot c_R = \pi \cdot c$ (and analogously for $c_2$), and $c_1 + c_2 = 2c$ by construction. Since $\text{supp}(c_1) \subseteq \text{supp}(c) \setminus S$ and $\text{supp}(c_2) \subseteq \text{supp}(c) \setminus R$, property 3 is satisfied.
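The pigeonhole argument is constructive. The following Python sketch (illustrative only, run here on a made-up toy configuration) searches for two disjoint index sets of equal $\pi$-weight and performs the split $2c = c_1 + c_2$:

```python
from itertools import chain, combinations

def sparsify(c, pi):
    """Split a complex configuration c into c1, c2 with 2c = c1 + c2,
    pi.c1 = pi.c2 = pi.c, and strictly smaller supports (Lemma 4 sketch)."""
    supp = [k for k, ck in enumerate(c) if ck > 0]
    weight = {}  # pi-weight of the sub-vector induced by each subset of supp
    subsets = chain.from_iterable(
        combinations(supp, r) for r in range(1, len(supp) + 1))
    for S in subsets:
        w = sum(pi[k] * c[k] for k in S)
        if w in weight:                                 # pigeonhole collision
            R = weight[w]
            S, R = set(S) - set(R), set(R) - set(S)     # drop the intersection
            c1 = tuple(2 * ck if k in R else (0 if k in S else ck)
                       for k, ck in enumerate(c))       # c - c_S + c_R
            c2 = tuple(2 * ck if k in S else (0 if k in R else ck)
                       for k, ck in enumerate(c))       # c - c_R + c_S
            return c1, c2
        weight[w] = S
    return None  # no collision: c was not complex enough
```

If $c$ is complex, the loop is guaranteed to hit a collision, since the non-empty subsets outnumber the $T$ possible positive weights.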
With Lemma 4 we are ready to show Theorem 1. For the proof it is tempting to apply the lemma iteratively, replacing any complex configuration that is used twice by two configurations with smaller support. This can be repeated until there is no complex configuration taken multiple times. Then we can apply the technique of Lemma 3 to the obtained solution to bound the cardinality of the support. However, the last step might break the structure obtained if the solution implied by Lemma 3 uses a complex configuration more than once. In order to avoid this issue we consider a potential function. We show that a vector minimizing the chosen potential uses each complex configuration at most once, and that the number of complex configurations in the support is bounded. Finally, we apply the techniques from Lemma 3 restricted to variables corresponding to simple configurations.
Proof of Theorem 1. Consider the following potential function of a solution $x \in \mathbb{Z}^Q_{\geq 0}$ of [conf-IP]: $\Phi(x) := \sum_{c \in Q} x_c \cdot |\text{supp}(c)|$. Let $x$ be a solution of [conf-IP] with minimum potential $\Phi(x)$, which is well defined since the set of feasible solutions has finite cardinality. We show two properties of $x$.

P1: $x_c \leq 1$ for each complex configuration $c \in Q$. Assume otherwise, i.e., $x_c \geq 2$ for some complex $c$. Consider the two configurations $c_1$ and $c_2$ implied by the previous lemma. We define a new solution $x'$ by $x'_c = x_c - 2$, $x'_{c_1} = x_{c_1} + 1$, $x'_{c_2} = x_{c_2} + 1$, and $x'_e = x_e$ for $e \notin \{c, c_1, c_2\}$. Since $2c = c_1 + c_2$ and $\pi \cdot c_1 = \pi \cdot c_2 = \pi \cdot c$, we have $Mx' = Mx$, so $x'$ is feasible. Moreover, $\Phi(x') - \Phi(x) = |\text{supp}(c_1)| + |\text{supp}(c_2)| - 2|\text{supp}(c)| < 0$, contradicting the minimality of $\Phi(x)$.

P2: The number of complex configurations in $\text{supp}(x)$ is at most $2(d+1)\log(4(d+1)T)$. Assume otherwise. Applying Lemma 2 to the restriction of $x$ to complex configurations yields two disjoint sets $A, B \subseteq \text{supp}(x)$ of complex configurations with $Mx^A = Mx^B$; w.l.o.g. $\Phi(x^B) \leq \Phi(x^A)$. Let $x'$ be the vector defined by $x' := x - x^A + x^B$. If $\Phi(x^B) < \Phi(x^A)$, then we have constructed a new solution with smaller potential, contradicting our assumption on the minimality of $\Phi(x)$. We conclude that $\Phi(x^B) = \Phi(x^A)$ and thus $\Phi(x) = \Phi(x')$. By construction of $x'$, we obtain that $x'_c > x_c \geq 1$ for any complex configuration $c \in B$. Having multiplicity $\geq 2$ for a complex configuration $c$, we can proceed as in P1 to find a new solution with decreased potential, which yields a contradiction.
Given these two properties, to conclude the theorem it suffices to upper bound the number of simple configurations in $\text{supp}(x)$ by $2(d+1)\log(4(d+1)T)$. Suppose this bound is violated. Then we find two disjoint sets $A, B \subseteq \text{supp}(x)$ of simple configurations (see Lemma 2) with $Mx^A = Mx^B$ and proceed as in Lemma 3. Since the argument of Lemma 3 is only applied to simple configurations, properties P1 and P2 continue to hold, and the theorem follows.
Our techniques, in particular our Sparsification Lemma, imply two corollaries about the structure of the knapsack polytope and the LP-relaxation of [conf-IP].
Corollary 5. Every vertex of $\text{conv.hull}(Q)$ is a simple configuration. Moreover, the total number of simple configurations in $Q$ is upper bounded by $2^{O(\log^2(T)+\log^2(d))}$, and thus the same expression upper bounds the number of vertices of $\text{conv.hull}(Q)$.
Proof. Consider a complex configuration $c \in Q$. By Lemma 4 we know that there exist $c_1, c_2 \in Q$ with $c_1, c_2 \neq c$ such that $2c = c_1 + c_2$. Hence, $c$ is not a vertex of $\text{conv.hull}(Q)$, as it can be written as the convex combination $c = c_1/2 + c_2/2$.
To bound the number of simple configurations, fix a set $D \subseteq \{1, \ldots, d\}$. Notice that the number of configurations $c$ with $\text{supp}(c) = D$ is at most $T^{|D|}$. For simple configurations it suffices to consider sets $D$ of cardinality at most $\log(T+1)$. Since the number of subsets $D \subseteq \{1, \ldots, d\}$ of cardinality $i$ is $\binom{d}{i}$, we obtain that the number of simple configurations is at most $\sum_{i \leq \log(T+1)} \binom{d}{i} T^i \leq 2^{O(\log(T)\log(d) + \log^2(T))} \subseteq 2^{O(\log^2(T)+\log^2(d))}$. The following corollary follows as each complex configuration can be represented by a convex combination of simple configurations.
Corollary 6. If the LP-relaxation [conf-LP] of [conf-IP], obtained by replacing the integrality restriction by $x \geq 0$, is feasible, then it admits a solution $\bar{x}$ whose support contains only simple configurations.

Proof. Assume that there exists $c \in Q$ such that $c$ is complex and $x_c > 0$. Then, by the previous corollary, configuration $c$ can be written as $c = \sum_{q \in Q} \lambda_q q$, where $\sum_{q \in Q} \lambda_q = 1$, $\lambda_q \geq 0$ for all $q \in Q$, and $\lambda_q = 0$ if $q \in Q$ is complex. Consider a new solution $x'$ defined by $x'_c = 0$, $x'_q = x_q + \lambda_q x_c$ for each $q \in \text{supp}(\lambda)$, and $x'_q = x_q$ otherwise. This new solution is also feasible for [conf-LP]. As $x'_c = 0$, the number of complex configurations in the support of the solution is reduced by one. This procedure can be repeated until we obtain a solution $\bar{x}$ whose support contains only simple configurations.
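As a toy sanity check of the counting argument (with made-up sizes, not the paper's instances), one can enumerate all configurations and keep those whose support has size at most $\log_2(T+1)$:

```python
from itertools import product
from math import log2

def simple_configurations(pi, T):
    """Configurations of the knapsack polytope pi . c <= T whose support
    has size at most log2(T+1), i.e. the simple ones."""
    ranges = [range(T // p + 1) for p in pi]
    Q = [c for c in product(*ranges)
         if sum(p * x for p, x in zip(pi, c)) <= T]
    return [c for c in Q if sum(x > 0 for x in c) <= log2(T + 1)]
```

For instance, with $\pi = (1,1,1)$ and $T = 3$ there are 20 configurations in total, and only $(1,1,1)$ has support larger than $\log_2(4) = 2$.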

Applications to Scheduling on Parallel Machines
In what follows we show how to exploit the structural insights of the previous section to derive faster algorithms for scheduling problems on parallel machines. We start by considering $P||C_{\max}$, where we seek to assign a set of jobs $J$ with processing times $p_j \in \mathbb{Z}_{>0}$ to a set $M$ of $m$ machines. For a given assignment $a : J \to M$, we define the load of machine $i$ as $\sum_{j : a(j) = i} p_j$, and the makespan as the maximum load over all machines, which is the minimum time needed to complete the execution of all jobs on the processors. The goal is to find an assignment $a : J \to M$ that minimizes the makespan.
We first follow well known rounding techniques [1,2,9,10]. Consider an error tolerance $0 < \epsilon < 1/3$ such that $1/\epsilon^2$ is an integer. To get an estimate of the optimal makespan, we follow the standard dual approximation approach. First, we can use, e.g., the 2-approximation algorithm by Graham [7] to get an initial guess of the optimal makespan. Using binary search, we can then estimate the optimal makespan within a factor of $(1+\epsilon)$ in $O(\log(1/\epsilon))$ iterations. Therefore, it remains to give an algorithm that, for a given makespan $T$, either decides that there exists an assignment with makespan $(1+O(\epsilon))T$ or reports that there exists no assignment with makespan $\leq T$.
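The dual approximation driver can be sketched as follows in Python; here `decide` stands in for the gap decision procedure developed below, and the job sizes in the usage example are made up:

```python
import heapq

def graham_makespan(jobs, m):
    """List scheduling (Graham): returns a makespan within factor 2 of optimal."""
    loads = [0] * m
    heapq.heapify(loads)
    for p in jobs:
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

def dual_approximation(jobs, m, eps, decide):
    """Binary search over T; decide(T) must accept if makespan (1+O(eps))T
    is achievable and reject if no schedule of makespan <= T exists."""
    hi = graham_makespan(jobs, m)   # OPT <= hi
    lo = hi / 2                     # hi <= 2*OPT, so OPT >= hi/2
    while hi > (1 + eps) * lo:      # O(log(1/eps)) iterations
        mid = (lo + hi) / 2
        if decide(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

With an exact oracle the loop returns a value within a $(1+\epsilon)$ factor of the optimal makespan.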
For a given makespan $T$ we define the set of big jobs $J_{\text{big}} = \{j \in J : p_j \geq \epsilon T\}$ and the set of small jobs $J_{\text{small}} = J \setminus J_{\text{big}}$. The following lemma shows that the small jobs can be removed from the instance by adding big jobs, each of size $\epsilon T$, as placeholders. Let $S$ be the sum of processing times of jobs in $J_{\text{small}}$ and let $S^*$ denote the value of $S$ rounded up to the next multiple of $\epsilon T$, that is, $S^* = \epsilon T \cdot \lceil S/(\epsilon T) \rceil$. We define a new instance containing only big jobs by $J^* = J_{\text{big}} \cup J_{\text{new}}$, where $J_{\text{new}}$ contains $S^*/(\epsilon T) \in \mathbb{N}$ jobs of size $\epsilon T$.

Lemma 7. If $J$ admits a schedule with makespan $T$, then $J^*$ admits a schedule with makespan at most $(1+\epsilon)T$. Conversely, given a schedule of $J^*$ with makespan $T'$, we can construct in linear time a schedule of $J$ with makespan at most $T' + \epsilon T$.

Proof. We modify the assignment $a$ of jobs in $J$ by replacing the set of small jobs on each machine by jobs in $J_{\text{new}}$. Let $S_i$ be the total processing time of small jobs assigned to machine $i$. Then the small jobs on machine $i$ are replaced by (at most) $S^*_i/(\epsilon T)$ jobs in $J_{\text{new}}$, where $S^*_i$ denotes the value of $S_i$ rounded up to the next multiple of $\epsilon T$. The new solution processes all jobs in $J_{\text{new}}$, and the load on each machine hence increases by at most $\epsilon T$. Conversely, having an assignment for the big jobs $J^*$, we can easily obtain a schedule for the jobs $J$ by adding the small jobs greedily into the space of the placeholder jobs $J_{\text{new}}$.
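The replacement of small jobs by placeholders is straightforward to implement; a minimal sketch (the job sizes and parameters in the usage are hypothetical):

```python
import math

def replace_small_jobs(jobs, T, eps):
    """Replace small jobs (size < eps*T) by placeholder jobs of size eps*T
    whose total size is the small-job volume rounded up to a multiple of eps*T."""
    big = [p for p in jobs if p >= eps * T]
    S = sum(p for p in jobs if p < eps * T)       # small-job volume
    placeholders = math.ceil(S / (eps * T))        # S* / (eps*T) new jobs
    return big + [eps * T] * placeholders
```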
By scaling the processing times of jobs in $J^*$, we can assume that the makespan $T$ has value $1/\epsilon^2$. Also notice that we can assume $p_j \leq T$ for all $j$; otherwise we cannot pack all jobs within makespan $T$. This implies that each job $j \in J^*$ has a processing time $1/\epsilon \leq p_j \leq 1/\epsilon^2$. In the following we give a transformation of the big jobs in $J^*$ by rounding their processing times. We first round the processing times up to the next power of $1+\epsilon$, setting $p'_j = (1+\epsilon)^{\lceil \log_{1+\epsilon} p_j \rceil}$; thus all rounded processing times belong to $\Pi' = \{(1+\epsilon)^k : 1/\epsilon \leq (1+\epsilon)^k \leq (1+\epsilon)/\epsilon^2 \text{ and } k \in \mathbb{N}\}$. We further round each $p'_j$ up to the next integer $\bar{p}_j = \lceil p'_j \rceil$ and define the set $\Pi = \{\lceil p \rceil : p \in \Pi'\}$. Notice that $\Pi$ contains only integers and $|\Pi| \leq |\Pi'| \in O((1/\epsilon)\log(1/\epsilon))$.
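A minimal sketch of this two-stage rounding (powers of $1+\epsilon$, then integers); the inputs are made up, and floating-point logarithms are used for brevity, which a careful implementation would avoid:

```python
import math

def round_sizes(jobs, eps):
    """Round each processing time up to the next power of (1+eps), then up to
    the next integer; returns the rounded times and the distinct sizes Pi."""
    rounded = [math.ceil((1 + eps) ** math.ceil(math.log(p, 1 + eps)))
               for p in jobs]
    return rounded, sorted(set(rounded))
```

The number of distinct sizes after rounding is logarithmic in the size range, which is what caps $d$.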

Lemma 8. If there is a feasible schedule of the jobs $J^*$ with processing times $p_j$ on $m$ machines with makespan $T^* \leq (1+\epsilon)T$, then there is also a feasible schedule of the jobs $J^*$ with rounded processing times $\bar{p}_j$ with a makespan of at most $(1+5\epsilon)T$. Furthermore, the number of different processing times is at most $|\Pi| \in O((1/\epsilon)\log(1/\epsilon))$.
Proof. Consider a feasible schedule of the jobs in $J^*$ with processing times $p_j$ on $m$ machines with makespan $T^*$. Since $p_j \leq p'_j \leq (1+\epsilon)p_j$ for each job $j$, the same assignment with processing times $p'_j$ yields a makespan of at most $(1+\epsilon)T^* \leq (1+\epsilon)^2 T = 1/\epsilon^2 + 2/\epsilon + 1$. Since $p'_j \geq p_j \geq 1/\epsilon$, every machine processes at most $1/\epsilon + 2$ jobs. Hence, rounding the processing time $p'_j$ of each job up to the next integer increases the load on each machine by at most $1/\epsilon + 2$. Recalling that $\epsilon < 1/3$, we obtain a feasible schedule with makespan at most $(1+\epsilon)^2 T + 1/\epsilon + 2 \leq (1+5\epsilon)T$.

In what follows we give an algorithm that decides in polynomial time the existence of a solution for instance $J^*$ with processing times $\bar{p}_j$ and makespan $\bar{T} = \lfloor(1+5\epsilon)T\rfloor$. We denote the numbers in $\Pi$ by $\pi_1, \ldots, \pi_d$ and define the vector $\pi = (\pi_1, \pi_2, \ldots, \pi_d) \in \mathbb{N}^d$ of rounded processing times. We consider configurations to be vectors in $Q = P \cap \mathbb{Z}^d$, where $P = \{c \in \mathbb{R}^d_{\geq 0} : \pi \cdot c \leq \bar{T}\}$ is a knapsack polytope (see Section 3). As before, we say that a configuration $c$ is simple if $|\text{supp}(c)| \leq \log(\bar{T}+1)$, and complex otherwise. For a given assignment of jobs to machines, we say that a machine follows a configuration $c$ if $c_k$ is the number of jobs of size $\pi_k$ assigned to the machine. We denote by $Q_c \subseteq Q$ the set of complex configurations and by $Q_s \subseteq Q$ the set of simple configurations.
Let $b_k$ be the number of jobs of size $\pi_k$ in the instance $J^*$ (with processing times $\bar{p}_j$). Consider an ILP with an integer variable $x_c$ for each $c \in Q$, denoting the number of machines that follow configuration $c$. With these parameters, the problem of scheduling all jobs with makespan $\bar{T}$ is equivalent to finding a feasible solution to [conf-IP]. To solve the ILP we use, among other techniques, Kannan's algorithm [14], which is an improvement on the algorithm by Lenstra [15]. The algorithm has a running time of $2^{O(N \log N)} \cdot s$, where $N$ is the number of variables and $s$ is the number of bits used to encode the input of the ILP in binary.

Algorithm 9.
1. Guess the set of jobs assigned to machines that follow complex configurations, together with the number $m_c$ of such machines.
2. Compute an assignment of the guessed jobs to $m_c$ machines with makespan $\bar{T}$ using a dynamic program.
3. Solve the ILP restricted to configurations in $Q_s$ for the remaining jobs on the remaining $m - m_c$ machines.
One of the key observations used to bound the running time of the algorithm is that the number of simple configurations $|Q_s|$ is bounded by a quasi-polynomial term, namely $|Q_s| \leq 2^{O(\log^2(1/\epsilon))}$. This follows easily from Corollary 5, using that $\bar{T} \in O(1/\epsilon^2)$ and $d = |\Pi| \in O((1/\epsilon)\log(1/\epsilon))$.
Lemma 10. Algorithm 9 can be implemented with a running time of $2^{O((1/\epsilon)\log^4(1/\epsilon))}$.

Proof. In step 1, the algorithm guesses which jobs are processed on machines following a complex configuration. Since each configuration contains at most $O(1/\epsilon)$ jobs, there are at most $O(m_c/\epsilon) = O((1/\epsilon^2)\log^2(1/\epsilon))$ jobs assigned to such machines. For each size $\pi_k \in \Pi$, we guess the number $b^c_k$ of jobs of size $\pi_k$ assigned to such machines. Hence, we can enumerate all possibilities for jobs assigned to complex machines in time $2^{O((1/\epsilon)\log^2(1/\epsilon))}$. After guessing the jobs, we can assign them to a minimum number of machines in step 2 (with makespan $\bar{T}$) with a simple dynamic program that stores vectors $(\ell, z_1, \ldots, z_d)$ with $z_k \leq b^c_k$ being the number of jobs of size $\pi_k$ used on the first $\ell \leq m_c$ processors [13]. The size of the dynamic programming table is $O(m_c \prod_{k=1}^d (b^c_k + 1))$. For any vector $(\ell, z_1, \ldots, z_d)$, determining whether it corresponds to a feasible solution can be done by checking all vectors of the type $(\ell-1, z'_1, \ldots, z'_d)$ with $z'_k \leq z_k$. Thus, the running time of the dynamic program is $O(m_c [\prod_{k=1}^d (b^c_k + 1)]^2)$. Since $b^c_k \in O((1/\epsilon^2)\log^2(1/\epsilon))$ for each $k$, recalling that $m_c \in O((1/\epsilon)\log^2(1/\epsilon))$ and that $d = |\Pi| \in O((1/\epsilon)\log(1/\epsilon))$, we obtain that step 2 can be implemented with running time $2^{O((1/\epsilon)\log^2(1/\epsilon))}$.
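The dynamic program of step 2 can be sketched as follows in Python, here memoizing on the vector of remaining job counts rather than on the table $(\ell, z_1, \ldots, z_d)$ from the proof; the sizes in the usage are hypothetical and no attempt is made to match the stated running time:

```python
from itertools import product
from functools import lru_cache

def min_machines(b, pi, T):
    """Minimum number of machines of capacity T needed to schedule b[k] jobs
    of size pi[k] (assumes every single job fits on a machine)."""
    def fits(z):  # all nonzero sub-vectors of z that fit on one machine
        for c in product(*(range(zk + 1) for zk in z)):
            if any(c) and sum(p * x for p, x in zip(pi, c)) <= T:
                yield c

    @lru_cache(maxsize=None)
    def solve(z):  # z = remaining job counts per size
        if not any(z):
            return 0
        return 1 + min(solve(tuple(zk - ck for zk, ck in zip(z, c)))
                       for c in fits(z))

    return solve(tuple(b))
```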
Putting all pieces together, we conclude with the following theorem.
Theorem 11. $P||C_{\max}$ admits an EPTAS with running time $2^{O((1/\epsilon)\log^4(1/\epsilon))} + O(n\log n)$.

Proof. Consider a scheduling instance with job set $J$, processing times $p_j$ for $j \in J$, and machine set $M$. The greedy algorithm by Graham for obtaining a 2-approximation can be implemented in $O(n\log n)$ time. After guessing the makespan $T$, the processing times are sorted and rounded as described in Lemma 8. The rounding step can easily be implemented in $O(n)$ time.
If n ≤ 2

Extension to other objectives
We now consider a more general family of objective functions defined by Alon et al. [1,2]. For a fixed function $f : \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$, we consider the following two objective functions:
$$\text{(I)} \quad \min \sum_{i \in M} f(\ell_i), \qquad\qquad \text{(II)} \quad \min \max_{i \in M} f(\ell_i),$$
where $\ell_i$ denotes the load of machine $i$. Analogously, we study the maximization versions of the problems:
$$\text{(I')} \quad \max \sum_{i \in M} f(\ell_i), \qquad\qquad \text{(II')} \quad \max \min_{i \in M} f(\ell_i).$$
For the minimization versions of the problem we assume that $f$ is convex, while for (I') and (II') we assume it is concave. Moreover, we will need that the function satisfies the following sensitivity condition.

Condition 12. For each $\epsilon > 0$ there exists $\delta = \delta(\epsilon) > 0$ such that $f(y) \leq (1+\epsilon)f(x)$ for all $x, y \geq 0$ with $x \leq y \leq (1+\delta)x$.
It is worth noticing that many interesting functions belong to this family. In particular, (II) with $f(x) = x$ corresponds to the minimum makespan problem, and (I) with $f(x) = x^p$, for constant $p$, corresponds to a problem equivalent to minimizing the $L_p$-norm of the vector of loads. Similarly, (II') with $f(x) = x$ corresponds to maximizing the minimum machine load. Notice that for all those objectives we have that $1/\delta = O(1/\epsilon)$.
The techniques of Alon et al. [2] are based on a rounding method followed by solving an ILP. We base our results on the same rounding techniques. Consider an arbitrary instance of a scheduling problem on identical machines with objective function (I), (II), (I') or (II'). Their first observation is that, if $L = \sum_j p_j/m$ is the average machine load, then a job with $p_j \geq L$ is scheduled alone on a machine in an optimal solution [2]. Hence, we can remove such a job together with a machine from the instance. In what follows, we assume without loss of generality that $p_j < L$ for all $j$. For the sake of brevity, we summarize the rounding techniques of Alon et al. in the following theorem.
Theorem 13 (Alon et al. [2]). Consider an instance of the scheduling problem with job set $J$, identical machines $M$, and processing times $p_j$ for $j \in J$ such that $p_j < L$ for all $j$. There exists a linear time algorithm that creates a new instance $I'$ with job set $J'$, machine set $M$, and processing times $p'_j$. Moreover, there is an integer $\lambda \geq 1/\delta$ with $\lambda \in O(1/\delta)$ such that the new instance satisfies the following: 1. Each job $j$ in $I'$ has processing time $L/\lambda \leq p'_j \leq L$, and $p'_j$ is an integer multiple of $L/\lambda^2$. 2. If $L' = \sum_j p'_j/m$ then $L \leq L' \leq (1+2/\lambda)L$.
4. There exists a linear time algorithm that transforms a feasible solution for instance $I'$ with objective value $V$ into a feasible solution for $I$ with objective value $V'$ such that $V' \leq (1+O(\epsilon))V$ for the minimization versions (respectively, $V' \geq (1-O(\epsilon))V$ for the maximization versions).

Given this result, it suffices to find a $(1+\epsilon)$-approximate solution for instance $I'$.
To do so, we further round the processing times as in the previous section by defining $\bar{p}_j$ as the value $(1+\delta)^{\lceil \log_{1+\delta} p'_j \rceil}$ rounded up to the next multiple of $L/\lambda^2$, for all $j \in J'$. Notice that $p'_j \leq \bar{p}_j \leq (1+\delta)^2 p'_j$. Hence, for any assignment that gives a load $\ell_i$ on machine $i$ with respect to the $p'_j$, the same assignment has a load $\bar{\ell}_i$ with $\ell_i \leq \bar{\ell}_i \leq (1+\delta)^2 \ell_i$. By Condition 12 we conclude that the optimal value $\overline{\text{Opt}}$ of the rounded instance satisfies $(1 - O(\epsilon))\text{Opt} \leq \overline{\text{Opt}} \leq (1 + O(\epsilon))\text{Opt}$.

Lemma 14.
For $\epsilon > 0$ small enough, the rounded instance with processing times $\bar{p}_j$ admits an optimal solution with makespan at most $4L$.
Proof. Among all optimal solutions to the problem, consider one that minimizes $\sum_i \ell_i^2$, where $\ell_i$ is the load on machine $i$. Assume that there exists a machine $i$ such that $\ell_i > 4L$. Notice that $\sum_j \bar{p}_j/m \leq (1+\delta)^2 L' \leq (1+\delta)^2(1+2/\lambda)L \leq (1+\delta)^4 L$. Since $\delta \leq \epsilon$, for $\epsilon$ small enough ($\epsilon \leq 1/10$ suffices) we have that $(1+\delta)^4 \leq 2$ and thus $\sum_j \bar{p}_j/m \leq 2L$. Also, recall that $\bar{p}_j \leq (1+\delta)^2 p'_j \leq (1+\delta)^2 L \leq 2L$ for any $j$, where the second to last inequality follows from Theorem 13. Since $\ell_{\min} = \min_i \ell_i \leq \sum_j \bar{p}_j/m \leq 2L$, we have $\ell_i - \ell_{\min} > 4L - 2L = 2L$. Then, for any job $j$, we have that $\bar{p}_j < \ell_i - \ell_{\min}$. Let $j^*$ be any job assigned to machine $i$; in particular we have that $\bar{p}_{j^*} < \ell_i - \ell_{\min}$.
Recall that for problems (I) and (II) the function $f$ is convex. Hence, it holds that $f(x+\Delta) + f(y-\Delta) \leq f(x) + f(y)$ for all $0 \leq x \leq y$ with $0 \leq \Delta \leq y - x$ [2]. Moreover, the inequality becomes strict if $f$ is strictly convex. Setting $x = \ell_{\min}$, $y = \ell_i$ and $\Delta = \bar{p}_{j^*}$, the inequality implies that moving job $j^*$ to a machine $i^* \in \arg\min_i \ell_i$ strictly decreases $\sum_i \ell_i^2$. Moreover, the objective function (I) does not increase when performing this move, which yields a contradiction for this objective. Similarly, for problem (II) the objective does not increase since $\max\{f(\xi) : \xi \in [x,y]\}$ is always attained at $x$ or $y$ for $f$ convex. This yields a contradiction for (II).
Analogously, for problems (I') and (II') the function $f$ is concave and thus $f(x+\Delta) + f(y-\Delta) \geq f(x) + f(y)$ holds for all $0 \leq x \leq y$ with $0 \leq \Delta \leq y - x$ [2]. Hence, moving job $j^*$ to machine $i^*$ decreases $\sum_i \ell_i^2$ but does not decrease the objective (I'). Since $f$ concave implies that $\min\{f(\xi) : \xi \in [x,y]\}$ is always attained at $x$ or $y$, we also obtain a contradiction for (II'). The lemma follows.
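A quick numeric illustration of the exchange argument (toy loads and $f(x) = x^2$, both made up): moving a job of size $\Delta \leq y - x$ from the most to the least loaded machine cannot increase a convex sum objective.

```python
def exchange(loads, f, p):
    """Move a job of size p from the most to the least loaded machine and
    return the sum objective before and after the move."""
    i, j = loads.index(max(loads)), loads.index(min(loads))
    new = loads.copy()
    new[i] -= p
    new[j] += p
    return sum(map(f, loads)), sum(map(f, new))

# p = 3 <= y - x = 10 - 2, so the convex objective cannot increase
before, after = exchange([10.0, 2.0], lambda x: x * x, 3.0)
```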
Let $L = \sum_j p_j/m$ be the average machine load (of the original instance). After our rounding we obtain an instance $I'$ with job set $J'$ and processing times $\bar{p}_j$ for $j \in J'$. Moreover, the $\bar{p}_j$ are multiples of $L/\lambda^2$, where $\lambda \geq 1/\delta$ is an integer such that $\lambda = O(1/\delta)$, and also $\bar{p}_j \geq L/\lambda$. By Lemma 14 there exists an optimal solution of the rounded instance with makespan at most $4L$ (in particular $\bar{p}_j \leq 4L$ for all $j$). Let $\Pi = \{\pi_1, \ldots, \pi_d\}$ be the distinct values that the processing times $\bar{p}_j$ can take. Our rounding guarantees that $d = |\Pi| = O((1/\delta)\log(1/\delta))$. We consider the knapsack polytope with capacity $\bar{T} := 4L$, that is, $P = \{c \in \mathbb{R}^d_{\geq 0} : \pi \cdot c \leq \bar{T}\}$. Notice that $\pi$ and $\bar{T}$ are integer multiples of $L/\lambda^2$, and that $P$ can also be written as $\{c \in \mathbb{R}^d_{\geq 0} : (\lambda^2/L)\,\pi \cdot c \leq (\lambda^2/L)\,\bar{T}\}$.
As before, we say that a configuration $c$ is simple if $|\text{supp}(c)| \leq \log(\bar{T}+1)$, and complex otherwise. We denote by $Q_c \subseteq Q$ the set of complex configurations and by $Q_s \subseteq Q$ the set of simple configurations. In what follows we focus on objective function (I).
We set up an ILP for the problem as before. Notice that each configuration $c$ incurs a cost of $f_c := f(\pi \cdot c)$. Moreover, we round and scale these values by defining $\bar{f}_c = \lceil f_c/(\epsilon f_{\min}) \rceil$, where $f_{\min} = \min_{c \in Q} f_c$. It is not hard to see that solving the problem with these rounded coefficients yields a $(1+\epsilon)$-approximate solution to the optimal solution of $I'$ with processing times $\bar{p}_j$. Let also $b_k$ be the number of jobs $j$ with processing time $\bar{p}_j = \pi_k$ in $J'$. Consider the ILP obtained by adding the objective function $\min \sum_{c \in Q} \bar{f}_c \cdot x_c$ to [conf-IP]. We call this ILP [cost-conf-IP]. By our previous discussion, it suffices to solve this ILP optimally. To this end, we first notice that the largest coefficient in the objective can be bounded as follows.
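Spelled out, and restating the constraints of [conf-IP] from the preliminaries with the rounded costs (a restatement for readability, not new material), the program reads:

```latex
\begin{align*}
\text{[cost-conf-IP]}\qquad \min\ & \textstyle\sum_{c \in Q} \bar f_c\, x_c \\
\text{s.t.}\quad & \textstyle\sum_{c \in Q} c_k\, x_c = b_k && \text{for all } k \in \{1,\dots,d\},\\
& \textstyle\sum_{c \in Q} x_c = m,\\
& x_c \in \mathbb{Z}_{\ge 0} && \text{for all } c \in Q.
\end{align*}
```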
Lemma 15. If f satisfies Condition 12, then the largest value max_{c∈Q} f̂_c is upper bounded by (1/δ)^{O(1)}.
Proof. We first bound (max_{c∈Q} f_c)/(ε f_min). Notice that Condition 12 implies that f is continuous on R_{≥0}, and thus it attains a minimum and a maximum in the interval [L/λ, 4L]; let x_min and x_max, respectively, be points where they are attained. Consider first the case x_min ≤ x_max (this need not always hold since f might not be monotone). We now use Condition 12 iteratively. Let y_k := (1 + δ)^k x_min. Since y_k ≤ y_{k−1}(1 + δ), Condition 12 implies that f(y_k) ≤ (1 + ε)f(y_{k−1}). Iterating this idea we obtain that f(y_k) ≤ (1 + ε)^k f(y_0). Taking k = ⌈log_{1+δ}(x_max/x_min)⌉ implies that y_{k−1} ≤ x_max ≤ y_k ≤ x_max(1 + δ) and thus, by Condition 12, f(x_max) ≤ (1 + ε)f(y_{k−1}) ≤ (1 + ε)^k f(x_min). Hence (max_{c∈Q} f_c)/(ε f_min) ≤ f(x_max)/(ε f(x_min)) ≤ (1 + ε)^k/ε ≤ (1/ε)·((1 + δ)^k)^{log_{1+δ}(1+ε)} ≤ (1/ε)·(4λ(1 + δ))^{log_{1+δ}(1+ε)} = (1/δ)^{O(1)}, where the last expression follows since, writing ε = γδ with γ = O(1), log_{1+δ}(1 + γδ) = ln(1 + γδ)/ln(1 + δ) ≤ γδ/ln(1 + δ) = O(γ) = O(1) (for δ small enough), and since λ = O(1/δ). We conclude that max_{c∈Q} f̂_c ≤ 1 + (max_{c∈Q} f_c)/(ε f_min) = (1/δ)^{O(1)}. For the case x_max ≤ x_min we define the sequence y_k := (1 − δ)^k x_min. The rest of the proof is analogous and the details are left to the reader.
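The coefficient rounding f̂_c = ⌈f_c/(ε f_min)⌉ used above is mechanical; a small sketch (the function name is ours):

```python
import math

def scale_costs(costs, eps):
    """Sketch of the objective rounding: f_hat_c = ceil(f_c / (eps * f_min)).
    All scaled coefficients become positive integers, and optimizing with
    them instead of f_c loses at most a (1 + eps) factor."""
    f_min = min(costs)
    return [math.ceil(fc / (eps * f_min)) for fc in costs]

scaled = scale_costs([4.0, 10.0, 7.0], eps=0.5)
assert all(isinstance(s, int) and s >= 1 for s in scaled)
```

The point of Lemma 15 is precisely that these integer coefficients stay polynomially bounded in 1/δ.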
As we now must consider the objective function, we cannot simply apply Theorem 1 to [cost-conf-IP]. However, by decomposing the ILP into several smaller ones and applying the theorem to each of them, we can prove a slightly weaker statement: [cost-conf-IP] admits an optimal solution in which the total multiplicity of complex configurations is O((1/δ³) log²(1/δ)) and few simple configurations are used.

Proof. Notice that the load π · c of each configuration is a multiple of L/λ², and thus π · c ∈ {L/λ, L/λ + L/λ², ..., 4L}. We classify the configurations according to their loads, Q_ℓ := {c ∈ Q : π · c = L/λ + ℓ · L/λ²}, for ℓ ∈ {0, ..., 4λ² − λ}. Let x* be an optimal solution of [cost-conf-IP]. We can then consider an ILP [conf-IP]_ℓ for each load value ℓ, restricted to the configurations in Q_ℓ and with right-hand sides induced by x*. Scaling π by λ²/L we obtain an integral vector (since π is an integer multiple of L/λ²), so we can apply Theorem 1 to each ILP [conf-IP]_ℓ, which yields that there exists a thin solution x^ℓ. In particular, the number of complex configurations in x^ℓ is Σ_{c∈Q_c∩Q_ℓ} x^ℓ_c = O((1/δ) log²(1/δ)). Since f̂_c depends only on the load of c, concatenating these solutions yields a solution x′ := (x^ℓ)_ℓ that is optimal for [cost-conf-IP] and satisfies Σ_{c∈Q_c} x′_c = O(λ² · (1/δ) log²(1/δ)) = O((1/δ³) log²(1/δ)). It remains to bound the number of simple configurations in the support. To this end, we consider the ILP restricted to simple configurations and apply the result of Eisenbrand and Shmonin [4].

The resulting algorithm can be summarized as follows:
1. Guess the numbers b^c_k of jobs of each size π_k ∈ Π assigned to machines following complex configurations.
2. Guess the number m^c of machines following complex configurations.
3. Compute, via dynamic programming, an optimal assignment of the guessed jobs to these m^c machines.
4. Solve the ILP restricted to configurations in Q_s, with right-hand sides b_k − b^c_k, obtaining values x_c for all c ∈ Q_s.
Proof. In step 1, the algorithm guesses which jobs are processed on machines following a complex configuration. Since each configuration contains at most O(1/δ) jobs, there are at most O((1/δ⁴) log²(1/δ)) jobs assigned to such machines. For each size π_k ∈ Π, we guess the number b^c_k of jobs of size π_k assigned to such machines. Hence, we can enumerate all possibilities for jobs assigned to complex machines in time 2^{O((1/δ) log²(1/δ))}. Similarly, we can guess the number of machines in step 2, since m^c = O((1/δ³) log²(1/δ)). For step 3 we use a simple dynamic program that goes over the machines, storing a table T(ℓ, z_1, ..., z_d) that contains the minimum cost achieved over the first ℓ ≤ m^c machines with z_k ≤ b^c_k jobs of size π_k. The number of entries of the table is O(m^c · Π_{k=1}^d (b^c_k + 1)). Computing T(ℓ, z_1, ..., z_d) can be done by checking all entries of the type T(ℓ − 1, z_1 − u_1, ..., z_d − u_d), where u ranges over the possible configurations of machine ℓ. Since b^c_k = O((1/δ⁴) log²(1/δ)) for each k, recalling that m^c = O((1/δ³) log²(1/δ)) and that d = |Π| = O((1/δ) log(1/δ)), we obtain that step 3 can be implemented in 2^{O((1/δ) log²(1/δ))} running time.
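The step-3 dynamic program can be sketched as follows. This sketch enumerates configurations by brute force and uses a dictionary in place of the d-dimensional table; the function name and the toy cost function are ours.

```python
from itertools import product

def min_cost_assignment(sizes, counts, machines, capacity, f):
    """T[z] = minimum total cost of scheduling z = (z_1, ..., z_d) jobs
    (z_k jobs of size sizes[k]) on the machines processed so far; each
    machine picks a configuration (a multiset of jobs fitting into
    `capacity`) of cost f(load).  Plain enumeration is fine for the
    polylogarithmically many guessed jobs this step handles."""
    d = len(sizes)
    # all multiplicity vectors u <= counts whose load fits the capacity
    configs = [u for u in product(*(range(c + 1) for c in counts))
               if sum(ui * s for ui, s in zip(u, sizes)) <= capacity]
    INF = float("inf")
    T = {tuple([0] * d): 0.0}
    for _ in range(machines):
        nxt = {}
        for z, cost in T.items():
            for u in configs:
                z2 = tuple(zi + ui for zi, ui in zip(z, u))
                if all(a <= b for a, b in zip(z2, counts)):
                    load = sum(ui * s for ui, s in zip(u, sizes))
                    c2 = cost + f(load)
                    if c2 < nxt.get(z2, INF):
                        nxt[z2] = c2
        T = nxt
    return T.get(tuple(counts), INF)

# two jobs of size 3 and one of size 5, two machines of capacity 10,
# convex cost f(load) = load^2: the balanced split {3,3} / {5} is best.
best = min_cost_assignment([3, 5], [2, 1], machines=2, capacity=10,
                           f=lambda x: x * x)
```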
As in [2], the algorithm above can easily be adapted for objectives (II), (I') and (II') by suitably adapting the ILP. We leave the details to the reader. This suffices to conclude Theorem 19.

Theorem 19. Consider the scheduling problem on parallel machines with objective functions (I), (II) for f convex (respectively (I') and (II') for f concave). If f satisfies Condition 12 for 1/δ = O(1/ε), then the problem admits an EPTAS with running time 2^{O((1/ε) log⁴(1/ε))} + O(n log n).

Minimum makespan scheduling on uniform machines
In this section we generalize our result for P||C_max to uniform machines. Consider a set of jobs J with processing times p_j and a set of m non-identical machines M, where machine i ∈ M runs at speed s_i. If job j is executed on machine i, the machine needs p_j/s_i time units to complete the job. The problem is to find an assignment a : J → M of jobs to machines that minimizes the makespan max_i Σ_{j:a(j)=i} p_j/s_i. The problem is denoted by Q||C_max. We suppose that s_1 ≥ s_2 ≥ ... ≥ s_m. Jansen [11] found an efficient polynomial time approximation scheme (EPTAS) for this scheduling problem with running time 2^{O((1/ε²) log³(1/ε))} + poly(n). Here we show how to improve the running time and prove the main result of this section.
We follow the approach by Jansen [11]: we transform the scheduling problem into a bin packing problem with different bin capacities, round the processing times and bin capacities, divide the bins into at most three groups, and distinguish four different scenarios depending on the input instance.
First, we compute a 2-approximate solution of length B(I) ≤ 2 Opt(I) using the algorithm by Gonzalez et al. [6]. Suppose that ε < 1; otherwise we can take the 2-approximate solution and are done. Then we choose a value δ ∈ (0, ε) such that 1/δ is integral (the exact value is specified later) and use a binary search within the interval [B(I)/2, B(I)], which contains Opt(I). We use a standard dual approximation method that, for each value T, either computes an approximate schedule of length T(1 + aδ) (where a is a constant) or shows that there is no schedule of length T. Since (δ/2)B(I) ≤ δ Opt(I), within O(log(1/δ)) iterations we find a value T ≤ Opt(I)(1 + δ) with a corresponding schedule of length at most T(1 + aδ) ≤ Opt(I)(1 + ε), using δ ≤ ε/(a + 2) and ε ≤ 1. Next, the scheduling problem is transformed into a bin packing problem with m bins of capacities c_i = T · s_i; the processing times p_j are rounded to the next value p̃_j of the form δ(1 + δ)^{k_j} with k_j ∈ Z, and the bin capacities are rounded to the next power of (1 + δ). We call B the set of bins with rounded capacities.
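The dual approximation loop can be sketched as follows, with `trial` standing in for the subroutine that either produces a schedule of length T(1 + aδ) or certifies that no schedule of length T exists (the function names are ours):

```python
def dual_approx_search(B, trial, delta):
    """Binary search T over [B/2, B] (which contains Opt) down to an
    additive resolution of (delta/2)*B, i.e. O(log(1/delta)) iterations.
    `trial(T)` returns a schedule (any truthy object) of length at most
    T*(1 + a*delta), or None if no schedule of length T exists."""
    lo, hi = B / 2.0, float(B)
    best = trial(hi)                 # hi >= Opt, so this never fails
    while hi - lo > (delta / 2.0) * B:
        mid = (lo + hi) / 2.0
        sched = trial(mid)
        if sched is not None:
            hi, best = mid, sched
        else:
            lo = mid
    return hi, best

# toy oracle: schedules exist exactly for T >= 7 (so Opt = 7), B = 14
T, _ = dual_approx_search(14, lambda t: "ok" if t >= 7 else None, delta=0.25)
assert T <= 7 * (1 + 0.25)           # T = Opt*(1 + delta) in this run
```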
Lemma 21 (Jansen [11]). If there is a feasible packing of n jobs with processing times p_j into m bins with capacities c_i, then there is also a packing of the n jobs with rounded processing times p̃_j into the m bins with rounded capacities.

If the number m of bins is smaller than K = O((1/δ) log(1/δ)), then we can use the approximation scheme by Jansen and Mastrolilli [12], which computes a (1 + ε)-approximate solution for scheduling n jobs on m unrelated machines (an even more general problem), within the required time bound. Suppose from now on that m > K. Then we divide the bins into at most three different bin groups: the first group B_1 consists of the K = O((1/δ) log(1/δ)) largest bins, and for some γ ∈ Θ(ε²) the remaining bins are split, according to their capacities relative to γ, into a second group B_2 and a third group B_3 (see [11] for the precise definitions).

Lemma 22 (Jansen [11]). If there is a solution for the original instance (J, M) of our scheduling problem with makespan T and corresponding bin sizes, then there is a feasible packing for the instance with rounded bin capacities c̃_i ≤ c_i(1 + δ)³ and rounded processing times p̃_j ≤ (1 + δ)p_j. Here B′_1 is the subset of B_1 with bins of capacity larger than (δ/(K − 1)) c_max(B_1), and B_2 has a constant number O((1/δ) log(1/δ)) of different bin capacities. In addition, we have one of the following four scenarios: (1) two bin groups B′_1 and B_2 with a gap c_min(B′_1)/c_max(B_2) ≥ 1/δ; the remaining scenarios (2)–(4) differ in the number of bin groups used and in the gaps between consecutive groups (see [11] for their precise definitions).

The same kind of modification shows that scenario 2 is a special case of scenario 1. Finally, scenario 1 can be interpreted as a special case of scenario 3, using B_3 = ∅. Therefore, it is sufficient to improve the running time for scenario 3. Scenario 3 will be solved using a mix of dynamic programming and mixed integer linear programming (MILP) techniques. In this approach we use only the larger bins in B′_1 ⊆ B_1 to execute jobs, but in the rounding step for scenario 3 afterwards we may also use the smaller bins in B_1. Notice that a packing into a bin b_i with capacity c̃_i ≤ c_i(1 + δ)³ corresponds to a schedule on machine i with total processing time at most c̃_i/s_i ≤ T(1 + δ)³.
For T ≤ Opt(I)(1 + δ) this gives us a schedule of length at most Opt(I)(1 + δ)⁴. If there is a feasible schedule with makespan T, then the total processing time of the instance satisfies Σ_{j∈J} p̃_j ≤ Σ_{i=1}^m c̃_i. If this inequality does not hold, then we discard the choice with makespan T. Otherwise, we can eliminate the set J_tiny of tiny jobs with processing time at most δc̃_m and pack them greedily, at the end of the algorithm, into the bins enlarged to size c̃_i(1 + δ). Hence, in what follows we assume that J_tiny is empty.
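The final greedy packing of the tiny jobs can be sketched as follows (function name and error handling are ours):

```python
def pack_tiny_greedily(tiny, capacities, loads, delta):
    """Add tiny jobs (processing time <= delta * c_m) greedily to the
    bins, using the slack gained by enlarging each bin from c_i to
    c_i * (1 + delta).  The area bound sum(p_j) <= sum(c_i) from the
    text guarantees some bin always has enough slack."""
    loads = list(loads)                     # do not mutate the caller's list
    for p in sorted(tiny, reverse=True):
        placed = False
        for i, c in enumerate(capacities):
            if loads[i] + p <= c * (1 + delta):
                loads[i] += p
                placed = True
                break
        if not placed:
            raise ValueError("area bound violated: no bin has enough slack")
    return loads

# two bins of capacities 10 and 5, current loads 9 and 4,
# tiny jobs of size at most delta * c_m = 0.1 * 5 = 0.5
loads = pack_tiny_greedily([0.5, 0.4], capacities=[10, 5],
                           loads=[9, 4], delta=0.1)
assert all(l <= c * 1.1 for l, c in zip(loads, [10, 5]))
```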

Solution for the instance
In this subsection we consider scenario 3 above, with three bin groups. First, we preassign all huge jobs with processing time > δc̃_{K′} to the first K′ machines. After the assignment of the huge jobs, we have a free area S_0 in B_1 for the remaining jobs, which have processing time p̃_j ≤ δc̃_{K′}. The different bin capacities in B_2 and B_3 are denoted by c̄(1) > ... > c̄(L) and c̄(L + 1) > ... > c̄(L + N), respectively. Let m_ℓ be the number of bins of size c̄(ℓ) for ℓ = 1, ..., L + N. The m_ℓ machines of the same speed form a block B_ℓ of bins with the same capacity c̄(ℓ). In addition, we have n_1, ..., n_P jobs of sizes δ(1 + δ)^{k_j}, and we suppose that the first P′ ≤ P job sizes are larger than c̃_{K+1} = c̄(1).
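Building the blocks B_ℓ from the rounded capacities is a simple grouping step; a sketch (function name is ours):

```python
from collections import Counter

def build_blocks(capacities):
    """Group bins of equal rounded capacity c_bar(l) into blocks B_l and
    report the block sizes m_l, sorted by decreasing capacity.  Since
    capacities are already rounded to powers of (1 + delta), exact
    equality tests are safe."""
    counts = Counter(capacities)
    caps = sorted(counts, reverse=True)      # c_bar(1) > c_bar(2) > ...
    return [(c, counts[c]) for c in caps]    # pairs (c_bar(l), m_l)

blocks = build_blocks([8, 8, 4, 2, 2, 2])
```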
In the MILP we use configurations C^{(ℓ)}_h, i.e. multisets of numbers δ(1 + δ)^{k_j} ∈ [δc̄(ℓ), c̄(ℓ)] (these are large processing times corresponding to block B_ℓ), where the total sum size(C^{(ℓ)}_h) = Σ_j a(k_j, C^{(ℓ)}_h) · δ(1 + δ)^{k_j} is at most c̄(ℓ). Here a(k_j, C^{(ℓ)}_h) denotes the multiplicity of size δ(1 + δ)^{k_j} in C^{(ℓ)}_h, and the MILP has a variable x^{(ℓ)}_h counting the bins of block B_ℓ packed with configuration C^{(ℓ)}_h. In addition, we use fractional variables y_{j,ℓ} to indicate the number of jobs of size δ(1 + δ)^{k_j} placed as small ones in block B_ℓ, i.e. with δ(1 + δ)^{k_j} < δc̄(ℓ). For each job size δ(1 + δ)^{k_j} ≤ c̄(1), let a_j be the smallest index in {1, ..., L + N} such that δ(1 + δ)^{k_j} ≥ δc̄(a_j). If there is no such index, we have a tiny processing time δ(1 + δ)^{k_j} < δc̄(L + N). Notice that the first P′ job sizes lie within (c̄(1), δc̃_{K′}]. These jobs do not fit into B_2 ∪ B_3; therefore, for these job sizes we use only one variable y_{j,0} = n_j and set a_j = 0.
In the MILP above, we use integral variables for configurations in the blocks of group B_2 and fractional variables for the configurations in blocks of B_3. Each feasible packing of the jobs into the bins corresponds to a feasible solution of the MILP. The total number of variables is O(n²) + O(n)·2^{O((1/δ) log(1/δ))}, the number of integral variables is at most 2^{O((1/δ) log(1/δ))}, and the number of constraints (not counting the non-negativity constraints) is at most O(n). The previous approach to solve the scheduling problem and the underlying MILP had a running time of 2^{O((1/δ²) log³(1/δ))} + poly(n). In order to use an approach similar to the one for scheduling on identical machines, each large size δ(1 + δ)^{k_j} ∈ C^{(ℓ)}_i is rounded up to the next multiple of δ²c̄(ℓ). This enlarges the size of each configuration C^{(ℓ)}_i from size(C^{(ℓ)}_i) to at most size(C^{(ℓ)}_i) + δc̄(ℓ), and the corresponding bin size from c̄(ℓ) to (1 + δ)c̄(ℓ). Let C̃^{(ℓ)}_i be the configurations of size at most (1 + δ)c̄(ℓ) with the rounded-up numbers q(k_j, ℓ)δ²c̄(ℓ), where q(k_j, ℓ) ∈ Z_+, and multiplicities a(k_j, C̃^{(ℓ)}_i). This rounding also implies that the rounded size size(C̃^{(ℓ)}_i) of a configuration is a multiple of δ²c̄(ℓ). Each new rounded configuration C̃^{(ℓ)}_i (with rounded-up numbers q(k_j, ℓ)δ²c̄(ℓ) and multiplicities a(k_j, C̃^{(ℓ)}_i)) corresponds to an integral point inside the knapsack polytope P_ℓ = {C = (a(k_j, C)) : q · C ≤ 1/δ² + 1/δ}, i.e. Σ_j q(k_j, ℓ) a(k_j, C̃^{(ℓ)}_i) ≤ 1/δ² + 1/δ. We now consider a modified MILP with configurations C̃^{(ℓ)}_i and coefficients a(k_j, C̃^{(ℓ)}_i). Note that the total area of all configurations in B_ℓ can be bounded by Σ_i size(C̃^{(ℓ)}_i) x^{(ℓ)}_i ≤ Σ_i size(C^{(ℓ)}_i) x^{(ℓ)}_i + δ m_ℓ c̄(ℓ). This, together with the small jobs, gives Σ_i size(C̃^{(ℓ)}_i) x^{(ℓ)}_i + Σ_j δ(1 + δ)^{k_j} y_{j,ℓ} ≤ Σ_i size(C^{(ℓ)}_i) x^{(ℓ)}_i + δ m_ℓ c̄(ℓ) + Σ_j δ(1 + δ)^{k_j} y_{j,ℓ} ≤ m_ℓ c̄(ℓ)(1 + δ); i.e., the total area is increased by at most a multiplicative factor of (1 + δ). Since the total area of all jobs within one block is increased by this rounding, we use the new area constraints Σ_i size(C̃^{(ℓ)}_i) x^{(ℓ)}_i + Σ_j δ(1 + δ)^{k_j} y_{j,ℓ} ≤ m_ℓ c̄(ℓ)(1 + δ), for ℓ = 1, ..., L, in the modified MILP.
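The per-configuration rounding can be sketched as follows (function name is ours); the assertion mirrors the size(C̃^{(ℓ)}_i) ≤ size(C^{(ℓ)}_i) + δc̄(ℓ) bound from the text:

```python
import math

def round_configuration(sizes, mult, cap, delta):
    """Round each large job size in a configuration up to the next
    multiple of delta^2 * cap, i.e. replace it by an integer count q of
    such units.  A configuration holds at most 1/delta large jobs (each
    of size >= delta * cap), and each rounding adds < delta^2 * cap, so
    the total size grows by at most delta * cap."""
    unit = delta * delta * cap
    q = [math.ceil(s / unit) for s in sizes]           # rounded sizes in units
    new_size = sum(qi * a for qi, a in zip(q, mult)) * unit
    old_size = sum(s * a for s, a in zip(sizes, mult))
    assert new_size <= old_size + delta * cap          # bound from the text
    return q, new_size

# configuration with one job of size 2.3 and one of size 4.1 in a bin
# of capacity 10, delta = 0.25 (so the rounding unit is 0.625)
q, new_size = round_configuration([2.3, 4.1], mult=[1, 1], cap=10, delta=0.25)
```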
Next, we divide the coefficients in the L area constraints above by δ²c̄(ℓ). Then the coefficients of the variables x^{(ℓ)}_i become size(C̃^{(ℓ)}_i)/(δ²c̄(ℓ)) = a_{i,ℓ} δ²c̄(ℓ)/(δ²c̄(ℓ)) = a_{i,ℓ} ∈ {1/δ, ..., 1/δ² + 1/δ}. Using the assumption that 1/δ is integral, all coefficients of the variables are integral and bounded by 2/δ². Notice that increasing the capacities of all bins and dividing all coefficients as above also yields a feasible solution of the modified MILP. Let us study a feasible solution of the modified MILP. To reduce the number of integral configuration variables in the MILP, we consider the following ILP, which uses only the integral variables x^{(ℓ)}_i:

Σ_i x^{(ℓ)}_i = m̄_ℓ for ℓ = 1, ..., L,
Σ_ℓ Σ_i a(k_j, C̃^{(ℓ)}_i) x^{(ℓ)}_i = n̄_j for j ∈ P(B_2),
Σ_i a_{i,ℓ} x^{(ℓ)}_i ≤ Area(ℓ, large)/(δ²c̄(ℓ)) for ℓ = 1, ..., L,
x^{(ℓ)}_i integral ≥ 0 for i = 1, ..., h̃_ℓ, ℓ = 1, ..., L,
where the values m̄_ℓ, n̄_j, and Area(ℓ, large) are given by a feasible solution of the modified MILP. Here P(B_2) is the set of all indices of large job sizes corresponding to blocks B_ℓ ∈ B_2, i.e. P(B_2) = {j : δ(1 + δ)^{k_j} ∈ (δc̄(L), c̄(1)]}. The cardinality of P(B_2) and the value L can both be bounded by O((1/δ) log(1/δ)). All coefficients of the variables above are bounded by O(1/δ²).
The support of a configuration C̃^{(ℓ)}_i is the set of job sizes appearing in it; as before, a configuration is simple if its support is of logarithmic size (here O(log(1/δ))), and complex otherwise. Using the result by Eisenbrand and Shmonin [4], we can find a feasible solution of the ILP above (if there is a feasible solution of the modified MILP) with at most O((1/δ) log²(1/δ)) many nonzero variables x^{(ℓ)}_i. We can generalize our result in Theorem 1 to the ILP above.
1. For each job size, guess the number of jobs v_j ≤ n̄_j covered by complex configurations.
2. For each bin size, guess the number of machines w_ℓ ≤ m̄_ℓ used to schedule the set of jobs covered by complex configurations.
Proof. As in the case of identical machines, our algorithm guesses in step 1 the complex configurations and the corresponding jobs. Since the number M of complex configurations within B_2 is at most O((1/δ²) log³(1/δ)) and there are at most O(1/δ) large jobs per configuration, the total number N of jobs within the complex configurations is at most O((1/δ³) log³(1/δ)).
To obtain a schedule for the guessed jobs, notice that the number of large job sizes is |P(B_2)| = O((1/δ) log(1/δ)). We now guess a vector v = (v_j) of the numbers of jobs of each size covered by the complex configurations. The total number of these vectors is (N + 1)^{|P(B_2)|} ≤ ((1/δ³) log³(1/δ))^{O((1/δ) log(1/δ))} = 2^{O((1/δ) log²(1/δ))}. In addition, we guess a vector w = (w_ℓ) with the numbers w_ℓ of complex configurations in the block groups B_ℓ. The number of choices here is at most (M + 1)^L ≤ ((1/δ²) log³(1/δ))^{O((1/δ) log(1/δ))} = 2^{O((1/δ) log²(1/δ))}. For each guess v, w we run a dynamic program to test whether the numbers of jobs of each size stored in v fit on the corresponding machines in the blocks B_ℓ given by the vector w. To do this, we run over the machines and store, after ℓ machines for ℓ = 1, ..., M, the set of all feasible vectors of job sizes that can be packed into the first ℓ machines. This dynamic program runs in time M·2^{O((1/δ) log²(1/δ))} = 2^{O((1/δ) log²(1/δ))}. For each feasible choice of v, w we set up the reduced MILP with m̄_ℓ = m_ℓ − w_ℓ and n̄_j = n_j − v_j and guess the support of a feasible solution x of the MILP, i.e. the simple configurations in B_2 with value x^{(ℓ)}_i > 0. For each choice we solve a reduced MILP with d = O((1/δ) log²(1/δ)) integral variables (step 4). The total size s of the MILP can be bounded by s ≤ poly(n, 1/δ) + n log(n)·2^{O((1/δ) log(1/δ))}. Using the algorithm by Kannan, with running time d^{O(d)} poly(s) for an MILP with d integral variables and size s, we can solve one MILP in time 2^{O((1/δ) log³(1/δ))} poly(n), using poly(s) ≤ poly(n)·2^{O((1/δ) log(1/δ))}. Running over all vectors v, w and all guesses for the simple configurations, we obtain a running time of 2^{O((1/δ) log⁴(1/δ))} + poly(n).
In order to bound the length of the computed schedule and to specify δ, we use the following result.

Lemma 26 (Jansen [11]). If there is a feasible solution of an MILP instance with bin capacities c̄(ℓ) for blocks B_ℓ ∈ B_2 ∪ B_3 and capacities c̃_i for the K largest bins in B_1, then the entire job set J can be packed into bins with capacities c̄(ℓ)(1 + 2δ)² for blocks B_ℓ ∈ B_2 ∪ B_3 and enlarged capacities c̃_i(1 + 3δ)² for the first K bins.