
When to Give up on a Parallel Implementation

Nathan S. Sheffield (MIT, Cambridge, MA, USA) and Alek Westover (MIT, Cambridge, MA, USA)
Abstract

In the Serial Parallel Decision Problem (SPDP), introduced by Kuszmaul and Westover [SPAA’24], an algorithm receives a series of tasks online, and must choose for each between a serial implementation and a parallelizable (but less efficient) implementation. Kuszmaul and Westover describe three decision models: (1) Instantly-committing schedulers must decide on arrival, irrevocably, which implementation of the task to run. (2) Eventually-committing schedulers can delay their decision beyond a task’s arrival time, but cannot revoke their decision once made. (3) Never-committing schedulers are always free to abandon their progress on the task and start over using a different implementation. Kuszmaul and Westover gave a simple instantly-committing scheduler whose total completion time is 3-competitive with the offline optimal schedule, and proved two lower bounds: no eventually-committing scheduler can have competitive ratio better than φ ≈ 1.618 in general, and no instantly-committing scheduler can have competitive ratio better than 2 in general. They conjectured that the three decision models should admit different competitive ratios, but left upper bounds below 3 in any model as an open problem.

In this paper, we show that the powers of instantly, eventually, and never committing schedulers are distinct, at least in the “massively parallel regime”. The massively parallel regime of the SPDP is the special case where the number of available processors is asymptotically larger than the number of tasks to process, meaning that the work associated with running a task in serial is negligible compared to its runtime. In this regime, we show: (1) The optimal competitive ratio for instantly-committing schedulers is 2, (2) The optimal competitive ratio for eventually-committing schedulers lies in [1.618, 1.678], (3) The optimal competitive ratio for never-committing schedulers lies in [1.366, 1.500]. We additionally show that our instantly-committing scheduler is also 2-competitive outside of the massively parallel regime, giving a proof-of-concept that results in the massively parallel regime can be translated to hold with fewer processors.

Keywords and phrases:
Scheduling, Multi-Processor, Online-Algorithms
Copyright and License:
© Nathan S. Sheffield and Alek Westover; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Online algorithms
Related Version:
arXiv: https://arxiv.org/abs/2408.16092
Editors:
Raghu Meka

1 Introduction

1.1 Background

Many computational tasks can be performed quickly in parallel over a large number of processors – but such parallel implementations may be less work-efficient than a serial implementation on a single processor, requiring substantially more total computation time across all machines. When several different tasks must be completed in as little total time as possible, this trade-off between work and time can necessitate running different tasks in different modes: small tasks can be done in serial to save work, while large tasks must be parallelized to prevent their serial runtimes from dominating the overall computation.

To formalize this problem, Kuszmaul and Westover introduced the Serial Parallel Decision Problem (SPDP) [16]. In their model, each task has exactly two possible implementations: an “embarrassingly parallel” implementation which can be worked on by multiple machines at once (where the rate of progress on the implementation is proportional to the number of processors assigned to it), and a serial implementation which can only be worked on by a single processor at a time. If all tasks are available at time 0, it is easy to efficiently determine the optimal strategy: all jobs with serial completion time smaller than some threshold can be run in serial, and the larger tasks must be run in parallel. The model becomes interesting when previously-unknown tasks are allowed to arrive at arbitrary times, and one wishes to minimize the competitive ratio: the worst-case ratio of an online algorithm’s total completion time to the offline optimal completion time.

Kuszmaul and Westover define three distinct versions of this model, parameterized by the degree to which the online scheduler is able to reverse its decisions.

  1. An instantly-committing scheduler must choose an implementation for each task as soon as the task arrives, and is not allowed to revisit this choice.

  2. An eventually-committing scheduler may delay choosing an implementation, but must choose one irrevocably before assigning its work to a processor.

  3. A never-committing scheduler can, at any time, discard all as-yet completed work on an implementation and re-start the task with the other implementation.

The distinction between the eventually- and never-committing models is motivated by potential practical concerns: if a task involves mutating an input in memory, it may not be feasible to cancel an implementation once it begins running. Kuszmaul and Westover present an instantly-committing scheduler achieving competitive ratio 3, and show competitive ratio lower bounds of 2 and φ ≈ 1.618 in the instantly-committing and eventually-committing models, respectively (for deterministic schedulers). They conjecture that the ability to delay or cancel choices should allow for more competitive online algorithms, but leave open the problem of finding competitive ratio upper bounds better than 3.

1.2 This Work

In this work, we consider Kuszmaul and Westover’s SPDP when the number of available processors is much larger than the number of tasks, noting that all of their upper and lower bounds hold in this parameter regime. This is a particularly simple setting, since the work associated with a serial implementation is now negligible compared to its completion time – running a task in serial means accepting a lower bound on completion time, but requires essentially no work. We can think of this setting as an unrelated-machines scheduling problem with an unlimited number of identical “slow” machines, and a single unrelated “fast” machine, representing a massively parallel implementation of the task across many processors – note that this could also describe scenarios with a literal fast machine, such as a single piece of accelerated hardware. Although we consider the number of processors to be large, we won’t assume that tasks complete instantly when run on all the processors; instead, we’ll let the total parallel work of each task be proportional to p.

Our main results are tight bounds on the competitive ratio of instantly-committing schedulers in this regime, and separations between the strengths of all three models. Our results are summarized in Table 1. Note that we focus on deterministic schedulers (although we briefly discuss randomized schedulers in the appendix).

Table 1: Main results (* = this work).

  Model                              Lower bound    Upper bound
  Instantly-committing schedulers    2 [16]         2 *
  Eventually-committing schedulers   1.618 [16]     1.678 *
  Never-committing schedulers        1.366 *        1.5 *

In each case, the upper bound comes from a simple heuristic in which the algorithm compares its projected completion time to its current estimate of the optimal completion time. More precisely, at each time t, our scheduler computes an offline optimal strategy “𝗈𝗉𝗍(t)” on the truncation of the task sequence to tasks that arrive before time t, and makes decisions based on the completion time of 𝗈𝗉𝗍(t).

Our main technical contribution is the analysis of the schedulers. Working with opt(t) is challenging, because the schedules opt(t) and opt(t′) can be quite different for t ≠ t′. For instantly-committing schedulers we use an invariant-based approach to bound, at all times, the work taken by our scheduler in terms of the minimum work and completion time among all schedulers. For eventually- and never-committing schedulers this approach is no longer feasible: there is no well-defined notion of the “work taken” by our scheduler, because the scheduler may not have committed to a decision yet. Instead, these analyses rely on choosing a couple of critical times to observe the state of our scheduler and opt(t), and then establishing a dichotomy: either (1) “real fast tasks” (tasks that opt runs on the fast machine) arrive quickly, in which case our scheduler prioritizes real fast tasks on the fast machine and will run most other tasks on slow machines, or (2) real fast tasks arrive slowly, in which case our scheduler never falls too far behind opt, despite making suboptimal use of the fast machine.

In addition to these results, we show that with some effort our instantly-committing scheduler can be adapted to work for any number of processors, fully resolving the question of the optimal competitive ratio of instantly-committing schedulers in the general SPDP, and giving a proof-of-concept that results for a large number of processors can be adapted to hold when the work associated with serial tasks is also a concern.

1.3 Related Work

There is a long line of work studying the phenomenon of work-inefficient parallel implementations in multi-processor scheduling. Typically, the models of limited parallelism considered involve one of three types of jobs:

  1. Rigid jobs, which come with a number p_i specifying a fixed number of processors the job must be run on at each timestep of its execution.

  2. Moldable jobs, where the scheduler may choose the (fixed) number of processors the job is run on, and the amount of work scales depending on this choice according to some speedup curve.

  3. Malleable jobs, which like moldable jobs have an associated speedup curve, but where the job may be assigned to different numbers of processors at different timesteps (as opposed to the scheduler choosing a fixed value at the start of the task’s runtime).

In each of these cases, there is interest in minimizing the total completion time (makespan) in both the offline setting – where problems tend to be 𝖭𝖯-hard, but may have approximation algorithms [25, 20, 18, 24, 23] – and the online setting, where the goal is to minimize competitive ratio [8, 7, 4, 14, 9, 29, 28]. Kuszmaul and Westover’s Serial Parallel Decision Problem is related to this line of work, but doesn’t quite fit into the usual framework – in their model, instead of dealing with an arbitrary speedup curve, there is a single binary decision between a completely serial and perfectly parallelizable implementation.

As noted, the massively-parallel regime of the SPDP considered in this paper can be naturally viewed as a scheduling problem with an unlimited number of identical “slow” machines, and a single unrelated “fast” machine. Standard scheduling problems in the unrelated machines model have also been well-studied, in terms of both offline approximation algorithms and hardness results [13, 17, 22, 26, 19, 21, 10, 15, 6], and online algorithms [3, 2, 22, 5, 1, 11, 12, 30]. We note, however, that since we treat “slow” machines as an unbounded resource, and there is only a single fast machine available, most of the typical difficulties of multi-processor scheduling problems do not arise. In particular, unlike a typical load-balancing problem where 𝖭𝖯-hardness follows from a standard set-partition reduction, the One-Fast-Many-Slow Decision Problem (without dependencies) is easily solvable offline, simply by putting all tasks which finish below a certain threshold on distinct slow machines.

1.4 Open Questions

We leave three main open questions as directions for future work.

Question 1.

What are the optimal competitive ratios for eventually/never-committing schedulers?

In Appendix A we identify barriers, showing that improving on our eventually/never-committing schedulers will require substantially different algorithms – but we suspect that such improvements may be possible.

Question 2.

Are randomized schedulers more powerful than deterministic schedulers?

In the main body of the paper we consider only deterministic schedulers; however, for many online problems randomized algorithms can do substantially better than deterministic ones. In Appendix C we give some lower bounds against randomized schedulers, but these bounds are weaker than those known for deterministic schedulers.

Question 3.

Is there a general transformation between schedulers for the massively parallel regime (i.e. the One-Fast-Many-Slow Decision Problem) and the general SPDP?

The fundamental difficulty of the SPDP is deciding between implementations which take a lot of work, and implementations which take a lot of time. This tradeoff is absolute in the massively parallel regime, since the large number of processors means the amount of work associated with a serial implementation is negligible, whereas in the general SPDP it is possible for all processors to be saturated with serial work. Intuitively, one might expect that having work associated with the serial implementations only makes the problem easier, since it makes the tradeoff less dramatic – indeed, Kuszmaul and Westover’s competitive ratio lower bounds become weaker when the number of processors is small. So, one might hope that algorithms in the massively parallel regime can be generically translated to limited-processor settings. Formalizing this connection is an interesting direction for future research.

2 Preliminaries

2.1 The One-Fast-Many-Slow Decision Problem

In this section we formally define the One-Fast-Many-Slow Decision Problem, where the goal is to distribute work between a single fast machine and an unlimited number of slow machines. An instance of the problem is a Task Arrival Process (TAP) 𝒯 = (τ_1, …, τ_n), where each task τ_i is a tuple (σ_i, π_i, t_i) indicating its runtime on a slow machine, its runtime on the fast machine, and its arrival time, respectively, such that t_1 ≤ ⋯ ≤ t_n and σ_i ≥ π_i. A valid schedule associates at most one task to each machine at each point in time (in order for notions like “amount of work performed on τ_i” to be well-defined, we additionally mandate that a schedule be measurable; alternatively, one can assume that time is discretized into appropriately fine timesteps), such that no work is done on any task before its arrival time, each task runs on at most one machine, and each task τ_i is either run for a total of σ_i time on some slow machine, or a total of π_i time on the fast machine. The completion time (also known as makespan) of the schedule is the time when the last task is finished.

We will be interested in online algorithms for this problem. An online scheduler learns about each task only at its arrival time, and at each time t must already have fixed the prefix of the schedule on times less than t. We define three distinct models for how these online decisions are made:

  1. For each task τ, an instantly-committing scheduler must fix at τ’s arrival time the machine that τ will run on.

  2. An eventually-committing scheduler need not fix a machine for any task until that task begins running.

  3. A never-committing scheduler is an eventually-committing scheduler with the additional power to, at any time, “cancel” a task from the schedule, erasing all work previously done on the task and allowing it to be re-assigned to a new machine.

In each case, we are interested in minimizing the competitive ratio of an online scheduler, which is the supremum over all TAPs of the ratio of the online scheduler’s completion time to the completion time of an optimal scheduler on that TAP.

2.2 Connection to the SPDP

In Kuszmaul and Westover’s Serial Parallel Decision Problem, a scheduler must allocate work to p equally-powerful processors, where each task is specified by the work of the serial implementation, the work of the parallel implementation, and the arrival time. The scheduler must choose whether to run each task in serial or parallel, and then must assign the resulting work to the p processors, where parallel work can run on multiple processors at once but serial work cannot.

We can define the massively parallel regime of this problem to be the limit as the number of processors becomes large compared to the number of tasks. Letting n be the number of tasks, if n ≤ εp then restricting the serial implementations to run on only the first n processors, and the parallel implementations to run on only the last p − n processors, increases the completion time by at most a 1/(1 − ε) factor. This corresponds directly to the One-Fast-Many-Slow Decision Problem: we think of each of these serial processors as a “slow machine”, noting that since we have as many serial processors as we have tasks there are effectively an unlimited number of slow machines. We think of the parallel processors collectively as a “fast machine”, noting that we can assume without loss of generality that, at any point in time, all parallel processors are running the same parallel implementation.
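To spell out where the factor comes from (a short calculation; only the claim above is from the original text): reserving the first n processors for serial work leaves the parallel implementations with p − n ≥ (1 − ε)p processors, so any stretch of parallel execution is slowed by at most

\[
  \frac{p}{p-n} \;\le\; \frac{p}{p - \varepsilon p} \;=\; \frac{1}{1-\varepsilon},
  \qquad \text{since } n \le \varepsilon p,
\]

while serial jobs are unaffected (each still gets a dedicated processor); hence the makespan grows by at most a 1/(1 − ε) factor.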

2.3 Notation

We now introduce our notation for describing and analyzing schedulers. For an algorithm alg and a TAP 𝒯, we let C_alg^𝒯 be the completion time of alg on 𝒯. Let 𝒯^t be the truncation of 𝒯 consisting of the tasks τ_i with t_i ≤ t. When 𝒯 is clear from context we will write C_alg^t to denote C_alg^{𝒯^t}, and we will write C_alg to denote C_alg^𝒯. We will also use C̃_alg to denote the completion time of the fast machine – that is, the final time when the fast machine has work.

It will be useful to be able to talk about the optimal completion time of a prefix of the TAP. Define the schedule opt(t) to be a schedule for 𝒯^t with minimal completion time. Note that opt(t) is only defined as an offline strategy, but that an online algorithm can compute it (efficiently!) at time t, thus obtaining a lower bound on C_opt, which will be useful to inform the algorithm’s future decisions. For ease of notation, we’ll often abbreviate C_opt(t)^t as C_t.

There may be many sets of decisions which result in the optimal completion time; as opposed to letting 𝗈𝗉𝗍(t) be an arbitrary such scheduler, it will be useful to fix a canonical one, which we will do by letting 𝗈𝗉𝗍(t) run as many tasks in serial as possible.

Scheduler 4.

The scheduler opt(t), defined on 𝒯^t, makes decisions as follows:

  • If τ_i has σ_i + t_i ≤ C_t, run τ_i on a slow machine when it arrives.

  • Otherwise, run τ_i on the fast machine. Prioritize tasks with larger σ_i + t_i, and break ties by taking tasks with smaller i.
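As a concreteness check (this sketch is ours, not from the paper): computing C_t is a small exercise, since a candidate completion time C is feasible exactly when every task with σ_i + t_i > C fits on the fast machine by time C, and feasibility is monotone in C. A minimal Python sketch, with all names (fast_makespan, opt_completion) hypothetical:

```python
def fast_makespan(jobs):
    """Earliest finish of one machine running jobs = [(arrival, work)].

    For a single machine, processing jobs in arrival order with no
    unnecessary idling minimizes the makespan.
    """
    cur = 0.0
    for arrival, work in sorted(jobs):
        cur = max(cur, arrival) + work
    return cur


def opt_completion(tasks):
    """Minimal makespan for tasks = [(sigma, pi, t)] of a truncated TAP.

    The forced-fast set {i : sigma_i + t_i > C} only changes when C
    crosses one of the values sigma_i + t_i, so we scan those candidates.
    """
    thresholds = sorted({0.0} | {s + t for (s, p, t) in tasks})
    best = float("inf")
    for j, th in enumerate(thresholds):
        nxt = thresholds[j + 1] if j + 1 < len(thresholds) else float("inf")
        forced = [(t, p) for (s, p, t) in tasks if s + t > th]
        c = max(th, fast_makespan(forced))
        if c < nxt:  # c lies in [th, nxt), where this forced set is valid
            best = min(best, c)
    return best
```

For example, opt_completion([(1.0, 0.6, 0.0), (2.0, 0.9, 0.0)]) returns 1.0: the first task runs in serial while the second runs on the fast machine.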

Finally, we let [n] = {1, …, n}, and for a set J of tasks we will write π_J to denote Σ_{j∈J} π_j.

3 A 2-Competitive Instantly-Committing Scheduler

In this section we present and analyze a 2-competitive instantly-committing scheduler. Kuszmaul and Westover showed that a competitive ratio of (2 − ε) is impossible for instantly-committing schedulers, so our scheduler is optimal. The scheduler, which we call ins (“instantly committing”), is defined in Scheduler 5.

Scheduler 5.

When task τ_i arrives:

  • If σ_i + t_i > 2·C_{t_i}, run τ_i on the fast machine, with the fast machine processing tasks in order of arrival.

  • Otherwise, run τ_i on a slow machine.
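Since ins makes one irrevocable decision per arrival, its decision rule fits in a few lines. A sketch (our own naming; opt_completion is the helper sketched in Section 2.3, passed in explicitly, and we assume distinct arrival times so that the truncation 𝒯^{t_i} is exactly the first i + 1 tasks):

```python
def ins_decisions(tasks, opt_completion):
    """Return 'fast'/'slow' for each task of a TAP, decided on arrival.

    tasks: [(sigma, pi, t)] sorted by (assumed distinct) arrival times.
    opt_completion: maps a list of tasks to its optimal makespan, as in
    the Section 2.3 sketch; its value on a prefix is exactly C_{t_i}.
    """
    decisions = []
    for i, (sigma, pi, t) in enumerate(tasks):
        c_t = opt_completion(tasks[: i + 1])  # computable online at t_i
        decisions.append("fast" if sigma + t > 2 * c_t else "slow")
    return decisions
```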

We analyze ins by showing inductively that C̃_ins (the completion time of the fast machine) is small compared to the work and completion time of any other schedule. For a length-n TAP 𝒯, a scheduler alg, and i ∈ [n], we define the quantity K_alg^{t_i} to be the sum of π_j over all tasks τ_j ∈ 𝒯^{t_i} that alg runs on the fast machine. The key to analyzing ins is the following lemma.

Lemma 6.

Fix a length-n TAP. For all i ∈ [n], and for all instantly-committing schedulers alg,

C̃_ins^{t_i} ≤ C_alg^{t_i} + K_alg^{t_i}. (1)

Proof.

We prove the lemma by induction on i. For i = 1 the claim is trivial. Now, fix i ∈ [n − 1] and alg, assume the lemma for i and for all schedulers, and we will prove the lemma for i + 1 and alg.

If ins runs τ_{i+1} on a slow machine then C̃_ins^{t_{i+1}} = C̃_ins^{t_i}, and C_alg^{t_i} + K_alg^{t_i} ≤ C_alg^{t_{i+1}} + K_alg^{t_{i+1}}. Thus, the invariant Equation 1 is maintained. We always have C̃_ins^{t_{i+1}} ≤ C̃_ins^{t_i} + π_{i+1}, so if alg runs τ_{i+1} on the fast machine then the invariant Equation 1 is also maintained, since then C_alg^{t_{i+1}} + K_alg^{t_{i+1}} ≥ C_alg^{t_i} + K_alg^{t_i} + π_{i+1} ≥ C̃_ins^{t_i} + π_{i+1} by the inductive hypothesis.

The final case to consider is when alg runs τ_{i+1} on a slow machine, while ins runs τ_{i+1} on the fast machine. From the definition of ins (Scheduler 5), the fact that ins ran τ_{i+1} on the fast machine implies

2·C_{t_{i+1}} < σ_{i+1} + t_{i+1}. (2)

On the other hand, alg ran τ_{i+1} on a slow machine, so alg cannot finish before τ_{i+1}’s serial completion time. Thus,

σ_{i+1} + t_{i+1} ≤ C_alg^{t_{i+1}}. (3)

Now, we use the invariant for (i, opt(t_{i+1})) to bound C̃_ins^{t_{i+1}}. We have:

C̃_ins^{t_{i+1}} ≤ C̃_ins^{t_i} + π_{i+1} ≤ K_opt(t_{i+1})^{t_i} + C_opt(t_{i+1})^{t_i} + π_{i+1}. (4)

Because of Equation 2 we know that opt(t_{i+1}) must run τ_{i+1} on the fast machine. So, we have

K_opt(t_{i+1})^{t_i} + C_opt(t_{i+1})^{t_i} + π_{i+1} = K_opt(t_{i+1})^{t_{i+1}} + C_opt(t_{i+1})^{t_i} ≤ 2·C_{t_{i+1}}. (5)

Stringing together the above inequalities Equation 4, Equation 5, Equation 2, and Equation 3, we get

C̃_ins^{t_{i+1}} < C_alg^{t_{i+1}}.

Thus, the invariant Equation 1 holds.

Using Lemma 6 it is easy to show that 𝗂𝗇𝗌 is 2-competitive.

Theorem 7.

𝗂𝗇𝗌 is a 2-competitive instantly-committing scheduler.

Proof.

Applying Lemma 6 at i = n with alg = opt gives C̃_ins^{t_n} ≤ C_opt + K_opt ≤ 2·C_opt. Thus, ins finishes using the fast machine before time 2·C_opt. Any task that ins runs on a slow machine must have σ_i + t_i ≤ 2·C_{t_i} ≤ 2·C_opt, so these tasks finish before 2·C_opt as well.

4 A 1.678-Competitive Eventually-Committing Scheduler

In this section we present and analyze a ξ-competitive eventually-committing scheduler, where ξ ≈ 1.678 is the real root of the polynomial 2x³ − 3x² − 1. Kuszmaul and Westover gave a lower bound of φ ≈ 1.618 on the competitive ratio of any eventually-committing scheduler and conjectured that this lower bound is tight. Our scheduler represents substantial progress towards resolving Kuszmaul and Westover’s conjecture, improving on their previous best algorithm, which had a competitive ratio of 3. Our scheduler, which we call eve (“eventually committing”), is defined in Scheduler 8.
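Before defining the scheduler, ξ can be pinned down numerically (a quick check of ours, not from the paper; bisection works since 2x³ − 3x² − 1 is increasing for x > 1):

```python
def f(x):
    return 2 * x**3 - 3 * x**2 - 1

lo, hi = 1.0, 2.0          # f(1) = -2 < 0 < 3 = f(2)
for _ in range(60):        # bisect to machine precision
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
print(lo)                  # ~1.67765, the xi above
```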

Scheduler 8.

At time t:

  • If a task τ_i, which has arrived but not yet been started, has σ_i + t ≤ ξ·C_t, then start τ_i on a slow machine.

  • Maintain up to one active task at a time. The fast machine is always allocated to the active task.

  • When there is no active task, but there are unstarted tasks present, choose as the new active task the unstarted task with the largest σ_i + t_i value (breaking ties arbitrarily).
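To make the three rules above concrete, here is a discretized sketch of one step of eve’s control loop (our own code and naming; the footnote in Section 2.1 already licenses discretizing time, and C_t can be supplied by the opt_completion sketch from Section 2.3):

```python
XI = 1.6777  # numeric root of 2x^3 - 3x^2 - 1 (see the check above)

def eve_step(t, c_t, unstarted, active):
    """One timestep of eve. unstarted: dict i -> (sigma_i, pi_i, t_i) of
    arrived-but-unstarted tasks; active: task on the fast machine, or
    None if it is idle. Returns (slow_starts, active)."""
    # Rule 1: start tau_i on a slow machine once sigma_i + t <= xi * C_t.
    slow_starts = [i for i, (s, p, ti) in unstarted.items()
                   if s + t <= XI * c_t]
    for i in slow_starts:
        del unstarted[i]
    # Rules 2-3: at most one active task; when the fast machine idles,
    # activate the unstarted task with the largest sigma_i + t_i.
    if active is None and unstarted:
        active = max(unstarted,
                     key=lambda i: unstarted[i][0] + unstarted[i][2])
        del unstarted[active]
    return slow_starts, active
```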

Theorem 9.

𝖾𝗏𝖾 is a ξ-competitive eventually-committing scheduler.

Proof.

Fix a TAP 𝒯. Let C̃_eve denote the time when eve completes the last task run on the fast machine. If τ_i is run on a slow machine at any time t, then τ_i finishes before ξ·C_t ≤ ξ·C_opt. Thus, it suffices to show that C̃_eve ≤ ξ·C_opt.

For any x ∈ [0, C_opt], let R(x) be the first time that an online algorithm becomes aware that the optimal schedule requires at least x completion time – that is, R(x) = inf{t : C_t ≥ x}. Let 𝒜 (“actual”) be the set of tasks that opt runs on the fast machine, and ℱ (“fake”) be the set of tasks that opt runs on a slow machine but eve runs on the fast machine. We can bound the sizes and arrival times of tasks in ℱ as follows.

Claim 10.

All tasks τ_i ∈ ℱ arrive before time R(C_opt/ξ), and have π_i < σ_i/ξ.

Proof.

All tasks τ_i ∈ ℱ are run on the fast machine by eve, and on slow machines by opt. In particular this means

ξ·C_{t_i} < σ_i + t_i ≤ C_opt ≤ ξ·C_{R(C_opt/ξ)}.

Thus, t_i < R(C_opt/ξ). To show π_i < σ_i/ξ, note that π_i + t_i ≤ C_{t_i} < (σ_i + t_i)/ξ ≤ σ_i/ξ + t_i.

To analyze when tasks in ℱ get run it will be useful to partition ℱ into ℱ_big = {τ_i ∈ ℱ : σ_i + t_i > C_opt/ξ} and ℱ_small = {τ_i ∈ ℱ : σ_i + t_i ≤ C_opt/ξ}. Now we show that, without loss of generality, eve does not start any tasks in ℱ_small too late.

Claim 11.

If eve starts a task τ ∈ ℱ_small on the fast machine at any time t ≥ R(C_opt/ξ), then C̃_eve ≤ ξ·C_opt.

Proof.

Note that no task τ_i ∈ ℱ_small can be started after time C_opt: since σ_i + C_opt ≤ C_opt/ξ + C_opt < ξ·C_opt (using ξ > φ), any task τ_i ∈ ℱ_small present but not already running at time C_opt would be run on a slow machine. Let t be the last time after R(C_opt/ξ) when eve starts a task τ ∈ ℱ_small. If t does not exist the claim is vacuously true. In light of our previous observation, t < C_opt. Let τ_i be the task that eve starts at time t. Because eve prioritizes making tasks with larger σ_j + t_j values active, at time t there are no unstarted tasks τ ∈ ℱ_big ∪ 𝒜 present. After time t, no more tasks from ℱ can arrive by Claim 10, and at most C_opt − t work in 𝒜 can arrive because opt must be able to complete this work. Thus,

C̃_eve ≤ t + (C_opt − t) + π_i = C_opt + π_i. (6)

Now, because τ_i ∈ ℱ_small we have π_i ≤ C_opt/ξ² (by Claim 10); using this in Equation 6 we find C̃_eve ≤ (1 + 1/ξ²)·C_opt ≤ ξ·C_opt.

This means that we can assume that, after time R(C_opt/ξ), the only tasks that eve runs on the fast machine are in 𝒜 or ℱ_big, plus whatever the active task was at time R(C_opt/ξ). We call the active task at time R(C_opt/ξ), if one exists, the stuck task, denoted τ_s. We split into cases depending on how large this stuck task is.

Case 1.

There is no stuck task.

In this case, we in fact have C̃_eve ≤ C_opt. Since there is no active task at time R(C_opt/ξ), there are no tasks present but not started on slow machines. By Claim 10, eve will run all tasks τ ∉ 𝒜 arriving after time R(C_opt/ξ) on slow machines. Thus, at all times t ∈ [R(C_opt/ξ), C_opt], eve either has no active task on the fast machine, or has some τ ∈ 𝒜 as the active task on the fast machine, so eve completes 𝒜 at least as quickly as opt.

Case 2.

There is a stuck task, with σ_s + t_s > C_opt/ξ.

Define 𝒜_early = {τ_i ∈ 𝒜 : t_i < R(C_opt/ξ)} and 𝒜_late = 𝒜 ∖ 𝒜_early. Let t < R(C_opt/ξ) be a time by which all tasks in {τ_s} ∪ ℱ_big ∪ 𝒜_early have already arrived; such a time must exist by Claim 10. Observe that opt(t) runs all tasks in {τ_s} ∪ ℱ_big ∪ 𝒜_early on the fast machine, due to C_t < C_opt/ξ. This further implies that π_{ℱ_big ∪ {τ_s} ∪ 𝒜_early} ≤ C_opt/ξ. Also π_{𝒜_late} ≤ C_opt − R(C_opt/ξ), simply because opt must complete the work on tasks 𝒜_late after these tasks arrive. By Claim 11 we may assume without loss of generality that, after time R(C_opt/ξ), eve is always running a task from {τ_s} ∪ 𝒜 ∪ ℱ_big. Thus,

C̃_eve ≤ R(C_opt/ξ) + π_{𝒜 ∪ ℱ_big ∪ {τ_s}}
      = R(C_opt/ξ) + π_{𝒜_early ∪ ℱ_big ∪ {τ_s}} + π_{𝒜_late}
      ≤ R(C_opt/ξ) + C_opt/ξ + (C_opt − R(C_opt/ξ))
      ≤ ξ·C_opt.

Case 3.

There is a stuck task, with σ_s + t_s ≤ C_opt/ξ.

This case is the most difficult of the three to handle. It will be useful to focus now on the tasks of ℱ_big that arrive after the stuck task is started. Let t* be the time when eve starts running τ_s, and let ℱ′_big = {τ_i ∈ ℱ_big : t_i ≥ t*} be the fake tasks arriving after t*. We first observe that, if no such tasks arrive, eve performs very well.

Claim 12.

If ℱ′_big = ∅ then C̃_eve ≤ ξ·C_opt.

Proof.

At time t* no unstarted tasks τ_i ∈ 𝒜 ∪ ℱ_big can be present, since all such tasks have σ_i + t_i > C_opt/ξ ≥ σ_s + t_s, so eve would prioritize running them on the fast machine instead of the stuck task. By Claim 11, we know that after time t* eve will always be running tasks from {τ_s} ∪ 𝒜 ∪ ℱ′_big. The total work on tasks from 𝒜 that arrives after time t* is at most C_opt − t*, so if ℱ′_big = ∅ we have

C̃_eve ≤ t* + π_s + (C_opt − t*) ≤ σ_s/ξ + C_opt ≤ ξ·C_opt.

By Claim 12 we may assume ℱ′_big ≠ ∅. So, let σ_min = min_{τ_i ∈ ℱ′_big} σ_i; we will be able to control how much work arrives in the TAP by the fact that eve never decides to run the task τ_i ∈ ℱ′_big with σ_i = σ_min on a slow machine. Split 𝒜 into 𝒜_early = {τ_i ∈ 𝒜 : t* ≤ t_i < R(σ_min)} and 𝒜_late = {τ_i ∈ 𝒜 : t_i ≥ max(t*, R(σ_min))} (note that we use a different threshold to define earliness here than we did in Case 2). First we need the following analogue of Claim 11.

Claim 13.

If eve starts a task τ ∈ ℱ′_big at any time t ∈ [R(C_opt/ξ), C_opt], then C̃_eve ≤ ξ·C_opt.

Proof.

Let t be the last time in [R(C_opt/ξ), C_opt] when eve starts a task τ ∈ ℱ′_big. If t does not exist the claim is vacuously true. Let τ_i be the task that eve starts at time t. Because eve prioritizes making tasks with larger σ_j + t_j values active, at time t there are no unstarted tasks τ ∈ 𝒜 present. After time t at most C_opt − t work in 𝒜 can arrive because opt must be able to complete this work. Recalling that π_{ℱ_big} ≤ C_opt/ξ, we have

C̃_eve ≤ t + π_{ℱ_big} + (C_opt − t) ≤ ξ·C_opt.

Claim 14.

t* + π_{ℱ′_big} + π_{𝒜_early} < (σ_min + R(σ_min))/ξ.

Proof.

Fix a time t < R(σ_min) after all tasks in ℱ′_big ∪ 𝒜_early have arrived, and fix a task τ_i ∈ ℱ′_big with σ_i = σ_min. First, note that by Claim 13 we can assume that eve has not started τ_i by time t. Thus, eve is free to start τ_i on a slow machine, but chooses not to. This implies

ξ·C_t < σ_i + t < σ_min + R(σ_min). (7)

We also observe that opt(t) must run all of ℱ′_big ∪ 𝒜_early on the fast machine, since running any such task τ_j in serial would finish at σ_j + t_j ≥ σ_min > C_t. Moreover, all of these tasks arrive after time t*, so opt(t)’s fast machine must fit their work into the window [t*, C_t]. Thus,

t* + π_{ℱ′_big} + π_{𝒜_early} ≤ C_t. (8)

Combining Equation 8 and Equation 7 gives the desired statement.

The other observation we make is that R(σmin) cannot happen too early.

Claim 15.

R(σ_min) > (ξ − 1)·σ_min.

Proof.

Let τ_i ∈ ℱ′_big be a task with σ_i = σ_min. Note that t_i ≤ R(σ_min) by Claim 10. Then, by Claim 13 we may assume without loss of generality that at time R(σ_min) eve is not running τ_i, and does not choose to start τ_i on a slow machine. So,

R(σ_min) + σ_min > ξ·C_{R(σ_min)} ≥ ξ·σ_min.

Now we conclude the theorem.

Claim 16.

C̃_eve ≤ ξ·C_opt.

Proof.

Note that π_{𝒜_late} ≤ C_opt − R(σ_min). Also, note that since τ_s was not put on a slow machine immediately upon arrival, we must have π_s ≤ σ_s/ξ ≤ C_opt/ξ². Then, applying Claim 14 and Claim 15 we have

C̃_eve ≤ π_s + t* + π_{ℱ′_big} + π_{𝒜_early} + π_{𝒜_late}
      ≤ C_opt/ξ² + (σ_min + R(σ_min))/ξ + C_opt − R(σ_min)
      ≤ (1/ξ² + (1 + (ξ − 1))/ξ + 1 − (ξ − 1))·C_opt
      = (3 + 1/ξ² − ξ)·C_opt
      = ξ·C_opt,

where the third inequality uses σ_min ≤ C_opt and R(σ_min) > (ξ − 1)·σ_min, and the final equality is the defining equation 2ξ³ − 3ξ² − 1 = 0 of ξ.

 Remark 17.

The simple nature of the lower bound in Proposition 32, along with the fact that eve achieves a competitive ratio quite close to φ, might leave the impression that φ is clearly the correct competitive ratio, and that a slightly better analysis of (a natural variant of) eve would be φ-competitive. However, this is not the case: in Appendix A, we show that no non-procrastinating eventually-committing scheduler can achieve competitive ratio better than ξ, where a scheduler is called non-procrastinating if, whenever tasks are present, it always runs at least one task. Thus, if the competitive ratio of eve can be improved upon, doing so will require a substantially different scheduler: one which occasionally decides to do nothing at all despite work being present.

5 A 1.5-Competitive Never-Committing Scheduler

In this section we analyze never-committing schedulers. First we give a simple lower bound.

Proposition 18.

Fix ε > 0. There is no deterministic ((1 + √3)/2 − ε)-competitive never-committing scheduler.

Proof.

We may assume ε < .001. Let ψ = (√3 − 1)/2. Let 𝒯 = ((1, 2ψ, 0), (∞, 1 − ψ, ψ)); that is, τ_1 has σ_1 = 1, π_1 = 2ψ, t_1 = 0, and τ_2 has σ_2 = ∞, π_2 = 1 − ψ, t_2 = ψ. Let 𝒯′ = ((1, 2ψ, 0)) be the same TAP without the second task. We have C_opt^{𝒯′} = 2ψ, since opt just runs the single task on the fast machine, and C_opt^𝒯 = 1, since opt can run τ_1 on a slow machine and τ_2 on the fast machine.

Suppose that alg is a (ψ + 1 − ε)-competitive scheduler. On TAP 𝒯′, at time ψ − ε/2, we claim alg must be running τ_1 on the fast machine. If not, then alg’s completion time must be at least min(σ_1, ψ − ε/2 + π_1) = 1, with the branch of the min depending on whether τ_1 is ever moved to the fast machine – but this gives competitive ratio 1/(2ψ) = 1 + ψ.

Before time ψ, it is impossible to distinguish between 𝒯 and 𝒯′. Thus, alg must be running τ_1 on the fast machine at time ψ − ε/2 on TAP 𝒯 as well. Now, we have C_alg^𝒯 ≥ min(σ_1 + ψ − ε/2, π_1 + π_2) = 1 + ψ − ε/2, with the branch of the min depending on whether τ_1 is ever moved to a slow machine – but this gives competitive ratio (1 + ψ − ε/2)/1 > 1 + ψ − ε. Thus, alg is not actually (ψ + 1 − ε)-competitive.
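The constant ψ is exactly the value equalizing the two deviation penalties in this argument (a one-line calculation we spell out; it is only implicit in the proof above):

\[
  \frac{1}{2\psi} \;=\; 1+\psi
  \;\Longleftrightarrow\;
  2\psi^2 + 2\psi - 1 = 0
  \;\Longleftrightarrow\;
  \psi = \frac{\sqrt{3}-1}{2},
\]

so the bound obtained on either branch is 1 + ψ = (1 + √3)/2 ≈ 1.366, matching Table 1.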

Now we give a 1.5-competitive never-committing scheduler, which we call nev (“never committing”). Note that this competitive ratio is smaller than the lower bound of φ ≈ 1.618 known for eventually-committing schedulers, so this demonstrates a separation between the strengths of schedulers in the two models.

Scheduler 19.

At time t:

  • If a task τ_i has σ_i + t ≤ 1.5·C_t but is not currently running on a slow machine, start τ_i on a slow machine, cancelling its fast-machine implementation if necessary.

  • Let 𝒫 be the set of τ_i that have arrived and are not running on a slow machine. Choose the τ_i ∈ 𝒫 maximizing σ_i + t_i, breaking ties by choosing the task with the smaller i. Run τ_i on the fast machine during this time step.
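A discretized sketch of one step of nev (our own code and naming, with C_t supplied externally, e.g. by the opt_completion sketch from Section 2.3); note that re-deciding the fast machine’s task every step is exactly what uses the never-committing power:

```python
def nev_step(t, c_t, tasks, on_slow):
    """One timestep of nev. tasks: dict i -> (sigma_i, pi_i, t_i) of
    arrived, unfinished tasks; on_slow: ids running on slow machines
    (serial runs are never revoked). Returns (on_slow, fast_task)."""
    # Rule 1: once sigma_i + t <= 1.5 * C_t, commit tau_i to a slow
    # machine, cancelling any partial fast-machine run of tau_i.
    for i, (s, p, ti) in tasks.items():
        if i not in on_slow and s + t <= 1.5 * c_t:
            on_slow.add(i)
    # Rule 2: the fast machine runs the pending task with the largest
    # sigma_i + t_i, breaking ties toward smaller index i.
    pending = [i for i in tasks if i not in on_slow]
    fast_task = max(pending,
                    key=lambda i: (tasks[i][0] + tasks[i][2], -i),
                    default=None)
    return on_slow, fast_task
```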

Theorem 20.

𝗇𝖾𝗏 is a 1.5-competitive never-committing scheduler.

Proof.

Fix a TAP 𝒯. Let C̃_nev denote the final time when nev has work on the fast machine. Observe that if nev ever runs τ_i on a slow machine, then nev finishes τ_i before time 1.5·C_opt. Thus, to show that nev is 1.5-competitive it suffices to show C̃_nev ≤ 1.5·C_opt.

Let 𝒜 = {τ_i : σ_i + t_i > C_opt} be the set of tasks that opt actually runs on the fast machine.

Claim 21.

nev never runs a task τ ∈ 𝒜 on the fast machine after time C_opt.

Proof.

nev always allocates the fast machine to the present task with the largest value of σ_i + t_i among tasks that aren’t running on slow machines. Thus, whenever there are tasks from 𝒜 that aren’t running on slow machines, nev will run one such task on the fast machine. opt is able to complete all tasks in 𝒜 on the fast machine by time C_opt. Thus, nev completes (or starts on slow machines) all tasks τ ∈ 𝒜 before time C_opt.

This means that the only way to have C̃_nev > C_opt is if there are tasks with σ_i + t_i ≤ C_opt that have yet to be completed at time C_opt; we assume that this is the case for the remainder of the proof. Let Π(x) be the total amount of work nev performs on the fast machine after time C_opt across all tasks with σ_i + t_i ∈ [x, 1.5x]. For any x ≤ C_opt, let R(x) = inf{t : C_t ≥ x} be the first time an online algorithm becomes aware that the optimal schedule requires at least x completion time; the following key claim allows us to bound this left-over work Π(x) in terms of R(x).

Claim 22.

For all x, we have Π(x) ≤ x − R(x).

Proof.

Let 𝒥_x denote the set of tasks with σ_i + t_i ∈ [x, 1.5x] that nev runs on the fast machine at some time after C_opt. First, note that all τ_i ∈ 𝒥_x must have t_i < R(x), or else τ_i would be placed on a slow machine upon arrival. Choose ε sufficiently small, such that no tasks arrive between times R(x) − ε and R(x). Since R(x) − ε < R(x), we know C_{R(x)−ε} < x, and so opt(R(x) − ε) must run all tasks with σ_i + t_i ≥ x on the fast machine. In order for opt(R(x) − ε) to finish these tasks before time C_{R(x)−ε} < x, opt(R(x) − ε) must have at most x − R(x) + ε fast work remaining across all such tasks at time R(x) − ε.

Now, by the same argument as in Claim 21, because nev prioritizes tasks with σ_i + t_i ≥ x over tasks with σ_i + t_i < x on the fast machine whenever they are present (and not yet started on slow machines), nev also has at most x − R(x) + ε work remaining on tasks in 𝒥_x at time R(x) − ε. Because no more tasks from 𝒥_x arrive after this time, we have Π(x) ≤ x − R(x) + ε as well. The claim holds for all ε > 0, and so taking ε → 0 we have Π(x) ≤ x − R(x).

We now give an observation to control R(x). Let τ_i be the task, among all tasks that nev runs on the fast machine after time C_opt, with the smallest value of σ_i + t_i. Let λ = σ_i + t_i.

Claim 23.

For all x ≥ λ, we have R(x) > 1.5x − λ.

Proof.

First, note that t_i ≤ R(λ), or else nev would start τ_i on a slow machine upon arrival (if t_i > R(λ) then σ_i + t_i = λ ≤ C_{t_i} ≤ 1.5·C_{t_i}). Now, because nev doesn’t start τ_i on a slow machine at time R(x) ≥ R(λ) ≥ t_i, we have R(x) + σ_i > 1.5·C_{R(x)} ≥ 1.5x; since σ_i ≤ λ, this gives R(x) > 1.5x − λ.

To prove the theorem, it will now suffice to branch into two cases, based on how large λ is.

Case 1.

λ ≥ (2/3)·C_opt.

In this case, since 1.5λ ≥ C_opt, by Claim 21 all left-over work at time C_opt comes from tasks with σ_i + t_i ∈ [λ, 1.5λ]. By Claim 22, the total amount of such work is at most λ − R(λ). Then, by Claim 23, we know R(λ) > 0.5λ. Together, this implies that the total amount of left-over work is at most 0.5λ ≤ 0.5·C_opt, so C̃_nev ≤ 1.5·C_opt.

Case 2.

λ < (2/3)·C_opt.

First note that λ ≥ C_opt/2, or else τ_i would be started on a slow machine at time C_opt. So, all left-over work at time C_opt comes either from tasks with σ_i + t_i ∈ [λ, 1.5λ], or from tasks with σ_i + t_i ∈ [1.5λ, 1.5²λ]. By Claim 22, we can therefore bound the total amount of left-over work by (λ − R(λ)) + (1.5λ − R(1.5λ)). Now, by Claim 23, this quantity can be at most (λ − 0.5λ) + (1.5λ − 1.25λ) = 0.75λ. Since λ < (2/3)·C_opt, this is at most 0.5·C_opt, so again C̃_nev ≤ 1.5·C_opt.

 Remark 24.

In Appendix A we show that Scheduler 19 is optimal among never-committing schedulers that never cancel implementations running on slow machines. This shows that improving on Scheduler 19 will require a substantially different scheduler.

6 Extending Beyond the Massively Parallel Regime

In Theorem 7, we showed that Scheduler 5 is a 2-competitive instantly-committing scheduler in the massively parallel regime of the SPDP. In this section, we will show that in fact Scheduler 5 is 2-competitive even outside of the massively parallel regime, although the analysis is slightly more complicated. This result is interesting in its own right, resolving an open question from [16]. However, we think that the main virtue of this proof is that it serves as a proof-of-concept that results from the conceptually simpler massively parallel regime can be adapted to apply to the general SPDP: we conjecture that all upper bounds holding in the massively parallel regime should also hold in the general SPDP.

First, recall the setup of the SPDP. The input is a sequence of triples τ_i = (σ_i, π_i, t_i), where σ_i is the work of the serial implementation of task τ_i, π_i is the work of the parallel implementation, and t_i is the arrival time. At each time step, the scheduler allocates its p processors to the jobs, giving at most one processor to each serial job.

Scheduler 5 is not yet a fully-defined scheduler in the SPDP, because it specifies which implementation to run for each task, but does not specify how to schedule the resulting work on the processors. We extend ins to the general SPDP as follows:

Scheduler 25.

When task τ_i arrives:

  • If σ_i + t_i > 2·C_{t_i}, parallelize τ_i.

  • Otherwise, serialize τ_i.

At every timestep, if there are x serial jobs present, then ins schedules the jobs by allocating a processor to each of the min(p, x) serial jobs with the most remaining work (or to an arbitrary set of p jobs with maximum remaining work, if there are more than p serial jobs with maximum remaining work), and then allocating any remaining processors to an arbitrary parallel job (if a parallel job is present).
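One timestep of this allocation rule, as a sketch (our own code and naming; ties and the choice of parallel job are arbitrary, matching the definition above):

```python
def allocate(p, serial_remaining, parallel_jobs):
    """serial_remaining: dict job_id -> remaining serial work;
    parallel_jobs: ids of jobs ins chose to parallelize.
    Returns dict job_id -> processors assigned for this timestep."""
    # one processor each for the min(p, x) serial jobs with the most
    # remaining work (ties broken arbitrarily, here by sort order)
    order = sorted(serial_remaining, key=serial_remaining.get, reverse=True)
    alloc = {j: 1 for j in order[:min(p, len(order))]}
    # every leftover processor goes to a single parallel job, if any
    leftover = p - len(alloc)
    if leftover > 0 and parallel_jobs:
        alloc[parallel_jobs[0]] = leftover
    return alloc
```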

Now we analyze 𝗂𝗇𝗌. We say 𝗂𝗇𝗌 is saturated at time t if 𝗂𝗇𝗌 has no idle processors at time t.

Lemma 26.

If ins is unsaturated right before finishing, then C_ins ≤ 2·C_opt.

Proof.

We claim that if ins is unsaturated at time t, then each task τ_i present at time t has been run on every time step since it arrived. Suppose this is not the case. Then, there must have been some time step before time t when there were at least p serial jobs with at least as much remaining work as τ_i. But then τ_i will finish at the same time as these other jobs, contradicting the fact that ins is unsaturated at time t. Thus, if ins is unsaturated at time t, then t ≤ σ_i + t_i for some i such that ins ran τ_i in serial, and hence t ≤ 2·C_{t_i} ≤ 2·C_opt, as desired.

By virtue of Lemma 26 it suffices to consider the case that ins is saturated immediately before finishing. Let t* be the final time in [0, C_ins) when ins is unsaturated (we set t* = 0 if ins is always saturated). Let i* ∈ [n] be the smallest i such that t_i ≥ t*; in fact we will have t_{i*} = t*, since in order to transition from being unsaturated to being saturated, some task must arrive. For an integer i ∈ [i*, n], let K_alg^i denote the sum of π_j over tasks τ_j with i* ≤ j ≤ i that alg runs in parallel; if alg is an instantly-committing scheduler then K_alg^i can be computed at time t_i. Now we prove an analogue of Lemma 6.

Lemma 27.

Fix a length-n TAP. For all i ∈ [i*, n], and for all instantly-committing schedulers alg,

K_ins^i ≤ (C_alg^{t_i} − t*)·p + K_alg^i. (9)

Proof.

We prove Equation 9 by induction on i. The base case is i = i*. If alg takes the π_{i*} work here (i.e., runs τ_{i*} in parallel), then we have K_ins^{i*} ≤ K_alg^{i*} (and C_alg^{t_{i*}} ≥ t*), so the invariant holds. If instead alg runs τ_{i*} in serial, then we’ll have C_alg^{t_{i*}} − t* ≥ σ_{i*} ≥ π_{i*}/p, in which case we have K_ins^{i*} ≤ π_{i*} ≤ (C_alg^{t_{i*}} − t*)·p, so the invariant holds. This establishes the base case.

Now, assume Equation 9 for i ∈ [i*, n); we prove Equation 9 for i + 1. If alg takes at least as much work as ins on τ_{i+1}, i.e., K_alg^{i+1} − K_alg^i ≥ K_ins^{i+1} − K_ins^i, then Equation 9 for (i, alg) implies Equation 9 for (i + 1, alg). It remains to consider the case when alg runs τ_{i+1} in serial, but ins runs τ_{i+1} in parallel. Here, ins thought τ_{i+1} was too large to serialize, so:

C_alg^{t_{i+1}} ≥ σ_{i+1} + t_{i+1} > 2·C_{t_{i+1}}. (10)

One consequence of Equation 10 is that opt(t_{i+1}) parallelizes τ_{i+1}; a corollary of this is that Equation 9 holds for (i + 1, opt(t_{i+1})). Thus, applying Equation 9 for (i + 1, opt(t_{i+1})) and using Equation 10 gives:

K_ins^{i+1} ≤ (C_{t_{i+1}} − t*)·p + K_opt(t_{i+1})^{i+1} ≤ (2·C_{t_{i+1}} − 2t*)·p ≤ (C_alg^{t_{i+1}} − t*)·p,

so the invariant holds in this case as well.

In the Massively Parallel regime Theorem 7 followed immediately from Lemma 6. Slightly more work is required in the general setting, but Lemma 27 is still very useful.

Theorem 28.

𝗂𝗇𝗌 is a 2-competitive instantly-committing scheduler in the SPDP.

Proof.

Recall from Lemma 26 that we need only consider the case that ins ends saturated, and recall the definition of t*. For any scheduler alg, let B_alg denote the work that alg has left immediately before time t*, and let K_alg be the work that alg takes on tasks τ_i with t_i ≥ t*. Because ins ends saturated, we have

C_ins = t* + (K_ins + B_ins)/p.

Applying Lemma 27 gives

t* + (K_ins + B_ins)/p ≤ C_opt + (K_opt + B_ins)/p. (11)

So, to conclude, it suffices to show that B_ins + K_opt ≤ p·C_opt. Let S be the set of tasks that ins has present immediately before time t*. Let W = Σ_{τ_i ∈ S} σ_i. Clearly B_ins ≤ W. On the other hand, opt must take at least W work on the tasks in S, and can have made at most p·t* progress on these tasks by time t*. Thus,

B_ins ≤ W ≤ p·t* + B_opt.

Therefore,

B_ins + K_opt ≤ p·t* + B_opt + K_opt ≤ p·C_opt.

Using this in Equation 11 gives C_ins ≤ 2·C_opt.

References

  • [1] S Anand, Naveen Garg, and Nicole Megow. Meeting deadlines: How much speed suffices? In Automata, Languages and Programming: 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part I 38, pages 232–243. Springer, 2011. doi:10.1007/978-3-642-22006-7_20.
  • [2] James Aspnes, Yossi Azar, Amos Fiat, Serge Plotkin, and Orli Waarts. On-line routing of virtual circuits with applications to load balancing and machine scheduling. Journal of the ACM (JACM), 44(3):486–504, 1997. doi:10.1145/258128.258201.
  • [3] Baruch Awerbuch, Yossi Azar, Edward F. Grove, Ming-Yang Kao, P. Krishnan, and Jeffrey Scott Vitter. Load balancing in the ℓ_p norm. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 383–391. IEEE, 1995. doi:10.1109/SFCS.1995.492494.
  • [4] Brenda S Baker and Jerald S Schwarz. Shelf algorithms for two-dimensional packing problems. SIAM Journal on Computing, 12(3):508–525, 1983. doi:10.1137/0212033.
  • [5] Ioannis Caragiannis. Better bounds for online load balancing on unrelated machines. In Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, pages 972–981, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347188.
  • [6] Shichuan Deng, Jian Li, and Yuval Rabani. Generalized unrelated machine scheduling problem. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2898–2916. SIAM, 2023. doi:10.1137/1.9781611977554.CH110.
  • [7] Richard A. Dutton and Weizhen Mao. Online scheduling of malleable parallel jobs. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, PDCS ’07, pages 136–141, USA, 2007. ACTA Press.
  • [8] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416–429, 1969. doi:10.1137/0117039.
  • [9] Shouwei Guo and Liying Kang. Online scheduling of malleable parallel jobs with setup times on two identical machines. European Journal of Operational Research, 206(3):555–561, November 2010. doi:10.1016/j.ejor.2010.03.005.
  • [10] Anupam Gupta, Amit Kumar, Viswanath Nagarajan, and Xiangkun Shen. Stochastic load balancing on unrelated machines. Mathematics of Operations Research, 46(1):115–133, 2021. doi:10.1287/MOOR.2019.1049.
  • [11] Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie. Stochastic online scheduling on unrelated machines. In Integer Programming and Combinatorial Optimization: 19th International Conference, IPCO 2017, Waterloo, ON, Canada, June 26-28, 2017, Proceedings 19, pages 228–240. Springer, 2017. doi:10.1007/978-3-319-59250-3_19.
  • [12] Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie. Greed works—online algorithms for unrelated machine stochastic scheduling. Mathematics of operations research, 45(2):497–516, 2020. doi:10.1287/MOOR.2019.0999.
  • [13] Ellis Horowitz and Sartaj Sahni. Exact and approximate algorithms for scheduling nonidentical processors. J. ACM, 23(2):317–327, April 1976. doi:10.1145/321941.321951.
  • [14] Johann L Hurink and Jacob Jan Paulus. Online algorithm for parallel job scheduling and strip packing. In Approximation and Online Algorithms: 5th International Workshop, WAOA 2007, Eilat, Israel, October 11-12, 2007. Revised Papers 5, pages 67–74. Springer, 2008. doi:10.1007/978-3-540-77918-6_6.
  • [15] Sungjin Im and Shi Li. Improved approximations for unrelated machine scheduling. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2917–2946. SIAM, 2023. doi:10.1137/1.9781611977554.CH111.
  • [16] William Kuszmaul and Alek Westover. Scheduling jobs with work-inefficient parallel solutions. In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 101–111, 2024. doi:10.1145/3626183.3659960.
  • [17] Jan Karel Lenstra, David B Shmoys, and Éva Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical programming, 46:259–271, 1990. doi:10.1007/BF01585745.
  • [18] Walter Ludwig and Prasoon Tiwari. Scheduling malleable and nonmalleable parallel tasks. In Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, pages 167–176, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314491.
  • [19] Marco Molinaro. Stochastic ℓ_p load balancing and moment problems via the L-function method, pages 343–354. SIAM, 2019. doi:10.1137/1.9781611975482.22.
  • [20] Gregory Mounie, Christophe Rapine, and Dennis Trystram. Efficient approximation algorithms for scheduling malleable tasks. In Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, SPAA ’99, pages 23–32, New York, NY, USA, 1999. Association for Computing Machinery. doi:10.1145/305619.305622.
  • [21] Daniel R. Page, Roberto Solis-Oba, and Marten Maack. Makespan minimization on unrelated parallel machines with simple job-intersection structure and bounded job assignments. Theoretical Computer Science, 809:204–217, 2020. doi:10.1016/j.tcs.2019.12.009.
  • [22] Andreas S. Schulz and Martin Skutella. Scheduling unrelated machines by randomized rounding. SIAM Journal on Discrete Mathematics, 15(4):450–469, 2002. doi:10.1137/S0895480199357078.
  • [23] John Turek, Walter Ludwig, Joel L. Wolf, Lisa Fleischer, Prasoon Tiwari, Jason Glasgow, Uwe Schwiegelshohn, and Philip S. Yu. Scheduling parallelizable tasks to minimize average response time. In Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, SPAA ’94, pages 200–209, New York, NY, USA, 1994. Association for Computing Machinery. doi:10.1145/181014.181331.
  • [24] John Turek, Uwe Schwiegelshohn, Joel L Wolf, and Philip S Yu. Scheduling parallel tasks to minimize average response time. In Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, pages 112–121, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314485.
  • [25] John Turek, Joel L Wolf, and Philip S Yu. Approximate algorithms scheduling parallelizable tasks. In Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, pages 323–332, 1992. doi:10.1145/140901.141909.
  • [26] Nodari Vakhania, Jose Hernandez, and Frank Werner. Scheduling unrelated machines with two types of jobs. International Journal of Production Research, 52:1–9, February 2014. doi:10.1080/00207543.2014.888789.
  • [27] Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE Computer Society, 1977.
  • [28] Deshi Ye, Danny Z. Chen, and Guochuan Zhang. Online scheduling of moldable parallel tasks. Journal of Scheduling, 21(6):647–654, 2018. Publisher: Springer. URL: https://ideas.repec.org//a/spr/jsched/v21y2018i6d10.1007_s10951-018-0556-2.html, doi:10.1007/S10951-018-0556-2.
  • [29] Deshi Ye, Xin Han, and Guochuan Zhang. A note on online strip packing. Journal of Combinatorial Optimization, 17(4):417–423, 2009. doi:10.1007/S10878-007-9125-X.
  • [30] Xiaoyan Zhang, Ran Ma, Jian Sun, and Zan-Bo Zhang. Randomized selection algorithm for online stochastic unrelated machines scheduling. Journal of Combinatorial Optimization, pages 1–16, 2022.

Appendix A Barriers Against Improved Schedulers

In this section we show that the schedulers of Section 4 and Section 5 are optimal among natural restricted classes of schedulers. This highlights what changes must be made to the schedulers in order to have hopes of achieving better competitive ratios.

First we show that among non-procrastinating eventually-committing schedulers (i.e., eventually-committing schedulers with the property that whenever tasks are present, they will run at least one task), Scheduler 8 is optimal.

Proposition 29.

Fix ε > 0. Let ξ ≈ 1.678 denote the real root of the polynomial 2x³ − 3x² − 1. There is no deterministic (ξ − ε)-competitive non-procrastinating eventually-committing scheduler.

Proof.

It suffices to consider the case that ε < .001. Fix a non-procrastinating eventually-committing scheduler alg. Assume towards contradiction that alg is (ξ − ε)-competitive. We now describe a TAP 𝒯 on which C_alg > (ξ − ε)·C_opt. The TAP starts with τ_1 = (1/ξ, 1/ξ², 0). Next, let τ_2 = (1 − ε², 1/ξ − ε², ε²). Finally, at each time t ∈ [ξ + 1/ξ − 2, 1 − ε²] that is a multiple of ε², give a task τ = (∞, ε², t).

We now argue that alg must run all the tasks on the fast machine. Because alg is a non-procrastinating (ξ − ε)-competitive scheduler, alg must instantly start τ_1 on the fast machine (in case there are no tasks after τ_1). Now we argue that alg runs τ_2 on the fast machine as well. Suppose that alg starts τ_2 on a slow machine at some time t with

1 − ε² + t > (ξ − ε)·C_t. (12)

Then, alg would not be (ξ − ε)-competitive on the truncated TAP 𝒯^t. Thus, alg must not start τ_2 on a slow machine at any time t satisfying Equation 12. We now show that Equation 12 holds for all t ≥ ε², thus proving that alg must run τ_2 on the fast machine. For t ∈ [ε², ξ + 1/ξ − 2) we have C_t ≤ 1/ξ, and 1 − ε² + t ≥ 1, so Equation 12 holds. For t ≥ ξ + 1/ξ − 2 we have

C_t ≤ min(1, t − ξ + 2 + ε²).

Thus, it suffices to show:

1 − ε² + t > (ξ − ε)·min(1, t − ξ + 2 + ε²). (13)

To show Equation 13 it suffices to check it at t = ξ − 1 − ε² (by monotonicity of the inequality on either side of t = ξ − 1). At t = ξ − 1 − ε², Equation 13 reads

ξ − 2ε² > ξ − ε,

which is true because ε < .001.

We have now shown that alg runs all tasks in 𝒯 on the fast machine. Thus (by the definition of ξ),

C_alg ≥ 1/ξ² + (1/ξ − ε²) + (1 − (1/ξ + ξ − 2) − ε²) = ξ − 2ε².

However, C_opt ≤ 1. This contradicts the assumption that alg is (ξ − ε)-competitive.

Now, we show that the 1.5-competitive scheduler of Section 5 is optimal among never-committing schedulers that don’t cancel tasks on slow machines.

Proposition 30.

Let alg be a deterministic never-committing scheduler that never cancels serial tasks. Then, for any ε > 0, there is a TAP 𝒯 with n = O(1/ε²) tasks on which alg is not (1.5 − ε)-competitive.

Proof.

It suffices to consider the case that ε < .001. The TAP is defined as follows. First, τ_1 = (2, 1, 0). Then, for each time t ∈ [ε², 2 − ε²] that is a multiple of ε², a task τ = (∞, ε², t) arrives. We will show that if alg starts τ_1 on a slow machine at any time t, then alg is not (1.5 − ε)-competitive on the truncation 𝒯^t. We show this by considering two cases.

Case 1.

alg starts τ_1 on a slow machine at time t ∈ [0, 1].
If alg does this, then C_alg^{𝒯^t} ≥ 2 + t. However, C_t ≤ t + 1 + ε². Thus,

C_alg^{𝒯^t}/C_t ≥ (2 + t)/(t + 1 + ε²) ≥ 3/(2 + ε²) > 1.5 − ε.

So alg cannot start τ_1 on a slow machine at this time.

Case 2.

alg starts τ_1 on a slow machine at time t ≥ 1.
If alg does this, then C_alg ≥ 2 + t ≥ 3. However, C_opt ≤ 2. Thus,

C_alg/C_opt ≥ 1.5.

In conclusion, alg must run τ_1 on the fast machine. But then the fast machine must complete both π_1 = 1 and the entire stream of forced-fast tasks (total work 2 − ε²), so

C_alg ≥ 3 − ε² > (1.5 − ε)·C_opt = (1.5 − ε)·2,

a contradiction.

Appendix B Lower Bounds from [16]

In this section we state, for the reader’s convenience, the lower bounds from [16] against instantly- and eventually- committing schedulers.

Proposition 31 (Kuszmaul, Westover [16]).

Fix ε > 0. There is no deterministic (2 − ε)-competitive instantly-committing scheduler.

Proof.

Consider an n-task TAP where for each i ∈ [n], the i-th task has σ_i = 2^i, π_i = 2^{i−1}, and the arrival times are all very close to 0. For each i ∈ [n], it is possible to handle the first i tasks in the TAP with completion time 2^{i−1} (run τ_i on the fast machine and all earlier tasks in serial). Thus, a (2 − ε)-competitive scheduler cannot afford to run task τ_i on a slow machine, which would take time σ_i = 2^i. So, a (2 − ε)-competitive scheduler must run all tasks on the fast machine, giving completion time at least 2^n − 1 on this TAP, while C_opt ≤ 2^{n−1}. For large enough n this implies that the scheduler is not actually (2 − ε)-competitive.
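Concretely, the forced-fast completion time compares to the optimum as

\[
  \frac{\sum_{i=1}^{n} 2^{\,i-1}}{2^{\,n-1}}
  \;=\; \frac{2^{n}-1}{2^{\,n-1}}
  \;\xrightarrow{\;n\to\infty\;}\; 2,
\]

which is where the (2 − ε) barrier comes from.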

Proposition 32 (Kuszmaul, Westover [16]).

Fix ε > 0. There is no deterministic (φ − ε)-competitive eventually-committing scheduler, where φ ≈ 1.618 is the golden ratio.

Proof.

Suppose that alg is a (φ − ε)-competitive eventually-committing scheduler. Let τ_1 = (φ, 1, 0); if there are no further tasks, alg must run τ_1 on the fast machine, starting at some time t_0 ≤ 1/φ. Let τ_2 = (∞, φ − t_0, t_0). On this TAP, C_opt = φ, while C_alg ≥ φ + 1 = φ². So alg is not (φ − ε)-competitive.

Appendix C Randomized Lower Bounds

In this section we give lower bounds against randomized schedulers. Our main tool is Yao’s minimax principle [27], which allows us to prove a lower bound on the competitive ratio by exhibiting a distribution over TAPs, and showing that any deterministic scheduler has poor expected cost on a random TAP drawn from the distribution.

Proposition 33.

For any ε > 0.03, there is no (5/3 − ε)-competitive instantly-committing scheduler, even with randomization.

Proof.

Fix N = 25. For k ∈ [N], define 𝒯_k to be a length-k TAP with σ_i = 2^{i+1}, π_i = 2^i, and all arrival times very close to 0. Let 𝒟 denote the following distribution over TAPs: choose k ∈ [N] uniformly at random, and then output TAP 𝒯_k. By brute-force enumeration of all possible deterministic instantly-committing strategies, one can show that no such strategy achieves expected competitive ratio better than 1.637 on 𝒟.
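The enumeration itself is mechanical; here is a sketch (our own code, relying on the fact that, since all 𝒯_k share prefixes and arrivals are essentially simultaneous, a deterministic instantly-committing strategy is just one fast/slow bit per index, and C_opt(𝒯_k) = 2^k by running τ_k fast and the rest in serial). At N = 25 this loop is slow in pure Python, so treat it as a specification rather than a benchmark:

```python
def min_expected_ratio(N):
    """Best expected competitive ratio of a deterministic
    instantly-committing strategy on the distribution D of Prop. 33."""
    best = float("inf")
    for mask in range(1 << N):              # bit k-1 set: run task k fast
        fast_sum, serial_max, total = 0, 0, 0.0
        for k in range(1, N + 1):           # T_k adds task k to T_{k-1}
            if (mask >> (k - 1)) & 1:
                fast_sum += 2 ** k          # pi_k = 2^k, queued on fast
            else:
                serial_max = max(serial_max, 2 ** (k + 1))  # sigma_k
            total += max(fast_sum, serial_max) / 2 ** k     # / C_opt(T_k)
        best = min(best, total / N)
    return best
```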

Proposition 34.

For any ε > 0, there is no ((3 + √3)/4 − ε)-competitive eventually-committing scheduler, even with randomization.

Proof.

In the proof of Proposition 18 we defined two TAPs, and showed that no deterministic scheduler is ((1 + √3)/2 − ε)-competitive on both of the TAPs. One can show that if we choose uniformly at random between the two TAPs of Proposition 18, then no deterministic eventually-committing scheduler has expected competitive ratio better than (3 + √3)/4 − ε.