When to Give Up on a Parallel Implementation
Abstract
In the Serial Parallel Decision Problem (SPDP), introduced by Kuszmaul and Westover [SPAA’24], an algorithm receives a series of tasks online, and must choose for each between a serial implementation and a parallelizable (but less efficient) implementation. Kuszmaul and Westover describe three decision models: (1) Instantly-committing schedulers must decide on arrival, irrevocably, which implementation of the task to run. (2) Eventually-committing schedulers can delay their decision beyond a task’s arrival time, but cannot revoke their decision once made. (3) Never-committing schedulers are always free to abandon their progress on the task and start over using a different implementation. Kuszmaul and Westover gave a simple instantly-committing scheduler whose total completion time is constant-competitive with the offline optimal schedule, and proved two lower bounds: no eventually-committing scheduler can have competitive ratio better than 1.618 in general, and no instantly-committing scheduler can have competitive ratio better than 2 in general. They conjectured that the three decision models should admit different competitive ratios, but left upper bounds below their constant in any model as an open problem.
In this paper, we show that the powers of instantly, eventually, and never committing schedulers are distinct, at least in the “massively parallel regime”. The massively parallel regime of the SPDP is the special case where the number of available processors is asymptotically larger than the number of tasks to process, meaning that the work associated with running a task in serial is negligible compared to its runtime. In this regime, we show: (1) the optimal competitive ratio for instantly-committing schedulers is 2, (2) the optimal competitive ratio for eventually-committing schedulers lies in [1.618, 1.678], and (3) the optimal competitive ratio for never-committing schedulers lies in [1.366, 1.5]. We additionally show that our instantly-committing scheduler is also 2-competitive outside of the massively parallel regime, giving proof-of-concept that results in the massively parallel regime can be translated to hold with fewer processors.
Keywords and phrases: Scheduling, Multi-Processor, Online Algorithms
2012 ACM Subject Classification: Theory of computation → Online algorithms
Editors: Raghu Meka
Series and Publisher: Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
1.1 Background
Many computational tasks can be performed quickly in parallel over a large number of processors – but such parallel implementations may be less work-efficient than a serial implementation on a single processor, requiring substantially more total computation time across all machines. When several different tasks must be completed in as little total time as possible, this trade-off between work and time can necessitate running different tasks in different modes: small tasks can be done in serial to save work, while large tasks must be parallelized to prevent their serial runtimes from dominating the overall computation.
To formalize this problem, Kuszmaul and Westover introduced the Serial Parallel Decision Problem (SPDP) [16]. In their model, each task has exactly two possible implementations: an “embarrassingly parallel” implementation which can be worked on by multiple machines at once (where the rate of progress on the implementation is proportional to the number of processors assigned to it), and a serial implementation which can only be worked on by a single processor at a time. If all tasks are available at time 0, it is easy to efficiently determine the optimal strategy: all jobs with serial completion time smaller than some threshold can be run in serial, and the larger tasks must be run in parallel. The model becomes interesting when previously-unknown tasks are allowed to arrive at arbitrary times, and one wishes to minimize the competitive ratio of an online algorithm’s total completion time against the offline optimal completion time.
Kuszmaul and Westover define three distinct versions of this model, parameterized by the degree to which the online scheduler is able to reverse its decisions.
1. An instantly-committing scheduler must choose an implementation for each task as soon as the task arrives, and is not allowed to revisit this choice.
2. An eventually-committing scheduler may delay choosing an implementation, but must choose one irrevocably before assigning its work to a processor.
3. A never-committing scheduler can, at any time, discard all as-yet completed work on an implementation and re-start the task with the other implementation.
The distinction between the eventually- and never-committing models is motivated by potential practical concerns: if a task involves mutating an input in memory, it may not be feasible to cancel an implementation once it begins running. Kuszmaul and Westover present an instantly-committing scheduler achieving a constant competitive ratio, and show competitive ratio lower bounds of 2 and 1.618 in the instantly-committing and eventually-committing models, respectively (for deterministic schedulers). They conjecture that the ability to delay or cancel choices should allow for more competitive online algorithms, but leave open the problem of finding competitive ratio upper bounds below their constant.
1.2 This Work
In this work, we consider Kuszmaul and Westover’s SPDP when the number of available processors is much larger than the number of tasks, noting that all of their upper and lower bounds hold in this parameter regime. This is a particularly simple setting, since the work associated with a serial implementation is now negligible compared to its completion time – running a task in serial means accepting a lower bound on completion time, but requires essentially no work. We can think of this setting as an unrelated-machines scheduling problem with an unlimited number of identical “slow” machines, and a single unrelated “fast” machine, representing a massively parallel implementation of the task across many processors – note that this could also describe scenarios with a literal fast machine, such as a single piece of accelerated hardware. Although we consider the number of processors to be large, we won’t assume that tasks complete instantly when run on all the processors; instead, we’ll let the total parallel work of each task scale with the number of processors.
Our main results are tight bounds on the competitive ratio of instantly-committing schedulers in this regime, and separations between the strength of all three models. Our results are summarized in Table 1. Note that we focus on deterministic schedulers (although we briefly discuss randomized schedulers in the appendix).
Table 1: Competitive ratios for the three decision models in the massively parallel regime (entries marked * are proven in this paper).

| Model | Lower Bound | Upper Bound |
| --- | --- | --- |
| Instantly-Committing Schedulers | 2 [16] | 2 * |
| Eventually-Committing Schedulers | 1.618 [16] | 1.678 * |
| Never-Committing Schedulers | 1.366 * | 1.5 * |
In each case, the upper bound comes from a simple heuristic in which the algorithm compares its projected completion time to its current estimate of the optimal completion time. More precisely, at each time t, our scheduler computes an offline optimal strategy on the truncation of the task sequence to tasks that arrive before time t, and makes decisions based on the completion time of that optimal strategy.
Our main technical contribution is the analysis of the schedulers. Working with these truncated optimal schedules is challenging, because the optimal schedules at two different times can be quite different. For instantly-committing schedulers we use an invariant-based approach to bound, at all times, the work taken by our scheduler in terms of the minimum work and completion time among all schedulers. For eventually- and never-committing schedulers this approach is no longer feasible: there is no well-defined notion of the “work taken” by our scheduler, because the scheduler may not have committed to a decision yet. Instead, these analyses rely on choosing a couple of critical times to observe the state of our scheduler and of the optimal schedule, and then establishing a dichotomy: either (1) “real fast tasks” (tasks that the optimal schedule runs on the fast machine) arrive quickly, in which case our scheduler prioritizes real fast tasks on the fast machine and will run most other tasks on slow machines, or (2) real fast tasks arrive slowly, in which case our scheduler never falls too far behind the optimal schedule, despite making suboptimal use of the fast machine.
In addition to these results, we show that with some effort our instantly-committing scheduler can be adapted to work for any number of processors, fully resolving the question of the optimal competitive ratio of instantly-committing schedulers in the general SPDP, and giving a proof-of-concept that results for a large number of processors can be adapted to hold when the work associated with serial tasks is also a concern.
1.3 Related Work
There is a long line of work studying the phenomenon of work-inefficient parallel implementations in multi-processor scheduling. Typically, the models of limited parallelism considered involve one of three types of jobs:
1. Rigid jobs, which come with a number specifying a fixed number of processors the job must be run on at each timestep of its execution.
2. Moldable jobs, where the scheduler may choose the (fixed) number of processors the job is run on, and the amount of work scales depending on this choice according to some speedup curve.
3. Malleable jobs, which like moldable jobs have an associated speedup curve, but where the job may be assigned to different numbers of processors at different timesteps (as opposed to the scheduler choosing a fixed value at the start of the task’s runtime).
In each of these cases, there is interest in minimizing the total completion time (makespan) in both the offline setting – where problems tend to be NP-hard, but may have approximation algorithms [25, 20, 18, 24, 23] – and the online setting, where the goal is to minimize competitive ratio [8, 7, 4, 14, 9, 29, 28]. Kuszmaul and Westover’s Serial Parallel Decision Problem is related to this line of work, but doesn’t quite fit into the usual framework – in their model, instead of dealing with an arbitrary speedup curve, there is a single binary decision between a completely serial and a perfectly parallelizable implementation.
As noted, the massively-parallel regime of the SPDP considered in this paper can be naturally viewed as a scheduling problem with an unlimited number of identical “slow” machines, and a single unrelated “fast” machine. Standard scheduling problems in the unrelated machines model have also been well-studied, in terms of both offline approximation algorithms and hardness results [13, 17, 22, 26, 19, 21, 10, 15, 6], and online algorithms [3, 2, 22, 5, 1, 11, 12, 30]. We note, however, that since we treat “slow” machines as an unbounded resource, and there is only a single fast machine available, most of the typical difficulties of multi-processor scheduling problems do not arise. In particular, unlike a typical load-balancing problem where NP-hardness follows from a standard set-partition reduction, the One-Fast-Many-Slow Decision Problem (without dependencies) is easily solvable offline, simply by putting all tasks which finish below a certain threshold on distinct slow machines.
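The threshold rule described above can be made concrete. Below is a minimal Python sketch of the offline algorithm for the simplest case, where all tasks are available at time 0; the function name and the `(slow_time, fast_time)` pair encoding are our own illustrative choices, not notation from the paper.

```python
def offline_makespan(tasks):
    """Offline optimum for one-fast-many-slow when all tasks arrive at
    time 0. tasks: list of (slow_time, fast_time) pairs. Every task at
    or below a serial-runtime threshold gets its own slow machine; the
    rest share the single fast machine back to back."""
    best = float("inf")
    # try every meaningful threshold: 0 and each task's slow runtime
    for th in [0.0] + sorted(s for s, _ in tasks):
        slow_part = max((s for s, _ in tasks if s <= th), default=0.0)
        fast_part = sum(f for s, f in tasks if s > th)
        best = min(best, max(slow_part, fast_part))
    return best
```

Trying every threshold as above costs quadratic time; sorting once and sweeping the threshold with a running suffix sum brings this down to O(n log n).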
1.4 Open Questions
We leave three main open questions as directions for future work.
Question 1.
What are the optimal competitive ratios for eventually/never-committing schedulers?
In Appendix A we identify barriers, showing that improving on our eventually/never-committing schedulers will require substantially different algorithms – but we suspect that such improvements may be possible.
Question 2.
Are randomized schedulers more powerful than deterministic schedulers?
In the main body of the paper we consider only deterministic schedulers; however, for many online problems randomized algorithms can do substantially better than deterministic ones. In Appendix C we give some lower bounds against randomized schedulers, but these bounds are weaker than those known for deterministic schedulers.
Question 3.
Is there a general transformation between schedulers for the massively parallel regime (i.e. the One-Fast-Many-Slow Decision Problem) and the general SPDP?
The fundamental difficulty of the SPDP is deciding between implementations which take a lot of work, and implementations which take a lot of time. This tradeoff is absolute in the massively parallel regime, since the large number of processors means the amount of work associated with a serial implementation is negligible, whereas in the general SPDP it is possible for all processors to be saturated with serial implementations. Intuitively, one might expect that having work associated with the serial implementations only makes the problem easier, since it makes the tradeoff less dramatic – indeed, Kuszmaul and Westover’s competitive ratio lower bounds become weaker when the number of processors is small. So, one might hope that algorithms in the massively parallel regime can be generically translated to limited-processor settings. Formalizing this connection is an interesting direction for future research.
2 Preliminaries
2.1 The One-Fast-Many-Slow Decision Problem
In this section we formally define the One-Fast-Many-Slow Decision Problem, where the goal is to distribute work between a single fast machine and an unlimited number of slow machines. An instance of the problem is a Task Arrival Process (TAP): a sequence of tasks, where each task is a tuple indicating its runtime on a slow machine, its runtime on the fast machine, and its arrival time, respectively, with the fast runtime at most the slow runtime. A valid schedule associates at most one task to each machine at each point in time¹ such that no work is done on any task before its arrival time, each task runs on at most one machine at a time, and each task is either run for a total of its slow runtime on some slow machine, or a total of its fast runtime on the fast machine. The completion time (also known as makespan) of the schedule is the time when the last task is finished.

¹ In order for notions like “amount of work performed” to be well-defined, we must additionally mandate that a schedule be measurable. Alternatively, one can assume that time is discretized into appropriately fine timesteps.
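To make the model concrete, here is a small Python sketch of a TAP and the makespan of one particular kind of valid schedule. The `Task` class, the first-come-first-served order on the fast machine, and the choice to start each serial task immediately on arrival are illustrative assumptions; a valid schedule is free to delay or reorder work.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    slow: float     # runtime on a slow machine
    fast: float     # runtime on the fast machine
    arrival: float  # arrival time

def makespan(tasks, fast_ids):
    """Completion time of the schedule in which tasks whose index is in
    `fast_ids` share the fast machine (served in arrival order, back to
    back) and every other task runs on its own slow machine starting at
    its arrival time."""
    finish = 0.0
    free = 0.0  # time at which the fast machine next becomes free
    for i, t in sorted(enumerate(tasks), key=lambda it: it[1].arrival):
        if i in fast_ids:
            free = max(free, t.arrival) + t.fast
            finish = max(finish, free)
        else:
            finish = max(finish, t.arrival + t.slow)
    return finish
```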
We will be interested in online algorithms for this problem. An online scheduler learns about each task only at its arrival time, and at each time t must already have fixed the prefix of the schedule on times less than t. We define three distinct models for how these online decisions are made:
1. An instantly-committing scheduler must fix, at each task’s arrival time, the machine that the task will run on.
2. An eventually-committing scheduler need not fix a machine for any task until that task begins running.
3. A never-committing scheduler is an eventually-committing scheduler with the additional power to, at any time, “cancel” a task from the schedule, erasing all work previously done on the task and allowing it to be re-assigned to a new machine.
In each case, we are interested in minimizing the competitive ratio of an online scheduler, which is the supremum over all TAPs of the ratio of the online scheduler’s completion time to the completion time of an optimal scheduler on that TAP.
2.2 Connection to the SPDP
In Kuszmaul and Westover’s Serial Parallel Decision Problem, a scheduler must allocate work to a pool of equally-powerful processors, where each task is specified by the work of its serial implementation, the work of its parallel implementation, and its arrival time. The scheduler must choose whether to run each task in serial or parallel, and then must assign the resulting work to the processors, where parallel work can run on multiple processors at once but serial work cannot.
We can define the massively parallel regime of this problem to be the limit as the number of processors becomes large compared to the number of tasks. Letting n be the number of tasks and p the number of processors, if p is much larger than n, then restricting the serial implementations to run on only the first n processors, and the parallel implementations to run on only the remaining p − n processors, increases the completion time by at most a 1 + o(1) factor. This corresponds directly to the One-Fast-Many-Slow Decision Problem: we think of each of these serial processors as a “slow machine”, noting that since we have as many serial processors as we have tasks there are effectively an unlimited number of slow machines. We think of the parallel processors collectively as a “fast machine”, noting that we can assume without loss of generality that, at any point in time, all parallel processors are running the same parallel implementation.
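This correspondence is easy to express in code. The sketch below maps SPDP triples into one-fast-many-slow tasks; the tuple encodings and the function name are our own, and treating all p processors as a single fast machine ignores the handful reserved for serial jobs.

```python
def to_one_fast_many_slow(spdp_tasks, p):
    """View an SPDP instance with p processors through the
    one-fast-many-slow lens: a task (serial_work, parallel_work, arrival)
    becomes a (slow_runtime, fast_runtime, arrival) triple, where the p
    processors acting in concert finish the parallel implementation in
    parallel_work / p time."""
    return [(serial, parallel / p, a) for (serial, parallel, a) in spdp_tasks]
```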
2.3 Notation
We now introduce our notation for describing and analyzing schedulers. For an algorithm A and TAP T, we let A(T) be the completion time of A on T. Let T_{≤t} be the truncation of TAP T consisting of the tasks with arrival time at most t. When T is clear from context we will write A to denote A(T). We will also use a separate notation for the completion time of the fast machine – that is, the final time when the fast machine has work.

It will be useful to be able to talk about the optimal completion time of a prefix of the TAP. Define the schedule OPT_t to be a schedule for T_{≤t} with minimal completion time. Note that OPT_t is only defined as an offline strategy, but an online algorithm can compute it (efficiently!) at time t, thus obtaining a lower bound on the optimal completion time, which will be useful to inform the algorithm’s future decisions. For ease of notation, we’ll often abbreviate the completion time of OPT_t as simply OPT_t.

There may be many sets of decisions which result in the optimal completion time; as opposed to letting OPT_t be an arbitrary such schedule, it will be useful to fix a canonical one, which we will do by letting OPT_t run as many tasks in serial as possible.
Scheduler 4.
The canonical optimal scheduler, defined on the truncation of the TAP, makes decisions as follows:
- If a task can finish on a slow machine by the optimal completion time, run it on a slow machine when it arrives.
- Otherwise, run the task on the fast machine, prioritizing tasks with larger fast runtimes and breaking ties in favor of earlier arrivals.

Finally, for a set of tasks, we write its work to mean the total fast-machine runtime of the tasks in the set.
3 A 2-Competitive Instantly-Committing Scheduler
In this section we present and analyze a 2-competitive instantly-committing scheduler. Kuszmaul and Westover showed that a competitive ratio below 2 is impossible for instantly-committing schedulers, so our scheduler is optimal. The scheduler (“instantly committing”) is defined in Scheduler 5.
Scheduler 5.
When a task arrives:
- If the task cannot finish in serial by twice the current estimate of the optimal completion time, run it on the fast machine, with the fast machine processing tasks in order of arrival.
- Otherwise, run it on a slow machine.
We analyze the scheduler by showing inductively that the completion time of the fast machine is small compared to the work and completion time of any other schedule. For a TAP of length n, a scheduler, and an index i ≤ n, we define the scheduler’s fast work through task i to be the sum of fast runtimes over the first i tasks that the scheduler runs on the fast machine. The key to analyzing Scheduler 5 is the following lemma.
Lemma 6.
Fix a length TAP. For all , and for all instantly-committing schedulers ,
| (1) |
Proof.
We prove the lemma by induction on i. For i = 0 the claim is trivial. Now, fix i > 0, assume the lemma for i − 1 and for all schedulers; we will prove the lemma for i.
If runs on a slow machine then , and . Thus, the invariant Equation 1 is maintained. We always have , so if runs on the fast machine then the invariant Equation 1 is also maintained, since then by the inductive hypothesis.
The final case to consider is when the other scheduler runs the task on a slow machine, while our scheduler runs it on the fast machine. From the definition of Scheduler 5, the fact that the task ran on the fast machine implies
| (2) |
On the other hand, ran on the fast machine. Thus,
| (3) |
Now, we use the invariant for to bound . We have:
| (4) |
Because of Equation 2 we know that must run on the fast machine. So, we have
| (5) |
Stringing together the above inequalities Equation 4, Equation 5, Equation 2, and Equation 3, we get
Thus, the invariant Equation 1 holds.
Using Lemma 6 it is easy to show that Scheduler 5 is 2-competitive.
Theorem 7.
Scheduler 5 is a 2-competitive instantly-committing scheduler.
Proof.
By Lemma 6, the fast machine finishes all of its work before twice the optimal completion time. Any task that runs on a slow machine was, by the commitment rule, able to finish in serial by that time, so these tasks finish before twice the optimal completion time as well.
4 A 1.678-Competitive Eventually-Committing Scheduler
In this section we present and analyze a 1.678-competitive eventually-committing scheduler, where the constant 1.678 is the real root of a certain polynomial. Kuszmaul and Westover gave a lower bound of 1.618 on the competitive ratio of any eventually-committing scheduler and conjectured that this lower bound is tight. Our scheduler represents substantial progress towards resolving Kuszmaul and Westover’s conjecture, improving on the competitive ratio of their previous best algorithm. Our scheduler (“eventually committing”) is defined in Scheduler 8.
Scheduler 8.
At each time:
- If a task that has arrived but not yet been started can still finish on a slow machine by the target completion time, start it on a slow machine.
- Maintain up to one active task at a time. The fast machine is always allocated to the active task.
- When there is no active task, but there are unstarted tasks present, choose as the new active task the unstarted task with the largest value (breaking ties arbitrarily).
Theorem 9.
Scheduler 8 is a 1.678-competitive eventually-committing scheduler.
Proof.
Fix TAP . Let denote the time when completes the last task run on the fast machine. If is run on a slow machine at any time , then finishes before . Thus, it suffices to show that
For any , let be the first time that an online algorithm becomes aware that the optimal schedule requires at least completion time – that is, . Let (“actual”) be the set of tasks that runs on the fast machine, and (“fake”) be the set of tasks that runs on a slow machine but runs on the fast machine. We can bound the sizes and arrival times of tasks in as follows.
Claim 10.
All tasks arrive before time , and have .
Proof.
All tasks are run on the fast machine by , and on slow machines by . In particular this means
Thus, . To show , note that .
To analyze when tasks in get run it will be useful to partition into and . Now we show that, without loss of generality, does not start any tasks in too late.
Claim 11.
If starts a task at any time , then .
Proof.
Note that no task can be started after time : since , any task present but not already running at time would be run on a slow machine. Let be the last time after when starts a task . If does not exist the claim is vacuously true. In light of our previous observation, . Let be the task that starts at time . Because the scheduler prioritizes making tasks with larger values active, at time there are no tasks present. After time , no more tasks from can arrive by Claim 10, and at most work in can arrive because must be able to complete this work. Thus,
| (6) |
Now, because we have ; using this in Equation 6 we find . This means that we can assume that, after time , the only tasks that runs on the fast machine are , , and whatever the active task was at time . We call the active task at time , if one exists, the stuck task, denoted . We split into cases depending on how large this stuck task is.
Case 1.
There is no stuck task.
In this case, we in fact have . Since there is no active task at time , there are no tasks present but not started on slow machines. By Claim 10, the scheduler will run all tasks arriving after time on slow machines. Thus, at all time steps , either has no active task on the fast machine, or has some as the active task on the fast machine, so completes at least as quickly as .
Case 2.
There is a stuck task, with .
Define and . Let be a time when all tasks in have already arrived; such a time must exist by Claim 10. Observe that runs all tasks in on the fast machine due to . This further implies that . Also , simply because must complete the work on tasks after these tasks arrive. By Claim 11 we may assume without loss of generality that, after time , is always running a task from . Thus,
Case 3.
There is a stuck task, with .
This case will be the most difficult to handle of the three. It will be useful to focus now on the tasks of that arrive after the stuck task is started. Let be the time when starts running , and let be the fake tasks arriving after . We first observe that, if no such tasks arrive, performs very well.
Claim 12.
If then .
Proof.
At time no tasks can be present, since all such tasks have , so would prioritize running them on the fast machine instead of the stuck task. By Claim 11, we know that after time will always be running tasks from . The total work on tasks from that arrives after time is at most , so if we have
By Claim 12 we may assume . So, let ; we will be able to control how much work arrives in the TAP by the fact that never decides to run the task with on a slow machine. Split into and (note that we use a different threshold to define earliness here than we did in case 2). First we need the following analogue of Claim 11.
Claim 13.
If starts a task at any time , then .
Proof.
Let be the last time in when starts a task . If does not exist the claim is vacuously true. Let be the task that starts at time . Because prioritizes making tasks with larger values active, at time there are no tasks present. After time at most work in can arrive because must be able to complete this work. Recalling that , we have
Claim 14.
.
Proof.
Fix a time after all tasks in have arrived, and fix a task with . First, note that by Claim 13 we can assume that has not started by time . Thus, is free to start on a slow machine, but chooses not to. This implies
| (7) |
We also observe that must run on the fast machine, since running any of them on the slow machine would finish after time . Thus,
| (8) |
Combining Equation 8 and Equation 7 gives the desired statement.
The other observation we make is that cannot happen too early.
Claim 15.
.
Proof.
Let be a task with . Note that by Claim 10. Then, by Claim 13 we may assume without loss of generality that at time is not running , and does not choose to start on a slow machine. So,
Now we conclude the theorem.
Claim 16.
.
Proof.
Note that . Also, note that since was not put on a slow machine immediately upon arrival, we must have . Then, applying Claims 14 and 15 we have
Remark 17.
The simple nature of the lower bound in Proposition 32, along with the fact that our scheduler gets a competitive ratio quite close to 1.618, might leave the impression that 1.618 is clearly the correct competitive ratio, and that a slightly better analysis of (a natural variant of) our scheduler would be 1.618-competitive. However, this is not the case: in Appendix A, we show that no non-procrastinating eventually-committing scheduler can achieve a competitive ratio matching the 1.618 lower bound, where a scheduler is called non-procrastinating if, whenever tasks are present, it always runs at least one task. Thus, if the competitive ratio of our scheduler can be improved upon, doing so will require a substantially different scheduler: one which occasionally decides to do nothing at all despite work being present.
5 A 1.5-Competitive Never-Committing Scheduler
In this section we analyze never-committing schedulers. First we give a simple lower bound.
Proposition 18.
No deterministic never-committing scheduler achieves a competitive ratio below 1.366.
Proof.
We may assume . Let . Let ; that is, has , and has . Let be the same TAP without the second task. We have , since just runs the single task on the fast machine, and , since can run on a slow machine and on the fast machine.
Suppose that is a -competitive scheduler. On TAP , at time , we claim must be running on the fast machine. If not, then ’s completion time must be at least , with the branch of the min depending on whether is ever moved to the fast machine – but this gives competitive ratio .
Before time , it is impossible to distinguish between and . Thus, must be running on the fast machine at time on TAP . Now, we have , with the branch of the min depending on whether is ever moved to a slow machine – but this gives competitive ratio . Thus, is not actually -competitive.
Now we give a 1.5-competitive never-committing scheduler (“never committing”). Note that this competitive ratio is smaller than the lower bound of 1.618 known for eventually-committing schedulers, so this demonstrates a separation between the strengths of schedulers in the two models.
Scheduler 19.
At each time:
- If a task can still finish on a slow machine by the target completion time but is not currently running on a slow machine, start it on a slow machine, cancelling its fast-machine implementation if necessary.
- Among the tasks that have arrived and are not running on slow machines, choose the one maximizing the remaining fast work, breaking ties by choosing the task with the smaller arrival time. Run that task on the fast machine during this time step.
Theorem 20.
Scheduler 19 is a 1.5-competitive never-committing scheduler.
Proof.
Fix a TAP. Consider the final time when Scheduler 19 has work on the fast machine. Observe that if a task ever runs on a slow machine, then it finishes by the target completion time. Thus, to show that Scheduler 19 is 1.5-competitive, it suffices to show that the fast machine finishes its work by 1.5 times the optimal completion time.
Let be the set of tasks that actually runs on the fast machine.
Claim 21.
never runs a task on the fast machine after time .
Proof.
The scheduler always allocates the fast machine to the present task with the largest value of among tasks that aren’t running on slow machines. Thus, whenever there are tasks from that aren’t running on slow machines, it will run one such task on the fast machine. The scheduler is able to complete all tasks in on the fast machine by time . Thus, every such task is either completed, or started on a slow machine, before time .
This means that the only way to have is if there are tasks with that have yet to be completed at time ; we assume that this is the case for the remainder of the proof. Let be the total amount of work performs on the fast machine after time across all tasks with . For any , let be the first time an online algorithm becomes aware that the optimal schedule requires completion time; the following key claim allows us to bound this left-over work in terms of .
Claim 22.
For all , we have .
Proof.
Let denote the set of tasks with that runs on the fast machine at some time after . First, note that all must have , or else would be placed on a slow machine upon arrival. Choose sufficiently small, such that no tasks arrive between times and . Since , we know , and so must run all tasks with on the fast machine. In order for to finish these tasks before time , must have at most fast work remaining across all such tasks.
Now, by the same argument as in Claim 21, because the scheduler prioritizes tasks with over tasks with on the fast machine whenever they are present (and not yet started on slow machines), it has at most work remaining on tasks in at time . Because no more tasks from arrive after this time, we have as well. The claim held for all , and so taking we have .
We now give an observation to control . Let be the task, among all tasks that runs on the fast machine after time , with the smallest value of . Let .
Claim 23.
For all , we have .
Proof.
First, note that or else would start on a slow machine upon arrival. Now, because doesn’t start on a slow machine at time , we have .
To prove the theorem, it will now suffice to branch into two cases, based on how large is.
Case 1.
.
Case 2.
.
First note that or else would be started on a slow machine at time . So, all left-over work at time comes either from tasks with , or from tasks with . By Claim 22, we can therefore bound the total amount of leftover work by . Now, by Claim 23, this quantity can be at most . Since , this is at most .
Remark 24.
In Appendix A we show that Scheduler 19 is optimal among never-committing schedulers that never cancel implementations running on slow machines. This shows that improving on Scheduler 19 will require a substantially different scheduler.
6 Extending Beyond the Massively Parallel Regime
In Theorem 7, we showed that Scheduler 5 is a 2-competitive instantly-committing scheduler in the massively parallel regime of the SPDP. In this section, we will show that, in fact, Scheduler 5 is a 2-competitive scheduler even outside of the massively parallel regime, although the analysis is slightly more complicated. This result is interesting in its own right, resolving an open question from [16]. However, we think that the main virtue of this proof is that it serves as a proof-of-concept that results from the conceptually simpler massively parallel regime can be adapted to apply to the general SPDP: we conjecture that all upper bounds holding in the massively parallel regime should also hold in the general SPDP.
First, recall the setup of the SPDP. The input is a sequence of triples , where is the work of the parallel implementation of task , and is the work of the serial implementation of task . At each time step, the scheduler allocates its processors to the jobs, giving at most one processor to each serial job.
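Concretely, a TAP can be represented as an ordered list of such triples. The following minimal sketch fixes one possible representation for illustration; the field names are ours, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """One SPDP task; field names are illustrative, not the paper's notation."""
    arrival: float  # time at which the task arrives
    w_par: float    # work of the parallelizable (but less efficient) implementation
    w_ser: float    # work of the serial implementation (at most one processor)

# A TAP is then just an ordered sequence of tasks (made-up numbers):
tap = [Task(0.0, 4.0, 1.0), Task(2.5, 6.0, 2.0)]
```

Note that the parallel implementation carries more total work than the serial one; its advantage is that it can occupy many processors at once.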
As stated, 5 is not a fully defined scheduler in the SPDP: we have specified which implementation of each task to run, but not how to schedule the tasks on the processors. We extend to the general SPDP as follows:
Scheduler 25.
When task arrives:
-
If parallelize .
-
Otherwise, serialize .
At every timestep, if there are serial jobs present, then schedules the jobs by allocating a processor to each of the serial jobs with the most remaining work (or an arbitrary set of jobs with maximum remaining work if there are more than serial jobs with maximum remaining work), and then allocating any remaining processors to an arbitrary parallel job (if a parallel job is present).
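The per-timestep allocation rule can be sketched as follows. This is an illustrative rendering only: the function name, dictionary representation, and tie-breaking choice are ours, not the paper's pseudocode.

```python
def allocate(serial_remaining, parallel_job_present, p):
    """One timestep of the allocation rule: give one processor to each of
    the (up to p) serial jobs with the most remaining work, breaking ties
    arbitrarily, then hand all leftover processors to one parallel job.
    `serial_remaining` maps a serial job's id to its remaining work."""
    # Serial jobs sorted by remaining work, largest first.
    order = sorted(serial_remaining, key=serial_remaining.get, reverse=True)
    alloc = {job: 1 for job in order[:p]}
    leftover = p - len(alloc)
    if leftover > 0 and parallel_job_present:
        alloc["parallel"] = leftover  # all spare processors go to a parallel job
    return alloc

# With p = 2 processors and three serial jobs, the two largest are served:
print(allocate({"a": 5.0, "b": 3.0, "c": 1.0}, True, 2))  # {'a': 1, 'b': 1}
```

When fewer than p serial jobs are present, the surplus processors all go to a single parallel job, matching the rule above.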
Now we analyze . We say is saturated at time if has no idle processors at time .
Lemma 26.
If is unsaturated right before finishing, then .
Proof.
We claim that if is unsaturated at time , then for each task present at time , has been run on every time step since it arrived. Suppose this is not the case. Then, there must have been some time step before time when there were at least serial jobs with at least as much remaining work as . But then will finish at the same time as these other jobs, contradicting the fact that is unsaturated at time . Thus, if is unsaturated at time , then for some such that ran in serial. Thus, , as desired.
By virtue of Lemma 26, it suffices to consider the case that is saturated immediately before finishing. Let be the final time in when is unsaturated (we set if is always saturated). Let be the smallest such that ; in fact we will have , since in order to transition from being unsaturated to being saturated, some tasks must arrive. For integer , let denote the sum of for each with that runs in parallel; if is an instantly-committing scheduler, then can be computed at time . Now we prove an analogue of Lemma 6.
Lemma 27.
Fix a length TAP. For all , and for all instantly-committing schedulers ,
| (9) |
Proof.
We prove Equation 9 by induction on . The base case is . If takes work here, then we have (and ) so the invariant will hold. If instead runs in serial, then we’ll have , in which case we have so the invariant holds. This establishes the base case.
Now, assume Equation 9 for ; we prove Equation 9 for . If takes at least as much work as on , i.e., , then Equation 9 for implies Equation 9 for . It remains to consider the case when runs in serial, but runs in parallel. Here, thought was too large to serialize, so:
| (10) |
One consequence of Equation 10 is that parallelizes ; a corollary of this is that Equation 9 holds for . Thus, applying Equation 9 for and using Equation 10 gives:
so the invariant holds in this case as well.
In the massively parallel regime, Theorem 7 followed immediately from Lemma 6. Slightly more work is required in the general setting, but Lemma 27 is still very useful.
Theorem 28.
is a -competitive instantly-committing scheduler in the SPDP.
Proof.
Recall from Lemma 26 that we need only consider the case that ends saturated, and recall the definition of . For any scheduler , let denote the work that has left immediately before time , and let be the work that takes on tasks with . Because ends saturated, we have
Applying Lemma 27 gives
| (11) |
So, to conclude, it suffices to show that . Let be the set of tasks that has present immediately before time . Let . Clearly . On the other hand, must take at least work on the tasks , and can have made at most progress on these tasks by time . Thus,
Therefore,
Using this in Equation 11 gives .
References
- [1] S Anand, Naveen Garg, and Nicole Megow. Meeting deadlines: How much speed suffices? In Automata, Languages and Programming: 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part I 38, pages 232–243. Springer, 2011. doi:10.1007/978-3-642-22006-7_20.
- [2] James Aspnes, Yossi Azar, Amos Fiat, Serge Plotkin, and Orli Waarts. On-line routing of virtual circuits with applications to load balancing and machine scheduling. Journal of the ACM (JACM), 44(3):486–504, 1997. doi:10.1145/258128.258201.
- [3] Baruch Awerbuch, Yossi Azar, Edward F Grove, Ming-Yang Kao, P Krishnan, and Jeffrey Scott Vitter. Load balancing in the norm. In Proceedings of IEEE 36th Annual Foundations of Computer Science, pages 383–391. IEEE, 1995. doi:10.1109/SFCS.1995.492494.
- [4] Brenda S Baker and Jerald S Schwarz. Shelf algorithms for two-dimensional packing problems. SIAM Journal on Computing, 12(3):508–525, 1983. doi:10.1137/0212033.
- [5] Ioannis Caragiannis. Better bounds for online load balancing on unrelated machines. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 972–981, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347188.
- [6] Shichuan Deng, Jian Li, and Yuval Rabani. Generalized unrelated machine scheduling problem. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2898–2916. SIAM, 2023. doi:10.1137/1.9781611977554.CH110.
- [7] Richard A. Dutton and Weizhen Mao. Online scheduling of malleable parallel jobs. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, PDCS ’07, pages 136–141, USA, 2007. ACTA Press.
- [8] R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2):416–429, 1969. doi:10.1137/0117039.
- [9] Shouwei Guo and Liying Kang. Online scheduling of malleable parallel jobs with setup times on two identical machines. European Journal of Operational Research, 206(3):555–561, November 2010. doi:10.1016/j.ejor.2010.03.005.
- [10] Anupam Gupta, Amit Kumar, Viswanath Nagarajan, and Xiangkun Shen. Stochastic load balancing on unrelated machines. Mathematics of Operations Research, 46(1):115–133, 2021. doi:10.1287/MOOR.2019.1049.
- [11] Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie. Stochastic online scheduling on unrelated machines. In Integer Programming and Combinatorial Optimization: 19th International Conference, IPCO 2017, Waterloo, ON, Canada, June 26-28, 2017, Proceedings 19, pages 228–240. Springer, 2017. doi:10.1007/978-3-319-59250-3_19.
- [12] Varun Gupta, Benjamin Moseley, Marc Uetz, and Qiaomin Xie. Greed works—online algorithms for unrelated machine stochastic scheduling. Mathematics of Operations Research, 45(2):497–516, 2020. doi:10.1287/MOOR.2019.0999.
- [13] Ellis Horowitz and Sartaj Sahni. Exact and approximate algorithms for scheduling nonidentical processors. J. ACM, 23(2):317–327, April 1976. doi:10.1145/321941.321951.
- [14] Johann L Hurink and Jacob Jan Paulus. Online algorithm for parallel job scheduling and strip packing. In Approximation and Online Algorithms: 5th International Workshop, WAOA 2007, Eilat, Israel, October 11-12, 2007. Revised Papers 5, pages 67–74. Springer, 2008. doi:10.1007/978-3-540-77918-6_6.
- [15] Sungjin Im and Shi Li. Improved approximations for unrelated machine scheduling. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2917–2946. SIAM, 2023. doi:10.1137/1.9781611977554.CH111.
- [16] William Kuszmaul and Alek Westover. Scheduling jobs with work-inefficient parallel solutions. In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 101–111, 2024. doi:10.1145/3626183.3659960.
- [17] Jan Karel Lenstra, David B Shmoys, and Éva Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical programming, 46:259–271, 1990. doi:10.1007/BF01585745.
- [18] Walter Ludwig and Prasoon Tiwari. Scheduling malleable and nonmalleable parallel tasks. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 167–176, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314491.
- [19] Marco Molinaro. Stochastic Load Balancing and Moment Problems via the L-Function Method, pages 343–354. SIAM, 2019. doi:10.1137/1.9781611975482.22.
- [20] Gregory Mounie, Christophe Rapine, and Dennis Trystram. Efficient approximation algorithms for scheduling malleable tasks. In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’99, pages 23–32, New York, NY, USA, 1999. Association for Computing Machinery. doi:10.1145/305619.305622.
- [21] Daniel R. Page, Roberto Solis-Oba, and Marten Maack. Makespan minimization on unrelated parallel machines with simple job-intersection structure and bounded job assignments. Theoretical Computer Science, 809:204–217, 2020. doi:10.1016/j.tcs.2019.12.009.
- [22] Andreas S. Schulz and Martin Skutella. Scheduling unrelated machines by randomized rounding. SIAM Journal on Discrete Mathematics, 15(4):450–469, 2002. doi:10.1137/S0895480199357078.
- [23] John Turek, Walter Ludwig, Joel L. Wolf, Lisa Fleischer, Prasoon Tiwari, Jason Glasgow, Uwe Schwiegelshohn, and Philip S. Yu. Scheduling parallelizable tasks to minimize average response time. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA ’94, pages 200–209, New York, NY, USA, 1994. Association for Computing Machinery. doi:10.1145/181014.181331.
- [24] John Turek, Uwe Schwiegelshohn, Joel L Wolf, and Philip S Yu. Scheduling parallel tasks to minimize average response time. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 112–121, 1994. URL: http://dl.acm.org/citation.cfm?id=314464.314485.
- [25] John Turek, Joel L Wolf, and Philip S Yu. Approximate algorithms scheduling parallelizable tasks. In Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 323–332, 1992. doi:10.1145/140901.141909.
- [26] Nodari Vakhania, Jose Hernandez, and Frank Werner. Scheduling unrelated machines with two types of jobs. International Journal of Production Research, 52:1–9, February 2014. doi:10.1080/00207543.2014.888789.
- [27] Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 222–227. IEEE Computer Society, 1977.
- [28] Deshi Ye, Danny Z. Chen, and Guochuan Zhang. Online scheduling of moldable parallel tasks. Journal of Scheduling, 21(6):647–654, 2018. URL: https://ideas.repec.org//a/spr/jsched/v21y2018i6d10.1007_s10951-018-0556-2.html, doi:10.1007/S10951-018-0556-2.
- [29] Deshi Ye, Xin Han, and Guochuan Zhang. A note on online strip packing. Journal of Combinatorial Optimization, 17(4):417–423, 2009. doi:10.1007/S10878-007-9125-X.
- [30] Xiaoyan Zhang, Ran Ma, Jian Sun, and Zan-Bo Zhang. Randomized selection algorithm for online stochastic unrelated machines scheduling. Journal of Combinatorial Optimization, pages 1–16, 2022.
Appendix A Barriers Against Improved Schedulers
In this section we show that the schedulers of Section 4 and Section 5 are optimal among natural restricted classes of schedulers. This highlights what changes must be made to the schedulers in order to have any hope of achieving better competitive ratios.
First we show that among non-procrastinating eventually-committing schedulers (i.e., eventually-committing schedulers with the property that whenever tasks are present, they will run at least one task), the scheduler 8 is optimal.
Proposition 29.
Fix . Let denote the real root of the polynomial . There is no deterministic -competitive non-procrastinating eventually-committing scheduler.
Proof.
It suffices to consider the case that . Fix a non-procrastinating eventually-committing scheduler . Assume towards contradiction that is -competitive. We now describe a TAP on which . The TAP starts with . Next, let . Finally, at each time , give a task .
We now argue that must run all the tasks on the fast machine. Because is a non-procrastinating -competitive scheduler, must instantly start on the fast machine (in case there are no tasks after ). Now we argue that runs on the fast machine as well. Suppose that starts on a slow machine at some time with
| (12) |
Then, would not be -competitive on the truncated TAP . Thus, must not start on a slow machine at any time satisfying Equation 12. We now show that Equation 12 holds for all , thus proving that must run on the fast machine. For we have , and , so Equation 12 holds. For we have
Thus, it suffices to show:
| (13) |
To show Equation 13, it suffices to check it for (by monotonicity of the inequality on either side of ). At , Equation 13 is:
which is true because .
We have now shown that runs all tasks in on the fast machine. Thus, (by definition of )
However, . This contradicts the assumption that is -competitive.
Now, we show that the -competitive scheduler of Section 5 is optimal among never-committing schedulers that don’t cancel tasks on slow machines.
Proposition 30.
Let be a deterministic never-committing scheduler that never cancels serial tasks. Then, for any , there is a TAP with on which is not -competitive.
Proof.
It suffices to consider the case that . The TAP is defined as follows. First, . Then, for each time , a task arrives. We will show that if starts on a slow machine at any time then is not -competitive on . We show this by considering two cases.
Case 1.
starts on a slow machine at time .
If does this, then . However, .
Thus,
So cannot start on a slow machine at this time.
Case 2.
starts on a slow machine at time .
If does this, then . However, .
Thus,
In conclusion, must run on the fast machine. But then
a contradiction.
Appendix B Lower Bounds from [16]
In this section we state, for the reader’s convenience, the lower bounds from [16] against instantly- and eventually-committing schedulers.
Proposition 31 (Kuszmaul, Westover [16]).
Fix . There is no deterministic -competitive instantly-committing scheduler.
Proof.
Consider an -task TAP where for each , the -th task has , and the arrival times are all very close to . For each , it is possible to handle the first tasks in the TAP with completion time . Thus, a -competitive scheduler cannot afford to run task on a slow machine. So, a -competitive scheduler must run all tasks on the fast machine, giving completion time at least on this TAP, while . For large enough this implies that the scheduler is not actually competitive.
Proposition 32 (Kuszmaul, Westover [16]).
Fix . There is no deterministic -competitive eventually-committing scheduler, where is the golden ratio.
Proof.
Suppose that is a -competitive eventually-committing scheduler. Let ; if there are no further tasks, must run on the fast machine, starting at some time . Let . On this TAP, , while . So is not -competitive.
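For reference, the constant in Proposition 32 is the golden ratio, whose standard characterization is as the positive root of $x^2 = x + 1$:

```latex
\varphi \;=\; \frac{1+\sqrt{5}}{2} \;\approx\; 1.618,
\qquad
\varphi^2 \;=\; \varphi + 1,
\qquad
\varphi^{-1} \;=\; \varphi - 1 .
```

It is this defining identity that balances the two branches of the adversary's argument: waiting longer before committing trades off against the cost of the truncated TAP.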
Appendix C Randomized Lower Bounds
In this section we give lower bounds against randomized schedulers. Our main tool is Yao’s minimax principle [27], which allows us to prove a lower bound on the competitive ratio by exhibiting a distribution over TAPs, and showing that any deterministic scheduler has poor expected cost on a random TAP drawn from the distribution.
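In its cost-minimization form (stated here for orientation, with notation of our own choosing), Yao's principle says that the worst-case expected cost of any randomized algorithm $R$ is at least the expected cost of the best deterministic algorithm against any fixed input distribution $\mathcal{D}$:

```latex
\sup_{x}\; \mathbb{E}\bigl[\operatorname{cost}(R, x)\bigr]
\;\;\ge\;\;
\inf_{A\ \text{deterministic}}\; \mathbb{E}_{x \sim \mathcal{D}}\bigl[\operatorname{cost}(A, x)\bigr] .
```

So, to rule out competitive randomized schedulers, it suffices to exhibit a single distribution over TAPs on which every deterministic scheduler performs poorly in expectation; this is exactly the strategy followed below.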
Proposition 33.
For any , there is no -competitive instantly-committing scheduler, even with randomization.
Proof.
Fix . For , define to be a length TAP with . Let denote the following distribution over TAPs: choose uniformly at random, and then output TAP . By brute-force enumeration of all possible deterministic instantly-committing strategies, one can show that no such strategy is -competitive in expectation over this distribution.
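To illustrate the shape of such a brute-force check, here is a toy version under a simplified cost model of our own devising: one fast machine runs parallelized tasks back-to-back at unit rate, each serialized task gets its own unit-speed slow machine, and the task parameters are made up (these are not the TAPs of the proposition). For simplicity we also score each strategy by its worst prefix rather than by an expectation over a random truncation length.

```python
from itertools import product

# A TAP is a list of (arrival, parallel_work, serial_work) triples.

def completion_time(tap, decisions):
    fast_free = 0.0  # time at which the fast machine next becomes idle
    finish = 0.0
    for (arrival, w_par, w_ser), d in zip(tap, decisions):
        if d == "P":  # parallelize: append to the fast machine's queue
            fast_free = max(fast_free, arrival) + w_par
            finish = max(finish, fast_free)
        else:  # serialize: run alone on a unit-speed slow machine
            finish = max(finish, arrival + w_ser)
    return finish

def opt(tap):
    # Offline optimum: best over all serialize/parallelize decision vectors.
    return min(completion_time(tap, d) for d in product("PS", repeat=len(tap)))

def worst_ratio(tap, decisions):
    # An instantly-committing scheduler commits on arrival, so on the
    # truncation to the first k tasks it uses the same first k decisions;
    # the adversary may stop the TAP at whichever prefix is worst.
    return max(completion_time(tap[:k], decisions[:k]) / opt(tap[:k])
               for k in range(1, len(tap) + 1))

tap = [(0.0, 1.0, 2.0), (0.1, 1.0, 4.0), (0.2, 1.0, 8.0)]  # made-up numbers
best = min(worst_ratio(tap, d) for d in product("PS", repeat=len(tap)))
print(round(best, 3))  # strictly greater than 1: no strategy is fully competitive
```

Even on this tiny hypothetical instance, every one of the eight decision vectors loses a constant factor on some prefix, which is the essential mechanism behind the enumeration argument.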
Proposition 34.
For any , there is no -competitive eventually-committing scheduler, even with randomization.
Proof.
In the proof of Proposition 18 we defined two TAPs, and showed that no deterministic eventually-committing scheduler is -competitive on both. One can show that if we choose randomly between these two TAPs, there is no deterministic eventually-committing scheduler with expected competitive ratio .
