Abstract 1 Introduction 2 Background 3 Compressing to a Constant Utilization Bound 4 Partitioned EDF 5 Global EDF and RM 6 Conclusions and Future Work References

Improved Elastic Scheduling Algorithms for Implicit-Deadline Tasks

Marion Sudvarg111Corresponding author ORCID Washington University in St. Louis, MO, USA Christopher Gill ORCID Washington University in St. Louis, MO, USA Sanjoy Baruah ORCID Washington University in St. Louis, MO, USA
Abstract

Elastic scheduling provides a framework under which the utilizations of recurrent tasks are reduced by increasing their periods in response to system overload. The original elastic scheduling model was proposed by Buttazzo et al. in 1998 for implicit-deadline tasks on a uniprocessor and decreases task utilizations to satisfy a schedulable utilization bound. In 2019, Orr and Baruah extended the framework to multiprocessor scheduling of implicit-deadline tasks. In this paper, we propose, analyze, and evaluate new elastic scheduling algorithms for several of the scheduling policies considered in these prior works. In particular, (i) we evaluate an algorithm that we proposed as a short note in the Real-Time Systems journal and demonstrate that it allows for faster admission control than the algorithm of Buttazzo et al. when applied to uniprocessor and fluid scheduling. (ii) We also present faster elastic scheduling algorithms for partitioned EDF scheduling. Finally, (iii) we provide polynomial-time exact elastic scheduling algorithms for global EDF and global RM.

Keywords and phrases:
real-time systems, elastic scheduling, scheduling algorithms
Copyright and License:
[Uncaptioned image] © Marion Sudvarg, Christopher Gill, and Sanjoy Baruah; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Computer systems organization Real-time systems
Acknowledgements:
The authors would like to thank the anonymous reviewers for their helpful comments. Thanks also to the TPC Chairs of SIES 2024 (Daniel Casini, Qingxu Deng, and Zhishan Guo) for inviting us to submit our work to this special issue.
Funding:
This research was supported in part by NSF grants CPS-2229290 and CNS-2141256 and a Washington University seed grant.
Received:
2025-04-17  
Accepted:
2025-11-26  
Published:
2025-12-19
Part Of:
TGDK, Volume 10, Issue 2 (Special Issue on Industrial Real-Time Systems)

1 Introduction

Elastic real-time scheduling models provide a framework for dynamic task adaptation to guarantee schedulability even on an overloaded system. First proposed by Buttazzo et al. [12, 13], elastic scheduling allows tasks to decrease (“compress”) their utilizations by increasing invocation periods or reducing computational workloads (e.g., with imprecise computations [19] or early termination of anytime algorithms [14]). Each task is characterized by a maximum utilization representing the service level required for its desired mode of execution given sufficient computational resources. Tasks with flexible execution requirements – the so-called “elastic” tasks – are each assigned a parameter (its “elasticity”) representing its relative adaptability. On an overloaded system, each elastic task’s utilization is reduced in proportion to its elasticity until the system becomes schedulable. Elastic tasks are also characterized by minimum utilizations representing the service level necessary to maintain correct or safe execution; each task’s utilization cannot be compressed below its minimum value.

While elastic scheduling models are therefore useful for adjusting a predefined set of tasks for execution on a resource-constrained system, the original elastic scheduling model of Buttazzo el al. was primarily intended to enable online adaptation in dynamic and open systems, e.g., in response to admission of new tasks or changes in available computational resources [12, 13]. Therefore, it is important for elastic scheduling algorithms to be efficient (i.e., provide bounded-time complexity guarantees) for online execution while preserving quality of service to the extent possible (i.e., tasks should be compressed only as much as needed to maintain schedulability).

With these two concerns in mind, this paper aims to analyze and improve upon existing approaches to elastic scheduling for sets of sequential tasks with implicit deadlines to be scheduled on both uniprocessor and multiprocessor systems. Prior algorithms for these types of task systems can be broadly classified into two categories. For scheduling algorithms with a utilization bound, a quadratic-time algorithm proposed by Buttazzo et al. [12, 13] for uniprocessor elastic scheduling finds an exact solution; this same algorithm was applied to multiprocessor fluid scheduling by Orr and Baruah [22, 21]. For multiprocessor scheduling algorithms where analysis does not simply check total utilization (e.g., partitioned EDF, global EDF, and global RM), Orr and Baruah proposed to iteratively increase the “amount” of utilization compression applied to the task system. At each level of compression, if the system is determined to be schedulable, the algorithm terminates; otherwise, compression is increased. Such algorithms are tunable in their precision: a smaller increase in compression at each iteration allows a more precise result, but increases the algorithm’s running time.

1.1 Contributions

Three key insights can be leveraged to improve these algorithms. First, an exact solution under a utilization bound can be obtained in quasilinear time, or in linear time for admission control or changes in utilization bound. In a short note published in the Real-Time Systems Journal [26], we presented, but did not evaluate, an algorithm to do so. In Section 3, we measure its performance in comparison to the original quadratic-time algorithm of Buttazzo et al. and discuss its advantages. In Section 4, we extend this algorithm to partitioned EDF scheduling, for which some partitioning heuristics afford a schedulable utilization bound. We demonstrate that, although pessimistic, compressing to the bound provides substantial speedups over the iterative elastic scheduling algorithm from Orr and Baruah in [22, 21]. We argue that this tradeoff may be beneficial in online scenarios that require rapid mode switches, e.g., in mixed-criticality systems [10].

Second, for the inexact iterative search algorithms, an amount of compression can be found by binary, rather than linear search. Compression is lower-bounded by 0 and upper-bounded by the amount that takes all tasks to their minimum utilizations. By binary searching in this range, Section 4 demonstrates that we can find a result more quickly than linear search for partitioned EDF scheduling.

The utilization bounds for global EDF [15] and global RM [7] scheduling are not static, but instead include among their terms the maximum over the task utilizations, so the bounds change as utilizations are compressed. The algorithm of Buttazzo et al. therefore cannot be applied directly; instead, in [21], Orr and Baruah apply their iterative search technique. However, we notice that by replacing the maximum-utilization task with a proxy task using transformed parameters, our improved utilization bound algorithm can be applied. A challenge arises since we don’t know, a priori, which task will require the minimum utilization after compression. Nonetheless, we can apply the compression algorithm once for each task chosen as the proxy, taking the best consistent result. Sections 5.1 and 5.2 demonstrate this for global EDF and global RM scheduling.

1.2 Extensions to the Prior Work

This work is an extension of a paper that we presented at the IEEE International Symposium on Industrial Embedded Systems (SIES) conference in 2024 [27]. In addition to the results in that paper, it includes:

  • A more complete background section, with reproductions of the original elastic scheduling algorithm from [11] and our improved algorithm originally presented in [26].

  • More detailed execution time statistics for the comparison of uniprocessor elastic scheduling algorithms; in particular, mean times are plotted.

  • Plots, which were omitted from [27] due to page limits, relating the execution times of those algorithms to total maximum and minimum utilization of the task set.

  • Additional discussion of the implications of faster online adaptation; in particular, we compare the amount of utilization compression necessary for schedulability when task execution times are inflated to accommodate algorithm overheads.

  • Expanded plots for partitioned EDF scheduling, showing comparisons of schedulability, execution time, and amount of compression achieved for each parameter combination used in randomly generating the evaluated task sets. In [27], parameters were aggregated in summary plots due to the length restriction.

  • A more complete description of our elastic compression algorithm for global EDF scheduling.

  • An extension of that algorithm to global RM scheduling.

  • Experimental evaluations of our global EDF and global RM elastic scheduling algorithms; evaluation of even the global EDF algorithm was absent from [27].

1.3 Organization

The remainder of this paper is organized as follows. In Section 2, we provide background on implicit-deadline elastic scheduling. In Section 3, we evaluate our improved (quasilinear/linear-time) algorithm from [26] compared to the original quadratic algorithm [12, 13] in the context of uniprocessor and multiprocessor fluid scheduling. In Section 4, we propose and evaluate faster approaches to elastic scheduling for partitioned EDF. In Section 5, we propose and evaluate exact algorithms for global EDF and global RM. Section 6 concludes the paper and discusses future work.

2 Background

2.1 Elastic Scheduling with Utilization Bounds

The elastic model for implicit-deadline tasks [12, 13] characterizes each task τi=(Ci,Uimin,Uimax,Ui,Ei) by five non-negative parameters:

  • Ci: The task’s worst-case execution time.

  • Uimax: The task’s maximum utilization, i.e., its nominal value when executing at the desired service level in an uncompressed state.

  • Uimin: Its minimum utilization, i.e., a bound on the amount its service can degrade.

  • Ui: The task’s assigned utilization, constrained to UiminUiUimax (the initial value of this parameter needs to be assigned prior to run-time).

  • Ei: An elastic constant, representing “the flexibility of the task to vary its utilization” [12].

Under the original model proposed by Buttazzo et al. [12, 13], the elastic model was applied to uniprocessor scheduling algorithms with utilization bounds, e.g., earliest deadline first (EDF) scheduling with its bound of 1, or rate monotonic (RM) scheduling under Liu and Layland’s bound [18]. It was since extended by Orr and Baruah [22] to multiprocessor fluid scheduling [1] where the utilization bound is equal to the number of processors m.

Under these models, a task system Γ={τ1,,τn} has a total uncompressed utilization Usummax expressed as

Usummax=i=1nUimax (1)

and a desired utilization UD representing the utilization bound allowed by the scheduling algorithm in use. In the event of system overload, i.e., if Usummax>UD, the elastic model assigns a new utilization Ui to each task τi according to these three conditions:

  1. 1.

    iUi=UD, i.e., total utilization is set to the schedulable bound.

  2. 2.

    Any task for which Ei=0 is considered inelastic; this is equivalent to the case that Uimin=Uimax.

  3. 3.

    For all other tasks τi and τj, if Ui>Uimin and Uj>Ujmin, then Ui and Uj must satisfy the relationship222For tasks τi having Ei=0, Ui=Uimin, and therefore the relationship needs not be satisfied.

    (UimaxUiEi)=(UjmaxUjEj) (2)

A task system Γ for which such Ui exist for all tasks is said to be feasible.

Intuitively, this model represents each task as a spring, with a length corresponding to the utilization Ui and an elasticity corresponding to Ei. The total length of the springs, placed end-to-end, represents Usum. If this exceeds UD, the schedulable bound, a force is applied to the system that compresses each task’s utilization Ui proportionally to its elasticity, subject to the constraint333This statement holds true for inelastic tasks, as Ei=0 implies Uimin=Uimax, and therefore the utilization is not reduced. that the utilization remains no less than the specified minimum Uimin. This physical analogy is illustrated in Figure 1.

Figure 1: The physical spring analogy of the elastic model of Buttazzo et al., reproduced from [11, Figure 9.27] with modifications to the labels. (a) shows the uncompressed task set, while (b) illustrates the application of elastic compression to achieve schedulability.

Compression is often realized by adjusting each task’s period Ti according to its new utilization, i.e., Ti=Ci/Ui. Some work has also explored tasks for which computational workloads can be decreased (Ci=TiUi), e.g., by reducing the quantity of input data to process or by forcing an iterative anytime algorithm to terminate early [23, 25, 28].

2.1.1 An Overview of the Algorithm of Buttazzo et al.

Let Γ denote a feasible task system with Ei>0 for all tasks444Tasks τi with Ei=0 must have UiUimax; we assume this is done in a pre-processing step, with UD updated to reflect the remaining available utilization. τiΓ, and consider the Ui values that satisfy the above conditions. The tasks in Γ may be partitioned into two classes – Γvariable (those tasks for which Ui>Uimin, and which can therefore have their utilizations compressed further if necessary) and Γfixed (those for which Ui=Uimin; i.e., their utilizations are now fixed).

It has been shown [12, Eqn. 8] that for each τiΓvariable, the utilization Ui takes the value

Ui=Uimax(Usum(UDΔ)Esum)×Ei (3)

where

Usum=τiΓvariableUimax (4)

and

Esum=τiΓvariableEi (5)

respectively denote the sum of the Uimax parameters and the Ei parameters of all the tasks in Γvariable, and

Δ=τiΓfixedUimin (6)

denotes the sum of the Uimin parameters555Observe that Δ equals the amount of utilization that is allocated to the tasks in Γfixed; therefore (UDΔ) represents the amount available for the tasks in Γvariable, and (Usum(UDΔ)) the amount by which the cumulative utilizations of these tasks must be reduced from their desired maximums. As shown in the RHS of Equation 3, under elastic scheduling this reduction is shared among the tasks in proportion to their elasticity parameters: τi’s share is (Ei/Esum). of all the tasks in Γfixed.

Given a set of elastic tasks Γ, the algorithm of [12, Figure 3] starts out computing Ui values for the tasks assuming that they are all in Γvariable – i.e., their Ui values are computed according to Equation 3. If any Ui so computed is observed to be smaller than the corresponding Uimin then ① that task is moved from Γvariable to Γfixed; ② the values of Usum, Esum, and Δ are updated to reflect this transfer; and ③ Ui values are recomputed for all the tasks.

The process terminates if no computed Ui value is observed to be smaller than the corresponding Uimin. It is easily seen that one such iteration (i.e., computing Ui values for all the tasks) takes 𝒪(n) time. Since an iteration is followed by another only if some task is moved from Γvariable to Γfixed and there are n tasks, the number of iterations is bounded from above by n. The overall running time for the algorithm is therefore 𝒪(n2).

In his hard real-time computing systems textbook, Buttazzo presents a corrected and improved version of the algorithm that avoids explicitly maintaining separate lists to track Γvariable and Γfixed [11, Figure 9.29]. Nonetheless, the overall running time for the algorithm remains quadratic in the number of tasks. For reference, we present a modified version of this procedure in Algorithm 1. Notation has been modified to match our own.

Algorithm 1 Elastic_Compression(Γ,UD) (adopted from [11, Figure 9.29]).

The same algorithm was also repurposed in [12] for admission control – i.e., for determining whether a new task seeking to join an already-executing system could be admitted without compromising feasibility, and if so, recomputing the utilization values for all tasks.

2.1.2 Our Improved Algorithm

In a short note in the Real-Time Systems journal [26], we presented an algorithm that provides better guarantees on running time in terms of computational complexity. We defined an attribute ϕi for each elastic task τi:

ϕi=def(UimaxUiminEi) (7)

We proved in [26, Theorem 1] that in the algorithm of Buttazzo et al. in [12, Figure 3], tasks may be “moved” from Γvariable to Γfixed in order of their ϕi parameters. Assuming that the tasks are indexed such that ϕiϕi+1 for all i,1i<n, one can simply make a single pass through all the tasks from τ1 to τn, identifying, and computing Ui values for, all those in Γfixed before any in Γvariable. This can all be done in 𝒪(n) time with the procedure in [26, Algorithm 1]; we reproduce it here as Algorithm 2. The cost of sorting the tasks in order to arrange them according to non-increasing ϕi parameters is 𝒪(nlogn), and hence dominates the overall run-time complexity. Determining feasibility and computing the Ui parameters can therefore be done in 𝒪(nlogn)+𝒪(n)=𝒪(nlogn) time.

Algorithm 2 Elastic_Implicit_Uniprocessor(Γ,UD) (adopted from [26, Algorithm 1]).

Admission control – determining whether it is safe to add a new task and recomputing all the Ui parameters if so – requires that the new task be inserted at the appropriate location in the already sorted list of preëxisting tasks. This can be achieved in 𝒪(logn) time by implementing the list as a sorted iteratable data structure. Once this is done, the Ui values can be recomputed in 𝒪(n) time by the same algorithm. Similarly, removing a task from the system and recomputing the Ui values also takes 𝒪(n) time. Furthermore, if UD changes – e.g., in response to changes in available utilization due to dynamic resource reallocation – the sorted list of tasks and their parameters do not change, and so the Ui values can be updated in linear time.

Though we proved better asymptotic time complexity in [26], we did not evaluate the algorithm’s performance for realistic task sets. In Section 3, we perform this evaluation and extend the algorithm to fluid scheduling.

2.2 Scheduling Without a Utilization Bound

In addition to fluid scheduling, in [22], Orr and Baruah also developed elastic models for multiprocessor partitioned EDF and global EDF scheduling. They later applied elastic scheduling to partitioned RM in a journal extension [21]. All of these scheduling policies have feasibility tests that are more involved than simply checking total utilization against a bound that is constant in the number of tasks. To deal with this, they observed that the degree by which compression is applied to a task system can be quantified by the relationship in Equation 2. In doing so, they introduce a term λ that is representative of this relationship, and express the utilization Ui of each task τi as:

Ui(λ)=defmax(UimaxλEi,Uimin) (8)

The value of λ beyond which the utilization Ui of task τi takes its minimum value Uimin can therefore be derived as:

Uimin=UimaxλEiλ=(UimaxUiminEi)

which is equal to the value ϕi in Equation 7. As such, we may hereafter refer to ϕi interchangeably as λimax. For a complete set of tasks Γ we also denote the maximum compression that may be applied to the task system as:

λmax=defmaxτi(λimax) (9)

The problem of elastic scheduling under the model of Buttazzo et al. [12, 13] can therefore be reduced to the problem of finding the minimum value of λ for which a set of tasks are schedulable. For partitioned EDF and RM, global EDF and RM, and algorithm PriD, Orr and Baruah propose an approximate search technique that iterates over values of λ in the interval [0,λmax] with some “granularity” ϵ. For each value of λ, they assess schedulability, terminating the search once the compressed task system is deemed schedulable [22, 21]. In this paper, we explore and propose alternatives to this idea for partitioned and global EDF and global RM scheduling.

2.2.1 Partitioned EDF

Under partitioned earliest deadline first (EDF) scheduling, each task is assigned to a single processor core, though each core may be assigned multiple tasks. On an individual core, jobs are prioritized according to their absolute deadlines – in other words, each core schedules its tasks in an EDF manner independently of the other cores. The problem of deciding whether a set of tasks is schedulable on m cores under partitioned EDF can be stated as follows:

Given a set Γ of n tasks τi, each having utilization Ui, is there a partition of tasks into m sets such that the sum of utilizations in any set does not exceed 1?

This is equivalent to the bin-packing problem, and is therefore NP-hard in the strong sense. Nonetheless, there exist heuristic algorithms to solve bin-packing problems, and Lopez et al. have compared several in the context of partitioned EDF scheduling [20]. For each value of λ tested, Orr and Baruah employ the first fit, worst fit, and best fit heuristics, with tasks τi considered in order of decreasing utilization Ui(λ). If any one heuristic deems feasibility, the algorithm terminates.

For n tasks on m cores, sorting tasks and partitioning them with each heuristic takes at most Θ(nlogn+nm) time. As this must be performed for each tested value of λ – of which there are up to (λmaxϵ+1) – the overall complexity is Θ(λmaxϵ(nlogn+nm)).

2.2.2 Global EDF

Under global EDF scheduling, if at any instant there are more active jobs than processors, those jobs with the earliest absolute deadlines are selected for execution. Goossens et al. showed [15, Theorem 5] that a set Γ of preemptive implicit-deadline tasks is schedulable on m identical processors with global EDF if:

τiΓUim(m1)maxτiΓ{Ui} (10)

Because the utilization bound includes a term representing the maximum among task utilizations, and because that maximum may change (indeed, the task with the maximum utilization may change) as utilizations are compressed, the original algorithm of Buttazzo et al. cannot be applied directly. Orr and Baruah instead perform a linear search over the space of possible values of λ, terminating when Equation 10 holds true [22].

In Section 5.1, we propose and evaluate a polynomial-time algorithm that finds an exact solution, if one exists.

2.2.3 Global RM

Under global rate monotonic (RM) scheduling, if at any instant there are more active jobs than processors, those jobs of tasks with the shortest periods are selected for execution. Bertogna et al. showed [7, Theorem 5] that a set Γ of preemptive implicit-deadline tasks is schedulable on m identical processors with global RM if:

τiΓUim2(1maxτiΓ{Ui})+maxτiΓ{Ui} (11)

As with global EDF, the utilization bound includes terms with the maximum task utilization, which may change as utilizations are compressed. Orr and Baruah again perform a linear search with precision ϵ for the smallest λ that guarantees feasibility [21]. In Section 5.2, we present a polynomial-time exact algorithm for comparison.

3 Compressing to a Constant Utilization Bound

This section considers elastic scheduling applied to scheduling policies with utilization bound tests; in particular, we consider earliest deadline first (EDF) and rate monotonic (RM) scheduling on a uniprocessor and fluid scheduling on a multiprocessor.

3.1 Complexity of the Algorithm of Buttazzo et al.

As noted in Section 2.1, the original elastic scheduling algorithm [12, 13] has worst-case execution time complexity that is quadratic in the number of tasks. Buttazzo et al. note in [12] that this is due to the enforcement of constraints on minimum utilization. If tasks are not thus constrained, the algorithm can run in linear time. Intuitively, we may consider that some tasks, representing non-critical best-effort computation, need not be characterized with minimum utilizations. However, we note that without these constraints, the algorithm can assign negative utilizations.

Example 1.

Consider a set Γ of implicit-deadline elastic tasks to be scheduled by EDF on a uniprocessor and parameterized as follows.

Task 𝝉i 𝑼imax 𝑬i
τ1 0.9 1
τ2 0.9 1
τ3 0.2 8

The total uncompressed utilization Usummax is 2.0, but the desired utilization is UD=1.0. Then, in the absence of a constraint Uimin, the utilization Ui of each task τi will be assigned according to Equation 3:

Ui=Uimax(2.01.0Esum)×Ei=Uimax(110)×Ei

Computing for each task, we obtain U1=U2=0.8 and U3=0.6. While this set of assignments does achieve a total utilization of 1.0, these assignments are not valid: a negative utilization does not have semantic meaning.

Thus, the elastic problem with minimum utilization constraints Uimin is the only meaningful expression of the problem in the context of task scheduling, even if the constraints are set to 0 just for the purpose of enforcing non-negative utilization assignments. Therefore, the algorithm of Buttazo et al. cannot be guaranteed to have better than quadratic time complexity in the number of tasks. On the other hand, our procedure in Algorithm 2 is quasilinear in the number of tasks, and linear for admission control or changes to the utilization bound. The remainder of this section compares the two algorithms empirically using synthetic task sets with randomly-generated parameters.

3.2 Performance Evaluation on a Uniprocessor

We now compare experimentally the performance of our improved algorithm for elastic scheduling of implicit-deadline tasks on a uniprocessor outlined in Algorithm 2 to that of Buttazzo et al. [11, Figure 9.29] listed in Algorithm 1.

3.2.1 Implementation

Evaluations are performed on a Raspberry Pi 3 Model B+, which has a Broadcom BCM2837B0 System on Chip (SoC) with a 4-core ARMv8 Cortex-A53 running666The processor supports a CPU clock speed of 1.4 GHz. However, as noted in [9], this frequency cannot be sustained continuously, and may lead to throttling and instability. To maintain predictability, we boot the Raspberry Pi with a constant 700 MHz CPU clock speed, set the GPU to 250 MHz, and disable throttling. Details can be found at https://www.raspberrypi.com/documentation/computers/config_txt.html at 700 MHz and 1GB of RAM. We used version 6.1.21 of the Linux kernel, compiled for the ARMv7l 32-bit ISA. We implement both algorithms in C++ and quantify execution time performance by measuring elapsed processor cycles. We read directly from the cycle counter using a custom driver and kernel module that enables access to the ARM performance monitoring unit (PMU) from userspace. Algorithms are compiled statically using version 10.2.1 of the Gnu Compiler Collection (GCC) at optimization level O0; this allows us to avoid undesirable instruction reordering, especially around reads to the cycle counter. To avoid interference from other processes, we disable real-time throttling by writing 1 to the file /proc/sys/kernel/sched_rt_runtime_us, isolate CPU core 3 from the scheduler, and run our algorithms on that core at the highest real-time priority under Linux’s SCHED_FIFO scheduling class.

Each task τi is represented as a data structure (struct) containing single-precision floating-point representations of Uimax, Uimin, Ui, and Ei. The derived parameter ϕi from Equation 7 is also an attribute of the structure. For the following experiments, besides those of Section 3.2.5, we are only concerned with the assignment of Ui values; therefore we do not represent the WCET Ci or period Ti of any task.

For the algorithm of Buttazzo et al. (Algorithm 1), since we are considering only utilization assignments, and not period updates, we do not implement Line 18. Line 3 is also not implemented, as we assign UiUimax when task parameters are generated. Line 20 is implemented as UiUimin instead. Furthermore, all period-related checks (Lines 7, 16, and 19) are converted to the equivalent comparison of Ui to Uimin or Uimax.

For our improved algorithm (Algorithm 2), we compare three implementations:

  1. 1.

    The set of tasks Γ is implemented as an array (std::vector), which is sorted prior to executing the algorithm. Inserting or removing tasks takes linear time to move array elements (and, in the case of insertion, to find the location to insert to maintain sorted order).

  2. 2.

    The set of tasks Γ is implemented as a balanced binary tree (std::set), sorted by ϕi. Constructing the set takes quasilinear time, but subsequent insertion and removal requires only logarithmic time, while enabling sequential iteration over tasks in sorted order.

  3. 3.

    The set of tasks Γ is implemented as a linked list (std::list), sorted by ϕi. Removing a task takes constant time, but adding a task takes linear time to find the location to insert in sorted order.

3.2.2 Generating Task Sets

We generate sets Γ of tasks τi using the following methodology:

  1. 1.

    We consider sets of n tasks, generating 10 000 sets for each value of n in 2–50.

  2. 2.

    Each set of tasks has a total maximum utilization Usummax selected at random uniformly from (1.0,2.0] and a total minimum utilization Usummin selected at random uniformly from (0.0,1.0].

  3. 3.

    We apply the Dirichlet Rescale (DRS) algorithm [16] to distribute the total maximum utilization Usummax in an unbiased random fashion across the Uimax values for each individual task. We note that, in this context, the result should be equivalent to an application of the earlier UUniFast and UUniSort techniques [8].

  4. 4.

    We then apply the DRS algorithm to distribute the total minimum utilization Usummin across the individual Uimin values. DRS allows us to select these values uniformly from the space of selections satisfying the conditions that (i) the total iUimin equals the specified Usummin and (ii) each value Uimin does not exceed the corresponding Uimax.

  5. 5.

    Each task τi is assigned an elasticity Ei at random, selected uniformly from the range (0,1].

3.2.3 Execution Time of Utilization Compression for Schedulability

We execute our implementations of Algorithms 1 and 2 to compress each set of tasks to a total utilization of 1.0 (to be EDF-schedulable on a single processor). We measure execution time by reading directly from the cycle counter, reporting elapsed CPU cycles. We separately measure the initialization (“Init”) and compression (“Compress”) times for each algorithm. For the original procedure in Algorithm 1, initialization involves computing the Usummin value and checking whether it exceeds UD, while compression is the do …while loop in the algorithm. For our improved algorithm in Algorithm 2, compression is the forall loop that iterates over tasks in order of their ϕi parameters. The dominant contribution to the algorithm’s execution time complexity is the sorting of tasks by their ϕi values; we therefore include in initialization time both the computation of Usum and Esum as well as the total time to calculate each task’s ϕi value and establish the sorted order. For the array and list, the sort is performed over the complete data structure; for the binary tree, we insert tasks individually as their ϕi values are calculated.

The mean, median, and maximum times for the 10 000 sets of tasks generated for each size n from 2–50 are reported in Figure 2. As expected, the time to initialize the algorithm of Buttazzo et al. is much faster than our improved algorithm, which has to sort tasks by their ϕi values. Of the three implementations of our algorithm, the linked list was the slowest to initialize, while the array was the fastest; we assume that this was due to the data locality and simplicity of managing the data structure.

(a) Mean Init Times.
(b) Mean Compress Times.
(c) Mean Total Times.
(d) Median Init Times.
(e) Median Compress Times.
(f) Median Total Times.
(g) Max Init Times.
(h) Max Compress Times.
(i) Max Total Times.
Figure 2: Performance scaling with number of tasks.

Also, as expected, the compression time for the algorithm of Buttazzo et al. was much longer than for our algorithm. On average, the array tended to be the fastest, followed by the linked list, followed by the binary tree.777We do not observe a significant difference in the trends of worst-case compression times versus number of tasks among the three implementations. This makes sense; while all three data structures enable linear time traversal, the array is the simplest to iterate (in fixed-size strides) and has the best data locality; the linked list is still simple, but requires following pointers between nodes, and does not have as good locality; and the binary tree requires even more complex pointer chasing.

Most interesting, we observe that our algorithm does not strictly dominate the original algorithm of Buttazzo et al. in total running time. In fact, in the average case, the original algorithm performs better because of the low initialization overhead. In the worst case, both the original algorithm and the array-based implementation of our algorithm dominate the other two implementations, but neither clearly dominates the other. This is partially explained by the fact that the compression times for the algorithm of Buttazzo et al. grow roughly linearly with the number of tasks, rather than quadratically, for sets of up to 50 tasks. Under non-pathological cases, the outer loop (Line 6 of Algorithm 2) of the original algorithm only runs a handful of times.

Nonetheless, we argue that our algorithm is better in practice. While there is not a clear advantage to using our algorithm to perform compression over a complete set of tasks, there is no clear disadvantage either. Furthermore, our algorithm performs better in situations where initialization has already happened, e.g. for online adjustment in response to changes in available utilization. In this case, when only compression needs to occur, associated overheads are summarized in Table 1. The worst execution times that we observed for the array-based implementation of our algorithm were 3.45× faster than those of the algorithm of Buttazzo et al. when just compressing tasks. Moreover, as we will demonstrate, there is a clear advantage to using our algorithm during online admission of a new task.

Table 1: Greatest mean, median, and maximum compression times (cycles) observed for up to 50 tasks. Values in parentheses indicate speedup compared to the algorithm of Buttazzo et al.
Buttazzo Binary Tree Linked List Array
mean 51 380 11 157 (4.61×) 9 617 (5.34×) 8 616 (5.96×)
median 54 315 11 175 (4.86×) 9 629 (5.64×) 8 626 (6.30×)
maximum 97 342 36 231 (2.69×) 31 984 (3.04×) 28 202 (3.45×)

3.2.4 Execution Time of Task Admission

We modify our implementations of Algorithms 1 and 2 to measure admission of a single task. For the sets of n tasks of size 2–50 that we already generated, we apply each algorithm to the first n1 tasks. We then measure the time to compress them after adding the final task from the set. For the algorithm of Buttazzo et al., this requires rerunning the complete do …while loop. For our algorithm, this requires ① computing the value ϕi of the new task τi according to Equation 7, then ② adding its utilization and elasticity to Usum (Equation 4) and Esum (Equation 5) or Δ (Equation 6), as appropriate. The task is then ③ inserted into the sorted container, before finally ④ executing the forall loop in Algorithm 2.

(a) Mean Admission Times.
(b) Median Admission Times.
(c) Max Admission Times.
Figure 3: Execution time for admitting the nth task.

Results are illustrated in Figure 3. We observe that, when admitting a new task, all implementations of our improved algorithm dominate the original algorithm of Buttazzo et al. for more than 3 tasks in the average case, and more than 10 tasks in the worst case. The array (which enables logarithmic time search for the location to insert the new task, then requires linear time to perform the insertion) performs the best on average, followed by the balanced binary tree (which allows logarithmic-time insertion, but requires pointer chasing), then the linked list (which allows constant-time insertion after linear time search for the insert location). Overheads and speedups are summarized in Table 2. The array-based implementation of our algorithm admits tasks 2.53× faster than original algorithm in the worst case.

Table 2: Greatest mean, median, and maximum task admission times (cycles) observed for up to 50 tasks. Values in parentheses indicate speedup compared to the algorithm of Buttazzo et al.
Buttazzo Binary Tree Linked List Array
mean 28 165 13 246 (2.13×) 16 704 (1.69×) 11 043 (2.55×)
median 27 931 13 274 (2.10×) 16 698 (1.67×) 11 053 (2.53×)
maximum 79 967 39 213 (2.04×) 41 044 (1.95×) 31 566 (2.53×)

3.2.5 Implications for Online Adaptation

During online adjustment of task utilizations, e.g., in response to admission of a new task, the overhead associated with computing new task utilizations must be accounted to avoid deadline misses. Full treatment of the several alternative methods for handling such transient overheads is outside the scope of this work; here we consider the usual approach in which scheduling and other operating system kernel overheads are added to the execution times of each task in the system. Under elastic scheduling, if the worst-case execution time of the selected compression algorithm is added to each task’s execution time, then clearly more utilization compression is necessary to maintain schedulability in overload scenarios when a slower-running algorithm is used.

To highlight the implications of this, we compare the amount of utilization compression necessary to admit the last task of our task sets when adding the maximum overheads observed in Figure 3(c) for the algorithm of Buttazzo et al. and the array-based implementation of our algorithm. We quantify this using the value λ defined as in Equation 8. We restrict attention to sets of size 3–50 since the algorithm of Buttazzo et al. is slightly faster in the worst case when applied to a set of only 2 tasks.

Until now, we have considered only utilization compression in a general sense; thus, our randomly generated synthetic tasks from Section 3.2.2 are only characterized by acceptable utilization ranges and elasticities. To consider the implications of algorithm overheads, we must also assign periods and execution times. To demonstrate the practical applicability of our approach in real-world scenarios, we draw period values from real-world automotive benchmarks. We assign minimum periods Timin at random to each task from the distribution listed in [17, Table III].888We ignore the “angle-synchronous” entry and renormalize probabilities over the remaining distribution. Execution times are then derived as Ci=UimaxTimin and maximum periods are computed as Timax=CiUimin. Algorithm overheads, though presented in cycles, were measured using a constant 700 MHz CPU clock speed; we convert them to times accordingly.

Figure 4: Difference between λBut and λArr, normalized by λmax.

Figures 4 (Top) and (Middle) show the distribution of differences (λButλArr)λmax between the value λBut necessary to account for the overhead of the algorithm of Buttazzo et al. and λArr of our array-based algorithm, normalized by the value λmax taken from the original task set without overheads. Figure 4 (Bottom) shows the mean, median, and maximum of the normalized values for each number n of tasks. Task sets not requiring compression (λ=0) are excluded, as are those deemed infeasible under any amount of compression.

As expected, greater online overhead incurs additional compression when it is accounted for. We observe that for n=50 tasks, the algorithm of Buttazzo et al. must, on average, compress tasks by nearly 11% more of the allowable range than ours. In the worst case, it compresses tasks by nearly 40× more than the original λmax value. These results highlight that the speedups achieved by our algorithm have practical relevance as they reduce the amount of compression required during online admission of new tasks for workloads with parameters drawn from real-world applications.

3.3 Extension to Fluid Scheduling

A set of tasks is fluid schedulable on m identical processor cores if and only if (i) its total utilization does not exceed m, and (ii) no individual task’s utilization exceeds 1 [1]. Orr and Baruah therefore argued that, so long as each elastic task’s maximum utilization Uimax1, the algorithm of Buttazzo et al. algorithm can be extended to fluid scheduling simply by setting the desired utilization UD=m. The results and conclusions drawn in this section are therefore applicable to fluid scheduling as well: our procedure [26] in Algorithm 2 may be used in place of the original procedure [11] in Algorithm 1 to achieve faster compression (once initialized) and admission of new tasks.

Sets of sequential tasks requiring multiple processors to execute – i.e., those which benefit from fluid scheduling – have total maximum utilizations Usummax>1, and may also have total minimum utilizations Usummin>1. Rather than evaluating new set of tasks, we consider whether our existing results extend to fluid scheduling by asking the question, “Do the values selected for Usummax or Usummin affect execution time?”

For the 10 000 sets of 50 tasks generated in Section 3.2.2, we produce 3 scatter plots for the initialization (“Init”) and compression (“Compress”) times of each implementation: these plot cycles against the values Usummin, Usummax, and the absolute distance between them, (UsummaxUsummin), as shown in Figure 6. We do not observe a significant dependence of execution time on these values. Therefore, the performance results illustrated in Figures 2 and 3 should also extend to fluid scheduling.

Refer to caption
(a) Buttazzo Init.
Refer to caption
(b) Buttazzo Init.
Refer to caption
(c) Buttazzo Init.
Refer to caption
(d) Buttazzo Compress.
Refer to caption
(e) Buttazzo Compress.
Refer to caption
(f) Buttazzo Compress.
Refer to caption
(g) Improved Init: Array.
Refer to caption
(h) Improved Init: Array.
Refer to caption
(i) Improved Init: Array.
Refer to caption
(j) Improved Compress: Array.
Refer to caption
(k) Improved Compress: Array.
Refer to caption
(l) Improved Compress: Array.
Refer to caption
(a) Improved Init: Tree.
Refer to caption
(b) Improved Init: Tree.
Refer to caption
(c) Improved Init: Tree.
Refer to caption
(d) Improved Compress: Tree.
Refer to caption
(e) Improved Compress: Tree.
Refer to caption
(f) Improved Compress: Tree.
Refer to caption
(g) Improved Init: List.
Refer to caption
(h) Improved Init: List.
Refer to caption
(i) Improved Init: List.
Refer to caption
(j) Improved Compress: List.
Refer to caption
(k) Improved Compress: List.
Refer to caption
(l) Improved Compress: List.
Figure 6: Execution times by utilization metrics for 50 tasks.

4 Partitioned EDF

In this section, we propose two alternative approaches to elastic scheduling of partitioned EDF tasks. First, we consider a binary, rather than linear, search over the space of compression allowed due to the minimum utilization constraint on each task. Second, using the insight that under partitioned EDF scheduling a set of tasks is guaranteed to be schedulable if its utilization does not exceed a function of the number of cores, we apply our procedure [26] in Algorithm 2 for compressing to this utilization bound.

4.1 Binary Search

We observe that a straightforward optimization may be applied to the approach of Orr and Baruah [22] summarized in Section 2.2. Rather than iterating over all values of λ[0,λmax] with granularity ϵ in sequential order, we can instead perform a binary search in time Θ(logλmaxϵ), as outlined in Algorithm 3. Thus, total time complexity is reduced to

𝚯((𝒏𝐥𝐨𝐠𝒏+𝒏𝒎)𝐥𝐨𝐠(𝝀𝐦𝐚𝐱ϵ)) (12)
Algorithm 3 Elastic_Partitioned_EDF(Γ,m).

Algorithm 3 uses the notation Γ(λ) from Baruah [3], denoting the task system obtained from Γ by applying compression λ, i.e., with each task τi having a utilization Ui(λ) according to Equation 8. The algorithm first checks if Γ(0) – the uncompressed task set – is schedulable by partitioned EDF on m cores; schedulability may be determined according to the heuristics employed by Orr and Baruah [22]. If so, it returns the value λ=0. It then checks if Γ(λmax) is schedulable; if not, the algorithm fails. Otherwise, it performs binary search over values of λ in the range [0,λmax]: λhi (initialized to λmax) tracks the smallest value of λ tested for which Γ(λ) is schedulable, while λlo (initialized to 0) tracks the largest tested value for which Γ(λ) is not schedulable. At each step, the algorithm checks schedulability of Γ(λ); if feasibility is determined, λhi is decreased to the tested value of λ; otherwise, λlo is increased to the tested value of λ. The algorithm terminates when the difference between λhi and λlo does not exceed ϵ.

4.1.1 Optimality of Search for λ

We now discuss and prove results about the optimality of iterative and binary searches for partitioned EDF scheduling. We begin by introducing the term λΓ,m, defined as the smallest value of λ for which Γ(λ) is schedulable by partitioned EDF on m cores.

The first result is intuitive: it says that, once you compress a task system such that it is schedulable, it will remain schedulable when compressed more.

Theorem 2.

Given a value of λ, if Γ(λ) is partitioned EDF schedulable on m cores, then Γ(λ) is also partitioned EDF schedulable for every value of λλ.

Proof.

Consider a set Γ of n tasks τi. If Γ(λ) is partitioned EDF schedulable on m cores, then there exists a partition {Γ1,,Γm} of Γ such that the following condition holds:

j1..m,τiΓjUi(λ)1

Consider a value λλ. For each task τi,

Ui(λ)=max(UimaxλEi,Uimin)max(UimaxλEi,Uimin)=Ui(λ)

Since Ui(λ)<Ui(λ), it follows that:

j1..m,τiΓjUi(λ)τiΓjUi(λ)1

So there remains a partition of Γ(λ) for which the condition holds.

It follows that Γ(λ) is partitioned EDF schedulable for every value of λ that exceeds λΓ,m. This allows us to say something about the optimality of the elastic algorithms for partitioned EDF scheduling.

Theorem 3.

The values of λ obtained by using the iterative approach of Orr and Baruah [22] or the binary search in Algorithm 3 will be within ϵ of λ if an exact test of partitioned EDF schedulability is performed for Γ(λ) at each considered value of λ. In other words, λλ<ϵ.

Proof.
  • Iterative Approach: The algorithm tests λ=0 first; if λ=0, then the algorithm returns this value. Otherwise, consider the value λ returned by the algorithm: Γ(λ) is feasible, but Γ(λϵ) is not feasible. It follows from Theorem 2 that λ>λϵ, which implies λλ<ϵ.

  • Binary Search Approach: The algorithm again tests λ=0 first; if λ=0, then the algorithm returns this value. Otherwise, consider the value λhi returned by the algorithm: Γ(λhi) is feasible, but Γ(λlo) is not; thus, by Theorem 2, λ>λlo. Due to the algorithm’s termination condition, we know that λhiλloϵ, and so λλ<ϵ.

Corollary 4.

The values λlin obtained by the iterative approach of Orr and Baruah [22] and λbin obtained by Algorithm 3 will be within ϵ of each other if an exact test of partitioned EDF schedulability is performed for Γ(λ) at each considered value of λ. In other words, |λlinλbin|<ϵ.

Proof.

From Theorem 3, λlinλ<ϵ and λbinλ<ϵ, so |λlinλbin|<ϵ.

This tells us that, given an exact schedulability test for partitioned EDF, both algorithms will find values for λ that are within ϵ of the optimal value λ and are within ϵ of each other. However, no such guarantee can be made if schedulability is determined by heuristic, as illustrated in Figure 8. It follows that:

Theorem 5.

The difference between the values λlin obtained by the iterative approach of Orr and Baruah [22] and λbin obtained by Algorithm 3 might exceed ϵ if heuristic tests of EDF schedulability are used.

This has a surprising implication, which follows from the above results.

Corollary 6.

Given a value of λ, if Γ(λ) is identified by heuristic to be partitioned EDF schedulable on m cores, then Γ(λ) might not be identifiable as such for some value of λ>λ.

The implication, then, is that while binary search is faster, it might overcompress a set of tasks by more than ϵ when applying heuristic partitioning (of course, the iterative search might overcompress instead). However, as we show in Section 4.3, binary search compresses, on average, only 0.262×ϵ more than iterative search for the sets of tasks we evaluated.

4.2 Application of Algorithm 2

In [2], it is observed that under the first-fit and best-fit heuristics, a set Γ of tasks τi is schedulable on m processor cores if the total utilization iUi does not exceed m+12 and if no single task’s utilization exceeds 1. Thus, the efficient procedure outlined in Algorithm 2 can be adopted by compressing to a desired utilization UD=m+12, achieving compression in 𝓞(𝒏𝐥𝐨𝐠𝒏) time.

We note that m+12 is an upper-bound on the utilization required by the first-fit and best-fit heuristics. Thus, the amount of compression resulting from an application of this approach might be more than necessary to achieve partitioned EDF schedulability, even under the above-listed heuristics. It follows that the approach of Orr and Baruah [22], while slower, might achieve better results – both in terms of compressing utilizations less aggressively, and by identifying more schedulable task sets. We evaluate these tradeoffs in Section 4.3.

4.3 Evaluation

We now compare the approaches to elastic partitioned EDF scheduling proposed in this section to the original approach of Orr and Baruah [22].

4.3.1 Implementation

In this section, we compare the following three approaches:

  1. 1.

    Iter: The iterative approach from [22]. For each value of λ tested, we attempt to find a schedulable partition by employing the best-fit decreasing then first-fit decreasing bin packing heuristics. The algorithm terminates if either is successful.

  2. 2.

    Bin: Our proposed binary search approach in Algorithm 3, again using the best-fit then first-fit decreasing bin packing heuristics.

  3. 3.

    Util: The utilization-based approach of Algorithm 2, with UD=m+12. We use the array-based implementation, as this was observed to perform best in our evaluations in Section 3.2.

We implement each approach in C++, compiling and measuring execution times with the same settings as described in Section 3.2.1, and running them on the same Raspberry Pi 3 Model B+. All tasks are also represented using the same data structure.

4.3.2 Generating Task Sets

We generate sets Γ of tasks τi according to Orr and Baruah’s methodology in [22]:

  • We consider multiprocessor platforms with m=4, 8, and 16 identical processor cores.

  • For each number of cores m, we consider sets of n tasks, with n=2m, 4m, and 8m.

  • The maximum utilization Uimax assigned to each task τi is selected at random, but we constrain these values to be no more than a parameter α. We separately consider values of α{0.6,0.8,1.0}.

  • Each set of tasks has a total maximum utilization Usummax of umα. We separately consider values of u{1.1,1.5,1.9}.

  • For each combination of values m, n, α, and u, we generate 1000 sets of tasks.

  • We use the DRS algorithm [16] to distribute the total maximum utilization Usummax across individual Uimax values. DRS allows us to select these values uniformly from the space of selections satisfying the conditions that (i) the total iUimax equals the specified Usummax and (ii) each value Uimax does not exceed the constraint imposed by the chosen value of α.

  • Individual minimum utilizations Uimin are assigned at random, selected uniformly from the range (0,Uimax].

  • Elastic coefficients Ei are assigned at random, selected uniformly from the range (1,5].

4.3.3 Linear versus Binary Search

We begin by comparing the Iter and Bin approaches. For each set of tasks, we compute λmax per Equation 9, then search for the optimal λ with granularity ϵ=λmax1000 (the same value tested by Orr and Baruah in [22]).

Figure 7 shows – for each combination of m, n, α, and u – the median and maximum speedup achieved by binary search over linear search. As before, task sets not requiring compression (λ=0) are excluded, as are those deemed infeasible under any amount of compression. Binary search achieves significant speedups, especially for larger values of α and u. These task sets have larger total maximum utilizations, and therefore tend to need more compression to achieve schedulability. In such cases, the linear search takes longer to reach the higher λ value, so binary search is significantly faster. Median speedups for each combination were as high as 51×, while the maximum speedup observed was 86×.

Figure 7: Speedups achieved by Bin over Iter.

Figure 8 shows the distribution of differences (λbinλlin)ϵ between the value λbin achieved by binary search and λlin returned by linear search, normalized by ϵ. We again exclude trivial or infeasible task sets. Where outliers extend beyond the plotted boundaries, the y-axis labels denote the maximum value. We observe that, although the values λbin and λlin typically do not differ by more than ϵ, there are cases where they differ by much more. For 16 tasks on 4 cores, with α=1.0 and u=1.9, λbin exceeds λlin by more than 200×ϵ. Generally, we see that outliers occur where λbin is larger than λlin. This makes sense due to the behavior of binary search: search proceeds downward in factors of 2 from larger values, and if Γ(λ) is deemed unschedulable for some tested value of λ greater than the optimal λ, the binary search will continue to test larger values. Despite these outliers, the average compression values agree closely, differing by less than 0.262×ϵ for every considered combination. Therefore, given the significant speedups gained, binary search is an attractive approach.

Figure 8: Difference between λbin and λlin, normalized by ϵ.

4.3.4 Fast Compression

Although the binary search approach already improves execution time significantly while having minimal impact on the quality of our solution, we expect that our Util approach, which applies our quasilinear-time algorithm, will be even faster, though we also expect it to be more pessimistic. We evaluate this hypothesis by comparing the number of task sets each approach deems feasible, the resulting values of λ necessary to achieve schedulability, and the time to find those values of λ. As in [22], to ensure a consistent comparison, we only compare λ values (and, in our case, execution times – measuring execution times was outside the scope of the work in [22]) for those tasks deemed feasible. Also as in [22], we separately consider each value of m, α, n, and u.

Results are shown in Figure 9, which alternates between two types of plots. The first type shows the percentage of task sets identified as feasible by Bin and Util. The second type compares the median and maximum speedup gained by Util over Bin to the mean λ values achieved by each approach. As in [22], λ values are normalized by λmax to give a value in the interval [0,1]; this is necessary for comparing λ values across task sets.

Figure 9: Speed and Schedulability Tradeoffs Between Bin and Util.

We observe that Util identifies fewer schedulable task sets, and for those it does identify as schedulable, it typically requires more compression. This is especially true for larger values of α and u for which the total maximum utilization Usummax is larger. Nonetheless, at the cost of more pessimism, Util achieves speedups observed to reach over 20×; this may be desirable where online decisions must be made rapidly. For example, in mixed-criticality systems [30, 10], elastic frameworks have been proposed to extend the periods of low-criticality tasks, rather than suspending them, in response to critical task overruns [24, 29]. The transition must be made as quickly as possible, and so overcompression is acceptable (and is no worse than the alternative of dropping all low-criticality jobs).

5 Global EDF and RM

In this section, we present and evaluate an exact polynomial-time algorithm for elastic utilization compression under global earliest deadline first (EDF) and global rate monotonic (RM) scheduling.

5.1 The Global EDF Algorithm

Recall the condition for global EDF schedulability on m cores [15, Theorem 5] from Equation 10 in Section 2.2.2, which we restate here:

τiΓUim(m1)maxτiΓ{Ui}

Without loss of generality, let’s say that τj is the task with the maximum utilization, i.e., Uj=maxτiΓ{Ui}. Then we can restate the schedulability condition as

τi,ijUi+mUjm (13)
Theorem 7.

For a set Γ of elastic tasks, the amount of compression λ needed to satisfy the schedulability condition in Equation 13 can be found by finding λ for the condition τiΓUim for a set of tasks Γ where the task τj=(Ujmin,Ujmax,Ej) in Γ has been replaced by a task τj=(mUjmin,mUjmax,mEj) in Γ.

Proof.

Consider the value λ satisfying τiΓUi=m, i.e.,

τiΓmax{UimaxλEi,Uimin}=m

Assume that τjΓ is parameterized as τj=(mUjmin,mUjmax,mEj). Then:

τiΓ,ijmax{UimaxλEi,Uimin}+max{mUjmaxλmEj,mUjmin}=m

Equivalently,

τiΓ,ijmax{UimaxλEi,Uimin}+m×max{UjmaxλEj,Ujmin}=m

So τiΓ,ijUi(λ)+mUj(λ)=m and Γ is global EDF schedulable.

Intuitively, this says we can replace τj with a task τj with utilization and elasticity values scaled by m; schedulability is then based on a utilization bound of m and the system can be compressed using our algorithm [26, Algorithm 1]. However, as the task with the maximum utilization can change during compression, the utilization bound (the RHS of Equation 10) might no longer hold. Therefore, we must assume every task may take the role of τj after compression, so we repeat this procedure for each task. We then take the result for which (i) the task with the maximum utilization after compression matches the one taking the role of τj; and (ii) if there are multiple such consistent results, we take the one that applies the least compression. This procedure is outlined in Algorithm 4.

Algorithm 4 Elastic_Global_EDF(Γ,m).

5.1.1 Description

The algorithm takes a set Γ of elastic tasks and checks whether compression is necessary, and whether feasibility can be achieved. It then sorts the tasks in non-decreasing order of their ϕi values (see Equation 7). For each task τi in Γ, a representative task τj is constructed with Ujmin=mUimin, Ujmax=mUimax, and Ej=mEi per Theorem 7. Then, τi is replaced in Γ with τj to form the temporary set Γ; from Equation 7 we can see that

ϕj=(mUimaxmUiminmEi)=(UimaxUiminEi)=ϕi

so tasks τiΓ remain sorted by their ϕi values.

Our linear-time procedure listed in Algorithm 2 is then invoked, and the compression value λ is retrieved. We note that although our compression algorithm, as written in [26, Algorithm 1], does not return a value of λ, it can be easily modified to do so. From Equations 3 and 8 we have

λ=(Usum(UDΔ)Esum)

Then from Line 16 of its listing in Algorithm 2, we see that this value is computed and tracked by our algorithm, and so it can be retrieved in constant time for use in Line 9 of Algorithm 4. If, by checking Ui=Uj/m, it is determined that τi would have the maximum compressed utilization, then the task set is schedulable under global EDF. Since multiple feasible configurations might be found, a variable λ is used to track the best value (i.e., the minimum compression needed); this is initialized to λmax as the algorithm has already deemed feasibility.

5.1.2 Execution Time Complexity

For a set Γ of n tasks, sorting in order of ϕi values takes time 𝒪(nlogn). Inside the forall loop in Algorithm 4, constructing τj from τi takes constant time. Our compression algorithm runs in quasilinear time, but this time is dominated by sorting the tasks [26]. Since Γ has already been sorted, compression takes time linear in the number of tasks. Checking whether Uj/m is the maximum compressed utilization also takes linear time. Since each iteration of the loop takes time 𝒪(n) and it runs once for each of the n tasks, the total execution time complexity is 𝓞(𝒏𝟐).

5.2 Extension to Global RM

We now apply this same approach to tasks scheduled in global rate monotonic (RM) fashion. Recall the condition for global RM schedulability on m cores [7, Theorem 5] from Equation 11 in Section 2.2.3, which we restate here:

τiΓUim2(1maxτiΓ{Ui})+maxτiΓ{Ui}

As with global EDF, without loss of generality we can say that τj is the task with the maximum utilization, i.e., Uj=maxτiΓ{Ui}. Then we can restate the schedulability condition as:

τi,ijUi+m2Ujm2 (14)
Theorem 8.

For a set Γ of elastic tasks, the amount of compression λ needed to satisfy the schedulability condition in Equation 14 can be found by finding λ for the condition τiΓUim2 for a set of tasks Γ where the task τj=(Ujmin,Ujmax,Ej) in Γ has been replaced by a task τj=(m2Ujmin,m2Ujmax,m2Ej) in Γ.

Proof.

Our reasoning is the same as that of Theorem 7, but with terms m replaced by terms m2.

Consider the value λ satisfying τiΓUi=m2, i.e.,

τiΓmax{UimaxλEi,Uimin}=m2

Assume that τjΓ is parameterized as τj=(m2Ujmin,m2Ujmax,m2Ej). Then:

τiΓ,ijmax{UimaxλEi,Uimin}+max{m2Ujmaxλm2Ej,m2Ujmin}=m2

Equivalently,

τiΓ,ijmax{UimaxλEi,Uimin}+m2×max{UjmaxλEj,Ujmin}=m2

So τiΓ,ijUi(λ)+m2Uj(λ)=m2 and Γ is global RM schedulable.

Algorithm 4 can therefore be adapted to global RM scheduling by simply changing the transformation of τi into τj and compressing to a utilization bound of m2. For completeness, we outline this procedure as Algorithm 5.

Algorithm 5 Elastic_Global_RM(Γ,m).

5.3 Evaluation

To quantify the improvements offered by our proposed algorithms for global EDF and RM, we compare their execution time to the original algorithms of Orr and Baruah in [22, 21].

We use the same approach to compile and measure execution times as described in Section 4.3, and run on a Raspberry Pi 3 Model B+ as before. We also use the same task sets generated in Section 4.3 to match Orr and Baruah’s methodology in [22]. We use the array-based implementatio of Algorithm 2, as this performed best in the experiments of Section 3.2.

5.3.1 Global EDF

We begin by comparing the execution time of Algorithm 4 to Orr and Baruah’s elastic compression algorithm for global EDF [22, Algorithm 2], which we summarize in Section 2.2.2. As before we search for λ with granularity ϵ=λmax1000.

Figure 10 shows – for each combination of m, n, α, and u – the distribution of speedups achieved by our exact algorithm over iterative search, with horizontal markers indicating median and maximum values. Figure 11 shows the distributions of each algorithm’s execution times. Task sets not requiring compression (λ=0) are excluded, as are those deemed infeasible under any amount of compression. We see that our proposed algorithm achieves significant speedups, especially for larger values of α and u. These task sets have larger total maximum utilizations, and therefore tend to need more compression to achieve schedulability. In such cases, the linear search takes longer to reach the higher λ value, so our exact algorithm is consistently faster. Median speedups for each combination were as high as 37×, while the maximum speedup observed was 60×.

Figure 10: Speedups achieved by Algorithm 4 over iterative search.
Figure 11: Execution times for global EDF compression algorithms.
Figure 12: Speedups achieved by Algorithm 5 over iterative search.
Figure 13: Execution times for global RM compression algorithms.

5.3.2 Global RM

We next compare the execution time of Algorithm 5 to Orr and Baruah’s elastic compression algorithm for global EDF from [21], which we summarize in Section 2.2.3. As before we search for λ with granularity ϵ=λmax1000.

Figure 12 shows the distributions of speedups achieved by our exact algorithm over iterative search and Figure 13 shows the distributions of their execution times. As in Figures 10 and 11, task sets not requiring compression (λ=0) are excluded, as are those deemed infeasible under any amount of compression. We again see significant speedups, with similar patterns to the above results for global EDF scheduling. Median speedups were as high as 43×, while the maximum speedup observed was 51.8×.

6 Conclusions and Future Work

This paper has presented and evaluated new approaches to elastic scheduling of implicit-deadline tasks. We began by considering the original model of Buttazzo et al. [12, 13] for utilization-bound scheduling on a uniprocessor. We compared the execution times of the original quadratic-time algorithm of Buttazzo et al. from [11, Figure 9.29] to the quasilinear time algorithm that we proposed in [26]. In practice, we demonstrated that the original algorithm is significantly slower to compress tasks compared to our proposed approach. However, initializing a sorted data structure dominates the execution time of our algorithm. As a result, for smaller numbers of tasks (up to 50), there is no clear advantage to using our algorithm over that of Buttazzo et al. (or vice versa) to compress complete task sets for schedulability. However, we observed that our algorithm achieves significant speedup during admission of new tasks. Because admission control involves online scheduling decisions, it is especially important that its overhead remains bounded. Therefore, in situations where initialization has already occurred – e.g., during admission of a new task or in response to changes in available utilization – our algorithm is significantly faster than the original approach. As these are more likely to be the scenarios encountered during dynamic online scenarios where the adaptation must have predictably low overheads, our proposed approach has clear advantages.

We then considered elastic scheduling for partitioned EDF. We observed that the approach of Orr and Baruah in [22, 21], which searches iteratively for the optimal “amount” of compression λ with some precision ϵ, may be improved by employing binary, rather than linear, search. We also demonstrated an application of our quasilinear time algorithm. While it runs even faster, it is often pessimistic, at times overcompressing or deeming a schedulable task set to be infeasible. Nonetheless, we can imagine scenarios where this may be desirable. For example, in mixed-criticality systems [30, 10], if a job of a safety-critical task overruns its expected execution time, jobs of less critical tasks are traditionally dropped. Elastic frameworks have been proposed instead where the periods of low-criticality tasks are extended to maintain the service-level guarantees required by critical jobs [24, 29]. In such a situation, the decision must be made as quickly as possible. If our faster approach overcompresses, this is no worse than the inelastic case where all low-criticality jobs are dropped anyway.

Finally, we proposed exact compression algorithms for global EDF and RM scheduling, where Orr and Baruah also proposed iterative search for λ. We evaluated both algorithms and found them to be considerably faster, while providing a more precise result. This makes them more suitable for use in online systems where utilizations may have to be adjusted during runtime.

Elastic scheduling remains a promising approach for dynamic adaptation in overloaded real-time systems. As future work, we will continue to extend the framework to different classes of task systems and their corresponding scheduling policies, including algorithm PriD [15], for which Orr and Baruah again propose an approximate search technique [22]; global [6, 5] and partitioned [4] scheduling of constrained-deadline tasks; semi-federated, reservation-based federated, and bundled scheduling of parallel tasks; and mixed-criticality systems.

References

  • [1] S. K. Baruah, N. K. Cohen, C. G. Plaxton, and D. A. Varvel. Proportionate progress: A notion of fairness in resource allocation. Algorithmica, 15(6):600–625, June 1996. doi:10.1007/BF01940883.
  • [2] Sanjoy K. Baruah. Partitioned EDF scheduling: a closer look. Real Time Syst., 49(6):715–729, November 2013. doi:10.1007/s11241-013-9186-0.
  • [3] Sanjoy K. Baruah. Improved uniprocessor scheduling of systems of sporadic constrained-deadline elastic tasks. In Proceedings of the 31st International Conference on Real-Time Networks and Systems, RTNS 2023, Dortmund, Germany, June 7-8, 2023, pages 67–75, New York, NY, USA, 2023. ACM. doi:10.1145/3575757.3575759.
  • [4] Sanjoy K. Baruah and Nathan Fisher. The partitioned multiprocessor scheduling of deadline-constrained sporadic task systems. IEEE Trans. Computers, 55(7):918–923, 2006. doi:10.1109/TC.2006.113.
  • [5] Marko Bertogna and Michele Cirinei. Response-time analysis for globally scheduled symmetric multiprocessor platforms. In Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007), 3-6 December 2007, Tucson, Arizona, USA, pages 149–160. IEEE Computer Society, 2007. doi:10.1109/RTSS.2007.31.
  • [6] Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. Improved schedulability analysis of EDF on multiprocessor platforms. In 17th Euromicro Conference on Real-Time Systems (ECRTS 2005), 6-8 July 2005, Palma de Mallorca, Spain, Proceedings, pages 209–218. IEEE Computer Society, 2005. doi:10.1109/ECRTS.2005.18.
  • [7] Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. New schedulability tests for real-time task sets scheduled by deadline monotonic on multiprocessors. In James H. Anderson, Giuseppe Prencipe, and Roger Wattenhofer, editors, Principles of Distributed Systems, 9th International Conference, OPODIS 2005, Pisa, Italy, December 12-14, 2005, Revised Selected Papers, volume 3974 of Lecture Notes in Computer Science, pages 306–321. Springer, Springer, 2005. doi:10.1007/11795490_24.
  • [8] Enrico Bini and Giorgio C. Buttazzo. Measuring the performance of schedulability tests. Real Time Syst., 30(1-2):129–154, May 2005. doi:10.1007/s11241-005-0507-9.
  • [9] Tobias Blaß, Arne Hamann, Ralph Lange, Dirk Ziegenbein, and Björn B. Brandenburg. Automatic latency management for ROS 2: Benefits, challenges, and open problems. In 27th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2021, Nashville, TN, USA, May 18-21, 2021, pages 264–277. IEEE, 2021. doi:10.1109/RTAS52030.2021.00029.
  • [10] Alan Burns and Robert I. Davis. A survey of research into mixed criticality systems. ACM Comput. Surv., 50(6):82:1–82:37, November 2018. doi:10.1145/3131347.
  • [11] Giorgio C. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications. Springer, New York, 4th edition, 2024. doi:10.1007/978-3-031-45410-3.
  • [12] Giorgio C. Buttazzo, Giuseppe Lipari, and Luca Abeni. Elastic task model for adaptive rate control. In Proceedings of the 19th IEEE Real-Time Systems Symposium, Madrid, Spain, December 2-4, 1998, pages 286–295. IEEE Computer Society, 1998. doi:10.1109/REAL.1998.739754.
  • [13] Giorgio C. Buttazzo, Giuseppe Lipari, Marco Caccamo, and Luca Abeni. Elastic scheduling for flexible workload management. IEEE Trans. Computers, 51(3):289–302, March 2002. doi:10.1109/12.990127.
  • [14] Thomas L. Dean and Mark S. Boddy. An analysis of time-dependent planning. In Howard E. Shrobe, Tom M. Mitchell, and Reid G. Smith, editors, Proceedings of the 7th National Conference on Artificial Intelligence, St. Paul, MN, USA, August 21-26, 1988, volume 88, pages 49–54. AAAI Press / The MIT Press, 1988. URL: http://www.aaai.org/Library/AAAI/1988/aaai88-009.php.
  • [15] Joël Goossens, Shelby Funk, and Sanjoy Baruah. Priority-driven scheduling of periodic task systems on multiprocessors. Real-Time Systems, 25(2):187–205, September 2003. doi:10.1023/A:1025120124771.
  • [16] David Griffin, Iain Bate, and Robert I. Davis. Generating utilization vectors for the systematic evaluation of schedulability tests. In 41st IEEE Real-Time Systems Symposium, RTSS 2020, Houston, TX, USA, December 1-4, 2020, pages 76–88. IEEE, 2020. doi:10.1109/RTSS49844.2020.00018.
  • [17] Simon Kramer, Dirk Ziegenbein, and Arne Hamann. Real world automotive benchmarks for free. In 6th International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS), volume 130, page 43, 2015.
  • [18] Chung Laung Liu and James W Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM (JACM), 20(1):46–61, 1973. doi:10.1145/321738.321743.
  • [19] Jane W.-S. Liu, Wei-Kuan Shih, Kwei-Jay Lin, Riccardo Bettati, and Jen-Yao Chung. Imprecise computations. Proc. IEEE, 82(1):83–94, 1994. doi:10.1109/5.259428.
  • [20] J. M. López, J. L. Díaz, and D. F. García. Utilization Bounds for EDF Scheduling on Real-Time Multiprocessor Systems. Real-Time Systems, 28(1):39–68, October 2004. doi:10.1023/B:TIME.0000033378.56741.14.
  • [21] James Orr and Sanjoy Baruah. Algorithms for implementing elastic tasks on multiprocessor platforms: a comparative evaluation. Real-Time Systems, 57(1):227–264, 2021. doi:10.1007/s11241-020-09358-9.
  • [22] James Orr and Sanjoy K. Baruah. Multiprocessor scheduling of elastic tasks. In Jérôme Ermont, Ye-Qiong Song, and Christopher D. Gill, editors, Proceedings of the 27th International Conference on Real-Time Networks and Systems, RTNS 2019, Toulouse, France, November 06-08, 2019, pages 133–142. ACM, 2019. doi:10.1145/3356401.3356403.
  • [23] James Orr, Christopher D. Gill, Kunal Agrawal, Sanjoy K. Baruah, Christian Cianfarani, Phyllis Ang, and Christopher Wong. Elasticity of workloads and periods of parallel real-time tasks. In Yassine Ouhammou, Frédéric Ridouard, Emmanuel Grolleau, Mathieu Jan, and Moris Behnam, editors, Proceedings of the 26th International Conference on Real-Time Networks and Systems, RTNS 2018, Chasseneuil-du-Poitou, France, October 10-12, 2018, pages 61–71. ACM, 2018. doi:10.1145/3273905.3273915.
  • [24] Hang Su and Dakai Zhu. An elastic mixed-criticality task model and its scheduling algorithm. In Enrico Macii, editor, Design, Automation and Test in Europe, DATE 13, Grenoble, France, March 18-22, 2013, pages 147–152. EDA Consortium San Jose, CA, USA / ACM DL, 2013. doi:10.7873/DATE.2013.043.
  • [25] Marion Sudvarg, Jeremy Buhler, Roger D. Chamberlain, Christopher D. Gill, James H. Buckley, and Wenlei Chen. Parameterized workload adaptation for fork-join tasks with dynamic workloads and deadlines. In 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2023, Niigata, Japan, August 30 - Sept. 1, 2023, pages 232–242. IEEE, IEEE, 2023. doi:10.1109/RTCSA58653.2023.00035.
  • [26] Marion Sudvarg, Chris Gill, and Sanjoy Baruah. Linear-time admission control for elastic scheduling. Real-Time Systems, 57(4):485–490, October 2021. doi:10.1007/s11241-021-09373-4.
  • [27] Marion Sudvarg, Chris Gill, and Sanjoy K. Baruah. Improved implicit-deadline elastic scheduling. In 14th IEEE International Symposium on Industrial Embedded Systems, SIES 2024, Chengdu, China, October 23-25, 2024, pages 50–57. IEEE, IEEE, 2024. doi:10.1109/SIES62473.2024.10768003.
  • [28] Marion Sudvarg, Daisy Wang, Jeremy Buhler, and Chris Gill. Subtask-level elastic scheduling. In IEEE Real-Time Systems Symposium, RTSS 2024, York, UK, December 10-13, 2024, pages 388–401. IEEE, IEEE, 2024. doi:10.1109/RTSS62706.2024.00040.
  • [29] Zhuoran Sun, Marion Sudvarg, and Christopher D. Gill. Elastic scheduling for graceful degradation of mixed-criticality systems. In Proceedings of the 32nd International Conference on Real-Time Networks and Systems, RTNS 2024, Porto, Portugal, November 6-8, 2024, pages 218–228. ACM, 2024. doi:10.1145/3696355.3699701.
  • [30] Steve Vestal. Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. In Proceedings of the 28th IEEE Real-Time Systems Symposium (RTSS 2007), 3-6 December 2007, Tucson, Arizona, USA, pages 239–243. IEEE Computer Society, 2007. doi:10.1109/RTSS.2007.47.