
Integrating Multi-Level Mixed-Criticality into MCTS for Robust Resource Management

Franco Cordeiro, LTCI, Télécom Paris, Institut Polytechnique de Paris, Paris, France; Samuel Tardieu, LTCI, Télécom Paris, Institut Polytechnique de Paris, Paris, France; Laurent Pautet, LTCI, Télécom Paris, Institut Polytechnique de Paris, Paris, France
Abstract

Managing actions with uncertain resource costs is a complex challenge, particularly in autonomous robot mission planning. Robots are often assigned multiple objectives with varying criticality levels, ranging from catastrophic to minor impacts, where failures can significantly affect system safety. Uncertainties in worst-case costs of resources, such as energy and operating time – the time it takes to carry out an action – further complicate mission planning and execution.

Monte Carlo Tree Search (MCTS) is a powerful tool for online planning, yet it struggles to account for uncertainty in worst-case cost estimations. Optimistic estimates risk resource shortages, while pessimistic ones lead to inefficient allocation.

The Mixed-Criticality (MC) approach, originally developed for real-time systems to schedule critical tasks by allocating processing resources under Worst-Case Execution Time (WCET) uncertainty, provides a framework of rules, models and design principles. We claim this framework can be adapted to autonomous robot mission planning, where critical objectives are met through analogous allocation of different kinds of resources such as energy and operating time despite uncertainties.

We propose enhancing MCTS with MC principles to handle uncertainty in worst-case costs across multiple resources and criticality of objectives. High-critical objectives must always be completed, regardless of resource constraints, while low-critical objectives operate flexibly, consuming resources within optimistic estimates when possible or being discarded when resources become scarce. This ensures efficient resource reallocation and prioritization of high-critical objectives.

To implement this, we present (MC)2TS, a novel variant of MCTS that integrates MC principles for dynamic resource management. It supports more than two criticality levels to ensure that the most critical components meet the most stringent safety and reliability requirements, while also enabling robust resource management. By enabling replanning and mode changes, (MC)2TS improves MCTS’s efficiency and enhances MC systems’ adaptability to both degrading and improving resource conditions.

We evaluate (MC)2TS in an active perception scenario, where a drone retrieves data from distributed sensors under unpredictable environmental conditions. (MC)2TS outperforms MCTS by achieving more objectives and adapting plans when costs drop. It explores more objective sequences, minimizes oversizing, and enhances efficiency. Balancing safety and performance, it monitors the robot battery as well as mission and objective resource constraints such as deadlines. Its robustness ensures low-critical objectives do not compromise high-critical objectives, making it a reliable solution for complex systems characterized by uncertain resource costs and critical objectives.

Keywords and phrases:
Embedded Systems, Safety / Mixed-Critical Systems, Real-Time Systems, Energy Aware Systems
Copyright and License:
© Franco Cordeiro, Samuel Tardieu, and Laurent Pautet; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Computer systems organization → Embedded and cyber-physical systems; Computer systems organization → Real-time systems; Computing methodologies → Planning and scheduling
Acknowledgements:
This material is based upon work supported by the CIEDS (Interdisciplinary Centre for Defence and Security of Institut Polytechnique de Paris) and by the FARO project (Heuristic Foundations of Robot Swarms).
Received:
2025-03-14  
Accepted:
2025-09-03  
Published:
2025-12-19
Part Of:
TGDK, Volume 10, Issue 2 (Special Issue on Industrial Real-Time Systems)

1 Introduction

Autonomous robot mission planning presents numerous challenges. In these missions, robots are assigned multiple objectives of varying criticality levels, which classify the impact of potential failures on system safety. More than two levels can be used to distinguish failure impacts, ranging from catastrophic and hazardous to major and minor [26]. The complexity increases due to uncertainties in worst-case resource cost estimations, such as energy, where rare factors can significantly affect expected resource consumption, impacting mission planning and execution [14]. Energy is not the only resource that must be tracked. The robot may have to meet strict constraints such as deadlines for mission completion or for highly critical objectives, making operating time another crucial resource to monitor. Under these conditions, the robot may allocate or reallocate resources in order to prioritize critical objectives, potentially foregoing less critical ones.

In this work, we focus on operating time, the time to achieve an action or an objective (e.g., flight duration), which must be clearly distinguished from computational time (e.g., processing time). Unless explicitly stated, all references to time refer exclusively to operating time.

The case study we choose to illustrate our contribution concerns active perception missions. A drone actively gathers information about its surroundings. Multiple geographically distributed sensors collect data without being connected to a network. The robot retrieves data from each sensor via Bluetooth. It is crucial to complete this objective before the sensor runs out of battery or memory. It may also face uncertainties in resource costs, such as the energy or time needed to move under unpredictable environmental conditions, such as strong winds or heavy rain. The drone must complete as many high-critical objectives as possible, followed by less critical ones.

The real-time systems community has studied the challenge of cost uncertainty. Real-time scheduling algorithms are notably affected by the uncertainty in Worst-Case Execution Time (WCET) estimates [1]. Indeed, the WCET evaluation often cannot be carried out with precision (see Figure 1). System designers may obtain optimistic WCET estimates through measurement-based methods, while certification authorities may require pessimistic WCET estimates obtained through static code analysis techniques [31]. Therefore, the WCET can be bounded by either optimistic or pessimistic estimates depending on the safety guarantees required by the function. Relying solely on pessimistic WCET estimations in system design may result in unnecessary oversizing. Conversely, leaning towards optimistic WCET estimations can lead to scenarios where execution budget constraints are exceeded before task completion.

Figure 1: Optimistic and pessimistic WCET estimates.

This challenge led to the emergence of the concept of Mixed-Criticality (MC) in real-time systems [5]. In this paradigm, tasks are systematically classified according to their criticality level, i.e. by assessing the consequences of a task failure. Higher-criticality tasks are usually imperative to the survival of the system, while lower-criticality tasks are related to services. Suppose a high-critical task must guarantee a failure probability of 10^-9. Its pessimistic WCET estimate (or a pessimistic time budget) must ensure that the probability of exceeding this execution time is below 10^-9. Alternatively, an optimistic WCET estimate (or an optimistic time budget) can be assigned, with a higher overrun probability (e.g., 10^-5). If the task does not complete within its optimistic budget, a mode change is triggered and lower-criticality tasks are either discarded or degraded to release execution time resources. This reallocation must provide the high-criticality task with its pessimistic budget, restoring its failure probability to 10^-9.
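This dual-budget mechanism can be sketched in a few lines. The snippet below is an illustrative sketch, not taken from the paper: a task carries an optimistic (lo) and a pessimistic (hi) budget, exceeding the lo budget triggers a mode change, and the function name and signature are our own hypothetical choices.

```python
def next_mode(elapsed, lo_budget, hi_budget, mode):
    """Return the criticality mode after observing `elapsed` execution time.

    Exceeding the optimistic (lo) budget triggers a mode change to hi-mode,
    in which lower-criticality tasks are discarded or degraded. Exceeding
    the pessimistic (hi) budget should never happen in a correctly
    designed system.
    """
    if elapsed > hi_budget:
        raise RuntimeError("pessimistic budget exceeded: design error")
    if mode == "lo" and elapsed > lo_budget:
        return "hi"  # mode change: free resources for high-criticality tasks
    return mode
```

In classical MC scheduling the system stays in hi-mode once switched; the paper later relaxes this by allowing a return to lo-mode.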

In the context of action planning, a variety of heuristics have been proposed to navigate the complexities of decision-making [14]. Among these heuristics, Monte Carlo Tree Search (MCTS) emerges as a popular choice for online planning, in which an agent navigates a tree-like search space to find an optimal path or make optimal decisions. However, MCTS is not inherently designed to balance guaranteeing the execution of some objectives with maximizing overall objective completion. As it is based on the Monte Carlo method, its decisions are strongly influenced by the model's most probable scenarios. Adapting MCTS to optimistic scenarios can lead to a resource shortage, while adapting to pessimistic scenarios can lead to oversizing the system.

To address these issues, our contribution is twofold. Firstly, we reformulate the MC approach in the context of uncertainty in resource costs. While the Mixed-Criticality (MC) approach was originally developed for real-time systems, where Worst-Case Execution Time (WCET) is the primary uncertain resource, we propose modeling the plan-building problem with both resource uncertainty and objective criticality as a generalized mixed-criticality problem. In this work, we broaden the scope of resource considerations to include energy and operating time requirements. Secondly, we design (MC)2TS, which integrates MC considerations into MCTS-based heuristics. It aims to anticipate changes in resource costs and adjust the decision-making process to optimize objective completion, resource utilization and system reliability in terms of criticality compliance. We accommodate an arbitrary number of criticality levels and incorporate both replanning and mode changes to enhance MCTS efficiency while also enabling MC systems to return to their initial mode after a mode change.

The rest of the paper is organized as follows. In Section 2, we describe the system model and define notations. We formulate the problem formally in Section 3 and describe our (MC)2TS heuristic in Section 4. In Section 5, we evaluate our heuristic and demonstrate the performance improvement brought by our approach. Section 6 reviews related work, and Section 7 concludes with perspectives.

2 System Model

We consider a robot which selects a sequence of actions 𝒙 = (x1, x2, …, xk) to fulfill a set of objectives so that it maximizes the value of a global objective function g(𝒙). The objective function takes into account several parameters such as the reward ri assigned to the completion of each objective xi. Without loss of generality, we assume that only one (potentially complex) action is needed to achieve one objective. For example, this action might involve multiple movements followed by data loading to accomplish the objective. Thus, we use the terms action and objective interchangeably.

We define an ordered set of L criticality levels ℓ1 ≺ ℓ2 ≺ … ≺ ℓL based on the consequences of a failure. The system designer assigns a criticality level to each objective. At run-time, the system executes in a criticality mode ℓ, allowing only future actions with criticality level in {ℓ, …, ℓL} to execute. When a mode change occurs in mode ℓ, the system makes a transition to mode ℓ+1. The system always starts its execution in the lowest mode ℓ1. The set of actions of criticality ℓ is denoted 𝒙^ℓ.

In the following, we focus on the two-mode model. As noted by Burns and Davis [5], this abstraction captures essential system behavior under degraded conditions and provides a foundation for generalization to multiple criticality levels. In section 4.4, we generalize our contribution to an arbitrary number of criticality levels.

Each action xi is associated with an actual cost C̄i, a tuple whose elements represent the different resources used by the robot. In our case, the cost tuple is C̄i = [di, ei]. C̄i[duration] (resp. C̄i[energy]) designates the actual action duration di (resp. energy ei) it takes to run action xi. The value of C̄i is not known a priori; it is observed while performing xi. Each sequence of actions (x1, x2, …, xk) is associated with an actual accumulated cost b̄k[r] = Σ_{h=1}^{k} C̄h[r] for resource r.

With two criticality modes, C^lo_ji[r] represents the optimistic worst-case cost in resource r of an action xi executed after xj when the system executes in low-criticality mode (lo-mode). Similarly to b̄k, we define the optimistic worst-case accumulated cost of a sequence (x1, x2, …, xk) as b^lo_k[r] = Σ_{h=1}^{k} C^lo_h[r]. If b̄k exceeds b^lo_k, the system switches to high-criticality mode (hi-mode). C^hi_ji[r] represents the pessimistic worst-case cost in resource r of an action xi executed after xj when the system executes in hi-mode. We assume that ∀r, C^hi_ji[r] ≥ C^lo_ji[r]. Similarly to b^lo_k, the pessimistic worst-case accumulated cost b^hi_k takes into account the worst-case accumulated cost possible when a mode change occurs during the execution of any action of a sequence. The formula for this accumulated cost will be detailed later.

The cost of an action xi in lo-mode or hi-mode is determined not only by the optimistic or pessimistic estimates made by the system designer but also by the previously executed action xj. Indeed, as we consider actions as the entire process of reaching an objective and completing it, the position of the robot at the start of an action changes the total cost of executing it. Therefore, we represent the worst-case cost in resource r of executing action xi after action xj as C_ji[r]. In the MCTS literature, the dependency of the worst-case cost on the previous action is often implicit and the notation is typically shortened to C_i[r]; however, we explicitly highlight this dependency as it is required for analyzing the complexity.

With two criticality modes, the system runs in either high (hi-mode) or low (lo-mode) critical mode. It starts in lo-mode and when any resource consumption exceeds the lo-mode value, a switch to hi-mode occurs. A high-critical action (hi-action) is carried out whether the system is running in lo-mode or hi-mode. Its cost is the one of the current mode. Conversely, future lo-actions are discarded in order to free up resources for the hi-actions currently in the plan. However, if the system changes to hi-mode while a lo-action is ongoing, this action will continue until completion. This difference with the classical MC model will be explained in section 4.

At last, a seamless transition between modes is required: execution must continue running smoothly as far as hi-actions are concerned. The plan must therefore support a safe transition to hi-mode, even under the most pessimistic assumptions. This requires that the transition be anticipated beforehand, with resource budgets allocated to complete actions under hi-mode, even if it was initiated in lo-mode. Similarly, a return to lo-mode is allowed only when there is sufficient confidence that hi-criticality tasks can safely be executed within lo-mode.

This also means hi-actions must not depend on the execution of lo-actions. Regarding dependencies between lo-actions, we do not specify a method to check for them. We only assume that it is possible to check them during planning, i.e. during the expansion and simulation phases of MCTS, and during execution to check whether every lo-action required for the current objective has been executed.

We can generalize the notation to any criticality mode ℓ. C^ℓ_ji[r] represents the worst-case cost in resource r of an action xi executed after xj when the system executes in mode ℓ. We define b^ℓ_k as the worst-case accumulated cost of a sequence (x1, x2, …, xk) executed in mode ℓ. We also assume that ∀r, ∀ℓ, C^{ℓ+1}_ji[r] ≥ C^ℓ_ji[r].

We introduce a last refinement in the model. So far, resources are directly related to the robot, such as the energy available in its battery or the time budget for completing the mission. However, each individual objective may also have its own resource constraints, such as a time budget or a deadline before an objective runs out of memory or energy. These resource constraints may be known to the robot prior to the mission start if the system is not connected to a network. For these objectives, the system designer might also explore alternative degradation strategies to the discard model, such as the elastic task model (to extend the objective deadline) [6] or weakly-hard resource constraints (to extend the objective lifetime) [10]. Naturally, extending an objective's survivability comes at the cost of degraded service quality.

3 Problem Statement

Our goal is to generate a robust plan that optimizes the objective function g, accounting for actions whose resource costs are uncertain. Traditionally, such planning problems rely on heuristics like Monte Carlo Tree Search (MCTS). However, MCTS is not well-suited to handling uncertainty in resource costs, which can result in failure to meet critical objectives at runtime.

The Mixed-Criticality (MC) approach was originally developed for real-time systems, using scheduling algorithms to allocate processing time under WCET uncertainty while enforcing safe mode transitions. While this framework of models and design principles effectively manages temporal uncertainties, we propose adapting it to autonomous robot mission planning - where multiple resources (energy, operational time, etc.) must be allocated under uncertainty. However, traditional MC systems handle only execution time through specific schedulers. To generalize this approach, decision-making heuristics like MCTS must be extended to: (1) support multiple resource types, and (2) maintain safe mode transitions as required by the MC approach.

To incorporate MC into the MCTS heuristic, the model used in the plan building must be adapted. For this purpose, it is essential to analyze how action costs influence the planning process. Among the four MCTS phases (outlined in Section 4.1), cost uncertainty primarily affects the selection and simulation phases. To achieve a more adaptive planning strategy, these phases must be improved to better handle changes between different worst-case cost estimates. Additionally, since MC systems are capable of supporting multiple criticality levels, a comprehensive mixed-criticality approach must account for an arbitrary number of worst-case cost estimates.

Some MC properties must also be present in our solution. Transitions between action sequences must be seamless, ensuring that sufficient resources are always available for both current and future critical actions. To improve robustness, the addition of actions of lower criticality should not affect the overall performance of the actions of higher criticality. This requires a certain isolation between the criticality levels to ensure that the resulting plan still accomplishes similar performance in high criticality modes as before the addition of the new objectives. Finally, in scenarios where costs are precisely known and all actions are critical, our approach should produce results similar to those achieved using a standard MCTS heuristic.

Research Objectives

Resilience:

We aim to develop a mission planning heuristic that maximizes the completion of critical objectives even under conditions where actual resource cost matches pessimistic assumptions. The resource budgets must be strictly enforced, thus guaranteeing safety.

Efficiency:

We aim to maximize the completion of non-critical objectives under conditions where actual resource cost matches optimistic assumptions, thus minimizing overdimensioning the system. Under stable conditions (whether optimistic or pessimistic), the planning process should produce results comparable with those achieved using a standard MCTS heuristic.

Robustness:

We aim to minimize the effect of adding new objectives on the number of higher-criticality objectives achieved. In particular, adding a new lowermost criticality level with new actions should not result in fewer higher-criticality objectives being achieved, even though the set of those higher-criticality objectives might change.

4 Approach

In this section, we present our (MC)2TS heuristic, an extension of MCTS heuristics that deals with uncertain costs, such as energy costs. Our (MC)2TS proposal is part of a larger mission control software:

  • The planner builds a plan by selecting a sequence of actions that the robot can safely perform in the current environment while optimizing the reward (Subsection 4.2, Subsection 4.4) and respecting global and objective-specific budget constraints such as deadlines (Subsection 4.6).

  • The scheduler executes the plan (Subsection 4.3) while constantly monitoring resource levels and switches to a higher criticality mode if resource consumption exceeds the allocated budget for the current mode.

  • Periodically, replanning occurs, which prompts the planner to design a new plan that is better suited to the current conditions; this plan will then be transmitted to the scheduler.

Our solution involves running the MCTS heuristic (Subsection 4.1) while incorporating the MC approach during the selection and simulation phases. We show in Subsection 4.5 that the addition of MC to MCTS does not increase the complexity.

4.1 MCTS: Phases Description

MCTS is composed of four phases: selection, expansion, simulation and backpropagation [3]. These four phases are executed iteratively to incrementally grow a tree, as shown in Figure 2. Each node k of the tree represents a sequence of actions 𝒙k = (x1, x2, …, xk) and contains data about the expected reward of the subsequent sequences of actions. During each iteration, a new leaf node is added to the tree, and the statistics within the ancestor nodes are updated accordingly. This iterative process repeats until a computation budget is reached. Note that "computation budget" is standard terminology in the MCTS literature designating the processing limit up to which the heuristic expands the tree. It should not be confused with operating time budgets or worst-case execution time budgets.

Figure 2: MCTS phases.

In the selection phase, the heuristic selects an expandable and rewarding node. An expandable node is defined as a node that has at least one child that has not yet been visited during the search. The heuristic begins at the root node of the tree and recursively traverses child nodes until an expandable node is reached. In order to check whether an action xi is feasible at node k, we verify conditions such as whether the accumulated cost after executing the sequence (x1, x2, …, xk, xi) does not surpass the budget. Therefore, cost uncertainties affect the result of this phase.

During the expansion phase, a leaf node k+1 is added to the selected node k by choosing an action xk+1 among the actions previously computed as possible to execute after 𝒙k. This phase is thus indirectly affected by the uncertainty of costs.

In the simulation phase, the expected value of the reward of the new node is computed by simulating possible scenarios after the execution of 𝒙k+1 using a rollout policy. This policy can be a random policy or a heuristic tailored to the problem. A maximum horizon is defined to limit the length of the action sequences simulated within this phase. During the execution of the rollout, it is important to know which sequences of actions are possible after 𝒙k+1, considering environment and budget. Therefore, cost uncertainties affect the result of this phase.

In the backpropagation phase, the result of the simulation phase for the new node is added to the statistics of all nodes along its path up to the root of the tree. These statistics are usually unbiased estimators of the rollout evaluations for the objective function. In this phase, the expected reward of each of the ancestor nodes is updated (backpropagated) with a more precise value, as we have more data about the child nodes resulting from the simulation. This phase is not impacted by the uncertainty of costs.

MCTS is an algorithm that can be greatly optimized by rebuilding the plan periodically during the mission execution, a process known as replanning. Usually, a replan occurs after an action has been completed and MCTS is executed to determine a new set of actions. As the system state is updated, the new plan is better adapted to the new possible outcomes of the mission execution. Replanning thus leads to a more efficient use of the remaining resources by adapting to the new reality of the situation. Thus, our approach can also benefit from replanning, since a change of mode is no substitute for updating the plan to reflect the current system state.
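The four phases above can be summarized in a minimal skeleton. The sketch below is a generic UCT-style MCTS loop under simplifying assumptions: no resource costs, and caller-supplied `feasible_actions` and `rollout` functions (both hypothetical names). It is meant only to locate where cost uncertainty would enter (the feasibility check in selection/expansion and the rollout in simulation), not to reproduce the paper's implementation.

```python
import math

class Node:
    """One node per action sequence; holds visit statistics."""
    def __init__(self, parent=None, action=None):
        self.parent, self.action = parent, action
        self.children = []
        self.untried = None      # feasible actions not yet expanded
        self.visits, self.value = 0, 0.0

def uct(node, c=1.4):
    """Upper Confidence bound applied to Trees (selection criterion)."""
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root, feasible_actions, rollout, iterations=100):
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while node.untried == [] and node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: add one unexpanded child, if any.
        if node.untried is None:
            node.untried = feasible_actions(node)
        if node.untried:
            child = Node(parent=node, action=node.untried.pop())
            node.children.append(child)
            node = child
        # 3. Simulation: estimate the reward with a rollout policy.
        reward = rollout(node)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited first action.
    return max(root.children, key=lambda n: n.visits).action
```

In (MC)2TS, `feasible_actions` is where the accumulated-cost checks of Section 4.2 apply, and `rollout` is where cost assumptions shape the simulated scenarios.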

4.2 (MC)2TS: Plan Building for Dual-Criticality Systems

We first examine the simpler scenario involving two criticality levels. In our approach, each action has two costs, one in hi-mode and one in lo-mode. Thus, each node k in the MCTS action tree has two associated accumulated costs b^hi_k and b^lo_k, with the accumulated costs for the root node, b^hi_0 and b^lo_0, being zero. The computation of these accumulated costs depends on the action sequence 𝒙 = (x1, x2, …, xk) assigned to nodes 1, 2, …, k.

For lo-objectives, the accumulated costs for any resource r are calculated as

  b^lo_k = b^lo_{k-1} + C^lo_{(k-1)k}    (1)
  b^hi_k = b^lo_{k-1} + C^hi_{(k-1)k}    (2)

whereas for hi-objectives, the accumulated costs are

  b^lo_k = b^lo_{k-1} + C^lo_{(k-1)k}    (3)
  b^hi_k = max_{h_k ≤ j < k} ( b^hi_j + C^hi_{jk} )    (4)

where h_k = max{ n ∈ [1, k-1] | x_n ∈ 𝒙^hi }, with the convention max{∅} = 0.

In lo-mode, as all actions are executed, the accumulated costs for hi- and lo-actions are computed the same way. This means the accumulated cost is simply the sum of C^lo_{(k-1)k} over every node k in the branch, as shown in Equation 1 and Equation 3.

In case of a budget overrun, the system may change modes during the execution of a lo-objective and must proceed to execute the next hi-objective, if one exists. However, the cost of executing the hi-objective cannot be predetermined from an unknown intermediate system state, as the system state is only fully known at the end of each action. To address this, we ensure the current action completes by allocating it a budget in hi-mode, even if it is a lo-objective.

In hi-mode, since a lo-objective must complete its execution in hi-mode during a mode change, its cumulative cost in hi-mode is calculated by adding the cost of executing xi in hi-mode to the cumulative cost of executing the previous action in lo-mode (as outlined in Equation 2).

Figure 3: Possible mode changes in one branch. The criticality of each node is derived from the criticality of the action leading to that state.

In hi-mode, the accumulated cost associated with a hi-objective xi must account for the worst-case scenario among several possible situations. Figure 3 shows the possible executions for one branch: the system might have operated exclusively in hi-mode during the execution of the current action xi (x4); it could have already been running in hi-mode while executing the previous hi-objective x_{h_k} (x1); or it might have transitioned from lo- to hi-mode during the execution of a prior lo-objective xl, where h_k < l < i (x2 or x3). When node i is added to the tree, its accumulated cost is determined by taking the maximum cost among these potential scenarios (as described in Equation 4).

Note that in the cases where there are no uncertainties, i.e. ∀i, C^lo_ji = C^hi_ji = C_ji, then ∀k, b^lo_k = b^hi_k = Σ_{h=1}^{k} C_{(h-1)h}. This means that (MC)2TS behaves exactly as MCTS when the action costs do not change between lo- and hi-mode assumptions.
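Equations 1–4 translate into a short accumulated-cost computation. The sketch below is a simplification of the paper's model: a single resource, and worst-case costs assumed to depend only on the action itself rather than on its predecessor (C_jk = C_k); the function name and the tuple encoding are our own.

```python
def accumulated_costs(actions):
    """Accumulated lo/hi worst-case costs along one branch (Eqs. 1-4).

    `actions` is a list of (criticality, c_lo, c_hi) tuples, with
    criticality in {"lo", "hi"} and c_lo <= c_hi the optimistic and
    pessimistic worst-case costs of the action.
    """
    b_lo, b_hi = [0.0], [0.0]        # b_0^lo = b_0^hi = 0
    h = 0                            # h_k: index of the latest hi-action (0 if none)
    for k, (crit, c_lo, c_hi) in enumerate(actions, start=1):
        b_lo.append(b_lo[k - 1] + c_lo)          # Eqs. (1) and (3)
        if crit == "lo":
            b_hi.append(b_lo[k - 1] + c_hi)      # Eq. (2)
        else:
            # Eq. (4): worst case over a mode change during any action
            # executed after the previous hi-action (h_k <= j < k)
            b_hi.append(max(b_hi[j] for j in range(h, k)) + c_hi)
            h = k
    return b_lo, b_hi
```

When c_lo equals c_hi for every action, both lists coincide, matching the degenerate case in which (MC)2TS reduces to MCTS.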

MCTS or (MC)2TS cannot guarantee that all hi-objectives will be achieved due to resource limits. To prevent lo-objectives from overriding hi-objectives, the reward of each hi-objective should be greater than the sum of the lo-objective rewards: ∀x_i ∈ 𝒙^hi, Σ_{x_j ∈ 𝒙^lo} r_j < r_i. Thus, hi-objective rewards become the greatest contributors to the objective function.

4.3 (MC)2TS: Plan Execution for Dual-Criticality Systems

Figure 4: (MC)2TS to scheduler instructions and constraints.

Figure 4 illustrates how the outcome from the (MC)2TS step is mapped into instructions for the scheduler. In this example, the selected action sequence is (x0,x1,x2,x3). x3 is a hi-objective, while x0,x1,x2 are lo-objectives.

For every action xk, the accumulated cost in lo-mode b^lo_k[r] from the (MC)2TS tree is used as the budget corresponding to resource r. The system starts in lo-mode. After executing any action xk, the actual accumulated cost of each resource r (e.g., duration or energy) is compared against b^lo_k[r]. If the budget is exceeded for at least one resource, the system switches to (or stays in) hi-mode.

When the system is running in hi-mode, it discards the next action xk if it is a lo-objective. This ensures that the resource consumption stays below the computed hi-mode accumulated cost b^hi_k[r]. In the setup shown in Figure 4, only x0, x1, x2 can be dropped, as x3 is a hi-objective.

Figure 4 represents an example of a mode change. The first line shows the consumption of resource r in lo-mode, while the second shows a possible scenario where an action exceeds its lo-mode cost. The line labeled <r> represents the progressive usage of the available resources. When the mode change occurs, we let lo-objective x1 complete its execution in hi-mode, lo-objective x2 is discarded, and hi-objective x3 is executed after x1 completes.

After transitioning to hi-mode, the system may incur costs similar to those in lo-mode, which can cause the actual accumulated cost for each resource r to fall below b^lo_k[r] after executing action xk. In this case, the system reverts to lo-mode and executes the upcoming lo-objectives as usual. Although this allows for potentially frequent mode changes if actual accumulated costs fluctuate around the accumulated cost in lo-mode, mode changes introduce no overhead: the mode is only consulted before each action to determine if the next action can be executed. Furthermore, hi-actions will be completed as planned, and lo-actions are scheduled whenever it is safe.

However, the system needs to check whether any previously dropped lo-objective xi is a dependency of the lo-objective xk it is about to execute. If this is the case, then action xk must be dropped as well. This ensures a seamless mode change to lo-mode, since lo-actions will have every required dependency before their execution. hi-actions do not have this issue as only lo-objectives can depend on lo-objectives.
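The execution semantics described above can be condensed into a small loop. The sketch below is a simplification under our own assumptions: a single resource, budgets that stay indexed by planned position even when actions are dropped, and hypothetical names (`run_plan`, `execute`); the paper's scheduler monitors budgets per resource and per mode as built by (MC)2TS.

```python
def run_plan(plan, b_lo, b_hi, execute):
    """Execute a dual-criticality plan with mode changes and drops.

    `plan` is a list of (name, criticality, deps) where deps is the set of
    lo-objective names the action depends on; `b_lo`/`b_hi` are the
    per-position accumulated budgets from plan building; `execute` performs
    the action and returns its actual cost.
    """
    mode, spent = "lo", 0.0
    done, dropped = [], set()
    for k, (name, crit, deps) in enumerate(plan, start=1):
        if crit == "lo":
            if mode == "hi" or (deps & dropped):
                dropped.add(name)    # free resources / missing dependency
                continue
        spent += execute(name)
        if spent > b_hi[k]:
            raise RuntimeError("hi budget exceeded: outside safe conditions")
        # The mode is re-evaluated after every action: switch to hi-mode on
        # a lo-budget overrun, return to lo-mode once consumption drops back.
        mode = "hi" if spent > b_lo[k] else "lo"
        done.append(name)
    return done, dropped
```

Note how a lo-objective is also dropped when one of its dependencies was dropped earlier, mirroring the dependency check described above.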

Replanning, which involves rebuilding the plan during mission execution, often after completing an action, is a key process for optimizing the MCTS algorithm by generating new action sequences. When a replanning occurs, the resources are reallocated completely, and the system restarts in lo-mode after a new plan has been adopted. Note that replanning is no substitute for a mode change. Indeed, the mode change inherently adapts to new execution conditions by integrating this adaptation during its plan building phase, whereas replanning requires creating a completely new plan to address the new conditions. It is important to note that the robot must have the knowledge of which actions were already completed in order to replan accordingly, both to know which actions are still available and which dependencies of future actions were already completed.

The frequency of replanning is a trade-off determined during system design. On the one hand, infrequent replanning enhances robot autonomy and reduces data exchanges but increases the risk of skipping multiple lo-objectives when transitioning to hi-mode. This conservative approach may lead to overly cautious actions, as the robot will prioritize safety based on assumed worst-case resource consumption for hi-objectives.

On the other hand, frequent replanning facilitates better adaptation to real conditions. If an objective requires fewer resources than initially estimated, a more ambitious plan can be formulated during the replanning process. However, it is crucial that the replanning executes promptly to avoid delays that consume additional resources, which could otherwise be used for executing objectives.

Finally, if b_k^hi[r] is exceeded for any resource r during or by the end of x_k's execution, it implies either wrong system design or accidental operation outside safe conditions, such as due to a hardware failure. In such scenarios, (MC)2TS outperforms MCTS in safety by eliminating lo-mode actions, enabling the system to recover and return to its nominal operating conditions.

4.4 (MC)2TS: Plan Building for Multiple Criticality Systems

(MC)2TS can be adapted to L > 2 levels of criticality, with the lowest being the least critical. Each action x_k, assigned a criticality level χ_k, will have L worst-case costs and accumulated worst-case costs, one for each mode, even for criticality levels higher than χ_k. Indeed, each action x_k must be allocated a sufficient budget to guarantee its completion without interruption in the event of a mode change to a higher criticality level ℓ > χ_k.

To ensure that lower criticality actions do not take priority over actions of higher criticality levels, the system designer must ensure that the action rewards enforce ∀x_i ∈ 𝒙, ∀x_j ∈ 𝒙: (χ_j < χ_i) ⟹ r_j < r_i.
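A design-time check of this reward constraint can be sketched as follows; the `(criticality, reward)` pair representation is a hypothetical encoding for illustration:

```python
def rewards_respect_criticality(actions):
    """Verify that every action of strictly lower criticality has a
    strictly smaller reward than any action of higher criticality.
    `actions` is a list of (criticality, reward) pairs."""
    return all(rj < ri
               for (ci, ri) in actions
               for (cj, rj) in actions
               if cj < ci)
```

Such a check can run once when the mission specification is loaded, rejecting reward assignments that would let low-criticality objectives dominate the plan.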

Lemma 1.

Transitioning from mode ℓ to a higher mode m > ℓ, and then returning to mode ℓ, has no impact on the worst-case accumulated cost b_k^ℓ.

Proof.

After transitioning to mode m to execute action x_i, the system may encounter costs inferior to those of mode ℓ. As a result, the actual accumulated cost for each resource r may fall below the expected worst-case accumulated cost b_j^ℓ[r] after executing a subsequent action x_j (with x_i ≼ x_j ≼ x_k). In such cases, the system reverts to mode ℓ and continues executing subsequent objectives in that mode.

Therefore, for every resource r, the accumulated cost after executing x_j is less than or equal to the worst-case accumulated cost b_j^ℓ that would have been incurred without any transition to mode m. Consequently, transitioning to mode m > ℓ and returning to mode ℓ has no impact on the worst-case accumulated cost b_k^ℓ.

Lemma 2.

In a system where costs for any individual action increase with the criticality level, the worst-case accumulated cost increases with the criticality level:

∀j, i, ℓ: C_{ji}^{ℓ+1} ≥ C_{ji}^ℓ ⟹ ∀k, ∀ℓ, ∀m ∈ {1, …, ℓ}: b_k^m ≤ b_k^ℓ

This holds even if criticality mode changes led to skipping some earlier actions in the plan.

Proof.

Thanks to Lemma 1, we only need to consider transitions to higher criticality modes during plan building.

Let x_j^{χ_j} denote an action x_j executed in mode χ_j. We consider the action sequence (x_1^{χ_1}, x_2^{χ_2}, …, x_k^ℓ), which results in b_k^ℓ, the worst-case accumulated cost in mode ℓ for node k. Note that the final action x_k is executed in mode ℓ.

From this sequence, we can derive another valid sequence (x_1^{χ_1}, x_2^{χ_2}, …, x_k^{ℓ+1}). This new sequence is valid because, if one or more mode changes occur during the execution of the final action x_k, our model enforces that the action must still be completed at any criticality level. In this case, we assume the system reaches level ℓ + 1.

If C_{(k−1)k}^{ℓ+1} ≥ C_{(k−1)k}^ℓ, then the accumulated cost of this valid sequence in mode ℓ + 1 is greater than or equal to the worst-case accumulated cost in mode ℓ, i.e., b_k^ℓ. Thus, the worst-case accumulated cost in mode ℓ + 1, denoted b_k^{ℓ+1}, is also greater than or equal to b_k^ℓ. This result implies that the worst-case accumulated cost increases with the criticality level.

Theorem 3.

In a system with a number of criticality levels L ≥ 2, the worst-case accumulated cost b_k^ℓ for node k in mode ℓ is:

b_k^ℓ = b_{k−1}^1 + C_{(k−1)k}^1, if ℓ = 1 (5)
b_k^ℓ = max_{h_k^ℓ ≤ j < k} (b_j^ℓ + C_{jk}^ℓ), if 1 < ℓ ≤ χ_k (6)
b_k^ℓ = max_{h_k^{χ_k} ≤ j < k} (b_j^{χ_k} + C_{jk}^ℓ), if ℓ > χ_k (7)

where h_k^ℓ = max{n ∈ [1, k−1] ∣ χ_n ≥ ℓ}, with the convention max ∅ = 0, and b_0^ℓ = 0 for all ℓ.

Proof.

In mode 1, there are no prior mode transitions to consider, neither at the lowest criticality level (since mode 1 is the least critical) nor at higher levels (as ensured by Lemma 1). As a result, the accumulated worst-case cost corresponds to both Equation 1 and Equation 5.

In modes 1 < ℓ ≤ χ_k, multiple possible mode changes must be considered when computing the worst-case accumulated cost, similarly to Equation 2 and as described in Figure 3. Since any action with criticality level greater than or equal to ℓ will always execute, we define h_k^ℓ, the earliest possible node able to skip to node k, as the last node with an action of criticality χ_h ≥ ℓ. Indeed, nodes j before h_k^ℓ can at most skip to other nodes n ≤ h_k^ℓ in mode ℓ. Every action x_j such that j ≥ h_k^ℓ may trigger a mode change and skip to x_k. The maximum value of their accumulated cost must be considered, as well as the cost C_{jk}^ℓ to execute node k in mode ℓ. According to Lemma 2, the worst-case accumulated cost for node j is greatest at the highest possible level ℓ. Thus, the worst-case accumulated cost at node k must consider the worst-case accumulated costs at nodes j in mode ℓ.

In modes ℓ > χ_k, the only possible mode change to the current mode occurs during execution, similar to the mode changes considered in Equation 2. Indeed, action x_k can only execute in that criticality mode if it was already executing when the mode change occurred. Therefore, the most resource-consuming case is transitioning from a mode m ≤ χ_k to ℓ. According to Lemma 2, the greatest accumulated cost is at mode m = χ_k.
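The recurrence of Theorem 3 can be evaluated directly along a candidate branch. The sketch below assumes a hypothetical cost callback cost(j, k, l) returning C_{jk}^l, and 1-indexed criticalities chi:

```python
def worst_case_accumulated_costs(chi, cost, L):
    """Compute b[k][l] for a linear plan, following Theorem 3.
    chi[k]       : criticality of action k (1-indexed; chi[0] unused)
    cost(j, k, l): worst-case cost C_{jk}^l of executing node k in
                   mode l after node j (hypothetical callback)
    Returns b with b[k][l] for k in 0..K and l in 1..L."""
    K = len(chi) - 1
    b = [[0.0] * (L + 1) for _ in range(K + 1)]  # b[0][l] = 0 for all l
    for k in range(1, K + 1):
        def h(l):  # earliest possible node able to skip to node k
            cands = [n for n in range(1, k) if chi[n] >= l]
            return max(cands) if cands else 0   # convention max {} = 0
        for l in range(1, L + 1):
            if l == 1:                          # Eq. (5): no skips in mode 1
                b[k][1] = b[k - 1][1] + cost(k - 1, k, 1)
            elif l <= chi[k]:                   # Eq. (6): skips from j >= h_k^l
                b[k][l] = max(b[j][l] + cost(j, k, l)
                              for j in range(h(l), k))
            else:                               # Eq. (7): mode change mid-execution
                hk = h(chi[k])
                b[k][l] = max(b[j][chi[k]] + cost(j, k, l)
                              for j in range(hk, k))
    return b
```

On a toy plan, the computed table also exhibits the monotonicity of Lemma 2: for each node, the accumulated cost never decreases with the criticality level.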

4.5 (MC)2TS: Complexity Analysis

An important consideration is how much (MC)2TS impacts the computation time of MCTS. MCTS has previously been successfully applied to real-time environments [8, 15, 21]. Therefore, if (MC)2TS has a similar execution time, its applicability is guaranteed. We first evaluate the time complexity of MCTS in our application. MCTS executes I iterations, each one running the four phases. Let d be the current depth of the tree, n the number of possible actions, h the simulation horizon, m the number of different scenarios simulated at each simulation phase, and c the complexity of computing the accumulated worst-case costs for one node.

In the selection phase, we traverse d nodes in the tree, making n − i comparisons at each node i, yielding a time complexity of O_MCTS^sel = 𝒪(Σ_{i=0}^{d−1} (n − i)) = 𝒪(dn − d(d−1)/2). The expansion phase has constant time complexity. In the simulation phase, we run one simulation traversing d nodes while computing costs, then m simulations traversing h nodes, yielding a time complexity of O_MCTS^sim = 𝒪((d + hm)·c). Moreover, the rollout can be trivially parallelized, so on a processor with C cores the rollout term becomes (hm/C)·c and O_MCTS^sim = 𝒪((d + hm/C)·c). In the backpropagation phase, we traverse the tree backwards through d nodes, with complexity O_MCTS^bck = 𝒪(d). Since d is bounded by the number of actions n, as a completed objective will not be executed again, the complexity of one MCTS iteration in this application is:

O_MCTS^sim = 𝒪((n + hm/C)·c) (8)
O_MCTS = 𝒪(2n² + cn + c·hm/C) (9)

The difference in complexity between MCTS and (MC)2TS lies in the complexity c of computing the worst-case accumulated costs for a node, and thus in the complexity of the simulation phase. For MCTS, we can consider c as 𝒪(1), and O_MCTS^sim becomes 𝒪(n + hm/C). As for (MC)2TS, we have to compute every possible cost C_{ji}^ℓ to compute the accumulated costs. Each action x_i of criticality χ_i in an action sequence has L possible completion modes. Following each completion in mode ℓ, the next action to be executed will be a different action x_j with a criticality χ_j ≥ ℓ, whose costs in modes (ℓ, ℓ+1, …, L) must be computed. Modes lower than ℓ are not concerned: when an action completes in mode ℓ, actions of criticality inferior to ℓ are skipped, and completing in lower modes is inevitably less costly. Therefore, at each node of the chosen branch, Σ_{ℓ=0}^{L−1} (L − ℓ) = (L² + L)/2 costs will be computed throughout the execution, and the time complexity O_(MC)2TS^sim becomes:

O_(MC)2TS^sim = 𝒪(n·(L² + L)/2 + (hm/C)·(L² + L)/2) = 𝒪(n·L²/2 + (hm/C)·L²/2) (10)
O_(MC)2TS = 𝒪(2n² + n·L²/2 + (hm/C)·L²/2) (11)

This analysis leads to three conclusions. First, the complexity of both MCTS and (MC)2TS depends quadratically on the number of objectives, n². Indeed, L typically ranges between 2 and 3. While some systems, such as those complying with the DO-178C standard for avionics [2], may use up to 5 criticality levels, larger values are uncommon. As a result, in most systems, n will be significantly larger than L², making 𝒪(2n²) the dominant factor in the time complexity of the algorithm. Second, the complexity of (MC)2TS depends quadratically on the number of criticality levels, L², meaning the increase in time complexity is driven by the value of L. Finally, given the existence of efficient real-time MCTS implementations, our complexity analysis demonstrates the feasibility of a real-time (MC)2TS implementation.
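The per-node cost count used in this derivation can be checked numerically with a small sanity sketch:

```python
def costs_per_node(L):
    """Number of worst-case costs computed per node of the chosen
    branch: for each completion mode, the next action's costs in that
    mode and all higher modes must be computed, i.e.
    sum_{l=0}^{L-1} (L - l) = (L^2 + L) / 2."""
    return sum(L - l for l in range(L))
```

With L = 2 or 3 as in typical systems, the count stays small (3 or 6 costs per node), confirming that n² dominates the complexity in practice.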

4.6 (MC)2TS: Resource Consumption, Mission and Objective Constraints

Our model integrates budget constraints at both the robot level (e.g., mission deadline) and the objective level (e.g., objective deadlines). Our analysis has so far focused on robot-centric resource budget constraints, particularly battery capacity and the mission deadline. The latter timing constraint is enforced during planning by ensuring the accumulated worst-case time cost (b_k^lo[time]) never exceeds the mission deadline, effectively treating it as a resource budget constraint.

However, objective-level timing budgets are equally crucial. For example, a sensor may exhaust its energy or storage before the mission deadline. To address this, system designers can impose per-objective time budget constraints, requiring each objective to be completed by its specified deadline. Missing this deadline could result in the loss or degradation of the reward associated with achieving that objective.

Within the (MC)2TS framework, objective timing constraints follow the same budget overrun management approach as mission timing constraints. For each objective x_i, the robot operates within its allocated worst-case time budget (b_i^lo[time]). When this budget is exhausted – indicating a deadline miss – the system transitions to hi-mode. In this state, lo-objectives are automatically discarded while hi-objectives remain active, with the system ensuring their completion by reallocating time originally designated for lower-priority objectives.

However, discarding objectives because they are classified as lo-objectives might be seen as a disproportionate response, while categorising them as hi-objectives could be viewed as an inappropriate design choice. The MC approach proposes alternatives to the discard model, such as the elastic task model [6] or the weakly hard degradation model [10]. In the case of the elastic task model, when the system switches modes, lo-tasks in hi-mode have their period or deadline lengthened, freeing up resources for hi-tasks. In the same way, a weakly hard system will harvest fewer results in order to use fewer resources.

While our approach can be extended to incorporate mechanisms such as elastic deadlines or weakly hard constraints, this extension remains outside the scope of the current paper. Future work could explore its integration and assess its impact on system performance. In the next paragraph, we provide to the interested reader the main ideas on how to incorporate these alternative approaches into our model.

Since robots and objectives are not inherently connected in our framework, a robot detecting a resource overrun cannot directly instruct an objective to modify its behavior. Instead, the objective itself may transition to a degraded mode when approaching or missing its deadline. For example, upon reaching its initial deadline, an objective could autonomously adopt weakly hard constraints or extend its deadline. The robot can be preconfigured with knowledge of this degradation mechanism, eliminating any need for real-time communication between the robot and objectives. To discourage degraded behavior, any objective that transitions to it must receive a corresponding reduction in reward. Importantly, the revised reward value must still satisfy the criticality hierarchy constraint: it must remain strictly greater than the sum of rewards for all lower-criticality objectives (Section 4.2).

5 Evaluation

In this section, we evaluate the performance of our algorithm in a data collection scenario with regard to the research objectives listed in Section 3:

  • How much does (MC)2TS improve the resilience of the system by increasing the number of completed hi-objectives when actual resource costs match those expected in hi-mode while guaranteeing resource constraints?

  • How much does (MC)2TS reduce system oversizing by increasing the number of completed lo-objectives when actual costs match those expected in lo-mode while guaranteeing resource constraints?

  • How much does (MC)2TS improve the robustness of the system by not altering the results of high criticality levels when lower levels change?

We evaluated our solution regarding the research objectives in 6 simulations:

  • Maximize objectives under pessimistic assumptions (Section 5.3): we compare our solution to an MCTS implementation that makes optimistic assumptions about costs, maximizing the number of objectives included in the plan and increasing efficiency. This experiment should demonstrate how considering only one possible cost per action can lead to the system being unsafe when actual costs match the pessimistic estimates.

  • Maximize objectives under optimistic assumptions (Section 5.4): we compare our solution to an MCTS implementation that makes pessimistic assumptions about costs during the plan building. Indeed, due to the safety of the system, every lo or hi-objective in the plan can be executed during the mission even when actual costs match those expected in hi-mode. This experiment should demonstrate how considering only one possible cost per action can lead to the system being oversized when actual costs match the optimistic estimates.

  • Maximize objectives under middle case assumptions (Section 5.5): we compare our solution to MCTS implementations that consider costs in a range between the optimistic costs and the pessimistic costs.

  • Enforce objectives timing constraints (Section 5.6): we evaluate our solution’s ability to ensure the deadline of hi-objectives and the impact of this constraint in the efficiency of the plan.

  • Minimize impact of low criticality objectives on higher critical ones (Section 5.7): we compare our solution in two situations, one considering 2 and another considering 4 criticality levels. We evaluate the influence of adding intermediary levels of criticality with new objectives.

  • Computational budget influence (Section 5.8): we evaluate whether our solution’s performance, when compared to MCTS, is affected by the computational budget.

5.1 Problem specification

Let us consider a scenario in which a drone is deployed to collect data from a set of sensors distributed across an agricultural field. The drone operates under constraints related to both energy consumption and operating time (flight duration or mission deadline). Each of the sensors can be visited independently of the others. Since some sensors are approaching battery depletion, it is crucial to prioritize retrieving their data before they lose power. However, the energy and time required for navigation are subject to stochastic variations due to potential environmental disturbances, such as wind conditions or unforeseen obstacles that may impact the drone’s motor efficiency.

The energy and time consumption associated with movement exhibit a lower bound under ideal conditions but increase as uncertainties arise. Specifically, the more adverse the conditions, the greater the expected cost. To model this variability, we represent the movement cost as a random variable following a half-normal distribution. The minimum possible cost is defined as half of the worst-case cost in lo-mode, denoted by C^lo. The standard deviation of this distribution is environment-dependent: under optimistic operating conditions, it is set to C^lo/10, whereas in more challenging scenarios, it is increased to C^lo/3, thereby increasing the probability of incurring higher movement costs.
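This cost model can be sampled as below; the half-normal draw is obtained as the absolute value of a zero-mean Gaussian, and the function name is illustrative:

```python
import random

def sample_move_cost(c_lo, optimistic=True):
    """Sample a movement cost: a minimum of c_lo / 2 plus a
    half-normal deviation whose spread depends on conditions
    (sigma = c_lo/10 when optimistic, c_lo/3 when adverse)."""
    sigma = c_lo / 10 if optimistic else c_lo / 3
    return c_lo / 2 + abs(random.gauss(0.0, sigma))
```

By construction, no sample falls below the c_lo/2 floor, and adverse conditions yield higher costs on average.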

5.2 Experiment setup

The environment is represented as a 100×100 grid, within which the robot can navigate freely. The movement and data retrieval costs considered in this study are presented in Table 1. A unit of movement corresponds to the side length of a single tile in the grid. In hi-mode, each hi-objective is permitted to consume up to twice the previously allocated budget, while lo-objectives are discarded. Energy costs are expressed as a percentage of the total battery capacity. To investigate the impact of resource constraints, the maximum energy budget is set to 60% of the total battery level. The decision-making process employs a random rollout policy in conjunction with the Upper Confidence Bound for Trees (UCT) [20] selection policy – a best-first approach that computes an upper confidence bound to evaluate the optimality of selected nodes.

The implemented action set consists of navigating to an objective location and retrieving data from the sensor at that location. The computation budget, defined as the number of times the Monte Carlo Tree Search (MCTS) executes the selection phase, is set to 600. The horizon, representing the maximum sequence length of actions tested during rollouts, is fixed at 5. Furthermore, the C parameter in UCT, which governs the balance between exploration of alternative paths and exploitation of the best current path, is set to 0.5. The cost of an action is calculated based on the distance between the current objective location and the next objective.
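The UCT selection rule with C = 0.5, as configured above, can be sketched as:

```python
import math

def uct_score(total_value, visits, parent_visits, c=0.5):
    """Upper Confidence Bound for Trees: the child's average value
    plus an exploration bonus that shrinks as the child is visited.
    Unvisited children are always selected first."""
    if visits == 0:
        return math.inf
    return (total_value / visits
            + c * math.sqrt(math.log(parent_visits) / visits))
```

During the selection phase, the child maximizing this score is followed at each node, balancing exploitation of the best current path against exploration of alternatives.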

The robot initiates its trajectory from one corner of the grid and is expected to reach the diagonally opposite corner. To demonstrate the effects of a mode change, replanning occurs every two actions, allowing the robot operating under (MC)2TS to execute portions of the mission in hi-mode. This prevents an immediate return to lo-mode after each mode transition, which would occur if replanning were performed after every action. A total of 50 distinct scenarios were generated, each comprising 15 randomly distributed objective locations, 4 of which are designated as high-critical objectives and 11 as low-critical objectives. Both MCTS and (MC)2TS were executed 100 times per scenario, and performance was analyzed under varying time budget constraints.

The objective function considered is

g(𝒙, t) = (Σ_{i∈𝒙} r_i) / (Σ_i r_i) − (t / B[time]) × 10⁻⁴ (12)

where 𝒙 is the set of actions already completed, ri is the reward for completing action xi, t is the time consumed from the beginning of the mission until the end of the last action and B[time] is the total time budget available (or the mission deadline).

The t/B[time] expression is used to prioritize plans that achieve the same objectives in less time. The 10⁻⁴ weight ensures this value is always below the reward of achieving an objective. The Σ_i r_i factor ensures the total value is always below 1, which is necessary for UCT. For (MC)2TS, we use b^lo[time] as t, as it is the accumulated cost in lo-mode. The reward for reaching the recharge site is 1.0, while the reward for performing any other hi-action is 0.2 and for completing any lo-objective 0.0166. Note that ∀x_i ∈ 𝒙^hi, ∀x_j ∈ 𝒙^lo: r_j < r_i, as hi-objectives are our priority. As reaching the recharge site is the main hi-objective we want to ensure is in the plan, its reward is greater than the sum of the rewards of the other hi-objectives.
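Equation (12) translates directly into code; the action names below are illustrative, while the reward values are those of our setup:

```python
def g(completed, rewards, t, time_budget):
    """Objective function of Eq. (12): normalized reward of the
    completed actions minus a small tie-breaking time penalty.
    `completed` is the set of completed action names; `rewards`
    maps every action to its reward."""
    return (sum(rewards[i] for i in completed) / sum(rewards.values())
            - (t / time_budget) * 1e-4)
```

With this normalization, g stays below 1, and among plans completing the same objectives the faster plan scores slightly higher.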

We want to assess each algorithm’s ability to find a plan where the robot retrieves data from objective points and reaches the charging site before exhausting its allocated budget. A robot can fail to reach the recharging site due to unexpected high action costs. In this situation, the number of achieved objectives will be counted as zero.

Table 1: Action costs.

Action           | C^lo [duration] | C^lo [energy] | C^hi [duration] | C^hi [energy]
Move (one unit)  | 2.0             | 0.1%          | 4.0             | 0.2%
Retrieve data    | 5.0             | 1.0%          | 10.0            | 2.0%

5.3 Maximize objectives under pessimistic assumptions

Figure 5: MCTS plans built with optimistic cost assumptions but executed under pessimistic costs.

We evaluate the performance of (MC)2TS in comparison to MCTS, where the latter is configured to generate plans that optimize the number of actions under optimistic assumptions. As the plans do not guarantee resource constraints in the most pessimistic scenarios, these plans may fail to ensure a safe return to the charging site.

If we consider our efficiency goal, for which we try to optimize the optimistic case, this MCTS configuration produces plans achieving a greater number of objectives. Consequently, if we compare its performance with that of (MC)2TS under conditions where the costs in mode lo are never exceeded, it will often achieve more objectives. However, if we consider conditions where the costs in mode lo are exceeded, the MCTS configuration generates unsafe plans that require more resources than are available. This is relevant for our resilience goal. Consequently, we evaluate how often MCTS plans fail to handle such situations. We also evaluate the number of total objectives achieved by (MC)2TS versus MCTS during these missions.

The experimental results are presented in Figure 5. The first graph shows that optimistic MCTS frequently fails midway through mission execution under lower budgets, significantly reducing its effectiveness. In contrast, as illustrated in the second graph, (MC)2TS achieves more objectives even with higher budgets. This is attributed to its capability to revert to lo-mode once cumulative costs fall below optimistic assumptions. Consequently, (MC)2TS provides a safer and more reliable execution under pessimistic cost conditions.

5.4 Maximize objectives under optimistic assumptions

We evaluate the performance of (MC)2TS in comparison to MCTS, where the latter is configured to exclusively generate plans that never exceed the allocated resource budget.

When focusing on our safety objective, considering only conditions where actual costs are pessimistic, this MCTS configuration is optimal, as it is designed never to surpass the allocated budget. Indeed, every action in the plan will always be executed. In the most pessimistic scenario, (MC)2TS will drop every lo-objective and execute a plan that only contains hi-objectives. It will compute the accumulated cost as the sum of C^hi[r], just as the pessimistic MCTS configuration does. Therefore, under conditions where actual costs are pessimistic, both approaches will execute similar numbers of hi-objectives, making them equal when it comes to safety.

When considering an optimistic situation and our efficiency objective, MCTS may execute fewer lo-objectives than (MC)2TS due to its pessimism. Thus, we evaluate the number of objectives achieved by (MC)2TS compared with that of MCTS by simulating situations where the budget in lo-mode is never exceeded.

Figure 6: MCTS plans built on pessimistic costs executed under optimistic costs.

The experimental results, shown in Figure 6, reach a peak in values due to the constrained energy budget. In this scenario, where MCTS is configured to generate plans exclusively based on pessimistic costs while operating under optimistic resource costs, (MC)2TS outperforms MCTS when the budget is insufficient to achieve all objectives. This is because (MC)2TS maintains an accumulated cost closer to that of execution under optimistic actual costs, allowing it to explore more action sequences by leveraging the previously conserved budget. As a result, (MC)2TS makes less pessimistic assumptions about the environment, effectively reducing system oversizing and improving overall efficiency.

5.5 Maximize objectives under middle case assumptions

As shown previously, (MC)2TS outperforms MCTS when MCTS uses either optimistic or pessimistic costs. In this subsection, we evaluate whether MCTS can outperform (MC)2TS when considering costs that lie between these two extremes. Intermediate costs may offer a balanced trade-off, ensuring safety while maintaining efficiency. Such costs would be less pessimistic, preserving a degree of efficiency while providing the robot with a safety margin sufficient for most scenarios. To cover this possibility, we conducted the same simulations as those used for optimistic MCTS, but compared them with an MCTS implementation that incorporates higher costs for movement and data retrieval. We implemented the same simulation of a mode change used in the optimistic MCTS simulation to give these MCTS implementations a higher chance of being safe.

We simulated 4 costs considered by MCTS in the range between C^lo and C^hi for both time and energy resources. The costs were rounded so that the time cost is always an integer.

Figure 7: MCTS plans built on middle-case costs executed under pessimistic costs. Each MCTS number represents the cost for retrieving data considered.

The experimental results are displayed in Figure 7. While increasing the considered cost makes MCTS less unsafe, no cost configuration below the pessimistic threshold makes it entirely safe. Additionally, for budgets exceeding 800, (MC)2TS consistently achieves more objectives than MCTS. Thus, even MCTS implementations with less pessimistic assumptions fail to provide a better balance between safety and performance.

5.6 Enforce objective timing constraints

To assess the impact of respecting the deadlines of hi-objectives, which increases the resilience of (MC)2TS's plan building, we simulate a scenario in which each hi-objective has a deadline of 500 time units. We compare it to the plan generated by (MC)2TS for a scenario where no objective deadlines are imposed.

Figure 8 shows the distribution of the objectives on the map used. The hi-objectives are agglomerated close to the charging station. The robot must therefore hasten to complete these objectives first to avoid missing their deadlines in the pessimistic case. The lo-objectives are separated into two groups at opposite corners of the grid. In this scenario, the robot needs a higher budget to visit all lo-objectives than in the scenario where hi-objectives do not have any deadlines.

Figure 8: Objective distribution in the deadline scenario. The blue X represents the starting point, while the red X represents the finishing point. hi-objectives are represented as red circles while lo-objectives are green circles.
Figure 9: Simulation with hi-objective deadlines. The simulations considered an optimistic environment.

The experimental results are shown in Figure 9. At lower budgets, the robot only completes the hi-objectives, as it lacks the budget to return and complete lo-objectives afterwards. As the budget increases, the robot becomes capable of visiting one group of lo-objectives after completing the hi-objectives, and both cases complete every objective given enough budget. However, from a time budget of 800 onward, the robot in the scenario with deadlines completes fewer objectives. This is due to the robot having to traverse a longer path to respect the hi-objective deadlines.

Therefore, we successfully guaranteed the completion of objectives before a deadline, increasing the resilience of the system. However, respecting objective deadlines even in hi-mode during planning may produce worse results under conditions where the execution is always in lo-mode. Thus, respecting deadlines in hi-mode should be reserved for hi-actions. For lo-actions, it is more coherent to respect deadlines only in lo-mode, even though they may execute in hi-mode.

5.7 Minimize impact of low criticality objectives on higher critical ones

To evaluate the effects of adding objectives of lower criticality levels, we compared (MC)2TS with L=4 and with L=2 criticality levels. To achieve our robustness goal, we expect that the addition of actions of criticality levels 1 to 3 should not change the average results at level 4.

To verify this result, we simulated both (MC)2TS implementations in random scenarios where each criticality level has 4 objectives. The simulations were conducted under conditions where resource costs are optimistic estimates and the battery budget was set to 100%.

Figure 10: Simulation with 2 and 4 levels of criticality. hi represents actions of the highest level of criticality.

The results are shown in Figure 10. When considering the different criticality levels, increasing the budget never reduces the average number of objectives achieved at any criticality level. The average numbers of objectives of highest criticality achieved are shown side by side. As the time budget increases, more objectives are achieved in both configurations. Therefore, the possibility of completing more objectives of lower criticality does not reduce the number of objectives completed at the highest level. Moreover, this holds for both implementations of (MC)2TS at every time budget. This demonstrates the robustness of our algorithm, as adding actions of lower criticality does not influence the results for higher criticality levels.

However, the computational overhead increases with the number of criticality levels. Therefore, if replanning is done often, this might impact the overall performance of the robot when L=4, especially if computing costs is computationally heavy. Thus, adding more criticality levels should be reserved for systems where computing the extra costs will not significantly impact the planning time.

5.8 Computational budget influence

An essential consideration is whether the results from previous experiments remain consistent across different computational budgets and how this value impacts the comparison between the heuristics. To evaluate possible degradation, we simulate the scenario using pessimistic plans with a fixed time budget of 600 while varying the computational budget.

Figure 11: Simulation varying the computational budget under optimistic actual costs.

The experimental results are shown in Figure 11. As the computational budget increases, the algorithms are capable of finding better plans. With higher budgets, both algorithms degenerate into a standard tree search and reach a peak number of objectives. As (MC)2TS's cost assumptions are more optimistic, it consistently achieves more objectives on average than pessimistic MCTS. Therefore, the benefits of (MC)2TS are achievable with any computational budget.

5.9 Conclusions

(MC)2TS outperforms optimistic MCTS by achieving more objectives across varying budgets, leveraging its ability to revert to lo-mode when costs match those of lo-mode, ensuring safer and more reliable execution under conditions where resource costs match those of hi-mode. (MC)2TS also outperforms pessimistic MCTS by maintaining costs closer to the optimistic ones, enabling exploration of more action sequences through conserved resources, reducing execution under hi-mode conditions, minimizing oversizing, and enhancing efficiency. Moreover, even under less pessimistic assumptions, (MC)2TS successfully provides a better solution by effectively balancing safety and performance. Therefore, our solution is better adapted to uncertain environments than MCTS, whether it is configured with an adjusted objective function or built only on pessimistic assumptions. (MC)2TS also monitors multiple uncertain resources, such as energy and action duration, while ensuring that critical objective deadlines are respected. Finally, (MC)2TS demonstrates its robustness, as incorporating lower-criticality actions does not compromise the results for higher-criticality levels.

6 Related Works

In this section, we explore the literature on MCS and MCTS, as these two domains play a key role in our proposed framework for optimizing robot operations.

6.1 Mixed-Criticality Systems

In mixed-criticality systems, researchers have focused on Real-Time Scheduling [5]. As we consider decision-making heuristics, we focus on the different system models rather than on the MC scheduling algorithms themselves.

The majority of the mixed-criticality models consider a system with two criticality levels, even though the original paper by Vestal [30] generalizes the approach to a system with multiple levels of criticality. However, many papers extend their approaches to multiple levels, as it is important for common multi-criticality avionics and automotive standards [4]. Fleming extended the Adaptive Mixed Criticality scheduling methods (AMC-rtb and AMC-max) to an arbitrary number of criticality levels [9]. When tasks are non-preemptive, similarly to our model, Novak et al. proposed a static scheduling approach to minimize the schedule makespan when the system has an arbitrary number of criticality levels [24].

Two mechanisms are usually considered to address a resource failure event: discarding the lo-tasks in hi-mode, or degrading the quality of lo-tasks [4]. Gu et al. criticized the strategy of discarding lo-tasks in hi-mode [12] and proposed an unused-budget reclamation scheme; though resource-efficient to some extent, it retains a pessimistic outlook that limits the overall efficiency potential. A few studies have proposed to guarantee a minimum service level for lo-tasks in hi-mode. Liu et al. propose continuing to execute lo-tasks in hi-mode but with a smaller WCET [23]; however, such an alternative is outside the scope of our case study. Several works also propose increasing the tasks’ periods in hi-mode [18, 28, 6], but our system is not periodic.
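The two lo-task handling mechanisms above can be contrasted with a small sketch. The task records, field names, and WCET values are hypothetical; this does not reproduce any cited scheduler, only the two policies in isolation.

```python
# Illustrative contrast (hypothetical tasks and values, not a cited
# scheduler) of the two usual lo-task policies on a switch to hi-mode:
# discard the lo-tasks entirely, or keep them with a degraded (smaller)
# WCET budget in the spirit of Liu et al. [23].

def discard_policy(tasks):
    """In hi-mode, keep only hi-criticality tasks."""
    return [t for t in tasks if t["crit"] == "hi"]

def degrade_policy(tasks):
    """In hi-mode, keep lo-tasks but shrink their WCET budget."""
    out = []
    for t in tasks:
        if t["crit"] == "lo":
            t = {**t, "wcet": t["degraded_wcet"]}  # copy with reduced budget
        out.append(t)
    return out

tasks = [
    {"name": "nav",   "crit": "hi", "wcet": 10},
    {"name": "photo", "crit": "lo", "wcet": 6, "degraded_wcet": 2},
]

print([t["name"] for t in discard_policy(tasks)])  # ['nav']
print([t["wcet"] for t in degrade_policy(tasks)])  # [10, 2]
```

The discard policy frees the most budget for hi-tasks but loses all lo-task service; the degraded policy preserves a minimum service level at the cost of a smaller reclaimed budget.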

In the context of energy-aware mixed-criticality systems, some studies propose Dynamic Voltage and Frequency Scaling (DVFS), or DVFS combined with Earliest Deadline First with Virtual Deadlines (EDF-VD) scheduling [23, 17]. While these aim to reduce energy consumption by gracefully degrading lo-tasks in hi-mode, they lack inherent prioritization of hi-tasks over lo-tasks: DVFS serves merely as a mechanism for degrading the execution of lo-tasks. Our approach differs in that it considers energy as an uncertain parameter of the problem, not just as a resource to be managed. We therefore treat energy (as well as action duration) as a first-class citizen within the MC system, on an equal footing with execution time.

6.2 Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) has proven pivotal in addressing complex decision-making problems such as gaming, planning, optimization, and scheduling [16, 3]. Its exploration-exploitation properties make it particularly valuable in path-planning applications. Dam et al. extended Monte Carlo methods to improve exploration strategies for robot path planning [8]. Kartal et al. proposed a hybrid approach combining MCTS with Branch and Bound for a multi-robot action allocation problem with time windows and capacity constraints [19]. These constraints and time windows, however, only consider a single lower and upper bound, which differs from the MC approach.

Most existing studies consider a known environment where costs are fixed. For dynamic environments, Patten et al. proposed using different samples of the robot’s current knowledge to simulate future events [25]; however, their method does not consider the uncertainties associated with the robot’s actions. Unlike approaches that consider the uncertainty of an action’s rewards [3, 29, 7, 22], we specifically focus on the uncertainties associated with the robot’s resources, introducing the novel aspect of uncertain costs.

Additionally, we highlight mode-change and replanning in (MC)2TS and demonstrate that they are complementary techniques that increase safety and efficiency. This underlines the adaptability of our approach, even in the face of future cost changes.

7 Conclusions and Perspectives

In this article, we address the challenge of action planning under uncertain action costs and criticality of objectives, a common issue in complex critical systems where cost overestimation often leads to oversized systems, wasted resources, and reduced performance. To tackle this, we reformulate the Mixed-Criticality (MC) approach, originally from the real-time community, in two key ways. First, we generalize it to handle multiple resources, such as energy and action duration, beyond its initial focus on execution time. Second, we adapt it for use in Monte Carlo Tree Search (MCTS) rather than Real-Time Scheduling.

We introduce (MC)2TS, an extension of MCTS tailored for mixed-criticality systems. This involves adapting the action planning system model to the MC framework and extending cost evaluation to align with the different criticality modes of MC systems. Our evaluation demonstrates significant advantages in reducing system oversizing while handling uncertain costs. Using an active perception example, we show that (MC)2TS enables robots to anticipate environmental changes and adapt cost estimates dynamically.

(MC)2TS outperforms both optimistic and pessimistic MCTS by achieving more objectives across varying budgets: it reverts to lo-mode when costs drop back to optimistic assumptions and keeps overall costs close to the optimistic estimates. The resources thus conserved allow more action sequences to be explored, reduce reliance on pessimistic assumptions, minimize oversizing, and enhance efficiency. (MC)2TS effectively balances safety and performance, even under less pessimistic assumptions, while monitoring uncertain resources such as energy and action duration and respecting critical deadlines. Additionally, its robustness ensures that incorporating lower-criticality actions does not compromise higher-criticality outcomes, making it a reliable solution for complex systems with uncertain resource costs and critical objectives.

In future work, we plan to extend our model to robot fleets and other autonomous, self-aware systems [11], enhancing distributed planning with the MC approach. We aim to design (MC)2TS as a real-time service and validate its schedulability using tools like Cheddar [27], demonstrating its effectiveness in real-world environments. Beyond energy and time, an interesting perspective would be to incorporate additional resource constraints, such as memory usage, for both robotic systems and sensor networks. Another promising direction would be to extend our approach to other planning paradigms, such as dynamic adaptive policy pathways [13], combined with the mixed-criticality approach. This work paves the way for more adaptable, reliable, and efficient solutions in safety-critical and resource-sensitive applications.

References

  • [1] Sanjoy Baruah. Mixed-criticality scheduling theory: Scope, promise, and limitations. IEEE Design & Test, 35(2):31–37, 2018. doi:10.1109/MDAT.2017.2766571.
  • [2] Benjamin Brosgol. DO-178C: the next avionics safety standard. ACM SIGAda Ada Letters, 31(3):5–6, November 2011. doi:10.1145/2070336.2070341.
  • [3] Cameron B. Browne et al. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.
  • [4] Alan Burns and Sanjoy Baruah. Towards a more practical model for mixed criticality systems. In Workshop on Mixed-Criticality Systems (colocated with RTSS), 2013.
  • [5] Alan Burns and Robert Davis. A survey of research into mixed criticality systems. ACM Computing Surveys, 50:1–37, November 2017. doi:10.1145/3131347.
  • [6] G.C. Buttazzo, G. Lipari, and L. Abeni. Elastic task model for adaptive rate control. In 19th IEEE Real-Time Systems Symposium, pages 286–295, 1998.
  • [7] Alberto Castellini, Enrico Marchesini, Alessandro Farinelli, et al. Online monte carlo planning for autonomous robots: Exploiting prior knowledge on task similarities. In AIRO@ AI* IA, pages 25–32, 2019.
  • [8] Tuan Dam, Georgia Chalvatzaki, Jan Peters, and Joni Pajarinen. Monte-carlo robot path planning. Robotics and Automation Letters, 7(4):11213–11220, 2022. doi:10.1109/LRA.2022.3199674.
  • [9] Thomas Fleming. Extending mixed criticality scheduling. PhD thesis, University of York, 2013. URL: https://etheses.whiterose.ac.uk/id/eprint/5688/1/ExtendingMixedCriticalityScheduling.pdf.
  • [10] Oliver Gettings, Sophie Quinton, and Robert I. Davis. Mixed criticality systems with weakly-hard constraints. In Proceedings of the 23rd International Conference on Real Time and Networks Systems, RTNS ’15, pages 237–246, New York, NY, USA, 2015. Association for Computing Machinery. doi:10.1145/2834848.2834850.
  • [11] Holger Giese et al. Architectural Concepts for Self-aware Computing Systems, pages 109–147. Springer, 2017. doi:10.1007/978-3-319-47474-8_5.
  • [12] Xiaozhe Gu and Arvind Easwaran. Dynamic budget management and budget reclamation for mixed-criticality systems. Real-Time Systems, 55, July 2019. doi:10.1007/S11241-019-09330-2.
  • [13] Marjolijn Haasnoot, Jan H. Kwakkel, Warren E. Walker, and Judith ter Maat. Dynamic adaptive policy pathways: A method for crafting robust decisions for a deeply uncertain world. Global Environmental Change, 23(2):485–498, 2013. doi:10.1016/j.gloenvcha.2012.12.006.
  • [14] Zaharuddeen Haruna, Muhammed Bashir Mu’azu, Abubakar Umar, and Glory Okpowodu Ufuoma. Path planning algorithms for mobile robots: A survey. In Motion Planning for Dynamic Agents. IntechOpen, 2023.
  • [15] Todd Hester, Michael Quinlan, and Peter Stone. Rtmba: A real-time model-based reinforcement learning architecture for robot control. In 2012 IEEE International Conference on Robotics and Automation, pages 85–90. IEEE, May 2012. doi:10.1109/icra.2012.6225072.
  • [16] Constantin Hofmann, Xinhong Liu, Marvin May, and Gisela Lanza. Hybrid monte carlo tree search based multi-objective scheduling. Production Engineering, 17:133–144, 2022.
  • [17] Pengcheng Huang, Pratyush Kumar, Georgia Giannopoulou, and Lothar Thiele. Energy efficient dvfs scheduling for mixed-criticality systems. In International Conference on Embedded Software (EMSOFT), pages 1–10, 2014. doi:10.1145/2656045.2656057.
  • [18] Pengcheng Huang, Pratyush Kumar, Georgia Giannopoulou, and Lothar Thiele. Run and be safe: Mixed-criticality scheduling with temporary processor speedup. In DATE, pages 1329–1334, 2015. URL: http://dl.acm.org/citation.cfm?id=2757122.
  • [19] Bilal Kartal, Ernesto Nunes, Julio Godoy, and Maria Gini. Monte Carlo tree search with branch and bound for multi-robot task allocation. In The IJCAI-16 Workshop on Autonomous Mobile Service Robots, 2016.
  • [20] Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In European conference on machine learning, pages 282–293. Springer, 2006. doi:10.1007/11871842_29.
  • [21] Sahar Leisiazar, Edward J. Park, Angelica Lim, and Mo Chen. An mcts-drl based obstacle and occlusion avoidance methodology in robotic follow-ahead applications. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 221–228. IEEE, October 2023. doi:10.1109/iros55552.2023.10342150.
  • [22] Wei Li et al. A self-learning monte carlo tree search algorithm for robot path planning. Frontiers in Neurorobotics, 17, 2023.
  • [23] Di Liu et al. EDF-VD scheduling of mixed-criticality systems with degraded quality guarantees. In Real-Time Systems Symposium (RTSS), pages 35–46, 2016.
  • [24] Antonin Novak, Premysl Sucha, and Zdenek Hanzalek. Scheduling with uncertain processing times in mixed-criticality systems. European Journal of Operational Research, 279(3):687–703, 2019. doi:10.1016/j.ejor.2019.05.038.
  • [25] Timothy Patten, Wolfram Martens, and Robert Fitch. Monte carlo planning for active object classification. Autonomous Robots, 42(2):391–421, 2018. doi:10.1007/S10514-017-9626-0.
  • [26] Wind River. ARINC 653 – an avionics standard for safe, partitioned systems. In IEEE Seminar, 2008.
  • [27] Stéphane Rubini et al. Scheduling analysis from architectural models of embedded multi-processor systems. SIGBED Rev., 11(1):68–73, February 2014. doi:10.1145/2597457.2597467.
  • [28] Hang Su and Dakai Zhu. An elastic mixed-criticality task model and its scheduling algorithm. In DATE, pages 147–152, 2013. doi:10.7873/DATE.2013.043.
  • [29] Maciej Swiechowski, Konrad Godlewski, Bartosz Sawicki, and Jacek Mandziuk. Monte carlo tree search: a review of recent modifications and applications. Artif. Intell. Rev., 56(3):2497–2562, July 2022. doi:10.1007/S10462-022-10228-Y.
  • [30] Steve Vestal. Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. In 28th IEEE International Real-Time Systems Symposium (RTSS 2007), pages 239–243, 2007. doi:10.1109/RTSS.2007.47.
  • [31] Reinhard Wilhelm et al. The worst-case execution-time problem—overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems (TECS), 7(3):1–53, 2008. doi:10.1145/1347375.1347389.