Transition Dominance in Domain-Independent Dynamic Programming
Abstract
Domain-independent dynamic programming (DIDP) is a model-based paradigm for dynamic programming (DP) that enables users to define DP models based on a state transition system. Heuristic search-based solvers have demonstrated strong performance in solving combinatorial optimization problems. In this paper, we formally define transition dominance in DIDP, where one transition consistently leads to better solutions than another, allowing the search process to safely ignore dominated transitions. To facilitate the efficient use of transition dominance, we introduce an interface for defining transition dominance and propose the use of state functions to cache values, thereby avoiding redundant computations when verifying transition dominance. Experimental results on DP models across multiple problem classes indicate that incorporating transition dominance and state functions yields a 5 to 10 times speed-up on average for different search algorithms within the DIDP framework compared to the baseline.
Keywords and phrases:
Dominance, Dynamic Programming, Combinatorial OptimizationCopyright and License:
2012 ACM Subject Classification:
Mathematics of computing Combinatorial optimizationSupplementary Material:
Software: https://github.com/domain-independent-dp/didp-rs/releases/tag/transition-dominance-cp25archived at
swh:1:dir:ec877b961363d47d3f015af89a623b7a571bc736
Funding:
This work was supported by the Australian Research Council OPTIMA ITTC IC200100009, a General Research Fund (CUHK14206321) by the Hong Kong University Grants Committee, a Direct Grant by CUHK, and the Natural Sciences and Engineering Research Council of Canada.Editors:
Maria Garcia de la BandaSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
Dynamic programming [2] is a classical approach to solving combinatorial optimization problems. While it is often the most efficient way of solving many problems, traditionally dynamic programming solutions are hard coded for each problem of interest. Domain-Independent Dynamic Programming (DIDP) [13] is a new paradigm to allow the statement of the Bellman equations defining a combinatorial opmisation problem, independent of the techniques used to solve these equations, separating the specification of the model from the solving, as in other complete solving approaches such as constraint programming (CP) and mixed integer programming (MIP). This separation enables easy experimentation with different search algorithms for the same problem domain and enables research on different DP models for a given problem. Empirical evaluation demonstrates the framework outperforms CP and MIP on multiple problem classes, closing a number of open problem instances [14].
The Dynamic Programming Description Language (DyPDL) [13] is a powerful modelling language for DIDP that facilitates the expression of dynamic programming (DP) models. Beyond the problem components, DyPDL also supports the modelling of redundant information, such as state dominance and dual bounds, to improve the performance of various solving approaches. In this paper, we extend this framework by formally defining transition dominance, redundant information that indicates that certain transitions can be safely ignored if a state satisfies a certain condition. By incorporating transition dominance into DIDP, we make these advanced techniques accessible in a solver-independent manner. Deriving dominances in different problem domains is a challenging task that requires a deep understanding of problem structures. Our case studies demonstrate that the insights of identifying new dominances can be applied to similar problem domains, enabling the discovery of transition dominance from common substructures. Our key contributions are as follows:
-
1.
We formally define the concept of transition dominance for dynamic programming models.
-
2.
We introduce a new component into DyPDL to allow users to model transition dominance effectively and state functions to avoid redundant computations.
-
3.
We present new transition dominances for several combinatorial optimization problems.
-
4.
We demonstrate that using transition dominance and state functions substantially improve the performance of various search algorithms with 5 to 10 times speed-ups.
2 Background
A DyPDL model is a tuple , where is a set of state variables, is the target state, is a set of transitions, is a set of base cases, and is a set of state constraints. A state variable can be an element, set, or numeric variable. A state is a tuple that assigns values to state variables in , where the value of a variable is denoted by .
Expressions are used in transitions, base cases, and state constraints to describe the computation of a value using state variables and constants. When an expression is evaluated given a state , it returns a value . A numeric expression returns a numeric value . It can refer to a numeric constant or variable, use arithmetic operations, and take the cardinality of a set expression. An element expression returns a nonnegative integer , while a set expression returns a set . A condition is a function mapping a state to a Boolean value . We say if , and for a set of conditions if for all . A condition can refer to a Boolean constant, compare two elements or numeric expressions, or check whether an element is included in a set. The conjunction and disjunction of two conditions are also conditions. Base cases in and state constraints in are conditions. When a base case is satisfied by a state, the state is called a base state, and the set of all base states is denoted as . We assume that a function is defined, which returns a numeric value given a base state .
A transition has effect which maps a state to another state , and cost which maps a real value and a state to a value . Preconditions are conditions on state variables, and is applicable in a state only if all preconditions are satisfied, denoted by .
Solving a DyPDL model requires finding a sequence of transitions with the optimal cost to transform the target state into a base state. Let be a sequence of transitions applicable from , i.e., and where and . Then, is an -solution if is a base state, and each with satisfy state constraints, and and with are not base states. If is a base state, we assume that an empty sequence is an -solution. The cost of an -solution is defined recursively as
-
if ,
-
otherwise,
An optimal solution for minimization is an -solution whose cost is less than or equal to the cost of any -solution. Let be a function of a state that returns if there are no -solution or the cost of an optimal -solution otherwise.
In this paper, we assume that a DyPDL model is both finite and acylic, and the cost expression satisfies the Principle of Optimality [2]. Under these assumptions, can be computed by Bellman equation [15]:
| (1a) | ||||
| (1b) | ||||
| else | (1c) | |||
State dominance is a type of redundant information useful for improving the solving efficiency of DyPDL models.
Definition 1.
A state dominates another state , i.e. , if and only if there exists an -solution for any -solution such that and .
In practice, it is challenging to detect and exploit state dominance, and therefore the DyPDL formulation focuses on its approximation defined by resource state variables with additional preference. A state is preferred over (or approximately dominates) another state , denoted as , iff for resource variables where greater is preferred, for resource variables where less is preferred, and for non-resource variables.
3 Transition Dominance in DIDP
In this paper, we enhance the framework of DIDP by introducing transition dominance. State dominance falls into the category of memory-based dominance rules [22], where an unexpanded state is compared with previously generated states and pruned if it is shown to be dominated. In contrast, transition dominance is a type of non-memory-based dominance that implies the existence of a better solution without requiring additional memory.
Definition 2.
Transition dominance at a state is a relation defined over such that dominates in , denoted as , implies that .
It is straightforward to show that transition dominance is a transitive relation. When a transition dominance relation in a state is irreflexive, i.e., , we denote the dominance relation by . If transition dominance relations in all states are irreflexive, removing dominated transitions for all states preserves the optimal cost for a DyPDL model.
Proposition 3.
Let be the set of non-dominated transitions for a state where is an irreflexive transition dominance relation in a state . The optimal cost of the DyPDL model is equivalent to that of .
Note that dominates in state , is different from state dominance . Transition dominance holds as long as , whereas state dominance between and requires . These two inequalities do not necessarily imply each other, as their relationship depends on the specific cost expressions. In practice, state dominance is approximately determined by comparing the values of state variables between two states, whereas transition dominance depends only on the values of variables within a single state.
| Job | Release | Processing |
|---|---|---|
| Job | Release | Processing |
|---|---|---|
Example 4.
Consider a sequencing problem with three jobs, where each job has a release time and a processing time, and can only be processed after its release time. The objective is to find a sequence of jobs that minimizes the total completion time. We can model this problem using a set variable to represent the remaining jobs and an integer resource variable to represent the current time. A transition for each job schedules the job next, with the precondition . The optimal value can be computed as follows:
where is the completion time for job given that the current time is . Figure 1 gives two example instances and their complete state transition graphs. Each edge in the transition graph is labeled with a transition and its corresponding value of . Base states are represented with a double-circle.
Figure 1(a) illustrates the effect of transition dominance, where pruned solutions are highlighted in blue. Transition dominates in state because the release time of job 2 is not earlier than the time when job 1 can be fully processed. Thus, directly selecting from is always preferable, leading to the pruning of all -solutions that start with . Note that transition dominance concerns solutions originating from a single state but differing in their initial transitions.
Figure 1(b) illustrates the effect of state dominance, where the processing time of job 1 is modified to 3, and so no longer dominates . Note that is a resource variable where a smaller value is preferred, which defines the state dominance criterion: if a state has the same set of remaining jobs as another state , but a smaller or equal current time, i.e., and , then dominates . A close examination reveals that dominates , dominates , and dominates . Assuming that the state graph is explored using depth-first search, proceeding from left to right, solutions passing through and , highlighted in red, are pruned by state dominance in this example. As shown, pruning by state dominance and by transition dominance operate differently.
4 Modelling Transition Dominance in DIDP
Theoretically, transition dominance is defined for each state, but transition dominance rules that are valid for a set of states satisfying a certain condition are more interesting in practice. In this section, we introduce two constructs to facilitate modelling transition dominance.
4.1 An Interface for Transition Dominance
Exploiting transition dominance avoids the generation of successor states by dominated transitions. A direct way to specify transition dominance is to add additional preconditions for transitions. Modelers first identify a condition on states such that if a state satisfies it, then dominates in , i.e., . Then, the negation is added to the preconditions for the dominated transition , which states that is applicable only if either transition is not applicable or the condition for transition dominance is not satisfied. Otherwise, can be pruned since is a better applicable transition. In DP-based tools with non-declarative APIs like ddo [10] and CODD [20], transition dominance can be implemented by manually checking these preconditions during successor generation.
Example 5.
Consider the transition dominance in Example 4 again. If there are two jobs and such that the time after scheduling and consecutively is no greater than that after scheduling only in any state, then scheduling first is no worse than scheduling first. Formally, for any state , if is true, then taking at state is no worse than taking . To fully utilize the power of transition dominance, each transition should be compared against all other transitions to determine whether there is a transition which dominates . Therefore, we need to augment the precondition of with .
Directly modelling transition dominance through preconditions introduces two key challenges. First, it conflates the redundant information of transition dominance with the actual preconditions of transitions, making it difficult for solvers to treat these distinct model elements differently. Second, multiple conditions for dominance require careful handling of tie-breaking for symmetric transitions that dominate each other in a state.
Example 6.
Consider again the job sequencing problem in Example 4. If the current time is greater than the release times for two jobs and and , then transition dominates another transition . This is a special case of Proposition 16. However, while the dominance is valid if the jobs have equal processing time, adding preconditions for both jobs may make a satisfiable instance unsatisfiable as both transitions become inapplicable.
Multiple conditions for dominance may form reflexive transition dominance relations in some states, violating the requirement in Proposition 3 to preserve the optimality. To resolve this issue, modelers would need to carefully implement tie-breaking between symmetric transitions, adding unnecessary complexity to the model.
To address these challenges, we introduce a new function in DIDPPy, a Python interface for using DIDP, that allows users to explicitly define transition dominance.
The function model.add_transition returns a unique identifier for the newly added transition that can later be used to retrieve the transition and define transition dominance relationships.
The function model.add_transition_dominance is provided for this purpose and takes three arguments: the identifiers of the dominating and dominated transitions, and the condition for transition dominance.
The following shows a model for the problem in Example 5.
We formalize the entity defined with our interface as a transition dominance rule.
Definition 7.
A transition dominance rule is a triple such that .
Suppose is a set of transition dominance rules. The transition dominance graph for a state is a directed graph where the set of nodes is and a directed edge exists if with .
A transition dominance rule provides a partial description of transition dominance relations across all states. Using the transition dominance graph , we consider the transitive closure of the binary relation induced by a given set of transition dominance rules. Concretely, the binary relation over derived from is defined such that there is a path from to .
Proposition 8.
The relation derived from a transition dominance graph is a transition dominance relation for .
Proof.
For a pair of transitions such that , there exists a path in the transition dominance graph where and . By Definition 7, for . Thus, , which satisfy the requirement of a transition dominance relation. While irreflexivity is required to ensure there exists at least one optimal solution as shown in Proposition 3, a user may define symmetric transition dominance rules, e.g., and , such that there exists a state where . To enforce irreflexivity in the transition dominance relation derived from the transition dominance graph while pruning as many dominated transitions as possible, we identify all strongly connected components (SCCs) and construct a contracted transition dominance graph as follows.
Definition 9.
Given the transition dominance graph for a state , a directed graph is a contracted transition dominance graph if the set of nodes is , and edges are constructed with the following procedure:
-
1.
In each SCC of , select a representative node , and add edge to for each node in the SCC.
-
2.
For each edge in such that and are in different SCCs, add an edge to where and are the representative nodes of the SCCs to which and belong.
Proposition 10.
The binary relation derived from a contracted transition dominance graph is an irreflexive transition dominance relation.
Proof.
Let be the transition dominance graph for a state , and let be the contracted transition dominance graph. We show that if has a path from to , then also has a path from to . Then, by Proposition 8, the binary relation derived from is a transition dominance relation. Consider an arbitrary edge on a path from to in . We show that has a path from to , and thus, by concatenating such paths, we can find a path from to in . If and belong to the same SCC in , then there is a path from to in . If and belong to different SCCs in , then by the construction of , they represent their respective SCCs. Since contains the edge , there exists an edge in where belongs to the SCC of and belongs to the SCC of . Moreover, has a path from to and a path from to . Thus, a path from to exists in that includes .
Next, we show that is acyclic, which implies that the derived binary relation over is irreflexive. Assume for contradiction that contains a cycle, i.e., there exists a path from to and a path from to in . By the previous argument, must then contain a path from to and a path from to , meaning that and belong to the same SCC in . However, since and have outgoing edges in , they must be representative nodes of different SCCs, contradicting the fact that they are in the same SCC. Thus, cannot contain a cycle. Finally, we show the maximality of a contracted transition dominance graph, i.e., we cannot use more transition dominance rules while keeping irreflexivity.
Proposition 11.
Let be the transition dominance graph for a state and be a contracted transition dominance graph. Then, there does not exist an acyclic graph with the set of nodes satisfying all of the following properties:
-
1.
If has a path from to , then also has a path from to .
-
2.
If a node has an incoming edge in , then does so also in .
-
3.
There exists a node that has an incoming edge in but not in .
Proof.
Assume that an acyclic graph with the described properties exists. Based on the third property, let be a node that has an incoming edge in but not in . By construction of , must be the representative node of an SCC in . Let be the set of nodes in this SCC. In , each node does not have incoming edges from nodes in different SCCs; otherwise, an edge from a representative node in a different SCC to should have been added to .
In , since is the representative node of , each node has incoming edge . By the second property, in , each node has an incoming edge. Since also has an incoming edge in , all nodes in have incoming edges in . By the previous paragraph, an incoming edge to each must be from a node in . In , since each node in has an incoming edge from another node in , there must be a cycle, contradicting our assumption. Note that if two transitions dominate each other, the choice of which dominance to enforce can impact search efficiency. However, by definition, our model provides no basis for preferring one over the other. In practice, we use Tarjan’s algorithm to detect all SCCs and keep the first node visited by depth-first search as the representative node of each SCC.
Note that using the transition dominance interface and preconditions offer different advantages in terms of computational efficiency. Suppose that a transition is dominated by if and by if . Given a state with , the transition dominance interface requires evaluating and and performing SCC extraction to construct a contracted transition dominance graph. However, when transition dominance is modeled with preconditions, we can avoid evaluating once we detect . In contrast, using preconditions requires restricting conditions for transition dominance for tie-breaking, which may result in lost opportunities to exploit certain transition dominances within a state. Therefore, selecting the appropriate method depends on the trade-off between computational overhead and the effectiveness of transition dominance exploitation.
4.2 Avoiding Redundant Computation using State Functions
Modelling transition dominance typically requires pairwise comparisons of all applicable transitions of a state. To prevent the generation of dominated solutions, the condition should be evaluated for all pairs of transitions where potentially dominates . However, evaluating these conditions often results in redundant computations.
Example 12.
Consider the problem in Example 5 and its corresponding model in Listing 1.
The expression for all transitions, which is represented by the next_time array in the model, appears in the cost expression, the effects of the transition for job , and the condition for transition dominance when comparing with other transitions.
Suppose there are jobs in total.
The expression needs to be evaluated times when generating applicable transitions and removing dominated transitions.
By caching the values of such redundant evaluations, the number of times to evaluate such expression can be reduced to for each transition.
To address this observation, we introduce state functions in DyPDL. State functions are defined by expressions using state variables and can be used in preconditions, effects, and cost expressions, just as state variables. During the solving process, state functions are lazily computed and cached. The following shows the usage in the job sequencing example.
To use state functions in Listing 1, we define state functions next_time_sf for all transitions and replace all occurrences of dp.max(t, r[i]) + p[i] with next_time_sf[i].
When checking if the condition to test whether dominates , the expression next_time[0] is computed and cached, and for all other comparison between and for , the value is reused and therefore redundant computation of next_time[0] is avoided.
State functions are similar to axioms in PDDL [28], which are derived predicates whose truth is inferred from values of some basic predicates, while results of state functions can be Boolean, integer, or set values.
As we will show in the experimental evaluation, using state functions is sometimes crucial in exploiting transition dominance efficiently. Additionally, state functions also avoid duplicate recomputation in the original models for problems such as Talent Scheduling [25] and OPTW [11]. More detailed descriptions of problems and models are in Appendix B.
5 Case Studies
In this section, we demonstrate the applicability of transition dominance across various optimization problems. For each problem, we describe its DyPDL model and the identified transition dominance. All proofs of propositions in this section are in Appendix A.
5.1 Aircraft Landing
The aircraft landing problem [18] involves scheduling the landing of a set of aircraft on multiple runways to minimize the delay from the target landing times. The aircraft are partitioned into multiple classes, and it is assumed that each class follows a first-come-first-serve sequence according to the target landing time. The decision at a state, then, is to determine which class of aircraft should land next on each runway. The model uses state , where is the index of the next aircraft for each class , is the most recent landing time for runway , and is the class of the most recently landed aircraft for runway . The target and latest landing times of the next aircraft of class are and , respectively.
Landing two aircrafts of class and consecutively on the same runway must respect the minimum separation time . A transition lands the next aircraft of class on runway with the delay , where is if and otherwise. This transition is applicable only when and the actual landing time is no later than . The optimal value is computed as:
where transition updates to , to , and to .
We follow Coppé et al. [6] to add state dominance that are integer resource variables where less is preferred. We define transition dominance where landing the aircraft for class is better than landing another class on the same runway if the former can be landed before . Figure 2 shows an example with two aircraft classes. A large aircraft has landed on the runway at , and the target landing times for the next medium and large aircraft are and respectively. The target landing time of the next large aircraft is so late that the next medium aircraft can land first without delaying the large aircraft.
Proposition 13.
Suppose minimum separation times satisfy the triangle inequality: for any classes . If satisfy
at the current state , then .
The transition dominance described above schedules a task, such as landing an aircraft, before the start of another task. This type of transition dominance also applies to other problems, such as the problem in Example 4, the Orienteering Problem with Time Windows (OPTW) and the Travelling Salesman Problem with Time Windows and Makespan Objective (TSPTW-M). Detailed problem descriptions are provided in Appendix B.
5.2 Graph-Clear
The Graph-Clear problem arises from multi-robot surveillance tasks [12]. The problem aims to find a sequence of steps to decontaminate (“sweep”) nodes in an undirected graph with all nodes being initially contaminated. The objective is to minimize the maximum number of robots used over all steps in a given indoor environment modelled as a graph. Each node can be swept using robots, and each edge can be blocked using robots. If we sweep a node in one step, we must use robots for sweeping , blocking edges connecting , and blocking edges connecting all swept and unswept nodes to avoid recontamination. Figure 3 shows an example instance where the numbers on nodes and edges represent the number of robots required for node sweeping and edge blockage, respectively.
The DyPDL model proposed by Kuroiwa and Beck [13] uses a set state variable to represent swept nodes. A transition for each node is defined to add a node into and has the precondition . The number of robots used to sweep at a state is , and an optimal solution minimizes the maximum number of robots used at any step. The optimal value is computed as follows:
We define a transition dominance for the Graph-Clear problem inspired by the customer search model for the Minimization of Open Stacks Problem (MOSP) [4]. In Graph-Clear, observe that in any sequence, a transition always requires the same number of robots to sweep a node and block the edges connecting to its neighbors. The difference arises in the number of robots required to block the cutting edges that connect swept nodes in and unswept nodes in . If a transition reduces the weight of the cutting edges, sweeping any other unswept node after applying requires fewer robots compared to sweeping in the current state. Moreover, if uses fewer robots compared to transition , then the sequence beginning with transition then uses fewer robots at all steps compared to the corresponding sequence beginning with .
Consider the instance in Figure 3. Sweeping node in the next step is preferable to sweeping node . First, the weight of the cutting edges after sweeping becomes , which is smaller than the current state . Sweeping and afterwards will require fewer robots. Second, sweeping requires only robots, which is fewer than the robots needed to sweep . We can conclude that sweeping then is always better than sweeping from the current state. This can be formulated as transition dominance in the model.
Proposition 14.
Suppose for a state in the DyPDL model of the Graph-Clear problem. If we have and
| (2a) | ||||
| (2b) | ||||
at a state , then .
| Item Parameters | |||
|---|---|---|---|
| Type | Item No. | ||
| Changeover Costs | ||
|---|---|---|
| Type 1 | Type 2 | |
| Type 1 | ||
| Type 2 | ||
5.3 Discrete Lot Sizing
The discrete lot-sizing problem (DLSP) is a production planning problem for items of various types on a single machine [24]. Each type has a set of items, and the item of type must be produced before its due period . A changeover cost is incurred when the machine switches from producing an item of type to an item of type . Additionally, the stocking cost is calculated as the product of the unit cost and the difference between the production period and .
We propose a sequence model that makes decisions based on the reversed production sequence of item types. The model uses three types of state variables: an integer variable represents the latest available period for producing items, represents the type of item chosen in the last decision, and is a vector where indicates the remaining demand for each type . A transition represents the decision to produce an item of type . Since backlogging is not allowed, the item is produced at the period , and the available period is updated to . The total cost of the transition is defined as .
Figure 4 illustrates an example instance and a corresponding solution sequence . Initially, and . The first transition selects an item of type to produce at period , which is the latest due period for type items. The state variables are updated as follows: becomes , is updated to , and is updated to . The second transition selects an item of type to produce at period , incurring a stocking cost of and a changeover cost of . The sequence continues until either all items are produced or the sequence becomes invalid, as the available production period becomes less than the total number of remaining items.
Formally, the DP model can be expressed using the following Bellman equation:
Observe that in the example sequence in Figure 4, the machine is idle at period . An item of type could have been produced at this period to reduce the stocking cost. Additionally, this would save the changeover cost incurred from switching from a type item to a type item at period . We could verify whether dominates after applying the prefix transitions . If indeed dominates , the example sequence can be safely discarded, as it is guaranteed not to be optimal. The following proposition formally characterizes the transition dominance observed in this example.
Proposition 15.
Suppose the changeover costs satisfy the triangle inequality. If where satisfy:
-
has a later production period: ,
-
the total cost is less: ,
at a state , then .
Note that Coppé, Gillard, and Schaus [5] propose a similar model which introduces a auxiliary idle transition and makes decisions for each period in reverse, starting from the last period considered in the planning horizon. Preliminary experiments show that solving the sequence model is more efficient than the period model.
| Problem | Solver | Average Solving Time | Average Expanded Nodes | |||||
|---|---|---|---|---|---|---|---|---|
| base | +D | +S | +D+S | base | +D | reduction | ||
| ACPS | 23.98 | 1.99 | 24.03 | 1.97 | 1803688.77 | 183502.68 | 89.83% | |
| CABS | 153.89 | 3.38 | 153.46 | 3.22 | 14889711.97 | 374543.79 | 97.48% | |
| CAASDy | 19.98 | 1.76 | 20.39 | 1.65 | 1298236.12 | 132313.69 | 89.81% | |
| ALP | ACPS | 69.90 | 47.71 | 69.55 | 43.44 | 5290237.80 | 3154905.57 | 40.36% |
| CABS | 165.68 | 106.90 | 159.41 | 92.69 | 14235715.54 | 8914310.72 | 37.38% | |
| CAASDy | 67.66 | 49.18 | 68.56 | 44.61 | 3951671.66 | 2503272.76 | 36.65% | |
| Graph-Clear | ACPS | 34.25 | 142.44 | 26.17 | 30.13 | 1245620.20 | 947797.81 | 23.91% |
| CABS | 40.12 | 168.04 | 24.04 | 28.94 | 1734871.17 | 952846.14 | 45.08% | |
| CAASDy | 8.36 | 17.51 | 7.11 | 4.35 | 225567.74 | 138467.09 | 38.61% | |
| Talent-Sched | ACPS | 191.24 | 102.09 | 116.47 | 15.11 | 1894764.81 | 394158.89 | 79.20% |
| CABS | 238.81 | 150.64 | 142.33 | 26.35 | 2387816.79 | 665926.33 | 72.11% | |
| CAASDy | 16.47 | 13.62 | 11.28 | 2.83 | 205820.52 | 77316.28 | 62.44% | |
| OPTW | ACPS | 57.08 | 43.49 | 42.77 | 34.74 | 1434287.87 | 1234402.62 | 13.94% |
| CABS | 188.23 | 146.80 | 143.97 | 114.26 | 3877856.20 | 3352288.70 | 13.55% | |
| CAASDy | 47.48 | 40.79 | 35.98 | 31.29 | 1365988.53 | 1175550.17 | 13.94% | |
| TSPTW-M | ACPS | 31.97 | 29.93 | 15.23 | 14.76 | 673951.74 | 619372.42 | 8.10% |
| CABS | 98.45 | 91.26 | 53.30 | 50.64 | 2013448.95 | 1804993.92 | 10.35% | |
| CAASDy | 30.80 | 28.19 | 15.62 | 14.53 | 623198.30 | 557875.92 | 10.48% | |
| DSLP | ACPS | 37.72 | 13.57 | 37.59 | 12.00 | 5664688.02 | 1142853.46 | 79.82% |
| CABS | 249.48 | 105.06 | 255.71 | 93.90 | 27283103.00 | 6743668.77 | 75.28% | |
| CAASDy | 27.02 | 10.62 | 27.33 | 9.38 | 3313961.61 | 783309.53 | 76.36% | |
6 Experimental Evaluation
In this section, we experimentally evaluate the impact of using transition dominance and state functions on seven combinatorial optimization problems. For Graph-Clear, OPTW, and TSPTW-M, we modify the existing models111https://github.com/Kurorororo/didp-models by incorporating additional state functions and transition dominance. Following Kuroiwa and Beck [13, 14], we use benchmark instances from the literature. Further details on these instances are provided in Appendix C. Additionally, we introduce new DyPDL models for four problems:
-
One Machine Scheduling Minimizing Total Weighted Tardiness (): We implement the model based on the one for from a public repository,1 and we generate 300 instances where the number of tasks , and control the distribution of release and due dates of tasks. We implement transition dominance following Akturk and Ozdemir [1].
-
Discrete Lot Sizing Problem (DSLP): we implement the sequence model and generate 360 instances following Coppé et al. [5] with the number of items in , the number of periods in , and the density in . The stocking costs and changeover costs are sampled uniformly in and respectively.
The transition dominance interface and state functions are implemented in didp-rs v0.7.3222https://didp.ai/ using Rust 1.78.0, and all models are implemented in Python 3.9.16 using DIDPPy, a Python interface for didp-rs. All experiments are run on an Intel Xeon-Gold 6150 processor with a single thread, an 8 GB memory limit, and a time limit of 1800 seconds.
We experiment with three search algorithms: Anytime Column Progressive Search (ACPS), Complete Anytime Beam Search (CABS). and Cost-Algebraic A* Search (CAASDy). These solvers usually solve the most instances subject to the limits and exhibit representative tendencies [13] in the DIDP framework. We compare four models for each problem: the “base” model and the base model plus transition dominance “+D”, state functions “+S”, and both transition dominance and state functions “+D+S”.
We compare four configurations of each solving algorithms based on two metrics: (1) coverage, which is the number of instances for which optimality or infeasibility is proven within time and memory limits, and (2) optimality gap, which is the relative difference between the primal and dual bounds, with a value between and . If no solution is found and the instance is not proven infeasible, we set the optimality gap to be . Figure 5 illustrates the cumulative ratio of solved instances with respect to completion time and optimality gap. On the left-hand side of each subfigure, the -axis represents time in seconds, and the -axis represents the ratio of coverage achieved within seconds to the total number of instances. On the right-hand side, the -axis represents the optimality gap, and the -axis represents the ratio of instances where the optimality gap is less than or equal to . A curve that is higher and further to the left indicates better performance, which means more instances are solved within a given time or achieve lower optimality gaps within the time limit.
As shown in Figure 5, “+D” and “+D+S” improve the performance of all search algorithms substantially. Upon inspecting the detailed results, combining both transition dominance and state functions enables ACPS, CAASDy, and CABS to solve 420, 450, and 426 more instances in total, respectively, and reduces the average optimality gap by 0.114, 0.155, and 0.106, respectively, within the limits. Compared with ACPS and CABS, the CAASDy solver exhibits an interesting plateau pattern after initial increases, as its first solution is usually optimal: it is guaranteed for our DP models of , ALP, Graph-Clear, Talent-Sched, and DSLP in theory, and it is usually the case with OPTW and TSPTW-M in practice. The optimality gap remains 1 even though the dual bound is improved. Overall, the gaps between “+D+S” and “base” show that transition dominance and state functions enable the algorithms to solve instances completely with less time and reduce the primal-dual gaps for unsolved instances considerably.
The coverage and optimality gap may be affected by the time-out and memory limits used in our experiments. To better analyze speed-ups, we also report the average solving time and the average number of expanded nodes in Table 1 for co-solved instances that can be solved by all four configurations of each solver. Since the number of expanded nodes remains unchanged when state functions are used, we omit the results for “+S” and “+D+S” in this metric. The fastest average time for each solver is highlighted, and the percentage reduction in expanded nodes is provided for reference.
From the results, we observe that using transition dominance alone (“+D”) reduces the number of expanded nodes across all problems and decreases solving time for all problems except Graph-Clear. The “+S” configuration is particularly beneficial in Talent-Sched and OPTW, where state functions help avoid recomputations in their original formulations. Combining both transition dominance and state functions further reduces the average solving time compared to the “base” setup. Overall, the average speed-ups across all co-solved instances for ACPS, CABS, and CAASDy are 10.10, 8.87, and 5.81, respectively.
In the Graph-Clear problem, evaluating transition dominance involves costly summation computations over sets, which significantly slows down performance when applied alone. A closer examination reveals that transition dominance is more effective in reducing expanded nodes for low-density graphs. Combining state functions with transition dominance is essential to achieve better results. In contrast, the “+S” configuration results in similar or higher average times than “base” in , ALP, and DSLP, as state functions mostly involve simple integer or Boolean expressions that do not benefit much from caching. They become slightly more effective in “+D+S” when evaluating transition dominance requires more reuses of the cached values. Exploring the trade-offs in using transition dominance and state functions in different problem domains is an interesting direction for future research.
7 Concluding Remarks
In this paper, we define transition dominance within the framework of DIDP and introduce new constructs, the transition dominance interface and state function, into DyPDL to facilitate effective and efficient modelling of transition dominance. We also demonstrate the broad applicability of transition dominance by presenting several previously unexploited cases and show that the insights gained from transition dominance can be applied across problem domains with common combinatorial substructures. Our experimental results indicate that incorporating transition dominance enhances the solving process of various search algorithms in DIDP, reducing computational time and improving solution quality across a range of combinatorial optimization problems.
An interesting research direction is to study whether transition dominance and state functions can be extracted automatically by analyzing DyPDL models. In the benchmark problems studied in this paper, the dominance rules follow recurring patterns. Specifically, transition dominance applies when one can construct a better solution, and the way we construct such a better solution is usually by advancing (shifting forward) a transition. The goal is to identify sufficient conditions under which the objective value of the new solution improves. These construction patterns could serve as a basis for a general method to infer automatically transition dominance from a problem model. Previous work has explored automatic generation of dominance-breaking constraints from constraint programming models [16, 17], and similar techniques may be applicable to analyzing DyPDL models to derive state dominance and transition dominance.
State functions capture common subexpressions and can be extracted through the analysis of DyPDL models, but there is a tradeoff between recomputation and caching. Retrieving results from cache can prevent the recomputation of computationally costly expressions. However, for simple expressions such as basic comparisons, recomputation is more efficient than caching in terms of time and space. Analyzing the trade-off is important for extracting state functions automatically.
References
- [1] M Selim Akturk and Deniz Ozdemir. An exact approach to minimizing total weighted tardiness with release dates. IIE Transactions, 32:1091–1101, 2000. doi:10.1023/A:1013741325877.
- [2] Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
- [3] TC Cheng, JE Diamond, and BM Lin. Optimal scheduling in film production to minimize talent hold cost. Journal of Optimization Theory and Applications, 79(3):479–492, 1993. doi:10.1007/BF00940554.
- [4] Geoffrey Chu and Peter J. Stuckey. Minimizing the maximum number of open stacks by customer search. In Ian P. Gent, editor, Principles and Practice of Constraint Programming - CP 2009, 15th International Conference, CP 2009, Lisbon, Portugal, September 20-24, 2009, Proceedings, volume 5732 of Lecture Notes in Computer Science, pages 242–257. Springer, 2009. doi:10.1007/978-3-642-04244-7_21.
- [5] Vianney Coppé, Xavier Gillard, and Pierre Schaus. Decision diagram-based branch-and-bound with caching for dominance and suboptimality detection. INFORMS Journal on Computing, 36(6):1522–1542, 2024. doi:10.1287/IJOC.2022.0340.
- [6] Vianney Coppé, Xavier Gillard, and Pierre Schaus. Modeling and exploiting dominance rules for discrete optimization with decision diagrams. In Bistra Dilkina, editor, Integration of Constraint Programming, Artificial Intelligence, and Operations Research, CPAIOR 2024, volume 14742 of Lecture Notes in Computer Science, pages 226–242. Springer, 2024. doi:10.1007/978-3-031-60597-0_15.
- [7] Maria Garcia de la Banda, Peter J. Stuckey, and Geoffrey Chu. Solving talent scheduling with dynamic programming. INFORMS Journal of Computing, 23(1):120–137, 2011. doi:10.1287/IJOC.1090.0378.
- [8] Yvan Dumas, Jacques Desrosiers, Éric Gélinas, and Marius M. Solomon. An optimal algorithm for the traveling salesman problem with time windows. Operations Research, 43(2):367–371, 1995. doi:10.1287/OPRE.43.2.367.
- [9] Michel Gendreau, Alain Hertz, Gilbert Laporte, and Mihnea Stan. A generalized insertion heuristic for the traveling salesman problem with time windows. Operations Research, 46(3):330–335, 1998. doi:10.1287/OPRE.46.3.330.
- [10] Xavier Gillard, Pierre Schaus, and Vianney Coppé. Ddo, a generic and efficient framework for mdd-based optimization. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 5243–5245. International Joint Conferences on Artificial Intelligence Organization, 2020. doi:10.24963/IJCAI.2020/757.
- [11] Marisa G Kantor and Moshe B Rosenwein. The orienteering problem with time windows. Journal of the Operational Research Society, 43(6):629–635, 1992. doi:10.1057/jors.1992.88.
- [12] Andreas Kolling and Stefano Carpin. The GRAPH-CLEAR problem: definition, theoretical properties and its connections to multirobot aided surveillance. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1003–1008. IEEE, 2007. doi:10.1109/IROS.2007.4399368.
- [13] Ryo Kuroiwa and J Christopher Beck. Domain-independent dynamic programming: Generic state space search for combinatorial optimization. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 33, pages 236–244, 2023. doi:10.1609/ICAPS.V33I1.27200.
- [14] Ryo Kuroiwa and J Christopher Beck. Solving domain-independent dynamic programming problems with anytime heuristic search. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 33, pages 245–253, 2023. doi:10.1609/ICAPS.V33I1.27201.
- [15] Ryo Kuroiwa and J Christopher Beck. Domain-independent dynamic programming. arXiv preprint arXiv:2401.13883, 2024. doi:10.48550/arXiv.2401.13883.
- [16] Jimmy H.M. Lee and Allen Z Zhong. Automatic generation of dominance breaking nogoods for a class of constraint optimization problems. Artificial Intelligence, 323:103974, 2023. doi:10.1016/j.artint.2023.103974.
- [17] Jimmy H.M. Lee and Allen Z Zhong. Exploiting functional constraints in automatic dominance breaking for constraint optimization. Journal of Artificial Intelligence Research, 78:1–35, 2023. doi:10.1613/jair.1.14714.
- [18] Alexander Lieder, Dirk Briskorn, and Raik Stolletz. A dynamic programming approach for the aircraft landing problem with aircraft classes. European Journal of Operational Research, 243(1):61–69, 2015. doi:10.1016/j.ejor.2014.11.027.
- [19] Manuel López-Ibáñez, Christian Blum, Jeffrey W Ohlmann, and Barrett W Thomas. The travelling salesman problem with time windows: Adapting algorithms from travel-time to makespan optimization. Applied Soft Computing, 13(9):3806–3815, 2013. doi:10.1016/j.asoc.2013.05.009.
- [20] Laurent Michel and Willem-Jan van Hoeve. CODD: A decision diagram-based solver for combinatorial optimization. In Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto José Bugarín Diz, Jose Maria Alonso-Moral, Senén Barro, and Fredrik Heintz, editors, ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), volume 392 of Frontiers in Artificial Intelligence and Applications, pages 4240–4247. IOS Press, 2024. doi:10.3233/FAIA240997.
- [21] Roberto Montemanni, Gambardella Luca Maria, et al. An ant colony system for team orienteering problems with time windows. arXiv preprint arXiv:2305.07305, 2009. doi:10.48550/arXiv.2305.07305.
- [22] David R Morrison, Sheldon H Jacobson, Jason J Sauppe, and Edward C Sewell. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization, 19:79–102, 2016. doi:10.1016/j.disopt.2016.01.005.
- [23] Jeffrey W Ohlmann and Barrett W Thomas. A compressed-annealing heuristic for the traveling salesman problem with time windows. INFORMS Journal on Computing, 19(1):80–90, 2007. doi:10.1287/ijoc.1050.0145.
- [24] Yves Pochet and Laurence A Wolsey. Production planning by mixed integer programming, volume 149. Springer, 2006. doi:10.1007/0-387-33477-7.
- [25] Hu Qin, Zizhen Zhang, Andrew Lim, and Xiaocong Liang. An enhanced branch-and-bound algorithm for the talent scheduling problem. European Journal of Operational Research, 250(2):412–426, 2016. doi:10.1016/j.ejor.2015.10.002.
- [26] Giovanni Righini and Matteo Salani. Dynamic programming for the orienteering problem with time windows. Technical report, Università degli Studi di Milano-Polo Didattico e di Ricerca di Crema, 2006. URL: https://air.unimi.it/handle/2434/6449.
- [27] Giovanni Righini and Matteo Salani. Decremental state space relaxation strategies and initialization heuristics for solving the orienteering problem with time windows with dynamic programming. Computers & operations research, 36(4):1191–1203, 2009. doi:10.1016/j.cor.2008.01.003.
- [28] Sylvie Thiébaux, Jörg Hoffmann, and Bernhard Nebel. In defense of PDDL axioms. Artificial Intelligence, 168(1-2):38–69, 2005. doi:10.1016/j.artint.2005.05.004.
- [29] Pieter Vansteenwegen, Wouter Souffriau, Greet Vanden Berghe, and Dirk Van Oudheusden. Iterated local search for the team orienteering problem with time windows. Computers & Operations Research, 36(12):3281–3290, 2009. doi:10.1016/j.cor.2009.03.008.
Appendix A Proofs of Propositions
Proof for Proposition 3.
The proposition is trivial if , as removing transitions in any state does not reduce the cost of that state. Let denote the set of -solutions, which contains a finite number of elements by the assumptions of finiteness of a DyPDL model. We aim to show that there exists an optimal solution such that no transition in the solution is dominated. To achieve this, we define a relation over . For any two solutions and , we say if and only if there exists such that: (1) and , (2) for all , and (3) either , where , or and . Here, we denote the state resulting from applying in by .
It is straightforward to verify that is both transitive and irreflexive. Now, suppose an optimal solution for is pruned due to the restriction of to . Then, there must exist another solution such that . By Definition 2 and Principle of Optimality, , and is also an optimal solution. By repeating this process, we can construct a sequence of optimal solutions such that . Since is transitive and irreflexive, and the set of optimal solutions is a finite subset of , the sequence cannot repeat indefinitely. The sequence must terminate at some , which is optimal and not pruned by replacing with .
Proof for Proposition 13.
We need to show that for any -solution beginning with , there exists a better solution beginning with . Suppose where aircraft lands on runway . We can always construct another -solution , where is excluding . The landing times of all aircraft on runway will not change since the landing time of the first aircraft is still . The landing time of all aircraft on runway cannot be any later since we are removing an aircraft and the minimum separation times satisfy the triangle inequality.
Proof for Proposition 14.
For any -solution beginning with , must be in since and . We can construct another solution , where is excluding . We now prove that the constructed solution uses an equal or lower number of robots at each step. First, the number of robots required to sweep node does not exceed that required to sweep node at state :
The inequality above is from (2a). Also, the number of robots to sweep other nodes does not increase since for any at a state after applying
The first inequality is because , and the second inequality is due to (2b).
Proof for Proposition 15.
Suppose there exists a dominated sequence of the form . We can always construct a dominating sequence , where the first occurrence of in is removed to form . If the dominated sequence is feasible, then the dominating sequence must also be feasible because the due date is less than . Postponing the production of the next item of type and removing from does not cause the production periods of any items to be pushed earlier.
Next, we prove that the total cost of the dominating sequence is less than that of the dominated sequence. We first analyze the difference in changeover costs. In the dominated sequence, an item of type is produced, followed by an item of type , incurring a changeover cost of . If we insert the production of an item of type between them, the changeover cost becomes . The difference is . Since the changeover costs satisfy the triangle inequality, removing the first from does not increase the changeover costs.
As for the stocking costs, the only difference lies in the cost of the next item of type . In the dominated sequence, the item is not produced at period or . Thus, its production period must be postponed by at least two periods, increasing the total cost by .
Appendix B Additional Problem Descriptions
B.1 One Machine Scheduling Minimizing Total Weighted Tardiness
We consider one machine scheduling for a set of jobs , where each job has the processing time , the release dates , the deadline , and the weight , all of which are nonnegative. The objective is to schedule all jobs while minimizing the sum of weighted tardiness, i.e. the different between and the completion time of job .
We formulate a DyPDL model where one job is scheduled at each step. Let be a set variable representing the set of scheduled jobs, and be the current time, which are initially an empty set and respectively. The current time is an integer resource variable where less is preferred. The completion of a job when it is scheduled after time is , and the tardiness is represented as a numeric expression . The optimal value of a state is computed as:
We implement two dominance rules proposed by Akturk and Ozdemir [1].
Proposition 16.
Suppose for a state in the DyPDL model of the problem. If satisfy (1) , (2) , (3) , and (4) at a state , then .
Proposition 17.
Suppose for a state in the DyPDL model of the problem. If , and for all , satisfies (1) , (2) , (3) at a state , then for all transitions .
Note that the second dominance rule can be implemented using forced transitions in DyPDL, a transition such that all other transitions are not applicable when it is applicable, a special case of transition dominance.
B.2 Talent Scheduling Problem
The talent scheduling problem [3] is to find a sequence of scenes to shoot to minimize the total cost of a film. In this problem, a set of actors and a set of scenes are given. In a scene , a set of actors plays for days. For convenience, let denote the set of all actors in all scenes , i.e. . An actor incurs the cost for each day they are on location. If an actor plays on days and , they are on location on days even if they do not play on day to . The objective is to find a sequence of scenes such that the total cost is minimized.
We use the double-ended search model proposed by Garcia de la Banda et al. [7] and implement the dominance proposed by Qin et al. [25]. Let and be two set variables representing the scenes at the beginning and at the end of the schedule, which are empty sets initially, and be the set of remaining scenes. At each step, a scene to shoot is selected from , and append to . There are two types of actors:
-
Type 1: If is neither in nor in but is still present on location during the days of shooting scene . In other words, . Actor must be paid for the shooting days of scene .
-
Type 2: If is not included in but is included in , and scene is their first involved scene, then actor must be paid for all shooting days for scenes in .
Let and . Therefore, The cost per day to shoot is
Overall, we have the following DyPDL model.
We implement the dual bound where the remaining cost must be at least the total cost of actors times the shooting days they must present, i.e.
We follow Qin et al. [25] to implement the following dominance rule as transition dominance.
Proposition 18.
Suppose two scenes are two unscheduled scenes. Let and . If and satisfy
-
, or
-
,
then at the current state.
B.3 Orienteering Problem with Time Window
The OPTW problem [11] asks for a schedule to visit a set of customers starting from the depot . Visiting customer from incurs travel time while producing the profit . Each customer has a service window and can be visited only within the window. The vehicle needs to wait until upon earlier arrival. The objective is to maximize the total profit while returning to the depot before the deadline .
The DyPDL model we implement is similar to the DP model by Righini and Salani [27]. The model uses a set variable to represent the set of customers to visit, an element variable to represent the current location, and a numeric resource variable to represent the current time. We visit customers one by one using transitions. Customer can be visited next if it can be visited and the depot can be reached by the deadline after visiting . Let be the shortest travel time from to . Then, the set of customers that can be visited next is . The optimal value of a state can be computed as follows:
In the actual implementation, we also add additional forced transitions to remove nodes from that cannot be reached without violating the time limit .
Transition dominance in this problem is similar to that of ALP: if customers and can be visited consecutively without reaching starting from the current position and the current time, then taking must not be optimal.
Proposition 19.
Suppose the travel times satisfy the triangle inequality, and customers are unvisited. If satisfy that visiting and consecutively does not reach the start time of , i.e., , then .
Proof.
By definition of we can infer that if . Let the current state be . We show that for any -solution starting with , we can construct a dominating -solution starting with then which has at least the same profits.
Suppose that does not contain , we can construct an -solution . The state after applying transition directly and after applying transitions and are:
respectively. The simplification of is due to the condition in the proposition.
Consider the first transition in . Since , implies . After applying transition to and , the current time becomes and , respectively, with the latter term still being less than or equal to the former. Inductively, if is an -solution, then it must also be a feasible -solution with the same profit. According to the Bellman equation, the -solution has less profit than due to the additional transition .
Now, suppose the -solution does visit customer , and let be the sequence of transitions obtained by removing from . We claim that is a feasible -solution. The arguments for the applicability of any transition before in are similar to those above. Skipping in the solution does not increase the current time due to the triangle inequality of travel times. After transition , the set of unvisited customers becomes the same. Therefore, if is a feasible -solution, then is a feasible -solution. The difference in the objective between and is , and the solutions and have the same objective.
B.4 Travelling Salesman Problem with Time Windows
In the travelling salesperson problem with time windows and makespan objective [19], a set of customers is given. A solution is a tour starting from the depot (index ), visiting each customer exactly once, and returning to the depot. Visiting customer from incurs the travel time . In the beginning, . The visit to customer must be within a time window . Upon earlier arrival, waiting until is required. The objective we consider is to minimize the total makespan where the cost of visiting customer from the current location with time is . Let be the shortest travel time from to . Similar to OPTW, the model uses a set variable represents the set of customers to visit, an element variable represents the current location, and a numeric resource variable represents the current time. We visit customers one by one using transitions. For simplicity, let and .
The optimal value of a state can be computed as follows:
Proposition 20.
Suppose the travel times satisfy the triangle inequality, and customers are unvisited. If visiting and consecutively does not reach the start time of , i.e. , then .
The proof is similar to that of OPTW.
Appendix C Experiment Instances
-
Graph-Clear: We use 135 instances generated by Kuroiwa and Beck [15], where each graph consists of 20, 30, or 40 nodes.
-
OPTW: We use 144 instances from Righini and Salani [26], Montemanni and Gambardella [21], and Vansteenwegen et al. [29]. The original instances are defined on a geometric plane. However, rounding distances between locations in the literature may lead to violations of the triangle inequality. To correct this, we update the distance between locations and to whenever there exists a location such that .
-
TSPTW-M: For TSPTW, we use 290 instances from Dumas et al. [8], Gendreau et al. [9], and Ohlmann and Thomas [23]. Similar to OPTW, rounding integer travel times may result in violations of the triangle inequality. We apply the same correction to ensure that the inequality holds for all travel times.
