Abstract 1 Introduction 2 Background 3 Bellman Operator Reachability in Arbitrary Dimension 4 A Decidable Case 𝒅=𝟐 5 Related Work and Discussion References

On Piecewise Affine Reachability with Bellman Operators

Anton Varonka ORCID TU Wien, Austria Kazuki Watanabe ORCID National Institute of Informatics, Japan
The Graduate University for Advanced Studies (SOKENDAI), Hayama, Japan
Abstract

A piecewise affine map is one of the simplest mathematical objects exhibiting complex dynamics. The reachability problem of piecewise affine maps is as follows: Given two vectors 𝒔,𝒕d and a piecewise affine map f:dd, is there n such that fn(𝒔)=𝒕? Koiran, Cosnard, and Garzon show that the reachability problem of piecewise affine maps is undecidable even in dimension 2.

Most of the recent progress has been focused on decision procedures for one-dimensional piecewise affine maps, where the reachability problem has been shown to be decidable for some subclasses. However, the general undecidability discouraged research into positive results in arbitrary dimension.

In this work, we investigate a rich subclass of piecewise affine maps arising as Bellman operators of Markov decision processes (MDPs). We consider the reachability problem restricted to this subclass and examine its decidability in arbitrary dimensions. We establish that the reachability problem for Bellman operators is decidable in any dimension under either of the following conditions: (i) the target vector 𝒕 is not the fixed point of the operator f; or (ii) the initial and target vectors 𝒔 and 𝒕 are comparable with respect to the componentwise order. Furthermore, we show that the reachability problem for two-dimensional Bellman operators is decidable for arbitrary 𝒔,𝒕d, in contrast to the known undecidability of reachability for general piecewise affine maps.

Keywords and phrases:
piecewise affine map, reachability, value iteration, Markov decision process, Bellman operator
Funding:
Anton Varonka: A. Varonka gratefully acknowledges the support of the ERC consolidator grant ARTIST 101002685.
Copyright and License:
[Uncaptioned image] © Anton Varonka and Kazuki Watanabe; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Abstract machines
; Mathematics of computing Markov processes ; Theory of computation Program verification
Acknowledgements:
We would like to thank the anonymous reviewers for their valuable comments and suggestions.
We further thank Ichiro Hasuo and Laura Kovács for their helpful feedback.
Funding:
The authors were supported by the ASPIRE grant No. JPMJAP2301, JST.
margin: [Uncaptioned image] This paper is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 10103444).
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

Solving reachability problems is central to formal verification, but it is challenging, as evidenced by a body of work that has been pursued for decades. Specifically, the reachability problem that we are interested in asks the following question: Given two vectors 𝒔,𝒕 and a map f, is there n such that fn(𝒔)=𝒕? One of the seminal results is given by Kannan and Lipton [18, 17]. They answer the decidability question in the affirmative – by presenting a novel polynomial-time algorithm – for affine maps f and vectors 𝒔,𝒕 with rational coefficients.111The results by Kannan and Lipton hold for linear maps. Since the reachability problem is decidable in arbitrary dimension, a standard technique extends it to the affine maps by encoding a d-dimensional affine map as a linear map in dimension d+1; see also [27, Section 5].

Unfortunately, the reachability problem becomes undecidable by slightly extending the class of maps beyond the affine maps – piecewise affine maps (PAMs). Koiran et al. [20] show that the reachability problem for PAMs is undecidable even in the two-dimensional space, which witnesses the significant difficulty of the problem, compared to that of affine maps. Specifically, a PAM f on the domain 𝒟 is a function with the property that for some family {P1,,Pk} of sets such that P1Pk=𝒟, the restriction of f to each Pi is an affine function. We consider piecewise affine maps over the domain 𝒟=[0,1]d, where [0,1] is the unit interval. Each of the finitely many pieces P1,,Pk is defined by a conjunction of finitely many linear inequalities.

Example 1.

Consider an example of a PAM in the dimension d=2. Let f:[0,1]×[0,1][0,1]×[0,1] be defined by f(x1,x2)=(x1,x2) with

x1=12x2+13,x2={12x1+12, if x1x2,14x1+14x2+12, if x1<x2.

Here, P1 is defined as {𝐱=(x1,x2):x1x2} and P2 as {𝐱=(x1,x2):x1<x2}, where according to the definition P1P2=𝒟.

In fact, piecewise affine maps characterise a very rich mathematical object [3, 7]. The reachability problem for PAMs is thus one of a series of challenging problems whose crux lies in the unpredictable behaviour of iterative maps and corresponding discrete-time dynamical systems, see also [3, 11, 19, 27].

A natural question might be: When does the reachability problem for PAMs become decidable? Indeed, the reachability problem for PAMs on a unit interval (dimension one) has received attention in recent research [8, 13, 21], where classes of PAMs with decidable reachability problems have been found while fostering new elaborate techniques. Yet, the decidability of the general one-dimensional reachability problem for PAMs remains open, even for maps defined with two pieces.

Our Approach.

In this work, we propose an orthogonal approach to investigate the challenges behind the reachability problem for PAMs, focusing not on the restriction of the dimension or the number of pieces, but on the restriction of the structure of PAMs. Specifically, we consider the reachability problem for the Bellman operators on Markov decision processes (MDPs) – Bellman operators are, in fact, PAMs that have been studied mostly in the context of software verification or reinforcement learning [1, 4, 24]. For instance, the PAM in Example 1 is a Bellman operator.

MDPs are a standard probabilistic model for systems with uncertainties, and the least fixed points of the Bellman operators Φ:[0,1]d[0,1]d represent the “optimal” reachability probability to the specific target state. Here, optimal means that the maximum reachability probability is induced by a scheduler that resolves the non-deterministic behaviour on MDPs. We formulate our target problem as follows: Given two vectors 𝒔,𝒕[0,1]d and a Bellman operator Φ:[0,1]d[0,1]d, is there n such that Φn(𝒔)=𝒕? We refer to this problem as the reachability problem for Bellman operators (or BOR, for Bellman Operator Reachability).

Under a reasonable assumption, from any vector 𝒔, the sequence Φn(𝒔)n converges to the unique fixed point μΦ. Here, the unique fixed point is precisely the vector of optimal reachability probabilities from each state to the target state. The iterative procedure where μΦ is approximated by applying Φ is referred to as value iteration and is widely studied [1, 2, 10]. Note that the unique fixed point is computable in polynomial time by linear programming [1], so for our problem we can assume that we know the unique fixed point μΦ a priori. Nevertheless, even a convergent sequence does not generally reach μΦ at any n. In fact, the question of the reachability to the fixed point (in finite time) is known in both theoretical computer science and software verification communities. A case in point is the discussion in [20], where this question is explicitly asked for one-dimensional PAMs (see also [6]). Furthermore, in a value iteration survey [10] the same property is listed. While the authors observe that fixed points are not reachable in general, they do not discuss the decidability aspect. In the present paper, we investigate the decidability of reaching 𝒕=μΦ as part of our problem.

Contributions.

We present some decidability results for our target problem under a condition that ensures the existence of the unique fixed point μΦ of the Bellman operators [15]. First, we show that the reachability problem for Bellman operators is decidable for any dimension if the target vector 𝒕 does not coincide with μΦ (𝒕μΦ). This is true because from any vector 𝒔, the iteration of the Bellman operator converges to μΦ. It becomes rather non-trivial when 𝒕=μΦ, that is, for the reachability problem to the unique fixed point μΦ. We show that the reachability problem for Bellman operators when 𝒕=μΦ is decidable for any dimension if 𝒔 is comparable to μΦ, that is, either 𝒔μΦ or μΦ𝒔 holds for the componentwise order. The crux is to show that eventually only “optimal” actions are chosen, and we reduce the reachability problem to a simple qualitative reachability problem that can be shown decidable.

Finally, we address the remaining case: 𝒕=μΦ and 𝒔 is incomparable to μΦ. In dimension 2, we show an algorithmic procedure also for this case – finding the last piece of the puzzle – the reachability problem for two-dimensional Bellman operators is thus decidable. Our argument is based on analysing the equivalent problem for matrix semigroups, and our proof exploits the existence of a total order on the lines induced by actions. To the best of our knowledge, this is the first result to give a reasonably large class of PAMs for which the reachability problem is decidable in the two-dimensional case.

Organization.

We outline our paper as follows.

  • In Section 2, we formally define the main objects of our study, MDPs and Bellman operators, and recall some known properties.

  • In Section 3, we show that the reachability problem for Bellman operators is decidable when either 𝒕μΦ (Proposition 12) or 𝒔 is comparable to 𝒕=μΦ (Theorem 22). Importantly, the result holds for arbitrary dimension.

  • In Section 4, we prove that the reachability problem for Bellman operators is decidable in the two-dimensional case (Theorem 31), by presenting an algorithm that solves the remaining case (𝒔 is incomparable to 𝒕=μΦ).

  • In Section 5, we discuss our result with related work, and list some future directions.

2 Background

We first recall some definitions and properties for Markov decision processes (MDPs) and their Bellman operators, which are necessary for our development. We then define our target problem, namely the piecewise affine reachability problem with Bellman operators.

2.1 Preliminary

Definition 2 (MDP [24]).

An MDP is a tuple (S,Act,) such that (i) S is a finite non-empty set of states; (ii) Act is an indexed family (Acts)sS of finite sets of actions such that the set Acts and Acts of actions on s and s are disjoint for any s,sS; and (iii) is the transition probability function (s,α,_)S[0,1] with finite support that satisfies sS(s,α,s)=1, for any sS and αActs.

We refer to the support of (s,α,_) by supp(s,α). We fix a target state t and assume that t is a sink, i.e., Actt=.

Example 3.

We present an MDP =(S,Act,), where (i) S={s1,s2,s3,t}; (ii) Acts1{α}, Acts2{β1,β2}, and Acts3=Actt; and (iii) is defined by

(s1,α,s2)1/2,(s1,α,s3)1/6,(s1,α,t)1/3,
(s2,β1,s1)1/2,(s2,β1,t)1/2,
(s2,β2,s1)1/4,(s2,β2,s2)1/4,(s2,β2,t)1/2.

Given states s,sS, a path π from s to s is a sequence π(s1,,sm) such that siS\{t} for any i[1,m1], s1=s, and sm=s. We denote the set of paths from s to s by Path(s,s). A scheduler is a function σ:S+sSActs such that σ(s1sm)Actsm. As deterministic schedulers suffice for the reachability objective [1], we further consider the set Σ of all deterministic schedulers. A scheduler is positional if for any s1sms and s1sns, the actions σ(s1sms) and σ(s1sns) coincide. For a path π(s1,,sm) and a scheduler σΣ, define σ(π)i[1,m1](si,σ(πi),si+1), where πi=(s1,,si).

Definition 4 (reachability probability).

Given a scheduler σ, and sS, the reachability probability σ(st) under σ is defined by σ(st)πPath(s,t)σ(π).

The optimal reachability probability is defined as ps:=supσΣσ(st).

We write 𝒑 for the vector (ps)sS\{t} indexed by states S\{t}. The optimal reachability probabilities are in fact achievable by a positional scheduler.

Proposition 5 (e.g. [1]).

There exists an optimal positional scheduler σposΣ such that σpos(st)=ps holds for all sS.

Value Iteration (VI) [1, 24] is a standard technique to approximate the vector of optimal reachability probabilities 𝒑. Specifically, VI applies the Bellman operator Φ to the current approximation for each iteration step.

Let Sd={s1,,sd} be the set of all non-target states whose optimal reachability probability is positive. We note in passing that the set Sd can be computed using graph reachability techniques [1, Chapter 10.6.1].

For each sSd, we associate a polynomial of degree 1, a linear polynomial of α, with each action αActs. This polynomial Lα[𝒙] is defined as

Lα(𝒙)=sSd(s,α,s)xs+(s,α,t).
Definition 6 (Bellman operator).

The Bellman operator Φ:[0,1]d[0,1]d is defined by Φ(𝐱)smaxαActsLα(𝐱) for each 𝐱=(xs)sSd[0,1]d and sSd.

Example 7.

The Bellman operator Φ of the MDP given in Example 3 is given by

Φ(𝒙)s11/2x2+1/3,Φ(𝒙)s2max(1/2x1+1/2, 1/4x1+1/4x2+1/2),

where Sd={s1,s2}. The Bellman operator Φ is indeed the PAM f given in Example 1.

By an abuse of notation, we write Acti for the set Actsi of actions on siSd, and the optimal reachability probability pi for psi. We also restrict 𝒑 to the vector over Sd and write 𝒑=(p1,,pd) when it is clear from the context. Each action αActi is associated with a set succ(α) of successor states defined as succ(α):=supp(si,α)Sd. We emphasise that the successor set, together with transition probabilities (s,α,s), sSd, is a probabilistic subdistribution.

We define the partial order on vectors in d by 𝒖𝒗 if uivi holds for each i[1,d]. We refer to vectors 𝒖,𝒗d as comparable if either 𝒖𝒗 or 𝒖𝒗 holds. Otherwise, the vectors are incomparable, denoted 𝒖𝒗.

Let |||| denote the -norm, or the max-norm, defined by 𝒙:=max(|x1|,,|xd|) for a vector 𝒙=(x1,,xd). In the sequel, the notation 𝒙 stands for 𝒙. We further define the -metric, i.e., the distance between two vectors 𝒙 and 𝒚 is d(𝒙,𝒚):=maxi(|xiyi|).

The set [0,1]d is a complete lattice with the componentwise ordering and the Bellman operator is ω-continuous, that is, it preserves joins of ascending sequences. Due to this, the iterative update by the Bellman operator Φ from the bottom vector 𝟎(0,,0)[0,1]d, which is called VI (from 𝟎), converges to the least fixed point μΦ, which is the optimal reachability probabilities 𝒑=(p1,,pd).

Proposition 8 (​​[1, 10, 24]).

The sequence Φn(𝟎)n is monotonically increasing and converges to the optimal reachability probabilities 𝐩.

By the Kleene fixed-point theorem, we can further see that the iteration by Φ converges to 𝒑 from any initial vector if Φ has a unique fixed point.

Proposition 9 (​​[15]).

Assume Φ has a unique fixed point. The sequence Φn(𝐱)n converges to the unique fixed point 𝐩 from any initial 𝐱[0,1]d.

2.2 Target problem

We begin by recalling the value iteration algorithm. As an iterative procedure, VI boils down to repeatedly applying the Bellman operator Φ starting from a certain vector 𝒔[0,1]d (usually, 𝒔=𝟎) and converging to the least fixed point μΦ(=𝒑) – this becomes the unique fixed point in our setting.

Observe that every Bellman operator Φ is indeed a piecewise affine map on the domain [0,1]d. On close inspection, for every 𝒙[0,1]d, the value of Φ(𝒙) is computed as the maximum of finitely many affine functions ϕ1,,ϕk:[0,1]d[0,1]d evaluated at 𝒙. Let Pi[0,1]d be the set of points 𝒙[0,1]d, for which ϕi(𝒙)ϕj(𝒙) for each ji. Two observations are straightforward: 1) each Pi is defined by a conjunction of linear inequalities; 2) every 𝒙 belongs to at least one set Pi. Moreover, Φ is a well-defined function, which can be observed from 𝒙PiPj implying ϕi(𝒙)=ϕj(𝒙). Therefore, Φ is a PAM on [0,1]d.

In this paper, we investigate the specialisation of the piecewise affine reachability problem to Bellman operators. A standard assumption that ensures the uniqueness of fixed points for Bellman operators is the absence of end components in MDPs [15].

Definition 10 (end component [12]).

Let =(S,Act,) be an MDP. A pair (S,Act) such that SS and ActsSActs is an end component if (i) for all sS and αActActs, supp(s,α)S; and (ii) the directed graph that is induced by (S,Act) is strongly connected.

Note that according to our definitions, the target state of  is not an end component, as there is no action defined at it. We also eliminate all states from which the target state cannot be reached by considering the subset Sd.

Problem BOR (Bellman Operator Reachability).

Let 𝒔,𝒕[0,1]dd and Φ:[0,1]d[0,1]d be a Bellman operator of an MDP  with no end components. Does there exist n such that Φn(𝒔)=𝒕?

In the sequel, we always assume that an MDP  has no end components and hence its Bellman operator Φ has a unique fixed point. Such MDPs remain expressive and exhibit highly non-trivial behaviors, making them an important subject of study in probabilistic verification; see e.g. [15, 9, 25, 16, 2]. Notably, the presence of end components in a given MDP can be checked in P [12]. Moreover, [15] describes a reduction that, for an arbitrary MDP, constructs an MDP with the same least fixed point and without end components; the reduction never increases the dimension.

3 Bellman Operator Reachability in Arbitrary Dimension

In this section, we will discuss the BOR problem without restricting the dimension d. Recall that for the BOR problem, a Bellman operator Φ has a unique fixed point μΦ. It will be instrumental to split the discussion of decidability depending on how the initial and target vectors 𝒔 and 𝒕 compare to μΦ with respect to the componentwise order on d.

3.1 Target vector that is not the fixed point

First, we show that BOR is decidable when 𝒕 is not the unique fixed point μΦ.

Lemma 11.

Let Φ:[0,1]d[0,1]d be a Bellman operator. Consider an arbitrary vector 𝐱[0,1]d and let δ:=𝐱μΦ. We have Φ(𝐱)μΦδ.

Proof.

It holds 𝒙μΦ+𝜹, where 𝜹:=(δ,,δ). For any action α in state i, we have

Lα(μΦ+𝜹)= jSd(i,α,j)(μΦj+δ)+(i,α,t)
= jSd(i,α,j)μΦj+(i,α,t)+jSd(i,α,j)δ
= Lα(μΦ)+jSd(i,α,j)δLα(μΦ)+δ.

Let α~:=argmaxαActiLα(μΦ+𝜹). Then, Φ(μΦ+𝜹)i=Lα~(μΦ+𝜹)μΦi+δ.

From monotonicity of Φ we conclude Φ(𝒙)iΦ(μΦ+𝜹)iμΦi+δ.

Proposition 12.

Let Φ:[0,1]d[0,1]d be a Bellman operator of an MDP with no end components, and 𝐬[0,1]d be an arbitrary initial vector. For every 𝐭[0,1]d with 𝐭μΦ, there exists an effectively computable bound N such that

Φn(𝒔)=𝒕Φn(𝒔)=𝒕 for some nN.

Proof.

Fix vectors 𝒔,𝒕[0,1]d assuming 𝒕μΦ, where μΦ is the unique fixed point of Φ. We crucially use the convergence properties of the interval iteration algorithm [15]. Let 𝟏:=(1,,1) be the greatest element of the lattice [0,1]d. Choose a convergence threshold ε>0. From [15, Theorem 2] we have ΦN(𝟎)ΦN(𝟏)<ε for some NAlogεlog(1BA), where constants A,B only depend on  and can be computed directly from its representation.

From the monotonicity of Φ we have ΦN(𝟎)ΦN(𝒔)ΦN(𝟏) for any 𝒔[0,1]d. We also have ΦN(𝟎)μΦΦN(𝟏), and so ΦN(𝒔)μΦ<ε.

Recall that we can compute the vector μΦ exactly. Now choose the threshold ε:=𝒕μΦ, and let Nε be the previously discussed bound for this threshold. We have ΦNε(𝒔)μΦ<𝒕μΦ and hence, from Lemma 11, Φn(𝒔)μΦ<𝒕μΦ for every nNε. Therefore, either 𝒕=Φn(𝒔) for some n<Nε, or 𝒕 is not reachable from 𝒔 under iteratively applying Φ. The first condition can be checked in finite time since Nε is effectively bounded.

From Proposition 12 we immediately derive an algorithmic procedure for the BOR problem instances (Φ,𝒔,𝒕), where 𝒕μΦ. For an instance like this, it suffices to compute the bound N as above, and to test whether Φn(𝒔) is equal to 𝒕 for some nN .

We now move on to the case 𝒕=μΦ. In the sequel, it will be important to differentiate between two types of actions. These types are defined based on preserving the probabilities of the unique fixed point 𝒕=(t1,,td).

Definition 13.

An action α available in state i is tight, if ti=Lα(t1,,td), where Lα is the linear polynomial of action α. An action α is leaking in state i, if it is not tight.

3.2 Initial vector below the fixed point

We now assume that 𝒔𝒕 and 𝒕=μΦ. Note that Φn(𝒔)𝒕 holds for all n0.

Lemma 14.

Let 𝐱𝐭 be a [0,1]d-vector and let αActi be the action chosen in state i when the Bellman operator is applied in 𝐱, i.e. Φ(𝐱)i=Lα(𝐱).

Φ(𝒙)i=ti holds if and only if α is tight and for each jsucc(α) we have xj=tj.

Proof.

One implication follows from directly applying the definitions. Now consider the () implication. For every action β, we have Lβ(𝒙)Lβ(𝒕). Hence, for a leaking β, this implies Lβ(𝒙)<ti. Since for α we have Lα(𝒙)=ti, it must be tight. Assume further that there exists jsucc(α) such that xj<tj. Then, Lα(𝒙)Lα(t1,,tj1,xj,tj+1,,td)<Lα(𝒕)=ti. This again contradicts Lα(𝒙)=ti, hence xj=tj holds for all jsucc(α).

From probabilities to {𝟏,𝟎}.

The reasoning of Lemma 14 can be extended. Intuitively, we can abstract away from the actual probabilities in vectors Φn(𝒔), n0. This succeeds by only keeping track of whether these probabilities are different from probabilities in 𝐭. This abstraction makes the space of the BOR problem finite, provided 𝒔μΦ.

Formally, we introduce a sign abstraction f:[0,1]d{1,0}d by associating a sign vector 𝜺=f(𝒙)=(ε1,,εd) with every vector 𝒙 such that 𝒙𝒕: εi={0,xi=ti,1,otherwise. According to this definition, f(𝒕)=𝟎. Lemma 14 can now be read as follows: Φ(𝒙)=𝒕 holds if and only if there exists a choice of tight actions (α1,,αd) in 𝒙 (whose sign vector is 𝜺) such that for each state ssuccαi, we have εs=0.

We further prove that the successor of f(𝒙) with respect to the Bellman operator is well-defined. This would allow to only consider the evolution of sign vectors later on.

Lemma 15.

Let 𝐱 and 𝐲 be two vectors satisfying 𝐱𝐭 and 𝐲𝐭. Provided f(𝐱)=f(𝐲), we have f(Φ(𝐱))=f(Φ(𝐲)).

Proof.

Let 𝜺=(ε1,,εd) be the abstraction of Φ(𝒙), that is, 𝜺=f(Φ(𝒙)).

First, notice that a leaking action chosen in state i at 𝒛=(z1,,zd) always implies Lα(𝒛)<ti. Second, recall that having jsucc(α) with zj<tj implies Lα(𝒛)<ti. Hence, Φ(𝒛)i=ti if and only if there exists an action αActi such that Lα(𝒛)=ti. This action is necessarily tight. Using the vocabulary of the sign abstraction, we state f(Φ(𝒛))i=0 holds if and only if there exists a tight action αActi such that Lα(𝒛)=ti. We summarise these observations as

εi=maxαActiα tightminjsucc(α)εj, (1)

and thus make sure that εi does not depend on the actual values in 𝒙 and 𝒚, as soon as those two vectors have the same sign abstraction.

Proposition 16.

There exists an algorithmic procedure for the BOR problem instances (Φ,𝐬,𝐭), where 𝐬μΦ and 𝐭=μΦ.

Proof.

It is easy to observe that the space {1,0}d of possible sign vectors is finite. Given 𝒔, we compute the abstraction f(𝒔) and ask whether 𝟎 is reached by iteratively applying the map 𝜺𝜺 as defined by Equation 1. Once an already explored vector occurs in the sequence 𝜺=f(𝒔),𝜺,𝜺′′,, we can stop. This happens in at most 2d1 iterations. In this finite sequence, 𝟎 occurs if and only if 𝒕 is reached by iterating Φ starting from 𝒔. This is due to Lemma 15.

3.3 Initial vector above the fixed point

Next assumption we are going to work with is 𝒔𝒕=μΦ. The main result of this subsection is the following proposition.

Proposition 17.

There exists an algorithmic procedure for the BOR problem instances (Φ,𝐬,𝐭), where 𝐬μΦ and 𝐭=μΦ.

The procedure for this case is more intricate than in Section 3.2. This is due to the new phenomenon that occurs for sequences initialised with 𝒔𝒕. An iteration of the Bellman operator can choose a leaking action βActi over all tight actions available in state si. Intuitively, this happens if the successor states succ(β) have probabilities significantly greater than optimal – enough to compensate for the “leakage” tiLβ(𝒕). In other words, Φ(𝒙)i>Lα(𝒙) might hold for all tight αActi, unlike in the case 𝒙𝒕 (cf. Lemma 14).

Figure 1: An MDP 1.
Figure 2: An MDP 2.
Example 18.

Consider the MDP 1 in Figure 2. It has S={s1,s2,s3,s4,t} where the “missing” probabilistic transitions lead to s4. Moreover, Act4 comprises a single action with (s4,,s4)=1. We omit s4 and transitions to/from it, for simplicity of presentation.

There is one tight action α and one leaking action β in s1. Let 𝐬(1,1/3,2/3). Clearly, 𝐬>μΦ=(7/12,1/4,1/4). The tight action α is chosen for the first iteration, and Φ(𝐬)=(13/18,2/3,1/4). Next, the leaking action β is chosen and Φ2(𝐬)=(9/12,1/4,1/4), and finally Φ3(𝐬)=𝐭 by choosing the tight action α.

Only tight actions eventually.

However, we show that in a convergent sequence, the actions chosen by the Bellman operator Φ are all tight, after some number of iterations.

Lemma 19.

Let Φ:[0,1]d[0,1]d be a Bellman operator such that μΦ=𝐭. There exists a δ-neighbourhood of the fixed point

Uδ(𝒕)={𝒙[0,1]d:d(𝒙,𝒕)<δ}

such that for every 𝐱Uδ(𝐭), the vector Φ(𝐱) is obtained by applying only tight actions. That is,

Φ(𝒙)=(Lα1(𝒙),,Lαd(𝒙)),

where each αiActi, 1id, is tight.

Proof.

Since the transition probabilities in  are rational numbers, one argues that t1,,td are rational, too. We consider the set of rational numbers that consists of t1,,td, along with Lα(𝒕) for each action α. It is sufficient to consider leaking actions, by definition. Let D be the least common denominator of the numbers in the aforedescribed set.

Let δ:=12D and pick 𝒙Uδ(𝒕). Further let A=(α1,,αd) be the actions chosen at 𝒙 by the Bellman operator Φ. We show that each action in A is tight. Denote by fA the effect of applying A, i.e.,

fA(𝒕)=(Lα1(𝒕),,Lαd(𝒕))andfA(𝒙)=Φ(𝒙).

We notice that fA is 1-Lipschitz (which, in fact, holds for any choice A of actions). Indeed, let us consider arbitrary 𝒖,𝒗[0,1]d. We have

fA(𝒖)fA(𝒗) =max1id|Lαi(𝒖)Lαi(𝒗)|
max1idjsucc(αi)(i,αi,j)|ujvj|
max1id(1maxjsuccαi|ujvj|)=max1id|uivi|=𝒖𝒗.

Utilising the 1-Lipschitz property, we observe fA(𝒕)Φ(𝒙)𝒕𝒙. Moreover, we have Φ(𝒙)𝒕𝒙𝒕 due to Lemma 11. Therefore,

fA(𝒕)𝒕fA(𝒕)Φ(𝒙)+Φ(𝒙)𝒕𝒕𝒙+𝒙𝒕<δ+δ=1D. (2)

On the other hand, fA(𝒕)𝒕=max1id|Lαi(𝒕)ti|. By the definition of D, we know that Lαi(𝒕)ti implies |Lαi(𝒕)ti|1D. We conclude from Equation 2 that fA(𝒕)=𝒕. Equivalently, all actions chosen at 𝒙 are tight.

The vector sequence 𝒔,Φ(𝒔),Φ2(𝒔), converges to 𝒕 due to Proposition 9. Therefore, it reaches a 12D-neighbourhood of 𝒕 after finitely many steps. Furthermore, an upper bound on the number of necessary steps can be computed as in Proposition 12.

Corollary 20.

For an arbitrary initial vector 𝐬[0,1]d, there exists an effectively computable N such that in the sequence (Φn(𝐬))n for every nN, Φn+1(𝐬) is obtained by applying only tight actions to Φn(𝐬).

We point out that the argument used in the proof of Lemma 19, bears resemblance to (and is inspired by) the proof of [15, Theorem 3]. However, the eventual optimality of tight actions, which we establish here, is not a matter of discussion in [15] or any other work we know.

From probabilities to {𝟎,𝟏}.

We now extend the sign abstraction from the previous section to encompass vectors above 𝒕. Let f:[0,1]d{0,1}d be defined by associating a sign vector 𝜺=f(𝒙)=(ε1,,εd) with each vector 𝒙𝒕: εi={0,xi=ti,1,otherwise.

Conforming to this definition is f(𝒕)=𝟎. As before, we show that f is well-defined.

Lemma 21.

Let δ be chosen as above to guarantee that Φ only picks tight actions in the neighbourhood Uδ(𝐭). Consider vectors 𝐱,𝐲[0,1]d satisfying 𝐭𝐱,𝐲𝐭+(δ,,δ).

Provided f(𝐱)=f(𝐲), we have f(Φ(𝐱))=f(Φ(𝐲)).

Proof.

The assumption 𝒙,𝒚Uδ(𝒕) is necessary to shake off the effect of the leaking actions (cf. the assumptions of Lemma 15). Let 𝒛=(z1,,zd) be an arbitrary vector in Uδ(𝒕). This way we guarantee that for every i, there is a tight action αActi with Φ(𝒛)i=Lα(𝒛). It is not hard to see now that having jsucc(α) with zj>tj implies Lα(𝒛)>ti. Therefore, Φ(𝒛)i=ti if and only if there exists no tight action αActi that depends on jsucc(α) with zj>tj. Equivalently, we have

εi=maxαActiα tightmaxjsucc(α)εj, (3)

where 𝜺=(ε1,,εd) is the abstraction of Φ(𝒙), that is, 𝜺=f(Φ(𝒙)).

Therefore, the abstraction vectors of Φ(𝒙) and Φ(𝒚) do not depend on the actual values in 𝒙 and 𝒚, but only on their abstractions f(𝒙)=f(𝒚). The abstraction f is thus well-defined for vectors in a certain neighbourhood of 𝒕.

Proof of Proposition 17.

Starting at 𝒔 above the fixed point, 𝒔𝒕, we first compute the bound N of Corollary 20. If Φn(𝒔)=𝒕 holds for some n<N, we terminate with a positive answer to the BOR problem. Otherwise, we continue with the abstraction argument. We compute f(ΦN(𝒔)) and ask whether 𝟎 is reached by iteratively applying the map 𝜺𝜺 as defined by Equation 3. In at most 2d1 iterations we either reach 𝟎, or discover an ever-repeating vector subsequence that does not contain 𝟎. The rest follows from Lemma 21. Combining Propositions 16 and 17, we obtain the following theorem.

Theorem 22.

There exists an algorithmic procedure that solves all BOR problem instances (Φ,𝐬,𝐭) with 𝐭=μΦ and 𝐬{,}μΦ.

3.4 Initial and target vectors are incomparable

We keep assuming 𝒕=μΦ and consider the remaining case, that is, the case when 𝒔 and 𝒕 are two incomparable vectors, denoted 𝒔𝒕.

We can assume 𝒔Uδ(𝒕) as defined in Lemma 19. Clearly, starting with an arbitrary incomparable vector, we can apply Φ up to the pre-computed power N, reaching either a comparable vector (potentially including 𝒕 itself), or the δ-neighbourhood of 𝒕. In the latter case, set 𝒔 to be the first vector inside the neighbourhood. Hence, for every 𝒙 discussed below, the vector Φ(𝒙) is obtained by applying only tight actions.

Positionality.

Even after eventually adhering to tight actions, the behaviour of Φ’s iterations is not described by a single linear transformation. We recall from Proposition 5 that positionality is prominently sufficient for achieving optimal probabilities in the limit [1]. However, Example 23 shows a more subtle behaviour for our reachability problem.

Example 23.

In Figure 2, we present an MDP 2 with S={s1,s2,s3,s4,t} where the “missing” probabilistic transitions lead to s4Sd. We omit s4 and transitions to/from it, for simplicity. Let Act1={α1,α2}, Act2={β}, Act3={γ} be the actions available in the states of S3. Observe that μΦ=(12,12,12) and all actions are tight, including both α1,α2.

Let 𝐬=(0,56,56). First, α1 is chosen over α2 in 𝐬, and so Φ(𝐬)=(59,49,49). However, in the next iteration, α2(59,49,49)=(12,12,12)(1327,12,12)=α1(59,49,49) yielding Φ2(𝐬)=μΦ.

Spectacularly, none of two positional schedulers reaches μΦ, which can be proved using [17].

With positionality out of question, we need to study schedulers that switch actions over time.

Matrix semigroups.

In the sequel, we take a matrix perspective on BOR by introducing a d×d–matrix for every tuple of tight actions. The set of such tuples is finite, and thus we argue that the behaviour of Bellman operator iterations from 𝒔 is governed by multiplying the vector 𝒔 with elements of a semigroup 𝒮 generated by finitely many matrices M1,,Mk.

We associate with every tight action αActi a row vector ((si,α,s1),,(si,α,sd)) of probabilities for going to the states of Sd, as well as a scalar (si,α,t) for reaching t.

For each state si, let i denote the set of row vectors for tight actions in Acti. Notice that all row vectors only have non-negative entries; furthermore, the sum of elements in each row vector is at most 1. We will further refer to matrices with all rows satisfying these properties as substochastic.

Definition 24.

A family d×d of matrices is called a product family if consists of all possible matrices with i-th row from i for all i{1,,d}.

We further let 𝒮:= be the semigroup generated by .

We introduce the map f by 𝒙𝒙μΦ.

The Product Family Reachability (PFR) Problem.

Let ={M1,,Mk} be a product family of substochastic matrices, and 𝜺[1,1]dd. We define F:dd by F(𝒗)max1ik(Mi𝒗). Then the PFR problem asks: Does there exist n0 such that Fn(𝜺)=𝟎?

Note in passing that max over vectors is taken with respect to the partial order . The operator F is thus well-defined, that is, there exists a matrix M such that F(𝒗)=M𝒗. It holds indeed that F(𝒗)i𝒘𝒗 for any 𝒘i, 1ik.

Proposition 25.

Every d–dimensional instance of BOR with 𝐭=μΦ and 𝐬Uδ(𝐭) (thus guaranteeing that only tight actions are used) is equivalent to a d–dimensional instance of the PFR problem.

Proof.

Let  be an MDP whose Bellman operator Φ has a unique fixed point μΦ[0,1]d. Fix 𝒕=μΦ and 𝒔Uδ(𝒕), where the neighbourhood Uδ(𝒕) is as in Lemma 19. We will introduce a product family of substochastic matrices in d×d with a corresponding operator F and we will prove that there exists n such that Φn(𝒔)=𝒕 if and only if Fn(f(𝒔))=𝟎.

Let i be the set of all row vectors for actions in Acti of , for each i{1,,d}. Then, a finite product family ={M1,,Mk}d×d obtained from these sets only contains substochastic matrices.

Let A=(α1,,αd) be a tuple of actions in , where αiActi for each i. Consider its corresponding matrix M. Formally, (M)i,j=(si,αi,sj). Let LA(𝒙) denote the vector (Lα1(𝒙),,Lαd(𝒙)). By definition of a row vector of an action, we have LA(𝒙)=M𝒙+A, where A=((s1,α1,t),,(sd,αd,t)). In particular, μΦ=MμΦ+A since every action in A is tight. As an intermediate step, we prove the claim below.

Claim.

The following equalities hold for every 𝒙Uδ(𝒕):

  • f(LA(𝒙))=Mf(𝒙),

  • f(Φ(𝒙))=F(f(𝒙)),

  • and f(Φn(𝒙))=Fn(f(𝒙)) for every n1.

Proof (of the Claim).

Use linearity of LA and the equality LA(μΦ)=μΦ to obtain

Mf(𝒙)=M(𝒙μΦ)=M𝒙MμΦ=M𝒙(μΦA)= M𝒙μΦ+A
and f(LA(𝒙))=LA(𝒙)μΦ= M𝒙+AμΦ.

The first statement thus holds. The second statement follows from Φ(𝒙)=maxALA(𝒙). Indeed, f(Φ(𝒙))=f(maxALA(𝒙))=maxA(f(LA(𝒙)))=maxMMf(𝒙)=F(f(𝒙)).

The second statement further serves both as the base case (with argument Φ(𝒙)) and the induction step (with argument Φn+1(𝒙)) to prove the final statement. Now, set 𝜺f(𝒔)=𝒔μΦ in the PFR problem. Clearly, 𝜺[1,1]dd. The equality Φn(𝒔)=μΦ holds if and only if f(Φn(𝒔))=𝟎 if and only if Fn(𝜺)=𝟎.

Example 23 (revisited). The MDP 2 in Figure 2 has a product family ={M1,M2}. The two matrices correspond to the action tuples A1=(α1,β,γ) and A2=(α2,β,γ):

M1=(1/31/31/31/31/301/301/3),M2=(1/21/41/41/31/301/301/3).

We have 𝜺:=f(𝒔)=𝒔μΦ=(1/2,1/3,1/3), and F2(𝜺)=max1i,j2(MjMi𝜺)=M2M1𝜺=𝟎. Indeed, Φ2(𝒔)=μΦ. Meanwhile, M1n𝜺𝟎 and M2n𝜺𝟎 for all n.

 Remark 26.

An instance of the PFR problem is a “yes” instance if and only if

n.(M=MinMi1.M𝜺=𝟎)(M=MjnMj1.M𝜺𝟎),

where i1,,in,j1,,jn{1,,k}. Finding a semigroup element M𝒮 that satisfies M𝛆=𝟎, together with previously discussed techniques, is sufficient to answer the BOR problem. If M𝛆=MinMi1𝛆=𝟎, then Fn(𝛆)0 and hence Φn(𝐬)𝐭. Then, we can apply the complete algorithm from Proposition 17. Notice that this does not necessarily imply that (Φ,𝐬,𝐭) is a positive BOR instance.

However, deciding whether a matrix M𝒮 with M𝜺=𝟎 exists is per se an undecidable problem for general matrices [3]. There, the so-called vector reachability problem for a matrix semigroup 𝒮=M1,,Mk asks for given 𝒙,𝒚d, whether there exists M𝒮 such that M𝒙=𝒚.

Unique tight actions.

We conclude this section with the discussion of the PFR Problem restricted to semigroups generated by a single matrix. This corresponds to the assumption that there is a unique tight action in every state siSd. The decidability for this restriction is not surprising and, in particular, follows from the algorithmic procedure of [18].

Nevertheless, we employ simple linear algebra techniques to provide an alternative proof for our version of the problem (importantly, the target vector is zero).

Proposition 27.

Let Md be a substochastic matrix and 𝛆[1,1]dd an arbitrary vector. If there exists n0 such that Mn𝛆=𝟎, then there exists such nd. Whether there exists n0 such that Mn𝛆=𝟎 can thus be answered algorithmically.

Proof.

Let kerM denote the kernel of M, the set of all vectors 𝒙 such that M𝒙=𝟎.

Clearly, for all n0, kerMnkerMn+1. Furthermore, notice that kerMn=kerMn+1 implies kerMn+k=kerMn for all k0. Assume, for sake of contradiction, that there exists n and k2 such that kerMn==kerMn+k1kerMn+k. Then consider 𝒙kerMn+kkerMn. It holds Mk1𝒙kerMn+1 and, by assumption, Mk1𝒙kerMn. Then, however, 𝒙kerMn+k1kerMn, a contradiction that proves that, for any n, either kerMnkerMn+1, or kerMn=kerMn+1==kerMn+k= holds.

Recall that for every n, kerMn is a linear subspace of d. Therefore, either dimkerMn+1>dimkerMn, or dimkerMn==dimkerMn+k=. Since dimkerMn+i is bounded from above by d, the first statement follows.

Finally, it suffices to check Mn𝜺𝟎 for nd in order to deduce that the equality does not hold for any n. This gives a complete algorithm to solve the 𝟎-reachability problem. The reachability problem for Bellman operators is thus completely solved for MDPs in which a positional scheduler with all tight actions is unique. Equivalently, it is solved for 𝒮=M.

However, we cannot employ the previous argument for 𝒮=M1,M2, since the implication ker(M1k)=ker(M1k+1)ker(M2M1k)=ker(M2M1k+1) does not hold there. For instance, in Example 23, kerM2M1kerM2M12. The decidability remains open for general semigroups – that is, when there are states with non-unique tight actions.

4 A Decidable Case 𝒅=𝟐

In this section, we show that the PFR problem is decidable in dimension d=2. Following Proposition 25, this will suffice for the decidability of the BOR problem in d=2.

The key property that helps us establish the decidability is the existence of a total order associated with the (two-dimensional) row vectors of actions. By arguing about this order, we show that the sequence 𝜺,F(𝜺),F2(𝜺), either has a vector comparable with 𝟎 among its first terms – or never reaches 𝟎.

Actions in 𝒅=𝟐 and lines.

We further consider an MDP  with d=2. Let Act1={α1,,αk} and Act2={β1,,β} be the sets of tight actions available in two states Sd={s1,s2} of . Recall that we associate the sets of row vectors 1 and 2 with actions in Act1 and Act2, respectively. An action is a zero α- or β-action if its row vector is a zero vector (0,0). Denote by Acti the subset of all non-zero actions in Acti, for each i{1,2}.

We now discuss lines that correspond to the actions of MDPs with d=2. For our proof, it is important to identify the actions whose lines have the greatest/least slope. With a mild abuse of notation, we denote by αi(x1,x2) the homogeneous part of the linear polynomial of αi (similarly for βj). This is exactly the dot product of action’s row vector with (x1,x2), or αi(x1,x2):=(s1,αi,s1)x1+(s1,αi,s2)x2.

For each non-zero action γ, the set of points (x1,x2) with γ(x1,x2)=0 is a line orthogonal to the row vector of γ. We denote the angle between this line and the positive direction of the x-axis by γ. The angles of actions in Act=Act1Act2 are numbers in [π2,π] and thus are totally ordered.

We denote by αlo an α-action with the greatest angle. The sign of αlo(x1,x2) is well-defined, that is, it is independent of the choice of action among those with the greatest angle. Similarly, αhi is an α-action with the least angle. Actions βlo and βhi are defined analogously.

We now resume the matrix argumentation. The row vectors in 1 and 2 define the product family {M1,1,,Mk,}2×2. That is, Mi,j𝒗=(αi(𝒗),βj(𝒗)) holds for all αiAct1, βjAct2, and 𝒗2.

Definition 28.

We define F1(𝟎) as the set of all vectors 𝛆 such that F(𝛆)=𝟎.

 Remark 29.

The set F1(𝟎) is entirely contained in the union i,jkerMi,j of kernels.

One immediate consequence thereof is that at least one of the matrices is singular. In other words, there exists Mi,j with kerMi,j{𝟎}. Otherwise, no vector 𝒙μΦ ever reaches μΦ.

Lemma 30.

Given a product family {M1,1,,Mk,}2×2 of substochastic matrices and a vector 𝛆=(ε1,ε2). The map F is defined as above by

F(𝒗)=max1ik,1j(Mi,j𝒗).

Exactly one of the two statements holds:

  1. 1.

    Fn(𝜺)𝟎 for all n.

  2. 2.

    F2(𝜺) is comparable with 𝟎.

Proof.

We can assume 𝜺𝟎, otherwise 2. holds trivially. Let Qi, i{1,2,3,4}, be a coordinate plane quadrant. Here, each Qi is a closed set. For example, Q1={𝒙=(x1,x2):x10x20}. Let intQ1={𝒙=(x1,x2):x1>0x2>0} be the interior of Q1, similarly for other quadrants. Note that {intQ1,intQ3,Q2{0},Q4} is a partition π of 2.

A matrix kernel kerMi,j2 is either a singleton {𝟎}, a line through the origin, or the entire ambient space. Observe that either there exist both a zero α- and a zero β-action; or F1(𝟎) is a union of finitely many lines. In the latter case, both intQ1 and intQ3 do not intersect F1(𝟎) because no one-dimensional kernel intersects intQ1 or intQ3 – all matrices are non-negative. From Remark 29, we have F1(𝟎)Q2Q4 for this case.

The discussion below is driven by the question

“When are the vectors F(𝜺) and F2(𝜺) incomparable with 𝟎?”.

We perform a case distinction by comparing the angles αlo and βlo. Crucially,

F(𝒗)=(max1ikαi(𝒗),max1jβj(𝒗)). (4)

Without loss of generality, let 𝜺=(ε1,ε2) be such that ε1<0 and ε2>0, i.e., 𝜺intQ2.

  1. 1.

    Act1= or Act2=.

    If all α-actions are zero, then F(𝜺)=(0,x2) and thus comparable with 𝟎. Analogously, if all β-actions are zero, then F(𝜺)=(x1,0) is a vector comparable with 𝟎. In the rest of the case distinction, we assume that the sets Act1,Act2 are both non-empty.

  2. 2.

    αlo=βlo. We first consider 𝜺 with αlo(𝜺)<0 and show that F(𝜺) is comparable. Indeed, if there are no zero actions, we have αi(𝜺)<0 for all i. Similarly, all β-actions yield negative values, and so F(𝜺) is a strictly negative vector by (4). Otherwise, at least one zero action is chosen, resulting in a vector with a zero entry (hence comparable with 𝟎). Now, if αlo(𝜺)0, then so is βlo(𝜺)0. It follows immediately that F(𝜺)𝟎.

  3. 3.

    βlo>αlo. There are two subcases based on the existence of zero actions.

    1. (a)

      Act1Act1 (there exists a zero α-action). If αlo(𝜺)0, then F(𝜺)𝟎. Otherwise, a zero α-action is chosen, hence F(𝜺) is comparable with 𝟎 from the argument of Case 1.

    2. (b)

      Act1=Act1 (there are no zero α-actions). We deduce from βlo>αlo that

      𝒙Q2{0}maxiαi(𝒙)<0maxjβj(𝒙)>0. (5)

      We show that Fn(𝜺)Q4 for all n0. Assume towards a contradiction that m is the smallest integer such that Fm(𝜺)Q4. Consider y:=Fm1(𝜺). Since F(𝒚)10, we have αlo(𝒚)0. This implies 𝒚Q2intQ1. But 𝒚intQ1 means βlo(𝒚)>0 and hence F(𝒚)2>0, contradicting F(𝒚)Q4. We have deduced 𝒚Q2. Now, from (5) we either have maxiαi(𝒚)<0 or maxjβj(𝒚)>0. The former implies F(𝒚)1<0 whereas the latter implies F(𝒚)2>0. Either of two contradicts F(𝒚)Q4. Our assumption was wrong, and Fn(𝜺)Q4 for all n0.

  4. 4.

    βlo<αlo.

    1. (a)

      Act2Act2 (there exists a zero β-action). If βlo(𝜺)0, then F(𝜺)𝟎. Otherwise, a zero β-action is chosen, hence F(𝜺) is comparable with 𝟎.

    2. (b)

      Act2=Act2 (there are no zero β-actions). A new phenomenon happens now: there might exist n>0 such that an incomparable vector Fn(𝜺) is in Q4. However, observe first that αlo(𝜺)0 implies βlo(𝜺)<0 and hence the vector F(𝜺) is comparable. Moreover, if βlo(𝜺)0, then similarly αlo(𝜺)>0 and hence, F(𝜺)𝟎. We move on to the case when αlo(𝜺)>0 and βlo(𝜺)<0.

      Indeed, we then have F(𝜺)Q4{𝟎}. In the discussion that follows we analyse how F2(𝜺) depends on F(𝜺). From now on, we focus our attention on actions αhi and βhi, rather than αlo and βlo. In particular, if αhi evaluates to a non-negative number at F(𝜺)Q4, then F2(𝜺)10. We keep assuming that no zero β-actions exist.

      1. i.

        βhi=αhi. Consider first the case αhi(F(𝜺))<0. Then F2(𝜺)𝟎 because βi(F(𝜺))<0 holds for all βiAct2. See also Case 2. Now, if αhi(F(𝜺))0, then so is βhi(F(𝜺))0. It follows immediately that F2(𝜺)𝟎.

      2. ii.

        βhi>αhi. Similarly to the Case 3b, we observe

        𝒙Q4{0}maxiαi(𝒙)>0maxjβj(𝒙)<0.

        We have F1(𝟎)Q4={𝟎}, hence F1(𝟎)Q2. Now if for some 𝒙Q2{0}, we have (F(𝒙))1=0, then αlo(𝒙)0. Then βj(𝒙)<0 for all j, implying F(𝒙)𝟎. Therefore, F1(𝟎)={𝟎} and we have Fn(𝜺)𝟎 for all n𝟎.

      3. iii.

        βhi<αhi. Assume first that a zero α-action exists. In this case, either αhi(F(𝜺))>0 and βhi(F(𝜺))>0 follows as well; or αhi(F(𝜺))0. Regardless, F2(𝜺) is a comparable vector (positive; or having 0 in the first component). We now proceed under the assumption that no zero (α- or β-) actions exist. Recall also that we keep assuming βlo<αlo as well as βhi<αhi. This suffices to prove that F1(𝟎)={𝟎}. Assume the opposite.

        • Let 𝒙F1(𝟎)Q2{0}. It necessarily holds αlo(𝒙)=0. Hence, βj(𝒙)<0 for all 1j, and (F(𝒙))2<0. This contradicts 𝒙F1(𝟎).

        • Let 𝒙F1(𝟎)Q4{0}. We necessarily have βhi(𝒙)=0. Hence, αi(𝒙)<0 for all 1ik, and (F(𝒙))1<0. This clearly contradicts 𝒙F1(𝟎).

        After deducing F1(𝟎)={𝟎}, we have Fn(𝜺)𝟎 for all n also in this case.

In most cases we were able to show F(𝜺) is comparable with 𝟎. This implies F2(𝜺) is comparable with 𝟎, too. In all other cases we either directly showed that F2(𝜺) is comparable with 𝟎, or derived Fn(𝜺)𝟎 for all n. With Lemma 30, if Φ2(𝒔)μΦ for 𝒔Uδ(μΦ), then the answer to the BOR problem is guaranteed negative. Otherwise, we can exploit the procedure of Theorem 22. We derive the following result for Bellman operators in d=2 with arbitrarily many pieces.

Theorem 31.

The BOR problem is decidable with d=2.

Figure 3: An MDP 3 of Example 32.
Figure 4: 3 illustrating case 4.b.i.
Example 32.

We provide an example 3 with d=2, where 𝛆,F(𝛆) are vectors incomparable with 𝟎, and F2(𝛆)=𝟎. Here, α1(x1,x2)=12x1+13x2, α2(x1,x2)=β1(x1,x2)=12x1+15x2. That is, the example fits into the Case 4b.i. Let 𝛆=(31315,16). Then, α1(𝛆)>α2(𝛆); we get F(𝛆)=(2315,163). Further, α2(F(𝛆))>α1(F(𝛆)) and F2(𝛆)=(0,0).

5 Related Work and Discussion

In our work, we have outlined a series of phenomena inherent to the iterative application of Bellman operators. Crucial was the fact that the update coefficients were non-negative.

Notice that any PAM on the domain [0,1]d can be represented as a nested min-max of its affine components [23, 14]. This supports the relevance of PAMs with updates defined by the maximum of affine pieces, Bellman operators of MDPs being among them. Generalising our results beyond Bellman operators of MDPs using monotonicity and fixed-point convergence is a subject of our future work.

It is worth pointing out that we capture a large class of PAMs for which the assumptions of the known techniques do not hold. In proving decidability for d=2, we do not impose the restrictions used in works on the one-dimensional version of the problem. Bellman operators are, in general, neither injective as in [13] nor complete [8], nor even surjective.

Our techniques and results can be applied to other problems about PAMs. Consider, in particular, the universally quantified version of BOR: given a vector 𝒕[0,1]d and a Bellman operator Φ:[0,1]d[0,1]d, does there exist n for every 𝒔[0,1]d such that Φn(𝒔)=𝒕? This is the mortality problem, known to be undecidable for general PAMs in dimension 2 [5, 6]. For Bellman operators, however, it is equivalent to solving the BOR problem for 𝒔=𝟎 and for 𝒔=𝟏. Indeed, both BOR instances are “yes”-instances if and only if every instance with 𝟎𝒔𝟏 is a “yes”. We can answer this using our novel algorithm (Theorem 22), hence mortality for Bellman operators is decidable in all dimensions.

Without restrictions on the dimension d, the decidability of the BOR problem remains open. Notably, if there exist states with multiple tight actions, then the dynamics of the sequence Fn(𝜺)n as defined in Section 3.4 is intricate for 𝜺𝟎. For more works discussing the iterative dynamics of map F see [26, 30]. We highlight that Fn(𝜺) does not have a closed form, in contrast to Mn𝜺 for a fixed matrix M. While the entries of the vector Mn𝜺 are terms of linear recurrence sequences, the class of such sequences is not closed under max [29]. The behaviour of entries in Fn(𝜺) is even subtler than that, as witnessed by the non-positionality discussion and Example 23.

We emphasise that due to the undecidability of the vector reachability problem for matrix semigroups [3], following Remark 26 we need to argue about ultimate non-positivity of Fn(𝜺). However, even for one M, deciding whether (Mn𝜺)10 for some n is equivalent to the positivity problem [22], open in dimension d>5. This hardness carries on to the stochastic matrices [28]. Moreover, the question whether there exists n such that Mn𝜺𝟎 corresponds to the polyhedron-hitting problem, and is Diophantine-hard [11] for general matrices M.

References

  • [1] Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT Press, 2008.
  • [2] Christel Baier, Joachim Klein, Linda Leuschner, David Parker, and Sascha Wunderlich. Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In CAV (1), volume 10426 of Lecture Notes in Computer Science, pages 160–180. Springer, 2017. doi:10.1007/978-3-319-63387-9_8.
  • [3] Paul Bell and Igor Potapov. On undecidability bounds for matrix decision problems. Theor. Comput. Sci., 391(1-2):3–13, 2008. doi:10.1016/J.TCS.2007.10.025.
  • [4] Richard Bellman. A Markovian decision process. Journal of Mathematics and Mechanics, 6(5):679–684, 1957. URL: http://www.jstor.org/stable/24900506.
  • [5] Amir M. Ben-Amram. Mortality of iterated piecewise affine functions over the integers: Decidability and complexity. Comput., 4(1):19–56, 2015. doi:10.3233/COM-150032.
  • [6] Vincent D. Blondel, Olivier Bournez, Pascal Koiran, Christos H. Papadimitriou, and John N. Tsitsiklis. Deciding stability and mortality of piecewise affine dynamical systems. Theor. Comput. Sci., 255(1-2):687–696, 2001. doi:10.1016/S0304-3975(00)00399-6.
  • [7] Vincent D. Blondel and John N. Tsitsiklis. A survey of computational complexity results in systems and control. Autom., 36(9):1249–1274, 2000. doi:10.1016/S0005-1098(00)00050-9.
  • [8] Olivier Bournez, Oleksiy Kurganskyy, and Igor Potapov. Reachability problems for one-dimensional piecewise affine maps. Int. J. Found. Comput. Sci., 29(4):529–549, 2018. doi:10.1142/S0129054118410046.
  • [9] Tomás Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtech Forejt, Jan Kretínský, Marta Z. Kwiatkowska, David Parker, and Mateusz Ujma. Verification of Markov decision processes using learning algorithms. In ATVA, volume 8837 of Lecture Notes in Computer Science, pages 98–114. Springer, 2014. doi:10.1007/978-3-319-11936-6_8.
  • [10] Krishnendu Chatterjee and Thomas A. Henzinger. Value iteration. In 25 Years of Model Checking, volume 5000 of Lecture Notes in Computer Science, pages 107–138. Springer, 2008. doi:10.1007/978-3-540-69850-0_7.
  • [11] Ventsislav Chonev, Joël Ouaknine, and James Worrell. The polyhedron-hitting problem. In SODA, pages 940–956. SIAM, 2015. doi:10.1137/1.9781611973730.64.
  • [12] Luca de Alfaro. Formal verification of probabilistic systems. PhD thesis, Stanford University, USA, 1997. URL: https://searchworks.stanford.edu/view/3910936.
  • [13] Faraz Ghahremani, Edon Kelmendi, and Joël Ouaknine. Reachability in injective piecewise affine maps. In LICS, pages 1–11. IEEE, 2023. doi:10.1109/LICS56636.2023.10175723.
  • [14] V. V. Gorokhovik, O. I. Zorko, and G. Birkhoff. Piecewise affine functions and polyhedral sets. Optimization, 31(3):209–221, 1994.
  • [15] Serge Haddad and Benjamin Monmege. Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci., 735:111–131, 2018. doi:10.1016/J.TCS.2016.12.003.
  • [16] Arnd Hartmanns and Benjamin Lucien Kaminski. Optimistic value iteration. In CAV (2), volume 12225 of Lecture Notes in Computer Science, pages 488–511. Springer, 2020. doi:10.1007/978-3-030-53291-8_26.
  • [17] R. Kannan and R. J. Lipton. Polynomial-Time Algorithm for the Orbit Problem. J. ACM, 33(4):808–821, August 1986. doi:10.1145/6490.6496.
  • [18] Ravindran Kannan and Richard J. Lipton. The orbit problem is decidable. In STOC, pages 252–261. ACM, 1980. doi:10.1145/800141.804673.
  • [19] Toghrul Karimov, Edon Kelmendi, Joël Ouaknine, and James Worrell. What’s decidable about discrete linear dynamical systems? In Principles of Systems Design, volume 13660 of Lecture Notes in Computer Science, pages 21–38. Springer, 2022. doi:10.1007/978-3-031-22337-2_2.
  • [20] Pascal Koiran, Michel Cosnard, and Max Garzon. Computability with low-dimensional dynamical systems. Theor. Comput. Sci., 132(1):113–128, September 1994. doi:10.1016/0304-3975(94)90229-1.
  • [21] Bart Kuijpers. Deciding the point-to-fixed-point problem for skew tent maps on an interval. J. Comput. Syst. Sci., 115:113–120, 2021. doi:10.1016/J.JCSS.2020.07.005.
  • [22] Joël Ouaknine and James Worrell. Positivity problems for low-order linear recurrence sequences. In SODA, pages 366–379. SIAM, 2014. doi:10.1137/1.9781611973402.27.
  • [23] Sergei Ovchinnikov. Max-min representation of piecewise linear functions. Beiträge zur Algebra und Geometrie, 43(1):297–302, 2002. URL: http://eudml.org/doc/225460.
  • [24] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley, 1994. doi:10.1002/9780470316887.
  • [25] Tim Quatmann and Joost-Pieter Katoen. Sound value iteration. In CAV (1), volume 10981 of Lecture Notes in Computer Science, pages 643–661. Springer, 2018. doi:10.1007/978-3-319-96145-3_37.
  • [26] Karel Sladký. Bounds on discrete dynamic programming recursions. i. models with non-negative matrices. Kybernetika, 16(6):(526)–547, 1980. URL: http://eudml.org/doc/28460.
  • [27] Ashish Tiwari. Termination of linear programs. In CAV, volume 3114 of Lecture Notes in Computer Science, pages 70–82. Springer, 2004. doi:10.1007/978-3-540-27813-9_6.
  • [28] Mihir Vahanwala. Skolem and positivity completeness of ergodic Markov chains. Inf. Process. Lett., 186:106481, 2024. doi:10.1016/J.IPL.2024.106481.
  • [29] Gerco van Heerdt, Justin Hsu, Joël Ouaknine, and Alexandra Silva. Convex language semantics for nondeterministic probabilistic automata. In ICTAC, volume 11187 of Lecture Notes in Computer Science, pages 472–492. Springer, 2018. doi:10.1007/978-3-030-02508-3_25.
  • [30] W.H.M. Zijm. Generalized eigenvectors and sets of nonnegative matrices. Linear Algebra and its Applications, 59:91–113, 1984. doi:10.1016/0024-3795(84)90161-7.