Abstract 1 Introduction 2 The objects of interest 3 The Lévy-Prokhorov distance lifting 4 𝜺-(Bi)simulations, coalgebraically 5 Concluding remarks References Appendix A Proofs

𝜺-Distance via Lévy-Prokhorov Lifting

Josée Desharnais ORCID Laval University, Québec, Canada Ana Sokolova ORCID University of Salzburg, Austria
Abstract

The most studied and accepted pseudometric for probabilistic processes is one based on the Kantorovich distance between distributions. It comes with many theoretical and motivating results, in particular it is the fixpoint of a given functional and defines a functor on (complete) pseudometric spaces. It is also the foundation for a categorical lifting of pseudometrics.

Other notions of behavioural pseudometrics have also been proposed, one of them (ε-distance) based on ε-bisimulation. ε-Distance has the advantages that it is intuitively easy to understand, it relates systems that are conceptually close (for example, an imperfect implementation is close to its specification), and it comes equipped with a natural notion of ε-coupling. Finally, this distance is easy to compute.

We show that ε-distance is also the greatest fixpoint of a functional and provides a functor. The latter is obtained by replacing the Kantorovich distance in the lifting functor with the Lévy-Prokhorov distance. In addition, we show that ε-couplings and ε-bisimulations have an appealing coalgebraic characterization.

Keywords and phrases:
Lévy-Prokhorov metric, behavioural distance, epsilon-bisimulation, reactive probabilistic transition systems, discrete labelled Markov processes, coalgebraic epsilon-(bi)simulation
Funding:
Josée Desharnais: Work funded by NSERC grant.
Copyright and License:
[Uncaptioned image] © Josée Desharnais and Ana Sokolova; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Probabilistic computation
; Theory of computation Categorical semantics ; Theory of computation Program verification
Related Version:
Full Version: https://arxiv.org/abs/2507.10732
Acknowledgements:
This work was partly done during a sabbatical of Josée Desharnais at the University of Salzburg, as well as during Dagstuhl Seminar 24432 and the Bellairs Workshop on Quantitative Reasoning 2025. We thank these venues, the organizers, and the participants for providing a perfect working environment. We also thank Matteo Mio and Franck van Breugel for some fruitful and motivating discussions, and the anonymous reviewers for their insightful and constructive suggestions.
Editors:
Stefano Guerrini and Barbara König

1 Introduction

Probabilistic systems [28, 38, 48, 7, 8, 44] and their behaviour have been the object of study of over thirty years in the area of formal verification and analysis of systems. They are used to represent uncertainty, incomplete information, as well as randomized behaviour. One important direction, in order to prove that systems behave the same, is the study of behavioural equivalences: these identify states with (exactly) the same behaviour. Behavioural equivalences are an elegant way to analyse and compare behaviour of systems, they have nice foundations in concurrency theory [41, 60] as well as elegant abstract generalizations in the theory of coalgebras [20, 49, 32, 36, 33, 35, 13, 14]. However, as was already observed in [29], in a non-exact world, e.g., when the probabilities in the model are approximate and not exactly known, or estimated by sampling, or when small differences should not be considered the same as large differences, behavioural equivalences may be too strong.

One solution is to employ a distance, actually a pseudometric, providing a quantitative notion of how much states in a probabilistic model differ from one another, or how close they are to each other – distance zero corresponding to bisimulation equivalence. The study of pseudometrics [22, 58, 24, 18, 59, 56, 26, 54, 55] and quantitative theories [39, 40, 5, 6, 9, 42, 16, 43] has been a fruitful one in the past decade(s).

Historically, the impulse to define a behavioral distance between probabilistic processes became imperative for those with a continuous state space and continuous probability distributions, called Labelled Markov Processes (LMPs). It started with the idea of defining a real-valued logic, as proposed by Kozen [37], to express properties satisfied by processes: Taking the supremum over the differences of the values of formulas was a natural way of defining a pseudometric, see Desharnais et al. [22, 24]. The authors later on observed that this distance was the greatest fixpoint of some functional [25] and, in between and later, van Breugel and Worrell [58, 59] and van Breugel et al. [56], proved that this gives a functor (monad) that is a lifting of the distribution functor (monad) using the Kantorovich metric on distributions. Moreover, the distance is the one obtained using final coalgebra semantics. Since then, this pseudometric has been studied further in different contexts and algorithms have been proposed to compute it [53, 5, 6]. The distance comes with many theoretical and motivating results, and has also been extended by Baldan et al. [9] and Sprunger et al. [51, 52] to a generic Kantorovich-style (and dually Wasserstein) lifting of arbitrary functors parametric in certain evaluation maps – the two liftings coincide on distributions. With all these seals of approval, the Kantorovich behavioural distance has imposed itself as the one to use and study. One aspect that it lacks is an intuitive accompanying notion of approximate bisimulation.

Approximate bisimulations provide another strategy to circumvent the non-robustness of bisimulation equivalences. They have been studied in non-probabilistic systems [61, 30, 51], and in probabilistic systems [26, 54, 27, 1, 2, 10, 50]. We focus on the notion of ε-bisimulation defined in [26]. One aim of such approximate bisimulations is to unite the best of both worlds: give a relational structure for reasoning about systems and at the same time define a distance. States related by an ε-bisimulation are at distance at most ε from one another. The advantages of such a distance are its intuitive nature and its ability to relate systems that are almost the same in structure, as the next example will illustrate. Finally, the distance can be computed relatively easily.

For comparison with the Kantorovich behavioural distance, consider the imperfect channel (when ε>0) depicted in Figure 1. The ε-distance of this channel to a perfect channel (one with ε=0) is ε, and it provides a simple example that imperfect implementation

Figure 1: A simple channel that fails to send a token with probability ε.

can be considered close to its specification. On the other hand, the Kantorovich distance (without discount) gives distance 1 to the pair of perfect and imperfect channels, which reflects the fact that the two channels have a very different behavior on the long run – see Example 1 for more details. Interestingly, ε-distance has an almost build-in notion of continuity: Two channels with close values of ε are close in the ε-distance as well.

However, the approximate equivalences had no categorical formulation attached and it was not clear whether the associated distance shares some of the generic good properties identified for the Kantorovich distance. Our main motivation for this work was to investigate whether we can remedy this situation. Indeed, we provide the missing abstract characterization of ε-bisimulation in two directions: by obtaining a fixpoint characterization of the induced distance and by giving a coalgebraic view following Aczel-Mendler coalgebraic bisimilarity. The fixpoint characterization is the main result of the paper. For this, we show how to replace the Kantorovich metric on distributions with the Lévy-Prokhorov metric on distributions, and show that the distance obtained is indeed the ε-distance. It is remarkable how well the Lévy-Prokhorov distance fits into the definition of ε-bisimulation: We show that ε-distance is the greatest fixpoint of a suitable functional, and that the lifting also lifts the discrete distributions functor to a functor on pseudometric spaces. We also show a result that has no matching for the Kantorovich functional: any fixpoint distance of the Lévy-Prokhorov functional defines an ε-bisimulation. Applications of our results may take different routes: On the practical side, the distance could be computed by an iterative algorithm given a suitable fixpoint theorem. On the theoretical side, the fixpoint characterization may open the way for studying new problems, one example being ε-bisimulations up-to as an acceleration to computing ε-bisimilarity, along the lines of [12].

Our observation may also open a new way of studying other pseudometrics on probabilistic systems that would be constructed from other basic distances on distributions.

Unlike in the case of the Kantorovich lifting, the Lévy Prokhorov lifting does not yield a monad on the category of pseudometric spaces, see Section 3.4. As a consequence, we could not follow the path of [56] and reproduce their results on the final coalgebra. Instead, we take a different way and describe ε-(bi)simulations as coalgebraic (bi)simulations, following and extending the framework of Hasuo [32, 33] and the original results of Hughes and Jacobs [34]. We only present the necessary abstraction needed to discuss ε-(bi)simulations on discrete time Markov chains and labelled Markov processes, generalizations of these notions are possible and we plan to elaborate on them in follow-up work.

All proofs omitted for space reasons are in the appendix.

2 The objects of interest

Our development concerns discrete labelled Markov processes (LMPs) also called reactive probabilistic transition systems. They consist of a set of states S and transitions τa:S𝒟S labelled with actions a of some set 𝒜, where 𝒟S is the set of discrete sub-distributions on S:

𝒟S={φ:S[0,1]sSφ(s)1}.

Note that the size of the state set need not be restricted (for the sums to be defined), as the sum is defined as xSφ(x)=supfinite XS(xXφ(x)). It is easy to prove that discrete probability distributions always have a countable support, i.e., they assign non-zero probability to at most countably many elements of S.

For s,tS and a𝒜, the value τa(s)(t) (also written τa(s,t)) encodes the probability of jumping from s to t when action a is taken. Figure 1 shows such a system with actions send and ack. The first transition is a subprobability: the missing probability represents the probability of failure. If XS, we write τa(s)(X) for tXτa(s)(t). Coalgebraically, LMPs, for a set of actions or labels 𝒜, are arrows τ:S(𝒟S)𝒜. We write τa(s) for the subdistribution τ(s)(a). In case of just one label, LMPs are (discrete time) Markov chains, i.e., coalgebras of the functor 𝒟. The reason for working with subdistributions is that they come equipped with a non-trivial pointwise order. Bisimilarity of LMPs is the classical Larsen-Skou [38] bisimilarity.

We denote by 𝐏𝐌𝐞𝐭𝟏 the category of 1-bounded pseudometric spaces. It has as objects pseudometric spaces, which are pairs X,d of a set X and a function d:X×X[0,1] which satisfies d(x,x)=0, is symmetric: d(x,y)=d(y,x) and satisfies the triangular inequality: d(x,y)+d(y,z)d(x,z) for all x,y,zX. As arrows, 𝐏𝐌𝐞𝐭𝟏 has non-expansive maps f:X,dXY,dY, i.e., functions f:XY satisfying dY(f(x1),f(x2))dX(x1,x2). A metric or distance is a pseudometric that additionally satisfies d(x,y)=0 iff x=y. Of particular interest for LMPs are pseudometrics whose kernel is bisimilarity, that is, d(x,y)=0 iff x and y are bisimilar. As we only deal with pseudometrics here, we often use the word distance or metric as a shorthand for pseudometric. Following [59, 56], we will use the opposite pointwise order111One reason for this choice is to obtain the distance as the greatest, rather than least, fixpoint – just like bisimilarity is the greatest fixpoint of a suitable functional. on distances defined by:

d1d2x,yX:d1(x,y)d2(x,y).

However, whenever it is clearer, we may use the direct pointwise order too: d1d2d1d2.

2.1 The Kantorovich pseudometric

The observation that bisimulation is too strong in the context of probabilistic systems has led to the idea of defining a pseudometric that would give zero distance to bisimilar states. Initially, this pseudometric was defined using a real-valued logic. A set of functionals were defined from states of LMPs to [0,1], with the following syntax, mimicking logical formulas [24]:

f:=1inf(f1,f2)1ffqaf with q[0,1].

We omit the semantics, but the next example will give a taste of it. A distance emerged naturally as

dK(s,t)=supf|f(s)f(t)|.
Example 1.

As an example, we can look back at the channel of Figure 1. The maximum difference over functionals between cε and cγ, with ε,γ>0 is over the functional send ack1, which evaluates to 1ε and 1γ, respectively, and thus yields the distance dK(cε,cγ)=|εγ|. Between cε and c0, the functional (sendack)n1 evaluates to (1ε)n on cε, and to 1 on c0. So the supremum results in dK(cε,c0)=1.

Later on, it was proven that the functionals could be any non-expansive maps and that this distance is the distance obtained via a final coalgebra construction for a functor (monad) that is a lifting of the Distribution functor (monad) to pseudometric spaces using the Kantorovich metric, by van Breugel and Worrell [59] and van Breugel et al. [56, 57].

This distance is a fixpoint of the functional on distances on the states of an LMP τ:S(𝒟S)𝒜

ΔK(d)(s0,s1)=supa𝒜δKd(τa(s0),τa(s1)) (1)

where δKd is the well-known Kantorovich distance between two distributions. We do not need its definition here, the interested reader can find it in, e.g., [56].

The distance on distributions is just a parameter in Equation (1); changing the Kantorovich distance δKd to another distance on distributions and looking for a fixpoint will give us a new distance on states. In fact we are interested in the question formulated the other way around: find a distance on distributions for which the fixpoint is the ε-distance d, defined in the next section. Similarly, the functional ΔK was discovered after the behavioural distance dK was introduced. However, it could have been done the other way around, choosing a distance on distributions, and looking for the resulting distance on states, as is done by Baldan et al. [9]. We discuss this in more detail again in Section 3 below.

2.2 𝜺-Bisimulation

Bisimulation, being too strong for the comparison of states in quantitative systems, has also led to relaxing the definition of simulation and bisimulation itself to approximate relations. As mentioned in the introduction, there are a few of those definitions of approximate bisimulations, but we are interested in the following.

Definition 2 ([26]).

Let τ:S(𝒟S)𝒜 be an LMP and let ε[0,1]. A relation RS×S is an ε-simulation if whenever sRt, then for all a𝒜, for all XS

τa(s)(X)τa(t)(R(X))+ε, where R(X)={yxX:xRy}.

If R is symmetric, it is an ε-bisimulation. A state s is ε-simulated by state t, written sεt, if sRt for some ε-simulation R. If sRt for R ε-bisimulation, we write sεt and we say that s and t are ε-bisimilar.

As expected, ordinary (bi)simulation on LMPs [38, 11] is simply 0-(bi)simulation. This definition has an extension to nondeterministic and probabilistic finite systems [54].

The operation R(X) on a set X is what restricts this work to discrete distributions. In general, if X is measurable, we may not have R(X) measurable. Working in analytic spaces, as was done before for LMPs [23] may solve this issue, but we leave it for future work. In particular, one could ask R(X) to be measurable whenever X is, as done in [10], but many results are not proven in that case, like the logical characterisation, or Theorem 5 below.

Example 3.

Consider the following example, with γ(0,1].

The relation R={(s,t),(s1,t1)} is an ε-simulation for ε=γ, and so is RR1. Hence, sγt, tγs and sγt. However, the situation is different for ε<γ: for example, taking ε=0, we observe that s0t (with the relation R{(s2,t1)}). However, t0s, because a relation R relating t and s would have to also relate t1 to s1 and to s2, if γ<1, and t1 to s2 if γ=1. Indeed, taking X={t1}, we need τa(t)(X)τa(s)(R(X))+0, but for this to be the case, we would need τa(s)(R(X))=1 and hence R must include the pairs (t1,s1) and (t1,s2) if γ<1, and otherwise must include (t1,s2) if γ=1. So, in any case (t1,s2)R, but the pair (t1,s2) cannot be related by any ε-simulation relation R for ε<1 (which is the case here as ε=0), as, for example, taking again X={t1}, we would have τa(t1)(X)=1 but τa(s2)(R(X))=0. Hence, also s≁0t.

In contrast to the case of 0-bisimilarity, two-way ε-similarity is not ε-bisimilarity [26, 54]. From the notion of ε-bisimilarity, a pseudometric arises naturally by taking the infimum.

Definition 4 (The ε-distance d [26]).

Let τ:S(𝒟S)𝒜 be an LMP. The pseudometric d on S×S, called ε-distance, is defined as follows:

d:S×S[0,1](s,t)inf{ε[0,1]sεt}.

The function d is a pseudometric on the states of the considered LMP: the triangle inequality comes from the well-known fact, see e.g. [26], that sε1u and uε2t imply sε1+ε2t. (The same property holds for ε as well.) Indeed, ε and ε are not transitive. Instead, these relations are entourages that form a uniform structure, or uniformity, cf. [15].

The kernel of d, that is, the set of pairs at zero distance, is 0, the bisimilarity relation on LMPs.

Theorem 5 ([26]).

d(s,t)=0s0t.

One of the nice properties of this distance is its relatively easy computability, especially regarding computation “by hand”. It suffices to define an ε-bisimulation to obtain an upper bound to the distance between states. In Example 3, we have d(s,t)=γ, and, indeed, we easily see that the probabilities of s and t seen as processes are within γ. Also here, the Kantorovich distance between these states is 1. One way to see that is by using the functional an1, which evaluates to (1ε)n on s, and to 1 on t. So dK(s,t)=1. Similarly, as already noted in the introduction, for the channel in Figure 1, we can prove that d(cε,cγ)=|εγ|=dK(cε,cγ), for ε,γ>0, but d(cε,c0)=εdK(cε,c0). In fact, ε-distance is incomparable to both the discounted and the undiscounted Kantorovich distance, as argued in [26]. That paper also gives a straightforward polynomial algorithm to compute d. The computation of the Kantorovich distance, discounted or not, has been the subject of many papers, which also led to polynomial algorithms [53].

3 The Lévy-Prokhorov distance lifting

We will now explain in details what we think should be a behavioral pseudometric on LMPs. These details may not be new, or may appear obvious to the expert reader, but we find it useful to spell them out here. When looking for a distance d on the states of an LMP, one observes that states are both targets of distributions and (a set of) distributions themselves (since they are defined by their outgoing transition distributions): that is, in Example 3, the states s and t can be viewed as distributions over the set {s1,s2,t1}. As there are already a few distances on distributions that were studied outside computer science and concurrency theory, they give a starting point for distances on states: we could say, given such a distance δ:

d(s,t):=supa𝒜δ(τa(s),τa(t)), for s,tS. (2)

We will use the letter d for distances on states, and the symbol δ for distances on distributions. Analysing the options of this equation in full generality is outside the scope of the current paper but some distances, such as δKd (and the one we introduce below δLPd) have a particularity. They are defined using a parameter d, a basic distance on the space where the distributions are defined. This is desirable for concurrency theory because distributions on states that are different but close (or even bisimilar!) should also be close: so a starting distance d on the state space is of key importance to account for these similarities between states/processes. Note that a few papers [30, 19, 51] start from distances on the states that account for extra information (e.g. a distance on the labels/observations that are attached to states) – but this goes beyond the behaviour of states that we want to capture here (although they constitute an interesting line of work to extend our method).

A needed property for d to be a behavioural distance, as also pointed out in [9] for δKd, is that it accords with the starting distance d used in δd, as follows.

d(s,t)=supa𝒜δd(τa(s),τa(t)), for s,tS. (3)

This says that the distance between states viewed as simple members of the space (the left hand part of the equality), is the same as their distance when viewed as processes, that is, their outgoing transition distributions. So this really says that d treats states according to their behaviour. Technically, this is saying that we are looking for a distance fixpoint d of the functional

Δ(d)(s,t):=supa𝒜δd(τa(s),τa(t)), for s,tS. (4)

Of course, by taking δd:=δKd, this instantiates exactly to Equation (1), the fixpoint property of the Kantorovich behavioural distance, observed by [25, 59] and explicitly proven in [57]. One of our goals in this work was to find a similar functional that would have the ε-distance d as its greatest fixed point. That is, we are seeking a suitable distance δd on distributions.

 Remark 6.

Another property that one might consider natural and one might expect from such a distance δd is that the distance between the Dirac distributions on states is the same as the starting distance between them. That is, for a distance d on states, one might expect

d(s,t)=δd(1s,1t), for s,tS, (5)

where we write 1s for the Dirac distribution on sS. This is an interesting property of the lifting δd – it is a stronger version of the non-expansiveness of the unit of the distribution monad, showing that the unit is an isometry, and we will return to it in Section 3.4 below where we show that it holds for the Lévy-Prokhorov distance, for any starting distance d on states. It also holds for the (undiscounted) Kantorovich distance – this is easy to prove using the Wasserstein formulation of the Kantorovich distance, whereas it does not hold for the total variation distance [43]. Therefore, this property is neither unique to the Lévy-Prokhorov distance that we are interested in, nor it is “behavioural” in the sense that we explained so far: it does not take into account the transition structure τ of the states. For a somewhat behavioural explanation, note that combining it with Equation (2), this condition implies that a pair of states s,t each having a single outgoing transition as a Dirac on state s and t respectively, would get the same distance as s and t. In particular, such a distance does not discount the future.

3.1 The Lévy-Prokhorov distance on distributions

While examining equation (3) and trying to make d fit into it as d we came up with the following distance on distributions. Only afterwards, we discovered that the distance was actually known as the Lévy-Prokhorov lifting of a distance d to distributions. In this section we introduce the distance and provide an example that illustrates it.

Definition 7 (Lévy-Prokhorov distance [45]).

Let S,d be a pseudometric space; we endow 𝒟S with the pseudometric (or distance) δLPd:𝒟S×𝒟S[0,1], defined as

δLPd(μ0,μ1)=inf{εXS:μi(X)μ1i(Xεd)+ε, for i=0,1},

where Xεd={yxX:d(x,y)<ε}.
We call δLPd the Lévy-Prokhorov (LP, for short) distance.

The strict inequality in the definition of Xεd is necessary for the proof of Theorem 11.

In the following example we define simple probability measures that help illustrate the need for the extra “+ε” in this definition and the need for the “ε ball around X”. At first sight, δLPd looks very much like the total variation distance, but this “ε ball around X” makes it very different.

Example 8.

Consider the set of states S={xγγ[0,1]} and a distance on these states given by d(xγ,xξ)=|γξ|, for γ,ξ[0,1]. In a probabilistic transition system, one could imagine that xγ has an a-loop to itself with probability 1γ, γ[0,1], as depicted below (adding transitions out of the states to help see their differences as processes in a transition system). We now define a family of distributions on these states, and we picture them below as states with an outgoing transition without label (one could imagine an a-label). Let νγ=γ1x1+(1γ)1xγ, for γ[0,12), with 1x the Dirac measure on x. The distributions νγ and ν0 are illustrated below. For a fixed γ, these two distributions are actually non-zero on a three-state space S={x0,x1,xγ} .

We show that δLPd(νγ,ν0)=γ. Let 1γ>ε>γ. We show that all such ε satisfy the inequalities in the definition of δLPd and any εγ does not, so the distance, being the infimum over all such values, is indeed γ. In particular, we need to check that for all XS, νγ(X)ν0(Xεd)+ε and ν0(X)νγ(Xεd)+ε. The cases X=S and X= are trivial, since the distributions are full. For the sets X of size two, we have
For X:={x0,x1}:νγ(X)=γ1+ε=ν0(S)+ε=ν0(Xεd)+εν0(X)=11+ε=νγ(S)+ε=νγ(Xεd)+εFor X:={x1,xγ}:νγ(X)=11+ε=ν0(S)+ε=ν0(Xεd)+εν0(X)=01+ε=νγ(S)+ε=νγ(Xεd)+εFor X:={x0,xγ}:νγ(X)=1γ1+ε=ν0({x0,xγ})+ε=ν0(Xεd)+εν0(X)=1(1γ)+ε=νγ({x0,xγ})+ε=νγ(Xεd)+ε

The last inequality is satisfied thanks to the room given by the “+ε”. For sets of size one, omitting the checks where the probability is zero on the left-hand side of the inequality:

For X:={x1}:νγ(X)=γ0+ε=ν0({x1})+ε=ν0(Xεd)+εFor X:={xγ}:νγ(X)=1γ1+ε=ν0({xγ,x0})+ε=ν0(Xεd)+εFor X:={x0}:ν0(X)=11γ+ε=νγ({xγ,x0})+ε=νγ(Xεd)+ε.

In the one but last inequality, it is crucial that xγ be within ε of x0, which includes x0 in the ε ball around xγ. With X in place of Xεd, the inequality would not be satisfied, and not even with Xγ/2d, a set where states at distance zero are included (like bisimilar states – of which there are none here), because we would obtain:

νγ({xγ})=1γ0+ε=ν0({xγ}γ/2d)+ε.

The last inequality also illustrates this, and also the need for the “+ε”, as the ε-ball itself is not enough for the inequality to be satisfied. Moreover, for a pair of distributions νγ,νξ in our family one can similarly prove that δLPd(νγ,νξ)=|γξ|.

Finally, we note that for εγ we have Xεd=X for any XS and hence, e.g., for X={xγ}, as already noted above we have νγ(X)=1γ0+ε=ν0(Xεd)+ε as εγ<12.

One can notice that when viewed as processes, νγ and ν0 are the same as s and t of Example 3. Technically, it is rather τa(s) and τa(t) of course.

Once one has seen the Lévy-Prokhorov distance on distributions it seems not surprising that it has some link with the ε-distance. However, our surprise was the other way around, when we realized that the distance on distributions δd that we had to concoct to make d a fixed point of Δ in Equation (4) already existed.

3.2 The Lévy-Prokhorov distance on LMPs

We are now ready to define the functional on distances over LMPs that we were looking for. We expect a fixpoint of this functional to be a behavioral distance, but more importantly we expect d to be the greatest fixpoint of it.

Definition 9.

Let S,d be a pseudometric space, on which an LMP τ:S(𝒟S)𝒜 is defined. Let s0 and s1 be states. We define the functional ΔLP as

ΔLP(d)(s0,s1) =supa𝒜δLPd(τa(s0),τa(s1))
=supa𝒜inf{ε(AS:τa(si)(A)τa(s1i)(Aεd)+ε,i=0,1)}.

Hence, for a fixed pseudometric space S,d, ΔLP applied to d gives a new pseudometric on S. Clearly, this can be seen as a functional on 𝐏𝐌𝐞𝐭𝟏, mapping S,d to (S,ΔLP(d)).

Before proving that d is a fixpoint, we show how the supremum over actions can safely be replaced by inserting a universal quantification on actions inside the set, as will be useful in Proposition 15.

Lemma 10.

Let S,d be a pseudometric space, on which an LMP τ:S(𝒟S)𝒜 is defined. Let s0 and s1 be states. Then

ΔLP(d)(s0,s1) =inf{ε(a𝒜,AS:τa(si)(A)τa(s1i)(Aεd)+ε,i=0,1)}.

One remarkable property of prefixpoints of this functional is that they define ε-bisimulations. Of course the theorem is also true for fixpoints.

Theorem 11.

Let τ:S(𝒟S)𝒜 be an LMP with S,d a pseudometric space. If d is a prefixpoint of ΔLP, then for any ε>0, Rε:={(s,t)S×Sd(s,t)<ε}, is an ε-bisimulation.

Proof.

Let d be a prefixpoint, i.e., let ΔLP(d)d. Let Rε={(s,t)d(s,t)<ε}. We show it is an ε-bisimulation. Let s0Rεs1, i.e., d(s0,s1)<ε, and also ΔLP(d)(s0,s1)<ε, because d is a prefixpoint. Let a𝒜; then δLPd(τa(s0),τa(s1))<ε. Because δLPd is an infimum, there is some γ<ε such that for i=1,2 and XS

τa(si)(X) τa(s1i)(Xγd)+γ by definition of δLP
τa(s1i)(Xεd)+ε because XγdXεd and γε.
=τa(s1i)(Rε(X))+ε by definition of Rε(X).

So Rε is an ε-bisimulation, as wanted. The proof of this theorem needs that Xεd be defined with a strict inequality. It is necessary for the existence of γ (and for the last equality).

This theorem is nice in itself but it also has an important corollary, that d is greater than or equal to all fixpoints of ΔLP.

Corollary 12.

d is greater than or equal to all fixpoints, i.e.,

ΔLP(d)=ddd.

Proof.

Let d be a fixpoint of ΔLP and let d(s,t)=ε. By the previous theorem, sγt for all γ>ε. Since d(s,t)=inf{eset}, we obtain d(s,t)ε=d(s,t), as wanted.

With the next proposition, we provide another example of a fixpoint of ΔLP: the exact bisimilarity distance, which we denote by d. It is equal to 0 if the states are bisimilar, otherwise it is 1.

Proposition 13.

The exact bisimilarity distance d is a fixpoint of ΔLP.

Proof.

We first prove that bisimilar states s,t satisfy ΔLP(d)(s,t)=0. Consider R to be bisimilarity. Because R is a bisimulation, we have that for all s0Rs1, a𝒜 and XS,

τa(s0)(X)τa(s1)(R(X))=τa(s1)(Xγd),

for any γ(0,1), because Xγd is just the smallest R-closed set that contains X. A symmetric argument applies with 0 and 1 interchanged. Consequently, ΔLP(d)(s0,s1)=0, as wanted.

Conversely, if ΔLP(d)(s0,s1)=0, then there is a sequence of γn(0,1) that converges to zero, for which we have

τa(s0)(X)τa(s1)(Xγnd)+γn=τa(s1)(R(X))+γn.

Since the property is monotone, recall also (6), for all γ(0,1) we have

τa(s0)(X)τa(s1)(Xγd)+γ=τa(s1)(R(X))+γ.

So τa(s0)(X)τa(s1)(R(X)), as wanted.

Proposition 14.

ΔLP is monotonic.

We can now prove that the distance defined by ε-bisimulation is a fixpoint of ΔLP.

Proposition 15.

d is a fixpoint of ΔLP, and hence it is the greatest fixpoint.

At this point we remark again that even if it seems not surprising that the Lévy-Prokhorov metric on distributions has some link with the ε-distance, going from the basic metric on distributions δLP to a meaningful distance on LMPs is not direct (one has to find the fixed-point properties).

3.3 The Lévy-Prokhorov lifting of the subdistribution functor to 𝐏𝐌𝐞𝐭𝟏

Using the LP distance lifting, we define a functor 𝒟LP on 𝐏𝐌𝐞𝐭𝟏, the category of 1-bounded pseudometric spaces with nonexpansive functions, as follows: On objects X,d, we have

𝒟LPX,d=(𝒟X,δLPd)

and on morphisms f:X0,d0X1,d1 we set 𝒟LPf=𝒟f, that is

𝒟LPf:𝒟LPX0,d0𝒟LPX1,d1 with φλy.φ(f1({y})).
Proposition 16.

𝒟LP is a functor on 𝐏𝐌𝐞𝐭𝟏.

Just like for the Kantorovich-lifting of the distribution functor, we can prove that the Lévy-Prokhorov lifting of the distribution functor is locally nonexpansive, in the next proposition. For this reason, note that given pseudometric spaces X,dX and Y,dY in 𝐏𝐌𝐞𝐭𝟏, the hom-set XY of all nonexpansive maps from X to Y carries a metric defined by

dXY(f1,f2)=supxXdY(f1(x),f2(x)).
Proposition 17.

The functor 𝒟LP is locally nonexpansive, that is, for f1,f2XY

δ𝒟LPX𝒟LPY(𝒟LPf1,𝒟LPf2)dXY(f1,f2)

with δ𝒟LPX𝒟LPY being the metric on the hom-set 𝒟LPX,dX𝒟LPY,dY.

For the following property, recall that a map f:XY for (pseudo)metric spaces X,dX and Y,dY is an isometry if and only if for all x,yX, dY(f(x),f(y))=dX(x,y).

Proposition 18.

The functor 𝒟LP preserves isometries, i.e., if f:XY is an isometry for (pseudo)metric spaces X,dX and Y,dY, then so is 𝒟LPf.

As a consequence, from the results of [31], we obtain that the lifting functor 𝒟LP is Kantorovich, i.e., it has codensity lifting which means that there is a set of predicate liftings of type 𝒟[0,1][0,1] such that codensity lifting gives rise to 𝒟LP. This has already been remarked in [31, Example 5.5.2]. However, the exact set of predicate liftings needed to give rise to the LP-lifting is not provided.

3.4 The lifted functor 𝓓𝑳𝑷 is not a monad lifting of 𝓓

Behavioural distances have been axiomatized within the line of work on quantitative equational theories [39]. It was therefore a natural question for us whether the ε-distance, or the Lévy-Prokhorov distance itself can be given a quantitative axiomatization. This is what we briefly investigate in this section.

Mio et al. [43, Lemma 7.2, Theorem 7.7(2)] have shown that:

  • If a monad on metric spaces is axiomatizable, then (just by being a monad on metric spaces) it has nonexpansive unit and multiplication.

  • A monad on metric spaces that is a lifting of a monad on 𝐒𝐞𝐭𝐬 is axiomatizable with a quantitative theory. Being a lifting means that it acts on objects and arrows in metric spaces in the same way it does in 𝐒𝐞𝐭𝐬 and the unit and multiplication are nonexpansive.

The Kantorovich lifting of 𝒟 has been axiomatized, and the total variation distance has been shown non-axiomatizable (as the unit is not nonexpansive). In the case of 𝒟LP we can see that the unit is nonexpansive, but the multiplication is not, and hence 𝒟LP is not a lifting of the monad 𝒟 to 𝐏𝐌𝐞𝐭𝟏 and the Lévy-Prokhorov distance on distributions is not axiomatizable with a quantitative theory, at least not with the standard multiplication.

Recall the definitions of η and μ for the subdistribution monad 𝒟:

ηX(x)=1x, the Dirac distribution at x; and μX(Φ)(x)=φΦ(φ)φ(x).
Lemma 19.

The unit η of 𝒟 is nonexpansive with respect to the Lévy-Prokhorov distance. Moreover, it is an isometry, i.e., δLPd(1x,1y)=d(x,y).

The unit of the distribution monad is also an isometry with respect to the Kantorovich distance (without discount), as is easy to prove using the duality with the Wasserstein distance. However, the multiplication μ of the monad 𝒟 is not nonexpansive, as the following example shows.

Example 20.

Consider a, b and c in the figure below as distributions over {,}.

The ambient distance is d(,)=1, we omit it from the notation: we write δLP for δLPd. Consider the distributions φ:=1a and ψ:=(1ε)1b+ε1c over these distributions, as pictured. We collect several useful facts:

  • δLP(a,b)=ε and δLPδLP(φ,ψ)=ε.

  • μφ=a and μψ=(1ε)2+(1ε)ε+ε=(1ε)2+(2ε)ε.

  • {}γ={} for any γε and hence μφ({})=1 and μψ({}γ)=μψ({})=(1ε)2.

For γ<ε(2ε), we have 1(1ε)2+γ which therefore yields

μφ({})=1(1ε)2+γ=μψ({}γ)+γ.

This shows that δLP(μφ,μψ)ε(2ε)>ε=δLPδLP(φ,ψ) and hence μ is not nonexpansive.

This result is not surprising, as the binding operator μ really is a multiplication, and so it accords well with the Kantorovich metric, which does multiply the probabilities of distributions in play. We do not know yet if another operator could help express the functor 𝒟LP as a monad lifting.

4 𝜺-(Bi)simulations, coalgebraically

The Kantorovich behavioural distance has a coalgebraic characterization via a final coalgebra semantics. Whether the ε-distance is obtained by finality is still open, but the results on the Kantorovich distance do not seem to directly apply here. However, as the distance is defined via ε-bisimulations and bisimilarity, it is natural to see whether a coalgebraic semantics arising from generalizing coalgebraic bisimulations and bisimilarity is in place. The answer to this question is positive and we present the necessary observations in this section. An abstract (in this case coalgebraic) characterization allows for, on the theoretical side, deeper and clearer understanding of the notion under study, and, on the practical side, generalizations. We could use this generality to define notions of ε-bisimulations for other types of (probabilistic) systems.

In a nutshell, in this section we show that ε-simulation and ε-bisimulation have a span-diagram characterization, in a way similar to coalgebraic simulation [34, 32, 33] and Aczel-Mendler coalgebraic bisimulation [3, 35, 47]. The development is using a notion of ε-coupling defined in [54]. For simplicity, we ignore the labels in this section, i.e., we assume there is a single label.

To start with, we recall the basic notions related to coalgebraic (bi)simulation, formulated for the functor 𝒟 that we are mainly interested in in this work. Recall the definition of the sub-distribution functor 𝒟 on 𝐒𝐞𝐭𝐬: It maps a set X to 𝒟X, the set of discrete subdistributions on X and on arrows f:XY, 𝒟f:𝒟X𝒟Y is 𝒟f(φ)(y)=φ(f1({y}).

Coalgebras of the functor 𝒟 are Markov chains, formally they are pairs (X,c) of a carrier set X and transition map c:X𝒟X. We often just refer to the coalgebra by the transition map c. Given two such coalgebras, c:X𝒟X and d:Y𝒟Y a coalgebra homomorphism from (X,c) to (Y,d) is a map h:XY making the left diagram below commute.

Definition 21.

Given two coalgebras c:X𝒟X and d:Y𝒟Y, a coalgebraic bisimulation is a relation RX×Y such that there exists a coalgebra structure b:R𝒟R making the two projections π1:RX and π2:RY coalgebra homomorphisms, i.e., making the right span diagram above commute.

It is well known ([20, 49]) that this corresponds to the standard notion of probabilistic bisimulation ([38]). For the case of continuous distributions, taking a cospan instead of a span avoids the need for analytic spaces [17]. Back to the discrete case, one way to define the coalgebra structure b is using the notion of a coupling: If for all (x,y)R, there is a coupling β for μ=c(x) and ν=d(y), then setting b(x,y)=β provides the needed transition structure. Recall that β𝒟R is a coupling of μ𝒟X and ν𝒟Y if its marginals are μ and ν, respectively:

yYβ(x,y)=μ(x),xXβ(x,y)=ν(y).

Note that this definition of a coupling, fitting the definition of coalgebraic bisimulation for the subprobability distribution functor, ensures that a coupling of two subdistributions exists only if they have the same total mass. One could also define couplings by adding a dummy element and hence viewing a subdistribution as a full distribution, which provides a more general notion. However, these details are not relevant for what we are really interested in, which are ε-bisimulations.

We will use a relaxed notion of ε-coupling [54] to define ε-(bi)simulation in analogy with the coalgebraic definition above. Before we recall those, let us first mention coalgebraic simulations, due to [34, 32, 33] with a coalgebraic formulation for LMPs already in [21]. For this, note that an ordered functor F is a functor with an order on each object FX. The original definition in [34] uses preorders as minimal requirement, and also implies that for any map f:XY, Ff preserves the order, i.e., is monotone – a condition that is not used in our results. The subdistribution functor 𝒟 is ordered, e.g., by pointwise order. This order becomes trivial in case of the distribution functor which is the reason why we work with subdistributions here. Coalgebraic simulation can be defined for ordered functors, here we recall the definition in the special case of the subdistribution functor 𝒟. All notions involve lax and oplax morphism.

Definition 22.

A lax homomorphism from (X,c) to (Y,d) is a morphism l:XY with the property that dl𝒟lc, i.e., it makes the left lax diagram below commute. An oplax homomorphism from (X,c) to (Y,d) is a lax homomorphism for the dual/opposite order, that is a morphism o:XY that makes the middle diagram below commute:

The order on distributions is defined pointwise: For a set A and φ,ψ𝒟A, we have φψ iff for all aA, φ(a)ψ(a).

Note that, equivalently, φψ iff for all BA, φ(B)ψ(B), where the right-to-left implication follows from instantiating on singleton subsets and the left-to-right implication follows from the additivity of distributions.

Initially, generic coalgebraic simulations have been studied for ordered functors, by Hughes and Jacobs [34], using a notion of lax relation lifting. On the other hand, Hasuo [32, 33] discovered (generalizations of) forward, backward, and hybrid simulations in Kleisli categories and moreover showed in [33] that Hughes-Jacobs simulations are a special case. While Kleisli categories are appealing for soundness results, and monads suitable for traces come equipped with an order, we may safely ignore all Kleisli aspects here. We now recall the following notions of simulations [32], formulated for general categories of coalgebras (as long as the functor comes equipped with an order) where they amount to different names for lax/oplax morphisms, for the sake of making the connection to [32] explicit.

Definition 23.

A forward simulation from (X,c) to (Y,d) is a lax homomorphism from (Y,d) to (X,c). A backward simulation from (X,c) to (Y,d) is an oplax homomorphism. Combining the lax-commuting boxes leads to hybrid, forward-backward and backward-forward, simulations. In particular, Hughes-Jacobs simulation is a forward-backward simulation: It is a relation R on which there exists a coalgebra structure b:R𝒟R making π1 a forward simulation from (X,c) to (R,b) and π2 a backward simulation from (R,b) to (Y,d), depicted in the right diagram above.

Clearly, a symmetric simulation on a single system is a bisimulation.

In the rest of this section, we will show that ε-(bi)simulations can be depicted similarly. For this we will need bounded-lax-commutativity of morphisms as well as a notion of ε-coupling (which then directly gives an ε-relation-lifting). We will focus here on the functor 𝒟 only. These notions can be generalized to generic coalgebras for functors with suitable structure. However, those observations are beyond the scope of this paper and we leave them for future work.

Definition 24.

An ε-lax homomorphism from (X,c) to (Y,d) is a morphism l:XY that makes the left ε-lax diagram below commute, i.e., dlε𝒟lc. An ε-oplax homomorphism from (X,c) to (Y,d) is a morphism o:XY with the property that 𝒟ocεdo, i.e., it makes the right ε-lax diagram below commute.

The ε-order on distributions is defined by: For a set A and φ,ψ𝒟A, we have φεψ iff for all BA, φ(B)ψ(B)+ε.

 Remark 25.

Note that ε is not an order relation, namely it is not transitive. Note also that the definition of ε can not be expressed on elements (of A) only, as it is stronger than the property: for all aA, φ(a)ψ(a)+ε. Namely, the role of ε here, just like in the definition of ε-(bi)simulation is global.

We next recall the notion of ε-coupling from [54], where it is called an ε-weight function.

Definition 26.

Let RX×Y. A distribution β𝒟R is an ε-coupling for μ𝒟X and ν𝒟Y iff the following three conditions hold:

  1. 1.

    yYβ(x,y)μ(x), for all xX

  2. 2.

    xXβ(x,y)ν(y), for all yY

  3. 3.

    μ(X)xX,yYβ(x,y)+ε=(x,y)Rβ(x,y)+ε.

To be fully precise, we should consider β𝒟(X×Y) with support contained in R. We write β𝒟R and yet sometimes sum over all xX,yY, i.e., we identify β𝒟R with a distribution in 𝒟(X×Y) that assigns probability 0 to all pairs out of R and acts as β on R.

Before we proceed with the main observation of this section, we prove an auxiliary property that allows for rewriting the definition of ε-coupling.

Lemma 27.

Assume that Condition 1. in Definition 26 holds for β𝒟R with RX×Y and μ𝒟X. Then μ(X)xX,yYβ(x,y)+ε (i.e., Condition 3. in Definition 26) is equivalent to the condition

μ(S)xS,yYβ(x,y)+ε,for all SX.

This property again emphasises the global nature of ε in our situation. As a consequence, for an ε-coupling β𝒟R of μ𝒟X and ν𝒟Y, with RX×Y, for each SX:

xS,yYβ(x,y)μ(S)xS,yYβ(x,y)+ε.
Proposition 28.

Let (X,c) and (Y,d) be two 𝒟-coalgebras. The following three properties are equivalent for RX×Y:

  1. 1.

    R is an ε-simulation.

  2. 2.

    For every (x,y)R, there is an ε-coupling β𝒟R of μ=c(x) and ν=d(y).

  3. 3.

    The “ε-lax-bounded” span of morphisms on the left below commutes.

Note that Condition 2. in Proposition 28 can be taken as a definition of ε-relation-lifting, in analogy to relation lifting for 𝒟 being defined using the existence of a coupling for any pair of elements in R. This way was also taken in [34] with the definition of lax relation lifting.

As a consequence of Proposition 28, we immediately get that a relation RX×X on the states of a 𝒟-coalgebras is an ε-bisimulation iff the right diagram above commutes. This diagram resembles the defining diagram of the recently introduced uncertain bisimulations [46], but requires a double property (for ε as well as for op). A detailed comparison with this interesting work remains a future task.

As already mentioned, generalizing the notions of approximate (bi)simulations to generic coalgebras is an interesting direction for future work. A similar notion is being developed for cost automata [4].

5 Concluding remarks

We have shown that ε-bisimulations are closely connected with the Lévy-Prokhorov metric on (sub)probability distributions: The LP pseudometric yields a lifting of 𝒟 to a functor on pseudometric spaces and induces coinductive behaviour distance on LMPs, which is the greatest fixpoint of a suitable functional and turns out to be exactly the distance induced by ε-bisimilarity. Remarkably, any fixpoint distance of that functional defines an ε-bisimulation. This is the first time that a distance on distributions other than the Kantorovich distance is used for characterizing a known behavioural distance.

References

  • [1] Alessandro Abate. Approximation metrics based on probabilistic bisimulations for general state-space markov processes: A survey. Electronic Notes in Theoretical Computer Science, 297:3–25, 2013. Proceedings of the first workshop on Hybrid Autonomous Systems. doi:10.1016/j.entcs.2013.12.002.
  • [2] Alessandro Abate, Marta Z. Kwiatkowska, Gethin Norman, and David Parker. Probabilistic model checking of labelled markov processes via finite approximate bisimulations. In Franck van Breugel, Elham Kashefi, Catuscia Palamidessi, and Jan Rutten, editors, Horizons of the Mind. A Tribute to Prakash Panangaden - Essays Dedicated to Prakash Panangaden on the Occasion of His 60th Birthday, volume 8464 of Lecture Notes in Computer Science, pages 40–58. Springer, 2014. doi:10.1007/978-3-319-06880-0_2.
  • [3] Peter Aczel and Nax Mendler. A final coalgebra theorem. In Proc. 3rd CTCS, volume 389 of LNCS, pages 357–365. Springer, 1989. doi:10.1007/BFB0018361.
  • [4] Pedro Azevedo de Amorim, Mayuko Kori, and Koko Muroya. A framework for coalgebraic reward-sensitive bisimulation, 2025.
  • [5] Giorgio Bacci, Giovanni Bacci, Kim G. Larsen, and Radu Mardare. On-the-fly computation of bisimilarity distances. Log. Methods Comput. Sci., 13(2), 2017. doi:10.23638/LMCS-13(2:13)2017.
  • [6] Giovanni Bacci, Giorgio Bacci, Kim G. Larsen, and Radu Mardare. On the metric-based approximate minimization of markov chains. In Ioannis Chatzigiannakis, Piotr Indyk, Fabian Kuhn, and Anca Muscholl, editors, 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland, volume 80 of LIPIcs, pages 104:1–104:14. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ICALP.2017.104.
  • [7] C. Baier. On Algorithmic Verification Methods for Probabilistic Systems. Habilitation Thesis, 1998.
  • [8] Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT Press, 2008.
  • [9] Paolo Baldan, Filippo Bonchi, Henning Kerstan, and Barbara König. Coalgebraic behavioral metrics. Log. Methods Comput. Sci., 14(3), 2018. doi:10.23638/LMCS-14(3:20)2018.
  • [10] Gaoang Bian and Alessandro Abate. On the relationship between bisimulation and trace equivalence in an approximate probabilistic context. In Javier Esparza and Andrzej S. Murawski, editors, Foundations of Software Science and Computation Structures - 20th International Conference, FOSSACS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, volume 10203 of Lecture Notes in Computer Science, pages 321–337, 2017. doi:10.1007/978-3-662-54458-7_19.
  • [11] Richard Blute, Josée Desharnais, Abbas Edalat, and Panangaden Panangaden. Bisimulation for labelled Markov processes. In Proceedings of the Twelfth IEEE Symposium On Logic In Computer Science (LICS), Warsaw, Poland, 1997.
  • [12] Filippo Bonchi, Barbara König, and Daniela Petrisan. Up-to techniques for behavioural metrics via fibrations. Math. Struct. Comput. Sci., 33(4-5):182–221, 2023. doi:10.1017/S0960129523000166.
  • [13] Filippo Bonchi, Alexandra Silva, and Ana Sokolova. The Power of Convex Algebras. In CONCUR 2017, volume 85, pages 23:1–23:18. LIPIcs, 2017. doi:10.4230/LIPIcs.CONCUR.2017.23.
  • [14] Filippo Bonchi, Ana Sokolova, and Valeria Vignudelli. The theory of traces for systems with nondeterminism, probability, and termination. Log. Methods Comput. Sci., 18(2), 2022. doi:10.46298/LMCS-18(2:21)2022.
  • [15] Nicolas Bourbaki. Elements of Mathematics. Springer-Verlag, 1995. Original French edition published by MASSON, Paris in 1971.
  • [16] Keri D’Angelo, Sebastian Gurke, Johanna Maria Kirss, Barbara König, Matina Najafi, Wojciech Rozowski, and Paul Wild. Behavioural metrics: Compositionality of the kantorovich lifting and an application to up-to techniques. In Rupak Majumdar and Alexandra Silva, editors, 35th International Conference on Concurrency Theory, CONCUR 2024, September 9-13, 2024, Calgary, Canada, volume 311 of LIPIcs, pages 20:1–20:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPIcs.CONCUR.2024.20.
  • [17] Vincent Danos, Josée Desharnais, François Laviolette, and Prakash Panangaden. Bisimulation and cocongruence for probabilistic systems. Information and Computation, 2005. Special issue for selected papers from CMCS04. 22 pages.
  • [18] Luca de Alfaro, Marco Faella, and Mariëlle Stoelinga. Linear and branching metrics for quantitative transition systems. In Josep Díaz, Juhani Karhumäki, Arto Lepistö, and Donald Sannella, editors, Automata, Languages and Programming: 31st International Colloquium, ICALP 2004, Turku, Finland, July 12-16, 2004. Proceedings, volume 3142 of Lecture Notes in Computer Science, pages 97–109. Springer, 2004. doi:10.1007/978-3-540-27836-8_11.
  • [19] Luca de Alfaro and Rupak Majumdar. Quantitative solution of omega-regular games. In STOC ’01: Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 675–683, New York, NY, USA, 2001. ACM. doi:10.1145/380752.380871.
  • [20] Erik P. de Vink and Jan J. M. M. Rutten. Bisimulation for probabilistic transition systems: A coalgebraic approach. Theor. Comput. Sci., 221(1-2):271–293, 1999. doi:10.1016/S0304-3975(99)00035-3.
  • [21] Josée Desharnais. Labelled Markov Processes. PhD thesis, McGill University, 2000.
  • [22] Josée Desharnais, Vineet Gupta, R. Jagadeesan, and P. Panangaden. Metrics for labeled Markov processes. In Jos C. M. Baeten and S. Mauw, editors, Proceedings of 10th International Conference on Concurrency Theory, Eindhoven, The Netherlands, Lecture Notes in Computer Science, pages 258–273. Springer-Verlag, August 1999.
  • [23] Josée Desharnais, Vineet Gupta, R. Jagadeesan, and P. Panangaden. Approximating continuous Markov processes. In Proceedings of the 15th Annual IEEE Symposium On Logic In Computer Science, Santa Barbara, Californie, USA, 2000. pp. 95-106.
  • [24] Josée Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics for labelled Markov processes. Theoretical Computer Science, 318(3):323–354, 2004. doi:10.1016/j.tcs.2003.09.013.
  • [25] Josée Desharnais, R. Jagadeesan, Vineet Gupta, and P. Panangaden. The metric analogue of weak bisimulation for probabilistic processes. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science (LICS02), pages 413–422, Copenhagen, Denmark, July 2002. IEEE Computer Society. doi:10.1109/LICS.2002.1029849.
  • [26] Josée Desharnais, François Laviolette, and Mathieu Tracol. Approximate analysis of probabilistic processes: Logic, simulation and games. In Fifth International Conference on the Quantitative Evaluaiton of Systems (QEST 2008), 14-17 September 2008, Saint-Malo, France, pages 264–273. IEEE Computer Society, 2008. doi:10.1109/QEST.2008.42.
  • [27] Alessandro D’Innocenzo, Alessandro Abate, and Joost-Pieter Katoen. Robust PCTL model checking. In Thao Dang and Ian M. Mitchell, editors, Hybrid Systems: Computation and Control (part of CPS Week 2012), HSCC’12, Beijing, China, April 17-19, 2012, pages 275–286. ACM, 2012. doi:10.1145/2185632.2185673.
  • [28] A. Giacalone, P.Misra, and S. Prasad. Facile: A symmetric integration of concurrent and functional programming. In LNCS 352: TAPSOFT 89, 1989.
  • [29] Alessandro Giacalone, Chi chang Jou, and Scott A. Smolka. Algebraic reasoning for probabilistic concurrent systems. In Proc. IFIP TC2 Working Conference on Programming Concepts and Methods, pages 443–458. North-Holland, 1990.
  • [30] Antoine Girard and George J. Pappas. Approximate bisimulation relations for constrained linear systems. Automatica, 43(8):1307–1317, 2007. doi:10.1016/j.automatica.2007.01.019.
  • [31] Sergey Goncharov, Dirk Hofmann, Pedro Nora, Lutz Schröder, and Paul Wild. Kantorovich functors and characteristic logics for behavioural distances. In Orna Kupferman and Pawel Sobocinski, editors, Foundations of Software Science and Computation Structures - 26th International Conference, FoSSaCS 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22-27, 2023, Proceedings, volume 13992 of Lecture Notes in Computer Science, pages 46–67. Springer, 2023. doi:10.1007/978-3-031-30829-1_3.
  • [32] Ichiro Hasuo. Generic forward and backward simulations. In CONCUR 2006, pages 406–420. LNCS 4137, 2006. doi:10.1007/11817949_27.
  • [33] Ichiro Hasuo. Generic forward and backward simulations II: probabilistic simulation. In CONCUR 2010, pages 447–461. LNCS 6269, 2010. doi:10.1007/978-3-642-15375-4_31.
  • [34] Jesse Hughes and Bart Jacobs. Simulations in coalgebra. Theor. Comput. Sci., 327(1-2):71–108, 2004. doi:10.1016/J.TCS.2004.07.022.
  • [35] Bart Jacobs. Introduction to Coalgebra: Towards Mathematics of States and Observation, volume 59 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 2016. doi:10.1017/CBO9781316823187.
  • [36] Bart Jacobs, Alexandra Silva, and Ana Sokolova. Trace semantics via determinization. J. Comput. Syst. Sci., 81(5):859–879, 2015. doi:10.1016/J.JCSS.2014.12.005.
  • [37] Dexter Kozen. Semantics of probabilistic programs. Journal of Computer and Systems Sciences, 22:328–350, 1981. doi:10.1016/0022-0000(81)90036-2.
  • [38] Kim G. Larsen and Arne Skou. Bisimulation through probablistic testing. Information and Computation, 94:1–28, 1991. doi:10.1016/0890-5401(91)90030-6.
  • [39] Radu Mardare, Prakash Panangaden, and Gordon D. Plotkin. Quantitative algebraic reasoning. In Martin Grohe, Eric Koskinen, and Natarajan Shankar, editors, Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16, New York, NY, USA, July 5-8, 2016, pages 700–709. ACM, 2016. doi:10.1145/2933575.2934518.
  • [40] Radu Mardare, Prakash Panangaden, and Gordon D. Plotkin. On the axiomatizability of quantitative algebras. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavik, Iceland, June 20-23, 2017, pages 1–12. IEEE Computer Society, 2017. doi:10.1109/LICS.2017.8005102.
  • [41] R. Milner. Communication and Concurrency. Prentice Hall, 1989.
  • [42] Matteo Mio, Ralph Sarkis, and Valeria Vignudelli. Combining nondeterminism, probability, and termination: Equational and metric reasoning. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, pages 1–14. IEEE, 2021. doi:10.1109/LICS52264.2021.9470717.
  • [43] Matteo Mio, Ralph Sarkis, and Valeria Vignudelli. Universal quantitative algebra for fuzzy relations and generalised metric spaces. Log. Methods Comput. Sci., 20(4), 2024. doi:10.46298/LMCS-20(4:19)2024.
  • [44] Prakash Panangaden. Labelled Markov Processes. Imperial College Press, 2009.
  • [45] Yu. V. Prokhorov. Convergence of random processes and limit theorems in probability theory. Theory of Probability & Its Applications, 1(2):157–214, 1956. doi:10.1137/1101016.
  • [46] Jurriaan Rot and Thorsten Wißmann. Bisimilar states in uncertain structures. In Paolo Baldan and Valeria de Paiva, editors, 10th Conference on Algebra and Coalgebra in Computer Science, CALCO 2023, Indiana University Bloomington, IN, USA, June 19-21, 2023, volume 270 of LIPIcs, pages 12:1–12:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.CALCO.2023.12.
  • [47] J.J.M.M. Rutten. Universal coalgebra: A theory of systems. Theoretical Computer Science, 249:3–80, 2000. doi:10.1016/S0304-3975(00)00056-6.
  • [48] R. Segala and N.A. Lynch. Probabilistic simulations for probabilistic processes. In Proc. Concur’94, pages 481–496. LNCS 836, 1994.
  • [49] A. Sokolova. Coalgebraic Analysis of Probabilistic Systems. PhD thesis, TU Eindhoven, 2005.
  • [50] Timm Spork, Christel Baier, Joost-Pieter Katoen, Jakob Piribauer, and Tim Quatmann. A spectrum of approximate probabilistic bisimulations. In Rupak Majumdar and Alexandra Silva, editors, 35th International Conference on Concurrency Theory, CONCUR 2024, September 9-13, 2024, Calgary, Canada, volume 311 of LIPIcs, pages 37:1–37:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, August 2024. doi:10.4230/LIPIcs.CONCUR.2024.37.
  • [51] David Sprunger, Shin-ya Katsumata, Jérémy Dubut, and Ichiro Hasuo. Fibrational bisimulations and quantitative reasoning. In Corina Cîrstea, editor, Coalgebraic Methods in Computer Science, pages 190–213, Cham, 2018. Springer International Publishing. doi:10.1007/978-3-030-00389-0_11.
  • [52] David Sprunger, Shin-ya Katsumata, Jérémy Dubut, and Ichiro Hasuo. Fibrational bisimulations and quantitative reasoning: Extended version. J. Log. Comput., 31(6):1526–1559, 2021. doi:10.1093/LOGCOM/EXAB051.
  • [53] Qiyi Tang and Franck van Breugel. Algorithms to compute probabilistic bisimilarity distances for labelled markov chains. In Proc. CONCUR 2017, volume 85 of LIPIcs, pages 27:1–27:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.CONCUR.2017.27.
  • [54] Mathieu Tracol, Josée Desharnais, and Abir Zhioua. Computing distances between probabilistic automata. In Mieke Massink and Gethin Norman, editors, Proceedings Ninth Workshop on Quantitative Aspects of Programming Languages, QAPL 2011, Saarbrücken, Germany, April 1-3, 2011, volume 57 of EPTCS, pages 148–162, 2011. doi:10.4204/EPTCS.57.11.
  • [55] Franck van Breugel. Probabilistic bisimilarity distances. ACM SIGLOG News, 4(4):33–51, November 2017. doi:10.1145/3157831.3157837.
  • [56] Franck van Breugel, Claudio Hermida, Michael Makkai, and James Worrell. Recursively defined metric spaces without contraction. Theor. Comput. Sci., 380(1-2):143–163, 2007. doi:10.1016/j.tcs.2007.02.059.
  • [57] Franck van Breugel, Babita Sharma, and James Worrell. Approximating a behavioural pseudometric without discount for probabilistic systems. In Helmut Seidl, editor, FoSSaCS, volume 4423 of Lecture Notes in Computer Science, pages 123–137. Springer, 2007. doi:10.1007/978-3-540-71389-0_10.
  • [58] Franck van Breugel and James Worrell. Towards quantitative verification of probabilistic transition systems. In ICALP ’01: Proceedings of the 28th International Colloquium on Automata, Languages and Programming,, pages 421–432, London, UK, 2001. Springer-Verlag. doi:10.1007/3-540-48224-5_35.
  • [59] Franck van Breugel and James Worrell. A behavioural pseudometric for probabilistic transition systems. Theor. Comput. Sci., 331(1):115–142, February 2005. doi:10.1016/j.tcs.2004.09.035.
  • [60] Rob J. van Glabbeek, Scott A. Smolka, and Bernhard Steffen. Reactive, generative and stratified models of probabilistic processes. Inf. Comput., 121(1):59–80, 1995. doi:10.1006/inco.1995.1123.
  • [61] Mingsheng Ying and Martin Wirsing. Approximate bisimilarity. In AMAST ’00: Proceedings of the 8th International Conference on Algebraic Methodology and Software Technology, pages 309–322, London, UK, 2000. Springer-Verlag. doi:10.1007/3-540-45499-3_23.

Appendix A Proofs

Lemma 10. [Restated, see original statement.]

Let S,d be a pseudometric space, on which an LMP τ:S(𝒟S)𝒜 is defined. Let s0 and s1 be states. Then

ΔLP(d)(s0,s1) =inf{ε(a𝒜,AS:τa(si)(A)τa(s1i)(Aεd)+ε,i=0,1)}.

Proof.

For a𝒜 and ε>0, we define the following predicate:

P(a,ε):=(AS:τa(si)(A)τa(s1i)(Aεd)+ε,i=0,1).

This family of predicates satisfies

if γ>ε,P(a,ε)P(a,γ). (6)

We now prove that

supa𝒜inf{εP(a,ε)}=inf{ε(a𝒜:P(a,ε))}
:

The following sequence of arguments shows this inequality.

{εP(a0,ε)} {ε(a𝒜:P(a,ε))} for all a0𝒜, so
inf{εP(a0,ε)} inf{ε(a𝒜:P(a,ε))} for all a0𝒜, and so
supa𝒜inf{εP(a,ε)} inf{ε(a𝒜:P(a,ε))}.
:

Let α=supa𝒜inf{εP(a,ε)}. Then for all a𝒜, αinf{εP(a,ε)}. Let γ>α. Then we have

for all a𝒜 γ{εP(a,ε)} by (6) and because γ>inf{εP(a,ε)}
so γa𝒜{εP(a,ε)}
so γ{ε(a𝒜:P(a,ε))}
so γinf{ε(a𝒜:P(a,ε))}.

This implies αinf{ε(a𝒜:P(a,ε))} as well. Otherwise, suppose α<ι:=inf{ε(a𝒜:P(a,ε))}. Then α<(α+ι)/2<ι, so the result above with γ=(α+ι)/2 yields (α+ι)/2ι, a contradiction.

The proof that ΔLP is monotonic needs the following simple lemma.

Lemma 29.

If d2d1, we have Xεd2Xεd1.

Proof.

Assume d1d2. Let yXεd2. Then there is some xX such that d2(x,y)<ε. This gives d1(x,y)d2(x,y)<ε. So yXεd1.

Proposition 14. [Restated, see original statement.]

ΔLP is monotonic.

Proof.

Let d1d2. Let s0, s1 be states, let a𝒜 and define the two sets

E1 ={ε(AS:τa(si,A)τa(s1i,Aεd1)+ε,i=0,1)}
andE2 ={ε(AS:τa(si,A)τa(s1i,Aεd2)+ε,i=0,1)}.

Then E2E1. Indeed, let εE2 and AS. Then τa(si,A)τa(s1i,Aεd2)+ετa(s1i,Aεd1)+ε, because Aεd2Aεd1 by Lemma 29. So infE1infE2 and hence ΔLP(d1)=supa𝒜infE1supa𝒜infE2=ΔLP(d2), as wanted.

Proposition 15. [Restated, see original statement.]

d is a fixpoint of ΔLP, and hence it is the greatest fixpoint.

Proof.

Let s0, s1S. We want ΔLP(d)(s0,s1)=d(s0,s1). Consider the following sets

EΔ ={εa𝒜:(XS:τa(si,X)τa(s1i,Xεd)+ε,i=0,1)}
andE ={εs0εs1}
={εR an ε-bisimulation s.t. s0Rs1, that is, for all (t0,t1)R, we have
a𝒜:(XS:τa(ti,X)τa(t1i,R(X))+ε,i=0,1)}.

Then Δ(d)(s0,s1)=infEΔ by Lemma 10 and d(s0,s1)=infE. We prove infEΔ=infE.

:

We prove that EΔE. Let εEΔ. Let Rε={(x,y)xεy}, that is, the biggest ε-bisimulation. We need to prove that s0εs1. For that we prove that Rε+:=Rε{(s0,s1)} is an ε-bisimulation. Let a𝒜 and XS. Then XεdRε+(X). Indeed, let yXεd then d(y,x)<ε for some xX. So, because d(y,x)=inf{eyex}, we have yγx for some γ<ε, and hence yεx as every γ-bisimulation is an ε-bisimulation. Hence yRε+(X). We only need to prove the bisimulation condition for the pair (s0,s1). Then, for i=0,1 we have

τa(si,X) τa(s1i,Xεd)+ε by choice of ε
τa(s1i,Rε+(X))+ε because XεdRε+(X).

So s0εs1 and hence εE.

:

Let εE. We prove that for all γ>ε, we have γEΔ. This will prove that infEΔε. Since ε is arbitrary, this will give the result: infEΔinfE.
So let γ>ε. Because εE, we have s0εs1, and hence there is some ε-bisimulation R such that s0Rs1. We want γEΔ, so let a𝒜 and XS. We want τa(si,X)τa(s1i,Xγd)+γ,i=0,1. First observe that R(X)Xγd. Indeed, if yR(X), then there is some xX such that yRx. Because R is an ε-bisimulation, d(y,x)ε, so yXγd (we cannot say yXεd because these sets are open balls). Then, for i=0,1:

τa(si,X) τa(s1i,R(X))+ε, because R is an ε-bisimulation
τa(s1i,Xγd)+γ, because R(X)Xγdand ε<γ.

So, γEΔ, as wanted. We have proven that d is a fixpoint of ΔLP, and it is the greatest by Proposition 12.

Proposition 16. [Restated, see original statement.]

𝒟LP is a functor on 𝐏𝐌𝐞𝐭𝟏.

Proof.

The only thing to prove is that 𝒟LPf is nonexpansive. Let φ0,φ1𝒟LPX0,d0. Then (we omit the symmetric inequality for simplicity)

δLPd1(𝒟LPf(φ0),𝒟LPf(φ1)) = inf{εBX1:φ0(f1(B))φ1(f1(Bεd1))+ε}
() inf{εBX1:φ0(f1(B))φ1((f1B)εd0)+ε}
= inf{εA{f1(B)BX1}:φ0(A)φ1(Aεd0)+ε}
() inf{εAX0:φ0(A)φ1(Aεd0)+ε}
= δLPd0(φ0,φ1)

where the marked inequalities are justified below. For the first one, marked with (), let

 V={εBX1:φ0(f1(B))φ1((f1B)εd0)+ε} and
W={εBX1:φ0(f1(B))φ1(f1(Bεd1))+ε}.

We show that VW. Let εV. For every BX1, let us first show that f1(B)εd0f1(Bεd1). Indeed, take xf1(B)εd0 and let af1(B) be such that d0(x,a)<ε. Then by nonexpansivity of f,

d1(f(x),f(a))d0(x,a)<ε

and so f(x)Bεd1, because f(a)B. So xf1(Bεd1). Now combining this with the inequality condition in V we get

φ0(f1(B))φ1(f1(B)εd0)+εφ1(f1(Bεd1))+ε

and so εW.
For the inequality marked with (**) , observe that if ε satisfies the inequality for all AX0 then it does for all A{f1(B)BX1}. So the infimum is taken on a possibly bigger set on the left-hand side of the inequality, which concludes the argument.

Proposition 17. [Restated, see original statement.]

The functor 𝒟LP is locally nonexpansive, that is, for f1,f2XY

δ𝒟LPX𝒟LPY(𝒟LPf1,𝒟LPf2)dXY(f1,f2)

with δ𝒟LPX𝒟LPY being the metric on the hom-set 𝒟LPX,dX𝒟LPY,dY.

Proof.

Let f1,f2XY. We have that

δ𝒟LPX𝒟LPY(𝒟LPf1,𝒟LPf2)=supφ𝒟XδLPdY(𝒟LPf1(φ),𝒟LPf2(φ)).

Let φ𝒟X and recall that 𝒟LPfi(φ)=φ(fi1()). We will show that

δLPdY(φ(f11()),φ(f21()))dXY(f1,f2).

Let α=dXY(f1,f2)=supxXdY(f1(x),f2(x)). Let BY. We show f11(B)f21(BαdY)

f11(B) ={xyB:y=f1(x)}
={xyB:y=f1(x) and dY(y,f2(x))α} since dY(f1(x),f2(x))α
{xyB:dY(y,f2(x))α}
={xf2(x)BαdY}
=f21(BαdY).

This implies that α satisfies (BY:φ(f11(B))φ(f21(BαdY))+α), and similarly with 1 and 2 inverted.

δLPdY(φ(f11()),φ(f21())) =inf{εBY:φ(f11(B))φ(f21(BεdY))+ε}
α=dXY(f1,f2),

as wanted.

Lemma 19. [Restated, see original statement.]

The unit η of 𝒟 is nonexpansive with respect to the Lévy-Prokhorov distance. Moreover, it is an isometry, i.e., δLPd(1x,1y)=d(x,y).

Proof.

We have

δLPd(ηX(x),ηX(y)) =δLPd(1x,1y)
=inf{εAX:1x(A)1y(Aεd)+ε,1y(A)1x(Aεd)+ε}.

Let A be a subset of X. We distinguish four cases in which we consider the relevant inequalities:

1x(A) 1y(Aεd)+ε (7)
and 1y(A) 1x(Aεd)+ε (8)
  1. 1.

    xA,yA: Then 1x(A)=0 and 1y(A)=0, and any ε0 satisfies both (7) and (8).

  2. 2.

    xA,yA: Here 1x(A)=0 and 1y(A)=1, and (7) holds always, but (8) holds for ε>d(x,y) as then xAεd and hence 1x(Aεd)=1, and might not hold for εd(x,y), e.g. when A={y} and ε<d(x,y)1, as then 1x(Aεd)=0.

  3. 3.

    xA,yA: This case is dual to the second case. Here (8) always holds, but (7) holds for ε>d(x,y) and might not hold for εd(x,y) as, e.g., for A={x} we have yAεd if and only if ε>d(x,y).

  4. 4.

    xA,yA: Then 1x(A)=1 and 1y(A)=1, as well as 1x(Aεd)=1y(Aεd)=1, and again, as in the first case, any ε0 satisfies both (7) and (8).

Hence, both inequalities are satisfied for all A iff ε>d(x,y), and therefore δLPd(1x,1y)=d(x,y).

Lemma 27. [Restated, see original statement.]

Assume that Condition 1. in Definition 26 holds for β𝒟R with RX×Y and μ𝒟X. Then μ(X)xX,yYβ(x,y)+ε (i.e., Condition 3. in Definition 26) is equivalent to the condition

μ(S)xS,yYβ(x,y)+ε,for all SX.

Proof.

The right-to-left implication is immediate, as XX. For the left-to-right implication, assume that for some SX we have

μ(S)>xS,yYβ(x,y)+ε.

Since, by assumption, for all xX, and hence for all xXS, μ(x)yYβ(x,y), we have

μ(X) = μ(S)+μ(XS)
> xS,yYβ(x,y)+ε+xXSμ(x)
xS,yYβ(x,y)+xXSyYβ(x,y)+ε
= xX,yYβ(x,y)+ε.

The property now follows by contraposition.

Proposition 28. [Restated, see original statement.]

Let (X,c) and (Y,d) be two 𝒟-coalgebras. The following three properties are equivalent for RX×Y:

  1. 1.

    R is an ε-simulation.

  2. 2.

    For every (x,y)R, there is an ε-coupling β𝒟R of μ=c(x) and ν=d(y).

  3. 3.

    The “ε-lax-bounded” span of morphisms on the left below commutes.

Proof.

The equivalence of 1. and 2. has been shown in [54]. Let R be an ε-simulation. For (x,y)R, let μ=c(x) and ν=d(y). Define b:R𝒟R by (x,y)β, the epsilon coupling of μ and ν that exists.

Then, unfolding the definitions, we get:

  • 𝒟π1bcπ1 is equivalent to: for all (x,y)R, condition 1. from Definition 26 holds.

  • 𝒟π2bdπ2 is equivalent to: for all (x,y)R, condition 2. from Definition 26 holds.

  • cπ1ε𝒟π1b is equivalent to: for all (x,y)R, for all SX,

    μ(S)xS,yYβ(x,y)+ε

    which by Lemma 27 is equivalent to condition 3. from Definition 26.

These facts yield the equivalence of 3. with 2. (and hence 1. as well).