Abstract 1 Introduction 2 Preliminaries 3 Reward Interfaces 4 Compatibility and Composition of Reward Interfaces 5 Reward Interface Refinement 6 Checking Compatibility, Refinement, and Implementability 7 Related Work 8 Conclusion References Appendix A Appendix: Compatibility of Reward Interfaces Appendix B Appendix: Properties of Reward Interface Refinement

Reward Interfaces
with Best-Effort Implementations

Rafael Dewes ORCID CISPA Helmholtz Center for Information Security, Saarbrücken, Germany Rayna Dimitrova ORCID CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Abstract

Interface theories, notably interface automata, serve as expressive frameworks for component-based design, specifying component behavior and interaction in concurrent systems. Traditional interface formalisms specify assumptions that a component’s environment must satisfy and the guarantees that each component provides. This qualitative view of component interaction based on imposing strict assumptions and Boolean guarantees may, however, not be expressive enough to capture the system’s allowed or desired behaviors under different environments.

In this paper, we introduce reward interfaces to support component-based design while accommodating multi-valued correctness requirements and adaptive best-effort satisfaction of component’s guarantees. Building upon interface automata, our framework enables modeling a rich class of quantitative component specifications. We propose formal notions of implementation, refinement and compatibility for reward interfaces. We study a class of reward interfaces with automata-based representations, for which we provide algorithms for checking compatibility and refinement, and existence of best-effort implementations. Our framework offers a comprehensive approach to reward interface specification and design.

Keywords and phrases:
Component-based design, interface automata, quantitative specifications
Copyright and License:
[Uncaptioned image] © Rafael Dewes and Rayna Dimitrova; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Formal languages and automata theory
Editors:
Stefano Guerrini and Barbara König

1 Introduction

In system design and analysis, component-based techniques are essential for managing the increasing size and complexity of modern systems. Specification theories, particularly those based on interfaces or contracts [3, 9, 14], provide structured frameworks to address the challenges posed by concurrent systems. Interface theories [15, 18, 10] are especially suited for expressing interactions and dependencies of subsystems, offering formal specifications for analysis and verification. Here, each component is modeled by an interface, enabling a modular, independent design process, while ensuring compatibility between components.

One well-studied formalism are automata-based stateful interfaces, building on de Alfaro and Henzinger’s seminal work on interface automata [14]. They model components with distinct input and output actions, synchronizing with other components. Key properties are compatibility, which allows abstraction of multiple components into a single subsystem, and refinement, which supports independent component implementation. Interface automata have been extended to express more complex system behaviors. For example, resource interfaces [6] handle quantitative aspects like resource usage, while modal interfaces [20, 23, 21] capture richer properties such as liveness, expanding the expressivity of the framework. Further extensions [25, 24] incorporate state-based contracts, which enables concepts like shared memory in interface theories. A core motivation for interface theories is to simplify the design process by relieving designers from defining responses for every possible input. Typically, interfaces adopt an optimistic stance, allowing components to assume specific behavior from their environment. Inputs are either allowed or disallowed, with no obligation to handle disallowed inputs. This assumption holds as long as the system is closed, meaning all inputs come from other components. However, for open systems, where external inputs are beyond our control, these strong assumptions must be reconsidered.

We introduce a novel approach of reward interfaces, building on interface automata, that aims to enhance the design process while accommodating an unrestricted external environment. Central to our approach is the concept of good-enough satisfaction, based on the idea of good-enough synthesis [2], which says satisfaction of requirements is only necessary under feasible input conditions. This notion enables designers to focus on essential aspects of system behavior, removing the task of considering more explicit environment restrictions. It also naturally lends itself to the quantitative domain, introducing an additional direction in expressing complex properties to interface specifications, such as graceful degradation.

We consider an automaton structure coupled with a multi-valued reward function, defined over sequences of inputs and outputs of the interface, and require best-effort satisfaction from its implementation. While this generally loosens obligations, it selects for high-quality implementations if they exist. The interface automaton provides strict assumptions and guarantees, but is limited to safety properties. The reward function is more flexible, and able to express a substantial class of properties including liveness. This keeps the automaton simple and interpretable, offloading complicated requirements to the function without obscuring essential interaction restrictions. Our approach allows designers to succinctly capture complex system behaviors, which we illustrate below using a simple example.

Example 1.

Suppose we want to design a message distribution system 𝖲. The system 𝖲 can receive messages 𝖾𝟣,𝖾𝟤 and should relay the received messages in the respective order via the output actions 𝗈𝟣 and 𝗈𝟤. The system is limited in that it cannot produce an output while receiving an input, and it cannot send more than one output at a time. If the system has to handle both 𝖾𝟣 and 𝖾𝟤 during an execution, this will consume more resources.

Formally, 𝖲 is modeled via the input actions 𝖾𝟣,𝖾𝟤, controlled by the external environment, and output actions 𝗈𝟣, 𝗈𝟤 and 𝗈𝟥, with the following requirements: Once 𝖾𝟣 occurs, then 𝖲 must output 𝗈𝟣, and once 𝖾𝟤 occurs, 𝖲 must output 𝗈𝟤. If both inputs are received, then 𝗈𝟣 and 𝗈𝟤 must be produced in the same order. 𝖲 must not perform actions 𝗈𝟣, 𝗈𝟤 before 𝖾𝟣 or 𝖾𝟤, respectively, has happened. The additional output 𝗈𝟥 is unconstrained by the specification.

The specification of 𝖲 is formalized as a reward function 𝖲 which maps the possible execution traces of 𝖲 to numerical values reflecting the extent to which the requirements are satisfied. Concretely, 𝖲 maps (possibly infinite) sequences σ over the alphabet Σ𝖲={𝖾𝟣,𝖾𝟤,𝗈𝟣,𝗈𝟤,𝗈𝟥} to values v{0,14,12,1}.

Executions where 𝗈𝟣 occurs after an 𝖾𝟣, and no input 𝖾𝟤 is received, are awarded the maximum value of 1. The same value 1 is assigned to executions where the only input is 𝖾𝟤, and 𝗈𝟤 occurs after. If both 𝖾𝟣 and 𝖾𝟤 occur, and both 𝗈𝟣 and 𝗈𝟤 occur in the respective order, the achieved value will be 12. This represents the more demanding input behavior from the environment, abstracting a lowered efficiency, and is not a penalty on the system. The value is lowered to 14 if the order is reversed. Executions where an input is erroneously or not at all relayed are assigned value 0, violating the specification. We also give value 0 to sequences without any input, or an infinite sequence of inputs, as the system would be unable to produce output if continuously receiving inputs from the environment.

Since the inputs 𝖾𝟣,𝖾𝟤 are under the control of the environment, the system cannot force them to occur. This means that no implementation of 𝖲 can guarantee positive satisfaction of 𝖲. A more realistic requirement is to ask 𝖲 to achieve the maximal satisfaction value possible for each sequence of input actions produced by the environment. In this best-effort view, the obligations on 𝖲 depend on the input provided by the environment. Thus, if the environment does not provide 𝖲 with an input, 𝖲 is not expected to achieve value higher than 0. Formally, we identify the so called (𝖲,v)-hopeful input sequences, which allow for achieving satisfaction value v. Here, for v=1 these are the sequences of the form 𝖾𝟣+ or 𝖾𝟤+.

Suppose that internally, 𝖲 is to be designed as the composition of two components P and Q. Component P is responsible for producing the output 𝗈𝟣, and component Q for outputs 𝗈𝟤,𝗈𝟥. This is expressed via the interface automata AP and AQ modeling the allowed interactions of P and Q, as depicted in Figure 1. AP and AQ synchronize on actions 𝗉, 𝗊, and operate independently otherwise. They can enable each other to produce outputs 𝗈𝟣,𝗈𝟤.

However, interface automata alone are unable to capture the quantitative requirement expressed via the function 𝖲. The modeling framework of reward interfaces that we propose in this paper addresses this limitation. Reward interfaces equip interface automata with reward functions to capture the quantitative specifications that components must satisfy. In the coming sections, we will see how reward interfaces for P and Q model local quantitative specifications such that their combination captures the high-level specification 𝖲.  

(a) Interface automaton AP with additional output action 𝗉 and additional input action 𝗊.
(b) Interface automaton AQ with additional output action 𝗊 and additional input action 𝗉.
Figure 1: Interface automata describing the components of the system in Example 1.

As Example 1 shows, our framework offers an elegant way to describe systems of interacting components. Our reward interfaces build directly on interface automata, adding expressivity through the reward function without necessarily making the automata more complex.

We introduce the formal definition of reward interfaces in Section 3, and define reward functions in a general way to maintain a high degree of flexibility. The notions of compatibility and refinement on reward interfaces are defined in Section 4 and Section 5 respectively, and we show that these entail desirable properties of interfaces. In Section 6 we provide a collection of algorithmic solutions for assessing these properties for a specific class of interfaces. Concretely, we consider reward functions with finite range that are represented as automata. For brevity, some of the missing proofs are presented in full in the appendix.

2 Preliminaries

In this section, we introduce necessary notation and preliminaries. Further, we recall and adapt the definition of interface automata and related notions from [14].

Languages and Automata

For an alphabet Σ, the set Σ:=ΣΣω contains all finite and infinite words over Σ. Given a sequence σ(ΣΣ), we denote with σ|Σ the projection of σ to Σ, that is, σ|Σ(Σ) is the sequence obtained from σ by removing all letters in ΣΣ. For a sequence σΣ, we denote with σ[i]Σ the letter of σ at the i-th position.

A non-deterministic finite automaton (NFA) over an alphabet Σ is a tuple 𝒩=Q,Σ,δ,Q0,F with finite set of states Q, initial states Q0Q, transition relation δQ×Σ×Q, and a set of accepting states FQ. A run of 𝒩 on a finite word σ=a1anΣ is a finite sequence ρ=ρ0ρ1ρnQ such that ρ0Q0 and for every i<n, it holds that ρi+1δ(ρi,σi+1). A run ρ is accepting if and only if ρnF. A non-deterministic Büchi automaton (NBA) =Q,Σ,δ,Q0,B on infinite words instead has a set of Büchi-accepting states BQ of which some must be visited infinitely often. A run ρ of on σΣω is defined analogously, and is accepting if and only if for every i there exists ji such that ρ[j]B. The language of an automaton (𝒜) is the set of words σ on which 𝒜 has an accepting run. For a universal co-Büchi automaton (UCW) 𝒞=Q,Σ,δ,Q0,B, a run ρ of 𝒞 on a word σΣω is accepting if and only if there exists i such that for all ji, ρ[j]B, and σΣω is only accepted by 𝒞 if every run is accepting. We define the size of an automaton 𝒜 as the total number of states and transitions, i.e. |𝒜|=|Q|+|δ|.

We use to denote the set of real numbers, and define , as the elements where >n> for all n. The set ={} then contains the real numbers and . We define conventionally the infimum over the empty set as inf()=.

Interface Automata

Interface automata [14] are a modeling formalism for specifying the interactions between components and their environment. Components communicate via synchronization on input and output actions. In contrast to the interface automata in [14], we distinguish external inputs ΣE, which are external to the overall system and broadcast to all components, from inputs ΣI that come from system components outside of A and are subject to assumptions specified by A. The external inputs enable synchronization between multiple components on the same input action, and are used to model behavior uncontrollable by the system.

Definition 2 (Interface Automaton (adapted from [14])).

An interface automaton A=V,Vinit,ΣI,ΣO,ΣE,ΣH,𝒯 is a tuple where:

  • V is a finite set of states and VinitV is a set of initial states. We require that Vinit contains at most one state. If Vinit=, then A is called empty.

  • ΣI,ΣO,ΣE,ΣH are mutually disjoint finite sets of input, output, external input and internal actions respectively. Let ΣA:=ΣIΣOΣEΣH be the set of all actions of A.

  • 𝒯V×ΣA×V is a set of transitions such that for every state vV it holds that:

    • for every external input action aΣE, there exists vV such that (v,a,v)𝒯;

    • for all aΣEΣI and v,v′′V with (v,a,v)𝒯 and (v,a,v′′)𝒯 we have v=v′′.

Note that we require A to be input enabled on external input actions ΣE, i.e., it accommodates any external input in every state. We also require that A is input-deterministic on ΣEΣI, in order to ensure compositionality of parallel composition [15].

We define the size of A as |A|=|V|+|𝒯|. For uV, we define the interface automaton Au=V,{u},ΣI,ΣO,ΣE,ΣH,𝒯 obtained from A by replacing the set of initial states by {u}.

Let A=V,Vinit,ΣI,ΣO,ΣE,ΣH,𝒯 be an interface automaton. An execution of A is an alternating sequence v0,a0,v1,a1, of states viV and actions aiΣA such that (vi,ai,vi+1)𝒯 for all i0. An execution fragment is a finite prefix v0,a0,v1,a1,,vn of an execution ending in a state. A state vV is reachable in A if there exists an execution fragment starting in some v0Vinit that ends in v. We denote with 𝖱𝖾𝖺𝖼𝗁(A)V the set of states reachable in A. For ΣΣA, let 𝖤𝗇𝖺𝖻𝗅𝖾𝖽A(v,Σ):={aΣv.(v,a,v)𝒯} be the set of actions from Σ enabled in state v.

Two key notions in the theory of interface automata are the composability and product of automata. Whether two interface automata are composable depends on their actions. They must not share input, output and internal actions, but may have joint external inputs. For two composable interface automata, we can construct their product. This is formalized in the definitions below. For the remainder of this section, let AP=VP,VPinit,ΣPI,ΣPO,ΣPE,ΣPH,𝒯P and AQ=VQ,VQinit,ΣQI,ΣQO,ΣQE,ΣQH,𝒯Q be interface automata.

Definition 3 (Composability [14]).

Two interface automata AP and AQ are composable if ΣPIΣQI=, ΣPOΣQO=, ΣPHΣQ=, and ΣQHΣP=.

We define 𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ):=(ΣPIΣQO)(ΣQIΣPO), and denote with 𝖩𝗈𝗂𝗇𝗍𝖤(AP,AQ):=ΣPEΣQE the set of joint external inputs of AP and AQ.

Definition 4 (Product).

Let AP and AQ be composable. Their product APAQ=VPQ,VPQinit,ΣPQI,ΣPQO,ΣPQE,ΣPQH,𝒯PQ is the interface automaton with the components defined as follows:

  • VPQ:=VP×VQ, VPQinit=VPinit×VQinit,

  • ΣPQI=(ΣPIΣQI)𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ),ΣPQO=(ΣPOΣQO)𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ),ΣPQE=ΣPEΣQE,ΣPQH=(ΣPHΣQH)𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ),

  • 𝒯PQ={((vP,vQ),α,(vP,vQ))α(𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ)𝖩𝗈𝗂𝗇𝗍𝖤(AP,AQ))(vP,α,vP)𝒯P(vQ,α,vQ)𝒯Q}{((vP,vQ),α,(vP,vQ))α(𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ)𝖩𝗈𝗂𝗇𝗍𝖤(AP,AQ))(vP,α,vP)𝒯P}{((vP,vQ),α,(vP,vQ))α(𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ)𝖩𝗈𝗂𝗇𝗍𝖤(AP,AQ))(vQ,α,vQ)𝒯Q}.

Note that the actions 𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ) become internal actions of APAQ, and P and Q must synchronize on shared and joint external input actions.

Given two composable interface automata AP and AQ, a product state (vP,vQ)VP×VQ is called an illegal state of the product automaton APQ if and only if there exists a𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ) such that a(𝖤𝗇𝖺𝖻𝗅𝖾𝖽P(vP,ΣPO)𝖤𝗇𝖺𝖻𝗅𝖾𝖽Q(vQ,ΣQI))(𝖤𝗇𝖺𝖻𝗅𝖾𝖽Q(vQ,ΣQO)𝖤𝗇𝖺𝖻𝗅𝖾𝖽P(vP,ΣPI)). That is, illegal states are ones in which a shared action can be produced as an output of one of the interface automata but is not allowed as an input by the other one. Let 𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ) be the set of illegal states of APAQ.

We say that two interface automata AP and AQ are compatible if there exists a way for the rest of the system, supplying the inputs ΣPQI to ensure that illegal states are avoided. This gives rise to the notion of legal environment for AP and AQ, which we recall next.

Definition 5 (Legal Environment and Compatibility[14]).

A legal environment for (AP,AQ) is a non-empty interface automaton AR=VR,VRinit,ΣRI,ΣRO,ΣRE,ΣRH,𝒯R where:

  1. 1.

    ΣRE=ΣPQE, ΣRI=ΣPQO, ΣRO=ΣPQI, ΣRH=.

  2. 2.

    AR is composable with APAQ, and 𝖨𝗅𝗅𝖾𝗀𝖺𝗅(APAQ,AR)=.

  3. 3.

    𝖱𝖾𝖺𝖼𝗁((APAQ)AR)(𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ)×VR)=.

Two interface automata AP and AQ are considered compatible if they are non-empty, composable and there exists a legal environment for (AP,AQ).

Intuitively, a legal environment AR for (AP,AQ) represents the remaining system beyond AP and AQ. Condition 3 requires that the inputs provided by AR to the two interfaces steer them away from illegal states in the product.

Definition 6 (Composition of Interface Automata[14]).

Given composable interface automata AP and AQ, a product state (vP,vQ)VP×VQ is called compatible if there exists a legal environment for (APvP,AQvQ). Let 𝖢𝗈𝗆𝗉(AP,AQ) be the set of compatible product states. The composition APAQ of AP and AQ is defined by restricting APAQ to 𝖢𝗈𝗆𝗉(AP,AQ). Formally, APAQ=VPQ𝖢𝗈𝗆𝗉(AP,AQ),VPQinit𝖢𝗈𝗆𝗉(AP,AQ),ΣPQI,ΣPQO,ΣPQE,ΣPQH,𝒯PQ(𝖢𝗈𝗆𝗉(AP,AQ)×ΣPQ×𝖢𝗈𝗆𝗉(AP,AQ)).

Intuitively, the compatible product states are those from which an environment can prevent reaching illegal states. Thus, AP and AQ are compatible if and only if VPQinit𝖢𝗈𝗆𝗉(AP,AQ). Furthermore, for every (vP,vQ)𝖢𝗈𝗆𝗉(AP,AQ), all the external input actions ΣPQE lead to product sates in 𝖢𝗈𝗆𝗉(AP,AQ). If this was not the case, (vP,vQ) would not be in 𝖢𝗈𝗆𝗉(AP,AQ), since an external input action leading to an illegal state cannot be prevented by a legal environment. If an action aΣPQO leads from (vP,vQ) to 𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ), we prune all a-transitions from (vP,vQ). Therefore, for compatible interface automata, the composition APAQ is a non-empty interface automaton conforming to Definition 2.

Another concept, crucial for independent implementability of interfaces, is refinement. Refinement of interface automata [14] is defined via alternating simulation, recalled below.

For an interface automaton A and state v, let ε-𝖼𝗅𝗈𝗌𝗎𝗋𝖾A(v) be the set of states of A that can be reached from v using only internal actions from ΣH. We define:

𝖤𝗑𝗍𝖤𝗇AI,E(v):={auε-𝖼𝗅𝗈𝗌𝗎𝗋𝖾A(v).a𝖤𝗇𝖺𝖻𝗅𝖾𝖽A(u,ΣAIΣAE)} and 𝖤𝗑𝗍𝖤𝗇AO(v):={auε-𝖼𝗅𝗈𝗌𝗎𝗋𝖾A(v).a𝖤𝗇𝖺𝖻𝗅𝖾𝖽A(u,ΣAO)}.

to be the sets of externally enabled input and output actions at v. Further, for actions a𝖤𝗑𝗍𝖤𝗇AI,E(v)𝖤𝗑𝗍𝖤𝗇AO(v), let 𝖤𝗑𝗍𝖣𝖾𝗌𝗍A(v,a)={u(u,a,u)𝒯A.uε-𝖼𝗅𝗈𝗌𝗎𝗋𝖾A(v)}.

Definition 7 (Alternating Simulation [14]).

A binary relation VP×VQ is an alternating simulation from the interface automaton AQ to the interface automaton AP if for all vPVP and vQVQ, vPvQ implies that:

  1. 1.

    𝖤𝗑𝗍𝖤𝗇PI,E(vP)𝖤𝗑𝗍𝖤𝗇QI,E(vQ) and 𝖤𝗑𝗍𝖤𝗇QO(vQ)𝖤𝗑𝗍𝖤𝗇PO(vP).

  2. 2.

    For all a𝖤𝗑𝗍𝖤𝗇PI,E(vP)𝖤𝗑𝗍𝖤𝗇QO(vQ) and vQ𝖤𝗑𝗍𝖣𝖾𝗌𝗍Q(vQ,a), there exists a state vP𝖤𝗑𝗍𝖣𝖾𝗌𝗍P(vP,a) such that vPvQ.

The existence of an alternating simulation from AQ to AP guarantees that the interactions of AP are preserved in AQ. In particular, AQ does not impose more assumptions on the environment than AP, and satisfies all the output restrictions of AP. That is, any environment compatible with AP also is compatible with AQ.

Definition 8 (Interface Automata Refinement [14]).

For interface automata AP and AQ, we say that AQ refines AP, written AQAP, if and only if ΣQE=ΣPE, ΣQIΣPI, ΣQOΣPO, and there exists an alternating simulation from AQ to AP, and states vPVPinit and vQVQinit, such that vPvQ.

Thus, the relation AQAP guarantees that AQ must accept at least all inputs ΣPEΣPI, and may not add any new outputs w.r.t. ΣPO. The alternating simulation ensures that for any input, the outward behavior of Q matches that of P.

3 Reward Interfaces

We now introduce reward interfaces, the central notion of the framework we propose. They build on the classical interface automata, extending them with an additional quantitative requirement, defined as a reward function that assigns numerical values to sequences of observable actions of the interface. Essentially, a reward function defines a quantitative language [8] over the alphabet of non-internal actions of the given interface automaton.

Definition 9 (Reward Interface).

A reward interface is a pair P=(AP,P) where AP=VP,VPinit,ΣPI,ΣPO,ΣPE,ΣPH,𝒯P is an interface automaton and P:(ΣPObs) is a partial function that assigns a value from to sequences over the alphabet ΣPObs:=ΣPIΣPOΣPE of non-internal actions of AP.

Intuitively, P expresses a quantitative specification, by associating reward values with sequences in (ΣPObs). The reward value of a sequence describes how well the respective observed behavior satisfies the quantitative requirement. Since P is part of the interface specification, it is defined in terms of the actions ΣPObs that are visible to the component’s environment. We deliberately do not impose further restrictions on the functions P, in order to retain full generality of the proposed framework. In Section 6 we discuss possible instantiations, in which the reward functions have a natural finite representation.

Example 10.

Continuing from Example 1, we show a possible reward interface P=(AP,P) for component P. The restrictions from AP (Figure 1(a)) ensure that output 𝗈𝟣 can only occur after input 𝗊. The requirement that 𝗈𝟣 must occur after 𝖾𝟣, and not before, is modeled via the reward function P, mapping sequences over ΣPObs={𝖾𝟣,𝖾𝟤,𝗉,𝗊,𝗈𝟣} to values. The reward function P follows 𝖲: We award value 1 if only one type of input 𝖾𝟣 or 𝖾𝟤 occurs, and require respectively 𝗈𝟣 and 𝗉, instead of 𝗈𝟤, as P has no knowledge nor control over 𝗈𝟤. Similarly, if both 𝖾𝟣 and 𝖾𝟤 occur, the order of 𝗈𝟣 and 𝗉 must reflect that to achieve value 12, otherwise the value is lowered to 14. In both cases when 𝖾𝟤 occurs, 𝗈𝟣 and 𝗉 should not be performed infinitely, otherwise the assigned value is 0.

Note that while we could for example encode the safety requirement that there is no 𝗈𝟣 before the first occurrence of 𝖾𝟣 as part of the interface automaton, it would make the automaton structure more complicated. Furthermore, as the internal order of 𝗈𝟣 and 𝖾𝟣 is less relevant for the interaction with component Q, which is not concerned with 𝗈𝟣, it is more meaningful to capture that in the reward function P rather than AP.

A reward function Q for a reward interface Q=(AQ,Q) is defined analogously.  

For v and {<,,,>}, we define Pv:={σ(ΣPObs)P(σ)v} to be the set of words which P maps to some value v. When we write P(σ)v, we implicitly mean that P(σ) is defined. We define 𝑉𝑎𝑙𝑠(P):={vσ(ΣPObs).(σ)=v}.

Let us fix an interface automaton A=S,Sinit,ΣI,ΣO,ΣE,ΣH,𝒯 for the rest of this section. To evaluate implementations of an interface, we define the set of traces of A as

𝖳𝗋𝖺𝖼𝖾𝗌(A):={σΣAωexecution of A on σ}{σΣAexec. of A on σ ending in s:𝖤𝗇𝖺𝖻𝗅𝖾𝖽(s,ΣOΣH)=}.

That is, 𝖳𝗋𝖺𝖼𝖾𝗌(A) is the set that consists of the infinite words over ΣA for which there exists an infinite execution, as well as the finite words over ΣA where an execution ends in a state with no output or internal action possible. That is, we consider maximal traces, taking into account that the environment inputs are not forced to occur.

For a given input sequence σE,I(ΣEΣI), we define 𝖳𝗋𝖺𝖼𝖾𝗌(A,σE,I):={σ𝖳𝗋𝖺𝖼𝖾𝗌(A)σ|(ΣEΣI)=σE,I} to be the subset of 𝖳𝗋𝖺𝖼𝖾𝗌(A) containing the words consistent with σE,I. These words represent all possible behaviors of A when the environment provides the inputs specified by σE,I. If ΓΣA is an alphabet such that ΓΣIΣE, we define 𝖳𝗋𝖺𝖼𝖾𝗌Γ(A,σE,I):={γΓσ𝖳𝗋𝖺𝖼𝖾𝗌(A,σE,I)σ|Γ=γ} to be the projection of 𝖳𝗋𝖺𝖼𝖾𝗌(A,σE,I) on Γ.

Reward functions impose no assumptions on the environment of a component. However, the value that a component can possibly achieve may depend on the behaviour of the environment. Therefore, we require that components implementing a reward interface satisfy the quantitative specification to the best extent possible with respect to the input provided by the component’s environment. This intuition is formalized by the good-enough criterion for interface automata (also used to model implementations).

First, we adapt to our setting the notion of hopeful sequences [2], which is used to characterize the input sequences for which a given reward value is possible. Note that in our setting, we consider asynchronous executions, such that there can be several consecutive inputs with no output in-between, or such that from some point on we only have outputs. This is in contrast to hopeful inputs in [2], which are defined for synchronous executions. We also allow arbitrary hidden actions, which may interleave with the non-internal actions.

Definition 11 (Hopeful Sequences).

Given alphabets Σ and Γ such that ΓΣ, a function :Σ and v, we say that a sequence γΓ is (,v)-hopeful if and only if there exists σΣ such that γ=σ|Γ, (σ) is defined, and (σ)v. We denote the set of (,v)-hopeful Γ-sequences by 𝐻𝑜𝑝𝑒𝑓𝑢𝑙(,v,Γ):={γΓσΣ.γ=σ|Γ(σ)v}.

We use Γ here to denote a subset of Σ, which will typically be instantiated to be a subset of the inputs ΣIΣE, as we will see in the following definition. This notion describes the potential quality of the environment, characterizing the inputs to an interface as hopeful with respect only to specific values of . The hopefulness of an input sequence does not depend on the interface automaton itself, but only on the reward function . An interface automaton is good-enough with respect to a reward function , if, intuitively, the traces it generates on any (,v)-hopeful input sequence achieve reward at least v.

Definition 12 (Good-Enough Interface Automaton).

Consider an interface automaton A and a reward function :Δ. We say that A is good-enough with respect to if and only if for every value v𝑉𝑎𝑙𝑠(), every input sequence σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(,v,(ΣAEΣAI)Δ), and every σ𝖳𝗋𝖺𝖼𝖾𝗌(ΔΣA)(A,σE,I) it holds that (σ) is defined and (σ)v.

This definition is in the spirit of [2], as a good-enough interface automaton must perform to the best extent possible according to , but only for the input it receives. In Definition 12 we used a general alphabet Δ. This is useful for the definition of compatibility of reward interfaces with respect to a given reward function.

Now we define the notion of (best-effort) implementation of a reward interface, which must be good-enough with respect the interface’s reward function.

Definition 13 (Best-Effort Implementation).

An interface automaton S is an implementation of a reward interface P=(AP,P) if and only if it satisfies the following conditions.

  1. 1.

    S refines the interface automaton AP.

  2. 2.

    S is good-enough with respect to the function P.

We denote with 𝖨𝗆𝗉(P) the set of all implementations of P.

An implementation of a reward interface P is an instance of a classical interface automaton, as it does not have a reward function. In the original theory, there is no distinction between an interface and its implementation, as the interface is defined by the automaton structure alone. When considering an interface in isolation, the external inputs ΣE and inputs ΣI are treated in the same way. However, we differentiate between the two when considering the interface in the context of rest of the system, which generates the inputs ΣI.

Example 14.

Let us examine how a best-effort implementation for P could act. With respect to the reward function P from Example 10, a possible implementation SP for P waits for 𝖾𝟣 or 𝖾𝟤, and then expects input 𝗊 to follow with 𝗈𝟣, or produces output 𝗉, respectively. We give a portion of the respective implementation in Figure 2(a), for the behaviors where 𝖾𝟣 occurs first. The missing part has analogous structure for the case when 𝖾𝟤 is received first.

As the implementation must not perform 𝗈𝟣 unless 𝖾𝟣 was received, it waits until 𝖾𝟣 happens. Should 𝗊 never appear, SP has no obligation to perform 𝗈𝟣. This is because, even though P does not require 𝗊, Definition 12 considers the traces of SP, where 𝗊 must be read before 𝗈𝟣 can be produced. While P is unable to control or observe 𝗈𝟤, it exercises control by withholding 𝗉 until 𝖾𝟤, which is represented in P, and therefore in implementation SP. If instead, the implementation could immediately produce 𝗉, the value for P would be lower since it could be that no 𝖾𝟤 occurred.  

(a) (Part of) implementation SP of P.
(b) Composition APAQ.
Figure 2: Interface automata for an implementation of P and composition of AP and AQ.

4 Compatibility and Composition of Reward Interfaces

In this section, we lift the notions of interface compatibility and composition to reward interfaces. Following the classical interface theory, compatibility of reward interfaces requires the existence of a legal environment for the underlying interface automata. For the quantitative part of our reward interfaces, we define compatibility with respect to a “joint” reward function , which, intuitively, is a quantitative specification with respect to which the composition of the implementations of the two interfaces must be good enough. In top-down component-based design, we need to ensure that the local reward functions for interfaces guarantee that the composition of their implementations is good-enough with respect to a given high-level reward function . This is precisely the -compatibility criterion we define.

For this section, let P=(AP,P) and Q=(AQ,Q) be composable reward interfaces with AP=VP,VPinit,ΣPI,ΣPO,ΣPE,ΣPH,𝒯P and AQ=VQ,VQinit,ΣQI,ΣQO,ΣQE,ΣQH,𝒯Q, and APAQ=VPQ,VPQinit,ΣPQI,ΣPQO,ΣPQE,ΣPQH,𝒯PQ their product [Definition 4].

We now define -compatibility, for a given reward function :(ΣPQObs) with alphabet consisting of the non-internal actions of the product automaton APAQ.

Definition 15 (-Compatibility).

The reward interfaces P=(AP,P) and Q=(AQ,Q) are -compatible for a given function :(ΣPQObs) if and only if AP and AQ are compatible and for all implementations SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) of P and Q respectively, if SP and SQ are composable, then SPSQ is good-enough with respect to .

Example 16.

We now illustrate that the reward interfaces P and Q are 𝖲-compatible for the reward function 𝖲 from Example 1. In particular, the composition of P and Q refines the high-level specification of 𝖲 expressed by 𝖲. Clearly, P and Q are composable, and since 𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ)=, we can construct their composition APAQ (shown in Figure 2(b)).

To see that P and Q are 𝖲-compatible, consider any pair of implementations SP and SQ. Now we explain why the composition SPSQ must be good-enough with respect to 𝖲. We examine (𝖲,v)-hopeful input sequences, and consider the expected behaviors of SP and SQ. P requires SP to produce 𝗈𝟣 once 𝖾𝟣 was received, and not before. The same holds with respect to Q, and 𝗈𝟤 and 𝖾𝟤. Since P and Q further constrain 𝗉 and 𝗊, respectively, this ensures that both SP and SQ will each have the opportunity to output 𝗈𝟣 or 𝗈𝟤 respectively, once required. Thus, any pair of best-effort implementations of P and Q together will operate as a best-effort implementation of the high-level reward function 𝖲.

In top-down design, we may further refine our model, by splitting Q into two components Q1 and Q2. Suppose for example that Q1 handles 𝗉, and can output 𝗈𝟥 and some 𝗊1. Q2 outputs 𝗈𝟤 and 𝗊, and synchronizes with Q1 via 𝗊1. The interface automata AQ1 and AQ2 are in Figure 3. The reward function Q1 requires that 𝗊1 only occurs once 𝖾𝟤 did. Similarly, Q2 requires 𝗊 after 𝖾𝟣. Through action 𝗊1, Q1 can prevent Q2 from incorrectly producing 𝗈𝟤.

In the product AQ1AQ2, the state (v1,w2) is illegal, because AQ1 can output 𝗊1 from v1, but w2 in AQ2 can not accept it. In the composition AQ1AQ2, state (v1,w2) is therefore removed. It is easy to see that the composition of any pair of implementations of Q1 and Q2 is good enough with respect to Q. Thus, Q1 and Q2 are Q-compatible.  

(a) Interface automaton AQ1.
(b) Interface automaton AQ2.
Figure 3: Splitting of Q into two components Q1 and Q2 as described in Example 16.

The -compatibility of AP and AQ guarantees that (APAQ,) is a reward interface, as it requires that the interface automata AP and AQ are compatible, and has the right domain. Furthermore, the second condition of -compatibility ensures that P and Q can be implemented independently, and composing the resulting implementations yields a good-enough implementation of (APAQ,). By checking -compatibility of two reward interfaces for a given reward function , we can establish whether their individual reward functions are aligned to guarantee the good-enough satisfaction of the specification .

In bottom-up design, on the other hand, we want to construct the composition of two interfaces by combining their reward functions, in order to express the guarantees of two interfaces when put together. For this, we need to provide a definition of composition of reward functions. The definition is parametrized by a function 𝑐𝑜𝑚𝑏:×, which specifies how we combine the numeric values assigned by the two functions.

The observable behaviours of the composition PQ are over the alphabet ΣPQObs=(ΣPObsΣQObs)𝖲𝗁𝖺𝗋𝖾𝖽(P,Q) of non-internal actions of PQ. Thus, the domain of the composed reward function must be the set of sequences over ΣPQObs. Our goal is to define the composition of P and Q in a way that captures the combined value achieved by any pair of implementations of the two interfaces. Since the implementations of each of the interfaces are constrained by the respective reward function, we cannot assume that any pair of implementations will be cooperating. Therefore, the value assigned to a sequence σ(ΣPQObs) corresponds to the worst case over all the traces produced by some pair of composable implementations that are compatible with the sequence σ|ΣPQI,E of inputs in σ.

Definition 17 (Reward Function Composition).

Let 𝑐𝑜𝑚𝑏:×. We define the composition P𝑐𝑜𝑚𝑏Q:(ΣPQObs) of the functions P and Q, such that for σ(ΣPQObs) where there exist composable SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) and σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ) such that σ|ΣPQObs=σ, we let

P𝑐𝑜𝑚𝑏Q(σ):=inf{𝑐𝑜𝑚𝑏(P(σ′′|ΣPObs),Q(σ′′|ΣQObs))SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) composable,σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQI,E)},

and P𝑐𝑜𝑚𝑏Q(σ) is undefined otherwise.

By taking the worst case in the definition of P𝑐𝑜𝑚𝑏Q, we guarantee that as long as AP and AQ are compatible, the reward interfaces P and Q are P𝑐𝑜𝑚𝑏Q-compatible, as shown in the next proposition. Composing P and Q results in the reward interface (APAQ,P𝑐𝑜𝑚𝑏Q). P and Q can be implemented independently, and the composition of their implementations will be good enough with respect to P𝑐𝑜𝑚𝑏Q.

Proposition 18 (P𝑐𝑜𝑚𝑏Q-Compatibility).

Let 𝑐𝑜𝑚𝑏:×. For all reward interfaces P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, it holds that P and Q are P𝑐𝑜𝑚𝑏Q-compatible.

We discard any observable traces not resulting from implementations of P and Q, restricting the observable behaviors of interface automata good enough w.r.t. P𝑐𝑜𝑚𝑏Q to those resulting from some SPSQ where SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q). This composition is associative if the function 𝑐𝑜𝑚𝑏 used to combine the values is associative, and satisfies some monotonicity and continuity conditions (Proposition 29).

Proposition 19 (Quality of the Reward Function Composition).

Consider reward interfaces P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, and a reward function such that P and Q are -compatible. Then, 𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q)𝖨𝗆𝗉(APAQ,).

Example 20.

Let us consider components P and Q from Example 1 from the perspective of bottom-up design. Recall the reward function P from Example 14, where we impose that for input sequences containing both 𝖾𝟣 and 𝖾𝟤, there must be outputs 𝗈𝟣 and 𝗉 in the respective order. Now, suppose instead we use a function P, which ignores the requirements on 𝗉, meaning that it only asks for 𝗈𝟣 to occur after 𝖾𝟣 and 𝗊, and nothing else. For those input sequences with both 𝖾𝟣 and 𝖾𝟤, P will always award a value of 12 as long as 𝗈𝟣 is produced. Note that this does not disallow 𝗉, it simply does not require it.

A possible behavior of an implementation SP of (AP,P) is then to produce 𝗈𝟣 once receiving 𝗊, and be idle otherwise. In particular, it will never produce 𝗉. Implementations SQ of Q will, as before, produce 𝗈𝟤 only after receiving 𝖾𝟤 and 𝗉.

It is easy to see that the new function P allows implementations of (AP,P) which prevent Q from producing 𝗈𝟤, disabling the best value w.r.t. Q. Since P has no knowledge of Q, we cannot assume that SP will be more cooperative than required by AP and P. As AP and AQ are compatible, choosing 𝑐𝑜𝑚𝑏 to be the sum, we have the guarantee that any SPSQ is good enough w.r.t. P𝑐𝑜𝑚𝑏Q. This guarantee is however weaker than 𝖲. In this case, that means that for the mentioned input sequences featuring 𝖾𝟣 and 𝖾𝟤, where previously the best value of 12+12 was possible, we will now achieve at least 12+0.

In bottom-up design, we can use (APAQ,P𝑐𝑜𝑚𝑏Q) as the reward interface for the composition of P=(AP,P) and Q. This captures the guarantees of the composed components at a higher level of the system, without the need to devise a new reward function and checking if P and Q are compatible with respect to it.  

5 Reward Interface Refinement

Supporting different levels of abstraction for the same component is helpful in the design process, as it allows a coarse description of interactions and dependencies while also defining internal details. Refinement enables this independent implementability of an interface, permitting extended functionality as long as original obligations are maintained. For a reward interface Q to be a refinement of a reward interface P, we require as usual that Q works within the same environments as P, and that an implementation of Q is also an implementation of P. This means, in particular, that, similarly to classical interface automata, a refinement of P must accept at least all inputs of P and not produce more outputs than P. In addition, we must consider the reward function w.r.t. best-effort implementations.

The refinement relation for reward interfaces can be defined as the conjunction of refinement between the respective interface automata [Definition 8] and refinement between the respective reward functions. We define the notion of refinement for quantitative reward functions in the context of good-enough satisfaction. Intuitively, for a function Q to refine a function P, each input sequence for Q must be “as hopeful” w.r.t. Q as it is w.r.t. P, and the value assigned by Q to a trace must be matched by the value assigned by P to the corresponding trace in P. This captures the idea that Q inherits the obligations of P and assigns values in a way that is possible in P. We formalize this intuition as the existence of a helper function rPQ:, that suitably relates the ranges 𝑉𝑎𝑙𝑠(P) and 𝑉𝑎𝑙𝑠(Q) of the functions. The idea is that rPQ preserves the relative order of the values in P, as it relates to both the hopefulness of input traces and values assigned to the respective full traces.

Definition 21 (Reward Function Refinement).

Suppose that for the reward interfaces P=(AP,P) and Q=(AQ,Q) we have ΣPIΣQI, ΣPEΣQE and ΣQOΣPO. We say that Q refines P, denoted QP, iff there exists a function rPQ: such that for every value v𝑉𝑎𝑙𝑠(P), the following two conditions are satisfied.

  1. 1.

    For every σQI,E(ΣQIΣQE), if σQI,E|ΣP𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P,v,ΣPIΣPE),
    For every σQI,E(ΣQIΣQE), then σQI,E𝐻𝑜𝑝𝑒𝑓𝑢𝑙(Q,rPQ(v),ΣQIΣQE).

  2. 2.

    For every σQ(ΣQObs), if Q(σQ)rPQ(v) then P(σQ|ΣP)v.

Definition 21 captures the nature of refinement, because while it retains priority brackets for obligations of the original function, we can also introduce new dependencies or arbitrary subdivisions within each priority. The next example illustrates the refinement relation for reward functions and demonstrates the use of the helper function for establishing the relation.

Example 22.

Going back to the reward interface for 𝖲 with 𝖲 from Example 1, suppose we want to modify the guarantees, requiring 𝖲 to output 𝗈𝟣 or 𝗈𝟤 for a period of time. More concretely, we ask for 𝗈𝟣 to hold as often as possible from the point 𝖾𝟣 is first received until the first occurrence of 𝖾𝟤, as to capture the change in input. We add the same requirement on 𝗈𝟤 if 𝖾𝟤 occurs first. We modify the existing reward function 𝖲 to R, where we multiply the assigned value by a factor of 23 if this new condition is violated.

Clearly, a subset of traces that had value 1 for 𝖲 are now assigned value 23, namely where the other output 𝗈𝟥 is interjected. The same holds for traces going from 12 to 13. The new requirement will always be violated if the order was not preserved in the first place, giving us 16 instead of 14. However, the old obligations are kept: Any (𝖲,1)-hopeful input will be paired with a trace that for R achieves at least 23. The function r𝖲R will then be r𝖲R={(0,0),(14,16),(12,13),(1,23)}. Thus, the priorities from 𝖲 are preserved in the refinement, even though we are able to strengthen requirements in the reward function R.  

The refinement relation for reward interfaces combines the two refinement relations.

Definition 23 (Reward Interface Refinement).

A reward interface Q=(AQ,Q) refines a reward interface P=(AP,P), written as QP, if and only if AQAP and QP.

The rest of this section is dedicated to establishing that the refinement relation we defined has the properties necessary for compositional design. We establish that the refinement relation between reward interfaces is a preorder, i.e. it is reflexive and transitive (Proposition 30). In particular, transitivity of refinement is important, as it allows for an iterative design process, where each refinement needs to only consider the last to maintain a refinement relation to the original specification. It is established by composing the respective helper functions.

As a consequence, an implementation of a refinement is then also an implementation of the original interface. Since the implementation must be good-enough w.r.t. the refined function, by refinement we can show it is also good-enough w.r.t. the original function.

Theorem 24 (Implementation of a Refinement).

If a reward interface Q=(AQ,Q) refines a reward interface P=(AP,P), then 𝖨𝗆𝗉(Q)𝖨𝗆𝗉(P).

The other direction of Theorem 24 does not hold in general. First of all, a reward interface constrains the set of implementations through both the interface automaton and the reward function, while the refinement relation treats those two components separately, which makes it stronger than the inclusion relation between the sets of implementations. Moreover, the refinement relation on reward functions alone is stronger than the inclusion between the respective sets of good-enough automata.

The last properties we establish enable independent implementability, by guaranteeing that refinement does not impair the original context of an interface. They concern refinement in the function composition and the property of substitutability. Intuitively, substitutability states that a refinement of a reward interface P remains -compatible with any reward interfaces that are -compatible with P.

Theorem 25.

Consider P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, and let P=(AP,P) be a non-empty refinement of P, with (ΣPIΣPI)ΣQO=. Then if 𝑐𝑜𝑚𝑏 is monotonically increasing, (APAQ,P𝑐𝑜𝑚𝑏Q)(APAQ,P𝑐𝑜𝑚𝑏Q). If P and Q are -compatible for a reward function , then P and Q are also -compatible.

With that, we established that the refinement for reward interfaces in Definition 23 indeed lifts the ideas of interface refinement to the best-effort quantitative setting and has the properties to enable component-based design of systems with quantitative specifications.

6 Checking Compatibility, Refinement, and Implementability

In this section, we study the algorithmic aspects of our theory of reward interfaces, in the context of an automata-based finite representation of the reward functions. We first introduce this representation. Then, we define the decision problems of interest, namely checking compatibility, refinement and implementability of reward interfaces, and outline algorithms for each of them for our representation. Lastly, we discuss quantitative temporal logics and weighted automata as representations of reward functions, and relate them to the representation studied in this section.

6.1 Automata-Based Finite Representation

The framework we introduced is designed to be general, including the representation of the reward function. Here, we study one possible representation, which is based on automata over finite and infinite words. We need both finite automata to model terminating components, as well as ω-automata to account for infinite behaviors of reactive systems.

We consider functions :Σ such that 𝑉𝑎𝑙𝑠() is finite, and which have a finite representation as a set of pairs of automata Φ={(v,𝒩v)v𝑉𝑎𝑙𝑠()}, where for each v𝑉𝑎𝑙𝑠(), v is an NBA, and 𝒩v is an NFA, and (v)(𝒩v)==v. Intuitively, for each of the finitely many possible values of , the set of finite (resp. infinite) words that maps to that value is given as a NFA (resp. NBA). We denote with Φ the function :Σ represented by Φ. For a reward interface P=(AP,P) where the function P is given in a finite representation ΦP, we abuse notation and write P=(AP,ΦP), and 𝑉𝑎𝑙𝑠(ΦP) instead of 𝑉𝑎𝑙𝑠(ΦP). The size of ΦP is given by |ΦP|=v𝑉𝑎𝑙𝑠(P)(|v|+|𝒩v|).

6.2 Decision Problems and Automata-Based Algorithms

Assuming reward functions Φ given as defined above, with pairs of automata for each value, we now study the main decision problems for reward interfaces.

Theorem 26 (Checking Compatibility of Reward Interfaces).

For a given finitely represented reward function Φ:(ΣPQObs), checking if P=(AP,ΦP) and Q=(AQ,ΦQ) are Φ-compatible can be done in double exponential time.

Proof Sketch.

We reduce the second condition of Definition 15 to checking language inclusion for tree automata, by constructing an automaton capturing the set of joint implementations SPSQ s.t. SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q). To this end, we construct a deterministic interface automaton from APAQ, which incurs an exponential blowup in the worst case. We construct two universal co-Büchi word automata, one for implementations SPSQ, and one for good-enough sequences w.r.t. Φ, that are converted to universal co-Büchi tree automata. Checking language inclusion can be done in exponential time, thus in total we can check reward interface compatibility in time double exponential in |AP||AQ||ΦP||ΦQ|.

Theorem 27 (Checking Reward Interface Refinement).

Checking if (AQ,ΦQ)(AP,ΦP) can be done in exponential time.

Proof Sketch.

Checking AQAP is done in polynomial time [14]. To find a candidate function rPQ according to condition 1 of Definition 21, we iteratively match values of 𝑉𝑎𝑙𝑠(ΦP) to those in 𝑉𝑎𝑙𝑠(ΦQ). We compare sets of hopeful input sequences for the respective pairs of values in exponential time, by checking language inclusion. If a candidate rPQ was constructed, we verify condition 2 of Definition 21, by checking that for every complete trace achieving value rPQ(v) on ΦQ, the corresponding traces achieve value v for ΦP. This is done by checking language emptiness. The overall procedure runs in exponential time.

Theorem 28 (Implementations of a Reward Interface).

Checking if S𝖨𝗆𝗉((AP,ΦP)) for an interface automaton S can be done in polynomial time.
Checking if 𝖨𝗆𝗉(P) for P=(AP,ΦP) can be done in time exponential in |AP||ΦP|.

Proof Sketch.

From ΦP, we construct an NBA whose language contains all counterexamples a good-enough implementation must not produce, whose size is polynomial in |ΦP|. We check for language emptiness of its product with S, which can be done in polynomial time.

For checking the existence of an implementation, we construct from the automaton accepting counterexamples a deterministic parity automaton for the positive case, incurring in the worst case an exponential blow-up in the number of states. We intersect the result with a deterministic automaton obtained from AP to restrict the language to traces with corresponding executions on AP. From the combined automaton we construct a parity game and solve it. If there exists a winning strategy, it gives us a possible implementation of P. If there is no such strategy, then there exists no implementation. Since the number of states in the game is exponential in |AP||ΦP|, the overall check can be done in exponential time.

6.3 Discussion on Reward Functions and their Representation

The automata-based finite representation of reward functions allows system designers to express a quantitative language with a finite set of possible values in a convenient way, by describing the languages mapped to individual values as finite automata, providing a ranking for sets of execution traces with respect to how desirable these executions are.

This finite representation enables higher-level specification languages for quantitative requirements, such as LTL[] [1], which is a temporal logic with quantitative features, and the derived logic LTLf[] [5] over finite traces. Such logic formulas can be seen as functions mapping infinite and finite words, respectively, to values in . The range of values for LTL[] specifications is finite [1]. Following the construction in [1], we can translate, in exponential time, LTL[] and LTLf[] formulas into the above automata representation.

A reward function can also capture a quantitative language L𝒲:Σ, expressed with a weighted automaton 𝒲=Q,qI,Σ,δ,γ, where γ:δ is a weight function, and a value function 𝖵: [8]. These automata can serve as a concise specification in the design process. To use the finite representation above and enable the respective algorithms, we restrict the instances of 𝖵 we consider to those where 𝑉𝑎𝑙𝑠 finite. For the value functions 𝐿𝑎𝑠𝑡 or 𝑀𝑎𝑥 on finite sequences, and 𝑆𝑢𝑝, 𝐿𝑖𝑚𝑆𝑢𝑝 or 𝐿𝑖𝑚𝐼𝑛𝑓 on infinite sequences, we can construct in at most polynomial time [8], from a quantitative automaton 𝒲, individual (Boolean) automata 𝒲v for threshold v and {>,,=,,<}, such that L𝒲v={σΣL𝒲(σ)v}. We can thus translate a pair of quantitative automata with those value functions into a set of automata {(v,𝒩v)v𝑉𝑎𝑙𝑠}. While our theory supports more expressive classes of quantitative languages, such as 𝐿𝑖𝑚𝐴𝑣𝑔 or 𝐷𝑖𝑠𝑐, there the language inclusion problem becomes undecidable for nondeterministic automata [8] and deterministic automata lack many essential closure properties [7]. In the future, we plan to investigate decidable subclasses, in order to extend algorithmic support to more expressive representations of reward functions.

7 Related Work

De Alfaro and Henzinger’s interface automata [14] have been extended in various ways, including the timed domain [16, 12], and focusing on communication and synchronization using game semantics [13]. The work on contract-based design by Benveniste et al. [3] provides a comprehensive overview of different interface theories, with a unified set of properties.

Modal interface theories [20, 23, 21] refine the model by introducing modalities, which enable the expression of liveness properties. Extensions by Tripakis et al. [25], or Mouelhi et al. [22] extend the expressive power of interface automata by introducing additional contracts over the input and output variables at each state and globally. A similar idea is presented concerning shared memory [24], where the pre- and post-conditions on transitions are interpreted as modification of shared data on synchronization. These approaches lift interface automata for more expressivity and finer control. We similarly address these issues with requirements on interfaces expressed as a reward function, which by its multi-valued range is able to capture complex specifications in a concise way, and additionally, we define compatibility with respect to arbitrary functions to express higher level goals.

Quantitative aspects of interface automata have been explored in the form of resource interfaces [6], which extend stateful interfaces to include resource labels. Each state is associated with an increase or decrease in the value of an execution if entered. This allows for naturally specifying minimum resource usage, or checking compatibility within a threshold of consumed energy. For a single interface, reward interfaces are able to express the same using the reward function, but our compatibility notion is less strict outside the automaton structure. We could however use a resource interface in place of the function, similarly to a weighted automaton, since it will associate a value for each execution. Still, the semantics will be different since we interpret obligations given by the function in a good-enough context.

Our approaches to algorithmically checking consistency and compatibility of interfaces reduce to synthesis questions, where we need to find a good-enough strategy for an infinite two-player game. Good-enough synthesis [2] is closely related to dominant [11] and admissible [4] strategies, which are strategies that perform as good as the best alternative, such that they are not dominated by another strategy. There has been further work on applying these ideas to the compositional synthesis setting [11, 19, 17].

8 Conclusion

We introduced a novel interface framework for best-effort quantitative requirements, called reward interfaces, building on interface automata. Our formalism can enforce high-quality implementations for unconstrained external environments, while providing notions of compatibility and refinement central to interface theories. It allows for expressing a wide range of both qualitative and quantitative properties in a concise manner, as to not increase the effort required during the component-based design process. The algorithmic solutions we presented are of considerably higher complexity than those for classic interface automata, due to the high degree of flexibility afforded by the quantitative function.

References

  • [1] Shaull Almagor, Udi Boker, and Orna Kupferman. Formally reasoning about quality. J. ACM, 63(3):24:1–24:56, 2016. doi:10.1145/2875421.
  • [2] Shaull Almagor and Orna Kupferman. Good-enough synthesis. In Shuvendu K. Lahiri and Chao Wang, editors, Computer Aided Verification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part II, volume 12225 of Lecture Notes in Computer Science, pages 541–563. Springer, 2020. doi:10.1007/978-3-030-53291-8_28.
  • [3] Albert Benveniste, Benoît Caillaud, Dejan Nickovic, Roberto Passerone, Jean-Baptiste Raclet, Philipp Reinkemeier, Alberto L. Sangiovanni-Vincentelli, Werner Damm, Thomas A. Henzinger, and Kim G. Larsen. Contracts for system design. Found. Trends Electron. Des. Autom., 12(2-3):124–400, 2018. doi:10.1561/1000000053.
  • [4] Romain Brenguier, Jean-François Raskin, and Ocan Sankur. Assume-admissible synthesis. Acta Informatica, 54(1):41–83, 2017. doi:10.1007/s00236-016-0273-2.
  • [5] Alberto Camacho, Meghyn Bienvenu, and Sheila A. McIlraith. Finite LTL synthesis with environment assumptions and quality measures. CoRR, abs/1808.10831, 2018. arXiv:1808.10831.
  • [6] Arindam Chakrabarti, Luca de Alfaro, Thomas A. Henzinger, and Mariëlle Stoelinga. Resource interfaces. In Rajeev Alur and Insup Lee, editors, Embedded Software, pages 117–133, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. doi:10.1007/978-3-540-45212-6_9.
  • [7] Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Expressiveness and closure properties for quantitative languages. In Proceedings of the 2009 24th Annual IEEE Symposium on Logic In Computer Science, LICS ’09, pages 199–208, USA, 2009. IEEE Computer Society. doi:10.1109/LICS.2009.16.
  • [8] Krishnendu Chatterjee, Laurent Doyen, and Thomas A. Henzinger. Quantitative languages. ACM Trans. Comput. Log., 11(4):23:1–23:38, 2010. doi:10.1145/1805950.1805953.
  • [9] Taolue Chen, Chris Chilton, Bengt Jonsson, and Marta Kwiatkowska. A compositional specification theory for component behaviours. In Helmut Seidl, editor, Programming Languages and Systems, pages 148–168, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. doi:10.1007/978-3-642-28869-2_8.
  • [10] Chris Chilton, Bengt Jonsson, and Marta Kwiatkowska. An algebraic theory of interface automata. Theoretical Computer Science, 549:146–174, 2014. doi:10.1016/j.tcs.2014.07.018.
  • [11] Werner Damm and Bernd Finkbeiner. Automatic compositional synthesis of distributed systems. In Cliff B. Jones, Pekka Pihlajasaari, and Jun Sun, editors, FM 2014: Formal Methods - 19th International Symposium, Singapore, May 12-16, 2014. Proceedings, volume 8442 of Lecture Notes in Computer Science, pages 179–193. Springer, 2014. doi:10.1007/978-3-319-06410-9_13.
  • [12] Alexandre David, Kim G. Larsen, Axel Legay, Ulrik Nyman, and Andrzej Wasowski. Timed i/o automata: a complete specification theory for real-time systems. In Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control, HSCC ’10, pages 91–100, New York, NY, USA, 2010. Association for Computing Machinery. doi:10.1145/1755952.1755967.
  • [13] Luca de Alfaro, Leandro Dias da Silva, Marco Faella, Axel Legay, Pritam Roy, and Maria Sorea. Sociable interfaces. In Bernhard Gramlich, editor, Frontiers of Combining Systems, pages 81–105, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. doi:10.1007/11559306_5.
  • [14] Luca de Alfaro and Thomas A. Henzinger. Interface automata. SIGSOFT Softw. Eng. Notes, 26(5):109–120, 2001. doi:10.1145/503271.503226.
  • [15] Luca de Alfaro and Thomas A. Henzinger. Interface-based design. In Manfred Broy, Johannes Grünbauer, David Harel, and Tony Hoare, editors, Engineering Theories of Software Intensive Systems, pages 83–104, Dordrecht, 2005. Springer Netherlands.
  • [16] Luca de Alfaro, Thomas A. Henzinger, and Mariëlle Stoelinga. Timed interfaces. In Alberto Sangiovanni-Vincentelli and Joseph Sifakis, editors, Embedded Software, pages 108–122, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. doi:10.1007/3-540-45828-X_9.
  • [17] Rafael Dewes and Rayna Dimitrova. Compositional high-quality synthesis. In Étienne André and Jun Sun, editors, Automated Technology for Verification and Analysis - 21st International Symposium, ATVA 2023, Singapore, October 24-27, 2023, Proceedings, Part I, volume 14215 of Lecture Notes in Computer Science, pages 334–354. Springer, 2023. doi:10.1007/978-3-031-45329-8_16.
  • [18] Laurent Doyen, Thomas A. Henzinger, Barbara Jobstmann, and Tatjana Petrov. Interface theories with component reuse. In Proceedings of the 8th ACM International Conference on Embedded Software, EMSOFT ’08, pages 79–88, New York, NY, USA, 2008. Association for Computing Machinery. doi:10.1145/1450058.1450070.
  • [19] Bernd Finkbeiner and Noemi Passing. Dependency-based compositional synthesis. In Dang Van Hung and Oleg Sokolsky, editors, Automated Technology for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings, volume 12302 of Lecture Notes in Computer Science, pages 447–463. Springer, 2020. doi:10.1007/978-3-030-59152-6_25.
  • [20] Kim G. Larsen, Ulrik Nyman, and Andrzej Wąsowski. Modal i/o automata for interface and product line theories. In Rocco De Nicola, editor, Programming Languages and Systems, pages 64–79, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
  • [21] Gerald Lüttgen and Walter Vogler. Modal interface automata. In Jos C. M. Baeten, Thomas Ball, and Frank S. de Boer, editors, Theoretical Computer Science - 7th IFIP TC 1/WG 2.2 International Conference, TCS 2012, Amsterdam, The Netherlands, September 26-28, 2012. Proceedings, volume 7604 of Lecture Notes in Computer Science, pages 265–279. Springer, 2012. doi:10.1007/978-3-642-33475-7_19.
  • [22] Sebti Mouelhi, Samir Chouali, and Hassan Mountassir. Refinement of interface automata strengthened by action semantics. In FESCA@ETAPS, volume 253 of Electronic Notes in Theoretical Computer Science, pages 111–126. Elsevier, 2009. doi:10.1016/J.ENTCS.2009.09.031.
  • [23] Jean-Baptiste Raclet, Éric Badouel, Albert Benveniste, Benoît Caillaud, Axel Legay, and Roberto Passerone. A modal interface theory for component-based design. Fundam. Informaticae, 108(1-2):119–149, 2011. doi:10.3233/FI-2011-416.
  • [24] Ayleen Schinko, Walter Vogler, Johannes Gareis, N. Tri Nguyen, and Gerald Lüttgen. Interface automata for shared memory. Acta Informatica, 59(5):521–556, 2022. doi:10.1007/S00236-021-00408-8.
  • [25] Stavros Tripakis, Ben Lickly, Thomas A. Henzinger, and Edward A. Lee. A theory of synchronous relational interfaces. ACM Trans. Program. Lang. Syst., 33(4), July 2011. doi:10.1145/1985342.1985345.

Appendix A Appendix: Compatibility of Reward Interfaces

Proposition 18 (P𝑐𝑜𝑚𝑏Q-Compatibility). [Restated, see original statement.]

Let 𝑐𝑜𝑚𝑏:×. For all reward interfaces P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, it holds that P and Q are P𝑐𝑜𝑚𝑏Q-compatible.

Proof.

Let P=(AP,P) and Q=(AQ,Q) be reward interfaces where AP and AQ are compatible. To prove that P and Q are P𝑐𝑜𝑚𝑏Q-compatible, we need to show that for any pair of implementations SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) that are composable, it holds that SPSQ is good enough w.r.t. P𝑐𝑜𝑚𝑏Q. For the sake of contradiction, assume that there exist composable SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) such that SPSQ is not good-enough w.r.t. P𝑐𝑜𝑚𝑏Q. That means that for some v𝑉𝑎𝑙𝑠(P𝑐𝑜𝑚𝑏Q) there exists an input sequence σE,I(ΣPQE,I) that witnesses this violation, that is,

  • σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P𝑐𝑜𝑚𝑏Q,v,ΣPQE,I), and

  • for some σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σE,I), P𝑐𝑜𝑚𝑏Q(σ) is undef. or P𝑐𝑜𝑚𝑏Q(σ)<v.

By the first item above, together with the definition of hopeful sequences (Definition 11) and the definition of P𝑐𝑜𝑚𝑏Q (Definition 17), there exist composable SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) and σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ) such that σ|ΣPQE,I=σE,I and

inf{𝑐𝑜𝑚𝑏(P(σ′′|ΣPObs),Q(σ′′|ΣQObs))SP′′𝖨𝗆𝗉(P) and SQ′′𝖨𝗆𝗉(Q) composable,σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SP′′SQ′′,σE,I)}v.

Since σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σE,I), by the definition of P𝑐𝑜𝑚𝑏Q, we have that P𝑐𝑜𝑚𝑏Q(σ) is defined. Thus, by the second item we have that P𝑐𝑜𝑚𝑏Q(σ)<v. This, together with the definition of P𝑐𝑜𝑚𝑏Q implies that

inf{𝑐𝑜𝑚𝑏(P(σ′′|ΣPObs),Q(σ′′|ΣQObs))SP′′𝖨𝗆𝗉(P) and SQ′′𝖨𝗆𝗉(Q) composable,σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SP′′SQ′′,σE,I)}<v.

This is a contradiction, which concludes the proof.

Proposition 19 (Quality of the Reward Function Composition). [Restated, see original statement.]

Consider reward interfaces P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, and a reward function such that P and Q are -compatible. Then, 𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q)𝖨𝗆𝗉(APAQ,).

Proof.

Let P=(AP,P) and Q=(AQ,Q) be reward interfaces with automata AP and AQ that are compatible. Let be a reward function such that P and Q are -compatible.

We have to show that for every S𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q) it holds that S𝖨𝗆𝗉(APAQ,). For the sake of contradiction, suppose that there exists S𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q) such that S𝖨𝗆𝗉(APAQ,). Since SAPAQ, this means that S is not good-enough with respect to . Let σ𝖳𝗋𝖺𝖼𝖾𝗌(S) be the trace witnessing this violation. Thus, σ|ΣPQE,I is (,v)-hopeful for some v and (σ|ΣPQObs) is either undefined or strictly smaller than v. We will now show that this assumption leads to contradiction.

Let ={SPSQSP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q),SP and SQ composable} be the set of composed implementations of P and Q. Since implementations do not restrict the allowed inputs, we have that there exists S and σ𝖳𝗋𝖺𝖼𝖾𝗌(S) such that σ|ΣPQE,I=σ|ΣPQE,I. This implies that P𝑐𝑜𝑚𝑏Q(σ|ΣPQE,I) is defined. Thus, since S is good-enough with respect to P𝑐𝑜𝑚𝑏Q, we have that P𝑐𝑜𝑚𝑏Q(σ|ΣPQObs) must be defined as well. By the definition of P𝑐𝑜𝑚𝑏Q, this implies that there exist S′′ and σ′′𝖳𝗋𝖺𝖼𝖾𝗌(S′′) such that σ′′|ΣPQObs=σ|ΣPQObs. Now, since σ′′|ΣPQE,I=σ|ΣPQE,I, and by Proposition 18 S′′ is good-enough with respect to , it must be the case that (σ′′|ΣPQObs)v. Since σ′′|ΣPQObs=σ|ΣPQObs, it also holds that (σ|ΣPQObs)v, which is the desired contradiction.

Proposition 29 (Composition Associativity).

Let P=(AP,P),Q=(AQ,Q) and R=(AR,R) be reward interfaces with pairwise compatible interface automata. If the function 𝑐𝑜𝑚𝑏 is symmetric, then P𝑐𝑜𝑚𝑏Q=Q𝑐𝑜𝑚𝑏P. If 𝑐𝑜𝑚𝑏 is associative, monotonically increasing and continuous, then the reward function composition using 𝑐𝑜𝑚𝑏 is associative, that is, (P𝑐𝑜𝑚𝑏Q)𝑐𝑜𝑚𝑏R=P𝑐𝑜𝑚𝑏(Q𝑐𝑜𝑚𝑏R).

Proof.

Let P=(AP,P),Q=(AQ,Q) and R=(AR,R) be reward interfaces with pairwise compatible interface automata. Clearly, APAQ=AQAP. The associativity of composition for compatible interface automata is established in [14], thus (APAQ)AR=AP(AQAR). Let APQ:=APAQ and AQR:=AQAR.

By the definition of composition, it is clearly the case that for every σΣPQObs we have that P𝑐𝑜𝑚𝑏Q(σ) is defined if and only if Q𝑐𝑜𝑚𝑏P(σ) is defined. When 𝑐𝑜𝑚𝑏 is symmetric, that, for all a,b we have that 𝑐𝑜𝑚𝑏(a,b)=𝑐𝑜𝑚𝑏(b,a), the definition of reward function composition (Definition 17) implies that if defined, the two values are equal.

Suppose that the function 𝑐𝑜𝑚𝑏 satisfies the following conditions for all a,b,c,d,d:

  • (Associativity) 𝑐𝑜𝑚𝑏(comb(a,b),c)=𝑐𝑜𝑚𝑏(a,comb(b,c)).

  • (Monotonicity) If dd, then 𝑐𝑜𝑚𝑏(d,b)𝑐𝑜𝑚𝑏(d,b) and 𝑐𝑜𝑚𝑏(a,d)𝑐𝑜𝑚𝑏(a,d).

  • (Continuity) For every fixed v, the functions f1(x)=𝑐𝑜𝑚𝑏(x,v) and f2(y)=𝑐𝑜𝑚𝑏(v,y) are continuous on .

Under these conditions we prove that for any sequence σ(ΣPQRObs) it either holds that ((P𝑐𝑜𝑚𝑏Q)𝑐𝑜𝑚𝑏R)(σ)=(P𝑐𝑜𝑚𝑏(Q𝑐𝑜𝑚𝑏R))(σ), or both are undefined.

Let PQ:=P𝑐𝑜𝑚𝑏Q, QR:=Q𝑐𝑜𝑚𝑏R, PQ:=(APQ,PQ) and QR:=(AQR,QR), and let A:=(APAQ)AR=AP(AQAR). Let σ(ΣPQRObs).

First we show that (PQ𝑐𝑜𝑚𝑏R)(σ) is defined iff (P𝑐𝑜𝑚𝑏QR)(σ) is defined.

Suppose that (PQ𝑐𝑜𝑚𝑏R)(σ) is defined. Then, there exist composable implementations SPQ𝖨𝗆𝗉(PQ) and SR𝖨𝗆𝗉(R) and σ𝖳𝗋𝖺𝖼𝖾𝗌(SPQSR) such that σ|ΣPQRObs=σ. Since SPQ is good-enough with respect to PQ, from the properties of composition of reward functions we have that there exist composable SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) and a trace σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ) such that σ′′|ΣPQObs=σ|ΣPQObs. By the choice of σ and σ′′ we have that there exists σ′′′𝖳𝗋𝖺𝖼𝖾𝗌((SPSQ)SR) such that σ′′′|ΣPQRObs=σ. Since 𝖳𝗋𝖺𝖼𝖾𝗌((SPSQ)SR)=𝖳𝗋𝖺𝖼𝖾𝗌(SP(SQSR)), we can use SP, SQSR and σ′′′ as a witness to the fact that (P𝑐𝑜𝑚𝑏QR)(σ) is defined. The other direction of the implication is shown analogously, concluding the first part of the proof.

It remains to show that when both functions are defined on σ, then their values are equal.

By definition, we have (PQ𝑐𝑜𝑚𝑏R)(σ)=inf{𝑐𝑜𝑚𝑏(PQ(σ|ΣPQObs),R(σ|ΣRObs))|SPQ𝖨𝗆𝗉(PQ),SR𝖨𝗆𝗉(R),composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPQSR,σ|ΣPQRE,I)}. Applying the definition to PQ, we replace PQ(σ|ΣPQObs) by inf{𝑐𝑜𝑚𝑏(P(σ′′|ΣPObs),Q(σ′′|ΣQObs))|SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q),composable,σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQE,I)}.

Due to the monotonicity and continuity properties of 𝑐𝑜𝑚𝑏, we get (PQ𝑐𝑜𝑚𝑏R)(σ)=inf{𝑐𝑜𝑚𝑏(𝑐𝑜𝑚𝑏(P(σ′′|ΣPObs),Q(σ′′|ΣQObs)),R(σ|ΣRObs))SPQ𝖨𝗆𝗉(PQ),SR𝖨𝗆𝗉(R),composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPQSR,σ|ΣPQRE,I),SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q),composable,σ′′𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQE,I)}. Applying reasoning similar to that in the first part of the proof, we get (PQ𝑐𝑜𝑚𝑏R)(σ)=inf{𝑐𝑜𝑚𝑏(𝑐𝑜𝑚𝑏(P(σ′′′|ΣPObs),Q(σ′′′|ΣQObs)),R(σ′′′|ΣRObs))SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q),composable,SR𝖨𝗆𝗉(R) composable with SPSQ,σ′′′𝖳𝗋𝖺𝖼𝖾𝗌(SPSQSR,σ|ΣPQRE,I)}. Using the associativity of 𝑐𝑜𝑚𝑏 we replace the term 𝑐𝑜𝑚𝑏(𝑐𝑜𝑚𝑏(P(σ′′′|ΣPObs),Q(σ′′′|ΣQObs)),R(σ′′′|ΣRObs)) by
𝑐𝑜𝑚𝑏(P(σ′′′|ΣPObs),𝑐𝑜𝑚𝑏(Q(σ′′′|ΣQObs),R(σ′′′|ΣRObs))).

Additionally, reorganizing the composition, we obtain (PQ𝑐𝑜𝑚𝑏R)(σ)=inf{𝑐𝑜𝑚𝑏(P(σ′′′|ΣPObs),𝑐𝑜𝑚𝑏(Q(σ′′′|ΣQObs),R(σ′′′|ΣRObs)))SP𝖨𝗆𝗉(P),SQSR𝖨𝗆𝗉(QR),composable,σ′′′𝖳𝗋𝖺𝖼𝖾𝗌(SP(SQSR),σ|ΣPQRE,I)}.

Using again the properties of 𝑐𝑜𝑚𝑏, we get (PQ𝑐𝑜𝑚𝑏R)(σ)=inf{𝑐𝑜𝑚𝑏(P(σ′′′|ΣPObs),QR(σ′′′|ΣQRObs))SP𝖨𝗆𝗉(P),SQSR𝖨𝗆𝗉(QR),composable,σ′′′𝖳𝗋𝖺𝖼𝖾𝗌(SP(SQSR),σ|ΣPQRE,I)}. This is precisely (P𝑐𝑜𝑚𝑏QR)(σ).

Appendix B Appendix: Properties of Reward Interface Refinement

Proposition 30 (Refinement as Preorder).

For all reward interfaces P, Q and R, it holds that (1) PP and (2) if QP, and RQ, then also RP.

Proof.

Theorem 4.1 in [14] establishes that the refinement relation between classical interface automata is a preorder. Thus, we show only that reward function refinement is a preorder.

Consider reward interfaces P=(AP,P), Q=(AQ,Q), and R=(AR,R).

(1) is trivial, so we show only (2). Assume that PQ and QR.

Since APAQ and AQAR, we have alternating simulations PQVP×VQ and QRVQ×VR from AQ to AP, and from AR to AQ, respectively. Furthermore we have that ΣPEΣQEΣRE, ΣPIΣQIΣRI, and ΣROΣQOΣPO. For APAR, we define the relation PRVP×VR as PR:={(u,v)VP×VRwVQ.uPQwwQRv}. The relation PR is an alternating simulation from AR to AP. Thus, APAR.

Since PQ and QR, there exist functions rPQ:, and rQR:, which satisfy the conditions of Definition 21. We define the function rPR: such that rPR(v)=rQR(rPQ(v)) for all v𝑉𝑎𝑙𝑠(P). We now show that rPR satisfies the conditions of Definition 21. Let v𝑉𝑎𝑙𝑠(P).

  1. 1.

    Let σRI,E(ΣREΣRI) be such that σRI,E|ΣP𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P,v,(ΣPEΣPI)). We have to show that σRI,E𝐻𝑜𝑝𝑒𝑓𝑢𝑙(R,rPR(v),(ΣREΣRI)). From the properties of rPQ, we get σRI,E|ΣQ𝐻𝑜𝑝𝑒𝑓𝑢𝑙(Q,rPQ(v),(ΣQEΣQI)). From that, using the properties of rQR, we get that σRI,E𝐻𝑜𝑝𝑒𝑓𝑢𝑙(R,rQR(rPQ(v)),(ΣQEΣQI)). Since rQR(rPQ(v))=rPR(v), this is precisely what we needed to show.

  2. 2.

    Let σR(ΣRObs) be such that R(σR)rPR(v). We have to show that P(σR|ΣP)v. Since rPR(v)=rQR(rPQ(v)), from the properties of rQR we have that Q(σR|ΣQ)rPQ(v). From that, by the properties of the function rPQ we get P((σR|ΣQ)|ΣP)v. (σR|ΣQ)|ΣP is equal to σR|ΣP because of the subset relationships between alphabets of P, Q, and R, as stated above, which is precisely what we had to show.

Theorem 24 (Implementation of a Refinement). [Restated, see original statement.]

If a reward interface Q=(AQ,Q) refines a reward interface P=(AP,P), then 𝖨𝗆𝗉(Q)𝖨𝗆𝗉(P).

Proof.

Consider reward interfaces P=(AP,P) and Q=(AQ,Q) such that Q is a refinement of P. We have to show that 𝖨𝗆𝗉(Q)𝖨𝗆𝗉(P). Let S𝖨𝗆𝗉(Q). By Definition 13, we need to show that (a) SAP, and (b) S is good-enough w.r.t. P.

Refinement of interface automata is transitive, and since S is a refinement of AQ, and AQ refines AP, condition (a) is satisfied. To establish condition (b), we have to show that for every σE,I(ΣPEΣPI) with σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P,v,ΣPEΣPI) and every trace σ𝖳𝗋𝖺𝖼𝖾𝗌ΣPObs(S,σE,I) it holds that P(σ) is defined and P(σ)v.

Since QP, there exists a function rPQ: that satisfies the conditions of Definition 21. Because σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P,v,ΣPEΣPI), and (ΣPEΣPI)(ΣQEΣQI), we know by the first property of rPQ that σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(Q,rPQ(v),ΣQEΣQI). Then, since S𝖨𝗆𝗉(Q) we have that for every trace σ𝖳𝗋𝖺𝖼𝖾𝗌ΣQObs(S,σE,I), Q(σ) is defined and Q(σ)rPQ(v). Applying Definition 21, every trace σ𝖳𝗋𝖺𝖼𝖾𝗌ΣQObs(S,σE,I) is such that P(σ|ΣP) is defined and P(σ|ΣP)v. Conditions on rPQ guarantee that the matching sequence σ|ΣP achieves value v for P. Since ΣPOΣQOΣSO, {σ|ΣPσ𝖳𝗋𝖺𝖼𝖾𝗌ΣQObs(S,σE,I)}={σσ𝖳𝗋𝖺𝖼𝖾𝗌ΣPObs(S,σE,I)}. Thus, it follows that for every trace σ𝖳𝗋𝖺𝖼𝖾𝗌ΣPObs(S,σE,I), P(σ) is defined and P(σ)v, which is what we had to prove.

Theorem 25. [Restated, see original statement.]

Consider P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, and let P=(AP,P) be a non-empty refinement of P, with (ΣPIΣPI)ΣQO=. Then if 𝑐𝑜𝑚𝑏 is monotonically increasing, (APAQ,P𝑐𝑜𝑚𝑏Q)(APAQ,P𝑐𝑜𝑚𝑏Q). If P and Q are -compatible for a reward function , then P and Q are also -compatible.

We will differentiate Theorem 25 into Lemma 31 and Lemma 32.

Lemma 31 (Substitutability).

Let P=(AP,P) be -compatible with Q=(AQ,Q) for some function :(ΣPQObs). Then, any non-empty refinement P=(AP,P) of P with (ΣPIΣPI)ΣQO= is -compatible with Q.

Proof.

Since P and Q are -compatible, we know that all the following hold:

  • AP and AQ are non-empty and composable,

  • there exists a legal environment AR for (AP,AQ), and

  • for SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) composable, SPSQ is good-enough w.r.t. .

To prove -compatibility of P and Q, we will first construct an interface automaton AR that is a legal environment for (AP,AQ). We will then show that for any SP𝖨𝗆𝗉(P) and any SQ𝖨𝗆𝗉(Q), SPSQ is good-enough w.r.t. . By (ΣPIΣPI)ΣQO= and ΣPOΣPO, we know that 𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ)𝖲𝗁𝖺𝗋𝖾𝖽(AP,AQ), i.e. AP does not synchronize with AQ on any new actions. We define AR=VR,VRinit,ΣRI,ΣRO,ΣRE,ΣRH,𝒯R such that:

  • ΣRI=ΣRI(ΣPOΣPO), ΣRO=ΣRO(ΣPIΣPI),

  • 𝒯R={(v,a,v)|(v,a,v)𝒯RaΣR}.

That is, AR is equal to AR except for the sets of inputs and outputs actions towards PQ. These are modified to account for the fact that ΣPIΣPI and ΣPOΣPO. Since none of the new outputs (ΣPIΣPI) of AR are used in 𝒯R, functionally AR differs from AR only in not accepting outputs of P that are removed in P, that is (ΣPOΣPO). To prove that AR is a legal environment for (AP,AQ) we need to show that conditions in Definition 5 are met. Meaning that AR is not empty and

  1. 1.

    ΣRE=ΣPQE, ΣRI=ΣPQO, ΣRO=ΣPQI.

  2. 2.

    AR is composable with APAQ, and 𝖨𝗅𝗅𝖾𝗀𝖺𝗅(APAQ,AR)=.

  3. 3.

    𝖱𝖾𝖺𝖼𝗁((APAQ)AR)(𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ)×VR)=

By the definition of AR and the properties of AR, the first two conditions hold.

For the third condition, note that since AP is a refinement of AP, AP will never reject an input accepted by AP. In particular, 𝖱𝖾𝖺𝖼𝗁((APAQ)AR) will not contain illegal states from (𝖨𝗅𝗅𝖾𝗀𝖺𝗅(AP,AQ)×VR) if 𝖱𝖾𝖺𝖼𝗁((APAQ)AR) did not, since 𝒯R𝒯R. Thus, the third condition also holds. Therefore, AR is a legal environment for (P,Q).

Now we show that for any two implementations SP𝖨𝗆𝗉(P) and SQ𝖨𝗆𝗉(Q) that are composable, SPSQ is good-enough w.r.t. .

As PP, by Theorem 24 we have that SP𝖨𝗆𝗉(P). Therefore, the -compatibility of P and Q implies that SPSQ is good-enough w.r.t .

Lemma 32 (Monotonicity of Reward Function Composition).

Consider reward interfaces P=(AP,P) and Q=(AQ,Q) with AP and AQ compatible, and let P=(AP,P) be a non-empty refinement of P, with (ΣPIΣPI)ΣQI=. If 𝑐𝑜𝑚𝑏 is monotonically increasing, then (APAQ,P𝑐𝑜𝑚𝑏Q)(APAQ,P𝑐𝑜𝑚𝑏Q).

Proof.

Let P=(AP,P) and Q=(AQ,Q) be such that AP and AQ are compatible, and P=(AP,P) be non-empty, PP and (ΣPIΣPI)ΣQI=. We show that if 𝑐𝑜𝑚𝑏 is monotonically increasing, then (APAQ,P𝑐𝑜𝑚𝑏Q)(APAQ,P𝑐𝑜𝑚𝑏Q).

For interface automata, APAQAPAQ. We show (P𝑐𝑜𝑚𝑏Q)(P𝑐𝑜𝑚𝑏Q).

From PP, there is a function r:Vals(P)Vals(P) that satisfies the conditions of Definition 21. To show (P𝑐𝑜𝑚𝑏Q)(P𝑐𝑜𝑚𝑏Q), we need to construct a function r: that satisfies the conditions of Definition 21 for these functions.

For all vVals(P𝑐𝑜𝑚𝑏Q), let r(v)=inf{𝑐𝑜𝑚𝑏(r(P(σ|ΣPObs)),Q(σ|ΣQObs))σ(ΣPQObs) and 𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))v}.

For every value vVals(P𝑐𝑜𝑚𝑏Q) we have:

  • Condition 1:
    Let σE,I(ΣPQE,I) be such that σE,I|ΣPQE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P𝑐𝑜𝑚𝑏Q,v,ΣPQE,I). This means that inf{𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q) composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σE,I)}v.

    We know that 𝖨𝗆𝗉(P)𝖨𝗆𝗉(P), so in particular 𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q)𝖨𝗆𝗉(APAQ,P𝑐𝑜𝑚𝑏Q). That means that for P𝑐𝑜𝑚𝑏Q we will consider a subset of the implementations, and therefore not more traces when applying inf, as for P𝑐𝑜𝑚𝑏Q. This, together with the above inequality implies P𝑐𝑜𝑚𝑏Q(σE,I)r(v). Then it holds that any σ𝖳𝗋𝖺𝖼𝖾𝗌(APAQ,σE,I), for which P𝑐𝑜𝑚𝑏Q(σ) is defined, is a witness for σE,I𝐻𝑜𝑝𝑒𝑓𝑢𝑙(P𝑐𝑜𝑚𝑏Q,r(v),ΣPQE,I).

  • Condition 2:
    Let σ(ΣPQObs) be such that (P𝑐𝑜𝑚𝑏Q)(σ)r(v). This means that
    inf{𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q), composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQE,I)}r(v).

    We have to show that inf{𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q), composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQE,I)}v.

    Since PP we have for the function r that P(σ)r(P(σ|ΣPObs)). Since this function r is used in r, then 𝑐𝑜𝑚𝑏(r(P(σ|ΣPObs)),Q(σ|ΣQObs))r(v), meaning that 𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))v by definition of r and monotonicity of 𝑐𝑜𝑚𝑏. Thus, inf{𝑐𝑜𝑚𝑏(P(σ|ΣPObs),Q(σ|ΣQObs))SP𝖨𝗆𝗉(P),SQ𝖨𝗆𝗉(Q), composable,σ𝖳𝗋𝖺𝖼𝖾𝗌(SPSQ,σ|ΣPQE,I)}v.

Thus, both conditions in Definition 21 hold for r. Hence, (P𝑐𝑜𝑚𝑏Q)(P𝑐𝑜𝑚𝑏Q).