
Nondeterministic Asynchronous Dataflow in Isabelle/HOL

Rafael Castro Gonçalves Silva, Laouen Fernet, and Dmitriy Traytel
Department of Computer Science, University of Copenhagen, Denmark
Abstract

We formalize nondeterministic asynchronous dataflow networks in Isabelle/HOL. Dataflow networks are comprised of operators that are capable of communicating with the network, performing silent computations, and making nondeterministic choices. We represent operators using a shallow embedding as codatatypes. Using this representation, we define standard asynchronous dataflow primitives, including sequential and parallel composition and a feedback operator. These primitives adhere to a number of laws from the literature, which we prove by coinduction using weak bisimilarity as our equality. Albeit coinductive and nondeterministic, our model is executable via code extraction to Haskell.

Keywords and phrases:
dataflow, verification, coinduction, Isabelle/HOL
Copyright and License:
© Rafael Castro Gonçalves Silva, Laouen Fernet, and Dmitriy Traytel; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Security and privacy → Logic and verification; Computing methodologies → Distributed algorithms; Software and its engineering → Data flow languages
Supplementary Material:
Software: https://doi.org/10.5281/zenodo.15737868 [31]
Acknowledgements:
We thank Asta H. From for numerous discussions on the representation of operators that were instrumental in arriving at the end product presented in this paper, and Jan A. Bergstra, Cornelis A. Middelburg, and Gheorghe Stefanescu for discussing their axiom A1 for merge with us and for encouraging words on the merits of formal verification. Thomas Hildebrandt pointed us to Milner’s classic on weak bisimilarity. The ITP reviewers provided helpful textual suggestions.
Funding:
This work is supported by an Independent Research Fund Denmark Sapere Aude grant (3120-00057B, titled DISCOVER) and a Novo Nordisk Foundation start package grant (NNF20OC0063462).
Editors:
Yannick Forster and Chantal Keller

1 Introduction

Data stream processing frameworks like Flink [12], Kafka [21], Spark Streaming [34], and Timely Dataflow [26] are widely used in industry to compute with and analyze streams of data at a large scale. With a long-term objective of formally verifying the correctness of programs expressed in such frameworks in mind, we focus on their common algebraic dataflow foundation in this paper. The dataflow programming paradigm represents a streaming computation as a graph of operators, which are themselves data stream transformers.

We focus on asynchronous dataflow, in which operators work autonomously without blocking the overall computation, following the mantra “if I cannot make progress, maybe someone else can.” Asynchronous computation is naturally nondeterministic: the order in which operators are invoked is not predetermined and might affect the overall computation’s behavior.

Bergstra et al. [3] provide an algebra of asynchronous dataflow operators and building blocks (including sequential and parallel composition and a feedback loop construct), axiomatize the properties of these operators, and give two instances satisfying the axioms. We develop a new instance in Isabelle/HOL using a shallow embedding as a coinductive datatype (short: codatatype). Our operators are essentially nondeterministic input-output machines with silent actions (Section 2). The coinductive representation allows us to keep the state implicit when composing operators. We use corecursion to define composition, feedback, and the various basic asynchronous operators of Bergstra et al. (Section 3). We use countable sets to represent nondeterminism, which simplifies our definition of composition and feedback operators.

We validate that our operators work as intended by proving all but one of Bergstra et al.’s axioms by coinduction (Section 4). Thereby, we use weak bisimilarity as the notion of operator equality. Some of the properties we initially prove deviate mildly from Bergstra et al.’s axioms, due to a simplification we made in some operators’ definitions and the fact that Isabelle does not have empty types. (Both we and Bergstra et al. use types to index the inputs and outputs of an operator; Bergstra et al.’s empty types signify that an operator has no inputs or no outputs.) We show how to close this gap and obtain precisely Bergstra et al.’s axioms by working with the subtype of well-behaved operators (Section 5).

With the help of Isabelle’s code generator, we extract executable Haskell code from our formalization (Section 6). To this end, we represent countable sets as a quotient of lazy lists. This allows us to run operators, which is useful for testing. A particular challenge is the executability of the union of a countable set of countable sets as the corresponding merge function on lazy lists. We conclude with discussions of related work (Section 7) and particular formalization aspects (Section 8). Our formalization is publicly available [31].

2 Operators as a Codatatype

Isabelle/HOL’s codatatypes [6] formalize potentially infinite objects such as infinite streams (𝑠𝑡𝑟𝑒𝑎𝑚 type) and lazy lists (𝑙𝑙𝑖𝑠𝑡 type). We introduce operators as:
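The declaration is along the following lines (a sketch reconstructed from the description below; the exact argument order of the constructors in the formalization may differ):

    codatatype ('i, 'o, 'd) op =
        Read 'i "'d ⇒ ('i, 'o, 'd) op"    (* read on an input port, continue based on the event *)
      | Write 'o 'd "('i, 'o, 'd) op"     (* write an event to an output port, then continue *)
      | Silent "('i, 'o, 'd) op"          (* internal (silent) action *)
      | Choice "('i, 'o, 'd) op cset"     (* nondeterministic choice over a countable set *)

In this sketch, 'd occurs both co- and contravariantly and is thus a dead type variable, which matches the fact that the map function introduced below only maps the port types.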

This 𝐜𝐨𝐝𝐚𝐭𝐚𝐭𝐲𝐩𝐞 command introduces a new type constructor, 𝑜𝑝 (written postfix), with type parameters 'i indexing the input ports, 'o indexing the output ports, and 'd representing the type of data items (or events) an operator processes. An operator will perform one of four actions corresponding to the four codatatype constructors: (1) read an event from a given input port and continue based on the read event, (2) write an event to a given output port and continue as specified, (3) execute an internal (silent) action and continue as specified, and (4) nondeterministically choose a continuation from a countable set (type 𝑐𝑠𝑒𝑡) of options. Operators are possibly infinite trees with nodes labeled by one of these four actions with 𝖱𝖾𝖺𝖽 branching over the read event 'd and 𝖢𝗁𝗈𝗂𝖼𝖾 branching over the countable set of choices. For an operator to be finite, the operator must eventually reach the only leaf 𝖢𝗁𝗈𝗂𝖼𝖾{}c along all branches. Note that we use the subscript c to distinguish sets and operations on them from their countable set counterparts: 𝑠𝑒𝑡 and 𝑐𝑠𝑒𝑡 are separate types in Isabelle.

The 𝐜𝐨𝐝𝐚𝐭𝐚𝐭𝐲𝐩𝐞 command also introduces the functions inputs :: ('i, 'o, 'd) op ⇒ 'i set and outputs :: ('i, 'o, 'd) op ⇒ 'o set that collect the sets of used ports, as well as a map function on the ports map_op :: ('i1 ⇒ 'i2) ⇒ ('o1 ⇒ 'o2) ⇒ ('i1, 'o1, 'd) op ⇒ ('i2, 'o2, 'd) op. Note that an operator does not necessarily have to use all available ports. For example, an operator of type (nat, nat, nat) op does not necessarily have an infinite number of input and output ports; rather, it uses some subset of naturally-numbered ports, given by inputs and outputs.

Isabelle requires all functions to be total. For recursive functions on datatypes, totality means termination. In contrast, corecursive functions, which produce elements of a codatatype, are guaranteed to be total if they are productive – that is, they must always eventually produce a codatatype constructor. The 𝐜𝐨𝐫𝐞𝐜 command facilitates the definition of corecursive functions, in which a codatatype constructor guards all corecursive calls and all call contexts are comprised of friendly functions [5], thereby maintaining productivity and totality.

For example, we define two uncommunicative operators using 𝐜𝐨𝐫𝐞𝐜 in Figure 1. The first operator corecurses forever with the Silent constructor. The second operator, spin_op, corecurses forever with the (singleton) Choice constructor, as shown in the derived equation spin_op_code. The actual 𝐜𝐨𝐫𝐞𝐜 definition is slightly more convoluted. The codatatype nests the corecursive occurrence of op in the countable set type cset, which means that any corecursive call guarded by the Choice constructor must be applied using the map function on cset, i.e., the image function `c (written infix). To this end, we pull the corecursive call out of the singleton set. We also use similar transformations in some subsequent operators. To improve readability, we only present the user-friendly derived equations, such as spin_op_code, rather than the original 𝐜𝐨𝐫𝐞𝐜 command.

Figure 1: Two uncommunicative operators.

2.1 Operator Equivalences

The initial example operators appear identical to an external observer examining their (lack of) communication. However, these operators are syntactically different, which prevents us from proving their equality using op’s coinduction principle. Syntactic equality proves too rigid to support more nuanced semantic reasoning.

This limitation leads us to Milner’s classic approach of using bisimilarity [25], which is an equivalence relation based on labeled transition systems (LTS). In an LTS, transitions are associated with labels that represent the performed actions. For our operators, these actions can be a read, a write, or a silent action (τ). Two operators are bisimilar if their corresponding transition systems can mutually simulate each other’s transitions. In this work, we formalize both strong bisimilarity, which requires matching transitions one-by-one in simulations, and weak bisimilarity, which allows one to abstract away finite sequences of silent actions in simulations.

In Figure 2, we define the datatype 𝐼𝑂 for labels and our LTS as the inductive predicate 𝗌𝗍𝖾𝗉, where the first three introduction rules associate the 𝖱𝖾𝖺𝖽, 𝖶𝗋𝗂𝗍𝖾, and 𝖲𝗂𝗅𝖾𝗇𝗍 operator constructors with the appropriate labels. The 𝖢𝗁𝗈𝗂𝖼𝖾 constructor does not have a label on its own; it depends on the existence of a label from a 𝗌𝗍𝖾𝗉 made by an operator in its set of choices.
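As a concrete sketch of this setup (the label constructor names Inp and Out, the argument order of step, and the use of rcset for membership in a countable set are our assumptions):

    datatype ('i, 'o, 'd) IO = Inp 'i 'd | Out 'o 'd | Tau

    (* rcset is assumed to be the representation function of type 'a cset ⇒ 'a set *)
    inductive step :: "('i, 'o, 'd) IO ⇒ ('i, 'o, 'd) op ⇒ ('i, 'o, 'd) op ⇒ bool" where
      "step (Inp p x) (Read p f) (f x)"
    | "step (Out p x) (Write p x op') op'"
    | "step Tau (Silent op') op'"
    | "op' ∈ rcset ops ⟹ step io op' op'' ⟹ step io (Choice ops) op''"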

Strong simulation follows from the definition sim (Figure 2), which takes a relation R and two operators op1 and op2 that must meet the following condition: whenever op1 steps to some op1' with a label io, there must exist a corresponding step from op2 to some op2' with the same label io, such that R relates op1' and op2'. Relation R is a strong simulation if ∀op1 op2. R op1 op2 ⟶ sim R op1 op2. Moreover, R is a strong bisimulation when both R and its converse R⁻¹ are strong simulations. Strong bisimilarity is the largest strong bisimulation.
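Inlining the sim condition, strong bisimilarity can be sketched as the following coinductive predicate (the strengthened coinduction rule with up-to closures mentioned below is omitted here):

    coinductive bisim :: "('i, 'o, 'd) op ⇒ ('i, 'o, 'd) op ⇒ bool" where
      "(∀io op1'. step io op1 op1' ⟶ (∃op2'. step io op2 op2' ∧ bisim op1' op2')) ⟹
       (∀io op2'. step io op2 op2' ⟶ (∃op1'. step io op1 op1' ∧ bisim op1' op2')) ⟹
       bisim op1 op2"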

Figure 2: Labeled transition system, strong bisimilarity, and corresponding coinduction principle.

In Isabelle/HOL, a coinductive predicate is defined as the greatest fixed point of a corresponding predicate transformer on the predicate lattice, following the Knaster–Tarski theorem [27]. Coinductive predicates are convenient for defining strong bisimilarity bisim (Figure 2) because they provide a useful coinduction principle [1]. We reformulate and mildly strengthen this coinduction principle for bisim, to improve readability and usability, by incorporating basic up-to techniques for strong bisimilarity (yielding bisimilarity up to reflexivity, up to symmetry, and up to bisimilarity) via the auxiliary inductive predicate 𝒮. A coinduction proof that a coinductive predicate holds for given arguments thus involves exhibiting a witness relation R that relates these arguments. Isabelle’s coinduction proof method [6] generates a canonical instantiation for R when applied to a specific goal.

Figure 3: Examples of operators, and their 𝗌𝗍𝖾𝗉 elimination and introduction rules.

This principle is first used to demonstrate that bisim indeed constitutes an equivalence relation, i.e., it is reflexive, symmetric, and transitive. The next proof by coinduction shows that two of the initial example operators are strongly bisimilar, which is a consequence of the fact that neither of them can make a step. Three more intricate examples are given in Figure 3. These operators output the natural number 42 at port 1 until they may eventually stop doing so. They illustrate our general technique for proving strong (and weak) bisimilarity: we derive step elimination and introduction rules for specific operators, which allow us to prove the cases resulting from the application of the coinduction proof method, e.g., to prove that ex1_op and ex2_op are strongly bisimilar. Note that neither ex1_op nor ex2_op is strongly bisimilar to ex3_op, because ex3_op can perform Tau actions while the other two cannot. This leads us towards the notion of weak bisimilarity, which can abstract silent steps away.

Figure 4: Weak bisimilarity and corresponding coinduction principle.

To define weak bisimilarity, we first define what constitutes a weak step, denoted by wstep in Figure 4. We use Isabelle’s notation: the infix OO is relation composition, the superscript ** denotes the reflexive-transitive closure, and the superscript == the reflexive closure. A weak step consists of a finite, possibly empty sequence of Tau steps, followed by a single extended step (estep) and another sequence of Tau steps. The extended step introduces reflexivity for Tau actions, which enables operators that lack Tau steps to simulate operators that include Tau steps.
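A sketch of these definitions, assuming the step relation and Tau label from above (^== is the reflexive closure of a binary predicate, ^** its reflexive-transitive closure, and OO relation composition):

    definition estep :: "('i, 'o, 'd) IO ⇒ ('i, 'o, 'd) op ⇒ ('i, 'o, 'd) op ⇒ bool" where
      "estep io = (if io = Tau then (step Tau)^== else step io)"

    definition wstep :: "('i, 'o, 'd) IO ⇒ ('i, 'o, 'd) op ⇒ ('i, 'o, 'd) op ⇒ bool" where
      "wstep io = (step Tau)^** OO estep io OO (step Tau)^**"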

The definition of wsim is similar to sim, but it uses a weak step in the implication’s conclusion. Specifically, for op1 to be simulated by op2, every step taken by op1 with label io must be matched by a corresponding weak step of op2 with the same label io. The weak bisimilarity coinductive predicate wbisim is defined analogously to its strong counterpart.

We show that weak bisimilarity is an equivalence relation by coinduction using wbisim_coinduct, following Milner [25, p. 109]. We also show that strong bisimilarity implies weak bisimilarity. Lastly, we prove that ex2_op and ex3_op are weakly bisimilar, as an example of an equivalence of operators in which a Tau step of ex3_op is simulated by a “reflexive step” of ex2_op.

2.2 Trace Equivalence

The concept of an external observer analyzing communication channels is formalized through the notion of trace. More specifically, we adopt Jonsson’s view [19]: a trace is a potentially non-terminating, linearly ordered sequence of communication events produced by an operator. Hence, trace equivalence relates two operators if and only if they have the same set of traces.

In Figure 5, we define visible actions using the datatype VIO, along with a function that transforms VIO into IO. We then introduce the concept of an operator being finished, meaning that the operator can no longer produce any visible actions. This is formalized using the coinductive predicate wfinished (the prefix w means we are in a weak setting that does not observe Tau steps). Next, we define the traces of an operator using the coinductive predicate wtraced, which consists of two cases: the base case, which associates the empty lazy list (LNil) with the operator being finished, and the coinductive case, which prepends (using LCons) the visible action of a weak step to a trace of the operator reached after this weak step. Lastly, trace equivalence is the equality of the sets of traces. As expected, we show that weak bisimilarity implies trace equivalence.
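A sketch of the visible-action datatype and its embedding into IO, reusing the hypothetical Inp/Out constructor names from the sketch in Section 2.1 (the constructor and function names here are ours):

    datatype ('i, 'o, 'd) VIO = VInp 'i 'd | VOut 'o 'd

    fun vio_to_io :: "('i, 'o, 'd) VIO ⇒ ('i, 'o, 'd) IO" where
      "vio_to_io (VInp p x) = Inp p x"
    | "vio_to_io (VOut p x) = Out p x"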

Figure 5: Traces and trace equivalence.

2.3 Numeral Types for Ports

A typical choice for input/output port types is to use Isabelle/HOL’s existing numeral types 0, 1, 2, 3, …, which model sets of the respective finite cardinality. For example, an operator with two inputs and three outputs can be represented with the type (2, 3, 'd) op.

Because all Isabelle/HOL types must be nonempty, the numeral type 0 is peculiar: it is isomorphic to the type of natural numbers. Its “cardinality” is still 0, because in Isabelle’s standard library the cardinality of infinite sets is defined to be 0. This raises a challenge when working with operators that should have no inputs or outputs, such as an operator of type (0, 'o, 'd) op or ('i, 0, 'd) op. Although these operators should not be capable of reading or writing data, they may use the infinitely many ports available in 0.

To address this limitation, we introduce a type class called defaults, which fixes a set of elements named defaults. (A similar type class singling out a particular default value already existed in Isabelle, but working with a set results in better closure properties for sum types.) We instantiate the defaults type class for 0 by setting its defaults to UNIV (the universal set containing all elements of the type). For the other numeral types 1, 2, …, the defaults instance defines defaults = {}. We also make the sum type + an instance of defaults by taking the union of the defaults from both sides. Hence, the type 0 + 0, just as 0, has no non-default elements. The set of all non-default elements is 𝔘 = UNIV − defaults. We introduce another type class all_defaults, where 𝔘 = {}, as a subclass of defaults. Naturally, we also provide instances of all_defaults for 0 and +. Finally, when defining operators, we assume that both input and output ports belong to the defaults and countable type classes and follow the convention to only use ports in 𝔘c (the countable set counterpart of 𝔘).
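A sketch of the two type classes (the instantiations for the numeral and sum types, as well as the countability requirements, are omitted here):

    class defaults =
      fixes defaults :: "'a set"

    class all_defaults = defaults +
      assumes all_default: "UNIV - defaults = {}"

    (* the set of usable (non-default) ports, written 𝔘 in the text *)
    abbreviation usable :: "'a::defaults set" where
      "usable ≡ UNIV - defaults"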

3 Asynchronous Dataflow Operators

This section explains the set of operators of the network algebra for asynchronous dataflow. Following Bergstra et al. [3], they are divided into two categories. First, basic network algebra consists of three operators for structuring the network – sequential composition, parallel composition, and feedback – along with the identity and transposition constants. The second category includes additional asynchronous dataflow constants such as split, sink, merge, dummy source, asynchronous copy, and asynchronous equality test. Except for the sink, all operators are stateful, maintaining internal buffers to store communication data. These buffers function as intermediate first-in-first-out communication channels where data can reside indefinitely.

3.1 Buffer Infrastructure

Most of our definitions will include a buffer function buf :: 'a ⇒ 'd list (where 'a is the type of the relevant ports). This buffer function represents the operator’s current state, e.g., which data has already been read (and on which ports) but not yet written. In the rest of the paper, we will make use of the following convenience functions on buffers: BHD gives the first element, on the given port, in the buffer; BTL dequeues this first element; BENQ enqueues an element to the buffer; BULK_BENQ (also available as an infix operator) concatenates the data contained in two buffers. Buffers act as queues: we add elements at the end of the list and remove them from its start.
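A sketch of these buffer operations (the argument order, the portwise-concatenation behavior of BULK_BENQ, and the use of a total hd rather than an option or sentinel value for empty buffers are our assumptions):

    type_synonym ('a, 'd) buffer = "'a ⇒ 'd list"

    (* enqueue x at the end of the queue of port p *)
    definition BENQ :: "'a ⇒ 'd ⇒ ('a, 'd) buffer ⇒ ('a, 'd) buffer" where
      "BENQ p x buf = buf(p := buf p @ [x])"

    (* first element stored for port p *)
    definition BHD :: "'a ⇒ ('a, 'd) buffer ⇒ 'd" where
      "BHD p buf = hd (buf p)"

    (* dequeue the first element stored for port p *)
    definition BTL :: "'a ⇒ ('a, 'd) buffer ⇒ ('a, 'd) buffer" where
      "BTL p buf = buf(p := tl (buf p))"

    (* concatenate the contents of two buffers, port by port *)
    definition BULK_BENQ :: "('a, 'd) buffer ⇒ ('a, 'd) buffer ⇒ ('a, 'd) buffer" where
      "BULK_BENQ buf1 buf2 = (λp. buf1 p @ buf2 p)"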

3.2 Network Algebra Operators

We present building blocks for defining arbitrary asynchronous dataflow operators. We recall the convention established in the previous section: only ports in 𝔘 can be used. The first two primitives are the identity and transposition operators, defined respectively as 𝗂𝖽_𝗈𝗉 and 𝗍𝗋𝖺𝗇𝗌𝗉_𝗈𝗉. Their code equations are given in Figure 6.

Figure 6: The identity and transposition operators.

The operator id_op is a stream delayer: it can read data from any of its (non-default) input ports and enqueue it in the corresponding internal buffer; the operator can also dequeue data from any non-empty buffer and write it to the corresponding output port. The operator transp_op has type ('m + 'n, 'n + 'm, 'd) op, i.e., it operates on a sum type for the input ports and transposes the ports in the output. For instance, the data read on some input port Inl p, where p :: 'm, is enqueued in the internal buffer of the transposition operator and may eventually be written on output port Inr p. We introduce notation for the initial states of these operators (in which all internal buffers are empty); in particular, 𝒳 denotes the initial transposition operator.

To combine operators, we define a general composition operator in Figure 7. To compose two operators, we must specify the wiring between them, as well as the buffer associated with this wiring. Given operators op1 :: ('i1, 'o1, 'd) op and op2 :: ('i2, 'o2, 'd) op, the wire is a partial function (of type 'o1 ⇒ 'i2 option) defining how op1’s output ports are connected to op2’s input ports.

The definition of comp_op uses the choices function and the sound_reads abbreviation. The function choices descends through Choice constructors until it reaches a non-Choice constructor. It thus computes all steps an operator can make (e.g., an operator with no choices can make no step, and Silent op' is among the choices of op exactly when op can make a Tau step to op'). It returns a countable set of operators: Read, Write, and Silent are returned as singletons, and for Choice ops we obtain the union of choices applied to each operator in ops, formally choices (Choice ops) = ⋃c (choices `c ops). The function sound_reads wire buf ops filters the Read operators in ops to only keep those whose port either is not connected by the wire (i.e., they communicate with the external world) or whose connected buffer entry is not empty. This makes reading from an empty buffer an impossible action. The standard ran function returns the range of a partial function.

The 𝖼𝗈𝗆𝗉_𝗈𝗉 operator proceeds by stepping one of the operators independently. In particular, either of the operators may be neglected and never take steps. This design is intentional, as it reflects the behavior of independent nodes in a decentralized network, where no global orchestrator ensures a fair or synchronized execution schedule. Moreover, the left operator is restricted to reading inputs from the external environment, while its outputs can be directed either to the wire’s buffers or to the external world, depending on the wiring. Conversely, the second operator can read either from the buffer or from the external environment, again based on the wiring, but its outputs are limited to the external world. Interactions with the wire’s buffers are recorded by 𝖼𝗈𝗆𝗉_𝗈𝗉 as a silent action.

Figure 7: The general, parallel, and sequential composition operators.

We consider two special cases for the wiring function. When wire = (λ_. None), the two operators do not interact with each other and we thus obtain parallel composition: the sum types of the input and output ports reflect this. When wire = Some, the output ports of op1 are connected with the input ports of op2, so that all the data that op2 may read must have been written by op1. Thus, we obtain sequential composition. Here, map_op projl projr is applied to remove the sum types, since in this case we know that the sequential composition’s input ports are the input ports of the first operator, and its output ports are the output ports of the second operator. Moreover, we prove that inputs (comp_op wire buf op1 op2) ⊆ Inl ` inputs op1 ∪ Inr ` (inputs op2 − ran wire) and outputs (comp_op wire buf op1 op2) ⊆ Inl ` (outputs op1 − dom wire) ∪ Inr ` outputs op2.
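As a sketch of how the two special cases arise (assuming the argument order comp_op wire buf op1 op2, empty initial buffers, and the projl/projr functions on sums; the paper writes both derived operators with infix symbols):

    (* parallel composition: no wiring between the two operators *)
    abbreviation par_op where
      "par_op op1 op2 ≡ comp_op (λ_. None) (λ_. []) op1 op2"

    (* sequential composition: every output port of op1 is wired to the equally
       named input port of op2; the sum types on the ports are removed afterwards *)
    abbreviation seq_op where
      "seq_op op1 op2 ≡ map_op projl projr (comp_op Some (λ_. []) op1 op2)"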

Figure 8: The loop and feedback operators.

Similarly to comp_op, we define a general loop operator loop_op in Figure 8, which uses the choices function and is also parameterized by a wiring. The wiring determines when data is going to or coming from the outside environment or the loop’s own buffer (which is both written to and read from). We give a standard wiring configuration using the sum type, where the left-side ports communicate with the external environment and the right-side ports handle the looping-back mechanism. We call loop_op with this wiring instance the feedback operator. Notably, ports in the defaults set are excluded from looping back, enabling the definition of feedback operators without any looping-back ports (e.g., when the sum’s right side is the 0 type). We show that loop_op satisfies inputs/outputs lemmas similar to the ones for comp_op.
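Our reconstruction of this standard wiring, as a sketch (the actual definition may differ in its exact types and details): left ports communicate with the environment, and a non-default right port loops back to the right input port with the same name.

    definition fb_wire :: "('o + 'l::defaults) ⇒ ('i + 'l) option" where
      "fb_wire p = (case p of
           Inl _ ⇒ None
         | Inr q ⇒ if q ∈ defaults then None else Some (Inr q))"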

The composition and loop operators combine other operators, so it is useful to have congruence rules for them. We prove congruence rules for comp_op and loop_op with respect to weak bisimilarity: composing weakly bisimilar operators (with the same wiring and buffer) yields weakly bisimilar operators. Similar rules also hold for strong bisimilarity, as well as for the derived operators.

Now, we describe the remaining operators shown in Figure 9. The dummy source operator is defined as the sequential composition of an uncommunicative operator with the identity operator. Since the former does not produce any output, the identity operator, in turn, also generates no output. In contrast, the sink operator (!) is capable of reading (and ignoring) data from all its usable input ports.

To informally describe the final operators, we consider a fixed usable port p of type 'm::{𝑐𝑜𝑢𝑛𝑡𝑎𝑏𝑙𝑒,𝑑𝑒𝑓𝑎𝑢𝑙𝑡𝑠}. The split operator (Λ) reads data from p and nondeterministically buffers it either on the left (𝖨𝗇𝗅p) or on the right (𝖨𝗇𝗋p). Eventually, it may send that input data out through the corresponding port (𝖨𝗇𝗅p or 𝖨𝗇𝗋p). The merge operator (𝒱) acts as the dual of Λ by nondeterministically reading data from either the left (𝖨𝗇𝗅p) or the right (𝖨𝗇𝗋p) port and buffering it. 𝒱 may eventually send this input data out through port p. The asynchronous copy operator (𝒞) behaves similarly to (Λ), but it buffers the data on both sides (𝖨𝗇𝗅p) and (𝖨𝗇𝗋p). Finally, the equality test operator (𝒬) resembles (𝒱). However, it operates with optional data and only outputs the data when the heads of both buffers are equal; otherwise, it outputs 𝖭𝗈𝗇𝖾.

Figure 9: Additional operators.

4 Asynchronous Dataflow Properties

We show that the axioms from Tables 1, 2, and 3 of Bergstra et al. [3], with the exception of axiom A1 in their Table 3, are valid in our shallow embedding. We discuss insights from proving the axioms and explain differences in formulation. To make referencing transparent, we use the axiom numbering of Bergstra et al., but we merge their Tables 2 and 3 (which use the same axiom numbers) into a single table.

First, we establish notation for sequentially composing an operator with the identity operator on the left (pre-composition) and on the right (post-composition). Such pre-/post-composition is added in our formalization to certain lemmas to ensure that the operator exhibits the required asynchronous behavior. This allows for a more precise account of where these additions are needed. For instance, the process algebra model of Bergstra et al. [3] surrounds most operators by identities on both sides. Changes to axiom statements of this nature are highlighted in gray. In Section 5, we demonstrate how these differences can be eliminated through the introduction of a new type.

Table 1: Basic network algebra properties.
B1:𝑜𝑝1(𝑜𝑝2𝑜𝑝3)𝗆𝖺𝗉_𝗈𝗉(𝑜𝑝1𝑜𝑝2)𝑜𝑝3
B2_1:𝑜𝑝(::(0,0,'d)𝑜𝑝)𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗅𝖨𝗇𝗅𝑜𝑝
B2_2:(::(0,0,'d)𝑜𝑝)𝑜𝑝𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗋𝖨𝗇𝗋𝑜𝑝
B3:(𝑜𝑝1𝑜𝑝2)𝑜𝑝3𝑜𝑝1(𝑜𝑝2𝑜𝑝3)
B4_1:𝑜𝑝𝑜𝑝 B4_2:𝑜𝑝𝑜𝑝
B5:(𝑜𝑝1𝑜𝑝2)(𝑜𝑝3𝑜𝑝4)(𝑜𝑝1𝑜𝑝3)(𝑜𝑝2𝑜𝑝4)
B6: B7:𝒳𝒳
B8:(𝒳::('i+0,0+'i,'d)𝑜𝑝)𝗆𝖺𝗉_𝗈𝗉𝗂𝖽(𝖼𝖺𝗌𝖾_𝗌𝗎𝗆𝖨𝗇𝗋𝖨𝗇𝗅)
B9:𝒳𝗆𝖺𝗉_𝗈𝗉(𝒳)𝗆𝖺𝗉_𝗈𝗉𝗂𝖽(𝒳)
B10:(𝑜𝑝1𝑜𝑝2)𝒳𝒳(𝑜𝑝2𝑜𝑝1)
F1:(::(0,0,'d)𝑜𝑝) F2:𝒳
R1:
Inr -` inputs op1 ∩ defaults = {}    Inr -` outputs op1 ∩ defaults = {}
𝑜𝑝2(𝑜𝑝1)((𝑜𝑝2)𝑜𝑝1)
R2:
Inr -` inputs op1 ∩ defaults = {}    Inr -` outputs op1 ∩ defaults = {}
(𝑜𝑝1)𝑜𝑝2(𝑜𝑝1(𝑜𝑝2))
R3:
Inr -` inputs op2 ∩ defaults = {}    Inr -` outputs op2 ∩ defaults = {}
𝑜𝑝1(𝑜𝑝2)(𝗆𝖺𝗉_𝗈𝗉(𝑜𝑝1𝑜𝑝2))
R4:
Inr -` inputs op1 ∩ defaults = {}    Inr -` outputs op1 ∩ defaults = {}
inputs op2 ∩ defaults = {}    outputs op2 ∩ defaults = {}
(𝑜𝑝1(𝑜𝑝2))((𝑜𝑝2)𝑜𝑝1)
R5:
Inr -` inputs op = {}    Inr -` outputs op = {}
𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗅𝖨𝗇𝗅((𝑜𝑝::('i+0,'o+0,'d)𝑜𝑝))𝑜𝑝
R6:
Inr -` inputs op = {}    Inr -` outputs op = {}
Inr -` Inl -` inputs op = {}    Inr -` Inl -` outputs op = {}
(𝑜𝑝)(𝗆𝖺𝗉_𝗈𝗉𝑜𝑝)

The first set of axioms is shown in Table 1. First, we identify some notable axioms. Axiom B1 is the associativity of parallel composition and axiom B3 is the associativity of sequential composition. Notice that for B1 we need to map the ports with a reassociation function of type ('a + 'b) + 'c ⇒ 'a + 'b + 'c so that the sum types are properly reassociated; we also define its inverse. Bergstra et al. mostly ignore such port mismatches; in a proof assistant more rigor is needed. Both parts of B4 show that an operator sequentially composed with identity can absorb another identity. This follows directly from an auxiliary lemma; we call such lemmas that remove identities identity absorption lemmas.

The feedback operator axioms require additional assumptions to guarantee that arbitrary operators do not use default ports when looping back (on the right). These assumptions use the inverse image function f -` B = {x. f x ∈ B} to capture the right inputs and outputs. Recall that the wiring of the feedback operator forces defaults ports on the right not to loop back. Without these assumptions, the invariant that the ports on the right are the looping-back ones would be violated. Again, these modifications are highlighted and will be abstracted away in Section 5.

Table 2: Properties of equality test, merge, copy, split, source and sink operators.
A1:(𝒬)𝒬𝗆𝖺𝗉_𝗈𝗉𝗂𝖽((𝒬)𝒬)
A2: 𝒳⋄⋄ for ⋄ ∈ {𝒬,𝒱}   A6: ⋄𝒳⋄ for ⋄ ∈ {𝒞,Λ}
A3𝒬:((!::(0,'a,'d)𝑜𝑝))𝒬(!::(0+'a,0,'d)𝑜𝑝)!
A3𝒱:((!::(0,'a,'d)𝑜𝑝))𝒱𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗋𝗂𝖽
A4: ⋄!!! for ⋄ ∈ {𝒬,𝒱}   A8: !⋄!! for ⋄ ∈ {𝒞,Λ}
A5:𝒞(𝒞)𝗆𝖺𝗉_𝗈𝗉𝗂𝖽(𝒞(𝒞))
A7:𝒞(!)𝗆𝖺𝗉_𝗈𝗉𝗂𝖽𝖨𝗇𝗋 A9:!!
A10:𝒬𝒞(𝒞𝒞)(𝗆𝖺𝗉_𝗈𝗉(𝗆𝖺𝗉_𝗈𝗉(𝒳)))(𝒬𝒬)
A11:𝒞𝒬 A12:!(::(0,0,'d)𝑜𝑝)
A16:!(::(0,0,'d)𝑜𝑝) A13:!!! A17:!!!
A14: 𝗆𝖺𝗉_𝗈𝗉𝗂𝖽𝖨𝗇𝗅(⋄::(0+0,0,'d)𝑜𝑝) for ⋄ ∈ {𝒬,𝒱}
A15: ⋄𝗆𝖺𝗉_𝗈𝗉(𝗆𝖺𝗉_𝗈𝗉(𝒳))(⋄⋄) for ⋄ ∈ {𝒬,𝒱}
A18: 𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗅𝗂𝖽(⋄::(0,0+0,'d)𝑜𝑝) for ⋄ ∈ {𝒞,Λ}
A19: ⋄(⋄⋄)𝗆𝖺𝗉_𝗈𝗉(𝗆𝖺𝗉_𝗈𝗉(𝒳)) for ⋄ ∈ {𝒞,Λ}
F3: (𝗆𝖺𝗉_𝗈𝗉𝗂𝖽𝖨𝗇𝗋⋄)! for ⋄ ∈ {𝒬,𝒱}
F4: (𝗆𝖺𝗉_𝗈𝗉𝖨𝗇𝗋𝗂𝖽⋄)! for ⋄ ∈ {𝒞,Λ}
F5:((𝒞)𝗆𝖺𝗉_𝗈𝗉(𝒳)(𝒬))!!

Table 2 is about the asynchronous equality test, merge, copy, split, dummy source, and sink operators. Axioms A14, A15, A18, and A19 can be seen as a recursive characterization of equality test, merge, copy, and split. The merge and split operators satisfy fewer axioms than equality test and copy. For example, A11 states that first duplicating inputs and then performing an equality test on the copies is bisimilar to the identity operator. For 𝒱 and Λ, the corresponding variant of A11 is not valid because nondeterministic split choices do not necessarily align with the nondeterministic merge order. A related reason prevented us from proving A1 for 𝒱. In fact, we proved that the two sides of the 𝒱 version of A1 are not weakly bisimilar, as they make partial nondeterministic choices in different orders. We explain this in more detail at the end of the section. We do believe that these operators are trace equivalent, but we do not have a formal proof of this statement.

To prove the axioms, we follow the approach outlined in Section 2.1. This involves deriving introduction and elimination rules, such as the ones in Figure 3, for each operator and using these rules to construct the simulations required by the coinduction. The coinduction proofs require generalizing the operators’ buffers, as using the empty buffers from the introduced notations would eventually take us outside of the bisimulation relation during the coinduction. Consequently, we must consider arbitrary buffers and establish an invariant between them. In most cases, this is straightforward. For example, in the generalization of axiom B7 in Figure 10, the buffers of both sides are related using buffer concatenation. In this figure, the left-side ports are of the numeral type 3, and the right-side ports are of type 1. Also, we reuse the symbols of the introduced operator notations to unambiguously identify the operators in the figure.

Figure 10: B7: 𝒳𝒳, generalized to arbitrary buffers. The generalization assumes the buffer invariants that l is the concatenation of l1, l2, and l3, and that r is the concatenation of r1, r2, and r3.

A common pattern in the proofs involves forward reasoning to move data through the buffers. For instance, consider one of the cases in the B7 proof, where the operator on the right-hand side of the weak bisimilarity produces an output from the l buffer. Then the left-hand side must also produce the same output. However, it may be the case that the buffers l2 and l3 are empty, and the head of l1 must traverse the diagram to generate the output. This results in a sequence of Tau steps that move l1’s head to l2 and then to l3. The Isar proof language [32] offers fact chaining using 𝐚𝐥𝐬𝐨, which is well-suited for this kind of transitive reasoning.

The axiom A10 was challenging to prove as it required a non-trivial buffer invariant. We consider the diagram in Figure 11, and we assume that a is the concatenation of the buffers a1, a2, a3, a4, and a5, and similarly for b, c, and d. Furthermore, we assume that ac is the concatenation of ac1 and ac2, and bd of bd1 and bd2. To find the invariant, we analyze the left-to-right simulation first. Assume that the left-hand side performs an equality test step with 𝒬0. This step can only be weakly simulated on the right-hand side if either or both of 𝒬1 and 𝒬2 can make some sequence of tests to catch up with 𝒬0. However, there is no way to have a simulation if either 𝒬1 or 𝒬2 has tested more than 𝒬0. The sequences in the channels a p and c p must have the same length, as they are consumed together by 𝒬1, and symmetrically for b p and d p. Thus, for every port p, there exist natural numbers n and m such that drop n (a p) = x p, drop n (c p) = y p, drop m (b p) = x p, and drop m (d p) = y p. As 𝒬1 and 𝒬2 advance independently, we maintain n = 0 ∨ m = 0 as an invariant. The already tested elements in zv and in zw can be ahead of the tested elements on the right-hand side of the equivalence (ac and bd). This means that (zv) p is precisely the same as ac p followed by the sequence of n elements still to be pairwise tested in a p and c p.

The established invariant must also hold for the right-to-left simulation. We assume an equality test on the right-hand side in 𝒬2, testing whether BHD p b5 equals BHD p d5. We must avoid testing too little on the left-hand side of the equivalence to keep the invariant. For example, if b = x and d = y, this new equality test with b5 and d5 will reduce the sizes of b and d by one; therefore, one equality test on the left-hand side must happen as well.

Figure 11: A10: 𝒬𝒞(𝒞𝒞)(𝗆𝖺𝗉_𝗈𝗉(𝗆𝖺𝗉_𝗈𝗉(𝒳)))(𝒬𝒬), generalized to arbitrary buffers.

The axiom A1 for the merge operator is (𝒱)𝒱𝗆𝖺𝗉_𝗈𝗉𝗂𝖽((𝒱)𝒱). Intuitively, equating the two sides means that it does not matter in which order the data from the three input ports are merged: all interleavings should be allowed by either operator. However, using weak bisimilarity, this axiom turns out to be invalid. Note that we include here the post-composition with identities after the last merge operator, so that we can later lift the result to the type defined in Section 5. Assuming that the operators are weakly bisimilar, for every weak step that one operator makes, there must exist a weak step that the other operator makes such that they remain equivalent. We now sketch how we derive a contradiction from this. Consider that the left-hand side, with merge instances 𝒱0 and 𝒱1 (we annotate operators with indices to distinguish different instances of the same operator), reads a value x on port 1, the top input port in Figure 12, and x is moved beyond 𝒱0 but not yet past 𝒱1 (it resides in the buffer of the sequential composition). Then we consider all possible weak steps that the right-hand side, with merge instances 𝒱2 and 𝒱3, can make by reading value x on port 1. On a high level, there are two separate cases. First, assume x is read and then possibly moved along the buffers, but only up to the point where 𝒱3 actually performs the merge. Then the right-hand side can read any value y ≠ x on port 2 (the middle input port in Figure 12) and output it before x. However, this cannot be replicated by the left-hand side because x has already moved beyond 𝒱0, so y would be enqueued after x and cannot be output before x. The second case is that x is read and then moved along the buffers beyond 𝒱3. Then the left-hand side can read any value z ≠ x on port 3 (the bottom input port in Figure 12) and output it before x. However, this cannot be replicated by the right-hand side because x has already moved beyond 𝒱3, so z cannot be output before x.

The main challenge we experienced in disproving weak bisimilarity is dealing with weak steps in the assumptions. Weak steps are significantly more difficult to reason about than single steps, because we do not have simple elimination rules for them, as any number of silent steps may occur before and after the visible action. We can use induction on these sequences of silent steps, but it is generally difficult to use the induction hypotheses because they typically require making buffers arbitrary, which was not practical for disproving A1. Our workaround was to explicitly obtain all possible states of the operators after a weak step, which was manageable here because we know exactly what the operators are and we start from empty buffers.

Figure 12: Counterexample for the 𝒱 version of A1: (𝒱)𝒱𝗆𝖺𝗉_𝗈𝗉𝗂𝖽((𝒱)𝒱).

5 Well-Behaved Operators

We outline the process for eliminating the differences between our lemmas and those of Bergstra et al. [3], i.e., the gray highlights in Table 1 and Table 2. To achieve this, we define a subtype of well-behaved operators using Isabelle’s 𝐭𝐲𝐩𝐞𝐝𝐞𝐟 command. The new type requires its elements to be weakly bisimilar to some operator pre- and post-composed with the identity operator.

Next, we lift every definition to this new type. This includes all operators, weak bisimilarity, and map_op. The lifting is done by the 𝐥𝐢𝐟𝐭_𝐝𝐞𝐟𝐢𝐧𝐢𝐭𝐢𝐨𝐧 command [18], which requires a proof that the lifted term respects the subtype’s property. When an operator has identity absorption lemmas, it can be lifted without changes; for example, the identity operator can be lifted directly as it absorbs surrounding identities. Similarly, parallel composition, sequential composition, and feedback are lifted directly too: for each of them, surrounding the combined operator with identities is weakly bisimilar to combining the identity-surrounded arguments.

Other operators such as 𝒞 are lifted with no changes as well because they also have identity absorption lemmas for both sides. We lift the definitions of the dummy source and the sink using the all_defaults type class constraint to ensure that they have no usable input ports and no usable output ports, respectively. This guarantees their identity absorption lemmas. In contrast, the operators 𝒬 and 𝒱 satisfy only weaker one-sided absorption lemmas. Hence, to make the subtype constraint hold, we lift 𝒬 and 𝒱 post-composed with the identity operator instead of 𝒬 and 𝒱 themselves.

The lifting of map_op requires that surrounding map_op f g op with identities is weakly bisimilar to applying map_op f g to the identity-surrounded op. This is not true in general, but we can impose sufficient constraints on f and g. They must maintain the exclusive use of ports in the set 𝔘c (i.e., ∀x. f x ∈ defaults ⟶ x ∈ defaults, and analogously for g). The same property must also hold for the inverses of f and g, because in the bisimilarity proof their inverses are necessary to construct the respective simulation. In addition, this lemma further requires that f and g and their inverses are injective on 𝔘c. Fortunately, all usages of map_op in all tables meet these conditions. Lastly, the highlighted assumptions from Table 1 are also eliminated for the lifted operators: every lifted operator is weakly bisimilar to some operator surrounded by identities, the ports of such an identity-surrounded operator are in 𝔘c, and weakly bisimilar operators have the same ports.

6 Code Generation

Our operators are infinitary in two respects: they may branch over a countably infinite set of choices and as codatatypes, they may be trees of infinite depth. We use code generation to Haskell [17] to execute operators lazily and produce prefixes of their traces up to a given depth.

A particular challenge for code generation is our usage of the abstract type of countable sets, which are originally defined as a subtype of arbitrary sets. The default code generation setup for sets supports finite and cofinite sets, but not the countable sets we are interested in. Instead, we provide a new setup for countable sets that recasts them as quotients [20] of the lazy list type modulo reordering of elements. For this purpose, we define the abstraction function 𝖼𝗌𝖾𝗍_𝗈𝖿_𝗅𝗅𝗂𝗌𝗍::'a𝑙𝑙𝑖𝑠𝑡'a𝑐𝑠𝑒𝑡 which is used by the code generator as the only pseudo-constructor of countable sets: a standard way to provide code generation for quotients [16]. Then, operations on countable sets can be made executable by providing a code equation that pattern-matches on this pseudo-constructor and performs the according operation on the underlying lazy lists. For example, countable set union becomes lazy list interleaving:
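For instance, a binary interleaving function on lazy lists can be defined corecursively along the following lines (a sketch; the formalization’s linterleave may differ in details):

    corec linterleave :: "'a llist ⇒ 'a llist ⇒ 'a llist" where
      "linterleave xs ys =
         (case xs of LNil ⇒ ys | LCons x xs' ⇒ LCons x (linterleave ys xs'))"

The code equation for union then equates the union of two countable sets given by cset_of_llist with cset_of_llist applied to the interleaving of the underlying lazy lists.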

Things become more challenging for functions on countable sets whose elements are of more complex types, e.g., countable sets themselves. Specifically, the above recipe fails for the union function ⋃c :: ('a cset) cset ⇒ 'a cset. It is not a problem to define a corresponding merge function lmerge on lazy lists of lazy lists.

This definition crucially relies on Isabelle’s corecursion up-to friends [5], in particular on linterleave being a friendly function (which is proved automatically when it is defined). However, the resulting code equation for ⋃c uses the map function on lazy lists on its left-hand side and is thus invalid code.

A dual problem occurs for functions on subtypes with compound return types. Kunčar [22, §6.4] presents and automates a solution for subtypes, which inspires our solution for quotients. We introduce a type 'a cset_llist that is isomorphic to ('a cset) llist, defined as a quotient of ('a llist) llist (with the abstraction function abs_cset_llist :: ('a llist) llist ⇒ 'a cset_llist serving as the pseudo-constructor for code generation). Since this new type does not have a compound domain, the standard lifting of lmerge to the function cset_llist_merge :: 'a cset_llist ⇒ 'a cset yields a valid code equation. Thus, we have reduced the problem to finding an executable function cset_llist_of :: ('a cset) llist ⇒ 'a cset_llist. This function can be defined by lifting the identity on lazy lists, and it can be made executable by lifting the lazy list constructors to 'a cset_llist and deriving the corresponding code equations.

Overall, we obtain ⋃c (cset_of_llist xss) = cset_llist_merge (cset_llist_of xss), which is executable. We remark that we cannot use Lochbihler and Stoop’s framework [24] for executing generated Isabelle code lazily in strict programming languages, because quotient types are not supported by that framework. Thus, we resort to Haskell as the code generation target.

7 Related Work

This work provides mechanized proofs of most of the asynchronous dataflow algebra axioms of Bergstra et al. [3]. They present two original instances of the algebra: the relation-based stream transformer model and the process algebra model using the Algebra of Communicating Processes (ACP) [2]. Our operator definitions closely follow their ACP counterparts. One difference is that most of their operators are defined by explicitly pre- and post-composing with identity operators, whereas we work with the core operators and only add identities where they are necessary. For the stream transformer model, Broy and Ştefănescu [9] provide pen-and-paper proofs for a slightly different algebra.

The recent Isabelle/HOL formalization of time-aware stream processing [30] bears some similarities to our work. It defines stream processing operators using a codatatype with a single constructor that combines both reading and writing: a finite list of outputs is produced as a reaction to reading an input. In contrast, we separate both operations (so that the operator can decide what it wants to do next) and additionally support nondeterministic choices and silent steps. Their approach covers time metadata associated with processed events (which is necessary for aggregating computations) but does not support loops. We plan to extend our network algebra operators with capabilities to observe time metadata.

Chappe et al. [14] formalize choice trees in Rocq – a semantic domain for modeling nondeterministic, recursive, and impure programs, based on the interaction trees developed by Xia et al. [33]. (Interaction trees have later also been formalized in Isabelle [15].) Our operators can be seen as a domain-specific variant of choice trees tailored to asynchronous dataflow. In particular, our types make input and output ports explicit, which is used in the composition and feedback operators. In contrast, choice trees are monadically composed along their only output. Due to Isabelle’s simple types, our operators are restricted to a fixed type parameter for the data items. In a dependently typed system, the type of data could be indexed by the corresponding port, much like the interaction type depends on the action type in choice/interaction trees.

Beyond the world of mechanization, the streaming language Flo [23] has been recently proposed as a formal representation of streaming computations originating from systems like Flink [12] and DBSP [11]. Flo makes two assumptions on these systems: streaming progress and eager execution. The first assumption means that a program produces as much output as it can given the observed inputs; the second ensures that this output is deterministic. One practical challenge in Flo is the need to demonstrate that operators satisfy both assumptions.

Synchronous dataflow languages have received some attention in mechanization, with the Lustre [13] compiler Vélus [7] being a prominent example. Bergstra et al. [4] also develop algebraic foundations for synchronous dataflow, which subtly differ from the asynchronous ones.

Milner’s classic chapter on weak bisimilarity [25] provided us with valuable insights for our proofs. The network algebra axioms were well-behaved in the sense that we did not have to use advanced up-to techniques for weak bisimilarity, such as the ones developed by Pous [28, 29].

8 Discussion

We have presented our Isabelle formalization of a semantic domain for asynchronous dataflow, along with operators that satisfy all but one of Bergstra et al.’s axioms [3]. Our formalization comprises 28 000 lines of definitions and proofs. The major bulk of this work is the coinduction proofs of the 51 axioms, each spanning between 12 and almost 4 000 lines.

In terms of proof assistant technology, we make use of Isabelle’s support for codatatypes, in particular using nested corecursion through the type of countable sets. While we could define all our desired operators, some definitions needed adjustments from their most natural (presented) formulations to be accepted by 𝐜𝐨𝐫𝐞𝐜. Our formalization may give insight to developers on how to make this advanced command even more convenient to use.

Although not visible in the resulting formalization, we made great use of the Sketch and Explore tool when proving the axioms. Typically, we would invoke 𝑎𝑢𝑡𝑜 and obtain a number of subgoals to work on. (In the case of A10 there are more than fifty subgoals.) The Sketch and Explore tool presents these subgoals as an Isar proof outline, so that, once the outline is finished, the 𝑎𝑢𝑡𝑜 invocation can become a terminal proof method, which is deemed better proof style.

Our formalization provides reusable artifacts beyond dataflow. In particular, we have outlined an approach for emulating empty types through the use of the 𝑑𝑒𝑓𝑎𝑢𝑙𝑡𝑠 type class. We have also sketched how to generate executable code for functions on quotient types with compound domain types. Although we were mostly interested in countable sets, we presented a general technique dualizing the existing solution for a similar problem with subtypes.

An anecdote from our formalization journey is illustrative. After having completed all axiom proofs, we noticed that we had actually proved the Λ version of A5, which according to Bergstra et al. is not expected to hold in all models. This made us suspicious, and we inspected the axioms extra carefully. The proof assistant did not help here: Isabelle had happily accepted our proofs. It turned out that we had made a mistake in the statements: instead of reassociating sum types we were rotating them, which trivialized several axioms. We rushed to fix the mistake and proved all axioms but A1 for 𝒬 and 𝒱. After some (useful) struggle with Isabelle, we realized that our definition of 𝒬 deviated from Bergstra et al.’s: instead of replacing non-equal pairs by None we were dropping such pairs entirely. Fixing 𝒬 meant that we had to revisit all our proofs involving it, including A10. Here, Isabelle turned out to be tremendously helpful: the proof structure did not change and local fixes (in places Isabelle pointed to) were sufficient to bring us back on track. We finally also managed to prove A1 for 𝒬 (but not for 𝒱).

We are working on proving A1 for 𝒱 using trace equivalence. As future work, we will formalize the history model (i.e., give operators a semantics in terms of input/output relations on lazy lists) and demonstrate that it suffers from the Brock–Ackermann anomaly [8, 19]. We have already formalized the trace semantics, which constitutes Jonsson’s workaround for this issue [19]. However, a characterization of the traces of composition and feedback in terms of their arguments’ traces is still missing. As a long-term objective, we want to make time a first-class citizen in our dataflows, so that we can formalize programs expressed in data stream processing frameworks such as Timely Dataflow [26]. Our work provides a solid foundation for asynchronous dataflow into which we will incorporate promising initial progress in that direction [30, 10].

References

  • [1] Jesper Bengtson. Formalising process calculi. PhD thesis, Uppsala University, 2010.
  • [2] Jan A. Bergstra and Jan Willem Klop. Process algebra for synchronous communication. Inf. Control., 60(1-3):109–137, 1984. doi:10.1016/S0019-9958(84)80025-X.
  • [3] Jan A. Bergstra, Cornelis A. Middelburg, and Gheorghe Stefanescu. Network algebra for asynchronous dataflow. Int. J. Comput. Math., 65(1-2):57–88, 1997. doi:10.1080/00207169708804599.
  • [4] Jan A. Bergstra, Cornelis A. Middelburg, and Gheorghe Stefanescu. Network algebra for synchronous dataflow. CoRR, abs/1303.0382, 2013. arXiv:1303.0382.
  • [5] Jasmin Christian Blanchette, Aymeric Bouzy, Andreas Lochbihler, Andrei Popescu, and Dmitriy Traytel. Friends with benefits - implementing corecursion in foundational proof assistants. In Hongseok Yang, editor, ESOP 2017, volume 10201 of LNCS, pages 111–140. Springer, 2017. doi:10.1007/978-3-662-54434-1_5.
  • [6] Jasmin Christian Blanchette, Johannes Hölzl, Andreas Lochbihler, Lorenz Panny, Andrei Popescu, and Dmitriy Traytel. Truly modular (co)datatypes for Isabelle/HOL. In Gerwin Klein and Ruben Gamboa, editors, ITP 2014, volume 8558 of LNCS, pages 93–110. Springer, 2014. doi:10.1007/978-3-319-08970-6_7.
  • [7] Timothy Bourke, Lélio Brun, Pierre-Évariste Dagand, Xavier Leroy, Marc Pouzet, and Lionel Rieg. A formally verified compiler for Lustre. In Albert Cohen and Martin T. Vechev, editors, PLDI 2017, pages 586–601. ACM, 2017. doi:10.1145/3062341.3062358.
  • [8] J Dean Brock and William B Ackerman. Scenarios: A model of non-determinate computation. In Formalization of Programming Concepts: International Colloquium Peniscola, Spain, April 19–25, 1981 Proceedings, pages 252–259. Springer, 1981. doi:10.1007/3-540-10699-5_102.
  • [9] Manfred Broy and Gheorghe Stefanescu. The algebra of stream processing functions. Theor. Comput. Sci., 258(1-2):99–129, 2001. doi:10.1016/S0304-3975(99)00322-9.
  • [10] Matthias Brun, Sára Decova, Andrea Lattuada, and Dmitriy Traytel. Verified progress tracking for Timely Dataflow. In Liron Cohen and Cezary Kaliszyk, editors, ITP 2021, volume 193 of LIPIcs, pages 10:1–10:20. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.ITP.2021.10.
  • [11] Mihai Budiu, Tej Chajed, Frank McSherry, Leonid Ryzhyk, and Val Tannen. DBSP: automatic incremental view maintenance for rich query languages. Proc. VLDB Endow., 16(7):1601–1614, 2023. doi:10.14778/3587136.3587137.
  • [12] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. Apache Flink™: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4):28–38, 2015. URL: http://sites.computer.org/debull/A15dec/p28.pdf.
  • [13] Paul Caspi, Daniel Pilaud, Nicolas Halbwachs, and John Plaice. Lustre: A declarative language for programming synchronous systems. In POPL 1987, pages 178–188. ACM Press, 1987. doi:10.1145/41625.41641.
  • [14] Nicolas Chappe, Paul He, Ludovic Henrio, Yannick Zakowski, and Steve Zdancewic. Choice trees: Representing nondeterministic, recursive, and impure programs in Coq. Proc. ACM Program. Lang., 7(POPL):1770–1800, 2023. doi:10.1145/3571254.
  • [15] Simon Foster, Chung-Kil Hur, and Jim Woodcock. Formally verified simulations of state-rich processes using interaction trees in Isabelle/HOL. In Serge Haddad and Daniele Varacca, editors, CONCUR 2021, volume 203 of LIPIcs, pages 20:1–20:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPICS.CONCUR.2021.20.
  • [16] Florian Haftmann, Alexander Krauss, Ondrej Kuncar, and Tobias Nipkow. Data refinement in Isabelle/HOL. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie, editors, ITP 2013, volume 7998, pages 100–115. Springer, 2013. doi:10.1007/978-3-642-39634-2_10.
  • [17] Florian Haftmann and Tobias Nipkow. Code generation via higher-order rewrite systems. In Matthias Blume, Naoki Kobayashi, and Germán Vidal, editors, FLOPS 2010, volume 6009 of LNCS, pages 103–117. Springer, 2010. doi:10.1007/978-3-642-12251-4_9.
  • [18] Brian Huffman and Ondrej Kuncar. Lifting and transfer: A modular design for quotients in Isabelle/HOL. In Georges Gonthier and Michael Norrish, editors, CPP 2013, volume 8307 of LNCS, pages 131–146. Springer, 2013. doi:10.1007/978-3-319-03545-1_9.
  • [19] Bengt Jonsson. A fully abstract trace model for dataflow and asynchronous networks. Distributed Computing, 7:197–212, 1994. doi:10.1007/BF02280834.
  • [20] Cezary Kaliszyk and Christian Urban. Quotients revisited for Isabelle/HOL. In William C. Chu, W. Eric Wong, Mathew J. Palakal, and Chih-Cheng Hung, editors, SAC 2011, pages 1639–1644. ACM, 2011. doi:10.1145/1982185.1982529.
  • [21] Martin Kleppmann and Jay Kreps. Kafka, Samza and the Unix philosophy of distributed data. IEEE Data Eng. Bull., 38(4):4–14, 2015. URL: http://sites.computer.org/debull/A15dec/p4.pdf.
  • [22] Ondřej Kunčar. Types, abstraction and parametric polymorphism in higher-order logic. PhD thesis, Fakultät für Informatik, Technische Universität München, 2016.
  • [23] Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein, and Mae Milano. Flo: A semantic foundation for progressive stream processing. Proc. ACM Program. Lang., 9(POPL):241–270, 2025. doi:10.1145/3704845.
  • [24] Andreas Lochbihler and Pascal Stoop. Lazy algebraic types in Isabelle/HOL, 2018.
  • [25] Robin Milner. Communication and concurrency. PHI Series in computer science. Prentice Hall, 1989.
  • [26] Derek Gordon Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: a timely dataflow system. In Michael Kaminsky and Mike Dahlin, editors, SOSP 2013, pages 439–455. ACM, 2013. doi:10.1145/2517349.2522738.
  • [27] Lawrence C. Paulson. A Fixedpoint Approach to (Co)Inductive and (Co)Datatype Definitions, pages 187–211. MIT Press, Cambridge, MA, USA, 2000.
  • [28] Damien Pous. Weak bisimulation up to elaboration. In Christel Baier and Holger Hermanns, editors, CONCUR 2006, volume 4137 of LNCS, pages 390–405. Springer, 2006. doi:10.1007/11817949_26.
  • [29] Damien Pous. New up-to techniques for weak bisimulation. Theor. Comput. Sci., 380(1-2):164–180, 2007. doi:10.1016/J.TCS.2007.02.060.
  • [30] Rafael Castro Gonçalves Silva and Dmitriy Traytel. Time-aware stream processing in Isabelle/HOL. In Tobias Nipkow, Lawrence C. Paulson, and Makarius Wenzel, editors, Isabelle workshop 2024, pages 1–19, 2024. URL: https://files.sketis.net/Isabelle_Workshop_2024/Isabelle_2024_paper_8.pdf (accessed: July 1, 2025).
  • [31] Rafael Castro Gonçalves Silva, Laouen Fernet, and Dmitriy Traytel. Nondeterministic asynchronous dataflow in Isabelle/HOL artifact, June 2025. doi:10.5281/zenodo.15737868.
  • [32] Makarius Wenzel et al. The Isabelle/Isar reference manual, 2004.
  • [33] Li-yao Xia, Yannick Zakowski, Paul He, Chung-Kil Hur, Gregory Malecha, Benjamin C. Pierce, and Steve Zdancewic. Interaction trees: representing recursive and impure programs in Coq. Proc. ACM Program. Lang., 4(POPL):51:1–51:32, 2020. doi:10.1145/3371119.
  • [34] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized streams: fault-tolerant streaming computation at scale. In Michael Kaminsky and Mike Dahlin, editors, SOSP 2013, pages 423–438. ACM, 2013. doi:10.1145/2517349.2522737.