Compositional Active Learning of Synchronizing Systems Through Automated Alphabet Refinement

Henry, Léo; Mousavi, Mohammad Reza; Neele, Thomas; Sammartino, Matteo

doi:10.4230/LIPIcs.CONCUR.2025.20

Compositional Active Learning of Synchronizing Systems Through Automated Alphabet Refinement

Léo Henry

Royal Holloway, University of London, UK Mohammad Reza Mousavi

King’s College London, UK Thomas Neele

Eindhoven University of Technology, The Netherlands Matteo Sammartino

Royal Holloway, University of London, UK

Abstract

Active automata learning infers automaton models of systems from behavioral observations, a technique successfully applied to a wide range of domains. Compositional approaches for concurrent systems have recently emerged. We take a significant step beyond available results, including those by the authors, and develop a general technique for compositional learning of a synchronizing parallel system with an unknown decomposition. Our approach automatically refines the global alphabet into component alphabets while learning the component models. We develop a theoretical treatment of distributions of alphabets, i.e., sets of possibly overlapping component alphabets. We characterize counter-examples that reveal inconsistencies with global observations, and show how to systematically update the distribution to restore consistency. We present a compositional learning algorithm implementing these ideas, where learning counterexamples precisely correspond to distribution counterexamples under well-defined conditions. We provide an implementation, called CoalA, using the state-of-the-art active learning library LearnLib. Our experiments show that in more than 630 subject systems, CoalA delivers orders of magnitude improvements (up to five orders) in membership queries and in systems with significant concurrency, it also achieves better scalability in the number of equivalence queries.

Keywords and phrases:

Active learning, Compositional methods, Concurrency theory, Labelled transition systems, Formal methods

Funding:

Léo Henry: EPSRC project Verification of Hardware Concurrency via Model Learning (CLeVer) – EP/S028641/1.

Mohammad Reza Mousavi: UKRI Trustworthy Autonomous Systems Node in Verifiability – EP/V026801/2; EPSRC project Verified Simulation for Large Quantum Systems (VSL-Q) – EP/Y005244/1; EPSRC project Robust and Reliable Quantum Computing (RoaRQ), Investigation 009 Model-based monitoring and calibration of quantum computations (ModeMCQ) – EP/W032635/1; ITEA/InnovateUK projects GENIUS (600642) and GreenCode (600643).

Thomas Neele: NWO grant VI.Veni.232.224.

Matteo Sammartino: EPSRC project Verification of Hardware Concurrency via Model Learning (CLeVer) – EP/S028641/1.

Copyright and License:

2012 ACM Subject Classification:

Theory of computation

\rightarrow

Active learning

Supplementary Material:

Software (Source Code): https://doi.org/10.5281/zenodo.15170685

Acknowledgements:

We thank the reviewers for their thorough comments and suggestions, and the authors of [27] for their feedback on an earlier version of this paper.

DOI:

10.4230/LIPIcs.CONCUR.2025.20

Event:

36th International Conference on Concurrency Theory (CONCUR 2025)

Editors:

Patricia Bouyer and Jaco van de Pol

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Introduction

Automata learning [16] has been successfully applied to learn widely-used protocols such as TCP [12], SSH [13], and QUIC [11], CPU caching policies [38] and finding faults in their black-box actual implementations. Moreover it has been applied to a wide range of applications such as bank cards [1] and biometric passports [2, 28]. There are already accessible expositions of the success stories in this field [20, 36]. However, it is well known that the state-of-the-art automata learning algorithms do not scale beyond systems with more than a few hundreds of input and output symbols in their alphabets [36].

Scalability to larger systems with large input alphabets is required for many real-life systems. This requirement has inspired some recent attempts [25, 31, 27] to come up with compositional approaches to automata learning, to address the scalability issues. Some of the past compositional approaches [31] relied on an a-priori knowledge of the system decomposition, while others [25] considered non-synchronizing state machines or a subset of synchronizing ones to help learning the decomposition [27]. These assumptions need to be relaxed to enable dealing with legacy and black box systems and dealing with synchronizing components. In particular, in the presence of an ever-increasing body of legacy automatically-generated code, architectural discovery is a significant challenge [23, 15, 32]. This is a significant challenge, because refactoring and rejuvenating legacy systems has been posed as a significant application of automata learning [33].

In this paper, we take a significant step beyond the available results and develop a compositional automata learning approach that does not assume any pre-knowledge of the decomposition of the alphabet and allows for an arbitrary general synchronization scheme, common in the theory of automata and process calculi [19]. To this end, we take inspirations from the realizability problem in concurrency theory and use iterative refinements of the alphabet decomposition (called distributions [30]) to arrive in a provably sound decomposition while learning the components’ behavior. To our knowledge this is the first result of its kind and the first extension of realizability into the domain of automata learning.

To summarize, the contributions of our paper are listed below:

$\blacksquare$

We develop a novel theory of system decomposition for LTS synchronization that formally characterizes which alphabet decompositions can accurately model observed behaviors, establishing a theoretical foundation for automated component discovery. Proofs of our theorems are given in the appendix.
$\blacksquare$

Based on this, we propose a compositional active learning algorithm that dynamically refines component alphabets during the learning process, supporting standard synchronization mechanisms without requiring a priori knowledge of the system’s decomposition.
$\blacksquare$

We implemented our approach as the prototype tool CoalA, built on the state-of-the-art LearnLib framework [22], and evaluated it on over 630 systems from three benchmark sets. Compared to a monolithic approach, CoalA achieved substantial reductions in queries, with up to five orders of magnitude fewer membership queries and one order fewer equivalence queries across most of our benchmark systems with parallel components, resulting in better overall scalability. The replication package is available at [18].

2 Related work

Realizability of sequential specifications in terms of parallel components has been a long-standing problem in concurrency theory. In the context of Petri nets, this has been pioneered by Ehrenfeucht and Rozenberg [10], followed up by the work of Castellani, Mukund and Thiagarajan [8]. Realizability has been further investigated in other models of concurrency such as team automata [34], session types [5], communicating automata [17] and labelled transition systems (LTSs) [35]. Related to this line of research is the decomposition of LTSs into prime processes [26]. We are inspired by the work of Mukund [30], characterizing the transition systems that can be synthesized into an equivalent parallel system given a decomposition of their alphabet (called distribution). Mukund explores this characterization for two notions of parallel composition (loosely cooperating- and synchronous parallel composition) and three notions of equivalence (isomorphism, language equivalence, and bisimulation). We base our work on the results of Mukund for loosely cooperating systems and language equivalence. We extend it to define consistency between observations and distributions and refining distributions to reinstate consistency.

Our work integrates two recent approaches on compositional learning: we extend the work on learning synchronous parallel composition of automata [31] by automatically learning the decomposition of the alphabets, through refinement of distributions; moreover, we extend the work on learning interleaving parallel composition of automata [25] by enabling a generic synchronization scheme among components. In parallel to our work, an alternative proposal [27] has appeared to allow for synchronization among interleaving automata; however, the proposed synchronization scheme assumes that whenever two components are not ready to synchronize, e.g., because they produce different outputs on the same input, a special output is produced (or otherwise, the semantic model is output deterministic). We do not assume any such additional information and use a synchronization scheme widely used in the theory of automata and process calculi [19]. Other contributions related to compositional learning include the active learning of product automata, a variation of Mealy machine where the output is the combination of outputs of several Mealy machines – as in our case, the component Mealy machines are learned individually [29]; learning of systems of procedural automata [14], sets of automata that can call each other in a way similar to procedure calls; learning asynchronously-communicating finite state machines via queries in the form of message sequence charts [6], though using a monolithic approach.

3 Preliminaries

We use $\Sigma$ to denote a finite alphabet of action symbols, and $\Sigma^{\star}$ to denote the set of finite sequences of symbols in $\Sigma$ , which we call traces; we use $\epsilon\in\Sigma^{\star}$ to denote the empty trace. Given two traces $\sigma_{1},\sigma_{2}\in\Sigma^{\star}$ , we denote their concatenation by $\sigma_{1}\cdot\sigma_{2}$ . We refer to the $i$ th element of $\sigma$ by $\sigma[i]$ . The projection ${\sigma}_{\upharpoonright\Sigma^{\prime}}$ of $\sigma$ on an alphabet $\Sigma^{\prime}\subseteq\Sigma$ is the sequence of symbols in $\sigma$ that are also contained in $\Sigma^{\prime}$ : ${\epsilon}_{\upharpoonright\Sigma^{\prime}}=\epsilon$ and ${\sigma\cdot a}_{\upharpoonright\Sigma^{\prime}}={\sigma}_{\upharpoonright% \Sigma^{\prime}}\cdot a$ if $a\in\Sigma^{\prime}$ and ${\sigma}_{\upharpoonright\Sigma^{\prime}}$ otherwise. We generalize this notation to sets (and thus languages), such that ${S}_{\upharpoonright\Sigma^{\prime}}=\{{\sigma}_{\upharpoonright\Sigma^{\prime% }}\mid\sigma\in S\}$ . Given a set $S$ , we write $|S|$ for its cardinality. We write $\mathsf{img}(f)$ for the image of a function $f$ .

3.1 Labelled Transition Systems

In this work we represent the state-based behavior of a system as a labelled transition system.

Definition 3.1 (Labelled Transition System).

A labelled transition system (LTS) is a four-tuple $T=(S,\Sigma,\mathord{}\to,\linebreak[1]\hat{s})$ , where $S$ is a set of states; $\Sigma$ is a finite alphabet of actions; $\to\,\subseteq S\times\Sigma\times S$ is a transition relation; $\hat{s}\in S$ is an initial state.

We write in infix notation $s\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{a}$}}}t$ for $(s,a,t)\in\mathnormal{\to}$ . We say that an action $a$ is enabled in $s$ , written $s\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{a}$}}}$ , if there is $t$ such that $s\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{a}$}}}t$ . The transition relation and the notion of enabled-ness are also extended to traces $\sigma\in\Sigma^{\star}$ , yielding $s\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{\sigma}$}}}t$ and $s\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{\sigma}$}}}$ .

Definition 3.2 (Language of an LTS).

The language of $T$ is the set of traces enabled from the starting state, formally: $\mathcal{L}(T)=\{\sigma\in\Sigma^{\star}\mid\hat{s}\mathrel{\smash{\raisebox{-% 1.0pt}{$\xrightarrow{\sigma}$}}}\}\enspace.$

Note that languages of LTSs are always prefix-closed, because every prefix of an enabled trace is necessarily enabled. LTSs correspond exactly to prefix-closed automata.

The parallel composition of a finite set of LTSs is a product model representing all possible behaviors when the LTSs synchronize on shared actions. Intuitively, an action $a$ can be performed when all LTSs that have $a$ in their alphabet can perform it in their current state. The other LTSs remain idle during the transition.

Definition 3.3 (Parallel composition).

Given $n$ LTSs $T_{i}=(S_{i},\Sigma_{i},\to_{i},\hat{s}_{i})$ for $1\leq i\leq n$ , their parallel composition, denoted $\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}T_{i}$ , is an LTS $(S_{1}\times\dots\times S_{n},\bigcup_{i=1}^{n}\Sigma_{i},\mathrel{\smash{% \raisebox{-1.0pt}{$\xrightarrow{}$}}},(\hat{s}_{1},\dots,\hat{s}_{n}))$ , where the transition relation $\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{}$}}}$ is given by the following rule:

\frac{\begin{aligned} s_{i}\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{a}$% }}}_{i}t_{i}\quad&\text{for all $i$ such that $a\in\Sigma_{i}$}\\ s_{j}=t_{j}\quad&\text{for all $j$ such that $a\notin\Sigma_{j}$}\end{aligned}% }{(s_{1},\dots,s_{n})\mathrel{\smash{\raisebox{-1.0pt}{$\xrightarrow{a}$}}}(t_% {1},\dots,t_{n})}

We say that an action $a$ is local if there is exactly one $i$ such that $a\in\Sigma_{i}$ ; otherwise, it is called synchronizing. The parallel composition of LTSs thus forces individual LTSs to cooperate on synchronizing actions; local actions can be performed independently. We typically refer to the LTSs that make up a composite LTS as components. For a parallel composition of two LTSs, we use the infix notation, i.e., $T\mathop{\parallel}T^{\prime}$ , when convenient. Synchronization of components corresponds to communication between components in the real world.

We define the corresponding notion for languages on restricted alphabets.

Definition 3.4 (Parallel composition of languages).

Given $n$ languages and alphabets $(\mathcal{L}_{i},\Sigma_{i})$ such that $\mathcal{L}_{i}\subseteq\Sigma_{i}^{\star}$ for all $1\leq i\leq n$ , let $\Sigma=\bigcup_{i=1}^{n}\Sigma_{i}$ . We define $\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}(\mathcal{L}_{i},\Sigma_% {i})$ as

\{\sigma\in\Sigma^{\star}\mid\forall 1\leq i\leq n\ldotp{\sigma}_{% \upharpoonright\Sigma_{i}}\in\mathcal{L}_{i}\}\ .

Example 3.5 (Running example).

Consider the LTSs $T_{1}$ and $T_{2}$ given in Figure 1, with the respective alphabets $\{a,b,c\}$ and $\{b,d\}$ . Their parallel composition is depicted at the bottom of Figure 1.

Figure 1: The parallel composition of two LTSs.

Here $a$ , $c$ and $d$ are local actions, whereas $b$ is synchronizing. Note that, although $T_{2}$ can perform $b$ from its initial state $t_{0}$ , there is no $b$ transition from $(s_{0},t_{0})$ in $T_{1}\mathop{\parallel}T_{2}$ , because $b$ is not enabled in $s_{0}$ . Action $b$ can only be performed in $T_{1}\mathop{\parallel}T_{2}$ after $T_{1}$ does an $a$ or a $c$ and moves to $s_{1}$ , which is captured as the $a$ and $c$ transitions from $(s_{0},t_{0})$ .

3.2 Active Automata Learning

In active automata learning [4], a Learner infers an automaton model of an unknown language $\mathcal{L}$ by querying a Teacher, which knows $\mathcal{L}$ and answers two query types (see Figure 2):

$\blacksquare$

Membership queries: is a trace $\sigma$ in $\mathcal{L}$ ? The Teacher replies yes/no.
$\blacksquare$

Equivalence queries: given a hypothesis model $\mathcal{H}$ , is $\mathcal{L}(\mathcal{H})=\mathcal{L}$ ? The Teacher either replies yes or provides a counter-example – a trace that is in one language but not in the other.

Figure 2: Active automata learning for a target language

\mathcal{L}

.

Algorithms based on this framework – Angluin’s $\mathtt{L^{\star}}$ being the classical example – converge to a canonical model (e.g., the minimal DFA) of the target language. In practice, the Teacher is realized as an interface to the System Under Learning (SUL): membership queries become tests on the SUL, and equivalence queries are approximated via systematic testing strategies [7, 24].

During learning, the learner gathers observations about the SUL. While these observations are typically organized in a data structure (e.g., a table or a tree), they can be abstractly represented as a partial function mapping traces to their accepted ( $\mathnormal{+}$ ) or rejected ( $\mathnormal{-}$ ) status.

Definition 3.6 (Observation function).

An observation function over $\Sigma$ is a partial function $\mathsf{Obs}:\Sigma^{\star}\rightharpoonup\{\mathnormal{+},\mathnormal{-}\}$ .

We write $\mathsf{Dom}(\mathsf{Obs})$ for the domain of $\mathsf{Obs}$ and only consider observation functions with a finite domain. We sometimes represent an observation function $\mathsf{Obs}$ as the set of pairs $\{(\sigma,\mathsf{Obs}(\sigma))\mid\sigma\in\mathsf{Dom}(\mathsf{Obs})\}$ .

Definition 3.7 (Observation function/language agreement).

An observation function $\mathsf{Obs}$ agrees with a language $\mathcal{L}$ , notation $\mathcal{L}\models\mathsf{Obs}$ , whenever $\sigma\in\mathcal{L}\Leftrightarrow\mathsf{Obs}(\sigma)=+$ , for all $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ .

To compositionally learn a model formulated as a parallel composition of LTSs $\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}T_{i}$ , we define how to derive the local observation functions that will be used for the components.

Definition 3.8 (Local observation function).

Given a sub-alphabet $\Sigma_{i}\subseteq\Sigma$ , a local observation function $\mathsf{Obs}_{\Sigma_{i}}:\Sigma_{i}^{\star}\rightharpoonup\{\mathnormal{+},% \mathnormal{-}\}$ is defined such that $\mathsf{Dom}(\mathsf{Obs}_{\Sigma_{i}})={\mathsf{Dom}(\mathsf{Obs})}_{% \upharpoonright\Sigma_{i}}$ and $\mathsf{Obs}_{\Sigma_{i}}(\sigma^{\prime})=\bigvee_{\{\sigma\mid\sigma\in% \mathsf{Dom}(\mathsf{Obs})\wedge{\sigma}_{\upharpoonright\Sigma_{i}}=\sigma^{% \prime}\}}\mathsf{Obs}(\sigma)$ , for all $\sigma^{\prime}\in\mathsf{Dom}(\mathsf{Obs}_{\Sigma_{i}})$ .

This definition is taken to mimic the behavior of parallel composition, i.e., a component $T_{i}$ accepts $\sigma^{\prime}$ if and only if there is some $\sigma$ such that ${\sigma}_{\upharpoonright\Sigma_{i}}=\sigma^{\prime}$ and $\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}T_{i}$ accepts $\sigma$ .

Example 3.9.

Consider again the LTSs from Figure 1 and suppose we are given the following observation function for $T_{1}\mathop{\parallel}T_{2}$ : $\mathsf{Obs}:a\mapsto\mathnormal{+};aa\mapsto\mathnormal{-};abd\mapsto% \mathnormal{+}$ . The local observation functions we obtain for $T_{1}$ and $T_{2}$ are, respectively:

\mathsf{Obs}_{\{a,b,c\}}:a\mapsto\mathnormal{+};aa\mapsto\mathnormal{-};ab% \mapsto\mathnormal{+}\qquad\mathsf{Obs}_{\{b,d\}}:\epsilon\mapsto\mathnormal{+% };bd\mapsto\mathnormal{+}.

The observation $\mathsf{Obs}(abd)=\mathnormal{+}$ requires both components to cooperate, hence $\mathsf{Obs}_{\{a,b,c\}}(ab)=\mathnormal{+}$ and $\mathsf{Obs}_{\{b,d\}}(bd)=\mathnormal{+}$ . We derive $\mathsf{Obs}_{\{b,d\}}(\epsilon)=\mathnormal{+}$ from $\mathsf{Obs}(a)\lor\mathsf{Obs}(aa)=\mathnormal{+}$ , since the projection of both these traces to $\{b,d\}$ is $\epsilon$ .

4 Distributions

In this section, we first discuss how we decompose the global alphabet into a distribution, i.e., a set of potentially overlapping local alphabets. We then give some properties of distributions and their relation to observation functions. Based on these, we explain how to extend a distribution to model a given observation function.

4.1 Distributions and Observations

In a model expressed as a parallel composition $\mathop{\parallel}_{i}T_{i}$ , permuting symbols belonging to different local alphabets does not affect membership of the language. For example, in Figure 1, because $a b c d$ is in the language, we directly know that $a b d c$ is too, as both $c$ and $d$ are local to different components. To formalize this, we first formally define distributions as follows.

Definition 4.1 (Distribution).

A distribution of an alphabet $\Sigma$ is a set $\Omega=\{\Sigma_{1},\dots,\Sigma_{n}\}$ such that $\bigcup_{i=1}^{n}\Sigma_{i}=\Sigma$ .

For the rest of this section, we fix an alphabet $\Sigma$ , a distribution $\Omega=\{\Sigma_{1},\dots,\Sigma_{n}\}$ of $\Sigma$ and an observation function $\mathsf{Obs}$ over $\Sigma$ unless otherwise specified.

For a given distribution $\Omega$ , we define below the class of languages, called product languages over $\Omega$ , that can be represented over that distribution.

Definition 4.2 (Product language).

$\mathcal{L}$ is a product language over $\Omega$ , notation $\Omega\models\mathcal{L}$ , iff there exists a family of languages $\{\mathcal{L}_{i}\}_{1\leq i\leq n}$ , where $\mathcal{L}_{i}\subseteq\Sigma_{i}^{\star}$ for all $1\leq i\leq n$ , such that $\mathcal{L}=\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}(\mathcal{L}% _{i},\Sigma_{i})$ .

Example 4.3.

In Example 3.5, it is clear by construction that $\{\{a,b,c\},\{b,d\}\}\models\mathcal{L}(T_{1}\mathop{\parallel}T_{2})$ . However, $\mathcal{L}(T_{1}\mathop{\parallel}T_{2})$ is not a product language over $\Omega_{singles}=\{\{a\},\{b\},\{c\},\{d\}\}$ because any product language over $\Omega_{singles}$ should allow for permuting $a$ and $b$ and thus, would fail to capture the fact that $b$ can only come after one $a$ .

We recall the following key lemma for product languages.

Lemma 4.4 ([30], Lemma 5.2).

A language $\mathcal{L}$ is a product language over $\Omega$ if and only if $\mathcal{L}=\mathop{\smash{\scalebox{1.2}{$\parallel$}}}_{i=1}^{n}({\mathcal{L% }}_{\upharpoonright\Sigma_{i}},\Sigma_{i})$ .

We can now define product observations over $\Omega$ , i.e., an observation that can be generated by a product language over $\Omega$ .

Definition 4.5 (Product observation).

$\mathsf{Obs}$ is a product observation over $\Omega$ , notation $\Omega\models\mathsf{Obs}$ , iff there exists a language $\mathcal{L}$ such that $\Omega\models\mathcal{L}$ and $\mathcal{L}\models\mathsf{Obs}$ . We conversely say that $\Omega$ models $\mathsf{Obs}$ .

While Definition 4.5 does not prescribe how to find such a distribution given an observation function, it can be used to detect precisely when a current distribution is not consistent with observations and must be updated. This results in the following proposition linking local and global observations for a given distribution: an observation is a product observation over a distribution if and only if its projections according to the distribution hold the same information as the observation itself. The proof follows largely from Lemma 4.4.

Proposition 4.6.

$\Omega\models\mathsf{Obs}$ if and only if for all traces $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ it holds that $\mathsf{Obs}(\sigma)=\bigwedge_{\Sigma_{i}\in\Omega}\mathsf{Obs}_{\Sigma_{i}}(% {\sigma}_{\upharpoonright\Sigma_{i}}).$

Example 4.7.

Following from Example 4.3, consider the following observation function based on $\mathcal{L}(T_{1}\mathop{\parallel}T_{2})$ from Example 3.5:

\mathsf{Obs}:\epsilon\mapsto\mathnormal{+};\ a\mapsto\mathnormal{+};\ ab% \mapsto\mathnormal{+};\ b\mapsto\mathnormal{-};c\mapsto\mathnormal{+};\ d% \mapsto\mathnormal{-}.

Using the above proposition, we can verify that $\Omega_{singles}=\{\{a\},\{b\},\{c\},\{d\}\}\not\models\mathsf{Obs}$ . This is because $\mathsf{Obs}(b)=\mathnormal{-}$ , whereas for all $\Sigma_{i}\in\Omega_{singles}$ , $\mathsf{Obs}_{\Sigma_{i}}({b}_{\upharpoonright\Sigma_{i}})=\mathnormal{+}$ since ${ab}_{\upharpoonright\{b\}}=b$ causes $\mathsf{Obs}_{\{b\}}(b)=\mathnormal{+}$ and for other alphabets ${b}_{\upharpoonright\Sigma_{i}}=\epsilon$ and $\mathsf{Obs}_{\Sigma_{i}}(\epsilon)=+$ . In contrast, $\Omega_{\{a,b\}}=\{\{a,b\},\{c\},\{d\}\}\models\mathsf{Obs}$ since the alphabet $\{a,b\}$ allows for distinguishing observations $b$ and $a b$ .

In our algorithm, this check on local and global observation functions is used to trigger an update of the current distribution exactly when necessary.

Based on the above proposition, we now define counter-examples to a distribution. By definition of local observations, if $\mathsf{Obs}(\sigma)=\mathnormal{+}$ , then $\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_{i}})=\mathnormal{+}$ . Hence, to obtain $\mathsf{Obs}(\sigma)\neq\bigwedge_{\Sigma_{i}\in\Omega}\mathsf{Obs}_{\Sigma_{i% }}({\sigma}_{\upharpoonright\Sigma_{i}})$ , we must have a globally negative observation $\sigma_{N}$ and a set of globally positive observations whose projections to the local components match the projections of $\sigma_{N}$ , indicating a mismatch between global and local observations.

Definition 4.8 (Counter-example to a distribution).

A counter-example to $\Omega\models\mathsf{Obs}$ is a pair $(\sigma_{N},P)\in\mathsf{Dom}(\mathsf{Obs})\times\mathsf{Dom}(\mathsf{Obs})^{\Omega}$ with

$\blacksquare$

$\sigma_{N}$ a negative observation $\mathsf{Obs}(\sigma_{N})=-$ ;
$\blacksquare$

$P$ a function that maps each $\Sigma_{i}\in\Omega$ to a positive observation $\sigma_{\Sigma_{i}}$ , i.e., $\mathsf{Obs}(\sigma_{\Sigma_{i}})=+$ , such that ${\sigma_{N}}_{\upharpoonright\Sigma_{i}}={\sigma_{\Sigma_{i}}}_{% \upharpoonright\Sigma_{i}}$ .

We call $\mathsf{img}(P)$ the positive image of the counter-example. We write $\mathit{CED}(\Omega,\mathsf{Obs})$ for the set of such counter-examples.

Although these counter-examples are not necessarily related to learning, we use the same terminology as in active learning. This is because the two concepts are directly linked in our case, as will be explained later.

Example 4.9.

Reusing the observation function $\mathsf{Obs}$ and the singleton distribution $\Omega_{singles}$ defined in Example 4.7, for every element of $\mathit{CED}(\mathsf{Obs},\Omega_{singles})$ we have $\sigma_{N}=b$ . and $P(\{b\})=ab$ . For the remaining elements of $\Omega$ , there are more choices: $\{a\}$ can be mapped to either $\epsilon$ or $c$ ; $\{c\}$ to either $\epsilon$ , $a$ or $a b$ ; and $\{d\}$ to either $\epsilon$ , $a$ , $a b$ or $c$ .

Proposition 4.6, specialized to our definition of counter-examples, yields the following corollary, which will be used in the following to detect that a distribution is a model.

Corollary 4.10.

$\Omega\models\mathsf{Obs}\iff\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$ .

4.2 Resolving a Counter-example

Given a distribution $\Omega$ and a fixed observation function $\mathsf{Obs}$ , one key question is how to extend $\Omega$ to a new distribution $\Omega^{\prime}$ modelling $\mathsf{Obs}$ . This is a difficult problem, as new counter-examples can arise when extending a distribution. In this subsection, we explain how to resolve a single counter-example as a first step.

When a counter-example $(\sigma_{N},P)$ to $\Omega\models\mathsf{Obs}$ exists, it reveals a limitation in the distribution $\Omega$ : the projections of $\sigma_{N}$ coincide with projections of elements in $P$ , making them indistinguishable under the current components. To resolve such counter-examples, it is thus necessary and sufficient to augment $\Omega$ with new components that disrupt this matching. In the following, we will fix $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs})$ as a counter-example to $\Omega\models\mathsf{Obs}$ .

More precisely, for each pair of traces $(\sigma_{N},\sigma)$ with $\sigma\in P$ , it suffices to identify a discrepancy between them. There are two types of discrepancies: multiplicity discrepancies and order discrepancies. A multiplicity discrepancy is a symbol occurring a different number of times in each trace. For this, given a trace $\sigma$ , let $\Sigma^{m}(\sigma)$ denote the multiset of symbols occurring in $\sigma$ . Note that $\Sigma^{m}(\sigma)=\Sigma^{m}(\sigma^{\prime})$ if and only if $\sigma$ is a permutation of $\sigma^{\prime}$ . The symmetric difference of multisets $A$ and $B$ is denoted $A\mathrel{\Delta}B$ .

Definition 4.11 (Multiplicity discrepancy).

Given a $\Sigma_{i}\in\Omega$ , the set of multiplicity discrepancies for $\Sigma_{i}$ is $\mathcal{D}_{m}^{\Sigma_{i}}(\sigma_{N},P)=\Sigma^{m}(\sigma_{N})\mathrel{% \Delta}\Sigma^{m}(P(\Sigma_{i}))$ .

We now define an order discrepancy, i.e., a pair of symbols whose relative positions differ between the traces. We do this by considering whether symbols that are not a multiplicity discrepancy, i.e., those appearing the same number of times in both traces, are permuted. We choose the permutation such that the relative order of identical symbols is maintained.

Definition 4.12 (Order discrepancy).

Given $\Sigma_{i}\in\Omega$ , let $\theta=\Sigma\setminus(\Sigma^{m}(\sigma_{N})\mathrel{\Delta}\Sigma^{m}(P(% \Sigma_{i})))$ be the symbols on which $\sigma_{N}$ and $P(\Sigma_{i})$ agree and define $\sigma_{N}^{\prime}={\sigma_{N}}_{\upharpoonright\theta}$ . Let $\pi$ be the unique permutation such that $\sigma_{N}^{\prime}=\pi({{P(\Sigma_{i})}}_{\upharpoonright\theta})$ and $\sigma_{N}^{\prime}[j]=\sigma_{N}^{\prime}[k]\implies\pi(j)<\pi(k)$ , for all $j<k$ . The set of order discrepancies for $\Sigma_{i}$ is then:

\mathcal{D}_{o}^{\Sigma_{i}}(\sigma_{N},P)=\{\{\sigma_{N}^{\prime}[j],\sigma_{% N}^{\prime}[k]\}\mid k<j\land\pi(k)>\pi(j)\}

Multiplicity and order discrepancies can be found in linear and quadratic time, respectively. Finally, we define the discrepancies for a counter-example as sets that contain at least a discrepancy of either type for each alphabet in $\Omega$ .

Definition 4.13 (Discrepancy set).

A set $\delta\subseteq\Sigma$ is a discrepancy for $(\sigma_{N},P)$ iff for all $\Sigma_{i}\in\Omega$ , either $\mathcal{D}_{m}^{\Sigma_{i}}(\sigma_{N},P)\cap\delta\neq\emptyset$ or there is $\delta^{\Sigma_{i}}\in\mathcal{D}_{o}^{\Sigma_{i}}(\sigma_{N},P)$ such that $\delta^{\Sigma_{i}}\subseteq\delta$ . We write $\mathcal{D}(\sigma_{N},P)$ for the set of all discrepancies for the counter-example $(\sigma_{N},P)$ .

For a set of counter-examples $\{ce_{1},\dots,ce_{n}\}$ , we write $\mathcal{D}(\{ce_{1},\dots,ce_{n}\})=\{\{\delta_{1},\ldots,\delta_{n}\}\mid% \forall i\ldotp\delta_{i}\in\mathcal{D}(ce_{i})\}$ , representing all possible selections of one discrepancy per counter-example.

Example 4.14 (Multiplicity discrepancy).

Following Example 4.9 with the singleton distribution $\Omega_{singles}=\{\{a\},\{b\},\allowbreak\{c\},\{d\}\}$ , consider the counter-example $(\sigma_{N},P)=(b,(\{a\}\mapsto\epsilon,\{b\}\mapsto ab,\{c\}\mapsto\epsilon,% \{d\}\mapsto\epsilon))$ . We find the following multiplicity discrepancies: $\mathcal{D}_{m}^{\Sigma_{i}}(\sigma_{N},P)=\{b\}$ , for $\Sigma_{i}\in\{\{a\},\{c\},\{d\}\}$ , because $b$ occurs once in $\sigma_{N}$ vs. zero times in $P(\Sigma_{i})$ , and $\mathcal{D}_{m}^{\{b\}}(\sigma_{N},P)=\{a\}$ . Hence, $\mathcal{D}(\sigma_{N},P)$ includes all subsets of $\{a,b,c,d\}$ containing $\{a,b\}$ . For a different counter-example such as $(\sigma_{N},P^{\prime})=(b,(\{a\}\mapsto c,\{b\}\mapsto ab,\{c\}\mapsto ab,\{d% \}\mapsto c))$ , we obtain $\mathcal{D}_{m}^{\{a\}}(\sigma_{N},P^{\prime})=\mathcal{D}_{m}^{\{d\}}(\sigma_% {N},P^{\prime})=\{b,c\}$ and $\mathcal{D}_{m}^{\{b\}}(\sigma_{N},P^{\prime})=\mathcal{D}_{m}^{\{c\}}(\sigma_% {N},P^{\prime})=\{a\}$ , so $\mathcal{D}(\sigma_{N},P^{\prime})$ includes any subset of $\{a,b,c,d\}$ that contains either $\{a,b\}$ or $\{a,c\}$ .

Example 4.15 (Order discrepancy).

Consider a singleton distribution $\Omega_{singles}=\{\{a\},\{b\},\{c\}\}$ with $\mathsf{Obs}(abc)=\mathnormal{+}$ and $\mathsf{Obs}(bac)=\mathnormal{-}$ . This yields the counter-example $(\sigma_{N},P)=(bac,(\{a\}\mapsto abc,\{b\}\mapsto abc,\{c\}\mapsto abc))$ . For each $\Sigma_{i}\in\Omega_{singles}$ , we find $\mathcal{D}_{o}^{\Sigma_{i}}(\sigma_{N},P)=\{\{a,b\}\}$ . Intuitively, these discrepancies reveals that singleton components allow for all permutations of $a$ and $b$ , but the observation function forbids some of them. Therefore, $\mathcal{D}(\sigma_{N},P)$ includes all subsets of $\{a,b,c\}$ containing $\{a,b\}$ .

We can now state that discrepancies are both sufficient and necessary additions to a distribution in order to eliminate their counter-examples.

Proposition 4.16.

Suppose there exists $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs})$ . For each discrepancy $\delta\in\mathcal{D}(\sigma_{N},P)$ , $(\sigma_{N},P)\not\in\mathit{CED}(\Omega\cup\{\delta\},\mathsf{Obs})$ . Conversely, for any distribution $\Omega^{\prime}$ of $\Sigma$ where $\delta\not\subseteq\Sigma_{i}$ , for all $\delta\in\mathcal{D}(\sigma_{N},P)$ and all $\Sigma_{i}\in\Omega^{\prime}$ , $(\sigma_{N},P)$ is a counter-example to $\Omega^{\prime}\models\mathsf{Obs}$ .

4.3 Extending a Distribution to Model an Observation Function

Using the previous subsection as a basis, we leverage structural properties of distributions to restrict the possible counter-examples that can appear when updating the distribution. Finally, we devise an iterative process that is guaranteed to converge to a distribution modelling $\mathsf{Obs}$ .

Pre-ordering of Distributions

Distributions can be preordered by their “connecting power”, i.e., by the extent to which they connect symbols together as part of the same alphabets.

Definition 4.17 (Connectivity preorder).

Given two distributions $\Omega$ and $\Omega^{\prime}$ of alphabet $\Sigma$ , we say that $\Omega$ is less connecting than $\Omega^{\prime}$ and write $\Omega\preccurlyeq\Omega^{\prime}$ when $\forall\Sigma_{i}\in\Omega\ldotp\exists\Sigma_{j}\in\Omega^{\prime}\ldotp% \Sigma_{i}\subseteq\Sigma_{j}$ (equivalently, $\Omega^{\prime}$ is said to be more connecting than $\Omega$ ). The relation is strict, written $\Omega\prec\Omega^{\prime}$ , when $\Omega^{\prime}\not\preccurlyeq\Omega$ . The relation $\preccurlyeq$ forms a preorder with finite chains.

We relate this notion to the sets of counter-examples for a fixed observation function to show that adding connections in a distribution makes the counter-example set progress along a preorder. For this, we first define a notion of inclusion for counter-examples.

Definition 4.18 (Counter-example inclusion).

Consider two distributions $\Omega$ and $\Omega^{\prime}$ of $\Sigma$ , $(\sigma_{N},P)$ a counter-example to $\Omega\models\mathsf{Obs}$ and $(\sigma_{N}^{\prime},P^{\prime})$ a counter-example to $\Omega^{\prime}\models\mathsf{Obs}$ . We write $(\sigma_{N},P)\subseteq(\sigma_{N}^{\prime},P^{\prime})$ whenever $\sigma_{N}=\sigma_{N}^{\prime}$ and $\mathsf{img}(P)\subseteq\mathsf{img}(P^{\prime})$ . The strict inclusion $(\sigma_{N},P)\subset(\sigma_{N}^{\prime},P^{\prime})$ holds whenever $\sigma_{N}=\sigma_{N}^{\prime}$ and $\mathsf{img}(P)\subset\mathsf{img}(P^{\prime})$ .

In its simplest form, progress means eliminating counter-examples from the current set of counter-examples $\mathit{CED}(\Omega,\mathsf{Obs})$ . However, a counter-example $c e$ might be replaced by new counter-examples $ce^{\prime}$ such that $ce\subseteq ce^{\prime}$ , which emerge when new connections are added to the distribution. Hence, progress means that some counter-examples are either eliminated or replaced by subsuming ones, as depicted in Figure 3.

Figure 3: The different relations between the elements of two sets of counter-examples

CE\preccurlyeq CE^{\prime}

.

Definition 4.19 (Counter-example set preordering).

Consider $C E$ and $CE^{\prime}$ sets of counter-examples. We write $CE\preccurlyeq CE^{\prime}$ when $\forall ce^{\prime}\in CE^{\prime}.\ \exists ce\in CE.\ ce\subseteq ce^{\prime}$ . We write $CE\prec CE^{\prime}$ when, furthermore, either $ce\subset ce^{\prime}$ or $CE\preccurlyeq CE^{\prime}\setminus\{ce^{\prime}\}$ , for $ce\in CE$ and $ce^{\prime}\in CE^{\prime}$ .

Example 4.20.

We give a short example of the preorder on set of counter-examples inspired by our running example, that will later appear in Example 5.5:

\{(b,(\{a,b\}\mapsto cb,\{c\}\mapsto\epsilon,\{d\}\mapsto\epsilon))\}\;% \preccurlyeq\;\{(b,(\{a,b\}\mapsto cb,\{b,c\}\mapsto ab,\{d\}\mapsto\epsilon))% \}\ .

Notice that these two singleton sets’ only difference is $\{c\}\mapsto\epsilon$ replaced by $\{b,c\}\mapsto ab$ such that the image of the first is strictly included in the image of the second.

Using the above definitions, we can prove that increasing the connecting power of a distribution ensures that the set of counter-example progresses.

Proposition 4.21.

Consider two distributions $\Omega$ and $\Omega^{\prime}$ of $\Sigma$ . We have $\Omega\preccurlyeq\Omega^{\prime}\Rightarrow\mathit{CED}(\Omega,\mathsf{Obs})% \preccurlyeq\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

As an immediate consequence, whenever $\Omega$ has no counter-examples, any distribution $\Omega^{\prime}$ that is more connecting than $\Omega$ will also have none, i.e., both distributions will model the same observations.

Corollary 4.22.

Let $\Omega$ be a distribution of $\Sigma$ such that $\Omega\models\mathsf{Obs}$ . For any distribution $\Omega^{\prime}$ of $\Sigma$ such that $\Omega\preccurlyeq\Omega^{\prime}$ , we have $\Omega^{\prime}\models\mathsf{Obs}$ .

Fixing the distribution

Using Propositions 4.16 and 4.21, from an initial distribution $\Omega\not\models\mathsf{Obs}$ we can create a more connecting one that entails a strict progression in counter-examples.

Corollary 4.23.

Suppose that $\mathit{CED}(\Omega,\mathsf{Obs})\neq\emptyset$ . For $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs})$ , we pick a discrepancy $\delta_{(\sigma_{N},P)}\in\mathcal{D}(\sigma_{N},P)$ . For any non-empty subset $S$ of $\mathit{CED}(\Omega,\mathsf{Obs})$ , let $\Omega^{\prime}=\Omega\cup\{\delta_{ce}\mid ce\in S\}$ . Then $\Omega\prec\Omega^{\prime}$ and $\mathit{CED}(\Omega,\mathsf{Obs})\prec\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

$\blacktriangleright$ Remark 4.24.

Corollary 4.23 gives us the freedom to select any discrepancy $\delta_{ce}$ , for each counter-example $c e$ . We can select discrepancies that result in a least connecting distribution, which yields a locally optimal greedy strategy for progress. The intuition behind this choice is that it leads to: (1) more components that are individually smaller and easier to learn; and (2) fewer synchronizing actions between components, thus reducing the complexity of coordination among learners (see Section 5.1 for details).

By iteratively applying this Corollary, we can eliminate counter-examples until reaching a distribution that models the observations. This leads to the following convergence result:

Theorem 4.25.

Suppose $\Omega\not\models\mathsf{Obs}$ . The above process converges to a distribution $\Omega^{\prime}\models\mathsf{Obs}$ such that $\Omega\prec\Omega^{\prime}$ after finitely many of steps. For choosing $S=\mathit{CED}(\Omega,\mathsf{Obs})$ at each step, the number of steps is bounded by $|\mathcal{P}(\Sigma)|$ .

Canonical distributions: inducing a partial order

Distributions have few constraints: they only need to span the entire alphabet, which leaves room for redundancies. We propose to remove redundancies without affecting the distribution’s connecting power, by removing alphabets completely contained within another.

Definition 4.26 (Canonical distribution).

Consider a distribution $\Omega=\{\Sigma_{1},\dots,\Sigma_{n}\}$ and $\textit{Sub}=\{\Sigma_{i}\in\Omega\mid\exists\Sigma_{j}\in\Omega.\ \Sigma_{i}% \subset\Sigma_{j}\}$ . The associated canonical distribution is $\llbracket\Omega\rrbracket_{\preccurlyeq}=\Omega\setminus\textit{Sub}$ .

As one would expect $\llbracket\cdot\rrbracket_{\preccurlyeq}$ collapses equivalence classes of the preorder $\preccurlyeq$ (i.e., $\Omega\preccurlyeq\Omega^{\prime}$ and $\Omega^{\prime}\preccurlyeq\Omega$ ) to create a strict partial order. Canonical distributions allow minimizing the number of alphabets in the distribution while retaining the same connecting power. This means that counter-examples can be easily translated between a distribution and its canonical form, and hence the following proposition.

Proposition 4.27.

$\mathit{CED}(\llbracket\Omega\rrbracket_{\preccurlyeq},\mathsf{Obs})=\emptyset% \Leftrightarrow\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$

5 Compositional Learning Algorithm

Figure 4: Overview of the algorithm.

In this section, we present our algorithm to compositionally learn an unknown system $\mathit{SUL}=M_{1}\mathop{\parallel}M_{2}\mathop{\parallel}\dots\mathop{% \parallel}M_{n}$ consisting of the parallel composition of $n$ LTSs, given only a Teacher for the whole SUL and knowledge of the global alphabet $\Sigma_{\mathit{SUL}}$ .

A bird’s eye view of the algorithm is provided in Figure 4. The key idea is to learn each component via a separate learner. Each learner $L_{i}$ poses membership queries independently, which are suitably translated to queries for the global Teacher, until it produces a hypothesis $\mathcal{H}_{i}$ . Hypotheses returned by local learners are combined to create a global equivalence query. Counter-examples obtained through equivalence queries $(\sigma,b)\in\Sigma_{\mathit{SUL}}^{\star}\times\{-,+\}$ are classified as either global or local. They are global when the updated observations $\mathsf{Obs}^{\prime}=\mathsf{Obs}\cup\{(\sigma,b)\}$ and the current distribution $\Omega$ of $\Sigma_{\mathit{SUL}}$ are such that $\Omega\not\models\mathsf{Obs}^{\prime}$ , and local otherwise. Global counter-examples are used to refine the distribution $\Omega$ , possibly creating components/learners. Local counter-examples are used to update the state of local learners.

We briefly recall the local learning procedure, which was introduced in previous work, before moving on to presenting the details of our main algorithm in Section 5.2.

5.1 Local learners

For each alphabet $\Sigma_{i}$ in the distribution $\Omega$ , we spawn a learner $L_{i}$ that is tasked with formulating a hypothesis for the corresponding component. A key feature of this learner is the ability to translate local membership queries into global ones for the entire SUL, and to interpret the results of global queries at the local level [31]. The main difficulty is that this translation is not always feasible: when a membership query contains synchronizing actions, cooperation with the other learners is required. This should be done in a way that is consistent with the current distribution, i.e., if $\Omega\models\mathsf{Obs}$ , then for any global membership query result $(\sigma,b)\in\Sigma^{\star}_{\mathit{SUL}}\times\{-,+\}$ , we require $\Omega\models\mathsf{Obs}\cup\{(\sigma,b)\}$ . Furthermore, due to the nature of local observation functions, it is possible that at first $\mathsf{Obs}_{\Sigma_{i}}(\sigma)=\mathnormal{-}$ , and later this becomes $\mathsf{Obs}_{\Sigma_{i}}(\sigma)=\mathnormal{+}$ as the set of global observations is extended through counter-examples; the learners must account for this. In summary, local learners should:

$\blacksquare$

translate local queries to global ones, preserving $\Omega\models\mathsf{Obs}$ . If a local query cannot be translated, the answer is “unknown”;
$\blacksquare$

be able to handle “unknown” entries in the learning data structure (e.g., an observation table), ensuring progress even in the presence of incomplete information; and
$\blacksquare$

be able to correct negative counter-examples on the basis of a later positive counter-example.

Our previous work [31] shows one way to implement such a learner as an extension of the $\mathtt{L^{\star}}$ algorithm [4], in which LTSs are represented as prefix-closed DFAs. However, we remark that the algorithms proposed in the present work are independent of the implementation of these local learners, as long as they satisfy the requirements above. Thus, our $\mathtt{L^{\star}}$ -based implementation can be swapped out for other active learner algorithms, such as TTT [21].

5.2 Main algorithm

The main algorithm is presented in Algorithm 1. Initially, $\mathsf{Obs}$ is empty, and the distribution $\Omega$ contains singletons of the alphabet of the SUL. The algorithm iteratively performs the following steps.

Each learner is run in parallel until producing a hypothesis. Observations $\mathsf{Obs}$ are suitably updated to record the interactions with the Teacher. Next, the local hypotheses are composed in parallel to form $\mathcal{H}$ , which is submitted to the Teacher as an equivalence query $\mathit{Teacher}(\mathcal{H})$ . If the query returns no counter-example, the algorithm returns $\mathcal{H}$ and terminates. Otherwise, the returned counter-example $(\sigma_{cex},b)$ is added to $\mathsf{Obs}$ .

Crucially, when $(\sigma_{cex},b)$ is a global counter-example, it corresponds exactly to counter-examples to the distribution (Definition 4.8), as shown in the following lemma.

Lemma 5.1.

Given a global counter-example $(\sigma,b)$ , let $\mathsf{Obs}^{\prime}=\mathsf{Obs}\cup\{(\sigma,b)\}$ :

$\blacksquare$

if $b=-$ , then there is $P\in\mathsf{Dom}(\mathsf{Obs})^{\Omega}$ such that $(\sigma,P)\in\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})$ .
$\blacksquare$

else, there is $\sigma_{N}\in\mathsf{Dom}(\mathsf{Obs})$ , $S\subseteq\Omega$ and $P\in\mathsf{Dom}(\mathsf{Obs})^{\Omega\setminus S}$ such that $(\sigma_{N},P\cup(\Sigma_{i}\mapsto\sigma)_{\Sigma_{i}\in S})\in\mathit{CED}(% \Omega,\mathsf{Obs}^{\prime})$

Furthermore, all of the elements of $\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})$ have the above structure.

Therefore, based on this lemma, the distribution is augmented with discrepancies for a chosen subset $S$ of distribution counter-examples, following Corollary 4.23. This process eventually converges to provide $\Omega$ such that $\Omega\models\mathsf{Obs}$ (Theorem 4.25). The new distribution is then optimized by making it canonical (Definition 4.26) and, if desired, increasing its connectivity. The optimization step does not affect counter-example-freeness (by Proposition 4.27 and Corollary 4.22) and may be used to reduce synchronizations, which improves performance (see Section 6). New learners are then started over the updated alphabets.¹¹1In practice, learners leverage $\mathsf{Obs}$ to partially initialize their observation tables.

If, instead, the counter-example is local, its projections are forwarded to the local learners, and the next iteration starts.

$\blacktriangleright$ Remark 5.2.

We leave the selection of counter-example set $S$ as an implementation choice. While $S=\mathit{CED}(\Omega,\mathsf{Obs})$ maximizes counter-example elimination, finding all counter-examples may be expensive. In our implementation, we process just one counter-example at a time, which in practice often yields a valid distribution after a single update step.

Our main theorem for this section states that the algorithm terminates and returns a correct model of the SUL.

Algorithm 1 Main learning algorithm.

Theorem 5.3.

Let $\mathit{SUL}=M_{1}\mathop{\parallel}M_{2}\mathop{\parallel}\dots\mathop{% \parallel}M_{n}$ consist of $n$ parallel LTSs. Algorithm 1 terminates and returns $\mathcal{H}_{1},\mathcal{H}_{2},\dots,\mathcal{H}_{k}$ such that $\mathcal{H}_{1}\mathop{\parallel}\mathcal{H}_{2}\mathop{\parallel}\dots\mathop% {\parallel}\mathcal{H}_{k}\equiv\mathit{SUL}$ .

$\blacktriangleright$ Remark 5.4.

We make no claims regarding the number of components returned by the algorithm, nor whether $\mathcal{H}_{i}\equiv M_{i}$ , for all $i$ . This is because the final distribution may vary depending on counter-example and discrepancy choices. Moreover, different sets of component LTSs can result in the same parallel composition (see [31, Remark 2]). While the composition $\mathcal{H}_{1}\mathop{\parallel}\mathcal{H}_{2}\mathop{\parallel}\dots\mathop% {\parallel}\mathcal{H}_{k}$ may not be canonical (i.e., minimal), each $\mathcal{H}_{i}$ is guaranteed to be a canonical model of the local observation function $\mathsf{Obs}_{\Sigma_{i}}$ , as each local learner is a (slightly modified) instance of $\mathtt{L^{\star}}$ , for which we have minimality guarantees.

Example 5.5 (Example run).

We give an example run where the target SUL is the model of Figure 1. For the sake of simplicity, we focus on the global counter-examples and the subsequent distribution updates, considering only one distribution counter-example per step. Moreover, we consistently select a smallest discrepancy for each counter-example as our greedy strategy to minimize the connectivity of the resulting distribution.

We start from $\Omega_{singles}=\{\{a\},\{b\},\{c\},\{d\}\}$ . The local alphabets initially contain only one symbol, so local learners will make membership queries about traces containing exclusively that symbol. This leads to the components depicted below.

The first global counter-example is $(ab,\mathnormal{+})$ , yielding several counter-examples to $\Omega_{singles}\models\mathsf{Obs}$ , of which we consider $(b,(\{a\}\mapsto\epsilon,\{b\}\mapsto ab,\{c\}\mapsto\epsilon,\{d\}\mapsto% \epsilon))$ . The smallest discrepancy for this counter-example is $\{a,b\}$ . We use it to update the distribution and obtain $\Omega_{ab}=\{\{a,b\},\{c\},\{d\}\}$ ,²²2We made the distribution canonical (Definition 4.26) and removed the $\{a\}$ and $\{b\}$ components., which models the current observations. The new component over $\{a,b\}$ is then learned locally, producing (the $\{c\}$ and $\{d\}$ components are unchanged):

The next global counter-example $(cb,+)$ , leads to distribution counter-example $(b,(\{a,b\}\mapsto cb,\{c\}\mapsto\epsilon,\{d\}\mapsto\epsilon))$ . Its smallest discrepancy is $\{b,c\}$ and the new distribution is $\Omega_{ab,bc}=\{\{a,b\},\{b,c\},\{d\}\}$ . Although the counter-example has been handled, $\Omega_{ab,bc}$ does not model the observations, as $\mathit{CED}(\Omega_{ab,bc},\mathsf{Obs})$ contains $(b,(\{a,b\}\mapsto cb,\{b,c\}\mapsto ab,\{d\}\mapsto\epsilon))$ . Its smallest discrepancy $\{a,b,c\}$ gives $\Omega_{abc}=\{\{a,b,c\},\{d\}\}$ , modelling the observations.

To finish our example, the next global counter-example is $(abd,+)$ . The corresponding distribution counter-example is $(d,(\{a,b,c\}\mapsto\epsilon,\{d\}\mapsto abd))$ . There are two smallest discrepancies for this counter-example: $\{a,d\}$ and $\{b,d\}$ . Selecting $\{b,d\}$ leads to $\{\{a,b,c\},\{b,d\}\}$ , which models the target language and exactly corresponds to the decomposition of Figure 1. Selecting $\{a,d\}$ creates unnecessary connections, resulting (after some omitted steps) in either $\{\{a,b,c\},\{a,d\},\{b,d\}\}$ or $\{\{a,b,c\},\{a,b,d\}\}$ as a final distribution.

Our current implementation selects either discrepancy as both are locally optimal. Finding efficient ways to explore multiple discrepancy choices for globally optimal distributions remains an open challenge for future work.

6 Experiments

We evaluate the effectiveness of our approach in terms of savings in membership and equivalence queries. We did not expect to gain efficiency (absolute execution time) over the mature available tools, because our current implementation of local learners [31] interfaces with an external SAT solver for forming local hypotheses. To start with, we extended the tool Coal [31] into CoalA (for COmpositional Automata Learner with Alphabet refinement), by adding the ability to refine the alphabets based on global counter-examples. The tool is based on LearnLib 0.18.0 [22]. As discussed in Section 5.2, the theory allows optimizing the distribution. Our implementation of this is based on greedily finding a (clique) edge cover in the hypergraph $(\Sigma,\Delta)$ , where $\Delta$ is the set of all discrepancies found thus far. Since the learners $L_{i}$ perform better when their alphabet contains more local actions, our algorithm tries to optimise for this. We sometimes also merge components to convert synchronizations into local actions.

To validate our approach, we experiment with learning LTSs obtained from three sources:

1.

300 randomly generated LTSs from our previous work [31], varying in structure and size.
2.

328 LTSs obtained from Petri nets that are provided by the Hippo tool [40] website;³³3https://hippo.uz.zgora.pl/ these are often more sequential in nature than our other models.
3.

Two scalable realistic models, namely CloudOpsManagement from the 2019 Model Checking Contest [3] and a producers/consumers model [41, Fig. 8].

Using a machine with four Intel Xeon 6136 processors and 3TB of RAM running Ubuntu 20.04, we apply each of three approaches: (i) our black-box compositional approach (CoalA), (ii) compositional learning with given alphabets (Coal) [31], and (iii) monolithic learning with $\mathtt{L^{\star}}$ (as implemented in LearnLib). Coal can be viewed as an idealized (best-case) baseline where the knowledge of the system decomposition is already available. Each run has a time-out of 30 minutes. We record the number of membership and equivalence queries posed to the Teacher, which we assume answers queries in constant time. This eliminates variations in runtime caused by the Teacher, and ensures that local negative counter-examples (see Section 5.1) are always eventually corrected. Resolving this limitation is orthogonal to the current work and left for future research. A complete replication package is at [18].

Figure 5: Performance of

\mathtt{L^{\star}}

and compositional learning on 300 randomly generated models. Dashed lines indicate time-outs. Results obtained with Coal are in gray.

Random models

Figure 5 shows the results on our random models. The colors indicate various communication structures (see [31] for details). As a reference, the results obtained with Coal are given in gray. We observe that CoalA requires significantly fewer membership queries than $\mathtt{L^{\star}}$ (note the logarithmic scale) and is closer to the theoretical optimum of Coal; the result show 5-6 orders of magnitude of improvement in a large number of concurrent systems. The number of equivalence queries required by CoalA is typically slightly higher, but results of larger instances suggest that CoalA scales better than its monolithic counterpart. The data shows that also in the case of equivalence queries, it is not uncommon to gain an order of magnitude of saving by using our approach. We note that CoalA timeouts occur not due to a high query count, but because local learners use SAT solving to construct minimal hypotheses from observation tables that contain unknowns. This process can be computationally expensive.

Petri Net models

The results of learning the Hippo models are given in Figure 6. Here we are not able to run Coal, since the component alphabets are not known. CoalA does not perform as well as on random models, in particular it requires more equivalence queries than monolithic $\mathtt{L^{\star}}$ . This is explained by the fact that these Petri nets contain mostly sequential behavior, i.e., the language roughly has the shape $\mathcal{L}_{1}\cdot a\cdot\mathcal{L}_{2}$ for some languages $\mathcal{L}_{1}\subseteq\Sigma_{1}^{*}$ and $\mathcal{L}_{2}\subseteq\Sigma_{2}^{*}$ . Even though our learner is able to find the decomposition $\{\Sigma_{1}\cup\{a\},\Sigma_{2}\cup\{a\}\}$ , we do not gain much due to the absence of concurrent behavior. In the Hippo benchmark set, CoalA typically finds between two and nine components.

Figure 6: Performance of

\mathtt{L^{\star}}

and compositional learning on Hippo models. Dashed lines indicate time-outs or out-of-memory.

Realistic models

Finally, Table 1 shows the results of learning two scalable models with relevant parameters indicated. No timeouts are reported in the table because all systems were successfully learned within the time limit. CoalA scales well as the SUL size increases, requiring roughly a constant factor more queries than Coal for both CloudOps and producers/consumers. We remark that practically all runtime of CoalA and Coal is spent in the local learners: they require expensive SAT queries to construct a local hypothesis. Improving the implementation of local learners would decrease these times significantly; we stress that this is orthogonal to the goal of the current work. Furthermore, in any practical scenario, the processing of queries by the Teacher forms the bottleneck and $\mathtt{L^{\star}}$ would be much slower than CoalA.

Table 1: Performance of CoalA and

\mathtt{L^{\star}}

for realistic composite systems. Reported runtimes are in seconds. The number of refinement iterations is listed under “it.” and the number of components found under “com”.

		CoalA					Coal				$\mathtt{L^{\star}}$
model	states	time	memQ	eqQ	it.	com	time	memQ	eqQ	com	time	memQ	eqQ
CloudOps W=1,C=1,N=3	$690$	$225.21$	$7\,686$	$106$	$67$	$9$	$1.27$	$880$	$24$	$5$	$3.29$	$2\,740\,128$	$88$
CloudOps W=1,C=1,N=4	$1\,932$	$235.97$	$9\,521$	$115$	$74$	$10$	$1.93$	$923$	$26$	$6$	$29.62$	$22\,252\,120$	$216$
CloudOps W=2,C=1,N=3	$3\,858$	$232.87$	$25\,968$	$98$	$63$	$8$	$357.51$	$9\,812$	$29$	$5$	$12.69$	$12\,574\,560$	$99$
CloudOps W=2,C=1,N=4	$10\,824$	$708.95$	$33\,616$	$111$	$72$	$10$	$555.82$	$9\,532$	$30$	$6$	$143.28$	$91\,178\,900$	$227$
ProdCons K=3,P=2,C=2	$1\,664$	$3.84$	$685$	$26$	$23$	$5$	$1.40$	$301$	$13$	$5$	$3.05$	$2\,141\,165$	$43$
ProdCons K=5,P=1,C=1	$170$	$1.03$	$451$	$22$	$14$	$3$	$0.88$	$118$	$11$	$3$	$0.40$	$160\,126$	$30$
ProdCons K=5,P=2,C=1	$662$	$1.59$	$379$	$24$	$17$	$4$	$0.93$	$158$	$13$	$4$	$2.79$	$2\,523\,625$	$91$
ProdCons K=5,P=2,C=2	$2\,240$	$5.53$	$697$	$28$	$25$	$5$	$2.35$	$282$	$14$	$5$	$6.40$	$6\,984\,705$	$93$
ProdCons K=5,P=3,C=2	$8\,750$	$17.82$	$1\,305$	$31$	$30$	$6$	$6.33$	$604$	$15$	$6$	$72.87$	$60\,186\,235$	$187$
ProdCons K=5,P=3,C=3	$30\,344$	$73.69$	$2\,454$	$34$	$37$	$7$	$32.59$	$1\,269$	$17$	$7$	$321.54$	$222\,567\,729$	$193$
ProdCons K=7,P=2,C=2	$2\,816$	$5.26$	$715$	$29$	$25$	$5$	$2.20$	$307$	$15$	$5$	$16.92$	$15\,792\,997$	$135$

7 Conclusion

We presented a novel active learning algorithm that automatically discovers component decompositions using only global observations. Unlike previous approaches, our technique handles a general synchronization scheme common in automata theory and process calculi, without any prior knowledge of component structure. We developed a theory of alphabet distributions and their relation to observations, formally characterizing counter-examples that indicate inconsistencies between distributions and observations, and providing systematic methods to resolve them. The algorithm spawns a local learner for each component, dynamically refining the distribution of the alphabet based on counter-examples. Our CoalA implementation dramatically reduces membership queries and achieves better query scalability than monolithic learning on highly concurrent systems. Future work will focus on:

$\blacksquare$

developing a theory of counter-example and discrepancy selection to characterize optimal distributions;
$\blacksquare$

extending our approach to apartness-based learning [37] and more expressive formalisms such as register automata [9] or timed automata [39];
$\blacksquare$

leveraging our compositional techniques to analyze AI-generated code [32]; and
$\blacksquare$

improving the efficiency of the local learners [24].

References

[1] Fides Aarts, Joeri de Ruiter, and Erik Poll. Formal models of bank cards for free. In ICST, pages 461–468, 2013. doi:10.1109/ICSTW.2013.60.
[2] Fides Aarts, Julien Schmaltz, and Frits W. Vaandrager. Inference and abstraction of the biometric passport. In ISoLA, volume 6415, pages 673–686, 2010. doi:10.1007/978-3-642-16558-0_54.
[3] Elvio Amparore et al. Presentation of the 9th Edition of the Model Checking Contest. In TACAS, volume 11429, pages 50–68, 2019. doi:10.1007/978-3-030-17502-3_4.
[4] Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, 75(2):87–106, 1987. doi:10.1016/0890-5401(87)90052-6.
[5] Franco Barbanera, Mariangiola Dezani-Ciancaglini, Ivan Lanese, and Emilio Tuosto. Composition and decomposition of multiparty sessions. Journal of Logical and Algebraic Methods in Programming, 119:100620, 2021. doi:10.1016/J.JLAMP.2020.100620.
[6] Benedikt Bollig, Joost-Pieter Katoen, Carsten Kern, and Martin Leucker. Learning Communicating Automata from MSCs. IEEE Transactions on Software Engineering, 36(3):390–408, 2010. doi:10.1109/TSE.2009.89.
[7] Manfred Broy, Bengt Jonsson, Joost-Pieter Katoen, Martin Leucker, and Alexander Pretschner, editors. Model-Based Testing of Reactive Systems, Advanced Lectures [The volume is the outcome of a research seminar that was held in Schloss Dagstuhl in January 2004], volume 3472, 2005. doi:10.1007/B137241.
[8] Ilaria Castellani, Madhavan Mukund, and P. S. Thiagarajan. Synthesizing distributed transition systems from global specification. In FSTTCS, volume 1738, pages 219–231, 1999. doi:10.1007/3-540-46691-6_17.
[9] Simon Dierl, Paul Fiterau-Brostean, Falk Howar, Bengt Jonsson, Konstantinos Sagonas, and Fredrik Tåquist. Scalable tree-based register automata learning. In TACAS, volume 14571, pages 87–108, 2024. doi:10.1007/978-3-031-57249-4_5.
[10] Andrzej Ehrenfeucht and Grzegorz Rozenberg. Partial (set) 2-structures. part II: state spaces of concurrent systems. Acta Informatica, 27(4):343–368, 1990. doi:10.1007/BF00264612.
[11] Tiago Ferreira, Harrison Brewton, Loris D’Antoni, and Alexandra Silva. Prognosis: closed-box analysis of network protocol implementations. In SIGCOMM, pages 762–774, 2021. doi:10.1145/3452296.3472938.
[12] Paul Fiterau-Brostean, Ramon Janssen, and Frits W. Vaandrager. Combining Model Learning and Model Checking to Analyze TCP Implementations. In CAV, volume 9780, pages 454–471, 2016. doi:10.1007/978-3-319-41540-6_25.
[13] Paul Fiterau-Brostean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits W. Vaandrager, and Patrick Verleg. Model learning and model checking of SSH implementations. In SPIN, pages 142–151, 2017. doi:10.1145/3092282.3092289.
[14] Markus Frohme and Bernhard Steffen. Compositional learning of mutually recursive procedural systems. International Journal on Software Tools for Technology Transfer, 23(4):521–543, 2021. doi:10.1007/s10009-021-00634-y.
[15] Dominik Fuchß, Haoyu Liu, Tobias Hey, Jan Keim, and Anne Koziolek. Enabling architecture traceability by llm-based architecture component name extraction. In ICSA, pages 1–12, 2025. doi:10.1109/ICSA65012.2025.00011.
[16] E. Mark Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302–320, 1978. doi:10.1016/S0019-9958(78)90562-4.
[17] Roberto Guanciale and Emilio Tuosto. Realisability of pomsets via communicating automata. In ICE, volume 279, pages 37–51, 2018. doi:10.4204/EPTCS.279.6.
[18] Léo Henry, Mohammad Mousavi, Thomas Neele, and Matteo Sammartino. Replication package for the paper “Compositional Active Learning of Synchronous Systems through Automated Alphabet Refinement”, April 2025. doi:10.5281/zenodo.15170685.
[19] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall International, 2004.
[20] Falk Howar and Bernhard Steffen. Active automata learning in practice - an annotated bibliography of the years 2011 to 2016. In Machine Learning for Dynamic Software Analysis, volume 11026, pages 123–148, 2018. doi:10.1007/978-3-319-96562-8_5.
[21] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In RV, volume 8734, pages 307–322, 2014. doi:10.1007/978-3-319-11164-3_26.
[22] Malte Isberner, Falk Howar, and Bernhard Steffen. The Open-Source LearnLib: A Framework for Active Automata Learning. In CAV, volume 9206, pages 487–495, 2015. doi:10.1007/978-3-319-21690-4_32.
[23] Rainer Koschke. Architecture Reconstruction. Springer-Verlag, 2009. doi:10.1007/978-3-540-95888-8.
[24] Loes Kruger, Sebastian Junges, and Jurriaan Rot. Small test suites for active automata learning. In TACAS, volume 14571, pages 109–129, 2024. doi:10.1007/978-3-031-57249-4_6.
[25] Faezeh Labbaf, Jan Friso Groote, Hossein Hojjat, and Mohammad Reza Mousavi. Compositional learning for interleaving parallel automata. In FoSSaCS, volume 13992, pages 413–435, 2023. doi:10.1007/978-3-031-30829-1_20.
[26] Bas Luttik. Unique parallel decomposition in branching and weak bisimulation semantics. Theoretical Computer Science, 612:29–44, 2016. doi:10.1016/j.tcs.2015.10.013.
[27] Aryan Bastany Mahboubeh Samadi and Hossein Hojjat. Compositional learning for synchronous parallel automata. In FASE, volume 15693, pages 101–121, 2025. doi:10.1007/978-3-031-90900-9_6.
[28] Stefan Marksteiner, Marjan Sirjani, and Mikael Sjödin. Automated passport control: Mining and checking models of machine readable travel documents. In ARES, 2024. doi:10.1145/3664476.3670454.
[29] Joshua Moerman. Learning product automata. In ICGI, volume 93, pages 54–66, 2018. URL: http://proceedings.mlr.press/v93/moerman19a.html.
[30] Madhavan Mukund. From global specifications to distributed implementations. In Synthesis and Control of Discrete Event Systems, pages 19–35, 2002. doi:10.1007/978-1-4757-6656-1_2.
[31] Thomas Neele and Matteo Sammartino. Compositional automata learning of synchronous systems. In FASE, volume 13991, pages 47–66, 2023. doi:10.1007/978-3-031-30826-0_3.
[32] Marc North, Amir Atapour-Abarghouei, and Nelly Bencomo. Code gradients: Towards automated traceability of llm-generated code. In RE, pages 321–329, 2024. doi:10.1109/RE59067.2024.00038.
[33] Mathijs Schuts, Jozef Hooman, and Frits W. Vaandrager. Refactoring of legacy software using model learning and equivalence checking: An industrial experience report. In IFM, volume 9681, pages 311–325, 2016. doi:10.1007/978-3-319-33693-0_20.
[34] Maurice H. ter Beek, Rolf Hennicker, and José Proença. Team automata: Overview and roadmap. In COORDINATION, volume 14676, pages 161–198, 2024. doi:10.1007/978-3-031-62697-5_10.
[35] Maurice H. ter Beek, Rolf Hennicker, and José Proença. Overview and roadmap of team automata. CoRR, abs/2501.13589, 2025. doi:10.48550/arXiv.2501.13589.
[36] Frits W. Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, 2017. doi:10.1145/2967606.
[37] Frits W. Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann. A new approach for active automata learning based on apartness. In TACAS, volume 13243, pages 223–243, 2022. doi:10.1007/978-3-030-99524-9_12.
[38] Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris Köpf. CacheQuery: Learning Replacement Policies from Hardware Caches. In PLDI, pages 519–532, 2020. doi:10.1145/3385412.3386008.
[39] Masaki Waga. Active learning of deterministic timed automata with myhill-nerode style characterization. In CAV, pages 3–26, 2023. doi:10.1007/978-3-031-37706-8_1.
[40] Remigiusz Wiśniewski, Grzegorz Bazydło, Marcin Wojnakowski, and Mateusz Popławski. Hippo-CPS: A Tool for Verification and Analysis of Petri Net-Based Cyber-Physical Systems. In Petri Nets 2023, volume 13929, pages 191–204, 2023. doi:10.1007/978-3-031-33620-1_10.
[41] W.M. Zuberek. Petri net models of process synchronization mechanisms. In SMC, volume 1, pages 841–847, 1999. doi:10.1109/ICSMC.1999.814201.

Appendix A Proofs

A.1 Section 4

Proposition 4.6. [Restated, see original statement.]

$\Omega\models\mathsf{Obs}$ if and only if for all traces $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ it holds that $\mathsf{Obs}(\sigma)=\bigwedge_{\Sigma_{i}\in\Omega}\mathsf{Obs}_{\Sigma_{i}}(% {\sigma}_{\upharpoonright\Sigma_{i}}).$

Proof.

We make the proof by double implication.

Suppose $\Omega\models\mathsf{Obs}$ . By definition this means that there is $\Omega\models\mathcal{L}$ such that $\mathcal{L}\models\mathsf{Obs}$ . Since $L$ is a product language, Lemma 4.4 yields that $\mathcal{L}=\{\sigma\mid\forall 1\leq i\leq n\ldotp{\sigma}_{\upharpoonright% \Sigma_{i}}\in\mathcal{L}_{i}\}$ where $\mathcal{L}_{i}=\{{\sigma}_{\upharpoonright\Sigma_{i}}\mid\sigma\in\mathcal{L}\}$ for all $i$ .

Now for any $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ , we wish to show that

\mathsf{Obs}(\sigma)=+\text{ iff }\bigwedge_{\Sigma_{i}\in\Omega}\mathsf{Obs}_% {\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_{i}})=+.

We separate the two implications.

If $\mathsf{Obs}(\sigma)=+$ , then $\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_{i}})=+$ for all $i$ by definition of local observation functions and we have our result.

If $\bigwedge_{\Sigma_{i}\in\Omega}\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{% \upharpoonright\Sigma_{i}})=+$ , it means that for any $i$ , $\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_{i}})=+$ . By definition of local observations, for all $i$ there is $\sigma^{\prime}_{i}\in\mathsf{Dom}(\mathsf{Obs})$ such that $\mathsf{Obs}(\sigma^{\prime}_{i})=+$ and ${\sigma}_{\upharpoonright{\Sigma_{i}}}={{\sigma^{\prime}_{i}}}_{% \upharpoonright{\Sigma_{i}}}$ . As $\mathcal{L}\models\mathsf{Obs}$ , $\sigma^{\prime}_{i}\in\mathcal{L}$ .

This implies that $\forall i\ldotp\exists\sigma^{\prime}_{i}\in\mathcal{L}\land{\sigma}_{% \upharpoonright\Sigma_{i}}={{\sigma^{\prime}_{i}}}_{\upharpoonright\Sigma_{i}}$ . Hence for all $i$ , ${\sigma}_{\upharpoonright\Sigma_{i}}\in\mathcal{L}_{i}$ by definition of $\mathcal{L}_{i}$ . It follows by definition of $\mathcal{L}$ that $\sigma\in\mathcal{L}$ . As $\mathcal{L}\models\mathsf{Obs}$ , we have $\mathsf{Obs}(\sigma)=+$ .

Conversely, suppose that $\forall\sigma\in\mathsf{Dom}(\mathsf{Obs}),\ \mathsf{Obs}(\sigma)=\bigwedge_{% \Sigma_{i}\in\Omega}\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_% {i}})$ .

Consider the language $\mathcal{L}=\{\sigma\in\mathsf{Dom}(\mathsf{Obs})\mid\mathsf{Obs}(\sigma)=+\}$ and, for all $i$ , let $\mathcal{L}_{i}=\{\sigma\in\mathsf{Dom}(\mathsf{Obs}_{\Sigma_{i}})\mid\mathsf{% Obs}_{\Sigma_{i}}(\sigma)=+\}$ . Clearly, we have that $\mathcal{L}\models\mathsf{Obs}$ . Furthermore, by assumption, $\mathcal{L}=\{\sigma\mid\forall 1\leq i\leq n,{\sigma}_{\upharpoonright\Sigma_% {i}}\in\mathcal{L}_{i}\}$ . Hence $\Omega\models\mathcal{L}$ by definition. $\Omega\models\mathsf{Obs}$ follows. $\hfill\blacktriangleleft$

Corollary 4.10. [Restated, see original statement.]

$\Omega\models\mathsf{Obs}\iff\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$ .

Proof.

We make the proof by double implication.

When $\mathit{CED}(\Omega,\mathsf{Obs})\neq\emptyset$ there is a counter-example $(\sigma_{N},P)$ to $\mathsf{Obs}\models\mathsf{Obs}$ . In particular $\mathsf{Obs}(\sigma_{N})=-$ and for all $\Sigma_{i}\in\Omega$ $\mathsf{Obs}_{\Sigma_{i}}({\sigma_{N}}_{\upharpoonright\Sigma_{i}})=\mathsf{% Obs}_{\Sigma_{i}}({P(\Sigma_{i})}_{\upharpoonright\Sigma_{i}})=+$ by definition of a counter-example and local observation functions. Hence by Proposition 4.6, $\Omega\not\models\mathsf{Obs}$ .

Conversely, when $\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$ then for any $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ such that $\mathsf{Obs}(\sigma)=-$ there is some $i$ such that for any $\sigma^{\prime}\in\mathsf{Dom}(\mathsf{Obs})$ verifying $\mathsf{Obs}_{\Sigma_{i}}({{\sigma^{\prime}}}_{\upharpoonright\Sigma_{i}})=% \mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright\Sigma_{i}})$ , $\mathsf{Obs}(\sigma^{\prime})=-$ .

It follows that for any $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ such that $\mathsf{Obs}(\sigma)=-$ , $\bigwedge_{1\leq i\leq n}\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright% \Sigma_{i}}=-)$ . As by definition of local observations, for any $\sigma\in\mathsf{Dom}(\mathsf{Obs})$ such that $\mathsf{Obs}(\sigma)=+$ , $\bigwedge_{1\leq i\leq n}\mathsf{Obs}_{\Sigma_{i}}({\sigma}_{\upharpoonright% \Sigma_{i}}=+)$ we have $\Omega\models\mathsf{Obs}$ by Proposition 4.6. $\hfill\blacktriangleleft$

Proposition 4.16. [Restated, see original statement.]

Suppose there exists $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs})$ . For each discrepancy $\delta\in\mathcal{D}(\sigma_{N},P)$ , $(\sigma_{N},P)\not\in\mathit{CED}(\Omega\cup\{\delta\},\mathsf{Obs})$ . Conversely, for any distribution $\Omega^{\prime}$ of $\Sigma$ where $\delta\not\subseteq\Sigma_{i}$ , for all $\delta\in\mathcal{D}(\sigma_{N},P)$ and all $\Sigma_{i}\in\Omega^{\prime}$ , $(\sigma_{N},P)$ is a counter-example to $\Omega^{\prime}\models\mathsf{Obs}$ .

Proof.

Consider a discrepancy $\delta$ . We first prove that introducing $\delta$ into $\Omega$ is sufficient to remove the counter-example. As $\mathsf{Obs}$ is not changed by the change from $\Omega$ to $\Omega^{\prime}$ we have to prove that there is no $\sigma_{\Sigma_{i}}\in\mathsf{img}(P)$ such that ${\sigma_{N}}_{\upharpoonright\delta}={{\sigma_{\Sigma_{i}}}}_{\upharpoonright\delta}$ .

Fix $\sigma_{\Sigma_{i}}\in\mathsf{img}(P)$ .

$\blacksquare$

If $\mathcal{D}_{m}^{\Sigma_{i}}(\sigma_{N},P)\cap\delta\neq\emptyset$ then there is $a\in\delta$ that occurs in different multiplicities in $\sigma_{N}$ and $\sigma_{\Sigma_{i}}$ . Hence, $a$ also occurs in different multiplicities in ${\sigma_{N}}_{\upharpoonright\delta}$ and ${{\sigma}}_{\upharpoonright\delta}$ . Hence, ${\sigma}_{\upharpoonright\delta}\neq{{\sigma_{i}}}_{\upharpoonright\delta}$ .
$\blacksquare$

Otherwise, following the notations of Definition 4.12, $\sigma_{N}^{\prime}=\pi({\sigma_{\Sigma_{i}}}_{\upharpoonright\theta})$ and for all $j<k$ such that $\sigma_{N}^{\prime}[j]=\sigma_{N}^{\prime}[k]$ it holds that $\pi(j)<\pi(k)$ . By preserving the order of equal symbols, we maintain that $\sigma[j]=a$ is the $l$ -th occurrence of $a$ in $\sigma_{N}^{\prime}$ iff $\sigma^{\prime}[\pi(j)]=a$ is the $l$ -th occurrence of $a$ in $\sigma_{i}$ .

Now $\{a,b\}=\{\sigma_{N}^{\prime}[j],\sigma_{N}^{\prime}[k]\}\subseteq\delta$ by definition of a discrepancy and $\{a,b\}$ is created from a so-called inversion in $\pi$ : a pair $(j,k)$ such that $j<k$ and $\pi(j)>\pi(k)$ . By our assumption on $\pi$ , we know that $a\neq b$ and furthermore that in $\sigma^{\prime}$ at least one more copy of $b$ precedes this $a$ (namely the $b$ at $\sigma^{\prime}[\pi(k)]$ ). As a result of this and the fact that $\{a,b\}\subseteq\delta$ , we have ${\sigma_{N}^{\prime}}_{\upharpoonright\delta}\neq{{{\sigma_{\Sigma_{i}}}_{% \upharpoonright\theta}}}_{\upharpoonright\delta}$ . It follows directly that ${\sigma_{N}}_{\upharpoonright\delta}\neq{{\sigma_{\Sigma_{i}}}}_{% \upharpoonright\delta}$ .

We now prove that it is necessary. Consider $\Omega^{\prime}$ defined as above and fix $\Sigma_{j}\in\Omega^{\prime}$ . In the following, we show that there is $\sigma\in\mathsf{img}(P)$ such that ${\sigma_{N}}_{\upharpoonright\Sigma_{j}}={{\sigma}}_{\upharpoonright\Sigma_{j}}$ .

For this, as there is no discrepancy $\delta\in\mathcal{D}(\sigma_{N},P)$ subset to $\Sigma_{j}$ we know that there is at least one $\Sigma_{i}\in\Omega$ such that no order nor multiplicity discrepancy for $\sigma$ $\delta_{\Sigma_{i}}$ is a subset of $\Sigma_{j}$ . Because of this, $\Sigma^{m}(\sigma_{N})\mathrel{\Delta}\Sigma^{m}(P(\Sigma_{i}))\cap\Sigma_{j}=\emptyset$ . Hence ${{P(\Sigma_{i})}}_{\upharpoonright\Sigma_{j}}$ and ${{\sigma_{N}}}_{\upharpoonright\Sigma_{j}}$ are equal up to permutation. Consider any permutation $\pi^{\prime}$ such that $\sigma_{N}^{\prime}=\pi^{\prime}(\sigma_{i})$ (following again the notations of Definition 4.12). Without loss of generality, we do not permute the position of equal symbols, which fixes a unique permutation $\pi^{\prime}=\pi$ . For any $\{\{\sigma_{N}^{\prime}[j],\sigma_{N}^{\prime}[k]\}\mid j<k\land\pi(j)>\pi(k)\}$ , we know (as these are elements of $\mathcal{D}_{o}^{\Sigma_{i}}(\sigma_{N},P)$ ) that we don’t have $\{\sigma_{N}^{\prime}[j],\sigma_{N}^{\prime}[k]\}\subseteq\Sigma_{j}$ . It follows that the restriction of $\pi$ to the indices corresponding to symbols in $\Sigma_{j}$ is non-decreasing and thus the identity. It follows that ${\sigma_{N}}_{\upharpoonright\Sigma_{j}}={P(\Sigma_{i})}_{\upharpoonright% \Sigma_{j}}$ . $\hfill\blacktriangleleft$

Proposition 4.21. [Restated, see original statement.]

Consider two distributions $\Omega$ and $\Omega^{\prime}$ of $\Sigma$ . We have $\Omega\preccurlyeq\Omega^{\prime}\Rightarrow\mathit{CED}(\Omega,\mathsf{Obs})% \preccurlyeq\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

Proof.

We first prove that $\mathit{CED}(\Omega,\mathsf{Obs})\preccurlyeq\mathit{CED}(\Omega^{\prime},% \mathsf{Obs})$ . Suppose $\Omega\preccurlyeq\Omega^{\prime}$ . If $\mathit{CED}(\Omega^{\prime},\mathsf{Obs})=\emptyset$ then we have our result. Else, take $(\sigma_{N},P)\in\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

We know that for each $\Sigma_{i}\in\Omega$ , there is $\Sigma_{j}^{i}\in\Omega^{\prime}$ such that $\Sigma_{i}\subseteq\Sigma_{j}^{i}$ . Fix such a $\Sigma_{j}^{i}$ . We have that ${\sigma_{N}}_{\upharpoonright\Sigma_{i}}={{P(\Sigma_{j}^{i})}}_{% \upharpoonright\Sigma_{i}}$ as $\Sigma_{i}\subseteq\Sigma_{j}^{i}$ and $(\sigma_{N},P)\in\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

By doing this for all $\Sigma_{i}\in\Omega$ , we get $(\sigma_{N},(P(\Sigma_{j}^{i}))_{\Sigma_{i}\in\Omega})\in\mathit{CED}(\Omega,% \mathsf{Obs})$ and by definition $(\sigma_{N},(P(\Sigma_{j}^{i}))_{\Sigma_{i}\in\Omega})\subseteq(\sigma_{N},P)$ . $\hfill\blacktriangleleft$

Corollary 4.23. [Restated, see original statement.]

Suppose that $\mathit{CED}(\Omega,\mathsf{Obs})\neq\emptyset$ . For $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs})$ , we pick a discrepancy $\delta_{(\sigma_{N},P)}\in\mathcal{D}(\sigma_{N},P)$ . For any non-empty subset $S$ of $\mathit{CED}(\Omega,\mathsf{Obs})$ , let $\Omega^{\prime}=\Omega\cup\{\delta_{ce}\mid ce\in S\}$ . Then $\Omega\prec\Omega^{\prime}$ and $\mathit{CED}(\Omega,\mathsf{Obs})\prec\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ .

Proof.

$\Omega\preccurlyeq\Omega^{\prime}$ follows directly from the definition of $\Omega^{\prime}$ : all elements of $\Omega$ are preserved. Furthermore, for $ce\in S$ , for all $\Sigma_{i}\in\Omega$ we have $\delta_{ce}\not\subseteq\Sigma_{i}\in\Omega$ as ${\sigma_{N}}_{\upharpoonright\delta}\neq{{\sigma_{\Sigma_{i}}}}_{% \upharpoonright\delta}$ and ${\sigma_{N}}_{\upharpoonright\Sigma_{i}}={{\sigma_{\Sigma_{i}}}}_{% \upharpoonright\Sigma_{i}}$ . Hence $\Omega^{\prime}\not\preccurlyeq\Omega$ and $\Omega\prec\Omega^{\prime}$ . From Proposition 4.21 we get that $\mathit{CED}(\Omega,\mathsf{Obs})\preccurlyeq\mathit{CED}(\Omega^{\prime},% \mathsf{Obs})$ . Furthermore, we know by Proposition 4.16 that for a fixed $ce\in S$ , $\mathit{CED}(\Omega\cup{D_{ce}},\mathsf{Obs})$ does not contain $c e$ , ensuring that $\mathit{CED}(\Omega,\mathsf{Obs})\prec\mathit{CED}(\Omega\cup{\delta_{ce}},% \mathsf{Obs})$ . As furthermore $\mathit{CED}(\Omega\cup{\delta_{ce}},\mathsf{Obs})\preccurlyeq\mathit{CED}(% \Omega^{\prime},\mathsf{Obs})$ by construction, we have that $\mathit{CED}(\Omega,\mathsf{Obs})\prec\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ . $\hfill\blacktriangleleft$

Theorem 4.25. [Restated, see original statement.]

Suppose $\Omega\not\models\mathsf{Obs}$ . The above process converges to a distribution $\Omega^{\prime}\models\mathsf{Obs}$ such that $\Omega\prec\Omega^{\prime}$ after finitely many of steps. For choosing $S=\mathit{CED}(\Omega,\mathsf{Obs})$ at each step, the number of steps is bounded by $|\mathcal{P}(\Sigma)|$ .

Proof.

In case of convergence, $\Omega\prec\Omega^{\prime}$ is ensured directly by each step of the process by Corollary 4.23. We now prove that convergence occurs in a finite amount of steps. Each step eliminates all counter-examples that are part of the non-empty set $S$ chosen, as proven in Propositions 4.16 and 4.23. As $\mathit{CED}(\Omega,\mathsf{Obs})\prec\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ the only possible new counter-examples are ones that contain strictly counter-examples of $\mathit{CED}(\Omega,\mathsf{Obs})$ . It follows that iterating this process makes strict progress in the following sense: either counter-examples are eliminated or they are replaced by ones with a strictly larger positive image. Because the positive image size of counter-examples is bounded by the size of the distribution, itself bounded by $|\mathcal{P}(\Sigma)|$ , we know that this converges in a finite number of steps.

If $S=\mathit{CED}(\Omega,\mathsf{Obs})$ at each step, we furthermore have that the size of the smallest positive image of a counter-example increases by at least one at each step, which bounds the number of steps by $|\mathcal{P}(\Sigma)|$ . $\hfill\blacktriangleleft$

Proposition 4.27. [Restated, see original statement.]

$\mathit{CED}(\llbracket\Omega\rrbracket_{\preccurlyeq},\mathsf{Obs})=\emptyset% \Leftrightarrow\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$

Proof.

We make the proof for a fixed $\Sigma_{i}\in\textit{Sub}$ , the result following by induction. By definition $\Omega^{\prime}\preccurlyeq\Omega$ , which entails that $\mathit{CED}(\Omega,\mathsf{Obs})\preccurlyeq\mathit{CED}(\Omega^{\prime},% \mathsf{Obs})$ . From there it follows that $\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset\Rightarrow\mathit{CED}(\Omega^{% \prime},\mathsf{Obs})=\emptyset$ . To get the other one, we reason by contrapositive. Suppose $\mathit{CED}(\Omega,\mathsf{Obs})\neq\emptyset$ and consider $(\sigma_{N},P)$ in it. Then clearly, taking $P^{\prime}=(P(\Sigma_{j}))_{j\neq i}$ , $(\sigma_{N},P^{\prime})\in\mathit{CED}(\Omega^{\prime},\mathsf{Obs})$ and we have our result. $\hfill\blacktriangleleft$

A.2 Section 5

Lemma 5.1. [Restated, see original statement.]

Given a global counter-example $(\sigma,b)$ , let $\mathsf{Obs}^{\prime}=\mathsf{Obs}\cup\{(\sigma,b)\}$ :

$\blacksquare$

if $b=-$ , then there is $P\in\mathsf{Dom}(\mathsf{Obs})^{\Omega}$ such that $(\sigma,P)\in\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})$ .
$\blacksquare$

else, there is $\sigma_{N}\in\mathsf{Dom}(\mathsf{Obs})$ , $S\subseteq\Omega$ and $P\in\mathsf{Dom}(\mathsf{Obs})^{\Omega\setminus S}$ such that $(\sigma_{N},P\cup(\Sigma_{i}\mapsto\sigma)_{\Sigma_{i}\in S})\in\mathit{CED}(% \Omega,\mathsf{Obs}^{\prime})$

Furthermore, all of the elements of $\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})$ have the above structure.

Proof.

We know that $\Omega\models\mathsf{Obs}$ and $\Omega\not\models\mathsf{Obs}^{\prime}$ since the counter-example is global. From Corollary 4.10 we get that $\mathit{CED}(\Omega,\mathsf{Obs})=\emptyset$ and $\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})\neq\emptyset$ .

Consider $(\sigma_{N},P)\in\mathit{CED}(\Omega,\mathsf{Obs}^{\prime})$ . Since it is not a counter-example to $\Omega\models\mathsf{Obs}$ , $\sigma$ must appear in it. If $b=-$ we must have that $\sigma_{N}=\sigma$ by definition of counter-examples to a distribution. Similarly, if $b=+$ then $\sigma\in\mathsf{img}(P)$ . Observing that $\mathsf{Dom}(\mathsf{Obs}^{\prime})=\mathsf{Dom}(\mathsf{Obs})\cup\{\sigma\}$ , we obtain our result. $\hfill\blacktriangleleft$

Theorem 5.3. [Restated, see original statement.]

Let $\mathit{SUL}=M_{1}\mathop{\parallel}M_{2}\mathop{\parallel}\dots\mathop{% \parallel}M_{n}$ consist of $n$ parallel LTSs. Algorithm 1 terminates and returns $\mathcal{H}_{1},\mathcal{H}_{2},\dots,\mathcal{H}_{k}$ such that $\mathcal{H}_{1}\mathop{\parallel}\mathcal{H}_{2}\mathop{\parallel}\dots\mathop% {\parallel}\mathcal{H}_{k}\equiv\mathit{SUL}$ .

Proof.

The correctness of the returned hypothesis is guaranteed by the Teacher.

The algorithm’s termination is established by showing that the “while True” loop cannot execute indefinitely. This is because the two types of counter-examples can only occur finitely many times:

$\blacksquare$

For local counter-examples, when the distribution $\Omega$ is fixed, all the learners eventually converge to a hypothesis after encountering finitely many local counter-examples, as shown in [31, Theorem 2].
$\blacksquare$

For global counter-examples, each global counter-example leads to an updated distribution $\Omega^{\prime}$ such that $\Omega\prec\Omega^{\prime}$ (Theorem 4.25). This update can only happen finitely many times, as the top element of $\prec$ is $\{\Sigma_{SUL}\}$ .

$\hfill\blacktriangleleft$

[bib.bib1] [1] Fides Aarts, Joeri de Ruiter, and Erik Poll. Formal models of bank cards for free. In ICST, pages 461–468, 2013. doi:10.1109/ICSTW.2013.60.

[bib.bib2] [2] Fides Aarts, Julien Schmaltz, and Frits W. Vaandrager. Inference and abstraction of the biometric passport. In ISoLA, volume 6415, pages 673–686, 2010. doi:10.1007/978-3-642-16558-0_54.

[bib.bib3] [3] Elvio Amparore et al. Presentation of the 9th Edition of the Model Checking Contest. In TACAS, volume 11429, pages 50–68, 2019. doi:10.1007/978-3-030-17502-3_4.

[bib.bib4] [4] Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, 75(2):87–106, 1987. doi:10.1016/0890-5401(87)90052-6.

[bib.bib5] [5] Franco Barbanera, Mariangiola Dezani-Ciancaglini, Ivan Lanese, and Emilio Tuosto. Composition and decomposition of multiparty sessions. Journal of Logical and Algebraic Methods in Programming, 119:100620, 2021. doi:10.1016/J.JLAMP.2020.100620.

[bib.bib6] [6] Benedikt Bollig, Joost-Pieter Katoen, Carsten Kern, and Martin Leucker. Learning Communicating Automata from MSCs. IEEE Transactions on Software Engineering, 36(3):390–408, 2010. doi:10.1109/TSE.2009.89.

[bib.bib7] [7] Manfred Broy, Bengt Jonsson, Joost-Pieter Katoen, Martin Leucker, and Alexander Pretschner, editors. Model-Based Testing of Reactive Systems, Advanced Lectures [The volume is the outcome of a research seminar that was held in Schloss Dagstuhl in January 2004], volume 3472, 2005. doi:10.1007/B137241.

[bib.bib8] [8] Ilaria Castellani, Madhavan Mukund, and P. S. Thiagarajan. Synthesizing distributed transition systems from global specification. In FSTTCS, volume 1738, pages 219–231, 1999. doi:10.1007/3-540-46691-6_17.

[bib.bib9] [9] Simon Dierl, Paul Fiterau-Brostean, Falk Howar, Bengt Jonsson, Konstantinos Sagonas, and Fredrik Tåquist. Scalable tree-based register automata learning. In TACAS, volume 14571, pages 87–108, 2024. doi:10.1007/978-3-031-57249-4_5.

[bib.bib10] [10] Andrzej Ehrenfeucht and Grzegorz Rozenberg. Partial (set) 2-structures. part II: state spaces of concurrent systems. Acta Informatica, 27(4):343–368, 1990. doi:10.1007/BF00264612.

[bib.bib11] [11] Tiago Ferreira, Harrison Brewton, Loris D’Antoni, and Alexandra Silva. Prognosis: closed-box analysis of network protocol implementations. In SIGCOMM, pages 762–774, 2021. doi:10.1145/3452296.3472938.

[bib.bib12] [12] Paul Fiterau-Brostean, Ramon Janssen, and Frits W. Vaandrager. Combining Model Learning and Model Checking to Analyze TCP Implementations. In CAV, volume 9780, pages 454–471, 2016. doi:10.1007/978-3-319-41540-6_25.

[bib.bib13] [13] Paul Fiterau-Brostean, Toon Lenaerts, Erik Poll, Joeri de Ruiter, Frits W. Vaandrager, and Patrick Verleg. Model learning and model checking of SSH implementations. In SPIN, pages 142–151, 2017. doi:10.1145/3092282.3092289.

[bib.bib14] [14] Markus Frohme and Bernhard Steffen. Compositional learning of mutually recursive procedural systems. International Journal on Software Tools for Technology Transfer, 23(4):521–543, 2021. doi:10.1007/s10009-021-00634-y.

[bib.bib15] [15] Dominik Fuchß, Haoyu Liu, Tobias Hey, Jan Keim, and Anne Koziolek. Enabling architecture traceability by llm-based architecture component name extraction. In ICSA, pages 1–12, 2025. doi:10.1109/ICSA65012.2025.00011.

[bib.bib16] [16] E. Mark Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302–320, 1978. doi:10.1016/S0019-9958(78)90562-4.

[bib.bib17] [17] Roberto Guanciale and Emilio Tuosto. Realisability of pomsets via communicating automata. In ICE, volume 279, pages 37–51, 2018. doi:10.4204/EPTCS.279.6.

[bib.bib18] [18] Léo Henry, Mohammad Mousavi, Thomas Neele, and Matteo Sammartino. Replication package for the paper “Compositional Active Learning of Synchronous Systems through Automated Alphabet Refinement”, April 2025. doi:10.5281/zenodo.15170685.

[bib.bib19] [19] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall International, 2004.

[bib.bib20] [20] Falk Howar and Bernhard Steffen. Active automata learning in practice - an annotated bibliography of the years 2011 to 2016. In Machine Learning for Dynamic Software Analysis, volume 11026, pages 123–148, 2018. doi:10.1007/978-3-319-96562-8_5.

[bib.bib21] [21] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In RV, volume 8734, pages 307–322, 2014. doi:10.1007/978-3-319-11164-3_26.

[bib.bib22] [22] Malte Isberner, Falk Howar, and Bernhard Steffen. The Open-Source LearnLib: A Framework for Active Automata Learning. In CAV, volume 9206, pages 487–495, 2015. doi:10.1007/978-3-319-21690-4_32.

[bib.bib23] [23] Rainer Koschke. Architecture Reconstruction. Springer-Verlag, 2009. doi:10.1007/978-3-540-95888-8.

[bib.bib24] [24] Loes Kruger, Sebastian Junges, and Jurriaan Rot. Small test suites for active automata learning. In TACAS, volume 14571, pages 109–129, 2024. doi:10.1007/978-3-031-57249-4_6.

[bib.bib25] [25] Faezeh Labbaf, Jan Friso Groote, Hossein Hojjat, and Mohammad Reza Mousavi. Compositional learning for interleaving parallel automata. In FoSSaCS, volume 13992, pages 413–435, 2023. doi:10.1007/978-3-031-30829-1_20.

[bib.bib26] [26] Bas Luttik. Unique parallel decomposition in branching and weak bisimulation semantics. Theoretical Computer Science, 612:29–44, 2016. doi:10.1016/j.tcs.2015.10.013.

[bib.bib27] [27] Aryan Bastany Mahboubeh Samadi and Hossein Hojjat. Compositional learning for synchronous parallel automata. In FASE, volume 15693, pages 101–121, 2025. doi:10.1007/978-3-031-90900-9_6.

[bib.bib28] [28] Stefan Marksteiner, Marjan Sirjani, and Mikael Sjödin. Automated passport control: Mining and checking models of machine readable travel documents. In ARES, 2024. doi:10.1145/3664476.3670454.

[bib.bib29] [29] Joshua Moerman. Learning product automata. In ICGI, volume 93, pages 54–66, 2018. URL: http://proceedings.mlr.press/v93/moerman19a.html.

[bib.bib30] [30] Madhavan Mukund. From global specifications to distributed implementations. In Synthesis and Control of Discrete Event Systems, pages 19–35, 2002. doi:10.1007/978-1-4757-6656-1_2.

[bib.bib31] [31] Thomas Neele and Matteo Sammartino. Compositional automata learning of synchronous systems. In FASE, volume 13991, pages 47–66, 2023. doi:10.1007/978-3-031-30826-0_3.

[bib.bib32] [32] Marc North, Amir Atapour-Abarghouei, and Nelly Bencomo. Code gradients: Towards automated traceability of llm-generated code. In RE, pages 321–329, 2024. doi:10.1109/RE59067.2024.00038.

[bib.bib33] [33] Mathijs Schuts, Jozef Hooman, and Frits W. Vaandrager. Refactoring of legacy software using model learning and equivalence checking: An industrial experience report. In IFM, volume 9681, pages 311–325, 2016. doi:10.1007/978-3-319-33693-0_20.

[bib.bib34] [34] Maurice H. ter Beek, Rolf Hennicker, and José Proença. Team automata: Overview and roadmap. In COORDINATION, volume 14676, pages 161–198, 2024. doi:10.1007/978-3-031-62697-5_10.

[bib.bib35] [35] Maurice H. ter Beek, Rolf Hennicker, and José Proença. Overview and roadmap of team automata. CoRR, abs/2501.13589, 2025. doi:10.48550/arXiv.2501.13589.

[bib.bib36] [36] Frits W. Vaandrager. Model learning. Communications of the ACM, 60(2):86–95, 2017. doi:10.1145/2967606.

[bib.bib37] [37] Frits W. Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann. A new approach for active automata learning based on apartness. In TACAS, volume 13243, pages 223–243, 2022. doi:10.1007/978-3-030-99524-9_12.

[bib.bib38] [38] Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris Köpf. CacheQuery: Learning Replacement Policies from Hardware Caches. In PLDI, pages 519–532, 2020. doi:10.1145/3385412.3386008.

[bib.bib39] [39] Masaki Waga. Active learning of deterministic timed automata with myhill-nerode style characterization. In CAV, pages 3–26, 2023. doi:10.1007/978-3-031-37706-8_1.

[bib.bib40] [40] Remigiusz Wiśniewski, Grzegorz Bazydło, Marcin Wojnakowski, and Mateusz Popławski. Hippo-CPS: A Tool for Verification and Analysis of Petri Net-Based Cyber-Physical Systems. In Petri Nets 2023, volume 13929, pages 191–204, 2023. doi:10.1007/978-3-031-33620-1_10.

[bib.bib41] [41] W.M. Zuberek. Petri net models of process synchronization mechanisms. In SMC, volume 1, pages 841–847, 1999. doi:10.1109/ICSMC.1999.814201.

Compositional Active Learning of Synchronizing Systems Through Automated Alphabet Refinement

Abstract

Keywords and phrases:

Funding:

Copyright and License:

2012 ACM Subject Classification:

Supplementary Material:

Acknowledgements:

DOI:

Event:

Editors:

Series and Publisher:

1 Introduction

2 Related work

3 Preliminaries

3.1 Labelled Transition Systems

Definition 3.1 (Labelled Transition System).

Definition 3.2 (Language of an LTS).

Definition 3.3 (Parallel composition).

Definition 3.4 (Parallel composition of languages).

Example 3.5 (Running example).

3.2 Active Automata Learning

Definition 3.6 (Observation function).

Definition 3.7 (Observation function/language agreement).

Definition 3.8 (Local observation function).

Example 3.9.

4 Distributions

4.1 Distributions and Observations

Definition 4.1 (Distribution).

Definition 4.2 (Product language).

Example 4.3.

Lemma 4.4 ([30], Lemma 5.2).

Definition 4.5 (Product observation).

Proposition 4.6.

Example 4.7.

Definition 4.8 (Counter-example to a distribution).

Example 4.9.

Corollary 4.10.

4.2 Resolving a Counter-example

Definition 4.11 (Multiplicity discrepancy).

Definition 4.12 (Order discrepancy).

Definition 4.13 (Discrepancy set).

Example 4.14 (Multiplicity discrepancy).

Example 4.15 (Order discrepancy).

Proposition 4.16.

4.3 Extending a Distribution to Model an Observation Function

Pre-ordering of Distributions

Definition 4.17 (Connectivity preorder).

Definition 4.18 (Counter-example inclusion).

Definition 4.19 (Counter-example set preordering).

Example 4.20.

Proposition 4.21.

Corollary 4.22.

Fixing the distribution

Corollary 4.23.

▶ Remark 4.24.

Theorem 4.25.

Canonical distributions: inducing a partial order

Definition 4.26 (Canonical distribution).

Proposition 4.27.

5 Compositional Learning Algorithm

5.1 Local learners

5.2 Main algorithm

Lemma 5.1.

▶ Remark 5.2.

Theorem 5.3.

▶ Remark 5.4.

Example 5.5 (Example run).

6 Experiments

Random models

Petri Net models

Realistic models

7 Conclusion

References

Appendix A Proofs

A.1 Section 4

Proposition 4.6. [Restated, see original statement.]

Proof.

Corollary 4.10. [Restated, see original statement.]

Proof.

$\blacktriangleright$ Remark 4.24.

$\blacktriangleright$ Remark 5.2.

$\blacktriangleright$ Remark 5.4.