The Complexity of Learning LTL, CTL and ATL Formulas
Abstract
We consider the problem of learning temporal logic formulas from examples of system behavior. Learning temporal properties has crystallized as an effective means to explain complex temporal behaviors. Several efficient algorithms have been designed for learning temporal formulas. However, the theoretical understanding of the complexity of the learning decision problems remains largely unexplored. To address this, we study the complexity of the passive learning problems of three prominent temporal logics, Linear Temporal Logic (LTL), Computation Tree Logic (CTL) and Alternating-time Temporal Logic (ATL) and several of their fragments. We show that learning formulas with unbounded occurrences of binary operators is NP-complete for all of these logics. On the other hand, when investigating the complexity of learning formulas with bounded occurrences of binary operators, we exhibit discrepancies between the complexity of learning LTL, CTL and ATL formulas (with a varying number of agents).
Keywords and phrases: Temporal logic, passive learning, complexity
2012 ACM Subject Classification: Theory of computation
Funding: Rajarshi Roy acknowledges partial funding by the ERC under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 834115, FUN2MODEL).
Editors: Olaf Beyersdorff, Michał Pilipczuk, Elaine Pimentel, and Nguyễn Kim Thắng
1 Introduction
Temporal logics are the de facto standard for expressing temporal properties for software and cyber-physical systems. Originally introduced in the context of program verification [33, 15], temporal logics are now well-established in numerous areas, including reinforcement learning [40, 25, 10], motion planning [17, 12], process mining [13], and countless others. The popularity of temporal logics can be attributed to their unique blend of mathematical rigor and resemblance to natural language.
Until recently, formulating properties in temporal logics has been a manual task, requiring human intuition and expertise [6, 39]. To circumvent this step, in the past ten years, there have been numerous works to automatically learn (i.e., generate) properties in temporal logic. Among them, a substantial number of works [29, 11, 35, 26, 41] target Linear Temporal Logic (LTL) [33]. There is now a growing interest in learning formulas [16, 34, 9] in Computation Tree Logic (CTL) [15] and Alternating-time Temporal Logic (ATL) [1] due to their ability to express branching-time properties, including for multi-agent systems.
While existing approaches for learning temporal properties demonstrate impressive empirical performance, detailed comparisons of computational complexity across different temporal logics remain underexplored. Most related works focus on LTL, either in the verification domain [18, 27] or the database domain [19, 24]. These studies primarily report complexity results, often highlighting NP-completeness for learning LTL-formulas and their fragments. In contrast, the computational complexity of learning CTL- and ATL-formulas has not yet been thoroughly examined.
In this work, we extend the study of learning temporal properties to include CTL- and ATL-formulas. Additionally, we broaden existing results for LTL to cover a more comprehensive set of operators, specifically addressing all binary operators (temporal or not).
To elaborate on our contributions, let us precisely describe the problem that we consider, the passive learning problem for temporal logic [29, 11]. Its decision version asks the following question: given two sets P, N of positive and negative examples of a system’s behavior and a size bound n, does there exist a “separating” formula of size at most n, which is satisfied by the positive examples and violated by the negative ones?
Our instantiation of the above problem depends on the considered logic, following related literature [29, 34, 9]: LTL-formulas express linear-time properties, CTL-formulas express branching-time properties, and ATL-formulas express properties on multi-agent systems. Accordingly, the input examples for learning LTL, CTL and ATL are linear structures (or equivalently infinite words), Kripke structures and concurrent game structures, respectively. We refer to Section 2 for formal definitions and other prerequisites.
We summarize our contributions in Table 1.
Our first result, illustrated in the left column, shows that allowing formulas with unrestricted use of at least one binary operator makes the corresponding learning decision problem NP-complete for all considered logics. Some of these NP-hardness results are (closely) inspired by [27], involving reductions from the hitting set problem – one of Karp’s 21 NP-complete problems; some others require novel proof techniques, e.g. one involves a reduction from an NP-complete modulo-2 calculus problem. We describe the outline of the proofs in Section 3.
Table 1: Summary of our complexity results. With unbounded use of binary operators (left column), the learning decision problem is NP-complete for LTL, CTL and ATL. With bounded use of binary operators (right columns), the complexity varies with the logic and the permitted unary operators: three of the considered fragments are NP-complete, while the others lie below NP.
All of the above NP-hardness proofs rely on separating formulas with linearly many (in the size of the input) occurrences of binary operators. Thus, in the search of expressive temporal logic fragments with lower complexities, we focus on formulas with a bounded number of occurrences of binary logical operators such as ∧ (and), ∨ (or), etc., and no binary temporal operators such as U (until), R (release), etc. This choice of formulas is motivated by the fact that such formulas can still express interesting properties (e.g., GR(1) [32] formulas, mode-target formulas [4], etc.) and are used in several practical applications (see Section 4.1 for details). We explore several fragments with different unary temporal operators, X (next), F (eventually) and G (globally), and present the results in the rightmost column of Table 1. We notice that, in this case, the complexity of the learning problems varies considerably between different logics and unary operators. Importantly, we exhibit fragments where the learning problem is below NP. We prove the three NP-hardness results using a reduction from the hitting set problem; we give key insights on all of these results in Section 4.
All details can be found in the extended version [8].
Related Works.
The most closely related works are [18] and [27], which operate within a similar framework to ours. Both works consider learning problems in several fragments of LTL, especially involving Boolean operators such as ∧ and ∨, and temporal operators such as X, F and G, and prove their NP-completeness. We extend part of their work by categorizing fragments based on the arity of the operators and studying which type of operators contribute to the hardness. Moreover, there are several differences in the parameters considered for the learning problem. The most important one is the following: the above works consider the size upper bound n to be written in binary, while we assume n given in unary. Although, in complexity problems, integers are most often assumed to be written in binary, we believe that considering the size bound in unary is justified since one may want to not only decide the existence of a formula but also effectively output one, which requires explicitly writing it. The other differences with the setting of the above works are mostly due to the fact that we do not only consider LTL learning, but CTL and ATL learning as well. A thorough discussion of these differences can be found in the extended version of this paper [8].
In the past, complexity analysis of passive learning has been studied for formalisms other than temporal logics. For instance, [21] and [2] proved NP-completeness of the passive learning problems of deterministic finite automata (DFAs) and regular expressions (REs).
When it comes to temporal logics, most related works focus on developing efficient algorithms for learning temporal logic formulas. Among these, the emphasis has predominantly been on learning LTL (or its significant fragments), which has been discussed in detail in a recent survey summarizing various learning techniques [30]. Broadly, the techniques can be categorized into three main types: constraint solving [29, 11, 37, 20, 22], enumerative search [35, 41], and neuro-symbolic techniques [26, 42].
For learning CTL, some approaches rely on handcrafted templates [14, 43] for simple enumerative search, while others employ constraint-solving methods to learn formulas with arbitrary structures [34]. The constraint-solving methods are extended to learn ATL-formulas as well [9]. There are also works on learning other logics such as Signal Temporal Logic [7, 28], Metric Temporal Logic [36], Past LTL [3], Property Specification Language [38], etc.
2 Preliminaries and Definitions
We let ℕ denote the set of all non-negative integers and ℕ⁺ denote the set of all positive integers. For all n ∈ ℕ⁺, we let [n] denote the set of integers {1, …, n}.
Given any non-empty set A, we let A*, A⁺ and A^ω denote the sets of finite, non-empty finite and infinite sequences of elements in A, respectively. For all u ∈ A⁺ ∪ A^ω, we denote by |u| the number of elements of u. For all u ∈ A⁺ ∪ A^ω and i ∈ ℕ⁺, if u has at least i elements, we let: u[i] denote the i-th element in u, in particular u[1] is the first element of u; u[:i] denote the non-empty finite sequence u[1] ⋯ u[i]; u[i:] denote the non-empty sequence u[i] u[i+1] ⋯, in particular we have u[1:] = u.
For the remainder of this section, we fix a non-empty set of propositions Prop.
2.1 Structures
Usually, ATL-formulas are interpreted on concurrent game structures, i.e. games where, at each state, the concurrent actions of several agents have an impact on the next state reached. A special kind of concurrent game structures are turn-based game structures, where each state belongs to a specific agent who decides what the next state is. Here, we introduce only this special kind of games, mainly due to a lack of space, but also because all of our hardness results, presented in Table 1, hold even when only considering turn-based game structures.
Definition 1.
A turn-based game structure (TGS for short) is a tuple G = (Q, I, δ, Agt, own, π) where: Q is a finite set of states; I ⊆ Q is the set of initial states; δ : Q → 2^Q \ {∅} maps each state to its set of successors; Agt denotes the set of agents; own : Q → Agt maps each state to the agent owning it; and π : Q → 2^Prop maps each state q to the set of propositions that hold in q. A state q ∈ Q is said to be self-looping if q ∈ δ(q). A structure is self-looping if all of its states are self-looping.
For all coalitions of agents A ⊆ Agt, a strategy for the coalition A is a function s : Q⁺ → Q such that, for all ρ ∈ Q⁺, if own(ρ[|ρ|]) ∈ A, then s(ρ) ∈ δ(ρ[|ρ|]). We denote by S_A the set of strategies for the coalition A. Then, from any state q ∈ Q, we define the set Out(q, s) of infinite paths compatible with the strategy s from q: Out(q, s) = {ρ ∈ Q^ω | ρ[1] = q and, for all i ∈ ℕ⁺, ρ[i+1] ∈ δ(ρ[i]), with ρ[i+1] = s(ρ[:i]) whenever own(ρ[i]) ∈ A}.
Finally, the size of the turn-based structure G is equal to: |G| = |Q| + Σ_{q ∈ Q} |δ(q)| + Σ_{q ∈ Q} |π(q)|.
Unless otherwise stated, a turn-based structure G will always refer to the tuple (Q, I, δ, Agt, own, π).
There are also special kinds of turn-based structures of interest for us, introduced below.
Definition 2.
A Kripke structure is a turn-based structure with only one agent. A linear structure is a Kripke structure such that: |I| = 1, and for all q ∈ Q, we have |δ(q)| = 1. Finally, a turn-based structure is size-1 if |Q| = 1.
Unless otherwise stated, a Kripke structure will always refer to a tuple (Q, I, δ, π) (in Kripke structures, there is only one agent, thus Agt and own are irrelevant).
We have introduced the notion of linear structures as we are going to interpret LTL-formulas on them. In the literature, they are usually interpreted on ultimately periodic words. However, both models are equivalent and can be encoded into each other straightforwardly.
2.2 ATL, CTL and LTL formulas
The LTL, CTL and ATL-formulas that we consider throughout this paper use the following temporal operators: X (neXt), F (Future), G (Globally), U (Until), R (Release), W (Weak until), M (Mighty release). We group these operators into the sets of unary and binary temporal operators: Op_u = {X, F, G} and Op_b = {U, R, W, M}. We also let Bool be the set of all logical binary operators, i.e. classical logical operators, along with their negations: Bool = {∧, ∨, ⇒, ⇐, ⇔, ¬∧, ¬∨, ¬⇒, ¬⇐, ¬⇔} (we have |Bool| = 10).
To define ATL-formulas, we consider two types of formulas: state formulas – where strategic operators occur, denoted with the Greek letter φ – and path formulas – where temporal operators occur, denoted with the Greek letter ψ. Consider some sets U ⊆ Op_u, B ⊆ Op_b, and B_l ⊆ Bool. For all numbers of agents k ∈ ℕ⁺, we denote by ATL_k(U, B, B_l) the set of ATL-state formulas defined by the grammar:

φ ::= p | ¬φ | φ ⊗ φ | ⟨⟨A⟩⟩ψ        ψ ::= ∘φ | φ ▸ φ
where φ is a state formula, ψ is a path formula, p ∈ Prop, ⊗ ∈ B_l, A ⊆ [k] is a subset of agents, ∘ ∈ U, and ▸ ∈ B. We denote by ATL(U, B, B_l) the set of all ATL-state formulas, over any number of agents. Note that ATL_1-formulas are CTL-formulas. Hence, there are only two possible strategic operators: ⟨⟨{1}⟩⟩, usually denoted E, and ⟨⟨∅⟩⟩, usually denoted A. We define LTL-formulas as CTL-formulas using only the quantifier E. Since LTL-formulas are interpreted on linear structures, where each state has exactly one successor, the strategic operators used have no impact on the satisfaction of the formula. For readability, we will depict LTL-formulas without the quantifier.
The set SubF(φ) of sub-formulas of a formula φ is then defined inductively as follows: SubF(φ) = {φ} ∪ S, where S = ∅ if φ is a proposition, S = SubF(φ′) if the root of φ is a unary operator (or a strategic operator followed by a unary temporal operator) applied to φ′, and S = SubF(φ1) ∪ SubF(φ2) if the root of φ is a binary operator applied to φ1 and φ2. The size of a formula is then defined as its number of sub-formulas: |φ| = |SubF(φ)|. We also denote by B(φ) the number of sub-formulas of φ whose root is a binary operator.
We interpret ATL-formulas over TGS using the standard definitions [1]. That is, given a state q ∈ Q and a state formula φ, the fact that q satisfies φ, denoted q ⊨ φ, is defined inductively:

q ⊨ p iff p ∈ π(q);
q ⊨ ¬φ′ iff q ⊭ φ′;
q ⊨ φ1 ⊗ φ2 iff ⊗(q ⊨ φ1, q ⊨ φ2) = 1;
q ⊨ ⟨⟨A⟩⟩ψ iff there is s ∈ S_A such that, for all ρ ∈ Out(q, s), we have ρ ⊨ ψ;
where ⊗ ∈ Bool is a binary operator seen as a boolean function ⊗ : {0, 1}² → {0, 1}. Furthermore, given a path ρ ∈ Q^ω and a path formula ψ, the fact that ψ holds for the path ρ, also denoted ρ ⊨ ψ, is defined inductively as follows:
ρ ⊨ Xφ iff ρ[2] ⊨ φ;
ρ ⊨ Fφ iff there is i ∈ ℕ⁺ such that ρ[i] ⊨ φ;
ρ ⊨ Gφ iff for all i ∈ ℕ⁺, we have ρ[i] ⊨ φ;
ρ ⊨ φ1 U φ2 iff there is i ∈ ℕ⁺ such that ρ[i] ⊨ φ2 and, for all j < i, ρ[j] ⊨ φ1;
ρ ⊨ φ1 R φ2 iff for all i ∈ ℕ⁺, either ρ[i] ⊨ φ2 or there is j < i such that ρ[j] ⊨ φ1;
ρ ⊨ φ1 W φ2 iff ρ ⊨ Gφ1 or ρ ⊨ φ1 U φ2;
ρ ⊨ φ1 M φ2 iff there is i ∈ ℕ⁺ such that ρ[i] ⊨ φ1 and, for all j ≤ i, ρ[j] ⊨ φ2.
An ATL-formula φ accepts a TGS G, denoted by G ⊨ φ, if q ⊨ φ for all initial states q ∈ I; otherwise, it rejects it. Given two formulas φ1, φ2, we write φ1 ⇒ φ2 if, for all TGS G, if G ⊨ φ1, then G ⊨ φ2. We write φ1 ≡ φ2 when φ1 ⇒ φ2 and φ2 ⇒ φ1.
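To make the path semantics concrete, the following sketch (ours, not the paper's; all names are assumptions) evaluates formulas built from propositions, Boolean connectives and the unary temporal operators X, F and G on an ultimately periodic word u·v^ω, encoded as two lists of proposition sets; recall that LTL-formulas are interpreted on linear structures, which correspond to such words.

```python
def suffix(u, v, i):
    """The word u . v^omega shifted by i positions, as a (u', v) pair."""
    if i < len(u):
        return u[i:], v
    return v[(i - len(u)) % len(v):], v

def holds(phi, u, v):
    """phi is a nested tuple, e.g. ('G', ('F', ('ap', 'p')));
    u and v are lists of sets of propositions, for the word u . v^omega."""
    op = phi[0]
    if op == 'ap':
        return phi[1] in (u[0] if u else v[0])
    if op == 'not':
        return not holds(phi[1], u, v)
    if op == 'and':
        return holds(phi[1], u, v) and holds(phi[2], u, v)
    if op == 'or':
        return holds(phi[1], u, v) or holds(phi[2], u, v)
    if op == 'X':
        return holds(phi[1], *suffix(u, v, 1))
    # F and G: the |u| + |v| first suffixes cover all distinct suffixes
    vals = [holds(phi[1], *suffix(u, v, i)) for i in range(len(u) + len(v))]
    return any(vals) if op == 'F' else all(vals)
```

For instance, on the word ∅ · ({p} ∅)^ω, the formula G F p holds while F G p does not.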
2.3 Learning decision problem
We define the LTL, CTL and ATL learning problems below, where models for LTL, CTL, and ATL are linear structures, Kripke structures and turn-based game structures, respectively.
Definition 3.
Let L ∈ {LTL, CTL, ATL} and consider some sets of operators U ⊆ Op_u, B ⊆ Op_b, and B_l ⊆ Bool. For all numbers of agents, we consider the following decision problem:
- Input: a tuple (Prop, P, N, n) where Prop is a set of propositions, P, N are two finite sets of models for L, and n ∈ ℕ.
- Output: yes if and only if there exists an L-formula φ using only operators from U ∪ B ∪ B_l such that |φ| ≤ n and φ is separating, i.e. such that: for all M ∈ P (resp. M ∈ N), we have M ⊨ φ (resp. M ⊭ φ).
The size of the input is equal to |Prop| + Σ_{M ∈ P ∪ N} |M| + n (i.e. n is written in unary).
As the model checking problems for LTL, CTL and ATL are in P [1], it follows that the learning problems for all these logics are in NP, with a straightforward guess-and-check subroutine.
Proposition 4.
For all L ∈ {LTL, CTL, ATL}, all sets of operators U ⊆ Op_u, B ⊆ Op_b, B_l ⊆ Bool, and all numbers of agents, the corresponding learning decision problem is in NP.
2.4 Hitting set problem
We recall below the NP-complete problem from which we will establish almost all of our (NP-hardness) reductions.
Definition 5 (Hitting set problem).
The hitting set problem is the following decision problem:
- Input: a triple (n, (S_j)_{j ∈ [m]}, k) where n, k ∈ ℕ⁺ and S_1, …, S_m are non-empty subsets of [n].
- Output: yes iff there is a subset H ⊆ [n] of size at most k such that we have S_j ∩ H ≠ ∅ for all j ∈ [m]. In such a case, the set H is called a hitting set.
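The hitting set problem is easy to test by brute force on small instances; here is a minimal sketch (exponential-time, only for intuition; the function name is ours):

```python
from itertools import combinations

def has_hitting_set(n, sets, k):
    """Brute-force check: is there H subset of {1..n}, |H| <= k,
    intersecting every set in `sets`?"""
    for size in range(k + 1):
        for H in combinations(range(1, n + 1), size):
            if all(set(H) & S for S in sets):
                return True
    return False
```

For example, {2, 4} hits all of {1, 2}, {2, 3}, {3, 4}, whereas three pairwise disjoint singletons admit no hitting set of size 2.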
In the following, if (n, (S_j)_{j ∈ [m]}, k) is an instance of the hitting set problem, then S_j refers to one of its sets, for some j ∈ [m].
3 Learning with unbounded use of binary operators
First, we consider the case of learning a formula with arbitrarily many occurrences of binary operators. The main result of this section is stated in Theorem 6 below.
Theorem 6.
Let B ⊆ Op_b and B_l ⊆ Bool be such that B ∪ B_l ≠ ∅. Then, for all L ∈ {LTL, CTL, ATL}, the corresponding learning decision problem is NP-hard.
In the passive learning setting that we consider, the size of the formulas is crucial due to the upper bound n. Therefore, although it is possible to express e.g. disjunctions with conjunctions and negations, since doing so affects the size of the formulas involved, if we have proved that a learning problem is NP-hard with the operator ∧, it does not imply a priori that it is also NP-hard with the operator ∨. Hence, for the sake of completeness, we consider all fourteen binary operators (ten logical, four temporal), although some of these binary operators (like ∧ or ∨) arguably make much more sense to consider than others (like the negations of implications). Since these operators behave differently, we cannot do a single reduction working for all these operators at once. However, we do partition these operators in different groups and exhibit a reduction per group of operators.
Most of the reductions use only size-1 structures, that are (almost) entirely defined by the subset of propositions labeling their only state. In addition, most of the reductions are done from the hitting set problem. In that case, how we extract a hitting set from a (small enough) separating formula relies only on the variables that need to occur in a formula separating the positive and negative structures, regardless of the operators involved.
We start with the operator ∨, i.e. we assume that ∨ ∈ B_l. The reduction for this case is actually a straightforward adaptation of the proof of [27, Theorem 2]. We describe it here. Given an instance (n, (S_j)_{j ∈ [m]}, k) of the hitting set problem, we let Prop = {x_1, …, x_n}. Furthermore, for all subsets S ⊆ [n], we let W_S denote a size-1 (linear) structure whose only state is labeled by the set {x_i | i ∈ S}. Then, we let P = {W_{S_j} | j ∈ [m]}, N = {W_∅}, and we set the size bound to 2k − 1. Let us illustrate this reduction on a simple example. Assume that n = 3, S_1 = {1, 2}, S_2 = {2, 3}, S_3 = {2}, and k = 1. Then, the sets labeling the only state of the positive structures are {x_1, x_2}, {x_2, x_3}, and {x_2}, while the set labeling the only state of the negative structure is ∅. Furthermore, the size bound is 2k − 1 = 1. Then, {2} is a hitting set with |{2}| ≤ k, while x_2 is a separating formula with |x_2| ≤ 2k − 1.
We claim that (n, (S_j)_{j ∈ [m]}, k) is a positive instance of the hitting set problem iff the constructed instance is a positive instance of the learning problem. Indeed, given a hitting set H with |H| ≤ k, one can check that the formula ⋁_{i ∈ H} x_i of size at most 2k − 1 accepts P and rejects N. (Note that other formulas of that size may also accept P and reject N; what matters below is only the set of variables occurring in a separating formula.) On the other hand, consider a formula φ of size at most 2k − 1, using only the operator ∨, that accepts P and rejects N. We let H = {i ∈ [n] | x_i occurs in φ}. Since |φ| ≤ 2k − 1, we have |H| ≤ k. Furthermore, consider any j ∈ [m]. One can then show (rather straightforwardly, by induction on the formulas) that, since φ accepts W_{S_j} and rejects W_∅, at least one variable x_i with i ∈ S_j occurs in φ. That is, H ∩ S_j ≠ ∅ and H is a hitting set of size at most k.
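As a sanity check, this reduction can be simulated; below is a sketch under our reading of the construction (the encoding of size-1 structures as label sets, the bound 2k − 1 and all function names are our assumptions, not taken verbatim from the paper):

```python
from itertools import combinations

def reduction(sets):
    """Map a hitting set instance (sets S_j over {1..n}) to a learning
    sample of size-1 structures, each identified with the proposition
    set labeling its unique state."""
    positives = [frozenset(f"x{i}" for i in S) for S in sets]
    negatives = [frozenset()]  # a single structure labeled by the empty set
    return positives, negatives

def disj_accepts(variables, labels):
    # A disjunction of variables holds on a size-1 structure iff at
    # least one of its variables labels the unique state.
    return bool(set(variables) & labels)

def separable(positives, negatives, k):
    """Is there a disjunction of at most k variables (a formula of size
    at most 2k - 1) accepting all positives and rejecting all negatives?"""
    variables = sorted(set().union(*positives))
    return any(
        all(disj_accepts(H, w) for w in positives)
        and not any(disj_accepts(H, w) for w in negatives)
        for size in range(1, k + 1)
        for H in combinations(variables, size)
    )
```

On small instances, `separable` on the constructed sample returns exactly the answer of the hitting set instance.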
In fact, the reduction for the operator ∧ is obtained from the above one by reversing the positive and negative sets (the arguments are almost identical).
We then handle the operators ⇒ and ⇐. The above reductions cannot be used since, when the operator ⇒ (or the operator ⇐) is used successively, the formula obtained is semantically equivalent to an alternation of conjunctions and disjunctions. For instance, consider six variables x_1, …, x_6 to use in a single formula using only the operator ⇒, e.g.: ((((x_1 ⇒ x_2) ⇒ x_3) ⇒ x_4) ⇒ x_5) ⇒ x_6. It is semantically equivalent to an alternation of conjunctions and disjunctions of literals. This is in sharp contrast with the above formulas built with ∨ or ∧ only. To circumvent this difficulty, we change the reduction by adding propositions labeling the only state of all the positive size-1 linear structures (but not the only state of all the negative ones). We can then place these propositions at appropriate positions in the formula. That way, we semantically obtain a disjunction on the relevant variables. The obtained reduction is slightly more subtle than the previous ones.
Before considering the last two logical operators ⇔ and ¬⇔, we handle the temporal operators W and M. The two previous reductions only use size-1 structures. On such structures, the temporal operators W and M are actually equivalent to ∨ and ∧ respectively. Hence, the reductions for ∨ and ∧ can also be used as is for the operators W and M respectively.
We then handle the final two logical operators ⇔ and ¬⇔. These operators are unlike the other operators. Let us give an intuition of how the learning problems with these operators behave. Consider a formula φ using only the operators ⇔ and ¬⇔ and a size-1 structure W. Let S denote the set of propositions labeling the only state of W. We let a_φ denote the number of occurrences of the operator ¬⇔ in φ. We also let b_φ denote the number of occurrences in φ of the propositions not in S. Then, one can realize that W ⊨ φ if and only if a_φ and b_φ have the same parity. This simple observation suggests that the learning problem with the operators ⇔ and ¬⇔ is linked to modulo-2 calculus. The reduction for these operators is established from an NP-complete problem dealing with modulo-2 calculus, known as the Coset Weight problem [5].
Finally, we handle the temporal operators U and R. On size-1 structures, for all formulas φ1, φ2, we have the following equivalences: φ1 U φ2 ≡ φ2 ≡ φ1 R φ2. That is, contrary to the temporal operators W and M, on size-1 structures, U and R are equivalent to unary operators. Hence, the reduction that we consider does not involve only size-1 structures. It is once again established from the hitting set problem, though the construction and the correctness proof are more involved than for the above cases.
4 Learning with a bounded amount of binary operators
Since, with unbounded use of binary operators, all the learning problems are NP-hard, we focus on learning formulas where the number of occurrences of binary operators is bounded. Note that this bound parameterizes the decision problem itself, and is therefore independent of the input. For simplicity, we restrict ourselves to formulas that do not use binary temporal operators at all. Before we dive into the details of our results, as summarized in the rightmost column of Table 1, let us first argue why this fragment is interesting to focus on.
4.1 Expressivity
The passive learning problem that we consider in this paper bounds the size of the formulas considered. This is because we want a separating formula not to overfit the input (i.e. not to simply describe the positive and negative models). However, another benefit is that the smaller the formulas, the more understandable they are for users. Similarly, using too many binary operators could make the formulas hard to grasp, regardless of their size.
In addition, there are examples of interesting specifications that can be expressed with a bounded amount of binary operators. We give three examples below with LTL-formulas.
Consider first so-called “mode-target” formulas of the shape ⋀_i (F G m_i ⇒ ⋁_j F G t_{i,j}), where all m_i and t_{i,j} are propositions. These types of formulas were introduced in [4] and exhibit two interesting features: the corresponding LTL-synthesis problem is tractable, and these formulas express an interesting property, which can be summarized as follows: if a model eventually settles in a mode m_i, then it should eventually settle in one of the targets t_{i,j}. Interestingly, when the number of different modes and targets that a system can have is fixed, then the number of binary operators sufficient to express such a specification is also bounded.
Similarly, there are also interesting specifications related to “generalized reactivity”, such as the GR(1)-formulas from [32]. Such specifications are of the shape (⋀_i G F a_i) ⇒ (⋀_j G F g_j), where all formulas a_i and g_j feature no temporal operators at all. As such, up to introducing additional propositions, these could be expressible with few binary operators. These formulas can be read as an implication between assumptions and guarantees. As above, when the number of assumptions and guarantees is bounded, then so is the number of binary operators sufficient to express such formulas.
Finally, one of the popular LTL learning tools, Scarlet [35], relies on a fragment of LTL, directed LTL, and its dual, which use unary temporal operators and binary logical operators only. In these fragments, formulas of a fixed length (a search parameter they define) can use several unary temporal operators while using only a bounded number of occurrences of binary logical operators.
4.2 Abstract recipes
The six decision problems captured in the rightmost column of Table 1 are of two kinds: three are NP-complete, while the three others are below NP. In fact, the proofs of all three results of the same kind will follow the same abstract recipes. We present them below.
Recipe for the membership-below-NP proofs.
Consider one of the three fragments for which the corresponding decision problem is below NP: LTL-formulas, CTL-formulas without the operator X, or ATL-formulas with only one unary operator F or G. Then, we follow the two steps below:
- A) First, we show that, given the set of propositions and the bound n, there is a set of relevant formulas such that: 1) for all formulas of the fragment of size at most n, there is an equivalent formula in the set; and 2) the size of the set is polynomial in the size of the input.
- B) Second, we show that, for all relevant formulas, deciding whether the formula satisfies a model can be done, depending on the logic, within the resources allowed, i.e. logarithmic space for the LTL case, non-deterministic logarithmic space for the CTL case, and polynomial time for the ATL case.
Due to a lack of space, in this paper we will only present these two steps in the context of formulas that do not use any binary operator. Since the number of occurrences of binary operators is bounded in any case, the arguments are essentially the same for the general case. For instance, for the first step, from the result established for formulas without binary operators, we can straightforwardly deduce the result for all formulas, by induction on the bound k. That way, we obtain a set of relevant formulas whose size could be exponential in the bound k, but this does not have an impact complexity-wise, since k is fixed.
Recipe for the NP-hardness proofs.
The formulas that we consider use only a bounded number of binary operators. Thus, contrary to the NP-hardness reductions of Section 3, here, our NP-hardness proofs do not rely on binary operators. In fact, these binary operators make it harder to argue about how the permitted unary operators interact. For this reason, our proofs of NP-hardness are decomposed into two steps. We first exhibit reductions for the learning problems without binary operators. Then, from these reductions, we devise reductions for the learning problems with bounded occurrences of binary operators. We present in detail the former reductions in this paper and give intuition on the latter reductions below.
Let L ∈ {LTL, CTL, ATL} and ⊗ be a binary (non-temporal) operator. We consider fresh propositions and define multiple size-1 structures using these propositions, forming two sets P_⊗ and N_⊗. The idea is that, to distinguish these two sets, a separating formula will necessarily feature all the fresh propositions. In fact, from a positive and a negative set of structures P and N over a single proposition p (which is the only proposition that unary formulas can use in our reductions; in fact, for technical reasons, the extended version [8] uses two propositions), we can show the following: if a formula of suitable size, with at most k occurrences of binary operators, separates both P_⊗ and N_⊗, and P and N, then there is a unary formula of suitable size that separates P and N. (Actually, we can also show the converse, which is important for us to prove that the reduction is correct.) That way, a reduction for the learning problem without binary operators can be translated (in logspace) into a reduction for the learning problem with bounded occurrences of binary operators. Note that the arguments presented in this paragraph are not straightforward to formally state and prove (this is handled in Theorem “Proving NP-hardness without binary operators is sufficient” in the extended version [8]).
Let us now consider how we handle the reductions without binary operators. From an instance (n, (S_j)_{j ∈ [m]}, k) of the hitting set problem, we proceed as follows. We define a sample of structures (and a size bound) such that all separating formulas have a specific shape, and there is a bijection between subsets H ⊆ [n] and formulas φ_H of that specific shape. This correspondence allows us to extract a hitting set. More specifically, we follow the abstract recipe below:
- (a) We define the size bound and positive and negative structures that “eliminate” certain operators or patterns of operators from any potential separating formula. This way, we ensure that any separating formula will be of the form φ_H, for some H ⊆ [n].
- (b) We define a negative structure satisfied by a formula φ_H if and only if |H| > k.
- (c) For all j ∈ [m], we define a positive structure that a formula φ_H accepts if and only if H ∩ S_j ≠ ∅.
By construction, the instance of the learning decision problem that we obtain is a positive instance if and only if the hitting set instance also is. Furthermore, note that in all three cases, this reduction can be computed in logspace.
4.3 LTL learning
We start with LTL learning. We have the proposition below.
Proposition 7.
For all sets of unary operators U ⊆ Op_u, sets of binary (non-temporal) operators B_l ⊆ Bool, and bounds k ∈ ℕ on the number of binary operators, the corresponding LTL learning decision problem is in LOGSPACE.
We present Steps A and B of Section 4.2 in the case where no binary operators occur. Toward Step A, we have the equivalences below (see e.g. [27, Prop. 8]), which imply the corollary that follows.
Observation 8.
For all LTL-formulas φ, we have: 1) F X φ ≡ X F φ and G X φ ≡ X G φ; 2) F F φ ≡ F φ and G G φ ≡ G φ; 3) F G F φ ≡ G F φ and G F G φ ≡ F G φ.
Corollary 9.
Consider a set of propositions Prop. We let Λ = {ε, F, G, FG, GF} and, for all n ∈ ℕ, R_n = {X^a λ p | a ≤ n, λ ∈ Λ, p ∈ Prop}.
Then, for any LTL-formula φ of size at most n using only the unary operators X, F and G, there is an LTL-formula φ′ ∈ R_n such that φ ≡ φ′ and |φ′| ≤ |φ|.
Proof sketch.
With the equivalences 1) from Observation 8, we can push the operators X to the front of the formula. The equivalences 2) and 3) from Observation 8 ensure that it is possible to have at most two nested operators F, G in the resulting formula φ′.
The set of relevant formulas is then obtained directly from the formulas of Corollary 9. Note however that how it is obtained depends on the exact operators in U. For instance, if F ∈ U while G ∉ U, we should replace the occurrences of G by ¬F¬. Nonetheless, in any case, we obtain a set of relevant formulas whose number of elements is linear in the size of the input. This concludes the arguments for Step A. As for Step B, one can realize that since there are at most two nested operators F, G in the relevant formulas, checking that they hold on a linear structure can be done in logarithmic space (because it suffices to have a constant number of pointers browsing the structure).
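To illustrate Steps A and B together, here is a sketch (our reading of the normal forms X^a λ p; all names are assumptions) that enumerates the relevant formulas and evaluates one on an ultimately periodic word u·v^ω, encoded as two lists of proposition sets:

```python
def relevant_formulas(props, n):
    """Normal-form candidates X^a L p with L in {'', 'F', 'G', 'FG', 'GF'}:
    linearly many per proposition."""
    return [(a, lam, p)
            for p in props
            for lam in ('', 'F', 'G', 'FG', 'GF')
            for a in range(n)]

def holds_nf(formula, u, v):
    """Evaluate X^a L p on the ultimately periodic word u . v^omega."""
    a, lam, p = formula
    def at(i):  # does p hold at position i (0-based)?
        return p in (u[i] if i < len(u) else v[(i - len(u)) % len(v)])
    if lam == '':
        return at(a)
    if lam in ('F', 'G'):
        # positions a .. a + |u| + |v| - 1 cover every distinct suffix
        idx = range(a, a + len(u) + len(v))
        return any(at(i) for i in idx) if lam == 'F' else all(at(i) for i in idx)
    # 'FG' and 'GF' only depend on the periodic part v
    period = [p in labels for labels in v]
    return all(period) if lam == 'FG' else any(period)
```

Note that the evaluation only needs a bounded number of positions to inspect, which mirrors the constant-pointer argument above.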
4.4 CTL learning
Consider now the more involved case of CTL learning. As can be seen in Table 1, we distinguish two cases: with and without the operator X.
Assume that X ∈ U.
The goal is to show the theorem below.
Theorem 10.
For all sets U ⊆ Op_u, B_l ⊆ Bool, and bounds k ∈ ℕ, if X ∈ U, then the corresponding CTL learning decision problem is NP-hard.
As stated in Section 4.2, we argue the theorem in the case where no binary operators occur. Recall that in that case we consider a single proposition p. Consider an instance (n, (S_j)_{j ∈ [m]}, k) of the hitting set problem. We follow the three Steps a, b, and c. Toward Step a, we define Kripke structures that prevent the use of the operators F and G: we consider the Kripke structures depicted in Figure 1. These structures satisfy the lemma below.
Lemma 11.
A formula of size at most the considered bound accepting the positive structures and rejecting the negative structures of Figure 1 cannot use the operators F and G.
Proof sketch.
Consider an equivalent CTL-formula in which negations, if any, occur right before the proposition p. If it uses the operator F, then it cannot distinguish two of the structures of Figure 1; if it uses the operator G, then it cannot distinguish two others. Otherwise, since the structures of Figure 1 coincide on their first states, the formula has to use sufficiently many operators X; given the size bound, it then cannot additionally use a negation. Thus, the equivalent formula does not use the operators F and G, and neither does the original one.
In fact, a CTL-formula of size at most the bound accepting the positive structures and rejecting the negative ones necessarily uses exactly n operators X followed by the proposition p. Such a formula is therefore entirely defined by the operators X before which it uses the quantifier E. This suggests the definition below of the CTL-formula induced by a subset H ⊆ [n].
Definition 12.
For all H ⊆ [n], we let φ_H denote the CTL-formula defined by φ_H = Q_1 X Q_2 X ⋯ Q_n X p where, for all i ∈ [n], we have Q_i ∈ {E, A} and Q_i = E if and only if i ∈ H.
For instance, for H = ∅, we have φ_∅ = A X A X ⋯ A X p (with n occurrences of A X).
Let us now turn toward Step b. We define a structure with different levels, where: the single starting state is at the bottommost level; the proposition p only labels the state at the topmost level; and every state of the bottom levels has a successor at the same level and one at the level just above. That way, going from the initial state to the p-labeled state is equivalent to leveling up sufficiently many times, which requires sufficiently many quantifiers E in a formula φ_H. An example is depicted in Figure 2. This structure satisfies the lemma below.
Lemma 13.
For all H ⊆ [n], the structure of Step b satisfies φ_H if and only if |H| > k.
Consider now Step c. For all j ∈ [m], we define a Kripke structure M_j with a single state labeled with p, in which, for all i ∈ [n], the branching of the states at depth i depends on whether i ∈ S_j, as exemplified in Figure 3. Such structures satisfy the lemma below.
Lemma 14.
For all H ⊆ [n] and j ∈ [m], we have M_j ⊨ φ_H if and only if H ∩ S_j ≠ ∅.
Proof sketch.
We can show, by backward induction on i, a property characterizing the states of M_j that satisfy the suffix formula Q_{i+1} X ⋯ Q_n X p. The lemma is then given by the property at the initial state.
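The evaluation of formulas of the shape φ_H on a Kripke structure is a simple backward iteration over the quantifier sequence; here is a sketch under our reading of Definition 12 (all names are our assumptions):

```python
def sat_states(quants, delta, p_states):
    """States satisfying Q_1 X Q_2 X ... Q_n X p, where quants is a string
    over {'E', 'A'}, delta maps each state to its successor set, and
    p_states are the p-labeled states."""
    current = set(p_states)
    for q in reversed(quants):  # evaluate the innermost operator first
        if q == 'E':  # E X: some successor satisfies the sub-formula
            current = {s for s in delta if delta[s] & current}
        else:         # A X: all successors do (and there is at least one)
            current = {s for s in delta if delta[s] and delta[s] <= current}
    return current
```

For instance, with a state 0 branching to a good chain and to a sink, E X E X p holds at 0 while A X A X p does not.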
Assume that .
In that case, the CTL learning problem is in .
Theorem 15.
For all sets of operators , , and bounds , the decision problem is in .
Toward Step A, a crucial observation is that using the operators or twice in a row is useless. This is stated in the lemma below in the context of -formulas, because the lemma will be reused in the next subsection.
Lemma 16.
Let , and be an -formula. We have:
Proof sketch.
We argue the result for ; the case of is dual. We have by definition of . Furthermore, if a state satisfies then there is a strategy for the coalition such that a state satisfying is eventually surely reached. For all such states , we consider a strategy for the coalition ensuring to eventually visit a state satisfying . Then, consider a strategy for the coalition that mimics (which is possible since ) until a state satisfying is reached, and then switches to the strategy . That strategy ensures eventually reaching a state satisfying . Therefore, . The argument for is similar.
From this, we can deduce (though this is not immediate) that there is a bound such that, for any set of propositions and for all , given any -formula , there is an equivalent -formula , with . Thus, the number of -formulas to consider is linear in . As for Step B, consider any such formula . Since the number of quantifiers it uses is bounded by and , checking that it satisfies a Kripke structure can be done in .
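The collapse of repeated operators makes the count of candidate formulas easy to see. Using the concrete names EF and AF as placeholders for the paper's two eventuality operators (the actual operators are strategic quantifiers; these names are our assumption), the alternating sequences of length at most B can be enumerated directly, and there are exactly 2B + 1 of them:

```python
# After collapsing repeated operators (Lemma 16: two identical eventualities
# in a row are equivalent to one), every sequence of at most `bound` operators
# reduces to an alternating one. There are 2*bound + 1 such sequences:
# the empty one, plus, for each positive length, one per choice of first operator.

def alternating_sequences(bound):
    seqs = [()]                              # the empty sequence
    for first in ('EF', 'AF'):
        seq = []
        for i in range(bound):
            other = 'AF' if first == 'EF' else 'EF'
            seq.append(first if i % 2 == 0 else other)
            seqs.append(tuple(seq))          # every alternating prefix counts
    return seqs

for b in range(5):
    assert len(alternating_sequences(b)) == 2 * b + 1
print(alternating_sequences(2))
```

This linear count is what places the model-checking-by-enumeration argument within the claimed complexity class.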
Proof of NL-hardness.
In Table 1, we state not only that CTL learning without the operator is in , but also that it is -hard. Proving this result is straightforward. We exhibit a reduction from the problem of reachability in a graph (which is -complete [23]). Given an input of that problem, with a graph, the source state, and the target state, we define a positive Kripke structure obtained from by making its only initial state, and the only state labeled by the proposition . Additionally, we consider as the bound, and with an additional structure, we ensure that if there is a separating formula, then the formula is separating.
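The core of this reduction is plain graph reachability: the candidate separating formula holds at the initial state exactly when the target is reachable from the source. A minimal sketch (the graph, state names, and the omitted gadget structure are ours, not the paper's):

```python
# Reachability core of the NL-hardness reduction: the instance (G, s, t)
# becomes a Kripke structure with initial state s and proposition p labeling
# only t; the candidate formula then separates iff t is reachable from s.

def reachable(succ, source, target):
    """Depth-first search for a path from source to target."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

graph = {'s': ['a'], 'a': ['b'], 'b': ['t'], 'c': ['s']}
print(reachable(graph, 's', 't'))  # True: s -> a -> b -> t
print(reachable(graph, 't', 's'))  # False: t has no outgoing edges
```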
4.5 ATL learning
We have seen that CTL learning with the operator is -hard, which implies that the same holds for ATL learning. Here, we consider the case of ATL learning without the operator . First, let us informally explain why the -hardness reduction described above for CTL cannot work without the operator . A central aspect of the proof of Lemma 14 is the ability to associate a specific operator in a prospective formula with a specific state in a Kripke structure. That is intrinsically not possible with the operator , since this operator looks at arbitrarily distant horizons. At least, this is the case for CTL-formulas interpreted on Kripke structures. However, with -formulas interpreted on turn-based structures, it is possible to “block the horizon” of operators. Indeed, consider the structure of Figure 5, where blue lozenge-shaped states are Agent-1 states, and red square-shaped states are Agent-2’s. Here, one can see that because Agent 2 can enforce staying forever in the Agent-2 state and never seeing the state , labeled by .
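On turn-based structures, checking a strategic eventuality of the form "Agent 1 can enforce eventually p" amounts to a classical attractor computation: from an Agent-1 state, some successor must lead toward the goal; from an Agent-2 state, all successors must. The two-state example below mirrors the blocked-horizon situation just described (state names and the encoding are ours, not the paper's): Agent 2 can loop on its own state forever, so the property fails everywhere outside the goal.

```python
def attractor(states, owner, succ, goal):
    """Fixed-point computation of the states from which Agent 1 can enforce
    eventually reaching `goal` in a turn-based two-player structure."""
    win = set(goal)
    changed = True
    while changed:
        changed = False
        for s in states - win:
            outs = succ[s]
            # Agent-1 state: one winning successor suffices.
            # Agent-2 state: every successor must already be winning.
            if (owner[s] == 1 and outs & win) or (owner[s] == 2 and outs <= win):
                win.add(s)
                changed = True
    return win

states = {'u1', 'u2', 'goal'}
owner = {'u1': 1, 'u2': 2, 'goal': 1}
succ = {'u1': {'u2'}, 'u2': {'u2', 'goal'}, 'goal': {'goal'}}
win = attractor(states, owner, succ, {'goal'})
print('u2' in win)  # False: Agent 2 loops on u2 and never reaches the goal
print('u1' in win)  # False: u1's only move goes to u2
```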
These kinds of turn-based games will be used extensively in the following. In full generality, they are defined as follows: given a pair of agents and , in the turn-based structure , there are self-looping states, alternately belonging to Agents and , that get closer and closer to the self-looping sink , the only state labeled by . In fact, such structures are linked to alternating-formulas, defined below.
Definition 17.
An -formula is positive if it does not use any negation. For a pair of agents and , a positive -formula is -free if it does not use an operator with . It is -alternating if it is -free and if there are at least alternating occurrences of operators with and with .
Lemma 18.
Consider two agents , , and a positive -formula that is -free. The formula accepts the structure if and only if it is -alternating.
learning with .
Here, all the turn-based structures that we consider use the set of agents . The goal is to show the theorem below.
Theorem 19.
For all sets , , and bound , if and , then the decision problem is -hard.
In the following, to ease notation, the strategic operators will simply be denoted , , and respectively. Consider an instance of the hitting set problem. We follow the recipe of Subsection 4.2. Here, we want separating formulas to be promising, i.e., to use only the operators and . To this end, all the structures we use are self-looping, thus making the operators and useless.
Lemma 20.
For all -formulas and self-looping states , we have:
Proof.
Since is self-looping, the coalition of agents has a strategy such that . The lemma follows from the definition of the operators and .
We also consider the two structures , of Figure 5 satisfying the lemma below.
Lemma 21.
For all -formulas accepting and rejecting , there is a promising formula with that is equivalent to on self-looping structures.
Proof sketch.
Consider an -formula equivalent to with and with at most one negation occurring before the proposition . Since accepts , it follows that it is positive. By Lemma 20, we can remove the operators and from . Furthermore: cannot use , since it accepts , and it cannot use since it rejects . It is therefore promising.
We will also consider as a positive structure, thus allowing us to focus on -alternating formulas (recall Lemma 18). Then, we want to associate with a subset a promising -alternating -formula. To build intuition, let us consider the turn-based structure for of Figure 6. This structure is analogous to the structure , except that all Agent-2 states have an additional successor: the state , which does not satisfy any positive formula. In this structure, because Agent 2 owns the states , these states do not accept any positive -formula of the shape . Therefore, for all and positive -formulas , we have if and only if . This implies that a -alternating formula accepts if and only if the sequence of operators (without in between) occurs at least twice in (to go from to and then from to ). In fact, we consider formulas that use operators only after and before , as defined below. Such formulas satisfy the lemma that follows.
Definition 22.
For all , we let denote the -formula defined by: where, for all , we have and iff .
For all , we let (with ).
Lemma 23.
A promising -alternating formula with rejects if and only if for some such that .
Proof sketch.
Since is -alternating, it uses at least operators and . Thus, it can use at most operators . In addition, accepts iff there are at least occurrences of the sequence in . Hence, uses each remaining operator between a different pair of successive iff it rejects .
With as positive structures and as negative structures, we have achieved both Steps a and b. Let us turn to Step c. For all , we define a turn-based structure . An example is depicted in Figure 7 for . The structure features a sequence of states alternating between Agent-1 and Agent-2 states, ending in a self-looping sink not labeled by . However, the Agent-1 states for which have a “testing state” as successor. That state is self-looping, and may branch to the self-looping sink or to the structure . That state is such that iff (iff ). Furthermore, note that it is useless to “wait” at the state before branching to . Indeed, if for instance but , then it may seem that , for and therefore . However, this is not the case, because we do not have : since is not -alternating, it does not satisfy the structure . Overall, we have the lemma below.
Lemma 24.
For all , we have if and only if .
learning with or .
The learning problem is now in .
Theorem 25.
For all sets of operators , , and bounds , the decision problem is in .
We focus on the case ; the other is analogous. Toward Step A, consider a formula and the only proposition occurring in . By Lemma 16, we can make the following observations: 1) If the operator occurs in , then ; 2) Otherwise, if the only operator occurring in is , then ; 3) Otherwise, is equivalent to a formula alternating between the operators and , with . These observations suggest the definition below, which satisfies the lemma that follows.
Definition 26.
For a set of propositions , we define the set where .
Lemma 27.
For a set of propositions , and , there is an -formula such that and .
Proof of P-hardness.
learning with .
Let us consider learning with one more agent, i.e. learning, still with . The turn-based structures that we consider now use the set of agents . The goal is to show the theorem below.
Theorem 28.
For all sets , , and bound , the decision problem is -hard.
We focus on the case (the case is analogous, since the operators and behave dually). Once again, let us consider an instance of the problem . We start right away by defining the -formula associated with a subset .
Definition 29.
For , we let denote the -formula defined by where, for all , we have and if and only if .
For , we let (with ).
Toward Step a, we define as negative structures, thus ensuring that a separating formula does not use an operator with , or . We also define as a positive structure with the bound . That way, a separating formula is necessarily -alternating and only uses the operators , and .
Lemma 30.
If a formula with accepts and rejects , then there is some such that .
Note that, if , then is -alternating. Therefore, since is a negative structure, if is separating, then , i.e. we have also achieved Step b. Let us now turn to Step c. For all , we define the structure . An example is given in Figure 8 with . This structure features a sequence of states alternating between Agent-1 and Agent-2 states, ending in a self-looping sink . However, the Agent-1 states for which have an Agent-3 “testing state” as successor. That state is self-looping and also branches to the structure . Note that, given , the sub-formula is -alternating, and therefore satisfies the structure , iff . Thus, since is an Agent-3 state, iff iff . Hence, we have the following lemma.
Lemma 31.
For all , we have if and only if .
This concludes Step c. Overall, we let be an input of the decision problem where , , and . We have that is a positive instance of if and only if is a positive instance of .
5 Future Work
Within our setting, we have covered many cases, as can be seen in Table 1. That is why the complete version of this work [8] is already quite long. However, there are still some cases that we have not tackled. First, there is the case of learning with . We believe that it behaves like the case , but the proofs would entail many additional technical details, since replacing with increases the size of the formulas.
More importantly, when considering a bounded number of binary operators, we have not allowed binary temporal operators (). Doing so would enhance the expressivity of the fragment that we consider, and we conjecture that we would obtain the same results as in this paper, with proofs that should be only moderately more involved.
From a higher-level perspective, in this paper we have focused solely on solving the learning problems exactly, and although we have found some relevant tractable cases, many are intractable. A promising research direction would be to look for tractable approximation algorithms, similarly to what is done in [27].
References
- [1] Rajeev Alur, Thomas A. Henzinger, and Orna Kupferman. Alternating-time temporal logic. J. ACM, 49(5):672–713, September 2002. doi:10.1145/585265.585270.
- [2] Dana Angluin. On the complexity of minimum inference of regular sets. Inf. Control., 39(3):337–350, 1978. doi:10.1016/S0019-9958(78)90683-6.
- [3] M. Fareed Arif, Daniel Larraz, Mitziu Echeverria, Andrew Reynolds, Omar Chowdhury, and Cesare Tinelli. SYSLITE: syntax-guided synthesis of PLTL formulas from finite traces. In FMCAD, pages 93–103. IEEE, 2020. doi:10.34727/2020/ISBN.978-3-85448-042-6_16.
- [4] Ayca Balkan, Moshe Y. Vardi, and Paulo Tabuada. Mode-target games: Reactive synthesis for control applications. IEEE Trans. Autom. Control., 63(1):196–202, 2018. doi:10.1109/TAC.2017.2722960.
- [5] Elwyn R. Berlekamp, Robert J. McEliece, and Henk C. A. van Tilborg. On the inherent intractability of certain coding problems (corresp.). IEEE Trans. Inf. Theory, 24(3):384–386, 1978. doi:10.1109/TIT.1978.1055873.
- [6] Dines Bjørner and Klaus Havelund. 40 years of formal methods - some obstacles and some possibilities? In FM, volume 8442 of Lecture Notes in Computer Science, pages 42–61. Springer, 2014. doi:10.1007/978-3-319-06410-9_4.
- [7] Giuseppe Bombara, Cristian-Ioan Vasile, Francisco Penedo, Hirotoshi Yasuoka, and Calin Belta. A decision tree approach to data classification using signal temporal logic. In Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, HSCC ’16, pages 1–10, New York, NY, USA, 2016. Association for Computing Machinery. doi:10.1145/2883817.2883843.
- [8] Benjamin Bordais, Daniel Neider, and Rajarshi Roy. The complexity of learning temporal properties. CoRR, abs/2408.04486, 2024. doi:10.48550/arXiv.2408.04486.
- [9] Benjamin Bordais, Daniel Neider, and Rajarshi Roy. Learning branching-time properties in CTL and ATL via constraint solving. In André Platzer, Kristin Yvonne Rozier, Matteo Pradella, and Matteo Rossi, editors, Formal Methods - 26th International Symposium, FM 2024, Milan, Italy, September 9-13, 2024, Proceedings, Part I, volume 14933 of Lecture Notes in Computer Science, pages 304–323. Springer, 2024. doi:10.1007/978-3-031-71162-6_16.
- [10] Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. LTL and beyond: Formal languages for reward function specification in reinforcement learning. In Sarit Kraus, editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 6065–6073. ijcai.org, 2019. doi:10.24963/IJCAI.2019/840.
- [11] Alberto Camacho and Sheila A. McIlraith. Learning interpretable models expressed in linear temporal logic. In ICAPS, pages 621–630. AAAI Press, 2019. URL: https://ojs.aaai.org/index.php/ICAPS/article/view/3529.
- [12] Alberto Camacho, Eleni Triantafillou, Christian J. Muise, Jorge A. Baier, and Sheila A. McIlraith. Non-deterministic planning with temporally extended goals: LTL over finite and infinite traces. In AAAI, pages 3716–3724. AAAI Press, 2017. doi:10.1609/AAAI.V31I1.11058.
- [13] Alessio Cecconi, Giuseppe De Giacomo, Claudio Di Ciccio, Fabrizio Maria Maggi, and Jan Mendling. Measuring the interestingness of temporal logic behavioral specifications in process mining. Inf. Syst., 107:101920, 2022. doi:10.1016/J.IS.2021.101920.
- [14] William Chan. Temporal-logic queries. In CAV, volume 1855 of Lecture Notes in Computer Science, pages 450–463. Springer, 2000.
- [15] Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Dexter Kozen, editor, Logics of Programs, Workshop, Yorktown Heights, New York, USA, May 1981, volume 131 of Lecture Notes in Computer Science, pages 52–71. Springer, 1981. doi:10.1007/BFB0025774.
- [16] Rüdiger Ehlers, Ivan Gavran, and Daniel Neider. Learning properties in LTL ACTL from positive examples only. In 2020 Formal Methods in Computer Aided Design, FMCAD 2020, Haifa, Israel, September 21-24, 2020, pages 104–112. IEEE, 2020. doi:10.34727/2020/ISBN.978-3-85448-042-6_17.
- [17] Georgios E. Fainekos, Hadas Kress-Gazit, and George J. Pappas. Temporal logic motion planning for mobile robots. In ICRA, pages 2020–2025. IEEE, 2005. doi:10.1109/ROBOT.2005.1570410.
- [18] Nathanaël Fijalkow and Guillaume Lagarde. The complexity of learning linear temporal formulas from examples. In Jane Chandlee, Rémi Eyraud, Jeff Heinz, Adam Jardine, and Menno van Zaanen, editors, Proceedings of the 15th International Conference on Grammatical Inference, 23-27 August 2021, Virtual Event, volume 153 of Proceedings of Machine Learning Research, pages 237–250. PMLR, 2021. URL: https://proceedings.mlr.press/v153/fijalkow21a.html.
- [19] Marie Fortin, Boris Konev, Vladislav Ryzhikov, Yury Savateev, Frank Wolter, and Michael Zakharyaschev. Reverse engineering of temporal queries mediated by LTL ontologies. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pages 3230–3238. ijcai.org, 2023. doi:10.24963/IJCAI.2023/360.
- [20] Jean-Raphaël Gaglione, Daniel Neider, Rajarshi Roy, Ufuk Topcu, and Zhe Xu. Maxsat-based temporal logic inference from noisy data. Innov. Syst. Softw. Eng., 18(3):427–442, 2022. doi:10.1007/S11334-022-00444-8.
- [21] E. Mark Gold. Complexity of automaton identification from given data. Inf. Control., 37(3):302–320, 1978. doi:10.1016/S0019-9958(78)90562-4.
- [22] Antonio Ielo, Mark Law, Valeria Fionda, Francesco Ricca, Giuseppe De Giacomo, and Alessandra Russo. Towards ilp-based ltlf passive learning. In Inductive Logic Programming: 32nd International Conference, ILP 2023, Bari, Italy, November 13–15, 2023, Proceedings, pages 30–45, Berlin, Heidelberg, 2023. Springer-Verlag. doi:10.1007/978-3-031-49299-0_3.
- [23] Neil Immerman. Number of quantifiers is better than number of tape cells. J. Comput. Syst. Sci., 22(3):384–406, 1981. doi:10.1016/0022-0000(81)90039-8.
- [24] Jean Christoph Jung, Vladislav Ryzhikov, Frank Wolter, and Michael Zakharyaschev. Extremal separation problems for temporal instance queries. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024, pages 3448–3456. ijcai.org, 2024. URL: https://www.ijcai.org/proceedings/2024/382.
- [25] Xiao Li, Cristian Ioan Vasile, and Calin Belta. Reinforcement learning with temporal logic rewards. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, Canada, September 24-28, 2017, pages 3834–3839. IEEE, 2017. doi:10.1109/IROS.2017.8206234.
- [26] Weilin Luo, Pingjia Liang, Jianfeng Du, Hai Wan, Bo Peng, and Delong Zhang. Bridging ltlf inference to GNN inference for learning ltlf formulae. In AAAI, pages 9849–9857. AAAI Press, 2022. doi:10.1609/AAAI.V36I9.21221.
- [27] Corto Mascle, Nathanaël Fijalkow, and Guillaume Lagarde. Learning temporal formulas from examples is hard. CoRR, abs/2312.16336, 2023. doi:10.48550/arXiv.2312.16336.
- [28] Sara Mohammadinejad, Jyotirmoy V. Deshmukh, Aniruddh Gopinath Puranic, Marcell Vazquez-Chanlatte, and Alexandre Donzé. Interpretable classification of time-series data using efficient enumerative techniques. In HSCC ’20: 23rd ACM International Conference on Hybrid Systems: Computation and Control, Sydney, New South Wales, Australia, April 21-24, 2020, pages 9:1–9:10. ACM, 2020. doi:10.1145/3365365.3382218.
- [29] Daniel Neider and Ivan Gavran. Learning linear temporal properties. In Nikolaj S. Bjørner and Arie Gurfinkel, editors, 2018 Formal Methods in Computer Aided Design, FMCAD 2018, Austin, TX, USA, October 30 - November 2, 2018, pages 1–10. IEEE, 2018. doi:10.23919/FMCAD.2018.8603016.
- [30] Daniel Neider and Rajarshi Roy. What Is Formal Verification Without Specifications? A Survey on Mining LTL Specifications, pages 109–125. Springer Nature Switzerland, Cham, 2025. doi:10.1007/978-3-031-75778-5_6.
- [31] C.H. Papadimitriou. Computational Complexity. Theoretical computer science. Addison-Wesley, 1994. URL: https://books.google.de/books?id=JogZAQAAIAAJ.
- [32] Nir Piterman, Amir Pnueli, and Yaniv Sa’ar. Synthesis of reactive(1) designs. In E. Allen Emerson and Kedar S. Namjoshi, editors, Verification, Model Checking, and Abstract Interpretation, 7th International Conference, VMCAI 2006, Charleston, SC, USA, January 8-10, 2006, Proceedings, volume 3855 of Lecture Notes in Computer Science, pages 364–380. Springer, 2006. doi:10.1007/11609773_24.
- [33] Amir Pnueli. The temporal logic of programs. In Proc. 18th Annu. Symp. Found. Computer Sci., pages 46–57, 1977. doi:10.1109/SFCS.1977.32.
- [34] Adrien Pommellet, Daniel Stan, and Simon Scatton. Sat-based learning of computation tree logic. In Christoph Benzmüller, Marijn J. H. Heule, and Renate A. Schmidt, editors, Automated Reasoning - 12th International Joint Conference, IJCAR 2024, Nancy, France, July 3-6, 2024, Proceedings, Part I, volume 14739 of Lecture Notes in Computer Science, pages 366–385. Springer, 2024. doi:10.1007/978-3-031-63498-7_22.
- [35] Ritam Raha, Rajarshi Roy, Nathanaël Fijalkow, and Daniel Neider. Scalable anytime algorithms for learning fragments of linear temporal logic. In Dana Fisman and Grigore Rosu, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 263–280, Cham, 2022. Springer International Publishing. doi:10.1007/978-3-030-99524-9_14.
- [36] Ritam Raha, Rajarshi Roy, Nathanaël Fijalkow, Daniel Neider, and Guillermo A. Pérez. Synthesizing efficiently monitorable formulas in metric temporal logic. In VMCAI (2), volume 14500 of Lecture Notes in Computer Science, pages 264–288. Springer, 2024. doi:10.1007/978-3-031-50521-8_13.
- [37] Heinz Riener. Exact synthesis of LTL properties from traces. In FDL, pages 1–6. IEEE, 2019. doi:10.1109/FDL.2019.8876900.
- [38] Rajarshi Roy, Dana Fisman, and Daniel Neider. Learning interpretable models in the property specification language. In IJCAI, pages 2213–2219. ijcai.org, 2020. doi:10.24963/IJCAI.2020/306.
- [39] Kristin Yvonne Rozier. Specification: The biggest bottleneck in formal methods and autonomy. In VSTTE, volume 9971 of Lecture Notes in Computer Science, pages 8–26, 2016. doi:10.1007/978-3-319-48869-1_2.
- [40] Dorsa Sadigh, Eric S. Kim, Samuel Coogan, S. Shankar Sastry, and Sanjit A. Seshia. A learning based approach to control synthesis of markov decision processes for linear temporal logic specifications. In 53rd IEEE Conference on Decision and Control, CDC 2014, Los Angeles, CA, USA, December 15-17, 2014, pages 1091–1096. IEEE, 2014. doi:10.1109/CDC.2014.7039527.
- [41] Mojtaba Valizadeh, Nathanaël Fijalkow, and Martin Berger. LTL learning on gpus. In Arie Gurfinkel and Vijay Ganesh, editors, Computer Aided Verification - 36th International Conference, CAV 2024, Montreal, QC, Canada, July 24-27, 2024, Proceedings, Part III, volume 14683 of Lecture Notes in Computer Science, pages 209–231. Springer, 2024. doi:10.1007/978-3-031-65633-0_10.
- [42] Hai Wan, Pingjia Liang, Jianfeng Du, Weilin Luo, Rongzhen Ye, and Bo Peng. End-to-end learning of ltlf formulae by faithful ltlf encoding. In AAAI, pages 9071–9079. AAAI Press, 2024. doi:10.1609/AAAI.V38I8.28757.
- [43] Andrzej Wasylkowski and Andreas Zeller. Mining temporal specifications from object usage. Autom. Softw. Eng., 18(3-4):263–292, 2011. doi:10.1007/S10515-011-0084-1.