On Boolean Closed Full Trios and Rational Kripke Frames

We study what languages can be constructed from a non-regular language L using Boolean operations and synchronous or non-synchronous rational transductions. If all rational transductions are allowed, one can construct the whole arithmetical hierarchy relative to L. In the case of synchronous rational transductions, we present non-regular languages that allow constructing languages arbitrarily high in the arithmetical hierarchy and we present non-regular languages that allow constructing only recursive languages. A consequence of the results is that aside from the regular languages, no full trio generated by a single language is closed under complementation. Another consequence is that there is a fixed rational Kripke frame such that assigning an arbitrary non-regular language to some variable allows the definition of any language from the arithmetical hierarchy in the corresponding Kripke structure using multimodal logic.


Introduction
The study of closure properties of language classes has a long tradition; it can be traced back to the introduction of regular languages [10]. Among other applications, closure properties provide insights about whether languages belong to certain classes and, as far as they are effective, allow the computation of representations of languages. They also often serve as a way to describe language classes without reference to concrete generating or accepting devices: In many cases, a language class can be described as the smallest class of languages that possesses a given collection of closure properties and contains certain generating languages.
Here, we are concerned with Boolean closed full trios, i.e., classes closed under the Boolean operations (union, intersection, and complementation) and rational transductions. It is well-known that the class of regular languages constitutes a Boolean closed full trio.
This combination of closure properties is interesting for several reasons. First, in the case of regular languages, this particular collection is exploited, for example, in the theory of automatic structures [9], since it implies that in such structures, every first-order definable relation can be represented by a regular language. Since emptiness is decidable for regular languages, one can therefore decide the first-order theory of these structures.
Second, the languages definable by multimodal logic in a rational Kripke frame, i.e., a Kripke frame in which the worlds are words and the visibility relations are given by rational transductions, are always confined to the Boolean closed full trio generated by the values (that is, languages) assigned to the variables. This was observed by Bekker and Goranko [2] and then used to show that the model checking problem for multimodal logic and rational Kripke frames is decidable if all variables are assigned regular languages.
Third, a wide range of interesting language classes are principal full trios, i.e., full trios that are generated by one language. Since these are always union closed, their closure under complementation is equivalent to the class being a Boolean closed full trio. Examples of principal full trios are the context-free languages, languages accepted by multicounter automata (for a bounded number of counters and blind, partially blind, or with zero test [7]), and the languages accepted by valence automata over a finitely generated monoid [6].
Hence, the question arises whether there are language classes beyond the regular languages that enjoy these closure properties and still admit decision procedures for simple properties such as emptiness. Our first main result (Theorem 9) states that every Boolean closed full trio that contains any non-regular language already includes the whole arithmetical hierarchy (and even the arithmetical hierarchy relative to this language) and thus loses virtually all decidability properties. This is a remarkable fact, because it means that these closure properties are so extremely powerful that even the simplest non-regular languages allow the construction of a very large class of languages.
A large number of grammar and automata models is easily seen to exceed the regular languages but stay within the recursively enumerable languages. Hence, Theorem 9 also implies that the corresponding language classes are never Boolean closed full trios. We can also conclude that other than the regular languages, no principal full trio is closed under complementation.
It should be noted that Theorem 9 does not mean that there is no way of developing a theory of automatic structures beyond regular languages. It might well be that some smaller collection of closure properties suffices to obtain all first-order definable relations and still admits a decision procedure for the emptiness problem.
Actually, it turns out that three fixed rational transductions, together with the Boolean operations, suffice to construct all arithmetical languages from any non-regular language. Therefore, our second main result (Theorem 14) states that there is a fixed rational Kripke frame with three modalities such that assigning any non-regular language to a variable allows the definition of every arithmetical language using multimodal logic.
Other results in a similar spirit on closure properties of language classes have been known for a long time. For example, Hartmanis and Hopcroft [8] have proved that every intersection closed full AFL containing $\{a^n b^n \mid n \in \mathbb{N}\}$ includes the recursively enumerable languages. Here, a full AFL is a full trio that is closed under union and the Kleene star. Furthermore, Book [4] has shown that the arithmetical languages constitute the smallest Boolean closed full trio that is closed under homomorphic replication, the latter of which is a generalization of homomorphisms. Hence, our result means that in Book's result one can replace homomorphic replication by containment of any non-regular language. However, to the best of the authors' knowledge, to date there is no known combination of natural closure properties that are enjoyed by the regular languages but that yield all the recursively enumerable languages (let alone the arithmetical hierarchy) when applied to any non-regular language.

Preliminaries
Let $\Sigma$ be a fixed countable set of abstract symbols, the finite subsets of which are called alphabets. Given an alphabet $X$, the set of words over $X$ is denoted by $X^*$ and the empty word by $\lambda$. Subsets of $X^*$ for alphabets $X$ are called languages. For a language $L$, the smallest alphabet $X$ with $L \subseteq X^*$ is denoted by $\alpha(L)$. The complement of $L$ is defined as $\overline{L} = \alpha(L)^* \setminus L$.
Let $M$ be a monoid with neutral element 1. An automaton over $M$ is a tuple $A = (Q, M, E, q_0, F)$, in which $Q$ is a finite set of states, $E$ is a finite subset of $Q \times M \times Q$ called the set of edges, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. The step relation $\Rightarrow_A$ of $A$ is a binary relation on $Q \times M$, for which $(p, a) \Rightarrow_A (q, b)$ if and only if there is an edge $(p, c, q)$ such that $b = ac$. The set generated by $A$ is then
$$S(A) = \{a \in M \mid (q_0, 1) \Rightarrow_A^* (q, a) \text{ for some } q \in F\}.$$
A set $R \subseteq M$ is called rational if it can be written as $R = S(A)$ for some automaton $A$ over $M$. A rational language is also called regular. We use REG to denote the class of regular languages.
A valence automaton over $M$ is an automaton $A$ over the monoid $X^* \times M$, where $X$ is an alphabet. The language accepted by $A$ is defined as $L(A) = \{w \in X^* \mid (w, 1) \in S(A)\}$. The class of languages accepted by valence automata over $M$ is denoted by $\mathrm{VA}(M)$.
Given alphabets $X$ and $Y$, a rational transduction is a rational subset of the monoid $X^* \times Y^*$. For a language $L \subseteq Y^*$ and a rational transduction $R$, we write $RL$ for $\{x \in X^* \mid \exists y \in L : (x, y) \in R\}$.
A language class is a set of languages that contains at least one non-empty language. A language class C is called a full trio (or cone) if it is closed under (arbitrary) homomorphisms, inverse homomorphisms, and intersection with regular languages. It is well-known [3] that a class C is a full trio if and only if it is closed under rational transductions, i.e., for every L ∈ C and every rational transduction R, we have RL ∈ C. We call a language class Boolean closed if it is closed under all Boolean operations (union, intersection, and complementation). By the full trio generated by the language L we mean the smallest full trio that contains L. A full trio is called a principal full trio if it is generated by some language.
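For any language class $C$, we write $\mathrm{RE}(C)$ for the class of languages accepted by some Turing machine with an oracle $L \in C$. Similarly, let $\mathrm{REC}(C)$ be the class of languages accepted by some Turing machine that halts on every input and has access to an oracle $L \in C$. Furthermore, let REC denote the class of recursive languages. We also write $\mathrm{REC}(L)$ and $\mathrm{RE}(L)$ for $\mathrm{REC}(\{L\})$ and $\mathrm{RE}(\{L\})$, respectively. Then the arithmetical hierarchy (see, for example, [11]) is defined as
$$\Sigma_0 = \mathrm{REC}, \qquad \Sigma_{n+1} = \mathrm{RE}(\Sigma_n) \text{ for } n \ge 0, \qquad \mathrm{AH} = \bigcup_{n \ge 0} \Sigma_n.$$
Languages in AH are called arithmetical. The arithmetical hierarchy relative to $L$ is defined as
$$\Sigma_0(L) = \mathrm{REC}(L), \qquad \Sigma_{n+1}(L) = \mathrm{RE}(\Sigma_n(L)) \text{ for } n \ge 0, \qquad \mathrm{AH}(L) = \bigcup_{n \ge 0} \Sigma_n(L).$$
We will often encode words over an alphabet $X$, $|X| \ge 2$, by words in $\{0,1\}^*$. If $X = \{a_1, \ldots, a_n\}$, then a homomorphism $g : X^* \to \{0,1\}^*$ with $g(a_i) = 10^i$ will be called a standard encoding. For each subset $Y \subseteq X$, the homomorphism $\pi_Y : X^* \to Y^*$ is defined by $\pi_Y(y) = y$ for $y \in Y$ and $\pi_Y(x) = \lambda$ for $x \in X \setminus Y$.
Let $X$ be an alphabet. For languages $L \subseteq X^*$ and words $u, v \in X^*$, we write $u \equiv_L v$ if for each $w \in X^*$, we have $uw \in L$ if and only if $vw \in L$. The equivalence relation $\equiv_L$ is called the Myhill-Nerode equivalence. The well-known Myhill-Nerode Theorem states that $L$ is regular if and only if $\equiv_L$ has finite index.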
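To make the Myhill-Nerode equivalence concrete, the following Python sketch searches for a suffix $w$ that separates two words with respect to a language given by a membership predicate. The function names, the example language $\{a^n b^n \mid n \in \mathbb{N}\}$, and the bound on the suffix length are illustrative assumptions and not part of the paper. A bounded search of this kind can only witness that two words are inequivalent; it cannot prove equivalence.

```python
from itertools import product


def in_anbn(word: str) -> bool:
    """Membership test for the example language {a^n b^n | n >= 0}."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def distinguishing_suffix(u, v, member, alphabet="ab", max_len=6):
    """Return a shortest w (length <= max_len) with exactly one of uw, vw
    in the language, or None if no such w is found within the bound."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if member(u + w) != member(v + w):
                return w
    return None


# a and aa are separated by the suffix b: ab is in the language, aab is not.
print(distinguishing_suffix("a", "aa", in_anbn))
```

In this example language, the words $a^i$ and $a^j$ with $i \neq j$ are separated by the suffix $b^i$, so the equivalence has infinite index, which is the Myhill-Nerode criterion for non-regularity.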
Remark. In the following, we will make statements about certain languages in {0, 1} * being obtainable from other languages in {0, 1} * either by using a finite set of transductions or by using a finite set of transductions and Boolean operations. It will then always be possible to use larger alphabets with auxiliary symbols for the following reason. Suppose there is a finite set S of transductions, each over the alphabet X = {a 1 , . . . , a n }, where {0, 1} ⊆ X. Let h : X * → {0, 1} * be a standard encoding. Then we have By induction, it follows that for every language K ⊆ X * that can be obtained from L ⊆ X * using transductions in S (and Boolean operations), we can obtain h(K) from h(L) by using transductions in

Boolean closed full trios
Lemma 1.
There is a finite set $F$ of rational transductions in $X^* \times X^*$ such that each regular language $K \subseteq X^*$ can be obtained from any non-empty $L \subseteq X^*$ using transductions in $F$.

Proof. It suffices to prove the lemma for
Our goal is to produce the language $T_A$ of all words $10^{i_0}1x_110^{i_1} \cdots x_n10^{i_n}$ such that $i_0 = 0$, $i_n = 1$, $x_j \in \{0,1\}$, and $(i_j, x_{j+1}, i_{j+1}) \in E$ for $0 \le j < n$. Then, clearly, the rational transduction $P$ that outputs only the $x_j$ will satisfy $PT_A = K$. By the above remark, it suffices to provide transductions over the extended alphabet $Y = \{0, 1, \#_1, \#_2\}$. The additional symbols $\#_1, \#_2$ are called markers.
First we use the initial transduction $I = 1(1\{0,1\}10^*)^*1\{0,1\}10 \times \{0,1\}^*$ to produce the set $1(1\{0,1\}10^*)^*1\{0,1\}10$ from $L$. In the following, a word $10^{i_0}1x_110^{i_1} \cdots x_n10^{i_n}$ is called an encoding. Its factors $0^{i_j}$ are called state blocks and its factors $0^{i_j}1x10^{i_{j+1}}$ are called transition blocks. The transduction $I$ already guarantees that the leftmost and the rightmost state block correspond to the initial and the final state, respectively.
We now wish to remove all words that contain a state block of length greater than $k$. In order to do this, we use the transduction $S_1$, which inserts the marker $\#_1$ at the beginning of every state block. Furthermore, we have the transduction $M_1$, which moves each occurrence of the marker one position to the right (i.e., outputs $0\#_1$ on input $\#_1 0$) if its right neighbor is a 0, and drops the occurrence otherwise. We also have the transduction $R$, which rejects all inputs that have a factor $\#_1 0$. All other words are accepted by $R$ but stripped of their occurrences of $\#_1$ in the output. Then applying $RM_1^kS_1$ yields the set of encodings with state blocks of length at most $k$.
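The marker technique above can be traced on individual words. The following Python sketch is a simplification under stated assumptions: it models $S_1$, $M_1$, and $R$ as deterministic functions on a single well-formed encoding (the paper works with transductions applied to whole languages), it writes the marker $\#_1$ as `#`, and all function names are hypothetical.

```python
def insert_markers(word: str) -> str:
    """S_1: put a marker at the beginning of every state block.
    Assumes word is a well-formed encoding 1 0^{i_0} (1 x 1 0^{i_j})*."""
    out = ["1#"]          # leading delimiter, then the first state block
    i = 1
    while i < len(word):
        if word[i] == "0":
            out.append("0")
            i += 1
        else:
            # a factor "1 x 1" ends here, so the next state block starts
            out.append(word[i:i + 3] + "#")
            i += 3
    return "".join(out)


def move_markers(word: str) -> str:
    """M_1: move each marker over one 0 to its right, or drop it."""
    out, i = [], 0
    while i < len(word):
        if word[i] == "#":
            if i + 1 < len(word) and word[i + 1] == "0":
                out.append("0#")
                i += 2
            else:
                i += 1    # right neighbor is not a 0: drop the marker
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def reject_or_strip(word: str):
    """R: reject (None) if a marker still has a 0 to its right,
    otherwise return the word without markers."""
    return None if "#0" in word else word.replace("#", "")


def keep_short_blocks(word: str, k: int):
    """Apply R M_1^k S_1 to one encoding."""
    marked = insert_markers(word)
    for _ in range(k):
        marked = move_markers(marked)
    return reject_or_strip(marked)
```

For the encoding `1101001110` (state blocks of lengths 0, 2, 1), the call with `k = 2` returns the word unchanged, while `k = 1` returns `None` because a marker is still followed by a 0 after one move.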
In the next step, we wish to remove from the language all encodings that contain a transition block $10^\ell 1x10^m$ with $x \in \{0,1\}$, $0 \le \ell, m \le k$, and $(\ell, x, m) \notin E$. To this end, we have the transductions $S_2$ and $M_2$, which behave analogously to $S_1$ and $M_1$ by using $\#_2$ instead of $\#_1$. We assume that $S_1$ and $S_2$ are defined so as to add their marker and leave the other marker in place. We assume further that $M_1$ and $M_2$ move their marker so as to overtake the other marker if necessary. Finally, we have for each $x \in \{0,1\}$ the transduction $R_x$, which rejects every word containing a transition block in which $\#_1$ is on the right end of the left state block, $\#_2$ is on the right end of the right state block, and the input letter is $x$. All other words are accepted by $R_x$ but stripped of all occurrences of markers. Applying $R_xM_2^mS_2M_1^\ell S_1$ clearly yields the set of encodings that do not contain the transition block $10^\ell 1x10^m$. Therefore, we apply this sequence of transductions for each triple $(\ell, x, m)$ with $0 \le \ell, m \le k$, $x \in \{0,1\}$, and $(\ell, x, m) \notin E$. This clearly produces the language $T_A$ and hence $K = PT_A$ is obtained. Since we only used transductions in $\{\Lambda, \Lambda', P, I, S_1, S_2, M_1, M_2, R, R_0, R_1\}$, the lemma is proven.

Lemma 2.
Let $X$ be an alphabet with $|X| \ge 2$. For each finite set $F$ of rational transductions in $X^* \times X^*$, there are rational transductions $R, S, T$ in $X^* \times X^*$ such that every composition of transductions from $F$ can be written in the form $T^nS^mR$ with $m, n \in \mathbb{N}$.
Proof. Let $0, 1 \in X$ be distinct letters and for $x \in \{0,1\}$, let $A_x$ be the transduction that appends $x$ to each input word, hence $A_x = \{(wx, w) \mid w \in X^*\}$. Furthermore, let $F = \{U_0, \ldots, U_{k-1}\}$, $b = k + 1$, and let $U_i$ be the rational transduction We shall prove that $R = A_1$, $S = A_0$, and $T = \bigcup_{0 \le i \le k} U_i$ have the desired property. Let $U_{i_n} \cdots U_{i_0}$ be a composition of elements of $F$ and let $i_{n+1} = k$. We claim that Applying $S^mR$ appends $10^m$ to each input word. Then, each application of $T$ to a word $w10^\ell$ chooses some $U_j$, but this choice will only lead to a valid computation of the transducer if $\ell$ is congruent to $j$ modulo $b$. Hence, applying $T^{n+1}$ to $w10^m$ has the same effect as applying $U_{i_n} \cdots U_{i_0}$. Since the most significant digit in the $b$-ary representation of $m$ is $i_{n+1} = k$, applying $T$ once more means applying $U_k$ and hence removing the $10^k$ suffix of the input word. In the end, we applied $U_{i_n} \cdots U_{i_0}$.
Lemmas 1 and 2 together immediately imply the following byproduct, which might be of independent interest.

Corollary 3.
Let $X = \{0,1\}$. There are rational transductions $R, S, T$ over $X^*$ such that every regular language $K \subseteq X^*$ can be written as $T^nS^mRX^*$ for some $m, n \in \mathbb{N}$.
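The bookkeeping in the proof of Lemma 2 can be illustrated numerically. The sketch below assumes, since the defining equations are not reproduced above, that $m$ is the number with $b$-ary digits $i_0, \ldots, i_n, i_{n+1} = k$ (least significant first), and that each application of $T$ consumes the lowest digit. The function names are hypothetical; the code only tracks the counter $\ell$, not the words being transformed.

```python
def encode_composition(indices, k):
    """Encode i_0, ..., i_n (each in range(k)) as m = sum_j i_j * b^j,
    with b = k + 1 and an extra most significant digit i_{n+1} = k."""
    b = k + 1
    if any(not 0 <= i < k for i in indices):
        raise ValueError("indices must lie in range(k)")
    digits = list(indices) + [k]
    return sum(d * b ** j for j, d in enumerate(digits))


def decode_composition(m, k):
    """Replay the applications of T on the counter: in each round the
    only admissible choice is U_j with j = l mod b, after which the
    counter becomes l // b. Reaching l == k is the final U_k step."""
    b = k + 1
    order = []
    while m != k:
        digit = m % b
        if m == 0 or digit == k:
            raise ValueError("not a valid code")
        order.append(digit)   # U_digit is applied in this round
        m //= b
    return order              # i_0 first, i.e. the innermost factor first


m = encode_composition([2, 0, 1], k=3)
print(m, decode_composition(m, k=3))
```

With $k = 3$ and indices $2, 0, 1$, the code is $m = 2 + 0 \cdot 4 + 1 \cdot 16 + 3 \cdot 64 = 210$, and decoding recovers the indices in the order in which the corresponding transductions are applied.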
We define the alphabet ∆ = {+, −, z}, whose elements will represent the operations increment, decrement, and zero test, respectively.

Definition 4.
Let $C \subseteq \Delta^*$ be the set of words $\delta_1 \cdots \delta_m$, $\delta_1, \ldots, \delta_m \in \Delta$, for which there are numbers $x_0, \ldots, x_m \in \mathbb{N}$ such that for $1 \le i \le m$:
We shall prove that from $L$ we can construct the following language $\hat{C}_L$ using a fixed finite set of rational transductions and Boolean operations.
For the inclusion "$\subseteq$", let $\delta_1 \cdots \delta_m \in \pi_\Delta(\hat{C}_L)$. Then there are words $v_0, \ldots, v_m \in X^*$, $u_0, \ldots, u_n \in X^*$ with $v_0\delta_1v_1 \cdots \delta_mv_m\#u_0\# \cdots u_n\# \in \hat{C}_L$. By the definition of $\hat{C}_L$, this means for each $1 \le i \le m$, there is a $1 \le j \le n$ such that conditions 1-3 of Definition 5 hold. Hence, we can pick for each $1 \le i \le m$ an $x_i \in \{1, \ldots, n\}$ such that conditions 1-3 of Definition 5 hold with $j = x_i$. Note that since this implies $v_{i-1} \equiv_L u_{j-1}$ for $\delta_i \in \{+, z\}$ and $v_{i-1} \equiv_L u_j$ for $\delta_i = -$ and the $u_k$ are pairwise incongruent w.r.t. $\equiv_L$, this choice of $x_i$ is unique. It can now be verified by induction on $i$ that the conditions 1-3 of Definition 4 are satisfied.
The following lemma is the central ingredient in our proof. The idea is to construct $\hat{C}_L$, which by Lemma 6 allows us to obtain $C$.
Lemma 7.
There is a finite set $F$ of rational transductions such that for any non-regular $L \subseteq X^*$, the language $C$ can be obtained from $L$ using transductions in $F$ and Boolean operations.
Proof. We will use the alphabet $Y = X \cup \{\#\} \cup \Delta$. We prove the lemma by constructing $C$ from $L$ using a sequence of Boolean operations and transductions $T_1, \ldots, T_{19}$ over $Y^*$ for which it will be clear that they do not depend on $L$.
There are clearly rational transductions $T_1$ and $T_2$ with which means we can construct $W_1$ and $W_2$. Hence, can also be constructed. We can clearly find a rational transduction $T_3$ with This means $P = \{u\#v \mid u \equiv_L v\} = X^*\#X^* \setminus W = T_4\overline{W}$, for some $T_4$, can be constructed. With suitable rational transductions $T_5, T_6$, we have meaning that $S$ can be constructed as well.
Let $M$ (matching) be the set of all words for suitable rational transductions $T_7, \ldots, T_{12}$, we can also construct $M$. Let $E$ (error) be the set of words $v_1\delta v_2\#u_0\# \cdots u_n\#$ such that for every $1 \le j \le n$, we have $v_1\delta v_2\#u_{j-1}\#u_j \notin M$ or we have $\delta = z$ and $v_1 \not\equiv_L u_0$. Since for some rational transduction $T_{13}$, we can construct $E$. Furthermore, since for some rational transductions $T_{14}, T_{15}$, we can construct $E$.
Let $N$ (no error) be the set of words $v_0\delta_1v_1 \cdots \delta_mv_m\#u_0\# \cdots u_n\#$ such that for every for some rational transductions $T_{16}, T_{17}$, we can construct $N$. Now we have $\hat{C}_L = N \cap (X^*\Delta)^*X^*\#S = N \cap T_{18}S$ for some rational transduction $T_{18}$, meaning we can construct $\hat{C}_L$. By Lemma 6, we have $C = T_{19}\hat{C}_L$ for some rational transduction $T_{19}$. This proves our claim and hence the lemma.
Lemma 8.
There is a finite set $F$ of rational transductions in $X^* \times X^*$ such that for any non-regular $L \subseteq X^*$, each $K \in \mathrm{RE}$, $K \subseteq X^*$, can be obtained from $L$ using transductions in $F$ and Boolean operations.
Proof. Let $F'$ contain the set of rational transductions provided by Lemma 1 and the one provided by Lemma 7. We will use the alphabet $Y = X \cup \Delta \cup \{\#\}$ and a standard encoding $g : Y^* \to X^*$.
Suppose $K \subseteq X^*$ is recursively enumerable and let $A = (Q, X, E, q_0, Q_f)$ be a 2-counter machine, $E \subseteq Q \times X^* \times \Delta \times \Delta \times Q$, accepting $K$ and with $Q = \{0, \ldots, k\}$ and $Q_f = \{k\}$. Here, we assume that the machine operates on both counters in each step. Let $R$ be the regular language of all words $0^{m_0} \prod_{i=1}^{n} \#w_i\#\delta_i$ , $m_i) \in E$ for every $1 \le i \le n$, $m_0 = 0$, and $m_n = k$. We can obtain $g(R)$ from $L$ using only transductions in $F'$. Thus, we can obtain $R = g^{-1}(g(R))$. Clearly, there are rational transductions $T_1$ and $T_2$ such that meaning that we can also obtain $U$. Finally, applying to $U$ the transduction $T_3$ that outputs all occurrences of letters from $X$ after odd occurrences of $\#$ up to the next occurrence of $\#$ clearly yields $K$. If we let $F$ consist of $F'$ and $g^{-1}, T_1, T_2, T_3$, the lemma is proven.
Theorem 9.
There are rational transductions $R, S, T$ over $X^*$ such that for any non-regular $L \subseteq X^*$, each $K \in \mathrm{AH}(L)$, $K \subseteq X^*$, can be obtained from $L$ using $R, S, T$ and Boolean operations.
Proof. We shall prove that there is a finite set $F$ of rational transductions in $X^* \times X^*$ such that for any $K \subseteq X^*$, we can obtain each $M \in \mathrm{RE}(K)$, $M \subseteq X^*$, from $K$ and $L$ using transductions in $F$ and Boolean operations. This clearly implies that we can obtain all of $\Sigma_1(L) = \mathrm{RE}(L)$ from $L$ and hence, by induction on $i$, all of $\Sigma_i(L)$ from $L$. According to Lemma 2, we can then find transductions $R, S, T$ that have the desired property.
Let $F$ be the set of transductions provided by Lemma 8 and let $K \subseteq X^*$ be arbitrary and $M \in \mathrm{RE}(K)$, $M \subseteq X^*$. This means there is an oracle Turing machine $A$ such that $M$ is accepted by $A^K$. We will use the extended alphabet $Y = \{0, 1, \#_1, \#_2\}$ and a standard encoding $g : Y^* \to \{0,1\}^*$. Let $M' \subseteq Y^*$ be the set of words such that there is an accepting computation in $A$ with input $w$ and in which oracle queries about $u_1, \ldots, u_n$ are made with a positive result and oracle queries about $v_1, \ldots, v_m$ are made with a negative result. Note that this does not mean that $u_i \in K$ or $v_i \notin K$; we collect all computations that $A$ could make and what inputs would be accepted provided that an oracle answered as specified. Then $M'$ is clearly recursively enumerable. Therefore, $g(M')$ can be obtained from $L$ by transductions in $F$ and Boolean operations.
Hence, we can obtain $M' = g^{-1}(g(M'))$ from $L$. Furthermore, since for some rational transductions $T_1, T_2$, we can construct $(K\#_1)^*$ and $(K\#_2)^*$ from $K$. Moreover, since for suitable rational transductions $T_3, T_4$, we can construct $M''$ from $K$ and $L$. If we now apply a transduction $T_5$ that for an input from $Y^*$ outputs the longest suffix in $X^*$, we obtain $M$ from $K$ and $L$. Since, apart from the transductions in $F$, we only used $g^{-1}$ and $T_1, \ldots, T_5$, the claim follows.

Corollary 10.
Let $L \subseteq X^*$ be a non-regular language. Then $\mathrm{AH}(L)$ is the smallest Boolean closed full trio containing $L$.
Proof. Let $T$ be the smallest Boolean closed full trio containing $L$. If $|X| \le 2$, Theorem 9 implies that $T$ includes $\mathrm{AH}(L)$. If $|X| > 2$, let $g : X^* \to \{0,1\}^*$ be a standard encoding. Then $g(L)$ is non-regular as well and we have $\mathrm{AH}(L) = \mathrm{AH}(g(L))$. Hence, according to Theorem 9, $T$ includes $\mathrm{AH}(L) = \mathrm{AH}(g(L))$. The fact that $\mathrm{AH}(L)$ is a Boolean closed full trio concludes the proof.
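The standard encoding $g(a_i) = 10^i$ used in this proof is injective and easy to invert, which is what allows passing between $L$ and $g(L)$. The following Python sketch illustrates this for single-character letters; the function names and the choice of example alphabet are assumptions made only for illustration.

```python
import re


def make_standard_encoding(alphabet):
    """Map the i-th letter a_i (i = 1, 2, ...) to the codeword 1 0^i."""
    return {a: "1" + "0" * i for i, a in enumerate(alphabet, start=1)}


def encode(word, codes):
    """Apply the homomorphism g letter by letter."""
    return "".join(codes[a] for a in word)


def decode(bits, codes):
    """Invert g on its image; raise ValueError outside the image."""
    inverse = {code: letter for letter, code in codes.items()}
    blocks = re.findall(r"10+", bits)   # each codeword is a 1 followed by 0s
    if "".join(blocks) != bits or any(b not in inverse for b in blocks):
        raise ValueError("word is not in the image of the encoding")
    return "".join(inverse[b] for b in blocks)


codes = make_standard_encoding("ab#")
print(encode("ab#a", codes))
```

Here `a`, `b`, and `#` are mapped to `10`, `100`, and `1000`, so `ab#a` becomes `10100100010`, and `decode` recovers the original word.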
The following corollary applies to a wide range of language classes. A full semi-AFL is a union closed full trio. Although the authors are not aware of any particular full semi-AFL for which it is not known whether complementation closure is available, the following fact is interesting because of its generality.
Corollary 11. Other than the regular languages, no full semi-AFL C ⊆ RE is closed under complementation.
Proof. Suppose C were a complementation closed full semi-AFL that contains a non-regular language. According to Theorem 9, it would already include AH and thus not be included in RE.
Note that the following corollary is not a special case of Corollary 11 as it is not restricted to language classes below RE.

Corollary 12. A principal full trio is closed under complementation if and only if it coincides with the regular languages.
Proof. Let T be a principal full trio generated by the language L. If L is regular, T coincides with the regular languages and is therefore closed under complementation.
Suppose $L$ is not regular. $T$ consists of all languages of the form $RL$, where $R$ is a rational transduction. Hence, $T$ is contained in $\mathrm{RE}(L)$ and closed under union. If $T$ were closed under complementation, it would be closed under all Boolean operations and thus, by Theorem 9, contain $\mathrm{AH}(L)$. Since $\mathrm{RE}(L) \subsetneq \mathrm{AH}(L)$, this is a contradiction.
Proof. Let $L$ be the identity language corresponding to some finite generating set of $M$. Since $\mathrm{VA}(M)$ is the principal full trio generated by $L$, Corollary 12 yields the equivalence between 1 and 2. The equivalence between 2 and 3 has been shown in [14] (and independently in [16]).

Rational Kripke frames
Theorem 9 can also be restated in terms of multimodal logic. A Kripke structure (or edge- and node-labeled graph) is a tuple $\mathcal{K} = (V, (E_a)_{a \in A}, (U_p)_{p \in P})$, where $V$ is a set of nodes (also called worlds), $A$ and $P$ are finite sets of actions and propositions, respectively, for every $a \in A$, $E_a \subseteq V \times V$, and for every $p \in P$, $U_p \subseteq V$.
The tuple $\mathcal{F} = (V, (E_a)_{a \in A})$ is then also called a Kripke frame. We say that $\mathcal{K}$ (and $\mathcal{F}$) is word-based if $V = X^*$ for some finite alphabet $X$. Formulas of multimodal logic are defined by the following grammar, where $p \in P$ and $a \in A$: The semantics $[[\varphi]]_{\mathcal{K}} \subseteq V$ of formulas $\varphi$ in $\mathcal{K}$ is defined inductively as follows:
A word-based Kripke frame $\mathcal{F} = (X^*, (E_a)_{a \in A})$ is called rational if every $E_a$ is a rational transduction. Rational Kripke frames with a single relation are also known as rational graphs and have been studied intensively [5,12,13]. A word-based Kripke structure $\mathcal{K} = (X^*, (E_a)_{a \in A}, (U_p)_{p \in P})$ is called rational if every relation $E_a$ is a rational transduction and every $U_p$ is a regular language. The closure properties of regular languages imply that for every rational Kripke structure $\mathcal{K}$ and every multimodal formula $\varphi$, the set $[[\varphi]]_{\mathcal{K}}$ is a regular language that can be effectively constructed from $\varphi$ and (automata describing the structure) $\mathcal{K}$.
Using this fact, Bekker and Goranko [2] proved that the model-checking problem for rational Kripke structures and multimodal logic is decidable. This problem has as input a rational Kripke structure $\mathcal{K}$ (given by a tuple of automata and transducers), a word $w \in X^*$ (where $X^*$ is the node set of $\mathcal{K}$), and a multimodal formula $\varphi$, and it is asked whether $w \in [[\varphi]]_{\mathcal{K}}$ holds. In contrast, there exist rational graphs (even acyclic ones) with an undecidable first-order theory [5,15], but every rational tree has a decidable first-order theory [5]. Rational Kripke structures and frames were also considered in the context of querying graph databases [1].
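The way formulas are evaluated by Boolean operations and preimages under the relations $E_a$ can be sketched on a finite Kripke structure. The grammar and semantics are not reproduced above, so the sketch below assumes the usual ones: propositions, negation, conjunction, disjunction, and the modalities diamond (some $a$-successor satisfies the formula) and box (all $a$-successors do). The tuple representation of formulas and the function name are hypothetical, and the finite set of worlds stands in for the infinite set $X^*$ that the paper handles via automata.

```python
def evaluate(formula, worlds, edges, valuation):
    """Return the set of worlds satisfying `formula`.

    formula:   nested tuples such as ("dia", "a", ("not", ("prop", "p")))
    edges:     dict mapping each action to a set of (source, target) pairs
    valuation: dict mapping each proposition to a set of worlds
    """
    op = formula[0]
    if op == "prop":
        return set(valuation[formula[1]])
    if op == "not":
        return set(worlds) - evaluate(formula[1], worlds, edges, valuation)
    if op in ("and", "or"):
        left = evaluate(formula[1], worlds, edges, valuation)
        right = evaluate(formula[2], worlds, edges, valuation)
        return left & right if op == "and" else left | right
    if op == "dia":
        # preimage of the satisfying set under the relation E_a
        target = evaluate(formula[2], worlds, edges, valuation)
        return {v for (v, u) in edges[formula[1]] if u in target}
    if op == "box":
        # box is the dual of diamond: no successor violates the formula
        bad = set(worlds) - evaluate(formula[2], worlds, edges, valuation)
        return set(worlds) - {v for (v, u) in edges[formula[1]] if u in bad}
    raise ValueError(f"unknown operator: {op!r}")


worlds = {"", "0", "00"}
edges = {"a": {("", "0"), ("0", "00")}}
valuation = {"p": {"00"}}
print(evaluate(("dia", "a", ("prop", "p")), worlds, edges, valuation))
```

In this sketch the diamond case is exactly the operation $L \mapsto E_aL$ from the preliminaries, which is why the definable sets stay inside the Boolean closed full trio generated by the languages $U_p$.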
Our reformulation of Theorem 9 in terms of multimodal logic is the following.

Theorem 14.
There are rational transductions $E_r, E_s, E_t$ in $X^*$ such that the rational Kripke frame $\mathcal{F} = (X^*, E_r, E_s, E_t)$ has the following property: For every non-regular language $U_p \subseteq X^*$ and every language $K \in \mathrm{AH}(U_p)$, $K \subseteq X^*$, there exists a multimodal formula $\varphi$ such that $K = [[\varphi]]_{\mathcal{K}}$, where $\mathcal{K} = (X^*, E_r, E_s, E_t, U_p)$.
Proof. Take the rational transductions $R, S, T$ provided by Theorem 9. Let $U_p \subseteq X^*$ be a non-regular language and take the Kripke structure $\mathcal{K} = (X^*, E_r, E_s, E_t, U_p)$, where $E_r = R$, $E_s = S$, and $E_t = T$. By induction, we can construct for every language $K$ obtainable from
The question arises whether an analogous statement holds when we allow choosing an arbitrary non-rational transduction instead of an arbitrary non-regular language. In other words: Are there rational transductions $R_1, \ldots, R_n$ and regular languages $L_1, \ldots, L_m$ over an alphabet $X$ such that for any non-rational transduction $T$, the Kripke structure $(X^*, R_1, \ldots, R_n, T, L_1, \ldots, L_m)$ allows defining every arithmetical language in multimodal logic? The answer is no, since there are non-rational transductions $T$ that preserve regularity, i.e., for which $TL$ is regular whenever $L$ is regular. Take, for example, the transduction $T = \{(w, ww) \mid w \in X^*\}$. It is clearly not rational, since $T^{-1}X^* = \{ww \mid w \in X^*\}$ is not regular. However, it is not hard to see that $TL$ is effectively regular for regular languages $L$ [17]. In particular, for every choice of $R_1, \ldots, R_n$ and $L_1, \ldots, L_m$ as above, every language definable in $(X^*, R_1, \ldots, R_n, T, L_1, \ldots, L_m)$ is regular and effectively constructible, implying that the model-checking problem is decidable.

Open problems
An interesting open problem is whether in Theorem 9 one can replace the rational transductions by suitable synchronized rational relations. A relation $R \subseteq X^* \times X^*$ is synchronized rational if the set of all convolutions $u \otimes v$ with $(u, v) \in R$ is a rational language. The convolution of two words $u = a_1a_2 \cdots a_n$ and $v = b_1b_2 \cdots b_m$ is the word $(a_1, b_1)(a_2, b_2) \cdots (a_k, b_k)$ where $k = \max\{n, m\}$, $a_i = \#$ for $i > n$, and $b_i = \#$ for $i > m$. Here, $\#$ is a fresh symbol not appearing in any pair from $R$. In other words, $R$ can be recognized by an automaton on two tapes where both heads move synchronously. Synchronized rational relations underlie the definition of automatic structures [9]. Note that the rational transductions used in the proof of Theorem 9 are not synchronized rational. Another open question is whether the number of rational transductions in Theorem 9 can be reduced to 1 or 2.
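The convolution defined above is straightforward to compute, and reading it letter by letter is what "both heads move synchronously" amounts to. The Python sketch below builds $u \otimes v$ and then checks one example relation, "$u$ is a prefix of $v$", with a two-state scan over the convolution. The function names and the choice of example relation are assumptions made for illustration; the padding symbol is assumed not to occur in the input words.

```python
from itertools import zip_longest


def convolution(u, v, pad="#"):
    """Return u (x) v as a list of letter pairs, padding the shorter word."""
    return list(zip_longest(u, v, fillvalue=pad))


def accepts_prefix(pairs, pad="#"):
    """Two-state scan over a convolution deciding whether u is a prefix of v.
    State 1: still comparing letters. State 2: u has ended."""
    u_ended = False
    for a, b in pairs:
        if a == pad:
            u_ended = True
        elif u_ended or a != b:
            # letters differ, v ended first, or u resumes after padding
            return False
    return True


print(convolution("ab", "a"))
print(accepts_prefix(convolution("ab", "abba")))
```

Because the scan needs only a bounded amount of memory, the prefix relation is synchronized rational, in contrast to the transductions used in the proof of Theorem 9.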