Minimal DFAs Witnessing Language Inequivalence
Abstract
We study small witnesses for the inequivalence of two regular languages. A natural witness is a distinguishing word, e.g. a word in exactly one of the two languages. We propose using more succinct witnesses in the form of witnessing DFAs. A witnessing DFA recognizes a subset of one of the languages and contains at least one distinguishing word. In this way the DFA expresses behaviour contained in the first language but not the second. We show witnessing DFAs can be used to present more concise witnesses for the inequivalence of two regular languages. We show that the decision problem for the existence of a witnessing DFA of certain size is NP-complete in general, and in P in the special case of unary DFAs. Besides these computational aspects, we study structural properties of witnessing DFAs. Not all languages can be a minimal witness. It turns out that minimal witnesses are exactly the languages that are not decomposable in the union of languages with smaller state-complexity, the so-called prime languages as studied earlier by Kupferman and Mosheiff.
Keywords and phrases:
Deterministic Finite Automata, Language Inequivalence, DFA decomposition, Prime languages2012 ACM Subject Classification:
Theory of computation Regular languagesEditors:
Stefano Guerrini and Barbara KönigSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
We are motivated by computing witnesses that explain difference in behaviour of two Deterministic Finite Automata (DFAs). In the formal verification of complex systems it can be key to compute witnesses and explanations for certain properties. Rather than a computer telling two DFAs are inequivalent, we would like to compute a witness that explains why the DFAs differ. We focus on the smallest witnesses since simple explanations are usually the best explanations.
A straightforward approach to explain the inequivalence of two DFAs would be to provide a distinguishing word, i.e. a word that is accepted by one of the automata but not the other. The method of finding minimal length distinguishing words is well understood and decidable in polynomial time [15]. An efficient implementation is given in [19] that has the same runtime complexity as the best known algorithm for DFA minimization known as Hopcroft’s minimization [8].
A natural question is if more powerful properties could result in smaller witnesses explaining the inequivalence of languages. Given two DFAs and , any property of the language recognized by that is not satisfied by the language of can be used to witness their inequivalence. We propose to use inclusion of a third language as property. In this way a DFA contains the property if the language of is included in the language recognized by . In this work we study DFAs that witness inequivalence of two regular languages. In particular, a minimal witnessing DFA can be much more succinct than a distinguishing word.
The state complexity, also called index, of a language is the number of states of the minimal DFA recognizing . This descriptive complexity measure is well studied in literature, and forms a natural notion of size of a regular language. In this way properties expressed by DFAs are smaller witnesses that provide shorter and more intuitive explanation.
For example, consider the DFAs and shown in Figure 1. The shortest distinguishing word for these DFAs is . Indeed, we confirm but . A different explanation for the inequivalence of and could be: every odd length sequence of ’s is accepted. This property is satisfied by and is not true for .
We call a DFA a witness for two DFAs if the language recognized is a subset of exactly one of the two DFAs. It is then said to distinguish the two languages. In the example from Figure 1, we see that the minimal witnessing DFA contains only two states, i.e. the DFA such that . An automaton recognizing only the minimal distinguishing word would contain at least eight states. Therefore, a witnessing DFA can be much more succinct than a distinguishing word.
In this paper we study witnessing DFAs. We show that there are families of DFAs for which there are witnessing DFAs of constant size, where the length of the minimal distinguishing word grows linearly in the size of the DFAs. Additionally, we prove that computing minimal witnessing DFAs is NP-complete. In the case of unary DFAs this problem is computable in polynomial time.
This last result is due to a connection of witnessing DFAs with so-called prime languages [12]. Prime languages are the languages for which there is no decomposition in languages of smaller DFAs. We show that a DFA is a minimal witness for inequivalence of regular languages if and only if it is prime.
The reduction from CNF-SAT that proves the NP-completeness of deciding witnessing DFAs encodes satisfying assignment in structure of concatenation instead of composition used in other hardness proofs [11]. To show that this construction is more widely useable we show how modify the construction to obtain a simpler proof of the coNP-hardness of deciding the minimal pumping constant.
Structure
This work is structured as follows. First in Section 2 we cover the notation we use and introduce the main concept of witnessing DFAs. In Section 3 we show that computing minimal witnesses is NP-complete. After that, in Section 4, we demonstrate that the proof of the previous section can be more widely used in DFA decision problems. In Section 5 we discuss the connection between minimal witnessing DFAs, language compositionality, and so-called prime languages. In Section 6 we relate our results to the problem of separating words [4]. Finally, in Section 7 we introduce two polynomial time algorithms. The first algorithm computes a witnessing DFA that is not necessarily minimal. The second algorithm computes a minimal DFA that witnesses the non-inclusion of two unary languages. We end with conclusions and related work.
2 Notation & Preliminaries
For two natural numbers we write as the closed interval from to . Given a finite alphabet , a finite sequence of elements of is called a word. We write for the empty word and define as the set of all words over of length , and for all words over . We write for all non-empty words over . Given words , we write and for word concatenation. Additionally, given a number and a word we write for the concatenation of times the word .
Definition 1.
A Deterministic Finite Automaton (DFA) is a five-tuple consisting of:
-
a finite set of states,
-
a finite set of symbols called the alphabet,
-
the transition function,
-
the initial state, and
-
the set of final states.
The transition function extends naturally to a transition function for words . This is done inductively as follows:
The language recognized by a DFA is denoted by , and consists of all words such that . A language is called regular iff there is a DFA such that . We write for the DFA , which recognizes the complement of .
The Myhill-Nerode theorem is a useful tool to establish the number of states necessary to recognize a language. It is based on the equivalence relation between words that have the exact same accepting extensions.
Definition 2.
Let be words and a language, then if and only if for all it holds that .
Theorem 3 (Myhill-Nerode [16, 17]).
Let be a language, then is regular if and only if the relation has a finite number of equivalence classes.
A more specific corollary of the theorem relates the number of equivalence classes of to the smallest number of states a DFA needs in order to recognize . We refer to this number as the index of a language and write it as . For a DFA we write for the index of the language accepted by .
Corollary 4.
Let be a regular language over an alphabet , then the smallest DFA that recognizes has states where is the number of equivalence classes of the relation .
Now we introduce the notion of witnessing DFA for the inequivalence of languages.
Definition 5.
Given DFAs such that , a DFA is called to witness the language inequivalence iff:
A minimal witness for the inequivalence of and is a witness that is minimal (in size) amongst all witnesses. Of course, for DFAs and , such that the DFA itself is already a witness DFA. We are interested in small witnesses for language inequivalence, since in general small DFAs express the most intuive properties.
The notion of minimal witnessing DFAs naturally leads to the following decision problem.
- -DFA-DIST:
-
Let and be DFAs such that , and a number. Decide if there is a DFA with at most states such that:
3 Minimal witnesses are NP-complete
First, we observe that DFA witnesses for language inequivalence can be arbitrarily smaller than distinguishing words. We show this by generalizing the construction from the example in the introduction (Figure 1).
Example 6.
For a number , let , be DFAs defined such that:
The minimal automata recognizing (resp. have (resp. ) states. The smallest word accepted by and not by has length . In this case the DFA accepting all odd length sequences of also acts as a witnessing DFA. Hence, a minimal witnessing DFA is , i.e. the DFA such that . The DFA contains only states, and thus the minimal witnessing DFA is asymptotically smaller than the minimal witnessing word.
The shape of the example automata and are given in Figure 2.
Next, we show the NP-completeness of computing minimal witnessing DFAs for language inequivalence. We prove -DFA-DIST is NP-hard by a reduction from CNF-SAT.
Theorem 7.
Deciding -DFA-DIST is NP-complete.
Before we introduce the reduction we define some notation in which we encode truth values of propositions. In the reduction we represent truth assignments as words over the Boolean alphabet . Given a set of propositional variables , a truth assignment is represented by the word , where for every . The set defines all words that represent truth assignments.
Now we are ready to introduce our reduction from CNF-SAT in order to prove Theorem 7. Let be a CNF formula over the propositional variables , we define two regular languages over the alphabet . The first language is the finite set of at most concatenated truth assignments separated by a , i.e.
The second language is a superset of . In addition to all the words of , the language contains words of more than concatenated truth assignments. Specifically, includes all words consisting of or more concatenated truth assignments of which the first consecutively satisfy the clauses . More precisely,
We sketch the shape of the languages with an example using an unsatisfiable formula.
Example 8.
Consider the CNF-formula over the propositional letters . The language is the language containing the words:
-
for , and
-
for .
The language contains:
-
if , and
-
for every .
Since is the only truth-assignment for that satisfies and only satisfies . The only words that witness the inequivalence are of the shape , for .
The languages and are regular, and hence there are automata that recognize these languages. In particular, there are automata recognizing these languages that are polynomial in size. One way of observing this fact is by inspecting the number of Myhill–Nerode equivalence classes of and .
Lemma 9.
Given a CNF formula , the languages and are recognizable by an automaton that is polynomial in the size of .
The next lemma proves the key fact of our reduction. Given a CNF formula that is satisfiable on the propositional letters , the language formed by a satisfying truth assignment represented as the word iterated arbitrary often is a small distinguishing automaton for the languages and . Conversely, a distinguishing automaton smaller than a certain size necessarily implies the existence of a satisfying truth assignment for .
Lemma 10.
Let be a CNF formula over propositional letters . Then is satisfiable if and only if there is a DFA with at most states such that and .
Proof.
We prove both directions of the implication separately.
- ()
-
Assume is satisfiable, then there is a satisfying truth assignment that is mapped to the word . We define the language , and show that witnesses this implication.
First we show that . Assume , if then by definition and hence also . If , since is a satisfying assignment, it holds for any that , and thus also . By covering both cases this means .
Next, we observe that , and thus . Hence, since any DFA that recognizes is a distinguishing automaton.
The minimal DFA such that contains one loop with states containing all positions of the word and a sink state to reject all other words. Thus, if is satisfiable we can construct with states that distinguishes and , which was to be showed.
- ()
-
We assume is a DFA with at most states such that for the language accepted it holds that and . We show that this means is satisfiable.
Since and there is a word accepted by . By definition is of shape where and and for every the word represents a satisfying truth assignment for the clause . Next we show that represents a satisfying truth assignment for by counting the number of equivalence classes of for the prefixes of , together with the postfix that witnesses an accepting postfix for .
We define the set as the set containing all prefixes of , i.e.
If and then , since there is a such that and and . This means there are distinct classes of . Lastly, since for any we can also conclude that for all .
Since we assumed that has at most states, by Corollary 4 there are at most equivalence classes of . Since trivially , by the pigeonhole principle there is a prefix such that at .
It can not be the case that for some , since
By eliminating all alternatives we conclude . Using this equivalence and since we derive that . In particular, this means that . By definition of this means that the truth assignment satisfies all clauses and hence it is a satisfying assignment for . This witnesses that is a satisfiable formula.
This lemma allows us to prove Theorem 7.
Proof of Theorem 7.
Membership of NP follows naturally. For two DFAs and we can, in polynomial time, check if . This can be done by computing the emptiness of . Moreover, either or itself necessarily already is a distinguishing automaton, so the minimal distinguishing DFA is definitely polynomial in size.
NP-hardness is a direct consequence of Lemma 10 and of the fact that , so the language of any distinguishing automaton is a subset of and not vice-versa.
4 Minimal Pumping Length
The pumping lemma is mostly famous for educational purposes in automata theory. The lemma asserts that in a regular language a word from a certain length can be pumped. A word in a language can be pumped if a non-empty part of it can be arbitrarily repeated while remaining in the language.
Definition 11 (Pumping lemma).
Let be a regular language. There is a such that for all words if then there is a decomposition such that , and for all : .
Recently, the computational complexity of deciding the minimal pumping length was studied in [6]. The decision problem associated with the pumping lemma is to decide if the lemma holds for a certain pumping constant .
- -PUMPING:
-
Let be a DFA, and a number. Decide if the pumping lemma holds with constant .
In [6] it was shown that computing the minimal pumping constant of the lemma is coNP-hard.
Theorem 12 ([6, Cor. 15]).
-PUMPING is coNP-complete.
The proof of the coNP-hardness of -pumping goes by a reduction from Hamiltonian cycle. A slightly modified language based on from Section 3 gives an alternative proof of this fact. The reduction is directly from the tautology of DNF formulas, which is the natural coNP-complete problem.
Let the language be a language similar to the one
previously mentioned, but for a propositional formula in Disjunctive Normal
Form (DNF). Here, a formula is said to be in disjunctive normal formal if
where each is a conjunction of literals.
The language contains all words , where there is an such that the truth
assignment satisfies ,
Lemma 13.
Given a DNF formula , the language is -PUMPING if and only if is a tautology.
Proof.
- ()
-
Let be a word representing a truth assignment. Since is -PUMPING, the pumping lemma holds for the word . Let be the composition as mentioned in the pumping lemma. Since is non-empty, and by the structure of it has to hold that for some . Observe that any such is of the shape , where . From this we can conclude that and thus, by definition of , that represents a satisfying assignment for . In particular, since was arbitrarily chosen, any assignment is a valid truth assignment and thus is a tautology.
- ()
-
Assume is a tautology, and such that . By definition the word can be represented as for some and truth assignments . We distinguish on whether .
-
Assume , then since is a tautology, there is a clause for which is a satisfying truth assignment. We verify that for the words , and , and that for all .
-
In the second case . By construction of the language for some , and hence there is a word that represents a truth assignment. Since is a tautology, there is a clause such that the word represents a satisfying assignment for . We show that , , verifies the pumping lemma. Indeed, in the word satisfies and thus . For any , the word , and since also .
This shows that in both cases, the language is -PUMPING.
-
5 Minimal witnesses are primes
In this section we study the structure of minimal witnesses and show that minimal witnesses correspond to the prime languages [12]. Prime languages are those that cannot be decomposed into a union of languages with smaller state complexity. In other words, a language can be -decomposed if it is the union of languages of index . This hierarchy on regular languages was first studied in [12].
Definition 14 (-composable).
A regular language is -composable if and only if there are DFAs , such that for all , and
A DFA is called prime iff it is minimal and not -composable for any .
Formally in [12], instead of union decomposition, the focus is intersection decomposition. For our purpose it makes more sense to look at union decomposition. All results for intersection compositionality can be trivially dualized for union composition, this is already observed in [12]. Hence, this difference is only a matter of presentation. If a DFA is union composite, e.g. for some DFAs , then also .
We show how the concept of prime languages is related to minimal witnesses. We start by the observation that not all DFAs are a minimal witness. This is illustrated by the following example.
Example 15.
Consider the language over the alphabet . The minimal automaton recognizing is shown in Figure 3. Observe that . Given a language such that , then necessarily there is a word such that the word . Let the language be defined as . Then is a subset of and witnesses the non-inclusion . Since , we see that for the non inclusion of in any language there is a witness of index smaller than the index of .
In general for each define the languages . The language always has a witness for non-inclusion with index exponentially smaller than the index of .
A natural question is when a DFA is a minimal witness for the non-inclusion of some languages . In the example above, the language is not prime. In fact, is -decomposable in the automata recognizing the languages for each . Now we see the language . Next, we prove that prime languages characterize exactly the languages that are minimal witnesses.
Theorem 16.
Given a DFA , is a minimal witness for the non-inclusion of some DFAs in , i.e. if and only if is prime.
Proof.
- ()
-
Given a prime DFA , we will show that there are DFAs and such that is a minimal witness for their non-inclusion. First we define the floor of , by
Now from [12, Theorem 2.2] it holds that since is prime, there is a primality witness such that . Let . Now, for all DFAs , in other words all DFAs such that and , necessarily . Now it is easy to see there is no smaller DFA than that witnesses , and hence is a minimal witness for the non inclusion .
- ()
-
Assume is a minimal witness for the DFAs and . Since is minimal, it holds for all DFAs that is no witness for the non-inclusion of and . Hence, for all it holds that . Since there is a primality witness such that , hence is prime.
A direct consequence of this theorem is that for any DFA and it holds that if is -composable then any non-inclusion is witnessed by a DFA of index at most . Unfortunately, the converse is not necessarily true. A witnessing DFA does not need to be part of a decomposition. For example, consider the minimal automaton recognizing the language . This DFA contains states and is prime. However, in order to witness that we can use the minimal DFA recognizing the language , which only contains states, but is not part of the decomposition of since is prime.
6 Separating words
A special case of the minimal witness problem is the separating words problem. The problem is to find the smallest DFA that distinguishes two given words, in the sense that it accepts one and rejects the other. Given two words what is the minimal DFA such that . In our context this question is exactly to compute a witness for the inequivalence of and . We define as the minimal number of states in a DFA that distinguishes from – accepting one and rejecting the other.
The study of the separating words problem focuses on the asymptotic behavior of the maximum value of , taken over all pairs of words and of length over an alphabet of size . This is captured by the function :
Computing is NP-complete, which is a corollary from the study of identity checking in arbitrary finite semigroups [1]. Two words (often called terms in the context of algebras) form an identity for a finite semigroup if, for every projection , the equality holds. Here, extends to words by applying the semigroup operation, that is,
The identity checking problem for a semigroup is defined as deciding whether is an identity.
A particular semigroup is the transformation group , which consists of all functions from to itself, with function composition as operation.
Theorem 17 (([1, Thm. 2],[10, Thm. 1])).
The identity checking problem for the transformation group is coNP-complete for .
In [2] it is noticed how this is closely related to the separating words problem.
Lemma 18.
Let be a number. Given an alphabet and words . The words are not an identity in if and only if there is a DFA with at most states such that and .
Proof.
If and are not an identity in , then, by definition, there exists a projection such that . Note that, since , there is an such that . Given this witnessing projection , and number we define an -state DFA , where , , and the transition function is defined as:
This DFA operates by simulating the function of the projection on each input symbol. By construction and since , .
Conversely, assume that for words there is a DFA with states that distinguishes and . We derive a projection from the DFA . Without loss of generality, assume and define the projection for every as . Now we observe that witnesses that and are not an identity in .
Corollary 19.
Given an alphabet , and two words deciding whether there is a DFA with at most states such that and is NP-complete for any .
It is worth noting that the function remains equal regardless of the size of the alphabet when [4].
Lemma 20 ([4, Proposition 2.]).
For all , .
However, for any finite , the existence of a state DFA that distinguishes two binary words can be decided in constant time by trying all projections . When the alphabet is part of the input and , then trying all the projections is not feasible, since there are possible projections.
So, deciding whether is known to be NP-complete, for any fixed and variable alphabet size. The complexity remains unresolved when is part of the input and the alphabet is constant. This is particularly significant because the binary case fully determines the asymptotic behaviour of – making it an interesting question for future research.
Question 21.
Given is computation of possible in polynomial time?
7 Computing Witnesses in polynomial time
In this section we provide polynomial algorithms that compute not necessarily minimal witnesses for DFA inequivalence. First we compute witnessing DFAs greedily combining states until a fixed point is reached. Secondly, for unary DFAs we build a polynomial algorithm that computes a witness of index or rejects if it does not exist. This algorithm uses the notion of clean quotients, which are used to show that deciding primality of unary DFAs is decidable in LOGSPACE [9].
Simple witnesses
In this section we introduce a naive polynomial time algorithm that computes a witnessing DFA that is not minimal, but can not become smaller by state merging.
Definition 22 (Irreducible witness).
Let be a DFA such that for some DFA . A DFA is called an irreducible witness w.r.t. and iff there is no DFA such that , where is the floor of defined as
We obtain irreducible DFAs by iteratively combining states. In order to do so, we define an operation on DFAs that maps one state to another. Given a DFA , and two states , when we write , we mean the automaton where the transitions into are relayed to , i.e.
The algorithm follows the following procedure.
- 1. Pick a distinguishing word.
-
First we compute a distinguishing word . This can be done in quasi-linear time [19]. It can be noted that a shortest distinguishing word does not necessarily result in the smallest witnessing automaton. However, by lack of better heuristics it seems best to pick a smallest distinguishing word , since this will guarantee an upper bound on the distinguishing DFA found.
- 2. Greedily combine states.
-
We compute the path takes in the automaton . We try to combine states on this path, and check whether the DFA still accepts a subset of . By construction the DFA still accepts and thus is a witnessing DFA for .
This procedure results in a DFA which is not larger than that of a minimal distinguishing word, and potentially is a minimal witnessing DFA. We prove that the resulting DFA is so-called irreducible, which is, for any smaller DFA , which is a subset of it holds that .
Theorem 23.
Given two DFAs and , such that , and a word the output DFA SimpleWitness() from Algorithm 1 is irreducible.
An interesting question is whether we can compute the minimal witnessing DFA if we compute the correct word in Line 2. In other words, given DFA can we compute a minimal DFA which recognizes a subset of while still containing a word . More formally we are interested in the computational complexity of the following decision problem.
- -DFA-DIST:
-
Given a DFA , a number , and a word is there a DFA of index such that and .
If -DFA-DIST is in P, then also Question 21 is answered affirmitively, e.g. we can compute the smallest DFA that separates two words in polynomial time.
Unary DFAs
When the alphabet of the input DFAs is only a singleton , the problem of deciding minimal witnesses becomes easier. This is due to the specific shapes of these automata.
A unary DFA is a DFA where the alphabet is a singleton, usually we pick . The language of unary DFA is a set of words in the form for some natural number . It is natural to see the language of a unary DFA as a subset of . For a unary DFA we define the unary-language is .
A minimal unary DFA has a specific shape. Since each state has exactly one outgoing transition, it contains exactly one cycle. The initial state might not be on the cycle. In this case the state is the start of a path leading up to the cycle. For the analysis of unary DFAs it is very convenient to distinguish these states.
For two integers and a unary -DFA is a unary DFA with prefix states and a cycle containing states . The path leading into the cycle we call the prefix. Examples of these automata are given in Figure 4.The language of a unary -DFA is given by
For unary DFAs computing minimal DFA witnesses is similar to deciding primality of unary DFAs. Deciding primality for unary DFAs is computable in deterministic logarithmic space [9]. An algorithm is proposed that carefully inspects so-called clean quotients. A clean quotient of a unary DFA is a unary DFA in which the cycle is folded into a smaller cycle.
Although our setting differs from primality testing, a similar technique can be used to compute minimal witnesses for non-inclusion. We inspect all possible covers of the cycle of the unary DFA. For each possible folding this procedure takes only polynomial time and there is only a linear number of possible covers.
In particular there is a unary DFA with a single accepting state that witnesses this non-inclusion. Hence, finding minimal witnesses for the language non-inclusion of two unary DFAs boils down to finding the smallest numbers such that the language distinguishes and .
We define the family of unary DFAs for each , where
-
the set of states is defined as:
-
the transition function is given as:
-
and, one accepting state at the start of the cycle .
Observe that there is always a minimal distinguishing DFA in the shape of .
Lemma 24.
Given unary DFAs such that , then if a unary DFA witnesses the non-inclusion, then there is a DFA such that and witnesses .
Proof.
Let be a distinguishing DFA such that and . Then, by definition there is a word such that . Now we define the DFA such that the only accepting state is the one accepting . Since still accepts , it is still a distinguishing DFA for and by construction .
We complete the proof by showing that for some . If is part of the prefix , then the DFA . In the other case when is some state on a cycle, and let that cycle have length . Then it holds that .
There are only polynomially many of these automata of smaller index, e.g. . A polynomial time algorithm can simply enumerate them all. Our algorithm is slightly less naive and checks for each accepting state that accepts a distinguishing word, the possible languages that accept that distinguishing word.
Lemma 25.
Let be two unary DFAs such that . Then the DFA UnaryMinWitness(), by Algorithm 2 is a minimal witness for the non-inclusion.
Since all loops of Algorithm 2 have a linear bound, the algorithm runs in polynomial time.
Theorem 26.
The decision problem -DFA-DIST for unary DFAs is in P.
8 Related work
There are some decision problems on DFAs that show some similarities, but are different from the work here. For instance the early work of Gold [5] and Pfleeger [18] in which it is shown that learning minimal DFAs from (partial) observations is NP-complete. In this line of work by Gold, so-called separating languages are widely studied in the literature. Here the separating problem is, given languages and , to find a separating language such that and . Although this resembles our witnessing problem, a direct relation is not obvious.
The framework of distinguishing words roughly corresponds to that of Hennessy–Milner theorems for non-deterministic structures [7]. Two non-deterministic structures are not bisimilar if and only if there is a distinguishing formula (within the Hennessy–Milner logic). These formulas are used as counter-examples in the field of model checking. Despite the fact that computing minimal formulas is NP-hard, a similar method as computing distinguishing words results in succinct witnesses [3, 14]. This work can contribute towards witnesses with invariants in the non-deterministic setting.
9 Conclusions
We showed how to use DFAs to witness language inequivalence. In this way it is possible more concisely represent witnesses that explain why the language of input DFAs are not equivalent. These witnesses correspond with prime languages that are previously studied [12, 9]. We show with a reduction from CNF-SAT that deciding minimal distinguishing witnesses is NP-complete. This reduction exploits structure in the language in a non-trivial way. This structure also provides a simpler proof of the coNP-hardness of the minimal pumping length. This indicates that it might be a useful tool in the DFA workbench. In particular, we conjecture that it can be used to closen the complexity gap of deciding primality [12].
To contrast this NP-hardness result we provide a greedy algorithm that in polynomial time computes a witnessing DFA that is never larger than a distinguishing word. In addition, we show that deciding minimal witnesses is in P for unary language non-inclusion.
We leave open two complexity questions that seem interesting. First the complexity of deciding -DFA-DIST and the separating words problem for a binary alphabet. Additionally, it would be interesting to extend the hardness result primality of DFAs. We sketched a connection between prime DFA and minimal witnesses for DFA inequivalence, however, translating this hardness result seems not trivial.
References
- [1] J. Almeida, M. V. Volkov, and S. V. Goldberg. Complexity of the identity checking problem for finite semigroups. Journal of Mathematical Sciences, 158(5):605–614, 2009. doi:10.1007/s10958-009-9397-z.
- [2] Andrei A. Bulatov, Olga Karpova, Arseny M. Shur, and Konstantin Startsev. Lower bounds on words separation: Are there short identities in transformation semigroups?, August 2017. doi:10.37236/6450.
- [3] Rance Cleaveland. On automatically explaining bisimulation inequivalence. In Edmund M. Clarke and Robert P. Kurshan, editors, Proc. Computer-Aided Verification (CAV 1990), pages 364–372, Berlin, Heidelberg, 1991. Springer Berlin Heidelberg. doi:10.1007/BFb0023750.
- [4] Erik D Demaine, Sarah Eisenstat, Jeffrey Shallit, and David A Wilson. Remarks on separating words. In Markus Holzer, Martin Kutrib, and Giovanni Pighizzini, editors, Proc. of DCFS 2011, volume 6808 of LNCS, pages 147–157. Springer, 2011. doi:10.1007/978-3-642-22600-7_12.
- [5] Mark E. Gold. Complexity of automaton identification from given data. Information and Control, 37(3):302–320, 1978. doi:10.1016/S0019-9958(78)90562-4.
- [6] Hermann Gruber, Markus Holzer, and Christian Rauch. The pumping lemma for regular languages is hard. In International Conference on Implementation and Application of Automata, pages 128–140. Springer, 2023. doi:10.1007/978-3-031-40247-0_9.
- [7] Matthew Hennessy and Robin Milner. On observing nondeterminism and concurrency. In Jaco de Bakker and Jan van Leeuwen, editors, Automata, Languages and Programming (ICALP ‘1980), pages 299–309, Berlin, Heidelberg, 1980. Springer-Verlag. doi:10.5555/646234.758793.
- [8] John Hopcroft. An n log n algorithm for minimizing states in a finite automaton. In Zvi Kohavi and Azaria Paz, editors, Theory of Machines and Computations, pages 189–196. Academic Press, 1971. doi:10.1016/B978-0-12-417750-5.50022-1.
- [9] Ismaël Jecker, Orna Kupferman, and Nicolas Mazzocchi. Unary Prime Languages. In Javier Esparza and Daniel Král’, editors, Proc. of MFCS 2020, volume 170 of Leibniz International Proceedings in Informatics (LIPIcs), pages 51:1–51:12, Dagstuhl, Germany, 2020. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.MFCS.2020.51.
- [10] Ondřej Klíma. Identity checking problem for transformation monoids. Semigroup Forum, 84(3):487–498, 2012. doi:10.1007/s00233-012-9401-7.
- [11] Dexter Kozen. Lower bounds for natural proof systems. In Proc. of SFCS 1977, pages 254–266. IEEE, 1977. doi:10.1109/SFCS.1977.16.
- [12] Orna Kupferman and Jonathan Mosheiff. Prime languages. Information and Computation, 240:90–107, 2015. doi:10.1016/j.ic.2014.09.010.
- [13] Jan Martens. Deciding minimal distinguishing DFAs is NP-complete. arXiv preprint arXiv:2306.03533, 2023. doi:10.48550/arXiv.2306.03533.
- [14] Jan Martens and Jan Friso Groote. Computing minimal distinguishing Hennessy-Milner formulas is NP-hard, but variants are tractable. In G.A. Pérez and J.-F. Raskin, editors, Proc. of CONCUR 2023, volume 279 of LIPIcs, pages 32:1–32:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPIcs.CONCUR.2023.32.
- [15] Tyler Moore. Gedanken-experiments on sequential machines. In C. E. Shannon and J. McCarthy, editors, Automata Studies, Annals of Mathematical Studies, no. 34. Citeseer, 1956.
- [16] John R. Myhill. Finite automata and the representation of events. WADD Technical Report, 57-624:112–137, 1957.
- [17] Anil Nerode. Linear automaton transformations. Proceedings of the American Mathematical Society, 9(4):541–544, 1958. doi:10.1090/S0002-9939-1958-0135681-9.
- [18] Charles P. Pfleeger. State reduction in incompletely specified finite-state machines. IEEE Transactions on Computers, 100(12):1099–1102, 1973. doi:10.1109/T-C.1973.223655.
- [19] Rick Smetsers, Joshua Moerman, and David N Jansen. Minimal separating sequences for all pairs of states. In Adrian-Horia Dediu, Jan Janoušek, Carlos Martín-Vide, and Bianca Truthe, editors, Proc. of (LATA 2016), volume 9618 of LNCS, pages 181–193. Springer, 2016. doi:10.1007/978-3-319-30000-9_14.
