Probabilistic Finite Automaton Emptiness Is Undecidable for a Fixed Automaton
Abstract
We construct a probabilistic finite automaton (PFA) with 7 states and an input alphabet of 5 symbols for which the PFA Emptiness Problem is undecidable. The only input for the decision problem is the starting distribution. For the proof, we use reductions from special instances of the Post Correspondence Problem.
We also consider some variations: The input alphabet of the PFA can be restricted to a binary alphabet at the expense of a larger number of states. If we allow a rational output value for each state instead of a yes-no acceptance decision, the number of states can even be reduced to 6.
Keywords and phrases:
Probabilistic finite automaton, Undecidability, Post Correspondence Problem

2012 ACM Subject Classification:
Theory of computation → Formal languages and automata theory
1 Probabilistic finite automata (PFA)
A probabilistic finite automaton (PFA) combines characteristics of a finite automaton and a Markov chain. We give a formal definition below. Informally, we can think of a PFA in terms of an algorithm that reads a sequence of input symbols from left to right, having only finite memory. That is, it can manipulate a finite number of variables with bounded range, just like an ordinary finite automaton. In addition, a PFA can make coin flips. As a consequence, the question whether the PFA arrives in an accepting state and thus accepts a given input word is not a yes/no decision, but it happens with a certain probability. The language recognized (or represented) by a PFA is defined by specifying a probability threshold or cutpoint $\lambda$. By convention, the language consists of all words for which the probability of acceptance strictly exceeds $\lambda$.
The PFA Emptiness Problem is the problem of deciding whether this language is empty. This problem is undecidable. Many other undecidability results rely on the undecidability of the PFA Emptiness Problem, for example, problems about matrix products [2], growth problems [3, Chapter 6], or probabilistic planning problems [14].
We present some sharpenings of the undecidability statement, where certain parameters of the PFA are restricted.
1.1 Formal problem definition
Formally, a PFA is given by a finite set $\Sigma$ of stochastic transition matrices $M_1, \ldots, M_t$, one for each letter of the input alphabet. The states correspond to the rows and columns of the matrices. Thus, the matrices are $n \times n$ matrices for a PFA with $n$ states. The start state is chosen according to a given probability distribution $\pi$. The set of accepting states is characterized by a 0-1-vector $f$. In terms of these data, the PFA Emptiness Problem with cutpoint $\lambda$ is as follows:
PFA Emptiness. Given a set $\Sigma = \{M_1, \ldots, M_t\}$ of stochastic matrices, a probability distribution $\pi$, and a 0-1-vector $f$, is there a sequence $s_1 s_2 \ldots s_k$ with $1 \le s_i \le t$ for $i = 1, \ldots, k$ such that
$$\pi M_{s_1} M_{s_2} \cdots M_{s_k} f > \lambda\,? \qquad (1)$$
The most natural choice is $\lambda = \frac{1}{2}$, but the problem is undecidable for any fixed (rational or irrational) cutpoint $\lambda$ with $0 < \lambda < 1$. We can also ask for $\ge \lambda$ instead of $> \lambda$.
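To make the formulation concrete, here is a minimal sketch in Python (the names and the toy automaton are ours, for illustration): it evaluates the product in (1) for one given word and compares it with the cutpoint. The Emptiness Problem asks whether some word makes this value exceed $\lambda$; the sketch decides nothing, it only evaluates.

```python
import numpy as np

def acceptance_probability(pi, matrices, word, f):
    """Compute pi * M_{a_1} * ... * M_{a_k} * f for word = a_1 ... a_k."""
    v = np.array(pi, dtype=float)       # current state distribution (row vector)
    for a in word:
        v = v @ matrices[a]             # one transition step for input letter a
    return float(v @ np.array(f, dtype=float))

# A toy 2-state PFA over the alphabet {a, b}.
M = {'a': np.array([[0.9, 0.1],
                    [0.2, 0.8]]),
     'b': np.array([[0.5, 0.5],
                    [0.4, 0.6]])}
pi = [1.0, 0.0]    # start deterministically in state 1
f = [0.0, 1.0]     # state 2 is the only accepting state
lam = 0.5          # cutpoint

p = acceptance_probability(pi, M, 'aab', f)
print(p, p > lam)  # acceptance probability, and whether 'aab' is in the language
```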
[Table 1: Overview of the results. For each of Theorems 1a–1d and 2a–2b (above the double line) and the earlier results of Claus [5], Blondel and Canterini [1] (B.&C.), and Hirvensalo [13] (Hirv.) (below the double line), the table lists the number of transition matrices (fixed or given as input), the number of states, the properties of the matrices (stochastic or doubly-stochastic, positive, fixed or input), the cutpoint $\lambda$, and the acceptance criterion ($> \lambda$ or $\ge \lambda$).]
1.2 Statement of results
There are three different proofs of the basic undecidability theorem in the literature. The first proof is due to Masakazu Nasu and Namio Honda [17] from 1969. (This proof is commonly misattributed to Paz [20], although Paz gave credit to Nasu and Honda [20, Section IIIB.7, Bibliographical notes, p. 193].) The second proof, by Volker Claus [5], is loosely related to it. Both proofs use a reduction from the Post Correspondence Problem (PCP, see Section 1.4). A completely independent proof with a very different approach, namely a reduction from the halting problem for 2-counter machines, was given by Anne Condon and Richard J. Lipton [6] in 1989, based on ideas of Rūsiņš Freivalds [9] from 1981. The somewhat intricate history is described in [21, Section 3] as part of an extensive survey of the various undecidability proofs.
PFA Emptiness remains undecidable under various constraints on the number of transition matrices (size of the alphabet) and their size (number of states). The first result in this direction is due to Claus [5] from 1981. Later improvements were made by Blondel and Canterini [1] and Hirvensalo [13], who were apparently unaware of [5] and concentrated on the case of two matrices (binary input alphabet). An overview of these results is shown in Table 1 below the double line.
We improve these bounds, concentrating on the minimum number of states without restricting the number of matrices, see Theorem 1. (Such results are only implicit in the proofs of [1] and [13].) Undecidability even holds for a PFA where all data except the starting distribution are fixed.
We also get improved bounds for the 2-matrix case. For the variation where the output vector is not restricted to a 0-1-vector, we can reduce the number of states to 6 (Theorem 2).
Our results are stated in the following theorems and summarized in Table 1.
Theorem 1.
The PFA Emptiness Problem is undecidable for PFAs with a single accepting state and the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):
- (a) $\Sigma$ consists of 4 positive doubly-stochastic transition matrices of size $7 \times 7$, with cutpoint $\lambda = \frac{1}{7}$.
- (b) $\Sigma$ consists of 6 fixed positive doubly-stochastic transition matrices of size $7 \times 7$, with cutpoint $\lambda = \frac{1}{7}$. The only variable input is the starting distribution $\pi$.
- (c) $\Sigma$ consists of 2 positive transition matrices of size $18 \times 18$, with cutpoint $\lambda = \frac{1}{18}$.
- (d) $\Sigma$ consists of 2 fixed positive transition matrices of size $28 \times 28$, with cutpoint $\lambda = \frac{1}{28}$.
All statements hold also for weak inequality as the acceptance criterion.
Theorem 2.
For any cutpoint $\lambda$ in the interval $0 < \lambda < 1$, the PFA Emptiness Problem with an output vector $f$ with entries in $[0, 1]$ is undecidable for PFAs with the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):
- (a) $\Sigma$ consists of 4 positive transition matrices of size $6 \times 6$. There is a fixed deterministic start state.
- (b) $\Sigma$ consists of 5 fixed positive transition matrices of size $6 \times 6$. There is a fixed starting distribution $\pi$. The only input of the problem is the vector $f$ of output values.
All statements hold also for weak inequality as the acceptance criterion.
By combining the different proof techniques, one can extend Theorem 2 to PFAs with two matrices and a fixed start state or starting distribution, analogous to Theorem 1c–d, but we have not worked out these results.
Some weaker results with fixed matrices were previously obtained in a technical report [21, Theorems 2 and 4]. For example, PFA Emptiness was shown to be undecidable for 52 fixed matrices, in the setting of Theorem 2 where the final vector is the only input, with acceptance criterion $\ge \lambda$. For acceptance with strict inequality ($> \lambda$), 52 fixed matrices of a larger size were needed. In these constructions, the PCP was derived from a universal Turing machine. For the case of 2 fixed matrices with a single accepting state, where the start distribution is the only input, a bound of 572 states was obtained [21, Theorem 3].
Rational-Weighted Automata.
A weighted automaton over the reals, also called a generalized probabilistic automaton or pseudo-stochastic automaton, is similar, but the start vector $\pi$, the end vector $f$, and the matrices can be filled with arbitrary real numbers. These automata are in some ways more natural than PFAs, and, in particular with rational weights, they have recently attracted a lot of attention. Since they generalize PFAs, all our undecidability results immediately carry over to rational-weighted automata.
1.3 Contributions of this paper and relations to other problems
The most striking result of the paper is that PFA Emptiness is undecidable already for a fixed PFA, where only the starting distribution $\pi$ or the output vector $f$ is an input. This follows from a combination of known ideas: The main observation, namely that the reduction of Matiyasevich and Sénizergues [16] from undecidable semi-Thue systems leads to instances of the Post Correspondence Problem (PCP) where only the starting pair is variable, was made by Halava, Harju, and Hirvensalo in 2007 [12, Theorem 6]. The idea of merging the first or last matrix into the starting distribution or into the final vector was used by Hirvensalo in 2007 [13, Step 2 in Section 3].
On the technical side, our major contribution is the reduction of the number of states to six, even for PFAs, and even while keeping the matrices fixed, for the version where the input is the (fractional) output vector $f$.
The PFA Emptiness Problem is a problem about matrix products. There is a great variety of problems about matrix products whose undecidability has been studied; see [7] for a recent survey of this rich area. In fact, the first problem outside the fields of mathematical logic and theory of computing that was shown to be undecidable is a problem about matrix products: Markov proved in 1947 that it is undecidable whether the semigroups generated by two finite sets of matrices contain a common element [15]; see Halava and Harju [11] for a presentation of Markov’s proof (as well as an alternative proof). Our basic approach is the same as Markov’s: to model the Post Correspondence Problem (PCP) by matrix products. Our particular technique for doing this has its roots in Paterson’s undecidability proof [19] from 1970 of the mortality problem for $3 \times 3$ matrices.
Besides, we have brought to light some papers that were apparently forgotten, like Nasu and Honda’s original undecidability proof from 1969 [17], and Claus [5]. Also, our technique for converting matrices with row sums 0 to positive matrices with row sums 1 (hence stochastic matrices) in Section 2.6 is more streamlined and elegant than the proofs that we have seen in the literature.
1.4 The Post Correspondence Problem (PCP)
In the Post Correspondence Problem (PCP), we are given a list of pairs of words $(u_1, v_1), \ldots, (u_n, v_n)$. The problem is to decide if there is a nonempty sequence of indices $s_1 s_2 \ldots s_k$ such that
$$u_{s_1} u_{s_2} \cdots u_{s_k} = v_{s_1} v_{s_2} \cdots v_{s_k}.$$
This is one of the well-known undecidable problems. According to Neary [18], the PCP is already undecidable with as few as five word pairs.
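As a fixed point of reference, the following brute-force search (our code, not part of any reduction) enumerates index sequences by increasing length. It finds a solution whenever one exists, but on unsolvable instances it only runs up to the length bound; no algorithm can do better in general, which is precisely the undecidability of the PCP.

```python
from itertools import product

def pcp_solution(pairs, max_len=10):
    """Search for a nonempty index sequence s with u_{s_1}...u_{s_k} = v_{s_1}...v_{s_k}."""
    for k in range(1, max_len + 1):
        for s in product(range(len(pairs)), repeat=k):
            u = ''.join(pairs[i][0] for i in s)
            v = ''.join(pairs[i][1] for i in s)
            if u == v:
                return s
    return None   # no solution up to the bound; in general, one can never be sure

# A classic toy instance with the solution 1 2 3 1 4 (0-indexed below).
pairs = [('a', 'ab'), ('b', 'ca'), ('ca', 'a'), ('abc', 'c')]
print(pcp_solution(pairs))    # (0, 1, 2, 0, 3)
```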
2 Proofs of Theorem 1a and Theorem 1b (few states)
We follow the same overall proof strategy as Claus [5], Blondel and Canterini [1] and Hirvensalo [13]: They use undecidable PCP instances with few word pairs and transform them to the Emptiness Problem for integer-weighted automata, which are then converted to PFAs. We deviate from this approach by using an automaton with fractional weights (Section 2.1). These matrices can be converted to column-stochastic matrices without the overhead of an extra state (Section 2.4).
The proof of Theorem 1a contains the main ideas. For the reduction to two matrices, we then apply a technique of Hirvensalo [13] (Section 3.1). All other results are obtained by slight variations of these methods in combination with appropriate results from the literature.
2.1 Step 1: Modeling word pairs by matrices
For a string $w$ of decimal digits, we denote its fractional decimal value by $0.w$. For example, if $w = 25$, then $0.w = 0.25$. We will take care to avoid trailing zeros, because their disappearance could cause problems.
For two strings $u, v$ of digits in $\{1, 2\}$, we define a $6 \times 6$ matrix $A_{u,v}$, whose entries are built from the lengths and the fractional values of $u$ and $v$. It is not straightforward to see, but it can be checked by a simple calculation that the matrices $A_{u,v}$ satisfy a multiplicative law (see [22, Appendix C] for a computer check):
Lemma 3 (Multiplicative Law).
$$A_{u,v} \, A_{u',v'} = A_{uu',\,vv'}. \qquad (2)$$
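Before the similarity transformation below, it may help to see why fractional values can compose multiplicatively at all. The following $2 \times 2$ toy representation (our illustration; it is not the matrix $A_{u,v}$ of the construction, which additionally tracks $v$ and quadratic terms) satisfies the same kind of law:

```python
import numpy as np

def rep(w):
    """Encode a digit string w as [[10^-|w|, 0.w], [0, 1]]."""
    frac = sum(int(d) * 10.0**-(i + 1) for i, d in enumerate(w))   # fractional value 0.w
    return np.array([[10.0**-len(w), frac],
                     [0.0,           1.0]])

u, u2 = '12', '221'
assert np.allclose(rep(u) @ rep(u2), rep(u + u2))   # rep(u) rep(u') = rep(uu')
print(rep(u + u2))   # [[1.e-05, 1.2221e-01], [0., 1.]]
```

The top-right entry of a product is the fractional value of the concatenated string, because $0.(uu') = 0.u + 10^{-|u|} \cdot 0.u'$.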
Now we transform these matrices by a similarity transformation with a transformation matrix $T$:
| (3) |
The resulting matrices $\hat{A}_{u,v} = T^{-1} A_{u,v} T$ differ from $A_{u,v}$ in some entries of the first column and the fifth row.
Since the matrices $\hat{A}_{u,v}$ were obtained by a similarity transformation, they satisfy the same multiplicative law:
$$\hat{A}_{u,v} \, \hat{A}_{u',v'} = \hat{A}_{uu',\,vv'}. \qquad (4)$$
With the vectors $\pi$ and $f$, we obtain the value
$$\pi \hat{A}_{u,v} f \qquad (5)$$
as an explicit expression in the fractional values $0.u$ and $0.v$ and the lengths of $u$ and $v$. The sign of this expression can detect inequality of $u$ and $v$, as we will presently show.
2.2 Equality detection
Lemma 4 (Equality Detection Lemma).
Let $u, v$ be two strings composed of digit pairs from $\{1, 2\}^2$.
- 1. If $u = v$, then $\pi \hat{A}_{u,v} f > 0$.
- 2. If $u \neq v$, then $\pi \hat{A}_{u,v} f < 0$.
In particular, $\pi \hat{A}_{u,v} f$ is never zero.
Proof.
The first statement is obvious.
To see the second statement, we first consider the case that one of the strings is a prefix of the other (Case A): If $u$ is a proper prefix of $v$, then for fixed $u$, the smallest possible difference $0.v - 0.u$ among the strings $v$ that extend $u$ is achieved when $v$ continues with all 1's. Similarly, if $v$ is a proper prefix of $u$, the smallest possible difference is achieved when $u$ continues with all 1's. In either case, $|0.u - 0.v|$ is bounded below at the scale $10^{-\min(|u|,|v|)}$, and the claim follows.
Case B: Neither of $u$ and $v$ is a prefix of the other. Suppose $u$ and $v$ share $j$ leading digit pairs and then continue with two distinct digit pairs. The smallest difference between two such numbers is achieved when the string with the smaller divergent pair continues with all 2's and the string with the larger divergent pair stops right after it; thus $|0.u - 0.v|$ is again bounded below, at the scale $10^{-2j}$. After squaring this relation, the claim follows with an ample margin.
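The margin in this lemma can be checked by brute force. The following sketch (our code; the constant $0.11$ is a convenient illustrative bound, not the exact constant of the construction) verifies, for all short words over the doubled alphabet $\{11, 22\}$ used below, that distinct words have fractional values separated at the scale of the shorter word:

```python
from fractions import Fraction
from itertools import product

def frac_value(w):
    """Exact fractional decimal value 0.w of a digit string w."""
    return Fraction(int(w), 10**len(w))

# All nonempty words over the doubled alphabet {11, 22} with up to 4 blocks.
words = [''.join(p) for k in range(1, 5) for p in product(['11', '22'], repeat=k)]

ratio = min(abs(frac_value(u) - frac_value(v)) * 10**min(len(u), len(v))
            for u in words for v in words if u != v)
print(ratio)                        # 11/100, attained in the prefix case v = u + '11'
assert ratio >= Fraction(11, 100)
```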
2.3 Modeling the Post Correspondence Problem (PCP)
The multiplicative law (4) and the capacity to detect string equality are all that is needed to model the PCP.
In the PCP, it is no loss of generality to assume that the words in the pairs use a binary alphabet, since any alphabet can be coded in binary. We recode the two letters by the digit pairs 11 and 22, i.e., the words become words in $\{1,2\}^*$ of doubled length. We form the matrices $\hat{A}_i = \hat{A}_{u_i, v_i}$. Multiplicativity gives the relation $\hat{A}_{s_1} \hat{A}_{s_2} \cdots \hat{A}_{s_k} = \hat{A}_{u,v}$ for the concatenations $u = u_{s_1} u_{s_2} \cdots u_{s_k}$ and $v = v_{s_1} v_{s_2} \cdots v_{s_k}$, and by Section 2.2, $\pi \hat{A}_{s_1} \cdots \hat{A}_{s_k} f > 0$ if and only if $u = v$, i.e., $s_1 s_2 \ldots s_k$ is a solution of the PCP.
Since the value 0 is excluded by Section 2.2, nothing changes if the condition “$> 0$” is replaced by “$\ge 0$”. This property will propagate through the proof, with the consequence that in the resulting theorems, it does not matter whether we take $> \lambda$ or $\ge \lambda$ as the acceptance criterion for the PFA. We will not mention this issue any more and work only with strict inequality.
There are two problems that we still have to solve:
- The empty sequence ($k = 0$) always fulfills the inequality, although it does not count as a solution of the PCP.
- The matrices $\hat{A}_i$ are not stochastic, and $\pi$ is not a probability distribution. So far, what we have is a generalized probabilistic automaton, or rational-weighted automaton.
Turakainen [23] showed in 1969 that a generalized probabilistic automaton can be converted to a PFA without changing the recognized language (with the possible exception of the empty word), and this can even be done by adding only two more states [24, Theorem 1(i)]. Thus, by the general technique of [24], we could get a PFA with 8 states. We will use a tailored version of this method, involving nontrivial techniques, so that no extra state is needed to make the matrices stochastic and to simultaneously get rid of the empty solution.
History of ideas.
The idea of modeling the PCP by multiplication of integer matrices was pioneered by Markov [15] in 1947. He used a pair of $2 \times 2$ integer matrices that generate a free semigroup. A different representation, closer to the one we are using, goes back to Paterson [19] in 1970, who used it to show that mortality for $3 \times 3$ matrices is undecidable.
Our matrix $A_{u,v}$ is a variation of the integer matrices that were proposed by Volker Claus [5] in 1981, and again by Blondel and Canterini [1, p. 235] in 2003, in the very context of constructing small undecidable instances of the PFA Emptiness Problem. These matrices extend Paterson’s matrices to larger matrices that can produce quadratic terms. They use positive powers of the base 10 and integer decimal values of the strings instead of the fractional values $0.u$. Such matrices satisfy a multiplicative law like (2), but with a reversal of the factors. In fact, the method works equally well for any other radix instead of 10. (“The notation is quaternary, decimal, etc., according to taste.” (Paterson [19]). Claus [5] used radix 3.)
Our novel idea is to use this construction with negative powers of 10, leading to entries that are roughly at the same scale as a stochastic matrix, thus facilitating further transformations.
2.4 Step 2: Making column sums 1
We apply another similarity transformation, using a transformation matrix $S$, but this time, we also transform the vectors $\pi$ and $f$, leaving the overall result unchanged:
$$\pi \hat{A}_{s_1} \cdots \hat{A}_{s_k} f = (\pi S)\,(S^{-1} \hat{A}_{s_1} S) \cdots (S^{-1} \hat{A}_{s_k} S)\,(S^{-1} f). \qquad (6)$$
We analyze the effect of the similarity transformation for a matrix of the general form of $\hat{A}_{u,v}$ and write $A'_{u,v} = S^{-1} \hat{A}_{u,v} S$ for the transformed matrix.
Lemma 5.
- 1. The matrix $A'_{u,v}$ has column sums 1.
- 2. If $u$ and $v$ are nonempty, the matrix $A'_{u,v}$ is positive, and therefore column-stochastic.
Proof.
The first statement can be easily checked directly, but it has a systematic reason: the matrices $\hat{A}_{u,v}$ have a common left eigenvector with eigenvalue 1, and the transformation by $S$ maps this eigenvector to the all-ones vector $(1, 1, \ldots, 1)$; a matrix has column sums 1 precisely if the all-ones vector is a left eigenvector with eigenvalue 1.
The second statement is not needed for Theorem 1, because positivity is established anyway in Step 4, after being destroyed in Step 3, but we need it for Theorem 2. So let us check it: The only entries that are in danger of becoming negative are the entries of the first row. Two entries of the matrix can come close to the largest possible value; all other entries are safely below it. Thus, even the “most dangerous” candidate, where 10 of these entries are subtracted from 1, cannot be zero or negative.
It can be checked that rows 2–6 are positive, because the first column, which is positive, has been added to every other column.
In the transformation from $\pi$ to $\pi S$, the first entry is added to all other entries. Thus, the vector $\pi$ becomes a vector $\pi_0 = \pi S$ whose entries sum to zero:
$$\pi_0 \,(1, 1, \ldots, 1)^\top = 0. \qquad (7)$$
It was for this reason that the corresponding entries of the transformation matrices in (3) and $S$ were chosen.
The output vector is unchanged by the transformation: $S^{-1} f = f$.
2.5 Step 3: Making row sums 1 with an extra state
By adding an extra column, we now make all row sums equal to 1. We also add an extra row, so that the new state is absorbing, resulting in a matrix
$$B_{u,v} = \begin{pmatrix} A'_{u,v} & \mathbf{1} - A'_{u,v}\mathbf{1} \\ 0 \cdots 0 & 1 \end{pmatrix}. \qquad (8)$$
The entries of the new column may be negative, but they lie in a bounded range. All column sums are still 1: Since the row sums are 1, the total sum of the entries is 7. Therefore, since the first six column sums are 1, the sum of the last column must also be 1.
The vectors $\pi_0$ and $f$ are extended to vectors of length 7 by appending a 0, but they are otherwise unchanged. Thus, the extra column and row of $B_{u,v}$ play no role for the value of the product (6), which remains unchanged.
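A small numeric sketch of this bordering step (our code; the helper name is hypothetical): starting from a matrix with column sums 1, it appends the column of row-sum deficiencies and the absorbing row $(0, \ldots, 0, 1)$, exactly as in (8).

```python
import numpy as np

def add_absorbing_state(A):
    """Border A (column sums 1) so that all row and column sums become 1, as in (8)."""
    n = A.shape[0]
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = A
    B[:n, n] = 1.0 - A.sum(axis=1)    # new column: row-sum deficiencies
    B[n, n] = 1.0                     # new absorbing state: row (0, ..., 0, 1)
    return B

A = np.arange(36.0).reshape(6, 6) - 10.0
A /= A.sum(axis=0)                    # force column sums 1; some entries are negative
B = add_absorbing_state(A)
print(np.allclose(B.sum(axis=0), 1), np.allclose(B.sum(axis=1), 1))   # True True
```

That the last column automatically sums to 1 is the counting argument given above: the total sum of all entries is $n + 1$, and the first $n$ column sums are 1.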
2.6 Step 4: Making the matrices positive, and hence stochastic
Let $U$ be the doubly-stochastic transition matrix of the “completely random transition” with all entries $\frac{1}{7}$. Then, with a small constant $\varepsilon > 0$, we form the matrices $C_i = (1 - \varepsilon) U + \varepsilon B_i$, where $B_i$ denotes the matrix (8) obtained from the $i$-th word pair. The constant $\varepsilon$ is small enough to ensure that all entries of $C_i$ are positive. Hence the matrices $C_i$ are doubly-stochastic.
If we expand the product $\pi_0 C_{s_1} C_{s_2} \cdots C_{s_k} f$, we get a sum of $2^k$ terms, each containing a product of $k$ of the matrices $U$ or $B_i$. We find that all terms containing the factor $U$ vanish. The reason is that, since each $B_i$ has row and column sums 1, $U B_i = B_i U = U$, so that every such product collapses to $U$; moreover, $\pi_0 U = 0$ by (7). It follows that
$$\pi_0 C_{s_1} \cdots C_{s_k} f = \varepsilon^k \, \pi_0 B_{s_1} \cdots B_{s_k} f.$$
The positive factor $\varepsilon^k$ plays no role for the sign, and hence,
$$\pi_0 C_{s_1} \cdots C_{s_k} f > 0 \qquad (9)$$
if and only if $s_1 \ldots s_k$ is a solution of the PCP.
2.7 Step 5: Turning $\pi_0$ into a probability distribution
The start vector $\pi_0$ has sum 0 and contains negative entries, which are, however, not smaller than $-\frac{1}{7}$. We form $\pi' = \pi_0 + (\frac{1}{7}, \ldots, \frac{1}{7})$. It is positive and has sum 1. The effect of substituting $\pi'$ for $\pi_0$ in (9) is that the result is increased by $\frac{1}{7}$. The reason is that $(\frac{1}{7}, \ldots, \frac{1}{7})\, C_{s_1} \cdots C_{s_k}\, f = \frac{1}{7}$: the matrices $C_i$ are doubly-stochastic, and in particular column-stochastic, and therefore $(\frac{1}{7}, \ldots, \frac{1}{7})\, C_{s_1} \cdots C_{s_k} = (\frac{1}{7}, \ldots, \frac{1}{7})$, whose product with the 0-1-vector $f$ with its single 1 gives $\frac{1}{7}$.
In summary, we have now constructed a true PFA with seven states that models the PCP:
$$\pi' C_{s_1} C_{s_2} \cdots C_{s_k} f > \tfrac{1}{7} \qquad (10)$$
if and only if $s_1 s_2 \ldots s_k$ is a (possibly empty) solution of the PCP.
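The following numeric sketch (our code, with random data in place of the actual matrices $B_i$) confirms the two identities that carry Steps 4 and 5: the mixed product equals $\varepsilon^k$ times the original product, and the shift of the start vector adds exactly $\frac{1}{7}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, k = 7, 0.05, 3
U = np.full((n, n), 1.0 / n)                   # completely random transition

def random_B():
    """A random stand-in for B_i: all row sums and column sums equal to 1."""
    M = rng.uniform(-0.2, 0.2, size=(n, n))
    M -= M.mean(axis=1, keepdims=True)         # row sums 0
    M -= M.mean(axis=0, keepdims=True)         # column sums 0 (row sums stay 0)
    return U + M                               # row and column sums 1, maybe negative

Bs = [random_B() for _ in range(k)]
Cs = [(1 - eps) * U + eps * B for B in Bs]     # Step 4: positive, doubly stochastic

pi0 = rng.uniform(-0.05, 0.05, size=n); pi0 -= pi0.mean()   # start vector with sum 0
f = np.eye(n)[0]                                            # single accepting state

prodB = np.linalg.multi_dot(Bs)
prodC = np.linalg.multi_dot(Cs)
print(np.isclose(pi0 @ prodC @ f, eps**k * (pi0 @ prodB @ f), rtol=1e-9, atol=0))
pi1 = pi0 + 1.0 / n                            # Step 5: now a probability distribution
print(np.isclose(pi1 @ prodC @ f, pi0 @ prodC @ f + 1.0 / n))   # shift by exactly 1/7
```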
2.8 Using a small PCP
We base our proof on the undecidability of the PCP with 5 word pairs, as established by Turlough Neary [18, Theorem 11] in 2015. Neary constructed PCP instances with five pairs that have the following property: Every solution necessarily starts with a designated start pair and ends with a designated end pair. We can also assume that the end pair is used nowhere else. (More precisely, in every primitive solution, which is not a concatenation of smaller solutions, the end pair cannot appear anywhere else than in the final position.) However, the start pair is also used in the middle of the solutions. (This multipurpose usage of the start pair is one of the devices to achieve such a remarkably small number of word pairs.) (The proof of Theorem 11 in Neary [18] contains an error, but this error can be fixed: His PCP instances encode binary tag systems. When showing that the PCP solution must follow the intended pattern of the simulation of the binary tag system, Neary [18, p. 660] needs to show that the end pair cannot be used except to bring the two strings to a common end. He claims that a certain block cannot appear in the encoded string, because in the unencoded string of the binary tag system, which is described in his Lemma 9, two particular symbols cannot occur next to each other. This is not true: the paper contains plenty of examples that contradict this claim; for example, the string in (7) [18, p. 657] contains seven equal symbols in a row. The mistake can be fixed by taking a longer block of 1s: Looking at the appendants in Lemma 9, it is clear that every sufficiently long block must contain a particular separating symbol. Thus, a correspondingly longer pair will work as end pair.)
Reversing the words.
For our construction it is preferable to have the pair that occurs only once at the beginning. We thus reverse all words. Any solution sequence $s_1 s_2 \ldots s_k$ for the original problem must then be reversed to $s_k \ldots s_2 s_1$, but this does not affect the solvability of the PCP. Thus, we work with PCP instances of the form $(u_1, v_1), \ldots, (u_5, v_5)$ with the following property: Every solution sequence must start with the pair $(u_1, v_1)$, and this pair cannot be used anywhere else: $s_1 = 1$ and $s_i \neq 1$ for $i \geq 2$.
2.9 Step 6. Merging the leftmost matrix into the starting distribution
We apply the above construction steps to the word pairs $(u_1, v_1), \ldots, (u_5, v_5)$, leading to five matrices $C_1, \ldots, C_5$. Since the leftmost matrix in the product must be $C_1$ for any solution, we can combine this matrix with the starting distribution $\pi'$:
$$\pi'' = \pi' C_1,$$
leading to a new starting distribution $\pi''$. The matrix $C_1$ can be removed from the pool of matrices, leaving only 4 matrices. We have simultaneously eliminated the empty solution, with its product of zero matrices. This concludes the proof of Theorem 1a. ∎
2.10 Proof of Theorem 1b (fixed transition matrices) by using a PCP with only one variable word pair
Matiyasevich and Sénizergues [16] constructed PCP instances with seven word pairs that have the following property: Every solution necessarily starts with a designated start pair and ends with a designated end pair. Both the start pair and the end pair can be assumed to appear nowhere else, in the sense described in Section 2.8, i.e., apart from the possibility of concatenating solutions to obtain longer solutions. Matiyasevich and Sénizergues [16] used a reduction, due to Claus [4], from the individual accessibility problem (or individual word problem) for semi-Thue systems. They showed that the individual accessibility problem is already undecidable for a particular semi-Thue system with 3 rules [16, Theorem 3]. Halava, Harju, and Hirvensalo [12, Theorem 6] observed an important consequence of this: One can fix all word pairs except the start pair, leaving only this pair as an input to the problem, and the PCP is still undecidable.
From these word pairs, we form the matrices $\hat{A}_i = \hat{A}_{u_i, v_i}$ from the words without reversal. Following the same steps as above, we eventually arrive at corresponding doubly-stochastic matrices $C_1, \ldots, C_7$. We merge $C_1$, the matrix of the variable start pair, with $\pi'$ into a new starting distribution $\pi'' = \pi' C_1$. We are left with a pool of six fixed matrices. The only variable input is the starting distribution $\pi''$. This concludes the proof of Theorem 1b. ∎
3 Proofs of Theorem 1c and Theorem 1d (binary input)
3.1 Reduction to two matrices
The following lemma and its proof are extracted from Step 3 in Hirvensalo [13, Section 3]. We denote by $f_{\mathcal{A}}(w)$ the acceptance probability of a (generalized) PFA $\mathcal{A}$ for an input word $w$, i.e., the value of the product in the expression (1).
Lemma 6.
Let $\mathcal{A}$ be a generalized PFA with $k$ matrices $M_1, \ldots, M_k$ of size $n \times n$, such that the first row of each matrix is $(1, 0, \ldots, 0)$.
Then one can obtain a generalized PFA $\mathcal{B}$ over the two-symbol alphabet $\{a, b\}$, with matrices $A$ and $B$ of dimension $(n-1)(k-1)+1$, such that for every word $w$ for $\mathcal{A}$ there exists a word $w'$ for $\mathcal{B}$ with
$$f_{\mathcal{A}}(w) = f_{\mathcal{B}}(w'), \qquad (11)$$
and conversely, for every word $w'$ there exists a word $w$ with (11).
If the given matrices are nonnegative, then so are $A$ and $B$. If the given matrices are stochastic, then so are $A$ and $B$. If the start vector of $\mathcal{A}$ is a probability distribution, then so is the start vector of $\mathcal{B}$. The entries of the end vector of $\mathcal{B}$ are taken from the entries of the end vector of $\mathcal{A}$.
Thus, if $\mathcal{A}$ is a PFA, then $\mathcal{B}$ is a PFA as well.
The lemma is not specific about the word $w'$ or $w$ whose existence is guaranteed. Nevertheless, regardless of how these words are found, the statement is sufficient in the context of the emptiness question.
However, we can be explicit about $w'$ and $w$: The construction uses a binary encoding with the prefix-free codewords $b, ab, aab, \ldots, a^{k-2}b$ and $a^{k-1}$. Then, we can take $w'$ to be the concatenation of the codewords of the letters of $w$. For words $w'$ that are not of this form, (11) holds for the longest word $w$ whose code is a prefix of $w'$. In other words, $w$ is the decoding of the longest decodable prefix of $w'$.
Proof.
The procedure is easiest to describe if we assume that $\mathcal{A}$ is a PFA. Then we can think of $\mathcal{B}$ as an automaton that decodes the input and carries out a transition of $\mathcal{A}$ whenever it has read a complete codeword. In addition to the state of $\mathcal{A}$, $\mathcal{B}$ needs to maintain a counter in the range $\{0, 1, \ldots, k-2\}$ for the number of a's of the current partial codeword. Whenever $\mathcal{B}$ reads a b, or if it has read the $(k-1)$-st a, $\mathcal{B}$ performs the appropriate random transition and resets the counter. The number of states of $\mathcal{B}$ is $n(k-1)$.
The fact that state 1 is an absorbing state allows a shortcut: If we are in state 1, we can stop maintaining the counter. Thus, only $n-1$ states need to be combined with the $k-1$ counter values, and the resulting number of states reduces to $(n-1)(k-1)+1$.
We now describe the construction of $\mathcal{B}$ in terms of transition matrices. This construction is valid also when $\mathcal{A}$ is a generalized PFA, where the above description in terms of random transitions makes no sense. To make the description more concrete, we illustrate it with $k = 5$ matrices and the corresponding binary codewords b, ab, aab, aaab, aaaa.
We split the matrices $M_i$ and the vectors $\pi$ and $f$ into blocks of sizes $1$ and $n-1$, separating the absorbing state 1 from the remaining states.
We define new transition matrices $A$ and $B$ and new start and end vectors in block form with block sizes $1, n-1, \ldots, n-1$: the matrix $B$ carries out the appropriate original transition and returns to the first group of blocks, while $A$ shifts one group of blocks to the next and carries out $M_5$ from the last group.
From the sequence of powers of $A$, we can recognize the pattern of development. We can then work out the matrices $B$, $AB$, $A^2B$, $A^3B$, and $A^4$ and check that they have the original matrices $M_1, \ldots, M_5$ in their upper-left corner and otherwise zeros in the first two rows of blocks. Thus, they simulate the original automaton on the states $1, \ldots, n$: It is easy to establish by induction that multiplying the initial distribution vector with a sequence of such matrices will produce a vector of the form
$$(x, \; y, \; 0, \ldots, 0) \qquad (12)$$
whose first $n$ entries $(x, y)$ coincide with the entries of the corresponding vector produced with the original start vector $\pi$ and the original matrices $M_i$. If this vector is multiplied with the new end vector, the result is the same as with the original end vector $f$.
One technicality remains to be discussed: Some “unfinished” words in $\{a, b\}^*$ do not factor into codewords but end in a partial codeword a, aa, or aaa. To analyze the corresponding matrix products, we look at the powers $A^j$, for $j = 1, 2, 3$. They act as block shifts:
| (13) |
and therefore, multiplying the vector from (12) with them yields $(x, 0, y, 0, 0)$, $(x, 0, 0, y, 0)$, and $(x, 0, 0, 0, y)$, respectively. If such a vector is multiplied with the new end vector, the result is the same as with the vector in (12). Thus, as claimed after the statement of the lemma, input sequences whose decoding process leaves a partial codeword produce the same value as if that partial codeword were omitted. (Hirvensalo [13, p. 314] defined the new end vector “analogously” to the new start vector, thus padding it with zeros. We see that with this definition, the result for incomplete inputs is the contribution of the absorbing state alone, which, in the general setting of our lemma, has no predictable relation with the meaningful value. In Hirvensalo’s case, it can be shown by a delicate argument that this causes no harm; see [21, Section 8.2.3, footnote 31] in the updated version on the author’s homepage.)
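As an illustration, the following sketch (our code) implements the plain version of this construction, without the absorbing-state shortcut, so with $n(k-1)$ states: a state $(c, i)$ records the counter $c$ together with the original state $i$; the letter b fires $M_{c+1}$ and resets the counter, and the letter a either increments the counter or, as the $(k-1)$-st a, fires $M_k$. The end vector is replicated over all counter values, so a trailing partial codeword is ignored, as in the lemma.

```python
import numpy as np
from functools import reduce

def binary_encoding(matrices, pi, f):
    """Build A, B over {a, b} from stochastic M_1..M_k (simplified version of Lemma 6)."""
    k, n = len(matrices), matrices[0].shape[0]
    m = (k - 1) * n                      # states are pairs (counter c, original state i)
    idx = lambda c, i: c * n + i
    A, B = np.zeros((m, m)), np.zeros((m, m))
    for c in range(k - 1):
        for i in range(n):
            for j in range(n):
                B[idx(c, i), idx(0, j)] = matrices[c][i, j]          # a^c b fires M_{c+1}
                if c == k - 2:
                    A[idx(c, i), idx(0, j)] = matrices[k - 1][i, j]  # a^{k-1} fires M_k
            if c < k - 2:
                A[idx(c, i), idx(c + 1, i)] = 1.0                    # count one more a
    pi2 = np.zeros(m); pi2[:n] = pi                                  # start with counter 0
    f2 = np.tile(f, k - 1)                           # end values on every counter copy
    return A, B, pi2, f2

# Check: for k = 4, the word 3 1 4 is encoded as  aab | b | aaa.
rng = np.random.default_rng(2)
Ms = [rng.dirichlet(np.ones(3), size=3) for _ in range(4)]   # four 3x3 stochastic matrices
pi, f = np.array([1.0, 0, 0]), np.array([0, 0, 1.0])
A, B, pi2, f2 = binary_encoding(Ms, pi, f)
direct = pi @ Ms[2] @ Ms[0] @ Ms[3] @ f
encoded = pi2 @ reduce(np.matmul, [A, A, B, B, A, A, A]) @ f2
print(np.isclose(direct, encoded))                           # True
```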
3.2 Proof of Theorem 1c
Section 3.1 blows up the number of states by a factor that depends on the number of matrices. Thus, it is advantageous to apply it after merging the start matrix into the starting distribution, when the number of matrices is reduced.
So we start with the 5-pair instances of Neary [18] and construct five matrices $\hat{A}_1, \ldots, \hat{A}_5$ of dimension 6 from the reversed pairs. We then combine $\hat{A}_1$ with the starting distribution $\pi$ into $\pi_1 = \pi \hat{A}_1$, leaving 4 matrices of dimension 6.
We cannot apply the transformations of Step 2, because they change the first row, and state 1 would no longer be absorbing, which precludes the application of Section 3.1.
Thus, we apply Section 3.1 to the four matrices $\hat{A}_2, \ldots, \hat{A}_5$, resulting in two matrices $A$ and $B$ of dimension 16. The new start vector is $\pi_1$ padded with zeros, and the end vector is still the first unit vector (of dimension 16).
We would now like to use the transformation of Step 2 to make the matrices column-stochastic. However, since we have replaced the initial start vector $\pi$ by $\pi_1$, the entries of the start vector after the transformation would no longer sum to 0, a property that is crucial for eventually making the matrices stochastic in Step 4.
Therefore, we have to achieve column sums 1 in a different way, with an adaptation of the method of Step 3 (Section 2.5), adding an extra state.
3.3 Step 2′: Making column sums 1 with an extra state
We add an extra row and an extra column to each matrix, ensuring that all column sums become 1. The extra column is zero except for its last entry:
$$\tilde{A} = \begin{pmatrix} A & \mathbf{0} \\ \mathbf{1}^\top - \mathbf{1}^\top A & 1 \end{pmatrix}, \qquad \tilde{B} = \begin{pmatrix} B & \mathbf{0} \\ \mathbf{1}^\top - \mathbf{1}^\top B & 1 \end{pmatrix}. \qquad (14)$$
Since the extra column is zero above the diagonal entry and the start and end vectors are padded with zeros, the value of the product is unchanged.
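In the same style as the sketch after Step 3, this bordering can be checked numerically (our code): the deficiency row fixes the column sums, and because the new column is zero except for its diagonal entry, padded start and end vectors leave all product values unchanged.

```python
import numpy as np

def fix_column_sums(A):
    """Border A as in (14): column sums become 1, product values are unchanged."""
    n = A.shape[0]
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = A
    B[n, :n] = 1.0 - A.sum(axis=0)    # new row: column-sum deficiencies
    B[n, n] = 1.0                     # new column is zero except for this entry
    return B

A1 = np.arange(16.0).reshape(4, 4) / 10
A2 = np.eye(4) + 0.3
B1, B2 = fix_column_sums(A1), fix_column_sums(A2)
pi = np.array([0.5, 0.5, 0.0, 0.0])
f = np.array([1.0, 0.0, 0.0, 0.0])
pi2, f2 = np.append(pi, 0.0), np.append(f, 0.0)
print(np.allclose((B1 @ B2).sum(axis=0), 1.0))              # column sums stay 1
print(np.isclose(pi @ A1 @ A2 @ f, pi2 @ B1 @ B2 @ f2))     # value unchanged
```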
We now have two matrices with column sums 1. The remaining steps (Steps 3–5) are as before, with the appropriate adaptations, and they add another row and column. (We may have to choose a smaller constant $\varepsilon$ in Step 4.) This concludes the proof of Theorem 1c. ∎
3.4 Proof of Theorem 1d (two fixed matrices)
This is obtained by adapting the proof of Theorem 1c in the same way as for Theorem 1b (Section 2.10). Instead of Neary’s 5-pair instances, we take the instances of Matiyasevich and Sénizergues and construct seven matrices $\hat{A}_1, \ldots, \hat{A}_7$ from these pairs (without reversing the words $u_i$ and $v_i$). We then combine $\hat{A}_1$, which must be the first matrix in the product, with the starting distribution $\pi$ into $\pi_1 = \pi \hat{A}_1$. The remaining 6 matrices are fixed. Only $\pi_1$ is an input to the problem.
The remainder of the proof is the same as in Theorem 1c. We apply Section 3.1 to the six matrices of dimension 6, resulting in two fixed matrices $A$ and $B$ of dimension 26. The conversion to a PFA requires two more states, and this leads to Theorem 1d. ∎
4 Proof of Theorem 2 (variable end vector $f$)
One can relax the requirement that $f$ is a 0-1-vector and allow arbitrary values $f_i$. If the values are in the interval $[0, 1]$, we can think of the entry $f_i$ as the probability of a final acceptance decision if the PFA is in state $i$ after the input has been read. Another possibility is that $f_i$ represents a prize or reward that is gained when the process stops in state $i$. Then $f_i$ does not need to be restricted to the interval $[0, 1]$. In this view, instead of the acceptance probability, we compute the expected reward of the automaton, as in game theory. We call the values $f_i$ the output values, and $f$ the output vector or the end vector, in analogy to the start vector $\pi$.
Since the transition matrices after Step 2 are column-stochastic and positive by Section 2.4, we transpose them to produce row-stochastic matrices, which can be directly used as transition matrices. This reverses the order of the matrices in the product and swaps the start vector with the end vector:
$$\pi_0 A'_{s_1} \cdots A'_{s_k} f \;=\; f^\top (A'_{s_k})^\top \cdots (A'_{s_1})^\top \pi_0^\top. \qquad (15)$$
We may prefer to have the end vector in the interval $[0, 1]$. Thus, we replace $\pi_0^\top$ by $\pi_0^\top + (\frac{1}{6}, \ldots, \frac{1}{6})^\top$, in analogy to Section 2.7. This vector is positive and has sum 1, and in particular, its entries lie between 0 and 1. The effect is to increase the value of (15) by $\frac{1}{6}$.
We take the 5-pair instances of Neary [18] (Section 2.8) and construct five matrices from the reversed pairs. The words $u_i$ and $v_i$ in these instances are nonempty, as required by Section 2.4 [18, Proof of Theorem 11]. We convert them to column-stochastic matrices $A'_1, \ldots, A'_5$ and use the transposed matrices $(A'_i)^\top$. We know that the pair $(u_1, v_1)$ must be used at the beginning of the PCP solution ($s_1 = 1$) and nowhere else. Hence $(A'_1)^\top$ must be used at the end of the product in (15), and we can merge it with the end vector into a combined end vector. The four remaining matrices form the set $\Sigma$. The starting distribution is a unit vector, i.e., there is a single deterministic start state. This proves Theorem 2a for the cutpoint $\lambda = \frac{1}{6}$.
4.1 Changing the cutpoint by manipulating the end vector
When the end vector of a PFA is under control, one can change the cutpoint to any desired value in the open interval between 0 and 1: Adding a constant to all output values increases the expected output value by the same constant, and scaling all output values by a positive constant scales the expected output value in the same way.
Thus, by changing all output values $f_i$ to $c f_i$ for $0 < c \le 1$, one may decrease the cutpoint $\lambda$ to any positive value $c\lambda$. By changing all $f_i$ to $1 - c(1 - f_i)$ for $0 < c \le 1$, one may increase it to any value $1 - c(1 - \lambda)$ less than 1. If $f$ is not constrained to the interval $[0, 1]$, one can reach any real cutpoint by an affine transformation of the output values.
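A worked check of these affine changes (our code): scaling the output values scales the expected output, and pushing them toward 1 transforms it by the same affine map, so the comparison with $\lambda$ turns into a comparison with $c\lambda$ or $1 - c(1 - \lambda)$.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.dirichlet(np.ones(4), size=4)        # one stochastic transition matrix
pi = np.array([1.0, 0.0, 0.0, 0.0])          # deterministic start state
f = np.array([0.2, 0.9, 0.4, 0.0])           # output values in [0, 1]
c = 0.3

value = pi @ M @ f                           # expected output for a one-letter input
down = pi @ M @ (c * f)                      # scaled output values
up = pi @ M @ (1 - c * (1 - f))              # output values pushed toward 1
print(np.isclose(down, c * value))           # True: cutpoint lambda becomes c*lambda
print(np.isclose(up, 1 - c * (1 - value)))   # True: cutpoint becomes 1 - c*(1-lambda)
```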
4.2 Proof of Theorem 2b (fixed transition matrices)
We would like to apply the approach of Theorem 2a to the seven word pairs from Section 2.10. They should fulfill the assumption of Section 2.4 that none of the words is empty. However, one of the rules of the 3-rule semi-Thue system of Matiyasevich and Sénizergues contains an empty word, see [16, System (23)]. The reduction of Claus [4] (see also [12, Theorem 4]) would translate this into a PCP pair with an empty word.
Luckily, this reduction can be patched to yield seven pairs of nonempty words $u_i$ and $v_i$. We refer to [22, Appendix B] for details.
As above, we form from these pairs the column-stochastic matrices $A'_1, \ldots, A'_7$ and use their transposes. Only the matrix corresponding to the variable word pair is variable; all other matrices are fixed. In the product (15), we merge the first matrix, which corresponds to the fixed end pair of the PCP, with the start vector, which thus becomes the fixed starting distribution, and we merge the last matrix, which corresponds to the variable start pair, into the end vector, which becomes the only input to the problem. The remaining five matrices form the fixed set $\Sigma$. This proves Theorem 2b for the cutpoint $\lambda = \frac{1}{6}$, and as described in Section 4.1, it can be changed to any cutpoint $0 < \lambda < 1$. ∎
5 Outlook
The natural question is to ask for the smallest number of states for which PFA Emptiness is undecidable. Even if we consider generalized PFAs (rational-weighted automata), we could not reduce this number below 6. Claus [5, Theorem 7 and Corollary, p. 155] showed that the emptiness question can be decided for PFAs with two states.
A matrix dimension of six seems to be the minimum required in order to carry enough information to model concatenation of word pairs and allow testing for equality by a quadratic expression like (5), even in weighted automata. Undecidability for five or perhaps even three states would require some completely different (perhaps geometric) approach.
It would be an interesting exercise to trace back the undecidability proof of Matiyasevich and Sénizergues [16] to its origins and explicitly work out the fixed matrices of Theorem 1b or 2b. For one of the weaker results mentioned after Theorem 2, [21, Theorem 4b], one particular matrix from a set of 52 fixed matrices is shown in [21, Section 7.3].
PFAs can have other merits than just a small number of states and input symbols. We discuss some of these criteria.
Positive and doubly-stochastic transition matrices.
5.1 Obtaining a substantial gap between acceptance and rejection
As seen in formula (5), the acceptance probability barely rises above the threshold when the input represents a solution of the PCP. (This tiny gap is further reduced by multiplying it with the factor $\varepsilon^k$ in Step 4.) Thus, our constructions depend on the capability of a PFA to “detect” minute fluctuations of the acceptance probability above the cutpoint. This statement applies to all undecidability proofs in the Nasu–Honda line of descent.
By contrast, the proof of Condon and Lipton [6] from 1989 gives a more robust result, see also [21, Section 4]: For any $\varepsilon > 0$, it yields a PFA such that the largest acceptance probability over all inputs is either at least $1 - \varepsilon$ or at most $\varepsilon$, and the problem of deciding which of the two cases holds is undecidable. Undecidability is derived from the halting problem for 2-counter machines, and the number of states of the PFA is beyond control.
Luckily, our results can be strengthened to even surpass the strong separation achieved in the Condon–Lipton construction, by the following gap amplification technique of Gimbert and Oualhadj, which we formulate in slightly generalized form.
Theorem 7 (Gimbert and Oualhadj 2010 [10]).
We are given a PFA $\mathcal{A}$ with input alphabet $\Sigma$ and $n$ states, with an output vector $f$. We denote its acceptance probability for an input $w$ by $f_{\mathcal{A}}(w)$. Let $\lambda_1 \le \lambda_2$ be two thresholds with $0 < \lambda_1$ and $\lambda_2 < 1$.
Then, from the description of $\mathcal{A}$, we can construct a PFA $\mathcal{B}$ with an input alphabet of $|\Sigma| + 2$ symbols and roughly twice as many states, with a single start state and a single accepting state, with the following property.
- 1. If every input $w$ for $\mathcal{A}$ has acceptance probability $f_{\mathcal{A}}(w) \le \lambda_1$, then every input for $\mathcal{B}$ is accepted by $\mathcal{B}$ with probability at most $\frac{1}{2}$.
- 2. If $\mathcal{A}$ has an input $w$ with acceptance probability $f_{\mathcal{A}}(w) \ge \lambda_2$, then $\mathcal{B}$ has inputs with acceptance probability arbitrarily close to 1. ∎
This was proved for the case $\lambda_1 = \lambda_2 = \frac{1}{2}$ by Gimbert and Oualhadj [10] in 2010, in order to show that it is undecidable whether the conclusion of Case 2 holds for a PFA (the “Value 1 Problem”). The construction and proof were simplified by Fijalkow as part of a short 8-page note [8]. The generalization to arbitrary $\lambda_1$ and $\lambda_2$ is not difficult, and in addition, we have included the precise statement about the number of states. A proof is given in [22, Appendix A]. It essentially follows Fijalkow’s construction, and it eliminates an oversight in [10] and [8].
With this technique, one can achieve an arbitrarily large gap with a moderate increase in the number of states, roughly by a factor of 2. In particular, from Theorem 2a, we get a PFA with 6 matrices that exhibits the strong dichotomy expressed in Theorem 7, for any thresholds $0 < \lambda_1 \le \lambda_2 < 1$.
This construction does not preserve the property of being a PFA with fixed transition matrices: In the PFA constructed in [22, Appendix A, see Table 2], the transitions depend both on the starting distribution and on the final vector of $\mathcal{A}$.
5.2 Uniqueness
In Theorem 1a and Theorem 2a, the constructed PFA has the property that the recognized language contains at most one word. As with the large-gap guarantee in Section 5.1, this leads to a stronger statement, where the problem takes on the nature of a promise problem.
Neary [18] derived his undecidable PCP instances from binary tag systems. A binary tag system performs a deterministic computation. It follows from the correctness argument of the simulation that the PCP solution is unique if it exists, apart from the obvious possibility of repeating the solution arbitrarily. In Section 2.8, we have excluded the last possibility by fixing the end pair to be the first pair and removing it from the list. Thus, uniqueness holds for the PFA in Theorems 1a and 2a.
However, in the conversion to a binary input alphabet for Theorem 1c (Section 3.1), uniqueness is lost: We have seen that we may always add a or aa to a solution. We don’t see an easy way to eliminate these extra solutions without increasing the number of states.
5.3 Simple probabilistic automata
In a simple PFA, all probabilities are 0, $\frac{1}{2}$, or 1 [10, Definition 2]. We have constructed our PFAs with decimal fractions, but it would not be hard to switch to binary fractions. Before Step 4, the number of states should be increased to a power of two, so that the entries of $U$ become binary fractions as well. Once all transition probabilities are binary fractions, the PFA can be converted to a simple PFA by padding each input symbol with sufficiently many dummy symbols, so that the PFA has time to make its random decisions with a sequence of unbiased coin flips; see [21, Section 4.4, proof of Theorem 1, item (b)]. Thus the results can be achieved with simple PFAs, but with a larger number of matrices and a larger number of states. The precise quantities would depend on the lengths of the words $u_i$ and $v_i$ in the PCP.
References
- [1] Vincent D. Blondel and Vincent Canterini. Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Systems, 36:231–245, 2003. doi:10.1007/s00224-003-1061-2.
- [2] Vincent D. Blondel and John N. Tsitsiklis. The boundedness of all products of a pair of matrices is undecidable. Systems & Control Letters, 41(2):135–140, 2000. doi:10.1016/S0167-6911(00)00049-9.
- [3] Vuong Bui. Growth of Bilinear Maps. PhD thesis, Freie Universität Berlin, Institut für Informatik, 2023. doi:10.17169/refubium-41705.
- [4] Volker Claus. Some remarks on PCP(k) and related problems. Bull. Europ. Assoc. Theoret. Computer Sci. (EATCS), 12:54–61, October 1980. URL: http://page.mi.fu-berlin.de/rote/Kram/Volker_Claus-Some_remarks_on_PCP(k)_and_related_problems-1980-Bull-EATCS-12.pdf.
- [5] Volker Claus. The -bounded emptiness-problem for probabilistic acceptors and related problems. Acta Informatica, 16:139–160, 1981. doi:10.1007/BF00261257.
- [6] A. Condon and R. J. Lipton. On the complexity of space bounded interactive proofs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, SFCS ’89, pages 462–467, USA, 1989. IEEE Computer Society. doi:10.1109/SFCS.1989.63519.
- [7] Ruiwen Dong. Recent advances in algorithmic problems for semigroups. ACM SIGLOG News, 10(4):3–23, December 2023. doi:10.1145/3636362.3636365.
- [8] Nathanaël Fijalkow. Undecidability results for probabilistic automata. ACM SIGLOG News, 4(4):10–17, November 2017. doi:10.1145/3157831.3157833.
- [9] Rūsiņš Freivalds. Probabilistic two-way machines. In Jozef Gruska and Michal Chytil, editors, Mathematical Foundations of Computer Science 1981 (MFCS), volume 118 of Lecture Notes in Computer Science, pages 33–45. Springer, 1981. doi:10.1007/3-540-10856-4_72.
- [10] Hugo Gimbert and Youssouf Oualhadj. Probabilistic automata on finite words: Decidable and undecidable problems. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming. 37th International Colloquium, ICALP 2010, Bordeaux, France, July 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pages 527–538, Berlin, Heidelberg, 2010. Springer-Verlag. Full version in https://hal.science/hal-00456538v3. doi:10.1007/978-3-642-14162-1_44.
- [11] Vesa Halava and Tero Harju. On Markov’s undecidability theorem for integer matrices. Semigroup Forum, 75:173–180, 2007. doi:10.1007/s00233-007-0714-x.
- [12] Vesa Halava, Tero Harju, and Mika Hirvensalo. Undecidability bounds for integer matrices using Claus instances. International Journal of Foundations of Computer Science, 18(05):931–948, 2007. doi:10.1142/S0129054107005066.
- [13] Mika Hirvensalo. Improved undecidability results on the emptiness problem of probabilistic and quantum cut-point languages. In Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, and František Plášil, editors, SOFSEM 2007: Theory and Practice of Computer Science, volume 4362 of Lecture Notes in Computer Science, pages 309–319, Berlin, Heidelberg, 2007. Springer. doi:10.1007/978-3-540-69507-3_25.
- [14] Omid Madani, Steve Hanks, and Anne Condon. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell., 147(1–2):5–34, 2003. doi:10.1016/S0004-3702(02)00378-8.
- [15] A. Markov. On certain insoluble problems concerning matrices. Doklady Akad. Nauk SSSR (N.S.), 57:539–542, 1947. (Russian).
- [16] Yuri Matiyasevich and Géraud Sénizergues. Decision problems for semi-Thue systems with a few rules. Theoretical Computer Science, 330(1):145–169, 2005. doi:10.1016/j.tcs.2004.09.016.
- [17] Masakazu Nasu and Namio Honda. Mappings induced by PGSM-mappings and some recursively unsolvable problems of finite probabilistic automata. Information and Control, 15(3):250–273, 1969. doi:10.1016/S0019-9958(69)90449-5.
- [18] Turlough Neary. Undecidability in binary tag systems and the Post correspondence problem for five pairs of words. In Ernst W. Mayr and Nicolas Ollinger, editors, 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), volume 30 of Leibniz International Proceedings in Informatics (LIPIcs), pages 649–661, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2015.649.
- [19] Michael S. Paterson. Unsolvability in matrices. Stud. in Appl. Math., 49(1):105–107, 1970. doi:10.1002/sapm1970491105.
- [20] Azaria Paz. Introduction to Probabilistic Automata. Computer Science and Applied Mathematics. Academic Press, New York, 1971. doi:10.1016/c2013-0-11297-4.
- [21] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable, June 2024. doi:10.48550/arXiv.2405.03035.
- [22] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable for a fixed automaton, December 2024. doi:10.48550/arXiv.2412.05198.
- [23] Paavo Turakainen. Generalized automata and stochastic languages. Proc. Amer. Math. Soc., 21:303–309, 1969. doi:10.1090/S0002-9939-1969-0242596-1.
- [24] Paavo Turakainen. Word-functions of stochastic and pseudo stochastic automata. Annales Fennici Mathematici, 1(1):27–37, February 1975. doi:10.5186/aasfm.1975.0126.
