
Probabilistic Finite Automaton Emptiness Is Undecidable for a Fixed Automaton

Günter Rote, Institut für Informatik, Freie Universität Berlin, Germany
Abstract

We construct a probabilistic finite automaton (PFA) with 7 states and an input alphabet of 5 symbols for which the PFA Emptiness Problem is undecidable. The only input for the decision problem is the starting distribution. For the proof, we use reductions from special instances of the Post Correspondence Problem.

We also consider some variations: The input alphabet of the PFA can be restricted to a binary alphabet at the expense of a larger number of states. If we allow a rational output value for each state instead of a yes-no acceptance decision, the number of states can even be reduced to 6.

Keywords and phrases:
Probabilistic finite automaton, Undecidability, Post Correspondence Problem
Copyright and License:
© Günter Rote; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Formal languages and automata theory
Related Version:
Full Version: http://arxiv.org/abs/2412.05198 [22]
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Probabilistic finite automata (PFA)

A probabilistic finite automaton (PFA) combines characteristics of a finite automaton and a Markov chain. We give a formal definition below. Informally, we can think of a PFA in terms of an algorithm that reads a sequence of input symbols from left to right, having only finite memory. That is, it can manipulate a finite number of variables with bounded range, just like an ordinary finite automaton. In addition, a PFA can make coin flips. As a consequence, the question whether the PFA arrives in an accepting state and thus accepts a given input word is not a yes/no decision, but it happens with a certain probability. The language recognized (or represented) by a PFA is defined by specifying a probability threshold or cutpoint λ. By convention, the language consists of all words for which the probability of acceptance strictly exceeds λ.

The PFA Emptiness Problem is the problem of deciding whether this language is empty. This problem is undecidable. Many other undecidability results rely on the undecidability of the PFA Emptiness Problem, for example, problems about matrix products [2], growth problems [3, Chapter 6], or probabilistic planning problems [14].

We present some sharpenings of the undecidability statement, where certain parameters of the PFA are restricted.

1.1 Formal problem definition

Formally, a PFA is given by a sequence of stochastic transition matrices Mσ, one for each letter σ from the input alphabet Σ. The states correspond to the rows and columns of the matrices. Thus, the matrices are d×d matrices for a PFA with d states. The start state is chosen according to a given probability distribution π ∈ [0,1]^d. The set of accepting states is characterized by a 0-1-vector f ∈ {0,1}^d. In terms of these data, the PFA Emptiness Problem with cutpoint λ is as follows:

PFA Emptiness. Given a set ℳ = {M_1, …, M_k} of k stochastic d×d matrices, a probability distribution π ∈ [0,1]^d, and a 0-1-vector f ∈ {0,1}^d, is there a sequence i_1, i_2, …, i_m with 1 ≤ i_j ≤ k for j = 1, …, m such that

  π^T M_{i_1} M_{i_2} ⋯ M_{i_m} f > λ?    (1)

The most natural choice is λ = 1/2, but the problem is undecidable for any fixed (rational or irrational) cutpoint λ with 0 < λ < 1. We can also ask ≥ λ instead of > λ.
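As a concrete illustration of the quantity in (1), here is a minimal sketch that evaluates the acceptance probability of a toy two-state PFA over a binary alphabet. The automaton and all its numbers are invented for illustration; they are not part of the constructions in this paper:

```python
# A toy PFA with d = 2 states over the alphabet {0, 1}; all numbers invented.
M = {
    0: [[0.5, 0.5],
        [0.0, 1.0]],
    1: [[1.0, 0.0],
        [0.5, 0.5]],
}
pi = [1.0, 0.0]   # start distribution: start deterministically in state 1
f = [0, 1]        # 0-1-vector: state 2 is the single accepting state

def accept_prob(word):
    """Evaluate pi^T M_{i1} ... M_{im} f from (1) for a word i1...im."""
    x = pi
    for sym in word:
        x = [sum(x[i] * M[sym][i][j] for i in range(2)) for j in range(2)]
    return sum(x[j] * f[j] for j in range(2))

# With cutpoint 1/2, the recognized language consists of all words whose
# acceptance probability strictly exceeds 1/2; the word (0,0) is a witness
# that the language of this toy PFA is nonempty.
assert accept_prob((0, 0)) == 0.75
```

For this tiny example, emptiness can be settled by inspection; the point of the paper is that no algorithm can do this for every PFA.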

Table 1: The main characteristics of the data π, , and f for different undecidable versions of PFA Emptiness. The vectors e1 and e2 are two standard unit vectors of appropriate dimension, indicating a single accepting state or a single deterministic start state. The results under the double line are previous results from the literature, which refer to results of Claus [5, Theorem 6(iii)] from 1981, Blondel and Canterini [1, Theorem 2.1] from 2003, and Hirvensalo [13, Section 3] from 2007. The numbers in parentheses are the figures that would be obtained by basing the proofs on Neary’s undecidable 5-word instances of the Post Correspondence Problem (PCP) instead of the smallest known undecidable PCP instances that were current at the time, see [21, Theorems 5 and 6].
Theorem    | π      | |ℳ|   | M                        | f                  | acceptance crit.
-----------|--------|-------|--------------------------|--------------------|------------------
Thm. 1a    | input  | 4     | 7×7, positive, input     | f = e1             | > 1/7 or ≥ 1/7
Thm. 1b    | input  | 6     | 7×7, positive, fixed     | f = e1             | > 1/7 or ≥ 1/7
Thm. 1c    | input  | 2     | 18×18, positive, input   | f = e1             | > 1/18 or ≥ 1/18
Thm. 1d    | input  | 2     | 28×28, positive, fixed   | f = e1             | > 1/28 or ≥ 1/28
Thm. 2a    | π = e2 | 4     | 6×6, positive, input     | f ∈ [0,1]^6, input | > λ or ≥ λ
Thm. 2b    | fixed  | 5     | 6×6, positive, fixed     | f ∈ [0,1]^6, input | > λ or ≥ λ
===========|========|=======|==========================|====================|==================
Claus [5]  | π = e2 | 9 (5) | 9×9, positive, input     | f = e1             | > 1/9
Claus [5]  | π = e2 | 2     | 65×65, positive, input   | f = e1             | > 1/65
           | π = e2 | 2     | (37×37), positive, input | f = e1             | > 1/37
B.&C. [1]  | π = e2 | 2     | 46×46, positive, input   | f = e1             | > 1/46 or ≥ 1/46
           | π = e2 | 2     | (34×34), positive, input | f = e1             | > 1/34 or ≥ 1/34
Hirv. [13] | π = e2 | 2     | 25×25, positive, input   | f = e1             | > 1/25
           | π = e2 | 2     | (20×20), positive, input | f = e1             | > 1/20

1.2 Statement of results

There are three different proofs of the basic undecidability theorem in the literature. The first proof is due to Masakazu Nasu and Namio Honda [17] from 1969.¹ The second proof, by Volker Claus [5], is loosely related to this proof. Both proofs use a reduction from the Post Correspondence Problem (PCP, see Section 1.4). A completely independent proof with a very different approach, namely a reduction from the halting problem for 2-counter machines, was given by Anne Condon and Richard J. Lipton [6] in 1989, based on ideas of Rūsiņš Freivalds [9] from 1981. The somewhat intricate history is described in [21, Section 3] as part of an extensive survey of the various undecidability proofs.

¹ This proof is commonly misattributed to Paz [20], although Paz gave credit to Nasu and Honda [20, Section IIIB.7, Bibliographical notes, p. 193].

PFA Emptiness remains undecidable under various constraints on the number of transition matrices (size of the alphabet) and their size (number of states). The first result in this direction is due to Claus [5] from 1981. Later improvements were made by Blondel and Canterini [1] and Hirvensalo [13], who were apparently unaware of [5] and concentrated on the case of two matrices (binary input alphabet). An overview of these results is shown in Table 1 below the double line.

We improve these bounds, concentrating on the minimum number of states without restricting the number of matrices, see Theorem 1. (Such results are only implicit in the proofs of [1] and [13].) Undecidability even holds for a PFA where all data except the starting distribution π are fixed.

We also get improved bounds for the 2-matrix case. For the variation where the output vector f is not restricted to a 0-1-vector, we can reduce the number of states to 6 (Theorem 2).

Our results are stated in the following theorems and summarized in Table 1.

Theorem 1.

The PFA Emptiness Problem is undecidable for PFAs with a single accepting state and the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):

  (a) ℳ consists of 4 positive doubly-stochastic transition matrices of size 7×7, with cutpoint λ = 1/7.

  (b) ℳ consists of 6 fixed positive doubly-stochastic transition matrices of size 7×7, with cutpoint λ = 1/7. The only variable input is the starting distribution π ∈ [0,1]^7.

  (c) ℳ consists of 2 positive transition matrices of size 18×18, with cutpoint λ = 1/18.

  (d) ℳ consists of 2 fixed positive transition matrices of size 28×28, with cutpoint λ = 1/28.

All statements hold also for weak inequality (≥ λ) as the acceptance criterion.

Theorem 2.

For any cutpoint λ in the interval 0 < λ < 1, the PFA Emptiness Problem with an output vector f with entries 0 ≤ f_i ≤ 1 is undecidable for PFAs with the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):

  (a) ℳ consists of 4 positive transition matrices of size 6×6. There is a fixed deterministic start state.

  (b) ℳ consists of 5 fixed positive transition matrices of size 6×6. There is a fixed starting distribution π. The only input of the problem is the vector f ∈ [0,1]^6 of output values.

All statements hold also for weak inequality (≥ λ) as the acceptance criterion.

By combining the different proof techniques, one can extend Theorem 2 to PFAs with two matrices and a fixed start state or starting distribution, analogous to Theorem 1cd, but we have not worked out these results.

Some weaker results with fixed matrices were previously obtained in a technical report [21, Theorems 2 and 4]. For example, PFA Emptiness was shown to be undecidable for 52 fixed 9×9 matrices, in the setting of Theorem 2 where the final vector f is the only input, with acceptance criterion ≥ 1/2. For acceptance with strict inequality (> 1/4), 52 fixed matrices of size 11×11 were needed. In these constructions, the PCP was derived from a universal Turing machine. For the case of 2 fixed matrices with a single accepting state, where the start distribution π is the only input, a bound of 572 states was obtained [21, Theorem 3].

Rational-Weighted Automata.

A weighted automaton over the reals or a generalized probabilistic automaton or pseudo-stochastic automaton is similar, but the start vector π, the end vector f, and the matrices Mi can be filled with arbitrary real numbers. These automata are in some ways more natural than PFAs, and, in particular with rational weights, they have recently attracted a lot of attention. Since they generalize PFAs, all our undecidability results immediately carry over to rational-weighted automata.

1.3 Contributions of this paper and relations to other problems

The most striking result of the paper is that PFA Emptiness is undecidable already for a fixed PFA, where only the starting distribution π, or the output vector f is an input. This follows from a combination of known ideas: The main observation, namely that the reduction of Matiyasevich and Sénizergues [16] from undecidable semi-Thue systems leads to instances of the Post Correspondence problem (PCP) where only the starting pair is variable, has been made by Halava, Harju, and Hirvensalo in 2007 [12, Theorem 6]. The idea of merging the first or last matrix into the starting distribution π or into the final vector f was used by Hirvensalo in 2007 [13, Step 2 in Section 3].

On the technical side, our major contribution is the reduction of the number of states to six, even for PFAs, and even while keeping the matrices fixed, for the version where the input is the (fractional) output vector f.

The PFA Emptiness Problem is a problem about matrix products. There is a great variety of problems about matrix products whose undecidability has been studied; see [7] for a recent survey of this rich area. In fact, the first problem outside the fields of mathematical logic and theory of computing that was shown to be undecidable is a problem about matrix products: Markov proved in 1947 that it is undecidable whether the semigroups generated by two finite sets of matrices contain a common element [15]; see Halava and Harju [11] for a presentation of Markov’s proof (as well as an alternative proof). Our basic approach is the same as Markov’s: to model the Post Correspondence Problem (PCP) by matrix products. Our particular technique for doing this has its roots in Paterson’s undecidability proof [19] from 1970 of the matrix mortality problem for 3×3 matrices.

Besides, we have brought to light some papers that were apparently forgotten, like Nasu and Honda’s original undecidability proof from 1969 [17], and Claus [5]. Also, our technique in Section 2.6 for making the transition matrices positive, and hence stochastic, is more streamlined and elegant than the proofs that we have seen in the literature.

1.4 The Post Correspondence Problem (PCP)

In the Post Correspondence Problem (PCP), we are given a list of pairs of words (v_1,w_1), (v_2,w_2), …, (v_k,w_k). The problem is to decide if there is a nonempty sequence i_1 i_2 ⋯ i_m of indices i_j ∈ {1,2,…,k} such that

  v_{i_1} v_{i_2} ⋯ v_{i_m} = w_{i_1} w_{i_2} ⋯ w_{i_m}.

This is one of the well-known undecidable problems. According to Neary [18], the PCP is already undecidable with as few as five word pairs.
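To make the problem statement concrete, one can search for solutions by brute force up to a bounded length (no bound suffices in general, which is exactly why the problem is undecidable). The three-pair instance below is a standard textbook-style toy example, not one of the instances used later in the paper:

```python
from itertools import product

def solve_pcp(pairs, max_len=8):
    """Search for a nonempty index sequence i1...im (returned 1-based) with
    v_{i1}...v_{im} = w_{i1}...w_{im}; only sequences up to max_len are tried,
    since the PCP is undecidable and no length bound suffices in general."""
    for m in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=m):
            v = "".join(pairs[i][0] for i in seq)
            w = "".join(pairs[i][1] for i in seq)
            if v == w:
                return [i + 1 for i in seq]
    return None

# A classic solvable toy instance: the solution 3,2,3,1 gives
# bba + ab + bba + a = bb + aa + bb + baa = "bbaabbbaa".
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
assert solve_pcp(pairs) == [3, 2, 3, 1]
```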

2 Proofs of Theorem 1a and Theorem 1b (few states)

We follow the same overall proof strategy as Claus [5], Blondel and Canterini [1] and Hirvensalo [13]: They use undecidable PCP instances with few word pairs and transform them to the Emptiness Problem for integer-weighted automata, which are then converted to PFAs. We deviate from this approach by using an automaton with fractional weights (Section 2.1). These matrices can be converted to column-stochastic matrices without the overhead of an extra state (Section 2.4).

The proof of Theorem 1a contains the main ideas. For the reduction to two matrices, we then apply a technique of Hirvensalo [13] (Section 3.1). All other results are obtained by slight variations of these methods in combination with appropriate results from the literature.

2.1 Step 1: Modeling word pairs by matrices

For a string u = u_1 u_2 ⋯ u_n of decimal digits, we denote its fractional decimal value by 0.u = Σ_{j=1}^{n} u_j 10^{−j}. For example, if u = 432100, then 0.u = 0.4321. We will take care to avoid trailing zeros, because their disappearance could cause problems.

For two strings v, w ∈ {11,12}* of digits, we define the matrix

\[
A_0(v,w)=\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.v & 10^{-|v|} & 0 & 0 & 0 & 0\\
(0.v)^2 & 2\cdot 10^{-|v|}\cdot 0.v & 10^{-2|v|} & 0 & 0 & 0\\
0.w & 0 & 0 & 10^{-|w|} & 0 & 0\\
(0.w)^2 & 0 & 0 & 2\cdot 10^{-|w|}\cdot 0.w & 10^{-2|w|} & 0\\
0.v\cdot 0.w & 10^{-|v|}\cdot 0.w & 0 & 10^{-|w|}\cdot 0.v & 0 & 10^{-|v|-|w|}
\end{pmatrix}.
\]

It is not straightforward to see, but it can be checked by a simple calculation that the matrices A0(v,w) satisfy a multiplicative law (see [22, Appendix C] for a computer check):

Lemma 3 (Multiplicative Law).
  A0(v_1,w_1) · A0(v_2,w_2) = A0(v_1 v_2, w_1 w_2)    (2)
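The computer check mentioned above can be reproduced in a few lines of exact rational arithmetic; the matrix layout below is a direct transcription of the definition of A0(v,w):

```python
from fractions import Fraction as F

def val(u):
    """Fractional decimal value 0.u of a digit string u."""
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    """The 6x6 matrix A0(v,w), rows laid out as in the display above."""
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))   # 10^(-|v|), 10^(-|w|)
    return [
        [1,     0,       0,    0,       0,    0],
        [a,     p,       0,    0,       0,    0],
        [a*a,   2*p*a,   p*p,  0,       0,    0],
        [b,     0,       0,    q,       0,    0],
        [b*b,   0,       0,    2*q*b,   q*q,  0],
        [a*b,   p*b,     0,    q*a,     0,    p*q],
    ]

def mul(X, Y):
    """Product of two square matrices of equal size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Multiplicative law (2), checked exactly on a sample pair of word pairs:
assert mul(A0("1112", "11"), A0("12", "121211")) == A0("111212", "11121211")
```

The key point, visible in the check, is that 0.(v_1 v_2) = 0.v_1 + 10^{−|v_1|} · 0.v_2, so concatenation of strings corresponds to this affine combination of their values.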

Now we transform these matrices A0(v,w) by a similarity transformation with the transformation matrix

\[
U=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
-\tfrac{1}{99}&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&\tfrac{105}{99}&0\\
0&0&0&0&0&1
\end{pmatrix},\qquad
U^{-1}=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
\tfrac{1}{99}&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&\tfrac{99}{105}&0\\
0&0&0&0&0&1
\end{pmatrix}. \qquad (3)
\]

The resulting matrices A(v,w) = U^{−1} A0(v,w) U differ in some entries of the first column and the fifth row:

\[
A(v,w)=\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.v & 10^{-|v|} & 0 & 0 & 0 & 0\\
(0.v)^2+\tfrac{1}{99}\bigl(1-10^{-2|v|}\bigr) & 2\cdot 10^{-|v|}\cdot 0.v & 10^{-2|v|} & 0 & 0 & 0\\
0.w & 0 & 0 & 10^{-|w|} & 0 & 0\\
\tfrac{99}{105}(0.w)^2 & 0 & 0 & \tfrac{198}{105}\cdot 10^{-|w|}\cdot 0.w & 10^{-2|w|} & 0\\
0.v\cdot 0.w & 10^{-|v|}\cdot 0.w & 0 & 10^{-|w|}\cdot 0.v & 0 & 10^{-|v|-|w|}
\end{pmatrix}
\]

Since the matrices were obtained by a similarity, they satisfy the same multiplicative law:

  A(v_1,w_1) A(v_2,w_2) = A(v_1 v_2, w_1 w_2)    (4)

With the vectors π1 = (1/99, 0, −1, 0, −105/99, 2) and f1 = (1,0,0,0,0,0)^T, we obtain

  π1 A(v,w) f1 = 1/99 − (0.v)² − 1/99 + 10^{−2|v|}/99 − (0.w)² + 2 · 0.v · 0.w
               = −(0.v − 0.w)² + 10^{−2|v|}/99    (5)

The sign of this expression can detect inequality of v and w, as we will presently show.

We could have taken the simpler matrix A0 with π0 = (0, 0, −1, 0, −1, 2), and this would give π0 A0(v,w) f1 = −(0.v − 0.w)². The contortions with the matrix U were necessary to generate a tiny positive term in (5). The reason for the peculiar entries 99/105 and 105/99 in (3) and in π1 will become apparent after the next transformation.
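The identity (5) can likewise be checked with exact rational arithmetic. The placement of the entries ±1/99, 105/99 and 99/105 in U, U⁻¹ and π1 below is transcribed so as to be consistent with (3) and (5); treat this as a sketch of the computation rather than authoritative data:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eye_with(entries):
    """6x6 identity with a few entries overridden."""
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
assert mul(U, Uinv) == eye_with({})

pi1 = [F(1, 99), 0, -1, 0, F(-105, 99), 2]

def A(v, w):
    return mul(mul(Uinv, A0(v, w)), U)

# Identity (5): pi1 * A(v,w) * f1 = -(0.v - 0.w)^2 + 10^(-2|v|)/99,
# where f1 = e1, so only the first column of A(v,w) matters.
for v, w in [("1112", "1112"), ("1112", "1211"), ("11", "1211")]:
    lhs = sum(pi1[i] * A(v, w)[i][0] for i in range(6))
    assert lhs == -(val(v) - val(w))**2 + F(1, 99 * 10**(2 * len(v)))
```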

2.2 Equality detection

Lemma 4 (Equality Detection Lemma).

Let v, w ∈ {11,12}*.

  1. If v = w, then (0.v − 0.w)² − 10^{−2|v|}/99 < 0.

  2. If v ≠ w, then (0.v − 0.w)² − 10^{−2|v|}/99 > 0.

In particular, (0.v − 0.w)² − 10^{−2|v|}/99 is never zero.

Proof.

The first statement is obvious.

To see the second statement, we first consider the case that one of the strings is a prefix of the other (Case A): If w is a prefix of v, then for fixed w, the smallest possible difference |0.v − 0.w| among the strings v that extend w is achieved when v = w11. Similarly, if v is a prefix of w, then the smallest possible difference is achieved when w = v11. In either case, |0.v − 0.w| ≥ 10^{−min{|v|,|w|}} · 0.11 ≥ 10^{−|v|} · 0.11. Thus, (0.v − 0.w)² ≥ 10^{−2|v|} · 0.0121 > 10^{−2|v|} · 0.010101… = 10^{−2|v|}/99.

Case B: Neither of v and w is a prefix of the other. Suppose v and w share k leading digit pairs u ∈ {11,12}^k, 0 ≤ k < |v|/2. Then one of the two strings starts with u12 and the other with u11; the smallest difference |0.v − 0.w| between two such numbers is achieved between 0.u12 and 0.u111212121212…, and thus |0.v − 0.w| > 10^{−2k} · 0.0087878… > 10^{−2k} · 0.005 ≥ 10^{−|v|+2} · 0.005 = 10^{−|v|}/2. After squaring this relation, the claim follows with an ample margin.
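The lemma can be checked by brute force over all short strings; the sketch below is exhaustive only up to three blocks, but covers both cases of the proof, including pairs of different lengths:

```python
from fractions import Fraction as F
from itertools import product

def val(u):
    """Fractional decimal value 0.u of a digit string u."""
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

# All nonempty strings of up to three blocks from {11, 12}.
strings = ["".join(p) for m in (1, 2, 3)
           for p in product(["11", "12"], repeat=m)]

for v in strings:
    for w in strings:
        d = (val(v) - val(w))**2 - F(1, 99 * 10**(2 * len(v)))
        assert d != 0                 # the expression is never zero
        assert (d < 0) == (v == w)    # negative exactly for equal strings
```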

2.3 Modeling the Post Correspondence Problem (PCP)

The multiplicative law (4) and the capacity to detect string equality are all that is needed to model the PCP.

In the PCP, it is no loss of generality to assume that the words in the pairs (v_i, w_i) use a binary alphabet, since any alphabet can be coded in binary. We recode them to the binary “alphabet” {11,12}, i.e., they become words in {11,12}* of doubled length. We form the matrices A_i = A(v_i, w_i). Multiplicativity gives the relation A_{i_1} A_{i_2} ⋯ A_{i_m} = A(v_{i_1} v_{i_2} ⋯ v_{i_m}, w_{i_1} w_{i_2} ⋯ w_{i_m}), and by Section 2.2, π1 A_{i_1} A_{i_2} ⋯ A_{i_m} f1 > 0 if and only if v_{i_1} v_{i_2} ⋯ v_{i_m} = w_{i_1} w_{i_2} ⋯ w_{i_m}, i. e., i_1 i_2 ⋯ i_m is a solution of the PCP.

Since the value 0 is excluded by Section 2.2, nothing changes if the condition “> 0” is replaced by “≥ 0”. This property will propagate through the proof, with the consequence that in the resulting theorems, it does not matter if we take > λ or ≥ λ as the acceptance criterion for the PFA. We will not mention this issue any more and work only with strict inequality.

There are two problems that we still have to solve:

  • The empty sequence (m=0) always fulfills the inequality although it does not count as a solution of the PCP.

  • The matrices Ai are not stochastic, and π1 is not a probability distribution. So far, what we have is a generalized probabilistic automaton, or rational-weighted automaton.

Turakainen [23] showed in 1969 that a generalized probabilistic automaton can be converted to a PFA without changing the recognized language (with the possible exception of the empty word), and this can even be done by adding only two more states [24, Theorem 1(i)]. Thus, by the general technique of [24], we can get a PFA with 8 states. We will use some tailored version of this method, involving nontrivial techniques, so that no extra state is needed to make the matrices stochastic and to simultaneously get rid of the empty solution.

History of ideas.

The idea of modeling the PCP by multiplication of integer matrices was pioneered by Markov [15] in 1947. He used the matrices \(\begin{pmatrix}1&0\\1&1\end{pmatrix}\) and \(\begin{pmatrix}1&1\\0&1\end{pmatrix}\), which generate a free semigroup. A different representation, closer to the one we are using, goes back to Paterson [19] in 1970, who used it to show that mortality for 3×3 matrices is undecidable.

Our matrix A0(v,w) is a variation of the integer matrices that were proposed by Volker Claus [5] in 1981, and again by Blondel and Canterini [1, p. 235] in 2003, in the very context of constructing small undecidable instances of the PFA Emptiness Problem. These matrices extend Paterson’s matrices to larger matrices that can produce quadratic terms. They use positive powers of the base 10 and integer decimal values (u)_{10} of the strings u instead of fractional values 0.u. Such matrices satisfy a multiplicative law such as (2), but with a reversal of the factors. In fact, the method works equally well for any other radix instead of 10.²

² “The notation is quaternary, decimal, etc., according to taste.” (Paterson [19]). Claus [5] used radix 3.

Our novel idea is to use this construction with negative powers of 10, leading to entries that are roughly at the same scale as a stochastic matrix, thus facilitating further transformations.

2.4 Step 2: Making column sums 1

We apply another similarity transformation, using the matrix

\[
V=\begin{pmatrix}
1&1&1&1&1&1\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix},\qquad
V^{-1}=\begin{pmatrix}
1&-1&-1&-1&-1&-1\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix},
\]

but this time, we also transform the vectors π1 and f1, leaving the overall result unchanged:

  π1 A_{i_1} A_{i_2} ⋯ A_{i_m} f1 = (π1 V)(V^{−1} A_{i_1} V)(V^{−1} A_{i_2} V) ⋯ (V^{−1} A_{i_m} V)(V^{−1} f1)
                                  = π2 B_{i_1} B_{i_2} ⋯ B_{i_m} f2    (6)

We analyze the effect of the similarity transform for a matrix of the general form

\[
A=\begin{pmatrix}
1&0&0&0&0&0\\
a_{21}&a_{22}&0&0&0&0\\
a_{31}&a_{32}&a_{33}&0&0&0\\
a_{41}&a_{42}&a_{43}&a_{44}&0&0\\
a_{51}&a_{52}&a_{53}&a_{54}&a_{55}&0\\
a_{61}&a_{62}&a_{63}&a_{64}&a_{65}&a_{66}
\end{pmatrix}.
\]

The transformed matrix is

\[
B=V^{-1}AV=\begin{pmatrix}
K & K_{12} & K{-}a_{33}{-}a_{43}{-}a_{53}{-}a_{63} & K{-}a_{44}{-}a_{54}{-}a_{64} & K{-}a_{55}{-}a_{65} & K{-}a_{66}\\
a_{21} & a_{21}{+}a_{22} & a_{21} & a_{21} & a_{21} & a_{21}\\
a_{31} & a_{31}{+}a_{32} & a_{31}{+}a_{33} & a_{31} & a_{31} & a_{31}\\
a_{41} & a_{41}{+}a_{42} & a_{41}{+}a_{43} & a_{41}{+}a_{44} & a_{41} & a_{41}\\
a_{51} & a_{51}{+}a_{52} & a_{51}{+}a_{53} & a_{51}{+}a_{54} & a_{51}{+}a_{55} & a_{51}\\
a_{61} & a_{61}{+}a_{62} & a_{61}{+}a_{63} & a_{61}{+}a_{64} & a_{61}{+}a_{65} & a_{61}{+}a_{66}
\end{pmatrix}
\]

with K = 1 − a_{21} − a_{31} − a_{41} − a_{51} − a_{61} and K_{12} = K − a_{22} − a_{32} − a_{42} − a_{52} − a_{62}.

Lemma 5.

  1. The matrix B = V^{−1} A V has column sums 1.

  2. For v ≠ ε and w ≠ ε, the matrix V^{−1} A(v,w) V is positive, and therefore column-stochastic.

Proof.

The first statement can be easily checked directly, but it has a systematic reason:

  (1,1,1,1,1,1) V^{−1} A V = (1,0,0,0,0,0) A V = (1,0,0,0,0,0) V = (1,1,1,1,1,1).

The second statement is not needed for Theorem 1, because positivity is established anyway in Step 4, after destroying it in Step 3, but we need it for Theorem 2. So let us check it: The only entries that are in danger of becoming negative are the entries of the first row. The two entries a_{21} = 0.v and a_{41} = 0.w of the matrix A(v,w) can be as large as 0.121212… < 0.15; all other entries are safely below 0.05. Thus, even the “most dangerous” candidate K_{12}, where 10 of the entries a_{ij} are subtracted from 1, cannot be zero or negative.

It can be checked that rows 2–6 are positive, because the first column of A(v,w), which is positive, has been added to every other column.

In the transformation from π1 to π2 = π1 V, the first entry is added to all other entries. Thus, the vector π1 = (1/99, 0, −1, 0, −105/99, 2) becomes π2 = (1/99, 1/99, −1 + 1/99, 1/99, −105/99 + 1/99, 2 + 1/99), whose entries sum to zero:

  π2 (1,1,1,1,1,1)^T = π1 V (1,1,1,1,1,1)^T = π1 (6,1,1,1,1,1)^T = 0    (7)

It was for this reason that the entry −105/99 of π1 and the corresponding entries of the transformations (3) were chosen.

The output vector f is unchanged by the transformation: f2 = V^{−1} f1 = f1.
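Both statements of Lemma 5 and the zero-sum property (7) of π2 can be verified computationally. This sketch re-transcribes the matrices of (3) and the vector π1 with the sign conventions that make (5) work, which is an assumption of the sketch:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eye_with(entries):
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
V    = eye_with({(0, j): F(1) for j in range(1, 6)})
Vinv = eye_with({(0, j): F(-1) for j in range(1, 6)})
pi1  = [F(1, 99), 0, -1, 0, F(-105, 99), 2]

v, w = "1211", "111212"
B = mul(Vinv, mul(mul(mul(Uinv, A0(v, w)), U), V))

# Lemma 5: column sums 1, and positivity for nonempty v, w.
assert all(sum(B[i][j] for i in range(6)) == 1 for j in range(6))
assert all(B[i][j] > 0 for i in range(6) for j in range(6))

# (7): pi2 = pi1 * V sums to 0, and f2 = Vinv * f1 = f1.
pi2 = [sum(pi1[i] * V[i][j] for i in range(6)) for j in range(6)]
assert sum(pi2) == 0
assert [row[0] for row in Vinv] == [1, 0, 0, 0, 0, 0]
```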

2.5 Step 3: Making row sums 1 with an extra state

By adding an extra column r_i, we now make all row sums equal to 1. We also add an extra row, resulting in a 7×7 matrix

\[
C_i=\begin{pmatrix}B_i & r_i\\ 0 & 1\end{pmatrix}. \qquad (8)
\]

The entries of r_i may be negative. They lie in the range −5 ≤ r ≤ 1. All column sums are still 1: Since the row sums are 1, the total sum of the entries is 7. Therefore, since the first six column sums are 1, the sum of the last column must also be 1.

The vectors π2 and f2 are extended to vectors π3 and f3 of length 7 by appending a 0, but they are otherwise unchanged. Thus, the extra column and row of C_i play no role for the value of the product (6), which remains unchanged:

  π3 C_{i_1} C_{i_2} ⋯ C_{i_m} f3 = π2 B_{i_1} B_{i_2} ⋯ B_{i_m} f2

2.6 Step 4: Making the matrices positive, and hence stochastic

Let J be the doubly-stochastic 7×7 transition matrix of the “completely random transition” with all entries 1/7. Then, with α = 0.01, we form the matrices D_i := (1 − α)J + αC_i. The constant α is small enough to ensure that D_i > 0. Hence the matrices D_i are doubly-stochastic.

If we expand the product π3 Π_{j=1}^{m} D_{i_j} = π3 Π_{j=1}^{m} ((1−α)J + αC_{i_j}), we get a sum of 2^m terms, each containing a product of m of the matrices J or C_i. We find that all terms containing the factor J vanish. The reason is that, since C_i has row and column sums 1, J C_i = C_i J = J. Moreover, π3 J = 0 by (7). It follows that

  π3 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 = α^m · π3 C_{i_1} C_{i_2} ⋯ C_{i_m} f3

The factor αm plays no role for the sign, and hence,

  π3 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 > 0    (9)

if and only if i_1 i_2 ⋯ i_m is a solution of the PCP.

2.7 Step 5: Turning π into a probability distribution

The start vector π3 has sum 0 and contains negative entries, but none of its entries is smaller than −2. We form π4 = ((2,2,2,2,2,2,2) + π3)/14. It is positive and has sum 1. The effect of substituting π3 by π4 in (9) is that the result is scaled by the positive factor 1/14 and increased by 1/7. The reason is that (1,1,1,1,1,1,1) D_{i_1} D_{i_2} ⋯ D_{i_m} = (1,1,1,1,1,1,1), since the matrices D_i are doubly-stochastic, and in particular, column-stochastic, and therefore (2,2,2,2,2,2,2)/14 · D_{i_1} D_{i_2} ⋯ D_{i_m} · f3 = 2/14 = 1/7.

In summary, we have now constructed a true PFA with seven states that models the PCP:

  π4 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 > 1/7    (10)

if and only if i_1 i_2 ⋯ i_m is a (possibly empty) solution of the PCP.
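The whole chain of Steps 1–5 can be exercised end to end on a toy list of word pairs (not a real PCP instance from the literature): the criterion (10) should agree with equality of the concatenated strings. As before, the transcription of U, U⁻¹ and π1 is an assumption of this sketch:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def eye_with(entries):
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
V    = eye_with({(0, j): F(1) for j in range(1, 6)})
Vinv = eye_with({(0, j): F(-1) for j in range(1, 6)})
pi1  = [F(1, 99), 0, -1, 0, F(-105, 99), 2]
ALPHA = F(1, 100)

def D(v, w):
    """Steps 1-4: B = Vinv Uinv A0 U V, pad to 7x7 (row sums 1), mix with J."""
    B = mul(Vinv, mul(mul(mul(Uinv, A0(v, w)), U), V))
    C = [row + [1 - sum(row)] for row in B] + [[F(0)] * 6 + [F(1)]]
    return [[(1 - ALPHA) * F(1, 7) + ALPHA * C[i][j] for j in range(7)]
            for i in range(7)]

pi2 = [sum(pi1[i] * V[i][j] for i in range(6)) for j in range(6)]
pi4 = [(2 + x) / 14 for x in pi2 + [F(0)]]   # Step 5: a probability distribution

def crit(pairs, seq):
    """pi4 * D_{i1} ... D_{im} * f3 with f3 = e1; compare with cutpoint 1/7."""
    x = pi4
    for i in seq:
        x = mul([x], D(*pairs[i]))[0]
    return x[0]

pairs = [("11", "1111"), ("1111", "11")]
for seq in [[0], [1], [0, 0], [0, 1], [1, 0]]:
    vs = "".join(pairs[i][0] for i in seq)
    ws = "".join(pairs[i][1] for i in seq)
    assert (crit(pairs, seq) > F(1, 7)) == (vs == ws)
```

Because all arithmetic is exact, the comparison with the cutpoint 1/7 is free of rounding issues.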

2.8 Using a small PCP

We base our proof on the undecidability of the PCP with 5 word pairs, as established by Turlough Neary [18, Theorem 11] in 2015. Neary constructed PCP instances with five pairs (v1,w1), (v2,w2), (v3,w3), (v4,w4), (v5,w5) that have the following property: Every solution necessarily starts with the pair (v1,w1) and ends with (v5,w5). We can also assume that the end pair (v5,w5) is used nowhere else. (More precisely, in every primitive solution (which is not a concatenation of smaller solutions), the end pair (v5,w5) cannot appear anywhere else than in the final position.) However, the start pair (v1,w1) is also used in the middle of the solutions. (This multipurpose usage of the start pair is one of the devices to achieve such a remarkably small number of word pairs.)³

³ The proof of Theorem 11 in Neary [18] contains an error, but this error can be fixed: His PCP instances encode binary tag systems. When showing that the PCP solution must follow the intended patterns of the simulation of the binary tag system, Neary [18, p. 660] needs to show that the end pair (v5,w5) = (10^β1111, 1111) cannot be used except to bring the two strings to a common end. He claims that a block 1111 cannot appear in the encoded string because in u (the unencoded string of the binary tag system, which is described in Lemma 9) we cannot have two c symbols next to each other. This is not true. The paper contains plenty of examples, and they contradict this claim; for example, the string u in (7) [18, p. 657] contains seven c’s in a row. The mistake can be fixed by taking a longer block of 1s: Looking at the appendants in Lemma 9, it is clear that every block of length |u|+1 must contain a symbol b. Thus, the pair (v5,w5) = (10^β1^{|u|+99}, 1^{|u|+99}) will work as end pair.

Reversing the words.

For our construction it is preferable to have the word pair (v5,w5) at the beginning. We thus reverse all words. Any solution sequence i_1 i_2 ⋯ i_m for the original problem must also be reversed to i_m i_{m−1} ⋯ i_1, but this does not affect the solvability of the PCP. Thus, we work with PCP instances of the form (v1^R,w1^R), (v2^R,w2^R), …, (v5^R,w5^R) with the following property: Every solution sequence i_1 i_2 ⋯ i_m must start with the pair (v5^R,w5^R), and the pair (v5^R,w5^R) cannot be used anywhere else: i_1 = 5 and 1 ≤ i_j ≤ 4 for j = 2, …, m.

2.9 Step 6. Merging the leftmost matrix into the starting distribution

We apply the above construction steps to the word pairs (v1^R,w1^R), …, (v5^R,w5^R), leading to five matrices D1, D2, D3, D4, D5. Since the leftmost matrix must be D5 in any solution, we can combine this matrix D5 with the starting distribution π4:

  π4 D5 D_{i_2} ⋯ D_{i_m} f3 = π5 D_{i_2} ⋯ D_{i_m} f3,

leading to a new starting distribution π5:=π4D5. The matrix D5 can be removed from the pool of matrices, leaving only 4 matrices. We have simultaneously eliminated the empty solution with a product of m=0 matrices. This concludes the proof of Theorem 1a. ∎

2.10 Proof of Theorem 1b (fixed transition matrices) by using a PCP with only one variable word pair

Matiyasevich and Sénizergues [16] constructed PCP instances with seven word pairs (v1,w1),(v2,w2),(v3,w3),(v4,w4),(v5,w5),(v6,w6),(v7,w7) that have the following property: Every solution necessarily starts with the pair (v1,w1) and ends with (v2,w2). Both the start pair (v1,w1) and the end pair (v2,w2) can be assumed to appear nowhere else, in the sense described in Section 2.8, i. e., apart from the possibility to concatenate solutions to obtain longer solutions. Matiyasevich and Sénizergues [16] used a reduction, due to Claus [4], from the individual accessibility problem (or individual word problem) for semi-Thue systems. They showed that the individual accessibility problem is already undecidable for a particular semi-Thue system with 3 rules [16, Theorem 3]. Halava, Harju, and Hirvensalo [12, Theorem 6] observed an important consequence of this: One can fix all words except v1, and leave only v1 as an input to the problem, and the PCP is still undecidable.

From these word pairs, we form the matrices A_i = A(v_i, w_i) from the words without reversal. Following the same steps as above, we eventually arrive at corresponding matrices D1, …, D7. We merge D1 with π4 into a new starting distribution π5 := π4 D1. We are left with a pool ℳ = {D2, …, D7} of six fixed matrices. The only variable input is the starting distribution π5. This concludes the proof of Theorem 1b. ∎

3 Proofs of Theorem 1c and Theorem 1d (binary input)

3.1 Reduction to two matrices

The following lemma and its proof are extracted from Step 3 in Hirvensalo [13, Section 3]. We denote by ϕ(u) the acceptance probability of a (generalized) PFA for an input u, i.e., the value of the product in the expression (1).

Lemma 6.

Let A = (π, ℳ, f) be a generalized PFA with k = |Σ| ≥ 3 matrices M_i of size d×d, such that the first row of each matrix M_i is (1, 0, …, 0).

Then one can obtain a generalized PFA A′ = (π′, ℳ′, f′) over the two-symbol alphabet {a,b}, with 2 matrices M_a and M_b of dimension (k−1)(d−1)+1, such that for every word u′ ∈ {a,b}* there exists a word u ∈ Σ* with

  ϕ′(u′) = ϕ(u),    (11)

and conversely, for every word u ∈ Σ* there exists a word u′ ∈ {a,b}* with (11).

If the given matrices M_i are nonnegative, then so are M_a and M_b. If the given matrices M_i are stochastic, then so are M_a and M_b. If π is a probability distribution, then so is π′. The entries of f′ are taken from the entries of f.

Thus, if A is a PFA, then A′ is a PFA as well.

The lemma is not specific about the word u or u′ whose existence is guaranteed. Nevertheless, regardless of how these words are found, the statement is sufficient in the context of the emptiness question.

However, we can be explicit about u and u′: The construction uses a binary encoding τ : Σ → {a,b}* with the prefix-free codewords {b, ab, aab, …, a^{k−2}b, a^{k−1}}. Then, we can take u′ = τ(u), because

  ϕ(u) = ϕ′(τ(u)).

For words u′ that are not of the form τ(u), (11) holds for the longest word u such that τ(u) is a prefix of u′. In other words, u is the decoding of the longest decodable prefix of u′.
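The encoding τ and the decoding of the longest decodable prefix are easy to make concrete for k = 5. The helper below is a sketch; the greedy decoder works because the codewords are prefix-free:

```python
# The binary encoding tau with codewords b, ab, aab, aaab, aaaa (k = 5).
code = {1: "b", 2: "ab", 3: "aab", 4: "aaab", 5: "aaaa"}

def tau(word):
    """Encode a word over {1,...,5} into {a,b}*."""
    return "".join(code[i] for i in word)

def decode_longest_prefix(u):
    """Greedily decode the longest decodable prefix of u, dropping a
    trailing partial codeword a, aa, or aaa if present."""
    out, run = [], 0
    for c in u:
        if c == "a":
            run += 1
            if run == 4:       # codeword aaaa completed: symbol 5
                out.append(5)
                run = 0
        else:                  # a 'b' closes the codeword a^run b
            out.append(run + 1)
            run = 0
    return out                 # a leftover run of 1..3 a's is discarded

assert decode_longest_prefix(tau([2, 5, 1, 4])) == [2, 5, 1, 4]
assert decode_longest_prefix("aabab" + "aa") == [3, 2]   # partial codeword dropped
```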

Proof.

The procedure is easiest to describe if we assume that A is a PFA. Then we can think of A′ as an automaton that decodes the input u′ ∈ {a,b}* and carries out a transition of A whenever it has read a complete codeword. In addition to the state q ∈ {1,…,d} of A, the automaton A′ needs to maintain a counter i in the range 0 ≤ i ≤ k−2 for the number of a’s of the current partial codeword. Whenever A′ reads a b, or if it has read the (k−1)-st a, A′ performs the appropriate random transition and resets the counter. The number of states of A′ is d × (k−1).

The fact that state 1 is an absorbing state allows a shortcut: If we are in state 1, we can stop maintaining the counter i. Thus, only states 2, …, d need to be multiplied by k−1, and the resulting number of states reduces to (d−1)(k−1)+1.

We now describe the construction of A′ in terms of transition matrices. This construction is valid also when A is a generalized PFA, where the above description in terms of random transitions makes no sense. To make the description more concrete, we illustrate it with k = 5 matrices M1, …, M5 and the corresponding binary codewords b, ab, aab, aaab, aaaa.

We split the d×d matrices M_i and the vectors π and f into blocks of size 1+(d−1):

\[
M_i=\begin{pmatrix}1 & 0\\ c_i & \hat C_i\end{pmatrix},\qquad
\pi=(p_1,\hat\pi)^T,\qquad
f=\begin{pmatrix}f_1\\ \hat f\end{pmatrix}.
\]

We define new transition matrices and start and end vectors in block form with block sizes 1+(d−1)+(d−1)+(d−1)+(d−1) as follows:

\[
M_{\mathtt a}=\begin{pmatrix}
1&0&0&0&0\\
0&0&I&0&0\\
0&0&0&I&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0
\end{pmatrix},\qquad
M_{\mathtt b}=\begin{pmatrix}
1&0&0&0&0\\
c_1&\hat C_1&0&0&0\\
c_2&\hat C_2&0&0&0\\
c_3&\hat C_3&0&0&0\\
c_4&\hat C_4&0&0&0
\end{pmatrix},\qquad
\pi'=(p_1,\hat\pi,0,0,0)^T,
\quad\text{and}\quad
f'=\begin{pmatrix}f_1\\ \hat f\\ \hat f\\ \hat f\\ \hat f\end{pmatrix}.
\]

From the sequence of powers of M_a,

\[
(M_{\mathtt a})^2=\begin{pmatrix}
1&0&0&0&0\\
0&0&0&I&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0\\
c_5&0&\hat C_5&0&0
\end{pmatrix},\qquad
(M_{\mathtt a})^3=\begin{pmatrix}
1&0&0&0&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0\\
c_5&0&\hat C_5&0&0\\
c_5&0&0&\hat C_5&0
\end{pmatrix},
\]

we can recognize the pattern of development. We can then work out the matrices M_b, M_a M_b, M_a M_a M_b, M_a M_a M_a M_b, and M_a M_a M_a M_a and check that their first two rows of blocks are of the form

\[
\begin{pmatrix}
1&0&0&0&0\\
c_i&\hat C_i&0&0&0
\end{pmatrix}
\]

for i = 1, 2, 3, 4, 5, having the original matrices M_i in their upper-left corner and otherwise zeros in the first two rows of blocks. Thus, they simulate the original automaton on the states 1, …, d: It is easy to establish by induction that multiplying the initial distribution vector π′ with a sequence of such matrices will produce a vector x′ of the form

  x′ = (x_1, x̂, 0, 0, 0)    (12)

whose first d entries x = (x_1, x̂) coincide with the entries of the corresponding vector produced with the original start vector π and the original matrices M_i. If x′ is multiplied with f′, the result x_1 f_1 + x̂^T f̂ is the same as with x and the original vector f.

One technicality remains to be discussed: Some “unfinished” words in {𝚊,𝚋}* do not factor into codewords but end in a partial codeword 𝚊, 𝚊𝚊, or 𝚊𝚊𝚊. To analyze the corresponding matrix products, we look at the powers (M𝚊)^i, for i=1,2,3. They have the form

$$\begin{pmatrix}1&0&0&0&0\\0&0&I&0&0\\ *&*&*&*&*\end{pmatrix},\quad \begin{pmatrix}1&0&0&0&0\\0&0&0&I&0\\ *&*&*&*&*\end{pmatrix},\quad\text{and}\quad \begin{pmatrix}1&0&0&0&0\\0&0&0&0&I\\ *&*&*&*&*\end{pmatrix} \qquad (13)$$

and therefore, multiplying the vector x′ from (12) with them yields (x1 0 x̂ 0 0), (x1 0 0 x̂ 0), and (x1 0 0 0 x̂), respectively. If this is multiplied with f′, the result is the same as with the vector x′ in (12). Thus, as claimed after the statement of the lemma, input sequences whose decoding process leaves a partial codeword 𝚊^i for 1 ≤ i ≤ k−2 produce the same value as if that partial codeword were omitted.⁴
⁴ Hirvensalo [13, p. 314] defined the vector f′ (𝐲3 in his notation) “analogously” to the vector π′ (𝐱3 in his notation), thus padding it with zeros. We see that with this definition, the result for incomplete inputs is x1f1, which, in the general setting of our lemma, has no predictable relation to the meaningful value x1f1 + x̂ᵀf̂. In Hirvensalo's case, x1f1 can be shown by a delicate argument to be 0; see [21, Section 8.2.3, footnote 31] in the updated version on the author's homepage.
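The block construction and the claims about incomplete inputs can be checked numerically. The following sketch (our own; random matrices with d=3, k=5, and state 1 made absorbing by construction) assembles M𝚊 and M𝚋 from M1,…,M5 and compares the two automata, including trailing partial codewords:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 5
e = d - 1                     # size of the "hatted" blocks
n = 1 + (k - 1) * e           # dimension of M_a and M_b

# random matrices M_1..M_5, each with absorbing state 1 (first row = e_1)
Ms = [rng.random((d, d)) for _ in range(k)]
for M in Ms:
    M[0] = 0.0
    M[0, 0] = 1.0

def blk(j):                   # index range of block j, for j = 0..k-1
    return slice(0, 1) if j == 0 else slice(1 + (j - 1) * e, 1 + j * e)

Ma, Mb = np.zeros((n, n)), np.zeros((n, n))
Ma[0, 0] = Mb[0, 0] = 1.0
for j in range(1, k - 1):     # row block j of M_a: identity in column block j+1
    Ma[blk(j), blk(j + 1)] = np.eye(e)
Ma[blk(k - 1), blk(0)] = Ms[-1][1:, :1]   # last row block of M_a: (c_5, C^_5, 0, ...)
Ma[blk(k - 1), blk(1)] = Ms[-1][1:, 1:]
for j in range(1, k):         # row block j of M_b: (c_j, C^_j, 0, ...)
    Mb[blk(j), blk(0)] = Ms[j - 1][1:, :1]
    Mb[blk(j), blk(1)] = Ms[j - 1][1:, 1:]

pi, f = rng.random(d), rng.random(d)
pi2 = np.zeros(n); pi2[:d] = pi                     # pi' = (p_1, pi^, 0, 0, 0)
f2 = np.concatenate([f[:1]] + [f[1:]] * (k - 1))    # f' = (f_1, f^, f^, f^, f^)

code = {1: "b", 2: "ab", 3: "aab", 4: "aaab", 5: "aaaa"}
word = [3, 1, 5, 2, 4]
orig = pi.copy()
for i in word:                # run the original automaton
    orig = orig @ Ms[i - 1]
new = pi2.copy()
for c in "".join(code[i] for i in word):            # run the binary automaton
    new = new @ (Ma if c == "a" else Mb)

assert np.isclose(orig @ f, new @ f2)               # same value on complete inputs
assert np.isclose(orig @ f, new @ Ma @ f2)          # trailing partial codeword "a"
assert np.isclose(orig @ f, new @ Ma @ Ma @ f2)     # trailing partial codeword "aa"
```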

3.2 Proof of Theorem 1c

Section 3.1 blows up the number of states by a factor that depends on the number of matrices. Thus, it is advantageous to apply it after merging the start matrix into π, when the number of matrices is reduced.

So we start with the 5-pair instances of Neary [18] and construct five matrices Ai=A(viR,wiR) from the reversed pairs. We then combine A5 with the starting distribution π1 into π2=π1A5, leaving k=4 matrices A1,A2,A3,A4 of dimension d=6.

We cannot apply the transformation of Step 2, because it changes the first row, and state 1 would then no longer be absorbing, which precludes the application of Section 3.1.

Thus, we apply Section 3.1 to the matrices A1,A2,A3,A4, resulting in two matrices M𝚊 and M𝚋 of dimension (k−1)(d−1)+1=16. The new start vector π3 is π2 padded with zeros, and the end vector f is still the first unit vector (of dimension 16).

We would now like to use the transformation of Step 2 to make the matrices column-stochastic. However, since we have replaced the initial start vector π1 by π2, the entries of the start vector after the transformation would no longer sum to 0, a property that is crucial for eventually making the matrices stochastic in Step 4.

Therefore we have to achieve column sums 1 in a different way, with an adaptation of the method of Step 3 (Section 2.5), adding an extra state.

3.3 Step 2: Making column sums 1 with an extra state

We add an extra row si and an extra column to each matrix, ensuring that all column sums become 1.

$$B_1=\begin{pmatrix}M_{\mathtt{a}}&0\\ s_1&1\end{pmatrix},\qquad B_2=\begin{pmatrix}M_{\mathtt{b}}&0\\ s_2&1\end{pmatrix} \qquad (14)$$

We now have two 17×17 matrices B1,B2 with column sums 1. The remaining steps (Steps 3–5) are as before, with the appropriate adaptations, and they add another row and column. (We may have to choose a smaller constant α in Step 4.) This concludes the proof of Theorem 1c. ∎
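The padding in (14) can be illustrated with a stand-in matrix (a sketch under our own assumptions; note that the entries of the extra row may be negative — making all entries nonnegative is deferred to the later steps):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16                                    # dimension of M_a after Section 3.1
Ma = rng.random((n, n))                   # stand-in for the actual matrix M_a
s1 = 1.0 - Ma.sum(axis=0)                 # extra row s_1: forces column sums 1
B1 = np.block([[Ma, np.zeros((n, 1))],
               [s1[None, :], np.ones((1, 1))]])
assert B1.shape == (17, 17)
assert np.allclose(B1.sum(axis=0), 1.0)   # all 17 column sums are now 1
```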

3.4 Proof of Theorem 1d (two fixed matrices)

This is obtained by adapting the proof of Theorem 1c in the same way as for Theorem 1b (Section 2.10). Instead of Neary's 5-pair instances, we take the instance (v1,w1),…,(v7,w7) of Matiyasevich and Sénizergues and construct seven matrices Ai=A(vi,wi) from these pairs (without reversing the words vi and wi). We then combine A1, which must be the first matrix in the product, with the starting distribution π1 into π2=π1A1. The remaining matrices A2,A3,…,A7 are fixed. Only π2 is an input to the problem.

The remainder of the proof is the same as in Theorem 1c. We apply Section 3.1 to k=6 matrices Ai of dimension d=6, resulting in two fixed matrices M𝚊 and M𝚋 of dimension (k−1)(d−1)+1=26. The conversion to a PFA requires two more states, and this leads to Theorem 1d. ∎

4 Proof of Theorem 2 (variable end vector 𝒇)

One can relax the requirement that f is a 0-1-vector and allow arbitrary values. If the values are in the interval [0,1], we can think of the entry fi as a probability in a final acceptance decision when the PFA is in state i after the input has been read. Another possibility is that fi represents a prize or reward that is gained when the process stops in state i. Then fi does not need to be restricted to the interval [0,1]. In this view, instead of the acceptance probability, we compute the expected reward of the automaton, as in game theory. We call fq the output values, and f the output vector or the end vector, in analogy to the start vector π.

Since the transition matrices Bi after Step 2 are column-stochastic and positive by Section 2.4, we transpose them to produce stochastic matrices, which can be directly used as transition matrices. This will reverse the order of the matrices in the product and swap the start vector with the end vector:

$$\pi_2 B_{i_1} B_{i_2}\cdots B_{i_m} f_2 = f_2^T B_{i_m}^T B_{i_{m-1}}^T\cdots B_{i_1}^T \pi_2^T = \pi_6 B_{i_m}^T B_{i_{m-1}}^T\cdots B_{i_1}^T f_6 \qquad (15)$$
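The identity (15) is just the reversal rule for transposed products; a quick numerical check (our own sketch with random matrices standing in for the Bi):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
Bs = [rng.random((d, d)) for _ in range(4)]    # stand-ins for B_{i_1}, ..., B_{i_m}
pi2, f2 = rng.random(d), rng.random(d)
lhs = pi2 @ Bs[0] @ Bs[1] @ Bs[2] @ Bs[3] @ f2
rhs = f2 @ Bs[3].T @ Bs[2].T @ Bs[1].T @ Bs[0].T @ pi2   # reversed, transposed
assert np.isclose(lhs, rhs)
```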

We may prefer to have the end vector f in the interval [0,1]. Thus, we replace f6=π2T by f7=((2,2,2,2,2,2)+π2)T/12, in analogy to Section 2.7. This vector is positive and has sum 1; in particular, the entries lie between 0 and 1. The effect is to divide the value of (15) by 12 and increase it by 1/6.
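The effect of this replacement can be verified directly; the sketch below (our own) assumes, as in the text, that the entries of π2 sum to 0 and that the state vector x is a probability distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
pi2 = rng.random(d); pi2 -= pi2.mean()   # entries of pi_2 sum to 0
x = rng.random(d); x /= x.sum()          # a probability distribution over states
f6 = pi2                                  # end vector f_6 = pi_2^T
f7 = (2 * np.ones(d) + pi2) / 12          # the replacement end vector f_7

assert np.isclose(f7.sum(), 1.0)          # f_7 is a probability vector
assert np.isclose(x @ f7, 1/6 + (x @ f6) / 12)   # value divided by 12, shifted by 1/6
```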

We take the 5-pair instances of Neary [18] (Section 2.8) and construct five matrices Ai=A(viR,wiR) from the reversed pairs. The words vi and wi in these instances are nonempty, as required by Section 2.4 [18, Proof of Theorem 11]. We convert them to column-stochastic matrices Bi and use the transposed matrices BiT. We know that (v5R,w5R) must be used at the beginning of the PCP solution i1 i2 ⋯ im (i1=5) and nowhere else. Hence B5T must be used at the end of the product in (15), and we can merge it with f7 into an end vector f8=B5Tf7. The four remaining matrices B1T,B2T,B3T,B4T form the set of transition matrices of the PFA. The starting distribution π6=f2T is a unit vector, i.e., there is a single deterministic start state. This proves Theorem 2a for λ=1/6.

4.1 Changing the cutpoint 𝝀 by manipulating the end vector 𝒇

When the end vector f of a PFA is under control, one can change the cutpoint λ to any desired value in the open interval between 0 and 1: Adding a constant K to all output values fi will increase the expected output value by K, and scaling by a positive constant will affect the expected output value in the same way.

Thus, by changing all fi to αfi for 0<α<1, one may decrease λ to any positive value. By changing all fi to 1−α+αfi for 0<α<1, one may increase λ to any value less than 1. If f is not constrained to the interval [0,1], one can reach any real value λ.
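Both manipulations are affine changes of the output vector; a minimal numerical check (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.random(5); x /= x.sum()    # distribution over states after reading the input
f = rng.random(5)                  # output values in [0, 1]
alpha = 0.25

# scaling f by alpha scales the expected output (and hence the cutpoint) by alpha
assert np.isclose(x @ (alpha * f), alpha * (x @ f))
# the shift 1 - alpha + alpha*f adds 1 - alpha, because the entries of x sum to 1
assert np.isclose(x @ (1 - alpha + alpha * f), (1 - alpha) + alpha * (x @ f))
```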

4.2 Proof of Theorem 2b (fixed transition matrices)

We would like to apply the approach of Theorem 2a to the seven word pairs (v1,w1),…,(v7,w7) from Section 2.10. They should fulfill the assumption of Section 2.4 that none of the words is empty. However, one of the rules of the 3-rule semi-Thue system of Matiyasevich and Sénizergues, the rule xx̄ → ε, contains an empty word; see [16, System S5, (23)]. The reduction of Claus [4] (see also [12, Theorem 4]) would translate this into a PCP pair with an empty word.

Luckily, this reduction can be patched to yield seven pairs of nonempty words vi and wi. We refer to [22, Appendix B] for details.

As above, we form from these pairs the stochastic matrices B1T,…,B7T. Only B1T is a variable matrix; all other matrices are fixed. In the product π6BimTBim−1T⋯Bi1Tf7, we merge the first matrix with π6 into π8:=π6BimT=π6B2T, which becomes the fixed start vector, and the last matrix into f8:=Bi1Tf7=B1Tf7, which becomes the only input to the problem. The remaining five matrices B3T,B4T,B5T,B6T,B7T form the fixed set of transition matrices. This proves Theorem 2b for λ=1/6, and as above, it can be changed to any cutpoint λ. ∎

5 Outlook

The natural question is to ask for the smallest number of states for which PFA Emptiness is undecidable. Even if we consider generalized PFAs (rational-weighted automata), we could not reduce this number below 6. Claus [5, Theorem 7 and Corollary, p. 155] showed that the emptiness question can be decided for PFAs with two states.

A matrix dimension of six seems to be the minimum required in order to carry enough information to model concatenation of word pairs and allow testing for equality by a quadratic expression like (5), even in weighted automata. Undecidability for five or perhaps even three states would require some completely different (perhaps geometric) approach.

It would be an interesting exercise to trace back the undecidability proof of Matiyasevich and Sénizergues [16] to its origins and explicitly work out the fixed matrices of Theorem 1b or 2b. For one of the weaker results mentioned after Theorem 2, [21, Theorem 4b], one particular matrix from a set of 52 fixed 11×11 matrices is shown in [21, Section 7.3].

PFAs can have other merits than just a small number of states and input symbols. We discuss some of these criteria.

Positive and doubly-stochastic transition matrices.

In our results, the transition matrices can always be assumed to be strictly positive and sometimes also doubly-stochastic. Whenever this is the case, we have mentioned it in Theorems 1 and 2.

5.1 Obtaining a substantial gap between acceptance and rejection

As seen in formula (5), the acceptance probability barely rises above the threshold 0 when the input represents a solution of the PCP. (This tiny gap is further reduced by multiplying it with α^m in Step 4.) Thus, our constructions depend on the capability of a PFA to “detect” minute fluctuations of the acceptance probability above the cutpoint. This statement applies to all undecidability proofs in the Nasu–Honda line of descent.

By contrast, the proof of Condon and Lipton [6] from 1989 gives a more robust result, see also [21, Section 4]: For any ε>0, it yields a PFA such that the largest acceptance probability is either 1−ε or ε, and the problem to detect which of the two cases holds is undecidable. Undecidability is derived from the halting problem for 2-counter machines, and the number of states of the PFA is beyond control.

Luckily, our results can be strengthened to even surpass the strong separation achieved in the Condon–Lipton construction, by the following gap amplification technique of Gimbert and Oualhadj, which we formulate in slightly generalized form.

Theorem 7 (Gimbert and Oualhadj 2010 [10]).

We are given a PFA A with input alphabet Σ and d states, with an output vector f ∈ [0,1]^d. We denote its acceptance probability for an input u ∈ Σ* by ϕ(u). Let λA,λB be two thresholds with 0 ≤ λA ≤ 1 and 0 < λB < 1.

Then, from the description of A, we can construct a PFA B with input alphabet Σ ∪ {𝚎𝚗𝚍,𝚌𝚑𝚎𝚌𝚔} with 2d+3 states, a single start state and a single accepting state, with the following property.

  1. If every input u ∈ Σ* for A has acceptance probability ϕ(u) ≤ λA, then every input for B is accepted by B with probability at most λB.

  2. If A has an input u ∈ Σ* with acceptance probability ϕ(u) > λA, then B has inputs with acceptance probability arbitrarily close to 1. ∎

This was proved for λA=λB=1/2 by Gimbert and Oualhadj [10] in 2010 in order to show that it is undecidable whether the conclusion of Case 2 holds for a PFA B (the “Value 1 Problem”). The construction and proof were simplified by Fijalkow as part of a short 8-page note [8]. The generalization to arbitrary λA and λB is not difficult, and in addition, we have included the precise statement about the number 2d+3 of states. A proof is given in [22, Appendix A]. It essentially follows Fijalkow's construction, and it eliminates an oversight in [10] and [8].

With this technique, one can achieve an arbitrarily large gap with a moderate increase in states, roughly by a factor of 2. In particular, from Theorem 2a, we get a PFA B with 6 matrices of size 15×15, which exhibits the strong dichotomy expressed in Theorem 7, for any λB>0.

This construction does not preserve the property of being a PFA with fixed transition matrices. In the PFA B constructed in [22, Appendix A, see Table 2], the transitions depend both on the starting distribution π and on the final vector f of A.

5.2 Uniqueness

In Theorem 1a and Theorem 2a, the constructed PFA has the property that the recognized language contains at most one word. As with the large gap guarantee in Section 5.1, this leads to a stronger statement, where the problem takes on the nature of a promise problem.

Neary [18] derived his undecidable PCP instances from binary tag systems. A binary tag system performs a deterministic computation. It follows from the correctness argument of the simulation that the PCP solution is unique if it exists, apart from the obvious possibility of repeating the solution arbitrarily. In Section 2.8, we have excluded the latter possibility by fixing the end pair (v5R,w5R) to be the first pair and removing it from the list. Thus, uniqueness holds for the PFAs in Theorems 1a and 2a.

However, in the conversion to a binary input alphabet for Theorem 1c (Section 3.1), uniqueness is lost: We have seen that we may always add a or aa to a solution. We don’t see an easy way to eliminate these extra solutions without increasing the number of states.

5.3 Simple probabilistic automata

In a simple PFA, all probabilities are 0, 1/2, or 1 [10, Definition 2]. We have constructed our PFA with decimal fractions, but it would not be hard to switch to binary fractions. Before Step 4, the number of states should be increased to a power of two, so that the entries of J become binary fractions as well. Once all transition probabilities are binary fractions, the PFA can be converted to a simple PFA by padding each input symbol with sufficiently many dummy symbols, so that the PFA has time to make its random decisions with a sequence of unbiased coin flips; see [21, Section 4.4, proof of Theorem 1, item (b)]. Thus, the results can be achieved with simple PFAs, but with a larger number of matrices and a larger number of states. The precise quantities would depend on the lengths of the words vi and wi in the PCP.
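The padding argument rests on the fact that a transition whose probabilities are binary fractions with denominator 2^t can be realized by t unbiased coin flips. A small sketch of this decomposition (our own illustration; the names and the depth-3 example are not from the paper):

```python
import random
from collections import Counter

def outcome(leaf, dist, t):
    """Map a leaf of the depth-t coin-flip tree (0 <= leaf < 2^t) to an outcome
    drawn with the dyadic probabilities in dist (denominators dividing 2^t)."""
    acc = 0
    for i, p in enumerate(dist):
        acc += round(p * 2 ** t)
        if leaf < acc:
            return i

def step(dist, t):
    """Simulate one random transition by t unbiased coin flips."""
    leaf = sum(random.getrandbits(1) << j for j in range(t))
    return outcome(leaf, dist, t)

# probabilities 3/8, 1/8, 1/2 are realized exactly by 3 fair coin flips:
leaves = Counter(outcome(l, [3/8, 1/8, 4/8], 3) for l in range(8))
assert leaves == Counter({0: 3, 1: 1, 2: 4})
```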

References

  • [1] Vincent D. Blondel and Vincent Canterini. Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Systems, 36:231–245, 2003. doi:10.1007/s00224-003-1061-2.
  • [2] Vincent D. Blondel and John N. Tsitsiklis. The boundedness of all products of a pair of matrices is undecidable. Systems & Control Letters, 41(2):135–140, 2000. doi:10.1016/S0167-6911(00)00049-9.
  • [3] Vuong Bui. Growth of Bilinear Maps. PhD thesis, Freie Universität Berlin, Institut für Informatik, 2023. doi:10.17169/refubium-41705.
  • [4] Volker Claus. Some remarks on PCP(k) and related problems. Bull. Europ. Assoc. Theoret. Computer Sci. (EATCS), 12:54–61, October 1980. URL: http://page.mi.fu-berlin.de/rote/Kram/Volker_Claus-Some_remarks_on_PCP(k)_and_related_problems-1980-Bull-EATCS-12.pdf.
  • [5] Volker Claus. The (n,k)-bounded emptiness-problem for probabilistic acceptors and related problems. Acta Informatica, 16:139–160, 1981. doi:10.1007/BF00261257.
  • [6] A. Condon and R. J. Lipton. On the complexity of space bounded interactive proofs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, SFCS ’89, pages 462–467, USA, 1989. IEEE Computer Society. doi:10.1109/SFCS.1989.63519.
  • [7] Ruiwen Dong. Recent advances in algorithmic problems for semigroups. ACM SIGLOG News, 10(4):3–23, December 2023. doi:10.1145/3636362.3636365.
  • [8] Nathanaël Fijalkow. Undecidability results for probabilistic automata. ACM SIGLOG News, 4(4):10–17, November 2017. doi:10.1145/3157831.3157833.
  • [9] Rūsiņš Freivalds. Probabilistic two-way machines. In Jozef Gruska and Michal Chytil, editors, Mathematical Foundations of Computer Science 1981 (MFCS), volume 118 of Lecture Notes in Computer Science, pages 33–45. Springer, 1981. doi:10.1007/3-540-10856-4_72.
  • [10] Hugo Gimbert and Youssouf Oualhadj. Probabilistic automata on finite words: Decidable and undecidable problems. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming. 37th International Colloquium, ICALP 2010, Bordeaux, France, July 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pages 527–538, Berlin, Heidelberg, 2010. Springer-Verlag. Full version in https://hal.science/hal-00456538v3. doi:10.1007/978-3-642-14162-1_44.
  • [11] Vesa Halava and Tero Harju. On Markov’s undecidability theorem for integer matrices. Semigroup Forum, 75:173–180, 2007. doi:10.1007/s00233-007-0714-x.
  • [12] Vesa Halava, Tero Harju, and Mika Hirvensalo. Undecidability bounds for integer matrices using Claus instances. International Journal of Foundations of Computer Science, 18(05):931–948, 2007. doi:10.1142/S0129054107005066.
  • [13] Mika Hirvensalo. Improved undecidability results on the emptiness problem of probabilistic and quantum cut-point languages. In Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, and František Plášil, editors, SOFSEM 2007: Theory and Practice of Computer Science, volume 4362 of Lecture Notes in Computer Science, pages 309–319, Berlin, Heidelberg, 2007. Springer. doi:10.1007/978-3-540-69507-3_25.
  • [14] Omid Madani, Steve Hanks, and Anne Condon. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell., 147(1–2):5–34, 2003. doi:10.1016/S0004-3702(02)00378-8.
  • [15] A. Markov. On certain insoluble problems concerning matrices. Doklady Akad. Nauk SSSR (N.S.), 57:539–542, 1947. (Russian).
  • [16] Yuri Matiyasevich and Géraud Sénizergues. Decision problems for semi-Thue systems with a few rules. Theoretical Computer Science, 330(1):145–169, 2005. doi:10.1016/j.tcs.2004.09.016.
  • [17] Masakazu Nasu and Namio Honda. Mappings induced by PGSM-mappings and some recursively unsolvable problems of finite probabilistic automata. Information and Control, 15(3):250–273, 1969. doi:10.1016/S0019-9958(69)90449-5.
  • [18] Turlough Neary. Undecidability in binary tag systems and the Post correspondence problem for five pairs of words. In Ernst W. Mayr and Nicolas Ollinger, editors, 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), volume 30 of Leibniz International Proceedings in Informatics (LIPIcs), pages 649–661, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2015.649.
  • [19] Michael S. Paterson. Unsolvability in 3×3 matrices. Stud. in Appl. Math., 49(1):105–107, 1970. doi:10.1002/sapm1970491105.
  • [20] Azaria Paz. Introduction to Probabilistic Automata. Computer Science and Applied Mathematics. Academic Press, New York, 1971. doi:10.1016/c2013-0-11297-4.
  • [21] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable, June 2024. doi:10.48550/arXiv.2405.03035.
  • [22] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable for a fixed automaton, December 2024. doi:10.48550/arXiv.2412.05198.
  • [23] Paavo Turakainen. Generalized automata and stochastic languages. Proc. Amer. Math. Soc., 21:303–309, 1969. doi:10.1090/S0002-9939-1969-0242596-1.
  • [24] Paavo Turakainen. Word-functions of stochastic and pseudo stochastic automata. Annales Fennici Mathematici, 1(1):27–37, February 1975. doi:10.5186/aasfm.1975.0126.