
Probabilistic Finite Automaton Emptiness Is Undecidable for a Fixed Automaton

Günter Rote, Institut für Informatik, Freie Universität Berlin, Germany
Abstract

We construct a probabilistic finite automaton (PFA) with 7 states and an input alphabet of 5 symbols for which the PFA Emptiness Problem is undecidable. The only input for the decision problem is the starting distribution. For the proof, we use reductions from special instances of the Post Correspondence Problem.

We also consider some variations: The input alphabet of the PFA can be restricted to a binary alphabet at the expense of a larger number of states. If we allow a rational output value for each state instead of a yes-no acceptance decision, the number of states can even be reduced to 6.

Keywords and phrases:
Probabilistic finite automaton, Undecidability, Post Correspondence Problem
Copyright and License:
© Günter Rote; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Formal languages and automata theory
Related Version:
Full Version: http://arxiv.org/abs/2412.05198 [22]
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Probabilistic finite automata (PFA)

A probabilistic finite automaton (PFA) combines characteristics of a finite automaton and a Markov chain. We give a formal definition below. Informally, we can think of a PFA in terms of an algorithm that reads a sequence of input symbols from left to right, having only finite memory. That is, it can manipulate a finite number of variables with bounded range, just like an ordinary finite automaton. In addition, a PFA can make coin flips. As a consequence, the question whether the PFA arrives in an accepting state and thus accepts a given input word is not a yes/no decision, but it happens with a certain probability. The language recognized (or represented) by a PFA is defined by specifying a probability threshold or cutpoint λ. By convention, the language consists of all words for which the probability of acceptance strictly exceeds λ.

The PFA Emptiness Problem is the problem of deciding whether this language is empty. This problem is undecidable. Many other undecidability results rely on the undecidability of the PFA Emptiness Problem, for example, problems about matrix products [2], growth problems [3, Chapter 6], or probabilistic planning problems [14].

We present some sharpenings of the undecidability statement, where certain parameters of the PFA are restricted.

1.1 Formal problem definition

Formally, a PFA is given by a sequence of stochastic transition matrices Mσ, one for each letter σ from the input alphabet Σ. The states correspond to the rows and columns of the matrices. Thus, the matrices are d×d matrices for a PFA with d states. The start state is chosen according to a given probability distribution π ∈ [0,1]^d. The set of accepting states is characterized by a 0-1-vector f ∈ {0,1}^d. In terms of these data, the PFA Emptiness Problem with cutpoint λ is as follows:

PFA Emptiness. Given a set ℳ = {M_1, …, M_k} of k stochastic d×d matrices, a probability distribution π ∈ [0,1]^d, and a 0-1-vector f ∈ {0,1}^d, is there a sequence i_1, i_2, …, i_m with 1 ≤ i_j ≤ k for j = 1, …, m such that

  π^T M_{i_1} M_{i_2} ⋯ M_{i_m} f > λ?    (1)

The most natural choice is λ = 1/2, but the problem is undecidable for any fixed (rational or irrational) cutpoint λ with 0 < λ < 1. We can also ask ≥ λ instead of > λ.
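As a concrete illustration of the quantity in (1), here is a minimal sketch that evaluates the acceptance probability of a toy two-state PFA over a binary alphabet. The automaton and all its numbers are invented for illustration; they are not part of the constructions in this paper:

```python
# A toy PFA with d = 2 states over the alphabet {0, 1}; all numbers invented.
M = {
    0: [[0.5, 0.5],
        [0.0, 1.0]],
    1: [[1.0, 0.0],
        [0.5, 0.5]],
}
pi = [1.0, 0.0]   # start distribution: start deterministically in state 1
f = [0, 1]        # 0-1-vector: state 2 is the single accepting state

def accept_prob(word):
    """Evaluate pi^T M_{i1} ... M_{im} f from (1) for a word i1...im."""
    x = pi
    for sym in word:
        x = [sum(x[i] * M[sym][i][j] for i in range(2)) for j in range(2)]
    return sum(x[j] * f[j] for j in range(2))

# With cutpoint 1/2, the recognized language consists of all words whose
# acceptance probability strictly exceeds 1/2; the word (0,0) is a witness
# that the language of this toy PFA is nonempty.
assert accept_prob((0, 0)) == 0.75
```

For this tiny example, emptiness can be settled by inspection; the point of the paper is that no algorithm can do this for every PFA.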

Table 1: The main characteristics of the data π, , and f for different undecidable versions of PFA Emptiness. The vectors e1 and e2 are two standard unit vectors of appropriate dimension, indicating a single accepting state or a single deterministic start state. The results under the double line are previous results from the literature, which refer to results of Claus [5, Theorem 6(iii)] from 1981, Blondel and Canterini [1, Theorem 2.1] from 2003, and Hirvensalo [13, Section 3] from 2007. The numbers in parentheses are the figures that would be obtained by basing the proofs on Neary’s undecidable 5-word instances of the Post Correspondence Problem (PCP) instead of the smallest known undecidable PCP instances that were current at the time, see [21, Theorems 5 and 6].
Theorem    | π      | |ℳ|   | M                        | f                  | acceptance crit.
-----------|--------|-------|--------------------------|--------------------|------------------
Thm. 1a    | input  | 4     | 7×7, positive, input     | f = e1             | > 1/7 or ≥ 1/7
Thm. 1b    | input  | 6     | 7×7, positive, fixed     | f = e1             | > 1/7 or ≥ 1/7
Thm. 1c    | input  | 2     | 18×18, positive, input   | f = e1             | > 1/18 or ≥ 1/18
Thm. 1d    | input  | 2     | 28×28, positive, fixed   | f = e1             | > 1/28 or ≥ 1/28
Thm. 2a    | π = e2 | 4     | 6×6, positive, input     | f ∈ [0,1]^6, input | > λ or ≥ λ
Thm. 2b    | fixed  | 5     | 6×6, positive, fixed     | f ∈ [0,1]^6, input | > λ or ≥ λ
===========|========|=======|==========================|====================|==================
Claus [5]  | π = e2 | 9 (5) | 9×9, positive, input     | f = e1             | > 1/9
Claus [5]  | π = e2 | 2     | 65×65, positive, input   | f = e1             | > 1/65
           | π = e2 | 2     | (37×37), positive, input | f = e1             | > 1/37
B.&C. [1]  | π = e2 | 2     | 46×46, positive, input   | f = e1             | > 1/46 or ≥ 1/46
           | π = e2 | 2     | (34×34), positive, input | f = e1             | > 1/34 or ≥ 1/34
Hirv. [13] | π = e2 | 2     | 25×25, positive, input   | f = e1             | > 1/25
           | π = e2 | 2     | (20×20), positive, input | f = e1             | > 1/20

1.2 Statement of results

There are three different proofs of the basic undecidability theorem in the literature. The first proof is due to Masakazu Nasu and Namio Honda [17] from 1969.¹ The second proof, by Volker Claus [5], is loosely related to this proof. Both proofs use a reduction from the Post Correspondence Problem (PCP, see Section 1.4). A completely independent proof with a very different approach, namely a reduction from the halting problem for 2-counter machines, was given by Anne Condon and Richard J. Lipton [6] in 1989, based on ideas of Rūsiņš Freivalds [9] from 1981. The somewhat intricate history is described in [21, Section 3] as part of an extensive survey of the various undecidability proofs.

¹ This proof is commonly misattributed to Paz [20], although Paz gave credit to Nasu and Honda [20, Section IIIB.7, Bibliographical notes, p. 193].

PFA Emptiness remains undecidable under various constraints on the number of transition matrices (size of the alphabet) and their size (number of states). The first result in this direction is due to Claus [5] from 1981. Later improvements were made by Blondel and Canterini [1] and Hirvensalo [13], who were apparently unaware of [5] and concentrated on the case of two matrices (binary input alphabet). An overview of these results is shown in Table 1 below the double line.

We improve these bounds, concentrating on the minimum number of states without restricting the number of matrices, see Theorem 1. (Such results are only implicit in the proofs of [1] and [13].) Undecidability even holds for a PFA where all data except the starting distribution π are fixed.

We also get improved bounds for the 2-matrix case. For the variation where the output vector f is not restricted to a 0-1-vector, we can reduce the number of states to 6 (Theorem 2).

Our results are stated in the following theorems and summarized in Table 1.

Theorem 1.

The PFA Emptiness Problem is undecidable for PFAs with a single accepting state and the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):

  (a) ℳ consists of 4 positive doubly-stochastic transition matrices of size 7×7, with cutpoint λ = 1/7.

  (b) ℳ consists of 6 fixed positive doubly-stochastic transition matrices of size 7×7, with cutpoint λ = 1/7. The only variable input is the starting distribution π ∈ [0,1]^7.

  (c) ℳ consists of 2 positive transition matrices of size 18×18, with cutpoint λ = 1/18.

  (d) ℳ consists of 2 fixed positive transition matrices of size 28×28, with cutpoint λ = 1/28.

All statements hold also for weak inequality (≥ λ) as the acceptance criterion.

Theorem 2.

For any cutpoint λ in the interval 0 < λ < 1, the PFA Emptiness Problem with an output vector f with entries 0 ≤ f_i ≤ 1 is undecidable for PFAs with the following restrictions on the number of transition matrices (size of the input alphabet) and their size (number of states):

  (a) ℳ consists of 4 positive transition matrices of size 6×6. There is a fixed deterministic start state.

  (b) ℳ consists of 5 fixed positive transition matrices of size 6×6. There is a fixed starting distribution π. The only input of the problem is the vector f ∈ [0,1]^6 of output values.

All statements hold also for weak inequality (≥ λ) as the acceptance criterion.

By combining the different proof techniques, one can extend Theorem 2 to PFAs with two matrices and a fixed start state or starting distribution, analogous to Theorem 1cd, but we have not worked out these results.

Some weaker results with fixed matrices were previously obtained in a technical report [21, Theorems 2 and 4]. For example, PFA Emptiness was shown to be undecidable for 52 fixed 9×9 matrices, in the setting of Theorem 2 where the final vector f is the only input, with acceptance criterion ≥ 1/2. For acceptance with strict inequality (> 1/4), 52 fixed matrices of size 11×11 were needed. In these constructions, the PCP was derived from a universal Turing machine. For the case of 2 fixed matrices with a single accepting state, where the start distribution π is the only input, a bound of 572 states was obtained [21, Theorem 3].

Rational-Weighted Automata.

A weighted automaton over the reals or a generalized probabilistic automaton or pseudo-stochastic automaton is similar, but the start vector π, the end vector f, and the matrices Mi can be filled with arbitrary real numbers. These automata are in some ways more natural than PFAs, and, in particular with rational weights, they have recently attracted a lot of attention. Since they generalize PFAs, all our undecidability results immediately carry over to rational-weighted automata.

1.3 Contributions of this paper and relations to other problems

The most striking result of the paper is that PFA Emptiness is undecidable already for a fixed PFA, where only the starting distribution π, or the output vector f is an input. This follows from a combination of known ideas: The main observation, namely that the reduction of Matiyasevich and Sénizergues [16] from undecidable semi-Thue systems leads to instances of the Post Correspondence problem (PCP) where only the starting pair is variable, has been made by Halava, Harju, and Hirvensalo in 2007 [12, Theorem 6]. The idea of merging the first or last matrix into the starting distribution π or into the final vector f was used by Hirvensalo in 2007 [13, Step 2 in Section 3].

On the technical side, our major contribution is the reduction of the number of states to six, even for PFAs, and even while keeping the matrices fixed, for the version where the input is the (fractional) output vector f.

The PFA Emptiness Problem is a problem about matrix products. There is a great variety of problems about matrix products whose undecidability has been studied; see [7] for a recent survey of this rich area. In fact, the first problem outside the fields of mathematical logic and theory of computing that was shown to be undecidable is a problem about matrix products: Markov proved in 1947 that it is undecidable whether the semigroups generated by two finite sets of matrices contain a common element [15]; see Halava and Harju [11] for a presentation of Markov’s proof (as well as an alternative proof). Our basic approach is the same as Markov’s: to model the Post Correspondence Problem (PCP) by matrix products. Our particular technique for doing this has its roots in Paterson’s undecidability proof [19] from 1970 of the matrix mortality problem for 3×3 matrices.

Besides, we have brought to light some papers that were apparently forgotten, like Nasu and Honda’s original undecidability proof from 1969 [17], and Claus [5]. Also, our technique in Section 2.6 for making the transition matrices positive, and hence stochastic, is more streamlined and elegant than the proofs that we have seen in the literature.

1.4 The Post Correspondence Problem (PCP)

In the Post Correspondence Problem (PCP), we are given a list of pairs of words (v_1,w_1), (v_2,w_2), …, (v_k,w_k). The problem is to decide if there is a nonempty sequence i_1 i_2 ⋯ i_m of indices i_j ∈ {1,2,…,k} such that

  v_{i_1} v_{i_2} ⋯ v_{i_m} = w_{i_1} w_{i_2} ⋯ w_{i_m}.

This is one of the well-known undecidable problems. According to Neary [18], the PCP is already undecidable with as few as five word pairs.
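To make the problem statement concrete, one can search for solutions by brute force up to a bounded length (no bound suffices in general, which is exactly why the problem is undecidable). The three-pair instance below is a standard textbook-style toy example, not one of the instances used later in the paper:

```python
from itertools import product

def solve_pcp(pairs, max_len=8):
    """Search for a nonempty index sequence i1...im (returned 1-based) with
    v_{i1}...v_{im} = w_{i1}...w_{im}; only sequences up to max_len are tried,
    since the PCP is undecidable and no length bound suffices in general."""
    for m in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=m):
            v = "".join(pairs[i][0] for i in seq)
            w = "".join(pairs[i][1] for i in seq)
            if v == w:
                return [i + 1 for i in seq]
    return None

# A classic solvable toy instance: the solution 3,2,3,1 gives
# bba + ab + bba + a = bb + aa + bb + baa = "bbaabbbaa".
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
assert solve_pcp(pairs) == [3, 2, 3, 1]
```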

2 Proofs of Theorem 1a and Theorem 1b (few states)

We follow the same overall proof strategy as Claus [5], Blondel and Canterini [1] and Hirvensalo [13]: They use undecidable PCP instances with few word pairs and transform them to the Emptiness Problem for integer-weighted automata, which are then converted to PFAs. We deviate from this approach by using an automaton with fractional weights (Section 2.1). These matrices can be converted to column-stochastic matrices without the overhead of an extra state (Section 2.4).

The proof of Theorem 1a contains the main ideas. For the reduction to two matrices, we then apply a technique of Hirvensalo [13] (Section 3.1). All other results are obtained by slight variations of these methods in combination with appropriate results from the literature.

2.1 Step 1: Modeling word pairs by matrices

For a string u = u_1 u_2 ⋯ u_n of decimal digits, we denote its fractional decimal value by 0.u = Σ_{j=1}^{n} u_j 10^{−j}. For example, if u = 432100, then 0.u = 0.4321. We will take care to avoid trailing zeros, because their disappearance could cause problems.

For two strings v, w ∈ {11,12}* of digits, we define the matrix

\[
A_0(v,w)=\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.v & 10^{-|v|} & 0 & 0 & 0 & 0\\
(0.v)^2 & 2\cdot 10^{-|v|}\cdot 0.v & 10^{-2|v|} & 0 & 0 & 0\\
0.w & 0 & 0 & 10^{-|w|} & 0 & 0\\
(0.w)^2 & 0 & 0 & 2\cdot 10^{-|w|}\cdot 0.w & 10^{-2|w|} & 0\\
0.v\cdot 0.w & 10^{-|v|}\cdot 0.w & 0 & 10^{-|w|}\cdot 0.v & 0 & 10^{-|v|-|w|}
\end{pmatrix}.
\]

It is not straightforward to see, but it can be checked by a simple calculation that the matrices A0(v,w) satisfy a multiplicative law (see [22, Appendix C] for a computer check):

Lemma 3 (Multiplicative Law).
  A0(v_1,w_1) · A0(v_2,w_2) = A0(v_1 v_2, w_1 w_2)    (2)
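The computer check mentioned above can be reproduced in a few lines of exact rational arithmetic; the matrix layout below is a direct transcription of the definition of A0(v,w):

```python
from fractions import Fraction as F

def val(u):
    """Fractional decimal value 0.u of a digit string u."""
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    """The 6x6 matrix A0(v,w), rows laid out as in the display above."""
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))   # 10^(-|v|), 10^(-|w|)
    return [
        [1,     0,       0,    0,       0,    0],
        [a,     p,       0,    0,       0,    0],
        [a*a,   2*p*a,   p*p,  0,       0,    0],
        [b,     0,       0,    q,       0,    0],
        [b*b,   0,       0,    2*q*b,   q*q,  0],
        [a*b,   p*b,     0,    q*a,     0,    p*q],
    ]

def mul(X, Y):
    """Product of two square matrices of equal size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Multiplicative law (2), checked exactly on a sample pair of word pairs:
assert mul(A0("1112", "11"), A0("12", "121211")) == A0("111212", "11121211")
```

The key point, visible in the check, is that 0.(v_1 v_2) = 0.v_1 + 10^{−|v_1|} · 0.v_2, so concatenation of strings corresponds to this affine combination of their values.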

Now we transform these matrices A0(v,w) by a similarity transformation with the transformation matrix

\[
U=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
-\tfrac{1}{99}&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&\tfrac{105}{99}&0\\
0&0&0&0&0&1
\end{pmatrix},\qquad
U^{-1}=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
\tfrac{1}{99}&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&\tfrac{99}{105}&0\\
0&0&0&0&0&1
\end{pmatrix}. \qquad (3)
\]

The resulting matrices A(v,w) = U^{−1} A0(v,w) U differ in some entries of the first column and the fifth row:

\[
A(v,w)=\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.v & 10^{-|v|} & 0 & 0 & 0 & 0\\
(0.v)^2+\tfrac{1}{99}\bigl(1-10^{-2|v|}\bigr) & 2\cdot 10^{-|v|}\cdot 0.v & 10^{-2|v|} & 0 & 0 & 0\\
0.w & 0 & 0 & 10^{-|w|} & 0 & 0\\
\tfrac{99}{105}(0.w)^2 & 0 & 0 & \tfrac{198}{105}\cdot 10^{-|w|}\cdot 0.w & 10^{-2|w|} & 0\\
0.v\cdot 0.w & 10^{-|v|}\cdot 0.w & 0 & 10^{-|w|}\cdot 0.v & 0 & 10^{-|v|-|w|}
\end{pmatrix}
\]

Since the matrices were obtained by a similarity, they satisfy the same multiplicative law:

  A(v_1,w_1) A(v_2,w_2) = A(v_1 v_2, w_1 w_2)    (4)

With the vectors π1 = (1/99, 0, −1, 0, −105/99, 2) and f1 = (1,0,0,0,0,0)^T, we obtain

  π1 A(v,w) f1 = 1/99 − (0.v)² − 1/99 + 10^{−2|v|}/99 − (0.w)² + 2 · 0.v · 0.w
               = −(0.v − 0.w)² + 10^{−2|v|}/99    (5)

The sign of this expression can detect inequality of v and w, as we will presently show.

We could have taken the simpler matrix A0 with π0 = (0, 0, −1, 0, −1, 2), and this would give π0 A0(v,w) f1 = −(0.v − 0.w)². The contortions with the matrix U were necessary to generate a tiny positive term in (5). The reason for the peculiar entries 99/105 and 105/99 in (3) and in π1 will become apparent after the next transformation.
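The identity (5) can likewise be checked with exact rational arithmetic. The placement of the entries ±1/99, 105/99 and 99/105 in U, U⁻¹ and π1 below is transcribed so as to be consistent with (3) and (5); treat this as a sketch of the computation rather than authoritative data:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eye_with(entries):
    """6x6 identity with a few entries overridden."""
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
assert mul(U, Uinv) == eye_with({})

pi1 = [F(1, 99), 0, -1, 0, F(-105, 99), 2]

def A(v, w):
    return mul(mul(Uinv, A0(v, w)), U)

# Identity (5): pi1 * A(v,w) * f1 = -(0.v - 0.w)^2 + 10^(-2|v|)/99,
# where f1 = e1, so only the first column of A(v,w) matters.
for v, w in [("1112", "1112"), ("1112", "1211"), ("11", "1211")]:
    lhs = sum(pi1[i] * A(v, w)[i][0] for i in range(6))
    assert lhs == -(val(v) - val(w))**2 + F(1, 99 * 10**(2 * len(v)))
```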

2.2 Equality detection

Lemma 4 (Equality Detection Lemma).

Let v, w ∈ {11,12}*.

  1. If v = w, then (0.v − 0.w)² − 10^{−2|v|}/99 < 0.

  2. If v ≠ w, then (0.v − 0.w)² − 10^{−2|v|}/99 > 0.

In particular, (0.v − 0.w)² − 10^{−2|v|}/99 is never zero.

Proof.

The first statement is obvious.

To see the second statement, we first consider the case that one of the strings is a prefix of the other (Case A): If w is a prefix of v, then for fixed w, the smallest possible difference |0.v − 0.w| among the strings v that extend w is achieved when v = w11. Similarly, if v is a prefix of w, then the smallest possible difference is achieved when w = v11. In either case, |0.v − 0.w| ≥ 10^{−min{|v|,|w|}} · 0.11 ≥ 10^{−|v|} · 0.11. Thus, (0.v − 0.w)² ≥ 10^{−2|v|} · 0.0121 > 10^{−2|v|} · 0.010101… = 10^{−2|v|}/99.

Case B: Neither of v and w is a prefix of the other. Suppose v and w share k leading digit pairs u ∈ {11,12}^k, 0 ≤ k < |v|/2. Then one of the two strings starts with u12 and the other with u11; the smallest difference |0.v − 0.w| between two such numbers is achieved between 0.u12 and 0.u111212121212…, and thus |0.v − 0.w| > 10^{−2k} · 0.0087878… > 10^{−2k} · 0.005 ≥ 10^{−|v|+2} · 0.005 = 10^{−|v|}/2. After squaring this relation, the claim follows with an ample margin.
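The lemma can be checked by brute force over all short strings; the sketch below is exhaustive only up to three blocks, but covers both cases of the proof, including pairs of different lengths:

```python
from fractions import Fraction as F
from itertools import product

def val(u):
    """Fractional decimal value 0.u of a digit string u."""
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

# All nonempty strings of up to three blocks from {11, 12}.
strings = ["".join(p) for m in (1, 2, 3)
           for p in product(["11", "12"], repeat=m)]

for v in strings:
    for w in strings:
        d = (val(v) - val(w))**2 - F(1, 99 * 10**(2 * len(v)))
        assert d != 0                 # the expression is never zero
        assert (d < 0) == (v == w)    # negative exactly for equal strings
```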

2.3 Modeling the Post Correspondence Problem (PCP)

The multiplicative law (4) and the capacity to detect string equality are all that is needed to model the PCP.

In the PCP, it is no loss of generality to assume that the words in the pairs (v_i, w_i) use a binary alphabet, since any alphabet can be coded in binary. We recode them to the binary “alphabet” {11,12}, i.e., they become words in {11,12}* of doubled length. We form the matrices A_i = A(v_i, w_i). Multiplicativity gives the relation A_{i_1} A_{i_2} ⋯ A_{i_m} = A(v_{i_1} v_{i_2} ⋯ v_{i_m}, w_{i_1} w_{i_2} ⋯ w_{i_m}), and by Section 2.2, π1 A_{i_1} A_{i_2} ⋯ A_{i_m} f1 > 0 if and only if v_{i_1} v_{i_2} ⋯ v_{i_m} = w_{i_1} w_{i_2} ⋯ w_{i_m}, i. e., i_1 i_2 ⋯ i_m is a solution of the PCP.

Since the value 0 is excluded by Section 2.2, nothing changes if the condition “> 0” is replaced by “≥ 0”. This property will propagate through the proof, with the consequence that in the resulting theorems, it does not matter if we take > λ or ≥ λ as the acceptance criterion for the PFA. We will not mention this issue any more and work only with strict inequality.

There are two problems that we still have to solve:

  • The empty sequence (m=0) always fulfills the inequality although it does not count as a solution of the PCP.

  • The matrices Ai are not stochastic, and π1 is not a probability distribution. So far, what we have is a generalized probabilistic automaton, or rational-weighted automaton.

Turakainen [23] showed in 1969 that a generalized probabilistic automaton can be converted to a PFA without changing the recognized language (with the possible exception of the empty word), and this can even be done by adding only two more states [24, Theorem 1(i)]. Thus, by the general technique of [24], we can get a PFA with 8 states. We will use some tailored version of this method, involving nontrivial techniques, so that no extra state is needed to make the matrices stochastic and to simultaneously get rid of the empty solution.

History of ideas.

The idea of modeling the PCP by multiplication of integer matrices was pioneered by Markov [15] in 1947. He used the matrices \(\begin{pmatrix}1&0\\1&1\end{pmatrix}\) and \(\begin{pmatrix}1&1\\0&1\end{pmatrix}\), which generate a free semigroup. A different representation, closer to the one we are using, goes back to Paterson [19] in 1970, who used it to show that mortality for 3×3 matrices is undecidable.

Our matrix A0(v,w) is a variation of the integer matrices that were proposed by Volker Claus [5] in 1981, and again by Blondel and Canterini [1, p. 235] in 2003, in the very context of constructing small undecidable instances of the PFA Emptiness Problem. These matrices extend Paterson’s matrices to larger matrices that can produce quadratic terms. They use positive powers of the base 10 and integer decimal values (u)_{10} of the strings u instead of fractional values 0.u. Such matrices satisfy a multiplicative law such as (2), but with a reversal of the factors. In fact, the method works equally well for any other radix instead of 10.²

² “The notation is quaternary, decimal, etc., according to taste.” (Paterson [19]). Claus [5] used radix 3.

Our novel idea is to use this construction with negative powers of 10, leading to entries that are roughly at the same scale as a stochastic matrix, thus facilitating further transformations.

2.4 Step 2: Making column sums 1

We apply another similarity transformation, using the matrix

\[
V=\begin{pmatrix}
1&1&1&1&1&1\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix},\qquad
V^{-1}=\begin{pmatrix}
1&-1&-1&-1&-1&-1\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix},
\]

but this time, we also transform the vectors π1 and f1, leaving the overall result unchanged:

  π1 A_{i_1} A_{i_2} ⋯ A_{i_m} f1 = (π1 V)(V^{−1} A_{i_1} V)(V^{−1} A_{i_2} V) ⋯ (V^{−1} A_{i_m} V)(V^{−1} f1)
                                  = π2 B_{i_1} B_{i_2} ⋯ B_{i_m} f2    (6)

We analyze the effect of the similarity transform for a matrix of the general form

\[
A=\begin{pmatrix}
1&0&0&0&0&0\\
a_{21}&a_{22}&0&0&0&0\\
a_{31}&a_{32}&a_{33}&0&0&0\\
a_{41}&a_{42}&a_{43}&a_{44}&0&0\\
a_{51}&a_{52}&a_{53}&a_{54}&a_{55}&0\\
a_{61}&a_{62}&a_{63}&a_{64}&a_{65}&a_{66}
\end{pmatrix}.
\]

The transformed matrix is

\[
B=V^{-1}AV=\begin{pmatrix}
K & K_{12} & K{-}a_{33}{-}a_{43}{-}a_{53}{-}a_{63} & K{-}a_{44}{-}a_{54}{-}a_{64} & K{-}a_{55}{-}a_{65} & K{-}a_{66}\\
a_{21} & a_{21}{+}a_{22} & a_{21} & a_{21} & a_{21} & a_{21}\\
a_{31} & a_{31}{+}a_{32} & a_{31}{+}a_{33} & a_{31} & a_{31} & a_{31}\\
a_{41} & a_{41}{+}a_{42} & a_{41}{+}a_{43} & a_{41}{+}a_{44} & a_{41} & a_{41}\\
a_{51} & a_{51}{+}a_{52} & a_{51}{+}a_{53} & a_{51}{+}a_{54} & a_{51}{+}a_{55} & a_{51}\\
a_{61} & a_{61}{+}a_{62} & a_{61}{+}a_{63} & a_{61}{+}a_{64} & a_{61}{+}a_{65} & a_{61}{+}a_{66}
\end{pmatrix}
\]

with K = 1 − a_{21} − a_{31} − a_{41} − a_{51} − a_{61} and K_{12} = K − a_{22} − a_{32} − a_{42} − a_{52} − a_{62}.

Lemma 5.

  1. The matrix B = V^{−1} A V has column sums 1.

  2. For v ≠ ε and w ≠ ε, the matrix V^{−1} A(v,w) V is positive, and therefore column-stochastic.

Proof.

The first statement can be easily checked directly, but it has a systematic reason:

  (1,1,1,1,1,1) V^{−1} A V = (1,0,0,0,0,0) A V = (1,0,0,0,0,0) V = (1,1,1,1,1,1).

The second statement is not needed for Theorem 1, because positivity is established anyway in Step 4, after destroying it in Step 3, but we need it for Theorem 2. So let us check it: The only entries that are in danger of becoming negative are the entries of the first row. The two entries a_{21} = 0.v and a_{41} = 0.w of the matrix A(v,w) can be as large as 0.121212… < 0.15; all other entries are safely below 0.05. Thus, even the “most dangerous” candidate K_{12}, where 10 of the entries a_{ij} are subtracted from 1, cannot be zero or negative.

It can be checked that rows 2–6 are positive, because the first column of A(v,w), which is positive, has been added to every other column.

In the transformation from π1 to π2 = π1 V, the first entry is added to all other entries. Thus, the vector π1 = (1/99, 0, −1, 0, −105/99, 2) becomes π2 = (1/99, 1/99, −1 + 1/99, 1/99, −105/99 + 1/99, 2 + 1/99), whose entries sum to zero:

  π2 (1,1,1,1,1,1)^T = π1 V (1,1,1,1,1,1)^T = π1 (6,1,1,1,1,1)^T = 0    (7)

It was for this reason that the entry −105/99 of π1 and the corresponding entries of the transformations (3) were chosen.

The output vector f is unchanged by the transformation: f2 = V^{−1} f1 = f1.
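Both statements of Lemma 5 and the zero-sum property (7) of π2 can be verified computationally. This sketch re-transcribes the matrices of (3) and the vector π1 with the sign conventions that make (5) work, which is an assumption of the sketch:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eye_with(entries):
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
V    = eye_with({(0, j): F(1) for j in range(1, 6)})
Vinv = eye_with({(0, j): F(-1) for j in range(1, 6)})
pi1  = [F(1, 99), 0, -1, 0, F(-105, 99), 2]

v, w = "1211", "111212"
B = mul(Vinv, mul(mul(mul(Uinv, A0(v, w)), U), V))

# Lemma 5: column sums 1, and positivity for nonempty v, w.
assert all(sum(B[i][j] for i in range(6)) == 1 for j in range(6))
assert all(B[i][j] > 0 for i in range(6) for j in range(6))

# (7): pi2 = pi1 * V sums to 0, and f2 = Vinv * f1 = f1.
pi2 = [sum(pi1[i] * V[i][j] for i in range(6)) for j in range(6)]
assert sum(pi2) == 0
assert [row[0] for row in Vinv] == [1, 0, 0, 0, 0, 0]
```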

2.5 Step 3: Making row sums 1 with an extra state

By adding an extra column r_i, we now make all row sums equal to 1. We also add an extra row, resulting in a 7×7 matrix

\[
C_i=\begin{pmatrix}B_i & r_i\\ 0 & 1\end{pmatrix}. \qquad (8)
\]

The entries of r_i may be negative. They lie in the range −5 ≤ r ≤ 1. All column sums are still 1: Since the row sums are 1, the total sum of the entries is 7. Therefore, since the first six column sums are 1, the sum of the last column must also be 1.

The vectors π2 and f2 are extended to vectors π3 and f3 of length 7 by appending a 0, but they are otherwise unchanged. Thus, the extra column and row of C_i play no role for the value of the product (6), which remains unchanged:

  π3 C_{i_1} C_{i_2} ⋯ C_{i_m} f3 = π2 B_{i_1} B_{i_2} ⋯ B_{i_m} f2

2.6 Step 4: Making the matrices positive, and hence stochastic

Let J be the doubly-stochastic 7×7 transition matrix of the “completely random transition” with all entries 1/7. Then, with α = 0.01, we form the matrices D_i := (1 − α)J + αC_i. The constant α is small enough to ensure that D_i > 0. Hence the matrices D_i are doubly-stochastic.

If we expand the product π3 Π_{j=1}^{m} D_{i_j} = π3 Π_{j=1}^{m} ((1−α)J + αC_{i_j}), we get a sum of 2^m terms, each containing a product of m of the matrices J or C_i. We find that all terms containing the factor J vanish. The reason is that, since C_i has row and column sums 1, J C_i = C_i J = J. Moreover, π3 J = 0 by (7). It follows that

  π3 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 = α^m · π3 C_{i_1} C_{i_2} ⋯ C_{i_m} f3

The factor αm plays no role for the sign, and hence,

  π3 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 > 0    (9)

if and only if i_1 i_2 ⋯ i_m is a solution of the PCP.

2.7 Step 5: Turning π into a probability distribution

The start vector π3 has sum 0 and contains negative entries, but none of its entries is smaller than −2. We form π4 = ((2,2,2,2,2,2,2) + π3)/14. It is positive and has sum 1. The effect of substituting π3 by π4 in (9) is that the result is scaled by the positive factor 1/14 and increased by 1/7. The reason is that (1,1,1,1,1,1,1) D_{i_1} D_{i_2} ⋯ D_{i_m} = (1,1,1,1,1,1,1), since the matrices D_i are doubly-stochastic, and in particular, column-stochastic, and therefore (2,2,2,2,2,2,2)/14 · D_{i_1} D_{i_2} ⋯ D_{i_m} · f3 = 2/14 = 1/7.

In summary, we have now constructed a true PFA with seven states that models the PCP:

  π4 D_{i_1} D_{i_2} ⋯ D_{i_m} f3 > 1/7    (10)

if and only if i_1 i_2 ⋯ i_m is a (possibly empty) solution of the PCP.
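The whole chain of Steps 1–5 can be exercised end to end on a toy list of word pairs (not a real PCP instance from the literature): the criterion (10) should agree with equality of the concatenated strings. As before, the transcription of U, U⁻¹ and π1 is an assumption of this sketch:

```python
from fractions import Fraction as F

def val(u):
    return sum(F(int(c), 10**(j + 1)) for j, c in enumerate(u))

def A0(v, w):
    a, b = val(v), val(w)
    p, q = F(1, 10**len(v)), F(1, 10**len(w))
    return [
        [1,   0,     0,   0,     0,   0],
        [a,   p,     0,   0,     0,   0],
        [a*a, 2*p*a, p*p, 0,     0,   0],
        [b,   0,     0,   q,     0,   0],
        [b*b, 0,     0,   2*q*b, q*q, 0],
        [a*b, p*b,   0,   q*a,   0,   p*q],
    ]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def eye_with(entries):
    M = [[F(int(i == j)) for j in range(6)] for i in range(6)]
    for (i, j), x in entries.items():
        M[i][j] = x
    return M

U    = eye_with({(2, 0): F(-1, 99), (4, 4): F(105, 99)})
Uinv = eye_with({(2, 0): F( 1, 99), (4, 4): F( 99, 105)})
V    = eye_with({(0, j): F(1) for j in range(1, 6)})
Vinv = eye_with({(0, j): F(-1) for j in range(1, 6)})
pi1  = [F(1, 99), 0, -1, 0, F(-105, 99), 2]
ALPHA = F(1, 100)

def D(v, w):
    """Steps 1-4: B = Vinv Uinv A0 U V, pad to 7x7 (row sums 1), mix with J."""
    B = mul(Vinv, mul(mul(mul(Uinv, A0(v, w)), U), V))
    C = [row + [1 - sum(row)] for row in B] + [[F(0)] * 6 + [F(1)]]
    return [[(1 - ALPHA) * F(1, 7) + ALPHA * C[i][j] for j in range(7)]
            for i in range(7)]

pi2 = [sum(pi1[i] * V[i][j] for i in range(6)) for j in range(6)]
pi4 = [(2 + x) / 14 for x in pi2 + [F(0)]]   # Step 5: a probability distribution

def crit(pairs, seq):
    """pi4 * D_{i1} ... D_{im} * f3 with f3 = e1; compare with cutpoint 1/7."""
    x = pi4
    for i in seq:
        x = mul([x], D(*pairs[i]))[0]
    return x[0]

pairs = [("11", "1111"), ("1111", "11")]
for seq in [[0], [1], [0, 0], [0, 1], [1, 0]]:
    vs = "".join(pairs[i][0] for i in seq)
    ws = "".join(pairs[i][1] for i in seq)
    assert (crit(pairs, seq) > F(1, 7)) == (vs == ws)
```

Because all arithmetic is exact, the comparison with the cutpoint 1/7 is free of rounding issues.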

2.8 Using a small PCP

We base our proof on the undecidability of the PCP with 5 word pairs, as established by Turlough Neary [18, Theorem 11] in 2015. Neary constructed PCP instances with five pairs (v1,w1), (v2,w2), (v3,w3), (v4,w4), (v5,w5) that have the following property: Every solution necessarily starts with the pair (v1,w1) and ends with (v5,w5). We can also assume that the end pair (v5,w5) is used nowhere else. (More precisely, in every primitive solution (which is not a concatenation of smaller solutions), the end pair (v5,w5) cannot appear anywhere else than in the final position.) However, the start pair (v1,w1) is also used in the middle of the solutions. (This multipurpose usage of the start pair is one of the devices to achieve such a remarkably small number of word pairs.)³

³ The proof of Theorem 11 in Neary [18] contains an error, but this error can be fixed: His PCP instances encode binary tag systems. When showing that the PCP solution must follow the intended patterns of the simulation of the binary tag system, Neary [18, p. 660] needs to show that the end pair (v5,w5) = (10^β1111, 1111) cannot be used except to bring the two strings to a common end. He claims that a block 1111 cannot appear in the encoded string because in u (the unencoded string of the binary tag system, which is described in Lemma 9) we cannot have two c symbols next to each other. This is not true. The paper contains plenty of examples, and they contradict this claim; for example, the string u in (7) [18, p. 657] contains seven c’s in a row. The mistake can be fixed by taking a longer block of 1s: Looking at the appendants in Lemma 9, it is clear that every block of length |u|+1 must contain a symbol b. Thus, the pair (v5,w5) = (10^β1^{|u|+99}, 1^{|u|+99}) will work as end pair.

Reversing the words.

For our construction it is preferable to have the word pair (v5,w5) at the beginning. We thus reverse all words. Any solution sequence i_1 i_2 ⋯ i_m for the original problem must also be reversed to i_m i_{m−1} ⋯ i_1, but this does not affect the solvability of the PCP. Thus, we work with PCP instances of the form (v1^R,w1^R), (v2^R,w2^R), …, (v5^R,w5^R) with the following property: Every solution sequence i_1 i_2 ⋯ i_m must start with the pair (v5^R,w5^R), and the pair (v5^R,w5^R) cannot be used anywhere else: i_1 = 5 and 1 ≤ i_j ≤ 4 for j = 2, …, m.

2.9 Step 6. Merging the leftmost matrix into the starting distribution

We apply the above construction steps to the word pairs (v1^R,w1^R), …, (v5^R,w5^R), leading to five matrices D1, D2, D3, D4, D5. Since the leftmost matrix must be D5 in any solution, we can combine this matrix D5 with the starting distribution π4:

  π4 D5 D_{i_2} ⋯ D_{i_m} f3 = π5 D_{i_2} ⋯ D_{i_m} f3,

leading to a new starting distribution π5:=π4D5. The matrix D5 can be removed from the pool of matrices, leaving only 4 matrices. We have simultaneously eliminated the empty solution with a product of m=0 matrices. This concludes the proof of Theorem 1a. ∎

2.10 Proof of Theorem 1b (fixed transition matrices) by using a PCP with only one variable word pair

Matiyasevich and Sénizergues [16] constructed PCP instances with seven word pairs (v1,w1),(v2,w2),(v3,w3),(v4,w4),(v5,w5),(v6,w6),(v7,w7) that have the following property: Every solution necessarily starts with the pair (v1,w1) and ends with (v2,w2). Both the start pair (v1,w1) and the end pair (v2,w2) can be assumed to appear nowhere else, in the sense described in Section 2.8, i. e., apart from the possibility to concatenate solutions to obtain longer solutions. Matiyasevich and Sénizergues [16] used a reduction, due to Claus [4], from the individual accessibility problem (or individual word problem) for semi-Thue systems. They showed that the individual accessibility problem is already undecidable for a particular semi-Thue system with 3 rules [16, Theorem 3]. Halava, Harju, and Hirvensalo [12, Theorem 6] observed an important consequence of this: One can fix all words except v1, and leave only v1 as an input to the problem, and the PCP is still undecidable.

From these word pairs, we form the matrices A_i = A(v_i, w_i) from the words without reversal. Following the same steps as above, we eventually arrive at corresponding matrices D1, …, D7. We merge D1 with π4 into a new starting distribution π5 := π4 D1. We are left with a pool ℳ = {D2, …, D7} of six fixed matrices. The only variable input is the starting distribution π5. This concludes the proof of Theorem 1b. ∎

3 Proofs of Theorem 1c and Theorem 1d (binary input)

3.1 Reduction to two matrices

The following lemma and its proof are extracted from Step 3 in Hirvensalo [13, Section 3]. We denote by ϕ(u) the acceptance probability of a (generalized) PFA for an input u, i.e., the value of the product in the expression (1).

Lemma 6.

Let A = (π, ℳ, f) be a generalized PFA with k = |Σ| ≥ 3 matrices M_i of size d×d, such that the first row of each matrix M_i is (1, 0, …, 0).

Then one can obtain a generalized PFA A′ = (π′, ℳ′, f′) over the two-symbol alphabet {a,b}, with 2 matrices M_a and M_b of dimension (k−1)(d−1)+1, such that for every word u′ ∈ {a,b}* there exists a word u ∈ Σ* with

  ϕ′(u′) = ϕ(u),    (11)

and conversely, for every word u ∈ Σ* there exists a word u′ ∈ {a,b}* with (11).

If the given matrices M_i are nonnegative, then so are M_a and M_b. If the given matrices M_i are stochastic, then so are M_a and M_b. If π is a probability distribution, then so is π′. The entries of f′ are taken from the entries of f.

Thus, if A is a PFA, then A′ is a PFA as well.

The lemma is not specific about the word u or u′ whose existence is guaranteed. Nevertheless, regardless of how these words are found, the statement is sufficient in the context of the emptiness question.

However, we can be explicit about u and u′: The construction uses a binary encoding τ : Σ → {a,b}* with the prefix-free codewords {b, ab, aab, …, a^{k−2}b, a^{k−1}}. Then, we can take u′ = τ(u), because

  ϕ(u) = ϕ′(τ(u)).

For words u′ that are not of the form τ(u), (11) holds for the longest word u such that τ(u) is a prefix of u′. In other words, u is the decoding of the longest decodable prefix of u′.
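The encoding τ and the decoding of the longest decodable prefix are easy to make concrete for k = 5. The helper below is a sketch; the greedy decoder works because the codewords are prefix-free:

```python
# The binary encoding tau with codewords b, ab, aab, aaab, aaaa (k = 5).
code = {1: "b", 2: "ab", 3: "aab", 4: "aaab", 5: "aaaa"}

def tau(word):
    """Encode a word over {1,...,5} into {a,b}*."""
    return "".join(code[i] for i in word)

def decode_longest_prefix(u):
    """Greedily decode the longest decodable prefix of u, dropping a
    trailing partial codeword a, aa, or aaa if present."""
    out, run = [], 0
    for c in u:
        if c == "a":
            run += 1
            if run == 4:       # codeword aaaa completed: symbol 5
                out.append(5)
                run = 0
        else:                  # a 'b' closes the codeword a^run b
            out.append(run + 1)
            run = 0
    return out                 # a leftover run of 1..3 a's is discarded

assert decode_longest_prefix(tau([2, 5, 1, 4])) == [2, 5, 1, 4]
assert decode_longest_prefix("aabab" + "aa") == [3, 2]   # partial codeword dropped
```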

Proof.

The procedure is easiest to describe if we assume that A is a PFA. Then we can think of A′ as an automaton that decodes the input u′ ∈ {a,b}* and carries out a transition of A whenever it has read a complete codeword. In addition to the state q ∈ {1,…,d} of A, the automaton A′ needs to maintain a counter i in the range 0 ≤ i ≤ k−2 for the number of a’s of the current partial codeword. Whenever A′ reads a b, or if it has read the (k−1)-st a, A′ performs the appropriate random transition and resets the counter. The number of states of A′ is d × (k−1).

The fact that state 1 is an absorbing state allows a shortcut: If we are in state 1, we can stop maintaining the counter i. Thus, only states 2, …, d need to be multiplied by k−1, and the resulting number of states reduces to (d−1)(k−1)+1.

We now describe the construction of A′ in terms of transition matrices. This construction is valid also when A is a generalized PFA, where the above description in terms of random transitions makes no sense. To make the description more concrete, we illustrate it with k = 5 matrices M1, …, M5 and the corresponding binary codewords b, ab, aab, aaab, aaaa.

We split the d×d matrices M_i and the vectors π and f into blocks of size 1+(d−1):

\[
M_i=\begin{pmatrix}1 & 0\\ c_i & \hat C_i\end{pmatrix},\qquad
\pi=(p_1,\hat\pi)^T,\qquad
f=\begin{pmatrix}f_1\\ \hat f\end{pmatrix}.
\]

We define new transition matrices and start and end vectors in block form with block sizes 1+(d−1)+(d−1)+(d−1)+(d−1) as follows:

\[
M_{\mathtt a}=\begin{pmatrix}
1&0&0&0&0\\
0&0&I&0&0\\
0&0&0&I&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0
\end{pmatrix},\qquad
M_{\mathtt b}=\begin{pmatrix}
1&0&0&0&0\\
c_1&\hat C_1&0&0&0\\
c_2&\hat C_2&0&0&0\\
c_3&\hat C_3&0&0&0\\
c_4&\hat C_4&0&0&0
\end{pmatrix},\qquad
\pi'=(p_1,\hat\pi,0,0,0)^T,
\quad\text{and}\quad
f'=\begin{pmatrix}f_1\\ \hat f\\ \hat f\\ \hat f\\ \hat f\end{pmatrix}.
\]

From the sequence of powers of M_a,

\[
(M_{\mathtt a})^2=\begin{pmatrix}
1&0&0&0&0\\
0&0&0&I&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0\\
c_5&0&\hat C_5&0&0
\end{pmatrix},\qquad
(M_{\mathtt a})^3=\begin{pmatrix}
1&0&0&0&0\\
0&0&0&0&I\\
c_5&\hat C_5&0&0&0\\
c_5&0&\hat C_5&0&0\\
c_5&0&0&\hat C_5&0
\end{pmatrix},
\]

we can recognize the pattern of development. We can then work out the matrices M_b, M_a M_b, M_a M_a M_b, M_a M_a M_a M_b, and M_a M_a M_a M_a and check that their first two rows of blocks are of the form

\[
\begin{pmatrix}
1&0&0&0&0\\
c_i&\hat C_i&0&0&0
\end{pmatrix}
\]

for i = 1, 2, 3, 4, 5, having the original matrices M_i in their upper-left corner and otherwise zeros in the first two rows of blocks. Thus, they simulate the original automaton on the states 1, …, d: It is easy to establish by induction that multiplying the initial distribution vector π′ with a sequence of such matrices will produce a vector x′ of the form

  x′ = (x_1, x̂, 0, 0, 0)    (12)

whose first d entries x = (x_1, x̂) coincide with the entries of the corresponding vector produced with the original start vector π and the original matrices M_i. If x′ is multiplied with f′, the result x_1 f_1 + x̂^T f̂ is the same as with x and the original vector f.

One technicality remains to be discussed: Some “unfinished” words in {𝚊,𝚋}* do not factor into codewords but end in a partial codeword 𝚊, 𝚊𝚊, or 𝚊𝚊𝚊. To analyze the corresponding matrix products, we look at the powers (M𝚊)^i, for i=1,2,3. They have the form

$$\begin{pmatrix}1&0&0&0&0\\0&0&I&0&0\\ *&*&*&*&*\end{pmatrix},\quad \begin{pmatrix}1&0&0&0&0\\0&0&0&I&0\\ *&*&*&*&*\end{pmatrix},\quad\text{and}\quad \begin{pmatrix}1&0&0&0&0\\0&0&0&0&I\\ *&*&*&*&*\end{pmatrix} \qquad (13)$$

and therefore, multiplying the vector x′ from (12) with them yields (x1 0 x̂ 0 0), (x1 0 0 x̂ 0), and (x1 0 0 0 x̂), respectively. If this is multiplied with f′, the result is the same as with the vector x′ in (12). Thus, as claimed after the statement of the lemma, input sequences whose decoding process leaves a partial codeword 𝚊^i for 1 ≤ i ≤ k−2 produce the same value as if that partial codeword were omitted.⁴
⁴ Hirvensalo [13, p. 314] defined the vector f′ (𝐲3 in his notation) “analogously” to the vector π′ (𝐱3 in his notation), thus padding it with zeros. We see that with this definition, the result for incomplete inputs is x1f1, which, in the general setting of our lemma, has no predictable relation to the meaningful value x1f1 + x̂ᵀf̂. In Hirvensalo's case, x1f1 can be shown by a delicate argument to be 0; see [21, Section 8.2.3, footnote 31] in the updated version on the author's homepage.
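The block construction and the claims about incomplete inputs can be checked numerically. The following sketch (our own; random matrices with d=3, k=5, and state 1 made absorbing by construction) assembles M𝚊 and M𝚋 from M1,…,M5 and compares the two automata, including trailing partial codewords:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 5
e = d - 1                     # size of the "hatted" blocks
n = 1 + (k - 1) * e           # dimension of M_a and M_b

# random matrices M_1..M_5, each with absorbing state 1 (first row = e_1)
Ms = [rng.random((d, d)) for _ in range(k)]
for M in Ms:
    M[0] = 0.0
    M[0, 0] = 1.0

def blk(j):                   # index range of block j, for j = 0..k-1
    return slice(0, 1) if j == 0 else slice(1 + (j - 1) * e, 1 + j * e)

Ma, Mb = np.zeros((n, n)), np.zeros((n, n))
Ma[0, 0] = Mb[0, 0] = 1.0
for j in range(1, k - 1):     # row block j of M_a: identity in column block j+1
    Ma[blk(j), blk(j + 1)] = np.eye(e)
Ma[blk(k - 1), blk(0)] = Ms[-1][1:, :1]   # last row block of M_a: (c_5, C^_5, 0, ...)
Ma[blk(k - 1), blk(1)] = Ms[-1][1:, 1:]
for j in range(1, k):         # row block j of M_b: (c_j, C^_j, 0, ...)
    Mb[blk(j), blk(0)] = Ms[j - 1][1:, :1]
    Mb[blk(j), blk(1)] = Ms[j - 1][1:, 1:]

pi, f = rng.random(d), rng.random(d)
pi2 = np.zeros(n); pi2[:d] = pi                     # pi' = (p_1, pi^, 0, 0, 0)
f2 = np.concatenate([f[:1]] + [f[1:]] * (k - 1))    # f' = (f_1, f^, f^, f^, f^)

code = {1: "b", 2: "ab", 3: "aab", 4: "aaab", 5: "aaaa"}
word = [3, 1, 5, 2, 4]
orig = pi.copy()
for i in word:                # run the original automaton
    orig = orig @ Ms[i - 1]
new = pi2.copy()
for c in "".join(code[i] for i in word):            # run the binary automaton
    new = new @ (Ma if c == "a" else Mb)

assert np.isclose(orig @ f, new @ f2)               # same value on complete inputs
assert np.isclose(orig @ f, new @ Ma @ f2)          # trailing partial codeword "a"
assert np.isclose(orig @ f, new @ Ma @ Ma @ f2)     # trailing partial codeword "aa"
```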

3.2 Proof of Theorem 1c

Section 3.1 blows up the number of states by a factor that depends on the number of matrices. Thus, it is advantageous to apply it after merging the start matrix into π, when the number of matrices is reduced.

So we start with the 5-pair instances of Neary [18] and construct five matrices Ai=A(viR,wiR) from the reversed pairs. We then combine A5 with the starting distribution π1 into π2=π1A5, leaving k=4 matrices A1,A2,A3,A4 of dimension d=6.

We cannot apply the transformation of Step 2, because it changes the first row, and state 1 would then no longer be absorbing, which precludes the application of Section 3.1.

Thus, we apply Section 3.1 to the matrices A1,A2,A3,A4, resulting in two matrices M𝚊 and M𝚋 of dimension (k−1)(d−1)+1=16. The new start vector π3 is π2 padded with zeros, and the end vector f is still the first unit vector (of dimension 16).

We would now like to use the transformation of Step 2 to make the matrices column-stochastic. However, since we have replaced the initial start vector π1 by π2, the entries of the start vector after the transformation would no longer sum to 0, a property that is crucial for eventually making the matrices stochastic in Step 4.

Therefore we have to achieve column sums 1 in a different way, with an adaptation of the method of Step 3 (Section 2.5), adding an extra state.

3.3 Step 2: Making column sums 1 with an extra state

We add an extra row si and an extra column to each matrix, ensuring that all column sums become 1.

$$B_1=\begin{pmatrix}M_{\mathtt{a}}&0\\ s_1&1\end{pmatrix},\qquad B_2=\begin{pmatrix}M_{\mathtt{b}}&0\\ s_2&1\end{pmatrix} \qquad (14)$$

We now have two 17×17 matrices B1,B2 with column sums 1. The remaining steps (Steps 3–5) are as before, with the appropriate adaptations, and they add another row and column. (We may have to choose a smaller constant α in Step 4.) This concludes the proof of Theorem 1c. ∎
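The padding in (14) can be illustrated with a stand-in matrix (a sketch under our own assumptions; note that the entries of the extra row may be negative — making all entries nonnegative is deferred to the later steps):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16                                    # dimension of M_a after Section 3.1
Ma = rng.random((n, n))                   # stand-in for the actual matrix M_a
s1 = 1.0 - Ma.sum(axis=0)                 # extra row s_1: forces column sums 1
B1 = np.block([[Ma, np.zeros((n, 1))],
               [s1[None, :], np.ones((1, 1))]])
assert B1.shape == (17, 17)
assert np.allclose(B1.sum(axis=0), 1.0)   # all 17 column sums are now 1
```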

3.4 Proof of Theorem 1d (two fixed matrices)

This is obtained by adapting the proof of Theorem 1c in the same way as for Theorem 1b (Section 2.10). Instead of Neary's 5-pair instances, we take the instance (v1,w1),…,(v7,w7) of Matiyasevich and Sénizergues and construct seven matrices Ai=A(vi,wi) from these pairs (without reversing the words vi and wi). We then combine A1, which must be the first matrix in the product, with the starting distribution π1 into π2=π1A1. The remaining matrices A2,A3,…,A7 are fixed. Only π2 is an input to the problem.

The remainder of the proof is the same as in Theorem 1c. We apply Section 3.1 to k=6 matrices Ai of dimension d=6, resulting in two fixed matrices M𝚊 and M𝚋 of dimension (k−1)(d−1)+1=26. The conversion to a PFA requires two more states, and this leads to Theorem 1d. ∎

4 Proof of Theorem 2 (variable end vector 𝒇)

One can relax the requirement that f is a 0-1-vector and allow arbitrary values. If the values are in the interval [0,1], we can think of the entry fi as a probability in a final acceptance decision when the PFA is in state i after the input has been read. Another possibility is that fi represents a prize or reward that is gained when the process stops in state i. Then fi does not need to be restricted to the interval [0,1]. In this view, instead of the acceptance probability, we compute the expected reward of the automaton, as in game theory. We call fq the output values, and f the output vector or the end vector, in analogy to the start vector π.

Since the transition matrices Bi after Step 2 are column-stochastic and positive by Section 2.4, we transpose them to produce stochastic matrices, which can be directly used as transition matrices. This will reverse the order of the matrices in the product and swap the start vector with the end vector:

$$\pi_2 B_{i_1} B_{i_2}\cdots B_{i_m} f_2 = f_2^T B_{i_m}^T B_{i_{m-1}}^T\cdots B_{i_1}^T \pi_2^T = \pi_6 B_{i_m}^T B_{i_{m-1}}^T\cdots B_{i_1}^T f_6 \qquad (15)$$
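The identity (15) is just the reversal rule for transposed products; a quick numerical check (our own sketch with random matrices standing in for the Bi):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
Bs = [rng.random((d, d)) for _ in range(4)]    # stand-ins for B_{i_1}, ..., B_{i_m}
pi2, f2 = rng.random(d), rng.random(d)
lhs = pi2 @ Bs[0] @ Bs[1] @ Bs[2] @ Bs[3] @ f2
rhs = f2 @ Bs[3].T @ Bs[2].T @ Bs[1].T @ Bs[0].T @ pi2   # reversed, transposed
assert np.isclose(lhs, rhs)
```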

We may prefer to have the end vector f in the interval [0,1]. Thus, we replace f6=π2T by f7=((2,2,2,2,2,2)+π2)T/12, in analogy to Section 2.7. This vector is positive and has sum 1; in particular, the entries lie between 0 and 1. The effect is to divide the value of (15) by 12 and increase it by 1/6.
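The effect of this replacement can be verified directly; the sketch below (our own) assumes, as in the text, that the entries of π2 sum to 0 and that the state vector x is a probability distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
pi2 = rng.random(d); pi2 -= pi2.mean()   # entries of pi_2 sum to 0
x = rng.random(d); x /= x.sum()          # a probability distribution over states
f6 = pi2                                  # end vector f_6 = pi_2^T
f7 = (2 * np.ones(d) + pi2) / 12          # the replacement end vector f_7

assert np.isclose(f7.sum(), 1.0)          # f_7 is a probability vector
assert np.isclose(x @ f7, 1/6 + (x @ f6) / 12)   # value divided by 12, shifted by 1/6
```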

We take the 5-pair instances of Neary [18] (Section 2.8) and construct five matrices Ai=A(viR,wiR) from the reversed pairs. The words vi and wi in these instances are nonempty, as required by Section 2.4 [18, Proof of Theorem 11]. We convert them to column-stochastic matrices Bi and use the transposed matrices BiT. We know that (v5R,w5R) must be used at the beginning of the PCP solution i1 i2 ⋯ im (i1=5) and nowhere else. Hence B5T must be used at the end of the product in (15), and we can merge it with f7 into an end vector f8=B5Tf7. The four remaining matrices B1T,B2T,B3T,B4T form the set of transition matrices of the PFA. The starting distribution π6=f2T is a unit vector, i.e., there is a single deterministic start state. This proves Theorem 2a for λ=1/6.

4.1 Changing the cutpoint 𝝀 by manipulating the end vector 𝒇

When the end vector f of a PFA is under control, one can change the cutpoint λ to any desired value in the open interval between 0 and 1: Adding a constant K to all output values fi will increase the expected output value by K, and scaling by a positive constant will affect the expected output value in the same way.

Thus, by changing all fi to αfi for 0<α<1, one may decrease λ to any positive value. By changing all fi to 1−α+αfi for 0<α<1, one may increase λ to any value less than 1. If f is not constrained to the interval [0,1], one can reach any real value λ.
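Both manipulations are affine changes of the output vector; a minimal numerical check (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.random(5); x /= x.sum()    # distribution over states after reading the input
f = rng.random(5)                  # output values in [0, 1]
alpha = 0.25

# scaling f by alpha scales the expected output (and hence the cutpoint) by alpha
assert np.isclose(x @ (alpha * f), alpha * (x @ f))
# the shift 1 - alpha + alpha*f adds 1 - alpha, because the entries of x sum to 1
assert np.isclose(x @ (1 - alpha + alpha * f), (1 - alpha) + alpha * (x @ f))
```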

4.2 Proof of Theorem 2b (fixed transition matrices)

We would like to apply the approach of Theorem 2a to the seven word pairs (v1,w1),…,(v7,w7) from Section 2.10. They should fulfill the assumption of Section 2.4 that none of the words is empty. However, one of the rules of the 3-rule semi-Thue system of Matiyasevich and Sénizergues, the rule xx̄ → ε, contains an empty word; see [16, System S5, (23)]. The reduction of Claus [4] (see also [12, Theorem 4]) would translate this into a PCP pair with an empty word.

Luckily, this reduction can be patched to yield seven pairs of nonempty words vi and wi. We refer to [22, Appendix B] for details.

As above, we form from these pairs the stochastic matrices B1T,…,B7T. Only B1T is a variable matrix; all other matrices are fixed. In the product π6BimTBim−1T⋯Bi1Tf7, we merge the first matrix with π6 into π8:=π6BimT=π6B2T, which becomes the fixed start vector, and the last matrix into f8:=Bi1Tf7=B1Tf7, which becomes the only input to the problem. The remaining five matrices B3T,B4T,B5T,B6T,B7T form the fixed set of transition matrices. This proves Theorem 2b for λ=1/6, and as above, it can be changed to any cutpoint λ. ∎

5 Outlook

The natural question is to ask for the smallest number of states for which PFA Emptiness is undecidable. Even if we consider generalized PFAs (rational-weighted automata), we could not reduce this number below 6. Claus [5, Theorem 7 and Corollary, p. 155] showed that the emptiness question can be decided for PFAs with two states.

A matrix dimension of six seems to be the minimum required in order to carry enough information to model concatenation of word pairs and allow testing for equality by a quadratic expression like (5), even in weighted automata. Undecidability for five or perhaps even three states would require some completely different (perhaps geometric) approach.

It would be an interesting exercise to trace back the undecidability proof of Matiyasevich and Sénizergues [16] to its origins and explicitly work out the fixed matrices of Theorem 1b or 2b. For one of the weaker results mentioned after Theorem 2, [21, Theorem 4b], one particular matrix from a set of 52 fixed 11×11 matrices is shown in [21, Section 7.3].

PFAs can have other merits than just a small number of states and input symbols. We discuss some of these criteria.

Positive and doubly-stochastic transition matrices.

In our results, the transition matrices can always be assumed to be strictly positive and sometimes also doubly-stochastic. Whenever this is the case, we have mentioned it in Theorems 1 and 2.

5.1 Obtaining a substantial gap between acceptance and rejection

As seen in formula (5), the acceptance probability barely rises above the threshold 0 when the input represents a solution of the PCP. (This tiny gap is further reduced by multiplying it with α^m in Step 4.) Thus, our constructions depend on the capability of a PFA to “detect” minute fluctuations of the acceptance probability above the cutpoint. This statement applies to all undecidability proofs in the Nasu–Honda line of descent.

By contrast, the proof of Condon and Lipton [6] from 1989 gives a more robust result, see also [21, Section 4]: For any ε>0, it yields a PFA such that the largest acceptance probability is either 1−ε or ε, and the problem to detect which of the two cases holds is undecidable. Undecidability is derived from the halting problem for 2-counter machines, and the number of states of the PFA is beyond control.

Luckily, our results can be strengthened to even surpass the strong separation achieved in the Condon–Lipton construction, by the following gap amplification technique of Gimbert and Oualhadj, which we formulate in slightly generalized form.

Theorem 7 (Gimbert and Oualhadj 2010 [10]).

We are given a PFA A with input alphabet Σ and d states, with an output vector f ∈ [0,1]^d. We denote its acceptance probability for an input u ∈ Σ* by ϕ(u). Let λA,λB be two thresholds with 0 ≤ λA ≤ 1 and 0 < λB < 1.

Then, from the description of A, we can construct a PFA B with input alphabet Σ ∪ {𝚎𝚗𝚍,𝚌𝚑𝚎𝚌𝚔} with 2d+3 states, a single start state and a single accepting state, with the following property.

  1. If every input u ∈ Σ* for A has acceptance probability ϕ(u) ≤ λA, then every input for B is accepted by B with probability at most λB.

  2. If A has an input u ∈ Σ* with acceptance probability ϕ(u) > λA, then B has inputs with acceptance probability arbitrarily close to 1. ∎

This was proved for λA=λB=1/2 by Gimbert and Oualhadj [10] in 2010 in order to show that it is undecidable whether the conclusion of Case 2 holds for a PFA B (the “Value 1 Problem”). The construction and proof were simplified by Fijalkow as part of a short 8-page note [8]. The generalization to arbitrary λA and λB is not difficult, and in addition, we have included the precise statement about the number 2d+3 of states. A proof is given in [22, Appendix A]. It essentially follows Fijalkow's construction, and it eliminates an oversight in [10] and [8].

With this technique, one can achieve an arbitrarily large gap with a moderate increase in states, roughly by a factor of 2. In particular, from Theorem 2a, we get a PFA B with 6 matrices of size 15×15, which exhibits the strong dichotomy expressed in Theorem 7, for any λB>0.

This construction does not preserve the property of being a PFA with fixed transition matrices. In the PFA B constructed in [22, Appendix A, see Table 2], the transitions depend both on the starting distribution π and on the final vector f of A.

5.2 Uniqueness

In Theorem 1a and Theorem 2a, the constructed PFA has the property that the recognized language contains at most one word. As with the large gap guarantee in Section 5.1, this leads to a stronger statement, where the problem takes on the nature of a promise problem.

Neary [18] derived his undecidable PCP instances from binary tag systems. A binary tag system performs a deterministic computation. It follows from the correctness argument of the simulation that the PCP solution is unique if it exists, apart from the obvious possibility of repeating the solution arbitrarily. In Section 2.8, we have excluded the latter possibility by fixing the end pair (v5R,w5R) to be the first pair and removing it from the list. Thus, uniqueness holds for the PFAs in Theorems 1a and 2a.

However, in the conversion to a binary input alphabet for Theorem 1c (Section 3.1), uniqueness is lost: We have seen that we may always add a or aa to a solution. We don’t see an easy way to eliminate these extra solutions without increasing the number of states.

5.3 Simple probabilistic automata

In a simple PFA, all probabilities are 0, 1/2, or 1 [10, Definition 2]. We have constructed our PFA with decimal fractions, but it would not be hard to switch to binary fractions. Before Step 4, the number of states should be increased to a power of two, so that the entries of J become binary fractions as well. Once all transition probabilities are binary fractions, the PFA can be converted to a simple PFA by padding each input symbol with sufficiently many dummy symbols, so that the PFA has time to make its random decisions with a sequence of unbiased coin flips; see [21, Section 4.4, proof of Theorem 1, item (b)]. Thus, the results can be achieved with simple PFAs, but with a larger number of matrices and a larger number of states. The precise quantities would depend on the lengths of the words vi and wi in the PCP.
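The padding argument rests on the fact that a transition whose probabilities are binary fractions with denominator 2^t can be realized by t unbiased coin flips. A small sketch of this decomposition (our own illustration; the names and the depth-3 example are not from the paper):

```python
import random
from collections import Counter

def outcome(leaf, dist, t):
    """Map a leaf of the depth-t coin-flip tree (0 <= leaf < 2^t) to an outcome
    drawn with the dyadic probabilities in dist (denominators dividing 2^t)."""
    acc = 0
    for i, p in enumerate(dist):
        acc += round(p * 2 ** t)
        if leaf < acc:
            return i

def step(dist, t):
    """Simulate one random transition by t unbiased coin flips."""
    leaf = sum(random.getrandbits(1) << j for j in range(t))
    return outcome(leaf, dist, t)

# probabilities 3/8, 1/8, 1/2 are realized exactly by 3 fair coin flips:
leaves = Counter(outcome(l, [3/8, 1/8, 4/8], 3) for l in range(8))
assert leaves == Counter({0: 3, 1: 1, 2: 4})
```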

References

  • [1] Vincent D. Blondel and Vincent Canterini. Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Systems, 36:231–245, 2003. doi:10.1007/s00224-003-1061-2.
  • [2] Vincent D. Blondel and John N. Tsitsiklis. The boundedness of all products of a pair of matrices is undecidable. Systems & Control Letters, 41(2):135–140, 2000. doi:10.1016/S0167-6911(00)00049-9.
  • [3] Vuong Bui. Growth of Bilinear Maps. PhD thesis, Freie Universität Berlin, Institut für Informatik, 2023. doi:10.17169/refubium-41705.
  • [4] Volker Claus. Some remarks on PCP(k) and related problems. Bull. Europ. Assoc. Theoret. Computer Sci. (EATCS), 12:54–61, October 1980. URL: http://page.mi.fu-berlin.de/rote/Kram/Volker_Claus-Some_remarks_on_PCP(k)_and_related_problems-1980-Bull-EATCS-12.pdf.
  • [5] Volker Claus. The (n,k)-bounded emptiness-problem for probabilistic acceptors and related problems. Acta Informatica, 16:139–160, 1981. doi:10.1007/BF00261257.
  • [6] A. Condon and R. J. Lipton. On the complexity of space bounded interactive proofs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, SFCS ’89, pages 462–467, USA, 1989. IEEE Computer Society. doi:10.1109/SFCS.1989.63519.
  • [7] Ruiwen Dong. Recent advances in algorithmic problems for semigroups. ACM SIGLOG News, 10(4):3–23, December 2023. doi:10.1145/3636362.3636365.
  • [8] Nathanaël Fijalkow. Undecidability results for probabilistic automata. ACM SIGLOG News, 4(4):10–17, November 2017. doi:10.1145/3157831.3157833.
  • [9] Rūsiņš Freivalds. Probabilistic two-way machines. In Jozef Gruska and Michal Chytil, editors, Mathematical Foundations of Computer Science 1981 (MFCS), volume 118 of Lecture Notes in Computer Science, pages 33–45. Springer, 1981. doi:10.1007/3-540-10856-4_72.
  • [10] Hugo Gimbert and Youssouf Oualhadj. Probabilistic automata on finite words: Decidable and undecidable problems. In Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, editors, Automata, Languages and Programming. 37th International Colloquium, ICALP 2010, Bordeaux, France, July 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pages 527–538, Berlin, Heidelberg, 2010. Springer-Verlag. Full version in https://hal.science/hal-00456538v3. doi:10.1007/978-3-642-14162-1_44.
  • [11] Vesa Halava and Tero Harju. On Markov’s undecidability theorem for integer matrices. Semigroup Forum, 75:173–180, 2007. doi:10.1007/s00233-007-0714-x.
  • [12] Vesa Halava, Tero Harju, and Mika Hirvensalo. Undecidability bounds for integer matrices using Claus instances. International Journal of Foundations of Computer Science, 18(05):931–948, 2007. doi:10.1142/S0129054107005066.
  • [13] Mika Hirvensalo. Improved undecidability results on the emptiness problem of probabilistic and quantum cut-point languages. In Jan van Leeuwen, Giuseppe F. Italiano, Wiebe van der Hoek, Christoph Meinel, Harald Sack, and František Plášil, editors, SOFSEM 2007: Theory and Practice of Computer Science, volume 4362 of Lecture Notes in Computer Science, pages 309–319, Berlin, Heidelberg, 2007. Springer. doi:10.1007/978-3-540-69507-3_25.
  • [14] Omid Madani, Steve Hanks, and Anne Condon. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell., 147(1–2):5–34, 2003. doi:10.1016/S0004-3702(02)00378-8.
  • [15] A. Markov. On certain insoluble problems concerning matrices. Doklady Akad. Nauk SSSR (N.S.), 57:539–542, 1947. (Russian).
  • [16] Yuri Matiyasevich and Géraud Sénizergues. Decision problems for semi-Thue systems with a few rules. Theoretical Computer Science, 330(1):145–169, 2005. doi:10.1016/j.tcs.2004.09.016.
  • [17] Masakazu Nasu and Namio Honda. Mappings induced by PGSM-mappings and some recursively unsolvable problems of finite probabilistic automata. Information and Control, 15(3):250–273, 1969. doi:10.1016/S0019-9958(69)90449-5.
  • [18] Turlough Neary. Undecidability in binary tag systems and the Post correspondence problem for five pairs of words. In Ernst W. Mayr and Nicolas Ollinger, editors, 32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), volume 30 of Leibniz International Proceedings in Informatics (LIPIcs), pages 649–661, Dagstuhl, Germany, 2015. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. doi:10.4230/LIPIcs.STACS.2015.649.
  • [19] Michael S. Paterson. Unsolvability in 3×3 matrices. Stud. in Appl. Math., 49(1):105–107, 1970. doi:10.1002/sapm1970491105.
  • [20] Azaria Paz. Introduction to Probabilistic Automata. Computer Science and Applied Mathematics. Academic Press, New York, 1971. doi:10.1016/c2013-0-11297-4.
  • [21] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable, June 2024. doi:10.48550/arXiv.2405.03035.
  • [22] Günter Rote. Probabilistic Finite Automaton Emptiness is undecidable for a fixed automaton, December 2024. doi:10.48550/arXiv.2412.05198.
  • [23] Paavo Turakainen. Generalized automata and stochastic languages. Proc. Amer. Math. Soc., 21:303–309, 1969. doi:10.1090/S0002-9939-1969-0242596-1.
  • [24] Paavo Turakainen. Word-functions of stochastic and pseudo stochastic automata. Annales Fennici Mathematici, 1(1):27–37, February 1975. doi:10.5186/aasfm.1975.0126.