
The Agafonov and Schnorr-Stimm Theorems for Probabilistic Automata

Laurent Bienvenu (LaBRI, CNRS & Université de Bordeaux, Talence, France), Hugo Gimbert (LaBRI, CNRS & Université de Bordeaux, Talence, France), Subin Pulari (LaBRI, CNRS & Université de Bordeaux, Talence, France)
Abstract

For a fixed alphabet A, an infinite sequence X is said to be normal if every word w over A appears in X with the same frequency as any other word of the same length. A classical result of Agafonov (1966) relates normality to finite automata as follows: a sequence X is normal if and only if any subsequence of X selected by a finite automaton is itself normal. Another theorem of Schnorr and Stimm (1972) gives an alternative characterization: a sequence X is normal if and only if no gambler can win large amounts of money by betting on the sequence X using a strategy that can be described by a finite automaton. Both of these theorems are established in the setting of deterministic finite automata. This raises the question as to whether they can be extended to the setting of probabilistic finite automata. In the case of the Agafonov theorem, a partial positive answer was given by Léchine et al. (MFCS 2024) in a restricted case of probabilistic automata with rational transition probabilities.

In this paper, we settle the full conjecture by proving that both the Agafonov and the Schnorr-Stimm theorems hold true for arbitrary probabilistic automata. Specifically, we show that a sequence X is normal if and only if any probabilistic automaton selects a normal subsequence of X with probability 1 and also show that a sequence X is normal if and only if any probabilistic finite-state gambler fails to win on X with probability 1.

Keywords and phrases:
Normality, Agafonov theorem, probabilistic automata
Funding:
Laurent Bienvenu: supported in part by the ANR project FLITTLA ANR-21-CE48-0023.
Subin Pulari: supported in part by the ANR project FLITTLA ANR-21-CE48-0023.
Copyright and License:
© Laurent Bienvenu, Hugo Gimbert, and Subin Pulari; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Formal languages and automata theory
Related Version:
Extended Version: https://arxiv.org/abs/2502.12307
Editors:
C. Aiswarya, Ruta Mehta, and Subhajit Roy

1 Introduction

Given a finite alphabet A of k letters, an infinite sequence X of letters is said to be normal if every word w over A appears as a subword of X with the same frequency as any other word of the same length, namely (1/k)^{|w|}. The famous Champernowne sequence

01234567891011121314151617181920212223242526…

can be shown to be normal over the alphabet A = {0, 1, …, 9}. The number π, for example, is conjectured to have a normal expansion in every base, though this very much remains an open question. Normal sequences are plentiful, and an easy way to generate a normal sequence X is to draw each letter X(n) at random from the alphabet A (all letters having the same probability 1/|A|), independently of the other letters X(m). The law of large numbers tells us that we obtain a normal sequence with probability 1.
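The random-draw argument above can be checked empirically with a short simulation (illustrative only, not part of the paper; alphabet size and sample length are arbitrary choices):

```python
import random

def letter_frequencies(seq):
    """Empirical frequency of each letter in a finite prefix."""
    n = len(seq)
    return {a: seq.count(a) / n for a in set(seq)}

random.seed(1)
alphabet = "0123456789"
# Draw each letter uniformly and independently, as in the text.
prefix = "".join(random.choice(alphabet) for _ in range(100_000))
freqs = letter_frequencies(prefix)
# By the law of large numbers, every frequency is close to 1/10.
assert all(abs(f - 0.1) < 0.01 for f in freqs.values())
```

Of course, closeness on a finite prefix only illustrates the law of large numbers; normality itself is an asymptotic property of the infinite sequence.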

Of course, there are also plenty of examples of non-normal sequences:

  • Periodic, or ultimately periodic, sequences can never be normal: indeed, if the period of X is k, there are only k possible subwords of X of length k, hence most words of length k do not appear in X and their frequency is 0.

  • Sturmian sequences, which are sequences with only k+1 different subwords of length k (such as the Fibonacci sequence 100101001001010010100100101001001… obtained by iterating the morphism 0 ↦ 01 and 1 ↦ 0), are not normal for the same reason.

  • The Thue-Morse sequence 01101001100101101001011001101001… obtained by iterating the morphism 0 ↦ 01 and 1 ↦ 10 is not normal because 000 and 111 do not appear as subwords.

  • A sequence of 0’s and 1’s generated at random where each bit is chosen equal to the previous one with probability 2/3 will have, with probability 1, all possible finite words as subwords but will not (still with probability 1) be normal as for example the word 00 will appear with frequency 1/3 instead of 1/4.
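The last example can be checked by a quick Monte Carlo simulation (illustrative only; the sample length and tolerances are arbitrary choices):

```python
import random

random.seed(0)
# Each bit repeats the previous one with probability 2/3.
bits = ["0"]
for _ in range(200_000):
    bits.append(bits[-1] if random.random() < 2/3 else str(1 - int(bits[-1])))
X = "".join(bits)

freq_0 = X.count("0") / len(X)
freq_00 = sum(X[i:i+2] == "00" for i in range(len(X) - 1)) / (len(X) - 1)
# Single letters are still balanced, but "00" occurs with frequency
# about 1/2 * 2/3 = 1/3 rather than the 1/4 required for normality.
assert abs(freq_0 - 1/2) < 0.02
assert abs(freq_00 - 1/3) < 0.02
```

This illustrates why balancedness of single letters does not suffice for normality: correlations between consecutive letters skew the frequencies of longer words.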

It turns out that normality has a nice interpretation in terms of finite automata. Indeed, two classical results, one due to Agafonov [1] and the other due to Schnorr and Stimm [9], assert that an infinite sequence is normal if and only if it cannot be predicted by a finite automaton with better-than-average accuracy. Of course, one needs to specify what “predicted” means. We consider two prediction models.

  1. (I)

    In the Agafonov model, an automaton reads the infinite sequence X one letter at a time and updates its state in the usual way. Some of its states have a “select” tag on them. When the current state has such a tag, the next letter will be selected and added to a subsequence Y. We consider the automaton successful at predicting X if the subsequence Y built in this process is infinite and some letter of A does not have asymptotic frequency 1/|A| in Y. This means that the automaton has exhibited a statistical anomaly in the sequence X and isolated this anomaly in the subsequence Y.

  2. (II)

    In the Schnorr-Stimm model, the predictor is still an automaton, but this time it is viewed as a gambling strategy. The gambler starts with a capital of $1. Each state q is labeled with a betting function γ_q : A → ℝ_{≥0}. This function represents the multiplier the predictor would like applied to her capital, depending on the value of the next letter. For example, suppose the player plays against a sequence X ∈ {a,b,c}^ω. If her current capital is $2 and the current state q is labelled by a betting function γ_q such that γ_q(a) = 0.7, γ_q(b) = 1.1 and γ_q(c) = 1.2, then if the next letter is a, her new capital will be $1.4; if it is b, her new capital will be $2.2; and if it is c, her new capital will be $2.4. For the game to be fair, each betting function γ_q must have expectation equal to 1, i.e., must satisfy (1/|A|) ∑_{a∈A} γ_q(a) = 1. We say that the predictor wins if the capital of the player takes arbitrarily large values throughout the (infinite) game. That is, the predictor has spotted some type of statistical anomaly and is exploiting it to get rich!
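The worked example in model II amounts to the following computation (a direct transcription, using the numbers from the text):

```python
# Betting functions map each letter to a non-negative multiplier whose
# average over the alphabet is 1 (the fairness condition).
gamma_q = {"a": 0.7, "b": 1.1, "c": 1.2}
assert abs(sum(gamma_q.values()) / len(gamma_q) - 1.0) < 1e-9  # fair bet

capital = 2.0
# New capital depending on which letter is read next:
outcomes = {letter: capital * mult for letter, mult in gamma_q.items()}
assert abs(outcomes["a"] - 1.4) < 1e-9
assert abs(outcomes["b"] - 2.2) < 1e-9
assert abs(outcomes["c"] - 2.4) < 1e-9
```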

Both of these models lead to the same conclusion: an infinite sequence X is normal if and only if it is unpredictable (by a finite automaton).

Theorem 1 (Agafonov [1]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    For any automaton 𝒜 that selects a subsequence Y of X as in model I, either Y is finite, or every letter of A appears in Y with asymptotic frequency 1/|A|.

  3. (iii)

    For any automaton 𝒜 that selects a subsequence Y of X as in model I, either Y is finite, or Y is normal.

(See also Carton [3] and Seiller and Simonsen [10] for a modern account of the above theorem).

Theorem 2 (Schnorr-Stimm [9]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    Any automaton 𝒜 betting on X according to model II does not win.

In both of these theorems, the finite automata used for prediction are assumed to be deterministic. Would the situation change if one allowed probabilistic automata? In principle, one would not expect an unpredictable sequence to become predictable in the presence of a random source. Indeed, given a sequence X and a random source R, it seems, informally speaking, that almost surely R will not “know” anything about X and thus will not help in predicting X. Surprisingly, this intuition is wrong in the setting where the predictors are not finite automata but Turing machines, as shown by Bienvenu et al. [2], who built a sequence that is unpredictable by deterministic Turing machines (in either prediction model, selection or gambling) and becomes predictable (in either model) if one allows probabilistic Turing machines. Nonetheless, finite automata are much weaker than Turing machines, and Bienvenu et al.’s construction does not work for such a restricted model of computation. Indeed, recently, Léchine et al. showed that Agafonov’s theorem holds for probabilistic automata in the restricted case where the transition probabilities are rational.

Theorem 3 (Léchine et al. [6]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    For any probabilistic automaton 𝒜 with rational probabilities, almost surely, 𝒜 selects a subsequence Y of X that is either finite, or every letter of A appears in Y with asymptotic frequency 1/|A|.

  3. (iii)

    For any probabilistic automaton 𝒜 with rational probabilities, almost surely, 𝒜 selects a subsequence Y of X such that either Y is finite, or Y is normal.

This led them to conjecture that the probabilistic version of Agafonov’s theorem holds in the general case, where the probabilities are real-valued rather than being rational. In this paper, we prove this conjecture and also prove the probabilistic version of the Schnorr-Stimm theorem. Additionally, we establish a probabilistic version of the Schnorr-Stimm dichotomy regarding the winning rates of probabilistic gamblers.

Léchine et al.’s proof reduces the rational probabilistic case to the deterministic case via a clever combinatorial construction (which substantially increases the number of states). Our proof is also a reduction to the deterministic case, but of a very different nature: instead of encoding a probabilistic automaton into a deterministic one, we use the fact that a probabilistic automaton can be seen as a deterministic one over a larger alphabet, without altering the number of states. For this, we will need an extension of the deterministic case to Bernoulli measures (an extension which was proved by Seiller and Simonsen [10]), which will be presented in the next section.

Notation and terminology

We finish this introduction by formalizing the concepts discussed so far and gathering the notation and terminology that will be used in the rest of the paper.

Given an alphabet A, we denote by A^* the set of finite words over A, by A^n the set of words of length n, by A^ω the set of infinite sequences of letters, and by A^{≤ω} the set A^* ∪ A^ω. For X ∈ A^{≤ω}, we denote by X(i) the i-th letter of X (by convention there is a 0-th letter) and by X[i,j] the word X(i)X(i+1)⋯X(j). Let ϵ denote the empty string.

Given a word u of length k and a word w of length n ≥ k, we denote by NbOcc(u,w) the number of occurrences of the word u in w, i.e.,

NbOcc(u,w) = #{ i : 0 ≤ i ≤ n−k, w[i, i+k−1] = u }

and the frequency of occurrence Freq(u,w) of u in w is naturally defined by

Freq(u,w) = NbOcc(u,w) / (n−k+1).

When X is an infinite sequence, we define

Freq^−(u,X) = lim inf_n Freq(u, X[0,n])   and   Freq^+(u,X) = lim sup_n Freq(u, X[0,n]).

When Freq^−(u,X) and Freq^+(u,X) have the same value, we simply call this common value Freq(u,X).
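The finite-word quantities NbOcc and Freq can be implemented directly (a straightforward transcription of the definitions; function names are ours):

```python
def nb_occ(u, w):
    """NbOcc(u, w): number of (possibly overlapping) occurrences of u in w."""
    k = len(u)
    return sum(w[i:i+k] == u for i in range(len(w) - k + 1))

def freq(u, w):
    """Freq(u, w) = NbOcc(u, w) / (n - k + 1), with n = |w|, k = |u|."""
    return nb_occ(u, w) / (len(w) - len(u) + 1)

# Occurrences may overlap: "aa" occurs twice in "aaa".
assert nb_occ("aa", "aaa") == 2
assert freq("aa", "aaa") == 1.0
assert freq("0", "0101") == 0.5
```

Note that occurrences are counted at every position, so they may overlap, which is why the denominator is n − k + 1 rather than ⌊n/k⌋ (the latter appears only for the block frequency BFreq of Section 2).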

Given X ∈ A^ω, we say that X is balanced if all letters appear in X with the expected frequency, i.e., Freq(a,X) = 1/|A| for all a ∈ A. We say that X is normal if all words appear in X with the expected frequency, i.e., Freq(w,X) = |A|^{−|w|} for all w ∈ A^*.

A deterministic finite automaton (DFA) is a tuple (Q, A, q_I, δ) where Q is a finite set of states, A a finite alphabet, q_I the initial state and δ : Q × A → Q the transition function (in this paper, runs of automata are meant to be infinite, hence there is no need for final states). We denote by δ^* the function from Q × A^* to Q defined inductively by δ^*(q, ϵ) = q and, for w ∈ A^* and a ∈ A, δ^*(q, w·a) = δ(δ^*(q, w), a), where · is the concatenation of words.

An automatic selector (or selector for short) is a tuple (Q,A,qI,δ,S) where (Q,A,qI,δ) is a DFA and S is a subset of Q, representing the selection states.

Given a selector 𝒮 = (Q, A, q_I, δ, S), we define the selection function from A^* to A^* inductively by:

Select(𝒮, ϵ) = ϵ

and for w ∈ A^* and a ∈ A:

Select(𝒮, w·a) = Select(𝒮, w)·a if δ^*(q_I, w) ∈ S, and Select(𝒮, w·a) = Select(𝒮, w) if δ^*(q_I, w) ∉ S.

If X is an infinite sequence in A^ω, the sequence of words Select(𝒮, X[0,n]) is non-decreasing with respect to the prefix order and thus converges to a sequence Y ∈ A^{≤ω}, which we call the subsequence of X selected by 𝒮 and denote by Select(𝒮, X).
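On finite prefixes, the selection function can be sketched as follows (an illustrative transcription of the inductive definition; the toy selector below is our own example):

```python
def select(delta, q_init, selecting, word):
    """Select(S, word): output the letters read while the current state
    is a selecting state (the state is tested before each transition)."""
    q, out = q_init, []
    for a in word:
        if q in selecting:
            out.append(a)
        q = delta[q, a]
    return "".join(out)

# Toy selector over {0, 1}: select the letter that follows each 1.
delta = {("idle", "0"): "idle", ("idle", "1"): "sel",
         ("sel", "0"): "idle", ("sel", "1"): "sel"}
assert select(delta, "idle", {"sel"}, "011010") == "100"
```

The classical example of this kind is the selector that, on a normal sequence, extracts the letters following a fixed pattern; Agafonov's theorem says that the extracted subsequence is again normal.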

An automatic gambler (or gambler for short) is a tuple (Q, A, q_I, δ, γ) where (Q, A, q_I, δ) is a DFA and γ is a function from Q × A to ℝ_{≥0} such that for all q, (1/|A|) ∑_{a∈A} γ(q,a) = 1. As said earlier, the value of γ(q,a) should be interpreted as the multiplier the gambler, being currently in state q, would like applied to her capital if the next letter read is a. The condition (1/|A|) ∑_{a∈A} γ(q,a) = 1 ensures that the game is fair.

Given a gambler 𝒢 = (Q, A, q_I, δ, γ) and w ∈ A^*, we define Capital(𝒢, w) inductively by

Capital(𝒢, ϵ) = 1

and for w ∈ A^* and a ∈ A,

Capital(𝒢, w·a) = Capital(𝒢, w) · γ(δ^*(q_I, w), a)

and we say that a gambler 𝒢 wins against X ∈ A^ω if

lim sup_{n→+∞} Capital(𝒢, X[0,n]) = +∞

(otherwise we say that 𝒢 loses).
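The inductive definition of Capital unfolds into a simple product along the run (a sketch; the one-state gambler below is a hypothetical example of ours):

```python
def capital(delta, q_init, gamma, word):
    """Capital(G, w): multiply in gamma(q, a) for each letter a
    read while in state q, starting from capital 1."""
    q, cap = q_init, 1.0
    for a in word:
        cap *= gamma[q, a]
        q = delta[q, a]
    return cap

# One-state gambler over {0, 1} that doubles on 0 and loses all on 1.
# Fairness: (1/|A|) * (2.0 + 0.0) = 1.
delta = {("q", "0"): "q", ("q", "1"): "q"}
gamma = {("q", "0"): 2.0, ("q", "1"): 0.0}
assert capital(delta, "q", gamma, "000") == 8.0
assert capital(delta, "q", gamma, "001") == 0.0
```

Against the all-zeros sequence this gambler wins (its capital doubles at every step), which is consistent with Theorem 2: the all-zeros sequence is as far from normal as possible.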

2 Deterministic prediction for Bernoulli measures

In classical normality, all letters of the alphabet occur with the same frequency. We can however consider the extension of normality to Bernoulli measures. A Bernoulli measure over A^ω is a probability measure where the letters of an infinite sequence X are drawn at random independently of one another, but the distribution over the alphabet A may be non-uniform.

Definition 4.

Let μ : A → [0,1] be a distribution over the alphabet A (hence satisfying ∑_{a∈A} μ(a) = 1). The Bernoulli measure induced by μ, which we also denote by μ by abuse of notation, is the unique probability measure such that for all i, k ∈ ℕ and every word w = a_0⋯a_k ∈ A^*,

Pr_{X∼μ}[ X[i, i+k] = w ] = ∏_{j=0}^{k} μ(a_j).

We also denote by μ(w) the quantity ∏_{j=0}^{k} μ(a_j).

Normality generalizes very naturally to Bernoulli measures.

Definition 5.

Let μ be a Bernoulli measure. A sequence X ∈ A^ω is μ-balanced if Freq(a,X) = μ(a) for all a ∈ A. It is μ-normal if Freq(w,X) = μ(w) for all words w ∈ A^*.

We say that a Bernoulli measure μ is positive when μ(a)>0 for every letter a. In the rest of the paper, all Bernoulli measures will be assumed to be positive, and we simply say “Bernoulli measure” to mean “positive Bernoulli measure”.

The Agafonov theorem can be extended to Bernoulli measures, as proven by Seiller and Simonsen [10]. It is this theorem that we will use in the next section to obtain a proof of the Agafonov theorem for probabilistic selectors.

Theorem 6 (Agafonov theorem for Bernoulli measures [10]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    For any selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-balanced.

  3. (iii)

    For any selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-normal.

We can also easily generalize the notion of gambler to the setting of Bernoulli measures: it suffices to define a μ-gambler 𝒢 = (Q, A, q_I, δ, γ) as before but with the fairness condition on γ replaced by ∑_{a∈A} μ(a) γ(q,a) = 1 for every q. The function Capital and the notion of success are defined as before.

We will now prove that the Schnorr-Stimm theorem, just like the Agafonov theorem, can also be extended to Bernoulli measures.

Theorem 7 (Schnorr-Stimm theorem for Bernoulli measures).

For X ∈ A^ω and a Bernoulli measure μ, the following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    No μ-gambler 𝒢 wins by betting on X.

Proof.

(i) ⇒ (ii).

Suppose that X ∈ A^ω is μ-normal and consider a μ-gambler 𝒢 = (Q, A, q_I, δ, γ).

We can assume that the μ-gambler only has one state on which it places a non-trivial bet. Indeed, define for every q ∈ Q the μ-gambler 𝒢[q] = (Q, A, q_I, δ, γ[q]) where γ[q](q, a) = γ(q, a) for all a, and γ[q](q′, a) = 1 for all q′ ≠ q. That is, 𝒢[q] is the gambler 𝒢 where all states but state q are neutralized (no bet is placed while on them). By the multiplicative nature of the function Capital, we have for all n:

Capital(𝒢, X[0,n]) = ∏_{q∈Q} Capital(𝒢[q], X[0,n]).

Thus if we can show that all quantities Capital(𝒢[q], X[0,n]) are bounded, we are done. Let us assume that there is a state r that is the unique state on which 𝒢 bets. If instead of 𝒢 we consider the selector 𝒮 = (Q, A, q_I, δ, {r}) with only r as the selecting state, we know by Agafonov's theorem for Bernoulli measures (Theorem 6) that the subsequence Y of X selected by 𝒮 is μ-normal, hence in particular μ-balanced. But this subsequence consists precisely of the letters of X on which 𝒢 bets!

We can further assume that Y is infinite; otherwise the run of 𝒢 on X passes through r only finitely often, hence 𝒢 certainly cannot win, as the other states are not betting states. Now, suppose that at the n-th step of the run on X the state r has been visited k = k(n) times. We have

Capital(𝒢, X[0,n]) = ∏_{a∈A} γ(r,a)^{NbOcc(a, Y[0,k])}.

But since Y is μ-normal we have, for every a, NbOcc(a, Y[0,k]) = μ(a)·k + o(k). Thus,

Capital(𝒢, X[0,n]) = ∏_{a∈A} γ(r,a)^{μ(a)·k + o(k)}

or equivalently

log Capital(𝒢, X[0,n]) = (k + o(k)) ∑_{a∈A} μ(a) log γ(r,a)

(here we assume that all values γ(r,a) involved in the product are positive, for if not then the capital falls to 0 and we are done). Since ∑_{a∈A} μ(a) = 1 (μ being a distribution), we can use the strict concavity of the function log on (0, +∞) to apply Jensen's inequality and get

∑_{a∈A} μ(a) log γ(r,a) ≤ log( ∑_{a∈A} μ(a) γ(r,a) )

with strict inequality when the γ(r,a) are not all equal (which is the case when 𝒢 makes non-trivial bets). But by the fairness condition, we have ∑_{a∈A} μ(a) γ(r,a) = 1, hence log Capital(𝒢, X[0,n]) is either 0 or ultimately negative, which either way means that 𝒢 does not win.
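The Jensen step can be sanity-checked numerically (an illustration, not part of the proof; we take the uniform measure on a hypothetical four-letter alphabet and random fair bets):

```python
import math
import random

random.seed(2)
A = ["a", "b", "c", "d"]
mu = {a: 0.25 for a in A}  # uniform Bernoulli measure

for _ in range(1000):
    # Random fair bet: positive multipliers rescaled to have mean 1.
    raw = [random.uniform(0.1, 2.0) for _ in A]
    mean = sum(raw) / len(raw)
    gamma = {a: r / mean for a, r in zip(A, raw)}
    assert abs(sum(mu[a] * gamma[a] for a in A) - 1.0) < 1e-9  # fairness
    # Jensen: sum mu(a) log gamma(a) <= log sum mu(a) gamma(a) = log 1 = 0,
    # strictly, unless all multipliers are equal.
    drift = sum(mu[a] * math.log(gamma[a]) for a in A)
    assert drift < 0 or all(abs(g - 1.0) < 1e-12 for g in gamma.values())
```

The quantity `drift` is exactly the per-visit exponent ∑_a μ(a) log γ(r,a) from the proof: every fair non-trivial bet has a strictly negative expected log-growth on a μ-normal sequence.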

(ii) ⇒ (i).

Assume that X is not μ-normal. This means that there is some word w such that Freq(w, X[0,n]) does not converge to μ(w). Let us assume that w is such a word of minimal length and write w = u·x with u ∈ A^* and x ∈ A.

Consider the sequence of vectors f_n defined by

f_n = ( Freq(u·a, X[0,n]) / ∑_{b∈A} Freq(u·b, X[0,n]) )_{a∈A}

or equivalently,

f_n = ( NbOcc(u·a, X[0,n]) / ∑_{b∈A} NbOcc(u·b, X[0,n]) )_{a∈A}.

All of these vectors belong to the set Γ = { f : A → [0,1] : ∑_{a∈A} f(a) = 1 }. This is a compact set, hence the sequence f_n must have cluster points. By definition of u and x, we know that f_n does not converge to μ because f_n(x) does not converge to μ(x): indeed, in the definition of f_n(x), the denominator converges to Freq(u, X), which by minimality of w is defined and equal to μ(u), while the numerator is equal to Freq(w, X[0,n]), which by definition of w does not converge to μ(w) = μ(u)·μ(x).

Therefore, the sequence (fn) must have at least one cluster point ν different from μ. Fix such a cluster point ν.

We now build our gambler 𝒢 = (Q, A, q_I, δ, γ). The idea is that the gambler will record the last |u| letters it read and will only place bets when these exactly form the word u. Let us thus take Q = { q_v : v ∈ A^*, |v| ≤ |u| }, with initial state q_I = q_ϵ, and define δ by

δ(q_v, a) = q_{v·a} if |v| < |u|, and δ(q_v, a) = q_{v′·a} if |v| = |u| and v = x·v′ with x ∈ A.

Now, define γ(q_v, a) = 1 whenever v ≠ u, and γ(q_u, a) = ν(a)/μ(a) for all a ∈ A. Observe that this is a valid μ-gambler, as the fairness condition is satisfied: ∑_{a∈A} μ(a) · ν(a)/μ(a) = ∑_a ν(a) = 1.

Suppose that after reading n letters of X the state q_u has been visited k = k(n) times. First, observe that k(n) tends to +∞. Indeed, the state q_u is visited whenever u is seen as a subword of X, and we assumed that u appears in X with frequency μ(u) > 0, by minimality of w.

Second, unfolding the definition of f_n, we have

Capital(𝒢, X[0,n]) = ∏_{a∈A} ( ν(a)/μ(a) )^{k·f_n(a)}.

This gives

log Capital(𝒢, X[0,n]) = k ∑_{a∈A} f_n(a) log( ν(a)/μ(a) ).

Since ν is a cluster point of the sequence f_n, for any fixed ε > 0 there are infinitely many n (or k) such that

∑_{a∈A} f_n(a) log( ν(a)/μ(a) ) ≥ ∑_{a∈A} ν(a) log( ν(a)/μ(a) ) − ε.

But the term ∑_{a∈A} ν(a) log( ν(a)/μ(a) ) is the relative entropy of ν with respect to μ (also known as the Kullback–Leibler divergence, see for example [5]), which we denote by D_{KL}(ν‖μ). This quantity is non-negative in general and is positive when ν ≠ μ, which is the case here. We have thus established that for any fixed ε, there are arbitrarily large n and k such that

log Capital(𝒢, X[0,n]) ≥ k ( D_{KL}(ν‖μ) − ε ).

Taking any ε < D_{KL}(ν‖μ), this shows that

lim sup_{n→+∞} Capital(𝒢, X[0,n]) = +∞.

Let us note that this last proof actually gives us a finer analysis of normality, in terms of the rate of failure or success of the gambler. This was already observed by Schnorr and Stimm in their seminal paper, where they proved the following.

Theorem 8 (Schnorr-Stimm dichotomy theorem [9]).

Let X be an infinite sequence in A^ω.

  1. (i)

    If X is normal and 𝒢 is a gambler, then the capital of 𝒢 throughout the game either is ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not normal, then there exists a gambler 𝒢 which wins against X at an “infinitely often” exponential rate (i.e., lim sup_n log(Capital(𝒢, X[0,n]))/n > 0).

As a byproduct of our proof of Theorem 7, we have the same dichotomy for positive Bernoulli measures (i.e., Bernoulli measures such that μ(a)>0 for every letter):

Theorem 9 (Schnorr-Stimm dichotomy theorem for Bernoulli measures).

Let X be an infinite sequence in A^ω and μ a positive Bernoulli measure.

  1. (i)

    If X is μ-normal and 𝒢 is a μ-gambler, then the capital of 𝒢 throughout the game either is ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not μ-normal, then there exists a μ-gambler 𝒢 which wins against X at an “infinitely often” exponential rate.

Our proof of Theorem 7 almost establishes this, but we do need an additional technical lemma and the block-wise characterization of μ-normality. Let BFreq(u,w) denote the block frequency of the word u in w, defined as the proportion of non-overlapping blocks in w that are equal to u. More precisely, when n = |w| and k = |u|,

BFreq(u,w) = #{ i : 0 ≤ i < ⌊n/k⌋, w[ki, k(i+1)−1] = u } / ⌊n/k⌋.

For w ∈ A^*, let BFreq^−(w,X), BFreq^+(w,X) and BFreq(w,X) denote the lower, upper and limit block frequency of w in X, defined analogously to Freq^−(w,X), Freq^+(w,X) and Freq(w,X). A sequence X is μ-block normal if BFreq(w,X) = μ(w) for all words w ∈ A^*. The following was shown by Seiller and Simonsen.

Lemma 10 (Seiller-Simonsen [10]).

If a sequence X ∈ A^ω is μ-normal then X is μ-block normal.

We note that the converse implication also holds.

Lemma 11.

If a sequence X ∈ A^ω is μ-block normal then X is μ-normal.

Together, the above lemmas prove the following equivalence between μ-normality and μ-block normality.

Theorem 12.

A sequence X ∈ A^ω is μ-normal if and only if X is μ-block normal.

The proof of Lemma 11 uses a counting trick from the proof of Theorem 3.1 from [8] which in turn is based on the proof of the main theorem in [7].

Proof of Lemma 11.

As in the proof of Theorem 3.1 from [8], for any finite string w = a_1 a_2 ⋯ a_k ∈ A^k and large enough n,

Freq(w, X[0,n]) = f_1(n) + f_2(n) + ⋯ + f_{1+⌊log₂(n/k)⌋}(n) + (k−1)·O(log n)/(n−k+1)   (1)

where the f_p(n) are defined as follows:

f_p(n) = |{ i : X[ki, k(i+1)−1] = w, 0 ≤ i ≤ n/k }| / (n−k+1), if p = 1;
f_p(n) = ∑_{j=1}^{k−1} |{ i : X[2^{p−1}k·i, 2^{p−1}k·(i+1)−1] ∈ S_j, 0 ≤ i ≤ n/(2^{p−1}k) }| / (n−k+1), if 1 < p ≤ 1 + ⌊log₂(n/k)⌋;
f_p(n) = 0, otherwise.

In the above definition, S_j is the set of strings of the form u·a_1 a_2 ⋯ a_k·v where u is some string of length 2^{p−2}k − j and v is some string of length 2^{p−2}k − k + j.

Equation (1) shows that the frequency of w in X[0,n] can be written as a sum of different block frequencies. The quantity f_1(n) counts the number of occurrences of w inside disjoint k-length blocks of X[0,n], f_2(n) counts the number of occurrences crossing the boundaries of these k-blocks, f_3(n) counts those crossing boundaries of 2k-blocks, and so on. In general, f_p(n) counts the number of occurrences straddling boundaries of disjoint 2^{p−1}k-length blocks. Each f_p(n) may miss at most (k−1) occurrences of w, which are accounted for by the error term.

Since X is μ-block normal,

lim_n f_1(n) = lim_n |{ i : X[ki, k(i+1)−1] = w, 0 ≤ i ≤ n/k }| / (n−k+1) = μ(w)/k.

Now, when 1 < p ≤ 1 + ⌊log₂(n/k)⌋,

lim_n f_p(n) = ∑_{j=1}^{k−1} lim_n |{ i : X[2^{p−1}k·i, 2^{p−1}k·(i+1)−1] ∈ S_j, 0 ≤ i ≤ n/(2^{p−1}k) }| / (n−k+1) = μ(w)·(k−1) / (2^{p−1}k).

Since the convergence of the series ∑_{i=1}^{m} f_i(n) as m → ∞ is uniform in n, we may interchange limit and sum:

Freq(w, X) = lim_n ∑_{i=1}^{∞} f_i(n) = ∑_{i=1}^{∞} lim_n f_i(n).

Therefore from (1),

lim_n Freq(w, X[0,n]) = μ(w)/k + (k−1)·μ(w)·[ ∑_{i=1}^{∞} 1/(2^i k) ] = μ(w).

Hence, X is μ-normal.

We require the following technical lemma in the proof of Theorem 9.

Lemma 13.

Let (Q, A, q_I, δ) be a finite-state automaton, let μ be a positive Bernoulli measure, and let V_q(n, X) denote the number of times the state q is visited upon running the automaton on the first n letters of X ∈ A^ω.

Then, for every q ∈ Q, there exists a real number π_q ≥ 0 such that, for every μ-normal sequence X ∈ A^ω:

  • either Vq(n,X) is ultimately constant (i.e., q is visited only finitely often during the run on X)

  • or lim_n V_q(n, X)/n = π_q (i.e., the state q is visited with asymptotic frequency π_q), and this second case can only happen when π_q > 0.

Proof.

When running an automaton on a normal sequence, starting from any state, a strongly connected component must be reached in finitely many steps. Similarly to [10], let us consider the Markov chain corresponding to the |Q| × |Q| stochastic matrix P where

P_{ij} = ∑_{a∈A} μ(a) · 1_{δ(i,a)=j}.

The proof follows from the Ergodic Theorem for Markov chains, following the same steps as the proof of Lemma 4.5 from [9] with the uniform measure replaced by the positive Bernoulli measure induced by μ. We note that this line of proof uses μ-block normality, which is equivalent to μ-normality (Theorem 12).
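The induced chain can be built concretely (an illustration, not part of the proof; the two-state automaton and the measure below are hypothetical examples of ours):

```python
def transition_matrix(states, alphabet, delta, mu):
    """Stochastic matrix P with P[i][j] = sum of mu(a) over the
    letters a such that delta(i, a) = j."""
    P = {i: {j: 0.0 for j in states} for i in states}
    for i in states:
        for a in alphabet:
            P[i][delta[i, a]] += mu[a]
    return P

states = ["p", "q"]
delta = {("p", "0"): "p", ("p", "1"): "q",
         ("q", "0"): "q", ("q", "1"): "p"}
mu = {"0": 0.6, "1": 0.4}
P = transition_matrix(states, "01", delta, mu)
# Every row of a stochastic matrix sums to 1.
assert all(abs(sum(P[i].values()) - 1.0) < 1e-9 for i in states)
assert abs(P["p"]["q"] - 0.4) < 1e-9
```

The frequencies π_q of the lemma are then the stationary probabilities of this chain on the strongly connected component eventually reached by the run.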

Proof of Theorem 9.

In part (i) ⇒ (ii) of our proof of Theorem 7, we showed that for a given state r, either r is not a betting state (γ is the constant 1 on this state), or it is, and then the gambler loses money exponentially fast in the number of times this state is visited, the exponent being α_r = ∑_{a∈A} μ(a) log γ(r,a), which we proved to be negative. By Lemma 13, betting states are either visited finitely often or with positive asymptotic density. If they are all visited finitely often, the capital stabilizes after the last bet is made. Otherwise, if the betting states r are visited with frequencies π_r and at least one π_r is positive, then the gambler loses at an exponential rate, the exponent being ∑_r π_r α_r.

In part (ii) ⇒ (i), under the assumption that X is not μ-normal, we built a gambler 𝒢 satisfying the following: for any fixed ε, there are arbitrarily large n and k such that log Capital(𝒢, X[0,n]) ≥ k ( D_{KL}(ν‖μ) − ε ). Here, k is the number of visits to the state q_u. In the proof of Theorem 7 part (ii), this state q_u is visited with asymptotic frequency μ(u). Hence the gambler has an “infinitely often” exponential rate of success with exponent μ(u)·D_{KL}(ν‖μ).

3 The Agafonov theorem for PFAs

We now want to prove the extension of Theorem 6 to probabilistic automata/selectors. A probabilistic finite automaton (PFA) is a tuple (Q, A, q_I, δ) where Q is a finite set of states, A is a finite alphabet, q_I is the initial state and δ : Q × A → Δ(Q) is a probabilistic transition function, where Δ(Q) is the set of probability distributions over Q. In this setting, we define inductively the random variables δ^*(q, w) by δ^*(q, ϵ) = q and, for w ∈ A^* and a ∈ A, the event δ^*(q, w·a) = q′ is defined as the union

⋃_{r∈Q} [ δ^*(q, w) = r ∧ δ_{|w|+1}(r, a) = q′ ]

where { δ_n(r, b) : n ∈ ℕ, r ∈ Q, b ∈ A } is a family of independent random variables such that for all (n, r, b), the distribution of δ_n(r, b) is δ(r, b).

Modulo this change of type of transition, probabilistic selectors as well as Select are defined as before. This makes Select(𝒮,X) a random variable for every given XAω.

In order to lift the deterministic Agafonov theorem for Bernoulli measures to the probabilistic case, we will need some preliminary lemmas about normality.

Given two alphabets A and B, and given two sequences X ∈ A^ω and Y ∈ B^ω, we denote by X⊗Y the sequence Z over the alphabet A×B where Z(n) = (X(n), Y(n)). For two words v and w of the same length over A and B respectively, the product v⊗w is defined in the same way. Likewise, if μ and ν are Bernoulli measures over A and B respectively, μ⊗ν is the Bernoulli measure ξ over A×B where ξ((a,b)) = μ(a)·ν(b) for all (a,b) ∈ A×B. Finally, if Z is a sequence over a product alphabet A×B, we denote by π_0 and π_1 its first and second projections respectively (in other words, π_0(X⊗Y) = X and π_1(X⊗Y) = Y for all X, Y).

Lemma 14.

Let A and B be two alphabets and μ, ν two Bernoulli measures over A and B respectively. If a sequence X ∈ A^ω is μ-normal and a sequence Y ∈ B^ω is drawn at random according to ν, then ν-almost surely (i.e., for a set of Y of ν-probability 1), X⊗Y is μ⊗ν-normal.

Proof.

While it is a bit easier to prove this lemma for block-normality, we prefer a more direct proof which does not rely on the equivalence between normality and block-normality.

Let w = u⊗v be a non-empty word over (A×B)^*. Let N be a large integer. We split X⊗Y into blocks of length N:

X⊗Y = (X_1⊗Y_1)(X_2⊗Y_2)⋯

with |X_i| = |Y_i| = N for all i ≥ 1. Introduce the random variables (B_i)_{i≥1} which count the number of occurrences of u⊗v in each block:

B_i = NbOcc(u⊗v, X_i⊗Y_i).

For every integer n, within (X⊗Y)[0,n] there are ⌊n/N⌋ complete blocks. Some occurrences of u⊗v in (X⊗Y)[0,n] occur inside a block X_i⊗Y_i, while others do not because they overlap two contiguous blocks. There can be at most |w| such overlapping occurrences between two given blocks. That observation leads to two ways to count the number of occurrences of u⊗v: the exact count C_n and the N-block count C_{n,N}, two random variables satisfying:

C_n = NbOcc(u⊗v, (X⊗Y)[0,n])
C_{n,N} = ∑_{1 ≤ i ≤ ⌊n/N⌋} B_i
C_{n,N} ≤ C_n ≤ C_{n+N,N} + |w|·⌈n/N⌉.   (2)

Let us focus on C_{n,N} first. The variables B_i can be grouped into |A|^N different buckets, with respect to the corresponding value of X_i. For every x ∈ A^N, set I_n(x) = { 1 ≤ i ≤ ⌊n/N⌋ : X_i = x }, which is non-empty for n large enough, since X is μ-normal. The random variables (X_i⊗Y_i) are mutually independent, and so are the random variables (B_i)_{i≥1}. Moreover, for a fixed x ∈ A^N, the random variables (B_i)_{i∈I_n(x)} are identically distributed; denote by ξ_N(x) the corresponding probability distribution on {0, 1, …, N−|w|+1}. Then 𝔼[ξ_N(x)] measures the expected number of occurrences of u⊗v in a sequence whose left part is fixed, equal to x, and whose right part is independently generated according to ν. Thus

𝔼[ξ_N(x)] = NbOcc(u, x) · ν(v).

According to the Law of Large Numbers, (1/|I_n(x)|) ∑_{i∈I_n(x)} B_i →_n NbOcc(u, x) · ν(v). Since X is μ-normal, every word x ∈ A^N occurs with frequency μ(x) in X_1, X_2, …; thus, almost surely,

lim_n (1/n) C_{n,N} = (1/N) ( ∑_{x∈A^N} μ(x) · NbOcc(u, x) ) · ν(v).   (3)

Since X is μ-normal, the right-hand side of (3) converges to μ(u)·ν(v) when N grows large. Using (2), we get the desired result:

(1/n)·C_n →_n lim_N ( lim_n (1/n)·C_{n,N} ) = (μ⊗ν)(u⊗v).

Lemma 15.

If a sequence Z is μ⊗ν-normal over A×B, then π_0(Z) and π_1(Z) are μ-normal and ν-normal respectively.

Proof.

Suppose Z ∈ (A×B)^ω is μ⊗ν-normal. We only need to show that π_0(Z) is μ-normal; the proof of the ν-normality of π_1(Z) is the same by symmetry. Let w ∈ A^n. We have

Freq^+(w, π_0(Z)) = ∑_{w′∈B^n} Freq^+(w⊗w′, Z)

which, by μ⊗ν-normality of Z, implies

Freq^+(w, π_0(Z)) = ∑_{w′∈B^n} μ⊗ν(w⊗w′) = ∑_{w′∈B^n} μ(w)·ν(w′) = μ(w).

The same holds for Freq^−(w, π_0(Z)), hence we have proven that Freq(w, π_0(Z)) = μ(w), which is what we wanted.

We are now ready to prove Agafonov’s theorem for PFAs.

Theorem 16 (Agafonov’s theorem for PFA).

Let X ∈ A^ω and let μ be a Bernoulli measure over A. The following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    For any deterministic selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-normal.

  3. (iii)

    For any probabilistic selector 𝒮 that selects a subsequence Y of X, almost surely, either Y is finite, or Y is μ-normal.

Proof.

Since deterministic selectors (for which we already have Agafonov’s theorem) are a subset of the probabilistic ones, all that is left to prove is (i)(iii).

Let 𝒮 = (Q, A, q_I, δ, S) be a probabilistic selector. Recall that each transition δ(q, a) (where q ∈ Q and a ∈ A) is some probability distribution over Q. Consider the set 𝒯 of all functions Q × A → Q. We can put a distribution τ over 𝒯 such that if t is chosen according to τ, for every (q, a), the marginal distribution of t(q, a) is δ(q, a). An easy way to do this is to take the product distribution τ = ∏_{(q,a)∈Q×A} δ(q, a).

This construction means that, for a fixed sequence X ∈ A^ω, an equivalent way to simulate the probabilistic run of 𝒮 on X is, every time we are in a state q and read a letter a, to pick t at random according to τ and move to state t(q, a). But 𝒯 is a finite set and τ a Bernoulli measure over it. This Bernoulli measure might not be positive; if it is not, we simply remove from 𝒯 all functions whose τ-probability is 0. Reformulating slightly, yet another equivalent way to simulate the run of 𝒮 over X is to do the following:

  1. First choose T ∈ 𝒯^ω at random with respect to the Bernoulli measure τ.

  2. Then run on the sequence X⊗T the deterministic selector 𝒮̂ whose set of states is Q and whose transition function δ̂ is defined by δ̂(q,(a,t)) = t(q,a).

Now, the two following random variables have the same distribution:

  • The subsequence Y of X selected by 𝒮.

  • The sequence π₀(Ŷ), where Ŷ is the subsequence of X⊗T selected by 𝒮̂ when T is chosen randomly according to τ.

Since X is μ-normal, by Lemma 14, X⊗T is μ⊗τ-normal τ-almost surely. Thus, by Agafonov’s theorem for deterministic selectors and Bernoulli measures (Theorem 6), almost surely, the subsequence Ŷ selected by 𝒮̂ is μ⊗τ-normal. Finally, by Lemma 15, this implies that almost surely, the subsequence π₀(Ŷ) of X is μ-normal. This concludes the proof.
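The key derandomization step above, replacing one probabilistic transition per letter by a single τ-random choice of a whole transition function, can be sanity-checked numerically. The following sketch (the state space, letters and probabilities are invented for illustration, not taken from the paper) verifies that sampling t according to the product measure τ and moving to t(q,a) reproduces the marginal δ(q,a).

```python
import random
from collections import Counter

# Sanity check of the derandomization step: sampling a full transition
# function t ~ tau = (x)_{(q,a)} delta(q,a) and moving to t(q,a) induces
# the same next-state distribution as the PFA transition delta(q,a).
Q, A = ["p", "q"], ["0", "1"]
delta = {  # delta[(state, letter)] = distribution over Q (made-up numbers)
    ("p", "0"): {"p": 0.2, "q": 0.8}, ("p", "1"): {"p": 0.6, "q": 0.4},
    ("q", "0"): {"p": 1.0, "q": 0.0}, ("q", "1"): {"p": 0.5, "q": 0.5},
}

def sample_t():
    # One independent draw per (state, letter): exactly the product measure tau.
    return {key: random.choices(Q, weights=[d[s] for s in Q])[0]
            for key, d in delta.items()}

random.seed(1)
N = 100_000
counts = Counter(sample_t()[("p", "0")] for _ in range(N))
for s in Q:
    # The marginal of t(p, 0) under tau should approach delta(p, 0).
    print(s, round(counts[s] / N, 3), delta[("p", "0")][s])
```

Since each coordinate of t is drawn independently, the marginal at any fixed (q,a) is δ(q,a) by construction; the simulation merely confirms this on one coordinate.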

4 The Schnorr-Stimm theorem for probabilistic gamblers

Finally, we establish the Schnorr-Stimm theorem for probabilistic finite-state automata. The definition of probabilistic (μ-)gambler is the same as the definition of deterministic (μ-)gambler except that it is based on PFAs instead of DFAs. In this setting, the quantities Capital(𝒢,w) become random variables.

Theorem 17 (Schnorr-Stimm theorem for PFAs).

For X ∈ A^ω and μ a Bernoulli measure, the following are equivalent.

  (i) X is μ-normal.

  (ii) Any probabilistic μ-gambler 𝒢 betting on X loses almost surely.

Proof.

We already have (ii) ⇒ (i) by Theorem 7 (deterministic gamblers being a subset of probabilistic ones).

For (i) ⇒ (ii), the idea is similar to the proof of Theorem 16. To simulate the run of a probabilistic μ-gambler 𝒢 = (Q, A, q_I, δ, γ) on a sequence X, one can equivalently run on the sequence X⊗T (where T ∈ 𝒯^ω is a τ-random sequence, with 𝒯 and τ defined in the same way as in Theorem 16) the deterministic gambler 𝒢̂ = (Q, A×𝒯, q_I, δ̂, γ̂), where again δ̂(q,(a,t)) = t(q,a) for all (q,a,t) ∈ Q×A×𝒯 and γ̂(q,(a,t)) = γ(q,a) (indeed, the bet placed by a gambler at a given stage does not take into account which state will be reached next). Note that the fairness condition is respected since, for every q,

$$\begin{aligned}
\sum_{(a,t)\in A\times\mathcal{T}}(\mu\otimes\tau)(a,t)\,\hat{\gamma}(q,(a,t)) &= \sum_{a\in A,\,t\in\mathcal{T}}\mu(a)\,\tau(t)\,\gamma(q,a)\\
&= \sum_{t\in\mathcal{T}}\tau(t)\sum_{a\in A}\mu(a)\,\gamma(q,a)\\
&= \sum_{t\in\mathcal{T}}\tau(t)\\
&= 1
\end{aligned}$$

(the second-to-last equality holds by the fairness condition on γ and the last one because τ is a probability distribution).

In this way, the two following random variables will have the same distribution:

  • Capital(𝒢,X[0,n]).

  • Capital(𝒢̂, (X⊗T)[0,n]), where T ∈ 𝒯^ω is chosen at random according to τ.

Since X is μ-normal, by Lemma 14 again, X⊗T is μ⊗τ-normal τ-almost surely. Thus, by Theorem 7, for τ-almost all T, Capital(𝒢̂, (X⊗T)[0,n]) is bounded by a constant C independent of n. By the equality in distribution of the two random variables above, this means that almost surely, there exists a constant C such that Capital(𝒢, X[0,n]) < C for all n. In other words, almost surely, 𝒢 loses on the sequence X.
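The fairness computation in the proof can also be checked on concrete numbers. The following toy sketch (the measures and bets are invented, not from the paper) verifies that the lifted bets γ̂(q,(a,t)) = γ(q,a) remain fair with respect to μ⊗τ whenever γ is fair with respect to μ at some fixed state q.

```python
import itertools

# Toy check of the fairness computation: if sum_a mu(a)*gamma(q,a) = 1,
# then sum_{a,t} mu(a)*tau(t)*gamma(q,a) = 1 as well. All numbers invented.
A, mu = ["0", "1"], {"0": 0.25, "1": 0.75}
T, tau = ["t1", "t2", "t3"], {"t1": 0.5, "t2": 0.3, "t3": 0.2}

# A fair bet at a fixed state q: 0.25 * 2 + 0.75 * (2/3) = 1.
gamma = {"0": 2.0, "1": 2.0 / 3.0}
assert abs(sum(mu[a] * gamma[a] for a in A) - 1.0) < 1e-12

# Fairness of the lift under mu (x) tau, summed term by term
# exactly as in the displayed computation.
total = sum(mu[a] * tau[t] * gamma[a] for a, t in itertools.product(A, T))
print(total)
```

Factoring the sum as ∑_t τ(t) · ∑_a μ(a)γ(q,a) shows why this holds for any τ: the inner sum is 1 by fairness of γ, and τ sums to 1.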

We remark that the dichotomy theorem for Bernoulli measures yields the following dichotomy for probabilistic μ-gamblers.

Theorem 18 (Schnorr-Stimm dichotomy theorem for PFAs).

Let X be an infinite sequence in Aω and μ a Bernoulli measure.

  1. (i)

If X is μ-normal and 𝒢 is a μ-gambler, then almost surely the capital of 𝒢 throughout the game is either ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not μ-normal, then there exists a μ-gambler 𝒢 which wins against X at an “infinitely often” exponential rate almost surely.

5 Conclusion

The main contributions of this paper are a generalization of the Agafonov theorem to PFAs with arbitrary transition probabilities, which settles the open question posed by Léchine et al. [6], and an extension of the Schnorr-Stimm theorem to probabilistic gamblers.

While we proved the probabilistic Agafonov theorem (Theorem 16) by reduction to the deterministic setting, it is also possible to follow a more direct approach (i.e., without appealing to the Seiller-Simonsen result), similar to the one followed by Carton to generalize Agafonov’s theorem for DFAs [3]. This, however, makes the argument somewhat more complicated.

An interesting direction for future research is to explore whether this “uselessness of randomness” also holds for pushdown automata. These are a more powerful model of computation, and indeed some normal sequences can be predicted by pushdown automata, some in a rather dramatic way, as proven by Carton and Perifel [4] (their result concerns a slightly different paradigm, namely compression, which we did not discuss in this paper, instead of prediction). We can for example ask: if some probabilistic pushdown selector selects a biased subsequence from a sequence X, does there necessarily exist a deterministic pushdown selector which also selects a biased subsequence? Similarly, if some probabilistic pushdown gambler wins against a sequence X, does there necessarily exist a deterministic pushdown gambler which wins against that same sequence X?

A related question concerns the speed of success in the gambling model. In the case of finite-state automata, Schnorr and Stimm proved that either a sequence X cannot be predicted or some gambler wins on it at an exponential rate. This dichotomy no longer holds for pushdown automata, but one may ask the following question: If some probabilistic pushdown gambler wins at an exponential rate against a sequence X, does there necessarily exist a pushdown gambler which wins against that sequence X at an exponential rate?

References

  • [1] V. N. Agafonov. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968.
  • [2] Laurent Bienvenu, Valentino Delle Rose, and Tomasz Steifer. Probabilistic vs deterministic gamblers. In 39th International Symposium on Theoretical Aspects of Computer Science, STACS 2022, volume 219 of LIPIcs, pages 11:1–11:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.STACS.2022.11.
  • [3] Olivier Carton. A direct proof of Agafonov’s theorem and an extension to shift of finite type. CoRR, abs/2005.00255, 2020. arXiv:2005.00255.
  • [4] Olivier Carton and Sylvain Perifel. Deterministic pushdown automata can compress some normal sequences. Logical Methods in Computer Science, Volume 20, Issue 3, 2024.
  • [5] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley, 2nd edition, 2006.
  • [6] Ulysse Léchine, Thomas Seiller, and Jakob Grue Simonsen. Agafonov’s theorem for probabilistic selectors. In 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024, volume 306 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.MFCS.2024.67.
  • [7] John E. Maxfield. A short proof of Pillai’s theorem on normal numbers. Pacific Journal of Mathematics, 2(1):23–24, 1952.
  • [8] Satyadev Nandakumar, Subin Pulari, Prateek Vishnoi, and Gopal Viswanathan. An analogue of Pillai’s theorem for continued fraction normality and an application to subsequences. Bulletin of the London Mathematical Society, 53(5):1414–1428, 2021.
  • [9] Claus-Peter Schnorr and Hermann Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1(4):345–359, 1972. doi:10.1007/BF00289514.
  • [10] Thomas Seiller and Jakob Grue Simonsen. Agafonov’s theorem for finite and infinite alphabets and probability distributions different from equidistribution. Ergodic Theory and Dynamical Systems, 45(11):3506–3539, 2025.