
The Agafonov and Schnorr-Stimm Theorems for Probabilistic Automata

Laurent Bienvenu (LaBRI, CNRS & Université de Bordeaux, Talence, France), Hugo Gimbert (LaBRI, CNRS & Université de Bordeaux, Talence, France), Subin Pulari (LaBRI, CNRS & Université de Bordeaux, Talence, France)
Abstract

For a fixed alphabet A, an infinite sequence X is said to be normal if every word w over A appears in X with the same frequency as any other word of the same length. A classical result of Agafonov (1966) relates normality to finite automata as follows: a sequence X is normal if and only if any subsequence of X selected by a finite automaton is itself normal. Another theorem of Schnorr and Stimm (1972) gives an alternative characterization: a sequence X is normal if and only if no gambler can win large amounts of money by betting on the sequence X using a strategy that can be described by a finite automaton. Both of these theorems are established in the setting of deterministic finite automata. This raises the question as to whether they can be extended to the setting of probabilistic finite automata. In the case of the Agafonov theorem, a partial positive answer was given by Léchine et al. (MFCS 2024) in a restricted case of probabilistic automata with rational transition probabilities.

In this paper, we settle the full conjecture by proving that both the Agafonov and the Schnorr-Stimm theorems hold true for arbitrary probabilistic automata. Specifically, we show that a sequence X is normal if and only if any probabilistic automaton selects a normal subsequence of X with probability 1 and also show that a sequence X is normal if and only if any probabilistic finite-state gambler fails to win on X with probability 1.

Keywords and phrases:
Normality, Agafonov theorem, probabilistic automata
Funding:
Laurent Bienvenu: supported in part by the ANR project FLITTLA ANR-21-CE48-0023.
Subin Pulari: supported in part by the ANR project FLITTLA ANR-21-CE48-0023.
Copyright and License:
© Laurent Bienvenu, Hugo Gimbert, and Subin Pulari; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation → Formal languages and automata theory
Related Version:
Extended Version: https://arxiv.org/abs/2502.12307
Editors:
C. Aiswarya, Ruta Mehta, and Subhajit Roy

1 Introduction

Given a finite alphabet A of k letters, an infinite sequence X of letters is said to be normal if every word w over A appears as a subword of X with the same frequency as any other word of the same length, namely (1/k)^{|w|}. The famous Champernowne sequence

01234567891011121314151617181920212223242526…

can be shown to be normal over the alphabet A = {0, 1, …, 9}. The number π, for example, is conjectured to have a normal expansion in every base, though this very much remains an open question. Normal sequences are plentiful, and an easy way to generate a normal sequence X is to draw each letter X(n) at random from the alphabet A (all letters having the same probability 1/|A|), independently of the other letters X(m). The law of large numbers tells us that we obtain a normal sequence with probability 1.
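The random-draw argument above can be checked empirically with a short simulation (illustrative only, not part of the paper; alphabet size and sample length are arbitrary choices):

```python
import random

def letter_frequencies(seq):
    """Empirical frequency of each letter in a finite prefix."""
    n = len(seq)
    return {a: seq.count(a) / n for a in set(seq)}

random.seed(1)
alphabet = "0123456789"
# Draw each letter uniformly and independently, as in the text.
prefix = "".join(random.choice(alphabet) for _ in range(100_000))
freqs = letter_frequencies(prefix)
# By the law of large numbers, every frequency is close to 1/10.
assert all(abs(f - 0.1) < 0.01 for f in freqs.values())
```

Of course, closeness on a finite prefix only illustrates the law of large numbers; normality itself is an asymptotic property of the infinite sequence.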

Of course, there are also plenty of examples of non-normal sequences:

  • Periodic, or ultimately periodic, sequences can never be normal: indeed, if the period of X is k, there are only k possible subwords of X of length k, hence most words of length k do not appear in X and their frequency is 0.

  • Sturmian sequences, which are sequences with only k+1 different subwords of length k (such as the Fibonacci sequence 100101001001010010100100101001001… obtained by iterating the morphism 0 ↦ 01 and 1 ↦ 0), are not normal for the same reason.

  • The Thue-Morse sequence 01101001100101101001011001101001… obtained by iterating the morphism 0 ↦ 01 and 1 ↦ 10 is not normal because 000 and 111 do not appear as subwords.

  • A sequence of 0’s and 1’s generated at random where each bit is chosen equal to the previous one with probability 2/3 will have, with probability 1, all possible finite words as subwords but will not (still with probability 1) be normal as for example the word 00 will appear with frequency 1/3 instead of 1/4.
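The last example can be checked by a quick Monte Carlo simulation (illustrative only; the sample length and tolerances are arbitrary choices):

```python
import random

random.seed(0)
# Each bit repeats the previous one with probability 2/3.
bits = ["0"]
for _ in range(200_000):
    bits.append(bits[-1] if random.random() < 2/3 else str(1 - int(bits[-1])))
X = "".join(bits)

freq_0 = X.count("0") / len(X)
freq_00 = sum(X[i:i+2] == "00" for i in range(len(X) - 1)) / (len(X) - 1)
# Single letters are still balanced, but "00" occurs with frequency
# about 1/2 * 2/3 = 1/3 rather than the 1/4 required for normality.
assert abs(freq_0 - 1/2) < 0.02
assert abs(freq_00 - 1/3) < 0.02
```

This illustrates why balancedness of single letters does not suffice for normality: correlations between consecutive letters skew the frequencies of longer words.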

It turns out that normality has a nice interpretation in terms of finite automata. Indeed, two classical results, one due to Agafonov [1] and the other due to Schnorr and Stimm [9], assert that an infinite sequence is normal if and only if it cannot be predicted by a finite automaton with better-than-average accuracy. Of course, one needs to specify what “predicted” means. We consider two prediction models.

  1. (I)

    In the Agafonov model, an automaton reads the infinite sequence X one letter at a time and updates its state in the usual way. Some of its states have a “select” tag on them. When the current state has such a tag, the next letter will be selected and added to a subsequence Y. We consider the automaton successful at predicting X if the subsequence Y built in this process is infinite and some letter of A does not have asymptotic frequency 1/|A| in Y. This means that the automaton has exhibited a statistical anomaly in the sequence X and isolated this anomaly in the subsequence Y.

  2. (II)

    In the Schnorr-Stimm model, the predictor is still an automaton, but this time it is viewed as a gambling strategy. The gambler starts with a capital of $1. Each state q is labeled with a betting function γ_q : A → ℝ_{≥0}. This function represents the multiplier the predictor would like applied to her capital, depending on the value of the next letter. For example, suppose the player plays against a sequence X ∈ {a,b,c}^ω. If her current capital is $2 and the current state q is labelled by a betting function γ_q such that γ_q(a) = 0.7, γ_q(b) = 1.1 and γ_q(c) = 1.2, then if the next letter is a, her new capital will be $1.4; if it is b, her new capital will be $2.2; and if it is c, her new capital will be $2.4. For the game to be fair, each betting function γ_q must have expectation equal to 1, i.e., must satisfy (1/|A|) ∑_{a∈A} γ_q(a) = 1. We say that the predictor wins if the capital of the player takes arbitrarily large values throughout the (infinite) game. That is, the predictor has spotted some type of statistical anomaly and is exploiting it to get rich!
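The worked example in model II amounts to the following computation (a direct transcription, using the numbers from the text):

```python
# Betting functions map each letter to a non-negative multiplier whose
# average over the alphabet is 1 (the fairness condition).
gamma_q = {"a": 0.7, "b": 1.1, "c": 1.2}
assert abs(sum(gamma_q.values()) / len(gamma_q) - 1.0) < 1e-9  # fair bet

capital = 2.0
# New capital depending on which letter is read next:
outcomes = {letter: capital * mult for letter, mult in gamma_q.items()}
assert abs(outcomes["a"] - 1.4) < 1e-9
assert abs(outcomes["b"] - 2.2) < 1e-9
assert abs(outcomes["c"] - 2.4) < 1e-9
```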

Both of these models lead to the same conclusion: an infinite sequence X is normal if and only if it is unpredictable (by a finite automaton).

Theorem 1 (Agafonov [1]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    For any automaton 𝒜 that selects a subsequence Y of X as in model I, either Y is finite, or every letter of A appears in Y with asymptotic frequency 1/|A|.

  3. (iii)

    For any automaton 𝒜 that selects a subsequence Y of X as in model I, either Y is finite, or Y is normal.

(See also Carton [3] and Seiller and Simonsen [10] for a modern account of the above theorem).

Theorem 2 (Schnorr-Stimm [9]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    Any automaton 𝒜 betting on X according to model II does not win.

In both of these theorems, the finite automata used for prediction are assumed to be deterministic. Would the situation change if one allowed probabilistic automata? In principle, one would not expect an unpredictable sequence to become predictable in the presence of a random source. Indeed, given a sequence X and a random source R, it seems, informally speaking, that almost surely R will not “know” anything about X and thus will not help in predicting X. Surprisingly, this intuition is wrong in the setting where the predictors are not finite automata but Turing machines, as shown by Bienvenu et al. [2], who built a sequence that is unpredictable by deterministic Turing machines (in either prediction model, selection or gambling) and becomes predictable (in either model) if one allows probabilistic Turing machines. Nonetheless, finite automata are much weaker than Turing machines, and Bienvenu et al.’s construction does not work for such a restricted model of computation. Indeed, recently, Léchine et al. showed that Agafonov’s theorem holds for probabilistic automata in the restricted case where the transition probabilities are rational.

Theorem 3 (Léchine et al. [6]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is normal.

  2. (ii)

    For any probabilistic automaton 𝒜 with rational probabilities, almost surely, 𝒜 selects a subsequence Y of X that is either finite, or every letter of A appears in Y with asymptotic frequency 1/|A|.

  3. (iii)

    For any probabilistic automaton 𝒜 with rational probabilities, almost surely, 𝒜 selects a subsequence Y of X such that either Y is finite, or Y is normal.

This led them to conjecture that the probabilistic version of Agafonov’s theorem holds in the general case, where the probabilities are real-valued rather than being rational. In this paper, we prove this conjecture and also prove the probabilistic version of the Schnorr-Stimm theorem. Additionally, we establish a probabilistic version of the Schnorr-Stimm dichotomy regarding the winning rates of probabilistic gamblers.

Léchine et al.’s proof reduces the rational probabilistic case to the deterministic case via a clever combinatorial construction (which substantially increases the number of states). Our proof is also a reduction to the deterministic case, but of a very different nature: instead of encoding a probabilistic automaton into a deterministic one, we use the fact that a probabilistic automaton can be seen as a deterministic one over a larger alphabet, without altering the number of states. For this, we will need an extension of the deterministic case to Bernoulli measures (an extension which was proved by Seiller and Simonsen [10]), which will be presented in the next section.

Notation and terminology

We finish this introduction by formalizing the concepts discussed so far and gathering the notation and terminology that will be used in the rest of the paper.

Given an alphabet A, we denote by A^* the set of finite words over A, by A^n the set of words of length n, by A^ω the set of infinite sequences of letters, and by A^{≤ω} the set A^* ∪ A^ω. For X ∈ A^{≤ω}, we denote by X(i) the i-th letter of X (by convention there is a 0-th letter) and by X[i,j] the word X(i)X(i+1)⋯X(j). Let ϵ denote the empty string.

Given a word u of length k and a word w of length n ≥ k, we denote by NbOcc(u,w) the number of occurrences of the word u in w, i.e.,

NbOcc(u,w) = #{ i : 0 ≤ i ≤ n−k, w[i, i+k−1] = u }

and the frequency of occurrence Freq(u,w) of u in w is naturally defined by

Freq(u,w) = NbOcc(u,w) / (n−k+1).

When X is an infinite sequence, we define

Freq^−(u,X) = lim inf_n Freq(u, X[0,n])   and   Freq^+(u,X) = lim sup_n Freq(u, X[0,n]).

When Freq^−(u,X) and Freq^+(u,X) have the same value, we simply call this common value Freq(u,X).
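The finite-word quantities NbOcc and Freq can be implemented directly (a straightforward transcription of the definitions; function names are ours):

```python
def nb_occ(u, w):
    """NbOcc(u, w): number of (possibly overlapping) occurrences of u in w."""
    k = len(u)
    return sum(w[i:i+k] == u for i in range(len(w) - k + 1))

def freq(u, w):
    """Freq(u, w) = NbOcc(u, w) / (n - k + 1), with n = |w|, k = |u|."""
    return nb_occ(u, w) / (len(w) - len(u) + 1)

# Occurrences may overlap: "aa" occurs twice in "aaa".
assert nb_occ("aa", "aaa") == 2
assert freq("aa", "aaa") == 1.0
assert freq("0", "0101") == 0.5
```

Note that occurrences are counted at every position, so they may overlap, which is why the denominator is n − k + 1 rather than ⌊n/k⌋ (the latter appears only for the block frequency BFreq of Section 2).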

Given X ∈ A^ω, we say that X is balanced if all letters appear in X with the expected frequency, i.e., Freq(a,X) = 1/|A| for all a ∈ A. We say that X is normal if all words appear in X with the expected frequency, i.e., Freq(w,X) = |A|^{−|w|} for all w ∈ A^*.

A deterministic finite automaton (DFA) is a tuple (Q, A, q_I, δ) where Q is a finite set of states, A a finite alphabet, q_I the initial state and δ : Q × A → Q the transition function (in this paper, runs of automata are meant to be infinite, hence there is no need for final states). We denote by δ^* the function from Q × A^* to Q defined inductively by δ^*(q, ϵ) = q and, for w ∈ A^* and a ∈ A, δ^*(q, w·a) = δ(δ^*(q, w), a), where · is the concatenation of words.

An automatic selector (or selector for short) is a tuple (Q,A,qI,δ,S) where (Q,A,qI,δ) is a DFA and S is a subset of Q, representing the selection states.

Given a selector 𝒮 = (Q, A, q_I, δ, S), we define the selection function from A^* to A^* inductively by:

Select(𝒮, ϵ) = ϵ

and for w ∈ A^* and a ∈ A:

Select(𝒮, w·a) = Select(𝒮, w)·a if δ^*(q_I, w) ∈ S, and Select(𝒮, w·a) = Select(𝒮, w) if δ^*(q_I, w) ∉ S.

If X is an infinite sequence in A^ω, the sequence of words Select(𝒮, X[0,n]) is non-decreasing with respect to the prefix order and thus converges to a sequence Y ∈ A^{≤ω}, which we call the subsequence of X selected by 𝒮 and denote by Select(𝒮, X).
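On finite prefixes, the selection function can be sketched as follows (an illustrative transcription of the inductive definition; the toy selector below is our own example):

```python
def select(delta, q_init, selecting, word):
    """Select(S, word): output the letters read while the current state
    is a selecting state (the state is tested before each transition)."""
    q, out = q_init, []
    for a in word:
        if q in selecting:
            out.append(a)
        q = delta[q, a]
    return "".join(out)

# Toy selector over {0, 1}: select the letter that follows each 1.
delta = {("idle", "0"): "idle", ("idle", "1"): "sel",
         ("sel", "0"): "idle", ("sel", "1"): "sel"}
assert select(delta, "idle", {"sel"}, "011010") == "100"
```

The classical example of this kind is the selector that, on a normal sequence, extracts the letters following a fixed pattern; Agafonov's theorem says that the extracted subsequence is again normal.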

An automatic gambler (or gambler for short) is a tuple (Q, A, q_I, δ, γ) where (Q, A, q_I, δ) is a DFA and γ is a function from Q × A to ℝ_{≥0} such that for all q, (1/|A|) ∑_{a∈A} γ(q,a) = 1. As said earlier, the value of γ(q,a) should be interpreted as the multiplier the gambler, being currently in state q, would like applied to her capital if the next letter read is a. The condition (1/|A|) ∑_{a∈A} γ(q,a) = 1 ensures that the game is fair.

Given a gambler 𝒢 = (Q, A, q_I, δ, γ) and w ∈ A^*, we define Capital(𝒢, w) inductively by

Capital(𝒢, ϵ) = 1

and for w ∈ A^* and a ∈ A,

Capital(𝒢, w·a) = Capital(𝒢, w) · γ(δ^*(q_I, w), a)

and we say that a gambler 𝒢 wins against X ∈ A^ω if

lim sup_{n→+∞} Capital(𝒢, X[0,n]) = +∞

(otherwise we say that 𝒢 loses).
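The inductive definition of Capital unfolds into a simple product along the run (a sketch; the one-state gambler below is a hypothetical example of ours):

```python
def capital(delta, q_init, gamma, word):
    """Capital(G, w): multiply in gamma(q, a) for each letter a
    read while in state q, starting from capital 1."""
    q, cap = q_init, 1.0
    for a in word:
        cap *= gamma[q, a]
        q = delta[q, a]
    return cap

# One-state gambler over {0, 1} that doubles on 0 and loses all on 1.
# Fairness: (1/|A|) * (2.0 + 0.0) = 1.
delta = {("q", "0"): "q", ("q", "1"): "q"}
gamma = {("q", "0"): 2.0, ("q", "1"): 0.0}
assert capital(delta, "q", gamma, "000") == 8.0
assert capital(delta, "q", gamma, "001") == 0.0
```

Against the all-zeros sequence this gambler wins (its capital doubles at every step), which is consistent with Theorem 2: the all-zeros sequence is as far from normal as possible.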

2 Deterministic prediction for Bernoulli measures

In classical normality, all letters of the alphabet occur with the same frequency. We can however consider the extension of normality to Bernoulli measures. A Bernoulli measure over A^ω is a probability measure where the letters of an infinite sequence X are drawn at random independently of one another, but the distribution over the alphabet A may be non-uniform.

Definition 4.

Let μ : A → [0,1] be a distribution over the alphabet A (hence satisfying ∑_{a∈A} μ(a) = 1). The Bernoulli measure induced by μ, which we also denote by μ by abuse of notation, is the unique probability measure such that for all i, k ∈ ℕ and every word w = a_0⋯a_k ∈ A^*,

Pr_{X∼μ}[ X[i, i+k] = w ] = ∏_{j=0}^{k} μ(a_j).

We also denote by μ(w) the quantity ∏_{j=0}^{k} μ(a_j).

Normality generalizes very naturally to Bernoulli measures.

Definition 5.

Let μ be a Bernoulli measure. A sequence X ∈ A^ω is μ-balanced if Freq(a,X) = μ(a) for all a ∈ A. It is μ-normal if Freq(w,X) = μ(w) for all words w ∈ A^*.

We say that a Bernoulli measure μ is positive when μ(a)>0 for every letter a. In the rest of the paper, all Bernoulli measures will be assumed to be positive, and we simply say “Bernoulli measure” to mean “positive Bernoulli measure”.

The Agafonov theorem can be extended to Bernoulli measures, as proven by Seiller and Simonsen [10]. It is this theorem that we will use in the next section to obtain a proof of the Agafonov theorem for probabilistic selectors.

Theorem 6 (Agafonov theorem for Bernoulli measures [10]).

For X ∈ A^ω, the following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    For any selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-balanced.

  3. (iii)

    For any selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-normal.

We can also easily generalize the notion of gambler to the setting of Bernoulli measures: it suffices to define a μ-gambler 𝒢 = (Q, A, q_I, δ, γ) as before but with the fairness condition on γ replaced by ∑_{a∈A} μ(a) γ(q,a) = 1 for every q. The function Capital and the notion of success are defined as before.

We will now prove that the Schnorr-Stimm theorem, just like the Agafonov theorem, can also be extended to Bernoulli measures.

Theorem 7 (Schnorr-Stimm theorem for Bernoulli measures).

For X ∈ A^ω and a Bernoulli measure μ, the following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    No μ-gambler 𝒢 wins by betting on X.

Proof.

(i) ⇒ (ii).

Suppose that X ∈ A^ω is μ-normal and consider a μ-gambler 𝒢 = (Q, A, q_I, δ, γ).

We can assume that the μ-gambler only has one state on which it places a non-trivial bet. Indeed, define for every q ∈ Q the μ-gambler 𝒢[q] = (Q, A, q_I, δ, γ[q]) where γ[q](q, a) = γ(q, a) for all a, and γ[q](q′, a) = 1 for all q′ ≠ q. That is, 𝒢[q] is the gambler 𝒢 where all states but state q are neutralized (no bet is placed while on them). By the multiplicative nature of the function Capital, we have for all n:

Capital(𝒢, X[0,n]) = ∏_{q∈Q} Capital(𝒢[q], X[0,n]).

Thus if we can show that all quantities Capital(𝒢[q], X[0,n]) are bounded, we are done. Let us assume that there is a state r that is the unique state on which 𝒢 bets. If instead of 𝒢 we consider the selector 𝒮 = (Q, A, q_I, δ, {r}) with only r as the selecting state, we know by Agafonov's theorem for Bernoulli measures (Theorem 6) that the subsequence Y of X selected by 𝒮 is μ-normal, hence in particular μ-balanced. But this subsequence consists precisely of the letters of X on which 𝒢 bets!

We can further assume that Y is infinite; otherwise the run of 𝒢 on X passes through r only finitely often, hence 𝒢 certainly cannot win, as the other states are not betting states. Now, suppose that at the n-th step of the run on X the state r has been visited k = k(n) times. We have

Capital(𝒢, X[0,n]) = ∏_{a∈A} γ(r,a)^{NbOcc(a, Y[0,k])}.

But since Y is μ-normal we have, for every a, NbOcc(a, Y[0,k]) = μ(a)·k + o(k). Thus,

Capital(𝒢, X[0,n]) = ∏_{a∈A} γ(r,a)^{μ(a)·k + o(k)}

or equivalently

log Capital(𝒢, X[0,n]) = (k + o(k)) ∑_{a∈A} μ(a) log γ(r,a)

(here we assume that all values γ(r,a) involved in the product are positive, for if not then the capital falls to 0 and we are done). Since ∑_{a∈A} μ(a) = 1 (μ being a distribution), we can use the strict concavity of the function log on (0, +∞) to apply Jensen's inequality and get

∑_{a∈A} μ(a) log γ(r,a) ≤ log( ∑_{a∈A} μ(a) γ(r,a) )

with strict inequality when the γ(r,a) are not all equal (which is the case when 𝒢 makes non-trivial bets). But by the fairness condition, we have ∑_{a∈A} μ(a) γ(r,a) = 1, hence log Capital(𝒢, X[0,n]) is either 0 or ultimately negative, which either way means that 𝒢 does not win.
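The Jensen step can be sanity-checked numerically (an illustration, not part of the proof; we take the uniform measure on a hypothetical four-letter alphabet and random fair bets):

```python
import math
import random

random.seed(2)
A = ["a", "b", "c", "d"]
mu = {a: 0.25 for a in A}  # uniform Bernoulli measure

for _ in range(1000):
    # Random fair bet: positive multipliers rescaled to have mean 1.
    raw = [random.uniform(0.1, 2.0) for _ in A]
    mean = sum(raw) / len(raw)
    gamma = {a: r / mean for a, r in zip(A, raw)}
    assert abs(sum(mu[a] * gamma[a] for a in A) - 1.0) < 1e-9  # fairness
    # Jensen: sum mu(a) log gamma(a) <= log sum mu(a) gamma(a) = log 1 = 0,
    # strictly, unless all multipliers are equal.
    drift = sum(mu[a] * math.log(gamma[a]) for a in A)
    assert drift < 0 or all(abs(g - 1.0) < 1e-12 for g in gamma.values())
```

The quantity `drift` is exactly the per-visit exponent ∑_a μ(a) log γ(r,a) from the proof: every fair non-trivial bet has a strictly negative expected log-growth on a μ-normal sequence.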

(ii) ⇒ (i).

Assume that X is not μ-normal. This means that there is some word w such that Freq(w, X[0,n]) does not converge to μ(w). Let us assume that w is such a word of minimal length and write w = u·x with u ∈ A^* and x ∈ A.

Consider the sequence of vectors f_n defined by

f_n = ( Freq(u·a, X[0,n]) / ∑_{b∈A} Freq(u·b, X[0,n]) )_{a∈A}

or equivalently,

f_n = ( NbOcc(u·a, X[0,n]) / ∑_{b∈A} NbOcc(u·b, X[0,n]) )_{a∈A}.

All of these vectors belong to the set Γ = { f : A → [0,1] : ∑_{a∈A} f(a) = 1 }. This is a compact set, hence the sequence f_n must have cluster points. By definition of u and x, we know that f_n does not converge to μ because f_n(x) does not converge to μ(x): indeed, in the definition of f_n(x), the denominator converges to Freq(u, X), which by minimality of w is defined and equal to μ(u), while the numerator is equal to Freq(w, X[0,n]), which by definition of w does not converge to μ(w) = μ(u)·μ(x).

Therefore, the sequence (fn) must have at least one cluster point ν different from μ. Fix such a cluster point ν.

We now build our gambler 𝒢 = (Q, A, q_I, δ, γ). The idea is that the gambler will record the last |u| letters it read and will only place bets when these exactly form the word u. Let us thus take Q = { q_v : v ∈ A^*, |v| ≤ |u| }, with initial state q_I = q_ϵ, and define δ by

δ(q_v, a) = q_{v·a} if |v| < |u|, and δ(q_v, a) = q_{v′·a} if |v| = |u| and v = x·v′ with x ∈ A.

Now, define γ(q_v, a) = 1 whenever v ≠ u, and γ(q_u, a) = ν(a)/μ(a) for all a ∈ A. Observe that this is a valid μ-gambler, as the fairness condition is satisfied: ∑_{a∈A} μ(a) · ν(a)/μ(a) = ∑_a ν(a) = 1.

Suppose that after reading n letters of X the state q_u has been visited k = k(n) times. First, observe that k(n) tends to +∞. Indeed, the state q_u is visited whenever u is seen as a subword of X, and we assumed that u appears in X with frequency μ(u) > 0, by minimality of w.

Second, unfolding the definition of f_n, we have

Capital(𝒢, X[0,n]) = ∏_{a∈A} ( ν(a)/μ(a) )^{k·f_n(a)}.

This gives

log Capital(𝒢, X[0,n]) = k ∑_{a∈A} f_n(a) log( ν(a)/μ(a) ).

Since ν is a cluster point of the sequence f_n, for any fixed ε > 0 there are infinitely many n (or k) such that

∑_{a∈A} f_n(a) log( ν(a)/μ(a) ) ≥ ∑_{a∈A} ν(a) log( ν(a)/μ(a) ) − ε.

But the term ∑_{a∈A} ν(a) log( ν(a)/μ(a) ) is the relative entropy of ν with respect to μ (also known as the Kullback–Leibler divergence, see for example [5]), which we denote by D_{KL}(ν‖μ). This quantity is non-negative in general and is positive when ν ≠ μ, which is the case here. We have thus established that for any fixed ε, there are arbitrarily large n and k such that

log Capital(𝒢, X[0,n]) ≥ k ( D_{KL}(ν‖μ) − ε ).

Taking any ε < D_{KL}(ν‖μ), this shows that

lim sup_{n→+∞} Capital(𝒢, X[0,n]) = +∞.

Let us note that this last proof actually gives us a finer analysis of normality, in terms of the rate of failure or success of the gambler. This was already observed by Schnorr and Stimm in their seminal paper, where they proved the following.

Theorem 8 (Schnorr-Stimm dichotomy theorem [9]).

Let X be an infinite sequence in A^ω.

  1. (i)

    If X is normal and 𝒢 is a gambler, then the capital of 𝒢 throughout the game either is ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not normal, then there exists a gambler 𝒢 which wins against X at an “infinitely often” exponential rate (i.e., lim sup_n log(Capital(𝒢, X[0,n]))/n > 0).

As a byproduct of our proof of Theorem 7, we have the same dichotomy for positive Bernoulli measures (i.e., Bernoulli measures such that μ(a)>0 for every letter):

Theorem 9 (Schnorr-Stimm dichotomy theorem for Bernoulli measures).

Let X be an infinite sequence in A^ω and μ a positive Bernoulli measure.

  1. (i)

    If X is μ-normal and 𝒢 is a μ-gambler, then the capital of 𝒢 throughout the game either is ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not μ-normal, then there exists a μ-gambler 𝒢 which wins against X at an “infinitely often” exponential rate.

Our proof of Theorem 7 almost establishes this, but we do need an additional technical lemma and the block-wise characterization of μ-normality. Let BFreq(u,w) denote the block frequency of the word u in w, defined as the proportion of non-overlapping blocks in w that are equal to u. More precisely, when n = |w| and k = |u|,

BFreq(u,w) = #{ i : 0 ≤ i < ⌊n/k⌋, w[ki, k(i+1)−1] = u } / ⌊n/k⌋.

For w ∈ A^*, let BFreq^−(w,X), BFreq^+(w,X) and BFreq(w,X) denote the lower, upper and limit block frequency of w in X, defined analogously to Freq^−(w,X), Freq^+(w,X) and Freq(w,X). A sequence X is μ-block normal if BFreq(w,X) = μ(w) for all words w ∈ A^*. The following was shown by Seiller and Simonsen.

Lemma 10 (Seiller-Simonsen [10]).

If a sequence X ∈ A^ω is μ-normal then X is μ-block normal.

We note that the converse implication also holds.

Lemma 11.

If a sequence X ∈ A^ω is μ-block normal then X is μ-normal.

Together, the above lemmas prove the following equivalence between μ-normality and μ-block normality.

Theorem 12.

A sequence X ∈ A^ω is μ-normal if and only if X is μ-block normal.

The proof of Lemma 11 uses a counting trick from the proof of Theorem 3.1 from [8] which in turn is based on the proof of the main theorem in [7].

Proof of Lemma 11.

As in the proof of Theorem 3.1 from [8], for any finite string w = a_1 a_2 ⋯ a_k ∈ A^k and large enough n,

Freq(w, X[0,n]) = f_1(n) + f_2(n) + ⋯ + f_{1+⌊log₂(n/k)⌋}(n) + (k−1)·O(log n)/(n−k+1)   (1)

where the f_p(n) are defined as follows:

f_p(n) = |{ i : X[ki, k(i+1)−1] = w, 0 ≤ i ≤ n/k }| / (n−k+1), if p = 1;
f_p(n) = ∑_{j=1}^{k−1} |{ i : X[2^{p−1}k·i, 2^{p−1}k·(i+1)−1] ∈ S_j, 0 ≤ i ≤ n/(2^{p−1}k) }| / (n−k+1), if 1 < p ≤ 1 + ⌊log₂(n/k)⌋;
f_p(n) = 0, otherwise.

In the above definition, S_j is the set of strings of the form u·a_1 a_2 ⋯ a_k·v where u is some string of length 2^{p−2}k − j and v is some string of length 2^{p−2}k − k + j.

Equation (1) shows that the frequency of w in X[0,n] can be written as a sum of different block frequencies. The quantity f_1(n) counts the number of occurrences of w inside disjoint k-length blocks of X[0,n], f_2(n) counts the number of occurrences crossing the boundaries of these k-blocks, f_3(n) counts those crossing boundaries of 2k-blocks, and so on. In general, f_p(n) counts the number of occurrences straddling boundaries of disjoint 2^{p−1}k-length blocks. Each f_p(n) may miss at most (k−1) occurrences of w, which are accounted for by the error term.

Since X is μ-block normal,

lim_n f_1(n) = lim_n |{ i : X[ki, k(i+1)−1] = w, 0 ≤ i ≤ n/k }| / (n−k+1) = μ(w)/k.

Now, when 1 < p ≤ 1 + ⌊log₂(n/k)⌋,

lim_n f_p(n) = ∑_{j=1}^{k−1} lim_n |{ i : X[2^{p−1}k·i, 2^{p−1}k·(i+1)−1] ∈ S_j, 0 ≤ i ≤ n/(2^{p−1}k) }| / (n−k+1) = μ(w)·(k−1) / (2^{p−1}k).

Since the convergence of the series ∑_{i=1}^{m} f_i(n) as m → ∞ is uniform in n, we may interchange limit and sum:

Freq(w, X) = lim_n ∑_{i=1}^{∞} f_i(n) = ∑_{i=1}^{∞} lim_n f_i(n).

Therefore from (1),

lim_n Freq(w, X[0,n]) = μ(w)/k + (k−1)·μ(w)·[ ∑_{i=1}^{∞} 1/(2^i k) ] = μ(w).

Hence, X is μ-normal.

We require the following technical lemma in the proof of Theorem 9.

Lemma 13.

Let (Q, A, q_I, δ) be a finite-state automaton, let μ be a positive Bernoulli measure, and let V_q(n, X) denote the number of times the state q is visited upon running the automaton on the first n letters of X ∈ A^ω.

Then, for every q ∈ Q, there exists a real number π_q ≥ 0 such that, for every μ-normal sequence X ∈ A^ω:

  • either Vq(n,X) is ultimately constant (i.e., q is visited only finitely often during the run on X)

  • or lim_n V_q(n, X)/n = π_q (i.e., the state q is visited with asymptotic frequency π_q), and this second case can only happen when π_q > 0.

Proof.

When running an automaton on a normal sequence, starting from any state, a strongly connected component must be reached in finitely many steps. Similarly to [10], let us consider the Markov chain corresponding to the |Q| × |Q| stochastic matrix P where

P_{ij} = ∑_{a∈A} μ(a) · 1_{δ(i,a)=j}.

The proof follows from the Ergodic Theorem for Markov chains, following the same steps as the proof of Lemma 4.5 from [9] with the uniform measure replaced by the positive Bernoulli measure induced by μ. We note that this line of proof uses μ-block normality, which is equivalent to μ-normality (Theorem 12).
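The induced chain can be built concretely (an illustration, not part of the proof; the two-state automaton and the measure below are hypothetical examples of ours):

```python
def transition_matrix(states, alphabet, delta, mu):
    """Stochastic matrix P with P[i][j] = sum of mu(a) over the
    letters a such that delta(i, a) = j."""
    P = {i: {j: 0.0 for j in states} for i in states}
    for i in states:
        for a in alphabet:
            P[i][delta[i, a]] += mu[a]
    return P

states = ["p", "q"]
delta = {("p", "0"): "p", ("p", "1"): "q",
         ("q", "0"): "q", ("q", "1"): "p"}
mu = {"0": 0.6, "1": 0.4}
P = transition_matrix(states, "01", delta, mu)
# Every row of a stochastic matrix sums to 1.
assert all(abs(sum(P[i].values()) - 1.0) < 1e-9 for i in states)
assert abs(P["p"]["q"] - 0.4) < 1e-9
```

The frequencies π_q of the lemma are then the stationary probabilities of this chain on the strongly connected component eventually reached by the run.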

Proof of Theorem 9.

In part (i) ⇒ (ii) of our proof of Theorem 7, we showed that for a given state r, either r is not a betting state (γ is the constant 1 on this state), or it is, and then the gambler loses money exponentially fast in the number of times this state is visited, the exponent being α_r = ∑_{a∈A} μ(a) log γ(r,a), which we proved to be negative. By Lemma 13, betting states are either visited finitely often or with positive asymptotic density. If they are all visited finitely often, the capital stabilizes after the last bet is made. Otherwise, if the betting states r are visited with frequencies π_r and at least one π_r is positive, then the gambler loses at an exponential rate, the exponent being ∑_r π_r α_r.

In part (ii) ⇒ (i), under the assumption that X is not μ-normal, we built a gambler 𝒢 satisfying the following: for any fixed ε, there are arbitrarily large n and k such that log Capital(𝒢, X[0,n]) ≥ k ( D_{KL}(ν‖μ) − ε ). Here, k is the number of visits to the state q_u. In the proof of Theorem 7 part (ii), this state q_u is visited with asymptotic frequency μ(u). Hence the gambler has an “infinitely often” exponential rate of success with exponent μ(u)·D_{KL}(ν‖μ).

3 The Agafonov theorem for PFAs

We now want to prove the extension of Theorem 6 to probabilistic automata/selectors. A probabilistic finite automaton (PFA) is a tuple (Q, A, q_I, δ) where Q is a finite set of states, A is a finite alphabet, q_I is the initial state and δ : Q × A → Δ(Q) is a probabilistic transition function, where Δ(Q) is the set of probability distributions over Q. In this setting, we define inductively the random variables δ^*(q, w) by δ^*(q, ϵ) = q and, for w ∈ A^* and a ∈ A, the event δ^*(q, w·a) = q′ is defined as the union

⋃_{r∈Q} [ δ^*(q, w) = r ∧ δ_{|w|+1}(r, a) = q′ ]

where { δ_n(r, b) : n ∈ ℕ, r ∈ Q, b ∈ A } is a family of independent random variables such that for all (n, r, b), the distribution of δ_n(r, b) is δ(r, b).

Modulo this change of type of transition, probabilistic selectors as well as Select are defined as before. This makes Select(𝒮,X) a random variable for every given XAω.

In order to lift the deterministic Agafonov theorem for Bernoulli measures to the probabilistic case, we will need some preliminary lemmas about normality.

Given two alphabets A and B, and given two sequences X ∈ A^ω and Y ∈ B^ω, we denote by X⊗Y the sequence Z over the alphabet A×B where Z(n) = (X(n), Y(n)). For two words v and w of the same length over A and B respectively, the product v⊗w is defined in the same way. Likewise, if μ and ν are Bernoulli measures over A and B respectively, μ⊗ν is the Bernoulli measure ξ over A×B where ξ((a,b)) = μ(a)·ν(b) for all (a,b) ∈ A×B. Finally, if Z is a sequence over a product alphabet A×B, we denote by π_0 and π_1 its first and second projections respectively (in other words, π_0(X⊗Y) = X and π_1(X⊗Y) = Y for all X, Y).

Lemma 14.

Let A and B be two alphabets and μ, ν two Bernoulli measures over A and B respectively. If a sequence X ∈ A^ω is μ-normal and a sequence Y ∈ B^ω is drawn at random according to ν, then ν-almost surely (i.e., for a set of Y of ν-probability 1), X⊗Y is μ⊗ν-normal.

Proof.

While it is a bit easier to prove this lemma for block-normality, we prefer a more direct proof which does not rely on the equivalence between normality and block-normality.

Let w = u⊗v be a non-empty word over (A×B)^*. Let N be a large integer. We split X⊗Y into blocks of length N:

X⊗Y = (X_1⊗Y_1)(X_2⊗Y_2)⋯

with |X_i| = |Y_i| = N for all i ≥ 1. Introduce the random variables (B_i)_{i≥1} which count the number of occurrences of u⊗v in each block:

B_i = NbOcc(u⊗v, X_i⊗Y_i).

For every integer n, within (X⊗Y)[0,n] there are ⌊n/N⌋ complete blocks. Some occurrences of u⊗v in (X⊗Y)[0,n] occur inside a block X_i⊗Y_i, while others do not because they overlap two contiguous blocks. There can be at most |w| such overlapping occurrences between two given blocks. That observation leads to two ways to count the number of occurrences of u⊗v: the exact count C_n and the N-block count C_{n,N}, two random variables satisfying:

C_n = NbOcc(u⊗v, (X⊗Y)[0,n])
C_{n,N} = ∑_{1 ≤ i ≤ ⌊n/N⌋} B_i
C_{n,N} ≤ C_n ≤ C_{n+N,N} + |w|·⌈n/N⌉.   (2)

Let us focus on C_{n,N} first. The variables B_i can be grouped into |A|^N different buckets, with respect to the corresponding value of X_i. For every x ∈ A^N, set I_n(x) = { 1 ≤ i ≤ ⌊n/N⌋ : X_i = x }, which is non-empty for n large enough, since X is μ-normal. The random variables (X_i⊗Y_i) are mutually independent, and so are the random variables (B_i)_{i≥1}. Moreover, for a fixed x ∈ A^N, the random variables (B_i)_{i∈I_n(x)} are identically distributed; denote by ξ_N(x) the corresponding probability distribution on {0, 1, …, N−|w|+1}. Then 𝔼[ξ_N(x)] measures the expected number of occurrences of u⊗v in a sequence whose left part is fixed, equal to x, and whose right part is independently generated according to ν. Thus

𝔼[ξ_N(x)] = NbOcc(u, x) · ν(v).

According to the Law of Large Numbers, (1/|I_n(x)|) ∑_{i∈I_n(x)} B_i →_n NbOcc(u, x) · ν(v). Since X is μ-normal, every word x ∈ A^N occurs with frequency μ(x) in X_1, X_2, …; thus, almost surely,

lim_n (1/n) C_{n,N} = (1/N) ( ∑_{x∈A^N} μ(x) · NbOcc(u, x) ) · ν(v).   (3)

Since X is μ-normal, the right-hand side of (3) converges to μ(u)·ν(v) when N grows large. Using (2), we get the desired result:

(1/n)·C_n →_n lim_N ( lim_n (1/n)·C_{n,N} ) = (μ⊗ν)(u⊗v).

Lemma 15.

If a sequence Z is μ⊗ν-normal over A×B, then π_0(Z) and π_1(Z) are μ-normal and ν-normal respectively.

Proof.

Suppose Z ∈ (A×B)^ω is μ⊗ν-normal. We only need to show that π_0(Z) is μ-normal; the proof of the ν-normality of π_1(Z) is the same by symmetry. Let w ∈ A^n. We have

Freq^+(w, π_0(Z)) = ∑_{w′∈B^n} Freq^+(w⊗w′, Z)

which, by μ⊗ν-normality of Z, implies

Freq^+(w, π_0(Z)) = ∑_{w′∈B^n} μ⊗ν(w⊗w′) = ∑_{w′∈B^n} μ(w)·ν(w′) = μ(w).

The same holds for Freq^−(w, π_0(Z)), hence we have proven that Freq(w, π_0(Z)) = μ(w), which is what we wanted.

We are now ready to prove Agafonov’s theorem for PFAs.

Theorem 16 (Agafonov’s theorem for PFA).

Let X ∈ A^ω and let μ be a Bernoulli measure over A. The following are equivalent.

  1. (i)

    X is μ-normal.

  2. (ii)

    For any deterministic selector 𝒮 that selects a subsequence Y of X, either Y is finite or Y is μ-normal.

  3. (iii)

    For any probabilistic selector 𝒮 that selects a subsequence Y of X, almost surely, either Y is finite, or Y is μ-normal.

Proof.

Since deterministic selectors (for which we already have Agafonov’s theorem) are a subset of the probabilistic ones, all that is left to prove is (i)(iii).

Let 𝒮 = (Q, A, q_I, δ, S) be a probabilistic selector. Recall that each transition δ(q, a) (where q ∈ Q and a ∈ A) is some probability distribution over Q. Consider the set 𝒯 of all functions Q × A → Q. We can put a distribution τ over 𝒯 such that if t is chosen according to τ, for every (q, a), the marginal distribution of t(q, a) is δ(q, a). An easy way to do this is to take the product distribution τ = ∏_{(q,a)∈Q×A} δ(q, a).

This construction means that, for a fixed sequence X ∈ A^ω, an equivalent way to simulate the probabilistic run of 𝒮 on X is, every time we are in a state q and read a letter a, to pick t at random according to τ and move to state t(q, a). But 𝒯 is a finite set and τ a Bernoulli measure over it. This Bernoulli measure might not be positive; if it is not, we simply remove from 𝒯 all functions whose τ-probability is 0. Reformulating slightly, yet another equivalent way to simulate the run of 𝒮 over X is to do the following:

  1. First choose T ∈ 𝒯^ω at random with respect to the Bernoulli measure τ.

  2. Then run on the sequence X⊗T the deterministic selector 𝒮̂ whose set of states is Q and whose transition function δ̂ is defined by δ̂(q,(a,t)) = t(q,a).

Now, the two following random variables have the same distribution:

  • The subsequence Y of X selected by 𝒮.

  • The sequence π₀(Ŷ), where Ŷ is the subsequence of X⊗T selected by 𝒮̂ when T is chosen randomly according to τ.

Since X is μ-normal, by Lemma 14, X⊗T is μ⊗τ-normal τ-almost surely. Thus, by Agafonov’s theorem for deterministic selectors and Bernoulli measures (Theorem 6), almost surely, the subsequence Ŷ selected by 𝒮̂ is μ⊗τ-normal. Finally, by Lemma 15, this implies that almost surely, the subsequence π₀(Ŷ) of X is μ-normal. This concludes the proof.
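The key derandomization step above, replacing one probabilistic transition per letter by a single τ-random choice of a whole transition function, can be sanity-checked numerically. The following sketch (the state space, letters and probabilities are invented for illustration, not taken from the paper) verifies that sampling t according to the product measure τ and moving to t(q,a) reproduces the marginal δ(q,a).

```python
import random
from collections import Counter

# Sanity check of the derandomization step: sampling a full transition
# function t ~ tau = (x)_{(q,a)} delta(q,a) and moving to t(q,a) induces
# the same next-state distribution as the PFA transition delta(q,a).
Q, A = ["p", "q"], ["0", "1"]
delta = {  # delta[(state, letter)] = distribution over Q (made-up numbers)
    ("p", "0"): {"p": 0.2, "q": 0.8}, ("p", "1"): {"p": 0.6, "q": 0.4},
    ("q", "0"): {"p": 1.0, "q": 0.0}, ("q", "1"): {"p": 0.5, "q": 0.5},
}

def sample_t():
    # One independent draw per (state, letter): exactly the product measure tau.
    return {key: random.choices(Q, weights=[d[s] for s in Q])[0]
            for key, d in delta.items()}

random.seed(1)
N = 100_000
counts = Counter(sample_t()[("p", "0")] for _ in range(N))
for s in Q:
    # The marginal of t(p, 0) under tau should approach delta(p, 0).
    print(s, round(counts[s] / N, 3), delta[("p", "0")][s])
```

Since each coordinate of t is drawn independently, the marginal at any fixed (q,a) is δ(q,a) by construction; the simulation merely confirms this on one coordinate.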

4 The Schnorr-Stimm theorem for probabilistic gamblers

Finally, we establish the Schnorr-Stimm theorem for probabilistic finite-state automata. The definition of probabilistic (μ-)gambler is the same as the definition of deterministic (μ-)gambler except that it is based on PFAs instead of DFAs. In this setting, the quantities Capital(𝒢,w) become random variables.

Theorem 17 (Schnorr-Stimm theorem for PFAs).

For X ∈ A^ω and μ a Bernoulli measure, the following are equivalent.

  (i) X is μ-normal.

  (ii) Any probabilistic μ-gambler 𝒢 betting on X loses almost surely.

Proof.

We already have (ii) ⇒ (i) by Theorem 7 (deterministic gamblers being a subset of probabilistic ones).

For (i) ⇒ (ii), the idea is similar to the proof of Theorem 16. To simulate the run of a probabilistic μ-gambler 𝒢 = (Q, A, q_I, δ, γ) on a sequence X, one can equivalently run on the sequence X⊗T (where T ∈ 𝒯^ω is a τ-random sequence, with 𝒯 and τ defined in the same way as in Theorem 16) the deterministic gambler 𝒢̂ = (Q, A×𝒯, q_I, δ̂, γ̂), where again δ̂(q,(a,t)) = t(q,a) for all (q,a,t) ∈ Q×A×𝒯 and γ̂(q,(a,t)) = γ(q,a) (indeed, the bet placed by a gambler at a given stage does not take into account which state will be reached next). Note that the fairness condition is respected since, for every q,

$$\begin{aligned}
\sum_{(a,t)\in A\times\mathcal{T}}(\mu\otimes\tau)(a,t)\,\hat{\gamma}(q,(a,t)) &= \sum_{a\in A,\,t\in\mathcal{T}}\mu(a)\,\tau(t)\,\gamma(q,a)\\
&= \sum_{t\in\mathcal{T}}\tau(t)\sum_{a\in A}\mu(a)\,\gamma(q,a)\\
&= \sum_{t\in\mathcal{T}}\tau(t)\\
&= 1
\end{aligned}$$

(the second-to-last equality holds by the fairness condition on γ and the last one because τ is a probability distribution).

In this way, the two following random variables will have the same distribution:

  • Capital(𝒢,X[0,n]).

  • Capital(𝒢̂, (X⊗T)[0,n]), where T ∈ 𝒯^ω is chosen at random according to τ.

Since X is μ-normal, by Lemma 14 again, X⊗T is μ⊗τ-normal τ-almost surely. Thus, by Theorem 7, for τ-almost all T, Capital(𝒢̂, (X⊗T)[0,n]) is bounded by a constant C independent of n. By the equality in distribution of the two random variables above, this means that almost surely, there exists a constant C such that Capital(𝒢, X[0,n]) < C for all n. In other words, almost surely, 𝒢 loses on the sequence X.
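The fairness computation in the proof can also be checked on concrete numbers. The following toy sketch (the measures and bets are invented, not from the paper) verifies that the lifted bets γ̂(q,(a,t)) = γ(q,a) remain fair with respect to μ⊗τ whenever γ is fair with respect to μ at some fixed state q.

```python
import itertools

# Toy check of the fairness computation: if sum_a mu(a)*gamma(q,a) = 1,
# then sum_{a,t} mu(a)*tau(t)*gamma(q,a) = 1 as well. All numbers invented.
A, mu = ["0", "1"], {"0": 0.25, "1": 0.75}
T, tau = ["t1", "t2", "t3"], {"t1": 0.5, "t2": 0.3, "t3": 0.2}

# A fair bet at a fixed state q: 0.25 * 2 + 0.75 * (2/3) = 1.
gamma = {"0": 2.0, "1": 2.0 / 3.0}
assert abs(sum(mu[a] * gamma[a] for a in A) - 1.0) < 1e-12

# Fairness of the lift under mu (x) tau, summed term by term
# exactly as in the displayed computation.
total = sum(mu[a] * tau[t] * gamma[a] for a, t in itertools.product(A, T))
print(total)
```

Factoring the sum as ∑_t τ(t) · ∑_a μ(a)γ(q,a) shows why this holds for any τ: the inner sum is 1 by fairness of γ, and τ sums to 1.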

We remark that the dichotomy theorem for Bernoulli measures yields the following dichotomy for probabilistic μ-gamblers.

Theorem 18 (Schnorr-Stimm dichotomy theorem for PFAs).

Let X be an infinite sequence in Aω and μ a Bernoulli measure.

  1. (i)

If X is μ-normal and 𝒢 is a μ-gambler, then almost surely the capital of 𝒢 throughout the game is either ultimately constant or decreases at an exponential rate.

  2. (ii)

    If X is not μ-normal, then there exists a μ-gambler 𝒢 which wins against X at an “infinitely often” exponential rate almost surely.

5 Conclusion

The main contributions of this paper are a generalization of the Agafonov theorem to PFAs with arbitrary transition probabilities, which settles the open question posed by Léchine et al. [6], and an extension of the Schnorr-Stimm theorem to probabilistic gamblers.

While we proved the probabilistic Agafonov theorem (Theorem 16) by reduction to the deterministic setting, it is also possible to follow a more direct approach (i.e., without appealing to the Seiller-Simonsen result), similar to the one followed by Carton to generalize Agafonov’s theorem for DFAs [3]. This, however, makes the argument somewhat more complicated.

An interesting direction for future research is to explore whether this “uselessness of randomness” also holds for pushdown automata. These are a more powerful model of computation, and indeed some normal sequences can be predicted by pushdown automata, some in a rather dramatic way, as proven by Carton and Perifel [4] (their result concerns a slightly different paradigm, namely compression, which we did not discuss in this paper, instead of prediction). We can for example ask: if some probabilistic pushdown selector selects a biased subsequence from a sequence X, does there necessarily exist a deterministic pushdown selector which also selects a biased subsequence? Similarly, if some probabilistic pushdown gambler wins against a sequence X, does there necessarily exist a deterministic pushdown gambler which wins against that same sequence X?

A related question concerns the speed of success in the gambling model. In the case of finite-state automata, Schnorr and Stimm proved that either a sequence X cannot be predicted or some gambler wins on it at an exponential rate. This dichotomy no longer holds for pushdown automata, but one may ask the following question: If some probabilistic pushdown gambler wins at an exponential rate against a sequence X, does there necessarily exist a pushdown gambler which wins against that sequence X at an exponential rate?

References

  • [1] V. N. Agafonov. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968.
  • [2] Laurent Bienvenu, Valentino Delle Rose, and Tomasz Steifer. Probabilistic vs deterministic gamblers. In 39th International Symposium on Theoretical Aspects of Computer Science, STACS 2022, volume 219 of LIPIcs, pages 11:1–11:13. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPICS.STACS.2022.11.
  • [3] Olivier Carton. A direct proof of Agafonov’s theorem and an extension to shift of finite type. CoRR, abs/2005.00255, 2020. arXiv:2005.00255.
  • [4] Olivier Carton and Sylvain Perifel. Deterministic pushdown automata can compress some normal sequences. Logical Methods in Computer Science, Volume 20, Issue 3, 2024.
  • [5] T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley, 2nd edition, 2006.
  • [6] Ulysse Léchine, Thomas Seiller, and Jakob Grue Simonsen. Agafonov’s theorem for probabilistic selectors. In 49th International Symposium on Mathematical Foundations of Computer Science, MFCS 2024, volume 306 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.MFCS.2024.67.
  • [7] John E. Maxfield. A short proof of Pillai’s theorem on normal numbers. Pacific Journal of Mathematics, 2(1):23–24, 1952.
  • [8] Satyadev Nandakumar, Subin Pulari, Prateek Vishnoi, and Gopal Viswanathan. An analogue of Pillai’s theorem for continued fraction normality and an application to subsequences. Bulletin of the London Mathematical Society, 53(5):1414–1428, 2021.
  • [9] Claus-Peter Schnorr and Hermann Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1(4):345–359, 1972. doi:10.1007/BF00289514.
  • [10] Thomas Seiller and Jakob Grue Simonsen. Agafonov’s theorem for finite and infinite alphabets and probability distributions different from equidistribution. Ergodic Theory and Dynamical Systems, 45(11):3506–3539, 2025.