Abstract 1 Introduction 2 Preliminaries 3 Overlapping Net Occurrence Cover 5 Occurrences of Thue-Morse Words of Smaller Order 6 Net Occurrences in Fibonacci Words 7 Net Occurrences in Thue-Morse Words 8 Conclusion and Future Work References Appendix D A Factorization of Thue-Morse Word

Net Occurrences in Fibonacci and Thue-Morse Words

Peaker Guo ORCID School of Computing and Information Systems, The University of Melbourne, Parkville, Australia Kaisei Kishi Department of Information Science and Technology, Kyushu University, Fukuoka, Japan
Abstract

A net occurrence of a repeated string in a text is an occurrence with unique left and right extensions, and the net frequency of the string is the number of its net occurrences in the text. Originally introduced for applications in Natural Language Processing, net frequency has recently gained attention for its algorithmic aspects. Guo et al. [CPM 2024] and Ohlebusch et al. [SPIRE 2024] focus on its computation in the offline setting, while Guo et al. [SPIRE 2024], Inenaga [arXiv 2024], and Mieno and Inenaga [CPM 2025] tackle the online counterpart. Mieno and Inenaga also characterize net occurrences in terms of the minimal unique substrings of the text. Additionally, Guo et al. [CPM 2024] initiate the study of net occurrences in Fibonacci words to establish a lower bound on the asymptotic running time of algorithms. Although there has been notable progress in algorithmic developments and some initial combinatorial insights, the combinatorial aspects of net occurrences have yet to be thoroughly examined. In this work, we make two key contributions. First, we confirm the conjecture that each Fibonacci word contains exactly three net occurrences. Second, we show that each Thue-Morse word contains exactly nine net occurrences. To achieve these results, we introduce the notion of overlapping net occurrence cover, which narrows down the candidate net occurrences in any text. Furthermore, we provide a precise characterization of occurrences of Fibonacci and Thue-Morse words of smaller order, offering structural insights that may have independent interest and potential applications in algorithm analysis and combinatorial properties of these words.

Keywords and phrases:
Fibonacci words, Thue-Morse words, net occurrence, net frequency, factorization
Copyright and License:
[Uncaptioned image] © Peaker Guo and Kaisei Kishi; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Mathematics of computing Combinatorics on words
Acknowledgements:
The authors thank Hideo Bannai and the organizers of the StringMasters workshop at CPM 2024 for initiating the collaboration between the authors during the workshop. The authors also thank Shunsuke Inenaga and William Umboh for their helpful advice.
Funding:
Peaker Guo: Supported by an Australian Government Research Training Program Scholarship.
Editors:
Paola Bonizzoni and Veli Mäkinen

1 Introduction

The work by Axel Thue at the beginning of the 20th century marked the beginning of the field of combinatorics on words [6]. Central to the field are two key objects that have attracted extensive research: Fibonacci words and Thue-Morse words [30]. These objects are remarkable for their rich combinatorial properties and applications in seemingly unrelated fields beyond combinatorics on words. Fibonacci words, for instance, have been used to establish lower bounds and analyze behaviors of string algorithms [24], while Thue-Morse words appear in diverse areas such as group theory, physics, and even chess [2]. They have also been used to prove properties related to repetitiveness measures [3, 27, 11, 32, 5].

Another key aspect of combinatorics on words involves identifying significant strings in a text. Different definitions of significance lead to different problem formulations. These significant strings could be repetitions [10], tandem repeats [20], or runs [4]. There is also a rich literature on the study of these significant strings in Fibonacci and Thue-Morse words [7, 12, 22, 13]. For many applications, frequency serves as a basis for significance measure. However, frequency alone can be misleading, as it may be inflated by occurrences of longer repeated strings. Consider the text the␣theoretical␣theme as an example. The string the is the most frequent string of length three, but this is due to the fact that two of its occurrences are contained by the longer repeated string ␣the.

To address this issue, Lin and Yu [28, 29] introduced the notion of net frequency (NF), motivated by Natural Language Processing tasks. As reconceptualized by Guo et al. [18], a net occurrence of a repeated string in a text is an occurrence with unique left and right extensions, and the NF of the string is the number of its net occurrences in the text. In the earlier example, only the first occurrence of the is a net occurrence, reflecting the only occurrence that is not contained by a longer repeated string.

There has been a recent surge of interest in the computation of NF. Guo et al. [18] and Ohlebusch et al. [33] focus on the offline setting, while Guo et al. [19], Inenaga [23], and Mieno and Inenaga [31] extend the computation to the online setting. Mieno and Inenaga also characterize net occurrences in terms of the minimal unique substrings of the text. Additionally, Guo et al. [18] study net occurrences in Fibonacci words to establish a lower bound on the asymptotic running time of algorithms. Despite these advances, the combinatorial aspect of net occurrences has yet to be thoroughly investigated. It has been shown that there are at least three net occurrences in each Fibonacci word [18]. However, proving that these are the only three is more challenging and was only conjectured. Meanwhile, the net occurrences in each Thue-Morse word had not been investigated before – both of which we address in this work.

Our results.

In this work, our main contribution is twofold. First, we confirm the conjecture by Guo et al. [18] that there are exactly three net occurrences in each Fibonacci word (Theorem 34). Second, we show that there are exactly nine net occurrences in each Thue-Morse word (Theorem 41). To achieve these results, we first introduce the concept of an overlapping net occurrence cover, which drastically reduces the number of occurrences that need to be examined when proving certain net occurrences are the only ones (Lemma 12). Additionally, we provide a precise characterization of occurrences of smaller-order Fibonacci and Thue-Morse words (Theorem 17 and Theorem 19). These findings could also be of independent interest, providing tools and insights for analyzing algorithms and exploring the combinatorial properties of these words. For example, they lead to methods to count the smaller-order occurrences (Corollary 18 and Corollary 21).

Other related work.

Occurrences of Fibonacci and Thue-Morse words of smaller order have been previously studied. For Fibonacci words, these occurrences have been shown to be related to the Fibonacci representation of positive integers [22, 35]. For Thue-Morse words, these occurrences have been investigated using the binary representation of numbers and properties of the compact directed acyclic word graph (CDAWG) of each Thue-Morse word [34]. We emphasize that our work addresses occurrences of Fibonacci and Thue-Morse words of smaller order from a different angle than prior work: we provide a recurrence relation that precisely characterizes the occurrences, bypassing the need for other representations.

2 Preliminaries

Strings.

Throughout, we consider the binary alphabet Σ:={a,b}. A string is an element of Σ. The length of a string S is denoted as |S|. Let ϵ denote the empty string of length 0. We use S[i] to denote the ith character of a string S. Let [n] denote the set {1,2,,n}. Let ST be the concatenation of two strings, S and T. A substring of a string T of length n, starting at position i[n] and ending at position j[n], is written as T[ij]. A substring T[1j] is called a prefix of T, while T[in] is called a suffix of T. A substring S of T is a proper substring if ST. An occurrence in the text T of length n is a pair of starting and ending positions (i,j)[n]×[n]. We say (i,j) is an occurrence of string S if S=T[ij], and i is an occurrence of S if S=T[ii+|S|1]. An occurrence (i,j) is a sub-occurrence of (i,j) if iijj. An occurrence (i,j) is a super-occurrence of (i,j) if (i,j) is a sub-occurrence of (i,j). Moreover, (i,j) is a proper sub-occurrence (or super-occurrence) of (i,j) if (i,j) is a sub-occurrence (or super-occurrence) of (i,j) and ii or jj. Two occurrences (i,j) and (i,j) overlap if there exists a position k such that ikj and ikj. For a non-empty string S, a sequence of non-empty strings =(xk)k=1m=(x1,x2,,xm) is referred to as a factorization of S if S=x1x2xm. Each string xk is called a factor of . The size of , denoted by ||, is the number of factors in the factorization.

Net frequency and net occurrences.

In a text T, the net frequency (NF) of a unique string in T is defined to be zero. The NF of a repeated string is the number of net occurrences in T.

Definition 1 (Net occurrence [18]).

In a text T, an occurrence (i,j) is a net occurrence if the corresponding string T[ij] is repeated, while both left extension T[i1j] and right extension T[ij+1] are unique. When i=1, T[i1j] is assumed to be unique; when j=|T|, T[ij+1] is assumed to be unique.

For an occurrence (i,j) in text T, we refer to T[i1] and T[j+1] as the left and right extension characters of (i,j), respectively. For a string S occurring in T, we say x,yΣ are left and right extension characters of S if both strings xS and Sy also occur in T.

Fibonacci words.

Let Fi denote the (finite) Fibonacci word of order i where F1:=b,F2:=a, and Fi:=Fi1Fi2 for each i3. Let fi:=|Fi| be the length of the Fibonacci word of order i, which is also the ith Fibonacci number. We next review two useful results on Fi.

Lemma 2 ([12]).

Fi only occurs twice in FiFi.

Lemma 3 ([32]).

The strings aaa and bb do not occur in Fi.

The following result can be readily derived by repeatedly applying the definition of Fi.

Observation 4.

For 1ki, there is a factorization of Fi where each factor is either Fk or Fk+1.

For example, for k=i2i5, we have the following factorizations: Fi=Fi1Fi2=Fi2Fi3Fi2=Fi3Fi4Fi3Fi3Fi4=Fi4Fi5Fi4Fi4Fi5Fi4Fi5Fi4.

Thue-Morse words.

For a binary string S, let S¯ denote the string obtained by simultaneously replacing each a with b and each b with a. Let 𝒯i be the (finite) Thue-Morse word of order i where 𝒯1:=a and 𝒯i:=𝒯i1𝒯i1¯ for each i2. Let τi:=|𝒯i|=2i1 be the length of the Thue-Morse word of order i. We next review two properties of each 𝒯i.

Lemma 5 (Overlap-free [30]).

𝒯i has no overlapping occurrences of the same string.

Lemma 6 (Cube-free [30]).

𝒯i does not contain any string of the form xxx where x is a non-empty string.

The following result can be directly derived by repeatedly applying the definition of 𝒯i.

Observation 7.

For each i2 and 1ji, there is a factorization of 𝒯i where each factor is either 𝒯i(j1) or 𝒯i(j1)¯.

For example, for 2j3, 𝒯i=𝒯i1𝒯i1¯=𝒯i2𝒯i2¯𝒯i2¯𝒯i2. Figure 3 illustrates larger value of j. Also note that this result is analogous to Observation 4.

3 Overlapping Net Occurrence Cover

This section lays the foundation to prove the main results of this paper in the subsequent sections. Specifically, we aim to develop tools to show that certain net occurrences are the only ones in a text. To achieve this, we first provide two characteristics for non-net occurrences. The proofs in this section are presented in Appendix A.

Observation 8.

In a text T, if an occurrence (s,e) is a proper super-occurrence of a net occurrence, then (s,e) is not a net occurrence.

Observation 9.

In a text T, if an occurrence (s,e) is a proper sub-occurrence of a net occurrence, then (s,e) is not a net occurrence.

 Remark 10.

In a text, both a string and its substring can have positive NF, for example, in abaababaabaab, both abaaba and abaab have positive NF. However, this relationship does not hold for an occurrence and its sub-occurrence, as shown in the above two observations.

To show that a given set of net occurrences in T are the only ones in T, the above two observations allow us to ignore any occurrence that is either a sub-occurrence or a super-occurrence of a net occurrence. To fully use these two observations, we focus on the case when the given net occurrences “overlap” one another and collectively “cover” the text. Consequently, the only occurrences that need to be explicitly examined are the super-occurrences of those corresponding to the “overlapping regions” of these net occurrences. To formalize this, we introduce the following definition and lemma.

Definition 11 (ONOC and BNSO).

Consider a text T and a set of c net occurrences in T: 𝒞={(i1,j1),(i2,j2),,(ic,jc)}. We say 𝒞 is an overlapping net occurrence cover (ONOC) of T if i1=1, ik+1jk for 1kc1, and jc=n. Each occurrence in the set {(i2,j1),(i3,j2),(ic,jc1)} is a bridging net sub-occurrence (BNSO) of 𝒞.

An example of Definition 11 is shown in Figure 1.

Figure 1: An example for Definition 11. The set {(1,6),(4,9),(9,14)} is an ONOC, with each of its net occurrences underlined in blue; {(4,6),(9,9)} is the corresponding set of BNSOs. Note that (2,7) is a net occurrence outside of this ONOC, underlined in orange.
Lemma 12.

For a text T, if there exists an ONOC 𝒞 of T such that 𝒞 does not contain all the net occurrences in T, then each net occurrence in T outside of 𝒞 must be a super-occurrence of (i1,j+1), where (i,j) is a BNSO of 𝒞.

In the example in Figure 1, note that net occurrence (2,7) is indeed a super-occurrence of (41,6+1), where (4,6) is a BNSO.

In Section 6 and Section 7, we apply Lemma 12 in three steps. First, for a Fibonacci or Thue–Morse word, we show that an ONOC exists. Next, we examine the set of BNSOs of the ONOC. Finally, we prove that no super-occurrence of (i1,j+1) (where (i,j) is a BNSO) is a net occurrence, thus concluding that the ONOC already contains all the net occurrences in the text.

4 Occurrences of Fibonacci Words of Smaller Order

We study the occurrences of Fij in Fi for appropriate i and j. These results will help us prove the only net occurrences in Fi in Section 6 and may also be of independent interest.

When j=1, with Fi=Fi1Fi2, we have one occurrence of Fi1 at position 1. The following result shows that this is the only one.

Lemma 13 ([32]).

Fi1 only occurs at position 1 in Fi for i3.

The two factorizations in the following result reveal three occurrences of Fi2 in Fi.

Observation 14 ([18]).

For each i6,

Fi =Fi2Fi3Fi2 (1)
Fi =Fi2Fi2Fi5Fi4. (2)

The following result confirms that these are the only three.

Lemma 15 ([32]).

Fi2 only occurs at positions 1, fi2+1, and fi1+1 in Fi for i6.

We next provide the result when j=3 and i7.

Lemma 16.

Fi3 only occurs at positions 1, fi3+1, fi2+1, and fi1+1 in Fi.

Proof.

From Lemma 15, notice that the second occurrence of Fi2 follows immediately after the first occurrence, and the second and the third occurrences of Fi2 overlap. Then, based on Equations 12, we consider the following three cases.

  1. Case 1

    Fi3 occurs within Fi2. From Lemma 13, Fi3 only occurs at position 1 in Fi2. Thus, using Lemma 15, the only occurrences of Fi3 within Fi2 in Fi are at positions 1, fi2+1, and fi1+1.

  2. Case 2

    Fi3 occurs across the boundary of Fi2Fi3. Again from Lemma 15, the only occurrences of Fi3 within Fi2Fi3=Fi1 are at positions 1, fi3+1, and fi2+1. Note that the occurrence at position fi3+1 is the boundary-crossing one: we apply Equation 2 on Fi1 and obtain Fi2Fi3=Fi3Fi3Fi6Fi5.

  3. Case 3

    Fi3 occurs across the boundary of Fi3Fi2. Note that Fi3Fi2=Fi3Fi3Fi4. Using Lemma 2, Fi3 does not occur in Fi3Fi3 and thus does not occur across the boundary of Fi3Fi2.

We now present the main result of the section, illustrated in Figure 2. Before that, we define the following. For a set of integers A and another integer i, Ai denotes the set {a+i:aA}. We write max(A) for the maximum element of set A.

Figure 2: An illustration of Theorem 17 when j=4. Each row depicts a factorization of Fi with relevant factors highlighted in colors. The top two, middle two, and bottom two rows correspond to sets Θi,j2, Θi,j1 and Θi,j, respectively. Each green and blue occurrence of Fij is introduced by an occurrence of Fi(j2) and Fi(j1), respectively. The yellow occurrence is the rightmost one.
Theorem 17.

Let Θi,j denote the set of the starting positions of the occurrences of Fij in Fi. Then, Θi,0=Θi,1={1}, and for 2ji4,

  • when j is even, Θi,j=Θi,j1(Θi,j2fij){fifij+1}, where the three sets in the union are mutually disjoint, and max(Θi,j)=fifij+1;

  • when j is odd, Θi,j=Θi,j1(Θi,j2fij) where the two sets in the union are disjoint, and max(Θi,j)=fifi(j1)+1.

Proof.

We proceed by induction on j.

Base cases.

When j=0, we have Θi,0={1} trivially. When j=1, it follows from Lemma 13 that Θi,1={1}. When j=2, we obtain Θi,2={1,fi2+1,fi1+1} by Lemma 15. Thus, Θi,2=Θi,1(Θi,0fi2){fifi2+1}, where the three sets are mutually disjoint, and max(Θi,2)=fi1+1=fifi2+1. Next, when j=3, we have Θi,3={1,fi3+1,fi2+1,fi1+1} by Lemma 16. Hence, Θi,3=Θi,2(Θi,1fi3), Θi,2(Θi,1fi3)=, and max(Θi,3)=fi1+1=fifi(31)+1.

Inductive step.

For each 4ki4, assume the claim holds for j=k2 and k1, and we now prove the claim for j=k. We first prove the claim for even k. Define Λi,k:=Θi,k1(Θi,k2fik){fifik+1}. We aim to show that Θi,k=Λi,k by showing Λi,kΘi,k and Θi,kΛi,k. Before we proceed, note that Fi(k+2),Fi(k+1),Fik,Fi(k1),Fi(k2) are consecutive Fibonacci words of increasing orders.

To prove that Λi,kΘi,k, we will show that each set in the union defining Λi,k is contained in Θi,k. Note that Θi,k1Θi,k because Fik is a prefix of Fi(k1)=FikFi(k+1). Next, we have (Θi,k2fik)Θi,k because Fik occurs at position fik+1 in Fi(k2)=FikFikFi(k+3)Fi(k+2) where this factorization can be derived similarly to Equation 2. Lastly, by the induction hypothesis on j=k2, we have fifi(k2)+1Θi,k2 and it is the rightmost occurrence of Fi(k2) in Fi. Consider the factorization Fi(k2)=FikFi(k+1)Fik, which can be derived similarly to Equation 1. Note that the rightmost occurrence of Fik in Fi(k2) is at position (fifi(k2)+1)+(fik+fi(k+1))=(fi(fik+fik+fi(k+1))+1)+(fik+fi(k+1))=fifik+1 in Fi. Thus, fifik+1Θi,k.

Next, we prove Θi,kΛi,k by showing that each occurrence of Fik is in Λi,k. By Observation 4, there is a factorization of Fi where each factor is either Fi(k1) or Fik. We now examine the occurrences of Fik based on this factorization. First, when there is an occurrence of Fik within Fi(k1), this occurrence is in Θi,k1Λi,k. Next, when there is an occurrence of Fik (underlined) across the boundary of Fi(k1)Fi(k1)=FikFik¯Fi(k+3)Fi(k+2)Fi(k+1) or across the boundary of Fi(k1)Fik=Fi(k2)=FikFik¯Fi(k+3)Fi(k+2), then, by the fact that Fik only occurs at positions 1, fik+1, and fi(k1)+1 in Fi(k2) (a direct generalization of Lemma 15), this occurrence of Fik must be in (Θi,k2fik)Λi,k. Finally, observe that there does not exist an occurrence of Fik across the boundary of FikFi(k1)=FikFikFi(k1) or across the boundary of FikFik, because otherwise, this would contradict Lemma 2.

Now, consider a position x(Θi,k2fik). Assume, by contradiction, that xΘi,k1, then, we have Fi(k2)=FikFi(k1), which contradicts Fi(k2)=Fi(k1)FikFikFi(k1). This is analogous to Fi1Fi2Fi2Fi1, which follows from the “near-commutative property” of Fibonacci words [26]. Thus, Θi,k1(Θi,k2fik)=. Next, by the induction hypothesis, max(Θi,k1Θi,k2)=fifi(k2)+1. Note that (fifi(k2)+1)+fik<fifik+1. Therefore, fifik+1Θi,k1(Θi,k2fik) and max(Θi,k)=fifik+1.

The proof for odd k is very similar to even k with the difference being that we do not need to consider fifik+1 for odd k.

In the following result, the case where 0ji4 has been addressed in [35], while the case where i3ji1 is straightforward. Our characterization in Theorem 17 can offer an alternative simpler proof for this result.

Corollary 18.

Consider Fi and 0ji1. Let θi,j denote the number of occurrences of Fij in Fi, and define f1:=1 and f0:=0 for convenience. Then,

θi,j={fj+2(jmod2) if 0ji4;fj+1 if i3ji2;fj1 if j=i1.

5 Occurrences of Thue-Morse Words of Smaller Order

Figure 3: An illustration of the occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i for 1j6.

We study the occurrences of 𝒯ij and 𝒯ij¯ in each 𝒯i for appropriate i and j (the occurrences are shown in Figure 3 for 1j6). These results will help us identify the net occurrences in each 𝒯i in Section 6 and may also be of independent interest. We now present the main result of the section, illustrated in Figure 4.

Figure 4: An illustration of Theorem 19. Each dark blue, pink, and light blue occurrence of 𝒯ij is introduced by an occurrence of 𝒯i(j1), 𝒯i(j1)¯, and 𝒯i(j2) respectively. Each occurrence of 𝒯ij that is both dark blue and pink indicates that it is introduced by both an occurrence of 𝒯i(j1) and an occurrence of 𝒯i(j1)¯.
Theorem 19.

For each i2 and 0ji1, let Ai,j and Bi,j denote the set of the starting positions of the occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i, respectively. Then, Ai,0=Ai,1={1}. For each j2, we define

Bi,j1 :=Bi,j1τij,Ai,j2:=Ai,j2(τij+τi(j+1)),
Ii,j3 :={,j=2,Ai,j3Bi,j3,j3, where
Ai,j3 :=Ai,j3(τi(j1)+τij),Bi,j3:=Bi,j3τi(j2).

Then, Ai,j=Ai,j1Bi,j1Ai,j2 with

Ai,j1Bi,j1=Ii,j3,Ai,j1Ai,j2=,andBi,j1Ai,j2=.

Proof.

We proceed by induction on j.

Base cases.

When j=0, the claim holds trivially. When j=1, note that 𝒯i1 only occurs at position 1 because it cannot occur at position τi1+1 (where 𝒯i1¯ occurs), and any other occurrences of 𝒯i1 would overlap with its occurrence at position 1, contradicting Lemma 5. When j=2, observe that Ai,21={1}, Bi,21={τi1+τi2+1} and Ai,22={τi2+τi3+1} are mutually disjoint. Further, there are no occurrences of 𝒯i2 outside of Ai,2=Ai,1Bi,1Ai,0 because any such occurrences would contradict Lemma 5.

Inductive step.

For each 3ki1, assume the claim holds for j=k3, k2, and k1 and we now prove the claim for j=k. Define Vi,k:=Ai,k1Bi,k1Ai,k2. We prove Ai,k=Vi,k by showing Ai,kVi,k and Vi,kAi,k.

To prove Vi,kAi,k, we will show that each set in the union defining Vi,k is contained in Ai,k. Clearly, Ai,k1Ai,k because 𝒯ij is a prefix of 𝒯i(j1)=𝒯ij𝒯ij¯. Similarly, Bi,k1Ai,k because 𝒯ij is a suffix of 𝒯i(j1)¯=𝒯ij¯𝒯ij. Lastly, Ai,k2Ai,k because 𝒯ij occurs at position τik+τi(k+1) of

𝒯i(k2)=𝒯ik𝒯i(k+1)¯𝒯ik𝒯i(k+1)𝒯ik (3)

Next, we prove Ai,kVi,k by showing that each occurrence of 𝒯ik is in Vi,k. By Observation 7, there is a factorization of 𝒯i where each factor is either 𝒯i(k1) or 𝒯i(k1)¯. We thus consider the following four cases.

  1. Case 1

    When 𝒯ik occurs within 𝒯i(k1)𝒯i(k1)=𝒯ik𝒯ik¯𝒯ik𝒯ik¯, by the overlap-free property (Lemma 5), positions 1 and τi(k1)+1 are the only two occurrences of 𝒯ik. They are both contained in Ai,k1, while the latter is also in Bi,k1. (The overlap-free property will be used similarly in the remaining three cases.)

  2. Case 2

    When 𝒯ik occurs within 𝒯i(k1)¯𝒯i(k1)¯=𝒯ik¯𝒯ik𝒯ik¯𝒯ik, positions τik+1 and τi(k1)+τik+1 are the only two occurrences of 𝒯ik. They are both contained in Bi,k1, while the former is also in Ai,k1.

  3. Case 3

    When 𝒯ik occurs within 𝒯i(k1)𝒯i(k1)¯=𝒯i(k2), by Equation 3, positions 1, τik+τi(k+1)+1 and τi(k1)+τik+1 are the only occurrences of 𝒯ik: the first and third are both contained in Ai,k1, the second is in Ai,k2, and the third is also in Bi,k1.

  4. Case 4

    When 𝒯ik occurs within 𝒯i(k1)¯𝒯i(k1)=𝒯ik¯𝒯ik𝒯ik𝒯ik¯, position τik and τi(k1) are the only two occurrences of 𝒯ik. The former is contained in Bi,k1, while the latter is in Ai,k1.

After examining the above four cases, we conclude that Ai,kVi,k, and thus Ai,k=Vi,k. Next we will prove Ai,k1Bi,k1=Ii,k3 by showing Ai,k1Bi,k1Ii,k3 and Ii,k3Ai,k1Bi,k1. Recall that Ii,k3:=Ai,k3Bi,k3.

First, we prove Ai,k1Bi,k1Ii,k3 by establishing that if an occurrence of 𝒯ik is in Ai,k1Bi,k1, then this occurrence is in Ii,k3. First observe that in Cases 12, some occurrences of 𝒯ik are contained in both Ai,k1 and Bi,k1. By Lemma 5, 𝒯i(k3)¯ and 𝒯i(k3) do not overlap in 𝒯i, it follow that, for each occurrence of 𝒯i(k3)¯ in 𝒯i, there is only one occurrence of 𝒯ik contained in Bi,k3. Similarly, for each occurrence of 𝒯i(k3) in 𝒯i, there is only one occurrence of 𝒯ik contained in Ai,k3. Now, consider the factorizations:

𝒯i(k3)¯ =𝒯i(k1)¯𝒯i(k1)𝒯i(k1)𝒯i(k1)¯, and
𝒯i(k3) =𝒯i(k1)𝒯i(k1)¯𝒯i(k1)¯𝒯i(k1).

Observe that the set of occurrences of 𝒯ik in Case 1 is a subset of Bi,k3 since 𝒯i(k1)𝒯i(k1) occurs at position τi(k1)+1 in 𝒯i(k3)¯. Similarly, the set of occurrences of 𝒯ik in Case 2 is a subset of Ai,k3 since 𝒯i(k1)¯𝒯i(k1)¯ occurs at position τi(k1)+1 in 𝒯i(k3).

Next, we prove Ii,k3Ai,k1Bi,k1 by contraposition. Specifically, instead of directly showing that “if an occurrence of 𝒯ik is in Ii,k3, then this occurrence is in Ai,k1Bi,k1”, we prove the equivalent contrapositive: “if an occurrence of 𝒯ik is not in Ai,k1Bi,k1, then this occurrence is not in Ii,k3”. First observe that 𝒯ik occurs at position 1 in 𝒯i(k1)=𝒯ik𝒯ik¯ and occurs at position τik+1 in 𝒯i(k1)¯=𝒯ik¯𝒯ik. Next, occurrences of 𝒯ik𝒯ik¯ and 𝒯ik𝒯ik¯ overlap in 𝒯i to form occurrences of 𝒯ik¯𝒯ik𝒯ik¯ (see Cases 12). Hence, if an occurrence of 𝒯ik is not in Ai,k1Bi,k1, then this occurrence is not at position τik+1 in 𝒯ik¯𝒯ik𝒯ik¯. Next, consider the factorizations:

𝒯i(k3)¯ =𝒯i(k1)¯𝒯ik𝒯ik¯𝒯ik𝒯ik¯𝒯i(k1)¯, and
𝒯i(k3) =𝒯i(k1)𝒯ik¯𝒯ik𝒯ik¯𝒯ik𝒯i(k1).

Since 𝒯i(k3)¯ and 𝒯i(k3) do not overlap in 𝒯i, we know that 𝒯ik¯𝒯ik𝒯ik¯ only occurs at position τi(k1)+τik+1 in 𝒯i(k3)¯ and at position τi(k1)+1 in 𝒯i(k3). Thus, if an occurrence of 𝒯ik is not in Ai,k1Bi,k1, then this occurrence is not in Ii,k3. Therefore, we conclude that Ii,k3Ai,k1Bi,k1.

The analogous characterization of Bi,j is presented as follows, which can be proven in a way similar to the proof of Theorem 19.

Corollary 20.

For each i2 and 0ji1, we have Bi,0=, Bi,1={τi1+1}. For each j2, we define

Ai,j1′′ :=Ai,j1τij,Bi,j2′′:=Bi,j2(τij+τi(j+1)),
Ii,j3 :={,j=2,Ai,j3′′Bi,j3′′,j3, where
Bi,j3′′ :=Bi,j3(τi(j1)+τij),Ai,j3′′:=Ai,j3τi(j2).

Then, Bi,j=Bi,j1Ai,j1′′Bi,j2′′ with

Bi,j1Ai,j1′′=Ii,j3,Bi,j1Bi,j2′′=,andAi,j1′′Bi,j2′′=.

We next use Theorem 19 and Corollary 20 to count the number of occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i.

Corollary 21.

For i2, consider 𝒯i and 0ji1. Let aj and bj denote the number of occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i, respectively. Then,

  • a0=a1=1 and aj=aj1+2aj2 for each j2;

  • b0=0 and bj=bj1+aj1 for each j1.

Proof.

We proceed by induction on j. For 1j2, the claim holds trivially. For j3, we have |Ii,j3|=|Ai,j3|+|Bi,j3|=aj3+bj3=bj2 and |Ai,j|=|Ai,j1|+|Bi,j1|+|Ai,j2||Ii,j3|=aj1+bj1+aj2bj2=aj1+(bj1bj2)+aj2=aj1+2aj2. by induction hypothesis. We can prove bj similarly with Corollary 20.

 Remark 22.

For each i2, aj is the (j+1)th Jacobsthal Number: Sequence A001045 of the On-Line Encyclopedia of Integer Sequences (https://oeis.org/A001045).

With the characterization of the occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i, a natural next step is to investigate the structure of the strings that surround 𝒯ij and 𝒯ij¯, which correspond to the blank areas in each row in Figure 3. In Appendix D, we explore two smallest factorizations of 𝒯i, each containing all occurrences of 𝒯ij and 𝒯ij¯, respectively. The remaining factors in these factorizations represent the surrounding strings.

6 Net Occurrences in Fibonacci Words

In this section, we prove that there are only three net occurrences in each Fi, using the results on the occurrences of Fibonacci words of smaller order from Section 4, the notion of ONOC from Section 3, and new properties that we will develop in this section. We begin by reviewing the following results. Some of the proofs in this section are presented in Appendix B.

Lemma 23 ([18]).

For i7, let Qi:=Fi5Fi6F3F2, Δ(0):=ba, Δ(1):=ab. Then,

Fi4Fi5 =QiΔ(1(imod2)) and (4)
Fi5Fi4 =QiΔ(imod2). (5)
Lemma 24 ([18]).

For each i7, the following are net occurrences in Fi:

  • one occurrence of Fi2 at position fi1+1;

  • two occurrences of Fi2Qi at positions 1 and fi2+1.

Meanwhile, the two occurrences of Fi2 at positions 1 and fi2+1 are not net occurrences.

With Lemma 15, we know that these three are the only occurrences of Fi2 in Fi. Similarly, we now strengthen Lemma 24 by showing the following result.

Lemma 25.

For each i7, Fi2Qi only occurs at positions 1 and fi2+1 in Fi.

By combining Lemma 15, Lemma 24, and Lemma 25, we conclude that Fi2 has only one net occurrence and Fi2Qi only has two net occurrences.

Lemma 26.

For each i7, the net occurrences identified in Lemma 24 are the only net occurrences of Fi2 and Fi2Qi in Fi.

Figure 5: An illustration of several factorizations of Fi from Observation 14 and Lemma 23 where Δ:=Δ(1(imod2)) and Δ:=Δ(imod2). Net occurrences of Fi2 and Fi2Qi are in yellow and green, respectively. Super-occurrences of the two BNSOs are shown as arrows.

It remains to show that there are no additional net occurrences in each Fi. To achieve this, we use the results from Section 3. First, observe that the three net occurrences in Lemma 24 form an ONOC of Fi. The two BNSOs of this ONOC correspond to an occurrence of Qi and an occurrence of Fi4Qi, respectively. See Figure 5 for an illustration. Next, we aim to show that no super-occurrences of these two occurrences can be a net occurrence. To establish this, we analyze the super-occurrences of the occurrences of Fi3 in Lemma 32. This result covers the examination of the super-occurrences of the occurrence of Fi4Qi, since Fi3 is a prefix of Fi4Qi=Fi3Qi1. Furthermore, Lemma 32 helps examining super-occurrences of the occurrence of Qi in Lemma 33.

To prove these two lemmas, we introduce some properties of Fi3 and Qi, which are proved in Appendix B.

Lemma 27.

|Qi|=fi32.

Lemma 28.

For a substring S of Fi, if Fi2Qi is a proper substring of S, then S is unique.

Lemma 29.

Fi3 is always followed by Qi1 in Fi.

Lemma 30.

Fi3Fi6Fi5 and its length-(fi21) prefix are both unique in Fi.

Lemma 31.

The length-(fi31) prefix of Fi3 is always followed by Fi3[fi3] in Fi.

Now, we introduce the two crucial lemmas motivated earlier.

(a) Case (1).
(b) Case (2).
(c) Case (3).
(d) Case (4).
(e) Case (5).
(f) Case (6).
(g) Case (7).
Figure 6: Illustration of the proof of Lemma 32. In each case, four factorizations of Fi are shown, each focusing on one occurrence of Fi3, highlighted in green. For S=XFi3Y, each discussed X and Y is shown in pink and blue, respectively. Each discussed left or right extension character of S is shown in yellow. Recall that Δ:=Δ(1(imod2)) and Δ:=Δ(imod2).
Lemma 32.

Consider an occurrence (s,e) in Fi and let S:=Fi[se]. If (s,e) is a super-occurrence of an occurrence of Fi3, and S is neither Fi2 nor Fi2Qi, then (s,e) is not a net occurrence.

Proof.

The proof is illustrated in Figure 6. Consider strings X and Y such that S=XFi3Y and XYϵ. We examine the following cases depending on |Y|. Note that |Qi1|=fi42 and |Qi+1|=fi22 from Lemma 27.

  1. (1)

    |Y|<|Qi1|. Using Lemma 29, note that Y is a prefix of Qi1 in this case. This means the right extension character of S is always Qi1[|Y|+1]. Thus, no occurrence of S is a net occurrence.

  2. (2)

    |Y|=|Qi1|. Using Lemma 29, Fi3Y=Fi3Qi1 always holds in this case. Next, if Fi3Y occurs at position 1, fi2+1, or fi1+1, then the right extension character is always Δ(imod2)[1]. On the other hand, if Fi3Y occurs at position fi3+1, we examine the left extension character of S=XFi3Y. Notice that |X|fi3 always holds in this case (and S becomes a prefix of Fi when |X|=fi3). Now, observe that if Fi3Y occurs at position fi3+1 or fi1+1, the left extension character of S is always Fi3[fi3|X|1]. Thus, this occurrence of S is also not a net occurrence.

  3. (3)

    |Qi1|<|Y|<fi4. If Fi3Y occurs at positions 1, fi2+1, or fi1+1, observe that occurrences of Fi3 at these three positions are always followed by Fi4. Thus, Y is a prefix of Fi4 and the right extension character of S is always Fi4[|Y|+1]. So these three occurrences of S are not net occurrences. If Fi3Y occurs at position fi3+1, since |Y|=|Qi1|+1=fi41, we have |Fi3Y|=fi3+fi41=fi21. By Lemma 30, Fi3Y is unique, which means S is unique.

  4. (4)

    |Y|=fi4. If Fi3Y occurs at position 1, then X is empty and S=Fi3Fi4=Fi2. If Fi3Y occurs at positions fi3+1, then Fi3Y=Fi3Fi6Fi5 is unique (Lemma 30) so S is also unique. If Fi3Y occurs at position fi2+1, then X ends with Δ(imod2). If Fi3Y occurs at position fi1+1, then X ends with Δ(1(imod2)). Now, since Δ(imod2)Fi2 and Δ(1(imod2))Fi2 are both unique by Lemma 15, S is also unique if Fi3Y occurs at these two positions.

  5. (5)

    fi4<|Y|<|Qi+1|. First observe that the occurrences of Fi3 at positions 1 and fi2+1 are both followed by Fi4Qi=Qi+1. Thus, if Fi3Y occurs at these two positions, then Y is a prefix of Qi+1 and the right extension character of S is always Qi+1[|Y|+1]. So these two occurrences of S are not net occurrences. Next, if Fi3Y occurs at position fi3+1, then Fi3Y is unique because Fi3Fi6Fi5 is a prefix of Fi3Y and the former is unique by Lemma 30. Thus, S is unique. Finally note that, Fi3Y cannot occur at position fi1+1 because |Y|>|Fi4| and Fi3Fi4 is a suffix of Fi.

  6. (6)

    |Y|=|Qi+1|. Similar to the previous case, if Fi3Y occurs at position fi3+1, then Fi3Y is unique, and Fi3Y cannot occur at position fi1+1. If Fi3Y occurs at position 1, then X is empty and S=Fi3Y=Fi3Qi+1=Fi2Qi. If Fi3Y occurs at position fi2+1, then S=XFi2Qi, which is unique by Lemma 28.

  7. (7)

    |Y|>|Qi+1|. Similar to the previous two cases, if Fi3Y occurs at position fi3+1, then Fi3Y is unique, and Fi3Y cannot occur at position fi1+1. If Fi3Y occurs at position fi2+1, then Fi3Y is a prefix of Fi3Fi2=Fi2Fi5Fi4=Fi2QiΔ(imod2), which is unique by Lemma 28. Thus, S is unique. Finally, since |Y|>|Qi+1|>fi21, if Fi3Y occurs at positions 1, then Y begins with the length-(fi21) prefix of Fi3Fi6Fi5, which is unique by Lemma 30. Thus, S is also unique.

Lemma 33.

Consider an occurrence (s,e) in Fi. If (s,e) is a proper super-occurrence of the occurrence of Qi at position fi2+1, then (s,e) is not a net occurrence.

Proof.

Consider strings X and Y such that S=XQiY and XYϵ. When |Y|2, notice that Fi3 occurs in S. By Lemma 32, (s,e) is not a net occurrence. Now, we consider the case when |Y|=1. Note that QiY is precisely the length-(fi31) prefix of Fi3. Thus, by Lemma 31, QiY is always followed by the same right extension character, Fi3[fi3], which means (s,e) is not a net occurrence.

Finally, the main result follows from Lemma 24, Lemma 12, Lemma 32 and Lemma 33.

Theorem 34.

The three net occurrences identified in Lemma 24 are the only ones in Fi.

7 Net Occurrences in Thue-Morse Words

In this section, we prove the only nine net occurrences in each 𝒯i using the results on the occurrences of Thue-Morse words of smaller order from Section 5, the notion of ONOC from Section 3, and new results that we will introduce in this section. We will first show that each occurrence of each string in 𝒫i (defined below) is a net occurrence in 𝒯i, then show that they are the only ones.

Definition 35.

For each i5, define 𝒫i:={𝒯i2,𝒯i2¯,𝒯i4𝒯i3¯,𝒯i4¯𝒯i3}..

We next show several factorizations of 𝒯i, proved in Appendix C.

Lemma 36.

For each i5:

𝒯i =𝒯i2𝒯i2¯𝒯i2¯𝒯i2 (6)
𝒯i =𝒯i2𝒯i3¯𝒯i2𝒯i3𝒯i2 (7)
𝒯i =𝒯i3𝒯i4¯𝒯i4𝒯i3¯𝒯i2𝒯i4𝒯i3¯𝒯i4¯𝒯i3¯ (8)
𝒯i =𝒯i3𝒯i4¯𝒯i3𝒯i4𝒯i2𝒯i4𝒯i4¯𝒯i3𝒯i3¯ (9)
Figure 7: An illustration of several factorizations of 𝒯i from Lemma 36. Net occurrences of each string in Definition 35 are highlighted in a separate color. Super-occurrences of the eight BNSOs are shown as colored arrows, blue for 𝒯i3 and orange for 𝒯i3¯ (see Lemma 40). Each net occurrence is numbered in red at the top-right corner, and each arrow with label ij corresponds to an overlap between the ith and jth net occurrences.

The following two results immediately follow from Theorem 19. Note that Corollary 37 also appears in [34].

Corollary 37.

For each i5:

  • 𝒯i2 only occurs at positions 1, τi2+τi3+1 and τi1+τi2+1 in 𝒯i.

  • 𝒯i2¯ only occurs at positions τi2+1 and τi1+1 in 𝒯i.

  • 𝒯i4𝒯i3¯ only occurs at positions τi3+τi4+1 and τi1+τi3+1 in 𝒯i.

  • 𝒯i4¯𝒯i3 only occurs at positions τi3+1 and τi1+τi3+τi4+1 in 𝒯i.

Corollary 38.

𝒯i3 only occurs at positions 1,τi3+τi4+1,τi2+τi3+1,τi1+τi3+1, and τi1+τi2+1 in 𝒯i.

We now identify the nine net occurrences in 𝒯i.

Lemma 39.

Each occurrence of each string in 𝒫i is a net occurrence in 𝒯i.

Proof.

We proceed by examining the left and right extension characters of each occurrence of each string in 𝒫i.

Since 𝒯i2 is a prefix and a suffix of 𝒯i, by the definition of occurrences, the occurrence of 𝒯i2 at positions 1 has a unique left extension character, and the occurrence of 𝒯i2 at positions τi1+τi2+1 has a unique right extension character. Next, note that the right extension character of the occurrence of 𝒯i2 at position 1 differs from that of the occurrence at position τi2+τi3+1 because 𝒯i3¯[1]𝒯i3[1]. Similarly, the left extension character of the occurrence at position τi2+τi3+1 differs from that of the occurrence at position τi1+τi2+1, because 𝒯i3¯[τi3]𝒯i3[τi3]. Hence, all three occurrences of 𝒯i2 are net occurrences.

For 𝒯i2¯, a similar argument holds. the right extension characters satisfy 𝒯i2¯[1]𝒯i2[1] and the left extension characters satisfy 𝒯i2[τi2]𝒯i2¯[τi2]. Thus, both occurrences of 𝒯i2¯ are net occurrences. For 𝒯i4𝒯i3¯, similarly, the right extension characters satisfy 𝒯i2[1]𝒯i4¯[1] and the left extension characters satisfy 𝒯i4¯[τi4]=𝒯i2¯[τi2]𝒯i2[τi2]. Thus, both occurrences of 𝒯i4𝒯i3¯ are net occurrences. Finally, for 𝒯i4¯𝒯i3, once again, the right extension characters satisfy 𝒯i4[1]𝒯i3¯[1] and the left extension characters satisfy 𝒯i3[τi3]𝒯i4[τi4]. Thus, both occurrences of 𝒯i4¯𝒯i3 are net occurrences.

To show that all other occurrences are not net occurrences, we follow Lemma 12. First note that the nine net occurrences we identified in Lemma 39 form an ONOC of 𝒯i. The eight BNSOs of this ONOC correspond to the occurrences of 𝒯i3 and 𝒯i3¯ shown in Figure 7. We next show that no super-occurrences of these occurrences are net occurrences to conclude that this ONOC already contains all the net occurrences in 𝒯i.

Lemma 40.

Consider an occurrence (s,e) in 𝒯i and let S:=𝒯i[se]. If (s,e) is a proper super-occurrence of 𝒯i3 or 𝒯i3¯, and S𝒫i, then (s,e) is not a net occurrence.

Proof.

We first consider when (s,e) contains an occurrence of 𝒯i3. Consider strings X and Y such that S=X𝒯i3Y and XYϵ. Let position u:=s+|X| be the starting position of this occurrence of 𝒯i3. Let C:={1,τi2+τi3+1,τi1+τi2+1} and D:={τi3+τi4+1,τi1+τi3+1}. By Corollary 38, we have uCD. We next examine the following cases depending on which set u belongs to and how large |Y| is.

We first consider when uC.

  1. (a)

    |Y|<τi3. By Corollary 37, note that Y is always a prefix of 𝒯i3¯. This means the right extension character of S is always 𝒯i3¯[|Y|+1]. Thus, (s,e) is not a net occurrence.

  2. (b)

    |Y|=τi3. By Corollary 37, S𝒫i in this case.

  3. (c)

    |Y|>τi3. By Corollary 37, (s,e) contains a net occurrence of 𝒯i3𝒯i3¯=𝒯i2 as a proper sub-occurrence. By Observation 8 and Lemma 39, (s,e) is not a net occurrence.

We next consider when uD.

  1. (a)

    |Y|<τi4. Recall that 𝒯i4𝒯i3¯=𝒯i4𝒯i4¯𝒯i4=𝒯i3𝒯i4. By Corollary 37, note that Y is always a prefix of 𝒯i4 in this case. This means the right extension character of S is always 𝒯i4[|Y|+1]. Thus, (s,e) is not a net occurrence.

  2. (b)

    |Y|=τi4. Using Corollary 37, S𝒫i in this case.

  3. (c)

    |Y|>τi4. By Corollary 37, in this case (s,e) contains a net occurrence of 𝒯i3𝒯i4=𝒯i4𝒯i4¯𝒯i4=𝒯i4𝒯i3¯ as a proper sub-occurrence. Thus, by Observation 8 and Lemma 39, (s,e) is not a net occurrence.

We can prove the case when (s,e) contains an occurrence of 𝒯i3¯ similarly.

Finally, the main result follows from Lemma 12, Lemma 39 and Lemma 40.

Theorem 41.

The net occurrences in Lemma 39 are the only net occurrences in each 𝒯i.

8 Conclusion and Future Work

In this work, we investigate net occurrences in Fibonacci and Thue-Morse words, making two main contributions. First, we confirm the conjecture that each Fibonacci word contains exactly three net occurrences. Second, we establish that each Thue-Morse word contains exactly nine net occurrences. To achieve these results, we first introduce the notion of an overlapping net occurrence cover and show how it can be used to prove that certain net occurrences in a text are the only ones. We then develop recurrence relations that precisely characterize the occurrences of Fibonacci and Thue-Morse words of smaller order, which could be of independent interest. As an application, we illustrate how these results facilitate the counting of small-order occurrences.

An avenue of future work is to extend our findings to study the net occurrences in k-bonacci words [17, 25, 16, 15] and Thue-Morse-like words [1, 9]. Furthermore, since both Fibonacci and Thue-Morse words can be defined via morphisms, one could also explore net occurrences in other morphic words [14, 21, 8]. Finally, the net occurrences have been characterized in terms of minimal unique substrings [31]; this viewpoint may offer alternative and potentially simpler proofs than those presented in Sections 67.

References

  • [1] Ibai Aedo, Uwe Grimm, Yasushi Nagai, and Petra Staynova. Monochromatic arithmetic progressions in binary Thue-Morse-like words. Theoretical Computer Science, 934:65–80, 2022. doi:10.1016/J.TCS.2022.08.013.
  • [2] Jean-Paul Allouche and Jeffrey O. Shallit. The ubiquitous Prouhet-Thue-Morse sequence. In Cunsheng Ding, Tor Helleseth, and Harald Niederreiter, editors, Sequences and their Applications - Proceedings of SETA 1998, Singapore, December 14-17, 1998, Discrete Mathematics and Theoretical Computer Science, pages 1–16. Springer, 1998. doi:10.1007/978-1-4471-0551-0_1.
  • [3] Hideo Bannai, Mitsuru Funakoshi, Tomohiro I, Dominik Köppl, Takuya Mieno, and Takaaki Nishimoto. A separation of γ and b via Thue-Morse words. In Thierry Lecroq and Hélène Touzet, editors, String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4-6, 2021, Proceedings, volume 12944 of Lecture Notes in Computer Science, pages 167–178. Springer, 2021. doi:10.1007/978-3-030-86692-1_14.
  • [4] Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya Tsuruta. The “runs” theorem. SIAM J. Comput., 46(5):1501–1514, 2017. doi:10.1137/15M1011032.
  • [5] Hideo Bannai, Tomohiro I, and Yuto Nakashima. On the compressiveness of the Burrows-Wheeler transform. In Paola Bonizzoni and Veli Mäkinen, editors, 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025, June 17-19, 2025, Milano, Italy, LIPIcs. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.48550/arXiv.2411.11298.
  • [6] Jean Berstel and Dominique Perrin. The origins of combinatorics on words. European Journal of Combinatorics, 28(3):996–1022, 2007. doi:10.1016/J.EJC.2005.07.019.
  • [7] Srecko Brlek. Enumeration of factors in the Thue-Morse word. Discrete Applied Mathematics, 24(1-3):83–96, 1989. doi:10.1016/0166-218X(92)90274-E.
  • [8] Srecko Brlek, Andrea Frosini, Ilaria Mancini, Elisa Pergola, and Simone Rinaldi. Burrows-Wheeler transform of words defined by morphisms. In Charles J. Colbourn, Roberto Grossi, and Nadia Pisanti, editors, Combinatorial Algorithms - 30th International Workshop, IWOCA 2019, Pisa, Italy, July 23-25, 2019, Proceedings, volume 11638 of Lecture Notes in Computer Science, pages 393–404. Springer, 2019. doi:10.1007/978-3-030-25005-8_32.
  • [9] Jin Chen, Zhi-Xiong Wen, and Wen Wu. On the additive complexity of a Thue-Morse-like sequence. Discrete Applied Mathematics, 260:98–108, 2019. doi:10.1016/J.DAM.2019.01.008.
  • [10] Maxime Crochemore, Lucian Ilie, and Wojciech Rytter. Repetitions in strings: Algorithms and combinatorics. Theoretical Computer Science, 410(50):5227–5235, 2009. doi:10.1016/J.TCS.2009.08.024.
  • [11] Francesco Dolce. String attractors for factors of the Thue-Morse word. In Anna E. Frid and Robert Mercas, editors, Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13899 of Lecture Notes in Computer Science, pages 117–129. Springer, 2023. doi:10.1007/978-3-031-33180-0_9.
  • [12] Xavier Droubay. Palindromes in the Fibonacci word. Information Processing Letters, 55(4):217–221, 1995. doi:10.1016/0020-0190(95)00080-V.
  • [13] Aviezri S. Fraenkel and Jamie Simpson. The exact number of squares in Fibonacci words. Theoretical Computer Science, 218(1):95–106, 1999. doi:10.1016/S0304-3975(98)00252-7.
  • [14] Andrea Frosini, Ilaria Mancini, Simone Rinaldi, Giuseppe Romana, and Marinella Sciortino. Logarithmic equal-letter runs for BWT of purely morphic words. In Volker Diekert and Mikhail V. Volkov, editors, Developments in Language Theory - 26th International Conference, DLT 2022, Tampa, FL, USA, May 9-13, 2022, Proceedings, volume 13257 of Lecture Notes in Computer Science, pages 139–151. Springer, 2022. doi:10.1007/978-3-031-05578-2_11.
  • [15] Narges Ghareghani, Morteza Mohammad Noori, and Pouyeh Sharifani. Some properties of the k-bonacci words on infinite alphabet. The Electronic Journal of Combinatorics, 27(3):3, 2020. doi:10.37236/9406.
  • [16] Narges Ghareghani and Pouyeh Sharifani. On square factors and critical factors of k-bonacci words on infinite alphabet. Theoretical Computer Science, 865:34–43, 2021. doi:10.1016/j.tcs.2021.02.027.
  • [17] France Gheeraert, Giuseppe Romana, and Manon Stipulanti. String attractors of fixed points of k-bonacci-like morphisms. In Anna E. Frid and Robert Mercas, editors, Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13899 of Lecture Notes in Computer Science, pages 192–205. Springer, 2023. doi:10.1007/978-3-031-33180-0_15.
  • [18] Peaker Guo, Patrick Eades, Anthony Wirth, and Justin Zobel. Exploiting new properties of string net frequency for efficient computation. In Shunsuke Inenaga and Simon J. Puglisi, editors, 35th Annual Symposium on Combinatorial Pattern Matching, CPM 2024, June 25-27, 2024, Fukuoka, Japan, volume 296 of LIPIcs, pages 16:1–16:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2024. doi:10.4230/LIPICS.CPM.2024.16.
  • [19] Peaker Guo, Seeun William Umboh, Anthony Wirth, and Justin Zobel. Online computation of string net frequency. In Zsuzsanna Lipták and Edleno Moura, editors, String Processing and Information Retrieval - 31th International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings, Lecture Notes in Computer Science. Springer, 2024. doi:10.1007/978-3-031-72200-4_12.
  • [20] Dan Gusfield and Jens Stoye. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences, 69(4):525–546, 2004. doi:10.1016/J.JCSS.2004.03.004.
  • [21] Vesa Halava, Tero Harju, Tomi Kärki, and Michel Rigo. On the periodicity of morphic words. In Yuan Gao, Hanlin Lu, Shinnosuke Seki, and Sheng Yu, editors, Developments in Language Theory, 14th International Conference, DLT 2010, London, ON, Canada, August 17-20, 2010. Proceedings, volume 6224 of Lecture Notes in Computer Science, pages 209–217. Springer, 2010. doi:10.1007/978-3-642-14455-4_20.
  • [22] Costas S. Iliopoulos, Dennis W. G. Moore, and William F. Smyth. A characterization of the squares in a Fibonacci string. Theoretical Computer Science, 172(1-2):281–291, 1997. doi:10.1016/S0304-3975(96)00141-7.
  • [23] Shunsuke Inenaga. Faster and simpler online computation of string net frequency. CoRR, 2024. doi:10.48550/arXiv.2410.06837.
  • [24] Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Factorizing strings into repetitions. Theory of Computing Systems, 66(2):484–501, 2022. doi:10.1007/S00224-022-10070-3.
  • [25] Marieh Jahannia, Morteza Mohammad Noori, Narad Rampersad, and Manon Stipulanti. Closed Ziv-Lempel factorization of the m-bonacci words. Theoretical Computer Science, 918:32–47, 2022. doi:10.1016/j.tcs.2022.03.019.
  • [26] Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323–350, 1977. doi:10.1137/0206024.
  • [27] Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. On repetitiveness measures of Thue-Morse words. In Christina Boucher and Sharma V. Thankachan, editors, String Processing and Information Retrieval - 27th International Symposium, SPIRE 2020, Orlando, FL, USA, October 13-15, 2020, Proceedings, volume 12303 of Lecture Notes in Computer Science, pages 213–220. Springer, 2020. doi:10.1007/978-3-030-59212-7_15.
  • [28] Yih-Jeng Lin and Ming-Shing Yu. Extracting Chinese frequent strings without dictionary from a Chinese corpus and its applications. Journal of Information Science and Engineering, 17(5):805–824, 2001. URL: https://jise.iis.sinica.edu.tw/JISESearch/pages/View/PaperView.jsf?keyId=86_1308.
  • [29] Yih-Jeng Lin and Ming-Shing Yu. The properties and further applications of Chinese frequent strings. International Journal of Computational Linguistics and Chinese Language Processing, 9(1), 2004. URL: http://www.aclclp.org.tw/clclp/v9n1/v9n1a7.pdf.
  • [30] M. Lothaire. Combinatorics on words, Second Edition. Cambridge mathematical library. Cambridge University Press, 1997.
  • [31] Takuya Mieno and Shunsuke Inenaga. Space-efficient online computation of string net occurrences. In Paola Bonizzoni and Veli Mäkinen, editors, 36th Annual Symposium on Combinatorial Pattern Matching, CPM 2025, June 17-19, 2025, Milano, Italy, LIPIcs. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2025. doi:10.4230/LIPIcs.CPM.2025.23.
  • [32] Gonzalo Navarro, Carlos Ochoa, and Nicola Prezza. On the approximation ratio of ordered parsings. IEEE Transactions on Information Theory, 67(2):1008–1026, 2021. doi:10.1109/TIT.2020.3042746.
  • [33] Enno Ohlebusch, Thomas Büchler, and Jannik Olbrich. Faster computation of Chinese frequent strings and their net frequencies. In Zsuzsanna Lipták and Edleno Moura, editors, String Processing and Information Retrieval - 31th International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings, Lecture Notes in Computer Science. Springer, 2024. doi:10.1007/978-3-031-72200-4_19.
  • [34] Jakub Radoszewski and Wojciech Rytter. On the structure of compacted subword graphs of Thue-Morse words and their applications. Journal of Discrete Algorithms, 11:15–24, 2012. doi:10.1016/J.JDA.2011.01.001.
  • [35] Wojciech Rytter. The structure of subword graphs and suffix trees of Fibonacci words. Theoretical Computer Science, 363(2):211–223, 2006. doi:10.1016/j.tcs.2006.07.025.

Appendix A Proofs Omitted from Section 3

See 8

Proof.

Let (s,e) be the net occurrence, then T[s1e] and T[se+1] are both unique. Since T[se] contains at least one of these two strings as a substring, T[se] is also unique. Thus, (s,e) is not a net occurrence.

See 9

Proof.

Let (s,e) be the net occurrence, then T[se] is repeated. Since (s,e) is a proper sub-occurrence of (s,e), T[s1e] or T[se+1] is also repeated. Thus, (s,e) is not a net occurrence.

See 12

Proof.

Let (s,e) be a net occurrence in T that is outside of 𝒞. Assume, by contradiction, that (s,e) is not a super-occurrence of any occurrence (i1,j+1), where (i,j) is an occurrence in the set of BNSOs, {(i2,j1),(i3,j2),(ic,jc1)}. We consider the following two cases depending on the position of s.

First, when ik+1s<ik+2 for some 0kc2. Given our assumption, (s,e) is not a super-occurrence of (ik+21,jk+1+1), where (ik+2,jk+1) is a BNSO. It follows that e satisfies ejk+1, implying that (s,e) must be a sub-occurrence of (ik+1,jk+1), which is a net occurrence in 𝒞. Now, if (s,e) is a proper sub-occurrence of (ik+1,jk+1), this this contradicts Observation 9; if (s,e) is (ik+1,jk+1), it contradicts the assumption that (s,e) is a net occurrence in T outside of 𝒞. Second, when sic. In this case, (s,e) is a sub-occurrence of (ic,jc), the last net occurrence in 𝒞.

In both cases, (s,e) must be a sub-occurrence of some net occurrence in 𝒞, leading to a contradiction of either the assumption or Observation 9. Therefore, our initial assumption was false and we conclude that (s,e) is indeed a super-occurrence of (i1,j+1), where (i,j) is a BNSO of 𝒞.

Appendix B Proofs Omitted from Section 6

See 25

Proof.

By Lemma 15, there are only three positions where Fi2Qi could occur. First observe that Fi2Qi cannot occur at position fi1+1. Next, by Observation 14 and Lemma 23, the occurrences of Fi2 at positions 1 and fi2+1 are both followed by Qi, thus Fi2Qi only occurs at these two positions.

See 27

Proof.

Note that |Qi|=j=2i5|Fj|=(j=1i5fj)f1=fi31f1=fi32 where the third equality comes from the fact that j=1kfj=fk+21.

See 28

Proof.

By Lemma 25 and Lemma 24, Fi2Qi only occurs twice in Fi, and both are net occurrences, which means the extensions are unique. Thus, any string containing Fi2Qi as a substring is also unique.

See 29

Proof.

Observe that the occurrence of Fi3 at position fi3+1 is followed by Fi6Fi5=Qi1Δ(1(imod2)) while the other three occurrences of Fi3 are all followed by Fi4=Fi5Fi6=Qi1Δ(imod2).

See 30

Proof.

From the proof of Lemma 29, Fi3 is only followed by Fi6Fi5 once and by Fi4 three times, thus Fi3Fi6Fi5 is unique. Next, by Lemma 27, |Fi3Qi1|=fi3+fi42=fi22. Also from the proof of Lemma 29, Fi3Qi1 is only followed by Δ(1(imod2)) once and by Δ(imod2) three times, thus, the length-(fi21) prefix of Fi3Qi1Δ(1(imod2))=Fi3Fi6Fi5 is also unique.

See 31

Proof.

Let U be the length-(fi31) prefix of Fi3. By Equation 1, Fi3=QiΔ(1(imod2). Note that U[|U|1] is always a because Qi ends with F2=a. When U[|U|]=a, the right extension character of U is always b. This is because, if it were not, aaa would occur in Fi, contradicting Lemma 3. On the other hand, when U[|U|]=b, the right extension character of U is always a. This is because, similarly, an occurrence of bb would contradict Lemma 3. Finally, observe that, whether U[|U|] is a or b, U[|U|] concatenated with the right extension character of U is exactly Δ(1(imod2). Therefore, the desired result follows.

Appendix C Proof Omitted from Section 7

See 36

Proof.

The first two follow from Equation 10 and Equation 11. We next proceed by repeatedly applying the definition of Thue-Morse words. Substituting 𝒯i2=𝒯i3𝒯i3¯=𝒯i3𝒯i4¯𝒯i4 and 𝒯i3𝒯i2=𝒯i4𝒯i4¯𝒯i3𝒯i3¯=𝒯i4𝒯i4¯𝒯i4𝒯i4¯𝒯i3¯=𝒯i4𝒯i3¯𝒯i4¯𝒯i3¯ to Equation 10, we have Equation 8. Finally, observe that 𝒯i4𝒯i3¯=𝒯i4𝒯i4¯𝒯i4=𝒯i3𝒯i4 and similarly, 𝒯i3¯𝒯i4¯=𝒯i4¯𝒯i4𝒯i4¯=𝒯i4¯𝒯i3. Substituting them to Equation 8, we derive Equation 9.

Appendix D A Factorization of Thue-Morse Word

First, we define a smallest factorization of a string as one that contains the fewest number of factors while satisfying certain conditions. In this section, we explore two smallest factorizations of 𝒯i, each containing all occurrences of 𝒯ij and 𝒯ij¯, respectively. Observe that such factorizations exist due to the overlap-free property of each Thue-Morse word (Lemma 5).

Definition 42.

For each i2 and 0ji1, we define the following.

  • Let i,jA and i,jB denote the smallest factorization of 𝒯i that contains all occurrences of 𝒯ij and 𝒯ij¯ in 𝒯i, respectively.

  • Define (Tij):=Ti(j+1) and (Tij¯):=Ti(j+1)¯.

  • Consider i,j{i,jA,i,jB} and suppose i,j=(xt)t=1m. We define two operators on i,j:

    (i,j):=((xt))t=1mandi,j¯:=(xt¯)t=1m.
  • Consider two factorizations 𝒳=(xk)k=1m and 𝒴=(yk)k=1. If xm=y1=𝒯ij¯, then

    𝒳𝒴:=(x1,x2,,xm1,𝒯i(j+1)¯,𝒯ij,𝒯i(j+1),y2,y3,,y).

For the definition of operator , note that |𝒳𝒴|=|𝒳|+|𝒴|+1 and

𝒯ij¯𝒯ij¯=𝒯i(j+1)¯𝒯i(j+1)𝒯i(j+1)¯𝒯i(j+1)=𝒯i(j+1)¯𝒯ij𝒯i(j+1).

We next introduce a simple characteristic of i,jA and i,jB.

Observation 43.

For each i2 and 0ji1, consider a factorization 𝒳=(xk)k=1m of 𝒯i that contains all occurrences of 𝒯ij (respectively, 𝒯ij¯). If no two consecutive factors are both different from 𝒯ij (respectively, 𝒯ij¯), then 𝒳 is the smallest and thus 𝒳=i,jA (respectively, 𝒳=i,jB).

The observation holds because, otherwise, we could merge the two consecutive factors and obtain a smaller factorization.

Now, we present the main result of the section, illustrated in Figure 8.

Figure 8: An illustration of Theorem 44 for 1j4. Operators () and ()¯ are defined in Definition 42. Notice the green occurrences of 𝒯ij are introduced from .
Theorem 44.

For i2 and 0ji1, the following statements hold.

  1. (1)

    For each i,j{i,jA,i,jB}, let i,j=(xt)t=1m. Then each term xt is in the set i,j:={𝒯ij,𝒯ij¯,𝒯i(j+1),𝒯i(j+1)¯}. Moreover, x1=𝒯ij. If j is even, then xm=𝒯ij; otherwise, xm=𝒯ij¯.

  2. (2)

    i,0A=(𝒯i), i,1A=(𝒯i1,𝒯i1¯), and for each j2,

    i,jA={(i,j1A)(i,j1B)¯,j is odd,(i,j1A)(i,j1B)¯,j is even.
  3. (3)

    i,0B=(), i,1B=(𝒯i1,𝒯i1¯), and for each j2, i,jB=(i,j1B)(i,j1A)¯.

Proof.

We proceed by induction on j.

Base cases.

The claim holds trivially for j=0. When j=1, (𝒯i1,𝒯i1¯) is the smallest factorization following Observation 43 and Statement 1 holds. Next, note that (i,1A)=(i,1B)=(𝒯i2,𝒯i2¯). When j=2, it follows that

i,2A=(i,1A)(i,1B)¯=(𝒯i2,𝒯i2¯)(𝒯i2¯,𝒯i2)=(𝒯i2,𝒯i3¯,𝒯i2,𝒯i3,𝒯i2) (10)

and

i,2B=(i,1B)(i,1A)¯=(𝒯i2,𝒯i2¯)(𝒯i2¯,𝒯i2)=(𝒯i2,𝒯i2¯,𝒯i2¯,𝒯i2) (11)

Both factorizations are the smallest following Observation 43, and Statement 1 holds.

Inductive step.

Let k be an odd integer such that 3ki1, and assume the claim holds for j=k1. Specifically, assume i,k1A=(xt)t=1m, i,k1B=(yt)t=1l, x1=y1=𝒯i(k1), and xm=yl=𝒯i(k1). We now prove the result for j=k.

First, note that both i,k1A and i,k1B are factorizations of 𝒯i. Then, by the definition of operation (), both (i,k1A) and (i,k1B) are factorizations of 𝒯i1, and (i,k1B)¯ is a factorization of 𝒯i1¯. Now, since 𝒯i=𝒯i1𝒯i1¯, it follows that 𝒴oddA:=(i,k1A)(i,k1B)¯ is a factorization of 𝒯i. It remains to show that 𝒴oddA=i,jA. Observe that

𝒴oddA=(i,k1A)(i,k1B)¯=(x1)(xm1)𝒯ik𝒯ik¯(y2)¯(yl)¯. (12)

By the induction hypothesis on i,k1A and i,k1B and the definition of operation (), we know that (i,k1A) contains all the occurrences of 𝒯ik in 𝒯i1, and (i,k1B)¯ contains all the occurrences of 𝒯ik in 𝒯i1¯. Since 𝒯i=𝒯i1𝒯i1¯ and 𝒯i is overlap-free, it follows that 𝒴oddA contains all the occurrences of 𝒯ik. Moreover, no two consecutive factors of 𝒴oddA are both different from 𝒯ik, so by Observation 43, we conclude that 𝒴oddA=i,jA.

Next we show that all factors of i,jA are elements of i,k. Since xti,k1 for each 1tm and yti,k1 for each 1tl, it follows from Equation 12 that all factors of i,jA are elements of i,k. Additionally, the first factor in i,jA is (x1)=𝒯ik, and the last factor in i,jA is (yl)¯=𝒯ik¯ since k1 is even.

Similarly, we can show that 𝒴oddB:=(i,k1B)(i,k1A)¯ is a factorization of 𝒯i, that 𝒴oddB=i,jB, and that Statement 1 holds.

We can prove analogously when k is even. In this case, the operation is used to ensure that i,jA contains all occurrences of 𝒯ij. Specifically, when k is even, we have (xm)=(y1)¯=𝒯ik¯, and there is a occurrence of 𝒯ik within 𝒯ik¯𝒯ik¯=𝒯i(k+1)¯𝒯ik𝒯i(k+1).