Abstract 1 Introduction 2 Preliminaries 3 New combinatorial properties of injective morphisms 4 Sensitivity of the Measure 𝒓 to the Application of Morphisms 5 Characterization of Binary BWT-Run Preserving Morphisms 6 Morphisms with bounded multiplicative sensitivity 7 Conclusions and future work References

Morphisms and BWT-Run Sensitivity

Gabriele Fici ORCID Department of Mathematics and Computer Science, University of Palermo, Italy Giuseppe Romana ORCID Department of Mathematics and Computer Science, University of Palermo, Italy Marinella Sciortino ORCID Department of Mathematics and Computer Science, University of Palermo, Italy Cristian Urbina ORCID Department of Computer Science, University of Chile, Santiago, Chile
Centre for Biotechnology and Bioengineering (CeBiB), Santiago, Chile
Abstract

We study how the application of morphisms affects the number r of equal-letter runs in the Burrows–Wheeler Transform (BWT). This parameter has emerged as a key repetitiveness measure in compressed indexing. We focus on the notion of BWT-run sensitivity after application of morphisms. For binary alphabets, we characterize the class of injective morphisms that preserve the number of BWT-runs up to a bounded additive increase by showing that it coincides with the known class of primitivity-preserving morphisms, which are those that map primitive words to primitive words. We further prove that deciding whether a given binary morphism has bounded BWT-run sensitivity is possible in polynomial time with respect to the total length of the images of the two letters. Additionally, we explore new structural and combinatorial properties of synchronizing and recognizable morphisms. These results establish new connections between BWT-based compressibility, code theory, and symbolic dynamics.

Keywords and phrases:
Burrows–Wheeler transform, BWT-runs, morphism, pure code, repetitiveness
Funding:
Gabriele Fici: Supported by MUR project PRIN 2022 APML – 20229BCXNW, funded by the European Union – Mission 4 “Education and Research” C2 – Investment 1.1.
Giuseppe Romana: Supported by the project “ACoMPA – Algorithmic and Combinatorial Methods for Pangenome Analysis” (CUP B73C24001050001) funded by the NextGeneration EU programme PNRR MUR M4 C2 Inv. 1.5 – Project ECS00000017 Tuscany Health Ecosystem (Spoke 6), CUP Master B63C22000680007 and by the INdAM – GNCS Project CUP_E53C24001950001.
Marinella Sciortino: Supported by the MUR PRIN Project “PINC, Pangenome INformatiCs: from Theory to Applications” (Grant No. 2022YRB97K), funded by Next Generation EU PNRR M4 C2, Inv. 1.1 and by the INdAM – GNCS Project CUP_E53C24001950001.
Cristian Urbina: Basal Funds FB0001 and AFB240001, ANID, Chile; FONDECYT Project 1-230755, ANID, Chile; ANID-Subdirección de Capital Humano/Doctorado Nacional/2021-21210580, ANID, Chile; NIC Chile Doctoral Scholarship, NIC, Chile.
Copyright and License:
[Uncaptioned image] © Gabriele Fici, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithms
; Theory of computation Formal languages and automata theory ; Mathematics of computing Combinatorics on words
Editors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał Skrzypczak

1 Introduction

Morphisms are a powerful combinatorial mechanism for generating a collection of repetitive texts, and have been largely used in the field of combinatorics on words and formal languages [22, 31]. Formally, a morphism maps each character of an alphabet to a word over the same or another alphabet, by preserving the operation of concatenation. That is, if μ is a morphism and u and v are words, then μ(uv)=μ(u)μ(v). Iterating morphisms can produce long and often highly repetitive sequences, which makes them a natural model for studying repetitiveness in words. Morphisms find applications in a wide range of contexts. Injective morphisms are widely used in information theory, data compression, and cryptography, as they define uniquely decodable codes [4]. More recently, morphisms have been employed in combination with copy-paste mechanisms to define novel compression schemes, known as NU-systems [28], further highlighting their versatility in modeling and processing repetitive data.

The Burrows–Wheeler Transform (BWT) is a reversible transformation introduced in 1994 in the field of data compression [6] and now underpins some of the most used tools in bioinformatics, such as bwa [21, 34] and bowtie [20, 19]. It permutes the characters of a text in a way that makes it more compressible, by clustering characters that precede similar contexts in the text. This property often results in long runs of identical characters, particularly in repetitive texts. The number r of such equal-letter runs, known as BWT-runs, has recently emerged as a measure of repetitiveness [27]. Several measures have been proposed to quantify repetitiveness in strings [26], such as the number z of phrases in the Lempel–Ziv parsing, the size g of the smallest context-free grammar generating the text, the size γ of the smallest string attractor [17, 8]. Among these, the measure r has recently attracted considerable attention due to its close connection with compressed indexing structures, such as the r-index [13], which use space proportional to r and support efficient pattern matching and retrieval in highly repetitive text collections, including genomic datasets and versioned document archives. Akagi et al. [1] explored the question of how much one character edit affects compression-based repetitiveness measures. In [14], the effect of single edit operations on the measure r has also been analyzed.

In this paper, we study how the application of a binary morphism affects the measure r, i.e., the number of BWT-runs. We focus on two notions of BWT-run sensitivity, which capture how much the number of BWT-runs can change when a morphism μ is applied to a word. The additive sensitivity function ASμ gives, for every n>0, the maximum increase in the number of BWT-runs that can occur when applying the morphism μ to any word of length n, while the multiplicative sensitivity function MSμ gives, for every n>0, the maximum ratio between the number of BWT-runs after and before the morphism μ is applied, over all words of length n. These notions allow us to quantify the impact of a morphism on the compressibility of the resulting text. An initial approach to the study of how morphisms affect the number of BWT-runs was given in [12], where we showed that Sturmian morphisms are the only binary injective morphisms that preserve the number of BWT-runs. Here, we tackle the problem of characterizing those binary morphisms that preserve the BWT-based compressibility of a text, in the sense that they have an additive sensitivity function bounded by a constant. We first prove that all non-injective binary morphisms have a bounded additive sensitivity function. In the injective case, which is the most interesting one, we prove that this class coincides with the known class of primitivity-preserving morphisms, which are those that map primitive words to primitive words. As a direct consequence, for these morphisms the multiplicative sensitivity function is also bounded. Primitivity-preserving morphisms are a well-studied class in algebraic theory of codes, and they are crucial in applications involving symbolic sequences, code synchronization, and the structural analysis of words [32, 30, 11, 25, 22, 4, 15].

In addition to establishing a novel connection between BWT-based compressed indexing, combinatorics on words, and code theory, a key contribution of our paper consists in identifying new combinatorial and structural properties of primitivity-preserving, recognizable, and synchronizing morphisms. These properties are central to our main results but also hold independent interest in information theory and symbolic dynamics, where such morphisms play a fundamental role in coding, synchronization, and symbolic representations of dynamical systems [3, 4]. In fact, recognizability ensures that the morphic image of a word can be uniquely decomposed, up to rotations, into a sequence of morphic images of the letters of the alphabet. Synchronizing morphisms guarantee that a window of bounded length is sufficient to detect boundaries between codewords, a property that is crucial for decoding and synchronization in data streams.

We further show that all binary morphisms have bounded multiplicative sensitivity, but this result does not extend to alphabets with more than two symbols.

Furthermore, we prove that it is decidable in polynomial time whether the additive sensitivity function of a binary morphism is bounded by a constant, which makes our results practically applicable to the design of compression and indexing techniques that work directly on morphic encodings of highly repetitive text collections. Such a result builds upon fundamental results in the field of combinatorics on words, including properties of codes and solutions to word equations.

The rest of the paper is organized as follows. In Section 2, we present the preliminaries on words, morphisms, and the BWT. Section 3 introduces new combinatorial and structural properties of primitivity-preserving, recognizable, and synchronizing morphisms. Section 4 formalizes the sensitivity of r with respect to the application of morphisms and motivates our measures. Section 5 contains our main theorem characterizing the morphisms with a bounded additive sensitivity function. Section 6 discusses the multiplicative case, and Section 7 concludes with final remarks and open problems.

2 Preliminaries

Basics

Let Σ={a1,a2,,aσ} be a finite sorted set of letters a1<a2<<aσ, which we call an alphabet. A finite word w=w[1]w[2]w[n] is any finite sequence of letters where w[i]Σ, for i[1,n], and n=|w| is the length of the word. The empty word, denoted by ε, is the unique word of length 0. The set of all finite words (resp. all non-empty words) over the alphabet Σ is denoted by Σ (resp. Σ+). For a letter aiΣ, |w|ai denotes the number of occurrences of ai in w. The vector (|w|a1,,|w|aσ) is called the Parikh vector of w.

If u=u[1]u[n] and v=v[1]v[m] are words, the concatenation of u and v is the word uv=u[1]u[n]v[1]v[m]. We let Πi=1kwi denote the concatenation of the words w1,w2,,wk in that order, and wk the concatenation of the word w with itself k times.

For any 1ij|w|, we use the notation w[i,j] to denote the word w[i]w[i+1]w[j], which we call a factor of w. If i>j, then we assume w[i,j]=ε. We let (w) denote the set of all factors of w. For any Σ, we write ()=w(w). A factor of w is proper if it is different from w itself. The factor w[i,j] is called a prefix when i=1, and a suffix when j=n. The longest common prefix between two words u and v is the longest word that is a prefix of both words. The length of this word is denoted by lcp(u,v). The longest common suffix and the associated function lcs are defined symmetrically.

The run-length encoding of a word w, denoted rle(w), is the sequence of pairs (ci,li) with ciΣ and li>0, such that w=c1l1c2l2crlr and cici+1 for every i[1,r1]. The length |rle(w)| is the number of equal-letter runs in w.

A rotation, or conjugate, of the word w=w[1]w[2]w[n] is a word of the form w[i+1,n]w[1,i], for some 0i<n, obtained by shifting i letters cyclically. We let (w) denote the multiset of all the |w| rotations of w. A word in ~(w):=((w)) is called a circular factor of w.

A word w is primitive if for every word uΣ+, w=uk implies k=1; otherwise, w is called non-primitive (or a power). A word of length n is primitive if and only if it has exactly n distinct rotations, i.e., if (w) has all-distinct elements. We let Q(Σ) denote the set of all primitive words in Σ, and Q(Σ)¯ the set of all non-primitive words in Σ. We say two non-empty words u,v commute if uv=vu. This is equivalent to saying that both uv and vu are not primitive.

Codes and morphisms

A set XΣ+ is a code if for all m,0 and ui,vjX with i[1,],j[1,m], the equation u1u2u=v1v2vm implies that =m and uk=vk, for all k[1,]. Or equivalently, every word wX+ has a unique factorization in words in X. Given a word wX+, a word u is an X-factor of w if there exists a rotation w of w (which can be w itself) that can be factored as w=sup such that u,psX.

Whenever a code X consists of two words, the following property holds [16, 32].

Lemma 1.

A set X={u,v}, u,vΣ+, is a code if and only if u and v do not commute, i.e., uvvu.

If u and v do not commute, then they are not powers of the same word, but in principle this does not exclude the case that either u, v, or both are non-primitive. For example, X={aa,bb} is a code.

Let Σ and Γ be two alphabets. A morphism μ is a map from Σ to Γ such that μ(uv)=μ(u)μ(v) for all words u,vΣ. Therefore, a morphism μ can be defined by specifying its action on the letters of Σ, and can therefore be denoted as μ=(μ(a1),,μ(aσ)). The size of the morphism μ is defined as |μ|=cΣ|μ(c)|. When Σ=Γ, for all t>0 and wΣ+, we have μt(w)=μ(μt1(w)) and μ0(w)=w.

 Remark 2.

Let μ be a morphism. If w and w are conjugates, then so are μ(w) and μ(w). Moreover, since every conjugate of a power is a power, if μ(w) is a power, so is μ(w) for every conjugate w of w.

A morphism μ is cyclic if there exists zΓ+ such that μ(a)z for each aΣ. Otherwise, it is called acyclic.

As shown in the following proposition, there is a very strong relation between codes and injective morphisms.

Proposition 3 ([4]).

Let XΓ be a code. Then, any morphism μ:ΣΓ which induces a bijection of some alphabet Σ onto X is injective. Conversely, let μ:ΣΓ be an injective morphism. Then, X=μ(Σ) is a code.

 Remark 4.

By Lemma 1 and Proposition 3, one has that for a binary morphism μ:{a,b}Γ, injectivity is equivalent to acyclicity, which in turn is equivalent to the condition μ(ab)μ(ba). For this reason, injective morphisms are often considered in applications, such as theory of codes, information theory, etc.

Examples of injective morphisms are the Fibonacci morphism φ=(ab,a), the Thue–Morse morphism τ=(ab,ba), and the period-doubling morphism π=(ab,aa).

From the relationship between codes and morphisms, many properties of codes are reflected in the corresponding properties of injective morphisms. Combinatorial properties of injective morphisms are explored in Section 3.

The Fibonacci morphism belongs to a wider class of morphisms called Sturmian morphisms, strictly related to the well-known Sturmian words [5]. Sturmian morphisms can be defined as those that can be obtained by composition from: the Fibonacci morphism φ, the morphism E=(b,a), and the morphism φ~=(ba,a).

Let us suppose that both Σ and Γ are endowed with a total order relation that yields a lexicographic order, denoted by <Γ and <Σ, respectively. A morphism μ:ΣΓ is abelian order-preserving if for every pair of distinct words x,yΣ having the same Parikh vector, it holds that x<Σyμ(x)<Γμ(y). A morphism μ is abelian order-reversing if for every pair of distinct words x and y having the same Parikh vector, it holds that x<Σyμ(x)>Γμ(y). We simply write < whenever Σ and Γ are clear from the context.

When Σ={a,b}, the following result holds.

Lemma 5 ([12]).

Let μ:{a,b}Γ be an acyclic morphism. Then μ is either abelian order-preserving or abelian order-reversing.

For our purposes, the fact that binary acyclic morphisms are either abelian order-preserving or abelian order-reversing is a crucial property, since it implies that they preserve or reverse the order on the set of rotations of any given binary word.

Burrows–Wheeler Transform

The Burrows–Wheeler transform (BWT) of a word w, denoted by bwt(w), is a permutation of the letters of w obtained by sorting all the rotations of w in ascending lexicographic order and then concatenating the last letter of each rotation. The original word can be recovered if one stores the position where it appears in the list of sorted rotations. Figure 1 shows the sorted rotations of the word w=φ4(a)=abaababa and bwt(w)=bbbaaaaa.

We let r(w) denote the number of equal-letter runs of bwt(w), i.e.,

r(w)=|rle(bwt(w))|.

Such a value can be considered as a measure of the repetitiveness of w. In fact, if a word w is highly repetitive, the number of equal-letter runs of its BWT tends to be small. From Figure 1, one can see that r(abaababa)=2.

One can verify that for each word v(w), bwt(v)=bwt(w) and, consequently, r(v)=r(w) and r(μ(v))=r(μ(w)) for every morphism μ.

Let w be a non-primitive word, i.e., w=zp, for some zΣ+ and p>1. It is well known that if bwt(z)=a1a2a|z|, then bwt(w)=a1pa2pa|z|p [24]. This implies that r(w)=r(z).

Some results proved in [24, 29, 10] establish a strong connection between the BWT and Sturmian morphisms, as synthesized in the following theorem.

Theorem 6.

Let w be a word over {a,b} that is not a power of a single letter. Then the following are equivalent:

  1. 1.

    w=(μ(a)) for a Sturmian morphism μ and for some >0.

  2. 2.

    r(w)=2.

aabaababaababaababaabaababaababaababaababaabaababaababaababaabaa

Figure 1: BWT-matrix of the word φ4(a)=abaababa: for each i, the ith row corresponds to the ith rotation of φ4(a) in lexicographic order, and the Burrows–Wheeler Transform bwt(φ4(a))=bbbaaaaa=b3a5 is highlighted in bold in the last column. So, r(abaababa)=2.

3 New combinatorial properties of injective morphisms

This section focuses on some combinatorial properties and characterizations of some classes of morphisms which are well-known in the context of coding theory and symbolic dynamics. The results provided in this section may be of independent interest and will later be related to BWT-run sensitivity in the next sections.

3.1 Primitivity-Preserving Morphisms

A morphism μ:ΣΓ is called primitivity-preserving if for every wQ(Σ), it holds that μ(w)Q(Γ), that is, primitive words are mapped to primitive words. Primitivity-preserving morphisms are injective, and the associated codes are known in the literature as pure codes [25]. Such codes have been introduced in [30] to study the relationships between locally testable languages and synchronizing properties of codes.

Given a morphism μ:ΣΓ, we call a primitive word w a μ–power if μ(w)=zk, for some primitive word z and an integer k>1. Intuitively, it is a word that witnesses the non-primitivity-preserving property of a morphism. By Pμ we refer to the set of all μ–power words. From the definition, hence, Pμ= if and only if the morphism μ is primitivity-preserving.

Example 7.

Let π=(ab,aa) be the period-doubling morphism. The word b is a π–power, since π(b)=a2. Hence, bPπ, and π is not primitivity-preserving.

Example 8.

Let μ=(a,bab). The word ab is a μ–power, since μ(ab)=(ab)2. Hence, abPμ, and μ is not primitivity-preserving.

In this section, we prove a new characterization of the decompositions of binary primitivity-preserving morphisms. To do so, we first recall the following lemma, characterizing the combinatorial structure of binary primitivity-preserving morphisms.

Lemma 9 ([16, Theorem 3.1]).

Let μ=(u,v) be an injective morphism, with u,v two distinct primitive words. Then μ is a primitivity-preserving morphism if and only if all words in {unvmn,m1} are primitive.

The following lemma describes what happens when the property of Lemma 9 is not verified. In particular, it considers the combinatorial structure of the non-primitive words generated by the morphism when applied to some primitive word distinct from a single letter. Recall that if unvm=zk, for some primitive word z and k>1, then we can derive that n=1 or m=1 (see [23]).

Lemma 10 ([2, 33]).

Let μ=(u,v) be an injective morphism, and let W={unvn1}{uvnn>1}. Then, there is at most one primitive word z and one integer k>1 such that zkW, i.e. |WQ({a,b})¯|1. Moreover, let Y=μ(Q({a,b})2)Q({a,b})¯. Then Y=(zk){u,v}+.

The next lemma provides a characterization of the structure of the set Pμ for an injective morphism μ.

Lemma 11.

Let μ=(u,v) be an injective morphism, and let W={unvn1}{uvnn>1}. We can distinguish the two cases:

  1. 1.

    |WQ({a,b})¯|=0. Then only one of the following occurs:

    1. (a)

      Pμ=;

    2. (b)

      Pμ={c}, for some c{a,b};

    3. (c)

      Pμ={a,b}.

  2. 2.

    |WQ({a,b})¯|=1. Then there exists a unique w{a,b} such that μ(w)WQ({a,b})¯, and only one of the following occurs:

    1. (a)

      Pμ=(w);

    2. (b)

      Pμ=(w){c}, for some c{a,b}.

Note that, among the cases described in Lemma 11, the Case 1a is the only one in which every primitivity-preserving morphism μ=(u,v) falls. In this case, both u and v are primitive words. If the morphism μ=(u,v) is not primitivity-preserving and WQ({a,b})¯=, then we deduce from Lemma 11 that either only one between u and v is a non-primitive word (Case 1b), or both are non-primitive words (Case 1c).

A classification of the non-primitivity-preserving morphisms μ=(u,v) that fall in Cases 2a and 2b, with WQ({a,b})¯, and their respective μ-power words, can be derived from a result given in  [15, Theorem 8].

The set of primitivity-preserving morphisms is closed under composition, as stated in the following lemma.

Lemma 12.

Let μ1:ΣΓ, μ2:ΓΛ be two morphisms. If μ1 and μ2 are both primitivity-preserving, then μ2μ1 is primitivity-preserving too.

However, it is possible to obtain primitivity-preserving morphisms even as a composition of morphisms that do not necessarily satisfy this property. The following proposition gives a complete characterization.

Proposition 13.

Let μ:{a,b}{a,b} be an injective morphism. The morphism μ is primitivity-preserving if and only if, for all ψ,χ:{a,b}{a,b} such that μ=ψχ, it holds that χ=(p,q) is a primitivity-preserving morphism and ψ is an injective morphism such that Pψ{p,q}+=.

Example 14.

Let μ=(abaa,aaab). One can verify that μ=πτ, where π and τ are the period-doubling morphism and the Thue–Morse morphism, respectively. We have that π is not primitivity-preserving and Pπ={b}. Since τ is a primitivity-preserving morphism and b{ab,ba}+, from Proposition 13 it follows that μ is primitivity-preserving too.

Example 15.

Let ψ=(aba,b), and consider the morphism μ=(abab,baba)=ψτ, where τ is the Thue–Morse morphism, which is primitivity-preserving. In this case, ψ is not primitivity-preserving (since ψ(ab)=(ab)2), nor is μ.

3.2 Recognizable Morphisms

In this subsection, we focus on some structural and combinatorial properties of morphisms that generate words admitting a unique factorization in circular factors, similarly to the notion of circular code [4].

Let Σ. An injective morphism μ:ΣΓ is recognizable on μ() if for every non-empty word wμ() and every word w(w), there exist, and are unique, pΓ+, qΓ, zΣ, and cΣ, such that w=qμ(z)p and pq=μ(c). If =Σ, we simply say that μ is recognizable.

In other words, every image under a recognizable morphism μ has a unique circular factorization in words of μ(). Equivalently, a recognizable morphism on can be regarded as an injective map on the necklaces over , i.e., for all x,y, it holds that (μ(x))=(μ(y)) if and only if (x)=(y).

(a) w=baaabbabbbaa.
(b) w=baabaabaabaa.
Figure 2: On the left, the unique circular factorization of w=baaabbabbbaa into μ1(a)=baa and μ1(b)=abb. On the right, two distinct circular factorizations of w=baabaabaabaa into μ2(a)=baa and μ2(b)=aba, respectively in blue and red.
Example 16.

Let us consider the injective morphism μ1=(baa,abb). Such a morphism is recognizable since every word in μ1({a,b}) has a unique circular factorization into the words μ1(a) and μ1(b), as shown in Figure 2(a).

The recognizability of a morphism is well studied in the context of bi-infinite words and symbolic dynamics [3]. Here, it is adapted to necklaces, or circular words, which can be seen as periodic bi-infinite words. Note that most of the results known in the literature on bi-infinite words focus on the aperiodic case. Therefore, the results provided in this section can also be interpreted as contributions toward the less-explored setting of periodic bi-infinite words.

The following lemma establishes the close relationship between recognizable morphisms on μ(Σ) and a property related to the so-called very pure codes, which are properly included in the class of pure codes [30].

Lemma 17 ([4, Proposition 7.1.1]).

An injective morphism μ:ΣΓ is recognizable if and only if for every u,vΓ, if uv,vuμ(Σ) then u,vμ(Σ).

In the case of binary injective morphisms, the next lemma provides a useful characterization of recognizable morphisms.

Lemma 18.

A primitivity-preserving morphism μ=(u,v) is recognizable if and only if u and v are not conjugates.

Example 19.

Consider the injective morphism μ2=(baa,aba). Since μ2(a) and μ2(b) are conjugates, by Lemma 18, μ2 is not recognizable on μ2({a,b}), as shown in Figure 2(b), where two distinct circular factorizations of the word baabaabaabaa are indicated. Indeed, μ2(aaaa) and μ2(bbbb) are equal up to rotations. Figure 3(a) shows two distinct circular factorizations of (ab)6 into τ(a) and τ(b). Hence, τ is also not recognizable. One can note that μ2=φ~τ, where φ~=(ba,a), confirming the characterization established in the following theorem.

The following result shows an important structural property of the primitivity-preserving morphisms that are not recognizable. In particular, we prove that they can always be obtained by composing another morphism with the Thue–Morse morphism.

Theorem 20.

Let μ:{a,b}Γ be a primitivity-preserving morphism. Then exactly one of the following cases occurs:

  1. 1.

    μ is recognizable;

  2. 2.

    μ=ψτ for some injective morphism ψ, where τ is the Thue–Morse morphism.

Proof.

We show that, under the hypothesis of the theorem, it holds that μ=(uv,vu) if and only if μ=ψτ for some injective morphism ψ, and by Lemma 18, the thesis directly follows. For the first direction, one can define ψ=(u,v), and therefore μ=(ψ(τ(a)),ψ(τ(b)))=(ψ(ab),ψ(ba))=(uv,vu). For the other direction, if μ=ψτ, then μ=(ψ(τ(a)),ψ(τ(b)))=(ψ(ab),ψ(ba))=(ψ(a)ψ(b),ψ(b)ψ(a)), and the thesis follows.

3.3 Synchronizing Morphisms

An important notion in the context of injective morphisms is that of synchronization pair, which intuitively marks a position within a factor of a morphic image where the boundary between two codewords can be uniquely identified. Synchronization provides a way to “align” a segment of the morphic image of a circular word with the images of the letters of the alphabet.

Let μ:ΣΓ, Σ, and u~(μ()). We say that (u1,u2) is a synchronization pair of u on μ() if u=u1u2 and, for all v1,v2Γ and f~(), v1uv2=μ(f) implies v1u1=μ(f1) and u2v2=μ(f2), for some f1,f2~() such that f1f2=f.

Observe that a morphism μ is recognizable on μ()μ(Σ) if and only if every word wμ() admits at least one synchronization pair, since from it one can uniquely recover the preimage w=μ1(w), up to rotations.

The following notion of synchronization with delay gives a quantitative measure of the width of a window sliding along the morphic image of a circular word that guarantees the detection of a synchronization point. This is an adaptation to our setting of a definition commonly used in the context of HD0L-systems [7, 18].

We say that a morphism μ:ΣΓ is synchronizing with delay k>0 for wμ(Σ) if every circular factor of w of length at least k admits a synchronization pair. Given Σ, we say that μ is synchronizing with delay K>0 for μ() if it is synchronizing with a finite delay for every w and

sup{minx{kμ is synchronizing with delay k for μ(x)}}K.

It has been proved [30, Theorem 5.1] that a morphism is recognizable if and only if it is synchronizing with finite delay for μ(Σ). The following example shows that the recognizability of a morphism on a proper subset of μ(Σ) does not necessarily imply being synchronizing with finite delay for that subset.

Example 21.

Let τ be the Thue–Morse morphism. Figure 3(a) shows that the Thue–Morse morphism τ is not recognizable. In fact, τ(aaaaaa) and τ(bbbbbb) are equal up to rotations and produce distinct circular factorizations. However, τ is recognizable on τ(), where ={anb,bnan>0}, as shown in Figure 3(b). Observe that even though τ is not recognizable, τ is recognizable on τ(), since (τ(a),τ(b)) and (τ(b),τ(a)) are synchronization pairs that occur in every word of τ(). In the figure, the unique circular factorization of τ(a5b)=(ab)5ba is depicted; the two black squares identify the synchronization pairs (b,b) and (a,a). Moreover, τ is synchronizing with delay 11 on (ab)5ba; more in general it is synchronizing with delay 2n+1 for (ab)nba and for (ba)nab. However, τ is not synchronizing with finite delay for τ(), since the supremum of all minimum finite delays for all the words in is unbounded.

Let =τ({a,b})={ab,ba}. Unlike the previous cases, the synchronization pairs (a,a) and (b,b) occur in every word of ~(τ()) of length at least 5, hence τ is synchronizing with delay 5 for τ(). In fact, as shown in Figure 3(c), every factor of τ(abbaab)=abbabaababba having length at least 5 contains a black square that identifies a synchronization pair.

(a) w=abababababab.
(b) w=abababababba.
(c) w=abbabaababba.
Figure 3: Circular factorizations into τ(a)=ab and τ(b)=ba are depicted, where τ is the Thue–Morse morphism. On the left, two distinct circular factorizations of (ab)6 in blue and red, respectively; in the center, the unique circular factorizations of w=abababababba; on the right, the unique circular factorizations of w=abbabaababba. Each black square identifies a synchronization pair.

We now give a new combinatorial characterization of synchronizing morphisms with finite delay on μ(), for any {a,b}. This characterization is based on the powers of single letters occurring in in the case of non-recognizable primitivity-preserving morphisms, while it is based on the μ-power words in the case of non-primitivity-preserving morphisms.

Lemma 22.

Let μ:{a,b}Γ be a non-recognizable primitivity-preserving morphism and let ~a=~(){a} and ~b=~(){b}, where {a,b}. Then μ is synchronizing with finite delay on μ() if and only if at least one of the sets ~a or ~b is finite.

By using analogous techniques, one can prove that for any non-primitivity-preserving morphism, there exists a k>0 such that each k-length factor w in ~(μ(Σ)) has a synchronization pair, unless w~(μ(z)) for some zPμ.

Lemma 23.

Let μ:{a,b}Γ be a non-primitivity-preserving morphism. Then, there exists an integer k>0 such that every factor wΓk(~(μ(Σ))~({μ(z)zPμ})) has a synchronization pair.

From the previous lemma, the following result can be derived:

Theorem 24.

Let μ:{a,b}Γ be a non-primitivity-preserving morphism and let {a,b}. Then μ is synchronizing with finite delay on μ() if and only if the set ~(){wwPμ} is finite.

4 Sensitivity of the Measure 𝒓 to the Application of Morphisms

Let μ be a morphism and w a word. In [12] we defined:

Δμ+(w)=r(μ(w))r(w)

and

Δμ×(w)=r(μ(w))r(w).

Notice that Δμ+(w) may be negative for some word w. For example, let μ be the morphism over a 3-letter alphabet {a,b,c} defined as μ=(b,a,c) and let w=bcba. One has that r(w)=|rle(bwt(bcba))|=|rle(bcab)|=4 and r(μ(w))=r(acab)=|rle(bwt(acab))|=|rle(cbaa)|=3. However, when μ is defined over a binary alphabet, one can prove that Δμ+(w) is always non-negative [12, Theorem 14].

Definition 25.

The BWT additive sensitivity function and BWT multiplicative sensitivity function for a morphism μ are, respectively, the functions

ASμ(n)=maxwΣn(Δμ+(w)) and MSμ(n)=maxwΣn(Δμ×(w))

Note that the additive sensitivity function is always a non-negative function, as for every n, Δμ+(an)0 for any letter a.

Example 26.

Let us consider the period-doubling morphism π. Let us compute the value of the BWT additive sensitivity function for π when n=5. From Table 1, it is possible to conclude that ASπ(5)=MSπ(5)=2.

Table 1: The first column contains the list of all words of length 5, up to rotations. This is not restrictive, since rotations of the same word have the same value of r. The columns r(w) and r(π(w)) are used to compute ASπ(5).
w bwt(w) r(w) π(w) bwt(π(w)) r(π(w))
aaaab baaaa 2 ababababaa babbbaaaaa 4
aaabb baaba 4 abababaaaa baaabbaaaa 4
aabab bbaaa 2 ababaaabaa bbaabaaaaa 4
aabbb babba 4 ababaaaaaa baaaaabaaa 4
ababb bbbaa 2 abaaabaaaa babaaaaaaa 4
abbbb bbbba 2 abaaaaaaaa baaaaaaaaa 2
 Remark 27.

Observe that for any pair of morphisms μ1:ΣΓ and μ2:ΓΛ, and any word wΣ, it holds that Δμ1μ2+(w)=Δμ2+(w)+Δμ1+(μ2(w)).

The following lemma shows that cyclic morphisms produce words with a fixed number of BWT-runs, whatever the words on which they are applied.

Lemma 28.

Let μ:{a,b}Γ be a cyclic morphism. Then, there exist two constants kμ+,kμ×, which depend on μ, such that ASμ(n)=kμ+ and MSμ(n)=kμ×, for all n2.

Proof.

Recall that a binary morphism is cyclic if and only if there exist two integers t1,t2>0 and a non-empty word zΓ+ such that μ(a)=zt1 and μ(b)=zt2. Hence, for each word wΣ+ it holds that r(μ(w))=r(z|w|at1+|w|bt2)=r(z). Let us fix the claimed constants kμ=r(z)2 and kμ=r(z)/2. For all n2, let us consider the word sn=an1b. By Theorem 6, it follows that r(sn)=2. The proof follows by observing that since r(μ(w)) is constant, the values of Δμ+ and Δμ× are maximal when r(w) assumes the smallest value, that in the case of binary words is 2, i.e., ASμ(n)=maxwΣn(r(μ(w)r(w))=r(z)r(sn)=kμ+ and MSμ(n)=maxwΣn(r(μ(w)/r(w))=r(z)/r(sn)=kμ×.

Example 29.

Let us consider the cyclic morphism μ=(ababbba,(ababbba)2). It is possible to verify that for every w{a,b}+, one has μ(w)=(ababbba)p, for some integer p>0 depending on w. This means that r(μ(w))=r(ababbba)=6 for every w{a,b}+. For every length n, we can consider the word an1b. We have r(an1b)=2, which is the lowest value that r can take on a binary word. Then, ASμ(n)=62=4 and MSμ(n)=6/2=3, for n2.

The following characterization of Sturmian morphisms in terms of the BWT additive sensitivity function was proved in [12].

Proposition 30 ([12, Theorem 21]).

Let μ be a binary injective morphism. Then ASμ(n)=0 for every n2 if and only if μ is a Sturmian morphism.

In the same paper, we showed that the Thue–Morse morphism τ increases by 2 the BWT-runs of every binary word, while in the case of the period-doubling morphism π, for each n2 we can find an n-length word w for which Δπ+(w)=Θ(n). We summarize these results in the following proposition.

Proposition 31 ([12, Lemma 24 and Proposition 31]).

Let τ and π be the Thue–Morse and the period-doubling morphisms, respectively. The following properties hold:

  1. 1.

    ASτ(n)=2, for all n2;

  2. 2.

    ASπ(n)=Ω(n).

Note that τ is not the only morphism for which the additive sensitivity function is 2. In [12] it is proved that this property also holds for the Thue–Morse-like morphisms τp,q=(abp,baq), for some p,q>0, and any composition of these morphisms with any Sturmian morphism.

Example 32.

Let us consider the morphism μ=(abbaab,ababba). By using Remark 27, it is possible to verify that μ=τφτ, where τ and φ are, respectively, the Thue–Morse and the Fibonacci morphism. By using Propositions 30 and 31, item 1, it follows that ASμ(n)=4 for all n2.

Figure 4: Comparison of the BWT–matrices for the word w=aabab (on the left) and its image after application of the morphism μ=(baa,aba) (on the right). The dashed lines partition the rotations according to the shortest prefixes with at least one synchronization pair (highlighted in bold). The rotations in light gray correspond to the words in μ((w)). The rotations in dark gray correspond to the rotations where bwt(w) is spelled in reverse order.

5 Characterization of Binary BWT-Run Preserving Morphisms

In this section, we are interested in morphisms having a bounded BWT additive sensitivity function.

Definition 33.

Let k0 be an integer. A morphism μ:ΣΓ is called k-BWT-run preserving if for all n|Σ|, ASμ(n)k. We simply say BWT-run preserving if such a k exists.

By Lemma 28, every cyclic morphism is BWT-run preserving since it has a bounded BWT additive sensitivity function. As a main result of this section, we characterize the acyclic (and therefore, by Remark 4, injective) binary morphisms which are BWT-run preserving. In particular, we prove that they coincide with the primitivity-preserving morphisms.

We first give a lemma, in which we prove that the finite-delay synchronization of a morphism on the images of a language results in a bounded increase in the number of BWT-runs.

Lemma 34.

Let Σ, where Σ={a,b}, and let μ:ΣΓ be synchronizing with delay k>0 on μ(). Then, there exists k>0 such that Δμ+(u)k for all u.

The following lemma proves one direction of the main result.

Lemma 35.

Let μ:{a,b}Γ be an injective morphism. If μ is primitivity-preserving, then μ is BWT-run preserving.

Proof.

If μ is primitivity-preserving, then by Theorem 20, either (i) μ is recognizable or (ii) there exist an integer t>0 and a morphism ψ:{a,b}Γ such that μ=ψτt and ψψτ for all ψ:{a,b}Γ. If we fall in case (i), the thesis follows from the equivalence between recognizable morphisms and synchronizing morphisms with bounded delay [30, Theorem 5.1] and Lemma 34.

If instead we fall in case (ii), then by Proposition 31, it follows that τ increases the BWT-runs by (at most) 2. Hence, the thesis is equivalent to showing that there exists k0 such that r(ψ(u))r(u)+k, for every uτt({a,b}): by using Remark 27, this would prove that the BWT additive sensitive function is bounded by a constant, that is ASμ(n)=maxwΣn(Δμ+(w))=maxwΣn(Δψ+(τt(w))+Δτt+(w))k+2t for all n>0. By Proposition 13, we can distinguish between two subcases: (ii.a) ψ is recognizable and (ii.b) ψ is not primitivity-preserving and τt(a)Pψ. If (ii.a), the proof follows analogously to (i). If (ii.b), then by Lemma 10 all the ψ-power words in the set Pψ consist of either a single letter or rotations of a unique word in the set {anbn1}{abnn>1}. Observe that ~(τt(Σ)){(anb)m,(abn)mn3,m1}= and ~(τt(Σ)){(a2b)m,(ab2)mm1}={a2b,ab2}, for all t1, while ~(τt(Σ)){(ab)mm1}={ab,(ab)2} for t2 and ~(τ(Σ)){(ab)mm1}={(ab)mm1}. Observe that only the latter does not have a finite size; however, for t=1, by hypothesis, we have abPψ. Hence, ~(τt(Σ)) contains only a finite number of powers of elements from Pψ, and the proof follows by Theorem 24.

Now we prove the opposite direction. We consider a class of morphisms that we use to decompose a generic morphism. For any p>1, let ρp:{a,b}{a,b} denote the injective morphism (a,bp). Observe that if p>1, then ρp is not primitivity-preserving.

In the following proposition, we prove that such morphisms have an unbounded additive sensitivity function.

Proposition 36.

Let ρp=(a,bp), for some p>1. Then, ASρp(n)=Ω(n).

In the following proposition, we consider a larger class of morphisms with an unbounded additive sensitivity function.

Proposition 37.

Given an injective morphism μ:{a,b}Γ, let u,vQ(Γ) and p,q1 such that μ=(up,vq). Then,

μ=ηρqEρpE

where η=(u,v). Moreover, if pq>1, then ASμ(n)=Ω(n).

The following lemma shows that if a morphism has bounded additive sensitivity, then it is primitivity-preserving.

Lemma 38.

Let μ:{a,b}Γ be a non-primitivity-preserving injective morphism. For each k>0, there exists a word w such that Δμ+(w)>k.

Proof.

If μ(a) or μ(b) are not primitive, then the thesis follows by Proposition 37, so let us assume that μ(a),μ(b)Q(Γ).

Recall that a Lyndon word is a primitive word that is lexicographically smaller than all its proper conjugates. Since μ is injective, not primitivity-preserving, and both images are primitive words, by Lemma 11 there exists some Lyndon word xPμ such that |x|>1, μ(x)=zt for some t>1, and z is primitive. Let ψ=(a,x) and η=(μ(a),zt). Observe that: i) μ(ψ(a))=μ(a); and ii) μ(ψ(b))=μ(x)=zt. Hence, η=μψ. Then, by Proposition 37, there exists a word w{a,b}n such that Δη+(w)=Θ(n). Since the concatenation uv of two Lyndon words u and v, with u<v, is a Lyndon word (see [9]), then, for every m1, amx and axm are Lyndon words, hence, by Lemma 9, the morphism ψ=(a,x) is primitivity-preserving, and by Lemma 35 ψ is BWT-run preserving. Finally, by using Remark 27, one has Δη+(w)=Δμ+(ψ(w))+Δψ+(w)=Δμ+(ψ(w))+O(1)=Θ(n), and the thesis follows.

Example 39.

Let μ=(ba,ababaa). Let x=aab and z=babaa, where x is Lyndon and z is primitive. It holds that μ(x)=μ(aab)=babaababaa=(babaa)2=z2. We define the morphisms ψ=(a,aab) and η=(ba,(babaa)2), as described in the proof of Lemma 38. Indeed, η=μψ, as μ(ψ(a))=μ(a)=ba=η(a) and μ(ψ(b))=μ(aab)=(babaa)2=η(b). The morphism η can be written as η=(ba,babaa)ρ2, and by Proposition 37 there exists w{a,b} such that r(η(w))r(w)=Θ(n). On the other hand, ψ is primitivity-preserving, so it must be the case that r(μ(ψ(w)))r(ψ(w))=Θ(n).

From Lemmas 35 and 38, the main result of the paper can be derived.

Theorem 40.

Let μ:{a,b}Γ be an injective morphism. Then μ is BWT-run preserving if and only if it is primitivity-preserving.

Finally, we can show that there exists a finite test case, as stated in the following theorem.

Theorem 41.

Let μ:{a,b}Γ be an injective morphism. It is decidable in polynomial time in the size of μ whether μ is BWT-run preserving.

Proof.

By Lemma 9, to decide whether a given morphism μ=(u,v), for some u,vΓ+, is primitivity-preserving, we have to check the primitiveness of all the possible non-trivial solutions of the equation uvm=zn. Let tmax=max{|u|,|v|} and tmin=min{|u|,|v|}. In [15], it has been proved that there are at most O(tmax/tmin) words to check, each of these having length Θ(|u|+|v|). Since the primitiveness can be checked in linear time in the size of the words, the total time complexity is O(tmax2/tmin).

6 Morphisms with bounded multiplicative sensitivity

Even though in the case of binary morphisms the additive sensitivity is not always bounded by a constant, it is natural to wonder whether the multiplicative sensitivity is. As shown in the following example, this is not the case when the alphabet size is greater than 2.

Example 42.

Let fk$=φk(a)$ be the k-th Fibonacci word with a letter $ such that $<a<b appended. Define μ as μ($)=$, μ(a)=ab, and μ(b)=a. Then μ(f2k$)=f2k+1$. It is known that r(f2k+1$)/r(f2k$)=Ω(logn) [14]. Hence, MSμ(n)=Ω(logn).

Observe that if p>1, then the morphism ρp=(a,bp) is not primitivity-preserving. We first show that MSρp>1(n) is bounded.

Lemma 43.

Let w{a,b} be a word that contains at least two a’s and one b. Let t be the length of the longest circular run of b’s in w (i.e., the longest run of b’s in any string in (w)). It holds that

r(ρp(w))r(w)+2|~(w){abia|i[1,t]}|,

where ρp=(a,bp). Moreover, it holds Δρp+(w)2r(w) and Δρp×(w)3.

Proof.

Let t be the length of the longest circular run of b’s in w. Since ρp=(a,bp) is order-preserving, and because the last characters of ρp(a) and ρp(b) are a and b, respectively, the sequence obtained by taking the last character of each image of the lexicographically sorted rotations of w is exactly bwt(w). Moreover, the string formed by concatenating the last characters of the range of rotations starting with a in the BWT matrix of ρp(w) is exactly bwt(w)[1,|w|a]. Similarly, the string formed by concatenating the last characters of the (disjoint) ranges of rotations starting with bipa for i[1,t] is exactly bwt(w)[|w|a+1,|w|]. Strictly in between the ranges of rotations starting with b(i1)pa and bipa for some i[1,t], there is a range of rotations starting with b(i1)p+sa for each s[1,p1], all ending with the character b. In the worst case, each of these blocks of rotations can only increase the number of runs by 2. Hence, the additive increase is at most 2 times the number of circular factors of the form abia in w. This proves the first claim of the proposition.

For the second claim, observe that a change of character occurs in correspondence with each block of rotations starting with bia, for each i such that abia~(w). Hence, the second claim follows because |~(w){abia|i[1,t]}|r(w).

We now give a sketch of the main result of this section.

Theorem 44.

For every morphism μ:{a,b}Γ, there exists an integer kμ such that MSμ(n)kμ.

Proof (sketch).

We assume μ is injective, as otherwise the result follows from Lemma 28. By Proposition 37, μ can be decomposed as μ=ηρqEρpE with η=(u,v) and u,vQ(Σ). By Lemma 43, both MSρp(n)3 and MSρq(n)3, hence MSμ(n) is bounded if and only if MSη(n) is bounded. If η is primitivity-preserving, then by Lemma 35 we are done. Hence, we are left to show the case when η is not primitivity-preserving and both images are primitive. We give a sketch for this case.

Let μ=(u,v) be a non-primitivity-preserving injective morphism with u,vQ(Σ). By Lemma 10, there exists a primitive word x with |x|>1, such that Pμ=(x) and μ(x)=zt with zQ(Σ) and t>1.

As a consequence of Lemma 23, there exists an integer k>0, which depends only on μ, such that every rotation with a k-length prefix yΓk~({z}) contains a synchronization pair. Hence, we can partition these rotations according to their length-k prefix, and the characters preceding these rotations can be determined.

The remaining rotations starting with a power of some rotation of z are handled in a similar (though more complicated) fashion with respect to how rotations starting with a power of b were handled in Lemma 43. This yields an upper-bound for MSμ depending on the value |z| instead of 3.

7 Conclusions and future work

In this paper, we have provided a complete characterization of binary injective morphisms that preserve the number of BWT-runs up to a bounded additive increase. We have shown that this class coincides with the class of binary primitivity-preserving morphisms.

Primitivity-preserving morphisms could be considered a general effective tool for studying and evaluating repetitiveness measures, since such measures remain invariant, up to small constants, when applied to powers of a word. This suggests that such morphisms could be seen as a unifying framework for the analysis and comparison of different repetitiveness measures.

It would be interesting to explore the design of compression and indexing techniques based on BWT-runs that operate directly on morphic encodings of highly repetitive text collections. This could have applications, for example, in the domain of privacy-preserving algorithms. Although our current approach allows for polynomial-time decision procedures for testing whether a given binary morphism is BWT-run preserving or, equivalently, primitivity-preserving, more efficient algorithms could yield significant improvements in terms of scalability and practical performance.

Furthermore, BWT-run sensitivity could support a new classification of morphisms, providing new insights for their structural behavior and the impact on repetitiveness measures.

Finally, we plan to investigate how to extend our results to morphisms over larger alphabets.

References

  • [1] Tooru Akagi, Mitsuru Funakoshi, and Shunsuke Inenaga. Sensitivity of string compressors and repetitiveness measures. Information and Computation, 291:104999, 2023. doi:10.1016/J.IC.2022.104999.
  • [2] Evelyne Barbin-Le Rest and Michel Le Rest. Sur la combinatoire des codes à deux mots. Theoretical Computer Science, 41:61–80, 1985. doi:10.1016/0304-3975(85)90060-X.
  • [3] Marie-Pierre Béal, Dominique Perrin, and Antonio Restivo. Unambiguously coded shifts. European Journal of Combinatorics, 119:103812, 2024. doi:10.1016/J.EJC.2023.103812.
  • [4] Jean Berstel, Dominique Perrin, and Christophe Reutenauer. Codes and Automata, volume 129 of Encyclopedia of mathematics and its applications. Cambridge University Press, 2010.
  • [5] Jean Berstel and Patrice Séébold. A Characterization of Sturmian Morphisms. In MFCS, volume 711 of Lecture Notes in Computer Science, pages 281–290. Springer, 1993. doi:10.1007/3-540-57182-5_20.
  • [6] Michael Burrows and David Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
  • [7] Julien Cassaigne. An algorithm to test if a given circular HD0L-language avoids a pattern. In IFIP Congress (1), volume A-51 of IFIP Transactions, pages 459–464. North-Holland, 1994.
  • [8] Julien Cassaigne, France Gheeraert, Antonio Restivo, Giuseppe Romana, Marinella Sciortino, and Manon Stipulanti. New string attractor-based complexities for infinite words. Journal of Combinatorial Theory, Series A, 208:105936, 2024. doi:10.1016/J.JCTA.2024.105936.
  • [9] Kuo Tsai Chen, Ralph H. Fox, and Roger C. Lyndon. Free differential calculus, IV. The quotient groups of the lower central series. Annals of Mathematics, 68(1):81–95, 1958.
  • [10] Wai-Fong Chuan. Sturmian morphisms and alpha-words. Theoretical Computer Science, 225(1-2):129–148, 1999. doi:10.1016/S0304-3975(97)00239-9.
  • [11] Pál Dömösi, Sándor Horváth, Masami Ito, László Kászonyi, and Masashi Katsura. Formal languages consisting of primitive words. In Zoltán Ésik, editor, Fundamentals of Computation Theory, pages 194–203, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg. doi:10.1007/3-540-57163-9_15.
  • [12] Gabriele Fici, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina. On the Impact of Morphisms on BWT-Runs. In CPM, volume 259 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023. doi:10.4230/LIPICS.CPM.2023.10.
  • [13] Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. Journal of the ACM, 67(1):2:1–2:54, 2020. doi:10.1145/3375890.
  • [14] Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina. Bit catastrophes for the Burrows-Wheeler transform. Theory of Computing Systems, 69(2):19, 2025. doi:10.1007/S00224-024-10212-9.
  • [15] Štěpán Holub, Martin Raska, and Stepán Starosta. Binary codes that do not preserve primitivity. Journal of Automated Reasoning, 67(3):25, 2023. doi:10.1007/S10817-023-09674-2.
  • [16] Cheng-Chi Huang. A note on pure codes. Acta Informatica, 47(5-6):347–357, 2010. doi:10.1007/S00236-010-0122-7.
  • [17] Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In STOC, pages 827–840. ACM, 2018. doi:10.1145/3188745.3188814.
  • [18] Karel Klouda and Stepán Starosta. Characterization of circular D0L-systems. Theor. Comput. Sci., 790:131–137, 2019. doi:10.1016/J.TCS.2019.04.021.
  • [19] Ben Langmead and Steven L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4):357–359, 2012.
  • [20] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3):R25, 2009.
  • [21] Heng Li and Richard Durbin. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26(5):589–595, 2010. doi:10.1093/BIOINFORMATICS/BTP698.
  • [22] M. Lothaire. Combinatorics on Words. Cambridge University Press, 1997.
  • [23] Roger C. Lyndon and Marcel-Paul Schützenberger. The equation am=bncp in a free group. Michigan Mathematical Journal, 9(4):289–298, 1962.
  • [24] Sabrina Mantaci, Antonio Restivo, and Marinella Sciortino. Burrows–Wheeler transform and Sturmian words. Information Processing Letters, 86(5):241–246, 2003. doi:10.1016/S0020-0190(02)00512-4.
  • [25] Victor Mitrana. Primitive morphisms. Information Processing Letters, 64(6):277–281, 1997. doi:10.1016/S0020-0190(97)00178-6.
  • [26] Gonzalo Navarro. Indexing highly repetitive string collections, part I: Repetitiveness measures. ACM Computing Surveys, 54(2):article 29, 2021.
  • [27] Gonzalo Navarro. The compression power of the BWT: technical perspective. Communications of the ACM, 65(6):90, 2022. doi:10.1145/3531443.
  • [28] Gonzalo Navarro and Cristian Urbina. Repetitiveness measures based on string morphisms. Theoretical Computer Science, 1043:115259, 2025. doi:10.1016/J.TCS.2025.115259.
  • [29] Geneviève Paquin. On a generalization of Christoffel words: epichristoffel words. Theoretical Computer Science, 410(38-40):3782–3791, 2009. doi:10.1016/J.TCS.2009.05.014.
  • [30] Antonio Restivo. On a Question of McNaughton and Papert. Information and Control, 25(1):93–101, 1974. doi:10.1016/S0019-9958(74)90821-3.
  • [31] Michel Rigo. Formal Languages, Automata and Numeration Systems 1: Introduction to Combinatorics on Words. Wiley, 2014.
  • [32] Huei-Jan Shyr and Gabriel Thierrin. Codes, languages and MOL schemes. RAIRO Theoretical Informatics and Applications, 11(4):293–301, 1977. doi:10.1051/ITA/1977110402931.
  • [33] Huei-Jan Shyr and Shyr-Shen Yu. Non-primitive words in the language p+q+. Soochow Journal of Mathematics, 20:535–546, 1994.
  • [34] Md. Vasimuddin, Sanchit Misra, Heng Li, and Srinivas Aluru. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In 2019 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019, Rio de Janeiro, Brazil, May 20-24, 2019, pages 314–324, Los Alamitos, CA, 2019. IEEE Computer Society. doi:10.1109/IPDPS.2019.00041.