eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2023-06-21
10:1
10:18
10.4230/LIPIcs.CPM.2023.10
article
On the Impact of Morphisms on BWT-Runs
Fici, Gabriele
1
https://orcid.org/0000-0002-3536-327X
Romana, Giuseppe
1
https://orcid.org/0000-0002-3489-0684
Sciortino, Marinella
1
https://orcid.org/0000-0001-6928-0168
Urbina, Cristian
2
3
https://orcid.org/0000-0001-8979-9055
Department of Mathematics and Informatics, University of Palermo, Italy
Department of Computer Science, University of Chile, Santiago, Chile
Centre for Biotechnology and Bioengineering (CeBiB), Santiago, Chile
Morphisms are widely studied combinatorial objects that can be used to generate infinite families of words. In information theory, injective morphisms are called (variable-length) codes. In data compression, morphisms combined with parsing techniques have recently been used to define new mechanisms for generating repetitive words. Here, we show that the repetitiveness induced by applying a morphism to a word can be captured by a compression scheme based on the Burrows-Wheeler Transform (BWT). In fact, we prove that, unlike other compression-based repetitiveness measures, the measure r_bwt (which counts the number of equal-letter runs produced by applying the BWT to a word) strongly depends on the applied morphism. In more detail, we characterize the binary morphisms that preserve the value of r_bwt(w) when applied to any binary word w containing both letters. They are precisely the Sturmian morphisms, which are well-known objects in combinatorics on words. Moreover, we prove that it is always possible to find a binary morphism that, when applied to any binary word containing both letters, increases the number of BWT equal-letter runs by a given (even) number. In addition, we derive a method for constructing arbitrarily large families of binary words on which the BWT produces a given (even) number of new equal-letter runs. These results are obtained by using a new class of morphisms that we call Thue-Morse-like. Finally, we show that there exist binary morphisms μ for which it is possible to find words w such that the difference r_bwt(μ(w)) - r_bwt(w) is arbitrarily large.
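The measure r_bwt described in the abstract can be illustrated with a minimal Python sketch. It assumes the BWT is taken as the last column of the lexicographically sorted cyclic rotations of the word, with no end-of-string marker; the record does not state the exact definition, so this is an assumption. The function names and the two example morphisms (the Fibonacci morphism, a standard Sturmian morphism, and the Thue-Morse morphism) are chosen here for illustration and are not taken from the paper. The naive rotation sort is quadratic in space and is meant only for small examples.

```python
def bwt(word: str) -> str:
    """BWT as the last column of the sorted cyclic rotations.

    Assumption: no end-of-string marker is appended.
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def r_bwt(word: str) -> int:
    """Number of maximal equal-letter runs in bwt(word)."""
    transformed = bwt(word)
    return sum(
        1
        for i, letter in enumerate(transformed)
        if i == 0 or letter != transformed[i - 1]
    )


def apply_morphism(morphism: dict, word: str) -> str:
    """Apply a morphism, given as a letter-to-word mapping, to a word."""
    return "".join(morphism[letter] for letter in word)


# Illustrative morphisms (hypothetical names, not from the paper).
FIBONACCI = {"a": "ab", "b": "a"}    # a Sturmian morphism
THUE_MORSE = {"a": "ab", "b": "ba"}  # not a Sturmian morphism

if __name__ == "__main__":
    for w in ["ab", "aabab"]:
        print(
            w,
            r_bwt(w),
            r_bwt(apply_morphism(FIBONACCI, w)),
            r_bwt(apply_morphism(THUE_MORSE, w)),
        )
```

On the word ab, for instance, the Fibonacci morphism leaves the number of runs at 2, while the Thue-Morse morphism yields abba, whose BWT under this definition is baba, with 4 runs.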
https://drops.dagstuhl.de/storage/00lipics/lipics-vol259-cpm2023/LIPIcs.CPM.2023.10/LIPIcs.CPM.2023.10.pdf
Morphism
Burrows-Wheeler transform
Sturmian word
Sturmian morphism
Thue-Morse morphism
Repetitiveness measure