A Proof of Shur’s Conjecture on the Growth of Power-Free Languages over Large Alphabets
Abstract
We settle a conjecture of Shur on an estimation of the exponential growth rates of the languages of -free words and -free words over large alphabets of size with a correction of order .
Keywords and phrases:
power-free languages, large alphabets, Shur’s conjecture, Dejean’s conjectureCopyright and License:
2012 ACM Subject Classification:
Mathematics of computing Combinatorics on wordsEditors:
Paweł Gawrychowski, Filip Mazowiecki, and Michał SkrzypczakSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
A square is a word of the form where is a nonempty word.
A square-free word is a word that does not contain a square as a factor.
For instance, hotshots is a square, and minimize is a square-free word.
In the seminal work [16], which is widely regarded as the starting point of combinatorics on words, Thue showed that there are infinite square-free words over the ternary alphabet as well as infinite cube-free words over the binary alphabet. This notion of powers and its generalizations have received a lot of attention.
One such generalization is the notion of fractional power. A word of the form where is nonempty and is a prefix of is a power of exponent and of period (we also say that is a -power). A square is a -power, and a cube is a -power. For any real , we say that the word is -free (resp. -free) if it contains no -power as a factor with (resp. ). These notions were introduced by Dejean who conjectured in 1972 that for any , there exists a -free infinite word over the -ary alphabet, and proved that there exists no -free infinite word over the same alphabet [5]. For , she proved that there exists a free infinite word over the ternary alphabet [5], and also conjectured that there exists a free infinite word over the -ary alphabet, which was later proven by Pansiot [10]. Different authors solved Dejean’s conjecture for the cases [9], [7], [1], [2], until the remaining cases were finally proven 40 years after the initial conjecture [3, 11]. This led to another conjecture about the number of -free words over the -ary alphabet. It is conjectured that for all , this language grows exponentially [8], and that as goes to infinity, the growth rate converges to a constant whose first digits are [3, 15]. For all but six cases, it has been proven that these languages grow exponentially [6, 17, 4]. The second part of the conjecture remains completely open.
Let denote the languages of -free words over an alphabet of size . The growth rate , where is the set of words with length in , is usually hard to estimate and requires a large amount of computation. The previously mentioned conjecture asserts that . Shur suggests an interesting problem: study the asymptotic order of when for fixed . For an overview of progress on the problem, we refer the reader to the intensive survey of Shur [14]. In particular, Shur [13] settled the problem when the degree of the power is at least in Theorem 1 below.
We extend the strict total order over the reals to the numbers of the form in such a way that for all , we have . That is, is right after in this order.
Theorem 1 (Shur, 2010).
If is an integer and is in , then
However, the problem for powers with degree less than is still open and Shur [13] suggested the following conjecture.
Conjecture 2 (Shur, 2010).
For every integer , we have
Note that for all and for all , if then . This implies that is a non-decreasing function of . Moreover, if Shur’s conjecture holds then for every integer and , we have
| (1) | ||||
| (2) |
Shur’s conjecture implies that most of the jump between and is located between and . It provides precise bounds on the asymptotic behavior of tight up to for every . In particular, if is such that , then , which implies
| (3) |
That is, this conjecture provides for all a good estimate of the asymptotic behavior of as goes to infinity. This conjecture implies other similar empirical facts that also hold for and illustrate the particular behavior of ((1) and (2) are respectively called small variation and big jump in [13]).
Using a counting argument, the second author has established the lower bound in [12].
Theorem 3 (Rosenfeld, 2021).
Let be an integer, then the following holds
Shur has actually confirmed the other direction of the inequality for by a computer-assisted proof which settles the conjecture for these cases [13]. In this article, we settle Shur’s conjecture, by proving the upper bound for all .
Theorem 4.
For every integer , we have
| (4) | ||||
| (5) |
Shur’s proof for considers a regular language that contains (resp. ) and uses standard tools from automata theory to upper bound the growth of this language. His proof technique allows him to work with all , for one fixed , by considering the automaton where states are taken up to isomorphism (that is, up to renaming of the letters). However, this technique needs to explicitly construct the automaton for each specific value , which requires the use of computers. Proving the result for all asks for a different approach. Moreover, Shur’s approach is to explicitly construct the automata that avoid powers up to a given length. The states of these automata correspond to suffixes of certain lengths, hence the size of the automata grows exponentially. The main idea behind our proof is also to construct a sequence of automata for all that approximate the true automata. However, they must be simple enough to allow estimating the growth rates manually, and include all the words in the languages (to ensure that we obtain a proper upper bound), but do not include relatively too many forbidden words (to ensure that the upper bound on the growth rate is sharp).
2 Proof of the upper bounds
In this section, we settle Inequality (5) for -free words. Inequality (4) for -free words can be proved similarly.
For any two integers , we write for the set of integers , and we write for the word .
Fix . We denote by the language of words over letters that avoid -powers isomorphic to and for any and any . Since contains all -free words, the following theorem directly implies Inequality (5) of Shur’s conjecture. This section is devoted to the proof of this theorem.
Theorem 5.
The growth rate of the language is at most
The following fact simply illustrates that two occurrences of the same letter cannot be too close, as it would cause the existence of a -power isomorphic to with .
Observation 6.
Any factor from of length contains different letters.
It directly implies that the growth of is at most . The term requires considering the powers isomorphic to .
The idea is to construct an automaton whose language is a superset of . Shur noted in [13] that the size of the automaton recognizing is exponential in and is really difficult to analyze. But by considering a slightly larger language, we can construct an automaton with a much smaller number of states that we will be able to analyze, but whose growth rate is really close to the original language (up to the term).
For this, we define the following states for any long enough word (we require ):
-
for : if a suffix of is isomorphic to .
-
for : if a suffix of is isomorphic to with satisfying
We provide some intuition about this definition. The number in the states stands for the length of the longest suffix containing only distinct letters, hence the letter in front of that suffix should appear in the suffix (the only exception111It might be more natural to write this state as , but for the sake of notation it is more convenient to keep it as . being as it actually covers all the lengths from up to ). The states and differ based on whether this reoccurring letter is the last letter of the word. The states are distinguished from the other state, since when the word ends with and we know that the next letter cannot be (since this would create a power of the form ), that is, we know that at least one more letter is forbidden in the next position. The length of this power actually suggests the choice of the seemingly-arbitrary number .
Remark 7.
By definition, the words in state contain a forbidden factor. It is fine to consider them anyway because we only want an upper bound on the number of words, and introducing this “artificial” state creates some symmetries that simplify the statements and the proofs.
From here on, we assume
Observation 8.
Every word with is in exactly one of the previously defined states.
We now describe how appending a letter at the end of a word from can alter the state. In particular, we need such that for every states and for any in state ,
That is, there are at most letters such that is in and is in state .
In the following lemmas, we provide such a . For convenience, we use the following notation:
Lemma 9.
For , we can set
-
, for each state ,
-
,
-
for the other states , .
Proof.
If is in state , then up to renaming of the letters, is a suffix of for some . By Proposition 6, the next letter needs to be different from the last letters, so it belongs to . If the next letter is in , then the next state is , respectively. If the next letter is in , then the next state is always . The following lemma and its proof are almost identical to the ones above. The difference is due to the missing loop around the state .
Lemma 10.
For , we can set
-
, for each state ,
-
,
-
for the other states , .
Proof.
If is in state , then up to renaming of the letters, is a suffix of . By Proposition 6, if is in , then is different from the last letters of , so . This time, we also have since is forbidden in . In particular, if the next letter is , then the next state is , respectively. If the next letter is in , then the next state is .
In the remainder of this section, we treat as a square matrix indexed over the states. We abuse the notation and write for . By the definition of , for every in state we have
Since every long enough word from is in one of the states, we have the following upper bound on the growth of .
Proposition 11.
The growth rate of the language is at most the spectral radius of the matrix .
Theorem 4 can then be reduced to the following theorem.
Theorem 12.
For any , there exists a constant such that for all large enough , the spectral radius of is at most
Proof.
For the proof we fix , and we let , with large enough as a function of . We denote the coordinates of a vector by , . We consider the vector with
| (6) | ||||
| (7) |
We first verify that is positive when is large enough. Indeed, for all values of , we can prove that for every ,
when . We also need for any , which by (7) holds for large enough.
The nonnegativity of and Perron-Frobenius Theorem imply that an eigenvector associated to the dominant eigenvalue is nonnegative. Since is positive, it is non-orthogonal with , and we have with a positive real and a nonnegative vector. This and the nonnegativity of all the entries implies that . It follows that . Therefore, in order to deduce , it suffices to show that
where the inequality is coordinate-wise.
We now verify the inequality for each coordinate as follows.
For every ,
for some large enough that only depends on .
We now take care of the coordinates . By Lemma 9 and Lemma 10, we have for all ,
We have already proven that , for all , and the rest of the computation follows:
where the last inequality holds for large enough . This concludes our proof.
Remark 13.
Although we do not really care about how large and should be in Theorem 4, one easily verifies that and work.
3 Conclusion
As mentioned in the introduction (see Equation (3)), our result provides for all a good estimate for up to an error term . When we only care about powers isomorphic to and , the difference between -powers for and for lies only in the range of . By varying the length , we might allow ourselves to address -powers for a fraction that may be of more complicated nature than . So applying our approach for all instead of only of the form or might be enough to replace this error term by .
On the other hand, as conjectured by Shur, and later by the second author, the behavior of the growth rate up to the term is controlled by powers whose tail is of length or . It seems natural to wonder whether increasing the precision up to the term requires precisely the considerations of powers with tails of length . Shall the degree of the order of precision grow linearly with respect to the length of the tails?
References
- [1] Arturo Carpi. On Dejean’s conjecture over large alphabets. Theoret. Comput. Sci., 385(1):137–151, October 2007. doi:10.1016/j.tcs.2007.06.001.
- [2] James Currie and Narad Rampersad. Dejean’s conjecture holds for . Theoret. Comput. Sci., 410(30):2885–2888, August 2009. doi:10.1016/j.tcs.2009.01.026.
- [3] James Currie and Narad Rampersad. A proof of Dejean’s conjecture. Math. Comput., 80(274):1063–1070, April 2011. doi:10.1090/S0025-5718-2010-02407-X.
- [4] James D. Currie, Lucas Mol, and Narad Rampersad. The number of threshold words on n letters grows exponentially for every . Journal of Integer Sequences, 23, 2020. URL: https://cs.uwaterloo.ca/journals/JIS/VOL23/Mol/mol2.html.
- [5] Françoise Dejean. Sur un théorème de Thue. J. Combin. Theory Ser. A, 13(1):90–99, 1972.
- [6] Roman Kolpakov and Michael Rao. On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters. Theoret. Comput. Sci., 412(46), 2011. doi:10.1016/J.TCS.2011.08.006.
- [7] M. Mohammad-Noori and James D. Currie. Dejean’s conjecture and Sturmian words. Eur. J. Combin., 28(3):876–890, April 2007. doi:10.1016/j.ejc.2005.11.005.
- [8] Pascal Ochem. A generator of morphisms for infinite words. RAIRO-Theor. Inf. Appl., 40(3):427–441, July 2006. doi:10.1051/ita:2006020.
- [9] Jean Moulin Ollagnier. Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11 letters. Theoret. Comput. Sci., 95(2):187–205, March 1992. doi:10.1016/0304-3975(92)90264-G.
- [10] Jean-Jacques Pansiot. A propos d’une conjecture de F. Dejean sur les répétitions dans les mots. Discrete Appl. Math., 7(3):297–311, March 1984. doi:10.1016/0166-218X(84)90006-4.
- [11] Michael Rao. Last cases of Dejean’s conjecture. Theoret. Comput. Sci., 412(27):3010–3018, 2011. doi:10.1016/J.TCS.2010.06.020.
- [12] Matthieu Rosenfeld. Lower-bounds on the growth of power-free languages over large alphabets. Theory of Computing Systems, 65:1110–1116, 2021. doi:10.1007/S00224-021-10040-1.
- [13] Arseny M. Shur. Growth of power-free languages over large alphabets. In International Computer Science Symposium in Russia, pages 350–361. Springer, 2010. doi:10.1007/978-3-642-13182-0_35.
- [14] Arseny M. Shur. Growth properties of power-free languages. Computer Science Review, 6(5-6):187–208, 2012. doi:10.1016/J.COSREV.2012.09.001.
- [15] Arseny M. Shur and Irina A. Gorbunova. On the growth rates of complexity of threshold languages. RAIRO-Theor. Inf. Appl., 44(1):175–192, January 2010. doi:10.1051/ita/2010012.
- [16] Axel Thue. Über unendliche Zeichenreihen. ’Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1–22, 1906.
- [17] Igor N. Tunev and Arseny M. Shur. On Two Stronger Versions of Dejean’s Conjecture. In Mathematical Foundations of Computer Science 2012, pages 800–812. Springer, Berlin, Germany, 2012. doi:10.1007/978-3-642-32589-2_69.